Polars can help if your data is sorted

Published on: 13th September 2022

This post was created while writing my Up & Running with Polars course. Check it out here, with a free preview of the first chapters.

Check out a video version of this post here!

Polars has optimizations for when you’re working with sorted data.

To access them, you tell Polars that the data is sorted by calling set_sorted. This marks the series as sorted; it does not sort the data or check that it really is sorted.

In this simple example, finding the median is about 1500x faster (0.0002 s instead of 0.3 s) when we tell Polars the series is sorted.

import polars as pl

# Create a series with 10 million entries
s = pl.Series("a", range(0, int(1e7)))

# Call .median without set_sorted
s.median()
# Time: 0.3 s

# Call .median with set_sorted
s.set_sorted().median()
# Time: 0.0002 s

You may already be taking advantage of these optimizations without realising it. Polars sets the sorted flag automatically when you run an operation that sorts the data, whether the sort is implicit or explicit.

set_sorted also helps with other operations. In some of my workflows, a groupby on a large dataset is 40% faster when the grouping column is one that Polars knows is sorted.

Learn more

Want to know more about Polars for high performance data science and ML? Then you can:

or let me know if you would like a Polars workshop for your organisation.

This post is licensed under CC BY 4.0 by the author.