Published on: 13th September 2022
Polars can help if your data is sorted
This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.
Check out a video version of this post here!
Polars has optimizations for when you’re working with sorted data.
To access them you tell Polars the data is sorted with the set_sorted method.
In this simple example we find the median 1500x faster when we tell Polars the series is sorted.
    import polars as pl

    # Create a series with 10 million entries
    s = pl.Series("a", range(0, int(1e7)))

    # Call .median without set_sorted
    s.median()
    # Time: 0.3 s

    # Call .median with set_sorted
    s.set_sorted().median()
    # Time: 0.0002 s
You may already be taking advantage of set_sorted without realising it. Polars sets the sorted flag automatically after any operation that sorts the data, whether the sort is explicit or implicit.
set_sorted also works with other operations. In some of my workflows, a groupby on a large dataset is 40% faster on a column that Polars knows is sorted.
Want to know more about Polars for high performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.