Published on: 5th September 2022
Don’t loop over columns in Polars
This post was created while writing my Up & Running with Polars course. Check it out here; there's a free preview of the first chapters.
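If you’re writing Polars code like this, there’s a better way:

```python
# Anti-pattern: a Python-level loop handles one column at a time
for col in df.columns:
    ...  # do stuff with df[col]
```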
Instead, use expressions, and Polars will parallelise the work over the columns for you. When you loop explicitly in Python, the columns are processed one after another, which kills the parallelisation.
For example, to count the number of unique values in every column, we can apply `n_unique` to all columns in a single expression.
Or, if we want to count the number of unique values only in the string (Utf8) columns, we select those columns by dtype inside the same kind of expression.
Doing it this way with expressions will give you the 🚀 performance you expect!