Don't loop over columns in Polars

Published on: 5th September 2022

This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.

If you’re writing Polars code like this

for col in df.columns:
    ...  # do stuff with df[col]

then STOP!!!!

Instead, use expressions and Polars will parallelise the loop over the columns for you. An explicit Python loop handles the columns one at a time, so you're killing that parallelisation.

For example, if we want to count the number of unique values in every column we do

df.select(pl.all().n_unique())

Or, if we want to count the number of unique values only in string (Utf8) columns, we do

df.select(pl.col(pl.Utf8)).select(pl.all().n_unique())

Doing it this way with expressions will give you the 🚀 performance you expect!

Learn more

Want to know more about Polars for high performance data science? Let me know if you would like a Polars workshop for your organisation.

This post is licensed under CC BY 4.0 by the author.