Published on: 4th October 2022

Breaking your bad habits with Polars

This post was created while writing my Up & Running with Polars course. Check it out here with a free preview of the first chapters.

One comment we get on the Polars discourse is that the Polars syntax encourages people to break bad habits they developed in Pandas.

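Take the .apply (or .applymap) function for example. I see lots of people using this in Kaggle comps, even though it's bad news: it calls a Python function once per row or element instead of running a vectorized operation over the whole column.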

In this example we want to map positive values to 1 and all other values to 0 in every column.

Using the standard pl.when expression in Polars is close to 100x faster than an applymap call in Pandas (about 30 milliseconds versus 2.5 seconds in the run below)*

*Further optimizations are available on this toy problem in both libraries!

import polars as pl
import numpy as np

# Create a random DataFrame
N = 100_000
dfNumeric = pl.DataFrame(np.random.standard_normal((N,100)))
dfp = dfNumeric.to_pandas()

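# Set values to 1 when they are positive and 0 otherwise

# Pandas: applymap calls the Python lambda once per element
# (pandas 2.1+ deprecates applymap in favour of DataFrame.map)
(
    dfp
    .applymap(lambda x: 1 if x > 0 else 0)
)
# Time: 2.5 seconds

# Polars: one when/then/otherwise expression per column
(
    dfNumeric
    .with_columns(
        [
            pl.when(pl.col(col) > 0).then(1).otherwise(0).alias(col)
            for col in dfNumeric.columns
        ]
    )
)
# Time: 30 milliseconds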

The shift away from .apply functions happened for me as well.

In Pandas I used to call .apply fairly often, but the only time I’ve used .apply in Polars was…when writing the docs to tell people not to use .apply!
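
The same habit can be broken without leaving Pandas. As a minimal sketch, assuming the same positive-to-1, otherwise-0 mapping used in this post, a boolean comparison on the whole frame followed by a cast replaces the element-wise lambda. The element-wise version below uses Series.map inside apply so it runs on both older and newer pandas releases; the variable names and frame size are illustrative.

```python
import numpy as np
import pandas as pd

# A small random frame for illustration
rng = np.random.default_rng(0)
dfp = pd.DataFrame(rng.standard_normal((1_000, 5)))

# Element-wise: a Python lambda is called once for every value
with_apply = dfp.apply(lambda s: s.map(lambda x: 1 if x > 0 else 0))

# Vectorized: compare the whole frame at once, then cast booleans to integers
with_vectorized = (dfp > 0).astype(int)

# Both approaches should yield the same 0/1 values
same_values = np.array_equal(with_apply.to_numpy(), with_vectorized.to_numpy())
print(same_values)
```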

Learn more

Want to know more about Polars for high performance data science and ML? Then you can:

or let me know if you would like a Polars workshop for your organisation.
