Rho Signal

Filtering one df by another

One of the most common questions we get on the Polars Discord is how to filter rows in one dataframe by values in another. I think people don’t realise this is basically a join because they don’...

Embrace streaming mode in Polars

Polars can handle larger-than-memory datasets with its streaming mode. In this mode Polars processes your data in batches rather than all at once. However, the streaming mode is not some emergency ...

Lazy mode's hidden timesaver in Polars

Lazy mode in Polars does more than provide query optimisation and let you work with larger-than-memory datasets. It also provides some type safety that can find errors in your pipeline before...

Polars 🤝 Seaborn

Update October 2023: As of Seaborn v0.13.0, Seaborn accepts Polars DataFrames natively 🎆. Note that this is not full native support, though. Polars copies the data internally to a Pandas Data...

Nested dtypes in Polars 1: the `pl.List` dtype

Polars uses Apache Arrow to store its data in-memory. One of the big advantages of Arrow is that it supports a variety of nested data types (or “dtypes”). In this post we look at the pl.List dtype ...

Talking Polars on the Real Python podcast

I appeared on the Real Python podcast to talk Polars! We chatted about: why lazy mode in Polars is so important; working with larger-than-memory datasets; transitioning from Pandas to Polars ...

Sinking larger-than-memory Parquet files

Polars now allows you to write Parquet files even when the file is too large to fit in memory. It does this by using streaming to process data in batches and then writing these batches to a Parquet...

Polars ❤️ sorted data 2: groupby

In a previous post we saw that Polars has fast-track algorithms for calculating some statistics on sorted data. In this post we see that Polars also has a fast-track algorithm for getting groupby k...

To go big you must be lazy

I was recently consulting for a client who needs to process hundreds of GB of CSV files. On their first pass with Polars they had read from their CSVs with a pattern like this simplified version....

AWS Lambda with Polars II: PyArrow

In a recent post I showed how to use Polars in AWS Lambda using the smart_open library. There are a variety of ways that you can work with Polars in AWS Lambda, however. In this post we look at how...