Rho Signal

Reading from S3 with Polars

In this post we see how to read from and write to a CSV or Parquet file in S3 with Polars. We also see how to filter the file on S3 before downloading it to reduce the amount of data transferred acros...

Crucial parameters for streaming in Polars

In this post we see how Polars sets some crucial parameters that affect streaming mode. Understanding these concepts is important if you want to optimize the performance of a large streaming query ...

Ordering of groupby and unique in Polars

Polars (and Apache Arrow) has been designed to be careful with your data so you don’t get surprises like the following Pandas code where the ints column has been cast to float because of the missin...

Exploding a Polars pivot for feature engineering

In my ML pipelines these days I find I replace some of the simpler scikit-learn metrics such as root-mean-squared-error with my own hand-rolled Polars expressions. This approach saves me from copyi...

Polars, Altair and Vegafusion

Altair has been my favourite visualisation library for a long time. It allows me to make beautiful visualisations with an API that is concise and consistent. I was sad to find last year that I coul...

Filtering one df by another

One of the most common questions we get on the Polars Discord is how to filter rows in one dataframe by values in another. I think people don’t realise this is basically a join because they don’...

Embrace streaming mode in Polars

Polars can handle larger-than-memory datasets with its streaming mode. In this mode Polars processes your data in batches rather than all at once. However, the streaming mode is not some emergency ...

Lazy mode's hidden timesaver in Polars

Lazy mode in Polars does not only provide query optimisation and allow you to work with larger-than-memory datasets. It also provides some type security that can find errors in your pipeline before...

Concat, extend or vstack?

On the face of it, the concat, extend and vstack functions in Polars do the same job: they take two initial DataFrames and turn them into a single DataFrame. In this post I show that they do quite di...

Polars 🤝 Seaborn

While posting on my frustrations with Matplotlib, Gaurav Sablok pointed out on LinkedIn that I had overlooked the Seaborn library. I’ve been using Altair in recent years and so hadn’t given Seaborn...