Rho Signal

Maybe they should just call it Regular Data Analysis

OpenAI used to have a product called Code Interpreter, a name that didn't make much sense because the product doesn't interpret code. Instead it's a language model that can ingest CSVs and genera...

Reading and writing files on S3 with Polars

Updated June 2024 for Polars version 1.0. In this post we see how to read from and write to a CSV or Parquet file on S3 with Polars. We also see how to filter the file on S3 before downloading it to r...

Understanding the Polars nested column types

Polars has 4 native nested column types. These can be very helpful for solving problems such as working with ML embeddings, splitting strings, working with nested JSON data, working with aggr...

Comparison of Matplotlib and Plotly in Polars

Updated July 2023. From Plotly v5.15.0 onwards, Plotly has native support for Polars 😊, so you can pass the DataFrame as the first argument and the column names as strings to the x and y encoding argu...

Filling time series gaps in lazy mode

Two major advantages of Polars over Pandas are that Polars has a lazy mode with query optimization and that Polars can scale to larger-than-memory datasets with its streaming mode. Taking advantage ...

Crucial parameters for streaming in Polars

In this post we see how Polars sets some crucial parameters that affect streaming mode. Understanding these concepts is important if you want to optimize the performance of a large streaming query ...

Exploding a Polars pivot for feature engineering

In my ML pipelines these days I find that I replace some of the simpler scikit-learn metrics, such as root mean squared error, with my own hand-rolled Polars expressions. This approach saves me from copyi...

Ordering of groupby and unique in Polars

Polars (and Apache Arrow) have been designed to be careful with your data, so you don't get surprises like the following Pandas code, where the ints column has been cast to float because of the missin...

Polars, Altair and Vegafusion

Altair has been my favourite visualisation library for a long time. It allows me to make beautiful visualisations with an API that is concise and consistent. I was sad to find last year that I coul...

Concat, extend or vstack?

On the face of it, the concat, extend and vstack functions in Polars can do the same job: they can take two initial DataFrames and turn them into a single DataFrame. In this post I show that they do ...