Polars 🤝 Seaborn

When I posted about my frustrations with Matplotlib, Gaurav Sablok pointed out on LinkedIn that I had overlooked the Seaborn library. I’ve been using Altair in recent years and so hadn’t given Seaborn much thought. However, I’ve been impressed by Seaborn’s new interface and am happy to find that Seaborn will accept Polars DataFrames directly for many plots.

In this post I look at how we can pass a Polars DataFrame to Seaborn for some advanced plots, along with some other tips for visualising a Polars DataFrame with Seaborn.

Jointplot

One of the advanced Seaborn visualisations is the jointplot. This is a scatter plot of two columns but with the distribution of each column also plotted. In addition we can add a hue field to colour these plots by a third column.

In this example I use the Titanic dataset and do a jointplot of the passenger age and fare paid columns and colour by the passenger class column.

import polars as pl
import seaborn as sns

df = pl.read_csv("titanic.csv")

sns.jointplot(
    data=(
        df
        .with_columns(
            [   
                # Take the log of the Age and Fare floating-point columns
                pl.col(pl.Float64).log(),
                # Cast the passenger class column to string
                pl.col("Pclass").cast(pl.Utf8)
            ]
        )
    ),
    x="Age", 
    y="Fare", 
    hue="Pclass",
)

Numerical or categorical?

One common feature of plotting libraries like Plotly or Seaborn is that they infer how the data should be presented based on the dtype of the data. This can lead to charts that display in a confusing way!

In this example we want to colour by the Pclass column for passenger class. The values in this column are 1, 2 or 3 and so it has an integer dtype. From a plotting perspective, however, this column is really a kind of ordered categorical column rather than a numerical one. Because of the integer dtype, Seaborn and Plotly see it as a numerical column and try to treat it as quantitative data.

To address this we must convert the Pclass column to a string dtype. We do this in the example above with the expression pl.col("Pclass").cast(pl.Utf8). The charts then display in the way we expect.

Conversion to Pandas?

Note that we can pass the Polars DataFrame directly to sns.jointplot without having to convert to Pandas.

Not all Seaborn chart types accept a Polars DataFrame directly. For example, the bar chart function sns.barplot raises an exception because it tries to call a method that a Pandas DataFrame has but a Polars DataFrame does not. To resolve this you can call df.to_pandas().

Calling df.to_pandas() requires copying your data. To avoid unnecessary copying I recommend calling select first so that only the columns required for your chart are copied, e.g. df.select(["Age","Fare","Pclass"]).to_pandas().

Want to learn more? Check out my Data Analysis with Polars course on Udemy with this discount code

This post is licensed under CC BY 4.0 by the author.