Update October 2023
As of Seaborn v0.13.0, Seaborn accepts Polars DataFrames directly 🎆. Note that this is not full native support though: Seaborn copies the data internally to a Pandas DataFrame. To avoid copying your full dataset, pass only the subset of columns you need for the plot.
While I was posting about my frustrations with Matplotlib, Gaurav Sablok pointed out on LinkedIn that I had overlooked the Seaborn library.
I’ve been using Altair in recent years and so hadn’t given Seaborn much thought. However, I’ve been impressed by Seaborn’s new interface and was happy to find that Seaborn will accept Polars DataFrames directly for many plots.
In this post I look at how we can pass a Polars DataFrame to Seaborn for some advanced plots, along with some other tips for visualising a Polars DataFrame with Seaborn.
One of the advanced Seaborn visualisations is the jointplot. This is a scatter plot of two columns, with the distribution of each column also plotted along the margins. In addition, we can add a hue field to colour these plots by a third column.
In this example I use the Titanic dataset and do a jointplot of the passenger age and fare paid columns, coloured by the passenger class column.
```python
import polars as pl
import seaborn as sns

df = pl.read_csv("titanic.csv")

sns.jointplot(
    data=(
        df
        .with_columns(
            [
                # Take the log of the Age and Fare floating-point columns
                pl.col(pl.Float64).log(),
                # Cast the passenger class column to string
                pl.col("Pclass").cast(pl.Utf8)
            ]
        )
    ),
    x="Age",
    y="Fare",
    hue="Pclass",
)
```
One common feature of plotting libraries like Plotly or Seaborn is that they infer how the data should be presented based on the dtype of the data. This can lead to charts that display in a confusing way!
In this example we want to colour by the Pclass column for passenger class. The values in this column are 1, 2 or 3 and so it has an integer dtype. From a plotting perspective, however, this column is really a kind of ordered categorical column rather than a numerical one. Because of the integer dtype, Seaborn and Plotly see it as a numerical column and try to treat it as quantitative data.
To address this we must convert the Pclass column to a string dtype. We do this in the example above with the expression pl.col("Pclass").cast(pl.Utf8). The charts then display in the way we expect.
Passing a Polars DataFrame leads to a copy of your data internally in Seaborn. To avoid unnecessary copying I recommend calling select so that only the subset of columns required for your chart is copied.