
Polars, Altair and Vegafusion

Altair has been my favourite visualisation library for a long time. It allows me to make beautiful visualisations with an API that is concise and consistent. I was sad to find last year that I couldn’t pass a Polars DataFrame to an Altair chart.

Those days are gone, however. In this post we look at how we can use Altair and Polars together following the release of Altair 5, and how VegaFusion is helping Altair to scale up to larger datasets.

Want to get going with Polars? This post is an extract from my Up & Running with Polars course - learn more here or check out the preview of the first chapters

Here’s a simple chart from my course that I made using the Titanic dataset.

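import altair as alt
import polars as pl

# df is assumed to be the Titanic dataset loaded as a Polars DataFrame
# Count passengers for each combination of survival status and class
# (newer Polars releases spell this method group_by)
class_survival_counts = (
    df
    .groupby('Survived','Pclass')
    .count()
)
# Bar chart of passenger counts by class, coloured by survival
alt.Chart(
    class_survival_counts,
    width=600
).mark_bar().encode(
    x="Pclass:N",
    y="count:Q",
    color="Survived:N"
)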

The great thing is that this code and the output look just the same as if you were coming from Pandas.

However, it is still early days for Altair with Arrow-based libraries like Polars, so you might still find the odd bug.

You need Altair v5+ to use Polars. At the time of writing you can install this as a release candidate from PyPI.
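If you are not sure which Altair version is in your environment, a quick check along these lines can help. This is only a sketch: the helper names `major_version` and `altair_accepts_polars` are made up for illustration, and the "major version 5 or later" rule is simply the requirement stated above.

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string):
    """Return the leading integer of a version string such as '5.0.0rc1'."""
    return int(version_string.split(".")[0])


def altair_accepts_polars(version_string):
    # Assumption taken from this post: Polars support starts with Altair 5
    return major_version(version_string) >= 5


try:
    installed = version("altair")
    status = "OK for Polars" if altair_accepts_polars(installed) else "too old for Polars"
    print(f"altair {installed}: {status}")
except PackageNotFoundError:
    print("altair is not installed")
```

Only the major version is compared, so a release candidate such as 5.0.0rc1 counts as v5.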

VegaFusion

The other exciting development around Altair is VegaFusion. VegaFusion can help Altair charts overcome the infamous MaxRowsError by moving work to the server side.

What is server-side rendering?

In a traditional Altair plot you give Altair your rows of data and Altair (via the Vega-Lite library) passes this data to your browser, which uses Vega-Lite to turn the data into HTML objects that can be rendered. However, with this client-side rendering your browser has to deal with more and more data as the number of rows grows.

With VegaFusion the heavy lifting happens server-side. You create your Altair chart as normal, but the data is passed to the VegaFusion engine, which is happier processing large datasets than your browser. The engine evaluates the chart's data transformations and then passes the result to your browser for display.

To use VegaFusion you import it and call vf.enable(), which tells Altair that VegaFusion will handle rendering.

import altair as alt
import vegafusion as vf
vf.enable()

Of course, your browser may still be unhappy if there are many HTML objects to render! For example, if you are making a scatter plot with thousands of points, this will still be a lot of work for your browser (and may crash it). However, if VegaFusion can reduce the data size (e.g. by binning, aggregating or filtering) then you will see a larger benefit.
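To get a feel for why binning helps so much, here is a small plain-Python sketch. It is not how VegaFusion is implemented; the `bin_counts` helper and the made-up ages are assumptions for illustration only. It just compares the number of rows before and after binning, which is roughly the difference between what a browser would receive in each case.

```python
import random
from collections import Counter


def bin_counts(values, bin_width):
    """Count how many values fall into each bin of the given width."""
    counts = Counter()
    for value in values:
        # Left edge of the bin that contains this value
        bin_start = (value // bin_width) * bin_width
        counts[bin_start] += 1
    return dict(sorted(counts.items()))


# Hypothetical data: 100,000 integer ages between 0 and 79
random.seed(0)
ages = [random.randrange(0, 80) for _ in range(100_000)]

binned = bin_counts(ages, bin_width=10)
print(f"raw rows: {len(ages)}, rows after binning: {len(binned)}")
```

A histogram built from the binned counts needs only a handful of rows, whereas a scatter plot of the raw values would still need every one of them.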

For more on visualisation with Polars see these posts:


Next steps

Want to know more about Polars for high performance data science? Then you can:

This post is licensed under CC BY 4.0 by the author.