Altair has been my favourite visualisation library for a long time. It allows me to make beautiful visualisations with an API that is concise and consistent. I was sad to find last year that I couldn’t pass a Polars DataFrame to an Altair chart.
Those days are gone, however. In this post we look at how we can use Altair and Polars together following the release of Altair 5, and how VegaFusion is helping Altair to scale up to larger datasets.
Here’s a simple chart from my course that I made using the Titanic dataset.
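```python
import altair as alt

# df is a Polars DataFrame holding the Titanic dataset
class_survival_counts = (
    df
    # Pass the keys as a list; newer Polars releases rename this method to group_by
    .groupby(['Survived', 'Pclass'])
    .count()
)
alt.Chart(
    class_survival_counts,
    width=600
).mark_bar().encode(
    x="Pclass:N",
    y="count:Q",
    color="Survived:N"
)
```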
The great thing is that this code and the output look just the same as if you were coming from Pandas.
However, it is still early days for Altair with Arrow-based libraries like Polars, so you might still find the odd bug.
You need Altair v5+ to use Polars. At the time of writing you can install this as a release candidate from PyPI.
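If you are not sure which version you have installed, a quick check along these lines can help. This is just a sketch: `major_version` and `altair_supports_polars` are hypothetical helper names of my own, and the check only looks at the leading number of the version string (so a release candidate such as `5.0.0rc1` counts as v5).

```python
from importlib.metadata import PackageNotFoundError, version


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '5.0.0rc1'."""
    return int(version_string.split(".")[0])


def altair_supports_polars() -> bool:
    """True if the installed Altair is v5 or newer, False if older or missing."""
    try:
        return major_version(version("altair")) >= 5
    except PackageNotFoundError:
        # Altair is not installed in this environment
        return False


print(altair_supports_polars())
```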
The other exciting development around Altair is VegaFusion. VegaFusion can help Altair charts to overcome the infamous MaxRowsError with server-side rendering.
In a traditional Altair plot you give Altair your rows of data and Altair (via the Vega-Lite library) passes this data to your browser, which uses Vega-Lite to turn the data into HTML objects that can be rendered. However, with this client-side rendering your browser has to deal with more and more data as the number of rows grows.
With VegaFusion the rendering happens server-side. This means that you create your Altair chart as normal, but then the data is passed to the VegaFusion engine, which is happier processing large datasets than your browser. The VegaFusion engine then passes the rendered HTML to your browser.
To use VegaFusion you import it and run vf.enable(), which tells Altair that VegaFusion will handle rendering.
```python
import altair as alt
import vegafusion as vf

vf.enable()
```
Of course your browser may still be unhappy if there are many HTML objects to render! For example, if you are making a scatter plot with thousands of points this will still be a lot of work for your browser (and may crash it). However, if VegaFusion can reduce the data size (e.g. by binning, aggregating or filtering) then you will see a larger benefit.