Published on: 11th October 2022
Can you use Polars and Apache Arrow to fit ML models?
This post was created while writing my Data Analysis with Polars course. Check it out on Udemy.
Update: The XGBoost developers may withdraw support for fitting models with Arrow; see my discussion with them in this issue. I recommend following their advice to call to_pandas on your Polars DataFrame. I wouldn't lose too much sleep over this: in my current ML pipeline, which runs for about 5 minutes, this adds about 2 seconds to the total time.
Here’s the original blog post:
Polars is backed by Apache Arrow rather than Numpy. One argument you hear against working in Polars is that you’ll have to convert back to Numpy to fit ML models.
Does this argument against using Polars and Apache Arrow libraries hold water?
Nope. It isn't true now, and it will become even less true over time.
Let's take a Polars DataFrame of the Titanic data as an example. We will:
- Do some simple feature engineering
- Pass the features to XGBoost in their Arrow form
- Fit the model
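import polars as pl
import xgboost as xgb

csvPath = "titanic.csv"  # placeholder: path to your copy of the Titanic CSV
df = pl.read_csv(csvPath)

# One-hot encode passenger class and convert the features to an Arrow table
X = (
    df
    .select(["Pclass"])
    .to_dummies()
    .to_arrow()
)
y = df["Survived"]

model = xgb.XGBClassifier(objective="binary:logistic")
model.fit(X, y)

# Append the predicted probability of survival as a new "pos" column
df = pl.concat(
    [
        df,
        pl.DataFrame({"pos": model.predict_proba(X)[:, 1]}),
    ],
    how="horizontal",
)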
No Numpy or Pandas required.
We can do this because XGBoost introduced support for Arrow in recent months. Other ML and feature engineering libraries are working on Arrow support as well.
In addition, if your library does need a Numpy array, it's often quicker to load and pre-process your data in Polars and convert to a Numpy array at the last minute than to do that work in Pandas.
Learn more
Want to know more about Polars for high performance data science and ML? Then you can:
- check out my Polars course on Udemy
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.