This post was created while writing my Data Analysis with Polars course. Check it out on Udemy with a half-price discount.
Working with cloud storage such as AWS S3 can be a pain: code that works beautifully on your machine often needs lots of boilerplate to adapt it to the cloud environment.
One way to make things easier is the smart_open library. It aims to make working with cloud storage feel much like working with local files via Python's built-in open function.
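As a quick illustration of that idea, here is a minimal sketch. The bucket and key below are hypothetical placeholders, and your AWS credentials are assumed to be available in the environment:

import smart_open

# Python's built-in open for a local file
with open("data.csv", "r") as f:
    local_text = f.read()

# smart_open uses the same call pattern for an object on S3
with smart_open.open("s3://my-bucket/data.csv", "r") as f:
    s3_text = f.read()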
In the simple example below I show how to use smart_open inside an AWS Lambda handler to read a Parquet file from S3 almost as if it were a local file.
import polars as pl
import smart_open

def lambda_handler(event, context):
    try:
        # The event payload tells us which S3 object to read
        s3_bucket_name = event["s3_bucket_name"]
        s3_file_name = event["s3_file_name"]
        url = f"s3://{s3_bucket_name}/{s3_file_name}"
        # smart_open streams the S3 object as a file-like handle,
        # so Polars can read it just like a local file
        with smart_open.open(url, "rb") as file:
            df = (
                pl.read_parquet(file, columns=["id1", "v1"])
                .group_by("id1")
                .agg(pl.col("v1").mean())
            )
    except Exception as err:
        # Log the error and re-raise so Lambda reports the failure
        # (returning df here would fail as it was never assigned)
        print(err)
        raise
    return df.write_json()
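For local testing, you could invoke the handler directly with a sample event. The bucket and file names here are hypothetical placeholders:

event = {
    "s3_bucket_name": "my-data-bucket",
    "s3_file_name": "measurements.parquet",
}
print(lambda_handler(event, None))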
This is just the start of what Polars can do in serverless environments. Get in touch if you’d like to discuss using Polars to reduce your cloud compute spend.
Learn more
Want to know more about Polars for high performance data science and ML? Then you can:
- get in touch to discuss your data processing challenges
- join my Polars course on Udemy
- follow me on Twitter
- connect with me on LinkedIn
- check out my YouTube videos
or let me know if you would like a Polars workshop for your organisation.