
AWS Lambda with Polars

This post was created while writing my Data Analysis with Polars course. Check it out on Udemy at a half-price discount.

Working with cloud storage such as AWS S3 can be a pain: code that works beautifully on your machine often needs lots of boilerplate to adapt to the cloud environment.

One way to make things easier is the smart_open library. It lets you work with cloud storage in much the same way as local files, through Python's familiar open interface.
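As a quick sketch of the idea: smart_open.open accepts a local path exactly like the built-in open, and the very same call accepts a cloud URL such as "s3://bucket/key" (the file name below is made up for illustration):

```python
import smart_open

# Write a local file with the built-in open
with open("example.txt", "w") as f:
    f.write("hello\n")

# Read it back through smart_open — swap the path for an
# "s3://bucket/key" URL and the code stays the same
with smart_open.open("example.txt", "r") as f:
    line = f.readline()
```

For S3 URLs, smart_open uses boto3 credentials under the hood, so the usual AWS credential chain applies.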

In the simple example below I show how to use smart_open in an AWS Lambda handler to read a Parquet file from S3 almost as if it were a local file.

import polars as pl
import smart_open

def lambda_handler(event, context):
    try:
        s3_bucket_name = event["s3_bucket_name"]
        s3_file_name = event["s3_file_name"]
        url = f"s3://{s3_bucket_name}/{s3_file_name}"

        # Stream the Parquet file from S3 as if it were a local file
        with smart_open.open(url, "rb") as file:
            df = (
                pl.read_parquet(file, columns=["id1", "v1"])
                .group_by("id1")
                .agg(pl.col("v1").mean())
            )
    except Exception as err:
        # Log and re-raise so Lambda records the invocation as failed
        print(err)
        raise

    return df.write_json()

This is just the start of what Polars can do in serverless environments. Get in touch if you’d like to discuss using Polars to reduce your cloud compute spend.

Learn more

Want to know more about Polars for high performance data science and ML? Check out my Data Analysis with Polars course on Udemy, or let me know if you would like a Polars workshop for your organisation.

This post is licensed under CC BY 4.0 by the author.