Lightnews — Scholar-powered news

Polars

@pola.rs

Are you looking to get started with Polars over the summer?

We've partnered with @datacamp.bsky.social to create an interactive course that covers the fundamentals so you can write your next query with Polars.

The course is free till the end of August: www.datacamp.com/courses/intr...

August 13, 2025 at 3:11 PM

Polars

@pola.rs

We've partnered with @datacamp.bsky.social to create an interactive Polars course.

Learn the fundamentals and get familiar with our API through hands-on exercises. The course is available for everyone and free until the end of August.

Start the free course here: www.datacamp.com/courses/intr...

May 28, 2025 at 1:24 PM

Polars

@pola.rs

Polars has gotten 4x faster than Polars! 🚀

Over the last few months, the team has worked incredibly hard on the new streaming engine, and the work has paid off. It is incredibly fast, and beats the Polars in-memory engine by a factor of 4 on a 96 vCPU machine.

May 1, 2025 at 2:05 PM

Polars

@pola.rs

Polars provides a number of xxx_horizontal operations.

These expressions perform computations across columns. (Or along rows, depending on how you look at it.)

If your horizontal operation isn’t implemented, you can use the general-purpose fold.

Diagram showing examples of applying horizontal operations to a dataframe with random numerical values. The example shows the expressions max_horizontal, sum_horizontal, mean_horizontal, and cum_sum_horizontal. The first three produce a numerical column and the expression cum_sum_horizontal produces a struct column with as many fields as there are input columns/series.

April 30, 2025 at 2:43 PM

Polars

@pola.rs

Polars provides 3 functions you can use to generate temporal ranges:

date_range, datetime_range, and time_range.

These can be executed eagerly or lazily.

You can also customize the interval between consecutive values and whether the start/end points are included.

Diagram showing how the functions date_range, datetime_range, and time_range, can be used to generate series of consecutive temporal values. The interval between consecutive values can be changed with the parameter interval and the two endpoints may or may not be included in the generated range, depending on the value the parameter closed is set to.

March 19, 2025 at 12:52 PM

Polars

@pola.rs

It does:

March 11, 2025 at 2:15 PM

Polars

@pola.rs

The expression over can be used to compute expressions within isolated groups.

This means you can do computations per group without having to group first and then explode after.

In this example, we rank swimmers based on their time, but within their race type.

Diagram showing how the use of the expression over impacts the result of an expression. Using rank on a dataframe will rank all swimmers by their time, but that means that the fastest swimmer of the second race type is actually ranked 3rd. By using over, the fastest swimmer of each race type gets the rank 1.

February 27, 2025 at 3:11 PM

Polars

@pola.rs

The context filter lets you keep only the rows of a dataframe that satisfy some conditions.

Within an aggregation, you can also use filter to filter values from aggregated groups.

In this example we ignore unverified times when computing the current record.

Diagram showing how filter works inside aggregations. We use an aggregation to compute the current record for different swimming race types and we use a filter to ignore unverified times, while also using information about unverified times to compute another column.

February 20, 2025 at 5:22 PM

Polars

@pola.rs

The expression `clip` is pretty straightforward:

You provide a lower and an upper bound, and Polars makes sure all values fall within those bounds.

If a value is too small/too large, it's replaced by the bound.

Bounds can be literals, other columns, or arbitrary expressions.

Diagram showing how the expression clip works. The diagram shows `clip` used with integers and with `datetime` objects. The diagram also shows the usage of literals, other columns, and arbitrary expressions, as the bounds.

February 3, 2025 at 3:46 PM

Polars

@pola.rs

Join our webinar with NVIDIA on January 28 for an in-depth session on how the GPU engine works, from collecting your query to parallel execution on the GPU.

Sign up at info.nvidia.com/nvidia-polar...

See you there?

January 7, 2025 at 3:18 PM

Polars

@pola.rs

Polars 1.19 comes with support for arbitrary predicates in join_where.

This means that inequality joins are now more flexible than ever!

Here is a small example of something you couldn't do before:

January 6, 2025 at 11:58 AM

Polars

@pola.rs

Polars supports dynamic aggregations based on time windows via the function `group_by_dynamic`.

To use it, you specify a date(time) column to group by, and then determine the windows over which values are aggregated.

Note how data points can fall within multiple windows 👇

Diagram showing how the window boundaries of 10-year long windows align neatly with the decades and the decade halfpoints, 1980, 1985, 1990, etc, although the first datapoint is from 1981.

December 12, 2024 at 7:36 PM

Polars

@pola.rs

Can't remember how many days each month has?
(Me neither!)

Memorise this Polars snippet instead.

Using some calendar-aware functions, we can get the answer in a tidy dataframe, as the diagram below shows.

December 10, 2024 at 3:32 PM

Polars

@pola.rs

You want to join two tables on their ID column, but only when the dates in one table fall within the range of the other table.

Polars lets you do that with `join_where`, which supports inequality joins through the use of inequality predicates.

Here's an example 👇

Diagram exemplifying how `join_where` performs inequality joins.

We're joining two dataframes that have 3 rows each and each row has a corresponding row on the other dataframe because they have matching IDs.

The dataframe on the right has a column `dt` with date values, which fall within the range created by the columns `start` and `end` of the other dataframe, except for the third row, in which `dt` falls outside the range.

When using `join_where` with the predicate `pl.col("dt").is_between("start", "end")`, the third row of each dataframe will not be matched and won't show up in the join result, which only joined the first 2 rows of each dataframe.

November 28, 2024 at 10:52 AM

Polars

@pola.rs

Polars has essentially 18 different data types.

If you are unsure what each type is, the conversion table below might help you.

Each Polars data type is presented next to the **most similar** Python type.

November 22, 2024 at 10:47 AM

Polars

@pola.rs

How to “expand” ranges like "3-5" across new rows with the values 3, 4, 5?

This comes straight from our Discord server (discord.com/invite/4UfP5...)

A diagram shows how to use str.split, list.first / list.last, cast, pl.int_ranges, and explode, all together, to turn a dataframe where a column may contain ranges like "3-5" into a similar dataframe where all ranges have been expanded, or exploded, across multiple rows.

The full code is:

range_start = pl.col("nrs").str.split("-").list.first().cast(pl.Int64)
range_end = pl.col("nrs").str.split("-").list.last().cast(pl.Int64)
df.with_columns(pl.int_ranges(range_start, range_end + 1)).explode("nrs")

November 21, 2024 at 2:36 PM

Polars

@pola.rs

Why is there a `struct` data type?

A single expression produces a single column, so expressions like `value_counts` need to output structs to map the values to their counts.

With that said, do you understand why `.struct.unnest` doesn't break the 1 expr = 1 column principle?

Diagram showing how `value_counts` produces a column with struct values, mapping column values to their counts.
We then show how to use `.struct.field` to extract a single field from the struct and how to use `.struct.unnest` to extract all fields into corresponding columns.

November 20, 2024 at 11:14 AM
