Dmitriy Ryaboy

@squarecog.bsky.social

2.8K followers 230 following 170 posts

Works with data, runs with swords.

Dmitriy Ryaboy

@squarecog.bsky.social

Right, it's all about the ecosystem. Writers are always going to be more conservative than readers, rightfully so. This f3 idea is essentially about letting writers adopt new stuff without worrying too much about older gen readers (once older gen can read this sort of thing, another decade later).

October 8, 2025 at 9:42 PM

Dmitriy Ryaboy

@squarecog.bsky.social

bsky.app/profile/andy...

Andy Pavlo @andypavlo.bsky.social · Oct 1

One problem with Parquet is many implementations are not updated when the official spec improves. Everyone just uses the lowest version feature set. That means if Parquet adds a better data encoding scheme and a file uses it, many common reader libraries won't be able to retrieve the data.

Survey of the features used in public Parquet files.

October 8, 2025 at 4:15 PM

Dmitriy Ryaboy

@squarecog.bsky.social

But anyway the point is not whether rle is useful, but if there is a world where parquet format improvements introduced since like 2018 get adopted, and more useful encodings can be propagated.

October 8, 2025 at 3:42 PM

Dmitriy Ryaboy

@squarecog.bsky.social

RLE+delta allows filter pushdowns to work without decompressing. If you have repeated strings and sort, dict encode, and rle+delta, even regex searches become blazing fast. Parquet enables this, but who implements it?

October 8, 2025 at 3:40 PM
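
A rough sketch of the idea in the post above, in plain Python rather than any real Parquet reader: on a run-length-encoded column, a filter can be evaluated once per run instead of once per row, so matching rows are found without expanding the data. This shows only the RLE part (no delta or dictionary encoding), and the helper names `rle_encode`, `count_matching`, and `matching_row_ranges` are made up for illustration.

```python
import re
from itertools import groupby


def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


def count_matching(runs, predicate):
    """Count matching rows by testing each run's value once."""
    return sum(length for value, length in runs if predicate(value))


def matching_row_ranges(runs, predicate):
    """Return half-open [start, end) row ranges whose value matches."""
    ranges = []
    start = 0
    for value, length in runs:
        if predicate(value):
            ranges.append((start, start + length))
        start += length
    return ranges


# A sorted column with repeated strings compresses into very few runs.
column = ["apple"] * 3 + ["banana"] * 2 + ["cherry"]
runs = rle_encode(column)

# A regex filter is evaluated three times (once per run), not six times.
pattern = re.compile(r"an")
hits = matching_row_ranges(runs, lambda v: pattern.search(v) is not None)
```

The payoff grows with run length: sorting first makes runs long, so the number of predicate evaluations tracks the number of distinct values rather than the number of rows.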

Dmitriy Ryaboy

@squarecog.bsky.social

To be fair, you would not require it. An implementor would only do this if they want to future proof, and are ok with the whole executable data file thing. Otherwise, same as now: implement the reader for every encoding.
It's painful how little even basic RLE is being used in the wild :(

October 7, 2025 at 8:32 PM

Dmitriy Ryaboy

@squarecog.bsky.social

I had the same 2 thoughts in the same sequence :)

October 1, 2025 at 11:52 PM

Dmitriy Ryaboy

@squarecog.bsky.social

Do you think there's anything blocking parquet from adopting the same wasm reader approach to unlock new encodings and other schemes?

October 1, 2025 at 11:24 PM

Dmitriy Ryaboy

@squarecog.bsky.social

The original Yo app.

September 26, 2025 at 11:01 AM

Dmitriy Ryaboy

@squarecog.bsky.social

Ask Ketan, I've been trying to find a good excuse to get my teams to use Flyte for half a decade now 😆

September 20, 2025 at 6:24 PM

Dmitriy Ryaboy

@squarecog.bsky.social

Thanks for the reference, hadn't seen that!
Are these all one-shotting or doing an agentic workflow to explore before formulating a final answer?

September 17, 2025 at 3:07 PM

Dmitriy Ryaboy

@squarecog.bsky.social

Also, everyone trips at least once on average of ratios vs ratio of sums (which becomes obvious once you describe them as unweighted vs weighted means).

September 11, 2025 at 8:52 PM
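
A small worked example of the distinction in the post above, using made-up numbers: the average of ratios treats every group equally, while the ratio of sums is the same ratios averaged with each group weighted by its denominator. The function names and the sample data are illustrative only.

```python
def mean_of_ratios(pairs):
    """Unweighted mean: every (numerator, denominator) pair counts equally."""
    return sum(n / d for n, d in pairs) / len(pairs)


def ratio_of_sums(pairs):
    """Total numerator divided by total denominator."""
    return sum(n for n, _ in pairs) / sum(d for _, d in pairs)


def weighted_mean_of_ratios(pairs):
    """Mean of ratios, weighting each ratio by its share of the denominator."""
    total = sum(d for _, d in pairs)
    return sum((n / d) * (d / total) for n, d in pairs)


# Hypothetical data: (conversions, visits) for a tiny group and a large one.
pairs = [(1, 2), (90, 100)]

unweighted = mean_of_ratios(pairs)        # (0.5 + 0.9) / 2
pooled = ratio_of_sums(pairs)             # 91 / 102
weighted = weighted_mean_of_ratios(pairs)  # same as pooled, up to rounding
```

Here the tiny group drags the unweighted figure down to 0.7, while the pooled figure stays near 0.89 because the large group dominates.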

Dmitriy Ryaboy

@squarecog.bsky.social

It shows up in different ways in different places. The most basic being, you don't know if the ratio moved because the numerator went up or the denominator went down. Correct course of action is often different depending on which!

September 11, 2025 at 8:47 PM
