Dmitriy Ryaboy
@squarecog.bsky.social
Works with data, runs with swords.
Right, it's all about the ecosystem. Writers are always going to be more conservative than readers, rightfully so. This f3 idea is essentially about letting writers adopt new stuff without worrying too much about older gen readers (once older gen can read this sort of thing, another decade later).
October 8, 2025 at 9:42 PM
One problem with Parquet is many implementations are not updated when the official spec improves. Everyone just uses the lowest version feature set. That means if Parquet adds a better data encoding scheme and a file uses it, many common reader libraries won't be able to retrieve the data.
October 8, 2025 at 4:15 PM
But anyway the point is not whether rle is useful, but if there is a world where parquet format improvements introduced since like 2018 get adopted, and more useful encodings can be propagated.
October 8, 2025 at 3:42 PM
RLE+delta allows filter pushdowns to work without decompressing. If you have repeated strings and sort, dict encode, and rle+delta, even regex searches become blazing fast. Parquet enables this, but who implements it?
October 8, 2025 at 3:40 PM
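A rough sketch of why run-length encoding helps with filters, assuming a sorted string column held in memory. This is plain RLE only (no delta, no dictionary step) and not Parquet's actual on-disk encoding; `rle_encode` and `count_matches` are made-up helper names for illustration.

```python
import re
from itertools import groupby

def rle_encode(values):
    """Collapse consecutive equal values into (value, run_length) pairs."""
    return [(value, sum(1 for _ in group)) for value, group in groupby(values)]

def count_matches(runs, predicate):
    """Count matching rows by testing the predicate once per run, not once per row."""
    return sum(length for value, length in runs if predicate(value))

# Sorting first puts repeated strings next to each other, so runs get long.
column = sorted(["us", "de", "us", "fr", "us", "de", "us"])
runs = rle_encode(column)

# The regex is evaluated once per run (3 times here) instead of once per row (7 times).
pattern = re.compile(r"^(us|fr)$")
matches = count_matches(runs, lambda v: pattern.search(v) is not None)
```

The filter never expands the runs back into rows, which is the part that gets cheaper as runs get longer.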
To be fair, you would not require it. An implementor would only do this if they want to future proof, and are ok with the whole executable data file thing. Otherwise, same as now: implement the reader for every encoding.
It's painful how little even basic RLE is being used in the wild :(
October 7, 2025 at 8:32 PM
I had the same 2 thoughts in the same sequence :)
October 1, 2025 at 11:52 PM
Do you think there's anything blocking parquet from adopting the same wasm reader approach to unlock new encodings and other schemes?
October 1, 2025 at 11:24 PM
The original Yo app.
September 26, 2025 at 11:01 AM
Ask Ketan, I've been trying to find a good excuse to get my teams to use Flyte for half a decade now 😆
September 20, 2025 at 6:24 PM
Thanks for the reference, hadn't seen that!
Are these all one-shotting or doing an agentic workflow to explore before formulating final answer?
September 17, 2025 at 3:07 PM
Also, everyone trips at least once on average of ratios vs ratio of sums (which becomes obvious once you describe them as unweighted vs weighted means).
September 11, 2025 at 8:52 PM
It shows up in different ways in different places. The most basic being, you don't know if the ratio moved because the numerator went up or the denominator went down. The correct course of action is often different depending on which!
September 11, 2025 at 8:47 PM
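A small worked example of both ratio traps, with made-up numbers. The (numerator, denominator) pairs are hypothetical, think clicks and impressions per segment.

```python
# Hypothetical (numerator, denominator) pairs per segment.
segments = [(1, 10), (50, 100)]

ratios = [n / d for n, d in segments]

# Average of ratios: every segment counts equally (unweighted mean).
average_of_ratios = sum(ratios) / len(ratios)

# Ratio of sums: segments count in proportion to their denominator (weighted mean).
ratio_of_sums = sum(n for n, _ in segments) / sum(d for _, d in segments)

# Same thing written explicitly as a denominator-weighted mean of the ratios.
weighted_mean = sum(r * d for r, (_, d) in zip(ratios, segments)) / sum(
    d for _, d in segments
)

# The other trap: the ratio alone can't tell these two stories apart.
baseline = 10 / 100
numerator_doubled = 20 / 100   # numerator went up
denominator_halved = 10 / 50   # denominator went down

print(average_of_ratios, ratio_of_sums, weighted_mean)
print(baseline, numerator_doubled, denominator_halved)
```

The small segment drags the unweighted mean down much more than it moves the ratio of sums, and the last two ratios come out the same even though very different things happened.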