Dmitriy Ryaboy
squarecog.bsky.social
Works with data, runs with swords.
This is a pretty intriguing idea for future-proofing file formats.
It does assume WASM is future-proof, of course, but that feels like a safer bet than "assume readers are updated."
Our F3 files embed small WASM programs to decode data. If somebody creates a new encoding and the DBMS does not have native impl, it can still read data using WASM passing Arrow buffers. Our experiments show WASM is 15-20% slower than native. We use @spiraldb.com's Vortex encoding impls.
October 1, 2025 at 3:23 PM
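A minimal sketch of the fallback the quoted post describes, assuming a hypothetical layout in which each column chunk records an encoding name and may carry its own decoder. The embedded decoder is modeled as a plain Python function (a toy run-length decoder), not WASM bytecode run in a sandbox, and it returns a Python list rather than Arrow buffers. Only the "native if available, else embedded" dispatch is shown. `NATIVE_DECODERS`, `decode_chunk` and `rle_decode` are illustrative names, not F3's or Vortex's API.

```python
from typing import Callable, Dict, List, Optional

# A decoder turns an encoded byte payload into values. In F3 the embedded
# decoder would be WASM run in a sandboxed runtime; here it is a plain
# Python function so the dispatch logic runs on its own (assumption).
Decoder = Callable[[bytes], List[int]]


def rle_decode(payload: bytes) -> List[int]:
    # Toy run-length decoder over (value, run_length) byte pairs.
    out: List[int] = []
    for i in range(0, len(payload) - 1, 2):
        value, run = payload[i], payload[i + 1]
        out.extend([value] * run)
    return out


# Encodings this hypothetical reader implements natively.
NATIVE_DECODERS: Dict[str, Decoder] = {
    "plain": lambda payload: list(payload),  # one byte per value
}


def decode_chunk(encoding: str, payload: bytes,
                 embedded: Optional[Decoder] = None) -> List[int]:
    native = NATIVE_DECODERS.get(encoding)
    if native is not None:
        return native(payload)    # fast path: reader knows this encoding
    if embedded is not None:
        return embedded(payload)  # fallback: decoder shipped inside the file
    raise ValueError(f"no decoder available for encoding {encoding!r}")


# A reader that has never heard of "rle" can still decode it.
print(decode_chunk("rle", b"\x05\x03\x07\x01", embedded=rle_decode))  # [5, 5, 5, 7]
print(decode_chunk("plain", b"\x01\x02"))                             # [1, 2]
```

The native path is tried first because, per the quoted post, WASM decoding ran 15-20% slower than native in their experiments; the embedded decoder only buys readability, not speed.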
If you love this sort of thing, read up on C-Store, which introduced this idea in 2005 and commercialized it in Vertica. Stonebraker, Sam Madden, Daniel Abadi.
Parquet was also partially inspired by Vertica (and Google's Dremel, and PAX by Natassa Ailamaki et al) :-).
DuckDB @duckdb.org · Sep 29
Are you streaming into your Lakehouse?

Traditional formats suffered with the “many small files” problem — OLAP engines merge them reactively with long jobs. ⏳

DuckLake takes a proactive path: Data Inlining + async flush to parquet while always keeping data queryable ⚡
September 30, 2025 at 3:04 PM
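A toy model of the inlining idea in the DuckDB post, assuming a single table, a row-count flush threshold, and in-memory lists standing in for both the catalog and the Parquet files. `InlinedTable` and `flush_threshold` are hypothetical names, not DuckLake's API or settings. The sketch ignores concurrency: a real system has to make "write the file, drop the inlined rows" one atomic catalog update so readers never see rows twice or not at all.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


class InlinedTable:
    """Small writes land in the catalog, get flushed to a file in bulk,
    and reads always see both (toy model, not DuckLake's implementation)."""

    def __init__(self, flush_threshold: int = 3) -> None:
        self.flush_threshold = flush_threshold  # hypothetical tuning knob
        self.inlined: List[Row] = []            # rows stored in the catalog DB
        self.files: List[List[Row]] = []        # stand-ins for Parquet files

    def insert(self, rows: List[Row]) -> None:
        # Cheap: no new file per insert, so no "many small files".
        self.inlined.extend(rows)

    def flush(self) -> None:
        # Would run asynchronously in the background; called explicitly here.
        if len(self.inlined) >= self.flush_threshold:
            self.files.append(list(self.inlined))  # one larger file
            self.inlined.clear()

    def scan(self) -> List[Row]:
        # Queries union file contents with still-inlined rows, so data is
        # queryable both before and after a flush.
        return [r for f in self.files for r in f] + list(self.inlined)


t = InlinedTable(flush_threshold=3)
t.insert([{"id": 1}])
t.insert([{"id": 2}])
t.flush()  # below threshold: nothing written yet
t.insert([{"id": 3}])
t.flush()  # threshold reached: one file holding 3 rows
print(len(t.files), len(t.inlined), [r["id"] for r in t.scan()])  # 1 0 [1, 2, 3]
```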
ML is just applied stats.
Stats is just applied algebra.
LLM is just ML backward and with an extra L.
September 22, 2025 at 10:50 PM
The obvious reaction here is to shift at least some of the hiring out of the country to get access to the talent. The obvious counter reaction is to tax payments and wages to foreign employees and contractors. Which will also provoke a reaction. And none of this makes the US stronger or smarter.
September 20, 2025 at 9:17 PM
About a decade late with this, but:
Someone should have started a social media ad agency called Twaddle.
September 20, 2025 at 6:25 PM
It's tempting to take shortcuts that give you speed today by mortgaging speed tomorrow.

Trouble is, today is yesterday's tomorrow.
September 18, 2025 at 11:07 PM
I tried 2 different English-to-insights SQL LLM agents from reputable vendors in the past week. Data analyst jobs remain safe.
Firmly in the toy category for now.
September 17, 2025 at 12:25 AM
Happened to be by the Cloudera building in the South Bay earlier. Checked LinkedIn and discovered I have literally 0 1st-degree connections who work there now. Not unexpected, I guess, but, man... between hnwx and cldr I used to know like 100s of folks there
September 16, 2025 at 5:14 AM
Heck of a vote of confidence from LinkedIn for Flyte: www.linkedin.com/blog/enginee...
OpenConnect: LinkedIn's Next-Generation AI Pipeline Ecosystem
September 15, 2025 at 3:02 AM
About once every two years I have cause to re-learn a very important data lesson: never, ever, trust analysis based on ratio metrics.
September 10, 2025 at 9:09 PM
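The post doesn't say which ratio-metric trap it was this time. One classic example is Simpson's paradox: a variant can have the better rate in every segment and still lose on the pooled rate, because the segments carry different weights in each variant. A minimal demonstration with illustrative (successes, trials) counts, not data from the post:

```python
from fractions import Fraction

# Illustrative (successes, trials) per segment for two variants.
data = {
    "A": {"small": (81, 87), "large": (192, 263)},
    "B": {"small": (234, 270), "large": (55, 80)},
}


def rate(successes: int, trials: int) -> Fraction:
    # Exact fractions avoid any float rounding in the comparisons.
    return Fraction(successes, trials)


def overall(variant: str) -> Fraction:
    # Ratio of sums: pool numerators and denominators across segments.
    segs = data[variant].values()
    return rate(sum(s for s, _ in segs), sum(t for _, t in segs))


for seg in ("small", "large"):
    a, b = rate(*data["A"][seg]), rate(*data["B"][seg])
    print(f"{seg}: A={float(a):.3f} B={float(b):.3f}")
print(f"overall: A={float(overall('A')):.3f} B={float(overall('B')):.3f}")
# A is higher in each segment, yet B is higher overall.
```

A wins both segments but loses overall because most of A's trials sit in the lower-rate "large" segment. Looking at numerators and denominators separately, not just the ratio, is what exposes it.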
Trying and failing to make the page edits look right? Is offline access lackluster? Tired of AI upsell as a replacement for poor search quality?

You might be suffering from Notion sickness.
August 30, 2025 at 4:13 AM
Phone book, noun: an ebook you read on your phone.
August 26, 2025 at 1:11 AM
Looking up Latin phrases on Google results in an AI response in French a good % of the time. Fortunately, my French is slightly better than my Latin.
August 14, 2025 at 3:15 PM
The scariest thing about dinosaurs is that they were huge, absolutely dominant, *and humans had nothing to do with them dying out*
August 13, 2025 at 1:03 AM
Andor is a very boolean show.
July 19, 2025 at 5:12 AM
Just 45 minutes north of SF, and AI means something completely different on a dairy farm in Sonoma, in the context of breeding livestock. I seriously wasn't tracking for a few minutes there.
July 10, 2025 at 12:29 AM
A Random Walk In Question Space:

A particularly ineffective, but popular, investigation method commonly found when the investigator does not have direct access to the data, and does not directly incur the cost of asking for irrelevant analysis that does not get them any closer to useful answers.
June 10, 2025 at 4:11 PM
Still thinking about it. Serious "retired samurai" energy.

Man's just standing there, pouring beers, drying glasses, listening to young coders in this city of startups go through their ups and downs. Dispensing wisdom about code and life.
Just saw a LinkedIn profile that is a real career to aspire to:
* CS Professor
* Google Director of Eng
* Distinguished Eng at Microsoft
* Brewer and bar owner.
June 5, 2025 at 3:55 PM
A friend once said that experienced software folks get a weird superpower: they can diagnose and even predict bugs / seams in systems they've never seen the code for, just from knowing how stuff is put together.
It's a thrill every time one pulls this off successfully :).
June 4, 2025 at 7:54 PM
Just saw a LinkedIn profile that is a real career to aspire to:
* CS Professor
* Google Director of Eng
* Distinguished Eng at Microsoft
* Brewer and bar owner.
June 4, 2025 at 7:42 PM
DuckLake makes tons of sense, as does everything DuckDB does. How long till they implement CMETA-style optimizations right in the query planner? vldb.org/pvldb/vol14/...
June 3, 2025 at 12:35 AM
I am just a boy
Standing in front of an agent
Asking it to try again and think harder.
May 29, 2025 at 10:59 PM
Sure, an English-to-SQL agent on your DB won't give you an answer as accurate as a data analyst would, but it will give you incorrect answers so much faster!
May 29, 2025 at 3:51 PM
Seeing com.google.hadoop is still a mind bender.
Don't think any of us saw this coming back in the day... or at least I didn't.
May 28, 2025 at 9:01 PM
Surprising how much people talk about Iceberg providing "separation of data and compute" given that it, you know, doesn't.
May 28, 2025 at 6:10 PM