Mike Driscoll
@medriscoll.com
medriscoll.com
3.7K followers 270 following 160 posts
Founder @ RillData.com, building GenBI. Lover of fast, flexible, beautiful data tools. Lapsed computational biologist.
Reposted by Mike Driscoll
In data analytics, we're facing a paradox. AI agents can theoretically analyze anything, but without the right foundations, they're as likely to hallucinate a metric as to calculate it correctly. They can write SQL in seconds, but will it answer the right business question?
Data Modeling for the Agentic Era: Semantics, Speed, and Stewardship
Master the three pillars of agentic data modeling: Metrics SQL for semantics, sub-second analytics for speed, and AI guardrails for trusted insights.
www.ssp.sh
DuckLake is a simpler, SQL-friendlier alternative to Iceberg.

"There are no Avro or JSON files. There is no additional catalog server or additional API to integrate with. It's all just SQL."

That said, choose your catalog database -- a single point of failure -- *very carefully*.
duckdb.org DuckDB @duckdb.org · May 27
Today we're launching DuckLake, an integrated data lake and catalog format powered by SQL. DuckLake unlocks next-generation data warehousing where compute is local, consistency is central, and storage scales to infinity. DuckLake is an open standard, and we implemented it in the "ducklake" extension.
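If you want to kick the tires, here's a minimal sketch of what "it's all just SQL" looks like, using DuckDB's Python API -- the table name and catalog path are mine, and this assumes the ducklake extension as described in the launch post:

```python
import duckdb

con = duckdb.connect()
con.execute("INSTALL ducklake")
con.execute("LOAD ducklake")

# The catalog is just a database -- here a local DuckDB file, but the catalog
# can also live in SQLite or Postgres. Choose it carefully: it's your SPOF.
con.execute("ATTACH 'ducklake:my_catalog.ducklake' AS lake")

# From here on out it's ordinary SQL: no Avro, no JSON, no catalog server.
con.execute("CREATE TABLE lake.events (id INTEGER, payload VARCHAR)")
con.execute("INSERT INTO lake.events VALUES (1, 'hello ducklake')")
print(con.execute("SELECT * FROM lake.events").fetchall())
```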
Reposted by Mike Driscoll
And it's only fitting that we'll be hosting this event at a true lakehouse: the Lake Chalet, the best waterfront restaurant on Lake Merritt, steps from the Data Council main event.

RSVP here while tickets last:

www.rilldata.com/events/data-...
Real-time Roundtable Live from Data Council | Rill Data
Register now for Real-time Roundtable Live from Data Council.
www.rilldata.com
Similarly, Toby and his team at Tobiko Data have built a powerful yet elegant transformation platform -- combining SQLMesh with SQL dialect transpilation (SQLGlot) to allow portability of pipelines between databases, warehouses, and lakehouses.

techcrunch.com/2024/06/05/w...
With $21.8M in funding, Tobiko aims to build a modern data platform | TechCrunch
Tobiko aims to reimagine how teams work with data by offering a dbt-compatible data transformation platform.
techcrunch.com
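The transpilation piece is easy to try on its own. A minimal sketch with the sqlglot Python package, rewriting a DuckDB-dialect query into Spark SQL (the query itself is just a toy example):

```python
import sqlglot

# DuckDB's EPOCH_MS() has no direct Spark equivalent; SQLGlot rewrites it
# into the target dialect's syntax.
sql = "SELECT EPOCH_MS(1618088028295) AS ts"
print(sqlglot.transpile(sql, read="duckdb", write="spark")[0])
```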
The sub-second speed-at-scale of these real-time engines enables new kinds of applications: point-of-sale fraud detection, IoT monitoring, real-time context for AI agents -- use cases that just aren't supported by traditional data warehouses like Snowflake.

www.rilldata.com/blog/scaling...
Rill | Scaling Beyond Postgres: How to Choose a Real-Time Analytical Database
This blog explores how real-time databases address critical analytical requirements. We highlight the differences between cloud data warehouses like Snowflake and BigQuery, legacy OLAP databases like ...
www.rilldata.com
Why am I so excited to bring this crew together on stage? It's because real-time analytical databases like ClickHouse, Apache Pinot, and MotherDuck / DuckDB are reshaping data stacks at the fastest-moving engineering teams on earth -- OpenAI, DoorDash, and @stackblitz.com.
This legendary panel of technical founders includes Yury Izrailevsky (co-founder of ClickHouse), Kishore Gopalakrishna (founder of StarTree, creator of Apache Pinot), @jrdntgn.bsky.social (co-founder of MotherDuck), and @captaintobs.bsky.social (founder of Tobiko, creators of SQLMesh and SQLGlot).
Yo SF Bay Area #databs crew, want to talk lakehouses at a real Lake House? :)

Next week after Data Council, join the founders of @clickhouse.com, @motherduck.com, @startreedata.bsky.social, and @tobikodata.com to talk real-time databases and next-generation ETL.

www.rilldata.com/events/data-...
At @rilldata.com, we've shifted metrics layers left out of BI tools and into real-time analytical databases like ClickHouse and DuckDB -- to power insanely fast exploratory dashboards. (I'll be discussing this at my Data Council talk in two weeks.)

docs.google.com/presentation...
DuckCon 6 - A SQL-Based Metrics Layer Powered by DuckDB
INTRODUCING A SQL-BASED METRICS LAYER POWERED BY DUCKDB Mike Driscoll Co-Founder, CEO at Rill Data
docs.google.com
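To be clear, this isn't Rill's implementation -- just a minimal sketch of the idea, with a toy orders table in DuckDB: measures and dimensions get defined once as SQL inside the engine, and every dashboard query hits the same definitions.

```python
import duckdb

con = duckdb.connect()

# A toy source table standing in for real ingested data.
con.execute("""
    CREATE TABLE orders AS
    SELECT * FROM (VALUES
        (DATE '2024-01-01', 'US', 120.0),
        (DATE '2024-01-01', 'EU',  80.0),
        (DATE '2024-01-02', 'US', 200.0)
    ) AS t(order_date, region, revenue)
""")

# The "metrics layer": metric definitions living inside the analytical
# database as plain SQL, rather than inside the BI tool.
con.execute("""
    CREATE VIEW order_metrics AS
    SELECT order_date, region,
           SUM(revenue) AS total_revenue,
           COUNT(*)     AS order_count
    FROM orders
    GROUP BY ALL
""")

print(con.execute("SELECT * FROM order_metrics WHERE region = 'US'").fetchall())
```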
Now that SQL-on-data-lake frameworks are maturing (DuckDB SQL on Iceberg, Spark SQL on Delta Lake), and transpiling between SQL dialects is practical (thanks to SQLMesh and @tobikodata.com), these SQL transformations can be shifted left, out of the warehouse and onto object storage.
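Here's a taste of the "SQL on object storage" half -- a sketch using DuckDB's iceberg and httpfs extensions to run a transformation directly against an Iceberg table on S3. The bucket and table path are hypothetical, and credential setup is omitted:

```python
import duckdb

con = duckdb.connect()
for ext in ("iceberg", "httpfs"):
    con.execute(f"INSTALL {ext}")
    con.execute(f"LOAD {ext}")

# The transformation runs against files on object storage -- no warehouse
# compute, no Snowflake tax. (S3 credential configuration omitted.)
rows = con.execute("""
    SELECT region, SUM(revenue) AS total_revenue
    FROM iceberg_scan('s3://example-bucket/warehouse/orders')
    GROUP BY region
""").fetchall()
print(rows)
```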
The advantage is that transformations can be written in SQL. The disadvantage is that you pay the Snowflake tax for every compute cycle in their warehouse.
Transformation logic is another use case.

"Shifting left" is in some ways a reaction to the "ELT" pattern (or anti-pattern, in my opinion) that big data warehouses like Snowflake were pushing -- whereby you extract, load, and only *then* transform data in the warehouse.
Data validation is a great example: an eCommerce platform might validate that order prices contain no negative numbers only after the data is loaded into the database. "Shifting left" means moving that validation to the ingestion or even the collection step in the pipeline, before it hits the database.
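In code the shift is small but meaningful. A minimal sketch, with a hypothetical validate/ingest pair -- the bad record is rejected at ingestion instead of being discovered in the database later:

```python
def validate_order(order: dict) -> dict:
    # Shift-left check: reject bad records before they reach the database.
    if order["price"] < 0:
        raise ValueError(f"negative price in order {order['id']}")
    return order

def ingest_order(order: dict, sink: list) -> None:
    # Validation runs at the ingestion step, not post-load in the warehouse.
    sink.append(validate_order(order))

sink = []
ingest_order({"id": 1, "price": 19.99}, sink)    # accepted
# ingest_order({"id": 2, "price": -5.00}, sink)  # raises before hitting the DB
```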
Data pipelines can be visualized as flowing data left to right, starting with raw sources, ingested and modeled into database tables, and eventually served out through user-facing applications and dashboards.

"Shifting left" means taking logic that lives on the right side and moving it leftward.
"Shifting left" is the new trend among in data stacks -- but what does it mean and what does it matter?
Apache Pinot is one of the world’s fastest and most scalable real-time analytical databases, relied on by LinkedIn, Uber, and Stripe. It was awesome diving into the secrets behind its unique architecture with creator and @startreedata.bsky.social founder Kishore Gopalakrishna.
I wish I could say "yes, almost certainly," but given the levels of competence we're witnessing in other areas, I'm not placing any bets on DOGE's data security practices.
So what are DOGE's true priorities?

As Maya Angelou wrote: "When someone shows you who they are, believe them the first time."
Cloud data centers have climate control, and more compute power than your MacBook Air!

This setup could be done by a competent data engineer in less time than it took to run her query.

The DOGE tech wiz acknowledged this and wrote "it hasn't been a priority to get that done."
They should have loaded this multi-terabyte contracts dataset into a cloud database, or even better -- a database built for real-time analytics like @clickhouse.com, Pinot, or StarRocks (sorry @duckdb.org, this is more than you can handle).
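For the curious, here's roughly what that looks like -- a sketch using the clickhouse-connect Python driver and ClickHouse's s3() table function. The host, credentials, schema, and bucket are all hypothetical:

```python
import clickhouse_connect

# Hypothetical cloud ClickHouse host and credentials.
client = clickhouse_connect.get_client(
    host="analytics.example.com", username="default", password="..."
)

client.command("""
    CREATE TABLE IF NOT EXISTS contracts (
        contract_id String,
        agency      String,
        amount      Float64
    ) ENGINE = MergeTree ORDER BY contract_id
""")

# Bulk-load straight from object storage -- no hotel-room hard drives required.
client.command("""
    INSERT INTO contracts
    SELECT contract_id, agency, amount
    FROM s3('https://example-bucket.s3.amazonaws.com/contracts/*.parquet', 'Parquet')
""")

print(client.query("SELECT count() FROM contracts").result_rows)
```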
But it doesn't absolve her, or her team, from ridicule.

The DOGE tech wiz kids shouldn't be toting federal databases around on USB-attached external hard drives in "hot, humid hotel rooms" (her literal words): that's what database servers were invented for.
What actually overheated was a USB external hard drive, with several terabytes of contract data, that she was reading into her MacBook Air and then filtering to find contracts matching her criteria. (High-speed reads on NVMe drives can heat up to 175 °F before thermal throttling kicks in.)
Like others, I jumped on the bandwagon to ridicule the DOGE analyst who "overheated her hard drive" by analyzing just 60k rows of data.

I was wrong.

The truth is even dumber.

🧵