David Jayatillake
jayatillake.bsky.social
Writer @ davidsj.substack.com
Reposted by David Jayatillake
How big should your data team be?

Data teams are often oversized. A company of 200 people rarely needs 15+ data staff; usually 5% of org size is enough.

dataactionmentor.com/knowledge-ba...
How big should your data team be?
Founders and CEOs are wondering if their data function is bloated and if they should replace everyone with AI agents. Data Leaders are scrambling to defend why they need a 15-person data team in a 200...
dataactionmentor.com
September 28, 2025 at 6:42 AM
The amount you love someone is proportional to how often you Ghiblify their pictures.
July 29, 2025 at 4:59 PM
This week I look at agents.

I think this is a new way to build where we don’t intentionally build code-based software.

open.substack.com/pub/davidsj/...
July 1, 2025 at 4:34 PM
I don't usually share photos of my family on social media for good reason, but I'm happy to share these ones!
June 20, 2025 at 4:00 PM
This post encapsulates how I feel about the current state of LLMs and doomers etc. Really great read:

fly.io/blog/youre-a...
My AI Skeptic Friends Are All Nuts
My smartest friends have bananas arguments about LLM coding.
fly.io
June 6, 2025 at 5:07 PM
When I've attended Snowflake Summit before, I've usually written a blog post about the new features released, etc. Is someone going to do that this year, given I didn't go? 😊

#datasky #databs
June 6, 2025 at 8:24 AM
Reposted by David Jayatillake
It is possible to build machine learning systems which punch up instead of punching down.
A lot of people say generative AI shouldn't infringe on copyright. These researchers actually tried to do it. The result: an 8-terabyte dataset of text that's openly licensed or in the public domain, and a 7B-parameter model that performs as well as Meta's Llama 7B www.washingtonpost.com/politics/202...
Analysis | AI firms say they can’t respect copyright. These researchers tried.
A new effort using only openly licensed data may have implications on thorny policy disputes around copyright and AI
www.washingtonpost.com
June 6, 2025 at 1:52 AM
Reposted by David Jayatillake
Got a cool story about something in the data engineering space? You should 💯 submit it as a talk to Current 2025 in New Orleans 😁

Do it! Now! CfP is open until 15th June.

sessionize.com/current-2025...

(Pro-tip: you only need an abstract at this point; writing the talk can come later 😅)

#dataBS
June 5, 2025 at 8:59 AM
Reposted by David Jayatillake
At the London Data Practitioners Meetup with @pedramnavid.com @jayatillake.bsky.social @rittmananalytics.bsky.social and the London Dagster community
May 14, 2025 at 5:15 PM
Doctor’s orders 🫡
April 27, 2025 at 12:54 PM
I still think this is the biggest prize in AI. If Siri could actually do most things you do on a phone manually...

9to5mac.com/2025/04/22/s...
Siri’s new boss is already making big internal changes, per report - 9to5Mac
Siri’s new boss at Apple, Mike Rockwell, has reportedly wasted no time making big changes internally to the people building its assistant.
9to5mac.com
April 24, 2025 at 7:00 PM
Has anyone tried Llama 4 Maverick yet? How big a machine does it need to run locally?

@simonwillison.net
April 7, 2025 at 3:36 PM
Looks like Nintendo became the best at console FPS.
April 2, 2025 at 1:30 PM
Once again, we've devised a derogatory name for something many of us are doing: "Vibe coding".

Just like "Citizen Data Scientist", "Excel Data Analyst", and many other terms made to belittle by the supposed true artisans that came before.

open.substack.com/pub/davidsj/...
Vibe coder
Free like a puppy
open.substack.com
March 24, 2025 at 4:06 PM
Clickbench says @duckdb.org is a great analytics database

www.mooncake.dev/blog/clickbe...
Clickbench says Postgres is a great analytics database
Postgres is now top 10 in Clickbench with pg_mooncake.
www.mooncake.dev
March 14, 2025 at 6:13 PM
Reposted by David Jayatillake
I enabled the Cube AI API Slack App today, and I found a little treat 👀

Shout-out @mjirv.bsky.social and @jayatillake.bsky.social!

If you're curious how easy this is to set up, check out: cube.dev/blog/make-yo...
March 13, 2025 at 1:26 AM
A couple of weeks ago I wrote a post about @pola.rs Cloud, and as coincidence would have it, DeepSeek then released smallpond - a way to use @duckdb.org scaled out.

It's pretty cool that an org has managed to build this, but is it the right way?

davidsj.substack.com/p/many-ducks...
Many ducks, a big pond
Splitting the query
davidsj.substack.com
March 4, 2025 at 6:07 PM
🗣️ It was a real honour to speak at the very first Forward Data Conference in Paris last November, not to mention being introduced with a special piano verse just for me! 🎹

www.youtube.com/watch?v=QNQW...
AI on Data - snake oil or actually useful?
YouTube video by ForwardDataConf
www.youtube.com
February 28, 2025 at 9:30 AM
🏭 @tobikodata.com released a great new feature for SQLMesh last week called blueprinting.

This is a feature that the dbt Labs community has been asking for for ages, and I’ve been wanting it too!

open.substack.com/pub/davidsj/...
Model Generation
A big feature release by SQLMesh
open.substack.com
February 26, 2025 at 4:48 PM
Reposted by David Jayatillake
New episode: “Semantic Layers: The Missing Link Between AI and Data” with @jayatillake.bsky.social .

We discuss how semantic layers bridge raw data and AI, achieving 100% accuracy for natural language queries, and what’s next for LLM-powered data pipelines.

🎧 https://techontherocks.show/14 🎧
Tech on the Rocks | Semantic Layers: The Missing Link Between AI and Data with David Jayatillake from Cube
In this episode, we chat with David Jayatillake, VP of AI at Cube, about semantic layers and their crucial role in making AI work reliably with data. We explore how semantic layers act as a bridge ...
techontherocks.show
February 20, 2025 at 7:19 PM
I've been interacting with r/dataengineering a bit recently, and I'm impressed by the size and engagement of the community there, but...

Folks on #databs / #datasky seem much less likely to just mouth off without thinking or reading! I think I much prefer what we have here on Bluesky.
February 20, 2025 at 4:19 PM
Reposted by David Jayatillake
Yep! And the new(ish) Rust-based data engine also brought the ability to do multi-stage calculations.

Bonus - "Tesseract" is an amazing name for a Cube engine 😀
cube.dev/blog/introdu...
Introducing Next-Generation Data Modeling Engine - Cube Blog
Previewing multi-stage calculations and performance leaps in the SQL Planner
cube.dev
February 19, 2025 at 9:51 PM