Lightnews — Scholar-powered news

Reposted by David Jayatillake

Samuel Wong

@samuelwong.bsky.social

How big should your data team be?

Data teams are often oversized. A company of 200 people rarely needs 15+ data staff; usually 5% of org size is enough.

dataactionmentor.com/knowledge-ba...

How big should your data team be?

Founders and CEOs are wondering if their data function is bloated and if they should replace everyone with AI agents. Data Leaders are scrambling to defend why they need a 15-people data team in a 200...

dataactionmentor.com

September 28, 2025 at 6:42 AM

David Jayatillake

@jayatillake.bsky.social

The amount you love someone is proportional to how often you Ghiblify their pictures.

July 29, 2025 at 4:59 PM

David Jayatillake

@jayatillake.bsky.social

This week I look at agents.

I think this is a new way to build where we don’t intentionally build code-based software.

open.substack.com/pub/davidsj/...

July 1, 2025 at 4:34 PM

David Jayatillake

@jayatillake.bsky.social

BERT and ERNIE! 😂

tracking.tldrnewsletter.com/CL0/https:%2...

China's biggest public AI drop since DeepSeek, Baidu's open source Ernie, is about to hit the market

Chinese internet search giant Baidu will open source its Ernie gen AI large language model as soon as this week, with uncertain consequences for the market.

tracking.tldrnewsletter.com

June 30, 2025 at 8:37 PM

David Jayatillake

@jayatillake.bsky.social

I don't usually share photos of my family on social media for good reason, but I'm happy to share these ones!

June 20, 2025 at 4:00 PM

David Jayatillake

@jayatillake.bsky.social

This post encapsulates how I feel about the current state of LLMs and doomers etc. Really great read:

fly.io/blog/youre-a...

My AI Skeptic Friends Are All Nuts

My smartest friends have bananas arguments about LLM coding.

fly.io

June 6, 2025 at 5:07 PM

David Jayatillake

@jayatillake.bsky.social

So when I've attended Snowflake summit before, I've usually written a blog post talking about the new features released, etc. Is someone going to do that this year, given I didn't go? 😊

#datasky #databs

June 6, 2025 at 8:24 AM

Reposted by David Jayatillake

Anil Dash

@anildash.com

It is possible to build machine learning systems which punch up instead of punching down.

nitasha tiku @nitasha.bsky.social · Jun 5

A lot of people say generative AI shouldn't infringe on copyright. These researchers actually tried to do it. The result: an 8 terabyte dataset of text that's openly licensed or in the public domain & 7 B parameter model that performs as well as Meta's Llama 7B www.washingtonpost.com/politics/202...

Analysis | AI firms say they can’t respect copyright. These researchers tried.

A new effort using only openly licensed data may have implications on thorny policy disputes around copyright and AI

www.washingtonpost.com

June 6, 2025 at 1:52 AM

Reposted by David Jayatillake

rmoff 🏃‍♂️🫖🥓

@rmoff.net

Got a cool story about something in the data engineering space? You should 💯 submit it as a talk to Current 2025 in New Orleans 😁

Do it! Now! CfP is open until 15th June.

sessionize.com/current-2025...

(Pro-tip: you only need an abstract at this point; writing the talk can be later 😅)

#dataBS

June 5, 2025 at 8:59 AM

Reposted by David Jayatillake

Mark Rittman

@markrittman.bsky.social

At the London Data Practitioners Meetup with @pedramnavid.com @jayatillake.bsky.social @rittmananalytics.bsky.social and the London Dagster community

May 14, 2025 at 5:15 PM

David Jayatillake

@jayatillake.bsky.social

Doctor’s orders 🫡

April 27, 2025 at 12:54 PM

David Jayatillake

@jayatillake.bsky.social

I still think this is the biggest prize in AI. If Siri could actually do most things you do on a phone manually...

9to5mac.com/2025/04/22/s...

Siri’s new boss is already making big internal changes, per report - 9to5Mac

Siri’s new boss at Apple, Mike Rockwell, has reportedly wasted no time making big changes internally to the people building its assistant.

9to5mac.com

April 24, 2025 at 7:00 PM

David Jayatillake

@jayatillake.bsky.social

Has anyone tried Llama 4 Maverick yet? How big a machine does it need to run locally?

@simonwillison.net

April 7, 2025 at 3:36 PM

David Jayatillake

@jayatillake.bsky.social

Looks like Nintendo became the best at console FPS.

April 2, 2025 at 1:30 PM

David Jayatillake

@jayatillake.bsky.social

Once again, we've devised a derogatory name for something many of us are doing: "Vibe coding".

Just like "Citizen Data Scientist", "Excel Data Analyst", and many other terms made to belittle by the supposed true artisans that came before.

open.substack.com/pub/davidsj/...

Vibe coder

Free like a puppy

open.substack.com

March 24, 2025 at 4:06 PM

David Jayatillake

@jayatillake.bsky.social

Clickbench says @duckdb.org is a great analytics database

www.mooncake.dev/blog/clickbe...

Clickbench says Postgres is a great analytics database

Postgres is now top 10 in Clickbench with pg_mooncake.

www.mooncake.dev

March 14, 2025 at 6:13 PM

Reposted by David Jayatillake

Arynnpost

@arynn.bsky.social

I enabled the Cube AI API Slack App today, and I found a little treat 👀

Shout-out @mjirv.bsky.social and @jayatillake.bsky.social!

If you're curious how easy this is to set up, check out: cube.dev/blog/make-yo...

March 13, 2025 at 1:26 AM

David Jayatillake

@jayatillake.bsky.social

A couple of weeks ago I wrote a post about @pola.rs Cloud, and as coincidence would have it, DeepSeek then released smallpond - a way to use @duckdb.org scaled out.

It's pretty cool that an org has managed to build this, but is it the right way?

davidsj.substack.com/p/many-ducks...

Many ducks, a big pond

Splitting the query

davidsj.substack.com

March 4, 2025 at 6:07 PM

David Jayatillake

@jayatillake.bsky.social

🗣️ It was a real honour to speak at the very first Forward Data Conference in Paris last November, not to mention being introduced with a special piano verse just for me! 🎹

www.youtube.com/watch?v=QNQW...

AI on Data - snake oil or actually useful?

YouTube video by ForwardDataConf

www.youtube.com

February 28, 2025 at 9:30 AM

David Jayatillake

@jayatillake.bsky.social

🏭 @tobikodata.com released a great new feature for SQLMesh last week called blueprinting.

This is a feature that the dbt Labs community has been asking for for ages, and I’ve been wanting it too!

open.substack.com/pub/davidsj/...

Model Generation

A big feature release by SQLMesh

open.substack.com

February 26, 2025 at 4:48 PM

David Jayatillake

@jayatillake.bsky.social

🦆🦆🦆🦆 everywhere these days

count.co/blog/announc...

Keep canvases moving with DuckDB on the server

We're super excited to announce the general availability of DuckDB on the server. This is our way of dramatically speeding up query speed while reducing computer cost for customers, helping you drive ...

count.co

February 21, 2025 at 11:26 AM

Reposted by David Jayatillake

Tech on the Rocks

@totrrocks.bsky.social

New episode: “Semantic Layers: The Missing Link Between AI and Data” with @jayatillake.bsky.social.

We discuss how semantic layers bridge raw data and AI, achieving 100% accuracy for natural language queries, and what’s next for LLM-powered data pipelines.

🎧 https://techontherocks.show/14 🎧

Tech on the Rocks | Semantic Layers: The Missing Link Between AI and Data with David Jayatillake from Cube

In this episode, we chat with David Jayatillake, VP of AI at Cube, about semantic layers and their crucial role in making AI work reliably with data. We explore how semantic layers act as a bridge ...

techontherocks.show

February 20, 2025 at 7:19 PM

David Jayatillake

@jayatillake.bsky.social

I've been interacting with r/dataengineering a bit recently, and I'm impressed by the size and engagement with the community there, but...

Folks on #databs / #datasky seem much less likely to just mouth off without thinking or reading! I think I much prefer what we have here on BlueSky.

February 20, 2025 at 4:19 PM

Reposted by David Jayatillake

Arynnpost

@arynn.bsky.social

Yep! And the new(ish) rust-based data engine also brought the ability to do multi-stage calculations.

Bonus - "Tesseract" is an amazing name for a Cube engine 😀
cube.dev/blog/introdu...

Introducing Next-Generation Data Modeling Engine - Cube Blog

Previewing multi-stage calculations and performance leaps in the SQL Planner

cube.dev

February 19, 2025 at 9:51 PM
