Avik Dey
@avikdey.bsky.social
Mostly Data, ML, OSS & Society • Stop chasing Approximately Generated Illusions; focus on Specialized Small LMs • To understand it well enough, learn to explain it simply • Shadow self of https://linkedin.com/in/avik-dey, have a beard now
Still hasn’t read Ilya’s memo …
December 3, 2025 at 2:12 AM
Ilya finally answers the question: What did Ilya see?

“this disconnect between eval performance and actual real-world performance,”

Next time someone goes - LLMs beat ‘So & So’ Olympiad - just quote Ilya.
November 27, 2025 at 5:42 PM
Having faced this exact same repetitive issue since 2023, I would have laughed at this - if we didn’t have 1% of the GDP invested in this caricature of an “AI”.

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 9:40 PM
Ilya appears to be progressively approaching the right conclusion. I remain confident that in time he will consolidate his insights from the first 5 minutes and recognize that complex explanations are unnecessary when simpler ones suffice.

(screenshots not chronological)

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 8:17 PM
Good to see research on what math always said - low-average performers, that’s your LLM “employee”:

> This supports our assertion that the ceiling on LLM creativity (0.25) corresponds to the boundary between little-c and Pro-c human creative performance (Figure 6).

www.academia.edu/144621465/_T...
November 25, 2025 at 5:19 PM
“warm-up”: Under the guidance of an expert human the model was finally able to get the answer right when nudged towards it.

Not the model, not the prompt - still the human.

The amount of shilling these guys do, no wonder they can’t get anything serious built.

cdn.openai.com/pdf/4a25f921...
November 23, 2025 at 5:33 PM
Think they might have answered their own question … ?

bsky.app/profile/slas...
November 22, 2025 at 4:04 AM
If these Gemini 3 Pro benchmarks are accurate, time for OpenAI to sell to Microsoft. Microsoft won’t want their management team or their prolifically tweeting engineers, but I am sure most engineers would thrive if led by seasoned engineering management.

storage.googleapis.com/deepmind-med...
November 18, 2025 at 4:51 PM
Perfect prediction, even if I say so myself!

Actually their realization dawned a few weeks back, but these things take a little while to surface externally.

Image of tweet from bird site because I won’t link to it.
November 16, 2025 at 1:45 AM
From the bird site, the acceleration continues:
November 16, 2025 at 1:30 AM
The real star of the show is:
November 14, 2025 at 5:42 PM
Mensch goes on to dunk on pre-training. This paragraph sounds good rhetorically but is both technically and ethically shallow. Data and pre-training ARE the entire foundation of the model. He discounts both to shift the onus from AI companies to the “deployer”. Complete hogwash; he learnt well from Sam.
November 12, 2025 at 4:11 PM
Demo coming soon …

bsky.app/profile/avik...
November 5, 2025 at 4:06 PM
Every author writing like this should be required to rewrite their abstract in plain English and read it aloud to an audience of their peers before they can publish it.

Summary: Conjectural with nice diagrams but no quantitative measures and ignores prior literature.

arxiv.org/pdf/2510.26745
November 3, 2025 at 4:14 PM
Karpathy’s tweet is a live demo of the learning loop he promotes. Consciously or not, he is channeling:

- Kolb: Experiential learning theory
- Feynman: Explain in your own words
- Dweck: Growth mindset scale

The medium is the message.
November 1, 2025 at 4:16 PM
She’s making the classic layman’s mistake of thinking DeepMind is synonymous with AI. If she had actually read even the first paragraph of their paper, she might have clued in that it’s a great example of purely statistical machine learning, but that’s probably asking too much.

arxiv.org/pdf/2506.10772
October 30, 2025 at 4:02 PM
Today for some reason, I felt the urge to share this tweet posted back in 2018:
October 20, 2025 at 6:22 PM
Read this if you work with LLMs. For those of us who have been hands on, both under the hood and in front of the shiny bits, it’s been obvious from the get go.

Folks still refusing to acknowledge the obvious are invested in it, directly or indirectly.

For the rest of us it’s just another tool.
October 18, 2025 at 11:20 PM
If even Karpathy can’t get AI coding to work for him, are you willing to bet on it working for you? Yours are IID, you say? You will soon find out dimensionality means yours are OOD too.
October 13, 2025 at 9:48 PM
Nice paper. Observations:

- Verifier assisted code snippet optimization
- The “before” solutions are sub-optimal, as are the “after” solutions
- Some of the paper’s commentary is contradictory
- If the framework included a chaos monkey to simulate the real world, would these results hold?

arxiv.org/abs/2510.061...
October 10, 2025 at 4:47 PM
All those words just to say - ‘If you view them thru an anthropomorphic lens, LLMs are just shallow replicas of human intelligence.’

Buddy - the rest of us knew.

(Not going to link, but it’s on the bird site)
October 2, 2025 at 4:08 PM
Used to be Sunday breakfast staple with “luchi”, in my younger days.
August 29, 2025 at 8:18 PM
Something like this? Very popular in other parts of Asia too.

i.ytimg.com/vi/fGODsn6NR...
August 29, 2025 at 7:01 PM
Was always the plan. A lawsuit needs to be filed to save copies of all of those server logs indefinitely.

bsky.app/profile/tech...
August 26, 2025 at 5:39 PM
Yes, they are, or they wouldn’t be redistricting while it’s still 5 years to the next census.

Not a great idea to listen to ‘"center-left, corporate and GOP donor-funded nonprofit", which advocates for neoliberal policies and is staunchly opposed to Medicare for All.’

en.m.wikipedia.org/wiki/Third_W...
August 23, 2025 at 9:31 PM