Avik Dey
avikdey.bsky.social
Mostly Data, ML, OSS & Society • Stop chasing Approximately Generated Illusions; focus on Specialized Small LMs • To understand it well enough, learn to explain it simply • Shadow self of https://linkedin.com/in/avik-dey, have a beard now
Pinned
Alignment isn't the only thing LLMs are faking. Reasoning is another one they're good at faking. Reading a paper on LLM performance on doctors' reasoning tasks. Just started reading, but it's going to be either:
1. Memorization, or
2. Priming, or
3. Confirmation prompting

www.anthropic.com/research/ali...
Alignment faking in large language models
A paper from Anthropic's Alignment Science team on Alignment Faking in AI large language models
www.anthropic.com
Still hasn’t read Ilya’s memo …
December 3, 2025 at 2:12 AM
OS-level automation is brittle even at its best. Deterministic, frozen workflows break under even small changes to the environment, and because the workflow is pre-frozen there is no chance of recovery at runtime, requiring extensive human intervention.

en.wikipedia.org/wiki/Robotic...
December 2, 2025 at 5:10 PM
Restated: IBM CEO says AI has no sustainable path to pay off infrastructure costs, forget making a profit
December 2, 2025 at 4:30 PM
Pathetic mimicry of the brain was not enough, they had to go for the soul too?
Amanda Askell has confirmed the soul document is indeed a real thing they trained Claude on. x.com/AmandaAskell...
December 2, 2025 at 12:03 AM
Crucial distinction: in LLMs, training data doesn't just inform the model - it is the model. The model densely encodes pattern probabilities specific to each token path - it's like training trillions of ML models, where each pattern is instantiated as its own tiny ML model.

And that’s also why it fails.
"Data are no longer things to be accounted for by a theoretical model...but rather inputs to the process of creating models". Many in LLM-ML don't care about the problems they are actually building models of: "the nature of languages...how we work with language...and the specific contexts [of use]."
What makes something data? Some thoughts on that question, and how answers to it help us understand AI hype:

medium.com/@emilymenonb...
December 1, 2025 at 7:13 PM
Reposted by Avik Dey
Three years into the generative-AI wave, demand for the technology seems surprisingly flimsy
Investors expect AI use to soar. That’s not happening
Recent surveys point to flatlining business adoption
econ.st
November 29, 2025 at 9:20 PM
Hey Apple,

Please stop reminding me to “Take a moment to reflect in your journal.”

My memory’s great, thank you. And when I am eventually demented, reading journal entries ain’t going to help - even if reading is still possible.
November 29, 2025 at 6:25 PM
Ilya finally answers the question: What did Ilya see?

“this disconnect between eval performance and actual real-world performance,”

Next time someone goes - LLMs beat ‘So & So’ Olympiad - just quote Ilya.
November 27, 2025 at 5:42 PM
At tiny scale - a fun experiment. At data center scale - silicon swiss cheese.

Also, see en.wikipedia.org/wiki/Project...
November 27, 2025 at 3:28 PM
Proxying the Apple byte - are we?

Amateur move guys.
November 26, 2025 at 1:37 AM
Having faced this exact same repetitive issue since 2023, I would have laughed at this - if we didn’t have 1% of the GDP invested in this caricature of an “AI”.

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 9:40 PM
Ilya appears to be progressively approaching the right conclusion. I remain confident that, in time, he will consolidate his insights from the first 5 minutes and recognize that complex explanations are unnecessary when simpler ones suffice.

(screenshots not chronological)

www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 8:17 PM
Good to see research on what the math always said - a low-to-average performer, that's your LLM “employee”:

> This supports our assertion that the ceiling on LLM creativity (0.25) corresponds to the boundary between little-c and Pro-c human creative performance (Figure 6).

www.academia.edu/144621465/_T...
November 25, 2025 at 5:19 PM
Any PhD who endorses the claim that an LLM constitutes “PhD level” intelligence is, at minimum, engaging in a questionable use of their academic authority. These endorsements function less as rigorous assessments and more as a signal that the symbolism conferred by their credential is - available for rent.
Deeply absurd. This Google PDF published on a blog (arxiv, not peer reviewed) claims an LLM is "PhD level" but in most cases the MAJORITY of reference URLs were invalid or inaccessible.

A PhD sitting down and just fabricating >50% of sources = career ending

arxiv.org/abs/2511.11597
November 24, 2025 at 9:39 PM
They were convinced “AI“ would rewrite it all in a week and ship by end of that month, the ‘year or two’ estimate was just sandbagging so they could pose as 100x devs.
November 24, 2025 at 5:09 AM
“warm-up”: Under the guidance of an expert human the model was finally able to get the answer right when nudged towards it.

Not the model, not the prompt - still the human.

The amount of shilling these guys do, no wonder they can’t get anything serious built.

cdn.openai.com/pdf/4a25f921...
November 23, 2025 at 5:33 PM
Think they might have answered their own question … ?

bsky.app/profile/slas...
November 22, 2025 at 4:04 AM
The problem with most financial analysis of Nvidia’s quarterly performance is that these folks don’t seem to understand data center hardware lead times and the revenue recognition cycle.
November 20, 2025 at 6:36 AM
Great article with learned insights - the best kind.

Unfortunately, this is a societal failure. Tech didn’t invent loneliness, it offered a new way to cope with it - in an empathetic echo chamber.

We are failing the kids. Others too, but mostly it’s the kids that I worry about.
I agree that emotional addiction to chatbots is the number one risk of AI today. Here is a gift link to an important OpEd in the NYTimes:
www.nytimes.com/2025/11/17/o...
Opinion | The Sad and Dangerous Reality Behind ‘Her’
www.nytimes.com
November 20, 2025 at 6:10 AM
You watch a video of a professor from a random internet post and are filled with regret because you didn’t have the opportunity to learn from him in person:

en.wikipedia.org/wiki/Ramamur...
19. Quantum Mechanics I: The key experiments and wave-particle duality
YouTube video by YaleCourses
youtu.be
November 19, 2025 at 6:16 AM
Smaller bag, same toss.
Nvidia and Microsoft will invest up to $15 billion in OpenAI competitor Anthropic. Anthropic, in turn, said it would buy $30 billion of compute capacity from Microsoft Azure and use advanced AI chips supplied by Nvidia.
Nvidia, Microsoft Pour $15 Billion Into Anthropic for New AI Alliance
Anthropic also commits to purchase $30 billion from Microsoft’s cloud computing business Azure.
on.wsj.com
November 18, 2025 at 7:40 PM
For ancillary text-based foo-foo services, or core financial services? I have a hard time believing that their engineers, a few of whom I know, would sign off on this integration - but leadership prevailed?
November 18, 2025 at 7:38 PM
Don’t worry about it this quarter - they have enough to prop it up.

But next quarter you should be terrified.
November 18, 2025 at 7:22 PM
If these Gemini 3 Pro benchmarks are accurate, time for OpenAI to sell to Microsoft. Microsoft won’t want their management team or their prolifically tweeting engineers, but I am sure most engineers would thrive if led by seasoned engineering management.

storage.googleapis.com/deepmind-med...
November 18, 2025 at 4:51 PM
I too would like my taxpayer backed trillion dollar fantasy fund. Why should Sama have all the fun?
Anthropic CEO Dario Amodei thinks AI could help find cures for most cancers, prevent Alzheimer’s, and even double the human lifespan. cbsn.ws/4oRZ8Nm
November 18, 2025 at 6:50 AM