Alexander Doria
@dorialexander.bsky.social
LLM for the commons.
Pinned
Breaking: we release a fully synthetic generalist dataset for pretraining, SYNTH, and two new SOTA reasoning models trained exclusively on it. Despite having seen only 200 billion tokens, Baguettotron is currently best-in-class in its size range. pleias.fr/blog/blogsyn...
I mean, it's a language model, how big should it be? 1 million parameters?
November 28, 2025 at 6:28 PM
DeepSeek just released a new state-of-the-art math prover, DeepSeek-Math-V2, competitive with Google, OpenAI, and ByteDance, while being a publicly documented open-weight model. A few reading notes along the way:
November 27, 2025 at 3:41 PM
And a major open science release from Prime Intellect: they don't stress it enough, but the SFT part goes beyond post-training. This is a fully documented mid-training with tons of insights/gems on MoE training, asynchronous RL infra, deep research. storage.googleapis.com/intellect-3-...
November 27, 2025 at 7:47 AM
Not a fan so far of "sovereign" displacing "open" in all things AI/tech in the EU.
November 26, 2025 at 8:58 PM
And another social event on repeat:
>What are you doing?
>So we train from scratch.
>Ok but which models are you fine tuning
>From **scratch**. Zero, nihil, zilch.
November 26, 2025 at 7:47 PM
The threshold for consistent English/query understanding is now 3M parameters.
November 26, 2025 at 9:21 AM
YES. Main reason classic pretraining dominated for so long is just that you don’t have to think so much about the data or what elicits reasoning. It’s "here".

For Sutskever/Patel new podcast: www.dwarkesh.com/p/ilya-sutsk...
November 25, 2025 at 9:27 PM
As far as bubbles go, looks like multiple anti-AI movements are popping before Nvidia.
November 23, 2025 at 9:54 AM
For all the talk about code, I think 50%+ of my ChatGPT use is everyday tasks.
November 22, 2025 at 1:52 PM
Actually an additional note on SYNTH: it might well be the fastest (pre-)training dataset ever created. Due to some major infrastructure issue, we had to reconstitute most of it in a handful of days.
November 21, 2025 at 7:22 PM
Almost coming to regret writing this paper: easily 90% of issues/complaints for no material benefit. Why classic non-synth open data can’t happen in AI.
Announcing the release of the official Common Corpus paper: a 20 page report detailing how we collected, processed and published 2 trillion tokens of reusable data for LLM pretraining arxiv.org/pdf/2506.01732
November 21, 2025 at 9:18 AM
Lol someone trying to sell me the creation of a Wikipedia page. I’ve seen enough as an admin to know it should *only* happen organically. Speedy deletion is far from the worst outcome.
November 20, 2025 at 9:58 PM
One week later, sorry to announce Baguettotron has consistently climbed in popularity and the prophecy is taking shape.
November 18, 2025 at 3:47 PM
We’re getting fanart now.
I can't help it. I am feeling overly absurd today.
November 17, 2025 at 8:42 PM
First successful fine-tune of Baguettotron. And very on brand to see it's about poetry.
November 17, 2025 at 2:58 PM
At some EU LLM thing and don’t really have to present myself: everyone knows Baguettotron.
November 16, 2025 at 9:59 PM
Still can't believe I got the opportunity to beta test Gemini 4. Model is wild.
November 16, 2025 at 9:33 PM
Nothing to do with AI, but this, this was an incredible novel. One of Borges' favorites, too.
November 16, 2025 at 6:02 PM
German man boarding the plane from France with no fewer than four baguettes: there goes my target customer.
November 16, 2025 at 2:31 PM
now reading (1964 SF novel, but it’s really about synthetic environments)
November 15, 2025 at 3:59 PM
Since people were wondering what could be the use cases for Monad:
It's pretty good for text classification as well, ngl. Half the size of BERT and can still do nearly as well.
November 15, 2025 at 1:27 PM
Getting into pretraining has never been cheaper.
November 15, 2025 at 12:18 PM
Now a concept: vintage computer-use model, distributed on diskette, trained only on classic core Unix.
November 14, 2025 at 7:32 PM
Looking back, one of my main disappointments in LLM/AI research is seeing the non-commercial space shrinking, becoming more conservative, fragmented, and less cooperative.
November 14, 2025 at 5:10 PM
Apparently even Monad is not small enough.
Playing around with the PleIAs "smallest viable model" Monad, and realizing that with 4-bit quantization (storing 56M parameters in ~27 MB) and a SuperDisk drive (to use the FD32MB format), you could turn it into a chat model that fits on a standard 3.5-inch diskette.
November 13, 2025 at 8:46 PM
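The diskette arithmetic in the quoted post checks out; here is a back-of-the-envelope sketch (the helper function is mine, assuming the quoted 56M parameter count and 4 bits per weight, ignoring any quantization metadata overhead):

```python
def quantized_size_mb(n_params: int, bits_per_param: int = 4) -> float:
    """Storage for the raw weights alone, in megabytes (10^6 bytes)."""
    return n_params * bits_per_param / 8 / 1e6

# 56 million parameters at 4 bits each -> 28 MB of raw weights,
# which is about 26.7 MiB (the ~27 MB figure quoted above),
# and comfortably under the 32 MB of the FD32MB SuperDisk format.
size = quantized_size_mb(56_000_000, bits_per_param=4)
print(size)  # 28.0
```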