Ramon Astudillo
ramon-astudillo.bsky.social
Principal Research Scientist at IBM Research AI in New York. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG and RL. Opinions my own and non-stationary.
ramon.astudillo.com
They are actually closest to a general purpose computer! See e.g. @karpathy.bsky.social's Software 3.0 view. IBM calls this a "Generative Computer", I prefer "Neural Computer". The idea is basically what I thought would be the GPT4 paper's title (my title was less boring)

x.com/RamonAstudil...
November 7, 2025 at 5:08 PM
News from the LLM companies vs Neuro-Software companies front. LLM ones own the upstream and can optimize training to their products. NS companies have domain knowledge but more limitations on applying it. However ... they can optimize for runtime infrastructure, which they own! Maybe this is key?
October 31, 2025 at 3:51 PM
Wait is this true? LOL

x.com/m2saxon/stat...
October 28, 2025 at 4:18 PM
Full circle
October 14, 2025 at 9:49 PM
You think I am exaggerating but look at this other one and tell me it does not look evil
October 6, 2025 at 7:01 PM
Can't wait until the Brooklyn Tower is fully functional
October 4, 2025 at 4:51 PM
When you start writing complex instructions for a Coding Agent and, halfway through, you realize you didn't fully understand what you wanted and then have to go back and rewrite prior sections ... I'd argue that is programming at its core. xkcd568 should be both a necessary and sufficient condition.
October 3, 2025 at 3:26 PM
I don't have Instagram, so here is a nice day at Central Park. Also the weather was great. Seasons change, not warm, not cold (reminded me of Asturias). The new CP pool was already closed and preparing itself to be an ice skating rink
September 21, 2025 at 12:17 AM
Whether domain adaptation provides a moat against LLM providers is a million (billion) $ question. Watching Cursor's trajectory is probably one of the best ways to answer it
September 17, 2025 at 4:24 PM
September 6, 2025 at 6:57 PM
This is IMO the right take
August 10, 2025 at 11:19 AM
LxMLS dinner at Casa do Alentejo
July 24, 2025 at 12:13 PM
Labs team and part of our org. for LxMLS2025. Starting our ... 🤔 ... 15th edition!
July 19, 2025 at 1:15 PM
The Watson lab has a bunch and they don't back off even in front of a bus. Here is one looking for trouble
May 23, 2025 at 7:22 PM
I think this table was missing
April 5, 2025 at 8:49 PM
"The future immigrant lodging house" Judge magazine 1890
February 22, 2025 at 2:46 PM
The party continues

+7 AIME2024
+27 AIME2025

by doing s1 with DeepSeek-R1 replacing Gemini Flash Thinking as teacher x.com/Muennighoff/...
February 12, 2025 at 1:46 AM
Either we are about to discover something amazing that will receive a brotastic name like "abstraction-hyper-grokking" or we have a serious case of test poisoning, maybe two-fold poisoning. Here are LIMO/s1 results side by side. Same base model, 800/1K SFT on human/LongCoT-Machine highly curated data.
February 9, 2025 at 8:33 PM
👆unlike s1's magic 1K samples, here they did not run decontamination against test. Also, the CoTs are human and not Gemini Flash Thinking. Also this happens when you swap Qwen-2.5-32B-instruct (MATH ~80s) with Qwen-1.5-chat (MATH ~35s)
February 9, 2025 at 8:13 PM
🚨Important Update: Third link points to a malicious source
February 9, 2025 at 2:00 PM
AIME 2025 results are here! matharena.ai It's a pity that they provide accuracy, not pass@1, so not fully comparable with e.g. DS-R1 2024 scores, but distilled models seem to hold up surprisingly well ... although there is some suspicion of memorization going on, as indicated by @dimitrisp.bsky.social.
February 8, 2025 at 12:17 PM
How on earth can this be on my timeline. I have a curated list for ML.
February 7, 2025 at 6:22 PM
"Deep Research" is O3 RL-ed to use tools for data analysis. It can take 5-30min to finish, not sure how much of it is thought, but probably many tool calls? This has to be interesting to watch. Great scores in "Humanities Last Exam". Given what happened with "Frontier Math" ... skepticism is good,
February 3, 2025 at 2:45 AM
"Humanities Last Exam" I really hope its not agi.safe.ai
January 23, 2025 at 6:40 PM
One additional consequence is the high risk of test contamination, given that these datasets are crawlable (with some caveats, I think)

(from @peterhenderson.bsky.social's screen caps)
January 19, 2025 at 8:01 PM