Lightnews — Scholar-powered news

Ramon Astudillo

@ramon-astudillo.bsky.social

6.2K followers 320 following 1.5K posts

Principal Research Scientist at IBM Research AI in New York. Speech, Formal/Natural Language Processing. Currently LLM post-training, structured SDG and RL. Opinions my own and non-stationary.
ramon.astudillo.com

Ramon Astudillo

@ramon-astudillo.bsky.social

They are actually closest to a general-purpose computer! See e.g. @karpathy.bsky.social's Software 3.0 view. IBM calls this a "Generative Computer"; I prefer "Neural Computer". The idea is basically what I thought would be the GPT4 paper's title (my title was less boring)

x.com/RamonAstudil...

November 7, 2025 at 5:08 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

News from the LLM companies vs Neuro-Software companies front. LLM ones own the upstream and can optimize training for their products. NS companies have domain knowledge but more limitations on applying it. However ... they can optimize for runtime infrastructure, which they own! Maybe this is key?

October 31, 2025 at 3:51 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

Wait is this true? LOL

x.com/m2saxon/stat...

October 28, 2025 at 4:18 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

Full circle

October 14, 2025 at 9:49 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

You think I am exaggerating but look at this other one and tell me it does not look evil

October 6, 2025 at 7:01 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

Can't wait until the Brooklyn Tower is fully functional

October 4, 2025 at 4:51 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

When you start writing complex instructions for a Coding Agent and, halfway through, you realize you didn't fully understand what you wanted and have to go back and rewrite prior sections ... I'd argue that is programming at its core. xkcd568 should be both a necessary and sufficient condition.

xkcd568: You'll never find a programming language that frees you from the burden of clarifying your ideas

October 3, 2025 at 3:26 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

I don't have Instagram, so here is a nice day at Central Park. Also the weather was great. Seasons change, not warm, not cold (reminded me of Asturias). The new CP pool was already closed and preparing itself to be an ice skating rink

Central Park upper north-east corner and Summer to Fall season change.

September 21, 2025 at 12:17 AM

Ramon Astudillo

@ramon-astudillo.bsky.social

Whether domain adaptation provides a moat against LLM providers is a million (billion) $ question. Watching Cursor's trajectory is probably one of the best ways to answer it

September 17, 2025 at 4:24 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

September 6, 2025 at 6:57 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

This is IMO the right take

August 10, 2025 at 11:19 AM

Ramon Astudillo

@ramon-astudillo.bsky.social

LxMLS dinner at Casa do Alentejo

July 24, 2025 at 12:13 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

Labs team and part of our org. for LxMLS2025. Starting our ... 🤔 ... 15th edition!

July 19, 2025 at 1:15 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

The Watson lab has a bunch and they don't back off even in front of a bus. Here is one looking for trouble

May 23, 2025 at 7:22 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

I think this table was missing

April 5, 2025 at 8:49 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

"The future immigrant lodging house" Judge magazine 1890

February 22, 2025 at 2:46 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

The party continues

+7 AIME2024
+27 AIME2025

by doing s1 with DeepSeek-R1 swapped in for Gemini Flash Thinking as teacher x.com/Muennighoff/...

February 12, 2025 at 1:46 AM

Ramon Astudillo

@ramon-astudillo.bsky.social

Either we are about to discover something amazing that will receive a brotastic name like "abstraction-hyper-grokking" or we have a serious case of test poisoning, maybe twofold poisoning. Here are LIMO/s1 results side by side. Same base model, 800/1K SFT on human/LongCoT-Machine highly curated data.

February 9, 2025 at 8:33 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

👆 Unlike s1's magic 1K samples, here they did not run decontamination against test. Also, the CoTs are human and not Gemini Flash Thinking. Also, this happens when you swap Qwen-2.5-32B-instruct (MATH ~80s) with Qwen-1.5-chat (MATH ~35s)

February 9, 2025 at 8:13 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

🚨Important Update: Third link points to a malicious source

February 9, 2025 at 2:00 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

AIME 2025 results are here! matharena.ai It's a pity that they provide accuracy, not pass@1, so not fully comparable with e.g. DS-R1 2024 scores, but distilled models seem to hold up surprisingly well ... although there is some suspicion of memorization going on, as indicated by @dimitrisp.bsky.social.

February 8, 2025 at 12:17 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

How on earth can this be on my timeline? I have a curated list for ML.

February 7, 2025 at 6:22 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

"Deep Research" is O3 RL-ed to use tools for data analysis. It can take 5-30 min to finish, not sure how much of it is thought, but probably many tool calls? This has to be interesting to watch. Great scores in "Humanity's Last Exam". Given what happened with "Frontier Math" ... skepticism is good.

February 3, 2025 at 2:45 AM

Ramon Astudillo

@ramon-astudillo.bsky.social

"Humanity's Last Exam" I really hope it's not agi.safe.ai

January 23, 2025 at 6:40 PM

Ramon Astudillo

@ramon-astudillo.bsky.social

One additional consequence is the high risk of test contamination, given that these datasets are crawlable (with some caveats, I think)

(from @peterhenderson.bsky.social 's screen caps)

January 19, 2025 at 8:01 PM
