🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor
@IBMResearch & @MIT_CSAIL
Presenting in his humble way, Rich Sutton shares his vision of what AI needs:
general, experiential, discovering its own abstractions, and not bitter🤢
#NeurIPS2025 #NeurIPS
🤖📈🧠
LLMs do not learn from explicit corrections
LLMs do not learn from being told the answer
LLMs do not learn from being shown how to solve it
We study Machine Learning; these are opportunities!
A gold mine of research.
⚡️BzZzZz⚡️
"Hey dude,..."
Would you press the button again?
Would an LLM?
Evolving LLMs, diverse open LLMs, and their evaluation are on my mind.
Before I share more, I encourage you to say hi here or at #NeurIPS 🤖📈🧠
Explore theory of mind, game intelligence, and multi-agent LLMs in interactive game environments.
🗓 Sunday, December 7
⏰ 8:00–10:45 AM
📍 San Diego Convention Center, Ballroom 6CF
🤖📈🧠
Kudos.
There are now datasets of over 4.5M chats open for research, all in the same format (shareLM)!
huggingface.co/datasets/sha...
h/t @msheshera.bsky.social
AAT-TTA TAT-?
In-context learning emerges outside language; a wonderful finding.
See you soon at BabyLM (EMNLP)
LLMs learn from vastly more data than humans ever experience. BabyLM challenges this paradigm by focusing on developmentally plausible data.
We extend this effort to 45 new languages!
We release this pipeline and welcome new contributions!
Website: babylm.github.io/babybabellm/
Paper: arxiv.org/pdf/2510.10159
Here’s the proof! 𝐁𝐚𝐛𝐲𝐁𝐚𝐛𝐞𝐥𝐋𝐌 is the first Multilingual Benchmark of Developmentally Plausible Training Data available for 45 languages to the NLP community 🎉
arxiv.org/abs/2510.10159
🗓️ October 10th, Room 518C
🔹 Invited talks from @sarah-nlp.bsky.social, John Hewitt, @amuuueller.bsky.social, and @kmahowald.bsky.social
🔹 Paper presentations and posters
🔹 Closing roundtable discussion.
Join us in Montréal! @colmweb.org
3x over JPEG/PNG etc.
6x over zlib, gzip etc.
How?
We all know LLMs provide a probability over the data, which is all classical compression needs
(arithmetic coding, see below).
Understanding is compressing, but this time not by the weights themselves.
🤖📈🧠
#AI #compress #data
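A minimal sketch of that idea (my own illustration, not the referenced work): any model that assigns probabilities to the next symbol already fixes a compressed size, namely its total negative log-likelihood in bits, and an arithmetic coder gets within a couple of bits of that bound. A toy bigram character model stands in for the LLM here; all names and numbers are illustrative.

```python
import math
from collections import Counter, defaultdict

def bigram_model(text):
    """Toy stand-in for an LLM: P(next char | previous char), add-one smoothed."""
    counts = defaultdict(Counter)
    for prev, nxt in zip(text, text[1:]):
        counts[prev][nxt] += 1
    vocab_size = len(set(text))
    def prob(prev, nxt):
        c = counts[prev]
        return (c[nxt] + 1) / (sum(c.values()) + vocab_size)
    return prob

def compressed_bits(text, prob):
    """Ideal code length: -sum log2 P(x_i | x_{i-1}).
    Arithmetic coding reaches within ~2 bits of this total."""
    return sum(-math.log2(prob(p, n)) for p, n in zip(text, text[1:]))

data = "the cat sat on the mat, and the cat sat on the hat. " * 20
prob = bigram_model(data)
bits = compressed_bits(data, prob)
print(f"~{bits / 8:.0f} bytes under the model vs {len(data)} raw bytes")
```

Swap in a stronger probability model (e.g. an LLM's next-token distribution) and the byte count drops accordingly; that is the sense in which the ratios above are compression results.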
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
and they fail😆
They show that humans are bad at predicting what is helpful, and so are reward models (all close to chance).
Reward models don't even predict what helps LLMs
RL🤔
🤖📈🧠
#AI #LLM
@iclr_conf
writing
Know anyone who needs tips?
Want a graph checklist?
Know any good tips you wanna add?
The writing guide:
docs.google.com/document/d/1...
Nikhil Kandpal & Colin Raffel calculate a really low bar for how much it would cost to produce LLM training data at $3.80/hour.
Well, several orders of magnitude more than the compute.
Luckily (?), companies don't pay for the data.
🤖📈🧠
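A hypothetical back-of-the-envelope version of that calculation: the corpus size, writing speed, and token-to-word ratio below are my own assumptions for illustration, not Kandpal & Raffel's figures; only the $3.80/hour rate comes from the post.

```python
# Hypothetical back-of-the-envelope: price a pretraining corpus as paid
# human writing at $3.80/hour. Everything except the wage is an assumption.
train_tokens = 15e12        # assumed pretraining corpus size (tokens)
words_per_token = 0.75      # rough token-to-word conversion (assumption)
words_per_hour = 500        # assumed human writing throughput
wage_per_hour = 3.80        # the hourly rate quoted in the post

hours = train_tokens * words_per_token / words_per_hour
cost = hours * wage_per_hour
print(f"~${cost:,.0f}")     # tens of billions of dollars under these assumptions,
                            # far above typical compute budgets
```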
With 10K words, mapped to modern words (when applicable)
There are so many fascinating questions out there
www.arxiv.org/abs/2508.15791
For many reasons, such as domain, the mismatch between current abilities and what post-training unfolds, "emergence", etc.
A big factor is that next-token prediction != choice comparison != accuracy
www.alphaxiv.org/abs/2406.04391
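A toy sketch of that inequality (invented prompt, tokens, and numbers, not taken from the paper): one and the same next-token distribution can look fine under next-token loss, wrong under a pairwise choice comparison, and wrong under exact-match generation accuracy.

```python
import math

# Invented next-token distribution after a prompt like "2+2=".
next_token_probs = {"4": 0.30, "5": 0.35, " four": 0.25, "?": 0.10}

# 1) Next-token prediction: loss on the gold token "4" looks reasonable.
gold_nll = -math.log(next_token_probs["4"])

# 2) Choice comparison: rank the gold answer against one distractor.
choice_correct = next_token_probs["4"] > next_token_probs["5"]

# 3) Generation accuracy: greedy decoding, exact match against "4".
greedy = max(next_token_probs, key=next_token_probs.get)
exact_match = greedy == "4"

print(f"gold NLL={gold_nll:.2f}, choice correct={choice_correct}, exact match={exact_match}")
# The model puts 55% of its mass on correct surface forms ("4" + " four"),
# yet both the pairwise choice and greedy generation fail here:
# the three metrics need not move together.
```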
Still, LLMs appear to consistently follow the values of secular/rational people who strive for self-expression (sounds like me😅)
To show this, they collect and release 200K human-model chats with feedback, across 5 languages and 21 LLMs
🤖📈🧠
Remember: exciting questions drive science; exciting answers follow.
Setting the right goal may make all their SOTA chasing worthwhile.
Make an insightful dataset; lead by evaluation.
🤖📈🧠
Gist: autonomous post-training from conversational signals for LLM bootstrapping ... look ma, no annotations! no hand-holding! 🙌📈🚀
www.youtube.com/watch?v=qW8S...
Ones that trick the reviewers but do not raise our scores?
Proposals?