🥈 Spreading science over hype in #ML & #NLP
Proud shareLM💬 Donor
@IBMResearch & @MIT_CSAIL
There are atoms, physics, you are a tiny speck on Earth, etc.
You need a model that abstracts
Presenting it in his humble way, Rich Sutton shares his vision of what AI needs:
general, experiential, discovering its own abstractions, and not bitter🤢
#NeurIPS2025 #NeurIPS
🤖📈🧠
They already interact and use more compute
Yes, some scenarios require learning conflicting things (e.g. personalization)
OK, so shall we start training models that fit our needs, but also share some of this knowledge across them?
LLMs do not learn from explicit corrections
LLMs do not learn from being told the answer
LLMs do not learn from being shown how to solve it
We study Machine Learning; these are opportunities!
A gold mine of research.
⚡️BzZzZz⚡️
"Hey dude,..."
Would you press the button again?
Would an LLM?
Evolving LLMs, diverse open LLMs, and their evaluation are on my mind.
Before I share more, I encourage you to say hi here or in #NeurIPS 🤖📈🧠
Explore theory of mind, game intelligence, and multi-agent LLMs in interactive game environments.
🗓 Sunday, December 7
⏰ 8:00–10:45 AM
📍 San Diego Convention Center, Ballroom 6CF
🤖📈🧠
See you soon at BabyLM (EMNLP)
3x over JPEG/PNG etc.
6x over zlib, gzip etc.
How?
We all know they provide a probability over data, which is all classical compression needs
(arithmetic coding, see below)
Understanding is compressing, but this time not by the weights themselves
🤖📈🧠
#AI #compress #data
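To make the arithmetic-coding link concrete, here is a minimal sketch (mine, not the paper's code): a toy character-level bigram model stands in for the LLM, and the ideal code length is just the sum of -log2 p over the sequence.

```python
# Minimal sketch: any autoregressive model that gives p(next symbol | context)
# implies a code length of -log2 p per symbol under arithmetic coding.
# A toy Laplace-smoothed bigram model stands in for an LLM here; with a real
# LLM the same sum over token log-probs applies.
import math
import zlib
from collections import defaultdict

text = ("understanding is compressing " * 40).encode()

# Fit the tiny bigram model on the data itself (an LLM would be pretrained instead).
counts = defaultdict(lambda: defaultdict(int))
for prev, cur in zip(text, text[1:]):
    counts[prev][cur] += 1

def prob(prev, cur, vocab=256, alpha=1.0):
    # Laplace-smoothed conditional probability p(cur | prev).
    total = sum(counts[prev].values())
    return (counts[prev][cur] + alpha) / (total + alpha * vocab)

bits = 8.0  # send the first byte raw (simplifying assumption)
for prev, cur in zip(text, text[1:]):
    bits += -math.log2(prob(prev, cur))

print(f"raw:   {len(text)} bytes")
print(f"zlib:  {len(zlib.compress(text, 9))} bytes")
print(f"model: {bits / 8:.0f} bytes (ideal arithmetic-coding length)")
```

With a real LLM, prob() would simply be the model's next-token probability; the better the predictions, the shorter the code.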
And these are shifting quite rapidly at a certain point in training
So crosscoders map activations into a sparse representation and decode it back into the activations (classic compress-decompress).
A single crosscoder is then trained on the activations of all pretraining checkpoints, creating a shared space
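A rough PyTorch sketch of the idea (dimensions, loss, and names are my assumptions, not the papers' setup): one shared sparse feature space, with per-checkpoint encoder and decoder weights.

```python
# Illustrative crosscoder sketch: shared sparse features, per-checkpoint weights.
import torch
import torch.nn as nn

class Crosscoder(nn.Module):
    def __init__(self, d_model: int, d_features: int, n_checkpoints: int):
        super().__init__()
        # One encoder and one decoder matrix per pretraining checkpoint.
        self.W_enc = nn.Parameter(torch.randn(n_checkpoints, d_model, d_features) * 0.01)
        self.W_dec = nn.Parameter(torch.randn(n_checkpoints, d_features, d_model) * 0.01)
        self.b_enc = nn.Parameter(torch.zeros(d_features))
        self.b_dec = nn.Parameter(torch.zeros(n_checkpoints, d_model))

    def forward(self, acts):  # acts: (batch, n_checkpoints, d_model)
        # Encode: sum contributions from all checkpoints into one sparse code.
        f = torch.relu(torch.einsum("bcd,cdf->bf", acts, self.W_enc) + self.b_enc)
        # Decode: reconstruct each checkpoint's activations from the shared code.
        recon = torch.einsum("bf,cfd->bcd", f, self.W_dec) + self.b_dec
        return f, recon

model = Crosscoder(d_model=512, d_features=4096, n_checkpoints=8)
acts = torch.randn(32, 8, 512)  # the same inputs run through 8 checkpoints
f, recon = model(acts)
# Reconstruction + L1 sparsity on the shared features (simplified loss).
loss = (recon - acts).pow(2).mean() + 1e-3 * f.abs().mean()
loss.backward()
```

Because the feature code f is shared, you can track when a feature "turns on" across checkpoints, which is what makes the phase transitions visible.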
2 papers find:
There are phase transitions where features emerge and stay throughout learning
🤖📈🧠
alphaxiv.org/pdf/2509.17196
@amuuueller.bsky.social @abosselut.bsky.social
alphaxiv.org/abs/2509.05291
But also that plans, even bad ones, help LLMs' and humans' performance (but slow them down)
arxiv.org/abs/2509.18632
@nbalepur.bsky.social
and they fail😆
They show that humans are bad at predicting what is helpful, and so are reward models (all close to chance).
Reward models don't even predict what helps LLMs
RL🤔
🤖📈🧠
#AI #LLM
@iclr_conf
writing
Know anyone who needs tips?
Want a graph checklist?
Know any good tips you wanna add?
The writing guide:
docs.google.com/document/d/1...
They also foresee that the amount of unpaid labour will continue to grow with the demand for data.
arxiv.org/pdf/2504.12427
Nikhil Kandpal & Colin Raffel calculate a really low bar for how much it would cost to produce LLM training data at $3.80/h
Well, several orders of magnitude more than the compute.
Luckily (?), companies don't pay for the data
🤖📈🧠
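Just to show the flavour of the back-of-the-envelope (every number below except the wage is a placeholder I made up, not a figure from the paper):

```python
# Back-of-envelope only; placeholder numbers, not Kandpal & Raffel's figures.
wage_per_hour = 3.80       # the low-bar hourly wage from the post
training_tokens = 15e12    # assumed pretraining corpus size (placeholder)
tokens_per_word = 1.3      # rough tokens-per-word ratio (placeholder)
words_per_hour = 900       # assumed human writing speed (placeholder)

hours = training_tokens / tokens_per_word / words_per_hour
cost = hours * wage_per_hour
print(f"{hours:.2e} person-hours -> ${cost:.2e} to write the corpus by hand")
```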
With 10K words, mapped to modern words (when applicable)
There are so many fascinating questions out there
www.arxiv.org/abs/2508.15791
As support, the wrong answers are highly correlated with the right answer, so most of the signal comes from the sentence and its form, not from knowledge.
For example, the negative answers can be reranked among themselves and change whether the right answer is picked, or accuracy can ignore a 49-51 confidence.
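A made-up mini example of the 49-51 point: accuracy scores a barely-right and a confidently-right prediction identically, while Brier separates them.

```python
# Made-up probabilities: accuracy is blind to confidence, Brier is not.
def brier(probs, gold):
    # probs: dict answer -> probability; one-hot target on the gold answer.
    return sum((p - (1.0 if a == gold else 0.0)) ** 2 for a, p in probs.items())

barely    = {"A": 0.51, "B": 0.49}
confident = {"A": 0.99, "B": 0.01}
gold = "A"

for name, probs in [("51-49", barely), ("99-1", confident)]:
    acc = 1.0 if max(probs, key=probs.get) == gold else 0.0
    print(name, "accuracy:", acc, "brier:", round(brier(probs, gold), 4))
# Both get accuracy 1.0; only Brier separates them.
```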
🔻(log)probability of the right answer
🔻Probability of the right answer normalized by the probability of the rest of the answers
🔻A metric such as accuracy or Brier
Each step gets us further from next-token prediction.
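A hedged sketch of that cascade with made-up log-probs (no real model involved): each step below discards a bit more of the raw next-token probabilities.

```python
# Illustrative only (made-up numbers): the three levels of MCQ evaluation,
# starting from summed next-token log-probabilities of each candidate answer.
import math

logp = {"Paris": -2.1, "Lyon": -2.3, "Nice": -4.0, "Rome": -5.2}  # made up
gold = "Paris"

# 1) (log-)probability of the right answer -- closest to next-token prediction.
logp_gold = logp[gold]

# 2) Probability of the right answer normalized over the candidate answers.
Z = sum(math.exp(v) for v in logp.values())
p_norm = {a: math.exp(v) / Z for a, v in logp.items()}

# 3) A metric: accuracy (argmax == gold) or Brier over the normalized probs.
accuracy = 1.0 if max(p_norm, key=p_norm.get) == gold else 0.0
brier = sum((p_norm[a] - (1.0 if a == gold else 0.0)) ** 2 for a in p_norm)

print(logp_gold, p_norm[gold], accuracy, brier)
```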