Paper: arxiv.org/abs/2506.15446
Project Page: enjeeneer.io/projects/bfm...
Code: github.com/enjeeneer/bf...
with Tom Bewley and Jon Cullen.
To our surprise, we found GRUs to be far and away the most effective, and Transformers to be disappointingly ineffective.
Why? The combined F^T x B representation appears unstable when trained with any non-GRU sequence model.
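To make the idea concrete, here is a minimal sketch of the memory component: a GRU that summarises a history of partial observations into a single latent state, which could then stand in for the Markov state fed to the forward network F. The class name, dimensions, and wiring are illustrative assumptions, not the paper's exact architecture.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRUMemory:
    """Minimal GRU cell that encodes an observation history into a latent
    state h. Hypothetical sketch: not the paper's implementation."""

    def __init__(self, obs_dim, hidden_dim, seed=0):
        rng = np.random.default_rng(seed)
        d = obs_dim + hidden_dim
        # One weight matrix per gate: update (z), reset (r), candidate (n).
        self.Wz = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wr = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.Wn = rng.normal(0.0, 0.1, (d, hidden_dim))
        self.hidden_dim = hidden_dim

    def step(self, h, obs):
        x = np.concatenate([obs, h])
        z = sigmoid(x @ self.Wz)                  # how much old state to keep
        r = sigmoid(x @ self.Wr)                  # how much history to reuse
        n = np.tanh(np.concatenate([obs, r * h]) @ self.Wn)  # candidate state
        return (1.0 - z) * n + z * h              # gated interpolation

    def encode(self, observations):
        """Roll the cell over a trajectory of partial observations and
        return the final latent state (a stand-in for the Markov state)."""
        h = np.zeros(self.hidden_dim)
        for obs in observations:
            h = self.step(h, obs)
        return h
```

In use, `encode` would be called on the agent's observation history at each timestep, and the resulting `h` passed wherever F expects a state input.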
In aggregate, we improve performance across all partially observed settings.
We call the resulting family of methods Behaviour Foundation Models with Memory.
We call these failure modes *state* misidentification and *task* misidentification.
Each inhibits performance in isolation; together they kill the model.
- Fig 3 could imply that the model learns to solve questions requiring shorter reasoning chains first, before moving on to those requiring longer ones.