Scott Jeen
@enjeeneer.io
enjeeneer.io
PhD Student at Cambridge University. AI and reinforcement learning.
It's dedicated to the late Barry Sealey CBE and Helen Sealey, whose funding of my earlier postgraduate studies opened the door to a PhD. I'm hugely indebted to them for their kindness and generosity.
September 3, 2025 at 9:01 PM
More detail in the paper, on the project page, or in the repo!

Paper: arxiv.org/abs/2506.15446
Project Page: enjeeneer.io/projects/bfm...
Code: github.com/enjeeneer/bf...

with Tom Bewley and Jon Cullen.
Zero-Shot Reinforcement Learning Under Partial Observability
Recent work has shown that, under certain assumptions, zero-shot reinforcement learning (RL) methods can generalise to any unseen task in an environment after reward-free pre-training. Access to Marko...
arxiv.org
July 31, 2025 at 9:01 PM
We explored different sequence models: Transformers, GRUs, LSTMs, S4D, and S5.

To our surprise, we found GRUs to be far and away the most effective, and Transformers to be disappointingly ineffective.

Why? The combined F^T x B representation seems unstable for all non-GRU methods.
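For context, in the standard forward-backward setup the reward is encoded as a task vector via B, and actions are scored by an inner product with F. A minimal sketch of that combination (shapes and names are illustrative, not the repo's code):

```python
import torch

def task_vector(rewards: torch.Tensor, B: torch.Tensor) -> torch.Tensor:
    # rewards: (n,), B: (n, d) backward embeddings of sampled states.
    # z ~= E_{s~D}[ r(s) * B(s) ]: the reward function summarised as a task vector.
    return (rewards.unsqueeze(-1) * B).mean(dim=0)

def q_values(F: torch.Tensor, z: torch.Tensor) -> torch.Tensor:
    # F: (num_actions, d) forward embeddings F(s, a, z) for candidate actions.
    # Q(s, a) = F(s, a, z)^T z; the policy acts greedily w.r.t. this score.
    # This forward/backward inner product is the quantity that destabilised
    # for every sequence model we tried except GRUs.
    return F @ z
```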
July 31, 2025 at 9:01 PM
We run experiments on modified ExORL environments with different types of partial observability. In particular, we explore partially observed states and partially observed changes in dynamics.

In aggregate, we improve performance across all partially observed settings.
July 31, 2025 at 9:01 PM
We solve both failure modes by replacing BFMs' standard MLPs with sequence models that condition on trajectories of observations and actions.

We call the resulting family of methods Behaviour Foundation Models with Memory.
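Concretely, the memoryless networks take a single (possibly aliased) observation; the memory variant feeds them a summary of the observation-action history instead. A minimal PyTorch sketch of that swap, with all names illustrative rather than taken from the repo:

```python
import torch
import torch.nn as nn

class MemoryEncoder(nn.Module):
    """Summarise a trajectory of (observation, action) pairs with a GRU."""

    def __init__(self, obs_dim: int, act_dim: int, hidden_dim: int = 256):
        super().__init__()
        self.gru = nn.GRU(obs_dim + act_dim, hidden_dim, batch_first=True)

    def forward(self, obs_seq: torch.Tensor, act_seq: torch.Tensor) -> torch.Tensor:
        # obs_seq: (batch, time, obs_dim); act_seq: (batch, time, act_dim).
        x = torch.cat([obs_seq, act_seq], dim=-1)
        _, h_n = self.gru(x)           # final hidden state: (1, batch, hidden_dim)
        return h_n.squeeze(0)          # one summary vector per trajectory

# The forward (and backward) networks then condition on this summary
# rather than on the raw, partially observed, current observation.
```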
July 31, 2025 at 9:01 PM
When Behaviour Foundation Models are fed unreliable observations, rather than states, they fail in two predictable ways.

We call these failure modes *state* misidentification and *task* misidentification.

Each inhibits performance in isolation; together they kill the model.
July 31, 2025 at 9:01 PM
It all feels a bit hacky though, yeh.
January 21, 2025 at 4:31 PM
- It's probs not doing pure policy exploration in the classical RL sense. The prior provided by pre-training should reduce the effective search space hugely. I could imagine that small amounts of exploration on top of the reasoning traces provided by the base model could be enough to get signal.
January 21, 2025 at 4:31 PM
I don't disagree, but a couple of possible explanations:
- Fig 3 could imply that it learns to solve questions that require shorter reasoning chains first, before moving to those that require longer reasoning chains.
January 21, 2025 at 4:31 PM
Thank you for this Jane, it's beautiful and heart-wrenching. I didn't know Felix well, but my few interactions with him always left me awed by his all-round brilliance. My thoughts are with you and everyone who knew him more closely. ❤️
January 3, 2025 at 3:34 PM
NeurIPS revolves around demonstration. This year’s @rl-conference.bsky.social revolved around conversation. I much prefer the latter.
December 17, 2024 at 2:54 PM
My bad for messing up the photo!
December 10, 2024 at 4:57 PM