To our surprise, we found GRUs to be far and away the most effective, and Transformers to be disappointingly ineffective.
Why? The combined F^T B representation seems unstable for all non-GRU methods.
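(For anyone new to the notation: in forward-backward BFMs, the occupancy/value structure of a task-conditioned policy is approximated by a dot product between a forward embedding F(s,a,z) and a backward embedding B(s′). A minimal PyTorch sketch of that factorisation is below; the network sizes and names are illustrative, not the paper's code.)

```python
import torch
import torch.nn as nn

Z_DIM = 32  # task-embedding dimension (illustrative)

class ForwardNet(nn.Module):
    """F(s, a, z): forward embedding of a state-action pair under task embedding z."""
    def __init__(self, s_dim: int, a_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(s_dim + a_dim + Z_DIM, 128), nn.ReLU(), nn.Linear(128, Z_DIM)
        )

    def forward(self, s, a, z):
        return self.net(torch.cat([s, a, z], dim=-1))

class BackwardNet(nn.Module):
    """B(s'): backward embedding of a (future) state."""
    def __init__(self, s_dim: int):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(s_dim, 128), nn.ReLU(), nn.Linear(128, Z_DIM))

    def forward(self, s):
        return self.net(s)

def fb_score(F, B, s, a, z, s_next):
    """The factorised representation: score(s, a, z, s') ≈ F(s, a, z)^T B(s')."""
    return (F(s, a, z) * B(s_next)).sum(dim=-1)

# Tiny smoke test with random tensors.
F, B = ForwardNet(s_dim=4, a_dim=2), BackwardNet(s_dim=4)
s, a, z, s_next = torch.randn(4), torch.randn(2), torch.randn(Z_DIM), torch.randn(4)
print(fb_score(F, B, s, a, z, s_next))
```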
In aggregate, we improve performance across all partially observed settings.
We call the resulting family of methods *Behaviour Foundation Models with Memory*.
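(The thread doesn't spell out the architecture, so treat the snippet below as one plausible reading rather than the paper's design: run the observation-action history through a recurrent encoder such as a GRU and feed the resulting belief embedding to the F and B networks in place of the true state. Names and sizes are illustrative.)

```python
import torch
import torch.nn as nn

Z_DIM, HIDDEN = 32, 64  # task-embedding / GRU sizes (illustrative)

class HistoryEncoder(nn.Module):
    """Summarise the observation-action history (o_1, a_1, ..., o_t) into a belief embedding."""
    def __init__(self, o_dim: int, a_dim: int):
        super().__init__()
        self.gru = nn.GRU(o_dim + a_dim, HIDDEN, batch_first=True)

    def forward(self, obs_seq, act_seq):
        # obs_seq: (batch, T, o_dim), act_seq: (batch, T, a_dim)
        out, _ = self.gru(torch.cat([obs_seq, act_seq], dim=-1))
        return out[:, -1]  # last hidden state acts as the belief over the true state

class ForwardHead(nn.Module):
    """F(belief, a, z): the forward network now consumes the belief instead of the state."""
    def __init__(self, a_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(HIDDEN + a_dim + Z_DIM, 128), nn.ReLU(), nn.Linear(128, Z_DIM)
        )

    def forward(self, belief, a, z):
        return self.net(torch.cat([belief, a, z], dim=-1))

# Usage sketch: a batch of one trajectory with 10 timesteps.
encoder, F = HistoryEncoder(o_dim=8, a_dim=2), ForwardHead(a_dim=2)
belief = encoder(torch.randn(1, 10, 8), torch.randn(1, 10, 2))
f = F(belief, torch.randn(1, 2), torch.randn(1, Z_DIM))
```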
We call these failure modes *state* misidentification and *task* misidentification.
Each inhibits performance in isolation; together they kill the model.
Train Behaviour Foundation Models (BFMs) on expressive (s,a,s′) data and you'll get the optimal policy for *any* reward function in an env.
But what if, instead of states, you have observations, as is almost always the case in practice?
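(For context, in the fully observed setting the "any reward function" step of the standard forward-backward recipe looks roughly like this hedged sketch; it reuses the F and B networks from the earlier snippet, and the helper names are mine, not the paper's.)

```python
import torch

def infer_task_embedding(B, states, rewards):
    """z_r ≈ E_{s ~ data}[ r(s) * B(s) ]: a reward function collapses to a single embedding."""
    with torch.no_grad():
        return (rewards.unsqueeze(-1) * B(states)).mean(dim=0)

def act_greedily(F, s, z, candidate_actions):
    """Pick the action maximising F(s, a, z)^T z — the policy for the inferred task."""
    with torch.no_grad():
        scores = torch.stack([F(s, a, z) @ z for a in candidate_actions])
    return candidate_actions[int(scores.argmax())]

# With ForwardNet / BackwardNet from the earlier sketch (s_dim=4, a_dim=2):
#   z = infer_task_embedding(B, torch.randn(100, 4), torch.randn(100))
#   a = act_greedily(F, torch.randn(4), z, [torch.randn(2) for _ in range(8)])
```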
Excited to share our new @rl-conference.bsky.social paper! 🧵
Poster #6008
West Ballroom A-D
Friday 13th Dec 4:30-7:30pm
More details: neurips.cc/virtual/2024...