Kaustubh Sridhar
kaustubhsridhar.bsky.social
Research Scientist at Google DeepMind.

Prev: @UPenn @Amazon @IITBombay

http://kaustubhsridhar.github.io/
Here is a qualitative visualization of deploying REGENT in the unseen atari-pong environment.
December 14, 2024 at 9:50 PM
While REGENT’s design choices are aimed at generalization, its gains are not limited to unseen environments: it even performs better than current generalist agents when deployed within the pre-training environments.
December 14, 2024 at 9:50 PM
In the four unseen ProcGen environments, REGENT also outperforms MTT, the only other generalist agent that can generalize to unseen environments via in-context learning. REGENT does so with an order of magnitude (OOM) less pretraining data and a third of the parameters.
December 14, 2024 at 9:50 PM
In the unseen Metaworld & Atari envs in the Gato setting, REGENT and R&P outperform SOTA generalist agents like JAT (the open-source reproduction of Gato). REGENT outperforms JAT even after JAT is finetuned on data from the unseen envs.
December 14, 2024 at 9:50 PM
We also evaluate on unseen levels and unseen environments in the ProcGen setting.
December 14, 2024 at 9:50 PM
We evaluate REGENT on unseen robotics and game environments in the Gato setting.
December 14, 2024 at 9:50 PM
REGENT has a few key ingredients, including an interpolation between R&P and the transformer. This allows the transformer to more readily generalize to unseen envs, since it is given the easier task of predicting the residual to the R&P action rather than the complete action.
December 14, 2024 at 9:50 PM
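The residual idea above can be sketched in a few lines. This is a minimal illustration for discrete action spaces; the mixing weight `lam` and the exact convex-combination rule are assumptions for illustration, not REGENT's published formulation:

```python
import numpy as np

def regent_action_dist(rnp_probs, transformer_logits, lam=0.5):
    """Mix the R&P retrieval distribution with the transformer's
    prediction, so the transformer only needs to model the residual
    around the retrieved action rather than the full action.

    `lam` and this convex combination are illustrative assumptions.
    """
    # Softmax over the transformer's logits (numerically stable).
    z = transformer_logits - transformer_logits.max()
    t_probs = np.exp(z) / np.exp(z).sum()
    # Convex combination of the two action distributions.
    return lam * rnp_probs + (1.0 - lam) * t_probs
```

With `lam` near 1 the policy defaults to the retrieved action, which is what makes zero-shot behavior in an unseen env reasonable even when the transformer is uncertain.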
Inspired by RAG and the success of a simple retrieval-based 1-nearest-neighbor baseline that we call Retrieve-and-Play (R&P),

REGENT pretrains a transformer policy whose inputs are not just the query state s_t and previous reward r_{t-1}, but also retrieved tuples of (state, previous reward, action).
December 14, 2024 at 9:50 PM
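The R&P baseline itself is tiny. A minimal sketch, assuming flat state vectors and Euclidean distance (the actual state encoding and metric may differ):

```python
import numpy as np

def retrieve_and_play(query_state, demo_states, demo_actions):
    """Retrieve-and-Play (R&P): return the action taken at the demo
    state nearest to the query state (1-nearest neighbor).

    Flat state vectors and Euclidean distance are illustrative
    assumptions.
    """
    dists = np.linalg.norm(demo_states - query_state, axis=1)
    return demo_actions[int(np.argmin(dists))]
```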
REGENT is pretrained on data from many training envs (left). REGENT is then deployed on the held-out envs (right) with a few demos from which it can retrieve states, rewards, and actions to use for in-context learning. **It never finetunes on the demos in the held-out envs.**
December 14, 2024 at 9:50 PM
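The deployment loop above (retrieve from demos, never finetune) can be sketched as follows. The context size `k`, Euclidean distance, and flat state vectors are illustrative assumptions:

```python
import numpy as np

def build_incontext_input(query_state, prev_reward,
                          demo_states, demo_rewards, demo_actions, k=4):
    """Assemble a REGENT-style retrieval-augmented input: the k demo
    (state, previous reward, action) tuples nearest to the query,
    followed by the query (state, previous reward) itself.

    No gradient update on the demos is taken; retrieval alone
    supplies the in-context information.
    """
    dists = np.linalg.norm(demo_states - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    retrieved = [(demo_states[i], demo_rewards[i], demo_actions[i])
                 for i in nearest]
    return retrieved, (query_state, prev_reward)
```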
Is scaling current agent architectures the most effective way to build generalist agents that can rapidly adapt?

Introducing 👑REGENT👑, a generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning.
December 14, 2024 at 9:50 PM
REGENT is far from perfect.

It cannot generalize to new embodiments (unseen MuJoCo envs) or long-horizon envs (like SpaceInvaders & StarGunner), and it cannot generalize to completely new suites (i.e., it requires similarities between the pre-training and unseen envs).

A few failed rollouts:
December 14, 2024 at 7:39 PM
Vancouver is so beautiful!
December 10, 2024 at 7:28 PM