Kaustubh Sridhar
kaustubhsridhar.bsky.social
Research Scientist at Google DeepMind.

Prev: @UPenn @Amazon @IITBombay

http://kaustubhsridhar.github.io/
Here is a qualitative visualization of deploying REGENT in the unseen atari-pong environment.
December 14, 2024 at 9:50 PM
While REGENT’s design choices are aimed at generalization, its gains are not limited to unseen environments: it even performs better than current generalist agents when deployed within the pre-training environments.
December 14, 2024 at 9:50 PM
In the four unseen ProcGen environments, REGENT also outperforms MTT, the only other generalist agent that can generalize to unseen environments via in-context learning. REGENT does so with an order of magnitude (OOM) less pretraining data and a third of the parameters.
December 14, 2024 at 9:50 PM
In the unseen Metaworld & Atari envs in the Gato setting, REGENT and R&P outperform SOTA generalist agents like JAT (the open-source reproduction of Gato). REGENT outperforms JAT even after JAT is finetuned on data from the unseen envs.
December 14, 2024 at 9:50 PM
We also evaluate on unseen levels and unseen environments in the ProcGen setting.
December 14, 2024 at 9:50 PM
We evaluate REGENT on unseen robotics and game environments in the Gato setting.
December 14, 2024 at 9:50 PM
REGENT has a few key ingredients, including an interpolation between R&P and the transformer. This allows the transformer to more readily generalize to unseen envs, since it is given the easier task of predicting the residual to the R&P action rather than the complete action.
December 14, 2024 at 9:50 PM
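The residual idea above can be sketched in a few lines. This is a minimal illustration for discrete action spaces; the mixing weight `lam` and the exact convex-combination rule are assumptions for illustration, not REGENT's published formulation:

```python
import numpy as np

def regent_action_dist(rnp_probs, transformer_logits, lam=0.5):
    """Mix the R&P retrieval distribution with the transformer's
    prediction, so the transformer only needs to model the residual
    around the retrieved action rather than the full action.

    `lam` and this convex combination are illustrative assumptions.
    """
    # Softmax over the transformer's logits (numerically stable).
    z = transformer_logits - transformer_logits.max()
    t_probs = np.exp(z) / np.exp(z).sum()
    # Convex combination of the two action distributions.
    return lam * rnp_probs + (1.0 - lam) * t_probs
```

With `lam` near 1 the policy defaults to the retrieved action, which is what makes zero-shot behavior in an unseen env reasonable even when the transformer is uncertain.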
Inspired by RAG and the success of a simple retrieval-based 1-nearest-neighbor baseline that we call Retrieve-and-Play (R&P),

REGENT pretrains a transformer policy whose inputs are not just the query state s_t and previous reward r_{t-1}, but also retrieved tuples of (state, previous reward, action).
December 14, 2024 at 9:50 PM
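The R&P baseline itself is tiny. A minimal sketch, assuming flat state vectors and Euclidean distance (the actual state encoding and metric may differ):

```python
import numpy as np

def retrieve_and_play(query_state, demo_states, demo_actions):
    """Retrieve-and-Play (R&P): return the action taken at the demo
    state nearest to the query state (1-nearest neighbor).

    Flat state vectors and Euclidean distance are illustrative
    assumptions.
    """
    dists = np.linalg.norm(demo_states - query_state, axis=1)
    return demo_actions[int(np.argmin(dists))]
```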
REGENT is pretrained on data from many training envs (left). REGENT is then deployed on the held-out envs (right) with a few demos from which it can retrieve states, rewards, and actions to use for in-context learning. **It never finetunes on the demos in the held-out envs.**
December 14, 2024 at 9:50 PM
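The deployment loop above (retrieve from demos, never finetune) can be sketched as follows. The context size `k`, Euclidean distance, and flat state vectors are illustrative assumptions:

```python
import numpy as np

def build_incontext_input(query_state, prev_reward,
                          demo_states, demo_rewards, demo_actions, k=4):
    """Assemble a REGENT-style retrieval-augmented input: the k demo
    (state, previous reward, action) tuples nearest to the query,
    followed by the query (state, previous reward) itself.

    No gradient update on the demos is taken; retrieval alone
    supplies the in-context information.
    """
    dists = np.linalg.norm(demo_states - query_state, axis=1)
    nearest = np.argsort(dists)[:k]
    retrieved = [(demo_states[i], demo_rewards[i], demo_actions[i])
                 for i in nearest]
    return retrieved, (query_state, prev_reward)
```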
Is scaling current agent architectures the most effective way to build generalist agents that can rapidly adapt?

Introducing 👑REGENT👑, a generalist agent that can generalize to unseen robotics tasks and games via retrieval-augmentation and in-context learning.
December 14, 2024 at 9:50 PM
REGENT is far from perfect.

It cannot generalize to new embodiments (unseen MuJoCo envs) or long-horizon envs (like SpaceInvaders & StarGunner), and it cannot generalize to completely new suites (i.e., it requires similarities between the pre-training and unseen envs).

A few failed rollouts:
December 14, 2024 at 7:39 PM
Vancouver is so beautiful!
December 10, 2024 at 7:28 PM