We have many more results, ablations, code, datasets, models, and the paper at our website: bit.ly/regent-research
The arXiv link: arxiv.org/abs/2412.04759
REGENT cannot generalize to new embodiments (unseen MuJoCo envs) or long-horizon envs (like SpaceInvaders & StarGunner), and it cannot generalize to completely new suites, i.e., it requires some similarity between the pre-training and unseen envs.
A few failed rollouts:
For context, the Multi-Game Decision Transformer uses 1M states to finetune on new Atari envs. REGENT generalizes via RAG from only ~10k states, and REGENT Finetuned improves further over REGENT.
REGENT retrieves the 19 closest states, places the corresponding (state, previous reward, action) tuples in the context alongside the query (s_t, r_{t-1}), and acts via in-context learning in unseen envs.
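A minimal sketch of that retrieval step, with hypothetical names; the distance metric here (plain L2 over flattened states) is an assumption, not necessarily the one used in the paper:

```python
import numpy as np

def retrieve_context(query_state, demo_states, demo_prev_rewards, demo_actions, k=19):
    """Return the k nearest (state, previous reward, action) tuples from the
    small demonstration set available in the unseen env."""
    # Distance from the query state to every demonstration state.
    # (L2 over flattened states is an assumption; the paper may use another metric.)
    dists = np.linalg.norm(demo_states - query_state, axis=-1)
    # Indices of the k closest demonstration states.
    nearest = np.argsort(dists)[:k]
    return [(demo_states[i], demo_prev_rewards[i], demo_actions[i]) for i in nearest]
```

These retrieved tuples, together with the query (s_t, r_{t-1}), are what the policy conditions on to pick a_t.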
REGENT pretrains a transformer policy whose inputs are not just the query state s_t and previous reward r_{t-1}, but also the retrieved (state, previous reward, action) tuples.
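A rough sketch of how that input sequence might be assembled, assuming a simple (state, reward, action) interleaving; the ordering follows the description above, but the names and tokenization are placeholders, not the paper's code:

```python
def build_policy_input(retrieved, query_state, prev_reward):
    """Assemble the transformer input: retrieved (state, previous reward,
    action) tuples first, then the query (state, previous reward)."""
    sequence = []
    for s, r, a in retrieved:
        sequence += [("state", s), ("reward", r), ("action", a)]
    # The query comes last; the policy outputs the action for this state.
    sequence += [("state", query_state), ("reward", prev_reward)]
    return sequence
```

During pretraining, the policy learns to predict the action for the query state from this sequence (exact training loss details are in the paper).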