Animesh Garg
@animesh-garg.bsky.social
Foundation Models for Generalizable Autonomy.

Assistant Professor in AI Robotics, Georgia Tech

prev Berkeley, Stanford, Toronto, Nvidia
On our real-world multitask IL benchmark, Adapt3R achieves a strong in-distribution success rate and sees the smallest performance loss by far under a dramatically new viewpoint.
July 25, 2025 at 4:56 PM
Here we rotate the scene camera by θ radians about the EE starting position, and see that Adapt3R consistently achieves the strongest performance. Notably, it maintains >80% success rate on LIBERO, and has the only nonzero MimicGen success rate when θ ≥ 0.6.
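The evaluation perturbation above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the pivot axis (world z through the EE start position) and the pose conventions are assumptions.

```python
import numpy as np

def rotate_camera_about_point(cam_pose, center, theta):
    """Rotate a 4x4 camera-to-world pose by `theta` radians about the
    world z-axis, pivoting around `center` (e.g. the EE start position)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = center - R @ center  # pivot about `center`, not the origin
    return T @ cam_pose

# Example: rotate a camera by theta = 0.6 rad about an assumed EE start position
cam = np.eye(4)
cam[:3, 3] = [1.0, 0.0, 0.5]
center = np.array([0.2, 0.0, 0.1])
new_cam = rotate_camera_about_point(cam, center, 0.6)
```

The camera keeps its distance to the pivot point and its height, so the scene is seen from a genuinely new angle rather than a new range.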
July 25, 2025 at 4:56 PM
When we evaluate policies trained with Adapt3R on the multitask LIBERO benchmark or the high-precision tasks from the MimicGen paper, we see that they are just as performant as their RGB counterparts.
July 25, 2025 at 4:55 PM
Adapt3R unprojects 2D features into a point cloud, transforms them into the end effector’s coordinate frame, and uses attention pooling to condense them into a single conditioning vector for IL. Notice that Adapt3R attends to the same points before and after the camera change!
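The three steps above can be sketched as a minimal pipeline. All shapes, the placeholder transforms, and the single-query attention pooling are illustrative assumptions, not Adapt3R's actual implementation:

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def attention_pool(feats, query):
    """Condense per-point features (N, D) into one vector via softmax attention."""
    scores = feats @ query / np.sqrt(feats.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ feats

# 1) per-pixel features from a 2D foundation model (random placeholder here)
H, W, D = 8, 8, 16
feats2d = np.random.randn(H * W, D)
# 2) unproject pixels to 3D 	and transform into the end effector's frame
depth = np.ones((H, W))
K = np.array([[50.0, 0, W / 2], [0, 50.0, H / 2], [0, 0, 1]])
pts_cam = unproject(depth, K)
T_ee_cam = np.eye(4)  # camera -> EE transform (identity placeholder)
pts_ee = pts_cam @ T_ee_cam[:3, :3].T + T_ee_cam[:3, 3]
# 3) attention-pool point features into one conditioning vector for the IL policy
query = np.random.randn(D)
cond = attention_pool(feats2d, query)  # shape (D,)
```

Because the points are expressed in the EE frame before pooling, a moved camera changes which pixels see each point but not where the point lands in the canonical frame.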
July 25, 2025 at 4:55 PM
Learning 3D representations is hard without 3D data
💡 The key idea is to use a 2D foundation model to extract semantic features, and use 3D information to localize those features in a canonical 3D space without extracting any semantic information from the 3D data.
July 25, 2025 at 4:55 PM
Albert Wilcox has been working on using canonical 3D representations instead.
Yet, naive 3D alternatives don't work since most of the data is not easily featurized.

Adapt3R is a 3D backbone that works with your favorite robot learning method and generalizes to unseen embodiments & camera viewpoints!
July 25, 2025 at 4:54 PM
Imitation learning frameworks often use 2D image inputs,
but 2D inputs limit generalization, even to new camera poses.

This has been an ongoing challenge, especially for humanoids, since the camera pose is not steady and need not match the training data.

We build Adapt3R to solve this problem!
Read on for more
July 25, 2025 at 4:54 PM
Unitree R1 is a new $6K humanoid with an onboard LLM/VLM!

Price & complexity are no longer a barrier for ML folks to enter robot learning.

Stable, low-cost developer platforms will accelerate humanoid development with new ideas coming from everywhere.

Unitree did it for quadrupeds & now bipeds!
July 25, 2025 at 1:28 PM
This summer, I've picked up my long-form writing again on my blog at praxiscurrents.substack.com.

I'm starting with a deep dive into the importance of data-driven methods in robotics: The Age of Empiricism in Physical AI.

Feel free to visit and subscribe for regular updates.
June 25, 2025 at 5:12 PM
Come chat with us at ICLR 2025 today 3-5pm in Hall 3 + Hall 2B

#182 EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
egosim.github.io/EgoSim/
Wei Yu

#401 PWM: Policy Learning with Multi-task World Models
imgeorgiev.com/pwm/
Varun Giridhar ncklashansen.bsky.social
April 25, 2025 at 1:25 AM
How well does AnyPlace perform?
🏆 Simulation results: Outperforms baselines in
✔ Success rate
✔ Coverage of placement modes
✔ Fine-placement precision
📌 Real-world results: Our method transfers directly from synthetic to real-world tasks, succeeding where others struggle!
February 24, 2025 at 10:11 PM
To generalize across objects & placements, we generate a fully synthetic dataset with:
✅ Randomly generated objects in Blender
✅ Diverse placement configurations (stacking, insertion, hanging) in IsaacSim
This allows us to train our model without real-world data collection! 🚀
February 24, 2025 at 10:11 PM
Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently.
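The coarse-to-fine split above can be sketched as below. Both stages are hypothetical stand-ins (`query_vlm_for_region` returns a hard-coded region, and the "pose predictor" is just a centroid); only the structure, not the models, reflects the method:

```python
def query_vlm_for_region(scene_points, instruction):
    # Stand-in: a VLM would propose a rough 3D region of interest.
    return {"center": (0.4, 0.1, 0.2), "radius": 0.1}

def crop_points(points, region):
    """Keep only points inside the VLM-proposed region."""
    c, r = region["center"], region["radius"]
    return [p for p in points
            if sum((pi - ci) ** 2 for pi, ci in zip(p, c)) <= r ** 2]

def predict_local_pose(cropped):
    # Stand-in for the low-level placement-pose-prediction model:
    # here simply the centroid of the cropped region.
    n = len(cropped)
    return tuple(sum(p[i] for p in cropped) / n for i in range(3))

scene_points = [(0.4, 0.1, 0.2), (0.41, 0.12, 0.19), (0.9, 0.9, 0.9)]
region = query_vlm_for_region(scene_points, "hang the mug on the rack")
local = crop_points(scene_points, region)
pose = predict_local_pose(local)  # precise pose predicted from the crop only
```

Cropping first means the pose predictor only ever sees the relevant local geometry, which is what makes covering diverse placement modes tractable.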
February 24, 2025 at 10:11 PM
How can robots reliably place objects in diverse real-world tasks?

🤖🔍 Placement is hard! Objects vary in shape & placement modes (such as stacking, hanging, insertion)

AnyPlace predicts placement poses of unseen objects in the real world using only synthetic training data!

Read on👇
February 24, 2025 at 10:11 PM
Reality is stranger than fiction!

Satire is limited by the imagination of well-meaning creators; real stupidity knows no bounds
February 16, 2025 at 6:24 AM
If oligarchs, far removed from reality, make laws unilaterally and without regard for those they affect, then calling them “laws” is laughable

This is reminiscent of colonial history, when anything could be outlawed without discussion or opposition!
February 15, 2025 at 7:51 PM