Animesh Garg
@animesh-garg.bsky.social
Foundation Models for Generalizable Autonomy.

Assistant Professor in AI Robotics, Georgia Tech

prev Berkeley, Stanford, Toronto, Nvidia
On our real-world multitask IL benchmark, Adapt3R achieves a strong in-distribution success rate and sees the smallest performance loss by far under a dramatically new viewpoint.
July 25, 2025 at 4:56 PM
Here we rotate the scene camera by θ radians about the EE starting position, and see that Adapt3R consistently achieves the strongest performance. Notably, it maintains >80% success rate on LIBERO, and has the only nonzero MimicGen success rate when θ ≥ 0.6.
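The evaluation perturbation above can be sketched as follows. This is an illustrative reconstruction, not the paper's code: the pivot axis (world z through the EE start position) and the pose conventions are assumptions.

```python
import numpy as np

def rotate_camera_about_point(cam_pose, center, theta):
    """Rotate a 4x4 camera-to-world pose by `theta` radians about the
    world z-axis, pivoting around `center` (e.g. the EE start position)."""
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0.0],
                  [s,  c, 0.0],
                  [0.0, 0.0, 1.0]])
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = center - R @ center  # pivot about `center`, not the origin
    return T @ cam_pose

# Example: rotate a camera by theta = 0.6 rad about an assumed EE start position
cam = np.eye(4)
cam[:3, 3] = [1.0, 0.0, 0.5]
center = np.array([0.2, 0.0, 0.1])
new_cam = rotate_camera_about_point(cam, center, 0.6)
```

The camera keeps its distance to the pivot point and its height, so the scene is seen from a genuinely new angle rather than a new range.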
July 25, 2025 at 4:56 PM
When we evaluate policies trained with Adapt3R on the multitask LIBERO benchmark or the high-precision tasks from the MimicGen paper, we see that they are just as performant as their RGB counterparts.
July 25, 2025 at 4:55 PM
Adapt3R unprojects 2D features into a point cloud, transforms them into the end effector’s coordinate frame, and uses attention pooling to condense them into a single conditioning vector for IL. Notice that Adapt3R attends to the same points before and after the camera change!
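The three steps above can be sketched as a minimal pipeline. All shapes, the placeholder transforms, and the single-query attention pooling are illustrative assumptions, not Adapt3R's actual implementation:

```python
import numpy as np

def unproject(depth, K):
    """Lift a depth map (H, W) to camera-frame 3D points (H*W, 3)."""
    H, W = depth.shape
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    z = depth.reshape(-1)
    x = (u.reshape(-1) - K[0, 2]) * z / K[0, 0]
    y = (v.reshape(-1) - K[1, 2]) * z / K[1, 1]
    return np.stack([x, y, z], axis=1)

def attention_pool(feats, query):
    """Condense per-point features (N, D) into one vector via softmax attention."""
    scores = feats @ query / np.sqrt(feats.shape[1])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return w @ feats

# 1) per-pixel features from a 2D foundation model (random placeholder here)
H, W, D = 8, 8, 16
feats2d = np.random.randn(H * W, D)
# 2) unproject pixels to 3D 	and transform into the end effector's frame
depth = np.ones((H, W))
K = np.array([[50.0, 0, W / 2], [0, 50.0, H / 2], [0, 0, 1]])
pts_cam = unproject(depth, K)
T_ee_cam = np.eye(4)  # camera -> EE transform (identity placeholder)
pts_ee = pts_cam @ T_ee_cam[:3, :3].T + T_ee_cam[:3, 3]
# 3) attention-pool point features into one conditioning vector for the IL policy
query = np.random.randn(D)
cond = attention_pool(feats2d, query)  # shape (D,)
```

Because the points are expressed in the EE frame before pooling, a moved camera changes which pixels see each point but not where the point lands in the canonical frame.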
July 25, 2025 at 4:55 PM
Learning 3D representations is hard without 3D data
💡 The key idea is to use a 2D foundation model to extract semantic features, and use 3D information to localize those features in a canonical 3D space without extracting any semantic information from the 3D data.
July 25, 2025 at 4:55 PM
Albert Wilcox has been working on using canonical 3D representations instead.
Yet, naive 3D alternatives don't work since most of the data is not easily featurized.

Adapt3R is a 3D backbone that works with your favorite robot learning method and generalizes to unseen embodiments & camera viewpoints!
July 25, 2025 at 4:54 PM
Imitation learning frameworks often use 2D image inputs,
but 2D inputs limit generalization, even to new camera poses.

This has been an ongoing challenge, especially for humanoids, since the camera pose is not steady and need not match the training data.

We build Adapt3R to solve this problem!
Read on for more
July 25, 2025 at 4:54 PM
Unitree R1 is a new $6K humanoid with an onboard LLM/VLM!

Price & complexity are no longer a barrier for ML folks to enter robot learning.

Stable, low-cost developer platforms will accelerate humanoid development with new ideas coming from everywhere.

Unitree did it for quadrupeds & now bipeds!
July 25, 2025 at 1:28 PM
This summer, I've picked up my long-form writing again on my blog at praxiscurrents.substack.com.

I'm starting with a deep dive into the importance of data-driven methods in robotics: The Age of Empiricism in Physical AI.

Feel free to visit and subscribe for regular updates.
June 25, 2025 at 5:12 PM
Come chat with us at ICLR 2025 today 3-5pm in Hall 3 + Hall 2B

#182 EgoSim: Egocentric Exploration in Virtual Worlds with Multi-modal Conditioning
egosim.github.io/EgoSim/
Wei Yu

#401 PWM: Policy Learning with Multi-task World Models
imgeorgiev.com/pwm/
Varun Giridhar ncklashansen.bsky.social
April 25, 2025 at 1:25 AM
How well does AnyPlace perform?
🏆 Simulation results: Outperforms baselines in
✔ Success rate
✔ Coverage of placement modes
✔ Fine-placement precision
📌 Real-world results: Our method transfers directly from synthetic to real-world tasks, succeeding where others struggle!
February 24, 2025 at 10:11 PM
To generalize across objects & placements, we generate a fully synthetic dataset with:
✅ Randomly generated objects in Blender
✅ Diverse placement configurations (stacking, insertion, hanging) in IsaacSim
This allows us to train our model without real-world data collection! 🚀
February 24, 2025 at 10:11 PM
Our key insight is that by leveraging a Vision-Language Model (VLM) to identify rough placement locations, we focus only on the relevant regions for local placement, which enables us to train the low-level placement-pose-prediction model to capture diverse placements efficiently.
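The coarse-to-fine split above can be sketched as below. Both stages are hypothetical stand-ins (`query_vlm_for_region` returns a hard-coded region, and the "pose predictor" is just a centroid); only the structure, not the models, reflects the method:

```python
def query_vlm_for_region(scene_points, instruction):
    # Stand-in: a VLM would propose a rough 3D region of interest.
    return {"center": (0.4, 0.1, 0.2), "radius": 0.1}

def crop_points(points, region):
    """Keep only points inside the VLM-proposed region."""
    c, r = region["center"], region["radius"]
    return [p for p in points
            if sum((pi - ci) ** 2 for pi, ci in zip(p, c)) <= r ** 2]

def predict_local_pose(cropped):
    # Stand-in for the low-level placement-pose-prediction model:
    # here simply the centroid of the cropped region.
    n = len(cropped)
    return tuple(sum(p[i] for p in cropped) / n for i in range(3))

scene_points = [(0.4, 0.1, 0.2), (0.41, 0.12, 0.19), (0.9, 0.9, 0.9)]
region = query_vlm_for_region(scene_points, "hang the mug on the rack")
local = crop_points(scene_points, region)
pose = predict_local_pose(local)  # precise pose predicted from the crop only
```

Cropping first means the pose predictor only ever sees the relevant local geometry, which is what makes covering diverse placement modes tractable.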
February 24, 2025 at 10:11 PM
How can robots reliably place objects in diverse real-world tasks?

🤖🔍 Placement is hard! Objects vary in shape & placement modes (such as stacking, hanging, insertion)

AnyPlace predicts placement poses of unseen objects in the real world using only synthetic training data!

Read on👇
February 24, 2025 at 10:11 PM
Reality is stranger than fiction!

Satire is limited by the imagination of well-meaning creators; real stupidity knows no bounds
February 16, 2025 at 6:24 AM
If oligarchs, far removed from reality, make laws unilaterally and without regard for those they affect, then calling them “laws” is laughable

This is reminiscent of colonial history, when anything could be outlawed without discussion or opposition!
February 15, 2025 at 7:51 PM