Siddhant Haldar
@haldarsiddhant.bsky.social
Excited about generalizing AI | PhD student @NYU
Pinned
The most frustrating part of imitation learning is collecting huge amounts of teleop data. But why teleop robots when robots can learn by watching us?

Introducing Point Policy, a novel framework that enables robots to learn from human videos without any teleop, sim2real, or RL.
The robot behaviors shown below are trained without any teleop, sim2real, GenAI, or motion planning. Simply show the robot a few examples of doing the task yourself, and our new method, called Point Policy, spits out a robot-compatible policy!
February 28, 2025 at 11:28 PM
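To make the idea above concrete, here is a minimal Python sketch of the kind of pipeline the post describes: track points on the human hand and objects through each demonstration video, retarget the hand points into robot actions, and behavior-clone a policy on the resulting pairs. The `tracker` and `retarget` components are hypothetical placeholders for off-the-shelf models, not the released Point Policy implementation.

```python
# A hedged sketch of learning from human video via tracked points.
# `tracker` and `retarget` are assumed components, not the released code.

import torch
import torch.nn as nn

class PointPolicyNet(nn.Module):
    """Map a set of tracked 3D points to a robot action."""
    def __init__(self, num_points: int, point_dim: int = 3, action_dim: int = 7):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(num_points * point_dim, 256), nn.ReLU(),
            nn.Linear(256, 256), nn.ReLU(),
            nn.Linear(256, action_dim),  # e.g. end-effector pose + gripper
        )

    def forward(self, points: torch.Tensor) -> torch.Tensor:
        # points: (batch, num_points, point_dim)
        return self.net(points.flatten(start_dim=1))

def train_from_human_videos(videos, tracker, retarget, num_points, epochs=50):
    """Behavior cloning on (tracked points -> retargeted action) pairs.

    `tracker` follows hand/object points through each human video;
    `retarget` converts human-hand points into robot-frame actions.
    """
    pairs = []
    for video in videos:
        tracks = tracker(video)     # (T, num_points, 3) point trajectories
        actions = retarget(tracks)  # (T, action_dim) robot-frame actions
        pairs += list(zip(tracks, actions))

    policy = PointPolicyNet(num_points)
    opt = torch.optim.Adam(policy.parameters(), lr=1e-4)
    for _ in range(epochs):
        for points, action in pairs:
            loss = nn.functional.mse_loss(
                policy(points.unsqueeze(0)), action.unsqueeze(0))
            opt.zero_grad(); loss.backward(); opt.step()
    return policy
```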
Reposted by Siddhant Haldar
We just released AnySense, an iPhone app for effortless data acquisition and streaming for robotics. We leverage Apple’s development frameworks to record and stream:

1. RGBD + Pose data
2. Audio from the mic or custom contact microphones
3. Seamless Bluetooth integration for external sensors
February 26, 2025 at 3:14 PM
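The post does not specify a wire format, so the receiver below is purely illustrative: it assumes length-prefixed JSON frames over TCP with `rgb`, `depth`, and `pose` keys. Consult the AnySense repository for the actual streaming protocol.

```python
# A hypothetical receiver for an RGBD + pose stream from a phone on the
# local network. The length-prefixed JSON wire format and the frame keys
# are assumptions for illustration, not AnySense's actual protocol.

import json
import socket
import struct

def receive_frames(host: str, port: int = 8000):
    """Yield decoded frames from a length-prefixed JSON stream."""
    with socket.create_connection((host, port)) as sock:
        buf = b""

        def read(n: int) -> bytes:
            nonlocal buf
            while len(buf) < n:
                chunk = sock.recv(4096)
                if not chunk:
                    raise ConnectionError("stream closed")
                buf += chunk
            out, buf = buf[:n], buf[n:]
            return out

        while True:
            (length,) = struct.unpack(">I", read(4))  # 4-byte length header
            yield json.loads(read(length))  # e.g. {"rgb":..., "depth":..., "pose":...}

for frame in receive_frames("192.168.1.42"):
    print(frame["pose"])  # camera pose; key name is an assumption
```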
Reposted by Siddhant Haldar
Can we extend the power of world models beyond just online model-based learning? Absolutely!

We believe the true potential of world models lies in enabling agents to reason at test time.
Introducing DINO-WM: World Models on Pre-trained Visual Features for Zero-shot Planning.
January 31, 2025 at 7:24 PM
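A minimal sketch of what test-time reasoning with such a world model can look like: encode the current observation and a goal image into pre-trained visual features, roll candidate action sequences through the learned latent dynamics, and execute the first action of the best-scoring sequence. This uses simple random-shooting MPC; DINO-WM itself may use a more refined planner such as CEM. `encoder` and `dynamics` are assumed pre-trained modules.

```python
# A minimal sketch of zero-shot planning in feature space with a learned
# latent world model. `encoder` and `dynamics` are assumed pre-trained.

import torch

def plan(encoder, dynamics, obs, goal, horizon=10, samples=256, action_dim=2):
    """Random-shooting MPC: return the first action of the lowest-cost
    candidate action sequence."""
    with torch.no_grad():
        z = encoder(obs).expand(samples, -1)    # (samples, feat_dim)
        z_goal = encoder(goal)                  # (1, feat_dim)
        actions = torch.randn(samples, horizon, action_dim)

        for t in range(horizon):                # roll out learned dynamics
            z = dynamics(z, actions[:, t])

        cost = ((z - z_goal) ** 2).sum(dim=-1)  # distance to goal features
        best = cost.argmin()
    return actions[best, 0]                     # MPC: execute first action only
```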
Reposted by Siddhant Haldar
BAKU is fully open source and surprisingly effective. We found it easily adaptable to a host of visuotactile tasks at visuoskin.github.io
December 10, 2024 at 6:23 PM
I will be presenting BAKU at the #NeurIPS2024 poster session on Thursday, December 12, from 11 a.m. to 2 p.m. PST at East Exhibit Hall A-C #4206!

Do drop in to chat about efficient robot policy architectures as well as some of the more recent work using BAKU.
Modern policy architectures are unnecessarily complex. In our #NeurIPS2024 project called BAKU, we focus on what really matters for good policy learning.

BAKU is modular, language-conditioned, compatible with multiple sensor streams & action multi-modality, and importantly fully open-source!
December 11, 2024 at 3:42 PM
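For illustration, here is a hedged sketch of the kind of modular, language-conditioned architecture the BAKU post describes: one encoder per sensor stream, tokens fused by a shared transformer trunk, and a swappable action head. All dimensions and module choices below are illustrative assumptions, not BAKU's released configuration.

```python
# A hedged sketch of a modular, language-conditioned policy. Encoders,
# dimensions, and the action head are illustrative, not BAKU's actual code.

import torch
import torch.nn as nn

class ModularPolicy(nn.Module):
    def __init__(self, dim=256, action_dim=7, n_layers=4):
        super().__init__()
        # One encoder per input modality; streams can be added or removed.
        self.encoders = nn.ModuleDict({
            "rgb": nn.Sequential(nn.Flatten(), nn.LazyLinear(dim)),
            "proprio": nn.LazyLinear(dim),
            "lang": nn.LazyLinear(dim),  # pre-computed language embedding
        })
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8,
                                           batch_first=True)
        self.trunk = nn.TransformerEncoder(layer, num_layers=n_layers)
        # Swappable head; a multi-modal head could replace this one.
        self.action_head = nn.Linear(dim, action_dim)

    def forward(self, inputs: dict) -> torch.Tensor:
        # One token per modality, fused by the shared trunk.
        tokens = torch.stack(
            [self.encoders[k](v) for k, v in inputs.items()], dim=1
        )                                      # (batch, n_modalities, dim)
        fused = self.trunk(tokens)
        return self.action_head(fused[:, 0])   # read action off first token
```

Because each modality sits behind its own encoder and the head is a separate module, sensor streams can be swapped and the deterministic head replaced with a multi-modal one (e.g., a mixture of Gaussians) without touching the trunk.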
Reposted by Siddhant Haldar
P3-PO is a great example of how simple human priors can enable significantly better generalization in robot policies.
New paper! We show that by using a keypoint-based image representation, robot policies become robust to different object types and background changes.

We call this method Prescriptive Point Priors for robot Policies, or P3-PO for short. Full project here: point-priors.github.io
December 10, 2024 at 8:48 PM
Turns out that replacing images with keypoint-based representations enables improved generalization across spatial positions, orientations, and novel object instances! We just released P3-PO, a method for learning generalizable policies with minimal data. 🚀
New paper! We show that by using a keypoint-based image representation, robot policies become robust to different object types and background changes.

We call this method Prescriptive Point Priors for robot Policies, or P3-PO for short. Full project here: point-priors.github.io
December 11, 2024 at 8:03 AM
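A minimal sketch of the keypoint-based observation idea described above: annotate "prescriptive" points once on a reference image, locate them in each new scene with a semantic correspondence model, then track them during execution and feed only their coordinates to the policy. `find_correspondences` and `track` stand in for off-the-shelf models and are assumptions, not the released P3-PO code.

```python
# A hedged sketch of a keypoint-based observation pipeline. The
# correspondence and tracking models are assumed off-the-shelf components.

import numpy as np

def keypoint_observation(reference_image, annotated_points, current_frame,
                         find_correspondences, track):
    """Build the policy observation: 2D locations of the prescriptive
    points in the current frame, instead of raw pixels."""
    # Locate the human-annotated points in the new scene. Semantic
    # correspondence (rather than pixel matching) is what lets the same
    # points transfer to novel object instances.
    init_points = find_correspondences(
        reference_image, annotated_points, current_frame)
    # Follow the points frame to frame during the rollout.
    points = track(current_frame, init_points)   # (num_points, 2)
    return np.asarray(points).flatten()          # compact, image-free state
```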
Reposted by Siddhant Haldar
Modern policy architectures are unnecessarily complex. In our #NeurIPS2024 project called BAKU, we focus on what really matters for good policy learning.

BAKU is modular, language-conditioned, compatible with multiple sensor streams & action multi-modality, and importantly fully open-source!
December 9, 2024 at 11:33 PM