Dimitrije Antić
anticdimi.bsky.social
CV & ML Ph.D. student at @uva.nl | prev. Univ. of Tuebingen, MPI-IS | Teaching machines to perceive humans. | anticdimi.github.io
Reposted by Dimitrije Antić
To bridge this 2D-to-3D gap, we propose "Render-Localize-Lift":
- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)
June 15, 2025 at 12:23 PM
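The three steps above can be sketched end to end. This is a minimal toy illustration, not the InteractVLM implementation: `localize_stub` stands in for the MV-Loc model, the orthographic projection and the majority-vote lifting rule are simplifying assumptions, and all names here are hypothetical.

```python
import numpy as np

def render_views(verts, n_views=4, res=32):
    """Render: orthographically project 3D mesh vertices into n_views
    2D pixel grids; returns per-view (u, v) pixel coords per vertex."""
    coords = []
    for k in range(n_views):
        theta = 2 * np.pi * k / n_views
        # Rotate the mesh about the vertical axis to the k-th viewpoint.
        R = np.array([[np.cos(theta), 0, np.sin(theta)],
                      [0, 1, 0],
                      [-np.sin(theta), 0, np.cos(theta)]])
        p = verts @ R.T
        xy = p[:, :2]  # drop depth: orthographic projection
        uv = ((xy - xy.min(0)) / (np.ptp(xy, 0) + 1e-8) * (res - 1)).astype(int)
        coords.append(uv)
    return coords

def localize_stub(uv, res=32):
    """Localize: stand-in for the MV-Loc model. Here it just marks the
    lower half of each view as 'in contact' (e.g. feet on the ground)."""
    mask = np.zeros((res, res), dtype=bool)
    mask[res // 2:, :] = True
    return mask

def lift_to_3d(verts, coords, masks):
    """Lift: a vertex is labeled as 3D contact if its 2D projection
    falls inside the predicted contact mask in a majority of views."""
    votes = np.zeros(len(verts))
    for uv, mask in zip(coords, masks):
        votes += mask[uv[:, 1], uv[:, 0]]
    return votes >= len(masks) / 2

verts = np.random.default_rng(0).normal(size=(200, 3))  # toy "mesh"
coords = render_views(verts)
masks = [localize_stub(uv) for uv in coords]
contact = lift_to_3d(verts, coords, masks)  # per-vertex boolean labels
```

The voting in `lift_to_3d` is one plausible way to aggregate per-view masks; any scheme that projects 2D mask pixels back onto visible vertices would fit the same Render-Localize-Lift structure.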
Reposted by Dimitrije Antić
How can we infer 3D contact with limited 3D data? InteractVLM exploits foundation models: a VLM & a localization model, fine-tuned to reason about contact. Given an image & a prompt, the VLM outputs tokens that condition the localization model. But these models work in 2D, while contact is 3D. (4/10)
June 15, 2025 at 12:23 PM
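The interface described above, where the VLM emits a token that conditions a 2D localization model, can be sketched with stubs. Everything here is a hypothetical stand-in (the feature extraction, the token dimension, and the decoder are all invented for illustration), not the actual fine-tuned models.

```python
import numpy as np

rng = np.random.default_rng(0)

def vlm_stub(image, prompt, dim=16):
    """Stand-in for the fine-tuned VLM: maps an (image, prompt) pair to
    a single localization token (a feature vector). Here: a random
    projection of crude image statistics and a hashed prompt."""
    img_feat = np.array([image.mean(), image.std()])
    txt_feat = (abs(hash(prompt)) % 97) / 97.0
    feats = np.concatenate([img_feat, [txt_feat]])
    W = rng.normal(size=(dim, feats.size))
    return W @ feats  # the token that conditions the mask decoder

def mask_decoder_stub(token, image):
    """Stand-in for the 2D localization model: scores each pixel
    conditioned on the token, then thresholds into a binary mask."""
    weight = token @ rng.normal(size=token.size)  # token -> scalar gate
    score = image * weight
    return score > score.mean()

image = rng.random((32, 32))  # toy "image"
token = vlm_stub(image, "Where does the person touch the chair?")
mask = mask_decoder_stub(token, image)  # 2D contact mask
```

The point of the sketch is the data flow: image + prompt → token → 2D mask, which is exactly the gap the Render-Localize-Lift scheme then closes by lifting such masks to 3D.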