- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)
- Render: 3D human/object meshes into multiview 2D images.
- Localize: A Multiview Localization (MV-Loc) model, guided by VLM tokens, predicts 2D contact masks.
- Lift: 2D contact masks to 3D.
(5/10)