📍Paris, France 🔗 cedricrommel.github.io
Or at our oral presentation during the @neur_reps workshop on Saturday 14th!
Paper: arxiv.org/abs/2312.06386
Github: github.com/cedricrommel...
In the end, our model works as a conditional density estimator, taking the shape of a mixture of Dirac deltas.
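Written out, a mixture of Dirac deltas could look like the following. The symbols are assumptions for illustration, not the paper's notation: u is the 2D input, x a 3D pose, K the number of hypotheses, f_k(u) the k-th predicted pose and π_k(u) its predicted likelihood.

```latex
p(x \mid u) = \sum_{k=1}^{K} \pi_k(u)\, \delta\big(x - f_k(u)\big),
\qquad \pi_k(u) \ge 0, \quad \sum_{k=1}^{K} \pi_k(u) = 1
```

All the probability mass sits on the K predicted poses, weighted by their scores.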
- A multi-head subnetwork is used to predict different possible rotations for each joint, together with their corresponding likelihoods.
- Both are then merged into predicted poses.
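A rough sketch of how per-joint rotation hypotheses and their scores might be turned into poses. This is not the ManiPose implementation: the 6D rotation parameterization, the fixed bone lengths, the rest direction and all function names are assumptions made for illustration. It only shows that poses built from rotations and bone lengths keep limb lengths constant by construction.

```python
import numpy as np

def softmax(logits):
    """Turn raw scores into likelihoods that sum to 1."""
    e = np.exp(logits - logits.max())
    return e / e.sum()

def rotation_from_6d(v):
    """Map a 6D vector to a 3x3 rotation matrix via Gram-Schmidt (assumed parameterization)."""
    a, b = v[:3], v[3:]
    x = a / np.linalg.norm(a)
    b = b - np.dot(x, b) * x
    y = b / np.linalg.norm(b)
    z = np.cross(x, y)
    return np.stack([x, y, z], axis=1)

def forward_kinematics(rotations, parents, bone_lengths):
    """Place each joint at its parent's position plus a rotated bone of fixed length.

    Assumes parents[j] < j for j >= 1, and that rotations[j] is the absolute
    orientation of the bone ending at joint j (hypothetical convention).
    """
    positions = np.zeros((len(parents), 3))
    rest_direction = np.array([0.0, 1.0, 0.0])  # hypothetical rest pose direction
    for j in range(1, len(parents)):
        direction = rotations[j] @ rest_direction
        positions[j] = positions[parents[j]] + bone_lengths[j] * direction
    return positions

def multi_hypothesis_poses(raw_rotations, logits, parents, bone_lengths):
    """raw_rotations: (K, J, 6) head outputs, logits: (K,) hypothesis scores."""
    poses = []
    for hypothesis in raw_rotations:
        rotations = np.stack([rotation_from_6d(v) for v in hypothesis])
        poses.append(forward_kinematics(rotations, parents, bone_lengths))
    return np.stack(poses), softmax(logits)

# Toy usage: random numbers stand in for the network outputs.
rng = np.random.default_rng(0)
parents = [-1, 0, 1, 2]                      # a 4-joint chain
bone_lengths = np.array([0.0, 0.5, 0.4, 0.3])
poses, likelihoods = multi_hypothesis_poses(
    rng.normal(size=(3, 4, 6)), rng.normal(size=3), parents, bone_lengths
)
```

Here `poses` has shape (3, 4, 3): three hypotheses, each with the same bone lengths, and `likelihoods` gives one weight per hypothesis.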
We hence propose ManiPose, a manifold-constrained multi-hypothesis deep network capable of better dealing with depth ambiguity.
In our work, we prove this is unavoidable because of points 1 and 2.
1. Existing training losses and evaluation metrics (MPJPE) are blind to such inconsistencies;
2. Many possible 3D poses can map to the same 2D input;
3. Pose sequences cannot occupy the whole space: they lie on a smooth manifold because of limb rigidity.
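A toy illustration of these three points (not the paper's proof; the two-joint skeleton and its coordinates are made up). Two poses share the same 2D projection and differ only in depth. Averaging them shortens the limb, yet the averaged pose gets the same expected MPJPE as a valid one.

```python
import numpy as np

def mpjpe(pred, target):
    """Mean per-joint position error: average Euclidean distance over joints."""
    return np.linalg.norm(pred - target, axis=-1).mean()

def bone_length(pose):
    """Length of the single limb linking joint 0 to joint 1."""
    return np.linalg.norm(pose[1] - pose[0])

# Two equally likely 3D poses with the same (x, y) projection, opposite depth.
root = np.zeros(3)
pose_front = np.stack([root, np.array([0.6, 0.0, 0.8])])
pose_back = np.stack([root, np.array([0.6, 0.0, -0.8])])
candidates = [pose_front, pose_back]

def expected_mpjpe(pred):
    """Average MPJPE when each candidate is the ground truth half of the time."""
    return np.mean([mpjpe(pred, gt) for gt in candidates])

# Averaging the two candidates leaves the manifold: the limb shrinks.
averaged = 0.5 * (pose_front + pose_back)

err_valid = expected_mpjpe(pose_front)
err_averaged = expected_mpjpe(averaged)

print("limb length, valid pose:   ", bone_length(pose_front))
print("limb length, averaged pose:", bone_length(averaged))
print("expected MPJPE, valid pose:   ", err_valid)
print("expected MPJPE, averaged pose:", err_averaged)
```

The metric gives no reason to prefer the valid pose over the shrunken one, which is the sense in which it is blind to the inconsistency.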
In our work, we prove these are not isolated cases and that these methods always predict *inconsistent* 3D pose sequences.
This can be achieved with a single camera by detecting human keypoints on a video, then lifting them into a 3D pose.
ManiPose, our work accepted to #NeurIPS2024, gets one step closer to solving this by leveraging prior knowledge about pose topology and cool multiple-choice learning techniques.