Self-supervised learning from video does scale! In our latest work, we scaled masked auto-encoding models to 22B params, boosting performance on pose estimation, tracking & more.
Paper: arxiv.org/abs/2412.15212
Code & models: github.com/google-deepmind/representations4d
We investigated this question and more in our latest work. Please check it out!
*From Image to Video: An Empirical Study of Diffusion Representations*
arxiv.org/abs/2502.07001