Stefan Baumann
stefanabaumann.bsky.social
PhD Student at @compvis.bsky.social & @ellis.eu working on generative computer vision.

Interested in extracting world understanding from models and more controlled generation. 🌐 https://stefan-baumann.eu/
The work I linked also relates to pretraining. Doing this for multiple rewards at once is indeed an aspect I haven't seen before; I was just curious whether I was missing something about the general method
November 3, 2025 at 4:11 PM
Hasn't this idea been around for a while? E.g., proceedings.neurips.cc/paper_files/...
October 31, 2025 at 7:49 PM
Lovely work!
Let's make everything generative! There's no reason to forgo having an (at least implicit) distribution for every prediction we make, if we can make it at least as accurate and similarly efficient as discriminative baselines in the long run
October 29, 2025 at 6:59 PM
Classic case of xkcd 2501
October 17, 2025 at 4:58 PM
Thank you! I think that might be possible, although I'd likely consider incorporating more information in that case
October 16, 2025 at 8:12 AM
We make code and weights available.
We'll also be in Honolulu to present the paper at #ICCV2025 next week 🌺.

Take a look now!
🌐 Project Page: compvis.github.io/flow-poke-tr...
📝 Paper: arxiv.org/abs/2510.12777
💻 Code & Weights: github.com/CompVis/flow...
What If: Understanding Motion Through Sparse Interactions
FPT enables fast prediction of multimodal motion distributions in open settings
October 15, 2025 at 2:00 AM
All of this wouldn't have been possible without the support of my amazing collaborators
@rmsnorm.bsky.social, @timyphan.bsky.social, and Björn Ommer at @compvis.bsky.social. A giant thank you to them! ❤️
October 15, 2025 at 1:59 AM
⚡️ FPT generalizes from open-set training. Applications:
• Articulated motion (Drag-A-Move): fine-tuned FPT outperforms specialized models for motion prediction
• Face motion: zero-shot, beats specialized baselines
• Moving part segmentation: emerges from formulation
October 15, 2025 at 1:58 AM
⚙️ Unlike other methods, we don't regress or sample one trajectory.
FPT 𝘳𝘦𝘱𝘳𝘦𝘴𝘦𝘯𝘵𝘴 𝘵𝘩𝘦 𝘧𝘶𝘭𝘭 𝘮𝘰𝘵𝘪𝘰𝘯 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯, enabling:
• interpretable uncertainty
• controllable interaction effects
• efficient prediction (>100k predictions/s)
October 15, 2025 at 1:57 AM
💡 Our idea:
Predict 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀 of motion, not just one flow field instance.

Given a few pokes, our model outputs the probability 𝘥𝘪𝘴𝘵𝘳𝘪𝘣𝘶𝘵𝘪𝘰𝘯 of how parts of the scene might move.

→ This directly captures 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯𝘵𝘺 and interactions.
October 15, 2025 at 1:57 AM
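A toy illustration of why a distribution matters here (a minimal sketch, not the FPT model or its code: the 1D setup, the hypothetical `sample_displacement` helper, and the two-mode split are all assumptions made for the example). If a poked part moves left or right with equal probability, a single point estimate lands near zero, a motion that almost never happens, while a simple two-component summary keeps both outcomes:

```python
import random
import statistics


def sample_displacement(rng):
    """Hypothetical toy outcome: the part moves left (-1) or right (+1), plus small noise."""
    mode = rng.choice([-1.0, 1.0])
    return mode + rng.gauss(0.0, 0.1)


rng = random.Random(0)
samples = [sample_displacement(rng) for _ in range(10_000)]

# A model regressing one motion under squared error tends toward the mean outcome.
point_estimate = statistics.fmean(samples)

# Summarize each mode separately as (weight, mean, standard deviation).
left = [s for s in samples if s < 0]
right = [s for s in samples if s >= 0]
mixture = [
    (len(part) / len(samples), statistics.fmean(part), statistics.stdev(part))
    for part in (left, right)
]

print(f"point estimate: {point_estimate:+.3f}")
for weight, mean, std in mixture:
    print(f"mode: weight={weight:.2f}, mean={mean:+.2f}, std={std:.2f}")
```

The split at zero only works because the toy modes are known in advance; it stands in for whatever a learned model would use to represent multiple modes.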
🧠 Understanding how the world 𝘤𝘰𝘶𝘭𝘥 change is core to physical intelligence.

But most models predict 𝗼𝗻𝗲 𝗳𝘂𝘁𝘂𝗿𝗲, a single deterministic motion.

The reality is 𝘶𝘯𝘤𝘦𝘳𝘵𝘢𝘪𝘯 and 𝘮𝘶𝘭𝘵𝘪-𝘮𝘰𝘥𝘢𝘭: one poke can lead to many outcomes.
October 15, 2025 at 1:57 AM
Oh yeah, sorry, I should've made it clearer that I was talking about the more general case
October 3, 2025 at 6:48 PM
Take, for example, (zero-shot) semantic correspondence, which works quite well based on activations of image diffusion models.

The model has never been trained for it, and, while it's obvious that related capabilities might be useful for denoising, I'd still consider this an emergent capability
October 3, 2025 at 6:45 PM
Not in the sense of, e.g., generating new kinds of videos when the model was trained for video generation, but capabilities w.r.t. other tasks could still be considered emergent, right?
October 3, 2025 at 6:43 PM
Fair :D
September 18, 2025 at 3:21 PM
First time I've ever heard someone from the 3D CV community actually say this out loud! This has been bugging me for a long time
September 18, 2025 at 2:48 PM
Ah, makes sense :)
September 11, 2025 at 1:18 PM
Why are you not on a current stable version?
September 11, 2025 at 11:59 AM
The bugs I ran into reproduce across 2.7, 2.8 and current nightlies
September 11, 2025 at 11:35 AM
Welcome to the club! I've somehow managed to find two bugs with torch.compile() in the last few days 🥲
September 10, 2025 at 11:26 PM
That process really sounds like a labor of love! Penrose looks really interesting, I'll play around with it! Thanks!
August 31, 2025 at 4:47 PM