Lightnews — Scholar-powered news

Julien Gaubil

@jgaubil.bsky.social

PhD student at École Polytechnique
Interested in Computer Vision, Geometry, and learning both at the same time

https://www.jgaubil.com/

Julien Gaubil

@jgaubil.bsky.social

We also find that the decoder turns 𝐬𝐞𝐦𝐚𝐧𝐭𝐢𝐜 correspondences into 𝐠𝐞𝐨𝐦𝐞𝐭𝐫𝐢𝐜 𝐜𝐨𝐫𝐫𝐞𝐬𝐩𝐨𝐧𝐝𝐞𝐧𝐜𝐞𝐬.

We identified attention heads specialized in finding correspondences across views.

We can clearly see the geometric refinement on this difficult image pair by visualizing their cross-attention maps! [6/8]

November 4, 2025 at 7:40 PM
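As a rough illustration of what "visualizing cross-attention maps" involves, here is a minimal NumPy sketch. It is not the authors' code: the function names, the single-head setup, and reading correspondences off the map with an argmax are all assumptions made for illustration.

```python
import numpy as np

def cross_attention_map(q, k):
    """Softmax attention weights from view-1 queries to view-2 keys.

    q: (Nq, d) query tokens from one view.
    k: (Nk, d) key tokens from the other view.
    Returns an (Nq, Nk) matrix whose rows sum to 1.
    """
    logits = q @ k.T / np.sqrt(q.shape[-1])
    logits = logits - logits.max(axis=-1, keepdims=True)  # numerical stability
    weights = np.exp(logits)
    return weights / weights.sum(axis=-1, keepdims=True)

def argmax_correspondences(attn, grid_w):
    """Hypothetical readout: for each query token, the (row, col) patch
    in the other view that receives the most attention."""
    idx = attn.argmax(axis=-1)
    return np.stack([idx // grid_w, idx % grid_w], axis=-1)

# Toy check: queries are scaled, permuted copies of unit-norm keys,
# so the strongest attention should recover the permutation.
rng = np.random.default_rng(0)
keys = rng.normal(size=(12, 8))
keys /= np.linalg.norm(keys, axis=-1, keepdims=True)
perm = rng.permutation(12)
queries = 10.0 * keys[perm]

attn = cross_attention_map(queries, keys)
matches = argmax_correspondences(attn, grid_w=4)  # 3 x 4 patch grid
```

Reshaping one row of `attn` to the patch grid gives the heatmap for a single query patch.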

Julien Gaubil

@jgaubil.bsky.social

Can we dive deeper into the network? Yes!

We can observe the impact of each layer on the iterative reconstruction process by comparing the pointmap error before and after the layer.

Here, we plot the error difference for every layer of DUSt3R’s second-view decoder [4/8]

November 4, 2025 at 7:40 PM
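The before/after comparison described in this post can be sketched as follows. The post does not state the exact error metric, so mean Euclidean distance to the ground-truth pointmap is an assumption here, and the function names are hypothetical.

```python
import numpy as np

def pointmap_error(pred, gt, mask=None):
    """Mean Euclidean distance between two (H, W, 3) pointmaps.
    Assumed metric; the thread does not specify which one is used."""
    dist = np.linalg.norm(pred - gt, axis=-1)
    if mask is not None:
        dist = dist[mask]  # keep only valid pixels
    return float(dist.mean())

def per_layer_error_change(probe_preds, gt, mask=None):
    """probe_preds[0] is the pointmap probed before the first layer,
    probe_preds[i] the pointmap probed after layer i.
    Returns error_after - error_before for each layer;
    negative values mean the layer improved the reconstruction."""
    errors = np.array([pointmap_error(p, gt, mask) for p in probe_preds])
    return errors[1:] - errors[:-1]

# Toy data: each "layer" halves a constant offset from the ground truth.
rng = np.random.default_rng(0)
gt = rng.normal(size=(4, 5, 3))
preds = [gt + s for s in (1.0, 0.5, 0.25)]
deltas = per_layer_error_change(preds, gt)
```

Plotting `deltas` against the layer index gives a per-layer view of where the error drops.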

Julien Gaubil

@jgaubil.bsky.social

We observe that 𝐫𝐞𝐜𝐨𝐧𝐬𝐭𝐫𝐮𝐜𝐭𝐢𝐨𝐧 𝐢𝐬 𝐚𝐧 𝐢𝐭𝐞𝐫𝐚𝐭𝐢𝐯𝐞 𝐩𝐫𝐨𝐜𝐞𝐬𝐬, with decoder blocks progressively refining the pointmaps.

For easy image pairs, a good estimate of the relative position emerges early in the decoder, whereas harder pairs require more decoder blocks, sometimes even failing to converge [3/8]

November 4, 2025 at 7:40 PM

Julien Gaubil

@jgaubil.bsky.social

To open up DUSt3R, we train individual MLP probes on intermediate layers of an early checkpoint, using the same pointmap objective.

We can then analyze its inference through the sequence of reconstructions - see below! [2/8]

November 4, 2025 at 7:40 PM
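To make the probing idea concrete, here is a minimal sketch of fitting a small MLP on frozen intermediate features to regress 3D points. It makes several assumptions: plain squared-error regression stands in for the pointmap objective mentioned above, the features are random toy data rather than real activations, and `train_probe` is a hypothetical helper, not the authors' implementation.

```python
import numpy as np

def train_probe(feats, targets, hidden=32, lr=1e-2, steps=500, seed=0):
    """Fit a one-hidden-layer ReLU MLP from frozen features (N, D)
    to 3D points (N, 3) with full-batch gradient descent.
    Only the probe weights are updated; the features stay fixed."""
    rng = np.random.default_rng(seed)
    n, d = feats.shape
    out = targets.shape[1]
    W1 = rng.normal(0.0, np.sqrt(2.0 / d), (d, hidden))
    b1 = np.zeros(hidden)
    W2 = rng.normal(0.0, np.sqrt(1.0 / hidden), (hidden, out))
    b2 = np.zeros(out)
    losses = []
    for _ in range(steps):
        # Forward pass
        z = feats @ W1 + b1
        h = np.maximum(z, 0.0)
        pred = h @ W2 + b2
        diff = pred - targets
        losses.append(float((diff ** 2).sum(axis=1).mean()))
        # Backward pass for the mean squared-distance loss
        g_pred = 2.0 * diff / n
        g_W2 = h.T @ g_pred
        g_b2 = g_pred.sum(axis=0)
        g_z = (g_pred @ W2.T) * (z > 0)
        g_W1 = feats.T @ g_z
        g_b1 = g_z.sum(axis=0)
        W1 -= lr * g_W1
        b1 -= lr * g_b1
        W2 -= lr * g_W2
        b2 -= lr * g_b2

    def probe(x):
        return np.maximum(x @ W1 + b1, 0.0) @ W2 + b2

    return probe, losses

# Toy stand-in for one layer's token features and their target 3D points.
rng = np.random.default_rng(1)
feats = rng.normal(size=(256, 16))
targets = feats @ (0.5 * rng.normal(size=(16, 3)))
probe, losses = train_probe(feats, targets)
points = probe(feats)  # (256, 3) probed reconstruction
```

Training one such probe per layer, each on that layer's features, would yield the sequence of reconstructions the post refers to.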
