Interested in Computer Vision, Geometry, and learning both at the same time
https://www.jgaubil.com/
We identified attention heads specialized in finding correspondences across views.
We can clearly see the geometric refinement on this difficult image pair by visualizing their cross-attention maps! [6/8]
We can observe the impact of each layer on the iterative reconstruction process by comparing the pointmap error before and after the layer.
Here, we plot the error difference for every layer of DUSt3R’s second-view decoder [4/8]
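The per-layer comparison described above can be sketched roughly as follows. This is only an illustration under assumptions: it supposes a pointmap has already been read out after each decoder layer (how that readout is obtained is not shown here), and it uses mean Euclidean distance to a reference pointmap as the error. The function names `pointmap_error` and `per_layer_error_change` are hypothetical.

```python
import math

def pointmap_error(pred, target):
    """Mean Euclidean distance between predicted and reference 3D points.

    Assumption: this metric is chosen for illustration only.
    """
    if len(pred) != len(target):
        raise ValueError("pointmaps must have the same number of points")
    return sum(math.dist(p, t) for p, t in zip(pred, target)) / len(pred)

def per_layer_error_change(pointmaps, target):
    """Error after each layer minus error before it.

    pointmaps[0] is the estimate entering the first layer and pointmaps[i]
    is the (assumed) readout after layer i. Negative values mean the layer
    reduced the error.
    """
    errors = [pointmap_error(pm, target) for pm in pointmaps]
    return [after - before for before, after in zip(errors, errors[1:])]

# Toy data: two 3D points whose depth estimate improves across two layers.
target = [(0.0, 0.0, 1.0), (1.0, 0.0, 1.0)]
pointmaps = [
    [(0.0, 0.0, 3.0), (1.0, 0.0, 3.0)],  # before layer 1
    [(0.0, 0.0, 2.0), (1.0, 0.0, 2.0)],  # after layer 1
    [(0.0, 0.0, 1.5), (1.0, 0.0, 1.5)],  # after layer 2
]
deltas = per_layer_error_change(pointmaps, target)
```

Plotting `deltas` against the layer index would give one bar per layer, with bars below zero marking layers that improved the reconstruction.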
For easy image pairs, a good estimate of the relative position emerges early in the decoder, whereas harder pairs require more decoder blocks, sometimes even failing to converge [3/8]
We can then analyze its inference through the sequence of reconstructions - see below! [2/8]