I am somewhat surprised that MAE performs worse than DINOv3 on the monocular dense benchmark. Do you have an intuition for why this is the case?
Have you also tried sampling a single view, instead of only two or more views? I am mostly wondering whether that has negative side effects on the outcome.