Kwang Moo Yi
@kmyid.bsky.social
Assistant Professor of Computer Science at the University of British Columbia. I also post my daily finds on arxiv.
Bruns et al., "ACE-G: Improving Generalization of Scene Coordinate Regression Through Query Pre-Training"

Train a scene coordinate regressor with "map codes" (i.e., trainable inputs) so that a single generalizable regressor can serve many scenes. To localize in a new scene, you then optimize only its map code.
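The "optimize only the code, keep the regressor frozen" idea can be sketched in a toy linear setting (everything here — the linear regressor, dimensions, names — is my own stand-in, not the paper's architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
D_FEAT, D_CODE = 8, 4

# Frozen "shared regressor" (a linear toy stand-in for the pre-trained network).
W_feat = rng.normal(size=(3, D_FEAT))
# Orthonormal rows just keep this toy optimization well-conditioned.
W_code = np.linalg.qr(rng.normal(size=(D_CODE, 3)))[0].T

def regress(feat, code):
    """Map (image feature, map code) -> 3D scene coordinate."""
    return W_feat @ feat + W_code @ code

# A "new scene": features with known ground-truth 3D coordinates.
feats = rng.normal(size=(32, D_FEAT))
true_code = rng.normal(size=D_CODE)
targets = np.array([regress(f, true_code) for f in feats])

def scene_error(code):
    return np.mean([np.sum((regress(f, code) - t) ** 2)
                    for f, t in zip(feats, targets)])

# Mapping the new scene touches only the code, never the regressor weights.
code = np.zeros(D_CODE)
init_err = scene_error(code)
lr = 0.1
for _ in range(200):
    grad = np.mean([2 * W_code.T @ (regress(f, code) - t)
                    for f, t in zip(feats, targets)], axis=0)
    code -= lr * grad
final_err = scene_error(code)
```

The appeal is that the expensive network is trained once across scenes; per-scene "mapping" reduces to a small optimization over the code.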
Shrivastava and Mehta et al., "Point Prompting: Counterfactual Tracking with Video Diffusion Models"

Put a red dot where you want to track, then SDEdit the video with a video diffusion model --> zero-shot point tracking. Not as accurate as supervised trackers, but zero-shot!
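The prompting step itself is trivial; the heavy lifting (re-generating the video via SDEdit so the model carries the dot forward) needs an actual video diffusion model. A sketch of just the dot-painting part (function name and defaults are mine):

```python
import numpy as np

def add_prompt_dot(frame, x, y, radius=3):
    """Paint a red dot at (x, y) on an RGB frame of shape (H, W, 3), uint8."""
    H, W, _ = frame.shape
    ys, xs = np.ogrid[:H, :W]
    mask = (xs - x) ** 2 + (ys - y) ** 2 <= radius ** 2
    out = frame.copy()
    out[mask] = [255, 0, 0]  # pure red "counterfactual" marker
    return out
```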
Yuan et al., "LikePhys: Evaluating intuitive physics understanding in video diffusion models via likelihood preference"

I will keep promoting physics benchmark papers for video models until people stop claiming world models :) tl;dr -- Still not there yet.
Xu et al., "ReSplat: Learning Recurrent Gaussian Splats"

Feed-forward Gaussian splatting + learned corrector = fast, high-quality reconstruction. Uses global + kNN attention. Reminds me of PointNet++.
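The kNN-attention half is easy to sketch (a minimal single-head version with no projections; the actual layers surely add projections, heads, and the global branch):

```python
import numpy as np

def knn_attention(x, pos, k):
    """For each point, softmax-attend only over its k nearest neighbors.

    x:   (N, D) per-point features (used as query, key, and value here)
    pos: (N, P) point positions used to pick neighbors
    """
    d = np.linalg.norm(pos[:, None] - pos[None, :], axis=-1)  # (N, N) distances
    nn = np.argsort(d, axis=1)[:, :k]                         # k nearest (incl. self)
    out = np.zeros_like(x)
    for i in range(len(x)):
        q, K = x[i], x[nn[i]]                                 # (D,), (k, D)
        w = np.exp(K @ q / np.sqrt(x.shape[1]))               # scaled dot-product
        out[i] = (w / w.sum()) @ K                            # softmax-weighted average
    return out
```

Restricting attention to local neighborhoods is what makes the PointNet++ comparison apt: global structure and local refinement are handled by separate mechanisms.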
Xu and Lin et al., "Pixel-Perfect Depth with Semantics-Prompted Diffusion Transformers"

Append foundation-model features at the later stages of Marigold-like denoising to get monocular depth. Simple, straightforward idea that works.
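The scheduling idea can be sketched abstractly — inject the semantic features only once denoising reaches the later steps (the concat-style conditioning and the step threshold are my guesses at the mechanism, not the paper's exact design):

```python
import numpy as np

def prompted_denoise(x, denoiser, semantic_feat, n_steps=6, prompt_from=3):
    """Run a toy denoising loop; append semantic features only for step >= prompt_from."""
    for step in range(n_steps):
        cond = semantic_feat if step >= prompt_from else np.zeros_like(semantic_feat)
        x = denoiser(x, cond, step)
    return x

# Dummy denoiser that just records whether it received a nonzero prompt.
log = []
def dummy_denoiser(x, cond, step):
    log.append(bool(cond.any()))
    return x * 0.9  # pretend to denoise

_ = prompted_denoise(np.zeros(4), dummy_denoiser, np.ones(3))
```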
Bamberger and Jones et al., "Carré du champ flow matching: better quality-generalisation tradeoff in generative models"

Geometric regularization of the flow manifold, which boils down to adding anisotropic Gaussian noise during flow-matching training. Neat idea that enhances generalization.
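As I read it, the practical change lands in how training pairs are built. A hedged sketch of standard flow-matching pair construction with anisotropic noise added to the interpolant (the diagonal-covariance noise model and names are my assumptions):

```python
import numpy as np

rng = np.random.default_rng(1)

def flow_matching_pair(x0, x1, t, noise_cov_diag):
    """Return (noisy interpolant, velocity target) for one training example.

    x0: noise sample, x1: data sample, t in [0, 1],
    noise_cov_diag: per-dimension variance of the added anisotropic noise.
    """
    xt = (1.0 - t) * x0 + t * x1                               # linear interpolant
    eps = rng.normal(size=x0.shape) * np.sqrt(noise_cov_diag)  # anisotropic perturbation
    return xt + eps, (x1 - x0)                                 # velocity target unchanged
```

With `noise_cov_diag = 0` this reduces to vanilla flow matching, which makes the method easy to bolt onto an existing pipeline.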
Yugay and Nguyen et al., “Visual Odometry with Transformers”

Instead of point maps, you can also directly output poses. This used to be much less accurate; now it's the opposite. Simple architecture that directly predicts camera embeddings, which then regress rotation and translation.
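A common shape for such a pose head — regress a raw 3x3 matrix and project it onto SO(3) via SVD, plus a linear translation branch — sketched below (whether the paper uses this exact rotation parameterization is an assumption on my part):

```python
import numpy as np

def pose_head(embed, W_rot, W_t):
    """Regress (rotation, translation) from a per-frame camera embedding.

    embed: (D,) camera embedding; W_rot: (9, D); W_t: (3, D).
    """
    M = (W_rot @ embed).reshape(3, 3)          # unconstrained 3x3 prediction
    U, _, Vt = np.linalg.svd(M)
    # Nearest rotation: fix the sign so det(R) = +1.
    R = U @ np.diag([1.0, 1.0, np.sign(np.linalg.det(U @ Vt))]) @ Vt
    t = W_t @ embed
    return R, t
```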
Chen et al., "TTT3R: 3D Reconstruction as Test-Time Training"

Cut3R + gated state updates (test-time-training layers) = the speed/efficiency of Cut3R, but with high-quality estimates.
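The gating idea in its simplest recurrent form — a data-dependent convex blend of the old state and the new observation (a minimal stand-in; the real layer learns the gate end-to-end inside the network):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gated_update(state, obs, w_gate):
    """Blend running state with a new observation; gate in (0, 1) from the obs."""
    g = sigmoid(w_gate @ obs)        # scalar gate computed from the observation
    return g * obs + (1.0 - g) * state

state, obs = np.zeros(4), np.ones(4)
open_gate = gated_update(state, obs, np.full(4, 10.0))    # gate ~ 1: take the obs
closed_gate = gated_update(state, obs, np.full(4, -10.0)) # gate ~ 0: keep the state
```

The gate is what lets a fixed-size state avoid being overwritten by every new frame, which is plausibly where the quality gain over a plain recurrent update comes from.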
Two today: Kim et al., "How Diffusion Models Memorize" and Song and Kim et al., "Selective Underfitting in Diffusion Models"

A deep dive into how memorization and generalization happen in diffusion models. Still trying to digest what these mean. Thought-provoking.
Barroso-Laguna et al., "A Scene is Worth a Thousand Features: Feed-Forward Camera Localization from a Collection of Image Features"

When building the context for your feed-forward 3D point-map estimator, don't use full image pairs -- just randomly subsample features! -> fast compute, more images.
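The mechanism is as simple as it sounds (helper name and shapes are mine). Since attention cost is quadratic in total token count, keeping k of N tokens per image cuts that cost by roughly (N/k)^2, which is the budget that gets spent on more images instead:

```python
import numpy as np

def subsample_tokens(feats, k, rng):
    """Randomly keep k of the N per-image feature tokens (no replacement)."""
    idx = rng.choice(len(feats), size=k, replace=False)
    return feats[idx]
```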