Kwang Moo Yi
kmyid.bsky.social
Assistant Professor of Computer Science at the University of British Columbia. I also post my daily finds on arXiv.
Baek et al., "SONIC: Spectral Optimization of Noise for Inpainting with Consistency"

Initial seed noise matters. And you can optimize it **without** any backprop through your denoiser via good ol' linearization. Importantly, you need to do this in Fourier space.
November 26, 2025 at 3:06 AM
SAM 3D Team, "SAM 3D: 3Dfy Anything in Images"

image (point map) + mask -> Transformer -> pose / voxel -> transformer (image/mask/voxel) -> mesh / splat. Staged training with synthetic and real data & RL. I wish I could see more failure examples to know the limits.
November 21, 2025 at 8:35 PM
Chen et al., "Co-Me: Confidence-Guided Token Merging for Visual Geometric Transformers"

Train a confidence predictor for tokens and merge low-confidence ones for acceleration -> faster reconstruction with VGGT/MapAnything.
November 19, 2025 at 8:40 PM
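
To make the merging step in the Co-Me post above concrete, here is a minimal sketch in plain Python. It is not the paper's implementation: the learned confidence predictor is replaced by a list of given scores, and the function name `merge_low_confidence`, the `threshold`, and the `group_size` averaging rule are all assumptions made for illustration.

```python
def _mean(vectors):
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def merge_low_confidence(tokens, confidences, threshold=0.5, group_size=2):
    """Keep confident tokens as-is; average runs of low-confidence ones.

    Hypothetical rule: low-confidence tokens are collected in order and
    replaced by their mean once `group_size` of them have accumulated.
    """
    merged, pending = [], []
    for token, conf in zip(tokens, confidences):
        if conf >= threshold:
            merged.append(list(token))
        else:
            pending.append(token)
            if len(pending) == group_size:
                merged.append(_mean(pending))
                pending = []
    if pending:  # leftover low-confidence tokens form a final, smaller group
        merged.append(_mean(pending))
    return merged


tokens = [[1, 1], [2, 2], [4, 4], [6, 6], [8, 8]]
confidences = [0.9, 0.2, 0.1, 0.8, 0.3]
merged = merge_low_confidence(tokens, confidences)
print(len(tokens), "->", len(merged), "tokens")
```

Fewer tokens reach the later transformer layers, which is where the speed-up would come from in this toy setting.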
Lin and Chen and Liew and Chen, et al., and Kang, "Depth Anything 3: Recovering the Visual Space from Any Views"

VGGT-like architecture, but simplified to estimate depth and ray maps (not point maps). Uses teacher-student training of Depth Anything v2.
November 14, 2025 at 9:25 PM
Singer and Rotstein et al., "Time-to-Move: Training-Free Motion Controlled Video Generation via Dual-Clock Denoising"

Make a rough warp, push it through an image-to-video model, denoising together with the warp up until a timestep, then let the model finish the rest without interference.
November 13, 2025 at 7:54 PM
Ren and Wen et al., "FastGS: Training 3D Gaussian Splatting in 100 Seconds"

I like simple ideas -- this one says you should consider multiple views when you prune/clone, which allows fewer Gaussians to be used for training.
November 7, 2025 at 6:32 PM
Gao and Mao et al., "Seeing the Wind from a Falling Leaf"

Extract Dynamic 3D Gaussians for an object -> Vision Language Models to extract physics parameters -> model force field (wind). Leads to some fun.
November 5, 2025 at 5:31 PM
Zhou et al., "PAGE-4D: Disentangled Pose and Geometry Estimation for 4D Perception"

VGGT extended to dynamic scenes with a dynamic mask predictor.
November 4, 2025 at 8:17 PM
Tesfaldet et al., "Generative Point Tracking with Flow Matching"

Tracking, waaaaaay back in the day, used to be solved using sampling methods. They are now back. Also reminds me of my first major conference work, where I looked into how much impact the initial target point has.
October 31, 2025 at 6:42 PM
Bai et al., "Positional Encoding Field"

Make your RoPE encoding 3D by including a z axis, then manipulate your image by simply manipulating your positional encoding in 3D --> novel view synthesis. Neat idea.
October 24, 2025 at 6:20 PM
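
As a rough illustration of the "RoPE with a z axis" idea in the post above, here is a minimal sketch in plain Python. It is not the paper's code: splitting the feature vector into three equal chunks (one per axis), the frequency schedule, and the names `rope_1d` and `rope_3d` are assumptions. It only shows the encoding side, namely that attention scores depend on the relative 3D offset between tokens, so changing a token's z coordinate changes how it attends.

```python
import math


def rope_1d(vec, pos, base=10000.0):
    """Rotate consecutive pairs of `vec` by angles proportional to `pos`."""
    half = len(vec) // 2
    out = []
    for i in range(half):
        theta = pos * base ** (-i / half)
        x, y = vec[2 * i], vec[2 * i + 1]
        out.append(x * math.cos(theta) - y * math.sin(theta))
        out.append(x * math.sin(theta) + y * math.cos(theta))
    return out


def rope_3d(vec, xyz):
    """Apply 1D RoPE per axis on three equal chunks of `vec` (x, y, z)."""
    if len(vec) % 6 != 0:
        raise ValueError("feature dimension must be divisible by 6")
    chunk = len(vec) // 3
    out = []
    for axis, pos in enumerate(xyz):
        out.extend(rope_1d(vec[axis * chunk:(axis + 1) * chunk], pos))
    return out


def dot(a, b):
    return sum(p * q for p, q in zip(a, b))


q = [0.1 * (i + 1) for i in range(12)]
k = [0.05 * (12 - i) for i in range(12)]

# Same relative offset (2, 0, 1) at two different absolute positions.
score_a = dot(rope_3d(q, (1.0, 2.0, 0.0)), rope_3d(k, (3.0, 2.0, 1.0)))
score_b = dot(rope_3d(q, (5.0, 7.0, 2.0)), rope_3d(k, (7.0, 7.0, 3.0)))
# Moving the key along z only (relative offset (2, 0, 0)) changes the score.
score_c = dot(rope_3d(q, (1.0, 2.0, 0.0)), rope_3d(k, (3.0, 2.0, 0.0)))
print(abs(score_a - score_b) < 1e-9, abs(score_a - score_c) > 1e-3)
```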