Anand Bhattad
@anandbhattad.bsky.social
Incoming Assistant Professor at Johns Hopkins University | RAP at Toyota Technological Institute at Chicago | web: https://anandbhattad.github.io/ | Knowledge in Generative Image Models, Intrinsic Images, Image-based Relighting, Inverse Graphics
A special shout-out to all the job-market candidates this year: it’s been tough with interviews canceled and hiring freezes🙏

After UIUC's blue and @tticconnect.bsky.social blue, I’m delighted to add another shade of blue to my journey at Hopkins @jhucompsci.bsky.social. Super excited!!
June 2, 2025 at 7:46 PM
I’m thrilled to share that I will be joining Johns Hopkins University’s Department of Computer Science (@jhucompsci.bsky.social, @hopkinsdsai.bsky.social) as an Assistant Professor this fall.
June 2, 2025 at 7:46 PM
[1/2] Not really... there's quite a bit of variation.

When we remove the top bowl, we get diverse semantics: fruits, plants, and other objects that just happen to fit the shape. As we go down, it becomes less diverse: occasional flowers, new bowls in the middle, & finally just bowls at the bottom.
April 2, 2025 at 4:49 AM
[8/10] This simple idea surprisingly scales to a wide range of scenes: from clean setups like a cat on a table or a stack of bowls... to messy, real-world scenes (yes, even Alyosha’s office).
March 29, 2025 at 7:36 PM
[7/10] Why does this work? Because generative models have internalized asymmetries in the visual world.

Search for “cups” → You’ll almost always see a table.
Search for “tables” → You rarely see cups.

So: P(table | cup) ≫ P(cup | table)

We exploit this asymmetry to guide counterfactual inpainting
March 29, 2025 at 7:36 PM
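To make the asymmetry above concrete, here is a toy sketch (not from the paper) of how one could estimate P(table | cup) versus P(cup | table) from per-image object lists; the detections below are made-up placeholders.

```python
# Toy sketch: estimate co-occurrence asymmetry from per-image detections.
# The detection sets here are placeholders, not real data.

# Hypothetical per-image object lists.
detections = [
    {"cup", "table", "chair"},
    {"table", "laptop"},
    {"cup", "table"},
    {"table"},
    {"sofa", "lamp"},
]

def cond_prob(a, b, images):
    """Estimate P(a present | b present) by simple counting."""
    with_b = [s for s in images if b in s]
    if not with_b:
        return 0.0
    return sum(a in s for s in with_b) / len(with_b)

print("P(table | cup) =", cond_prob("table", "cup", detections))  # high
print("P(cup | table) =", cond_prob("cup", "table", detections))  # lower
```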
[6/10] We measure dependencies by masking each object, then using a large inpainting model to hallucinate what should be there. If the replacements are diverse, the object likely isn't critical. If it consistently reappears, like the table under the cat, it’s probably a support.
March 29, 2025 at 7:36 PM
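A rough sketch of the masking-and-inpainting probe described above, using an off-the-shelf Stable Diffusion inpainting pipeline and CLIP image embeddings to score how diverse the hallucinated replacements are. The model choices, the sample count, and the pairwise-distance diversity score are illustrative assumptions, not the paper's exact recipe.

```python
# Illustrative probe: mask an object, hallucinate replacements with an
# inpainting model, and score the diversity of the results with CLIP.
import torch
from diffusers import StableDiffusionInpaintPipeline
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
inpaint = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting"
).to(device)
clip = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").to(device)
clip_proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def replacement_diversity(image: Image.Image, mask: Image.Image, n: int = 8) -> float:
    """Inpaint the masked region n times and return the mean pairwise cosine
    distance of CLIP embeddings of the results. Higher = more diverse, so the
    masked object is likely not a critical support; lower = it keeps coming
    back (like the table under the cat)."""
    samples = [
        inpaint(prompt="", image=image, mask_image=mask).images[0] for _ in range(n)
    ]
    inputs = clip_proc(images=samples, return_tensors="pt").to(device)
    with torch.no_grad():
        feats = clip.get_image_features(**inputs)
    feats = feats / feats.norm(dim=-1, keepdim=True)
    sim = feats @ feats.T                                  # pairwise cosine similarity
    off_diag = sim[~torch.eye(n, dtype=bool, device=device)]
    return float(1.0 - off_diag.mean())                    # mean pairwise distance
```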
[1/10] Is scene understanding solved?

Models today can label pixels and detect objects with high accuracy. But does that mean they truly understand scenes?

Super excited to share our new paper and a new task in computer vision: Visual Jenga!

📄 arxiv.org/abs/2503.21770
🔗 visualjenga.github.io
March 29, 2025 at 7:36 PM
I can’t believe this! Mind-blowing! There are small errors (a flipped logo, rotated chairs), but still, this is incredible!!

Xiaoyan, who’s been working with me on relighting, sent this over. It’s one of the hardest examples we’ve consistently used to stress-test LumiNet: luminet-relight.github.io
March 27, 2025 at 8:00 PM
Can we create realistic renderings of urban scenes from a single video while enabling controllable editing: relighting, object compositing, and nighttime simulation?

Check out our #3DV2025 UrbanIR paper, led by @chih-hao.bsky.social, which does exactly this.

🔗: urbaninverserendering.github.io
March 16, 2025 at 3:39 AM
Exciting work led by Zitian & @freudfc.bsky.social. This is their first Computer Vision paper acceptance! 🚀

w/ Mathieu Garon and @jflalonde.bsky.social
February 28, 2025 at 6:32 PM
ZeroComp will be presented as an Oral at #WACV2025!

Train a diffusion model as a renderer that takes intrinsic images as input. Once trained, it performs zero-shot object compositing and extends easily to other object-editing tasks, such as material swapping.
arxiv.org/abs/2410.08168
February 28, 2025 at 6:32 PM
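One way to picture the zero-shot compositing step (my reading of the post above, not the ZeroComp code): paste the object's intrinsic buffers into the background's buffers, then hand the composed intrinsics to the trained diffusion renderer.

```python
# Conceptual sketch: composite an object's intrinsic buffers into the
# background's buffers, then render the composed intrinsics with a trained
# diffusion "renderer". The renderer call at the end is hypothetical.
import numpy as np

def composite_intrinsics(bg, obj, mask):
    """bg/obj: dicts of HxWx3 intrinsic maps (e.g. albedo, normals, shading);
    mask: HxW boolean footprint of the inserted object."""
    out = {}
    for key in bg:
        out[key] = np.where(mask[..., None], obj[key], bg[key])
    return out

# composed = composite_intrinsics(bg_intrinsics, obj_intrinsics, obj_mask)
# image = diffusion_renderer(composed)   # hypothetical trained intrinsics->image model
```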
Thanks! Our workshop has a broader goal as you mentioned. We hope to see you there :)

sites.google.com/view/standou...
January 14, 2025 at 10:37 AM
A short description of our workshop is on our website.

Here's what "Standing Out" means to us:
"A forum to reflect on personal growth & unique identity in a rapidly evolving field."

The goal is to help researchers find their authentic voice while building a more vibrant, inclusive CV community.
January 14, 2025 at 10:33 AM
3/3 Bringing this to you with the help of my amazing co-organizers: Aditya Prakash, @unnatjain.bsky.social, @akanazawa.bsky.social, Georgia Gkioxari, & Lana Lazebnik

⚠️ Important: When registering for #CVPR2025, please check-mark our workshop in the interest form!
@cvprconference.bsky.social
January 13, 2025 at 4:16 PM
🧵 1/3 Many at #CVPR2024 & #ECCV2024 asked what would be next in our workshop series.

We're excited to announce "How to Stand Out in the Crowd?" at #CVPR2025 Nashville - our 4th community-building workshop featuring this incredible speaker lineup!

🔗 sites.google.com/view/standou...
January 13, 2025 at 4:16 PM
[1/2] Latent Intrinsics Emerge from Training to Relight.

We discovered an easy way to extract albedo without any albedo-like images. By questioning why intrinsic image representations must be 3-channel maps, we trained a simple relighting model where intrinsics & lighting are latent variables.
December 11, 2024 at 10:05 PM
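A minimal PyTorch sketch of the setup as described above, assuming a paired-lighting training signal; the architecture and dimensions are placeholders, not the released model.

```python
# Minimal sketch: encode an image into an "intrinsic" latent and a "lighting"
# latent, decode them back to an image, and relight by swapping lighting latents.
import torch
import torch.nn as nn

class LatentIntrinsicsRelighter(nn.Module):
    def __init__(self, dim=64, light_dim=16):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv2d(3, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(dim, dim, 4, stride=2, padding=1), nn.ReLU(),
        )
        self.to_light = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(dim, light_dim))
        self.decoder = nn.Sequential(
            nn.ConvTranspose2d(dim + light_dim, dim, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(dim, 3, 4, stride=2, padding=1),
        )

    def encode(self, x):
        feat = self.encoder(x)
        return feat, self.to_light(feat)        # intrinsic latent, lighting latent

    def decode(self, intrinsic, light):
        light_map = light[:, :, None, None].expand(-1, -1, *intrinsic.shape[2:])
        return self.decoder(torch.cat([intrinsic, light_map], dim=1))

    def relight(self, src, ref):
        """Render src under the lighting of ref by swapping lighting latents."""
        intrinsic_src, _ = self.encode(src)
        _, light_ref = self.encode(ref)
        return self.decode(intrinsic_src, light_ref)

# Training idea (sketch): given two views of the same scene under different
# lights, supervise relight(src, ref) against ref with a reconstruction loss.
```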
[2/3] Some of the results are compelling—light flares as sunlight pierces through the edge of the building, reflections on glass facades, drainage, stop signs appearing, and even cars on the street that weren’t visible in the original frame. There is no explicit geometry required.
December 11, 2024 at 9:32 PM
[1/3] From a single image to the 3D world—it’s possible with the right data, & we have it for you🚀

We’re thrilled to release our 360° video dataset. A simple conditional diffusion model trained with explicit camera control can synthesize novel 3D scenes—all from a single image input! #NeurIPS2024
December 11, 2024 at 9:32 PM
Spotlight Poster #NeurIPS2024

📍 East Exhibit Hall A-C, #1500
🗓️ Wed 11 Dec, 16:30 – 19:30 PT

Latent Intrinsics: Intrinsic image representation as latent variables, resulting in emergent albedo-like maps for free.

Code is now available: github.com/xiao7199/Lat...
arXiv: arxiv.org/abs/2405.21074
December 11, 2024 at 5:55 AM
Our latest work is on arXiv now and the thread is updated with the movies :)

arXiv: arxiv.org/abs/2412.00177
project: luminet-relight.github.io
December 5, 2024 at 4:01 PM
For more results and details check out our paper!

📝: Paper link: arxiv.org/abs/2412.00177
🔗: Project page: luminet-relight.github.io
🐦: Twitter thread (longer version): https://x.com/anand_bhattad/status/1864479658353242485

Work led by Xiaoyan Xing (PhD student at the University of Amsterdam)!
December 5, 2024 at 3:58 PM
Here are a few more random relighting results! ✨

How accurate are these results? That's very hard to tell at the moment 🤔

But our tests on the MIT data, our user study, and our qualitative results all point to us being on the right track. Generative models seem to know how light interacts with our world.
December 5, 2024 at 3:58 PM
Quantitative results 📊

Evaluation is done on the challenging MIT multi-illumination dataset (the only real data we have with ground truth and similar lighting conditions across scenes). By combining latent intrinsics with generative models, we cut the error (RMSE) by 35%! See our paper for qualitative results.
December 5, 2024 at 3:58 PM
Where do we get data for training relighting models? 🤔

We used our good old StyLitGAN from #CVPR2024 to generate diverse training data and filtered it with CLIP to keep the top 1,000 images. We combined these with the existing MIT Multi-Illum & BigTime datasets, giving a training set of roughly 2,500 unique images.
December 5, 2024 at 3:58 PM
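For illustration, a small sketch of CLIP-based filtering for generated data; the post doesn't specify the exact scoring criterion, so the text prompt below is a placeholder assumption.

```python
# Sketch of a CLIP-based filter for generated training images: score each
# image against a text prompt and keep the top k. The prompt is a placeholder.
import torch
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def top_k_by_clip(paths, prompt="a photo of an indoor scene", k=1000):
    """Return the k image paths whose CLIP embedding best matches the prompt."""
    images = [Image.open(p).convert("RGB") for p in paths]
    inputs = proc(text=[prompt], images=images, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    scores = out.logits_per_image.squeeze(-1)          # one score per image
    keep = scores.topk(min(k, len(paths))).indices.tolist()
    return [paths[i] for i in keep]
```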
LumiNet's architecture is quite simple! 🏗️

Given two images (a source image to be relit and a target lighting condition), we first extract latent intrinsic image representations for both using our NeurIPS 2024 work, and then train a simple latent ControlNet.
December 5, 2024 at 3:58 PM
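To visualize the data flow described above, here is a toy, self-contained sketch with stand-in modules (not the LumiNet release): a latent-intrinsics encoder supplies the source intrinsics and the target lighting code, and a ControlNet-style branch injects them into the denoiser.

```python
# Toy data-flow sketch with stand-in modules: source intrinsics + target
# lighting latents condition a ControlNet-style branch of a latent denoiser.
import torch
import torch.nn as nn

class ToyEncoder(nn.Module):                      # stands in for the latent-intrinsics encoder
    def __init__(self, dim=64, light_dim=16):
        super().__init__()
        self.conv = nn.Conv2d(3, dim, 4, stride=4)
        self.light = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                   nn.Linear(dim, light_dim))
    def forward(self, x):
        feat = self.conv(x)
        return feat, self.light(feat)             # intrinsic map, lighting vector

class ToyLatentControlNet(nn.Module):             # stands in for the latent ControlNet
    def __init__(self, latent_ch=4, dim=64, light_dim=16):
        super().__init__()
        self.control = nn.Conv2d(dim + light_dim, latent_ch, 1)
        self.denoise = nn.Conv2d(latent_ch, latent_ch, 3, padding=1)
    def forward(self, noisy_latent, intrinsic, light):
        light_map = light[:, :, None, None].expand(-1, -1, *intrinsic.shape[2:])
        cond = self.control(torch.cat([intrinsic, light_map], dim=1))
        return self.denoise(noisy_latent + cond)  # predicted noise (toy)

enc, ctrl = ToyEncoder(), ToyLatentControlNet()
src, tgt = torch.randn(1, 3, 64, 64), torch.randn(1, 3, 64, 64)
intrinsic_src, _ = enc(src)                       # keep the source's intrinsics
_, light_tgt = enc(tgt)                           # take the target's lighting
noise_pred = ctrl(torch.randn(1, 4, 16, 16), intrinsic_src, light_tgt)
```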