Gowthami Somepalli
gowthami.bsky.social
PhD-ing at UMD. Knows a little about multimodal generative models. Check out my website to know more - https://somepago.github.io/
Reposted by Gowthami Somepalli
What’s the right resolution for such ontologies? 1,000-10,000 seems like the sweet spot.

H/t @aneeshsathe.com
aneeshsathe.com/2025/01/15/d...
Domain Ontologies: Indispensable for Knowledge Graph Construction
AI slop is all around and increasingly extraction of useful information will face difficulties as we start to feed more noise into the already noisy world of knowledge. We are in an era of unpreced…
aneeshsathe.com
January 21, 2025 at 5:47 PM
Reposted by Gowthami Somepalli
About to send my last DLCT email of the year today (in 2 hours).

Join the 7-year-old mailing list if you haven't heard of it. (And if you have heard of it but haven't joined, I trust that it's a well-thought-out decision that suits you best.)

groups.google.com/g/deep-learn...
Deep Learning Classics and Trends - Google Groups
groups.google.com
December 19, 2024 at 4:12 PM
Reposted by Gowthami Somepalli
The recording of my #NeurIPS2024 workshop talk on multimodal iterative refinement is now available to everyone who registered: neurips.cc/virtual/2024...

My talk starts at 1:10:45 into the recording.

I believe this will be made publicly available eventually, but I'm not sure when exactly!
December 18, 2024 at 4:38 AM
Reposted by Gowthami Somepalli
One of the best tutorials for understanding Transformers!

📽️ Watch here: www.youtube.com/watch?v=bMXq...

Big thanks to @giffmana.ai for this excellent content! 🙌
[M2L 2024] Transformers - Lucas Beyer
YouTube video by Mediterranean Machine Learning (M2L) summer school
www.youtube.com
December 8, 2024 at 9:58 AM
Reposted by Gowthami Somepalli
Anne Gagneux, Ségolène Martin, @quentinbertrand.bsky.social Remi Emonet and I wrote a tutorial blog post on flow matching: dl.heeere.com/conditional-... with lots of illustrations and intuition!

We got this idea after their cool work on improving Plug and Play with FM: arxiv.org/abs/2410.02423
November 27, 2024 at 9:00 AM
Reposted by Gowthami Somepalli
congratulations, @ian-goodfellow.bsky.social, for the test-of-time award at @neuripsconf.bsky.social!

this award reminds me of how GAN started with this one email ian sent to the Mila (then Lisa) lab mailing list in May 2014. super insightful and amazing execution!
November 27, 2024 at 6:31 PM
Reposted by Gowthami Somepalli
Trying to build a "books you must read" list for my lab that everyone gets when they enter. Right now it's:

- Sutton and Barto
- The Structure of Scientific Revolutions
- Strunk and White
- Maybe "Prediction, Learning, and Games", TBD

Kinda curious what's missing in an RL / science curriculum
November 25, 2024 at 5:43 PM
Reposted by Gowthami Somepalli
This is a simple and good paper, which somehow nobody working on these things cites, or even seems to be aware of: arxiv.org/abs/2406.05213. It is a simple idea that seems useful; it formulates the subjective uncertainty for natural language generation in a decision-theoretic setup.
On Subjective Uncertainty Quantification and Calibration in Natural Language Generation
Applications of large language models often involve the generation of free-form responses, in which case uncertainty quantification becomes challenging. This is due to the need to identify task-specif...
arxiv.org
November 25, 2024 at 2:16 AM
Reposted by Gowthami Somepalli
A real-time (or very fast) open-source txt2video model dropped: LTXV.

HF: huggingface.co/Lightricks/L...
Gradio: huggingface.co/spaces/Light...
Github: github.com/Lightricks/L...

Look at that prompt example though. Need to be a proper writer to get that quality.
November 23, 2024 at 8:03 PM
Reposted by Gowthami Somepalli
Perhaps an unpopular opinion, but I don't think the problem with Large Language Model evaluations is the lack of error bars.
November 22, 2024 at 2:25 PM
Reposted by Gowthami Somepalli
let me say it once more: "the gap between OAI/Anthropic/Meta/etc. and a large group of companies all over the world you've never cared to know of, in terms of LM pre-training? tiny"
November 22, 2024 at 3:29 PM
Reposted by Gowthami Somepalli
The return of the Autoregressive Image Model: AIMv2 now going multimodal.
Excellent work by @alaaelnouby.bsky.social & team with code and checkpoints already up:

arxiv.org/abs/2411.14402
November 22, 2024 at 9:44 AM
Reposted by Gowthami Somepalli
Interesting paper on arxiv this morning: arxiv.org/abs/2411.13683
It's a video masked autoencoder in which you learn which tokens to mask to process fewer of them and scale to longer videos. It's a #NeurIPS2024 paper, apparently.
I wonder if there could be such a strategy in the pure generative setup.
Extending Video Masked Autoencoders to 128 frames
Video understanding has witnessed significant progress with recent video foundation models demonstrating strong performance owing to self-supervised pre-training objectives; Masked Autoencoders (MAE) ...
arxiv.org
November 22, 2024 at 7:57 AM
I’m not getting notifications for comments here. Is anyone facing the same issue?
November 22, 2024 at 3:19 AM
Reposted by Gowthami Somepalli
Discrete diffusion has become a very hot topic again this year. Dozens of interesting ICLR submissions and some exciting attempts at scaling. Here's a bibliography on the topic from the Kuleshov group (my open office neighbors).

github.com/kuleshov-gro...
GitHub - kuleshov-group/awesome-discrete-diffusion-models: A curated list for awesome discrete diffusion models resources.
A curated list for awesome discrete diffusion models resources. - kuleshov-group/awesome-discrete-diffusion-models
github.com
November 21, 2024 at 6:39 PM
I only got to know today that this awesome diffusion starter pack exists! I’ll try to fill up my generative models pack with some complementary folks. :)
In a gratuitous attempt to acquire more followers myself 😁, I've made a start on a "starter pack". Hopefully as more people from 🐦 make it over to 🦋, we can extend this a bit. Suggestions welcome!

I've noticed not all accounts seem to be eligible to be added, anyone know what's up with that? 🤔
November 21, 2024 at 6:10 PM
Can people create accounts here without an invite now? 🤔
November 21, 2024 at 7:56 AM
I would miss not having a character limit, since my rants have grown longer the longer I’ve been in grad school! 😅
November 21, 2024 at 7:07 AM
Started a list of some researchers working on image/video generation. (Not comprehensive at all)

Reply with a paper link and TLDR to get added to the list! I request all grad students to not feel imposter-y and just reply if you work in this field!

#computervision #diffusion

go.bsky.app/SP1uWoE
November 21, 2024 at 6:18 AM
Reposted by Gowthami Somepalli
November 20, 2024 at 7:26 PM
Reposted by Gowthami Somepalli
My growing list of #computervision researchers on Bsky.

Missed you? Let me know.

go.bsky.app/M7HGC3Y
November 19, 2024 at 11:00 PM
I think we broke the app! I’m trying to retweet something and it’s not working! 😅
November 20, 2024 at 5:37 PM
Reposted by Gowthami Somepalli
Importantly, starter packs are intended as a way for newcomers to the platform to conveniently get started with finding people from their community. They cannot and should not be considered authoritative “VIP lists.”
November 20, 2024 at 10:58 AM
Need a bookmarks feature ASAP! 🥺
@bsky.app
November 20, 2024 at 6:54 AM