Winston Smith
@smithwinst0n.bsky.social
Graduate Student, ML, CV, Robotics
Reposted by Winston Smith
Yesterday the hyped Genesis simulator was released. But it's up to 10x slower than existing GPU sims, not 10-80x faster (or 430,000x faster than realtime), since their benchmarks use mostly static environments.
Blog post with corrected open-source benchmarks & details: stoneztao.substack.com/p/the-new-hy...
December 20, 2024 at 11:49 PM
Reposted by Winston Smith
Excellent post about the recent OpenAI o3 results on ARC (& other benchmarks). I don't know how @natolambert.bsky.social manages to write these so quickly! I highly recommend his newsletter.
www.interconnects.ai/p/openais-o3...
I am (more slowly) writing my own take on all this, coming soon.
o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.
December 21, 2024 at 7:52 PM
Reposted by Winston Smith
Waymo's "superhuman" crash rate is an indicator that the frequent argument that we need human-level intelligence to solve hard robotics tasks is seemingly wrong, we just need time and elbow grease
December 20, 2024 at 1:44 AM
Waymo's "superhuman" crash rate is an indicator that the frequent argument that we need human-level intelligence to solve hard robotics tasks is seemingly wrong, we just need time and elbow grease
Reposted by Winston Smith
Just gave a talk on "Grounding LLMs in Code Execution" at the NeurIPS Hacker-Cup AI Competition; here are the slides: docs.google.com/presentation...
[NeurIPS HackerCup 2024] Grounding LLMs in Code Execution
Grounding LLMs in Code Execution Gabriel Synnaeve, Meta, FAIR
December 14, 2024 at 7:11 PM
Reposted by Winston Smith
Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts
Jonathan Crabbé, Pau Rodriguez, Vaishaal Shankar, Luca Zappella, Arno Blaas
Action editor: Pavel Izmailov
https://openreview.net/forum?id=1SCptTFtmV
#imagenet #robust #robustness
December 15, 2024 at 4:07 AM
Reposted by Winston Smith
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Jiahao Lu et al. (10 co-authors)
tl;dr: run DepthPro on all frames -> inject the depth ControlNet-style into the DUSt3R decoder, finetune on dynamic scenes. Long videos are processed coarse-to-fine.
arxiv.org/abs/2412.03079
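For intuition, here is a minimal PyTorch sketch of the ControlNet-style injection described in the tl;dr. The wrapper class, dimensions, and zero-initialized projection are my assumptions for illustration, not Align3R's actual code:

```python
import torch
import torch.nn as nn

class DepthInjectedDecoderBlock(nn.Module):
    """Hypothetical sketch: wrap a pretrained (DUSt3R-style) decoder block and add
    depth conditioning through a zero-initialized projection, ControlNet-style."""

    def __init__(self, pretrained_block: nn.Module, dim: int, depth_dim: int):
        super().__init__()
        self.block = pretrained_block               # pretrained decoder block, kept frozen
        for p in self.block.parameters():
            p.requires_grad_(False)
        self.depth_proj = nn.Linear(depth_dim, dim)
        nn.init.zeros_(self.depth_proj.weight)      # zero init: training starts from the
        nn.init.zeros_(self.depth_proj.bias)        # unmodified pretrained behaviour

    def forward(self, tokens: torch.Tensor, depth_tokens: torch.Tensor) -> torch.Tensor:
        # tokens:       (B, N, dim)        image tokens flowing through the decoder
        # depth_tokens: (B, N, depth_dim)  tokens derived from the DepthPro prediction
        return self.block(tokens + self.depth_proj(depth_tokens))

# Toy usage (shapes made up): a frozen identity block with injected depth tokens.
block = DepthInjectedDecoderBlock(nn.Identity(), dim=768, depth_dim=256)
out = block(torch.randn(1, 196, 768), torch.randn(1, 196, 256))
```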
December 13, 2024 at 12:42 PM
Reposted by Winston Smith
🚀 Introducing the Byte Latent Transformer (BLT) – an LLM architecture that scales better than Llama 3 using patches instead of tokens 🤯
Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Pat...
Code 🛠️ github.com/facebookrese...
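As I understand the patching idea (my reading of the paper, not its code): a small byte-level LM scores next-byte uncertainty, and a new patch starts wherever the entropy spikes, so more compute goes to harder regions. A toy sketch, with the threshold and the byte LM as hypothetical placeholders:

```python
import torch

def entropy_patch_boundaries(byte_logits: torch.Tensor, threshold: float = 2.0) -> list[int]:
    """Toy entropy-based patching: byte_logits has shape (seq_len, 256), one row of
    next-byte logits per position from a small byte-level LM (not provided here)."""
    probs = torch.softmax(byte_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)   # (seq_len,)
    boundaries = [0]
    for i in range(1, entropy.numel()):
        if entropy[i].item() > threshold:     # uncertain next byte -> start a new patch
            boundaries.append(i)
    return boundaries  # patch k spans bytes [boundaries[k], boundaries[k+1])

# Toy usage with random logits standing in for the byte LM's output.
print(entropy_patch_boundaries(torch.randn(32, 256)))
```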
December 13, 2024 at 4:53 PM
Reposted by Winston Smith
One of the "Physics of LLMs" papers studied that and found you need a certain amount of repetitions of a factoid before it's memorized. Repetition can come from multiple epochs or from the same fact appearing in another document. The number of repeats needed also depends on model size.
December 13, 2024 at 4:27 PM
Reposted by Winston Smith
Our PRISM alignment paper won a best paper award at #NeurIPS2024!
All credits to @hannahrosekirk.bsky.social, A.Whitefield, P.Röttger, A.M.Bean, K.Margatina, R.Mosquera-Gomez, J.Ciro, @maxbartolo.bsky.social, H.He, B.Vidgen, S.Hale
Catch Hannah tomorrow at neurips.cc/virtual/2024/poster/97804
December 11, 2024 at 4:20 PM
Reposted by Winston Smith
Welcome to the Gemini 2.0 era!
I am thrilled about ✨ Gemini 2.0 Flash as it allowed us to build the next generation of Code Agents experience: developers.googleblog.com/en/the-next-...
The next chapter of the Gemini era for developers
Explore the latest with the release of Gemini 2.0 Flash and new coding agents, now available for testing in Google AI Studio.
December 11, 2024 at 4:16 PM
Reposted by Winston Smith
🌍 Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface!
🗺️ Paper, code, and demo: nicolas-dufour.github.io/plonk
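My rough mental model of "refining random guesses into trajectories", written as a toy Euler sampler over plain 2D coordinates. The velocity network, its conditioning, and the flat-coordinate simplification are all assumptions; the actual method works on the Earth's surface and may use a different sampler:

```python
import torch

@torch.no_grad()
def sample_location(velocity_net, image_embedding: torch.Tensor, steps: int = 64) -> torch.Tensor:
    """Toy sketch: start from a random (lat, lon) guess and integrate a learned,
    image-conditioned velocity field from t=0 (noise) to t=1 (data)."""
    x = torch.randn(1, 2)                          # random initial guess
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity_net(x, t0, image_embedding)   # hypothetical denoiser / velocity net
        x = x + (t1 - t0) * v                      # one Euler step along the trajectory
    return x                                       # refined location estimate
```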
December 10, 2024 at 3:56 PM
Reposted by Winston Smith
Gemini 2.0 is out, and there's a ton of interesting stuff about it. From my testing it looks like Gemini 2.0 Flash may be the best currently available multi-modal model - I upgraded my LLM plugin to support that here: github.com/simonw/llm-g...
Gemini 2.0 announcement: blog.google/technology/g...
Release 0.7 · simonw/llm-gemini
New Gemini 2.0 Flash model: llm -m gemini-2.0-flash-exp 'prompt goes here'. #28
December 11, 2024 at 5:55 PM
Reposted by Winston Smith
Can we enhance the performance of T2I models without any fine-tuning?
We show that with ReNO, our Reward-based Noise Optimization, one-step models consistently surpass all current open-source text-to-image models within a computational budget of 20-50 seconds!
#NeurIPS2024
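A minimal sketch of how I read "reward-based noise optimization": freeze a one-step generator and optimize only the initial noise against a differentiable reward. The generator, reward model, latent shape, and hyperparameters below are placeholders, not the paper's setup (the paper also regularizes the noise, which is omitted here):

```python
import torch

def optimize_noise(generator, reward_fn, prompt: str, steps: int = 50, lr: float = 0.05,
                   latent_shape=(1, 4, 64, 64), device: str = "cuda") -> torch.Tensor:
    """Keep the one-step text-to-image generator frozen; only the noise is trained."""
    noise = torch.randn(latent_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        image = generator(noise, prompt)     # one sampling step: noise -> image (differentiable)
        loss = -reward_fn(image, prompt)     # ascend the reward
        opt.zero_grad()
        loss.backward()
        opt.step()
    return noise.detach()                    # feed this optimized noise back into the generator
```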
December 11, 2024 at 11:05 PM
Reposted by Winston Smith
The best paper awardee from NeurIPS 2024 has apparently been accused of misconduct by his ByteDance peers. This certainly raises many questions:
var-integrity-report.github.io
Ethical Challenges Related to the NeurIPS 2024 Best Paper Award
December 12, 2024 at 1:35 AM
Reposted by Winston Smith
1/ 🎉 Excited to share our work, "Composed Image Retrieval for Training-Free Domain Conversion", accepted at WACV 2025! 🚀
December 5, 2024 at 12:59 PM
Reposted by Winston Smith
Now on arXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable-length sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵
December 5, 2024 at 3:01 PM
Reposted by Winston Smith
So, now that our move to OpenAI became public, @kolesnikov.ch @xzhai.bsky.social and I are drowning in notifications. I read everything, but may not reply.
Excited about this new journey! 🚀
Quick FAQ thread...
Ok, it is yesterday's news already, but a good night's sleep is important.
After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 9:23 PM
Reposted by Winston Smith
Ok, it is yesterday's news already, but a good night's sleep is important.
After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 9:14 AM
Reposted by Winston Smith
Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations. They are the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.
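A compact way to write this down (my notation, not necessarily the original post's): each interpolation $(\rho_t)_{t\in[0,1]}$ between $\rho_0$ and $\rho_1$ is advected by a velocity field $v_t$ solving the continuity equation $\partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0$, and the three constructions pick different fields. With independent endpoints, the flow-matching path is the "convolution" interpolation; the mixture path is the "averaging" one:

```latex
\begin{align*}
\text{Benamou--Brenier (optimal transport):}\quad
  & v_t \ \text{minimizes}\ \int_0^1\!\!\int \rho_t\,\lVert v_t\rVert^2 \,dx\,dt
    \ \text{subject to}\ \partial_t\rho_t + \nabla\!\cdot(\rho_t v_t)=0, \\
\text{Flow matching (convolution, } x_t=(1-t)x_0+t x_1\text{):}\quad
  & v_t(x) = \mathbb{E}\left[\,x_1 - x_0 \mid x_t = x\,\right], \\
\text{Dacorogna--Moser (averaging, } \rho_t=(1-t)\rho_0+t\rho_1\text{):}\quad
  & v_t = w/\rho_t \quad\text{with}\quad \nabla\!\cdot w = \rho_0 - \rho_1 .
\end{align*}
```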
December 4, 2024 at 1:55 PM
Reposted by Winston Smith
🤔 Why do we extract diffusion features from noisy images? Isn’t that destroying information?
Yes, it is - but we found a way to do better. 🚀
Here’s how we unlock better features, no noise, no hassle.
📝 Project Page: compvis.github.io/cleandift
💻 Code: github.com/CompVis/clea...
🧵👇
December 4, 2024 at 11:31 PM
Reposted by Winston Smith
In arxiv.org/abs/2303.00848, @dpkingma.bsky.social and @ruiqigao.bsky.social had suggested that noise augmentation could be used to make other likelihood-based models optimise perceptually weighted losses, like diffusion models do. So cool to see this working well in practice!
December 2, 2024 at 6:36 PM
Reposted by Winston Smith
A common question nowadays: Which is better, diffusion or flow matching? 🤔
Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
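One way to state the equivalence (my notation; the blog post's conventions may differ): for a Gaussian path $x_t = \alpha_t x_1 + \sigma_t \epsilon$ with $x_1 \sim p_{\mathrm{data}}$ and $\epsilon \sim \mathcal{N}(0, I)$, the flow-matching velocity is a linear reparameterization of the diffusion model's noise/score prediction:

```latex
\begin{align*}
v_t(x) &= \dot{\alpha}_t\,\hat{x}_1(x,t) + \dot{\sigma}_t\,\hat{\epsilon}(x,t),
  && \text{(marginal flow-matching velocity)} \\
\hat{\epsilon}(x,t) &= -\sigma_t\,\nabla_x \log p_t(x),
  && \text{(noise prediction = scaled score)} \\
\hat{x}_1(x,t) &= \bigl(x - \sigma_t\,\hat{\epsilon}(x,t)\bigr)/\alpha_t .
  && \text{(data prediction)}
\end{align*}
```

So integrating $\mathrm{d}x = v_t(x)\,\mathrm{d}t$ with a trained score/noise network and integrating a learned flow-matching field are the same sampler, up to this change of variables.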
December 2, 2024 at 6:45 PM
Reposted by Winston Smith
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We have been pondering this over the summer and developed a new model: JetFormer 🌊🤖
arxiv.org/abs/2411.19722
A thread 👇
1/
December 2, 2024 at 4:41 PM