Winston Smith
@smithwinst0n.bsky.social
Graduate Student, ML, CV, Robotics
Reposted by Winston Smith
Yesterday the hyped Genesis simulator was released. But it's up to 10x slower than existing GPU sims, not 10-80x faster (or 430,000x faster than realtime), since their benchmarks use mostly static environments.
Blog post with corrected open-source benchmarks & details: stoneztao.substack.com/p/the-new-hy...
December 20, 2024 at 11:49 PM
Reposted by Winston Smith
Excellent post about the recent OpenAI o3 results on ARC (& other benchmarks). I don't know how @natolambert.bsky.social manages to write these so quickly! I highly recommend his newsletter.
www.interconnects.ai/p/openais-o3...
I am (more slowly) writing my own take on all this, coming soon.
o3: The grand finale of AI in 2024
A step change as influential as the release of GPT-4. Reasoning language models are the current big thing.
December 21, 2024 at 7:52 PM
Reposted by Winston Smith
Waymo's "superhuman" crash rate is an indicator that the frequent argument that we need human-level intelligence to solve hard robotics tasks is seemingly wrong, we just need time and elbow grease
December 20, 2024 at 1:44 AM
Waymo's "superhuman" crash rate is an indicator that the frequent argument that we need human-level intelligence to solve hard robotics tasks is seemingly wrong, we just need time and elbow grease
Reposted by Winston Smith
Just gave a talk on "Grounding LLMs in Code Execution" at the NeurIPS Hacker-Cup AI Competition; here are the slides: docs.google.com/presentation...
[NeurIPS HackerCup 2024] Grounding LLMs in Code Execution
Grounding LLMs in Code Execution Gabriel Synnaeve, Meta, FAIR
December 14, 2024 at 7:11 PM
Reposted by Winston Smith
Interpreting CLIP: Insights on the Robustness to ImageNet Distribution Shifts
Jonathan Crabbé, Pau Rodriguez, Vaishaal Shankar, Luca Zappella, Arno Blaas
Action editor: Pavel Izmailov
https://openreview.net/forum?id=1SCptTFtmV
#imagenet #robust #robustness
December 15, 2024 at 4:07 AM
Reposted by Winston Smith
Align3R: Aligned Monocular Depth Estimation for Dynamic Videos
Jiahao Lu et al. (10 co-authors)
tl;dr: run DepthPro on all frames -> inject the depth ControlNet-style into the DUSt3R decoder, finetune on dynamic scenes. Long videos are processed coarse-to-fine.
arxiv.org/abs/2412.03079
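For intuition, here is a minimal PyTorch sketch of the ControlNet-style injection described in the tl;dr. The wrapper class, dimensions, and zero-initialized projection are my assumptions for illustration, not Align3R's actual code:

```python
import torch
import torch.nn as nn

class DepthInjectedDecoderBlock(nn.Module):
    """Hypothetical sketch: wrap a pretrained (DUSt3R-style) decoder block and add
    depth conditioning through a zero-initialized projection, ControlNet-style."""

    def __init__(self, pretrained_block: nn.Module, dim: int, depth_dim: int):
        super().__init__()
        self.block = pretrained_block               # pretrained decoder block, kept frozen
        for p in self.block.parameters():
            p.requires_grad_(False)
        self.depth_proj = nn.Linear(depth_dim, dim)
        nn.init.zeros_(self.depth_proj.weight)      # zero init: training starts from the
        nn.init.zeros_(self.depth_proj.bias)        # unmodified pretrained behaviour

    def forward(self, tokens: torch.Tensor, depth_tokens: torch.Tensor) -> torch.Tensor:
        # tokens:       (B, N, dim)        image tokens flowing through the decoder
        # depth_tokens: (B, N, depth_dim)  tokens derived from the DepthPro prediction
        return self.block(tokens + self.depth_proj(depth_tokens))

# Toy usage (shapes made up): a frozen identity block with injected depth tokens.
block = DepthInjectedDecoderBlock(nn.Identity(), dim=768, depth_dim=256)
out = block(torch.randn(1, 196, 768), torch.randn(1, 196, 256))
```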
December 13, 2024 at 12:42 PM
Reposted by Winston Smith
🚀 Introducing the Byte Latent Transformer (BLT) – an LLM architecture that scales better than Llama 3 using patches instead of tokens 🤯
Paper 📄 dl.fbaipublicfiles.com/blt/BLT__Pat...
Code 🛠️ github.com/facebookrese...
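As I understand the patching idea (my reading of the paper, not its code): a small byte-level LM scores next-byte uncertainty, and a new patch starts wherever the entropy spikes, so more compute goes to harder regions. A toy sketch, with the threshold and the byte LM as hypothetical placeholders:

```python
import torch

def entropy_patch_boundaries(byte_logits: torch.Tensor, threshold: float = 2.0) -> list[int]:
    """Toy entropy-based patching: byte_logits has shape (seq_len, 256), one row of
    next-byte logits per position from a small byte-level LM (not provided here)."""
    probs = torch.softmax(byte_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-9).log()).sum(dim=-1)   # (seq_len,)
    boundaries = [0]
    for i in range(1, entropy.numel()):
        if entropy[i].item() > threshold:     # uncertain next byte -> start a new patch
            boundaries.append(i)
    return boundaries  # patch k spans bytes [boundaries[k], boundaries[k+1])

# Toy usage with random logits standing in for the byte LM's output.
print(entropy_patch_boundaries(torch.randn(32, 256)))
```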
December 13, 2024 at 4:53 PM
Reposted by Winston Smith
One of the "Physics of LLMs" papers studied that and found you need a certain amount of repetitions of a factoid before it's memorized. Repetition can come from multiple epochs or from the same fact appearing in another document. The number of repeats needed also depends on model size.
December 13, 2024 at 4:27 PM
Reposted by Winston Smith
Our PRISM alignment paper won a best paper award at #NeurIPS2024!
All credits to @hannahrosekirk.bsky.social, A.Whitefield, P.Röttger, A.M.Bean, K.Margatina, R.Mosquera-Gomez, J.Ciro, @maxbartolo.bsky.social, H.He, B.Vidgen, S.Hale
Catch Hannah tomorrow at neurips.cc/virtual/2024/poster/97804
December 11, 2024 at 4:20 PM
Reposted by Winston Smith
Welcome to the Gemini 2.0 era!
I am thrilled about ✨ Gemini 2.0 Flash as it allowed us to build the next generation of Code Agents experience: developers.googleblog.com/en/the-next-...
The next chapter of the Gemini era for developers
Explore the latest with the release of Gemini 2.0 Flash and new coding agents, now available for testing in Google AI Studio.
December 11, 2024 at 4:16 PM
Reposted by Winston Smith
🌍 Guessing where an image was taken is a hard and often ambiguous problem. Introducing diffusion-based geolocation: we predict global locations by refining random guesses into trajectories across the Earth's surface!
🗺️ Paper, code, and demo: nicolas-dufour.github.io/plonk
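My rough mental model of "refining random guesses into trajectories", written as a toy Euler sampler over plain 2D coordinates. The velocity network, its conditioning, and the flat-coordinate simplification are all assumptions; the actual method works on the Earth's surface and may use a different sampler:

```python
import torch

@torch.no_grad()
def sample_location(velocity_net, image_embedding: torch.Tensor, steps: int = 64) -> torch.Tensor:
    """Toy sketch: start from a random (lat, lon) guess and integrate a learned,
    image-conditioned velocity field from t=0 (noise) to t=1 (data)."""
    x = torch.randn(1, 2)                          # random initial guess
    ts = torch.linspace(0.0, 1.0, steps + 1)
    for t0, t1 in zip(ts[:-1], ts[1:]):
        v = velocity_net(x, t0, image_embedding)   # hypothetical denoiser / velocity net
        x = x + (t1 - t0) * v                      # one Euler step along the trajectory
    return x                                       # refined location estimate
```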
December 10, 2024 at 3:56 PM
Reposted by Winston Smith
Gemini 2.0 is out, and there's a ton of interesting stuff about it. From my testing it looks like Gemini 2.0 Flash may be the best currently available multi-modal model - I upgraded my LLM plugin to support that here: github.com/simonw/llm-g...
Gemini 2.0 announcement: blog.google/technology/g...
Release 0.7 · simonw/llm-gemini
New Gemini 2.0 Flash model: llm -m gemini-2.0-flash-exp 'prompt goes here'. #28
December 11, 2024 at 5:55 PM
Reposted by Winston Smith
Can we enhance the performance of T2I models without any fine-tuning?
We show that with ReNO, our Reward-based Noise Optimization, one-step models consistently surpass all current open-source text-to-image models within a computational budget of 20-50 seconds!
#NeurIPS2024
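A minimal sketch of how I read "reward-based noise optimization": freeze a one-step generator and optimize only the initial noise against a differentiable reward. The generator, reward model, latent shape, and hyperparameters below are placeholders, not the paper's setup (the paper also regularizes the noise, which is omitted here):

```python
import torch

def optimize_noise(generator, reward_fn, prompt: str, steps: int = 50, lr: float = 0.05,
                   latent_shape=(1, 4, 64, 64), device: str = "cuda") -> torch.Tensor:
    """Keep the one-step text-to-image generator frozen; only the noise is trained."""
    noise = torch.randn(latent_shape, device=device, requires_grad=True)
    opt = torch.optim.Adam([noise], lr=lr)
    for _ in range(steps):
        image = generator(noise, prompt)     # one sampling step: noise -> image (differentiable)
        loss = -reward_fn(image, prompt)     # ascend the reward
        opt.zero_grad()
        loss.backward()
        opt.step()
    return noise.detach()                    # feed this optimized noise back into the generator
```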
December 11, 2024 at 11:05 PM
Reposted by Winston Smith
The best paper awardee from NeurIPS 2024 has apparently been accused of misconduct by his ByteDance peers. This certainly raises many questions:
var-integrity-report.github.io
Ethical Challenges Related to the NeurIPS 2024 Best Paper Award
December 12, 2024 at 1:35 AM
Reposted by Winston Smith
1/ 🎉 Excited to share our work, "Composed Image Retrieval for Training-Free Domain Conversion", accepted at WACV 2025! 🚀
December 5, 2024 at 12:59 PM
Reposted by Winston Smith
Now on arXiv
ShowHowTo: Generating Scene-Conditioned Step-by-Step Visual Instructions
arxiv.org/abs/2412.01987
soczech.github.io/showhowto/
Given one real image & a variable-length sequence of text instructions, ShowHowTo generates a multi-step sequence of images *conditioned on the scene in the REAL image*
🧵
December 5, 2024 at 3:01 PM
Reposted by Winston Smith
So, now that our move to OpenAI became public, @kolesnikov.ch @xzhai.bsky.social and I are drowning in notifications. I read everything, but may not reply.
Excited about this new journey! 🚀
Quick FAQ thread...
Ok, it is yesterday's news already, but a good night's sleep is important.
After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 9:23 PM
Reposted by Winston Smith
Ok, it is yesterday's news already, but a good night's sleep is important.
After 7 amazing years at Google Brain/DM, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 9:14 AM
Reposted by Winston Smith
Optimal transport, convolution, and averaging define interpolations between probability distributions. One can find vector fields advecting particles that match these interpolations. They are the Benamou-Brenier, flow-matching, and Dacorogna-Moser fields.
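A compact way to write this down (my notation, not necessarily the original post's): each interpolation $(\rho_t)_{t\in[0,1]}$ between $\rho_0$ and $\rho_1$ is advected by a velocity field $v_t$ solving the continuity equation $\partial_t \rho_t + \nabla\cdot(\rho_t v_t) = 0$, and the three constructions pick different fields. With independent endpoints, the flow-matching path is the "convolution" interpolation; the mixture path is the "averaging" one:

```latex
\begin{align*}
\text{Benamou--Brenier (optimal transport):}\quad
  & v_t \ \text{minimizes}\ \int_0^1\!\!\int \rho_t\,\lVert v_t\rVert^2 \,dx\,dt
    \ \text{subject to}\ \partial_t\rho_t + \nabla\!\cdot(\rho_t v_t)=0, \\
\text{Flow matching (convolution, } x_t=(1-t)x_0+t x_1\text{):}\quad
  & v_t(x) = \mathbb{E}\left[\,x_1 - x_0 \mid x_t = x\,\right], \\
\text{Dacorogna--Moser (averaging, } \rho_t=(1-t)\rho_0+t\rho_1\text{):}\quad
  & v_t = w/\rho_t \quad\text{with}\quad \nabla\!\cdot w = \rho_0 - \rho_1 .
\end{align*}
```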
December 4, 2024 at 1:55 PM
Reposted by Winston Smith
🤔 Why do we extract diffusion features from noisy images? Isn’t that destroying information?
Yes, it is - but we found a way to do better. 🚀
Here’s how we unlock better features, no noise, no hassle.
📝 Project Page: compvis.github.io/cleandift
💻 Code: github.com/CompVis/clea...
🧵👇
December 4, 2024 at 11:31 PM
Reposted by Winston Smith
In arxiv.org/abs/2303.00848, @dpkingma.bsky.social and @ruiqigao.bsky.social had suggested that noise augmentation could be used to make other likelihood-based models optimise perceptually weighted losses, like diffusion models do. So cool to see this working well in practice!
December 2, 2024 at 6:36 PM
Reposted by Winston Smith
A common question nowadays: Which is better, diffusion or flow matching? 🤔
Our answer: They’re two sides of the same coin. We wrote a blog post to show how diffusion models and Gaussian flow matching are equivalent. That’s great: It means you can use them interchangeably.
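One way to state the equivalence (my notation; the blog post's conventions may differ): for a Gaussian path $x_t = \alpha_t x_1 + \sigma_t \epsilon$ with $x_1 \sim p_{\mathrm{data}}$ and $\epsilon \sim \mathcal{N}(0, I)$, the flow-matching velocity is a linear reparameterization of the diffusion model's noise/score prediction:

```latex
\begin{align*}
v_t(x) &= \dot{\alpha}_t\,\hat{x}_1(x,t) + \dot{\sigma}_t\,\hat{\epsilon}(x,t),
  && \text{(marginal flow-matching velocity)} \\
\hat{\epsilon}(x,t) &= -\sigma_t\,\nabla_x \log p_t(x),
  && \text{(noise prediction = scaled score)} \\
\hat{x}_1(x,t) &= \bigl(x - \sigma_t\,\hat{\epsilon}(x,t)\bigr)/\alpha_t .
  && \text{(data prediction)}
\end{align*}
```

So integrating $\mathrm{d}x = v_t(x)\,\mathrm{d}t$ with a trained score/noise network and integrating a learned flow-matching field are the same sampler, up to this change of variables.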
December 2, 2024 at 6:45 PM
Reposted by Winston Smith
Have you ever wondered how to train an autoregressive generative transformer on text and raw pixels, without a pretrained visual tokenizer (e.g. VQ-VAE)?
We have been pondering this over the summer and developed a new model: JetFormer 🌊🤖
arxiv.org/abs/2411.19722
A thread 👇
1/
December 2, 2024 at 4:41 PM