Phillip Isola
phillipisola.bsky.social
Associate Professor in EECS at MIT. Neural nets, generative models, representation learning, computer vision, robotics, cog sci, AI.

https://web.mit.edu/phillipi/
This reminds me of my favorite advice on giving talks, which is from Matt Stone and Trey Parker: www.youtube.com/watch?v=vGUN...
Writing Advice from Matt Stone & Trey Parker @ NYU | MTVU's "Stand In"
YouTube video by Fabian Valdez
www.youtube.com
October 29, 2025 at 9:10 PM
I agree, but I want to push back on calling this pseudoscience; that feels like too strong a critique.

But just for the chance of a meal in Paris, happy to take that bet and probably end up wrong :)
October 17, 2025 at 7:36 PM
I agree that just knowing a lot of facts is not everything. But it seems like their benchmark includes lots more than that: working memory, reasoning, perception, etc?
October 17, 2025 at 6:40 PM
I get that people might disagree with the framing / marketing. But what makes you feel it is pseudoscience? I only skimmed it.
October 17, 2025 at 5:50 PM
I agree. I think at a certain scale, modality alignment happens without additional explicit incentives; at smaller scales, explicit alignment can be necessary.

This paper shows some effect of alignment increasing with scale, for a domain closer to remote sensing: www.arxiv.org/abs/2509.19453
The Platonic Universe: Do Foundation Models See the Same Sky?
We test the Platonic Representation Hypothesis (PRH) in astronomy by measuring representational convergence across a range of foundation models trained on different data types. Using spectroscopic and...
www.arxiv.org
October 13, 2025 at 4:59 PM
Right! It's a text only LLM.
October 13, 2025 at 4:02 PM
This work is with an amazing team including @sophielwang.bsky.social, @thisismyhat.bsky.social, Sharut Gupta, @shobsund.bsky.social, Chenyu Wang, and Stefanie Jegelka.

9/9
October 10, 2025 at 10:13 PM
More broadly, I think confusion has been created by forming hard distinctions between different modalities, especially between text and sensory data. These distinctions can obscure commonalities. We take the rhetorical stance of erasing the distinctions, and seeing where this leads.

8/9
October 10, 2025 at 10:13 PM
This work was partially inspired by Ilya Sutskever's talk here: www.youtube.com/watch?v=AKMu...

If you concatenate datasets, the model “should” figure out all the synergies and cross-modal relationships, then exploit them to make better inferences. We now have some evidence this can happen.

7/9
An Observation on Generalization
YouTube video by Simons Institute for the Theory of Computing
www.youtube.com
October 10, 2025 at 10:13 PM
Suppose you have separate datasets X, Y, Z, without known correspondences.

We do the simplest thing: just train a model (e.g., a next-token predictor) on all elements of the concatenated dataset [X,Y,Z].

You end up with a better model of dataset X than if you had trained on X alone!

6/9
October 10, 2025 at 10:13 PM
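The recipe in the post above can be sketched with a toy count-based next-token model. Everything here (the tiny datasets, the bigram model, add-alpha smoothing) is an illustrative stand-in, not the paper's actual setup:

```python
from collections import Counter

def train_bigram(corpus):
    """Count-based next-token model: bigram and context counts."""
    pair_counts, ctx_counts = Counter(), Counter()
    for seq in corpus:
        for a, b in zip(seq, seq[1:]):
            pair_counts[(a, b)] += 1
            ctx_counts[a] += 1
    return pair_counts, ctx_counts

def prob(pair_counts, ctx_counts, a, b, vocab_size, alpha=1.0):
    """Add-alpha smoothed conditional probability P(b | a)."""
    return (pair_counts[(a, b)] + alpha) / (ctx_counts[a] + alpha * vocab_size)

# Hypothetical unpaired "datasets" X, Y, Z: just lists of token sequences,
# with no known correspondences between them.
X = ["abab", "abba"]
Y = ["baba", "baab"]
Z = ["aabb"]

vocab_size = len({c for seq in X + Y + Z for c in seq})

# The simplest thing: one model on X alone, one on the concatenation [X, Y, Z].
model_x = train_bigram(X)
model_xyz = train_bigram(X + Y + Z)

p_x = prob(*model_x, "a", "b", vocab_size)      # P(b | a) from X only
p_xyz = prob(*model_xyz, "a", "b", vocab_size)  # P(b | a) from the concatenation
```

In the actual work the model is of course a neural sequence model and the claim is about held-out performance on X; this sketch only illustrates the "just concatenate and train one model" step of the recipe.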
In “Better Together: Leveraging Unpaired Multimodal Data for Stronger Unimodal Models,” we study a question I’ve wanted to make progress on for years: can you learn useful multimodal representations from *unpaired* data?

5/9
October 10, 2025 at 10:13 PM
In short: you can “just ask” an LLM to act (a bit) like an image model or an audio model.

This tells us that LLMs know more about the sensory world than we might suspect; you just have to find ways to elicit the knowledge.

4/9
October 10, 2025 at 10:13 PM
In “Words That Make Language Models Perceive,” we find that if you ask an LLM to “imagine seeing,” the way it processes text becomes more like how a vision system would represent the same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.

3/9
October 10, 2025 at 10:13 PM
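The kind of comparison described above can be illustrated with linear CKA, one standard representational-similarity metric (not necessarily the one used in the paper). The feature matrices below are synthetic stand-ins, not real model activations; the "primed" features are contrived to align with the vision features:

```python
import numpy as np

def linear_cka(A, B):
    """Linear Centered Kernel Alignment between feature matrices (n x d)."""
    A = A - A.mean(axis=0)
    B = B - B.mean(axis=0)
    num = np.linalg.norm(A.T @ B, "fro") ** 2
    den = np.linalg.norm(A.T @ A, "fro") * np.linalg.norm(B.T @ B, "fro")
    return num / den

rng = np.random.default_rng(0)
vision = rng.normal(size=(200, 16))   # stand-in for vision-model features
plain = rng.normal(size=(200, 32))    # stand-in: LLM features, plain prompt
# Stand-in for features after an "imagine seeing" prompt: by construction,
# a linear transform of the vision features, so they align with them.
primed = vision @ rng.normal(size=(16, 32))

sim_plain = linear_cka(plain, vision)
sim_primed = linear_cka(primed, vision)
```

By the Cauchy-Schwarz inequality the score lies in [0, 1], with higher values meaning the two feature sets encode more similar geometry over the same inputs; here `sim_primed` comes out well above `sim_plain`.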
For context, this work stems from the idea that all data modalities (images, sounds, text, etc.) are views of the same underlying world, and that treating them as such is useful.

We are interested in identifying commonalities between different models and modalities, and providing unifications.

2/9
October 10, 2025 at 10:13 PM
Oh I think you are right about the review process at least. Sometimes it rewards the inverse of my metric: a fancy new technique that doesn't actually achieve any new result / understanding :)
October 9, 2025 at 8:57 PM
I think papers like that are great! One of my personal metrics for paper quality is: delta in capability / delta in technique. A paper that only changes one parameter and achieves much better results should get a best paper award by this metric :)
October 9, 2025 at 2:32 PM
Unless it turns out that capable intelligence is actually not so simple!
July 31, 2025 at 9:22 PM
Yeah, it helps me to consider that much of the history of science has been about finding a simpler-than-expected explanation of something that previously seemed magical: life (evolution), motion of the planets (law of gravitation), etc. Now those are among our most celebrated discoveries.
July 31, 2025 at 9:10 PM
Of course, personally, I think we need not shy away from this possibility. Maybe intelligence is simpler than we thought, and there's a beauty in that too.
July 31, 2025 at 12:54 AM
I think part of it is that people might be overestimating the complexity of intelligence, and it's hard not to.

How weird it would be if an LLM (a Markov chain!) could explain "thinking".

It feels like it makes us less special, like Copernicus placing the sun at the center, rather than the Earth.
July 31, 2025 at 12:54 AM
I enjoy your posts! I hope you keep at it.
July 27, 2025 at 4:34 AM
Finite: right, you would need to train the student on inputs beyond the GT x's.

Wrong: the teacher could underfit and be more correct than the "GT" y's. This paper is about one version of this: arxiv.org/abs/2206.15477
Denoised MDPs: Learning World Models Better Than the World Itself
The ability to separate signal from noise, and reason with clean abstractions, is critical to intelligence. With this ability, humans can efficiently perform real world tasks without considering all p...
arxiv.org
July 16, 2025 at 4:21 PM