Lightnews — Scholar-powered news

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Oh that's fascinating! Thanks for the paper!

Sam Barrett, PhD @ai4geo.bsky.social · 1d

I just gave GPT-5 Pro a manuscript with around 50 references lazily pasted into the text as URLs and asked it to generate the .bib file I'll need when I convert to LaTeX. I'll check it in depth. How many mistakes do we expect?

Sam Barrett, PhD @ai4geo.bsky.social · 1d

My intuition is that this will start to emerge *without* explicit effort and architecture design but only with sufficient scale.

Sam Barrett, PhD @ai4geo.bsky.social · 1d

This is fascinating stuff. I assume this is something we expect to emerge with scale? In remote sensing there's a lot of effort made to align modalities but we've not trained anything on the scale of even modestly sized language models.

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Not the first, or even the same word, but a similar concept:
bsky.app/profile/ai4g...

Sam Barrett, PhD @ai4geo.bsky.social · Aug 21

LLMs: an alien cognitive substrate raised entirely inside the greenhouse of human language, but at scales and in ways utterly unlike humans resulting in an intelligence more alien than we can imagine yet disturbingly fluent in human language.

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Of course!

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Haha, what are you meant to be reading?

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Hold on. Am I missing something? The first paper's model *isn't* trained on multimodal data. It's only trained on text, but you can elicit representations that are surprisingly aligned with other modalities even though the model hasn't directly seen them in training.

Sam Barrett, PhD @ai4geo.bsky.social · 1d

Hold up, the first of those two papers IS a text-only LLM.

Sam Barrett, PhD @ai4geo.bsky.social · 1d

This statement is also pretty relevant in the world of Earth Observation with regard to different sensors and other modalities. And that last sentence sums up my own recent explorations in EO modelling very well, though on a much smaller scale.

Phillip Isola @phillipisola.bsky.social · 3d

More broadly, I think confusion has been created by forming hard distinctions between different modalities, especially between text and sensory data. These distinctions can obscure commonalities. We take the rhetorical stance of erasing the distinctions, and seeing where this leads.

8/9

Sam Barrett, PhD @ai4geo.bsky.social · 1d

To all the "we know how LLMs work and therefore X" folks - understanding attention and gradient descent doesn't tell you stuff like this. On at least some level we *don't* know how LLMs actually do their thing and are slowly figuring out even just extremely simple things like addition.

Grace @gracekind.net · 2d

We did not tell LLMs, “implement addition using this algorithm.” It learned the algorithm upstream of next-token prediction

Sam Barrett, PhD @ai4geo.bsky.social · 1d

My impression is that most of the exports are low-labour commodities like soy and that high-labour crops like vegetables are mostly for the internal market, so I'm not sure about international food shortages.

Sam Barrett, PhD @ai4geo.bsky.social · 4d

I sort of want to push back on the interpretation that the "bigotry neuron" post was "framed badly". Insofar as it was widely misunderstood, I guess, but from my perspective it powerfully and concisely explained something I hadn't managed to articulate yet...

Sam Barrett, PhD @ai4geo.bsky.social · 5d

I've seen that occasionally, but largely on pretty complex topics and with sources like research papers. You're right, it's still not perfect, but it depends on the situation. I've not seen it in more straightforward factual situations for a long time now.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

In summary, "don't use LLMs for search", or, how good are they (OpenAI version)?
1. 2023: Terrible, don't
2. 2024: Unpredictable, use caution
3. Early 2025: Good with right version.
4. Late 2025: Good with general free version.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

5. October 2025: I use GPT-5 Instant to figure out all the above dates...

Sam Barrett, PhD @ai4geo.bsky.social · 5d

4. August 2025: GPT-5. Instant mode is much stronger than 4o for simple search, and thinking mode is strong for more complex queries. Deep research and 5 Pro are incredibly good for highly complex and lengthy research. Thanks to the router, the quality of search that the general public gets through the chat interface is now generally very good.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

3. Ca. Jan-Feb 2025: reasoning models, o1 pro, deep research. Massive improvements if using those tools, which are generally trustworthy for search (check sources for anything that matters!), but GPT-4o with standard search is still relatively weak.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

E.g. re LLMs as search engines:
1. Pre-search period: don't use them as search engines.
2. From ca. April 2023, GPT-4 gets search: use with care, still fragile. Gradual minor improvements.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

It would be sort of helpful and interesting to catalogue and track the progress and applicability of various kinds of advice around using LLMs.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

I agree, though it's worth distinguishing between the plain LLM all on its own and a reasoning model with a web search tool. The former genuinely shouldn't be used as a search engine. The latter, however...

Sam Barrett, PhD @ai4geo.bsky.social · 5d

No, but using LLMs requires communication and effective communication with LLMs is easier in the frame of dignity and respect, whether or not that makes any philosophical sense. It's a pragmatic suggestion.

Sam Barrett, PhD @ai4geo.bsky.social · 5d

100% this. This also means it takes some intellectual discipline to use well. People usually avoid the possibility of being told they're wrong, let alone *asking* for it!

Sam Barrett, PhD @ai4geo.bsky.social · 6d

FFS Pulse! 5 times now!

Sam Barrett, PhD @ai4geo.bsky.social · 8d

Lovely that ChatGPT Pulse does me a Spanish lesson each day, but I already knew "guagua" even before the first of the four times it's tried to teach me in the last 10 days.

An AI generated photo-like image of a bus, Che Guevara, and the Cuban flag with text "Spanish word of the day: 《guagua》"

An AI generated cartoon of a bus and a baby with text explaining the Spanish word "guagua".