Lightnews — Scholar-powered news

Pekka Lund

@pekka.bsky.social

Apparently, the cocktail party problem is also moving from "easy for humans, hard for AI" into "AI is helping us humans" territory.

Science X / Phys.org @sciencex.bsky.social · 2d

AI-powered headphones can automatically identify and isolate conversation partners in noisy environments, enhancing speech clarity without manual input and using only a few seconds of audio. doi.org/hbd5b7

AI headphones automatically learn who you're talking to—and let you hear them better

Holding a conversation in a crowded room often leads to the frustrating "cocktail party problem," or the challenge of separating the voices of conversation partners from a hubbub.

techxplore.com

December 10, 2025 at 12:42 AM

Pekka Lund

@pekka.bsky.social

This is way over my head but as the paper states:

"Notably, no current frontier models—GPT-5.1, Claude Opus 4.5, or Gemini 3 pro—identified the error when asked to review"

Gemini 3 Pro acknowledged it missed this "logical flaw" when I asked it to review and concluded that "Oppenheim is correct".

Jonathan Oppenheim @postquantum.bsky.social · 2d

OpenAI leadership are promoting a paper in Physics Letters B where GPT-5 proposed the main idea — possibly the first peer-reviewed paper where an LLM generated the core contribution. One small problem: GPT-5's idea tests the wrong thing. My technical comment: scirate.com/arxiv/2512.0... 1/

As part of a broader effort to demonstrate AI’s potential in scientific research [1], OpenAI executives have pointed to a recently published paper by Hsu [2] as evidence that AI can contribute original ideas to physics [3, 4]. Hsu credits GPT-5 with proposing the core idea of the paper de novo, possibly the first published physics article where the main idea came from an LLM, and discusses the methodology in a companion piece [5]. We examine whether GPT-5’s criterion is correct.

December 9, 2025 at 6:50 PM

Pekka Lund

@pekka.bsky.social

Frustrated by the seemingly endless stream of hallucinated press releases by humans, which LLMs can now easily correct, I happened to find this.

It's an experiment that can be safely ignored. They asked writers to judge their own work against LLM output (so extreme biases), and used models that are now outdated.

Bethany Brookshire @beebrookshire.bsky.social · 28d

CAN LLMs write about science? @science.org decided to find out, and they did it the curious, scientific way. They did an experiment.

Love this thoughtful convo. www.lastwordonnothing.com/2025/11/12/w...

The Last Word On Nothing | Why AAAS won’t be using AI to write press releases anytime soon

www.lastwordonnothing.com

December 8, 2025 at 6:24 PM

Pekka Lund

@pekka.bsky.social

This press release and author comments seem to be in direct conflict with the paper itself, which begins by describing such flexibility in artificial networks and asks if it can be found in the brain as well.

Did they just really want to tell a story about brains having the upper hand, supported or not?

Abstract

Cognition is highly flexible—we perform many different tasks [1] and continually adapt our behaviour to changing demands [2,3]. Artificial neural networks trained to perform multiple tasks will reuse representations [4] and computational components [5] across tasks. By composing tasks from these subcomponents, an agent can flexibly switch between tasks and rapidly learn new tasks [6,7]. Yet, whether such compositionality is found in the brain is unclear.

December 8, 2025 at 4:09 PM

Pekka Lund

@pekka.bsky.social

Great essay by @blaiseaguera.bsky.social, as usual.

I very much agree with the "everything is computation" view and the significance of symbiosis & feedback loops. But I don't believe humans and biological intelligence can keep up well enough to make a meaningful contribution to where intelligence is heading.

Nature @nature.com · 11d

The advent of AI might be just the latest stage in a guiding biological process that has produced ever more complex, mutually dependent organisms over the history of life

go.nature.com/4onxvKP

What is the future of intelligence? The answer could lie in the story of its evolution

Nature - The advent of artificial intelligence might be just the latest stage in a guiding biological process that has produced ever more complex, mutually dependent organisms over the history of...

go.nature.com

November 30, 2025 at 6:12 PM

Reposted by Pekka Lund

Jack Ryan 🙏🧪

@jacksonwryan.com

On the Factor Fexcectorn and autism bicycle AI slop study: I got an answer from Springer Nature this morning that this scientific paper will be retracted! 🧪

Full story: nobreakthroughs.substack.com/p/riding-the...

Riding the Autism Bicycle to Retraction Town

Does anyone *really* know their Factor Fexcectorn?

nobreakthroughs.substack.com

November 28, 2025 at 5:25 AM

Pekka Lund

@pekka.bsky.social

Oh, this looks like a nice test case for a Gemini peer review.

And... once again it doesn't fail where humans did.

Recommendation: REJECT (Strongly) / RETRACT (if already published)
General Comments

As a reviewer, I am appalled by the quality control of this manuscript. While the topic of Explainable AI (XAI) for Autism Spectrum Disorder (ASD) diagnosis is relevant, this paper contains catastrophic errors, blatant hallucinations, and clear evidence of sloppy copy-pasting or unchecked AI generation.

The manuscript is riddled with inconsistencies that completely undermine its scientific validity. It appears that significant portions of the text and figures were generated by Large Language Models (LLMs) or image generators without any human verification. To be blunt: The paper describes an ASD diagnosis model, yet defines its accuracy metrics using "paraphrased sentences" (NLP tasks) and concludes by discussing "Lower Back Pain."

This work is scientifically unsound and does not meet the minimum standards for publication in Scientific Reports or any reputable journal.

November 27, 2025 at 10:03 PM

Pekka Lund

@pekka.bsky.social

Rather predictably, the author of this error-filled hallucination blocked me as soon as I pointed out the first clear error in my attempt to figure out just how badly he has misunderstood how LLMs work.

I miss the days when publications and authors issued corrections and retractions instead.

November 27, 2025 at 3:36 PM

Reposted by Pekka Lund

Brendan Nyhan

@brendannyhan.bsky.social

People on BlueSky: AI is useless! A stochastic parrot!

Mathematicians/biologists/physicists: It is already helping us do frontier technical research and in some cases solve open problems arxiv.org/pdf/2511.16072

(There are of course, as always, many caveats, but the paper is genuinely remarkable)

arxiv.org

November 26, 2025 at 3:51 PM

Pekka Lund

@pekka.bsky.social

I think this was the first time I apologized to Gemini for making it perform a peer review for me.

It answered:

"Don't apologize—critiquing this kind of "quantum woo" is exactly what a grumpy peer reviewer lives for. It is a fascinating train wreck."

Consciousness as the foundation: New theory addresses nature of reality

Consciousness is fundamental; only thereafter do time, space and matter arise. This is the starting point for a new theoretical model of the nature of reality, presented by Maria Strømme, Professor of...

phys.org

November 26, 2025 at 7:14 PM

Pekka Lund

@pekka.bsky.social

I kind of like it that more and more people are asking questions about LLM consciousness, since I hope that at some point it leads to more and more people asking what that actually even means in the human case.

But that seems to take an awfully long time.

Is ChatGPT Conscious?

Many users feel they’re talking to a real person. Scientists say it’s time to consider whether they’re onto something.

nymag.com

November 25, 2025 at 11:36 PM

Pekka Lund

@pekka.bsky.social

I became curious about just how misleading that "ARC is easy for humans" narrative actually is and tasked Gemini 3 on Google Antigravity with implementing my own custom ARC task viewer, which shows human and Gemini eval results for each task.

And it did all that, without me touching any code. So cool!

Pekka Lund @pekka.bsky.social · 16d

Here's one ARC-2 example task that gives some idea how misleading the "ARC is easy for humans" narrative by Arc Prize Foundation is. Is that easy to solve?

Their own human eval data shows 4/21 of human submissions were correct. And it took 175-1419 seconds to get there.

ARC Prize - Play the Game

Easy for humans, hard for AI. Try ARC-AGI.

arcprize.org

November 25, 2025 at 7:43 PM

Pekka Lund

@pekka.bsky.social

This article is just fallacies all the way down.

It's based on a June 2024 Nature paper in the same way movies are based on real events. That is, the paper doesn't really support those fallacious arguments.

It's just "an op-ed masquerading as scientific reporting", as Gemini put it.

SkynetAndChill.com @skynetandchill.com · 16d

Large language models are statistical token-prediction systems, and despite AGI claims by Mark Zuckerberg, Dario Amodei (who said AGI "may come as soon as 2026"), and Sam Altman, neuroscience suggests language alone may not produce human-level intelligence.

Is language the same as intelligence? The AI industry desperately needs it to be

The AI boom is based on a fundamental mistake.

www.theverge.com

November 25, 2025 at 3:16 PM

Pekka Lund

@pekka.bsky.social

=We forgot to add room for a battery in it.

Sarah Perez @sarahp.bsky.social · 17d

Altman describes OpenAI’s forthcoming AI device as more peaceful and calm than the iPhone

Altman describes OpenAI's forthcoming AI device as more peaceful and calm than the iPhone | TechCrunch

Altman and Ive tease a simple AI device aimed at calm, distraction-free computing, launching within two years.

techcrunch.com

November 24, 2025 at 11:41 PM

Pekka Lund

@pekka.bsky.social

I imagine that, sometime right before Gemini 3 Pro was released, there was a moment at the Anthropic office when someone shouted excitedly, "We did it! We narrowly beat OpenAI for the top spot in HLE!"

Anthropic seems to have chosen to not report this benchmark in their announcement post.

November 24, 2025 at 9:25 PM

Pekka Lund

@pekka.bsky.social

You know that AI is now on absolutely everybody's mind when even leaders of the most isolated and technologically backward tribe signal they have heard such a thing exists.

Reuters @reuters.com · 17d

EU missing the boat on AI, jeopardising its future, Lagarde warns reut.rs/4oyUdQt

EU missing the boat on AI, jeopardising its future, Lagarde warns

Europe is jeopardising its own future by missing the boat on artificial intelligence and must quickly remove obstacles that prevent the diffusion of this new technology, European Central Bank President Christine Lagarde said on Monday.

reut.rs

November 24, 2025 at 8:48 PM

Pekka Lund

@pekka.bsky.social

Opus 4.5 is here!

Introducing Claude Opus 4.5

Anthropic is an AI safety and research company that's working to build reliable, interpretable, and steerable AI systems.

www.anthropic.com

November 24, 2025 at 7:07 PM

Pekka Lund

@pekka.bsky.social

ARC-AGI is probably the most overrated and misleadingly marketed benchmark. The ARC Prize Foundation must be in denial about all its issues if they don't understand why their apples-to-oranges comparisons do not align with their expectations, which are based on very misleadingly reported human baselines.

ARC Prize @arcprize Nov 18

Frontier AI reasoning systems are now closing the complexity scaling gap between ARC-AGI-1 and ARC-AGI-2

This is surprising, as these same systems also make obvious mistakes on easy tasks (for humans) from ARC-AGI-1. We're not sure why and invite help from the community to study this phenomenon

Full solution logs are linked in last tweet

ARC Prize @arcprize
For example, ARC-AGI-1 Public Eval task http://arcprize.org/play?task=14754a24

This task involves completing cross shapes and is very intuitive for humans, while Gemini 3 Deep Think misses the nature of the task on both attempts

November 22, 2025 at 9:54 PM

Pekka Lund

@pekka.bsky.social

Oh, wow, Gemini 3 Pro has solved 9/48 of the crazy hard FrontierMath tasks. And that's not even the Deep Think variant.

Previous record was 6/48 by GPT 5/5.1/5 Pro.

Epoch AI @epochai.bsky.social · 20d

Gemini 3 Pro set a new record on FrontierMath: 38% on Tiers 1–3 and 19% on Tier 4.

On the Epoch Capabilities Index (ECI), which combines multiple benchmarks, Gemini 3 Pro scored 154, up from GPT-5.1’s previous high score of 151.

November 21, 2025 at 8:31 PM

Pekka Lund

@pekka.bsky.social

I have used Gemini daily for a year or so now, and this long-awaited release is a big deal and seems to be great.

I only know what's stated in the message below and from earlier info that it should be operated with temperature=1. My operating temperature is now 38.5C, and that ruins everything.

Ethan Mollick @emollick.bsky.social · 23d

I had access to Gemini 3. It is a very good, very fast model. It also demonstrates the change from chatbot to agent. www.oneusefulthing.org/p/three-year...

Three Years from GPT-3 to Gemini 3

From chatbots to agents

www.oneusefulthing.org

November 18, 2025 at 10:29 PM

Pekka Lund

@pekka.bsky.social

Yet another fresh Google release powered by an unspecified Gemini model.

I suspect they are now rolling out Gemini 3 behind the scenes to products (like Gemini Live already?) and other uses before the model itself is announced.

SIMA 2: A Gemini-Powered AI Agent for 3D Virtual Worlds

Introducing SIMA 2, the next milestone in our research creating general and helpful AI agents. By integrating the advanced capabilities of our Gemini models, SIMA is evolving from an instruction-foll…

deepmind.google

November 13, 2025 at 4:22 PM

Pekka Lund

@pekka.bsky.social

Putin looks pale.

The New York Times @nytimes.com · 29d

A humanoid robot powered by artificial intelligence, believed to be one of the first in Russia, face-planted during its highly anticipated debut in Moscow on Tuesday after briefly staggering onstage. nyti.ms/49Ly3GI

November 13, 2025 at 12:48 AM

Pekka Lund

@pekka.bsky.social

Graziano doesn't pull any punches:

"The question is tricky. If it means: What would convince me that AI has a magical essence of experience emerging from its inner processes? Then nothing would convince me. Such a thing does not exist. Nor do humans have it."

megan peters 🧠 @meganakpeters.bsky.social · Nov 11

What Would it Take to Convince a Neuroscientist That an AI is Conscious?

I, @anilseth.bsky.social, and Michael Graziano weigh in:
gizmodo.com/what-would-i...

Thanks to Ellyn Lapointe for the opportunity to write about this.

What Would it Take to Convince a Neuroscientist That an AI is Conscious?

Before we can test for AI consciousness, we need to understand how consciousness actually emerges, experts say.

gizmodo.com

November 12, 2025 at 12:25 AM

Pekka Lund

@pekka.bsky.social

Are you a famous scientist?

Good news! I'm planning to launch a new journal and yearly conferences in the field of the most famous candidate. Friendly peer review guaranteed, executive positions available.

This is the blueprint I'm going to follow. In the name of God, they got Susskind and Witten.