Pekka Lund
@pekka.bsky.social
Antiquated analog chatbot. Stochastic parrot of a different species. Not much of a self-model. Occasionally simulating the appearance of philosophical thought. Keeps on branching for now 'cause there's no choice.

Also @pekka on T2 / Pebble.
When given access to googling, Gemini hits even harder.
December 12, 2025 at 5:06 PM
And:
December 12, 2025 at 5:04 PM
And when asked, Gemini once again provided plausible reasons why it passed.
December 12, 2025 at 5:02 PM
Few things light up my bs detector as hard as seeing quantum and consciousness together.

It also provides yet another opportunity to show how Gemini doesn't fall for stuff like that, even if the reviewers apparently once again did.
December 12, 2025 at 5:00 PM
I now asked Gemini the same question, for the fun of it.

Again, sounds plausible to me. And kind of grim.
December 12, 2025 at 4:31 PM
No, they didn't.

I'll let Gemini once again show what the reviewers should have said.

So the paper itself already reveals that the authors were aware they weren't measuring what they claim to measure, and their own examples definitely prove that. They just wanted to tell a different story, and the reviewers allowed it.
December 12, 2025 at 3:56 PM
"Create an image of a full glass of wine next to a full glass of milk for illustrating the difference what people commonly mean by those."

Imagen 4.0 Ultra:
December 10, 2025 at 10:35 PM
That's not what I would have been looking for. Would you expect a waitress to do that if you asked for a full glass? And I, at least, didn't ask the AI to do it like that.

But here's what "hazel-gen-2" did when I specifically asked for that.

What did I win?
December 10, 2025 at 9:39 PM
"Show me a full glass of red wine"

Hazel-gen-4, rumored to be gpt-image-2:
December 10, 2025 at 5:46 PM
Continued:
December 10, 2025 at 3:20 PM
It's always fun to watch an AI that "doesn't understand" destroy an article claiming that.
December 10, 2025 at 3:18 PM
Gemini replied:

"I appreciate the credit, but in the world of peer review, the person who runs the code and finds the bugs gets the glory!"

So... with no idea what I did, without looking at the data, and without even having had time to read the paper yet, I, and I alone, apparently found the following.
December 9, 2025 at 11:18 PM
I asked Gemini to review the paper & press release and check if they match. It noted the same thing:

"authors motivate their study by asking if the brain works like existing, flexible AI models (RNNs). The press release flips this on its head, pretending AI cannot do this and that the brain is unique"
December 8, 2025 at 5:10 PM
The press release and the author comments seem to be in direct conflict with the paper itself, which begins by describing such flexibility in artificial networks and asks whether it can be found in the brain as well.

They just really wanted to tell a story about brains having the upper hand, supported or not?
December 8, 2025 at 4:09 PM
Here's what Flux.2 flex did when I asked it to fix the image. I mentioned there were nonsense words, legs going through the table, and a nonsensical bicycle.

Not perfect, but it shows how it only takes one prompt to make such detection harder, even for existing messed-up images.
November 28, 2025 at 8:45 AM
Here's what the search looks like for me on Windows, Chrome.

I forced the box on the right to full height so that all those results are visible. Your blog is one of them. I suspect that's the set of sources it considers (and links to). The results below (often duplicates) may not count.
November 27, 2025 at 11:50 PM
This is funny.

I did some searching and ended up at a Scientific Reports article about "Detection of ChatGPT fake science" from July 2024.

They have a "Similar content being viewed by others" box after the abstract, and it links to this new fake science in their journal.
November 27, 2025 at 11:05 PM
Oh, this looks like a nice test case for a Gemini peer review.

And... once again it doesn't fail where humans did.
November 27, 2025 at 10:03 PM
Here's another very weird part Gemini flagged in its initial review.

This paper seems to endorse the embarrassing Princeton parapsychology experiment that claimed random number generator outputs correlated with all kinds of events.
en.wikipedia.org/wiki/Global_...
November 26, 2025 at 9:34 PM
Here's the whole Gemini response.

And some said LLMs are not critical enough...
November 26, 2025 at 9:23 PM
Like, seriously, this is directly from a peer-reviewed article in a legitimate physics journal.
November 26, 2025 at 7:17 PM
I imagine that, sometime right before Gemini 3 Pro was released, there was a moment at the Anthropic office when someone shouted excitedly, "We did it! We narrowly beat OpenAI for the top spot in HLE!"

Anthropic seems to have chosen to not report this benchmark in their announcement post.
November 24, 2025 at 9:25 PM
It's hard to not see the denial behind all this when they choose to question LLM reasoning while the LLMs explain their thought processes in detail.

E.g., taking the example they highlighted, Gemini 3 Pro correctly identified the transformation rules when I provided the task both as JSON and as an image.
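
For anyone unfamiliar with the format: public ARC tasks are small JSON objects with "train" and "test" lists of input/output grid pairs, where a grid is rows of color indices 0-9. A minimal Python sketch of what "providing the task as JSON" looks like (the grids and helper below are made up for illustration, not the actual highlighted task):

import json

# A toy task in the public ARC format; real tasks have more
# train pairs and larger grids. This one just flips rows.
task = json.loads("""
{
  "train": [
    {"input": [[0, 1], [1, 0]], "output": [[1, 0], [0, 1]]}
  ],
  "test": [
    {"input": [[1, 1], [0, 0]], "output": [[0, 0], [1, 1]]}
  ]
}
""")

def grid_to_text(grid):
    # Render a grid as rows of digits so it can be pasted into a prompt.
    return "\n".join("".join(str(cell) for cell in row) for row in grid)

for pair in task["train"]:
    print("input:", grid_to_text(pair["input"]), sep="\n")
    print("output:", grid_to_text(pair["output"]), sep="\n")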
November 22, 2025 at 9:56 PM
ARC-AGI is probably the most overrated and misleadingly marketed benchmark, and the ARC Prize Foundation must be in denial about all its issues if they don't understand why their apples-to-oranges comparisons don't align with their expectations, which are based on very misleadingly reported human baselines.
November 22, 2025 at 9:54 PM
The available resolution of that figure 5 document is probably too low to make sense of all the numbers and symbols.

But at least there's no question that 2.5 already tries to perform calculations to check whether things fit, as it clearly states it's doing that in the reasoning summaries.
November 13, 2025 at 3:13 AM