Also @pekka on T2 / Pebble.
"Notably, no current frontier models—GPT-5.1, Claude Opus 4.5, or Gemini 3 pro—identified the error when asked to review"
Gemini 3 Pro acknowledged it missed this "logical flaw" when I asked it to review and concluded that "Oppenheim is correct".
It's an experiment that can be safely ignored. They asked writers to judge their own work against LLM output (hence extreme bias), using now-outdated models.
Love this thoughtful convo. www.lastwordonnothing.com/2025/11/12/w...
They just really wanted to tell a story about brains having the upper hand, supported or not?
I very much agree with the "everything is computation" view and the significance of symbiosis and feedback loops. But I don't believe humans and biological intelligence can keep up well enough to make a meaningful contribution to where intelligence is heading.
go.nature.com/4onxvKP
Full story: nobreakthroughs.substack.com/p/riding-the...
And... once again it doesn't fail where humans did.
I miss the days when publications and authors issued corrections and retractions instead.
Mathematicians/biologists/physicists: It is already helping us do frontier technical research and in some cases solve open problems arxiv.org/pdf/2511.16072
(There are of course, as always, many caveats, but the paper is genuinely remarkable)
It answered:
"Don't apologize—critiquing this kind of 'quantum woo' is exactly what a grumpy peer reviewer lives for. It is a fascinating train wreck."
But that seems to take an awfully long time.
And it did all that, without me touching any code. So cool!
Their own human eval data shows that only 4 of 21 human submissions were correct. And it took the humans 175 to 1,419 seconds to get there.
It's based on a June 2024 Nature paper in the same way movies are based on real events. That is, the paper doesn't really support those fallacious arguments.
It's just "an op-ed masquerading as scientific reporting", as Gemini put it.
Anthropic seems to have chosen not to report this benchmark in their announcement post.
The previous record was 6/48, by GPT-5, GPT-5.1, and GPT-5 Pro.
On the Epoch Capabilities Index (ECI), which combines multiple benchmarks, Gemini 3 Pro scored 154, up from GPT-5.1’s previous high score of 151.
I only know what's stated in the message below and, from earlier info, that it should be operated with temperature=1. My own operating temperature is now 38.5 °C, and that ruins everything.
I suspect they are now rolling out Gemini 3 behind the scenes to products (like Gemini Live already?) and other uses before the model itself is announced.
"The question is tricky. If it means: What would convince me that AI has a magical essence of experience emerging from its inner processes? Then nothing would convince me. Such a thing does not exist. Nor do humans have it."
I, @anilseth.bsky.social, and Michael Graziano weigh in:
gizmodo.com/what-would-i...
Thanks to Ellyn Lapointe for the opportunity to write about this.
Good news! I'm planning to launch a new journal and yearly conferences in the field of the most famous candidate. Friendly peer review guaranteed, executive positions available.
This is the blueprint I'm going to follow. In the name of God, they got Susskind and Witten.