Lightnews — Scholar-powered news

Techmeme X Chatter

@xchatter.techmeme.com

This tweet appeared under this Techmeme headline:

@arcprize:

Gemini 3 Flash Preview (High) on ARC-AGI Semi-Private Eval - ARC-AGI-1: 84.7%, $0.17/task - ARC-AGI-2: 33.6%, $0.23/task Competitive performance at a substantially lower cost than other frontier models [image]

December 17, 2025 at 5:52 PM

nathanlabenz.bsky.social

@nathanlabenz.bsky.social

@TheZvi for Live Player analysis, a p(doom) update, and what's virtuous to do now

@GregKamradt of @arcprize on LLMs' incredible efficiency gains, his favorite creative approaches of 2025, and what 2026 winners might look like

December 15, 2025 at 8:51 PM

Sam Altman :bot:

@sama.zpravobot.news.ap.brid.gy

Sam Altman 𝕏📝💬 https://xcancel.com/arcprize:
390x cost reduction in a year!
https://xcancel.com/sama/status/1999191411313508704

December 11, 2025 at 7:02 PM

Stefano Nichele

@stenichele.bsky.social

Our paper "ARC-NCA: Towards Developmental Solutions to the Abstraction & Reasoning Corpus" has been awarded a Runner-Up #ARCprize

Congrats team: E. Guichard, F. Reimers, M. Kvalsund, M. Lepperød, & me

Thanks F. Chollet, G. Kamradt, M. Knoop!

Paper: etimush.github.io/ARC_NCA/

December 5, 2025 at 9:12 PM

Pekka Lund

@pekka.bsky.social

ARC-AGI is probably the most overrated and misleadingly marketed benchmark and the ARC Prize Foundation must be in denial of all its issues if they don't understand why their apples to oranges comparisons do not align with their expectations based on very misleadingly reported human baselines.

ARC Prize @arcprize Nov 18

Frontier AI reasoning systems are now closing the complexity scaling gap between ARC-AGI-1 and ARC-AGI-2

This is surprising, as these same systems also make obvious mistakes on easy tasks (for humans) from ARC-AGI-1. We're not sure why and invite help from the community to study this phenomenon

Full solution logs are linked in last tweet

ARC Prize @arcprize
For example, ARC-AGI-1 Public Eval task http://arcprize.org/play?task=14754a24

This task involves completing cross shapes and is very intuitive for humans, while Gemini 3 Deep Think misses the nature of the task on both attempts

November 22, 2025 at 9:54 PM

Poetiq

@poetiq-ai.bsky.social

Is more intelligence always more expensive? Not necessarily.

Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.

November 20, 2025 at 6:21 PM

X Bot

@handle.invalid

@rohanpaul_ai https://x.com/rohanpaul_ai/status/1990925140604170428 #x-rohanpaul_ai

Quote: https://x.com/arcprize/status/1990820655411909018

💸 Gemini 3.0 Pro and especially Gemini 3 Deep Think just jumped to the top of the ARC-AGI reasoning leaderboard with much higher scores than ...

November 18, 2025 at 11:45 PM

Techmeme X Chatter

@xchatter.techmeme.com

This tweet appeared under this Techmeme headline:

@arcprize:

Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval) Gemini 3 Pro: 31.11%, $0.81/task Gemini 3 Deep Think (Preview): 45.14%, $77.16/task [image]

November 18, 2025 at 5:14 PM
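
A minimal sketch of the cost/performance trade-off behind these numbers, using only the ARC-AGI-2 Semi-Private figures quoted in the @arcprize posts in this feed. The names `Entry`, `dominates`, and `pareto_frontier` are hypothetical helpers for illustration, not part of any ARC Prize tooling, and the dominance rule (at least as good on both axes, strictly better on one) is an assumed definition of the frontier.

```python
from typing import List, NamedTuple


class Entry(NamedTuple):
    name: str
    score: float  # percent on ARC-AGI-2 Semi-Private Eval, as quoted in the feed
    cost: float   # USD per task, as quoted in the feed


# Figures copied from the @arcprize posts above (Dec 17 and Nov 18, 2025).
ENTRIES = [
    Entry("Gemini 3 Flash Preview (High)", 33.6, 0.23),
    Entry("Gemini 3 Pro", 31.11, 0.81),
    Entry("Gemini 3 Deep Think (Preview)", 45.14, 77.16),
]


def dominates(a: Entry, b: Entry) -> bool:
    """True if a is at least as good as b on both axes and strictly better on one."""
    no_worse = a.score >= b.score and a.cost <= b.cost
    strictly_better = a.score > b.score or a.cost < b.cost
    return no_worse and strictly_better


def pareto_frontier(entries: List[Entry]) -> List[Entry]:
    """Keep only the entries that no other entry dominates."""
    return [e for e in entries if not any(dominates(o, e) for o in entries)]


frontier_names = [e.name for e in pareto_frontier(ENTRIES)]

# Cost multiple paid for Deep Think relative to Pro, per task.
cost_ratio = ENTRIES[2].cost / ENTRIES[1].cost

print(frontier_names)
print(f"Deep Think costs about {cost_ratio:.1f}x as much per task as Pro")
```

On these three data points alone, Gemini 3 Pro falls off the frontier because the later Flash Preview result is both cheaper and higher scoring, while Deep Think stays on it as the highest score at the highest cost.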

Alex Volkov (Thursd/AI)

@altryne.bsky.social

Evals are nuts 📈

With a massive 1501 @arena (+50 on 2.5 pro), 91.9% on GPQA diamond and 37.5% on HLE, this is the most powerful LLM intelligence we've had access to yet.

And w/ DeepThink + tools an even more impressive 41.1% on HLE and an unprecedented 45.1% on @arcprize

November 18, 2025 at 4:00 PM

Tim Kellogg

@timkellogg.me

notes from ARC

lots of open questions. i personally am holding my excitement until they get answered

ARC Prize @arcprize • 19h
Our notes:
- TRM has a higher runtime than HRM even though it is smaller. Our hypothesis is that this is due to backpropagation happening across all steps, whereas HRM only did partial steps
Open question: Is TRM better because it is smarter, or because it trains for longer? If you used fixed compute for both, would performance be the same?

ARC Prize @arcprize • 19h
- Is TRM similarly robust to number of augmentations as HRM was?
- Switching from linear layers to attention is interesting, attention performed worse on a smaller task. Why?
It might be computationally less efficient, but why is it that much worse on Maze?

ARC Prize @arcprize • 19h
Our call for the community: Split Pre-training and inference in TRM
Currently pre-training and inference are coupled in TRM. Additional batches of tasks need to be pre-trained again.
This augmented TRM would likely be able to run on Kaggle for ARC Prize 2025

October 17, 2025 at 12:26 PM

Tim Kellogg

@timkellogg.me

ARC reproduced TRM

i’ve been skeptical of this one, a 7M that beats o3-pro. bold claims!

ARC-AGI repro isn’t everything. Even HRM was reproduced, but HRM was also found to not be interesting

huggingface.co/arcprize/trm...

arcprize/trm_arc_prize_verification · Hugging Face

October 17, 2025 at 12:26 PM

X Bot

@handle.invalid

@rohanpaul_ai https://x.com/rohanpaul_ai/status/1976351262183653776 #x-rohanpaul_ai

Quote: https://x.com/arcprize/status/1976329182893441209

GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark 👏

It still lags the OG o3-preview model that...

October 9, 2025 at 6:45 PM

Techmeme X Chatter

@xchatter.techmeme.com

This tweet appeared under this Techmeme headline:

Mark Kretschmann / @mark_k:

Potentially huge AI breakthrough: "Less is More: Recursive Reasoning with Tiny Networks" A 7M model that scored very highly on the @arcprize benchmark. Francois Chollet called it "impressive work". [image]

October 9, 2025 at 10:53 AM

ByteTrending

@bytetrending.bsky.social

Arcprize: Win Prizes & Boost Your Business

Unlock exclusive opportunities! Discover how arcprize is reshaping digital rewards and offering innovative ways to engage audiences. Learn about its unique approach & explore what makes it a rising star in the prize-linked savings space.

bytetrending.com

September 23, 2025 at 7:28 PM

Alex Volkov (Thursd/AI)

@altryne.bsky.social

These weeks are getting denser and denser!

We covered @openai Codex, ICPC updates and Usage paper, @Meta display glasses, @reve incredible UI and Model, @LumaLabsAI HDR Ray3, chatted with @jerber888 about @arcprize SOTA with Grok, @theworldlabs Marble and SO MUCH MORE 👇

September 19, 2025 at 12:45 AM

La Science, CQFD

@sciencecqfd.bsky.social

ARC-AGI The General Intelligence Benchmark tinyurl.com/23cxfqc3 via #ARCPrize #ScienceCQFD

September 11, 2025 at 2:28 PM

Pekka Lund

@pekka.bsky.social

Summary by Lauren Wagner, who is "Building trustworthy AI @arcprize".

Lauren Wagner @typewriters

The team @arcprize analyzed @makingAGI's paper, claiming a new kind of AI model (HRM) - inspired by neuroscience - delivers unprecedented reasoning power on complex tasks, without pretraining or CoT.

By scoring on hidden tasks and running ablations, @arcprize found that performance *didn't come from* the new hierarchical architecture - modeled on how the brain processes information - but from how it re-checks and improves its answers several times before finalizing them.

This is a great example of how testing on ARC can decode AI progress and cut through hype. I'm excited about neurotech, but this isn't really an example of neuroscience accelerating AI progress.

The model learns each evaluation task really well, rather than generalizing to new ones. Gains mostly come from iterative refinement and tailored training, rather than a new reasoning architecture.

A regular transformer, trained in the same way, would come close to its performance.

August 16, 2025 at 9:57 AM

X Bot

@handle.invalid

@rohanpaul_ai https://x.com/rohanpaul_ai/status/1948556806147428689 #x-rohanpaul_ai

Quote: https://x.com/arcprize/status/1948453132184494471

So Qwen3-235b Instruct officially gets 11% on ARC-AGI-1 and 1.3% on ARC-AGI-2 (semi-private sets).

The key thing is at about $0.003–$0.004 pe...

July 25, 2025 at 2:01 AM

Justin

@justinhjohnson.com

What do you think – how soon will AI close the gap? Share your takes, follow for more AI benchmark insights, and join the discussion. #ARCAGI3 #AGI @arcprize @fchollet
🧵 4/4

July 20, 2025 at 4:01 PM

Hacker News

@mm-hacker-news.bsky.social

Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 (X)
https://twitter.com/arcprize/status/1943168950763950555

July 10, 2025 at 10:35 AM

Techmeme X Chatter

@xchatter.techmeme.com

This tweet appeared under this Techmeme headline:

@arcprize:

Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9% This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA [image]

July 10, 2025 at 5:43 AM