#arcagi
Poetiq achieves SOTA on the ARC-AGI-2 semi-private eval.

They do it with harnesses on frontier models rather than RL or new models. This task-specific optimization/specialization is The Way.

poetiq.ai/posts/arcagi...
Poetiq Shatters ARC-AGI-2 State of the Art at Half the Cost
We are proud to confirm that our system has officially outperformed existing methods, establishing a new state-of-the-art by a significant margin.
poetiq.ai
December 30, 2025 at 5:50 PM
#AGI benchmarks should be developed by neutral orgs completely in private, with no contact with the internet. Candidate #LLMs would then be tested. This makes it impossible to train or fine-tune models on any benchmark. Only afterwards would results be published. The only problem: leaking. #ARCAGI #ABAP
December 23, 2024 at 3:27 PM
@melaniemitchell.bsky.social’s article sheds light on a genuine breakthrough in #AI, a shift that redefines its limits. Are we edging closer to human-level reasoning in ARC-AGI? If so, it’s a game-changer, and our understanding of AI will need a serious update. #ARCAGI #AIBenchmark #OpenAI
December 24, 2024 at 7:54 PM
Poetiq's methodology on top of Gemini 3 & GPT 5.1 exceeds average human performance on ARC-AGI-2!
This is huge.

The only caveat is that they evaluated on the public set, which might have been used in post-training of Gemini 3. Looking forward to seeing private eval results! poetiq.ai/posts/arcagi...
Traversing the Frontier of Superintelligence
Poetiq is proud to announce a major milestone in AI reasoning. We have established a new state-of-the-art (SOTA) on the ARC-AGI-1 & 2 benchmarks, significantly advancing both the performance and the e...
poetiq.ai
November 21, 2025 at 5:56 PM
That's always true, but LM Arena is the most useless benchmark. I want to see LiveBench or ARC-AGI.
February 18, 2025 at 12:04 PM
For smaller grids on the #ARCAGI test you may call #o3 "superhuman" (this depends on how you define superhuman). For larger grids, performance falls very quickly to below human level.

This may be directly related to the number of tokens involved as grid size increases.
December 25, 2024 at 4:54 PM
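The token-count point above can be made concrete with a back-of-the-envelope sketch. This is purely illustrative (not how o3 or ARC Prize serialize grids): it assumes a grid is fed to the model as rows of digits with separators, at a hypothetical average of ~2 tokens per cell, to show the quadratic growth with grid side length.

```python
# Illustrative sketch only: estimate how the token cost of a serialized
# ARC grid grows with grid size. The tokens-per-cell figure is an
# assumption, not a measured property of any real tokenizer.

def estimate_grid_tokens(rows: int, cols: int, tokens_per_cell: float = 2.0) -> int:
    """Estimate tokens for a grid serialized cell by cell.

    tokens_per_cell is a hypothetical average (one digit plus a
    separator); real tokenizers will vary.
    """
    return int(rows * cols * tokens_per_cell)

# ARC grids range from 1x1 up to 30x30.
for side in (5, 10, 20, 30):
    print(f"{side}x{side}: ~{estimate_grid_tokens(side, side)} tokens")
```

A 30x30 grid costs roughly 36x as many tokens as a 5x5 one under these assumptions, which is one plausible reason performance could degrade on larger grids.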
🔥 #OpenAI o3 model performance makes a leap, sets a new high score on the #ARCAGI benchmark.

Source: arcprize.org/blog/oai-o3-...

#ml #ai #arcagi #benchmark #openai
December 22, 2024 at 3:10 AM
Read the full analysis and get our code: poetiq.ai/posts/arcagi....
November 20, 2025 at 6:21 PM
The new Tiny Recursive Model (TRM) uses a two-layer network with just 7M parameters and reaches 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, outperforming larger LLMs. Read more: https://getnews.me/tiny-recursive-model-beats-large-models-on-arc-agi-puzzles/ #tinyrecursivemodel #arcagi #llm
October 8, 2025 at 5:24 AM
OpenAI announces its next-generation AI model "o3", achieving the feat of scoring over 85% on the ARC-AGI test
#o3 #12DaysofOpenAI #o3mini #ARCAGI #ITニュース
ITちゃんねる
dlvr.it
December 21, 2024 at 10:24 AM
If AI flunks François Chollet’s test, maybe it just struggles with colorful grids—not intelligence itself.
#AI #AGI #Intelligence #Chollet #ARCAGI #PhilosophyOfAI
The Man Out to Prove How Dumb AI Still Is
François Chollet has constructed the ultimate test for the bots.
www.theatlantic.com
April 17, 2025 at 3:27 PM
So I suppose the best thing to do today is to stare at some o3 output data on ARC-AGI. Here's a simple visualization on the public eval with o3's attempts and gt solutions. (Pls don't spam)
arcagi-o3-viz.netlify.app
arcagi-o3-viz.netlify.app
December 21, 2024 at 3:55 AM
What did comparing DeepSeek's reasoning model "DeepSeek-R1" against OpenAI's o1 & o3 reveal?
#DeepSeekR1 #ARCAGI #ARCPrize #ITニュース
ITちゃんねる
it.f-frontier.com
January 30, 2025 at 11:25 AM