#arcagi
Poetiq achieves SOTA on the ARC-AGI-2 semi-private eval.

They do it with harnesses on frontier models rather than RL or new models. This task-specific optimization/specialization is The Way.

poetiq.ai/posts/arcagi...
Poetiq Shatters ARC-AGI-2 State of the Art at Half the Cost
We are proud to confirm that our system has officially outperformed existing methods, establishing a new state-of-the-art by a significant margin.
poetiq.ai
December 30, 2025 at 5:50 PM
#AGI benchmarks should be developed by neutral orgs completely in private, with no contact with the internet. Candidate #LLMs would then be tested. This makes it impossible to train or fine-tune models on any benchmark. Only afterwards would results be published. The only problem: leaking. #ARCAGI #ABAP
December 23, 2024 at 3:27 PM
@melaniemitchell.bsky.social’s article sheds light on a genuine breakthrough in #AI, a shift that redefines its limits. Are we edging closer to human-level reasoning in ARC-AGI? If so, it’s a game-changer, and our understanding of AI will need a serious update. #ARCAGI #AIBenchmark #OpenAI
December 24, 2024 at 7:54 PM
Poetiq's methodology on top of Gemini 3 & GPT 5.1 exceeds average human performance on ARC-AGI-2!
This is huge.

The only caveat is that they evaluated on the public set, which might have been used in post-training of Gemini 3. Looking forward to seeing private eval results! poetiq.ai/posts/arcagi...
Traversing the Frontier of Superintelligence
Poetiq is proud to announce a major milestone in AI reasoning. We have established a new state-of-the-art (SOTA) on the ARC-AGI-1 & 2 benchmarks, significantly advancing both the performance and the e...
poetiq.ai
November 21, 2025 at 5:56 PM
That's always true, but LM Arena is the most useless benchmark. I want to see LiveBench or ARC-AGI.
February 18, 2025 at 12:04 PM
For smaller grids on the #ARCAGI test you may call #o3 "superhuman" (this depends on how you define superhuman). For larger grids, performance falls very quickly to below human level.

This may be directly related to the number of tokens involved as grid size increases.
December 25, 2024 at 4:54 PM
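The token-count point above can be made concrete with a back-of-the-envelope sketch. This is purely illustrative (not how o3 or ARC Prize serialize grids): it assumes a grid is fed to the model as rows of digits with separators, at a hypothetical average of ~2 tokens per cell, to show the quadratic growth with grid side length.

```python
# Illustrative sketch only: estimate how the token cost of a serialized
# ARC grid grows with grid size. The tokens-per-cell figure is an
# assumption, not a measured property of any real tokenizer.

def estimate_grid_tokens(rows: int, cols: int, tokens_per_cell: float = 2.0) -> int:
    """Estimate tokens for a grid serialized cell by cell.

    tokens_per_cell is a hypothetical average (one digit plus a
    separator); real tokenizers will vary.
    """
    return int(rows * cols * tokens_per_cell)

# ARC grids range from 1x1 up to 30x30.
for side in (5, 10, 20, 30):
    print(f"{side}x{side}: ~{estimate_grid_tokens(side, side)} tokens")
```

A 30x30 grid costs roughly 36x as many tokens as a 5x5 one under these assumptions, which is one plausible reason performance could degrade on larger grids.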
🔥 #OpenAI o3 model performance makes a leap, sets a new high score on the #ARCAGI benchmark.

Source: arcprize.org/blog/oai-o3-...

#ml #ai #arcagi #benchmark #openai
December 22, 2024 at 3:10 AM
Read the full analysis and get our code: poetiq.ai/posts/arcagi....
November 20, 2025 at 6:21 PM
The new Tiny Recursive Model (TRM) uses a two-layer network with just 7M parameters and reaches 45% accuracy on ARC-AGI-1 and 8% on ARC-AGI-2, outperforming larger LLMs. Read more: https://getnews.me/tiny-recursive-model-beats-large-models-on-arc-agi-puzzles/ #tinyrecursivemodel #arcagi #llm
October 8, 2025 at 5:24 AM
OpenAI announces its next-generation AI model "o3", achieving the feat of scoring over 85% on the ARC-AGI test
#o3 #12DaysofOpenAI #o3mini #ARCAGI #ITニュース
ITちゃんねる
dlvr.it
December 21, 2024 at 10:24 AM
If AI flunks François Chollet’s test, maybe it just struggles with colorful grids—not intelligence itself.
#AI #AGI #Intelligence #Chollet #ARCAGI #PhilosophyOfAI
The Man Out to Prove How Dumb AI Still Is
François Chollet has constructed the ultimate test for the bots.
www.theatlantic.com
April 17, 2025 at 3:27 PM
So I suppose the best thing to do today is to stare at some o3 output data on ARC-AGI. Here's a simple visualization on the public eval with o3's attempts and gt solutions. (Pls don't spam)
arcagi-o3-viz.netlify.app
arcagi-o3-viz.netlify.app
December 21, 2024 at 3:55 AM
What did comparing DeepSeek's reasoning model "DeepSeek-R1" against OpenAI's o1 & o3 reveal?
#DeepSeekR1 #ARCAGI #ARCPrize #ITニュース
ITちゃんねる
it.f-frontier.com
January 30, 2025 at 11:25 AM