@arcprize:
Gemini 3 Flash Preview (High) on ARC-AGI Semi-Private Eval
- ARC-AGI-1: 84.7%, $0.17/task
- ARC-AGI-2: 33.6%, $0.23/task
Competitive performance at a substantially lower cost than other frontier models [image]
@GregKamradt of @arcprize on LLMs' incredible efficiency gains, his favorite creative approaches of 2025, and what 2026 winners might look like
390x cost reduction in a year!
https://xcancel.com/sama/status/1999191411313508704
Congrats team: E. Guichard, F. Reimers, M. Kvalsund, M. Lepperød, & me
Thanks F. Chollet, G. Kamradt, M. Knoop!
Paper: etimush.github.io/ARC_NCA/
Introducing Poetiq. We’ve established a new SOTA and Pareto frontier on @arcprize using Gemini 3 and GPT-5.1.
Quote: https://x.com/arcprize/status/1990820655411909018
💸 Gemini 3.0 Pro and especially Gemini 3 Deep Think just jumped to the top of the ARC-AGI reasoning leaderboard with much higher scores than ...
@arcprize:
Gemini 3 models from @Google @GoogleDeepMind have made a significant 2X SOTA jump on ARC-AGI-2 (Semi-Private Eval)
Gemini 3 Pro: 31.11%, $0.81/task
Gemini 3 Deep Think (Preview): 45.14%, $77.16/task [image]
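A rough sketch of how the score-versus-cost tradeoff in these posts can be compared, using only the ARC-AGI-2 Semi-Private figures quoted in this feed. The helper name `pareto_frontier` and the dominance rule (another entry scores at least as high at no greater cost, and is strictly better on one of the two) are illustrative assumptions, not ARC Prize's own methodology.

```python
# ARC-AGI-2 Semi-Private Eval figures as quoted in the posts in this feed.
# Each entry: (name, score in percent, cost in USD per task)
results = [
    ("Gemini 3 Flash Preview (High)", 33.6, 0.23),
    ("Gemini 3 Pro", 31.11, 0.81),
    ("Gemini 3 Deep Think (Preview)", 45.14, 77.16),
]


def pareto_frontier(entries):
    """Hypothetical helper: keep entries that no other entry dominates.

    An entry is dominated when another entry has a score at least as high
    and a cost at least as low, and is strictly better on one of the two.
    """
    frontier = []
    for name, score, cost in entries:
        dominated = any(
            other_score >= score
            and other_cost <= cost
            and (other_score > score or other_cost < cost)
            for _, other_score, other_cost in entries
        )
        if not dominated:
            frontier.append((name, score, cost))
    # Sort cheapest first so the frontier reads left to right on a cost axis.
    return sorted(frontier, key=lambda entry: entry[2])


frontier = pareto_frontier(results)
for name, score, cost in frontier:
    print(f"{name}: {score}% at ${cost:.2f}/task")
```

Among these three entries only, Gemini 3 Pro drops out because Gemini 3 Flash Preview (High) is quoted with both a higher score and a lower cost per task.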
With a massive 1501 @arena (+50 on 2.5 pro), 91.9% on GPQA diamond and 37.5% on HLE, this is the most powerful LLM intelligence we've had access to yet.
And w/ DeepThink + tools an even more impressive 41.1% on HLE and an unprecedented 45.1% on @arcprize
lots of open questions. i personally am holding my excitement until they get answered
i’ve been skeptical of this one, a 7M that beats o3-pro. bold claims!
ARC-AGI repro isn’t everything. Even HRM was reproduced, but HRM was also found to not be interesting
huggingface.co/arcprize/trm...
Quote: https://x.com/arcprize/status/1976329182893441209
GPT-5 Pro now holds the highest verified frontier LLM score on ARC-AGI’s Semi-Private benchmark 👏
It still lags the OG o3-preview model that...
Mark Kretschmann / @mark_k:
Potentially huge AI breakthrough: "Less is More: Recursive Reasoning with Tiny Networks". A 7M-parameter model that scored very highly on the @arcprize benchmark. Francois Chollet called it "impressive work". [image]
Unlock exclusive opportunities! Discover how arcprize is reshaping digital rewards and offering innovative ways to engage audiences. Learn about its unique approach & explore what makes it a rising star in the prize-linked savings space.
We covered @openai Codex, ICPC updates and Usage paper, @Meta display glasses, @reve incredible UI and Model, @LumaLabsAI HDR Ray3, chatted with @jerber888 about @arcprize SOTA with Grok, @theworldlabs Marble and SO MUCH MORE 👇
Quote: https://x.com/arcprize/status/1948453132184494471
So Qwen3-235b Instruct officially gets 11% on ARC-AGI-1 and 1.3% on ARC-AGI-2 (semi-private sets).
The key thing is at about $0.003–$0.004 pe...
https://twitter.com/arcprize/status/1943168950763950555
@arcprize:
Grok 4 (Thinking) achieves new SOTA on ARC-AGI-2 with 15.9%. This nearly doubles the previous commercial SOTA and tops the current Kaggle competition SOTA [image]
Quote: https://x.com/aiDotEngineer/status/1941632491292590235
Connections is a fun word game that is still unsolvable for top reasoning models. It's like the @arcprize AGI benchmark, except that it a...