They do it with harnesses on frontier models rather than RL or new models. This task-specific optimization/specialization is The Way.
poetiq.ai/posts/arcagi...
This is huge.
Only caveat is that they evaluated on the public set - it might have been used in post-training of Gemini 3? Looking forward to seeing private eval results! poetiq.ai/posts/arcagi...
This may be directly related to the number of tokens involved as grid size increases.
Source: arcprize.org/blog/oai-o3-...
#ml #ai #arcagi #benchmark #openai
#AI #AGI #Intelligence #Chollet #ARCAGI #PhilosophyOfAI
arcagi-o3-viz.netlify.app