https://sakana.ai/careers
🔗 Blogpost: pub.sakana.ai/sudoku-gpt5/
📊 Leaderboard: pub.sakana.ai/sudoku/
📄 Report: arxiv.org/abs/2505.16135
💻 GitHub: github.com/SakanaAI/Sudoku-Bench
GitHub: github.com/SakanaAI/pet...
Online Technical Report: pub.sakana.ai/pdnca
PDF arxiv.org/abs/2510.07591
Code github.com/SakanaAI/IASC
1/ We hope that these tools will be fun to use for creating artificially constructed languages.
2/ We are interested in exploring what LLMs ‘know’ about language—not what they know about any particular language, but how much they know about and understand linguistic concepts.
GitHub: github.com/SakanaAI/IASC
GitHub Project: github.com/SakanaAI/Shi...
1) Adaptive parent sampling to balance exploration and exploitation.
2) Novelty-based rejection filtering to avoid redundant work.
3) A bandit-based LLM ensemble that dynamically picks the best model for the job.
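A rough sketch of how points 2) and 3) could look in code. This is not taken from the project: the posts only say "bandit-based" and "novelty-based", so the UCB1 rule, the cosine-similarity test, the 0.95 threshold, and all names below (`UCB1ModelSelector`, `is_novel`) are assumptions made for illustration.

```python
import math


class UCB1ModelSelector:
    """Hypothetical UCB1 bandit over a fixed list of model names."""

    def __init__(self, model_names, exploration=math.sqrt(2)):
        self.model_names = list(model_names)
        self.exploration = exploration
        self.counts = {name: 0 for name in self.model_names}
        self.total_reward = {name: 0.0 for name in self.model_names}

    def select(self):
        # Try every model once before trusting any estimate.
        for name in self.model_names:
            if self.counts[name] == 0:
                return name
        total = sum(self.counts.values())

        def score(name):
            # Mean observed reward plus an exploration bonus that
            # shrinks as a model is sampled more often.
            mean = self.total_reward[name] / self.counts[name]
            bonus = self.exploration * math.sqrt(
                math.log(total) / self.counts[name]
            )
            return mean + bonus

        return max(self.model_names, key=score)

    def update(self, name, reward):
        # Record the reward (e.g. fitness improvement) for the chosen model.
        self.counts[name] += 1
        self.total_reward[name] += reward


def is_novel(candidate, archive, threshold=0.95):
    """Accept a candidate embedding only if its cosine similarity to
    every archived embedding is at or below the (assumed) threshold."""

    def cosine(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        norm_a = math.sqrt(sum(x * x for x in a))
        norm_b = math.sqrt(sum(y * y for y in b))
        return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

    return all(cosine(candidate, other) <= threshold for other in archive)
```

In a loop, one would call `select()` to pick a model, generate a candidate, skip it if `is_novel(...)` is false, and otherwise evaluate it and pass the resulting score to `update(...)`.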
1/ AIME Math Reasoning: It evolved sophisticated agentic scaffolds that significantly outperform strong baselines, discovering a Pareto frontier of solutions trading performance for efficiency.