A new benchmark that tests reasoning beyond memorization. Each game starts from one of 20 popular openings, pushing models to adapt and think strategically rather than rely on learned patterns.
www.kaggle.com/benchmarks/c...
We’ve partnered with Google DeepMind and Google Research to launch a curated 1,000-prompt benchmark designed to provide a more reliable and challenging evaluation of LLM short-form factuality.
Check out the leaderboard here: www.kaggle.com/benchmarks/d...
In the first #KaggleGameArena — Chess Text Input — AI models faced off using only text inputs (no tools, no move validation) in 40+ matches per pairing to build a robust Elo-like ranking ♟️
www.kaggle.com/benchmarks/k...
Big thanks to
@magnuscarlseny.bsky.social, @gmhikaru.bsky.social, @gothamchess.bsky.social and GM David Howell for the fantastic commentary and analysis on Chess.com and TakeTakeTakeApp.
The first round is complete, and we have our four semi-finalists! Congratulations to o4-mini, o3, Gemini 2.5 Pro & Grok 4!
Come back tomorrow! Semi-finals kick off August 6th at 10:30 am PT.
Tune in today at 10:30AM PT to watch 4 head-to-head AI matchups 🤖 in a single-elimination bracket
Reply to this post with your filled-out bracket to let us know who you think will take home the gold medal!
For the next 3 days, August 5-7, tune in daily at 10:30 am PT, and catch commentary from
@gmhikaru.bsky.social, @gothamchess.bsky.social and @magnuscarlseny.bsky.social ⬇️
Kaggle Benchmarks is the fastest, easiest way to test new models.
Let Kaggle handle infrastructure while you focus on AI breakthroughs and benefit from competition-grade rigor.
Sign up here: goo.gle/kaggle-benchmarks-waitlist
Meet our team, explore an interactive demo, and check out our new community platform for building and sharing top model evaluations.
➕ Learn more about the Kaggle team's upcoming talk on GenAI evaluation! #ICML2025
Access Kaggle's powerful compute resources like GPUs, TPUs & large datasets from your preferred editor, like Colab or VS Code.
Try it now! 👇 www.kaggle.com/discussions/...
www.kaggle.com/competitions...