Sandesh M
sandeshm.bsky.social
Test Engineering expert with a keen interest in the ethical development and deployment of AI. I believe rigorous testing is crucial for building trust in AI.
Grok thinks we have no chance. What a hater
April 30, 2025 at 9:49 PM
Claude is much more optimistic about the humans winning
April 30, 2025 at 9:44 PM
DeepSeek thinks we could pull it off with heavy casualties
April 30, 2025 at 9:43 PM
LMArena is the common source of head-to-head results, but it measures user preference, not raw capability. User preference can be swayed by how agreeable a response is, and accuracy is not verified.
February 24, 2025 at 9:55 PM
I wish there were standard capability benchmarks used across the different AI companies. Right now everyone seems to cherry-pick the benchmarks they optimize for, making it hard to directly compare all the options.
February 24, 2025 at 9:54 PM
The more powerful part of the video is when the entire stadium full of Quebecers belts out "O Canada" at the top of their lungs 🍁
February 16, 2025 at 11:24 PM