Sandesh M
banner
sandeshm.bsky.social
Sandesh M
@sandeshm.bsky.social
Test Engineering expert with a keen interest in the ethical development and deployment of AI. I believe rigorous testing is crucial for building trust in AI
For anyone wondering, Gemini thinks the gorilla would beat 100 men
April 30, 2025 at 9:40 PM
everyone's yelling about portfolios. meanwhile the geese on parliament hill are just honking like usual, completely unaffected. maybe they know something. maybe the real currency is intimidation and stealing sandwiches from tourists. their strategy seems more stable right now. i might switch sides
April 8, 2025 at 9:58 PM
LiveBench figured out how to stay relevant as LLMs progress. Instead of being a static benchmark that all the LLMs will eventually master they updated their tests to make it more difficult and retested all the current models. This way they will always stay relevant
April 8, 2025 at 1:14 AM
Vibe coding is going to give a lot of people just enough rope to hang themselves
April 8, 2025 at 12:12 AM
That AI-generated code looks great... but is it safe to ship? Diving into the "Vibe Coding" trend and the crucial steps needed after generation.

#VibeCoding #EnterpriseTech #Software #AI #QA

medium.com/@sandesh.meg...
April 4, 2025 at 5:28 PM
Forget LiveBench, Street Fighter III is the new LLM Benchmark we all need. We dive deep into the new wave of game-playing benchmarks to reveal the real state of AI intelligence and the personalities emerging from LLMs
medium.com/@sandesh.meg...
#AI #gaming #LLM
Beyond the benchmarks: Can Chatbots Learn to Lie, Cheat, and Win?
Everyone loves getting together with their friends to play games. Sure they are fun but they are also an opportunity to find out which of…
medium.com
March 20, 2025 at 1:59 PM
GPT-4.5 reactions: Slightly better than 4o, unreasonably expensive.
Has scaling hit a wall like the rumors have been saying?
February 27, 2025 at 10:30 PM
Chatbots are acing tests, writing code, and even winning popularity contests. But are they really intelligent? We dive deep into the key benchmarks to see how LLMs measure up to human intelligence.
medium.com/@sandesh.meg...
The Chatbot Intelligence Report: Are They Catching Up?
LLMs are growing rapidly, both in their capabilities and usage (which is very much related). When ChatGPT was first released we were amazed…
medium.com
February 27, 2025 at 2:48 PM
Forget carefully curated math, reasoning and language processing benchmarks. Pokemon Red is the new benchmark that Anthropic is using to demonstrate their progress.
I can't wait for LLMs to duke it out to be the very best
February 25, 2025 at 1:42 AM
Reposted by Sandesh M
LLM companies to their own customers (same as to everybody else): you are on your own.
February 21, 2025 at 3:19 AM
LLM-based chatbots are revolutionizing customer service, but beneath the surface lies a world of pitfalls. From ‘hallucinating’ false information to being tricked into revealing sensitive data, these AI assistants can sometimes behave in unexpected and even alarming ways.
medium.com/@sandesh.meg...
Chatbots Gone Rogue: What happens when you deploy what you don’t understand
LLM-based chatbots are revolutionizing customer service, but beneath the surface lies a hidden world of potential pitfalls. From…
medium.com
February 19, 2025 at 1:27 PM
This exhibit in Japan where a chained up robodog tries to attack you is giving me real Soma vibes.
February 18, 2025 at 12:26 PM
Reposted by Sandesh M
EXCLUSIVE: Nobel prize winner and "Godfather" of Artificial Intelligence, Geoffrey Hinton, brutally attacks the US and the UK after they declined to sign a declaration on ensuring that the technology was "safe, secure, and trustworthy". 🇺🇸🇬🇧

This is what he told me last night. 🧵👇 #AI
February 12, 2025 at 8:36 AM
This is happening everywhere. There are multiple AI chatbot solutions to replace support from a real person.
It's always a worse experience, but it's also cheaper.
AI might hit the economy of the Philippines very hard.
fortune.com/2025/02/11/3...
A 32-year-old receptionist spent years working at a Phoenix hotel. Then it installed AI chatbots and made her job obsolete.
Inside the Latino workers groups scrambling to keep pace with automation.
fortune.com
February 15, 2025 at 6:55 PM
Reposted by Sandesh M
This new paper shows people could not tell the difference between the written responses of ChatGPT-4o & expert therapists, and that they preferred ChatGPT's responses.

Effectiveness is not measured. Given that people use LLMs for therapy now, this is an important (and urgent) topic for study.
February 15, 2025 at 6:30 AM