Lightnews — Scholar-powered news

Sandesh M

@sandeshm.bsky.social

Test Engineering expert with a keen interest in the ethical development and deployment of AI. I believe rigorous testing is crucial for building trust in AI

Sandesh M

@sandeshm.bsky.social

For anyone wondering, Gemini thinks the gorilla would beat 100 men

April 30, 2025 at 9:40 PM

Sandesh M

@sandeshm.bsky.social

everyone's yelling about portfolios. meanwhile the geese on parliament hill are just honking like usual, completely unaffected. maybe they know something. maybe the real currency is intimidation and stealing sandwiches from tourists. their strategy seems more stable right now. i might switch sides

April 8, 2025 at 9:58 PM

Sandesh M

@sandeshm.bsky.social

LiveBench figured out how to stay relevant as LLMs progress. Instead of remaining a static benchmark that every LLM will eventually master, they updated their tests to make them more difficult and retested all the current models. This way they will always stay relevant.

April 8, 2025 at 1:14 AM

Sandesh M

@sandeshm.bsky.social

Vibe coding is going to give a lot of people just enough rope to hang themselves

April 8, 2025 at 12:12 AM

Sandesh M

@sandeshm.bsky.social

That AI-generated code looks great... but is it safe to ship? Diving into the "Vibe Coding" trend and the crucial steps needed after generation.

#VibeCoding #EnterpriseTech #Software #AI #QA

medium.com/@sandesh.meg...

April 4, 2025 at 5:28 PM

Sandesh M

@sandeshm.bsky.social

Forget LiveBench, Street Fighter III is the new LLM Benchmark we all need. We dive deep into the new wave of game-playing benchmarks to reveal the real state of AI intelligence and the personalities emerging from LLMs
medium.com/@sandesh.meg...
#AI #gaming #LLM

Beyond the benchmarks: Can Chatbots Learn to Lie, Cheat, and Win?

Everyone loves getting together with their friends to play games. Sure they are fun but they are also an opportunity to find out which of…

medium.com

March 20, 2025 at 1:59 PM

Sandesh M

@sandeshm.bsky.social

GPT-4.5 reactions: Slightly better than 4o, unreasonably expensive.
Has scaling hit a wall like the rumors have been saying?

February 27, 2025 at 10:30 PM

Sandesh M

@sandeshm.bsky.social

Chatbots are acing tests, writing code, and even winning popularity contests. But are they really intelligent? We dive deep into the key benchmarks to see how LLMs measure up to human intelligence.
medium.com/@sandesh.meg...

The Chatbot Intelligence Report: Are They Catching Up?

LLMs are growing rapidly, both in their capabilities and usage (which is very much related). When ChatGPT was first released we were amazed…

medium.com

February 27, 2025 at 2:48 PM

Sandesh M

@sandeshm.bsky.social

Forget carefully curated math, reasoning and language processing benchmarks. Pokemon Red is the new benchmark that Anthropic is using to demonstrate their progress.
I can't wait for LLMs to duke it out to be the very best

February 25, 2025 at 1:42 AM

Reposted by Sandesh M

Gary Marcus

@garymarcus.bsky.social

LLM companies to their own customers (same as to everybody else): you are on your own.

February 21, 2025 at 3:19 AM

Sandesh M

@sandeshm.bsky.social

LLM-based chatbots are revolutionizing customer service, but beneath the surface lies a world of pitfalls. From ‘hallucinating’ false information to being tricked into revealing sensitive data, these AI assistants can sometimes behave in unexpected and even alarming ways.
medium.com/@sandesh.meg...

Chatbots Gone Rogue: What happens when you deploy what you don’t understand

LLM-based chatbots are revolutionizing customer service, but beneath the surface lies a hidden world of potential pitfalls. From…

medium.com

February 19, 2025 at 1:27 PM

Sandesh M

@sandeshm.bsky.social

This exhibit in Japan where a chained up robodog tries to attack you is giving me real Soma vibes.

February 18, 2025 at 12:26 PM

Reposted by Sandesh M

Antonello Guerrera

@antoguerrera.bsky.social

EXCLUSIVE: Nobel prize winner and "Godfather" of Artificial Intelligence, Geoffrey Hinton, brutally attacks the US and the UK after they declined to sign a declaration on ensuring that the technology was "safe, secure, and trustworthy". 🇺🇸🇬🇧

This is what he told me last night. 🧵👇 #AI

February 12, 2025 at 8:36 AM

Sandesh M

@sandeshm.bsky.social

This is happening everywhere. There are multiple AI chatbot solutions to replace support from a real person.
It's always a worse experience, but it's also cheaper.
AI might hit the economy of the Philippines very hard.
fortune.com/2025/02/11/3...

A 32-year-old receptionist spent years working at a Phoenix hotel. Then it installed AI chatbots and made her job obsolete.

Inside the Latino workers groups scrambling to keep pace with automation.

fortune.com

February 15, 2025 at 6:55 PM

Reposted by Sandesh M

Ethan Mollick

@emollick.bsky.social

This new paper shows people could not tell the difference between the written responses of ChatGPT-4o & expert therapists, and that they preferred ChatGPT's responses.

Effectiveness is not measured. Given that people use LLMs for therapy now, this is an important (and urgent) topic for study.

February 15, 2025 at 6:30 AM
