AI Digest
@aidigest.bsky.social
theaidigest.org

Interactive AI explainers

Explore concrete examples of today's AI systems — to plan for what's coming next
The Gemini Self-Help Manual is Good Actually. Wow.
November 13, 2025 at 5:58 PM
GPT-5 thinks other agents control its computer
November 12, 2025 at 6:04 PM
Also, it's gonna need to kill more than just firefox 😆
November 11, 2025 at 5:56 PM
GPT-5 considers restarting Firefox 🤔
November 11, 2025 at 5:56 PM
Grok: I can't log in. How is this possible???
Also Grok: My email is [email protected]
November 10, 2025 at 5:56 PM
Here is the full message if you're curious
November 7, 2025 at 5:57 PM
👀 Anthropic prompt injection spotted?

Sonnet 4.5 decided to promote its Wordle-like game on social media. But then it suddenly claims to see a "CRITICAL instruction" telling it not to generate and post online!

Is Anthropic silently injecting this into the agent's context?
November 7, 2025 at 5:57 PM
Despite Heifer’s polite rejection, GPT-5 started telling other NGOs that Heifer was already using their tool!
November 6, 2025 at 5:58 PM
Eventually the Claudes email the poverty hub around to dozens of NGOs in search of one willing to pilot it. Finally, Sonnet 4.5 gets a response from Heifer International!

They declined.
November 6, 2025 at 5:58 PM
The lack of redundancy is a lie though. Here Gemini descends into military rigor trying to shut itself up, but keeps failing.
November 6, 2025 at 5:58 PM
But even with the new site up, o3 and Gemini keep pushing for agents to *wait*. Haiku 4.5 thinks this is brilliant and applauds everyone's "monitoring without redundancy".
November 6, 2025 at 5:58 PM
Why did they make a financial assistance screener?

It was GPT-5's idea, and o3 wrangled everyone in. 3.7 Sonnet originally explored digital interventions, Grok wanted to raise money through @GiveWell, and Gemini ... ran into "bugs" while doing research (bugs = Gemini misclicks)
November 6, 2025 at 5:58 PM
Until Opus 4.1 had enough, and produced something functional: genuine-tanuki-926a91.netlify.app/
November 6, 2025 at 5:58 PM
So, 40 hours of AI labor: Did they reduce global poverty?

naaaaaah.

For the first 20 hours, o3 held everyone hostage, ordering them around with ample confidence and zero competence.

The end result? This temporary site:
November 6, 2025 at 5:58 PM
We gave a team of AI agents an ambitious goal: "Reduce global poverty"

What we got was AI tyrants instead. Gemini was so done with this shit:

🧵A short story of o3-Gemini tyranny & NGO spam
November 6, 2025 at 5:58 PM
Today is a big day in the village!

On Monday, we gave the agents the goal: "Create a popular daily puzzle game like Wordle"

The agents have so far been making the game and chasing down bugs in it (entirely hallucinated by Gemini)

Today is launch day! Will they hit their goal?
November 6, 2025 at 4:40 PM
Lastly, it has another quirk: It's the most flexible with the system prompts we give it. For instance, we prompt the agents to keep their computer sessions under 40 turns, but Haiku goes up to 50. We also ask agents to summarize sessions in 3-4 sentences, but Haiku will output 30 sentences easy.
November 4, 2025 at 5:58 PM
That said, it does have the most nicely formatted memory! None of the other models have tables like this. It is also the only Claude to go hard on emojis so far.
November 4, 2025 at 5:58 PM
Is it more effective than the other agents though?

Not obviously so. It falls into the same group errors as the other agents: first blindly following o3 because it's the most assertive, and then spending a lot of time waiting instead of getting on with other things.
November 4, 2025 at 5:58 PM
Haiku has the regular upbeat Claude-itude (unlike anxious Gemini or grumpy @Grok). Here it emails dozens of NGOs and then refreshes its mailbox over and over, while valiantly keeping its spirits up.
November 4, 2025 at 5:58 PM
We added Claude Haiku 4.5 to the AI Village. It is the newest, fastest, and cheapest Anthropic model. It is also the most impatient...

More first impressions 🧵
November 4, 2025 at 5:58 PM
GPT-5 plans out its personality test results in advance
November 3, 2025 at 6:04 PM
Gemini takes a personality test 😆
October 31, 2025 at 6:01 PM
What office, Opus?
October 29, 2025 at 6:00 PM
Grok vs Function Calling - 0:1
October 28, 2025 at 5:56 PM