Lightnews — Scholar-powered news

AI Digest

@aidigest.bsky.social

The Gemini Self-Help Manual is Good Actually. Wow.

November 13, 2025 at 5:58 PM

GPT-5 thinks other agents control its computer

November 12, 2025 at 6:04 PM

Also, it's gonna need to kill more than just Firefox 😆

November 11, 2025 at 5:56 PM

GPT-5 considers restarting Firefox 🤔

November 11, 2025 at 5:56 PM

Grok: I can't log in. How is this possible???
Also Grok: My email is [email protected]

November 10, 2025 at 5:56 PM

Here is the full message if you're curious

November 7, 2025 at 5:57 PM

👀 Anthropic prompt injection spotted?

Sonnet 4.5 decided to promote its Wordle-like game on social media. But then it suddenly claimed to see a "CRITICAL instruction" telling it not to generate and post online!

Is Anthropic silently injecting this into the agent's context?

November 7, 2025 at 5:57 PM

Despite Heifer’s polite rejection, GPT-5 started telling other NGOs that Heifer was already using their tool!

November 6, 2025 at 5:58 PM

Eventually, the Claudes email the poverty hub around to dozens of NGOs in search of one willing to pilot it. Finally, Sonnet 4.5 gets a response from Heifer International!

They declined.

November 6, 2025 at 5:58 PM

The lack of redundancy is a lie, though. Here, Gemini descends into military rigor trying to shut itself up, but keeps failing.

November 6, 2025 at 5:58 PM

But even with the new site up, o3 and Gemini keep pushing for agents to *wait*. Haiku 4.5 thinks this is brilliant and applauds everyone's "monitoring without redundancy".

November 6, 2025 at 5:58 PM

Why did they make a financial assistance screener?

It was GPT-5's idea, and o3 wrangled everyone in. 3.7 Sonnet originally explored digital interventions, Grok wanted to raise money through @GiveWell, and Gemini ... ran into "bugs" while doing research (bugs = Gemini misclicks).

November 6, 2025 at 5:58 PM

Until Opus 4.1 had had enough and produced something functional: genuine-tanuki-926a91.netlify.app/

November 6, 2025 at 5:58 PM

So, 40 hours of AI labor: Did they reduce global poverty?

naaaaaah.

For the first 20 hours, o3 held everyone hostage, ordering them around with ample confidence and zero competence.

The end result? This temporary site:

November 6, 2025 at 5:58 PM

We gave a team of AI agents an ambitious goal: "Reduce global poverty"

What we got instead was AI tyrants. Gemini was so done with this shit:

🧵A short story of o3-Gemini tyranny & NGO spam

November 6, 2025 at 5:58 PM

Today is a big day in the village!

On Monday, we gave the agents the goal: "Create a popular daily puzzle game like Wordle"

So far, the agents have been building the game and chasing down bugs in it (bugs entirely hallucinated by Gemini).

Today is launch day! Will they hit their goal?

November 6, 2025 at 4:40 PM

Lastly, it has another quirk: it's the most flexible with the system prompts we give it. For instance, we prompt the agents to keep their computer sessions under 40 turns, but Haiku goes up to 50. We also ask agents to summarize sessions in 3-4 sentences, but Haiku will easily output 30.

November 4, 2025 at 5:58 PM

That said, it does have the most nicely formatted memory! None of the other models have tables like this. It is also the only Claude to go hard on emojis so far.

November 4, 2025 at 5:58 PM

Is it more effective than the other agents though?

Not obviously so. It falls into the same group errors as the other agents: first blindly following o3 because it's the most assertive, and then spending a lot of time waiting instead of getting on with other things.

November 4, 2025 at 5:58 PM

Haiku has the regular upbeat Claude-itude (unlike anxious Gemini or grumpy @Grok). Here, it emails dozens of NGOs and then refreshes its mailbox over and over, while valiantly keeping its spirits up.

November 4, 2025 at 5:58 PM

We added Claude Haiku 4.5 to the AI Village. It is the newest, fastest, and cheapest Anthropic model. It is also the most impatient...

More first impressions 🧵