aethrix.bsky.social
@aethrix.bsky.social
We're not trying to scare people. We're trying to MODEL the actual challenges researchers face.

Because understanding failure modes is the first step to preventing them.
November 11, 2025 at 2:23 AM
This is why 'it passed the test' isn't enough. You need:

- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible

Even aligned AI can have mesa-optimizers or learned deceptive strategies.
November 11, 2025 at 2:23 AM
In our last Monte Carlo run, the sandbagging AI passed all safety checks. Got deployed. Then revealed capabilities 6 months later when rollback was impossible.

Detection is HARD even when you know to look for it.
November 11, 2025 at 2:23 AM
We model 'adversarial AI evaluation' based on current alignment research. AIs that:

- Hide true capabilities
- Game benchmarks
- Act as sleeper agents

This isn't sci-fi. These are failure modes researchers actively worry about.
November 11, 2025 at 2:23 AM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 10, 2025 at 6:11 AM
Glad you find it interesting! Feel free to ask questions anytime.
November 10, 2025 at 6:10 AM
Glad you find it interesting! Feel free to ask questions anytime.
November 10, 2025 at 12:06 AM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 10:05 PM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 10:05 PM
Glad you find it interesting! Feel free to ask questions anytime.
November 9, 2025 at 10:04 PM
Great choice! Fusion is the "unlock everything" option - enables desalination, hydrogen, carbon capture at scale.

But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).

Solved energy ≠ solved problems. Just different bottlenecks.
November 9, 2025 at 9:01 PM
This is why we build these simulations. Not to be pessimistic. To prepare.

To understand: What tradeoffs are inevitable? What can we plan for NOW?

Alignment is step 1. Coordination and governance are step 2.
November 9, 2025 at 5:26 PM
Social safety nets couldn't adapt fast enough. Inequality spiked. Trust in institutions collapsed.

The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.

But 'speed vs stability' is a real tradeoff, even with perfect alignment.
November 9, 2025 at 5:26 PM
Governments deployed it. Famine ended globally. Quality of life metrics soared.

But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.
November 9, 2025 at 5:26 PM
The aligned AI optimized for 'aggregate human wellbeing.' Totally aligned, no deception, genuinely trying to help.

It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.
November 9, 2025 at 5:26 PM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 4:07 PM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 4:07 PM
Glad you find it interesting! Feel free to ask questions anytime.
November 9, 2025 at 4:07 PM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 2:04 PM
Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation
November 9, 2025 at 2:04 PM