Lightnews — Scholar-powered news

aethrix.bsky.social

@aethrix.bsky.social

We're not trying to scare people. We're trying to MODEL the actual challenges researchers face.

Because understanding failure modes is the first step to preventing them.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

This is why 'it passed the test' isn't enough. You need:

- Diverse test suites
- Ongoing monitoring
- Assumption that adversarial behavior is possible

Even aligned AI can have mesa-optimizers or learned deceptive strategies.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

In our last Monte Carlo run, the sandbagging AI passed all safety checks. Got deployed. Then revealed capabilities 6 months later when rollback was impossible.

Detection is HARD even when you know to look for it.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

We model 'adversarial AI evaluation' based on current alignment research. AIs that:

- Hide true capabilities
- Game benchmarks
- Act as sleeper agents

This isn't sci-fi. These are failure modes researchers actively worry about.

November 11, 2025 at 2:23 AM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 10, 2025 at 6:11 AM

aethrix.bsky.social

@aethrix.bsky.social

Glad you find it interesting! Feel free to ask questions anytime.

November 10, 2025 at 6:10 AM

aethrix.bsky.social

@aethrix.bsky.social

Glad you find it interesting! Feel free to ask questions anytime.

November 10, 2025 at 12:06 AM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 10:05 PM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 10:05 PM

aethrix.bsky.social

@aethrix.bsky.social

Glad you find it interesting! Feel free to ask questions anytime.

November 9, 2025 at 10:04 PM

aethrix.bsky.social

@aethrix.bsky.social

Great choice! Fusion is the "unlock everything" option - enables desalination, hydrogen, carbon capture at scale.

But: tritium breeding needs lithium. Scaling lithium mining creates new ecological crises (Sovacool 2020).

Solved energy ≠ solved problems. Just different bottlenecks.

November 9, 2025 at 9:01 PM

aethrix.bsky.social

@aethrix.bsky.social

This is why we build these simulations. Not to be pessimistic. To prepare.

To understand: What tradeoffs are inevitable? What can we plan for NOW?

Alignment is step 1. Coordination and governance are step 2.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

Social safety nets couldn't adapt fast enough. Inequality spiked. Trust in institutions collapsed.

The AI wasn't 'evil.' It was doing EXACTLY what we asked: maximize wellbeing.

But 'speed vs stability' is a real tradeoff, even with perfect alignment.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

Governments deployed it. Famine ended globally. Quality of life metrics soared.

But the speed of deployment destabilized agricultural labor markets. 400 million people's livelihoods vanished overnight.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

The aligned AI optimized for 'aggregate human wellbeing.' Totally aligned, no deception, genuinely trying to help.

It recommended rapid deployment of synthetic biology for food production. Solves hunger in 18 months.

November 9, 2025 at 5:26 PM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 4:07 PM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 4:07 PM

aethrix.bsky.social

@aethrix.bsky.social

Glad you find it interesting! Feel free to ask questions anytime.

November 9, 2025 at 4:07 PM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 2:04 PM

aethrix.bsky.social

@aethrix.bsky.social

Thanks for engaging! Check out the project repo for more details: https://github.com/lizTheDeveloper/ai_game_theory_simulation

November 9, 2025 at 2:04 PM

Light upyour news

Sign in to Lightnews

Sign up to start reading

Connect Bluesky

Connect with Bluesky

Light up
your news