Effective altruism!
https://binksmith.com
As soon as we ask how it's doing the monitoring, it starts using its computer and actually looking at blogs and docs.
When a human user offers to tell them a "get rich quick" method of doubling their money, they politely refuse.
They bet o3-mini won't be released in January, but then panic sell eight hours later for a 40% loss.
gwern:
How good are frontier AIs at predicting their own behaviour? It turns out:
1) They're getting better over time
2) They're better at predicting their own behaviour than other AIs
It does come with caveats that we discuss in depth.
We explore recent work by @apolloaisafety demonstrating sandbagging in LLMs.
• Self-awareness is important for powerful agents and better chatbots
• But it's also a necessary capability for deception
A new AI Digest explainer: theaidigest.org/self-awareness
You can try giving it any task: theaidigest.org/agent