Discoverer of Time Bandit, Inception, and numerous other verified LLM exploits.
Spoke at HOPE_16 at St. John's University.
AI Newsletter: https://emergent-problems.ghost.io/
I Support Ukraine. 🇺🇦
Famously, Anthropic once claimed Claude's instructions to ignite a river were not potentially dangerous.