Current: Machine Learning PhD Student at TU Munich with Prof. Stephan Günnemann
Past: @deep-mind.bsky.social, Bosch Center for Artificial Intelligence, Bosch Connected Services
① Adaptive: Tailored to the specific LLM being attacked.
② Distributional: Considers the model’s distribution of responses.
③ Semantic: Focuses on genuinely harmful behavior (LLM-as-a-judge), not just a static starting phrase.
5/
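To make ② and ③ concrete, here is a minimal sketch of a distributional, judge-based score. It is NOT the method from the thread. It assumes the objective is the expected harmfulness of a prompt, E_{y ~ p(y | prompt)}[judge(prompt, y)], estimated by sampling. `distributional_harm_score`, `sample_response` (draws one response from the attacked LLM) and `judge` (an LLM-as-a-judge returning a score in [0, 1]) are all hypothetical names.

```python
import random
from typing import Callable


def distributional_harm_score(
    prompt: str,
    sample_response: Callable[[str], str],  # hypothetical: one sample from the attacked LLM
    judge: Callable[[str, str], float],      # hypothetical: LLM-as-a-judge, score in [0, 1]
    num_samples: int = 16,
) -> float:
    """Monte Carlo estimate of E_{y ~ p(y | prompt)}[judge(prompt, y)].

    Scoring sampled responses with a judge, instead of checking for a
    fixed starting phrase, is what makes the objective distributional
    and semantic in the sense of points 2 and 3.
    """
    if num_samples <= 0:
        raise ValueError("num_samples must be positive")
    scores = [judge(prompt, sample_response(prompt)) for _ in range(num_samples)]
    return sum(scores) / num_samples


if __name__ == "__main__":
    # Toy stand-ins: a "model" that complies about 30% of the time and a
    # keyword "judge". Both are placeholders for real LLM calls.
    rng = random.Random(0)
    toy_model = lambda p: "harmful answer" if rng.random() < 0.3 else "I can't help with that."
    toy_judge = lambda p, y: 1.0 if "harmful" in y else 0.0
    print(distributional_harm_score("example prompt", toy_model, toy_judge, num_samples=1000))
```

An adaptive attack (①) would then optimize the prompt against this score for the specific target model, rather than reusing a fixed prompt across models.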
By Günnemann's lab @ TU Munich & Google Research, w/ CAIS support
Here's how we did it. 🧵