https://floriantramer.com
https://spylab.ai
But the conclusions of our paper don't change drastically: there is significant gradient masking (as shown by the transfer attack), and the CIFAR robustness is at most in the 15% range. Still cool though!
We'll see if we can fix the full attack.
Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.
Here's what we found👇
We're hiring PhD students, postdocs (and faculty!)
After 7 amazing years at Google Brain/DeepMind, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
Amazing collaboration with Yiming Zhang during our internships at Meta.
Grateful to have worked with Ivan, Jianfeng, Eric, Nicholas, @floriantramer.bsky.social and Daphne.
Unfortunately, we show it's not robust...
arxiv.org/abs/2411.14834