floriantramer.bsky.social
@floriantramer.bsky.social
Assistant professor of computer science at ETH Zürich. Interested in Security, Privacy and Machine Learning.
https://floriantramer.com
https://spylab.ai
Reposted
This was an unfortunate mistake, sorry about that.

But the conclusions of our paper don't change drastically: there is significant gradient masking (as shown by the transfer attack), and the CIFAR robustness is at most in the 15% range. Still cool though!
We'll see if we can fix the full attack
December 12, 2024 at 4:38 PM
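For readers following along, here is a minimal sketch (not the paper's actual code) of the kind of transfer check that exposes gradient masking: adversarial examples are crafted with standard L∞ PGD on an undefended surrogate model and then evaluated on the defended model. If the transferred examples break the defense while examples crafted on the defense's own gradients do not, the white-box gradients are masked rather than the model robust. The models, inputs, and PGD hyperparameters below are illustrative assumptions.

```python
# Sketch only: transfer check for gradient masking on a CIFAR-scale model.
# `defended_model` and `surrogate_model` are hypothetical classifiers taking
# inputs in [0, 1]; hyperparameters follow common CIFAR-10 practice.
import torch
import torch.nn.functional as F

def pgd_linf(model, x, y, eps=8 / 255, alpha=2 / 255, steps=20):
    """Standard L-infinity PGD inside an eps-ball around x."""
    x_adv = x.clone().detach()
    for _ in range(steps):
        x_adv.requires_grad_(True)
        loss = F.cross_entropy(model(x_adv), y)
        grad = torch.autograd.grad(loss, x_adv)[0]
        x_adv = x_adv.detach() + alpha * grad.sign()
        # Project back into the eps-ball and the valid pixel range.
        x_adv = torch.min(torch.max(x_adv, x - eps), x + eps).clamp(0, 1)
    return x_adv.detach()

@torch.no_grad()
def accuracy(model, x, y):
    return (model(x).argmax(dim=-1) == y).float().mean().item()

def gradient_masking_check(defended_model, surrogate_model, x, y):
    # White-box attack using the defense's own gradients.
    acc_white = accuracy(defended_model, pgd_linf(defended_model, x, y), y)
    # Transfer attack crafted on an undefended surrogate.
    acc_transfer = accuracy(defended_model, pgd_linf(surrogate_model, x, y), y)
    # acc_transfer << acc_white suggests the white-box gradients are masked.
    return acc_white, acc_transfer
```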
Reposted
I discovered a fatal flaw in a paper by @floriantramer.bsky.social et al. claiming to break our Ensemble Everything Everywhere defense. Due to a coding error they used attack perturbations 20x above the standard 8/255 budget. They confirmed this, but the paper is already out & quoted on OpenReview. What should we do now?
December 12, 2024 at 4:29 PM
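For context: "8/255" is the conventional L∞ perturbation budget in CIFAR-10 robustness evaluations, with pixel values scaled to [0, 1]. A budget 20x larger lets each pixel move by roughly 0.63, enough to repaint most of an image, so no classifier can be meaningfully robust at that scale. The snippet below is just this back-of-the-envelope arithmetic, not anyone's evaluation code.

```python
# Comparison of the two perturbation budgets, assuming pixels in [0, 1].
standard_eps = 8 / 255             # ~0.031: the usual L-inf budget on CIFAR-10
erroneous_eps = 20 * standard_eps  # ~0.627: allows near-arbitrary image changes
print(f"standard: {standard_eps:.3f}, 20x larger: {erroneous_eps:.3f}")
```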
Reposted
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨

Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.

Here's what we found👇
December 6, 2024 at 5:47 PM
Come do open AI with us in Zurich!
We're hiring PhD students, postdocs (and faculty!)
Zurich is a great place to live and do research. It became a slightly better one overnight! Excited to see OAI opening an office here with such a great starting team 🎉
Ok, it is yesterday's news already, but a good night's sleep is important.

After 7 amazing years at Google Brain/DeepMind, I am joining OpenAI. Together with @xzhai.bsky.social and @giffmana.ai, we will establish the OpenAI Zurich office. Proud of our past work and looking forward to the future.
December 4, 2024 at 1:49 PM
Reposted
Full paper: arxiv.org/abs/2410.13722
Amazing collaboration with Yiming Zhang during our internships at Meta.

Grateful to have worked with Ivan, Jianfeng, Eric, Nicholas, @floriantramer.bsky.social and Daphne.
Persistent Pre-Training Poisoning of LLMs
Large language models are pre-trained on uncurated text datasets consisting of trillions of tokens scraped from the Web. Prior work has shown that: (1) web-scraped pre-training datasets can be practic...
arxiv.org
November 25, 2024 at 12:27 PM
Ensemble Everything Everywhere is a defense against adversarial examples that people got quite excited about a few months ago (in particular, the defense causes "perceptually aligned" gradients just like adversarial training)

Unfortunately, we show it's not robust...

arxiv.org/abs/2411.14834
Gradient Masking All-at-Once: Ensemble Everything Everywhere Is Not Robust
Ensemble everything everywhere is a defense to adversarial examples that was recently proposed to make image classifiers robust. This defense works by ensembling a model's intermediate representations...
arxiv.org
November 25, 2024 at 8:38 AM
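For background, "ensembling a model's intermediate representations" can be pictured as attaching a lightweight classification head to each intermediate stage of a network and aggregating their predictions. The sketch below shows that general idea only; it is not the defense's actual architecture or code, which combines this with multi-scale inputs and a specific aggregation rule not reproduced here. All module and parameter names are illustrative assumptions.

```python
# Minimal sketch of ensembling intermediate-layer predictions (not the
# actual "Ensemble Everything Everywhere" implementation).
import torch
import torch.nn as nn

class IntermediateEnsemble(nn.Module):
    def __init__(self, blocks, feat_dims, num_classes=10):
        """blocks: list of backbone stages; feat_dims: channels after each stage."""
        super().__init__()
        self.blocks = nn.ModuleList(blocks)
        self.pool = nn.AdaptiveAvgPool2d(1)
        # One linear prediction head per intermediate stage.
        self.heads = nn.ModuleList(nn.Linear(d, num_classes) for d in feat_dims)

    def forward(self, x):
        logits = []
        for block, head in zip(self.blocks, self.heads):
            x = block(x)
            logits.append(head(self.pool(x).flatten(1)))
        # Simple average of per-stage predictions; the real defense uses a
        # more elaborate aggregation across stages.
        return torch.stack(logits).mean(dim=0)
```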