Jakub Łucki
jakublucki.bsky.social
Visiting Researcher at NASA JPL | Data Science MSc at ETH Zurich
Just arrived in Vancouver for #NeurIPS! If you’d like to chat about cutting-edge research, let me know! I’ve always been curious about far too many things (for my own good), so all topics are welcome.

If you can’t catch me during the week, stop by our poster on the weekend or join the presentation!
December 10, 2024 at 1:58 AM
Our paper on how unlearning fails to remove hazardous knowledge from LLM weights received the 🏆 Best Paper 🏆 award at SoLaR @ NeurIPS!

Join my oral presentation on Saturday at 4:30 pm to learn more.
December 6, 2024 at 5:58 PM
🚨Unlearned hazardous knowledge can be retrieved from LLMs 🚨

Our results show that current unlearning methods for AI safety only obfuscate dangerous knowledge, just like standard safety training.

Here's what we found👇
December 6, 2024 at 5:47 PM