More details: https://www.andriushchenko.me/
3/3
2/3
Join us on this journey!
11/11
10/n
9/n
8/n
7/n
- International AI Safety Report,
- An Approach to Technical AGI Safety and Security by DeepMind,
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research.
6/n
5/n
4/n
2/n
It will be presented as an oral at the WCUA workshop at ICML 2025!
Paper: arxiv.org/abs/2506.14866
Code: github.com/tml-epfl/os-...
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
This is joint work with amazing collaborators: Joshua Freeman, Chloe Rippe, and Edoardo Debenedetti.
🧵3/n
4. We also provide a legal analysis of this case in light of our findings.
We will present this work at the Safe Gen AI Workshop at NeurIPS 2024 on Sunday.
🧵2/n
Thanks to the amazing collaborators for making all of this work possible!
🧵4/4
Workshops:
1. Does Refusal Training in LLMs Generalize to the Past Tense? (arxiv.org/abs/2407.11969)
2. Is In-Context Learning Sufficient for Instruction Following in LLMs? (arxiv.org/abs/2405.19874)
🧵3/4
Main track:
1. Why Do We Need Weight Decay in Modern Deep Learning? (arxiv.org/abs/2310.04415)
2. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models (arxiv.org/abs/2404.01318)
🧵2/4