Maksym Andriushchenko
banner
maksym-andr.bsky.social
Maksym Andriushchenko
@maksym-andr.bsky.social
Faculty at ‪the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems. Leading the AI Safety and Alignment group. PhD from EPFL supported by Google & OpenPhil PhD fellowships.

More details: https://www.andriushchenko.me/
🚨 Incredibly excited to share that I'm starting my research group focusing on AI safety and alignment at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems in September 2025! 🚨

1/n
August 6, 2025 at 3:43 PM
Main findings based on frontier LLMs:
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
June 19, 2025 at 3:28 PM
🚨Excited to release OS-Harm! 🚨

The safety of computer use agents has been largely overlooked.

We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
June 19, 2025 at 3:28 PM
🚨Excited to share our new work!

1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.

2. Very large models, particularly with >100B parameters, have memorized significantly more.

🧵1/n
December 9, 2024 at 10:01 PM