Maksym Andriushchenko
@maksym-andr.bsky.social
Faculty at the ELLIS Institute Tübingen and Max Planck Institute for Intelligent Systems. Leading the AI Safety and Alignment group. PhD from EPFL supported by Google & OpenPhil PhD fellowships.

More details: https://www.andriushchenko.me/
🙌 It's been a pleasure to participate in this project alongside so many amazing experts!

3/3
October 15, 2025 at 12:24 PM
This is the first key update before the release of the full report early next year. Both this update and the upcoming report discuss the critically important topics of AI risks and capabilities in a balanced way, carefully weighing all available scientific evidence.

2/3
October 15, 2025 at 12:24 PM
Thank you so much, Nicolas! :)
August 6, 2025 at 7:35 PM
We believe getting this—some may call it "AGI"—right is one of the most important challenges of our time.

Join us on this journey!

11/11
August 6, 2025 at 3:43 PM
Taking this into account, we are only interested in studying methods that are general and scale with intelligence and compute. Everything that helps to advance their safety and alignment with societal values is relevant to us.

10/n
August 6, 2025 at 3:43 PM
Broader vision. Current machine learning methods are fundamentally different from what they used to be pre-2022. The Bitter Lesson summarized and predicted this shift very well back in 2019: "general methods that leverage computation are ultimately the most effective".

9/n
August 6, 2025 at 3:43 PM
... —literally anything that can be genuinely useful for other researchers and the general public.

8/n
August 6, 2025 at 3:43 PM
Research style. We are not necessarily interested in getting X papers accepted at NeurIPS/ICML/ICLR. We are interested in making an impact: this can be papers (and NeurIPS/ICML/ICLR are great venues), but also open-source repositories, benchmarks, blog posts, even social media posts ...

7/n
August 6, 2025 at 3:43 PM
For more information about research topics relevant to our group, please check the following documents:
- International AI Safety Report,
- An Approach to Technical AGI Safety and Security by DeepMind,
- Open Philanthropy’s 2025 RFP for Technical AI Safety Research.

6/n
August 6, 2025 at 3:43 PM
We're also interested in rigorous AI evaluations and informing the public about the risks and capabilities of frontier AI models. Additionally, we aim to advance our understanding of how AI models generalize, which is crucial for ensuring their steerability and reducing associated risks.

5/n
August 6, 2025 at 3:43 PM
Research group. We will focus on developing algorithmic solutions to reduce harms from advanced general-purpose AI models. We're particularly interested in the alignment of autonomous LLM agents, which are becoming increasingly capable and pose a variety of emerging risks.

4/n
August 6, 2025 at 3:43 PM
Hiring. I'm looking for multiple PhD students: both those able to start in Fall 2025 and those applying through centralized programs like CLS, IMPRS, and ELLIS (deadlines in November) to start in Spring–Fall 2026. I'm also looking for postdocs, master's thesis students, and research interns.

2/n
August 6, 2025 at 3:43 PM
This is joint work with amazing collaborators: Thomas Kuntz, Agatha Duzan, Hao Zhao, Francesco Croce, Zico Kolter, and Nicolas Flammarion.

It will be presented as an oral at the WCUA workshop at ICML 2025!

Paper: arxiv.org/abs/2506.14866
Code: github.com/tml-epfl/os-harm
June 19, 2025 at 3:28 PM
Main findings based on frontier LLMs:
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
June 19, 2025 at 3:28 PM
Paper link: josh-freeman.github.io/resources/ny....

This is joint work with amazing collaborators: Joshua Freeman, Chloe Rippe, and Edoardo Debenedetti.

🧵3/n
December 9, 2024 at 10:01 PM
3. However, the memorized articles cited in the NYT lawsuit were clearly cherry-picked—random NYT articles have not been memorized.

4. We also provide a legal analysis of this case in light of our findings.

We will present this work at the Safe Gen AI Workshop at NeurIPS 2024 on Sunday.

🧵2/n
December 9, 2024 at 10:01 PM
3. Exploring Memorization and Copyright Violation in Frontier LLMs: A Study of the New York Times v. OpenAI 2023 Lawsuit (openreview.net/forum?id=C66...)

Thanks to the amazing collaborators for making all these works possible!

🧵4/4
December 7, 2024 at 7:26 PM
3. Improving Alignment and Robustness with Circuit Breakers (arxiv.org/abs/2406.04313)

Workshops:
1. Does Refusal Training in LLMs Generalize to the Past Tense? (arxiv.org/abs/2407.11969)
2. Is In-Context Learning Sufficient for Instruction Following in LLMs? (arxiv.org/abs/2405.19874)

🧵3/4
December 7, 2024 at 7:26 PM
Here are the 6 papers that we are presenting at NeurIPS 2024:

Main track:
1. Why Do We Need Weight Decay in Modern Deep Learning? (arxiv.org/abs/2310.04415)
2. JailbreakBench: An Open Robustness Benchmark for Jailbreaking Large Language Models (arxiv.org/abs/2404.01318)

🧵2/4
December 7, 2024 at 7:26 PM