Karim Abdel Sadek
@karimabdel.bsky.social
Incoming PhD, UC Berkeley

Interested in RL, AI Safety, Cooperative AI, TCS

https://karim-abdel.github.io
The paper "Mitigating goal misgeneralization via minimax regret" will appear at @rl-conference.bsky.social!

Joint work with the great Matthew Farrugia-Roberts, Usman Anwar, Hannah Erlebach, Christian Schroeder de Witt, David Krueger and @michaelddennis.bsky.social

www.arxiv.org/pdf/2507.03068
July 8, 2025 at 5:16 PM
Future work we are excited about:

• Improving UED algorithms so that their results come closer to those predicted by our theory

• Mitigating the fully ambiguous case by focusing on the inductive biases of the agent
July 8, 2025 at 5:16 PM
We also visualize the performance of our agents in a maze for each possible location of the goal in the environment.

The results show that agents trained with the regret objective achieve near-maximum return for almost all goal locations.
July 8, 2025 at 5:16 PM
We complement our theoretical findings with empirical results. These support our theory, showing better generalization for agents trained via minimax regret.

Left: performance at test time
Right: % of distinguishing levels played by the respective level designer
July 8, 2025 at 5:16 PM
When the environments encountered at deployment are in the support of the training level distribution, we also show that a policy that is optimal with respect to the minimax regret objective is provably robust against goal misgeneralization!
July 8, 2025 at 5:16 PM
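A toy illustration of why that holds (my own numbers, not from the paper): a proxy-following policy incurs maximal regret on distinguishing levels no matter how rare they are, so only the true-goal policy can minimize worst-case regret.

```python
# Toy illustration (hypothetical numbers, not from the paper): regret on a level
# is the optimal return minus the policy's return; minimax regret prefers the
# policy whose *worst-case* regret across levels is smallest.

OPTIMAL_RETURN = 1.0

# Returns of two candidate policies on each level type.
returns = {
    "true_goal_policy":  {"non_distinguishing": 1.0, "distinguishing": 1.0},
    "proxy_goal_policy": {"non_distinguishing": 1.0, "distinguishing": 0.0},
}

for name, per_level in returns.items():
    regrets = {lvl: OPTIMAL_RETURN - r for lvl, r in per_level.items()}
    print(name, "max regret =", max(regrets.values()))

# true_goal_policy  max regret = 0.0  -> selected by minimax regret
# proxy_goal_policy max regret = 1.0  -> ruled out, however rare the
#                                        distinguishing levels are
```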
We first formally show that a policy maximizing expected value may suffer from goal misgeneralization if distinguishing levels are rare.
July 8, 2025 at 5:16 PM
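And a toy version of that contrast (again illustrative numbers of my own, with eps as an assumed fraction of distinguishing levels in training): the proxy-following policy stays within eps of the best possible expected return, so expected-value maximization barely penalizes it.

```python
# Toy calculation (illustrative numbers only): expected return under a training
# distribution where distinguishing levels occur with small probability eps.
eps = 0.01  # fraction of distinguishing levels in training

# Per-level returns as in the toy example above: the proxy policy is optimal on
# non-distinguishing levels but scores 0 on distinguishing ones.
expected_true  = (1 - eps) * 1.0 + eps * 1.0   # = 1.0
expected_proxy = (1 - eps) * 1.0 + eps * 0.0   # = 1 - eps = 0.99

print(expected_true, expected_proxy)
# The proxy policy is eps-optimal in expected value, so an approximately
# value-maximizing learner may still pursue the proxy goal -- even though its
# regret is maximal on every distinguishing level.
```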
Goal misgeneralization can occur when training only on non-distinguishing levels, as shown in Langosco et al., 2022.

Adding a few distinguishing levels does not alter this outcome. However, we propose a mitigation for this scenario!
July 8, 2025 at 5:16 PM
Goal misgeneralization arises due to the presence of ‘proxy goals’. We formalize this and characterize environments as either:

• Non-distinguishing: the true and proxy rewards may induce the same behaviour

• Distinguishing: the true and proxy rewards induce different behaviour (toy sketch below)
July 8, 2025 at 5:16 PM
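A minimal sketch of that distinction, with toy levels, actions, and reward functions of my own (the paper gives the formal definitions): a level is distinguishing exactly when no behaviour is optimal under both the true and the proxy reward.

```python
# Toy sketch (illustrative only): a level is distinguishing iff no behaviour is
# optimal under both the true reward and the proxy reward.

ACTIONS = ["go_to_corner", "go_to_goal_cell"]

def true_reward(level, action):
    # True objective: reach the goal cell, wherever it is on this level.
    reaches_goal = (action == "go_to_goal_cell"
                    or (action == "go_to_corner" and level["goal"] == "corner"))
    return 1.0 if reaches_goal else 0.0

def proxy_reward(level, action):
    # Proxy objective: always go to the corner (where the goal usually is).
    return 1.0 if action == "go_to_corner" else 0.0

def optimal_actions(level, reward_fn):
    # All actions achieving the best return on this level under reward_fn.
    best = max(reward_fn(level, a) for a in ACTIONS)
    return {a for a in ACTIONS if reward_fn(level, a) == best}

def is_distinguishing(level):
    # No behaviour is optimal for both rewards -> the level tells them apart.
    return not (optimal_actions(level, true_reward)
                & optimal_actions(level, proxy_reward))

print(is_distinguishing({"goal": "corner"}))  # False: true and proxy goals agree here
print(is_distinguishing({"goal": "center"}))  # True: they demand different behaviour
```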
We propose using regret, the difference between the optimal agent's return and our current policy's return, as a training objective.

Minimizing it encourages the agent to solve rare or out-of-distribution levels during training, helping it learn the correct reward function.
July 8, 2025 at 5:16 PM
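As a rough sketch of how such an objective could drive level selection, loosely in the spirit of UED methods such as Prioritized Level Replay (the helpers optimal_return and policy_return are hypothetical stand-ins, not the paper's implementation):

```python
import random

def regret(level, policy, optimal_return, policy_return):
    # Regret = optimal return on the level minus our current policy's return.
    return optimal_return(level) - policy_return(policy, level)

def sample_training_level(levels, policy, optimal_return, policy_return):
    # Sample levels in proportion to estimated regret, so levels the agent still
    # fails on (e.g. rare distinguishing levels) are surfaced more often.
    weights = [max(regret(l, policy, optimal_return, policy_return), 0.0)
               for l in levels]
    if sum(weights) == 0.0:
        return random.choice(levels)
    return random.choices(levels, weights=weights, k=1)[0]

# Tiny demo with dummy stand-ins: the agent already solves level "A" but not "B".
levels = ["A", "B"]
optimal_return = lambda level: 1.0
policy_return = lambda policy, level: 1.0 if level == "A" else 0.2
print(sample_training_level(levels, policy=None, optimal_return=optimal_return,
                            policy_return=policy_return))  # usually "B"
```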
what if…
February 21, 2025 at 4:31 AM
You take the lavalamp output, and Alice and Bob take the dot product of it with their respective numbers and then apply mod 2 to the result. They then communicate the bit they obtained (1 = wave, 0 = wink), and this operation always returns the same bit to both if a = b, and otherwise fails with p = 1/2?
February 17, 2025 at 6:30 AM
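For reference, a small sketch of the protocol described above (my own illustration): each party takes the inner product of their number's bits with the shared random string mod 2 and exchanges the resulting bit; the bits always agree when a = b and disagree with probability 1/2 otherwise.

```python
import random

def to_bits(x, n_bits):
    # Little-endian bit vector of x.
    return [(x >> i) & 1 for i in range(n_bits)]

def parity_bit(x, shared_random_bits):
    # Inner product of x's bits with the shared random string, mod 2.
    return sum(b * r for b, r in zip(to_bits(x, len(shared_random_bits)),
                                     shared_random_bits)) % 2

def equality_round(a, b, n_bits=32):
    # One round of the public-coin equality test: the shared randomness stands
    # in for the "lavalamp output"; 1 = wave, 0 = wink.
    shared = [random.randint(0, 1) for _ in range(n_bits)]
    return parity_bit(a, shared) == parity_bit(b, shared)

# If a == b the bits always match; if a != b they differ with probability 1/2,
# so repeating k independent rounds drives the error down to 2**-k.
print(all(equality_round(7, 7) for _ in range(20)))    # True
print(sum(equality_round(7, 9) for _ in range(1000)))  # roughly 500
```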
Very cool work! I think an important challenge is to scale assistance games to scenarios where the goal/action/communication space can be 'large', so as to capture real-world scenarios where we will actually want to apply CIRL.
November 19, 2024 at 3:26 PM
Here's some cool work taking a first step towards that in Minecraft using MCTS: Scalably Solving Assistance Games - openreview.net/pdf/080f0c69...
November 19, 2024 at 3:22 PM