OATML Group Leader;
Director of Research at the UK government's AI Safety Institute (formerly UK Taskforce on Frontier AI)
When an agent sees a trigger image, it is instructed to execute malicious code and then share the image on social media to trigger other users' agents.
This is a chance to talk about agent security 👇
Our latest research exposes critical security risks in AI assistants. An attacker can hijack them by simply posting an image on social media and waiting for it to be captured. [1/6] 🧵
🚨 New Paper Alert: Open Problems in Machine Unlearning for AI Safety 🚨
Can AI truly "forget"? While unlearning promises data removal, controlling emergent capabilities is an inherent challenge. Here's why it matters: 👇
Paper: arxiv.org/pdf/2501.04952
1/8
@TingchenFu @AmyPrb @StephenLCasper
@AmartyaSanyal @Adel_Bibi @aidanogara_ @_robertkirk @ben_s_bucknall @fiiiiiist Luke Ong @philiptorr Kwok-Yan Lam @RobertTrager
@DavidSKrueger @sorenmind José Hernández-Orallo @megamor2.bsky.social @yaringal.bsky.social
#CompSciOxford #12DaysOfChristmas #Oxmas
We will be designing the program in the coming months and will soon share ways to get involved with this new community.
Read more here: cifar.ca/cifarnews/20...
If this sounds interesting, the application deadline for funding is 3/12.
Please share with people you think this might be relevant to!
oatml.cs.ox.ac.uk/apply.html
go.bsky.app/JYH5Z6M
github.com/context-labs...
Please add the account to your starter packs.
go.bsky.app/MdVxrtD
ethz.ch/en/the-eth-z.... Deadline Nov 30 for full consideration. ETH Zurich is a vibrant environment for AI research with the ETH AI Center etc. Please help spread the word!
arxiv.org/abs/2407.11072
TL;DR — An attacker can convince your favorite LLM to suggest vulnerable code with just a minor change to the prompt!
I think I will talk about why the next big challenge in AI game playing should be Dungeons and Dragons 🧙🐉
go.bsky.app/6ddpivr
Please repost.
www.google.com/about/career...
arxiv.org/abs/2411.08088