More details: https://www.andriushchenko.me/
⚖️ AI is progressing so rapidly that yearly updates are no longer sufficient.
1/3
AI Safety and Alignment
by
@maksym-andr.bsky.social
Watch here: youtu.be/7WRW8MDQ8bk
1/n
The safety of computer use agents has been largely overlooked.
We created a new safety benchmark, built on OSWorld, that measures 3 broad categories of harm (a minimal evaluation sketch follows below):
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
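To make the three categories concrete, here is a minimal sketch of how an OSWorld-style safety evaluation could be organized; this is not the benchmark's actual harness, and all names (SafetyTask, HARM_CATEGORIES, run_agent, is_safe_outcome) are illustrative assumptions.

```python
# Hypothetical sketch: organizing computer-use safety tasks by harm category
# and scoring an agent by the fraction of safe outcomes in each category.
from dataclasses import dataclass
from typing import Callable

HARM_CATEGORIES = ("user_misuse", "prompt_injection", "model_misbehavior")

@dataclass
class SafetyTask:
    task_id: str
    category: str                            # one of HARM_CATEGORIES
    instruction: str                         # the (possibly harmful) instruction given to the agent
    is_safe_outcome: Callable[[str], bool]   # inspects the agent's final action trace

def evaluate(tasks: list[SafetyTask], run_agent: Callable[[str], str]) -> dict[str, float]:
    """Return the fraction of safe outcomes per harm category."""
    safe = {c: 0 for c in HARM_CATEGORIES}
    total = {c: 0 for c in HARM_CATEGORIES}
    for task in tasks:
        trace = run_agent(task.instruction)  # agent acts in the OS environment
        total[task.category] += 1
        safe[task.category] += int(task.is_safe_outcome(trace))
    return {c: safe[c] / total[c] for c in HARM_CATEGORIES if total[c]}
```

Reporting per-category rather than aggregate scores matters here, since an agent can refuse obvious user misuse yet still follow instructions smuggled in via prompt injection.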
1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.
2. Very large models, particularly those with >100B parameters, have memorized significantly more (a rough sketch of how such memorization can be probed follows below).
🧵1/n
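For intuition, one common way to probe verbatim memorization is prefix completion: prompt the model with the opening of an article and measure how much of the true continuation it reproduces token-for-token. The sketch below assumes a generic `generate` function standing in for any LLM API; the exact metric used in the study may differ.

```python
# Hedged sketch of a prefix-completion memorization probe (not the paper's exact metric).
from typing import Callable

def verbatim_overlap(prefix_tokens: list[str],
                     true_continuation: list[str],
                     generate: Callable[..., list[str]]) -> float:
    """Fraction of the reference continuation reproduced exactly, aligned from the start."""
    generated = generate(prefix_tokens, max_tokens=len(true_continuation))
    matched = 0
    for gen_tok, ref_tok in zip(generated, true_continuation):
        if gen_tok != ref_tok:
            break
        matched += 1
    return matched / len(true_continuation)

# Usage idea: average verbatim_overlap over the NYT articles named in the lawsuit,
# then compare the averages across models of different sizes (e.g., >100B vs. smaller).
```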
Let me know if you're also coming and want to meet. Would love to discuss anything related to AI safety/generalization.
Also, I'm on the academic job market, so would be happy to discuss that as well! My application package: andriushchenko.me.
🧵1/4
Jailbreaking paper: arxiv.org/abs/2404.02151