More details: https://www.andriushchenko.me/
1/n
1/n
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
- They directly comply with _many_ deliberate misuse queries
- They are relatively vulnerable even to _static_ prompt injections
- They occasionally perform unsafe actions
The safety of computer use agents has been largely overlooked.
We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
The safety of computer use agents has been largely overlooked.
We created a new safety benchmark based on OSWorld for measuring 3 broad categories of harm:
1. deliberate user misuse,
2. prompt injections,
3. model misbehavior.
1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.
2. Very large models, particularly with >100B parameters, have memorized significantly more.
🧵1/n
1. Not only GPT-4 but also other frontier LLMs have memorized the same set of NYT articles from the lawsuit.
2. Very large models, particularly with >100B parameters, have memorized significantly more.
🧵1/n