Saffron Huang
saffron.bsky.social
@saffron.bsky.social
how shall we live together?

societal impacts researcher at Anthropic
saffronhuang.com
Oh also, VentureBeat covered this and quoted me, including on some things that surprised and interested me about this research! Screenshotting some highlights: venturebeat.com/ai/anthropic...
April 21, 2025 at 3:54 PM
I particularly want to call out that we're releasing the first empirical, large-scale taxonomy of AI values, to encourage additional research into AI (and possibly also human) values, and the development of more grounded evals of models' values: huggingface.co/datasets/Ant...
April 21, 2025 at 3:54 PM
We developed one way to figure this out, finding thousands of values that Claude expresses in practice, from some very common values (like helpfulness!) to a long tail of highly context-dependent values that respond to and engage with a diverse range of users.
April 21, 2025 at 3:54 PM
There is a lot of work on training models to follow particular behaviors, and trying to align them with “human values”, but how do we know if this is working in practice, and what values are actually being expressed?
April 21, 2025 at 3:54 PM
Anthropic blog post on Clio here: www.anthropic.com/research/clio

Proud of the societal impacts team and particularly of @miles.land and @alextamkin.bsky.social who have been incredibly dedicated to getting Clio right.
Clio: Privacy-preserving insights into real-world AI use
A blog post describing Anthropic’s new system, Clio, for analyzing how people use AI while maintaining their privacy
www.anthropic.com
December 12, 2024 at 9:35 PM
This prompting does make me wonder if the distinction between humans doing ‘socially misaligned’ things (which we’ve generally termed ‘misuse’) and AIs being misaligned makes much sense.
December 6, 2024 at 3:40 PM
humans are still good for something guys (holding a camera steadily and panning around)
December 5, 2024 at 5:06 AM
community timeshare terminals
December 1, 2024 at 1:39 AM
good to know. i just copied the description on display
December 1, 2024 at 1:38 AM
computer people were also just furniture people
November 30, 2024 at 10:40 PM
some of the best stuff in the museum is the ads and manuals. i LOVE 60s typography and colours
November 30, 2024 at 10:38 PM
i really like this museum and highly recommend! it’s like my 4th time here
November 30, 2024 at 10:36 PM