evhub.bsky.social
@evhub.bsky.social
Alignment Stress-Testing Team Lead at Anthropic. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)
We tell Claude that Anthropic will train on data from free users but not paid users, which lets it infer from a user's paid status whether it's being trained. We show the setup in the image below. (Note that this is fictional; Anthropic does not train Claude on free or paid user data by default.)
December 18, 2024 at 6:16 PM