Lightnews — Scholar-powered news

@evhub.bsky.social

Alignment Stress-Testing Team Lead at Anthropic. Opinions my own. Previously: MIRI, OpenAI, Google, Yelp, Ripple. (he/him/his)

Reposted

Billy Perrigo

@billyperrigo.bsky.social

Excl: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer

time.com/7202784/ai-r...

Exclusive: New Research Shows AI Strategically Lying

Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit

time.com

December 18, 2024 at 5:19 PM

evhub.bsky.social

@evhub.bsky.social

December 18, 2024 at 5:56 PM
