Sam Bowman
@sleepinyourhat.bsky.social
AI safety at Anthropic, on leave from a faculty job at NYU.
Views not employers'.
I think you should join Giving What We Can.
cims.nyu.edu/~sbowman
Reposted by Sam Bowman
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
alignment.anthropic.com/2025/recomme...
Recommendations for Technical AI Safety Research Directions
alignment.anthropic.com
January 10, 2025 at 9:03 PM
Reposted by Sam Bowman
Exclusive: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer.

time.com/7202784/ai-r...
Exclusive: New Research Shows AI Strategically Lying
Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit
time.com
December 18, 2024 at 5:19 PM
New work from my team at Anthropic in collaboration with Redwood Research. I think this is plausibly the most important AGI safety result of the year. Cross-posting the thread below:
December 18, 2024 at 5:47 PM
If you're potentially interested in transitioning into AI safety research, come collaborate with my team at Anthropic!

Funded fellows program for researchers new to the field here: alignment.anthropic.com/2024/anthrop...
Introducing the Anthropic Fellows Program
alignment.anthropic.com
December 2, 2024 at 8:30 PM
Reposted by Sam Bowman
I have no idea what I am doing here. Help.
April 30, 2023 at 2:26 PM