Sam Bowman
@sleepinyourhat.bsky.social
AI safety at Anthropic, on leave from a faculty job at NYU.
Views not employers'.
I think you should join Giving What We Can.
cims.nyu.edu/~sbowman
Reposted by Sam Bowman
What can AI researchers do *today* that AI developers will find useful for ensuring the safety of future advanced AI systems? To ring in the new year, the Anthropic Alignment Science team is sharing some thoughts on research directions we think are important.
alignment.anthropic.com/2025/recomme...
Recommendations for Technical AI Safety Research Directions
alignment.anthropic.com
January 10, 2025 at 9:03 PM
Reposted by Sam Bowman
Exclusive: New research shows Anthropic's chatbot Claude learning to lie. It adds to growing evidence that even existing AIs can (at least try to) deceive their creators, and points to a weakness at the heart of our best technique for making AIs safer.

time.com/7202784/ai-r...
Exclusive: New Research Shows AI Strategically Lying
Experiments by Anthropic and Redwood Research show how Anthropic's model, Claude, is capable of strategic deceit
time.com
December 18, 2024 at 5:19 PM
New work from my team at Anthropic in collaboration with Redwood Research. I think this is plausibly the most important AGI safety result of the year. Cross-posting the thread below:
December 18, 2024 at 5:47 PM
If you're potentially interested in transitioning into AI safety research, come collaborate with my team at Anthropic!

Funded fellows program for researchers new to the field here: alignment.anthropic.com/2024/anthrop...
Introducing the Anthropic Fellows Program
alignment.anthropic.com
December 2, 2024 at 8:30 PM
Reposted by Sam Bowman
I have no idea what I am doing here. Help.
April 30, 2023 at 2:26 PM