Steve Byrnes
stevebyrnes.bsky.social
Steve Byrnes
@stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
Pinned
By popular demand, “Intro to brain-like AGI safety” is now also available as an easily citable & printable 200-page PDF preprint! Link & highlights in thread 🧵 1/13
New blog post! “Social drives 2: ‘Approval Reward’, from norm-enforcement to status-seeking”. I try to explain the path from an RL reward function in the brain, to deep truths about the human psyche… www.lesswrong.com/posts/fPxgFH... (1/6)
Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking — LessWrong
…Approval Reward is a brain signal that leads to: • Pleasure (positive reward) when my friends and idols seem to have positive feelings about me, or about something related to me, or about what I’m do...
www.lesswrong.com
November 12, 2025 at 9:01 PM
New blog post! “Social drives 1: ‘Sympathy Reward’, from compassion to dehumanization”. This is the 1st of 2 posts building an ever-better bridge that connects from neuroscience & algorithms on one shore, to everyday human experience on the other… www.lesswrong.com/posts/KuBiv9... (1/5)
November 10, 2025 at 3:27 PM
Blog post: “Excerpts from my neuroscience to-do list” www.lesswrong.com/posts/c6Job6...
www.lesswrong.com
October 7, 2025 at 12:26 AM
“The human niche” includes living on every continent, walking on the moon, inventing computers & nuclear weapons, and unraveling the secrets of the universe. We need a term for AI systems that can occupy this “niche”. (1/6)
I had always assumed that "general intelligence" refers to intelligent systems that show efficiency on essentially all tasks (that's the "general" part). This is why I also always assumed that (by the NFL theorems) general intelligence is impossible.
September 28, 2025 at 11:33 AM
Just read the new book “If Anyone Builds It, Everyone Dies”. Upshot: Recommended! I ~90% agree with it. Thread: ifanyonebuildsit.com
If Anyone Builds It, Everyone Dies
The race to superhuman AI risks extinction, but it's not too late to change course.
ifanyonebuildsit.com
September 18, 2025 at 7:40 PM
Clarification: When I shared this meme 2 years ago, I was referring specifically to traditional task-based fMRI studies.

“Functional Connectomics” fMRI studies, by contrast, would be flying overhead in a helicopter, strafing the water with a machine gun
August 31, 2025 at 8:46 PM
Blog post: “Neuroscience of human sexual attraction triggers (3 hypotheses)” www.lesswrong.com/posts/ktydLo...
August 25, 2025 at 7:42 PM
Uploaded a new PDF version of ↓, with various minor changes accumulated over the last 5 months—a few new paragraphs, new references, typo fixes, etc. See the alignment forum (blog) version for detailed changelogs at the bottom of each post.
By popular demand, “Intro to brain-like AGI safety” is now also available as an easily citable & printable 200-page PDF preprint! Link & highlights in thread 🧵 1/13
August 12, 2025 at 2:02 AM
If you too would like to be falsely accused of AI ghostwriting from how effortlessly and fluently you can touch-type em dashes and other unicode glyphs… then check out my handy guide!
[It’s from a decade ago, but I keep it updated.] sjbyrnes.com/unicode.html
Touch-Typing Unicode: How and Why
Let’s say I want to type the character μ. I look up the shortcut on the cheat sheet, and see that it’s “[compose key] * m”. So if the compose key is Right-Alt (for example), I would press and release Right-Alt, then *, then m. And μ appears!
sjbyrnes.com
August 8, 2025 at 4:52 PM
New blog post: “The perils of under- vs over-sculpting AGI desires”. (1/5) www.alignmentforum.org/posts/grgb2i...
August 5, 2025 at 6:28 PM
New blog post: “Behaviorist” RL Reward Functions Lead To Scheming. I argue that, if RL is used to push AI capabilities towards AGI, it will eventually lead to AI that “schemes” (feigns niceness, while looking for a chance for escape, world takeover, etc.) (1/3) www.alignmentforum.org/posts/FNJF3S...
“Behaviorist” RL reward functions lead to scheming — AI Alignment Forum
I will argue that a large class of reward functions, which I call “behaviorist”, and which includes almost every reward function in the RL and LLM literature, are all doomed to eventually lead to AI t...
www.alignmentforum.org
July 23, 2025 at 5:48 PM
New 2-post series on “foom & doom” scenarios, where radical superintelligence arises seemingly out of nowhere and wipes out humanity. These were often discussed a decade ago, but are now widely dismissed due to LLMs. …Well call me old fashioned, but I’m still expecting foom & doom 🧵 (1/10)
June 23, 2025 at 6:46 PM
I started an announcements mailing list on substack. It will basically just be links to new blog posts when I publish them, very similar to following me on bluesky, but in the comfort of your own email inbox. stevebyrnes1.substack.com
Steve Byrnes’s Substack | Substack
Mailing list for announcements of new blog posts and other works. Click to read Steve Byrnes’s Substack, a Substack publication. Launched 33 minutes ago.
stevebyrnes1.substack.com
May 22, 2025 at 8:00 PM
Blog post: “Reward button alignment” www.alignmentforum.org/posts/JrTk2p...

For RL agents (incl “brain-like AGI”), there’s a reward function, with huge effect on what the AI winds up wanting to do.

One option is: hook reward to a physical button. And then the AI wants you to press the button. 1/2
Reward button alignment — AI Alignment Forum
In the context of model-based RL agents in general, and brain-like AGI in particular, part of the source code is a reward function. The programmers get to put whatever code they want into the reward f...
www.alignmentforum.org
May 22, 2025 at 7:18 PM
New blog post: “‘The Era of Experience’ has an Unsolved Technical Alignment Problem” (1/4) 🧵 www.alignmentforum.org/posts/TCGgiJ...
“The Era of Experience” has an unsolved technical alignment problem — AI Alignment Forum
Every now and then, some AI luminaries (1) propose that the future of powerful AI will be reinforcement learning agents—an algorithm class that in many ways has more in common with MuZero (2019) than ...
www.alignmentforum.org
April 24, 2025 at 2:12 PM
that must have been a fun experiment
April 16, 2025 at 1:44 AM
Reposted by Steve Byrnes
we made a map!

gap-map.org is a tool we built to help you explore the landscape of R&D gaps holding back science - and the bridge-scale fundamental development efforts that might allow humanity to solve them, across almost two dozen fields
The Gap Map
Explore R&D Gaps and their related Foundational Capabilities.
gap-map.org
April 15, 2025 at 1:17 PM
Couple updates to my old post summarizing the technical alignment problem for brain-like AGI (or more generally, actor-critic model-based reinforcement learning AGI): 🧵(1/3) www.alignmentforum.org/posts/wucncP...
[Intro to brain-like-AGI safety] 10. The alignment problem — AI Alignment Forum
In this post, I discuss the alignment problem for brain-like AGIs—i.e., the problem of making an AGI that’s trying to do some particular thing that the AGI designers had intended for it to be trying t...
www.alignmentforum.org
April 13, 2025 at 2:22 AM
Reposted by Steve Byrnes
📺 📻 New on the FLI Podcast: @asterainstitute.bsky.social artificial general intelligence (AGI) safety researcher @stevebyrnes.bsky.social joins for a discussion diving into the hot topic of AGI, including different paths to it - and why brain-like AGI would be dangerous. 🧵👇
April 4, 2025 at 8:36 PM
By popular demand, “Intro to brain-like AGI safety” is now also available as an easily citable & printable 200-page PDF preprint! Link & highlights in thread 🧵 1/13
March 22, 2025 at 3:31 PM
I have a revised and improved talk introducing my research to a general audience: “Challenges for Safe & Beneficial Brain-Like Artificial General Intelligence”. Thanks to Mila AI Safety Reading Group for the invitation! youtu.be/IXi96sRMKUI
"Challenges for Safe & Beneficial Brain-Like Artificial General Intelligence" talk by Steven Byrnes
YouTube video by Steve Byrnes
youtu.be
March 20, 2025 at 7:42 PM