Steve Byrnes
@stevebyrnes.bsky.social
Researching Artificial General Intelligence Safety, via thinking about neuroscience and algorithms, at Astera Institute. https://sjbyrnes.com/agi.html
I also cover various other topics, including One Weird Trick for recognizing sociopaths, status games, nerds sharing their Special Interests, and following norms even when nobody will ever find out. Link again: www.lesswrong.com/posts/fPxgFH... (6/6)
Social drives 2: “Approval Reward”, from norm-enforcement to status-seeking — LessWrong
…Approval Reward is a brain signal that leads to: • Pleasure (positive reward) when my friends and idols seem to have positive feelings about me, or about something related to me, or about what I’m do...
www.lesswrong.com
November 12, 2025 at 9:01 PM
We tend to view ourselves in socially-approved ways: “my true self” has traits my idols like, but “my urges” interfere. Robert Trivers, Robin Hanson, etc. explain this as evolved self-deception, but I argue that Approval Reward is a better nuts-&-bolts explanation (5/6)
November 12, 2025 at 9:01 PM
Other consequences of “Approval Reward” are more profound, but less obvious. My guess is that Approval Reward fires 10,000 times a day—little squirts of pride when you’re doing something your friends or idols would like. (4/6)
November 12, 2025 at 9:01 PM
Some consequences of “Approval Reward” are pretty straightforward: credit-seeking, blame-avoidance, status-seeking, norm-following, and norm-enforcement. (3/6)
November 12, 2025 at 9:01 PM
“Approval Reward” is a stream of reward signals that I previously discussed from a neuro perspective. It basically leads to pleasure when my friends and idols seem to have positive feelings about me, and displeasure when negative. (2/6) bsky.app/profile/stev...
November 12, 2025 at 9:01 PM
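A toy sketch of the hypothesized Approval Reward signal described above, for concreteness. This is my own gloss, not code from the post; the names (PerceivedAppraisal, care_weight, etc.) are invented for illustration, and the real hypothesis is about a brain circuit, not a lookup table.

```python
# Toy sketch (my own gloss) of the hypothesized "Approval Reward" signal:
# positive reward when a friend or idol seems to feel positively about me,
# my stuff, or what I'm doing; negative reward when they seem to feel
# negatively; weighted by how much I care about that person's opinion.
from dataclasses import dataclass

@dataclass
class PerceivedAppraisal:
    observer: str     # who seems to be reacting to me
    valence: float    # how positive (+) or negative (-) their reaction seems
    relevance: float  # 0..1: how much the reaction is about me / my stuff / my actions

def approval_reward(appraisal: PerceivedAppraisal, care_weight: dict) -> float:
    """One little squirt of reward from one perceived social appraisal."""
    weight = care_weight.get(appraisal.observer, 0.0)  # friends & idols weigh heavily
    return weight * appraisal.valence * appraisal.relevance

# Example: an idol seems pleased with my project -> a small positive reward.
care_weight = {"idol": 1.0, "friend": 0.7, "stranger": 0.1}
print(approval_reward(PerceivedAppraisal("idol", valence=0.8, relevance=1.0), care_weight))
```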
This post then dives into 1 of those 4: “Sympathy Reward”. If someone (especially a friend or idol) feels good/bad, that makes me feel good/bad. I discuss its obvious prosocial effects, along with its less-obvious (not all nice!) effects. Link again: www.lesswrong.com/posts/KuBiv9... (5/5)
November 10, 2025 at 3:27 PM
In this post, I flesh out the effects of that (hypothesized) circuit, first by splitting its output stream of reward signals into four subcomponents, depending on the circumstances in which it triggers. (4/5)
November 10, 2025 at 3:27 PM
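For concreteness, a minimal sketch of the “split the output stream of reward signals into four subcomponents, depending on the circumstances in which it triggers” idea. This is my own toy rendering, not code from the post; only Sympathy Reward (from the 5/5 post above) is spelled out, and the other branches are left as unnamed placeholders since they aren’t named in the posts quoted here.

```python
# Toy rendering (my own) of splitting the circuit's output stream of reward
# signals into subcomponents keyed by triggering circumstance. Only Sympathy
# Reward is worked out, per the 5/5 post above: if someone (especially a
# friend or idol) seems to feel good/bad, that makes me feel good/bad.
def sympathy_reward(their_valence: float, closeness: float) -> float:
    # their_valence in [-1, +1]; closeness in [0, 1] (friend/idol vs. stranger)
    return closeness * their_valence

def circuit_reward(circumstance: str, **kwargs) -> float:
    """Route one triggering event to the matching hypothesized subcomponent."""
    if circumstance == "other_person_seems_to_feel_something":
        return sympathy_reward(kwargs["their_valence"], kwargs["closeness"])
    # ...the post's three other subcomponents would go here (not named in this thread)...
    return 0.0

# Example: a close friend seems to be suffering -> negative reward for me.
print(circuit_reward("other_person_seems_to_feel_something",
                     their_valence=-0.9, closeness=0.8))
```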
Next, my post “Neuroscience of human social instincts: a sketch” (from last year) hypothesized an innate drive: “the compassion / spite circuit”. That name doesn’t do it justice; I suggested it also underlies status-seeking and much else. (3/5) bsky.app/profile/stev...
New post: “Neuroscience of human social instincts: a sketch”. Feels like big progress on my longstanding top neuroscience & AGI Safety research priority! What’s the problem I’m trying to solve and why should anyone care? (1/9) 🧵 www.lesswrong.com/posts/kYvbHC...
Neuroscience of human social instincts: a sketch — LessWrong
My primary neuroscience research goal for the past couple years has been to solve a certain problem, a problem which has had me stumped since the very beginning of when I became interested in neurosci...
www.lesswrong.com
November 10, 2025 at 3:27 PM
Starting on the neuro side: my general view is that our brains have something akin to an RL reward function. Call it “innate drives” or “primary rewards”. Pain is bad, quenching-thirst is good, etc. Some of these concern our social & moral intuitions. (2/5) bsky.app/profile/stev...
By popular demand, “Intro to brain-like AGI safety” is now also available as an easily citable & printable 200-page PDF preprint! Link & highlights in thread 🧵 1/13
November 10, 2025 at 3:27 PM
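As a toy picture of the “something akin to an RL reward function, built out of innate drives / primary rewards” framing: the sketch below is my own illustration, and the specific drives, weights, and state fields are placeholders rather than a claimed inventory of the brain’s actual reward function.

```python
# Toy illustration (my own): "primary reward" as a sum over innate drives.
# The drives and weights are placeholder examples only.
def primary_reward(state: dict) -> float:
    reward = 0.0
    reward -= 1.0 * state.get("pain", 0.0)              # pain is bad
    reward += 0.5 * state.get("thirst_quenched", 0.0)   # quenching thirst is good
    # ...plus social & moral terms, e.g. the Sympathy / Approval Rewards above...
    reward += state.get("sympathy_reward", 0.0)
    reward += state.get("approval_reward", 0.0)
    return reward

print(primary_reward({"pain": 0.2, "thirst_quenched": 1.0, "approval_reward": 0.3}))
```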
my take is §2.1 here (1st screenshot) www.lesswrong.com/posts/hsf7tQ... (+ 2nd screenshot is about how different motives can lead to different knowledge [albeit overlapping])
October 29, 2025 at 10:51 AM
Oh, the olfactory bulb definitely projects to other places too. I was responding to your sentence “…olfaction (and gustation, and interoception more broadly) has always been part of the steering system…”, which I thought was painting everything-related-to-olfaction with an overly broad brush.
October 8, 2025 at 1:23 AM
I guess I’ll just say that

olfactory epithelium → olfactory bulb → anterior olfactory cortex

…sure seems awfully similar to…

retina → LGN → occipital cortex

In particular, I think the whole cortex is doing within-lifetime learning / memory, including the olfactory parts of the cortex.
October 8, 2025 at 12:46 AM
Hmm, I think this is getting into sufficiently deep disagreements between us that it might not fit in a bluesky thread…
October 8, 2025 at 12:46 AM
I agree that this is the status quo, and I think it’s bad and am doing what I can to change that.

I have something in mind where a good understanding of the SC + a certain tracer study (or access to a connectomic dataset) lets us find a key hypothalamus cell group for compassion & norm-following.
October 7, 2025 at 1:13 AM
I find your reply a bit puzzling. Normally people “in the AGI community” don’t talk about the No Free Lunch Theorem amongst themselves—it doesn’t come up much. Instead NFL is invoked by people who think that the whole idea of AGI is dumb and impossible and that “the AGI community” is stupid.
September 28, 2025 at 1:28 PM
Eventually we will build AIs that could (if they want to) wipe out humans and run the global economy on their own for the next 10,000 years, including inventing and deploying wild new technologies, new business models, etc. We need a term for that. If not AGI, then something else :) (6/6)
September 28, 2025 at 11:33 AM
Human brains are crap at intuitive molecular dynamics. But we’re able to invent computers and write MD simulation code. Likewise, when faced with a difficult problem, an AGI (or whatever we call it) might build a tool to solve it, or find a clever way to avoid it altogether, etc. (5/6)
September 28, 2025 at 11:33 AM
No Free Lunch is not relevant to this, for the same reason that The Scientific Method is valid in both July and February: If a pattern holds everywhere in the universe, you sacrifice nothing by assuming it. More on NFL (if you can get past the writing style): intelligence.org/2017/12/06/c... (4/6)
A reply to Francois Chollet on intelligence explosion
This is a reply to Francois Chollet, the inventor of the Keras wrapper for the Tensorflow and Theano deep learning systems, on his essay “The impossibility of intelligence explosion.” In response to c...
intelligence.org
September 28, 2025 at 11:33 AM
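A toy numeric illustration of that point (my own example, not from the thread or the linked essay): No Free Lunch averages over all possible labelings of the inputs, but if the actual target has structure and a learner assumes that structure, the learner sacrifices nothing, while an assumption-free memorizer is at chance on unseen inputs.

```python
# Toy illustration: assuming structure that actually holds costs nothing.
import random

random.seed(0)
inputs = list(range(100))
target = {x: int(x < 50) for x in inputs}   # a structured "world": one threshold

train = random.sample(inputs, 20)
test = [x for x in inputs if x not in train]

def structure_assuming(x):
    # Assumes "nearby inputs get the same label": predict the label of the
    # nearest training point (1-nearest-neighbor).
    return target[min(train, key=lambda t: abs(t - x))]

def assumption_free(x):
    # Memorizes training points and guesses at random elsewhere, which is all
    # any learner can achieve when averaged over ALL possible labelings.
    return target[x] if x in train else random.randint(0, 1)

for name, learner in [("structure-assuming", structure_assuming),
                      ("assumption-free", assumption_free)]:
    accuracy = sum(learner(x) == target[x] for x in test) / len(test)
    print(f"{name} accuracy on unseen inputs: {accuracy:.2f}")
# Typical output: structure-assuming near 1.00, assumption-free near 0.50.
```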
Other people use other terms. None is perfect. I believe LeCun likes “human-level machine intelligence”, but then we have to add different clarifications like “…but it could think 1000× faster than a human” and “…but it may lack a human-level sense of smell”. (3/6)
September 28, 2025 at 11:33 AM
I happen to like the term “AGI” for that, as long as it's understood that the G is “general as in not specific” (“in general, Boston has nice weather”), not “general as in universal” (“I have a general proof of the math theorem”). (2/6)
September 28, 2025 at 11:33 AM