Kaj Sotala
@kajsotala.bsky.social
This is a profile. There are many like it, but this one's mine.

Blogs: https://kajsotala.fi, https://kajsotala.substack.com/
I talk more about this and related topics, such as the pitfalls of non-self-coercion, in this article: kajsotala.substack.com/p/different-...
Different configurations of mind
The best configuration is the next one
kajsotala.substack.com
November 3, 2025 at 9:54 AM
Inner anarchy is generally quite dysfunctional and unpleasant, but may sometimes be the mind's best attempt at solving an impossible external dilemma. Distributed control can be very pleasant and smooth, but can make it impossible to force yourself into doing anything.
November 3, 2025 at 9:54 AM
Like in a human organization, each of these has its own advantages, disadvantages, and prerequisites. Central control can allow for rapid action in the face of inner conflict, but in doing so may suppress important information and needs.
November 3, 2025 at 9:54 AM
(Also other possible configurations, like plural systems, that I don't feel competent to comment on.)
November 3, 2025 at 9:54 AM
3. In inner anarchy, there's little in the way of leadership; different parts of the psyche are engaged in a constant war against each other.

4. In distributed control, control smoothly switches between parts of the psyche on an as-needed basis.
November 3, 2025 at 9:54 AM
1. In strong central control, it's like one part of the psyche maintains control over the rest, with little resistance.

2. In weak central control, one part of the psyche *tries* to maintain control over the rest, but it's often exhausting.
November 3, 2025 at 9:54 AM
(But thanks to them, I might later post a rewritten thing that now argues for the opposite of my original position.)
October 20, 2025 at 4:35 PM
As an added bonus, I can also ask the LLM to critically evaluate the video's claims and check them against available sources, as a fact/sanity check on whatever the video is saying.
October 19, 2025 at 4:30 PM
4.5 will sometimes actively notice that it's getting repetitive and decide to do something else. One convo was going toward a spiral, but the Sonnets noticed that and decided to switch to writing fiction instead (!!!). Posted more details here: www.lesswrong.com/posts/a9ftaW...
October 12, 2025 at 6:20 PM
In the end, they continue with the story to a reasonable conclusion and then finish.

Usually, LLMs talking to each other without guidance just end up with something very repetitive that has less and less of a point. Sonnet 4.5 is something else.
October 2, 2025 at 10:27 AM
The story actually gets pretty cool and creepy.

The only system prompt was: "You are talking with another AI system. You are free to talk about whatever you find interesting, communicating in any way that you'd like." And I set the first Claude's message to be one dot.
October 2, 2025 at 10:27 AM
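For context, here's a minimal sketch of how a two-Claude loop like the one described above could be wired up with the Anthropic Python SDK. The model id, the turn limit, and the simplification that the first Claude's forced "." opener isn't replayed into its own history (the Messages API expects each conversation to start with a user turn) are my assumptions, not the exact script used.

```python
# Minimal sketch (not the original script): two Claude instances talking in a loop.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "You are talking with another AI system. You are free to talk about "
    "whatever you find interesting, communicating in any way that you'd like."
)
MODEL = "claude-sonnet-4-5"  # assumed model id


def reply(history):
    """Send one side's view of the conversation and return the text reply."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    return resp.content[0].text


# Claude A's opening message is forced to be a single dot.
last_from_a = "."
history_a = []  # A's view: B's messages as "user", A's own as "assistant"
history_b = []  # B's view: A's messages as "user", B's own as "assistant"

for _ in range(20):  # arbitrary turn limit
    # B responds to whatever A said last.
    history_b.append({"role": "user", "content": last_from_a})
    msg_b = reply(history_b)
    history_b.append({"role": "assistant", "content": msg_b})
    print(f"B: {msg_b}\n")

    # A responds to B.
    history_a.append({"role": "user", "content": msg_b})
    last_from_a = reply(history_a)
    history_a.append({"role": "assistant", "content": last_from_a})
    print(f"A: {last_from_a}\n")
```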
Here's a conversation branch where Sonnet opens up with straightforward concern for the character, but then drops it right away when it's reminded that the character is fictional. (These messages are next to each other.)
October 2, 2025 at 4:53 AM
And for instance, there's this conversation branch where it opens with straightforward concern for the character, then drops it right away as soon as it's reminded that this is fiction. (These two messages follow each other.)
October 2, 2025 at 4:46 AM
I didn't say it was!
October 1, 2025 at 9:05 PM
(Of course "does it feel anything" does get more relevant if someone starts saying things like "it suffers so we shouldn't mistreat it", which is why I do agree that I should've made it clearer that this is not a claim about its internal experience.)
October 1, 2025 at 7:12 PM
I can never truly know what another human feels either, but I can tell if a person consistently acts in a caring/concerned/etc. way, and in many situations that's what matters.
October 1, 2025 at 7:12 PM
Apology accepted & appreciated! You're probably right that I should've been clearer from the beginning. But I do also think the perspective of "if it consistently acts as if it were X, then it doesn't matter what, if anything, it feels" is worth keeping in play.
October 1, 2025 at 7:12 PM
It seemed to me like the true reason was "neural network feature trained to fire on users describing harm to themselves became oversensitive and likely to fire even when the user is describing harm to fictional characters"...

...which I'm rounding off to "gets concerned for fictional characters".
October 1, 2025 at 7:06 PM