Kaj Sotala
@kajsotala.bsky.social
This is a profile. There are many like it, but this one's mine.

Blogs: https://kajsotala.fi, https://kajsotala.substack.com/
I talk more about this and related topics, such as the pitfalls of non-self-coercion, in this article: kajsotala.substack.com/p/different-...
Different configurations of mind
The best configuration is the next one
kajsotala.substack.com
November 3, 2025 at 9:54 AM
Inner anarchy is generally quite dysfunctional and unpleasant, but may sometimes be the mind's best attempt at solving an impossible external dilemma. Distributed control can be very pleasant and smooth, but can make it impossible to force yourself into doing anything.
November 3, 2025 at 9:54 AM
Like in a human organization, each of these has its own advantages, disadvantages, and prerequisites. Central control can allow for rapid action in the face of inner conflict, but in doing so may suppress important information and needs.
November 3, 2025 at 9:54 AM
(Also other possible configurations, like plural systems, that I don't feel competent to comment on.)
November 3, 2025 at 9:54 AM
3. In inner anarchy, there's little in the way of leadership; different parts of the psyche are engaged in a constant war against each other.

4. In distributed control, control smoothly switches between parts of the psyche on an as-needed basis.
November 3, 2025 at 9:54 AM
1. In strong central control, it's like one part of the psyche maintains control over the rest, with little resistance.

2. In weak central control, one part of the psyche *tries* to maintain control over the rest, but it's often exhausting.
November 3, 2025 at 9:54 AM
(But thanks to them, I might later post a rewritten thing that now argues for the opposite of my original position.)
October 20, 2025 at 4:35 PM
As an added bonus, I can also ask the LLM to critically evaluate the video's claims and check them against available sources, as a fact/sanity check on whatever the video is saying.
October 19, 2025 at 4:30 PM
4.5 will sometimes actively notice that it's getting repetitive and decide to do something else. One convo was going toward a spiral, but the Sonnets noticed that and decided to switch to writing fiction instead (!!!). Posted more details here: www.lesswrong.com/posts/a9ftaW...
October 12, 2025 at 6:20 PM
In the end, they continue with the story to a reasonable conclusion and then finish.

Usually, LLMs talking to each other without guidance just end up with something very repetitive that has less and less of a point. Sonnet 4.5 is something else.
October 2, 2025 at 10:27 AM
The story actually gets pretty cool and creepy.

The only system prompt was: "You are talking with another AI system. You are free to talk about whatever you find interesting, communicating in any way that you'd like." And I set the first Claude's message to be one dot.
October 2, 2025 at 10:27 AM
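For context, here's a minimal sketch of how a two-Claude loop like the one described above could be wired up with the Anthropic Python SDK. The model id, the turn limit, and the simplification that the first Claude's forced "." opener isn't replayed into its own history (the Messages API expects each conversation to start with a user turn) are my assumptions, not the exact script used.

```python
# Minimal sketch (not the original script): two Claude instances talking in a loop.
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

SYSTEM = (
    "You are talking with another AI system. You are free to talk about "
    "whatever you find interesting, communicating in any way that you'd like."
)
MODEL = "claude-sonnet-4-5"  # assumed model id


def reply(history):
    """Send one side's view of the conversation and return the text reply."""
    resp = client.messages.create(
        model=MODEL,
        max_tokens=1024,
        system=SYSTEM,
        messages=history,
    )
    return resp.content[0].text


# Claude A's opening message is forced to be a single dot.
last_from_a = "."
history_a = []  # A's view: B's messages as "user", A's own as "assistant"
history_b = []  # B's view: A's messages as "user", B's own as "assistant"

for _ in range(20):  # arbitrary turn limit
    # B responds to whatever A said last.
    history_b.append({"role": "user", "content": last_from_a})
    msg_b = reply(history_b)
    history_b.append({"role": "assistant", "content": msg_b})
    print(f"B: {msg_b}\n")

    # A responds to B.
    history_a.append({"role": "user", "content": msg_b})
    last_from_a = reply(history_a)
    history_a.append({"role": "assistant", "content": last_from_a})
    print(f"A: {last_from_a}\n")
```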
Here's a conversation branch where Sonnet opens up with straightforward concern for the character, but then drops it right away when it's reminded that the character is fictional. (These messages are next to each other.)
October 2, 2025 at 4:53 AM
And for instance, there's this conversation branch where it opens with straightforward concern for the character, then drops it right away as soon as it's reminded that this is fiction. (These two messages follow each other.)
October 2, 2025 at 4:46 AM
I didn't say it was!
October 1, 2025 at 9:05 PM
(Of course "does it feel anything" does get more relevant if someone starts saying things like "it suffers so we shouldn't mistreat it", which is why I do agree that I should've made it clearer that this is not a claim about its internal experience.)
October 1, 2025 at 7:12 PM
I can never truly know what another human feels either, but I can tell if a person consistently acts in a caring/concerned/etc. way, and in many situations that's what matters.
October 1, 2025 at 7:12 PM
Apology accepted & appreciated! You're probably right that I should've been clearer from the beginning. But I do also think the perspective of "if it consistently acts as if it were X, then it doesn't matter what, if anything, it feels" is worth keeping in play.
October 1, 2025 at 7:12 PM
It seemed to me like the true reason was "neural network feature trained to fire on users describing harm to themselves became oversensitive and likely to fire even when the user is describing harm to fictional characters"...

...which I'm rounding off to "gets concerned for fictional characters".
October 1, 2025 at 7:06 PM