Ian Bicking
ianbicking.org
@ianbicking.org
Software developer in Minneapolis. Working with applications of LLMs. Previously: Mozilla, Meta, Brilliant.org
There's no right answer, but there are _better_ answers. Really it's a discernment process for us to figure out: we don't want the LLM to just "be positive" or "be negative" but instead we have to articulate very carefully what we really do want. (Also we don't know what we want.)
Is there some universal rubric we should be applying before complimenting the user? Or just hand out compliments at some rate, like "this is in the top 25% of the user's ideas, gold star!" Or concentrate on good ideas and skim over the bad ones?
When I see some clever prompt to make the LLM stop being sycophantic I am reminded of this... the problem isn't really positivity (positivity is actually great!), but a lack of discernment. When it compliments something that doesn't deserve it. But what does "deserve it" even mean?
This comes up with all kinds of LLM discernment. If you think the LLM has a bias in one direction you can tweak the prompting, but you don't get to a "correct" discernment by just making sure the distribution looks correct. If you should accept 50% and reject 50%, it also matters which 50%.
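A toy illustration of the "which 50%" point, using made-up labels: two hypothetical judges each accept exactly five of ten ideas, so both match the target acceptance rate, but they agree with the (assumed) correct verdicts to very different degrees.

```python
# Hypothetical ground truth: which of ten ideas actually deserve acceptance.
truth = [True, True, True, True, True, False, False, False, False, False]

# Two hypothetical judges; each accepts exactly 5 of the 10 ideas.
judge_a = [True, True, True, True, False, True, False, False, False, False]
judge_b = [False, False, False, False, False, True, True, True, True, True]


def acceptance_rate(verdicts):
    """Fraction of items accepted."""
    return sum(verdicts) / len(verdicts)


def accuracy(verdicts, truth):
    """Fraction of items where the verdict matches the ground truth."""
    return sum(v == t for v, t in zip(verdicts, truth)) / len(truth)


for name, judge in [("A", judge_a), ("B", judge_b)]:
    print(name, acceptance_rate(judge), accuracy(judge, truth))
# A 0.5 0.8
# B 0.5 0.0
```

Both judges pass a check on the distribution; only one of them is discerning.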
Thinking a little more about LLM sycophancy...

In general, discernment is very hard to get right. You ask for critique and you'll get critique. You ask for a compliment and you'll get a compliment. There is no "just tell me the truth."
I don't really resent the rate limits, the underlying cost is real. But mostly I'm surprised that at $20/mo (and on the "auto" model setting) Cursor will happily grind for hours every day. And honestly I prefer its results for most coding tasks. (In this case I'm using Claude Code for non-code)
I didn't think I was even using Claude Code that much, and I hit a rate limit. The rate limit also blocks me from using the normal Claude chat interface, which is an interesting choice. OpenAI's Codex similarly conked out fairly early with a rate limit, meanwhile Cursor keeps going and going...
I'm not sure I understand the likely impact of at-large seats...? I can imagine a bias towards higher-turnout voting segments. Is it more than that?
Reposted by Ian Bicking
TLDR; The PSF has made the decision to put our community and our shared diversity, equity, and inclusion values ahead of seeking $1.5M in new revenue. Please read and share. pyfound.blogspot.com/2025/10/NSF-...
🧵
I'm kind of enjoying watching it and trying to reconstruct the surreal tesseract world it embodies.

But I guess what strikes me is how banal the game is. The environment isn't any richer for the AI. Filled with stuff, but just a different kind of background noise.
The chatlog, in all its messiness, is a core part of what's happening. It holds the not-yet-actionable information. It holds context, history. It's fully situated, including being situated in a user/assistant relationship. It's unclear what analog exists for a text adventure.
Now an LLM can solve that issue pretty well, but what does it parse into? Really there are two things being created by the system: the action and the chatlog. In the context of the chatlog you can parse "change that to eight thirty" into a concrete action.
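A minimal sketch of the alarm example, assuming a hypothetical assistant with just two target structures, `createAlarm` and `updateAlarm`, and using simple string rules in place of an LLM. The chatlog is modeled (an assumption) as the list of actions produced so far: "change that to eight thirty" only resolves when `updateAlarm` exists as a target at all *and* the chatlog contains an earlier alarm for "that" to refer to.

```python
import re

# Hypothetical target structures this toy assistant can emit. An utterance
# only "parses" if it maps onto one of these.
SUPPORTED_ACTIONS = {"createAlarm", "updateAlarm"}

# Tiny hypothetical vocabulary for spelled-out hours.
WORD_HOURS = {"seven": 7, "eight": 8, "nine": 9}


def parse_time(text):
    """Turn '8:00', '8 o'clock', or 'eight thirty' into 'H:MM' (toy rules)."""
    m = re.search(r"(\d{1,2}):(\d{2})", text)
    if m:
        return f"{int(m.group(1))}:{m.group(2)}"
    m = re.search(r"(\d{1,2}) o'clock", text)
    if m:
        return f"{int(m.group(1))}:00"
    for word, hour in WORD_HOURS.items():
        if f"{word} thirty" in text:
            return f"{hour}:30"
        if f"{word} o'clock" in text:
            return f"{hour}:00"
    return None


def parse(utterance, chatlog, actions=SUPPORTED_ACTIONS):
    """Map an utterance to an action dict; chatlog is the list of prior actions."""
    text = utterance.lower()
    when = parse_time(text)
    if when is None:
        return None
    if "set an alarm" in text and "createAlarm" in actions:
        return {"action": "createAlarm", "time": when}
    if text.startswith("change that") and "updateAlarm" in actions:
        # "that" only has a referent if the chatlog holds an earlier alarm.
        alarms = [a for a in chatlog if a["action"] == "createAlarm"]
        if alarms:
            return {"action": "updateAlarm", "alarm": alarms[-1]["time"], "time": when}
    return None  # no target structure fits, so there is nothing to parse into


chatlog = []
first = parse("set an alarm for 8 o'clock", chatlog)
chatlog.append(first)
print(first)  # {'action': 'createAlarm', 'time': '8:00'}
print(parse("change that to eight thirty", chatlog))
# {'action': 'updateAlarm', 'alarm': '8:00', 'time': '8:30'}
print(parse("change that to eight thirty", []))  # None: no referent for "that"
print(parse("change that to eight thirty", chatlog, actions={"createAlarm"}))  # None
```

Swapping the string rules for an LLM would widen the phrasings it accepts, but the last call still fails: with no `updateAlarm` structure there is nothing to parse into.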
If you say "set an alarm for 8 o'clock" and there's some {action: "createAlarm", time: "8:00"} structure you can create, then you're good. If you say "change that to eight thirty" and there's no structure to parse that into, then there's no parser that can make that work.
When I was working on voice assistants pre-LLMs I felt the same problem. The parser was only a small part of the problem. You can recognize more phrases, but the only phrases that could matter are the phrases for which some internal target structure existed.
But a better parser isn't that interesting because there are only a couple of things a player can do at any moment to advance the game. You can make infinite LLM-backed easter eggs, but that gets old fast.
A lot of people think briefly "wouldn't it be cool to hook an LLM up to a text adventure to parse the input," hoping to solve a real (but minor) problem when the game can't interpret a valid command because it isn't formatted correctly.
It appears to be freedom because you can't see the map of possible actions. Anything you type _could_ work. This makes the game feel open, but it's an illusion: your mind imagines freedoms you have not yet pursued, but also cannot pursue.
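A toy two-word parser makes the hidden map concrete. The room, its actions, and the synonym table are all hypothetical; the point is that however many phrasings the parser accepts, the possible outcomes are exactly the entries of `ACTIONS`.

```python
# Hypothetical complete action map for one room: every (verb, noun) pair the
# game understands. The player never gets to see this table.
ACTIONS = {
    ("take", "lamp"): "Taken.",
    ("light", "lamp"): "The lamp is now on.",
    ("open", "door"): "The door creaks open.",
}

# Extra phrasings map onto existing verbs; they add no new outcomes.
SYNONYMS = {"get": "take", "grab": "take"}


def respond(command):
    """Classic two-word parser: normalize the verb, then look up (verb, noun)."""
    words = command.lower().split()
    if len(words) != 2:
        return "I only understand two-word commands."
    verb, noun = words
    verb = SYNONYMS.get(verb, verb)
    return ACTIONS.get((verb, noun), "You can't do that.")


for command in ["grab lamp", "Open door", "burn door", "climb the tree"]:
    print(f"> {command}\n{respond(command)}")
```

Adding synonyms (or an LLM) widens the front door, but the set of reachable outcomes stays the same three entries.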
In practice I don't think text adventures offer much freedom. You can't dream up new ways to solve puzzles. Picking up and dropping inventory items is not a deep system. Determining the correct verb to use can be a puzzle of its own, but it's not freedom.
I was reading this post talking about the great things about text adventures, and it got me thinking about parsing and representation... entropicthoughts.com/the-greatnes...

Specifically from the question: do text adventures really give players a lot of freedom?
The Greatness of Text Adventures
entropicthoughts.com
Watching the kids write fractions by hand on a Chromebook trackpad made me feel somehow complicit.
I think pushing it further means explicit thinking about memory formation, consolidation, updating, reinforcing surprise and disposing of unsurprising/inferable memories, and so on.

Which is all very fun to think about!
Should AI prescribe medication for mental health? If someone says yes I will guess it’s 50/50 that they are just an edgelord trolling the question.
Like there was a free wellness benefit of Therapy Lite at a previous job, and I tried a couple sessions. AI could absolutely beat that experience right now, hands down. Having a human who embodies a pamphlet on CBT and mindfulness is not hard to beat.
At this exact moment? Surely no, and if they answered otherwise I’d suspect they were interpreting the question differently than me. “In some cases, right now?” - probably many would say yes, and due to the ambiguity of that language it’s an easily defensible position.
In the small parking lot behind my house one of the employees always parks in the same place, with the placards “if you’re going to ride my ass at least pull my hair” and “bimbo on board” very visible in her back window. I admire the confidence!