Ryan Egesdahl
@ryan.deriamis.net
Software Engineer, nerd, and interested in everything. AuDHD, so please be patient. All opinions are strictly my own unless explicitly stated otherwise.
I am now angry with myself for continuing to scroll through my timeline. I think I’ll take the rest of the day off from social media.
November 22, 2025 at 8:59 PM
Let’s save this for posterity so a block doesn’t ruin the message, shall we?
October 15, 2025 at 5:14 PM
Two very different things I saw today.
October 6, 2025 at 5:00 AM
Anyway, the authors conclude that the errors and nonsense responses parallel misclassifications in supervised learning - and yes, that is definitely a problem to take note of. Even in the example I gave where a fact database is somehow created, we would have to deal with classification errors.
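(A toy sketch of that parallel as I understand it - mine, not the paper's actual construction. If the binary "is this statement valid?" classifier is wrong some fraction of the time, a generator that only emits statements the classifier marks valid will emit falsehoods at a related rate. The statements and the 20% error rate below are made up for illustration.)

```python
import random

# Toy illustration (mine, not the paper's construction): a noisy binary
# "is-it-valid" classifier gating a generator that only emits statements
# labeled valid. Misclassified falsehoods leak through as "hallucinations".
random.seed(0)

statements = [
    ("Paris is the capital of France", True),
    ("Canberra is the capital of Australia", True),
    ("Sydney is the capital of Australia", False),
    ("The Moon is made of cheese", False),
]
MISCLASSIFICATION_RATE = 0.2  # hypothetical classifier error rate

def noisy_is_valid(actually_true: bool) -> bool:
    """Return the true label, flipped with probability MISCLASSIFICATION_RATE."""
    return actually_true if random.random() > MISCLASSIFICATION_RATE else not actually_true

emitted = [(text, truth) for text, truth in statements * 2500 if noisy_is_valid(truth)]
false_count = sum(1 for _, truth in emitted if not truth)
print(f"False statements among emitted: {false_count / len(emitted):.1%}")
```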
September 21, 2025 at 10:09 PM
Finally, the authors point out that even decidable questions sometimes require either clarification or an "I Don't Know" (IDK) response from the LLM.

Now, I ask you - how often do you get clarifying questions from ChatGPT? How many times do you get an IDK instead of a hallucination? 🤔
September 21, 2025 at 9:56 PM
Unfortunately, natural (non-mathematical) languages carry hidden context. Anyone who has experienced autism knows this fact *viscerally*. The authors highlight that fact here. The issue is one of how logical judgements work in mathematics - natural languages often produce undecidable statements.
September 21, 2025 at 9:56 PM
Well, now I sort of get where their stochastic model comes from, and for what it is, I can see its utility. However, I still think it's based on a flawed premise. Again, Gödel and Tarski tell us why.

LLMs should not be *predicting* factual responses to begin with, so I don't think the model works.
September 21, 2025 at 9:01 PM
To be clear, I *do* follow the diagram above the text. However, the error examples don't seem to be stochastic issues with binary classification. They are still answers which rely on facts that are either defined or not. I can, for example, use SPARQL or Prolog queries to get correct answers.
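(The concrete version of what I mean, as a sketch: query a fact database instead of predicting tokens. This assumes the public Wikidata SPARQL endpoint and the SPARQLWrapper Python package, and that P36/Q408 are the "capital"/"Australia" identifiers - worth double-checking, but the point is that the answer is either defined in the database or it isn't.)

```python
# Sketch: answer "what is the capital of Australia?" from a fact database.
# Assumes the public Wikidata endpoint and the SPARQLWrapper package;
# wd:Q408 = Australia, wdt:P36 = capital (per Wikidata, worth verifying).
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("https://query.wikidata.org/sparql")
sparql.setQuery("""
    SELECT ?capitalLabel WHERE {
      wd:Q408 wdt:P36 ?capital .
      SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
    }
""")
sparql.setReturnFormat(JSON)

results = sparql.query().convert()
for row in results["results"]["bindings"]:
    # The fact is either in the database or it isn't - no guessing involved.
    print(row["capitalLabel"]["value"])
```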
September 21, 2025 at 8:49 PM
I don't follow this line of reasoning yet. I get that certain generative errors can result from stochastic (semi-random) influences, but I don't understand how statistical factors produce binary classification errors that lead to them. I will be watching for a forthcoming explanation in the paper.
September 21, 2025 at 8:44 PM
Wait, what‽

By Gödel's incompleteness theorems and Tarski's undefinability theorem, we already know that such an operation would be invalid... am I missing something here?

An AI model is a mathematical construct within language theory. Therefore, it can't determine validity in that system!
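(For reference, the formal statement I have in mind - Tarski's theorem, roughly:)

```latex
% Tarski's undefinability theorem, roughly stated: truth for a sufficiently
% expressive formal system cannot be defined within that system itself.
\textbf{Theorem (Tarski).} There is no formula $\mathrm{True}(x)$ in the
language of arithmetic such that for every sentence $\varphi$,
\[
  \mathbb{N} \models \varphi \iff \mathbb{N} \models \mathrm{True}(\ulcorner \varphi \urcorner),
\]
where $\ulcorner \varphi \urcorner$ denotes the G\"odel number of $\varphi$.
```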
September 21, 2025 at 8:22 PM
Now *this* is an interesting statement. It's apparently *not* errors in the training data that cause an increase in the rate of hallucinations! The reference is a fairly dated book, but I am definitely putting it on my TBR shelf - once I, you know, have a job again. 😮‍💨

Moving on.
September 21, 2025 at 7:59 PM
And now for the top-down on the paper. The Introduction gives us a fairly concise description of the problem through an example.

Following from the conclusion and the abstract, we can guess at the problem: because model training rewards guessing, the error is itself a guess.
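(The back-of-the-envelope version of that incentive, as I read it - under binary right/wrong grading with no credit for abstaining:)

```latex
% Under 0/1 grading that gives no credit for "I don't know", any guess with
% a nonzero chance p of being right beats abstaining in expectation:
\[
  \mathbb{E}[\text{score} \mid \text{guess}] = p \cdot 1 + (1 - p) \cdot 0 = p
  \;>\; 0 = \mathbb{E}[\text{score} \mid \text{IDK}]
  \qquad \text{for any } p > 0,
\]
% so training and evaluation schemes of that shape push the model toward guessing.
```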
September 21, 2025 at 7:49 PM
The conclusion also doesn't seem to say that hallucinations are inevitable. What I am reading here is that the authors believe that "Simple modifications of mainstream [training] evaluations ... can remove barriers to the suppression of hallucinations and open the door to future work[.]"
September 21, 2025 at 7:34 PM
The abstract presents the paper as stating that AI hallucinations are a product of the training and evaluation methods and that they may be overcome to produce more trustworthy models. That's a very optimistic statement and not at all what the news article said.
September 21, 2025 at 7:34 PM
Note that three of the four authors on the paper are from OpenAI itself. This isn't truly a problem - we just need to be aware of potential biases. Also, the one person who is *not* with OpenAI is not the lead author. Again, this is not really a problem, but it should engender caution.
September 21, 2025 at 7:34 PM
I knew this would come in handy someday. In fact, I have a few more that express the same sentiment. Isn’t this timeline absolutely awesome*?

* The Earth being hit by an asteroid would also be awesome in the same sense, you know…
September 8, 2025 at 8:15 PM
Pancakes for lunch? YES.

Extreme fluffage, beautifully toasty and rich, crispy edges… 🤤

(The slightly-too-dark edge is due to my crappy gas stove and needing a fan in my kitchen. 🤷‍♂️ They don’t taste burnt, though.)

#foodsky #food #pancakes @crowbar.wtf
September 3, 2025 at 7:44 PM
Since when did it become the standard practice to “bake” a mousse? Isn’t that just a soufflé or a flourless chocolate cake (maybe a lava cake), depending on the ingredient ratios? 🤔
August 27, 2025 at 11:10 PM
The US has a long history of protesting with food, including throwing food at terrible people. That’s a good thing - and it’s absolutely HILARIOUS!

#sandwich #HamSandwich
August 16, 2025 at 4:44 PM
Oh, but he does!
August 16, 2025 at 3:50 AM
Well, there goes my sex life. Forever.
August 7, 2025 at 8:57 AM
Just to show I was serious -

Pancakes, y’all.

Gloriously imperfect and absolutely delicious.

You really can - and *should* - take a break from the insanity to do things you love when you need to. Cooking amazing things is one of mine.

For the interested, food science geekery is in the 🧵 below.
August 1, 2025 at 7:43 PM
For the “but that’s an Amendment!” freaks, there’s also Article I Section 8, which gives Congress (not the President) the power to legislate naturalization, and also Article I Section 9, which makes it exceedingly clear that Habeas Corpus cannot be suspended for *anyone*, even for noncitizens.
June 27, 2025 at 6:08 PM
I keep finding reasons to use this one of late. Curious…
June 27, 2025 at 5:54 AM
It’s been a while since I used this one…
June 25, 2025 at 6:06 PM