MrCheeze
@mrcheeze.github.io
I reverse engineer games, etc. Also found elsewhere on the internet: https://mrcheeze.github.io/
all my homies hate stupid bruteforce trial and error switch puzzle
November 27, 2025 at 4:33 AM
Claude got the third badge, tying its current best number of badges in under 6000 steps! And just from watching it play, it really can't be overstated how much this specific model improved on spatial understanding (go AROUND the wall) and on taking and following useful notes on where things are.
November 27, 2025 at 4:10 AM
...However, Gem 3 has completely stalled on one specific puzzle in the game. Not an ice or boulder puzzle, but rather the Goldenrod underground switch puzzle that frankly has insane mechanics that are unfair even for a human:
pokemow.com/Gen2/Shutter...
(that said, it did miss noticing an NPC hint)
November 26, 2025 at 8:29 PM
Meanwhile, Gemini 3 has been absolutely breezing through Crystal, because its somewhat more powerful harness was designed to make the game barely possible for Gemini 2.5. Gemini 3 is *way* smarter AND has the game walkthrough memorized, making most of the game easy...
www.twitch.tv/gemini_plays...
Gemini_Plays_Pokemon - Twitch
Gemini 3 Plays Pokémon Crystal - LET'S RACE | !askgem [!faq !harness] !badges
www.twitch.tv
November 26, 2025 at 8:29 PM
That said, it was helped by a change to the harness where floor tiles that are behind a wall are marked with a unique colour, on top of the existing two for walls/floors.

There are also these changes designed for Rocket hideout - past Claude models were not given these, making any comparisons unfair.
November 26, 2025 at 8:29 PM
what a bummer
November 23, 2025 at 7:02 PM
Reposted by MrCheeze
if you also feel the need to reread - or want to experience it for the first time - here you go
THE GREAT OUTDOOR FIGHT
The Great Outdoor Fight: details forthcoming.
achewood.com
November 22, 2025 at 8:19 PM
There's a second effect here too: "making excuses" is ALWAYS a moderately high probability response, whereas an actual useful answer is more difficult to predict with confidence.

(This is also why models will sometimes say things are against their rules that aren't.)
November 22, 2025 at 10:14 PM
Surely it's impossible to train a model without examples of "not spam", although I suppose restricting themselves to specifically the examples that people flag as false detections is enough.
November 22, 2025 at 5:38 PM
Very precise wording on their part of "we do not use your Gmail content *for training our Gemini AI model*", since spam filters are themselves AI models that have been trained on your emails for decades.
November 22, 2025 at 5:32 PM
November 21, 2025 at 2:42 PM
thezvi.substack.com/p/i-am-the-g... has some if you scroll down to that section
I am the Golden Gate Bridge
Easily Interpretable Summary of New Interpretability Paper
thezvi.substack.com
November 21, 2025 at 5:37 AM
Are you sure this is translucency and not just the "every other pixel" effect used for transparent water in the final game?
November 20, 2025 at 7:24 PM
Floating up through the atmosphere like a balloon, but at least I still have Hope
November 19, 2025 at 11:35 PM
November 19, 2025 at 3:33 PM
But then Tyler follows up twice in a row with "ok but what do you ACTUALLY mean here" and Altman just stonewalls?
November 19, 2025 at 3:23 PM
The disproof of the Cat's Cradle hypothesis
November 19, 2025 at 5:43 AM
Unlike in financial markets, insider trading is seen as a good thing in prediction markets, since The Point is to get accurate predictions
November 18, 2025 at 4:57 AM
In combination with the "surf down" glitch, this can be used to skip most of the bottom floor of the dungeon in R/G, if you wanted to do that for some reason. ignore the weird translation patch
November 17, 2025 at 9:41 PM
Put it in the pile
November 17, 2025 at 5:33 PM