Lightnews — Scholar-powered news

MrCheeze

@mrcheeze.github.io

all my homies hate stupid bruteforce trial and error switch puzzle

November 27, 2025 at 4:33 AM

MrCheeze

@mrcheeze.github.io

Claude got the third badge, tying its current best number of badges in under 6000 steps! And just from watching it play, it really can't be overstated how much this specific model improved on spatial understanding (go AROUND the wall) and on taking and following useful notes on where things are.

November 27, 2025 at 4:10 AM

MrCheeze

@mrcheeze.github.io

...However, Gem 3 has completely stalled on one specific puzzle in the game. Not an ice or boulder puzzle, but rather the Goldenrod underground switch puzzle that frankly has insane mechanics that are unfair even for a human:
pokemow.com/Gen2/Shutter...
(that said, it did miss noticing an NPC hint)

November 26, 2025 at 8:29 PM

MrCheeze

@mrcheeze.github.io

Meanwhile, Gemini 3 has been absolutely breezing through Crystal, because its somewhat more powerful harness was designed to make the game barely possible for Gemini 2.5. Gemini 3 is *way* smarter AND has the game walkthrough memorized, making most of the game easy...
www.twitch.tv/gemini_plays...

Gemini_Plays_Pokemon - Twitch

Gemini 3 Plays Pokémon Crystal - LET'S RACE | !askgem [!faq !harness] !badges

www.twitch.tv

November 26, 2025 at 8:29 PM

MrCheeze

@mrcheeze.github.io

That said, it was helped by a change to the harness where floor tiles that are behind a wall are marked with a unique colour, on top of the existing two for walls/floors.

There are also these changes, designed for the Rocket Hideout. Past Claude models were not given them, making any comparisons unfair.

* Marked Spin tiles and Teleport tiles as not navigable so navigation sequences wouldn’t accidentally hit these which was impossible for the model to realize
* Made it so when the model hits a spin tile we wait for the player to stop moving before giving the model the next screenshot
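
As a rough illustration of the first change, here is a minimal sketch of pathfinding that treats spin and teleport tiles as blocked. The tile codes, the grid layout, and the `find_path` helper are all assumptions made up for this example; the post does not say how the actual harness represents its map.

```python
from collections import deque

# Hypothetical tile codes; the real harness's representation is not given.
FLOOR, WALL, SPIN, TELEPORT = ".", "#", "S", "T"

# Spin and teleport tiles are treated like walls for routing purposes,
# so a planned step sequence can never cross one by accident.
NON_NAVIGABLE = {WALL, SPIN, TELEPORT}


def find_path(grid, start, goal):
    """Breadth-first search over (row, col) cells, skipping blocked tiles.

    Returns the list of cells from start to goal, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk the parent links back to the start, then reverse.
            path = []
            while cell is not None:
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
            nr, nc = nxt
            in_bounds = 0 <= nr < rows and 0 <= nc < cols
            if in_bounds and nxt not in came_from and grid[nr][nc] not in NON_NAVIGABLE:
                came_from[nxt] = cell
                queue.append(nxt)
    return None
```

With a spin tile in the middle of a 3x3 room, `find_path(["...", ".S.", "..."], (0, 0), (2, 2))` routes around the centre cell rather than through it.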

November 26, 2025 at 8:29 PM

MrCheeze

@mrcheeze.github.io

what a bummer

November 23, 2025 at 7:02 PM

Reposted by MrCheeze

Micah

@rincewind.run

if you also feel the need to reread - or want to experience it for the first time - here you go

THE GREAT OUTDOOR FIGHT

The Great Outdoor Fight: details forthcoming.

achewood.com

November 22, 2025 at 8:19 PM

MrCheeze

@mrcheeze.github.io

There's a second effect here too: "making excuses" is ALWAYS a moderately high probability response, whereas an actual useful answer is more difficult to predict with confidence.

(This is also why models will sometimes say things are against their rules that aren't.)

November 22, 2025 at 10:14 PM

MrCheeze

@mrcheeze.github.io

Surely it's impossible to train a model without examples of "not spam", although I suppose restricting themselves to specifically the examples that people flag as false detections is enough.

November 22, 2025 at 5:38 PM

MrCheeze

@mrcheeze.github.io

Very precise wording on their part of "we do not use your Gmail content *for training our Gemini AI model*", since spam filters are themselves AI models that have been trained on your emails for decades.

November 22, 2025 at 5:32 PM

MrCheeze

@mrcheeze.github.io

@smsunshines.bsky.social

Super Mario Sunshine sand bird made of cubes

Super Mario Sunshine secret level full of cubes

November 21, 2025 at 2:42 PM

MrCheeze

@mrcheeze.github.io

thezvi.substack.com/p/i-am-the-g... has some if you scroll down to that section

I am the Golden Gate Bridge

Easily Interpretable Summary of New Interpretability Paper

thezvi.substack.com

November 21, 2025 at 5:37 AM

MrCheeze

@mrcheeze.github.io

Are you sure this is translucency and not just the "every other pixel" effect used for transparent water in the final game?

November 20, 2025 at 7:24 PM

MrCheeze

@mrcheeze.github.io

Floating up through the atmosphere like a balloon, but at least I still have Hope

November 19, 2025 at 11:35 PM

MrCheeze

@mrcheeze.github.io

Trevor Moore from WKUK describing his face being taken off in the take my face off skit

November 19, 2025 at 3:33 PM

MrCheeze

@mrcheeze.github.io

But then Tyler follows up twice in a row with "ok but what do you ACTUALLY mean here" and Altman just stonewalls?

November 19, 2025 at 3:23 PM

MrCheeze

@mrcheeze.github.io

The disproof of the Cat's Cradle hypothesis

November 19, 2025 at 5:43 AM

MrCheeze

@mrcheeze.github.io

Unlike in financial markets, insider trading is seen as a good thing in prediction markets, since The Point is to get accurate predictions

November 18, 2025 at 4:57 AM

MrCheeze

@mrcheeze.github.io

Relevant: bsky.app/profile/mrch...

MrCheeze @mrcheeze.github.io · Mar 18

In combination with the "surf down" glitch, this can be used to skip most of the bottom floor of the dungeon in R/G, if you wanted to do that for some reason. ignore the weird translation patch

November 17, 2025 at 9:41 PM

MrCheeze

@mrcheeze.github.io

Put it in the pile

November 17, 2025 at 5:33 PM
