coolstuffdude.bsky.social
@coolstuffdude.bsky.social
the difference between building a product on prompting vs RL-training a model to complete objectives well

this is why RL is such a big deal!
January 24, 2026 at 3:24 AM
thank you!
January 23, 2026 at 3:45 AM
yes please! i'm interested :)
January 23, 2026 at 3:45 AM
do you know how i can find the feed? i don't see it when searching in the normal bluesky app
January 23, 2026 at 3:40 AM
What makes you say that?
January 23, 2026 at 3:37 AM
You can tell how good it is by the fact that the popular models from China are adopting its strategies related to interleaved thinking and tool calling

People underestimate how good it is because all the inference engines run the model incorrectly: it needs to use the Responses API
January 12, 2026 at 1:48 AM
extremely sick, would love this for youtube music too
January 6, 2026 at 6:17 AM
I have a question for anyone, why NY of all states? Why not Florida or a more conservative state? It would be hilarious if they chose NY because it still has the best lawyers and the other states would mess it up
January 4, 2026 at 5:57 AM
> That is impossible across thousands of drugs

I feel like the answer is to solve this rather than just accept it as truth. Everything else in the world is negotiated, but for some reason a couple thousand things can't be?
December 25, 2025 at 6:03 AM
Very cool, i've been asking a similar twisted riddle, models are too good now

A farmer with a wolf, a goat, and a cabbage must cross a river by boat. The boat can carry only the farmer and a single item. If left unattended together, the goat would eat the wolf, or the wolf would eat the cabbage.
December 13, 2025 at 11:36 PM
oooo, ok one theory

The `Context hygiene:` section is a hint for the model that does compaction. So this is saying "During compaction, summarize long sections from loaded skills content"

I actually don't think I've ever double checked if it's the same model or a different one for compaction though.
December 13, 2025 at 5:43 AM
Hrm, any idea what it means by "summarize long sections instead of pasting them"

Referring to reciting information from the skills themselves to the user I guess? Or maybe related to the output of skills?
December 13, 2025 at 4:42 AM
Hot take: I think they're actively trying their hardest. There are probably 100 people full time working on it, throwing more cooks at it would probably make it take longer too. They're trying to nerf it as much as they can but still keep people on the platform. I'd prefer that over people on grok
November 7, 2025 at 5:13 AM
I would be very interested in seeing an "elder safety" benchmark, where you can evaluate models on topics like these. I'd love to see how different the scores are for gpt-5 vs claude vs grok
November 7, 2025 at 4:40 AM
yep! like if the model is debugging a really verbose python test, instead of running `python test.py` and looking through its output, it can run `python test.py | grep "[TEST-ERR]"`

It's even more important with Claude's auto-compaction: no matter what in context gets compacted, files still exist
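
a toy version of the pattern, for the curious (the `[TEST-ERR]` tag and file names are made up):

```shell
# Simulate a verbose test run, then surface only the tagged failures;
# the full log stays on disk, out of the context window.
printf 'setting up fixtures...\nok test_login\n[TEST-ERR] test_checkout: assertion failed\ncleaning up...\n' > test_output.txt
grep '\[TEST-ERR\]' test_output.txt
```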
October 21, 2025 at 2:40 AM
containers elegantly serve as state management for agents, because that's what computers were designed to do for humans! organize info, execute things, handle long-running things

i've wanted to write about this for a while, and your post is so spot on that you inspired me :)
October 20, 2025 at 2:30 AM
I think we're going to see an arms race in containerized execution environments for models. it's not easy even for openai or anthropic to run 10 or 100MM containers

having it "just work" on the api is just so powerful though, we may see cloud providers have a real role to play here
October 20, 2025 at 2:29 AM
not the same but related, openai trains its models to manipulate images in their containers to be able to extract text or zoom in on images. it's basically skills. it also feels to me that the image manipulation is done via sub agent on chatgpt
October 20, 2025 at 2:29 AM
I think you're correct that skills > mcp, skills are inherently _composable_! your skill can write to files instead of blowing up context, they can be used with any unix commands which the model already understands, they can kick off background processes, we're just scratching the surface
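
e.g. a tiny sketch of the write-to-files-not-context idea (file names made up, `seq` standing in for a skill's verbose output):

```shell
# A "skill" dumps its bulky output to a file rather than the context window...
seq 1 1000 > skill_output.txt       # stand-in for a skill's verbose output
# ...then the model pulls back only the slice or summary it actually needs.
tail -n 3 skill_output.txt          # just the last few lines
wc -l < skill_output.txt            # or just a summary stat
```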
October 20, 2025 at 2:18 AM
I appreciate you posting all of these, I always feel very up to date because of you! :)
October 15, 2025 at 4:04 AM
I was pretty surprised that I could also hit escape, ask it to give me a status report on all of the agents, and then say continue and they would continue. Super cool!
October 11, 2025 at 10:02 PM
I came across your stuff because of Simon Willison, I have to say I REALLY like how you think about things with Claude!

Such an amazing resource to read through your blog, thank you :)
October 11, 2025 at 2:54 PM
if you have any questions about the model lmk, I've spent way too much time with it in vLLM, feel free to DM me
October 10, 2025 at 12:10 AM
Whatever option you go with, triple check that the tokens coming out look right when compared to the messages; the 'chat template' for gpt-oss is super different from any other model's and most implementations are just broken
October 10, 2025 at 12:04 AM
If your cards work with it, vLLM may be a better bet if you're getting serious
October 9, 2025 at 11:58 PM