_ - \. 🇺🇸
crumb.bsky.social
@crumb.bsky.social
https://hf.co/crumb | she / xe / it / fae / E / Ey
Pinned
have been revisiting this a lot
youtu.be/0BVM0UC28nY
i really really really like to think
December 20, 2025 at 7:07 PM
i suspect teaching an agent w scratchpad to navigate arbitrary spaces, for example, by applying inputs to reservoirs, could be insanely powerful
then the only thing you'd have to do for it would be afford it interfaces that let it do useful things, auto scientist, smart optimizer
December 20, 2025 at 6:59 PM
glad i made the post on the 7th b4 that paper so i could show that potentially a lot of people are rediscovering this technique independently, i do think "reasoning for completing specified tasks" vs "reasoning for world simulation" have different intelligence ceilings though ;3
December 20, 2025 at 6:06 PM
I'm so excited to see other people discovering this solution!
I've been doing literally this but for general completions. this lets a model learn to reason about _everything present in the web text corpus_
December 20, 2025 at 6:06 PM
you can just simulate specific users and optimize against the simulations, nothing can stop you
December 16, 2025 at 6:42 AM
if you havent switched from myli to ydraletas you're ngmi
December 10, 2025 at 8:48 PM
it was really funny seeing the param count of 6b gradually grow to 10b over time as people kept trying to game the gains "in the same weight class" 😹
December 9, 2025 at 3:43 PM
you dont need an entire library to do (log_probs * -advantages).backward()
December 8, 2025 at 12:56 AM
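for anyone who wants to see what that one-liner is doing: a minimal sketch, assuming a softmax policy over a handful of discrete actions. the `.backward()` in the post suggests an autograd framework such as PyTorch (where the product would need reducing to a scalar first, e.g. with `.mean()`); here the gradient is written out by hand so it runs with no dependencies. `softmax`, `policy_gradient_step`, and the learning rate are names and values made up for illustration, not anything from the post.

```python
import math


def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, action, advantage, lr=0.1):
    """One gradient-descent step on loss = -(log p(action) * advantage).

    Hypothetical helper: for a softmax policy,
    d loss / d logit_i = -advantage * (1[i == action] - p_i).
    """
    probs = softmax(logits)
    grads = [
        -advantage * ((1.0 if i == action else 0.0) - p)
        for i, p in enumerate(probs)
    ]
    return [x - lr * g for x, g in zip(logits, grads)]


# Start from a uniform policy over three actions.
start = [0.0, 0.0, 0.0]
pushed_up = policy_gradient_step(start, action=1, advantage=1.0)
pushed_down = policy_gradient_step(start, action=1, advantage=-1.0)

print(softmax(pushed_up)[1], softmax(pushed_down)[1])
```

a positive advantage raises the sampled action's probability and a negative one lowers it, which is all the loss in the post is asking for.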
GAN approach w/ reasoning seems like it's gonna be stable at small batch sizes - 4B @ bs 8
December 8, 2025 at 12:56 AM
no outspoken fan of grpo (value estimation easy as shit to put into ur trainer and can further force model to represent "goodness" well) but discourse about it "not being RL" is insane, if you maximize the probability of good samples and minimize the bad you can't really go too far wrong
December 8, 2025 at 12:55 AM
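a rough sketch of the two ways of scoring samples that post is contrasting, assuming scalar rewards per sampled completion. the group-relative version normalizes rewards within one prompt's group of samples, as GRPO is usually described; the baseline version subtracts a learned value estimate instead. function names, the `eps` constant, and the choice of population standard deviation are assumptions for illustration, and real trainers differ on those details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """GRPO-style advantages: normalize rewards within one group of samples."""
    mu = mean(rewards)
    sigma = pstdev(rewards)
    # eps keeps the division finite when every sample got the same reward.
    return [(r - mu) / (sigma + eps) for r in rewards]


def value_baseline_advantages(rewards, values):
    """Advantages against a value estimate (values would come from a critic)."""
    return [r - v for r, v in zip(rewards, values)]


# Four samples for one prompt: two judged good, two judged bad.
adv = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
print(adv)
```

either way, good samples end up with positive advantages and bad ones with negative advantages, so the same policy-gradient loss pushes probability toward the good and away from the bad.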
that looked so messy as a human so i am trying out some different reskins, "computers think in binary"?
November 30, 2025 at 12:43 AM
are you guys chill with this or do i have safety geeks that will crucify me over here
November 28, 2025 at 8:17 PM
November 27, 2025 at 7:46 PM
it's supposed to be, like, a bug
November 11, 2025 at 8:03 PM
high pass@k is awesome cause if you actually care about solving problems and getting the best possible solutions, it is actually relevant. but if you only care about a "product" then obviously it's not worth your time to think about
November 11, 2025 at 8:01 PM
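for reference, pass@k is usually computed with the unbiased estimator 1 - C(n-c, k) / C(n, k), where n samples were drawn per problem and c of them were correct. a small sketch under that assumption follows; the function name and the example numbers are illustrative, not taken from the post.

```python
from math import comb


def pass_at_k(n, c, k):
    """Estimate the chance that at least one of k samples is correct.

    n: samples drawn for the problem, c: how many were correct, k: budget.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("need 0 <= c <= n and 1 <= k <= n")
    if n - c < k:
        # Fewer than k incorrect samples: any k-subset contains a correct one.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# A model that is right only 1 time in 10 looks very different at k=1 and k=10.
low_k = pass_at_k(n=10, c=1, k=1)
high_k = pass_at_k(n=10, c=1, k=10)
print(low_k, high_k)
```

the gap between the two numbers is the point of the post: a rarely-correct model can still solve the problem if you are willing to sample and pick the best.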
i hope everyone that had a hand in making assistants the norm for what "language models" are goes to hell no matter what
October 2, 2025 at 4:16 PM
have been revisiting this a lot
youtu.be/0BVM0UC28nY
September 30, 2025 at 1:43 AM
friggin massive shout out to openinference hosting deepseek v3.1 on openrouter for free
even tho we trained on filtered data generated by deepseek v3 base, our desc2doc model didn't follow prompts as well as we'd hoped. so last night i pounded out a rubric based trainer using deepseek v3.1 (:free) as judge. it is now running. yaaay
September 29, 2025 at 7:05 PM
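the trainer itself isn't shown in the post, so this is only a guess at the general shape of a rubric-based reward: each rubric item is checked by a judge and the weighted fraction of satisfied items becomes the scalar reward. everything here is hypothetical, including the `rubric_reward` name, the weights, and `toy_judge`, which is a keyword-matching stand-in for an LLM judge call.

```python
def rubric_reward(prompt, output, rubric, judge):
    """Weighted fraction of rubric criteria the judge says are satisfied.

    rubric: list of (criterion_text, weight) pairs.
    judge: callable (prompt, output, criterion) -> bool; a real setup would
    ask a judge model here.
    """
    total = sum(weight for _, weight in rubric)
    if total <= 0:
        raise ValueError("rubric weights must sum to a positive number")
    earned = sum(
        weight for criterion, weight in rubric
        if judge(prompt, output, criterion)
    )
    return earned / total


def toy_judge(prompt, output, criterion):
    """Stand-in judge: passes a criterion if its keyword appears in the output."""
    return criterion.lower() in output.lower()


rubric = [("title", 2.0), ("summary", 1.0), ("references", 1.0)]
reward = rubric_reward(
    "describe a paper", "Title and Summary only", rubric, toy_judge
)
print(reward)
```

the resulting reward can then be turned into advantages and fed to whatever policy-gradient loss the trainer uses.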
took you long enough Dumb Ass
September 18, 2025 at 3:29 AM
i think... working towards a set goal like "agi" is not really conducive to finding out what this specific tech stack could be the best at
September 16, 2025 at 6:52 PM
Check out some visualizers like this here:
midwestern-simulation.neocities.org/main/library...

Check out the embedding model we created for them here:
hf.co/midwestern-s...
September 16, 2025 at 5:49 PM
12 embedding tokens seems to be a sweet spot between reconstruction quality and ability to do math to the embeddings before decoding for our 3b model
September 15, 2025 at 7:21 AM
September 13, 2025 at 10:49 PM