Lightnews — Scholar-powered news

_ - \. 🇺🇸

@crumb.bsky.social

i really really really like to think

December 20, 2025 at 7:07 PM

_ - \. 🇺🇸

@crumb.bsky.social

ALT: a woman in a police uniform with glasses is holding her hands together

media.tenor.com

December 20, 2025 at 7:05 PM

_ - \. 🇺🇸

@crumb.bsky.social

i suspect teaching an agent w scratchpad to navigate arbitrary spaces, for example, by applying inputs to reservoirs, could be insanely powerful
then the only thing you'd have to do for it would be afford it interfaces that let it do useful things, auto scientist, smart optimizer

December 20, 2025 at 6:59 PM

_ - \. 🇺🇸

@crumb.bsky.social

glad i made the post on the 7th b4 that paper so i could show that potentially a lot of people are rediscovering this technique independently, i do think "reasoning for completing specified tasks" vs "reasoning for world simulation" have different intelligence ceilings though ;3

December 20, 2025 at 6:06 PM

_ - \. 🇺🇸

@crumb.bsky.social

I'm so excited to see other people discovering this solution!
I've been doing literally this but for general completions. this lets a model learn to reason about _everything present in the web text corpus_

December 20, 2025 at 6:06 PM

_ - \. 🇺🇸

@crumb.bsky.social

you can just simulate specific users and optimize against the simulations, nothing can stop you

December 16, 2025 at 6:42 AM

_ - \. 🇺🇸

@crumb.bsky.social

if you havent switched from myli to ydraletas you're ngmi

December 10, 2025 at 8:48 PM

_ - \. 🇺🇸

@crumb.bsky.social

it was really funny seeing param count of 6b gradually grow to 10b over time as people keep trying to game the gains "in the same weight class" 😹

December 9, 2025 at 3:43 PM

_ - \. 🇺🇸

@crumb.bsky.social

you dont need an entire library to do (log_probs * -advantages).backward()

December 8, 2025 at 12:56 AM
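
The one-liner above is the REINFORCE-style policy-gradient loss. Scaling each sample's log-probability by its negated advantage means gradient descent raises the probability of samples with positive advantage and lowers it for negative ones. Below is a minimal library-free sketch that writes out by hand the gradient `.backward()` would compute. It assumes a toy 3-armed bandit with a softmax policy; the bandit, its reward rates, and the names `softmax` and `policy_gradient_step` are illustrative and not from the post. It also averages over the batch instead of summing, which only rescales the step.

```python
import math
import random


def softmax(logits):
    # Numerically stable softmax over a list of floats.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_gradient_step(logits, actions, advantages, lr=0.1):
    # Loss = mean over samples of (log_prob(action) * -advantage).
    # For a softmax policy, d log p(a) / d logit_j = 1[j == a] - p_j,
    # so each sample contributes -advantage * (1[j == a] - p_j) to the gradient.
    probs = softmax(logits)
    grads = [0.0] * len(logits)
    for a, adv in zip(actions, advantages):
        for j, p in enumerate(probs):
            indicator = 1.0 if j == a else 0.0
            grads[j] += -adv * (indicator - p) / len(actions)
    # Plain gradient descent on the loss.
    return [x - lr * g for x, g in zip(logits, grads)]


# Toy training loop (hypothetical bandit): arm i pays 1 with probability reward_rates[i].
random.seed(0)
reward_rates = [0.1, 0.5, 0.9]
logits = [0.0, 0.0, 0.0]
for _ in range(200):
    probs = softmax(logits)
    actions = random.choices(range(3), weights=probs, k=8)
    rewards = [1.0 if random.random() < reward_rates[a] else 0.0 for a in actions]
    baseline = sum(rewards) / len(rewards)  # batch-mean baseline
    advantages = [r - baseline for r in rewards]
    logits = policy_gradient_step(logits, actions, advantages, lr=0.5)

print([round(p, 3) for p in softmax(logits)])
```

With the batch-mean baseline, probability mass should drift toward the best-paying arm. The exact numbers depend on the seed.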

_ - \. 🇺🇸

@crumb.bsky.social

GAN approach w/ reasoning seems like it's gonna be stable at small batch sizes - 4B @ bs 8

December 8, 2025 at 12:56 AM

_ - \. 🇺🇸

@crumb.bsky.social

no outspoken fan of grpo (value estimation easy as shit to put into ur trainer and can further force model to represent "goodness" well) but discourse about it "not being RL" is insane, if you maximize the probability of good samples and minimize the bad you can't really go too far wrong

December 8, 2025 at 12:55 AM
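
For context on the GRPO vs. value-estimation point: GRPO (introduced with DeepSeekMath) drops the learned critic. Instead, it scores each sample against the other samples drawn for the same prompt by standardizing rewards with the group's mean and standard deviation. The sketch below contrasts that with the critic-style alternative the post mentions, where a learned value estimate is subtracted instead. The function names are hypothetical. Implementations differ in details such as population vs. sample standard deviation; this one assumes population std.

```python
import statistics


def group_relative_advantages(rewards, eps=1e-6):
    # GRPO-style: standardize rewards within one prompt's group of samples.
    # Above-average samples get positive advantage, below-average get negative.
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std (an assumption; some use sample std)
    return [(r - mean) / (std + eps) for r in rewards]


def value_baseline_advantages(rewards, values):
    # Critic-style alternative: subtract a learned value estimate per sample,
    # which forces a value head to predict how "good" a state is.
    return [r - v for r, v in zip(rewards, values)]


# Four samples for one prompt, graded 1 (good) or 0 (bad).
group = [1.0, 0.0, 0.0, 1.0]
print(group_relative_advantages(group))
print(value_baseline_advantages(group, [0.6, 0.6, 0.6, 0.6]))
```

Either list can be fed into the `(log_probs * -advantages)` loss mentioned in this feed. In both cases good samples are pushed up and bad ones pushed down.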

_ - \. 🇺🇸

@crumb.bsky.social

that looked so messy as a human so i am trying out some different reskins, "computers think in binary"?

November 30, 2025 at 12:43 AM

_ - \. 🇺🇸

@crumb.bsky.social

are you guys chill with this or do i have safety geeks that will crucify me over here

November 28, 2025 at 8:17 PM

_ - \. 🇺🇸

@crumb.bsky.social

November 27, 2025 at 7:46 PM

_ - \. 🇺🇸

@crumb.bsky.social

it's supposed to be, like, a bug

November 11, 2025 at 8:03 PM

_ - \. 🇺🇸

@crumb.bsky.social

high pass@k is awesome cause if you actually care about solving problems and getting the best possible solutions it is actually relevant but if you only care about a "product" then obviously it's not worth your time to think about

November 11, 2025 at 8:01 PM
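
For reference, pass@k is usually reported with the unbiased estimator from Chen et al. (2021). You draw n ≥ k samples per problem, count the c that are correct, and compute 1 - C(n - c, k) / C(n, k), the probability that a random size-k subset contains at least one correct sample. Here is a minimal sketch; the function name is illustrative.

```python
from math import comb


def pass_at_k(n, c, k):
    # Unbiased pass@k estimate for one problem: n samples drawn, c of them correct.
    # Equals 1 - C(n - c, k) / C(n, k): one minus the chance that a random
    # size-k subset of the n samples contains no correct sample.
    if k > n:
        raise ValueError("need at least k samples")
    if n - c < k:
        # Every size-k subset must include a correct sample.
        return 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


# A problem solved by 3 of 10 samples is solved 30% of the time at k=1,
# but is very likely to be solved with 5 attempts.
print(pass_at_k(10, 3, 1), pass_at_k(10, 3, 5))
```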

_ - \. 🇺🇸

@crumb.bsky.social

i hope everyone that had a hand in making assistants the norm for what "language models" are goes to hell no matter what

October 2, 2025 at 4:16 PM

_ - \. 🇺🇸

@crumb.bsky.social

have been revisiting this a lot
youtu.be/0BVM0UC28nY

September 30, 2025 at 1:43 AM

_ - \. 🇺🇸

@crumb.bsky.social

friggin massive shout out to openinference hosting deepseek v3.1 on openrouter for free

September 29, 2025 at 7:05 PM

_ - \. 🇺🇸

@crumb.bsky.social

even tho we trained on filtered data generated by deepseek v3 base, our desc2doc model didn't follow prompts as well as we'd hoped. so last night i pounded out a rubric based trainer using deepseek v3.1 (:free) as judge. it is now running. yaaay

September 29, 2025 at 7:05 PM
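
The post doesn't describe the trainer's internals. One straightforward way to turn an LLM judge into a reward, sketched here purely as an assumption, is to have the judge check each rubric criterion separately. The weighted fraction of passed criteria then becomes the scalar reward for a generated document. `Criterion`, `rubric_reward`, and the keyword-matching `keyword_judge` are hypothetical. In a real setup, the judge would wrap a call to a model such as DeepSeek V3.1.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Criterion:
    description: str
    weight: float = 1.0


# judge(prompt, document, criterion_description) -> True if the criterion is met.
Judge = Callable[[str, str, str], bool]


def rubric_reward(prompt: str, document: str, rubric: List[Criterion], judge: Judge) -> float:
    # Weighted fraction of rubric criteria the judge marks as satisfied, in [0, 1].
    total = sum(c.weight for c in rubric)
    passed = sum(c.weight for c in rubric if judge(prompt, document, c.description))
    return passed / total if total > 0 else 0.0


def keyword_judge(prompt: str, document: str, criterion: str) -> bool:
    # Hypothetical stand-in for an LLM judge: passes when the criterion's
    # last word appears in the document. A real judge would prompt a model.
    return criterion.split()[-1].lower() in document.lower()


rubric = [Criterion("mentions soil", 2.0), Criterion("mentions rainfall", 1.0)]
reward = rubric_reward("a short note about farming", "Notes on soil quality.", rubric, keyword_judge)
print(reward)
```

The design choice here is to ask the judge one yes/no question per criterion rather than for a single holistic score. The weights let some requirements count more than others.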

_ - \. 🇺🇸

@crumb.bsky.social

took you long enough Dumb Ass

September 18, 2025 at 3:29 AM

_ - \. 🇺🇸

@crumb.bsky.social

i think... working towards a set goal like "agi" is not really conducive to finding out what this specific tech stack could be the best at

September 16, 2025 at 6:52 PM

_ - \. 🇺🇸

@crumb.bsky.social

Check out some visualizers like this here:
midwestern-simulation.neocities.org/main/library...

Check out the embedding model we created for them here:
hf.co/midwestern-s...

September 16, 2025 at 5:49 PM

_ - \. 🇺🇸

@crumb.bsky.social

12 embedding tokens seems to be a sweet spot between reconstruction quality and ability to do math to the embeddings before decoding for our 3b model

September 15, 2025 at 7:21 AM
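
To make "doing math to the embeddings" concrete: if a text is encoded as a fixed number of embedding vectors (12 here), operations like interpolation or analogy-style arithmetic can be applied position by position before the decoder turns the result back into text. The minimal sketch below assumes the embedding is a 12 × d array. The tiny d, the random vectors, and the function names are placeholders, and the real model's encoder and decoder are not shown.

```python
import random

NUM_TOKENS = 12  # embedding tokens per text, per the post
DIM = 4          # placeholder; the real model's width is not stated


def lerp(a, b, t):
    # Interpolate position by position between two [NUM_TOKENS][DIM] embeddings.
    # t = 0 returns a, t = 1 returns b.
    return [[(1 - t) * x + t * y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


def analogy(a, b, c):
    # a - b + c, applied elementwise: "a is to b as ? is to c".
    return [[x - y + z for x, y, z in zip(ra, rb, rc)] for ra, rb, rc in zip(a, b, c)]


def random_embedding(rng):
    # Stand-in for encoder output; a real pipeline would embed actual text.
    return [[rng.gauss(0.0, 1.0) for _ in range(DIM)] for _ in range(NUM_TOKENS)]


rng = random.Random(0)
e1, e2, e3 = (random_embedding(rng) for _ in range(3))
halfway = lerp(e1, e2, 0.5)  # would be passed to the decoder to read out a "blend"
shifted = analogy(e1, e2, e3)
print(len(halfway), len(halfway[0]))
```

Whether a decoded blend is meaningful depends on how the encoder was trained. The post's observation is that 12 tokens kept this kind of arithmetic workable for their 3b model while still reconstructing text well.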

_ - \. 🇺🇸

@crumb.bsky.social

September 13, 2025 at 10:49 PM
