Lightnews — Scholar-powered news

Ian Littman

@ian.im

360 followers 910 following 130 posts

Software dev, historically PHP but currently Golang, Mod/infra @ phpc.social, co-organizer longhornphp.com and mergephp.com, co-maintainer at Joind.in


Ian Littman

@ian.im

If you want, I'll swap you a virtual ticket for an uncon talk at 19:30 UTC, and just bridge it through Discord. DM me somehow.

October 24, 2025 at 4:32 PM

Ian Littman

@ian.im

So far it's rather introductory, thirty minutes in.

Gonna be in and out today due to meetings among other things as well. Tomorrow should be clearer 😅

October 3, 2025 at 2:31 PM

Ian Littman

@ian.im

Of course the tricky bit is that big context windows take more VRAM, so you can only afford a smaller model in the same RAM. But a 1MM-token window also makes certain tasks possible, provided context rot isn't too bad.

September 7, 2025 at 9:39 PM

Ian Littman

@ian.im

What types of use cases? Guessing docs gen or code introspection/error explanation? For error explanation I've definitely had a bit better luck with local models (e.g. fp8 Qwen3 30B), which gives hope for a smallish code-specific model (<=32B).

September 7, 2025 at 9:08 PM

Ian Littman

@ian.im

Added context: I use LLM codegen infrequently enough that my JetBrains All Products Pack subscription plus free tiers elsewhere cover all of my usage thus far.

September 7, 2025 at 8:45 PM

Ian Littman

@ian.im

Note that I'm using Sonnet 4 as an example here because it can actually do useful code things, and we already know that Qwen Code 480B is, well, 480B, so you'd need a $10k Mac Studio to run that locally at fp8. Which...maybe that's worth the price of admission, but that's steeper than $300/mo!

September 7, 2025 at 8:29 PM
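
The sizing argument in this thread comes down to simple arithmetic: weight memory is roughly parameter count times bytes per parameter. Here is a minimal sketch of that estimate. The function name `weight_memory_gb` is hypothetical, and the figure covers weights only. It leaves out the KV cache (which grows with context length), activations, and runtime overhead, so real requirements are higher.

```python
def weight_memory_gb(params_billions: float, bits_per_param: int) -> float:
    """Approximate memory for model weights alone, in GB (1 GB = 1e9 bytes).

    Assumption: every parameter is stored at the same precision and nothing
    else (KV cache, activations, runtime buffers) is counted.
    """
    bytes_per_param = bits_per_param / 8
    # (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes per GB
    return params_billions * bytes_per_param


if __name__ == "__main__":
    # Model sizes mentioned in the thread: a 480B model and a 30B model.
    for params in (480, 30):
        for bits in (16, 8, 4):
            gb = weight_memory_gb(params, bits)
            print(f"{params}B parameters at {bits}-bit: ~{gb:.0f} GB of weights")
```

At fp8 (8 bits), a 480B-parameter model needs about 480 GB for weights alone, which is why it only fits in the 512GB class of machine, while a 30B model at fp8 needs about 30 GB.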

Ian Littman

@ian.im

Because if Sonnet 4 can be run in 128GB (more like 112GB net of other things), then it's conceivably possible to get an open-weights model that'll run efficiently locally, and cloud AI provider subsidy is no longer table stakes.

Could even argue this for 256/512GB, since linked DGX Sparks and the M3 Ultra exist.

September 7, 2025 at 8:28 PM

Ian Littman

@ian.im

Question is whether the pay-per-token prices of either open-weights hosts or Google/Anthropic/OpenAI/xAI cover costs. Obviously the rate limits on $300/mo plans don't, but curious about the rest.

Corollary question is whether Sonnet 4 is runnable in 128GB of unified memory. 1/2

September 7, 2025 at 8:28 PM
