plausiblyreliable.com
@plausiblyreliable.com
They could also just not let you have a conversation long enough to cut off the beginning of the window, or some apps will silently start a new one with a summary of the last one stuffed in
December 29, 2025 at 5:50 PM
*mathematically* an LLM is a function (context -> next token probabilities)

but without caching internal intermediate values (that are only valid to cache if you are strictly _extending_ the context) the compute cost for each token would grow quadratically with the context length
December 29, 2025 at 6:41 AM
it's impractical (expensive) to do anything other than [-N:] because of what's called the "kv-cache" - it's only relatively cheap to compute the next token in a long session if you haven't changed anything in the prefix, as it reuses computed values that are only valid if the beginning has not changed
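a toy illustration (token lists are made up) of the prefix rule - cached values survive only up to the first edited position:

```python
# Toy illustration of KV-cache validity: cached positions are reusable only
# while the new context matches the cached one token-for-token; any upstream
# edit invalidates everything after it.

def reusable_prefix(cached: list[str], request: list[str]) -> int:
    """Number of cached positions that can be reused."""
    n = 0
    for c, r in zip(cached, request):
        if c != r:
            break
        n += 1
    return n

session = ["sys", "hi", "there"]
# strictly extending the context: full reuse of all 3 cached positions
print(reusable_prefix(session, ["sys", "hi", "there", "more"]))    # 3
# editing token 2 upstream: only position 1 survives, rest must be recomputed
print(reusable_prefix(session, ["sys", "EDITED", "there", "more"]))  # 1
```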
December 29, 2025 at 6:32 AM
the final binaries are generally not so bad, but `target` (with all the intermediate files) is even worse than `node_modules`
December 13, 2025 at 11:42 PM
cuda is done with some wsl2-specific magic passthrough device the runtime libs know how to use - do _not_ try to actually install any gpu drivers inside wsl (that will break it), but things that bundle their own cuda runtime like pytorch should just work out of the box

for gui stuff it kinda speaks wayland now
September 13, 2025 at 8:58 PM
never found a good way to make a disk visible to both windows and wsl and perform well from both - but one with a linux fs on it should be attachable to wsl like this learn.microsoft.com/en-us/window...
Get started mounting a Linux disk in WSL 2
Learn how to set up a disk mount in WSL 2 and how to access it.
learn.microsoft.com
September 13, 2025 at 8:50 PM
yeah wsl just sucks at this case
September 13, 2025 at 8:48 PM
the root fs on wsl2 should act just like a regular linux fs on a vm - because it is - but permissions _are_ pretty broken on wsl1 generally and when using the wsl2 9p mounts of windows drives
September 13, 2025 at 8:46 PM
wsl2 is much better and _almost_ the same as a vm - now mostly just need to remember that the windows fs mounts are not high iops/mmap-friendly (don't try to run stuff directly off them) and it doesn't run an actual init by default (but *can* be configured to run systemd)
September 13, 2025 at 8:38 PM
I would just look for "post training", "supervised fine-tuning" (human created example responses), "RLHF" (human rater score tuning) - "alignment" is a lot more related to "AI Safety" stuff, sometimes it means things like getting the models to reject bad requests and sometimes it means AI doomerism
September 9, 2025 at 4:43 PM
The base models (rarely released anymore) almost certainly could be, the "personality" comes from the post training - additional steps at the end with examples in the target style and a bit of tuning by human raters scoring outputs
September 9, 2025 at 1:00 PM
For anything new now we're using modal and having it write back to our own S3
August 21, 2025 at 6:31 PM
IME anything on GPU, even small non-LLM models, is hard to run cost-effectively if you have low or hard-to-predict utilization
August 21, 2025 at 6:29 PM
The first place I saw this was the GPT-4 technical report. arxiv.org/pdf/2303.08774 p.12
August 8, 2025 at 11:46 PM
One interesting result I've seen is that *base* models' (pure next token predictors) output probabilities match up pretty well with the likelihood of correctness and can kind of be interpreted as confidence scores, but after the post-training steps, especially RL, that stops working.
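a tiny sketch of the "read probability as confidence" idea (the logits are made up, not from any real model) - softmax the logits, take the top token's probability:

```python
import math

# Toy illustration: treat a base model's next-token distribution as a
# confidence score by taking the probability of its top choice.
# The logits below are invented for illustration.

def softmax(logits):
    m = max(logits)  # subtract max for numerical stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# hypothetical logits over four candidate answer tokens
logits = [4.0, 1.0, 0.5, 0.2]
probs = softmax(logits)
confidence = max(probs)  # ~0.9 here: the model strongly favors one answer
print(confidence)
```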
August 8, 2025 at 11:41 PM
for the most part you should just be able to take existing web/html apps into it unmodified but it also has some escape hatches to get at native stuff if you need
August 5, 2025 at 5:50 PM
Might be looking for something like Tauri v2.tauri.app
Tauri 2.0
The cross-platform app building toolkit
v2.tauri.app
August 5, 2025 at 5:48 PM
They do have the option of just using claude code at API rates, fully usage based - but nobody really likes that either because you have no idea how much it will spend on a task ahead of time (and if you max out the limits subs are *still* much cheaper than API rates)
July 29, 2025 at 2:49 PM
I like the open source vibecoding tools like Cline where you bring your own API keys better but paying the raw API prices can be rough, I use Cursor basically *for* the subsidy
July 9, 2025 at 2:45 AM
Nowadays, after training them on the whole Internet, they do a much shorter post training phase with chat transcripts (outsourced human workers write these, usually for pennies) to make them chatbots out of the box (e.g. ChatGPT), but even those kinds are still fundamentally text completion systems
July 9, 2025 at 1:45 AM
The actual math part of an LLM is barely a screen full of code. The behavior really is all in the training data selection and prompting.
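for a sense of how small, here's single-head scaled dot-product attention (the core op) in pure python - no batching, no learned weights, just the math:

```python
import math

# Pure-Python sketch of scaled dot-product attention (single head, no
# batching, no learned projections). Real implementations do exactly this,
# just with fast matrix libraries.

def softmax(xs):
    m = max(xs)
    es = [math.exp(x - m) for x in xs]
    s = sum(es)
    return [e / s for e in es]

def attention(Q, K, V):
    """Q, K, V: lists of equal-length vectors, one per token position."""
    d = len(K[0])
    out = []
    for q in Q:
        # similarity of this query to every key, scaled by sqrt(d)
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K]
        w = softmax(scores)
        # output is the attention-weighted average of the values
        out.append([sum(wi * v[j] for wi, v in zip(w, V))
                    for j in range(len(V[0]))])
    return out

# one query over two positions, 2-dim vectors
out = attention([[1.0, 0.0]],
                [[1.0, 0.0], [0.0, 1.0]],
                [[1.0, 2.0], [3.0, 4.0]])
print(out)
```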
July 9, 2025 at 1:12 AM
They know what it currently is; they don't know what it used to be
July 9, 2025 at 1:08 AM
All of the chatbot ones do. Before chatbots "LLM" referred to large text auto completion systems trained on the Internet (e.g. GPT2&3). Since those can reliably complete all sorts of text, it was figured out that you could make them into chat bots by just prompting them with enough chat transcript.
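a toy version of that trick (the transcript format here is invented, and `Assistant:` completion stands in for a real model call) - prepend enough chat transcript that the most plausible continuation is the assistant's next turn:

```python
# Toy illustration of turning a pure text-completion model into a chatbot:
# build a transcript-shaped prompt ending in "Assistant:" so the most likely
# completion is the assistant's reply. The format is made up for illustration.

def build_chat_prompt(history: list[tuple[str, str]], user_msg: str) -> str:
    lines = ["The following is a conversation with a helpful assistant.", ""]
    for speaker, text in history:
        lines.append(f"{speaker}: {text}")
    lines.append(f"User: {user_msg}")
    lines.append("Assistant:")  # the completion model picks up from here
    return "\n".join(lines)

prompt = build_chat_prompt([("User", "hi"), ("Assistant", "hello!")],
                           "what is 2+2?")
print(prompt)
```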
July 9, 2025 at 1:08 AM
Realistically though it's probably just as simple as this

bsky.app/profile/ceej...
it is kind of interesting that the language-based instruction of LLMs may be revealing certain truths hidden in the way we’ve communicated for decades, like how “politically incorrect” is interpreted to mean “openly racist”
Annnd @xai just deleted Musk's new prompt which told the Grok LLM to "not shy away from making claims which are politically incorrect, so long as they are well substantiated."

That prompt seemingly caused the LLM to become Nazi 4Chan, and has now been deleted, but other recent changes remain.
July 9, 2025 at 12:58 AM