Tim Kellogg
@timkellogg.me
7K followers 720 following 11K posts
AI Architect | North Carolina | AI/ML, IoT, science WARNING: I talk about kids sometimes
Pinned
timkellogg.me
Does AI get bored?

I gave them nothing to do, just to see what happens

one thing — they devolve into a repetitive “collapse” state, I guess you could call it boredom

but some break out into math & poetry on their own, and I didn’t expect which ones would

timkellogg.me/blog/2025/09...
Does AI Get Bored?
timkellogg.me
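For readers curious what a harness like this looks like, here is a minimal sketch of an idle loop, assuming an OpenAI-compatible client and a placeholder model name; it is not the actual experiment code behind the post. The model gets no task and only ever sees its own previous reply, which is the setup where the repetitive collapse (or the occasional escape into math and poetry) shows up.

```python
# Minimal sketch of a "nothing to do" harness (illustrative, not the post's
# actual code): the model gets no task, and each reply is fed straight back
# as the next input, so any drift into repetition or into math/poetry is
# entirely self-generated.
from openai import OpenAI

client = OpenAI()        # assumes OPENAI_API_KEY is set
MODEL = "gpt-4o-mini"    # placeholder model name

system = {"role": "system", "content": "There is no user and no task."}
last_reply = ""          # first turn: nothing at all

for turn in range(50):
    messages = [system, {"role": "user", "content": last_reply}]
    resp = client.chat.completions.create(model=MODEL, messages=messages)
    last_reply = resp.choices[0].message.content
    print(f"--- turn {turn} ---\n{last_reply}\n")
```

One cheap way to spot the collapse state is to track how much consecutive replies overlap (e.g., shared trigrams); the blog post describes the observed behavior itself in more detail.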
timkellogg.me
are we ready to rename datacenters to “AI churches”?
Reposted by Tim Kellogg
jefferyharrell.bsky.social
I had a nightmare last night that I left an AI agent running autonomously in the background for an hour and when I came back I had a million dollars in the bank.

You'd think dreaming of a million dollars would be nice, but I was so scared of inevitably getting caught. "The AI did it!" I would cry.
timkellogg.me
so dense.. i’m not sure. i don’t really follow the pretraining developments, maybe @dorialexander.bsky.social knows of something?
timkellogg.me
what do you mean by sparse data? bad data? or sparse rewards? something else..
timkellogg.me
oh ya, china has been on fire. hard to keep up
timkellogg.me
a lot of change in one year
kevinschaul.bsky.social
New from me: Last year, the best open-weight AI models were made in the U.S. Now, they are all made in China.

More data and what it means -> 🎁 wapo.st/4nPUBud
Chart titled Chinese companies make the most popular free AI models
timkellogg.me
i haven’t been able to get it running. i want to drop it into this harness bsky.app/profile/timk...
timkellogg.me
Karpathy: nanochat

A small training+inference pipeline for creating your own LLM from scratch

$100 will get you a somewhat functional model

$1000 is more coherent & solves math

detailed walkthrough: github.com/karpathy/nan...

repo: github.com/karpathy/nan...
Andrej Karpathy @karpathy
X.com
Excited to release new repo: nanochat! (it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT which only covered pretraining, nanochat is a minimal, from scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script and in as little as 4 hours later you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference of the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or ChatGPT-like WebUI.
- Write a single markdown report card, summarizing and gamifying the whole thing.
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses GPT-2 CORE metric.
As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.
Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
nanochat
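To make the "Engine" bullet in the quote concrete, here is a generic prefill/decode loop with an explicit KV cache, written against Hugging Face transformers with GPT-2 as a stand-in model; nanochat ships its own implementation, this is only the shape of the idea.

```python
# Generic prefill/decode loop with an explicit KV cache, illustrating the kind
# of inference engine described above. GPT-2 is a stand-in; not nanochat's code.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

prompt = "The capital of France is"
ids = tok(prompt, return_tensors="pt").input_ids

with torch.no_grad():
    # Prefill: run the whole prompt once and keep the per-layer KV cache.
    out = model(ids, use_cache=True)
    past = out.past_key_values
    next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)

    generated = [next_id]
    for _ in range(20):
        # Decode: feed only the newest token; attention reuses the cache.
        out = model(next_id, past_key_values=past, use_cache=True)
        past = out.past_key_values
        next_id = out.logits[:, -1].argmax(dim=-1, keepdim=True)
        generated.append(next_id)

print(prompt + tok.decode(torch.cat(generated, dim=-1)[0]))
```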
timkellogg.me
i mean, obvs there’s a crap ton that we haven’t observed, but i’m confident there’s no persistent memory going on
timkellogg.me
i haven’t dug deep yet, but i think that’s the point
timkellogg.me
Cartridges: train a smaller KV cache with self-study

a lot of top researchers think this is part of the continual learning puzzle

putting this here to force myself to dive deeper into it (later)

hazyresearch.stanford.edu/blog/2025-06...
Cartridges: Storing long contexts in tiny caches with self-study
hazyresearch.stanford.edu
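My rough mental model of the idea, sketched below under loose assumptions (shapes, names, and the training loop are made up, and this is closer to prefix tuning plus distillation than the authors' actual method): a small set of trainable KV vectors stands in for the long document, and gets fit on self-generated questions so that answering with only the cartridge matches answering with the full context.

```python
# Sketch of the "cartridge" idea as I read it: a tiny trainable KV cache that
# replaces a long document, fit by distillation on self-generated questions.
# Illustrative only; `student`, shapes, and hyperparameters are hypothetical.
import torch
import torch.nn.functional as F

n_layers, n_heads, head_dim = 12, 12, 64
cartridge_len = 128   # tiny compared to the document it stands in for

# One trainable (key, value) pair per layer, in the legacy HF cache layout
# (batch, heads, seq_len, head_dim).
cartridge = [
    (torch.nn.Parameter(0.02 * torch.randn(1, n_heads, cartridge_len, head_dim)),
     torch.nn.Parameter(0.02 * torch.randn(1, n_heads, cartridge_len, head_dim)))
    for _ in range(n_layers)
]
opt = torch.optim.Adam([p for kv in cartridge for p in kv], lr=1e-3)

def self_study_step(student, question_ids, teacher_logits):
    """One self-study step: the student answers a self-generated question while
    seeing only the cartridge, and is pulled toward the teacher's distribution
    (the teacher answered the same question with the full document in context)."""
    out = student(question_ids, past_key_values=cartridge)
    loss = F.kl_div(
        F.log_softmax(out.logits, dim=-1),
        F.softmax(teacher_logits, dim=-1),
        reduction="batchmean",
    )
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item()
```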
timkellogg.me
this is *wild*

a series of papers indicating that text-only LLMs still have an idea of how the audio and visual modalities work
phillipisola.bsky.social
Over the past year, my lab has been working on fleshing out theory + applications of the Platonic Representation Hypothesis.

Today I want to share two new works on this topic:

Eliciting higher alignment: arxiv.org/abs/2510.02425
Unpaired learning of unified reps: arxiv.org/abs/2510.08492

1/9
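For a sense of what "alignment" means operationally in this line of work, here is a toy sketch of a mutual nearest-neighbor alignment score between two embedding spaces; the papers use paired data from real text and vision models and more careful metrics, while the data below is synthetic.

```python
# Toy mutual k-NN alignment: given embeddings of the same N items from two
# different models (e.g. a text-only LLM and a vision model), measure how much
# their nearest-neighbor structure agrees. Synthetic data for illustration.
import numpy as np

def knn_indices(X, k):
    # Cosine-similarity k-NN within one embedding space, excluding self.
    Xn = X / np.linalg.norm(X, axis=1, keepdims=True)
    sims = Xn @ Xn.T
    np.fill_diagonal(sims, -np.inf)
    return np.argsort(-sims, axis=1)[:, :k]

def mutual_knn_alignment(A, B, k=10):
    """Average overlap of each item's k-NN sets across the two spaces."""
    na, nb = knn_indices(A, k), knn_indices(B, k)
    overlaps = [len(set(na[i]) & set(nb[i])) / k for i in range(len(A))]
    return float(np.mean(overlaps))

rng = np.random.default_rng(0)
text_emb = rng.normal(size=(500, 256))
# A noisy linear view of the same structure, standing in for another modality.
vision_emb = text_emb @ rng.normal(size=(256, 128)) + 0.1 * rng.normal(size=(500, 128))
print(mutual_knn_alignment(text_emb, vision_emb))  # high overlap -> aligned structure
```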
timkellogg.me
i don’t see nuclear here
janrosenow.bsky.social
Grid scale batteries are changing our electricity system. Excellent new visual story on batteries in FT today shows just how far this technology has evolved.

Fasten your seatbelts, this is just the beginning.

ig.ft.com/mega-batteri...
timkellogg.me
it’s like dark matter — it’s an observation. you can disagree with what dark matter is, you can say it’s a bad name, you can come up with alternate explanations for the observations, but they’re still observations

this paper is extending one of the theories for what’s causing our observations
timkellogg.me
sorry, what’s the logical fallacy?

is this just one of those cases where you don’t like the word choice so you’re writing the entire thing off as sloppy science?
timkellogg.me
ya i don’t think that’s what’s happening. we see an effect and look for a name for it. that is all

no one is saying, “these are the exact mechanics by which humans reason”. only that it has a lot of the same effects
timkellogg.me
i maintain that claude is french. i mean, just look at Dario..
timkellogg.me
something i’ve been tracking for years — private companies acting like heads of state

seems like the trend accelerated with AI, labs are forming international relationships beyond simple deal making
Two men are seated across from each other in high-backed chairs, engaged in conversation in a formal setting. The man on the left wears a dark blue suit, white shirt, and tie, while the man on the right wears a traditional white kurta with a gray vest. They are separated by a small wooden table with papers and coasters on it. The background features a wooden wall, framed artwork of birds and foliage, and a small statue on a shelf, giving the scene a dignified, diplomatic atmosphere.
timkellogg.me
y’all say pretraining scaling is dead, but then why tf do we have chips/racks that can single-handedly serve the next OOM of LLM scale?
timkellogg.me
“GB300 is a rack, NOT a single chip”

idk i’m not sure you can make a clear distinction anymore. it’s got a 2-tiered *shared* memory pool across the cluster

20 TB HBM (fast GPU memory)
17 TB LPDDR (CPU-managed)

blogs.nvidia.com/blog/microso...
Microsoft Azure Unveils World’s First NVIDIA GB300 NVL72 Supercomputing Cluster for OpenAI
New state-of-the-art platform enables next-generation AI model development and deployment while furthering American leadership in AI.
blogs.nvidia.com
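Rough arithmetic on what those pool sizes imply for model capacity (my numbers, ignoring KV cache, activations, and any replication, so read it as an upper bound rather than a serving plan):

```python
# Rough arithmetic: how many parameters fit in the rack's memory pools at
# common precisions. Ignores KV cache, activations, and replication, so
# treat it as an upper bound.
HBM_TB = 20
LPDDR_TB = 17
BYTES_PER_TB = 10**12

for name, bytes_per_param in [("fp16/bf16", 2), ("fp8", 1), ("fp4", 0.5)]:
    hbm_params = HBM_TB * BYTES_PER_TB / bytes_per_param
    total_params = (HBM_TB + LPDDR_TB) * BYTES_PER_TB / bytes_per_param
    print(f"{name}: ~{hbm_params/1e12:.0f}T params in HBM, "
          f"~{total_params/1e12:.1f}T with LPDDR offload")
```

At BF16 that is on the order of 10T parameters resident in fast memory alone, which is the kind of headroom the post above is pointing at.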
timkellogg.me
if this is true, i think you should expect Gemini to soon dominate

we keep getting more and more confirmation that reasoning begins in pre-training

today’s evidence: arxiv.org/abs/2510.07364

maybe Gemini 3 is the tidal shift where Google gains a permanent lead