Marco
@mcognetta.bsky.social
2.3K followers 1.2K following 650 posts
Language and keyboard stuff at Google + PhD student at Tokyo Institute of Technology. I like computers and Korean and computers-and-Korean and high school CS education. Georgia Tech → 연세대학교 → 東京工業大学. https://theoreticallygoodwithcomputers.com/
Pinned
mcognetta.bsky.social
A lot of you followed me due to #NLP, but I like to post about #chess (especially computer chess), #programming (especially puzzles, code golf, etc), and machine learning.

And some less technical stuff like #Korean, #Esperanto, and #trains (mostly in Japan, just due to proximity).
Reposted by Marco
sethkarten.ai
Pokémon is truly the Pareto frontier of agent research
- The RPG requires an autonomous, embodied agent with perception, planning, memory, and control
- VGC and Gen 9 OU penalize erroneous actions with fast-paced opponent modeling in short games
(1/3)
Reposted by Marco
mcognetta.bsky.social
I gave an invited tutorial on tokenization and formal language theory at DLT2025 in Seoul this week.

I started my grad school career in Korea doing automata theory so it was really nice to be back and talking about that again.

The slides/reading list are here 👇🏻
GitHub - mcognetta/subword_tokenization_meets_formal_language_theory: A repo with slides and reading list for Subword Tokenization Meets Formal Language Theory @ DLT2025.
github.com
mcognetta.bsky.social
I've never been to COLM unfortunately, but poster sessions are by far the best part of conferences to me as a presenter & attendee.

I think talks are cool for:
- small conferences where you just want everyone to see it at once and then discuss
- a very small set of highlighted papers just for clout
Reposted by Marco
simonwillison.net
nanochat by Andrej Karpathy is neat - 8,000 lines of code (mostly Python, a tiny bit of Rust) that can train an LLM on $100 of rented cloud compute which can then be served with a web chat UI on a much smaller machine simonwillison.net/2025/Oct/13/...
nanochat
Really interesting new project from Andrej Karpathy, described at length in this discussion post. It provides a full ChatGPT-style LLM, including training, inference, and a web UI, that can be …
simonwillison.net
Reposted by Marco
timkellogg.me
Karpathy: nanochat

A small training+inference pipeline for creating your own LLM from scratch

$100 will get you a somewhat functional model

$1000 is more coherent & solves math

detailed walkthrough: github.com/karpathy/nan...

repo: github.com/karpathy/nan...
Andrej Karpathy @karpathy
X.com
Excited to release new repo: nanochat! (it's among the most unhinged I've written).
Unlike my earlier similar repo nanoGPT, which only covered pretraining, nanochat is a minimal, from-scratch, full-stack training/inference pipeline of a simple ChatGPT clone in a single, dependency-minimal codebase. You boot up a cloud GPU box, run a single script, and in as little as 4 hours you can talk to your own LLM in a ChatGPT-like web UI.
It weighs ~8,000 lines of imo quite clean code to:
- Train the tokenizer using a new Rust implementation
- Pretrain a Transformer LLM on FineWeb, evaluate CORE score across a number of metrics
- Midtrain on user-assistant conversations from SmolTalk, multiple choice questions, tool use.
- SFT, evaluate the chat model on world knowledge multiple choice (ARC-E/C, MMLU), math (GSM8K), code (HumanEval)
- RL the model optionally on GSM8K with "GRPO"
- Efficient inference of the model in an Engine with KV cache, simple prefill/decode, tool use (Python interpreter in a lightweight sandbox); talk to it over CLI or ChatGPT-like web UI.
- Write a single markdown report card, summarizing and gamifying the whole thing.
Even for as low as ~$100 in cost (~4 hours on an 8XH100 node), you can train a little ChatGPT clone that you can kind of talk to, and which can write stories/poems, answer simple questions. About ~12 hours surpasses the GPT-2 CORE metric.
As you further scale up towards ~$1000 (~41.6 hours of training), it quickly becomes a lot more coherent and can solve simple math/code problems and take multiple choice tests. E.g. a depth 30 model trained for 24 hours (this is about equal to the FLOPs of GPT-3 Small 125M and 1/1000th of GPT-3) gets into the 40s on MMLU and 70s on ARC-Easy, 20s on GSM8K, etc.
My goal is to get the full "strong baseline" stack into one cohesive, minimal, readable, hackable, maximally forkable repo. nanochat will be the capstone project of LLM101n (which is still being developed). I think it also has potential to grow into a research harness, or a benchmark, similar to nanoGPT before it. It is by no means finished, tuned, or optimized (actually I think there's likely quite a bit of low-hanging fruit), but I think it's at a place where the overall skeleton is ok enough that it can go up on GitHub where all the parts of it can be improved.
Link to repo and a detailed walkthrough of the nanochat speedrun is in the reply.
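(A sketch for intuition on the first pipeline step above: nanochat trains its tokenizer with a Rust implementation, but the underlying BPE-style merge-learning loop can be illustrated in a few lines of Python. The toy corpus, merge count, and `</w>` end-of-word marker here are illustrative assumptions, not nanochat's actual code.)

```python
from collections import Counter

def train_bpe(corpus: str, num_merges: int):
    """Learn BPE-style merges over a toy corpus."""
    # Words start as character tuples with an end-of-word marker.
    words = Counter(tuple(w) + ("</w>",) for w in corpus.split())
    merges = []
    for _ in range(num_merges):
        # Count adjacent symbol pairs, weighted by word frequency.
        pairs = Counter()
        for word, freq in words.items():
            for a, b in zip(word, word[1:]):
                pairs[(a, b)] += freq
        if not pairs:
            break
        best = max(pairs, key=pairs.get)
        merges.append(best)
        # Apply the winning merge everywhere it occurs.
        merged = {}
        for word, freq in words.items():
            out, i = [], 0
            while i < len(word):
                if i + 1 < len(word) and (word[i], word[i + 1]) == best:
                    out.append(word[i] + word[i + 1])
                    i += 2
                else:
                    out.append(word[i])
                    i += 1
            merged[tuple(out)] = merged.get(tuple(out), 0) + freq
        words = merged
    return merges

merges = train_bpe("low low low lower lowest", 3)
print(merges)  # first merges join the most frequent adjacent pairs
```

On this toy corpus the learner first merges `l`+`o`, then `lo`+`w`, since "low" dominates; real trainers do the same loop over gigabytes of text, which is why a fast Rust implementation matters.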
nanochat
mcognetta.bsky.social
I think this is a long shot ofc, but it's a pretty funny and interesting idea.

Also a good example of nominative determinism (doubly so if you treat LLM-generated text as "silver" data rather than "gold" data).

open.substack.com/pub/astralco...
ACX Grants Results 2025
open.substack.com
mcognetta.bsky.social
From the Astral Codex Ten Grants.

TLDR: books remain a primary training source for LLMs. A lot of books that feature AI have it as something bad or dangerous or harmful to humanity, which might bias models to be this way. What if we flooded the corpus with examples of good AI?
Reposted by Marco
phillipisola.bsky.social
In “Words That Make Language Models Perceive,” we find that if you ask an LLM to “imagine seeing,” how it processes text becomes more like how a vision system would represent that same scene.

If you ask it to “imagine hearing,” its representation becomes more like that of an auditory model.

3/9
Diagram showing how prompts can steer an LLM toward kernel structure that better matches that of sensory encoders.
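(For intuition on "matching kernel structure": representation-similarity work in this area often compares two models' embeddings of the same stimuli with a measure like linear CKA, centered kernel alignment. This is a rough, self-contained sketch with random stand-in matrices, not the paper's exact metric.)

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation
    matrices whose rows are the same n stimuli (feature dims may differ)."""
    X = X - X.mean(axis=0)  # center each feature column
    Y = Y - Y.mean(axis=0)
    num = np.linalg.norm(Y.T @ X, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

rng = np.random.default_rng(0)
text_reps = rng.standard_normal((20, 8))   # stand-in: LLM states for 20 captions
other_reps = rng.standard_normal((20, 5))  # stand-in: an unrelated encoder
print(linear_cka(text_reps, text_reps))    # identical geometry -> 1.0
print(linear_cka(text_reps, other_reps))   # unrelated geometry -> lower
```

CKA is invariant to rotations and isotropic scaling of either representation, which is why it is a common choice for asking "do these two models organize the same inputs the same way?" regardless of embedding dimension.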
Reposted by Marco
weirdbaking.uk
God I love Google Japan, I love that they open-source this dumb stuff so if you wanted to you could make it youtu.be/BgdWyD0cBx4?...
Gboard ダイヤルバージョン / Gboard Dial Version
YouTube video by Google Japan
youtu.be
mcognetta.bsky.social
But not all is bad: the addition of the 5th review brought the total length of the reviews on this paper to just above the average length of the reviews that I wrote for other papers.

/s
mcognetta.bsky.social
I'm crashing out y'all.
mcognetta.bsky.social
Our #AAAI2026 paper just got a last-minute review added to it (2 days before the rebuttal deadline and after we had already written our rebuttal) that is also incredibly low quality.

Unclear how we are expected to provide rebuttals in situations like this with only 2500 characters total.

#ML
mcognetta.bsky.social
The total length of all reviews on my AAAI paper is less than the length of any single review I wrote for my batch.

It's really great out here y'all.
Reposted by Marco
pydata.bsky.social
☕ The Pacific Northwest isn’t just known for coffee! Come connect, learn, and get inspired over three days of talks and tutorials at PyData Seattle 2025, Nov. 7-9 at Bellevue College. Be sure to check out the brand new schedule!! pydata.org/seattle2025
Reposted by Marco
romanlutz.bsky.social
I’ll be at PyData Seattle talking about AI red teaming with PyRIT. Looking forward to chatting about everything Python and/or AI safety & security!
mcognetta.bsky.social
Forget kei truck coffee stands, I want a Jimny coffee stand.

www.instagram.com/p/DPqU7VyCD_N
mcognetta.bsky.social
This is so sick.

Next up, Toki Pona - Rust hybrid.
malper.bsky.social
The number of languages in the world just got a lot higher! At least constructed ones.
Meet ConlangCrafter - a pipeline for creating novel languages with LLMs.
A Japanese-Esperanto creole? An alien cephalopod color-based language?
Enter your idea and see a conlang emerge. 🧵👇
mcognetta.bsky.social
Have you considered using the monitor as a desk?

On a more serious note, this is a real recurring problem I have with desks that are within my budget. If I had 5k for a rock solid desk with no wobble, I'd do it in a heartbeat.
mcognetta.bsky.social
Rate my setup.

(It's all coming together)
mcognetta.bsky.social
It actually did the misreading thing twice and both were really cut-and-dried wrong. Like it said (paraphrasing) "you wrote <this mathematical statement> on line X" and then we looked at line X and it's almost what we wrote, but with a small typo that changes the meaning completely.
mcognetta.bsky.social
The 6th was a notational thing that we overlooked in some pseudocode but is present in our actual experiments and in the main body of the paper. Easy 2 character fix in LaTeX.
mcognetta.bsky.social
No matter what the reviewer guidelines say, I am sure this will negatively impact our paper since we can't really rebut it (both by the rebuttal guidelines and the rebuttal length constraints).
mcognetta.bsky.social
The AI review for my AAAI paper made 6 concrete technical claims in the weakness section. 5 of them are 100% verifiably wrong.

Ranging from as bad as literally misreading the text (evidenced by it copy-pasting the text in the review but with a typo) to fundamentally misunderstanding an algorithm.