Abhinav Upadhyay
@abhi9u.bsky.social
NetBSD Dev | Python Internals, AI, compilers, databases, & performance engineering | https://blog.codingconfessions.com/
Pinned
Abhinav Upadhyay
@abhi9u.bsky.social
· Jan 19
How do you fit a 250kB dictionary in 64kB of RAM and still perform fast lookups? For reference, even with gzip -9, you can't compress this file below 85kB.
In the 1970s, Douglas McIlroy faced this exact challenge while implementing the spell checker for Unix at AT&T.
System calls are the interface between user space and the kernel. They are needed for fundamental things like reading a file or making a network call. But they are also very expensive, because they need to do things like saving and restoring registers and switching the page table and stack.
September 21, 2025 at 12:47 PM
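A rough way to see this cost from user space is to time a cheap system call against calls that never enter the kernel. This minimal Python sketch makes two assumptions. First, it assumes Linux with glibc, where os.getppid() is expected to trap into the kernel on every call. Second, it assumes time.monotonic_ns() is usually answered from the vDSO without a mode switch. Every number also includes interpreter overhead, so treat the gap as an indication rather than a measurement of kernel entry alone.

```python
import os
import time


def ns_per_call(fn, n: int = 200_000) -> float:
    """Average wall-clock nanoseconds per call of fn over n iterations."""
    start = time.perf_counter_ns()
    for _ in range(n):
        fn()
    return (time.perf_counter_ns() - start) / n


def noop() -> int:
    """Baseline: a plain Python call that stays in user space."""
    return 0


if __name__ == "__main__":
    # Assumption: on Linux/glibc, getppid() is a real system call every time.
    # Assumption: monotonic_ns() is typically served by the vDSO (no kernel entry).
    for name, fn in [("noop", noop),
                     ("time.monotonic_ns (vDSO)", time.monotonic_ns),
                     ("os.getppid (syscall)", os.getppid)]:
        print(f"{name:28s} {ns_per_call(fn):8.1f} ns/call")
```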
Reposted by Abhinav Upadhyay
What Makes System Calls Expensive: A Linux Internals Deep Dive.
blog.codingconfessions.com/p/what-makes...
Another great post by @abhi9u.bsky.social. I learned a lot, including vDSO.
What Makes System Calls Expensive: A Linux Internals Deep Dive
An explanation of how Linux handles system calls on x86-64 and why they show up as expensive operations in performance profiles
blog.codingconfessions.com
September 16, 2025 at 8:01 PM
People talk about profiling to find bottlenecks, but no one explains how to fix them. Many devs shoot in the dark, and when an optimization backfires, they aren't sure how to proceed.
July 13, 2025 at 1:50 PM
Skipping computer architecture was my biggest student mistake. Hardware’s execution model dictates memory layout, binaries, compiler output, and runtimes. If you want to build systems, learn how the CPU works. I wrote an article on this:
blog.codingconfessions.com/p/seeing-the...
May 21, 2025 at 4:42 AM
Here’s a fun visual showing how len() finds the length of a list in Python.
The size is stored right inside the object, but len() takes a five-pointer detour, only to land back where it started.
This is why if not mylist is ~2x faster than len(mylist) == 0 for emptiness checks!
April 12, 2025 at 8:43 AM
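To check the emptiness claim on your own interpreter, you can time both forms with timeit. With `not data`, the interpreter tests the list's truthiness without a Python-level function call. With `len(data) == 0`, it also has to look up the builtin len, call it, and compare the result. The ~2x figure is the post's claim; the ratio you observe will depend on the Python version and the machine.

```python
import timeit


def compare_emptiness_checks(data: list, number: int = 1_000_000) -> tuple:
    """Return total seconds for `number` runs of each emptiness check on `data`."""
    env = {"data": data}
    # Truthiness test: no builtin lookup or function call at the Python level.
    t_not = timeit.timeit("not data", globals=env, number=number)
    # Builtin lookup of len, a call, then an integer comparison.
    t_len = timeit.timeit("len(data) == 0", globals=env, number=number)
    return t_not, t_len


if __name__ == "__main__":
    t_not, t_len = compare_emptiness_checks([])
    print(f"not data:       {t_not:.3f} s")
    print(f"len(data) == 0: {t_len:.3f} s")
    print(f"ratio: {t_len / t_not:.2f}x")
```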
When you see an expert optimize a piece of code, it may look like dark magic. In reality, these experts have a deep understanding of the entire computing stack, and when they look at an application profile, they quickly understand the bottlenecks.
March 28, 2025 at 8:48 AM
Reposted by Abhinav Upadhyay
How do you fit a 250kB dictionary in 64kB of RAM and still perform fast lookups? For reference, even with gzip -9, you can't compress this file below 85kB.
In the 1970s, Douglas McIlroy faced this exact challenge while implementing the spell checker for Unix at AT&T.
January 19, 2025 at 3:52 AM
Reposted by Abhinav Upadhyay
The second post is by @abhi9u.bsky.social on some common CPU architecture concepts (instruction pipelining, memory caching & speculative execution). Abhi has done a great job with the analogies in this one.
blog.codingconfessions.com/p/hardware-a...
Hardware-Aware Coding: CPU Architecture Concepts Every Developer Should Know
Write faster code by understanding how it flows through your CPU
blog.codingconfessions.com
March 24, 2025 at 1:28 PM
Reposted by Abhinav Upadhyay
Hardware-Aware Coding: CPU Architecture Concepts Every Developer Should Know
blog.codingconfessions.com/p/hardware-a...
Another great post by @abhi9u.bsky.social
Hardware-Aware Coding: CPU Architecture Concepts Every Developer Should Know
Write faster code by understanding how it flows through your CPU
blog.codingconfessions.com
March 21, 2025 at 4:35 PM
Every few years, Linux distributions decide that they need to make it more difficult for you to generate and find core dump files.
Now you need to install a systemd utility to do this.
February 17, 2025 at 3:32 PM
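Two things usually decide whether a crashing process leaves a core file: the process's RLIMIT_CORE and the kernel's core_pattern. When core_pattern starts with |, the kernel pipes the dump to a helper program instead of writing a file. On systemd-based distributions that helper is usually systemd-coredump, and coredumpctl is the tool for listing and extracting the stored dumps. The minimal Python sketch below does two things: it raises the soft limit for the current process and reports where dumps will go on Linux. An unprivileged process cannot raise the limit above the hard limit. The function names are hypothetical.

```python
import resource
from pathlib import Path
from typing import Optional, Tuple


def allow_core_dumps() -> Tuple[int, int]:
    """Raise the soft RLIMIT_CORE to the hard limit and return (soft, hard)."""
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # Unprivileged processes may raise the soft limit only up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


def core_pattern() -> Optional[str]:
    """Return Linux's kernel.core_pattern, or None on systems without it in /proc."""
    path = Path("/proc/sys/kernel/core_pattern")
    return path.read_text().strip() if path.exists() else None


if __name__ == "__main__":
    print("RLIMIT_CORE (soft, hard):", allow_core_dumps())
    pattern = core_pattern()
    if pattern is not None and pattern.startswith("|"):
        print("Dumps are piped to a helper program:", pattern)
    else:
        print("core_pattern:", pattern)
```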
Woke up to an email from Douglas McIlroy himself in response to my article on his work on Unix Spell. I am very grateful that he took the time to read and respond to it!
PS: I fixed the error he found in the article.
Article: blog.codingconfessions.com/p/how-unix-s...
February 5, 2025 at 2:03 PM
How do you fit a 250kB dictionary in 64kB of RAM and still perform fast lookups? For reference, even with gzip -9, you can't compress this file below 85kB.
In the 1970s, Douglas McIlroy faced this exact challenge while implementing the spell checker for Unix at AT&T.
January 19, 2025 at 3:52 AM
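The general idea behind squeezing a word list this far can be sketched in three steps. Hash every word to a fixed number of bits, sort the hashes, and store only the gaps between consecutive hashes. Those gaps are much smaller numbers than the hashes themselves, so a variable-length code such as a Rice code stores them compactly. This is a hedged illustration, not McIlroy's implementation. The 27-bit width, the blake2b stand-in hash, and all function names are hypothetical choices. Because different words can share a hash, a lookup can occasionally accept a misspelling. With n words and b-bit hashes, that happens with probability of roughly n / 2^b per query.

```python
import hashlib
from typing import List, Tuple

HASH_BITS = 27  # hypothetical hash width chosen for illustration


def word_hash(word: str, bits: int = HASH_BITS) -> int:
    """Map a word to a `bits`-wide integer (blake2b is only a stand-in hash)."""
    digest = hashlib.blake2b(word.encode(), digest_size=8).digest()
    return int.from_bytes(digest, "big") >> (64 - bits)


def build_gaps(words, bits: int = HASH_BITS) -> List[int]:
    """Hash, deduplicate and sort, then keep only the gaps between hashes."""
    hashes = sorted({word_hash(w, bits) for w in words})
    return [b - a for a, b in zip([0] + hashes, hashes)]


def rice_cost(gaps: List[int], k: int) -> int:
    """Bits needed if each gap is Rice-coded: unary quotient, stop bit, k-bit remainder."""
    return sum((g >> k) + 1 + k for g in gaps)


def best_rice_cost(gaps: List[int], bits: int = HASH_BITS) -> Tuple[int, int]:
    """Try every Rice parameter and return (total_bits, k) for the cheapest one."""
    return min((rice_cost(gaps, k), k) for k in range(bits))


def contains(gaps: List[int], word: str, bits: int = HASH_BITS) -> bool:
    """Rebuild hashes from the gaps on the fly and look for the word's hash."""
    target = word_hash(word, bits)
    current = 0
    for g in gaps:
        current += g
        if current == target:
            return True
        if current > target:  # hashes are sorted, so we can stop early
            return False
    return False
```

A practical version would store the coded bitstream plus a small index of checkpoints. A lookup would then decode only a short stretch instead of scanning from the start, as contains() does here.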
Some old papers are a joy to read because of their simplicity. This is the abstract from a 1966 paper describing a compression technique for infinite codes.
For static data, you know the probabilities of the symbols and can build a Huffman tree, but for an infinite stream, you can't do that.
January 5, 2025 at 8:38 AM
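A classic code for unbounded non-negative integers that needs no precomputed tree is the Golomb code. Its special case with a power-of-two divisor is the Rice code. The post doesn't name the paper's method, so treat this as an illustration of the problem rather than a summary of the paper. Each value is sent as a unary quotient, a stop bit and a k-bit remainder, so the encoder handles any integer without knowing the alphabet in advance. Bits are kept as a string for readability.

```python
from typing import List, Tuple


def rice_encode(n: int, k: int) -> str:
    """Rice-code a non-negative integer n (Golomb code with divisor M = 2**k)."""
    if n < 0:
        raise ValueError("n must be non-negative")
    q, r = n >> k, n & ((1 << k) - 1)        # quotient and remainder of n / 2**k
    unary = "1" * q + "0"                     # quotient in unary, terminated by 0
    return unary + (format(r, f"0{k}b") if k else "")  # remainder in k bits


def rice_decode(bits: str, k: int) -> Tuple[int, int]:
    """Decode one value from the front of bits; return (value, bits consumed)."""
    q = 0
    while bits[q] == "1":
        q += 1
    start = q + 1                             # skip the terminating 0
    r = int(bits[start:start + k], 2) if k else 0
    return (q << k) | r, start + k


def decode_stream(bits: str, k: int) -> List[int]:
    """Decode a concatenation of Rice codes; no symbol table is needed."""
    values, pos = [], 0
    while pos < len(bits):
        value, used = rice_decode(bits[pos:], k)
        values.append(value)
        pos += used
    return values
```

For example, rice_encode(9, 2) gives "11001": quotient 2 in unary ("110"), then remainder 1 in two bits ("01"). The parameter k trades off the two parts: a larger k spends more bits on the remainder and fewer on the unary quotient.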
Reposted by Abhinav Upadhyay
Back to school, time for Free Python Coding Resources for Schools, and by resources we mean our entire curriculum, all video lessons
Help us spread the word with school teachers everywhere
50+ hours of lessons covering all levels, beginner to advanced, designed specifically for school children
…👇🏻
January 2, 2025 at 1:28 PM
I noticed that my latest article is trending on HN! So it's worth sharing here as well.
It is the first in a series covering the internals of context switching in Linux. This one explains the core data structures for process and state management during context switching.
Linux Context Switching Internals: Part 1 - Process State and Memory
How does the Linux kernel represent processes and their state: A breakdown of task_struct and mm_struct
blog.codingconfessions.com
January 2, 2025 at 6:58 PM
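The kernel also keeps per-process counts of two kinds of context switch. A voluntary switch happens when the task blocks, for example on sleep or I/O. An involuntary switch happens when the task is preempted. getrusage() exposes these counts as ru_nvcsw and ru_nivcsw, which gives a quick user-space view of how often a process is switched out. The minimal Python sketch below uses hypothetical helper names, and the counts vary from run to run.

```python
import resource
import time
from typing import Callable, Tuple


def context_switches() -> Tuple[int, int]:
    """(voluntary, involuntary) context switches of this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_nvcsw, usage.ru_nivcsw


def switches_during(fn: Callable[[], object]) -> Tuple[int, int]:
    """Context switches that happened while fn() ran."""
    v0, i0 = context_switches()
    fn()
    v1, i1 = context_switches()
    return v1 - v0, i1 - i0


def sleepy() -> None:
    """Blocks repeatedly, so the scheduler should switch us out voluntarily."""
    for _ in range(20):
        time.sleep(0.001)


def busy() -> None:
    """CPU-bound loop: switches it suffers are mostly preemptions."""
    sum(i * i for i in range(2_000_000))


if __name__ == "__main__":
    print("sleepy (voluntary, involuntary):", switches_during(sleepy))
    print("busy   (voluntary, involuntary):", switches_during(busy))
```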
An article I wrote a while back was trending on HN today. It explains the impossibility theorem of clustering, according to which a perfect clustering algorithm is impossible to achieve.
December 27, 2024 at 4:09 PM
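The usual statement of the theorem (Kleinberg's) says that no clustering function can satisfy scale invariance, richness and consistency all at once. One small way to see a trade-off: single-linkage clustering that links any two points closer than a fixed distance r satisfies richness and consistency but not scale invariance. Multiplying every distance by a constant can change the clusters it produces. The sketch below uses hypothetical 1-D points to keep the distances obvious.

```python
from itertools import combinations
from typing import List


def threshold_clusters(points: List[float], r: float) -> List[List[int]]:
    """Single linkage with a distance cutoff: link points closer than r,
    return the connected components as sorted lists of point indices."""
    parent = list(range(len(points)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(points)), 2):
        if abs(points[i] - points[j]) < r:
            parent[find(i)] = find(j)      # merge the two components

    groups = {}
    for i in range(len(points)):
        groups.setdefault(find(i), []).append(i)
    return sorted(sorted(g) for g in groups.values())
```

With r = 2, the points [0, 1, 5, 6] form two clusters, [[0, 1], [2, 3]]. Scaling every coordinate by 3, to [0, 3, 15, 18], leaves every point on its own. The data has the same shape up to units, but the partition is different.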
You thought systems programmers don't need to know math? Linux kernel developers prove you wrong by contradiction—they prove corollaries in their comments.
December 26, 2024 at 9:17 AM
Reposted by Abhinav Upadhyay
What does it mean for something to be Turing complete?
I answer this question, and more, through a series of fully interactive Turing machine simulations! Play, pause, step forwards and backwards, and even write your own Turing machine programs in my latest blog post.
samwho.dev/turing-machi...
December 20, 2024 at 10:33 PM
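At its core, a Turing machine is just a tape, a head, a current state and a transition table. Here is a minimal simulator in Python. Its rule table is a hypothetical binary-increment program chosen for illustration, not one taken from the linked post.

```python
from typing import Dict, Tuple

Rules = Dict[Tuple[str, str], Tuple[str, str, str]]
BLANK = "_"

# Hypothetical program: add 1 to a binary number, head starting on the last digit.
INCREMENT: Rules = {
    ("carry", "1"): ("0", "L", "carry"),   # 1 + carry = 0, keep carrying left
    ("carry", "0"): ("1", "S", "halt"),    # 0 + carry = 1, done
    ("carry", BLANK): ("1", "S", "halt"),  # ran off the left edge: new leading 1
}


def run_tm(tape: str, head: int, state: str, rules: Rules,
           halt: str = "halt", max_steps: int = 10_000) -> str:
    """Run a one-tape Turing machine until it reaches the halt state."""
    cells = dict(enumerate(tape))  # sparse tape: position -> symbol
    steps = 0
    while state != halt:
        if steps == max_steps:
            raise RuntimeError("no halt within max_steps")
        symbol = cells.get(head, BLANK)
        write, move, state = rules[(state, symbol)]  # look up the transition
        cells[head] = write
        head += {"L": -1, "R": 1, "S": 0}[move]
        steps += 1
    lo, hi = min(cells), max(cells)
    return "".join(cells.get(i, BLANK) for i in range(lo, hi + 1)).strip(BLANK)
```

For example, run_tm("1011", 3, "carry", INCREMENT) returns "1100" (11 + 1 = 12).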
Continuing on the hardware concurrency thread, this paper is a must-read. It simplifies reasoning about concurrency on x86 by providing an abstract machine model.
December 20, 2024 at 4:24 PM
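The best-known abstract model of x86 concurrency is x86-TSO. In it, each hardware thread's stores go into a private FIFO store buffer before reaching shared memory, and a thread's loads check its own buffer first. The sketch below assumes that is the kind of model the paper describes and models only the store-buffer part, with no fences or locked instructions. It brute-forces every interleaving of the classic store-buffering litmus test, under both sequential consistency and TSO-style buffering.

```python
from typing import List, Tuple

# Each instruction is ("st", var, value) or ("ld", var, register).
Program = List[Tuple[str, str, object]]

# Store-buffering (SB) litmus test: each thread writes one variable, then reads the other.
SB: List[Program] = [
    [("st", "x", 1), ("ld", "y", "r1")],
    [("st", "y", 1), ("ld", "x", "r2")],
]


def outcomes(programs: List[Program], store_buffers: bool = True) -> set:
    """Enumerate final register values over every interleaving.

    store_buffers=False: sequential consistency (stores hit memory at once).
    store_buffers=True: stores enter a per-thread FIFO buffer, loads check the
    thread's own buffer first, and the oldest buffered store may drain anytime.
    """
    results = set()
    n = len(programs)

    def explore(pcs, mem, bufs, regs):
        finished = True
        for t, prog in enumerate(programs):
            if pcs[t] < len(prog):  # step: execute thread t's next instruction
                finished = False
                op, var, arg = prog[pcs[t]]
                nxt = pcs[:t] + (pcs[t] + 1,) + pcs[t + 1:]
                if op == "st" and store_buffers:
                    new_buf = bufs[t] + ((var, arg),)
                    explore(nxt, mem, bufs[:t] + (new_buf,) + bufs[t + 1:], regs)
                elif op == "st":
                    explore(nxt, {**mem, var: arg}, bufs, regs)
                else:
                    value = mem.get(var, 0)
                    for bvar, bval in bufs[t]:  # newest matching buffered store wins
                        if bvar == var:
                            value = bval
                    explore(nxt, mem, bufs, {**regs, arg: value})
            if bufs[t]:  # step: drain thread t's oldest buffered store to memory
                finished = False
                (var, val), rest = bufs[t][0], bufs[t][1:]
                explore(pcs, {**mem, var: val}, bufs[:t] + (rest,) + bufs[t + 1:], regs)
        if finished:
            results.add(tuple(sorted(regs.items())))

    explore((0,) * n, {}, ((),) * n, {})
    return results
```

Under sequential consistency, the outcome r1 == r2 == 0 never appears. With store buffers it does, because both loads can run before either buffered store drains to memory.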
What every systems programmer should know about concurrency: a very dense but impactful read. Things like lock-free and wait-free synchronisation techniques will stop seeming like a black box once you understand this.
PDF: assets.bitbashing.io/papers/concu...
December 8, 2024 at 5:36 PM
Ulrich Drepper is a rockstar. I keep running into his papers. This one is on the implementation of thread-local storage:
PDF: www.akkadia.org/drepper/tls....
December 7, 2024 at 2:01 PM
Reposted by Abhinav Upadhyay
Sometimes it takes me 22 years (+ one evening) to write a blog post. Here are my thoughts on "homoiconicity" and, as an alternative, "bicameral syntax". (Warning: 4000 words.)
parentheticallyspeaking.org/articles/bic...
Bicameral, Not Homoiconic
Parenthetically Speaking: Articles by Shriram Krishnamurthi
parentheticallyspeaking.org
December 2, 2024 at 3:19 AM
When you first learn about the fork() syscall, it can seem magical. How can a single system call produce two different return values at the same time?!
In my latest article, I demystify the hidden magic of fork and also show how it is implemented in Linux.
blog.codingconfessions.com/p/the-magic-...
Disillusioning the Magic of the fork System Call
How the kernels implement the fork system call
blog.codingconfessions.com
November 27, 2024 at 11:37 AM
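The "two return values" come from two processes, not one: after fork(), the call returns once in the parent (with the child's PID) and once in the child (with 0). The minimal POSIX-only Python sketch below uses os.fork and a pipe so the child can report what it saw. The function name is hypothetical.

```python
import os
from typing import Tuple


def fork_demo() -> Tuple[int, int, int]:
    """Fork once; return (fork() value in the parent, fork() value in the child,
    the child's own PID as reported by the child). POSIX only."""
    read_fd, write_fd = os.pipe()
    ret = os.fork()
    if ret == 0:
        # Child process: fork() returned 0 here.
        os.close(read_fd)
        os.write(write_fd, f"{ret} {os.getpid()}".encode())
        os._exit(0)  # exit immediately so the child never runs the parent's code
    # Parent process: fork() returned the child's PID here.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_ret, child_pid = map(int, pipe.read().split())
    os.waitpid(ret, 0)  # reap the child to avoid a zombie
    return ret, child_ret, child_pid


if __name__ == "__main__":
    parent_ret, child_ret, child_pid = fork_demo()
    print(f"parent got {parent_ret}, child got {child_ret}, child's PID is {child_pid}")
```

The parent's return value always equals the PID the child reports for itself, which shows the same call returned different values in two address spaces.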
Visualizing transformers and attention | Talk for TNG Big Tech Day '24
YouTube video by Grant Sanderson
youtu.be
November 26, 2024 at 4:13 AM
Hacking on some code while Cooper is occupied watching a movie.
November 21, 2024 at 4:31 PM