Damien Tournoud
damientournoud.bsky.social
That's a lot of data movement...
January 21, 2026 at 10:03 PM
Long story short, update to Golang 1.25.6 or 1.24.12.
January 20, 2026 at 9:44 PM
In my testing, with just 2 reqs/s (concurrency of 10) you can easily get the server to allocate 8 GB+ and consume 15 cores of CPU.

This is even worse when the server supports compressed request bodies (but my sense is that it is uncommon), for example with the Decompress middleware in Echo.
January 20, 2026 at 9:44 PM
This results in the allocation of a map with 2.1 million entries, and the corresponding 2.1 million short key strings. In total, 416 MB of allocation.
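A back-of-the-envelope check of the entry count, assuming 4-byte keys joined by single `&` separators. The key length is a guess that happens to match the figure above; the thread does not state it.

```go
package main

import "fmt"

// maxKeys returns how many keys of keyLen bytes, joined by single '&'
// separators, fit in a body of at most limit bytes.
// n keys need n*keyLen + (n-1) bytes, so n <= (limit+1)/(keyLen+1).
func maxKeys(limit, keyLen int) int {
	return (limit + 1) / (keyLen + 1)
}

func main() {
	const limit = 10 << 20 // the 10 MB default cap mentioned in the thread
	const keyLen = 4       // assumption: not stated in the thread
	fmt.Println(maxKeys(limit, keyLen)) // prints 2097152
}
```

At about 2.1 million entries, 416 MB of total allocation works out to roughly 200 bytes allocated per 5 bytes of input.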
January 20, 2026 at 9:44 PM
This is a classic memory amplification issue: while Request.ParseForm refuses by default to parse request bodies bigger than 10 MB, even that can result in significant memory usage for specially crafted inputs.

In this case, the input is URL-encoded form data with small keys and no values.
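A minimal sketch of the input shape, using `url.ParseQuery` as a stand-in for the parsing that form handling performs. The helper names `craftBody` and `countParsedKeys` are invented for illustration, and the key count is kept small on purpose. Newer Go releases may reject inputs with very many parameters, so the sketch reports a parse error rather than assuming success.

```go
package main

import (
	"fmt"
	"net/url"
	"strconv"
	"strings"
)

// craftBody builds a URL-encoded body of n distinct short keys with no
// values, e.g. "0&1&2". Keys are base-36 counters (hypothetical choice).
func craftBody(n int) string {
	var b strings.Builder
	for i := 0; i < n; i++ {
		if i > 0 {
			b.WriteByte('&')
		}
		b.WriteString(strconv.FormatInt(int64(i), 36))
	}
	return b.String()
}

// countParsedKeys parses body as a query string and returns the number of
// map entries created, plus any parse error.
func countParsedKeys(body string) (int, error) {
	values, err := url.ParseQuery(body)
	return len(values), err
}

func main() {
	body := craftBody(5000)
	n, err := countParsedKeys(body)
	if err != nil {
		fmt.Println("parse rejected:", err)
		return
	}
	fmt.Printf("%d bytes of input -> %d map entries\n", len(body), n)
}
```

Every key costs the attacker only a few bytes but forces a map entry, a key string, and a value slice on the server.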
January 20, 2026 at 9:44 PM
Reposted by Damien Tournoud
Bluesky is definitely a place for science, research and citing papers. We hope it will continue to close the gap as rapidly as it has with legacy social media. Research, science and dank memes need many homes on the internet.

Bluesky provides one of the more inviting ones.

Happy New Year.
Alt: Jeffie (ok Jessie) Pinkman from Breaking Bad wearing a beanie says " yeah science" and points. Hey did anyone watch Pluribus? Sick show, we Stan Vince Gilligan. I bet his middle name is Jeff.
January 8, 2026 at 1:08 PM
Very interesting. If listRepos is not fundamentally a Relay API, you could have collectiondir serve that API in addition to listReposByCollection.

It is downstream of a relay, and consumes listHosts (it doesn't right now, but it should) and the firehose to index repositories and collections.
January 9, 2026 at 1:27 AM
Some thoughts for discussion on how we could evolve the ATProto repository persistence into an efficient log-structured storage format ⬇️
I have been wondering what it would take to use Atproto repositories as a more general database.

I think we could make them easier to query and update, without sacrificing their cryptographic properties. Another #atproto #atdev, with early food for thought 🧵
January 8, 2026 at 5:08 PM
A follow-up to this ⬇️
A follow-up on this 14 million repositories number, since I received a few questions about it:

This is the number of repositories visible to the new relays that have been in production since May. An #atproto #atdev 🧵
5/ The total size on disk of 14 million repositories ended up at around 2 TB.

But why 14 million repositories while Bluesky supposedly has 41 million users? Well, the repositories enumerated by tap are the ones visible to the new relay implementation that has only been live since May.
January 8, 2026 at 5:00 PM
10/ If you generate a manifest of all these files, you have a map of which file might contain which range of keys.

In addition, splitting the repository into smaller files should reduce write amplification during storage compaction, the process of merging updates together.
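A minimal sketch of such a manifest, assuming each file records the first and last key it covers and that ranges from different files may overlap (for example a base file and later updates). The type `FileRange`, the function `candidates`, and the file and key names are illustrative only.

```go
package main

import "fmt"

// FileRange is one manifest line: a file and the inclusive key range it may
// contain (hypothetical layout).
type FileRange struct {
	File     string
	FirstKey string
	LastKey  string
}

// candidates returns, in manifest order, the files whose range covers key.
// A hit only means the file might contain the key, not that it does.
func candidates(manifest []FileRange, key string) []string {
	var files []string
	for _, fr := range manifest {
		if fr.FirstKey <= key && key <= fr.LastKey {
			files = append(files, fr.File)
		}
	}
	return files
}

func main() {
	manifest := []FileRange{
		{File: "base-0", FirstKey: "app.bsky.feed.like/a", LastKey: "app.bsky.feed.like/z"},
		{File: "base-1", FirstKey: "app.bsky.feed.post/a", LastKey: "app.bsky.feed.post/z"},
		{File: "update-7", FirstKey: "app.bsky.feed.post/x", LastKey: "app.bsky.feed.post/z"},
	}
	fmt.Println(candidates(manifest, "app.bsky.feed.post/y"))
}
```

A reader only needs to open the files returned here, and compaction only needs to rewrite the files whose ranges overlap an update.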
January 6, 2026 at 10:36 PM
9/ Given that we expect updates to usually only affect part of the tree (e.g. because records are always added to the end of collections), you can split the repository by sub-trees, identifying the full sub-tree by the CID of its top node.
January 6, 2026 at 10:36 PM
8/ The format is easily mergeable: given a base repository and a series of updates, you can merge as you go, starting from the root and progressively finding the correct blocks in the version they have been defined in.

Skip non-matching blocks until you find one that is at or past the current key.
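A heavily simplified sketch of the merge idea, assuming each version has been flattened to a key-sorted run of entries. It ignores the tree blocks and deletions entirely, and the `KV` type and `merge` function are illustrative names.

```go
package main

import "fmt"

// KV is one record in a key-sorted run (hypothetical flat form of a version).
type KV struct {
	Key   string
	Value string
}

// merge combines a sorted base run with a sorted update run in one pass.
// On equal keys, the update wins.
func merge(base, update []KV) []KV {
	out := make([]KV, 0, len(base)+len(update))
	i, j := 0, 0
	for i < len(base) && j < len(update) {
		switch {
		case base[i].Key < update[j].Key:
			out = append(out, base[i])
			i++
		case base[i].Key > update[j].Key:
			out = append(out, update[j])
			j++
		default: // same key: keep the newer value and skip the old one
			out = append(out, update[j])
			i++
			j++
		}
	}
	out = append(out, base[i:]...)
	out = append(out, update[j:]...)
	return out
}

func main() {
	base := []KV{{"a", "1"}, {"c", "3"}}
	update := []KV{{"b", "2"}, {"c", "9"}}
	fmt.Println(merge(base, update))
}
```

Because both inputs are read strictly front to back, the same loop works on streams that do not fit in memory.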
January 6, 2026 at 10:36 PM
7/ What would that look like?

Output a node. Then for each entry referenced by this node, output either nothing (if the entry has not been modified in that update), or recursively another node, or a value.

We can keep the format sparse by indexing the entries in the serialization.
January 6, 2026 at 10:36 PM
6/ If we put the values right after the nodes that reference them (at the cost of potentially duplicating some values), we end up with a format that can be streamed.

By ordering updates the same way, we also have an easily mergeable file format. And the beginning of a log-structured storage.
January 6, 2026 at 10:36 PM
5/ That's not the only choice you can make. As noted in the Sync 1.1 release notes, if the repository was ordered in preorder depth-first search order, a reader implementation could both validate the cryptographic properties and iterate the keys in order.
January 6, 2026 at 10:36 PM
4/ The reference implementation of the PDS optimizes writes at the expense of reads: it stores each repository block as a separate file on the filesystem.

That makes writes relatively cheap, but scatters the repository into a bunch of files without a natural order, making reads expensive.
atproto/packages/pds/src/disk-blobstore.ts at 8ffbaccc68d6b772fed62665fb66df632920effd · bluesky-social/atproto
github.com
January 6, 2026 at 10:36 PM
3/ Like in any storage format, there is a trade-off involved here, between the effort required to read/query and the effort required to write/update: making reads more efficient usually requires spending more effort on writes.
January 6, 2026 at 10:36 PM
2/ The Merkle Search Tree structure of Atproto repositories works really well as an in-memory structure, and allows updates to be propagated efficiently and in a cryptographically provable way.

That's great! Unfortunately, their structure does not make them natively easy to query or update.
Repository - AT Protocol
Self-authenticating storage for public account content
atproto.com
January 6, 2026 at 10:36 PM
Atproto-over-meshtastic maybe?
January 5, 2026 at 11:20 PM
Almost, but not quite.

The linked thread is discussing how the tap synchronization tool only has visibility on repositories active since May 2025 (not 2024). This is not a general issue with the network.
January 5, 2026 at 9:51 PM