Lightnews — Scholar-powered news

Damien Tournoud

@damientournoud.bsky.social

That's a lot of data movement...

January 21, 2026 at 10:03 PM

Damien Tournoud

@damientournoud.bsky.social

Long story short, update to Golang 1.25.6 or 1.24.12.

January 20, 2026 at 9:44 PM
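You can check whether a running service already has the fix by comparing runtime.Version() against the two releases named above. This is a minimal sketch with several assumptions. isPatched is a hypothetical helper. It expects version strings shaped like go1.MINOR.PATCH. It also assumes, although the post does not say so, that minor lines newer than 1.25 already include the fix and that lines older than 1.24 do not.

```go
package main

import (
	"fmt"
	"runtime"
	"strconv"
	"strings"
)

// patchedMin maps a Go minor line to the first patch release the post
// names as fixed: 1.24.12 and 1.25.6.
var patchedMin = map[int]int{24: 12, 25: 6}

// isPatched reports whether a version string such as "go1.25.6" is at or
// above the patched release for its minor line. Assumption: lines newer
// than 1.25 count as patched; older or unparsable versions (including
// development builds) count as unpatched.
func isPatched(v string) bool {
	rest, ok := strings.CutPrefix(v, "go1.")
	if !ok {
		return false
	}
	parts := strings.SplitN(rest, ".", 2)
	minor, err := strconv.Atoi(parts[0])
	if err != nil {
		return false
	}
	if minor > 25 {
		return true
	}
	minPatch, known := patchedMin[minor]
	if !known || len(parts) < 2 {
		return false // older lines, or a version without a patch component
	}
	patch, err := strconv.Atoi(parts[1])
	if err != nil {
		return false
	}
	return patch >= minPatch
}

func main() {
	v := runtime.Version()
	fmt.Printf("%s patched: %v\n", v, isPatched(v))
}
```

For an already-built binary, `go version ./server` reports the toolchain it was built with, so you don't need to run it.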

Damien Tournoud

@damientournoud.bsky.social

In my testing, with just 2 reqs/s (concurrency of 10) you can easily get the server to allocate 8 GB+ and consume 15 cores of CPU.

This is even worse when the server accepts compressed request bodies (which I suspect is uncommon), for example via the Decompress middleware in Echo.

January 20, 2026 at 9:44 PM
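Compressed bodies make this worse mainly because of bandwidth. The attacker only sends the compressed bytes. If the middleware swaps the request body for a decompressing reader, the 10 MB parsing limit is presumably counted in decompressed bytes. The sketch below uses only compress/gzip to compare the raw and compressed sizes of a body of value-less keys. The base-36 counter keys are an arbitrary stand-in, not the payload from the thread, and the ratio you see will depend on the keys chosen.

```go
package main

import (
	"bytes"
	"compress/gzip"
	"fmt"
	"strconv"
	"strings"
)

// valuelessKeys builds "0&1&2&..." with n distinct keys written in base 36.
func valuelessKeys(n int) string {
	keys := make([]string, n)
	for i := range keys {
		keys[i] = strconv.FormatInt(int64(i), 36)
	}
	return strings.Join(keys, "&")
}

// gzipSize returns the size of s after gzip compression at the best level.
func gzipSize(s string) (int, error) {
	var buf bytes.Buffer
	zw, err := gzip.NewWriterLevel(&buf, gzip.BestCompression)
	if err != nil {
		return 0, err
	}
	if _, err := zw.Write([]byte(s)); err != nil {
		return 0, err
	}
	if err := zw.Close(); err != nil {
		return 0, err
	}
	return buf.Len(), nil
}

func main() {
	body := valuelessKeys(2_000_000) // roughly 10 MB of form data
	n, err := gzipSize(body)
	if err != nil {
		panic(err)
	}
	fmt.Printf("raw: %d bytes, gzip: %d bytes (%.1fx smaller)\n",
		len(body), n, float64(len(body))/float64(n))
}
```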

Damien Tournoud

@damientournoud.bsky.social

This results in the allocation of a map with 2.1 million entries, and the corresponding 2.1 million short key strings. In total, 416 MB of allocation.

Parsing a 10 MB body results in 416 MB of allocation over 2.1 million allocations:
$ go test -v -bench=. -benchmem .
BenchmarkMemory-20    1    1126808685 ns/op    437066424 B/op    2142306 allocs/op

January 20, 2026 at 9:44 PM

Damien Tournoud

@damientournoud.bsky.social

This is a classic memory amplification issue. By default, Request.ParseForm refuses to parse request bodies larger than 10 MB, but even a body under that limit can cause significant memory usage if it is specially crafted.

In this case, the input is URL-encoded form data with short keys and no values.

Example application/x-www-form-urlencoded body:
a&b&c&d&e&f&g&h&i&j&k&l&m&n&o&p&q&r&s&t&u&v&w&x&y&z&A&B&C&D&E&F&G&H&I&J&K&L&M&N&O&P&Q&R&S&T&U&V&W&X&Y&Z
&ab&bb&cb&db&eb&fb&gb&hb&ib&jb&kb&lb&mb&nb&ob&pb&qb&rb&sb&tb&ub&vb&wb&xb&yb&zb&Ab&Bb&Cb&Db&Eb&Fb&Gb&Hb&Ib&Jb&Kb&Lb&Mb&Nb&Ob&Pb&Qb&Rb&Sb&Tb&Ub&Vb&Wb&Xb&Yb&Zb
&ac&bc&cc&dc&ec&fc&gc&hc&ic&jc&kc&lc&mc&nc&oc&pc&qc&rc&sc&tc&uc&vc&wc&xc&yc&zc&Ac&Bc&Cc&Dc&Ec&Fc&Gc&Hc&Ic&Jc&Kc&Lc&Mc&Nc&Oc&Pc&Qc&Rc&Sc&Tc&Uc&Vc&Wc&Xc&Yc&Zc
&ad&bd&cd&dd&ed&fd&gd&hd&id&jd&kd&ld&md&nd&od&pd&qd&rd&sd&td&ud&vd&wd&xd&yd&zd&Ad&Bd&Cd&Dd&Ed&Fd&Gd&Hd&Id&Jd&Kd&Ld&Md&Nd&Od&Pd&Qd&Rd&Sd&Td&Ud&Vd&Wd&Xd&Yd&Zd
&ae&be&ce&de&ee&fe&ge&he&ie&je&ke&le&me&ne&oe&pe&qe&re&se&te&ue&ve&we&xe&ye&ze&Ae&Be&Ce&De&Ee&Fe&Ge&He&Ie&Je&Ke&Le&Me&Ne&Oe&Pe&Qe&Re&Se&Te&Ue&Ve&We&Xe&Ye&Ze
&af&bf&cf&df&ef&ff&gf&hf&if&jf&kf&lf&mf&nf&of&pf&qf&rf&sf&tf&uf&vf&wf&xf&yf&zf&Af&Bf&Cf&Df&Ef&Ff&Gf&Hf&If&Jf&Kf&Lf&Mf&Nf&Of&Pf&Qf&Rf&Sf&Tf&Uf&Vf&Wf&Xf&Yf&Zf
&ag&bg&cg&dg&eg&fg&gg&hg&ig&jg&kg&lg&mg&ng&og&pg&qg&rg&sg&tg&ug&vg&wg&xg&yg&zg&Ag&Bg&Cg&Dg&Eg&Fg&Gg&Hg&Ig&Jg&Kg&Lg&Mg&Ng&Og&Pg&Qg&Rg&Sg&Tg&Ug&Vg&Wg&Xg&Yg&Zg
&ah&bh&ch&dh&eh&fh&gh&hh&ih&jh&kh&lh&mh&nh&oh&ph&qh&rh&sh&th&uh&vh&wh&xh&yh&zh&Ah&Bh&Ch&Dh&Eh&Fh&Gh&Hh&Ih&Jh&Kh&Lh&Mh&Nh&Oh&Ph&Qh&Rh&Sh&Th&Uh&Vh&Wh&Xh&Yh&Zh
&ai&bi&ci&di&ei&fi&gi&hi&ii&ji&ki&li&mi&ni&oi&pi&qi&ri&si&ti&ui&vi&wi&xi&yi&zi&Ai&Bi&Ci&Di&Ei&Fi&Gi&Hi&Ii&Ji&Ki&Li&Mi&Ni&Oi&Pi&Qi&Ri&Si&Ti&Ui&Vi&Wi&Xi&Yi&Zi
&aj&bj&cj&dj&ej&fj&gj&hj&ij&jj&kj&lj&mj&nj&oj&pj&qj&rj&sj&tj&uj&vj&wj&xj&yj&zj&Aj&Bj&Cj&Dj&Ej&Fj&Gj&Hj&Ij&Jj&Kj&Lj&Mj&Nj&Oj&Pj&Qj&Rj&Sj&Tj&Uj&Vj&Wj&X

January 20, 2026 at 9:44 PM

Reposted by Damien Tournoud

Altmetric

@altmetric.com

Bluesky is definitely a place for science, research and citing papers. We hope it will continue to close the gap as rapidly as it has with legacy social media. Research, science and dank memes need many homes on the internet.

Bluesky provides one of the more inviting ones.

Happy New Year.

a man wearing a beanie says " yeah science "

Alt: Jeffie (ok Jessie) Pinkman from Breaking Bad wearing a beanie says " yeah science" and points. Hey did anyone watch Pluribus? Sick show, we Stan Vince Gilligan. I bet his middle name is Jeff.

media.tenor.com

January 8, 2026 at 1:08 PM

Damien Tournoud

@damientournoud.bsky.social

Very interesting. If listRepos is not fundamentally a Relay API, you could have collectiondir serve that API in addition to listReposByCollection.

collectiondir sits downstream of a relay and consumes listHosts (it doesn't yet, but it should) and the firehose to index repositories and collections.

January 9, 2026 at 1:27 AM

Damien Tournoud

@damientournoud.bsky.social

Some thoughts for discussion on how we could evolve the ATProto repository persistence into an efficient log-structured storage format ⬇️

Damien Tournoud @damientournoud.bsky.social · 29d

I have been wondering what it would take to use Atproto repositories as a more general database.

I think we could make them easier to query and update without sacrificing their cryptographic properties. Another #atproto #atdev thread, with early food for thought 🧵

January 8, 2026 at 5:08 PM

Damien Tournoud

@damientournoud.bsky.social

A follow-up to this ⬇️

Damien Tournoud @damientournoud.bsky.social · Jan 5

A follow-up on the 14 million repositories figure, since I received a few questions about it:

This is the number of repositories visible to the new relays that have been in production since May. An #atproto #atdev 🧵

Damien Tournoud @damientournoud.bsky.social · Dec 31

5/ The total size on disk of 14 million repositories ended up at around 2 TB.

But why 14 million repositories while Bluesky supposedly has 41 million users? Well, the repositories enumerated by tap are the ones visible to the new relay implementation that has only been live since May.

January 8, 2026 at 5:00 PM
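As a back-of-the-envelope check, the two figures in the quoted post imply an average on-disk repository size of roughly 140 KB. This takes 2 TB as 2 × 10^12 bytes. An average says nothing about how sizes are spread across repositories.

```latex
\frac{2 \times 10^{12}\ \text{bytes}}{14 \times 10^{6}\ \text{repositories}}
  \approx 1.43 \times 10^{5}\ \text{bytes}
  \approx 143\ \text{KB per repository}
```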

Damien Tournoud

@damientournoud.bsky.social

10/ If you generate a manifest of all these files, you have a map of which file might contain which range of keys.

In addition, splitting the repository into smaller files should reduce write amplification during storage compaction, the process of merging updates together.

January 6, 2026 at 10:36 PM
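The manifest from post 10 can be pictured as a list of (file, smallest key, largest key) entries. This is a minimal sketch of the lookup. It assumes string keys compared byte-wise, which is how Go compares strings, and a hypothetical ManifestEntry type. Before compaction, several files may cover overlapping ranges, so the lookup returns every candidate file rather than a single one. A real manifest would likely be sorted and binary-searched rather than scanned.

```go
package main

import "fmt"

// ManifestEntry is a hypothetical record describing one storage file and
// the inclusive range of repository keys it may contain.
type ManifestEntry struct {
	File   string
	MinKey string
	MaxKey string
}

// candidates returns the files whose key range covers key. Go compares
// strings byte-wise, which gives a lexicographic key order.
func candidates(manifest []ManifestEntry, key string) []string {
	var files []string
	for _, e := range manifest {
		if e.MinKey <= key && key <= e.MaxKey {
			files = append(files, e.File)
		}
	}
	return files
}

func main() {
	manifest := []ManifestEntry{
		{"base-0001", "app.bsky.actor.profile/self", "app.bsky.feed.like/3zzz"},
		{"base-0002", "app.bsky.feed.post/2aaa", "app.bsky.graph.follow/3zzz"},
		{"update-0003", "app.bsky.feed.post/3lb", "app.bsky.feed.post/3lc"},
	}
	fmt.Println(candidates(manifest, "app.bsky.feed.post/3lbx")) // prints [base-0002 update-0003]
}
```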

Damien Tournoud

@damientournoud.bsky.social

9/ Given that we expect updates to usually only affect part of the tree (e.g. because records are always added to the end of collections), you can split the repository by sub-trees, identifying the full sub-tree by the CID of its top node.

January 6, 2026 at 10:36 PM
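The following toy sketch shows why identifying a sub-tree by the hash of its top node suits splitting. Each top-level sub-tree is reduced to a set of records and hashed with SHA-256 in key order. Real repositories use CIDs of DAG-CBOR-encoded MST nodes, so the Subtree type, the sha256 stand-in, and the subtreeID helper are all assumptions made for illustration. When a record is appended to one sub-tree, only that sub-tree's identifier changes, so only its file would need rewriting.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"sort"
)

// Subtree is a toy stand-in for an MST sub-tree: a set of key/value records.
type Subtree map[string]string

// subtreeID hashes the records in key order, standing in for the CID of the
// sub-tree's top node (which in a real MST transitively covers its content).
func subtreeID(t Subtree) string {
	keys := make([]string, 0, len(t))
	for k := range t {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	h := sha256.New()
	for _, k := range keys {
		// Length-prefix each field so ("ab","c") and ("a","bc") hash differently.
		fmt.Fprintf(h, "%d:%s%d:%s", len(k), k, len(t[k]), t[k])
	}
	return hex.EncodeToString(h.Sum(nil))[:12]
}

func main() {
	likes := Subtree{"app.bsky.feed.like/3k1": "like-1"}
	posts := Subtree{"app.bsky.feed.post/3k1": "post-1"}
	before := []string{subtreeID(likes), subtreeID(posts)}

	posts["app.bsky.feed.post/3k2"] = "post-2" // append a new post
	after := []string{subtreeID(likes), subtreeID(posts)}

	for i := range before {
		fmt.Printf("file %d: %s -> %s (rewrite: %v)\n", i, before[i], after[i], before[i] != after[i])
	}
}
```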

Damien Tournoud

@damientournoud.bsky.social

8/ The format is easily mergeable: given a base repository and a series of updates, you can merge as you go, starting from the root and progressively finding the correct blocks in the version they have been defined in.

Skip non-matching blocks until you find one that is at or past the current key.

January 6, 2026 at 10:36 PM

Damien Tournoud

@damientournoud.bsky.social

7/ What would that look like?

Output a node. Then for each entry referenced by this node, output either nothing (if the entry has not been modified in that update), or recursively another node, or a value.

We can keep the format sparse by indexing the entries in the serialization.

January 6, 2026 at 10:36 PM

Damien Tournoud

@damientournoud.bsky.social

6/ If we put the values right after the nodes that reference them (at the cost of potentially duplicating some values), we end up with a format that can be streamed.

By ordering updates the same way, we also get an easily mergeable file format, and the beginning of a log-structured storage format.

January 6, 2026 at 10:36 PM

Damien Tournoud

@damientournoud.bsky.social

5/ That's not the only choice you can make. As noted in the Sync 1.1 release notes, if the repository's blocks were ordered in depth-first preorder, a reader implementation could both validate the cryptographic properties and iterate the keys in order.

A screenshot of https://docs.bsky.app/blog/relay-sync-updates that reads:

The need for ordered repository CAR file exports has become more clear, and an early implementation was completed for the PDS reference implementation. That implementation is not performant enough to merge yet, and it may be some time before ordered CAR files are a norm in the network. The exact ordering also needs to be described more formally to ensure interoperation. Work has not yet started on the "partial synchronization" variant of getRepo, which will allow fetching a subset of the repository.

January 6, 2026 at 10:36 PM
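The property from post 5 can be illustrated with a toy Merkle tree. Suppose blocks are written in preorder, each node before its children and the children left to right. A reader that knows only the root hash can then check each block as it arrives against a stack of expected hashes, and it sees the leaves in key order. The block layout and the sha256 hashing below are assumptions for illustration, and unlike a real MST, keys only live in leaves. Real repositories use DAG-CBOR blocks addressed by CIDs.

```go
package main

import (
	"crypto/sha256"
	"errors"
	"fmt"
)

// Hash is a SHA-256 digest, standing in for a CID.
type Hash [32]byte

// Block is one serialized toy node: leaves carry a Key, inner nodes carry
// the hashes of their children in key order.
type Block struct {
	Key      string
	Children []Hash
}

func (b Block) hash() Hash {
	h := sha256.New()
	if len(b.Children) == 0 {
		h.Write([]byte("leaf:" + b.Key))
	} else {
		h.Write([]byte("node:"))
		for _, c := range b.Children {
			h.Write(c[:])
		}
	}
	var out Hash
	copy(out[:], h.Sum(nil))
	return out
}

// Tree is the in-memory form the writer starts from.
type Tree struct {
	Key      string
	Children []*Tree
}

// blockOf builds t's block, hashing its children recursively.
func blockOf(t *Tree) Block {
	b := Block{Key: t.Key}
	for _, c := range t.Children {
		b.Children = append(b.Children, blockOf(c).hash())
	}
	return b
}

// writePreorder appends t's block, then each child sub-tree left to right.
func writePreorder(t *Tree, out []Block) []Block {
	out = append(out, blockOf(t))
	for _, c := range t.Children {
		out = writePreorder(c, out)
	}
	return out
}

// readPreorder checks every block against the hash its parent promised
// (kept on a stack) and returns leaf keys in the order they arrive.
func readPreorder(root Hash, blocks []Block) ([]string, error) {
	expected := []Hash{root}
	var keys []string
	for i, b := range blocks {
		if len(expected) == 0 {
			return nil, fmt.Errorf("block %d: unexpected extra block", i)
		}
		want := expected[len(expected)-1]
		expected = expected[:len(expected)-1]
		if b.hash() != want {
			return nil, fmt.Errorf("block %d: hash mismatch", i)
		}
		if len(b.Children) == 0 {
			keys = append(keys, b.Key)
		}
		// Push children in reverse so the leftmost child is expected next.
		for j := len(b.Children) - 1; j >= 0; j-- {
			expected = append(expected, b.Children[j])
		}
	}
	if len(expected) > 0 {
		return nil, errors.New("stream ended early")
	}
	return keys, nil
}

func main() {
	tree := &Tree{Children: []*Tree{
		{Children: []*Tree{{Key: "a"}, {Key: "b"}}},
		{Key: "c"},
	}}
	blocks := writePreorder(tree, nil)
	keys, err := readPreorder(blockOf(tree).hash(), blocks)
	fmt.Println(keys, err) // prints [a b c] <nil>
}
```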

Damien Tournoud

@damientournoud.bsky.social

4/ The reference implementation of the PDS optimizes writes at the expense of reads: it stores each repository block as a separate file on the filesystem.

That makes writes relatively cheap, but scatters the repository into a bunch of files without a natural order, making reads expensive.

atproto/packages/pds/src/disk-blobstore.ts at 8ffbaccc68d6b772fed62665fb66df632920effd · bluesky-social/atproto

Social networking technology created by Bluesky. Contribute to bluesky-social/atproto development by creating an account on GitHub.

github.com

January 6, 2026 at 10:36 PM

Damien Tournoud

@damientournoud.bsky.social

3/ As with any storage format, there is a trade-off here between the effort required to read/query and the effort required to write/update: making reads more efficient usually requires spending more effort on writes.

January 6, 2026 at 10:36 PM

Damien Tournoud

@damientournoud.bsky.social

2/ The Merkle Search Tree structure of Atproto repositories works really well as an in-memory structure, and allows updates to be propagated efficiently and in a cryptographically provable way.

That's great! Unfortunately, their structure does not make them natively easy to query or update.

Repository - AT Protocol

Self-authenticating storage for public account content

atproto.com

January 6, 2026 at 10:36 PM
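For context on the structure in post 2: in an atproto Merkle Search Tree, each key's layer is derived from the key itself. The key is hashed with SHA-256, and the leading zero bits of the hash are counted and divided by two, rounding down, which gives a fanout of about 4. Because of this determinism, the same set of records always produces the same tree. This is a minimal sketch. mstLayer is a hypothetical name, the example keys are made up, and the repository specification linked above is the normative definition.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"math/bits"
)

// leadingZeroLayer counts leading zero bits in h and divides by two;
// integer division does the rounding down.
func leadingZeroLayer(h []byte) int {
	zeros := 0
	for _, b := range h {
		if b == 0 {
			zeros += 8
			continue
		}
		zeros += bits.LeadingZeros8(b)
		break
	}
	return zeros / 2
}

// mstLayer returns the tree layer for a repository key such as
// "app.bsky.feed.post/3jui7kd54zh2y".
func mstLayer(key string) int {
	sum := sha256.Sum256([]byte(key))
	return leadingZeroLayer(sum[:])
}

func main() {
	for _, k := range []string{"app.bsky.feed.post/3jui7kd54zh2y", "app.bsky.graph.follow/3jui7kd54zh2y"} {
		fmt.Printf("%-40s layer %d\n", k, mstLayer(k))
	}
}
```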

Damien Tournoud

@damientournoud.bsky.social

Atproto-over-meshtastic maybe?

January 5, 2026 at 11:20 PM

Damien Tournoud

@damientournoud.bsky.social

Almost, but not quite.

The linked thread is discussing how the tap synchronization tool only has visibility on repositories active since May 2025 (not 2024). This is not a general issue with the network.

January 5, 2026 at 9:51 PM
