Distributed Systems
distribsystems.bsky.social
I tweet/retweet interesting stuff about #DistributedSystems and #compsci. Suggest links/papers/conversations via DM! Tag me for retweets. Run by @fponzi.me
https://distsys.fponzi.me/
How to do distributed locking
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
November 24, 2025 at 12:00 PM
TernFS — an exabyte scale, multi-region distributed filesystem
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
November 3, 2025 at 12:01 PM
Linearizability testing S2 with deterministic simulation
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos or Porcupine.
September 30, 2025 at 11:00 AM
How I solved a distributed queue problem after 15 years
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
September 22, 2025 at 6:27 PM
Understanding Paxos the intuitive way
relentless-leader.com/dive-deep-in...
August 9, 2025 at 2:07 PM
Murat Demirbas and Aleksey Charapko read and discuss the HotOS paper "Real Life Is Uncertain. Consensus Should Be Too!"
July 31, 2025 at 9:37 PM
Learning about distributed systems: where to start?
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
May 30, 2025 at 11:00 AM
Just make it scale: An Aurora DSQL story
www.allthingsdistributed.com/2025/05/just...
A few weeks ago, at our internal dev conference, I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
May 28, 2025 at 11:01 AM
Reasoning about Distributed Protocols with Smart Casual Verification
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that events occur
May 27, 2025 at 11:00 AM
Apache Iceberg Internals Dive Deep On Performance
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
May 15, 2025 at 11:01 AM
Concurrency bugs in Lucene: How to fix optimistic concurrency failures
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
May 12, 2025 at 11:04 AM
Erlang’s not about lightweight processes and message passing…
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes and message passing, but rather the generic components which in Erlang are called behaviours.
May 9, 2025 at 11:01 AM
So, You Want to Learn More About Deterministic Simulation Testing?
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
May 8, 2025 at 11:01 AM
May thy bits chip and shatter: Patterns for Building High-Performance Observability Pipelines at Scale
sumercip.com/posts/patter...
May 7, 2025 at 11:02 AM
Parallel, Concurrent and Distributed Programming
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
May 6, 2025 at 11:02 AM
Systems Correctness Practices at AWS: Leveraging Formal and Semi-formal Methods
dl.acm.org/doi/10.1145/...
May 5, 2025 at 11:03 AM
Distributed consensus
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
April 28, 2025 at 11:02 AM
Why is the raft consensus algorithm called "raft"?
groups.google.com/g/raft-dev/c...
April 25, 2025 at 11:01 AM
Building a modern Durable Execution Engine from First Principles
restate.dev/blog/buildin...
We built a precursor and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary
April 21, 2025 at 5:10 PM
Decomposing Transactional Systems
transactional.blog/blog/2025-de...
Every transactional system does four things: it executes, orders, validates, and persists transactions.
All four of these things must be done before the system may acknowledge a transaction’s result to a client.
April 18, 2025 at 11:00 AM
How crawlers impact the operations of the Wikimedia projects
diff.wikimedia.org/2025/04/01/h...
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown.
April 15, 2025 at 11:02 AM
Memcached: VerifyThis Long-term Challenge
verifythis.github.io/ltc/03memcac...
VerifyThis Long-Term Challenge aims at proving that deductive program verification can produce relevant results for real systems with acceptable effort on a large scale in a collaborative manner.
April 10, 2025 at 11:02 AM
Testing Distributed Systems
asatarin.github.io/testing-dist...
April 1, 2025 at 6:35 PM
How concurrency works: A visual guide
wyounas.github.io/concurrency/...
Concurrent programming is hard.
March 24, 2025 at 10:02 PM
ChoRus is a library that enables Choreographic Programming in Rust.
lsd-ucsc.github.io/ChoRus/
Choreographic Programming is a programming paradigm that allows programmers to write "choreographies" that describe the desired behavior of a system as a whole.
March 18, 2025 at 12:00 PM