https://distsys.fponzi.me/
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
martin.kleppmann.com/2016/02/08/h...
Redis has been making inroads into areas of data management where there are stronger consistency and durability expectations. Distributed locking is one of those areas. Let’s examine it in some more detail.
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
www.xtxmarkets.com/tech/2025-te...
This post motivates TernFS, explains its high-level architecture, and then explores some key implementation details.
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos, or Porcupine.
s2.dev/blog/lineari...
We can gain confidence that S2 is linearizable by taking an empirical validation approach, using a model checker like Knossos, or Porcupine.
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
dbos.dev/blog/durable...
What we really needed to make distributed task queueing robust are durable queues that checkpoint the status of our queued tasks to a durable store like Postgres.
relentless-leader.com/dive-deep-in...
relentless-leader.com/dive-deep-in...
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
muratbuffalo.blogspot.com/2020/06/lear...
A principled, from the foundations-up, studying of distributed systems, which will take a good three months in the first pass, and many more months to build competence after that.
www.allthingsdistributed.com/2025/05/just...
a few weeks ago, at our internal dev conference I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
www.allthingsdistributed.com/2025/05/just...
a few weeks ago, at our internal dev conference I watched a talk from two of our PEs on building DSQL. I asked if they’d be willing to turn their insights into a deeper exploration of DSQL’s development.
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that event occur
decentralizedthoughts.github.io/2025-05-23-s...
Reasoning about distributed algorithms is hard at the best of times, with state split across remote nodes, asynchrony, concurrency, and non-determinism in the order that event occur
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
relentless-leader.com/apache-icebe...
Apache Iceberg is an ACID table format designed for large-scale analytics workloads.
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
www.elastic.co/search-labs/...
Debugging concurrency bugs is no picnic, but we're going to get into it. Enter Fray, a deterministic concurrency testing framework that turns flaky failures into reproducible ones.
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes2 and message passing, but rather the generic components which in Erlang are called behaviours.
stevana.github.io/erlangs_not_...
To me it’s clear that the big idea there isn’t lightweight processes2 and message passing, but rather the generic components which in Erlang are called behaviours.
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
pierrezemb.fr/posts/learn-...
A curated collection of resources about deterministic simulation testing for distributed systems.
sumercip.com/posts/patter...
sumercip.com/posts/patter...
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
ilyasergey.net/YSC4231/
This course on basic concurrent and parallel algorithms has been taught by Ilya Sergey at Yale-NUS College in 2019-2024.
dl.acm.org/doi/10.1145/...
dl.acm.org/doi/10.1145/...
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
shachaf.net/w/consensus
This page is a relatively informal discussion of distributed consensus and Paxos, what it does, how it works, and some tricks and variants.
groups.google.com/g/raft-dev/c...
groups.google.com/g/raft-dev/c...
restate.dev/blog/buildin...
We built a precursor and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary
restate.dev/blog/buildin...
We built a precursor and from all the lessons learned there, we arrived at a design with a self-contained complete stack, centered around a command log and event-processor, shipping as a single Rust binary
transactional.blog/blog/2025-de...
Every transactional system does four things: execute, orders, validate and persists transactions.
All four of these things must be done before the system may acknowledge a transaction’s result to a client.
transactional.blog/blog/2025-de...
Every transactional system does four things: execute, orders, validate and persists transactions.
All four of these things must be done before the system may acknowledge a transaction’s result to a client.
diff.wikimedia.org/2025/04/01/h...
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown.
diff.wikimedia.org/2025/04/01/h...
Since the beginning of 2024, the demand for the content created by the Wikimedia volunteer community – especially for the 144 million images, videos, and other files on Wikimedia Commons – has grown.
verifythis.github.io/ltc/03memcac...
VerifyThis Long-Term Challenge aims at proving that deductive program verification can produce relevant results for real systems with acceptable effort on a large scale in a collaborative manner.
verifythis.github.io/ltc/03memcac...
VerifyThis Long-Term Challenge aims at proving that deductive program verification can produce relevant results for real systems with acceptable effort on a large scale in a collaborative manner.
asatarin.github.io/testing-dist...
asatarin.github.io/testing-dist...
wyounas.github.io/concurrency/...
Concurrent programming is hard.
wyounas.github.io/concurrency/...
Concurrent programming is hard.
lsd-ucsc.github.io/ChoRus/
Choreographic Programming is a programming paradigm that allows programmers to write "choreographies" that describe the desired behavior of a system as a whole.
lsd-ucsc.github.io/ChoRus/
Choreographic Programming is a programming paradigm that allows programmers to write "choreographies" that describe the desired behavior of a system as a whole.