Table of contents
Distributed systems engineers have been debating the benefits and drawbacks of various coordination mechanisms in concurrent settings for a long time. Finding the benefits of each strategy and identifying its strengths have been made easier by recent research. Conflict resolution existed before CRDTs (Conflict-free Replicated Data Types) became a popular subject among distributed engineers. We have a collection of conflict-free data types in CRDTs that can be securely replicated in a distributed system.
Rock, Paper, Scissors:
When you think of “conflict resolution”, the first thing that might come to mind is the famous game of “Rock, Paper, Scissors”. In this popular game, players choose one of three gestures
- rock being a closed fist representing a strong punch;
- paper being an open palm representing a gentle but steady beating;
- scissors being two fingers bent at right angles representing cutting.
If any player chooses the same gesture as their opponent once again, then that player has lost. In other words; there can be only one winner in this game. That’s what comes to mind when we think about conflict resolution in data storage systems. The idea is to not have two copies of the same piece of data ever resolve differently from each other. Under this principle, we can use something called CRDTs for conflict-free replicated data types.
Airline Reservation System:
While many systems in the real world can tolerate network partition, some cannot take the chance of receiving conflicting updates from various clients. An airline reservation system, for instance, cannot allow two users to simultaneously hold two different copies of the same ticket; one user must win, and the other loses. We must find appropriate data structures that can synchronise in this environment safely in order to prevent these scenarios. Such data structures include CRDTs, which use properties of distributed systems that are assumed to occur in practice to develop effective solutions to a number of issues.
What are Conflict-free Replicated Data Types?
A conflict-free replicated data type (CRDT) is a data type whose replicated instances can be safely and automatically merged even if they were created on different machines, and possibly under different operating systems. A class of data structures known as CRDTs can be updated consistently in a network that is only partially connected. The idea of concurrent data types served as the inspiration for the name of this group of data structures. A data type that is intended to be accessed by multiple threads of execution is known as a concurrent data type. Replicated data types, in contrast, are intended to be replicated (copied) to numerous machines.
Why Do We Need CRDTs?
In the real world, we can’t guarantee that there will be no network partitions. Some of them are planned, like when one part of the network becomes overloaded and gets shut down. Others are unplanned, like when a cable is accidentally pulled out of a router. When a network partition occurs, different nodes in the system might be disconnected from each other for a short or prolonged period of time. CRDTs use a set of assumptions about the nature of this distributed system to efficiently synchronize replicated data. To understand how CRDTs work, it's important to understand the CAP theorem, which provides a useful overview of the challenges of building highly available distributed systems.
How do CRDTs work?
A CRDT consists of a set of individual state machines that are replicated and typically synchronized with each other. Once the state machines are set up with the correct initial state, any client can make an update and all other clients will receive that update and apply the same update to their local state machine. We will use a Counter as an example.
In a CRDT implementation of a counter, each client maintains an integer that represents the current overall value of the counter. The client performs two operations:
- incrementing the local counter value;
- publishing a message that sends the new current value of the counter to all other clients.
The publishing happens at a rate set by the network, and the clients will process the received messages to look up the current value of the counter, and then increment their local counter value to the new published value. This way, the clients are always in sync with the overall counter value and no client can have an incorrect value due to the fact that they rely on the published messages to keep up with the overall counter value.
Mergers and Co-ends: Merging Changes from Discrete Combinations of Events
The simplest form of CRDT is a counter. A counter can be defined as a discrete state machine that receives pushes (increments), and pops (decrements).
A push from one client is immediately visible to all other clients. When clients connect to the system, they receive the overall current value of the counter. When a client wants to add its own counter, it performs pop and a push. The push will increment the overall current value and the pop will decrement the overall current value by one.
Thus, the overall current value can be seen as a discrete combination of the local current values at each client. This way, each client can generate a new overall current value that is a discrete combination of its local current value and all other clients’ current values.
Conflicts in Continuous Sums: Summarizing Continuous Changes While Preserving Discrete Changes
A counter is a discrete state machine that can be described as a discrete sum of events. A more complex example is a set that can be modelled as a continuous sum of events. A set can be seen as a table where each row represents a distinct value. The set can be used to answer questions such as “are there any items with a value less than 5?”. The overall current value of the set can be seen as a continuous sum of all push events. A push event adds a row to the table representing a distinct value. If there is no row with that value, the overall current value remains the same. The overall current value must be incremented to account for the rows that were already there. Otherwise, the continuous sum of all pushes will be greater than the continuous sum of all pops.
Fixing the Discrete and Continuous Sums: Adding a Subtraction
We've explained that adding a continuous value to a discrete value causes a problem. To fix the problem, we must add a subtraction operation between a continuous value and a discrete value. This way, we can account for the discrete values without adding to the continuous values. The implementation can maintain a running subtraction between the continuous and the discrete values. This allows us to subtract the continuous values from the discrete values and obtain the correct results.
Implementing a Set as a Set
We have described the basic theory behind CRDTs and the use cases for counters and sets. Before we can implement a set, we need to implement a counter first. We can implement the counter by creating a series of event handlers that handle the following events:
- Initialize - create the initial current value of the counter
- Increment - increase the current value by one
- Decrement - decrease the current value by one
- Publish - publish the current value to all clients.
The events are used for two purposes: to generate the overall current value and inform the client.
This blog post covered what CRDTs are, why we need them, and how they work. We started with a high-level overview and then hid the details behind their general concepts, properties, and assumptions. This will allow you to understand the different use cases where CRDTs excel.