Distributed System: Understanding Quorum-Based Systems

Distributed System: Understanding Quorum-Based Systems

The basics of Quorum-Based systems in Distributed Systems

In distributed systems, quorum-based approaches are essential mechanisms for maintaining consistency and availability in the face of network partitions or failures. A quorum is a subset of nodes in a distributed system that must agree on a particular decision or action before it is considered valid. Quorum-based systems are designed to ensure that a decision or action is not taken unless a sufficient number of nodes agree on it, thereby guaranteeing data consistency and availability. In this article, we will explore the concept of quorum-based systems in detail, including their role in distributed systems, the types of quorums, and how they are implemented.

Introduction

As you may already know, distributed systems are made up of many independent components that must coordinate their efforts to achieve a common goal. However, if these nodes are unable to interact with one another or if a node fails, the results can be catastrophic. Quorum-based systems serve this purpose.

A Quorum-Based System is a simple mechanism for ensuring consensus among a group of nodes in a distributed system. A majority of nodes, or a "Quorum," is required to reach a consensus.

Distributed systems will struggle without quorum-based systems, which facilitate fair and timely decision-making. In their absence, life-or-death choices may be postponed or botched. Managing distributed systems can be difficult, but with a quorum-based approach in place, disagreements between nodes are avoided.

Quorum-Based Systems in Distributed Systems

In order to accomplish a goal, multiple servers or nodes in a network coordinate and share data. In these types of systems, it is of the utmost importance that data remain accessible and intact in the case of a node failure or network partition.

Quorum-based systems are one way to make sure that data is always consistent and available. In a quorum-based system, a choice or action is only made if enough nodes agree on it. This makes sure that the action or choice is correct and that the system stays the same and is available.

As an illustration of what a distributed system looks like, let's take a look at a system that has ten nodes. When there is a requirement for a quorum of six nodes, a decision is not considered to be valid unless it has the support of at least six of the nodes in the network. If there must be a quorum before a decision can be made, then the majority vote of six or more nodes must be in favour of the decision before it can be carried out.

Quorum-based approaches are particularly useful in distributed systems since individual nodes or the network as a whole can experience failure or become fragmented. Because of this, it is extremely important to ensure that data consistency and availability are maintained, even if some of the nodes in the network are offline or unable to connect with one another.

Types of Quorums

There are several types of quorums used in distributed systems. The most common types are:

Read Quorum

In a distributed system, a read quorum is the number of nodes that must agree on a read process for it to be valid. For example, let's say we have a distributed system with ten nodes and a read group of six. When a read action is asked for, the system will only give back the data if at least six nodes can be reached and agree on the value of the data.

Read quorums are helpful when getting data from a distributed system because they make sure the data is right and available. If you can only reach less than six nodes, the read action is invalid, and the system will keep waiting until a quorum is met.

Write Quorum

A group of nodes in a distributed system that all have to agree on a write action for it to be valid is called a "write quorum." For example, let's say we have a distributed system with ten nodes and a write quorum of six. When a write action is requested, the system won't do it unless it can reach at least six nodes and they all agree on the value of the data.

Membership Quorum

In a distributed system, a membership quorum is a group of nodes that must all agree on changes to the system's membership before they are considered true. Say, for example, that we have a distributed system with ten nodes and a group of six nodes. When a node joins or leaves the system, the change is only true if at least six other nodes agree with it.

Configuration Quorum

In a distributed system, a configuration quorum is a group of nodes that must all agree on changes to the system's configuration before they are considered true. For example, let's say we have a distributed system with ten nodes and a setup group of six. When a setup change is asked for, such as changing the number of nodes in the system or the replication factor, the change is only acceptable if at least six nodes agree with it.

Implementing Quorum-Based Systems

Implementing quorum-based systems in distributed systems requires careful thought about a number of things, such as the size of the system, the number of nodes, and the desired level of consistency and availability.

Using a consensus algorithm, like the Paxos algorithm or the Raft algorithm, to set up quorum-based systems is one way to do it. These methods make sure that at least a certain number of nodes agree on a decision or action, even if some nodes fail or the network breaks up.

Another way is to use a distributed hash table (DHT), which is a system that maps keys to values and is spread out over a large area. In a DHT, a read or write operation can be done by contacting only a subset of the system's nodes, and a quorum can be needed to make sure the operation is correct.

No matter what method is used, setting up quorum-based systems requires careful planning and testing to make sure that the system stays consistent and usable even if there are problems or parts of the network are cut off.

Quorum Consensus algorithms

Quorum Consensus algorithms make sure that distributed systems are always consistent and reliable. By exchanging messages and deciding on a certain value, these algorithms help nodes in a distributed system come to a decision.

Some popular Quorum Consensus algorithms are:

  • Paxos is a protocol that is utilised in distributed systems that are fault-tolerant. It ensures that there will be no more than one value selected, and that the selected value will continue to be adhered to after it has been selected.

  • Raft algorithm is yet another prominent one. This algorithm was developed to be easier to comprehend than Paxos, and it does so by breaking down the essential components of consensus algorithms into a number of distinct parts.

  • Zab is another consensus algorithm that is used in Apache ZooKeeper, a popular open-source distributed coordination system.

  • Viewstamped replica is a protocol for state machine replication that was introduced in 1985.

Even though Quorum-Based methods are very helpful, putting them into place is not always easy. It can be hard to get everything to work in sync, and it can be hard to get performance to its best. But the pros of Quorum-Based systems are more important than the cons because they cut down on costs and make sure that only correct results are given.

Examples of Quorum-Based Systems

Moving on to some examples of systems using quorum-based protocols, we have Cassandra, DynamoDB, HBase, and Consul.

  • Cassandra is a distributed NoSQL database used by large organizations such as Apple and Instagram. It uses a modified version of the Paxos consensus algorithm.

  • On the other hand, DynamoDB, a managed NoSQL database service from AWS, uses quorum-based replication for maintaining consistency.

  • HBase, an open-source, distributed database system, uses the Hadoop Distributed File System to store data. It uses a modified version of Google’s Chubby lock service for distributed coordination.

  • Lastly, Consul, a service discovery and configuration tool, uses a Raft-based consensus algorithm.

These examples show how quorum-based systems are not limited to a specific industry or use case. The range of systems incorporating this protocol shows its applicability to multiple scenarios, from databases to distributed systems.

Advantages of Quorum-Based Systems:

  1. Consistency: Quorum-based systems keep distributed systems consistent by needing a certain number of nodes to agree on a decision or action before it can be considered valid.

  2. Availability: Quorum-based systems make sure that a certain number of nodes are always available to handle requests and make decisions, even if some of the nodes fail or the network is split up.

  3. Flexibility: Quorum-based systems can be set up to work with different sizes and types of quorums to meet different needs for consistency and availability.

  4. Fault tolerance: Quorum-based systems can keep working as long as a certain number of nodes are available, even if some of the nodes fail or the network breaks up.

  5. Scalability: In order to improve throughput and performance, quorum-based systems can scale horizontally by adding more nodes to the system.

Disadvantages of Quorum-Based Systems

  1. Complexity: Implementing quorum-based systems can be hard, and they need to be carefully designed and tested to make sure they work well and are always available, even if there are problems or parts of the network that don't work.

  2. Performance: Quorum-based systems can hurt performance because they need communication between nodes to reach a quorum, which can slow down throughput and increase latency.

  3. Maintenance: Quorum-based systems need to be maintained on a regular basis to make sure they stay consistent and ready. This can take a lot of time and resources.

  4. Configuration: It can be hard to set up quorum-based systems because the size and type of the quorum must be carefully chosen to meet particular requirements for consistency and availability.

  5. Security: Quorum-based systems can be attacked in ways like Byzantine faults, which can hurt the stability of the system and the security of the data. You can read more here

Example (PoC) in NodeJS/Typescript with express

This PoC implements a simple distributed system that allows clients to read and write data to the system using quorum-based systems. The system uses Express as the web framework and Axios as the HTTP client for communicating with other nodes in the system.

import express, { Request, Response } from 'express';
import axios from 'axios';

const app = express();
const nodes = [
  'http://node1.example.com',
  'http://node2.example.com',
  'http://node3.example.com',
];

const readQuorumSize = 2;
const writeQuorumSize = 3;

app.get('/read/:key', async (req: Request, res: Response) => {
  const key = req.params.key;
  const readNodes = nodes.slice(0, readQuorumSize);

  const values = await Promise.allSettled(
    readNodes.map((node) => axios.get(`${node}/data/${key}`))
  );

  const validValues = values.filter(
    (value) => value.status === 'fulfilled' && value.value.data !== undefined
  );

  if (validValues.length < readQuorumSize) {
    return res.status(500).send('Not enough nodes available to read data');
  }

  const data = validValues[0].value.data;
  res.send(data);
});

app.post('/write/:key', async (req: Request, res: Response) => {
  const key = req.params.key;
  const value = req.body;
  const writeNodes = nodes.slice(0, writeQuorumSize);

  const results = await Promise.allSettled(
    writeNodes.map((node) => axios.post(`${node}/data/${key}`, value))
  );

  const validResults = results.filter(
    (result) => result.status === 'fulfilled'
  );

  if (validResults.length < writeQuorumSize) {
    return res.status(500).send('Not enough nodes available to write data');
  }

  res.send('Data written successfully');
});

app.listen(3000, () => {
  console.log('Server started');
});

The system uses a configuration quorum for both read and write operations, which ensures that a quorum of nodes agrees on the operation before it is considered valid. The read quorum size is set to 2, and the write quorum size is set to 3, which means that at least 2 nodes must agree for a read operation to be valid, and at least 3 nodes must agree for a write operation to be valid.

In the read operation, the system queries a subset of nodes specified by the read quorum size, and if enough valid responses are received, the system returns the value of the data. In the write operation, the system sends the data to a subset of nodes specified by the write quorum size, and if enough valid responses are received, the system returns a success message.

This PoC is just a basic example of a quorum-based system and can be extended and modified to meet specific requirements.

Conclusion

In distributed systems, quorum-based systems are an essential mechanism for maintaining consistency and availability. By requiring a quorum of nodes to agree on a particular decision or action, quorum-based systems ensure that the system remains consistent and available, even in the face of node failures or network partitions.