Distributed Systems: Synchronisation in Complex Systems

Complex systems appear in almost every area of computer science and engineering, from distributed databases and networked applications to multi-core processors and real-time embedded systems. For these systems to work correctly and give dependable results, their state must remain consistent and their data must keep its integrity. Synchronisation is central to maintaining that consistency: it lets different components and processes cooperate and produce correct results.

Introduction

A distributed system is a type of computer system that is made up of numerous independent computers which are connected through a network and communicate with one another. A computer that is part of such a system is referred to as a node, and each node in the system is tasked with carrying out a certain operation.

Because the work is spread over multiple nodes, a distributed system can handle large volumes of data and traffic and can keep functioning even if some of its nodes fail. Scalability, dependability, and fault tolerance are the key advantages of this design.

However, some difficulties emerge with distributed systems, such as the requirement to synchronise and maintain consistency across all nodes.

Understanding Synchronization

Synchronisation is the coordination of activities and events between two or more entities so that they reach a shared goal in a correct and well-defined way. In the setting of complex systems, synchronisation is the process of managing how different components, processors, threads, or distributed nodes interact with each other. The aim is to keep the system in a coherent state and to avoid conflicts that could cause it to misbehave or corrupt data.

What are the Challenges in Complex Systems?

Complex systems often have many parts or processes that run at the same time and need to share resources, talk to each other, and share data.

Several problems can occur when this concurrent activity is not kept in sync. A few of them are highlighted below:

Race Conditions

These occur when multiple processes or threads access shared resources concurrently and the final outcome depends on the order of execution. Race conditions can lead to unpredictable behavior and data corruption.
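A race condition of this kind can be reproduced even in single-threaded TypeScript once an `await` separates a read from the write that depends on it. The following is a minimal sketch; `sharedCounter`, `unsafeIncrement`, and `runRace` are hypothetical names chosen for illustration.

```typescript
// Hypothetical shared state; all names in this sketch are illustrative.
let sharedCounter = 0;

// Yield to the event loop, standing in for I/O between a read and a write.
const yieldToEventLoop = (): Promise<void> =>
  new Promise<void>((resolve) => setTimeout(resolve, 0));

// Unsafe read-modify-write: other tasks can run between the read and the write.
async function unsafeIncrement(): Promise<void> {
  const observed = sharedCounter; // read
  await yieldToEventLoop();       // other increments interleave here
  sharedCounter = observed + 1;   // write based on a stale read
}

async function runRace(): Promise<number> {
  sharedCounter = 0;
  await Promise.all([unsafeIncrement(), unsafeIncrement(), unsafeIncrement()]);
  return sharedCounter;
}
```

All three calls read 0 before any of them writes, so `runRace()` resolves to 1 instead of 3: two updates are lost.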

Deadlocks

A deadlock happens when multiple processes are unable to proceed because each is waiting for a resource held by another, resulting in a standstill.
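One common way to reason about deadlocks is a wait-for graph, in which an edge from one process to another means "is waiting for a resource held by". A cycle in that graph indicates a deadlock. The sketch below assumes each process waits for at most one other process; `hasDeadlock` and the process names are hypothetical.

```typescript
// waitsFor.get("P1") === "P2" means P1 is blocked on a resource that P2 holds.
function hasDeadlock(waitsFor: Map<string, string>): boolean {
  for (const start of Array.from(waitsFor.keys())) {
    const visited = new Set<string>();
    let current: string | undefined = start;
    // Follow the chain of waits until it ends or revisits a process.
    while (current !== undefined && !visited.has(current)) {
      visited.add(current);
      current = waitsFor.get(current);
    }
    if (current !== undefined) {
      return true; // a process was reached twice, so the chain contains a cycle
    }
  }
  return false;
}

const circularWait = new Map<string, string>([["P1", "P2"], ["P2", "P1"]]);
const linearWait = new Map<string, string>([["P1", "P2"], ["P2", "P3"]]);
```

Here `hasDeadlock(circularWait)` is `true`, because P1 and P2 wait on each other, while `hasDeadlock(linearWait)` is `false`, because P3 waits for nothing and can eventually release its resource.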

Data Inconsistency

In distributed systems, data is often replicated across different nodes. Without proper synchronization, inconsistencies can arise due to delayed updates or conflicting modifications.

Starvation

Some processes may be indefinitely delayed in accessing resources or progressing due to poor synchronization strategies, leading to reduced system performance.

Which aspects of a distributed system require synchronisation?

Synchronisation is essential in a distributed system because it helps ensure that the system's nodes work towards the same objective and take each other's actions into account.

Simply put, synchronisation is the process of coordinating the actions of numerous computers (nodes) so that they function correctly together.

It is required in several aspects of distributed systems, some of which are:

Resource Access

Several nodes in a distributed system may need the same resource at the same time. Synchronisation techniques such as mutual exclusion make sure that only one node uses the resource at any given moment.

Event Ordering

Events can happen at different times on different nodes of a distributed system. Synchronisation techniques are used to establish a consistent ordering of those events so that every node handles them in the right order.
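One well-known way to order events without relying on wall-clock time is a Lamport logical clock: each node keeps a counter, increments it on every local event, and on receiving a message moves its counter past the timestamp carried by that message. The class below is a minimal sketch; the name `LamportClock` and its method names are our own.

```typescript
class LamportClock {
  private time = 0;

  // Call for every local event, including sending a message.
  tick(): number {
    this.time += 1;
    return this.time;
  }

  // Call when a message stamped with `remoteTime` arrives.
  receive(remoteTime: number): number {
    this.time = Math.max(this.time, remoteTime) + 1;
    return this.time;
  }
}

const clockA = new LamportClock();
const clockB = new LamportClock();
clockA.tick();                                 // A: local event at time 1
const sendStamp = clockA.tick();               // A: sends a message stamped 2
clockB.tick();                                 // B: unrelated local event at time 1
const receiveStamp = clockB.receive(sendStamp); // B: jumps to max(1, 2) + 1 = 3
```

The receive event is stamped 3, which is later than the send event's 2, so the timestamps respect the fact that the send happened before the receive.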

Clock Synchronization

In a distributed system, each node has its own clock, and these clocks are not necessarily in sync with one another. Synchronisation techniques are used to keep the nodes' views of the current time close enough for the application's needs.

What is Consistency in Distributed Systems?

Consistency is the requirement that all nodes in a distributed system see the same data at the same time. In a distributed system, maintaining consistency is challenging because data can be modified on different nodes at different times. Consistency is required in several areas, two of which are:

Data replication: Data may be replicated across numerous nodes in a distributed system to offer fault tolerance and availability. Consistency between replicas is required to ensure that all nodes see the same data.

Distributed transactions: Transactions in a distributed system may involve numerous nodes. To ensure that a transaction is completed correctly, all nodes engaged in the transaction must maintain consistency.

What are the types of Synchronization Mechanisms?

Locks/Mutexes

These are basic synchronisation primitives that let only one process or thread hold the lock at a time, which restricts access to a shared resource. Locks are effective at preventing race conditions, but misusing them can lead to deadlocks or reduced concurrency.
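As an illustration, a lock for asynchronous code can be built by chaining promises so that each critical section starts only after the previous one has finished. This is a minimal sketch, not a production-grade lock; `AsyncMutex`, `runExclusive`, and `demoMutex` are hypothetical names.

```typescript
class AsyncMutex {
  private tail: Promise<void> = Promise.resolve();

  // Runs `criticalSection` only after every earlier caller has finished.
  runExclusive<T>(criticalSection: () => Promise<T>): Promise<T> {
    const result = this.tail.then(criticalSection);
    // Keep the chain usable even if the critical section rejects.
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

async function demoMutex(): Promise<number> {
  const mutex = new AsyncMutex();
  let balance = 0;
  const deposit = (): Promise<void> =>
    mutex.runExclusive(async () => {
      const observed = balance;
      // Simulated I/O between the read and the write.
      await new Promise<void>((resolve) => setTimeout(resolve, 0));
      balance = observed + 1;
    });
  await Promise.all([deposit(), deposit(), deposit()]);
  return balance;
}
```

Because the three deposits run one after another, `demoMutex()` resolves to 3. The price is the reduced concurrency mentioned above: the deposits can no longer overlap.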

Semaphores

Semaphores are integer counters that control access to a pool of resources. They are useful for limiting the number of processes that can use a shared resource at the same time.
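The sketch below shows one possible counting semaphore for asynchronous TypeScript code. The class name `CountingSemaphore`, the permit count of 2, and the 5 ms of simulated work are assumptions made for the example.

```typescript
class CountingSemaphore {
  private permits: number;
  private waiters: Array<() => void> = [];

  constructor(permits: number) {
    this.permits = permits;
  }

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits -= 1;
      return;
    }
    // No permit is free: wait until release() hands one over.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) {
      next(); // pass the permit directly to the longest-waiting caller
    } else {
      this.permits += 1;
    }
  }
}

// Five workers compete for two permits; `peak` records the highest overlap seen.
async function demoSemaphore(): Promise<number> {
  const semaphore = new CountingSemaphore(2);
  let active = 0;
  let peak = 0;
  const worker = async (): Promise<void> => {
    await semaphore.acquire();
    try {
      active += 1;
      peak = Math.max(peak, active);
      await new Promise<void>((resolve) => setTimeout(resolve, 5));
    } finally {
      active -= 1;
      semaphore.release();
    }
  };
  await Promise.all([worker(), worker(), worker(), worker(), worker()]);
  return peak;
}
```

`demoSemaphore()` resolves to 2: no more than two workers ever hold a permit at the same time, although five were started.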

Monitors

Monitors are a high-level construct that combines shared data and synchronisation primitives into a single unit. A monitor encapsulates the shared data together with the methods that operate on it, and lets only one thread execute inside the monitor at a time, so only one thread can access the data at a time.

Message Passing

In a distributed system, processes do not share memory, so they coordinate by sending messages to one another. With suitable protocols, for example ones based on acknowledgements and sequence numbers, messages are delivered reliably and in the right order.
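One simple ordering technique is to number messages at the sender and have the receiver hold back anything that arrives early. The sketch below assumes a single sender that numbers its messages from 0; `SequencedMessage` and `InOrderReceiver` are hypothetical names.

```typescript
interface SequencedMessage {
  seq: number;      // sender-assigned sequence number, starting at 0
  payload: string;
}

class InOrderReceiver {
  private nextExpected = 0;
  private pending = new Map<number, SequencedMessage>();
  readonly delivered: string[] = [];

  receive(message: SequencedMessage): void {
    if (message.seq < this.nextExpected || this.pending.has(message.seq)) {
      return; // duplicate of something already delivered or buffered
    }
    this.pending.set(message.seq, message);
    // Deliver every consecutive message that is now available.
    let next = this.pending.get(this.nextExpected);
    while (next !== undefined) {
      this.delivered.push(next.payload);
      this.pending.delete(this.nextExpected);
      this.nextExpected += 1;
      next = this.pending.get(this.nextExpected);
    }
  }
}

const receiver = new InOrderReceiver();
receiver.receive({ seq: 2, payload: "c" }); // arrives early, buffered
receiver.receive({ seq: 0, payload: "a" }); // delivered immediately
receiver.receive({ seq: 0, payload: "a" }); // duplicate, ignored
receiver.receive({ seq: 1, payload: "b" }); // delivered, then releases "c"
```

After these four calls, `receiver.delivered` is `["a", "b", "c"]`, even though the messages arrived out of order and one arrived twice.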

Atomic Operations

These are operations that execute as a single indivisible step: no other thread or process can observe or modify the data while the operation is half finished. This protects the data from interference and preserves its integrity.
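In JavaScript and TypeScript, the built-in `Atomics` object provides such operations on typed arrays backed by shared memory. The sketch below runs on a single thread purely to show the return values; `atomicDemo` is a hypothetical name, and in practice the buffer would be shared with worker threads.

```typescript
function atomicDemo(): number[] {
  // One 32-bit integer cell in shared memory, initially 0.
  const cell = new Int32Array(new SharedArrayBuffer(Int32Array.BYTES_PER_ELEMENT));

  // Atomic read-modify-write: returns the old value (0) and stores 5.
  const beforeAdd = Atomics.add(cell, 0, 5);

  // Compare-and-swap: the cell holds the expected 5, so 9 is stored; returns 5.
  const swapped = Atomics.compareExchange(cell, 0, 5, 9);

  // The cell no longer holds 5, so nothing is stored; returns the current 9.
  const stale = Atomics.compareExchange(cell, 0, 5, 42);

  return [beforeAdd, swapped, stale, Atomics.load(cell, 0)];
}
```

`atomicDemo()` returns `[0, 5, 9, 9]`. The failed compare-and-swap is what lets a thread detect that another thread changed the value first, and then retry.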

What are the techniques for achieving synchronization and consistency?

Several techniques are used to achieve synchronization and consistency in distributed systems. Some of these techniques include:

Two-phase commit

Two-phase commit is a protocol used to ensure that distributed transactions are completed correctly. In the two-phase commit protocol, all nodes involved in a transaction must agree to commit the transaction before it is considered complete.

Vector clocks

Logical clocks are techniques for ordering events in a distributed system, which helps track the causal relationships between events. A Lamport clock keeps a single counter per node and yields an ordering that is consistent with causality. A vector clock gives each node a vector of counters, one per node, which represents that node's knowledge of events elsewhere and makes it possible to tell whether two events are causally related or concurrent. Even in the presence of network delays and asynchronous communication, these techniques help maintain a consistent view of events.
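A vector clock can be sketched as a map from node identifier to counter. The function names `vcIncrement`, `vcMerge`, and `vcCompare` and the node identifiers "A" and "B" are assumptions made for this example.

```typescript
type VectorClock = Record<string, number>;
type VcOrder = "before" | "after" | "equal" | "concurrent";

// A node increments its own entry for every local event.
function vcIncrement(clock: VectorClock, nodeId: string): VectorClock {
  return { ...clock, [nodeId]: (clock[nodeId] ?? 0) + 1 };
}

// On receiving a message, a node takes the element-wise maximum of both clocks.
function vcMerge(a: VectorClock, b: VectorClock): VectorClock {
  const merged: VectorClock = { ...a };
  for (const nodeId of Object.keys(b)) {
    merged[nodeId] = Math.max(merged[nodeId] ?? 0, b[nodeId]);
  }
  return merged;
}

// a is "before" b if every entry of a is <= b's and at least one is smaller.
function vcCompare(a: VectorClock, b: VectorClock): VcOrder {
  let aAhead = false;
  let bAhead = false;
  for (const nodeId of Object.keys(a).concat(Object.keys(b))) {
    const left = a[nodeId] ?? 0;
    const right = b[nodeId] ?? 0;
    if (left > right) aAhead = true;
    if (left < right) bAhead = true;
  }
  if (aAhead && bAhead) return "concurrent";
  if (aAhead) return "after";
  if (bAhead) return "before";
  return "equal";
}

const eventA1 = vcIncrement({}, "A");                    // { A: 1 }
const eventB1 = vcIncrement({}, "B");                    // { B: 1 }
const eventB2 = vcIncrement(vcMerge(eventB1, eventA1), "B"); // { B: 2, A: 1 }
```

`vcCompare(eventA1, eventB1)` is `"concurrent"`, because neither node had heard from the other, while `vcCompare(eventA1, eventB2)` is `"before"`, because B merged A's clock before producing `eventB2`.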

Quorum-based systems

Quorum-based systems are used to ensure that data replicas are consistent. A majority of nodes in a quorum-based system must agree on the value of a piece of data before it is considered correct.
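The arithmetic behind quorums is short enough to write down. A common formulation, assumed here, uses N replicas with a read quorum R and a write quorum W, and requires R + W > N so that every read quorum overlaps every write quorum. The function names below are hypothetical.

```typescript
// Smallest number of replicas that forms a strict majority of `replicaCount`.
function majorityQuorum(replicaCount: number): number {
  return Math.floor(replicaCount / 2) + 1;
}

// True when any read quorum must share at least one replica with any write quorum.
function quorumsOverlap(replicaCount: number, readQuorum: number, writeQuorum: number): boolean {
  return readQuorum + writeQuorum > replicaCount;
}
```

For example, `majorityQuorum(5)` is 3 and `majorityQuorum(4)` is also 3. With three replicas, `quorumsOverlap(3, 2, 2)` is `true`, whereas `quorumsOverlap(3, 1, 1)` is `false`: a read could then reach a replica that missed the latest write.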

Consensus algorithms

Consensus algorithms ensure that the nodes of a distributed system agree on a single value or decision, for example when electing a leader or establishing the order of transactions. Paxos and Raft are widely used examples.

Clock synchronization protocols

Clock synchronisation techniques are used to keep the clocks of all nodes in a distributed system closely aligned. The Network Time Protocol (NTP) is a commonly used clock synchronisation protocol. It can typically keep clocks within tens of milliseconds of each other over the public Internet, and within about a millisecond on a local network under good conditions.
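The core of an NTP-style exchange is a calculation over four timestamps: the client's send time, the server's receive and reply times, and the client's receive time. The sketch below shows that calculation only, not a full NTP client; the names `TimeExchange` and `estimateClockOffset` are our own, and all times are in milliseconds.

```typescript
interface TimeExchange {
  t0: number; // client clock when the request was sent
  t1: number; // server clock when the request arrived
  t2: number; // server clock when the reply was sent
  t3: number; // client clock when the reply arrived
}

function estimateClockOffset(x: TimeExchange): { offset: number; delay: number } {
  return {
    // How far the server clock is ahead of the client clock,
    // assuming the network delay is the same in both directions.
    offset: ((x.t1 - x.t0) + (x.t2 - x.t3)) / 2,
    // Round-trip network time, excluding the server's processing time.
    delay: (x.t3 - x.t0) - (x.t2 - x.t1),
  };
}

// Server is 100 ms ahead; 5 ms each way on the network; 1 ms of server processing.
const sample = estimateClockOffset({ t0: 1000, t1: 1105, t2: 1106, t3: 1011 });
```

For this sample the estimate is an offset of 100 ms and a round-trip delay of 10 ms. The offset is exact only because the two network legs are equal; with asymmetric delays the estimate is off by half the difference.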

What are the things to consider when adopting synchronisation?

Network Latency

Network latency delays communication between nodes in different locations, which makes it hard to keep their state consistent in real time. Synchronisation methods have to account for variable latency and make sure that nodes do not mistake a delayed update for an event that happened out of order.

Node Failures

Nodes in a distributed system often fail because of hardware faults, software bugs, or network problems. Synchronisation methods need to handle nodes that stop responding, so that the rest of the system continues to work correctly.
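A basic building block for coping with unresponsive nodes is a timeout around every remote call, so that a caller never waits forever. The helper below is a sketch; `withTimeout` is a hypothetical name and the 20 ms limit is arbitrary.

```typescript
// Rejects if `operation` has not settled within `timeoutMs` milliseconds.
function withTimeout<T>(operation: Promise<T>, timeoutMs: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(
      () => reject(new Error(`no response within ${timeoutMs} ms`)),
      timeoutMs
    );
    operation.then(
      (value) => { clearTimeout(timer); resolve(value); },
      (error) => { clearTimeout(timer); reject(error); }
    );
  });
}

// A node that answers promptly and one that never answers at all.
const healthyNode = (): Promise<string> => Promise.resolve("pong");
const crashedNode = (): Promise<string> => new Promise<string>(() => {});

async function probe(call: () => Promise<string>): Promise<string> {
  try {
    return await withTimeout(call(), 20);
  } catch {
    return "unreachable";
  }
}
```

`probe(healthyNode)` resolves to `"pong"` and `probe(crashedNode)` resolves to `"unreachable"` after roughly 20 ms. Note that a timeout cannot distinguish a crashed node from a slow one, so the caller still has to decide whether to retry or to treat the node as failed.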

Scalability

As distributed systems get bigger, it gets harder to keep everything in sync. Synchronisation mechanisms must be able to grow as the number of nodes and interactions increase, without sacrificing speed.

Consistency-Availability Trade-off

The CAP theorem says that a distributed system can provide at most two of three properties at once: consistency, availability, and partition tolerance. Because network partitions cannot be ruled out in practice, the real choice while a partition lasts is between consistency and availability. Synchronisation strategies therefore involve trade-offs that have to be weighed against the needs of the application.

Concurrency and Contention

When multiple nodes access and change shared resources at the same time, the result can be contention and conflicting updates. Finding the right balance between concurrency and synchronisation is tricky: too much locking reduces throughput, while too little synchronisation can lead to data corruption.

Demonstrating synchronization in a distributed system

Suppose we have a simple distributed system with two replicas of a key-value store. We want to ensure that any updates to the key-value store are synchronized across both replicas so that clients can always read the latest version of the data.

We can achieve this by implementing a synchronization protocol between the replicas. One such protocol is the two-phase commit protocol, which involves the following phases:

Prepare phase: The coordinator asks all replicas to prepare to commit changes.
Commit phase: If all replicas prepare successfully, the coordinator asks all replicas to commit the changes. If any replica cannot prepare, the transaction is aborted instead.

In this example, we define a Replica class that represents a single replica of the distributed database. The class contains a Map that holds the key-value pairs of the database, as well as get() and set() methods to read and write to the database.

// Define a replica class that holds a copy of the distributed database
class Replica {
  private data: Map<string, string> = new Map();

  get(key: string): string | undefined {
    return this.data.get(key);
  }

  set(key: string, value: string): void {
    this.data.set(key, value);
  }
}

We also define a Coordinator class that acts as the coordinator for the two-phase commit protocol between replicas. The class contains an array of Replica objects, as well as a transactionInProgress flag to prevent multiple transactions from running simultaneously. The beginTransaction() method is the entry point for the two-phase commit protocol. It first checks whether a transaction is already in progress; if one is, it logs an error and returns false. Otherwise, it proceeds to the prepare and commit phases.

// Define the coordinator class that coordinates two-phase commit protocol between replicas
class Coordinator {
  private replicas: Replica[];
  private transactionInProgress: boolean = false;

  constructor(replicas: Replica[]) {
    this.replicas = replicas;
  }

  async beginTransaction(transaction: Transaction): Promise<boolean> {
    if (this.transactionInProgress) {
      console.error("Transaction already in progress.");
      return false;
    }

    console.log(`Starting transaction with key ${transaction.key} and value ${transaction.value}`);

    try {
      // Phase 1: Prepare phase - ask all replicas to prepare to commit changes
      this.transactionInProgress = true;
      const prepareResponses = await Promise.all(
        this.replicas.map((replica) => this.sendPrepare(replica, transaction))
      );

      if (prepareResponses.some((response) => !response)) {
        console.error("One or more replicas failed to prepare.");
        return false;
      }

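      // Phase 2: Commit phase - ask all replicas to commit changes
      const commitResponses = await Promise.all(
        this.replicas.map((replica) => this.sendCommit(replica, transaction))
      );

      if (commitResponses.some((response) => !response)) {
        console.error("One or more replicas failed to commit.");
        // Every replica has already voted to commit, so a real coordinator would
        // retry the commit until all replicas acknowledge it. This demo only
        // reports the failure, which can leave the replicas inconsistent.
        return false;
      }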

      console.log(`Transaction committed with key ${transaction.key} and value ${transaction.value}`);
      return true;
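    } catch (error) {
      // `error` is typed `unknown` under strict settings, so narrow it before use
      const message = error instanceof Error ? error.message : String(error);
      console.error(`Error during transaction: ${message}`);
      return false;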
    } finally {
      this.transactionInProgress = false;
    }
  }

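  private async sendPrepare(replica: Replica, transaction: Transaction): Promise<boolean> {
    // Log the replica's position in the list; interpolating the object prints "[object Object]"
    const replicaIndex = this.replicas.indexOf(replica);
    console.log(`Preparing replica ${replicaIndex}`);
    // Simulate network delay
    await new Promise((resolve) => setTimeout(resolve, Math.random() * 1000));
    // Simulated vote: the replica agrees if the write is well-formed. A real participant
    // would also lock the key and durably record the pending write before voting yes.
    // (Requiring the key to exist already would reject every write to a new key.)
    const canCommit = typeof transaction.key === "string" && typeof transaction.value === "string";
    if (canCommit) {
      console.log(`Replica ${replicaIndex} prepared successfully`);
      return true;
    } else {
      console.log(`Replica ${replicaIndex} failed to prepare`);
      return false;
    }
  }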

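  private async sendCommit(replica: Replica, transaction: Transaction): Promise<boolean> {
    // Log the replica's position in the list; interpolating the object prints "[object Object]"
    const replicaIndex = this.replicas.indexOf(replica);
    console.log(`Committing to replica ${replicaIndex}`);
    // Simulate network delay
    await new Promise((resolve) => setTimeout(resolve, Math.random() * 1000));
    // Apply the write to this replica's copy of the data
    replica.set(transaction.key, transaction.value);
    console.log(`Replica ${replicaIndex} committed successfully`);
    return true;
  }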
}

In the prepare phase, the Coordinator object sends a prepare message to each replica, and waits for a response. If any replica fails to prepare successfully, the transaction is aborted and an error response is returned.

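In the commit phase, the Coordinator object sends a commit message to each replica and waits for a response. If any replica fails to commit, this example logs an error and returns a failure response. In a full two-phase commit implementation the decision to commit is final once every replica has voted yes, so the coordinator would keep retrying until all replicas acknowledge the commit rather than abort.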

To simulate the network delay between replicas, the sendPrepare() and sendCommit() methods each contain a call to setTimeout() with a random delay.

Finally, we set up an Express app with a single endpoint /transaction that calls the beginTransaction() method on the Coordinator object.

import express from "express";

interface Transaction {
  key: string;
  value: string;
}

const replica1 = new Replica();
const replica2 = new Replica();
const coordinator = new Coordinator([replica1, replica2]);

const app = express();
app.use(express.json());

app.post("/transaction", async (req, res) => {
  const transaction = req.body as Transaction;
  const result = await coordinator.beginTransaction(transaction);
  if (result) {
    res.sendStatus(200);
  } else {
    res.sendStatus(500);
  }
});

app.listen(3000, () => {
  console.log("Server started on port 3000");
});

With this implementation, clients can initiate a transaction by sending a POST request to the /transaction endpoint. The Coordinator object will then coordinate the two-phase commit protocol between the replicas, ensuring that any updates to the database are synchronized across both replicas.

Some Real-world applications of synchronisation

The role of synchronization in maintaining consistency is evident across a broad spectrum of real-world applications:

Databases: Transaction management relies on synchronization to maintain the integrity of data and prevent conflicts between concurrent transactions.
Operating Systems: To ensure fair and effective resource utilisation, process scheduling, memory management, and resource sharing all need synchronisation.
Parallel Programming: Multi-threaded programs running on multi-core processors use synchronisation to coordinate the execution of their threads, making sure that data is shared correctly and preventing race conditions.
Distributed Systems: Systems with multiple nodes need synchronisation to handle data replication, maintain consistency, and prevent conflicts.

Conclusion

In the complicated and interconnected world of computing, synchronisation is a key part of keeping systems correct, consistent, and able to work together. By managing how different parts communicate and share resources, well-designed synchronisation mechanisms prevent race conditions and data inconsistencies and avoid deadlocks. As technology improves and systems grow more complex, a solid understanding of synchronisation remains essential for building reliable, high-performance systems that give accurate results.