Every team working with distributed systems eventually has to deal with distributed locking. Every system we build can fail, and that risk is heightened when operating in a time of war. Without deliberate design, there are many ways for software to become trapped in an endless cycle of failure detection and emergency recovery.
What is Distributed Locking?
Distributed locking is the process of managing access to a single resource that is shared by many clients across a whole system. It is a form of mutual exclusion: the clients cooperate so that only one of them holds the resource, and only one operation on it takes place, at any given time. A common analogy is a single shared locker: several people want to use it, and they must arrange among themselves who gets it and when. In a distributed system the locker could be a database or any other shared resource. Distributed locking applies whenever a distributed system has to coordinate access to such a resource.
The Locker Problem in Distributed Systems
Consider Alice and Bob, two people who both need the single locker available at a train station. They must coordinate with one another to use it. There are two possible outcomes:
- One of them gets the locker first, and the other either fails or waits. Who wins may be as arbitrary as a coin flip. If the loser has no way to learn when the locker becomes free, they may end up waiting forever.
- Both of them get the locker at once. If they arrive at the same moment and neither checks for the other, or if the locker itself is faulty, both can walk away believing the locker is theirs.
The same outcomes occur in distributed systems. You can add as many machines as you want, but the shared resource can still only be held by one client at a time.
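The first outcome, where exactly one person wins and the other fails, can be sketched in a single process. This is only an illustration: the `Locker` class and the `race` helper are hypothetical names, and a local `threading.Lock` stands in for whatever atomic operation a real distributed system would provide.

```python
import threading


class Locker:
    """One shared locker; at most one person may hold it at a time."""

    def __init__(self):
        self._guard = threading.Lock()  # makes "check, then claim" atomic
        self._holder = None

    def try_claim(self, person):
        """Claim the locker if it is free. Returns True on success."""
        with self._guard:
            if self._holder is None:
                self._holder = person
                return True
            return False

    def release(self, person):
        """Free the locker, but only if `person` actually holds it."""
        with self._guard:
            if self._holder != person:
                return False
            self._holder = None
            return True


def race(locker, people):
    """Let everyone try to claim at once; return the list of winners."""
    winners = []
    start = threading.Barrier(len(people))

    def attempt(person):
        start.wait()  # line everyone up so the attempts overlap
        if locker.try_claim(person):
            winners.append(person)

    threads = [threading.Thread(target=attempt, args=(p,)) for p in people]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return winners
```

Calling `race(Locker(), ["alice", "bob"])` returns a list with exactly one name, though which one is not predictable. The second outcome, where both get the locker, is what happens if the check and the claim are not performed as one atomic step.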
Distributed Lock Strategies
The locker problem is solved using a wide range of techniques in distributed systems. Two simultaneous operations do not literally mean two users reaching for the same locker. They mean two clients creating or updating the same resource at the same time. The resource could be a database row, a message queue, or anything else that exists independently of the operations acting on it. To keep the operations from racing and corrupting the resource, the system must make one of them wait until the other has finished.
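To make the danger concrete, here is a small sketch of a lost update and of the same update done under a lock. The `Row` class and both function names are invented for illustration, and the "clients" are local threads rather than separate machines.

```python
import threading


class Row:
    """Stand-in for a shared resource such as a database row."""

    def __init__(self):
        self.value = 0


def interleaved_update(row):
    """Two clients read before either writes, so one update is lost."""
    seen_by_a = row.value      # client A reads 0
    seen_by_b = row.value      # client B reads 0
    row.value = seen_by_a + 1  # A writes 1
    row.value = seen_by_b + 1  # B also writes 1, overwriting A
    return row.value


def serialized_update(row, clients=4, updates_each=1000):
    """Each read-modify-write runs under a lock, so none are lost."""
    lock = threading.Lock()

    def work():
        for _ in range(updates_each):
            with lock:  # the other clients wait here
                row.value = row.value + 1

    threads = [threading.Thread(target=work) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return row.value
```

In the interleaved version two increments leave the value at 1. In the serialized version every increment is counted, because each client waits for the previous one to finish.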
The main varieties of distributed lock strategies are:
- Relying on Shared Nothingness and Good Timing
- Distributed Lockers
- Allowing for Merging of Conflicting Operations
Relying on Shared Nothingness and Good Timing
In many distributed systems, the solution to a distributed locking problem is as simple as waiting for one operation to finish before starting another. This is often described as a "wait-and-see" strategy: an operation that finds the resource busy pauses for a while and then tries again. The pattern is common around databases like Postgres, caches like Redis, and message queuing systems like RabbitMQ and Kafka. It works best in a shared-nothing design, where nodes do not share memory or storage and every operation is built to finish as quickly as possible, so the current holder is rarely busy for long. When two operations compete for a lock, one proceeds and the other simply waits for it to finish.
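A wait-and-see client can be sketched as a retry loop with a growing, randomised pause. The function name `acquire_with_retry` and its parameters are assumptions made for this example. The `try_acquire` callable stands for whatever non-blocking "give me the lock" call your system offers.

```python
import random
import time


def acquire_with_retry(try_acquire, max_attempts=5, base_delay=0.05,
                       sleep=time.sleep):
    """Try to take a lock, pausing between attempts.

    `try_acquire` is any callable returning True when the lock was taken.
    The pause is a random amount up to base_delay * 2**attempt, so that
    competing clients do not all retry at the same instant.
    """
    for attempt in range(max_attempts):
        if try_acquire():
            return True
        if attempt < max_attempts - 1:  # no point sleeping after the last try
            cap = base_delay * (2 ** attempt)
            sleep(random.uniform(0, cap))
    return False
```

The `sleep` parameter exists only so the pause can be replaced in tests. Giving up after `max_attempts` is a design choice: it avoids the "wait an eternity" outcome, at the cost of the caller having to handle failure.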
Distributed Lockers
Distributed lockers are centralised services in charge of creating, locking, and unlocking shared resources. They are frequently used to give a variety of applications basic distributed locking capabilities.
- The Network Information Service (NIS), introduced by Sun Microsystems in the mid-1980s as a common way to distribute user and group information across a network, is an early example of the centralised-service model. NIS was a directory service rather than a lock manager, but it established the pattern of many programs on many kinds of systems consulting one central service for shared state.
- ZooKeeper is a more direct example. It is a centralised coordination service that exposes a hierarchical, filesystem-like namespace of nodes called znodes. Locking is commonly implemented on top of it as a recipe using ephemeral sequential znodes, and over time ZooKeeper became a widely used building block for distributed locks.
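The idea of a central locker service can be sketched in memory as follows. This is not how NIS or ZooKeeper work internally. The `LockService` class, its method names, the lease duration (`ttl`), and the increasing token are all assumptions chosen for illustration. The lease is there so that a client that crashes does not hold the resource forever.

```python
import itertools
import threading
import time


class LockService:
    """Toy central lock service with expiring leases (illustrative only)."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._guard = threading.Lock()
        self._leases = {}                 # resource -> (owner, expires_at, token)
        self._tokens = itertools.count(1)  # ever-increasing lease numbers

    def acquire(self, resource, owner, ttl):
        """Return a token if the lease was granted, or None if it is held."""
        now = self._clock()
        with self._guard:
            lease = self._leases.get(resource)
            if lease is not None and lease[1] > now:
                return None  # someone else holds an unexpired lease
            token = next(self._tokens)
            self._leases[resource] = (owner, now + ttl, token)
            return token

    def release(self, resource, token):
        """Release only if `token` matches the current lease."""
        with self._guard:
            lease = self._leases.get(resource)
            if lease is None or lease[2] != token:
                return False  # stale or unknown token: ignore
            del self._leases[resource]
            return True
```

Requiring the token on release means a client whose lease has already expired cannot accidentally unlock a resource that a later client now holds.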
Allowing for Merging of Conflicting Operations
When two operations compete for the same lock, you can sometimes make use of the conflict instead of simply rejecting one side. The system detects the conflict, lets one operation finish, and then uses its result to produce a new state of the resource that both operations rely on. A "take-turns" mechanism is a common name for this kind of technique. One example is a lock that uses a shared counter to decide who gets the lock next. Log-based systems such as Kafka apply a related idea: each record appended to a partition receives the next sequential offset, so concurrent writes are ordered rather than rejected. In a counter-based lock, while one operation is writing a log entry, each of the other operations takes the next value of the counter as its place in line. An operation whose turn has not yet come waits and checks again. When the current holder finishes, the counter that tracks whose turn it is moves on, and the next operation in line is notified that it may proceed.
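One well-known way to build a counter-based take-turns lock is a ticket lock, sketched here with local threads. It uses two counters instead of the single counter described above, which is a substitution made for clarity. The class and method names are hypothetical, and a distributed version would keep the counters in a shared store.

```python
import threading


class TicketLock:
    """Take-turns lock: holders are served in the order they took tickets."""

    def __init__(self):
        self._cond = threading.Condition()
        self._next_ticket = 0  # next number to hand out
        self._now_serving = 0  # ticket currently allowed to proceed

    def take_ticket(self):
        with self._cond:
            ticket = self._next_ticket
            self._next_ticket += 1
            return ticket

    def is_turn(self, ticket):
        with self._cond:
            return ticket == self._now_serving

    def wait_turn(self, ticket):
        with self._cond:
            while ticket != self._now_serving:
                self._cond.wait()  # sleep until a release notifies us

    def release(self):
        with self._cond:
            self._now_serving += 1
            self._cond.notify_all()  # tell waiters to re-check their turn


def run_in_ticket_order(count):
    """Start workers in reverse order; they still run in ticket order."""
    lock = TicketLock()
    order = []

    def work(ticket):
        lock.wait_turn(ticket)
        order.append(ticket)  # critical section
        lock.release()

    tickets = [lock.take_ticket() for _ in range(count)]
    threads = [threading.Thread(target=work, args=(t,))
               for t in reversed(tickets)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return order
```

Because each worker proceeds only when its own ticket is being served, the order of the critical sections follows the order in which tickets were taken, not the order in which threads happened to start.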
Distributed locking is one of the most important problems in distributed systems, which bring their own set of difficulties. It can be addressed with a variety of strategies, each with its own trade-offs. Because the locker problem is particularly hard to solve in a distributed setting, many distributed locking solutions have been developed over time. The most important thing to keep in mind is that distributed locking is not only a problem but also an opportunity for innovative solutions.