<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0"><channel><title><![CDATA[Backend Engineering w/Sofwan]]></title><description><![CDATA[A dedicated, experienced, and versatile Backend Engineer with a strong engineering background with a Bachelor's degree focused in Computer Science (Education) f]]></description><link>https://blog.sofwancoder.com</link><image><url>https://cdn.hashnode.com/res/hashnode/image/upload/v1660112579313/2jmZOt6W9.png</url><title>Backend Engineering w/Sofwan</title><link>https://blog.sofwancoder.com</link></image><generator>RSS for Node</generator><lastBuildDate>Tue, 14 Apr 2026 14:56:00 GMT</lastBuildDate><atom:link href="https://blog.sofwancoder.com/rss.xml" rel="self" type="application/rss+xml"/><language><![CDATA[en]]></language><ttl>60</ttl><item><title><![CDATA[Protocols: Nothing Works Without Rules]]></title><description><![CDATA[If you've spent any time in the trenches of software engineering, especially wrestling with distributed systems or large-scale architectures, you know that complexity is the name of the game. We build layers upon layers, abstractions over abstraction...]]></description><link>https://blog.sofwancoder.com/protocols-nothing-works-without-rules</link><guid isPermaLink="true">https://blog.sofwancoder.com/protocols-nothing-works-without-rules</guid><category><![CDATA[engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[internet]]></category><category><![CDATA[Kernel]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Sofwan A. 
Lawal]]></dc:creator><pubDate>Sun, 20 Apr 2025 19:19:47 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1745161873332/67641e82-6dc7-4a52-821f-67f8be590d52.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>If you've spent any time in the trenches of software engineering, especially wrestling with distributed systems or large-scale architectures, you know that complexity is the name of the game. We build layers upon layers, abstractions over abstractions, trying to tame the beast. But have you ever stopped to think about the <em>absolute bedrock</em> upon which all this complexity rests?</p>
<p>It’s not design patterns, not fancy frameworks or syntactic sugar. It’s something much more fundamental, almost invisible, yet utterly pervasive: <strong>Protocols</strong>, good old rules. They are really just those invisible boundaries and structures that define how things should work, and more importantly, how they shouldn’t.</p>
<p>Lately, I’ve been reflecting a lot on the core of our discipline, and I’ve come to a conclusion that feels both obvious and philosophical at the same time: <strong>everything is protocol</strong>. Strip away all the abstraction, peel back the layers of your clean architecture, and what you’ll find underneath, if it’s well-engineered, is a set of protocols: defined, agreed-upon ways of interaction.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>Protocols aren't just about <code>HTTP</code> requests or <code>TCP</code> handshakes; they are the fundamental rule sets that govern <em>any</em> interaction, from the subatomic level to sprawling global networks. They are the source of immense power, enabling collaboration and complexity (the "Good"), but also the origin of frustrating constraints, baffling bugs, and security nightmares (the "Evil"). They are, quite literally, how <em>anything</em> is built on <em>anything</em>.</p>
<p>At the simplest level, a protocol is a rule or set of rules that define how entities communicate and interact. In software, that could be as high-level as HTTP or as low-level as TCP/IP. But this idea goes beyond networking.</p>
<p>Consider object-oriented programming, what is an interface if not a protocol? A promise: “<strong>Any class that implements this will behave this way.</strong>” Same thing with APIs. Same thing with serialization formats. Protocols are everywhere. They are not optional, and they are not “nice to haves”. They are the very things that enable systems to be built on top of one another.</p>
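<p>To make that concrete, here is a minimal Rust sketch (the trait and names are mine, purely illustrative) of an interface acting as a protocol: callers depend only on the promised behaviour, never on the concrete type.</p>

```rust
// A trait is a protocol in miniature: a promise that any implementor
// will respond to `greet` in a defined way.
trait Greeter {
    fn greet(&self, name: &str) -> String;
}

struct Formal;

impl Greeter for Formal {
    fn greet(&self, name: &str) -> String {
        format!("Good day, {}.", name)
    }
}

// Callers depend only on the contract, never on the concrete type.
fn welcome(g: &dyn Greeter, name: &str) -> String {
    g.greet(name)
}

fn main() {
    println!("{}", welcome(&Formal, "Sofwan")); // Good day, Sofwan.
}
```

<p>Swap in any other implementor of <code>Greeter</code> and <code>welcome</code> keeps working; that is the whole point of the contract.</p>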
<h2 id="heading-what-are-we-really-talking-about-when-we-say-protocol">What Are We <em>Really</em> Talking About When We Say "Protocol"?</h2>
<p>Forget RFCs and formal definitions for a second. At its heart, a protocol is simply an <strong>agreement</strong>. It's a set of rules that defines how two or more entities will interact. Think about the simplest human interaction: a handshake.  </p>
<ol>
<li><p><strong>Initiation:</strong> One person extends their hand.</p>
</li>
<li><p><strong>Syntax:</strong> The hand is usually open, palm facing inwards or slightly up.</p>
</li>
<li><p><strong>Response:</strong> The other person mirrors the action, extending their own hand.</p>
</li>
<li><p><strong>Semantics:</strong> The hands clasp. This signifies greeting, agreement, or farewell.  </p>
</li>
<li><p><strong>Action:</strong> A brief shake (the timing and pressure are also subtle parts of the protocol!).</p>
</li>
<li><p><strong>Termination:</strong> The hands release.</p>
</li>
</ol>
<p>Break any of these implicit rules, and the interaction feels off, fails, or conveys a different message entirely. Offer a closed fist, hold on too long, use the wrong hand in some cultures – you've violated the protocol.</p>
<p>This simple example highlights the core components of <em>any</em> protocol, whether social or technical:</p>
<ul>
<li><p><strong>Syntax:</strong> The structure or format of the messages/actions (e.g., the layout of bits in a network packet, the required fields in a JSON payload, the posture of a handshake).</p>
</li>
<li><p><strong>Semantics:</strong> The meaning of the messages/actions (e.g., <code>SYN</code> means "I want to connect," <code>200 OK</code> means "Request successful," a clasped hand means "Greeting acknowledged").</p>
</li>
<li><p><strong>Timing/Ordering:</strong> When messages/actions should happen and in what sequence (e.g., you must send a <code>SYN</code> before an <code>ACK</code> in TCP, you offer your hand <em>before</em> shaking).</p>
</li>
</ul>
<p>Without these agreed-upon rules, communication and interaction descend into chaos. Nothing gets built. Nothing functions.</p>
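<p>The handshake’s syntax, semantics, and ordering can even be sketched as a tiny state machine (a toy model of my own, not any formal spec), where every out-of-order action is a protocol violation:</p>

```rust
// A toy "handshake protocol" encoded as a state machine. Ordering is
// enforced: you cannot shake before both hands are extended.
#[derive(Debug, PartialEq)]
enum Handshake {
    Idle,
    Offered, // one person extends a hand (Initiation + Syntax)
    Clasped, // the other mirrors it (Response + Semantics)
    Done,    // a brief shake, then release (Action + Termination)
}

#[derive(Debug)]
enum Event {
    Offer,
    Mirror,
    ShakeAndRelease,
}

// Advance the state machine; any out-of-order event is a violation.
fn step(state: Handshake, event: Event) -> Result<Handshake, String> {
    use Handshake::*;
    match (state, event) {
        (Idle, Event::Offer) => Ok(Offered),
        (Offered, Event::Mirror) => Ok(Clasped),
        (Clasped, Event::ShakeAndRelease) => Ok(Done),
        (s, e) => Err(format!("protocol violation: {:?} in state {:?}", e, s)),
    }
}

fn main() {
    let mut state = Handshake::Idle;
    for event in [Event::Offer, Event::Mirror, Event::ShakeAndRelease] {
        state = step(state, event).expect("legal sequence");
    }
    println!("final state: {:?}", state); // Done

    // Breaking the ordering rule fails immediately:
    println!("{:?}", step(Handshake::Idle, Event::ShakeAndRelease));
}
```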
<p><a target="_blank" href="https://blog.sofwancoder.com/http-on-tcp-stateless-protocol-on-the-internets-stateful-network">Read more on how Stateless HTTP protocol was built on TCP here</a>.</p>
<h2 id="heading-what-are-these-rules">What are these rules?</h2>
<p>These can be low-level like TCP, or application-level like gRPC or GraphQL. You can even think of REST conventions or Kafka message schemas as protocols.</p>
<p>Take this example of a client-server interaction over HTTP:</p>
<pre><code class="lang-plaintext">GET /api/users HTTP/1.1
Host: example.com
Accept: application/json
Authorization: Bearer token123
</code></pre>
<p>If the server <strong>doesn’t follow the HTTP protocol</strong>, and maybe responds with a malformed header, your well-behaved client might crash or throw a cryptic error. The contract is broken.</p>
<p>Protocols are beautiful because they create predictability. Predictability means stability, and in a large system, that’s gold.</p>
<p>But protocols are also strict. They don’t care about your business logic or your fancy framework. If the data doesn’t conform, the request is rejected. If a handshake isn’t done right, the connection dies. If the quorum isn’t met, no consensus is reached. It’s brutal, but it’s necessary.</p>
<p>I’ve seen well-meaning developers treat protocols as if they were guidelines. That’s a mistake. <strong>Protocols aren’t guidelines. They’re contracts. Break them, and the consequences range from silent failures to catastrophic outages.</strong></p>
<hr />
<p>For example, consider the following rust code which expects a json response (contract):</p>
<pre><code class="lang-rust"><span class="hljs-keyword">use</span> reqwest::header::ACCEPT;
<span class="hljs-keyword">use</span> std::error::Error;
<span class="hljs-meta">#[tokio::main]</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">fn</span> <span class="hljs-title">main</span></span>() -&gt; <span class="hljs-built_in">Result</span>&lt;(), <span class="hljs-built_in">Box</span>&lt;<span class="hljs-keyword">dyn</span> Error&gt;&gt; {
    <span class="hljs-keyword">let</span> client = reqwest::Client::new();
    <span class="hljs-keyword">let</span> res = client
        .get(<span class="hljs-string">"https://malformed.example.com/api/user/42"</span>)
        .header(ACCEPT, <span class="hljs-string">"application/json"</span>)
        .send()
        .<span class="hljs-keyword">await</span>;
    <span class="hljs-keyword">match</span> res {
        <span class="hljs-literal">Ok</span>(response) =&gt; {
            <span class="hljs-keyword">if</span> response.status().is_success() {
                <span class="hljs-keyword">let</span> json = response.json::&lt;serde_json::Value&gt;().<span class="hljs-keyword">await</span>;
                <span class="hljs-keyword">match</span> json {
                    <span class="hljs-literal">Ok</span>(data) =&gt; <span class="hljs-built_in">println!</span>(<span class="hljs-string">"User profile: {:#?}"</span>, data),
                    <span class="hljs-literal">Err</span>(e) =&gt; eprintln!(<span class="hljs-string">"⚠️  Failed to parse JSON: {}"</span>, e),
                }
            } <span class="hljs-keyword">else</span> {
                eprintln!(<span class="hljs-string">"❌ HTTP error: {}"</span>, response.status());
            }
        }
        <span class="hljs-literal">Err</span>(e) =&gt; {
            eprintln!(<span class="hljs-string">"🚨 Protocol error: {}"</span>, e);
            <span class="hljs-comment">// e.g. "error reading response headers: invalid HTTP header"</span>
        }
    }
    <span class="hljs-literal">Ok</span>(())
}
</code></pre>
<p>This code looks innocent and robust. You handle errors based on status codes and expect a clean JSON body.</p>
<p>But here’s what could <em>break</em> everything:</p>
<p><strong>Scenario:</strong></p>
<p>The server responds like this:</p>
<pre><code class="lang-plaintext">HTTP/1.1 200 OK
Content-Type: application/json
Content-Length: 37
X-Custom-Header

{ "name": "Sofwan", "role": "Admin" }
</code></pre>
<p>Notice that malformed header? <code>X-Custom-Header</code> is missing a value.</p>
<p><strong>What happens?</strong></p>
<ul>
<li><p>The HTTP parser chokes.</p>
</li>
<li><p><code>reqwest</code> surfaces a low-level protocol error such as <code>error reading response headers: invalid HTTP header</code> before any JSON parsing ever runs.</p>
</li>
<li><p>Your graceful business logic never gets a chance to run.</p>
</li>
</ul>
<hr />
<p><strong>Lesson:</strong></p>
<ul>
<li><p>Your <strong>client</strong> followed the rules.</p>
</li>
<li><p>Your <strong>code</strong> was clean.</p>
</li>
<li><p>But the <strong>server</strong> broke the protocol contract.</p>
</li>
</ul>
<p>And the protocol doesn’t bend. There’s no “maybe” or “almost correct.” It either complies or it fails—hard.</p>
<hr />
<p>One time, in a project where we were building a real-time event pipeline, we switched from one message broker to another, both supporting the same messaging semantics. Except, as it turned out, one of them was more strict about message acknowledgment order. That slight difference in protocol adherence exposed a lurking race condition that had been hiding in our consumer logic for weeks. We only found out because production messages started disappearing. Just like that.</p>
<h2 id="heading-protocols-in-engineering-the-lifeblood-of-distributed-systems">Protocols in Engineering: The Lifeblood of Distributed Systems</h2>
<p>Nowhere is the power and peril of protocols more evident than in distributed computing. When you have multiple machines, potentially separated by unreliable networks, trying to coordinate and achieve a common goal, unambiguous, robust protocols are not just helpful; they are <strong>essential</strong>.</p>
<ul>
<li><p><strong>Consensus Protocols (Raft, Paxos):</strong> How do multiple nodes agree on a value or a state transition, even if some nodes crash or messages get lost? These protocols are incredibly intricate sets of rules defining message types (AppendEntries, RequestVote), state transitions, and leader election logic. Get the protocol implementation slightly wrong, and you get split-brain scenarios, data corruption, or total system unavailability.<br />  Protocols like Raft are designed to be <em>unambiguous</em>, precisely because ambiguity in distributed consensus leads to those failures. For example:</p>
<pre><code class="lang-rust">  <span class="hljs-keyword">if</span> <span class="hljs-keyword">self</span>.state == Follower &amp;&amp; election_timeout_expired() {
      <span class="hljs-keyword">self</span>.state = Candidate;
      <span class="hljs-keyword">self</span>.current_term += <span class="hljs-number">1</span>;
      <span class="hljs-keyword">self</span>.votes = <span class="hljs-number">1</span>; <span class="hljs-comment">// vote for self</span>
      broadcast_vote_request();
  }
</code></pre>
<p>  This is not a simple if-statement. It’s part of a carefully defined <strong>state machine</strong>, where every node must transition the same way for the cluster to remain consistent.</p>
</li>
<li><p><strong>Replication Protocols:</strong> How does a primary database node ensure its replicas have the same data? Synchronous, asynchronous, semi-synchronous replication – these are all protocols defining the interaction and guarantees between the primary and its followers.</p>
</li>
<li><p><strong>Messaging Protocols (AMQP, MQTT, Kafka Protocol):</strong> How do producers send messages and consumers receive them reliably and efficiently via a broker? These protocols define message formats, delivery guarantees (at-least-once, at-most-once, exactly-once – each a different protocol!), acknowledgments, and topic/queue semantics.</p>
</li>
<li><p><strong>Remote Procedure Call (RPC) Protocols (gRPC, Thrift):</strong> How does one service invoke a function on another service across a network as if it were local? These involve protocols for serialization (Protocol Buffers, Avro – protocols themselves!), request/response mapping, error handling, and connection management.</p>
</li>
<li><p><strong>API Contracts (REST, GraphQL):</strong> While often seen as architectural styles, the specific way you structure your URLs, use HTTP verbs, format your JSON/GraphQL queries and responses <em>is</em> a protocol between your frontend and backend, or between microservices. A poorly defined or inconsistently implemented API protocol leads to endless integration headaches.</p>
</li>
</ul>
<p>In distributed systems, protocols are the invisible threads holding everything together over a chasm of network latency and potential failures. You cannot fake it. <strong>You cannot fake correctness</strong>. Either your protocols are sound, or your system is broken. They <em>are</em> the system, in many ways.</p>
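<p>To illustrate the "carefully defined state machine" point above, here is a heavily simplified, runnable sketch of Raft-style election bookkeeping (my own toy model; it omits term comparison on incoming messages, log checks, and everything else the real protocol mandates):</p>

```rust
// Simplified sketch of Raft-style leader election bookkeeping.
// Illustrative only: real Raft also validates terms, log indices, etc.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Role {
    Follower,
    Candidate,
    Leader,
}

struct Node {
    role: Role,
    current_term: u64,
    votes: usize,
    cluster_size: usize,
}

impl Node {
    // On election timeout, a follower becomes a candidate,
    // increments its term, and votes for itself.
    fn on_election_timeout(&mut self) {
        if self.role == Role::Follower {
            self.role = Role::Candidate;
            self.current_term += 1;
            self.votes = 1; // vote for self
        }
    }

    // Each granted RequestVote adds a vote; a strict majority
    // (the quorum) promotes the candidate to leader.
    fn on_vote_granted(&mut self) {
        if self.role == Role::Candidate {
            self.votes += 1;
            if self.votes > self.cluster_size / 2 {
                self.role = Role::Leader;
            }
        }
    }
}

fn main() {
    let mut node = Node { role: Role::Follower, current_term: 3, votes: 0, cluster_size: 5 };
    node.on_election_timeout();
    println!("{:?}, term {}", node.role, node.current_term); // Candidate, term 4
    node.on_vote_granted(); // 2 votes
    node.on_vote_granted(); // 3 of 5: majority reached
    println!("{:?}", node.role); // Leader
}
```

<p>Note how every transition is conditional on the current state: if two implementations disagree on even one of these conditions, the cluster's nodes diverge.</p>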
<h2 id="heading-the-grand-tapestry-anything-built-on-anything">The Grand Tapestry: Anything Built on Anything</h2>
<p>It’s almost poetic, right? But it’s also very literal in software engineering. This is where the magic, and sometimes the madness, truly lies. Our world, both natural and artificial, <strong>is a stack of protocols.</strong> It’s no exaggeration to say that the <strong>internet itself is a stack of protocols</strong>, beautifully layered, <strong>each one building on the constraints and guarantees of the one below</strong>. That’s not an accident. That’s engineering.</p>
<p>Think about physics. The fundamental forces and particles interact according to strict rules (protocols). These rules allow atoms to form. The rules governing atomic interactions (chemistry protocols) allow molecules to form. Molecular interactions (biochemical protocols) allow cells to function. Cell interactions allow organisms. Organism interactions (social protocols, language) allow societies.  </p>
<p>It’s protocols all the way down.</p>
<p>Now, map this to our world of software engineering, using a simplified five-layer view of the OSI model:</p>
<ol>
<li><p><strong>Physical Layer:</strong> How voltages or light pulses represent bits on a wire or fiber. That's a protocol.  </p>
</li>
<li><p><strong>Data Link Layer:</strong> How bits are grouped into frames, how to detect errors, how to manage access to the physical medium (e.g., Ethernet protocol). Built <em>on</em> the physical layer protocol.</p>
</li>
<li><p><strong>Network Layer:</strong> How to route packets across multiple networks (e.g., IP protocol). Built <em>on</em> the data link layer protocol. It doesn't care if it's Ethernet or Wi-Fi underneath, as long as the lower layer adheres to <em>its</em> expected protocol.  </p>
</li>
<li><p><strong>Transport Layer:</strong> How to provide reliable (TCP) or unreliable (UDP) end-to-end communication, manage flow control, and segment data. Built <em>on</em> the network layer protocol.</p>
</li>
<li><p><strong>Application Layer:</strong> How specific applications communicate (e.g., HTTP for web, SMTP for email, gRPC for RPC). Built <em>on</em> the transport layer protocol.  </p>
</li>
</ol>
<p>Each layer relies on the guarantees provided by the layer below it, interacting with it through a well-defined protocol (an interface, essentially). It abstracts away the details of the lower layers, allowing engineers working on the Application Layer (like many of us) to think about application logic without worrying about voltage levels or frame collisions.</p>
<hr />
<p>This layering, enabled entirely by protocols, is the <em>only</em> reason we can build systems as complex as the modern internet or large-scale distributed databases. Imagine trying to write a web application if you had to manually manage packet routing and error correction for every single request! Protocols are the great abstraction enablers.</p>
<p>A message broker speaks AMQP or MQTT. Your backend talks JSON over HTTPS. Inside your services, gRPC messages dance over HTTP/2. Below all of that, it's TCP. Beneath that, IP. Below that, Ethernet. Each layer is built on a protocol, defined to the letter, specifying behaviour, expectations, constraints.</p>
<p>Systems can only interoperate if they speak the same language, and the language is defined by a protocol.</p>
<p>Take distributed systems for example. The moment you split a system across network boundaries, you’ve walked into a land ruled by protocols. Consistency? Availability? Partition tolerance? These CAP theorem elements are not abstract ideas, they manifest in how your nodes agree (<strong>consensus protocols</strong>), how they replicate (<strong>replication protocols</strong>), how they detect faults (<strong>heartbeat protocols</strong>), and how they recover (<strong>failure and healing protocols</strong>).</p>
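<p>A heartbeat protocol, for instance, can be sketched in a few lines (a toy timeout-based failure detector of my own; production systems use richer schemes such as phi-accrual detectors):</p>

```rust
use std::collections::HashMap;
use std::time::{Duration, Instant};

// Toy heartbeat-based failure detector: a peer is suspected dead if we
// have not heard from it within `timeout`. Illustrative only.
struct Detector {
    timeout: Duration,
    last_seen: HashMap<String, Instant>,
}

impl Detector {
    // Record a heartbeat received from `node` at time `now`.
    fn heartbeat(&mut self, node: &str, now: Instant) {
        self.last_seen.insert(node.to_string(), now);
    }

    // A node is suspected if its last heartbeat is too old,
    // or if we have never heard from it at all.
    fn suspected(&self, node: &str, now: Instant) -> bool {
        match self.last_seen.get(node) {
            Some(t) => now.duration_since(*t) > self.timeout,
            None => true,
        }
    }
}

fn main() {
    let start = Instant::now();
    let mut d = Detector { timeout: Duration::from_millis(500), last_seen: HashMap::new() };
    d.heartbeat("node-b", start);
    println!("{}", d.suspected("node-b", start + Duration::from_millis(100))); // false
    println!("{}", d.suspected("node-b", start + Duration::from_secs(2)));     // true
}
```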
<hr />
<p><strong>Take the following code snippet for example</strong></p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Application Layer – Developer's perspective</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">sendWelcomeEmail</span>(<span class="hljs-params">user: User</span>) </span>{
  <span class="hljs-keyword">const</span> message = {
    to: user.email,
    subject: <span class="hljs-string">"Welcome to Our Platform"</span>,
    body: <span class="hljs-string">`Hi <span class="hljs-subst">${user.name}</span>, thanks for joining us!`</span>
  };

  <span class="hljs-comment">// Message is serialized into JSON over HTTPS</span>
  <span class="hljs-keyword">await</span> fetch(<span class="hljs-string">"https://email-service.internal/api/send"</span>, {
    method: <span class="hljs-string">"POST"</span>,
    headers: {
      <span class="hljs-string">"Content-Type"</span>: <span class="hljs-string">"application/json"</span>,
      <span class="hljs-string">"Authorization"</span>: <span class="hljs-string">`Bearer <span class="hljs-subst">${process.env.API_TOKEN}</span>`</span>
    },
    body: <span class="hljs-built_in">JSON</span>.stringify(message)
  });
}
</code></pre>
<p>That looks simple, right? But here’s what’s actually happening underneath:</p>
<p><strong>HTTP Layer (Application → Transport)</strong></p>
<ul>
<li><p>The <code>fetch()</code> call constructs an HTTP request—following the <strong>HTTP/1.1 or HTTP/2 protocol</strong> spec.</p>
</li>
<li><p>Headers are formatted, body is encoded, request line is created.</p>
</li>
</ul>
<p><strong>TLS Layer (Transport → Network)</strong></p>
<ul>
<li>If HTTPS is used, TLS handles <strong>encryption</strong>, <strong>handshake</strong>, and <strong>certificate validation</strong>—following the <strong>TLS protocol</strong>.</li>
</ul>
<p><strong>TCP Layer</strong></p>
<ul>
<li>Below that, the request is chunked into packets and sent over <strong>TCP</strong>, which manages <strong>ordering</strong>, <strong>packet loss</strong>, and <strong>retry mechanisms</strong>.</li>
</ul>
<p><strong>IP Layer</strong></p>
<ul>
<li>TCP hands data to <strong>IP</strong>, which handles <strong>addressing</strong> and <strong>routing</strong> packets across networks.</li>
</ul>
<p><strong>Link Layer (Ethernet)</strong></p>
<ul>
<li>The network adapter frames the IP packets and sends them as <strong>electrical signals or photons</strong> using <strong>Ethernet</strong> or Wi-Fi protocols.  </li>
</ul>
<hr />
<p><strong>On the flip side;</strong></p>
<ul>
<li><p>If the <strong>TLS handshake fails</strong>, the entire request fails.</p>
</li>
<li><p>If <strong>TCP drops a packet</strong>, but retries work, the developer never notices.</p>
</li>
<li><p>If <strong>Ethernet collisions</strong> aren’t handled by the protocol, your entire application breaks.</p>
</li>
</ul>
<p>That’s the beauty of protocol layering: every layer <strong>abstracts away the horror</strong> of the layer beneath it, while <strong>strictly enforcing contracts</strong>.</p>
<p>And if any layer doesn’t speak the exact expected protocol? Miscommunication. Failure. Silence.</p>
<h2 id="heading-the-protocols-we-create">The Protocols We Create</h2>
<p>Protocols aren’t just things we consume, they’re also things we design. When you're designing a protocol, be it an internal API, an event contract, or a distributed consensus mechanism, you are shaping the <em>interface between people</em>.</p>
<p>You create a REST API? That’s a protocol. You publish a Kafka event format? That’s a protocol. You define a SQL schema? That’s a protocol between your code and the database.</p>
<p>Here’s a mini example of a <strong>custom internal protocol</strong> for idempotent request handling:</p>
<pre><code class="lang-plaintext">POST /api/charge HTTP/1.1
Idempotency-Key: a5c7efb4-91f4-11e5-bf7f-feff819cdc9f
Content-Type: application/json
</code></pre>
<p>The contract:</p>
<ul>
<li><p>If the same <code>Idempotency-Key</code> is received again, return the original result.</p>
</li>
<li><p>Must store response against the key for a period of time.</p>
</li>
<li><p>Must hash body to detect replay attacks.</p>
</li>
</ul>
<p>If your backend <em>ignores</em> any part of this protocol, duplicate charges could happen. Or worse, undetectable bugs.</p>
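<p>The contract above can be sketched as an in-memory store (illustrative only, with hypothetical names; a real service would persist keys with a TTL and use a cryptographic hash):</p>

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Possible outcomes of the idempotency contract.
enum Outcome {
    Fresh(String),    // first time this key was seen: processed and stored
    Replayed(String), // same key, same body: original response returned
    Conflict,         // same key, different body: reject (replay or bug)
}

// Non-cryptographic hash, for illustration only.
fn body_hash(body: &str) -> u64 {
    let mut h = DefaultHasher::new();
    body.hash(&mut h);
    h.finish()
}

struct IdempotencyStore {
    // key -> (hash of request body, stored response)
    seen: HashMap<String, (u64, String)>,
}

impl IdempotencyStore {
    fn handle(&mut self, key: &str, body: &str, process: impl Fn(&str) -> String) -> Outcome {
        let h = body_hash(body);
        match self.seen.get(key) {
            Some((stored, resp)) if *stored == h => Outcome::Replayed(resp.clone()),
            Some(_) => Outcome::Conflict,
            None => {
                let resp = process(body);
                self.seen.insert(key.to_string(), (h, resp.clone()));
                Outcome::Fresh(resp)
            }
        }
    }
}

fn main() {
    let mut store = IdempotencyStore { seen: HashMap::new() };
    let charge = |body: &str| format!("charged: {}", body);

    // A retry with the same key and body returns the stored response
    // instead of charging twice.
    for _ in 0..2 {
        match store.handle("key-123", r#"{ "amount": 100 }"#, &charge) {
            Outcome::Fresh(r) => println!("processed -> {}", r),
            Outcome::Replayed(r) => println!("replayed  -> {}", r),
            Outcome::Conflict => println!("conflict: same key, different body"),
        }
    }
}
```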
<p>Great engineers don’t just design systems. They design <strong>agreements</strong> that enable systems to survive change, growth, and human error. And that’s what protocols really are.</p>
<h2 id="heading-the-engineers-burden-crafting-better-rules">The Engineer's Burden: Crafting Better Rules</h2>
<p>As software engineers, particularly those working on distributed or foundational systems, we are often not just <em>consumers</em> of protocols, but also <em>designers</em>. Whether defining an API contract, creating an internal RPC mechanism, or developing a new distributed algorithm, we are crafting the rules of interaction.</p>
<p>This is a significant responsibility. A poorly designed protocol can inflict pain for years, hindering development, causing production issues, and limiting future evolution. Conversely, a clean, well-defined, extensible protocol is a gift to future developers (including our future selves).</p>
<p>What makes a "good" protocol?</p>
<ul>
<li><p><strong>Clarity &amp; Unambiguity:</strong> Leave no room for interpretation. Define states, transitions, message formats, and error conditions precisely.</p>
</li>
<li><p><strong>Simplicity (where possible):</strong> Favor simplicity unless complexity is truly justified by the requirements.</p>
</li>
<li><p><strong>Extensibility:</strong> Think about future evolution. How will you add features? How will you version the protocol? (e.g., using feature flags, well-defined version negotiation).</p>
</li>
<li><p><strong>Efficiency:</strong> Consider the performance implications – serialization overhead, number of round trips, etc.</p>
</li>
<li><p><strong>Robustness:</strong> Define how errors are handled. What happens if a message is lost, duplicated, or corrupted?</p>
</li>
<li><p><strong>Security:</strong> Build security considerations in from the start, don't bolt them on later.</p>
</li>
</ul>
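<p>Extensibility, for instance, often comes down to a simple negotiation rule. Here is a hedged sketch (the scheme and names are mine, not a standard) in which the client advertises the versions it speaks and the server picks the highest one it also supports:</p>

```rust
// Toy version negotiation: pick the highest protocol version that
// both client and server support, or fail the connection cleanly.
fn negotiate(client: &[u32], server: &[u32]) -> Option<u32> {
    client.iter().copied().filter(|v| server.contains(v)).max()
}

fn main() {
    println!("{:?}", negotiate(&[1, 2, 3], &[2, 3, 4])); // Some(3)
    println!("{:?}", negotiate(&[1], &[2, 3]));          // None: no common version
}
```

<p>The important part is the explicit <code>None</code> branch: a well-designed protocol fails loudly when no common version exists, rather than limping along in ambiguity.</p>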
<h2 id="heading-conclusion-masters-of-the-rules">Conclusion: Masters of the Rules</h2>
<p>Protocols are the invisible architecture of our connected world and our complex software systems. They are the embodiment of the "anything built on anything" principle, enabling layers of abstraction and interoperability that make modern technology feasible. They are the source of immense "Good," allowing systems to communicate, coordinate, and scale.</p>
<p>But they also carry the potential for "Evil" – the rigidity of legacy, the complexity that breeds bugs, the ambiguities that cause friction, and the vulnerabilities that expose us.</p>
<p>As engineers, understanding protocols isn't just about knowing TCP vs. UDP or REST vs. gRPC. It's about recognizing the fundamental role of agreed-upon rules in <em>any</em> system we build. It's about appreciating the trade-offs inherent in their design and striving to be thoughtful, meticulous creators of the rules that will govern the interactions within our own complex creations. Because ultimately, the quality of our systems often comes down to the quality of the protocols holding them together.</p>
]]></content:encoded></item><item><title><![CDATA[Distributed Systems: Consensus Protocols]]></title><description><![CDATA[In the realm of distributed systems, consensus protocols play an important role. They ensure that multiple, often geographically dispersed, components of a system agree on a single source of truth. This agreement is essential for maintaining data co...]]></description><link>https://blog.sofwancoder.com/distributed-systems-consensus-protocols</link><guid isPermaLink="true">https://blog.sofwancoder.com/distributed-systems-consensus-protocols</guid><category><![CDATA[distributed system]]></category><category><![CDATA[protocols]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Wed, 14 Aug 2024 22:54:17 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1723675935394/f2794b51-5827-433f-bc9c-c17833a8ac74.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In the realm of distributed systems, consensus protocols play an important role. They ensure that multiple, often geographically dispersed, components of a system agree on a single source of truth. This agreement is essential for maintaining data consistency, reliability, and overall system coherence. The challenge lies in achieving consensus efficiently and accurately in an environment where individual components may fail, messages might be delayed or lost, and malicious actors could exist. This article delves into the intricacies of consensus protocols, their importance, various types, and practical implementations in distributed systems.</p>
<h3 id="heading-the-importance-of-consensus-protocols">The Importance of Consensus Protocols</h3>
<p>Consensus protocols are foundational to the operation of distributed systems, which include databases, cloud services, blockchain technologies, and more. The primary reasons for their importance are:</p>
<ol>
<li><p><strong>Consistency and Reliability</strong>: Ensuring that all nodes in a system have a consistent view of data is important. Without consensus, different parts of the system could make contradictory decisions, leading to data corruption and unreliable operations.</p>
</li>
<li><p><strong>Fault Tolerance</strong>: Distributed systems must be resilient to failures, whether they are due to network issues, hardware malfunctions, or software bugs. Consensus protocols help the system continue functioning correctly even when some components fail.</p>
</li>
<li><p><strong>Coordination and Synchronization</strong>: In many distributed applications, nodes need to coordinate actions, like committing a transaction or updating a record. Consensus protocols provide the mechanism for this coordination.</p>
</li>
</ol>
<h3 id="heading-types-of-consensus-protocols">Types of Consensus Protocols</h3>
<p>Consensus protocols can be broadly categorised based on their approach to achieving agreement among distributed nodes. The main types include:</p>
<ol>
<li><p><strong>Classical Consensus Protocols</strong>:</p>
<ul>
<li><p><strong>Paxos</strong>: Developed by Leslie Lamport, Paxos is a family of protocols for solving consensus in a network of unreliable or asynchronous processors. It is renowned for its robustness and is widely used in practical implementations.</p>
</li>
<li><p><strong>Raft</strong>: Designed to be more understandable than Paxos, Raft achieves the same goals of consistency and fault-tolerance. Raft divides the consensus problem into leader election, log replication, and safety.</p>
</li>
</ul>
</li>
<li><p><strong>Blockchain-Based Consensus</strong>:</p>
<ul>
<li><p><strong>Proof of Work (PoW)</strong>: Used by Bitcoin, PoW requires participants (miners) to solve complex cryptographic puzzles to validate transactions and create new blocks. This method is energy-intensive but has proven effective in decentralized settings.</p>
</li>
<li><p><strong>Proof of Stake (PoS)</strong>: Instead of computational power, PoS relies on participants staking their own cryptocurrency to validate transactions. This method is more energy-efficient and is used by platforms like Ethereum 2.0.</p>
</li>
</ul>
</li>
<li><p><strong>Byzantine Fault Tolerant (BFT) Protocols</strong>:</p>
<ul>
<li><p><strong>PBFT (Practical Byzantine Fault Tolerance)</strong>: Designed to tolerate Byzantine faults, where nodes may act maliciously or unpredictably. PBFT is highly efficient in environments where nodes are assumed to be partially trusted.</p>
</li>
<li><p><strong>Tendermint</strong>: Used in various blockchain applications, Tendermint provides BFT consensus with fast finality and is designed to support high transaction throughput.</p>
</li>
</ul>
</li>
</ol>
<h3 id="heading-detailed-analysis-of-key-protocols">Detailed Analysis of Key Protocols</h3>
<h4 id="heading-paxos">Paxos</h4>
<p>Paxos is one of the most influential consensus protocols. It operates under the assumption that some nodes might fail or act asynchronously. Paxos consists of three roles: proposers, acceptors, and learners. The process involves multiple phases:</p>
<ol>
<li><p><strong>Prepare Phase</strong>: A proposer sends a prepare request with a proposal number to a quorum of acceptors. Acceptors respond with a promise not to accept proposals with a lower number and may include the last accepted proposal.</p>
</li>
<li><p><strong>Accept Phase</strong>: Once a majority of promises are received, the proposer sends an accept request with the proposal. An acceptor accepts the proposal unless it has already promised to honour a higher proposal number.</p>
</li>
<li><p><strong>Learn Phase</strong>: Once a proposal is accepted by a majority, the learners are informed about the chosen value, ensuring the entire system converges on this value.</p>
</li>
</ol>
<p>Paxos is highly fault-tolerant but can be complex to implement due to its multiple phases and requirements for quorum management.</p>
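<p>The acceptor side of the two phases above can be sketched as follows. This is a minimal, single-process illustration: the <code>Acceptor</code> class and its field names are hypothetical, not taken from any real Paxos implementation.</p>

```typescript
// Minimal sketch of a Paxos acceptor's state (illustrative only).
type Proposal = { number: number; value: string };

class Acceptor {
  private promisedNumber = -1;              // highest proposal number promised
  private accepted: Proposal | null = null; // last accepted proposal, if any

  // Phase 1: respond to a prepare request.
  prepare(n: number): { promised: boolean; accepted: Proposal | null } {
    if (n > this.promisedNumber) {
      this.promisedNumber = n;
      // The promise carries the last accepted proposal, if any, so the
      // proposer can adopt an already-chosen value.
      return { promised: true, accepted: this.accepted };
    }
    return { promised: false, accepted: null };
  }

  // Phase 2: respond to an accept request.
  accept(p: Proposal): boolean {
    // Accept unless we have already promised a higher-numbered proposal.
    if (p.number >= this.promisedNumber) {
      this.promisedNumber = p.number;
      this.accepted = p;
      return true;
    }
    return false;
  }
}
```

<p>A proposer that gathers promises from a majority of such acceptors before sending accept requests implements the quorum logic described above.</p>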
<h4 id="heading-raft">Raft</h4>
<p>Raft simplifies the consensus process by clearly defining roles and steps. It comprises three main components: leader election, log replication, and safety.</p>
<ol>
<li><p><strong>Leader Election</strong>: Nodes elect a leader who is responsible for managing the log replication. If a leader fails, a new one is elected.</p>
</li>
<li><p><strong>Log Replication</strong>: The leader receives log entries from clients and replicates them to follower nodes. Once a majority of followers acknowledge the log entries, they are committed and applied to the state machine.</p>
</li>
<li><p><strong>Safety</strong>: Raft ensures that once a log entry is committed, it remains committed and will be applied by all future leaders.</p>
</li>
</ol>
<p>Raft's structured approach makes it easier to understand and implement compared to Paxos, leading to its adoption in many modern distributed systems like etcd and Consul.</p>
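<p>The majority-commit rule at the heart of log replication can be sketched like this. The <code>RaftLeader</code> helper and its method names are hypothetical; this is not a full Raft implementation.</p>

```typescript
// Sketch of Raft's commit rule (illustrative only).
class RaftLeader {
  // log index -> set of follower ids that acknowledged the entry
  private acks = new Map<number, Set<string>>();

  constructor(private clusterSize: number) {}

  // Record a follower's acknowledgment of the entry at `index`.
  recordAck(index: number, followerId: string): void {
    if (!this.acks.has(index)) this.acks.set(index, new Set());
    this.acks.get(index)!.add(followerId);
  }

  // An entry is committed once a majority of the cluster stores it.
  // The leader itself counts as one replica, hence the +1.
  isCommitted(index: number): boolean {
    const followerAcks = this.acks.get(index)?.size ?? 0;
    return followerAcks + 1 > this.clusterSize / 2;
  }
}
```

<p>Using a <code>Set</code> per index means duplicate acknowledgments from the same follower are not double-counted.</p>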
<h4 id="heading-proof-of-work-and-proof-of-stake">Proof of Work and Proof of Stake</h4>
<p>In blockchain networks, consensus ensures the integrity and security of the decentralized ledger.</p>
<ol>
<li><p><strong>Proof of Work (PoW)</strong>: PoW requires participants to perform computational work to propose a new block. The process includes solving a cryptographic puzzle, which ensures that adding new blocks requires significant effort, deterring malicious actors. However, PoW is criticised for its high energy consumption.</p>
</li>
<li><p><strong>Proof of Stake (PoS)</strong>: PoS selects validators based on the number of coins they hold and are willing to "stake" as collateral. Validators are chosen randomly, and their probability of being selected is proportional to their stake. PoS is more energy-efficient and offers quicker finality than PoW.</p>
</li>
</ol>
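<p>A toy version of the PoW puzzle can be sketched as follows. This is illustrative only: real networks use binary difficulty targets and vastly more work, and the <code>mine</code> helper here is hypothetical.</p>

```typescript
import { createHash } from "crypto";

// Toy proof-of-work: find a nonce so that SHA-256(data + nonce)
// starts with `difficulty` hex zeros.
function mine(data: string, difficulty: number): { nonce: number; hash: string } {
  const target = "0".repeat(difficulty);
  let nonce = 0;
  for (;;) {
    const hash = createHash("sha256").update(data + nonce).digest("hex");
    if (hash.startsWith(target)) return { nonce, hash };
    nonce++;
  }
}
```

<p>Note the asymmetry PoW relies on: finding the nonce takes many hash attempts, but verifying it takes exactly one.</p>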
<h4 id="heading-byzantine-fault-tolerance-bfthttpsenwikipediaorgwikibyzantinefault"><a target="_blank" href="https://en.wikipedia.org/wiki/Byzantine_fault">Byzantine Fault Tolerance (BFT)</a></h4>
<p><a target="_blank" href="https://en.wikipedia.org/wiki/Byzantine_fault">BFT</a> protocols are designed to function correctly even if some nodes behave maliciously.</p>
<ol>
<li><p><a target="_blank" href="https://www.geeksforgeeks.org/practical-byzantine-fault-tolerancepbft/"><strong>Practical Byzantine Fault Tolerance (PBFT)</strong></a>: PBFT operates in a sequence of rounds, where a primary node proposes a value, and the other nodes (replicas) agree on this value through multiple rounds of voting. PBFT is designed for environments where the number of faulty nodes is less than one-third of the total nodes.</p>
</li>
<li><p><a target="_blank" href="https://cosmos-network.gitbooks.io/cosmos-academy/content/introduction-to-the-cosmos-ecosystem/tendermint-bft-consensus-algorithm.html"><strong>Tendermint</strong></a>: Tendermint uses a similar approach to PBFT but is optimised for blockchain applications. It offers quick finality and high transaction throughput, making it suitable for decentralized applications that require fast and secure consensus.</p>
</li>
</ol>
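<p>The fault threshold above can be expressed as a small helper: with n = 3f + 1 nodes, PBFT tolerates f Byzantine nodes and requires 2f + 1 matching votes. The function names are illustrative, not from a real PBFT library.</p>

```typescript
// PBFT tolerates f Byzantine nodes out of n = 3f + 1 total.
function maxFaulty(n: number): number {
  return Math.floor((n - 1) / 3);
}

// A decision is safe once 2f + 1 matching votes are collected.
function quorumSize(n: number): number {
  return 2 * maxFaulty(n) + 1;
}

function hasQuorum(n: number, matchingVotes: number): boolean {
  return matchingVotes >= quorumSize(n);
}
```

<p>For example, a four-node cluster tolerates one faulty node and needs three matching votes per round.</p>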
<h3 id="heading-considerations-for-implementing-consensus-protocols">Considerations for Implementing Consensus Protocols</h3>
<p>Implementing consensus protocols involves careful consideration of system requirements and constraints. Here are key aspects to consider:</p>
<ol>
<li><p><strong>Fault Tolerance and Network Assumptions</strong>: Different protocols are designed to handle different types of faults (e.g., crash faults, Byzantine faults). Understanding the failure model of your system is crucial for selecting the appropriate protocol.</p>
</li>
<li><p><strong>Performance and Scalability</strong>: The choice of protocol can significantly impact the system's performance and scalability. For instance, PoW offers robust security but is less scalable due to its high energy consumption, whereas PoS provides better scalability but requires a secure staking mechanism.</p>
</li>
<li><p><strong>Ease of Implementation</strong>: Protocols like Raft are easier to implement and understand, making them suitable for many practical applications. In contrast, Paxos, while robust, can be more challenging to implement correctly.</p>
</li>
<li><p><strong>Use Case Specifics</strong>: The application domain (e.g., blockchain, distributed databases) often dictates the choice of consensus protocol. Blockchain applications might prioritise security and decentralisation (favouring PoW or PoS), while distributed databases might prioritise consistency and performance (favouring Raft or Paxos).</p>
</li>
</ol>
<h3 id="heading-conclusion">Conclusion</h3>
<p>Consensus protocols are the backbone of distributed systems, ensuring that multiple nodes can agree on a single source of truth despite failures and network issues. From the classical Paxos and Raft to the modern blockchain-based PoW and PoS, each protocol offers unique advantages and challenges. Understanding these protocols' principles, strengths, and limitations is essential for designing robust, reliable, and scalable distributed systems. As technology evolves, so will these protocols, continuing to play a pivotal role in the advancement of distributed computing.</p>
]]></content:encoded></item><item><title><![CDATA[Fault tolerance  in distributed systems]]></title><description><![CDATA[In today's connected world, distributed systems are everywhere. They help run things like cloud computing and social media, which we use every day. But these systems can sometimes fail, so making sure they work well is very important. That's why we n...]]></description><link>https://blog.sofwancoder.com/fault-tolerance-in-distributed-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/fault-tolerance-in-distributed-systems</guid><category><![CDATA[distributed system]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Fri, 13 Oct 2023 21:52:27 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1697232688392/e59a547a-79c8-49c9-9083-a1b73c1ad0b7.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In today's connected world, distributed systems are everywhere. They help run things like cloud computing and social media, which we use every day. But these systems can sometimes fail, so making sure they work well is very important. That's why we need fault tolerance. In this article, we'll talk about why fault tolerance matters in distributed systems and discuss different ways to make it happen.</p>
<h2 id="heading-understanding-distributed-systems"><strong>Understanding Distributed Systems</strong></h2>
<p>Before we talk about why fault tolerance is important in distributed systems, let's first understand what they are. A distributed system is a group of connected computers that work together to reach a shared goal. Instead of doing everything on one computer like in a traditional system, distributed systems spread tasks and information across many computers. This helps with scalability, load balancing, and fault handling, making distributed systems useful for many different applications.</p>
<h2 id="heading-the-importance-of-fault-tolerance"><strong>The Importance of Fault Tolerance</strong></h2>
<p>Fault tolerance is the ability of a system to keep functioning properly when failure occurs. In distributed systems, component failures can occur for various reasons, such as hardware malfunctions, network issues, or software errors.</p>
<p>Without fault tolerance mechanisms in place, a single point of failure can lead to system outages, data loss, and a bad user experience. The importance of fault tolerance in distributed systems can be summarized in several key points:</p>
<h3 id="heading-1-increased-reliability"><strong>1. Increased Reliability</strong></h3>
<p>Fault tolerance makes a distributed system more reliable. It limits the impact of failures so users can keep using the system without interruption. This reliability is critical in domains such as banking, healthcare, and infrastructure management.</p>
<h3 id="heading-2-high-availability"><strong>2. High Availability</strong></h3>
<p>Fault tolerance keeps a distributed system up and running despite the obstacles that come its way. High availability is crucial for applications that cannot afford downtime, such as e-commerce websites, streaming platforms, and communication tools, which all need to be accessible 24/7.</p>
<h3 id="heading-3-data-integrity"><strong>3. Data Integrity</strong></h3>
<p>In distributed systems, data is replicated across many locations for performance and redundancy. Fault tolerance mechanisms keep that data correct and consistent, even when components fail or data moves between nodes.</p>
<h3 id="heading-4-scalability"><strong>4. Scalability</strong></h3>
<p>One of the best things about distributed systems is their scalability: they can easily adapt to handle more work as needed. Fault tolerance is crucial in maintaining the scalability of these systems, as the addition or removal of nodes should not disrupt overall system operation.</p>
<h3 id="heading-5-disaster-recovery"><strong>5. Disaster Recovery</strong></h3>
<p>In distributed systems, a whole data centre might fail. Fault tolerance strategies, such as geographic redundancy, can help in disaster recovery scenarios, ensuring that the system can recover and continue operating in a different location.</p>
<h2 id="heading-achieving-fault-tolerance-in-distributed-systems"><strong>Achieving Fault Tolerance in Distributed Systems</strong></h2>
<p>To achieve fault tolerance in distributed systems, various strategies and techniques are employed. Here are some of the most common approaches:</p>
<h3 id="heading-1-redundancy"><strong>1. Redundancy</strong></h3>
<p>Redundancy involves replicating data or services across multiple nodes or components. If one node fails, another node that holds the same data can seamlessly take over, ensuring uninterrupted service. Redundancy can be applied at various levels, including data redundancy, node redundancy, and component redundancy.</p>
<h3 id="heading-2-load-balancing"><strong>2. Load Balancing</strong></h3>
<p>Load balancing is a technique that distributes incoming traffic or requests evenly across multiple nodes. This not only improves performance but also enhances fault tolerance. If one node becomes overwhelmed or fails, the load balancer can redirect traffic to healthy nodes, preventing overloads and downtime.</p>
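<p>The redirect-on-failure behaviour described above can be sketched as a round-robin picker that skips unhealthy nodes. The <code>RoundRobinBalancer</code> class and its node names are hypothetical, not from any real load balancer.</p>

```typescript
// Minimal round-robin balancer that skips unhealthy nodes (illustrative).
class RoundRobinBalancer {
  private next = 0;

  constructor(private nodes: string[], private healthy: Set<string>) {}

  markDown(node: string): void { this.healthy.delete(node); }
  markUp(node: string): void { this.healthy.add(node); }

  // Return the next healthy node in rotation, or null if none remain.
  pick(): string | null {
    for (let i = 0; i < this.nodes.length; i++) {
      const node = this.nodes[this.next];
      this.next = (this.next + 1) % this.nodes.length;
      if (this.healthy.has(node)) return node;
    }
    return null;
  }
}
```

<p>Production balancers add health probes, weights, and connection counts, but the core idea is this rotation over a healthy set.</p>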
<h3 id="heading-3-failover-and-failback"><strong>3. Failover and Failback</strong></h3>
<p>Failover and failback mechanisms automatically switch from a failed component to a backup or secondary component. This approach is often used for critical systems such as databases and web servers. After the primary component recovers, failback mechanisms switch back to the original component.</p>
<h3 id="heading-4-replication"><strong>4. Replication</strong></h3>
<p>Data replication is an essential technique for fault tolerance. By replicating data across multiple nodes, distributed systems can ensure data availability even if some nodes fail. Various replication strategies, including master-slave, leader-follower, and quorum-based approaches, are used to maintain data consistency and availability.</p>
<h3 id="heading-5-geographic-redundancy"><strong>5. Geographic Redundancy</strong></h3>
<p>For disaster recovery and high availability, geographic redundancy is employed. This involves replicating data and services across multiple data centres or locations, often in different regions or countries. If one location experiences a failure, the system can continue operating from another location.</p>
<h3 id="heading-6-error-detection-and-recovery"><strong>6. Error Detection and Recovery</strong></h3>
<p>Implementing mechanisms for error detection and recovery is crucial for fault tolerance. Systems can use techniques such as <a target="_blank" href="https://en.wikipedia.org/wiki/Heartbeat_(computing)">heart-beating</a>, health checks, and automated recovery procedures to identify and mitigate failures in real time.</p>
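<p>The heartbeat-based detection mentioned above can be sketched as a simple timeout check. This is illustrative only; the <code>HeartbeatMonitor</code> class is hypothetical, and real detectors (e.g. phi-accrual) are considerably more nuanced.</p>

```typescript
// Simple timeout-based heartbeat failure detector (illustrative).
class HeartbeatMonitor {
  private lastSeen = new Map<string, number>();

  constructor(private timeoutMs: number) {}

  // Record a heartbeat from `node` at time `now` (milliseconds).
  beat(node: string, now: number): void {
    this.lastSeen.set(node, now);
  }

  // A node is suspected failed if no heartbeat arrived within the timeout,
  // or if it has never been seen at all.
  isSuspected(node: string, now: number): boolean {
    const last = this.lastSeen.get(node);
    return last === undefined || now - last > this.timeoutMs;
  }
}
```

<p>Passing the clock in explicitly keeps the detector deterministic and easy to test; a real system would use monotonic time.</p>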
<h3 id="heading-7-distributed-consensus-algorithms"><strong>7. Distributed Consensus Algorithms</strong></h3>
<p>Distributed consensus algorithms like <a target="_blank" href="https://en.wikipedia.org/wiki/Paxos_(computer_science)">Paxos</a> and <a target="_blank" href="https://raft.github.io/">Raft</a> play a significant role in maintaining data consistency and fault tolerance. These algorithms help distributed systems agree on the order of operations and ensure that data remains accurate, even in the presence of network partitions or node failures.</p>
<h3 id="heading-8-monitoring-and-logging"><strong>8. Monitoring and Logging</strong></h3>
<p>Comprehensive monitoring and logging are essential for identifying and diagnosing failures. Logging enables administrators to trace the cause of issues while monitoring tools provide real-time insights into system performance and health.</p>
<h2 id="heading-challenges-of-fault-tolerance"><strong>Challenges of Fault Tolerance</strong></h2>
<p>While fault tolerance is essential for the reliability of distributed systems, it comes with its own set of challenges:</p>
<ol>
<li><p><strong>Complexity</strong>: Implementing fault tolerance mechanisms can significantly increase the complexity of a distributed system, making it more challenging to design, deploy, and maintain.</p>
</li>
<li><p><strong>Resource Overhead</strong>: Redundancy, replication, and other fault tolerance strategies usually require additional hardware and computational resources, which can increase operational costs.</p>
</li>
<li><p><strong>Consistency vs. Availability</strong>: Maintaining a balance between data consistency and system availability is a common challenge in distributed systems. Ensuring both can be complex, particularly in the presence of network partitions.</p>
</li>
<li><p><strong>Latency</strong>: Some fault tolerance mechanisms, such as geographic redundancy, can introduce latency, which may be unacceptable for real-time or low-latency applications.</p>
</li>
</ol>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Fault tolerance is key in distributed systems: it helps them stay reliable and protects data even when components fail. Using methods like redundancy, load balancing, and data replication, these systems can be strong enough for today's connected world. But it's important to balance fault tolerance against its challenges and costs, to make sure the system aligns with its purpose and goals. As technology keeps changing, fault tolerance remains an important concern for people who design and run distributed systems.</p>
]]></content:encoded></item><item><title><![CDATA[Distributed Systems: Synchronisation in Complex Systems]]></title><description><![CDATA[Complex systems are used in almost every aspect of computer science and engineering, from distributed databases and networked applications to multi-core processors and real-time embedded systems. To make sure that these complex systems work right and...]]></description><link>https://blog.sofwancoder.com/distributed-systems-synchronisation-in-complex-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/distributed-systems-synchronisation-in-complex-systems</guid><category><![CDATA[distributed system]]></category><category><![CDATA[engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[protocols]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sun, 20 Aug 2023 20:36:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1692550992100/a9bf9fcd-4a34-4ee0-afbf-c0a9659e04a3.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Complex systems are used in almost every aspect of computer science and engineering, from distributed databases and networked applications to multi-core processors and real-time embedded systems. To make sure that these complex systems work right and give dependable results, it is of the utmost importance to make sure that they are consistent and honest. Synchronisation becomes a key idea in keeping this regularity, allowing different parts and processes to work together well and produce correct results.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>A distributed system is a type of computer system that is made up of numerous independent computers which are connected through a network and communicate with one another. A computer that is part of such a system is referred to as a node, and each node in the system is tasked with carrying out a certain operation.</p>
<p>Distributed systems are capable of managing enormous amounts of data and traffic and can continue to function normally even if some of the nodes in the system fail. Because they are composed of multiple nodes, distributed systems can provide excellent scalability, dependability, and fault tolerance, which are their key advantages.</p>
<p>However, some difficulties emerge with distributed systems, such as the requirement to <strong>synchronise</strong> and maintain <a target="_blank" href="https://blog.sofwancoder.com/consistency-models-in-distributed-system"><strong>consistency</strong></a> across all nodes.</p>
<h2 id="heading-understanding-synchronization"><strong>Understanding Synchronization</strong></h2>
<p>Synchronisation is the coordination of activities and events between multiple entities so that they reach a shared goal correctly and coherently. In the context of complex systems, synchronisation is the process of managing how different parts, processors, threads, or distributed nodes interact with each other. This is done to keep the system in a coherent state and avoid conflicts that could cause incorrect behaviour or corrupt data.</p>
<h2 id="heading-what-are-the-challenges-in-complex-systems"><strong>What are the Challenges in Complex Systems?</strong></h2>
<p>Complex systems often have many parts or processes that run at the same time and need to share resources, communicate, and exchange data.</p>
<p>Several issues can occur when these interactions are not properly synchronised; a few of them are highlighted below:</p>
<h3 id="heading-race-conditions"><strong>Race Conditions</strong></h3>
<p>These occur when multiple processes or threads access shared resources concurrently and the final outcome depends on the order of execution. Race conditions can lead to unpredictable behavior and data corruption.</p>
<h3 id="heading-deadlocks"><strong>Deadlocks</strong></h3>
<p>A deadlock happens when multiple processes are unable to proceed because each is waiting for a resource held by another, resulting in a standstill.</p>
<h3 id="heading-data-inconsistency"><strong>Data Inconsistency</strong></h3>
<p>In distributed systems, data is often replicated across different nodes. Without proper synchronization, inconsistencies can arise due to delayed updates or conflicting modifications.</p>
<h3 id="heading-starvation">Starvation</h3>
<p>Some processes may be indefinitely delayed in accessing resources or progressing due to poor synchronization strategies, leading to reduced system performance.</p>
<h2 id="heading-which-aspect-of-distributed-system-requires-synchronization">Which aspect of distributed system requires Synchronization?</h2>
<p>Synchronisation is essential in a distributed system because it guarantees that all of the system's nodes are working towards the same objective and are aware of the actions taken by the other nodes in the system.</p>
<p>Simply put, the <strong>process of coordinating the actions of numerous computers (nodes)</strong> to make them function more efficiently together is referred to as synchronisation.</p>
<p>It is required in several aspects of distributed systems, some of which are:</p>
<h3 id="heading-resource-access">Resource Access</h3>
<p>It's possible that numerous nodes in a distributed system will require access to the same resource at the same time. Therefore, to make sure that only one node can access a resource at any given moment, synchronisation techniques are utilised.</p>
<h3 id="heading-event-ordering">Event Ordering</h3>
<p>Events can happen at different times on different parts of a distributed system. Synchronisation techniques are used to make sure that events are ordered correctly so that nodes can process them in the right sequence.</p>
<h3 id="heading-clock-synchronization">Clock Synchronization</h3>
<p>In a distributed system, each node has its own clock that isn't necessarily in sync with the others. Synchronisation techniques are used to ensure that all nodes have the same perception of time passing.</p>
<h2 id="heading-what-is-consistency-in-distributed-systems">What is Consistency in Distributed Systems?</h2>
<p><a target="_blank" href="https://blog.sofwancoder.com/consistency-models-in-distributed-system">Consistency</a> is the requirement that all nodes in a distributed system see the same data at the same time. In a distributed system, maintaining consistency is challenging because data can be modified on different nodes at different times. Consistency is required in several areas, two of which are:</p>
<p><strong>Data replication:</strong> Data may be replicated across numerous nodes in a distributed system to offer fault tolerance and availability. Consistency between replicas is required to ensure that all nodes see the same data.</p>
<p><strong>Distributed transactions:</strong> <a target="_blank" href="https://blog.sofwancoder.com/distributed-transactions-overview">Transactions in a distributed system</a> may involve numerous nodes. To ensure that a transaction is completed correctly, all nodes engaged in the transaction must maintain consistency.</p>
<h2 id="heading-what-are-the-types-of-synchronization-mechanisms"><strong>What are the types of Synchronization Mechanisms?</strong></h2>
<h3 id="heading-locksmutexes"><strong>Locks/Mutexes</strong></h3>
<p>These are basic synchronisation primitives that let only one process or thread hold a lock at a time, restricting access to a shared resource. Although they work well for avoiding race conditions, incorrect use can lead to deadlocks or reduced concurrency.</p>
<h3 id="heading-semaphores"><strong>Semaphores</strong></h3>
<p>Semaphores are numeric counters used to control access to a pool of resources. They are useful for limiting the number of processes that can use a shared resource at the same time.</p>
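<p>A counting semaphore can be sketched in TypeScript as a small async class. This is an illustrative sketch, not a production primitive; the <code>Semaphore</code> class here is hypothetical.</p>

```typescript
// Minimal async counting semaphore (illustrative sketch).
class Semaphore {
  private waiters: (() => void)[] = [];

  constructor(private permits: number) {}

  async acquire(): Promise<void> {
    if (this.permits > 0) {
      this.permits--;
      return;
    }
    // No permit available: park until release() hands us one.
    await new Promise<void>((resolve) => this.waiters.push(resolve));
  }

  release(): void {
    const next = this.waiters.shift();
    if (next) next(); // hand the permit directly to a waiting acquirer
    else this.permits++;
  }
}
```

<p>Constructing it with <code>new Semaphore(1)</code> gives a mutex; larger counts bound how many tasks touch a resource concurrently.</p>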
<h3 id="heading-monitors">Monitors</h3>
<p>Monitors are a construct that bundles shared data and the synchronisation primitives protecting it into a single unit. They encapsulate the data together with the methods that operate on it and allow only one thread to execute inside the monitor at a time, so the data is never accessed concurrently.</p>
<h3 id="heading-message-passing">Message Passing</h3>
<p>In a distributed system, processes talk to each other by sending each other messages. Messages are sent and received consistently and in the right order when the right protocols are used.</p>
<h3 id="heading-atomic-operations"><strong>Atomic Operations</strong></h3>
<p>These are operations that execute as a single indivisible step and cannot be interrupted partway through. Because no other process can observe a half-completed operation, they guarantee the integrity of the data they modify.</p>
<h2 id="heading-what-are-the-techniques-for-achieving-synchronization-and-consistency">What are the techniques for achieving synchronization and consistency?</h2>
<p>Several techniques are used to achieve synchronization and consistency in distributed systems. Some of these techniques include:</p>
<h3 id="heading-two-phase-commit">Two-phase commit</h3>
<p><a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics">Two-phase commit is a protocol used to ensure that distributed transactions are completed correctly.</a> In the two-phase commit protocol, all nodes involved in a transaction must <a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics#:~:text=all%20participants%20agree%20to%20commit%20the%20transaction.">agree to commit</a> the transaction before it is considered complete.</p>
<h3 id="heading-vector-clocks">Vector clocks</h3>
<p>Vector clocks and logical clocks are techniques for ordering events in a distributed system, which aids in tracking the causality relationship between occurrences. Vector clocks give each node a vector that represents its point of view on events, whereas logical clocks keep a global logical time ordering. Even in the presence of network delays and asynchronous communication, these techniques help maintain a consistent view of events.</p>
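<p>The vector-clock bookkeeping described above can be sketched as follows. The helper names are hypothetical; the happened-before check follows the standard element-wise comparison.</p>

```typescript
// Minimal vector clock (illustrative sketch).
type VClock = Map<string, number>;

// Increment this node's entry on a local event or before sending.
function tick(clock: VClock, node: string): void {
  clock.set(node, (clock.get(node) ?? 0) + 1);
}

// On receiving a message, take the element-wise maximum, then tick.
function merge(local: VClock, remote: VClock, node: string): void {
  remote.forEach((t, n) => {
    local.set(n, Math.max(local.get(n) ?? 0, t));
  });
  tick(local, node);
}

// a "happened before" b iff a <= b element-wise and a !== b.
function happenedBefore(a: VClock, b: VClock): boolean {
  let leq = true;
  let strictlyLess = false;
  a.forEach((t, n) => {
    const tb = b.get(n) ?? 0;
    if (t > tb) leq = false;
    if (t < tb) strictlyLess = true;
  });
  b.forEach((tb, n) => {
    if ((a.get(n) ?? 0) < tb) strictlyLess = true;
  });
  return leq && strictlyLess;
}
```

<p>When <code>happenedBefore</code> is false in both directions, the two events are concurrent, which is exactly the causality information wall-clock timestamps cannot give you.</p>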
<h3 id="heading-quorum-based-systems">Quorum-based systems</h3>
<p><a target="_blank" href="https://blog.sofwancoder.com/distributed-system-understanding-quorum-based-systems">Quorum-based systems</a> are used to ensure that data replicas are consistent. A majority of nodes in a <a target="_blank" href="https://blog.sofwancoder.com/distributed-system-understanding-quorum-based-systems">quorum-based system</a> must agree on the value of a piece of data before it is considered correct.</p>
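<p>The rule behind this overlap guarantee is simple: with N replicas, a read quorum R and a write quorum W intersect whenever R + W &gt; N, so every read sees at least one replica holding the latest write. As a tiny illustrative helper:</p>

```typescript
// Quorum overlap rule: R + W > N guarantees that every read quorum
// intersects the most recent write quorum (illustrative helper).
function quorumsOverlap(n: number, r: number, w: number): boolean {
  return r + w > n;
}

// Common configuration: simple majorities for both reads and writes.
function majority(n: number): number {
  return Math.floor(n / 2) + 1;
}
```

<p>Systems can tune R and W within this constraint, e.g. a small R for read-heavy workloads paired with a larger W.</p>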
<h3 id="heading-consensus-algorithms">Consensus algorithms</h3>
<p>Consensus algorithms are used in distributed systems to ensure that all nodes agree on a certain value or decision. Consensus algorithms are used in situations when nodes must agree on a value, such as when electing a leader or establishing the order of transactions.</p>
<h3 id="heading-clock-synchronization-protocols">Clock synchronization protocols</h3>
<p>Clock synchronisation techniques are used to ensure that all nodes in a distributed system view time in the same way. Network Time Protocol (NTP) is a commonly used clock synchronisation protocol that keeps clocks across the network synchronised to within a few milliseconds.</p>
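<p>As a simplified illustration of how such protocols estimate clock offset, here is a Cristian-style calculation that assumes symmetric network delay. NTP itself uses multiple timestamps, repeated samples, and filtering; this helper is only a sketch.</p>

```typescript
// Cristian-style clock offset estimate (illustrative).
// t0: client time when the request was sent
// serverTime: the time reported by the server
// t1: client time when the reply arrived
// Assuming symmetric delay, the server's reading corresponds to the
// midpoint of the round trip, so the offset is serverTime minus that midpoint.
function estimateOffset(t0: number, serverTime: number, t1: number): number {
  const roundTrip = t1 - t0;
  return serverTime - (t0 + roundTrip / 2);
}
```

<p>A client would then slew its clock by the estimated offset rather than jumping it, to avoid time moving backwards.</p>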
<h2 id="heading-what-are-the-things-to-consider-when-adopting-synchronisation">What are the things to consider when adopting synchronisation?</h2>
<h3 id="heading-network-latency"><strong>Network Latency</strong></h3>
<p>Network latency increases the time it takes for nodes in different places to communicate, which makes it hard to maintain a consistent real-time view of the system. Synchronisation methods have to account for varying latencies and ensure that nodes don't mistake delayed updates for events that happened out of order.</p>
<h3 id="heading-node-failures"><strong>Node Failures</strong></h3>
<p>Nodes in distributed systems often fail because of hardware problems, software bugs, or network issues. Synchronisation methods need to handle situations in which nodes stop responding, so that the system continues to work correctly despite such failures.</p>
<h3 id="heading-scalability"><strong>Scalability</strong></h3>
<p>As distributed systems get bigger, it gets harder to keep everything in sync. Synchronisation mechanisms must scale as the number of nodes and interactions grows, without sacrificing performance.</p>
<h3 id="heading-consistency-availability-trade-off"><strong>Consistency-Availability Trade-off</strong></h3>
<p>The CAP theorem says that a distributed system can provide at most two out of three properties: Consistency, Availability, and Partition tolerance. Strategies for synchronisation often require making trade-offs between these qualities, which requires careful thought based on the needs of the application.</p>
<h3 id="heading-concurrency-and-contentions"><strong>Concurrency and Contentions</strong></h3>
<p>When multiple nodes access and change shared resources at the same time, contention and conflicts can arise. Finding the right balance between concurrency and synchronisation is tricky: too much locking hurts performance, while too much unchecked concurrency can lead to data corruption.</p>
<h2 id="heading-demonstrating-synchronization-in-a-distributed-system">Demonstrating synchronization in a distributed system</h2>
<p>Suppose we have a simple distributed system with two replicas of a key-value store. We want to ensure that any updates to the key-value store are synchronized across both replicas so that clients can always read the latest version of the data.</p>
<p>We can achieve this by implementing a synchronization protocol between the replicas. One such protocol is the <a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics">two-phase commit protocol</a>, which involves the following phases:</p>
<ol>
<li><p><a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics#heading-the-prepare-phase"><strong>Prepare phase</strong></a><strong>:</strong> The coordinator asks all replicas to prepare to commit changes.</p>
</li>
<li><p><a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics#heading-the-commit-phase"><strong>Commit phase:</strong></a> If all replicas can prepare successfully, the coordinator asks all replicas to commit the changes.</p>
</li>
</ol>
<p>In this example, we define a <code>Replica</code> class that represents a single replica of the distributed database. The class contains a <code>Map</code> that holds the key-value pairs of the database, as well as <code>get()</code> and <code>set()</code> methods to read and write to the database.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Define a replica class that holds a copy of the distributed database</span>
<span class="hljs-keyword">class</span> Replica {
  <span class="hljs-keyword">private</span> data: <span class="hljs-built_in">Map</span>&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">string</span>&gt; = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>();

  get(key: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">string</span> | <span class="hljs-literal">undefined</span> {
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.data.get(key);
  }

  set(key: <span class="hljs-built_in">string</span>, value: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.data.set(key, value);
  }
}
</code></pre>
<p>We also define a <code>Coordinator</code> class that acts as the coordinator for the two-phase commit protocol between replicas. The class contains an array of <code>Replica</code> objects, as well as a <code>transactionInProgress</code> flag to prevent multiple transactions from occurring simultaneously. The <code>beginTransaction()</code> method is the entry point for the two-phase commit protocol. It first checks whether a transaction is already in progress, and returns an error if one is. Otherwise, it proceeds to the prepare and commit phases.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Define the coordinator class that coordinates two-phase commit protocol between replicas</span>
<span class="hljs-keyword">class</span> Coordinator {
  <span class="hljs-keyword">private</span> replicas: Replica[];
  <span class="hljs-keyword">private</span> transactionInProgress: <span class="hljs-built_in">boolean</span> = <span class="hljs-literal">false</span>;

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">replicas: Replica[]</span>) {
    <span class="hljs-built_in">this</span>.replicas = replicas;
  }

  <span class="hljs-keyword">async</span> beginTransaction(transaction: Transaction): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">boolean</span>&gt; {
    <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.transactionInProgress) {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"Transaction already in progress."</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    }

    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Starting transaction with key <span class="hljs-subst">${transaction.key}</span> and value <span class="hljs-subst">${transaction.value}</span>`</span>);

    <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Phase 1: Prepare phase - ask all replicas to prepare to commit changes</span>
      <span class="hljs-built_in">this</span>.transactionInProgress = <span class="hljs-literal">true</span>;
      <span class="hljs-keyword">const</span> prepareResponses = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all(
        <span class="hljs-built_in">this</span>.replicas.map(<span class="hljs-keyword">async</span> (replica) =&gt; <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.sendPrepare(replica, transaction))
      );

      <span class="hljs-keyword">if</span> (prepareResponses.some(<span class="hljs-function">(<span class="hljs-params">response</span>) =&gt;</span> !response)) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"One or more replicas failed to prepare."</span>);
        <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
      }

    <span class="hljs-comment">// Phase 2: Commit phase - ask all replicas to commit changes</span>
      <span class="hljs-keyword">const</span> commitResponses = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all(
        <span class="hljs-built_in">this</span>.replicas.map(<span class="hljs-keyword">async</span> (replica) =&gt; <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.sendCommit(replica, transaction))
      );

      <span class="hljs-keyword">if</span> (commitResponses.some(<span class="hljs-function">(<span class="hljs-params">response</span>) =&gt;</span> !response)) {
        <span class="hljs-built_in">console</span>.error(<span class="hljs-string">"One or more replicas failed to commit."</span>);
        <span class="hljs-comment">// Strong consistency</span>
        <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
      }

      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Transaction committed with key <span class="hljs-subst">${transaction.key}</span> and value <span class="hljs-subst">${transaction.value}</span>`</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error during transaction: <span class="hljs-subst">${error.message}</span>`</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    } <span class="hljs-keyword">finally</span> {
      <span class="hljs-built_in">this</span>.transactionInProgress = <span class="hljs-literal">false</span>;
    }
  }

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> sendPrepare(replica: Replica, transaction: Transaction): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">boolean</span>&gt; {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Preparing replica <span class="hljs-subst">${replica}</span>`</span>);
    <span class="hljs-comment">// Simulate network delay</span>
    <span class="hljs-keyword">await</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve</span>) =&gt;</span> <span class="hljs-built_in">setTimeout</span>(resolve, <span class="hljs-built_in">Math</span>.random() * <span class="hljs-number">1000</span>));
    <span class="hljs-comment">// Simulate the replica's prepare vote (in a real system the replica</span>
    <span class="hljs-comment">// would validate the transaction and persist it to a write-ahead log)</span>
    <span class="hljs-keyword">const</span> prepared = <span class="hljs-built_in">Math</span>.random() &gt; <span class="hljs-number">0.1</span>;
    <span class="hljs-keyword">if</span> (prepared) {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Replica <span class="hljs-subst">${replica}</span> prepared successfully`</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    } <span class="hljs-keyword">else</span> {
      <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Replica <span class="hljs-subst">${replica}</span> failed to prepare`</span>);
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    }
  }

  <span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> sendCommit(replica: Replica, transaction: Transaction): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">boolean</span>&gt; {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Committing to replica <span class="hljs-subst">${replica}</span>`</span>);
    <span class="hljs-comment">// Simulate network delay</span>
    <span class="hljs-keyword">await</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Promise</span>(<span class="hljs-function">(<span class="hljs-params">resolve</span>) =&gt;</span> <span class="hljs-built_in">setTimeout</span>(resolve, <span class="hljs-built_in">Math</span>.random() * <span class="hljs-number">1000</span>));
    replica.set(transaction.key, transaction.value);
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Replica <span class="hljs-subst">${replica}</span> committed successfully`</span>);
    <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
  }
}
</code></pre>
<p>In the prepare phase, the <code>Coordinator</code> object sends a <code>prepare</code> message to each replica, and waits for a response. If any replica fails to prepare successfully, the transaction is aborted and an error response is returned.</p>
<p>In the commit phase, the <code>Coordinator</code> object sends a <code>commit</code> message to each replica, and waits for a response. If any replica fails to commit successfully, the transaction is aborted and an error response is returned.</p>
<p>To simulate the network delay between replicas, the <code>sendPrepare()</code> and <code>sendCommit()</code> methods each contain a call to <code>setTimeout()</code> with a random delay.</p>
<p>Finally, we set up an Express app with a single endpoint <code>/transaction</code> that calls the <code>beginTransaction()</code> method on the <code>Coordinator</code> object.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">"express"</span>;

<span class="hljs-keyword">interface</span> Transaction {
  key: <span class="hljs-built_in">string</span>;
  value: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">const</span> replica1 = <span class="hljs-keyword">new</span> Replica();
<span class="hljs-keyword">const</span> replica2 = <span class="hljs-keyword">new</span> Replica();
<span class="hljs-keyword">const</span> coordinator = <span class="hljs-keyword">new</span> Coordinator([replica1, replica2]);

<span class="hljs-keyword">const</span> app = express();
app.use(express.json());

app.post(<span class="hljs-string">"/transaction"</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">const</span> transaction = req.body <span class="hljs-keyword">as</span> Transaction;
  <span class="hljs-keyword">const</span> result = <span class="hljs-keyword">await</span> coordinator.beginTransaction(transaction);
  <span class="hljs-keyword">if</span> (result) {
    res.sendStatus(<span class="hljs-number">200</span>);
  } <span class="hljs-keyword">else</span> {
    res.sendStatus(<span class="hljs-number">500</span>);
  }
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Server started on port 3000"</span>);
});
</code></pre>
<p>With this implementation, clients can initiate a transaction by sending a POST request to the <code>/transaction</code> endpoint. The <code>Coordinator</code> object will then coordinate the two-phase commit protocol between the replicas, ensuring that any updates to the database are <strong>synchronized</strong> across both replicas.</p>
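<p>For illustration, here is a sketch of what a client call to this endpoint might look like. The key and value are hypothetical, and the <code>fetch</code> call itself appears only as a comment so the snippet stays self-contained:</p>

```typescript
// Build the JSON payload a client would POST to the /transaction endpoint.
// The key and value below are hypothetical example data.
interface Transaction {
  key: string;
  value: string;
}

const transaction: Transaction = { key: "user:42", value: "alice" };

// Options that could be passed to fetch(); sending is not performed here.
const requestOptions = {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify(transaction),
};

console.log(requestOptions.body);
// e.g. await fetch("http://localhost:3000/transaction", requestOptions)
```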
<h2 id="heading-some-real-world-applications-of-synchronisation">Some Real-world applications of synchronisation</h2>
<p>The role of synchronization in maintaining consistency is evident across a broad spectrum of real-world applications:</p>
<ol>
<li><p><strong>Databases:</strong> Transaction management relies on synchronization to maintain the integrity of data and prevent conflicts between concurrent transactions.</p>
</li>
<li><p><strong>Operating Systems:</strong> To ensure fair and effective resource utilisation, process scheduling, memory management, and resource sharing all need synchronisation.</p>
</li>
<li><p><strong>Parallel Programming:</strong> Synchronisation is used by multi-core processors to organise the execution of multiple threads, making sure that data is shared correctly and preventing "race conditions."</p>
</li>
<li><p><strong>Distributed Systems:</strong> Systems with multiple nodes need synchronisation to handle data replication, maintain stability, and prevent conflicts.</p>
</li>
</ol>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In the complex and interconnected world of computing, <strong>synchronisation is a key part of keeping systems correct, secure, and able to work together.</strong> Synchronisation mechanisms <strong>prevent race conditions, deadlocks, and data inconsistencies</strong> by managing how different components communicate and share resources. As technology advances and systems grow more complex, a solid understanding of synchronisation and how <strong>it can be used to build reliable, high-performance systems</strong> that give accurate results remains essential.</p>
]]></content:encoded></item><item><title><![CDATA[Database Engines: Overview]]></title><description><![CDATA[Today, information is essential to the success of every business. The need for efficient data storage, retrieval, and administration grows as the volume of data generated grows. This is when DBMSs come in handy. Data storage and management are made e...]]></description><link>https://blog.sofwancoder.com/database-engines-overview</link><guid isPermaLink="true">https://blog.sofwancoder.com/database-engines-overview</guid><category><![CDATA[Databases]]></category><category><![CDATA[databasemanagement]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[system]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Tue, 25 Jul 2023 21:51:21 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1690321803670/4293b6e8-365a-4171-9b2e-c082cc7032a1.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Today, information is essential to the success of every business. As the volume of data generated grows, so does the need for efficient storage, retrieval, and administration. This is where database management systems (DBMSs) come in. Database engines make data storage and management easier: they offer a toolkit for effective data storage, retrieval, and manipulation. What are database engines, and why are they important in database engineering? We'll answer those questions and more in this post.</p>
<h2 id="heading-definition-of-database-engines"><strong>Definition of Database Engines</strong></h2>
<p>A database engine is a component of software that facilitates working with databases. It controls how information in a database is saved, retrieved, and altered. Database engines act as a bridge between the database's data and the user, facilitating interaction with the information stored there. MySQL, Oracle, and SQL Server are just a few of the most well-known database management systems.</p>
<h2 id="heading-why-do-database-engines-matter-in-database-engineering"><strong>Why do Database Engines Matter in Database Engineering?</strong></h2>
<p>Database engines play a critical role in database engineering. They provide the tools and capabilities necessary for the efficient and effective management of databases. Without database engines, it would be challenging to store, retrieve, and manipulate data in an organized and systematic manner.</p>
<p>Some of the key benefits of using database engines include:</p>
<h3 id="heading-data-integrity">Data Integrity</h3>
<p>Database engines ensure that data is stored, retrieved, and manipulated accurately and consistently. This is crucial for maintaining data integrity, which is essential for making informed decisions based on the data.</p>
<h3 id="heading-scalability">Scalability</h3>
<p>Database engines provide scalable solutions for storing and managing data. This means that as the amount of data increases, the database can be scaled up to accommodate the growth.</p>
<h3 id="heading-security">Security</h3>
<p>Database engines provide a range of security features to protect data from unauthorized access, modification, and destruction.</p>
<h3 id="heading-performance">Performance</h3>
<p>Database engines are designed to optimize the performance of database operations, ensuring that data is retrieved and manipulated quickly and efficiently.</p>
<h3 id="heading-ease-of-use">Ease of Use</h3>
<p>Database engines provide user-friendly interfaces that make it easy for users to interact with the database and perform operations on the data.</p>
<h2 id="heading-types-of-database-engines"><strong>Types of Database Engines</strong></h2>
<p>There are several types of database engines, each with its own strengths and weaknesses.</p>
<p>The most common types of database engines include:</p>
<h3 id="heading-relational-database-engines">Relational Database Engines</h3>
<p>Relational database engines are the most widely used type of database engine. They are based on the relational database model, which organizes data into tables, each with a unique primary key. Relational databases use Structured Query Language (SQL) to retrieve and manipulate data. Some of the popular relational database engines include MySQL, Oracle, and SQL Server.</p>
<h4 id="heading-how-relational-database-engines-work">How Relational Database Engines Work</h4>
<p>Relational database engines store data in tables, with each table representing a different entity or object. Each table has a unique primary key that is used to identify the rows in the table. Tables can be related to each other using foreign keys, which are used to establish relationships between tables.</p>
<p>To retrieve data from a relational database, users use SQL queries. SQL is a standard language used to manipulate relational databases. SQL queries are used to select, insert, update, and delete data from the database.</p>
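<p>As a rough sketch (not a real engine), the table-and-foreign-key idea above can be modelled in a few lines of TypeScript. The <code>Author</code> and <code>Book</code> tables and their data are invented for illustration:</p>

```typescript
// Two "tables" keyed by primary key, related through a foreign key.
interface Author { id: number; name: string }
interface Book { id: number; title: string; authorId: number } // authorId is the foreign key

const authors = new Map<number, Author>([
  [1, { id: 1, name: "Ada" }],
  [2, { id: 2, name: "Linus" }],
]);

const books = new Map<number, Book>([
  [10, { id: 10, title: "Notes", authorId: 1 }],
  [11, { id: 11, title: "Diary", authorId: 2 }],
]);

// Roughly: SELECT b.title, a.name FROM books b JOIN authors a ON b.authorId = a.id
const joined = [...books.values()].map((b) => ({
  title: b.title,
  author: authors.get(b.authorId)?.name,
}));

console.log(joined);
```

<p>A real engine, of course, adds indexing, query planning, and transactional guarantees on top of this basic shape.</p>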
<h4 id="heading-examples-of-relational-database-engines">Examples of Relational Database Engines</h4>
<ul>
<li><p><strong>MySQL</strong>:</p>
<p>  MySQL is an open-source relational database engine that is widely used in web applications. It is known for its scalability and performance.</p>
</li>
<li><p><strong>Oracle</strong>:</p>
<p>  Oracle is a popular relational database engine used in enterprise applications. It provides a range of features for managing large and complex databases.</p>
</li>
</ul>
<h3 id="heading-nosql-database-engines">NoSQL Database Engines</h3>
<p>NoSQL database engines are designed to handle unstructured or semi-structured data. Unlike relational databases, NoSQL databases do not use tables to store data. Instead, they use a variety of data models, such as document, key-value, graph, and column-family models. NoSQL databases are highly scalable and can handle large amounts of data with ease. Some of the popular NoSQL database engines include MongoDB, Cassandra, and Redis.</p>
<h4 id="heading-how-nosql-database-engines-work">How NoSQL Database Engines Work</h4>
<p>NoSQL databases are designed to handle unstructured or semi-structured data, which makes them highly flexible and scalable. NoSQL databases use a variety of data models to store data, including document, key-value, graph, and column-family models.</p>
<p>In a document model, data is stored in a document, which is similar to a JSON object. In a key-value model, data is stored as a set of key-value pairs. In a graph model, data is stored as nodes and edges, which are used to represent relationships between objects. In a column-family model, data is stored in columns, which are grouped into column families.</p>
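<p>To make the contrast concrete, here is the same hypothetical user record expressed in a document shape and in a flat key-value shape (the key naming scheme is just one possible convention):</p>

```typescript
// Document model: a nested, JSON-like document.
const userDocument = {
  _id: "u1",
  name: "Ada",
  addresses: [{ city: "London", primary: true }],
};

// Key-value model: flat keys mapping to opaque string values.
const kvStore = new Map<string, string>([
  ["user:u1:name", "Ada"],
  ["user:u1:address:0:city", "London"],
]);

console.log(userDocument.addresses[0].city, kvStore.get("user:u1:name"));
```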
<p>NoSQL databases use a variety of query languages to retrieve and manipulate data. Some of the popular query languages used in NoSQL databases include MongoDB Query Language (MQL) and Cassandra Query Language (CQL).</p>
<h4 id="heading-examples-of-nosql-database-engines">Examples of NoSQL Database Engines</h4>
<ul>
<li><p><strong>MongoDB</strong>:</p>
<p>  MongoDB is a popular NoSQL database engine that is widely used in web applications. It uses a document data model and provides a range of features for handling unstructured data.</p>
</li>
<li><p><strong>Cassandra</strong>:</p>
<p>  Cassandra is a highly scalable NoSQL database engine that is designed to handle large amounts of data with ease. It uses a column-family data model and provides a range of features for handling distributed data.</p>
</li>
</ul>
<h3 id="heading-graph-database-engines">Graph database engines</h3>
<p>Graph database engines are built to handle complex connections between data, like those found in social networks, recommendation engines, and knowledge graphs. They are based on graph theory, the branch of mathematics concerned with representing and analysing relationships between objects.</p>
<p>Graph database engines store data as nodes and edges. Nodes are things like people, places, or goods, and edges are things like "likes," "follows," or "location" that show how these things are related to each other. This makes it possible to query complicated relationships quickly and easily and to do advanced analytics and machine learning on the data.</p>
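<p>A minimal sketch of this idea in TypeScript: people as nodes, "follows" edges stored in an adjacency list, and a two-hop relationship query. The names and data are invented for illustration:</p>

```typescript
// Adjacency list: each node maps to the nodes it has "follows" edges to.
const follows = new Map<string, string[]>([
  ["alice", ["bob"]],
  ["bob", ["carol", "dave"]],
  ["carol", []],
  ["dave", []],
]);

// Two-hop query: who do the people Alice follows themselves follow?
function followsOfFollows(user: string): string[] {
  const direct = follows.get(user) ?? [];
  return direct.flatMap((friend) => follows.get(friend) ?? []);
}

console.log(followsOfFollows("alice")); // ["carol", "dave"]
```

<p>A production graph engine stores these edges with indexes so that multi-hop traversals stay fast even across billions of relationships.</p>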
<p>One strength of graph database engines is that they can scale horizontally, which makes them well suited to large datasets. They are also very flexible: new nodes and edges can be added without redesigning the schema. This makes them a good fit for recommendation engines, fraud detection, and knowledge graphs.</p>
<p>Some popular graph database engines include Neo4j and Amazon Neptune.</p>
<h3 id="heading-object-oriented-database-engines">Object-oriented database engines</h3>
<p>These are designed to store objects rather than rows and columns. Examples include ObjectDB and Versant.</p>
<h3 id="heading-in-memory-database-engines">In-Memory Database Engines</h3>
<p>In-memory database engines keep data in main memory rather than on disk, so data can be read and updated very quickly. In-memory databases are frequently used in applications that need to process data in real time, such as online trading platforms and games.</p>
<h4 id="heading-how-in-memory-database-engines-work">How In-Memory Database Engines Work</h4>
<p>In-memory database engines hold their working data in memory, which makes them very fast. Many support standard SQL queries to read and modify data, just like regular relational databases. In-memory stores are also often paired with traditional databases: data is persisted on disk and then cached in memory so it can be accessed quickly.</p>
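<p>The caching pattern described above can be sketched as follows, with a <code>Map</code> standing in for both the disk-backed store and the memory cache (all names and data are illustrative):</p>

```typescript
// Cache-aside sketch: read from memory first, fall back to "disk" on a miss.
const diskStore = new Map<string, string>([["config", "v1"]]); // stand-in for a disk-backed database
const memoryCache = new Map<string, string>();
let diskReads = 0;

function read(key: string): string | undefined {
  const cached = memoryCache.get(key);
  if (cached !== undefined) return cached; // fast path: served from memory
  diskReads++;
  const value = diskStore.get(key); // slow path: fetch from disk
  if (value !== undefined) memoryCache.set(key, value); // populate the cache
  return value;
}

read("config"); // miss: hits the disk store
read("config"); // hit: served from memory
console.log(diskReads); // 1
```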
<h4 id="heading-examples-of-in-memory-database-engines">Examples of In-Memory Database Engines</h4>
<ul>
<li><p><strong>SAP HANA</strong>:</p>
<p>  SAP HANA is an in-memory database engine that is widely used in enterprise applications. It is known for its high performance and scalability.</p>
</li>
<li><p><strong>Redis</strong>:</p>
<p>  Redis is an open-source in-memory database engine that is commonly used as a cache. It provides a range of features for storing and retrieving data in memory.</p>
</li>
</ul>
<h2 id="heading-factors-to-consider-when-choosing-a-database-engine"><strong>Factors to Consider When Choosing a Database Engine</strong></h2>
<p>When choosing a database engine, there are several factors to consider, including:</p>
<ul>
<li><p><strong>Scalability</strong>: Can the database engine handle the amount of data you need to store and manipulate?</p>
</li>
<li><p><strong>Security</strong>: Does the database engine provide the necessary security features to protect your data?</p>
</li>
<li><p><strong>Data Consistency</strong>: Does the database engine ensure that data is stored and manipulated accurately and consistently?</p>
</li>
<li><p><strong>Performance</strong>: How quickly can the database engine retrieve and manipulate data?</p>
</li>
<li><p><strong>Ease of Use</strong>: Is the database engine easy to use and understand?</p>
</li>
<li><p><strong>Cost</strong>: What is the cost of using the database engine?</p>
</li>
</ul>
<h2 id="heading-conclusion"><strong>Conclusion</strong></h2>
<p>Database engines are an important part of modern data-management tooling. They provide the capabilities needed to handle databases efficiently and effectively, and choosing the right one is essential if you want to store, retrieve, and modify data in a consistent and accurate way. The most popular types of database engines are relational, NoSQL, and in-memory, each with its own strengths and weaknesses. It's important to consider factors such as scalability, security, data consistency, performance, ease of use, and cost when picking a database engine.</p>
]]></content:encoded></item><item><title><![CDATA[Distributed System: Understanding Quorum-Based Systems]]></title><description><![CDATA[In distributed systems, quorum-based approaches are essential mechanisms for maintaining consistency and availability in the face of network partitions or failures. A quorum is a subset of nodes in a distributed system that must agree on a particular...]]></description><link>https://blog.sofwancoder.com/distributed-system-understanding-quorum-based-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/distributed-system-understanding-quorum-based-systems</guid><category><![CDATA[Node.js]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[System Architecture]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sat, 06 May 2023 22:58:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1683410183611/9367d38a-db10-4df4-9e8e-2030147f34b6.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>In distributed systems, quorum-based approaches are essential mechanisms for maintaining consistency and availability in the face of network partitions or failures. A quorum is a subset of nodes in a distributed system that must agree on a particular decision or action before it is considered valid. Quorum-based systems are designed to ensure that a decision or action is not taken unless a sufficient number of nodes agree on it, thereby guaranteeing data consistency and availability. In this article, we will explore the concept of quorum-based systems in detail, including their role in distributed systems, the types of quorums, and how they are implemented.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>As you may already know, distributed systems are made up of many independent components that must coordinate their efforts to achieve a common goal. However, if these nodes are unable to communicate with one another, or if a node fails, the results can be catastrophic. Quorum-based systems address this problem.</p>
<p>A Quorum-Based System is a simple mechanism for ensuring consensus among a group of nodes in a distributed system. A majority of nodes, or a "Quorum," is required to reach a consensus.</p>
<p>Without quorum-based mechanisms, distributed systems struggle to make fair and timely decisions: critical choices may be delayed or made inconsistently. Managing distributed systems is difficult, but with a quorum-based approach in place, conflicting decisions between nodes are avoided.</p>
<h2 id="heading-quorum-based-systems-in-distributed-systems">Quorum-Based Systems in Distributed Systems</h2>
<p>In order to accomplish a goal, multiple servers or nodes in a network coordinate and share data. In these types of systems, it is of the utmost importance that data remain accessible and intact in the case of a node failure or network partition.</p>
<p>Quorum-based systems are one way to keep data consistent and available. In a quorum-based system, a decision or action only takes effect if enough nodes agree on it, which ensures the action is valid and that the system stays consistent and available.</p>
<p>As an illustration, consider a distributed system with ten nodes and a required quorum of six. A decision is not considered valid unless it has the support of at least six of the nodes in the network; only once six or more nodes vote in favour can the decision be carried out.</p>
<p>Quorum-based approaches are particularly useful in distributed systems because individual nodes can fail and the network can become partitioned. It is therefore extremely important to maintain data consistency and availability even when some nodes are offline or unable to communicate with one another.</p>
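<p>The majority check described above is simple to express in code. A minimal sketch, assuming ten nodes and a simple-majority quorum of six:</p>

```typescript
// With 10 nodes, a simple majority quorum is floor(10 / 2) + 1 = 6.
const TOTAL_NODES = 10;
const QUORUM = Math.floor(TOTAL_NODES / 2) + 1;

// A decision is valid only if at least QUORUM nodes acknowledge it.
function decisionIsValid(acks: boolean[]): boolean {
  const agreed = acks.filter(Boolean).length;
  return agreed >= QUORUM;
}

console.log(decisionIsValid([true, true, true, true, true, true, false, false, false, false])); // true
console.log(decisionIsValid([true, true, true, false, false, false, false, false, false, false])); // false
```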
<h2 id="heading-types-of-quorums"><strong>Types of Quorums</strong></h2>
<p>There are several types of quorums used in distributed systems. The most common types are:</p>
<h3 id="heading-read-quorum"><strong>Read Quorum</strong></h3>
<p>In a distributed system, a read quorum is the number of nodes that must agree on a read operation for it to be valid. For example, say we have a distributed system with ten nodes and a read quorum of six. When a read is requested, the system will only return the data if at least six nodes can be reached and agree on its value.</p>
<p>Read quorums help ensure that data retrieved from a distributed system is both correct and available. If fewer than six nodes can be reached, the read is invalid, and the system waits until a quorum can be met.</p>
<h3 id="heading-write-quorum"><strong>Write Quorum</strong></h3>
<p>A write quorum is the number of nodes in a distributed system that must agree on a write operation for it to be valid. For example, say we have a distributed system with ten nodes and a write quorum of six. When a write is requested, the system won't perform it unless it can reach at least six nodes that all accept the new value.</p>
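<p>Read and write quorums interact through a standard intersection property: when R + W &gt; N, every read quorum overlaps every write quorum, so at least one replica in any read set holds the latest committed write. A minimal sketch (the N, R, and W values are illustrative):</p>

```typescript
// Quorum intersection: a read of R replicas is guaranteed to see the latest
// write (made to W replicas out of N) whenever R + W > N.
function quorumsOverlap(n: number, r: number, w: number): boolean {
  return r + w > n;
}

console.log(quorumsOverlap(10, 6, 6)); // true:  6 + 6 > 10, every read sees the write
console.log(quorumsOverlap(10, 3, 6)); // false: a read of 3 replicas might miss it
```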
<h3 id="heading-membership-quorum"><strong>Membership Quorum</strong></h3>
<p>In a distributed system, a membership quorum is the set of nodes that must agree on changes to the system's membership before those changes take effect. Say, for example, we have a distributed system with ten nodes and a membership quorum of six. When a node joins or leaves the system, the change is only valid if at least six nodes agree to it.</p>
<h3 id="heading-configuration-quorum"><strong>Configuration Quorum</strong></h3>
<p>In a distributed system, a configuration quorum is the set of nodes that must agree on changes to the system's configuration before those changes take effect. For example, say we have a distributed system with ten nodes and a configuration quorum of six. When a configuration change is requested, such as changing the number of nodes in the system or the replication factor, the change is only valid if at least six nodes agree to it.</p>
<h2 id="heading-implementing-quorum-based-systems"><strong>Implementing Quorum-Based Systems</strong></h2>
<p>Implementing quorum-based systems in distributed systems requires careful thought about a number of things, such as the size of the system, the number of nodes, and the desired level of consistency and availability.</p>
<p>Using a consensus algorithm, like the <strong>Paxos algorithm</strong> or the <strong>Raft algorithm</strong>, to set up quorum-based systems is one way to do it. These methods make sure that at least a certain number of nodes agree on a decision or action, even if some nodes fail or the network breaks up.</p>
<p>Another approach is a <strong>distributed hash table (DHT)</strong>, a structure that maps keys to values across many nodes. In a <strong>DHT</strong>, a read or write can be performed by contacting only a subset of the system's nodes, with a quorum required to make sure the operation is valid.</p>
<p>No matter what method is used, setting up quorum-based systems requires careful planning and testing to make sure that the system stays consistent and usable even if there are problems or parts of the network are cut off.</p>
<h2 id="heading-quorum-consensus-algorithms">Quorum Consensus algorithms</h2>
<p>Quorum consensus algorithms keep distributed systems consistent and reliable: by exchanging messages, they help the nodes of a distributed system agree on a single value.</p>
<p>Some popular Quorum Consensus algorithms are:</p>
<ul>
<li><p><strong>Paxos</strong> is a protocol used in fault-tolerant distributed systems. It guarantees that at most one value is chosen, and that once a value has been chosen, the system never goes back on it.</p>
</li>
<li><p>The <strong>Raft</strong> algorithm is another prominent one. It was designed to be easier to understand than Paxos by decomposing consensus into distinct subproblems, such as leader election and log replication.</p>
</li>
<li><p><strong>Zab</strong> is another consensus algorithm that is used in Apache ZooKeeper, a popular open-source distributed coordination system.</p>
</li>
<li><p><strong>Viewstamped Replication</strong> is a state-machine replication protocol introduced in 1988.</p>
</li>
</ul>
<p>Even though quorum-based methods are very useful, implementing them is not always easy: coordination is hard to get right, and performance can be difficult to optimise. Still, the benefits generally outweigh the costs, because quorum-based systems protect against incorrect or conflicting results.</p>
<h2 id="heading-examples-of-quorum-based-systems">Examples of Quorum-Based Systems</h2>
<p>Moving on to some examples of systems using quorum-based protocols, we have Cassandra, DynamoDB, HBase, and Consul.</p>
<ul>
<li><p><strong>Cassandra is a distributed NoSQL database</strong> used by large organizations such as Apple and Instagram. It offers tunable, quorum-based consistency levels and uses a modified version of the <strong>Paxos consensus algorithm</strong> for lightweight transactions.</p>
</li>
<li><p>On the other hand, <strong>DynamoDB, a managed NoSQL database</strong> service from AWS, descends from Amazon's Dynamo design and uses <strong>quorum-based replication</strong> for maintaining consistency.</p>
</li>
<li><p><strong>HBase, an open-source, distributed database system</strong>, uses the Hadoop Distributed File System to store data. It relies on Apache ZooKeeper, which was inspired by Google’s Chubby lock service, for distributed coordination.</p>
</li>
<li><p>Lastly, <strong>Consul, a service discovery and configuration tool</strong>, uses a Raft-based consensus algorithm.</p>
</li>
</ul>
<p>These examples show how quorum-based systems are not limited to a specific industry or use case. The range of systems incorporating this protocol shows its applicability to multiple scenarios, from databases to distributed systems.</p>
<h2 id="heading-advantages-of-quorum-based-systems">Advantages of Quorum-Based Systems</h2>
<ol>
<li><p><strong>Consistency</strong>: Quorum-based systems keep distributed systems consistent by needing a certain number of nodes to agree on a decision or action before it can be considered valid.</p>
</li>
<li><p><strong>Availability</strong>: Quorum-based systems make sure that a certain number of nodes are always available to handle requests and make decisions, even if some of the nodes fail or the network is split up.</p>
</li>
<li><p><strong>Flexibility</strong>: Quorum-based systems can be set up to work with different sizes and types of quorums to meet different needs for consistency and availability.</p>
</li>
<li><p><strong>Fault tolerance</strong>: Quorum-based systems can keep working as long as a certain number of nodes are available, even if some of the nodes fail or the network breaks up.</p>
</li>
<li><p><strong>Scalability</strong>: In order to improve throughput and performance, quorum-based systems can scale horizontally by adding more nodes to the system.</p>
</li>
</ol>
<h2 id="heading-disadvantages-of-quorum-based-systems">Disadvantages of Quorum-Based Systems</h2>
<ol>
<li><p><strong>Complexity</strong>: Implementing quorum-based systems can be hard, and they need to be carefully designed and tested to make sure they work well and are always available, even if there are problems or parts of the network that don't work.</p>
</li>
<li><p><strong>Performance</strong>: Quorum-based systems can hurt performance because they need communication between nodes to reach a quorum, which can slow down throughput and increase latency.</p>
</li>
<li><p><strong>Maintenance</strong>: Quorum-based systems need to be maintained on a regular basis to make sure they stay consistent and ready. This can take a lot of time and resources.</p>
</li>
<li><p><strong>Configuration</strong>: It can be hard to set up quorum-based systems because the size and type of the quorum must be carefully chosen to meet particular requirements for consistency and availability.</p>
</li>
<li><p><strong>Security</strong>: Quorum-based systems can be attacked in ways like <a target="_blank" href="https://en.wikipedia.org/wiki/Byzantine_fault"><strong>Byzantine faults</strong></a>, which can hurt the stability of the system and the security of the data. You can read more <a target="_blank" href="https://academy.binance.com/en/articles/byzantine-fault-tolerance-explained">here</a></p>
</li>
</ol>
<h2 id="heading-example-poc-in-nodejstypescript-with-express">Example (PoC) in Node.js/TypeScript with Express</h2>
<p>This PoC implements a simple distributed system that allows clients to read and write data to the system using quorum-based systems. The system uses Express as the web framework and Axios as the HTTP client for communicating with other nodes in the system.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;

<span class="hljs-keyword">const</span> app = express();
<span class="hljs-comment">// Parse JSON request bodies so req.body is populated in the write route</span>
app.use(express.json());
<span class="hljs-keyword">const</span> nodes = [
  <span class="hljs-string">'http://node1.example.com'</span>,
  <span class="hljs-string">'http://node2.example.com'</span>,
  <span class="hljs-string">'http://node3.example.com'</span>,
];

<span class="hljs-keyword">const</span> readQuorumSize = <span class="hljs-number">2</span>;
<span class="hljs-keyword">const</span> writeQuorumSize = <span class="hljs-number">3</span>;

app.get(<span class="hljs-string">'/read/:key'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
  <span class="hljs-keyword">const</span> key = req.params.key;
  <span class="hljs-keyword">const</span> readNodes = nodes.slice(<span class="hljs-number">0</span>, readQuorumSize);

  <span class="hljs-keyword">const</span> values = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.allSettled(
    readNodes.map(<span class="hljs-function">(<span class="hljs-params">node</span>) =&gt;</span> axios.get(<span class="hljs-string">`<span class="hljs-subst">${node}</span>/data/<span class="hljs-subst">${key}</span>`</span>))
  );

  <span class="hljs-comment">// Use a type predicate so TypeScript narrows the settled results</span>
  <span class="hljs-keyword">const</span> validValues = values.filter(
    (value): value <span class="hljs-keyword">is</span> PromiseFulfilledResult&lt;<span class="hljs-built_in">any</span>&gt; =&gt;
      value.status === <span class="hljs-string">'fulfilled'</span> &amp;&amp; value.value.data !== <span class="hljs-literal">undefined</span>
  );

  <span class="hljs-keyword">if</span> (validValues.length &lt; readQuorumSize) {
    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">500</span>).send(<span class="hljs-string">'Not enough nodes available to read data'</span>);
  }

  <span class="hljs-keyword">const</span> data = validValues[<span class="hljs-number">0</span>].value.data;
  res.send(data);
});

app.post(<span class="hljs-string">'/write/:key'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
  <span class="hljs-keyword">const</span> key = req.params.key;
  <span class="hljs-keyword">const</span> value = req.body;
  <span class="hljs-keyword">const</span> writeNodes = nodes.slice(<span class="hljs-number">0</span>, writeQuorumSize);

  <span class="hljs-keyword">const</span> results = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.allSettled(
    writeNodes.map(<span class="hljs-function">(<span class="hljs-params">node</span>) =&gt;</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${node}</span>/data/<span class="hljs-subst">${key}</span>`</span>, value))
  );

  <span class="hljs-keyword">const</span> validResults = results.filter(
    <span class="hljs-function">(<span class="hljs-params">result</span>) =&gt;</span> result.status === <span class="hljs-string">'fulfilled'</span>
  );

  <span class="hljs-keyword">if</span> (validResults.length &lt; writeQuorumSize) {
    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">500</span>).send(<span class="hljs-string">'Not enough nodes available to write data'</span>);
  }

  res.send(<span class="hljs-string">'Data written successfully'</span>);
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server started'</span>);
});
</code></pre>
<p>The system uses a configurable quorum for both read and write operations, which ensures that a quorum of nodes agrees on the operation before it is considered valid. The read quorum size is set to 2 and the write quorum size to 3, meaning that at least 2 nodes must respond for a read operation to be valid, and at least 3 nodes must acknowledge a write for it to be valid.</p>
<p>In the read operation, the system queries a subset of nodes specified by the read quorum size, and if enough valid responses are received, the system returns the value of the data. In the write operation, the system sends the data to a subset of nodes specified by the write quorum size, and if enough valid responses are received, the system returns a success message.</p>
<p><strong>This PoC is just a basic example of a quorum-based system and can be extended and modified to meet specific requirements.</strong></p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In distributed systems, quorum-based systems are an essential mechanism for maintaining consistency and availability. By requiring a quorum of nodes to agree on a particular decision or action, quorum-based systems ensure that the system remains consistent and available, even in the face of node failures or network partitions.</p>
]]></content:encoded></item><item><title><![CDATA[Modulus Sharding in Software Engineering]]></title><description><![CDATA[Modulus sharding is a method used in software engineering to spread data across various servers in a way that improves performance, scalability, and reliability. The method involves dividing data into smaller, easier-to-handle pieces and putting thos...]]></description><link>https://blog.sofwancoder.com/modulus-sharding-in-software-engineering</link><guid isPermaLink="true">https://blog.sofwancoder.com/modulus-sharding-in-software-engineering</guid><category><![CDATA[Databases]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[sharding]]></category><category><![CDATA[algorithms]]></category><category><![CDATA[backend]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sun, 23 Apr 2023 01:31:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1682213365714/bb973134-68be-45bf-974c-fd919e4bfc37.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Modulus sharding is a method used in software engineering to spread data across various servers in a way that improves performance, scalability, and reliability. The method involves dividing data into smaller, easier-to-handle pieces and putting those pieces on different computers, or "shards." This article will give a detailed overview of modulus sharding, including its benefits, challenges, and best practices.</p>
<h2 id="heading-overview-of-modulus-sharding">Overview of Modulus Sharding</h2>
<p>Modulus sharding involves dividing data into shards spread across a predetermined number of servers, or "nodes." Each shard is given a unique identifier, usually an integer value, and the data is assigned to shards based on the remainder of dividing the shard ID by the number of nodes. For example, if we have three nodes and six shards, each node would be responsible for two shards, with the shards distributed as follows:</p>
<ul>
<li><p>Node 1: Shards 1 and 4</p>
</li>
<li><p>Node 2: Shards 2 and 5</p>
</li>
<li><p>Node 3: Shards 3 and 6</p>
</li>
</ul>
<p>This distribution ensures that the shards are spread evenly across the nodes, which helps to improve performance and scalability. Also, because each node is in charge of only a subset of the shards, the system as a whole is more resilient to failures: any one node can fail without affecting the availability of the whole system.</p>
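<p>The distribution above can be computed in a few lines of TypeScript (a sketch using the same illustrative shard and node counts):</p>
<pre><code class="lang-typescript">const nodeCount = 3;
const shardIds = [1, 2, 3, 4, 5, 6];

// Group shards by the node responsible for them, using the remainder of the
// shard ID divided by the node count (remainder 0 maps to the last node).
const assignment = new Map&lt;number, number[]&gt;();
for (const shardId of shardIds) {
  const node = shardId % nodeCount || nodeCount;
  assignment.set(node, [...(assignment.get(node) ?? []), shardId]);
}

for (const [node, shards] of assignment) {
  console.log(`Node ${node}: Shards ${shards.join(' and ')}`);
}
// Node 1: Shards 1 and 4
// Node 2: Shards 2 and 5
// Node 3: Shards 3 and 6
</code></pre>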
<h2 id="heading-benefits-of-modulus-sharding">Benefits of Modulus Sharding</h2>
<p>There are several benefits to using modulus sharding in software engineering:</p>
<h3 id="heading-improved-performance">Improved Performance</h3>
<p>Modulus sharding can enhance the performance of a system by dividing data across numerous nodes, so that each node manages only a smaller portion of the data. Reducing the burden placed on individual nodes in this way helps to improve overall response times.</p>
<h3 id="heading-increased-scalability">Increased Scalability</h3>
<p>Modulus sharding can also boost the scalability of a system, because more nodes can be added as needed to accommodate expanding volumes of data. And since each node is accountable for only a part of the data, additional nodes can be added without negatively affecting the performance of the nodes that are already present.</p>
<h3 id="heading-better-fault-tolerance">Better Fault Tolerance</h3>
<p>A system's fault tolerance can be improved by utilising modulus sharding because any individual node can fail without taking down the overall system. Provided the shards are also replicated, data held by a failed node can be recovered and reassembled from the nodes that are still available.</p>
<h2 id="heading-challenges-of-modulus-sharding">Challenges of Modulus Sharding</h2>
<p>While modulus sharding can provide significant benefits, there are also several challenges to consider when implementing the technique:</p>
<h3 id="heading-data-distribution">Data Distribution</h3>
<p>When dealing with enormous amounts of data, the process of distributing the data among multiple shards can become very complicated. It is essential to give careful consideration to the distribution algorithm to guarantee that the data is dispersed uniformly across all of the shards.</p>
<h3 id="heading-shard-rebalancing">Shard Rebalancing</h3>
<p>If the system continues to expand and undergoes modifications, it is possible that rebalancing the shards may become essential to guarantee that the data will be dispersed uniformly throughout the nodes. This can be a difficult process because it involves moving data between nodes without affecting the system's availability.</p>
<h3 id="heading-query-performance">Query Performance</h3>
<p>It is possible that queries that span many shards would run more slowly than queries that just require data from a single shard because data is dispersed across multiple nodes. It is essential to give careful consideration to the design of queries to reduce the negative effects of shard distribution on the performance of queries.</p>
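<p>The cost difference can be seen by contrasting a single-shard lookup with a scatter-gather query that must fan out to every shard. This sketch assumes a hypothetical per-shard <code>fetch</code> function standing in for the real network call:</p>
<pre><code class="lang-typescript">// Hypothetical per-shard lookup; in practice this is a network round trip.
type ShardQuery&lt;T&gt; = (shardIndex: number) =&gt; Promise&lt;T[]&gt;;

// Keyed query: one round trip to the single shard that owns the key.
async function queryByKey&lt;T&gt;(key: number, shardCount: number, fetch: ShardQuery&lt;T&gt;): Promise&lt;T[]&gt; {
  return fetch(key % shardCount);
}

// Cross-shard query: fan out to all shards and merge, paying shardCount
// round trips and waiting for the slowest shard to respond.
async function queryAllShards&lt;T&gt;(shardCount: number, fetch: ShardQuery&lt;T&gt;): Promise&lt;T[]&gt; {
  const perShard = await Promise.all(
    Array.from({ length: shardCount }, (_, i) =&gt; fetch(i))
  );
  return perShard.flat();
}
</code></pre>
<p>Because the scatter-gather path is only as fast as its slowest shard, query designs that keep hot lookups on a single shard pay off directly in latency.</p>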
<h2 id="heading-example-poc">Example (PoC)</h2>
<p><strong>As an example (PoC),</strong> we'll write code that sets up application-level sharding across four MongoDB databases and uses an Express server to expose a REST API for creating and retrieving user objects. The user objects are stored in the MongoDB shards and distributed across them using a simple shard key based on the user ID.</p>
<ol>
<li><p>Firstly, we'll import the required dependencies:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Import the required packages</span>
 <span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
 <span class="hljs-keyword">import</span> mongoose <span class="hljs-keyword">from</span> <span class="hljs-string">'mongoose'</span>;
</code></pre>
</li>
<li><p>Creating an instance of the Express app:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Create an instance of the Express application</span>
 <span class="hljs-keyword">const</span> app = express();
</code></pre>
</li>
<li><p>Defining an interface/schema for the User object:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Define the User interface that extends the mongoose.Document interface</span>
 <span class="hljs-keyword">interface</span> User <span class="hljs-keyword">extends</span> mongoose.Document {
   id: <span class="hljs-built_in">number</span>;
   name: <span class="hljs-built_in">string</span>;
 }

 <span class="hljs-comment">// Define the schema</span>
 <span class="hljs-keyword">const</span> userSchema = <span class="hljs-keyword">new</span> mongoose.Schema({
   id: <span class="hljs-built_in">Number</span>,
   name: <span class="hljs-built_in">String</span>,
 });
</code></pre>
</li>
<li><p>Initializing an empty array to store connections to each shard:</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">const</span> shards: mongoose.Connection[] = [];
</code></pre>
</li>
<li><p>Defining a function to connect to a shard:</p>
<pre><code class="lang-typescript"> <span class="hljs-keyword">const</span> connectToShard = <span class="hljs-keyword">async</span> (shardNumber: <span class="hljs-built_in">number</span>) =&gt; {
   <span class="hljs-keyword">const</span> shard = mongoose.createConnection(<span class="hljs-string">`mongodb://localhost/users<span class="hljs-subst">${shardNumber}</span>`</span>);
   <span class="hljs-keyword">await</span> shard.once(<span class="hljs-string">'open'</span>, <span class="hljs-function">() =&gt;</span> {
     <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Connected to shard <span class="hljs-subst">${shardNumber}</span>`</span>);
   });
   shards.push(shard);
 };
</code></pre>
</li>
<li><p>Connecting to each shard:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Connect to all four MongoDB shards</span>
 <span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">4</span>; i++) {
   connectToShard(i);
 }
</code></pre>
</li>
<li><p>Defining a function to get the shard for a user:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Define a function to get the MongoDB shard for a given user ID</span>
 <span class="hljs-keyword">const</span> getUserShard = (userId: <span class="hljs-built_in">number</span>): mongoose.Connection =&gt; {
   <span class="hljs-keyword">const</span> shardIndex = userId % shards.length;
   <span class="hljs-keyword">return</span> shards[shardIndex];
 };
</code></pre>
</li>
<li><p>Defining a function to get a UserModel for a specific shard:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Define a function to get the UserModel for a given shard</span>
 <span class="hljs-keyword">const</span> UserModel = (shardIndex: <span class="hljs-built_in">number</span>): mongoose.Model&lt;User&gt; =&gt; {
   <span class="hljs-keyword">const</span> shard = shards[shardIndex];
   <span class="hljs-keyword">return</span> shard.model&lt;User&gt;(<span class="hljs-string">'User'</span>, userSchema);
 };
</code></pre>
</li>
<li><p>Defining a route to get a user by ID:</p>
<pre><code class="lang-typescript"> <span class="hljs-comment">// Define a route to get a user by ID</span>
 app.get(<span class="hljs-string">'/users/:id'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
   <span class="hljs-keyword">const</span> userId = <span class="hljs-built_in">parseInt</span>(req.params.id);
   <span class="hljs-comment">// Get the MongoDB shard for the user ID</span>
   <span class="hljs-keyword">const</span> shardIndex = userId % shards.length;
   <span class="hljs-comment">// Get the UserModel for the shard</span>
   <span class="hljs-keyword">const</span> UserModelForShard = UserModel(shardIndex);
   <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> UserModelForShard.findOne({ id: userId });
   <span class="hljs-keyword">if</span> (!user) {
     res.status(<span class="hljs-number">404</span>).send(<span class="hljs-string">'User not found'</span>);
     <span class="hljs-keyword">return</span>;
   }
   res.send(user);
 });
</code></pre>
<ul>
<li><p>The logic behind <code>const shardIndex = userId % shards.length;</code> is to determine which shard a given user's data should be stored on based on their user ID.</p>
</li>
<li><p>In this case, the <code>userId</code> is used to calculate a modulus value (<code>%</code>) based on the length of the <code>shards</code> array. The modulus operation returns the remainder of dividing the <code>userId</code> by the <code>shards.length</code>.</p>
</li>
<li><p>The resulting modulus value is then used as an index to access the corresponding shard in the <code>shards</code> array. This ensures that each user's data is stored on a specific shard based on their user ID, while also distributing the data evenly across all available shards for horizontal scaling.</p>
</li>
</ul>
</li>
<li><p>Defining a route to create a new user:</p>
<pre><code class="lang-typescript">app.post(<span class="hljs-string">'/users'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
  <span class="hljs-keyword">const</span> user: User = req.body;
  <span class="hljs-comment">// Get the MongoDB shard for the user ID</span>
  <span class="hljs-keyword">const</span> shardIndex = user.id % shards.length;
  <span class="hljs-comment">// Get the UserModel for the shard</span>
  <span class="hljs-keyword">const</span> UserModelForShard = UserModel(shardIndex);
  <span class="hljs-keyword">const</span> createdUser = <span class="hljs-keyword">await</span> UserModelForShard.create(user);
  res.send(createdUser);
});
</code></pre>
</li>
<li><p>Starting the Express server:</p>
<pre><code class="lang-typescript">app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server running on port 3000'</span>);
});
</code></pre>
</li>
</ol>
<h3 id="heading-putting-it-all-together"><strong>Putting it all together!</strong></h3>
<p>The code demonstrates how to connect to a MongoDB shard, how to create a user schema and model, and how to query and create user objects using the appropriate shard.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> mongoose <span class="hljs-keyword">from</span> <span class="hljs-string">'mongoose'</span>;

<span class="hljs-keyword">const</span> app = express();
<span class="hljs-comment">// Parse JSON request bodies so req.body is populated</span>
app.use(express.json());

<span class="hljs-keyword">interface</span> User <span class="hljs-keyword">extends</span> mongoose.Document {
  id: <span class="hljs-built_in">number</span>;
  name: <span class="hljs-built_in">string</span>;
}

<span class="hljs-keyword">const</span> userSchema = <span class="hljs-keyword">new</span> mongoose.Schema({
  id: <span class="hljs-built_in">Number</span>,
  name: <span class="hljs-built_in">String</span>,
});

<span class="hljs-keyword">const</span> shards: mongoose.Connection[] = [];

<span class="hljs-comment">// Define a function to connect to a MongoDB shard</span>
<span class="hljs-keyword">const</span> connectToShard = <span class="hljs-keyword">async</span> (shardNumber: <span class="hljs-built_in">number</span>) =&gt; {
  <span class="hljs-keyword">const</span> shard = mongoose.createConnection(<span class="hljs-string">`mongodb://localhost/users<span class="hljs-subst">${shardNumber}</span>`</span>);
  <span class="hljs-comment">// `once` only registers a listener, so there is nothing to await here</span>
  shard.once(<span class="hljs-string">'open'</span>, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Connected to shard <span class="hljs-subst">${shardNumber}</span>`</span>);
  });
  shards.push(shard);
};

<span class="hljs-comment">// Connect to all four MongoDB shards</span>
<span class="hljs-keyword">for</span> (<span class="hljs-keyword">let</span> i = <span class="hljs-number">0</span>; i &lt; <span class="hljs-number">4</span>; i++) {
  connectToShard(i);
}

<span class="hljs-comment">// Define a function to get the MongoDB shard for a given user ID</span>
<span class="hljs-keyword">const</span> getUserShard = (userId: <span class="hljs-built_in">number</span>): mongoose.Connection =&gt; {
  <span class="hljs-keyword">const</span> shardIndex = userId % shards.length;
  <span class="hljs-keyword">return</span> shards[shardIndex];
};

<span class="hljs-comment">// Define a function to get the UserModel for a given shard</span>
<span class="hljs-keyword">const</span> UserModel = (shardIndex: <span class="hljs-built_in">number</span>): mongoose.Model&lt;User&gt; =&gt; {
  <span class="hljs-keyword">const</span> shard = shards[shardIndex];
  <span class="hljs-keyword">return</span> shard.model&lt;User&gt;(<span class="hljs-string">'User'</span>, userSchema);
};

<span class="hljs-comment">// Define a route to get a user by ID</span>
app.get(<span class="hljs-string">'/users/:id'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
  <span class="hljs-keyword">const</span> userId = <span class="hljs-built_in">parseInt</span>(req.params.id);
  <span class="hljs-comment">// Get the MongoDB shard for the user ID</span>
  <span class="hljs-keyword">const</span> shardIndex = userId % shards.length;
  <span class="hljs-comment">// Get the UserModel for the shard</span>
  <span class="hljs-keyword">const</span> UserModelForShard = UserModel(shardIndex);
  <span class="hljs-keyword">const</span> user = <span class="hljs-keyword">await</span> UserModelForShard.findOne({ id: userId });
  <span class="hljs-keyword">if</span> (!user) {
    res.status(<span class="hljs-number">404</span>).send(<span class="hljs-string">'User not found'</span>);
    <span class="hljs-keyword">return</span>;
  }
  res.send(user);
});

app.post(<span class="hljs-string">'/users'</span>, <span class="hljs-keyword">async</span> (req: Request, res: Response) =&gt; {
  <span class="hljs-keyword">const</span> user: User = req.body;
  <span class="hljs-comment">// Get the MongoDB shard for the user ID</span>
  <span class="hljs-keyword">const</span> shardIndex = user.id % shards.length;
  <span class="hljs-comment">// Get the UserModel for the shard</span>
  <span class="hljs-keyword">const</span> UserModelForShard = UserModel(shardIndex);
  <span class="hljs-keyword">const</span> createdUser = <span class="hljs-keyword">await</span> UserModelForShard.create(user);
  res.send(createdUser);
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server running on port 3000'</span>);
});
</code></pre>
<h2 id="heading-best-practices-for-modulus-sharding">Best Practices for Modulus Sharding</h2>
<p>To ensure that modulus sharding is implemented effectively, there are several best practices that software engineers should follow:</p>
<h3 id="heading-plan-for-growth">Plan for Growth</h3>
<p>Modulus sharding should be designed to handle future growth in both the amount of data and the number of nodes. It's important to think carefully about the distribution algorithm and the way shards are rebalanced to make sure they can scale well.</p>
<h3 id="heading-monitor-performance">Monitor Performance</h3>
<p>It is important to keep an eye on important metrics like query response times, shard distribution, and node health to make sure the system is working well. This can help figure out if there are any problems or speed bottlenecks with certain nodes or shards.</p>
<h3 id="heading-consider-data-access-patterns">Consider Data Access Patterns</h3>
<p>When designing the sharding plan, it's important to think about how the data is accessed. If certain pieces of data are frequently accessed together, it may be best to keep them on the same shard to speed up queries.</p>
<h3 id="heading-use-consistent-hashing">Use Consistent Hashing</h3>
<p>Consistent hashing is a way to spread data across nodes so that shards don't have to be rebalanced as often. This can help make the system more scalable and lessen the effect of adding or taking away nodes.</p>
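<p>As an illustrative sketch (production implementations also place several "virtual nodes" per server on the ring to even out the distribution), a minimal hash ring might look like this:</p>
<pre><code class="lang-typescript">import { createHash } from 'crypto';

// Place a node or key on the ring by hashing it to a 32-bit position.
function ringPosition(value: string): number {
  return createHash('sha256').update(value).digest().readUInt32BE(0);
}

class HashRing {
  // Sorted [position, node] pairs around the ring.
  private ring: Array&lt;[number, string]&gt; = [];

  addNode(node: string): void {
    this.ring.push([ringPosition(node), node]);
    this.ring.sort((a, b) =&gt; a[0] - b[0]);
  }

  removeNode(node: string): void {
    this.ring = this.ring.filter(([, n]) =&gt; n !== node);
  }

  // A key belongs to the first node at or after its position, wrapping around.
  nodeFor(key: string): string {
    const pos = ringPosition(key);
    const match = this.ring.find(([p]) =&gt; p &gt;= pos) ?? this.ring[0];
    return match[1];
  }
}

const ring = new HashRing();
['node-a', 'node-b', 'node-c'].forEach((n) =&gt; ring.addNode(n));
const owner = ring.nodeFor('user:42');
ring.addNode('node-d');
// After adding a node, each key either keeps its owner or moves to the new
// node; there is no cluster-wide reshuffle as with plain modulus placement.
console.log(owner, ring.nodeFor('user:42'));
</code></pre>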
<h3 id="heading-implement-replication">Implement Replication</h3>
<p>It is important to set up data replication across various nodes to improve fault tolerance. This can help make sure that data is still accessible if a node fails.</p>
<h3 id="heading-test-and-validate">Test and Validate</h3>
<p>Before putting a sharded system into production, it's important to test and validate it fully. This can help find problems or performance bottlenecks before they affect end users.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Modulus sharding is a powerful method that can improve the performance, scalability, and fault tolerance of software systems. But it's important to think carefully about the distribution algorithm, the way shards are rebalanced, and related concerns to make sure the implementation works. By following best practices and keeping an eye on performance, software engineers can use modulus sharding to help their systems grow and work well.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding Scalability: Beyond Speed]]></title><description><![CDATA[When discussing the planning and development of software, it is common practice to use the terms "speed" and "scalability" interchangeably. However, these ideas do not refer to the same thing, and it is critical to have a solid understanding of the d...]]></description><link>https://blog.sofwancoder.com/understanding-scalability-beyond-speed</link><guid isPermaLink="true">https://blog.sofwancoder.com/understanding-scalability-beyond-speed</guid><category><![CDATA[scalability]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[software development]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sat, 15 Apr 2023 23:48:59 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1681600751188/39371cc1-3c1c-4c8a-8e09-4a20747ecddb.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>When discussing the planning and development of software, it is common practice to use the terms "speed" and "scalability" interchangeably. However, these ideas do not refer to the same thing, and it is critical to have a solid understanding of the distinctions between them to develop software that is both successful and efficient.</p>
<h2 id="heading-speed">Speed</h2>
<p>Speed refers to the ability of a software system to perform a specific task quickly. For instance, a search engine can be regarded as having a high level of performance if it can provide pertinent results in a matter of milliseconds. Speed plays a crucial role in user interaction with a software system, as it directly impacts their experience and satisfaction. Users have a widespread expectation that software will be quick and responsive; any delays or lags in performance will likely result in irritation and unhappiness on their part.</p>
<h2 id="heading-scalability">Scalability</h2>
<p>Scalability, on the other hand, is the capacity of a software system to manage an expanding volume of work or traffic, such as a growing number of users. Scalability is essential because software systems are frequently intended to expand and transform throughout the course of their lifetimes; as a result, these systems must be able to manage rising demand without crashing or becoming inoperable.</p>
<h2 id="heading-understanding-the-difference">Understanding the difference</h2>
<p>When discussing software systems, it is essential to have an understanding that scalability and speed are two separate and independent ideas. Either <strong>a system can be quick while lacking the ability to scale</strong>, or <strong>it can be scalable while lacking the ability to be quick</strong>. Examining a web application that enables users to upload and share images is a great way to demonstrate this idea, so let's get started.</p>
<p>Imagine that <strong>this app was originally built to support only a small number of users and photos</strong>. In that case, the application might be very good at quickly processing and displaying the images. But if the user base and photo library grow significantly, the system may not keep up with demand: performance degrades, leading to slow response times or even a total system failure. Even though the application is very fast under smaller workloads, it is not scalable. This example illustrates why it matters to distinguish between these two characteristics when assessing the performance of software systems.</p>
<p><strong>Consider the opposite case: a software system that is intended to manage a high volume of traffic but is not optimised for speed.</strong> Such a system might accommodate a huge number of requests and users, yet take a long time to process each request. In this scenario, the system is scalable, but individual users will find it slow.</p>
<h2 id="heading-speed-vs-scalability-in-web-applications">Speed vs Scalability in Web Applications</h2>
<p>Let's consider a web application that allows users to upload and share photos. To make things simple, let's assume that each photo is stored as a file on disk and that the web application simply serves the file to the user when requested.</p>
<p>Here is some code that reads a photo file from the disk and returns it to the user:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> path <span class="hljs-keyword">from</span> <span class="hljs-string">'path'</span>;

<span class="hljs-keyword">const</span> app = express();

app.get(<span class="hljs-string">'/photo/:filename'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> photoPath = path.join(__dirname, <span class="hljs-string">'photos'</span>, req.params.filename);
  res.sendFile(photoPath);
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server is listening on port 3000'</span>);
});
</code></pre>
<p>This code is fast because it simply reads the file from the disk and returns it to the user. However, if the number of users and photos grows significantly, the system may become overwhelmed and slow down or even crash. In this case, the system may not be scalable, even if it is fast for small loads.</p>
<p>To make the system more scalable, we could add an HTTP caching layer so that clients and intermediate caches can reuse responses instead of re-requesting the same file. Here is some TypeScript code that sets caching headers using the express-cache-controller middleware:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> path <span class="hljs-keyword">from</span> <span class="hljs-string">'path'</span>;
<span class="hljs-keyword">import</span> cacheController <span class="hljs-keyword">from</span> <span class="hljs-string">'express-cache-controller'</span>;

<span class="hljs-keyword">const</span> app = express();

app.use(cacheController({ maxAge: <span class="hljs-number">60</span> }));

app.get(<span class="hljs-string">'/photo/:filename'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> photoPath = path.join(__dirname, <span class="hljs-string">'photos'</span>, req.params.filename);
  res.sendFile(photoPath);
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server is listening on port 3000'</span>);
});
</code></pre>
<p>In this code, we've added the express-cache-controller middleware with a <code>maxAge</code> of 60 seconds. The middleware adds a <code>Cache-Control</code> header to the response that tells the client (and any intermediate caches, such as a CDN) how long it may reuse the response. This caching layer can help to improve scalability by reducing repeat file reads and network requests for frequently accessed photos.</p>
<h2 id="heading-speed-vs-scalability-in-database-systems">Speed vs Scalability in Database Systems</h2>
<p>Let's consider a database system that stores information about users and their purchases. To make things simple, let's assume that we have a single table called <code>users</code> with the following schema:</p>
<pre><code class="lang-sql"><span class="hljs-keyword">CREATE</span> <span class="hljs-keyword">TABLE</span> <span class="hljs-keyword">users</span> (
  <span class="hljs-keyword">id</span> <span class="hljs-built_in">INT</span> PRIMARY <span class="hljs-keyword">KEY</span>,
  <span class="hljs-keyword">name</span> <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  email <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  address <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  city <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  state <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">255</span>),
  zip <span class="hljs-built_in">VARCHAR</span>(<span class="hljs-number">10</span>)
);
</code></pre>
<p>Here is some TypeScript code that retrieves a user's information from the database using the mysql2 library:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> mysql <span class="hljs-keyword">from</span> <span class="hljs-string">'mysql2/promise'</span>;

<span class="hljs-keyword">const</span> pool = mysql.createPool({
  host: <span class="hljs-string">'localhost'</span>,
  user: <span class="hljs-string">'root'</span>,
  password: <span class="hljs-string">'password'</span>,
  database: <span class="hljs-string">'database'</span>,
});

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getUserInfo</span>(<span class="hljs-params">userId: <span class="hljs-built_in">number</span></span>) </span>{
  <span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">await</span> pool.getConnection();
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> [rows] = <span class="hljs-keyword">await</span> connection.query(
      <span class="hljs-string">'SELECT name, email, address, city, state, zip FROM users WHERE id = ?'</span>,
      [userId],
    );
    <span class="hljs-keyword">const</span> row = rows[<span class="hljs-number">0</span>];
    <span class="hljs-keyword">if</span> (!row) <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>; <span class="hljs-comment">// no such user</span>
    <span class="hljs-keyword">return</span> {
      name: row.name,
      email: row.email,
      address: row.address,
      city: row.city,
      state: row.state,
      zip: row.zip,
    };
  } <span class="hljs-keyword">finally</span> {
    connection.release();
  }
}
</code></pre>
<p>This code is fast because it simply executes a single SQL query to retrieve the user's information from the database. However, if the number of users and purchases grows significantly, the system may become overwhelmed and slow down or even crash. In this case, the system may not be scalable, even if it is fast for small loads.</p>
<p>To make the system more scalable, we could introduce a caching layer that stores frequently accessed user information in memory. Here is some TypeScript code that implements this caching layer using the node-cache library:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> mysql <span class="hljs-keyword">from</span> <span class="hljs-string">'mysql2/promise'</span>;
<span class="hljs-keyword">import</span> NodeCache <span class="hljs-keyword">from</span> <span class="hljs-string">'node-cache'</span>;

<span class="hljs-keyword">const</span> pool = mysql.createPool({
  host: <span class="hljs-string">'localhost'</span>,
  user: <span class="hljs-string">'root'</span>,
  password: <span class="hljs-string">'password'</span>,
  database: <span class="hljs-string">'database'</span>,
});

<span class="hljs-keyword">const</span> userCache = <span class="hljs-keyword">new</span> NodeCache({ stdTTL: <span class="hljs-number">60</span>, checkperiod: <span class="hljs-number">120</span> });

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">getUserInfo</span>(<span class="hljs-params">userId: <span class="hljs-built_in">number</span></span>) </span>{
  <span class="hljs-keyword">let</span> userInfo = userCache.get(userId);
  <span class="hljs-keyword">if</span> (!userInfo) {
    <span class="hljs-keyword">const</span> connection = <span class="hljs-keyword">await</span> pool.getConnection();
    <span class="hljs-keyword">try</span> {
      <span class="hljs-keyword">const</span> [rows] = <span class="hljs-keyword">await</span> connection.query(
        <span class="hljs-string">'SELECT name, email, address, city, state, zip FROM users WHERE id = ?'</span>,
        [userId],
      );
      <span class="hljs-keyword">const</span> row = rows[<span class="hljs-number">0</span>];
      <span class="hljs-keyword">if</span> (!row) <span class="hljs-keyword">return</span> <span class="hljs-literal">null</span>; <span class="hljs-comment">// no such user</span>
      userInfo = {
        name: row.name,
        email: row.email,
        address: row.address,
        city: row.city,
        state: row.state,
        zip: row.zip,
      };
      userCache.set(userId, userInfo);
    } <span class="hljs-keyword">finally</span> {
      connection.release();
    }
  }
  <span class="hljs-keyword">return</span> userInfo;
}
</code></pre>
<p>In this code, we've added a caching layer using the node-cache library. This caching layer stores user information in memory for a specified period (60 seconds in this example). If the requested user information is in the cache, we return it immediately without querying the database. Otherwise, we query the database and store the result in the cache before returning it to the client. This caching layer can help to improve scalability by reducing the number of database queries and network requests.</p>
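<p>To see the pattern in isolation, here is a minimal sketch (with hypothetical names, not part of the article's stack) of the same cache-aside idea the node-cache example uses: check the cache first, fall back to a loader on a miss, and expire entries after a TTL. The injectable <code>now</code> parameter exists only so expiry can be exercised without waiting.</p>

```typescript
// Minimal TTL cache sketch illustrating the cache-aside pattern.
// The `now` argument is injectable so expiry is testable without sleeping.
class TtlCache<V> {
  private store = new Map<string, { value: V; expiresAt: number }>();

  constructor(private ttlMs: number) {}

  get(key: string, now: number = Date.now()): V | undefined {
    const entry = this.store.get(key);
    if (!entry) return undefined;
    if (now >= entry.expiresAt) {
      this.store.delete(key); // expired: evict and report a miss
      return undefined;
    }
    return entry.value;
  }

  set(key: string, value: V, now: number = Date.now()): void {
    this.store.set(key, { value, expiresAt: now + this.ttlMs });
  }
}

// Cache-aside: consult the cache, fall back to the loader on a miss.
async function cached<V>(
  cache: TtlCache<V>,
  key: string,
  loader: () => Promise<V>,
): Promise<V> {
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const value = await loader();
  cache.set(key, value);
  return value;
}
```

<p>The MySQL example above is exactly this shape, with the database query playing the role of the loader.</p>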
<h2 id="heading-real-world-examples">Real-World Examples</h2>
<p>To illustrate this concept further, let's look at some real-world examples.</p>
<h3 id="heading-example-1-social-media-platforms">Example 1: Social Media Platforms</h3>
<p>Social media platforms such as Facebook, Twitter, and Instagram are examples of software systems that need to be both fast and scalable. These platforms need to be fast to provide a good user experience, and they need to be scalable to handle the large number of users and data that they generate.</p>
<p>For example, Facebook has over 2.8 billion monthly active users, and it needs to be able to handle a huge amount of traffic and data. To achieve this, Facebook uses a variety of techniques to improve scalability, including distributed systems, caching, load balancing, and sharding. These techniques allow Facebook to handle a huge amount of data and traffic, but they also introduce some latency in the system. In other words, Facebook may not always be the fastest platform, but it is designed to be highly scalable.</p>
<h3 id="heading-example-2-e-commerce-platforms">Example 2: E-commerce Platforms</h3>
<p>E-commerce platforms such as Amazon and eBay also need to be both fast and scalable. These platforms need to be fast to provide a good user experience, and they need to be scalable to handle the large number of products and transactions that they generate.</p>
<p>For example, Amazon is one of the largest e-commerce platforms in the world, and it needs to be able to handle a huge amount of traffic and data. To achieve this, Amazon uses a variety of techniques to improve scalability, including distributed systems, caching, load balancing, and partitioning. These techniques allow Amazon to handle a huge amount of data and traffic, but they also introduce some latency in the system. In other words, Amazon may not always be the fastest platform, but it is designed to be highly scalable.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, speed and scalability are two important but distinct concepts in software design and development. A system may be fast for small loads but may not be scalable when the load increases significantly. To make a system more scalable, we may need to introduce additional layers such as caching, load balancing, or sharding. These layers may introduce additional complexity and overhead, but they can help to improve scalability and ensure that the system can handle increased loads in the future.</p>
]]></content:encoded></item><item><title><![CDATA[Real-Time Messaging Protocol (RTMP/s)]]></title><description><![CDATA[RTMP is a widely used protocol for streaming audio, video, and data over the Internet in real time. It is particularly well-suited for applications that require low latency, such as live streaming events, online gaming, and video conferencing. RTMPS ...]]></description><link>https://blog.sofwancoder.com/real-time-messaging-protocol-rtmps</link><guid isPermaLink="true">https://blog.sofwancoder.com/real-time-messaging-protocol-rtmps</guid><category><![CDATA[messaging]]></category><category><![CDATA[protocols]]></category><category><![CDATA[data]]></category><category><![CDATA[backend]]></category><category><![CDATA[networking]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sun, 26 Mar 2023 22:10:11 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1679868471334/fc7d8aba-3c42-4993-8f3e-054171a3227e.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>RTMP is a widely used protocol for streaming audio, video, and data over the Internet in real time. It is particularly well-suited for applications that require low latency, such as live streaming events, online gaming, and video conferencing. RTMPS is a secure version of the protocol that adds an additional layer of security by encrypting the data being transmitted between the client and the server.</p>
<p>In this article, we will discuss RTMP in detail, its architecture, the advantages and disadvantages of RTMP, and its use cases.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>Real-time Messaging Protocol (RTMP) is a streaming protocol that is designed to deliver video, audio, and other types of data in real time over the internet. Developed by Macromedia, it was first released in 2002 and is now owned by Adobe Systems. RTMP is widely used for live streaming and video-on-demand (VOD) applications.</p>
<h2 id="heading-what-is-rtmp">What is RTMP?</h2>
<p>RTMP (Real-Time Messaging Protocol) is a streaming protocol designed to deliver audio, video, and other types of data in real time over the internet. It supports low-latency streaming, high-quality audio and video, adaptive bitrate streaming, and encryption for the secure transmission of data, and it is commonly used for live streaming, video-on-demand, webinars, and gaming applications.</p>
<p>RTMP is a client-server protocol, which means that the client (usually a Flash player or a web browser) establishes a connection with the server, and the server sends the data to the client over the established connection. The client can then display the data (e.g., video or audio) in real time as it is received.</p>
<h2 id="heading-why-rtmp">Why RTMP?</h2>
<p>RTMP was originally developed to support the real-time streaming of video and audio data between a Flash player and a server, and it is still widely used today for this purpose. However, the protocol has also been adopted by many other streaming platforms and applications, including YouTube, Facebook, and Twitch.</p>
<p>Before the development of RTMP, traditional HTTP-based streaming protocols had significant latency, which made them unsuitable for live-streaming applications. RTMP solved this problem by enabling low-latency streaming, which made it possible to deliver real-time content to viewers reliably and efficiently. Today, RTMP remains a popular choice for live streaming and video-on-demand applications and is widely used by broadcasters, content providers, and businesses.</p>
<h2 id="heading-rtmps">RTMPs</h2>
<p>In addition to RTMP, there is also a secure variant of the protocol called RTMPS (RTMP over a Secure Sockets Layer). RTMPS uses the same underlying protocol as RTMP but adds a layer of security by encrypting the data transmitted between the client and the server using SSL/TLS. This helps protect against man-in-the-middle attacks and other forms of data tampering.</p>
<h2 id="heading-rtmp-architecture">RTMP Architecture</h2>
<p>RTMP uses a client-server architecture where the client sends a request to the server to establish a connection. Once the connection is established, the server sends the requested data to the client. The client can also send data to the server, such as commands and user input.</p>
<p>The RTMP protocol consists of several components:</p>
<ol>
<li><p><strong>RTMP Client:</strong> This is the application that sends the request to the RTMP server to establish a connection and receive the streaming data. The RTMP client can be a web browser, a desktop application, a mobile app, or any other device that can connect to the internet and receive streaming data.</p>
</li>
<li><p><strong>RTMP Server:</strong> This is the server that receives the request from the RTMP client and sends the streaming data back to the client. The RTMP server can be a dedicated server or a cloud-based server, and it is responsible for managing the connection, handling the streaming data, and delivering it to the client.</p>
</li>
<li><p><strong>RTMP Protocol:</strong> This is the communication protocol used between the RTMP client and the RTMP server. It defines the format of the streaming data, how it is transmitted over the internet, and how it is processed by the client and server. RTMP runs over TCP (Transmission Control Protocol), typically on port 1935, and it supports various codecs for audio and video compression.</p>
</li>
</ol>
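<p>To make the client-server exchange concrete, here is a hedged sketch of the first bytes an RTMP client sends during the handshake (C0 and C1), as described in Adobe's RTMP specification. Actually opening the TCP connection on port 1935 and reading S0/S1/S2 back is omitted, and the function names are illustrative:</p>

```typescript
import { randomFillSync } from 'crypto';

const RTMP_VERSION = 3;      // plain (unencrypted) RTMP
const HANDSHAKE_SIZE = 1536; // C1/S1/C2/S2 are all 1536 bytes

function buildC0(): Buffer {
  // C0 is a single byte carrying the protocol version.
  return Buffer.from([RTMP_VERSION]);
}

function buildC1(timestamp = 0): Buffer {
  // C1: 4-byte timestamp, 4 zero bytes, then 1528 bytes of random filler.
  const c1 = Buffer.alloc(HANDSHAKE_SIZE);
  c1.writeUInt32BE(timestamp >>> 0, 0);
  // bytes 4..7 are left as zero per the specification
  randomFillSync(c1, 8);
  return c1;
}
```

<p>After the client sends C0 and C1, the server answers with S0, S1, and S2, the client echoes back C2, and only then does chunked audio/video data begin to flow.</p>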
<h2 id="heading-advantages">Advantages</h2>
<ol>
<li><p><strong>Low Latency</strong>: RTMP has very low latency, which makes it ideal for live-streaming applications. Latency is the delay between the time a video is captured and the time it is displayed on the screen. Low latency means that viewers can watch live events in real time without any noticeable delay.</p>
</li>
<li><p><strong>High Quality</strong>: RTMP supports high-quality video and audio streams, which makes it ideal for broadcasting high-quality content.</p>
</li>
<li><p><strong>Adaptive Bitrate</strong>: RTMP supports adaptive bitrate streaming, which means that the video quality can be adjusted based on the user's internet connection speed. This ensures that users with slow internet connections can still watch the content without buffering.</p>
</li>
<li><p><strong>Security</strong>: RTMP supports encryption, which makes it secure for transmitting sensitive data.</p>
</li>
<li><p><strong>Cross-platform Support</strong>: RTMP is supported by most major operating systems, including Windows, Mac, and Linux.</p>
</li>
</ol>
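<p>The adaptive bitrate idea above can be sketched as a small selection function: given the renditions a server offers and an estimate of the viewer's bandwidth, pick the highest-bitrate rendition that still fits. This is an illustrative toy, not part of any RTMP API; real players add buffering heuristics, hysteresis, and bandwidth smoothing on top:</p>

```typescript
// Toy client-side rendition picker for adaptive bitrate streaming.
interface Rendition {
  label: string;
  bitrateKbps: number;
}

function pickRendition(renditions: Rendition[], bandwidthKbps: number): Rendition {
  if (renditions.length === 0) throw new Error('no renditions offered');
  // Sort ascending so we can walk up to the best rendition that fits.
  const sorted = [...renditions].sort((a, b) => a.bitrateKbps - b.bitrateKbps);
  let choice = sorted[0]; // fall back to the lowest quality
  for (const r of sorted) {
    if (r.bitrateKbps <= bandwidthKbps) choice = r;
  }
  return choice;
}
```

<p>A viewer on a slow connection gets the lowest rendition rather than a stalled stream, which is exactly the buffering-avoidance property described above.</p>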
<p>In addition to its low latency and high-bandwidth capabilities, RTMP also has several other features that make it a popular choice for streaming applications. These include:</p>
<ul>
<li><p><strong>Protocol multiplexing:</strong> RTMP can multiplex multiple streams of data over a single connection, which allows for efficient use of network resources.</p>
</li>
<li><p><strong>Stream control:</strong> RTMP provides several controls that can be used to adjust the quality and resolution of a stream in real time, based on the available bandwidth and other factors.</p>
</li>
<li><p><strong>Encryption:</strong> RTMP supports encryption of the data being transmitted between the client and the server, which helps to protect against man-in-the-middle attacks and other forms of data tampering.</p>
</li>
<li><p><strong>Metadata:</strong> RTMP supports the inclusion of metadata with a stream, which can be used to provide information about the stream, such as the title, description, and other metadata.</p>
</li>
</ul>
<h2 id="heading-example">Example</h2>
<p>Here is an example of how you might implement RTMP streaming in Node.js using TypeScript and the Express web framework:</p>
<p>First, install the required dependency (you will also need <code>ffmpeg</code> available on your system to publish a stream):</p>
<pre><code class="lang-bash">npm install node-media-server
</code></pre>
<p>Next, create a new TypeScript file and import the dependencies:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> NodeMediaServer <span class="hljs-keyword">from</span> <span class="hljs-string">'node-media-server'</span>;
</code></pre>
<p>Then, configure and start the NodeMediaServer:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">const</span> config = {
  rtmp: {
    port: <span class="hljs-number">1935</span>,
    chunk_size: <span class="hljs-number">60000</span>,
    gop_cache: <span class="hljs-literal">true</span>,
    ping: <span class="hljs-number">60</span>,
    ping_timeout: <span class="hljs-number">30</span>
  },
  http: {
    port: <span class="hljs-number">8000</span>,
    allow_origin: <span class="hljs-string">'*'</span>
  }
};

<span class="hljs-keyword">const</span> nms = <span class="hljs-keyword">new</span> NodeMediaServer(config);
nms.run();
</code></pre>
<p>Next, publish a stream using <code>ffmpeg</code>:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># video file with H.264 video and AAC audio:</span>
ffmpeg -re -i INPUT_FILE_NAME -c copy -f flv rtmp://localhost/live/STREAM_NAME

<span class="hljs-comment"># video file that is encoded in other audio/video format</span>
ffmpeg -re -i INPUT_FILE_NAME -c:v libx264 -preset veryfast -tune zerolatency -c:a aac -ar 44100 -f flv rtmp://localhost/live/STREAM_NAME
</code></pre>
<p>Finally, access the stream over RTMP or HTTP-FLV:</p>
<pre><code class="lang-bash"><span class="hljs-comment"># RTMP</span>
rtmp://localhost/live/STREAM_NAME
<span class="hljs-comment"># http-flv</span>
http://localhost:8000/live/STREAM_NAME.flv
</code></pre>
<p>More context on the <a target="_blank" href="https://github.com/illuspas/Node-Media-Server">NodeMediaServer Package here</a>.</p>
<h2 id="heading-use-cases-for-rtmp">Use Cases for RTMP</h2>
<ol>
<li><p><strong>Live Streaming:</strong> RTMP is commonly used for live-streaming applications, such as sports events, concerts, and news broadcasts.</p>
</li>
<li><p><strong>Video-on-Demand:</strong> RTMP is also used for video-on-demand applications, where users can watch pre-recorded videos.</p>
</li>
<li><p><strong>Webinars:</strong> RTMP is commonly used for webinars, where presenters can stream live video and interact with the audience in real time.</p>
</li>
<li><p><strong>Gaming:</strong> RTMP is used for streaming gaming content, such as live gameplay and tournaments.</p>
</li>
</ol>
<p>In recent years, there has been a shift away from RTMP towards other streaming protocols, such as HTTP Live Streaming (HLS) and Dynamic Adaptive Streaming over HTTP (DASH). Despite these shifts towards other protocols, RTMP is still widely used in the streaming industry and is likely to remain a popular choice for many applications in the coming years.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>RTMP is widely used in the streaming industry because it is a reliable and efficient protocol for streaming high-quality video and audio over the Internet. It is particularly well-suited for applications that require low latency, such as live streaming events, online gaming, and video conferencing.</p>
]]></content:encoded></item><item><title><![CDATA[Internet Control Message Protocol (ICMP)]]></title><description><![CDATA[ICMP (Internet Control Message Protocol) is an Internet Standard protocol used for network health and control, error reporting, network diagnostics, and monitoring. It allows network devices to request information from each other, find the source of ...]]></description><link>https://blog.sofwancoder.com/internet-control-message-protocol-icmp</link><guid isPermaLink="true">https://blog.sofwancoder.com/internet-control-message-protocol-icmp</guid><category><![CDATA[networking]]></category><category><![CDATA[internet]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Computer Science]]></category><category><![CDATA[backend developments]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sun, 19 Mar 2023 14:47:52 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1679236959487/e1da736d-ccef-4ae2-92f1-0d4603605295.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>ICMP (Internet Control Message Protocol) is an Internet Standard protocol used for network health and control, error reporting, network diagnostics, and monitoring. It allows network devices to request information from each other, find the source of a network problem, and monitor the health of a network. ICMP runs on top of the Internet Protocol (IP).</p>
<h2 id="heading-what-is-icmp">What is ICMP?</h2>
<p>ICMP is an abbreviation for Internet Control Message Protocol. It is used to monitor and manage networks and to report errors, and it is one of the Internet's fundamental protocols. ICMP is specified as part of the TCP/IP stack and is therefore independent of any particular application. Every server, router, and switch on the Internet uses ICMP. The protocol is used to diagnose and monitor networks, as well as to report and handle errors, and it works in tandem with other protocols such as TCP, UDP, and IP.</p>
<h2 id="heading-purpose-of-icmp">Purpose of ICMP</h2>
<p>Network administrators and IT support staff rely heavily on ICMP for reporting errors and troubleshooting connections. If an error occurs while sending or receiving an IP packet, a network device can use ICMP to notify the other end of the connection. Besides determining whether two devices are communicating, ICMP messages can be used to identify and resolve connectivity problems. On top of that, network managers use ICMP to track down the source of slowdowns or malfunctions in their networks.</p>
<h2 id="heading-importance-of-icmp">Importance of ICMP</h2>
<p>Within the realm of computer networking, ICMP is responsible for several essential tasks. The following are some of its most important roles:</p>
<h3 id="heading-error-reporting">Error Reporting</h3>
<p>ICMP is the protocol used to report errors and problems that occur while IP packets are being transmitted. If a packet is lost, or if a router encounters an error while processing a packet, ICMP messages can be used to inform the sender about the problem.</p>
<h3 id="heading-troubleshooting-network">Troubleshooting Network</h3>
<p>ICMP messages can be used to verify connectivity between devices on a network and to diagnose network problems. For instance, the ping utility sends an ICMP Echo Request message to a device and waits for an Echo Reply. If the receiving device is reachable and able to respond, it sends back an Echo Reply message; if not, the sender can conclude that there is a problem with the connection.</p>
<p><strong>Traceroute is another tool that relies on ICMP messages</strong> to diagnose network problems. It sends a series of packets toward a destination device, each with an increasing TTL value, and each router along the way replies with an ICMP Time Exceeded message when the TTL runs out. By looking at which routers reply and how long they take, network managers can map the path packets take and spot problems along the way.</p>
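<p>The TTL mechanics behind traceroute can be modelled in a few lines. The sketch below is a toy simulation, not a real network probe (the addresses and function names are made up for illustration); it only shows why an increasing TTL reveals one router per probe:</p>

```typescript
// Toy model of traceroute: each probe carries an increasing TTL, and the
// router where the TTL expires answers with an ICMP Time Exceeded message,
// revealing one hop of the path. The final probe reaches the destination,
// which answers with an Echo Reply instead.
function probeWithTtl(route: string[], ttl: number): { from: string; type: string } {
  let remaining = ttl;
  for (let hop = 0; hop < route.length; hop++) {
    remaining--; // each router decrements the TTL
    const isDestination = hop === route.length - 1;
    if (remaining === 0 && !isDestination) {
      return { from: route[hop], type: 'Time Exceeded' };
    }
    if (isDestination && remaining >= 0) {
      return { from: route[hop], type: 'Echo Reply' };
    }
  }
  throw new Error('unreachable');
}

function traceroute(route: string[]): string[] {
  const hops: string[] = [];
  for (let ttl = 1; ttl <= route.length; ttl++) {
    hops.push(probeWithTtl(route, ttl).from);
  }
  return hops;
}
```

<p>Running the model over a three-hop route reports each router in order, which is exactly the hop-by-hop picture a real traceroute prints.</p>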
<h3 id="heading-path-mtu-discovery">Path MTU Discovery</h3>
<p>ICMP can be used to discover the maximum transmission unit (MTU) of the network path between two devices. This ensures that packets are not fragmented along the way, which improves network performance and lowers the chance of packet loss. Using ICMP messages, a sender can learn the MTU of a network path and size its packets to match.</p>
<h3 id="heading-traffic-management">Traffic Management</h3>
<p>ICMP messages can be used to manage network traffic by letting routers signal each other and adjust how packets flow, keeping the network from becoming congested. For example, a router could use the ICMP Source Quench message to tell a sender to slow down its transmission rate when the network is busy (although Source Quench has since been deprecated by RFC 6633).</p>
<h3 id="heading-security">Security</h3>
<p>ICMP messages can also help detect and mitigate some types of network attacks, such as denial-of-service (DoS) attacks and IP spoofing attacks. For example, an ICMP Echo Request flood attack sends a large number of Echo Request messages to a target device, overwhelming it and degrading the network. By monitoring ICMP traffic, network managers can detect and block these kinds of attacks.</p>
<h3 id="heading-ipv6-neighbor-discovery">IPv6 Neighbor Discovery</h3>
<p>ICMPv6 is used by IPv6 devices for neighbour discovery, which is the process of discovering other devices on a network. ICMPv6 messages are used to identify and communicate with other devices on the same network segment, which is essential for IPv6 network operations.</p>
<h2 id="heading-icmp-message-types">ICMP Message Types</h2>
<p>ICMP messages come in different types, and each type serves a unique purpose. Some of the common ICMP message types include:</p>
<h3 id="heading-echo-requestreply">Echo Request/Reply</h3>
<p>An ICMP Echo Request/Reply exchange, commonly known as a "ping," is a straightforward network diagnostic tool that enables one device to check its network connectivity to another device.</p>
<p>The process works as follows:</p>
<ol>
<li><p>An ICMP Echo Request packet is sent from the starting device to the IP address of the receiving device. This packet contains a unique identifier and a sequence number.</p>
</li>
<li><p>After receiving the ICMP Echo Request packet, the receiver responds with an ICMP Echo Reply packet. This reply carries the same identifier and sequence number as the request.</p>
</li>
<li><p>Upon receiving the ICMP Echo Reply packet, the initiating device can determine whether the destination device is reachable and responsive.</p>
</li>
</ol>
<p>If the initiating device doesn't get an ICMP Echo Reply packet within a certain amount of time, it can assume that the target device is not reachable or is having trouble connecting.</p>
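<p>The identifier and sequence number mentioned above live in a small, fixed header. Below is a hedged sketch of the wire format of an ICMP Echo Request (type 8, code 0) together with the RFC 1071 Internet checksum that covers it; actually transmitting the packet would require a raw socket and elevated privileges, which is omitted:</p>

```typescript
// RFC 1071 Internet checksum: one's complement of the one's complement
// sum of the data taken as big-endian 16-bit words.
function internetChecksum(data: Buffer): number {
  let sum = 0;
  for (let i = 0; i < data.length; i += 2) {
    // A trailing odd byte is padded with zero.
    const word = (data[i] << 8) + (i + 1 < data.length ? data[i + 1] : 0);
    sum += word;
  }
  // Fold the carries back in (one's complement addition).
  while (sum > 0xffff) sum = (sum & 0xffff) + (sum >>> 16);
  return ~sum & 0xffff;
}

function buildEchoRequest(id: number, seq: number, payload: Buffer): Buffer {
  const packet = Buffer.alloc(8 + payload.length);
  packet.writeUInt8(8, 0);      // type 8: Echo Request
  packet.writeUInt8(0, 1);      // code 0
  packet.writeUInt16BE(0, 2);   // checksum placeholder
  packet.writeUInt16BE(id, 4);  // identifier
  packet.writeUInt16BE(seq, 6); // sequence number
  payload.copy(packet, 8);
  packet.writeUInt16BE(internetChecksum(packet), 2);
  return packet;
}
```

<p>A handy property of this checksum is that recomputing it over a correctly checksummed packet yields zero, which is how receivers validate incoming ICMP messages.</p>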
<h3 id="heading-destination-unreachable">Destination Unreachable</h3>
<p>This type of message is generated when a packet cannot be delivered to its specified destination. The message provides details about why delivery failed, such as a network error or an unreachable host or port.</p>
<h3 id="heading-time-exceeded">Time Exceeded</h3>
<p>This message type is sent when a packet is discarded because its time-to-live (TTL) value has expired. The TTL of a packet is reduced by one at each router it passes through; once it reaches zero, the packet is discarded, and the discarding router sends a Time Exceeded message back to the sender.</p>
<h3 id="heading-redirect">Redirect</h3>
<p>A router sends this message type to inform a device that a better route to a given destination is available.</p>
<h3 id="heading-router-advertisement-solicitation">Router Advertisement/ Solicitation</h3>
<p>Routers use these message types to advertise their presence and share information about the topology of the network.</p>
<h2 id="heading-applications-of-icmp">Applications of ICMP</h2>
<p>ICMP is primarily used for error reporting and network troubleshooting, but it also has several other applications, which include:</p>
<h3 id="heading-finding-out-which-host-is-responsible-for-a-network-problem">Finding Out Which Host Is Responsible For A Network Problem</h3>
<p>Locating the source of a network problem is one of the most important uses of ICMP. By sending a packet with a time-to-live (TTL) value of 1, a network administrator forces the first router on the path to discard it and return an ICMP Time Exceeded message. That error message carries the router's IP address, so the administrator learns exactly which device handled the packet; by increasing the TTL one step at a time, each successive hop can be identified until the failing device is found.</p>
<h3 id="heading-network-monitoring-and-reporting">Network Monitoring and Reporting</h3>
<p>ICMP can be used to monitor the health of a network. For example, you can use ICMP to check whether a host is reachable or if a network device is up and running. When you send a request to a host, it will generate a response to let you know that everything is okay or if there is a problem.</p>
<p>Three main ICMP message types can be used for monitoring and reporting:</p>
<ul>
<li><p><strong>Echo request (ping)</strong> - This is used to check if a host is up and running. This can be used to look for hosts that are unreachable on a network. It can also be used to find out how long a host takes to respond to a ping request.</p>
</li>
<li><p><strong>Echo reply</strong> - This is used to respond to an echo request. It tells you when a host is up and running.</p>
</li>
<li><p><strong>Destination unreachable</strong> - This can be used to let you know that a host is down, or that there is a network problem that causes the host to be unreachable.</p>
</li>
</ul>
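<p>At its core, a monitoring probe only needs to map whatever comes back (or a timeout) onto a health status. The type numbers below are the standard IPv4 ICMP assignments; the <code>classify</code> helper itself is a hypothetical sketch:</p>

```python
ECHO_REPLY = 0        # standard IPv4 ICMP type numbers
DEST_UNREACHABLE = 3
TIME_EXCEEDED = 11

def classify(icmp_type):
    """Map the ICMP type of a probe's response to a health status."""
    if icmp_type == ECHO_REPLY:
        return "up"
    if icmp_type == DEST_UNREACHABLE:
        return "down or unreachable"
    if icmp_type is None:         # nothing came back before the timeout
        return "no response"
    return "unknown"
```
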
<h3 id="heading-network-tracing">Network Tracing</h3>
<p>ICMP also lets a host trace the path that data packets take through a network. A traceroute tool sends a series of probes with increasing TTL values: the first probe (TTL 1) is dropped by the first router, which returns an ICMP Time Exceeded message identifying itself; the second probe (TTL 2) reveals the second router, and so on, until a reply comes back from the destination itself. The collected replies map out every hop along the route.</p>
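<p>The common TTL-based tracing technique, in which each TTL-limited probe draws a reply from one hop further along the path, can be sketched as a simulation (the router names are made up):</p>

```python
def traceroute(path_routers, max_ttl=30):
    """Simulated traceroute: each TTL-limited probe reveals one more hop."""
    hops = []
    for ttl in range(1, max_ttl + 1):
        if ttl <= len(path_routers):
            hops.append(path_routers[ttl - 1])  # Time Exceeded from this router
        else:
            hops.append("destination")          # Echo Reply from the target
            break
    return hops
```
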
<h2 id="heading-conclusion">Conclusion</h2>
<p>ICMP is a critical protocol in computer networking with many applications. It plays a vital role in error reporting and network troubleshooting, and it also has several other applications, such as path MTU discovery, traffic management, security, and IPv6 neighbour discovery. Without ICMP, network administrators would have a much harder time diagnosing and resolving network issues, optimizing network performance, and maintaining network security.</p>
]]></content:encoded></item><item><title><![CDATA[Authentication and Identity Validation]]></title><description><![CDATA[Authentication and identity validation are important concepts in software engineering, as they ensure that only authorized users have access to certain resources or systems. In this article, we will explore the basics of authentication and identity v...]]></description><link>https://blog.sofwancoder.com/authentication-and-identity-validation</link><guid isPermaLink="true">https://blog.sofwancoder.com/authentication-and-identity-validation</guid><category><![CDATA[identity-management]]></category><category><![CDATA[identity platform]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[backend]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sat, 28 Jan 2023 19:23:40 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1674931921714/0bad6249-a723-4106-89c6-58abf05f679a.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Authentication and identity validation are important concepts in software engineering, as they ensure that only authorized users have access to certain resources or systems. In this article, we will explore the basics of authentication and identity validation, and discuss some best practices for implementing these security measures.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>When you log into your bank, social media account, or email, you are usually prompted to provide some sort of verification such as a password or a secret answer. This is known as “authentication”.</p>
<p>When you are building an authentication system from scratch, you need identity validation as part of your user onboarding process. With identity validation, we can check what information about the user is publicly available and how trustworthy those sources are.</p>
<p>For this article, I've consulted <a target="_blank" href="https://www.linkedin.com/in/ibraheem-zulkifli/">a DevRel expert</a> from <a target="_blank" href="https://myidentitypass.com">IdentityPass</a>—a company that offers a suite of products to help you verify and gain deeper insights about your customers/business to stay compliant and avoid fraudulent activities— for tips to consider when dealing with Identity-aware systems.</p>
<h2 id="heading-what-is-authentication">What is Authentication?</h2>
<p>Authentication is the process of verifying the identity of a user, device, or system. It is typically done through the use of credentials, such as a username and password, biometric data, or a token. When you log into a website, app or other digital service, you are authenticating yourself with that service.</p>
<p>In other words, authentication is how users prove their identity to a system. When you log into your email account, you prove that you own the account by entering your password.</p>
<h2 id="heading-types-of-authentication">Types of Authentication</h2>
<p>The goal of authentication is to confirm that the user or device attempting to access a system or resource is who or what they claim to be. There are several types of authentication methods, including:</p>
<h3 id="heading-knowledge-based-authentication-what-you-know">Knowledge-based authentication: What you know</h3>
<p>This type of authentication relies on the user being able to provide a piece of information that only they should know, such as a password or a personal identification number (PIN).</p>
<h3 id="heading-possession-based-authentication-what-you-have">Possession-based authentication: What you have</h3>
<p>This type of authentication requires the user to present a physical object that they possess, such as a security token or a smart card.</p>
<h3 id="heading-inherence-based-authentication-what-you-are">Inherence-based authentication: What you are</h3>
<p>This type of authentication uses biometric data, such as fingerprints, facial recognition, or voice recognition, to verify the identity of the user.</p>
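<p>For the knowledge factor, the golden rule is to store a salted hash rather than the secret itself. A minimal sketch using Python's standard library (the iteration count here is illustrative, not a recommendation):</p>

```python
import hashlib
import hmac
import os

def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a salted hash; the plain password is never stored."""
    return hashlib.pbkdf2_hmac("sha256", password.encode(), salt, 100_000)

def verify_password(stored: bytes, salt: bytes, attempt: str) -> bool:
    """Constant-time comparison avoids leaking information via timing."""
    return hmac.compare_digest(stored, hash_password(attempt, salt))

salt = os.urandom(16)                        # a fresh random salt per user
stored = hash_password("correct horse", salt)
```
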
<h2 id="heading-what-is-identity-validation">What is Identity Validation?</h2>
<p>Identity validation is the process of confirming that an individual is who they claim to be: that a person attempting to sign up for a service or log into an existing account really is that person. It involves verifying that the information a user provides is accurate and corresponds to a real person, which can include checking the user's name, date of birth, address, and other personal details.</p>
<p>There are several methods for identity validation, including:</p>
<ul>
<li><p><strong>Manual verification:</strong> This involves manually reviewing the information provided by the user and comparing it to other sources, such as government records or credit bureau data.</p>
</li>
<li><p><strong>Automated verification:</strong> This involves using software or other automated systems to verify the information provided by the user. This can include using algorithms to check for inconsistencies or red flags or using external sources—<a target="_blank" href="https://myidentitypass.com/"><strong>such as IdentityPass</strong></a>— to confirm the accuracy of the information.</p>
</li>
</ul>
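<p>A toy version of an automated check might flag obvious inconsistencies before ever calling an external provider. Everything below — the field names and the rules — is hypothetical, purely to illustrate the idea:</p>

```python
from datetime import date

def red_flags(record: dict) -> list:
    """Return a list of inconsistencies found in a sign-up record."""
    flags = []
    dob = record.get("date_of_birth")
    if dob is None or dob > date.today():
        flags.append("implausible date of birth")
    if not record.get("full_name", "").strip():
        flags.append("missing name")
    if record.get("country") not in record.get("document_countries", []):
        flags.append("ID document issued in a different country")
    return flags
```

<p>A real pipeline would feed records that pass such cheap checks on to an external verification source, rather than paying for a lookup on data that is internally inconsistent.</p>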
<p>When you create an account on Uber or Lyft, you authenticate yourself through your account by entering your email and password. But when you first use the account, you are asked to confirm your identity by adding a photo of your license. You are validating that you are who you say you are. Identity validation is often used in situations where sensitive information is being shared, like when applying for a loan or setting up a health insurance account. It is also used in business settings for compliance reasons, such as in financial services where certain types of accounts require an “accepted” or “verified” ID.</p>
<h2 id="heading-why-is-identity-validation-important">Why is Identity Validation Important?</h2>
<p>Identity validation is important for several reasons:</p>
<ol>
<li><p><strong>Security:</strong> By verifying the identity of users, organizations can ensure that only authorized individuals have access to sensitive information and systems. This helps to prevent unauthorized access, data breaches, and other security incidents.</p>
</li>
<li><p><strong>Compliance:</strong> Many industries and organizations are subject to regulations that require them to verify the identity of individuals. For example, financial institutions are required to comply with anti-money laundering (AML) and know-your-customer (KYC) regulations, which mandate the verification of customer identities.</p>
</li>
<li><p><strong>Fraud prevention:</strong> By validating the identity of users, organizations can detect and prevent fraudulent activity. For example, by verifying that the information provided by a user corresponds to a real person, organizations can prevent individuals from creating fake accounts or using stolen identities.</p>
</li>
<li><p><strong>Trust and credibility:</strong> By validating the identity of users, organizations can build trust and credibility with their customers. This can be especially important for businesses that rely on online transactions, where customers may be hesitant to provide personal information without assurance that it will be protected.</p>
</li>
<li><p><strong>Accurate record-keeping:</strong> Identity validation also helps organizations to maintain accurate and up-to-date records of their customers and clients. This can help organizations comply with regulations and laws that require the maintenance of accurate records and can be useful for future reference.</p>
</li>
</ol>
<p>Overall, identity validation is an important aspect of security and compliance that helps organizations to protect their assets and customers, prevent fraud, and maintain trust and credibility. Organizations must have well-defined and implemented procedures for identity validation that are compliant with industry and legal standards.</p>
<h2 id="heading-authenticate-identify-verify">Authenticate! identify!! Verify!!!</h2>
<p>Yes! Exactly in that order. Let your users tell you who they are, attempt to identify them, and finally verify that they are exactly everything they claim to be.</p>
<ul>
<li><p><strong>Authentication</strong> is the process of proving that you are who you say you are by presenting a password or another credential. When you log into an account or website, the system authenticates you by checking the credentials you supply against the ones it has on record.</p>
</li>
<li><p><strong>Identification</strong> is the process of stating who you are. You identify yourself by providing personal details like your name, date of birth or address.</p>
</li>
<li><p><strong>Verification</strong> is the process of confirming that a claim is true. You can verify an account, for example, by providing an identifying feature like your mother’s maiden name or your National Identification Number.</p>
</li>
</ul>
<h2 id="heading-should-you-implement-identity-validation">Should you implement identity validation?</h2>
<p>If you are storing sensitive information like National Identification Numbers, you may be required to implement identity validation. Additionally, some industries, like financial services, demand that businesses meet strict identity validation requirements. Other industries, like healthcare, also often require identity validation. Before implementing identity validation, make sure you understand the requirements in your industry.</p>
<p>It is important to note that not all identity verification providers are created equal. Selecting a provider that offers the features required for your business, including robust fraud prevention and reliable results, is critical. A very good example is <a target="_blank" href="https://myidentitypass.com/">IdentityPass</a> which offers a wide range of solutions to help businesses with many possible identity and business verification needs.</p>
<h2 id="heading-best-practices-for-implementing-identity-validation">Best Practices for Implementing Identity Validation</h2>
<ul>
<li><p><strong>Collect only what is necessary:</strong> Review your data requirements and make sure they are necessary and relevant to your business. The fewer verifications you require, the less friction there is in the sign-up process.</p>
</li>
<li><p><strong>Build flexibility into your requirements to reduce false negatives:</strong> A false negative occurs when a legitimate user fails a verification check and is unable to sign up for an account. These cost you real customers, so allow alternative checks when a primary one fails.</p>
</li>
<li><p><strong>Make it easy for users to verify their information:</strong> Offer multiple verification methods, and make sure you are guiding users through the process and helping them along the way.</p>
</li>
<li><p><strong>Use identity validation early in the onboarding process:</strong> By validating the user’s identity at the beginning of the onboarding process, you can significantly reduce false negatives and decrease the complexity of your sign-up and onboarding process.</p>
</li>
<li><p><strong>Use encrypted communication:</strong> When transmitting authentication credentials or other sensitive information, it's important to use encrypted communication to prevent interception by attackers.</p>
</li>
<li><p><strong>Implement identity validation measures:</strong> To ensure that the information provided by users is accurate and corresponds to real people, it's important to implement identity validation measures, such as manual or automated verification processes.</p>
</li>
</ul>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Authentication is the process by which a user proves their identity to a system, while identity validation is the process of confirming that a person attempting to sign up for a service or log into an existing account is who they claim to be.</p>
]]></content:encoded></item><item><title><![CDATA[Minimising Correlated Failures in Distributed Systems]]></title><description><![CDATA[Scalability and dependability are two areas where distributed systems face new problems. When many services are hosted on separate computers, they must use network protocols to talk to one another. The more the variety of services available, the grea...]]></description><link>https://blog.sofwancoder.com/minimising-correlated-failures-in-distributed-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/minimising-correlated-failures-in-distributed-systems</guid><category><![CDATA[distributed system]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[software development]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Mon, 16 Jan 2023 09:11:51 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1673829812088/8f2e4a02-2cca-4cc9-b0b3-942be587d26c.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Scalability and dependability are two areas where distributed systems face new problems. When many services are hosted on separate computers, they must use network protocols to talk to one another. The more the variety of services available, the greater the likelihood that something will go wrong. Distributed systems will inevitably experience some form of failure; the important thing is how you plan to handle it. Furthermore, if your system is hosted on cloud infrastructure's virtual machines (VMs), failures can have knock-on repercussions for other customers who are utilising the same physical hardware. This article explores the topic of improving the resilience of large-scale distributed systems in the face of failure.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>Distributed systems are complex. They involve many different processes and services that can fail or degrade independently. As a result, distributed applications need to be able to handle these failures gracefully. This article outlines five techniques for minimizing correlated failures in distributed systems:</p>
<ul>
<li><p>failure isolation,</p>
</li>
<li><p>defensive coding,</p>
</li>
<li><p>continuous monitoring,</p>
</li>
<li><p>peer review, and</p>
</li>
<li><p>immutable APIs.</p>
</li>
</ul>
<p>These techniques help developers avoid the most common sources of correlated failures in software stacks and services across all layers of the stack and make it easier to debug issues when they do occur. These techniques will not eliminate every instance of correlated failures across your system’s architecture, but they will go a long way toward reducing their presence and impact on your end users.</p>
<h2 id="heading-the-challenges-of-scalability-and-reliability">The Challenges of Scalability and Reliability</h2>
<p><strong>Distributed systems handle a large amount of data across a network of systems that may be geographically distributed</strong>. Distributed systems are more complex than centralized systems, but they can also be more efficient and scalable because they can employ additional computing resources.</p>
<p>For example, you might use a distributed system if you need to analyze data in a very large database that can’t be processed on one computer. Distributed systems also have their unique challenges related to scalability and reliability.</p>
<p>To explain; <strong>let’s take a look at the high-level architecture of a typical distributed system:</strong></p>
<p>Distributed system architectures follow a standard pattern where data is ingested into a centralized data store.</p>
<p><img src="https://cdn.hashnode.com/res/hashnode/image/upload/v1673826472315/543ca6cd-e4a0-47ae-9353-3686fee4ad72.png" alt class="image--center mx-auto" /></p>
<p>This data store is responsible for replicating and distributing that data to other data stores that are spread across the network. If you’re building a distributed system, it’s important to understand that these data stores are not equally reliable.</p>
<p>Each data store has its unique level of availability and reliability. In many ways, this is what makes distributed systems difficult to build. You have to account for these differences, and <strong>you have to account for the fact that systems inevitably fail.</strong> In other words, <strong>distributed systems are fragile by default.</strong> You have to <strong>do things (and lots of things) differently to make them reliable.</strong></p>
<h2 id="heading-why-is-it-so-hard-to-build-reliable-distributed-systems">Why is it so hard to build reliable distributed systems?</h2>
<p>The central challenge of building a distributed system is that the system itself is distributed — all the components are distributed across different locations. Distributed systems pose unique reliability challenges that can’t be solved with a single, centralized approach because that centralized approach will only be as reliable as the weakest component. When something fails, it can impact the entire system, and distributed systems are always susceptible to failure.</p>
<h2 id="heading-identifying-the-critical-path-in-distributed-systems">Identifying the Critical Path in Distributed Systems</h2>
<p>When you’re trying to optimize a distributed system, the first step is to identify the critical path. The critical path is the path that determines the overall availability of the system. <strong>The critical path is the path that takes the longest amount of time to complete. It’s the path where a failure will have the most impact on the system as a whole.</strong> If this path fails, the entire system will be at risk of failing.</p>
<p>To identify the critical path, <strong>you have to look at everything that your system does.</strong> <strong>You have to understand every operation that your system performs</strong> and every operation that each component of your system performs. <strong>Once you’ve identified the critical path, you can focus your attention on making that path as reliable as possible.</strong> The less reliable the path, the more attention you should give to it.</p>
<h2 id="heading-use-redundancy-to-reduce-failures-in-distributed-systems">Use Redundancy to Reduce Failures in Distributed Systems</h2>
<p><strong>Redundancy is the ability to withstand failure by having multiple redundant components that can take over if a component fails.</strong> It is a common technique used to improve the availability of distributed systems because it enables you to make the critical path more reliable by adding more components to that path.</p>
<p>The more components you have performing a single operation, the less likely each component is to fail. There are many different kinds of redundancy you can use in distributed systems, including:</p>
<ul>
<li><p><strong>Automated failover</strong> - Automated failover promotes a secondary component to take over when the primary component fails. In its simplest form this may be little more than an alert that prompts an operator to fail over manually; mature systems promote the standby automatically.</p>
</li>
<li><p><strong>Service-level agreement (SLA)</strong> - An SLA is an agreement between a service owner and a client that specifies the level of availability and performance the system will maintain.</p>
</li>
<li><p><strong>Load balancing</strong> - Load balancing distributes the workload across multiple components. This can be useful for distributing the workload across multiple instances of an application or across instances of multiple applications.</p>
</li>
<li><p><strong>Redundant data stores</strong> - Redundant data stores allow you to write the same data to multiple copies of a data store. This helps to ensure that the data will be retained in the event that one data store fails.</p>
</li>
</ul>
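<p>Redundancy at the read path can be sketched in a few lines: try each replica in turn and return the first answer that arrives. The replica interface here (a plain callable) is an assumption for illustration:</p>

```python
def read_with_fallback(replicas, key):
    """Query replicas in order; a single failed replica is not fatal."""
    last_error = None
    for replica in replicas:
        try:
            return replica(key)
        except Exception as exc:      # in real code, catch specific errors
            last_error = exc
    raise last_error                  # every replica failed
```
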
<h2 id="heading-use-failure-detection-to-repair-your-system-after-a-failure-occurs">Use Failure Detection to Repair Your System After a Failure Occurs</h2>
<p><strong>Failure detection is the process of monitoring your system to identify when it fails.</strong> For example, you can detect a failure when a service is unavailable or when it returns an error. There are several different techniques for detecting failures, but some of the most common include:</p>
<ul>
<li><p><strong>Timeouts</strong> - Timeouts can be used to detect when a service is taking too long to respond. This is especially useful when communicating with services hosted on different networks.</p>
</li>
<li><p><strong>Retry logic</strong> - Retry logic can be used to detect when a service is unavailable by retrying the request until it succeeds.</p>
</li>
<li><p><strong>Circuit breakers</strong> - Circuit breakers can be used to detect when a service is failing and automatically stop sending requests to that service.</p>
</li>
<li><p><strong>Thresholds</strong> - Thresholds can be used to trigger an alert when a metric crosses a certain threshold.</p>
</li>
<li><p><strong>Outages</strong> - Outages occur when a service is completely unavailable.</p>
</li>
</ul>
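<p>Of these, the circuit breaker is the least obvious to implement, so here is a minimal single-threaded sketch (the failure threshold and reset window are illustrative):</p>

```python
import time

class CircuitBreaker:
    """Stop calling a failing service until a reset window has passed."""

    def __init__(self, max_failures=3, reset_after=30.0):
        self.max_failures = max_failures
        self.reset_after = reset_after
        self.failures = 0
        self.opened_at = None         # None means the circuit is closed

    def call(self, fn, *args, **kwargs):
        if self.opened_at is not None:
            if time.monotonic() - self.opened_at < self.reset_after:
                raise RuntimeError("circuit open: request rejected")
            self.opened_at = None     # half-open: let one trial request through
            self.failures = 0
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.failures >= self.max_failures:
                self.opened_at = time.monotonic()  # trip the breaker
            raise
        self.failures = 0             # any success closes the circuit fully
        return result
```

<p>While the circuit is open, callers fail fast instead of piling requests onto a struggling service, which gives it room to recover.</p>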
<h2 id="heading-dont-rely-on-one-thing-when-building-a-distributed-system">Don’t Rely on One Thing When Building a Distributed System</h2>
<p>One of the most important things to remember when building a distributed system is that <strong>you can’t rely on any single component to provide 100% uptime.</strong> You can’t rely on a single data store, a single network, or a single service. Instead, you have to <strong>build the system in such a way that it can survive even when one or more components fail.</strong> You have to build the system in such a way that it can withstand the occasional, inevitable failure of a component. To do that, you have to <strong>design the system to be fault-tolerant</strong> by using the following principles:</p>
<ul>
<li><p><strong>Isolation</strong> - Isolation is the ability to run one component as an independent unit, so that a failure inside it is contained rather than spreading to the rest of the system.</p>
</li>
<li><p><strong>Decoupling</strong> - Decoupling is the ability of components to communicate without depending directly on each other, for example through queues or asynchronous messages, so that one component can fail or be replaced without its peers noticing.</p>
</li>
<li><p><strong>Redundancy</strong> - Redundancy is the ability to withstand failure by using multiple components to perform the same task, so that losing any single copy does not take the task down.</p>
</li>
</ul>
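<p>Decoupling in particular can be sketched with nothing more than a queue between producer and consumer; the order-processing names below are made up for illustration:</p>

```python
from queue import Queue

events = Queue()  # the only thing producer and consumer share

def record_order(order_id):
    """Producer: publish and move on; never calls the consumer directly."""
    events.put({"order": order_id})

def process_orders():
    """Consumer: drain whatever has accumulated, at its own pace."""
    processed = []
    while not events.empty():
        processed.append(events.get()["order"])
    return processed
```

<p>Even if the consumer crashes and restarts, the producer keeps accepting orders; the queue absorbs the failure.</p>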
<h2 id="heading-more-techniques-for-avoiding-correlated-failures">More Techniques for avoiding correlated failures</h2>
<p>The most fundamental truth of distributed systems is that they will experience failures. No matter how meticulously you plan and test your system, unexpected problems will occur. To build distributed systems that are both scalable and reliable, it is essential to learn how to handle these failures.</p>
<h3 id="heading-failure-isolation">Failure Isolation</h3>
<p><strong>The key to dealing with failures is to minimize the side effects of those failures.</strong> This means <strong>isolating the part of the system that failed</strong> from the rest of the system. The best way to do this is to design your system so that each component can be operated in isolation. This lets you scale systems horizontally by adding more capacity without adding the risk of cascading failure.</p>
<h3 id="heading-defensive-coding">Defensive Coding</h3>
<p>Defensive coding is another important part of distributed-system design. Building a distributed system requires a different way of thinking than building an application that runs on a single server: unlike single-server apps, distributed systems must account for stability, scalability, and performance from the start.</p>
<p>Distributed systems, in particular, must <strong>follow best practices for handling errors</strong>, especially unplanned events like network and hardware failures. That means <strong>handling all errors gracefully</strong>, not just fatal ones. Because hardware and networks can fail at any time, distributed <strong>systems must be designed to treat failure as a normal event</strong>, which makes error handling harder than in single-server applications.</p>
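<p>A common defensive pattern is to treat transient failures as routine: retry with exponential backoff rather than giving up on the first error. A minimal sketch (the attempt count and delays are illustrative):</p>

```python
import time

def call_with_retries(fn, attempts=3, base_delay=0.05):
    """Retry transient failures with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return fn()
        except (ConnectionError, TimeoutError):
            if attempt == attempts:
                raise                 # out of retries: surface the error
            time.sleep(base_delay * 2 ** (attempt - 1))
```

<p>Note that only errors known to be transient are retried; anything else propagates immediately, which is itself a form of graceful handling.</p>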
<h3 id="heading-continuous-monitoring">Continuous Monitoring</h3>
<p>Monitoring and logging play a significant role in the design of distributed systems. This is because it can be difficult to keep track of the numerous moving elements that distributed systems frequently contain. In particular, sharding is frequently used to scale distributed systems that rely on distributed databases. Data distribution across numerous machines is called sharding. How do you identify the downed machines and the missing data if your distributed system depends on a sharded database? How may communication errors between database shards be found?</p>
<p><strong>Monitoring entails more than uptime checks</strong>: it means <strong>tracking every element that affects uptime</strong>. In a distributed system, monitoring itself must also be distributed. A single centralised monitoring solution cannot reliably observe components spread across many machines and networks, just as a single machine cannot hold all of the system's data; metrics should instead be collected close to each component and then aggregated.</p>
<h3 id="heading-peer-review">Peer Review</h3>
<p>Reviewing your code with fresh eyes is one of the best ways to keep bugs out. Peer reviews should be conducted as soon as possible after the design phase of a distributed system is complete: this surfaces problems and design flaws up front, before the code is ever put into use. By showing your design to a coworker, you can spot potential difficulties before they become serious issues that must be corrected. There are a few approaches you can take: use a collaborative tool such as a shared document or a design review board, or present your design to a colleague in person.</p>
<h3 id="heading-immutable-api">Immutable API</h3>
<p>API design is also crucial in developing distributed systems: APIs are the connective tissue between components, so it is not enough to ship an endpoint and hope for the best. A well-designed API is essential, and an immutable API design is one approach. Simply put, <strong>an immutable API design exposes the usual operations (creating, reading, updating, and deleting) in a way that never modifies existing records in place: updates append new versions rather than overwriting data</strong>. This matters for a couple of reasons. Your API can scale more easily because you avoid concurrency and resource-locking issues on shared records. <strong>The only guarantee in distributed systems is that something will go wrong</strong>, and <strong>an API that never alters data in place reduces the risk of a single component failure triggering a cascading failure across the system.</strong></p>
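<p>One way to read "immutable" here is append-only: an update creates a new version of a record instead of overwriting the old one. A toy in-memory sketch of that idea (all names hypothetical):</p>

```python
class AppendOnlyStore:
    """Records are never modified in place; every write appends a version."""

    def __init__(self):
        self._versions = {}

    def put(self, key, value):
        self._versions.setdefault(key, []).append(value)

    def get(self, key):
        return self._versions[key][-1]   # the latest version wins

    def history(self, key):
        return list(self._versions.get(key, []))
```

<p>Because old versions are never destroyed, a reader that races with a writer still sees a complete, consistent version rather than a half-applied update.</p>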
<h2 id="heading-summing-up">Summing up</h2>
<p>Distributed systems present new challenges in scalability and reliability. When all services reside on different machines, they must communicate with one another through networked protocols. The more services there are, the more opportunities there are for something to go wrong. Failure is a natural part of distributed systems, but it’s how you deal with that failure that matters most. To handle this, you can use redundancy to reduce failures, use failure detection to repair the system after a failure occurs, and don't rely on one thing when building a distributed system. Distributed systems are challenging to build, but they are also powerful and scalable. To optimize them, you have to understand the critical path and focus on making them as reliable as possible.</p>
]]></content:encoded></item><item><title><![CDATA[Distributed Transactions: Overview]]></title><description><![CDATA[A distributed transaction is one that involves numerous database systems or other resources in a single transaction. Changes made to one system or resource must be reflected in all other systems or resources involved in the transaction in such instan...]]></description><link>https://blog.sofwancoder.com/distributed-transactions-overview</link><guid isPermaLink="true">https://blog.sofwancoder.com/distributed-transactions-overview</guid><category><![CDATA[distributed system]]></category><category><![CDATA[Distributed Database]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Tue, 03 Jan 2023 21:48:10 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1672775528871/b52c54bc-dd33-497e-92ae-7464cb31427b.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A distributed transaction is one that involves numerous database systems or other resources in a single transaction. Changes made to one system or resource must be reflected in all other systems or resources involved in the transaction in such instances. In other words, any changes made by the transaction must be committed or rolled back in all systems or resources involved. This article defines distributed transactions and describes how they function.</p>
<h2 id="heading-what-are-transactions">What are Transactions?</h2>
<p>Transactions are a basic notion in database systems and other data-manipulation systems. A transaction is a unit of work that consists of one or more data operations. In a nutshell, a transaction is a collection of commands that either complete entirely or fail altogether.</p>
<h3 id="heading-properties-of-a-transaction">Properties of a Transaction</h3>
<p>The key properties of a transaction are atomicity, consistency, isolation, and durability.</p>
<h4 id="heading-atomicity">Atomicity</h4>
<p>Atomicity refers to the property of a transaction that ensures that either all or none of the operations in the transaction are performed. This means that if an error happens during transaction execution, all changes made by the transaction are undone, and the system is returned to a consistent state.</p>
<h4 id="heading-consistency">Consistency</h4>
<p>The property of a transaction that assures that the transaction leaves the system in a consistent state is referred to as consistency. A consistent state is one in which all of the system's rules and restrictions are met.</p>
<h4 id="heading-isolation">Isolation</h4>
<p>The property of a transaction that ensures that the changes performed by the transaction are not visible to other transactions until the transaction is committed is referred to as isolation. This means that other transactions cannot see the intermediate states of the data while the transaction is running.</p>
<h4 id="heading-durability">Durability</h4>
<p>Durability is the property of a transaction that guarantees that, once the transaction is committed, its modifications will survive a system failure such as a crash or power loss.</p>
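<p>Atomicity, the first of these properties, can be illustrated with a minimal in-memory sketch (not a real database): every operation is applied to a snapshot, and the snapshot replaces the old state only if all operations succeed. The names <code>Accounts</code> and <code>transfer</code> are illustrative:</p>
<pre><code class="lang-typescript">// Minimal sketch of atomicity: apply all operations or none.
type Accounts = { [name: string]: number };

function transfer(accounts: Accounts, from: string, to: string, amount: number): Accounts {
  // Work on a snapshot so a failure leaves the original state untouched.
  const snapshot: Accounts = { ...accounts };
  snapshot[from] -= amount;
  snapshot[to] += amount;
  if (0 > snapshot[from]) {
    // Insufficient funds: abort, leaving the caller's state unmodified (rollback).
    throw new Error('insufficient funds');
  }
  return snapshot; // commit: the whole snapshot replaces the old state
}

let accounts: Accounts = { alice: 100, bob: 50 };
accounts = transfer(accounts, 'alice', 'bob', 30); // succeeds atomically
console.log(accounts.alice, accounts.bob); // 70 80
try {
  accounts = transfer(accounts, 'alice', 'bob', 500); // fails, nothing changes
} catch (err) {
  console.log(accounts.alice, accounts.bob); // still 70 80
}
</code></pre>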
<h2 id="heading-when-is-a-transaction-distributed">When is a Transaction Distributed?</h2>
<p>Transactions are straightforward to implement in a single-system environment. The system records the changes made by the transaction in a temporary log called the transaction log and, depending on the outcome of the transaction, either commits the changes to the data store or rolls them back.</p>
<p>However, the problem becomes more complicated in a distributed system, where numerous systems or resources are involved in a single transaction. This is because the transaction must produce results that are consistent and persistent across all systems or resources involved. This is known as a distributed transaction.</p>
<p><strong>Distributed transactions refer to a situation where multiple database systems, or other resources, are involved in a single transaction.</strong> In such cases, the changes made to one system or resource must be reflected in all the other systems or resources participating in the transaction. In other words, all the changes made by the transaction must be committed or rolled back in all the participating systems or resources.</p>
<h2 id="heading-requirements-for-distributed-transactions">Requirements for distributed transactions</h2>
<p>There are two important requirements for distributed transactions:</p>
<ul>
<li><p>Consistency: this means all distributed databases are equally up to date with the most recent information.</p>
</li>
<li><p>Termination: the distributed transaction is either fully executed or not executed at all. If a distributed transaction fails, it needs to fail for every database that participated in the transaction.</p>
</li>
</ul>
<h2 id="heading-importance-of-distributed-transactions">Importance of Distributed Transactions</h2>
<p>When a business process involving several systems or resources must be atomic—that is, all changes must be committed or none of them are committed—distributed transactions become crucial. A distributed transaction would be necessary, for instance, to guarantee the completion or reversal of a bank transfer between two different banking systems in the event of an error.</p>
<p>Distributed transactions can also be helpful when processing a payment, for example when validating and charging a credit card. Typically, billing information is kept in a separate database from credit card information. Using distributed transactions, we can keep the data in these two databases synchronised.</p>
<h2 id="heading-challenges-of-implementing-distributed-transactions">Challenges of Implementing Distributed Transactions</h2>
<p>There are two major challenges involved in implementing distributed transactions, which are:</p>
<h3 id="heading-consistency-problem">Consistency Problem</h3>
<p>A major hurdle in implementing distributed transactions is making sure the changes performed by the transaction are consistent across all systems or resources involved. The issue is commonly referred to as the "consistency problem."</p>
<h3 id="heading-durability-problem">Durability Problem</h3>
<p>Assuring that the transaction's modifications will survive a system failure is another difficult task. The issue is commonly referred to as the "durability problem."</p>
<h2 id="heading-techniques-and-protocols-for-implementing-distributed-transactions">Techniques and Protocols for Implementing Distributed Transactions</h2>
<p>To solve the consistency and durability problems, various techniques and protocols have been developed for implementing distributed transactions.</p>
<h3 id="heading-two-phase-commit-protocol-2pchttpsblogsofwancodercomtwo-phased-commit-and-extended-architecture-the-basics"><a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics">Two-Phase Commit Protocol (2PC)</a></h3>
<p>The transaction log is a short-term log used in this protocol to record the changes made by the transaction. The transaction coordinator, the coordinating entity, sends each system or resource involved in the transaction a request asking it to prepare to commit the modifications. Once the coordinator determines that all systems or resources are ready to commit, it issues a commit request. If the coordinator detects that any system or resource is not ready to commit, it issues a rollback request, at which point every participant reverses its recent changes.</p>
<p>There are several variations of the <a target="_blank" href="https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics">two-phase commit protocol</a>, including the three-phase commit protocol and the distributed commit protocol. These variations address specific problems or improve the efficiency of the protocol.</p>
<h3 id="heading-optimistic-concurrency-control-occ">Optimistic Concurrency Control (OCC)</h3>
<p>In this technique, each system or resource participating in the transaction maintains a version number for the data. When a transaction attempts to update the data, it checks the version number. If the version number has not changed, the transaction updates the data and increments the version number. If the version number has changed, the transaction rolls back the changes and retries the update.</p>
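<p>The version check described above amounts to a compare-and-set. A small TypeScript sketch, with illustrative names (<code>Versioned</code>, <code>occUpdate</code>):</p>
<pre><code class="lang-typescript">// Hedged sketch of optimistic concurrency control with a version number.
type Versioned = { value: number; version: number };

function occUpdate(record: Versioned, expectedVersion: number, newValue: number): boolean {
  if (record.version !== expectedVersion) {
    return false; // someone else updated the record first: caller must re-read and retry
  }
  record.value = newValue;
  record.version += 1; // bump the version so concurrent writers detect the change
  return true;
}

const row: Versioned = { value: 10, version: 1 };

// Two writers both read version 1; only the first update wins.
const firstWrite = occUpdate(row, 1, 20);  // succeeds, version becomes 2
const secondWrite = occUpdate(row, 1, 30); // fails: stale version
console.log(firstWrite, secondWrite, row);
</code></pre>
<p>In a real database the check and the update must happen atomically (for example via a conditional <code>UPDATE ... WHERE version = $1</code>), which this in-memory sketch glosses over.</p>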
<h3 id="heading-xa-standard">XA Standard</h3>
<p>The XA standard is a technique for implementing distributed transactions that involve the use of an XA interface to coordinate the transaction. The XA interface defines a set of functions that can be used to start, end, and roll back a transaction.</p>
<h3 id="heading-sagas-pattern">Sagas Pattern</h3>
<p>One way to execute distributed transactions is through the use of the Sagas pattern, which entails slicing up the transaction into several smaller, self-contained pieces. Individual sagas can be committed or rolled back without affecting others.</p>
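<p>A sketch of the Sagas pattern, with illustrative step names: each step carries a compensating action, and when a step fails, the already-completed steps are undone in reverse order:</p>
<pre><code class="lang-typescript">// Illustrative saga sketch: each step has a compensating action.
type Step = { name: string; action: () => void; compensate: () => void };

function runSaga(steps: Step[], log: string[]): void {
  const done: Step[] = [];
  for (const step of steps) {
    try {
      step.action();
      log.push('did ' + step.name);
      done.push(step);
    } catch (err) {
      // Roll back every completed step, newest first.
      for (const completed of done.reverse()) {
        completed.compensate();
        log.push('undid ' + completed.name);
      }
      return;
    }
  }
}

const sagaLog: string[] = [];
runSaga([
  { name: 'reserve-stock', action: () => {}, compensate: () => {} },
  { name: 'charge-card', action: () => { throw new Error('card declined'); }, compensate: () => {} },
], sagaLog);
console.log(sagaLog); // [ 'did reserve-stock', 'undid reserve-stock' ]
</code></pre>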
<h3 id="heading-eventual-consistency-model">Eventual Consistency Model</h3>
<p>The eventual consistency model is a technique for implementing distributed transactions that relaxes the consistency constraints of the transaction and lets the participating systems or resources converge on a consistent state over time.</p>
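<p>An illustrative in-memory sketch of the idea: writes are acknowledged by the primary immediately and propagate to a secondary replica later, so a read from the secondary may be stale until replication catches up. All names here are made up for the example:</p>
<pre><code class="lang-typescript">// Rough sketch of eventual consistency with asynchronous replication.
type Store = { [key: string]: string };

const primary: Store = {};
const secondary: Store = {};
const pending: [string, string][] = [];

function write(key: string, value: string): void {
  primary[key] = value;       // acknowledged immediately
  pending.push([key, value]); // replication happens later
}

function replicate(): void {
  // Drain the replication queue; after this, the replicas agree (convergence).
  for (const [key, value] of pending.splice(0)) {
    secondary[key] = value;
  }
}

write('user:1', 'Sofwan');
const staleRead = secondary['user:1']; // undefined: replication has not run yet
replicate();
const freshRead = secondary['user:1']; // 'Sofwan': replicas have converged
console.log(staleRead, freshRead);
</code></pre>
<p>Real systems replicate over the network rather than in memory, but the convergence property is the same.</p>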
<h2 id="heading-summary">Summary</h2>
<p>A distributed transaction is necessary because transactions that span multiple databases might fail due to network interruptions or other issues. It is an important concept in database systems and other distributed systems, and various techniques and protocols have been developed to solve the consistency and durability problems involved in implementing distributed transactions.</p>
]]></content:encoded></item><item><title><![CDATA[Two-Phased Commit and eXtended Architecture: The Basics]]></title><description><![CDATA[Two-phase commit (2PC) and XA (eXtended Architecture) are two important concepts in database transactions and distributed systems. They both provide a way to ensure that transactions involving multiple resources are either completed successfully or r...]]></description><link>https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics</link><guid isPermaLink="true">https://blog.sofwancoder.com/two-phased-commit-and-extended-architecture-the-basics</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><category><![CDATA[backend]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Microservices]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Fri, 30 Dec 2022 17:44:36 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1672421630782/3ef034eb-618a-4555-8cc8-ccb80562aa84.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Two-phase commit (2PC) and XA (eXtended Architecture) are two important concepts in database transactions and distributed systems. They both provide a way to ensure that transactions involving multiple resources are either completed successfully or rolled back in case of failure, thus maintaining the integrity of the data. In this article, we will explain the two-phase commit protocol and XA in detail and discuss their use cases and limitations.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>In distributed transaction processing, a commit operation finalizes a transaction and makes it visible to other participants. In a two-phase commit (2PC) protocol, the commit action is split into two phases: <code>Prepare</code> and <code>Commit</code>. The first phase is called <code>prepare</code> because each participant prepares to commit by checking some pre-conditions. If those conditions are not satisfied, the participant cannot continue to the second phase by committing and has to roll back its work. A failure at this point results in aborting the transaction and starting again from the beginning of the process. The main advantage of a 2PC protocol is that it enables automatic recovery from failures during transactions.</p>
<h2 id="heading-what-is-a-transaction">What is a Transaction?</h2>
<p>Before we delve into 2PC and XA, it is important to understand what a transaction is. A transaction is a sequence of operations that are performed as a single unit of work. The main goal of a transaction is to ensure that the data remains consistent and reliable, even in the face of failures or errors.</p>
<p>In database systems, a transaction can consist of multiple database operations, such as inserts, updates, and deletes, that are performed on one or more tables. Transactions allow us to ensure that the data remains consistent and correct, even if some of the operations fail. For example, if we are transferring money from one bank account to another, we want to make sure that the money is deducted from the first account and added to the second account, or that no changes are made at all if something goes wrong.</p>
<h2 id="heading-why-distributed-transaction-processing">Why Distributed Transaction Processing?</h2>
<p>Distributed transaction processing has become an important requirement in many application scenarios. The reasons are simple:</p>
<ul>
<li><p>first, we want to achieve scalability by increasing the size of the computing clusters to handle larger workloads.</p>
</li>
<li><p>Second, we want to achieve availability by ensuring that no single point of failure can bring down the system.</p>
</li>
</ul>
<p>Achieving scalability and availability requires distributed systems with atomic transactions.</p>
<h2 id="heading-what-is-the-two-phase-commit-protocol-2pc">What is the Two-Phase Commit Protocol (2PC)?</h2>
<p>The two-phase commit protocol is a distributed transaction protocol that ensures that a transaction is either completed successfully or rolled back in case of failure. It is called "two-phase" because it consists of two phases: a prepare phase and a commit phase.</p>
<h3 id="heading-the-prepare-phase">The <code>Prepare</code> Phase</h3>
<p>In the <code>prepare</code> phase, the transaction coordinator (also known as the "transaction manager") sends a request to all the participating resources (such as databases or message queues) to prepare for the commit. The resources then perform any necessary checks and updates, and return a response indicating whether they are ready to commit or not. If all the resources are ready to commit, the transaction coordinator moves on to the commit phase.</p>
<h3 id="heading-the-commit-phase">The <code>Commit</code> Phase</h3>
<p>In the <code>commit</code> phase, the transaction coordinator sends a commit request to all the resources. If all the resources respond successfully, the transaction is considered committed and the changes are made permanent. If any of the resources fail to commit, the transaction coordinator sends a rollback request to all the resources and the transaction is considered failed.</p>
<hr />
<p>The two-phase commit protocol is used to ensure that all the participating resources are in sync and that the changes are made consistently across all the resources. It is a reliable and widely used protocol, but it has some limitations, which we will discuss later.</p>
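<p>The two phases can be simulated in a few lines of TypeScript. <code>Participant</code> and <code>twoPhaseCommit</code> are illustrative names, not a real library:</p>
<pre><code class="lang-typescript">// Minimal in-memory simulation of the two phases described above.
class Participant {
  constructor(public name: string, private canCommit: boolean) {}
  prepare(): boolean { return this.canCommit; } // vote yes or no
  commit(log: string[]): void { log.push(this.name + ' committed'); }
  rollback(log: string[]): void { log.push(this.name + ' rolled back'); }
}

function twoPhaseCommit(participants: Participant[], log: string[]): boolean {
  // Phase 1: every participant must vote yes in the prepare phase.
  const votes = participants.map(function (p) { return p.prepare(); });
  if (votes.every(function (v) { return v; })) {
    // Phase 2: all voted yes, so tell everyone to commit.
    participants.forEach(function (p) { p.commit(log); });
    return true;
  }
  // Any "no" vote aborts the whole transaction.
  participants.forEach(function (p) { p.rollback(log); });
  return false;
}

const outcomeLog: string[] = [];
const ok = twoPhaseCommit(
  [new Participant('orders-db', true), new Participant('billing-db', false)],
  outcomeLog,
);
console.log(ok, outcomeLog); // false, both participants rolled back
</code></pre>
<p>Note that a real coordinator must also persist its decision so it can recover after a crash, which this sketch omits.</p>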
<h2 id="heading-distributed-transaction-processing-with-2pc">Distributed Transaction Processing with 2PC</h2>
<p>In distributed transaction processing, a two-phase commit protocol ensures that a transaction spanning more than one participant is managed and controlled consistently. Since the participants cannot coordinate the outcome among themselves directly, a distributed transaction manager is required to control the transaction.</p>
<h3 id="heading-transaction-manager">Transaction Manager</h3>
<p>The transaction manager is responsible for controlling the transaction and coordinating the communication between the distributed resource managers. It does this by using a two-phase commit protocol.</p>
<p>A two-phase commit protocol requires at least two participants in every transaction:</p>
<ul>
<li><p>the transaction manager and</p>
</li>
<li><p>at least one resource manager.</p>
</li>
</ul>
<p>This means that <strong>a two-phase commit protocol requires a network connection between the transaction manager and the resource managers.</strong></p>
<h2 id="heading-what-is-the-extended-architecture-protocol">What is the eXtended Architecture Protocol?</h2>
<p>XA is an extension of the two-phase commit protocol that allows transactions to span multiple resources, such as databases, message queues, and file systems. It is used to coordinate the commit or rollback of a transaction across multiple resources, ensuring that the changes are made consistently and reliably.</p>
<p>In XA, each resource participating in the transaction is represented by an XA resource manager. The XA resource manager is responsible for managing the transactions on the resource and communicating with the transaction manager. The transaction manager is responsible for coordinating the commit or rollback of the transaction across all the participating resources.</p>
<p>The XA protocol defines a set of APIs (Application Programming Interfaces) that the transaction manager and the XA resource managers use to communicate and coordinate the transaction. These APIs include functions for starting, committing, and rolling back a transaction, as well as for checking the status of a transaction.</p>
<p>XA is a powerful tool for managing distributed transactions, but it has some limitations, which we will discuss later.</p>
<h2 id="heading-use-cases-for-2pc-and-xa">Use Cases for 2PC and XA</h2>
<p>2PC and XA are used in a variety of scenarios where transactions involve multiple resources, such as databases, message queues, and file systems. Some common use cases include:</p>
<ol>
<li><p>Financial transactions: 2PC and XA are widely used in the financial industry to ensure the integrity of financial transactions, such as money transfers, stock trades, and credit card payments.</p>
</li>
<li><p>E-commerce: In e-commerce systems, 2PC and XA are used to ensure that orders, payments, and inventory updates are all completed consistently and reliably.</p>
</li>
<li><p>Supply chain management: In supply chain management systems, 2PC and XA are used to ensure that orders, shipments, and inventory updates are all coordinated and consistent across multiple resources.</p>
</li>
<li><p>Healthcare: In healthcare systems, 2PC and XA are used to ensure that patient records, treatments, and billing information are all consistent and accurate.</p>
</li>
</ol>
<h2 id="heading-limitations-of-2pc-and-xa">Limitations of 2PC and XA</h2>
<p>While 2PC and XA are powerful tools for managing distributed transactions, they have some limitations:</p>
<ol>
<li><p>Performance: 2PC and XA can have a significant impact on performance, as they involve multiple round-trips and communication between the participating resources and the transaction manager. This can make them slower than other transaction protocols.</p>
</li>
<li><p>Complexity: 2PC and XA are complex protocols that require a significant amount of programming and infrastructure to implement.</p>
</li>
<li><p>Single point of failure: The transaction manager is a single point of failure in the 2PC and XA protocols. If the transaction manager fails, the entire transaction will fail.</p>
</li>
<li><p>Limited scalability: 2PC and XA can be challenging to scale, as they involve multiple round-trips and communication between the participating resources and the transaction manager.</p>
</li>
</ol>
<h2 id="heading-2pc-with-no-rollback">2PC With No Rollback</h2>
<p>In a 2PC scenario where no rollback occurs, the <code>prepare</code> phase proceeds and all participants agree to commit. Since no participant is executing a rollback at this point, the transaction can be committed. A 2PC with no rollback is an optimistic implementation where the transaction participants proceed with the commit action in the second phase. If, however, some participants were not able to satisfy the conditions, they won’t proceed and will roll back their work. This is called an optimistic approach because the participants proceed with committing their work without necessarily knowing whether their work will be visible to the other participants. The advantage of an optimistic approach is that it can lead to faster throughput in distributed transactions since no participants will be delaying the completion of their work.</p>
<h2 id="heading-2pc-with-rollback">2PC With Rollback</h2>
<p>The main difference between a 2PC with no rollback and a 2PC with rollback is that a 2PC with rollback can proceed only if all participants agree to commit the transaction. If any participant fails to meet the pre-conditions and is unable to continue with the <code>commit</code> in the second phase, every participant has to roll back its work. The trade-off of a 2PC with rollback is that it is more conservative and is therefore likely to lead to slower throughput, because many distributed transactions may take longer to complete. For example, if a transaction cannot proceed because a resource manager is down, the participants will not be able to commit, and they will have to roll back their work.</p>
<h2 id="heading-example-xa-with-nodejs-typescript-amp-express">Example: XA with NodeJS, TypeScript &amp; Express</h2>
<p>Here is an example of using XA with Node.js, TypeScript, and Express:</p>
<p><strong>NOTE: These code examples are for illustration purposes only and do not represent a complete or real-world implementation of a distributed transaction management system. They are meant to provide a general understanding of the concepts involved and should not be used as is in a production environment.</strong></p>
<ul>
<li>Firstly, we're going to create an <code>XA</code> class to manage distributed transactions. This class will help us create and manage transactions that involve multiple resources. We'll be using the PostgreSQL client (<code>pg</code>) to persist and coordinate the state of the transaction and ensure it is either committed or rolled back as needed.</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> { Client } <span class="hljs-keyword">from</span> <span class="hljs-string">'pg'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> XA {
  <span class="hljs-keyword">private</span> client: Client;
  <span class="hljs-keyword">private</span> transaction: Transaction | <span class="hljs-literal">null</span> = <span class="hljs-literal">null</span>;

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">client: Client</span>) {
    <span class="hljs-built_in">this</span>.client = client;
  }

  <span class="hljs-keyword">async</span> beginTransaction(): <span class="hljs-built_in">Promise</span>&lt;Transaction&gt; {
    <span class="hljs-comment">// Begin a new transaction</span>
    <span class="hljs-built_in">this</span>.transaction = <span class="hljs-keyword">new</span> Transaction(<span class="hljs-built_in">this</span>.client);
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">this</span>.transaction;
  }
}
</code></pre>
<ul>
<li>Then we're going to create a <code>Transaction class</code>. The <code>Transaction</code> class is an important part of a distributed transaction management system because it helps to coordinate the actions of multiple resources involved in a transaction. It is responsible for managing the lifecycle of a distributed transaction, including the <code>prepare</code>, <code>commit</code>, and <code>rollback</code> phases.</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> Transaction {
  <span class="hljs-keyword">private</span> client: Client;
  <span class="hljs-keyword">private</span> transactionId: <span class="hljs-built_in">string</span>;
  <span class="hljs-keyword">private</span> resourceManagers: ResourceManager[] = [];

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">client: Client</span>) {
    <span class="hljs-built_in">this</span>.client = client;
    <span class="hljs-comment">// NOTE: a random string is fine for illustration, but not a collision-safe ID in production</span>
    <span class="hljs-built_in">this</span>.transactionId = <span class="hljs-built_in">Math</span>.random().toString(<span class="hljs-number">36</span>).slice(<span class="hljs-number">2</span>, <span class="hljs-number">12</span>);
  }

  <span class="hljs-keyword">async</span> addResourceManager(url: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Add a new resource manager to the list</span>
    <span class="hljs-built_in">this</span>.resourceManagers.push(<span class="hljs-keyword">new</span> ResourceManager(url));
  }

  <span class="hljs-keyword">async</span> prepare(data: <span class="hljs-built_in">any</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Send a prepare request to all the resource managers, including the transaction ID and necessary data</span>
    <span class="hljs-keyword">const</span> results = <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all(<span class="hljs-built_in">this</span>.resourceManagers.map(<span class="hljs-function"><span class="hljs-params">rm</span> =&gt;</span> rm.prepare(<span class="hljs-built_in">this</span>.transactionId, data)));

    <span class="hljs-comment">// Update the transaction status in the database</span>
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.client.query(<span class="hljs-string">'INSERT INTO transactions (id, status) VALUES ($1, $2)'</span>, [<span class="hljs-built_in">this</span>.transactionId, <span class="hljs-string">'prepared'</span>]);

    <span class="hljs-comment">// If any of the resource managers failed to prepare, rollback the transaction</span>
    <span class="hljs-keyword">if</span> (results.some(<span class="hljs-function"><span class="hljs-params">result</span> =&gt;</span> !result)) {
      <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.rollback();
      <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Transaction failed to prepare'</span>);
    }
  }

  <span class="hljs-keyword">async</span> commit(): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Send a commit request to all the resource managers, including the transaction ID</span>
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all(<span class="hljs-built_in">this</span>.resourceManagers.map(<span class="hljs-function"><span class="hljs-params">rm</span> =&gt;</span> rm.commit(<span class="hljs-built_in">this</span>.transactionId)));

    <span class="hljs-comment">// Update the transaction status in the database</span>
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.client.query(<span class="hljs-string">'UPDATE transactions SET status = $1 WHERE id = $2'</span>, [<span class="hljs-string">'committed'</span>, <span class="hljs-built_in">this</span>.transactionId]);
  }

  <span class="hljs-keyword">async</span> rollback(): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Send a rollback request to all the resource managers, including the transaction ID</span>
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">Promise</span>.all(<span class="hljs-built_in">this</span>.resourceManagers.map(<span class="hljs-function"><span class="hljs-params">rm</span> =&gt;</span> rm.rollback(<span class="hljs-built_in">this</span>.transactionId)));

    <span class="hljs-comment">// Update the transaction status in the database</span>
    <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.client.query(<span class="hljs-string">'UPDATE transactions SET status = $1 WHERE id = $2'</span>, [<span class="hljs-string">'reverted'</span>, <span class="hljs-built_in">this</span>.transactionId]);
  }
}
</code></pre>
<ul>
<li>Now, to the resource manager class which is another important part of the distributed transaction management system. The <code>ResourceManager</code> class is typically responsible for receiving requests from the <code>Transaction</code> class to <code>prepare</code>, <code>commit</code>, or <code>rollback</code> a transaction, and for interacting with the shared resource to perform these actions. It may also be responsible for other tasks related to managing the shared resource, such as creating and releasing locks on the resource, or handling errors that occur during the transaction.</li>
</ul>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;

<span class="hljs-keyword">class</span> ResourceManager {
  <span class="hljs-keyword">private</span> url: <span class="hljs-built_in">string</span>;

  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">url: <span class="hljs-built_in">string</span></span>) {
    <span class="hljs-built_in">this</span>.url = url;
  }

  <span class="hljs-keyword">async</span> prepare(transactionId: <span class="hljs-built_in">string</span>, data: <span class="hljs-built_in">any</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">boolean</span>&gt; {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-comment">// Send a prepare request to the resource manager, including the transaction ID and necessary data</span>
      <span class="hljs-keyword">await</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.url}</span>/prepare`</span>, { transactionId, data });
      <span class="hljs-keyword">return</span> <span class="hljs-literal">true</span>;
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-keyword">return</span> <span class="hljs-literal">false</span>;
    }
  }

  <span class="hljs-keyword">async</span> commit(transactionId: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Send a commit request to the resource manager, including the transaction ID</span>
    <span class="hljs-keyword">await</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.url}</span>/commit`</span>, { transactionId });
  }

  <span class="hljs-keyword">async</span> rollback(transactionId: <span class="hljs-built_in">string</span>): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">void</span>&gt; {
    <span class="hljs-comment">// Send a rollback request to the resource manager, including the transaction ID</span>
    <span class="hljs-keyword">await</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${<span class="hljs-built_in">this</span>.url}</span>/rollback`</span>, { transactionId });
  }
}
</code></pre>
<p>Here is an example of how to use the updated <code>XA</code> and <code>Transaction</code> classes to manage a distributed transaction in a Node.js application using TypeScript and the Express framework:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Router } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> { Client } <span class="hljs-keyword">from</span> <span class="hljs-string">'pg'</span>;
<span class="hljs-keyword">import</span> { XA, Transaction } <span class="hljs-keyword">from</span> <span class="hljs-string">'./xa'</span>;

<span class="hljs-keyword">const</span> app = express();
<span class="hljs-keyword">const</span> router = Router();
<span class="hljs-keyword">const</span> client = <span class="hljs-keyword">new</span> Client();
<span class="hljs-keyword">const</span> xa = <span class="hljs-keyword">new</span> XA(client);

router.post(<span class="hljs-string">'/transfer'</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Begin a new transaction</span>
    <span class="hljs-keyword">const</span> transaction = <span class="hljs-keyword">await</span> xa.beginTransaction();

    <span class="hljs-comment">// Add the necessary resource managers to the transaction</span>
    <span class="hljs-keyword">await</span> transaction.addResourceManager(<span class="hljs-string">'http://debit.service/api'</span>);
    <span class="hljs-keyword">await</span> transaction.addResourceManager(<span class="hljs-string">'http://credit.service/api'</span>);

    <span class="hljs-comment">// Prepare the transaction, including the necessary data</span>
    <span class="hljs-keyword">const</span> data = {
      fromAccount: req.body.fromAccount,
      toAccount: req.body.toAccount,
      amount: req.body.amount,
    };
    <span class="hljs-keyword">await</span> transaction.prepare(data);

    <span class="hljs-comment">// Commit the transaction</span>
    <span class="hljs-keyword">await</span> transaction.commit();

    res.sendStatus(<span class="hljs-number">200</span>);
  } <span class="hljs-keyword">catch</span> (error) {
    res.sendStatus(<span class="hljs-number">500</span>);
  }
});

app.use(router);

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Listening on port 3000'</span>);
});
</code></pre>
<p>In this example, the <code>prepare</code> method of the <code>Transaction</code> class will send a <code>POST</code> request to the <code>/prepare</code> endpoint of each of the resource managers, passing along the <code>transactionId</code> and the necessary data. The resource managers will then use this data to prepare for the transaction.</p>
<p>The <code>commit</code> method of the <code>Transaction</code> class will then send a <code>POST</code> request to the <code>/commit</code> endpoint of each of the resource managers, passing along the <code>transactionId</code>. The resource managers will use this request to commit the actions they prepared for in the previous step.</p>
<p>If any errors occur during the transaction, the <code>rollback</code> method of the <code>Transaction</code> class will be called, which will send a <code>POST</code> request to the <code>/rollback</code> endpoint of each of the resource managers, passing along the <code>transactionId</code>. The resource managers will use this request to roll back any actions they took during the <code>prepare</code> phase.</p>
<p>This is a basic example of how to use the <code>XA</code> and <code>Transaction</code> classes to manage a distributed transaction in a Node.js application. You may need to modify these classes and the example code to fit the specific needs of your application.</p>
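<p>To round out the picture, here is a rough sketch of what a participant behind the <code>/prepare</code>, <code>/commit</code>, and <code>/rollback</code> endpoints might do. The class and method names are illustrative assumptions, and the in-memory <code>Map</code> stands in for the durable storage a real resource manager would need:</p>
<pre><code class="lang-typescript">// Illustrative participant-side state handling for two-phase commit.
// Prepared work is staged under its transactionId until the coordinator decides.
class ResourceManager {
  private prepared = new Map();

  // Phase 1: stage the work (e.g. place a hold on the funds) and vote "yes"
  prepare(transactionId: string, data: unknown): boolean {
    if (this.prepared.has(transactionId)) return false; // already prepared
    this.prepared.set(transactionId, data);
    return true;
  }

  // Phase 2a: make the staged work permanent
  commit(transactionId: string): boolean {
    return this.prepared.delete(transactionId);
  }

  // Phase 2b: discard the staged work
  rollback(transactionId: string): boolean {
    return this.prepared.delete(transactionId);
  }
}
</code></pre>
<p>In a real system, each of these operations must survive a crash between the two phases, which is precisely what makes production-grade 2PC implementations hard.</p>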
<hr />
<p><strong>Warning: These code examples are for illustration purposes only and do not represent a complete or real-world implementation of a distributed transaction management system. They are meant to provide a general understanding of the concepts involved and should not be used as is in a production environment.</strong></p>
<hr />
<h2 id="heading-conclusion">Conclusion</h2>
<p>In conclusion, 2PC and XA are important concepts in database transactions and distributed systems. They provide a way to ensure that transactions involving multiple resources are either completed successfully or rolled back in case of failure, thus maintaining the integrity of the data. However, they have some limitations, including performance, complexity, and scalability, which should be taken into consideration when deciding whether to use them in a particular application.</p>
]]></content:encoded></item><item><title><![CDATA[Circuit Breaker in Microservices]]></title><description><![CDATA[It’s a given fact that microservices-based software architecture brings its own set of challenges. With so many microservices and services interacting with each other, increased complexity and the risk of failures — or cascade failures — are inevitab...]]></description><link>https://blog.sofwancoder.com/circuit-breaker-in-microservices</link><guid isPermaLink="true">https://blog.sofwancoder.com/circuit-breaker-in-microservices</guid><category><![CDATA[Software Engineering]]></category><category><![CDATA[backend]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Node.js]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Wed, 28 Dec 2022 12:27:35 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1672156426781/497aef02-c080-4aea-b35b-2f42ca8b02ea.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>It’s a given fact that microservices-based software architecture brings its own set of challenges. With so many services interacting with one another, increased complexity and the risk of failures, including cascading failures, are inevitable. To address these challenges, we need to find ways to isolate risky components and prevent their failure from propagating throughout the system. In this article, we will explore the circuit breaker pattern in microservices architecture and see how it can help you deal with faults and failures.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>When designing an enterprise microservices architecture, one of the biggest concerns is how to manage failure in a distributed system. The software industry has seen several examples of large-scale failures, such as the <a target="_blank" href="https://winbuzzer.com/2017/03/16/microsoft-azure-customers-faced-seven-hour-outage-yesterday-xcxwbn/">Microsoft Azure outage</a> and the <a target="_blank" href="https://winbuzzer.com/2017/02/28/amazon-web-services-aws-outage-dragging-numerous-websites-services-xcxwbn/">AWS S3 outage</a> in 2017. Both cloud outages had a huge impact on many businesses because they were so widespread. With the microservices architecture pattern, you can build your applications using small services that have a single responsibility and are combined to create larger capabilities. When building these smaller services, it's important to implement resiliency measures to make sure they remain online when encountering errors or unexpected conditions.</p>
<h2 id="heading-what-is-a-circuit-breaker">What is a Circuit Breaker?</h2>
<p>When things go wrong, we must have some contingency in place. Otherwise, our services will keep failing and we’ll end up with no services at all. This is where circuit breakers come into play.</p>
<p><strong>A Circuit Breaker is a fault-tolerance pattern that's used to handle transient errors and prevent cascading failures.</strong> In other words, <strong>it's a mechanism to stop the propagation of errors by shutting things down gracefully.</strong></p>
<h3 id="heading-in-distributed-systems">In Distributed Systems</h3>
<p>In distributed systems, <strong>a circuit breaker can be implemented to stop the flow of requests</strong> to a service that has exceeded its maximum threshold of error rate and latency. The pattern is widely used in distributed systems to improve their reliability and availability. <strong>It is implemented as a monitoring mechanism to detect faults and then decide if an action needs to be taken to prevent the faulty components from affecting the system as a whole.</strong></p>
<h2 id="heading-why-use-the-circuit-breaker-pattern">Why use the circuit breaker pattern?</h2>
<p>There are several benefits to using the Circuit Breaker pattern in a microservice architecture:</p>
<ul>
<li><p>Improved resilience and reliability: By automatically failing requests to downstream services that are not responding or experiencing high latency, the circuit breaker helps to prevent cascading failures and improve the overall resilience and reliability of the system.</p>
</li>
<li><p>Increased availability: By failing fast and stopping the chain reaction of failures, the circuit breaker helps to ensure that the system remains available and can continue to serve user requests.</p>
</li>
<li><p>Reduced resource consumption: When a downstream service is experiencing problems, the circuit breaker can help to reduce the load on the service by failing requests before they reach the service. This can help to reduce the resource consumption of the service and prevent it from becoming overloaded.</p>
</li>
<li><p>Enhanced monitoring and visibility: The circuit breaker can provide useful information about the health of downstream services, allowing the system to be monitored and any issues to be identified and addressed quickly.</p>
</li>
</ul>
<p>Overall, the Circuit Breaker pattern is an important tool for improving the resilience and reliability of microservice architectures and helping to ensure that the system remains available and responsive to user requests.</p>
<h2 id="heading-why-is-it-needed-in-microservices-architecture">Why is it needed in Microservices Architecture</h2>
<p>A microservice architecture consists of a large number of microservices that are built for specific tasks and can be reused across different applications. Because these microservices are independent, they can be deployed and scaled independently, allowing your organization to meet changing business requirements.</p>
<h3 id="heading-the-problem">The Problem</h3>
<p>This architecture is highly distributed and is therefore susceptible to a wide variety of faults, such as latency issues, outages, or unbalanced loads. When a problem occurs in one of these services, it can quickly escalate and affect all of the other microservices in the system. When a service fails, the error it returns can travel up through its callers, creating a chain reaction across the entire system.</p>
<h3 id="heading-the-solution">The Solution</h3>
<p>Circuit breakers can be used to stop this propagation of errors by shutting things down gracefully. When a circuit breaker is activated, it prevents requests from reaching a faulty microservice, preventing the error from cascading through the system. With circuit breakers, you can ensure that your system remains stable even in the event of an unexpected incident.</p>
<h2 id="heading-implementing-a-circuit-breaker-in-microservices">Implementing a Circuit Breaker in Microservices</h2>
<p>One way to implement a circuit breaker is to use a state machine that tracks the health of the downstream service. The state machine has three states:</p>
<ul>
<li><p><code>closed</code>: Allow all requests</p>
</li>
<li><p><code>open</code>: Fail all requests</p>
</li>
<li><p><code>half-open</code>: Allow some requests</p>
</li>
</ul>
<p>When the circuit breaker is in the closed state, requests to the downstream service are allowed to pass through. If the downstream service starts to experience failures or high latency, the circuit breaker transitions to the open state and begins to fail requests to the downstream service.</p>
<p>After a certain amount of time has passed, the circuit breaker transitions to the half-open state and allows a limited number of requests to pass through. If these requests are successful, the circuit breaker transitions back to the closed state. If the requests fail, the circuit breaker transitions back to the open state.</p>
<p><strong>Here is an example of how you might implement the circuit breaker in a Node.js application using the Express framework and TypeScript:</strong></p>
<p><strong>First,</strong> we will create a simple function that represents a downstream service that we want to protect with a circuit breaker. This function will make an HTTP request to a mock service and return the response data.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;

<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">callDownstreamService</span>(<span class="hljs-params"></span>): <span class="hljs-title">Promise</span>&lt;<span class="hljs-title">string</span>&gt; </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">'http://mock-service.com/data'</span>);
    <span class="hljs-keyword">return</span> response.data;
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(error.message);
  }
}
</code></pre>
<p><strong>Next,</strong> we will create a circuit breaker class that will wrap the call to the downstream service. This class will use a state machine to track the health of the downstream service and automatically fail requests if necessary:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;

<span class="hljs-comment">// Enum to track the state of the circuit breaker</span>
<span class="hljs-built_in">enum</span> CircuitBreakerState {
  CLOSED, <span class="hljs-comment">// Circuit is closed and requests to the downstream service are allowed through</span>
  OPEN, <span class="hljs-comment">// Circuit is open and requests to the downstream service are failed</span>
  HALF_OPEN, <span class="hljs-comment">// Circuit is half-open and a limited number of requests are allowed through</span>
}

<span class="hljs-keyword">export</span> <span class="hljs-keyword">class</span> CircuitBreaker {
  <span class="hljs-comment">// The current state of the circuit breaker</span>
  <span class="hljs-keyword">private</span> state: CircuitBreakerState;
  <span class="hljs-comment">// The number of errors that need to occur before the circuit breaker transitions to the open state</span>
  <span class="hljs-keyword">private</span> errorThreshold: <span class="hljs-built_in">number</span>;
  <span class="hljs-comment">// The amount of time the circuit breaker stays in the open state before transitioning to the half-open state</span>
  <span class="hljs-keyword">private</span> resetTimeout: <span class="hljs-built_in">number</span>;
  <span class="hljs-comment">// The number of errors that have occurred</span>
  <span class="hljs-keyword">private</span> errorCount: <span class="hljs-built_in">number</span>;
  <span class="hljs-comment">// The number of requests allowed through in the half-open state</span>
  <span class="hljs-keyword">private</span> halfOpenRequests: <span class="hljs-built_in">number</span>;
  <span class="hljs-comment">// The time when the circuit breaker last changed state</span>
  <span class="hljs-keyword">private</span> lastStateChange: <span class="hljs-built_in">number</span>;

  <span class="hljs-comment">// Constructor for the circuit breaker class</span>
  <span class="hljs-keyword">constructor</span>(<span class="hljs-params">errorThreshold: <span class="hljs-built_in">number</span>, resetTimeout: <span class="hljs-built_in">number</span>, halfOpenRequests: <span class="hljs-built_in">number</span></span>) {
    <span class="hljs-built_in">this</span>.state = CircuitBreakerState.CLOSED;
    <span class="hljs-built_in">this</span>.errorThreshold = errorThreshold;
    <span class="hljs-built_in">this</span>.resetTimeout = resetTimeout;
    <span class="hljs-built_in">this</span>.halfOpenRequests = halfOpenRequests;
    <span class="hljs-built_in">this</span>.errorCount = <span class="hljs-number">0</span>;
    <span class="hljs-built_in">this</span>.lastStateChange = <span class="hljs-built_in">Date</span>.now();
  }

  <span class="hljs-comment">// Method to call the downstream service using the circuit breaker</span>
  <span class="hljs-keyword">async</span> callService(): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>&gt; {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-comment">// Check the current state of the circuit breaker</span>
      <span class="hljs-keyword">switch</span> (<span class="hljs-built_in">this</span>.state) {
        <span class="hljs-keyword">case</span> CircuitBreakerState.CLOSED:
          <span class="hljs-comment">// Circuit is closed, so allow the request through and reset the error count</span>
          <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.callServiceAndResetErrorCount();
        <span class="hljs-keyword">case</span> CircuitBreakerState.OPEN:
          <span class="hljs-comment">// Circuit is open, so check if the reset timeout has expired</span>
          <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.isResetTimeoutExpired()) {
            <span class="hljs-comment">// Reset timeout has expired, so transition to the half-open state and allow a request through</span>
            <span class="hljs-built_in">this</span>.transitionToHalfOpenState();
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.callServiceAndIncrementErrorCount();
          } <span class="hljs-keyword">else</span> {
            <span class="hljs-comment">// Reset timeout has not expired, so fail the request</span>
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Circuit is open'</span>);
          }
        <span class="hljs-keyword">case</span> CircuitBreakerState.HALF_OPEN:
          <span class="hljs-comment">// Circuit is half-open, so check if the number of allowed requests has been exceeded</span>
          <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.errorCount &lt; <span class="hljs-built_in">this</span>.halfOpenRequests) {
            <span class="hljs-comment">// Allowed requests have not been exceeded, so allow the request through</span>
            <span class="hljs-keyword">return</span> <span class="hljs-keyword">await</span> <span class="hljs-built_in">this</span>.callServiceAndIncrementErrorCount();
          } <span class="hljs-keyword">else</span> {
            <span class="hljs-comment">// Allowed requests have been exceeded, so transition back to the open state and fail the request</span>
            <span class="hljs-built_in">this</span>.transitionToOpenState();
            <span class="hljs-keyword">throw</span> <span class="hljs-keyword">new</span> <span class="hljs-built_in">Error</span>(<span class="hljs-string">'Circuit is open'</span>);
          }
      }
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-comment">// Rethrow without incrementing the error count here: the helper methods</span>
      <span class="hljs-comment">// already count downstream failures, and a request rejected because the</span>
      <span class="hljs-comment">// circuit is open should not be counted as a service error</span>
      <span class="hljs-keyword">throw</span> error;
    }
  }

  <span class="hljs-comment">// Method to call the downstream service and reset the error count</span>
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> callServiceAndResetErrorCount(): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>&gt; {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-comment">// Call the downstream service</span>
      <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> callDownstreamService();
      <span class="hljs-comment">// Reset the error count</span>
      <span class="hljs-built_in">this</span>.resetErrorCount();
      <span class="hljs-comment">// Return the response from the downstream service</span>
      <span class="hljs-keyword">return</span> response;
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-comment">// An error occurred, so increment the error count</span>
      <span class="hljs-built_in">this</span>.incrementErrorCount();
      <span class="hljs-keyword">throw</span> error;
    }
  }

  <span class="hljs-comment">// Method to call the downstream service, increment the error count, and transition to the closed state</span>
  <span class="hljs-comment">// if the request is successful</span>
  <span class="hljs-keyword">private</span> <span class="hljs-keyword">async</span> callServiceAndIncrementErrorCount(): <span class="hljs-built_in">Promise</span>&lt;<span class="hljs-built_in">string</span>&gt; {
    <span class="hljs-keyword">try</span> {
      <span class="hljs-comment">// Call the downstream service</span>
      <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> callDownstreamService();
      <span class="hljs-comment">// Reset the error count</span>
      <span class="hljs-built_in">this</span>.resetErrorCount();
      <span class="hljs-comment">// Transition to the closed state</span>
      <span class="hljs-built_in">this</span>.transitionToClosedState();
      <span class="hljs-comment">// Return the response from the downstream service</span>
      <span class="hljs-keyword">return</span> response;
    } <span class="hljs-keyword">catch</span> (error) {
      <span class="hljs-comment">// An error occurred, so increment the error count</span>
      <span class="hljs-built_in">this</span>.incrementErrorCount();
      <span class="hljs-keyword">throw</span> error;
    }
  }

  <span class="hljs-comment">// Method to reset the error count</span>
  <span class="hljs-keyword">private</span> resetErrorCount(): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.errorCount = <span class="hljs-number">0</span>;
  }

  <span class="hljs-comment">// Method to increment the error count and transition to the open state if the error threshold is reached</span>
  <span class="hljs-keyword">private</span> incrementErrorCount(): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.errorCount++;
    <span class="hljs-keyword">if</span> (<span class="hljs-built_in">this</span>.errorCount &gt;= <span class="hljs-built_in">this</span>.errorThreshold) {
      <span class="hljs-built_in">this</span>.transitionToOpenState();
    }
  }

  <span class="hljs-comment">// Method to check if the reset timeout has expired</span>
  <span class="hljs-keyword">private</span> isResetTimeoutExpired(): <span class="hljs-built_in">boolean</span> {
    <span class="hljs-keyword">return</span> <span class="hljs-built_in">Date</span>.now() - <span class="hljs-built_in">this</span>.lastStateChange &gt; <span class="hljs-built_in">this</span>.resetTimeout;
  }

  <span class="hljs-comment">// Method to transition to the closed state</span>
  <span class="hljs-keyword">private</span> transitionToClosedState(): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.state = CircuitBreakerState.CLOSED;
    <span class="hljs-built_in">this</span>.lastStateChange = <span class="hljs-built_in">Date</span>.now();
  }

  <span class="hljs-comment">// Method to transition to the open state</span>
  <span class="hljs-keyword">private</span> transitionToOpenState(): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.state = CircuitBreakerState.OPEN;
    <span class="hljs-built_in">this</span>.lastStateChange = <span class="hljs-built_in">Date</span>.now();
  }

  <span class="hljs-comment">// Method to transition to the half-open state, resetting the error count</span>
  <span class="hljs-comment">// so that the limited number of trial requests can actually pass through</span>
  <span class="hljs-keyword">private</span> transitionToHalfOpenState(): <span class="hljs-built_in">void</span> {
    <span class="hljs-built_in">this</span>.state = CircuitBreakerState.HALF_OPEN;
    <span class="hljs-built_in">this</span>.errorCount = <span class="hljs-number">0</span>;
    <span class="hljs-built_in">this</span>.lastStateChange = <span class="hljs-built_in">Date</span>.now();
  }
}
</code></pre>
<p><strong>Finally</strong>, we can use the <code>CircuitBreaker</code> class in an Express route handler to protect the call to the downstream service:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> { CircuitBreaker } <span class="hljs-keyword">from</span> <span class="hljs-string">'./circuit-breaker'</span>;

<span class="hljs-keyword">const</span> app = express();

<span class="hljs-comment">// Create a new circuit breaker with an error threshold of 5, a reset timeout of 10 seconds, and allowing 2 requests through in the half-open state</span>
<span class="hljs-keyword">const</span> circuitBreaker = <span class="hljs-keyword">new</span> CircuitBreaker(<span class="hljs-number">5</span>, <span class="hljs-number">10000</span>, <span class="hljs-number">2</span>);

app.get(<span class="hljs-string">'/data'</span>, <span class="hljs-keyword">async</span> (req, res) =&gt; {
  <span class="hljs-keyword">try</span> {
    <span class="hljs-comment">// Call the downstream service using the circuit breaker</span>
    <span class="hljs-keyword">const</span> data = <span class="hljs-keyword">await</span> circuitBreaker.callService();
    <span class="hljs-comment">// Send the response data back to the client</span>
    res.send(data);
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-comment">// An error occurred, so send a 500 error back to the client</span>
    res.status(<span class="hljs-number">500</span>).send(error.message);
  }
});

app.listen(<span class="hljs-number">3000</span>, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">'Server listening on port 3000'</span>);
});
</code></pre>
<p>This example creates a circuit breaker with an error threshold of 5, a reset timeout of 10 seconds, and allows 2 requests through in the half-open state. If the downstream service fails 5 times in a row, the circuit breaker will transition to the open state and start failing requests. After 10 seconds have passed, the circuit breaker will transition to the half-open state and allow 2 requests through. If these requests are successful, the circuit breaker will transition back to the closed state. If they fail, the circuit breaker will transition back to the open state.</p>
<hr />
<p>Using a circuit breaker can help to improve the resilience and reliability of a microservice architecture by providing <strong>a mechanism to fail fast and prevent cascading failures.</strong> It is important to <strong>tune the circuit breaker's parameters, such as the time to stay in the open state and the number of requests to allow through in the half-open state, to ensure that it is effective in protecting the system without causing undue disruption.</strong></p>
<hr />
<p>I hope this example helps to illustrate how the Circuit Breaker pattern can be implemented in a Node.js application using the Express framework and TypeScript.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>When designing an enterprise microservices architecture, it's important to implement resiliency measures to make sure the services remain online when encountering errors or unexpected conditions. A circuit breaker is a fault-tolerance pattern that can be used to handle transient errors and prevent cascading failures in distributed systems. With circuit breakers, you can ensure that your system remains stable even in the event of an unexpected incident.</p>
]]></content:encoded></item><item><title><![CDATA[Service Discovery in Distributed Systems]]></title><description><![CDATA[Service discovery is a key component of microservices architecture, as it enables microservices to communicate with each other and discover each other's location. In this article, we will delve into the concept of service discovery in microservices, ...]]></description><link>https://blog.sofwancoder.com/service-discovery-in-distributed-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/service-discovery-in-distributed-systems</guid><category><![CDATA[distributed system]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[Databases]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[software architecture]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Mon, 26 Dec 2022 10:55:49 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671908051815/79d54f00-b1a2-4ca6-b3a5-6310e0118f91.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Service discovery is a key component of microservices architecture, as it enables microservices to communicate with each other and discover each other's location. In this article, we will delve into the concept of service discovery in microservices, its benefits, and how it is implemented using NodeJS/Typescript.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>Service discovery is a key aspect of building and operating microservices-based architectures. It refers to the process of finding and connecting to the desired service or resource within a distributed system. In a microservices architecture, each service is independently deployable and scalable, and they communicate with each other through APIs or other means. Service discovery enables these services to locate and connect reliably and efficiently, regardless of their location or state. It is an essential component of a microservices architecture, as it allows for the dynamic discovery and orchestration of services, enabling flexibility and resilience in the face of change.</p>
<h2 id="heading-service-discovery-in-microservices">Service Discovery in Microservices</h2>
<p>In a microservices architecture, each service is a self-contained unit that performs a specific task and communicates with other services through APIs. Service discovery refers to the process of finding the location of a particular service and establishing communication with it. It is an essential part of microservices architecture, as it enables microservices to discover and communicate with each other.</p>
<h2 id="heading-service-registry">Service Registry</h2>
<p>Service discovery is typically implemented using a service registry, which is a centralized database that stores the location and metadata of all the services in the system. When a service wants to communicate with another service, it queries the service registry to find the location of the target service. The service registry returns the location of the target service, and the calling service establishes a connection and communicates with it through APIs.</p>
<p>There are several ways to implement a service registry. <strong>One common approach is to use a central datastore, such as a relational database or a distributed key-value store.</strong> Another approach is to use a central HTTP server that provides a REST API for registering, unregistering, and looking up services.</p>
<p>Regardless of the implementation, <strong>it is important to ensure that the service registry is reliable and highly available</strong>, as it is a critical component of the microservice architecture. <strong>If the registry goes down, the services in the system may be unable to communicate with each other</strong> and the system may become unavailable.</p>
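<p>To make the registry's role concrete, here is a toy in-memory sketch. It is for illustration only: a real registry would be backed by a replicated, highly available store and would expire entries via health checks or TTLs, and the <code>register</code>/<code>unregister</code>/<code>lookup</code> method names are assumptions rather than a standard API:</p>
<pre><code class="lang-typescript">// A toy, in-memory service registry (illustration only).
class ServiceRegistry {
  private services = new Map();

  // Register an instance (host:port) under a service name
  register(name: string, address: string): void {
    const instances = this.services.get(name) || [];
    instances.push(address);
    this.services.set(name, instances);
  }

  // Remove an instance, e.g. on shutdown or after a failed health check
  unregister(name: string, address: string): void {
    const instances = (this.services.get(name) || []).filter(
      (a: string) => a !== address,
    );
    this.services.set(name, instances);
  }

  // Look up all known instances of a service
  lookup(name: string): string[] {
    return this.services.get(name) || [];
  }
}
</code></pre>
<p>A service would call <code>register</code> on startup and <code>unregister</code> on shutdown, while callers use <code>lookup</code> before each request (often caching the result).</p>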
<h2 id="heading-benefits-of-service-discovery-in-microservices">Benefits of Service Discovery in Microservices</h2>
<p>There are several benefits to using service discovery in microservices architecture:</p>
<h3 id="heading-decentralized-architecture">Decentralized Architecture</h3>
<p>Service discovery enables a decentralized architecture, where each service can operate independently and communicate with other services through APIs. This allows for greater flexibility and scalability, as services can be added, removed, or modified without affecting the overall system.</p>
<h3 id="heading-resilience">Resilience</h3>
<p>Service discovery allows services to communicate with each other through APIs, which means that services can continue to operate even if other services are down or unavailable. This increases the overall resilience of the system.</p>
<h3 id="heading-load-balancing">Load Balancing</h3>
<p>Service discovery can be used to implement load balancing, where multiple instances of a service are available to handle requests. The service registry can store the location of all the instances of a service, and the calling service can use a load-balancing algorithm to distribute requests among the instances.</p>
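<p>As a small illustration, a calling service could rotate through the instances returned by the registry with a round-robin picker like the following (the <code>createRoundRobin</code> helper is an assumed name, not a standard API):</p>
<pre><code class="lang-typescript">// Illustrative round-robin selection over the instances a registry returned.
function createRoundRobin(instances: string[]) {
  let next = 0;
  return function pick(): string {
    const instance = instances[next % instances.length];
    next = next + 1;
    return instance;
  };
}
</code></pre>
<p>More sophisticated strategies, such as least-connections or latency-aware balancing, follow the same shape: look the instances up in the registry, then choose one per request.</p>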
<h3 id="heading-dynamic-scaling">Dynamic Scaling</h3>
<p>Service discovery can be used to implement dynamic scaling, where the number of instances of a service is increased or decreased based on the workload. This allows the system to scale up or down based on demand, which helps to optimize resources and reduce costs.</p>
<h2 id="heading-how-does-service-discovery-work">How Does Service Discovery Work?</h2>
<p>Service discovery typically involves the use of a registry, which is a central repository that maintains a list of all the available services and their locations. When a service needs to communicate with another service, it queries the registry to find the location of the target service.</p>
<p>There are several ways in which the registry can be implemented, including:</p>
<ol>
<li><p><strong>Centralized registry:</strong> In a centralized registry, all services register with a central server, which maintains a list of all the available services and their locations. When a service needs to communicate with another service, it queries the central server to find the location of the target service.</p>
</li>
<li><p><strong>Decentralized registry:</strong> In a decentralized registry, each service maintains its own registry and shares it with other services. When a service needs to communicate with another service, it queries the registry of the target service to find its location.</p>
</li>
<li><p><strong>Hybrid registry:</strong> In a hybrid registry, a central server maintains a list of all the available services, but each service also maintains its own registry. This allows for a centralized view of all the available services, while still allowing for decentralized communication between services.</p>
</li>
</ol>
<p>There are also several tools and technologies available for implementing service discovery, including:</p>
<ol>
<li><p><strong>DNS:</strong> Domain Name System (DNS) is a distributed database that maps domain names to IP addresses. DNS can be used for service discovery by mapping a domain name to the IP address of a service.</p>
</li>
<li><p><strong>Load balancers:</strong> Load balancers can be used for service discovery by routing requests to the appropriate service based on the domain name or IP address.</p>
</li>
<li><p><strong>Service mesh:</strong> A service mesh is a network of microservices that can be used to implement service discovery and communication between services.</p>
</li>
</ol>
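<p>As a sketch of the DNS approach: SRV records carry a host, port, priority, and weight for each instance of a service. The record name <code>_orders._tcp.example.internal</code> below is hypothetical, and the weight-based tie-breaking the SRV specification describes is omitted for brevity:</p>

```typescript
import { resolveSrv } from 'node:dns/promises';

// Shape of a DNS SRV answer (host, port, priority, weight per instance).
type SrvRecord = { name: string; port: number; priority: number; weight: number };

// Turn SRV records into connectable addresses, lowest priority first
// (SRV priority is "lower wins").
function srvToAddresses(records: SrvRecord[]): string[] {
  return [...records]
    .sort((a, b) => a.priority - b.priority)
    .map((r) => `http://${r.name}:${r.port}`);
}

// Live discovery against a DNS server that publishes SRV records for a
// hypothetical name like '_orders._tcp.example.internal':
async function discover(service: string): Promise<string[]> {
  return srvToAddresses(await resolveSrv(service));
}

// Offline demonstration with hand-written records:
const addresses = srvToAddresses([
  { name: 'b.example.internal', port: 3002, priority: 20, weight: 0 },
  { name: 'a.example.internal', port: 3001, priority: 10, weight: 0 },
]);
console.log(addresses);
```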
<h2 id="heading-implementing-a-service-registry-for-service-discovery-using-a-centralized-registry">Implementing a Service Registry for Service Discovery Using a Centralized Registry</h2>
<p>Here is an example of how you could implement a centralized registry for service discovery using the Express.js framework:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> bodyParser <span class="hljs-keyword">from</span> <span class="hljs-string">'body-parser'</span>;

<span class="hljs-comment">// Service registry: maps service names to their addresses</span>
<span class="hljs-comment">// Ideally, this is a database system (NoSQL/Redis/Mysql etc).</span>
<span class="hljs-keyword">const</span> serviceRegistry = <span class="hljs-keyword">new</span> <span class="hljs-built_in">Map</span>&lt;<span class="hljs-built_in">string</span>, <span class="hljs-built_in">string</span>&gt;();

<span class="hljs-comment">// Create an Express app</span>
<span class="hljs-keyword">const</span> app = express();

<span class="hljs-comment">// Parse request bodies as JSON</span>
app.use(bodyParser.json());

<span class="hljs-comment">// Register routes for the four actions</span>
app.get(<span class="hljs-string">'/services'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-comment">// Return the list of registered services</span>
  res.json(<span class="hljs-built_in">Array</span>.from(serviceRegistry.keys()));
});

app.post(<span class="hljs-string">'/register'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-comment">// Add the new service to the registry</span>
  <span class="hljs-keyword">const</span> { name, address } = req.body;
  serviceRegistry.set(name, address);
  res.send(<span class="hljs-string">'Success'</span>);
});

app.post(<span class="hljs-string">'/unregister'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-comment">// Remove the service from the registry</span>
  <span class="hljs-keyword">const</span> { name } = req.body;
  serviceRegistry.delete(name);
  res.send(<span class="hljs-string">'Success'</span>);
});

app.get(<span class="hljs-string">'/lookup/:name'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-comment">// Look up the requested service in the registry and return its address</span>
  <span class="hljs-keyword">const</span> name = req.params.name;
  <span class="hljs-keyword">const</span> address = serviceRegistry.get(name);
  <span class="hljs-keyword">if</span> (address) {
    res.json({ address });
  } <span class="hljs-keyword">else</span> {
    res.status(<span class="hljs-number">404</span>).send(<span class="hljs-string">'Not found'</span>);
  }
});

<span class="hljs-comment">// Start the server</span>
<span class="hljs-keyword">const</span> port = <span class="hljs-number">3000</span>;
app.listen(port, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Registry listening on port <span class="hljs-subst">${port}</span>`</span>);
});
</code></pre>
<p>This registry implementation listens for HTTP requests on port 3000, and provides the following four actions:</p>
<ul>
<li><p><code>GET /services</code>: Returns a list of the names of all the registered services.</p>
</li>
<li><p><code>POST /register</code>: Registers a new service by adding it to the registry. The request body should be a JSON object with two properties: <code>name</code> (the name of the service) and <code>address</code> (the address of the service).</p>
</li>
<li><p><code>POST /unregister</code>: Unregisters a service by removing it from the registry. The request body should be a JSON object with a single property: <code>name</code> (the name of the service).</p>
</li>
<li><p><code>GET /lookup/&lt;name&gt;</code>: Looks up the address of the service with the given name. If the service is not found, the server returns a 404 error.</p>
</li>
</ul>
<p>Each service that wants to register with the registry can make a POST request to <code>/register</code> with its name and address in the request body, and unregister with a POST request to <code>/unregister</code> with its name in the request body. Other services can look up the address of a specific service by making a GET request to <code>/lookup/&lt;name&gt;</code>, where <code>&lt;name&gt;</code> is the name of the service they want to find.</p>
<h2 id="heading-implementing-a-service-discovery-mechanism-for-microservices-using-a-centralized-registry">Implementing a Service Discovery Mechanism for Microservices Using a Centralized Registry</h2>
<p>Here is an example of how you could implement a service discovery mechanism for microservices using a centralized registry, written in TypeScript and using the Express.js framework:</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> axios <span class="hljs-keyword">from</span> <span class="hljs-string">'axios'</span>;
<span class="hljs-keyword">import</span> express, { Request, Response } <span class="hljs-keyword">from</span> <span class="hljs-string">'express'</span>;
<span class="hljs-keyword">import</span> bodyParser <span class="hljs-keyword">from</span> <span class="hljs-string">'body-parser'</span>;

<span class="hljs-comment">// Base URL of the registry</span>
<span class="hljs-keyword">const</span> registryUrl = <span class="hljs-string">'http://registry:3000'</span>;

<span class="hljs-comment">// Register a new service with the registry</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">registerService</span>(<span class="hljs-params">name: <span class="hljs-built_in">string</span>, address: <span class="hljs-built_in">string</span></span>) </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">await</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${registryUrl}</span>/register`</span>, { name, address });
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Service "<span class="hljs-subst">${name}</span>" registered with the registry`</span>);
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error registering service "<span class="hljs-subst">${name}</span>": <span class="hljs-subst">${error.message}</span>`</span>);
  }
}

<span class="hljs-comment">// Unregister a service with the registry</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">unregisterService</span>(<span class="hljs-params">name: <span class="hljs-built_in">string</span></span>) </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">await</span> axios.post(<span class="hljs-string">`<span class="hljs-subst">${registryUrl}</span>/unregister`</span>, { name });
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Service "<span class="hljs-subst">${name}</span>" unregistered from the registry`</span>);
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error unregistering service "<span class="hljs-subst">${name}</span>": <span class="hljs-subst">${error.message}</span>`</span>);
  }
}

<span class="hljs-comment">// Look up the address of a service in the registry</span>
<span class="hljs-keyword">async</span> <span class="hljs-function"><span class="hljs-keyword">function</span> <span class="hljs-title">lookupService</span>(<span class="hljs-params">name: <span class="hljs-built_in">string</span></span>): <span class="hljs-title">Promise</span>&lt;<span class="hljs-title">string</span> | <span class="hljs-title">undefined</span>&gt; </span>{
  <span class="hljs-keyword">try</span> {
    <span class="hljs-keyword">const</span> response = <span class="hljs-keyword">await</span> axios.get(<span class="hljs-string">`<span class="hljs-subst">${registryUrl}</span>/lookup/<span class="hljs-subst">${name}</span>`</span>);
    <span class="hljs-keyword">return</span> response.data.address;
  } <span class="hljs-keyword">catch</span> (error) {
    <span class="hljs-built_in">console</span>.error(<span class="hljs-string">`Error looking up service "<span class="hljs-subst">${name}</span>": <span class="hljs-subst">${error.message}</span>`</span>);
  }
}

<span class="hljs-comment">// Create an Express app</span>
<span class="hljs-keyword">const</span> app = express();

<span class="hljs-comment">// Parse request bodies as JSON</span>
app.use(bodyParser.json());

<span class="hljs-comment">// Register a route to register a new service</span>
app.post(<span class="hljs-string">'/register'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> { name, address } = req.body;
  registerService(name, address);
  res.send(<span class="hljs-string">'Success'</span>);
});

<span class="hljs-comment">// Register a route to unregister a service</span>
app.post(<span class="hljs-string">'/unregister'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-keyword">const</span> { name } = req.body;
  unregisterService(name);
  res.send(<span class="hljs-string">'Success'</span>);
});

<span class="hljs-comment">// Start this service's own HTTP server</span>
<span class="hljs-keyword">const</span> port = <span class="hljs-number">3001</span>;
app.listen(port, <span class="hljs-function">() =&gt;</span> {
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Service listening on port <span class="hljs-subst">${port}</span>`</span>);
});

<span class="hljs-comment">// Example usage: register a new service and look it up.</span>
<span class="hljs-comment">// Wrapped in an async IIFE so the registration completes before the</span>
<span class="hljs-comment">// lookup, and because top-level await is only valid in ES modules.</span>
(<span class="hljs-keyword">async</span> () =&gt; {
  <span class="hljs-keyword">await</span> registerService(<span class="hljs-string">'service-a'</span>, <span class="hljs-string">'http://localhost:3001'</span>);
  <span class="hljs-keyword">const</span> address = <span class="hljs-keyword">await</span> lookupService(<span class="hljs-string">'service-a'</span>);
  <span class="hljs-built_in">console</span>.log(<span class="hljs-string">`Address of service "service-a": <span class="hljs-subst">${address}</span>`</span>);
})();
</code></pre>
<p>This example uses the Axios library to make HTTP requests to the registry. The registry is assumed to be running at the URL <a target="_blank" href="http://registry:3000"><code>http://registry:3000</code></a>.</p>
<p>The <code>registerService</code> function makes a POST request to <code>/register</code> with the name and address of the service to be registered. The <code>unregisterService</code> function makes a POST request to <code>/unregister</code> with the name of the service to be unregistered. The <code>lookupService</code> function makes a GET request to <code>/lookup/&lt;name&gt;</code> to look up the address of the service with the given name.</p>
<p>Each microservice can use these functions to register and unregister itself with the registry, and to look up the addresses of other services it needs to communicate with.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>Service discovery is a crucial component of microservices architecture: it enables microservices to communicate with each other and to discover each other's locations as instances come and go. It is typically implemented using a centralized service registry, which must itself be kept reliable and highly available.</p>
]]></content:encoded></item><item><title><![CDATA[Load Shedding in Distributed Systems]]></title><description><![CDATA[Distributed systems are made up of many parts, each of which can fail on its own. Because of this, distributed systems often have partial breakdowns in the real world. These problems could be caused by node failures, network partitions or any number ...]]></description><link>https://blog.sofwancoder.com/load-shedding-in-distributed-systems</link><guid isPermaLink="true">https://blog.sofwancoder.com/load-shedding-in-distributed-systems</guid><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[distributed system]]></category><category><![CDATA[Microservices]]></category><category><![CDATA[Computer Science]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sat, 24 Dec 2022 09:08:20 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671801358216/3814516f-13a3-4c5a-8a6f-724fa7166393.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Distributed systems are made up of many parts, each of which can fail on its own. Because of this, distributed systems often have partial breakdowns in the real world. These problems could be caused by node failures, network partitions or any number of other things that weren't planned. These unexpected failures could bring the whole system down and affect users. What’s even more troubling is that some of these failures tend to happen again and again at unpredictable times. Load shedding is one way that we deal with these kinds of unplanned system failures— we purposely cut back on resources to stop more general failures during times of stress.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>When the supply of something is limited, we ration it. Power grids do exactly this: when there is not enough electricity to meet demand, different parts of the grid are switched off for a while so that no area consumes more than its fair share. When we apply the same idea to computing, deliberately refusing some work so that the rest can be served, we call it load shedding in distributed systems.</p>
<h2 id="heading-distributed-system">Distributed System</h2>
<p>In a distributed system, individuals or organisations work independently but in collaboration to achieve a common goal. A distributed system consists of numerous nodes or processes that operate independently of one another and are connected via network services such as remote procedure calls or message-passing services.</p>
<h2 id="heading-what-is-load-shedding">What is load shedding?</h2>
<p>Load shedding —a term often used in electrical engineering— according to <a target="_blank" href="https://www.dictionary.com/browse/load-shedding">dictionary.com</a> is "<em>the deliberate shutdown of electric power in a part or parts of a power-distribution system, generally to prevent the failure of the entire system when the demand strains the capacity of the system.</em>"</p>
<p>Load shedding is a regulated process in which—electricity supply is intentionally disrupted in specific locations to manage demand and avoid the entire power grid from collapsing. It is typically done when there is a shortage of electricity generation or transmission capacity, or when there is a high risk of overloading the system.</p>
<h2 id="heading-load-shedding-in-computer-science">Load Shedding in Computer Science</h2>
<p>From the definitions above, it is clear that load shedding occurs when the underlying system has insufficient capacity to continue operating normally. Hence, it is <strong>a technique used in systems to handle situations where the system is overwhelmed and cannot keep up with demand</strong>. When load shedding occurs, the system will prioritize certain requests and temporarily stop processing others in order to reduce the load on the system and prevent it from crashing.</p>
<h2 id="heading-what-is-load-shedding-in-distributed-systems">What is Load Shedding in Distributed Systems?</h2>
<p>Load shedding is the act of deliberately dropping some load to keep a system from collapsing due to overload. The distributed systems that power the internet and our businesses run on a finite pool of machines that must stay up for the system to work, and demand for those resources is often high, sometimes higher than the pool can comfortably serve.</p>
<p>In certain situations where the system is failing partially, there are only two options:</p>
<ul>
<li><p>Let the system fail</p>
</li>
<li><p>or engage in load shedding.</p>
</li>
</ul>
<p>Load shedding in distributed systems can mean shutting down services, slowing down operations, or re-routing requests. <strong>Load shedding is a defensive approach to dealing with an overburdened system that involves deliberately bringing the system to a lower level of service</strong> than usual in order to buy time for system administrators to add new capacity to the system or repair broken equipment. It is a common practice in power grids and other critical systems where a lack of capacity can lead to system failure.</p>
<h2 id="heading-failing-gracefully-in-distributed-system">Failing Gracefully in Distributed System</h2>
<p>Distributed systems that fail gracefully are designed to redistribute the load shed from the failed component onto the healthy components of the system.</p>
<p><strong>A distributed system is said to have successfully shed load if it can handle the excess load without failing entirely</strong>. Though distributed systems have a higher probability of failure than their centralized counterparts, they have an advantage in that they can handle extremely large amounts of traffic with relatively low infrastructure costs.</p>
<p>Distributed systems experience frequent partial failures. These failures may be due to node outages, network partitioning, or any other number of unanticipated events. Load shedding is a process by which we handle these unanticipated system failures —we deliberately reduce resources under stress as a means of preventing more widespread failure.</p>
<h2 id="heading-the-major-reason-for-load-shedding">The major reason for load shedding</h2>
<p>The major reason load shedding is necessary is that <strong>distributed systems are not meant to run at 100% capacity</strong>. In fact, they work best when running at around 80% capacity or less. In a perfectly balanced system, if one component runs at 100%, it takes resources away from other parts of the system and slows them down. <strong>A system that is 100% busy is a sign of bad management:</strong> it has no room for error, and that is what causes crashes and outages. <strong>Managers of distributed systems need to know where the system is near capacity and where load can be shed to keep it from crashing or going down.</strong></p>
<h2 id="heading-how-to-trigger-load-shedding">How to trigger load shedding?</h2>
<p>When load-shedding is triggered, the system will stop processing some of the incoming requests and prioritize others. This is done in order to reduce the load on the system and prevent it from crashing. The requests that are prioritized may be those that are considered more important or time-sensitive, such as requests for critical services or emergency services.</p>
<p><strong>The key to successfully shedding load in a distributed system is the ability to detect failures and trigger an automated response that reroutes traffic to other healthy nodes.</strong></p>
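<p>As an illustrative sketch (the threshold and helper names are assumptions, not a standard API), a service can track its own in-flight requests and refuse new ones once a limit is reached:</p>

```typescript
// Threshold-based shedding: count requests in flight and refuse new work
// once a limit is reached. MAX_IN_FLIGHT is an illustrative number; real
// systems derive it from measured capacity.
const MAX_IN_FLIGHT = 100;
let inFlight = 0;

function shouldShed(current: number, max: number): boolean {
  return current >= max;
}

// Express-style middleware (req/res typed loosely to keep the sketch
// self-contained): overloaded requests get an immediate 503 with a
// Retry-After hint instead of queueing work that cannot finish in time.
function loadShedder(req: any, res: any, next: () => void): void {
  if (shouldShed(inFlight, MAX_IN_FLIGHT)) {
    res.status(503).set('Retry-After', '1').send('Service overloaded');
    return;
  }
  inFlight += 1;
  res.on('finish', () => { inFlight -= 1; });
  next();
}

console.log(shouldShed(99, MAX_IN_FLIGHT), shouldShed(100, MAX_IN_FLIGHT));
```

<p>Returning an explicit 503 lets well-behaved clients back off and retry, rather than timing out against a server that silently queued their request.</p>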
<h2 id="heading-load-shedding-strategies">Load Shedding Strategies</h2>
<p>Several methods and strategies can be used to implement load-shedding in a distributed system.</p>
<h3 id="heading-limiting-the-resource-consumption-rate">Limiting the resource consumption rate</h3>
<p>Resource limits can be used to shed load in distributed systems by reducing the resource consumption rate. The rate at which a resource is consumed can be monitored, and if the rate exceeds a certain threshold, the resource is shed to prevent the system from being overloaded.</p>
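<p>One common way to enforce such a rate limit is a token bucket. The sketch below is illustrative; the capacity and refill rate are made-up numbers that a real system would derive from measured capacity:</p>

```typescript
// A token bucket: each request costs one token; tokens refill at a fixed
// rate up to the bucket's capacity. An empty bucket means the request is
// shed instead of served. Timestamps are passed in explicitly so the
// behaviour is easy to test.
class TokenBucket {
  private tokens: number;
  private lastRefill: number;

  constructor(
    private readonly capacity: number,
    private readonly refillPerSecond: number,
    now: number = Date.now(),
  ) {
    this.tokens = capacity;
    this.lastRefill = now;
  }

  // Returns true if the request may proceed, false if it should be shed.
  tryAcquire(now: number = Date.now()): boolean {
    const elapsedSeconds = (now - this.lastRefill) / 1000;
    this.tokens = Math.min(
      this.capacity,
      this.tokens + elapsedSeconds * this.refillPerSecond,
    );
    this.lastRefill = now;
    if (this.tokens >= 1) {
      this.tokens -= 1;
      return true;
    }
    return false;
  }
}

// Capacity 2, refilling 1 token/second: the third immediate request is
// shed, but one second later a token has returned.
const bucket = new TokenBucket(2, 1, 0);
const results = [bucket.tryAcquire(0), bucket.tryAcquire(0), bucket.tryAcquire(0)];
const afterRefill = bucket.tryAcquire(1000); // one second later
console.log(results, afterRefill);
```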
<h3 id="heading-merging-and-rerouting-requests">Merging and rerouting requests</h3>
<p>In order to avoid a cascading failure in distributed systems, the load-shedding strategy must be designed to shed load from the node that is experiencing the problem rather than shedding from the node that receives the request. For example, when a node that is responsible for serving requests is experiencing problems, the load-shedding strategy must be designed to send the requests to another node that can handle them.</p>
<h3 id="heading-dropping-requests">Dropping requests</h3>
<p>When the load-shedding strategy involves dropping requests, the nodes in the distributed system must have the ability to recognize and ignore certain types of requests. Dropping requests is typically used as a last resort, when other load-shedding strategies (e.g., reducing the resource consumption rate or rerouting requests) would require too much effort to implement.</p>
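<p>A minimal sketch of priority-based dropping (the request classes and thresholds below are assumptions chosen for illustration, not a standard scheme):</p>

```typescript
// Shedding by request class: when the system is saturated, low-priority
// requests are dropped first so that critical traffic keeps flowing.
type Priority = 'critical' | 'normal' | 'low';

function admit(priority: Priority, loadFactor: number): boolean {
  // loadFactor is current load divided by capacity.
  if (loadFactor < 0.8) return true;                // healthy: admit everything
  if (loadFactor < 1.0) return priority !== 'low';  // strained: shed low-priority
  return priority === 'critical';                   // saturated: critical only
}

console.log(admit('low', 0.9), admit('critical', 1.2));
```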
<h3 id="heading-queuing-requests">Queuing Requests</h3>
<p>One common method is to use a queue to store incoming requests and process them in a first-in, first-out (FIFO) order. When the queue becomes full, the system can stop accepting new requests and process the ones that are already in the queue.</p>
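<p>A bounded queue makes the "stop accepting when full" behaviour explicit. This sketch rejects new work instead of letting the backlog grow without limit:</p>

```typescript
// A bounded FIFO queue: work beyond the queue's capacity is rejected
// immediately instead of piling up unbounded.
class BoundedQueue<T> {
  private items: T[] = [];
  constructor(private readonly capacity: number) {}

  // Returns false (sheds the item) when the queue is full.
  enqueue(item: T): boolean {
    if (this.items.length >= this.capacity) return false;
    this.items.push(item);
    return true;
  }

  // First in, first out.
  dequeue(): T | undefined {
    return this.items.shift();
  }

  get size(): number {
    return this.items.length;
  }
}

// With capacity 2, the third request is shed; processing is FIFO.
const queue = new BoundedQueue<string>(2);
const accepted = [queue.enqueue('req-1'), queue.enqueue('req-2'), queue.enqueue('req-3')];
const first = queue.dequeue();
console.log(accepted, first);
```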
<h3 id="heading-load-balancing-requests">Load Balancing Requests</h3>
<p>Another method is to use a load balancer to distribute incoming requests evenly across multiple servers in the system. If one server becomes overloaded, the load balancer can redirect requests to other servers in the system to help alleviate the load.</p>
<h3 id="heading-artificial-intelligence-ai-algorithms">Artificial intelligence (AI) algorithms</h3>
<p>Load-shedding can also be implemented using artificial intelligence (AI) algorithms, such as machine learning (ML) algorithms. These algorithms can analyze incoming requests and determine which ones should be prioritized based on various factors, such as the importance of the request, the expected response time, and the current load on the system.</p>
<h2 id="heading-side-effects-of-load-shedding">Side Effects of Load Shedding</h2>
<p>While load-shedding can be an effective way to prevent a distributed system from crashing, it can also have negative consequences. For example, if the system is constantly shedding the load, <strong>it may not be able to meet the demands of its users.</strong> This can lead to frustration and may result in a loss of business or customer loyalty.</p>
<p>To mitigate these negative consequences, it is important to <strong>carefully monitor the system and implement load-shedding only when necessary.</strong> It is also important to <strong>have adequate capacity in the system to handle the expected workload</strong> and to <strong>have robust failover mechanisms</strong> in place to ensure that the system remains operational even in the event of a failure.</p>
<h2 id="heading-conclusion">Conclusion</h2>
<p>In distributed systems, load shedding is the deliberate reduction of resources or capability so that the whole system doesn't fail from being overloaded. This can mean turning off services, slowing down processes, or rerouting requests. The goal is to stop cascading failures, in which the failure of one part sets off a chain reaction of failures in other parts. For load shedding to work well, there need to be clear policies and procedures in place, along with the ability to monitor the system and spot problems before they become serious. The system should be built with enough capacity and failover mechanisms to ensure reliable operation, and load shedding should be done carefully and only when necessary.</p>
]]></content:encoded></item><item><title><![CDATA[Understanding The Difference Between LSM Tree and B-Tree]]></title><description><![CDATA[Let’s face it, data is a tricky thing to manage. All kinds of challenges arise when you attempt to store and organize data efficiently. In the world of databases, some structures are better suited than others for specific tasks. In this blog post, we...]]></description><link>https://blog.sofwancoder.com/understanding-the-difference-between-lsm-tree-and-b-tree</link><guid isPermaLink="true">https://blog.sofwancoder.com/understanding-the-difference-between-lsm-tree-and-b-tree</guid><category><![CDATA[Databases]]></category><category><![CDATA[data structures]]></category><category><![CDATA[data]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Sun, 18 Dec 2022 18:06:33 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1671385736958/xRgDNpQZw.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Let’s face it, data is a tricky thing to manage. All kinds of challenges arise when you attempt to store and organize data efficiently. In the world of databases, some structures are better suited than others for specific tasks. In this blog post, we’ll discuss two common tree-based database structures: LSM Tree and B-Tree. Both have their advantages, with LSM Trees being more commonly used in modern applications. Which one is right for you? Keep reading for more information!</p>
<h2 id="heading-what-is-an-lsm-tree">What is an LSM Tree?</h2>
<p>An LSM Tree is a data structure that’s commonly used in databases. It’s not a specific database, it’s a data structure that can be used in several different database types. LSM stands for Log-Structured Merge. This structure has a few great benefits: It’s fast. It’s durable. It’s efficient in both space and time.</p>
<h2 id="heading-what-is-a-b-tree">What is a B-Tree?</h2>
<p>A B-Tree is a specific type of data structure that stores data sorted by key so that it is easy to find and manage. B-Trees are commonly used in relational databases, such as MySQL and Oracle. A B-Tree keeps its contents ordered and rebalances itself to accommodate more data as it is added to the database. B-Trees are made up of nodes, where data is stored, and pointers, which are used to navigate between nodes.</p>
<h2 id="heading-whats-the-difference-between-an-lsm-tree-and-a-b-tree">What’s the difference between an LSM Tree and a B-Tree?</h2>
<p>There are some key differences between an LSM Tree and a B-Tree. The biggest difference is in how each structure stores data.</p>
<ul>
<li><p><strong>Write path: an LSM Tree buffers writes in memory (a memtable) and flushes them to disk sequentially</strong>, while <strong>a B-Tree updates fixed-size pages in place</strong>, wherever the affected key happens to live.</p>
</li>
<li><p><strong>On-disk layout: an LSM Tree keeps its data in multiple immutable sorted files</strong> (often called SSTables) that are merged in the background by compaction. <strong>A B-Tree keeps a single mutable tree of pages</strong> that is rebalanced as data is added and removed.</p>
</li>
<li><p><strong>Read path: an LSM Tree lookup may need to check the memtable and several sorted files</strong>, newest first, before finding a key. <strong>A B-Tree lookup follows one path from the root page down to a single leaf.</strong></p>
</li>
</ul>
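<p>To make the write and read paths concrete, here is a toy, in-memory LSM-style store. It is a deliberately simplified sketch: real engines add a write-ahead log, bloom filters, and background compaction:</p>

```typescript
// A toy LSM-style store: writes go to an in-memory "memtable"; when it
// fills, it is flushed as an immutable sorted run; reads check the
// memtable first, then runs from newest to oldest.
class ToyLsm {
  private memtable = new Map<string, string>();
  private runs: Array<Array<[string, string]>> = [];

  constructor(private readonly memtableLimit: number) {}

  put(key: string, value: string): void {
    this.memtable.set(key, value);
    if (this.memtable.size >= this.memtableLimit) this.flush();
  }

  get(key: string): string | undefined {
    if (this.memtable.has(key)) return this.memtable.get(key);
    // Newest run first, so recent values shadow older ones.
    for (let i = this.runs.length - 1; i >= 0; i--) {
      const hit = this.runs[i].find(([k]) => k === key);
      if (hit) return hit[1];
    }
    return undefined;
  }

  private flush(): void {
    // Runs are written sorted by key, like an SSTable.
    const sorted = [...this.memtable.entries()].sort(([a], [b]) => a.localeCompare(b));
    this.runs.push(sorted);
    this.memtable.clear();
  }
}

const store = new ToyLsm(2);
store.put('b', '1');
store.put('a', '2'); // triggers a flush: one sorted run on "disk"
store.put('a', '3'); // newer value shadows the flushed one
console.log(store.get('a'), store.get('b'));
```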
<h2 id="heading-when-should-you-use-an-lsm-tree">When should you use an LSM Tree?</h2>
<p>If you need to take in a substantial amount of data quickly, an LSM Tree is an excellent option to consider. Because writes are buffered in memory and flushed to disk sequentially, an LSM Tree can sustain very high write throughput. The trade-off is on the read side: a lookup may have to check several files before finding a key, so read latency is less predictable than with a B-Tree, and read-heavy workloads can suffer.</p>
<h2 id="heading-when-should-you-use-a-b-tree">When should you use a B-Tree?</h2>
<p>If your workload is read-heavy, a B-Tree is the way to go. Because every key lives in exactly one place, reads are fast and predictable, and the sorted layout makes range queries efficient, which is why B-Trees are the default index structure in relational databases such as MySQL and Oracle. B-Trees are also frequently employed to keep track of metadata, or data about data. Their weakness is heavy write traffic: every update modifies pages in place, which costs more random I/O than an LSM Tree's sequential writes.</p>
<h2 id="heading-final-words-which-one-is-best-for-you">Final words: Which one is best for you?</h2>
<p>Now that we've gone over the basics of both LSM Trees and B-Trees, you should have a good idea of what each data structure has to offer. An LSM Tree absorbs large volumes of writes quickly and efficiently; a B-Tree offers fast, predictable reads and ordered access. Think about what kind of data you want to store and how you will need to get to it. Use a B-Tree if your workload is dominated by reads and range queries; use an LSM Tree if you need to take in a lot of data quickly and efficiently.</p>
]]></content:encoded></item><item><title><![CDATA[Time-based One-Time Password (TOTP): What is it?]]></title><description><![CDATA[A time-based one-time password (TOTP) is a one-time password generated based on the current time and a shared secret key. This method of authentication is used in addition to a username/email and password for increased security. TOTP is used in situa...]]></description><link>https://blog.sofwancoder.com/time-based-one-time-password-totp-what-is-it</link><guid isPermaLink="true">https://blog.sofwancoder.com/time-based-one-time-password-totp-what-is-it</guid><category><![CDATA[Node.js]]></category><category><![CDATA[backend]]></category><category><![CDATA[authentication]]></category><category><![CDATA[Security]]></category><category><![CDATA[Programming Tips]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Mon, 28 Nov 2022 18:17:46 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1669659351988/P68a79Yy0.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>A time-based one-time password (TOTP) is a one-time password generated based on the current time and a shared secret key. This method of authentication is used in addition to a username/email and password for increased security. TOTP is used in situations where it is not feasible to use hardware-based tokens, such as when logging in from a public computer. TOTP is an open standard defined in <a target="_blank" href="https://tools.ietf.org/html/rfc6238">RFC6238</a>. This article will explain the basics you need to know about TOTP authentication as well as how to implement it in NodeJS applications. Let’s get started!</p>
<h2 id="heading-what-is-totp-authentication">What is TOTP Authentication?</h2>
<p>TOTP authentication is an authentication method that uses a time-based One-Time Password (OTP). It makes traditional login methods more secure because each generated code is only valid for a short window of time; a code entered outside that window is rejected. TOTP authentication is most frequently used for two-factor authentication (2FA), software logins, and remote employee access.</p>
<h2 id="heading-how-does-totp-authentication-work">How does TOTP Authentication work?</h2>
<p>A shared secret key and the current time are used to generate a new one-time-use password for TOTP authentication. The user then logs in or completes a login using this password (in the case of 2FA). Users are advised to enter the code as soon as it is generated because this password changes every 30 seconds. Users must enter a username, a code generated from the given time period, and —depending on the system requirement— a password in order to use TOTP authentication. In contrast to conventional login techniques, where users only need to remember their username and password, this requires more information from the user.</p>
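<p>Under the hood, this scheme is standardized in RFC 6238, which builds on RFC 4226's HOTP. As a rough sketch, using only Node's built-in <code>crypto</code> module (<code>generateTotp</code> is an invented helper name, not part of any library), the current time is bucketed into 30-second steps, HMAC-ed with the shared secret, and truncated to six digits:</p>
<pre><code class="lang-typescript">import { createHmac } from "crypto";

// Minimal sketch of the RFC 6238 TOTP computation (HMAC-SHA1,
// 30-second time step, 6 digits). generateTotp is a hypothetical name.
function generateTotp(secret: string, unixSeconds: number): string {
  // 1. Bucket the current time into 30-second steps
  const counter = Math.floor(unixSeconds / 30);

  // 2. Encode the step counter as an 8-byte big-endian buffer
  const msg = Buffer.alloc(8);
  msg.writeBigUInt64BE(BigInt(counter));

  // 3. HMAC the counter with the shared secret
  const hmac = createHmac("sha1", Buffer.from(secret, "ascii")).update(msg).digest();

  // 4. Dynamic truncation (RFC 4226): the low nibble of the last byte
  //    selects a 4-byte window; mask the sign bit, keep 6 decimal digits
  const offset = hmac[hmac.length - 1] % 16;             // same as (byte AND 0x0f)
  const binary = hmac.readUInt32BE(offset) % 0x80000000; // same as masking with 0x7fffffff
  return String(binary % 1000000).padStart(6, "0");
}
</code></pre>
<p>Checked against the RFC 4226 test vectors, the secret <code>12345678901234567890</code> at Unix time 59 (step counter 1) yields <code>287082</code>. Libraries like Speakeasy add base32 secret handling and a verification window on top of this core.</p>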
<h2 id="heading-why-is-totp-becoming-more-popular">Why is TOTP becoming more popular?</h2>
<p>Because of its high level of security, TOTP authentication is being used more frequently. In the traditional authentication technique, it is simpler for hackers to access numerous accounts because users just need one username and password. However, users must enter a code produced from the selected time period during TOTP authentication, which adds an additional degree of protection. This implies that in order to log into several accounts, a user would require access to various devices.</p>
<h2 id="heading-how-to-implement-totp-in-nodejs">How to implement TOTP in NodeJS?</h2>
<p>There are a few things we need to do in order to implement TOTP authentication in NodeJS. In this section, we're going to use a package called <a target="_blank" href="https://github.com/speakeasyjs/speakeasy">speakeasy</a>. Speakeasy stands out from the other 2FA projects on GitHub because of how actively it is maintained. To keep things simple, we're going to experiment with this package in a fresh project.</p>
<h3 id="heading-install-dependencies">Install Dependencies</h3>
<p><strong>Execute the following commands</strong> to start a new NodeJS project and install the Speakeasy package. This also installs Express.js along with the package required for parsing POST request payloads.</p>
<pre><code class="lang-typescript">npm init -y
npm install express body-parser speakeasy
npm install -D typescript <span class="hljs-meta">@types</span>/express <span class="hljs-meta">@types</span>/speakeasy
</code></pre>
<p>Ensure that an <code>app.ts</code> file is created in the current project directory and that it contains the boilerplate content listed below.</p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> express, { Request, Response, NextFunction } <span class="hljs-keyword">from</span> <span class="hljs-string">"express"</span>;
<span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> bodyParser <span class="hljs-keyword">from</span> <span class="hljs-string">"body-parser"</span>;
<span class="hljs-keyword">import</span> { generateSecret, totp } <span class="hljs-keyword">from</span> <span class="hljs-string">'speakeasy'</span>;

<span class="hljs-keyword">export</span> <span class="hljs-keyword">const</span> app = express();

app.use(bodyParser.json());
app.use(bodyParser.urlencoded({ extended: <span class="hljs-literal">true</span> }));

<span class="hljs-comment">// generate a secret token to be saved in an application like Google Authenticator</span>
app.post(<span class="hljs-string">"/generate-secret"</span>, <span class="hljs-function">(<span class="hljs-params">request: Request, response: Response, next: NextFunction</span>) =&gt;</span> { });

<span class="hljs-comment">// validate that the TOTP is valid for a given secret and is not expired</span>
app.post(<span class="hljs-string">"/validate-token"</span>, <span class="hljs-function">(<span class="hljs-params">request: Request, response: Response, next: NextFunction</span>) =&gt;</span> { });

app.listen(<span class="hljs-number">5000</span>, <span class="hljs-function">() =&gt;</span> {
    <span class="hljs-built_in">console</span>.log(<span class="hljs-string">"Listening at :5000"</span>);
});
</code></pre>
<h3 id="heading-generate-totp-secret">Generate TOTP Secret</h3>
<p><strong>First, we need to generate a shared secret key.</strong> This key will be used by the authenticator app to generate the one-time-use passwords. The generated secret can also be turned into a QR code that an authenticator app can scan.</p>
<pre><code class="lang-typescript">app.post("/generate-secret", (request: Request, response: Response, next: NextFunction) =&gt; {
    const { otpauth_url, base32 } = generateSecret({ length: 20 });
    // How you store the generated secret varies by implementation
    saveSecretToDB(request.userId, base32);
    response.send({ "secret": base32 });
});
</code></pre>
<p><strong>—or— Generating a QR code</strong></p>
<p>By manually entering a key or scanning a QR code, users can add an account to applications like Google Authenticator. The latter is common and significantly quicker. We use the <a target="_blank" href="https://www.npmjs.com/package/qrcode">QRcode</a> library to produce QR images. We can install it by running <code>npm install qrcode @types/qrcode</code></p>
<pre><code class="lang-typescript"><span class="hljs-keyword">import</span> * <span class="hljs-keyword">as</span> QRCode <span class="hljs-keyword">from</span> <span class="hljs-string">'qrcode'</span>;

app.post(<span class="hljs-string">"/generate-secret"</span>, <span class="hljs-function">(<span class="hljs-params">request: Request, response: Response, next: NextFunction</span>) =&gt;</span> {
    <span class="hljs-keyword">const</span> {otpauth_url, base32} = generateSecret({ length: <span class="hljs-number">20</span> });
    <span class="hljs-comment">// How you store generated secret key varies by implementation</span>
    saveSecretToDB(request.userId, base32);
    QRCode.toFileStream(response, otpauth_url);
});
</code></pre>
<h3 id="heading-validate-totp-secret">Validate TOTP Secret</h3>
<p><strong>To validate a code provided by the user</strong> —which is generated by the authenticator app— we verify it against the stored secret:</p>
<pre><code class="lang-typescript">app.post("/validate-token", (request: Request, response: Response, next: NextFunction) =&gt; {
    // How you get the secret for first-time activation depends on the implementation
    const secret = request.body.secret || findSecretFromDB(request.userId);
    const isValid = totp.verify({
        secret: secret,
        encoding: "base32",
        token: request.body.token,
        window: 0
    });
    // Depending on your implementation, the secret is probably already saved
    saveSecret(secret);
    response.send({ isValid });
});
</code></pre>
<h2 id="heading-conclusion">Conclusion</h2>
<p>TOTP authentication is an extra layer of security used in two-factor login methods. It creates a one-time-use password using a shared secret key and a predetermined amount of time. Due to its high level of security and simplicity of use, TOTP authentication is becoming more and more popular as an additional security measure.</p>
]]></content:encoded></item><item><title><![CDATA[Idempotency In APIs: Planning for Uncertainty]]></title><description><![CDATA[Idempotency is a property that can be applied to operations, algorithms, and code. In software engineering, it refers to the ability of an operation to be performed multiple times on the same input without resulting in an unnatural state. An idempote...]]></description><link>https://blog.sofwancoder.com/idempotency-in-apis-planning-for-uncertainty</link><guid isPermaLink="true">https://blog.sofwancoder.com/idempotency-in-apis-planning-for-uncertainty</guid><category><![CDATA[idempotence]]></category><category><![CDATA[APIs]]></category><category><![CDATA[backend]]></category><category><![CDATA[Software Engineering]]></category><category><![CDATA[Frontend Development]]></category><dc:creator><![CDATA[Sofwan A. Lawal]]></dc:creator><pubDate>Mon, 07 Nov 2022 11:51:18 GMT</pubDate><enclosure url="https://cdn.hashnode.com/res/hashnode/image/upload/v1667821778117/V-GPVFoYS.png" length="0" type="image/jpeg"/><content:encoded><![CDATA[<p>Idempotency is a property that can be applied to operations, algorithms, and code. In software engineering, it refers to the ability of an operation to be performed multiple times on the same input without resulting in an unnatural state. An idempotent operation is one that can be invoked any number of times without changing the result. The opposite of idempotency is non-idempotency, which occurs when the result changes with every call. This blog post explains why you should care about idempotency and provides examples of how you can design your APIs to make them more idempotent.</p>
<h2 id="heading-introduction">Introduction</h2>
<p>Idempotency is a property of operations that can be performed repeatedly without changing the result beyond the initial application. Idempotent operations can be repeated multiple times and have the same effect as if they had been performed once. With non-idempotent APIs, it is difficult to handle errors and network uncertainties, which cause clients to resend requests that the server has already handled successfully.</p>
<p><strong>Consider a backend application which receives a debit order from a client (let's keep it simple):</strong></p>
<pre><code class="lang-typescript">import express, { Request, Response } from "express";
import * as bodyParser from "body-parser";

const app = express();
app.use(bodyParser.urlencoded({ extended: true }));
app.use(bodyParser.json());

let orders = [];
let balance = 500;

// Defining the process-order endpoint to handle the request
app.post('/process-order', (req: Request, res: Response) =&gt; {
  if (balance &lt; req.body.amount) {
    return res.status(400).json({ message: 'Insufficient wallet balance!' });
  }

  balance = balance - Number(req.body.amount);

  let newOrder = {
    id: Math.floor(Date.now() + Math.random()), // pseudo-random number to act as ID
    item: req.body.item,
    amount: req.body.amount,
  };

  // add it to the orders database
  orders.push(newOrder);

  const response = {
    message: `Order placed successfully!`,
  };

  return res.status(201).json(response);
});

// Finally, start the server on a port
app.listen(3000, () =&gt; console.log('Server running!'));
</code></pre>
<p>Now to the interesting part. <strong>Consider a request sent to the backend application containing a debit order from your wallet:</strong></p>
<pre><code class="lang-typescript">async function sendOrder() {
  return await axios.post("https://server.url/process-order", {
    amount: 100,
    item: "lord-of-the-rings"
  });
}

const response = await sendOrder(); // response.status is 201

// Assuming there was a network error on the client side, retrying this
// request will result in a second debit
const retried = await sendOrder(); // response.status is 201 again

// If the request is tried one more time, we get yet another redundant
// debit on our user's wallet
const retriedAgain = await sendOrder(); // response.status is 201 again
</code></pre>
<p>The performance implications of idempotency come from not having to execute an operation more than once when it doesn't change the state of the system. You only need one request to trigger an action, not a second one with the same parameters, and when something goes wrong on the first try, a retry does not introduce duplicate side effects.</p>
<h2 id="heading-idempotency-by-examples">Idempotency by examples</h2>
<p>Idempotency is an important property for APIs because it allows them to be invoked multiple times with the same input without failing or producing unpredictable results. Idempotency can be applied at the service endpoint, inputs and outputs of data transformations, and individual request-response interactions within services.</p>
<p>We can implement an idempotency key in any way we deem fit; common approaches are:</p>
<ul>
<li><p>Adding the idempotent key in the body:</p>
<pre><code class="lang-typescript">// request body
{
  amount: 100,
  item: 'item-key',
  idempotentKey: 'unique-key-id'
}
</code></pre>
</li>
<li><p>Adding the idempotent key in the URL query parameters:</p>
<pre><code class="lang-typescript">// request url
const url = `https://server.url/endpoint?idempotentKey=unique-key-id`;
</code></pre>
</li>
<li><p>Adding the idempotent key in a header (most recommended): <code>X-Idempotent-Key: some-unique-key-id</code></p>
</li>
</ul>
<p>Now consider the backend application from the example above, redesigned to support idempotency. We'll pass the idempotent key in the <code>X-Idempotent-Key</code> header.</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// To keep it simple, in real-world application, you'll setup an external cache server like redis</span>
<span class="hljs-comment">// or any distributed caching strategy you find fit for your application</span>
<span class="hljs-keyword">const</span> cache: Record&lt;<span class="hljs-built_in">string</span>, OrderItem|<span class="hljs-built_in">boolean</span>&gt; = {}

<span class="hljs-comment">// This middleware intercepts every requests going to the process-order endpoints</span>
app.use(<span class="hljs-string">'/process-order'</span>, (req: Request, res: Response, next: NextFunction) =&gt; {
  <span class="hljs-comment">// fetching the key from the header when it exists</span>
  <span class="hljs-keyword">const</span> idempotentKey = req.headers[<span class="hljs-string">'x-idempotent-key'</span>] 

  <span class="hljs-keyword">if</span> (!idempotentKey) {
   <span class="hljs-comment">// proceed to handle the request because the request is not idempotent;</span>
    <span class="hljs-keyword">return</span> next() 
  }

  <span class="hljs-keyword">const</span> processedOrder = cache[idempotentKey];

  <span class="hljs-keyword">if</span> (!processedOrder) {
    <span class="hljs-keyword">return</span> next() <span class="hljs-comment">// Proceed to processing because the request was not previously processed</span>
  }

  <span class="hljs-comment">// Here, the request has been handled already</span>
  <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">200</span>).json({
    message: <span class="hljs-string">`Order processed successfully!`</span>,
  });
})
</code></pre>
<p>The implementation above, which depends on a cache, includes a flaw that should be considered. Using a non-consistent cache for idempotency introduces a consistency bug, because there is no atomicity guarantee between the cache and the "main database" (whatever that may be). Depending on the precise timing of the idempotence information update, you can:</p>
<ul>
<li><p>end up with duplicate operations if you update the idempotence DB too late (or not at all, if the process fails or encounters an error!)</p>
</li>
<li><p>fail to register some operations because you update the idempotence DB too soon and crash, and when the client tries again, you act as though the change has already been done but in fact, it hasn't because it crashed the first time!</p>
</li>
</ul>
<p><strong>To solve this</strong> I recommend relying on the actual main database to decide when the operation fails or succeeds. The implementation above can be re-written as</p>
<pre><code class="lang-typescript">app.use(<span class="hljs-string">'/process-order'</span>, (req: Request, res: Response, next: NextFunction) {
  <span class="hljs-comment">// fetching the key from the header when it exists</span>
  <span class="hljs-keyword">const</span> idempotentKey = req.headers[<span class="hljs-string">'x-idempotent-key'</span>] 

  <span class="hljs-keyword">if</span> (!idempotentKey) {
   <span class="hljs-comment">// proceed to handle the request because the request is not idempotent;</span>
    <span class="hljs-keyword">return</span> next() 
  }

  <span class="hljs-comment">// Relying on the actual data-source to figure out if the operation has been completed before</span>
  <span class="hljs-comment">// This operation usually involves queries from the database</span>
  <span class="hljs-keyword">const</span> processedOrder = orders.find(<span class="hljs-function"><span class="hljs-params">order</span> =&gt;</span> order.requestId === idempotentKey);

  <span class="hljs-keyword">if</span> (!processedOrder) {
    <span class="hljs-keyword">return</span> next() <span class="hljs-comment">// Proceed to processing because the request was not previously processed</span>
  }

  <span class="hljs-comment">// Here, the request has been handled already</span>
  <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">200</span>).json({
    message: <span class="hljs-string">`Order processed successfully!`</span>,
  });
})
</code></pre>
<p>The actual process-order endpoint can be written to look like this implementation</p>
<pre><code class="lang-typescript"><span class="hljs-comment">// Normal processing</span>
app.post(<span class="hljs-string">'/process-order'</span>, <span class="hljs-function">(<span class="hljs-params">req: Request, res: Response</span>) =&gt;</span> {
  <span class="hljs-keyword">if</span> (balance &lt; req.body.amount) {
    <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">400</span>).json({ message: <span class="hljs-string">'Insufficient wallet balance!'</span> });
  }

  balance = balance - <span class="hljs-built_in">Number</span>(req.body.amount);

  <span class="hljs-keyword">const</span> idempotentKey = req.headers[<span class="hljs-string">'x-idempotent-key'</span>]

  <span class="hljs-keyword">let</span> newOrder = {
    id: <span class="hljs-built_in">Math</span>.floor(<span class="hljs-built_in">Date</span>.now() + <span class="hljs-built_in">Math</span>.random()), <span class="hljs-comment">// pseudo-random number to act as ID</span>
    item: req.body.item,
    amount: req.body.amount,
    requestId: idempotentKey ?? <span class="hljs-literal">null</span> <span class="hljs-comment">// Added the idempotentKey as requestId</span>
  };

  <span class="hljs-comment">//add it to orders database</span>
  orders.push(newOrder);

  <span class="hljs-comment">// If you employ a cache, but this has its considerations and issues</span>
  <span class="hljs-comment">// cache[idempotentKey] = true;</span>

  <span class="hljs-keyword">const</span> response = {
    message: <span class="hljs-string">`Order placed successfully!`</span>,
  };

  <span class="hljs-keyword">return</span> res.status(<span class="hljs-number">201</span>).json(response);
});
</code></pre>
<p>Now, thinking "idempotently 😂", let's re-write the earlier client implementation to be idempotent:</p>
<pre><code class="lang-typescript">// Probably persisted in SessionStorage/LocalStorage or component memory.
// This key is attached to each unique action and replaced when a new
// action needs to be triggered
let idempotentKey = 'some-unique-key';

async function sendOrder(idempotentKey: string) {
  return await axios.post("https://server.url/process-order", {
    amount: 100,
    item: "lord-of-the-rings"
  }, {
    headers: { "X-Idempotent-Key": idempotentKey }
  });
}

const response = await sendOrder(idempotentKey); // response.status is 201
</code></pre>
<p>Subsequent requests will return <code>200</code> without causing multiple debits or redundant orders: <code>const response = await sendOrder(idempotentKey); // response.status is 200</code></p>
<p>Trying again results in no state change: <code>const response = await sendOrder(idempotentKey); // response.status is 200 again</code></p>
<p>This has important performance implications for applications, so it’s worth thinking about how to design idempotent APIs from the start.</p>
<h2 id="heading-idempotent-and-high-performance-apis">Idempotent and High-Performance APIs</h2>
<p>APIs that don’t guarantee idempotency can lead to very slow applications. Imagine an e-commerce website where users are allowed to place orders. If an order is placed in error, you might want to cancel it, and to cancel it you need to identify it. If the order ID is non-idempotent, the client cannot simply reuse an identifier it already holds: it has to execute a whole chain of lookups (fetch the order total, the payment method used to place the order, and so on) before it can find the order it wants to cancel. Each of these operations involves communicating with the backend services, and each of them has the potential to fail due to network issues or other problems. When something does fail along the way, the whole chain of operations has to be retried, possibly even more slowly than before.</p>
<h2 id="heading-api-responses-without-idempotency">API Responses without Idempotency</h2>
<p>An example of an API response that is not idempotent is one that includes a unique ID for each resource that is returned. This is the approach you’ll find in many database APIs, where you get an ID as part of the result. These IDs are not suitable as unique identifiers since they can change each time you retry an operation. IDs generated based on a combination of the current time and the internal state of the server are also not suitable as unique identifiers. These time-based IDs are bound to change on each retry, making them non-idempotent. Any other approach that relies on some internal server state that changes with each request is not idempotent.</p>
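<p>The contrast can be sketched in a few lines. This is a toy illustration; <code>deriveOrderId</code> is an invented helper, not part of the article's API. An ID based on the clock changes on every retry, while an ID derived deterministically from the client's idempotent key does not:</p>
<pre><code class="lang-typescript">import { createHash } from "crypto";

// Non-idempotent: depends on the clock and random state, so every retry
// produces a different ID (this mirrors the Math.floor(Date.now() + ...)
// pattern used earlier in the article)
function timeBasedId(): string {
  return String(Math.floor(Date.now() + Math.random()));
}

// Idempotent: derived purely from the client-supplied idempotent key, so
// recomputing it on a retry yields the exact same ID
function deriveOrderId(idempotentKey: string): string {
  return createHash("sha256").update("order:" + idempotentKey).digest("hex").slice(0, 12);
}
</code></pre>
<p>With an ID like <code>deriveOrderId</code> produces, a retried request maps onto the same order row, so a uniqueness constraint on the ID alone is enough to reject the duplicate.</p>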
<h2 id="heading-the-problem-with-non-idempotent-apis">The problem with non-idempotent APIs</h2>
<p>While the examples above are easy to identify, real-world APIs often break idempotency in ways that are harder to spot. What if the response to an order details request also includes the order ID? It may seem like you now have everything you need in one call. But as long as the order ID is tied to the current state of the server, it’s not idempotent. It’s still possible that the first attempt at placing the order fails due to misconfiguration or network issues, and you need to retry the order details request. What if the order ID is stored as a global variable on the server side? This isn’t idempotent either, as it will be reset as soon as the order is placed.</p>
<h2 id="heading-how-to-make-your-api-idempotent">How to make your API idempotent</h2>
<p>To make your API idempotent, you need to identify the state that might change between retries and make it immutable. Data that can't change between retries is suitable for storing the unique identifiers, and such data can be shared across multiple requests without needing to be duplicated. If your service runs on several nodes, that state has to live in storage all of them share. One thing to consider: never use any kind of separate database for idempotence, unless you can make this separate DB consistent with the main DB in one transactional context (which is slow and complex).</p>
<p>Hence, avoid implementing idempotence as a bolt-on solution using an external DB for which you cannot guarantee the consistency of writes relative to where the business entities live. Usually, this means you need one of:</p>
<ul>
<li><p><strong>Atomicity</strong> - you must write the operation ID together with the changes that it introduces, so that you can rely on the equivalence: operation_ID_is_present === the_operation_was_already_applied. This usually means DB transactions, sometimes even distributed transactions (JTA/XA, but don't go down that rabbit hole).</p>
</li>
<li><p><strong>Natural idempotence</strong> - you don't rely on any external operation identifier, but instead deduplicate the essence of the command. This is tricky, for example you might not be able to have an "Add to cart" command, but instead "Make it so that there is 1 of this item in the cart". Then, deduplicating commands is a matter of seeing: is there currently 1 unit of this item in the cart? If so, do nothing.</p>
</li>
</ul>
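<p>The "natural idempotence" point is easiest to see in code. A hypothetical sketch, with names invented for illustration: an "add to cart" command double-counts when replayed, while "set the quantity of this item to N" can be replayed any number of times:</p>
<pre><code class="lang-typescript">// A command-style mutation: replaying it changes the result every time,
// so it is NOT naturally idempotent
function addToCart(cart: { [item: string]: number }, item: string, qty: number): void {
  cart[item] = (cart[item] ?? 0) + qty;
}

// A state-style mutation: "make it so there is exactly qty of this item".
// Replaying it leaves the cart unchanged, so it IS naturally idempotent
function setCartQuantity(cart: { [item: string]: number }, item: string, qty: number): void {
  cart[item] = qty;
}
</code></pre>
<p>With the second form, no operation ID or deduplication table is needed; the deduplication check is simply "is the state already what the command asks for?".</p>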
<h2 id="heading-conclusion">Conclusion</h2>
<p>APIs that are designed for high performance from the start are easier to develop and maintain. They are less likely to break in unexpected ways, and they are easier to optimise. This includes making sure they are idempotent. An API that is not idempotent may work just fine in testing, but once it’s in production, it’s bound to have serious performance issues. The key to idempotency is identifying the state that can change in each retry and making it immutable. Data that can’t change between retries is suitable for storing the unique identifiers. Furthermore, such data can be shared across multiple requests without needing to be duplicated. This way, you can reduce the number of RPC calls significantly, which saves on network overhead, as well as processing resources. It can also make your system more robust by reducing the number of points of failure.</p>
]]></content:encoded></item></channel></rss>