Microservices-based architectures bring their own set of challenges. With so many services interacting with each other, increased complexity and the risk of failures, including cascading failures, are inevitable. To address these challenges, we need ways to isolate risky components and prevent their failures from propagating throughout the system. In this article, we will explore the circuit breaker pattern in microservices architecture and see how it can help you deal with faults and failures.
Introduction
When designing an enterprise microservices architecture, one of the biggest concerns is how to manage failure in a distributed system. The software industry has seen several examples of large-scale failures, such as the Microsoft Azure outage and the AWS S3 outage in 2017. Both cloud outages had a huge impact on many businesses because they were so widespread. With the microservices architecture pattern, you build your applications from small services that each have a single responsibility and are combined to create larger capabilities. When building these smaller services, it's important to implement resiliency measures so that they remain online when they encounter errors or unexpected conditions.
What is a Circuit Breaker?
When things go wrong, we must have some contingency in place. Otherwise, our services will keep failing and we’ll end up with no services at all. This is where circuit breakers come into play.
A Circuit Breaker is a fault-tolerance pattern that's used to handle transient errors and prevent cascading failures. In other words, it's a mechanism to stop the propagation of errors by shutting things down gracefully.
In Distributed Systems
In distributed systems, a circuit breaker can be implemented to stop the flow of requests to a service whose error rate or latency has exceeded a configured threshold. The pattern is widely used in distributed systems to improve their reliability and availability. It is implemented as a monitoring mechanism that detects faults and then decides whether action is needed to prevent the faulty components from affecting the system as a whole.
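To make the idea of a threshold concrete, here is a minimal sketch of the decision a breaker has to make. The `CallSample` and `TripThresholds` shapes, the `shouldTrip` function, and the numbers used are all hypothetical names and values chosen for illustration, not part of any particular library:

```typescript
// Hypothetical shape: one entry per completed call to the downstream service.
interface CallSample {
  ok: boolean; // true if the call succeeded
  latencyMs: number; // how long the call took
}

// Hypothetical thresholds; real values depend on the service being protected.
interface TripThresholds {
  maxErrorRate: number; // e.g. 0.5 trips the breaker when half or more of the calls fail
  maxAvgLatencyMs: number; // trips the breaker when the mean latency is above this
  minSamples: number; // do not judge the service on too little data
}

// Returns true when the recent samples suggest the breaker should open.
function shouldTrip(samples: CallSample[], thresholds: TripThresholds): boolean {
  if (samples.length < thresholds.minSamples) {
    return false;
  }
  const failures = samples.filter((sample) => !sample.ok).length;
  const errorRate = failures / samples.length;
  const totalLatency = samples.reduce((sum, sample) => sum + sample.latencyMs, 0);
  const avgLatency = totalLatency / samples.length;
  return errorRate >= thresholds.maxErrorRate || avgLatency > thresholds.maxAvgLatencyMs;
}
```

A real breaker would usually feed this kind of check from a sliding window of recent calls rather than from the full history.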
Why use the circuit breaker pattern?
There are several benefits to using the Circuit Breaker pattern in a microservice architecture:
Improved resilience and reliability: By automatically failing requests to downstream services that are not responding or experiencing high latency, the circuit breaker helps to prevent cascading failures and improve the overall resilience and reliability of the system.
Increased availability: By failing fast and stopping the chain reaction of failures, the circuit breaker helps to ensure that the system remains available and can continue to serve user requests.
Reduced resource consumption: When a downstream service is experiencing problems, the circuit breaker can help to reduce the load on the service by failing requests before they reach the service. This can help to reduce the resource consumption of the service and prevent it from becoming overloaded.
Enhanced monitoring and visibility: The circuit breaker can provide useful information about the health of downstream services, allowing the system to be monitored and any issues to be identified and addressed quickly.
Overall, the Circuit Breaker pattern is an important tool for improving the resilience and reliability of microservice architectures and helping to ensure that the system remains available and responsive to user requests.
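As a rough illustration of the monitoring benefit, a breaker can report its state changes and rejected calls to a small recorder that a dashboard or health endpoint reads from. The `BreakerMonitor` class and its method names below are hypothetical and only sketch the idea:

```typescript
type BreakerState = 'closed' | 'open' | 'half-open';

interface StateChange {
  from: BreakerState;
  to: BreakerState;
  at: number; // timestamp in milliseconds
}

// Hypothetical recorder that a circuit breaker could notify on every transition.
class BreakerMonitor {
  private changes: StateChange[] = [];
  private rejected = 0;

  // Called by the breaker whenever it moves from one state to another.
  recordStateChange(from: BreakerState, to: BreakerState, at: number = Date.now()): void {
    this.changes.push({ from, to, at });
  }

  // Called by the breaker whenever it fails a request without calling the service.
  recordRejected(): void {
    this.rejected++;
  }

  // Summary suitable for a health endpoint or a metrics exporter.
  snapshot(): { currentState: BreakerState; timesOpened: number; rejectedCalls: number } {
    const last = this.changes[this.changes.length - 1];
    return {
      currentState: last ? last.to : 'closed',
      timesOpened: this.changes.filter((change) => change.to === 'open').length,
      rejectedCalls: this.rejected,
    };
  }
}
```

How often a breaker opens, and how many calls it rejects while open, are often the earliest signals that a downstream service is in trouble.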
Why is it needed in a microservices architecture?
A microservice architecture consists of a large number of microservices that are built for specific tasks and can be reused across different applications. Because these microservices are independent, they can be deployed and scaled independently, allowing your organization to meet changing business requirements.
The Problem
This architecture is highly distributed and is therefore susceptible to a wide variety of faults, such as latency issues, outages, or unbalanced loads. When a problem occurs in one of these services, it can quickly escalate and affect the other microservices that depend on it. If a service fails, its callers receive errors or wait until they time out, and those failures can travel through the entire system and create a chain reaction.
The Solution
Circuit breakers can be used to stop this propagation of errors by shutting things down gracefully. When a circuit breaker is activated, it prevents requests from reaching a faulty microservice, preventing the error from cascading through the system. With circuit breakers, you can ensure that your system remains stable even in the event of an unexpected incident.
Implementing a Circuit Breaker in Microservices
One way to implement a circuit breaker is to use a state machine that tracks the health of the downstream service. The state machine has three states:
closed: Allow all requests
open: Fail all requests
half-open: Allow some requests
When the circuit breaker is in the closed state, requests to the downstream service are allowed to pass through. If the downstream service starts to experience failures or high latency, the circuit breaker transitions to the open state and begins to fail requests to the downstream service.
After a certain amount of time has passed, the circuit breaker transitions to the half-open state and allows a limited number of requests to pass through. If these requests are successful, the circuit breaker transitions back to the closed state. If the requests fail, the circuit breaker transitions back to the open state.
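The transitions described above can be written down as a small pure function. This is only a sketch: the `CircuitState` and `BreakerEvent` names and the event labels are made up for illustration, and "trial-succeeded" here stands for "the limited set of trial requests succeeded":

```typescript
type CircuitState = 'closed' | 'open' | 'half-open';

// Hypothetical event names describing what the breaker has just observed.
type BreakerEvent =
  | 'failure-threshold-reached'
  | 'reset-timeout-elapsed'
  | 'trial-succeeded'
  | 'trial-failed';

// Returns the state the breaker should be in after the given event.
function nextState(state: CircuitState, event: BreakerEvent): CircuitState {
  switch (state) {
    case 'closed':
      // Too many failures or too much latency: start failing requests.
      return event === 'failure-threshold-reached' ? 'open' : 'closed';
    case 'open':
      // After the wait period, let a limited number of requests through.
      return event === 'reset-timeout-elapsed' ? 'half-open' : 'open';
    case 'half-open':
      if (event === 'trial-succeeded') return 'closed';
      if (event === 'trial-failed') return 'open';
      return 'half-open';
  }
}
```

Keeping the transition rules separate from the code that makes network calls makes them easy to test on their own.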
Here is an example of how you might implement the circuit breaker in a Node.js application using the Express framework and TypeScript:
First, we will create a simple function that represents a downstream service that we want to protect with a circuit breaker. This function will make an HTTP request to a mock service and return the response data.
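import axios from 'axios';

// Represents the downstream service call that the circuit breaker will protect
async function callDownstreamService(): Promise<string> {
  try {
    const response = await axios.get<string>('http://mock-service.com/data');
    return response.data;
  } catch (error) {
    // `error` is `unknown` under strict TypeScript settings, so narrow it before reading `.message`
    throw new Error(error instanceof Error ? error.message : String(error));
  }
}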
Next, we will create a circuit breaker class that will wrap the call to the downstream service. This class will use a state machine to track the health of the downstream service and automatically fail requests if necessary:
// Assumes callDownstreamService() from the previous snippet is defined in this module.

// Enum to track the state of the circuit breaker
enum CircuitBreakerState {
  CLOSED, // Requests to the downstream service are allowed through
  OPEN, // Requests fail immediately without reaching the downstream service
  HALF_OPEN, // A limited number of trial requests are allowed through
}

export class CircuitBreaker {
  // The current state of the circuit breaker
  private state: CircuitBreakerState = CircuitBreakerState.CLOSED;
  // The number of consecutive errors seen while the circuit is closed
  private errorCount = 0;
  // The number of trial requests started since entering the half-open state
  private halfOpenAttempts = 0;
  // The number of trial requests that have succeeded in the half-open state
  private halfOpenSuccesses = 0;
  // The time when the circuit breaker last changed state
  private lastStateChange = Date.now();

  constructor(
    // The number of consecutive errors before the circuit transitions to the open state
    private readonly errorThreshold: number,
    // The time in milliseconds the circuit stays open before transitioning to the half-open state
    private readonly resetTimeout: number,
    // The number of trial requests allowed through in the half-open state
    private readonly halfOpenRequests: number,
  ) {}

  // Method to call the downstream service using the circuit breaker
  async callService(): Promise<string> {
    if (this.state === CircuitBreakerState.OPEN) {
      if (!this.isResetTimeoutExpired()) {
        // Fail fast. This rejection is not counted as a downstream error,
        // otherwise it would keep restarting the reset timeout.
        throw new Error('Circuit is open');
      }
      // Reset timeout has expired, so start letting trial requests through
      this.transitionTo(CircuitBreakerState.HALF_OPEN);
    }

    if (this.state === CircuitBreakerState.HALF_OPEN) {
      if (this.halfOpenAttempts >= this.halfOpenRequests) {
        // The trial quota is used up, so reject until the trials have settled
        throw new Error('Circuit is open');
      }
      this.halfOpenAttempts++;
    }

    try {
      const response = await callDownstreamService();
      this.onSuccess();
      return response;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  // Method to record a successful call to the downstream service
  private onSuccess(): void {
    if (this.state === CircuitBreakerState.HALF_OPEN) {
      this.halfOpenSuccesses++;
      // Close the circuit once every allowed trial request has succeeded
      if (this.halfOpenSuccesses >= this.halfOpenRequests) {
        this.transitionTo(CircuitBreakerState.CLOSED);
      }
    } else if (this.state === CircuitBreakerState.CLOSED) {
      // A success ends the current run of consecutive errors
      this.errorCount = 0;
    }
  }

  // Method to record a failed call to the downstream service
  private onFailure(): void {
    if (this.state === CircuitBreakerState.OPEN) {
      // A late failure from a request that started before the circuit opened
      return;
    }
    if (this.state === CircuitBreakerState.HALF_OPEN) {
      // Any failed trial request sends the circuit back to the open state
      this.transitionTo(CircuitBreakerState.OPEN);
      return;
    }
    this.errorCount++;
    if (this.errorCount >= this.errorThreshold) {
      this.transitionTo(CircuitBreakerState.OPEN);
    }
  }

  // Method to check if the reset timeout has expired
  private isResetTimeoutExpired(): boolean {
    return Date.now() - this.lastStateChange >= this.resetTimeout;
  }

  // Method to change state, record when it happened, and clear the counters
  private transitionTo(state: CircuitBreakerState): void {
    this.state = state;
    this.lastStateChange = Date.now();
    this.errorCount = 0;
    this.halfOpenAttempts = 0;
    this.halfOpenSuccesses = 0;
  }
}
Finally, we can use the CircuitBreaker class in an Express route handler to protect the call to the downstream service:
import express from 'express';
import { CircuitBreaker } from './circuit-breaker';

const app = express();

// Create a new circuit breaker with an error threshold of 5, a reset timeout of 10 seconds, and allowing 2 requests through in the half-open state
const circuitBreaker = new CircuitBreaker(5, 10000, 2);

app.get('/data', async (req, res) => {
  try {
    // Call the downstream service using the circuit breaker
    const data = await circuitBreaker.callService();
    // Send the response data back to the client
    res.send(data);
  } catch (error) {
    // The call failed or the circuit is open, so send a 500 error back to the client
    const message = error instanceof Error ? error.message : 'Unknown error';
    res.status(500).send(message);
  }
});

app.listen(3000, () => {
  console.log('Server listening on port 3000');
});
This example creates a circuit breaker with an error threshold of 5, a reset timeout of 10 seconds, and allows 2 requests through in the half-open state. If the downstream service fails 5 times in a row, the circuit breaker will transition to the open state and start failing requests. After 10 seconds have passed, the circuit breaker will transition to the half-open state and allow 2 requests through. If these requests are successful, the circuit breaker will transition back to the closed state. If they fail, the circuit breaker will transition back to the open state.
Using a circuit breaker can help to improve the resilience and reliability of a microservice architecture by providing a mechanism to fail fast and prevent cascading failures. It is important to tune the circuit breaker's parameters, such as the time to stay in the open state and the number of requests to allow through in the half-open state, to ensure that it is effective in protecting the system without causing undue disruption.
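When tuning these parameters, it can help to give them names and check them up front rather than passing bare numbers around. The following is a minimal sketch under that assumption; `CircuitBreakerOptions`, `resetTimeoutMs`, and `validateOptions` are hypothetical names, not something the class above requires:

```typescript
// Hypothetical named options mirroring the three tunable parameters.
interface CircuitBreakerOptions {
  errorThreshold: number; // consecutive errors before the circuit opens
  resetTimeoutMs: number; // how long the circuit stays open, in milliseconds
  halfOpenRequests: number; // trial requests allowed in the half-open state
}

// True for whole numbers of 1 or more.
function isPositiveInteger(value: number): boolean {
  return isFinite(value) && value >= 1 && Math.floor(value) === value;
}

// Returns a list of problems; an empty list means the options look usable.
function validateOptions(options: CircuitBreakerOptions): string[] {
  const problems: string[] = [];
  if (!isPositiveInteger(options.errorThreshold)) {
    problems.push('errorThreshold must be a whole number of at least 1');
  }
  if (!isFinite(options.resetTimeoutMs) || options.resetTimeoutMs <= 0) {
    problems.push('resetTimeoutMs must be greater than 0');
  }
  if (!isPositiveInteger(options.halfOpenRequests)) {
    problems.push('halfOpenRequests must be a whole number of at least 1');
  }
  return problems;
}
```

A check like this catches values such as a zero threshold or a negative timeout, which would otherwise make the breaker open constantly or never recover.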
I hope this example helps to illustrate how the Circuit Breaker pattern can be implemented in a Node.js application using the Express framework and TypeScript.
Conclusion
When designing an enterprise microservices architecture, it's important to implement resiliency measures to make sure the services remain online when encountering errors or unexpected conditions. A circuit breaker is a fault-tolerance pattern that can be used to handle transient errors and prevent cascading failures in distributed systems. With circuit breakers, you can ensure that your system remains stable even in the event of an unexpected incident.