Implementing Resilience in Your Application: A Guide to the Resilience Library

The resilience-library equips your application with powerful tools to manage concurrency, rate limiting, and fault tolerance. These mechanisms ensure your application can gracefully handle high loads and unexpected outages.

As modern applications grow in complexity, managing concurrent requests and ensuring reliability become increasingly important. Whether it's preventing system overload or handling unexpected failures, resilience mechanisms such as rate limiters, semaphores, and circuit breakers play a key role. The resilience library offers a comprehensive solution for implementing these controls, allowing your system to gracefully degrade and recover from potential failures.

In this post, we'll walk through the library's features and show how to implement and use its core components: rate limiters, semaphores, and circuit breakers. We'll also explore how to combine these mechanisms so your application stays stable and responsive under load.

Rate Limiting Strategies

The resilience-library provides three rate limiting strategies:

  • Fixed Window Counter
  • Leaky Bucket
  • Token Bucket

Each has distinct use cases and configurations, ensuring you can control request rates effectively.

Fixed Window Counter

The Fixed Window Counter is the simplest strategy: it counts the requests made within a fixed time window. If the count exceeds the limit, further requests are blocked until the window resets.

const fixedWindowCounterOptions: FixedWindowCounterOptions = {
    type: 'fixed_window',
    maxRequests: 10,
    key: 'api/endpoint'
};

const fixedWindowRateLimiter = RateLimiter.create(fixedWindowCounterOptions);

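In this example, fixedWindowRateLimiter allows up to 10 requests per window for the key api/endpoint. Note that the options above do not set a window length; the intended limit of 10 requests per minute assumes a one-minute window, so check the library's default or configure the window explicitly.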
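
To make the counting behaviour concrete, here is a minimal sketch of a fixed-window counter. It is not the library's implementation: the class name, the windowMs parameter, and the injectable clock are assumptions introduced only for illustration.

```typescript
// Illustrative fixed-window counter, not the resilience-library implementation.
// windowMs and the injectable clock are assumptions made for this sketch.
class FixedWindowSketch {
    private windowStart: number;
    private count = 0;
    private readonly maxRequests: number;
    private readonly windowMs: number;
    private readonly now: () => number;

    constructor(maxRequests: number, windowMs: number, now: () => number = () => Date.now()) {
        this.maxRequests = maxRequests;
        this.windowMs = windowMs;
        this.now = now;
        this.windowStart = now();
    }

    // Returns true if the request fits in the current window.
    tryHit(): boolean {
        const t = this.now();
        // Start a fresh window once the current one has elapsed.
        if (t - this.windowStart >= this.windowMs) {
            this.windowStart = t;
            this.count = 0;
        }
        if (this.count >= this.maxRequests) {
            return false;
        }
        this.count += 1;
        return true;
    }
}

// Usage with a fake clock: 2 requests per 1000 ms window.
let fixedWindowClock = 0;
const fixedWindowSketch = new FixedWindowSketch(2, 1000, () => fixedWindowClock);
const firstWindow = [fixedWindowSketch.tryHit(), fixedWindowSketch.tryHit(), fixedWindowSketch.tryHit()];
fixedWindowClock = 1000; // the window has elapsed
const afterReset = fixedWindowSketch.tryHit();
console.log(firstWindow, afterReset); // [ true, true, false ] true
```

The sketch also shows the known weakness of this strategy: a burst at the end of one window followed by a burst at the start of the next can briefly let through up to twice the limit.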

Leaky Bucket

The Leaky Bucket works like water dripping from a bucket: incoming requests fill the bucket, and they "leak" out at a constant rate. When the bucket is full, new requests are rejected until enough has drained.

const leakyBucketOptions: LeakyBucketOptions = {
    type: 'leaky_bucket',
    maxRequests: 10,
    key: 'api/endpoint'
};
const leakyBucketRateLimiter = RateLimiter.create(leakyBucketOptions);

The bucket holds up to 10 requests and drains at a constant rate, which makes it ideal for smoothing out bursts of traffic.
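
The following sketch shows one common way to model a leaky bucket as a meter. It is an illustration under stated assumptions, not the library's code: the leakPerSecond parameter and the injectable clock do not appear in the options above and are hypothetical.

```typescript
// Illustrative leaky bucket (meter variant), not the resilience-library implementation.
// leakPerSecond and the injectable clock are assumptions made for this sketch.
class LeakyBucketSketch {
    private level = 0;
    private lastLeak: number;
    private readonly capacity: number;
    private readonly leakPerSecond: number;
    private readonly now: () => number;

    constructor(capacity: number, leakPerSecond: number, now: () => number = () => Date.now()) {
        this.capacity = capacity;
        this.leakPerSecond = leakPerSecond;
        this.now = now;
        this.lastLeak = now();
    }

    // Returns true if the request fits in the bucket after draining.
    tryAdd(): boolean {
        const t = this.now();
        // Drain the bucket in proportion to the time elapsed since the last call.
        const leaked = ((t - this.lastLeak) / 1000) * this.leakPerSecond;
        this.level = Math.max(0, this.level - leaked);
        this.lastLeak = t;
        if (this.level + 1 > this.capacity) {
            return false;
        }
        this.level += 1;
        return true;
    }
}

// Usage with a fake clock: capacity 2, draining 1 request per second.
let leakyClock = 0;
const leakySketch = new LeakyBucketSketch(2, 1, () => leakyClock);
const burst = [leakySketch.tryAdd(), leakySketch.tryAdd(), leakySketch.tryAdd()];
leakyClock = 1000; // one second later, one slot has drained
const afterDrain = [leakySketch.tryAdd(), leakySketch.tryAdd()];
console.log(burst, afterDrain); // [ true, true, false ] [ true, false ]
```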

Token Bucket

In the Token Bucket strategy, tokens represent the capacity for making requests. Tokens are refilled periodically, and each request consumes a token.

const tokenBucketOptions: TokenBucketOptions = {
    type: 'token_bucket',
    maxTokens: 10,
    refillRate: 1, // one token per second
    key: 'api/endpoint'
};
const tokenBucketRateLimiter = RateLimiter.create(tokenBucketOptions);

Here, tokens are refilled at a rate of one per second, allowing a maximum of 10 tokens at any given time.
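
A rough sketch of the refill arithmetic may help here. This is not the library's implementation; it assumes refillRate means tokens per second and uses an injectable clock, both of which are choices made for this illustration.

```typescript
// Illustrative token bucket, not the resilience-library implementation.
// Assumes refillRate is expressed in tokens per second.
class TokenBucketSketch {
    private tokens: number;
    private lastRefill: number;
    private readonly maxTokens: number;
    private readonly refillRate: number;
    private readonly now: () => number;

    constructor(maxTokens: number, refillRate: number, now: () => number = () => Date.now()) {
        this.maxTokens = maxTokens;
        this.refillRate = refillRate;
        this.now = now;
        this.tokens = maxTokens; // start with a full bucket
        this.lastRefill = now();
    }

    // Returns true and spends one token if at least one is available.
    tryConsume(): boolean {
        const t = this.now();
        // Add the tokens earned since the last call, capped at maxTokens.
        const earned = ((t - this.lastRefill) / 1000) * this.refillRate;
        this.tokens = Math.min(this.maxTokens, this.tokens + earned);
        this.lastRefill = t;
        if (this.tokens < 1) {
            return false;
        }
        this.tokens -= 1;
        return true;
    }
}

// Usage with a fake clock: 10 tokens, refilling 1 per second.
let tokenClock = 0;
const tokenSketch = new TokenBucketSketch(10, 1, () => tokenClock);
let allowedInBurst = 0;
for (let i = 0; i < 11; i++) {
    if (tokenSketch.tryConsume()) allowedInBurst += 1;
}
tokenClock = 3000; // three seconds later, three tokens are back
const afterRefill = [tokenSketch.tryConsume(), tokenSketch.tryConsume(), tokenSketch.tryConsume(), tokenSketch.tryConsume()];
console.log(allowedInBurst, afterRefill); // 10 [ true, true, true, false ]
```

Unlike the leaky bucket, a full token bucket lets a burst of up to maxTokens requests through at once, then throttles to the refill rate.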

Semaphore for Concurrency Control

Managing concurrent access to shared resources can prevent overloads. Semaphores in the resilience-library control the number of concurrent requests that can be made.

Semaphore Implementation

To limit access to a resource, simply create a semaphore:

const semaphore = Semaphore.create('resource_key', 3);

This allows only 3 concurrent accesses to the resource identified by resource_key. Requests beyond this limit will be queued or rejected based on your logic.

Acquiring and Releasing a Semaphore

async function acquireResource() {
    const acquired = await semaphore.acquire();
    if (acquired) {
        console.log('Resource acquired successfully.');
    } else {
        console.log('Resource limit reached. Cannot acquire.');
    }
}

async function releaseResource() {
    try {
        await semaphore.release();
        console.log('Resource released successfully.');
    } catch (error) {
        const er = error as Error;
        console.error('Release failed:', er.message);
    }
}

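These functions control access to the resource. Call releaseResource once the work is finished, ideally from a finally block, so that a failed operation does not leave a permit held.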
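
The counting logic behind a semaphore can be sketched in a few lines. The class below is a synchronous, in-memory illustration with hypothetical names (CountingSemaphoreSketch, tryAcquire); it is not the library's Semaphore, whose acquire and release are asynchronous.

```typescript
// Illustrative counting semaphore, not the resilience-library implementation.
class CountingSemaphoreSketch {
    private inUse = 0;
    private readonly limit: number;

    constructor(limit: number) {
        this.limit = limit;
    }

    // Takes a permit if one is free; never blocks.
    tryAcquire(): boolean {
        if (this.inUse >= this.limit) {
            return false;
        }
        this.inUse += 1;
        return true;
    }

    // Hands a permit back; releasing without holding one is a bug in the caller.
    release(): void {
        if (this.inUse === 0) {
            throw new Error('release() called with no permit held');
        }
        this.inUse -= 1;
    }
}

// Usage: a limit of 3 concurrent holders.
const semaphoreSketch = new CountingSemaphoreSketch(3);
const grants = [
    semaphoreSketch.tryAcquire(),
    semaphoreSketch.tryAcquire(),
    semaphoreSketch.tryAcquire(),
    semaphoreSketch.tryAcquire()
];
semaphoreSketch.release(); // one holder finishes
const afterRelease = semaphoreSketch.tryAcquire();
console.log(grants, afterRelease); // [ true, true, true, false ] true
```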

Circuit Breaker for Fault Tolerance

Circuit breakers provide fault tolerance by detecting failures and temporarily halting further requests until the system recovers. The resilience-library supports two strategies: Error Percentage and Explicit Threshold.

Error Percentage Strategy

This strategy monitors the error rate over a rolling window and trips the circuit when the error percentage exceeds a certain threshold.

const errorPercentageOptions: ErrorPercentageCircuitBreakerOptions = {
    resourceName: 'ResourceService',
    rollingWindowSize: 10000,
    requestVolumeThreshold: 10,
    errorThresholdPercentage: 50,
    sleepWindow: 3000,
    fallbackMethod: () => 'Fallback response',
    pingService: async () => {
        const isServiceOperational = Math.random() < 0.8; // 80% chance of service being operational
        return isServiceOperational;
    }
};

const errorPercentageCircuitBreaker = CircuitBreakerFactory.create(errorPercentageOptions);

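Here, if 50% of requests fail within the rolling window of 10,000 milliseconds, the circuit breaker opens and halts requests for 3 seconds (sleepWindow) before retrying. The requestVolumeThreshold of 10 appears to follow the usual convention for this parameter: the error rate is only evaluated once at least that many requests have been recorded in the window, so a single early failure does not trip the circuit. The pingService shown here simulates a health check with Math.random; in production it should probe the real service.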
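
The trip decision can be sketched as a pure function over recent call records. This is an illustration, not the library's code: the CallRecord shape and the shouldTrip function are hypothetical, and two details are assumptions, namely that requestVolumeThreshold is a minimum call count and that the percentage comparison is inclusive.

```typescript
// Illustrative error-percentage trip check, not the resilience-library implementation.
interface CallRecord {
    at: number;  // timestamp in ms
    ok: boolean; // whether the call succeeded
}

interface ErrorPercentageSketchOptions {
    rollingWindowSize: number;        // ms
    requestVolumeThreshold: number;   // assumed: minimum calls before evaluating
    errorThresholdPercentage: number; // 0-100, assumed inclusive
}

function shouldTrip(records: CallRecord[], now: number, opts: ErrorPercentageSketchOptions): boolean {
    // Only calls inside the rolling window count.
    const recent = records.filter(r => now - r.at <= opts.rollingWindowSize);
    if (recent.length < opts.requestVolumeThreshold) {
        return false; // not enough traffic to judge
    }
    const failures = recent.filter(r => !r.ok).length;
    return (failures / recent.length) * 100 >= opts.errorThresholdPercentage;
}

const tripOptions: ErrorPercentageSketchOptions = {
    rollingWindowSize: 10000,
    requestVolumeThreshold: 10,
    errorThresholdPercentage: 50
};

// Ten calls, alternating success and failure: 5 of 10 failed.
const mixedCalls: CallRecord[] = Array.from({ length: 10 }, (_, i) => ({ at: 1000 + i, ok: i % 2 === 0 }));
// Nine calls, all failed: below the volume threshold.
const fewCalls: CallRecord[] = Array.from({ length: 9 }, (_, i) => ({ at: 1000 + i, ok: false }));

const tripsAtHalf = shouldTrip(mixedCalls, 2000, tripOptions);
const tripsOnLowVolume = shouldTrip(fewCalls, 2000, tripOptions);
const tripsOnStaleData = shouldTrip(mixedCalls, 20000, tripOptions);
console.log(tripsAtHalf, tripsOnLowVolume, tripsOnStaleData); // true false false
```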

Explicit Threshold Strategy

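This strategy trips on explicit counts rather than a percentage. It tracks failures and timeouts separately, and it counts the successful requests required before the circuit closes again.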

const explicitThresholdOptions: ExplicitThresholdCircuitBreakerOptions = {
    resourceName: 'ResourceService',
    rollingWindowSize: 10000,
    failureThreshold: 5,
    timeoutThreshold: 2,
    successThreshold: 3,
    sleepWindow: 3000,
    fallbackMethod: () => 'Fallback response',
    pingService: async () => {
        const isServiceOperational = Math.random() < 0.8; // 80% chance of service being operational
        return isServiceOperational;
    }
};

const explicitThresholdCircuitBreaker = CircuitBreakerFactory.create(explicitThresholdOptions);

In this configuration, the circuit opens after 5 failures or 2 timeouts and only closes after 3 successful requests, ensuring stability during recovery.
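
The open, half-open, and closed transitions described above can be sketched as a small state machine. This is a simplified illustration with hypothetical names, not the library's implementation: it ignores rollingWindowSize, assumes any failure during recovery reopens the circuit, and uses an injectable clock.

```typescript
// Illustrative explicit-threshold breaker, not the resilience-library implementation.
// Simplification: counts are not bounded by a rolling window.
type BreakerState = 'closed' | 'open' | 'half_open';
type Outcome = 'success' | 'failure' | 'timeout';

interface ExplicitThresholdSketchOptions {
    failureThreshold: number;
    timeoutThreshold: number;
    successThreshold: number;
    sleepWindow: number; // ms
}

class ExplicitThresholdSketch {
    private state: BreakerState = 'closed';
    private failures = 0;
    private timeouts = 0;
    private successes = 0;
    private openedAt = 0;
    private readonly opts: ExplicitThresholdSketchOptions;
    private readonly now: () => number;

    constructor(opts: ExplicitThresholdSketchOptions, now: () => number = () => Date.now()) {
        this.opts = opts;
        this.now = now;
    }

    // Reports the state, moving from open to half-open once sleepWindow has elapsed.
    currentState(): BreakerState {
        if (this.state === 'open' && this.now() - this.openedAt >= this.opts.sleepWindow) {
            this.moveTo('half_open');
        }
        return this.state;
    }

    record(outcome: Outcome): void {
        const state = this.currentState();
        if (state === 'open') return; // calls are short-circuited while open
        if (outcome === 'success') {
            if (state === 'half_open') {
                this.successes += 1;
                if (this.successes >= this.opts.successThreshold) this.moveTo('closed');
            }
            return;
        }
        if (outcome === 'failure') this.failures += 1;
        else this.timeouts += 1;
        const tripped = this.failures >= this.opts.failureThreshold
            || this.timeouts >= this.opts.timeoutThreshold;
        // Assumed: any failure during a half-open probe reopens the circuit.
        if (tripped || state === 'half_open') this.moveTo('open');
    }

    private moveTo(next: BreakerState): void {
        this.state = next;
        this.failures = 0;
        this.timeouts = 0;
        this.successes = 0;
        if (next === 'open') this.openedAt = this.now();
    }
}

// Usage with a fake clock and the thresholds from the configuration above.
let breakerClock = 0;
const breakerSketch = new ExplicitThresholdSketch(
    { failureThreshold: 5, timeoutThreshold: 2, successThreshold: 3, sleepWindow: 3000 },
    () => breakerClock
);
breakerSketch.record('timeout');
breakerSketch.record('timeout');
const stateAfterTimeouts = breakerSketch.currentState();
breakerClock = 3000; // sleepWindow has elapsed
const stateAfterSleep = breakerSketch.currentState();
breakerSketch.record('success');
breakerSketch.record('success');
breakerSketch.record('success');
const stateAfterRecovery = breakerSketch.currentState();
console.log(stateAfterTimeouts, stateAfterSleep, stateAfterRecovery); // open half_open closed
```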

Composing Policies for Enhanced Resilience

Real systems usually need several resilience mechanisms working together. In resilience engineering, a policy is a structured set of rules that controls how an operation is executed under given conditions, for example by limiting request rates, managing concurrent access to shared resources, or preventing cascading failures.

In the resilience-library, a policy wraps rate limiters, semaphores, and circuit breakers into a single unit that applies these controls consistently and predictably. Composing policies gives you one strategy that covers several failure scenarios, so the system stays stable under stress.

What is a Policy?

A policy can be seen as an orchestrator of resilience mechanisms. When multiple constraints (like rate limiting, concurrency control, and fault detection) are combined, the policy ensures that they work together seamlessly. It follows a logical flow where the request must pass through each defined control. If any control fails—whether due to rate limiting, concurrency limits, or service failures—the request is either deferred, rejected, or processed by a fallback mechanism, depending on the policy rules.

Example: Combining Rate Limiter, Semaphore, and Circuit Breaker

Let’s walk through how you can build a system that integrates rate limiting, concurrency management, and fault tolerance into a single workflow.

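// axios performs the HTTP call below. RateLimiter, Semaphore, CircuitBreakerFactory,
// Policy and the option types must also be imported from the resilience library's package.
import axios from 'axios';

// Define token bucket options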
const tokenBucketOptions: TokenBucketOptions = {
    type: 'token_bucket',
    maxTokens: 10,
    refillRate: 1,
    key: 'api/endpoint'
};

// Create rate limiter instance
const rateLimiter = RateLimiter.create(tokenBucketOptions);

// Define circuit breaker options
const errorPercentageOptions: ErrorPercentageCircuitBreakerOptions = {
    resourceName: 'ResourceService',
    rollingWindowSize: 10000,
    requestVolumeThreshold: 10,
    errorThresholdPercentage: 50,
    sleepWindow: 3000,
    fallbackMethod: () => 'Fallback response',
    pingService: async () => {
        const isServiceOperational = Math.random() < 0.8; // 80% chance of service being operational
        return isServiceOperational;
    }
};

// Create circuit breaker instance
const circuitBreaker = CircuitBreakerFactory.create(errorPercentageOptions);

// Create semaphore instance with a limit of 3 concurrent accesses
const semaphore = Semaphore.create('resource_key', 3);

// Combine the policies
const policy = Policy.wrap(semaphore, rateLimiter, circuitBreaker);

// Define the HTTP request function
async function makeRequest() {
    try {
        await policy.execute(async () => {
            const response = await axios.get('https://jsonplaceholder.typicode.com/posts/1');
            console.log(response.data);
        });
    } catch (error) {
        const er = error as Error;
        console.error('Request failed:', er.message);
    }
}

// Make a request
makeRequest();

Breaking Down the Example

  • Token Bucket Rate Limiter:
    Limits requests to 10 tokens, with 1 token refilling every second. This helps control the flow of requests by ensuring they don’t exceed the defined limit, protecting your API from excessive traffic.
  • Semaphore:
    Limits the number of concurrent requests to a shared resource. In this case, only 3 requests can access the resource at any given time. If the semaphore limit is reached, the request waits until a spot becomes available.
  • Error Percentage Circuit Breaker:
    Monitors the service’s error rate and trips the circuit if the error rate exceeds 50% over a rolling window of 10 seconds. When the circuit is open, no further requests are made to the service, and a fallback response is returned instead. After 3 seconds, the circuit will attempt to close if the service recovers.

How the Policy Works

  • When the makeRequest() function is called, the request first passes through the semaphore to check whether the concurrency limit has been reached. If it hasn’t, the request moves to the next stage.
  • The rate limiter then verifies whether the request quota has been met. If the request is within limits, it proceeds.
  • Finally, the circuit breaker checks the current health of the service. If the error rate is too high, the circuit will be open, and a fallback response will be returned. If the circuit is closed (indicating the service is healthy), the request is executed.

This ensures that the system remains resilient, handling failures in a controlled manner while preventing resource exhaustion and protecting against system-wide failures.
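
The flow described above, where a request passes through each control in turn, can be sketched as function composition. This is an illustration of the general wrapping idea with hypothetical names (Layer, wrapSketch, namedLayer), not how Policy.wrap is actually implemented.

```typescript
// Illustrative policy composition, not the resilience-library implementation.
// A layer receives the next step and decides whether to call it.
type Layer = <T>(next: () => T) => T;

function wrapSketch(...layers: Layer[]): Layer {
    // reduceRight makes the first layer passed the outermost one,
    // so it is checked first, matching the order described above.
    return <T>(operation: () => T): T =>
        layers.reduceRight<() => T>((next, layer) => () => layer(next), operation)();
}

const executionTrace: string[] = [];

// A stand-in control that records its name and either passes or rejects.
const namedLayer = (name: string, allow: boolean): Layer => <T>(next: () => T): T => {
    executionTrace.push(name);
    if (!allow) {
        throw new Error(`${name} rejected the request`);
    }
    return next();
};

// All controls pass: the operation runs last.
const passingPolicy = wrapSketch(
    namedLayer('semaphore', true),
    namedLayer('rateLimiter', true),
    namedLayer('circuitBreaker', true)
);
const passingResult = passingPolicy(() => {
    executionTrace.push('operation');
    return 42;
});
const passingTrace = [...executionTrace];

// The rate limiter rejects: later controls and the operation never run.
executionTrace.length = 0;
const rejectingPolicy = wrapSketch(
    namedLayer('semaphore', true),
    namedLayer('rateLimiter', false),
    namedLayer('circuitBreaker', true)
);
let rejectionMessage = '';
try {
    rejectingPolicy(() => 'never runs');
} catch (error) {
    rejectionMessage = (error as Error).message;
}
const rejectingTrace = [...executionTrace];
console.log(passingTrace, passingResult, rejectingTrace, rejectionMessage);
```

With asynchronous operations, T is a Promise, and a real layer would await the next step before doing its cleanup, for example releasing a semaphore permit in a finally block.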

Adding Hooks for Monitoring

The resilience-library also provides a way to monitor and track the execution of policies. Using hooks, you can add logging or send metrics to your monitoring system, which is useful for gaining insight into system behavior, detecting issues early, and guiding performance tuning.

You can add hooks at different stages of the policy execution:

policy.beforeExecute = async (context: IPolicyContext) => {
    loggingAdapter.log('Before execution');
    telemetryAdapter.collect({ event: 'before_execution' });
};

policy.afterExecute = async (context: IPolicyContext) => {
    loggingAdapter.log('After execution');
    telemetryAdapter.collect({ event: 'after_execution' });
};

With these hooks, you can log relevant data points before and after each execution of the policy, enabling you to track performance, analyze request patterns, and identify bottlenecks.
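
The hook snippet uses loggingAdapter and telemetryAdapter without defining them. As a minimal sketch, in-memory stand-ins with the same log and collect calls might look like the following; the storage arrays and the timestamp field are assumptions for illustration, and a real setup would forward to your logger and metrics client instead.

```typescript
// Illustrative in-memory adapters matching the calls used in the hooks above.
// These are stand-ins for this sketch, not part of the resilience-library.
interface TelemetryEvent {
    event: string;
    at: number; // assumed: a timestamp added by the adapter
}

const logLines: string[] = [];
const telemetryEvents: TelemetryEvent[] = [];

const loggingAdapter = {
    // Stores the message; a real adapter would write to your logger.
    log(message: string): void {
        logLines.push(message);
    }
};

const telemetryAdapter = {
    // Stores the event with a timestamp; a real adapter would send it to a metrics backend.
    collect(data: { event: string }): void {
        telemetryEvents.push({ event: data.event, at: Date.now() });
    }
};

// The same calls the beforeExecute hook makes.
loggingAdapter.log('Before execution');
telemetryAdapter.collect({ event: 'before_execution' });

const collectedEventNames = telemetryEvents.map(e => e.event);
console.log(logLines, collectedEventNames);
```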

Conclusion

The resilience-library equips your application with powerful tools to manage concurrency, rate limiting, and fault tolerance. Whether you're controlling API traffic, managing shared resources, or preventing cascading failures, these mechanisms ensure your application can gracefully handle high loads and unexpected outages.
By combining these strategies, you can create a robust, resilient system capable of scaling to meet demands while protecting its internal components from being overwhelmed. Start incorporating rate limiters, semaphores, and circuit breakers in your system today and watch your application's reliability soar!

Explore the full code and documentation for this project on GitHub:
Resilience Library GitHub Repository

Have you used any of these resilience mechanisms before? Let us know your experience in the comments! For those new to these concepts, feel free to ask questions—we’re here to help!