「Cloud Design Patterns」Bulkhead pattern


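Summary:

  1. A fault-tolerance design pattern. Elements of an application are isolated into pools so that if one fails, the others keep running, which avoids potential cascading failures. It is also often used to isolate a failure and limit its blast radius. The name comes from ship design.

  2. Consider combining it with the retry, circuit breaker, and throttling patterns for finer-grained fault handling.

  3. Some implementations of this pattern that come to mind:

    1. The various thread pools in an application. Each pool has a dedicated purpose, and the pools do not interfere with one another.

    2. Multi-tenant isolation, at both the software and hardware level. On the software side, one example is the ResourceGroup concept in Presto. On the hardware side, it usually shows up as different deployment models: particularly important tenants get dedicated resources, while small tenants share resources.

    3. Another typical case is the various k8s deployment schemes, which usually build in some notion of resources.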


Bulkhead pattern

The Bulkhead pattern is a type of application design that is tolerant of failure. In a bulkhead architecture, elements of an application are isolated into pools so that if one fails, the others will continue to function. It's named after the sectioned partitions (bulkheads) of a ship's hull. If the hull of a ship is compromised, only the damaged section fills with water, which prevents the ship from sinking.

Context and problem

A cloud-based application may include multiple services, with each service having one or more consumers. Excessive load or failure in a service will impact all consumers of the service.

Moreover, a consumer may send requests to multiple services simultaneously, using resources for each request. When the consumer sends a request to a service that is misconfigured or not responding, the resources used by the client's request may not be freed in a timely manner. As requests to the service continue, those resources may be exhausted. For example, the client's connection pool may be exhausted. At that point, requests by the consumer to other services are affected. Eventually the consumer can no longer send requests to other services, not just the original unresponsive service.

The same issue of resource exhaustion affects services with multiple consumers. A large number of requests originating from one client may exhaust available resources in the service. Other consumers are no longer able to consume the service, causing a cascading failure effect.
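The exhaustion described above can be reproduced in a few lines. The following is a minimal sketch, not taken from the original article: it assumes a consumer that sends every outbound call through one shared thread pool, and `call_service_a` and `call_service_b` are hypothetical stand-ins for real service clients. Two hung calls to Service A occupy both workers, so a call to the healthy Service B never gets a thread.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout

# One pool shared by calls to every service (the design without bulkheads).
shared_pool = ThreadPoolExecutor(max_workers=2)
service_a_recovered = threading.Event()


def call_service_a():
    # Hypothetical stand-in for a service that does not respond.
    service_a_recovered.wait()
    return "A ok"


def call_service_b():
    # Hypothetical stand-in for a healthy service.
    return "B ok"


# Two hung calls to Service A occupy every worker in the shared pool.
hung_calls = [shared_pool.submit(call_service_a) for _ in range(2)]

# The call to Service B is queued behind them and never gets a thread in time.
b_future = shared_pool.submit(call_service_b)
try:
    b_outcome = b_future.result(timeout=0.2)
except FutureTimeout:
    b_outcome = "B starved"

# Let Service A "recover" so the demo can shut down cleanly.
service_a_recovered.set()
shared_pool.shutdown(wait=True)

print(b_outcome)  # B starved
```

Service B itself is healthy here; it is the consumer's shared pool that fails it.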

Solution

Partition service instances into different groups, based on consumer load and availability requirements. This design helps to isolate failures, and allows you to sustain service functionality for some consumers, even during a failure.

A consumer can also partition resources, to ensure that resources used to call one service don't affect the resources used to call another service. For example, a consumer that calls multiple services may be assigned a connection pool for each service. If a service begins to fail, it only affects the connection pool assigned for that service, allowing the consumer to continue using the other services.
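One way to picture per-service partitioning on the consumer side is a dedicated thread pool per downstream service. This is only a sketch under assumptions: the `ServiceBulkheads` class, the service names, and the pool sizes are hypothetical, and a real consumer would more likely partition HTTP connection pools or use a library such as resilience4j or Polly.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from concurrent.futures import TimeoutError as FutureTimeout


class ServiceBulkheads:
    """Hypothetical helper: one small, dedicated thread pool per service."""

    def __init__(self, workers_per_service):
        self._pools = {
            name: ThreadPoolExecutor(max_workers=n, thread_name_prefix=name)
            for name, n in workers_per_service.items()
        }

    def call(self, service, fn, timeout):
        # The work runs on the pool owned by `service`, so a hung service
        # can only tie up the threads of its own pool.
        future = self._pools[service].submit(fn)
        return future.result(timeout=timeout)

    def shutdown(self):
        for pool in self._pools.values():
            pool.shutdown(wait=True)


service_a_recovered = threading.Event()


def call_service_a():
    service_a_recovered.wait()  # simulates a service that does not respond
    return "A ok"


def call_service_b():
    return "B ok"


bulkheads = ServiceBulkheads({"service_a": 2, "service_b": 2})

# Every call to the unresponsive Service A times out...
a_timeouts = 0
for _ in range(3):
    try:
        bulkheads.call("service_a", call_service_a, timeout=0.1)
    except FutureTimeout:
        a_timeouts += 1

# ...but Service B has its own pool and still answers.
b_result = bulkheads.call("service_b", call_service_b, timeout=1.0)

# Let Service A "recover" so the demo can shut down cleanly.
service_a_recovered.set()
bulkheads.shutdown()

print(a_timeouts, b_result)  # 3 B ok
```

The sketch leaves the work queue of each pool unbounded, so timed-out calls to Service A still pile up inside its own partition. Bounding that queue, or adding a circuit breaker, would be the natural next step.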

The benefits of this pattern include:

  • Isolates consumers and services from cascading failures. An issue affecting a consumer or service can be isolated within its own bulkhead, preventing the entire solution from failing.

  • Allows you to preserve some functionality in the event of a service failure. Other services and features of the application will continue to work.

  • Allows you to deploy services that offer a different quality of service for consuming applications. A high-priority consumer pool can be configured to use high-priority services.

The following diagram shows bulkheads structured around connection pools that call individual services. If Service A fails or causes some other issue, its connection pool is isolated, so only workloads using the connection pool assigned to Service A are affected. Workloads that use Service B and C are not affected and can continue working without interruption.

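[Diagram: bulkheads structured around per-service connection pools for Service A, Service B, and Service C]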

The next diagram shows multiple clients calling a single service. Each client is assigned a separate service instance. Client 1 has made too many requests and overwhelmed its instance. Because each service instance is isolated from the others, the other clients can continue making calls.

[Diagram: multiple clients calling a single service, each client assigned a separate service instance]
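The service-side partitioning in the second diagram can be sketched as a router that pins each client to its own instance. Everything here is an illustrative assumption rather than part of the original article: the `ServiceInstance` and `Router` classes, the client names, and the capacity of three in-flight requests are all hypothetical.

```python
from dataclasses import dataclass


class InstanceOverloaded(Exception):
    """Raised when an instance has no capacity left."""


@dataclass
class ServiceInstance:
    name: str
    capacity: int        # maximum requests in flight (hypothetical limit)
    in_flight: int = 0

    def accept(self):
        if self.in_flight >= self.capacity:
            raise InstanceOverloaded(self.name)
        self.in_flight += 1


class Router:
    """Sends each client to the one instance assigned to it."""

    def __init__(self, assignments):
        self._assignments = assignments

    def send(self, client_id):
        instance = self._assignments[client_id]
        instance.accept()
        return instance.name


router = Router({
    "client-1": ServiceInstance("instance-1", capacity=3),
    "client-2": ServiceInstance("instance-2", capacity=3),
})

# Client 1 floods its own instance with requests that never complete.
rejected_for_client_1 = 0
for _ in range(5):
    try:
        router.send("client-1")
    except InstanceOverloaded:
        rejected_for_client_1 += 1

# Client 2 is routed to a different instance and is unaffected.
served_by = router.send("client-2")

print(rejected_for_client_1, served_by)  # 2 instance-2
```

In a real deployment the "instances" would be separate processes, containers, or virtual machines, and the assignment table would live in a load balancer or gateway rather than in application code.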

Issues and considerations

  • Define partitions around the business and technical requirements of the application.

  • When partitioning services or consumers into bulkheads, consider the level of isolation offered by the technology as well as the overhead in terms of cost, performance and manageability.

  • Consider combining bulkheads with retry, circuit breaker, and throttling patterns to provide more sophisticated fault handling.

  • When partitioning consumers into bulkheads, consider using processes, thread pools, and semaphores. Projects like resilience4j and Polly offer a framework for creating consumer bulkheads.

  • When partitioning services into bulkheads, consider deploying them into separate virtual machines, containers, or processes. Containers offer a good balance of resource isolation with fairly low overhead.

  • Services that communicate using asynchronous messages can be isolated through different sets of queues. Each queue can have a dedicated set of instances processing messages on the queue, or a single group of instances using an algorithm to dequeue and dispatch processing.

  • Determine the level of granularity for the bulkheads. For example, if you want to distribute tenants across partitions, you could place each tenant into a separate partition, or put several tenants into one partition.

  • Monitor each partition's performance and SLA.
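For the semaphore option mentioned in the list above, a consumer bulkhead can be as small as a counting semaphore that rejects work once its permits are used up. The sketch below is an assumption-laden illustration, not the API of resilience4j or Polly: `SemaphoreBulkhead`, `BulkheadFull`, and the `rejected` counter are hypothetical names. The counter hints at how per-partition monitoring might hook in.

```python
import threading
from contextlib import contextmanager


class BulkheadFull(Exception):
    """Raised when every permit of a bulkhead is already in use."""


class SemaphoreBulkhead:
    """Hypothetical fail-fast bulkhead: at most `max_concurrent` callers."""

    def __init__(self, name, max_concurrent):
        self.name = name
        self.rejected = 0  # simple per-partition metric
        self._permits = threading.BoundedSemaphore(max_concurrent)
        self._lock = threading.Lock()

    @contextmanager
    def enter(self):
        # Fail fast instead of queueing: a full partition rejects the call.
        if not self._permits.acquire(blocking=False):
            with self._lock:
                self.rejected += 1
            raise BulkheadFull(self.name)
        try:
            yield
        finally:
            self._permits.release()


reports = SemaphoreBulkhead("reports", max_concurrent=2)
outcomes = []

with reports.enter():
    with reports.enter():
        # Both permits are held, so a third concurrent caller is rejected.
        try:
            with reports.enter():
                outcomes.append("accepted")
        except BulkheadFull:
            outcomes.append("rejected")

# Permits were released on exit, so the partition accepts work again.
with reports.enter():
    outcomes.append("accepted")

print(outcomes, reports.rejected)  # ['rejected', 'accepted'] 1
```

Compared with a dedicated thread pool, a semaphore adds no threads of its own and only caps concurrency, so the work still runs on the caller's thread.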

When to use this pattern

Use this pattern to:

  • Isolate resources used to consume a set of backend services, especially if the application can provide some level of functionality even when one of the services is not responding.

  • Isolate critical consumers from standard consumers.

  • Protect the application from cascading failures.

This pattern may not be suitable when:

  • The project can't accept less efficient use of resources.

  • The added complexity is not necessary.

Example

The following Kubernetes configuration file creates an isolated container to run a single service, with its own CPU and memory resources and limits.

    apiVersion: v1
    kind: Pod
    metadata:
      name: drone-management
    spec:
      containers:
      - name: drone-management-container
        image: drone-service
        resources:
          requests:
            memory: "64Mi"
            cpu: "250m"
          limits:
            memory: "128Mi"
            cpu: "1"

REF:learn.microsoft.com/en-us/azure…