Spring Cloud for Beginners (9): The Circuit Breaker (CircuitBreaker) Feature of Spring Cloud Gateway

About the Circuit Breaker (CircuitBreaker)

  • The figure below, taken from the official resilience4j documentation, shows what a circuit breaker is:

image.png

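  1. In the CLOSED state, requests pass through normally.
  2. When the request failure rate reaches the configured threshold, the breaker switches to OPEN and rejects every request.
  3. After staying OPEN for the configured duration, it enters the half-open state (HALF_OPEN) and lets a limited number of requests through.
  4. In the half-open state, if the failure rate is below the configured threshold, the breaker moves to CLOSED and all requests pass again.
  5. In the half-open state, if the failure rate reaches or exceeds the configured threshold, the breaker moves back to OPEN and rejects every request.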
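
The five transitions above can be sketched as a small state machine. This is only a toy model for illustration, not how Resilience4J implements it: the class name `ToyCircuitBreaker`, its methods, and the count-based window (which also doubles as the number of trial calls in the half-open state) are all assumptions made for this example.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Toy model of the CLOSED / OPEN / HALF_OPEN transitions; every name here is hypothetical.
class ToyCircuitBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;              // number of recent calls evaluated together
    private final double failureRateThreshold; // percentage, e.g. 50.0
    private final long openDurationMillis;     // how long the breaker stays OPEN
    private final Deque<Boolean> outcomes = new ArrayDeque<>(); // true = failed call
    private State state = State.CLOSED;
    private long openedAtMillis;

    ToyCircuitBreaker(int windowSize, double failureRateThreshold, long openDurationMillis) {
        this.windowSize = windowSize;
        this.failureRateThreshold = failureRateThreshold;
        this.openDurationMillis = openDurationMillis;
    }

    State state() {
        return state;
    }

    // Step 3: once the OPEN period has elapsed, switch to HALF_OPEN and let calls through again.
    boolean allowRequest(long nowMillis) {
        if (state == State.OPEN && nowMillis - openedAtMillis >= openDurationMillis) {
            transitionTo(State.HALF_OPEN, nowMillis);
        }
        return state != State.OPEN;
    }

    // Steps 2, 4 and 5: evaluate the failure rate once the window is full.
    void record(boolean failed, long nowMillis) {
        if (state == State.OPEN) {
            return; // rejected calls are not counted
        }
        outcomes.addLast(failed);
        if (outcomes.size() > windowSize) {
            outcomes.removeFirst();
        }
        if (outcomes.size() < windowSize) {
            return; // not enough samples yet
        }
        long failures = outcomes.stream().filter(Boolean::booleanValue).count();
        double failureRate = 100.0 * failures / outcomes.size();
        if (failureRate >= failureRateThreshold) {
            transitionTo(State.OPEN, nowMillis);
        } else if (state == State.HALF_OPEN) {
            transitionTo(State.CLOSED, nowMillis);
        }
    }

    private void transitionTo(State next, long nowMillis) {
        state = next;
        outcomes.clear();
        if (next == State.OPEN) {
            openedAtMillis = nowMillis;
        }
    }

    public static void main(String[] args) {
        ToyCircuitBreaker breaker = new ToyCircuitBreaker(4, 50.0, 5_000);
        for (int i = 0; i < 4; i++) {
            breaker.record(i % 2 == 0, 0); // alternate failure and success
        }
        System.out.println("after alternating failures: " + breaker.state());
        System.out.println("allowed at 1s: " + breaker.allowRequest(1_000));
        System.out.println("allowed at 5s: " + breaker.allowRequest(5_000));
    }
}
```

Time is passed in explicitly so that the transitions can be exercised without sleeping; a real breaker would read a clock and would also need to be thread-safe.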

Clarifying the concepts

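  • One distinction to settle first: the Spring Cloud circuit breaker and the circuit breaker feature of Spring Cloud Gateway are not the same concept. The Gateway feature also involves filters, that is, the circuit breaker is applied under the rules of a filter: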

image.png

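  • This article focuses on how to configure and use the circuit breaker (CircuitBreaker) in Spring Cloud Gateway, so it does not go into the details of Resilience4J. If you want to dig deeper into Resilience4J, the recommended material is Spring Cloud Circuit Breaker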

About the Spring Cloud circuit breaker

  • Start with the Spring Cloud circuit breaker. As the figure below shows, Hystrix and Sentinel are both familiar names:

image.png

About the circuit breaker feature of Spring Cloud Gateway

  • Now look at the official Spring Cloud Gateway documentation, shown below; its key points are covered next:

image.png

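  • The figure above reveals several key points:
  1. Spring Cloud Gateway has a built-in circuit breaker filter.
  2. It works by using the Spring Cloud circuit breaker API to wrap the gateway's routing logic in a circuit breaker.
  3. Several circuit breaker libraries can be used with Spring Cloud Gateway (unfortunately, the documentation does not list them).
  4. Resilience4J works out of the box with Spring Cloud.
  • In short, the circuit breaker feature of Spring Cloud Gateway is implemented by a built-in filter, and that filter uses the Spring Cloud circuit breaker.
  • The official documentation says several circuit breaker libraries can be used with Spring Cloud Gateway but does not say which, which is frustrating. So let's turn to the view of an expert, Piotr Mińkowski, the author of the book below: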

image.png

Piotr Mińkowski's blog covers the circuit breaker feature of Spring Cloud Gateway in detail, as shown below; a few important points are picked out afterwards:

image.png

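  • Three key points can be taken from the figure above:
  1. Since version 2.2.1, Spring Cloud Gateway has integrated the Resilience4J circuit breaker implementation.
  2. Netflix Hystrix has entered maintenance mode (can that be read as "about to retire"?).
  3. Netflix Hystrix is still usable but deprecated, and future versions of Spring Cloud may drop support for it.
  • Add to that the fact that the official documentation also uses Resilience4J as its example (figure below), and, being cautious, I seem to have no other choice: Resilience4J it is.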

image.png

That concludes the theoretical analysis.

Hands-on

  • Add the /account/{id} endpoint to the service provider nacos-provider:
package com.example.controller;

import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpHeaders;
import org.springframework.web.bind.annotation.*;

import java.text.SimpleDateFormat;
import java.util.Date;

@Slf4j
@RestController
@RequestMapping("/nacos")
public class NacosController {

    private String dateStr(){
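        // Note: "hh" is the 12-hour clock (and no AM/PM marker is printed here); use "HH" for 24-hour time
        return new SimpleDateFormat("yyyy-MM-dd hh:mm:ss").format(new Date());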
    }

    @GetMapping(value = "/test")
    public String test(){
        return "Successfully reached the provider endpoint 111";
    }

    @GetMapping("/testHeader")
    public String testHeader(@RequestHeader HttpHeaders headers) {
        log.info("header: {}", headers);

        return "Hello World" + new Date();
    }

    @GetMapping(value = "/account/{id}")
    public String account(@PathVariable("id") int id) throws InterruptedException {
        if(1==id) {
            Thread.sleep(500);
        }

        return "Account" + dateStr();
    }
}
  • Add a new circuitbreaker-gateway submodule:
  1. Add the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-gateway</artifactId>
    </dependency>

    <dependency>
        <groupId>io.projectreactor</groupId>
        <artifactId>reactor-test</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>

    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
    </dependency>
</dependencies>
  2. The configuration file application.yml:
server:
  # service port
  port: 10012
spring:
  application:
    name: circuitbreaker-gateway
  cloud:
    gateway:
      routes:
        - id: path_route
          uri: http://localhost:9001
          predicates:
            - Path=/nacos/**
          filters:
            - name: CircuitBreakerStatePrinter
            - name: CircuitBreaker
              args:
                name: myCircuitBreaker
  3. The startup class:
package com.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class CircuitBreakerApplication {
    public static void main(String[] args) {
        SpringApplication.run(CircuitBreakerApplication.class, args);
    }
}
  4. The configuration class, which sets the circuit breaker parameters:
package com.example.config;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;
import org.springframework.cloud.circuitbreaker.resilience4j.ReactiveResilience4JCircuitBreakerFactory;
import org.springframework.cloud.circuitbreaker.resilience4j.Resilience4JConfigBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
public class CustomizeCircuitBreakerConfig {


    @Bean
    public ReactiveResilience4JCircuitBreakerFactory defaultCustomizer() {

        CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom() //
                .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.TIME_BASED) // use a time-based sliding window
                .slidingWindowSize(10) // the window covers the last 10 seconds
                .minimumNumberOfCalls(5) // at least 5 calls in the window before the failure rate is calculated
                .failureRateThreshold(50) // open the breaker once the failure rate in the window reaches 50%
                .enableAutomaticTransitionFromOpenToHalfOpen() // move from OPEN to HALF_OPEN automatically
                .permittedNumberOfCallsInHalfOpenState(5) // number of calls allowed through in HALF_OPEN
                .waitDurationInOpenState(Duration.ofSeconds(5)) // stay OPEN for 5 seconds before moving to HALF_OPEN
                .recordExceptions(Throwable.class) // treat every exception as a failure
                .build();

        ReactiveResilience4JCircuitBreakerFactory factory = new ReactiveResilience4JCircuitBreakerFactory();
        factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
                .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build())
                .circuitBreakerConfig(circuitBreakerConfig).build());

        return factory;
    }
}
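  • One thing to note in the code above: the timeLimiterConfig method sets the timeout. If the service provider does not respond within 200 ms, Spring Cloud Gateway returns a failure to the caller.
  • Development is done; the next question is how to verify it.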
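
To make the 200 ms limit concrete, here is a minimal sketch of the same idea using only the JDK. It is not what Resilience4J's TimeLimiter does internally; `TimeLimitSketch`, `account` and `callWithTimeout` are hypothetical names, and `account` merely imitates the provider endpoint's 500 ms delay for id 1.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Sketch of a time limit around a slow call; all names are hypothetical.
class TimeLimitSketch {

    // Stand-in for the provider's /account/{id} endpoint: id 1 is slow.
    static String account(int id) {
        try {
            if (id == 1) {
                Thread.sleep(500);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Account";
    }

    // Returns the response if it arrives within timeoutMillis, otherwise "TIMEOUT".
    static String callWithTimeout(int id, long timeoutMillis) {
        CompletableFuture<String> future = CompletableFuture
                .supplyAsync(() -> account(id))
                .orTimeout(timeoutMillis, TimeUnit.MILLISECONDS); // available since Java 9
        try {
            return future.get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof TimeoutException) {
                return "TIMEOUT"; // the gateway would answer 504 at this point
            }
            throw new IllegalStateException(e.getCause());
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("id=0: " + callWithTimeout(0, 200));
        System.out.println("id=1: " + callWithTimeout(1, 200));
    }
}
```

Each timed-out call is what the circuit breaker records as a failure, which is how the slow endpoint eventually trips it.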

Unit test class

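To verify the circuit breaker feature of Spring Cloud Gateway, we can use a JUnit unit test to control the request parameter and the number of requests precisely. The test class is shown below. It sends 100 requests in a row. During the first 50, the request parameter alternates between 0 and 1; when it is 1, the endpoint delays for 500 ms, which exceeds Spring Cloud Gateway's 200 ms timeout, so the request fails. Once enough failures accumulate, the circuit breaker opens: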

package com.example;

import org.junit.jupiter.api.RepeatedTest;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.reactive.AutoConfigureWebTestClient;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.http.MediaType;
import org.springframework.test.context.junit.jupiter.SpringExtension;
import org.springframework.test.web.reactive.server.WebTestClient;

@SpringBootTest
@ExtendWith(SpringExtension.class)
@AutoConfigureWebTestClient
public class CircuitbreakerTest {

    // number of requests sent so far
    private static int i=0;

    @Autowired
    private WebTestClient webClient;

    // @RepeatedTest alone is enough; adding @Test as well would register the method a second time
    @RepeatedTest(100)
    void testHelloPredicates() throws InterruptedException {
        // for the first 50 runs, gen alternates between 0 and 1, so one request succeeds and the next times out;
        // after that, gen stays at 0 and no request times out
        int gen = (i<50) ? (i % 2) : 0;

        // increment the counter
        i++;

        final String tag = "[" + i + "]";

        // send the web request
        webClient.get()
                .uri("/nacos/account/" + gen)
                .accept(MediaType.APPLICATION_JSON)
                .exchange()
                .expectBody(String.class).consumeWith(result  -> System.out.println(tag + result.getRawStatusCode() + " - " + result.getResponseBody()));

        Thread.sleep(1000);
    }

}

Verification

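  • Start nacos (the service provider depends on it)
  • Start the nacos-provider subproject
  • Run the unit test class we just wrote. Part of the console output is shown below, with analysis to follow: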
state : CLOSED
[2]504 - {"timestamp":"2024-01-19T09:22:21.454+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"35b9be8a"}
state : CLOSED
[3]200 - Account2024-01-19 05:22:22
state : CLOSED
[4]504 - {"timestamp":"2024-01-19T09:22:23.700+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"48d357af"}
state : CLOSED
[5]200 - Account2024-01-19 05:22:24
state : OPEN
[6]503 - {"timestamp":"2024-01-19T09:22:25.738+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"e43da69"}
state : OPEN
[7]503 - {"timestamp":"2024-01-19T09:22:26.755+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"8a09e9"}
state : OPEN
[8]503 - {"timestamp":"2024-01-19T09:22:27.770+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"57918132"}
state : OPEN
[9]503 - {"timestamp":"2024-01-19T09:22:28.782+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"7b9ae109"}
state : HALF_OPEN
[10]504 - {"timestamp":"2024-01-19T09:22:30.001+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"3f41d5fa"}
state : HALF_OPEN
[11]200 - Account2024-01-19 05:22:31
state : HALF_OPEN
[12]504 - {"timestamp":"2024-01-19T09:22:32.233+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"729aafe"}
state : HALF_OPEN
[13]200 - Account2024-01-19 05:22:33
state : HALF_OPEN
[14]504 - {"timestamp":"2024-01-19T09:22:34.471+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"71c45d5b"}
state : OPEN
[15]503 - {"timestamp":"2024-01-19T09:22:35.486+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"7ef53536"}
state : OPEN
[16]503 - {"timestamp":"2024-01-19T09:22:36.499+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"29b9e385"}
state : OPEN
[17]503 - {"timestamp":"2024-01-19T09:22:37.511+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"70b6dde8"}
state : OPEN
[18]503 - {"timestamp":"2024-01-19T09:22:38.522+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"4bf39267"}
state : HALF_OPEN
[19]200 - Account2024-01-19 05:22:39
state : HALF_OPEN
[20]504 - {"timestamp":"2024-01-19T09:22:40.757+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"5595fb9b"}
state : HALF_OPEN
[21]200 - Account2024-01-19 05:22:41
state : HALF_OPEN
[22]504 - {"timestamp":"2024-01-19T09:22:42.992+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"1cf20304"}
state : HALF_OPEN
[23]200 - Account2024-01-19 05:22:44
state : CLOSED
[24]504 - {"timestamp":"2024-01-19T09:22:45.223+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"1a062662"}
state : CLOSED
[25]200 - Account2024-01-19 05:22:46
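  • Analyzing the status codes in the output above:
  1. 504 is the error returned on timeout; 200 is the normal response from the service provider.
  2. Both 504 and 200 mean the request reached the service provider, so at this stage the circuit breaker is closed.
  3. After several 504 errors, the configured threshold is reached and the circuit breaker opens.
  4. The run of consecutive 503s is what the open circuit breaker returns; while it is open, requests cannot reach the service provider.
  5. After the run of 503s, 504 and 200 alternate again, which shows the breaker has entered the half-open state. The 504s then reach the threshold again and the breaker goes from half-open back to open. After the 50th request, since no more timing-out requests are sent, the breaker settles into the closed state.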

fallback

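  • As the test above shows, Spring Cloud Gateway reports errors to the caller through status codes alone, which is not very friendly. We can define a custom fallback that builds the response whenever an error occurs.
  • Add an endpoint to the circuitbreaker-gateway project: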
package com.example.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.text.SimpleDateFormat;
import java.util.Date;

@RestController
public class FallbackController {
    private String dateStr(){
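        // Note: "hh" is the 12-hour clock (and no AM/PM marker is printed here); use "HH" for 24-hour time
        return new SimpleDateFormat("yyyy-MM-dd hh:mm:ss").format(new Date());
    }

    /**
     * Returns a plain String
     * @return the fallback message with the current time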
     */
    @GetMapping("/myfallback")
    public String helloStr() {
        return "myfallback, " + dateStr();
    }
}
  • application.yml is configured as follows; the only change is the fallbackUri property added to the filter:
server:
  # service port
  port: 10012
spring:
  application:
    name: circuitbreaker-gateway
  cloud:
    gateway:
      routes:
        - id: path_route
          uri: http://localhost:9001
          predicates:
            - Path=/nacos/**
          filters:
            - name: CircuitBreakerStatePrinter
            - name: CircuitBreaker
              args:
                name: myCircuitBreaker
                fallbackUri: forward:/myfallback
  • Run the unit test again. Every status code is now 200, and every former error has been replaced by the response of the endpoint just added:
state : CLOSED
[2]200 - myfallback, 2024-01-19 05:36:01
state : CLOSED
[3]200 - Account2024-01-19 05:36:02
state : CLOSED
[4]200 - myfallback, 2024-01-19 05:36:03
state : CLOSED
[5]200 - Account2024-01-19 05:36:04
state : OPEN
[6]200 - myfallback, 2024-01-19 05:36:05
state : OPEN
[7]200 - myfallback, 2024-01-19 05:36:06
state : OPEN
[8]200 - myfallback, 2024-01-19 05:36:07
state : OPEN
[9]200 - myfallback, 2024-01-19 05:36:08
state : HALF_OPEN
[10]200 - myfallback, 2024-01-19 05:36:09
state : HALF_OPEN
[11]200 - Account2024-01-19 05:36:11
state : HALF_OPEN
[12]200 - myfallback, 2024-01-19 05:36:12
state : HALF_OPEN
[13]200 - Account2024-01-19 05:36:13
state : HALF_OPEN
[14]200 - myfallback, 2024-01-19 05:36:14
state : OPEN
[15]200 - myfallback, 2024-01-19 05:36:15
state : OPEN
[16]200 - myfallback, 2024-01-19 05:36:16
state : OPEN
[17]200 - myfallback, 2024-01-19 05:36:17
state : OPEN
[18]200 - myfallback, 2024-01-19 05:36:18
state : HALF_OPEN
[19]200 - Account2024-01-19 05:36:19
state : HALF_OPEN
[20]200 - myfallback, 2024-01-19 05:36:20
state : HALF_OPEN
[21]200 - Account2024-01-19 05:36:21
state : HALF_OPEN
[22]200 - myfallback, 2024-01-19 05:36:22
state : HALF_OPEN
[23]200 - Account2024-01-19 05:36:24
state : CLOSED
[24]200 - myfallback, 2024-01-19 05:36:25
state : CLOSED
[25]200 - Account2024-01-19 05:36:26
state : CLOSED
[26]200 - myfallback, 2024-01-19 05:36:27
state : CLOSED
[27]200 - Account2024-01-19 05:36:28
state : CLOSED
[28]200 - myfallback, 2024-01-19 05:36:29
state : OPEN
[29]200 - myfallback, 2024-01-19 05:36:30
state : OPEN
[30]200 - myfallback, 2024-01-19 05:36:31
state : OPEN
[31]200 - myfallback, 2024-01-19 05:36:32
state : OPEN
[32]200 - myfallback, 2024-01-19 05:36:33
state : HALF_OPEN
[33]200 - Account2024-01-19 05:36:34
state : HALF_OPEN
[34]200 - myfallback, 2024-01-19 05:36:35
state : HALF_OPEN
[35]200 - Account2024-01-19 05:36:36
state : HALF_OPEN
[36]200 - myfallback, 2024-01-19 05:36:38
state : HALF_OPEN
[37]200 - Account2024-01-19 05:36:39
state : CLOSED
[38]200 - myfallback, 2024-01-19 05:36:40
state : CLOSED
[39]200 - Account2024-01-19 05:36:41
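  • That completes the development and testing of the circuit breaker feature of Spring Cloud Gateway. If you are curious and not satisfied with these few lines of configuration and code, and want to look inside the circuit breaker, read on for a look at its source code.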

Source code analysis

  • The constructor of RouteDefinitionRouteLocator (bean injection) contains the following code, which binds each name to its instance:
gatewayFilterFactories.forEach(factory -> this.gatewayFilterFactories.put(factory.name(), factory));
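  • The loadGatewayFilters method then uses this map to find the bean that was put in above.
  • The end result: when the route configuration sets name to CircuitBreaker, it maps to the bean of type SpringCloudCircuitBreakerFilterFactory, because that class's name method returns "CircuitBreaker", as shown below: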

image.png
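
The name-based lookup described above can be sketched in a few lines of plain Java. This is a simplified illustration, not the gateway's source: `FilterFactory`, `CircuitBreakerFactory`, `register` and `lookup` are hypothetical stand-ins for the real factory types and for the map handling in RouteDefinitionRouteLocator.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of resolving a filter factory by its configured name; all names are hypothetical.
class FilterRegistrySketch {

    // Simplified stand-in for a gateway filter factory.
    interface FilterFactory {
        String name();
    }

    // Plays the role of the circuit breaker factory: its name() is what routes refer to.
    static class CircuitBreakerFactory implements FilterFactory {
        @Override
        public String name() {
            return "CircuitBreaker";
        }
    }

    // Mirrors the forEach(... put(factory.name(), factory)) line quoted above.
    static Map<String, FilterFactory> register(List<FilterFactory> factories) {
        Map<String, FilterFactory> byName = new HashMap<>();
        factories.forEach(factory -> byName.put(factory.name(), factory));
        return byName;
    }

    // Resolves the "name" value from a route's filter configuration.
    static FilterFactory lookup(Map<String, FilterFactory> byName, String configuredName) {
        FilterFactory factory = byName.get(configuredName);
        if (factory == null) {
            throw new IllegalArgumentException("No filter factory named " + configuredName);
        }
        return factory;
    }

    public static void main(String[] args) {
        Map<String, FilterFactory> byName = register(List.of(new CircuitBreakerFactory()));
        System.out.println(lookup(byName, "CircuitBreaker").getClass().getSimpleName());
    }
}
```

The point of the sketch is that the string in the YAML is just a map key: whichever bean reports that name from name() is the one that handles the filter.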

  • The question now: which bean has the type SpringCloudCircuitBreakerFilterFactory? As the red box in the figure below shows, SpringCloudCircuitBreakerResilience4JFilterFactory is the only subclass of SpringCloudCircuitBreakerFilterFactory:

image.png

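  • Judging from the figure above, the CircuitBreaker filter should be SpringCloudCircuitBreakerResilience4JFilterFactory, but that is only inferred from the inheritance hierarchy. One key piece of evidence is still missing: does a bean of type SpringCloudCircuitBreakerResilience4JFilterFactory actually exist in the Spring context?
  • The answer turns up in GatewayResilience4JCircuitBreakerAutoConfiguration, whose configuration proves that SpringCloudCircuitBreakerResilience4JFilterFactory is instantiated and registered with Spring: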
@Bean
@ConditionalOnBean(ReactiveResilience4JCircuitBreakerFactory.class)
@ConditionalOnEnabledFilter
public SpringCloudCircuitBreakerResilience4JFilterFactory springCloudCircuitBreakerResilience4JFilterFactory(ReactiveResilience4JCircuitBreakerFactory reactiveCircuitBreakerFactory, ObjectProvider<DispatcherHandler> dispatcherHandler) {
	return new SpringCloudCircuitBreakerResilience4JFilterFactory(reactiveCircuitBreakerFactory, dispatcherHandler);
}
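  • To sum up, when you configure the CircuitBreaker filter, it is the SpringCloudCircuitBreakerResilience4JFilterFactory class that actually serves you, while the key code is concentrated in its parent class SpringCloudCircuitBreakerFilterFactory.
  • So, to understand the circuit breaker feature of Spring Cloud Gateway in depth, read the SpringCloudCircuitBreakerFilterFactory.apply method.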

A small regret

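  • Remember the console output analyzed earlier? It is the part in the red box below, where we used status codes to infer which state the circuit breaker was in: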

image.png

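  • Reading that plain text still leaves doubts: can the breaker's state really be determined from status codes alone? For example, when a 504 comes back, is the breaker closed or half-open? Either is possible. So this kind of inference only proves that the circuit breaker is working; it cannot pin down the exact state at a given moment.
  • A more accurate way to know the breaker's state at every moment is therefore needed; only then can we claim a solid understanding of the circuit breaker.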