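About the circuit breaker (CircuitBreaker)
- The figure below, from the official resilience4j documentation, explains what a circuit breaker is:
- In the CLOSED state, requests pass through normally
- When the request failure rate reaches the configured threshold, the breaker switches to OPEN and rejects every request
- After the OPEN state has lasted for the configured time, the breaker enters the half-open state (HALF_OPEN) and lets a limited number of requests through
- In the half-open state, if the failure rate is below the configured threshold, the breaker returns to CLOSED and lets everything through
- In the half-open state, if the failure rate is at or above the configured threshold, the breaker goes back to OPEN and rejects everything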
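The transitions listed above can be sketched as a tiny state machine. This is a simplified, count-based illustration and not how Resilience4J is implemented: the class name `MiniBreaker`, its methods, and the rule of evaluating the failure rate once every `windowSize` calls are assumptions made for the example, and the wait in the OPEN state is replaced by an explicit `waitElapsed()` call instead of a timer.

```java
// Hypothetical, simplified circuit breaker: evaluates the failure rate
// once every `windowSize` recorded calls. Not the Resilience4J implementation.
class MiniBreaker {
    enum State { CLOSED, OPEN, HALF_OPEN }

    private final int windowSize;          // calls per evaluation (assumed rule)
    private final double failureThreshold; // percentage, e.g. 50.0
    private State state = State.CLOSED;
    private int calls = 0;
    private int failures = 0;

    MiniBreaker(int windowSize, double failureThreshold) {
        this.windowSize = windowSize;
        this.failureThreshold = failureThreshold;
    }

    State state() { return state; }

    // OPEN rejects everything; CLOSED and HALF_OPEN let the call through.
    boolean allowRequest() { return state != State.OPEN; }

    // Stand-in for the configured wait time in OPEN running out.
    void waitElapsed() {
        if (state == State.OPEN) { transitionTo(State.HALF_OPEN); }
    }

    // Record the outcome of a call that was allowed through.
    void record(boolean success) {
        if (state == State.OPEN) { return; } // rejected calls are not counted
        calls++;
        if (!success) { failures++; }
        if (calls < windowSize) { return; }
        double rate = 100.0 * failures / calls;
        // At or above the threshold -> OPEN, below it -> CLOSED.
        transitionTo(rate >= failureThreshold ? State.OPEN : State.CLOSED);
    }

    // Every transition starts a fresh measurement.
    private void transitionTo(State next) {
        state = next;
        calls = 0;
        failures = 0;
    }

    public static void main(String[] args) {
        MiniBreaker breaker = new MiniBreaker(5, 50.0);
        boolean[] outcomes = {false, true, false, true, false}; // 3 failures out of 5
        for (boolean ok : outcomes) { breaker.record(ok); }
        System.out.println(breaker.state()); // OPEN
        breaker.waitElapsed();
        System.out.println(breaker.state()); // HALF_OPEN
        boolean[] recovery = {true, false, true, false, true};  // 2 failures out of 5
        for (boolean ok : recovery) { breaker.record(ok); }
        System.out.println(breaker.state()); // CLOSED
    }
}
```

The sketch keeps only the three states and the threshold rule; a real breaker also handles sliding windows, timers, and concurrency.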
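Clarifying the concepts
- One point to settle first: the Spring Cloud circuit breaker and the circuit breaker feature of Spring Cloud Gateway are not the same concept. The Gateway feature also involves filters, that is, the circuit breaker is applied under the rules of a filter:
- This article focuses on how to configure and use the circuit breaker (CircuitBreaker) in Spring Cloud Gateway, so it does not go into the details of Resilience4J. If you want to dig deeper into Resilience4J, the recommended material is Spring Cloud Circuit Breaker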
About the Spring Cloud circuit breaker
- Start with the Spring Cloud circuit breaker. As the figure below shows, Hystrix and Sentinel are familiar names:
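About the circuit breaker feature of Spring Cloud Gateway
- Now look at the official Spring Cloud Gateway documentation, shown below; its key points are discussed next:
- The figure above reveals several key facts:
- Spring Cloud Gateway has a built-in circuit breaker filter
- It works by using the Spring Cloud circuit breaker API to wrap the gateway's routing logic in a circuit breaker
- Several circuit breaker libraries can be used with Spring Cloud Gateway (unfortunately the documentation does not list them)
- Resilience4J works out of the box with Spring Cloud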
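- In short, the circuit breaker feature of Spring Cloud Gateway is implemented by a built-in filter, and that filter uses the Spring Cloud circuit breaker
- The official documentation says several circuit breaker libraries can be used with Spring Cloud Gateway but does not say which ones, which is frustrating. So let's turn to the view of an expert, Piotr Mińkowski, the author of the book below:
- Piotr Mińkowski's blog covers the circuit breaker feature of Spring Cloud Gateway in detail, as shown below; a few important points are picked out next: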
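- Three key points can be taken from the figure above:
- Starting with version 2.2.1, Spring Cloud Gateway integrates the Resilience4J circuit breaker implementation
- Netflix's Hystrix has entered maintenance mode (can that be read as "about to retire"?)
- Netflix's Hystrix is still usable but deprecated, and future versions of Spring Cloud may drop support for it
- The official documentation also uses Resilience4J as its example (see the figure below), so for the cautious among us there seems to be no other choice: Resilience4J it is.
That concludes the theory.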
Hands-on
- Add an /account/{id} endpoint to the service provider nacos-provider:
package com.example.controller;

import lombok.extern.slf4j.Slf4j;
import org.springframework.http.HttpHeaders;
import org.springframework.web.bind.annotation.*;

import java.text.SimpleDateFormat;
import java.util.Date;

@Slf4j
@RestController
@RequestMapping("/nacos")
public class NacosController {

    // Note: "hh" is the 12-hour clock, so 17:22 prints as 05:22; use "HH" for 24-hour output
    private String dateStr() {
        return new SimpleDateFormat("yyyy-MM-dd hh:mm:ss").format(new Date());
    }

    @GetMapping(value = "/test")
    public String test() {
        return "Provider endpoint reached successfully 111";
    }

    @GetMapping("/testHeader")
    public String testHeader(@RequestHeader HttpHeaders headers) {
        log.info("header: {}", headers);
        return "Hello World" + new Date();
    }

    // New endpoint: id 1 responds after a 500 ms delay, any other id responds immediately
    @GetMapping(value = "/account/{id}")
    public String account(@PathVariable("id") int id) throws InterruptedException {
        if (1 == id) {
            Thread.sleep(500);
        }
        return "Account" + dateStr();
    }
}
- Add a circuitbreaker-gateway submodule:
- Add the following dependencies:
<dependencies>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-gateway</artifactId>
    </dependency>
    <dependency>
        <groupId>io.projectreactor</groupId>
        <artifactId>reactor-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-test</artifactId>
        <scope>test</scope>
    </dependency>
    <dependency>
        <groupId>org.springframework.cloud</groupId>
        <artifactId>spring-cloud-starter-circuitbreaker-reactor-resilience4j</artifactId>
    </dependency>
</dependencies>
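- The application.yml configuration file is as follows:
server:
  # service port
  port: 10012
spring:
  application:
    name: circuitbreaker-gateway
  cloud:
    gateway:
      routes:
        - id: path_route
          uri: http://localhost:9001
          predicates:
            - Path=/nacos/**
          filters:
            # CircuitBreakerStatePrinter is not a built-in Gateway filter;
            # it is a custom filter whose code is not shown in this section
            - name: CircuitBreakerStatePrinter
            - name: CircuitBreaker
              args:
                name: myCircuitBreaker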
- The startup class:
package com.example;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;

@SpringBootApplication
public class CircuitBreakerApplication {

    public static void main(String[] args) {
        SpringApplication.run(CircuitBreakerApplication.class, args);
    }
}
- The configuration class below sets the circuit breaker parameters:
package com.example.config;

import io.github.resilience4j.circuitbreaker.CircuitBreakerConfig;
import io.github.resilience4j.timelimiter.TimeLimiterConfig;
import org.springframework.cloud.circuitbreaker.resilience4j.ReactiveResilience4JCircuitBreakerFactory;
import org.springframework.cloud.circuitbreaker.resilience4j.Resilience4JConfigBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import java.time.Duration;

@Configuration
public class CustomizeCircuitBreakerConfig {

    @Bean
    public ReactiveResilience4JCircuitBreakerFactory defaultCustomizer() {
        CircuitBreakerConfig circuitBreakerConfig = CircuitBreakerConfig.custom()
                .slidingWindowType(CircuitBreakerConfig.SlidingWindowType.TIME_BASED) // use a time-based sliding window
                .slidingWindowSize(10) // the window covers 10 seconds
                .minimumNumberOfCalls(5) // at least 5 calls in the window before the failure rate is calculated
                .failureRateThreshold(50) // open the breaker when the failure rate in the window reaches 50%
                .enableAutomaticTransitionFromOpenToHalfOpen() // move from OPEN to HALF_OPEN automatically
                .permittedNumberOfCallsInHalfOpenState(5) // number of calls allowed through in HALF_OPEN
                .waitDurationInOpenState(Duration.ofSeconds(5)) // stay OPEN for 5 seconds before moving to HALF_OPEN
                .recordExceptions(Throwable.class) // treat every exception as a failure
                .build();

        ReactiveResilience4JCircuitBreakerFactory factory = new ReactiveResilience4JCircuitBreakerFactory();
        factory.configureDefault(id -> new Resilience4JConfigBuilder(id)
                // a call that takes longer than 200 ms counts as a timeout
                .timeLimiterConfig(TimeLimiterConfig.custom().timeoutDuration(Duration.ofMillis(200)).build())
                .circuitBreakerConfig(circuitBreakerConfig).build());
        return factory;
    }
}
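- One thing to note in the code above: the timeLimiterConfig method sets a timeout. If the service provider does not respond within 200 milliseconds, Spring Cloud Gateway returns a failure to the caller
- Development is done; the next question is how to verify it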
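To make the 200 ms time limit mentioned above concrete, here is a minimal sketch that uses only the JDK (Java 9 or later, for `orTimeout`). It is not what the Gateway or the Resilience4J TimeLimiter does internally: the class `TimeLimitDemo` and the methods `slowCall` and `callWithLimit` are hypothetical names, the slow provider is simulated with `Thread.sleep`, and the string "504" merely stands in for the gateway's timeout response.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.CompletionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

// Hypothetical illustration of a time limit around a slow call (JDK only).
class TimeLimitDemo {

    // Simulates the provider: responds after `delayMillis`.
    static String slowCall(long delayMillis) {
        try {
            Thread.sleep(delayMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return "Account";
    }

    // Returns the provider's answer, or "504" if it takes longer than `limitMillis`.
    static String callWithLimit(long delayMillis, long limitMillis) {
        try {
            return CompletableFuture
                    .supplyAsync(() -> slowCall(delayMillis))
                    .orTimeout(limitMillis, TimeUnit.MILLISECONDS)
                    .join();
        } catch (CompletionException e) {
            // orTimeout completes the future with a TimeoutException, which join() wraps
            if (e.getCause() instanceof TimeoutException) {
                return "504";
            }
            throw e;
        }
    }

    public static void main(String[] args) {
        System.out.println(callWithLimit(0, 200));   // fast call: the provider's answer
        System.out.println(callWithLimit(500, 200)); // slow call: exceeds the limit
    }
}
```

The 500 ms delay and the 200 ms limit are the same values used by the /account/{id} endpoint and the configuration class, which is why every request with id 1 is expected to fail.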
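Unit test class
To verify the circuit breaker feature of Spring Cloud Gateway, we can use a JUnit test to control the request parameter and the number of requests precisely. The test class below sends one hundred requests in a row. During the first fifty, the parameter alternates between 0 and 1. When it is 1, the endpoint is delayed by 500 milliseconds, which exceeds Spring Cloud Gateway's 200-millisecond time limit, so the request fails. Once enough failures accumulate, the breaker opens: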
package com.example;

import org.junit.jupiter.api.RepeatedTest;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.web.reactive.AutoConfigureWebTestClient;
import org.springframework.boot.test.context.SpringBootTest;
import org.springframework.http.MediaType;
import org.springframework.test.context.junit.jupiter.SpringExtension;
import org.springframework.test.web.reactive.server.WebTestClient;

@SpringBootTest
@ExtendWith(SpringExtension.class)
@AutoConfigureWebTestClient
public class CircuitbreakerTest {

    // Number of requests sent so far (static so that it survives across repetitions)
    private static int i = 0;

    @Autowired
    private WebTestClient webClient;

    // Only @RepeatedTest is used; combining it with @Test would register the method a second time
    @RepeatedTest(100)
    void testHelloPredicates() throws InterruptedException {
        // For the first 50 requests gen alternates between 0 and 1: one normal request, one timeout.
        // After that gen stays at 0 and no request times out.
        int gen = (i < 50) ? (i % 2) : 0;
        // Increment the counter
        i++;
        final String tag = "[" + i + "]";
        // Send the web request and print the status code and body
        webClient.get()
                .uri("/nacos/account/" + gen)
                .accept(MediaType.APPLICATION_JSON)
                .exchange()
                .expectBody(String.class)
                .consumeWith(result -> System.out.println(tag + result.getRawStatusCode() + " - " + result.getResponseBody()));
        Thread.sleep(1000);
    }
}
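Verification
- Start nacos (the service provider depends on it)
- Start the nacos-provider subproject
- Run the unit test class we just wrote. Part of the console output is excerpted below and analyzed afterwards: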
state : CLOSED
[2]504 - {"timestamp":"2024-01-19T09:22:21.454+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"35b9be8a"}
state : CLOSED
[3]200 - Account2024-01-19 05:22:22
state : CLOSED
[4]504 - {"timestamp":"2024-01-19T09:22:23.700+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"48d357af"}
state : CLOSED
[5]200 - Account2024-01-19 05:22:24
state : OPEN
[6]503 - {"timestamp":"2024-01-19T09:22:25.738+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"e43da69"}
state : OPEN
[7]503 - {"timestamp":"2024-01-19T09:22:26.755+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"8a09e9"}
state : OPEN
[8]503 - {"timestamp":"2024-01-19T09:22:27.770+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"57918132"}
state : OPEN
[9]503 - {"timestamp":"2024-01-19T09:22:28.782+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"7b9ae109"}
state : HALF_OPEN
[10]504 - {"timestamp":"2024-01-19T09:22:30.001+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"3f41d5fa"}
state : HALF_OPEN
[11]200 - Account2024-01-19 05:22:31
state : HALF_OPEN
[12]504 - {"timestamp":"2024-01-19T09:22:32.233+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"729aafe"}
state : HALF_OPEN
[13]200 - Account2024-01-19 05:22:33
state : HALF_OPEN
[14]504 - {"timestamp":"2024-01-19T09:22:34.471+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"71c45d5b"}
state : OPEN
[15]503 - {"timestamp":"2024-01-19T09:22:35.486+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"7ef53536"}
state : OPEN
[16]503 - {"timestamp":"2024-01-19T09:22:36.499+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"29b9e385"}
state : OPEN
[17]503 - {"timestamp":"2024-01-19T09:22:37.511+00:00","path":"/nacos/account/0","status":503,"error":"Service Unavailable","requestId":"70b6dde8"}
state : OPEN
[18]503 - {"timestamp":"2024-01-19T09:22:38.522+00:00","path":"/nacos/account/1","status":503,"error":"Service Unavailable","requestId":"4bf39267"}
state : HALF_OPEN
[19]200 - Account2024-01-19 05:22:39
state : HALF_OPEN
[20]504 - {"timestamp":"2024-01-19T09:22:40.757+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"5595fb9b"}
state : HALF_OPEN
[21]200 - Account2024-01-19 05:22:41
state : HALF_OPEN
[22]504 - {"timestamp":"2024-01-19T09:22:42.992+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"1cf20304"}
state : HALF_OPEN
[23]200 - Account2024-01-19 05:22:44
state : CLOSED
[24]504 - {"timestamp":"2024-01-19T09:22:45.223+00:00","path":"/nacos/account/1","status":504,"error":"Gateway Timeout","requestId":"1a062662"}
state : CLOSED
[25]200 - Account2024-01-19 05:22:46
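- Analysis of the status codes in the output above:
- 504 is the error returned on timeout, and 200 is the normal response from the service provider
- Both 504 and 200 mean that the request reached the service provider, so at that moment the breaker is letting requests through (closed, or half-open as seen later)
- After several 504 errors the configured threshold is reached and the breaker opens
- The run of consecutive 503 responses is what the breaker returns while it is open; these requests never reach the service provider
- After the consecutive 503s, 504 and 200 alternate again, which shows that the breaker has entered the half-open state. The 504s then reach the threshold again and the breaker moves from half-open back to open. After the fiftieth request no more timing-out requests are sent, so the breaker settles in the closed state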
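The two half-open phases in the log can be checked with simple arithmetic. The sketch below assumes, as a simplification, that the breaker evaluates the failure rate over the 5 calls permitted in the half-open state and compares it with the 50% threshold from the configuration class. `FailureRateCheck` and its methods are hypothetical helpers; the status codes are copied from requests [10] to [14] and [19] to [23] of the output above.

```java
// Hypothetical helper: recomputes the half-open decisions seen in the log.
class FailureRateCheck {

    // Percentage of calls that did not return 200.
    static double failureRate(int[] statusCodes) {
        int failures = 0;
        for (int code : statusCodes) {
            if (code != 200) { failures++; }
        }
        return 100.0 * failures / statusCodes.length;
    }

    // State after the half-open calls, given a threshold in percent.
    static String nextState(int[] statusCodes, double threshold) {
        return failureRate(statusCodes) >= threshold ? "OPEN" : "CLOSED";
    }

    public static void main(String[] args) {
        int[] firstHalfOpen = {504, 200, 504, 200, 504};  // requests [10]..[14]
        int[] secondHalfOpen = {200, 504, 200, 504, 200}; // requests [19]..[23]
        System.out.println(failureRate(firstHalfOpen) + " -> " + nextState(firstHalfOpen, 50));
        System.out.println(failureRate(secondHalfOpen) + " -> " + nextState(secondHalfOpen, 50));
    }
}
```

Three failures out of five is 60%, which is at or above the threshold, matching the OPEN state logged before request [15]. Two failures out of five is 40%, matching the CLOSED state logged before request [24].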
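fallback
- The test above shows that Spring Cloud Gateway reports errors to the caller through status codes, which is not very friendly. We can define a custom fallback that builds the response when an error occurs
- Add an endpoint to the circuitbreaker-gateway project: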
package com.example.controller;

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RestController;

import java.text.SimpleDateFormat;
import java.util.Date;

@RestController
public class FallbackController {

    private String dateStr() {
        return new SimpleDateFormat("yyyy-MM-dd hh:mm:ss").format(new Date());
    }

    /**
     * Fallback endpoint that returns a plain string.
     * @return the fallback message followed by the current time
     */
    @GetMapping("/myfallback")
    public String helloStr() {
        return "myfallback, " + dateStr();
    }
}
- The application.yml configuration is as follows; the only change is the fallbackUri argument added to the filter:
server:
  # service port
  port: 10012
spring:
  application:
    name: circuitbreaker-gateway
  cloud:
    gateway:
      routes:
        - id: path_route
          uri: http://localhost:9001
          predicates:
            - Path=/nacos/**
          filters:
            - name: CircuitBreakerStatePrinter
            - name: CircuitBreaker
              args:
                name: myCircuitBreaker
                fallbackUri: forward:/myfallback
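Conceptually, fallbackUri turns a failed or rejected call into a call to another handler. The JDK-only sketch below illustrates just that idea; it is not the Gateway's implementation, and `FallbackDemo`, `withFallback`, and the sample suppliers are hypothetical names chosen for the example.

```java
import java.util.function.Supplier;

// Hypothetical illustration of the fallback idea (JDK only).
class FallbackDemo {

    // Runs `primary`; if it throws, returns the result of `fallback` instead.
    static <T> T withFallback(Supplier<T> primary, Supplier<T> fallback) {
        try {
            return primary.get();
        } catch (RuntimeException e) {
            return fallback.get();
        }
    }

    public static void main(String[] args) {
        Supplier<String> healthy = () -> "Account";
        Supplier<String> failing = () -> {
            throw new IllegalStateException("timeout or breaker open");
        };
        Supplier<String> fallback = () -> "myfallback";

        System.out.println(withFallback(healthy, fallback)); // Account
        System.out.println(withFallback(failing, fallback)); // myfallback
    }
}
```

One consequence of this design is that the caller always receives a normal response, so a timeout and a rejected request can no longer be told apart by status code.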
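- Run the unit test again. Every status code is now 200, and the former error responses have all been replaced by the output of the newly added endpoint: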
state : CLOSED
[2]200 - myfallback, 2024-01-19 05:36:01
state : CLOSED
[3]200 - Account2024-01-19 05:36:02
state : CLOSED
[4]200 - myfallback, 2024-01-19 05:36:03
state : CLOSED
[5]200 - Account2024-01-19 05:36:04
state : OPEN
[6]200 - myfallback, 2024-01-19 05:36:05
state : OPEN
[7]200 - myfallback, 2024-01-19 05:36:06
state : OPEN
[8]200 - myfallback, 2024-01-19 05:36:07
state : OPEN
[9]200 - myfallback, 2024-01-19 05:36:08
state : HALF_OPEN
[10]200 - myfallback, 2024-01-19 05:36:09
state : HALF_OPEN
[11]200 - Account2024-01-19 05:36:11
state : HALF_OPEN
[12]200 - myfallback, 2024-01-19 05:36:12
state : HALF_OPEN
[13]200 - Account2024-01-19 05:36:13
state : HALF_OPEN
[14]200 - myfallback, 2024-01-19 05:36:14
state : OPEN
[15]200 - myfallback, 2024-01-19 05:36:15
state : OPEN
[16]200 - myfallback, 2024-01-19 05:36:16
state : OPEN
[17]200 - myfallback, 2024-01-19 05:36:17
state : OPEN
[18]200 - myfallback, 2024-01-19 05:36:18
state : HALF_OPEN
[19]200 - Account2024-01-19 05:36:19
state : HALF_OPEN
[20]200 - myfallback, 2024-01-19 05:36:20
state : HALF_OPEN
[21]200 - Account2024-01-19 05:36:21
state : HALF_OPEN
[22]200 - myfallback, 2024-01-19 05:36:22
state : HALF_OPEN
[23]200 - Account2024-01-19 05:36:24
state : CLOSED
[24]200 - myfallback, 2024-01-19 05:36:25
state : CLOSED
[25]200 - Account2024-01-19 05:36:26
state : CLOSED
[26]200 - myfallback, 2024-01-19 05:36:27
state : CLOSED
[27]200 - Account2024-01-19 05:36:28
state : CLOSED
[28]200 - myfallback, 2024-01-19 05:36:29
state : OPEN
[29]200 - myfallback, 2024-01-19 05:36:30
state : OPEN
[30]200 - myfallback, 2024-01-19 05:36:31
state : OPEN
[31]200 - myfallback, 2024-01-19 05:36:32
state : OPEN
[32]200 - myfallback, 2024-01-19 05:36:33
state : HALF_OPEN
[33]200 - Account2024-01-19 05:36:34
state : HALF_OPEN
[34]200 - myfallback, 2024-01-19 05:36:35
state : HALF_OPEN
[35]200 - Account2024-01-19 05:36:36
state : HALF_OPEN
[36]200 - myfallback, 2024-01-19 05:36:38
state : HALF_OPEN
[37]200 - Account2024-01-19 05:36:39
state : CLOSED
[38]200 - myfallback, 2024-01-19 05:36:40
state : CLOSED
[39]200 - Account2024-01-19 05:36:41
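- That completes the development and testing of the circuit breaker feature of Spring Cloud Gateway. If a few lines of configuration and code are not enough for you and you want to look inside the circuit breaker, read on and let's go through the source code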
Source code analysis
- The constructor of RouteDefinitionRouteLocator (bean injection) contains the following code, which binds each name to its instance:
gatewayFilterFactories.forEach(factory -> this.gatewayFilterFactories.put(factory.name(), factory));
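- The loadGatewayFilters method later uses this map to find the bean that was put there
- The net effect: when a route configuration specifies a name equal to CircuitBreaker, it maps to the bean of type SpringCloudCircuitBreakerFilterFactory, because its name method returns "CircuitBreaker", as shown below:
- Now the question: what is the bean of type SpringCloudCircuitBreakerFilterFactory? As the red box in the figure below shows, SpringCloudCircuitBreakerResilience4JFilterFactory is the only subclass of SpringCloudCircuitBreakerFilterFactory:
- Judging from the figure above, the CircuitBreaker filter should be SpringCloudCircuitBreakerResilience4JFilterFactory. That is only inferred from the inheritance hierarchy, though, and one key piece of evidence is missing: does a bean of type SpringCloudCircuitBreakerResilience4JFilterFactory actually exist in Spring?
- The answer is in the configuration in GatewayResilience4JCircuitBreakerAutoConfiguration, which proves that SpringCloudCircuitBreakerResilience4JFilterFactory is instantiated and registered with Spring: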
@Bean
@ConditionalOnBean(ReactiveResilience4JCircuitBreakerFactory.class)
@ConditionalOnEnabledFilter
public SpringCloudCircuitBreakerResilience4JFilterFactory springCloudCircuitBreakerResilience4JFilterFactory(
        ReactiveResilience4JCircuitBreakerFactory reactiveCircuitBreakerFactory,
        ObjectProvider<DispatcherHandler> dispatcherHandler) {
    return new SpringCloudCircuitBreakerResilience4JFilterFactory(reactiveCircuitBreakerFactory, dispatcherHandler);
}
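- To sum up, when you configure the CircuitBreaker filter, it is the SpringCloudCircuitBreakerResilience4JFilterFactory class that serves you, while the key code lives in its parent class SpringCloudCircuitBreakerFilterFactory
- So, to understand the circuit breaker feature of Spring Cloud Gateway in depth, read the SpringCloudCircuitBreakerFilterFactory.apply method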
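The name-based lookup described in this section can be sketched with a plain map. The type names below (`RegistryDemo`, `FilterFactory`, `CircuitBreakerFactoryStub`) and the error message are hypothetical stand-ins, not the Gateway's real types; only the pattern of registering each factory under `name()` and resolving the route's filter name against that map is taken from the text.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model of the name -> factory registry (not the Gateway's classes).
class RegistryDemo {

    // Stand-in for a gateway filter factory.
    interface FilterFactory {
        String name();
    }

    // Stand-in for the circuit breaker factory: reports the name "CircuitBreaker".
    static class CircuitBreakerFactoryStub implements FilterFactory {
        @Override
        public String name() { return "CircuitBreaker"; }
    }

    private final Map<String, FilterFactory> factories = new HashMap<>();

    // Mirrors the constructor logic: register each factory under its name.
    RegistryDemo(List<FilterFactory> all) {
        all.forEach(factory -> factories.put(factory.name(), factory));
    }

    // Mirrors the lookup step: resolve the name used in the route configuration.
    FilterFactory lookup(String filterName) {
        FilterFactory factory = factories.get(filterName);
        if (factory == null) {
            throw new IllegalStateException("No filter factory registered under name " + filterName);
        }
        return factory;
    }

    public static void main(String[] args) {
        RegistryDemo registry = new RegistryDemo(List.of(new CircuitBreakerFactoryStub()));
        // The route's "name: CircuitBreaker" resolves to the stub registered under that name
        System.out.println(registry.lookup("CircuitBreaker").getClass().getSimpleName());
    }
}
```

The same pattern explains why a filter name with no registered factory behind it cannot be resolved: the map simply has no entry for it.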
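One regret
- Remember the console output we analyzed earlier? It is the part in the red box below, where we used the status codes to infer the state of the breaker:
- Reading that plain text still leaves doubts: can the state really be determined from the status code alone? For example, when a 504 comes back, is the breaker closed or half-open? Either is possible. So this kind of inference only proves that the breaker is working; it cannot pin down the exact state at a given moment
- What is needed, then, is a more accurate way to know the state of the breaker at every moment; only then can we claim a deep understanding of it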