Feign with Hystrix: RejectedExecutionException under concurrent calls


  An error suddenly appeared in the test environment. It was logged at the point where Feign calls another service's API:

java.util.concurrent.RejectedExecutionException: Task java.util.concurrent.FutureTask@18168c4a rejected from java.util.concurrent.ThreadPoolExecutor@52aa7130[Running, pool size = 10, active threads = 10, queued tasks = 0, completed tasks = 0]
	at java.util.concurrent.ThreadPoolExecutor$AbortPolicy.rejectedExecution(ThreadPoolExecutor.java:2047) ~[na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.reject(ThreadPoolExecutor.java:823) [na:1.8.0_131]
	at java.util.concurrent.ThreadPoolExecutor.execute(ThreadPoolExecutor.java:1369) [na:1.8.0_131]
	at java.util.concurrent.AbstractExecutorService.submit(AbstractExecutorService.java:112) ~[na:1.8.0_131]
	at com.netflix.hystrix.strategy.concurrency.HystrixContextScheduler$ThreadPoolWorker.schedule(HystrixContextScheduler.java:172) ~[hystrix-core-1.5.18.jar:1.5.18]
	at com.netflix.hystrix.strategy.concurrency.HystrixContextScheduler$HystrixContextSchedulerWorker.schedule(HystrixContextScheduler.java:106) ~[hystrix-core-1.5.18.jar:1.5.18]
	at rx.internal.operators.OperatorSubscribeOn.call(OperatorSubscribeOn.java:50) ~[rxjava-1.3.8.jar:1.3.8]
	at rx.internal.operators.OperatorSubscribeOn.call(OperatorSubscribeOn.java:30) ~[rxjava-1.3.8.jar:1.3.8]
........

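  At first glance this is a thread pool's rejection policy. But why is a thread pool involved at all? Under the hood Feign just sends HTTP requests, so what do HTTP requests have to do with a thread pool?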

Troubleshooting and fix

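  With that question in mind, look at the stack trace again: its sixth line comes from the HystrixContextScheduler class, in HystrixContextScheduler$ThreadPoolWorker.schedule, and the frames printed above it are AbstractExecutorService.submit and ThreadPoolExecutor.execute.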

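  The method is named schedule and it hands the task to a ThreadPoolExecutor, which suggests that the worker is backed by a thread pool. To confirm, we set a breakpoint locally.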

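  The worker does hold a threadPool, with at most 10 threads and a queue length of 0, which matches "pool size = 10, active threads = 10, queued tasks = 0" in the exception message. So as soon as more than 10 Feign calls are in flight at the same time, the next one is rejected and the call fails. The simplest fix is to increase the thread count or the queue size to give the service more headroom.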
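  To see why the 11th concurrent call fails, here is a minimal stand-alone sketch using only the JDK. It assumes, as the debugger showed, a pool with 10 threads and no queue (modelled here with a SynchronousQueue) plus the AbortPolicy seen in the stack trace. The class and method names are hypothetical; this is not Hystrix's own code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class DefaultPoolSaturation {

    /**
     * Submits `tasks` blocking tasks to a pool shaped like the one seen in the debugger
     * (poolSize threads, no queue) and returns how many submissions were rejected.
     */
    static int countRejected(int poolSize, int tasks) throws InterruptedException {
        // SynchronousQueue has no capacity: a task is accepted only if a thread can take it now
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                poolSize, poolSize, 1, TimeUnit.MINUTES,
                new SynchronousQueue<Runnable>(),
                new ThreadPoolExecutor.AbortPolicy()); // same policy as in the stack trace
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        try {
            for (int i = 0; i < tasks; i++) {
                try {
                    // each task simulates a slow remote call that keeps its thread busy
                    pool.submit(() -> {
                        release.await();
                        return null;
                    });
                } catch (RejectedExecutionException e) {
                    rejected++;
                }
            }
        } finally {
            release.countDown(); // let the "remote calls" finish
            pool.shutdown();
            pool.awaitTermination(5, TimeUnit.SECONDS);
        }
        return rejected;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("rejected = " + countRejected(10, 11)); // rejected = 1
    }
}
```

  The first 10 submissions each start a new thread that stays busy, so the 11th finds neither an idle thread nor a queue slot and AbortPolicy throws, exactly the situation in the log above.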

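  So we searched Baidu for "feign hystrix thread pool configuration" and copied the configuration below into the config file, and that was it (of course, the actual values should be tuned to the server's resources and expected load).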

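hystrix:
    threadpool:
        default:
            # maximum number of threads executing concurrently, default 10
            coreSize: 200
            # maximum size of the BlockingQueue, default -1 (no queue: a SynchronousQueue is used)
            maxQueueSize: 1000
            # requests are rejected once the queue reaches this size, even if maxQueueSize has not been reached, default 5
            queueSizeRejectionThreshold: 800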
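  A rough way to reason about these three numbers together: the sketch below (hypothetical names, plain JDK) assumes that a maxQueueSize of -1 means no queue at all, that the pool never grows beyond coreSize, and that the threshold check runs before every submission. Under those assumptions it estimates how many calls can be running or waiting at once.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.SynchronousQueue;

public class PoolCapacity {

    /** Queue chosen for a given maxQueueSize: <= 0 means a hand-off queue with no capacity. */
    static BlockingQueue<Runnable> queueFor(int maxQueueSize) {
        if (maxQueueSize <= 0) {
            return new SynchronousQueue<>(); // remainingCapacity() is always 0
        }
        return new LinkedBlockingQueue<>(maxQueueSize); // capacity fixed at construction
    }

    /** Rough upper bound on calls running or waiting at once (assumes no growth past coreSize). */
    static int maxInFlight(int coreSize, int maxQueueSize, int queueSizeRejectionThreshold) {
        int queueCapacity = queueFor(maxQueueSize).remainingCapacity();
        if (queueCapacity == 0) {
            return coreSize; // no queue: the threshold is never consulted
        }
        // the threshold check stops new work once this many tasks are waiting
        return coreSize + Math.min(queueCapacity, queueSizeRejectionThreshold);
    }

    public static void main(String[] args) {
        System.out.println(maxInFlight(10, -1, 5));      // defaults: 10
        System.out.println(maxInFlight(200, 1000, 800)); // configuration above: 1000
        System.out.println(maxInFlight(200, 1000, 5));   // queue enlarged, threshold left at default: 205
    }
}
```

  The last case is the one worth watching: raising maxQueueSize alone barely helps while queueSizeRejectionThreshold stays at its default of 5.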

Getting to the bottom of it

queueSizeRejectionThreshold

  Among the search results was a property called queueSizeRejectionThreshold, which made me suspicious: what does it actually do?

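  Recall the schedule method from earlier: it contains an if condition whose name makes clear that it checks whether queue space is available. Opening it shows that this check is exactly where queueSizeRejectionThreshold is used: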

public boolean isQueueSpaceAvailable() {
    if (queueSize <= 0) {
        // we don't have a queue so we won't look for space but instead
        // let the thread-pool reject or not
        return true;
    } else {

        //   这里
        return threadPool.getQueue().size() < properties.queueSizeRejectionThreshold().get();
    }
}
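  To make the two branches concrete, the following sketch copies the same decision into a stand-alone static method, with the pool state passed in as plain integers. The class and parameter names are our own.

```java
public class QueueSpaceCheck {

    /**
     * Stand-alone copy of the decision in isQueueSpaceAvailable().
     * maxQueueSize:  configured maxQueueSize (queueSize in the excerpt above)
     * currentQueued: threadPool.getQueue().size() at the moment of the call
     * threshold:     queueSizeRejectionThreshold
     */
    static boolean isQueueSpaceAvailable(int maxQueueSize, int currentQueued, int threshold) {
        if (maxQueueSize <= 0) {
            // no queue configured: this check never rejects, the executor's own policy decides
            return true;
        }
        // a queue exists: reject once the current backlog reaches the threshold
        return currentQueued < threshold;
    }

    public static void main(String[] args) {
        System.out.println(isQueueSpaceAvailable(-1, 0, 5));       // true: default config, check skipped
        System.out.println(isQueueSpaceAvailable(1000, 4, 5));     // true: 4 waiting, threshold 5
        System.out.println(isQueueSpaceAvailable(1000, 5, 5));     // false: capacity 1000, yet rejected
        System.out.println(isQueueSpaceAvailable(1000, 799, 800)); // true
    }
}
```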

  Following the call further leads to the definition of this property, which is documented as follows:

Queue size rejection threshold is an artificial "max" size at which rejections will occur even if maxQueueSize has not been reached. This is done because the maxQueueSize of a BlockingQueue can not be dynamically changed and we want to support dynamically changing the queue size that affects rejections.

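  In other words, a BlockingQueue's capacity cannot be changed after it is created, so this separate threshold exists to allow the queue size at which rejections happen to be changed dynamically.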
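  The following sketch (hypothetical class, plain JDK, not Hystrix code) illustrates the idea: the LinkedBlockingQueue's capacity is fixed when the pool is built, while a separate volatile threshold is checked before each submission and can be raised at runtime. It uses 2 threads, a queue capacity of 100 and tasks that block until released, so the queue length grows deterministically.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

/** Hypothetical pool: fixed-capacity queue plus a rejection threshold that can change at runtime. */
public class AdjustableThresholdPool {
    private final ThreadPoolExecutor executor;
    private volatile int rejectionThreshold; // may be updated while the pool is running

    AdjustableThresholdPool(int threads, int maxQueueSize, int rejectionThreshold) {
        // the queue capacity is fixed here; LinkedBlockingQueue has no way to resize it later
        this.executor = new ThreadPoolExecutor(threads, threads, 1, TimeUnit.MINUTES,
                new LinkedBlockingQueue<>(maxQueueSize));
        this.rejectionThreshold = rejectionThreshold;
    }

    void setRejectionThreshold(int value) {
        this.rejectionThreshold = value;
    }

    /** Checks the threshold first, then hands the task to the executor. */
    void execute(Runnable task) {
        if (executor.getQueue().size() >= rejectionThreshold) {
            throw new RejectedExecutionException("queue size reached rejection threshold");
        }
        executor.execute(task);
    }

    void shutdown() {
        executor.shutdown();
    }

    /** Submits `tasks` tasks that block on `release` and returns how many were accepted. */
    static int countAccepted(AdjustableThresholdPool pool, int tasks, CountDownLatch release) {
        int accepted = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> {
                    try {
                        release.await();
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
            } catch (RejectedExecutionException ignored) {
                // rejected by the threshold check
            }
        }
        return accepted;
    }

    public static void main(String[] args) {
        CountDownLatch release = new CountDownLatch(1);
        AdjustableThresholdPool pool = new AdjustableThresholdPool(2, 100, 3);
        int first = countAccepted(pool, 10, release);  // 2 running + 3 queued
        pool.setRejectionThreshold(6);                  // raise the limit without rebuilding the queue
        int second = countAccepted(pool, 10, release); // 3 more fit before the new threshold
        System.out.println(first + " " + second);      // 5 3
        release.countDown();
        pool.shutdown();
    }
}
```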

Hystrix isolation strategies

  The problem is solved, but why does Feign need a thread pool in the first place?

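  Another Baidu search for "hystrix thread pool" turned up an article titled "Hystrix isolation strategies: thread pool and semaphore". The most authoritative source is, of course, the official documentation.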

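  One of Hystrix's purposes is to protect server resources, and that explains everything. Normally each user request is handled by a thread from Tomcat's pool (Tomcat's default maximum is 200 threads). If a downstream service becomes slow, those threads pile up waiting on it. Both semaphore isolation and thread-pool isolation cap how many requests can hit a dependency at once, protecting the server when too many requests arrive. With thread-pool isolation, which is Hystrix's default, each Feign call runs on a Hystrix pool thread instead of the Tomcat thread, which is why a thread pool shows up in the stack trace.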
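  The two strategies can be sketched with plain JDK classes. This is a simplified illustration under our own assumptions, not Hystrix's implementation: semaphore isolation runs the call on the caller's thread and only limits concurrency, while thread-pool isolation runs it on a separate bounded pool (again with no queue, like the default seen earlier) so the caller can give up after a timeout and return a fallback. All class and method names are hypothetical.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.Future;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.Semaphore;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class IsolationSketch {

    /** Semaphore isolation: runs on the caller's thread (e.g. a Tomcat thread), only caps concurrency. */
    static final class SemaphoreIsolation {
        private final Semaphore permits;

        SemaphoreIsolation(int maxConcurrent) {
            this.permits = new Semaphore(maxConcurrent);
        }

        <T> T call(Callable<T> remoteCall, T fallback) throws Exception {
            if (!permits.tryAcquire()) {
                return fallback; // limit reached: fail fast instead of piling up
            }
            try {
                return remoteCall.call(); // a slow call still blocks the caller's thread
            } finally {
                permits.release();
            }
        }
    }

    /** Thread-pool isolation: runs on a dedicated bounded pool; the caller waits at most timeoutMs. */
    static final class ThreadPoolIsolation {
        private final ThreadPoolExecutor pool;

        ThreadPoolIsolation(int threads) {
            // no queue, like the default pool seen in the debugger
            this.pool = new ThreadPoolExecutor(threads, threads, 1, TimeUnit.MINUTES,
                    new SynchronousQueue<>());
        }

        <T> T call(Callable<T> remoteCall, T fallback, long timeoutMs) throws Exception {
            Future<T> future;
            try {
                future = pool.submit(remoteCall);
            } catch (RejectedExecutionException e) {
                return fallback; // pool saturated: the situation from the stack trace
            }
            try {
                return future.get(timeoutMs, TimeUnit.MILLISECONDS);
            } catch (TimeoutException e) {
                future.cancel(true); // interrupt the slow call; the caller returns now
                return fallback;
            }
        }

        void shutdown() {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) throws Exception {
        ThreadPoolIsolation isolated = new ThreadPoolIsolation(1);
        long start = System.nanoTime();
        String slow = isolated.call(() -> { Thread.sleep(2000); return "slow"; }, "fallback", 100);
        long waitedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println(slow + " after about " + waitedMs + " ms"); // fallback, well under 2000 ms
        isolated.shutdown();

        SemaphoreIsolation guarded = new SemaphoreIsolation(1);
        // the inner call plays a second concurrent request while the only permit is held
        String nested = guarded.call(() -> guarded.call(() -> "inner", "rejected"), "outer fallback");
        System.out.println(nested); // rejected
    }
}
```

  The trade-off shows up directly: the semaphore version is cheaper (no extra threads) but cannot free the caller from a slow call, while the pool version costs a thread per in-flight call and, when that pool is full, rejects new calls just like the RejectedExecutionException in this article.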