Reading the Hystrix Source (2): Caching, Circuit Breaking, Resource Isolation, and Fallback

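Preface

This part walks through the source of Hystrix's core features (caching, circuit breaking, resource isolation, and fallback) alongside the configuration properties that control them.

1. Caching

Configuration

  • requestCache.enabled: whether request caching is enabled. Default: true.

Core source

Caching is the first step of the main flow once a HystrixCommand enters toObservable. The cache is read first and a hit is returned immediately; on a miss, the Observable for the rest of the flow is wrapped and put into the cache, and execution continues. The cache is managed by HystrixRequestCache, which is not covered in detail here because Hystrix request caching is rarely used in our business code today.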

// Request cache (one shared instance per command key)
protected final HystrixRequestCache requestCache;
// Constructor
protected AbstractCommand(...) {
    this.requestCache = HystrixRequestCache.getInstance(this.commandKey, this.concurrencyStrategy);
}
public Observable<R> toObservable() {
        final AbstractCommand<R> _cmd = this;
        // onTerminate handler
        final Action0 terminateCommandCleanup = ...;
        // onUnsubscribe handler
        final Action0 unsubscribeCommandCleanup = ...;
        // The rest of the flow
        final Func0<Observable<R>> applyHystrixSemantics = ...;
        // Wraps the onComplete and onEmit callbacks of HystrixCommandExecutionHook
        final Func1<R, R> wrapWithAllOnNextHooks = ...;
        // onCompleted hook
        final Action0 fireOnCompletedHook = ...;
        // Main logic, deferred until subscription
        return Observable.defer(new Func0<Observable<R>>() {
            @Override
            public Observable<R> call() {
                // Move the command state from NOT_STARTED to OBSERVABLE_CHAIN_CREATED
                if (!commandState.compareAndSet(CommandState.NOT_STARTED, CommandState.OBSERVABLE_CHAIN_CREATED)) {
                    throw new HystrixRuntimeException();
                }
                // requestLog handling omitted...
                // 1. Caching is on only if the property is enabled AND getCacheKey() != null
                final boolean requestCacheEnabled = isRequestCachingEnabled();
                // 2. Get the cacheKey
                final String cacheKey = getCacheKey();

                if (requestCacheEnabled) {
                    // 3. Look up a HystrixCommandResponseFromCache in the cache
                    HystrixCommandResponseFromCache<R> fromCache = (HystrixCommandResponseFromCache<R>) requestCache.get(cacheKey);
                    if (fromCache != null) {
                        isResponseFromCache = true;
                        // Build the response from the cached entry
                        return handleRequestCacheHitAndEmitValues(fromCache, _cmd);
                    }
                }
                // Wrap the rest of the flow in an Observable
                Observable<R> hystrixObservable =
                        Observable.defer(applyHystrixSemantics)
                                .map(wrapWithAllOnNextHooks);

                Observable<R> afterCache;
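                // 4. Caching is enabled but the lookup above missed
                if (requestCacheEnabled && cacheKey != null) {
                    // Wrap the command and hystrixObservable
                    HystrixCachedObservable<R> toCache = HystrixCachedObservable.from(hystrixObservable, _cmd);
                    // Put it into the cache; a non-null return means another command got there first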
                    HystrixCommandResponseFromCache<R> fromCache = (HystrixCommandResponseFromCache<R>) requestCache.putIfAbsent(cacheKey, toCache);
                    if (fromCache != null) {
                        toCache.unsubscribe();
                        isResponseFromCache = true;
                        return handleRequestCacheHitAndEmitValues(fromCache, _cmd);
                    } else {
                        afterCache = toCache.toObservable();
                    }
                } else {
                    afterCache = hystrixObservable;
                }

                return afterCache
                        .doOnTerminate(terminateCommandCleanup)
                        .doOnUnsubscribe(unsubscribeCommandCleanup)
                        .doOnCompleted(fireOnCompletedHook);
            }
        });
    }

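isRequestCachingEnabled decides whether caching is active. getCacheKey is meant to be overridden by subclasses (the default returns null), so a HystrixCommand that does not override getCacheKey never uses the cache.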

protected boolean isRequestCachingEnabled() {
    return properties.requestCacheEnabled().get() && getCacheKey() != null;
}
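
The read-then-putIfAbsent flow in toObservable can be sketched with plain JDK types. This is only a minimal model under stated assumptions, not Hystrix's HystrixRequestCache: the class RequestScopedCache and its method getOrCompute are hypothetical names, and a CompletableFuture stands in for the cached Observable.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for a request-scoped cache; this is not the Hystrix class.
class RequestScopedCache<V> {
    private final ConcurrentHashMap<String, CompletableFuture<V>> entries = new ConcurrentHashMap<>();

    V getOrCompute(String cacheKey, Supplier<V> command) {
        // A null key bypasses the cache entirely, like a command without getCacheKey().
        if (cacheKey == null) {
            return command.get();
        }
        // Step 3 in the listing: read first, return on a hit.
        CompletableFuture<V> cached = entries.get(cacheKey);
        if (cached != null) {
            return cached.join();
        }
        // Step 4 in the listing: publish our placeholder before executing.
        CompletableFuture<V> mine = new CompletableFuture<>();
        CompletableFuture<V> winner = entries.putIfAbsent(cacheKey, mine);
        if (winner != null) {
            // Another caller published first; reuse its result instead of executing.
            return winner.join();
        }
        try {
            mine.complete(command.get());
        } catch (RuntimeException e) {
            mine.completeExceptionally(e); // waiters see the failure instead of hanging
            throw e;
        }
        return mine.join();
    }
}
```

Two calls with the same key run the command once, while calls with a null key always execute. Publishing the placeholder before running the command is what lets concurrent callers with the same key share one execution.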

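2. Circuit Breaking

Configuration

  • circuitBreaker.enabled: whether the circuit breaker is used. Default: true.
  • circuitBreaker.forceOpen: force the breaker open, rejecting all requests. Default: false.
  • circuitBreaker.forceClosed: force the breaker closed, allowing all requests. Default: false.
  • circuitBreaker.requestVolumeThreshold: minimum number of requests within the rolling window (metrics.rollingStats.timeInMilliseconds, default 10s) before the error rate is evaluated and the breaker can trip. Default: 20.
  • circuitBreaker.errorThresholdPercentage: the breaker opens once the error rate within the rolling window (metrics.rollingStats.timeInMilliseconds, default 10s) reaches n%. Default: 50.
  • circuitBreaker.sleepWindowInMilliseconds: after the breaker opens, requests are rejected for n milliseconds; once that time has passed, a trial request may be attempted. Default: 5000.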

HystrixCircuitBreaker

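HystrixCircuitBreaker is the abstract interface for Hystrix circuit breakers. It has three abstract methods to implement:

  • boolean allowRequest(): called on every HystrixCommand execution to decide whether execution may proceed.
  • boolean isOpen(): whether the breaker is open. Note that allowRequest and isOpen can differ: an open breaker does not necessarily reject every request, since a trial request may still be let through.
  • void markSuccess(): called after a HystrixCommand executes successfully, to update the breaker state.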

HystrixCircuitBreakerImpl

HystrixCircuitBreakerImpl is the default implementation of HystrixCircuitBreaker.

  • Fields and constructor:
static class HystrixCircuitBreakerImpl implements HystrixCircuitBreaker {
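        // Command properties
        private final HystrixCommandProperties properties;
        // HystrixCommandMetrics collects and manages the metrics of a HystrixCommand
        private final HystrixCommandMetrics metrics;
        // Breaker open/closed flag
        private AtomicBoolean circuitOpen = new AtomicBoolean(false);
        // Time the breaker first opened, or time of the last trial request in the half-open state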
        private AtomicLong circuitOpenedOrLastTestedTime = new AtomicLong();
        protected HystrixCircuitBreakerImpl(HystrixCommandKey key, HystrixCommandGroupKey commandGroup, HystrixCommandProperties properties, HystrixCommandMetrics metrics) {
            this.properties = properties;
            this.metrics = metrics;
        }
}        
  • isOpen: decides whether the breaker is open, based on the flag and the metrics
@Override
public boolean isOpen() {
    // Already open: report open
    if (circuitOpen.get()) {
        return true;
    }
    // Get the HealthCounts
    HealthCounts health = metrics.getHealthCounts();

    // Total requests < circuitBreaker.requestVolumeThreshold: report closed
    if (health.getTotalRequests() < properties.circuitBreakerRequestVolumeThreshold().get()) {
        return false;
    }

    // Error rate < circuitBreaker.errorThresholdPercentage: report closed
    if (health.getErrorPercentage() < properties.circuitBreakerErrorThresholdPercentage().get()) {
        return false;
    } else {
        // Trip the breaker
        if (circuitOpen.compareAndSet(false, true)) {
            // CAS succeeded: record the time the breaker first opened
            circuitOpenedOrLastTestedTime.set(System.currentTimeMillis());
            return true;
        } else {
            return true;
        }
    }
}
  • allowSingleTest: in the half-open state, decides whether one trial request may run
 public boolean allowSingleTest() {
    // Time the breaker opened or was last tested
    long timeCircuitOpenedOrWasLastTested = circuitOpenedOrLastTestedTime.get();
    // Breaker is open AND now - last opened/tested time > circuitBreaker.sleepWindowInMilliseconds
    if (circuitOpen.get() && System.currentTimeMillis() > timeCircuitOpenedOrWasLastTested + properties.circuitBreakerSleepWindowInMilliseconds().get()) {
        // CAS the last-tested time to now
        if (circuitOpenedOrLastTestedTime.compareAndSet(timeCircuitOpenedOrWasLastTested, System.currentTimeMillis())) {
            // CAS succeeded: allow one trial request
            return true;
        }
    }
    // In every other case the request is not allowed
    return false;
}
  • allowRequest: whether the HystrixCommand is allowed to execute
@Override
public boolean allowRequest() {
    // circuitBreaker.forceOpen=true: reject
    if (properties.circuitBreakerForceOpen().get()) {
        return false;
    }
    // circuitBreaker.forceClosed=true: allow
    if (properties.circuitBreakerForceClosed().get()) {
        // Still call isOpen so the open/closed state keeps tracking the metrics
        isOpen();
        return true;
    }
    // With no forced setting:
    // allow if the breaker is closed, or if the half-open state permits one trial request
    return !isOpen() || allowSingleTest();
}
  • markSuccess: closes the breaker and resets the HealthCountsStream
public void markSuccess() {
    if (circuitOpen.get()) {
        if (circuitOpen.compareAndSet(true, false)) {
            // Reset the HealthCountsStream
            metrics.resetStream();
        }
    }
}
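
Put together, isOpen, allowSingleTest, allowRequest and markSuccess form a small closed / open / half-open state machine. The following is a simplified sketch of that logic under stated assumptions: SimpleBreaker and its record method are hypothetical names, plain counters replace HystrixCommandMetrics, and the clock is injected so the behaviour is deterministic.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.LongSupplier;

// Hypothetical, simplified model of the breaker logic; not the Hystrix class.
class SimpleBreaker {
    private final int requestVolumeThreshold;
    private final int errorThresholdPercentage;
    private final long sleepWindowMs;
    private final LongSupplier clock; // injected so the example does not depend on wall time
    private final AtomicBoolean circuitOpen = new AtomicBoolean(false);
    private final AtomicLong openedOrLastTestedTime = new AtomicLong();
    private final AtomicLong total = new AtomicLong();
    private final AtomicLong errors = new AtomicLong();

    SimpleBreaker(int requestVolumeThreshold, int errorThresholdPercentage,
                  long sleepWindowMs, LongSupplier clock) {
        this.requestVolumeThreshold = requestVolumeThreshold;
        this.errorThresholdPercentage = errorThresholdPercentage;
        this.sleepWindowMs = sleepWindowMs;
        this.clock = clock;
    }

    // Stand-in for the metrics stream: count one finished request.
    void record(boolean success) {
        total.incrementAndGet();
        if (!success) {
            errors.incrementAndGet();
        }
    }

    boolean isOpen() {
        if (circuitOpen.get()) {
            return true;
        }
        long seen = total.get();
        if (seen < requestVolumeThreshold) {
            return false; // not enough traffic to judge
        }
        if (errors.get() * 100 / seen < errorThresholdPercentage) {
            return false;
        }
        if (circuitOpen.compareAndSet(false, true)) {
            openedOrLastTestedTime.set(clock.getAsLong());
        }
        return true;
    }

    boolean allowSingleTest() {
        long last = openedOrLastTestedTime.get();
        // Only the caller that wins the CAS gets the trial request.
        return circuitOpen.get()
                && clock.getAsLong() > last + sleepWindowMs
                && openedOrLastTestedTime.compareAndSet(last, clock.getAsLong());
    }

    boolean allowRequest() {
        return !isOpen() || allowSingleTest();
    }

    void markSuccess() {
        if (circuitOpen.compareAndSet(true, false)) {
            total.set(0);
            errors.set(0);
        }
    }
}
```

The CAS on the timestamp is what limits the half-open state to one trial request per sleep window: concurrent callers all read the same old timestamp, but only one of them can replace it.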

HealthCounts

HealthCounts holds the request statistics for one time window; the HealthCounts of successive windows are managed by HealthCountsStream.

public static class HealthCounts {
        // Total number of requests
        private final long totalCount;
        // Number of errors
        private final long errorCount;
        // Error percentage
        private final int errorPercentage;
}

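The plus method updates the statistics for the window. Its argument is an array of event counts: the index is the ordinal of the HystrixEventType enum value, and the element is the number of times that event occurred.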

public HealthCounts plus(long[] eventTypeCounts) {
    long updatedTotalCount = totalCount;
    long updatedErrorCount = errorCount;
    long successCount = eventTypeCounts[HystrixEventType.SUCCESS.ordinal()];
    long failureCount = eventTypeCounts[HystrixEventType.FAILURE.ordinal()];
    long timeoutCount = eventTypeCounts[HystrixEventType.TIMEOUT.ordinal()];
    long threadPoolRejectedCount = eventTypeCounts[HystrixEventType.THREAD_POOL_REJECTED.ordinal()];
    long semaphoreRejectedCount = eventTypeCounts[HystrixEventType.SEMAPHORE_REJECTED.ordinal()];
    // Total
    updatedTotalCount += (successCount + failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    // Errors = total - successes
    updatedErrorCount += (failureCount + timeoutCount + threadPoolRejectedCount + semaphoreRejectedCount);
    return new HealthCounts(updatedTotalCount, updatedErrorCount);
}
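
A worked example makes the arithmetic concrete. The sketch below is a hypothetical Health class modelled on plus; the way the percentage is derived (truncating integer division, 0 when there is no traffic) is an assumption of this sketch, since the constructor is not shown in the listing.

```java
// Hypothetical immutable counter modelled on HealthCounts.plus; not the Hystrix class.
final class Health {
    final long total;
    final long errors;
    final int errorPercentage;

    Health(long total, long errors) {
        this.total = total;
        this.errors = errors;
        // Assumption: truncated integer percentage, 0 when nothing was recorded.
        this.errorPercentage = total > 0 ? (int) (errors * 100 / total) : 0;
    }

    // Every event type except success counts as an error.
    Health plus(long success, long failure, long timeout, long poolRejected, long semaphoreRejected) {
        long bad = failure + timeout + poolRejected + semaphoreRejected;
        return new Health(total + success + bad, errors + bad);
    }
}
```

With 12 successes, 5 failures, 2 timeouts and 1 thread pool rejection, the window holds 20 requests and 8 errors, i.e. 40%. Under the default settings (requestVolumeThreshold=20, errorThresholdPercentage=50) that window meets the volume threshold but stays below the error threshold, so the breaker remains closed.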

Core source

The AbstractCommand constructor initializes the circuit breaker

protected AbstractCommand(HystrixCommandGroupKey group, HystrixCommandKey key, HystrixThreadPoolKey threadPoolKey, HystrixCircuitBreaker circuitBreaker, HystrixThreadPool threadPool,
            HystrixCommandProperties.Setter commandPropertiesDefaults, HystrixThreadPoolProperties.Setter threadPoolPropertiesDefaults,
            HystrixCommandMetrics metrics, TryableSemaphore fallbackSemaphore, TryableSemaphore executionSemaphore,
            HystrixPropertiesStrategy propertiesStrategy, HystrixCommandExecutionHook executionHook) {
        // Initialize the circuit breaker
        this.circuitBreaker = initCircuitBreaker(this.properties.circuitBreakerEnabled().get(), circuitBreaker, this.commandGroup, this.commandKey, this.properties, this.metrics);
}


private static HystrixCircuitBreaker initCircuitBreaker(boolean enabled, HystrixCircuitBreaker fromConstructor,
                                                            HystrixCommandGroupKey groupKey, HystrixCommandKey commandKey,
                                                            HystrixCommandProperties properties, HystrixCommandMetrics metrics) {
    if (enabled) { // circuitBreaker.enabled = true
        if (fromConstructor == null) {
            // Create a HystrixCircuitBreakerImpl,
            // one per commandKey
            return HystrixCircuitBreaker.Factory.getInstance(commandKey, groupKey, properties, metrics);
        } else {
            return fromConstructor;
        }
    } else {
        // NoOpCircuitBreaker:
        // allowRequest always returns true,
        // isOpen always returns false
        return new NoOpCircuitBreaker();
    }
}

The first layer of toObservable is cache handling; the second is the circuit breaker check, whose entry point is applyHystrixSemantics

private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    // If the breaker allows the request, continue (allowRequest is covered above)
    if (circuitBreaker.allowRequest()) {
        // ...
    } else {
        // Otherwise build an exception and run the fallback logic
        return handleShortCircuitViaFallback();
    }
}

handleShortCircuitViaFallback wraps the short-circuit exception and runs the fallback logic

 private Observable<R> handleShortCircuitViaFallback() {
    // HystrixEventNotifier is a no-op by default
    eventNotifier.markEvent(HystrixEventType.SHORT_CIRCUITED, commandKey);
    // Wrap as an exception
    Exception shortCircuitException = new RuntimeException("Hystrix circuit short-circuited and is OPEN");
    // Record it as the execution exception
    executionResult = executionResult.setExecutionException(shortCircuitException);
    try {
        // Fallback logic, covered later
        return getFallbackOrThrowException(this, HystrixEventType.SHORT_CIRCUITED, FailureType.SHORTCIRCUIT,
                "short-circuited", shortCircuitException);
    } catch (Exception e) {
        return Observable.error(e);
    }
}

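3. Resource Isolation

Configuration

Common properties

  • execution.isolation.strategy: isolation strategy, THREAD or SEMAPHORE. Default: THREAD. The official recommendation is THREAD when the command makes network calls, and SEMAPHORE for pure in-memory work or when concurrency is so high that the cost of separate threads becomes too expensive.
  • execution.timeout.enabled: whether HystrixCommand.run() is subject to a timeout. Default: true.
  • execution.isolation.thread.timeoutInMilliseconds: the timeout. Default: 1s. Don't be misled by the name: the timeout set by this property currently applies to both semaphore and thread isolation.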

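Thread pool properties

  • coreSize: core pool size. Default: 10.
  • allowMaximumSizeToDivergeFromCoreSize: whether maximumSize takes effect. Default: false, in which case the maximum pool size equals the core size.
  • maximumSize: maximum pool size. Default: 10.
  • maxQueueSize: length of the thread pool's wait queue. Default: -1, which selects a SynchronousQueue; a value greater than 0 selects a LinkedBlockingQueue. This property cannot be updated dynamically; changing it requires reinitializing the thread pool.
  • queueSizeRejectionThreshold: applies when maxQueueSize > 0. Because maxQueueSize cannot be updated dynamically, this property exists as a dynamically adjustable threshold at which requests are rejected. Default: 5.
  • keepAliveTimeMinutes: idle time before non-core threads are reclaimed. Default: 1 minute.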
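
What maxQueueSize=-1 means in practice can be shown with the JDK alone: with a SynchronousQueue there is no buffering, so a task is rejected as soon as every thread is busy. The sketch below uses a hypothetical DirectHandoffDemo class and a pool of one thread to keep the effect easy to see; it does not use any Hystrix API.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.SynchronousQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical demo class; it only illustrates the JDK behaviour behind maxQueueSize=-1.
class DirectHandoffDemo {
    // Returns true if a second task is rejected while the only worker is busy.
    static boolean secondTaskRejected() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 1, TimeUnit.MINUTES, new SynchronousQueue<Runnable>());
        CountDownLatch release = new CountDownLatch(1);
        try {
            // The first task occupies the single worker until the latch is released.
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            try {
                // No idle worker and no queue capacity: the default AbortPolicy rejects.
                pool.execute(() -> { });
                return false;
            } catch (RejectedExecutionException expected) {
                return true;
            }
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }
}
```

This is why the default configuration fails fast under load instead of queuing, and why a positive maxQueueSize together with queueSizeRejectionThreshold is needed if some buffering is wanted.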

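Semaphore properties

  • execution.isolation.semaphore.maxConcurrentRequests: maximum number of concurrent HystrixCommand.run executions. Default: 10.

Core source

The third layer of toObservable is resource isolation.

Semaphore isolation

Semaphore isolation is handled first; the entry point is AbstractCommand#applyHystrixSemantics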

private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    // The breaker allows the request
    if (circuitBreaker.allowRequest()) {
        // Get the semaphore
        final TryableSemaphore executionSemaphore = getExecutionSemaphore();
        // Whether the semaphore has already been released
        final AtomicBoolean semaphoreHasBeenReleased = new AtomicBoolean(false);
        // Releases the semaphore at most once
        final Action0 singleSemaphoreRelease = new Action0() {
            @Override
            public void call() {
                if (semaphoreHasBeenReleased.compareAndSet(false, true)) {
                    executionSemaphore.release();
                }
            }
        };
        // Event notification; the default HystrixEventNotifier is a no-op, so this can be ignored
        final Action1<Throwable> markExceptionThrown = new Action1<Throwable>() {
            @Override
            public void call(Throwable t) {
                eventNotifier.markEvent(HystrixEventType.EXCEPTION_THROWN, commandKey);
            }
        };
        // Try to acquire the semaphore
        if (executionSemaphore.tryAcquire()) {
            try {
                // Record the invocation start time
                executionResult = executionResult.setInvocationStartTime(System.currentTimeMillis());
                // The rest of the flow
                return executeCommandAndObserve(_cmd)
                        .doOnError(markExceptionThrown)
                        .doOnTerminate(singleSemaphoreRelease)
                        .doOnUnsubscribe(singleSemaphoreRelease);
            } catch (RuntimeException e) {
                return Observable.error(e);
            }
        } else {
            // Could not acquire the semaphore: fall back
            return handleSemaphoreRejectionViaFallback();
        }
    } else {
        // Breaker is open: fall back
        return handleShortCircuitViaFallback();
    }
}
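
The semaphoreHasBeenReleased flag exists because singleSemaphoreRelease is registered on both doOnTerminate and doOnUnsubscribe, so it may be invoked more than once for the same execution. A minimal sketch of that release-once guard, using a hypothetical ReleaseOnce class and an AtomicInteger as the permit counter:

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical helper showing the release-once guard; not part of Hystrix.
class ReleaseOnce {
    private final AtomicBoolean released = new AtomicBoolean(false);
    private final AtomicInteger permitsInUse;

    ReleaseOnce(AtomicInteger permitsInUse) {
        this.permitsInUse = permitsInUse;
    }

    // Safe to call from several callbacks: only the first call returns the permit.
    void release() {
        if (released.compareAndSet(false, true)) {
            permitsInUse.decrementAndGet();
        }
    }
}
```

Without the guard, a command that both terminates and is unsubscribed would return two permits for one acquisition, and the concurrency limit would silently drift upward.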

getExecutionSemaphore returns the semaphore.

// Static field executionSemaphorePerCircuit
// maps each commandKey to its semaphore
protected static final ConcurrentHashMap<String, TryableSemaphore> executionSemaphorePerCircuit = new ConcurrentHashMap<String, TryableSemaphore>();

protected TryableSemaphore getExecutionSemaphore() {
    if (properties.executionIsolationStrategy().get() == ExecutionIsolationStrategy.SEMAPHORE) {
        // Semaphore strategy
        if (executionSemaphoreOverride == null) {
            // First look in the static executionSemaphorePerCircuit map
            TryableSemaphore _s = executionSemaphorePerCircuit.get(commandKey.name());
            if (_s == null) {
                // Create Hystrix's own semaphore implementation, TryableSemaphoreActual
                executionSemaphorePerCircuit.putIfAbsent(commandKey.name(), new TryableSemaphoreActual(properties.executionIsolationSemaphoreMaxConcurrentRequests()));
                return executionSemaphorePerCircuit.get(commandKey.name());
            } else {
                return _s;
            }
        } else {
            return executionSemaphoreOverride;
        }
    } else {
        // Thread isolation: return TryableSemaphoreNoOp.DEFAULT,
        // whose tryAcquire always returns true
        return TryableSemaphoreNoOp.DEFAULT;
    }
}

TryableSemaphoreActual is a simple semaphore that is not built on JUC's AQS; it is just a counter held in an AtomicInteger.

static class TryableSemaphoreActual implements TryableSemaphore {
    // execution.isolation.semaphore.maxConcurrentRequests, dynamically configurable
    protected final HystrixProperty<Integer> numberOfPermits;
    // Number of permits currently held
    private final AtomicInteger count = new AtomicInteger(0);

    public TryableSemaphoreActual(HystrixProperty<Integer> numberOfPermits) {
        this.numberOfPermits = numberOfPermits;
    }

    @Override
    public boolean tryAcquire() {
        int currentCount = count.incrementAndGet();
        if (currentCount > numberOfPermits.get()) {
            count.decrementAndGet();
            return false;
        } else {
            return true;
        }
    }

    @Override
    public void release() {
        count.decrementAndGet();
    }

    @Override
    public int getNumberOfPermitsUsed() {
        return count.get();
    }

}
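
The increment-then-check approach is easy to try out in isolation. The sketch below is a hypothetical CountingTrySemaphore in which an IntSupplier plays the role of the dynamic HystrixProperty, so the limit is re-read on every call.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.IntSupplier;

// Hypothetical counter-based semaphore in the style of TryableSemaphoreActual.
class CountingTrySemaphore {
    private final IntSupplier permits; // read on every call, so the limit can change at runtime
    private final AtomicInteger count = new AtomicInteger(0);

    CountingTrySemaphore(IntSupplier permits) {
        this.permits = permits;
    }

    // Never blocks: increment first, then undo the increment if the limit was exceeded.
    boolean tryAcquire() {
        if (count.incrementAndGet() > permits.getAsInt()) {
            count.decrementAndGet();
            return false;
        }
        return true;
    }

    void release() {
        count.decrementAndGet();
    }

    int inUse() {
        return count.get();
    }
}
```

Because the limit is read at acquisition time rather than fixed at construction, raising the configured value takes effect on the next tryAcquire, which a java.util.concurrent.Semaphore created with a fixed permit count would not offer as directly.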

Thread isolation

The AbstractCommand constructor initializes the thread pool

protected final HystrixThreadPool threadPool;
protected AbstractCommand(HystrixCommandGroupKey group, HystrixCommandKey key, HystrixThreadPoolKey threadPoolKey, HystrixCircuitBreaker circuitBreaker, HystrixThreadPool threadPool,
            HystrixCommandProperties.Setter commandPropertiesDefaults, HystrixThreadPoolProperties.Setter threadPoolPropertiesDefaults,
            HystrixCommandMetrics metrics, TryableSemaphore fallbackSemaphore, TryableSemaphore executionSemaphore,
            HystrixPropertiesStrategy propertiesStrategy, HystrixCommandExecutionHook executionHook) {
        // Resolve the HystrixThreadPoolKey: use the one passed in if present,
        // otherwise fall back to the HystrixCommandGroupKey
        this.threadPoolKey = initThreadPoolKey(threadPoolKey, this.commandGroup, this.properties.executionIsolationThreadPoolKeyOverride().get());
        // Initialize the HystrixThreadPool for this threadPoolKey at construction time;
        // a HystrixThreadPoolDefault is built from the configuration
        this.threadPool = initThreadPool(threadPool, this.threadPoolKey,threadPoolPropertiesDefaults);
}

private static HystrixThreadPool initThreadPool(HystrixThreadPool fromConstructor, HystrixThreadPoolKey threadPoolKey, HystrixThreadPoolProperties.Setter threadPoolPropertiesDefaults) {
    if (fromConstructor == null) {
        return HystrixThreadPool.Factory.getInstance(threadPoolKey, threadPoolPropertiesDefaults);
    } else {
        return fromConstructor;
    }
}

executeCommandAndObserve kicks off scheduling

AbstractCommand#executeCommandAndObserve is the entry point to the thread isolation code

private Observable<R> executeCommandAndObserve(final AbstractCommand<R> _cmd) {
        // HystrixRequestContext of the current thread
        final HystrixRequestContext currentRequestContext = HystrixRequestContext.getContextForCurrentThread();
		// onNext
        final Action1<R> markEmits = ...;
		// onCompleted
        final Action0 markOnCompleted = new Action0() {
            @Override
            public void call() {
                if (!commandIsScalar()) {
                    long latency = System.currentTimeMillis() - executionResult.getStartTimestamp();
                    // Record the successful execution
                    executionResult = executionResult.addEvent((int) latency, HystrixEventType.SUCCESS);
                    // Tell the circuit breaker the call succeeded
                    circuitBreaker.markSuccess();
                }
            }
        };
        // Fallback handling
        final Func1<Throwable, Observable<R>> handleFallback = ...;

        // Restore the request context on the current thread
        final Action1<Notification<? super R>> setRequestContext = new Action1<Notification<? super R>>() {
            @Override
            public void call(Notification<? super R> rNotification) {
                setRequestContextIfNeeded(currentRequestContext);
            }
        };

        Observable<R> execution;
        if (properties.executionTimeoutEnabled().get()) {
            // execution.timeout.enabled=true:
            // run the rest of the flow
            execution = executeCommandWithSpecifiedIsolation(_cmd)
                    // HystrixObservableTimeoutOperator adds the timeout logic, covered below
                    .lift(new HystrixObservableTimeoutOperator<R>(_cmd));
        } else {
            // Run the rest of the flow without a timeout
            execution = executeCommandWithSpecifiedIsolation(_cmd);
        }

        return execution.doOnNext(markEmits)
                .doOnCompleted(markOnCompleted)
                .onErrorResumeNext(handleFallback)
                .doOnEach(setRequestContext);
    }

AbstractCommand#executeCommandWithSpecifiedIsolation runs different logic depending on the isolation strategy. Semaphore isolation simply runs the user code.

private Observable<R> executeCommandWithSpecifiedIsolation(final AbstractCommand<R> _cmd) {
    if (properties.executionIsolationStrategy().get() == ExecutionIsolationStrategy.THREAD) {
        return Observable.defer(new Func0<Observable<R>>() {
            @Override
            public Observable<R> call() {
                // CAS the command state
                if (!commandState.compareAndSet(CommandState.OBSERVABLE_CHAIN_CREATED, CommandState.USER_CODE_EXECUTED)) {
                    return Observable.error(new IllegalStateException("execution attempted while in state : " + commandState.get().name()));
                }
                // Already timed out
                if (isCommandTimedOut.get() == TimedOutStatus.TIMED_OUT) {
                    return Observable.error(new RuntimeException("timed out before executing run()"));
                }
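                // CAS the thread state
                if (threadState.compareAndSet(ThreadState.NOT_USING_THREAD, ThreadState.STARTED)) {
                    // Run HystrixCommand.run
                    return getUserExecutionObservable(_cmd);
                } else {
                    // Already unsubscribed, so return immediately
                    return Observable.error(new RuntimeException("unsubscribed before executing run()"));
                }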
            }
        }).doOnTerminate(new Action0() {
            @Override
            public void call() {
                // onTerminate...
            }
        }).doOnUnsubscribe(new Action0() {
            @Override
            public void call() {
                // onUnsubscribe...
            }
        }).subscribeOn(threadPool.getScheduler(new Func0<Boolean>() { // getScheduler returns the Scheduler
            // Whether to interrupt the execution thread once the command has timed out
            @Override
            public Boolean call() {
                return properties.executionIsolationThreadInterruptOnTimeout().get() && _cmd.isCommandTimedOut.get() == TimedOutStatus.TIMED_OUT;
            }
        }));
    } else {
        // Semaphore isolation
        return Observable.defer(new Func0<Observable<R>>() {
            @Override
            public Observable<R> call() {
                executionResult = executionResult.setExecutionOccurred();
                // CAS the command state
                if (!commandState.compareAndSet(CommandState.OBSERVABLE_CHAIN_CREATED, CommandState.USER_CODE_EXECUTED)) {
                    return Observable.error(new IllegalStateException("execution attempted while in state : " + commandState.get().name()));
                }
                // Run HystrixCommand.run
                return getUserExecutionObservable(_cmd);
            }
        });
    }
}

HystrixThreadPoolDefault.getScheduler returns the Scheduler. Note that this HystrixThreadPoolDefault instance is already the HystrixThreadPool that belongs to the threadPoolKey.

static class HystrixThreadPoolDefault implements HystrixThreadPool {
    private final HystrixThreadPoolProperties properties;
    private final ThreadPoolExecutor threadPool;
    @Override
    public Scheduler getScheduler(Func0<Boolean> shouldInterruptThread) {
        // 1. Apply the dynamic configuration to the ThreadPoolExecutor
        touchConfig();
        // 2. Create a HystrixContextScheduler
        return new HystrixContextScheduler(HystrixPlugins.getInstance().getConcurrencyStrategy(), this, shouldInterruptThread);
    }

    // Apply the dynamic configuration to the ThreadPoolExecutor
    private void touchConfig() {
        // Core pool size
        final int dynamicCoreSize = properties.coreSize().get();
        // Configured maximum pool size
        final int configuredMaximumSize = properties.maximumSize().get();
        // Effective maximum pool size, derived from
        // allowMaximumSizeToDivergeFromCoreSize, coreSize and maximumSize
        int dynamicMaximumSize = properties.actualMaximumSize();
        final boolean allowSizesToDiverge = properties.getAllowMaximumSizeToDivergeFromCoreSize().get();
        // Set when the configured maximum is lower than the core size
        boolean maxTooLow = false;
        // Divergence is allowed AND the configured maximum is below the core size
        if (allowSizesToDiverge && configuredMaximumSize < dynamicCoreSize) {
            dynamicMaximumSize = dynamicCoreSize;
            maxTooLow = true;
        }
        if (threadPool.getCorePoolSize() != dynamicCoreSize || (allowSizesToDiverge && threadPool.getMaximumPoolSize() != dynamicMaximumSize)) {
            if (maxTooLow) {
                logger.error(...);
            }
            // Update the ThreadPoolExecutor's pool sizes
            threadPool.setCorePoolSize(dynamicCoreSize);
            threadPool.setMaximumPoolSize(dynamicMaximumSize);
        }
        // Update the ThreadPoolExecutor's keep-alive time
        threadPool.setKeepAliveTime(properties.keepAliveTimeMinutes().get(), TimeUnit.MINUTES);
    }
}
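
How the effective maximum is derived from the three properties can be written as one small function. This is a sketch of the rule as the configuration notes and touchConfig describe it; the class PoolSizing is hypothetical, and that properties.actualMaximumSize() resolves in exactly this way is an assumption, since its body is not shown in the listing.

```java
// Hypothetical helper; assumed to mirror how the effective maximum pool size is resolved.
class PoolSizing {
    static int actualMaximumSize(int coreSize, int maximumSize, boolean allowDiverge) {
        if (!allowDiverge) {
            // maximumSize is ignored: the pool is fixed at coreSize.
            return coreSize;
        }
        // A maximum below the core size is invalid, so the core size wins.
        return Math.max(coreSize, maximumSize);
    }
}
```

For example, coreSize=10 with maximumSize=20 stays at 10 unless allowMaximumSizeToDivergeFromCoreSize is true, and coreSize=10 with maximumSize=5 resolves to 10 either way, which is the case touchConfig logs as an error.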

HystrixObservableTimeoutOperator: timeout detection

HystrixObservableTimeoutOperator.call implements timeout detection with a delayed scheduled task plus a CAS.

private static class HystrixObservableTimeoutOperator<R> implements Operator<R, R> {
    final AbstractCommand<R> originalCommand;
    public HystrixObservableTimeoutOperator(final AbstractCommand<R> originalCommand) {
        this.originalCommand = originalCommand;
    }

    @Override
    public Subscriber<? super R> call(final Subscriber<? super R> child) {
        // Unrelated code omitted...

        // HystrixContextRunnable makes the HystrixRequestContext available to the executing thread
        final HystrixContextRunnable timeoutRunnable = new HystrixContextRunnable(originalCommand.concurrencyStrategy, new Runnable() {
            @Override
            public void run() {
                // Trigger onError
                child.onError(new HystrixTimeoutException());
            }
        });
        // The TimerListener is later submitted to a ScheduledThreadPoolExecutor for timed execution
        TimerListener listener = new TimerListener() {
            @Override
            public void tick() {
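                // CAS decides whether the timeout may still be applied
                if (originalCommand.isCommandTimedOut.compareAndSet(TimedOutStatus.NOT_EXECUTED, TimedOutStatus.TIMED_OUT)) {
                    // s is the CompositeSubscription created in the omitted code
                    s.unsubscribe();
                    // Run timeoutRunnable, which emits the HystrixTimeoutException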
                    timeoutRunnable.run();
                }
            }

			// execution.isolation.thread.timeoutInMilliseconds
            @Override
            public int getIntervalTimeInMilliseconds() {
                return originalCommand.properties.executionTimeoutInMilliseconds().get();
            }
        };
        // Submit the TimerListener to the ScheduledThreadPoolExecutor
        final Reference<TimerListener> tl = HystrixTimer.getInstance().addTimerListener(listener);
        // Unrelated code omitted...
    }
}

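HystrixTimer#addTimerListener submits the timeout-detection task. Note that Hystrix starts one global thread pool for timeout detection. Its core size is set by hystrix.timer.threadpool.default.coreSize, which defaults to the number of available processors (8 on this machine). Its maximum size is nominally Integer.MAX_VALUE, although a ScheduledThreadPoolExecutor uses an unbounded queue and therefore never grows beyond its core size.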

public class HystrixTimer {
public Reference<TimerListener> addTimerListener(final TimerListener listener) {
    // Initialize the thread pool (ScheduledExecutor.initialize)
    startThreadIfNeeded();
    Runnable r = new Runnable() {
        @Override
        public void run() {
            try {
                // Invoke tick
                listener.tick();
            } catch (Exception e) {
                logger.error("Failed while ticking TimerListener", e);
            }
        }
    };
    // Run after an initial delay of, and then at intervals of, execution.isolation.thread.timeoutInMilliseconds (1 second by default)
    ScheduledFuture<?> f = executor.get().getThreadPool().scheduleAtFixedRate(r, listener.getIntervalTimeInMilliseconds(), listener.getIntervalTimeInMilliseconds(), TimeUnit.MILLISECONDS);
    return new TimerReference(listener, f);
}

static class ScheduledExecutor {
    volatile ScheduledThreadPoolExecutor executor;
    private volatile boolean initialized;
    // Initialize the thread pool
    public void initialize() {
        HystrixPropertiesStrategy propertiesStrategy = HystrixPlugins.getInstance().getPropertiesStrategy();
        // hystrix.timer.threadpool.default.coreSize; defaults to the number of available processors (8 on this machine)
        int coreSize = propertiesStrategy.getTimerThreadPoolProperties().getCorePoolSize().get();
        // Thread factory; threads are named HystrixTimer-*
        ThreadFactory threadFactory = new ThreadFactory() {
            final AtomicInteger counter = new AtomicInteger();
            @Override
            public Thread newThread(Runnable r) {
                Thread thread = new Thread(r, "HystrixTimer-" + counter.incrementAndGet());
                thread.setDaemon(true);
                return thread;
            }

        };
        // super(corePoolSize, Integer.MAX_VALUE, 0, NANOSECONDS, new DelayedWorkQueue(), threadFactory);
        // maxPoolSize is unbounded; keep-alive time for non-core threads is 0
        executor = new ScheduledThreadPoolExecutor(coreSize, threadFactory);
        initialized = true;
    }
    public ScheduledThreadPoolExecutor getThreadPool() {
        return executor;
    }
    public boolean isInitialized() {
        return initialized;
    }
}
}
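
The "delayed task plus CAS" idea can be reproduced with the JDK scheduler. In the sketch below, TimeoutRace is a hypothetical class: the Status enum mirrors the TimedOutStatus values used in the listing, and a single scheduled task stands in for the TimerListener. Whichever side moves the status away from NOT_EXECUTED first wins, and the other side's CAS simply fails.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo of timeout detection via a delayed task and a CAS; not Hystrix code.
class TimeoutRace {
    enum Status { NOT_EXECUTED, COMPLETED, TIMED_OUT }

    // Simulates one command and returns its final status.
    // completesFirst=true models a command that finishes before the timer ticks.
    static Status run(boolean completesFirst, long timeoutMs) {
        ScheduledExecutorService timer = Executors.newSingleThreadScheduledExecutor();
        AtomicReference<Status> status = new AtomicReference<>(Status.NOT_EXECUTED);
        try {
            if (completesFirst) {
                // The command wins the race: the later tick will find the status changed.
                status.compareAndSet(Status.NOT_EXECUTED, Status.COMPLETED);
            }
            // The tick only marks a timeout if nobody has claimed the status yet.
            ScheduledFuture<?> tick = timer.schedule(
                    () -> { status.compareAndSet(Status.NOT_EXECUTED, Status.TIMED_OUT); },
                    timeoutMs, TimeUnit.MILLISECONDS);
            tick.get(); // wait until the timer has fired
            return status.get();
        } catch (Exception e) {
            throw new IllegalStateException(e);
        } finally {
            timer.shutdownNow();
        }
    }
}
```

A command that never completes ends as TIMED_OUT, while one that completed before the tick stays COMPLETED even though the timer still fires. The CAS is what makes a late tick harmless, so the timer does not need to be cancelled with perfect timing.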

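HystrixContextScheduler: executing the Command

Before looking at HystrixContextScheduler, a quick word on two RxJava classes, rx.Scheduler and rx.Scheduler.Worker. Both are abstract: a Scheduler creates Workers, and a Worker does the actual scheduling of Actions.

HystrixContextScheduler#createWorker delegates to ThreadPoolScheduler and wraps the result in a HystrixContextSchedulerWorker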

public class HystrixContextScheduler extends Scheduler {
    private final HystrixConcurrencyStrategy concurrencyStrategy;
    private final Scheduler actualScheduler;
    private final HystrixThreadPool threadPool;
    public HystrixContextScheduler(HystrixConcurrencyStrategy concurrencyStrategy, HystrixThreadPool threadPool, Func0<Boolean> shouldInterruptThread) {
        this.concurrencyStrategy = concurrencyStrategy;
        this.threadPool = threadPool;
        this.actualScheduler = new ThreadPoolScheduler(threadPool, shouldInterruptThread);
    }
    @Override
    public Worker createWorker() {
        return new HystrixContextSchedulerWorker(actualScheduler.createWorker());
    }
}

HystrixContextScheduler.HystrixContextSchedulerWorker#schedule(rx.functions.Action0) schedules the Action; the Worker that does the real work is HystrixContextScheduler.ThreadPoolWorker

private class HystrixContextSchedulerWorker extends Worker {
  private final Worker worker;
  private HystrixContextSchedulerWorker(Worker actualWorker) {
      this.worker = actualWorker;
  }

  @Override
  public Subscription schedule(Action0 action) {
      if (threadPool != null) {
          // Check whether the dynamic queue limit still leaves room
          if (!threadPool.isQueueSpaceAvailable()) {
              throw new RejectedExecutionException("Rejected command because thread-pool queueSize is at rejection threshold.");
          }
      }
      // HystrixContextScheduler.ThreadPoolWorker#schedule does the actual scheduling
      return worker.schedule(new HystrixContexSchedulerAction(concurrencyStrategy, action));
  }
}

HystrixThreadPool.HystrixThreadPoolDefault#isQueueSpaceAvailable checks whether the dynamic queue limit still leaves room

@Override
public boolean isQueueSpaceAvailable() {
    if (queueSize <= 0) { // queueSize is the maxQueueSize setting, -1 by default
        return true;
    } else {
        // The ThreadPoolExecutor's queue length must be below the dynamic limit, queueSizeRejectionThreshold
        return threadPool.getQueue().size() < properties.queueSizeRejectionThreshold().get();
    }
}
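
The interplay between maxQueueSize and queueSizeRejectionThreshold is easiest to see with concrete numbers. The sketch below restates the check as a pure function on a hypothetical QueueGate class, with the current queue length passed in instead of being read from a ThreadPoolExecutor.

```java
// Hypothetical pure-function restatement of the queue check; not the Hystrix method.
class QueueGate {
    static boolean isQueueSpaceAvailable(int maxQueueSize, int currentQueueLength, int rejectionThreshold) {
        if (maxQueueSize <= 0) {
            // No real queue (SynchronousQueue): leave rejection to the executor itself.
            return true;
        }
        // Reject once the queue has reached the dynamic threshold.
        return currentQueueLength < rejectionThreshold;
    }
}
```

With maxQueueSize=10 and the default threshold of 5, the sixth queued task is rejected even though the queue could physically hold ten. The threshold is the knob that can be changed at runtime, while maxQueueSize only sets the hard upper bound when the pool is created.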

ThreadPoolWorker#schedule is where the task is actually submitted to the thread pool.

private static class ThreadPoolWorker extends Worker {
    private final HystrixThreadPool threadPool;
    private final CompositeSubscription subscription = new CompositeSubscription();
    private final Func0<Boolean> shouldInterruptThread;
    public ThreadPoolWorker(HystrixThreadPool threadPool, Func0<Boolean> shouldInterruptThread) {
        this.threadPool = threadPool;
        this.shouldInterruptThread = shouldInterruptThread;
    }

    @Override
    public Subscription schedule(final Action0 action) {
        if (subscription.isUnsubscribed()) {
            return Subscriptions.unsubscribed();
        }
        ScheduledAction sa = new ScheduledAction(action);
        subscription.add(sa);
        sa.addParent(subscription);

        ThreadPoolExecutor executor = (ThreadPoolExecutor) threadPool.getExecutor();
        // Submit the task to the thread pool; this may throw RejectedExecutionException
        FutureTask<?> f = (FutureTask<?>) executor.submit(sa);
        sa.add(new FutureCompleterWithConfigurableInterrupt(f, shouldInterruptThread, executor));
        return sa;
    }
}

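4. Fallback

Configuration

  • fallback.enabled: whether fallback is allowed. Default: true.
  • fallback.isolation.semaphore.maxConcurrentRequests: the fallback semaphore, i.e. the maximum number of concurrent fallback executions. Default: 10.

Source examples

AbstractCommand#getFallbackOrThrowException is the entry point of the fallback logic. Almost every exception raised while a Command executes passes through this method, which decides whether the exception should trigger the fallback and how the fallback runs.

  • The circuit breaker rejects the request: AbstractCommand#applyHystrixSemantics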
private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    if (circuitBreaker.allowRequest()) {
        // ... omitted
    } else {
        // Short-circuit fallback
        return handleShortCircuitViaFallback();
    }
}
private Observable<R> handleShortCircuitViaFallback() {
    eventNotifier.markEvent(HystrixEventType.SHORT_CIRCUITED, commandKey);
    // Build the original exception
    Exception shortCircuitException = new RuntimeException("Hystrix circuit short-circuited and is OPEN");
    // Record it as the execution exception
    executionResult = executionResult.setExecutionException(shortCircuitException);
    try {
        // Call getFallbackOrThrowException
        return getFallbackOrThrowException(this, HystrixEventType.SHORT_CIRCUITED, FailureType.SHORTCIRCUIT,
                "short-circuited", shortCircuitException);
    } catch (Exception e) {
        return Observable.error(e);
    }
}
  • The execution semaphore cannot be acquired: AbstractCommand#applyHystrixSemantics
private Observable<R> applyHystrixSemantics(final AbstractCommand<R> _cmd) {
    if (circuitBreaker.allowRequest()) { // the circuit breaker allows the request
        final TryableSemaphore executionSemaphore = getExecutionSemaphore();
        if (executionSemaphore.tryAcquire()) {
            // ... omitted
        } else {
            // Failed to acquire the execution semaphore: run the fallback logic
            return handleSemaphoreRejectionViaFallback();
        }
    } else {
        return handleShortCircuitViaFallback();
    }
}
private Observable<R> handleSemaphoreRejectionViaFallback() {
    // The original exception
    Exception semaphoreRejectionException = new RuntimeException("could not acquire a semaphore for execution");
    // Record it as the execution exception
    executionResult = executionResult.setExecutionException(semaphoreRejectionException);
    // Delegate to getFallbackOrThrowException
    return getFallbackOrThrowException(this, HystrixEventType.SEMAPHORE_REJECTED, FailureType.REJECTED_SEMAPHORE_EXECUTION,
            "could not acquire a semaphore for execution", semaphoreRejectionException);
}

Which exceptions bypass the fallback

private Observable<R> getFallbackOrThrowException(final AbstractCommand<R> _cmd, final HystrixEventType eventType, final FailureType failureType, final String message, final Exception originalException) {
    if (shouldNotBeWrapped(originalException)){
        // Exceptions marked with ExceptionNotWrappedByHystrix do not go through the fallback
        Exception e = wrapWithOnErrorHook(failureType, originalException);
        return Observable.error(e);
    } else if (isUnrecoverable(originalException)) {
        // Certain unrecoverable Errors do not go through the fallback either
        logger.error("Unrecoverable Error for HystrixCommand so will throw HystrixRuntimeException and not apply fallback. ", originalException);
        Exception e = wrapWithOnErrorHook(failureType, originalException);
        return Observable.error(new HystrixRuntimeException(failureType, this.getClass(), getLogMessagePrefix() + " " + message + " and encountered unrecoverable error.", e, null));
    } else {
		// ...
    }
}

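shouldNotBeWrapped can be overridden by subclasses to define which exceptions bypass the fallback. The default implementation in AbstractCommand skips the fallback for any exception whose class implements the marker interface ExceptionNotWrappedByHystrix.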

protected boolean shouldNotBeWrapped(Throwable underlying) {
    return underlying instanceof ExceptionNotWrappedByHystrix;
}
// A marker interface
public interface ExceptionNotWrappedByHystrix {

}
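
As a rough illustration of how a marker interface drives this check, the sketch below uses plain Java stand-ins. NotWrapped and OrderValidationException are hypothetical names invented for the example; in a real project the exception would implement Hystrix's ExceptionNotWrappedByHystrix instead.

```java
// Hypothetical stand-ins for illustration; not the Hystrix types themselves.
class MarkerInterfaceDemo {

    // Plays the role of the ExceptionNotWrappedByHystrix marker.
    interface NotWrapped { }

    // A business exception the caller wants to receive as-is, without fallback.
    static class OrderValidationException extends RuntimeException implements NotWrapped {
        OrderValidationException(String message) {
            super(message);
        }
    }

    // Same shape as the default shouldNotBeWrapped check: a pure instanceof test.
    static boolean shouldNotBeWrapped(Throwable underlying) {
        return underlying instanceof NotWrapped;
    }

    public static void main(String[] args) {
        System.out.println(shouldNotBeWrapped(new OrderValidationException("bad order")));
        System.out.println(shouldNotBeWrapped(new IllegalStateException("boom")));
    }
}
```

Because the check is a plain instanceof test, opting an exception out of the fallback only requires adding the marker to its class declaration.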

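isUnrecoverable checks whether the cause of the exception (not the exception itself) is unrecoverable, for example a stack overflow, another virtual machine error, thread death, or a linkage error.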

 private boolean isUnrecoverable(Throwable t) {
    if (t != null && t.getCause() != null) {
        Throwable cause = t.getCause();
        if (cause instanceof StackOverflowError) {
            return true;
        } else if (cause instanceof VirtualMachineError) {
            return true;
        } else if (cause instanceof ThreadDeath) {
            return true;
        } else if (cause instanceof LinkageError) {
            return true;
        }
    }
    return false;
}
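
One detail that is easy to miss is that only getCause() is inspected. The sketch below is a simplified standalone copy of the check, written under two assumptions: StackOverflowError is covered by its superclass VirtualMachineError, and the ThreadDeath branch is left out because that class is deprecated in recent JDKs. The class name UnrecoverableCheckDemo is hypothetical.

```java
// Simplified, hypothetical copy of the cause-based check; not the Hystrix method itself.
class UnrecoverableCheckDemo {

    static boolean isUnrecoverable(Throwable t) {
        if (t != null && t.getCause() != null) {
            Throwable cause = t.getCause();
            // StackOverflowError and OutOfMemoryError are both VirtualMachineErrors.
            return cause instanceof VirtualMachineError
                    || cause instanceof LinkageError;
        }
        return false;
    }

    public static void main(String[] args) {
        // Wrapped error: the cause is inspected, so this counts as unrecoverable.
        System.out.println(isUnrecoverable(new RuntimeException(new StackOverflowError())));
        // Bare error without a cause: this particular check returns false.
        System.out.println(isUnrecoverable(new StackOverflowError()));
        // Ordinary failure: recoverable, so the fallback may run.
        System.out.println(isUnrecoverable(new RuntimeException(new IllegalStateException())));
    }
}
```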

Fallback logic

It is worth reading the JavaDoc of getFallbackOrThrowException, because it gives practical guidance on how fallbacks should be written.

Execute getFallback() within protection of a semaphore that limits number of concurrent executions. Fallback implementations shouldn't perform anything that can be blocking, but we protect against it anyways in case someone doesn't abide by the contract. If something in the getFallback() implementation is latent (such as a network call) then the semaphore will cause us to start rejecting requests rather than allowing potentially all threads to pile up and block.

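In short, getFallback runs under the protection of a semaphore. A fallback implementation should never block; the Hystrix authors added the semaphore only as a safeguard against developers who ignore this contract. If a fallback does something latent, such as a network call, the semaphore causes further requests to be rejected instead of letting threads pile up and block.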
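
The "reject rather than pile up" behaviour can be sketched with the JDK's java.util.concurrent.Semaphore. This is only an approximation under stated assumptions: Hystrix uses its own TryableSemaphore, and the class GuardedFallbackDemo and its methods are hypothetical names for this example.

```java
import java.util.concurrent.Semaphore;
import java.util.function.Supplier;

// Hypothetical sketch of a semaphore-guarded fallback; not Hystrix code.
class GuardedFallbackDemo {
    private final Semaphore permits;

    GuardedFallbackDemo(int maxConcurrentFallbacks) {
        this.permits = new Semaphore(maxConcurrentFallbacks);
    }

    // Runs the fallback only if a permit is free; never waits for one.
    String runFallback(Supplier<String> fallback) {
        if (!permits.tryAcquire()) {
            throw new IllegalStateException("fallback rejected: too many concurrent fallbacks");
        }
        try {
            return fallback.get();
        } finally {
            permits.release();
        }
    }

    // With a single permit, a fallback that starts another fallback sees a rejection.
    static String nestedCallWithOnePermit() {
        GuardedFallbackDemo guard = new GuardedFallbackDemo(1);
        return guard.runFallback(() -> {
            try {
                return guard.runFallback(() -> "inner ran");
            } catch (IllegalStateException rejected) {
                return "inner rejected";
            }
        });
    }

    public static void main(String[] args) {
        System.out.println(nestedCallWithOnePermit());
    }
}
```

The nested call stands in for a second concurrent request: while the first fallback still holds the only permit, the second one fails fast instead of waiting.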

Fallback logic in getFallbackOrThrowException

private Observable<R> getFallbackOrThrowException(final AbstractCommand<R> _cmd, final HystrixEventType eventType, final FailureType failureType, final String message, final Exception originalException) {
    // ... unrelated logic omitted
    // fallback.enabled = true
    if (properties.fallbackEnabled().get()) {
        // Restore the request context
        final Action1<Notification<? super R>> setRequestContext = new Action1<Notification<? super R>>() {
            @Override
            public void call(Notification<? super R> rNotification) {
                setRequestContextIfNeeded(requestContext);
            }
        };
        // onNext
        final Action1<R> markFallbackEmit = ...;
        // onCompleted
        final Action0 markFallbackCompleted = ...;
        // Handle an exception thrown by the fallback: wrap it and emit it again
        final Func1<Throwable, Observable<R>> handleFallbackError = new Func1<Throwable, Observable<R>>() {
            @Override
            public Observable<R> call(Throwable t) {
                Exception e = originalException;
                Exception fe = getExceptionFromThrowable(t);
                if (fe instanceof UnsupportedOperationException) {
                    // UnsupportedOperationException is thrown when the subclass provides no fallback
                    return Observable.error(new HystrixRuntimeException(failureType, _cmd.getClass(), getLogMessagePrefix() + " " + message + " and no fallback available.", e, fe));
                } else {
                    // Any other exception
                    return Observable.error(new HystrixRuntimeException(failureType, _cmd.getClass(), getLogMessagePrefix() + " " + message + " and fallback failed.", e, fe));
                }
            }
        };
        // The fallback itself is guarded by a semaphore (fallback.isolation.semaphore.maxConcurrentRequests),
        // whose default size is 10
        final TryableSemaphore fallbackSemaphore = getFallbackSemaphore();
        final AtomicBoolean semaphoreHasBeenReleased = new AtomicBoolean(false);
        // Release the fallback semaphore (at most once)
        final Action0 singleSemaphoreRelease = new Action0() {
            @Override
            public void call() {
                if (semaphoreHasBeenReleased.compareAndSet(false, true)) {
                    fallbackSemaphore.release();
                }
            }
        };
        // The Observable that carries the fallback logic
        Observable<R> fallbackExecutionChain;
        // Try to acquire the fallback semaphore
        if (fallbackSemaphore.tryAcquire()) {
            try {
                fallbackExecutionChain = getFallbackObservable();
            } catch (Throwable ex) {
                fallbackExecutionChain = Observable.error(ex);
            }
            return fallbackExecutionChain
                    .doOnEach(setRequestContext)
                    .lift(new FallbackHookApplication(_cmd))
                    .lift(new DeprecatedOnFallbackHookApplication(_cmd))
                    .doOnNext(markFallbackEmit)
                    .doOnCompleted(markFallbackCompleted)
                    .onErrorResumeNext(handleFallbackError)
                    .doOnTerminate(singleSemaphoreRelease)
                    .doOnUnsubscribe(singleSemaphoreRelease);
        } else {
           // Could not acquire the fallback semaphore: emit Observable.error(new HystrixRuntimeException(...))
           return handleFallbackRejectionByEmittingError();
        }
    } else {
        // Fallback is disabled: emit Observable.error(new HystrixRuntimeException(...))
        return handleFallbackDisabledByEmittingError(originalException, failureType, message);
    }
}
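
The AtomicBoolean around fallbackSemaphore.release() matters because the same release action is registered for both doOnTerminate and doOnUnsubscribe, so it may run twice. The following sketch shows the effect with the JDK Semaphore rather than Hystrix's TryableSemaphore; SingleReleaseDemo and its method names are hypothetical.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical sketch of the release-at-most-once pattern; not Hystrix code.
class SingleReleaseDemo {

    // The release callback runs twice, but the CAS lets only the first call through.
    static int permitsAfterGuardedDoubleRelease() {
        Semaphore semaphore = new Semaphore(1);
        semaphore.tryAcquire();
        AtomicBoolean released = new AtomicBoolean(false);
        Runnable releaseOnce = () -> {
            if (released.compareAndSet(false, true)) {
                semaphore.release();
            }
        };
        releaseOnce.run(); // e.g. triggered on terminate
        releaseOnce.run(); // e.g. triggered on unsubscribe
        return semaphore.availablePermits();
    }

    // Without the guard, the second release inflates the permit count.
    static int permitsAfterUnguardedDoubleRelease() {
        Semaphore semaphore = new Semaphore(1);
        semaphore.tryAcquire();
        semaphore.release();
        semaphore.release();
        return semaphore.availablePermits();
    }

    public static void main(String[] args) {
        System.out.println("guarded:   " + permitsAfterGuardedDoubleRelease());
        System.out.println("unguarded: " + permitsAfterUnguardedDoubleRelease());
    }
}
```

An inflated permit count would silently raise the concurrency limit, which is the failure mode the compareAndSet guard avoids.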

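HystrixCommand#getFallbackObservable, which implements the abstract method declared in AbstractCommand, obtains the fallback logic by wrapping getFallback() in a deferred Observable.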

@Override
final protected Observable<R> getFallbackObservable() {
    return Observable.defer(new Func0<Observable<R>>() {
        @Override
        public Observable<R> call() {
            try {
                return Observable.just(getFallback());
            } catch (Throwable ex) {
                return Observable.error(ex);
            }
        }
    });
}

HystrixCommand#getFallback throws UnsupportedOperationException by default, so callers have to override it to provide their own fallback logic.

protected R getFallback() {
    throw new UnsupportedOperationException("No fallback available.");
}
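
The interplay between this default and the error handling in handleFallbackError can be mimicked without Hystrix on the classpath. The sketch below is a deliberately simplified model: the Command class, its execute method, and the use of IllegalStateException in place of HystrixRuntimeException are all assumptions made for the example.

```java
// Hypothetical, simplified model of "run, then fall back"; not the Hystrix classes.
class FallbackOverrideDemo {

    static abstract class Command {
        abstract String run() throws Exception;

        // Same default as HystrixCommand#getFallback: no fallback unless overridden.
        String getFallback() {
            throw new UnsupportedOperationException("No fallback available.");
        }

        String execute() {
            try {
                return run();
            } catch (Exception original) {
                try {
                    return getFallback();
                } catch (UnsupportedOperationException noFallback) {
                    throw new IllegalStateException("failed and no fallback available.", original);
                } catch (RuntimeException fallbackError) {
                    throw new IllegalStateException("failed and fallback failed.", original);
                }
            }
        }
    }

    // A command that overrides getFallback returns the fallback value.
    static String withFallback() {
        Command command = new Command() {
            @Override
            String run() throws Exception {
                throw new Exception("remote call failed");
            }

            @Override
            String getFallback() {
                return "default value";
            }
        };
        return command.execute();
    }

    // A command without an override surfaces the "no fallback available" error.
    static String withoutFallback() {
        Command command = new Command() {
            @Override
            String run() throws Exception {
                throw new Exception("remote call failed");
            }
        };
        try {
            return command.execute();
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(withFallback());
        System.out.println(withoutFallback());
    }
}
```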

Summary

This article walked through several core Hystrix features together with their commonly used configuration:

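  • Cache: rarely used, so a rough understanding is enough.
  • Circuit breaking: controlled by a circuit breaker (HystrixCircuitBreaker) per CommandKey. Once the number of requests within the rolling window (metrics.rollingStats.timeInMilliseconds, default 10 s) reaches circuitBreaker.requestVolumeThreshold (default 20) and the error rate reaches or exceeds circuitBreaker.errorThresholdPercentage (default 50%), the circuit opens. After circuitBreaker.sleepWindowInMilliseconds (default 5 s), one trial request is let through (the breaker is half-open). If that request succeeds, the circuit closes; otherwise another trial request is allowed only after a further circuitBreaker.sleepWindowInMilliseconds.
  • Resource isolation: controlled per HystrixThreadPoolKey/HystrixCommandGroupKey. Thread isolation is the default. Timeouts are supported and enabled by default (execution.timeout.enabled=true) with a duration of 1 second (execution.isolation.thread.timeoutInMilliseconds=1000). It is recommended to keep execution.isolation.thread.timeoutInMilliseconds > ribbon.ReadTimeout + ribbon.ConnectTimeout.
  • Fallback: enabled by default (fallback.enabled=true), but you have to override HystrixCommand's getFallback method. By default, exceptions marked with ExceptionNotWrappedByHystrix do not enter the fallback logic, and neither do failures caused by unrecoverable errors such as OutOfMemoryError and StackOverflowError. Do not make network calls or perform other blocking operations in a fallback method, even though fallbacks are also limited by a semaphore (fallback.isolation.semaphore.maxConcurrentRequests=10).
  • Hystrix dynamic configuration: implemented by HystrixDynamicPropertiesArchaius, which underneath relies on the Spring and Netflix Archaius integration provided by spring-cloud-starter-netflix-archaius. This was mentioned earlier in the Feign source-code article: juejin.cn/post/687856… . Note that the thread pool setting maxQueueSize used for resource isolation cannot be changed dynamically; use queueSizeRejectionThreshold to adjust the effective queue limit at runtime.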
private final String configKey = "hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds";
@Test
public void testDynamicProperty() throws InterruptedException {
    DynamicProperty instance = DynamicProperty.getInstance(configKey);
    // hystrix.command.default.execution.isolation.thread.timeoutInMilliseconds:1000
    System.out.println(configKey + ":" + instance.getString());
}