Sentinel circuit breaking: a source code analysis


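On Monday my boss grilled me: how does Sentinel's circuit breaking work? I answered that it is based on a time window. He followed up: RT-based circuit breaking looks at which RT? The average RT, or something else? I was baffled and had no answer. He told me to go dig into it and get it straight, so this humble rookie had no choice but to go read the source. This article is based on version 1.8.0. First, let's look at the DegradeSlot class.
We start with its entry point, the entry method: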

@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,boolean prioritized, Object... args) throws Throwable {
    // Check this resource's circuit breakers; throws DegradeException if any of them rejects
    performChecking(context, resourceWrapper);
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}

void performChecking(Context context, ResourceWrapper r) throws BlockException {
    // Look up the circuit breakers configured for this resource; none means nothing to check
    List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
    if (circuitBreakers == null || circuitBreakers.isEmpty()) {
        return;
    }
    // Iterate over all circuit breakers; if any one rejects, the request is blocked
    for (CircuitBreaker cb : circuitBreakers) {
        if (!cb.tryPass(context)) {
            throw new DegradeException(cb.getRule().getLimitApp(), cb.getRule());
        }
    }
}
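To make the control flow of performChecking concrete, here is a minimal sketch with hypothetical stand-ins: `Breaker` plays the role of CircuitBreaker (only tryPass is modeled, without the Context argument) and `DegradedException` replaces DegradeException. The real DegradeException is a checked BlockException; the sketch uses an unchecked exception only to stay short.

```java
import java.util.List;

// Hypothetical stand-in for Sentinel's CircuitBreaker (only tryPass is modeled).
interface Breaker {
    boolean tryPass();
}

// Hypothetical stand-in for DegradeException (unchecked here for brevity).
class DegradedException extends RuntimeException {
    DegradedException(String msg) {
        super(msg);
    }
}

class DegradeCheckSketch {
    // Same shape as performChecking: no breakers -> pass;
    // otherwise every breaker must pass, and the first rejection blocks the request.
    static void performChecking(List<Breaker> breakers) {
        if (breakers == null || breakers.isEmpty()) {
            return;
        }
        for (int i = 0; i < breakers.size(); i++) {
            if (!breakers.get(i).tryPass()) {
                throw new DegradedException("blocked by breaker #" + i);
            }
        }
    }
}
```

So several degrade rules on one resource behave like a logical AND: a request gets through only if every breaker lets it pass.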

Continuing down, into tryPass:

@Override
public boolean tryPass(Context context) {
    // Template implementation.
    if (currentState.get() == State.CLOSED) {
        return true;
    }
    if (currentState.get() == State.OPEN) {
        // For half-open state we allow a request for probing.
        return retryTimeoutArrived() && fromOpenToHalfOpen(context);
    }
    return false;
}

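This method checks whether the current time has reached nextRetryTimestamp, i.e. whether the recovery period that started when the breaker opened has elapsed: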

protected boolean retryTimeoutArrived() {
    return TimeUtil.currentTimeMillis() >= nextRetryTimestamp;
}
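The retry check is plain timestamp arithmetic. The sketch below assumes, as I read AbstractCircuitBreaker in 1.8.0, that the recovery timeout is the rule's timeWindow (in seconds) times 1000, and that nextRetryTimestamp is set to now + recoveryTimeoutMs whenever the breaker opens. The class name and the explicit time parameter (instead of TimeUtil.currentTimeMillis()) are hypothetical.

```java
// Minimal sketch of how nextRetryTimestamp gates the OPEN -> HALF_OPEN transition.
class RetryTimeoutSketch {
    private final long recoveryTimeoutMs;   // assumption: rule timeWindow (seconds) * 1000
    private long nextRetryTimestamp;

    RetryTimeoutSketch(int timeWindowSeconds) {
        this.recoveryTimeoutMs = timeWindowSeconds * 1000L;
    }

    // Called whenever the breaker moves to OPEN.
    void onOpen(long nowMillis) {
        nextRetryTimestamp = nowMillis + recoveryTimeoutMs;
    }

    // Same comparison as retryTimeoutArrived(): ">=" so the boundary instant counts as arrived.
    boolean retryTimeoutArrived(long nowMillis) {
        return nowMillis >= nextRetryTimestamp;
    }

    long nextRetryTimestamp() {
        return nextRetryTimestamp;
    }
}
```

For example, a breaker that opens at t = 1000 ms with a 5 s time window rejects everything until t = 6000 ms and only then may let a probe through.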

Switching to half-open:

protected boolean fromOpenToHalfOpen(Context context) {
    if (currentState.compareAndSet(State.OPEN, State.HALF_OPEN)) {
        notifyObservers(State.OPEN, State.HALF_OPEN, null);
        Entry entry = context.getCurEntry();
        entry.whenTerminate(new BiConsumer<Context, Entry>() {
            @Override
            public void accept(Context context, Entry entry) {
                // Note: This works as a temporary workaround for https://github.com/alibaba/Sentinel/issues/1638
                // Without the hook, the circuit breaker won't recover from half-open state in some circumstances
                // when the request is actually blocked by upcoming rules (not only degrade rules).
                if (entry.getBlockError() != null) {
                    // Fallback to OPEN due to detecting request is blocked
                    currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
                    notifyObservers(State.HALF_OPEN, State.OPEN, 1.0d);
                }
            }
        });
        return true;
    }
    return false;
}
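The compareAndSet is what guarantees that only one request becomes the probe: when many threads see OPEN at the same moment, exactly one of them wins the OPEN to HALF_OPEN swap. Here is a minimal sketch with a hypothetical class name, where the whenTerminate callback is modeled as a plain method call:

```java
import java.util.concurrent.atomic.AtomicReference;

// Minimal sketch of fromOpenToHalfOpen's CAS and the whenTerminate fallback.
class HalfOpenProbeSketch {
    enum State { OPEN, HALF_OPEN, CLOSED }

    final AtomicReference<State> currentState = new AtomicReference<>(State.OPEN);

    // Only the caller that actually swaps OPEN -> HALF_OPEN gets true (becomes the probe).
    boolean fromOpenToHalfOpen() {
        return currentState.compareAndSet(State.OPEN, State.HALF_OPEN);
    }

    // Mirrors the whenTerminate hook: if the probe was blocked by a later rule,
    // fall back to OPEN so the breaker does not stay stuck in HALF_OPEN.
    void onProbeTerminated(boolean blockedByLaterRule) {
        if (blockedByLaterRule) {
            currentState.compareAndSet(State.HALF_OPEN, State.OPEN);
        }
    }
}
```

Using CAS instead of a lock fits the half-open semantics: the threads that lose the race are simply rejected (tryPass returns false), which is exactly what a single-probe policy wants.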

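Here there appear to be two states, open and closed. After digging further, I found there are actually three states:

  1. OPEN: the circuit breaker is open (circuit breaking is in effect and requests are rejected)
  2. CLOSED: the circuit breaker is closed (no circuit breaking; requests pass)
  3. HALF_OPEN: the breaker was OPEN and its recovery time window has now elapsed, so this one request is let through as a probe and the breaker is set to half-open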
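The three states and the moves between them fit in a small transition table. This is a sketch with a hypothetical enum name, built from the transitions that appear in this article: transformToOpen (from CLOSED), fromOpenToHalfOpen, fromHalfOpenToOpen and fromHalfOpenToClose.

```java
import java.util.EnumSet;
import java.util.Set;

// Sketch of the circuit breaker states and their legal transitions.
enum BreakerState {
    CLOSED, OPEN, HALF_OPEN;

    // Legal next states from this state.
    Set<BreakerState> next() {
        switch (this) {
            case CLOSED:    return EnumSet.of(OPEN);          // threshold exceeded
            case OPEN:      return EnumSet.of(HALF_OPEN);     // recovery time elapsed, one probe
            case HALF_OPEN: return EnumSet.of(OPEN, CLOSED);  // probe slow or blocked, or probe OK
            default:        throw new AssertionError(this);
        }
    }

    boolean canMoveTo(BreakerState target) {
        return next().contains(target);
    }
}
```

Note that OPEN never goes straight back to CLOSED: recovery always passes through a successful half-open probe.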

The entry method is fairly simple. Now let's look at the exit method:

public void exit(Context context, ResourceWrapper r, int count, Object... args) {
    // If the request was blocked (a BlockException is recorded on the entry), skip the statistics
    Entry curEntry = context.getCurEntry();
    if (curEntry.getBlockError() != null) {
        fireExit(context, r, count, args);
        return;
    }
    // No circuit breakers for this resource: nothing to record
    List<CircuitBreaker> circuitBreakers = DegradeRuleManager.getCircuitBreakers(r.getName());
    if (circuitBreakers == null || circuitBreakers.isEmpty()) {
        fireExit(context, r, count, args);
        return;
    }
    // The request passed: let every circuit breaker record its completion
    if (curEntry.getBlockError() == null) {
        // passed request
        for (CircuitBreaker circuitBreaker : circuitBreakers) {
            circuitBreaker.onRequestComplete(context);
        }
    }

    fireExit(context, r, count, args);
}
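The rule in exit boils down to this: blocked requests never reach the circuit breaker statistics, only passed ones do. A sketch with hypothetical types (`Recorder`, `ExitEntry`) standing in for CircuitBreaker and Entry; the real slot always calls fireExit afterwards, which the sketch leaves out.

```java
import java.util.List;

// Hypothetical stand-in for the CircuitBreaker's completion callback.
interface Recorder {
    void onRequestComplete(long rtMillis);
}

// Hypothetical stand-in for Entry: whether it was blocked, and its RT.
class ExitEntry {
    final boolean blocked;
    final long rtMillis;

    ExitEntry(boolean blocked, long rtMillis) {
        this.blocked = blocked;
        this.rtMillis = rtMillis;
    }
}

class DegradeExitSketch {
    // Returns how many breakers were notified (0 when the request was blocked or none exist).
    static int exit(ExitEntry entry, List<Recorder> breakers) {
        if (entry.blocked || breakers == null || breakers.isEmpty()) {
            return 0;
        }
        for (Recorder r : breakers) {
            r.onRequestComplete(entry.rtMillis);
        }
        return breakers.size();
    }
}
```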

This part is also straightforward; the key is the onRequestComplete method. CircuitBreaker is an interface with two implementations, ResponseTimeCircuitBreaker and ExceptionCircuitBreaker. Here we focus on ResponseTimeCircuitBreaker:

private final LeapArray<SlowRequestCounter> slidingCounter;
@Override
public void onRequestComplete(Context context) {
    // This is the crucial part!!!
    SlowRequestCounter counter = slidingCounter.currentWindow().value();
    Entry entry = context.getCurEntry();
    if (entry == null) {
        return;
    }
    long completeTime = entry.getCompleteTimestamp();
    if (completeTime <= 0) {
        completeTime = TimeUtil.currentTimeMillis();
    }
    long rt = completeTime - entry.getCreateTimestamp();
    if (rt > maxAllowedRt) {
        counter.slowCount.add(1);
    }
    counter.totalCount.add(1);

    handleStateChangeWhenThresholdExceeded(rt);
}
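The RT calculation itself is easy to check in isolation. A minimal sketch with a hypothetical class name, taking the timestamps as parameters; note the strict `>`: a request whose RT is exactly maxAllowedRt is not counted as slow.

```java
// Minimal sketch of the per-request RT check in onRequestComplete.
class RtCheckSketch {
    // completeTimestamp <= 0 means "not recorded", so the current time is used instead.
    static long computeRt(long createTimestamp, long completeTimestamp, long nowMillis) {
        long completeTime = completeTimestamp > 0 ? completeTimestamp : nowMillis;
        return completeTime - createTimestamp;
    }

    // Strictly greater: RT == maxAllowedRt is NOT slow.
    static boolean isSlow(long rt, long maxAllowedRt) {
        return rt > maxAllowedRt;
    }
}
```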

Some readers may wonder how this relates to the RT circuit breaking configuration. Let's step back and look at DegradeRuleManager's newCircuitBreakerFrom method:

private static CircuitBreaker newCircuitBreakerFrom(/*@Valid*/ DegradeRule rule) {
    switch (rule.getGrade()) {
    // RT (slow request ratio) mode
        case RuleConstant.DEGRADE_GRADE_RT:
            return new ResponseTimeCircuitBreaker(rule);
        case RuleConstant.DEGRADE_GRADE_EXCEPTION_RATIO:
        case RuleConstant.DEGRADE_GRADE_EXCEPTION_COUNT:
            return new ExceptionCircuitBreaker(rule);
        default:
            return null;
    }
}
public ResponseTimeCircuitBreaker(DegradeRule rule) {
    this(rule, new SlowRequestLeapArray(1, rule.getStatIntervalMs()));
}
public SlowRequestLeapArray(int sampleCount, int intervalInMs) {
    super(sampleCount, intervalInMs);
}
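In other words, the rule's grade decides which implementation is built. A sketch of that dispatch; the constant values 0, 1 and 2 mirror RuleConstant.DEGRADE_GRADE_RT, DEGRADE_GRADE_EXCEPTION_RATIO and DEGRADE_GRADE_EXCEPTION_COUNT, and returning class names as strings is purely illustrative.

```java
// Sketch of newCircuitBreakerFrom's dispatch on the rule grade.
class BreakerFactorySketch {
    static final int DEGRADE_GRADE_RT = 0;                // slow request ratio (RT) strategy
    static final int DEGRADE_GRADE_EXCEPTION_RATIO = 1;
    static final int DEGRADE_GRADE_EXCEPTION_COUNT = 2;

    static String breakerFor(int grade) {
        switch (grade) {
            case DEGRADE_GRADE_RT:
                return "ResponseTimeCircuitBreaker";
            case DEGRADE_GRADE_EXCEPTION_RATIO:
            case DEGRADE_GRADE_EXCEPTION_COUNT:
                return "ExceptionCircuitBreaker";   // both exception strategies share one class
            default:
                return null;                        // unknown grade: no breaker is created
        }
    }
}
```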

So what is this LeapArray? It is Sentinel's sliding-window data structure, and the circuit breaker uses it to collect its statistics:

public abstract class LeapArray<T> {
    protected int windowLengthInMs;
    protected int sampleCount;
    protected int intervalInMs;

    protected final AtomicReferenceArray<WindowWrap<T>> array;

    public LeapArray(int sampleCount, int intervalInMs) {
        AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
        AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
        AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");
        // Length of each bucket (sub-window) in ms
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.sampleCount = sampleCount;
        this.array = new AtomicReferenceArray<>(sampleCount);
    }

    // ... remaining members omitted
}

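That is the whole instantiation process of the circuit breaker. intervalInMs is the statistics interval set on the rule (statIntervalMs, 1000 ms by default), and sampleCount is fixed at 1, so in RT mode a single bucket spans the whole intervalInMs.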
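The constructor arithmetic can be checked with a quick sketch (hypothetical class; it throws IllegalArgumentException where LeapArray calls AssertUtil.isTrue). With sampleCount = 1 and a 1000 ms interval the single bucket is 1000 ms long, while a window with 2 samples over the same interval would have 500 ms buckets.

```java
// Sketch of LeapArray's constructor arithmetic: validate, then split the interval into buckets.
class WindowLayoutSketch {
    final int windowLengthInMs;
    final int sampleCount;
    final int intervalInMs;

    WindowLayoutSketch(int sampleCount, int intervalInMs) {
        if (sampleCount <= 0) {
            throw new IllegalArgumentException("bucket count is invalid: " + sampleCount);
        }
        if (intervalInMs <= 0) {
            throw new IllegalArgumentException("total time interval of the sliding window should be positive");
        }
        if (intervalInMs % sampleCount != 0) {
            throw new IllegalArgumentException("time span needs to be evenly divided");
        }
        this.windowLengthInMs = intervalInMs / sampleCount;  // length of one bucket
        this.sampleCount = sampleCount;
        this.intervalInMs = intervalInMs;
    }
}
```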
Now let's look at the currentWindow() method:

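public WindowWrap<T> currentWindow(long timeMillis) {
    // timeMillis is the current timestamp passed in by the caller
    if (timeMillis < 0) {
        return null;
    }
    // Index of the bucket this timestamp maps to
    int idx = calculateTimeIdx(timeMillis);
    // Start time of the window this timestamp belongs to
    long windowStart = calculateWindowStart(timeMillis);
    while (true) {
        WindowWrap<T> old = array.get(idx);
        if (old == null) {
            // The bucket is empty: create a new one
            WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            // CAS it into the empty slot; only one thread succeeds, the others yield and retry
            if (array.compareAndSet(idx, null, window)) {
                return window;
            } else {
                Thread.yield();
            }
        } else if (windowStart == old.windowStart()) {
            // The bucket exists and belongs to the current window: reuse it
            return old;
        } else if (windowStart > old.windowStart()) {
            // The bucket is stale (left over from an earlier window): reset it under the lock
            if (updateLock.tryLock()) {
                try {
                    // Reset the window start time and clear the bucket's data
                    return resetWindowTo(old, windowStart);
                } finally {
                    updateLock.unlock();
                }
            } else {
                Thread.yield();
            }
        } else if (windowStart < old.windowStart()) {
            // The timestamp is earlier than the bucket's start (should not normally happen):
            // return a fresh bucket that is not stored in the array
            return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
        }
    }
}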
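To see how a timestamp maps to a bucket, here is a single-threaded sketch of currentWindow. It assumes LeapArray's index and start formulas, idx = (timeMillis / windowLengthInMs) % array.length and windowStart = timeMillis - timeMillis % windowLengthInMs (calculateTimeIdx and calculateWindowStart are not shown above), drops the CAS and updateLock, and stores a bare long count per bucket. The class name is hypothetical.

```java
// Single-threaded sketch of LeapArray.currentWindow with a long counter per bucket.
class SlidingWindowSketch {
    static final class Bucket {
        long windowStart;
        long count;

        Bucket(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int windowLengthInMs;
    private final Bucket[] array;

    SlidingWindowSketch(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new Bucket[sampleCount];
    }

    int calculateTimeIdx(long timeMillis) {
        long timeId = timeMillis / windowLengthInMs;
        return (int) (timeId % array.length);
    }

    long calculateWindowStart(long timeMillis) {
        return timeMillis - timeMillis % windowLengthInMs;
    }

    Bucket currentWindow(long timeMillis) {
        int idx = calculateTimeIdx(timeMillis);
        long windowStart = calculateWindowStart(timeMillis);
        Bucket old = array[idx];
        if (old == null) {                              // empty slot: create a bucket
            array[idx] = new Bucket(windowStart);
            return array[idx];
        } else if (windowStart == old.windowStart) {    // same window: reuse
            return old;
        } else if (windowStart > old.windowStart) {     // stale bucket: reset in place
            old.windowStart = windowStart;
            old.count = 0;
            return old;
        } else {                                        // time went backwards: throwaway bucket
            return new Bucket(windowStart);
        }
    }
}
```

For example, with two 500 ms buckets, timestamps 1234 and 1400 both land in bucket 0 with window start 1000, while 2100 maps to bucket 0 again but with window start 2000, so the stale bucket is reset before use.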

Now let's go back to onRequestComplete:

@Override
public void onRequestComplete(Context context) {
    // Get the bucket for the current time window
    SlowRequestCounter counter = slidingCounter.currentWindow().value();
    Entry entry = context.getCurEntry();
    if (entry == null) {
        return;
    }
    // Completion time of the request
    long completeTime = entry.getCompleteTimestamp();
    // Fall back to the current time if no completion time was recorded
    if (completeTime <= 0) {
        completeTime = TimeUtil.currentTimeMillis();
    }
    // RT = completion time - creation time
    long rt = completeTime - entry.getCreateTimestamp();
    // Slower than the configured maximum RT
    if (rt > maxAllowedRt) {
        // slow + 1
        counter.slowCount.add(1);
    }
    // total + 1
    counter.totalCount.add(1);
    // Possibly change the circuit breaker state
    handleStateChangeWhenThresholdExceeded(rt);
}
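Each bucket's value is a SlowRequestCounter holding slowCount and totalCount. Here is a sketch of such a counter using the JDK's java.util.concurrent.atomic.LongAdder, which provides the same add/sum calls seen in the source; the class name and the record helper are illustrative, not Sentinel's API.

```java
import java.util.concurrent.atomic.LongAdder;

// Sketch of the per-bucket counter updated by onRequestComplete.
class SlowCounterSketch {
    final LongAdder slowCount = new LongAdder();
    final LongAdder totalCount = new LongAdder();

    // Every completed request bumps totalCount; only those with rt > maxAllowedRt bump slowCount.
    void record(long rt, long maxAllowedRt) {
        if (rt > maxAllowedRt) {
            slowCount.add(1);
        }
        totalCount.add(1);
    }

    // Clears both counters, as needed when a stale bucket is reused for a new window.
    void reset() {
        slowCount.reset();
        totalCount.reset();
    }
}
```

LongAdder suits this write-heavy pattern: many request threads add concurrently, and sum() is only read when the breaker evaluates its state.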

Now let's look at how the circuit breaker state is updated:

private void handleStateChangeWhenThresholdExceeded(long rt) {
    // Already OPEN: nothing to do
    if (currentState.get() == State.OPEN) {
        return;
    }

    if (currentState.get() == State.HALF_OPEN) {
        // In detecting request
        // TODO: improve logic for half-open recovery
        if (rt > maxAllowedRt) {
            // HALF_OPEN: this probe's RT exceeds the configured maximum, so go back to OPEN
            // and keep breaking until the time window elapses again
            fromHalfOpenToOpen(1.0d);
        } else {
            // Otherwise the probe succeeded: close the circuit breaker
            fromHalfOpenToClose();
        }
        return;
    }
    // Sum the counters of all buckets in the sliding window
    List<SlowRequestCounter> counters = slidingCounter.values();
    long slowCount = 0;
    long totalCount = 0;
    for (SlowRequestCounter counter : counters) {
        slowCount += counter.slowCount.sum();
        totalCount += counter.totalCount.sum();
    }
    // Fewer requests than the configured minimum: do not trip
    if (totalCount < minRequestAmount) {
        return;
    }
    // Compute the slow request ratio; open the circuit breaker if it exceeds the configured ratio
    double currentRatio = slowCount * 1.0d / totalCount;
    if (currentRatio > maxSlowRequestRatio) {
        transformToOpen(currentRatio);
    }
}
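To make the difference from an average concrete, here is a sketch of the CLOSED-state decision above next to a plain average RT. The class name is hypothetical, and the RTs of the window's requests are passed as an array instead of being read from buckets. With a 200 ms limit, a single 1000 ms outlier among five requests pushes the average to 208 ms, yet the slow ratio is only 0.2, so the breaker stays closed.

```java
// Minimal sketch of the CLOSED-state slow-ratio decision, plus an average RT for comparison.
class SlowRatioDecisionSketch {
    static boolean shouldOpen(long[] rts, long maxAllowedRt, double maxSlowRequestRatio, int minRequestAmount) {
        long slowCount = 0;
        long totalCount = rts.length;
        for (long rt : rts) {
            if (rt > maxAllowedRt) {
                slowCount++;
            }
        }
        if (totalCount < minRequestAmount) {
            return false;                       // not enough samples to judge
        }
        double currentRatio = slowCount * 1.0d / totalCount;
        return currentRatio > maxSlowRequestRatio;
    }

    // Not used by the breaker; only here to contrast with the slow ratio.
    static double averageRt(long[] rts) {
        long sum = 0;
        for (long rt : rts) {
            sum += rt;
        }
        return rts.length == 0 ? 0 : (double) sum / rts.length;
    }
}
```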

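That's it for RT-based circuit breaking. To answer the original question: it does not use the average RT. Each request's own RT is compared with the configured maximum RT to decide whether it is slow, and the breaker opens when the statistics window (statIntervalMs) holds at least minRequestAmount requests and the ratio of slow requests exceeds the configured ratio. If anything here is wrong, please point it out. Thanks, everyone!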