Sentinel Series, Part 7: Flow-Control Source Code

Flow-Limit Check

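  In Part 5 we analyzed the groundwork for flow control in detail: building the chain of responsibility (the slot chain). Sentinel implements flow control, circuit breaking and related features through the individual slots in this chain. The overall flow is shown in the figure below: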

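  Next, we analyze the core source code of the flow-limit check in the slot, together with the sliding window. Continuing from the previous article: once the slot chain has been built, execution reaches FlowSlot#entry(), the method that checks and applies the flow rules: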

@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count, boolean prioritized, Object... args) throws Throwable {
    // Check and apply the flow rules
    checkFlow(resourceWrapper, context, node, count, prioritized);
    // Trigger the next slot
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}

void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized)
    throws BlockException {
    checker.checkFlow(ruleProvider, resource, context, node, count, prioritized);
}
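The pattern in FlowSlot#entry() (check first, then call fireEntry() to reach the next slot) is a plain chain of responsibility. The sketch below illustrates that shape with hypothetical classes (SketchSlot, TracingSlot, ThresholdSlot, SketchBlockException). None of these are Sentinel's real slot API. The exception is unchecked here for brevity, whereas Sentinel's BlockException is a checked exception.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for Sentinel's BlockException (unchecked here for brevity).
class SketchBlockException extends RuntimeException {
    SketchBlockException(String message) {
        super(message);
    }
}

// Hypothetical base slot: each slot holds a reference to the next one.
abstract class SketchSlot {
    private SketchSlot next;

    // Links this slot to the next one and returns the next slot so calls can be chained.
    SketchSlot setNext(SketchSlot next) {
        this.next = next;
        return next;
    }

    abstract void entry(String resource, int count, List<String> trace);

    // Hands control to the next slot, if there is one.
    protected void fireEntry(String resource, int count, List<String> trace) {
        if (next != null) {
            next.entry(resource, count, trace);
        }
    }
}

// A slot that only records that it was visited, then continues the chain.
class TracingSlot extends SketchSlot {
    private final String name;

    TracingSlot(String name) {
        this.name = name;
    }

    @Override
    void entry(String resource, int count, List<String> trace) {
        trace.add(name);
        fireEntry(resource, count, trace);
    }
}

// Same shape as FlowSlot#entry(): check first, and fire the next slot only on success.
class ThresholdSlot extends SketchSlot {
    private final int maxCount;

    ThresholdSlot(int maxCount) {
        this.maxCount = maxCount;
    }

    @Override
    void entry(String resource, int count, List<String> trace) {
        if (count > maxCount) {
            // Throwing here means the slots after this one are never reached.
            throw new SketchBlockException("blocked: " + resource);
        }
        trace.add("threshold");
        fireEntry(resource, count, trace);
    }
}
```

A request with count 3 visits "first", "threshold" and "last" in order. A request with count 10 is rejected by ThresholdSlot, so only "first" is recorded before the exception propagates.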

Source of checkFlow():

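  This method checks whether the request passes every flow rule configured for the given resource. If any rule fails, a FlowException is thrown, which means the request is blocked, and the remaining rules are not applied. The detailed check for each rule is done in canPassCheck().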

FlowRuleChecker#checkFlow()

public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider, ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
    if (ruleProvider == null || resource == null) {
        return;
    }
    // Get all flow rules for the given resource
    Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
    if (rules != null) {
        // Apply the rules one by one. If a rule cannot be passed, throw an exception;
        // the remaining rules are not applied.
        for (FlowRule rule : rules) {
            if (!canPassCheck(rule, context, node, count, prioritized)) {
                throw new FlowException(rule.getLimitApp(), rule);
            }
        }
    }
}
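The early-exit behaviour of checkFlow() can be sketched with hypothetical types (SketchRule, SketchFlowException, SketchRuleChecker). The rule carries only a name and a threshold, and canPassCheck() is reduced to a simple count comparison. The sketch shows the loop structure only, not Sentinel's actual rule model.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified rule: only a name and a threshold.
final class SketchRule {
    final String name;
    final int threshold;

    SketchRule(String name, int threshold) {
        this.name = name;
        this.threshold = threshold;
    }
}

// Unchecked stand-in for Sentinel's FlowException, remembering which rule blocked.
class SketchFlowException extends RuntimeException {
    final SketchRule rule;

    SketchFlowException(SketchRule rule) {
        super("blocked by rule " + rule.name);
        this.rule = rule;
    }
}

final class SketchRuleChecker {
    // Records which rules were actually evaluated, to make the short-circuit visible.
    final List<String> evaluated = new ArrayList<>();

    // Same shape as FlowRuleChecker#checkFlow(): apply rules in order and throw
    // on the first failure, so later rules are never evaluated.
    void checkFlow(List<SketchRule> rules, int current, int acquireCount) {
        if (rules == null) {
            return;
        }
        for (SketchRule rule : rules) {
            if (!canPassCheck(rule, current, acquireCount)) {
                throw new SketchFlowException(rule);
            }
        }
    }

    // Simplified per-rule check: current usage plus the request must stay within the threshold.
    private boolean canPassCheck(SketchRule rule, int current, int acquireCount) {
        evaluated.add(rule.name);
        return current + acquireCount <= rule.threshold;
    }
}
```

With a current count of 5 and acquireCount 1, rule a (threshold 100) passes, rule b (threshold 5) fails and throws, and rule c is never evaluated.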

  Next, let's step into canPassCheck():

public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount, boolean prioritized) {
    String limitApp = rule.getLimitApp();
    if (limitApp == null) {
        return true;
    }

    if (rule.isClusterMode()) {
        return passClusterCheck(rule, context, node, acquireCount, prioritized);
    }

    return passLocalCheck(rule, context, node, acquireCount, prioritized);
}

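  canPassCheck() first lets the request through if the rule has no limitApp. It then checks whether the rule is in cluster mode (rule.isClusterMode()). If it is, the check is delegated to passClusterCheck(). Otherwise it goes to passLocalCheck():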

private static boolean passClusterCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount, boolean prioritized) {
    try {
        TokenService clusterService = pickClusterService();
        if (clusterService == null) {
            // No token service (client/server) available: fall back to the local check, or pass
            return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
        }
        long flowId = rule.getClusterConfig().getFlowId();
        TokenResult result = clusterService.requestToken(flowId, acquireCount, prioritized);
        return applyTokenResult(result, rule, context, node, acquireCount, prioritized);
        // If client is absent, then fallback to local mode.
    } catch (Throwable ex) {
        RecordLog.warn("[FlowRuleChecker] Request cluster token unexpected failed", ex);
    }
    // Fallback to local flow control when token client or server for this rule is not available.
    // If fallback is not enabled, then directly pass.
    return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
}
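The cluster branch and its fallback can be sketched as follows. This is a simplified model with several assumptions. SketchTokenService, SketchClusterRule and SketchClusterChecker are hypothetical. The token service returns a plain boolean instead of Sentinel's TokenResult. The local check is a simple count comparison. What mirrors the source is the control flow. If there is no token service, or the request fails, the code ends up in fallbackToLocalOrPass(). That method either runs the local check or lets the request through when fallback is disabled.

```java
// Hypothetical token service: returns true if the token server grants the tokens.
// (Sentinel's TokenService returns a TokenResult with a status; simplified here.)
interface SketchTokenService {
    boolean requestToken(long flowId, int acquireCount);
}

// Hypothetical rule with only the fields this sketch needs.
final class SketchClusterRule {
    final boolean clusterMode;
    final boolean fallbackToLocalWhenFail;
    final long flowId;
    final int localThreshold;

    SketchClusterRule(boolean clusterMode, boolean fallbackToLocalWhenFail, long flowId, int localThreshold) {
        this.clusterMode = clusterMode;
        this.fallbackToLocalWhenFail = fallbackToLocalWhenFail;
        this.flowId = flowId;
        this.localThreshold = localThreshold;
    }
}

final class SketchClusterChecker {
    // Dispatch like canPassCheck(): cluster rules ask the token server, others stay local.
    static boolean canPassCheck(SketchClusterRule rule, SketchTokenService service,
                                int localCount, int acquireCount) {
        if (rule.clusterMode) {
            return passClusterCheck(rule, service, localCount, acquireCount);
        }
        return passLocalCheck(rule, localCount, acquireCount);
    }

    static boolean passClusterCheck(SketchClusterRule rule, SketchTokenService service,
                                    int localCount, int acquireCount) {
        try {
            if (service == null) {
                // No token service available: fall back to local, or pass.
                return fallbackToLocalOrPass(rule, localCount, acquireCount);
            }
            return service.requestToken(rule.flowId, acquireCount);
        } catch (RuntimeException ex) {
            // The real code logs a warning here; the sketch simply falls through.
        }
        // Token request failed: fall back to local, or pass.
        return fallbackToLocalOrPass(rule, localCount, acquireCount);
    }

    static boolean fallbackToLocalOrPass(SketchClusterRule rule, int localCount, int acquireCount) {
        if (rule.fallbackToLocalWhenFail) {
            return passLocalCheck(rule, localCount, acquireCount);
        }
        // Fallback disabled: let the request through.
        return true;
    }

    // Simplified local check: current count plus requested count must stay within the threshold.
    static boolean passLocalCheck(SketchClusterRule rule, int localCount, int acquireCount) {
        return localCount + acquireCount <= rule.localThreshold;
    }
}
```

When the token server is unreachable and fallback is enabled, an over-threshold request is still blocked locally. When fallback is disabled, the same request passes, which matches the "If fallback is not enabled, then directly pass" comment in the source.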

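  This code shows what happens when cluster flow control is unavailable, that is, when no token service can be obtained or the token request throws. In that case Sentinel calls fallbackToLocalOrPass(). If fallback is enabled for the rule, it runs the local check; otherwise it lets the request pass. So let's follow the local path first. passLocalCheck() hands the selected node to the rule's traffic-shaping controller, and with the default control behavior this ends up in DefaultController#canPass(). Here is the source: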

public boolean canPass(Node node, int acquireCount, boolean prioritized) {
    int curCount = avgUsedTokens(node);
    // Decide whether the request must be limited
    if (curCount + acquireCount > count) {
        // Priority wait handling, not covered in this article
        if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
            long currentTime;
            long waitInMs;
            currentTime = TimeUtil.currentTimeMillis();
            waitInMs = node.tryOccupyNext(currentTime, acquireCount, count);
            if (waitInMs < OccupyTimeoutProperty.getOccupyTimeout()) {
                node.addWaitingRequest(currentTime + waitInMs, acquireCount);
                node.addOccupiedPass(acquireCount);
                sleep(waitInMs);

                // PriorityWaitException indicates that the request will pass after waiting for {@link @waitInMs}.
                throw new PriorityWaitException(waitInMs);
            }
        }
        return false;
    }
    return true;
}

  First, focus on this line: int curCount = avgUsedTokens(node). It reads statistics from the node, which is the data we recorded earlier with the sliding window. Stepping into it:

private int avgUsedTokens(Node node) {
    if (node == null) {
        return DEFAULT_AVG_USED_TOKENS;
    }
    return grade == RuleConstant.FLOW_GRADE_THREAD ? node.curThreadNum() : (int)(node.passQps());
}

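  Depending on the rule's grade, it asks the node for either curThreadNum() (the current number of threads) or passQps(). Here we focus on QPS, that is, how many requests are allowed to pass per second. Following passQps() leads into the StatisticNode class: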

@Override
public double passQps() {
    return rollingCounterInSecond.pass() / rollingCounterInSecond.getWindowIntervalInSec();
}

  rollingCounterInSecond is the ArrayMetric counter we saw earlier. Stepping into its pass() method:

@Override
public long pass() {
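    // Get the current sample window (bucket), refreshing it if it has expired
    data.currentWindow();
    long pass = 0;
    // Get all valid sample windows of the sliding window
    List<MetricBucket> list = data.values();

    for (MetricBucket window : list) {
        // Accumulate the pass count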
        pass += window.pass();
    }
    return pass;
}

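  This code calls LeapArray's values() method to obtain all the valid sample windows (buckets) in the current sliding window. Note the word valid: a bucket is deprecated, that is invalid, once it no longer falls within the current time span. The sliding-window article earlier in this series covers this in detail. values() delegates to values(long timeMillis), passing in the current time: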

public List<T> values(long timeMillis) {
    if (timeMillis < 0) {
        return new ArrayList<T>();
    }
    int size = array.length();
    List<T> result = new ArrayList<T>(size);

    for (int i = 0; i < size; i++) {
        WindowWrap<T> windowWrap = array.get(i);
        if (windowWrap == null || isWindowDeprecated(timeMillis, windowWrap)) {
            continue;
        }
        result.add(windowWrap.value());
    }
    return result;
}
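The bucket bookkeeping behind values() can be sketched as a small single-threaded class. It is called SketchLeapArray here, and it and its nested Window are hypothetical. The index and window-start arithmetic follow LeapArray, and the deprecation test is the same "older than one full interval" criterion. The sketch assumes timestamps never go backwards and ignores thread safety, which the real class handles with an AtomicReferenceArray and CAS/locking.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, single-threaded sliding window in the spirit of LeapArray (not thread-safe).
final class SketchLeapArray {
    // One sample window (bucket): its start time and its pass count.
    static final class Window {
        final long windowStart;
        long pass;

        Window(long windowStart) {
            this.windowStart = windowStart;
        }
    }

    private final int sampleCount;
    private final long intervalInMs;
    private final long windowLengthInMs;
    private final Window[] array;

    SketchLeapArray(int sampleCount, long intervalInMs) {
        this.sampleCount = sampleCount;
        this.intervalInMs = intervalInMs;
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.array = new Window[sampleCount];
    }

    // Returns the bucket covering timeMillis; a stale bucket in the same slot is replaced.
    // Assumes timeMillis never decreases between calls.
    Window currentWindow(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % sampleCount);
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        Window w = array[idx];
        if (w == null || w.windowStart != windowStart) {
            w = new Window(windowStart);
            array[idx] = w;
        }
        return w;
    }

    void addPass(long timeMillis, int n) {
        currentWindow(timeMillis).pass += n;
    }

    // A bucket is deprecated once more than one full interval has passed since it started.
    boolean isWindowDeprecated(long timeMillis, Window w) {
        return timeMillis - w.windowStart > intervalInMs;
    }

    // Mirrors values(timeMillis): skip empty slots and deprecated buckets.
    List<Window> values(long timeMillis) {
        List<Window> result = new ArrayList<>(array.length);
        if (timeMillis < 0) {
            return result;
        }
        for (Window w : array) {
            if (w == null || isWindowDeprecated(timeMillis, w)) {
                continue;
            }
            result.add(w);
        }
        return result;
    }
}
```

With 2 buckets over 1000 ms, passes recorded at 100 ms and 600 ms are both valid at 900 ms. At 1100 ms the bucket that started at 0 ms is older than the interval, so values() skips it, even though it still sits in the array until the next currentWindow() call for that slot overwrites it.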

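  The argument is the current time. The method iterates over every bucket in the LeapArray and uses isWindowDeprecated() to decide whether each bucket is still valid at that time. A bucket is deprecated once more than intervalInMs has passed since its windowStart. The method returns only the valid buckets. Back in ArrayMetric, the pass counts of all these buckets are added up: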

for (MetricBucket window : list) {
    // Accumulate the pass count
    pass += window.pass();
}
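Putting ArrayMetric#pass() and StatisticNode#passQps() together, the QPS fed into the flow check is the sum of the valid buckets divided by the window length in seconds. A minimal sketch, with a hypothetical SketchQps class that takes the per-bucket pass counts as a plain list:

```java
import java.util.List;

final class SketchQps {
    // Like ArrayMetric#pass(): add up the pass counts of all valid buckets.
    static long pass(List<Long> validBucketPasses) {
        long pass = 0;
        for (long bucketPass : validBucketPasses) {
            pass += bucketPass;
        }
        return pass;
    }

    // Like StatisticNode#passQps(): total passes divided by the window length in seconds.
    static double passQps(List<Long> validBucketPasses, double windowIntervalInSec) {
        return pass(validBucketPasses) / windowIntervalInSec;
    }
}
```

For two valid buckets holding 3 and 5 passes in a 1-second window, passQps is 8.0. The same counts spread over a 2-second window would give 4.0.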

  Now look back at the first snippet, canPass():

public boolean canPass(Node node, int acquireCount, boolean prioritized) {
    int curCount = avgUsedTokens(node);
    // Decide whether the request must be limited
    if (curCount + acquireCount > count) {
        // Priority wait handling, not covered in this article
        if (prioritized && grade == RuleConstant.FLOW_GRADE_QPS) {
            // ... part of the source omitted; may throw new PriorityWaitException(waitInMs)
        }
        return false;
    }
    return true;
}

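  We have now covered the most important step: obtaining curCount, the current pass count, which is the key input for deciding whether to limit. The method then checks whether curCount + acquireCount exceeds count, the rule's threshold (the maximum QPS for a QPS rule). If it does, it returns false; otherwise it returns true. The priority-wait code is omitted here.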
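The core decision in canPass(), without the priority-wait branch, can be sketched as follows. SketchGrade, SketchNodeStats and SketchDefaultController are hypothetical stand-ins for the rule grade, the node's statistics and the controller. In this sketch a missing node counts as 0 used tokens.

```java
// Hypothetical grade: limit by concurrent threads or by QPS.
enum SketchGrade { THREAD, QPS }

// Hypothetical snapshot of what a node would report.
final class SketchNodeStats {
    final int curThreadNum;
    final double passQps;

    SketchNodeStats(int curThreadNum, double passQps) {
        this.curThreadNum = curThreadNum;
        this.passQps = passQps;
    }
}

final class SketchDefaultController {
    private final SketchGrade grade;
    private final double count; // the rule threshold

    SketchDefaultController(SketchGrade grade, double count) {
        this.grade = grade;
        this.count = count;
    }

    // Like avgUsedTokens(): choose thread count or QPS by grade; QPS is truncated to int.
    private int avgUsedTokens(SketchNodeStats node) {
        if (node == null) {
            return 0;
        }
        return grade == SketchGrade.THREAD ? node.curThreadNum : (int) node.passQps;
    }

    // Core of canPass(): block when current usage plus the request would exceed the threshold.
    boolean canPass(SketchNodeStats node, int acquireCount) {
        int curCount = avgUsedTokens(node);
        return curCount + acquireCount <= count;
    }
}
```

Because of the (int) cast, a measured 9.6 QPS counts as 9. Under a threshold of 10, one more request therefore still passes (9 + 1 = 10), while at 10.2 QPS it is blocked (10 + 1 > 10).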

  That completes the local flow-control path. A later article will cover the source code of cluster flow control in detail, so stay tuned.