How Sentinel Rate Limiting Works


Preface

Hi everyone, I'm 熊猫与乐乐. Today I'd like to share my understanding of how Sentinel implements rate limiting under the hood. Before we get into Sentinel itself, we need to answer two questions:

  1. What is rate limiting? Rate limiting means restricting traffic that exceeds what the system can bear, so the system does not crash.
  2. Why do we need it? Memory, CPU, I/O, threads, and other resources are all finite, so the traffic a system can sustain is finite too. We therefore load-test the system to find the maximum traffic threshold it can handle without crashing, then reject any traffic beyond that threshold to keep the system running normally.

We also need to know the four common rate-limiting algorithms:

  1. Fixed time window
  2. Sliding time window
  3. Leaky bucket
  4. Token bucket
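To make the fixed-window idea (and its well-known boundary weakness) concrete, here is a minimal sketch. The class and names are my own for illustration, not from Sentinel or any library:

```java
// Minimal fixed-time-window limiter: all requests inside the same window
// share one counter, and the counter resets when a new window starts.
public class FixedWindowLimiter {
    private final int limit;       // max requests allowed per window
    private final long windowMs;   // window length in milliseconds
    private long windowStart;      // start time of the current window
    private int count;             // requests seen in the current window

    public FixedWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
        this.windowStart = System.currentTimeMillis();
    }

    public synchronized boolean tryAcquire() {
        long now = System.currentTimeMillis();
        if (now - windowStart >= windowMs) {
            windowStart = now;  // a new window begins
            count = 0;          // reset: this hard reset is the algorithm's weakness,
        }                       // since a burst straddling a boundary can see ~2x the limit
        return ++count <= limit;
    }
}
```

The sliding-window algorithm that Sentinel uses (covered in detail below) avoids that boundary burst by splitting the interval into smaller buckets and summing over them.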

In day-to-day work, the rate-limiting tools we are likely to use are:

  1. Alibaba's Sentinel. Sentinel collects call statistics with a sliding time window and provides the following flow-control behaviors:
    1. Fail fast (DefaultController): based on the sliding-window statistics, if the current QPS has reached the threshold, the request is rejected immediately. This strictly keeps traffic below the threshold, which is characteristic of the sliding-time-window algorithm.
    2. Queue and wait (RateLimiterController): the interval between released requests is derived from the threshold. If two requests arrive too close together, the second request's wait time is computed and its thread is paused via Thread.sleep. If the Nth request's wait time exceeds the maximum queueing time, that request is rejected. Releasing requests at a fixed rate while allowing a bounded backlog is characteristic of the leaky-bucket algorithm.
    3. Warm up (WarmUpController, WarmUpRateLimiterController): the implementation is modeled on Guava's RateLimiter, but the logic is more complex and the feature set richer. What we can say for certain is that warm-up is based on the token-bucket algorithm.
  2. Google Guava's RateLimiter, which limits traffic with a token bucket.
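The token-bucket idea behind RateLimiter and WarmUp can be sketched as follows. This is a deliberately simplified model, not Guava's or Sentinel's implementation; time is passed in explicitly to keep the sketch deterministic:

```java
// Minimal token bucket: tokens refill at a fixed rate up to a capacity,
// and a request passes only if a whole token is available.
public class TokenBucket {
    private final double capacity;    // max tokens the bucket can hold
    private final double refillPerMs; // tokens added per millisecond
    private double tokens;            // tokens currently available
    private long lastRefillMs;        // last time we refilled

    public TokenBucket(double capacity, double refillPerSecond, long nowMs) {
        this.capacity = capacity;
        this.refillPerMs = refillPerSecond / 1000.0;
        this.tokens = capacity;       // start full
        this.lastRefillMs = nowMs;
    }

    public synchronized boolean tryAcquire(long nowMs) {
        // Lazily add the tokens accrued since the last call, capped at capacity.
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) * refillPerMs);
        lastRefillMs = nowMs;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Because the bucket can hold up to `capacity` tokens, a token bucket tolerates short bursts, unlike the strict fixed-rate release of a leaky bucket.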

Sentinel counts accesses to an interface with the sliding-time-window algorithm and rejects requests beyond the threshold. The points worth thinking about are:

  1. How does Sentinel keep its statistics accurate under concurrency?
  2. How does Sentinel model the statistics? For example, a user-lookup method userInfo() may be exposed over both HTTP and a facade (RPC) interface, and both application A and application B may call it. How does Sentinel distinguish and count traffic to userInfo() by caller (app A vs. app B) and by access channel (HTTP vs. facade)?
  3. Once accurate statistics exist, what does the end-to-end rate-limiting path look like?

These notes focus on the low-level logic of Sentinel's rate limiting; the project only pulls in the sentinel-core dependency.

Finally, before reading this article you should already be familiar with:

  1. What ThreadLocal is
  2. The chain-of-responsibility design pattern

You may want to read the official documentation on Sentinel's internals first and then read this article critically; if there are mistakes, please bear with me.

Environment Setup

  1. Create a Maven project.
  2. Add Sentinel's core dependency:
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.8.6</version>
</dependency>
  3. Create a test class SentinelTest:
    1. initFlowRules() defines the flow rule: calls from app_A are limited once QPS exceeds 20.
    2. Two applications (app_A and app_B) concurrently call /sentinelTest/user/info over HTTP.
    3. A successful call logs "pass"; a limited call logs "block".
package sentinel;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.context.ContextUtil;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SentinelTest {
    private static final String USER_INFO_API = "/sentinelTest/user/info";
    private static final String APP_A = "app_A";
    private static final String APP_B = "app_B";

    private static final Executor EXECUTOR_FOR_HTTP = new ThreadPoolExecutor(10, 20, 1000, TimeUnit.MICROSECONDS, new ArrayBlockingQueue<>(180));
    private static final String INVOKE_TYPE_HTTP = "HTTP";

    private static final Executor EXECUTOR_FOR_RPC = new ThreadPoolExecutor(3, 5, 1000, TimeUnit.MICROSECONDS, new ArrayBlockingQueue<>(100));
    private static final String INVOKE_TYPE_RPC = "RPC";

    private static void initFlowRules(){
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule();
        rule.setLimitApp(APP_A);
        rule.setResource(USER_INFO_API);
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        // Set limit QPS to 20.
        rule.setCount(20);
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }

    public static void main(String[] args) {
        // Load the flow rules.
        initFlowRules();
        int i = 0;

        AtomicInteger COUNTER_FOR_APP_A = new AtomicInteger(0);
        AtomicInteger COUNTER_FOR_APP_B = new AtomicInteger(0);
        while (i++ < 100) {
            EXECUTOR_FOR_HTTP.execute(() -> userInfo(COUNTER_FOR_APP_A, INVOKE_TYPE_HTTP, APP_A));
            EXECUTOR_FOR_HTTP.execute(() -> userInfo(COUNTER_FOR_APP_B, INVOKE_TYPE_HTTP, APP_B));
        }
    }

    public static String userInfo(AtomicInteger counter, String contextName, String origin) {
        ContextUtil.enter(contextName, origin);
        try (Entry entry = SphU.entry(USER_INFO_API)) {
            // The protected logic.
            System.out.printf("request #[%s], app [%s] via [%s] calling resource [%s]: pass%n", counter.addAndGet(1), origin, contextName, USER_INFO_API);
        } catch (BlockException ex) {
            // Handling for a blocked (rate-limited) request.
            System.out.printf("request #[%s], app [%s] via [%s] calling resource [%s]: block%n", counter.addAndGet(1), origin, contextName, USER_INFO_API);
        }
        ContextUtil.exit();
        return "user";
    }
}

  4. Execution output
  5. Interpreting the results:
    1. app_A's 21st call was blocked, which shows the flow rule took effect. Result: as expected.
    2. app_B's 40th call still passed, which shows it was not limited. Result: as expected.

Sample code on GitHub

github.com/siyuburno/s…

Source Code Analysis

Overview

  1. Sentinel wraps the key information — the limited resource's name, the context it runs in, the caller's name — into a Context object stored in a ThreadLocal, so the current thread can fetch it at any time.
  2. Sentinel keeps a global map from each limited resource to its processor slot chain; one resource maps to one chain. Sentinel's rate-limiting process is, in essence, the execution of this chain:
    1. Fill in the Context.
    2. Maintain the five-layer Node structure.
    3. Collect statistics on calls to the resource.
    4. Finally, decide from the flow rules whether the current call should be limited.
  3. The point of the five-layer Node structure is to collect call statistics along multiple dimensions, so a resource can be limited along multiple dimensions.
  4. A Node's ability to collect statistics comes from the ArrayMetric it holds; ArrayMetric is a thread-safe statistics class built on the sliding-time-window algorithm.
  5. The slot chain contains a FlowSlot, which decides from the resource's statistics and the configured flow rules whether the current call must be limited.

The Sentinel Context

In the sample code, ContextUtil.enter(contextName, origin) initializes a Context object, sets its context name and caller name, and finally stores it in a ThreadLocal.

The Context class has these fields:

    /**
     * Context name.
     */
    private final String name;

    /**
     * The entrance node of current invocation tree.
     */
    private DefaultNode entranceNode;

    /**
     * Current processing entry.
     */
    private Entry curEntry;

    /**
     * The origin of this context (usually indicate different invokers, e.g. service consumer name or origin IP).
     */
    private String origin = "";

    private final boolean async;

  1. name: the name of the context the resource runs in.
  2. origin: the name of the caller.
  3. entranceNode: the entrance node of this call's invocation tree.
  4. curEntry: information about the current call on the resource:
    1. creation time (when the limited resource was called)
    2. resource name (the name of the limited resource)
    3. the call's DefaultNode (the resource's current statistics, e.g. active thread count and QPS)
    4. any exception thrown by this call
    5. any block exception thrown by this call

This context data is the foundation for the processor slot chain that runs next.

The Processor Slot Chain

In the sample code, Entry entry = SphU.entry(USER_INFO_API) marks the resource to be limited. SphU.entry() eventually calls:

private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
        throws BlockException {
        Context context = ContextUtil.getContext();
        if (context instanceof NullContext) {
            // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
            // so here init the entry only. No rule checking will be done.
            return new CtEntry(resourceWrapper, null, context);
        }

        if (context == null) {
            // Using default context.
            context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
        }

        // Global switch is close, no rule checking will do.
        if (!Constants.ON) {
            return new CtEntry(resourceWrapper, null, context);
        }

        ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

        /*
         * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
         * so no rule checking will be done.
         */
        if (chain == null) {
            return new CtEntry(resourceWrapper, null, context);
        }

        Entry e = new CtEntry(resourceWrapper, chain, context);
        try {
            chain.entry(context, resourceWrapper, null, count, prioritized, args);
        } catch (BlockException e1) {
            e.exit(count, args);
            throw e1;
        } catch (Throwable e1) {
            // This should not happen, unless there are errors existing in Sentinel internal.
            RecordLog.info("Sentinel unexpected exception", e1);
        }
        return e;
    }


The logic of this method: create an Entry, look up the processor slot chain for the limited resource (the resourceWrapper), execute the chain, and return the Entry.

What Is the Processor Slot Chain?

Sentinel keeps a global map from each limited resource to its processor slot chain; one resource maps to one chain:

    /**
     * Same resource({@link ResourceWrapper#equals(Object)}) will share the same
     * {@link ProcessorSlotChain}, no matter in which {@link Context}.
     */
    private static volatile Map<ResourceWrapper, ProcessorSlotChain> chainMap
        = new HashMap<ResourceWrapper, ProcessorSlotChain>();

The chain is a DefaultProcessorSlotChain, which holds two AbstractLinkedProcessorSlot fields, first and end. first points to the head of the linked list (an empty node with no special logic) and end points to the tail; the slots in between are linked in their default order:

(Diagram: the default processor slot chain, linked from first to end.)

The default slot order is defined in the Constants class:

    /**
     * Order of default processor slots
     */
    public static final int ORDER_NODE_SELECTOR_SLOT = -10000;
    public static final int ORDER_CLUSTER_BUILDER_SLOT = -9000;
    public static final int ORDER_LOG_SLOT = -8000;
    public static final int ORDER_STATISTIC_SLOT = -7000;
    public static final int ORDER_AUTHORITY_SLOT = -6000;
    public static final int ORDER_SYSTEM_SLOT = -5000;
    public static final int ORDER_FLOW_SLOT = -2000;
    public static final int ORDER_DEGRADE_SLOT = -1000;

The Important Processor Slots

Each slot extends AbstractLinkedProcessorSlot, which implements the ProcessorSlot interface.

NodeSelectorSlot

Role: find the DefaultNode for the current context name, store it on the Context's curEntry, and maintain the DefaultNode's position in the five-layer Node structure.

ClusterBuilderSlot

Role: find the ClusterNode for the resource name and attach it to the DefaultNode. If an origin is specified, it also creates the corresponding OriginNode and stores it in the map inside the ClusterNode. This maintains the positions of ClusterNode and OriginNode in the five-layer Node structure.

StatisticSlot

Role: collect statistics for the limited resource at the DefaultNode, ClusterNode, and OriginNode levels.

  1. DefaultNode: how the resource is called within each context.
  2. ClusterNode: how the resource is called overall, regardless of context.
  3. OriginNode: how the resource is called by each caller.

FlowSlot

Role: decide from the statistics in the five-layer Node structure and the configured flow rules whether to limit the current call.

DegradeSlot

Role: circuit breaking (degradation).

The Five-Layer Node Structure

What Is the Five-Layer Node Structure?

Sentinel defines five kinds of Node, one per dimension:

  1. ROOT node: the root; an application has exactly one, at layer one of the structure.
  2. EntranceNode: a context node marking each context that exists in the application, at layer two.
  3. DefaultNode: a default node recording how a limited resource is called within a given context, at layer three.
  4. ClusterNode: a cluster node recording how a limited resource is called across the whole application, at layer four.
  5. OriginNode: an origin node recording how a limited resource is called by each origin (caller), at layer five.

Suppose application C exposes two methods, queryDepatment() and queryUser(); queryDepatment() is reachable only over HTTP, while queryUser() is reachable over both HTTP and RPC.

Applications A and B both call C's queryDepatment() and queryUser(). If we limit these two methods with Sentinel, Sentinel maintains the following five-layer Node structure in memory:

(Diagram: the five-layer Node structure for application C.)

  1. Layer one is the ROOT node.
  2. Layer two has 2 EntranceNodes, counting calls arriving over HTTP and RPC respectively.
  3. Layer three has 3 DefaultNodes. The http EntranceNode points to 2 DefaultNodes, counting HTTP calls to queryDepatment() and queryUser() respectively; the rpc EntranceNode points to 1 DefaultNode, counting RPC calls to queryUser().
  4. Layer four has 2 ClusterNodes, counting all calls to queryDepatment() and queryUser() respectively.
  5. Layer five has 4 OriginNodes: the 2 under the queryDepatment ClusterNode count calls to queryDepatment() from app A and app B, and the 2 under the queryUser ClusterNode count calls to queryUser() from app A and app B.

What the Five-Layer Node Structure Is For

Q: To limit queryUser() overall, which node's statistics do we use?

A: The ClusterNode named queryUser, because it records the overall calls to queryUser across application C.

Q: To limit calls to queryUser() coming from application A, which node's statistics do we use?

A: The OriginNode named app_A (pointed to by the queryUser ClusterNode), because it records the calls to queryUser originating from application A.

Q: To limit calls to queryUser() arriving over HTTP, which node's statistics do we use?

A: The DefaultNode named queryUser (pointed to by the http EntranceNode), because it records the calls to queryUser arriving over HTTP.

So the five-layer Node structure records how a limited resource is called along several dimensions, letting Sentinel limit the resource along any of them.

Maintaining the Five-Layer Node Structure

Maintaining the ROOT node

The Constants class defines a single global ROOT node.

    /**
     * Global ROOT statistic node that represents the universal parent node.
     */
    public final static DefaultNode ROOT = new EntranceNode(new StringResourceWrapper(ROOT_ID, EntryType.IN),
        new ClusterNode(ROOT_ID, ResourceTypeConstants.COMMON));

Maintaining the EntranceNode

ContextUtil keeps a global contextNameNodeMap, which maps each context name to its EntranceNode.

    /**
     * Holds all {@link EntranceNode}. Each {@link EntranceNode} is associated with a distinct context name.
     */
    private static volatile Map<String, DefaultNode> contextNameNodeMap = new HashMap<>();

When ContextUtil.enter(contextName, origin) sets the resource's context and caller, it eventually calls:

protected static Context trueEnter(String name, String origin) {
    Context context = contextHolder.get();
    if (context == null) {
        Map<String, DefaultNode> localCacheNameMap = contextNameNodeMap;
        DefaultNode node = localCacheNameMap.get(name);
        if (node == null) {
            if (localCacheNameMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                setNullContext();
                return NULL_CONTEXT;
            } else {
                LOCK.lock();
                try {
                    node = contextNameNodeMap.get(name);
                    if (node == null) {
                        if (contextNameNodeMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                            setNullContext();
                            return NULL_CONTEXT;
                        } else {
                            node = new EntranceNode(new StringResourceWrapper(name, EntryType.IN), null);
                            // Add entrance node.
                            Constants.ROOT.addChild(node);

                            Map<String, DefaultNode> newMap = new HashMap<>(contextNameNodeMap.size() + 1);
                            newMap.putAll(contextNameNodeMap);
                            newMap.put(name, node);
                            contextNameNodeMap = newMap;
                        }
                    }
                } finally {
                    LOCK.unlock();
                }
            }
        }
        context = new Context(node, name);
        context.setOrigin(origin);
        contextHolder.set(context);
    }

    return context;
}

This method essentially initializes a Context and stores it in a ThreadLocal. The steps are:

  1. Look up the EntranceNode for the context name in contextNameNodeMap; if it does not exist, create one and store it in contextNameNodeMap.
  2. Add the new EntranceNode to the ROOT node's children.
  3. Initialize a Context from the context name and EntranceNode, set its origin, and put the Context into the ThreadLocal.

With this, Sentinel has wired up the relationship between the ROOT node and the EntranceNodes.
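The concurrency pattern used by trueEnter() above (and again below in NodeSelectorSlot and ClusterBuilderSlot) is worth isolating: lock-free reads from a volatile map, with rare writes that copy the map under a lock and swap the reference. A generic sketch, with names of my own choosing:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.Function;

// Copy-on-write registry: get() never locks; a miss takes a lock,
// double-checks, copies the map, inserts, and swaps the volatile reference.
public class CopyOnWriteRegistry<K, V> {
    private volatile Map<K, V> map = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    public V getOrCreate(K key, Function<K, V> factory) {
        V value = map.get(key);            // lock-free fast path
        if (value == null) {
            lock.lock();
            try {
                value = map.get(key);      // double-check under the lock
                if (value == null) {
                    value = factory.apply(key);
                    Map<K, V> newMap = new HashMap<>(map); // copy...
                    newMap.put(key, value);
                    map = newMap;          // ...then publish via the volatile field
                }
            } finally {
                lock.unlock();
            }
        }
        return value;
    }
}
```

As the source comments note, the mapping stabilizes quickly after startup, so this trades a little write cost for contention-free reads, which a ConcurrentHashMap of that era could not offer as cheaply.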

Maintaining the DefaultNode

The first slot executed in the chain is the NodeSelectorSlot. It keeps an internal map whose key is a context name and whose value is a DefaultNode; the map records the resource's calls in each context.

    /**
     * {@link DefaultNode}s of the same resource in different context.
     */
    private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);

NodeSelectorSlot's logic is:

@Override
    public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
        throws Throwable {
        /*
         * It's interesting that we use context name rather resource name as the map key.
         *
         * Remember that same resource({@link ResourceWrapper#equals(Object)}) will share
         * the same {@link ProcessorSlotChain} globally, no matter in which context. So if
         * code goes into {@link #entry(Context, ResourceWrapper, DefaultNode, int, Object...)},
         * the resource name must be same but context name may not.
         *
         * If we use {@link com.alibaba.csp.sentinel.SphU#entry(String resource)} to
         * enter same resource in different context, using context name as map key can
         * distinguish the same resource. In this case, multiple {@link DefaultNode}s will be created
         * of the same resource name, for every distinct context (different context name) each.
         *
         * Consider another question. One resource may have multiple {@link DefaultNode},
         * so what is the fastest way to get total statistics of the same resource?
         * The answer is all {@link DefaultNode}s with same resource name share one
         * {@link ClusterNode}. See {@link ClusterBuilderSlot} for detail.
         */
        DefaultNode node = map.get(context.getName());
        if (node == null) {
            synchronized (this) {
                node = map.get(context.getName());
                if (node == null) {
                    node = new DefaultNode(resourceWrapper, null);
                    HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                    cacheMap.putAll(map);
                    cacheMap.put(context.getName(), node);
                    map = cacheMap;
                    // Build invocation tree
                    ((DefaultNode) context.getLastNode()).addChild(node);
                }

            }
        }

        context.setCurNode(node);
        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

  1. Look up the DefaultNode for the context name in the map; if absent, create one for the resource, store it in the map, and add it to the EntranceNode's children.
  2. Store the DefaultNode into the context.

With this, Sentinel has wired up the relationship between the EntranceNodes and the DefaultNodes.

Maintaining the ClusterNode

After NodeSelectorSlot comes ClusterBuilderSlot, which keeps a global clusterNodeMap mapping each limited resource to its ClusterNode.

    /**
     * <p>
     * Remember that same resource({@link ResourceWrapper#equals(Object)}) will share
     * the same {@link ProcessorSlotChain} globally, no matter in which context. So if
     * code goes into {@link #entry(Context, ResourceWrapper, DefaultNode, int, boolean, Object...)},
     * the resource name must be same but context name may not.
     * </p>
     * <p>
     * To get total statistics of the same resource in different context, same resource
     * shares the same {@link ClusterNode} globally. All {@link ClusterNode}s are cached
     * in this map.
     * </p>
     * <p>
     * The longer the application runs, the more stable this mapping will
     * become. so we don't concurrent map but a lock. as this lock only happens
     * at the very beginning while concurrent map will hold the lock all the time.
     * </p>
     */
    private static volatile Map<ResourceWrapper, ClusterNode> clusterNodeMap = new HashMap<>();

ClusterBuilderSlot's entry() is:

public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args)
        throws Throwable {
        if (clusterNode == null) {
            synchronized (lock) {
                if (clusterNode == null) {
                    // Create the cluster node.
                    clusterNode = new ClusterNode(resourceWrapper.getName(), resourceWrapper.getResourceType());
                    HashMap<ResourceWrapper, ClusterNode> newMap = new HashMap<>(Math.max(clusterNodeMap.size(), 16));
                    newMap.putAll(clusterNodeMap);
                    newMap.put(node.getId(), clusterNode);

                    clusterNodeMap = newMap;
                }
            }
        }
        node.setClusterNode(clusterNode);

        /*
         * if context origin is set, we should get or create a new {@link Node} of
         * the specific origin.
         */
        if (!"".equals(context.getOrigin())) {
            Node originNode = node.getClusterNode().getOrCreateOriginNode(context.getOrigin());
            context.getCurEntry().setOriginNode(originNode);
        }

        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

The logic: look up the resource's ClusterNode in clusterNodeMap, creating and storing one if absent, then point the DefaultNode at it via node.setClusterNode(clusterNode).

With this, Sentinel has wired up the relationship between the DefaultNodes and the ClusterNodes.

Maintaining the OriginNode

Each ClusterNode holds an originCountMap mapping callers to OriginNodes.

    /**
     * <p>The origin map holds the pair: (origin, originNode) for one specific resource.</p>
     * <p>
     * The longer the application runs, the more stable this mapping will become.
     * So we didn't use concurrent map here, but a lock, as this lock only happens
     * at the very beginning while concurrent map will hold the lock all the time.
     * </p>
     */
    private Map<String, StatisticNode> originCountMap = new HashMap<>();

Also inside ClusterBuilderSlot's entry(): if the context's origin is non-empty, the ClusterNode creates an OriginNode and stores it in originCountMap, and the OriginNode is then stored on the context.

    if (!"".equals(context.getOrigin())) {
        Node originNode = node.getClusterNode().getOrCreateOriginNode(context.getOrigin());
        context.getCurEntry().setOriginNode(originNode);
    }

With this, Sentinel has wired up the relationship between the ClusterNodes and the OriginNodes.

The ArrayMetric Statistics Class

DefaultNode, ClusterNode, and EntranceNode all extend StatisticNode, which holds two ArrayMetric objects recording statistics over the last second and the last minute respectively:


    /**
     * Holds statistics of the recent {@code INTERVAL} milliseconds. The {@code INTERVAL} is divided into time spans
     * by given {@code sampleCount}.
     */
    private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);

    /**
     * Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
     * meaning each bucket per second, in this way we can get accurate statistics of each second.
     */
    private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);

What Is ArrayMetric?

Because every Node holds these two ArrayMetric objects, every Node can collect statistics. The relationship between Node and the statistics classes:

(Diagram: Node holds ArrayMetric, which holds a BucketLeapArray of MetricBuckets, each backed by a LongAdder array.)

An ArrayMetric holds a BucketLeapArray, the sliding-window array, which contains multiple MetricBuckets. One MetricBucket holds the statistics for one time window and internally keeps a LongAdder array, one LongAdder per metric. For example, to track a resource's total, success, exception, and block counts, the LongAdder array has length 4, one slot per metric, in that order.

(Diagram: a BucketLeapArray as a circular array of MetricBuckets, each holding a LongAdder[] of metrics.)

An example: suppose one window is 200 ms and we want interface A's total, success, exception, and block counts over 1 s. One of A's Nodes holds an ArrayMetric, hence a BucketLeapArray, which keeps 5 MetricBuckets covering [1–200 ms], [201–400 ms], [401–600 ms], [601–800 ms], and [801–1000 ms] respectively. Each MetricBucket holds a LongAdder array of length 4 recording, in order, the total, success, exception, and block counts.
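The per-bucket layout described above can be sketched as follows. This is an illustrative analog, not Sentinel's class; the real MetricEvent enum has more entries than the four shown here:

```java
import java.util.concurrent.atomic.LongAdder;

// One bucket of statistics: one LongAdder per metric, indexed by enum ordinal.
public class Bucket {
    enum Event { PASS, SUCCESS, EXCEPTION, BLOCK }

    private final LongAdder[] counters = new LongAdder[Event.values().length];

    public Bucket() {
        for (int i = 0; i < counters.length; i++) {
            counters[i] = new LongAdder();
        }
    }

    // Mirrors MetricBucket.add: pick the metric's LongAdder and add to it.
    public void add(Event e, long n) {
        counters[e.ordinal()].add(n);
    }

    public long get(Event e) {
        return counters[e.ordinal()].sum();
    }
}
```

Reading a metric over the last second then amounts to summing that metric across all non-deprecated buckets in the circular array.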

How LeapArray Implements the Sliding Window

The abstract class LeapArray has these fields:

// length of one window in milliseconds
protected int windowLengthInMs;
// number of windows (buckets)
protected int sampleCount;
// total length of the sliding window in milliseconds
protected int intervalInMs;
// total length of the sliding window in seconds
private double intervalInSecond;
// the circular array of windows
protected final AtomicReferenceArray<WindowWrap<T>> array;

/**
     * The conditional (predicate) update lock is used only when current bucket is deprecated.
     */
private final ReentrantLock updateLock = new ReentrantLock();

LeapArray's constructor:

    /**
     * The total bucket count is: {@code sampleCount = intervalInMs / windowLengthInMs}.
     *
     * @param sampleCount  bucket count of the sliding window
     * @param intervalInMs the total time interval of this {@link LeapArray} in milliseconds
     */
    public LeapArray(int sampleCount, int intervalInMs) {
        AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
        AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
        AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");

        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.intervalInSecond = intervalInMs / 1000.0;
        this.sampleCount = sampleCount;

        this.array = new AtomicReferenceArray<>(sampleCount);
    }

From the bucket count and the total interval in milliseconds it derives the length of one window and the interval in seconds, and initializes a thread-safe AtomicReferenceArray<WindowWrap<T>> to represent the sliding window.
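Given those fields, mapping a timestamp to its bucket index and window start reduces to two one-line formulas. The method names below mirror the source's calculateTimeIdx/calculateWindowStart; the bodies are my reconstruction of that arithmetic:

```java
public class WindowMath {
    // Which slot of the circular array a timestamp falls into:
    // count whole windows since epoch, then wrap by the array length.
    static int calculateTimeIdx(long timeMillis, int windowLengthInMs, int sampleCount) {
        long timeId = timeMillis / windowLengthInMs;
        return (int) (timeId % sampleCount);
    }

    // Align the timestamp down to its window boundary.
    static long calculateWindowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }
}
```

For the 200 ms x 5 bucket example in currentWindow()'s comments, time=888 lands in index 4 with window start 800, matching the "bucket 3/4 around timestamp 800" diagrams below.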

A WindowWrap represents one time window; its fields are:


    /**
     * Time length of a single window bucket in milliseconds.
     */
    private final long windowLengthInMs;

    /**
     * Start timestamp of the window in milliseconds.
     */
    private long windowStart;

    /**
     * Statistic data.
     */
    private T value;

  1. windowLengthInMs: the length of one window in milliseconds.
  2. windowStart: the timestamp at which this window starts.
  3. The generic T is the statistics held within one window; here T is in fact MetricBucket.

With these fields, a window can account for everything in the range [windowStart, windowStart + windowLengthInMs]. Making the statistics a generic T keeps the LeapArray sliding-window class flexible to reuse.

The heart of the sliding-window algorithm is locating the window for a given time. LeapArray implements it as follows:

    /**
     * Get bucket item at provided timestamp.
     *
     * @param timeMillis a valid timestamp in milliseconds
     * @return current bucket item at provided timestamp if the time is valid; null if time is invalid
     */
    public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }

        int idx = calculateTimeIdx(timeMillis);
        // Calculate current bucket start time.
        long windowStart = calculateWindowStart(timeMillis);

        /*
         * Get bucket item at given time from the array.
         *
         * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
         * (2) Bucket is up-to-date, then just return the bucket.
         * (3) Bucket is deprecated, then reset current bucket.
         */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                /*
                 *     B0       B1      B2    NULL      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            bucket is empty, so create new and update
                 *
                 * If the old bucket is absent, then we create a new bucket at {@code windowStart},
                 * then try to update circular array via a CAS operation. Only one thread can
                 * succeed to update, while other threads yield its time slice.
                 */
                WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    // Successfully updated, return the created bucket.
                    return window;
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                /*
                 *     B0       B1      B2     B3      B4
                 * ||_______|_______|_______|_______|_______||___
                 * 200     400     600     800     1000    1200  timestamp
                 *                             ^
                 *                          time=888
                 *            startTime of Bucket 3: 800, so it's up-to-date
                 *
                 * If current {@code windowStart} is equal to the start timestamp of old bucket,
                 * that means the time is within the bucket, so directly return the bucket.
                 */
                return old;
            } else if (windowStart > old.windowStart()) {
                /*
                 *   (old)
                 *             B0       B1      B2    NULL      B4
                 * |_______||_______|_______|_______|_______|_______||___
                 * ...    1200     1400    1600    1800    2000    2200  timestamp
                 *                              ^
                 *                           time=1676
                 *          startTime of Bucket 2: 400, deprecated, should be reset
                 *
                 * If the start timestamp of old bucket is behind provided time, that means
                 * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
                 * Note that the reset and clean-up operations are hard to be atomic,
                 * so we need a update lock to guarantee the correctness of bucket update.
                 *
                 * The update lock is conditional (tiny scope) and will take effect only when
                 * bucket is deprecated, so in most cases it won't lead to performance loss.
                 */
                if (updateLock.tryLock()) {
                    try {
                        // Successfully get the update lock, now we reset the bucket.
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    // Contention failed, the thread will yield its time slice to wait for bucket available.
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                // Should not go through here, as the provided time is already behind.
                return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }

The comments explain the logic clearly, so I won't repeat them here.

Thread Safety in MetricBucket

MetricBucket stores its metrics in a LongAdder[]:

private final LongAdder[] counters;

When MetricBucket updates a metric, it uses neither synchronized nor ReentrantLock; it simply finds the LongAdder for the metric and calls that LongAdder's add(). MetricBucket's add method:

    public MetricBucket add(MetricEvent event, long n) {
        counters[event.ordinal()].add(n);
        return this;
    }

LongAdder extends Striped64 from java.util.concurrent, which lets it add to and subtract from a long value thread-safely; under the hood Striped64 protects the shared state with CAS:

    /**
     * CASes the base field.
     */
    final boolean casBase(long cmp, long val) {
        return UNSAFE.compareAndSwapLong(this, BASE, cmp, val);
    }
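LongAdder spreads contended updates across per-thread cells and only sums them on read, so concurrent add() calls rarely fight over a single CAS the way an AtomicLong would. A quick demonstration:

```java
import java.util.concurrent.atomic.LongAdder;

public class LongAdderDemo {
    // Run `threads` workers, each adding `addsPerThread` times, and
    // return the aggregated count. After join(), sum() is exact.
    public static long countConcurrently(int threads, int addsPerThread) throws InterruptedException {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    counter.add(1);   // lock-free increment, striped under contention
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            t.join();                 // wait for all increments to land
        }
        return counter.sum();         // aggregate base + all cells
    }
}
```

The trade-off is that sum() is not an atomic snapshot while writers are active, which is acceptable for statistics like these.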

Collecting Access Statistics

Once the five-layer Node structure is built, counting calls to the limited resource is the job of StatisticSlot. Its entry method:

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        try {
            // Do some checking.
            fireEntry(context, resourceWrapper, node, count, prioritized, args);

            // Request passed, add thread count and pass count.
            node.increaseThreadNum();
            node.addPassRequest(count);

            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
                context.getCurEntry().getOriginNode().addPassRequest(count);
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
                Constants.ENTRY_NODE.addPassRequest(count);
            }

            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }
        } catch (PriorityWaitException ex) {
            node.increaseThreadNum();
            if (context.getCurEntry().getOriginNode() != null) {
                // Add count for origin node.
                context.getCurEntry().getOriginNode().increaseThreadNum();
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseThreadNum();
            }
            // Handle pass event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onPass(context, resourceWrapper, node, count, args);
            }
        } catch (BlockException e) {
            // Blocked, set block exception to current entry.
            context.getCurEntry().setBlockError(e);

            // Add block count.
            node.increaseBlockQps(count);
            if (context.getCurEntry().getOriginNode() != null) {
                context.getCurEntry().getOriginNode().increaseBlockQps(count);
            }

            if (resourceWrapper.getEntryType() == EntryType.IN) {
                // Add count for global inbound entry node for global statistics.
                Constants.ENTRY_NODE.increaseBlockQps(count);
            }

            // Handle block event with registered entry callback handlers.
            for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
                handler.onBlocked(e, context, resourceWrapper, node, count, args);
            }

            throw e;
        } catch (Throwable e) {
            // Unexpected internal error, set error to current entry.
            context.getCurEntry().setError(e);

            throw e;
        }
    }

Focus on the try block: if the call succeeds, it was not limited, so the DefaultNode's and ClusterNode's active thread count and pass count are incremented; if a caller origin was set, the OriginNode's are incremented as well. If a BlockException is thrown, the call was limited, and the block counts of the corresponding nodes are incremented instead.

The Rate-Limiting Logic

Sentinel performs the actual limiting in FlowSlot, which has two fields:

    // checks calls against the configured flow rules
    private final FlowRuleChecker checker;
    // supplies the flow rules for a given resource
    private final Function<String, Collection<FlowRule>> ruleProvider = new Function<String, Collection<FlowRule>>() {
        @Override
        public Collection<FlowRule> apply(String resource) {
            // Flow rule map should not be null.
            Map<String, List<FlowRule>> flowRules = FlowRuleManager.getFlowRuleMap();
            return flowRules.get(resource);
        }
    };

  1. checker: decides, against the configured flow rules, whether the current call triggers a rule and must be limited.
  2. ruleProvider: looks up the flow rules configured for the current resource.

FlowSlot's entry method:

    @Override
    public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                      boolean prioritized, Object... args) throws Throwable {
        checkFlow(resourceWrapper, context, node, count, prioritized);

        fireEntry(context, resourceWrapper, node, count, prioritized, args);
    }

    void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized)
        throws BlockException {
        checker.checkFlow(ruleProvider, resource, context, node, count, prioritized);
    }

The entry method simply calls checker.checkFlow() to decide whether the call must be limited; if so, a FlowException (a subclass of BlockException) is thrown, otherwise fireEntry passes control to the next slot in the chain.

The key parts of FlowRuleChecker:

    /**
     * Fetch the list of FlowRule objects configured for the resource,
     * then check each rule in turn for a violation.
     */
    public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider, ResourceWrapper resource,
                          Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
        if (ruleProvider == null || resource == null) {
            return;
        }
        Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
        if (rules != null) {
            for (FlowRule rule : rules) {
                if (!canPassCheck(rule, context, node, count, prioritized)) {
                    throw new FlowException(rule.getLimitApp(), rule);
                }
            }
        }
    }

    public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node,
                                                    int acquireCount) {
        return canPassCheck(rule, context, node, acquireCount, false);
    }

    public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                                    boolean prioritized) {
        String limitApp = rule.getLimitApp();
        if (limitApp == null) {
            return true;
        }

        if (rule.isClusterMode()) {
            return passClusterCheck(rule, context, node, acquireCount, prioritized);
        }

        return passLocalCheck(rule, context, node, acquireCount, prioritized);
    }

    // Select a Node according to the rule's strategy; the statistics in that Node decide whether the current call violates the rule.
    private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                          boolean prioritized) {
        Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
        if (selectedNode == null) {
            return true;
        }

        return rule.getRater().canPass(selectedNode, acquireCount, prioritized);
    }

    // Select a Node according to the flow rule, context, and other parameters.
    // The statistics in that Node later decide whether the call must be limited.
    static Node selectNodeByRequesterAndStrategy(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node) {
        // The limit app should not be empty.
        String limitApp = rule.getLimitApp();
        int strategy = rule.getStrategy();
        String origin = context.getOrigin();

        if (limitApp.equals(origin) && filterOrigin(origin)) {
            if (strategy == RuleConstant.STRATEGY_DIRECT) {
                // Matches limit origin, return origin statistic node.
                return context.getOriginNode();
            }

            return selectReferenceNode(rule, context, node);
        } else if (RuleConstant.LIMIT_APP_DEFAULT.equals(limitApp)) {
            if (strategy == RuleConstant.STRATEGY_DIRECT) {
                // Return the cluster node.
                return node.getClusterNode();
            }

            return selectReferenceNode(rule, context, node);
        } else if (RuleConstant.LIMIT_APP_OTHER.equals(limitApp)
            && FlowRuleManager.isOtherOrigin(origin, rule.getResource())) {
            if (strategy == RuleConstant.STRATEGY_DIRECT) {
                return context.getOriginNode();
            }

            return selectReferenceNode(rule, context, node);
        }

        return null;
    }
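The branching above amounts to a decision table: which statistic Node is consulted depends on how the rule's limitApp relates to the caller's origin. The sketch below is a simplified stand-in for the STRATEGY_DIRECT case only; the class, method, and parameter names are invented for illustration, and notConfiguredElsewhere plays the role of FlowRuleManager.isOtherOrigin.

```java
// Hypothetical simplified model of the STRATEGY_DIRECT branch of
// selectNodeByRequesterAndStrategy (NOT the real Sentinel code).
public class NodeSelectSketch {
    static final String LIMIT_APP_DEFAULT = "default";
    static final String LIMIT_APP_OTHER = "other";

    // Returns a label for the node that would be selected,
    // or null when the rule does not apply to this caller.
    static String selectDirect(String limitApp, String origin, boolean notConfiguredElsewhere) {
        if (limitApp.equals(origin)
                && !LIMIT_APP_DEFAULT.equals(origin) && !LIMIT_APP_OTHER.equals(origin)) {
            return "originNode";   // rule names this specific caller: use its per-origin stats
        } else if (LIMIT_APP_DEFAULT.equals(limitApp)) {
            return "clusterNode";  // rule targets all callers: use the aggregate stats
        } else if (LIMIT_APP_OTHER.equals(limitApp) && notConfiguredElsewhere) {
            return "originNode";   // rule targets every caller not named by another rule
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(selectDirect("app_A", "app_A", true));   // originNode
        System.out.println(selectDirect("default", "app_B", true)); // clusterNode
    }
}
```

Returning null maps to passLocalCheck letting the call through: if no node matches, the rule simply does not constrain this caller.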

checkFlow fetches the list of FlowRule objects configured for the resource and checks each rule in turn. The check itself has two steps:

  1. Select a Node according to the rule's limitApp and strategy; the statistics in that Node decide whether the call must be limited.
  2. Pass the selected Node to rule.getRater().canPass(selectedNode, acquireCount, prioritized), which delegates the actual limiting decision to the rule's traffic shaping controller (TrafficShapingController).
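As a concrete illustration of step 2, the fast-fail controller (DefaultController) essentially checks whether the node's current passed QPS plus the requested permits stays within the rule threshold. The sketch below is a hypothetical simplification, not the real class (which also handles the prioritized/occupy-ahead case):

```java
// Hypothetical sketch of a fast-fail canPass check in the style of
// DefaultController: admit the request only if current usage plus the
// requested permits does not exceed the configured threshold.
public class FastFailSketch {
    static boolean canPass(double currentPassQps, int acquireCount, double threshold) {
        return currentPassQps + acquireCount <= threshold;
    }

    public static void main(String[] args) {
        System.out.println(canPass(19, 1, 20)); // true: 19 + 1 <= 20
        System.out.println(canPass(20, 1, 20)); // false: 20 + 1 > 20
    }
}
```

This is what makes fast-fail strictly enforce the threshold: the decision is made against the sliding-window statistics at the moment of the call, with no queueing or token accumulation.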