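Introduction

Sentinel is a traffic governance component for distributed, polyglot, heterogeneous service architectures. Taking traffic as its entry point, it helps developers keep microservices stable along several dimensions: traffic routing, flow control, traffic shaping, circuit breaking and degradation, adaptive system overload protection, and hotspot traffic protection.

The above is Sentinel's official description. Taking traffic as the entry point, it controls traffic on the basis of traffic statistics. From this we can infer that Sentinel's core consists of two parts: traffic statistics and flow control.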

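if (trafficReachesRule(request)) {   // the statistics show that a rule threshold is reached
 applyFlowControl(request);
} else {
 letRequestPass(request);
}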

Core Design

[Figure: element relationship diagram]

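Core Concepts

In Sentinel, the code to be protected is called a Resource, identified by a unique name.

Both statistics and control operate per invocation, so each invocation needs a context object that records its metadata. Sentinel calls this a Context; it too is identified by a unique name.

Statistics and control may each work along different dimensions: statistics may cover run time, QPS, RT, and so on, while control may be based on black/white lists, circuit breaking and degradation, and more. To keep these functions decoupled, Sentinel uses the chain-of-responsibility pattern: each node is called a slot, and statistics and control together form a SlotChain.

Entry is the entry point of an invocation.

Node is where a Slot's statistics are actually kept; the core sliding window, for example, is implemented by Node.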

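SlotChain Design

The ProcessorSlot Interface

Slots carry out traffic statistics and flow control in Sentinel and are one of its core components. This section covers their design and the concrete code.

Every Slot must implement the ProcessorSlot interface, which provides the entry and exit methods, plus the fireEntry and fireExit methods that trigger the next node in the chain.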
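To make the entry/exit/fire relationship concrete, here is a minimal sketch of a chain of responsibility. All names (SketchSlot, NamedSlot, SketchChain) are hypothetical and the method signatures are heavily simplified; this is not Sentinel's actual ProcessorSlot interface.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, simplified stand-in for a slot; not Sentinel's real interface. */
abstract class SketchSlot {
    private SketchSlot next;

    void setNext(SketchSlot next) { this.next = next; }

    /** Called when a request enters the protected resource. */
    abstract void entry(String resource, List<String> trace);

    /** Called when the request leaves the protected resource. */
    abstract void exit(String resource, List<String> trace);

    /** Hands the entry event to the next slot, if there is one. */
    protected void fireEntry(String resource, List<String> trace) {
        if (next != null) next.entry(resource, trace);
    }

    /** Hands the exit event to the next slot, if there is one. */
    protected void fireExit(String resource, List<String> trace) {
        if (next != null) next.exit(resource, trace);
    }
}

/** A slot that only records that it was visited, then passes the event on. */
class NamedSlot extends SketchSlot {
    private final String name;

    NamedSlot(String name) { this.name = name; }

    @Override
    void entry(String resource, List<String> trace) {
        trace.add(name + ":entry");
        fireEntry(resource, trace);
    }

    @Override
    void exit(String resource, List<String> trace) {
        trace.add(name + ":exit");
        fireExit(resource, trace);
    }
}

/** Singly linked chain of slots, appended in order. */
class SketchChain {
    private SketchSlot first;
    private SketchSlot last;

    void addLast(SketchSlot slot) {
        if (first == null) first = slot; else last.setNext(slot);
        last = slot;
    }

    /** Runs the entry pass and then the exit pass, returning the visit order. */
    List<String> run(String resource) {
        List<String> trace = new ArrayList<>();
        if (first != null) {
            first.entry(resource, trace);
            first.exit(resource, trace);
        }
        return trace;
    }
}
```

Because each slot decides when to call fireEntry, a slot can do its work before or after the rest of the chain, or stop the chain entirely by not firing at all.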

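Loading the SlotChain

1. The SPI mechanism

For extensibility, Sentinel implements its own SPI loading mechanism modeled on Java SPI. As with the native mechanism, files are defined under META-INF/services, and each file lists the fully qualified names of the classes to load. Classes are annotated with @Spi so that SpiLoader can load them, and the Slots are ultimately created through reflection.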
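The core idea of such a loader (read class names from a provider file, then instantiate them reflectively) can be sketched as follows. This is an illustration only: SketchSpiLoader, Greeter, and EnglishGreeter are hypothetical names, the provider file is passed in as a list of lines rather than read from META-INF/services, and the ordering and default-selection features of Sentinel's SpiLoader are not modeled.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical service interface used only for this sketch. */
interface Greeter {
    String greet();
}

/** Hypothetical provider; it needs a no-argument constructor to be loadable. */
class EnglishGreeter implements Greeter {
    @Override
    public String greet() { return "hello"; }
}

class SketchSpiLoader {
    /**
     * Instantiates every class named in providerLines (one fully qualified
     * name per line), skipping blank lines and lines starting with '#'.
     */
    static <T> List<T> load(Class<T> service, List<String> providerLines) {
        List<T> instances = new ArrayList<>();
        for (String line : providerLines) {
            String className = line.trim();
            if (className.isEmpty() || className.startsWith("#")) {
                continue;
            }
            try {
                Class<?> clazz = Class.forName(className);
                if (!service.isAssignableFrom(clazz)) {
                    throw new IllegalArgumentException(
                        className + " does not implement " + service.getName());
                }
                // Reflective creation through the no-argument constructor.
                instances.add(service.cast(clazz.getDeclaredConstructor().newInstance()));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("Cannot load provider " + className, e);
            }
        }
        return instances;
    }
}
```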

2. Initializing the slots

@Spi(isDefault = true)
public class DefaultSlotChainBuilder implements SlotChainBuilder {

    @Override
    public ProcessorSlotChain build() {
        ProcessorSlotChain chain = new DefaultProcessorSlotChain();

        List<ProcessorSlot> sortedSlotList = SpiLoader.of(ProcessorSlot.class).loadInstanceListSorted();
        for (ProcessorSlot slot : sortedSlotList) {
            if (!(slot instanceof AbstractLinkedProcessorSlot)) {
                RecordLog.warn("The ProcessorSlot(" + slot.getClass().getCanonicalName() + ") is not an instance of AbstractLinkedProcessorSlot, can't be added into ProcessorSlotChain");
                continue;
            }

            chain.addLast((AbstractLinkedProcessorSlot<?>) slot);
        }

        return chain;
    }
}

DefaultSlotChainBuilder performs the initialization of the whole chain of responsibility; DefaultProcessorSlotChain maintains the linked list of all Slots.

Node Design

Node mainly stores a resource's statistics; different Node types serve different statistical dimensions.

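StatisticNode encapsulates the calculation logic for the core metrics and underpins the other Node types.
ClusterNode corresponds one-to-one with a Resource and holds that resource's overall statistics, including the per-origin statistics.
DefaultNode is determined jointly by the Resource and the Context name, and holds the statistics of a resource within one particular context. It differs from ClusterNode in that it is scoped to a single context rather than aggregating across all of them.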

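Programming Techniques

Context stores the state of each invocation, so being able to reach it quickly from anywhere is an important design goal. In Java, ThreadLocal is the usual way to bind data to the calling thread and pass it along. Spring, for instance, also uses ThreadLocal to propagate transactions.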
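As an illustration of binding a per-invocation context to the calling thread, here is a minimal sketch. SketchContext and SketchContextHolder are hypothetical names, and the real Context carries far more state than a name.

```java
/** Hypothetical per-invocation context; only the name is modeled. */
class SketchContext {
    final String name;

    SketchContext(String name) { this.name = name; }
}

class SketchContextHolder {
    // One context per thread; other threads never see this thread's value.
    private static final ThreadLocal<SketchContext> HOLDER = new ThreadLocal<>();

    /** Returns the context bound to this thread, creating it on first use. */
    static SketchContext enter(String name) {
        SketchContext ctx = HOLDER.get();
        if (ctx == null) {
            ctx = new SketchContext(name);
            HOLDER.set(ctx);
        }
        return ctx;
    }

    /** Returns the context bound to this thread, or null if there is none. */
    static SketchContext current() { return HOLDER.get(); }

    /** Unbinds the context so pooled threads do not leak it into later calls. */
    static void exit() { HOLDER.remove(); }
}
```

Calling remove() on exit matters when threads are pooled: without it, the next task that runs on the same thread would inherit a stale context.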

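How should upper-layer code be notified when traffic exceeds a rule's limit? Error handling generally offers two approaches: return values and exceptions. Sentinel supports both. For exceptions it uses BlockException, a checked exception that callers must handle, which ensures the blocked case is never silently ignored.

How can a piece of code be reliably wrapped? In Spring, the template pattern is the usual answer. Spring's transaction support, for example, wraps user code as one step in a template, with acquiring the transaction beforehand and committing it afterwards both handled by Spring's template. Sentinel emulates the template pattern with a try-catch-finally structure. Calling SphU.entry("Name") triggers the entry-side SlotChain operations. Entry also implements the AutoCloseable interface, and its close method takes care of destroying the Context.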
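The two ideas above (a checked exception for the blocked case, and an AutoCloseable handle for the exit-side work) combine naturally with try-with-resources. The sketch below uses hypothetical names (SketchBlockException, SketchEntry, SketchGuard) and a deliberately naive, non-thread-safe concurrency limit in place of real rules.

```java
/** Hypothetical checked exception standing in for BlockException. */
class SketchBlockException extends Exception {
    SketchBlockException(String resource) { super("blocked: " + resource); }
}

/** Hypothetical entry handle; closing it runs the exit-side logic. */
class SketchEntry implements AutoCloseable {
    static int active = 0; // number of open entries; not thread-safe, sketch only

    SketchEntry() { active++; }

    @Override
    public void close() { active--; }
}

class SketchGuard {
    static final int LIMIT = 1; // assumed rule: at most one open entry

    /** Exception style: the caller is forced to handle the blocked case. */
    static SketchEntry entry(String resource) throws SketchBlockException {
        if (SketchEntry.active >= LIMIT) {
            throw new SketchBlockException(resource);
        }
        return new SketchEntry();
    }

    /** Wraps the protected code; close() runs even if the body throws. */
    static String call(String resource) {
        try (SketchEntry ignored = entry(resource)) {
            return "passed";   // the protected business logic would go here
        } catch (SketchBlockException ex) {
            return "blocked";  // fallback when the rule rejects the call
        }
    }
}
```

Because the exception is checked, code that calls entry without a catch clause or a throws declaration does not compile, which is what forces callers to decide how to handle rejection.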

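Traffic Statistics

Statistics fall into two kinds: accumulating and time-windowed.

Typical accumulating metrics are thread count and request count; a typical time-windowed metric is QPS. As a high-performance rate-limiting framework, Sentinel needs statistics that hold up under high concurrency.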

public class StatisticNode implements Node {

    /**
    * Holds statistics of the recent {@code INTERVAL} milliseconds. The {@code INTERVAL} is divided into time spans
    * by given {@code sampleCount}.
    */
    private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
        IntervalProperty.INTERVAL);
    /**
    * Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
    * meaning each bucket per second, in this way we can get accurate statistics of each second.
    */
    private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);
    /**
    * The counter for thread count.
    */
    private LongAdder curThreadNum = new LongAdder();
}

Accumulating Counters

Sentinel uses the LongAdder class to accumulate the thread count, which gives both good performance and accuracy.
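A minimal sketch of a thread counter built on java.util.concurrent.atomic.LongAdder follows. The class and method names are hypothetical; only the curThreadNum field name is borrowed from the excerpt above.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical wrapper around a LongAdder-based thread counter. */
class SketchThreadCounter {
    // LongAdder spreads contended updates over several internal cells,
    // so many threads can update it without retrying on a single CAS target.
    private final LongAdder curThreadNum = new LongAdder();

    /** Call when a thread enters the resource. */
    void increase() { curThreadNum.increment(); }

    /** Call when a thread leaves the resource. */
    void decrease() { curThreadNum.decrement(); }

    /** Sums the cells; exact once no updates are in flight. */
    long current() { return curThreadNum.sum(); }
}
```

The trade-off is that sum() is not an atomic snapshot while updates are still running, which is usually acceptable for monitoring-style counters.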

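Time Windows

Sentinel collects its time-based statistics with a sliding window, whose implementation is described below.

A sliding window splits a large window into smaller windows (buckets) to allow finer-grained statistics and control. In StatisticNode, for example, the 1 s interval is split into two segments: the first covers 0 ms up to (but not including) 500 ms, and the second covers 500 ms up to (but not including) 1000 ms.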

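LeapArray Design

The base class of the sliding window is LeapArray, defined by two properties: intervalInMs (the total length) and sampleCount (the number of buckets). Naturally, intervalInMs / sampleCount gives the length of each bucket.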
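The arithmetic behind locating a bucket can be sketched as below. The method names mirror calculateTimeIdx and calculateWindowStart from the currentWindow excerpt, but the bodies are written here as an assumption about the usual circular-array approach, and SketchWindowMath is a hypothetical class.

```java
/** Hypothetical helper showing the bucket arithmetic of a circular window array. */
class SketchWindowMath {
    /** Length of one bucket in milliseconds. */
    static int windowLengthInMs(int intervalInMs, int sampleCount) {
        return intervalInMs / sampleCount;
    }

    /** Index of the bucket that covers timeMillis in a circular array. */
    static int calculateTimeIdx(long timeMillis, int windowLengthInMs, int sampleCount) {
        long timeId = timeMillis / windowLengthInMs; // how many buckets have elapsed
        return (int) (timeId % sampleCount);         // wrap around the array
    }

    /** Start timestamp of the bucket that covers timeMillis. */
    static long calculateWindowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs;
    }
}
```

For example, with intervalInMs = 1000 and sampleCount = 2, each bucket is 500 ms long; a timestamp of 1700 ms falls into index 1 with window start 1500 ms, and a timestamp of 1200 ms falls into index 0 with window start 1000 ms.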

For concurrency and thread safety, Sentinel uses AtomicReferenceArray as the underlying data store.

The currentWindow method is the core of the class; its implementation is described below.

/**
* Get bucket item at provided timestamp.
*
* @param timeMillis a valid timestamp in milliseconds
* @return current bucket item at provided timestamp if the time is valid; null if time is invalid
*/
    public WindowWrap<T> currentWindow(long timeMillis) {
        if (timeMillis < 0) {
            return null;
        }

        int idx = calculateTimeIdx(timeMillis);
        // Calculate current bucket start time.
        long windowStart = calculateWindowStart(timeMillis);

        /*
        * Get bucket item at given time from the array.
        *
        * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
        * (2) Bucket is up-to-date, then just return the bucket.
        * (3) Bucket is deprecated, then reset current bucket.
        */
        while (true) {
            WindowWrap<T> old = array.get(idx);
            if (old == null) {
                WindowWrap<T> window = new WindowWrap<>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
                if (array.compareAndSet(idx, null, window)) {
                    return window;
                } else {
                    Thread.yield();
                }
            } else if (windowStart == old.windowStart()) {
                return old;
            } else if (windowStart > old.windowStart()) {
                if (updateLock.tryLock()) {
                    try {
                        return resetWindowTo(old, windowStart);
                    } finally {
                        updateLock.unlock();
                    }
                } else {
                    Thread.yield();
                }
            } else if (windowStart < old.windowStart()) {
                return new WindowWrap<>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            }
        }
    }

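1. The current timestamp in milliseconds is used to locate the bucket index and to compute the start timestamp of the bucket it falls into.

2. An infinite loop then finds the right bucket. There are three main cases: (1) if no bucket exists, a new one is created and installed with CAS; (2) if the bucket's start timestamp equals the computed window start, the bucket is returned as is; (3) if the bucket is stale (its start timestamp is earlier than the computed window start), the lock is acquired and the bucket is reset. A final branch covers a bucket whose start timestamp is later than the computed window start, which should not normally happen; in that case a detached empty bucket is returned without touching the array.

Because a rate-limiting framework runs under high concurrency, concurrent operations here only try to gain the right to proceed through CAS or tryLock. A thread that fails calls Thread.yield() to give up the CPU for the moment, which avoids heavy contention driving up CPU usage.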
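The following sketch puts the pieces together into a small sliding-window counter. It is a simplification written for illustration, not LeapArray itself: SketchSlidingCounter and Bucket are hypothetical names, each bucket holds a single LongAdder, and a stale bucket is replaced through CAS instead of being reset under a lock as in the excerpt above.

```java
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical sliding-window counter; a simplified take on the idea above. */
class SketchSlidingCounter {
    private static final class Bucket {
        final long windowStart;
        final LongAdder count = new LongAdder();

        Bucket(long windowStart) { this.windowStart = windowStart; }
    }

    private final int windowLengthInMs;
    private final int intervalInMs;
    private final AtomicReferenceArray<Bucket> array;

    SketchSlidingCounter(int sampleCount, int intervalInMs) {
        this.windowLengthInMs = intervalInMs / sampleCount;
        this.intervalInMs = intervalInMs;
        this.array = new AtomicReferenceArray<>(sampleCount);
    }

    /** Finds or creates the bucket that covers timeMillis. */
    private Bucket currentBucket(long timeMillis) {
        int idx = (int) ((timeMillis / windowLengthInMs) % array.length());
        long windowStart = timeMillis - timeMillis % windowLengthInMs;
        while (true) {
            Bucket old = array.get(idx);
            if (old != null && old.windowStart == windowStart) {
                return old;                      // bucket is up to date
            }
            if (old != null && old.windowStart > windowStart) {
                return new Bucket(windowStart);  // clock went backwards: detached bucket
            }
            Bucket fresh = new Bucket(windowStart);
            if (array.compareAndSet(idx, old, fresh)) {
                return fresh;                    // installed a new or replacement bucket
            }
            Thread.yield();                      // lost the race; let others run, then retry
        }
    }

    /** Adds n events at the given timestamp. */
    void add(long timeMillis, long n) { currentBucket(timeMillis).count.add(n); }

    /** Sums all buckets that started within the last intervalInMs milliseconds. */
    long sum(long timeMillis) {
        long total = 0;
        for (int i = 0; i < array.length(); i++) {
            Bucket b = array.get(i);
            if (b == null) continue;
            long age = timeMillis - b.windowStart;
            if (age >= 0 && age < intervalInMs) total += b.count.sum();
        }
        return total;
    }
}
```

Replacing a stale bucket through CAS keeps the sketch lock-free, at the cost of allocating a new bucket on every rollover; resetting the existing bucket under a lock, as the excerpt does, reuses the object instead.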

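Flow Control

Here we use the circuit breaker as an example to explore how Sentinel performs flow control.

DefaultCircuitBreakerSlot is the slot responsible for circuit-breaker logic. On slot#exit, the statistics are recorded in a local LeapArray.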

@Override
    public void onRequestComplete(Context context) {
        SlowRequestCounter counter = slidingCounter.currentWindow().value();
        Entry entry = context.getCurEntry();
        if (entry == null) {
            return;
        }
        long completeTime = entry.getCompleteTimestamp();
        if (completeTime <= 0) {
            completeTime = TimeUtil.currentTimeMillis();
        }
        long rt = completeTime - entry.getCreateTimestamp();
        if (rt > maxAllowedRt) {
            counter.slowCount.add(1);
        }
        counter.totalCount.add(1);

        handleStateChangeWhenThresholdExceeded(rt);
    }

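The code above is ResponseTimeCircuitBreaker#onRequestComplete. When slot#exit is called, it invokes the circuit breakers that belong to the resource so they can update their statistics. If the ratio of slow requests exceeds the configured slow-ratio threshold, the breaker's state is set to open. On the next slot#entry call, the breaker's state is checked; if it is open, a DegradeException is thrown immediately and the subsequent code is not executed.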
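The trip logic can be sketched as a small state machine. This is an illustration under simplifying assumptions, not the real breaker: SketchSlowRatioBreaker is a hypothetical class, the counters are plain fields rather than a sliding window, minRequestAmount is an assumed parameter that avoids tripping on too few samples, and recovery from the open state is left out.

```java
/** Hypothetical slow-ratio circuit breaker; statistics are not windowed here. */
class SketchSlowRatioBreaker {
    enum State { CLOSED, OPEN }

    private final long maxAllowedRtMs;   // responses slower than this count as slow
    private final double maxSlowRatio;   // trip when slow / total exceeds this
    private final int minRequestAmount;  // assumed: minimum samples before tripping

    private long slowCount;
    private long totalCount;
    private State state = State.CLOSED;

    SketchSlowRatioBreaker(long maxAllowedRtMs, double maxSlowRatio, int minRequestAmount) {
        this.maxAllowedRtMs = maxAllowedRtMs;
        this.maxSlowRatio = maxSlowRatio;
        this.minRequestAmount = minRequestAmount;
    }

    /** Entry-side check: requests pass only while the breaker is closed. */
    synchronized boolean tryPass() { return state == State.CLOSED; }

    /** Exit-side hook: record the response time and trip if the ratio is too high. */
    synchronized void onRequestComplete(long rtMs) {
        if (rtMs > maxAllowedRtMs) {
            slowCount++;
        }
        totalCount++;
        if (state == State.CLOSED
                && totalCount >= minRequestAmount
                && (double) slowCount / totalCount > maxSlowRatio) {
            state = State.OPEN;
        }
    }

    synchronized State state() { return state; }
}
```

With maxAllowedRtMs = 100, maxSlowRatio = 0.5 and minRequestAmount = 4, response times of 50, 200, 200 and 200 ms leave the breaker closed after the third request (too few samples) and open it on the fourth, when three of four requests are slow.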

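Summary

1. Sentinel decouples traffic statistics from flow control through the chain-of-responsibility pattern.
2. Sentinel implements a sliding-window algorithm to keep its time-based statistics.
3. The design has to balance accuracy against performance; neither can be sacrificed for the other.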
