Preface
Hello everyone, I'm 熊猫与乐乐. Today I'd like to share my understanding of how Sentinel implements rate limiting under the hood. Before diving in, we need to answer two questions:
- What is rate limiting? Rate limiting restricts traffic that exceeds what the system can handle, to keep the system from crashing.
- Why do we need it? Memory, CPU, IO, threads, and other resources are all finite, so the traffic a system can sustain is finite too. We therefore load-test the system to find the maximum traffic it can bear without crashing, and reject any traffic above that threshold so the system keeps running normally.
We also need to know the four common rate-limiting algorithms:
- Fixed time window
- Sliding time window
- Leaky bucket
- Token bucket
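As a warm-up, here is a minimal sketch of the simplest of the four, the fixed-time-window algorithm (class and method names are mine for illustration, not from any library). Its well-known weakness, that traffic can briefly burst to twice the limit around a window boundary, is exactly what the sliding-window variant fixes.

```java
// A minimal fixed-time-window limiter sketch (illustrative, not Sentinel code):
// allow at most `limit` requests per `windowMs` window; the counter resets whenever
// a request falls into a new window. Time is passed in explicitly to keep it testable.
public class FixedWindowLimiter {
    private final int limit;
    private final long windowMs;
    private long windowStart = -1;
    private int count = 0;

    public FixedWindowLimiter(int limit, long windowMs) {
        this.limit = limit;
        this.windowMs = windowMs;
    }

    public synchronized boolean tryPass(long nowMs) {
        long start = nowMs - nowMs % windowMs;  // align the timestamp down to a window boundary
        if (start != windowStart) {             // entered a new window: reset the counter
            windowStart = start;
            count = 0;
        }
        return ++count <= limit;                // pass while the window's quota is not used up
    }
}
```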
In day-to-day work, the rate-limiting tools we are most likely to use are:
- Alibaba's Sentinel. Sentinel collects call statistics with a sliding-time-window algorithm and offers the following rate-limiting implementations:
- Fast fail (DefaultController): based on the sliding-window statistics, block as soon as the current QPS reaches the threshold, strictly keeping traffic below the limit. This matches the behavior of the sliding-time-window algorithm.
- Queueing (RateLimiterController): derive the release interval between requests from the threshold; if two requests arrive too close together, compute how long the second one must wait and park its thread with Thread.sleep. If the Nth request's wait would exceed the maximum wait time, that request is blocked. Releasing requests at a fixed rate while allowing a bounded backlog matches the behavior of the leaky-bucket algorithm.
- Warm up (WarmUpController, WarmUpRateLimiterController): the implementation borrows from Guava's RateLimiter but is more complex and more capable. What we can say for certain is that warm up uses the token-bucket algorithm.
- Guava's RateLimiter, which limits based on the token-bucket algorithm.
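To make the token-bucket idea concrete, here is a minimal sketch (illustrative only; Guava's RateLimiter adds warm-up periods, fractional waiting, and much more):

```java
// Minimal token-bucket sketch (illustrative; not Guava's or Sentinel's actual code):
// tokens refill continuously at `ratePerSec`, capped at `capacity`;
// a request passes if at least one token is available.
public class TokenBucket {
    private final double capacity;
    private final double ratePerSec;
    private double tokens;
    private long lastRefillMs;

    public TokenBucket(double capacity, double ratePerSec, long nowMs) {
        this.capacity = capacity;
        this.ratePerSec = ratePerSec;
        this.tokens = capacity;          // start with a full bucket
        this.lastRefillMs = nowMs;
    }

    public synchronized boolean tryAcquire(long nowMs) {
        // Refill tokens according to the elapsed time, never exceeding capacity.
        tokens = Math.min(capacity, tokens + (nowMs - lastRefillMs) / 1000.0 * ratePerSec);
        lastRefillMs = nowMs;
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Note the contrast with the leaky bucket: a full token bucket tolerates a burst up to `capacity`, while a leaky bucket releases requests at a strictly fixed rate.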
Sentinel counts accesses to an interface with a sliding time window and blocks requests that exceed the threshold. The points worth thinking about are:
- How does Sentinel keep its statistics accurate under concurrency?
- How does Sentinel model the statistics? For example, a userInfo() method that returns user information may be exposed over both HTTP and a facade interface, and both application A and application B may call it. How does Sentinel tell apart, and separately count, traffic to userInfo() by caller (app A vs. app B) and by access channel (HTTP vs. facade)?
- Once accurate statistics exist, what does the end-to-end rate-limiting path look like?
These notes record the low-level logic of Sentinel rate limiting; the project only pulls in the sentinel-core dependency.
Finally, before reading on you should already be familiar with:
- What ThreadLocal is
- The chain-of-responsibility design pattern
You may want to read Sentinel's official documentation on its internals first and read this article critically; if there are mistakes, please bear with me.
Environment setup
- Create a Maven project
- Add Sentinel's core dependency:
<dependency>
    <groupId>com.alibaba.csp</groupId>
    <artifactId>sentinel-core</artifactId>
    <version>1.8.6</version>
</dependency>
- Create a test class SentinelTest:
- initFlowRules() defines the flow rule: block calls from app_A once QPS > 20
- Two applications (app_A, app_B) are simulated, concurrently calling /sentinelTest/user/info over HTTP.
- Print "pass" when a call goes through, "block" when it is limited
package sentinel;

import com.alibaba.csp.sentinel.Entry;
import com.alibaba.csp.sentinel.SphU;
import com.alibaba.csp.sentinel.context.ContextUtil;
import com.alibaba.csp.sentinel.slots.block.BlockException;
import com.alibaba.csp.sentinel.slots.block.RuleConstant;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRule;
import com.alibaba.csp.sentinel.slots.block.flow.FlowRuleManager;

import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executor;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class SentinelTest {

    private static final String USER_INFO_API = "/sentinelTest/user/info";
    private static final String APP_A = "app_A";
    private static final String APP_B = "app_B";

    private static final Executor EXECUTOR_FOR_HTTP =
            new ThreadPoolExecutor(10, 20, 1000, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(180));
    private static final String INVOKE_TYPE_HTTP = "HTTP";

    private static final Executor EXECUTOR_FOR_RPC =
            new ThreadPoolExecutor(3, 5, 1000, TimeUnit.MILLISECONDS, new ArrayBlockingQueue<>(100));
    private static final String INVOKE_TYPE_RPC = "RPC";

    private static void initFlowRules() {
        List<FlowRule> rules = new ArrayList<>();
        FlowRule rule = new FlowRule();
        rule.setLimitApp(APP_A);
        rule.setResource(USER_INFO_API);
        rule.setGrade(RuleConstant.FLOW_GRADE_QPS);
        // Set limit QPS to 20.
        rule.setCount(20);
        rules.add(rule);
        FlowRuleManager.loadRules(rules);
    }

    public static void main(String[] args) {
        // Configure the flow rules.
        initFlowRules();
        int i = 0;
        AtomicInteger COUNTER_FOR_APP_A = new AtomicInteger(0);
        AtomicInteger COUNTER_FOR_APP_B = new AtomicInteger(0);
        while (i++ < 100) {
            EXECUTOR_FOR_HTTP.execute(() -> userInfo(COUNTER_FOR_APP_A, INVOKE_TYPE_HTTP, APP_A));
            EXECUTOR_FOR_HTTP.execute(() -> userInfo(COUNTER_FOR_APP_B, INVOKE_TYPE_HTTP, APP_B));
        }
    }

    public static String userInfo(AtomicInteger counter, String contextName, String origin) {
        ContextUtil.enter(contextName, origin);
        try (Entry entry = SphU.entry(USER_INFO_API)) {
            // The protected logic.
            System.out.printf("request #%s: app [%s] via [%s] called resource [%s], pass%n",
                    counter.addAndGet(1), origin, contextName, USER_INFO_API);
        } catch (BlockException ex) {
            // Handle the blocked request.
            System.out.printf("request #%s: app [%s] via [%s] called resource [%s], block%n",
                    counter.addAndGet(1), origin, contextName, USER_INFO_API);
        }
        ContextUtil.exit();
        return "user";
    }
}
- Execution result
- Interpretation
- app_A was blocked on its 21st call, showing the flow rule took effect: as expected
- app_B still passed on its 40th call, showing it was not limited: as expected
Sample code: GitHub address
Source code analysis
Overview
- Sentinel wraps key information (the limited resource name, the context it is called in, the caller name) into a Context object stored in a ThreadLocal, so the thread can fetch it conveniently at any time.
- Sentinel keeps a global map from limited resource to processor slot chain; each resource has one chain, and rate limiting can be roughly understood as executing that chain:
- enrich the Context
- maintain the five-layer Node structure
- record statistics about calls to the resource
- finally, decide from the flow rules whether this call must be limited
- Sentinel maintains the five-layer Node structure in order to count calls to a resource along multiple dimensions, and hence to limit along multiple dimensions.
- A Node owes its statistics ability to the ArrayMetric it holds; ArrayMetric is a thread-safe statistics class based on the sliding-time-window algorithm.
- The chain contains a FlowSlot, which decides from the resource's statistics and the flow rules whether the current call should be limited.
The Sentinel context
In the sample code, ContextUtil.enter(contextName, origin) initializes a Context object, sets its context name and origin (the caller name), and finally stores it in a ThreadLocal.
The Context class has the following fields:
/**
* Context name.
*/
private final String name;
/**
* The entrance node of current invocation tree.
*/
private DefaultNode entranceNode;
/**
* Current processing entry.
*/
private Entry curEntry;
/**
* The origin of this context (usually indicate different invokers, e.g. service consumer name or origin IP).
*/
private String origin = "";
private final boolean async;
- name: the name of the context the resource is called in.
- origin: the name of the caller.
- entranceNode: the entrance node of the current invocation tree.
- curEntry: information about the current invocation:
- creation time (when the limited resource was called)
- resource name (the name of the limited resource)
- the defaultNode of this call (the resource's current statistics, such as thread count and QPS)
- any exception thrown by the call
- any block exception thrown by the call
These context data are the foundation on which the processor slot chain runs next.
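The enter/exit pairing around a ThreadLocal can be sketched like this (a hypothetical, heavily stripped-down class, not Sentinel's actual API; Sentinel's real Context carries far more state):

```java
// Minimal sketch of stashing a per-invocation context in a ThreadLocal,
// in the spirit of ContextUtil.enter/exit. Hypothetical class, not Sentinel's API.
public class SimpleContextUtil {
    public static final class Ctx {
        public final String name;    // context name, e.g. the invoke channel
        public final String origin;  // caller name
        Ctx(String name, String origin) { this.name = name; this.origin = origin; }
    }

    private static final ThreadLocal<Ctx> HOLDER = new ThreadLocal<>();

    public static void enter(String name, String origin) { HOLDER.set(new Ctx(name, origin)); }
    public static Ctx get() { return HOLDER.get(); }
    public static void exit() { HOLDER.remove(); } // avoid ThreadLocal leaks in thread pools
}
```

Because worker threads are pooled and reused, calling exit() (which removes the entry) is important; a stale context left behind would be seen by the next task on the same thread.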
The processor slot chain
In the sample code, Entry entry = SphU.entry(USER_INFO_API) declares the resource to be limited; SphU.entry() eventually calls:
private Entry entryWithPriority(ResourceWrapper resourceWrapper, int count, boolean prioritized, Object... args)
    throws BlockException {
    Context context = ContextUtil.getContext();
    if (context instanceof NullContext) {
        // The {@link NullContext} indicates that the amount of context has exceeded the threshold,
        // so here init the entry only. No rule checking will be done.
        return new CtEntry(resourceWrapper, null, context);
    }

    if (context == null) {
        // Using default context.
        context = InternalContextUtil.internalEnter(Constants.CONTEXT_DEFAULT_NAME);
    }

    // Global switch is close, no rule checking will do.
    if (!Constants.ON) {
        return new CtEntry(resourceWrapper, null, context);
    }

    ProcessorSlot<Object> chain = lookProcessChain(resourceWrapper);

    /*
     * Means amount of resources (slot chain) exceeds {@link Constants.MAX_SLOT_CHAIN_SIZE},
     * so no rule checking will be done.
     */
    if (chain == null) {
        return new CtEntry(resourceWrapper, null, context);
    }

    Entry e = new CtEntry(resourceWrapper, chain, context);
    try {
        chain.entry(context, resourceWrapper, null, count, prioritized, args);
    } catch (BlockException e1) {
        e.exit(count, args);
        throw e1;
    } catch (Throwable e1) {
        // This should not happen, unless there are errors existing in Sentinel internal.
        RecordLog.info("Sentinel unexpected exception", e1);
    }
    return e;
}
The method's logic: create an Entry, look up the processor slot chain for the limited resource (the resourceWrapper), execute the chain, and return the Entry.
What is the processor slot chain?
Sentinel keeps a global map from limited resource to processor slot chain; one limited resource maps to one chain:
/**
* Same resource({@link ResourceWrapper#equals(Object)}) will share the same
* {@link ProcessorSlotChain}, no matter in which {@link Context}.
*/
private static volatile Map<ResourceWrapper, ProcessorSlotChain> chainMap
= new HashMap<ResourceWrapper, ProcessorSlotChain>();
The chain is a DefaultProcessorSlotChain, which holds two fields of type AbstractLinkedProcessorSlot, first and end. first points to the head of the list (an empty node with no special logic) and end points to the tail; the slots in between are linked in the default order:
The default slot order is defined in the Constants class:
/**
* Order of default processor slots
*/
public static final int ORDER_NODE_SELECTOR_SLOT = -10000;
public static final int ORDER_CLUSTER_BUILDER_SLOT = -9000;
public static final int ORDER_LOG_SLOT = -8000;
public static final int ORDER_STATISTIC_SLOT = -7000;
public static final int ORDER_AUTHORITY_SLOT = -6000;
public static final int ORDER_SYSTEM_SLOT = -5000;
public static final int ORDER_FLOW_SLOT = -2000;
public static final int ORDER_DEGRADE_SLOT = -1000;
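Structurally, the chain is a plain chain of responsibility. A stripped-down sketch of that structure, mirroring the first/end pointers and the fireEntry() delegation described above (hypothetical classes, not Sentinel's real API):

```java
import java.util.ArrayList;
import java.util.List;

// Stripped-down chain-of-responsibility sketch of a processor slot chain.
// Hypothetical classes; Sentinel's real slots carry many more parameters.
abstract class Slot {
    private Slot next;
    void setNext(Slot next) { this.next = next; }
    void fireEntry(List<String> trace) { if (next != null) next.entry(trace); }
    abstract void entry(List<String> trace);
}

class PassThroughSlot extends Slot {
    @Override void entry(List<String> trace) { fireEntry(trace); } // empty head: just delegate
}

class NamedSlot extends Slot {
    private final String name;
    NamedSlot(String name) { this.name = name; }
    @Override void entry(List<String> trace) {
        trace.add(name);   // do this slot's own work...
        fireEntry(trace);  // ...then hand over to the next slot
    }
}

class SlotChain {
    private final Slot first = new PassThroughSlot(); // head node, like DefaultProcessorSlotChain.first
    private Slot end = first;                         // tail pointer, like DefaultProcessorSlotChain.end

    void addLast(Slot slot) { end.setNext(slot); end = slot; }

    List<String> entry() {
        List<String> trace = new ArrayList<>();
        first.entry(trace); // kick off the chain from the head
        return trace;
    }
}
```

Appending at end and delegating via fireEntry() is exactly why the order constants above matter: they determine the position of each slot in the list.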
The important processor slots
Every processor slot extends AbstractLinkedProcessorSlot, which implements the ProcessorSlot interface.
NodeSelectorSlot
Role: find the DefaultNode for the current context name, set it into the Context's curEntry, and maintain the DefaultNode's position in the five-layer Node structure.
ClusterBuilderSlot
Role: find the ClusterNode for the resource name and set it into the DefaultNode. If an origin is specified, also create the corresponding OriginNode and store it in the map held inside the ClusterNode. This maintains the positions of the ClusterNode and OriginNode in the five-layer Node structure.
StatisticSlot
Role: record statistics for the limited resource at the DefaultNode, ClusterNode, and OriginNode dimensions.
- DefaultNode: how the resource is called within each context.
- ClusterNode: how the resource is called overall, regardless of context.
- OriginNode: how the resource is called by each caller.
FlowSlot
Role: decide, from the statistics in the five-layer Node structure and the configured flow rules, whether to limit the current call.
DegradeSlot
Role: circuit breaking.
The five-layer Node structure
What is the five-layer Node structure?
Sentinel distinguishes five kinds of Node, one per dimension:
- Root node: the root; an application has exactly one, on layer one of the structure.
- EntranceNode: a context node, marking a context that exists in the application, on layer two.
- DefaultNode: a default node, recording a limited resource's calls within one context, on layer three.
- ClusterNode: a cluster node, recording a limited resource's overall calls in the application, on layer four.
- OriginNode: an origin node, recording a limited resource's calls from each origin, on layer five.
Suppose application C has two methods, queryDepartment() and queryUser(); queryDepartment() is exposed only over HTTP, while queryUser() is exposed over both HTTP and RPC.
Applications A and B both call queryDepartment() and queryUser() on application C. If we limit these two methods with Sentinel, Sentinel maintains the five-layer Node structure shown below in memory:
- Layer one is the root node.
- Layer two holds 2 EntranceNodes, counting the traffic arriving over HTTP and over RPC respectively.
- Layer three holds 3 DefaultNodes. The http EntranceNode points to 2 DefaultNodes, counting HTTP calls to queryDepartment() and queryUser(); the rpc EntranceNode points to 1 DefaultNode, counting RPC calls to queryUser().
- Layer four holds 2 ClusterNodes, counting the overall calls to queryDepartment() and queryUser().
- Layer five holds 4 OriginNodes: the 2 OriginNodes under the queryDepartment ClusterNode count calls to queryDepartment() from app A and app B, and the 2 under the queryUser ClusterNode count calls to queryUser() from app A and app B.
What the five-layer Node structure is for
Q: To limit queryUser() overall, which node's statistics do we use?
A: The ClusterNode named queryUser, because it records the overall calls to queryUser within application C.
Q: To limit calls to queryUser() coming from application A?
A: The OriginNode named app_A (pointed to by the queryUser ClusterNode), because it records the calls to queryUser from application A.
Q: To limit calls to queryUser() made over HTTP?
A: The DefaultNode named queryUser (pointed to by the http EntranceNode), because it records the calls to queryUser made over HTTP.
So the five-layer Node structure records a resource's calls along different dimensions, letting Sentinel limit the resource along different dimensions.
Maintaining the five-layer Node structure
Maintaining the ROOT node
The Constants class defines a global ROOT node.
/**
* Global ROOT statistic node that represents the universal parent node.
*/
public final static DefaultNode ROOT = new EntranceNode(new StringResourceWrapper(ROOT_ID, EntryType.IN),
new ClusterNode(ROOT_ID, ResourceTypeConstants.COMMON));
Maintaining EntranceNode
ContextUtil defines a global contextNameNodeMap holding the mapping from context name to EntranceNode.
/**
* Holds all {@link EntranceNode}. Each {@link EntranceNode} is associated with a distinct context name.
*/
private static volatile Map<String, DefaultNode> contextNameNodeMap = new HashMap<>();
When ContextUtil.enter(contextName, origin) sets the resource's context and caller, it eventually calls:
protected static Context trueEnter(String name, String origin) {
    Context context = contextHolder.get();
    if (context == null) {
        Map<String, DefaultNode> localCacheNameMap = contextNameNodeMap;
        DefaultNode node = localCacheNameMap.get(name);
        if (node == null) {
            if (localCacheNameMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                setNullContext();
                return NULL_CONTEXT;
            } else {
                LOCK.lock();
                try {
                    node = contextNameNodeMap.get(name);
                    if (node == null) {
                        if (contextNameNodeMap.size() > Constants.MAX_CONTEXT_NAME_SIZE) {
                            setNullContext();
                            return NULL_CONTEXT;
                        } else {
                            node = new EntranceNode(new StringResourceWrapper(name, EntryType.IN), null);
                            // Add entrance node.
                            Constants.ROOT.addChild(node);

                            Map<String, DefaultNode> newMap = new HashMap<>(contextNameNodeMap.size() + 1);
                            newMap.putAll(contextNameNodeMap);
                            newMap.put(name, node);
                            contextNameNodeMap = newMap;
                        }
                    }
                } finally {
                    LOCK.unlock();
                }
            }
        }
        context = new Context(node, name);
        context.setOrigin(origin);
        contextHolder.set(context);
    }
    return context;
}
This method essentially initializes a Context object and stores it in a ThreadLocal. The steps are:
- Look up the EntranceNode for the context name in contextNameNodeMap; if it does not exist yet, create one and put it into contextNameNodeMap.
- Add the newly created EntranceNode to the ROOT node's children.
- Build a Context from the context name and the EntranceNode, set its origin, and store it in the ThreadLocal.
At this point Sentinel has linked the ROOT node to the EntranceNodes.
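The concurrency pattern in trueEnter(), lock-free reads of a volatile map combined with copy-and-swap writes behind a lock and a double check, can be condensed to this sketch (hypothetical class; the same pattern reappears later in NodeSelectorSlot and ClusterBuilderSlot):

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Condensed sketch of the copy-on-write + double-checked-locking pattern used by
// trueEnter(): readers hit a volatile HashMap without locking; writers clone the
// map under a lock and publish the new copy, so readers never observe a map mid-mutation.
public class CopyOnWriteRegistry {
    private volatile Map<String, Object> map = new HashMap<>();
    private final ReentrantLock lock = new ReentrantLock();

    public Object getOrCreate(String key) {
        Object node = map.get(key);          // fast path: lock-free read
        if (node == null) {
            lock.lock();
            try {
                node = map.get(key);         // double-check under the lock
                if (node == null) {
                    node = new Object();
                    Map<String, Object> newMap = new HashMap<>(map.size() + 1);
                    newMap.putAll(map);      // copy the existing entries
                    newMap.put(key, node);
                    map = newMap;            // publish atomically via the volatile write
                }
            } finally {
                lock.unlock();
            }
        }
        return node;
    }
}
```

The trade-off, called out in Sentinel's own comments, is that the set of contexts stabilizes quickly, so paying a copy on rare writes is cheaper than a ConcurrentHashMap that synchronizes on every access.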
Maintaining DefaultNode
The first slot to run in the chain is the NodeSelectorSlot. It holds a map whose keys are context names and whose values are DefaultNodes; the map records the resource's calls in each context.
/**
* {@link DefaultNode}s of the same resource in different context.
*/
private volatile Map<String, DefaultNode> map = new HashMap<String, DefaultNode>(10);
NodeSelectorSlot's logic is as follows:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, Object obj, int count, boolean prioritized, Object... args)
    throws Throwable {
    /*
     * It's interesting that we use context name rather resource name as the map key.
     *
     * Remember that same resource({@link ResourceWrapper#equals(Object)}) will share
     * the same {@link ProcessorSlotChain} globally, no matter in which context. So if
     * code goes into {@link #entry(Context, ResourceWrapper, DefaultNode, int, Object...)},
     * the resource name must be same but context name may not.
     *
     * If we use {@link com.alibaba.csp.sentinel.SphU#entry(String resource)} to
     * enter same resource in different context, using context name as map key can
     * distinguish the same resource. In this case, multiple {@link DefaultNode}s will be created
     * of the same resource name, for every distinct context (different context name) each.
     *
     * Consider another question. One resource may have multiple {@link DefaultNode},
     * so what is the fastest way to get total statistics of the same resource?
     * The answer is all {@link DefaultNode}s with same resource name share one
     * {@link ClusterNode}. See {@link ClusterBuilderSlot} for detail.
     */
    DefaultNode node = map.get(context.getName());
    if (node == null) {
        synchronized (this) {
            node = map.get(context.getName());
            if (node == null) {
                node = new DefaultNode(resourceWrapper, null);
                HashMap<String, DefaultNode> cacheMap = new HashMap<String, DefaultNode>(map.size());
                cacheMap.putAll(map);
                cacheMap.put(context.getName(), node);
                map = cacheMap;
                // Build invocation tree
                ((DefaultNode) context.getLastNode()).addChild(node);
            }
        }
    }

    context.setCurNode(node);
    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
- Look up the DefaultNode for the context name in the map; if absent, create one for the resource, store it in the map, and add it to the EntranceNode's children.
- Store the DefaultNode into the context.
At this point Sentinel has linked the EntranceNode to the DefaultNodes.
Maintaining ClusterNode
After NodeSelectorSlot comes ClusterBuilderSlot, which defines a global clusterNodeMap maintaining the mapping from limited resource to ClusterNode.
/**
* <p>
* Remember that same resource({@link ResourceWrapper#equals(Object)}) will share
* the same {@link ProcessorSlotChain} globally, no matter in which context. So if
* code goes into {@link #entry(Context, ResourceWrapper, DefaultNode, int, boolean, Object...)},
* the resource name must be same but context name may not.
* </p>
* <p>
* To get total statistics of the same resource in different context, same resource
* shares the same {@link ClusterNode} globally. All {@link ClusterNode}s are cached
* in this map.
* </p>
* <p>
* The longer the application runs, the more stable this mapping will
* become. so we don't concurrent map but a lock. as this lock only happens
* at the very beginning while concurrent map will hold the lock all the time.
* </p>
*/
private static volatile Map<ResourceWrapper, ClusterNode> clusterNodeMap = new HashMap<>();
ClusterBuilderSlot's entry() looks like this:
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args)
    throws Throwable {
    if (clusterNode == null) {
        synchronized (lock) {
            if (clusterNode == null) {
                // Create the cluster node.
                clusterNode = new ClusterNode(resourceWrapper.getName(), resourceWrapper.getResourceType());
                HashMap<ResourceWrapper, ClusterNode> newMap = new HashMap<>(Math.max(clusterNodeMap.size(), 16));
                newMap.putAll(clusterNodeMap);
                newMap.put(node.getId(), clusterNode);
                clusterNodeMap = newMap;
            }
        }
    }
    node.setClusterNode(clusterNode);

    /*
     * if context origin is set, we should get or create a new {@link Node} of
     * the specific origin.
     */
    if (!"".equals(context.getOrigin())) {
        Node originNode = node.getClusterNode().getOrCreateOriginNode(context.getOrigin());
        context.getCurEntry().setOriginNode(originNode);
    }

    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}
The logic: look up the ClusterNode for the resource in clusterNodeMap, creating and caching one if absent, then attach it to the DefaultNode via node.setClusterNode(clusterNode).
At this point Sentinel has linked the DefaultNode to the ClusterNode.
Maintaining OriginNode
ClusterNode holds an originCountMap maintaining the mapping from caller to OriginNode.
/**
* <p>The origin map holds the pair: (origin, originNode) for one specific resource.</p>
* <p>
* The longer the application runs, the more stable this mapping will become.
* So we didn't use concurrent map here, but a lock, as this lock only happens
* at the very beginning while concurrent map will hold the lock all the time.
* </p>
*/
private Map<String, StatisticNode> originCountMap = new HashMap<>();
Also in ClusterBuilderSlot's entry(): if the origin in the context is not empty, the ClusterNode creates an OriginNode and stores it in originCountMap, and the OriginNode is then stored into the context.
if (!"".equals(context.getOrigin())) {
    Node originNode = node.getClusterNode().getOrCreateOriginNode(context.getOrigin());
    context.getCurEntry().setOriginNode(originNode);
}
At this point Sentinel has linked the ClusterNode to the OriginNodes.
The ArrayMetric statistics class
DefaultNode, ClusterNode, and EntranceNode all extend StatisticNode, which holds two ArrayMetric objects recording statistics over the last 1 s and the last 1 min respectively:
/**
* Holds statistics of the recent {@code INTERVAL} milliseconds. The {@code INTERVAL} is divided into time spans
* by given {@code sampleCount}.
*/
private transient volatile Metric rollingCounterInSecond = new ArrayMetric(SampleCountProperty.SAMPLE_COUNT,
IntervalProperty.INTERVAL);
/**
* Holds statistics of the recent 60 seconds. The windowLengthInMs is deliberately set to 1000 milliseconds,
* meaning each bucket per second, in this way we can get accurate statistics of each second.
*/
private transient Metric rollingCounterInMinute = new ArrayMetric(60, 60 * 1000, false);
What is ArrayMetric?
A Node holds 2 ArrayMetric objects, which is what gives it the ability to collect statistics. The relationships between Node and the statistics classes:
ArrayMetric holds a BucketLeapArray, the sliding-window array, which contains multiple MetricBuckets. One MetricBucket holds the statistics of one time window in an array of LongAdders, one entry per metric: for example, to track a resource's total, success, exception, and block counts, the LongAdder array has length 4, recording those four counts in order.
An example: suppose one window is 200 ms and we want resource A's total, success, exception, and block counts within 1 s. One of resource A's Nodes then holds an ArrayMetric, hence a BucketLeapArray holding 5 MetricBuckets; these record resource A's calls in [0-200 ms], [200-400 ms], [400-600 ms], [600-800 ms], and [800-1000 ms] respectively, and each MetricBucket holds a LongAdder array of length 4 recording the total, success, exception, and block counts.
The sliding-window algorithm in LeapArray
The LeapArray abstract class has the following fields:
// Length of one window, in milliseconds
protected int windowLengthInMs;
// Number of windows (buckets)
protected int sampleCount;
// Total length of the sliding window, in milliseconds
protected int intervalInMs;
// Total length of the sliding window, in seconds
private double intervalInSecond;
// The circular array of windows
protected final AtomicReferenceArray<WindowWrap<T>> array;

/**
 * The conditional (predicate) update lock is used only when current bucket is deprecated.
 */
private final ReentrantLock updateLock = new ReentrantLock();
LeapArray's constructor:
/**
 * The total bucket count is: {@code sampleCount = intervalInMs / windowLengthInMs}.
 *
 * @param sampleCount  bucket count of the sliding window
 * @param intervalInMs the total time interval of this {@link LeapArray} in milliseconds
 */
public LeapArray(int sampleCount, int intervalInMs) {
    AssertUtil.isTrue(sampleCount > 0, "bucket count is invalid: " + sampleCount);
    AssertUtil.isTrue(intervalInMs > 0, "total time interval of the sliding window should be positive");
    AssertUtil.isTrue(intervalInMs % sampleCount == 0, "time span needs to be evenly divided");

    this.windowLengthInMs = intervalInMs / sampleCount;
    this.intervalInMs = intervalInMs;
    this.intervalInSecond = intervalInMs / 1000.0;
    this.sampleCount = sampleCount;

    this.array = new AtomicReferenceArray<>(sampleCount);
}
From the bucket count and the total window length it derives the length of a single window and the total length in seconds, and initializes a thread-safe AtomicReferenceArray<WindowWrap<T>> to represent the sliding window.
A WindowWrap represents one time window; its fields are:
/**
* Time length of a single window bucket in milliseconds.
*/
private final long windowLengthInMs;
/**
* Start timestamp of the window in milliseconds.
*/
private long windowStart;
/**
* Statistic data.
*/
private T value;
- windowLengthInMs: the length of one window in milliseconds.
- windowStart: the timestamp at which the window starts.
- T: the statistics collected within one window; here T is in fact MetricBucket.
With these fields, data in the range [windowStart, windowStart + windowLengthInMs] can be aggregated; defining the statistics as a generic T keeps the LeapArray sliding-window class flexible to reuse.
The crux of a sliding-window implementation is finding the window for a given timestamp; LeapArray implements it as follows:
/**
 * Get bucket item at provided timestamp.
 *
 * @param timeMillis a valid timestamp in milliseconds
 * @return current bucket item at provided timestamp if the time is valid; null if time is invalid
 */
public WindowWrap<T> currentWindow(long timeMillis) {
    if (timeMillis < 0) {
        return null;
    }

    int idx = calculateTimeIdx(timeMillis);
    // Calculate current bucket start time.
    long windowStart = calculateWindowStart(timeMillis);

    /*
     * Get bucket item at given time from the array.
     *
     * (1) Bucket is absent, then just create a new bucket and CAS update to circular array.
     * (2) Bucket is up-to-date, then just return the bucket.
     * (3) Bucket is deprecated, then reset current bucket.
     */
    while (true) {
        WindowWrap<T> old = array.get(idx);
        if (old == null) {
            /*
             *     B0       B1      B2    NULL      B4
             * ||_______|_______|_______|_______|_______||___
             * 200      400     600    800      1000    1200  timestamp
             *                             ^
             *                          time=888
             *            bucket is empty, so create new and update
             *
             * If the old bucket is absent, then we create a new bucket at {@code windowStart},
             * then try to update circular array via a CAS operation. Only one thread can
             * succeed to update, while other threads yield its time slice.
             */
            WindowWrap<T> window = new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
            if (array.compareAndSet(idx, null, window)) {
                // Successfully updated, return the created bucket.
                return window;
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart == old.windowStart()) {
            /*
             *     B0       B1      B2     B3      B4
             * ||_______|_______|_______|_______|_______||___
             * 200      400     600    800      1000    1200  timestamp
             *                             ^
             *                          time=888
             *            startTime of Bucket 3: 800, so it's up-to-date
             *
             * If current {@code windowStart} is equal to the start timestamp of old bucket,
             * that means the time is within the bucket, so directly return the bucket.
             */
            return old;
        } else if (windowStart > old.windowStart()) {
            /*
             *   (old)
             *             B0       B1      B2    NULL      B4
             * |_______||_______|_______|_______|_______|_______||___
             * ...    1200     1400    1600    1800    2000    2200  timestamp
             *                              ^
             *                           time=1676
             *          startTime of Bucket 2: 400, deprecated, should be reset
             *
             * If the start timestamp of old bucket is behind provided time, that means
             * the bucket is deprecated. We have to reset the bucket to current {@code windowStart}.
             * Note that the reset and clean-up operations are hard to be atomic,
             * so we need a update lock to guarantee the correctness of bucket update.
             *
             * The update lock is conditional (tiny scope) and will take effect only when
             * bucket is deprecated, so in most cases it won't lead to performance loss.
             */
            if (updateLock.tryLock()) {
                try {
                    // Successfully get the update lock, now we reset the bucket.
                    return resetWindowTo(old, windowStart);
                } finally {
                    updateLock.unlock();
                }
            } else {
                // Contention failed, the thread will yield its time slice to wait for bucket available.
                Thread.yield();
            }
        } else if (windowStart < old.windowStart()) {
            // Should not go through here, as the provided time is already behind.
            return new WindowWrap<T>(windowLengthInMs, windowStart, newEmptyBucket(timeMillis));
        }
    }
}
The inline comments explain the logic well, so I will not repeat it here.
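One way to convince yourself of the index logic is to redo the arithmetic by hand. The sketch below reproduces the two formulas behind calculateTimeIdx() and calculateWindowStart() as I read them in the source (integer division to count elapsed windows, modulo to wrap around the circular array, and alignment down to a window boundary):

```java
// Bucket arithmetic behind LeapArray, reconstructed from calculateTimeIdx()
// and calculateWindowStart(): which circular-array slot a timestamp maps to,
// and where that slot's window starts.
public class BucketMath {
    static int bucketIndex(long timeMillis, int windowLengthInMs, int sampleCount) {
        long timeId = timeMillis / windowLengthInMs; // how many windows have elapsed since epoch
        return (int) (timeId % sampleCount);         // wrap into the circular array
    }

    static long windowStart(long timeMillis, int windowLengthInMs) {
        return timeMillis - timeMillis % windowLengthInMs; // align down to a window boundary
    }
}
```

With 200 ms windows and 5 buckets, t = 888 ms maps to a window starting at 800 ms, and t = 1676 ms maps to a window starting at 1600 ms that reuses an earlier slot, which is exactly the "deprecated bucket must be reset" case handled under the update lock above.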
Thread safety in MetricBucket
MetricBucket keeps its metrics in a LongAdder[]:
private final LongAdder[] counters;
When updating a metric, MetricBucket uses neither synchronized nor ReentrantLock; it simply finds the LongAdder for the metric and calls its add() directly. MetricBucket's add method:
public MetricBucket add(MetricEvent event, long n) {
    counters[event.ordinal()].add(n);
    return this;
}
LongAdder extends Striped64 from the java.util.concurrent package and can add to a long value thread-safely; under the hood Striped64 protects the shared state with CAS:
/**
 * CASes the base field.
 */
final boolean casBase(long cmp, long val) {
    return UNSAFE.compareAndSwapLong(this, BASE, cmp, val);
}
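A quick demonstration of why LongAdder is a safe counter here: many threads increment concurrently without any explicit lock, and no update is lost (a plain long with `count++` would drop updates under the same workload).

```java
import java.util.concurrent.atomic.LongAdder;

// Demonstrates LongAdder's lock-free thread safety: `threads` workers each
// perform `addsPerThread` increments concurrently, and the final sum is exact.
public class LongAdderDemo {
    public static long count(int threads, int addsPerThread) {
        LongAdder counter = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int j = 0; j < addsPerThread; j++) {
                    counter.increment(); // lock-free; striped into cells under contention
                }
            });
            workers[i].start();
        }
        for (Thread t : workers) {
            try {
                t.join();                // wait for all increments to land
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }
        return counter.sum();            // fold base + cells into the total
    }
}
```

Under contention LongAdder spreads updates across internal cells and only folds them on sum(), which is why it outperforms a single AtomicLong for write-heavy counters like these.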
Recording call statistics
Once the five-layer Node structure is built, counting the calls to the limited resource is the job of StatisticSlot; its entry method is:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args) throws Throwable {
    try {
        // Do some checking.
        fireEntry(context, resourceWrapper, node, count, prioritized, args);

        // Request passed, add thread count and pass count.
        node.increaseThreadNum();
        node.addPassRequest(count);

        if (context.getCurEntry().getOriginNode() != null) {
            // Add count for origin node.
            context.getCurEntry().getOriginNode().increaseThreadNum();
            context.getCurEntry().getOriginNode().addPassRequest(count);
        }

        if (resourceWrapper.getEntryType() == EntryType.IN) {
            // Add count for global inbound entry node for global statistics.
            Constants.ENTRY_NODE.increaseThreadNum();
            Constants.ENTRY_NODE.addPassRequest(count);
        }

        // Handle pass event with registered entry callback handlers.
        for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
            handler.onPass(context, resourceWrapper, node, count, args);
        }
    } catch (PriorityWaitException ex) {
        node.increaseThreadNum();
        if (context.getCurEntry().getOriginNode() != null) {
            // Add count for origin node.
            context.getCurEntry().getOriginNode().increaseThreadNum();
        }

        if (resourceWrapper.getEntryType() == EntryType.IN) {
            // Add count for global inbound entry node for global statistics.
            Constants.ENTRY_NODE.increaseThreadNum();
        }
        // Handle pass event with registered entry callback handlers.
        for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
            handler.onPass(context, resourceWrapper, node, count, args);
        }
    } catch (BlockException e) {
        // Blocked, set block exception to current entry.
        context.getCurEntry().setBlockError(e);

        // Add block count.
        node.increaseBlockQps(count);
        if (context.getCurEntry().getOriginNode() != null) {
            context.getCurEntry().getOriginNode().increaseBlockQps(count);
        }

        if (resourceWrapper.getEntryType() == EntryType.IN) {
            // Add count for global inbound entry node for global statistics.
            Constants.ENTRY_NODE.increaseBlockQps(count);
        }

        // Handle block event with registered entry callback handlers.
        for (ProcessorSlotEntryCallback<DefaultNode> handler : StatisticSlotCallbackRegistry.getEntryCallbacks()) {
            handler.onBlocked(e, context, resourceWrapper, node, count, args);
        }

        throw e;
    } catch (Throwable e) {
        // Unexpected internal error, set error to current entry.
        context.getCurEntry().setError(e);

        throw e;
    }
}
Focus on the try block: if the call succeeds (it was not blocked), increase the thread count and pass count on the DefaultNode and ClusterNode, and, if an origin was set, on the OriginNode as well. If a BlockException is thrown, the call was limited, and the block count is increased on the corresponding nodes instead.
Rate limiting
Sentinel performs rate limiting in FlowSlot, which has two fields:
// Checks whether a call violates the configured flow rules
private final FlowRuleChecker checker;

// Supplies the flow rules configured for a given resource
private final Function<String, Collection<FlowRule>> ruleProvider = new Function<String, Collection<FlowRule>>() {
    @Override
    public Collection<FlowRule> apply(String resource) {
        // Flow rule map should not be null.
        Map<String, List<FlowRule>> flowRules = FlowRuleManager.getFlowRuleMap();
        return flowRules.get(resource);
    }
};
- checker: decides from the flow rules whether the current call violates any of them.
- ruleProvider: fetches the flow rules configured for the current resource.
FlowSlot's entry method:
@Override
public void entry(Context context, ResourceWrapper resourceWrapper, DefaultNode node, int count,
                  boolean prioritized, Object... args) throws Throwable {
    checkFlow(resourceWrapper, context, node, count, prioritized);

    fireEntry(context, resourceWrapper, node, count, prioritized, args);
}

void checkFlow(ResourceWrapper resource, Context context, DefaultNode node, int count, boolean prioritized)
    throws BlockException {
    checker.checkFlow(ruleProvider, resource, context, node, count, prioritized);
}
The entry method simply calls checker.checkFlow() to decide whether the call must be limited; if so, a BlockException is thrown.
The key parts of FlowRuleChecker:
/**
 * Fetch the list of FlowRule objects configured for the resource name,
 * then check each rule in turn for a violation.
 */
public void checkFlow(Function<String, Collection<FlowRule>> ruleProvider, ResourceWrapper resource,
                      Context context, DefaultNode node, int count, boolean prioritized) throws BlockException {
    if (ruleProvider == null || resource == null) {
        return;
    }
    Collection<FlowRule> rules = ruleProvider.apply(resource.getName());
    if (rules != null) {
        for (FlowRule rule : rules) {
            if (!canPassCheck(rule, context, node, count, prioritized)) {
                throw new FlowException(rule.getLimitApp(), rule);
            }
        }
    }
}

public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node,
                            int acquireCount) {
    return canPassCheck(rule, context, node, acquireCount, false);
}

public boolean canPassCheck(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node, int acquireCount,
                            boolean prioritized) {
    String limitApp = rule.getLimitApp();
    if (limitApp == null) {
        return true;
    }

    if (rule.isClusterMode()) {
        return passClusterCheck(rule, context, node, acquireCount, prioritized);
    }

    return passLocalCheck(rule, context, node, acquireCount, prioritized);
}

// Select the Node that matches the limit strategy, then judge the current call
// against that Node's statistics.
private static boolean passLocalCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                      boolean prioritized) {
    Node selectedNode = selectNodeByRequesterAndStrategy(rule, context, node);
    if (selectedNode == null) {
        return true;
    }

    return rule.getRater().canPass(selectedNode, acquireCount, prioritized);
}

// Select the matching Node from the flow rule, the context, and the other parameters.
// That Node's statistics later decide whether this call must be limited.
static Node selectNodeByRequesterAndStrategy(/*@NonNull*/ FlowRule rule, Context context, DefaultNode node) {
    // The limit app should not be empty.
    String limitApp = rule.getLimitApp();
    int strategy = rule.getStrategy();
    String origin = context.getOrigin();

    if (limitApp.equals(origin) && filterOrigin(origin)) {
        if (strategy == RuleConstant.STRATEGY_DIRECT) {
            // Matches limit origin, return origin statistic node.
            return context.getOriginNode();
        }

        return selectReferenceNode(rule, context, node);
    } else if (RuleConstant.LIMIT_APP_DEFAULT.equals(limitApp)) {
        if (strategy == RuleConstant.STRATEGY_DIRECT) {
            // Return the cluster node.
            return node.getClusterNode();
        }

        return selectReferenceNode(rule, context, node);
    } else if (RuleConstant.LIMIT_APP_OTHER.equals(limitApp)
        && FlowRuleManager.isOtherOrigin(origin, rule.getResource())) {
        if (strategy == RuleConstant.STRATEGY_DIRECT) {
            return context.getOriginNode();
        }

        return selectReferenceNode(rule, context, node);
    }

    return null;
}
It fetches the FlowRule list for the resource name and walks the list, checking each rule. The check works as follows:
- Select a Node according to the rule's limit strategy; that Node's statistics decide whether the current call must be limited.
- Pass the selected Node to rule.getRater().canPass(selectedNode, acquireCount, prioritized), delegating the actual limiting decision to the traffic shaping controller (TrafficShapingController).
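For the default fast-fail case, the QPS check that DefaultController performs boils down to a single comparison between the node's current pass QPS and the rule's threshold (heavily simplified sketch; the real controller also handles the thread-count grade and prioritized requests that may borrow from future windows):

```java
// Simplified sketch of DefaultController's fast-fail decision: admit the call only
// if granting `acquireCount` more permits keeps QPS at or below the threshold.
// Illustrative class; the real controller reads currentQps from the selected Node.
public class FastFailCheck {
    static boolean canPass(double currentQps, int acquireCount, double threshold) {
        // Block as soon as admitting this request would push QPS past the threshold.
        return currentQps + acquireCount <= threshold;
    }
}
```

This closes the loop of the article: the sliding window in LeapArray produces currentQps, the five-layer Node structure decides which statistics apply, and the traffic shaping controller turns that number into a pass/block decision.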