I. Background
1. Usage Scenarios
Quoting the official introduction to the use cases of cluster flow control:
Why use cluster flow control? Suppose we want to limit the total QPS of some API for a given user to 50, but there may be many machines (say 100). The natural idea is to have a dedicated server count the total number of calls, with every other instance asking that server whether a call is allowed. That is the most basic form of cluster flow control.
Cluster flow control also solves the problem of uneven traffic distribution making the overall limit inaccurate. Suppose a cluster has 10 machines and each machine has a local limit of 10 QPS; ideally the cluster-wide limit would be 100 QPS. In practice, however, traffic may be distributed unevenly across machines, so some machines start throttling before the cluster total is reached. Limiting purely at the single-machine level therefore cannot enforce the overall limit precisely. Cluster flow control can precisely control the total call volume of the whole cluster and, combined with single-machine limits as a safety net, achieves better flow control.
There are two roles in cluster flow control:
- Token Client: the cluster flow control client, which requests tokens from its Token Server. The cluster flow control server returns a result that decides whether to throttle.
- Token Server: the cluster flow control server, which handles requests from Token Clients and decides, based on the configured cluster rules, whether to issue a token (i.e., whether the request may pass).
2. Usage
3. About the Dashboard
The official docs also note that driving cluster flow control from the dashboard requires customizing the dashboard yourself.
II. TokenServer Startup Modes
A TokenServer can be started in two ways: embedded or standalone.
1. Embedded TokenServer
The key point of the embedded mode is that every node is an ordinary, peer application instance: a node may be the TokenServer now and become a TokenClient later.
The official embedded TokenServer demo uses Sentinel's InitFunc extension point: when Sentinel's Env class is loaded and initialized, the init method of every InitFunc is executed.
public class DemoClusterInitFunc implements InitFunc {
    @Override
    public void init() throws Exception {
        // 1. Load FlowRules into FlowRuleManager
        initDynamicRuleProperty();
        // 2. [ClusterClientConfig] Load general TokenClient config, e.g. the timeout for requests to the TokenServer
        initClientConfigProperty();
        // 3. [ClusterGroupEntity] Load the client/server assignment config, i.e. which machines are TokenClients and which machine is the TokenServer
        initClientServerAssignProperty();
        // 4. Load cluster flow rules into ClusterFlowRuleManager; these are derived rules specific to cluster flow control
        registerClusterRuleSupplier();
        // 5. [ServerTransportConfig] Load TokenServer transport config, e.g. the port
        initServerTransportConfigProperty();
        // 6. [ClusterGroupEntity] Load the current node's cluster state (TokenClient, TokenServer, or not started)
        initStateProperty();
    }
}
For an embedded TokenServer to work properly, the following configurations must be loaded (the demo uses Nacos as the dynamic data source throughout):
- Ordinary flow rule config: includes flow rules and hot-parameter flow rules; rules intended for cluster mode have FlowRule.clusterMode set to true.

// DemoClusterInitFunc
private void initDynamicRuleProperty() {
    ReadableDataSource<String, List<FlowRule>> ruleSource = new NacosDataSource<>(remoteAddress, groupId, flowDataId,
        source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
    FlowRuleManager.register2Property(ruleSource.getProperty());
}

public class FlowRule extends AbstractRule {
    // Whether cluster flow control is enabled
    private boolean clusterMode;
    // Cluster flow control config
    private ClusterFlowConfig clusterConfig;
}
- TokenClient-side config: since every node of an embedded TokenServer deployment is a peer, any node may become a TokenClient.

1) initClientConfigProperty loads the client's general config; currently ClusterClientConfig only contains the timeout for requests to the TokenServer.

// DemoClusterInitFunc
private void initClientConfigProperty() {
    ReadableDataSource<String, ClusterClientConfig> clientConfigDs = new NacosDataSource<>(remoteAddress, groupId, configDataId,
        source -> JSON.parseObject(source, new TypeReference<ClusterClientConfig>() {}));
    ClusterClientConfigManager.registerClientConfigProperty(clientConfigDs.getProperty());
}

public class ClusterClientConfig {
    private Integer requestTimeout;
}

2) initClientServerAssignProperty loads the client/server assignment config, i.e. which machines are TokenClients and which machine is the TokenServer. Currently only the ip and port of the TokenServer of the cluster the current node belongs to are kept.

// DemoClusterInitFunc
private void initClientServerAssignProperty() {
    ReadableDataSource<String, ClusterClientAssignConfig> clientAssignDs = new NacosDataSource<>(remoteAddress, groupId, clusterMapDataId,
        source -> {
            List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
            return Optional.ofNullable(groupList)
                .flatMap(this::extractClientAssignment)
                .orElse(null);
        });
    ClusterClientConfigManager.registerServerAssignProperty(clientAssignDs.getProperty());
}

public class ClusterClientAssignConfig {
    // TokenServer ip
    private String serverHost;
    // TokenServer port
    private Integer serverPort;
}
- TokenServer config:

1) Transport config; currently this is just the port the TokenServer listens on.

// DemoClusterInitFunc
private void initServerTransportConfigProperty() {
    ReadableDataSource<String, ServerTransportConfig> serverTransportDs = new NacosDataSource<>(remoteAddress, groupId, clusterMapDataId,
        source -> {
            List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
            return Optional.ofNullable(groupList)
                .flatMap(this::extractServerTransportConfig)
                .orElse(null);
        });
    ClusterServerConfigManager.registerServerTransportProperty(serverTransportDs.getProperty());
}

public class ServerTransportConfig {
    // TokenServer port
    private int port;
}

2) Cluster flow rule config: to use cluster flow control, the TokenServer must load the cluster flow rules. The data source is the ordinary FlowRule list, but ClusterFlowRuleManager converts the FlowRules internally; we will look at that later.

// DemoClusterInitFunc
private void registerClusterRuleSupplier() {
    ClusterFlowRuleManager.setPropertySupplier(namespace -> {
        ReadableDataSource<String, List<FlowRule>> ds = new NacosDataSource<>(remoteAddress, groupId,
            namespace + DemoConstants.FLOW_POSTFIX,
            source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
        return ds.getProperty();
    });
}
- Current node state config: whether the current node is a TokenServer or a TokenClient. We will skip how the Nacos config model is parsed into an Integer here, since the data source could just as well be something else, such as the Sentinel Dashboard.

// DemoClusterInitFunc
private void initStateProperty() {
    ReadableDataSource<String, Integer> clusterModeDs = new NacosDataSource<>(remoteAddress, groupId, clusterMapDataId,
        source -> {
            List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
            return Optional.ofNullable(groupList)
                .map(this::extractMode)
                .orElse(ClusterStateManager.CLUSTER_NOT_STARTED);
        });
    ClusterStateManager.registerProperty(clusterModeDs.getProperty());
}

private int extractMode(List<ClusterGroupEntity> groupList) {
    // 1. Check whether the current node is the TokenServer of some cluster
    if (groupList.stream().anyMatch(this::machineEqual)) {
        return ClusterStateManager.CLUSTER_SERVER;
    }
    // 2. Check whether the current node belongs to some cluster; if so it becomes a TokenClient, otherwise it is marked as not started
    boolean canBeClient = groupList.stream()
        .flatMap(e -> e.getClientSet().stream())
        .filter(Objects::nonNull)
        .anyMatch(e -> e.equals(getCurrentMachineId()));
    return canBeClient ? ClusterStateManager.CLUSTER_CLIENT : ClusterStateManager.CLUSTER_NOT_STARTED;
}
2. Standalone TokenServer
A standalone TokenServer is a separate process dedicated to issuing tokens to the application instances.
Below is the official demo:
public class ClusterServerDemo {
    public static void main(String[] args) throws Exception {
        // Create the TokenServer
        ClusterTokenServer tokenServer = new SentinelDefaultTokenServer();
        // Load the TokenServer transport config
        ClusterServerConfigManager.loadGlobalTransportConfig(new ServerTransportConfig()
            .setIdleSeconds(600)
            .setPort(11111));
        // Load the namespace set --- namespaces distinguish clusters
        ClusterServerConfigManager.loadServerNamespaceSet(Collections.singleton(DemoConstants.APP_NAME));
        // Start the TokenServer
        tokenServer.start();
    }
}
public class DemoClusterServerInitFunc implements InitFunc {
    private final String remoteAddress = "localhost:8848";
    private final String groupId = "SENTINEL_GROUP";
    private final String namespaceSetDataId = "cluster-server-namespace-set";
    private final String serverTransportDataId = "cluster-server-transport-config";

    @Override
    public void init() throws Exception {
        // Dynamic cluster flow rule config
        ClusterFlowRuleManager.setPropertySupplier(namespace -> {
            ReadableDataSource<String, List<FlowRule>> ds = new NacosDataSource<>(remoteAddress, groupId,
                namespace + DemoConstants.FLOW_POSTFIX,
                source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
            return ds.getProperty();
        });
        // Dynamic namespace config
        ReadableDataSource<String, Set<String>> namespaceDs = new NacosDataSource<>(remoteAddress, groupId,
            namespaceSetDataId, source -> JSON.parseObject(source, new TypeReference<Set<String>>() {}));
        ClusterServerConfigManager.registerNamespaceSetProperty(namespaceDs.getProperty());
        // Dynamic transport config
        ReadableDataSource<String, ServerTransportConfig> transportConfigDs = new NacosDataSource<>(remoteAddress,
            groupId, serverTransportDataId,
            source -> JSON.parseObject(source, new TypeReference<ServerTransportConfig>() {}));
        ClusterServerConfigManager.registerServerTransportProperty(transportConfigDs.getProperty());
    }
}
A standalone TokenServer needs to load the following configurations:
- Cluster flow rules: same as for the embedded TokenServer, usually converted from the ordinary single-machine flow rules;
- Namespace config: a standalone TokenServer can issue tokens for multiple clusters, which are distinguished by namespace;
- Transport config: same as for the embedded TokenServer;
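As a concrete illustration, the Nacos entries backing the demo above might look as follows. The data ids match the demo's constants and the field names come from ServerTransportConfig; the values themselves are made-up examples, not mandated defaults.

```
cluster-server-namespace-set      →  ["appA"]
cluster-server-transport-config   →  {"port": 11111, "idleSeconds": 600}
```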
III. ClusterFlowConfig
Cluster flow control is configured on FlowRule: setting clusterMode=true enables cluster mode, and many properties of the local flow rule then have no effect.
public class FlowRule extends AbstractRule {
    // Threshold type: 0 - thread count, 1 - QPS (default) --- ignored in cluster mode, only QPS is supported
    private int grade = RuleConstant.FLOW_GRADE_QPS;
    // Threshold
    private double count;
    // Flow control strategy: 0 - direct, 1 - relate, 2 - chain --- ignored
    private int strategy = RuleConstant.STRATEGY_DIRECT;
    // Referenced resource --- ignored
    private String refResource;
    // Control behavior: 0 - fail fast, 1 - warm up, 2 - queueing --- ignored
    private int controlBehavior = RuleConstant.CONTROL_BEHAVIOR_DEFAULT;
    // Warm-up period (s) --- ignored
    private int warmUpPeriodSec = 10;
    // Max queueing time (ms) --- ignored
    private int maxQueueingTimeMs = 500;
    // Whether cluster flow control is enabled
    private boolean clusterMode;
    // Cluster flow control config
    private ClusterFlowConfig clusterConfig;
    // Traffic shaping controller --- ignored
    private TrafficShapingController controller;
}
Within the cluster-specific config, ClusterFlowConfig, only some of the properties take effect as well.
public class ClusterFlowConfig {
    // Globally unique id; within a cluster, each cluster flow rule maps to one id
    private Long flowId;
    // Cluster threshold mode: 0 - average per node, 1 - global total
    private int thresholdType = ClusterRuleConstant.FLOW_THRESHOLD_AVG_LOCAL;
    // Whether to fall back to local flow control when the TokenServer is unavailable; default true
    private boolean fallbackToLocalWhenFail = true;
    // Strategy; only the default exists --- ignored
    private int strategy = ClusterRuleConstant.FLOW_CLUSTER_STRATEGY_NORMAL;
    // Sample count, default 10
    private int sampleCount = ClusterRuleConstant.DEFAULT_CLUSTER_SAMPLE_COUNT;
    // Window size, 1s
    private int windowIntervalMs = RuleConstant.DEFAULT_WINDOW_INTERVAL_MS;
    // Timeout for holding a token --- ignored
    private long resourceTimeout = 2000;
    // Strategy when a held token times out: 0 - ignore, 1 - release; default 0 --- ignored
    private int resourceTimeoutStrategy = RuleConstant.DEFAULT_RESOURCE_TIMEOUT_STRATEGY;
    // Strategy applied when prioritized=true and the request is refused --- ignored
    private int acquireRefuseStrategy = RuleConstant.DEFAULT_BLOCK_STRATEGY;
    // TokenClient offline time, 2s; when a client goes offline, all tokens it holds are deleted --- ignored
    private long clientOfflineTime = 2000;
}
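Putting the two classes together, a cluster-mode rule pushed to the data source might look like the sketch below. The field names follow the classes above; the resource name and flowId are hypothetical example values.

```json
[{
  "resource": "GET:/api/demo",
  "grade": 1,
  "count": 50,
  "clusterMode": true,
  "clusterConfig": {
    "flowId": 10001,
    "thresholdType": 1,
    "fallbackToLocalWhenFail": true
  }
}]
```

With thresholdType=1 (global total), the 50 QPS cap applies to the whole cluster rather than to each node.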
IV. FlowSlot
We have already walked through FlowSlot's local rule checking; starting from there, let's look at how cluster flow control differs from local flow control.
// FlowRuleChecker.java
public boolean canPassCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                            boolean prioritized) {
    String limitApp = rule.getLimitApp();
    if (limitApp == null) {
        return true;
    }
    // Cluster flow control
    if (rule.isClusterMode()) {
        return passClusterCheck(rule, context, node, acquireCount, prioritized);
    }
    // Local flow control
    return passLocalCheck(rule, context, node, acquireCount, prioritized);
}
The client-side flow of cluster flow control is very simple:
- obtain the token-issuing service, a TokenService;
- call the TokenService to request a token;
- handle the result of the token request;
// FlowRuleChecker.java
private static boolean passClusterCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                        boolean prioritized) {
    try {
        // Pick the token-issuing service
        TokenService clusterService = pickClusterService();
        if (clusterService == null) {
            return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
        }
        // Request a token
        long flowId = rule.getClusterConfig().getFlowId();
        TokenResult result = clusterService.requestToken(flowId, acquireCount, prioritized);
        // Handle the result
        return applyTokenResult(result, rule, context, node, acquireCount, prioritized);
    } catch (Throwable ex) {
        RecordLog.warn("[FlowRuleChecker] Request cluster token unexpected failed", ex);
    }
    // Fall back to local flow control
    return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
}
In essence, TokenService exposes a single capability: requesting tokens.

public interface TokenService {
    // Request tokens for a flow rule
    TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized);
    // Request tokens for a hot-parameter flow rule
    TokenResult requestParamToken(Long ruleId, int acquireCount, Collection<Object> params);
}
Obtaining the TokenService
First, pickClusterService resolves the TokenService: if the current node's state is TokenClient, it returns a ClusterTokenClient instance; if the state is TokenServer, it returns an EmbeddedClusterTokenServer instance, which can only happen with an embedded TokenServer.
// FlowRuleChecker.java
private static TokenService pickClusterService() {
    if (ClusterStateManager.isClient()) {
        return TokenClientProvider.getClient();
    }
    if (ClusterStateManager.isServer()) {
        return EmbeddedClusterTokenServerProvider.getServer();
    }
    return null;
}
Requesting a Token
Second, TokenService.requestToken obtains the token.
If the current node is a TokenClient, it assembles the cluster flow rule id (FlowRule.ClusterFlowConfig.flowId), the number of tokens requested (acquireCount), and the priority flag (prioritized), and sends the request to the TokenServer.
// DefaultClusterTokenClient.java
@Override
public TokenResult requestToken(Long flowId, int acquireCount, boolean prioritized) {
    if (notValidRequest(flowId, acquireCount)) {
        return badRequest();
    }
    FlowRequestData data = new FlowRequestData().setCount(acquireCount)
        .setFlowId(flowId).setPriority(prioritized);
    ClusterRequest<FlowRequestData> request = new ClusterRequest<>(ClusterConstants.MSG_TYPE_FLOW, data);
    try {
        TokenResult result = sendTokenRequest(request);
        logForResult(result);
        return result;
    } catch (Exception ex) {
        ClusterClientStatLogUtil.log(ex.getMessage());
        return new TokenResult(TokenResultStatus.FAIL);
    }
}
If the current node is an embedded TokenServer, it is itself the TokenServer: the DefaultEmbeddedTokenServer instance delegates to DefaultTokenService to obtain the token.
Likewise, when a standalone TokenServer receives a FlowRequestData from a TokenClient, it also goes through DefaultTokenService.
public class DefaultEmbeddedTokenServer implements EmbeddedClusterTokenServer {
    private final TokenService tokenService = TokenServiceProvider.getService();

    @Override
    public TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized) {
        if (tokenService != null) {
            return tokenService.requestToken(ruleId, acquireCount, prioritized);
        }
        return new TokenResult(TokenResultStatus.FAIL);
    }
}
DefaultTokenService.requestToken is the core of cluster flow control.
If the request parameters are invalid, it returns BAD_REQUEST;
it then looks up the flow rule by the cluster flow rule id, returning NO_RULE_EXISTS if none is found.
@Spi(isDefault = true)
public class DefaultTokenService implements TokenService {
    @Override
    public TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized) {
        if (notValidRequest(ruleId, acquireCount)) {
            return badRequest(); // BAD_REQUEST
        }
        // The rule should be valid.
        FlowRule rule = ClusterFlowRuleManager.getFlowRuleById(ruleId);
        if (rule == null) {
            return new TokenResult(TokenResultStatus.NO_RULE_EXISTS);
        }
        return ClusterFlowChecker.acquireClusterToken(rule, acquireCount, prioritized);
    }
}
When the rule exists, control moves into ClusterFlowChecker's acquireClusterToken method.
// ClusterFlowChecker.java
static TokenResult acquireClusterToken(FlowRule rule, int acquireCount, boolean prioritized) {
    Long id = rule.getClusterConfig().getFlowId();
    // A global limiter caps the QPS of token requests per namespace, 30,000 QPS by default
    if (!allowProceed(id)) {
        return new TokenResult(TokenResultStatus.TOO_MANY_REQUEST);
    }
    // Cluster traffic metrics
    ClusterMetric metric = ClusterMetricStatistics.getMetric(id);
    if (metric == null) {
        return new TokenResult(TokenResultStatus.FAIL);
    }
    // Current QPS
    double latestQps = metric.getAvg(ClusterFlowEvent.PASS);
    // Threshold = configured threshold * factor
    double globalThreshold = calcGlobalThreshold(rule) * ClusterServerConfigManager.getExceedCount();
    // Remaining tokens = threshold - QPS - requested token count
    double nextRemaining = globalThreshold - latestQps - acquireCount;
    if (nextRemaining >= 0) {
        // Record the passed traffic and return OK
        // TODO: checking logic and metric operation should be separated.
        metric.add(ClusterFlowEvent.PASS, acquireCount);
        metric.add(ClusterFlowEvent.PASS_REQUEST, 1);
        if (prioritized) {
            metric.add(ClusterFlowEvent.OCCUPIED_PASS, acquireCount);
        }
        return new TokenResult(TokenResultStatus.OK)
            .setRemaining((int) nextRemaining)
            .setWaitInMs(0);
    } else {
        // If prioritized=true, try to borrow tokens from the next window and return SHOULD_WAIT plus the wait time waitInMs
        if (prioritized) {
            double occupyAvg = metric.getAvg(ClusterFlowEvent.WAITING);
            if (occupyAvg <= ClusterServerConfigManager.getMaxOccupyRatio() * globalThreshold) {
                int waitInMs = metric.tryOccupyNext(ClusterFlowEvent.PASS, acquireCount, globalThreshold);
                if (waitInMs > 0) {
                    ClusterServerStatLogUtil.log("flow|waiting|" + id);
                    return new TokenResult(TokenResultStatus.SHOULD_WAIT)
                        .setRemaining(0)
                        .setWaitInMs(waitInMs);
                }
            }
        }
        // Otherwise record the blocked traffic and return BLOCKED
        metric.add(ClusterFlowEvent.BLOCK, acquireCount);
        metric.add(ClusterFlowEvent.BLOCK_REQUEST, 1);
        ClusterServerStatLogUtil.log("flow|block|" + id, acquireCount);
        ClusterServerStatLogUtil.log("flow|block_request|" + id, 1);
        if (prioritized) {
            metric.add(ClusterFlowEvent.OCCUPIED_BLOCK, acquireCount);
            ClusterServerStatLogUtil.log("flow|occupied_block|" + id, 1);
        }
        return blockedResult();
    }
}
First, the TokenServer applies a global limit to the token-request interface itself; if that limit is hit, TOO_MANY_REQUEST is returned.
The namespace the cluster flow rule belongs to is looked up, and each namespace has a RequestLimiter that caps the rate at which that namespace may call the TokenServer. The default namespace is "default" and the default limit is 30,000 QPS.
// ClusterFlowChecker.java
static boolean allowProceed(long flowId) {
    String namespace = ClusterFlowRuleManager.getNamespace(flowId);
    return GlobalRequestLimiter.tryPass(namespace);
}

// GlobalRequestLimiter.java
public static boolean tryPass(String namespace) {
    if (namespace == null) {
        return false;
    }
    RequestLimiter limiter = GLOBAL_QPS_LIMITER_MAP.get(namespace);
    if (limiter == null) {
        return true;
    }
    return limiter.tryPass();
}
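To make the per-namespace gate concrete, here is a minimal standalone sketch of the same idea: one limiter per namespace, pass when no limiter is registered, reject null namespaces. This is a hypothetical fixed-window approximation for illustration only; Sentinel's real RequestLimiter uses a sliding window.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of GlobalRequestLimiter-style per-namespace QPS gating.
public class NamespaceLimiterSketch {
    static class Limiter {
        final int maxQps;
        long windowStart;
        int count;
        Limiter(int maxQps) { this.maxQps = maxQps; }
        synchronized boolean tryPass(long nowMs) {
            // Reset the counter when a new 1-second window begins
            if (nowMs - windowStart >= 1000) { windowStart = nowMs; count = 0; }
            if (count >= maxQps) return false;
            count++;
            return true;
        }
    }

    final Map<String, Limiter> limiters = new ConcurrentHashMap<>();

    boolean tryPass(String namespace, long nowMs) {
        if (namespace == null) return false;       // unknown rule → reject
        Limiter l = limiters.get(namespace);
        return l == null || l.tryPass(nowMs);      // no limiter registered → allow
    }

    public static void main(String[] args) {
        NamespaceLimiterSketch sketch = new NamespaceLimiterSketch();
        sketch.limiters.put("default", new Limiter(2));
        System.out.println(sketch.tryPass("default", 0));    // true
        System.out.println(sketch.tryPass("default", 1));    // true
        System.out.println(sketch.tryPass("default", 2));    // false: window budget spent
        System.out.println(sketch.tryPass("default", 1000)); // true: new window
    }
}
```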
Second, the remaining token count is computed as threshold - QPS - requested token count. If the threshold type is global total, the configured threshold is used directly; if it is average-per-node, the threshold is the configured value multiplied by the number of client connections currently using this cluster flow rule.
// ClusterFlowChecker.java
private static double calcGlobalThreshold(FlowRule rule) {
    double count = rule.getCount();
    switch (rule.getClusterConfig().getThresholdType()) {
        case ClusterRuleConstant.FLOW_THRESHOLD_GLOBAL:
            return count;
        case ClusterRuleConstant.FLOW_THRESHOLD_AVG_LOCAL:
        default:
            int connectedCount = ClusterFlowRuleManager.getConnectedCount(rule.getClusterConfig().getFlowId());
            return count * connectedCount;
    }
}
Third, if the remaining token count is greater than or equal to 0, the token is granted: the passed traffic is recorded and OK is returned.
Fourth, if the remaining token count is negative, the token was not granted, and the handling resembles local flow control.
If prioritized=true, the server tries to account this request against the next time window and returns SHOULD_WAIT together with the wait time; by default, this requires that the current waiting count (ClusterFlowEvent.WAITING) does not exceed the threshold.
If prioritized=false, or prioritized=true but the occupy attempt fails, BLOCKED is returned to the client.
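The heart of the check above is just two small computations: the effective threshold and the remaining budget. A self-contained sketch, with the class and method names invented for illustration (Sentinel's real code also applies the exceed-count factor and sliding-window metrics):

```java
// Hypothetical sketch of the cluster token check arithmetic.
public class ClusterTokenCheckSketch {
    static final int THRESHOLD_AVG_LOCAL = 0; // average-per-node mode
    static final int THRESHOLD_GLOBAL = 1;    // global-total mode

    // AVG_LOCAL: per-node threshold * number of connected clients; GLOBAL: use as-is
    static double calcGlobalThreshold(double count, int thresholdType, int connectedCount) {
        return thresholdType == THRESHOLD_GLOBAL ? count : count * connectedCount;
    }

    // Grant iff remaining = threshold - current pass QPS - requested tokens >= 0
    static boolean tryAcquire(double globalThreshold, double latestQps, int acquireCount) {
        return globalThreshold - latestQps - acquireCount >= 0;
    }

    public static void main(String[] args) {
        // Per-node threshold 10 with 3 connected clients → effective cluster threshold 30
        double threshold = calcGlobalThreshold(10, THRESHOLD_AVG_LOCAL, 3);
        System.out.println(threshold);                      // 30.0
        System.out.println(tryAcquire(threshold, 29, 1));   // true: 30 - 29 - 1 = 0
        System.out.println(tryAcquire(threshold, 30, 1));   // false: budget exhausted
    }
}
```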
Handling the TokenResult
The TokenResult returned by the TokenServer (or by the node itself, in the embedded case) is handled differently depending on its status code:
- OK: return true;
- SHOULD_WAIT: prioritized=true and the token was granted, but the thread must sleep until the next time window; return true;
- NO_RULE_EXISTS, BAD_REQUEST, FAIL, TOO_MANY_REQUEST: in these cases the cluster rule may fall back to the local flow rule (when fallbackToLocalWhenFail=true);
- BLOCKED: the resource is blocked by cluster flow control; return false;
// FlowRuleChecker.java
private static boolean applyTokenResult(TokenResult result, FlowRule rule, Context context,
                                        DefaultNode node,
                                        int acquireCount, boolean prioritized) {
    switch (result.getStatus()) {
        case TokenResultStatus.OK: // cluster check passed
            return true;
        case TokenResultStatus.SHOULD_WAIT: // prioritized=true: sleep waitInMs ms to occupy the next time window
            // Wait for next tick.
            try {
                Thread.sleep(result.getWaitInMs());
            } catch (InterruptedException e) {
                e.printStackTrace();
            }
            return true;
        case TokenResultStatus.NO_RULE_EXISTS: // cluster flow rule not found
        case TokenResultStatus.BAD_REQUEST: // invalid request parameters
        case TokenResultStatus.FAIL: // cluster metric not found
        case TokenResultStatus.TOO_MANY_REQUEST: // TokenServer itself is rate-limited
            return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
        case TokenResultStatus.BLOCKED: // blocked by cluster flow control
        default:
            return false;
    }
}
V. Networking
That covers the whole cluster flow control logic, which is quite simple.
Next, let's look at the underlying network communication between TokenClient and TokenServer.
1. TokenServer
ClusterTokenServer is the TokenServer abstraction, exposing two methods: start and stop.
public interface ClusterTokenServer {
    void start() throws Exception;
    void stop() throws Exception;
}
NettyTransportServer is the class that actually implements the network layer.
public class NettyTransportServer implements ClusterTokenServer {
    // Listening port
    private final int port;
    // Netty acceptor event loop
    private NioEventLoopGroup bossGroup;
    // Netty IO + business thread pool
    private NioEventLoopGroup workerGroup;
    // Client connection pool
    private final ConnectionPool connectionPool = new ConnectionPool();
    // Server state
    private final AtomicInteger currentState = new AtomicInteger(SERVER_STATUS_OFF);
    // Number of failed start attempts
    private final AtomicInteger failedTimes = new AtomicInteger(0);

    public NettyTransportServer(int port) {
        this.port = port;
    }

    @Override
    public void start() {
        if (!currentState.compareAndSet(SERVER_STATUS_OFF, SERVER_STATUS_STARTING)) {
            return;
        }
        ServerBootstrap b = new ServerBootstrap();
        this.bossGroup = new NioEventLoopGroup(1);
        // 2 * number of cores
        this.workerGroup = new NioEventLoopGroup(DEFAULT_EVENT_LOOP_THREADS);
        b.group(bossGroup, workerGroup)
            .channel(NioServerSocketChannel.class)
            .option(ChannelOption.SO_BACKLOG, 128)
            .handler(new LoggingHandler(LogLevel.INFO))
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) throws Exception {
                    ChannelPipeline p = ch.pipeline();
                    // Length-based frame decoding
                    p.addLast(new LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2));
                    // Business decoding
                    p.addLast(new NettyRequestDecoder());
                    // Length-field prepending
                    p.addLast(new LengthFieldPrepender(2));
                    // Business encoding
                    p.addLast(new NettyResponseEncoder());
                    // Business handler
                    p.addLast(new TokenServerHandler(connectionPool));
                }
            })
            .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
            .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)
            .childOption(ChannelOption.CONNECT_TIMEOUT_MILLIS, 10000)
            .childOption(ChannelOption.SO_TIMEOUT, 10)
            .childOption(ChannelOption.TCP_NODELAY, true)
            .childOption(ChannelOption.SO_RCVBUF, 32 * 1024);
        b.bind(port).addListener(new GenericFutureListener<ChannelFuture>() {
            @Override
            public void operationComplete(ChannelFuture future) {
                if (future.cause() != null) {
                    // Retry on startup failure
                    RecordLog.info("[NettyTransportServer] Token server start failed (port=" + port + "), failedTimes: " + failedTimes.get(),
                        future.cause());
                    currentState.compareAndSet(SERVER_STATUS_STARTING, SERVER_STATUS_OFF);
                    int failCount = failedTimes.incrementAndGet();
                    if (failCount > MAX_RETRY_TIMES) { // retry at most 3 times
                        return;
                    }
                    try {
                        Thread.sleep(failCount * RETRY_SLEEP_MS); // back off: failCount * RETRY_SLEEP_MS
                        start();
                    } catch (Throwable e) {
                        RecordLog.info("[NettyTransportServer] Failed to start token server when retrying", e);
                    }
                } else {
                    currentState.compareAndSet(SERVER_STATUS_STARTING, SERVER_STATUS_STARTED);
                }
            }
        });
    }
A few points worth noting:
- Threading model: the TokenServer uses Netty as its transport framework; the boss group has 1 thread that accepts client connections, and the worker group has 2 * cores threads handling both IO and all business processing;
- Encoding/decoding: the TokenServer uses length-field framing (LengthFieldBasedFrameDecoder, LengthFieldPrepender) to solve TCP sticky/split packet issues, and adds its own business codecs (NettyRequestDecoder, NettyResponseEncoder) on top;
- Startup retry: the TokenServer retries on startup failure, at most 3 times, with an increasing back-off (failCount * RETRY_SLEEP_MS) between attempts;
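The framing scheme in the pipeline above is easy to reproduce without Netty: LengthFieldPrepender(2) writes a 2-byte big-endian length before each payload, and LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2) reads that length and strips it. A minimal sketch of the byte layout (class and method names are hypothetical, for illustration only):

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical sketch of the 2-byte length-field framing used by the token transport.
public class LengthFieldFramingSketch {
    // Prepend a 2-byte big-endian length (what LengthFieldPrepender(2) does)
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(2 + payload.length);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Read the length, then extract exactly that many payload bytes
    // (what LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2) yields per frame)
    static byte[] decode(byte[] frame) {
        ByteBuffer buf = ByteBuffer.wrap(frame);
        int len = buf.getShort() & 0xFFFF;
        byte[] payload = new byte[len];
        buf.get(payload);
        return payload;
    }

    public static void main(String[] args) {
        byte[] frame = encode("token".getBytes(StandardCharsets.UTF_8));
        System.out.println(frame.length);  // 7: 2-byte header + 5-byte payload
        System.out.println(new String(decode(frame), StandardCharsets.UTF_8));  // token
    }
}
```

Because every frame carries its own length, the receiver can split a TCP byte stream back into whole messages regardless of how the bytes were batched on the wire.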
For the embedded mode, the implementation class is DefaultEmbeddedTokenServer, which delegates start and stop to SentinelDefaultTokenServer. When ClusterStateManager switches the current node's state to TokenServer, DefaultEmbeddedTokenServer is started.
public class DefaultEmbeddedTokenServer implements EmbeddedClusterTokenServer {
    private final ClusterTokenServer server = new SentinelDefaultTokenServer(true);

    @Override
    public void start() throws Exception {
        server.start();
    }

    @Override
    public void stop() throws Exception {
        server.stop();
    }
}
For the standalone mode, as in the official ClusterServerDemo, SentinelDefaultTokenServer is started directly.
public class ClusterServerDemo {
    public static void main(String[] args) throws Exception {
        // Create the TokenServer
        ClusterTokenServer tokenServer = new SentinelDefaultTokenServer();
        // ...
        tokenServer.start();
    }
}
SentinelDefaultTokenServer in turn delegates start and stop to NettyTransportServer.
public class SentinelDefaultTokenServer implements ClusterTokenServer {
    // Whether running embedded
    private final boolean embedded;
    // The NettyTransportServer instance
    private ClusterTokenServer server;

    public SentinelDefaultTokenServer() {
        this(false);
    }

    public SentinelDefaultTokenServer(boolean embedded) {
        this.embedded = embedded;
        ClusterServerConfigManager.addTransportConfigChangeObserver(new ServerTransportConfigObserver() {
            @Override
            public void onTransportConfigChange(ServerTransportConfig config) {
                changeServerConfig(config);
            }
        });
        initNewServer();
    }

    private void initNewServer() {
        if (server != null) {
            return;
        }
        int port = ClusterServerConfigManager.getPort();
        if (port > 0) {
            this.server = new NettyTransportServer(port);
            this.port = port;
        }
    }
}
2. TokenClient
When DefaultClusterTokenClient is constructed, it creates a ClusterTransportClient responsible for the TokenClient's network communication.
public class DefaultClusterTokenClient implements ClusterTokenClient {
    private ClusterTransportClient transportClient;
    private TokenServerDescriptor serverDescriptor;
    private final AtomicBoolean shouldStart = new AtomicBoolean(false);

    public DefaultClusterTokenClient() {
        ClusterClientConfigManager.addServerChangeObserver(new ServerChangeObserver() {
            @Override
            public void onRemoteServerChange(ClusterClientAssignConfig assignConfig) {
                changeServer(assignConfig);
            }
        });
        initNewConnection();
    }

    private void initNewConnection() {
        if (transportClient != null) {
            return;
        }
        String host = ClusterClientConfigManager.getServerHost();
        int port = ClusterClientConfigManager.getServerPort();
        if (StringUtil.isBlank(host) || port <= 0) {
            return;
        }
        try {
            this.transportClient = new NettyTransportClient(host, port);
            this.serverDescriptor = new TokenServerDescriptor(host, port);
        } catch (Exception ex) {
            RecordLog.warn("[DefaultClusterTokenClient] Failed to initialize new token client", ex);
        }
    }
}
When ClusterStateManager switches the current node's state to TokenClient, DefaultClusterTokenClient is started.
// ClusterStateManager.java
public static boolean setToClient() {
    if (mode == CLUSTER_CLIENT) {
        return true;
    }
    mode = CLUSTER_CLIENT;
    sleepIfNeeded();
    lastModified = TimeUtil.currentTimeMillis();
    return startClient();
}

private static boolean startClient() {
    try {
        // If this node was previously an embedded TokenServer, stop it first
        EmbeddedClusterTokenServer server = EmbeddedClusterTokenServerProvider.getServer();
        if (server != null) {
            server.stop();
        }
        // Start the TokenClient
        ClusterTokenClient tokenClient = TokenClientProvider.getClient();
        if (tokenClient != null) {
            tokenClient.start();
            return true;
        } else {
            return false;
        }
    } catch (Exception ex) {
        return false;
    }
}

// DefaultClusterTokenClient.java
public void start() throws Exception {
    if (shouldStart.compareAndSet(false, true)) {
        startClientIfScheduled();
    }
}

private void startClientIfScheduled() throws Exception {
    if (shouldStart.get()) {
        if (transportClient != null) {
            transportClient.start();
        } else {
            // ...
        }
    }
}
Under the hood, DefaultClusterTokenClient starts a NettyTransportClient.
NettyTransportClient creates a Bootstrap in initClientBootstrap, and connect establishes the connection to the TokenServer. Since there is only one connection, a single event-loop thread ends up handling the client's IO and business processing.
// NettyTransportClient.java
private Bootstrap initClientBootstrap() {
    Bootstrap b = new Bootstrap();
    // A single event-loop thread services the one connection
    eventLoopGroup = new NioEventLoopGroup();
    b.group(eventLoopGroup)
        .channel(NioSocketChannel.class)
        .option(ChannelOption.TCP_NODELAY, true)
        .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, ClusterClientConfigManager.getConnectTimeout())
        .handler(new ChannelInitializer<SocketChannel>() {
            @Override
            public void initChannel(SocketChannel ch) throws Exception {
                clientHandler = new TokenClientHandler(currentState, disconnectCallback);
                ChannelPipeline pipeline = ch.pipeline();
                pipeline.addLast(new LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2));
                pipeline.addLast(new NettyResponseDecoder());
                pipeline.addLast(new LengthFieldPrepender(2));
                pipeline.addLast(new NettyRequestEncoder());
                pipeline.addLast(clientHandler);
            }
        });
    return b;
}

private void connect(Bootstrap b) {
    if (currentState.compareAndSet(ClientConstants.CLIENT_STATUS_OFF, ClientConstants.CLIENT_STATUS_PENDING)) {
        b.connect(host, port)
            .addListener(new GenericFutureListener<ChannelFuture>() {
                @Override
                public void operationComplete(ChannelFuture future) {
                    if (future.cause() != null) {
                        failConnectedTime.incrementAndGet();
                        channel = null;
                    } else {
                        failConnectedTime.set(0);
                        channel = future.channel();
                    }
                }
            });
    }
}
Summary
Cluster flow control exists to:
- have a single node aggregate the cluster's traffic statistics and issue tokens to cluster nodes based on the overall picture;
- solve the problem of uneven traffic distribution across nodes of the same application making local flow control inaccurate;
There are two roles in cluster flow control:
- Token Client: the cluster flow control client, which requests tokens from its Token Server; the server's response decides whether to throttle;
- Token Server: the cluster flow control server, which handles Token Client requests and decides, based on the configured cluster rules, whether to issue a token (i.e., whether the request may pass);
A TokenServer can be started in two ways:
- Embedded: the TokenServer runs in the same process as the business application. Every node of the business cluster is a peer: each may become the TokenServer or a TokenClient, with the actual role decided by an external data source (such as Nacos);
- Standalone: a separate process, independent of the business application;
Cluster flow control configuration:
- The cluster config lives inside FlowRule and takes effect when clusterMode=true;
- The important properties of ClusterFlowConfig are:
- flowId: the globally unique id of the cluster flow rule;
- thresholdType: the cluster threshold mode, 0 for average-per-node and 1 for global total. In average-per-node mode the rule check multiplies the FlowRule threshold by the number of client connections using the rule to get the effective threshold; in global mode the FlowRule threshold is used directly;
- fallbackToLocalWhenFail: whether to fall back to the local flow rule when the TokenServer returns a TokenResult with status NO_RULE_EXISTS, BAD_REQUEST, FAIL, or TOO_MANY_REQUEST;
FlowSlot checks a cluster flow rule in three steps:
- Obtain the TokenService: if the current node is an embedded TokenServer, the local EmbeddedClusterTokenServer is used directly, with no remote call; if the current node is a TokenClient, a ClusterTokenClient instance is used and the token is requested from the TokenServer remotely;
- Request the token: DefaultTokenService.requestToken is the core method for obtaining tokens;
- Handle the TokenResult: the client applies the flow control outcome according to the status code:
- OK: return true;
- SHOULD_WAIT: prioritized=true and the token was granted, but the thread sleeps until the next time window; return true;
- NO_RULE_EXISTS, BAD_REQUEST, FAIL, TOO_MANY_REQUEST: the cluster rule may fall back to the local rule (when fallbackToLocalWhenFail=true);
- BLOCKED: the resource is blocked by cluster flow control; return false;
Networking:
TokenServer and TokenClient use Netty as the underlying transport framework, with a long-lived connection between client and server.
Threading model: the TokenServer runs 1 boss thread that accepts client connections and 2 * cores worker threads for IO and business processing; the TokenClient handles everything on a single thread.
Codec: length-field framing solves TCP sticky/split packet issues.