Sentinel Source Code (6): Cluster Flow Control


I. Background

1. Use Cases

Quoting the official introduction to the use cases for cluster flow control:

Why use cluster flow control? Suppose we want to cap the total QPS of a certain API at 50 for a given user, but there may be many machines (say 100). The natural idea is to have one dedicated server tally the total call volume, with all other instances consulting that server to decide whether a call may proceed. This is the most basic form of cluster flow control.

Cluster flow control also solves the problem of uneven traffic distribution making aggregate limiting inaccurate. Suppose a cluster has 10 machines and each is given a per-machine limit of 10 QPS; ideally the cluster-wide limit would be 100 QPS. In practice, traffic is not evenly distributed across machines, so some machines start rejecting requests before the total is reached. Per-machine limits alone therefore cannot precisely cap overall traffic. Cluster flow control can precisely control the total call volume of the whole cluster and, combined with per-machine limiting as a safety net, yields better flow control.

There are two roles in cluster flow control:

  • Token Client: the cluster flow control client. It requests tokens from its Token Server; the server's response decides whether the request is rate-limited.
  • Token Server: the cluster flow control server. It handles requests from Token Clients and decides, based on the configured cluster rules, whether to issue a token (i.e. whether to let the request pass).

2. Usage

See github.com/alibaba/Sen…

3. About the Dashboard

Note that, per the official docs, managing cluster flow control from the dashboard requires customizing (extending) the dashboard.

(Figure: cluster flow control overview, from the official wiki)

II. TokenServer Startup Modes

There are two ways to run a TokenServer: embedded or standalone.

1. Embedded TokenServer

(Figure: embedded TokenServer)

The key point of the embedded mode is that every node is an ordinary, peer application instance: it may currently be the TokenServer, and may later become a TokenClient.

The official demo of an embedded TokenServer uses Sentinel's InitFunc extension point: when Sentinel's Env class is loaded and initialized, the init method of every InitFunc is executed.

public class DemoClusterInitFunc implements InitFunc {

    @Override
    public void init() throws Exception {
        // 1. Load FlowRule flow rules into FlowRuleManager
        initDynamicRuleProperty();

        // 2. [ClusterClientConfig] Load common TokenClient config, e.g. the TokenServer request timeout
        initClientConfigProperty();
        // 3. [ClusterGroupEntity] Load the client/server assignment config, i.e. which machines are TokenClients and which machine is the TokenServer
        initClientServerAssignProperty();

        // 4. Load cluster flow rules into ClusterFlowRuleManager; these are derived rules specific to cluster flow control
        registerClusterRuleSupplier();
        // 5. [ServerTransportConfig] Load the TokenServer transport config, e.g. the port
        initServerTransportConfigProperty();

        // 6. [ClusterGroupEntity] Load the current node's state (TokenClient, TokenServer, or not started)
        initStateProperty();
    }
}

For an embedded TokenServer to work, the following configs must be loaded (the demo uses Nacos as the dynamic data source for all of them):

  1. Ordinary flow rule config: flow rules and parameter (hot-spot) flow rules, where e.g. FlowRule's clusterMode property is true;

    // DemoClusterInitFunc
    private void initDynamicRuleProperty() {
     ReadableDataSource<String, List<FlowRule>> ruleSource = new NacosDataSource<>(remoteAddress, groupId, flowDataId, source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
     FlowRuleManager.register2Property(ruleSource.getProperty());
    }
    public class FlowRule extends AbstractRule {
       // whether cluster flow control is enabled
       private boolean clusterMode;
       // cluster flow control config
       private ClusterFlowConfig clusterConfig;
    }
    
  2. TokenClient-side config: because all nodes of an embedded TokenServer deployment are peers, any of them may become a TokenClient;

    1) initClientConfigProperty loads the client's common config; currently ClusterClientConfig only contains the TokenServer request timeout;

    // DemoClusterInitFunc
    private void initClientConfigProperty() {
           ReadableDataSource<String, ClusterClientConfig> clientConfigDs = new NacosDataSource<>(remoteAddress, groupId,
               configDataId, source -> JSON.parseObject(source, new TypeReference<ClusterClientConfig>() {}));
           ClusterClientConfigManager.registerClientConfigProperty(clientConfigDs.getProperty());
       }
    
    public class ClusterClientConfig {
       private Integer requestTimeout;
    }
    

    2) initClientServerAssignProperty loads the client/server assignment config, i.e. which machines are TokenClients and which machine is the TokenServer. Currently it only stores the ip and port of the TokenServer of the cluster this node belongs to;

    // DemoClusterInitFunc
    private void initClientServerAssignProperty() {
       ReadableDataSource<String, ClusterClientAssignConfig> clientAssignDs = new NacosDataSource<>(remoteAddress, groupId,
           clusterMapDataId, source -> {
           List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
           return Optional.ofNullable(groupList)
               .flatMap(this::extractClientAssignment)
               .orElse(null);
       });
       ClusterClientConfigManager.registerServerAssignProperty(clientAssignDs.getProperty());
    }
    public class ClusterClientAssignConfig {
       // TokenServer ip
       private String serverHost;
       // TokenServer port
       private Integer serverPort;
    }
    
  3. TokenServer config

    1) Transport config; currently this is just the TokenServer's listening port;

    // DemoClusterInitFunc
    private void initServerTransportConfigProperty() {
       ReadableDataSource<String, ServerTransportConfig> serverTransportDs = new NacosDataSource<>(remoteAddress, groupId,
           clusterMapDataId, source -> {
           List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
           return Optional.ofNullable(groupList)
               .flatMap(this::extractServerTransportConfig)
               .orElse(null);
       });
       ClusterServerConfigManager.registerServerTransportProperty(serverTransportDs.getProperty());
    }
    
    public class ServerTransportConfig {
       // TokenServer port
       private int port;
    }
    

    2) Cluster flow rule config: to use cluster flow control, the TokenServer must load the cluster flow rules. The data source holds ordinary FlowRules, but ClusterFlowRuleManager converts them internally; we will look at that later;

    // DemoClusterInitFunc
    private void registerClusterRuleSupplier() {
       ClusterFlowRuleManager.setPropertySupplier(namespace -> {
           ReadableDataSource<String, List<FlowRule>> ds = new NacosDataSource<>(remoteAddress, groupId,
               namespace + DemoConstants.FLOW_POSTFIX, source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
           return ds.getProperty();
       });
    }
    
  4. Current node state config: sets whether the current node is a TokenServer or a TokenClient. The demo logic that reduces the Nacos config model to an Integer is incidental, since the data source could be something else, such as the Sentinel dashboard;

    // DemoClusterInitFunc
    private void initStateProperty() {
       ReadableDataSource<String, Integer> clusterModeDs = new NacosDataSource<>(remoteAddress, groupId,
           clusterMapDataId, source -> {
           List<ClusterGroupEntity> groupList = JSON.parseObject(source, new TypeReference<List<ClusterGroupEntity>>() {});
           return Optional.ofNullable(groupList)
               .map(this::extractMode)
               .orElse(ClusterStateManager.CLUSTER_NOT_STARTED);
       });
       ClusterStateManager.registerProperty(clusterModeDs.getProperty());
    }
    private int extractMode(List<ClusterGroupEntity> groupList) {
       // 1. Check whether the current node is the TokenServer of some cluster
       if (groupList.stream().anyMatch(this::machineEqual)) {
         return ClusterStateManager.CLUSTER_SERVER;
       }
       // 2. Check whether the current node is a member of some cluster; if so it becomes a TokenClient, otherwise it is marked as not started
       boolean canBeClient = groupList.stream()
         .flatMap(e -> e.getClientSet().stream())
         .filter(Objects::nonNull)
         .anyMatch(e -> e.equals(getCurrentMachineId()));
       return canBeClient ? ClusterStateManager.CLUSTER_CLIENT : ClusterStateManager.CLUSTER_NOT_STARTED;
    }
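For reference, the cluster assignment config read by initClientServerAssignProperty, initServerTransportConfigProperty, and initStateProperty (clusterMapDataId) is a JSON list of ClusterGroupEntity. A hypothetical Nacos value might look like the following; the field names follow the demo's ClusterGroupEntity, while the addresses and the host@port machineId format are illustrative assumptions:

```json
[
  {
    "machineId": "192.168.0.10@8720",
    "ip": "192.168.0.10",
    "port": 11111,
    "clientSet": ["192.168.0.11@8720", "192.168.0.12@8720"]
  }
]
```

The node whose identity matches machineId would become the TokenServer listening on port; nodes listed in clientSet would become TokenClients connecting to ip:port.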
    

2. Standalone TokenServer

(Figure: standalone TokenServer)

A standalone TokenServer runs as a separate process whose sole job is issuing tokens to the application instances.

Below is the official demo:

public class ClusterServerDemo {

    public static void main(String[] args) throws Exception {
        // Create the TokenServer
        ClusterTokenServer tokenServer = new SentinelDefaultTokenServer();

        // Load the TokenServer transport config
        ClusterServerConfigManager.loadGlobalTransportConfig(new ServerTransportConfig()
            .setIdleSeconds(600)
            .setPort(11111));
        // Load the namespaces --- used to distinguish clusters
        ClusterServerConfigManager.loadServerNamespaceSet(Collections.singleton(DemoConstants.APP_NAME));

        // Start the TokenServer
        tokenServer.start();
    }
}

public class DemoClusterServerInitFunc implements InitFunc {

    private final String remoteAddress = "localhost:8848";
    private final String groupId = "SENTINEL_GROUP";
    private final String namespaceSetDataId = "cluster-server-namespace-set";
    private final String serverTransportDataId = "cluster-server-transport-config";

    @Override
    public void init() throws Exception {
        // Dynamic cluster flow rule config
        ClusterFlowRuleManager.setPropertySupplier(namespace -> {
            ReadableDataSource<String, List<FlowRule>> ds = new NacosDataSource<>(remoteAddress, groupId,
                namespace + DemoConstants.FLOW_POSTFIX,
                source -> JSON.parseObject(source, new TypeReference<List<FlowRule>>() {}));
            return ds.getProperty();
        });

        // Dynamic namespace config
        ReadableDataSource<String, Set<String>> namespaceDs = new NacosDataSource<>(remoteAddress, groupId,
            namespaceSetDataId, source -> JSON.parseObject(source, new TypeReference<Set<String>>() {}));
        ClusterServerConfigManager.registerNamespaceSetProperty(namespaceDs.getProperty());
        // Dynamic transport config
        ReadableDataSource<String, ServerTransportConfig> transportConfigDs = new NacosDataSource<>(remoteAddress,
            groupId, serverTransportDataId,
            source -> JSON.parseObject(source, new TypeReference<ServerTransportConfig>() {}));
        ClusterServerConfigManager.registerServerTransportProperty(transportConfigDs.getProperty());
    }
}

A standalone TokenServer needs to load the following configs:

  1. Cluster flow rules: same as the embedded TokenServer; usually converted from ordinary per-machine flow rules;
  2. Namespace config: a standalone TokenServer can issue tokens for multiple clusters, distinguished by namespace;
  3. Transport config: same as the embedded TokenServer;

III. ClusterFlowConfig

(Figure: cluster flow rule)

To configure a cluster flow rule, set clusterMode=true on a FlowRule to enable cluster flow control; many properties of local flow control then have no effect.

public class FlowRule extends AbstractRule {
    // Threshold type: 0-thread count, 1-QPS (default) --- ineffective; only QPS is supported
    private int grade = RuleConstant.FLOW_GRADE_QPS;
    // Threshold
    private double count;
    // Flow control mode: 0-direct, 1-related, 2-chain --- ineffective
    private int strategy = RuleConstant.STRATEGY_DIRECT;
    // Referenced resource --- ineffective
    private String refResource;
    // Control behavior: 0-fail fast, 1-warm up, 2-queueing --- ineffective
    private int controlBehavior = RuleConstant.CONTROL_BEHAVIOR_DEFAULT;
    // Warm-up period (s) --- ineffective
    private int warmUpPeriodSec = 10;
    // Max queueing time (ms) --- ineffective
    private int maxQueueingTimeMs = 500;
    // Whether cluster flow control is enabled
    private boolean clusterMode;
    // Cluster flow control config
    private ClusterFlowConfig clusterConfig;
    // Traffic shaping controller --- ineffective
    private TrafficShapingController controller;
}

ClusterFlowConfig holds the cluster-specific rule settings; only some of its fields take effect.

public class ClusterFlowConfig {
    // Globally unique id: within a cluster, each cluster flow rule maps to one id
    private Long flowId;
    // Cluster threshold mode: 0-per-machine average, 1-global total
    private int thresholdType = ClusterRuleConstant.FLOW_THRESHOLD_AVG_LOCAL;
    // Whether to fall back to local flow control when the TokenServer is unavailable; default true
    private boolean fallbackToLocalWhenFail = true;
    // Strategy; only the default exists --- ineffective
    private int strategy = ClusterRuleConstant.FLOW_CLUSTER_STRATEGY_NORMAL;
    // Sample count, default 10
    private int sampleCount = ClusterRuleConstant.DEFAULT_CLUSTER_SAMPLE_COUNT;
    // Window size, 1s
    private int windowIntervalMs = RuleConstant.DEFAULT_WINDOW_INTERVAL_MS;
    // Token hold timeout --- ineffective
    private long resourceTimeout = 2000;
    // Strategy when a held token times out: 0-ignore, 1-release; default 0 --- ineffective
    private int resourceTimeoutStrategy = RuleConstant.DEFAULT_RESOURCE_TIMEOUT_STRATEGY;
    // Strategy applied when a prioritized=true request is refused --- ineffective
    private int acquireRefuseStrategy = RuleConstant.DEFAULT_BLOCK_STRATEGY;
    // TokenClient offline time, 2s; when a client goes offline, all tokens it holds are deleted --- ineffective
    private long clientOfflineTime = 2000;
}
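To make the shape of such a rule concrete, here is a hypothetical JSON value for the Nacos data source parsed by registerClusterRuleSupplier (a List<FlowRule>). The property names follow the bean fields shown above under standard JSON serialization; the resource name and flowId are made up for illustration:

```json
[
  {
    "resource": "demo-api",
    "grade": 1,
    "count": 100,
    "clusterMode": true,
    "clusterConfig": {
      "flowId": 1001,
      "thresholdType": 1,
      "fallbackToLocalWhenFail": true
    }
  }
]
```

With thresholdType=1 (global total), the whole cluster shares 100 QPS; with thresholdType=0 (per-machine average), the 100 would be multiplied by the number of connected clients.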

IV. FlowSlot

We have already covered FlowSlot's local rule checking; starting from there, let's look at how cluster flow control differs from local flow control.

// FlowRuleChecker.java
public boolean canPassCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                                boolean prioritized) {
    String limitApp = rule.getLimitApp();
    if (limitApp == null) {
        return true;
    }
    // Cluster flow control
    if (rule.isClusterMode()) {
        return passClusterCheck(rule, context, node, acquireCount, prioritized);
    }
    // Local flow control
    return passLocalCheck(rule, context, node, acquireCount, prioritized);
}

The client-side flow of cluster flow control is very simple:

  1. Obtain the token-issuing TokenService;
  2. Call the TokenService to acquire a token;
  3. Handle the acquisition result;

// FlowRuleChecker.java
private static boolean passClusterCheck(FlowRule rule, Context context, DefaultNode node, int acquireCount,
                                        boolean prioritized) {
    try {
        // Pick the token-issuing service
        TokenService clusterService = pickClusterService();
        if (clusterService == null) {
            return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
        }
        // Acquire a token
        long flowId = rule.getClusterConfig().getFlowId();
        TokenResult result = clusterService.requestToken(flowId, acquireCount, prioritized);
        // Handle the acquisition result
        return applyTokenResult(result, rule, context, node, acquireCount, prioritized);
    } catch (Throwable ex) {
        RecordLog.warn("[FlowRuleChecker] Request cluster token unexpected failed", ex);
    }
    // Fall back to local flow control
    return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
}

Broadly speaking, TokenService exposes a single operation: acquiring tokens.

public interface TokenService {

    // Acquire tokens for a flow rule
    TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized);

    // Acquire tokens for a parameter (hot-spot) flow rule
    TokenResult requestParamToken(Long ruleId, int acquireCount, Collection<Object> params);
}

Obtaining the TokenService

First, pickClusterService obtains the TokenService. If the current node's state is TokenClient, it returns a ClusterTokenClient instance; if the state is TokenServer, it returns an EmbeddedClusterTokenServer instance, which can only happen with an embedded TokenServer.

// FlowRuleChecker.java
private static TokenService pickClusterService() {
    if (ClusterStateManager.isClient()) {
        return TokenClientProvider.getClient();
    }
    if (ClusterStateManager.isServer()) {
        return EmbeddedClusterTokenServerProvider.getServer();
    }
    return null;
}

Acquiring a Token

Second, TokenService.requestToken acquires the token.

If the current node is a TokenClient, it assembles the cluster rule id (FlowRule.ClusterFlowConfig.flowId), the number of tokens requested (acquireCount), and the prioritized flag, and sends the request to the TokenServer.

// DefaultClusterTokenClient.java
@Override
public TokenResult requestToken(Long flowId, int acquireCount, boolean prioritized) {
    if (notValidRequest(flowId, acquireCount)) {
        return badRequest();
    }
    FlowRequestData data = new FlowRequestData().setCount(acquireCount)
        .setFlowId(flowId).setPriority(prioritized);
    ClusterRequest<FlowRequestData> request = new ClusterRequest<>(ClusterConstants.MSG_TYPE_FLOW, data);
    try {
        TokenResult result = sendTokenRequest(request);
        logForResult(result);
        return result;
    } catch (Exception ex) {
        ClusterClientStatLogUtil.log(ex.getMessage());
        return new TokenResult(TokenResultStatus.FAIL);
    }
}

If the current node is an embedded TokenServer, it is itself the TokenServer: the DefaultEmbeddedTokenServer instance delegates to DefaultTokenService to acquire the token.

Likewise, when a standalone TokenServer receives FlowRequestData from a TokenClient, it also goes through DefaultTokenService to acquire the token.

public class DefaultEmbeddedTokenServer implements EmbeddedClusterTokenServer {

    private final TokenService tokenService = TokenServiceProvider.getService();

    @Override
    public TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized) {
        if (tokenService != null) {
            return tokenService.requestToken(ruleId, acquireCount, prioritized);
        }
        return new TokenResult(TokenResultStatus.FAIL);
    }
}

DefaultTokenService's requestToken method is the core of cluster flow control:

If the request parameters are invalid, it returns BAD_REQUEST.

It then looks up the flow rule by the cluster rule id; if none is found, it returns NO_RULE_EXISTS.

@Spi(isDefault = true)
public class DefaultTokenService implements TokenService {

    @Override
    public TokenResult requestToken(Long ruleId, int acquireCount, boolean prioritized) {
        if (notValidRequest(ruleId, acquireCount)) {
            return badRequest(); // BAD_REQUEST
        }
        // The rule should be valid.
        FlowRule rule = ClusterFlowRuleManager.getFlowRuleById(ruleId);
        if (rule == null) {
            return new TokenResult(TokenResultStatus.NO_RULE_EXISTS);
        }
        return ClusterFlowChecker.acquireClusterToken(rule, acquireCount, prioritized);
    }
}

When the rule exists, control enters ClusterFlowChecker's acquireClusterToken method.

// ClusterFlowChecker.java
static TokenResult acquireClusterToken(FlowRule rule, int acquireCount, boolean prioritized) {
    Long id = rule.getClusterConfig().getFlowId();
    // A global limiter controlling the namespace-level QPS of requests to the TokenServer, 30,000 QPS by default
    if (!allowProceed(id)) {
        return new TokenResult(TokenResultStatus.TOO_MANY_REQUEST);
    }
    // Cluster traffic metrics
    ClusterMetric metric = ClusterMetricStatistics.getMetric(id);
    if (metric == null) {
        return new TokenResult(TokenResultStatus.FAIL);
    }
    // Current QPS
    double latestQps = metric.getAvg(ClusterFlowEvent.PASS);
    // Threshold = configured threshold * exceed factor
    double globalThreshold = calcGlobalThreshold(rule) * ClusterServerConfigManager.getExceedCount();
    // Remaining tokens = threshold - QPS - requested token count
    double nextRemaining = globalThreshold - latestQps - acquireCount;
    if (nextRemaining >= 0) {
        // Record the passed traffic and return OK
        // TODO: checking logic and metric operation should be separated.
        metric.add(ClusterFlowEvent.PASS, acquireCount);
        metric.add(ClusterFlowEvent.PASS_REQUEST, 1);
        if (prioritized) {
            metric.add(ClusterFlowEvent.OCCUPIED_PASS, acquireCount);
        }
        return new TokenResult(TokenResultStatus.OK)
            .setRemaining((int) nextRemaining)
            .setWaitInMs(0);
    } else {
        // If prioritized=true, try to borrow tokens from the next window; return SHOULD_WAIT with a waitInMs
        if (prioritized) {
            double occupyAvg = metric.getAvg(ClusterFlowEvent.WAITING);
            if (occupyAvg <= ClusterServerConfigManager.getMaxOccupyRatio() * globalThreshold) {
                int waitInMs = metric.tryOccupyNext(ClusterFlowEvent.PASS, acquireCount, globalThreshold);
                if (waitInMs > 0) {
                    ClusterServerStatLogUtil.log("flow|waiting|" + id);
                    return new TokenResult(TokenResultStatus.SHOULD_WAIT)
                        .setRemaining(0)
                        .setWaitInMs(waitInMs);
                }
            }
        }
        // Otherwise record the blocked traffic and return BLOCKED
        metric.add(ClusterFlowEvent.BLOCK, acquireCount);
        metric.add(ClusterFlowEvent.BLOCK_REQUEST, 1);
        ClusterServerStatLogUtil.log("flow|block|" + id, acquireCount);
        ClusterServerStatLogUtil.log("flow|block_request|" + id, 1);
        if (prioritized) {
            metric.add(ClusterFlowEvent.OCCUPIED_BLOCK, acquireCount);
            ClusterServerStatLogUtil.log("flow|occupied_block|" + id, 1);
        }
        return blockedResult();
    }
}

(Figure: cluster flow check flow)

First, the TokenServer applies a global rate limit to the token-acquisition interface; if that limit is hit, it returns TOO_MANY_REQUEST.

The namespace is looked up from the cluster flow rule, and each namespace has a RequestLimiter controlling the rate at which that namespace may call the TokenServer. The default namespace is default, with a limit of 30,000 QPS.

// ClusterFlowChecker.java
static boolean allowProceed(long flowId) {
    String namespace = ClusterFlowRuleManager.getNamespace(flowId);
    return GlobalRequestLimiter.tryPass(namespace);
}
// GlobalRequestLimiter.java
public static boolean tryPass(String namespace) {
    if (namespace == null) {
        return false;
    }
    RequestLimiter limiter = GLOBAL_QPS_LIMITER_MAP.get(namespace);
    if (limiter == null) {
        return true;
    }
    return limiter.tryPass();
}
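Sentinel's actual RequestLimiter counts QPS over a sliding window (backed by LeapArray); as a rough, self-contained sketch of the same idea, here is a fixed-window QPS limiter. The class and field names are made up for illustration and are not Sentinel's API:

```java
import java.util.concurrent.atomic.AtomicLong;

// Fixed-window approximation of a QPS limiter. Sentinel's RequestLimiter
// smooths this with a sliding window; the pass/reject decision is the same.
public class SimpleQpsLimiter {
    private final long qpsLimit;                                 // e.g. 30000 for the default namespace
    private final AtomicLong windowStart;                        // start time of the current 1s window
    private final AtomicLong passedInWindow = new AtomicLong(0); // requests passed in this window

    public SimpleQpsLimiter(long qpsLimit) {
        this.qpsLimit = qpsLimit;
        this.windowStart = new AtomicLong(System.currentTimeMillis());
    }

    public boolean tryPass() {
        long now = System.currentTimeMillis();
        long start = windowStart.get();
        // Roll over to a new 1s window and reset the counter (benign race for a sketch).
        if (now - start >= 1000 && windowStart.compareAndSet(start, now)) {
            passedInWindow.set(0);
        }
        return passedInWindow.incrementAndGet() <= qpsLimit;
    }
}
```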

Second, it computes the remaining tokens: remaining = threshold - QPS - requested count. If the threshold type is global total, the configured threshold is used directly; if it is per-machine average, threshold = configured threshold * number of client connections currently using this cluster flow rule.

// ClusterFlowChecker.java
private static double calcGlobalThreshold(FlowRule rule) {
    double count = rule.getCount();
    switch (rule.getClusterConfig().getThresholdType()) {
        case ClusterRuleConstant.FLOW_THRESHOLD_GLOBAL:
            return count;
        case ClusterRuleConstant.FLOW_THRESHOLD_AVG_LOCAL:
        default:
            int connectedCount = ClusterFlowRuleManager.getConnectedCount(rule.getClusterConfig().getFlowId());
            return count * connectedCount;
    }
}
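The threshold calculation and the remaining-token check reduce to a few lines of arithmetic. A self-contained sketch (the exceedCount factor, default 1.0, is omitted; the 0/1 constants mirror ClusterRuleConstant, everything else is illustrative):

```java
public class ClusterThresholdSketch {
    static final int THRESHOLD_AVG_LOCAL = 0; // per-machine average mode
    static final int THRESHOLD_GLOBAL = 1;    // global total mode

    // Mirrors calcGlobalThreshold: AVG_LOCAL multiplies the configured count
    // by the number of connected token clients using this rule; GLOBAL uses it as-is.
    static double globalThreshold(double count, int thresholdType, int connectedCount) {
        return thresholdType == THRESHOLD_GLOBAL ? count : count * connectedCount;
    }

    // Mirrors the core check: pass when threshold - current QPS - requested tokens >= 0.
    static boolean canAcquire(double threshold, double latestQps, int acquireCount) {
        return threshold - latestQps - acquireCount >= 0;
    }
}
```

For example, a rule with count=10 in per-machine average mode and 5 connected clients yields an effective cluster threshold of 50 QPS.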

Third, if the remaining token count is greater than or equal to 0, the acquisition succeeds: the passed request is recorded and OK is returned.

Fourth, if the remaining token count is less than 0, acquisition failed, and the handling resembles local flow control.

If prioritized=true, the server tries to charge this request against the next time window and returns SHOULD_WAIT plus a wait duration to the client; by default, this requires that the current waiting count (ClusterFlowEvent.WAITING) not exceed the threshold.

If prioritized=false, or prioritized=true but the occupy attempt fails, BLOCKED is returned to the client.

Handling the TokenResult

For the TokenResult returned by the TokenServer (or by the node itself, when it is an embedded TokenServer), different status codes are handled differently:

  • OK: return true;
  • SHOULD_WAIT: prioritized=true and the token was acquired, but the caller must sleep until the next time window; return true;
  • NO_RULE_EXISTS, BAD_REQUEST, FAIL, TOO_MANY_REQUEST: in these cases, the cluster rule may fall back to local flow control (fallbackToLocalWhenFail=true);
  • BLOCKED: the resource is blocked by cluster flow control; return false;

// FlowRuleChecker.java
private static boolean applyTokenResult(TokenResult result, FlowRule rule, Context context,
                                                         DefaultNode node,
                                                         int acquireCount, boolean prioritized) {
  switch (result.getStatus()) {
    case TokenResultStatus.OK: // Cluster flow check passed
      return true;
    case TokenResultStatus.SHOULD_WAIT: // With prioritized=true, sleep x ms to occupy the next time window
      // Wait for next tick.
      try {
        Thread.sleep(result.getWaitInMs());
      } catch (InterruptedException e) {
        e.printStackTrace();
      }
      return true;
    case TokenResultStatus.NO_RULE_EXISTS: // Cluster flow rule not found
    case TokenResultStatus.BAD_REQUEST: // Invalid request parameters
    case TokenResultStatus.FAIL: // Cluster metric not found
    case TokenResultStatus.TOO_MANY_REQUEST: // Rejected by the TokenServer's own rate limit
      return fallbackToLocalOrPass(rule, context, node, acquireCount, prioritized);
    case TokenResultStatus.BLOCKED: // Blocked by cluster flow control
    default:
      return false;
  }
}

V. Network Communication

That covers the complete cluster flow control logic; it is quite simple.

Next, let's walk through the underlying network communication between TokenClient and TokenServer.

1. TokenServer

(Figure: ClusterTokenServer class hierarchy)

ClusterTokenServer is the TokenServer abstraction, exposing start and stop methods.

public interface ClusterTokenServer {
    void start() throws Exception;
    void stop() throws Exception;
}

NettyTransportServer is where the actual network transport is implemented.

public class NettyTransportServer implements ClusterTokenServer {
    // Listening port
    private final int port;
    // Netty acceptor event loop
    private NioEventLoopGroup bossGroup;
    // Netty IO + business thread pool
    private NioEventLoopGroup workerGroup;
    // Client connection pool
    private final ConnectionPool connectionPool = new ConnectionPool();
    // Server startup state
    private final AtomicInteger currentState = new AtomicInteger(SERVER_STATUS_OFF);
    // Startup failure count
    private final AtomicInteger failedTimes = new AtomicInteger(0);
    public NettyTransportServer(int port) {
        this.port = port;
    }

    @Override
    public void start() {
        if (!currentState.compareAndSet(SERVER_STATUS_OFF, SERVER_STATUS_STARTING)) {
            return;
        }

        ServerBootstrap b = new ServerBootstrap();
        this.bossGroup = new NioEventLoopGroup(1);
        // 2 * number of cores
        this.workerGroup = new NioEventLoopGroup(DEFAULT_EVENT_LOOP_THREADS);
        b.group(bossGroup, workerGroup)
            .channel(NioServerSocketChannel.class)
            .option(ChannelOption.SO_BACKLOG, 128)
            .handler(new LoggingHandler(LogLevel.INFO))
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel ch) throws Exception {
                    ChannelPipeline p = ch.pipeline();
                    // Length-field decoding
                    p.addLast(new LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2));
                    // Business decoding
                    p.addLast(new NettyRequestDecoder());
                    // Length-field encoding
                    p.addLast(new LengthFieldPrepender(2));
                    // Business encoding
                    p.addLast(new NettyResponseEncoder());
                    // Business handler
                    p.addLast(new TokenServerHandler(connectionPool));
                }
            })
            .childOption(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
            .childOption(ChannelOption.SO_SNDBUF, 32 * 1024)
            .childOption(ChannelOption.CONNECT_TIMEOUT_MILLIS, 10000)
            .childOption(ChannelOption.SO_TIMEOUT, 10)
            .childOption(ChannelOption.TCP_NODELAY, true)
            .childOption(ChannelOption.SO_RCVBUF, 32 * 1024);
        b.bind(port).addListener(new GenericFutureListener<ChannelFuture>() {
            @Override
            public void operationComplete(ChannelFuture future) {
                if (future.cause() != null) {
                    // Retry on startup failure
                    RecordLog.info("[NettyTransportServer] Token server start failed (port=" + port + "), failedTimes: " + failedTimes.get(),
                        future.cause());
                    currentState.compareAndSet(SERVER_STATUS_STARTING, SERVER_STATUS_OFF);
                    int failCount = failedTimes.incrementAndGet();
                    if (failCount > MAX_RETRY_TIMES) { // Retry at most 3 times
                        return;
                    }

                    try {
                        Thread.sleep(failCount * RETRY_SLEEP_MS); // Back off before retrying; the delay grows with failCount
                        start();
                    } catch (Throwable e) {
                        RecordLog.info("[NettyTransportServer] Failed to start token server when retrying", e);
                    }
                } else {
                    currentState.compareAndSet(SERVER_STATUS_STARTING, SERVER_STATUS_STARTED);
                }
            }
        });
    }

Key points:

  1. Threading model: the TokenServer uses Netty as the transport framework. The boss group has 1 thread accepting client connections; the worker group has 2 * cores threads handling both IO and all business processing;
  2. Codec: the TokenServer uses length-prefixed framing (LengthFieldBasedFrameDecoder, LengthFieldPrepender) to solve TCP packet splitting and coalescing, plus its own business codecs (NettyRequestDecoder, NettyResponseEncoder);
  3. Startup retry: the TokenServer retries failed startups up to 3 times, with a delay that grows with each failed attempt;
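The 2-byte length prefix handled by LengthFieldPrepender(2) and LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2) can be illustrated without Netty. A minimal sketch of encoding one frame and decoding it back (the class and method names are made up; real Netty codecs work on ByteBuf inside a pipeline):

```java
import java.nio.ByteBuffer;

// Netty-free sketch of 2-byte length-prefixed framing. The final "2" in
// LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2) strips the header, which
// decodeOne mimics by returning only the payload.
public class LengthFieldFramingSketch {

    // Prepend a 2-byte big-endian length, like LengthFieldPrepender(2).
    static byte[] encode(byte[] payload) {
        ByteBuffer buf = ByteBuffer.allocate(2 + payload.length);
        buf.putShort((short) payload.length);
        buf.put(payload);
        return buf.array();
    }

    // Decode one frame, or return null if the bytes received so far do not yet
    // contain a complete frame (this is how the decoder copes with TCP splitting).
    static byte[] decodeOne(ByteBuffer in) {
        if (in.remaining() < 2) {
            return null;                    // length field not complete yet
        }
        in.mark();
        int len = in.getShort() & 0xFFFF;   // unsigned 2-byte length
        if (in.remaining() < len) {
            in.reset();                     // body incomplete: wait for more bytes
            return null;
        }
        byte[] frame = new byte[len];
        in.get(frame);
        return frame;                       // header already stripped
    }
}
```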

For the embedded mode, the implementation class is DefaultEmbeddedTokenServer, which delegates start and stop to SentinelDefaultTokenServer. When the node state managed by ClusterStateManager switches to TokenServer, DefaultEmbeddedTokenServer is started.

public class DefaultEmbeddedTokenServer implements EmbeddedClusterTokenServer {

    private final ClusterTokenServer server = new SentinelDefaultTokenServer(true);

    @Override
    public void start() throws Exception {
        server.start();
    }

    @Override
    public void stop() throws Exception {
        server.stop();
    }
}

For the standalone mode, per the official ClusterServerDemo, SentinelDefaultTokenServer is started directly.

public class ClusterServerDemo {
    public static void main(String[] args) throws Exception {
        // Create the TokenServer
        ClusterTokenServer tokenServer = new SentinelDefaultTokenServer();
        // ...
        tokenServer.start();
    }
}

SentinelDefaultTokenServer in turn delegates start and stop to NettyTransportServer.

public class SentinelDefaultTokenServer implements ClusterTokenServer {
    // Whether embedded
    private final boolean embedded;
    // NettyTransportServer instance
    private ClusterTokenServer server;

     public SentinelDefaultTokenServer() {
        this(false);
    }

    public SentinelDefaultTokenServer(boolean embedded) {
        this.embedded = embedded;
        ClusterServerConfigManager.addTransportConfigChangeObserver(new ServerTransportConfigObserver() {
            @Override
            public void onTransportConfigChange(ServerTransportConfig config) {
                changeServerConfig(config);
            }
        });
        initNewServer();
    }

    private void initNewServer() {
        if (server != null) {
            return;
        }
        int port = ClusterServerConfigManager.getPort();
        if (port > 0) {
            this.server = new NettyTransportServer(port);
            this.port = port;
        }
    }
}

2. TokenClient

When DefaultClusterTokenClient is constructed, it creates a ClusterTransportClient that handles the TokenClient's low-level network communication.

public class DefaultClusterTokenClient implements ClusterTokenClient {

    private ClusterTransportClient transportClient;
    private TokenServerDescriptor serverDescriptor;

    private final AtomicBoolean shouldStart = new AtomicBoolean(false);

    public DefaultClusterTokenClient() {
        ClusterClientConfigManager.addServerChangeObserver(new ServerChangeObserver() {
            @Override
            public void onRemoteServerChange(ClusterClientAssignConfig assignConfig) {
                changeServer(assignConfig);
            }
        });
        initNewConnection();
    }

    private void initNewConnection() {
        if (transportClient != null) {
            return;
        }
        String host = ClusterClientConfigManager.getServerHost();
        int port = ClusterClientConfigManager.getServerPort();
        if (StringUtil.isBlank(host) || port <= 0) {
            return;
        }

        try {
            this.transportClient = new NettyTransportClient(host, port);
            this.serverDescriptor = new TokenServerDescriptor(host, port);
        } catch (Exception ex) {
            RecordLog.warn("[DefaultClusterTokenClient] Failed to initialize new token client", ex);
        }
    }
}

When the node state managed by ClusterStateManager switches to TokenClient, DefaultClusterTokenClient is started.

// ClusterStateManager.java
public static boolean setToClient() {
    if (mode == CLUSTER_CLIENT) {
        return true;
    }
    mode = CLUSTER_CLIENT;
    sleepIfNeeded();
    lastModified = TimeUtil.currentTimeMillis();
    return startClient();
}

private static boolean startClient() {
    try {
        // If this node was previously an embedded TokenServer, stop it first
        EmbeddedClusterTokenServer server = EmbeddedClusterTokenServerProvider.getServer();
        if (server != null) {
            server.stop();
        }
        // Start the TokenClient
        ClusterTokenClient tokenClient = TokenClientProvider.getClient();
        if (tokenClient != null) {
            tokenClient.start();
            return true;
        } else {
            return false;
        }
    } catch (Exception ex) {
        return false;
    }
}

// DefaultClusterTokenClient.java
public void start() throws Exception {
  if (shouldStart.compareAndSet(false, true)) {
    startClientIfScheduled();
  }
}
private void startClientIfScheduled() throws Exception {
  if (shouldStart.get()) {
    if (transportClient != null) {
      transportClient.start();
    } else {
       // ...
    }
  }
}

Under the hood, DefaultClusterTokenClient starts a NettyTransportClient.

NettyTransportClient creates a Bootstrap in initClientBootstrap, and connect establishes the connection to the TokenServer. On the client side, a single thread handles both IO and business processing.

// NettyTransportClient.java
private Bootstrap initClientBootstrap() {
    Bootstrap b = new Bootstrap();
    // Client event loop group; the single connection is served by one of its threads
    eventLoopGroup = new NioEventLoopGroup();
    b.group(eventLoopGroup)
        .channel(NioSocketChannel.class)
        .option(ChannelOption.TCP_NODELAY, true)
        .option(ChannelOption.ALLOCATOR, PooledByteBufAllocator.DEFAULT)
        .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, ClusterClientConfigManager.getConnectTimeout())
        .handler(new ChannelInitializer<SocketChannel>() {
            @Override
            public void initChannel(SocketChannel ch) throws Exception {
                clientHandler = new TokenClientHandler(currentState, disconnectCallback);

                ChannelPipeline pipeline = ch.pipeline();
                pipeline.addLast(new LengthFieldBasedFrameDecoder(1024, 0, 2, 0, 2));
                pipeline.addLast(new NettyResponseDecoder());
                pipeline.addLast(new LengthFieldPrepender(2));
                pipeline.addLast(new NettyRequestEncoder());
                pipeline.addLast(clientHandler);
            }
        });

    return b;
}

private void connect(Bootstrap b) {
    if (currentState.compareAndSet(ClientConstants.CLIENT_STATUS_OFF, ClientConstants.CLIENT_STATUS_PENDING)) {
        b.connect(host, port)
            .addListener(new GenericFutureListener<ChannelFuture>() {
            @Override
            public void operationComplete(ChannelFuture future) {
                if (future.cause() != null) {
                    failConnectedTime.incrementAndGet();
                    channel = null;
                } else {
                    failConnectedTime.set(0);
                    channel = future.channel();
                }
            }
        });
    }
}

Summary

Cluster flow control exists to:

  1. Have a single node aggregate cluster-wide traffic statistics and hand out tokens to cluster nodes based on the overall picture;
  2. Solve the problem of uneven traffic distribution across nodes of the same application, which makes local flow control inaccurate.

There are two roles in cluster flow control:

  1. Token Client: the cluster flow control client, which requests tokens from its Token Server; the server's response decides whether the request is rate-limited;
  2. Token Server: the cluster flow control server, which handles Token Client requests and decides, based on the configured cluster rules, whether to issue a token (i.e. whether to let the request pass).

A TokenServer can be started in two ways:

  1. Embedded TokenServer: runs inside the same process as the business application. Every node of the business cluster is a peer; each may become the TokenServer or a TokenClient, with the actual role determined by an external data source (e.g. Nacos);
  2. Standalone TokenServer: runs as a separate process, independent of the business application.

Cluster flow control configuration:

  1. The cluster settings live inside FlowRule and take effect when clusterMode=true;

  2. The important ClusterFlowConfig properties are:

    1. flowId: the unique id of the cluster flow rule;
    2. thresholdType: the cluster threshold mode, 0-per-machine average, 1-global total. In per-machine average mode, the rule check multiplies the FlowRule threshold by the number of nodes in the cluster to obtain the effective threshold; in global mode, the FlowRule threshold is used as-is;
    3. fallbackToLocalWhenFail: whether to fall back to local flow control when the TokenServer returns NO_RULE_EXISTS, BAD_REQUEST, FAIL, or TOO_MANY_REQUEST.

FlowSlot checks a cluster flow rule in three steps:

  1. Obtain the TokenService: if the current node is an embedded TokenServer, the local EmbeddedClusterTokenServer is used directly with no remote call; if the node is a TokenClient, a ClusterTokenClient instance is used and the token is requested from the TokenServer remotely;

  2. Acquire the token: DefaultTokenService's requestToken is the core token-acquisition method.

    (Figure: cluster flow check flow)

  3. Handle the TokenResult: the client applies the flow control effect according to the status code:

    • OK: return true;
    • SHOULD_WAIT: prioritized=true and the token was acquired, but the caller must sleep until the next time window; return true;
    • NO_RULE_EXISTS, BAD_REQUEST, FAIL, TOO_MANY_REQUEST: the cluster rule may fall back to local flow control (fallbackToLocalWhenFail=true);
    • BLOCKED: the resource is blocked by cluster flow control; return false.

Network communication:

TokenServer and TokenClient use Netty as the underlying transport framework, keeping a long-lived connection between client and server.

For threading, the TokenServer runs 1 boss thread to accept client connections and 2 * cores worker threads for IO and business processing; on the TokenClient, a single thread serves the connection.

For the codec, length-prefixed framing solves TCP packet splitting and coalescing.