How the Xxl-job Scheduling Center Works

Xxl-job architecture

image.png

Xxl-job components
  • Executor
  • Scheduling center

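How the scheduling center works

The scheduling center has the following responsibilities:

  • Maintaining the executor list and detecting heartbeats
  • Managing jobs
  • Notifying executors to run jobs, and retrying on failure
  • Recording execution logs
  • Reporting on execution logs

To fulfil these responsibilities, the scheduling center needs to:

  • Maintain scheduled jobs
  • Trigger jobs on schedule
  • Communicate with executors
  • Clean up historical logs
  • Trigger job execution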

The core logic of the scheduling center is implemented by XxlJobScheduler. Let's step into the code to see how it works:

public void init() throws Exception {
    // init i18n resources
    initI18n();

    // admin registry monitor run: take executor nodes offline when their heartbeat times out
    JobRegistryMonitorHelper.getInstance().start();

    // admin fail-monitor run: failure retry + email alarm
    JobFailMonitorHelper.getInstance().start();

    // admin lose-monitor run: handle lost job results (job has run longer than the limit and the executor is offline)
    JobLosedMonitorHelper.getInstance().start();

    // admin trigger pool start: initialize the trigger thread pools
    // (the fast and slow pools)
    JobTriggerPoolHelper.toStart();

    // admin log report start: statistics
    JobLogReportHelper.getInstance().start();

    // start-schedule
    JobScheduleHelper.getInstance().start();

    logger.info(">>>>>>>>> init xxl-job admin success.");
}
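
In order, init() does the following:

  1. Initializes the i18n resources.
  2. Takes executor nodes offline when their heartbeat times out (90s).
  3. Retries failed jobs and sends email alarms.
  4. Handles lost job results.
  5. Initializes the fast and slow trigger thread pools.
  6. Collects report statistics.
  7. Starts the scheduler that triggers jobs and notifies executors.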

Heartbeat detection

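Executors and the scheduling center exchange heartbeats so that jobs are never dispatched to an executor node that has gone offline. After startup, each executor runs a dedicated thread that registers with the scheduling center cluster every 30s. It tries the configured admin addresses one by one; as soon as one registration succeeds, the remaining addresses are skipped for that round.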
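
The "first success wins" registration described above can be sketched as follows. This is only an illustration under stated assumptions, not the actual executor code: the AdminClient interface, the registerOnce method and the sample addresses are hypothetical names introduced here.

```java
import java.util.List;

// Hypothetical sketch of one executor-side heartbeat round: try each admin
// address in order and stop at the first one that accepts the registration.
final class RegistrySketch {

    /** Hypothetical stand-in for the client that talks to one admin node. */
    interface AdminClient {
        boolean registry(String appName, String executorAddress) throws Exception;
    }

    /** Returns the index of the admin node that accepted the heartbeat, or -1 if all failed. */
    static int registerOnce(List<AdminClient> admins, String appName, String executorAddress) {
        for (int i = 0; i < admins.size(); i++) {
            try {
                if (admins.get(i).registry(appName, executorAddress)) {
                    return i; // one success is enough; skip the remaining addresses
                }
            } catch (Exception e) {
                // this admin node is unreachable; fall through to the next one
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        AdminClient down = (app, addr) -> { throw new java.io.IOException("connection refused"); };
        AdminClient up = (app, addr) -> true;
        int acceptedBy = registerOnce(List.of(down, up, up), "demo-app", "http://10.0.0.5:9999/");
        System.out.println("accepted by admin #" + acceptedBy);
    }
}
```

In a real executor this round would run inside a loop that sleeps for the heartbeat interval (30s) between rounds.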

JobRegistryMonitorHelper

public void start(){
   registryThread = new Thread(new Runnable() {
      @Override
      public void run() {
         while (!toStop) {
            try {
               // auto registry group: groups whose addresses are registered automatically
               List<XxlJobGroup> groupList = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().findByAddressType(0);
               if (groupList!=null && !groupList.isEmpty()) {

                  // remove dead address (admin/executor): entries with no heartbeat for 90 seconds
                  List<Integer> ids = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findDead(RegistryConfig.DEAD_TIMEOUT, new Date());
                  if (ids!=null && ids.size()>0) {
                     XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().removeDead(ids);
                  }

                  // fresh online address (admin/executor): find the entries that are still alive
                  // appName -> address list
                  HashMap<String, List<String>> appAddressMap = new HashMap<String, List<String>>();
                  List<XxlJobRegistry> list = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findAll(RegistryConfig.DEAD_TIMEOUT, new Date());
                  if (list != null) {
                     for (XxlJobRegistry item: list) {
                        if (RegistryConfig.RegistType.EXECUTOR.name().equals(item.getRegistryGroup())) {
                           String appname = item.getRegistryKey();
                           List<String> registryList = appAddressMap.get(appname);
                           if (registryList == null) {
                              registryList = new ArrayList<String>();
                           }

                           if (!registryList.contains(item.getRegistryValue())) {
                              registryList.add(item.getRegistryValue());
                           }
                           appAddressMap.put(appname, registryList);
                        }
                     }
                  }

                  // fresh group address
                  for (XxlJobGroup group: groupList) {
                     List<String> registryList = appAddressMap.get(group.getAppname());
                     String addressListStr = null;
                     if (registryList!=null && !registryList.isEmpty()) {
                        Collections.sort(registryList);
                        addressListStr = "";
                        for (String item:registryList) {
                           addressListStr += item + ",";
                        }
                        addressListStr = addressListStr.substring(0, addressListStr.length()-1);
                     }
                     group.setAddressList(addressListStr);
                     // update the address list
                     XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().update(group);
                  }
               }
            } catch (Exception e) {
               if (!toStop) {
                  logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
               }
            }
            try {
               TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
            } catch (InterruptedException e) {
               if (!toStop) {
                  logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
               }
            }
         }
         logger.info(">>>>>>>>>>> xxl-job, job registry monitor thread stop");
      }
   });
   registryThread.setDaemon(true);
   registryThread.setName("xxl-job, admin JobRegistryMonitorHelper");
   registryThread.start();
}

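The code above does the following:

  1. Removes registrations in xxl_job_registry that have not been renewed for more than 90s.
  2. Writes the addresses in xxl_job_registry whose heartbeat is still healthy into the address_list column of xxl_job_group (comma-separated).
  3. Repeats the loop every RegistryConfig.BEAT_TIMEOUT seconds (the 30s heartbeat interval; the 90s value is the dead timeout used in step 1).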
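
The refresh steps listed above can be sketched in memory as follows. This is a simplified illustration, not the admin's DAO code: the Registry class stands in for a row of xxl_job_registry, and aliveAddressLists is a hypothetical helper name.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

final class RegistryRefreshSketch {

    /** Hypothetical in-memory stand-in for one row of xxl_job_registry. */
    static final class Registry {
        final String appName;
        final String address;
        final long updateTimeMillis;

        Registry(String appName, String address, long updateTimeMillis) {
            this.appName = appName;
            this.address = address;
            this.updateTimeMillis = updateTimeMillis;
        }
    }

    static final long DEAD_TIMEOUT_MILLIS = 90_000L; // 90s, as described above

    /** Builds appName -> sorted, comma-separated addresses whose heartbeat is still fresh. */
    static Map<String, String> aliveAddressLists(List<Registry> rows, long nowMillis) {
        Map<String, TreeSet<String>> byApp = new TreeMap<>();
        for (Registry row : rows) {
            boolean alive = nowMillis - row.updateTimeMillis <= DEAD_TIMEOUT_MILLIS;
            if (alive) {
                // TreeSet removes duplicates and keeps the addresses sorted
                byApp.computeIfAbsent(row.appName, k -> new TreeSet<>()).add(row.address);
            }
        }
        Map<String, String> result = new TreeMap<>();
        byApp.forEach((app, addresses) -> result.put(app, String.join(",", addresses)));
        return result;
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        List<Registry> rows = List.of(
                new Registry("order-app", "http://10.0.0.2:9999/", now - 10_000),
                new Registry("order-app", "http://10.0.0.1:9999/", now - 20_000),
                new Registry("order-app", "http://10.0.0.3:9999/", now - 120_000)); // stale, dropped
        System.out.println(aliveAddressLists(rows, now));
    }
}
```
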

Failure retry: JobFailMonitorHelper.getInstance().start();

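When a job is triggered, its remaining retry count is stored in the executor_fail_retry_count column of the xxl_job_log table. The monitor below re-triggers failed jobs that still have retries left and sends an email alarm for failed jobs.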

public void start(){
   monitorThread = new Thread(new Runnable() {

      @Override
      public void run() {

         // monitor
         while (!toStop) {
            try {

               List<Long> failLogIds = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().findFailJobLogIds(1000);
               if (failLogIds!=null && !failLogIds.isEmpty()) {
                  for (long failLogId: failLogIds) {

                     // lock log
                     int lockRet = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, 0, -1);
                     if (lockRet < 1) {
                        continue;
                     }
                     XxlJobLog log = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().load(failLogId);
                     XxlJobInfo info = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(log.getJobId());

                     // 1、fail retry monitor
                     if (log.getExecutorFailRetryCount() > 0) {
                        JobTriggerPoolHelper.trigger(log.getJobId(), TriggerTypeEnum.RETRY, (log.getExecutorFailRetryCount()-1), log.getExecutorShardingParam(), log.getExecutorParam(), null);
                        String retryMsg = "<br><br><span style=\"color:#F39C12;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_type_retry") +"<<<<<<<<<<< </span><br>";
                        log.setTriggerMsg(log.getTriggerMsg() + retryMsg);
                        XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(log);
                     }

                     // 2、fail alarm monitor
                     int newAlarmStatus = 0;       // alarm status: 0 = default, -1 = locked, 1 = no alarm needed, 2 = alarm sent, 3 = alarm failed
                     if (info!=null ) {
                        boolean alarmResult = XxlJobAdminConfig.getAdminConfig().getJobAlarmer().alarm(info, log);
                        newAlarmStatus = alarmResult?2:3;
                     } else {
                        newAlarmStatus = 1;
                     }

                     XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, -1, newAlarmStatus);
                  }
               }

            } catch (Exception e) {
               if (!toStop) {
                  logger.error(">>>>>>>>>>> xxl-job, job fail monitor thread error:{}", e);
               }
            }

            try {
               TimeUnit.SECONDS.sleep(10);
            } catch (Exception e) {
               if (!toStop) {
                  logger.error(e.getMessage(), e);
               }
            }

         }

         logger.info(">>>>>>>>>>> xxl-job, job fail monitor thread stop");

      }
   });
   monitorThread.setDaemon(true);
   monitorThread.setName("xxl-job, admin JobFailMonitorHelper");
   monitorThread.start();
}
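
The "lock log" step in the loop above, updateAlarmStatus(failLogId, 0, -1), reads like a compare-and-set: the status only moves from 0 to -1 if it is still 0, and a return value below 1 means another worker already took the log. The sketch below imitates that idea in memory. It assumes this reading of the DAO call is correct; the AlarmLockSketch class and its map are hypothetical stand-ins for the alarm_status column.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

final class AlarmLockSketch {
    static final int DEFAULT = 0;
    static final int LOCKED = -1;

    // Hypothetical in-memory stand-in for the alarm_status column, keyed by log id.
    private final ConcurrentMap<Long, Integer> alarmStatus = new ConcurrentHashMap<>();

    void addFailedLog(long logId) {
        alarmStatus.put(logId, DEFAULT);
    }

    /** Returns 1 if the status moved from oldStatus to newStatus, otherwise 0 (like an affected-row count). */
    int updateAlarmStatus(long logId, int oldStatus, int newStatus) {
        // replace(key, old, new) is atomic: only one caller can win the transition
        return alarmStatus.replace(logId, oldStatus, newStatus) ? 1 : 0;
    }

    public static void main(String[] args) {
        AlarmLockSketch dao = new AlarmLockSketch();
        dao.addFailedLog(42L);
        System.out.println("first worker:  " + dao.updateAlarmStatus(42L, DEFAULT, LOCKED));
        System.out.println("second worker: " + dao.updateAlarmStatus(42L, DEFAULT, LOCKED));
    }
}
```

Only the first caller wins the transition, so each failed log is retried and alarmed once even if several monitors scan the same rows.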

Lost-result compensation: JobLosedMonitorHelper.getInstance().start();

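Suppose a node is asked to run a job and then crashes for some reason, so its execution result can never be reported and the job's progress becomes unknown. This monitor handles that case: if a scheduling record has stayed in the "running" state for more than 10 minutes and the corresponding executor is no longer registered through heartbeats (it is offline), the scheduling center proactively marks that record as failed.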
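
The rule above boils down to a single predicate. The following is a minimal sketch of it, assuming an in-memory set of online executor addresses; LostJobSketch and isLost are hypothetical names, and the real monitor works against the database instead.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Set;

final class LostJobSketch {
    static final Duration RUNNING_LIMIT = Duration.ofMinutes(10);

    /** A record is treated as lost if it is still running past the limit and its executor is offline. */
    static boolean isLost(Instant triggerTime, boolean stillRunning, String executorAddress,
                          Set<String> onlineAddresses, Instant now) {
        boolean overdue = Duration.between(triggerTime, now).compareTo(RUNNING_LIMIT) > 0;
        boolean executorOffline = !onlineAddresses.contains(executorAddress);
        return stillRunning && overdue && executorOffline;
    }

    public static void main(String[] args) {
        Instant now = Instant.parse("2024-01-01T00:20:00Z");
        Set<String> online = Set.of("http://10.0.0.1:9999/");
        // Triggered 11 minutes ago on an executor that is no longer registered
        System.out.println(isLost(now.minusSeconds(660), true, "http://10.0.0.2:9999/", online, now));
        // Same age, but the executor is still online, so the result may yet arrive
        System.out.println(isLost(now.minusSeconds(660), true, "http://10.0.0.1:9999/", online, now));
    }
}
```

Both conditions are required: a long-running job on a healthy executor is left alone.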

Fast and slow thread pool initialization: JobTriggerPoolHelper.toStart();

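In Xxl-job the scheduling center triggers executor jobs through a thread pool. Pool threads are a scarce resource and must not be held for long periods by triggers that are very slow, so triggers are split between two pools: fastTriggerPool (the fast pool) and slowTriggerPool (the slow pool).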

public void start(){
    fastTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(1000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-fastTriggerPool-" + r.hashCode());
                }
            });

    slowTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(2000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-slowTriggerPool-" + r.hashCode());
                }
            });
}

So which jobs count as slow?

// choose thread pool
ThreadPoolExecutor triggerPool_ = fastTriggerPool;
AtomicInteger jobTimeoutCount = jobTimeoutCountMap.get(jobId);
if (jobTimeoutCount!=null && jobTimeoutCount.get() > 10) {      // job-timeout 10 times in 1 min
    triggerPool_ = slowTriggerPool;
}

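A trigger that takes more than 500ms counts as one timeout. The counts are kept per minute, and a job that has timed out more than 10 times within the minute is routed to the slow pool: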

// incr timeout-count-map
long cost = System.currentTimeMillis()-start;
if (cost > 500) {       // job-timeout threshold 500ms
    AtomicInteger timeoutCount = jobTimeoutCountMap.putIfAbsent(jobId, new AtomicInteger(1));
    if (timeoutCount != null) {
        timeoutCount.incrementAndGet();
    }
}

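The cost measured here is not the job's actual execution time. It only covers how long it takes for the executor to be notified that a job should run, because on the executor side every job runs in its own JobThread.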
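
Putting the two fragments together, the routing rule can be sketched as below. This is a simplified model rather than the JobTriggerPoolHelper code: TriggerPoolChooser, choosePool and recordTrigger are hypothetical names, the pools are represented by plain strings, and clearing the counters when the minute changes is an assumption based on the "10 times in 1 min" comment.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

final class TriggerPoolChooser {
    static final long SLOW_THRESHOLD_MS = 500; // a trigger slower than this counts as a timeout
    static final int SLOW_COUNT_LIMIT = 10;    // more than this many timeouts per minute -> slow pool

    private final ConcurrentMap<Integer, AtomicInteger> timeoutCounts = new ConcurrentHashMap<>();
    private long currentMinute = -1;

    /** Picks the pool for the next trigger of this job. */
    String choosePool(int jobId) {
        AtomicInteger count = timeoutCounts.get(jobId);
        return (count != null && count.get() > SLOW_COUNT_LIMIT) ? "slow" : "fast";
    }

    /** Records how long one trigger took; counters are cleared whenever a new minute starts (assumption). */
    synchronized void recordTrigger(int jobId, long costMs, long nowMs) {
        long minute = nowMs / 60_000;
        if (minute != currentMinute) {
            currentMinute = minute;
            timeoutCounts.clear();
        }
        if (costMs > SLOW_THRESHOLD_MS) {
            timeoutCounts.computeIfAbsent(jobId, k -> new AtomicInteger()).incrementAndGet();
        }
    }

    public static void main(String[] args) {
        TriggerPoolChooser chooser = new TriggerPoolChooser();
        for (int i = 0; i < 11; i++) {
            chooser.recordTrigger(7, 800, 1_000); // 11 slow triggers within the same minute
        }
        System.out.println(chooser.choosePool(7)); // job 7 is now routed to the slow pool
        System.out.println(chooser.choosePool(8)); // job 8 has no timeouts and stays on the fast pool
    }
}
```

A job is never slow forever: once a new minute starts its count begins again from zero, so it returns to the fast pool unless it keeps timing out.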

image.png

Job triggering

For job execution, the scheduling center acts as the client and the executor as the service, so the ExecutorBiz obtained here is an ExecutorBizClient.

public static ReturnT<String> runExecutor(TriggerParam triggerParam, String address){
    ReturnT<String> runResult = null;
    try {
        ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
        runResult = executorBiz.run(triggerParam);
    } catch (Exception e) {
        logger.error(">>>>>>>>>>> xxl-job trigger error, please check if the executor[{}] is running.", address, e);
        runResult = new ReturnT<String>(ReturnT.FAIL_CODE, ThrowableUtil.toString(e));
    }

    StringBuffer runResultSB = new StringBuffer(I18nUtil.getString("jobconf_trigger_run") + ":");
    runResultSB.append("<br>address:").append(address);
    runResultSB.append("<br>code:").append(runResult.getCode());
    runResultSB.append("<br>msg:").append(runResult.getMsg());

    runResult.setMsg(runResultSB.toString());
    return runResult;
}

Communication between the scheduling center and executors

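Because the scheduling center and the executors do not run in the same JVM, they have to communicate over the network. There are several possible ways to do this:

  1. HTTP
  2. TCP
  3. RPC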

Xxl-job has used two implementations over its history: Xxl-rpc and a RESTful API. Versions before 2.2.0 use Xxl-rpc; version 2.2.0 and later use a language-independent RESTful API.

Xxl-rpc

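Xxl-rpc communicates over TCP and uses Hessian for serialization. It provides stable, high-performance remote service invocation and simplifies the development of distributed service communication.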

Architecture diagram

image.png

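The roles involved are:

  1. provider: the service provider;
  2. invoker: the service consumer;
  3. serializer: the serialization module;
  4. remoting: the service communication module;
  5. registry: the service registry;
  6. admin: the service governance and monitoring center, which manages service node information and tracks call counts, QPS and health.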

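XXL-RPC uses NIO for low-level communication. NIO is an asynchronous communication model, so the calling thread does not block to wait for the result. XXL-RPC therefore implements synchronous calls on top of the asynchronous model ("sync-over-async"). It works as follows (refer to the diagram above):

  1. Each request generates a unique RequestId and an RpcResponse, which are held in a request pool.
  2. The calling thread invokes the RpcResponse's get method and blocks until the result of this request is available.
  3. The call is then sent through NIO. The provider responds asynchronously; the RpcResponse for this call is looked up by RequestId, the response is set on it, and the calling thread is woken up.
  4. The calling thread wakes up and returns the data from the asynchronous response.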

We have seen this async-to-sync approach many times in dubbo; it corresponds to the Guarded Suspension pattern.
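
The four steps can be illustrated with a small Guarded Suspension sketch. It is not the XXL-RPC implementation: ResponseFuture, SyncOverAsyncSketch, invoke and onResponse are hypothetical names, and a plain thread stands in for the NIO transport and the remote provider.

```java
import java.util.Map;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

final class SyncOverAsyncSketch {

    /** Guarded object: get() blocks until setResponse() supplies the result. */
    static final class ResponseFuture {
        private Object response;
        private boolean done;

        synchronized void setResponse(Object value) {
            response = value;
            done = true;
            notifyAll(); // wake up the calling thread
        }

        synchronized Object get(long timeoutMillis) {
            long deadline = System.currentTimeMillis() + timeoutMillis;
            try {
                while (!done) { // guard condition, re-checked after every wake-up
                    long remaining = deadline - System.currentTimeMillis();
                    if (remaining <= 0) {
                        throw new IllegalStateException("no response within " + timeoutMillis + "ms");
                    }
                    wait(remaining);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while waiting for the response", e);
            }
            return response;
        }
    }

    // Request pool: requestId -> future of the pending call
    private final Map<String, ResponseFuture> pending = new ConcurrentHashMap<>();

    /** Called by the I/O side when a response arrives. */
    void onResponse(String requestId, Object value) {
        ResponseFuture future = pending.remove(requestId);
        if (future != null) {
            future.setResponse(value);
        }
    }

    /** Synchronous call on top of an asynchronous transport (simulated by a thread here). */
    Object invoke(String payload, long timeoutMillis) {
        String requestId = UUID.randomUUID().toString();
        ResponseFuture future = new ResponseFuture();
        pending.put(requestId, future);
        try {
            // Simulated asynchronous provider: replies from another thread
            new Thread(() -> onResponse(requestId, "echo:" + payload)).start();
            return future.get(timeoutMillis);
        } finally {
            pending.remove(requestId); // clean up on timeout as well
        }
    }

    public static void main(String[] args) {
        System.out.println(new SyncOverAsyncSketch().invoke("ping", 1000));
    }
}
```

The while loop around wait() is the heart of the pattern: the caller only proceeds once the guard condition (done) holds, no matter how often it is woken up.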

Restful

With the RESTful approach, Xxl-job simply sends an HTTP request to invoke the executor (the executor listens for requests with an embedded netty server):

public static ReturnT postBody(String url, String accessToken, int timeout, Object requestObj, Class returnTargClassOfT) {
    HttpURLConnection connection = null;
    BufferedReader bufferedReader = null;
    try {
        // connection
        URL realUrl = new URL(url);
        connection = (HttpURLConnection) realUrl.openConnection();

        // trust-https
        boolean useHttps = url.startsWith("https");
        if (useHttps) {
            HttpsURLConnection https = (HttpsURLConnection) connection;
            trustAllHosts(https);
        }

        // connection setting
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setDoInput(true);
        connection.setUseCaches(false);
        connection.setReadTimeout(timeout * 1000);
        connection.setConnectTimeout(3 * 1000);
        connection.setRequestProperty("connection", "Keep-Alive");
        connection.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
        connection.setRequestProperty("Accept-Charset", "application/json;charset=UTF-8");

        if(accessToken!=null && accessToken.trim().length()>0){
            connection.setRequestProperty(XXL_JOB_ACCESS_TOKEN, accessToken);
        }

        // do connection
        connection.connect();

        // write requestBody
        if (requestObj != null) {
            String requestBody = GsonTool.toJson(requestObj);

            DataOutputStream dataOutputStream = new DataOutputStream(connection.getOutputStream());
            dataOutputStream.write(requestBody.getBytes("UTF-8"));
            dataOutputStream.flush();
            dataOutputStream.close();
        }

        /*byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
        connection.setRequestProperty("Content-Length", String.valueOf(requestBodyBytes.length));
        OutputStream outwritestream = connection.getOutputStream();
        outwritestream.write(requestBodyBytes);
        outwritestream.flush();
        outwritestream.close();*/

        // valid StatusCode
        int statusCode = connection.getResponseCode();
        if (statusCode != 200) {
            return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting fail, StatusCode("+ statusCode +") invalid. for url : " + url);
        }

        // result
        bufferedReader = new BufferedReader(new InputStreamReader(connection.getInputStream(), "UTF-8"));
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            result.append(line);
        }
        String resultJson = result.toString();

        // parse returnT
        try {
            ReturnT returnT = GsonTool.fromJson(resultJson, ReturnT.class, returnTargClassOfT);
            return returnT;
        } catch (Exception e) {
            logger.error("xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").", e);
            return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").");
        }

    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting error("+ e.getMessage() +"), for url : " + url);
    } finally {
        try {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
            if (connection != null) {
                connection.disconnect();
            }
        } catch (Exception e2) {
            logger.error(e2.getMessage(), e2);
        }
    }
}

The code above is straightforward: it builds the HTTP request, writes the request body, and reads the response.

Closing thought: a journey of a thousand miles is built from single steps!