Distributed Task Scheduling Framework XXL-JOB (Part 3): How the XXL-JOB Scheduling Center Schedules Tasks (Principles)

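The previous article covered how xxl-job registration works and how the scheduling center manages registry information. This article explains how the scheduling center schedules jobs and how it calls the job interfaces exposed by the executors.

1. How the Scheduling Center Schedules Jobs

1.1 Trigger Types

How does the scheduling center trigger a job? xxl-job defines several trigger sources:

  • Manual trigger: triggered by hand from the admin console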

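  • Cron trigger: fired in the background according to the configured cron expression; for example, 0 */1 * * * ? runs once every minute
  • Failure retry: fired again when a failed trigger has to be retried
  • Parent-job trigger: when jobs are linked as parent and child, the child jobs are triggered after the parent job finishes successfully. The child job IDs are configured on the job in the admin console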

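  • API trigger: a job can also be triggered through an API
Description: trigger one job execution
------
Address format: {root address of the executor's embedded server}/run
Header: XXL-JOB-ACCESS-TOKEN : {access token}
The request data goes in the RequestBody as JSON, in the following format: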
    {
        "jobId":1,                                  // job ID
        "executorHandler":"demoJobHandler",         // job handler name
        "executorParams":"demoJobHandler",          // job parameters
        "executorBlockStrategy":"COVER_EARLY",      // block strategy; allowed values: see com.xxl.job.core.enums.ExecutorBlockStrategyEnum
        "executorTimeout":0,                        // job timeout in seconds; takes effect when greater than 0
        "logId":1,                                  // log ID of this trigger
        "logDateTime":1586629003729,                // log time of this trigger
        "glueType":"BEAN",                          // job mode; allowed values: see com.xxl.job.core.glue.GlueTypeEnum
        "glueSource":"xxx",                         // GLUE script source
        "glueUpdatetime":1586629003727,             // GLUE script update time, used to detect whether the script changed and must be reloaded
        "broadcastIndex":0,                         // sharding: index of the current shard
        "broadcastTotal":0                          // sharding: total number of shards
    }
Response format:
    {
      "code": 200,      // 200 means success; any other value means failure
      "msg": null       // error message
    }
  • Misfire trigger: fired to make up for a scheduled run that was missed (a misfire)
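For illustration, a request to the executor /run endpoint described above could be built with the JDK's java.net.http client (Java 11 or later). This is a sketch under assumptions: the executor address http://127.0.0.1:9999, the token value and the trimmed-down JSON body are made-up examples, and the request is only built here, not sent.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class RunRequestSketch {

    // Builds (but does not send) a POST to {executorAddress}/run carrying the access-token header.
    public static HttpRequest buildRunRequest(String executorAddress, String accessToken, String jsonBody) {
        // Normalize the root address so exactly one '/' separates it from "run".
        String base = executorAddress.endsWith("/") ? executorAddress : executorAddress + "/";
        return HttpRequest.newBuilder()
                .uri(URI.create(base + "run"))
                .header("XXL-JOB-ACCESS-TOKEN", accessToken)
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(jsonBody))
                .build();
    }

    public static void main(String[] args) {
        // Hypothetical address and token, and a subset of the body fields listed above.
        String body = "{\"jobId\":1,\"executorHandler\":\"demoJobHandler\","
                + "\"executorBlockStrategy\":\"COVER_EARLY\",\"glueType\":\"BEAN\","
                + "\"logId\":1,\"logDateTime\":1586629003729}";
        HttpRequest request = buildRunRequest("http://127.0.0.1:9999", "default_token", body);
        System.out.println(request.method() + " " + request.uri()); // POST http://127.0.0.1:9999/run
        // Sending it would use HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()).
    }
}
```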

1.2 The Time Wheel

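Since the goal here is to understand the trigger mechanism, it is enough to follow the normal cron-expression path; the other trigger paths are special cases and are not discussed further.

Older versions triggered jobs with Quartz; since V2.1 this has been replaced by a time-wheel scheme. The change simplifies the system by removing redundant complexity, and it also improves stability.

So how does the time-wheel algorithm work, and why was Quartz replaced by it? With these two questions in mind, let us first look at the time wheel's data structure: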

Figure: time-wheel data structure (image reproduced from the article "xxl-job任务触发流程", i.e. "xxl-job task trigger flow")

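In xxl-job's Java implementation, the time wheel is a Map<Integer, List<Integer>>: the key is a second of the minute (0-59), and the value is the list of IDs of the jobs to trigger at that second.

During each scan, a job is added to the list of the slot that matches its next trigger second. When the clock reaches that slot, the jobs are taken out of the list and triggered; the triggers run on thread pools, which isolates their resource usage.

The idea behind the time wheel is to load the jobs that are about to run from MySQL into memory ahead of time, and then trigger them straight from memory when their time arrives. This avoids scheduling delays when there are many jobs.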
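A minimal sketch of the push side of such a wheel, assuming a single writer thread (in xxl-job only the scan thread pushes). MiniTimeWheel and its methods are hypothetical names; the slot formula (triggerTimeMs / 1000) % 60 is the same one JobScheduleHelper uses.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class MiniTimeWheel {

    // key: second of the minute (0-59); value: IDs of the jobs to fire in that second.
    // The inner ArrayList is not thread-safe, so this assumes one pushing thread.
    private final Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    // Slot of an epoch-millisecond trigger time: its second of the minute.
    public static int ringSecond(long triggerTimeMs) {
        return (int) ((triggerTimeMs / 1000) % 60);
    }

    // Pre-load a job into the slot of its next trigger time.
    public void push(long triggerTimeMs, int jobId) {
        ringData.computeIfAbsent(ringSecond(triggerTimeMs), k -> new ArrayList<>()).add(jobId);
    }

    // Read-only view of one slot (empty list if nothing is scheduled there).
    public List<Integer> slot(int second) {
        return ringData.getOrDefault(second, List.of());
    }

    public static void main(String[] args) {
        MiniTimeWheel wheel = new MiniTimeWheel();
        wheel.push(62_000L, 7);   // 62 s after the epoch -> slot 2
        wheel.push(122_000L, 8);  // 122 s after the epoch -> also slot 2, one minute later
        System.out.println(wheel.slot(2)); // [7, 8]
    }
}
```

Because a slot only records the second of the minute, jobs 60 seconds apart land in the same slot, which is one reason only jobs due within the next few seconds are pre-loaded.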

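  • Entering the wheel: during a scan, a job is pushed into the wheel when

    1. it has just been triggered in this scan and its next trigger time is within 5 seconds. For example, if a job is triggered at 10:55:01 and its next trigger time is 10:55:02, it is put into the list of slot 2.

    2. its trigger time has not been reached yet, but falls within the next 5 seconds.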

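  • Leaving the wheel: once per second, the ring thread reads the current second, removes the job-ID lists of the current slot and of the slot one second earlier, and triggers those jobs. The earlier slot is checked in case the previous round took so long that a tick was skipped.

    1. For example, at 10:55:05 the jobs are taken from slots 05 and 04 and triggered.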
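The exit rule can be sketched on its own. RingDrainSketch and drain are hypothetical names; the loop uses the same (nowSecond + 60 - i) % 60 expression as ringThread, so at second 0 it also drains slot 59.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class RingDrainSketch {

    // Remove and return the jobs of the current slot and of the slot before it.
    // The extra slot covers a tick that was skipped because the previous round ran long.
    public static List<Integer> drain(Map<Integer, List<Integer>> ringData, int nowSecond) {
        List<Integer> due = new ArrayList<>();
        for (int i = 0; i < 2; i++) {
            List<Integer> jobs = ringData.remove((nowSecond + 60 - i) % 60);
            if (jobs != null) {
                due.addAll(jobs);
            }
        }
        return due;
    }

    public static void main(String[] args) {
        Map<Integer, List<Integer>> ring = new ConcurrentHashMap<>();
        ring.put(3, new ArrayList<>(List.of(30)));
        ring.put(4, new ArrayList<>(List.of(40)));
        ring.put(5, new ArrayList<>(List.of(50)));
        // At 10:55:05, slots 05 and 04 are drained; slot 03 is left untouched.
        System.out.println(drain(ring, 5));   // [50, 40]
        System.out.println(ring.keySet());    // [3]
        // Wrap-around: at second 0, slots 0 and 59 are drained.
        ring.put(59, new ArrayList<>(List.of(590)));
        System.out.println(drain(ring, 0));   // [590]
    }
}
```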

1.3 JobScheduleHelper

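JobScheduleHelper runs the scheduling threads: the scheduleThread (scan thread) and the ringThread (time-wheel thread). It also holds the time wheel's storage, Map<Integer, List<Integer>> ringData. Every job stores two trigger times, TriggerNextTime and TriggerLastTime, and both are updated each time the job is triggered.

The scheduleThread does the following:

  1. Get a database connection and turn off auto-commit, so the transaction must be committed manually.
  2. Take a pessimistic database lock with select ... for update, so that only one scheduling center processes the scan at a time, which keeps the data consistent. The lock is held until the transaction is committed at the end of the scan.
  3. Read from the database the jobs whose trigger time is earlier than now + 5s (at most preReadCount of them). These jobs fall into the following three intervals: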

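(1) TriggerNextTime + 5s < now: these jobs missed their previous schedule, for example because the scheduler failed to dispatch them, no thread was available to run them, or jobs were executed serially. This situation is called a misfire.

(2) TriggerNextTime + 5s >= now and TriggerNextTime < now: these jobs are triggered immediately.

(3) now <= TriggerNextTime < now + 5s: these jobs are about to be due. They are not triggered immediately; instead, the slot matching their next trigger time is computed and they are pushed into the time wheel.

  4. Iterate over the jobs and handle each one according to which of the three intervals its next trigger time falls in.
  5. Write each job's updated TriggerNextTime (and TriggerLastTime) back to the database.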
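The three-way split can be written as a small pure function. This is a hypothetical helper for illustration (ScanClassifier and its Action enum are not xxl-job types); it reproduces the comparisons the scan loop makes with PRE_READ_MS = 5000, assuming the job was returned by the scan and is therefore due within that window.

```java
public class ScanClassifier {

    public static final long PRE_READ_MS = 5000;

    public enum Action { MISFIRE, TRIGGER_NOW, PUSH_TO_RING }

    // Assumes the job came from the scan, i.e. it is due within PRE_READ_MS.
    public static Action classify(long nowMs, long triggerNextTimeMs) {
        if (nowMs > triggerNextTimeMs + PRE_READ_MS) {
            return Action.MISFIRE;        // more than 5 s overdue: apply the misfire strategy
        } else if (nowMs > triggerNextTimeMs) {
            return Action.TRIGGER_NOW;    // overdue by at most 5 s: fire immediately
        } else {
            return Action.PUSH_TO_RING;   // due within the next 5 s: pre-load into the wheel
        }
    }

    public static void main(String[] args) {
        long now = 1_000_000L;
        System.out.println(classify(now, now - 6_000)); // MISFIRE
        System.out.println(classify(now, now - 1_000)); // TRIGGER_NOW
        System.out.println(classify(now, now + 2_000)); // PUSH_TO_RING
    }
}
```

In the TRIGGER_NOW case the real loop also recomputes the next trigger time and, if that is again within 5 seconds, pushes the job into the wheel as well.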
/**
 * @author xuxueli 2019-05-21
 */
public class JobScheduleHelper {
    private static Logger logger = LoggerFactory.getLogger(JobScheduleHelper.class);

    private static JobScheduleHelper instance = new JobScheduleHelper();
    public static JobScheduleHelper getInstance(){
        return instance;
    }

    public static final long PRE_READ_MS = 5000;    // pre read

    // schedule (scan) thread
    private Thread scheduleThread;
    // time-ring thread
    private Thread ringThread;
    private volatile boolean scheduleThreadToStop = false;
    private volatile boolean ringThreadToStop = false;
    // time-ring data structure: second of the minute (0-59) -> job IDs
    private volatile static Map<Integer, List<Integer>> ringData = new ConcurrentHashMap<>();

    public void start(){

        // schedule thread
        scheduleThread = new Thread(new Runnable() {
            @Override
            public void run() {

                try {
                    TimeUnit.MILLISECONDS.sleep(5000 - System.currentTimeMillis()%1000 );
                } catch (InterruptedException e) {
                    if (!scheduleThreadToStop) {
                        logger.error(e.getMessage(), e);
                    }
                }
                logger.info(">>>>>>>>> init xxl-job admin scheduler success.");

                // pre-read count = total trigger-pool threads * trigger QPS per thread
                // pre-read count: treadpool-size * trigger-qps (each trigger cost 50ms, qps = 1000/50 = 20)
                int preReadCount = (XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax() + XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax()) * 20;

                while (!scheduleThreadToStop) {

                    // Scan Job
                    long start = System.currentTimeMillis();

                    Connection conn = null;
                    Boolean connAutoCommit = null;
                    PreparedStatement preparedStatement = null;

                    boolean preReadSuc = true;
                    try {

                        // 1. get a database connection
                        conn = XxlJobAdminConfig.getAdminConfig().getDataSource().getConnection();
                        connAutoCommit = conn.getAutoCommit();
                        // 2. turn off auto-commit; the transaction is committed manually
                        conn.setAutoCommit(false);

                        // 3. take a MySQL pessimistic lock with select ... for update to keep the data consistent
                        preparedStatement = conn.prepareStatement(  "select * from xxl_job_lock where lock_name = 'schedule_lock' for update" );
                        preparedStatement.execute();

                        // 4. transaction starts
                        // tx start

                        // 5. read the jobs due within the next 5 s, at most preReadCount of them
                        long nowTime = System.currentTimeMillis();
                        List<XxlJobInfo> scheduleList = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleJobQuery(nowTime + PRE_READ_MS, preReadCount);
                        if (scheduleList!=null && scheduleList.size()>0) {
                            // 6. push jobs into the time ring
                            for (XxlJobInfo jobInfo: scheduleList) {

                                // 6.1 trigger time is more than 5 s in the past (earlier than now - 5 s): the job missed its schedule and is handled by its misfire strategy
                                // time-ring jump
                                if (nowTime > jobInfo.getTriggerNextTime() + PRE_READ_MS) {
                                    // 2.1、trigger-expire > 5s:pass && make next-trigger-time
                                    logger.warn(">>>>>>>>>>> xxl-job, schedule misfire, jobId = " + jobInfo.getId());

                                    // 1、misfire match
                                    MisfireStrategyEnum misfireStrategyEnum = MisfireStrategyEnum.match(jobInfo.getMisfireStrategy(), MisfireStrategyEnum.DO_NOTHING);
                                    if (MisfireStrategyEnum.FIRE_ONCE_NOW == misfireStrategyEnum) {
                                        // FIRE_ONCE_NOW 》 trigger
                                        JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.MISFIRE, -1, null, null, null);
                                        logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );
                                    }

                                    // 2、fresh next
                                    refreshNextValidTime(jobInfo, new Date());

                                // 6.2 trigger time is within the last 5 s: fire it now; if the next trigger is again within 5 s, push it into the matching ring slot
                                } else if (nowTime > jobInfo.getTriggerNextTime()) {
                                    // 2.2、trigger-expire < 5s:direct-trigger && make next-trigger-time

                                    // 1、trigger
                                    JobTriggerPoolHelper.trigger(jobInfo.getId(), TriggerTypeEnum.CRON, -1, null, null, null);
                                    logger.debug(">>>>>>>>>>> xxl-job, schedule push trigger : jobId = " + jobInfo.getId() );

                                    // 2、fresh next
                                    refreshNextValidTime(jobInfo, new Date());

                                    // next-trigger-time in 5s, pre-read again
                                    if (jobInfo.getTriggerStatus()==1 && nowTime + PRE_READ_MS > jobInfo.getTriggerNextTime()) {

                                        // 1、make ring second
                                        int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);

                                        // 2、push time ring
                                        pushTimeRing(ringSecond, jobInfo.getId());

                                        // 3、fresh next
                                        refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));

                                    }
                                // 6.3 trigger time not reached yet: push the job into the time ring
                                } else {
                                    // 2.3、trigger-pre-read:time-ring trigger && make next-trigger-time

                                    // 1、make ring second
                                    int ringSecond = (int)((jobInfo.getTriggerNextTime()/1000)%60);

                                    // 2、push time ring
                                    pushTimeRing(ringSecond, jobInfo.getId());

                                    // 3、fresh next
                                    refreshNextValidTime(jobInfo, new Date(jobInfo.getTriggerNextTime()));

                                }

                            }

                            // 7. write each job's new trigger times back to the database
                            // 3、update trigger info
                            for (XxlJobInfo jobInfo: scheduleList) {
                                XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().scheduleUpdate(jobInfo);
                            }

                        } else {
                            preReadSuc = false;
                        }

                        // tx stop


                    } catch (Exception e) {
                        if (!scheduleThreadToStop) {
                            logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread error:{}", e);
                        }
                    } finally {

                        // commit
                        if (conn != null) {
                            try {
                                conn.commit();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                            try {
                                conn.setAutoCommit(connAutoCommit);
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                            try {
                                conn.close();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                        }

                        // close PreparedStatement
                        if (null != preparedStatement) {
                            try {
                                preparedStatement.close();
                            } catch (SQLException e) {
                                if (!scheduleThreadToStop) {
                                    logger.error(e.getMessage(), e);
                                }
                            }
                        }
                    }
                    long cost = System.currentTimeMillis()-start;


                    // Wait seconds, align second
                    if (cost < 1000) {  // scan-overtime, not wait
                        try {
                            // pre-read period: success > scan each second; fail > skip this period;
                            TimeUnit.MILLISECONDS.sleep((preReadSuc?1000:PRE_READ_MS) - System.currentTimeMillis()%1000);
                        } catch (InterruptedException e) {
                            if (!scheduleThreadToStop) {
                                logger.error(e.getMessage(), e);
                            }
                        }
                    }

                }

                logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#scheduleThread stop");
            }
        });
        scheduleThread.setDaemon(true);
        scheduleThread.setName("xxl-job, admin JobScheduleHelper#scheduleThread");
        scheduleThread.start();


        // ring thread
        ringThread = new Thread(new Runnable() {
            @Override
            public void run() {

                while (!ringThreadToStop) {

                    // align second
                    try {
                        TimeUnit.MILLISECONDS.sleep(1000 - System.currentTimeMillis() % 1000);
                    } catch (InterruptedException e) {
                        if (!ringThreadToStop) {
                            logger.error(e.getMessage(), e);
                        }
                    }

                    try {
                        // second data
                        List<Integer> ringItemData = new ArrayList<>();
                        int nowSecond = Calendar.getInstance().get(Calendar.SECOND);   // in case processing took too long and a tick was skipped, also check the previous slot
                        for (int i = 0; i < 2; i++) {
                            List<Integer> tmpData = ringData.remove( (nowSecond+60-i)%60 );
                            if (tmpData != null) {
                                ringItemData.addAll(tmpData);
                            }
                        }

                        // ring trigger
                        logger.debug(">>>>>>>>>>> xxl-job, time-ring beat : " + nowSecond + " = " + Arrays.asList(ringItemData) );
                        if (ringItemData.size() > 0) {
                            // do trigger
                            for (int jobId: ringItemData) {
                                // do trigger
                                JobTriggerPoolHelper.trigger(jobId, TriggerTypeEnum.CRON, -1, null, null, null);
                            }
                            // clear
                            ringItemData.clear();
                        }
                    } catch (Exception e) {
                        if (!ringThreadToStop) {
                            logger.error(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread error:{}", e);
                        }
                    }
                }
                logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper#ringThread stop");
            }
        });
        ringThread.setDaemon(true);
        ringThread.setName("xxl-job, admin JobScheduleHelper#ringThread");
        ringThread.start();
    }

    private void refreshNextValidTime(XxlJobInfo jobInfo, Date fromTime) throws Exception {
        Date nextValidTime = generateNextValidTime(jobInfo, fromTime);
        if (nextValidTime != null) {
            jobInfo.setTriggerLastTime(jobInfo.getTriggerNextTime());
            jobInfo.setTriggerNextTime(nextValidTime.getTime());
        } else {
            jobInfo.setTriggerStatus(0);
            jobInfo.setTriggerLastTime(0);
            jobInfo.setTriggerNextTime(0);
            logger.warn(">>>>>>>>>>> xxl-job, refreshNextValidTime fail for job: jobId={}, scheduleType={}, scheduleConf={}",
                    jobInfo.getId(), jobInfo.getScheduleType(), jobInfo.getScheduleConf());
        }
    }

    private void pushTimeRing(int ringSecond, int jobId){
        // push async ring
        List<Integer> ringItemData = ringData.get(ringSecond);
        if (ringItemData == null) {
            ringItemData = new ArrayList<Integer>();
            ringData.put(ringSecond, ringItemData);
        }
        ringItemData.add(jobId);

        logger.debug(">>>>>>>>>>> xxl-job, schedule push time-ring : " + ringSecond + " = " + Arrays.asList(ringItemData) );
    }

    public void toStop(){

        // 1、stop schedule
        scheduleThreadToStop = true;
        try {
            TimeUnit.SECONDS.sleep(1);  // wait
        } catch (InterruptedException e) {
            logger.error(e.getMessage(), e);
        }
        if (scheduleThread.getState() != Thread.State.TERMINATED){
            // interrupt and wait
            scheduleThread.interrupt();
            try {
                scheduleThread.join();
            } catch (InterruptedException e) {
                logger.error(e.getMessage(), e);
            }
        }

        // if has ring data
        boolean hasRingData = false;
        if (!ringData.isEmpty()) {
            for (int second : ringData.keySet()) {
                List<Integer> tmpData = ringData.get(second);
                if (tmpData!=null && tmpData.size()>0) {
                    hasRingData = true;
                    break;
                }
            }
        }
        if (hasRingData) {
            try {
                TimeUnit.SECONDS.sleep(8);
            } catch (InterruptedException e) {
                logger.error(e.getMessage(), e);
            }
        }

        // stop ring (wait job-in-memory stop)
        ringThreadToStop = true;
        try {
            TimeUnit.SECONDS.sleep(1);
        } catch (InterruptedException e) {
            logger.error(e.getMessage(), e);
        }
        if (ringThread.getState() != Thread.State.TERMINATED){
            // interrupt and wait
            ringThread.interrupt();
            try {
                ringThread.join();
            } catch (InterruptedException e) {
                logger.error(e.getMessage(), e);
            }
        }

        logger.info(">>>>>>>>>>> xxl-job, JobScheduleHelper stop");
    }


    // ---------------------- tools ----------------------
    public static Date generateNextValidTime(XxlJobInfo jobInfo, Date fromTime) throws Exception {
        ScheduleTypeEnum scheduleTypeEnum = ScheduleTypeEnum.match(jobInfo.getScheduleType(), null);
        if (ScheduleTypeEnum.CRON == scheduleTypeEnum) {
            Date nextValidTime = new CronExpression(jobInfo.getScheduleConf()).getNextValidTimeAfter(fromTime);
            return nextValidTime;
        } else if (ScheduleTypeEnum.FIX_RATE == scheduleTypeEnum /*|| ScheduleTypeEnum.FIX_DELAY == scheduleTypeEnum*/) {
            return new Date(fromTime.getTime() + Integer.valueOf(jobInfo.getScheduleConf())*1000 );
        }
        return null;
    }

}
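Two numbers in scheduleThread are easy to miss: the pre-read batch size and how long the loop sleeps between scans. The helper below restates them as pure functions for illustration; the class and method names are hypothetical, while the formulas (fastMax + slowMax) * 20 and (preReadSuc ? 1000 : PRE_READ_MS) - now % 1000 come from the code above. The pool sizes used in main (200 and 100) are assumed values.

```java
public class ScanTimingSketch {

    public static final long PRE_READ_MS = 5000;

    // Pre-read batch size: total trigger threads * ~20 triggers per second per thread
    // (each trigger is assumed to take about 50 ms, so 1000 / 50 = 20).
    public static int preReadCount(int fastPoolMax, int slowPoolMax) {
        return (fastPoolMax + slowPoolMax) * 20;
    }

    // Sleep before the next scan, aligned to a whole second.
    // Jobs found: scan again at the next second. Nothing found: skip most of the
    // 5 s pre-read window. A scan that took 1 s or more does not wait at all.
    public static long sleepMillis(boolean preReadSuc, long scanCostMs, long nowMs) {
        if (scanCostMs >= 1000) {
            return 0;
        }
        return (preReadSuc ? 1000 : PRE_READ_MS) - nowMs % 1000;
    }

    public static void main(String[] args) {
        System.out.println(preReadCount(200, 100));           // 6000
        System.out.println(sleepMillis(true, 30, 10_250));    // 750
        System.out.println(sleepMillis(false, 30, 10_250));   // 4750
        System.out.println(sleepMillis(true, 1_200, 10_250)); // 0
    }
}
```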

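1.4 Time Wheel vs. Quartz

Quartz is one of the best open-source job schedulers and has long been the default choice for job scheduling. In a cluster, however, Quartz manages jobs through its API, which brings the following problems:

Problem 1: operating on jobs by calling an API is not user-friendly;

Problem 2: the business QuartzJobBean has to be persisted into Quartz's underlying tables, which is highly intrusive to the system.

Problem 3: the scheduling logic and the QuartzJobBean are coupled in the same project. As the number of jobs grows and the job logic gets heavier, the performance of the scheduling system becomes severely limited by the business code;

Problem 4: under the hood, Quartz acquires a DB lock "preemptively", and the node that wins the lock runs the job, which can leave node loads very unbalanced. XXL-JOB instead runs jobs through executors in a "cooperative allocation" model, which makes full use of the cluster and balances the load across nodes.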

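1.5 Summary

  • Job information is stored in the xxl_job_info table. Each job stores two trigger times: the next trigger time TriggerNextTime and the last trigger time TriggerLastTime
  • Jobs can be triggered manually, by cron schedule, by failure retry, by a parent job, through the API, or as misfire compensation
  • Scheduling is based on a time wheel: a circle with 60 slots, each holding a list of jobs; xxl-job implements it as a Map<Integer, List<Integer>>
  • The scheduling center starts a scheduleThread that keeps scanning for jobs whose next trigger time is earlier than now + 5s and handles them in three ways depending on their interval, plus a ringThread that fires the jobs in the current slot (and the one before it) every second.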

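2. How the Scheduling Center Calls the Executor

The previous section showed how the scheduling center fires jobs at the right time. Next, let us see how it calls the jobHandler interface exposed by the executor. The diagram below, drawn by another author and reproduced here, shows the flow clearly.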

Figure: xxl-job trigger flow (image reproduced from the article "xxl-job任务触发流程", i.e. "xxl-job task trigger flow")

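2.1 JobTriggerPoolHelper: the Trigger Thread Pools

JobTriggerPoolHelper defines two thread pools: fastTriggerPool for fast jobs and slowTriggerPool for slow jobs.

The main task of JobTriggerPoolHelper is to hand each trigger to the appropriate pool. This is done in addTrigger, which does the following:

  1. Because jobs take very different amounts of time to trigger, a single shared pool would allocate resources unevenly, so two pools are used: one for fast jobs and one for slow jobs.

  2. If triggering a job took longer than 500 ms more than 10 times within the current minute, the job is considered slow and is handed to the slow pool.

  3. The actual triggering is delegated to XxlJobTrigger.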

/**
 * add trigger
 */
public void addTrigger(final int jobId,
                       final TriggerTypeEnum triggerType,
                       final int failRetryCount,
                       final String executorShardingParam,
                       final String executorParam,
                       final String addressList) {

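    // 1. Two pools keep jobs with very different trigger durations from competing unevenly for resources:
    //    a job whose trigger took more than 500 ms more than 10 times within the current minute is treated as slow and goes to the slow pool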
    // choose thread pool
    ThreadPoolExecutor triggerPool_ = fastTriggerPool;
    AtomicInteger jobTimeoutCount = jobTimeoutCountMap.get(jobId);
    if (jobTimeoutCount!=null && jobTimeoutCount.get() > 10) {      // job-timeout 10 times in 1 min
        triggerPool_ = slowTriggerPool;
    }

    // trigger
    triggerPool_.execute(new Runnable() {
        @Override
        public void run() {

            long start = System.currentTimeMillis();

            try {
                // do trigger
                XxlJobTrigger.trigger(jobId, triggerType, failRetryCount, executorShardingParam, executorParam, addressList);
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
            } finally {

                // check timeout-count-map
                long minTim_now = System.currentTimeMillis()/60000;
                if (minTim != minTim_now) {
                    minTim = minTim_now;
                    jobTimeoutCountMap.clear();
                }

                // incr timeout-count-map
                long cost = System.currentTimeMillis()-start;
                if (cost > 500) {       // job-timeout threshold 500ms
                    AtomicInteger timeoutCount = jobTimeoutCountMap.putIfAbsent(jobId, new AtomicInteger(1));
                    if (timeoutCount != null) {
                        timeoutCount.incrementAndGet();
                    }
                }

            }

        }
    });
}
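The fast/slow routing rule can be isolated into a small, testable class. This is a hypothetical restatement, not xxl-job code: it keeps a per-job counter of triggers that took more than 500 ms, resets all counters when the wall-clock minute changes, and reports a job as slow once its counter exceeds 10. Time is passed in explicitly so the behavior can be checked without sleeping.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical restatement of the fast/slow pool choice in addTrigger.
public class SlowJobDetector {

    private static final long TIMEOUT_THRESHOLD_MS = 500; // a trigger slower than this counts as a timeout
    private static final int SLOW_AFTER_TIMEOUTS = 10;    // more timeouts than this -> slow pool

    private final ConcurrentMap<Integer, AtomicInteger> timeoutCounts = new ConcurrentHashMap<>();
    private volatile long currentMinute = -1;

    // Record how long one trigger of jobId took, observed at wall-clock time nowMs.
    // Like the original, the minute reset is best-effort rather than strictly synchronized.
    public void record(int jobId, long costMs, long nowMs) {
        long minute = nowMs / 60_000;
        if (minute != currentMinute) {   // a new minute started: forget all counts
            currentMinute = minute;
            timeoutCounts.clear();
        }
        if (costMs > TIMEOUT_THRESHOLD_MS) {
            timeoutCounts.computeIfAbsent(jobId, id -> new AtomicInteger()).incrementAndGet();
        }
    }

    // true if the next trigger of jobId should go to the slow pool
    public boolean isSlow(int jobId) {
        AtomicInteger count = timeoutCounts.get(jobId);
        return count != null && count.get() > SLOW_AFTER_TIMEOUTS;
    }

    public static void main(String[] args) {
        SlowJobDetector detector = new SlowJobDetector();
        for (int i = 0; i < 11; i++) {
            detector.record(1, 800, 60_000 + i);   // 11 slow triggers within minute 1
        }
        detector.record(2, 100, 60_500);           // one fast trigger for job 2
        System.out.println(detector.isSlow(1));    // true
        System.out.println(detector.isSlow(2));    // false
        detector.record(3, 100, 120_000);          // first record in minute 2 resets the counters
        System.out.println(detector.isSlow(1));    // false
    }
}
```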

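2.2 XxlJobTrigger: Triggering the Job

XxlJobTrigger contains the actual triggering logic. Its trigger method does the following:

  1. Load the job information from the database
  2. Set the failure retry count and the executor address list; passed-in arguments take precedence over the job configuration
  3. Parse the sharding parameter (format index/total)
  4. For a broadcast (sharding) job, trigger every registered executor in turn
  5. Otherwise, trigger once through processTrigger with the sharding parameters (0/1 by default)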
/**
 * trigger job
 *
 * @param jobId
 * @param triggerType
 * @param failRetryCount
 *           >=0: use this param
 *           <0: use param from job info config
 * @param executorShardingParam
 * @param executorParam
 *          null: use job param
 *          not null: cover job param
 * @param addressList
 *          null: use executor addressList
 *          not null: cover
 */
public static void trigger(int jobId,
                           TriggerTypeEnum triggerType,
                           int failRetryCount,
                           String executorShardingParam,
                           String executorParam,
                           String addressList) {

    // 1. load the job information from the database
    // load data
    XxlJobInfo jobInfo = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(jobId);
    if (jobInfo == null) {
        logger.warn(">>>>>>>>>>>> trigger fail, jobId invalid,jobId={}", jobId);
        return;
    }
    if (executorParam != null) {
        jobInfo.setExecutorParam(executorParam);
    }
    // 2. failure retry count and executor address list: passed-in arguments take precedence
    int finalFailRetryCount = failRetryCount>=0?failRetryCount:jobInfo.getExecutorFailRetryCount();
    XxlJobGroup group = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().load(jobInfo.getJobGroup());

    // cover addressList
    if (addressList!=null && addressList.trim().length()>0) {
        group.setAddressType(1);
        group.setAddressList(addressList.trim());
    }

    // 3. parse the sharding parameter
    // sharding param
    int[] shardingParam = null;
    if (executorShardingParam!=null){
        String[] shardingArr = executorShardingParam.split("/");
        if (shardingArr.length==2 && isNumeric(shardingArr[0]) && isNumeric(shardingArr[1])) {
            shardingParam = new int[2];
            shardingParam[0] = Integer.valueOf(shardingArr[0]);
            shardingParam[1] = Integer.valueOf(shardingArr[1]);
        }
    }
    // 4. broadcast job: trigger every registered executor
    if (ExecutorRouteStrategyEnum.SHARDING_BROADCAST==ExecutorRouteStrategyEnum.match(jobInfo.getExecutorRouteStrategy(), null)
            && group.getRegistryList()!=null && !group.getRegistryList().isEmpty()
            && shardingParam==null) {
        for (int i = 0; i < group.getRegistryList().size(); i++) {
            processTrigger(group, jobInfo, finalFailRetryCount, triggerType, i, group.getRegistryList().size());
        }
    } else {
        if (shardingParam == null) {
            shardingParam = new int[]{0, 1};
        }
        // 5. any other route strategy: trigger once
        processTrigger(group, jobInfo, finalFailRetryCount, triggerType, shardingParam[0], shardingParam[1]);
    }

}
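The sharding branch in trigger can be sketched independently. A parameter such as 1/3 means shard index 1 of 3 shards; a broadcast job without an explicit parameter is fired once per registered executor with index i and total size, and everything else is fired once (0/1 by default). ShardingSketch and its methods are hypothetical, and the number check here simply relies on Integer.parseInt rather than xxl-job's own isNumeric helper.

```java
import java.util.ArrayList;
import java.util.List;

public class ShardingSketch {

    // Parse "index/total"; returns null when the string is absent or malformed.
    public static int[] parseSharding(String param) {
        if (param == null) {
            return null;
        }
        String[] parts = param.split("/");
        if (parts.length != 2) {
            return null;
        }
        try {
            return new int[]{Integer.parseInt(parts[0]), Integer.parseInt(parts[1])};
        } catch (NumberFormatException e) {
            return null;
        }
    }

    // (index, total) pairs to fire: one per executor for a broadcast job without an
    // explicit parameter, otherwise a single pair (defaulting to 0/1).
    public static List<int[]> plan(boolean broadcast, List<String> registryList, String shardingParam) {
        int[] sharding = parseSharding(shardingParam);
        List<int[]> result = new ArrayList<>();
        if (broadcast && registryList != null && !registryList.isEmpty() && sharding == null) {
            for (int i = 0; i < registryList.size(); i++) {
                result.add(new int[]{i, registryList.size()});
            }
        } else {
            result.add(sharding != null ? sharding : new int[]{0, 1});
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> executors = List.of("http://10.0.0.1:9999/", "http://10.0.0.2:9999/");
        for (int[] p : plan(true, executors, null)) {
            System.out.println(p[0] + "/" + p[1]);   // 0/2, then 1/2
        }
        System.out.println(plan(false, executors, null).get(0)[1]); // 1
    }
}
```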

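processTrigger does the real work. Its main steps are:

  1. Save a trigger log record and get it back
  2. Initialize the trigger parameters
  3. Pick one machine of the executor group according to the route strategy
  4. Trigger the remote executor through an RPC call
  5. Collect the returned result and record it in the log

runExecutor sends the assembled parameters to the executor over RPC through an executor proxy, ExecutorBiz.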

/**
 * run executor
 * @param triggerParam
 * @param address
 * @return
 */
public static ReturnT<String> runExecutor(TriggerParam triggerParam, String address){
    ReturnT<String> runResult = null;
    try {
        // 1. proxy for the remote executor
        ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
        // 2. call the remote executor's run method
        runResult = executorBiz.run(triggerParam);
    } catch (Exception e) {
        logger.error(">>>>>>>>>>> xxl-job trigger error, please check if the executor[{}] is running.", address, e);
        runResult = new ReturnT<String>(ReturnT.FAIL_CODE, ThrowableUtil.toString(e));
    }

    StringBuffer runResultSB = new StringBuffer(I18nUtil.getString("jobconf_trigger_run") + ":");
    runResultSB.append("<br>address:").append(address);
    runResultSB.append("<br>code:").append(runResult.getCode());
    runResultSB.append("<br>msg:").append(runResult.getMsg());

    runResult.setMsg(runResultSB.toString());
    return runResult;
}

2.3 ExecutorBizClient: the Executor Proxy

com.xxl.job.admin.core.scheduler.XxlJobScheduler#getExecutorBiz lazily wraps each remote executor address in an ExecutorBizClient and caches it in a ConcurrentMap.

private static ConcurrentMap<String, ExecutorBiz> executorBizRepository = new ConcurrentHashMap<String, ExecutorBiz>();
public static ExecutorBiz getExecutorBiz(String address) throws Exception {
    // valid
    if (address==null || address.trim().length()==0) {
        return null;
    }

    // load-cache
    address = address.trim();
    ExecutorBiz executorBiz = executorBizRepository.get(address);
    if (executorBiz != null) {
        return executorBiz;
    }

    // set-cache
    executorBiz = new ExecutorBizClient(address, XxlJobAdminConfig.getAdminConfig().getAccessToken());

    executorBizRepository.put(address, executorBiz);
    return executorBiz;
}
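The get-then-put sequence above is not atomic: two threads asking for the same new address at the same time may each build a client, and the later put simply overwrites the earlier one. That is harmless here because a client only holds the address and token, but a sketch using ConcurrentHashMap.computeIfAbsent creates at most one client per address. ClientCache and FakeClient are hypothetical stand-ins for this cache and for ExecutorBizClient.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

public class ClientCache {

    // Hypothetical stand-in for ExecutorBizClient: only remembers where it points.
    public static final class FakeClient {
        public final String addressUrl;
        public final String accessToken;

        FakeClient(String address, String accessToken) {
            // Same normalization as ExecutorBizClient: ensure a trailing '/'.
            this.addressUrl = address.endsWith("/") ? address : address + "/";
            this.accessToken = accessToken;
        }
    }

    private final ConcurrentMap<String, FakeClient> repository = new ConcurrentHashMap<>();
    private final String accessToken;

    public ClientCache(String accessToken) {
        this.accessToken = accessToken;
    }

    // null for blank addresses; otherwise one shared client per trimmed address.
    // computeIfAbsent runs the factory at most once per key, atomically.
    public FakeClient get(String address) {
        if (address == null || address.trim().isEmpty()) {
            return null;
        }
        return repository.computeIfAbsent(address.trim(), a -> new FakeClient(a, accessToken));
    }

    public static void main(String[] args) {
        ClientCache cache = new ClientCache("token");
        FakeClient a = cache.get("http://10.0.0.1:9999");
        FakeClient b = cache.get("  http://10.0.0.1:9999  ");
        System.out.println(a == b);        // true
        System.out.println(a.addressUrl);  // http://10.0.0.1:9999/
    }
}
```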

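ExecutorBizClient implements run by sending an HTTP POST, with the TriggerParam as the request body, to the executor's "run" endpoint via XxlJobRemotingUtil.postBody; on the executor side, the request is handled by ExecutorBizImpl.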

/**
 * admin api test
 *
 * @author xuxueli 2017-07-28 22:14:52
 */
public class ExecutorBizClient implements ExecutorBiz {

    public ExecutorBizClient() {
    }
    public ExecutorBizClient(String addressUrl, String accessToken) {
        this.addressUrl = addressUrl;
        this.accessToken = accessToken;

        // valid
        if (!this.addressUrl.endsWith("/")) {
            this.addressUrl = this.addressUrl + "/";
        }
    }

    private String addressUrl ;
    private String accessToken;
    private int timeout = 3;


    @Override
    public ReturnT<String> beat() {
        return XxlJobRemotingUtil.postBody(addressUrl+"beat", accessToken, timeout, "", String.class);
    }

    @Override
    public ReturnT<String> idleBeat(IdleBeatParam idleBeatParam){
        return XxlJobRemotingUtil.postBody(addressUrl+"idleBeat", accessToken, timeout, idleBeatParam, String.class);
    }

    @Override
    public ReturnT<String> run(TriggerParam triggerParam) {
        return XxlJobRemotingUtil.postBody(addressUrl + "run", accessToken, timeout, triggerParam, String.class);
    }

    @Override
    public ReturnT<String> kill(KillParam killParam) {
        return XxlJobRemotingUtil.postBody(addressUrl + "kill", accessToken, timeout, killParam, String.class);
    }

    @Override
    public ReturnT<LogResult> log(LogParam logParam) {
        return XxlJobRemotingUtil.postBody(addressUrl + "log", accessToken, timeout, logParam, LogResult.class);
    }

}

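2.4 ExecutorBizImpl: Handling the RPC on the Executor

When the executor's embedded server starts, it creates an ExecutorBizImpl instance. ExecutorBizImpl is the executor-side handler for RPC calls from the scheduling center: it receives each request and invokes the interface method that matches the request URI.

In com.xxl.job.core.server.EmbedServer.EmbedHttpServerHandler#process, executorBiz is an instance of ExecutorBizImpl: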

private Object process(HttpMethod httpMethod, String uri, String requestData, String accessTokenReq) {

    // valid
    if (HttpMethod.POST != httpMethod) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, HttpMethod not support.");
    }
    if (uri==null || uri.trim().length()==0) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping empty.");
    }
    if (accessToken!=null
            && accessToken.trim().length()>0
            && !accessToken.equals(accessTokenReq)) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "The access token is wrong.");
    }

    // services mapping
    try {
        if ("/beat".equals(uri)) {
            return executorBiz.beat();
        } else if ("/idleBeat".equals(uri)) {
            IdleBeatParam idleBeatParam = GsonTool.fromJson(requestData, IdleBeatParam.class);
            return executorBiz.idleBeat(idleBeatParam);
        } else if ("/run".equals(uri)) {
            TriggerParam triggerParam = GsonTool.fromJson(requestData, TriggerParam.class);
            return executorBiz.run(triggerParam);
        } else if ("/kill".equals(uri)) {
            KillParam killParam = GsonTool.fromJson(requestData, KillParam.class);
            return executorBiz.kill(killParam);
        } else if ("/log".equals(uri)) {
            LogParam logParam = GsonTool.fromJson(requestData, LogParam.class);
            return executorBiz.log(logParam);
        } else {
            return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping("+ uri +") not found.");
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        return new ReturnT<String>(ReturnT.FAIL_CODE, "request error:" + ThrowableUtil.toString(e));
    }
}
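The if/else chain in process is a path-based dispatcher guarded by an optional access token. A compact sketch of the same idea with a lookup table follows; MiniDispatcher, the Function<String, String> handlers and the plain-string results are assumptions made to keep the example self-contained (the real code returns ReturnT objects and deserializes the body with GsonTool).

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

public class MiniDispatcher {

    private final String accessToken;   // null or blank disables the token check
    private final Map<String, Function<String, String>> routes = new HashMap<>();

    public MiniDispatcher(String accessToken) {
        this.accessToken = accessToken;
    }

    public MiniDispatcher route(String uri, Function<String, String> handler) {
        routes.put(uri, handler);
        return this;
    }

    // Same validation order as process: HTTP method, uri, token, then the route lookup.
    public String process(String httpMethod, String uri, String body, String tokenFromRequest) {
        if (!"POST".equals(httpMethod)) {
            return "FAIL: invalid request, HttpMethod not support.";
        }
        if (uri == null || uri.trim().isEmpty()) {
            return "FAIL: invalid request, uri-mapping empty.";
        }
        if (accessToken != null && !accessToken.trim().isEmpty()
                && !accessToken.equals(tokenFromRequest)) {
            return "FAIL: The access token is wrong.";
        }
        Function<String, String> handler = routes.get(uri);
        if (handler == null) {
            return "FAIL: invalid request, uri-mapping(" + uri + ") not found.";
        }
        return handler.apply(body);
    }

    public static void main(String[] args) {
        MiniDispatcher d = new MiniDispatcher("secret")
                .route("/beat", body -> "SUCCESS")
                .route("/run", body -> "SUCCESS: queued " + body);
        System.out.println(d.process("POST", "/run", "{\"jobId\":1}", "secret"));
        System.out.println(d.process("POST", "/run", "{}", "wrong"));
    }
}
```

A table keeps each route in one place, at the cost of a uniform handler signature; the original's explicit branches let each endpoint deserialize its own parameter type.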

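The run method of ExecutorBizImpl mainly does the following:

  1. Look up the job's existing JobThread via XxlJobExecutor.loadJobThread, together with its jobHandler, and check that they still match the requested handler or glue type
  2. Apply the block strategy
  3. Put the trigger into the JobThread's queue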
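Steps 2 and 3 above can be illustrated with a simplified per-job worker that owns a queue. This is a hypothetical sketch, not xxl-job's JobThread: the strategy names SERIAL_EXECUTION, DISCARD_LATER and COVER_EARLY match ExecutorBlockStrategyEnum, but the Decision type, the Worker class and its running flag are inventions for illustration. Serial execution just queues the trigger, DISCARD_LATER rejects it while the worker is running or has queued work, and COVER_EARLY discards the busy worker (the real code kills the old thread) before queueing.

```java
import java.util.concurrent.LinkedBlockingQueue;

public class BlockStrategySketch {

    public enum Strategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    public enum Decision { QUEUED, REJECTED, REPLACED_AND_QUEUED }

    // Hypothetical per-job worker: a queue of pending log IDs plus a running flag.
    static final class Worker {
        final LinkedBlockingQueue<Long> queue = new LinkedBlockingQueue<>();
        volatile boolean running; // a real worker thread would set this while a job runs (none is started here)

        boolean isRunningOrHasQueue() {
            return running || !queue.isEmpty();
        }
    }

    private Worker worker = new Worker();

    public Decision submit(Strategy strategy, long logId) {
        Decision decision = Decision.QUEUED;
        if (strategy == Strategy.DISCARD_LATER && worker.isRunningOrHasQueue()) {
            return Decision.REJECTED;            // keep the current work, drop this trigger
        }
        if (strategy == Strategy.COVER_EARLY && worker.isRunningOrHasQueue()) {
            worker = new Worker();               // stand-in for killing the old thread
            decision = Decision.REPLACED_AND_QUEUED;
        }
        worker.queue.offer(logId);               // step 3: hand the trigger to the worker's queue
        return decision;
    }

    public int pending() {
        return worker.queue.size();
    }

    public static void main(String[] args) {
        BlockStrategySketch s = new BlockStrategySketch();
        System.out.println(s.submit(Strategy.SERIAL_EXECUTION, 1)); // QUEUED
        System.out.println(s.submit(Strategy.DISCARD_LATER, 2));    // REJECTED
        System.out.println(s.submit(Strategy.SERIAL_EXECUTION, 3)); // QUEUED
        System.out.println(s.submit(Strategy.COVER_EARLY, 4));      // REPLACED_AND_QUEUED
        System.out.println(s.pending());                            // 1
    }
}
```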
@Override
public ReturnT<String> run(TriggerParam triggerParam) {
    // load old:jobHandler + jobThread
    // 1. look up the job's existing JobThread and its jobHandler
    JobThread jobThread = XxlJobExecutor.loadJobThread(triggerParam.getJobId());
    IJobHandler jobHandler = jobThread!=null?jobThread.getHandler():null;
    String removeOldReason = null;

    // valid:jobHandler + jobThread
    GlueTypeEnum glueTypeEnum = GlueTypeEnum.match(triggerParam.getGlueType());
    if (GlueTypeEnum.BEAN == glueTypeEnum) {

        // new jobhandler
        IJobHandler newJobHandler = XxlJobExecutor.loadJobHandler(triggerParam.getExecutorHandler());

        // valid old jobThread
        if (jobThread!=null && jobHandler != newJobHandler) {
            // change handler, need kill old thread
            removeOldReason = "change jobhandler or glue type, and terminate the old job thread.";

            jobThread = null;
            jobHandler = null;
        }

        // valid handler
        if (jobHandler == null) {
            jobHandler = newJobHandler;
            if (jobHandler == null) {
                return new ReturnT<String>(ReturnT.FAIL_CODE, "job handler [" + triggerParam.getExecutorHandler() + "] not found.");
            }
        }

    } else if (GlueTypeEnum.GLUE_GROOVY == glueTypeEnum) {

        // valid old jobThread
        if (jobThread != null &&
                !(jobThread.getHandler() instanceof GlueJobHandler
                    && ((GlueJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
            // change handler or gluesource updated, need kill old thread
            removeOldReason = "change job source or glue type, and terminate the old job thread.";

            jobThread = null;
            jobHandler = null;
        }

        // valid handler
        if (jobHandler == null) {
            try {
                IJobHandler originJobHandler = GlueFactory.getInstance().loadNewInstance(triggerParam.getGlueSource());
                jobHandler = new GlueJobHandler(originJobHandler, triggerParam.getGlueUpdatetime());
            } catch (Exception e) {
                logger.error(e.getMessage(), e);
                return new ReturnT<String>(ReturnT.FAIL_CODE, e.getMessage());
            }
        }
    } else if (glueTypeEnum!=null && glueTypeEnum.isScript()) {

        // valid old jobThread
        if (jobThread != null &&
                !(jobThread.getHandler() instanceof ScriptJobHandler
                        && ((ScriptJobHandler) jobThread.getHandler()).getGlueUpdatetime()==triggerParam.getGlueUpdatetime() )) {
            // change script or gluesource updated, need kill old thread
            removeOldReason = "change job source or glue type, and terminate the old job thread.";

            jobThread = null;
            jobHandler = null;
        }

        // valid handler
        if (jobHandler == null) {
            jobHandler = new ScriptJobHandler(triggerParam.getJobId(), triggerParam.getGlueUpdatetime(), triggerParam.getGlueSource(), GlueTypeEnum.match(triggerParam.getGlueType()));
        }
    } else {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "glueType[" + triggerParam.getGlueType() + "] is not valid.");
    }

    // 2. Apply the block strategy
    // executor block strategy
    if (jobThread != null) {
        ExecutorBlockStrategyEnum blockStrategy = ExecutorBlockStrategyEnum.match(triggerParam.getExecutorBlockStrategy(), null);
        if (ExecutorBlockStrategyEnum.DISCARD_LATER == blockStrategy) {
            // discard when running
            if (jobThread.isRunningOrHasQueue()) {
                return new ReturnT<String>(ReturnT.FAIL_CODE, "block strategy effect:"+ExecutorBlockStrategyEnum.DISCARD_LATER.getTitle());
            }
        } else if (ExecutorBlockStrategyEnum.COVER_EARLY == blockStrategy) {
            // kill running jobThread
            if (jobThread.isRunningOrHasQueue()) {
                removeOldReason = "block strategy effect:" + ExecutorBlockStrategyEnum.COVER_EARLY.getTitle();

                jobThread = null;
            }
        } else {
            // just queue trigger
        }
    }

    // replace thread (new or exists invalid)
    if (jobThread == null) {
        jobThread = XxlJobExecutor.registJobThread(triggerParam.getJobId(), jobHandler, removeOldReason);
    }

    // 3. Put the trigger into the job thread's queue
    // push data to queue
    ReturnT<String> pushResult = jobThread.pushTriggerQueue(triggerParam);
    return pushResult;
}
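The reuse-or-replace logic in run() can be condensed into one small pure function. The sketch below is a simplified model, not xxl-job code: BlockStrategyDemo, Decision and decide are hypothetical names, the BlockStrategy enum only mirrors the three constants of ExecutorBlockStrategyEnum, and handler validation failures (such as an unknown BEAN handler) are left out.

```java
public class BlockStrategyDemo {

    // Mirrors the constants of ExecutorBlockStrategyEnum, redefined so the sketch is self-contained
    enum BlockStrategy { SERIAL_EXECUTION, DISCARD_LATER, COVER_EARLY }

    // Hypothetical outcome type, not part of xxl-job
    enum Decision { REJECT, REUSE_THREAD, NEW_THREAD }

    /**
     * @param hasThread      a JobThread is already registered for this jobId
     * @param handlerChanged that thread's handler differs from the one this trigger needs
     * @param busy           that thread is running or has queued triggers (isRunningOrHasQueue)
     * @param strategy       block strategy of the trigger; null is treated like SERIAL_EXECUTION
     */
    static Decision decide(boolean hasThread, boolean handlerChanged, boolean busy, BlockStrategy strategy) {
        // A changed handler or glue source always retires the old thread
        boolean keepThread = hasThread && !handlerChanged;
        if (keepThread && busy) {
            if (strategy == BlockStrategy.DISCARD_LATER) {
                return Decision.REJECT;      // drop the new trigger
            }
            if (strategy == BlockStrategy.COVER_EARLY) {
                keepThread = false;          // kill the running thread, start a fresh one
            }
            // SERIAL_EXECUTION or unknown: queue behind the running trigger
        }
        return keepThread ? Decision.REUSE_THREAD : Decision.NEW_THREAD;
    }

    public static void main(String[] args) {
        System.out.println(decide(true, false, true, BlockStrategy.DISCARD_LATER));    // REJECT
        System.out.println(decide(true, false, true, BlockStrategy.COVER_EARLY));      // NEW_THREAD
        System.out.println(decide(true, false, true, BlockStrategy.SERIAL_EXECUTION)); // REUSE_THREAD
        System.out.println(decide(true, true, false, BlockStrategy.SERIAL_EXECUTION)); // NEW_THREAD
    }
}
```

In every case except REJECT the trigger is then pushed onto the chosen thread's queue, which is why SERIAL_EXECUTION needs no branch of its own: queuing is already serial execution.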

2.5 JobThread: how the executor runs an RPC-triggered job

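The job thread keeps taking triggers off its queue and processing them. JobThread mainly does the following:

  1. Calls the handler's init method
  2. Takes a trigger from the queue. If the job has a timeout configured, the handler runs inside a FutureTask on a separate thread; otherwise handler.execute() is called directly
  3. Records the result in the job log and pushes it to TriggerCallbackThread, which reports it back to the scheduling center
  4. When the thread stops, fails every trigger still waiting in the queue and calls the handler's destroy method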

com.xxl.job.core.thread.JobThread#run

@Override
public void run() {

    // 1. Call the initialization method defined by the job handler
    // init
    try {
        handler.init();
    } catch (Throwable e) {
        logger.error(e.getMessage(), e);
    }

    // execute
    while (!toStop) {
        running = false;
        idleTimes++;

        TriggerParam triggerParam = null;
        try {
            // 2. Take one trigger from the queue and process it
            // the loop must wake up regularly to check the toStop signal, so use poll(timeout) instead of queue.take()
            triggerParam = triggerQueue.poll(3L, TimeUnit.SECONDS);
            if (triggerParam != null) {
                running = true;
                idleTimes = 0;
                triggerLogIdSet.remove(triggerParam.getLogId());

                // log filename, like "logPath/yyyy-MM-dd/9999.log"
                String logFileName = XxlJobFileAppender.makeLogFileName(new Date(triggerParam.getLogDateTime()), triggerParam.getLogId());
                XxlJobContext xxlJobContext = new XxlJobContext(
                        triggerParam.getJobId(),
                        triggerParam.getExecutorParams(),
                        logFileName,
                        triggerParam.getBroadcastIndex(),
                        triggerParam.getBroadcastTotal());

                // init job context
                XxlJobContext.setXxlJobContext(xxlJobContext);

                // execute
                XxlJobHelper.log("<br>----------- xxl-job job execute start -----------<br>----------- Param:" + xxlJobContext.getJobParam());

                if (triggerParam.getExecutorTimeout() > 0) {
                    // limit timeout: run the handler on its own thread and wait at most executorTimeout seconds
                    Thread futureThread = null;
                    try {
                        FutureTask<Boolean> futureTask = new FutureTask<Boolean>(new Callable<Boolean>() {
                            @Override
                            public Boolean call() throws Exception {

                                // init job context
                                XxlJobContext.setXxlJobContext(xxlJobContext);

                                handler.execute();
                                return true;
                            }
                        });
                        futureThread = new Thread(futureTask);
                        futureThread.start();

                        Boolean tempResult = futureTask.get(triggerParam.getExecutorTimeout(), TimeUnit.SECONDS);
                    } catch (TimeoutException e) {

                        XxlJobHelper.log("<br>----------- xxl-job job execute timeout");
                        XxlJobHelper.log(e);

                        // handle result
                        XxlJobHelper.handleTimeout("job execute timeout ");
                    } finally {
                        futureThread.interrupt();
                    }
                } else {
                    // just execute
                    handler.execute();
                }

                // valid execute handle data
                if (XxlJobContext.getXxlJobContext().getHandleCode() <= 0) {
                    XxlJobHelper.handleFail("job handle result lost.");
                } else {
                    String tempHandleMsg = XxlJobContext.getXxlJobContext().getHandleMsg();
                    tempHandleMsg = (tempHandleMsg != null && tempHandleMsg.length() > 50000)
                            ? tempHandleMsg.substring(0, 50000).concat("...")
                            : tempHandleMsg;
                    XxlJobContext.getXxlJobContext().setHandleMsg(tempHandleMsg);
                }
                XxlJobHelper.log("<br>----------- xxl-job job execute end(finish) -----------<br>----------- Result: handleCode="
                        + XxlJobContext.getXxlJobContext().getHandleCode()
                        + ", handleMsg = "
                        + XxlJobContext.getXxlJobContext().getHandleMsg()
                );

            } else {
                if (idleTimes > 30) {
                    if (triggerQueue.size() == 0) {    // avoid concurrent trigger causes jobId-lost
                        XxlJobExecutor.removeJobThread(jobId, "excutor idel times over limit.");
                    }
                }
            }
        } catch (Throwable e) {
            if (toStop) {
                XxlJobHelper.log("<br>----------- JobThread toStop, stopReason:" + stopReason);
            }

            // handle result
            StringWriter stringWriter = new StringWriter();
            e.printStackTrace(new PrintWriter(stringWriter));
            String errorMsg = stringWriter.toString();

            XxlJobHelper.handleFail(errorMsg);

            XxlJobHelper.log("<br>----------- JobThread Exception:" + errorMsg + "<br>----------- xxl-job job execute end(error) -----------");
        } finally {
            if (triggerParam != null) {
                // callback handler info
                if (!toStop) {
                    // common
                    TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                            triggerParam.getLogId(),
                            triggerParam.getLogDateTime(),
                            XxlJobContext.getXxlJobContext().getHandleCode(),
                            XxlJobContext.getXxlJobContext().getHandleMsg())
                    );
                } else {
                    // is killed
                    TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                            triggerParam.getLogId(),
                            triggerParam.getLogDateTime(),
                            XxlJobContext.HANDLE_COCE_FAIL,
                            stopReason + " [job running, killed]")
                    );
                }
            }
        }
    }

    // callback trigger request in queue
    while (triggerQueue != null && triggerQueue.size() > 0) {
        TriggerParam triggerParam = triggerQueue.poll();
        if (triggerParam != null) {
            // is killed
            TriggerCallbackThread.pushCallBack(new HandleCallbackParam(
                    triggerParam.getLogId(),
                    triggerParam.getLogDateTime(),
                    XxlJobContext.HANDLE_COCE_FAIL,
                    stopReason + " [job not executed, in the job queue, killed.]")
            );
        }
    }

    // destroy
    try {
        handler.destroy();
    } catch (Throwable e) {
        logger.error(e.getMessage(), e);
    }

    logger.info(">>>>>>>>>>> xxl-job JobThread stoped, hashCode:{}", Thread.currentThread());
}
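The timeout branch of JobThread.run is the standard FutureTask pattern: start the task on its own thread, wait with get(timeout, unit), and interrupt the thread afterwards. The sketch below isolates that pattern. TimeoutRunDemo and runWithTimeout are hypothetical names, the timeout is in milliseconds rather than seconds to keep the demo fast, and unlike JobThread (which lets an ExecutionException fall through to its outer catch block) it maps every outcome to a string.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

public class TimeoutRunDemo {

    /** Runs task on a dedicated thread, waits at most timeoutMillis, returns "SUCCESS", "TIMEOUT" or "FAIL: ...". */
    static String runWithTimeout(Callable<Boolean> task, long timeoutMillis) {
        FutureTask<Boolean> futureTask = new FutureTask<>(task);
        Thread futureThread = new Thread(futureTask);
        futureThread.start();
        try {
            futureTask.get(timeoutMillis, TimeUnit.MILLISECONDS);
            return "SUCCESS";
        } catch (TimeoutException e) {
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "FAIL: " + e.getCause();   // the task itself threw
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "FAIL: interrupted";
        } finally {
            // Like JobThread, always interrupt the worker thread; this only *asks* it to stop
            futureThread.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> true, 1000));                                 // SUCCESS
        System.out.println(runWithTimeout(() -> { Thread.sleep(5000); return true; }, 100)); // TIMEOUT
    }
}
```

Interruption is cooperative: a handler that never blocks or checks its interrupt status keeps running after the timeout has been reported, so long-running handlers should check Thread.interrupted() themselves.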

2.6 Summary

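  • The scheduling center creates an ExecutorBizClient (a client-side proxy) for each executor and, when a job is triggered, uses it to make the RPC call to that executor.
  • When the executor's embedded server starts, it creates an ExecutorBizImpl instance. ExecutorBizImpl handles the RPC calls the scheduling center makes to the executor: it receives each request and invokes the interface method that matches the request path (for example /run).
  • On the executor, jobs are run by job threads: every job has its own JobThread, and an incoming trigger is not executed immediately but placed in a blocking queue.
  • Each JobThread keeps taking triggers from its queue, executes them, and records the execution log.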
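The one-thread-per-job structure summarized above can be modeled as a map of workers, each draining its own LinkedBlockingQueue. This is a simplified sketch, not the executor's code: JobWorkerDemo, Worker, trigger and the CALLBACKS queue are hypothetical stand-ins for XxlJobExecutor's thread registry, JobThread and TriggerCallbackThread, and the idle limit is shortened from JobThread's 30 empty polls of 3 seconds to 5 polls of 50 ms.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.TimeUnit;

public class JobWorkerDemo {
    // Hypothetical stand-in for the callback queue drained by TriggerCallbackThread
    static final BlockingQueue<String> CALLBACKS = new LinkedBlockingQueue<>();
    // jobId -> worker, standing in for the executor's JobThread registry
    static final Map<Integer, Worker> WORKERS = new ConcurrentHashMap<>();
    // Makes "register + enqueue" and "idle worker unregisters itself" mutually exclusive
    static final Object LOCK = new Object();
    static final int IDLE_LIMIT = 5;

    static class Worker extends Thread {
        final int jobId;
        final BlockingQueue<Long> triggerQueue = new LinkedBlockingQueue<>(); // queued logIds
        private boolean toStop = false; // only touched by this worker thread
        private int idleTimes = 0;

        Worker(int jobId) {
            this.jobId = jobId;
            setDaemon(true);
        }

        @Override
        public void run() {
            while (!toStop) {
                try {
                    // poll(timeout) rather than take(), so the loop wakes up to count idle rounds
                    Long logId = triggerQueue.poll(50, TimeUnit.MILLISECONDS);
                    if (logId != null) {
                        idleTimes = 0;
                        // "execute" the job, then hand the result to the callback queue
                        CALLBACKS.add("job " + jobId + " log " + logId + " done");
                    } else if (++idleTimes > IDLE_LIMIT) {
                        synchronized (LOCK) {
                            if (triggerQueue.isEmpty()) { // a trigger may have arrived meanwhile
                                WORKERS.remove(jobId, this);
                                toStop = true;
                            }
                        }
                    }
                } catch (InterruptedException e) {
                    synchronized (LOCK) { WORKERS.remove(jobId, this); }
                    toStop = true;
                }
            }
        }
    }

    /** Queues a trigger on the job's worker, starting a worker first if none is registered. */
    static void trigger(int jobId, long logId) {
        synchronized (LOCK) {
            Worker w = WORKERS.get(jobId);
            if (w == null) {
                w = new Worker(jobId);
                WORKERS.put(jobId, w);
                w.start();
            }
            w.triggerQueue.add(logId);
        }
    }

    /** Collects up to n callbacks, waiting at most 2 s for each one. */
    static List<String> awaitCallbacks(int n) {
        List<String> out = new ArrayList<>();
        try {
            for (int i = 0; i < n; i++) {
                String s = CALLBACKS.poll(2, TimeUnit.SECONDS);
                if (s != null) out.add(s);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return out;
    }

    /** Waits until every idle worker has unregistered itself, or the timeout passes. */
    static boolean awaitNoWorkers(long timeoutMillis) {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        try {
            while (!WORKERS.isEmpty() && System.currentTimeMillis() < deadline) Thread.sleep(20);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return WORKERS.isEmpty();
    }

    public static void main(String[] args) {
        trigger(1, 100L);
        trigger(1, 101L); // queued behind log 100 on job 1's worker
        trigger(2, 200L); // job 2 gets a worker of its own
        System.out.println(awaitCallbacks(3)); // job 1's entries keep their order; job 2's may interleave
        System.out.println("all workers removed after idling: " + awaitNoWorkers(2000));
    }
}
```

The shared lock is a choice of this sketch: because enqueuing and self-removal never overlap, a trigger can never land in the queue of a worker that has already exited, which is the "jobId-lost" situation the queue-size check in JobThread guards against.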

3. Conclusion

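This article has walked through how the scheduling center schedules jobs, and the main flow by which the executor learns that a job has been triggered and then runs it. xxl-job uses thread pools so that a single blocked thread cannot delay scheduling, and it makes job scheduling and job execution fully asynchronous, which raises the system's throughput. Replacing Quartz with a time wheel further reduces coupling in the system and prevents jobs from being scheduled after their due time. Many of xxl-job's design ideas are worth learning from.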

Reference for this article: xxl-job任务触发流程 ("xxl-job task trigger flow")