How the XXL-JOB Executor Works

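The XXL-JOB executor

In short, an executor is the application that hosts the actual scheduled tasks. A business system can deploy it in two ways:

  1. Put all scheduled tasks in a single dedicated application.
  2. Distribute the scheduled tasks across the business applications they belong to.

Both approaches have trade-offs, which are not discussed further here. This article focuses on how the executor works internally.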

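Communication between the executor and the scheduling center

The executor and the scheduling center do not run in the same JVM, so they have to communicate over the network. From version 2.2.0 onward this is done over HTTP, which means the executor must run a web server listening on a configured port. The conclusion first: the XXL-JOB executor implements this server with Netty.

EmbedServer: the embedded web server

The EmbedServer class starts the Netty server in a dedicated thread: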

// param
EventLoopGroup bossGroup = new NioEventLoopGroup();
EventLoopGroup workerGroup = new NioEventLoopGroup();
ThreadPoolExecutor bizThreadPool = new ThreadPoolExecutor(
        0,
        200,
        60L,
        TimeUnit.SECONDS,
        new LinkedBlockingQueue<Runnable>(2000),
        new ThreadFactory() {
            @Override
            public Thread newThread(Runnable r) {
                return new Thread(r, "xxl-rpc, EmbedServer bizThreadPool-" + r.hashCode());
            }
        },
        new RejectedExecutionHandler() {
            @Override
            public void rejectedExecution(Runnable r, ThreadPoolExecutor executor) {
                throw new RuntimeException("xxl-job, EmbedServer bizThreadPool is EXHAUSTED!");
            }
        });


try {
    // start server
    ServerBootstrap bootstrap = new ServerBootstrap();
    bootstrap.group(bossGroup, workerGroup)
            .channel(NioServerSocketChannel.class)
            .childHandler(new ChannelInitializer<SocketChannel>() {
                @Override
                public void initChannel(SocketChannel channel) throws Exception {
                    channel.pipeline()
                            .addLast(new IdleStateHandler(0, 0, 30 * 3, TimeUnit.SECONDS))  // beat 3N, close if idle
                            .addLast(new HttpServerCodec())
                            .addLast(new HttpObjectAggregator(5 * 1024 * 1024))  // merge request & reponse to FULL
                            .addLast(new EmbedHttpServerHandler(executorBiz, accessToken, bizThreadPool));
                }
            })
            .childOption(ChannelOption.SO_KEEPALIVE, true);

    // bind
    ChannelFuture future = bootstrap.bind(port).sync();
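
The business thread pool in the excerpt above can look larger than it behaves. The following is a minimal sketch, assuming only the JDK's documented ThreadPoolExecutor rules: with a core size of 0 and a bounded queue, extra threads are added only once the queue is full, and a request is rejected only when both the queue and the thread limit are exhausted. BizPoolSketch and its tiny limits are made up for illustration and are not part of XXL-JOB.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

final class BizPoolSketch {

    /** Builds a pool shaped like bizThreadPool (core size 0, bounded queue, throwing rejection handler). */
    static ThreadPoolExecutor newPool(int maxThreads, int queueCapacity) {
        return new ThreadPoolExecutor(
                0, maxThreads, 60L, TimeUnit.SECONDS,
                new LinkedBlockingQueue<Runnable>(queueCapacity),
                (task, executor) -> {
                    throw new RejectedExecutionException("sketch pool is EXHAUSTED");
                });
    }

    /** Submits blocking tasks until one is rejected and returns how many were accepted. */
    static int acceptedBeforeRejection(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = newPool(maxThreads, queueCapacity);
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger started = new AtomicInteger();
        int accepted = 0;
        try {
            while (true) {
                pool.execute(() -> {
                    started.incrementAndGet();
                    try {
                        release.await(); // keep the worker busy until the sketch is done
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
                accepted++;
                // Wait until every existing worker holds a task, so the count
                // does not depend on thread start-up timing.
                while (started.get() < pool.getPoolSize()) {
                    Thread.yield();
                }
            }
        } catch (RejectedExecutionException e) {
            return accepted;
        } finally {
            release.countDown();
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        // 2 threads + 1 queue slot: the 4th task is the first one rejected.
        System.out.println(acceptedBeforeRejection(2, 1)); // prints 3
    }
}
```

By the same rule, the limits in the excerpt (200 threads, a queue of 2000) would put the rejection point at 2200 requests in flight, and the pool would keep working with a single thread until the queue fills up.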

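The entry point of the executor source code is XxlJobSpringExecutor, which does the following on startup:

  1. Builds the repository of job handlers for the methods annotated with @XxlJob.
  2. Initializes the configured log directory and the scheduling center addresses.
  3. Starts the thread that reports task execution results.
  4. Starts the Netty server.
  5. Starts the registry thread, which keeps the heartbeat with the scheduling center.

Heartbeat keep-alive

The executor and the scheduling center keep a heartbeat between them. Without one, the scheduling center could dispatch a task to a node that has already gone offline, and the task would fail, which is unacceptable for the business. So once the executor has started, it keeps the heartbeat going, and when it shuts down it must first tell the scheduling center that it is going offline before it exits.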

 registryThread = new Thread(new Runnable() {
        @Override
        public void run() {

            // registry
            while (!toStop) {
                try {
                    RegistryParam registryParam = new RegistryParam(RegistryConfig.RegistType.EXECUTOR.name(), appname, address);
                    for (AdminBiz adminBiz: XxlJobExecutor.getAdminBizList()) {
                        try {
                            ReturnT<String> registryResult = adminBiz.registry(registryParam);
                            if (registryResult!=null && ReturnT.SUCCESS_CODE == registryResult.getCode()) {
                                registryResult = ReturnT.SUCCESS;
                                logger.debug(">>>>>>>>>>> xxl-job registry success, registryParam:{}, registryResult:{}", new Object[]{registryParam, registryResult});
                                break;
                            } else {
                                logger.info(">>>>>>>>>>> xxl-job registry fail, registryParam:{}, registryResult:{}", new Object[]{registryParam, registryResult});
                            }
                        } catch (Exception e) {
                            logger.info(">>>>>>>>>>> xxl-job registry error, registryParam:{}", registryParam, e);
                        }

                    }
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(e.getMessage(), e);
                    }

                }

                try {
                    if (!toStop) {
                        TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
                    }
                } catch (InterruptedException e) {
                    if (!toStop) {
                        logger.warn(">>>>>>>>>>> xxl-job, executor registry thread interrupted, error msg:{}", e.getMessage());
                    }
                }
            }

            // registry remove
            try {
                RegistryParam registryParam = new RegistryParam(RegistryConfig.RegistType.EXECUTOR.name(), appname, address);
                for (AdminBiz adminBiz: XxlJobExecutor.getAdminBizList()) {
                    try {
                        ReturnT<String> registryResult = adminBiz.registryRemove(registryParam);
                        if (registryResult!=null && ReturnT.SUCCESS_CODE == registryResult.getCode()) {
                            registryResult = ReturnT.SUCCESS;
                            logger.info(">>>>>>>>>>> xxl-job registry-remove success, registryParam:{}, registryResult:{}", new Object[]{registryParam, registryResult});
                            break;
                        } else {
                            logger.info(">>>>>>>>>>> xxl-job registry-remove fail, registryParam:{}, registryResult:{}", new Object[]{registryParam, registryResult});
                        }
                    } catch (Exception e) {
                        if (!toStop) {
                            logger.info(">>>>>>>>>>> xxl-job registry-remove error, registryParam:{}", registryParam, e);
                        }

                    }

                }
            } catch (Exception e) {
                if (!toStop) {
                    logger.error(e.getMessage(), e);
                }
            }
            logger.info(">>>>>>>>>>> xxl-job, executor registry thread destory.");

        }
    });
    registryThread.setDaemon(true);
    registryThread.setName("xxl-job, executor ExecutorRegistryThread");
    registryThread.start();
}

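The code above registers the executor periodically with one node of the scheduling center cluster. As soon as one node accepts the registration, the thread goes to sleep until the next round. (All scheduling center nodes share the same database, so one successful registration is enough for the other nodes to see it.)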
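
The "first node that accepts wins" rule can be isolated from the threading and logging. This is a minimal sketch under stated assumptions: AdminNode is a hypothetical stand-in for AdminBiz that reports success as a boolean, and registerOnce is a made-up helper name.

```java
import java.util.ArrayList;
import java.util.List;

final class RegistrySketch {

    /** Hypothetical stand-in for AdminBiz: true means the node accepted the registration. */
    interface AdminNode {
        boolean registry(String address) throws Exception;
    }

    /**
     * Tries each node in order and stops at the first one that accepts.
     * Returns the index of that node, or -1 if every node failed or refused.
     */
    static int registerOnce(List<AdminNode> nodes, String address) {
        for (int i = 0; i < nodes.size(); i++) {
            try {
                if (nodes.get(i).registry(address)) {
                    return i; // one success is enough, skip the remaining nodes
                }
            } catch (Exception e) {
                // Node unreachable: fall through to the next one.
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        List<AdminNode> nodes = new ArrayList<>();
        nodes.add(address -> { throw new java.io.IOException("node down"); });
        nodes.add(address -> true);
        System.out.println(registerOnce(nodes, "http://127.0.0.1:9999/")); // prints 1
    }
}
```

In the real loop this attempt is repeated after each sleep of RegistryConfig.BEAT_TIMEOUT seconds, which is what turns registration into a heartbeat.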

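Task execution

As mentioned above, the scheduling center talks to the executor over HTTP, so the Netty pipeline must contain a handler for the scheduling center's requests: EmbedHttpServerHandler.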

private Object process(HttpMethod httpMethod, String uri, String requestData, String accessTokenReq) {

    // valid
    if (HttpMethod.POST != httpMethod) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, HttpMethod not support.");
    }
    if (uri==null || uri.trim().length()==0) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping empty.");
    }
    if (accessToken!=null
            && accessToken.trim().length()>0
            && !accessToken.equals(accessTokenReq)) {
        return new ReturnT<String>(ReturnT.FAIL_CODE, "The access token is wrong.");
    }

    // services mapping
    try {
        if ("/beat".equals(uri)) {
            return executorBiz.beat();
        } else if ("/idleBeat".equals(uri)) {
            IdleBeatParam idleBeatParam = GsonTool.fromJson(requestData, IdleBeatParam.class);
            return executorBiz.idleBeat(idleBeatParam);
        } else if ("/run".equals(uri)) {
            TriggerParam triggerParam = GsonTool.fromJson(requestData, TriggerParam.class);
            return executorBiz.run(triggerParam);
        } else if ("/kill".equals(uri)) {
            KillParam killParam = GsonTool.fromJson(requestData, KillParam.class);
            return executorBiz.kill(killParam);
        } else if ("/log".equals(uri)) {
            LogParam logParam = GsonTool.fromJson(requestData, LogParam.class);
            return executorBiz.log(logParam);
        } else {
            return new ReturnT<String>(ReturnT.FAIL_CODE, "invalid request, uri-mapping("+ uri +") not found.");
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        return new ReturnT<String>(ReturnT.FAIL_CODE, "request error:" + ThrowableUtil.toString(e));
    }
}

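The handler serves the following requests:

  • beat: a heartbeat initiated by the scheduling center, to detect whether the executor is offline
  • idleBeat: checks whether the job thread is idle, used for busy failover
  • run: tells the executor to run a task
  • kill: stops a task
  • log: fetches the log of a task run

run: triggering a task

On the executor side, tasks run in their own threads: each task method annotated with @XxlJob gets a dedicated thread that serves it. This resembles the command pattern: the scheduling center only sends a command, and a separate thread does the actual work, so the scheduling center's request is not blocked for a long time.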

// replace thread (new or exists invalid)
if (jobThread == null) {
    jobThread = XxlJobExecutor.registJobThread(triggerParam.getJobId(), jobHandler, removeOldReason);
}

// push data to queue
ReturnT<String> pushResult = jobThread.pushTriggerQueue(triggerParam);

The executor first looks up the job thread by job id, then pushes the trigger into that thread's queue.
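
The lookup-then-push step can be sketched with a map of per-job queues. This is a simplified illustration, not the XXL-JOB code: JobDispatchSketch is a made-up class, the trigger is reduced to a String, and the worker thread that would drain each queue is left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class JobDispatchSketch {

    // One queue per job id, created lazily on the first trigger for that job.
    private final ConcurrentHashMap<Integer, LinkedBlockingQueue<String>> queues =
            new ConcurrentHashMap<>();

    /** Routes a trigger to the queue of its job and returns immediately. */
    void dispatch(int jobId, String params) {
        queues.computeIfAbsent(jobId, id -> new LinkedBlockingQueue<>()).add(params);
    }

    /** Number of triggers waiting for the given job. */
    int pending(int jobId) {
        LinkedBlockingQueue<String> queue = queues.get(jobId);
        return queue == null ? 0 : queue.size();
    }

    /** Number of jobs that have received at least one trigger. */
    int jobCount() {
        return queues.size();
    }

    public static void main(String[] args) {
        JobDispatchSketch sketch = new JobDispatchSketch();
        sketch.dispatch(1, "a");
        sketch.dispatch(1, "b");
        sketch.dispatch(2, "c");
        System.out.println(sketch.pending(1) + " " + sketch.jobCount()); // prints "2 2"
    }
}
```

Because dispatch only enqueues, the HTTP request from the scheduling center can return as soon as the trigger is queued, regardless of how long the task itself runs.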


public class JobThread extends Thread{

private LinkedBlockingQueue<TriggerParam> triggerQueue;
private Set<Long> triggerLogIdSet;

public ReturnT<String> pushTriggerQueue(TriggerParam triggerParam) {
   // avoid repeat
   if (triggerLogIdSet.contains(triggerParam.getLogId())) {
      logger.info(">>>>>>>>>>> repeate trigger job, logId:{}", triggerParam.getLogId());
      return new ReturnT<String>(ReturnT.FAIL_CODE, "repeate trigger job, logId:" + triggerParam.getLogId());
   }

   triggerLogIdSet.add(triggerParam.getLogId());
   triggerQueue.add(triggerParam);
       return ReturnT.SUCCESS;
}

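triggerLogIdSet provides idempotency and prevents the same trigger from being executed twice. JobThread extends Thread, so its run method necessarily loops, taking triggers off the queue and executing them.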
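
The de-duplication idea can be shown in isolation. This is a minimal sketch with assumed names (TriggerQueueSketch, pendingLogIds) and with the trigger reduced to its log id. Unlike the contains-then-add sequence in the excerpt, it relies on the return value of Set.add to check and insert in one step, which is a variation chosen for the sketch.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;

final class TriggerQueueSketch {

    private final LinkedBlockingQueue<Long> queue = new LinkedBlockingQueue<>();
    // Log ids that are queued but not yet picked up by the worker.
    private final Set<Long> pendingLogIds = ConcurrentHashMap.newKeySet();

    /** Returns false if a trigger with the same log id is already waiting in the queue. */
    boolean push(long logId) {
        if (!pendingLogIds.add(logId)) {
            return false; // duplicate trigger, refuse it
        }
        queue.add(logId);
        return true;
    }

    /** Takes the next trigger, or null if the queue is empty, and forgets its log id. */
    Long poll() {
        Long logId = queue.poll();
        if (logId != null) {
            pendingLogIds.remove(logId);
        }
        return logId;
    }

    public static void main(String[] args) {
        TriggerQueueSketch q = new TriggerQueueSketch();
        System.out.println(q.push(7L)); // true
        System.out.println(q.push(7L)); // false, still queued
    }
}
```

Note that the guard only covers triggers that are still queued: once a log id has been polled it is removed from the set, as in the run method of JobThread.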

@Override
public void run() {

       // init
       try {
      handler.init();
   } catch (Throwable e) {
          logger.error(e.getMessage(), e);
   }

   // execute
   while(!toStop){
      running = false;
      idleTimes++;

           TriggerParam triggerParam = null;
           ReturnT<String> executeResult = null;
           try {
         triggerParam = triggerQueue.poll(3L, TimeUnit.SECONDS);
         if (triggerParam!=null) {
            running = true;
            idleTimes = 0;
            triggerLogIdSet.remove(triggerParam.getLogId());

            // log filename, like "logPath/yyyy-MM-dd/9999.log"
            String logFileName = XxlJobFileAppender.makeLogFileName(new Date(triggerParam.getLogDateTime()), triggerParam.getLogId());
            XxlJobFileAppender.contextHolder.set(logFileName);
            ShardingUtil.setShardingVo(new ShardingUtil.ShardingVO(triggerParam.getBroadcastIndex(), triggerParam.getBroadcastTotal()));

            if (triggerParam.getExecutorTimeout() > 0) {
               // limit timeout
               Thread futureThread = null;
               try {
                  final TriggerParam triggerParamTmp = triggerParam;
                  FutureTask<ReturnT<String>> futureTask = new FutureTask<ReturnT<String>>(new Callable<ReturnT<String>>() {
                     @Override
                     public ReturnT<String> call() throws Exception {
                        return handler.execute(triggerParamTmp.getExecutorParams());
                     }
                  });
                  futureThread = new Thread(futureTask);
                  futureThread.start();

                  executeResult = futureTask.get(triggerParam.getExecutorTimeout(), TimeUnit.SECONDS);
               } catch (TimeoutException e) {
                  executeResult = new ReturnT<String>(IJobHandler.FAIL_TIMEOUT.getCode(), "job execute timeout ");
               } finally {
                  futureThread.interrupt();
               }
            } else {
               // just execute
               executeResult = handler.execute(triggerParam.getExecutorParams());
            }

            if (executeResult == null) {
               executeResult = IJobHandler.FAIL;
            } 

         } else {
            if (idleTimes > 30) {
               if(triggerQueue.size() == 0) { // avoid concurrent trigger causes jobId-lost
                  XxlJobExecutor.removeJobThread(jobId, "excutor idel times over limit.");
               }
            }
         }
      } catch (Throwable e) {

      } finally {
               if(triggerParam != null) {
                   // callback handler info
                   if (!toStop) {
                       // commonm
                       TriggerCallbackThread.pushCallBack(new HandleCallbackParam(triggerParam.getLogId(), triggerParam.getLogDateTime(), executeResult));
                   } else {
                       // is killed
                       ReturnT<String> stopResult = new ReturnT<String>(ReturnT.FAIL_CODE, stopReason + " [job running, killed]");
                       TriggerCallbackThread.pushCallBack(new HandleCallbackParam(triggerParam.getLogId(), triggerParam.getLogDateTime(), stopResult));
                   }
               }
           }
       }

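For a task with a timeout, an extra thread is created and a Future is used to wait for the result with a time limit; a task without a timeout runs directly in the current thread. A task may succeed or fail, and either way the result must be reported back to the scheduling center. The job thread does not report it directly: it hands the result to a separate thread (TriggerCallbackThread) that does the reporting.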
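
The timeout branch can be reduced to a few lines. This is a minimal sketch of the FutureTask pattern with assumed names (TimeoutSketch, runWithTimeout) and String results standing in for ReturnT.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.FutureTask;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class TimeoutSketch {

    /** Runs the job on a helper thread and waits at most timeoutMillis for its result. */
    static String runWithTimeout(Callable<String> job, long timeoutMillis) {
        FutureTask<String> task = new FutureTask<>(job);
        Thread worker = new Thread(task, "timeout-sketch-worker");
        worker.setDaemon(true);
        worker.start();
        try {
            return task.get(timeoutMillis, TimeUnit.MILLISECONDS);
        } catch (TimeoutException e) {
            return "TIMEOUT";
        } catch (ExecutionException e) {
            return "FAILED: " + e.getCause();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return "INTERRUPTED";
        } finally {
            // Only helps if the job reacts to interruption (sleep, wait, blocking I/O, flag checks).
            worker.interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println(runWithTimeout(() -> "ok", 1000)); // prints ok
        System.out.println(runWithTimeout(() -> {
            Thread.sleep(5000);
            return "late";
        }, 100)); // prints TIMEOUT
    }
}
```

The caller gets its answer after the timeout either way, but the helper thread only stops if the job responds to the interrupt, which is the same limitation that affects kill.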

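Reporting task results

Task results are reported in batches by a dedicated thread, and a retry scheme handles reports that fail, as the figure below shows.

[Figure: batch reporting of task results with retry on failure]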
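
Batch reporting with retry can be sketched as follows. This is only an illustration of the idea, not the XXL-JOB implementation: CallbackSketch and Reporter are hypothetical names, results are plain strings, and failed batches are kept in memory here purely for brevity, which is an assumption of the sketch. flush is meant to be called from a single reporting thread.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.LinkedBlockingQueue;

final class CallbackSketch {

    /** Hypothetical reporting call: true means the scheduling center accepted the batch. */
    interface Reporter {
        boolean report(List<String> batch);
    }

    private final LinkedBlockingQueue<String> pending = new LinkedBlockingQueue<>();
    private final List<String> failed = new ArrayList<>(); // touched only by the reporting thread

    /** Called by job threads: hand over a result and return immediately. */
    void push(String result) {
        pending.add(result);
    }

    /** Sends earlier failures plus everything pending as one batch; returns how many were reported. */
    int flush(Reporter reporter) {
        List<String> batch = new ArrayList<>(failed);
        failed.clear();
        pending.drainTo(batch);
        if (batch.isEmpty()) {
            return 0;
        }
        boolean ok;
        try {
            ok = reporter.report(batch);
        } catch (RuntimeException e) {
            ok = false;
        }
        if (!ok) {
            failed.addAll(batch); // keep the batch for the next attempt
            return 0;
        }
        return batch.size();
    }

    /** Number of results waiting for a retry. */
    int retryBacklog() {
        return failed.size();
    }

    public static void main(String[] args) {
        CallbackSketch sketch = new CallbackSketch();
        sketch.push("job-1 ok");
        System.out.println(sketch.flush(batch -> false)); // prints 0, batch kept for retry
        System.out.println(sketch.flush(batch -> true));  // prints 1
    }
}
```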

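Stopping a task

When we try to stop a running task from the scheduling center's admin page, the job thread can be in one of several states:

  • Runnable
  • Blocked
  • Finished

A blocked thread can be woken up with interrupt(); a finished task can be ignored; a task stuck in an infinite loop is beyond our reach, and the only way out is to have this ...
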
public ReturnT<String> kill(KillParam killParam) {
    // kill handlerThread, and create new one
    JobThread jobThread = XxlJobExecutor.loadJobThread(killParam.getJobId());
    if (jobThread != null) {
        XxlJobExecutor.removeJobThread(killParam.getJobId(), "scheduling center kill job.");
        return ReturnT.SUCCESS;
    }

    return new ReturnT<String>(ReturnT.SUCCESS_CODE, "job thread already killed.");
}

kill first looks up the job thread by job id, then tries to remove that thread.

public static JobThread removeJobThread(int jobId, String removeOldReason){
    JobThread oldJobThread = jobThreadRepository.remove(jobId);
    if (oldJobThread != null) {
        oldJobThread.toStop(removeOldReason);
        oldJobThread.interrupt();

        return oldJobThread;
    }
    return null;
}

Stopping the job thread only sets a flag. If the task is stuck in an infinite loop for some reason, the kill operation cannot terminate it.

public void toStop(String stopReason) {
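   /**
    * Thread.interrupt only ends a thread's blocking state (wait, join, sleep):
    * the blocking call throws InterruptedException, but the running thread itself is not terminated.
    * So to actually shut this thread down, a shared variable is needed.
    */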
   this.toStop = true;
   this.stopReason = stopReason;
}
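
Why a flag plus interrupt() only stops code that cooperates can be seen in a small experiment. This is a minimal sketch with made-up names (StopSketch, startLoop, interruptAndWait); the "stubborn" loop stands in for an infinite loop but is bounded by maxMillis so that the sketch itself terminates.

```java
import java.util.concurrent.TimeUnit;

final class StopSketch {

    /** Starts a busy loop that runs for at most maxMillis and optionally polls the interrupt flag. */
    static Thread startLoop(boolean checksInterrupt, long maxMillis) {
        Thread t = new Thread(() -> {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(maxMillis);
            while (deadline - System.nanoTime() > 0) {
                if (checksInterrupt && Thread.currentThread().isInterrupted()) {
                    return; // cooperative exit
                }
                // A loop that never blocks and never checks the flag just keeps running.
            }
        }, checksInterrupt ? "cooperative-loop" : "stubborn-loop");
        t.setDaemon(true);
        t.start();
        return t;
    }

    /** Interrupts the thread and reports whether it ended within waitMillis. */
    static boolean interruptAndWait(Thread t, long waitMillis) {
        t.interrupt();
        try {
            t.join(waitMillis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return !t.isAlive();
    }

    public static void main(String[] args) {
        System.out.println(interruptAndWait(startLoop(true, 3000), 1000)); // true: the loop saw the flag
        System.out.println(interruptAndWait(startLoop(false, 3000), 200)); // false: still spinning
    }
}
```

A job handler that may loop for a long time can apply the same idea by checking Thread.currentThread().isInterrupted() on each iteration, so that a kill request has something to act on.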

A task in an infinite loop cannot be killed

Add a task to the executor that prints the current timestamp in a loop.

@XxlJob("demoJobHandler")
public ReturnT<String> demoJobHandler(String param) throws Exception {
    XxlJobLogger.log("XXL-JOB, Hello World.");

    for (;;) {
        System.out.println(System.currentTimeMillis());
    }
  //  return ReturnT.SUCCESS;
}

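After starting the task in the scheduling center, try to stop it.

The executor console keeps printing timestamps; the task has not stopped.

So the question is: how do we find this thread?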

  1. Find the process with jps.
  2. Find the thread inside that process with jstack pid:

"Thread-17" #65 prio=10 os_prio=2 tid=0x000000001923b000 nid=0x1878 runnable [0x00000000214ae000]
   java.lang.Thread.State: RUNNABLE
        at java.io.FileOutputStream.writeBytes(Native Method)
        at java.io.FileOutputStream.write(FileOutputStream.java:326)
        at java.io.BufferedOutputStream.flushBuffer(BufferedOutputStream.java:82)
        at java.io.BufferedOutputStream.flush(BufferedOutputStream.java:140)
        - locked <0x0000000081cf0180> (a java.io.BufferedOutputStream)
        at java.io.PrintStream.write(PrintStream.java:482)
        - locked <0x0000000081cf0160> (a java.io.PrintStream)
        at sun.nio.cs.StreamEncoder.writeBytes(StreamEncoder.java:221)
        at sun.nio.cs.StreamEncoder.implFlushBuffer(StreamEncoder.java:291)
        at sun.nio.cs.StreamEncoder.flushBuffer(StreamEncoder.java:104)
        - locked <0x0000000081cf02a8> (a java.io.OutputStreamWriter)
        at java.io.OutputStreamWriter.flushBuffer(OutputStreamWriter.java:185)
        at java.io.PrintStream.newLine(PrintStream.java:546)
        - locked <0x0000000081cf0160> (a java.io.PrintStream)
        at java.io.PrintStream.println(PrintStream.java:751)
        - locked <0x0000000081cf0160> (a java.io.PrintStream)
        at com.xxl.job.executor.service.jobhandler.SampleXxlJob.demoJobHandler(SampleXxlJob.java:43)
        at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
        at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
        at java.lang.reflect.Method.invoke(Method.java:498)
        at com.xxl.job.core.handler.impl.MethodJobHandler.execute(MethodJobHandler.java:29)
        at com.xxl.job.core.thread.JobThread.run(JobThread.java:152)
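
The same lookup can also be done from inside the JVM. This is a hedged sketch of an in-process alternative to jps plus jstack, using only the JDK's Thread.getAllStackTraces(); ThreadFinderSketch and threadsExecuting are made-up names, and nothing here is part of XXL-JOB.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

final class ThreadFinderSketch {

    /** Returns the names of all live threads whose stack contains a frame of the given class. */
    static List<String> threadsExecuting(String className) {
        List<String> names = new ArrayList<>();
        for (Map.Entry<Thread, StackTraceElement[]> entry : Thread.getAllStackTraces().entrySet()) {
            for (StackTraceElement frame : entry.getValue()) {
                if (frame.getClassName().equals(className)) {
                    names.add(entry.getKey().getName());
                    break; // one matching frame is enough for this thread
                }
            }
        }
        return names;
    }

    public static void main(String[] args) {
        // In an executor process, the class from the dump above would be the one to look for:
        System.out.println(threadsExecuting("com.xxl.job.executor.service.jobhandler.SampleXxlJob"));
    }
}
```

In the dump above, the frame SampleXxlJob.demoJobHandler is what identifies "Thread-17" as the thread running the looping task.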

Closing words: among any three people walking together, there is always one I can learn from.