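The architecture of Xxl-job
Components of Xxl-job
- Executor
- Scheduling center (admin)
How the scheduling center works
The scheduling center has the following responsibilities:
- Maintaining the executor list and checking heartbeats
- Managing tasks
- Notifying executors to run tasks, and retrying on failure
- Recording execution logs
- Reporting on execution logs
To fulfil these responsibilities, the scheduling center needs to:
- Maintain scheduled tasks
- Trigger tasks on schedule
- Communicate with executors
- Clean up historical logs
Triggering task execution
The core logic of the scheduling center is implemented by XxlJobScheduler. Let's step into the code and take a look: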
public void init() throws Exception {
    // init i18n: internationalization
    initI18n();
    // admin registry monitor run: take executor nodes whose heartbeat has timed out offline
    JobRegistryMonitorHelper.getInstance().start();
    // admin fail-monitor run: failure retry + email alarm
    JobFailMonitorHelper.getInstance().start();
    // admin lose-monitor run: handle lost task results (the run exceeded the time limit and the executor is offline)
    JobLosedMonitorHelper.getInstance().start();
    // admin trigger pool start: initialize the trigger thread pools
    // initialize the fast and slow thread pools
    JobTriggerPoolHelper.toStart();
    // admin log report start: statistics
    JobLogReportHelper.getInstance().start();
    // start-schedule
    JobScheduleHelper.getInstance().start();
    logger.info(">>>>>>>>> init xxl-job admin success.");
}
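- Initialize the internationalization (i18n) resources.
- Take executor nodes whose heartbeat has timed out (90s) offline.
- Retry failed tasks and send alarm emails.
- Handle lost task results.
- Initialize the fast and slow trigger thread pools.
- Generate report statistics.
- Trigger scheduled tasks and notify executors to run them.
Heartbeat detection
The executor and the scheduling center exchange heartbeats so that tasks are never dispatched to an executor node that has gone offline. After startup, the executor runs a dedicated thread that registers with the scheduling center cluster every 30s. It tries the configured addresses one by one, and as soon as one registration succeeds it skips the remaining addresses for that round.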
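The "stop at the first success" behavior can be sketched as follows. This is a minimal illustration, not xxl-job's real code: `AdminClient`, `RegistrySketch` and `registerOnce` are hypothetical names introduced here, and the actual HTTP call is left to the caller.

```java
import java.util.List;

/** Hypothetical stand-in for a client that talks to one scheduling center node. */
interface AdminClient {
    boolean register(String adminAddress, String appName, String executorAddress);
}

class RegistrySketch {
    /**
     * Tries each scheduling center address in order and stops at the first success.
     * Returns the address that accepted the registration, or null if all failed.
     */
    static String registerOnce(List<String> adminAddresses, AdminClient client,
                               String appName, String executorAddress) {
        for (String admin : adminAddresses) {
            try {
                if (client.register(admin, appName, executorAddress)) {
                    return admin; // one success is enough; skip the remaining nodes
                }
            } catch (RuntimeException e) {
                // this node is unreachable; fall through to the next address
            }
        }
        return null;
    }
}
```

In a real executor, a daemon thread would call something like `registerOnce` in a loop and sleep 30s between rounds.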
JobRegistryMonitorHelper
public void start(){
    registryThread = new Thread(new Runnable() {
        @Override
        public void run() {
            while (!toStop) {
                try {
                    // auto registry group: groups whose addresses are registered automatically
                    List<XxlJobGroup> groupList = XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().findByAddressType(0);
                    if (groupList!=null && !groupList.isEmpty()) {
                        // remove dead address (admin/executor): entries with no heartbeat for 90 seconds
                        List<Integer> ids = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findDead(RegistryConfig.DEAD_TIMEOUT, new Date());
                        if (ids!=null && ids.size()>0) {
                            XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().removeDead(ids);
                        }
                        // fresh online address (admin/executor): load the entries that are still alive
                        // appName -> address list
                        HashMap<String, List<String>> appAddressMap = new HashMap<String, List<String>>();
                        List<XxlJobRegistry> list = XxlJobAdminConfig.getAdminConfig().getXxlJobRegistryDao().findAll(RegistryConfig.DEAD_TIMEOUT, new Date());
                        if (list != null) {
                            for (XxlJobRegistry item: list) {
                                if (RegistryConfig.RegistType.EXECUTOR.name().equals(item.getRegistryGroup())) {
                                    String appname = item.getRegistryKey();
                                    List<String> registryList = appAddressMap.get(appname);
                                    if (registryList == null) {
                                        registryList = new ArrayList<String>();
                                    }
                                    if (!registryList.contains(item.getRegistryValue())) {
                                        registryList.add(item.getRegistryValue());
                                    }
                                    appAddressMap.put(appname, registryList);
                                }
                            }
                        }
                        // fresh group address
                        for (XxlJobGroup group: groupList) {
                            List<String> registryList = appAddressMap.get(group.getAppname());
                            String addressListStr = null;
                            if (registryList!=null && !registryList.isEmpty()) {
                                Collections.sort(registryList);
                                addressListStr = "";
                                for (String item:registryList) {
                                    addressListStr += item + ",";
                                }
                                addressListStr = addressListStr.substring(0, addressListStr.length()-1);
                            }
                            group.setAddressList(addressListStr);
                            // update the address list
                            XxlJobAdminConfig.getAdminConfig().getXxlJobGroupDao().update(group);
                        }
                    }
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
                    }
                }
                try {
                    TimeUnit.SECONDS.sleep(RegistryConfig.BEAT_TIMEOUT);
                } catch (InterruptedException e) {
                    if (!toStop) {
                        logger.error(">>>>>>>>>>> xxl-job, job registry monitor thread error:{}", e);
                    }
                }
            }
            logger.info(">>>>>>>>>>> xxl-job, job registry monitor thread stop");
        }
    });
    registryThread.setDaemon(true);
    registryThread.setName("xxl-job, admin JobRegistryMonitorHelper");
    registryThread.start();
}
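The code above does the following:
- Removes registrations in xxl_job_registry that have not been renewed for more than 90s.
- Writes the addresses in xxl_job_registry that still have a healthy heartbeat into the address_list column of the xxl_job_group table (multiple addresses are separated by commas).
- Repeats the loop once per heartbeat interval (RegistryConfig.BEAT_TIMEOUT, 30s). The 90s figure is the dead timeout (RegistryConfig.DEAD_TIMEOUT) used to decide which registrations have expired.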
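To make the filtering and grouping steps concrete, here is a small in-memory sketch. It assumes a hypothetical `Registration` type in place of the xxl_job_registry rows and uses a plain map instead of the DAOs, so it illustrates the logic rather than reproducing the real implementation.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

class RegistryRefreshSketch {
    static final long DEAD_TIMEOUT_MS = 90_000L; // three missed 30s heartbeats

    /** Simplified registry row: app name, executor address, time of the last heartbeat. */
    static class Registration {
        final String appName;
        final String address;
        final long lastBeatMs;
        Registration(String appName, String address, long lastBeatMs) {
            this.appName = appName;
            this.address = address;
            this.lastBeatMs = lastBeatMs;
        }
    }

    /** Builds "appName -> comma-separated, sorted address list" from the live registrations. */
    static Map<String, String> buildAddressLists(List<Registration> rows, long nowMs) {
        Map<String, TreeSet<String>> grouped = new TreeMap<>();
        for (Registration r : rows) {
            if (nowMs - r.lastBeatMs > DEAD_TIMEOUT_MS) {
                continue; // heartbeat expired: treat the node as offline
            }
            // TreeSet removes duplicates and keeps the addresses sorted
            grouped.computeIfAbsent(r.appName, k -> new TreeSet<>()).add(r.address);
        }
        Map<String, String> result = new TreeMap<>();
        for (Map.Entry<String, TreeSet<String>> e : grouped.entrySet()) {
            result.put(e.getKey(), String.join(",", e.getValue()));
        }
        return result;
    }
}
```

An app with no live executor simply has no entry in the returned map; in the original code the corresponding group gets a null address list instead.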
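Failure retry JobFailMonitorHelper.getInstance().start();
When a task is triggered, its remaining retry count is stored in the executor_fail_retry_count column of the xxl_job_log table. The monitor thread re-triggers failed runs that still have retries left, and sends an alarm email for failed runs.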
public void start(){
    monitorThread = new Thread(new Runnable() {
        @Override
        public void run() {
            // monitor
            while (!toStop) {
                try {
                    List<Long> failLogIds = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().findFailJobLogIds(1000);
                    if (failLogIds!=null && !failLogIds.isEmpty()) {
                        for (long failLogId: failLogIds) {
                            // lock log
                            int lockRet = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, 0, -1);
                            if (lockRet < 1) {
                                continue;
                            }
                            XxlJobLog log = XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().load(failLogId);
                            XxlJobInfo info = XxlJobAdminConfig.getAdminConfig().getXxlJobInfoDao().loadById(log.getJobId());
                            // 1、fail retry monitor
                            if (log.getExecutorFailRetryCount() > 0) {
                                JobTriggerPoolHelper.trigger(log.getJobId(), TriggerTypeEnum.RETRY, (log.getExecutorFailRetryCount()-1), log.getExecutorShardingParam(), log.getExecutorParam(), null);
                                String retryMsg = "<br><br><span style=\"color:#F39C12;\" > >>>>>>>>>>>"+ I18nUtil.getString("jobconf_trigger_type_retry") +"<<<<<<<<<<< </span><br>";
                                log.setTriggerMsg(log.getTriggerMsg() + retryMsg);
                                XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateTriggerInfo(log);
                            }
                            // 2、fail alarm monitor
                            int newAlarmStatus = 0; // alarm status: 0 = default, -1 = locked, 1 = no alarm needed, 2 = alarm sent, 3 = alarm failed
                            if (info!=null ) {
                                boolean alarmResult = XxlJobAdminConfig.getAdminConfig().getJobAlarmer().alarm(info, log);
                                newAlarmStatus = alarmResult?2:3;
                            } else {
                                newAlarmStatus = 1;
                            }
                            XxlJobAdminConfig.getAdminConfig().getXxlJobLogDao().updateAlarmStatus(failLogId, -1, newAlarmStatus);
                        }
                    }
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(">>>>>>>>>>> xxl-job, job fail monitor thread error:{}", e);
                    }
                }
                try {
                    TimeUnit.SECONDS.sleep(10);
                } catch (Exception e) {
                    if (!toStop) {
                        logger.error(e.getMessage(), e);
                    }
                }
            }
            logger.info(">>>>>>>>>>> xxl-job, job fail monitor thread stop");
        }
    });
    monitorThread.setDaemon(true);
    monitorThread.setName("xxl-job, admin JobFailMonitorHelper");
    monitorThread.start();
}
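The "lock log" step is worth a closer look. Judging by its arguments and the lockRet check, updateAlarmStatus(failLogId, 0, -1) behaves like a compare-and-set on the alarm status, so only one scheduling center node processes a given failed log. The sketch below reproduces that idea in memory; `FailLog` and `process` are hypothetical names, and the assumption that the DAO performs a conditional update is mine, not something the excerpt shows.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;

class FailMonitorSketch {
    // Alarm states, mirroring the comment in the monitor code.
    static final int DEFAULT = 0, LOCKED = -1, NO_ALARM = 1, ALARM_OK = 2, ALARM_FAILED = 3;

    /** In-memory stand-in for one failed row of xxl_job_log (hypothetical type). */
    static class FailLog {
        final AtomicInteger alarmStatus = new AtomicInteger(DEFAULT);
        final int retriesLeft;
        FailLog(int retriesLeft) { this.retriesLeft = retriesLeft; }
    }

    /**
     * Handles one failed log. Returns false if another worker already claimed it.
     * Each scheduled retry is recorded in retryBudgets as the remaining retry count.
     */
    static boolean process(FailLog log, boolean jobExists, boolean alarmSent, List<Integer> retryBudgets) {
        // Claim the row: only one caller can move it from DEFAULT to LOCKED.
        if (!log.alarmStatus.compareAndSet(DEFAULT, LOCKED)) {
            return false;
        }
        // 1. Retry: re-trigger with one retry fewer.
        if (log.retriesLeft > 0) {
            retryBudgets.add(log.retriesLeft - 1);
        }
        // 2. Alarm: store the outcome, which also releases the lock.
        int next = !jobExists ? NO_ALARM : (alarmSent ? ALARM_OK : ALARM_FAILED);
        log.alarmStatus.set(next);
        return true;
    }
}
```

Because the retry is triggered with the count reduced by one, a job configured with N retries is re-triggered at most N times before the count reaches zero.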
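Lost-result compensation JobLosedMonitorHelper.getInstance().start();
Suppose a node is told to run a task and then crashes for some reason. The result never comes back, so the progress of the task is unknown. This monitor handles that case: if a scheduling record has stayed in the "running" state for more than 10 minutes and the corresponding executor is no longer online (its heartbeat registration has lapsed), the scheduling center proactively marks that run as failed.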
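The rule combines three conditions, which a short sketch makes explicit. The names `LostJobSketch` and `isLost` and the in-memory set of online addresses are assumptions for illustration; the real monitor works against the log and registry tables.

```java
import java.util.Set;
import java.util.concurrent.TimeUnit;

class LostJobSketch {
    static final long LOST_AFTER_MS = TimeUnit.MINUTES.toMillis(10);

    /**
     * A run is considered lost only if it is still marked as running,
     * was triggered more than 10 minutes ago, and its executor is offline.
     */
    static boolean isLost(boolean stillRunning, long triggerTimeMs, long nowMs,
                          String executorAddress, Set<String> onlineAddresses) {
        return stillRunning
                && nowMs - triggerTimeMs > LOST_AFTER_MS
                && !onlineAddresses.contains(executorAddress);
    }
}
```

Requiring the executor to be offline matters: a long-running task on a healthy executor stays in the "running" state and is left alone.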
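Fast and slow thread pool initialization JobTriggerPoolHelper.toStart();
In Xxl-job the scheduling center triggers tasks on executors through a thread pool. Pool threads are a scarce resource and must not be held for long periods by a few slow triggers, so triggers are split across two pools according to how long they take: fastTriggerPool is the fast pool and slowTriggerPool is the slow pool.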
public void start(){
    fastTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolFastMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(1000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-fastTriggerPool-" + r.hashCode());
                }
            });
    slowTriggerPool = new ThreadPoolExecutor(
            10,
            XxlJobAdminConfig.getAdminConfig().getTriggerPoolSlowMax(),
            60L,
            TimeUnit.SECONDS,
            new LinkedBlockingQueue<Runnable>(2000),
            new ThreadFactory() {
                @Override
                public Thread newThread(Runnable r) {
                    return new Thread(r, "xxl-job, admin JobTriggerPoolHelper-slowTriggerPool-" + r.hashCode());
                }
            });
}
So which tasks count as slow?
// choose thread pool
ThreadPoolExecutor triggerPool_ = fastTriggerPool;
AtomicInteger jobTimeoutCount = jobTimeoutCountMap.get(jobId);
if (jobTimeoutCount!=null && jobTimeoutCount.get() > 10) {      // job-timeout 10 times in 1 min
    triggerPool_ = slowTriggerPool;
}
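Timeouts are counted per minute: a trigger that takes longer than 500ms counts as one timeout, and a job with more than 10 timeouts within the same minute is routed to the slow pool.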
// incr timeout-count-map
long cost = System.currentTimeMillis()-start;
if (cost > 500) {       // job-timeout threshold 500ms
    AtomicInteger timeoutCount = jobTimeoutCountMap.putIfAbsent(jobId, new AtomicInteger(1));
    if (timeoutCount != null) {
        timeoutCount.incrementAndGet();
    }
}
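Note that this cost is not the execution time of the task itself. It only measures how long it takes for the executor to acknowledge the notification that a task should run, because on the executor every task runs in its own dedicated thread, JobThread.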
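Putting the two excerpts together, the pool choice can be sketched as a small self-contained class. `PoolChooserSketch` is a hypothetical name, the pools are represented by plain strings, and resetting the counters when a new minute begins is an assumption based on the "in 1 min" comment rather than code shown in this article.

```java
import java.util.HashMap;
import java.util.Map;

class PoolChooserSketch {
    static final long SLOW_THRESHOLD_MS = 500; // a trigger slower than this counts as a timeout
    static final int SLOW_COUNT_LIMIT = 10;    // more timeouts than this per minute => slow pool

    private final Map<Integer, Integer> timeoutCounts = new HashMap<>();
    private long currentMinute = -1;

    /** Picks the pool name for the next trigger of this job. */
    synchronized String choosePool(int jobId) {
        Integer count = timeoutCounts.get(jobId);
        return (count != null && count > SLOW_COUNT_LIMIT) ? "slow" : "fast";
    }

    /** Records how long a trigger took; counters start over when a new minute begins. */
    synchronized void recordTrigger(int jobId, long costMs, long nowMs) {
        long minute = nowMs / 60_000;
        if (minute != currentMinute) {
            currentMinute = minute;
            timeoutCounts.clear();
        }
        if (costMs > SLOW_THRESHOLD_MS) {
            timeoutCounts.merge(jobId, 1, Integer::sum);
        }
    }
}
```

With this rule a job is demoted only after repeated slow triggers, and it returns to the fast pool once the counters reset, so a single network hiccup does not penalize it permanently.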
Task triggering
For task execution, the scheduling center acts as the client and the executor as the service, so the ExecutorBiz used here is an ExecutorBizClient.
public static ReturnT<String> runExecutor(TriggerParam triggerParam, String address){
    ReturnT<String> runResult = null;
    try {
        ExecutorBiz executorBiz = XxlJobScheduler.getExecutorBiz(address);
        runResult = executorBiz.run(triggerParam);
    } catch (Exception e) {
        logger.error(">>>>>>>>>>> xxl-job trigger error, please check if the executor[{}] is running.", address, e);
        runResult = new ReturnT<String>(ReturnT.FAIL_CODE, ThrowableUtil.toString(e));
    }
    StringBuffer runResultSB = new StringBuffer(I18nUtil.getString("jobconf_trigger_run") + ":");
    runResultSB.append("<br>address:").append(address);
    runResultSB.append("<br>code:").append(runResult.getCode());
    runResultSB.append("<br>msg:").append(runResult.getMsg());
    runResult.setMsg(runResultSB.toString());
    return runResult;
}
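Communication between the scheduling center and the executor
The scheduling center and the executor do not run in the same JVM, so they have to communicate over the network. There are several options for doing so: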
- http
- tcp
- rpc
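Xxl-job has used two implementations over its history: Xxl-rpc and a RESTful API. Versions before 2.2.0 use Xxl-rpc; version 2.2.0 and later use a language-neutral RESTful API.
Xxl-rpc
An RPC framework that communicates over TCP and serializes with Hessian. It provides stable, high-performance remote service calls and simplifies the development of distributed service communication.
Architecture diagram
The roles involved:
- provider: the service provider;
- invoker: the service consumer;
- serializer: the serialization module;
- remoting: the service communication module;
- registry: the service registry;
- admin: the service governance and monitoring center, which manages service node information and tracks call counts, QPS and health;
XXL-RPC uses NIO for its low-level communication. NIO is an asynchronous model, so the calling thread does not block waiting for the result. XXL-RPC therefore implements synchronous calls on top of the asynchronous model ("sync-over-async"). It works as follows (see the architecture diagram for reference):
- 1. Each request generates a unique RequestId and an RpcResponse, which are placed in a request pool.
- 2. The calling thread invokes get on the RpcResponse and blocks until the result of this request is available.
- 3. The call is then sent via NIO. The provider responds asynchronously; the response is matched to its RpcResponse by RequestId, the result is set, and the calling thread is woken up.
- 4. The calling thread wakes up and returns the data from the asynchronous response.
We have already seen this async-to-sync approach several times in dubbo. It corresponds to the Guarded Suspension pattern.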
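The four steps can be sketched with plain wait/notify, which is the textbook form of Guarded Suspension. This is an illustration only: `PendingResponse` and `RequestPool` are hypothetical names and do not reproduce XXL-RPC's actual classes.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.TimeoutException;

/** Holds the result of one in-flight request; get() blocks until complete() is called. */
class PendingResponse {
    private Object result;
    private boolean done;

    synchronized void complete(Object value) {
        result = value;
        done = true;
        notifyAll(); // wake up the thread blocked in get()
    }

    synchronized Object get(long timeoutMs) throws InterruptedException, TimeoutException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (!done) { // guard condition, rechecked after every wake-up
            long remaining = deadline - System.currentTimeMillis();
            if (remaining <= 0) {
                throw new TimeoutException("no response within " + timeoutMs + " ms");
            }
            wait(remaining);
        }
        return result;
    }
}

/** Maps request ids to their pending responses (step 1 of the list above). */
class RequestPool {
    private final Map<String, PendingResponse> pending = new ConcurrentHashMap<>();

    PendingResponse register(String requestId) {
        PendingResponse response = new PendingResponse();
        pending.put(requestId, response);
        return response;
    }

    /** Called from the I/O thread when the provider's response arrives (step 3). */
    void onResponse(String requestId, Object value) {
        PendingResponse response = pending.remove(requestId);
        if (response != null) {
            response.complete(value);
        }
    }
}
```

A caller registers a request id, sends the request asynchronously, and then blocks in `get` until the I/O thread calls `onResponse` with the same id. A production version would also remove the entry from the pool when the call times out.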
Restful
With the RESTful approach, Xxl-job simply sends an HTTP request to the executor (the executor listens for requests with an embedded netty-based server).
public static ReturnT postBody(String url, String accessToken, int timeout, Object requestObj, Class returnTargClassOfT) {
    HttpURLConnection connection = null;
    BufferedReader bufferedReader = null;
    try {
        // connection
        URL realUrl = new URL(url);
        connection = (HttpURLConnection) realUrl.openConnection();
        // trust-https
        boolean useHttps = url.startsWith("https");
        if (useHttps) {
            HttpsURLConnection https = (HttpsURLConnection) connection;
            trustAllHosts(https);
        }
        // connection setting
        connection.setRequestMethod("POST");
        connection.setDoOutput(true);
        connection.setDoInput(true);
        connection.setUseCaches(false);
        connection.setReadTimeout(timeout * 1000);
        connection.setConnectTimeout(3 * 1000);
        connection.setRequestProperty("connection", "Keep-Alive");
        connection.setRequestProperty("Content-Type", "application/json;charset=UTF-8");
        connection.setRequestProperty("Accept-Charset", "application/json;charset=UTF-8");
        if(accessToken!=null && accessToken.trim().length()>0){
            connection.setRequestProperty(XXL_JOB_ACCESS_TOKEN, accessToken);
        }
        // do connection
        connection.connect();
        // write requestBody
        if (requestObj != null) {
            String requestBody = GsonTool.toJson(requestObj);
            DataOutputStream dataOutputStream = new DataOutputStream(connection.getOutputStream());
            dataOutputStream.write(requestBody.getBytes("UTF-8"));
            dataOutputStream.flush();
            dataOutputStream.close();
        }
        /*byte[] requestBodyBytes = requestBody.getBytes("UTF-8");
        connection.setRequestProperty("Content-Length", String.valueOf(requestBodyBytes.length));
        OutputStream outwritestream = connection.getOutputStream();
        outwritestream.write(requestBodyBytes);
        outwritestream.flush();
        outwritestream.close();*/
        // valid StatusCode
        int statusCode = connection.getResponseCode();
        if (statusCode != 200) {
            return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting fail, StatusCode("+ statusCode +") invalid. for url : " + url);
        }
        // result
        bufferedReader = new BufferedReader(new InputStreamReader(connection.getInputStream()));
        StringBuilder result = new StringBuilder();
        String line;
        while ((line = bufferedReader.readLine()) != null) {
            result.append(line);
        }
        String resultJson = result.toString();
        // parse returnT
        try {
            ReturnT returnT = GsonTool.fromJson(resultJson, ReturnT.class, returnTargClassOfT);
            return returnT;
        } catch (Exception e) {
            logger.error("xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").", e);
            return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting (url="+url+") response content invalid("+ resultJson +").");
        }
    } catch (Exception e) {
        logger.error(e.getMessage(), e);
        return new ReturnT<String>(ReturnT.FAIL_CODE, "xxl-rpc remoting error("+ e.getMessage() +"), for url : " + url);
    } finally {
        try {
            if (bufferedReader != null) {
                bufferedReader.close();
            }
            if (connection != null) {
                connection.disconnect();
            }
        } catch (Exception e2) {
            logger.error(e2.getMessage(), e2);
        }
    }
}
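The code above is straightforward: it builds the HTTP request, writes the request body and reads the response.
Closing words: without small steps, you cannot cover a thousand miles!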