1. Registry center data structure (from the official documentation)
Under the configured namespace, the registry center creates a node named after each job to tell jobs apart. A job's name therefore cannot be changed once the job is created; renaming it is treated as creating a brand-new job. The job name node contains five child nodes: config, instances, sharding, servers and leader.
config node
The job configuration, stored as JSON.
instances node
Runtime instance information. Each child node is the primary key of a running job instance, composed of the IP address and PID of the server running it. These child nodes are ephemeral: they are registered when an instance comes online and cleaned up automatically when it goes offline. The registry center watches these nodes to coordinate sharding and high availability of the distributed job. Writing TRIGGER to an instance node makes that instance execute once immediately.
sharding node
Sharding information. Each child node is a sharding item number, from zero up to the total sharding count minus one, and its own children store the item's details, used to control and record the item's running state. The meaning of each detail node can be looked up directly in the ShardingNode class.
servers node
Job server information. Each child node is a server's IP address; writing DISABLED to an IP node disables that server. Under the new cloud-native architecture the servers node has been greatly weakened and only retains the ability to disable a server. To keep the job core pure, the servers feature may be removed in the future, with enable/disable control delegated to the automated deployment system.
leader node
Master-node information for the job servers, with three child nodes: election, sharding and failover, used respectively for leader election, resharding and failover handling. The leader node is internal to the framework; if you are not interested in how the framework works, you can ignore it.
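Taken together, the layout under one job can be sketched as plain path strings. This is a minimal sketch (no ZooKeeper client involved), assuming the namespace elastic-job-demo and job name demoSimpleJob used by the demo configuration later in this article:

```java
import java.util.Arrays;
import java.util.List;

// Sketch of the registry layout described above (paths only, no ZooKeeper client).
class JobNodeLayout {

    static List<String> jobNodePaths(String namespace, String jobName) {
        String root = "/" + namespace + "/" + jobName;
        return Arrays.asList(
                root + "/config",     // job configuration, stored as JSON
                root + "/instances",  // ephemeral node per running instance (IP + PID)
                root + "/sharding",   // one child per sharding item: 0 .. totalCount - 1
                root + "/servers",    // one child per server IP; write DISABLED to disable
                root + "/leader");    // election / sharding / failover sub-nodes
    }
}
```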
2. Job startup source code analysis
Let's walk through the Elastic-Job source using an example from the official site:
public class JobDemo {
    
    public static void main(String[] args) {
        new JobScheduler(createRegistryCenter(), createJobConfiguration()).init(); // Initialize the job scheduler.
    }
    
    private static CoordinatorRegistryCenter createRegistryCenter() {
        CoordinatorRegistryCenter regCenter = new ZookeeperRegistryCenter(new ZookeeperConfiguration("zk_host:2181", "elastic-job-demo"));
        regCenter.init(); // Initialize the ZooKeeper registry center.
        return regCenter;
    }
    
    private static LiteJobConfiguration createJobConfiguration() {
        // Define the core job configuration: job name, cron expression, total sharding count.
        JobCoreConfiguration simpleCoreConfig = JobCoreConfiguration.newBuilder("demoSimpleJob", "0/15 * * * * ?", 10).build();
        // Define the SIMPLE job type configuration.
        SimpleJobConfiguration simpleJobConfig = new SimpleJobConfiguration(simpleCoreConfig, "com.test.SimpleDemoJob");
        return LiteJobConfiguration.newBuilder(simpleJobConfig).build();
    }
}
Let's focus on JobScheduler's init method.
/**
 * Initialize the job.
 */
public void init() {
    LiteJobConfiguration liteJobConfigFromRegCenter = schedulerFacade.updateJobConfiguration(liteJobConfig); // Update the job configuration in the registry center.
    JobRegistry.getInstance().setCurrentShardingTotalCount(liteJobConfigFromRegCenter.getJobName(), liteJobConfigFromRegCenter.getTypeConfig().getCoreConfig().getShardingTotalCount());
    JobScheduleController jobScheduleController = new JobScheduleController(
            createScheduler(), createJobDetail(liteJobConfigFromRegCenter.getTypeConfig().getJobClass()), liteJobConfigFromRegCenter.getJobName());
    JobRegistry.getInstance().registerJob(liteJobConfigFromRegCenter.getJobName(), jobScheduleController, regCenter);
    schedulerFacade.registerStartUpInfo(!liteJobConfigFromRegCenter.isDisabled()); // 2.1
    jobScheduleController.scheduleJob(liteJobConfigFromRegCenter.getTypeConfig().getCoreConfig().getCron()); // 2.2
}
JobScheduleController above is a wrapper around Quartz: createScheduler() calls into Quartz to create a QuartzScheduler instance, and createJobDetail() wraps Elastic-Job's LiteJob into a Quartz JobDetail object.
2.1
registerStartUpInfo is the key method here; its implementation follows.
/**
 * Register job startup information.
 *
 * @param enabled whether the job is enabled
 */
public void registerStartUpInfo(final boolean enabled) {
    listenerManager.startAllListeners();
    leaderService.electLeader(); // 1. Elect the leader node.
    serverService.persistOnline(enabled); // Persist the server node.
    instanceService.persistOnline(); // Persist the instance node.
    shardingService.setReshardingFlag(); // Set the resharding flag, used later on.
    monitorService.listen(); // Initialize the job monitor service.
    if (!reconcileService.isRunning()) {
        reconcileService.startAsync();
    }
}
1. Electing the master (leader) node. The election code below is essentially one line:
/**
 * Elect the leader node.
 */
public void electLeader() {
    log.debug("Elect a new leader now.");
    jobNodeStorage.executeInLeader(LeaderNode.LATCH, new LeaderElectionExecutionCallback());
    log.debug("Leader election completed.");
}
Stepping into executeInLeader shows that it mainly acquires the latch and then invokes the callback's execute() method.
/**
 * Execute an operation on the leader.
 *
 * @param latchNode name of the job node used as the distributed lock
 * @param callback callback that performs the operation
 */
public void executeInLeader(final String latchNode, final LeaderExecutionCallback callback) {
    try (LeaderLatch latch = new LeaderLatch(getClient(), jobNodePath.getFullPath(latchNode))) {
        latch.start();
        latch.await();
        callback.execute();
    //CHECKSTYLE:OFF
    } catch (final Exception ex) {
    //CHECKSTYLE:ON
        handleException(ex);
    }
}
The callback here is LeaderElectionExecutionCallback, shown below.
@RequiredArgsConstructor
class LeaderElectionExecutionCallback implements LeaderExecutionCallback {
    
    @Override
    public void execute() {
        if (!hasLeader()) {
            jobNodeStorage.fillEphemeralJobNode(LeaderNode.INSTANCE, JobRegistry.getInstance().getJobInstance(jobName).getJobInstanceId());
        }
    }
}
Putting the code together: the election works by having every client contend for the distributed lock, and whichever instance writes its instance id into the leader node first becomes the master.
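The "first writer wins" race can be mimicked without ZooKeeper. Below is a self-contained analogy, not the framework's code: the leader znode becomes an AtomicReference, writing the instance id becomes a compare-and-set from null, and the instance-id strings are purely illustrative:

```java
import java.util.concurrent.atomic.AtomicReference;

// Analogy for the election above: the leader znode is an AtomicReference,
// and only the first compare-and-set from null succeeds.
class LeaderElectionSketch {

    static final AtomicReference<String> leaderNode = new AtomicReference<>();

    /** Returns true if this instance won the election. */
    static boolean elect(String jobInstanceId) {
        return leaderNode.compareAndSet(null, jobInstanceId);
    }
}
```

Only the first elect call succeeds; every later caller sees a non-null leader, matching the hasLeader() guard in the callback above.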
2.2
/**
 * Schedule the job.
 *
 * @param cron the CRON expression
 */
public void scheduleJob(final String cron) {
    try {
        // Only register the job if it has not been scheduled yet.
        if (!scheduler.checkExists(jobDetail.getKey())) {
            scheduler.scheduleJob(jobDetail, createTrigger(cron));
        }
        // Start the scheduler; safe to call even if it is already started.
        scheduler.start();
    } catch (final SchedulerException ex) {
        throw new JobSystemException(ex);
    }
}
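The guard above makes scheduleJob idempotent. Here is a self-contained sketch of the same check-then-register pattern in plain Java (ScheduleGuardSketch and its fields are hypothetical names for illustration, not Quartz API):

```java
import java.util.HashSet;
import java.util.Set;

// Mimics the idempotent guard in scheduleJob(): register only when absent, always start.
class ScheduleGuardSketch {

    final Set<String> registeredJobKeys = new HashSet<>();
    boolean started;

    void scheduleJob(String jobKey) {
        if (!registeredJobKeys.contains(jobKey)) { // mirrors scheduler.checkExists(...)
            registeredJobKeys.add(jobKey);         // mirrors scheduler.scheduleJob(...)
        }
        started = true;                            // mirrors scheduler.start()
    }
}
```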
Scheduling is delegated to scheduler, which in turn delegates to QuartzScheduler. Here is QuartzScheduler's constructor:
/**
* <p>
* Create a <code>QuartzScheduler</code> with the given configuration
* properties.
* </p>
*
* @see QuartzSchedulerResources
*/
public QuartzScheduler(QuartzSchedulerResources resources, long idleWaitTime, @Deprecated long dbRetryInterval)
        throws SchedulerException {
    this.resources = resources;
    if (resources.getJobStore() instanceof JobListener) {
        addInternalJobListener((JobListener) resources.getJobStore());
    }
    this.schedThread = new QuartzSchedulerThread(this, resources);
    ThreadExecutor schedThreadExecutor = resources.getThreadExecutor(); // 2.3
    schedThreadExecutor.execute(this.schedThread);
    if (idleWaitTime > 0) {
        this.schedThread.setIdleWaitTime(idleWaitTime);
    }
    jobMgr = new ExecutingJobsManager();
    addInternalJobListener(jobMgr);
    errLogger = new ErrorLogger();
    addInternalSchedulerListener(errLogger);
    signaler = new SchedulerSignalerImpl(this, this.schedThread);
    if (shouldRunUpdateCheck())
        updateTimer = scheduleUpdateCheck();
    else
        updateTimer = null;
    getLog().info("Quartz Scheduler v." + getVersion() + " created.");
}
We will not go into how Quartz implements the scheduleJob and start methods here.
2.3
First look at the QuartzSchedulerThread class, which is a thread. We can skip the details of its run method; only the following snippet matters here.
JobRunShell shell = null;
try {
    shell = qsRsrcs.getJobRunShellFactory().createJobRunShell(bndle);
    shell.initialize(qs);
} catch (SchedulerException se) {
    qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
    continue;
}
if (qsRsrcs.getThreadPool().runInThread(shell) == false) {
    // this case should never happen, as it is indicative of the
    // scheduler being shutdown or a bug in the thread pool or
    // a thread pool being used concurrently - which the docs
    // say not to do...
    getLog().error("ThreadPool.runInThread() return false!");
    qsRsrcs.getJobStore().triggeredJobComplete(triggers.get(i), bndle.getJobDetail(), CompletedExecutionInstruction.SET_ALL_JOB_TRIGGERS_ERROR);
}
JobRunShell is itself runnable; the code above hands it to the thread pool for execution, and its run method is where the job defined in elastic-job is finally invoked.
public void run() {
    qs.addInternalSchedulerListener(this);
    try {
        ...
        // execute the job
        try {
            log.debug("Calling execute on job " + jobDetail.getKey());
            job.execute(jec); // Execute the job, i.e. LiteJob.
            endTime = System.currentTimeMillis();
        } catch (JobExecutionException jee) {
            endTime = System.currentTimeMillis();
            jobExEx = jee;
            getLog().info("Job " + jobDetail.getKey() +
                    " threw a JobExecutionException: ", jobExEx);
        } catch (Throwable e) {
            endTime = System.currentTimeMillis();
            getLog().error("Job " + jobDetail.getKey() +
                    " threw an unhandled Exception: ", e);
            SchedulerException se = new SchedulerException(
                    "Job threw an unhandled exception.", e);
            qs.notifySchedulerListenersError("Job ("
                    + jec.getJobDetail().getKey()
                    + " threw an exception.", se);
            jobExEx = new JobExecutionException(se, false);
        }
        ...
    } finally {
        qs.removeInternalSchedulerListener(this);
    }
}
The job in the code above is the SimpleDemoJob we defined earlier. At this point, how elastic-job starts up and how it hooks into Quartz should be clear. The execute method called here is com.dangdang.ddframe.job.lite.internal.schedule.LiteJob's execute method.
3. Job execution flow
Job execution starts from the com.dangdang.ddframe.job.lite.internal.schedule.LiteJob#execute method mentioned above.
/**
 * Lite scheduled job.
 *
 * @author zhangliang
 */
public final class LiteJob implements Job {
    
    @Setter
    private ElasticJob elasticJob;
    
    @Setter
    private JobFacade jobFacade;
    
    @Override
    public void execute(final JobExecutionContext context) throws JobExecutionException {
        JobExecutorFactory.getJobExecutor(elasticJob, jobFacade).execute();
    }
}
ElasticJob comes in three flavors: DataflowJob, ScriptJob and SimpleJob, with SimpleJob being the most commonly used. Each type maps to its own job executor, as shown below.
/**
 * Get the job executor.
 *
 * @param elasticJob the distributed elastic job
 * @param jobFacade the facade for the job's internal services
 * @return the job executor
 */
@SuppressWarnings("unchecked")
public static AbstractElasticJobExecutor getJobExecutor(final ElasticJob elasticJob, final JobFacade jobFacade) {
    if (null == elasticJob) {
        return new ScriptJobExecutor(jobFacade); // Script job executor (script jobs have no Java job instance).
    }
    if (elasticJob instanceof SimpleJob) {
        return new SimpleJobExecutor((SimpleJob) elasticJob, jobFacade); // Simple job executor.
    }
    if (elasticJob instanceof DataflowJob) {
        return new DataflowJobExecutor((DataflowJob) elasticJob, jobFacade); // Dataflow (streaming) job executor.
    }
    throw new JobConfigurationException("Cannot support job type '%s'", elasticJob.getClass().getCanonicalName());
}
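For completeness, here is roughly what a user job handled by SimpleJobExecutor looks like. The classes below are self-contained stand-ins for illustration: ShardingContextSketch mimics elastic-job's ShardingContext, and while the real SimpleJob#execute returns void, this sketch returns a String only so the behavior is observable:

```java
// Local stand-in for elastic-job's ShardingContext so the sketch compiles on its own.
class ShardingContextSketch {
    private final int shardingItem;
    
    ShardingContextSketch(int shardingItem) {
        this.shardingItem = shardingItem;
    }
    
    int getShardingItem() {
        return shardingItem;
    }
}

// Roughly what com.test.SimpleDemoJob from the demo configuration might do:
// each invocation only handles the sharding item it was given.
class SimpleDemoJobSketch {
    String execute(ShardingContextSketch context) {
        return "processed shard " + context.getShardingItem();
    }
}
```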
All three executors extend AbstractElasticJobExecutor, so the call in LiteJob's execute method actually lands in AbstractElasticJobExecutor's execute method.
/**
 * Execute the job.
 */
public final void execute() {
    try {
        jobFacade.checkJobExecutionEnvironment(); // 3.1
    } catch (final JobExecutionEnvironmentException cause) {
        jobExceptionHandler.handleException(jobName, cause);
    }
    ShardingContexts shardingContexts = jobFacade.getShardingContexts(); // 3.2
    if (shardingContexts.isAllowSendJobEvent()) {
        jobFacade.postJobStatusTraceEvent(shardingContexts.getTaskId(), State.TASK_STAGING, String.format("Job '%s' execute begin.", jobName));
    }
    if (jobFacade.misfireIfRunning(shardingContexts.getShardingItemParameters().keySet())) {
        if (shardingContexts.isAllowSendJobEvent()) {
            jobFacade.postJobStatusTraceEvent(shardingContexts.getTaskId(), State.TASK_FINISHED, String.format(
                    "Previous job '%s' - shardingItems '%s' is still running, misfired job will start after previous job completed.", jobName,
                    shardingContexts.getShardingItemParameters().keySet()));
        }
        return;
    }
    try {
        jobFacade.beforeJobExecuted(shardingContexts);
    //CHECKSTYLE:OFF
    } catch (final Throwable cause) {
    //CHECKSTYLE:ON
        jobExceptionHandler.handleException(jobName, cause);
    }
    execute(shardingContexts, JobExecutionEvent.ExecutionSource.NORMAL_TRIGGER);
    while (jobFacade.isExecuteMisfired(shardingContexts.getShardingItemParameters().keySet())) {
        jobFacade.clearMisfire(shardingContexts.getShardingItemParameters().keySet());
        execute(shardingContexts, JobExecutionEvent.ExecutionSource.MISFIRE);
    }
    jobFacade.failoverIfNecessary();
    try {
        jobFacade.afterJobExecuted(shardingContexts);
    //CHECKSTYLE:OFF
    } catch (final Throwable cause) {
    //CHECKSTYLE:ON
        jobExceptionHandler.handleException(jobName, cause);
    }
}
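The ShardingContexts fetched at 3.2 decides which sharding items this instance runs. How the demo's 10 items could spread across instances can be illustrated with a round-robin sketch; the real framework uses its configured sharding strategy (average allocation by default), so this sketch only illustrates the invariant that item counts per instance differ by at most one, not the framework's algorithm:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative distribution of sharding items across job instances.
class ShardingSketch {

    static Map<String, List<Integer>> shard(List<String> instances, int shardingTotalCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String instance : instances) {
            result.put(instance, new ArrayList<>());
        }
        for (int item = 0; item < shardingTotalCount; item++) {
            result.get(instances.get(item % instances.size())).add(item); // deal items round-robin
        }
        return result;
    }
}
```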