Kylin Source Code Walkthrough, Part 3: Job Execution
The service class for job scheduling is JobService (package path: org.apache.kylin.rest.service.JobService).
JobService implements the InitializingBean interface and its afterPropertiesSet method, and gets initialized when Spring loads it as a bean. The bean is wired through configuration files: ./tomcat/webapps/kylin/WEB-INF/web.xml imports ./tomcat/webapps/kylin/WEB-INF/classes/applicationContext.xml, and applicationContext.xml contains:
<context:component-scan base-package="org.apache.kylin.rest"/>
Spring then scans the org.apache.kylin.rest package for classes annotated with @Component and registers them as beans. Because JobService implements InitializingBean, Spring calls afterPropertiesSet when the JobService bean is created, and that call performs JobService's initialization. The other services in Kylin are initialized the same way.
public void afterPropertiesSet() throws Exception {
    String timeZone = getConfig().getTimeZone();
    TimeZone tzone = TimeZone.getTimeZone(timeZone);
    TimeZone.setDefault(tzone);
    final KylinConfig kylinConfig = KylinConfig.getInstanceFromEnv();
    // get the configured job scheduler; the default is org.apache.kylin.job.impl.threadpool.DefaultScheduler
    final Scheduler<AbstractExecutable> scheduler = (Scheduler<AbstractExecutable>) SchedulerFactory
            .scheduler(kylinConfig.getSchedulerType());
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                // initialize the scheduler
                scheduler.init(new JobEngineConfig(kylinConfig), new ZookeeperJobLock());
                if (!scheduler.hasStarted()) {
                    logger.info("scheduler has not been started");
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }).start();
    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                scheduler.shutdown();
            } catch (SchedulerException e) {
                logger.error("error occurred to shutdown scheduler", e);
            }
        }
    }));
}
In the code above, the scheduler is initialized on a dedicated thread; the scheduler is essentially a scheduling thread.
public Map<Integer, String> getSchedulers() {
    Map<Integer, String> r = Maps.newLinkedHashMap();
    r.put(0, "org.apache.kylin.job.impl.threadpool.DefaultScheduler");
    r.put(2, "org.apache.kylin.job.impl.threadpool.DistributedScheduler");
    r.put(77, "org.apache.kylin.job.impl.threadpool.NoopScheduler");
    r.putAll(convertKeyToInteger(getPropertiesByPrefix("kylin.job.scheduler.provider.")));
    return r;
}
Kylin ships with three schedulers; the first, DefaultScheduler, is the default, and the other two are left for the reader to explore. Which scheduler is used is determined by the kylin.job.scheduler.default config item, whose default value is 0.
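As a toy illustration (this is not Kylin's actual SchedulerFactory code, and the fallback behavior is an assumption made here for the sketch), the mapping from the configured type number to a scheduler class name could be resolved like this:

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of resolving kylin.job.scheduler.default to a scheduler
// class name, defaulting to type 0 (DefaultScheduler) when the key is
// missing or unknown. The fallback-to-0 behavior is illustrative only.
public class SchedulerLookup {
    static final Map<Integer, String> SCHEDULERS = new LinkedHashMap<>();
    static {
        SCHEDULERS.put(0, "org.apache.kylin.job.impl.threadpool.DefaultScheduler");
        SCHEDULERS.put(2, "org.apache.kylin.job.impl.threadpool.DistributedScheduler");
        SCHEDULERS.put(77, "org.apache.kylin.job.impl.threadpool.NoopScheduler");
    }

    // resolve the class name for a configured scheduler type
    static String resolve(Integer type) {
        return SCHEDULERS.getOrDefault(type == null ? 0 : type, SCHEDULERS.get(0));
    }

    public static void main(String[] args) {
        System.out.println(resolve(null));
        System.out.println(resolve(2));
    }
}
```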
public synchronized void init(JobEngineConfig jobEngineConfig, JobLock lock) throws SchedulerException {
    jobLock = lock;
    String serverMode = jobEngineConfig.getConfig().getServerMode();
    // only the "job" and "all" server modes run the job scheduler; "query" does not
    if (!("job".equals(serverMode.toLowerCase()) || "all".equals(serverMode.toLowerCase()))) {
        logger.info("server mode: " + serverMode + ", no need to run job scheduler");
        return;
    }
    logger.info("Initializing Job Engine ....");
    if (!initialized) {
        initialized = true;
    } else {
        return;
    }
    this.jobEngineConfig = jobEngineConfig;
    if (jobLock.lockJobEngine() == false) {
        throw new IllegalStateException("Cannot start job scheduler due to lack of job lock");
    }
    executableManager = ExecutableManager.getInstance(jobEngineConfig.getConfig());
    //load all executable, set them to a consistent status
    fetcherPool = Executors.newScheduledThreadPool(1);
    int corePoolSize = jobEngineConfig.getMaxConcurrentJobLimit();
    jobPool = new ThreadPoolExecutor(corePoolSize, corePoolSize, Long.MAX_VALUE, TimeUnit.DAYS,
            new SynchronousQueue<Runnable>());
    context = new DefaultContext(Maps.<String, Executable> newConcurrentMap(), jobEngineConfig.getConfig());
    logger.info("Staring resume all running jobs.");
    executableManager.resumeAllRunningJobs();
    logger.info("Finishing resume all running jobs.");
    // get the polling interval
    int pollSecond = jobEngineConfig.getPollIntervalSecond();
    logger.info("Fetching jobs every {} seconds", pollSecond);
    JobExecutor jobExecutor = new JobExecutor() {
        @Override
        public void execute(AbstractExecutable executable) {
            jobPool.execute(new JobRunner(executable));
        }
    };
    // whether job priority is considered when scheduling; by default it is not,
    // i.e. DefaultFetcherRunner is used
    fetcher = jobEngineConfig.getJobPriorityConsidered()
            ? new PriorityFetcherRunner(jobEngineConfig, context, executableManager, jobExecutor)
            : new DefaultFetcherRunner(jobEngineConfig, context, executableManager, jobExecutor);
    logger.info("Creating fetcher pool instance:" + System.identityHashCode(fetcher));
    // fetch jobs once every pollSecond seconds
    fetcherPool.scheduleAtFixedRate(fetcher, pollSecond / 10, pollSecond, TimeUnit.SECONDS);
    hasStarted = true;
}
This is the initialization of the job scheduler; its last step sets up the periodic job fetch. The DefaultFetcherRunner's run method is then executed at each interval:
synchronized public void run() {
    try (SetThreadName ignored = new SetThreadName(//
            "FetcherRunner %s", System.identityHashCode(this))) {//
        // logger.debug("Job Fetcher is running...");
        Map<String, Executable> runningJobs = context.getRunningJobs();
        // check whether the job pool is full; by default at most 10 jobs run concurrently
        if (isJobPoolFull()) {
            return;
        }
        ......
        // iterate over all job ids
        for (final String id : executableManager.getAllJobIds()) {
            ......
            // fetch the concrete job by its id
            final AbstractExecutable executable = executableManager.getJob(id);
            ......
            // submit the job to the job pool
            addToJobPool(executable, executable.getDefaultPriority());
        }
        ......
    }
}
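The polling loop that drives this run method can be sketched as a self-contained demo (simplified, with the interval scaled down to milliseconds so it finishes quickly; in Kylin the fetcher scans job metadata and submits runnable jobs):

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Sketch of the fetch loop DefaultScheduler sets up: a one-thread scheduled
// pool runs the fetcher at a fixed rate, with an initial delay of one tenth
// of the interval, mirroring scheduleAtFixedRate(fetcher, pollSecond / 10,
// pollSecond, TimeUnit.SECONDS).
public class FetcherLoopDemo {
    static int runFetcherTimes(int times, long pollMillis) throws InterruptedException {
        ScheduledExecutorService fetcherPool = Executors.newScheduledThreadPool(1);
        CountDownLatch polls = new CountDownLatch(times);
        // stand-in for DefaultFetcherRunner: just count each invocation
        Runnable fetcher = polls::countDown;
        fetcherPool.scheduleAtFixedRate(fetcher, pollMillis / 10, pollMillis, TimeUnit.MILLISECONDS);
        polls.await(); // block until the fetcher has fired `times` times
        fetcherPool.shutdownNow();
        return times;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("fetcher fired " + runFetcherTimes(3, 50) + " times");
    }
}
```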
The main question is where all these jobs come from. The code above calls executableManager.getAllJobIds() to fetch all job ids; let's look at that function:
public List<String> getJobIds() throws PersistentException {
    try {
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_RESOURCE_ROOT);
        if (resources == null) {
            return Collections.emptyList();
        }
        ArrayList<String> result = Lists.newArrayListWithExpectedSize(resources.size());
        for (String path : resources) {
            result.add(path.substring(path.lastIndexOf("/") + 1));
        }
        return result;
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}
store.listResources reads, from the database that stores Kylin's metadata, the metadata entries whose paths start with "/execute", and the job id is cut out of each path. executableManager.getJob(id) is then called to load the full job information, again from the metadata store.
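The path-to-id extraction used by getJobIds is just a substring after the last '/', as in this small sketch (the uuid is made up for illustration):

```java
// Sketch of the id extraction in getJobIds: each resource path under
// /execute ends with the job id, which substring/lastIndexOf cuts out.
public class JobIdFromPath {
    static String jobIdOf(String path) {
        return path.substring(path.lastIndexOf("/") + 1);
    }

    public static void main(String[] args) {
        // hypothetical uuid, for illustration only
        System.out.println(jobIdOf("/execute/f8edd777-8756-40d5-be19-3159a214f168"));
    }
}
```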
protected void addToJobPool(AbstractExecutable executable, int priority) {
    String jobDesc = executable.toString();
    logger.info(jobDesc + " prepare to schedule and its priority is " + priority);
    try {
        context.addRunningJob(executable);
        // submit the job to the job pool for execution
        jobExecutor.execute(executable);
        logger.info(jobDesc + " scheduled");
    } catch (Exception ex) {
        context.removeRunningJob(executable);
        logger.warn(jobDesc + " fail to schedule", ex);
    }
}
Back in DefaultScheduler.init, the jobExecutor ultimately invokes JobRunner's run method to execute a job, which mainly calls executable.execute(context). Every concrete job type in Kylin extends AbstractExecutable. If a subclass overrides execute, that override runs; otherwise AbstractExecutable's execute runs and calls doWork to perform the actual work. Spark-related jobs use the SparkExecutable type, which extends AbstractExecutable and implements its own doWork to submit the Spark job. The main class of the submitted Spark application is SparkEntry: its main method calls AbstractApplication's execute method, which finally calls the concrete task class's execute method. That is the job scheduling code in Kylin; next, let's see how a job is submitted to the scheduling service.
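The execute/doWork split described above is the template method pattern. A simplified sketch (with made-up types, not Kylin's real classes) of how a SparkExecutable-like subclass only needs to supply doWork:

```java
// Sketch of the template method pattern used by AbstractExecutable:
// execute drives the lifecycle, subclasses override only doWork.
// Types and the "SUCCEED" result string are simplified for illustration.
abstract class SketchExecutable {
    final String execute() {
        onExecuteStart();
        String result = doWork(); // subclass-specific work
        onExecuteFinished();
        return result;
    }

    void onExecuteStart() {}
    void onExecuteFinished() {}
    protected abstract String doWork();
}

class SparkLikeExecutable extends SketchExecutable {
    @Override
    protected String doWork() {
        // a real SparkExecutable builds and submits a spark-submit command here
        return "SUCCEED";
    }
}

public class TemplateMethodDemo {
    public static void main(String[] args) {
        System.out.println(new SparkLikeExecutable().execute());
    }
}
```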
Job submission ultimately goes through JobService's submitJobInternal method, which calls getExecutableManager().addJob(job) to submit the job (here job is a DefaultChainedExecutable instance containing tasks of various Executable types). getExecutableManager returns the ExecutableManager singleton; its addJob calls executableDao.addJob(parse(executable)), which in turn calls writeJobResource(pathOfJob(job), job) to serialize the job and store it in the metadata database table.
JobRunner's run method is as follows:
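A DefaultChainedExecutable is a job made of ordered sub-tasks. As a rough sketch (simplified types; the step names below are invented for illustration and are not Kylin's real cube build steps), the chained job's own doWork just runs its sub-tasks one after another:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Sketch of a chained job in the spirit of DefaultChainedExecutable:
// the job-level executable runs its sub-tasks in order, each step
// finishing before the next starts. Step names are hypothetical.
public class ChainedJobDemo {
    interface Task {
        String run();
    }

    static List<String> runChain(List<Task> tasks) {
        List<String> results = new ArrayList<>();
        for (Task t : tasks) {
            results.add(t.run()); // sequential execution of sub-tasks
        }
        return results;
    }

    public static void main(String[] args) {
        List<Task> steps = Arrays.asList(
                () -> "step-1",
                () -> "step-2",
                () -> "step-3");
        System.out.println(runChain(steps));
    }
}
```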
public void run() {
    try (SetThreadName ignored = new SetThreadName("Scheduler %s Job %s",
            System.identityHashCode(DefaultScheduler.this), executable.getId())) {
        /* this executable is at the job level */
        executable.execute(context);
    } catch (ExecuteException e) {
        logger.error("ExecuteException job:" + executable.getId(), e);
    } catch (Exception e) {
        logger.error("unknown error execute job:" + executable.getId(), e);
    } finally {
        context.removeRunningJob(executable);
    }
    // trigger the next step asap
    fetcherPool.schedule(fetcher, 0, TimeUnit.SECONDS);
}
It calls the executable's execute method; the executable here is an AbstractExecutable, whose execute method is as follows:
public final ExecuteResult execute(ExecutableContext executableContext) throws ExecuteException {
    logger.info("Executing AbstractExecutable (" + this.getName() + ")");
    Preconditions.checkArgument(executableContext instanceof DefaultContext);
    ExecuteResult result = null;
    try {
        /**
         * note these four methods; they mark the order of execution
         */
        onExecuteStart(executableContext);
        Throwable exception;
        do {
            if (retry > 0) {
                logger.info("Retry " + retry);
            }
            exception = null;
            result = null;
            try {
                result = doWork(executableContext);
            } catch (Throwable e) {
                logger.error("error running Executable: " + this.toString());
                exception = e;
            }
            retry++;
        } while (needRetry(this.retry, exception)); //exception in ExecuteResult should handle by user itself.
        if (exception != null) {
            onExecuteError(exception, executableContext);
            throw new ExecuteException(exception);
        }
        onExecuteFinishedWithRetry(result, executableContext);
    } catch (ExecuteException e) {
        throw e;
    } catch (Exception e) {
        throw new ExecuteException(e);
    }
    return result;
}
Four methods are involved here, in order: 1. onExecuteStart; 2. doWork, the core method that performs the actual work; 3. onExecuteFinishedWithRetry; 4. onExecuteError, which runs if an error occurred.
That covers the main logic of job execution in Kylin. If you have any questions, feel free to leave me a comment. Thanks for reading.
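The retry loop around doWork can be distilled into a small sketch (the real code tracks a caught Throwable rather than a boolean, and needRetry's exact condition lives in AbstractExecutable; the fail-count and max-retry values below are made up for illustration):

```java
// Sketch of the do/while retry semantics in AbstractExecutable.execute:
// doWork is attempted, the retry counter is bumped, and the loop repeats
// while the last attempt failed and retries remain.
public class RetryLoopDemo {
    // returns how many attempts were made before success or giving up
    static int runWithRetry(int failTimes, int maxRetry) {
        int attempts = 0;
        int retry = 0;
        boolean succeeded;
        do {
            attempts++;
            // stand-in doWork: fail the first `failTimes` attempts
            succeeded = attempts > failTimes;
            retry++;
        } while (!succeeded && retry <= maxRetry); // stand-in needRetry
        return attempts;
    }

    public static void main(String[] args) {
        System.out.println("attempts=" + runWithRetry(2, 3));
    }
}
```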