Kylin: Job Execution (Source Code Walkthrough)


Part 3 of the Kylin source-code series: how jobs are run.

The job scheduling service class is JobService, in package org.apache.kylin.rest.service.

JobService implements the InitializingBean interface and therefore the afterPropertiesSet method; it is initialized when Spring loads it as a bean. The bean wiring comes from configuration files: ./tomcat/webapps/kylin/WEB-INF/web.xml pulls in ./tomcat/webapps/kylin/WEB-INF/classes/applicationContext.xml, and applicationContext.xml contains:

<context:component-scan base-package="org.apache.kylin.rest"/>

Spring then scans the org.apache.kylin.rest package for classes annotated with @Component and registers them as beans. Because JobService implements InitializingBean, Spring calls afterPropertiesSet right after the bean is initialized, and that call bootstraps JobService; the other services in Kylin are initialized the same way.

public void afterPropertiesSet() throws Exception {
    String timeZone = getConfig().getTimeZone();
    TimeZone tzone = TimeZone.getTimeZone(timeZone);
    TimeZone.setDefault(tzone);
    final KylinConfig kylinConfig = KylinConfig.getInstanceFromEnv(); 

    //get the configured job scheduler; the default is org.apache.kylin.job.impl.threadpool.DefaultScheduler
    final Scheduler<AbstractExecutable> scheduler = (Scheduler<AbstractExecutable>) SchedulerFactory
            .scheduler(kylinConfig.getSchedulerType());
    new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                //initialize the scheduling service
                scheduler.init(new JobEngineConfig(kylinConfig), new ZookeeperJobLock());
                if (!scheduler.hasStarted()) {
                    logger.info("scheduler has not been started");
                }
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        }
    }).start();

    Runtime.getRuntime().addShutdownHook(new Thread(new Runnable() {
        @Override
        public void run() {
            try {
                scheduler.shutdown();
            } catch (SchedulerException e) {
                logger.error("error occurred to shutdown scheduler", e);
            }
        }
    }));
}
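The Spring lifecycle that triggers the code above can be sketched in isolation. This is a minimal, hypothetical example: the InitializingBean interface is redeclared locally so the snippet is self-contained, whereas a real application would import Spring's own interface and let the container invoke the hook.

```java
// Minimal sketch of the InitializingBean lifecycle without a Spring container:
// the interface is redeclared locally so the example is self-contained. In a
// real application context, Spring calls afterPropertiesSet() automatically
// once the bean's properties have been set.
interface InitializingBean {
    void afterPropertiesSet() throws Exception;
}

public class LifecycleSketch {
    static class MyService implements InitializingBean {
        boolean started = false;

        @Override
        public void afterPropertiesSet() {
            started = true; // JobService starts its scheduler thread here
        }
    }

    public static void main(String[] args) throws Exception {
        MyService service = new MyService(); // Spring instantiates the bean
        service.afterPropertiesSet();        // then invokes the lifecycle hook
        System.out.println("started = " + service.started);
    }
}
```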

In afterPropertiesSet above, the scheduler is initialized on a dedicated thread. The available scheduler implementations are registered as follows:

public Map<Integer, String> getSchedulers() {
    Map<Integer, String> r = Maps.newLinkedHashMap();
    r.put(0, "org.apache.kylin.job.impl.threadpool.DefaultScheduler");
    r.put(2, "org.apache.kylin.job.impl.threadpool.DistributedScheduler");
    r.put(77, "org.apache.kylin.job.impl.threadpool.NoopScheduler");
    r.putAll(convertKeyToInteger(getPropertiesByPrefix("kylin.job.scheduler.provider.")));
    return r;
}

Kylin ships three schedulers; the first, DefaultScheduler, is the default, and the other two are worth reading on your own. Which one is used is determined by the kylin.job.scheduler.default configuration item, whose default value is 0.
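The lookup can be illustrated with a minimal registry sketch. This is not Kylin's actual SchedulerFactory: the configuration is read from a plain Properties object here, standing in for KylinConfig.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Properties;

// Sketch of how a scheduler type id resolves to an implementation class
// name, mirroring the getSchedulers() map above.
public class SchedulerRegistrySketch {
    static Map<Integer, String> schedulers() {
        Map<Integer, String> r = new LinkedHashMap<>();
        r.put(0, "org.apache.kylin.job.impl.threadpool.DefaultScheduler");
        r.put(2, "org.apache.kylin.job.impl.threadpool.DistributedScheduler");
        r.put(77, "org.apache.kylin.job.impl.threadpool.NoopScheduler");
        return r;
    }

    // Resolve the configured type id (defaulting to 0) to a class name.
    static String resolve(Properties conf) {
        int type = Integer.parseInt(conf.getProperty("kylin.job.scheduler.default", "0"));
        String clazz = schedulers().get(type);
        if (clazz == null)
            throw new IllegalArgumentException("unknown scheduler type: " + type);
        return clazz;
    }

    public static void main(String[] args) {
        Properties conf = new Properties(); // no explicit setting -> type 0
        System.out.println(resolve(conf));  // the DefaultScheduler class name
    }
}
```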

public synchronized void init(JobEngineConfig jobEngineConfig, JobLock lock) throws SchedulerException {

    jobLock = lock;
    String serverMode = jobEngineConfig.getConfig().getServerMode();
    //only server modes "job" and "all" run the job scheduler; "query" does not
    if (!("job".equals(serverMode.toLowerCase()) || "all".equals(serverMode.toLowerCase()))) {
        logger.info("server mode: " + serverMode + ", no need to run job scheduler");
        return;
    }
    logger.info("Initializing Job Engine ....");

    if (!initialized) {
        initialized = true;
    } else {
        return;
    }

    this.jobEngineConfig = jobEngineConfig;

    if (jobLock.lockJobEngine() == false) {
        throw new IllegalStateException("Cannot start job scheduler due to lack of job lock");
    }
    executableManager = ExecutableManager.getInstance(jobEngineConfig.getConfig());

    //load all executable, set them to a consistent status
    fetcherPool = Executors.newScheduledThreadPool(1);
    int corePoolSize = jobEngineConfig.getMaxConcurrentJobLimit();
    jobPool = new ThreadPoolExecutor(corePoolSize, corePoolSize, Long.MAX_VALUE, TimeUnit.DAYS,
            new SynchronousQueue<Runnable>());
    context = new DefaultContext(Maps.<String, Executable> newConcurrentMap(), jobEngineConfig.getConfig());
    logger.info("Staring resume all running jobs.");
    executableManager.resumeAllRunningJobs();
    logger.info("Finishing resume all running jobs.");

    //get the polling interval
    int pollSecond = jobEngineConfig.getPollIntervalSecond();
    logger.info("Fetching jobs every {} seconds", pollSecond);
    JobExecutor jobExecutor = new JobExecutor() {

        @Override
        public void execute(AbstractExecutable executable) {

            jobPool.execute(new JobRunner(executable));
        }

    };
    //use priority-aware fetching if configured; the default is DefaultFetcherRunner (no priorities)
    fetcher = jobEngineConfig.getJobPriorityConsidered()
            ? new PriorityFetcherRunner(jobEngineConfig, context, executableManager, jobExecutor)
            : new DefaultFetcherRunner(jobEngineConfig, context, executableManager, jobExecutor);
    logger.info("Creating fetcher pool instance:" + System.identityHashCode(fetcher));

    //fetch jobs once every pollSecond seconds
    fetcherPool.scheduleAtFixedRate(fetcher, pollSecond / 10, pollSecond, TimeUnit.SECONDS);
    hasStarted = true;

}

That completes the scheduler's initialization; its last step schedules how often jobs are fetched. From then on, DefaultFetcherRunner's run method executes at each interval:

synchronized public void run() {

    try (SetThreadName ignored = new SetThreadName(//
            "FetcherRunner %s", System.identityHashCode(this))) {//
        // logger.debug("Job Fetcher is running...");
        Map<String, Executable> runningJobs = context.getRunningJobs();
        // bail out if the job pool is full; by default only 10 jobs run concurrently
        if (isJobPoolFull()) {
            return;
        }
        ......
        //iterate over all job ids
        for (final String id : executableManager.getAllJobIds()) {
            ......
            //load the concrete job by its id
            final AbstractExecutable executable = executableManager.getJob(id);
            ......
            //submit the job to the job pool
            addToJobPool(executable, executable.getDefaultPriority());
        }
      ......
    }
}
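The early-exit guard at the top of run() can be sketched as follows. The limit of 10 and the map-of-running-jobs representation are assumptions for the sketch; in Kylin the limit comes from configuration (getMaxConcurrentJobLimit) and the map lives in the scheduler context.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Sketch of the "pool full" guard: the fetcher bails out early once the
// number of running jobs reaches the concurrency limit.
public class JobPoolGuardSketch {
    static final int MAX_CONCURRENT_JOBS = 10; // assumed default limit
    static final Map<String, Object> runningJobs = new ConcurrentHashMap<>();

    static boolean isJobPoolFull() {
        return runningJobs.size() >= MAX_CONCURRENT_JOBS;
    }

    public static void main(String[] args) {
        for (int i = 0; i < 10; i++)
            runningJobs.put("job-" + i, new Object());
        System.out.println(isJobPoolFull()); // true: no new jobs are fetched
    }
}
```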

The key question is where all these jobs come from. The fetcher calls executableManager.getAllJobIds() to list every job id, which ends up in the following method:

public List<String> getJobIds() throws PersistentException {

    try {
        NavigableSet<String> resources = store.listResources(ResourceStore.EXECUTE_RESOURCE_ROOT);
        if (resources == null) {
            return Collections.emptyList();
        }

        ArrayList<String> result = Lists.newArrayListWithExpectedSize(resources.size());
        for (String path : resources) {
            result.add(path.substring(path.lastIndexOf("/") + 1));
        }
        return result;
    } catch (IOException e) {
        logger.error("error get all Jobs:", e);
        throw new PersistentException(e);
    }
}

store.listResources queries the database holding Kylin's metadata for entries under "/execute" and the job id is cut out as the last path segment. executableManager.getJob(id) then loads the full job definition, again from the metadata store.
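The id extraction is just a substring after the last slash, as in getJobIds() above. The resource path below is hypothetical, for illustration only.

```java
// Sketch of how a job id is recovered from a metadata-store resource path:
// take the substring after the last '/'.
public class JobIdFromPath {
    static String jobId(String resourcePath) {
        return resourcePath.substring(resourcePath.lastIndexOf('/') + 1);
    }

    public static void main(String[] args) {
        // hypothetical resource path under the /execute root
        System.out.println(jobId("/execute/a1b2c3d4-job-uuid")); // a1b2c3d4-job-uuid
    }
}
```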

protected void addToJobPool(AbstractExecutable executable, int priority) {

    String jobDesc = executable.toString();
    logger.info(jobDesc + " prepare to schedule and its priority is " + priority);
    try {
        context.addRunningJob(executable);
        //submit the job to the pool for execution
        jobExecutor.execute(executable);
        logger.info(jobDesc + " scheduled");
    } catch (Exception ex) {
        context.removeRunningJob(executable);
        logger.warn(jobDesc + " fail to schedule", ex);
    }
}

Back in DefaultScheduler.init, the jobExecutor ultimately runs each job via JobRunner's run method, which calls executable.execute(context). Every concrete task in Kylin extends AbstractExecutable. If a subclass overrides execute, that override runs; otherwise AbstractExecutable.execute runs and delegates the real work to doWork. Spark-related tasks use SparkExecutable, a subclass of AbstractExecutable that implements doWork to submit the Spark job. The Spark driver's main class is SparkEntry: its main method calls AbstractApplication's execute method, which finally invokes the concrete task class's execute method. That covers the scheduling side; next, let's look at how jobs are submitted to the scheduling service in the first place.
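Before moving on, the execute/doWork dispatch just described can be sketched as a template method. The class names mirror Kylin's, but the bodies are simplified stand-ins, not Kylin code.

```java
// Sketch of the template-method dispatch: the base class owns execute(),
// and concrete jobs (here a stand-in for SparkExecutable) override doWork().
abstract class ExecutableSketch {
    public final String execute() {
        // lifecycle hooks (onExecuteStart etc.) are omitted in this sketch
        return doWork();
    }

    protected abstract String doWork();
}

public class SparkExecutableSketch extends ExecutableSketch {
    @Override
    protected String doWork() {
        // the real SparkExecutable builds a spark-submit command whose
        // driver main class is SparkEntry
        return "submitted spark job";
    }

    public static void main(String[] args) {
        System.out.println(new SparkExecutableSketch().execute()); // submitted spark job
    }
}
```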

Job submission ultimately goes through JobService's submitJobInternal method, which calls getExecutableManager().addJob(job) to submit the job (here job is a DefaultChainedExecutable instance containing a series of Executable tasks). getExecutableManager returns the ExecutableManager singleton; its addJob calls executableDao.addJob(parse(executable)), which in turn calls writeJobResource(pathOfJob(job), job) to serialize the job and write it into the metadata database table.
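The chained-job shape can be sketched as an ordered list of sub-tasks. This is a simplified stand-in for DefaultChainedExecutable, with a plain list instead of the metadata store.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of a chained job: an ordered list of sub-tasks that run strictly
// in sequence, as DefaultChainedExecutable does with its Executable tasks.
public class ChainedJobSketch {
    interface Task {
        String run();
    }

    static class ChainedJob {
        final List<Task> tasks = new ArrayList<>();

        void addTask(Task t) {
            tasks.add(t);
        }

        List<String> runAll() {
            List<String> results = new ArrayList<>();
            for (Task t : tasks)
                results.add(t.run()); // sub-tasks run strictly in order
            return results;
        }
    }

    public static void main(String[] args) {
        ChainedJob job = new ChainedJob();
        // hypothetical cube-build steps, for illustration only
        job.addTask(() -> "build dictionary");
        job.addTask(() -> "build cube with spark");
        System.out.println(job.runAll()); // [build dictionary, build cube with spark]
    }
}
```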

JobRunner's run method looks like this:

public void run() {
	try (SetThreadName ignored = new SetThreadName("Scheduler %s Job %s",
	        System.identityHashCode(DefaultScheduler.this), executable.getId())) {
	    /* this executable is a job-level task */
	    executable.execute(context);
	} catch (ExecuteException e) {
	    logger.error("ExecuteException job:" + executable.getId(), e);
	} catch (Exception e) {
	    logger.error("unknown error execute job:" + executable.getId(), e);
	} finally {
	    context.removeRunningJob(executable);
	}

	// trigger the next step asap
	fetcherPool.schedule(fetcher, 0, TimeUnit.SECONDS);
}

It calls the executable's execute method. Since executable is an AbstractExecutable, that execute method looks like this:

public final ExecuteResult execute(ExecutableContext executableContext) throws ExecuteException {

	logger.info("Executing AbstractExecutable (" + this.getName() + ")");

	Preconditions.checkArgument(executableContext instanceof DefaultContext);
	ExecuteResult result = null;

	try {
	    /**
	     * note these four methods: they mark the order of execution
	     */
	    onExecuteStart(executableContext);
	    Throwable exception;
	    do {
	        if (retry > 0) {
	            logger.info("Retry " + retry);
	        }
	        exception = null;
	        result = null;
	        try {
	            result = doWork(executableContext);
	        } catch (Throwable e) {
	            logger.error("error running Executable: " + this.toString());
	            exception = e;
	        }
	        retry++;
	    } while (needRetry(this.retry, exception)); //exception in ExecuteResult should handle by user itself.

	    if (exception != null) {
	        onExecuteError(exception, executableContext);
	        throw new ExecuteException(exception);
	    }

	    onExecuteFinishedWithRetry(result, executableContext);
	} catch (ExecuteException e) {
	    throw e;
	} catch (Exception e) {
	    throw new ExecuteException(e);
	}
	return result;
}

Four methods run in sequence here: 1. onExecuteStart; 2. doWork, the core task-execution method; 3. onExecuteFinishedWithRetry; 4. onExecuteError, invoked if execution fails.
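The retry loop inside execute() is worth isolating. The sketch below mirrors its do/while structure; the limit of 3 and the always-fails-twice doWork are assumptions for the demo, while Kylin's real limit comes from configuration.

```java
// Sketch of the retry loop in AbstractExecutable.execute(): doWork() is
// retried while needRetry() says so, counting attempts as it goes.
public class RetryLoopSketch {
    static final int MAX_RETRY = 3; // assumed retry limit
    static int attempts = 0;

    static String doWork() {
        attempts++;
        if (attempts < 3)
            throw new RuntimeException("transient failure"); // fails twice
        return "success";
    }

    static boolean needRetry(int retry, Throwable exception) {
        // retry only when an exception occurred and the limit is not reached
        return exception != null && retry < MAX_RETRY;
    }

    public static void main(String[] args) {
        String result = null;
        Throwable exception;
        int retry = 0;
        do {
            exception = null;
            try {
                result = doWork();
            } catch (Throwable e) {
                exception = e;
            }
            retry++;
        } while (needRetry(retry, exception));
        System.out.println(result + " after " + attempts + " attempts"); // success after 3 attempts
    }
}
```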

That is the main logic at a high level. Feel free to leave me a comment with any questions. Thanks for reading.