I. The ExecutionGraph Source Code Mechanism
First, recall that both the StreamGraph and the JobGraph are created on the client side. Once the client has built these two graphs, it submits the job to the JobManager, and the JobManager then generates the corresponding ExecutionGraph from the JobGraph.
Second, the ExecutionGraph is the core data structure Flink uses for job scheduling: it contains every SubTask, every intermediate stream, and the relationships between them.
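To make that layering concrete, here is a small sketch (a hypothetical two-vertex job with parallelism 2, purely illustrative, not from the Flink source) of how the pieces relate:
// JobGraph:       JobVertex[Source->Map] --JobEdge--> JobVertex[KeyBy->Window]
// ExecutionGraph: ExecutionJobVertex[Source->Map]
//                   ExecutionVertex #0 -> IntermediateResultPartition #0 \
//                   ExecutionVertex #1 -> IntermediateResultPartition #1  } IntermediateResult
//                 ExecutionJobVertex[KeyBy->Window]                      /
//                   ExecutionVertex #0 <- ExecutionEdges to both partitions (ALL_TO_ALL)
//                   ExecutionVertex #1 <- ExecutionEdges to both partitions (ALL_TO_ALL)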
0. The conclusion first
Taking per-job mode as an example, here is how the ExecutionGraph is generated: when the Dispatcher creates the JobManagerRunner, it calls createJobManagerRunner:
Dispatcher.createJobManagerRunner()
-> JobManagerRunnerFactory implementation (i.e. DefaultJobManagerRunnerFactory).createJobManagerRunner()
-> new JobManagerRunnerImpl()
-> JobMasterServiceFactory implementation (i.e. DefaultJobMasterServiceFactory).createJobMasterService()
-> new JobMaster()
That completes the call chain that creates the JobMaster. Now inside JobMaster:
new JobMaster() -> createScheduler() creates the scheduler
-> SchedulerNGFactory implementation (i.e. DefaultSchedulerFactory).createInstance()
-> new DefaultScheduler() -> super(), i.e. new SchedulerBase() -> createExecutionGraph()
->ExecutionGraphBuilder.buildGraph()
That completes the call chain that creates the ExecutionGraph. Now the concrete creation steps:
- Create and initialize the execution graph: if one already exists, reuse it; otherwise create a new one: executionGraph = (prior != null) ? prior : new ExecutionGraph()
- Initialize each JobVertex: for (JobVertex vertex : jobGraph.getVertices())
- Topologically sort the JobGraph to ensure data flows in the right direction: List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
- Core logic: executionGraph.attachJobGraph(sortedTopology), which attaches the topologically sorted JobGraph to the executionGraph data structure
- Then some configuration steps, 。。。 (omitted)
- Finally, return executionGraph
Now to the core code, ExecutionGraph.attachJobGraph():
- Assert that the method runs in the JobMaster's main thread, to avoid multi-thread concurrency problems: assertRunningInJobMasterMainThread()
- Initialization: new ArrayList<>(topologiallySorted.size());
- Iterate over each JobVertex:
  - Create the execution-graph node: for each JobVertex, create the corresponding ExecutionJobVertex (which internally creates one ExecutionVertex per parallel subtask): new ExecutionJobVertex()
  - The core connection logic: ejv.connectToPredecessors(this.intermediateResults), which connects the newly created ExecutionJobVertex to its upstream IntermediateResults
  - Task uniqueness check: this.tasks.putIfAbsent(jobVertex.getID(), ejv);
  - Iterate over all intermediate results (IntermediateResult) this task produces and enforce their uniqueness, to keep data consistent and prevent a corrupted graph: for (IntermediateResult res : ejv.getProducedDataSets()) { IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
  - Add this node's parallelism to the total vertex count, because the execution graph is the parallelized version of the job graph: this.numVerticesTotal += ejv.getParallelism(); newExecJobVertices.add(ejv);
- Build the execution topology from the current execution graph: executionTopology = DefaultExecutionTopology.fromExecutionGraph(this);
- Failover recovery and task monitoring: failoverStrategy.notifyNewVertices(newExecJobVertices);
- Configure the partition-release strategy: partitionReleaseStrategy = partitionReleaseStrategyFactory.createInstance(getSchedulingTopology());
Next, the core connection code, ExecutionJobVertex.connectToPredecessors():
- Get the current JobVertex's list of incoming JobEdges: jobVertex.getInputs();
- Iterate over each incoming JobEdge:
  - Through the edge's source JobVertex ID, look up the IntermediateResult that vertex produces: IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
  - Add the IntermediateResult to the inputs of the current ExecutionJobVertex: this.inputs.add(ires);
  - Register a consumer (namely the current ExecutionJobVertex) on the IntermediateResult, which returns consumerIndex: int consumerIndex = ires.registerConsumer();
  - Since every degree of parallelism corresponds to one node, connect each node to the upstream intermediate result: ev.connectSource(num, ires, edge, consumerIndex);
Finally, the code that actually makes the connection, ExecutionVertex.connectSource():
- Get the edge's distribution pattern: only forward-style (and rescale) connections are POINTWISE; everything else is ALL_TO_ALL: final DistributionPattern pattern = edge.getDistributionPattern();
- Get the physical partitions of the intermediate result; each partition corresponds to one producing SubTask: final IntermediateResultPartition[] sourcePartitions = source.getPartitions();
- Create the execution edges according to the pattern: edges = connectAllToAll(sourcePartitions, inputNumber);
- Store the connection info: inputEdges[inputNumber] = edges;
- Using the consumer index registered earlier, add a consumer to each IntermediateResultPartition, i.e. attach it to the ExecutionEdge (the consumer itself was registered beforehand): ee.getSource().addConsumer(ee, consumerNumber);
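As a sanity check on the steps above, here is a back-of-the-envelope count for a hypothetical job (the topology and parallelisms are assumptions for illustration, not taken from the source): [Source->Map] p=2 --hash--> [KeyBy->Window] p=2 --rebalance--> [Sink] p=1.
public class ExecutionGraphCounts {
    public static void main(String[] args) {
        // Hypothetical job: [Source->Map] p=2 --hash--> [KeyBy->Window] p=2 --rebalance--> [Sink] p=1
        int executionJobVertices = 3;                 // one ExecutionJobVertex per JobVertex
        int numVerticesTotal = 2 + 2 + 1;             // one ExecutionVertex per parallel subtask (5 in total)
        int mapPartitions = 2;                        // IntermediateResultPartition count = producer parallelism
        int windowPartitions = 2;
        // hash and rebalance are both ALL_TO_ALL, so edges = source partitions x consumer parallelism
        int edgesMapToWindow = mapPartitions * 2;     // 4 ExecutionEdges
        int edgesWindowToSink = windowPartitions * 1; // 2 ExecutionEdges
        System.out.printf("EJV=%d, EV=%d, ExecutionEdges=%d%n",
                executionJobVertices, numVerticesTotal, edgesMapToWindow + edgesWindowToSink);
    }
}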
Preface:
- The JobManager's path to the ExecutionGraph is almost entirely nested delegation: a calls b, b calls c. Code unrelated to the ExecutionGraph has been cut below, so wherever you see 。。。, that part of the code was removed because it is irrelevant here.
- If you are not yet familiar with the JobGraph, first read Flink-Graph-3 (JobGraph generation source code) and Flink-Graph-2 (StreamGraph generation source code).
1. Starting from the Dispatcher launching the JobManager
@Override
public void onStart() throws Exception {
try {
/*TODO Start the dispatcher services*/
startDispatcherServices();
} catch (Throwable t) {
final DispatcherException exception = new DispatcherException(String.format("Could not start the Dispatcher %s", getAddress()), t);
onFatalError(exception);
throw exception;
}
/*TODO Start the JobMaster(s)*/
startRecoveredJobs();
this.dispatcherBootstrap = this.dispatcherBootstrapFactory.create(
getSelfGateway(DispatcherGateway.class),
this.getRpcService().getScheduledExecutor() ,
this::onFatalError);
}
// startRecoveredJobs, called from onStart
private void startRecoveredJobs() {
// Iterate over each recovered JobGraph and call runRecoveredJob
for (JobGraph recoveredJob : recoveredJobs) {
runRecoveredJob(recoveredJob);
}
recoveredJobs.clear();
}
// runRecoveredJob, called from startRecoveredJobs
private void runRecoveredJob(final JobGraph recoveredJob) {
checkNotNull(recoveredJob);
try {
runJob(recoveredJob, ExecutionType.RECOVERY);
} catch (Throwable throwable) {
onFatalError(new DispatcherException(String.format("Could not start recovered job %s.", recoveredJob.getJobID()), throwable));
}
}
// runJob, called from runRecoveredJob
private void runJob(JobGraph jobGraph, ExecutionType executionType) {
Preconditions.checkState(!runningJobs.containsKey(jobGraph.getJobID()));
long initializationTimestamp = System.currentTimeMillis();
// Here the JobManagerRunner is created from the JobGraph
CompletableFuture<JobManagerRunner> jobManagerRunnerFuture = createJobManagerRunner(jobGraph, initializationTimestamp);
。。。
}
OK, so the Dispatcher calls its own createJobManagerRunner() method, passes in the JobGraph, and ultimately produces a JobManagerRunner. Let's look at the createJobManagerRunner method:
CompletableFuture<JobManagerRunner> createJobManagerRunner(JobGraph jobGraph, long initializationTimestamp) {
final RpcService rpcService = getRpcService();
return CompletableFuture.supplyAsync(
() -> {
try {
/**
* 1. Create the JobManagerRunner (and with it the JobMaster).
* jobManagerRunnerFactory here is an interface with a single createJobManagerRunner method; its implementation classes are shown in the figure below.
*/
JobManagerRunner runner = jobManagerRunnerFactory.createJobManagerRunner(
jobGraph,
configuration,
rpcService,
highAvailabilityServices,
heartbeatServices,
jobManagerSharedServices,
new DefaultJobManagerJobMetricGroupFactory(jobManagerMetricGroup),
fatalErrorHandler,
initializationTimestamp);
/*TODO Start the JobMaster*/
runner.start();
return runner;
} catch (Exception e) {
throw new CompletionException(new JobInitializationException(jobGraph.getJobID(), "Could not instantiate JobManager.", e));
}
},
ioExecutor); // do not use main thread executor. Otherwise, Dispatcher is blocked on JobManager creation
}
2. The JobManagerRunnerFactory interface
As noted above, the Dispatcher ultimately calls createJobManagerRunner() on a JobManagerRunnerFactory to create the JobManagerRunner, and JobManagerRunnerFactory is an interface:
@FunctionalInterface
public interface JobManagerRunnerFactory {
JobManagerRunner createJobManagerRunner(
JobGraph jobGraph,
Configuration configuration,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
HeartbeatServices heartbeatServices,
JobManagerSharedServices jobManagerServices,
JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
FatalErrorHandler fatalErrorHandler,
long initializationTimestamp) throws Exception;
}
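Since the interface is annotated @FunctionalInterface, a caller could in principle supply it as a lambda; a minimal hypothetical sketch (myTestRunner is an imaginary, pre-built JobManagerRunner, e.g. in a test — not something Flink itself does here):
JobManagerRunnerFactory stubFactory =
    (jobGraph, configuration, rpcService, haServices, heartbeatServices,
     sharedServices, metricGroupFactory, fatalErrorHandler, initializationTimestamp) ->
        myTestRunner; // hypothetical JobManagerRunner instance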
Its implementation classes are shown in the figure; we will look at the default one, DefaultJobManagerRunnerFactory.
(1) The implementation class DefaultJobManagerRunnerFactory
DefaultJobManagerRunnerFactory creates the runner via new JobManagerRunnerImpl(). Note that DefaultJobManagerRunnerFactory:
- passes in a JobMasterServiceFactory implementation, DefaultJobMasterServiceFactory
- calls SchedulerNGFactoryFactory.createSchedulerNGFactory(configuration); to create a SchedulerNGFactory implementation, DefaultSchedulerFactory
public enum DefaultJobManagerRunnerFactory implements org.apache.flink.runtime.dispatcher.JobManagerRunnerFactory {
INSTANCE;
/**
* Create the JobManagerRunner from the JobGraph
* @param jobGraph the job graph to execute, containing the job's topology and configuration
* @param configuration the global Flink configuration
* @param rpcService the RPC service
* @param highAvailabilityServices high-availability services
* @param heartbeatServices heartbeat services
* @param jobManagerServices shared JobManager services (class loader, scheduled executors, ...)
* @param jobManagerJobMetricGroupFactory factory for the job's metric group
* @param fatalErrorHandler handler for fatal errors
* @param initializationTimestamp timestamp of job initialization
* @return
* @throws Exception
*/
@Override
public JobManagerRunner createJobManagerRunner(
JobGraph jobGraph,
Configuration configuration,
RpcService rpcService,
HighAvailabilityServices highAvailabilityServices,
HeartbeatServices heartbeatServices,
JobManagerSharedServices jobManagerServices,
JobManagerJobMetricGroupFactory jobManagerJobMetricGroupFactory,
FatalErrorHandler fatalErrorHandler,
long initializationTimestamp) throws Exception {
final JobMasterConfiguration jobMasterConfiguration = JobMasterConfiguration.fromConfiguration(configuration);
final SlotPoolFactory slotPoolFactory = SlotPoolFactory.fromConfiguration(configuration);
// By default this method returns a DefaultSchedulerFactory
final SchedulerNGFactory schedulerNGFactory = SchedulerNGFactoryFactory.createSchedulerNGFactory(configuration);
final ShuffleMaster<?> shuffleMaster = ShuffleServiceLoader.loadShuffleServiceFactory(configuration).createShuffleMaster(configuration);
final JobMasterServiceFactory jobMasterFactory = new DefaultJobMasterServiceFactory(
jobMasterConfiguration,
slotPoolFactory,
rpcService,
highAvailabilityServices,
jobManagerServices,
heartbeatServices,
jobManagerJobMetricGroupFactory,
fatalErrorHandler,
schedulerNGFactory,
shuffleMaster);
return new JobManagerRunnerImpl(
jobGraph,
jobMasterFactory,
highAvailabilityServices,
jobManagerServices.getLibraryCacheManager().registerClassLoaderLease(jobGraph.getJobID()),
jobManagerServices.getScheduledExecutorService(),
fatalErrorHandler,
initializationTimestamp);
}
}
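Side note: the factory is declared as an enum with a single INSTANCE constant — the standard Java enum-singleton idiom, which gives a thread-safe, serialization-safe singleton for free. A generic sketch of the pattern (illustrative, not Flink code):
public enum Singleton {
    INSTANCE; // the one and only instance, created by the JVM on class initialization
    public void doWork() {
        // stateless behavior; callers use Singleton.INSTANCE.doWork()
    }
}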
(2) JobManagerRunnerImpl, called by DefaultJobManagerRunnerFactory
JobManagerRunnerImpl calls createJobMasterService() on the JobMasterServiceFactory passed in by DefaultJobManagerRunnerFactory:
public JobManagerRunnerImpl(
final JobGraph jobGraph,
final JobMasterServiceFactory jobMasterFactory,
final HighAvailabilityServices haServices,
final LibraryCacheManager.ClassLoaderLease classLoaderLease,
final Executor executor,
final FatalErrorHandler fatalErrorHandler,
long initializationTimestamp) throws Exception {
。。。
// Start creating the JobMaster service
this.jobMasterService = jobMasterFactory.createJobMasterService(jobGraph, this, userCodeLoader, initializationTimestamp);
}
(3) DefaultJobMasterServiceFactory, called by JobManagerRunnerImpl
DefaultJobMasterServiceFactory in turn calls new JobMaster():
@Override
public JobMaster createJobMasterService(
JobGraph jobGraph,
OnCompletionActions jobCompletionActions,
ClassLoader userCodeClassloader,
long initializationTimestamp) throws Exception {
return new JobMaster(
rpcService,
jobMasterConfiguration,
ResourceID.generate(),
jobGraph,
haServices,
slotPoolFactory,
jobManagerSharedServices,
heartbeatServices,
jobManagerJobMetricGroupFactory,
jobCompletionActions,
fatalErrorHandler,
userCodeClassloader,
schedulerNGFactory,
shuffleMaster,
lookup -> new JobMasterPartitionTrackerImpl(
jobGraph.getJobID(),
shuffleMaster,
lookup
),
new DefaultExecutionDeploymentTracker(),
DefaultExecutionDeploymentReconciler::new,
initializationTimestamp);
}
OK, now we have seen how the JobMaster is created; on to the JobMaster class.
3. JobMaster
public JobMaster(
RpcService rpcService,
JobMasterConfiguration jobMasterConfiguration,
ResourceID resourceId,
JobGraph jobGraph,
HighAvailabilityServices highAvailabilityService,
SlotPoolFactory slotPoolFactory,
JobManagerSharedServices jobManagerSharedServices,
HeartbeatServices heartbeatServices,
JobManagerJobMetricGroupFactory jobMetricGroupFactory,
OnCompletionActions jobCompletionActions,
FatalErrorHandler fatalErrorHandler,
ClassLoader userCodeLoader,
SchedulerNGFactory schedulerNGFactory,
ShuffleMaster<?> shuffleMaster,
PartitionTrackerFactory partitionTrackerFactory,
ExecutionDeploymentTracker executionDeploymentTracker,
ExecutionDeploymentReconciler.Factory executionDeploymentReconcilerFactory,
long initializationTimestamp) throws Exception {
// Configuration setup, omitted
。。。
// Key point: create the scheduler; during its creation the JobGraph is converted into the ExecutionGraph
this.schedulerNG = createScheduler(executionDeploymentTracker, jobManagerJobMetricGroup);
。。。
}
(1) createScheduler(), which in turn calls schedulerNGFactory.createInstance(); this schedulerNGFactory is the DefaultSchedulerFactory created earlier by DefaultJobManagerRunnerFactory:
private SchedulerNG createScheduler(ExecutionDeploymentTracker executionDeploymentTracker,
final JobManagerJobMetricGroup jobManagerJobMetricGroup) throws Exception {
// Delegates to schedulerNGFactory.createInstance
return schedulerNGFactory.createInstance(
log,
jobGraph,
backPressureStatsTracker,
scheduledExecutorService,
jobMasterConfiguration.getConfiguration(),
slotPool,
scheduledExecutorService,
userCodeLoader,
highAvailabilityServices.getCheckpointRecoveryFactory(),
rpcTimeout,
blobWriter,
jobManagerJobMetricGroup,
jobMasterConfiguration.getSlotRequestTimeout(),
shuffleMaster,
partitionTracker,
executionDeploymentTracker,
initializationTimestamp);
}
(2) The DefaultSchedulerFactory class, called from createScheduler()
It in turn creates the default scheduler via new DefaultScheduler():
public class DefaultSchedulerFactory implements SchedulerNGFactory {
@Override
public SchedulerNG createInstance(
final Logger log,
final JobGraph jobGraph,
final BackPressureStatsTracker backPressureStatsTracker,
final Executor ioExecutor,
final Configuration jobMasterConfiguration,
final SlotPool slotPool,
final ScheduledExecutorService futureExecutor,
final ClassLoader userCodeLoader,
final CheckpointRecoveryFactory checkpointRecoveryFactory,
final Time rpcTimeout,
final BlobWriter blobWriter,
final JobManagerJobMetricGroup jobManagerJobMetricGroup,
final Time slotRequestTimeout,
final ShuffleMaster<?> shuffleMaster,
final JobMasterPartitionTracker partitionTracker,
final ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp) throws Exception {
。。。
return new DefaultScheduler(
log,
jobGraph,
backPressureStatsTracker,
ioExecutor,
jobMasterConfiguration,
schedulerComponents.getStartUpAction(),
futureExecutor,
new ScheduledExecutorServiceAdapter(futureExecutor),
userCodeLoader,
checkpointRecoveryFactory,
rpcTimeout,
blobWriter,
jobManagerJobMetricGroup,
shuffleMaster,
partitionTracker,
schedulerComponents.getSchedulingStrategyFactory(),
FailoverStrategyFactoryLoader.loadFailoverStrategyFactory(jobMasterConfiguration),
restartBackoffTimeStrategy,
new DefaultExecutionVertexOperations(),
new ExecutionVertexVersioner(),
schedulerComponents.getAllocatorFactory(),
executionDeploymentTracker,
initializationTimestamp);
}
}
(3) The DefaultScheduler class, called by DefaultSchedulerFactory
It invokes the constructor of its parent class, SchedulerBase:
public class DefaultScheduler extends SchedulerBase implements SchedulerOperations {
DefaultScheduler(
final Logger log,
final JobGraph jobGraph,
final BackPressureStatsTracker backPressureStatsTracker,
final Executor ioExecutor,
final Configuration jobMasterConfiguration,
final Consumer<ComponentMainThreadExecutor> startUpAction,
final ScheduledExecutorService futureExecutor,
final ScheduledExecutor delayExecutor,
final ClassLoader userCodeLoader,
final CheckpointRecoveryFactory checkpointRecoveryFactory,
final Time rpcTimeout,
final BlobWriter blobWriter,
final JobManagerJobMetricGroup jobManagerJobMetricGroup,
final ShuffleMaster<?> shuffleMaster,
final JobMasterPartitionTracker partitionTracker,
final SchedulingStrategyFactory schedulingStrategyFactory,
final FailoverStrategy.Factory failoverStrategyFactory,
final RestartBackoffTimeStrategy restartBackoffTimeStrategy,
final ExecutionVertexOperations executionVertexOperations,
final ExecutionVertexVersioner executionVertexVersioner,
final ExecutionSlotAllocatorFactory executionSlotAllocatorFactory,
final ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp) throws Exception {
// The core logic: invoke the parent class SchedulerBase's constructor
super(
log,
jobGraph,
backPressureStatsTracker,
ioExecutor,
jobMasterConfiguration,
new ThrowingSlotProvider(), // this is not used any more in the new scheduler
futureExecutor,
userCodeLoader,
checkpointRecoveryFactory,
rpcTimeout,
new ThrowingRestartStrategy.ThrowingRestartStrategyFactory(),
blobWriter,
jobManagerJobMetricGroup,
Time.seconds(0), // this is not used any more in the new scheduler
shuffleMaster,
partitionTracker,
executionVertexVersioner,
executionDeploymentTracker,
false,
initializationTimestamp);
。。。
}
...
}
OK, so what ultimately runs is the constructor of DefaultScheduler's parent class, SchedulerBase.
4. SchedulerBase
(1) The constructor
public SchedulerBase(
final Logger log,
final JobGraph jobGraph,
final BackPressureStatsTracker backPressureStatsTracker,
final Executor ioExecutor,
final Configuration jobMasterConfiguration,
final SlotProvider slotProvider,
final ScheduledExecutorService futureExecutor,
final ClassLoader userCodeLoader,
final CheckpointRecoveryFactory checkpointRecoveryFactory,
final Time rpcTimeout,
final RestartStrategyFactory restartStrategyFactory,
final BlobWriter blobWriter,
final JobManagerJobMetricGroup jobManagerJobMetricGroup,
final Time slotRequestTimeout,
final ShuffleMaster<?> shuffleMaster,
final JobMasterPartitionTracker partitionTracker,
final org.apache.flink.runtime.scheduler.ExecutionVertexVersioner executionVertexVersioner,
final ExecutionDeploymentTracker executionDeploymentTracker,
final boolean legacyScheduling,
long initializationTimestamp) throws Exception {
。。。
// Core: build the execution graph
this.executionGraph = createAndRestoreExecutionGraph(jobManagerJobMetricGroup, checkNotNull(shuffleMaster), checkNotNull(partitionTracker), checkNotNull(executionDeploymentTracker), initializationTimestamp);
。。。
}
(2) createAndRestoreExecutionGraph(), called from the constructor
private ExecutionGraph createAndRestoreExecutionGraph(
JobManagerJobMetricGroup currentJobManagerJobMetricGroup,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp) throws Exception {
// Calls createExecutionGraph()
ExecutionGraph newExecutionGraph = createExecutionGraph(currentJobManagerJobMetricGroup, shuffleMaster, partitionTracker, executionDeploymentTracker, initializationTimestamp);
。。。
return newExecutionGraph;
}
(3) createExecutionGraph(), called by createAndRestoreExecutionGraph()
private ExecutionGraph createExecutionGraph(
JobManagerJobMetricGroup currentJobManagerJobMetricGroup,
ShuffleMaster<?> shuffleMaster,
final JobMasterPartitionTracker partitionTracker,
ExecutionDeploymentTracker executionDeploymentTracker,
long initializationTimestamp) throws JobExecutionException, JobException {
。。。
// And here it is: the core processing logic, ExecutionGraphBuilder.buildGraph()
return ExecutionGraphBuilder.buildGraph(
null,
jobGraph,
jobMasterConfiguration,
futureExecutor,
ioExecutor,
slotProvider,
userCodeLoader,
checkpointRecoveryFactory,
rpcTimeout,
restartStrategy,
currentJobManagerJobMetricGroup,
blobWriter,
slotRequestTimeout,
log,
shuffleMaster,
partitionTracker,
failoverStrategy,
executionDeploymentListener,
executionStateUpdateListener,
initializationTimestamp);
}
5. ExecutionGraphBuilder.buildGraph()
(1) buildGraph()
public static org.apache.flink.runtime.executiongraph.ExecutionGraph buildGraph(
@Nullable org.apache.flink.runtime.executiongraph.ExecutionGraph prior,
JobGraph jobGraph,
Configuration jobManagerConfig,
ScheduledExecutorService futureExecutor,
Executor ioExecutor,
SlotProvider slotProvider,
ClassLoader classLoader,
CheckpointRecoveryFactory recoveryFactory,
Time rpcTimeout,
RestartStrategy restartStrategy,
MetricGroup metrics,
BlobWriter blobWriter,
Time allocationTimeout,
Logger log,
ShuffleMaster<?> shuffleMaster,
JobMasterPartitionTracker partitionTracker,
FailoverStrategy.Factory failoverStrategyFactory,
org.apache.flink.runtime.executiongraph.ExecutionDeploymentListener executionDeploymentListener,
org.apache.flink.runtime.executiongraph.ExecutionStateUpdateListener executionStateUpdateListener,
long initializationTimestamp) throws JobExecutionException, JobException {
checkNotNull(jobGraph, "job graph cannot be null");
final String jobName = jobGraph.getName();
final JobID jobId = jobGraph.getJobID();
final org.apache.flink.runtime.executiongraph.JobInformation jobInformation = new org.apache.flink.runtime.executiongraph.JobInformation(
jobId,
jobName,
jobGraph.getSerializedExecutionConfig(),
jobGraph.getJobConfiguration(),
jobGraph.getUserJarBlobKeys(),
jobGraph.getClasspaths());
final int maxPriorAttemptsHistoryLength =
jobManagerConfig.getInteger(JobManagerOptions.MAX_ATTEMPTS_HISTORY_SIZE);
final PartitionReleaseStrategy.Factory partitionReleaseStrategyFactory =
PartitionReleaseStrategyFactoryLoader.loadPartitionReleaseStrategyFactory(jobManagerConfig);
// 1. Create and initialize the execution graph: reuse it if one already exists, otherwise create a new one
final org.apache.flink.runtime.executiongraph.ExecutionGraph executionGraph;
try {
// If no execution graph exists yet, create a new one
executionGraph = (prior != null) ? prior :
new org.apache.flink.runtime.executiongraph.ExecutionGraph(
jobInformation,
futureExecutor,
ioExecutor,
rpcTimeout,
restartStrategy,
maxPriorAttemptsHistoryLength,
failoverStrategyFactory,
slotProvider,
classLoader,
blobWriter,
allocationTimeout,
partitionReleaseStrategyFactory,
shuffleMaster,
partitionTracker,
jobGraph.getScheduleMode(),
executionDeploymentListener,
executionStateUpdateListener,
initializationTimestamp);
} catch (IOException e) {
throw new JobException("Could not create the ExecutionGraph.", e);
}
// set the basic properties
try {
executionGraph.setJsonPlan(JsonPlanGenerator.generatePlan(jobGraph));
}
catch (Throwable t) {
log.warn("Cannot create JSON plan for job", t);
// give the graph an empty plan
executionGraph.setJsonPlan("{}");
}
// initialize the vertices that have a master initialization hook
// file output formats create directories here, input formats create splits
final long initMasterStart = System.nanoTime();
log.info("Running initialization on master for job {} ({}).", jobName, jobId);
// 2. Initialize each JobVertex
for (JobVertex vertex : jobGraph.getVertices()) {
String executableClass = vertex.getInvokableClassName();
if (executableClass == null || executableClass.isEmpty()) {
throw new JobSubmissionException(jobId,
"The vertex " + vertex.getID() + " (" + vertex.getName() + ") has no invokable class.");
}
try {
vertex.initializeOnMaster(classLoader);
}
catch (Throwable t) {
throw new JobExecutionException(jobId,
"Cannot initialize task '" + vertex.getName() + "': " + t.getMessage(), t);
}
}
log.info("Successfully ran initialization on master in {} ms.",
(System.nanoTime() - initMasterStart) / 1_000_000);
// 3. Topologically sort the JobGraph to ensure data flows correctly
List<JobVertex> sortedTopology = jobGraph.getVerticesSortedTopologicallyFromSources();
if (log.isDebugEnabled()) {
log.debug("Adding {} vertices from job graph {} ({}).", sortedTopology.size(), jobName, jobId);
}
// 4. Core logic: attach the topologically sorted JobGraph to the executionGraph data structure
executionGraph.attachJobGraph(sortedTopology);
if (log.isDebugEnabled()) {
log.debug("Successfully created execution graph from job graph {} ({}).", jobName, jobId);
}
// 5. Everything below configures checkpoints and metrics; skipped
。。。
return executionGraph;
}
(2) ExecutionGraph.attachJobGraph()
/**
* The core code that constructs the ExecutionGraph
* @param topologiallySorted the sorted collection of JobVertex
* @throws JobException
*/
public void attachJobGraph(List<JobVertex> topologiallySorted) throws JobException {
// 1. Make sure this runs in the JobMaster's main thread, avoiding multi-thread concurrency issues
assertRunningInJobMasterMainThread();
LOG.debug("Attaching {} topologically sorted vertices to existing job graph with {} " +
"vertices and {} intermediate results.",
topologiallySorted.size(),
tasks.size(),
intermediateResults.size());
// 2. Initialization
final ArrayList<ExecutionJobVertex> newExecJobVertices = new ArrayList<>(topologiallySorted.size());
final long createTimestamp = System.currentTimeMillis();
// 3. Iterate over each JobVertex
for (JobVertex jobVertex : topologiallySorted) {
// Mark the job as not stoppable if an input vertex is not stoppable
if (jobVertex.isInputVertex() && !jobVertex.isStoppable()) {
this.isStoppable = false;
}
// 3.1 Create the execution-graph node: for each JobVertex, create the corresponding ExecutionJobVertex (it creates the per-subtask ExecutionVertex instances internally)
ExecutionJobVertex ejv = new ExecutionJobVertex(
this,
jobVertex,
1,
maxPriorAttemptsHistoryLength,
rpcTimeout,
globalModVersion,
createTimestamp);
// 3.2 Core logic: connect the newly created ExecutionJobVertex to its upstream IntermediateResults
ejv.connectToPredecessors(this.intermediateResults);
// 3.3 Task uniqueness check
ExecutionJobVertex previousTask = this.tasks.putIfAbsent(jobVertex.getID(), ejv);
if (previousTask != null) {
throw new JobException(String.format("Encountered two job vertices with ID %s : previous=[%s] / new=[%s]",
jobVertex.getID(), ejv, previousTask));
}
// 3.4 Iterate over all IntermediateResults produced by this task and enforce their uniqueness, keeping data consistent and the graph uncorrupted
for (IntermediateResult res : ejv.getProducedDataSets()) {
IntermediateResult previousDataSet = this.intermediateResults.putIfAbsent(res.getId(), res);
if (previousDataSet != null) {
throw new JobException(String.format("Encountered two intermediate data set with ID %s : previous=[%s] / new=[%s]",
res.getId(), res, previousDataSet));
}
}
this.verticesInCreationOrder.add(ejv);
// 3.5 Add this vertex's parallelism to the total vertex count: the execution graph is the parallelized version of the job graph
this.numVerticesTotal += ejv.getParallelism();
// 3.6 Record the new execution-graph node
newExecJobVertices.add(ejv);
}
// 4. Build the execution topology from the current execution graph
executionTopology = DefaultExecutionTopology.fromExecutionGraph(this);
// 5. Failover recovery and task monitoring
failoverStrategy.notifyNewVertices(newExecJobVertices);
// 6. Configure the partition-release strategy
partitionReleaseStrategy = partitionReleaseStrategyFactory.createInstance(getSchedulingTopology());
}
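Step 3.1 hides a fair amount of work inside new ExecutionJobVertex(): the constructor is what actually parallelizes the JobVertex. A condensed, reconstructed sketch of the relevant parts (field handling and error checks trimmed; consult your Flink version's ExecutionJobVertex for the exact code):
// Inside the ExecutionJobVertex constructor (condensed sketch):
// 1. one IntermediateResult per IntermediateDataSet the JobVertex produces
//    this.producedDataSets[i] = new IntermediateResult(result.getId(), this, numTaskVertices, result.getResultType());
// 2. one ExecutionVertex per parallel subtask
//    for (int i = 0; i < numTaskVertices; i++) {
//        this.taskVertices[i] = new ExecutionVertex(this, i, producedDataSets, timeout, ...);
//    }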
(3) ExecutionJobVertex.connectToPredecessors()
Purpose: connect an execution node to the intermediate results produced by its predecessors.
// Connects an execution node to the intermediate results produced by its upstream tasks
// Running JobGraph example: [Source->Map]--Edge2-->[KeyBy->Window]--Edge4-->[Sink]
public void connectToPredecessors(Map<IntermediateDataSetID, IntermediateResult> intermediateDataSets) throws JobException {
// 1. Get the current JobVertex's list of incoming JobEdges (beware: 'inputs' shares its name with the field used in 2.2)
// In the example, inputs is [Edge2]
List<JobEdge> inputs = jobVertex.getInputs();
if (LOG.isDebugEnabled()) {
LOG.debug(String.format("Connecting ExecutionJobVertex %s (%s) to %d predecessors.", jobVertex.getID(), jobVertex.getName(), inputs.size()));
}
// 2. Iterate over each incoming JobEdge; in the example above, [Source->Map] never reaches this loop because it has no inputs
for (int num = 0; num < inputs.size(); num++) {
JobEdge edge = inputs.get(num);
if (LOG.isDebugEnabled()) {
if (edge.getSource() == null) {
LOG.debug(String.format("Connecting input %d of vertex %s (%s) to intermediate result referenced via ID %s.",
num, jobVertex.getID(), jobVertex.getName(), edge.getSourceId()));
} else {
LOG.debug(String.format("Connecting input %d of vertex %s (%s) to intermediate result referenced via predecessor %s (%s).",
num, jobVertex.getID(), jobVertex.getName(), edge.getSource().getProducer().getID(), edge.getSource().getProducer().getName()));
}
}
// 2.1 Through the edge's source vertex ID, look up the IntermediateResult that vertex produces
// In the example, edge.getSourceId() is the ID of [Source->Map], so ires is the intermediate result it produces
IntermediateResult ires = intermediateDataSets.get(edge.getSourceId());
if (ires == null) {
throw new JobException("Cannot connect this job graph to the previous graph. No previous intermediate result found for ID "
+ edge.getSourceId());
}
/* 2.2 Add the IntermediateResult to the inputs of the current ExecutionJobVertex.
Note: this.inputs here is the field List<IntermediateResult> inputs of the ExecutionJobVertex class itself.
Example: the current ExecutionJobVertex is [KeyBy->Window]; its List<IntermediateResult> holds the IntermediateResult produced by [Source->Map].
*/
this.inputs.add(ires);
// 2.3 Register a consumer on the IntermediateResult: the current ExecutionJobVertex, [KeyBy->Window]
// Note: the consumer is registered on the IntermediateResult itself, and the consumer's index is returned
int consumerIndex = ires.registerConsumer();
// 2.4 Every degree of parallelism corresponds to one node, so connect each node to the upstream intermediate result.
for (int i = 0; i < parallelism; i++) {
ExecutionVertex ev = taskVertices[i];
// Associate the ExecutionVertex with the IntermediateResult
ev.connectSource(num, ires, edge, consumerIndex);
}
}
}
(4) ExecutionVertex.connectSource()
public void connectSource(int inputNumber, IntermediateResult source, JobEdge edge, int consumerNumber) {
// 1. Get the edge's distribution pattern: only forward-style (and rescale) connections are POINTWISE; everything else is ALL_TO_ALL
/* A question: one of the operator-chaining conditions is forward, so by the time the JobGraph is generated all forward edges should have been chained away; doesn't that mean no forward edges remain between JobVertices, so the pattern can never be POINTWISE?
Answer: no. Forward is only one of several chaining conditions. If two operators are connected by forward but another condition fails — for example, operator chaining is disabled globally — they are not chained, forward edges between JobVertices then do exist, and the pattern is POINTWISE.
*/
final DistributionPattern pattern = edge.getDistributionPattern();
// 2. Get the physical partitions of the intermediate result; each partition corresponds to one producing SubTask
final org.apache.flink.runtime.executiongraph.IntermediateResultPartition[] sourcePartitions = source.getPartitions();
org.apache.flink.runtime.executiongraph.ExecutionEdge[] edges;
// 3. Create the execution edges according to the distribution pattern
switch (pattern) {
case POINTWISE:
edges = connectPointwise(sourcePartitions, inputNumber);
break;
case ALL_TO_ALL:
edges = connectAllToAll(sourcePartitions, inputNumber);
break;
default:
throw new RuntimeException("Unrecognized distribution pattern.");
}
// 4. Store the connection info
inputEdges[inputNumber] = edges;
// 5. Add a consumer to each IntermediateResultPartition, i.e. attach it to the ExecutionEdge (a consumer was already registered on the IntermediateResult)
// Note: using the consumerIndex registered in ExecutionJobVertex, the consumer is added to the upstream IntermediateResultPartitions
for (org.apache.flink.runtime.executiongraph.ExecutionEdge ee : edges) {
ee.getSource().addConsumer(ee, consumerNumber);
}
}
(5) connectAllToAll()
Before reading this method, note that the inputEdges field of ExecutionVertex is a two-dimensional array: for each input of this ExecutionVertex, it holds that input's list of ExecutionEdges. For example, if the ExecutionVertex has two distinct inputs A and B, where input A has partition=1 and input B has partition=8, the two-dimensional array inputEdges looks like this (irp stands for IntermediateResultPartition):
inputEdges[0] = ExecutionEdge[A.irp[0]]
inputEdges[1] = ExecutionEdge[B.irp[0], B.irp[1], ..., B.irp[7]]
private ExecutionEdge[] connectAllToAll(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
    // ALL_TO_ALL: this subtask consumes every partition of the upstream result,
    // so create one ExecutionEdge per source partition
    ExecutionEdge[] edges = new ExecutionEdge[sourcePartitions.length];
    for (int i = 0; i < sourcePartitions.length; i++) {
        IntermediateResultPartition irp = sourcePartitions[i];
        edges[i] = new ExecutionEdge(irp, this, inputNumber);
    }
    return edges;
}
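For contrast with connectAllToAll(), here is the POINTWISE branch, connectPointwise(), as a reconstructed sketch with comments added (check your Flink version's ExecutionVertex for the authoritative code). Each consumer subtask connects only to its own slice of the source partitions:
private ExecutionEdge[] connectPointwise(IntermediateResultPartition[] sourcePartitions, int inputNumber) {
    final int numSources = sourcePartitions.length;
    final int parallelism = getTotalNumberOfParallelSubtasks();
    if (numSources == parallelism) {
        // 1:1 — subtask i reads exactly partition i (the classic forward case)
        return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[subTaskIndex], this, inputNumber) };
    } else if (numSources < parallelism) {
        // fewer sources than consumers: several consumers share one source partition
        int sourcePartition;
        if (parallelism % numSources == 0) {
            sourcePartition = subTaskIndex / (parallelism / numSources);
        } else {
            float factor = ((float) parallelism) / numSources;
            sourcePartition = (int) (subTaskIndex / factor);
        }
        return new ExecutionEdge[] { new ExecutionEdge(sourcePartitions[sourcePartition], this, inputNumber) };
    } else {
        // more sources than consumers: each consumer reads a contiguous range of partitions
        if (numSources % parallelism == 0) {
            int factor = numSources / parallelism;
            int startIndex = subTaskIndex * factor;
            ExecutionEdge[] edges = new ExecutionEdge[factor];
            for (int i = 0; i < factor; i++) {
                edges[i] = new ExecutionEdge(sourcePartitions[startIndex + i], this, inputNumber);
            }
            return edges;
        } else {
            float factor = ((float) numSources) / parallelism;
            int start = (int) (subTaskIndex * factor);
            int end = (subTaskIndex == parallelism - 1) ? numSources : (int) ((subTaskIndex + 1) * factor);
            ExecutionEdge[] edges = new ExecutionEdge[end - start];
            for (int i = 0; i < edges.length; i++) {
                edges[i] = new ExecutionEdge(sourcePartitions[start + i], this, inputNumber);
            }
            return edges;
        }
    }
}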