Overview of the flow
- RDD transformations and DAG construction
- DAGScheduler:
- splits the job into stages
- creates and submits tasks
- TaskScheduler: schedules tasks by pool priority, locality, etc.
- Task execution
DAGScheduler
(only the main methods are shown here)
handleJobSubmitted
- Create the finalStage and split the job into stages
finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
- Create the job
val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
- Recursively submit every stage that has not yet been computed
submitStage(finalStage)
submitStage
- Find all unavailable (i.e. not yet submitted) parent stages. Parents are located by a DFS over the RDD lineage that stops at wide (shuffle) dependencies.
val missing = getMissingParentStages(stage).sortBy(_.id)
A stage counts as available when the number of partitions that have already produced output equals its total number of partitions.
- If the stage has no unsubmitted parent stages, submit it directly
submitMissingTasks(stage, jobId.get)
- If there are unsubmitted parent stages, recurse on each parent
submitStage(parent)
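The recursion above can be sketched on a toy stage graph. Stage, submitMissingTasks, and the availability rule here are simplified stand-ins for the real DAGScheduler types, which queue waiting stages and react to completion events rather than recursing to completion in one call:

```scala
import scala.collection.mutable

// Toy model: a stage is available once all of its partitions have output.
case class Stage(id: Int, parents: Seq[Stage], var available: Boolean = false)

val submitted = mutable.ArrayBuffer[Int]()

// Simplified stand-in for submitMissingTasks: mark the stage computed.
def submitMissingTasks(stage: Stage): Unit = {
  submitted += stage.id
  stage.available = true
}

// Simplified submitStage: recurse into unavailable parents first (the DFS),
// and submit a stage only once all of its parents are available.
def submitStage(stage: Stage): Unit = {
  val missing = stage.parents.filterNot(_.available).sortBy(_.id)
  if (missing.isEmpty) submitMissingTasks(stage)
  else {
    missing.foreach(submitStage)
    submitStage(stage) // re-check once the parents are done
  }
}

val s0 = Stage(0, Nil)
val s1 = Stage(1, Seq(s0))
val finalStage = Stage(2, Seq(s1))
submitStage(finalStage)
// submitted now lists parents before children: 0, 1, 2
```

The ordering shows why the DFS matters: a shuffle-dependent stage can only run after the map-side output of every ancestor exists.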
submitMissingTasks
- Get the partitions that have not yet been computed
val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
- Compute task locality
getPreferredLocsInternal distinguishes three cases:
- if the RDD is cached, use the cached block locations of each partition
- if the RDD has preferred locations, obtain them via preferredLocations
- otherwise, walk the RDD's NarrowDependency parents and return a parent's preferred locations
val taskIdToLocations: Map[Int, Seq[TaskLocation]] =
stage match {
case s: ShuffleMapStage =>
partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id))}.toMap
case s: ResultStage =>
partitionsToCompute.map { id =>
val p = s.partitions(id)
(id, getPreferredLocs(stage.rdd, p))
}.toMap
}
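The three cases can be sketched on a toy RDD model. ToyRdd and preferredLocs are hypothetical stand-ins, not the real Spark API:

```scala
// Toy RDD: cached block locations, preferred locations, and narrow parents.
case class ToyRdd(
    cachedLocs: Map[Int, Seq[String]] = Map.empty,
    preferred: Map[Int, Seq[String]] = Map.empty,
    narrowParents: Seq[ToyRdd] = Nil)

// Simplified getPreferredLocsInternal: the cache wins, then the RDD's own
// preferred locations, then the first narrow parent that has any.
def preferredLocs(rdd: ToyRdd, partition: Int): Seq[String] = {
  val cached = rdd.cachedLocs.getOrElse(partition, Nil)
  if (cached.nonEmpty) return cached
  val prefs = rdd.preferred.getOrElse(partition, Nil)
  if (prefs.nonEmpty) return prefs
  for (parent <- rdd.narrowParents) {
    val locs = preferredLocs(parent, partition)
    if (locs.nonEmpty) return locs
  }
  Nil
}

val hdfsRdd = ToyRdd(preferred = Map(0 -> Seq("host1", "host2")))
val mapped = ToyRdd(narrowParents = Seq(hdfsRdd))
// Locality flows through narrow dependencies from the input RDD, so a
// mapped RDD inherits the HDFS block locations of its parent.
```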
- Build the task broadcast variable
The serialized RDD is broadcast, and every task deserializes its own copy of the RDD. This isolates tasks from one another, which is essential when the closure is not thread-safe.
taskBinaryBytes = stage match {
case stage: ShuffleMapStage =>
JavaUtils.bufferToArray(
closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef))
case stage: ResultStage =>
JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, stage.func): AnyRef))
}
taskBinary = sc.broadcast(taskBinaryBytes)
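Why broadcasting serialized bytes isolates tasks can be shown with plain Java serialization as a stand-in for Spark's closureSerializer: every deserialization produces an independent copy, so one task's mutations never leak into another.

```scala
import java.io._

// Serialize an object to bytes, as the closureSerializer does for (rdd, func).
def serialize(obj: AnyRef): Array[Byte] = {
  val bos = new ByteArrayOutputStream()
  val out = new ObjectOutputStream(bos)
  out.writeObject(obj)
  out.close()
  bos.toByteArray
}

// Deserialize one fresh copy per "task", as each task does with taskBinary.
def deserialize[T](bytes: Array[Byte]): T = {
  val in = new ObjectInputStream(new ByteArrayInputStream(bytes))
  in.readObject().asInstanceOf[T]
}

// A mutable, non-thread-safe object standing in for RDD state.
class State(var counter: Int) extends Serializable

val bytes = serialize(new State(0))
val taskCopy1 = deserialize[State](bytes)
val taskCopy2 = deserialize[State](bytes)
taskCopy1.counter = 99
// taskCopy2.counter is still 0: mutating one task's copy cannot affect
// another, even though State itself is not thread-safe.
```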
- Build the tasks
val tasks: Seq[Task[_]] = {
val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
stage match {
case stage: ShuffleMapStage =>
stage.pendingPartitions.clear()
partitionsToCompute.map { id =>
val locs = taskIdToLocations(id)
val part = partitions(id)
stage.pendingPartitions += id
new ShuffleMapTask(stage.id, stage.latestInfo.attemptNumber,
taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
Option(sc.applicationId), sc.applicationAttemptId, stage.rdd.isBarrier())
}
case stage: ResultStage =>
partitionsToCompute.map { id =>
val p: Int = stage.partitions(id)
val part = partitions(p)
val locs = taskIdToLocations(id)
new ResultTask(stage.id, stage.latestInfo.attemptNumber,
taskBinary, part, locs, id, properties, serializedTaskMetrics,
Option(jobId), Option(sc.applicationId), sc.applicationAttemptId,
stage.rdd.isBarrier())
}
}
}
- Build a TaskSet and submit it
taskScheduler.submitTasks(new TaskSet(
tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
TaskSchedulerImpl
Initialization
Scheduling pool initialization
def initialize(backend: SchedulerBackend): Unit = {
this.backend = backend
schedulableBuilder = {
schedulingMode match {
case SchedulingMode.FIFO =>
new FIFOSchedulableBuilder(rootPool)
case SchedulingMode.FAIR =>
new FairSchedulableBuilder(rootPool, conf)
case _ =>
throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
s"$schedulingMode")
}
}
schedulableBuilder.buildPools()
}
Start the CoarseGrainedSchedulerBackend. If speculative execution is enabled, it also starts a periodic timer that regularly checks for tasks that should be run speculatively.
override def start(): Unit = {
backend.start()
if (!isLocal && conf.get(SPECULATION_ENABLED)) {
logInfo("Starting speculative execution thread")
speculationScheduler.scheduleWithFixedDelay(
() => Utils.tryOrStopSparkContext(sc) { checkSpeculatableTasks() },
SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
}
}
Main members
TaskSetManager: schedules the tasks of one TaskSet, covering speculation, task locality, and per-task resource assignment.
- Decides whether to launch a task on a given executor, and which one
- Delays task scheduling where needed to stay locality-aware
- Resubmits a failed task as long as it is within the allowed number of failures
- Handles straggler tasks
schedulableBuilder: schedules TaskSets. Spark has two scheduling modes, FIFO (first in, first out) and FAIR. The default is FIFO, i.e. whoever submits first runs first; FAIR additionally supports grouping into pools with different weights, and weights and resources then decide what runs first. The mode is set via spark.scheduler.mode.
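FIFO ordering can be sketched as follows, modeled on the comparison in Spark's FIFOSchedulingAlgorithm. Entry is a hypothetical stand-in for Schedulable; in the real comparator, priority is the jobId:

```scala
// A minimal schedulable: its priority (the jobId) and its stageId.
case class Entry(priority: Int, stageId: Int)

// FIFO comparison: the lower jobId runs first; within the same job,
// the lower stageId runs first.
def fifoLessThan(s1: Entry, s2: Entry): Boolean = {
  val res = s1.priority.compare(s2.priority)
  if (res != 0) res < 0 else s1.stageId < s2.stageId
}

val queue = Seq(Entry(priority = 2, stageId = 5),
                Entry(priority = 1, stageId = 7),
                Entry(priority = 1, stageId = 3))
val ordered = queue.sortWith(fifoLessThan)
// ordered: job 1 / stage 3, job 1 / stage 7, job 2 / stage 5
```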
CoarseGrainedSchedulerBackend: tracks the resources available in the cluster and dispatches tasks to the executors.
submitTasks
At this point the following log line is printed:
logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
- Build a TaskSetManager from the taskSet and the maximum number of allowed task failures
val manager = createTaskSetManager(taskSet, maxTaskFailures)
- Add the TaskSetManager to the scheduling pool
schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
- Trigger resource allocation and task scheduling
backend.reviveOffers()
CoarseGrainedSchedulerBackend##makeOffers()
It offers the cluster's resources to the TaskSchedulerImpl above it as WorkerOffers. TaskSchedulerImpl calls scheduler.resourceOffers to get the Seq[TaskDescription] to run, and CoarseGrainedSchedulerBackend then dispatches those TaskDescriptions to the executors.
- Collect the available resources on each executor and build WorkerOffers
val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
val workOffers = activeExecutors.map {
case (id, executorData) =>
//Represents one executor's free resources (here only the free cores)
new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
Some(executorData.executorAddress.hostPort),
executorData.resourcesInfo.map { case (rName, rInfo) =>
(rName, rInfo.availableAddrs.toBuffer)
})
}.toIndexedSeq
- Allocate resources, i.e. decide which tasks to launch on which executors
scheduler.resourceOffers(workOffers)
- Launch the tasks
if (taskDescs.nonEmpty) {
launchTasks(taskDescs)
}
resourceOffers
- Record the executor-to-host mapping
- Randomly shuffle the offers so that task assignment is load-balanced; assignment then scans the shuffled workers from index 0, checking whether a task can be launched on each
val shuffledOffers = shuffleOffers(filteredOffers)
- Create one TaskDescription buffer per WorkerOffer, sized by its core count (List[workerId, ArrayBuffer[TaskDescription]])
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
- Collect the free CPUs of every offer
val availableCpus = shuffledOffers.map(o => o.cores).toArray
- Get the sorted TaskSets; if a new executor joined, recompute each TaskSet's locality levels
val sortedTaskSets = rootPool.getSortedTaskSetQueue
for (taskSet <- sortedTaskSets) {
if (newExecAvail) {
taskSet.executorAdded()
}
}
- For each TaskSet, iterate over its locality levels from most to least local, and for each level walk every worker to decide which tasks can be launched where.
for (taskSet <- sortedTaskSets) {
for (currentMaxLocality <- taskSet.myLocalityLevels) {
var launchedTaskAtCurrentMaxLocality = false
do {
launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
currentMaxLocality, shuffledOffers, availableCpus,
availableResources, tasks, addressesWithDescs)
launchedAnyTask |= launchedTaskAtCurrentMaxLocality
} while (launchedTaskAtCurrentMaxLocality)
}
}
resourceOfferSingleTaskSet
- Iterate over each worker's free cores; if the free cores are at least what one task needs (CPUS_PER_TASK), go to step 2
- Call taskSet.resourceOffer(execId, host, maxLocality) to get a task that can be launched on that executor; if it returns one, append it to the final tasks: Seq[ArrayBuffer[TaskDescription]], which records which tasks launch on which workers
- Decrease the free cores of the worker that received a task in step 2 and update the related bookkeeping
private def resourceOfferSingleTaskSet(
taskSet: TaskSetManager,
maxLocality: TaskLocality,
shuffledOffers: Seq[WorkerOffer],
availableCpus: Array[Int],
tasks: Seq[ArrayBuffer[TaskDescription]]) : Boolean = {
var launchedTask = false
//< Build the sequence of tasks to run on each worker
for (i <- 0 until shuffledOffers.size) {
val execId = shuffledOffers(i).executorId
val host = shuffledOffers(i).host
if (availableCpus(i) >= CPUS_PER_TASK) {
try {
for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
//< Append the task to tasks(i), the list for the worker at index i; tasks thus records which tasks run on which worker
tasks(i) += task
availableCpus(i) -= CPUS_PER_TASK
assert(availableCpus(i) >= 0)
launchedTask = true
}
} catch {
case e: TaskNotSerializableException =>
return launchedTask
}
}
}
return launchedTask
}
TaskSetManager##resourceOffer
- Compute the worst locality level that can be tolerated
if (maxLocality != TaskLocality.NO_PREF) {
allowedLocality = getAllowedLocalityLevel(curTime)
if (allowedLocality > maxLocality) {
// We're not allowed to search for farther-away tasks
allowedLocality = maxLocality
}
}
- Dequeue a task for the given locality, execId, and host; if a suitable task is found, update the bookkeeping, notify the DAGScheduler, and finally return a TaskDescription
Log line:
logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")
dequeueTask(execId, host, allowedLocality).map { case ((index, taskLocality, speculative)) =>
// Found a task; do some bookkeeping and return a task description
val task = tasks(index)
val taskId = sched.newTaskId()
// Do various bookkeeping
copiesRunning(index) += 1
val attemptNum = taskAttempts(index).size
val info = new TaskInfo(taskId, index, attemptNum, curTime,
execId, host, taskLocality, speculative)
taskInfos(taskId) = info
taskAttempts(index) = info :: taskAttempts(index)
// Update our locality level for delay scheduling
// NO_PREF will not affect the variables related to delay scheduling
if (maxLocality != TaskLocality.NO_PREF) {
currentLocalityIndex = getLocalityIndex(taskLocality)
lastLaunchTime = curTime
}
// Serialize and return the task
val serializedTask: ByteBuffer = try {
ser.serialize(task)
} catch {
// If the task cannot be serialized, then there's no point to re-attempt the task,
// as it will always fail. So just abort the whole task-set.
case NonFatal(e) =>
val msg = s"Failed to serialize task $taskId, not attempting to retry it."
logError(msg, e)
abort(s"$msg Exception during serialization: $e")
throw new TaskNotSerializableException(e)
}
if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KIB * 1024 &&
!emittedTaskSizeWarning) {
emittedTaskSizeWarning = true
logWarning(s"Stage ${task.stageId} contains a task of very large size " +
s"(${serializedTask.limit() / 1024} KiB). The maximum recommended task size is " +
s"${TaskSetManager.TASK_SIZE_TO_WARN_KIB} KiB.")
}
addRunningTask(taskId)
// We used to log the time it takes to serialize the task, but task size is already
// a good proxy to task serialization time.
// val timeTaken = clock.getTime() - startTime
val taskName = s"task ${info.id} in stage ${taskSet.id}"
logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")
sched.dagScheduler.taskStarted(task, info)
new TaskDescription(
taskId,
attemptNum,
execId,
taskName,
index,
task.partitionId,
addedFiles,
addedJars,
task.localProperties,
extraResources,
serializedTask)
}
TaskSetManager##getAllowedLocalityLevel
tasksNeedToBeScheduledFrom: scans the given pending task list from the tail; if a task still needs scheduling (its index has no running copy, i.e. it was never assigned, and it has not succeeded) it returns true, otherwise it removes the task from the list and keeps checking from the new tail.
moreTasksToRunIn: for the task lists of a given locality level, lazily removes tasks that have already succeeded or are currently running; returns true if any task at that level is still waiting to run, false otherwise.
private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
// Remove the scheduled or finished tasks lazily
def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean = {
var indexOffset = pendingTaskIds.size
while (indexOffset > 0) {
indexOffset -= 1
val index = pendingTaskIds(indexOffset)
//copiesRunning is 1 once the task has been scheduled; it is 0 again if the task failed
if (copiesRunning(index) == 0 && !successful(index)) {
return true
} else {
pendingTaskIds.remove(indexOffset)
}
}
false
}
// Walk through the list of tasks that can be scheduled at each location and returns true
// if there are any tasks that still need to be scheduled. Lazily cleans up tasks that have
// already been scheduled.
def moreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
val emptyKeys = new ArrayBuffer[String]
val hasTasks = pendingTasks.exists {
case (id: String, tasks: ArrayBuffer[Int]) =>
if (tasksNeedToBeScheduledFrom(tasks)) {
true
} else {
emptyKeys += id
false
}
}
// The key could be executorId, host or rackId
emptyKeys.foreach(id => pendingTasks.remove(id))
hasTasks
}
while (currentLocalityIndex < myLocalityLevels.length - 1) {
val moreTasks = myLocalityLevels(currentLocalityIndex) match {
case TaskLocality.PROCESS_LOCAL => moreTasksToRunIn(pendingTasks.forExecutor)
case TaskLocality.NODE_LOCAL => moreTasksToRunIn(pendingTasks.forHost)
case TaskLocality.NO_PREF => pendingTasks.noPrefs.nonEmpty
case TaskLocality.RACK_LOCAL => moreTasksToRunIn(pendingTasks.forRack)
}
if (!moreTasks) {
// This is a performance optimization: if there are no more tasks that can
// be scheduled at a particular locality level, there is no point in waiting
// for the locality wait timeout (SPARK-4939).
lastLaunchTime = curTime
logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
s"so moving to locality level ${myLocalityLevels(currentLocalityIndex + 1)}")
currentLocalityIndex += 1
} else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
// Jump to the next locality level, and reset lastLaunchTime so that the next locality
// wait timer doesn't immediately expire
lastLaunchTime += localityWaits(currentLocalityIndex)
logDebug(s"Moving to ${myLocalityLevels(currentLocalityIndex + 1)} after waiting for " +
s"${localityWaits(currentLocalityIndex)}ms")
currentLocalityIndex += 1
} else {
return myLocalityLevels(currentLocalityIndex)
}
}
myLocalityLevels(currentLocalityIndex)
}
The whole loop body does the following:
- Check whether the pending-task set for the locality level myLocalityLevels(currentLocalityIndex) still has tasks waiting to run.
- If not, drop one locality level and continue the loop.
- If yes, and the time since the last launch is smaller than the wait configured for this level (via spark.locality.wait.process, spark.locality.wait.node, or spark.locality.wait.rack), keep currentLocalityIndex unchanged and return myLocalityLevels(currentLocalityIndex). This is the key to delay scheduling: as long as the elapsed time since a task was last launched at some locality level is below the configured wait, tasks keep being launched at that level, regardless of whether the previous attempt actually launched one.
- If yes, but the interval since getAllowedLocalityLevel last returned myLocalityLevels(currentLocalityIndex) exceeds the wait for this level, drop one locality level and continue the loop.
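The loop can be simulated standalone. DelaySim and its inputs are toy stand-ins for TaskSetManager state, reproducing the same skip-empty-level and wait-expiry rules:

```scala
// Standalone simulation of the delay-scheduling loop. Levels, waits, and
// the hasPending predicate are toy inputs, not real TaskSetManager state.
case class DelaySim(levels: Vector[String], waits: Vector[Long]) {
  var currentLocalityIndex = 0
  var lastLaunchTime = 0L

  def allowedLevel(curTime: Long, hasPending: String => Boolean): String = {
    while (currentLocalityIndex < levels.length - 1) {
      if (!hasPending(levels(currentLocalityIndex))) {
        // No tasks at this level: skip it without waiting (SPARK-4939).
        lastLaunchTime = curTime
        currentLocalityIndex += 1
      } else if (curTime - lastLaunchTime >= waits(currentLocalityIndex)) {
        // Waited long enough: downgrade, crediting the elapsed wait so the
        // next level's timer does not expire immediately.
        lastLaunchTime += waits(currentLocalityIndex)
        currentLocalityIndex += 1
      } else {
        return levels(currentLocalityIndex)
      }
    }
    levels(currentLocalityIndex)
  }
}

val sim = DelaySim(Vector("PROCESS_LOCAL", "NODE_LOCAL", "ANY"),
                   Vector(3000L, 3000L, 0L))
val pending = Set("PROCESS_LOCAL", "NODE_LOCAL")
// Within the 3s wait the level stays PROCESS_LOCAL even if nothing launched;
// once the wait expires, the level downgrades to NODE_LOCAL.
```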
Spark's delay scheduling strategy
for (taskSet <- sortedTaskSets) {
for (currentMaxLocality <- taskSet.myLocalityLevels) {
var launchedTaskAtCurrentMaxLocality = false
do {
launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
currentMaxLocality, shuffledOffers, availableCpus,
availableResources, tasks, addressesWithDescs)
launchedAnyTask |= launchedTaskAtCurrentMaxLocality
} while (launchedTaskAtCurrentMaxLocality)
}
}
- Suppose a job runs in YARN mode with 10 executors [exec1-exec10] on host1-host10, reading HDFS data with no caching; the locality levels are then essentially NODE_LOCAL, RACK_LOCAL, and ANY. Say that while handling NODE_LOCAL tasks the last task was assigned to exec2 at time time1, and pendingTasksForHost no longer has task lists for host1-host10 (or they are all empty); the remaining 8 executors then receive no task. Because launchedTaskAtCurrentMaxLocality is true, the executor list is traversed a second time; if that second pass still assigns nothing, scheduling moves on to the RACK_LOCAL level.
if (!moreTasks) {
// This is a performance optimization: if there are no more tasks that can
// be scheduled at a particular locality level, there is no point in waiting
// for the locality wait timeout (SPARK-4939).
lastLaunchTime = curTime
logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
s"so moving to locality level ${myLocalityLevels(currentLocalityIndex + 1)}")
currentLocalityIndex += 1
} else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
// Jump to the next locality level, and reset lastLaunchTime so that the next locality
// wait timer doesn't immediately expire
lastLaunchTime += localityWaits(currentLocalityIndex)
logDebug(s"Moving to ${myLocalityLevels(currentLocalityIndex + 1)} after waiting for " +
s"${localityWaits(currentLocalityIndex)}ms")
currentLocalityIndex += 1
} else {
return myLocalityLevels(currentLocalityIndex)
}
}
- Because HDFS data is spread across many hosts, some task's data may live on host11-host13, so the task lists for host11-host13 may still contain unscheduled tasks (not in copiesRunning). During the traversal Spark therefore considers NODE_LOCAL to still have tasks to assign, the TaskSetManager's allowedLocality stays at NODE_LOCAL, and task selection inside the TaskSetManager still happens at NODE_LOCAL (from pendingTasksForHost rather than pendingTasksForRack); at this point none of the executors can be assigned a task. But if, by the time the traversal reaches exec8, currTime - time1 > the NODE_LOCAL wait time, the TaskSetManager's allowedLocality drops to RACK_LOCAL, exec8 gets a RACK_LOCAL task from the same rack, and the executors after it also fetch tasks at RACK_LOCAL.
CoarseGrainedSchedulerBackend
On a task status update, if the task has finished, the executor's cores and other resources are released and a fresh round of offers is made for that executor:
if (TaskState.isFinished(state)) {
executorDataMap.get(executorId) match {
case Some(executorInfo) =>
executorInfo.freeCores += scheduler.CPUS_PER_TASK
resources.foreach { case (k, v) =>
executorInfo.resourcesInfo.get(k).foreach { r =>
r.release(v.addresses)
}
}
makeOffers(executorId)
case None =>
// Ignoring the update since we don't know about the executor.
logWarning(s"Ignored task status update ($taskId state $state) " +
s"from unknown executor with ID $executorId")
}
}
private def makeOffers(executorId: String): Unit = {
// Make sure no executor is killed while some task is launching on it
val taskDescs = withLock {
// Filter out executors under killing
if (executorIsAlive(executorId)) {
val executorData = executorDataMap(executorId)
val workOffers = IndexedSeq(
new WorkerOffer(executorId, executorData.executorHost, executorData.freeCores,
Some(executorData.executorAddress.hostPort),
executorData.resourcesInfo.map { case (rName, rInfo) =>
(rName, rInfo.availableAddrs.toBuffer)
}))
scheduler.resourceOffers(workOffers)
} else {
Seq.empty
}
}
if (taskDescs.nonEmpty) {
launchTasks(taskDescs)
}
}
- If indeed cores < task count, then after this executor finishes a task, a dedicated offer round is run for it alone and a local task is assigned to it. Once pendingTasksForHost has no tasks left to schedule, the TaskSetManager drops to RACK_LOCAL, and the other executors can then be assigned RACK_LOCAL tasks. The wait here is therefore the maximum time a locality level is given for task assignment; once it is exceeded, the TaskSetManager automatically drops to the next level and schedules tasks there.
References
cloud.tencent.com/developer/a…
zhuanlan.zhihu.com/p/541505732
Task launching
CoarseGrainedSchedulerBackend##launchTasks
Allocates resources and sends the tasks to the executors
for (task <- tasks.flatten) {
val serializedTask = TaskDescription.encode(task)
if (serializedTask.limit() >= maxRpcMessageSize) {
Option(scheduler.taskIdToTaskSetManager.get(task.taskId)).foreach { taskSetMgr =>
try {
var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
s"${RPC_MESSAGE_MAX_SIZE.key} (%d bytes). Consider increasing " +
s"${RPC_MESSAGE_MAX_SIZE.key} or using broadcast variables for large values."
msg = msg.format(task.taskId, task.index, serializedTask.limit(), maxRpcMessageSize)
taskSetMgr.abort(msg)
} catch {
case e: Exception => logError("Exception in error callback", e)
}
}
}
else {
val executorData = executorDataMap(task.executorId)
// Do resources allocation here. The allocated resources will get released after the task
// finishes.
executorData.freeCores -= scheduler.CPUS_PER_TASK
task.resources.foreach { case (rName, rInfo) =>
assert(executorData.resourcesInfo.contains(rName))
executorData.resourcesInfo(rName).acquire(rInfo.addresses)
}
logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
s"${executorData.executorHost}.")
executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
}
}
CoarseGrainedExecutorBackend##receive
Receives the LaunchTask message sent by the driver
case LaunchTask(data) =>
if (executor == null) {
exitExecutor(1, "Received LaunchTask command but executor was null")
} else {
val taskDesc = TaskDescription.decode(data.value)
logInfo("Got assigned task " + taskDesc.taskId)
taskResources(taskDesc.taskId) = taskDesc.resources
executor.launchTask(this, taskDesc)
}
Executor##launchTask
Wraps the TaskDescription in a TaskRunner and submits it to the executor's thread pool, which runs TaskRunner.run
def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}
TaskRunner##run
Logs:
logInfo(s"Running $taskName (TID $taskId)")
- Update the task state to RUNNING
execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
- Restore the task
- Update dependency files and jars
- Deserialize the serializedTask
updateDependencies(taskDescription.addedFiles, taskDescription.addedJars)
task = ser.deserialize[Task[Any]](
taskDescription.serializedTask, Thread.currentThread.getContextClassLoader)
- Run the task by calling Task.run
task.run(
taskAttemptId = taskId,
attemptNumber = taskDescription.attemptNumber,
metricsSystem = env.metricsSystem,
resources = taskDescription.resources)
Task##run
Ultimately calls the Task's runTask method, implemented by ResultTask and ShuffleMapTask; what follows from here is the shuffle.
runTask(context)