Spark Job and Task Scheduling


Overview of the Flow


  1. RDD transformations and DAG construction
  2. DAGScheduler:
    • splits the job into stages
    • creates and submits tasks
  3. TaskScheduler: schedules tasks according to pool priority, locality, etc.
  4. Task execution

DAGScheduler

(only the main methods are shown here)
handleJobSubmitted

  1. Create the finalStage and split the job into stages
finalStage = createResultStage(finalRDD, func, partitions, jobId, callSite)
  2. Create the job
val job = new ActiveJob(jobId, finalStage, callSite, listener, properties)
  3. Recursively submit every stage that has not yet been computed
submitStage(finalStage)

submitStage

  1. Find all unavailable (i.e., not yet submitted) parent stages. Parents are located by a DFS over the RDD lineage that cuts at wide (shuffle) dependencies
val missing = getMissingParentStages(stage).sortBy(_.id)  

A stage counts as available when the number of partition tasks that have already produced output equals its number of partitions.
  2. If the stage has no unsubmitted parent stages, submit it directly

submitMissingTasks(stage, jobId.get)
  3. If there are unsubmitted parent stages, recurse on each of them
submitStage(parent)
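Outside Spark, the parent-stage search can be modeled as a depth-first walk over the lineage that stays inside a stage across narrow dependencies and cuts at wide (shuffle) dependencies. The RDD class below is a simplified stand-in for illustration, not Spark's API:

```python
# Minimal model of getMissingParentStages: walk the lineage depth-first,
# staying inside the current stage across narrow deps, cutting at wide deps.
class RDD:
    def __init__(self, name, deps=()):
        self.name = name
        self.deps = deps            # tuples of ("narrow" | "wide", parent RDD)

def parent_stage_roots(rdd):
    roots, visited, stack = [], set(), [rdd]
    while stack:
        r = stack.pop()
        if r.name in visited:
            continue
        visited.add(r.name)
        for kind, parent in r.deps:
            if kind == "wide":
                roots.append(parent.name)   # shuffle boundary: parent heads a new stage
            else:
                stack.append(parent)        # narrow dep: same stage, keep walking
    return sorted(roots)

# textFile -> map (narrow) -> reduceByKey (wide) -> map (narrow) -> final
src   = RDD("src")
m1    = RDD("m1", [("narrow", src)])
red   = RDD("red", [("wide", m1)])
final = RDD("final", [("narrow", red)])
print(parent_stage_roots(final))   # ['m1']
```

The walk from `final` passes through the narrow dependency into `red`, hits the wide dependency, and stops: `m1` becomes the root of a parent stage.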

submitMissingTasks

  1. Get the partitions that have not yet been computed
val partitionsToCompute: Seq[Int] = stage.findMissingPartitions()
  2. Compute task locality
    getPreferredLocsInternal distinguishes three cases:
    • if the RDD is cached, each partition's locations come from the cache's location info
    • if the RDD has preferred locations, each partition's locations come from preferredLocations
    • otherwise, walk all parent RDDs reachable through NarrowDependency and return the parents' preferred locations
val taskIdToLocations: Map[Int, Seq[TaskLocation]] =  
  stage match {
    case s: ShuffleMapStage =>
      partitionsToCompute.map { id => (id, getPreferredLocs(stage.rdd, id))}.toMap
    case s: ResultStage =>
      partitionsToCompute.map { id =>
        val p = s.partitions(id)
        (id, getPreferredLocs(stage.rdd, p))
      }.toMap
  }
  3. Build the task's broadcast variable
    The serialized RDD is broadcast, and each task deserializes its own copy of it. This isolates tasks from one another, which is essential when the code is not thread-safe.
taskBinaryBytes = stage match {
  case stage: ShuffleMapStage =>
    JavaUtils.bufferToArray(
      closureSerializer.serialize((stage.rdd, stage.shuffleDep): AnyRef))
  case stage: ResultStage =>
    JavaUtils.bufferToArray(closureSerializer.serialize((stage.rdd, stage.func): AnyRef))
}
taskBinary = sc.broadcast(taskBinaryBytes)
  4. Build the tasks
val tasks: Seq[Task[_]] = {
  val serializedTaskMetrics = closureSerializer.serialize(stage.latestInfo.taskMetrics).array()
  stage match {
    case stage: ShuffleMapStage =>
      stage.pendingPartitions.clear()
      partitionsToCompute.map { id =>
        val locs = taskIdToLocations(id)
        val part = partitions(id)
        stage.pendingPartitions += id
        new ShuffleMapTask(stage.id, stage.latestInfo.attemptNumber,
          taskBinary, part, locs, properties, serializedTaskMetrics, Option(jobId),
          Option(sc.applicationId), sc.applicationAttemptId, stage.rdd.isBarrier())
      }

    case stage: ResultStage =>
      partitionsToCompute.map { id =>
        val p: Int = stage.partitions(id)
        val part = partitions(p)
        val locs = taskIdToLocations(id)
        new ResultTask(stage.id, stage.latestInfo.attemptNumber,
          taskBinary, part, locs, id, properties, serializedTaskMetrics,
          Option(jobId), Option(sc.applicationId), sc.applicationAttemptId,
          stage.rdd.isBarrier())
      }
  }
}
  5. Build the TaskSet and submit it
taskScheduler.submitTasks(new TaskSet(
  tasks.toArray, stage.id, stage.latestInfo.attemptNumber, jobId, properties))
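The three cases of getPreferredLocsInternal (cached locations first, then the RDD's own preferred locations, then the narrow parents' locations) can be sketched as a small recursion; the RDD model and cache_locs map below are hypothetical stand-ins, not Spark's API:

```python
class RDD:
    def __init__(self, name, deps=(), prefs=None):
        self.name = name
        self.deps = deps                 # tuples of ("narrow" | "wide", parent RDD)
        self.prefs = prefs or {}         # partition index -> [hosts]

def preferred_locs(rdd, partition, cache_locs):
    # 1) if the partition is cached, the cached block locations win
    cached = cache_locs.get((rdd.name, partition))
    if cached:
        return cached
    # 2) otherwise ask the RDD itself (e.g. HDFS block locations)
    own = rdd.prefs.get(partition)
    if own:
        return own
    # 3) otherwise recurse into narrow parents, taking the first non-empty answer
    for kind, parent in rdd.deps:
        if kind == "narrow":
            locs = preferred_locs(parent, partition, cache_locs)
            if locs:
                return locs
    return []

hdfs   = RDD("hdfs", prefs={0: ["host1"], 1: ["host2"]})
mapped = RDD("mapped", deps=[("narrow", hdfs)])
print(preferred_locs(mapped, 0, {}))                          # ['host1']
print(preferred_locs(mapped, 0, {("mapped", 0): ["host9"]}))  # ['host9']
```

A mapped RDD inherits its HDFS parent's block locations through the narrow dependency, unless the partition is already cached somewhere else.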

TaskSchedulerImpl

Initialization
The scheduling pool is initialized:

  def initialize(backend: SchedulerBackend): Unit = {
    this.backend = backend
    schedulableBuilder = {
      schedulingMode match {
        case SchedulingMode.FIFO =>
          new FIFOSchedulableBuilder(rootPool)
        case SchedulingMode.FAIR =>
          new FairSchedulableBuilder(rootPool, conf)
        case _ =>
          throw new IllegalArgumentException(s"Unsupported $SCHEDULER_MODE_PROPERTY: " +
          s"$schedulingMode")
      }
    }
    schedulableBuilder.buildPools()
  }

start() starts the CoarseGrainedSchedulerBackend. If speculative execution is enabled, it also sets up a periodic timer that checks for tasks that should be speculatively re-executed.

 override def start(): Unit = {
  backend.start()
  if (!isLocal && conf.get(SPECULATION_ENABLED)) {
    logInfo("Starting speculative execution thread")
    speculationScheduler.scheduleWithFixedDelay(
      () => Utils.tryOrStopSparkContext(sc) { checkSpeculatableTasks() },
      SPECULATION_INTERVAL_MS, SPECULATION_INTERVAL_MS, TimeUnit.MILLISECONDS)
  }
} 
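The periodic check eventually reaches TaskSetManager.checkSpeculatableTasks, which (roughly) waits until a quantile of the TaskSet has succeeded and then flags running tasks whose elapsed time exceeds a multiple of the median successful duration. A simplified model; the default-like thresholds (quantile 0.75, multiplier 1.5, 100 ms floor) are assumptions here:

```python
def speculatable_tasks(durations_done, running, now, num_tasks,
                       quantile=0.75, multiplier=1.5, min_time=100):
    """Simplified model of the speculation check.

    durations_done: durations (ms) of successfully finished tasks
    running:        dict task_id -> start time (ms) of still-running tasks
    Returns the ids of tasks worth launching a speculative copy for.
    """
    if len(durations_done) < quantile * num_tasks:
        return []                       # not enough finished tasks to judge yet
    median = sorted(durations_done)[len(durations_done) // 2]
    threshold = max(multiplier * median, min_time)
    return [tid for tid, start in running.items() if now - start > threshold]

# 8 of 10 tasks finished in ~100ms; "t1" has been running for 1000ms
print(speculatable_tasks([100] * 8, {"t1": 0, "t2": 900}, 1000, 10))  # ['t1']
```

Only "t1" exceeds the 150 ms threshold (1.5 × the 100 ms median), so only it gets a speculative copy.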

Key members

TaskSetManager: schedules the tasks of a single TaskSet, covering speculation, task locality, and per-task resource assignment.

  • decides whether to launch a task on a given executor, and which one
  • delays task scheduling when needed to achieve locality awareness
  • resubmits a failed task as long as it is within the allowed number of failures
  • handles straggler tasks

schedulableBuilder: schedules TaskSets. Spark offers two scheduling modes: FIFO (first in, first out) and FAIR. The default is FIFO: whichever job is submitted first runs first. FAIR additionally supports grouping TaskSets into pools with different weights; weights and current resource usage decide what runs next. The mode is set via spark.scheduler.mode.
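As a rough model: FIFO orders TaskSets by job priority (essentially the job id) with stage id as tie-breaker, while FAIR first favors pools running below their minShare, then the lower minShare-usage ratio, then the lower weighted running-task ratio. The sketch below mirrors that comparison logic with simplified dict-based schedulables (Spark's actual FairSchedulingAlgorithm also tie-breaks on name):

```python
def fifo_less_than(s1, s2):
    # earlier job (lower priority value) first; tie-break on stage id
    if s1["priority"] != s2["priority"]:
        return s1["priority"] < s2["priority"]
    return s1["stageId"] < s2["stageId"]

def fair_less_than(s1, s2):
    needy1 = s1["runningTasks"] < s1["minShare"]
    needy2 = s2["runningTasks"] < s2["minShare"]
    share1 = s1["runningTasks"] / max(s1["minShare"], 1)
    share2 = s2["runningTasks"] / max(s2["minShare"], 1)
    weight1 = s1["runningTasks"] / s1["weight"]
    weight2 = s2["runningTasks"] / s2["weight"]
    if needy1 and not needy2:
        return True                 # below minShare always wins
    if not needy1 and needy2:
        return False
    if needy1 and needy2:
        return share1 < share2      # both needy: smaller minShare usage first
    return weight1 < weight2        # neither needy: smaller weighted load first

a = {"priority": 1, "stageId": 3, "runningTasks": 1, "minShare": 2, "weight": 1}
b = {"priority": 2, "stageId": 1, "runningTasks": 5, "minShare": 2, "weight": 1}
print(fifo_less_than(a, b))   # True: job 1 before job 2
print(fair_less_than(a, b))   # True: a is under its minShare, b is not
```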


CoarseGrainedSchedulerBackend: tracks the available resources in the cluster and dispatches tasks to the executors.

submitTasks
At this point the following log line appears:

logInfo("Adding task set " + taskSet.id + " with " + tasks.length + " tasks")
  1. Build a TaskSetManager from the TaskSet and the maximum number of allowed task failures
val manager = createTaskSetManager(taskSet, maxTaskFailures)
  2. Add the TaskSetManager to the scheduler's pool
schedulableBuilder.addTaskSetManager(manager, manager.taskSet.properties)
  3. Allocate resources and schedule the tasks
backend.reviveOffers()

CoarseGrainedSchedulerBackend##makeOffers()
makeOffers presents the cluster's resources to the upper-layer TaskSchedulerImpl as WorkerOffers. TaskSchedulerImpl calls scheduler.resourceOffers to obtain the Seq[TaskDescription] to execute, which CoarseGrainedSchedulerBackend then dispatches to the executors.

  1. Collect each executor's available resources and create WorkerOffers
       val activeExecutors = executorDataMap.filterKeys(executorIsAlive)
       val workOffers = activeExecutors.map {
         case (id, executorData) =>
           // the available resources on one executor (here only free cores)
           new WorkerOffer(id, executorData.executorHost, executorData.freeCores,
             Some(executorData.executorAddress.hostPort),
             executorData.resourcesInfo.map { case (rName, rInfo) =>
               (rName, rInfo.availableAddrs.toBuffer)
             })
       }.toIndexedSeq 
  2. Allocate resources, i.e., decide which tasks start on which executors
scheduler.resourceOffers(workOffers)       
  3. Launch the tasks
   if (taskDescs.nonEmpty) {
        launchTasks(taskDescs)
      }

resourceOffers

  1. Record the executor-to-host relationships
  2. Shuffle the offers randomly so that task assignment is load-balanced; tasks are then assigned by scanning the shuffled workers starting from index 0 and checking whether a task can launch on each worker
val shuffledOffers = shuffleOffers(filteredOffers)
  3. For each WorkerOffer, create a task-description buffer sized by its core count: List[workerId, ArrayBuffer[TaskDescription]]
val tasks = shuffledOffers.map(o => new ArrayBuffer[TaskDescription](o.cores / CPUS_PER_TASK))
  4. Collect all free CPUs
val availableCpus = shuffledOffers.map(o => o.cores).toArray
  5. Get the sorted TaskSets; if new executors have joined, recompute each TaskSet's locality levels
val sortedTaskSets = rootPool.getSortedTaskSetQueue
for (taskSet <- sortedTaskSets) {
  if (newExecAvail) {
    taskSet.executorAdded()
  }
}
  6. For each TaskSet, walk its locality levels from best to worst and, at each level, over all workers, to decide which tasks can launch on which workers.
    for (taskSet <- sortedTaskSets) {
        for (currentMaxLocality <- taskSet.myLocalityLevels) {
          var launchedTaskAtCurrentMaxLocality = false
          do {
            launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
              currentMaxLocality, shuffledOffers, availableCpus,
              availableResources, tasks, addressesWithDescs)
            launchedAnyTask |= launchedTaskAtCurrentMaxLocality
          } while (launchedTaskAtCurrentMaxLocality)
        }
}

resourceOfferSingleTaskSet

  1. Iterate over each worker's available cores; if the available cores are at least what a task needs (CPUS_PER_TASK), go to step 2
  2. Call taskSet.resourceOffer(execId, host, maxLocality) to get a task that can launch on that executor; if the result is non-empty, append the task to tasks: Seq[ArrayBuffer[TaskDescription]], which records which tasks launch on which workers
  3. Decrement the available cores of any worker that was assigned a task in step 2 and update the related bookkeeping
private def resourceOfferSingleTaskSet(
    taskSet: TaskSetManager,
    maxLocality: TaskLocality,
    shuffledOffers: Seq[WorkerOffer],
    availableCpus: Array[Int],
    tasks: Seq[ArrayBuffer[TaskDescription]]) : Boolean = {
  var launchedTask = false

  //< build, for each worker, the sequence of tasks to run on it
  for (i <- 0 until shuffledOffers.size) {
    val execId = shuffledOffers(i).executorId
    val host = shuffledOffers(i).host
    if (availableCpus(i) >= CPUS_PER_TASK) {
      try {
        for (task <- taskSet.resourceOffer(execId, host, maxLocality)) {
          //< add the task chosen for the worker at index i to tasks(i); this records which tasks run on which worker
          tasks(i) += task

          availableCpus(i) -= CPUS_PER_TASK
          assert(availableCpus(i) >= 0)
          launchedTask = true
        }
      } catch {
        case e: TaskNotSerializableException =>
          return launchedTask
      }
    }
  }
  return launchedTask
}

TaskSetManager##resourceOffer

  1. Determine the worst locality level that can be tolerated
if (maxLocality != TaskLocality.NO_PREF) {
  allowedLocality = getAllowedLocalityLevel(curTime)
  if (allowedLocality > maxLocality) {
    // We're not allowed to search for farther-away tasks
    allowedLocality = maxLocality
  }
}
  2. Dequeue a task matching the locality, execId, and host; if a suitable task is found, update the bookkeeping, notify the DAGScheduler, and finally return a TaskDescription
    Log line:
logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
    s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")
dequeueTask(execId, host, allowedLocality).map { case ((index, taskLocality, speculative)) =>
  // Found a task; do some bookkeeping and return a task description
  val task = tasks(index)
  val taskId = sched.newTaskId()
  // Do various bookkeeping
  copiesRunning(index) += 1
  val attemptNum = taskAttempts(index).size
  val info = new TaskInfo(taskId, index, attemptNum, curTime,
    execId, host, taskLocality, speculative)
  taskInfos(taskId) = info
  taskAttempts(index) = info :: taskAttempts(index)
  // Update our locality level for delay scheduling
  // NO_PREF will not affect the variables related to delay scheduling
  if (maxLocality != TaskLocality.NO_PREF) {
    currentLocalityIndex = getLocalityIndex(taskLocality)
    lastLaunchTime = curTime
  }
  // Serialize and return the task
  val serializedTask: ByteBuffer = try {
    ser.serialize(task)
  } catch {
    // If the task cannot be serialized, then there's no point to re-attempt the task,
    // as it will always fail. So just abort the whole task-set.
    case NonFatal(e) =>
      val msg = s"Failed to serialize task $taskId, not attempting to retry it."
      logError(msg, e)
      abort(s"$msg Exception during serialization: $e")
      throw new TaskNotSerializableException(e)
  }
  if (serializedTask.limit() > TaskSetManager.TASK_SIZE_TO_WARN_KIB * 1024 &&
    !emittedTaskSizeWarning) {
    emittedTaskSizeWarning = true
    logWarning(s"Stage ${task.stageId} contains a task of very large size " +
      s"(${serializedTask.limit() / 1024} KiB). The maximum recommended task size is " +
      s"${TaskSetManager.TASK_SIZE_TO_WARN_KIB} KiB.")
  }
  addRunningTask(taskId)

  // We used to log the time it takes to serialize the task, but task size is already
  // a good proxy to task serialization time.
  // val timeTaken = clock.getTime() - startTime
  val taskName = s"task ${info.id} in stage ${taskSet.id}"
  logInfo(s"Starting $taskName (TID $taskId, $host, executor ${info.executorId}, " +
    s"partition ${task.partitionId}, $taskLocality, ${serializedTask.limit()} bytes)")


  sched.dagScheduler.taskStarted(task, info)
  new TaskDescription(
    taskId,
    attemptNum,
    execId,
    taskName,
    index,
    task.partitionId,
    addedFiles,
    addedJars,
    task.localProperties,
    extraResources,
    serializedTask)
}

TaskSetManager##getAllowedLocalityLevel
tasksNeedToBeScheduledFrom: scans the given pending-task list from the tail and returns true as soon as it finds a task that still needs scheduling (its index has no running copy in copiesRunning and it has not succeeded); tasks that no longer need scheduling are lazily removed, and the scan continues from the new tail.
moreTasksToRunIn: for the pending-task lists of a locality level, drops entries whose tasks have all been scheduled or finished; returns true if any list still has a task waiting to run, false otherwise.

  private def getAllowedLocalityLevel(curTime: Long): TaskLocality.TaskLocality = {
    // Remove the scheduled or finished tasks lazily
    def tasksNeedToBeScheduledFrom(pendingTaskIds: ArrayBuffer[Int]): Boolean = {
      var indexOffset = pendingTaskIds.size
      while (indexOffset > 0) {
        indexOffset -= 1
        val index = pendingTaskIds(indexOffset)
        //copiesRunning(index) becomes 1 once the task has been scheduled; it is reset if the task fails
        if (copiesRunning(index) == 0 && !successful(index)) {
          return true
        } else {
          pendingTaskIds.remove(indexOffset)
        }
      }
      false
    }
    // Walk through the list of tasks that can be scheduled at each location and returns true
    // if there are any tasks that still need to be scheduled. Lazily cleans up tasks that have
    // already been scheduled.
    //walk the task lists: return true if any task is still unscheduled; lazily remove tasks that have been scheduled
    def moreTasksToRunIn(pendingTasks: HashMap[String, ArrayBuffer[Int]]): Boolean = {
      val emptyKeys = new ArrayBuffer[String]
      val hasTasks = pendingTasks.exists {
        case (id: String, tasks: ArrayBuffer[Int]) =>
          if (tasksNeedToBeScheduledFrom(tasks)) {
            true
          } else {
            emptyKeys += id
            false
          }
      }
      // The key could be executorId, host or rackId
      emptyKeys.foreach(id => pendingTasks.remove(id))
      hasTasks
    }

    while (currentLocalityIndex < myLocalityLevels.length - 1) {
      val moreTasks = myLocalityLevels(currentLocalityIndex) match {
        case TaskLocality.PROCESS_LOCAL => moreTasksToRunIn(pendingTasks.forExecutor)
        case TaskLocality.NODE_LOCAL => moreTasksToRunIn(pendingTasks.forHost)
        case TaskLocality.NO_PREF => pendingTasks.noPrefs.nonEmpty
        case TaskLocality.RACK_LOCAL => moreTasksToRunIn(pendingTasks.forRack)
      }
      if (!moreTasks) {
        // This is a performance optimization: if there are no more tasks that can
        // be scheduled at a particular locality level, there is no point in waiting
        // for the locality wait timeout (SPARK-4939).
        lastLaunchTime = curTime
        logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
          s"so moving to locality level ${myLocalityLevels(currentLocalityIndex + 1)}")
        currentLocalityIndex += 1
      } else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
        // Jump to the next locality level, and reset lastLaunchTime so that the next locality
        // wait timer doesn't immediately expire
        lastLaunchTime += localityWaits(currentLocalityIndex)
        logDebug(s"Moving to ${myLocalityLevels(currentLocalityIndex + 1)} after waiting for " +
          s"${localityWaits(currentLocalityIndex)}ms")
        currentLocalityIndex += 1
      } else {
        return myLocalityLevels(currentLocalityIndex)
      }
    }
    myLocalityLevels(currentLocalityIndex)
  }

The loop body does the following:

  1. Check whether the pending-task set for the locality level myLocalityLevels(currentLocalityIndex) still contains tasks to execute.
  2. If not, drop one locality level and continue the loop.
  3. If so, and the time since the last launch is smaller than the wait configured for the current level (via spark.locality.wait.process, spark.locality.wait.node, or spark.locality.wait.rack), leave currentLocalityIndex unchanged and return myLocalityLevels(currentLocalityIndex). This is the key to delay scheduling: as long as the elapsed time since a task was last launched at some locality level is below the configured wait, the next task is launched at that same level, whether or not the previous offer actually launched a task.
  4. If so, but the time since getAllowedLocalityLevel last returned myLocalityLevels(currentLocalityIndex) exceeds the wait for the current level, drop one locality level and continue the loop.
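The level-advancing loop above condenses to a small state machine; has_pending stands in for moreTasksToRunIn, and waits for the spark.locality.wait.* values:

```python
def allowed_locality(levels, waits, state, now, has_pending):
    """Condensed model of getAllowedLocalityLevel.

    levels: e.g. ["PROCESS_LOCAL", "NODE_LOCAL", "ANY"]
    waits:  per-level wait in ms (models spark.locality.wait.*)
    state:  dict with "index" (currentLocalityIndex) and "last" (lastLaunchTime)
    """
    while state["index"] < len(levels) - 1:
        if not has_pending(levels[state["index"]]):
            state["last"] = now                     # nothing to run here: skip level
            state["index"] += 1
        elif now - state["last"] >= waits[state["index"]]:
            state["last"] += waits[state["index"]]  # waited long enough: degrade
            state["index"] += 1
        else:
            return levels[state["index"]]           # keep insisting on this level
    return levels[state["index"]]

state = {"index": 0, "last": 0}
pending = {"PROCESS_LOCAL": True, "NODE_LOCAL": True, "ANY": True}
levels, waits = ["PROCESS_LOCAL", "NODE_LOCAL", "ANY"], [3000, 3000, 0]
# at t=1000ms, with a 3000ms wait, we still insist on PROCESS_LOCAL
print(allowed_locality(levels, waits, state, 1000, pending.get))  # PROCESS_LOCAL
# at t=3500ms the wait has expired, so we degrade to NODE_LOCAL
print(allowed_locality(levels, waits, state, 3500, pending.get))  # NODE_LOCAL
```

Note how the degrade branch advances lastLaunchTime by exactly one wait interval rather than resetting it to now, so the next level's wait timer does not immediately expire.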

Spark's Delay Scheduling Strategy

    for (taskSet <- sortedTaskSets) {
        for (currentMaxLocality <- taskSet.myLocalityLevels) {
          var launchedTaskAtCurrentMaxLocality = false
          do {
            launchedTaskAtCurrentMaxLocality = resourceOfferSingleTaskSet(taskSet,
              currentMaxLocality, shuffledOffers, availableCpus,
              availableResources, tasks, addressesWithDescs)
            launchedAnyTask |= launchedTaskAtCurrentMaxLocality
          } while (launchedTaskAtCurrentMaxLocality)
        }
}
  1. Suppose a job runs in YARN mode with 10 executors [exec1-exec10] on host1-host10, reading HDFS data with no caching; the locality levels are then essentially NODE_LOCAL, RACK_LOCAL, and ANY. Say that while scheduling at the NODE_LOCAL level the last task was assigned to exec2 at time time1, and pendingTasksForHost no longer holds (non-empty) task lists for host1-host10; the remaining 8 executors then receive no task. Because launchedTaskAtCurrentMaxLocality is true, the executor list is traversed a second time; if that pass assigns nothing either, scheduling moves on to the RACK_LOCAL level.
if (!moreTasks) {
        // This is a performance optimization: if there are no more tasks that can
        // be scheduled at a particular locality level, there is no point in waiting
        // for the locality wait timeout (SPARK-4939).
        lastLaunchTime = curTime
        logDebug(s"No tasks for locality level ${myLocalityLevels(currentLocalityIndex)}, " +
          s"so moving to locality level ${myLocalityLevels(currentLocalityIndex + 1)}")
        currentLocalityIndex += 1
      } else if (curTime - lastLaunchTime >= localityWaits(currentLocalityIndex)) {
        // Jump to the next locality level, and reset lastLaunchTime so that the next locality
        // wait timer doesn't immediately expire
        lastLaunchTime += localityWaits(currentLocalityIndex)
        logDebug(s"Moving to ${myLocalityLevels(currentLocalityIndex + 1)} after waiting for " +
          s"${localityWaits(currentLocalityIndex)}ms")
        currentLocalityIndex += 1
      } else {
        return myLocalityLevels(currentLocalityIndex)
      }
}      
  2. Since HDFS blocks are spread across many hosts, some task's data may live on host11-host13, so pendingTasksForHost may still contain unscheduled tasks for those hosts (tasks not yet in copiesRunning). Spark then considers NODE_LOCAL to still have pending work, so the TaskSetManager's allowedLocality remains NODE_LOCAL and task selection keeps drawing from pendingTasksForHost rather than pendingTasksForRack; as a result, none of the executors get a task. But if, by the time exec8 is offered, currTime - time1 > the NODE_LOCAL wait time, the TaskSetManager's allowedLocality drops to RACK_LOCAL, exec8 is assigned a rack-local task, and the subsequent executors are served at RACK_LOCAL as well.

CoarseGrainedSchedulerBackend

if (TaskState.isFinished(state)) {
  executorDataMap.get(executorId) match {
    case Some(executorInfo) =>
      executorInfo.freeCores += scheduler.CPUS_PER_TASK
      resources.foreach { case (k, v) =>
        executorInfo.resourcesInfo.get(k).foreach { r =>
          r.release(v.addresses)
        }
      }
      makeOffers(executorId)
    case None =>
      // Ignoring the update since we don't know about the executor.
      logWarning(s"Ignored task status update ($taskId state $state) " +
        s"from unknown executor with ID $executorId")
  }
}
private def makeOffers(executorId: String): Unit = {
  // Make sure no executor is killed while some task is launching on it
  val taskDescs = withLock {
    // Filter out executors under killing
    if (executorIsAlive(executorId)) {
      val executorData = executorDataMap(executorId)
      val workOffers = IndexedSeq(
        new WorkerOffer(executorId, executorData.executorHost, executorData.freeCores,
          Some(executorData.executorAddress.hostPort),
          executorData.resourcesInfo.map { case (rName, rInfo) =>
            (rName, rInfo.availableAddrs.toBuffer)
          }))
      scheduler.resourceOffers(workOffers)
    } else {
      Seq.empty
    }
  }
  if (taskDescs.nonEmpty) {
    launchTasks(taskDescs)
  }
}
  3. If cores really are fewer than the number of tasks, then whenever an executor finishes a task a resource offer is made for that executor alone, so the node-local task can be assigned to it. Once pendingTasksForHost has no tasks left to schedule, the TaskSetManager drops to RACK_LOCAL and the other executors can receive RACK_LOCAL tasks too. The wait is therefore the maximum time a locality level is allowed to wait for an assignment; once it is exceeded, the TaskSetManager automatically drops to the next level and schedules tasks at that level.

References

cloud.tencent.com/developer/a…
zhuanlan.zhihu.com/p/541505732

Launching Tasks

CoarseGrainedSchedulerBackend##launchTasks
Allocates resources on the chosen executor and sends the serialized task to it

for (task <- tasks.flatten) {
  val serializedTask = TaskDescription.encode(task)
  if (serializedTask.limit() >= maxRpcMessageSize) {
    Option(scheduler.taskIdToTaskSetManager.get(task.taskId)).foreach { taskSetMgr =>
      try {
        var msg = "Serialized task %s:%d was %d bytes, which exceeds max allowed: " +
          s"${RPC_MESSAGE_MAX_SIZE.key} (%d bytes). Consider increasing " +
          s"${RPC_MESSAGE_MAX_SIZE.key} or using broadcast variables for large values."
        msg = msg.format(task.taskId, task.index, serializedTask.limit(), maxRpcMessageSize)
        taskSetMgr.abort(msg)
      } catch {
        case e: Exception => logError("Exception in error callback", e)
      }
    }
  }
  else {
    val executorData = executorDataMap(task.executorId)
    // Do resources allocation here. The allocated resources will get released after the task
    // finishes.
    executorData.freeCores -= scheduler.CPUS_PER_TASK
    task.resources.foreach { case (rName, rInfo) =>
      assert(executorData.resourcesInfo.contains(rName))
      executorData.resourcesInfo(rName).acquire(rInfo.addresses)
    }

    logDebug(s"Launching task ${task.taskId} on executor id: ${task.executorId} hostname: " +
      s"${executorData.executorHost}.")

    executorData.executorEndpoint.send(LaunchTask(new SerializableBuffer(serializedTask)))
  }
}

CoarseGrainedExecutorBackend##receive
Handles the LaunchTask message sent by the driver

case LaunchTask(data) =>
  if (executor == null) {
    exitExecutor(1, "Received LaunchTask command but executor was null")
  } else {
    val taskDesc = TaskDescription.decode(data.value)
    logInfo("Got assigned task " + taskDesc.taskId)
    taskResources(taskDesc.taskId) = taskDesc.resources
    executor.launchTask(this, taskDesc)
  }

Executor##launchTask
Wraps the TaskDescription in a TaskRunner and submits it to the executor's thread pool, which eventually invokes TaskRunner.run

def launchTask(context: ExecutorBackend, taskDescription: TaskDescription): Unit = {
  val tr = new TaskRunner(context, taskDescription)
  runningTasks.put(taskDescription.taskId, tr)
  threadPool.execute(tr)
}

TaskRunner##run logs:

logInfo(s"Running $taskName (TID $taskId)")
  1. Update the task state to RUNNING
execBackend.statusUpdate(taskId, TaskState.RUNNING, EMPTY_BYTE_BUFFER)
  2. Restore the task
    • update dependency files and jars
    • deserialize the serialized serializedTask
updateDependencies(taskDescription.addedFiles, taskDescription.addedJars)
task = ser.deserialize[Task[Any]](
  taskDescription.serializedTask, Thread.currentThread.getContextClassLoader)

  3. Run the task by calling Task.run
task.run(
  taskAttemptId = taskId,
  attemptNumber = taskDescription.attemptNumber,
  metricsSystem = env.metricsSystem,
  resources = taskDescription.resources)

Task##run
Ultimately calls the task's runTask method, implemented by ResultTask and ShuffleMapTask; from here the shuffle takes over.

runTask(context)