Worker Node Process Analysis
The main function in Worker
def main(argStrings: Array[String]) {
  Thread.setDefaultUncaughtExceptionHandler(new SparkUncaughtExceptionHandler(
    exitOnUncaughtException = false))
  Utils.initDaemon(log)
  val conf = new SparkConf
  val args = new WorkerArguments(argStrings, conf)
  val rpcEnv = startRpcEnvAndEndpoint(args.host, args.port, args.webUiPort, args.cores,
    args.memory, args.masters, args.workDir, conf = conf)
  // With external shuffle service enabled, if we request to launch multiple workers on one host,
  // we can only successfully launch the first worker and the rest fails, because with the port
  // bound, we may launch no more than one external shuffle service on each host.
  // When this happens, we should give explicit reason of failure instead of fail silently. For
  // more detail see SPARK-20989.
  val externalShuffleServiceEnabled = conf.getBoolean("spark.shuffle.service.enabled", false)
  val sparkWorkerInstances = scala.sys.env.getOrElse("SPARK_WORKER_INSTANCES", "1").toInt
  require(externalShuffleServiceEnabled == false || sparkWorkerInstances <= 1,
    "Starting multiple workers on one host is failed because we may launch no more than one " +
      "external shuffle service on each host, please set spark.shuffle.service.enabled to " +
      "false or set SPARK_WORKER_INSTANCES to 1 to resolve the conflict.")
  rpcEnv.awaitTermination()
}
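The Worker's main function is almost identical to the Master's: both first start the RPC environment and register an endpoint. The difference is one additional check:
it requires that externalShuffleServiceEnabled be false or that SPARK_WORKER_INSTANCES be at most 1 (it defaults to 1 when the environment variable is unset).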
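The condition passed to require can be read in isolation. The following is a minimal sketch, not Spark code: WorkerLaunchCheck and canLaunch are hypothetical names introduced here, and only the boolean expression itself mirrors the check in main.

```scala
// Hypothetical helper that mirrors the condition inside require(...) in Worker.main.
// It is not part of Spark; it only restates the boolean logic.
object WorkerLaunchCheck {
  def canLaunch(externalShuffleServiceEnabled: Boolean, sparkWorkerInstances: Int): Boolean =
    !externalShuffleServiceEnabled || sparkWorkerInstances <= 1
}
```

With the shuffle service disabled, any instance count passes. With it enabled, only a count of at most 1 passes, which is the case where require would otherwise throw an IllegalArgumentException.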
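The reason is given in the code comment:
when the external shuffle service is enabled and several Workers are launched on one host, only the first can start normally
and the rest fail, because the service's port is already bound. Only one external shuffle service can run on each host.
If externalShuffleServiceEnabled is left at its default of false, multiple Workers can be started.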
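The port-binding failure behind this restriction can be illustrated outside Spark. The sketch below assumes a plain java.net.ServerSocket as a stand-in for the shuffle service's listener; PortConflictDemo and secondBindFails are hypothetical names, and the port is an ephemeral one picked by the OS rather than the port Spark is configured to use.

```scala
import java.net.{BindException, ServerSocket}

// Illustration only: a second listener cannot bind a port that is already bound.
object PortConflictDemo {
  // Returns true if the second bind attempt is rejected with a BindException.
  def secondBindFails(): Boolean = {
    val first = new ServerSocket(0) // port 0 asks the OS for any free port
    try {
      val port = first.getLocalPort
      try {
        // Stand-in for a second Worker trying to start its own shuffle service.
        new ServerSocket(port).close()
        false
      } catch {
        case _: BindException => true
      }
    } finally {
      first.close()
    }
  }
}
```

The check in main exists so that this situation produces an explicit error message up front instead of a silent failure in the second Worker.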
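The external shuffle service is a service that runs on the Worker. It takes over the serving of shuffle data from the Executors, which reduces the pressure on them so that they can concentrate on running tasks without being interrupted by shuffle requests. This helps efficiency.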
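The startRpcEnvAndEndpoint function is also much the same as the one in Master, so it is not analyzed again here.
Next, let's look at what onStart() does in the lifecycle of the Worker endpoint: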
- Create the working directory: createWorkDir
- Start the shuffle service: startExternalShuffleService
- Create and start the Worker's web UI: WorkerWebUI()
- Call registerWithMaster to register itself with the Master
- Start the metricsSystem
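The order of these steps can be sketched with a toy endpoint. This is a simplified illustration, not Spark's Worker or its RpcEndpoint trait: ToyEndpoint, ToyWorker, startWebUi, startMetricsSystem and the steps buffer are hypothetical, and each method only records its name instead of doing real work.

```scala
import scala.collection.mutable.ArrayBuffer

// Toy stand-in for an RPC endpoint lifecycle hook (not Spark's RpcEndpoint).
trait ToyEndpoint {
  def onStart(): Unit
}

// Records the startup steps in the order listed above.
class ToyWorker extends ToyEndpoint {
  val steps: ArrayBuffer[String] = ArrayBuffer.empty[String]

  private def createWorkDir(): Unit = { steps += "createWorkDir" }
  private def startExternalShuffleService(): Unit = { steps += "startExternalShuffleService" }
  private def startWebUi(): Unit = { steps += "WorkerWebUI" }
  private def registerWithMaster(): Unit = { steps += "registerWithMaster" }
  private def startMetricsSystem(): Unit = { steps += "metricsSystem" }

  override def onStart(): Unit = {
    createWorkDir()
    startExternalShuffleService()
    startWebUi()
    registerWithMaster()
    startMetricsSystem()
  }
}
```

Calling onStart() on a new ToyWorker leaves the five step names in steps in the same order as the list.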
Summary
Apart from the check on the external shuffle service at startup, the Worker's startup process is almost the same as the Master's.