Flink Source Code Analysis | Starting from an Example: Understanding the start-scala-shell.sh Job Execution Flow

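WeChat official account: 深广大数据Club. Follow it for more big-data news. Questions and suggestions are welcome as messages to the account; if you find 深广大数据Club helpful, please share this article on WeChat Moments.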

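This article introduces the interactive Scala shell that ships with Apache Flink. The script can be run against either a local setup or a cluster, and once it is running you can execute your own code on top of it.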

Scala REPL

The start-scala-shell.sh script lives in the bin directory of the Flink installation. To start the shell in local (single-machine) mode, run:

bin/start-scala-shell.sh local

For detailed usage, see the Scala REPL documentation: https://github.com/Jonathan-Wei/Flink-Docs-CN/blob/master/06%20%E9%83%A8%E7%BD%B2-%E6%93%8D%E4%BD%9C/07%20Scala%20REPL.md

start-scala-shell.sh

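Here we focus on the code invoked at the end of the script. As the snippet below shows, the script ultimately launches the main class org.apache.flink.api.scala.FlinkShell, defined in FlinkShell.scala: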

if ${EXTERNAL_LIB_FOUND}
then
    java -Dscala.color -cp "$FLINK_CLASSPATH" $log_setting org.apache.flink.api.scala.FlinkShell $@ --addclasspath "$EXT_CLASSPATH"
else
    java -Dscala.color -cp "$FLINK_CLASSPATH" $log_setting org.apache.flink.api.scala.FlinkShell $@
fi

FlinkShell.scala

We start from the main() method and follow the logic from there.

cmd("local") action {
    (_, c) => c.copy(executionMode = ExecutionMode.LOCAL)
} text "Starts Flink scala shell with a local Flink cluster" children(
    ...
)

cmd("remote") action { (_, c) =>
    c.copy(executionMode = ExecutionMode.REMOTE)
} text "Starts Flink scala shell connecting to a remote cluster" children(
    ...
)

cmd("yarn") action {
    (_, c) => c.copy(executionMode = ExecutionMode.YARN, yarnConfig = None)
} text "Starts Flink scala shell connecting to a yarn cluster" children(
    ...
)

// parse arguments
parser.parse(args, Config()) match {
    case Some(config) => startShell(config)
    case _ => println("Could not parse program arguments")
}

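The code shows three launch modes: local, remote, and yarn. Once a mode is given on the command line, executionMode is set to the corresponding value, and the shell is then started through startShell.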
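To make the "command selects the mode" idea concrete, here is a minimal sketch that does not use the real option parser. ModeParserSketch and its hand-written pattern match are hypothetical, and Config is reduced to three fields, so treat this as an illustration of the idea rather than FlinkShell's actual code.

```scala
// Simplified stand-ins for FlinkShell's ExecutionMode and Config (assumed shapes).
object ExecutionMode extends Enumeration {
  val UNDEFINED, LOCAL, REMOTE, YARN = Value
}

final case class Config(
    host: Option[String] = None,
    port: Option[Int] = None,
    executionMode: ExecutionMode.Value = ExecutionMode.UNDEFINED)

object ModeParserSketch {
  // Hypothetical replacement for the command-line parser:
  // the first argument selects the mode, which is copied into Config.
  def parse(args: Seq[String]): Option[Config] = args.toList match {
    case "local" :: _ =>
      Some(Config(executionMode = ExecutionMode.LOCAL))
    case "remote" :: host :: port :: _ =>
      // A non-numeric port makes the whole parse fail.
      scala.util.Try(port.toInt).toOption
        .map(p => Config(Some(host), Some(p), ExecutionMode.REMOTE))
    case "yarn" :: _ =>
      Some(Config(executionMode = ExecutionMode.YARN))
    case _ =>
      None // unknown command: the caller reports a parse error
  }
}
```

As in the original, a successful parse yields Some(config) that can be handed to startShell, and anything else falls through to the error branch.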

The startShell method

val (repl, cluster) = try {
    val (host, port, cluster) = fetchConnectionInfo(configuration, config)
    val conf = cluster match {
        case Some(Left(_)) => configuration
        case Some(Right(yarnCluster)) => yarnCluster.getFlinkConfiguration
        case None => configuration
    }

    println(s"\nConnecting to Flink cluster (host: $host, port: $port).\n")
    val repl = bufferedReader match {
        case Some(reader) =>
            val out = new StringWriter()
            new FlinkILoop(host, port, conf, config.externalJars, reader, new JPrintWriter(out))
        case None =>
            new FlinkILoop(host, port, conf, config.externalJars)
    }

    (repl, cluster)
} catch {
    case e: IllegalArgumentException =>
        println(s"Error: ${e.getMessage}")
        sys.exit()
}

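The code above does two things. First, it calls fetchConnectionInfo(configuration, config) to obtain the connection information (host, port, cluster) and picks the configuration to use. Second, it creates a FlinkILoop object, repl, from that configuration. Processing is then started with repl.process(settings):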

try {
    repl.process(settings)
} finally {
    repl.closeInterpreter()
    cluster match {
        case Some(Left(miniCluster)) => miniCluster.close()
        case Some(Right(yarnCluster)) =>
            yarnCluster.shutDownCluster()
            yarnCluster.shutdown()
        case _ =>
    }
}
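The cleanup in the finally block depends on the Option[Either[...]] value returned alongside the REPL. The sketch below isolates that pattern with hypothetical stand-ins (FakeMiniCluster, FakeYarnClient) that only record which methods were called; it is meant to show the control flow, not Flink's real classes.

```scala
import scala.collection.mutable.ListBuffer

object ShutdownSketch {
  // Hypothetical stand-ins that just log the calls made on them.
  final class FakeMiniCluster(log: ListBuffer[String]) {
    def close(): Unit = { log += "miniCluster.close" }
  }
  final class FakeYarnClient(log: ListBuffer[String]) {
    def shutDownCluster(): Unit = { log += "yarn.shutDownCluster" }
    def shutdown(): Unit = { log += "yarn.shutdown" }
  }

  type Cluster = Option[Either[FakeMiniCluster, FakeYarnClient]]

  // Runs the REPL body and always releases whatever the shell itself started.
  def runShell(cluster: Cluster, log: ListBuffer[String])(body: => Unit): Unit =
    try body
    finally {
      log += "repl.closeInterpreter"
      cluster match {
        case Some(Left(mini)) => mini.close()
        case Some(Right(yarn)) =>
          yarn.shutDownCluster()
          yarn.shutdown()
        case None => // no cluster handle was returned, so nothing to tear down
      }
    }
}
```

Because the teardown sits in finally, it runs even when the REPL body throws, which is the property the original code relies on.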

Now let's look at the key method, fetchConnectionInfo:

def fetchConnectionInfo(
    configuration: Configuration,
    config: Config
  ): (String, Int, Option[Either[MiniCluster , ClusterClient[_]]]) = {
    config.executionMode match {
      case ExecutionMode.LOCAL => // Local mode
        val config = configuration
        config.setInteger(JobManagerOptions.PORT, 0)

        val miniClusterConfig = new MiniClusterConfiguration.Builder()
          .setConfiguration(config)
          .build()
        val cluster = new MiniCluster(miniClusterConfig)
        cluster.start()
        val port = cluster.getRestAddress.getPort

        println(s"\nStarting local Flink cluster (host: localhost, port: $port).\n")
        ("localhost", port, Some(Left(cluster)))

      case ExecutionMode.REMOTE => // Remote mode
        if (config.host.isEmpty || config.port.isEmpty) {
          throw new IllegalArgumentException("<host> or <port> is not specified!")
        }
        (config.host.get, config.port.get, None)

      case ExecutionMode.YARN => // YARN mode
        config.yarnConfig match {
          case Some(yarnConfig) => // if there is information for new cluster
            deployNewYarnCluster(
              configuration,
              config.configDir.getOrElse(CliFrontend.getConfigurationDirectoryFromEnv),
              yarnConfig)
          case None => // there is no information for new cluster. Then we use yarn properties.
            fetchDeployedYarnClusterInfo(
              configuration,
              config.configDir.getOrElse(CliFrontend.getConfigurationDirectoryFromEnv)
            )
        }

      case ExecutionMode.UNDEFINED => // Wrong input
        throw new IllegalArgumentException("please specify execution mode:\n" +
          "[local | remote <host> <port> | yarn]")
    }
  }

fetchConnectionInfo pattern-matches on the ExecutionMode and handles each mode as follows.

LOCAL mode

val cluster = new MiniCluster(miniClusterConfig)
cluster.start()

REMOTE mode

if (config.host.isEmpty || config.port.isEmpty) {
    throw new IllegalArgumentException("<host> or <port> is not specified!")
}
(config.host.get, config.port.get, None)

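Remote mode only takes the host and port supplied on the command line and returns them for later calls. No cluster is started, so the third element of the tuple is None.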
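The same check can also be written without isEmpty followed by .get, by matching on both options at once. This is only a stylistic sketch: RemoteModeSketch and remoteConnection are hypothetical names, and the error message is copied from the snippet above.

```scala
object RemoteModeSketch {
  // Returns (host, port) when both are present, otherwise fails like the original check.
  def remoteConnection(host: Option[String], port: Option[Int]): (String, Int) =
    (host, port) match {
      case (Some(h), Some(p)) => (h, p)
      case _ =>
        throw new IllegalArgumentException("<host> or <port> is not specified!")
    }
}
```

Matching on the pair keeps the validation and the extraction in one place, so no unchecked .get can be reached.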

YARN mode

config.yarnConfig match {
    case Some(yarnConfig) => // if there is information for new cluster
        deployNewYarnCluster(
            configuration,
            config.configDir.getOrElse(CliFrontend.getConfigurationDirectoryFromEnv),
            yarnConfig)
    case None => // there is no information for new cluster. Then we use yarn properties.
        fetchDeployedYarnClusterInfo(
            configuration,
            config.configDir.getOrElse(CliFrontend.getConfigurationDirectoryFromEnv)
        )
}

If yarnConfig is defined, deployNewYarnCluster is called. Otherwise the information in the yarn properties file is used and fetchDeployedYarnClusterInfo is called.

deployNewYarnCluster

val frontend = new CliFrontend(
    configuration,
    CliFrontend.loadCustomCommandLines(configuration, configurationDirectory))

val commandOptions = CliFrontendParser.getRunCommandOptions
val commandLineOptions = CliFrontendParser.mergeOptions(commandOptions,
    frontend.getCustomCommandLineOptions());
val commandLine = CliFrontendParser.parse(commandLineOptions, args.toArray, true)

val customCLI = frontend.getActiveCustomCommandLine(commandLine)

val clusterDescriptor = customCLI.createClusterDescriptor(commandLine)
val clusterSpecification = customCLI.getClusterSpecification(commandLine)

val cluster = clusterDescriptor.deploySessionCluster(clusterSpecification)

val inetSocketAddress = AkkaUtils.getInetSocketAddressFromAkkaURL(
    cluster.getClusterConnectionInfo.getAddress)

val address = inetSocketAddress.getAddress.getHostAddress
val port = inetSocketAddress.getPort

(address, port, Some(Right(cluster)))

fetchDeployedYarnClusterInfo

val commandLine = CliFrontendParser.parse(
    CliFrontendParser.getRunCommandOptions,
    args.toArray,
    true)

val frontend = new CliFrontend(
    configuration,
    CliFrontend.loadCustomCommandLines(configuration, configurationDirectory))

val customCLI = frontend.getActiveCustomCommandLine(commandLine)

val clusterDescriptor = customCLI
    .createClusterDescriptor(commandLine)
    .asInstanceOf[ClusterDescriptor[Any]]

val clusterId = customCLI.getClusterId(commandLine)

val cluster = clusterDescriptor.retrieve(clusterId)

if (cluster == null) {
    throw new RuntimeException("Yarn Cluster could not be retrieved.")
}

val jobManager = AkkaUtils.getInetSocketAddressFromAkkaURL(
    cluster.getClusterConnectionInfo.getAddress)

(jobManager.getHostString, jobManager.getPort, None)

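Both methods ultimately obtain the host and port through AkkaUtils.getInetSocketAddressFromAkkaURL. They differ in the step before that: deployNewYarnCluster deploys a new cluster with clusterDescriptor.deploySessionCluster and gets the cluster client from it, whereas fetchDeployedYarnClusterInfo first obtains the clusterId and passes it to clusterDescriptor.retrieve() to get the client of an already-running cluster.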
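To give a feel for what "getting host and port out of an Akka URL" involves, the sketch below parses such a URL with java.net.URI. This is not how AkkaUtils is implemented; AkkaUrlSketch is a hypothetical helper, and the URL shape akka.tcp://flink@host:port/user/jobmanager is assumed here for illustration.

```scala
import java.net.URI

object AkkaUrlSketch {
  // Extracts (host, port) from an Akka-style actor URL, or None if either is missing.
  // Assumption: the URL is a well-formed hierarchical URI.
  def hostAndPort(akkaUrl: String): Option[(String, Int)] =
    scala.util.Try(new URI(akkaUrl)).toOption.flatMap { uri =>
      // getPort returns -1 when the URL carries no port (for example a local actor path).
      if (uri.getHost != null && uri.getPort >= 0) Some((uri.getHost, uri.getPort))
      else None
    }
}
```

A URL without a port, such as a purely local actor path, yields None, which a caller would have to treat as "no remote address available".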

FlinkILoop

Coming back to the repl mentioned earlier: it is simply an instance of FlinkILoop. FlinkILoop holds the environments for both local mode and remote mode.

LOCAL mode

// local environment
val (
    scalaBenv: ExecutionEnvironment,
    scalaSenv: StreamExecutionEnvironment
) = {
    val scalaBenv = new ExecutionEnvironment(remoteBenv)
    val scalaSenv = new StreamExecutionEnvironment(remoteSenv)
    (scalaBenv, scalaSenv)
}

Batch env: ExecutionEnvironment
Streaming env: StreamExecutionEnvironment

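Local mode was covered in "Flink Source Code Analysis | Starting from an Example: Understanding the Local Job Execution Flow", so it is not repeated here.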

REMOTE mode

// remote environment
private val (
    remoteBenv: ScalaShellRemoteEnvironment,
    remoteSenv: ScalaShellRemoteStreamEnvironment
) = {
    // allow creation of environments
    ScalaShellRemoteEnvironment.resetContextEnvironments()

    // create our environment that submits against the cluster (local or remote)
    val remoteBenv = new ScalaShellRemoteEnvironment(
      host,
      port,
      this,
      clientConfig,
      this.getExternalJars(): _*)
    val remoteSenv = new ScalaShellRemoteStreamEnvironment(
      host,
      port,
      this,
      clientConfig,
      getExternalJars(): _*)

    // prevent further instantiation of environments
    ScalaShellRemoteEnvironment.disableAllContextAndOtherEnvironments()

    (remoteBenv, remoteSenv)
}

Batch env: ScalaShellRemoteEnvironment
Streaming env: ScalaShellRemoteStreamEnvironment

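Once the interactive shell is running, we write our code at its prompt and, when finished, run it with the environment's execute() method (benv.execute() for batch, senv.execute() for streaming).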

Code entry point

Here we use the example from the official documentation.

Scala-Flink> val text = benv.fromElements(
  "To be, or not to be,--that is the question:--",
  "Whether 'tis nobler in the mind to suffer",
  "The slings and arrows of outrageous fortune",
  "Or to take arms against a sea of troubles,")
Scala-Flink> val counts = text
    .flatMap { _.toLowerCase.split("\\W+") }
    .map { (_, 1) }.groupBy(0).sum(1)
Scala-Flink> counts.print()
Scala-Flink> benv.execute("MyProgram")

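By the time the script has started and we are at the interactive prompt, the shell has already bound the benv and senv environment variables for us. We then write our code and call execute. Note that in the batch API counts.print() already triggers execution by itself; the explicit benv.execute("MyProgram") call is needed when results go to another sink such as a file, and calling it right after print() with no new sink defined fails with a "no new data sinks" error.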

Let's look at the internals of ScalaShellRemoteEnvironment and ScalaShellRemoteStreamEnvironment.

ScalaShellRemoteEnvironment extends RemoteEnvironment

@Override
public JobExecutionResult execute(String jobName) throws Exception {
    PlanExecutor executor = getExecutor();

    Plan p = createProgramPlan(jobName);

    // Session management is disabled, revert this commit to enable
    //p.setJobId(jobID);
    //p.setSessionTimeout(sessionTimeout);

    JobExecutionResult result = executor.executePlan(p);

    this.lastJobExecutionResult = result;
    return result;
}

Obtain the PlanExecutor object through the getExecutor method
Create the program Plan through createProgramPlan
Execute the plan through executor.executePlan and return the result
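The three steps can be mimicked with a toy model. Everything in the sketch below (Plan, JobResult, PlanExecutor, InProcessExecutor, Environment) is a hypothetical, heavily simplified stand-in, not Flink's real types; it only shows the order of the calls and that the last result is remembered.

```scala
object BatchExecuteSketch {
  // Hypothetical stand-ins for Flink's Plan, JobExecutionResult and PlanExecutor.
  final case class Plan(jobName: String, sinks: List[String])
  final case class JobResult(jobName: String, sinksRun: Int)

  trait PlanExecutor {
    def executePlan(plan: Plan): JobResult
  }

  // Pretends to run the plan and reports how many sinks it contained.
  object InProcessExecutor extends PlanExecutor {
    def executePlan(plan: Plan): JobResult = JobResult(plan.jobName, plan.sinks.size)
  }

  final class Environment {
    private var sinks: List[String] = Nil
    private var lastResult: Option[JobResult] = None

    def addSink(name: String): Unit = { sinks = sinks :+ name }
    def lastJobExecutionResult: Option[JobResult] = lastResult

    def execute(jobName: String): JobResult = {
      val executor: PlanExecutor = InProcessExecutor // 1. obtain the executor
      val plan = Plan(jobName, sinks)                // 2. build the program plan
      val result = executor.executePlan(plan)        // 3. run it
      lastResult = Some(result)                      // remember the last result
      result
    }
  }
}
```

In the real class the executor submits the plan to the cluster the shell is connected to; here it simply returns a value so the flow can be followed in isolation.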

ScalaShellRemoteStreamEnvironment extends RemoteStreamEnvironment

@Override
public JobExecutionResult execute(String jobName) throws ProgramInvocationException {
    StreamGraph streamGraph = getStreamGraph();
    streamGraph.setJobName(jobName);
    transformations.clear();
    return executeRemotely(streamGraph, jarFiles);
}

RemoteStreamEnvironment.execute() first obtains the StreamGraph and then calls executeRemotely, which is overridden in the subclass ScalaShellRemoteStreamEnvironment.

ScalaShellRemoteStreamEnvironment.executeRemotely

protected JobExecutionResult executeRemotely(StreamGraph streamGraph, List<URL> jarFiles) throws ProgramInvocationException {
    URL jarUrl;
    try {
        jarUrl = flinkILoop.writeFilesToDisk().getAbsoluteFile().toURI().toURL();
    } catch (MalformedURLException e) {
        throw new ProgramInvocationException("Could not write the user code classes to disk.",
            streamGraph.getJobGraph().getJobID(), e);
    }

    List<URL> allJarFiles = new ArrayList<>(jarFiles.size() + 1);
    allJarFiles.addAll(jarFiles);
    allJarFiles.add(jarUrl);

    return super.executeRemotely(streamGraph, allJarFiles);
}

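The method writes the classes compiled in the shell to disk, converts that file to a URL, appends it to the URLs of the other required jars, and then calls executeRemotely on the parent class RemoteStreamEnvironment.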
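The jar handling on its own is small enough to sketch in a few lines. JarListSketch and withShellJar are hypothetical names, and the file passed in stands for whatever flinkILoop.writeFilesToDisk() returns; only the File to URL conversion chain is taken from the snippet above.

```scala
import java.io.File
import java.net.URL

object JarListSketch {
  // Mirrors the list handling in executeRemotely:
  // user-supplied jars first, the jar holding the shell's compiled classes last.
  def withShellJar(jarFiles: List[URL], shellJar: File): List[URL] =
    jarFiles :+ shellJar.getAbsoluteFile.toURI.toURL
}
```

Going through getAbsoluteFile before toURI means the resulting URL does not depend on the working directory of whichever process later reads it.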

RemoteStreamEnvironment.executeRemotely

final ClusterClient<?> client;
try {
    client = new RestClusterClient<>(configuration, "RemoteStreamEnvironment");
}
catch (Exception e) {
    throw new ProgramInvocationException("Cannot establish connection to JobManager: " + e.getMessage(),
        streamGraph.getJobGraph().getJobID(), e);
}

client.setPrintStatusDuringExecution(getConfig().isSysoutLoggingEnabled());

try {
    return client.run(streamGraph, jarFiles, globalClasspaths, usercodeClassLoader).getJobExecutionResult();
}
catch (ProgramInvocationException e) {
    throw e;
}

This method mainly does two things:

Obtain a ClusterClient; here the ClusterClient is a RestClusterClient
Call RestClusterClient.run() to execute the job and get the JobExecutionResult

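The rest of the flow is largely the same as in the earlier articles listed below, except that the final submitJob call is RestClusterClient's submitJob. I won't go deeper here; try working through it yourself.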

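Flink Source Code Analysis | Starting from an Example: Understanding the Local Job Execution Flow

Flink Source Code Analysis | Starting from an Example: Understanding the Cluster Job Execution Flow

Flink Source Code Analysis | Starting from an Example: Understanding the Flink on YARN Job Execution Flow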