Submitting Flink Jobs with Java


I have recently been building a real-time big data platform, and the project needs to submit Flink jobs from a web UI. I searched around and could not find anything suitable, so I had no choice but to dig into the Flink client source code myself. After more than half a month I finally figured it out, so here is a short write-up to share it (this only covers Flink on YARN).

I. Four ways to submit Flink jobs from the web:

  1. From Java, shell out to the scripts Flink ships with, i.e. $FLINK_HOME/bin/flink run ... (too easy, no challenge).
  2. Submit directly with YarnClient (very hard: you have to assemble a pile of parameters yourself, build the ApplicationSubmissionContext, ContainerLaunchContext, and so on, and you need to know the YARN source code and architecture very well).
  3. Reuse the mature YarnClusterDescriptor from the flink-yarn module: build a ClusterSpecification and an ApplicationConfiguration, then call YarnClusterDescriptor.deployApplicationCluster(ClusterSpecification clusterSpecification, ApplicationConfiguration applicationConfiguration). Its source looks like this:
public ClusterClientProvider<ApplicationId> deployApplicationCluster(ClusterSpecification clusterSpecification, ApplicationConfiguration applicationConfiguration) throws ClusterDeploymentException {
    Preconditions.checkNotNull(clusterSpecification);
    Preconditions.checkNotNull(applicationConfiguration);
    YarnDeploymentTarget deploymentTarget = YarnDeploymentTarget.fromConfig(this.flinkConfiguration);
    if (YarnDeploymentTarget.APPLICATION != deploymentTarget) {
        throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster. Expected deployment.target=" + YarnDeploymentTarget.APPLICATION.getName() + " but actual one was \"" + deploymentTarget.getName() + "\"");
    } else {
        applicationConfiguration.applyToConfiguration(this.flinkConfiguration);
        List<String> pipelineJars = (List)this.flinkConfiguration.getOptional(PipelineOptions.JARS).orElse(Collections.emptyList());
        Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");

        try {
            return this.deployInternal(clusterSpecification, "Flink Application Cluster", YarnApplicationClusterEntryPoint.class.getName(), (JobGraph)null, false);
        } catch (Exception var6) {
            throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster", var6);
        }
    }
}

The advantage of this approach is that after the job is submitted you get a ClusterClient object back. From it you can obtain the JobManager host and port of the job you just submitted, and with those two you can do things like monitoring job status, which makes it a great fit for a big data platform (a minimal sketch of this option follows the list below).

  4. The fourth way is to use Flink's native client, CliFrontend, directly: pack the submission parameters into a String[] and call CliFrontend's parseAndRun(String[] args). This takes even less code and is simple; the drawback is that you cannot get the submission result back after the job is submitted. To keep things simple, that is the approach this article uses. Enough talk, on to the code.
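For reference, here is a minimal sketch of option 3 (YarnClusterDescriptor) against Flink 1.14.2 on YARN. The paths, memory sizes, main class and program arguments below are placeholders, not taken from the original project:

import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.deployment.application.ApplicationConfiguration;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.client.program.ClusterClientProvider;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.DeploymentOptions;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.yarn.YarnClientYarnClusterInformationRetriever;
import org.apache.flink.yarn.YarnClusterDescriptor;
import org.apache.flink.yarn.configuration.YarnConfigOptions;
import org.apache.flink.yarn.configuration.YarnDeploymentTarget;
import org.apache.hadoop.yarn.api.records.ApplicationId;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

import java.util.Collections;

public class YarnApplicationSubmitSketch {
    public static void main(String[] args) throws Exception {
        // Load flink-conf.yaml and set the application-mode specific options (placeholder paths).
        Configuration flinkConfig = GlobalConfiguration.loadConfiguration("/opt/flink-1.14.2/conf");
        flinkConfig.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
        flinkConfig.set(PipelineOptions.JARS,
                Collections.singletonList("hdfs:///jobs/word-count-1.0-SNAPSHOT.jar"));
        flinkConfig.set(YarnConfigOptions.PROVIDED_LIB_DIRS,
                Collections.singletonList("hdfs:///flink-1.14.2/lib"));

        // The YarnClient picks up yarn-site.xml from HADOOP_CONF_DIR / the classpath.
        YarnClient yarnClient = YarnClient.createYarnClient();
        YarnConfiguration yarnConfiguration = new YarnConfiguration();
        yarnClient.init(yarnConfiguration);
        yarnClient.start();

        ClusterSpecification clusterSpecification = new ClusterSpecification.ClusterSpecificationBuilder()
                .setMasterMemoryMB(1024)
                .setTaskManagerMemoryMB(1024)
                .setSlotsPerTaskManager(1)
                .createClusterSpecification();

        ApplicationConfiguration applicationConfiguration = new ApplicationConfiguration(
                new String[]{"--topic", "number"}, "com.wilson.WordCount");

        try (YarnClusterDescriptor descriptor = new YarnClusterDescriptor(
                flinkConfig, yarnConfiguration, yarnClient,
                YarnClientYarnClusterInformationRetriever.create(yarnClient),
                true /* sharedYarnClient: do not stop the YarnClient when the descriptor closes */)) {
            ClusterClientProvider<ApplicationId> provider =
                    descriptor.deployApplicationCluster(clusterSpecification, applicationConfiguration);
            ClusterClient<ApplicationId> clusterClient = provider.getClusterClient();
            // The JobManager address is now known, so the platform can poll job status.
            System.out.println("application id: " + clusterClient.getClusterId());
            System.out.println("web interface : " + clusterClient.getWebInterfaceURL());
        }
    }
}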

II. The code

  1. Project dependencies (pom.xml):

<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>

<groupId>org.example</groupId>
<artifactId>FlinkAdmin</artifactId>
<version>1.0-SNAPSHOT</version>

<properties>
    <maven.compiler.source>8</maven.compiler.source>
    <maven.compiler.target>8</maven.compiler.target>
    <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<parent>
    <groupId>org.springframework.boot</groupId>
    <artifactId>spring-boot-starter-parent</artifactId>
    <version>2.7.10</version>
</parent>
<dependencies>
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-web</artifactId>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-clients_2.12</artifactId>
        <version>1.14.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.flink</groupId>
        <artifactId>flink-yarn_2.12</artifactId>
        <version>1.14.2</version>
        <exclusions>
            <exclusion>
                <groupId>org.apache.hadoop</groupId>
                <artifactId>hadoop-common</artifactId>
            </exclusion>
        </exclusions>
    </dependency>
    <dependency>
        <groupId>org.apache.hadoop</groupId>
        <artifactId>hadoop-common</artifactId>
        <version>2.4.1</version>
    </dependency>
    <dependency>
        <groupId>org.projectlombok</groupId>
        <artifactId>lombok</artifactId>
        <version>1.18.28</version>
    </dependency>
</dependencies>
<build>
    <plugins>
        <plugin>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-maven-plugin</artifactId>
        </plugin>
    </plugins>
</build>
</project>
  2. The custom Flink client:
package com.wilson.common;

import com.wilson.bean.Application;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.client.cli.CliFrontend;
import org.apache.flink.client.cli.CustomCommandLine;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.SecurityUtils;
import org.apache.flink.util.ExceptionUtils;
import org.apache.flink.yarn.configuration.YarnConfigOptions;
import org.springframework.stereotype.Component;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.lang.reflect.UndeclaredThrowableException;
import java.net.MalformedURLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

@Slf4j
@Component
public class FlinkClient {
    private static final String CONFIG_PREFIX="flink";
    private Map<String,String> yarnConf;

    public FlinkClient() {
        // initialize the YARN config from yarn-site.xml under $HADOOP_HOME
        String hadoopConf = System.getenv("HADOOP_HOME");
        File file = new File(hadoopConf + "/etc/hadoop");
        File[] files = file.listFiles();
        if(files!=null){
            for (File ele : files) {
                if(ele.getName().equalsIgnoreCase("yarn-site.xml")){
                    yarnConf=parseYarnSite(ele);
                    break;
                }
            }
        }
    }
    private Map<String,String> parseYarnSite(File yarnSite){
        HashMap<String, String> map = new HashMap<String, String>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document parse = builder.parse(yarnSite);
            Element root = parse.getDocumentElement();
            NodeList property = root.getElementsByTagName("property");
            for (int i = 0; i < property.getLength(); i++) {
                Element item = (Element) property.item(i);
                String name = item.getElementsByTagName("name").item(0).getTextContent();
                String value = item.getElementsByTagName("value").item(0).getTextContent();
                map.put(name,value);
            }
        }catch (Exception e){
            e.printStackTrace();
        }
        return map;
    }

    public void startJob(Application app){
        String flinkHome = app.getFlinkHome();
        String flinkConf=flinkHome+"/conf";
        Configuration flinkConfig = GlobalConfiguration.loadConfiguration(flinkConf);
        List<CustomCommandLine> customCommandLines = CliFrontend.loadCustomCommandLines(flinkConfig, flinkConf);
        setYarnConfig(flinkConfig);
        String flinkDistJar = getFlinkDistJar(flinkHome);
        if(flinkDistJar!=null&&!flinkDistJar.isEmpty()){
            flinkConfig.setString(YarnConfigOptions.FLINK_DIST_JAR.key(),flinkDistJar);
        }
        // If FLINK_HOME is set in the environment, this option is not needed; otherwise upload the
        // jars under flink/lib to HDFS and point this option at that HDFS directory.
        //flinkConfig.setString(YarnConfigOptions.PROVIDED_LIB_DIRS.key(),"hdfs://hadoop202:8020/flink-1.14.2/lib");
        log.info("-----------------------------flink config--------------------------");
        flinkConfig.toMap().forEach((k,v)->{
            log.info("{}:{}",k,v);
        });
        try {
            CliFrontend cli = new CliFrontend(flinkConfig, customCommandLines);
            SecurityUtils.install(new SecurityConfiguration(flinkConfig));
            String[] customArgs = toArgs(app);
            SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(customArgs));
        } catch (Throwable var10) {
            Throwable strippedThrowable = ExceptionUtils.stripException(var10, UndeclaredThrowableException.class);
            log.error("Fatal error while running command line interface.", strippedThrowable);
            strippedThrowable.printStackTrace();
        }

    }
    private String getFlinkDistJar(String flinkHome){
        File file = new File(flinkHome+"/lib");
        File[] files = file.listFiles();
        if(files!=null){
            for (File ele : files) {
                if(ele.getName().matches("flink-dist.*\\.jar")){
                    log.info("set flink dist jar:{}",ele.getAbsolutePath());
                    return ele.getAbsolutePath();
                }
            }
        }
        return null;
    }

    // yarn-site.xml is loaded because the YarnClient needs it when it is initialized:
    // the client has to know the ResourceManager address and port.
    private void setYarnConfig(Configuration conf){
        if(yarnConf!=null){
            yarnConf.forEach((k,v)->{
                conf.setString(CONFIG_PREFIX+"."+k,v);
            });
        }
    }

    private String[] toArgs(Application app){
        // the option values below must not contain any spaces
        String otherArgs = app.getOtherArgs();
        String mainClass = app.getMainClass();
        String[] classArr = mainClass.split("\\.");
        String jobName=classArr[classArr.length-1];
        String[] argsArr = otherArgs.split(" ");
        int  effectiveLength=0;
        for (String ele : argsArr) {
            if(ele!=null&&!ele.isEmpty()){
                effectiveLength++;
            }
        }
        String[] result = new String[14 + effectiveLength];
        if(app.getExecuteMode().equalsIgnoreCase("yarn-application")){
            result[0]="run-application";
        }else{
            result[0]="run";
        }
        result[1]="-t";
        result[2]=app.getExecuteMode();
        result[3]="-d";
        result[4]="-Dyarn.application.name="+jobName;
        result[5]="-Dyarn.application.queue="+app.getQueue();
        result[6]="-Djobmanager.memory.heap.size="+app.getMemOfJm();
        result[7]="-Dtaskmanager.memory.task.heap.size="+app.getMemOfTm();
        result[8]="-Dtaskmanager.numberOfTaskSlots="+app.getNumSlot();
        result[9]="-p";
        result[10]=String.valueOf(app.getParallelism());
        result[11]="-c";
        result[12]=mainClass;
        File file = new File(app.getJarPath());
        if(file.exists() && file.isFile()){
            try {
                result[13]=file.toURI().toURL().toString();
            } catch (MalformedURLException e) {
                throw new RuntimeException(e);
            }
        }
        log.info("argsArr length:{}",argsArr.length);
        int index=0;
        for (int i = 0; i < argsArr.length; i++) {
            if(argsArr[i]!=null&&!argsArr[i].isEmpty()){
                result[14+index]=argsArr[i];
                index++;
            }
        }
        log.info(Arrays.toString(result));
        return result;
    }

    public static void main(String[] args) {
        Application app = new Application();
        app.setFlinkHome("E:/flink-1.14.2");
        app.setMainClass("com.wilson.WordCount");
        app.setJarPath("E:/project_code/word-count/target/word-count-1.0-SNAPSHOT.jar");
        app.setQueue("default");
        app.setExecuteMode("yarn-application");
        app.setParallelism(1);
        app.setMemOfJm("512m");
        app.setMemOfTm("512m");
        app.setNumSlot(1);
        app.setOtherArgs("--broker hadoop202:9092 --topic number");
        FlinkClient cli = new FlinkClient();
        cli.startJob(app);
    }

}
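The Application bean referenced throughout FlinkClient is not shown in the original article. Below is a minimal sketch of what it presumably looks like, reconstructed from the getters and setters that are called above; the field comments are assumptions:

package com.wilson.bean;

import lombok.Data;

// Hypothetical job-submission parameter bean, inferred from the calls in FlinkClient.
@Data
public class Application {
    private String flinkHome;      // local Flink installation, e.g. /opt/flink-1.14.2
    private String mainClass;      // fully qualified main class of the user jar
    private String jarPath;        // local path of the user jar
    private String queue;          // YARN queue name
    private String executeMode;    // e.g. yarn-application or yarn-per-job
    private int parallelism;       // default parallelism (-p)
    private String memOfJm;        // JobManager heap size, e.g. 512m
    private String memOfTm;        // TaskManager task heap size, e.g. 512m
    private int numSlot;           // slots per TaskManager
    private String otherArgs;      // space-separated program arguments
}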
  3. The rest is the usual Spring Boot setup: I wrote a small page, the controller layer calls the service layer, and the service layer holds a FlinkClient and calls its startJob method (a sketch of the controller follows the service below).
@Service
public class ApplicationServiceImpl implements ApplicationService {
    @Autowired
    FlinkClient flinkClient;

    @Override
    public void startApp(Application app) {
        flinkClient.startJob(app);
    }
}
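The controller the article mentions is not shown either; here is a hedged sketch of what it could look like. The request mapping and class names are assumptions, not from the original project:

package com.wilson.controller;

import com.wilson.bean.Application;
import com.wilson.service.ApplicationService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller: receives the job parameters from the page and hands them to the service layer.
@RestController
@RequestMapping("/app")
public class ApplicationController {
    @Autowired
    private ApplicationService applicationService;

    @PostMapping("/start")
    public String startApp(@RequestBody Application app) {
        applicationService.startApp(app);
        return "submitted";
    }
}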
  4. Package the project, upload it to the server, and run java -jar FlinkAdmin-1.0-SNAPSHOT.jar. Note the server requirements: (1) JDK 8; (2) Hadoop and Flink installed, with HADOOP_HOME and FLINK_HOME configured.
  5. After the application starts, open http://ip:8082/index.html, fill in the job parameters, and submit.

(Screenshot: job parameter form, 参数.jpg)