I've recently been building a real-time big data platform, and the project needs to submit Flink jobs through a web interface. I searched around online and couldn't find anything suitable, so I had no choice but to dig into the Flink client source code myself. After more than half a month I finally figured it out, so here is a short write-up to share (this only covers Flink on YARN).
1. Four ways to submit Flink jobs from the web:
- Have Java shell out to the scripts Flink already ships, i.e. $FLINK_HOME/bin/flink run ... (too easy, no challenge; a sketch of this follows the list)
- Submit directly with YarnClient (very hard: you have to assemble a pile of parameters yourself, including the application submission and container launch contexts, and you need to be deeply familiar with the YARN source code and architecture)
- Reuse the mature YarnClusterDescriptor from the flink-yarn module: build a ClusterSpecification and an ApplicationConfiguration, then call YarnClusterDescriptor.deployApplicationCluster(ClusterSpecification clusterSpecification, ApplicationConfiguration applicationConfiguration). The (decompiled) source:
public ClusterClientProvider<ApplicationId> deployApplicationCluster(ClusterSpecification clusterSpecification, ApplicationConfiguration applicationConfiguration) throws ClusterDeploymentException {
    Preconditions.checkNotNull(clusterSpecification);
    Preconditions.checkNotNull(applicationConfiguration);
    YarnDeploymentTarget deploymentTarget = YarnDeploymentTarget.fromConfig(this.flinkConfiguration);
    if (YarnDeploymentTarget.APPLICATION != deploymentTarget) {
        throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster. Expected deployment.target=" + YarnDeploymentTarget.APPLICATION.getName() + " but actual one was \"" + deploymentTarget.getName() + "\"");
    } else {
        applicationConfiguration.applyToConfiguration(this.flinkConfiguration);
        List<String> pipelineJars = (List)this.flinkConfiguration.getOptional(PipelineOptions.JARS).orElse(Collections.emptyList());
        Preconditions.checkArgument(pipelineJars.size() == 1, "Should only have one jar");
        try {
            return this.deployInternal(clusterSpecification, "Flink Application Cluster", YarnApplicationClusterEntryPoint.class.getName(), (JobGraph)null, false);
        } catch (Exception var6) {
            throw new ClusterDeploymentException("Couldn't deploy Yarn Application Cluster", var6);
        }
    }
}
The advantage of this approach is that after submitting you get back a ClusterClient object, from which you can obtain the JobManager host and port of the job you just submitted; with those two you can build job-status monitoring, which makes it a great fit for a big data platform (a sketch of this approach also follows the list).
- The fourth way is to use Flink's native client, CliFrontend, directly: wrap the submission parameters into a String[] and call CliFrontend's parseAndRun(String[] args). This takes even less code and is simpler; the downside is that you don't get the submission result back. To keep this article simple, this is the approach used here. Enough talk, on to the code.
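To make the first option concrete, here is a minimal sketch of shelling out to the Flink CLI from Java; the execution target, main class, and jar path are made-up placeholders, not values from this project:

// Sketch of option 1: shell out to the flink CLI script from Java.
// FLINK_HOME must be set; the target, main class, and jar path below are assumptions.
public class CliSubmitSketch {
    public static void main(String[] args) throws Exception {
        ProcessBuilder pb = new ProcessBuilder(
                System.getenv("FLINK_HOME") + "/bin/flink", "run",
                "-t", "yarn-per-job",                      // or "run-application" with -t yarn-application
                "-d",                                      // detached submission
                "-c", "com.wilson.WordCount",              // hypothetical entry class
                "/opt/jobs/word-count-1.0-SNAPSHOT.jar");  // hypothetical jar path
        pb.inheritIO();                                    // forward the CLI output to our stdout/stderr
        int exitCode = pb.start().waitFor();               // 0 means the CLI accepted the submission
        System.out.println("flink run exited with " + exitCode);
    }
}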
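And to make the third option concrete, a minimal sketch of driving YarnClusterDescriptor directly, assuming Flink 1.14 on the classpath, a visible Hadoop configuration, and a job jar already on HDFS; the conf dir, HDFS path, and main class are placeholders. As with the FlinkClient below, you may also need to point yarn.flink-dist-jar or yarn.provided.lib.dirs at the Flink distribution jars:

import java.util.Collections;
import org.apache.flink.client.deployment.ClusterSpecification;
import org.apache.flink.client.deployment.application.ApplicationConfiguration;
import org.apache.flink.client.program.ClusterClient;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.DeploymentOptions;
import org.apache.flink.configuration.DeploymentOptionsInternal;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.configuration.PipelineOptions;
import org.apache.flink.yarn.YarnClusterClientFactory;
import org.apache.flink.yarn.YarnClusterDescriptor;
import org.apache.flink.yarn.configuration.YarnDeploymentTarget;
import org.apache.hadoop.yarn.api.records.ApplicationId;

public class YarnDescriptorSubmitSketch {
    public static void main(String[] args) throws Exception {
        String confDir = "/opt/flink-1.14.2/conf";                    // placeholder
        Configuration flinkConfig = GlobalConfiguration.loadConfiguration(confDir);
        flinkConfig.set(DeploymentOptions.TARGET, YarnDeploymentTarget.APPLICATION.getName());
        flinkConfig.set(DeploymentOptionsInternal.CONF_DIR, confDir); // lets the factory locate log configs
        // deployApplicationCluster insists on exactly one pipeline jar (see the source above)
        flinkConfig.set(PipelineOptions.JARS,
                Collections.singletonList("hdfs:///jobs/word-count-1.0-SNAPSHOT.jar")); // placeholder
        YarnClusterClientFactory factory = new YarnClusterClientFactory();
        ClusterSpecification spec = factory.getClusterSpecification(flinkConfig);
        try (YarnClusterDescriptor descriptor = factory.createClusterDescriptor(flinkConfig)) {
            ApplicationConfiguration appConfig =
                    new ApplicationConfiguration(new String[0], "com.wilson.WordCount"); // placeholder
            ClusterClient<ApplicationId> client =
                    descriptor.deployApplicationCluster(spec, appConfig).getClusterClient();
            // this is the JobManager REST endpoint you can poll for job status afterwards
            System.out.println("JobManager at " + client.getWebInterfaceUrl());
        }
    }
}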
2. The good stuff: code
- Project dependencies (pom.xml):
<project xmlns="http://maven.apache.org/POM/4.0.0"
         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>
    <groupId>org.example</groupId>
    <artifactId>FlinkAdmin</artifactId>
    <version>1.0-SNAPSHOT</version>
    <properties>
        <maven.compiler.source>8</maven.compiler.source>
        <maven.compiler.target>8</maven.compiler.target>
        <project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
    </properties>
    <parent>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-parent</artifactId>
        <version>2.7.10</version>
    </parent>
    <dependencies>
        <dependency>
            <groupId>org.springframework.boot</groupId>
            <artifactId>spring-boot-starter-web</artifactId>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-clients_2.12</artifactId>
            <version>1.14.2</version>
        </dependency>
        <dependency>
            <groupId>org.apache.flink</groupId>
            <artifactId>flink-yarn_2.12</artifactId>
            <version>1.14.2</version>
            <exclusions>
                <exclusion>
                    <groupId>org.apache.hadoop</groupId>
                    <artifactId>hadoop-common</artifactId>
                </exclusion>
            </exclusions>
        </dependency>
        <dependency>
            <groupId>org.apache.hadoop</groupId>
            <artifactId>hadoop-common</artifactId>
            <version>2.4.1</version>
        </dependency>
        <dependency>
            <groupId>org.projectlombok</groupId>
            <artifactId>lombok</artifactId>
            <version>1.18.28</version>
        </dependency>
    </dependencies>
    <build>
        <plugins>
            <plugin>
                <groupId>org.springframework.boot</groupId>
                <artifactId>spring-boot-maven-plugin</artifactId>
            </plugin>
        </plugins>
    </build>
</project>
- The custom Flink client:
package com.wilson.common;
import com.wilson.bean.Application;
import lombok.extern.slf4j.Slf4j;
import org.apache.flink.client.cli.CliFrontend;
import org.apache.flink.client.cli.CustomCommandLine;
import org.apache.flink.configuration.Configuration;
import org.apache.flink.configuration.GlobalConfiguration;
import org.apache.flink.runtime.security.SecurityConfiguration;
import org.apache.flink.runtime.security.SecurityUtils;
import org.apache.flink.util.ExceptionUtils;
import org.apache.flink.yarn.configuration.YarnConfigOptions;
import org.springframework.stereotype.Component;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import java.io.File;
import java.lang.reflect.UndeclaredThrowableException;
import java.net.MalformedURLException;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
@Slf4j
@Component
public class FlinkClient {

    private static final String CONFIG_PREFIX = "flink";

    private Map<String, String> yarnConf;

    public FlinkClient() {
        // load the YARN config from $HADOOP_HOME/etc/hadoop/yarn-site.xml
        String hadoopHome = System.getenv("HADOOP_HOME");
        File file = new File(hadoopHome + "/etc/hadoop");
        File[] files = file.listFiles();
        if (files != null) {
            for (File ele : files) {
                if (ele.getName().equalsIgnoreCase("yarn-site.xml")) {
                    yarnConf = parseYarnSite(ele);
                    break;
                }
            }
        }
    }
    // read every <property><name>/<value> pair out of yarn-site.xml
    private Map<String, String> parseYarnSite(File yarnSite) {
        HashMap<String, String> map = new HashMap<>();
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            DocumentBuilder builder = factory.newDocumentBuilder();
            Document parsed = builder.parse(yarnSite);
            Element root = parsed.getDocumentElement();
            NodeList property = root.getElementsByTagName("property");
            for (int i = 0; i < property.getLength(); i++) {
                Element item = (Element) property.item(i);
                String name = item.getElementsByTagName("name").item(0).getTextContent();
                String value = item.getElementsByTagName("value").item(0).getTextContent();
                map.put(name, value);
            }
        } catch (Exception e) {
            log.error("Failed to parse yarn-site.xml", e);
        }
        return map;
    }
    public void startJob(Application app) {
        String flinkHome = app.getFlinkHome();
        String flinkConfDir = flinkHome + "/conf";
        Configuration flinkConfig = GlobalConfiguration.loadConfiguration(flinkConfDir);
        List<CustomCommandLine> customCommandLines = CliFrontend.loadCustomCommandLines(flinkConfig, flinkConfDir);
        setYarnConfig(flinkConfig);
        String flinkDistJar = getFlinkDistJar(flinkHome);
        if (flinkDistJar != null && !flinkDistJar.isEmpty()) {
            flinkConfig.setString(YarnConfigOptions.FLINK_DIST_JAR.key(), flinkDistJar);
        }
        // If the FLINK_HOME environment variable is set on this machine, the setting below is not needed;
        // otherwise, upload the jars under flink/lib to HDFS and point this option at that HDFS directory.
        //flinkConfig.setString(YarnConfigOptions.PROVIDED_LIB_DIRS.key(), "hdfs://hadoop202:8020/flink-1.14.2/lib");
        log.info("-----------------------------flink config--------------------------");
        flinkConfig.toMap().forEach((k, v) -> log.info("{}:{}", k, v));
        try {
            CliFrontend cli = new CliFrontend(flinkConfig, customCommandLines);
            SecurityUtils.install(new SecurityConfiguration(flinkConfig));
            String[] customArgs = toArgs(app);
            SecurityUtils.getInstalledContext().runSecured(() -> cli.parseAndRun(customArgs));
        } catch (Throwable t) {
            Throwable strippedThrowable = ExceptionUtils.stripException(t, UndeclaredThrowableException.class);
            log.error("Fatal error while running command line interface.", strippedThrowable);
        }
    }
    // locate flink-dist under $FLINK_HOME/lib so the deployment can ship it to YARN
    private String getFlinkDistJar(String flinkHome) {
        File file = new File(flinkHome + "/lib");
        File[] files = file.listFiles();
        if (files != null) {
            for (File ele : files) {
                if (ele.getName().matches("flink-dist.*\\.jar")) {
                    log.info("set flink dist jar:{}", ele.getAbsolutePath());
                    return ele.getAbsolutePath();
                }
            }
        }
        return null;
    }
    // yarn-site.xml is loaded because the YarnClient created during submission needs it:
    // the client has to know the ResourceManager address and port. Keys prefixed with
    // "flink." (e.g. flink.yarn.resourcemanager.hostname) are passed through to the Hadoop config.
    private void setYarnConfig(Configuration conf) {
        if (yarnConf != null) {
            yarnConf.forEach((k, v) -> conf.setString(CONFIG_PREFIX + "." + k, v));
        }
    }
    // build the argument array for CliFrontend.parseAndRun; values must not contain stray spaces
    private String[] toArgs(Application app) {
        String otherArgs = app.getOtherArgs();
        String mainClass = app.getMainClass();
        String[] classArr = mainClass.split("\\.");
        String jobName = classArr[classArr.length - 1];
        String[] argsArr = otherArgs.split(" ");
        int effectiveLength = 0;
        for (String ele : argsArr) {
            if (ele != null && !ele.isEmpty()) {
                effectiveLength++;
            }
        }
        String[] result = new String[14 + effectiveLength];
        if (app.getExecuteMode().equalsIgnoreCase("yarn-application")) {
            result[0] = "run-application";
        } else {
            result[0] = "run";
        }
        result[1] = "-t";
        result[2] = app.getExecuteMode();
        result[3] = "-d";
        result[4] = "-Dyarn.application.name=" + jobName;
        result[5] = "-Dyarn.application.queue=" + app.getQueue();
        result[6] = "-Djobmanager.memory.heap.size=" + app.getMemOfJm();
        result[7] = "-Dtaskmanager.memory.task.heap.size=" + app.getMemOfTm();
        result[8] = "-Dtaskmanager.numberOfTaskSlots=" + app.getNumSlot();
        result[9] = "-p";
        result[10] = String.valueOf(app.getParallelism());
        result[11] = "-c";
        result[12] = mainClass;
        File file = new File(app.getJarPath());
        if (file.exists() && file.isFile()) {
            try {
                result[13] = file.toURI().toURL().toString();
            } catch (MalformedURLException e) {
                throw new RuntimeException(e);
            }
        } else {
            // fail fast instead of leaving a null slot in the argument array
            throw new IllegalArgumentException("job jar not found: " + app.getJarPath());
        }
        log.info("argsArr length:{}", argsArr.length);
        int index = 0;
        for (String arg : argsArr) {
            if (arg != null && !arg.isEmpty()) {
                result[14 + index] = arg;
                index++;
            }
        }
        log.info(Arrays.toString(result));
        return result;
    }
    public static void main(String[] args) {
        Application app = new Application();
        app.setFlinkHome("E:/flink-1.14.2");
        app.setMainClass("com.wilson.WordCount");
        app.setJarPath("E:/project_code/word-count/target/word-count-1.0-SNAPSHOT.jar");
        app.setQueue("default");
        app.setExecuteMode("yarn-application");
        app.setParallelism(1);
        app.setMemOfJm("512m");
        app.setMemOfTm("512m");
        app.setNumSlot(1);
        app.setOtherArgs("--broker hadoop202:9092 --topic number");
        FlinkClient cli = new FlinkClient();
        cli.startJob(app);
    }
}
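The Application bean from com.wilson.bean isn't shown above; here is a minimal sketch reconstructed from the accessors FlinkClient calls (Lombok's @Data generates the getters and setters), so the real project may carry more fields:

package com.wilson.bean;

import lombok.Data;

// Minimal sketch of the job-submission backing bean, inferred from how FlinkClient uses it.
@Data
public class Application {
    private String flinkHome;    // local Flink installation, e.g. E:/flink-1.14.2
    private String mainClass;    // fully qualified entry class of the job jar
    private String jarPath;      // local path of the job jar
    private String queue;        // YARN queue
    private String executeMode;  // yarn-application / yarn-per-job
    private int parallelism;     // -p
    private String memOfJm;      // jobmanager.memory.heap.size, e.g. 512m
    private String memOfTm;      // taskmanager.memory.task.heap.size, e.g. 512m
    private int numSlot;         // taskmanager.numberOfTaskSlots
    private String otherArgs;    // space-separated program arguments
}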
- The rest is the usual Spring Boot plumbing: a small page, a controller that calls the service layer, and a service that injects the FlinkClient and calls its startJob method:
@Service
public class ApplicationServiceImpl implements ApplicationService {

    @Autowired
    FlinkClient flinkClient;

    @Override
    public void startApp(Application app) {
        flinkClient.startJob(app);
    }
}
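The controller isn't shown in the original either; a minimal sketch, assuming the page posts the form as JSON and that ApplicationService lives in com.wilson.service (the endpoint path is an assumption):

package com.wilson.controller;

import com.wilson.bean.Application;
import com.wilson.service.ApplicationService;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.web.bind.annotation.PostMapping;
import org.springframework.web.bind.annotation.RequestBody;
import org.springframework.web.bind.annotation.RestController;

// Hypothetical controller: the page posts the submission form as JSON to /app/start.
@RestController
public class ApplicationController {

    @Autowired
    private ApplicationService applicationService;

    @PostMapping("/app/start")
    public String startApp(@RequestBody Application app) {
        applicationService.startApp(app);
        return "submitted";
    }
}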
- Package the project, upload it to a server, and run java -jar FlinkAdmin-1.0-SNAPSHOT.jar. Note the server requirements: (1) JDK 8; (2) Hadoop and Flink installed, with HADOOP_HOME and FLINK_HOME configured.
- Once the app is up, open http://ip:8082/index.html, fill in the job parameters, and submit.