Flink Installation and Deployment

Standalone on Docker

Command line:

FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager"
docker network create flink-network

# Start the JobManager
docker run \
    --rm \
    --name=jobmanager \
    --network flink-network \
    --publish 8081:8081 \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink:1.13.0-scala_2.11 jobmanager
# Start the TaskManager
docker run \
    --rm \
    --name=taskmanager \
    --network flink-network \
    --env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
    flink:1.13.0-scala_2.11 taskmanager

Using docker-compose.yml

version: "2.2"
services:
  jobmanager:
    image: flink:1.13.0-scala_2.11
    ports:
      - "8081:8081"
    command: jobmanager
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager

  taskmanager:
    image: flink:1.13.0-scala_2.11
    depends_on:
      - jobmanager
    command: taskmanager
    scale: 1
    environment:
      - |
        FLINK_PROPERTIES=
        jobmanager.rpc.address: jobmanager
        taskmanager.numberOfTaskSlots: 2

In the directory containing docker-compose.yml, run:

docker-compose up

Once the containers are up, the web UI is available at localhost:8081.

The cluster now shows one JobManager and one TaskManager running.
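Both the docker run and the Compose variants pass configuration through the FLINK_PROPERTIES environment variable; the official image's entrypoint appends those lines to conf/flink-conf.yaml inside the container. A rough Python sketch of that key/value mapping (illustrative only, not the entrypoint's actual code):

```python
# Sketch: how a multi-line FLINK_PROPERTIES value maps to flink-conf.yaml
# entries. This mimics the idea only; the real entrypoint is a shell script.
def parse_flink_properties(raw: str) -> dict:
    props = {}
    for line in raw.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank lines and anything that is not "key: value"
        key, value = line.split(":", 1)
        props[key.strip()] = value.strip()
    return props

raw = """
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 2
"""
print(parse_flink_properties(raw))
# → {'jobmanager.rpc.address': 'jobmanager', 'taskmanager.numberOfTaskSlots': '2'}
```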

Download the Flink distribution: flink.apache.org/downloads.h…
Submit the example job:

./bin/flink run ./examples/streaming/TopSpeedWindowing.jar

You can see the running job under Running Jobs in the web UI.

Local

Download the Flink distribution: flink.apache.org/downloads.h…
In the Flink directory:

# Start the cluster
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh
# Check the JVM processes
jps
StandaloneSessionClusterEntrypoint
TaskManagerRunner

Standalone

  • node01 acts as the JobManager
  • node02/node03 act as TaskManagers
Edit conf/flink-conf.yaml:
jobmanager.rpc.address: node01

Edit conf/masters:
node01:8081

Edit conf/workers:
node02
node03

Start and stop the cluster from node01, using the same start-cluster.sh / stop-cluster.sh scripts as above.

HA Deployment

HA means deploying multiple JobManagers so the cluster can tolerate a JobManager failure. With ZooKeeper-based HA, Flink:

  • persists job metadata to ZooKeeper
  • persists snapshot metadata to ZooKeeper
  • creates global checkpoints and persists references to them in ZooKeeper (the data itself goes to high-availability.storageDir)
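These pieces hang together through ZooKeeper's leader election: every JobManager registers itself, one is elected leader, and when the leader's ephemeral node vanishes a standby takes over. A toy Python model of the sequential-node convention (conceptual only; Flink actually uses Curator's election recipes on top of the quorum configured below):

```python
# Toy model of ZooKeeper-style leader election: each candidate gets an
# ephemeral sequential node; the lowest sequence number is the leader.
import itertools

_seq = itertools.count()

def register(candidates: dict, name: str) -> None:
    candidates[name] = next(_seq)  # assign the next sequence number

def leader(candidates: dict) -> str:
    return min(candidates, key=candidates.get)

jms = {}
register(jms, "jobmanager-1")
register(jms, "jobmanager-2")
print(leader(jms))         # → jobmanager-1
del jms["jobmanager-1"]    # leader dies: its ephemeral node disappears
print(leader(jms))         # → jobmanager-2 takes over
```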
  1. Edit conf/flink-conf.yaml:
 high-availability: zookeeper
 high-availability.cluster-id: custom-id
 high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
 high-availability.zookeeper.path.root: /flink
 high-availability.storageDir: hdfs:///flink/recovery
  2. Edit conf/masters:
node01:8081
node02:8082
  3. Start/stop the cluster:
# Start the cluster
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh

Flink on Yarn

  1. Configure the Hadoop environment variable:
export HADOOP_CLASSPATH=$(hadoop classpath)
  2. Session Mode
# Start a session cluster
./bin/yarn-session.sh
# Submit a job
./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
# Submit a job to a specific session by application id
./bin/flink run -t yarn-session -Dyarn.application.id=application_XXXX_YY ./examples/streaming/TopSpeedWindowing.jar
# Stop the session cluster
echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
  3. Per-Job Mode
# Submit a job
./bin/flink run -t yarn-per-job --detached ./examples/streaming/TopSpeedWindowing.jar
# List all jobs in the cluster
./bin/flink list -t yarn-per-job -Dyarn.application.id=application_XXXX_YY
# Cancel a job
./bin/flink cancel -t yarn-per-job -Dyarn.application.id=application_XXXX_YY <jobId>
  4. Application Mode
# Submit a job
./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar

# List all jobs in the cluster
./bin/flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY
# Cancel a job
./bin/flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>
# Submit a job with the jars pre-uploaded to HDFS, so every node can fetch them
./bin/flink run-application -t yarn-application \
	-Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
	hdfs://myhdfs/jars/my-application.jar

HA on Yarn

  1. Edit the yarn-site.xml configuration:
<property>
    <name>yarn.resourcemanager.am.max-attempts</name>
    <value>4</value>
    <description>Maximum number of ApplicationMaster start attempts</description>
</property>
  2. Edit conf/flink-conf.yaml:
 high-availability: zookeeper
 high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
 high-availability.zookeeper.path.root: /flink
 high-availability.storageDir: hdfs:///flink/recovery
 yarn.application-attempts: 10
  3. Start the cluster:
# Start the session cluster (the old -n flag for a fixed TaskManager count was removed in Flink 1.10+)
./bin/yarn-session.sh
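The two attempt settings interact: as the Flink docs describe it, YARN's yarn.resourcemanager.am.max-attempts is an upper bound on Flink's yarn.application-attempts, so the effective restart budget is the smaller of the two. In short:

```python
# The effective number of JobManager (ApplicationMaster) restarts on YARN
# is capped by the cluster-wide YARN limit.
def effective_attempts(flink_attempts: int, yarn_max_attempts: int) -> int:
    return min(flink_attempts, yarn_max_attempts)

# With the values from the configs above (10 in flink-conf.yaml, 4 in yarn-site.xml):
print(effective_attempts(10, 4))  # → 4: the YARN cluster limit wins
```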

flink run -h

Option reference:

  • -c — the entry point class (the one with the main method)
  • -C — additional classpath URLs
  • -d,--detached — run in detached mode
  • -p,--parallelism — job parallelism
  • -D — dynamic configuration options, see ci.apache.org/projects/fl…
  • -t — execution target: "remote", "local", "kubernetes-session", "yarn-per-job", "yarn-session"

yarn-session.sh -h

Run in IDE

  1. Create a Maven Project
    • Add the new archetype 'org.apache.flink:flink-walkthrough-datastream-java:1.13.0'
    • Create a new project from this archetype
  2. Add the test code
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {
    public static void main(String[] args) throws Exception {

        String hostname = "localhost";
        Integer port = 9999;

        // env
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        //socket source
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);
        
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream.flatMap(new WordSplit())
                .keyBy(0)
                .sum(1);

        sum.print();

        //execute
        env.execute("SocketTextStreamWordCount");
    }


    public static final class WordSplit implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            // this example treats each input line as comma-separated words
            String[] words = s.toLowerCase().split(",");

            for (String word: words) {
                collector.collect(new Tuple2<String, Integer>(word, 1));
            }
        }
    }
}
  3. Test with a socket:
nc -l 9999
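Typing comma-separated words into the nc session makes the job print running counts, because a keyed sum emits an updated total for every incoming record. The semantics, modeled in plain Python (a sketch of what flatMap + keyBy + sum compute, not Flink API code):

```python
# Plain-Python model of the WordCount pipeline's behavior: split each
# line on commas, then emit a running count per word.
from collections import defaultdict

counts = defaultdict(int)  # plays the role of Flink's keyed state

def process(line: str) -> list:
    out = []
    for word in line.lower().split(","):
        counts[word] += 1
        out.append((word, counts[word]))  # one updated total per input word
    return out

print(process("hello,flink"))  # → [('hello', 1), ('flink', 1)]
print(process("hello,world"))  # → [('hello', 2), ('world', 1)]
```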