Standalone on Docker
Command line:
FLINK_PROPERTIES="jobmanager.rpc.address: jobmanager"
docker network create flink-network
# Start the JobManager
docker run \
--rm \
--name=jobmanager \
--network flink-network \
--publish 8081:8081 \
--env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
flink:1.13.0-scala_2.11 jobmanager
# Start the TaskManager
docker run \
--rm \
--name=taskmanager \
--network flink-network \
--env FLINK_PROPERTIES="${FLINK_PROPERTIES}" \
flink:1.13.0-scala_2.11 taskmanager
Using docker-compose.yml
version: "2.2"
services:
jobmanager:
image: flink:1.13.0-scala_2.11
ports:
- "8081:8081"
command: jobmanager
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager:
image: flink:1.13.0-scala_2.11
depends_on:
- jobmanager
command: taskmanager
scale: 1
environment:
- |
FLINK_PROPERTIES=
jobmanager.rpc.address: jobmanager
taskmanager.numberOfTaskSlots: 2
In the directory containing docker-compose.yml, run:
docker-compose up
After startup, the Flink web UI is available at localhost:8081; the cluster has one JobManager and one TaskManager running.
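The compose file pins scale: 1 for the taskmanager service. Assuming your docker-compose version supports the --scale flag, more TaskManagers can be started without editing the file:
docker-compose up --scale taskmanager=2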
Download the Flink distribution: flink.apache.org/downloads.h…
Submit the example job:
./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
You can see the job under Running Jobs in the web UI.
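If the downloaded client does not already point at the Docker cluster, the JobManager address can be passed explicitly with -m, using the 8081 port published above:
./bin/flink run -m localhost:8081 ./examples/streaming/TopSpeedWindowing.jar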
Local
Download the Flink distribution: flink.apache.org/downloads.h…
In the Flink directory:
# Start the cluster
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh
# Check the JVM processes
jps
StandaloneSessionClusterEntrypoint
TaskManagerRunner
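With the local cluster running, the web UI is served on localhost:8081 and the bundled example job can be submitted the same way as before:
./bin/flink run ./examples/streaming/TopSpeedWindowing.jar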
Standalone
- node01 acts as the JobManager
- node02/node03 act as TaskManagers
Edit conf/flink-conf.yaml
jobmanager.rpc.address: node01
Edit conf/masters
node01:8081
Edit conf/workers
node02
node03
Start and stop the cluster on node01 with the same scripts as in local mode (see below).
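# Start the cluster (run on node01)
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh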
HA Deployment
HA means deploying multiple JobManagers so that the cluster can tolerate a JobManager failure. With ZooKeeper-based HA, Flink:
- persists submitted jobs (job graphs) to ZooKeeper
- persists snapshot metadata to ZooKeeper
- creates global checkpoints and persists them to ZooKeeper
- Edit conf/flink-conf.yaml
high-availability: zookeeper
high-availability.cluster-id: custom-id
high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
high-availability.zookeeper.path.root: /flink
high-availability.storageDir: hdfs:///flink/recovery
- Edit conf/masters
node01:8081
node02:8082
- Start/stop
# Start the cluster
./bin/start-cluster.sh
# Stop the cluster
./bin/stop-cluster.sh
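ZooKeeper must be running before the HA cluster is started. If no external quorum is available, the Flink distribution ships a bootstrap helper configured via conf/zoo.cfg (script name assumed from the standard distribution layout):
# Optional: start the bundled bootstrap ZooKeeper quorum
./bin/start-zookeeper-quorum.sh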
Flink on Yarn
- Configure the Hadoop classpath environment variable
export HADOOP_CLASSPATH=`hadoop classpath`
- Session Mode
# Start a session cluster
./bin/yarn-session.sh
# Submit a job
./bin/flink run ./examples/streaming/TopSpeedWindowing.jar
# Submit a job to a specific session cluster by application id
./bin/flink run -t yarn-session -Dyarn.application.id=application_XXXX_YY ./examples/streaming/TopSpeedWindowing.jar
# Stop the session cluster
echo "stop" | ./bin/yarn-session.sh -id application_XXXXX_XXX
- Per-Job Mode
# Submit a job
./bin/flink run -t yarn-per-job --detached ./examples/streaming/TopSpeedWindowing.jar
# List all jobs in the cluster
./bin/flink list -t yarn-per-job -Dyarn.application.id=application_XXXX_YY
# Cancel a job
./bin/flink cancel -t yarn-per-job -Dyarn.application.id=application_XXXX_YY <jobId>
- Application Mode
# Submit a job
./bin/flink run-application -t yarn-application ./examples/streaming/TopSpeedWindowing.jar
# List all jobs in the cluster
./bin/flink list -t yarn-application -Dyarn.application.id=application_XXXX_YY
# Cancel a job
./bin/flink cancel -t yarn-application -Dyarn.application.id=application_XXXX_YY <jobId>
# Submit a job with the jars pre-uploaded to HDFS so that every node can fetch them
./bin/flink run-application -t yarn-application \
-Dyarn.provided.lib.dirs="hdfs://myhdfs/my-remote-flink-dist-dir" \
hdfs://myhdfs/jars/my-application.jar
HA on Yarn
- Edit yarn-site.xml
<property>
<name>yarn.resourcemanager.am.max-attempts</name>
<value>4</value>
<description>Maximum number of ApplicationMaster attempts</description>
</property>
- Edit conf/flink-conf.yaml
high-availability: zookeeper
high-availability.zookeeper.quorum: node01:2181,node02:2181,node03:2181
high-availability.zookeeper.path.root: /flink
high-availability.storageDir: hdfs:///flink/recovery
yarn.application-attempts: 10
- Start the cluster
# Start the session cluster
./bin/yarn-session.sh -n 2
flink run -h
| Option | Description |
|---|---|
| -c,--class | Class containing the main method (entry point) |
| -C,--classpath | Add a URL to the user classpath |
| -d,--detached | Run in detached mode |
| -p,--parallelism | Default parallelism of the program |
| -D | Set a configuration option (see ci.apache.org/projects/fl…) |
| -t,--target | Deployment target: "remote", "local", "kubernetes-session", "yarn-per-job", "yarn-session" |
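A combined invocation might look like this (the entry class com.example.MyJob and the jar path are placeholders):
# Run detached with parallelism 4 and an explicit entry class
./bin/flink run -d -p 4 -c com.example.MyJob /path/to/my-job.jar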
yarn-session.sh -h lists the corresponding options for session clusters.
Run in IDE
- Create a Maven Project
- Add new Archetype 'org.apache.flink:flink-walkthrough-datastream-java:1.13.0'
- Create new project with this archetype
- Add the test code
import org.apache.flink.api.common.functions.FlatMapFunction;
import org.apache.flink.api.java.tuple.Tuple2;
import org.apache.flink.streaming.api.datastream.DataStreamSource;
import org.apache.flink.streaming.api.datastream.SingleOutputStreamOperator;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.util.Collector;

public class WordCount {

    public static void main(String[] args) throws Exception {
        String hostname = "localhost";
        int port = 9999;

        // Set up the streaming execution environment
        final StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();

        // Socket source: one line of comma-separated words per record
        DataStreamSource<String> stream = env.socketTextStream(hostname, port);

        // Split each line into (word, 1) pairs, key by the word and sum the counts
        SingleOutputStreamOperator<Tuple2<String, Integer>> sum = stream
                .flatMap(new WordSplit())
                .keyBy(0)
                .sum(1);
        sum.print();

        // Trigger job execution
        env.execute("SocketTextStreamWordCount");
    }

    public static final class WordSplit implements FlatMapFunction<String, Tuple2<String, Integer>> {
        @Override
        public void flatMap(String s, Collector<Tuple2<String, Integer>> collector) {
            String[] words = s.toLowerCase().split(",");
            for (String word : words) {
                collector.collect(new Tuple2<>(word, 1));
            }
        }
    }
}
- Test with a socket source:
nc -l 9999
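Typing a comma-separated line such as hello,flink,hello into the nc session should print running counts in the IDE console, roughly like the following (with parallelism greater than 1, each line is prefixed with the printing subtask index):
(hello,1)
(flink,1)
(hello,2)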