Flink - checkpoint & savepoint


checkpoint & savepoint


what:
  • Snapshot: a global copy of the data state at some point in time; in Flink this means a globally consistent image of the job's state.
  • Checkpoint: a snapshot taken automatically by Flink and used for failure recovery. Checkpoints are normally not manipulated by users.
  • Savepoint: a snapshot triggered manually (or via an API call) by the user for an operational purpose, such as a stateful redeployment, upgrade, or rescaling. Savepoints are always complete and are optimized for operational flexibility.
why:

To make the application fault tolerant:

1. Allow a Flink job to be restarted while keeping the restart overhead low (state is restored from the snapshot instead of being recomputed).

2. Serve as one of the conditions for exactly-once semantics (each event affects Flink state exactly once); end-to-end exactly-once also requires two-phase commit plus an exactly-once source and an exactly-once sink.
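As a minimal, illustrative sketch of how the checkpoint interval and the exactly-once mode are switched on in the DataStream API (the 60-second interval is an arbitrary example, and EXACTLY_ONCE is already the default mode):

import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// take a checkpoint every 60 s (illustrative interval)
env.enableCheckpointing(60_000);
// each event affects state exactly once within the engine;
// end-to-end exactly-once additionally needs transactional sources/sinks
env.getCheckpointConfig().setCheckpointingMode(CheckpointingMode.EXACTLY_ONCE);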


how:

Flink persists the global state of a job using asynchronous barrier snapshots.

1. Barriers

Barriers are produced on a trigger from the JobManager (the checkpoint coordinator), which tells the source tasks to emit them into the data stream. A barrier can be seen as a special stream element that splits the stream into two parts: old data (records belonging to the current snapshot) and new data (records belonging to the next one).

2. Asynchronous

Snapshots are produced asynchronously, based on copy-on-write.

Copy-on-write: if the in-memory data has not been modified, there is no need to make a copy; it is enough to keep a pointer to the data and synchronize the local data to the state backend through that pointer. If the in-memory data is updated, additional memory is allocated and two versions are maintained: the data as of the snapshot and the updated data.
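To make the copy-on-write idea concrete, here is a purely conceptual sketch (the class and method names are made up for illustration and have nothing to do with Flink's real internals): the snapshot keeps pointing at the old version, and the first write after the snapshot allocates the copy.

// Conceptual sketch of copy-on-write snapshotting; all names are hypothetical.
import java.util.HashMap;
import java.util.Map;

class CopyOnWriteState {
    // current working state, mutated by the processing thread
    private Map<String, Long> current = new HashMap<>();
    // frozen view handed to the asynchronous snapshot writer
    private Map<String, Long> snapshot;

    // called when a barrier arrives: nothing is copied yet,
    // the snapshot simply points at the current version
    synchronized void takeSnapshot() {
        snapshot = current;
    }

    // called by the processing thread: the first write after a snapshot
    // allocates a new copy, so the frozen version stays untouched
    synchronized void put(String key, long value) {
        if (current == snapshot) {
            current = new HashMap<>(current); // copy on first write
        }
        current.put(key, value);
    }

    // called by the async snapshot thread to persist the frozen version
    synchronized Map<String, Long> snapshotView() {
        return snapshot;
    }
}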

Snapshot generation flow

1. The checkpoint coordinator sends the checkpoint command and injects barriers at the sources of the data stream. At this point the sources record the offsets of the Kafka partitions they are currently consuming (and notify the checkpoint coordinator once done).

2. The barriers then travel downstream. When an operator that is neither a source nor a sink receives the barriers, it snapshots its state asynchronously (notifying the checkpoint coordinator once done) and broadcasts the barriers to its downstream operators; the user-facing hook for this step is sketched after this list.

3. When the sink receives the barriers from upstream, there are two ways to handle them, depending on the fault-tolerance guarantee:

1. Exactly-once inside the engine: the sink snapshots its state and notifies the checkpoint coordinator once done. When the coordinator has received the snapshot-complete message from every operator, it sends a checkpoint-complete message to all operators.

2. End-to-end exactly-once: the sink snapshots its state, pre-commits its transaction, and then notifies the checkpoint coordinator. When the coordinator has received the snapshot-complete message from every operator, it sends a checkpoint-complete message to all operators, and the sink then commits its transaction.
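The per-operator snapshot in step 2 is what a user function sees through Flink's CheckpointedFunction interface: snapshotState() runs when the barrier reaches the operator, initializeState() runs on start or restore. A minimal sketch, with an illustrative record counter as the state:

import org.apache.flink.api.common.functions.MapFunction;
import org.apache.flink.api.common.state.ListState;
import org.apache.flink.api.common.state.ListStateDescriptor;
import org.apache.flink.runtime.state.FunctionInitializationContext;
import org.apache.flink.runtime.state.FunctionSnapshotContext;
import org.apache.flink.streaming.api.checkpoint.CheckpointedFunction;

// Illustrative stateful map that counts records and survives restores.
class CountingMap implements MapFunction<String, String>, CheckpointedFunction {

    private transient ListState<Long> checkpointedCount;
    private long count;

    @Override
    public String map(String value) {
        count++;
        return value;
    }

    // called when the operator receives the checkpoint barrier
    @Override
    public void snapshotState(FunctionSnapshotContext ctx) throws Exception {
        checkpointedCount.clear();
        checkpointedCount.add(count);
    }

    // called on job start or when restoring from a checkpoint/savepoint
    @Override
    public void initializeState(FunctionInitializationContext ctx) throws Exception {
        checkpointedCount = ctx.getOperatorStateStore()
                .getListState(new ListStateDescriptor<>("count", Long.class));
        for (Long c : checkpointedCount.get()) {
            count += c;
        }
    }
}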


Barrier alignment: if, say, the barrier from upstream input a arrives first, the operator buffers all subsequent records from that input until the barrier from upstream input b arrives as well, and only then takes the snapshot; during this time both record processing (on the blocked input) and the checkpoint are stalled.
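A conceptual sketch of the alignment logic (the types and method names below are hypothetical, not Flink's actual network-stack code): records from inputs whose barrier has already arrived are buffered, and the snapshot is taken only once the barrier has shown up on every input.

// Conceptual sketch of barrier alignment; all names are hypothetical.
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

class AligningInput {
    private final int numChannels;
    private final Set<Integer> blockedChannels = new HashSet<>();
    private final Queue<Object> bufferedRecords = new ArrayDeque<>();

    AligningInput(int numChannels) { this.numChannels = numChannels; }

    // a record arrived on some input channel
    void onRecord(int channel, Object record) {
        if (blockedChannels.contains(channel)) {
            bufferedRecords.add(record); // barrier already seen on this channel: buffer
        } else {
            process(record);
        }
    }

    // a barrier arrived on some input channel
    void onBarrier(int channel) {
        blockedChannels.add(channel);
        if (blockedChannels.size() == numChannels) {
            snapshotOperatorState();     // all barriers present: take the snapshot
            blockedChannels.clear();     // unblock and drain the buffered records
            while (!bufferedRecords.isEmpty()) {
                process(bufferedRecords.poll());
            }
        }
    }

    void process(Object record) { /* user logic */ }
    void snapshotOperatorState() { /* asynchronous state snapshot */ }
}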


Flink 1.11 introduced unaligned checkpoints (disabled by default): as soon as the barrier from upstream input a arrives, the checkpoint is taken and the barrier is forwarded downstream immediately; records belonging to this snapshot that arrive later on the other inputs are persisted asynchronously as part of the checkpoint.
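If alignment stalls are a problem, unaligned checkpoints can be switched on through the checkpoint config (a minimal sketch; requires Flink 1.11 or later, and the interval value is illustrative):

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(60_000);
// disabled by default; trades larger checkpoint files for shorter alignment stalls
env.getCheckpointConfig().enableUnalignedCheckpoints();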


A stored snapshot consists of two parts:

a metadata file and data files. The data files hold the state image; the metadata file holds pointers to the data files.

Checkpoint file layout (FsStateBackend):

[figure: checkpoint file layout]

Savepoint file layout:

[figure: savepoint file layout]

Differences:

1. Trigger

* Checkpoints are triggered automatically by Flink.
* Savepoints are triggered manually by the user.

2. Lifecycle management

  • Checkpoints are created, owned, and deleted by Flink itself; by default they are removed once the job terminates.
  • Savepoints are created and deleted manually by the user, so they can outlive the job that produced them.

3. Use cases

  • Checkpoints are used for failure recovery (if no checkpoint was taken after a savepoint, Flink falls back to reading the savepoint for recovery).
  • Savepoints are used for changing user logic, A/B experiments, changing parallelism, version upgrades, and so on.

4. Implementation

The current implementations of checkpoints and savepoints are largely built on the same code and produce the same format. There is one exception today, and more differences may be introduced in the future: incremental checkpoints with the RocksDB state backend. They use RocksDB's internal format instead of Flink's native savepoint format, which makes them the first instance of a checkpointing mechanism that is more lightweight than savepoints.
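For reference, incremental checkpoints with the RocksDB state backend are opt-in. One way to enable them, assuming Flink 1.13+ where EmbeddedRocksDBStateBackend is available (the checkpoint path is illustrative; older versions use RocksDBStateBackend or the state.backend.incremental option instead):

import org.apache.flink.contrib.streaming.state.EmbeddedRocksDBStateBackend;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// true = incremental checkpoints, stored in RocksDB's internal SST format
env.setStateBackend(new EmbeddedRocksDBStateBackend(true));
// illustrative checkpoint location
env.getCheckpointConfig().setCheckpointStorage("hdfs:///flink/checkpoints");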

Configuration

checkpoint

import org.apache.flink.streaming.api.environment.CheckpointConfig.ExternalizedCheckpointCleanup;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// take a checkpoint every 1000 ms
env.enableCheckpointing(1000);
// delete externalized checkpoints when the job is cancelled
env.getCheckpointConfig().enableExternalizedCheckpoints(ExternalizedCheckpointCleanup.DELETE_ON_CANCELLATION);
env.getCheckpointConfig().setCheckpointStorage("file:///...");
// maximum number of concurrent checkpoints, default 1:
env.getCheckpointConfig().setMaxConcurrentCheckpoints(1);
// minimum pause between two checkpoints, default 0 (500 ms below is illustrative);
// a non-zero pause also limits the job to one in-flight checkpoint at a time,
// so it cannot be combined with a larger max-concurrent-checkpoints setting:
env.getCheckpointConfig().setMinPauseBetweenCheckpoints(500);

# Configuration (flink-conf.yaml)
state.checkpoints.dir

# Resume from a checkpoint
bin/flink run -s :checkpointMetaDataPath [:runArgs]

savepoint

# Configuration (flink-conf.yaml)
state.savepoints.dir

# Trigger a savepoint
bin/flink savepoint :jobId [:targetDirectory]
bin/flink savepoint :jobId [:targetDirectory] -yid :yarnAppId

# Trigger a savepoint and cancel the job
bin/flink cancel -s [:targetDirectory] :jobId

# Resume from a savepoint
bin/flink run -s :savepointPath [:runArgs]
# -n (allowNonRestoredState): allow skipping state that belongs to operators removed from the job

# Delete a savepoint
bin/flink savepoint -d :savepointPath


www.ververica.com/blog/differ…

  1. Objective: Conceptually, Flink's Savepoints are different from Checkpoints in a similar way that backups are different from recovery logs in traditional database systems. Checkpoints’ primary objective is to act as a recovery mechanism in Apache Flink ensuring a fault-tolerant processing framework that can recover from potential job failures. Conversely, Savepoints’ primary goal is to act as the way to restart, continue or reopen a paused application after a manual backup and resume activity by the user.
  2. Implementation: Checkpoints and Savepoints differ in their implementation. Checkpoints are designed to be lightweight and fast. They might (but don’t necessarily have to) make use of different features of the underlying state backend and try to restore data as fast as possible. As an example, incremental Checkpoints with the RocksDB State backend use RocksDB’s internal format instead of Flink’s native format. This is used to speed up the checkpointing process of RocksDB that makes them the first instance of a more lightweight Checkpointing mechanism. On the contrary, Savepoints are designed to focus more on the portability of data and support any changes made to the job that make them slightly more expensive to produce and restore.
  3. Lifecycle: Checkpoints are automatic and periodic in nature. They are owned, created and dropped automatically and periodically by Flink, without any user interaction, to ensure full recovery in case of an unexpected job failure. On the contrary, Savepoints are owned and managed (i.e. they are scheduled, created, and deleted) manually by the user.
