This is day 3 of my participation in the note-writing activity of the "4th Youth Training Camp".
1. DataStream and Dynamic Tables
Relational Queries on Data Streams

| Relational Algebra / SQL | Stream Processing |
| --- | --- |
| Relations (or tables) are bounded (multi-)sets of tuples. | A stream is an infinite sequence of tuples. |
| A query that is executed on batch data (e.g., a table in a relational database) has access to the complete input data. | A streaming query cannot access all data when it is started and has to "wait" for data to be streamed in. |
| A batch query terminates after it has produced a fixed-sized result. | A streaming query continuously updates its result based on the received records and never completes. |
Dynamic Tables & Continuous Queries
- Dynamic tables are the core concept of Flink’s Table API and SQL support for streaming data. In contrast to the static tables that represent batch data, dynamic tables change over time. But just like static batch tables, systems can execute queries over dynamic tables. Querying dynamic tables yields a Continuous Query. A continuous query never terminates and produces dynamic results - another dynamic table. The query continuously updates its (dynamic) result table to reflect changes on its (dynamic) input tables.
- The relationship of streams, dynamic tables, and continuous queries:
- A stream is converted into a dynamic table.
- A continuous query is evaluated on the dynamic table yielding a new dynamic table.
- The resulting dynamic table is converted back into a stream.
Defining a Table on a Stream
- Processing a stream with a relational query requires converting it into a Table. Conceptually, each record of the stream is interpreted as an `INSERT` modification on the resulting table, i.e., we build the table from an `INSERT`-only changelog stream.
Continuous Queries
- A continuous query never terminates
- A continuous query updates its result table according to its input tables’ updates
- Although the two example queries appear to be quite similar (both compute a grouped count aggregate), they differ in one crucial aspect:
  - The first query updates previously emitted results, i.e., the changelog stream that defines the result table contains `INSERT` and `UPDATE` changes.
  - The second query only appends to the result table, i.e., the result table's changelog stream only consists of `INSERT` changes.
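The difference between an updating result and an append-only result can be made concrete with a toy changelog. The sketch below (plain Python, not Flink API; the user/click data is made up for illustration) simulates a continuous grouped-count query and records whether each emitted change is an `INSERT` or an `UPDATE`:

```python
from collections import defaultdict

def grouped_count_changelog(stream):
    """Simulate a continuous `SELECT user, COUNT(*) ... GROUP BY user`.

    The first record for a key INSERTs a result row; every later record
    for the same key UPDATEs the previously emitted row, so the
    changelog stream contains both INSERT and UPDATE changes.
    """
    counts = defaultdict(int)
    changelog = []
    for user in stream:
        counts[user] += 1
        kind = "INSERT" if counts[user] == 1 else "UPDATE"
        changelog.append((kind, user, counts[user]))
    return changelog

log = grouped_count_changelog(["alice", "bob", "alice"])
# → [('INSERT', 'alice', 1), ('INSERT', 'bob', 1), ('UPDATE', 'alice', 2)]
```

An append-only query (e.g., a windowed count that only emits final results per window) would produce a changelog consisting solely of `INSERT` entries.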
- Table to Stream Conversion
- When converting a dynamic table into a stream or writing it to an external system, these changes need to be encoded. Flink's Table API and SQL support three ways to encode the changes of a dynamic table:
  - Append-only stream: A dynamic table that is only modified by `INSERT` changes can be converted into a stream by emitting the inserted rows.
  - Retract stream: A retract stream is a stream with two types of messages, add messages and retract messages. A dynamic table is converted into a retract stream by encoding an `INSERT` change as an add message, a `DELETE` change as a retract message, and an `UPDATE` change as a retract message for the updated (previous) row plus an add message for the updating (new) row. The following figure visualizes the conversion of a dynamic table into a retract stream.
  - Upsert stream: An upsert stream is a stream with two types of messages, upsert messages and delete messages. A dynamic table that is converted into an upsert stream requires a (possibly composite) unique key. A dynamic table with a unique key is transformed into a stream by encoding `INSERT` and `UPDATE` changes as upsert messages and `DELETE` changes as delete messages. The consuming operator needs to be aware of the unique key attribute to apply the messages correctly. The main difference to a retract stream is that `UPDATE` changes are encoded with a single message and are hence more efficient. The following figure visualizes the conversion of a dynamic table into an upsert stream.
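The key contrast between the retract and upsert encodings is how an `UPDATE` is represented. A minimal sketch (plain Python; the tuple shapes and message tags are illustrative, not Flink's wire format):

```python
def encode_retract(change):
    """Encode one changelog entry as retract-stream messages.

    `change` is (kind, key, old_value, new_value).
    """
    kind, key, old, new = change
    if kind == "INSERT":
        return [("+", key, new)]               # add message
    if kind == "DELETE":
        return [("-", key, old)]               # retract message
    # UPDATE: retract the previous row, then add the new row (two messages)
    return [("-", key, old), ("+", key, new)]

def encode_upsert(change):
    """Encode one changelog entry as upsert-stream messages.

    Requires a unique key; the consumer applies messages by that key.
    """
    kind, key, old, new = change
    if kind == "DELETE":
        return [("DEL", key)]
    # INSERT and UPDATE both collapse into a single upsert message
    return [("UPSERT", key, new)]

update = ("UPDATE", "alice", 1, 2)
encode_retract(update)  # → [('-', 'alice', 1), ('+', 'alice', 2)]  two messages
encode_upsert(update)   # → [('UPSERT', 'alice', 2)]                one message
```

This shows why the upsert encoding is more efficient for updates: one message instead of a retract/add pair, at the cost of requiring a unique key.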
2. Exactly-once and Checkpoints
- In typical stream processing programs, there are three types of processing semantics:
  - At-most-once: a message is consumed and processed at most once, regardless of whether subsequent processing succeeds, so data may be lost.
  - At-least-once: a message may be processed multiple times between consumption and eventual successful processing, so duplicates are possible.
  - Exactly-once: a message takes effect exactly once from consumption through successful processing. This is the strictest processing semantics.
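A small sketch of why the semantics differ in results (plain Python; the message ids and values are made up). Under at-least-once, a redelivered message is counted twice; deduplicating by message id recovers an exactly-once effect:

```python
def sum_with_semantics(messages):
    """Sum (id, value) messages under two semantics.

    `messages` may contain redeliveries (same id appearing twice),
    as happens under at-least-once delivery after a failure.
    """
    at_least_once = 0
    exactly_once = 0
    seen = set()
    for msg_id, value in messages:
        at_least_once += value        # counts every delivery, duplicates included
        if msg_id not in seen:        # idempotent handling: skip replays
            seen.add(msg_id)
            exactly_once += value
    return at_least_once, exactly_once

# message id=1 is redelivered after a failure
msgs = [(1, 10), (2, 5), (1, 10)]
sum_with_semantics(msgs)  # → (25, 15)
```

At-most-once would instead risk dropping a message entirely; Flink achieves exactly-once state updates via checkpoints rather than per-message deduplication, as described next.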
- State Snapshot and Recovery
- Chandy-Lamport Algorithm (Distributed Snapshots)
  - Start of snapshot creation
    - Each source operator receives a Checkpoint Barrier from the JobManager (JM), which marks the start of state snapshot production.
  - Processing of source operators
    - After each source saves its own state, it forwards the Checkpoint Barrier to all connected downstream operators and informs JM that its own snapshot is complete.
  - Barrier alignment
    - A downstream operator waits for the barriers from all of its upstream channels to arrive before taking its snapshot.
    - Upstream operators that have already taken their snapshots continue to process data and are not blocked while downstream operators produce theirs.
    - Snapshot production is thus decoupled from data processing.
  - End of checkpoint
    - Once all operators have informed JM that their state snapshots are complete, the whole checkpoint is finished.
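Barrier alignment can be sketched for a single operator with several input channels (a simplified toy model, not Flink's implementation; it assumes every channel eventually carries exactly one barrier, marked `"B"`). Records arriving before a channel's barrier belong to the snapshot; a channel whose barrier has arrived is blocked until all channels align:

```python
def align_barriers(channels):
    """Toy barrier alignment for one operator.

    `channels` is a list of input channels, each a list of records
    with "B" marking the checkpoint barrier. Returns the records that
    belong to the snapshot and the records processed after it.
    """
    positions = [0] * len(channels)
    barrier_seen = [False] * len(channels)
    processed_before = []
    while not all(barrier_seen):
        for i, ch in enumerate(channels):
            if barrier_seen[i] or positions[i] >= len(ch):
                continue              # channel blocked until alignment completes
            item = ch[positions[i]]
            positions[i] += 1
            if item == "B":
                barrier_seen[i] = True
            else:
                processed_before.append(item)
    snapshot = list(processed_before)  # state captured at alignment
    leftover = [item for i, ch in enumerate(channels)
                for item in ch[positions[i]:]]
    return snapshot, leftover

# channel 0 delivers its barrier early, so "a2" must wait for channel 1
align_barriers([["a1", "B", "a2"], ["b1", "b2", "B"]])
# → (['a1', 'b1', 'b2'], ['a2'])
```

Records that arrive after a channel's barrier ("a2" above) are the ones Flink buffers during alignment; they belong to the next checkpoint interval.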
- The impact of Checkpoint on job performance
- Snapshot creation is decoupled from data processing: each operator can continue processing data as soon as it completes its own state snapshot, without waiting for downstream operators to finish theirs.
- The need to pause data processing during snapshot creation and barrier alignment still increases processing latency.
- Saving snapshots to a remote server can also be time-consuming.
3. Flink End-to-End Exactly-once Semantics
- End-to-End Exactly-once Semantics
- Checkpoints ensure that each piece of data updates each stateful operator exactly once, but the sink operator may still emit duplicate data to the external system (for example, when a job restarts from a checkpoint and replays records that were already written out).
- Strict end-to-end exactly-once semantics therefore require a special sink operator implementation.
- Two phase commit protocol (2PC)
- In a distributed system where multiple nodes participate in execution, a central node is introduced to coordinate all nodes so that they commit or roll back a transactional operation together. This central node is called the coordinator, and the other nodes it schedules are called participants.
- In a "normal execution" of any single distributed transaction (i.e., when no failure occurs, which is typically the most frequent situation), the protocol consists of two phases:
- The commit-request phase (or voting phase):
  - The coordinator sends a commit-request (prepare) message to all participants.
  - Each participant receives the message and executes the transaction up to the point of commit, but does not actually commit it.
  - If the transaction completes successfully, the participant sends a success message (votes "yes"); if the transaction fails, it sends a failure message (votes "no").
- The commit phase:
- If all participants voted "yes":
- The coordinator sends a commit message to all the participants.
- Each participant completes the operation, and releases all the locks and resources held during the transaction.
- Each participant sends an acknowledgement to the coordinator.
- The coordinator completes the transaction when all acknowledgements have been received.
- Otherwise (any participant votes "no", or a timeout occurs while waiting for votes):
- The coordinator sends a rollback message to all the participants
- Each participant undoes the transaction using the undo log, and releases the resources and locks held during the transaction.
- Each participant sends an acknowledgement to the coordinator.
- The coordinator undoes the transaction when all acknowledgements have been received.
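The two phases above can be sketched as a minimal coordinator loop (plain Python, a toy model: each participant is just a function that prepares its part of the transaction and returns its vote; real 2PC also involves logging, acknowledgements, and timeouts):

```python
def two_phase_commit(participants):
    """Minimal 2PC coordinator sketch.

    Phase 1 (voting): ask every participant to prepare; collect votes.
    Phase 2 (commit): commit only if every vote is yes, otherwise
    tell every participant to roll back.
    """
    votes = [prepare() for prepare in participants]   # commit-request phase
    decision = "COMMIT" if all(votes) else "ROLLBACK" # coordinator's decision
    # In a real protocol the decision is sent to each participant,
    # which then commits or undoes the transaction and acknowledges.
    messages = [(i, decision) for i in range(len(participants))]
    return decision, messages

ok = lambda: True      # participant that prepares successfully (votes yes)
fail = lambda: False   # participant whose prepare fails (votes no)

two_phase_commit([ok, ok])    # → ('COMMIT', [(0, 'COMMIT'), (1, 'COMMIT')])
two_phase_commit([ok, fail])  # → ('ROLLBACK', [(0, 'ROLLBACK'), (1, 'ROLLBACK')])
```

Note that a single "no" vote (or a missing vote, on timeout) forces a global rollback; this all-or-nothing decision is what gives 2PC its atomicity, and also what causes the blocking behavior discussed below.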
- 2PC in Flink
- 2PC in Flink Summary
- Transaction start: before the sink task writes data downstream, it starts a transaction. All subsequent writes are executed within this transaction, and until the transaction is committed, the data it has written is not readable downstream.
- Commit-request phase: the JobManager starts sending out Checkpoint Barriers. When each operator receives the barrier, it stops processing subsequent data and takes a snapshot of its current state; the sink likewise stops processing data under the current transaction. If the snapshot succeeds, the operator sends a success message to JM; if it fails, it sends a failure message.
- Commit phase: if JM receives success messages from all operators, it sends a message to all operators (including the sink) that the transaction can be committed. When the sink receives this message, it completes the commit, and the data written in this transaction becomes readable downstream. If JM receives any failure message, it notifies all operators to roll back the transaction, and the sink discards the data written in that transaction.
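The sink's lifecycle can be sketched as a small state machine (plain Python, loosely modeled on the beginTransaction / preCommit / commit / abort hooks of Flink's `TwoPhaseCommitSinkFunction`; this is a simplified illustration, not the actual Flink API):

```python
class TransactionalSink:
    """Toy two-phase-commit sink.

    Data moves through three buckets: written in the open transaction
    (invisible downstream), pre-committed (awaiting JM's decision),
    and committed (readable downstream).
    """
    def __init__(self):
        self.open_txn = []     # written in the current transaction
        self.pending = None    # pre-committed, awaiting JM's decision
        self.committed = []    # readable downstream

    def write(self, record):
        self.open_txn.append(record)  # not yet readable downstream

    def pre_commit(self):
        """Called when the checkpoint barrier reaches the sink."""
        self.pending = self.open_txn
        self.open_txn = []            # a new transaction begins

    def commit(self):
        """Called when JM reports the checkpoint completed."""
        self.committed.extend(self.pending)
        self.pending = None

    def abort(self):
        """Called when any operator failed its snapshot."""
        self.pending = None           # discard the transaction's data

sink = TransactionalSink()
sink.write("r1"); sink.write("r2")
sink.pre_commit()   # barrier arrives: state snapshotted, data pre-committed
sink.commit()       # JM confirms: data becomes readable
sink.committed      # → ['r1', 'r2']
```

The important property is that records only ever reach `committed` via a successful checkpoint; on failure, `abort()` drops the pending data, so downstream never observes partial or duplicated output.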
- Disadvantages:
- If the coordinator fails permanently, some participants may never resolve their transactions: after a participant has sent its vote to the coordinator, it blocks until a commit or rollback message is received.
- Synchronous blocking: participants in 2PC are blocking. Resources are locked upon receipt of the request in the first phase and are not released until after the commit.
- Data inconsistency: if the coordinator crashes during the second (commit) phase, some participants may receive the commit request while others do not, resulting in inconsistent data.