Flink Architecture | Youth Training Camp Notes


This is day 2 of my participation in the "4th Youth Training Camp" note-creation activity.

1. Flink Intro

  • Flink is a distributed stream processing framework that enables efficient processing of bounded and unbounded data streams. The core of Flink is stream processing, but it also supports batch processing: Flink treats batch processing as a special case of stream processing, i.e., a data stream with clear boundaries.
    • Unbounded streams have a start but no defined end. They do not terminate and provide data as it is generated. Unbounded streams must be continuously processed, i.e., events must be promptly handled after they have been ingested.
    • Bounded streams have a defined start and end. They can be processed by ingesting all data before performing any computations. Processing of bounded streams is also known as batch processing.
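The difference between the two processing models can be sketched in plain Python (this is an illustrative simulation, not Flink code): a bounded stream can be fully materialized before computing, while an unbounded stream must be processed incrementally, emitting results as events arrive.

```python
# Illustrative sketch (plain Python, not Flink): the same summation logic
# applied to a bounded stream (all data available up front) and an
# unbounded stream (results emitted continuously as events arrive).

def batch_sum(bounded):
    # Bounded: ingest everything first, then compute once.
    data = list(bounded)
    return sum(data)

def streaming_sums(unbounded):
    # Unbounded: emit an updated result after every event.
    total = 0
    for event in unbounded:
        total += event
        yield total

print(batch_sum([1, 2, 3]))             # 6
print(list(streaming_sums([1, 2, 3])))  # [1, 3, 6] -- one result per event
```

Note that the batch result equals the final streaming result, which is exactly why Flink can treat batch processing as a special case of stream processing.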

2. Flink Layered Architecture

  • Layered architecture design
      • API & Libraries Layer provides the programming APIs and top-level libraries
        • Programming API: DataStream API for stream processing and DataSet API for batch processing.
        • Top-level libraries: This includes the CEP library for complex event processing, the SQL & Table library for structured data queries, the batch-based machine learning library FlinkML, and the graph processing library Gelly.
      • Runtime Layer
        • The core layer, responsible for converting jobs into DAGs, scheduling tasks, allocating resources, and executing tasks. Built on this layer, both stream processing and batch processing programs can run on the same streaming engine.
      • Deploy Layer
        • Support for deploying and running Flink applications on different platforms.

3. Flink Layered API

  • Flink refines the API & Libraries Layer into more specific layers. In this hierarchy, the higher the layer, the more concise and convenient the API; the lower the layer, the more expressive and flexible it is. The core functions of each layer are as follows:
    • SQL & Table API
      • The SQL & Table API works with both batch and stream processing, meaning you can query bounded and unbounded data streams with the same semantics and obtain the same results. Beyond basic queries, it also supports custom scalar, aggregate, and table-valued functions to cover a wide variety of query needs.
    • DataStream & DataSet APIs
      • These are the core data-processing APIs of Flink, callable from Java or Scala, and provide encapsulations of common operations such as reading, transforming, and writing data.
    • Stateful Stream Processing
      • This is the lowest level of abstraction, embedded in the DataStream API through the Process Function. The Process Function is the lowest-level API Flink provides and offers maximum flexibility, allowing developers fine-grained control over time and state.

4. Flink Cluster Architecture

  • Core Components

    • The Runtime layer uses the standard Master-Slave structure: the Master side contains three core components (Dispatcher, ResourceManager, and JobManager), while the Slave side is mainly the TaskManager process. Their functions are as follows:
      • JobManagers (Masters): A JobManager receives an application from the Dispatcher. The application contains the JobGraph (the logical dataflow graph) along with all its classes, third-party libraries, and other resources. The JobManager converts the JobGraph into an ExecutionGraph and requests resources from the ResourceManager to execute the tasks; once the resources are granted, it distributes the ExecutionGraph to the corresponding TaskManagers. There is at least one JobManager per job; in a high-availability deployment there can be multiple JobManagers, one of which is the leader while the rest are on standby.
      • TaskManagers (Workers): TaskManagers execute the actual SubTasks. Each TaskManager has a certain number of slots, each of which is a fixed-size set of resources (e.g., computing power, storage space). When a TaskManager starts, it registers its slots with the ResourceManager, which manages them centrally.
      • Dispatcher: The Dispatcher receives applications submitted by clients and passes them to a JobManager. It also provides a Web UI for monitoring job execution.
      • ResourceManager: The ResourceManager receives resource requests from JobManagers and assigns TaskManagers with free slots to them to execute tasks. When the TaskManagers do not have enough free slots, the ResourceManager can request additional resources from an external platform (e.g., YARN or Kubernetes).
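The slot-registration and slot-request flow above can be simulated with a toy example (plain Python; `ToyResourceManager` and its methods are hypothetical names, not Flink classes): TaskManagers register their slots at startup, and the ResourceManager hands out free slots on request, falling back to an external platform only when the pool is exhausted.

```python
# Illustrative sketch (plain Python): the slot-request flow between a
# JobManager and the ResourceManager. Names are hypothetical, not Flink's.

class ToyResourceManager:
    def __init__(self, num_tms, slots_per_tm):
        # Each TaskManager registers its slots with the ResourceManager
        # on startup; slots are identified as (task_manager_id, slot_id).
        self.free_slots = [(tm, s) for tm in range(num_tms)
                           for s in range(slots_per_tm)]

    def request_slots(self, n):
        # Grant n free slots; a real ResourceManager would instead ask an
        # external platform (e.g., YARN) for more resources when short.
        if n > len(self.free_slots):
            raise RuntimeError("not enough free slots; would request more")
        granted, self.free_slots = self.free_slots[:n], self.free_slots[n:]
        return granted

rm = ToyResourceManager(num_tms=2, slots_per_tm=3)   # 6 slots registered
print(rm.request_slots(4))   # 4 slots granted across both TaskManagers
print(len(rm.free_slots))    # 2 slots remain free
```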
  • Task & SubTask

    • Code Example
      • Streaming DataFlow Graph
      • Assume that the parallelism of the job's sink operator is 1 and the parallelism of all other operators is 2.
      • The streaming dataflow graph is then converted into a parallel dataflow (execution graph).
    • Task: When performing distributed computation, Flink chains operators together wherever possible in order to reduce the overhead of thread switching and buffering, lowering latency while increasing overall throughput. However, not all operators can be chained: operations such as keyBy cause a network shuffle and repartitioning, so they cannot be chained and must form a separate Task. In short, a Task is a maximal chain of operators that can be linked together (an Operator Chain). In the example, the source and map operators are chained together, so the entire job consists of only three Tasks.
    • SubTask: A SubTask is one parallel slice of a Task; that is, a Task can be split into multiple SubTasks according to its parallelism. As shown above, source & map has a parallelism of 2, keyBy has a parallelism of 2, and sink has a parallelism of 1, so there are only 3 Tasks but 5 SubTasks. The JobManager is responsible for defining and splitting these SubTasks and handing them to TaskManagers for execution; each SubTask runs in a separate thread.
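The Task and SubTask counts in this example follow from simple arithmetic, sketched here in plain Python: the number of Tasks is the number of operator chains, while the number of SubTasks is the sum of their parallelisms.

```python
# Illustrative arithmetic (plain Python) for the example above: three Tasks
# with parallelisms 2 (source & map), 2 (keyBy), and 1 (sink).

tasks = {"source & map": 2, "keyBy": 2, "sink": 1}

num_tasks = len(tasks)               # one Task per operator chain
num_subtasks = sum(tasks.values())   # one SubTask per parallel slice

print(num_tasks, num_subtasks)  # 3 5
```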
  • Resource Manage

    • One possible distribution scenario is as follows: each SubTask thread runs in a separate Task Slot, and the SubTasks share the TCP connection (via multiplexing) and heartbeat messages of the TaskManager process they belong to, reducing the overall performance overhead. This may look ideal, but different operations require different amounts of resources.

      For this reason, Flink allows multiple SubTasks to share a slot, even SubTasks of different Tasks, as long as they come from the same job. Suppose the parallelism of source & map and keyBy above is increased to 6 while the number of slots stays the same. Then a Task Slot can have multiple SubTasks running in it; each SubTask still executes in a separate thread, but they share the slot's resources. Flink handles this simply: by default, the number of slots a job requires equals the maximum parallelism among its operations. For example, if the parallelism of operations A, B, and D is 4 and the parallelism of operations C and E is 2, the whole job needs at least 4 slots. Through this mechanism, Flink does not need to care how many Tasks and SubTasks a job is split into.

  • Component Communication

    • All components of Flink communicate through the Actor System, a container for actors with multiple roles. It provides services such as scheduling, configuration, and logging, and contains a thread pool that can start all the actors. If two actors are local to each other, messages are exchanged via shared memory; if an actor is remote, messages are passed via RPC calls.