Learning Kafka


1. What is Kafka

From the official website:

Apache Kafka is an open-source distributed event streaming platform used by thousands of companies for high-performance data pipelines, streaming analytics, data integration, and mission-critical applications.

Kafka is a distributed system consisting of servers and clients that communicate via a high-performance TCP network protocol. It can be deployed on bare-metal hardware, virtual machines, and containers in on-premise as well as cloud environments.

Event: An event records the fact that "something happened" in the world or in your business. It is also called record or message in the documentation. When you read or write data to Kafka, you do this in the form of events. Conceptually, an event has a key, value, timestamp, and optional metadata headers. Here's an example event:

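  • Event key: "Alice"
  • Event value: "Made a payment of $200 to Bob"
  • Event timestamp: "Jun. 25, 2020 at 2:06 p.m."

Producer and consumer: producers are the client applications that publish (write) events to Kafka, and consumers are the ones that subscribe to (read) them.

Distributed: Kafka runs as a cluster of servers (brokers), and a topic's data is split into partitions spread across them.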
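To make the shape of an event concrete, here is a minimal sketch that models the four fields named above as a plain Python data class. The class name `Event` and its field types are my own assumptions for illustration; this is not the record class of any Kafka client library.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import Dict, Optional


@dataclass(frozen=True)
class Event:
    """Illustrative model of an event (hypothetical, not a Kafka client class)."""
    key: Optional[str]       # e.g. who or what the event is about
    value: str               # what happened
    timestamp: datetime      # when it happened
    headers: Dict[str, bytes] = field(default_factory=dict)  # optional metadata


# The example event from the text above.
example = Event(
    key="Alice",
    value="Made a payment of $200 to Bob",
    timestamp=datetime(2020, 6, 25, 14, 6),
)
print(example.key, "->", example.value)
```

The headers default to an empty dictionary because the documentation describes them as optional.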

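For fault tolerance, every partition has replicas (copies kept on other brokers); how many is set by the topic's replication factor.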
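Why replicas help can be seen in a toy model. The sketch below places each partition's copies on distinct brokers in round-robin order and then checks what is left when one broker fails. The function names and the round-robin placement are assumptions made for illustration; Kafka's real assignment logic is more involved (leaders, in-sync replicas, rack awareness).

```python
from typing import Dict, List


def assign_replicas(num_partitions: int, brokers: List[int],
                    replication_factor: int) -> Dict[int, List[int]]:
    """Place each partition's copies on distinct brokers, round-robin (simplified)."""
    if replication_factor > len(brokers):
        raise ValueError("replication factor cannot exceed the number of brokers")
    return {
        p: [brokers[(p + i) % len(brokers)] for i in range(replication_factor)]
        for p in range(num_partitions)
    }


def surviving_replicas(layout: Dict[int, List[int]],
                       failed_broker: int) -> Dict[int, List[int]]:
    """Return the replicas that remain for each partition after one broker fails."""
    return {p: [b for b in replicas if b != failed_broker]
            for p, replicas in layout.items()}


# 3 partitions, 3 brokers, 2 copies of each partition.
layout = assign_replicas(num_partitions=3, brokers=[0, 1, 2], replication_factor=2)
after_failure = surviving_replicas(layout, failed_broker=1)

# Every partition still has at least one copy, so no data is lost.
print(all(len(replicas) >= 1 for replicas in after_failure.values()))
```

With a replication factor of 1 the same failure would leave some partitions with no copy at all, which is the case replication is meant to prevent.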

Keywords: open source, distributed, event streaming, high performance, stream analytics, data integration

2. Key concepts

Event streaming thus ensures a continuous flow and interpretation of data so that the right information is at the right place, at the right time.

Kafka is an event streaming platform.

Kafka combines three key capabilities:

  • To publish (write) and subscribe to (read) streams of events, including continuous import/export of your data from other systems.
  • To store streams of events durably and reliably for as long as you want.
  • To process streams of events as they occur or retrospectively.
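The three capabilities can be sketched with a toy in-memory log. This is only a mental model under my own assumptions: the `EventLog` class and its `publish` and `read` methods are hypothetical names, nothing here talks to a real Kafka cluster, and "durable" storage is reduced to a Python list.

```python
from typing import List, Tuple


class EventLog:
    """Toy append-only log, standing in for a single topic partition (hypothetical)."""

    def __init__(self) -> None:
        self._events: List[Tuple[str, int]] = []

    def publish(self, key: str, amount: int) -> int:
        """Append an event and return its offset (its position in the log)."""
        self._events.append((key, amount))
        return len(self._events) - 1

    def read(self, from_offset: int = 0) -> List[Tuple[str, int]]:
        """Return events starting at from_offset; reading never deletes anything."""
        return self._events[from_offset:]


log = EventLog()
log.publish("Alice", 200)            # publish (write)
log.publish("Bob", 50)
offset = log.publish("Alice", 25)

# Process retrospectively: replay the whole stored stream from the beginning.
total_all = sum(amount for _, amount in log.read(0))

# Process as events occur: only read what arrived from a known offset onward.
latest = log.read(offset)
print(total_all, latest)
```

Because `read` leaves the stored events in place, the same stream can be replayed as often as needed, which is what makes retrospective processing possible.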

Differences between Kafka and RabbitMQ

Reference

  • Messages in Kafka are persistent: they stay in the log after being consumed, until the configured retention policy removes them.
  • In RabbitMQ, a message is removed from the queue once it has been consumed and acknowledged.
  • RabbitMQ supports several standardized protocols, such as AMQP, MQTT, and STOMP.
  • Kafka uses a custom binary protocol on top of TCP/IP for communication between applications and the cluster.
  • RabbitMQ has more routing options (direct, topic, fanout, and headers exchanges).
  • RabbitMQ supports priority queues.
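The first two points are the core difference, and a small simulation makes it visible. The sketch below is a deliberate simplification under my own assumptions: a `deque` stands in for a RabbitMQ queue, a list plus a per-group offset table stands in for a Kafka log, and the group names and the `poll` helper are hypothetical. No real client library is used.

```python
from collections import deque

messages = ["m1", "m2", "m3"]

# Queue semantics (RabbitMQ-like, simplified): consuming removes the message,
# so competing workers split the messages between them.
queue = deque(messages)
worker_a = [queue.popleft(), queue.popleft()]
worker_b = [queue.popleft()]          # only sees what worker_a left behind

# Log semantics (Kafka-like, simplified): messages stay in the log and each
# consumer group tracks its own read position (offset).
log = list(messages)
offsets = {"billing": 0, "analytics": 0}


def poll(group: str, max_records: int) -> list:
    """Return up to max_records messages for a group and advance its offset."""
    start = offsets[group]
    batch = log[start:start + max_records]
    offsets[group] = start + len(batch)
    return batch


billing = poll("billing", 3)          # reads everything
analytics = poll("analytics", 2)      # independently reads the same messages
print(len(queue), len(log))
```

After both runs the queue is empty while the log still holds all three messages, and the two groups have each read from the start without affecting one another.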