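Ensuring Data Consistency in Distributed Systems

The Byzantine Generals Problem describes the consensus challenge in distributed systems: a group of generals must agree on a plan before they act, yet some of them may be traitors.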

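What is consistency?

The Byzantine Generals Problem

Long ago, Byzantium was the capital of the Eastern Roman Empire. The empire's territory was vast, and for defensive reasons its armies were stationed far apart, so the generals could communicate only through messengers. In wartime, all the generals of the Byzantine army had to reach a common decision in order to win. However, the army might contain traitors who tried to disrupt that decision. Knowing that some members are unreliable, the loyal generals must still reach agreement without being swayed by traitors or spies.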

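This is the "Byzantine Generals Problem", formulated in 1982 by the American computer scientist Leslie Lamport together with Robert Shostak and Marshall Pease. Lamport also devised the Paxos consensus algorithm, first written up around 1989 and published in 1998. Note that classic Paxos tolerates crashed nodes and lost messages, not nodes that lie, so it addresses the non-Byzantine form of the consensus problem. Lamport received the 2013 Turing Award for his contributions to distributed and concurrent systems.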

baijiahao.baidu.com/s?id=167000…

zhuanlan.zhihu.com/p/107439021

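For a relational database, every access that follows an update is required to see the updated data; this is strong consistency. If the system tolerates some or all subsequent accesses not seeing the update, it is weak consistency.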

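Strong consistency

Linearizability

Linearizability means that once data has been updated, every client's reads and writes operate on the updated data. As illustrated in the figure, assume each piece of data has three replicas placed on three nodes. When Client1 sets X to 1, linearizability requires that once Client1's update completes, all clients read and write on top of the latest value: Client10 reads x=1; Client100's update, issued at the same moment, applies x+=1 on top of x=1; and at the next moment Client1000 reads X=2 from whichever replica it reaches.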
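The walkthrough above can be mimicked with a small single-process sketch. It is not a distributed implementation: the class name `ToyLinearizableRegister` is hypothetical, and a lock plus "update all replicas before returning" stands in for whatever replication protocol a real system would use. It only illustrates the visible guarantee, namely that after a write completes, every replica returns the latest value.

```python
import threading


class ToyLinearizableRegister:
    """Single-process stand-in for a register replicated on three nodes."""

    def __init__(self, replica_count=3, initial=0):
        self._lock = threading.Lock()  # serializes every operation
        self._replicas = [initial] * replica_count

    def write(self, value):
        with self._lock:
            # The write returns only after every replica holds the new value.
            self._replicas = [value] * len(self._replicas)

    def increment(self, delta=1):
        with self._lock:
            # Read-modify-write happens atomically on top of the latest value.
            new_value = self._replicas[0] + delta
            self._replicas = [new_value] * len(self._replicas)
            return new_value

    def read(self, replica_index):
        with self._lock:
            return self._replicas[replica_index]


register = ToyLinearizableRegister()
register.write(1)                     # Client1: x = 1
seen_by_client10 = register.read(0)   # Client10 reads x
register.increment()                  # Client100: x += 1
seen_by_client1000 = [register.read(i) for i in range(3)]  # any replica
```

Here `seen_by_client10` is 1 and every entry of `seen_by_client1000` is 2, matching the sequence described in the text.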

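Sequential consistency

Sequential consistency does not guarantee an absolute, real-time global order of operations, but it does guarantee a consistent relative order across the distributed service.

As illustrated in the figure, D1 issues x=1 and then x=2, and D3 issues a=1 and then a=2. When a Client reads from node D2, sequential consistency requires that every node sees the operations in the same relative order: x=1 always precedes x=2, and a=1 always precedes a=2. The figure shows just one of the orderings that sequential consistency allows.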
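As a rough illustration of the ordering rule, the sketch below checks whether an observed sequence keeps each node's operations in the order that node issued them. The function name `respects_program_order` and the string encoding of operations are assumptions made for this example. It covers only the per-node ordering part of sequential consistency; the requirement that all nodes agree on one and the same sequence is not checked here.

```python
def respects_program_order(observed, per_node_ops):
    """Return True if `observed` keeps every node's operations in issue order."""
    position = {op: index for index, op in enumerate(observed)}
    for ops in per_node_ops.values():
        indices = [position[op] for op in ops]
        if indices != sorted(indices):
            return False
    return True


# Operations from the example: D1 issues x=1 then x=2, D3 issues a=1 then a=2.
per_node_ops = {
    "D1": ["x=1", "x=2"],
    "D3": ["a=1", "a=2"],
}

interleaved = ["x=1", "a=1", "x=2", "a=2"]  # allowed: per-node order is kept
reordered = ["x=2", "x=1", "a=1", "a=2"]    # not allowed: x=2 appears before x=1

interleaved_ok = respects_program_order(interleaved, per_node_ops)
reordered_ok = respects_program_order(reordered, per_node_ops)
```

Several interleavings pass the check, which is why the figure is described as only one possible case.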

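Weak consistency

Eventual consistency

Eventual consistency does not guarantee that the same piece of data is identical on every node at every moment, but as time passes the copies on different nodes always move toward the same state. The period during which the copies differ is called the inconsistency window. Put simply, some time after the data is written, all nodes eventually converge to the same state.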
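One common way to obtain this convergence is last-writer-wins merging with periodic replica synchronization. The sketch below assumes that approach purely for illustration; the names `Replica` and `anti_entropy` are hypothetical, and timestamps are assumed to be unique so that ties never need breaking.

```python
class Replica:
    """Toy replica that keeps the value with the highest timestamp per key."""

    def __init__(self, name):
        self.name = name
        self.store = {}  # key -> (timestamp, value)

    def write(self, key, value, timestamp):
        current = self.store.get(key)
        if current is None or timestamp > current[0]:
            self.store[key] = (timestamp, value)

    def read(self, key):
        entry = self.store.get(key)
        return None if entry is None else entry[1]

    def sync_from(self, other):
        # Pull every entry from the other replica; newer timestamps win.
        for key, (timestamp, value) in other.store.items():
            self.write(key, value, timestamp)


def anti_entropy(replicas):
    """One synchronization round: every replica pulls from every other one."""
    for target in replicas:
        for source in replicas:
            if target is not source:
                target.sync_from(source)


r1, r2, r3 = Replica("r1"), Replica("r2"), Replica("r3")
r1.write("x", 1, timestamp=1)
r2.write("x", 2, timestamp=2)

# Inconsistency window: the three replicas disagree about x.
before = [r.read("x") for r in (r1, r2, r3)]

anti_entropy([r1, r2, r3])

# After synchronization all replicas hold the newest write.
after = [r.read("x") for r in (r1, r2, r3)]
```

Before the round the replicas return 1, 2 and nothing at all for `x`; afterwards all three return 2.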

The difference between sequential consistency and eventual consistency is in the guarantees they provide. Eventual consistency doesn't specify:

What happens if there are concurrent updates to a register
How long the period of inconsistencies lasts

What is the difference between Sequential Consistency and Eventual Consistency?

baijiahao.baidu.com/s?id=167802…

lotabout.me/2019/QQA-Wh…

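Do we need consistency?

The CAP theorem

The CAP theorem (Consistency, Availability, Partition tolerance), proposed by the computer scientist Eric Brewer, asserts that any distributed system can satisfy at most two of the following three properties at the same time:

Consistency: every node in the cluster returns the same data when read (strong consistency)
Availability: the cluster can continuously serve requests
Partition tolerance: the cluster keeps serving requests even when a network partition separates some nodes (the nodes are online but cannot communicate with each other)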

en.wikipedia.org/wiki/CAP_th…

stackoverflow.com/questions/1…

CAP trade-offs in distributed systems

Kafka

Our goal was to support replication in a Kafka cluster within a single datacenter, where network partitioning is rare, so our design focuses on maintaining highly available and strongly consistent replicas. Strong consistency means that all replicas are byte-to-byte identical, which simplifies the job of an application developer.

replication.factor = 3
min.insync.replicas = 2
acks = all

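Replicas [1,2,3], ISR [1,2,3]: after the leader (replica 1) fails, a new leader is elected from [2,3], so data consistency is preserved.

Replicas [1,2,3], ISR [1,2]: after the leader (replica 1) fails, only replica 2 remains in the ISR, which is below min.insync.replicas = 2, so writes cannot succeed and availability is not guaranteed.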

replication.factor = 3
min.insync.replicas = 2
acks = 1

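Replicas [1,2,3], ISR [1,2]: the leader acknowledges the write and then fails before the follower has replicated it. A new leader is elected from [2], so availability is preserved, but the acknowledged write is lost and data consistency is not guaranteed.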
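The three scenarios above can be replayed with a simplified model of how `acks` and `min.insync.replicas` interact. This is a sketch, not Kafka code: the helper names `isr_after_leader_failure` and `write_accepted` are hypothetical, and the model ignores timeouts, retries and unclean leader election.

```python
def isr_after_leader_failure(isr, leader):
    """ISR that remains once the leader replica has failed."""
    return [replica for replica in isr if replica != leader]


def write_accepted(isr, min_insync_replicas, acks):
    """Simplified rule for whether a produce request can be acknowledged.

    `isr` lists the in-sync replicas, including the current leader.
    min.insync.replicas is only enforced when acks is "all".
    """
    if not isr:
        return False  # no in-sync replica is left to lead the partition
    if acks == "all":
        return len(isr) >= min_insync_replicas
    return True  # acks = "1": the leader's own append is enough


# Scenario 1: ISR [1,2,3], leader 1 fails, acks = all.
scenario_1 = write_accepted(isr_after_leader_failure([1, 2, 3], 1), 2, "all")

# Scenario 2: ISR [1,2], leader 1 fails, acks = all.
scenario_2 = write_accepted(isr_after_leader_failure([1, 2], 1), 2, "all")

# Scenario 3: ISR [1,2], leader 1 fails, acks = 1.
scenario_3 = write_accepted(isr_after_leader_failure([1, 2], 1), 2, "1")
```

Scenario 2 rejects writes (consistency over availability), while scenario 3 keeps accepting them at the cost of possibly losing writes that only the failed leader had stored.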

engineering.linkedin.com/kafka/intra…

stackoverflow.com/questions/5…

cloud.tencent.com/developer/a…

blog.csdn.net/qq_38704184…

ZooKeeper (CP)

ZooKeeper does not guarantee that at every instance in time, two different clients will have identical views of ZooKeeper data. Due to factors like network delays, one client may perform an update before another client gets notified of the change. Consider the scenario of two clients, A and B. If client A sets the value of a znode /a from 0 to 1, then tells client B to read /a, client B may read the old value of 0, depending on which server it is connected to. If it is important that Client A and Client B read the same value, Client B should call the sync() method from the ZooKeeper API before it performs its read.
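The stale-read scenario with clients A and B can be pictured with a toy model. The classes `LaggingFollower` and `ToyLeader` below are hypothetical and do not use the ZooKeeper client API; in particular, the real sync() call is asynchronous, whereas this sketch simply applies the backlog immediately.

```python
class LaggingFollower:
    """Toy server that applies the leader's updates only when it syncs."""

    def __init__(self, initial):
        self.data = dict(initial)
        self.backlog = []  # committed updates not yet applied locally

    def read(self, path):
        return self.data[path]

    def sync(self):
        # Catch up with everything the leader has committed so far.
        for path, value in self.backlog:
            self.data[path] = value
        self.backlog.clear()


class ToyLeader:
    def __init__(self, followers, initial):
        self.data = dict(initial)
        self.followers = followers

    def set(self, path, value):
        self.data[path] = value
        for follower in self.followers:
            follower.backlog.append((path, value))


follower = LaggingFollower({"/a": 0})
leader = ToyLeader([follower], {"/a": 0})

leader.set("/a", 1)                 # client A updates /a through the leader
stale_read = follower.read("/a")    # client B reads from a lagging server

follower.sync()                     # client B syncs before reading
fresh_read = follower.read("/a")
```

In this model `stale_read` is the old value 0 and `fresh_read` is 1, which is the behavior the quoted passage warns about.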

zookeeper.apache.org/doc/r3.1.2/…

Is ZooKeeper always consistent in terms of CAP theorem?

How is consistency achieved?

The ZAB (ZooKeeper Atomic Broadcast) algorithm

Leader election

zookeeper.apache.org/doc/current…

www.runoob.com/w3cnote/zoo…

Message broadcast

ZooKeeper uses a variation of the two-phase commit protocol to replicate transactions to followers. When the leader receives an update from a client, it generates a transaction whose identifier (zxid) combines a sequence number c with the leader's epoch e, and sends the transaction to all followers. A follower appends the transaction to its history queue and sends an ACK to the leader. Once the leader has received ACKs from a quorum, it sends COMMIT for that transaction to the quorum. A follower that receives COMMIT commits the transaction unless earlier transactions in its history queue are still outstanding; in that case it waits for their COMMITs before committing.
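The propose, ACK and COMMIT steps described above can be sketched as a small in-memory simulation. This is an illustrative model rather than ZooKeeper's implementation: the classes `ZabFollower` and `ZabLeader` are hypothetical, messages are plain method calls, a zxid is represented as an `(epoch, counter)` tuple, and the leader is assumed to count its own vote toward the quorum.

```python
class ZabFollower:
    def __init__(self, reachable=True):
        self.reachable = reachable
        self.history = []    # proposals in arrival order: (zxid, value)
        self.committed = []  # zxids committed so far, in order

    def on_proposal(self, zxid, value):
        if not self.reachable:
            return False     # no ACK from an unreachable follower
        self.history.append((zxid, value))
        return True          # ACK

    def on_commit(self, zxid):
        # Commit only if no earlier proposal is still outstanding.
        outstanding = [z for z, _ in self.history if z not in self.committed]
        if outstanding and outstanding[0] == zxid:
            self.committed.append(zxid)
            return True
        return False


class ZabLeader:
    def __init__(self, followers, epoch):
        self.followers = followers
        self.epoch = epoch
        self.counter = 0

    def broadcast(self, value):
        self.counter += 1
        zxid = (self.epoch, self.counter)
        ackers = [f for f in self.followers if f.on_proposal(zxid, value)]
        ensemble_size = len(self.followers) + 1  # followers plus the leader
        if 2 * (len(ackers) + 1) > ensemble_size:  # strict majority reached
            for follower in ackers:
                follower.on_commit(zxid)
            return zxid
        return None  # no quorum: the proposal stays uncommitted


followers = [ZabFollower(), ZabFollower(),
             ZabFollower(reachable=False), ZabFollower(reachable=False)]
leader = ZabLeader(followers, epoch=1)

first = leader.broadcast("set /a 1")   # 3 of 5 votes: committed
followers[1].reachable = False
second = leader.broadcast("set /a 2")  # 2 of 5 votes: not committed
```

With five servers, the first broadcast gathers three votes and commits, while the second gathers only two and is left in the remaining follower's history queue without a COMMIT.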

zookeeper.apache.org/doc/r3.4.13…

distributedalgorithm.wordpress.com/tag/zookeep…