CAP
What's CAP
C (Consistency) A (Availability) P (Partition Tolerance)
The simplest example for understanding CAP is two instances, one in each of two network partitions, with the network between the partitions currently down:
- If both partitions can respond, A is satisfied but C is not.
- If only one partition can respond, C is satisfied but A is not.
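The trade-off above can be sketched as a toy simulation (all names here are invented for illustration, not any real system):

```python
# Two replicas separated by a partition must choose between C and A.
class Replica:
    def __init__(self):
        self.value = "old"

def write_during_partition(replicas, new_value, mode):
    """mode="CP": refuse the write unless every replica is reachable
    (stay consistent, give up availability).
    mode="AP": let any reachable replica accept the write
    (stay available, let replicas diverge)."""
    reachable = [replicas[0]]          # partition: only one side reachable
    if mode == "CP" and len(reachable) < len(replicas):
        return "unavailable"           # give up A to preserve C
    for r in reachable:
        r.value = new_value            # give up C to preserve A
    return "ok"

a, b = Replica(), Replica()
assert write_during_partition([a, b], "new", mode="CP") == "unavailable"
assert a.value == b.value == "old"     # consistent, but the write failed

assert write_during_partition([a, b], "new", mode="AP") == "ok"
assert a.value != b.value              # available, but now inconsistent
```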
RAFT
A website demonstrating distributed consensus: thesecretlivesofdata.com/raft/
etcd's consensus algorithm is Raft.
Leader Election:
- By default all nodes are followers. When a follower has not received a heartbeat from the leader for an election timeout (random, 150ms ~ 300ms), it becomes a candidate with a new election term (term + 1) and sends request-vote messages to the other nodes (after voting for the candidate, a follower's term also becomes the new term).
- When a candidate gets a majority of votes (every follower can only vote once per term), it becomes the leader node.
- The leader periodically sends Append Entries messages (used as heartbeats) carrying the leader's changes to the followers (the interval is far shorter than the election timeout), and the followers respond to the Append Entries messages.
All changes to the system must go through the leader node (for example, if a client sends a message to a follower, the follower forwards it to the leader).
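The two election rules above can be sketched as follows (node names and helper names are made up for illustration):

```python
import random

# Randomized election timeout (the 150-300ms range from the notes above):
# with no heartbeats arriving, the follower whose timer fires first
# becomes the candidate for the next term.
def election_timeout(rng):
    return rng.uniform(0.150, 0.300)   # seconds

def first_candidate(node_ids, rng):
    timeouts = {n: election_timeout(rng) for n in node_ids}
    return min(timeouts, key=timeouts.get)

# Majority rule: a candidate becomes leader with strictly more than half
# of the cluster's votes.
def becomes_leader(votes, cluster_size):
    return votes > cluster_size // 2

winner = first_candidate(["n1", "n2", "n3"], random.Random(42))
assert winner in {"n1", "n2", "n3"}
assert becomes_leader(2, 3)            # 2 of 3 is a majority
assert not becomes_leader(2, 4)        # 2 of 4 is not
```

The randomized timeout is what makes split votes rare: nodes almost never time out at the same instant, so one candidate usually asks for votes before any rival appears.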
Log Replication
- When the leader gets an entry (e.g. set x to some value) from a client, it saves the entry in its log (WAL). The entry's status is uncommitted, so the value of x is still the old one.
- Then the leader broadcasts the entry to the followers through the next heartbeats; the followers save the entry in their logs (uncommitted) and return acks.
- If the leader gets a majority of acks, it applies the entry (the entry is committed) and updates the value of x. Then the leader responds to the client.
- The leader notifies the followers, and the followers also commit the entry (apply the log to the state machine).
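The commit rule in the steps above can be sketched as a minimal simulation (this is illustrative only, not etcd's implementation; the ack count includes the leader's own log write):

```python
# A leader commits an entry, and applies it to the state machine, only
# after a majority of the cluster has acked it.
class Leader:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.log = []                  # list of entry dicts
        self.state = {}                # the state machine (key -> value)

    def propose(self, key, value):
        # Save to the log first (uncommitted); return the log index.
        self.log.append({"key": key, "value": value, "committed": False})
        return len(self.log) - 1

    def on_acks(self, index, ack_count):
        if ack_count > self.cluster_size // 2:
            entry = self.log[index]
            entry["committed"] = True
            self.state[entry["key"]] = entry["value"]  # apply to state machine
            return True                # now safe to respond to the client
        return False

leader = Leader(cluster_size=5)
i = leader.propose("x", 8)
assert leader.state.get("x") is None   # uncommitted: x is still old
assert leader.on_acks(i, ack_count=3)  # 3 of 5 is a majority
assert leader.state["x"] == 8          # committed and applied
```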
Partition Tolerance
- Because of a network partition, there can be two leaders at the same time. But only one leader can respond to clients successfully, because only the "true" leader can replicate its entries to a majority of nodes; the "fake" leader cannot commit, so it does not respond to the client, and the client retries the request.
- When the network partition heals, the old leader receives the new leader's heartbeats and becomes a follower, because its term is smaller.
- The nodes in the old partition roll back their uncommitted log entries.
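The healing step above can be sketched like this (a simplification: real Raft reconciles logs entry by entry via the Append Entries consistency check, but the effect on never-committed entries is the same):

```python
# When a stale leader sees a heartbeat with a higher term, it steps down
# to follower and its partition's uncommitted entries are discarded.
def on_heartbeat(node, heartbeat_term):
    if heartbeat_term > node["term"]:
        node["role"] = "follower"
        node["term"] = heartbeat_term
        # Drop entries that never reached a majority in the old partition.
        node["log"] = [e for e in node["log"] if e["committed"]]

stale_leader = {
    "role": "leader", "term": 2,
    "log": [{"op": "set x=1", "committed": True},
            {"op": "set x=2", "committed": False}],  # never got a majority
}
on_heartbeat(stale_leader, heartbeat_term=3)
assert stale_leader["role"] == "follower"
assert stale_leader["log"] == [{"op": "set x=1", "committed": True}]
```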
ZAB
Gossip
Distributed Lock
A distributed lock is used to restrict multiple instances from accessing the same resource at the same time.
Cases
For example, sharing one of our templates to TT requires first getting TT's share_id through TT's OpenAPI (a legal requirement). Each request must carry a Token; the Token has an expiry time, and each request refreshes the Token, invalidating the previous one.
To keep QPS down and to improve performance, Redis is used to store the Token returned by TT, and a background thread periodically requests a fresh Token and writes it to Redis.
When a template is shared, the Token is used to call TT's OpenAPI to get the share_id. At that moment the Token may already be invalid (after all, the Token is only reloaded periodically). If the HTTP call fails because the Token has expired, we must proactively request a new Token and then retry the OpenAPI call, and this is where the distributed lock is needed.
Within one lock cycle: request the Token (so it is guaranteed to be the latest usable one), update it in Redis, then release the lock.
The lock used is a distributed lock implemented with Redis.
Lock:
SET key value EX 30 NX
(Note: SETNX itself takes no expiry argument; SET with the NX and EX options sets the key and its TTL atomically.)
Here value is the token, and 30 is the expiry time, which prevents a lock held for too long from blocking other threads.
Since acquiring the lock can fail, retry with a for-sleep loop (with a bounded retry count).
Unlock:
DEL key
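The acquire-with-retry flow above can be sketched against an in-memory dict standing in for Redis SET NX EX (no real Redis involved; all names and the simulated clock are invented for illustration):

```python
import time

store = {}  # key -> (value, expire_at), a stand-in for Redis

def set_nx_ex(key, value, ttl, now):
    """Simulates SET key value EX ttl NX: succeed only if the key is
    absent or its TTL has expired."""
    entry = store.get(key)
    if entry is not None and entry[1] > now:
        return False                   # key exists and has not expired
    store[key] = (value, now + ttl)
    return True

def acquire(key, value, ttl=30, retries=5, now=0.0):
    """The for-sleep retry loop with a bounded retry count."""
    for _ in range(retries):
        if set_nx_ex(key, value, ttl, now):
            return True
        time.sleep(0.01)               # back off before retrying
    return False

assert acquire("token_lock", "client-a")      # first client wins
assert not acquire("token_lock", "client-b")  # lock held: retries exhausted
```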
Which Distributed lock
redis
Lock:
SET lock_key unique_value EX 30 NX
lock_key: the unique key of the distributed lock
unique_value: each client has its own distinct value, which guarantees a lock can only be released by the client that acquired it
EX: the expiry time
Unlock:
Lua script:
if redis.call("get",KEYS[1]) == ARGV[1] then
return redis.call("del",KEYS[1])
else
return 0
end
Here KEYS[1] is lock_key and ARGV[1] is unique_value, guaranteeing a lock can only be released by the client that acquired it.
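The Lua script performs an atomic "get, compare, delete". The same rule can be sketched against an in-memory dict (Redis runs the Lua script atomically; here a plain function stands in for that atomicity, with the script's return values kept):

```python
store = {"lock_key": "client-a"}       # stand-in for Redis

def unlock(key, unique_value):
    """Delete the lock only if we still own it, mirroring the Lua script."""
    if store.get(key) == unique_value:
        del store[key]
        return 1                       # lock released
    return 0                           # not the owner (or lock expired)

assert unlock("lock_key", "client-b") == 0   # wrong owner: refused
assert unlock("lock_key", "client-a") == 1   # owner releases the lock
assert "lock_key" not in store
```

The ownership check matters because the lock can expire while held: without it, a client whose lock timed out could delete a lock that another client has since acquired.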
bad case:
After client A acquires the lock, the Redis master crashes; after failover a slave becomes the new master, and client B can then also acquire the lock (the SET had not yet been replicated to the slave).
etcd
etcd is a linearizable KV store, so it avoids the bad case above, which is caused by data inconsistency between the Redis master and slave.
zookeeper
REF
- Distributed lock proposal: https://byte_dance.fei_shu.cn/wiki/wikcnErkj2kh52PFcn5ET2TPrId#