CAP
What's CAP
C (Consistency) A (Availability) P (Partition Tolerance)
The simplest example for understanding CAP is two instances, one in each of two network partitions, with the network between the partitions currently down:
- If both partitions can respond, A is satisfied but C is not.
- If only one partition can respond, C is satisfied but A is not.
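The trade-off above can be sketched as a toy simulation (all names here are invented for illustration, not any real system):

```python
# Two replicas separated by a partition must choose between C and A.
class Replica:
    def __init__(self):
        self.value = "old"

def write_during_partition(replicas, new_value, mode):
    """mode="CP": refuse the write unless every replica is reachable
    (stay consistent, give up availability).
    mode="AP": let any reachable replica accept the write
    (stay available, let replicas diverge)."""
    reachable = [replicas[0]]          # partition: only one side reachable
    if mode == "CP" and len(reachable) < len(replicas):
        return "unavailable"           # give up A to preserve C
    for r in reachable:
        r.value = new_value            # give up C to preserve A
    return "ok"

a, b = Replica(), Replica()
assert write_during_partition([a, b], "new", mode="CP") == "unavailable"
assert a.value == b.value == "old"     # consistent, but the write failed

assert write_during_partition([a, b], "new", mode="AP") == "ok"
assert a.value != b.value              # available, but now inconsistent
```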
RAFT
A website demonstrating distributed consensus: thesecretlivesofdata.com/raft/
etcd's consensus algorithm is Raft.
Leader Election:
- By default all nodes are followers. When a follower has not received a heartbeat from the leader for an election timeout (random, 150ms ~ 300ms), it becomes a candidate with a new election term (term + 1) and sends request-vote messages to the other nodes (after voting for the candidate, a follower's term also becomes the new term).
- When a candidate gets a majority of votes (every follower can only vote once per term), it becomes the leader node.
- The leader periodically sends Append Entries messages (used as heartbeats) carrying the leader's changes to the followers (the interval is far shorter than the election timeout), and the followers respond to the Append Entries messages.
All changes to the system must go through the leader node (for example, if a client sends a message to a follower, the follower forwards it to the leader).
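The two election rules above can be sketched as follows (node names and helper names are made up for illustration):

```python
import random

# Randomized election timeout (the 150-300ms range from the notes above):
# with no heartbeats arriving, the follower whose timer fires first
# becomes the candidate for the next term.
def election_timeout(rng):
    return rng.uniform(0.150, 0.300)   # seconds

def first_candidate(node_ids, rng):
    timeouts = {n: election_timeout(rng) for n in node_ids}
    return min(timeouts, key=timeouts.get)

# Majority rule: a candidate becomes leader with strictly more than half
# of the cluster's votes.
def becomes_leader(votes, cluster_size):
    return votes > cluster_size // 2

winner = first_candidate(["n1", "n2", "n3"], random.Random(42))
assert winner in {"n1", "n2", "n3"}
assert becomes_leader(2, 3)            # 2 of 3 is a majority
assert not becomes_leader(2, 4)        # 2 of 4 is not
```

The randomized timeout is what makes split votes rare: nodes almost never time out at the same instant, so one candidate usually asks for votes before any rival appears.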
Log Replication
- When the leader gets an entry (e.g. set x to some value) from a client, it saves the entry in its log (WAL). The entry's status is uncommitted, so the value of x is still the old one.
- Then the leader broadcasts the entry to the followers through the next heartbeats; the followers save the entry in their logs (uncommitted) and return acks.
- If the leader gets a majority of acks, it applies the entry (the entry is committed) and updates the value of x. Then the leader responds to the client.
- The leader notifies the followers, and the followers also commit the entry (apply the log to the state machine).
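The commit rule in the steps above can be sketched as a minimal simulation (this is illustrative only, not etcd's implementation; the ack count includes the leader's own log write):

```python
# A leader commits an entry, and applies it to the state machine, only
# after a majority of the cluster has acked it.
class Leader:
    def __init__(self, cluster_size):
        self.cluster_size = cluster_size
        self.log = []                  # list of entry dicts
        self.state = {}                # the state machine (key -> value)

    def propose(self, key, value):
        # Save to the log first (uncommitted); return the log index.
        self.log.append({"key": key, "value": value, "committed": False})
        return len(self.log) - 1

    def on_acks(self, index, ack_count):
        if ack_count > self.cluster_size // 2:
            entry = self.log[index]
            entry["committed"] = True
            self.state[entry["key"]] = entry["value"]  # apply to state machine
            return True                # now safe to respond to the client
        return False

leader = Leader(cluster_size=5)
i = leader.propose("x", 8)
assert leader.state.get("x") is None   # uncommitted: x is still old
assert leader.on_acks(i, ack_count=3)  # 3 of 5 is a majority
assert leader.state["x"] == 8          # committed and applied
```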
Partition Tolerance
- Because of a network partition, there can be two leaders at the same time. But only one leader can respond to clients successfully, because only the "true" leader can replicate its entries to a majority of nodes; the "fake" leader cannot commit, so it does not respond to the client, and the client retries the request.
- When the network partition heals, the old leader receives the new leader's heartbeats and becomes a follower, because its term is smaller.
- The nodes in the old partition roll back their uncommitted log entries.
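The healing step above can be sketched like this (a simplification: real Raft reconciles logs entry by entry via the Append Entries consistency check, but the effect on never-committed entries is the same):

```python
# When a stale leader sees a heartbeat with a higher term, it steps down
# to follower and its partition's uncommitted entries are discarded.
def on_heartbeat(node, heartbeat_term):
    if heartbeat_term > node["term"]:
        node["role"] = "follower"
        node["term"] = heartbeat_term
        # Drop entries that never reached a majority in the old partition.
        node["log"] = [e for e in node["log"] if e["committed"]]

stale_leader = {
    "role": "leader", "term": 2,
    "log": [{"op": "set x=1", "committed": True},
            {"op": "set x=2", "committed": False}],  # never got a majority
}
on_heartbeat(stale_leader, heartbeat_term=3)
assert stale_leader["role"] == "follower"
assert stale_leader["log"] == [{"op": "set x=1", "committed": True}]
```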
ZAB
Gossip
Distributed Lock
A distributed lock is used to restrict multiple instances from accessing the same resource at the same time.
Cases
For example, sharing one of our templates to TT requires first getting TT's share_id through TT's OpenAPI (a legal requirement). Each request must carry a Token; the Token has an expiry time, and each request refreshes the Token, invalidating the previous one.
To keep QPS down and to improve performance, Redis is used to store the Token returned by TT, and a background thread periodically requests a fresh Token and writes it to Redis.
When a template is shared, the Token is used to call TT's OpenAPI to get the share_id. At that moment the Token may already be invalid (after all, the Token is only reloaded periodically). If the HTTP call fails because the Token has expired, we must proactively request a new Token and then retry the OpenAPI call, and this is where the distributed lock is needed.
Within one lock cycle: request the Token (so it is guaranteed to be the latest usable one), update it in Redis, then release the lock.
The lock used is a distributed lock implemented with Redis.
Lock:
SET key value EX 30 NX
(Note: SETNX itself takes no expiry argument; SET with the NX and EX options sets the key and its TTL atomically.)
Here value is the token, and 30 is the expiry time, which prevents a lock held for too long from blocking other threads.
Since acquiring the lock can fail, retry with a for-sleep loop (with a bounded retry count).
Unlock:
DEL key
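The acquire-with-retry flow above can be sketched against an in-memory dict standing in for Redis SET NX EX (no real Redis involved; all names and the simulated clock are invented for illustration):

```python
import time

store = {}  # key -> (value, expire_at), a stand-in for Redis

def set_nx_ex(key, value, ttl, now):
    """Simulates SET key value EX ttl NX: succeed only if the key is
    absent or its TTL has expired."""
    entry = store.get(key)
    if entry is not None and entry[1] > now:
        return False                   # key exists and has not expired
    store[key] = (value, now + ttl)
    return True

def acquire(key, value, ttl=30, retries=5, now=0.0):
    """The for-sleep retry loop with a bounded retry count."""
    for _ in range(retries):
        if set_nx_ex(key, value, ttl, now):
            return True
        time.sleep(0.01)               # back off before retrying
    return False

assert acquire("token_lock", "client-a")      # first client wins
assert not acquire("token_lock", "client-b")  # lock held: retries exhausted
```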
Which Distributed lock
redis
Lock:
SET lock_key unique_value EX 30 NX
lock_key: the unique key of the distributed lock
unique_value: each client has its own distinct value, which guarantees a lock can only be released by the client that acquired it
EX: the expiry time
Unlock:
Lua script:
if redis.call("get",KEYS[1]) == ARGV[1] then
return redis.call("del",KEYS[1])
else
return 0
end
Here KEYS[1] is lock_key and ARGV[1] is unique_value, guaranteeing a lock can only be released by the client that acquired it.
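The Lua script performs an atomic "get, compare, delete". The same rule can be sketched against an in-memory dict (Redis runs the Lua script atomically; here a plain function stands in for that atomicity, with the script's return values kept):

```python
store = {"lock_key": "client-a"}       # stand-in for Redis

def unlock(key, unique_value):
    """Delete the lock only if we still own it, mirroring the Lua script."""
    if store.get(key) == unique_value:
        del store[key]
        return 1                       # lock released
    return 0                           # not the owner (or lock expired)

assert unlock("lock_key", "client-b") == 0   # wrong owner: refused
assert unlock("lock_key", "client-a") == 1   # owner releases the lock
assert "lock_key" not in store
```

The ownership check matters because the lock can expire while held: without it, a client whose lock timed out could delete a lock that another client has since acquired.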
bad case:
After client A acquires the lock, the Redis master crashes; after failover a slave becomes the new master, and client B can then also acquire the lock (the SET had not yet been replicated to the slave).
etcd
etcd is a linearizable KV store, so it avoids the bad case above, which is caused by data inconsistency between the Redis master and slave.
zookeeper
REF
- Distributed lock proposal: https://byte_dance.fei_shu.cn/wiki/wikcnErkj2kh52PFcn5ET2TPrId#