An analysis of etcd's crash-safe implementation

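In etcd, data takes effect and reaches the database once a log entry has been replicated by the Leader to a majority of raft nodes and then applied to the application-level store. A restart of the raft state machine can at worst lose log entries, and after the restart the node can recover them by syncing from the Leader. Applying the transactional data in a log entry to the database is more delicate: an entry must be neither skipped nor applied twice. Crash safety in etcd is therefore mostly about making the apply path for log entries tolerate node crashes.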

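Applying a log entry involves several cooperating components, such as the raft state machine, mvcc, and the WAL. Earlier articles in this series introduced each of them; this one describes how etcd coordinates them, and how it recovers data and restores consistency after a crash and restart. The apply flow is shown in [Figure: apply flow].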

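etcd's crash safety can be summarized as: persist state in several components at run time, and compensate across those components during recovery. The important part is how the components interact, how each one persists its data, and how the compensation works after a restart. This article therefore walks through the apply flow in stages (producing Ready data, writing the WAL, applying normal log entries, and recovering data on restart) and highlights the steps that keep data consistent after a restart.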

Producing Ready data

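Ready is the data object through which the raft state machine hands prepared data to the upper layer: log entries and snapshots that can be applied, log entries that must be written to the WAL, and so on. Only the fields related to applying log entries matter here. The definition is: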

// Ready encapsulates the entries and messages that are ready to read,
// be saved to stable storage, committed or sent to other peers.
// All fields in Ready are read-only.
type Ready struct {
	// The current state of a Node to be saved to stable storage BEFORE
	// Messages are sent.
	// HardState will be equal to empty state if there is no update.
	pb.HardState

	// Entries specifies entries to be saved to stable storage BEFORE
	// Messages are sent.
	Entries []pb.Entry

	// Snapshot specifies the snapshot to be saved to stable storage.
	Snapshot pb.Snapshot

	// CommittedEntries specifies entries to be committed to a
	// store/state-machine. These have previously been committed to stable
	// store.
	CommittedEntries []pb.Entry

	// MustSync indicates whether the HardState and Entries must be synchronously
	// written to disk or if an asynchronous write is permissible.
	MustSync bool

	// ...
}

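The fields of Ready that matter for the apply flow are:

**pb.HardState:** the raft state that must be persisted, such as Term (the current term), Vote (the node voted for), and Commit (the index of the highest committed log entry).

**Entries:** newly produced log entries that must be persisted.

**Snapshot:** a snapshot sent by the Leader to a Follower so that the Follower can quickly catch up with the Leader's log.

**CommittedEntries:** log entries that have been committed and are waiting to be applied to the state machine.

**MustSync:** tells the upper layer whether it must call the system sync API so that the data reaches disk promptly.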
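
To make the MustSync flag more concrete, here is a minimal sketch of the rule that, as far as I know, the raft library uses to compute it: a synchronous write is needed when there are new entries or when Term or Vote changed, while a change in Commit alone does not require one. The `HardState` type and the `mustSync` function below are local stand-ins written for illustration, not the library's own definitions.

```go
package main

import "fmt"

// HardState is a local stand-in for raftpb.HardState (assumption: only
// these three fields matter for the decision illustrated here).
type HardState struct {
	Term, Vote, Commit uint64
}

// mustSync reports whether a batch needs a synchronous disk write:
// new entries, or a changed Term or Vote. A Commit change alone does not.
func mustSync(st, prev HardState, numEntries int) bool {
	return numEntries != 0 || st.Vote != prev.Vote || st.Term != prev.Term
}

func main() {
	prev := HardState{Term: 2, Vote: 1, Commit: 10}

	// Only Commit moved forward: an asynchronous write is acceptable.
	fmt.Println(mustSync(HardState{Term: 2, Vote: 1, Commit: 11}, prev, 0))
	// Term changed: the new state must be on disk before replying to peers.
	fmt.Println(mustSync(HardState{Term: 3, Vote: 1, Commit: 10}, prev, 0))
	// New log entries: they must be durable as well.
	fmt.Println(mustSync(prev, prev, 2))
}
```

The asymmetry is what matters for crash safety: Term, Vote, and the log itself must never go backwards after a restart, whereas a stale Commit can be relearned from the Leader.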

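Once the raft state machine has produced a Ready, it sends it out on the Ready channel and then waits on advancec for the upper layer to report that processing is finished. The code is: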

func (n *node) run() {
	var propc chan msgWithResult
	var readyc chan Ready
	var advancec chan struct{}
	var rd Ready

	r := n.rn.raft

	// other code ...

	for {
		if advancec != nil {
			readyc = nil
		} else if n.rn.HasReady() {
			// Populate a Ready. Note that this Ready is not guaranteed to
			// actually be handled. We will arm readyc, but there's no guarantee
			// that we will actually send on it. It's possible that we will
			// service another channel instead, loop around, and then populate
			// the Ready again. We could instead force the previous Ready to be
			// handled first, but it's generally good to emit larger Readys plus
			// it simplifies testing (by emitting less frequently and more
			// predictably).
			rd = n.rn.readyWithoutAccept()
			readyc = n.readyc
		}

		select {
		// other case ...

		case <-n.tickc:
			n.rn.Tick()
		case readyc <- rd:
			n.rn.acceptReady(rd)
			advancec = n.advancec
		case <-advancec:
			n.rn.Advance(rd)
			rd = Ready{}
			advancec = nil
		case <-n.stop:
			close(n.done)
			return
		}
	}
}
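
The loop above relies on a Go idiom: a send or receive on a nil channel never fires inside select, so setting readyc to nil suppresses new batches until the previous one is acknowledged. The following toy model isolates that idiom. The `batch` type, `runNode`, and `demo` are hypothetical names for this sketch and greatly simplify what the real node does.

```go
package main

import "fmt"

// batch is a hypothetical stand-in for raft.Ready.
type batch struct{ entries []int }

// runNode mimics the shape of node.run: it offers a batch on out only while
// no earlier batch is still waiting for its advance signal.
func runNode(propc <-chan int, out chan<- batch, advance <-chan struct{}, stop <-chan struct{}) {
	var (
		pending  []int
		readyc   chan<- batch
		advancec <-chan struct{}
		rd       batch
	)
	for {
		if advancec != nil {
			readyc = nil // a send on a nil channel never fires in select
		} else if len(pending) > 0 {
			rd = batch{entries: append([]int(nil), pending...)}
			readyc = out
		}
		select {
		case v := <-propc:
			pending = append(pending, v)
		case readyc <- rd:
			pending = nil      // everything pending went out with rd
			advancec = advance // now wait for the consumer to finish
		case <-advancec:
			rd = batch{}
			advancec = nil
		case <-stop:
			return
		}
	}
}

// demo drives the node and reports both batches, plus whether a second
// batch was withheld while the first one was still unacknowledged.
func demo() (first, second []int, held bool) {
	propc, out := make(chan int), make(chan batch)
	advance, stop := make(chan struct{}), make(chan struct{})
	go runNode(propc, out, advance, stop)

	for _, v := range []int{1, 2, 3} {
		propc <- v
	}
	first = (<-out).entries
	propc <- 4 // accepted, but not handed out before the advance signal
	select {
	case <-out:
	default:
		held = true
	}
	advance <- struct{}{}
	second = (<-out).entries
	close(stop)
	return first, second, held
}

func main() {
	fmt.Println(demo())
}
```

Because a batch is rebuilt on every loop iteration until it is actually sent, proposals that arrive in the meantime are folded into one larger batch, which matches the intent stated in the comment inside the loop.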

Writing the WAL

After the raft state machine sends the Ready out, the application layer receives it by listening on the ready channel. The relevant code is in the following file:

https://github.com/etcd-io/etcd/blob/main/server/etcdserver/raft.go

Because the way raftNode in etcdserver/raft handles Ready is central to the flow, the complete Apply-related code is reproduced here with explanatory comments added:

func (r *raftNode) start(rh *raftReadyHandler) {
	internalTimeout := time.Second

	go func() {
		defer r.onStop()
		islead := false

		for {
			select {
			case rd := <-r.Ready():
				// Entries are applied to the database mainly in applyAll.
				// notifyc lets this goroutine coordinate progress with the applyAll goroutine.
				notifyc := make(chan struct{}, 1)
				ap := apply{
					entries:  rd.CommittedEntries,
					snapshot: rd.Snapshot,
					notifyc:  notifyc,
				}

				// Send the apply task to the applyAll goroutine.
				select {
				case r.applyc <- ap:
				case <-r.stopped:
					return
				}

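				// The next two steps persist data:
				// (1) If the Ready contains a snapshot, persist it and then write the snapshot metadata to the WAL.
				//     This is done in func (st *storage) SaveSnap(snap raftpb.Snapshot) error, step 3 of [Figure: apply flow].
				// (2) Write the HardState and the new log entries to the WAL, then call sync so the data reaches
				//     disk immediately, steps 4 and 5 of [Figure: apply flow].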

				// Must save the snapshot file and WAL snapshot entry before saving any other entries or hardstate to
				// ensure that recovery after a snapshot restore is possible.
				if !raft.IsEmptySnap(rd.Snapshot) {
					if err := r.storage.SaveSnap(rd.Snapshot); err != nil {
						if r.lg != nil {
							r.lg.Fatal("failed to save Raft snapshot", zap.Error(err))
						} else {
							plog.Fatalf("failed to save Raft snapshot %v", err)
						}
					}
				}

				if err := r.storage.Save(rd.HardState, rd.Entries); err != nil {
					if r.lg != nil {
						r.lg.Fatal("failed to save Raft hard state and entries", zap.Error(err))
					} else {
						plog.Fatalf("failed to save state and entries error: %v", err)
					}
				}
				if !raft.IsEmptyHardState(rd.HardState) {
					proposalsCommitted.Set(float64(rd.HardState.Commit))
				}

				if !raft.IsEmptySnap(rd.Snapshot) {
					// Force WAL to fsync its hard state before Release() releases
					// old data from the WAL. Otherwise could get an error like:
					// panic: tocommit(107) is out of range [lastIndex(84)]. Was the raft log corrupted, truncated, or lost?
					// See https://github.com/etcd-io/etcd/issues/10219 for more details.
					if err := r.storage.Sync(); err != nil {
						if r.lg != nil {
							r.lg.Fatal("failed to sync Raft snapshot", zap.Error(err))
						} else {
							plog.Fatalf("failed to sync Raft snapshot %v", err)
						}
					}

					// If a snapshot is present, applyAll on the etcdserver side runs applySnapshot first.
					// applySnapshot waits on notifyc and starts applying only after this signal, i.e. once the snapshot is on disk.
					// etcdserver now claim the snapshot has been persisted onto the disk
					notifyc <- struct{}{}

					// Store the snapshot in raftStorage, step 8.2 of [Figure: apply flow].
					r.raftStorage.ApplySnapshot(rd.Snapshot)
					if r.lg != nil {
						r.lg.Info("applied incoming Raft snapshot", zap.Uint64("snapshot-index", rd.Snapshot.Metadata.Index))
					} else {
						plog.Infof("raft applied incoming snapshot at index %d", rd.Snapshot.Metadata.Index)
					}

					// Release resources older than this snapshot that are no longer needed.
					if err := r.storage.Release(rd.Snapshot); err != nil {
						if r.lg != nil {
							r.lg.Fatal("failed to release Raft wal", zap.Error(err))
						} else {
							plog.Fatalf("failed to release Raft wal %v", err)
						}
					}
				}

				// Append the log entries to raftStorage, step 8.1 of [Figure: apply flow].
				r.raftStorage.Append(rd.Entries)

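				// Coordinate with applyAll:
				// (1) A Follower or Candidate notifies applyAll that the raft log, HardState, and snapshot are all on disk.
				//     If the committed entries include an EntryConfChange, it also waits for the apply to finish before
				//     sending messages. Otherwise messages that affect cluster stability, such as campaign messages,
				//     could reach a node that is about to be removed. Once the removal has been applied, the peer is
				//     dropped from the transport, so no further messages are sent to such a node.
				// (2) The Leader only needs to notify applyAll that the raft log, HardState, and snapshot are all on disk.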
				if !islead {
					// finish processing incoming messages before we signal raftdone chan
					msgs := r.processMessages(rd.Messages)

					// now unblocks 'applyAll' that waits on Raft log disk writes before triggering snapshots
					notifyc <- struct{}{}

					// Candidate or follower needs to wait for all pending configuration
					// changes to be applied before sending messages.
					// Otherwise we might incorrectly count votes (e.g. votes from removed members).
					// Also slow machine's follower raft-layer could proceed to become the leader
					// on its own single-node cluster, before apply-layer applies the config change.
					// We simply wait for ALL pending entries to be applied for now.
					// We might improve this later on if it causes unnecessary long blocking issues.
					waitApply := false
					for _, ent := range rd.CommittedEntries {
						if ent.Type == raftpb.EntryConfChange {
							waitApply = true
							break
						}
					}
					if waitApply {
						// blocks until 'applyAll' calls 'applyWait.Trigger'
						// to be in sync with scheduled config-change job
						// (assume notifyc has cap of 1)
						select {
						case notifyc <- struct{}{}:
						case <-r.stopped:
							return
						}
					}

					r.transport.Send(msgs)
				} else {
					// leader already processed 'MsgSnap' and signaled
					notifyc <- struct{}{}
				}

				// Tell the raft state machine that this Ready has been fully processed.
				r.Advance()
			case <-r.stopped:
				return
			}
		}
	}()
}
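
The interplay between the Ready-processing goroutine and the apply goroutine can be reduced to one rule: applying may run in parallel with the disk write, but anything that depends on the write (such as triggering a snapshot) must wait for the notify signal. Below is a minimal sketch of that rule. All names (`applyTask`, `eventLog`, `runOnce`) are hypothetical, and the actual disk write and apply work are replaced by log events.

```go
package main

import (
	"fmt"
	"sync"
)

// applyTask is a hypothetical stand-in for the apply struct sent on applyc.
type applyTask struct {
	entries []int
	notifyc chan struct{}
}

// eventLog records the order in which the two goroutines reach key points.
type eventLog struct {
	mu     sync.Mutex
	events []string
}

func (l *eventLog) add(e string) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.events = append(l.events, e)
}

// index returns the position of event e, or -1 if it never happened.
func (l *eventLog) index(e string) int {
	l.mu.Lock()
	defer l.mu.Unlock()
	for i, v := range l.events {
		if v == e {
			return i
		}
	}
	return -1
}

// runOnce processes a single batch the way the two goroutines cooperate.
func runOnce(entries []int) *eventLog {
	log := &eventLog{}
	applyc := make(chan applyTask)
	var wg sync.WaitGroup
	wg.Add(1)

	// Apply goroutine: applies right away, but waits for the disk write
	// before doing anything that depends on persisted raft state.
	go func() {
		defer wg.Done()
		t := <-applyc
		for range t.entries {
			// applying each entry to the store is omitted in this sketch
		}
		log.add("applied")
		<-t.notifyc
		log.add("snapshot-check")
	}()

	// Ready-processing goroutine: hand off the task first, then persist.
	t := applyTask{entries: entries, notifyc: make(chan struct{}, 1)}
	applyc <- t
	log.add("wal-saved") // stands in for the WAL write and fsync
	t.notifyc <- struct{}{}

	wg.Wait()
	return log
}

func main() {
	log := runOnce([]int{1, 2, 3})
	fmt.Println(log.index("wal-saved") < log.index("snapshot-check"))
}
```

The relative order of "applied" and "wal-saved" is deliberately left open, since the two run concurrently; only the snapshot check is ordered after both. The buffer of size 1 on notifyc lets the Ready-processing side signal without blocking.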

Applying log entries

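The goroutine that processes Ready works together with etcdserver's applyAll flow, and the two coordinate their progress through the notify channel. The applyAll flow is analyzed below, with the explanation placed in the code comments: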

func (s *EtcdServer) applyAll(ep *etcdProgress, apply *apply) {
	// If there is a snapshot, restore the store from it, step 7.1 of [Figure: apply flow].
	s.applySnapshot(ep, apply)
	// If there are entries, apply them to the store, step 7.2 of [Figure: apply flow].
	s.applyEntries(ep, apply)

	proposalsApplied.Set(float64(ep.appliedi))
	s.applyWait.Trigger(ep.appliedi)

	// Wait for the Ready-processing goroutine to persist the snapshot, HardState, and raft log to disk.
	// wait for the raft routine to finish the disk writes before triggering a
	// snapshot. or applied index might be greater than the last index in raft
	// storage, since the raft routine might be slower than apply routine.
	<-apply.notifyc

	// New entries have been applied, so check whether the conditions for taking a new snapshot are met,
	// and create one if they are.
	s.triggerSnapshot(ep)
	// other code ...
}

func (s *EtcdServer) applySnapshot(ep *etcdProgress, apply *apply) {
	if raft.IsEmptySnap(apply.snapshot) {
		return
	}
	
	lg := s.getLogger()

	// Check that the snapshot being applied is valid: it must be newer than the applied index.
	if apply.snapshot.Metadata.Index <= ep.appliedi {
		if lg != nil {
			lg.Panic(
				"unexpected leader snapshot from outdated index",
				zap.Uint64("current-snapshot-index", ep.snapi),
				zap.Uint64("current-applied-index", ep.appliedi),
				zap.Uint64("incoming-leader-snapshot-index", apply.snapshot.Metadata.Index),
				zap.Uint64("incoming-leader-snapshot-term", apply.snapshot.Metadata.Term),
			)
		} else {
			plog.Panicf("snapshot index [%d] should > appliedi[%d] + 1",
				apply.snapshot.Metadata.Index, ep.appliedi)
		}
	}

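	// Wait until the Ready-processing goroutine has persisted the snapshot to disk. The snapshot is persisted
	// through the snapshotter, and openSnapshotBackend below also relies on the snapshotter, so the snapshot
	// must already be saved there before openSnapshotBackend is called.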
	// wait for raftNode to persist snapshot onto the disk
	<-apply.notifyc

	newbe, err := openSnapshotBackend(s.Cfg, s.snapshotter, apply.snapshot)
	if err != nil {
		if lg != nil {
			lg.Panic("failed to open snapshot backend", zap.Error(err))
		} else {
			plog.Panic(err)
		}
	}

	// (1) Restore lessor state from newbe (backend.Backend).

	// (2) Restore the index and data of the current store from newbe (backend.Backend).
	if err := s.kv.Restore(newbe); err != nil {
		if lg != nil {
			lg.Panic("failed to restore mvcc store", zap.Error(err))
		} else {
			plog.Panicf("restore KV error: %v", err)
		}
	}

	// Set the consistent index to the index of the last log entry recorded in the restored ConsistentWatchableKV.
	s.consistIndex.setConsistentIndex(s.kv.ConsistentIndex())
	if lg != nil {
		lg.Info("restored mvcc store")
	} else {
		plog.Info("finished restoring mvcc store")
	}

	// Use the new backend to restore the rest of etcd's runtime state, e.g. the auth store, cluster membership, and the transport.

	ep.appliedt = apply.snapshot.Metadata.Term
	ep.appliedi = apply.snapshot.Metadata.Index
	ep.snapi = ep.appliedi
	ep.confState = apply.snapshot.Metadata.ConfState
}

func (s *EtcdServer) applyEntries(ep *etcdProgress, apply *apply) {
	if len(apply.entries) == 0 {
		return
	}

	// Check that the entries to apply are valid: there must be no gap after the applied index.
	firsti := apply.entries[0].Index
	if firsti > ep.appliedi+1 {
		if lg := s.getLogger(); lg != nil {
			lg.Panic(
				"unexpected committed entry index",
				zap.Uint64("current-applied-index", ep.appliedi),
				zap.Uint64("first-committed-entry-index", firsti),
			)
		} else {
			plog.Panicf("first index of committed entry[%d] should <= appliedi[%d] + 1", firsti, ep.appliedi)
		}
	}

	// Drop the entries that have already been applied.
	var ents []raftpb.Entry
	if ep.appliedi+1-firsti < uint64(len(apply.entries)) {
		ents = apply.entries[ep.appliedi+1-firsti:]
	}
	if len(ents) == 0 {
		return
	}

	// Call apply to perform the actual application of the log entries.
	var shouldstop bool
	if ep.appliedt, ep.appliedi, shouldstop = s.apply(ents, &ep.confState); shouldstop {
		go s.stopWithDelay(10*100*time.Millisecond, fmt.Errorf("the member has been permanently removed from the cluster"))
	}
}

// apply iterates over the entries and applies each one according to its type.
func (s *EtcdServer) apply(
	es []raftpb.Entry,
	confState *raftpb.ConfState,
) (appliedt uint64, appliedi uint64, shouldStop bool) {
	for i := range es {
		e := es[i]
		switch e.Type {
		case raftpb.EntryNormal:
			s.applyEntryNormal(&e)
			s.setAppliedIndex(e.Index)
			s.setTerm(e.Term)

		case raftpb.EntryConfChange:
			// set the consistent index of current executing entry
			if e.Index > s.consistIndex.ConsistentIndex() {
				s.consistIndex.setConsistentIndex(e.Index)
			}
			var cc raftpb.ConfChange
			pbutil.MustUnmarshal(&cc, e.Data)
			removedSelf, err := s.applyConfChange(cc, confState)
			s.setAppliedIndex(e.Index)
			s.setTerm(e.Term)
			shouldStop = shouldStop || removedSelf
			s.w.Trigger(cc.ID, &confChangeResponse{s.cluster.Members(), err})

		default:
			if lg := s.getLogger(); lg != nil {
				lg.Panic(
					"unknown entry type; must be either EntryNormal or EntryConfChange",
					zap.String("type", e.Type.String()),
				)
			} else {
				plog.Panicf("entry type should be either EntryNormal or EntryConfChange")
			}
		}
		appliedi, appliedt = e.Index, e.Term
	}
	return appliedt, appliedi, shouldStop
}
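
The index arithmetic in applyEntries is easy to misread, so here is a small standalone sketch of the same trimming step. It assumes entries arrive with consecutive indexes; the `Entry` type and `pendingEntries` function are illustrative names, and a gap is reported as an error here instead of a panic.

```go
package main

import "fmt"

// Entry is a local stand-in for raftpb.Entry with only the fields used here.
type Entry struct{ Index, Term uint64 }

// pendingEntries returns the suffix of committed that has not been applied
// yet. It fails if there is a gap between applied and the first entry.
func pendingEntries(committed []Entry, applied uint64) ([]Entry, error) {
	if len(committed) == 0 {
		return nil, nil
	}
	first := committed[0].Index
	if first > applied+1 {
		return nil, fmt.Errorf("gap: first committed index %d > applied index %d + 1", first, applied)
	}
	skip := applied + 1 - first // number of leading entries already applied
	if skip >= uint64(len(committed)) {
		return nil, nil // the whole batch was applied before
	}
	return committed[skip:], nil
}

func main() {
	batch := []Entry{{5, 2}, {6, 2}, {7, 2}, {8, 2}}

	ents, err := pendingEntries(batch, 6) // 5 and 6 were applied earlier
	fmt.Println(ents, err)

	ents, err = pendingEntries(batch, 8) // nothing left to apply
	fmt.Println(len(ents), err)

	_, err = pendingEntries(batch, 3) // entry 4 is missing
	fmt.Println(err != nil)
}
```

Overlap with the applied prefix is tolerated and silently skipped, while a gap is treated as fatal. This asymmetry is what lets a restarted node receive entries it has already applied without harm.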

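How an entry is actually applied depends on its type. There are two main types: read-write transaction entries and raft configuration-change entries. Configuration changes are not covered in this section; the rest of it describes the apply flow for normal log entries.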

Applying read-write transaction entries

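A normal transaction entry mostly carries key-value reads and writes, so applying it mainly involves the mvcc store / key-value store component. The key points are explained in the comments: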

func (s *EtcdServer) applyEntryNormal(e *raftpb.Entry) {
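	// etcd v3 has two generations of storage components:
	// (1) the v2 store, a simple key-value store; applying a log entry twice does not break its consistency
	// (2) the v3 store, a transactional mvcc store; applying a log entry twice does break its consistency

	// As described in the mvcc article, constructing the mvcc object takes a ConsistentIndexGetter.
	// etcdserver passes s.consistIndex as that argument when it initializes mvcc, so the index set by
	// s.consistIndex.setConsistentIndex(e.Index) is what the End function of an mvcc transaction reads
	// as the index of the entry the current transaction belongs to.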
	shouldApplyV3 := false
	if e.Index > s.consistIndex.ConsistentIndex() {
		// set the consistent index of current executing entry
		s.consistIndex.setConsistentIndex(e.Index)
		shouldApplyV3 = true
	}

	// other code ...

	// Store the data in the v2 store.
	var raftReq pb.InternalRaftRequest
	if !pbutil.MaybeUnmarshal(&raftReq, e.Data) { // backward compatible
		var r pb.Request
		rp := &r
		pbutil.MustUnmarshal(rp, e.Data)
		s.w.Trigger(r.ID, s.applyV2Request((*RequestV2)(rp)))
		return
	}
	if raftReq.V2 != nil {
		req := (*RequestV2)(raftReq.V2)
		s.w.Trigger(req.ID, s.applyV2Request(req))
		return
	}

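	// If the entry has not yet been applied to the v3 store, run its transaction through the v3 applier.
	// When the transaction ends, End reads the current entry index through the ConsistentIndexGetter
	// (set earlier by s.consistIndex.setConsistentIndex(e.Index)) and stores it together with the
	// transaction's changes, to be persisted to boltdb at a suitable time.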
	if !shouldApplyV3 {
		return
	}

	id := raftReq.ID
	if id == 0 {
		id = raftReq.Header.ID
	}

	var ar *applyResult
	needResult := s.w.IsRegistered(id)
	if needResult || !noSideEffect(&raftReq) {
		if !needResult && raftReq.Txn != nil {
			removeNeedlessRangeReqs(raftReq.Txn)
		}
		ar = s.applyV3.Apply(&raftReq)
	}

	// other code ...
}
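
The shouldApplyV3 gate only works because the entry index is stored together with the data it produced. The sketch below shows the idea on a toy store with a deliberately non-idempotent operation (a counter increment), where a double apply would be visible. The `entry` and `store` types and the `applyNormal` method are hypothetical and are not the mvcc API.

```go
package main

import "fmt"

// entry is a hypothetical log entry carrying a non-idempotent operation.
type entry struct {
	Index uint64
	Key   string
	Delta int
}

// store keeps the data and the index of the last applied entry side by side,
// standing in for a backend that writes both in one transaction.
type store struct {
	counters        map[string]int
	consistentIndex uint64
}

// applyNormal applies e unless the store has already seen its index.
// It reports whether the entry actually changed the store.
func (s *store) applyNormal(e entry) bool {
	if e.Index <= s.consistentIndex {
		return false // already applied: skip instead of incrementing twice
	}
	s.counters[e.Key] += e.Delta
	s.consistentIndex = e.Index // recorded together with the change
	return true
}

func main() {
	s := &store{counters: map[string]int{}}
	log := []entry{{1, "hits", 1}, {2, "hits", 1}, {3, "hits", 1}}

	for _, e := range log {
		s.applyNormal(e)
	}
	// Replaying the same entries, as can happen after a restart, is a no-op.
	for _, e := range log {
		s.applyNormal(e)
	}
	fmt.Println(s.counters["hits"], s.consistentIndex)
}
```

If the index were persisted separately from the data, a crash between the two writes could leave the store either believing an entry is applied when its data is missing, or applying it a second time.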

Recovering data on restart

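The sections above describe how normal log entries are applied; applying raft configuration-change entries was set aside because a later article covers it. The focus here is likewise on recovering normal log entries after a restart. When analyzing the restart flow, though, there is little need to distinguish normal entries from configuration-change entries, because the flow is largely the same for both. You can simply assume that no configuration change has ever happened during etcd's past operation.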

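From the description above, data is persisted in three places. In order of persistence they are the snapshot, the WAL, and the transaction data:

snapshot: a complete copy of the snapshot data is persisted.

WAL: persists log entries, the raft HardState, snapshot metadata, and so on.

Transaction data: see the put/range/del operations of the mvcc component described earlier; this persists user data together with information about the transaction's log entry, such as its index.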

etcdserver can restart in many scenarios. The focus here is a normal restart with an existing WAL, whose main flow is in the following file:

https://github.com/etcd-io/etcd/blob/v3.4.9/etcdserver/server.go#L410

The first step is to determine whether a usable snapshot exists and, if so, to build a snapshot object:

// First validate the WAL entries in the WAL files and extract the snapshot metadata from them.
walSnaps, serr := wal.ValidSnapshotEntries(cfg.Logger, cfg.WALDir())
if serr != nil {
	return nil, serr
}

// Use the snapshot metadata to find the newest usable snapshot and build the snapshot object.
snapshot, err = ss.LoadNewestAvailable(walSnaps)
if err != nil && err != snap.ErrNoSnapshot {
	return nil, err
}

// The WAL is written sequentially, so later entries are newer. The newest snapshot whose metadata matches a WAL snapshot entry is the newest available one.
func (s *Snapshotter) LoadNewestAvailable(walSnaps []walpb.Snapshot) (*raftpb.Snapshot, error) {
	return s.loadMatching(func(snapshot *raftpb.Snapshot) bool {
		m := snapshot.Metadata
		for i := len(walSnaps) - 1; i >= 0; i-- {
			if m.Term == walSnaps[i].Term && m.Index == walSnaps[i].Index {
				return true
			}
		}
		return false
	})
}
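
The matching logic matters in one specific crash window: a snapshot file has been written, but the process dies before the corresponding snapshot record reaches the WAL. The following sketch models that case with plain metadata values. `snapMeta` and `newestAvailable` are illustrative names, and sorting by index is an assumption standing in for however the real snapshotter orders its files.

```go
package main

import (
	"fmt"
	"sort"
)

// snapMeta is a hypothetical stand-in for the (Term, Index) pair that
// identifies a snapshot both on disk and in the WAL.
type snapMeta struct{ Term, Index uint64 }

// newestAvailable returns the newest on-disk snapshot that is also recorded
// in the WAL. Snapshots without a WAL record are treated as orphans.
func newestAvailable(onDisk, walSnaps []snapMeta) (snapMeta, bool) {
	sorted := append([]snapMeta(nil), onDisk...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].Index > sorted[j].Index })
	for _, s := range sorted {
		for i := len(walSnaps) - 1; i >= 0; i-- {
			if s == walSnaps[i] {
				return s, true
			}
		}
	}
	return snapMeta{}, false
}

func main() {
	// The snapshot at index 300 was written to disk, but the crash happened
	// before its record reached the WAL.
	onDisk := []snapMeta{{1, 100}, {2, 200}, {2, 300}}
	walSnaps := []snapMeta{{1, 100}, {2, 200}}

	s, ok := newestAvailable(onDisk, walSnaps)
	fmt.Println(s, ok)
}
```

Falling back to the snapshot at index 200 is safe because the WAL still holds the entries that follow it, so replay can start from a point the WAL knows about.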

With the snapshot object in hand, it can be used to build the storage objects:

// Use the snapshot to recover the v2 key-value store and the v3 backend.
if snapshot != nil {
	if err = st.Recovery(snapshot.Data); err != nil {
		if cfg.Logger != nil {
			cfg.Logger.Panic("failed to recover from snapshot")
		} else {
			plog.Panicf("recovered store from snapshot error: %v", err)
		}
	}
  
	// other code ...

	if be, err = recoverSnapshotBackend(cfg, be, *snapshot); err != nil {
		if cfg.Logger != nil {
			cfg.Logger.Panic("failed to recover v3 backend from snapshot", zap.Error(err))
		} else {
			plog.Panicf("recovering backend from snapshot error: %v", err)
		}
	}
	
	// other code ...
}

Once the snapshot has been restored, the raft node is started. The details are explained in the code below:

func restartNode(cfg ServerConfig, snapshot *raftpb.Snapshot) (types.ID, *membership.RaftCluster, raft.Node, *raft.MemoryStorage, *wal.WAL) {
	// Extract the metadata from the snapshot.
	var walsnap walpb.Snapshot
	if snapshot != nil {
		walsnap.Index, walsnap.Term = snapshot.Metadata.Index, snapshot.Metadata.Term
	}

	// Use the walsnap metadata to locate the right WAL files, read them, and return the node and cluster IDs, the HardState, the log entries, and so on.
	w, id, cid, st, ents := readWAL(cfg.Logger, cfg.WALDir(), walsnap)

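	// This walkthrough covers a normal restart of a raft node, so it ignores the case where the cluster is re-created after a restart.
	// Set the ID of the cluster this raft node belongs to.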
	cl := membership.NewCluster(cfg.Logger, "")
	cl.SetID(id, cid)

	// Store the snapshot in the storage.
	s := raft.NewMemoryStorage()
	if snapshot != nil {
		s.ApplySnapshot(*snapshot)
	}
	// Store the HardState and the log entries in the storage.
	s.SetHardState(st)
	s.Append(ents)

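	// Build the raft state machine. At this point raft can see:
	// (1) the log entries
	// (2) which entries have been committed
	// Note: by now nearly all data from before the restart has been loaded, so we can look at the two
	// inconsistencies a sudden crash might cause.
	//      (1) An entry is applied twice. raft does not yet know which entries have been applied. raft.Config has
	//      an Applied uint64 field for passing in the applied index, but it is left at 0 here, so when raft builds
	//      a Ready it returns every entry of the raft log in [max(applied+1, firstIndex), committed].
	//      Entries in Storage that were already applied may therefore be packed into a Ready again.
	//      They are not actually re-applied, though: they are filtered out at apply time, because the store
	//      exposes the index of the last applied log entry.
	//      (2) A crash happens before applied entries are persisted. As long as WAL files are cleaned up carefully,
	//      so that the entries kept in the WAL overlap or are contiguous with the entries already applied, a crash
	//      neither loses transaction entries nor causes an entry to be applied twice.
	//      WAL files are cleaned up only after a snapshot is created, every entry covered by a snapshot has been
	//      applied, and the cleanup is conservative: it keeps one extra WAL file.
	//      Even if applied entries are still in the boltdb tx batch buffer and etcd crashes before the next
	//      snapshot, those entries are already persisted in the WAL and recorded as committed.
	//      After the restart the raft state machine hands them out again as committed entries, so no log entry is lost.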
	c := &raft.Config{
		ID:              uint64(id),
		ElectionTick:    cfg.ElectionTicks,
		HeartbeatTick:   1,
		Storage:         s,
		MaxSizePerMsg:   maxSizePerMsg,
		MaxInflightMsgs: maxInflightMsgs,
		CheckQuorum:     true,
		PreVote:         cfg.PreVote,
	}
	
	// other code ...

	// Start the raft node.
	n := raft.RestartNode(c)
	raftStatusMu.Lock()
	raftStatus = n.Status
	raftStatusMu.Unlock()
	return id, cl, n, s, w
}

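By now nearly all data from before the restart has been loaded, so we can analyze the two kinds of inconsistency that an etcd crash might cause:
(1) An entry is applied twice. When the raft object is built, the Applied field of raft.Config is not used, so raft does not know which entries have already been applied. When it builds a Ready, it returns every entry of the raft log in [max(applied+1, firstIndex), committed] to be applied. If Storage holds entries that were already applied, they may be packed into a Ready again. They are not actually re-applied, however: they are filtered out at apply time, because the store exposes the index of the last applied log entry.
(2) A crash happens before applied entries are persisted. As long as WAL files are cleaned up carefully, so that the entries kept in the WAL overlap or are contiguous with the entries already applied, a crash neither loses transaction entries nor causes an entry to be applied twice. etcd's snapshot and WAL mechanisms do guarantee this. Old WAL files are cleaned up only after a snapshot has been created, and only entries already covered by the snapshot are removed. Every entry covered by a snapshot has been applied, and the cleanup is conservative: it keeps one extra WAL file. Even if applied entries are still in the boltdb tx batch buffer, not yet persisted, and etcd crashes before the next snapshot is created, those entries are already persisted in the WAL, and so is the commit index. After the restart the raft state machine hands them out again as committed entries, so no log entry is lost.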
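
Both scenarios can be exercised together in a small simulation: a backend that buffers writes in memory, a crash that discards the unflushed part, and a restart that replays every committed entry from the WAL. Everything here (`entry`, `state`, `backend`, and its methods) is a hypothetical model built for this sketch; it only assumes, as described above, that the applied index is persisted together with the data.

```go
package main

import "fmt"

// entry is a hypothetical committed log entry read back from the WAL.
type entry struct {
	Index uint64
	Key   string
	Delta int
}

// state is the data plus the index of the last entry reflected in it.
type state struct {
	data            map[string]int
	consistentIndex uint64
}

func (s state) clone() state {
	c := state{data: make(map[string]int, len(s.data)), consistentIndex: s.consistentIndex}
	for k, v := range s.data {
		c.data[k] = v
	}
	return c
}

// backend models a store whose writes sit in memory until commit is called.
type backend struct{ mem, disk state }

func newBackend() *backend {
	return &backend{mem: state{data: map[string]int{}}, disk: state{data: map[string]int{}}}
}

// apply skips entries the in-memory state already reflects.
func (b *backend) apply(e entry) {
	if e.Index <= b.mem.consistentIndex {
		return
	}
	b.mem.data[e.Key] += e.Delta
	b.mem.consistentIndex = e.Index
}

func (b *backend) commit() { b.disk = b.mem.clone() } // flush data and index together
func (b *backend) crash()  { b.mem = b.disk.clone() } // unflushed changes are lost

func main() {
	wal := []entry{{1, "a", 1}, {2, "a", 1}, {3, "b", 5}, {4, "a", 1}, {5, "b", 5}}

	b := newBackend()
	for _, e := range wal[:3] {
		b.apply(e)
	}
	b.commit() // entries 1..3 are durable in the backend
	for _, e := range wal[3:] {
		b.apply(e) // entries 4..5 are applied but only buffered
	}
	b.crash()

	// Restart: every committed entry is delivered again from the WAL.
	for _, e := range wal {
		b.apply(e)
	}
	fmt.Println(b.mem.data["a"], b.mem.data["b"], b.mem.consistentIndex)
}
```

Entries 1 to 3 are skipped on replay (scenario 1), and entries 4 and 5 are applied again because their effects never reached disk (scenario 2), so the final state matches an uninterrupted run.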