Parallelism System (5)

VMware FT

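What is a traditional virtual machine?

On a single physical computer (the host machine), a special software layer called the hypervisor, or virtual machine monitor (VMM), creates and runs multiple virtual computer environments (guest machines, or virtual machines, VMs). Each VM is presented with virtual hardware:
1. Virtual CPU (vCPU)
2. Virtual memory
3. Virtual disk
4. Virtual network interface card
5. Virtual BIOS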

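**The core principle of virtual machines:** the hypervisor

The hypervisor is the key to virtualization. It sits between the physical hardware and the virtual machines and has three main responsibilities:

Hardware virtualization: it emulates virtual hardware devices so that each guest operating system believes it has exclusive use of the hardware.
Resource management and scheduling: it allocates the physical machine's CPU time, memory, disk I/O, and network bandwidth among the virtual machines and schedules them so that they run in parallel without interfering with one another.
Instruction interception and conversion: when a guest operating system executes a sensitive instruction (such as one that accesses hardware directly), the hypervisor intercepts it and handles or translates it, preserving the isolation and security of the virtual machines, and finally maps the operation onto the physical hardware.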
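The interception step can be illustrated with a toy model. The sketch below is only an assumption-laden illustration, not how any real hypervisor is implemented: the `VirtualMachine` and `Hypervisor` classes, the instruction names (`add`, `disk_write`, `disk_read`), and the dictionary-backed virtual disk are all hypothetical. Ordinary instructions run directly, while sensitive ones are routed to the hypervisor, which applies them to that VM's private virtual device.

```python
from typing import Any, Dict, Tuple


class VirtualMachine:
    """Per-VM state: a toy register plus a private virtual disk."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.acc = 0                              # toy register
        self.virtual_disk: Dict[int, str] = {}    # block number -> data


class Hypervisor:
    """Runs ordinary instructions directly and emulates sensitive ones."""

    SENSITIVE = {"disk_write", "disk_read"}

    def __init__(self) -> None:
        self.traps = 0

    def execute(self, vm: VirtualMachine, instr: Tuple[Any, ...]) -> Any:
        op, *args = instr
        if op in self.SENSITIVE:
            self.traps += 1          # the instruction is intercepted
            return self._emulate(vm, op, args)
        if op == "add":              # harmless instruction, no interception
            vm.acc += args[0]
            return vm.acc
        raise ValueError(f"unknown instruction: {op}")

    def _emulate(self, vm: VirtualMachine, op: str, args: list) -> Any:
        # The operation is applied to this VM's own virtual disk only.
        if op == "disk_write":
            block, data = args
            vm.virtual_disk[block] = data
            return None
        (block,) = args
        return vm.virtual_disk.get(block)


hv = Hypervisor()
vm_a, vm_b = VirtualMachine("a"), VirtualMachine("b")
hv.execute(vm_a, ("add", 5))
hv.execute(vm_a, ("disk_write", 0, "data from A"))
hv.execute(vm_b, ("disk_write", 0, "data from B"))
read_a = hv.execute(vm_a, ("disk_read", 0))
read_b = hv.execute(vm_b, ("disk_read", 0))
```

Both guests write to "block 0", yet each reads back its own data, which is the isolation property described above.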

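What is the fault-tolerance problem with traditional virtual machines?

When the underlying physical hardware (the host) fails, how can the applications running on it continue without interruption, or with minimal downtime?

Single point of failure
1. All virtual machines running on a physical server (the host) depend on that server operating normally.
2. If the host suffers a hardware failure (such as a CPU, memory, motherboard, or power failure, or a loss of network connectivity), all virtual machines running on it crash or stop.
Data loss and service interruption
1. When the host fails, data being processed in VM memory may be lost unless the application itself has a robust persistence mechanism.
2. Service stops immediately and stays down until the virtual machine is restarted on another healthy physical machine, either manually or by separate high-availability software.
No seamless failover
1. High-availability solutions for traditional virtual machines usually rely on restarting the VM.
2. A restart means the guest operating system and applications must boot from scratch, which inevitably causes downtime and may lose transient in-memory state.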

In summary, traditional virtual machines solve the problems of resource isolation and utilization efficiency, but they do not solve the problem of service interruption caused by physical hardware failure. Providing truly seamless fault tolerance requires additional, more complex mechanisms, such as VMware FT, discussed in this article.

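Core solutions and methods

Primary/backup virtual machine replication based on deterministic replay
1. The paper introduces the concepts of a primary virtual machine and a backup virtual machine.
  
  The primary VM handles all client requests and performs the actual work, while the backup VM runs on a separate physical server and stays synchronized with the primary's execution in real time.
2. Deterministic replay: this is the key technique behind seamless fault tolerance.
  
  All non-deterministic events on the primary virtual machine (such as interrupts, user input, hardware interactions, and timer reads) are precisely recorded and sent over a low-bandwidth logging channel, and the backup virtual machine replays the primary's execution exactly from these log entries.
How this addresses the problems of traditional virtual machines:
1. Eliminating the single point of failure: by running the primary and backup virtual machines simultaneously on different physical servers, VMware FT removes the reliance on a single physical server.
2. Achieving seamless failover and eliminating service disruption
  1. Because the primary and backup VMs are kept in sync at all times and the backup VM is already running, when the primary VM fails the hypervisor immediately and transparently redirects client traffic from the primary VM to the backup VM.
  2. Through careful design and optimization (for example, optimizing how non-deterministic events are captured and replayed), VMware FT typically keeps the performance overhead on real applications below 10%.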

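Deterministic Replay

Put simply, deterministic replay is a technique that records the execution trace of a computation and can later reproduce that trace exactly. For a virtual machine, this means that during its first run, all of its inputs and all non-deterministic events that could change its execution path are recorded in a log. By replaying this log, another virtual machine (or the same one) can execute exactly the same instruction stream, in exactly the same order and with exactly the same state, at a different time or on a different machine.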
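The record-then-replay idea can be shown with a deliberately tiny model. This is a sketch under stated assumptions, not VMware's implementation: the "guest" is a plain Python function, the only non-deterministic input is a random byte source, and the names `Recorder`, `Replayer`, and `run_guest` are hypothetical.

```python
import random
from typing import Callable, Iterable, List


class Recorder:
    """Primary side: reads the real input source and logs every value."""

    def __init__(self, source: Callable[[], int]) -> None:
        self.source = source
        self.log: List[int] = []

    def next_input(self) -> int:
        value = self.source()
        self.log.append(value)      # record the non-deterministic result
        return value


class Replayer:
    """Backup side: returns logged values instead of reading a real source."""

    def __init__(self, log: Iterable[int]) -> None:
        self._entries = iter(log)

    def next_input(self) -> int:
        return next(self._entries)


def run_guest(env, steps: int) -> int:
    """Deterministic computation whose result depends only on its inputs."""
    state = 0
    for _ in range(steps):
        state = (state * 31 + env.next_input()) % 1_000_003
    return state


primary = Recorder(lambda: random.randint(0, 255))
primary_state = run_guest(primary, 5)

# The backup sees exactly the same inputs in the same order.
backup_state = run_guest(Replayer(primary.log), 5)
```

Because `run_guest` is deterministic apart from its inputs, feeding it the logged values reproduces the primary's final state even though the original inputs were random.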

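Why is deterministic replay required?

The operations of the CPU and memory are highly deterministic: given the same initial state and the same inputs, they produce the same outputs. A running VM, however, is also subject to several sources of non-determinism:

External input: the arrival time of network packets, the completion time of disk I/O, keyboard and mouse input, and the results of host system calls (such as reading a timestamp). These inputs affect the state inside the VM.
Concurrency and timing:
1. Interrupts: hardware interrupts (such as timer interrupts and network I/O completion interrupts) can occur between any two instructions, and their exact timing is generally non-deterministic.
2. Multiprocessor/multicore contention: if the VM has multiple processors, the interleaving of instructions across virtual CPUs, the order of accesses to shared memory, and the outcome of lock contention can all introduce non-determinism.
Other non-deterministic instructions: some special CPU instructions (such as RDTSC, which reads the CPU's cycle counter) produce non-deterministic results by their nature.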
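To see why interrupt timing matters, consider a toy guest in which an interrupt handler changes state depending on when it runs. The sketch below assumes a simplified model in which the primary logs the instruction count at which each interrupt was delivered and the backup delivers it at the same count; the function names and the `(count, event)` log format are hypothetical.

```python
import random
from typing import List, Tuple

LogEntry = Tuple[int, str]  # (instruction count, event name)


def step(acc: int) -> int:
    """One deterministic guest instruction."""
    return acc + 1


def handle_interrupt(acc: int) -> int:
    """Deterministic handler whose effect depends on when it runs."""
    return acc * 2


def run_primary(n: int, rng: random.Random) -> Tuple[int, List[LogEntry]]:
    acc = 0
    log: List[LogEntry] = []
    for count in range(n):
        if rng.random() < 0.3:               # interrupt arrives unpredictably
            log.append((count, "timer"))     # record the exact delivery point
            acc = handle_interrupt(acc)
        acc = step(acc)
    return acc, log


def run_backup(n: int, log: List[LogEntry]) -> int:
    deliver_at = {count for count, _ in log}
    acc = 0
    for count in range(n):
        if count in deliver_at:              # deliver at the logged point
            acc = handle_interrupt(acc)
        acc = step(acc)
    return acc


primary_acc, interrupt_log = run_primary(20, random.Random())
backup_acc = run_backup(20, interrupt_log)

# The same interrupt delivered one instruction later gives a different state.
early = run_backup(3, [(1, "timer")])
late = run_backup(3, [(2, "timer")])
```

Replaying at the logged instruction counts makes `backup_acc` match `primary_acc`, while `early` and `late` differ, which is why the delivery point has to be logged along with the event itself.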

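Output Requirement and Output Rule

Output Requirement
1. Definition: "If the backup VM takes over after the primary VM fails, it must continue executing in a way that is entirely consistent with all outputs that the primary VM has already sent to the external world."
2. Purpose: to ensure that, on failover, no externally visible state that has already been committed is lost or becomes inconsistent. Whether a client is talking to the primary VM or, after failover, to the backup VM, it should see a consistent, coherent service state, as if no failure had occurred.
Output Rule
1. The primary virtual machine may not send an output to the external world until the backup virtual machine has received and acknowledged the log entry associated with the operation that produces that output.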
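The Output Rule can be sketched as a small hold-and-release buffer. This is an illustrative model only, assuming in-order sequence numbers and cumulative acknowledgments; the `Primary` and `Backup` classes, the `wire` list standing in for the external network, and the `flush_to` helper are all hypothetical.

```python
from collections import deque
from typing import Any, Deque, List, Tuple


class Backup:
    def __init__(self) -> None:
        self.received: List[Tuple[int, Any]] = []

    def receive(self, seq: int, entry: Any) -> int:
        self.received.append((seq, entry))
        return seq                       # acknowledgment for this entry


class Primary:
    def __init__(self) -> None:
        self.next_seq = 0
        self.acked = -1                  # highest acknowledged sequence number
        self.unsent: Deque[Tuple[int, Any]] = deque()  # entries not yet shipped
        self.held: Deque[Tuple[int, str]] = deque()    # outputs awaiting an ack
        self.wire: List[str] = []        # what the outside world has seen

    def log(self, entry: Any) -> int:
        seq = self.next_seq
        self.next_seq += 1
        self.unsent.append((seq, entry))
        return seq

    def output(self, packet: str) -> None:
        # Log the output operation, then hold the packet instead of sending it.
        seq = self.log(("output", packet))
        self.held.append((seq, packet))
        self._release()

    def on_ack(self, seq: int) -> None:
        self.acked = max(self.acked, seq)
        self._release()

    def _release(self) -> None:
        # Send only outputs whose log entry the backup has acknowledged.
        while self.held and self.held[0][0] <= self.acked:
            self.wire.append(self.held.popleft()[1])

    def flush_to(self, backup: Backup) -> None:
        while self.unsent:
            seq, entry = self.unsent.popleft()
            self.on_ack(backup.receive(seq, entry))


primary, backup = Primary(), Backup()
primary.log(("input", "request-1"))
primary.output("reply-1")
sent_before_ack = list(primary.wire)     # nothing has left the primary yet
primary.flush_to(backup)
sent_after_ack = list(primary.wire)      # the reply is released after the ack
```

In this sketch `output()` returns immediately, so only the outgoing packet is delayed rather than the primary's execution; if the primary failed before the acknowledgment, the reply would never have been visible to the client, and the backup could not contradict it.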