Simulating the Chandy-Lamport Snapshot Algorithm with MPI

This is day 16 of my participation in the Juejin "Daily New Plan · December Writing Challenge".

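In a stream processing system, a distributed snapshot algorithm is used to determine a global snapshot. When an error occurs, every node recovers from the most recent global snapshot. This makes snapshots very useful for failure recovery; they are primarily a fault-tolerance mechanism.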

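A distributed system can be modeled simply as a set of processes plus the channels between them. It can be viewed as a directed graph: the processes are the nodes and the channels are the directed edges between them, so each node's incoming and outgoing edges are its input channels and output channels. The global state of a distributed system therefore consists of the state of every process together with the messages still in transit in the channels, and this is exactly what a distributed snapshot has to record.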

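Algorithm assumptions:

-   Channels are unbounded FIFO queues.
-   Messages on a channel are received in the order they were sent, exactly once (no loss, no duplication).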

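Algorithm goal: each process records a local snapshot, consisting of its own local state and the ordered messages recorded on its input channels. The global snapshot is then obtained by merging the local snapshots of all processes.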

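Overall flow:

-   Initiating a snapshot: any process in the system can initiate the snapshot.
-   Propagating a snapshot: the other processes in the system take their own local snapshots one after another.
-   Terminating a snapshot: the condition under which the algorithm ends.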

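Details: Initiating a snapshot

1. Suppose Pi initiates the snapshot. It records its own process state and creates a marker, which must be distinguishable from ordinary channel messages.
    
2. It sends the marker on each of its output channels, before sending any further message on that channel.
    
3. It starts recording the messages that arrive on each of its input channels.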

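Propagating a snapshot: when process Pj receives a marker on input channel Ckj:

If Pj has not yet recorded its own process state:
    
-   Pj records its process state and records the state of channel Ckj as empty;
    
-   Pj sends a marker on each of its output channels;

-   Pj starts recording the messages arriving on each of its other input channels.

Otherwise:

-   Pj records the state of Ckj as the sequence of messages it received on Ckj before this marker, that is, the messages received on Ckj since Pj recorded its own state.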

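The marker therefore acts as a separator: on each channel it divides the messages the sender sent before taking its local snapshot (recording its process state) from those it sent afterwards. For example, suppose that after Pj has taken its local snapshot, the messages arriving on Ckj are [a,b,c,marker,x,y,z]. Then a, b, and c were sent by Pk before Pk took its local snapshot, and Pj must record them (for example, in a log) as the state of Ckj. The messages after the marker are simply processed as usual.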

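Terminating a snapshot: the algorithm terminates once every process has received a marker on each of its input channels and has recorded both its own state and the state of each input channel (the messages that were in transit on it).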

Below, MPI is used to simulate the Chandy-Lamport snapshot algorithm (for illustrated walkthroughs of the algorithm, see the blog posts blog.csdn.net/justlpf/art… and developer.aliyun.com/article/688…).

Task requirements:

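Implement the global-state snapshot algorithm and use it to monitor the following program: two processes P and Q are connected into a ring by two channels and continually pass a message m around. At any moment there is exactly one copy of m in the system. The state of each process is the number of times it has received m. P sends m first. At some point, P receives the message and its state is 101. After sending m, P starts the snapshot algorithm. Record the global state reported by the snapshot algorithm.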

Implementation:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mpi.h>


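typedef struct{
    char msg[50];   // payload text, e.g. "HELLO"
    int marker;     // 1 if a snapshot marker is piggybacked on this message
    int count;      // logical clock: how many times m has been passed so far
} Message;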



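int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int world_rank, world_size;

    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    // The turn-taking below (count % 2 == rank) only works for exactly two processes: P = rank 0, Q = rank 1
    if (world_size != 2) {
        if (world_rank == 0) fprintf(stderr, "This program must be run with exactly 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    int limit = 110;                 // stop once m has been passed 110 times
    int local_count = 0;             // process state: how many times this process has received m
    int is_snap = 0;                 // 1 while received messages are logged as channel state
    Message message;
    message.count  = 0;
    message.marker = 0;
    int act = world_size;            // snapshot events still pending: recording own state + marker on the input channel
    Message *msg_recv = (Message*)malloc(sizeof(Message)*10);   // assume each process keeps at most 10 messages locally for snapshot recovery
    if (msg_recv == NULL) MPI_Abort(MPI_COMM_WORLD, 1);
    int msg_recv_count_after_marker = 0;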
    
    while (message.count < limit) {
        if(message.count % 2 == world_rank) {
            // This process holds m and it is its turn to send
            message.count++;
            if(message.count == 103 || message.marker == 1) {
                // Either P starts a local snapshot on its own (at clock 103) or a marker has arrived;
                // in both cases the process records its state and then sends a marker.
                // Because m circulates in a ring, a process can only take its snapshot when it is its turn to send.
                // Messages can still be sent afterwards; here the marker is piggybacked on m.
                act --;
                is_snap = 1;
                // Report the local state; messages received from now on are logged
                printf("Process %d has local status %d\n", world_rank, local_count);

                printf("%d Begin Snapshot!!! at clock %d\n", world_rank, message.count);
                // Attach the marker to the message (0 means "no marker")
                message.marker = 1;
                strcpy(message.msg, "HELLO with MARKER!!!");

                // Send m together with the marker
                MPI_Send(&message, sizeof(message), MPI_BYTE, (world_rank + 1) % world_size, 0, MPI_COMM_WORLD);
                printf("Process %d send %s to process %d at clock %d\n", world_rank, message.msg, (world_rank + 1) % world_size, message.count);
                // Destroy the local copy so that only one copy of m exists
                memset(message.msg, '$', sizeof(message.msg));
                printf("Process %d destroyed the message at clock %d\n", world_rank, message.count);

                message.marker = 0;     // clear the marker flag again
            }

            else {
                strcpy(message.msg, "HELLO");
                // Ordinary message without a marker; normal traffic continues after a marker has been sent
                MPI_Send(&message, sizeof(message), MPI_BYTE, (world_rank + 1) % world_size, 0, MPI_COMM_WORLD);
                printf("Process %d sent %s to %d at clock %d\n", world_rank, message.msg, (world_rank + 1) % world_size, message.count);

                // Destroy the local copy so that only one copy of m exists
                memset(message.msg, '$', sizeof(message.msg));
                printf("Process %d destroyed the message at clock %d\n", world_rank, message.count);
            }

        }
        else {
            // Wait for m from the previous process in the ring
            MPI_Recv(&message, sizeof(message), MPI_BYTE, (world_rank + world_size - 1) % world_size, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

            printf("Process %d received %s from process %d at clock %d\n", world_rank, message.msg, (world_rank + world_size - 1) % world_size, message.count);

            if(message.marker == 1) act --;
            // After this process has taken its local snapshot, received messages are still processed normally,
            // but a copy is also logged as the recorded state of the input channel (at most 10 entries)
            if(is_snap == 1 && msg_recv_count_after_marker < 10) {
                memcpy(msg_recv + msg_recv_count_after_marker, &message, sizeof(Message));
                msg_recv_count_after_marker++;
            }
            if(act == 0) {
                // Own state recorded and the other side's marker received: report the recorded channel state
                printf("%d end snap shot at clock %d !!!\n", world_rank, message.count);
                printf("---------------P_%d log----------------\n", world_rank);
                for(int i=0; i<msg_recv_count_after_marker; i++)
                    printf("message %s with clock %d marker %d\n", msg_recv[i].msg, msg_recv[i].count, msg_recv[i].marker);
                printf("---------------P_%d log----------------\n", world_rank);
                message.marker = 0;     // clear the marker so the next message this process sends is an ordinary one
                is_snap = 0;
                act = world_size;
                msg_recv_count_after_marker = 0;
            }

            local_count++;
        }
    }
    free(msg_recv);
    MPI_Finalize();
    return 0;
}

Part of the output:

Process 0 destroyed the message at clock 101
Process 0 received HELLO from process 1 at clock 102
Process 0 has local status 51
0 Begin Snapshot!!! at clock 103
Process 0 send HELLO with MARKER!!! to process 1 at clock 103
Process 1 received HELLO with MARKER!!! from process 0 at clock 103
Process 1 has local status 52
1 Begin Snapshot!!! at clock 104
Process 1 send HELLO with MARKER!!! to process 0 at clock 104
Process 1 destroyed the message at clock 104
Process 0 destroyed the message at clock 103
Process 0 received HELLO with MARKER!!! from process 1 at clock 104
0 end snap shot at clock 104 !!!
---------------P_0 log----------------
message HELLO with MARKER!!! with clock 104 marker 1
---------------P_0 log----------------
Process 0 sent HELLO to 1 at clock 105
Process 1 received HELLO from process 0 at clock 105
Process 0 destroyed the message at clock 105
1 end snap shot at clock 105 !!!
---------------P_1 log----------------
message HELLO with clock 105 marker 0
---------------P_1 log----------------
Process 1 sent HELLO to 0 at clock 106
Process 1 destroyed the message at clock 106