Algorithm

Leetcode 478: Generate Random Point in a Circle

medium.com/@dreamume/l…

Review

book.mixu.net/distsys/tim…

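Time and Order

Picture the traditional model: one program running on one CPU, with its logic executing sequentially. In a distributed system, order matters much more, because the program runs across multiple machines and nothing guarantees that things happen in the order you expect.

Total order and partial order

Each node in a distributed system executes its own program in a local order, so a distributed system as a whole is naturally partially ordered.

A partial order can be seen as a weaker total order: some pairs of events simply cannot be compared.

Time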

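Timestamps can be interpreted in three ways: as order, as duration, or as interpretation.

Order means:
- we can attach timestamps to unordered events to order them
- we can use timestamps to enforce an order of operations or message delivery (for example, by delaying an operation that arrives out of order)
- we can compare timestamps to determine which event happened first

Duration means making decisions based on the length of a time interval.

Interpretation means the actual time that a timestamp stands for.

The global-clock assumption is that there is a single, perfectly accurate clock.

Without it, each node has its own clock, and timestamps from different nodes cannot be meaningfully compared.

A global clock puts operations on different nodes in order; without one, extra communication between nodes is needed to establish order.

Time can also define boundary conditions for an algorithm, in particular to tell high latency apart from an unreachable server or network.

Vector clocks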

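If we cannot assume accurately synchronized clocks, we can fall back on Lamport clocks and vector clocks, which use counters and communication between nodes to determine the order of events.

Lamport clock:
- each process keeps a counter and increments it whenever it does work
- every message a process sends carries the counter value
- on receiving a message, the process sets its counter to max(local_counter, received_counter) + 1

Lamport clocks define a partial order. If timestamp(a) < timestamp(b):
- a may have happened before b
- or a may be incomparable with b

If a and b are causally related, meaning both were produced by the same process or b is a response to the message sent in a, then we know a happened before b.

Because a Lamport clock carries only a single timeline, comparing timestamps of events that are not causally related can make concurrent events look ordered when they are not, and this has to be avoided.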
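
The three Lamport rules above can be sketched in a few lines of C++. This is only a minimal sketch: the class name LamportClock and its methods tick(), send(), receive() and now() are made up for illustration, and it treats sending a message as an event that also increments the counter, which is a common convention rather than something the rules above spell out.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical minimal Lamport clock: one counter per process.
class LamportClock {
public:
  // Local event: increment the counter and return the new timestamp.
  std::uint64_t tick() { return ++counter_; }

  // Assumption: sending is itself an event. The returned value travels
  // with the message.
  std::uint64_t send() { return tick(); }

  // On receipt, move past both the local and the received counter.
  std::uint64_t receive(std::uint64_t received) {
    counter_ = std::max(counter_, received) + 1;
    return counter_;
  }

  std::uint64_t now() const { return counter_; }

private:
  std::uint64_t counter_ = 0;
};
```

With two clocks a and b, passing the result of a.send() to b.receive() always leaves b with a larger timestamp than the one in the message, which is the guarantee for causally related events. Nothing can be concluded from the timestamps of processes that never exchange messages.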

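A vector clock extends the Lamport clock: it maintains an array of N logical clocks, one per node.

Its update rules are:
- on every local event, a process increments its own node's entry in the vector
- when a process sends a message, it includes the full vector
- when a message is received:
  - update each entry to max(local, received)
  - increment the entry for the current node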
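
The update rules above, together with the usual way of comparing two vector timestamps, might look like the following. It is a sketch under assumptions: the number of nodes is fixed and known up front, each node is identified by an index, and the names VectorClock, Timestamp, happened_before and concurrent are invented for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Timestamp = std::vector<std::uint64_t>;

// Hypothetical vector clock for a fixed set of node_count nodes.
class VectorClock {
public:
  VectorClock(std::size_t node_count, std::size_t self)
      : clock_(node_count, 0), self_(self) {}

  // Local event: increment this node's own entry.
  void tick() { ++clock_[self_]; }

  // Sending counts as an event; the whole vector goes into the message.
  Timestamp send() { tick(); return clock_; }

  // Receiving: element-wise max, then increment this node's own entry.
  void receive(const Timestamp& received) {
    assert(received.size() == clock_.size());
    for (std::size_t i = 0; i < clock_.size(); ++i)
      clock_[i] = std::max(clock_[i], received[i]);
    tick();
  }

  const Timestamp& now() const { return clock_; }

private:
  Timestamp clock_;
  std::size_t self_;
};

// a happened before b iff no entry of a exceeds b's and at least one is smaller.
bool happened_before(const Timestamp& a, const Timestamp& b) {
  assert(a.size() == b.size());
  bool strictly_less = false;
  for (std::size_t i = 0; i < a.size(); ++i) {
    if (a[i] > b[i]) return false;
    if (a[i] < b[i]) strictly_less = true;
  }
  return strictly_less;
}

// Different timestamps where neither happened before the other are concurrent.
bool concurrent(const Timestamp& a, const Timestamp& b) {
  return a != b && !happened_before(a, b) && !happened_before(b, a);
}
```

Unlike a single Lamport counter, the vector lets concurrent() report that two events are unrelated instead of forcing an order on them.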

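Failure detection (time for cutoff)

The time spent waiting can be used to tell a partition from high latency: once a wait exceeds some threshold, the node is assumed to be partitioned. Failure detectors are built on exactly this.

A failure detector is implemented with heartbeat messages and timers. Processes exchange heartbeats; if the response to a message does not arrive within the timeout, the process is suspected of having failed.

Too short a timeout is aggressive; too long a timeout is conservative. So how should the value be chosen? Failure detectors are usually characterized by two properties: completeness and accuracy.
- Strong completeness
  
  Every crashed process is eventually suspected by every correct process
- Weak completeness
  
  Every crashed process is eventually suspected by some correct process
- Strong accuracy
  
  No correct process is ever suspected
- Weak accuracy
  
  Some correct process is never suspected

In practice completeness is easier to achieve than accuracy, because when a message is late it is hard to tell whether the cause is delay or failure.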
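
A timeout-based detector of the kind described above can be sketched as follows. The class HeartbeatDetector and its methods are hypothetical; timestamps are passed in explicitly so the logic can be exercised without real waiting, and a node that has never sent a heartbeat is treated as suspected, which is an assumption of this sketch.

```cpp
#include <cassert>
#include <chrono>
#include <string>
#include <unordered_map>

// Hypothetical heartbeat-based failure detector.
class HeartbeatDetector {
public:
  using Clock = std::chrono::steady_clock;

  explicit HeartbeatDetector(Clock::duration timeout) : timeout_(timeout) {}

  // Record that a heartbeat from `node` arrived at time `at`.
  void heartbeat(const std::string& node, Clock::time_point at) {
    last_seen_[node] = at;
  }

  // A node is suspected if it is unknown or its last heartbeat is older
  // than the timeout.
  bool suspected(const std::string& node, Clock::time_point now) const {
    auto it = last_seen_.find(node);
    return it == last_seen_.end() || now - it->second > timeout_;
  }

private:
  Clock::duration timeout_;
  std::unordered_map<std::string, Clock::time_point> last_seen_;
};
```

A smaller timeout makes suspected() flip to true sooner (more aggressive), while a larger one tolerates longer delays (more conservative). Either way, a slow node and a crashed node look the same until the next heartbeat arrives.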

Time, order, and performance

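Time and order have to be weighed case by case, always with an eye on performance:

A distributed system is naturally partially ordered. It can be turned into a totally ordered one, but at a price: everything has to run at the speed of the slowest common denominator, and the simplest way to do it is usually to route every event through a single node, which becomes a bottleneck.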
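
To make the single-node approach concrete, here is a small in-process stand-in. Sequencer is a hypothetical name, and the mutex plays the role of the one node that handles events one at a time; a real system would put a network hop in front of submit().

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <string>
#include <vector>

// Hypothetical sequencer: every event from every node goes through submit().
class Sequencer {
public:
  // Append the event and return its position in the total order.
  std::size_t submit(const std::string& event) {
    std::lock_guard<std::mutex> guard(mutex_);
    log_.push_back(event);
    return log_.size() - 1;
  }

  // Snapshot of all events in the agreed order.
  std::vector<std::string> ordered() const {
    std::lock_guard<std::mutex> guard(mutex_);
    return log_;
  }

private:
  mutable std::mutex mutex_;
  std::vector<std::string> log_;
};
```

Every caller sees the same sequence numbers, so the order is total, but all callers also contend on the same lock, which is the bottleneck mentioned above.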

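Where operations must always leave the system in a consistent state, use synchronization.

Where the result does not have to be strictly correct, weaker synchronization is enough.

Alternatively, accept a best-effort estimate: a result based on partial rather than complete information, for example while the network is partitioned.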

Tips

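If no good idea comes from working on the data directly, try starting from the range of values the data can take.
When appending a single character to a string, += looked much faster than push_back on LeetCode, but on macOS the two tested about the same; in the local macOS build, += calls push_back internally.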
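
One way to repeat that comparison locally is a small timing harness like the one below. The helper names are invented for this sketch, it measures a single wall-clock run (so the numbers are noisy), and the outcome depends on the standard library, compiler and optimization flags, so no particular result should be expected.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <string>

// Build a string of n characters with operator+=.
std::string build_with_plus_equals(std::size_t n) {
  std::string s;
  for (std::size_t i = 0; i < n; ++i) s += 'a';
  return s;
}

// Build the same string with push_back.
std::string build_with_push_back(std::size_t n) {
  std::string s;
  for (std::size_t i = 0; i < n; ++i) s.push_back('a');
  return s;
}

// Time one call of `build` in microseconds; returns -1 if the result
// has the wrong length.
template <typename Build>
long long time_us(Build build, std::size_t n) {
  auto start = std::chrono::steady_clock::now();
  std::string s = build(n);
  auto stop = std::chrono::steady_clock::now();
  if (s.size() != n) return -1;  // also keeps the result in use
  return std::chrono::duration_cast<std::chrono::microseconds>(stop - start)
      .count();
}
```

Calling time_us(build_with_plus_equals, n) and time_us(build_with_push_back, n) several times with a large n, and comparing the medians, gives a rough picture for a given toolchain.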

Share

Lock-free programming

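In multithreaded code, locks are the usual way to protect shared resources. But used incorrectly, locks cause deadlock, and locks that are too coarse-grained hurt performance.

Lock-free programming avoids these problems. Its advantages are:

- it enables the maximum degree of concurrency
- it is robust: the code cannot deadlock

Its drawbacks are:

- the code is complex
- livelock becomes possible
- because of that complexity, overall performance can drop even though the threads run with more parallelism

Lock-free example code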

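The examples below introduce lock-free programming by building a lock-free stack and improving it step by step, covering the important details along the way.

This is the most stripped-down version. push() creates a new node, points the node's next at head, and then makes head point at the new node. compare_exchange_weak() makes sure new_node->next still equals head at the moment head is switched to new_node. pop() is the mirror image: it reads head and uses compare_exchange_weak() to move head on to old_head->next.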

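    #include <atomic>
    template<typename T> class lock_free_stack {
      struct node { T data; node* next; };
      std::atomic<node*> head{nullptr};
    public:
      void push(T const& data) {
            node* const new_node = new node{data, head.load()};
            // If head changed, the CAS reloads it into new_node->next and we retry.
            while (!head.compare_exchange_weak(new_node->next, new_node));
      }
      void pop(T& result) {
            node* old_head = head.load();
            while (!head.compare_exchange_weak(old_head, old_head->next));
            result = old_head->data;
      }
    };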

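Improving the code

The code above is concise, but it has two problems:

- head must not be null: calling pop() on an empty stack dereferences a null pointer
- the assignment to result is not exception-safe: when T is not a pointer type, copying it can throw after the node has already been unlinked, and the value is lost

The fixes are (1) to check head for null and (2) to hold the data through a pointer type or shared_ptr.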

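#include <atomic>
#include <memory>

template<typename T> class lock_free_stack {
private:
  struct node {
        std::shared_ptr<T> data;  // allocated in push(), so pop() never copies T
        node* next;

        node(T const& data_): data(std::make_shared<T>(data_)), next(nullptr) {}
  };
  std::atomic<node *> head{nullptr};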
public:
  void push(T const& data) {
        node* const new_node = new node(data);
        new_node->next = head.load();
        while (!head.compare_exchange_weak(new_node->next, new_node));
  }

  std::shared_ptr<T> pop() {
        node* old_head = head.load();
        while (old_head && !head.compare_exchange_weak(old_head, old_head->next));
        return old_head ? old_head->data : std::shared_ptr<T>();
  }
};

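Managing memory

However, pop() in the example above leaks memory: the popped node is never deleted. A node may only be deleted once it is certain that no other thread is still accessing it.

So we add a counter that tracks how many threads are inside pop() at the same time, plus functions that handle the deferred deletion.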

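template <typename T> class lock_free_stack {
private:
  // node and head are declared as in the previous listing.
  std::atomic<unsigned> threads_in_pop{0};
public:
  std::shared_ptr<T> pop() {
        ++threads_in_pop;  // count this thread before it touches head
        node* old_head = head.load();
        while (old_head && !head.compare_exchange_weak(old_head, old_head->next));
        std::shared_ptr<T> res;
        if (old_head) {
          res.swap(old_head->data);  // take the data out before the node can be freed
          try_reclaim(old_head);     // also decrements threads_in_pop
        } else {
          --threads_in_pop;          // empty stack: no node to reclaim or chain
        }
        return res;
  }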

private:
  std::atomic<node *> to_be_deleted{nullptr};

  static void delete_nodes(node* nodes) {
        while (nodes) {
          node *next = nodes->next;
          delete nodes;
          nodes = next;
        }
  }

  void try_reclaim(node* old_head) {
        if (threads_in_pop == 1) {
          node* nodes_to_delete = to_be_deleted.exchange(nullptr);
          if (!--threads_in_pop) delete_nodes(nodes_to_delete);
          else if (nodes_to_delete) chain_pending_nodes(nodes_to_delete);

          delete old_head;
        } else {
          chain_pending_node(old_head);
          --threads_in_pop;
        }
  }

  void chain_pending_nodes(node* nodes) {
        node* last = nodes;
        while (node* const next = last->next) last = next;
        chain_pending_nodes(nodes, last);
  }

  void chain_pending_nodes(node* first, node* last) {
        last->next = to_be_deleted;
        while (!to_be_deleted.compare_exchange_weak(last->next, first));
  }

  void chain_pending_node(node* n) {
        chain_pending_nodes(n, n);
  }
};

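Refining the memory handling

The code above now deletes nodes, but it is still not good enough: if there are always several threads inside pop(), there is never a moment at which nodes can be deleted, and the to_be_deleted list keeps growing.

To fix this we introduce hazard pointers: before deleting a node, check whether any other thread still references it, and delete it only if none does.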

std::shared_ptr<T> pop() {
  std::atomic<void *>& hp = get_hazard_pointer_for_current_thread();
  node* old_head = head.load();
  do {
        node* temp;
        do {
          temp = old_head;
          hp.store(old_head);
          old_head = head.load();
        } while (old_head != temp);
  } while (old_head && !head.compare_exchange_strong(old_head, old_head->next));
  hp.store(nullptr);
  std::shared_ptr<T> res;
  if (old_head) {
        res.swap(old_head->data);
        if (outstanding_hazard_pointers_for(old_head)) reclaim_later(old_head);
        else delete old_head;

        delete_nodes_with_no_hazards();
  }

  return res;
}

unsigned const max_hazard_pointers = 100;
struct hazard_pointer {
  std::atomic<std::thread::id> id;
  std::atomic<void *> pointer;
};
hazard_pointer hazard_pointers[max_hazard_pointers];

class hp_owner {
  hazard_pointer *hp;
public:
  hp_owner(hp_owner const&)=delete;
  hp_owner operator=(hp_owner const&)=delete;

  hp_owner(): hp(nullptr) {
        for (unsigned i = 0; i < max_hazard_pointers; ++i) {
          std::thread::id old_id;
          if (hazard_pointers[i].id.compare_exchange_strong(old_id, std::this_thread::get_id())) {
                hp = &hazard_pointers[i];
                break;
          }
        }
        if (!hp) throw std::runtime_error("No hazard pointers available");
  }

  std::atomic<void *>& get_pointer() { return hp->pointer; }
  ~hp_owner() {
        hp->pointer.store(nullptr);
        hp->id.store(std::thread::id());
  }
};

std::atomic<void *>& get_hazard_pointer_for_current_thread() {
  thread_local static hp_owner hazard;
  return hazard.get_pointer();
}

bool outstanding_hazard_pointers_for(void* p) {
  for (unsigned i = 0; i < max_hazard_pointers; ++i)
        if (hazard_pointers[i].pointer.load() == p) return true;

  return false;
}

// A simple implementation of the reclaim functions
template<typename T> void do_delete(void *p) {
  delete static_cast<T *>(p);
}

struct data_to_reclaim {
  void *data;
  std::function<void(void *)> deleter;
  data_to_reclaim* next;

  template<typename T> data_to_reclaim(T* p): 
        data(p), deleter(&do_delete<T>), next(0) {}
  ~data_to_reclaim() { deleter(data); }
};
std::atomic<data_to_reclaim *> nodes_to_reclaim;

void add_to_reclaim_list(data_to_reclaim* node) {
  node->next = nodes_to_reclaim.load();
  while (!nodes_to_reclaim.compare_exchange_weak(node->next, node));
}

template<typename T> void reclaim_later(T* data) {
  add_to_reclaim_list(new data_to_reclaim(data));
}

void delete_nodes_with_no_hazards() {
  data_to_reclaim* current = nodes_to_reclaim.exchange(nullptr);
  while (current) {
        data_to_reclaim* const next = current->next;
        if (!outstanding_hazard_pointers_for(current->data)) delete current;
        else add_to_reclaim_list(current);

        current = next;
  }
}

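Note that the code above uses compare_exchange_strong() in two places. (compare_exchange_weak() can fail spuriously even when the stored value equals the expected one, which is why it is normally called in a while loop.)

The first use, in pop(), is there because a spurious false from compare_exchange_weak() would reset the hazard pointer (hp) for no reason. The second, in the hp_owner constructor, is a single attempt per slot with no retry, so a spurious failure would skip a slot that is actually free.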
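
The difference between the two calls is easier to see outside the stack. The two helpers below are hypothetical and only illustrate the usual rule of thumb: the weak form inside a retry loop, the strong form for a one-shot attempt. The convention that 0 marks a free slot is an assumption of this sketch.

```cpp
#include <atomic>
#include <cassert>

// Retry loop: a spurious failure of compare_exchange_weak only costs one
// more iteration, and `expected` is refreshed with the current value.
int atomic_multiply(std::atomic<int>& value, int factor) {
  int expected = value.load();
  while (!value.compare_exchange_weak(expected, expected * factor)) {
  }
  return expected * factor;
}

// Single attempt: no loop absorbs a spurious failure, so use
// compare_exchange_strong, which fails only if the slot is really taken.
bool try_claim(std::atomic<int>& slot, int owner_id) {
  assert(owner_id != 0);  // assumption: 0 means "unowned"
  int free_slot = 0;
  return slot.compare_exchange_strong(free_slot, owner_id);
}
```

try_claim() has the same shape as the slot search in the hp_owner constructor, while atomic_multiply() has the same shape as the loops in push().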

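Another approach to memory: reference counting

The hazard pointers are kept in a fixed-size array of max_hazard_pointers entries, so at most max_hazard_pointers threads can hold a hazard pointer at once; any further thread that enters pop() gets an exception.

In short, the array is not a great choice here.

Below we switch to reference counting, which avoids the array and also makes the code simpler: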

#include <atomic>
#include <memory>

template <typename T> class lock_free_stack {
private:
  struct node;

  struct counted_node_ptr {
        int external_count;
        node* ptr;
  };

  struct node {
        std::shared_ptr<T> data;
        std::atomic<int> internal_count;
        counted_node_ptr next;

        node(T const& data_): data(std::make_shared<T>(data_)), internal_count(0) {}
  };
  std::atomic<counted_node_ptr> head{counted_node_ptr{}};  // starts empty: {0, nullptr}

public:
  ~lock_free_stack() { while (pop()); }
  void push(T const& data) {
        counted_node_ptr new_node;
        new_node.ptr = new node(data);
        new_node.external_count = 1;
        new_node.ptr->next = head.load();
        while (!head.compare_exchange_weak(new_node.ptr->next, new_node));
  }
  std::shared_ptr<T> pop() {
        counted_node_ptr old_head = head.load();
        for (;;) {
          increase_head_count(old_head);
          node* const ptr = old_head.ptr;
          if (!ptr) return std::shared_ptr<T>();
          if (head.compare_exchange_strong(old_head, ptr->next)) {
                std::shared_ptr<T> res;
                res.swap(ptr->data);

                int const count_increase = old_head.external_count - 2;
                if (ptr->internal_count.fetch_add(count_increase) == -count_increase)
                  delete ptr;

                return res;
          } else if (ptr->internal_count.fetch_sub(1) == 1) {
                delete ptr;
          }
        }
  }

private:
  void increase_head_count(counted_node_ptr& old_counter) {
        counted_node_ptr new_counter;
        do {
          new_counter = old_counter;
          ++new_counter.external_count;
        } while (!head.compare_exchange_strong(old_counter, new_counter));

        old_counter.external_count = new_counter.external_count;
  }
};

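Tuning the memory ordering

Finally, we optimize the code by specifying explicit memory orderings. The reasoning behind each choice is omitted here; the code is: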

#include <atomic>
#include <memory>

template<typename T> class lock_free_stack {
private:
  struct node;

  struct counted_node_ptr {
        int external_count;
        node* ptr; 
  };

  struct node {
        std::shared_ptr<T> data;
        std::atomic<int> internal_count;
        counted_node_ptr next;
        node(T const& data_):
          data(std::make_shared<T>(data_)), internal_count(0) {}
  };

  std::atomic<counted_node_ptr> head{counted_node_ptr{}};  // starts empty: {0, nullptr}
  void increase_head_count(counted_node_ptr& old_counter) {
        counted_node_ptr new_counter;
        do {
          new_counter = old_counter;
          ++new_counter.external_count;
        }
        while(!head.compare_exchange_strong(old_counter,new_counter,
            std::memory_order_acquire,
            std::memory_order_relaxed));
        old_counter.external_count = new_counter.external_count;
  }

public:
  ~lock_free_stack() { while(pop()); }
  void push(T const& data) {
        counted_node_ptr new_node;
        new_node.ptr = new node(data);
        new_node.external_count = 1;
        new_node.ptr->next = head.load(std::memory_order_relaxed);
        while(!head.compare_exchange_weak(new_node.ptr->next,new_node,
            std::memory_order_release,
            std::memory_order_relaxed));
  }

  std::shared_ptr<T> pop() {
        counted_node_ptr old_head = head.load(std::memory_order_relaxed);
        for(;;) {
          increase_head_count(old_head);
          node* const ptr = old_head.ptr;
          if (!ptr) return std::shared_ptr<T>();
          if (head.compare_exchange_strong(old_head,ptr->next,
                            std::memory_order_relaxed)) {
                std::shared_ptr<T> res;
                res.swap(ptr->data);
                int const count_increase = old_head.external_count-2;
                if (ptr->internal_count.fetch_add(count_increase,
                        std::memory_order_release) == -count_increase)
                  delete ptr;
                return res; 
          } else if(ptr->internal_count.fetch_add(-1,
                    std::memory_order_relaxed) == 1) {
            ptr->internal_count.load(std::memory_order_acquire);
            delete ptr; 
          }
        }
  }
};
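
In the listing above, push() publishes a node with a release compare-exchange and increase_head_count() reads head with an acquire compare-exchange. The same release/acquire pairing in its simplest form, separate from the stack, might look like this. The names payload, ready, produce and consume are made up for illustration, and the two functions are meant to run on different threads.

```cpp
#include <atomic>
#include <cassert>

int payload = 0;                 // plain, non-atomic data
std::atomic<bool> ready{false};  // flag that publishes the payload

// Writer: fill in the data first, then publish it with a release store.
void produce(int value) {
  payload = value;
  ready.store(true, std::memory_order_release);
}

// Reader: spin until the acquire load sees the flag. That load
// synchronizes with the release store, so the payload write is visible.
int consume() {
  while (!ready.load(std::memory_order_acquire)) {
  }
  return payload;
}
```

If both operations used std::memory_order_relaxed instead, the reader could see ready == true and still read a stale payload, which is why only the operations that carry no data dependency are relaxed in the stack.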

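Guidelines for writing lock-free data structures
1. Start with std::memory_order_seq_cst, the strongest memory ordering
  
  Get the lock-free code working first; only then optimize by replacing it with weaker orderings to improve performance.
2. Use a lock-free memory reclamation scheme
  
  Memory management is one of the hardest parts of lock-free code. The code above showed several schemes:
  1. wait until no thread is operating on the data structure, then delete the nodes
  2. use hazard pointers to determine whether a thread is still accessing a node
  3. use reference counting to decide when to delete a node
  Garbage collection is another option, as is recycling nodes instead of deleting them, though recycling brings difficulties of its own.
3. Watch out for the ABA problem
  
  The ABA problem can occur in any algorithm based on compare/exchange. The scenario goes like this:
  1. thread 1 reads an atomic variable x and finds the value A
  2. thread 1 does some work based on A, such as dereferencing it (if it is a pointer) or doing a lookup
  3. thread 1 is descheduled by the operating system so that other threads can run
  4. another thread performs some operation that changes x to B
  5. a thread changes the data associated with A
  6. a thread changes x back to A
  7. thread 1 resumes and compares x against A. The comparison succeeds, yet this A is no longer the same as the A that thread 1 read earlier.
  A common way to avoid the ABA problem is to attach an ABA counter to x and increment it every time x is modified.
4. Watch for busy-wait loops
  
  In some lock-free code an operation has to wait for another thread to finish the same operation before it can proceed. Removing that wait may require changing the data structure, for example by making a member variable atomic. (The lock-free queue code that illustrates this is omitted, so it is not covered in detail here.)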
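
The ABA counter from guideline 3 can be sketched by packing a value and a modification counter into one word, so that a single compare/exchange checks both. VersionedValue and Snapshot are hypothetical names, and the 32-bit value plus 32-bit counter layout is just one possible choice for this example.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical value-plus-counter cell: both halves change in one CAS.
class VersionedValue {
public:
  struct Snapshot {
    std::uint32_t value;
    std::uint32_t version;
  };

  explicit VersionedValue(std::uint32_t value) : state_(pack(value, 0)) {}

  Snapshot load() const {
    std::uint64_t s = state_.load();
    return {static_cast<std::uint32_t>(s >> 32), static_cast<std::uint32_t>(s)};
  }

  // Succeeds only if neither the value nor the counter changed since `seen`.
  // Every successful update bumps the counter.
  bool compare_exchange(const Snapshot& seen, std::uint32_t desired) {
    std::uint64_t expected = pack(seen.value, seen.version);
    return state_.compare_exchange_strong(expected,
                                          pack(desired, seen.version + 1));
  }

private:
  static std::uint64_t pack(std::uint32_t value, std::uint32_t version) {
    return (static_cast<std::uint64_t>(value) << 32) | version;
  }

  std::atomic<std::uint64_t> state_;
};
```

A thread that took its snapshot before an A to B to A sequence now fails its compare/exchange, because the counter has moved on even though the value looks the same. The counter can still wrap around after 2^32 updates, so this narrows the window rather than closing it completely.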

左耳听风 ARTS Week 5
