Concurrent Queue (concurrentqueue) Source Code Walkthrough [Part 2: Implicit Producer Enqueue]

As before, we start from the entry point in user code.

In my project, test.cpp calls the enqueue function like this:

std::thread t1([&]{
        for (int i = 0; i < 50; i++)
            conqueue.Enqueue(i);
        cout << "t1 insert done!\n";
    });

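This calls Enqueue. ConcurrentQueue provides two overloads of this function, one taking an lvalue reference and one taking an rvalue reference.

Lvalue reference version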

template <typename T>
bool ConcurrentQueue<T>::Enqueue(T const& item)
{
    if (kInitialImplicitProducerHashSize == 0)
        return false;
    else 
        return InnerEnqueue<CAN_ALLOC>(item);
}

Rvalue reference version

template <typename T>
bool ConcurrentQueue<T>::Enqueue(T&& item)
{
    if (kInitialImplicitProducerHashSize == 0)
        return false;
    else 
        return InnerEnqueue<CAN_ALLOC>(std::move(item));
}

Both overloads delegate to InnerEnqueue. Let's step into that function and see what it does.
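The two-overload pattern above can be tried in isolation. The following is a minimal sketch, not the queue's real code: ForwardingDemo, Tracked, Counters and GetCounters are hypothetical names introduced here only to show that the lvalue overload ends in a copy and the rvalue overload ends in a move once both forward to a single template.

```cpp
#include <deque>
#include <utility>

// Hypothetical counters that record how elements were constructed.
struct Counters { int copies = 0; int moves = 0; };
inline Counters& GetCounters() { static Counters c; return c; }

// Hypothetical element type that reports copies and moves.
struct Tracked {
    Tracked() = default;
    Tracked(Tracked const&) { ++GetCounters().copies; }
    Tracked(Tracked&&) noexcept { ++GetCounters().moves; }
};

// Hypothetical stand-in for the queue: two public overloads, one private template.
template <typename T>
class ForwardingDemo {
public:
    bool Enqueue(T const& item) { return InnerEnqueue(item); }
    bool Enqueue(T&& item)      { return InnerEnqueue(std::move(item)); }

    // std::deque never relocates existing elements on emplace_back,
    // so the counters only reflect the forwarding itself.
    std::deque<T> items;

private:
    // U deduces to "T const&" for lvalues and to "T" for rvalues, so
    // std::forward<U> keeps the value category chosen by the caller.
    template <typename U>
    bool InnerEnqueue(U&& element) {
        items.emplace_back(std::forward<U>(element));
        return true;
    }
};
```

Enqueuing a named variable should bump only the copy counter, while enqueuing `std::move(x)` or a temporary should bump only the move counter.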

InnerEnqueue

template <typename T>
template<AllocationMode can_alloc, typename U>
bool ConcurrentQueue<T>::InnerEnqueue(U&& element)
{
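    auto producer = GetOrAddImplicitProducer(); // Get the current thread's implicit producer, creating it if needed
    
    // Call the implicit producer's Enqueue member function to enqueue the element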
    return producer == nullptr ? false
        : producer->ImplicitProducer<T>::template Enqueue<can_alloc>(
            std::forward<U>(element));
}

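InnerEnqueue starts by calling GetOrAddImplicitProducer. Let's see how that call gets or creates an implicit producer and how it uses the producer hash table.

GetOrAddImplicitProducer analysis. This function is a little longer, so bear with it.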

template <typename T>
ImplicitProducer<T>* ConcurrentQueue<T>::GetOrAddImplicitProducer()
{
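    auto id = ThreadId();    // Address of a per-thread variable (declared with __thread), used as the thread id
    auto hash_id = HashThreadId(id);  // Hash that address
    
    // Load the current (main) implicit producer hash table
    auto main_hash = implicit_producer_hash_.load(std::memory_order_acquire);
    
    // Look up the current thread's implicit producer, starting at the main table and walking back through older tables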
    for (auto hash = main_hash; hash != nullptr; hash = hash->prev_)
    {
        auto index = hash_id;
        while (true)
        {
            index &= hash->capacity_ - 1u;  // Wrap the index into the table (assumes capacity_ is a power of two), so it never goes out of bounds
            // Load the key stored in the probed slot
            auto probed_key = hash->entries_[index].key_.load(std::memory_order_relaxed);
            if (probed_key == id) // Match: this thread registered a producer earlier
            {
                auto value = hash->entries_[index].value_;  // The implicit producer owned by this thread
                if (hash != main_hash)  // Found in an older table, not in the main one
                {
                    // Re-insert the entry into the current main table so that the next lookup hits it directly
                    index = hash_id;
                    while (true)
                    {
                        index &= main_hash->capacity_ - 1u;
                        auto empty = kInvalidThreadId;
                        
                        // Claim an empty slot in the main table
                        if (main_hash->entries_[index].key_.compare_exchange_strong(empty, id, 
                            std::memory_order_seq_cst, std::memory_order_relaxed))
                        {
                            main_hash->entries_[index].value_ = value;
                            break;
                        }
                        ++index; // Slot already taken: try the next one (linear probing)
                    }
                }
                return value; // Return the producer that was found
            }
            if (probed_key == kInvalidThreadId) // Empty slot: this thread has no entry in this table
                break;
            ++index; // Collision: probe the next slot
        }
    }
}

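That is only the first half of the function. It looks up the implicit producer that belongs to the current thread in the hash table and returns it if found; otherwise execution falls through to the code shown next. Recall from part 1 that each hash table element, ImplicitProducerKVP, has two members: key_ and value_.
The code above finds the value_ stored for the current thread, which is its implicit producer object.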
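The probing logic of the lookup can be sketched without any atomics. This is a simplified, single-threaded illustration under stated assumptions: ProbeTable is a hypothetical type, the key doubles as its own hash, 0 plays the role of kInvalidThreadId, and the capacity must be a power of two so that the `& (capacity - 1)` mask works.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical open-addressing table with linear probing.
struct ProbeTable {
    struct Entry { std::uint64_t key = 0; int value = 0; };  // key 0 means "empty"
    std::vector<Entry> entries;

    // capacity must be a non-zero power of two.
    explicit ProbeTable(std::size_t capacity) : entries(capacity) {}

    // Stores (key, value) in the first free slot on the probe path.
    // Returns false when every slot is occupied by another key.
    bool Insert(std::uint64_t key, int value) {
        std::size_t index = static_cast<std::size_t>(key);
        for (std::size_t probes = 0; probes < entries.size(); ++probes, ++index) {
            index &= entries.size() - 1;  // wrap into the table
            if (entries[index].key == 0 || entries[index].key == key) {
                entries[index].key = key;
                entries[index].value = value;
                return true;
            }
        }
        return false;
    }

    // Follows the same probe path; an empty slot ends the search.
    const int* Find(std::uint64_t key) const {
        std::size_t index = static_cast<std::size_t>(key);
        for (std::size_t probes = 0; probes < entries.size(); ++probes, ++index) {
            index &= entries.size() - 1;
            if (entries[index].key == key) return &entries[index].value;
            if (entries[index].key == 0) return nullptr;  // never inserted
        }
        return nullptr;
    }
};
```

With a capacity of 8, keys 1 and 9 both map to slot 1, so the second one lands in slot 2; a lookup for an absent key stops at the first empty slot it meets.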
With the lookup covered, let's move on to the rest of the function.

    // implicit_producer_hash_count_ is the number of implicit producers registered so far.
    // Reserve a place for this thread: new_count is the old counter value plus one.
    auto new_count = 1 + implicit_producer_hash_count_.fetch_add(1, std::memory_order_relaxed);
    
    while (true)
    {
        // implicit_producer_hash_resize_in_progress_ is a flag that tells other threads a resize is underway.
        // Resize when the producer count reaches half the table capacity and no other thread is already resizing.
        if (new_count >= (main_hash->capacity_ >> 1) 
            && !implicit_producer_hash_resize_in_progress_.test_and_set(
                std::memory_order_acquire))
        {
            // Reload the main table: another thread may have replaced it in the meantime
            main_hash = implicit_producer_hash_.load(std::memory_order_acquire);
            // Re-check against the fresh table: grow only if the count is still at least half the capacity.
            // A crowded table makes hash collisions much more likely.
            if (new_count >= (main_hash->capacity_ >> 1))
            {
                // Start at twice the old capacity
                size_t new_capacity = main_hash->capacity_ << 1;
                // Keep doubling while the producer count is still at least half the new capacity
                while (new_count >= (new_capacity >> 1))
                    new_capacity <<= 1;
                    
                // Allocate the table header, alignment padding, and the entry array in one chunk
                auto raw = static_cast<char*>(malloc(sizeof(ImplicitProducerHash<T>)
                    + std::alignment_of<ImplicitProducerKVP<T>>::value - 1
                    + sizeof(ImplicitProducerKVP<T>) * new_capacity));
                if (raw == nullptr) // Allocation failed
                {
                    // Undo the counter increment made above
                    implicit_producer_hash_count_.fetch_sub(1, std::memory_order_relaxed);
                    // Clear the flag: no resize is in progress any more
                    implicit_producer_hash_resize_in_progress_.clear(
                        std::memory_order_relaxed);
                        
                    return nullptr; // Out of memory
                }
                
                // From here on the memory for the new table is available
                
                // Use placement new to construct the table header, which holds capacity_ and entries_ (see part 1)
                auto new_hash = new (raw) ImplicitProducerHash<T>();
                new_hash->capacity_ = static_cast<size_t>(new_capacity); // Capacity of the new table
                
                // entries_ points to the first (aligned) entry after the header
                new_hash->entries_ = reinterpret_cast<ImplicitProducerKVP<T>*>(
                    AlignFor<ImplicitProducerKVP<T>>(
                        raw + sizeof(ImplicitProducerHash<T>)));
                
                // Construct every ImplicitProducerKVP slot and mark it empty
                for (size_t i = 0; i != new_capacity; i++)
                {
                    new (new_hash->entries_ + i) ImplicitProducerKVP<T>;
                    new_hash->entries_[i].key_.store(kInvalidThreadId, 
                        std::memory_order_relaxed);
                }
                new_hash->prev_ = main_hash; // Link to the old table, so the tables form a linked list
                
                // Publish the new table, then release the resize flag
                implicit_producer_hash_.store(new_hash, std::memory_order_release);
                implicit_producer_hash_resize_in_progress_.clear(std::memory_order_release);
                main_hash = new_hash; // The new table is now the main table
            }
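            else  // The reloaded table is large enough (another thread already grew it), so no resize is needed
            {
                // Release the resize flag
                implicit_producer_hash_resize_in_progress_.clear(
                    std::memory_order_release);
            }
        }
        // The while (true) loop does not end here; its body continues in the next excerpt.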

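Take your time with the code above; the comments are detailed. The short version follows.
The code grows the hash table: once the number of producers reaches half of the table capacity, a new, larger table is allocated. That may look wasteful.
The point is to keep collisions rare: the fuller the table, the more likely it is that two threads hash to the same slot.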
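The sizing rule can be isolated into a few lines. The sketch below is an illustration only: GrownCapacity is a hypothetical helper name, and it assumes the capacity is a non-zero power of two, as the masking in the lookup code requires.

```cpp
#include <cstddef>

// Returns the capacity the table should have for new_count producers:
// unchanged while new_count is below half the capacity, otherwise doubled
// until new_count is strictly below half of the new capacity.
inline std::size_t GrownCapacity(std::size_t capacity, std::size_t new_count) {
    if (new_count < (capacity >> 1))
        return capacity;                       // still sparse enough, no resize
    std::size_t new_capacity = capacity << 1;  // start at twice the old size
    while (new_count >= (new_capacity >> 1))
        new_capacity <<= 1;                    // keep doubling if still too small
    return new_capacity;
}
```

For a table of 32 slots, 15 producers leave it untouched, 16 producers grow it to 64, and 40 producers grow it to 128.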

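Only a small piece of this function is left, and it sits inside the same while (true) loop. Reading source code takes patience, but it pays off: afterwards you will be able to analyze open-source code on your own.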

    // The producer count is below three quarters of the table capacity
    if (new_count < (main_hash->capacity_ >> 1) + (main_hash->capacity_ >> 2))
    {
        // Reuse an inactive producer if there is one, otherwise create a new one
        auto producer = static_cast<ImplicitProducer<T>*>(RecycleOrCreateProducer(false));
        if (producer == nullptr) // Nothing to recycle and creation failed
        {
            // Undo the counter increment
            implicit_producer_hash_count_.fetch_sub(1, std::memory_order_relaxed);
            return nullptr;
        }
        
        // From here on we hold a producer
        
        auto index = hash_id;
        while (true)
        {
            index &= main_hash->capacity_ - 1u; // Wrap the index into the table so it never goes out of bounds
            auto empty = kInvalidThreadId;
            
            // Claim a slot: store the current thread id as the key, then the producer as the value
            if (main_hash->entries_[index].key_.compare_exchange_strong(empty, id,
                std::memory_order_seq_cst, std::memory_order_relaxed))
            {
                // Record the producer that belongs to the current thread
                main_hash->entries_[index].value_ = producer;
                break;
            }
            ++index; // Collision: try the next slot (linear probing)
        }
        return producer;  // Success
    }
    
    // The table is too full at the moment (for example, another thread is still resizing it): reload it and try again
    main_hash = implicit_producer_hash_.load(std::memory_order_acquire);

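If the producer count is below three quarters of the table capacity, the code above recycles or creates a producer. It then uses the hash value to claim a slot in the table and stores the current thread id together with the producer there, so that the next call from this thread finds it directly.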
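Claiming a slot with compare_exchange_strong is the step that makes the insertion safe when several threads register at once. Here is a minimal sketch of that step alone, under assumptions: Slot, ClaimSlot and kEmptyKey are hypothetical names, the id is used directly as the hash, and the table is a fixed-size array whose size is a power of two.

```cpp
#include <atomic>
#include <cstddef>
#include <cstdint>

constexpr std::uint64_t kEmptyKey = 0;  // hypothetical "no owner" marker

struct Slot {
    std::atomic<std::uint64_t> key{kEmptyKey};
    int value = 0;
};

// Claims the first free slot on the probe path for `id` and stores `value`.
// Returns the claimed index, or Capacity if every slot is taken.
template <std::size_t Capacity>
std::size_t ClaimSlot(Slot (&slots)[Capacity], std::uint64_t id, int value) {
    static_assert((Capacity & (Capacity - 1)) == 0,
                  "Capacity must be a power of two");
    std::size_t index = static_cast<std::size_t>(id);
    for (std::size_t probes = 0; probes < Capacity; ++probes, ++index) {
        index &= Capacity - 1;
        // Reset on every attempt: a failed CAS overwrites `expected`
        // with the key that is currently stored in the slot.
        std::uint64_t expected = kEmptyKey;
        if (slots[index].key.compare_exchange_strong(
                expected, id,
                std::memory_order_seq_cst, std::memory_order_relaxed)) {
            slots[index].value = value;  // only the winner of the CAS writes here
            return index;
        }
    }
    return Capacity;
}
```

Because the key is swapped in atomically, two threads probing the same slot cannot both win it; the loser simply moves on to the next slot.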

You may wonder how RecycleOrCreateProducer recycles and creates producers. Let's look at that function.

template <typename T>
ProducerBase<T>* ConcurrentQueue<T>::RecycleOrCreateProducer(bool is_explicit)
{
    for (auto ptr = producer_list_tail_.load(std::memory_order_acquire);
        ptr != nullptr; ptr = ptr->NextProd())  // Walk the producer list looking for one that can be recycled
    {
        if (ptr->inactive_.load(std::memory_order_relaxed) 
            && ptr->is_explicit_ == is_explicit)    // Inactive and of the requested kind
        {
            bool expected = true;
            // Flip inactive_ from true to false; only the thread whose CAS succeeds takes the producer
            if (ptr->inactive_.compare_exchange_strong(expected, false, std::memory_order_acquire,
                std::memory_order_relaxed))
            {
                return ptr; // Recycled: hand this producer to the caller
            }
        }
    }
    
    // Nothing to recycle: create a new producer
    return AddProducer(is_explicit ? static_cast<ProducerBase<T>*>(
        Create<ExplicitProducer<T>>(this)) : Create<ImplicitProducer<T>>(this));
}

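The ConcurrentQueue class has a producer_list_tail_ member. The code above walks the producer list starting from it and looks for a producer whose inactive_ flag is true, which means it can be recycled. If one is found, the flag is set to false and that producer is returned.
Otherwise AddProducer is called to create and return a new producer.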
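The recycling step boils down to "check the flag cheaply, then take ownership with a CAS". A minimal sketch of just that idea follows; Node and TryRecycle are hypothetical names, and the plain `next` pointer stands in for whatever NextProd() returns in the real code.

```cpp
#include <atomic>

// Hypothetical stand-in for a producer in the producer list.
struct Node {
    std::atomic<bool> inactive{false};
    Node* next = nullptr;
};

// Walks the list and takes ownership of the first inactive node.
// Returns nullptr when nothing can be recycled.
inline Node* TryRecycle(Node* head) {
    for (Node* ptr = head; ptr != nullptr; ptr = ptr->next) {
        // Cheap relaxed read first, so active nodes are skipped without a CAS.
        if (!ptr->inactive.load(std::memory_order_relaxed))
            continue;
        bool expected = true;
        // Only one thread can flip true -> false; that thread owns the node.
        if (ptr->inactive.compare_exchange_strong(
                expected, false,
                std::memory_order_acquire, std::memory_order_relaxed))
            return ptr;
    }
    return nullptr;
}
```

The initial load is only a filter: the CAS is what decides the winner when two threads spot the same inactive node.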

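That was a fair amount of code. Looking back, though, there is not much to it.
Now that we have a producer, the next step is to call its enqueue and dequeue functions.

Before moving on, recall the block pool from the first article (go back to the diagrams there if needed). A producer stores its data in blocks taken from that pool, and it maintains a small set of data structures of its own to manage them. They are simple.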

The implicit producer and its Enqueue member function

Let's first see how the implicit producer is defined.

template <typename T>
struct ImplicitProducer : public ProducerBase<T>
{
    // Constructor: takes a ConcurrentQueue pointer, because blocks are allocated from the queue's block pool
    ImplicitProducer(ConcurrentQueue<T>* parent);
    // Destructor omitted
    
    // Enqueue operations
    template <AllocationMode alloc_mode, typename U>
    bool Enqueue(U&& element); 
    
    template <AllocationMode alloc_mode, typename It>
    bool Enqueue(It item_first, size_t count);
    
    // Dequeue operations
    template <typename U>
    bool Dequeue(U& element);
    
    template <typename It>
    size_t DequeueBulk(It& item_first, size_t max); // Bulk dequeue
    
private:
    // Block index entry
    struct BlockIndexEntry
    {
        std::atomic<size_t> key_;        // Index key of the block (the tail index at which the block starts)
        std::atomic<Block<T>*> value_;   // Pointer to the block
    };
    
    // Block index header; this one matters a lot and is explained further below
    struct BlockIndexHeader
    {
        size_t capacity_;            // Number of slots in index_
        std::atomic<size_t> tail_;   // Position of the tail slot in index_
        BlockIndexEntry* entries_;   // Array of BlockIndexEntry objects
        BlockIndexEntry** index_;    // Array holding the address of each entry
        BlockIndexHeader* prev_;     // Previously allocated block index
    };
private:
    size_t next_block_index_capacity_;   // Capacity to use for the next block index allocation
    std::atomic<BlockIndexHeader*> block_index_;  // Current block index (points to a BlockIndexHeader)
    static const index_t kInvalidBlockBase = 1;
};

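A question: why have producers at all? Wouldn't it be simpler to store the data directly in the container?

ConcurrentQueue provides two kinds of producer, implicit and explicit. Both offer enqueue and dequeue operations, but they differ in some respects. The explicit producer is analyzed later.

ImplicitProducer inherits from ProducerBase, and so does the explicit producer. Let's see what that base class defines.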

template <typename T>
struct ProducerBase : public ConcurrentQueueProducerTypelessBase
{
    ProducerBase(ConcurrentQueue<T>* parent, bool is_explicit);
    virtual ~ProducerBase() {}
    
    template <typename U>
    bool Dequeue(U& element);
    
    template <typename It>
    size_t DequeueBulk(It& item_first, size_t max);
public:
    bool is_explicit_; // Whether this is an explicit producer
    ConcurrentQueue<T>* parent_;
protected:
    std::atomic<index_t> tail_index_;   // Index at which the next element is written
    std::atomic<index_t> head_index_;   // Index from which the next element is read
    
    std::atomic<index_t> dequeue_optimistic_count_;
    std::atomic<index_t> dequeue_overcommit_;
    
    Block<T>* tail_block_;
};

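tail_index_ and head_index_ drive enqueue and dequeue. Each is used to locate the corresponding block index entry, which holds the Block that stores the data.
tail_index_ is used when enqueuing;
head_index_ is used when dequeuing to find the block index entry.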
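How a running index maps onto blocks can be shown with two masks. This is a small sketch under assumptions: the block size is taken to be 32 and a power of two, index_t is taken to be std::size_t, and SlotInBlock, BlockBase and StartsNewBlock are hypothetical helper names rather than functions from the queue.

```cpp
#include <cstddef>

using index_t = std::size_t;        // assumed index type
constexpr index_t kBlockSize = 32;  // assumed block size (power of two)

// Position of an element inside its block.
constexpr index_t SlotInBlock(index_t index) {
    return index & (kBlockSize - 1);
}

// Index of the first element of the block that contains `index`.
constexpr index_t BlockBase(index_t index) {
    return index & ~(kBlockSize - 1);
}

// True when the tail index sits on a block boundary, i.e. the producer
// has no block yet or the previous block is completely filled.
constexpr bool StartsNewBlock(index_t tail_index) {
    return SlotInBlock(tail_index) == 0;
}
```

With these assumptions, index 33 is slot 1 of the block whose base is 32, and a tail index of 64 means the next enqueue needs a fresh block.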

Before analyzing the enqueue function, let's see how the constructor initializes this class.
Constructor

    template <typename T>
    ImplicitProducer<T>::ImplicitProducer(ConcurrentQueue<T>* parent)
        : ProducerBase<T>(parent, false),
        next_block_index_capacity_(kImplicitInitialIndexSize),
        block_index_(nullptr)
    { NewBlockIndex(); }

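The constructor takes a ConcurrentQueue pointer, because the implicit producer stores its data in blocks from the queue's block pool.
It calls NewBlockIndex, which creates the block index (essentially an array).

NewBlockIndex source analysis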

template <typename T>
bool ImplicitProducer<T>::NewBlockIndex()
{
    auto prev = block_index_.load(std::memory_order_relaxed);
    size_t prev_capacity = prev == nullptr ? 0 : prev->capacity_;
    auto entry_count = prev == nullptr ? next_block_index_capacity_ : prev_capacity;
    
    auto raw = static_cast<char*>(malloc(sizeof(BlockIndexHeader) + 
        std::alignment_of<BlockIndexEntry>::value - 1 + sizeof(BlockIndexEntry) * entry_count
        + std::alignment_of<BlockIndexEntry*>::value - 1 + sizeof(BlockIndexEntry*) 
        * next_block_index_capacity_));

    if (raw == nullptr)
        return false;
    auto header = new (raw) BlockIndexHeader;   // Construct the BlockIndexHeader at the start of the chunk
    // Aligned start of the entry array, right after the header
    auto entries = reinterpret_cast<BlockIndexEntry*>(AlignFor<BlockIndexEntry>(raw + sizeof(BlockIndexHeader)));
    // Aligned start of the array of entry pointers, right after the entries
    auto index = reinterpret_cast<BlockIndexEntry**>(AlignFor<BlockIndexEntry*>(reinterpret_cast<char*>(entries) 
        + sizeof(BlockIndexEntry) * entry_count));

    if (prev != nullptr)    // A block index was already allocated before this one
    {
        // Position of the tail slot in the previous block index
        auto prev_tail = prev->tail_.load(std::memory_order_relaxed);
        auto prev_pos = prev_tail;
        size_t i = 0;

        // Copy the entry pointers of the previous block index into the new index_, starting just after the old tail
        do {
            prev_pos = (prev_pos + 1) & (prev->capacity_ - 1);
            index[i++] = prev->index_[prev_pos];
        } while (prev_pos != prev_tail);
        assert(i == prev_capacity);
    }

    // Construct the new entries and store their addresses after the copied pointers
    for (size_t i = 0; i != entry_count; i++)   
    {
        new (entries + i) BlockIndexEntry;  // Construct a BlockIndexEntry in place
        entries[i].key_.store(kInvalidBlockBase, std::memory_order_relaxed);
        index[prev_capacity + i] = entries + i; // Record the address of this entry
    }

    // Fill in the BlockIndexHeader
    header->prev_ = prev;       // Previously allocated block index
    header->entries_ = entries; // Entry array
    header->index_ = index;     // Array of entry addresses
    header->capacity_ = next_block_index_capacity_; // Number of index slots, 32 by default
    // On the first allocation (prev_capacity == 0) the unsigned wrap-around makes tail_ equal to next_block_index_capacity_ - 1
    header->tail_.store((prev_capacity - 1) & (next_block_index_capacity_ - 1),
        std::memory_order_relaxed);

    block_index_.store(header, std::memory_order_release);
    next_block_index_capacity_ <<= 1;   // Double the capacity for the next block index allocation

    return true;
}

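Skim the code above with the help of the comments and try to work out what it does. If it is hard to follow, here is a summary.

The function allocates one fairly large chunk of memory and splits that memory into three regions:

BlockIndexHeader: bookkeeping for the chunk, namely its capacity, the tail position, the addresses of the two arrays below, and a link to the previous block index
BlockIndexEntry: the entries, each of which stores the address of a Block
Index: pointers to the individual block index entries, used to walk the index and find a specific entry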
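The "one malloc, several regions" layout can be reproduced on a smaller scale. The sketch below is an illustration with assumptions: AlignUp is a hypothetical helper that mimics what AlignFor appears to do (round a pointer up to the alignment of T), and Header with a single array of double stands in for BlockIndexHeader and its two arrays.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Rounds ptr up to the next address suitably aligned for T
// (assumed behaviour of the article's AlignFor helper).
template <typename T>
char* AlignUp(char* ptr) {
    const std::size_t alignment = alignof(T);
    const std::uintptr_t address = reinterpret_cast<std::uintptr_t>(ptr);
    return ptr + (alignment - address % alignment) % alignment;
}

// Hypothetical header that lives at the start of the chunk.
struct Header {
    std::size_t capacity;
    double* values;  // points into the same chunk, after the header
};

// Allocates the header and an array of `count` doubles with a single malloc.
// The extra "alignof(double) - 1" bytes leave room for the alignment padding.
inline Header* MakeHeaderWithArray(std::size_t count) {
    char* raw = static_cast<char*>(std::malloc(
        sizeof(Header) + alignof(double) - 1 + sizeof(double) * count));
    if (raw == nullptr)
        return nullptr;
    Header* header = new (raw) Header;  // placement new at the chunk start
    header->capacity = count;
    header->values = reinterpret_cast<double*>(
        AlignUp<double>(raw + sizeof(Header)));
    for (std::size_t i = 0; i != count; ++i)
        new (header->values + i) double(0.0);
    return header;
}

// Both types are trivially destructible, so freeing the chunk is enough.
inline void DestroyHeaderWithArray(Header* header) {
    std::free(header);
}
```

A single allocation keeps the header and its arrays next to each other in memory and means one free releases everything, which is presumably why the block index is built this way.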

Enqueue source analysis

template <typename T>
template<AllocationMode alloc_mode, typename U>
bool ImplicitProducer<T>::Enqueue(U&& element)
{
    index_t current_tail_index = this->tail_index_.load(std::memory_order_relaxed);
    index_t new_tail_index = 1 + current_tail_index;    // Value of the tail index once this element is in

    // True when no block has been obtained yet or the current block is full
    if ((current_tail_index & static_cast<index_t>(kBlockSize - 1)) == 0) 
    {
        // We reached the end of a block, so start a new one
        auto head = this->head_index_.load(std::memory_order_relaxed);
        assert(!CircularLessThan<index_t>(current_tail_index, head));
        if (!CircularLessThan<index_t>(head, current_tail_index + kBlockSize) 
            || (kMaxSubqueueSize != ConstNumericMax<size_t>::value 
            && (kMaxSubqueueSize == 0 
            || kMaxSubqueueSize - kBlockSize < current_tail_index - head))) 
        {
            return false;
        }

        BlockIndexEntry* index_entry;
        // Insert a block index entry for the new block and get its address
        if (!InsertBlockIndexEntry<alloc_mode>(index_entry, current_tail_index)) 
        {
            return false;
        }

        auto new_block = this->parent_->ConcurrentQueue<T>::template 
            RequisitionBlock<alloc_mode>(); // Request a block from the pool

        if (new_block == nullptr)   // No block available
        {
            RewindBlockIndexTail(); // Move the block index tail back by one slot
            index_entry->value_.store(nullptr, std::memory_order_relaxed);
            return false;
        }

        // Reset the block so that it holds no elements
        new_block->Block<T>::template ResetEmpty<implicit_context>();

        index_entry->value_.store(new_block, std::memory_order_relaxed);
        this->tail_block_ = new_block;

        if (!noexcept(new(static_cast<T *>(nullptr)) T(std::forward<U>(element))))
        {
            // The constructor may throw: construct the element in the new block
            // before publishing the new tail (exception handling is omitted here)
            new ((*new_block)[current_tail_index]) T(std::forward<U>(element));
            this->tail_index_.store(new_tail_index, std::memory_order_release);
            return true;
        }
    }
    // Construct the element in the tail block
    new ((*this->tail_block_)[current_tail_index]) T(std::forward<U>(element));
    // Publish the new tail index, which is where the next element will go
    this->tail_index_.store(new_tail_index, std::memory_order_release);
    return true;
}

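Try to work out the enqueue path from the comments above. If it is hard to follow, here are the steps.
Step 1: read this->tail_index_ (0 by default) and use the block index to obtain a block index entry.

Step 2: call ConcurrentQueue's RequisitionBlock function to take a Block from the block pool.

Step 3: set the key and value of the block index entry.

Step 4: construct the element in the Block's elements member array.

That is the placement new on tail_block_ near the end of the function.

At this point the element has been enqueued.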
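The four steps can be condensed into a single-threaded miniature. This is a sketch under stated assumptions, not the queue's implementation: MiniProducer is a hypothetical class, a std::vector of blocks replaces both the block pool and the block index, there are no atomics, and the default block size of 4 is chosen only to keep the example small.

```cpp
#include <cstddef>
#include <memory>
#include <new>
#include <utility>
#include <vector>

// Hypothetical single-threaded miniature of the block-based enqueue path.
template <typename T, std::size_t BlockSize = 4>
class MiniProducer {
    static_assert((BlockSize & (BlockSize - 1)) == 0,
                  "BlockSize must be a power of two");

    struct Block {
        alignas(T) unsigned char storage[sizeof(T) * BlockSize];
        // Maps a running index to the matching slot inside this block.
        T* operator[](std::size_t index) {
            return reinterpret_cast<T*>(storage) + (index & (BlockSize - 1));
        }
    };

public:
    MiniProducer() = default;
    MiniProducer(MiniProducer const&) = delete;
    MiniProducer& operator=(MiniProducer const&) = delete;

    ~MiniProducer() {
        for (std::size_t i = 0; i != tail_index_; ++i)
            At(i).~T();  // destroy every element that was constructed
    }

    template <typename U>
    bool Enqueue(U&& element) {
        const std::size_t current_tail = tail_index_;
        // On a block boundary a new block is needed (steps 1 to 3 above).
        if ((current_tail & (BlockSize - 1)) == 0 &&
            blocks_.size() == current_tail / BlockSize) {
            blocks_.push_back(std::make_unique<Block>());
        }
        // Step 4: construct the element in place inside the tail block.
        new ((*blocks_.back())[current_tail]) T(std::forward<U>(element));
        tail_index_ = current_tail + 1;  // publish only after construction
        return true;
    }

    T& At(std::size_t index) { return *(*blocks_[index / BlockSize])[index]; }
    std::size_t Size() const { return tail_index_; }
    std::size_t BlockCount() const { return blocks_.size(); }

private:
    std::vector<std::unique_ptr<Block>> blocks_;  // stands in for the block index
    std::size_t tail_index_ = 0;
};
```

Enqueuing ten ints into a `MiniProducer<int>` should leave three blocks behind (4 + 4 + 2 elements), which mirrors how the real producer only asks the pool for a block when the tail index crosses a block boundary.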

The next article analyzes bulk enqueue.

Part 3: bulk enqueue source analysis