Java Collections Source Code Walkthrough —— ConcurrentHashMap
- Author: shiwyang
- This class is genuinely difficult, touching a great many lock-related concepts, so this post only sketches the path I followed while studying it. A detailed line-by-line analysis of the implementation will have to wait for another day!
ConcurrentHashMap
- ConcurrentHashMap is a thread-safe Map implementation built for high concurrency and high throughput.
- Neither keys nor values may be null.
- It is well suited for building scalable frequency maps.
HashMap is not thread-safe: it has no synchronization, and under concurrent put operations (in JDK 7 and earlier) the linked list in a bucket could form a cycle during resizing, sending a subsequent get into an infinite loop and driving CPU usage to nearly 100%.
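To make the contrast concrete, here is a minimal stress sketch (the class name and thread counts are mine, not from the source): several threads insert distinct keys into one shared ConcurrentHashMap with no external locking, and no entries are lost. Running the same loop against a plain HashMap may drop entries or, on JDK 7, corrupt a bucket's linked list.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ConcurrentPutDemo {
    public static void main(String[] args) throws InterruptedException {
        Map<Integer, Integer> map = new ConcurrentHashMap<>();
        Thread[] threads = new Thread[4];
        for (int t = 0; t < threads.length; t++) {
            final int base = t * 1000;
            threads[t] = new Thread(() -> {
                for (int i = 0; i < 1000; i++) {
                    map.put(base + i, i); // thread-safe; no external locking needed
                }
            });
            threads[t].start();
        }
        for (Thread th : threads) th.join();
        // All 4000 distinct keys survive the concurrent inserts.
        System.out.println(map.size()); // 4000
    }
}
```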
Reading the Javadoc
I found that the long comment at the top of the source file gives a clear and concise picture of the whole class, so below I extract its key points by translating it section by section.
Hashtable is another HashMap-like class that provides thread synchronization, so why not just use it under high concurrency? The ConcurrentHashMap javadoc spells out exactly why a brand-new class was needed.
Design goals: preserve concurrent readability (chiefly the get() method), minimize update contention, keep space consumption about the same as HashMap, and support high initial insertion rates on an empty table from many threads. (Ambitious requirements!)
The primary design goal of this hash table is to maintain concurrent readability (typically method get(), but also iterators and related methods) while minimizing update contention. Secondary goals are to keep space consumption about the same or better than java.util.HashMap, and to support high initial insertion rates on an empty table by many threads.
Compared with Hashtable
- This class obeys the same functional specification as Hashtable and includes versions of methods corresponding to each Hashtable method.
- All operations are thread-safe, yet retrieval operations involve no locking, and there is no support for locking the entire table to block all access.
A hash table supporting full concurrency of retrievals and high expected concurrency for updates. This class obeys the same functional specification as Hashtable, and includes versions of methods corresponding to each method of Hashtable. However, even though all operations are thread-safe, retrieval operations do not entail locking, and there is not any support for locking the entire table in a way that prevents all access. This class is fully interoperable with Hashtable in programs that rely on its thread safety but not on its synchronization details.
Caveats when using this class:
- Retrieval operations do not block, so they may overlap with update operations; reads are therefore not repeatable across concurrent updates.
- For aggregate operations such as putAll and clear, a concurrent retrieval may observe the insertion or removal of only some entries. Likewise, iterators, spliterators, and enumerations reflect the state of the table at (or after) the moment they were created, and they never throw ConcurrentModificationException.
- Iterators are designed to be used by only one thread at a time.
- The results of aggregate status methods, including size, isEmpty, and containsValue, are reliable only while the map is not undergoing concurrent updates in other threads; otherwise they reflect transient states that are adequate for monitoring or estimation, but not for program control.
Retrieval operations (including get) generally do not block, so may overlap with update operations (including put and remove). Retrievals reflect the results of the most recently completed update operations holding upon their onset. (More formally, an update operation for a given key bears a happens-before relation with any (non-null) retrieval for that key reporting the updated value.) For aggregate operations such as putAll and clear, concurrent retrievals may reflect insertion or removal of only some entries. Similarly, Iterators, Spliterators and Enumerations return elements reflecting the state of the hash table at some point at or since the creation of the iterator/enumeration. They do not throw ConcurrentModificationException. However, iterators are designed to be used by only one thread at a time. Bear in mind that the results of aggregate status methods including size, isEmpty, and containsValue are typically useful only when a map is not undergoing concurrent updates in other threads. Otherwise the results of these methods reflect transient states that may be adequate for monitoring or estimation purposes, but not for program control.
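One practical consequence of the "estimation only" caveat is worth knowing: size() caps its result at Integer.MAX_VALUE, while mappingCount() returns a long and is the documented preference for very large maps. A small sketch (class name mine); both calls are exact here only because no other thread is updating the map:

```java
import java.util.concurrent.ConcurrentHashMap;

public class SizeDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        // size() is capped at Integer.MAX_VALUE; mappingCount() returns a long.
        // Under concurrent updates both are estimates, good for monitoring only.
        System.out.println(map.size());         // 2
        System.out.println(map.mappingCount()); // 2
    }
}
```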
Constructor parameters:
- loadFactor: the table density (default 0.75) used when sizing the table; it can be supplied to a constructor, though in JDK 8 the resize threshold itself is fixed at 0.75.
- initialCapacity: a size estimate supplied to the constructor to pre-size the table.
- concurrencyLevel: a hint for the granularity of concurrency (its meaning differs between JDK 7 and JDK 8):
- In JDK 7, ConcurrentHashMap used segment locking: the array was split into segments, each locked independently to raise concurrency, so this parameter set the number of segments.
- In JDK 8, the class switched to per-bin locking (each bucket's list is locked independently), so the parameter now acts much like initialCapacity; the javadoc notes it survives only for compatibility with previous versions, which explains why it was not simply removed.
- The last sentence of the paragraph below says that many keys sharing the same hashCode() will slow down any hash table, and that when keys are Comparable, the class uses their comparison order to break ties and soften the impact.
The table is dynamically expanded when there are too many collisions (i.e., keys that have distinct hash codes but fall into the same slot modulo the table size), with the expected average effect of maintaining roughly two bins per mapping (corresponding to a 0.75 load factor threshold for resizing). There may be much variance around this average as mappings are added and removed, but overall, this maintains a commonly accepted time/space tradeoff for hash tables. However, resizing this or any other kind of hash table may be a relatively slow operation. When possible, it is a good idea to provide a size estimate as an optional initialCapacity constructor argument. An additional optional loadFactor constructor argument provides a further means of customizing initial table capacity by specifying the table density to be used in calculating the amount of space to allocate for the given number of elements. Also, for compatibility with previous versions of this class, constructors may optionally specify an expected concurrencyLevel as an additional hint for internal sizing. Note that using many keys with exactly the same hashCode() is a sure way to slow down performance of any hash table. To ameliorate impact, when keys are Comparable, this class may use comparison order among keys to help break ties.
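A quick sketch of the three constructor hints described above (class name mine). None of these changes the fixed 0.75 resize threshold of the JDK 8 implementation; they only pre-size the internal table:

```java
import java.util.concurrent.ConcurrentHashMap;

public class CtorDemo {
    public static void main(String[] args) {
        // Size estimate only: pre-sizes the table to avoid early resizes.
        ConcurrentHashMap<String, String> byCapacity =
                new ConcurrentHashMap<>(64);
        // initialCapacity plus an explicit table density.
        ConcurrentHashMap<String, String> byDensity =
                new ConcurrentHashMap<>(64, 0.75f);
        // concurrencyLevel survives only for compatibility with the JDK 7
        // segment-based version; in JDK 8 it is just another sizing hint.
        ConcurrentHashMap<String, String> legacy =
                new ConcurrentHashMap<>(64, 0.75f, 16);
        byCapacity.put("k", "v");
        System.out.println(byCapacity.get("k"));                      // v
        System.out.println(byDensity.isEmpty() && legacy.isEmpty());  // true
    }
}
```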
KeySet utilities:
- A Set projection of the map, analogous in spirit to the entrySet view that Map already provides: created with newKeySet() or newKeySet(int), or viewed with keySet(mappedValue) when only the keys matter.
A Set projection of a ConcurrentHashMap may be created (using newKeySet() or newKeySet(int)), or viewed (using keySet(Object) when only keys are of interest, and the mapped values are (perhaps transiently) not used or all take the same mapping value.
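Both forms mentioned in the javadoc can be sketched briefly (class name mine). newKeySet() builds a fresh concurrent Set; keySet(mappedValue) views an existing map as a Set whose add() inserts the key bound to the given default value:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class KeySetDemo {
    public static void main(String[] args) {
        // A concurrent Set backed by a ConcurrentHashMap<String, Boolean>.
        Set<String> set = ConcurrentHashMap.newKeySet();
        set.add("a");
        set.add("b");
        set.add("a"); // duplicate, ignored
        System.out.println(set.size()); // 2

        // Viewing an existing map: add() maps the new key to the default value 0.
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        ConcurrentHashMap.KeySetView<String, Integer> view = map.keySet(0);
        view.add("x");
        System.out.println(map.get("x")); // 0
    }
}
```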
Bulk operations under concurrency:
- The class supports a set of sequential and parallel bulk operations which, unlike most Stream methods, remain safe even while other threads are concurrently updating the map.
- Elements are not ordered in any particular way and may be processed in different orders in different parallel executions (much like HashMap's unspecified iteration order), so supplied functions must not depend on ordering.
- Except for forEach actions, the supplied functions should ideally be side-effect-free, and bulk operations on Map.Entry objects do not support the setValue method (preserving thread safety).
ConcurrentHashMaps support a set of sequential and parallel bulk operations that, unlike most Stream methods, are designed to be safely, and often sensibly, applied even with maps that are being concurrently updated by other threads; for example, when computing a snapshot summary of the values in a shared registry. There are three kinds of operation, each with four forms, accepting functions with Keys, Values, Entries, and (Key, Value) arguments and/or return values. Because the elements of a ConcurrentHashMap are not ordered in any particular way, and may be processed in different orders in different parallel executions, the correctness of supplied functions should not depend on any ordering, or on any other objects or values that may transiently change while computation is in progress; and except for forEach actions, should ideally be side-effect-free. Bulk operations on Map.Entry objects do not support method setValue.
Parallelism threshold:
- parallelismThreshold controls when a bulk operation runs in parallel: the method proceeds sequentially while the estimated map size is below the threshold.
- Long.MAX_VALUE suppresses all parallelism.
- 1 yields maximal parallelism, partitioning the work into enough subtasks to fully utilize ForkJoinPool.commonPool(), which runs all parallel computations.
These bulk operations accept a parallelismThreshold argument. Methods proceed sequentially if the current map size is estimated to be less than the given threshold. Using a value of Long.MAX_VALUE suppresses all parallelism. Using a value of 1 results in maximal parallelism by partitioning into enough subtasks to fully utilize the ForkJoinPool.commonPool() that is used for all parallel computations. Normally, you would initially choose one of these extreme values, and then measure performance of using in-between values that trade off overhead versus throughput.
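The two extreme threshold values can be sketched side by side (class name mine, using the real forEachValue bulk method):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

public class ThresholdDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        for (int i = 0; i < 100; i++) map.put("k" + i, i);

        // Long.MAX_VALUE: the estimated size can never reach the threshold,
        // so the traversal runs sequentially in the calling thread.
        LongAdder seq = new LongAdder();
        map.forEachValue(Long.MAX_VALUE, seq::add);

        // 1: maximal parallelism via ForkJoinPool.commonPool().
        LongAdder par = new LongAdder();
        map.forEachValue(1L, par::add);

        System.out.println(seq.sum() + " " + par.sum()); // 4950 4950
    }
}
```

Either way the result is identical; the threshold trades coordination overhead against throughput, which is why the javadoc suggests starting from an extreme value and measuring in-between ones.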
Per-element atomicity:
- In short: under concurrency, each per-element access is individually atomic (bearing a happens-before relation with the corresponding insertion or update), but a bulk operation is not atomic with respect to the map as a whole. This follows directly from the fine-grained, per-bin locking.
The concurrency properties of bulk operations follow from those of ConcurrentHashMap: Any non-null result returned from get(key) and related access methods bears a happens-before relation with the associated insertion or update. The result of any bulk operation reflects the composition of these per-element relations (but is not necessarily atomic with respect to the map as a whole unless it is somehow known to be quiescent). Conversely, because keys and values in the map are never null, null serves as a reliable atomic indicator of the current lack of any result. To maintain this property, null serves as an implicit basis for all non-scalar reduction operations. For the double, long, and int versions, the basis should be one that, when combined with any other value, returns that other value (more formally, it should be the identity element for the reduction). Most common reductions have these properties; for example, computing a sum with basis 0 or a minimum with basis MAX_VALUE
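The "basis" requirement in the quote above (it must be the identity element of the reduction) can be illustrated with the real reduceValuesToLong method (class name mine):

```java
import java.util.concurrent.ConcurrentHashMap;

public class BasisDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 3);
        map.put("b", 7);
        map.put("c", 5);
        // Basis 0 is the identity for sum: combining 0 with any value
        // returns that value, so empty partitions cannot skew the result.
        long sum = map.reduceValuesToLong(1L, Integer::longValue, 0L, Long::sum);
        // Likewise Long.MAX_VALUE is the identity for min.
        long min = map.reduceValuesToLong(1L, Integer::longValue,
                                          Long.MAX_VALUE, Long::min);
        System.out.println(sum + " " + min); // 15 3
    }
}
```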
A small tip
- Methods that accept and/or return Entry arguments maintain key-value associations and are useful, for example, when finding the key associated with the greatest value. A "plain" Entry argument can be supplied with new AbstractMap.SimpleEntry<>(k, v).
Methods accepting and/or returning Entry arguments maintain key-value associations. They may be useful for example when finding the key for the greatest value. Note that "plain" Entry arguments can be supplied using new AbstractMap.SimpleEntry(k,v).
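The exact use case the javadoc hints at, finding the key for the greatest value, works with the real reduceEntries bulk method; a detached "plain" entry is built with AbstractMap.SimpleEntry (class name mine):

```java
import java.util.AbstractMap;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class EntryReduceDemo {
    public static void main(String[] args) {
        ConcurrentHashMap<String, Integer> map = new ConcurrentHashMap<>();
        map.put("a", 3);
        map.put("b", 9);
        map.put("c", 5);
        // reduceEntries keeps the entry with the larger value at each step
        // and returns the survivor (or null for an empty map).
        Map.Entry<String, Integer> max = map.reduceEntries(1L,
                (e1, e2) -> e1.getValue() >= e2.getValue() ? e1 : e2);
        System.out.println(max.getKey()); // b

        // A "plain" detached entry, as the javadoc suggests:
        Map.Entry<String, Integer> plain = new AbstractMap.SimpleEntry<>("d", 1);
        System.out.println(plain.getKey() + "=" + plain.getValue()); // d=1
    }
}
```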
The remaining two paragraphs of the javadoc add little for our purposes, so I have left them out.
ConcurrentHashMap Source Code Analysis
While reading the source, the focus should be on how high concurrency is achieved (the locking scheme). Much of this part draws on external references; one article in particular, which I recommend, describes the differences between the JDK 7 and JDK 8 versions of the class.
Per-bin locking:
- Updates lock each bucket individually by synchronizing on its first node, combined with CAS operations and volatile fields for visibility (TODO: compare synchronized, Lock, and volatile in detail). Note that volatile by itself is not a lock; it only guarantees visibility and ordering.
- nextTable is the staging table used during resizing: when table needs to grow, entries are first transferred into nextTable, and afterwards table = nextTable.
transient volatile Node<K,V>[] table;
private transient volatile Node<K,V>[] nextTable;
Node
- The basic hash entry; its setValue method is not supported and throws immediately.
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
// volatile guarantees visibility and forbids instruction reordering
volatile V val;
volatile Node<K,V> next;
Node(int hash, K key, V val, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.val = val;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return val; }
public final int hashCode() { return key.hashCode() ^ val.hashCode(); }
public final String toString(){ return key + "=" + val; }
// Not supported; throws immediately
public final V setValue(V value) {
throw new UnsupportedOperationException();
}
public final boolean equals(Object o) {
Object k, v, u; Map.Entry<?,?> e;
return ((o instanceof Map.Entry) &&
(k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
(v = e.getValue()) != null &&
(k == key || k.equals(key)) &&
(v == (u = val) || v.equals(u)));
}
/**
* Virtualized support for map.get(); overridden in subclasses.
*/
// Used by map.get(); overridden in subclasses
Node<K,V> find(int h, Object k) {
Node<K,V> e = this;
if (k != null) {
do {
K ek;
if (e.hash == h &&
((ek = e.key) == k || (ek != null && k.equals(ek))))
return e;
} while ((e = e.next) != null);
}
return null;
}
}
TreeBin
- As in HashMap, a bucket whose list reaches 8 nodes is converted to a red-black tree (provided the table is large enough); TreeBin is the class that holds that tree, and it also extends Node.
- A small set of flag constants tracks the bin's concurrent lock state.
static final class TreeBin<K,V> extends Node<K,V> {
TreeNode<K,V> root;
volatile TreeNode<K,V> first;
volatile Thread waiter;
// lock state
volatile int lockState;
// values for lockState: the lock-state flag bits
static final int WRITER = 1; // set while holding write lock
static final int WAITER = 2; // set when waiting for write lock
static final int READER = 4; // increment value for setting read lock
/**
* Tie-breaking utility for ordering insertions when equal
* hashCodes and non-comparable. We don't require a total
* order, just a consistent insertion rule to maintain
* equivalence across rebalancings. Tie-breaking further than
* necessary simplifies testing a bit.
*/
// Tie-breaker used when hash codes are equal and the keys are not mutually Comparable
static int tieBreakOrder(Object a, Object b) {
int d;
if (a == null || b == null ||
(d = a.getClass().getName().
compareTo(b.getClass().getName())) == 0)
d = (System.identityHashCode(a) <= System.identityHashCode(b) ?
-1 : 1);
return d;
}
/**
* Creates bin with initial set of nodes headed by b.
*/
// Builds the red-black tree from the node list headed by b
TreeBin(TreeNode<K,V> b) {
super(TREEBIN, null, null, null);
this.first = b;
TreeNode<K,V> r = null;
for (TreeNode<K,V> x = b, next; x != null; x = next) {
next = (TreeNode<K,V>)x.next;
x.left = x.right = null;
if (r == null) {
x.parent = null;
x.red = false;
r = x;
}
else {
K k = x.key;
int h = x.hash;
Class<?> kc = null;
for (TreeNode<K,V> p = r;;) {
int dir, ph;
K pk = p.key;
if ((ph = p.hash) > h)
dir = -1;
else if (ph < h)
dir = 1;
else if ((kc == null &&
(kc = comparableClassFor(k)) == null) ||
(dir = compareComparables(kc, k, pk)) == 0)
dir = tieBreakOrder(k, pk);
TreeNode<K,V> xp = p;
if ((p = (dir <= 0) ? p.left : p.right) == null) {
x.parent = xp;
if (dir <= 0)
xp.left = x;
else
xp.right = x;
r = balanceInsertion(r, x);
break;
}
}
}
}
this.root = r;
assert checkInvariants(root);
}
/**
* Acquires write lock for tree restructuring.
*/
// Acquires the write lock via CAS; falls back to contendedLock() under contention
private final void lockRoot() {
if (!U.compareAndSwapInt(this, LOCKSTATE, 0, WRITER))
contendedLock(); // offload to separate method
}
/**
* Releases write lock for tree restructuring.
*/
// Releases the write lock
private final void unlockRoot() {
lockState = 0;
}
/**
* Possibly blocks awaiting root lock.
*/
// Spins trying to CAS the write lock; under contention it sets the WAITER bit,
// records the current thread in waiter, and parks until the lock is released
private final void contendedLock() {
boolean waiting = false;
for (int s;;) {
if (((s = lockState) & ~WAITER) == 0) {
if (U.compareAndSwapInt(this, LOCKSTATE, s, WRITER)) {
if (waiting)
waiter = null;
return;
}
}
else if ((s & WAITER) == 0) {
if (U.compareAndSwapInt(this, LOCKSTATE, s, s | WAITER)) {
waiting = true;
waiter = Thread.currentThread();
}
}
else if (waiting)
LockSupport.park(this);
}
}
Resizing
- ConcurrentHashMap resizes differently from HashMap: the instance field nextTable holds the new array while a resize is in progress; the larger table is allocated there first, and the existing entries are then transferred over before table = nextTable takes effect.
- HashMap allocates its new table as a local variable inside the resize method; ConcurrentHashMap instead exposes the in-progress table through the shared volatile nextTable field. This costs a field of space, but it lets every thread observe the resize state and coordinate on it safely.
- This design keeps resizing safe under concurrency: an insert can trigger a resize, and under heavy load several threads may try to trigger one at the same time. Initiation is coordinated through a CAS on the sizeCtl field, so only one thread starts a given resize; other threads that encounter a ForwardingNode during the transfer do not simply wait but help migrate buckets, and all of them proceed on the new table once the transfer completes.