Why is LongAdder so much faster than AtomicLong?

This article first introduces AtomicLong, then compares the performance of AtomicLong and LongAdder, explains how LongAdder is implemented, and finally leads into LongAccumulator.

The atomic variable class AtomicLong

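The JUC concurrency package contains atomic operation classes such as AtomicInteger, AtomicLong, and AtomicBoolean. They all work on the same principle, so this section focuses on the internals of AtomicLong.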

Key fields

private static final Unsafe U = Unsafe.getUnsafe();
private static final long VALUE
    = U.objectFieldOffset(AtomicLong.class, "value");

private volatile long value;

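AtomicLong uses the Unsafe object to update value atomically with CAS. Note that a true lock-free 64-bit CAS is only available where the platform supports it, which is typically the case on 64-bit virtual machines. On platforms without it (some 32-bit virtual machines), the JVM emulates the CAS with a lock underneath.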

Atomic operations

// AtomicLong: delegates to Unsafe and returns the updated value
public final long incrementAndGet() {
    return U.getAndAddLong(this, VALUE, 1L) + 1L;
}

// Unsafe: read the current value, then retry the CAS until it succeeds
@IntrinsicCandidate
public final long getAndAddLong(Object o, long offset, long delta) {
    long v;
    do {
        v = getLongVolatile(o, offset);
    } while (!weakCompareAndSetLong(o, offset, v, v + delta));
    return v;
}

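AtomicLong's atomic add is built on Unsafe: it first reads the current value of the target field, then uses CAS to install the incremented value, retrying if another thread changed the value in between. The other methods work in a similar way.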
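The read-then-CAS retry loop can also be written against the public AtomicLong API. The following is only a minimal sketch of the pattern, not the JDK's implementation; the class name CasLoopSketch and the helper addAndGetWithCas are made up for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

final class CasLoopSketch {
    // Hypothetical helper: adds delta using an explicit read / compare-and-set retry loop.
    static long addAndGetWithCas(AtomicLong counter, long delta) {
        long current;
        long next;
        do {
            current = counter.get();   // volatile read of the current value
            next = current + delta;    // the value we would like to install
        } while (!counter.compareAndSet(current, next)); // retry if another thread got there first
        return next;
    }

    public static void main(String[] args) {
        AtomicLong counter = new AtomicLong(40);
        System.out.println(addAndGetWithCas(counter, 2)); // prints 42
    }
}
```

Under contention every failed compareAndSet costs another trip around the loop, which is the behavior the rest of this article is concerned with.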

AtomicLong usage example

import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicLong;

public class Main {
    private static final AtomicLong atomicLong = new AtomicLong();
    private static Integer[] array1=new Integer[]{0 , 1 , 2 , 3 , 0 , 5 , 6 , 0 , 56 , 0};
    private static Integer[] array2=new Integer[]{10 , 1 , 2 , 3 , 0 , 5 , 6 , 0 , 56 , 0};
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch countDownLatch = new CountDownLatch(2);
        Thread thread1=new Thread(()->{
            final Integer[] array=array1;
            for(int i=0;i<array.length;i++) {
                atomicLong.addAndGet(array[i]);
            }
            countDownLatch.countDown();
        });
        Thread thread2=new Thread(()->{
            final Integer[] array=array2;
            for(int i=0;i<array.length;i++) {
                atomicLong.addAndGet(array[i]);
            }
            countDownLatch.countDown();
        });
        thread1.start();
        thread2.start();
        countDownLatch.await();
        System.out.println(atomicLong.get());
    }
}

The final result: 156 (73 from array1 plus 83 from array2).

Comparing AtomicLong and LongAdder

        public static <T> void test(
                Supplier<T> supplier,
                Consumer<T> add
        ){
            T t = supplier.get();
            List<Thread> threadList=new ArrayList<>();
            for (int i=0;i<5000;i++){
                threadList.add(new Thread(()->{
                    for (int k=0;k<50000;k++){
                        add.accept(t);
                    }
                }));
            }
            long start = System.nanoTime();
            threadList.forEach(Thread::start);
            // wait for every worker to finish before stopping the clock
            threadList.forEach(thread -> {
                try {
                    thread.join();
                } catch (InterruptedException e) {
                    throw new RuntimeException(e);
                }
            });
            long end = System.nanoTime();
            // elapsed time in milliseconds
            System.out.println(t + " cost: " + (end - start) / 1_000_000 + " ms");
        }


        public static void main(String[] args) throws ExecutionException, InterruptedException {
            for (int i=0;i<4;i++){
                test(AtomicInteger::new, AtomicInteger::getAndIncrement);
            }
            System.out.println();
            for (int i=0;i<4;i++){
                test(LongAdder::new,LongAdder::increment);
            }
        }

Final result:

The results show that LongAdder is roughly 4 times as fast as AtomicInteger.

So why is LongAdder so much more efficient than AtomicInteger?

Atomicity in AtomicInteger

Looking at its source, we find that it uses Unsafe's getAndAddInt:

        public final int getAndAddInt(Object o, long offset, int delta) {
            int v;
            do {
                v = getIntVolatile(o, offset);
            } while (!weakCompareAndSetInt(o, offset, v, v + delta));
            return v;
        }

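This is a spin plus CAS loop. When a large number of threads operate on the same AtomicInteger at once, many of them fail the CAS and keep spinning at the same time. All of that spinning burns CPU cycles without making progress.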

Why is LongAdder so fast?

The weakness of AtomicInteger is now obvious: a large number of threads compete for the same memory address, each trying to add 1 to it.

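LongAdder instead takes a "striped accumulation" approach: it spreads the threads across different stripes, trading space for time and dispersing the hot spot.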

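Internally, LongAdder has a core base value and a Cell[] array (the cell array), both inherited from the Striped64 class. Each Cell holds a long variable, and a new Cell is created already holding the value being added. With the same level of concurrency, fewer threads compete to update any single variable, which effectively reduces contention on the shared resource.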

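Moreover, when several threads compete for the same Cell and one of them fails, it does not keep spinning on that Cell with CAS retries. It rehashes and tries its CAS on a different Cell, which raises the chance that the retry succeeds.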

Finally, to read the current value of a LongAdder, the values of all Cells are summed and base is added to the result.
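To make the base-plus-cells idea concrete, here is a deliberately simplified sketch. It is not how Striped64 is written: the class StripedCounterSketch, its fixed-size AtomicLongArray, and the random choice of cell are all assumptions made for illustration (the real class grows its table lazily and uses a per-thread probe).

```java
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.atomic.AtomicLong;
import java.util.concurrent.atomic.AtomicLongArray;

final class StripedCounterSketch {
    private final AtomicLong base = new AtomicLong();
    private final AtomicLongArray cells; // fixed size here; LongAdder grows its table lazily

    StripedCounterSketch(int stripes) {
        cells = new AtomicLongArray(stripes);
    }

    void add(long x) {
        long b = base.get();
        if (base.compareAndSet(b, b + x)) {
            return; // uncontended fast path: the update landed on base
        }
        // Contended: spread the update onto a randomly chosen cell instead of spinning on base.
        int index = ThreadLocalRandom.current().nextInt(cells.length());
        cells.addAndGet(index, x);
    }

    long sum() {
        long sum = base.get();
        for (int i = 0; i < cells.length(); i++) {
            sum += cells.get(i);
        }
        return sum;
    }

    // Hypothetical driver: several threads each add 1 perThread times.
    static long runDemo(int threads, int perThread) {
        StripedCounterSketch counter = new StripedCounterSketch(8);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    counter.add(1);
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return counter.sum();
    }

    public static void main(String[] args) {
        System.out.println(runDemo(4, 1000)); // prints 4000
    }
}
```

No update is lost because every add lands either on base or on exactly one cell, and sum() reads all of them after the writers have finished.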

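    /** Number of CPU cores, used to bound the table size */
    static final int NCPU = Runtime.getRuntime().availableProcessors();

    /**
     * Table of cells. When non-null, its length is a power of 2; it is created lazily when needed
     */
    transient volatile Cell[] cells;

    /**
     * Base value, used mainly when there is no contention, and also as a fallback during table initialization races. Updated via CAS.
     */
    transient volatile long base;

    /**
     * Spinlock (locked via CAS) used when resizing and/or creating Cells.
     */
    transient volatile int cellsBusy;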

The Cell class

// The @Contended annotation is used to avoid false sharing
@jdk.internal.vm.annotation.Contended static final class Cell {
    // volatile field holding the current value of this Cell
    volatile long value;
    Cell(long x) { value = x; }
    // update the value with CAS
    final boolean cas(long cmp, long val) {
        return VALUE.weakCompareAndSetRelease(this, cmp, val);
    }
    final void reset() {
        VALUE.setVolatile(this, 0L);
    }
    final void reset(long identity) {
        VALUE.setVolatile(this, identity);
    }
    final long getAndSet(long val) {
        return (long)VALUE.getAndSet(this, val);
    }

    // VarHandle mechanics
    private static final VarHandle VALUE;
    static {
        try {
            MethodHandles.Lookup l = MethodHandles.lookup();
            VALUE = l.findVarHandle(Cell.class, "value", long.class);
        } catch (ReflectiveOperationException e) {
            throw new ExceptionInInitializerError(e);
        }
    }
}

Reading the value of a LongAdder

LongAdder produces its final result through the sum method.

public long sum() {
    Cell[] cs = cells;
    long sum = base;
    if (cs != null) {
        for (Cell c : cs)
            if (c != null)
                sum += c.value;
    }
    return sum;
}

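In essence, this method adds up base and the values of all Cells and returns the total. Because it takes no lock, the Cell array may be modified or expanded by other threads while the method is running.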

The total is therefore not guaranteed to be exact at the instant of the call, and it is not an atomic snapshot.
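In practice this means sum() is best read once the writers are known to have finished. A small sketch under that assumption, using the real LongAdder class; the class name LongAdderSumSketch and the helper sumAfterJoin are made up for illustration.

```java
import java.util.concurrent.atomic.LongAdder;

final class LongAdderSumSketch {
    // Hypothetical helper: runs `threads` writers, each incrementing `perThread` times,
    // and calls sum() only after every writer has been joined.
    static long sumAfterJoin(int threads, int perThread) {
        LongAdder adder = new LongAdder();
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                for (int k = 0; k < perThread; k++) {
                    adder.increment();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            try {
                worker.join(); // after join, this worker's updates are visible to us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(e);
            }
        }
        return adder.sum(); // no concurrent updates remain, so the total is exact
    }

    public static void main(String[] args) {
        System.out.println(sumAfterJoin(4, 10_000)); // prints 40000
    }
}
```

A sum() taken while the workers are still running would only be a moment-in-time estimate somewhere between 0 and the final total.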

Resetting a LongAdder

public void reset() {
    Cell[] cs = cells;
    base = 0L;
    if (cs != null) {
        for (Cell c : cs)
            if (c != null)
                c.reset();
    }
}

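The logic of this method is simply to reset base and every non-null Cell to 0. Note that this method takes no lock either, so do not call it while other threads are still updating the LongAdder.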
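As a small illustration of resetting in a quiescent state, the sketch below uses LongAdder's sumThenReset(), which reads the total and resets in one call. It is single-threaded on purpose; the class name LongAdderResetSketch and the helper drain are made up for illustration.

```java
import java.util.concurrent.atomic.LongAdder;

final class LongAdderResetSketch {
    // Returns { total drained, total left after the reset }.
    static long[] drain() {
        LongAdder adder = new LongAdder();
        adder.add(5);
        adder.add(7);
        long drained = adder.sumThenReset(); // read the total and reset; no other thread is updating
        return new long[] { drained, adder.sum() };
    }

    public static void main(String[] args) {
        long[] result = drain();
        System.out.println(result[0]); // prints 12
        System.out.println(result[1]); // prints 0
    }
}
```

Like reset(), sumThenReset() takes no lock, so updates that race with it may be counted in neither the returned total nor the value left behind.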

Atomic add in LongAdder

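 public void add(long x) {
        Cell[] cs; long b, v; int m; Cell c;
        // First check: while cells is null, update base directly with CAS
        if ((cs = cells) != null || !casBase(b = base, b + x)) {
            // Reaching this point means one of two things:
            // 1. the cells array is already initialized (contention happened earlier)
            // 2. cells is not initialized yet, but the casBase on base failed (first contention)

            int index = getProbe(); // the current thread's probe (hash), used to pick a slot in cells
            boolean uncontended = true; // optimistic flag: assume the selected Cell is uncontended

            // Second check: try the "Cell path"
            if (cs == null || // case 1: cells is not initialized (we got here via a failed casBase)
                (m = cs.length - 1) < 0 || // case 2: cells has length 0 (defensive check)
                (c = cs[index & m]) == null || // case 3: the slot we hashed to is empty
                !(uncontended = c.cas(v = c.value, v + x))) // case 4: the CAS on the selected Cell failed
            {
                // if any of the four conditions holds, fall through to the last resort
                longAccumulate(x, null, uncontended, index);
            }
        }
        // If the outer condition is false, cells is null and casBase succeeded, so the method returns immediately. This is the fastest, uncontended path.
    }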

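When cells is null, there has been no contention, and LongAdder behaves like AtomicLong: it updates base directly with CAS.
When cells is non-null, or the CAS on base fails, contention has occurred. If the Cell for the current thread exists, LongAdder tries to CAS that Cell, which succeeds as long as contention is low, that is, nobody else is competing for that Cell.
When cells is not initialized, or the corresponding Cell does not exist, or the CAS on the Cell fails, the final fallback longAccumulate is called.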

longAccumulate works as follows:

     final void longAccumulate(long x, LongBinaryOperator fn,
                                  boolean wasUncontended, int index) {
            if (index == 0) {
                ThreadLocalRandom.current(); // force initialization of the thread's random seed and probe
                index = getProbe(); // re-read the probe (hash), which selects the Cell to update
                wasUncontended = true; // start over, marked as uncontended
            }
            // loop until the update succeeds
            for (boolean collide = false;;) {       // True if last slot nonempty
                Cell[] cs; Cell c; int n; long v;
                // Branch 1: the cells array already exists
                if ((cs = cells) != null && (n = cs.length) > 0) {
                    // Branch 1.1: the selected slot is empty
                    if ((c = cs[(n - 1) & index]) == null) {
                        // first check whether the lock is held
                        if (cellsBusy == 0) {       // Try to attach new Cell
                            // create the new Cell ahead of time
                            Cell r = new Cell(x);   // Optimistically create
                            // check the lock again and try to acquire it with CAS
                            if (cellsBusy == 0 && casCellsBusy()) {
                                try {               // Recheck under lock
                                    Cell[] rs; int m, j;
                                     // double check, in case another thread already created the Cell
                                    if ((rs = cells) != null &&
                                        (m = rs.length) > 0 &&
                                        rs[j = (m - 1) & index] == null) {
                                        rs[j] = r; // install the Cell in the slot
                                        break; // success, leave the loop
                                    }
                                } finally {
                                    // release the lock
                                    cellsBusy = 0;
                                }
                                continue;   // the slot was taken in the meantime, retry
                            }
                        }
                        collide = false;
                    }
                    // Branch 1.2: an earlier CAS already failed, so rehash first
                    else if (!wasUncontended)       // CAS already known to fail
                        wasUncontended = true;      // mark as uncontended and retry after the rehash
                    // Branch 1.3: try to accumulate with CAS
                    else if (c.cas(v = c.value,
                                   (fn == null) ? v + x : fn.applyAsLong(v, x)))
                        break; // CAS succeeded, accumulation done
                    // Branch 1.4: the array is at maximum size or is stale
                    else if (n >= NCPU || cells != cs)
                        collide = false;           // do not expand
                    // Branch 1.5: first collision, just record it
                    else if (!collide)
                        collide = true;
                    // Branch 1.6: acquire the lock and expand
                    else if (cellsBusy == 0 && casCellsBusy()) {
                        try {
                            if (cells == cs)
                                // double the size
                                cells = Arrays.copyOf(cs, n << 1);
                        } finally {
                            cellsBusy = 0;
                        }
                        collide = false;
                        continue;                   // Retry with expanded table
                    }
                    index = advanceProbe(index);    // rehash before the next attempt
                }
                // Branch 2: the cells array is not initialized yet
                else if (cellsBusy == 0 && cells == cs && casCellsBusy()) {
                    try {
                        // double check
                        if (cells == cs) {
                            // the initial size is 2
                            Cell[] rs = new Cell[2];
                            // create the Cell in the slot for this thread
                            rs[index & 1] = new Cell(x);
                            cells = rs;
                            break;
                        }
                    } finally {
                        cellsBusy = 0;
                    }
                }
                // Branch 3: fall back to a CAS on base
                else if (casBase(v = base,
                                 (fn == null) ? v + x : fn.applyAsLong(v, x)))
                    break;
            }
        }

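The code above shows the conditions for expanding the cells array:

The size of the cells array is smaller than the number of CPU cores
A thread fails to CAS one Cell, switches to another Cell, and fails again

Each expansion doubles the previous size.

After a failed CAS, the thread recomputes its random probe value, threadLocalRandomProbe, to reduce the chance of a collision the next time it accesses an element of cells.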
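The growth rule above implies an upper bound on the table size: starting from 2 and doubling only while the size is below NCPU, the table stops at the smallest power of two that is at least NCPU. A minimal sketch of that arithmetic follows; the class CellTableGrowthSketch and the helper maxTableSize are hypothetical names, not JDK code.

```java
final class CellTableGrowthSketch {
    // Largest table size reachable under the rule "start at 2, double while n < ncpu".
    static int maxTableSize(int ncpu) {
        int n = 2;            // the cells array is created with length 2
        while (n < ncpu) {    // expansion is refused once n >= NCPU
            n <<= 1;          // each expansion doubles the length
        }
        return n;
    }

    public static void main(String[] args) {
        int ncpu = Runtime.getRuntime().availableProcessors();
        System.out.println("NCPU=" + ncpu + ", max cells=" + maxTableSize(ncpu));
    }
}
```

For example, a machine reporting 6 processors would top out at 8 cells, and one reporting 12 would top out at 16.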

When the Cell array exists: only Cells are updated

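    cells != null
        ↓
    locate Cell c = cells[index & (n-1)]
        ↓
    branches:
    ├── 1. c == null               → create a new Cell and install it
    ├── 2. an earlier Cell CAS failed (wasUncontended=false) → rehash, mark as uncontended
    ├── 3. try c.cas(v, v+x) to update the Cell → exit on success
    ├── 4. array at maximum size (NCPU) or stale → do not expand, rehash
    ├── 5. no collision recorded yet (!collide) → record the collision, rehash
    └── 6. collision already recorded (collide=true) → second CAS collision: acquire the lock, double the array
        ↓
    after each failure: index = advanceProbe(index)  // rehash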
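The slot selection and rehash steps in the flow above come down to a bit mask and a pseudo-random step. The sketch below mirrors them outside the JDK. The xorshift constants (13, 17, 5) are what I believe OpenJDK's advanceProbe uses, so treat them as an assumption; the class ProbeSketch and its helpers slotFor and nextProbe are made-up names.

```java
final class ProbeSketch {
    // Slot selection: because the table length is a power of two,
    // masking with (length - 1) is equivalent to a modulo.
    static int slotFor(int probe, int tableLength) {
        return probe & (tableLength - 1);
    }

    // Rehash step (assumed xorshift with shifts 13, 17, 5): produces a new
    // pseudo-random probe so that the next attempt may land on another slot.
    static int nextProbe(int probe) {
        probe ^= probe << 13;
        probe ^= probe >>> 17;
        probe ^= probe << 5;
        return probe;
    }

    public static void main(String[] args) {
        int probe = 0x9E3779B9; // arbitrary non-zero starting value for the demo
        for (int attempt = 0; attempt < 4; attempt++) {
            System.out.println("attempt " + attempt + " -> slot " + slotFor(probe, 8));
            probe = nextProbe(probe);
        }
    }
}
```

A rehash does not guarantee a different slot; it only makes repeated collisions on the same Cell less likely.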

When the Cell array does not exist: create the cells array and set the initial value of the corresponding Cell directly

    cells == null
        ↓
    try to acquire the lock (cellsBusy)
        ↓
    success:
        Cell[] rs = new Cell[2];   // initial size is 2
        rs[index&1] = new Cell(x); // install in the current thread's slot
        cells = rs;
        ↓
    failure: fall back to a CAS on base

After failing to acquire the lock: fall back to updating base

    CAS on cellsBusy fails
        ↓
    fall back to a CAS on base
        ↓
    success: exit the loop
    failure: loop again

False sharing

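False sharing is a "hidden killer" of performance caused by the cache system on multi-core CPUs. It occurs when several unrelated variables are loaded into the same CPU cache line: when one thread modifies one of them, the whole cache line is invalidated as collateral damage, which slows down reads and writes by other threads.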

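To bridge the speed gap between the CPU and main memory, modern CPUs use multiple levels of cache (L1, L2, L3). Data moves between cache and memory not byte by byte but in fixed-size blocks. Such a block is called a cache line and is usually 64 bytes.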

A scenario where false sharing occurs:

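Suppose two independent variables a and b happen to have addresses in the same 64-byte cache line. When thread T1 on CPU core 1 modifies a frequently, it takes exclusive ownership of the cache line and invalidates the copies held by other cores. Thread T2 on CPU core 2, even if it only wants to read b, finds its local copy of the cache line invalidated and must reload it from the slower L3 cache or main memory. This pointless cache-line contention, triggered by unrelated data, is false sharing.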

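In essence: the variables themselves are not logically shared, but the physical cache line that holds them is, and performance drops as a result.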

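Solution: use @Contended (sun.misc.Contended in JDK 8; jdk.internal.vm.annotation.Contended, as in the Cell source above, since JDK 9)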

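In LongAdder, the Cells are kept in an array and are stored next to each other in memory. A Cell takes 24 bytes (a 16-byte object header plus the 8-byte value), so one cache line can hold 2 Cell objects. This is where the problem arises: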

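Core-0 modifies Cell[0]

Core-1 wants to modify Cell[1]

Whichever core's update succeeds, the other core's copy of the cache line is invalidated and has to be re-read from memory. The annotation is therefore placed on Cell so that each cache line holds only one Cell, and updating one Cell no longer affects the others.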
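The cache-line arithmetic behind this can be checked with a few lines of code. Java does not expose real object addresses, so the offsets below are hypothetical; the 64-byte line size and the 128-byte padded stride are assumptions for illustration, and the class CacheLineSketch is a made-up name.

```java
final class CacheLineSketch {
    static final int CACHE_LINE_BYTES = 64; // assumed cache line size

    // Index of the cache line that contains the given (hypothetical) byte offset.
    static long cacheLineOf(long offset) {
        return offset / CACHE_LINE_BYTES;
    }

    // True when both offsets fall into the same cache line.
    static boolean shareLine(long offsetA, long offsetB) {
        return cacheLineOf(offsetA) == cacheLineOf(offsetB);
    }

    public static void main(String[] args) {
        // Two unpadded 24-byte cells laid out back to back: offsets 0 and 24.
        System.out.println(shareLine(0, 24));  // prints true
        // The same two cells with an assumed padded stride of 128 bytes: offsets 0 and 128.
        System.out.println(shareLine(0, 128)); // prints false
    }
}
```

With padding, a write to one cell no longer invalidates the line that holds its neighbor, at the cost of extra memory per cell.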

How LongAccumulator works

LongAdder is a special case of LongAccumulator, which is the more powerful of the two.

Constructor

public LongAccumulator(LongBinaryOperator accumulatorFunction,
                       long identity) {
    this.function = accumulatorFunction;
    base = this.identity = identity;
}

identity is the initial value of the LongAccumulator.

LongBinaryOperator is a functional interface that returns a result computed from its two arguments:

@FunctionalInterface
public interface LongBinaryOperator {
    long applyAsLong(long left, long right);
}

Using LongAdder is equivalent to using the following LongAccumulator:

LongAccumulator longAccumulator=new LongAccumulator((long left,long right)->{
    return left+right;
},0);

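Compared with LongAdder, LongAccumulator lets you give the accumulator a non-zero initial value, whereas LongAdder always starts from 0. LongAccumulator also lets you specify the accumulation rule, for example multiplying instead of adding: just pass a custom binary operator when constructing the LongAccumulator. LongAdder has addition built in.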
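Both differences can be tried out with a few lines. This sketch is single-threaded to keep the results easy to follow; the class AccumulatorRulesSketch and the helpers product and sumFrom are made-up names.

```java
import java.util.concurrent.atomic.LongAccumulator;

final class AccumulatorRulesSketch {
    // Custom rule: multiply instead of add. 1 is the identity for multiplication.
    static long product(long... values) {
        LongAccumulator acc = new LongAccumulator((left, right) -> left * right, 1L);
        for (long v : values) {
            acc.accumulate(v);
        }
        return acc.get();
    }

    // Built-in style rule (addition), but starting from a non-zero initial value.
    static long sumFrom(long initial, long... values) {
        LongAccumulator acc = new LongAccumulator(Long::sum, initial);
        for (long v : values) {
            acc.accumulate(v);
        }
        return acc.get();
    }

    public static void main(String[] args) {
        System.out.println(product(2, 3, 4));    // prints 24
        System.out.println(sumFrom(100, 5, 6));  // prints 111
    }
}
```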

Atomicity of accumulation

Its atomicity is implemented almost identically to LongAdder's:

public void accumulate(long x) {
    Cell[] cs; long b, v, r; int m; Cell c;
    if ((cs = cells) != null
        || ((r = function.applyAsLong(b = base, x)) != b
            && !casBase(b, r))) {
        int index = getProbe();
        boolean uncontended = true;
        if (cs == null
            || (m = cs.length - 1) < 0
            || (c = cs[index & m]) == null
            || !(uncontended =
                 (r = function.applyAsLong(v = c.value, x)) == v
                 || c.cas(v, r)))
            longAccumulate(x, function, uncontended, index);
    }
}

The only differences are these: when calling casBase, LongAdder passes b + x, while LongAccumulator passes the result of applying the custom binary operator to b and x.

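And when calling longAccumulate, LongAdder passes null, while LongAccumulator passes the custom function, so the value written is the result of applying that function to the current value (of base or of the Cell) and x.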

Reading the value of a LongAccumulator

public long get() {
    Cell[] cs = cells;
    long result = base;
    if (cs != null) {
        for (Cell c : cs)
            if (c != null)
                result = function.applyAsLong(result, c.value);
    }
    return result;
}

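As you can see, the logic is similar to LongAdder's: the function is applied across the intermediate result and the value of every Cell. Note that for the returned value to be mathematically correct, the supplied function must be associative and commutative.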
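Why the requirement matters can be seen without any threads at all. Updates may be folded into different Cells and combined later, so the grouping of the operands is not under the caller's control. The sketch below compares two groupings for a well-behaved function (max) and for one that is neither associative nor commutative (subtraction); the class CombineOrderSketch and its helpers are made-up names.

```java
import java.util.function.LongBinaryOperator;

final class CombineOrderSketch {
    static final LongBinaryOperator MAX = Math::max;
    static final LongBinaryOperator SUBTRACT = (left, right) -> left - right;

    // Grouping 1: ((a op b) op c), as if every update hit the same variable in order.
    static long leftGrouped(LongBinaryOperator fn, long a, long b, long c) {
        return fn.applyAsLong(fn.applyAsLong(a, b), c);
    }

    // Grouping 2: (a op (b op c)), as if b and c were first combined in another Cell.
    static long rightGrouped(LongBinaryOperator fn, long a, long b, long c) {
        return fn.applyAsLong(a, fn.applyAsLong(b, c));
    }

    public static void main(String[] args) {
        System.out.println(leftGrouped(MAX, 10, 3, 2));       // prints 10
        System.out.println(rightGrouped(MAX, 10, 3, 2));      // prints 10
        System.out.println(leftGrouped(SUBTRACT, 10, 3, 2));  // prints 5
        System.out.println(rightGrouped(SUBTRACT, 10, 3, 2)); // prints 9
    }
}
```

With max both groupings agree, so the result does not depend on which Cell each update landed in. With subtraction they disagree, so the value returned by get() would depend on scheduling.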

LongAccumulator example: tracking the maximum

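import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.LongAccumulator;

public class Main {
    // Keeps the largest value seen so far; Long.MIN_VALUE is the identity for max
    private static final LongAccumulator longAccumulator =
            new LongAccumulator(Math::max, Long.MIN_VALUE);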
    private static Integer[] array1=new Integer[]{0 , 1 , 2 , 3 , 0 , 5 , 6 , 0 , 56 , 0};
    private static Integer[] array2=new Integer[]{10 , 1 , 2 , 3 , 0 , 521 , 6 , 0 , 56 , 0};
    public static void main(String[] args) throws InterruptedException {
        CountDownLatch countDownLatch = new CountDownLatch(2);
        Thread thread1=new Thread(()->{
            final Integer[] array=array1;
            for(int i=0;i<array.length;i++) {
                longAccumulator.accumulate(array[i]);
            }
            countDownLatch.countDown();
        });
        Thread thread2=new Thread(()->{
            final Integer[] array=array2;
            for(int i=0;i<array.length;i++) {
                longAccumulator.accumulate(array[i]);
            }
            countDownLatch.countDown();
        });
        thread1.start();
        thread2.start();
        countDownLatch.await();
        System.out.println(longAccumulator.get());
    }
}