开启掘金成长之旅！这是我参与「掘金日新计划 · 12 月更文挑战」的第27天，点击查看活动详情

并发编程-共享模型之无锁（一）介绍了无锁。并发编程-共享模型之无锁（二）介绍了原子性的原子整数。本篇介绍原子性的原子数组。

原子数组

AtomicIntegerArray
AtomicLongArray
AtomicReferenceArray

有如下方法

/**
 参数1，提供数组、可以是线程不安全数组或线程安全数组
 参数2，获取数组长度的方法
 参数3，自增方法，回传 array, index
 参数4，打印数组的方法
*/
// supplier 提供者 无中生有 ()->结果
// function 函数 一个参数一个结果 (参数)->结果 , BiFunction (参数1,参数2)->结果
// consumer 消费者 一个参数没结果 (参数)->void, BiConsumer (参数1,参数2)->
private static <T> void demo(
    Supplier<T> arraySupplier,
    Function<T, Integer> lengthFun,
    BiConsumer<T, Integer> putConsumer,
    Consumer<T> printConsumer ) {
    List<Thread> ts = new ArrayList<>();
    T array = arraySupplier.get();
    int length = lengthFun.apply(array);
    for (int i = 0; i < length; i++) {
        // 每个线程对数组作 10000 次操作
        ts.add(new Thread(() -> {
            for (int j = 0; j < 10000; j++) {
                putConsumer.accept(array, j%length);
            }
        }));
    }
    ts.forEach(t -> t.start()); // 启动所有线程
    ts.forEach(t -> {
        try {
            t.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    }); // 等所有线程结束
    printConsumer.accept(array);
}

不安全的数组

demo(
    ()->new int[10],
    (array)->array.length,
    (array, index) -> array[index]++,
    array-> System.out.println(Arrays.toString(array))
);

结果

[9870, 9862, 9774, 9697, 9683, 9678, 9679, 9668, 9680, 9698]

安全的数组

demo(
    ()-> new AtomicIntegerArray(10),
    (array) -> array.length(),
    (array, index) -> array.getAndIncrement(index),
    array -> System.out.println(array)
);

结果

[10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000, 10000]

字段更新器

AtomicReferenceFieldUpdater // 域字段
AtomicIntegerFieldUpdater
AtomicLongFieldUpdater 利用字段更新器，可以针对对象的某个域（Field）进行原子操作，只能配合 volatile 修饰的字段使用，否则会出现异常

Exception in thread "main" java.lang.IllegalArgumentException: Must be volatile type

public class Test5 {
    private volatile int field;
    public static void main(String[] args) {
        AtomicIntegerFieldUpdater fieldUpdater =
            AtomicIntegerFieldUpdater.newUpdater(Test5.class, "field");
        Test5 test5 = new Test5();
        fieldUpdater.compareAndSet(test5, 0, 10);
        // 修改成功 field = 10
        System.out.println(test5.field);
        // 修改成功 field = 20
        fieldUpdater.compareAndSet(test5, 10, 20);
        System.out.println(test5.field);
        // 修改失败 field = 20
        fieldUpdater.compareAndSet(test5, 10, 30);
        System.out.println(test5.field);
    }
}

输出

10 
20 
20

原子累加器

累加器性能比较

private static <T> void demo(Supplier<T> adderSupplier, Consumer<T> action) {
    T adder = adderSupplier.get();
    long start = System.nanoTime();
    List<Thread> ts = new ArrayList<>();
    // 4 个线程，每人累加 50 万
    for (int i = 0; i < 40; i++) {
        ts.add(new Thread(() -> {
            for (int j = 0; j < 500000; j++) {
                action.accept(adder);
            }
        }));
    }
    ts.forEach(t -> t.start());
    ts.forEach(t -> {
        try {
            t.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
    });
    long end = System.nanoTime();
    System.out.println(adder + " cost:" + (end - start)/1000_000);
}

比较 AtomicLong 与 LongAdder

for (int i = 0; i < 5; i++) {
    demo(() -> new LongAdder(), adder -> adder.increment());
}
for (int i = 0; i < 5; i++) {
    demo(() -> new AtomicLong(), adder -> adder.getAndIncrement());
}

输出

1000000 cost:43 
1000000 cost:9 
1000000 cost:7 
1000000 cost:7 
1000000 cost:7 
1000000 cost:31 
1000000 cost:27 
1000000 cost:28 
1000000 cost:24 
1000000 cost:22

性能提升的原因很简单，就是在有竞争时，设置多个累加单元，Therad-0 累加 Cell[0]，而 Thread-1 累加 Cell[1]... 最后将结果汇总。这样它们在累加时操作的不同的 Cell 变量，因此减少了 CAS 重试失败，从而提高性能。

原理之伪共享(CPU 缓存结构)

查看 cpu 缓存

⚡ root@yihang01 ~ lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 1
On-line CPU(s) list: 0
Thread(s) per core: 1
Core(s) per socket: 1
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 142
Model name: Intel(R) Core(TM) i7-8565U CPU @ 1.80GHz
Stepping: 11
CPU MHz: 1992.002
BogoMIPS: 3984.00
Hypervisor vendor: VMware
Virtualization type: full
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0

速度比较

从cpu到	大约需要的时钟周期
寄存器	1 cycle
L1	3~4 cycle
L2	10~20 cycle
L3	40~45 cycle
内存	120~240 cycle

查看 cpu 缓存行

⚡ root@yihang01 ~ cat /sys/devices/system/cpu/cpu0/cache/index0/coherency_line_size
64

cpu 拿到的内存地址格式是这样的

[高位组标记][低位索引][偏移量]

CPU 缓存读

读取数据流程如下

根据低位，计算在缓存中的索引
判断是否有效
- 0 去内存读取新数据更新缓存行
- 1 再对比高位组标记是否一致
  - 一致，根据偏移量返回缓存数据
  - 不一致，去内存读取新数据更新缓存行

CPU 缓存一致性

MESI 协议

E、S、M 状态的缓存行都可以满足 CPU 的读请求
E 状态的缓存行，有写请求，会将状态改为 M，这时并不触发向主存的写
E 状态的缓存行，必须监听该缓存行的读操作，如果有，要变为 S 状态
M 状态的缓存行，必须监听该缓存行的读操作，如果有，先将其它缓存（S 状态）中该缓存行变成 I 状态（即 6. 的流程），写入主存，自己变为 S 状态
S 状态的缓存行，有写请求，走 4. 的流程
S 状态的缓存行，必须监听该缓存行的失效操作，如果有，自己变为 I 状态
I 状态的缓存行，有读请求，必须从主存读取

内存屏障

Memory Barrier（Memory Fence）

可见性

写屏障（sfence）保证在该屏障之前的，对共享变量的改动，都同步到主存当中
而读屏障（lfence）保证在该屏障之后，对共享变量的读取，加载的是主存中最新数据

有序性

写屏障会确保指令重排序时，不会将写屏障之前的代码排在写屏障之后
读屏障会确保指令重排序时，不会将读屏障之后的代码排在读屏障之前

并发编程-共享模型之无锁（三）