The silent performance killer - false sharing

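When optimizing performance, we usually focus on the code and rarely think about the hardware underneath. This article is about false sharing. Before getting to false sharing itself, we first need to understand what a CPU Cache is.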

CPU Cache

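Computer memory is organized as a hierarchy: the closer a level sits to the CPU, the faster it is, the more it costs per byte, and the smaller it is. By CPU Cache we mean the L1, L2, and L3 caches, commonly just called "the cache". The smallest unit a CPU Cache stores is the Cache Line, which is 64 bytes on x86_64 CPUs. False sharing, covered below, is a direct consequence of this Cache Line granularity.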

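The CPU Cache always moves data in whole Cache Lines. So even if the program touches only 1 byte, the CPU actually loads the entire 64-byte line containing it, bringing the other 63 bytes along (using x86_64 as the example). Because lines are aligned to their own size, those 63 bytes are the rest of the same 64-byte-aligned block, not necessarily bytes on both sides of the one you accessed.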
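To make "the whole line comes along" concrete, here is a minimal sketch of the address arithmetic, assuming a 64-byte line as on x86_64. The names `lineSize` and `lineBounds` are hypothetical helpers for illustration, not part of any library.

```go
package main

// lineSize is an assumed cache line size of 64 bytes (typical x86_64);
// other architectures may use a different size.
const lineSize = 64

// lineBounds returns the half-open address range [start, end) of the
// cache line containing addr. Lines are aligned to lineSize, so start
// is addr rounded down to a multiple of lineSize.
func lineBounds(addr uintptr) (start, end uintptr) {
	start = addr &^ (lineSize - 1) // clear the low 6 bits
	return start, start + lineSize
}
```

For example, `lineBounds(100)` returns `(64, 128)`: reading the single byte at address 100 pulls in bytes 64 through 127, so the 63 "neighbors" here are 36 bytes before it and 27 after it.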

false sharing

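False sharing happens when multiple threads operate on variables that are logically independent but happen to live on the same Cache Line. When one core writes its variable, the cache coherence protocol invalidates that entire line in every other core's cache, so the other cores must fetch the line again even though the variables they care about never changed. The threads end up slowing each other down without ever touching each other's data.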
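To see why a struct like the `NoPad` used below is prone to this, a sketch can inspect its layout with `unsafe.Offsetof` and `unsafe.Sizeof`. The helper names `fieldOffsets` and `footprint` are hypothetical; `NoPad` mirrors the article's struct.

```go
package main

import "unsafe"

// NoPad mirrors the article's unpadded struct: three independent counters.
type NoPad struct {
	a uint64
	b uint64
	c uint64
}

// fieldOffsets reports the byte offset of each field within NoPad.
func fieldOffsets() (a, b, c uintptr) {
	var np NoPad
	return unsafe.Offsetof(np.a), unsafe.Offsetof(np.b), unsafe.Offsetof(np.c)
}

// footprint is the total number of bytes the three counters occupy.
func footprint() uintptr {
	return unsafe.Sizeof(NoPad{})
}
```

The offsets come out as 0, 8 and 16 and the struct is only 24 bytes, so the three counters usually land on a single 64-byte line (the exact line depends on where the struct is placed in memory). If one thread writes `a` while another writes `b`, each write invalidates the other core's copy of that line.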

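So how do we avoid false sharing? The answer is Cache Line padding, and it is simple: put filler arrays between the independent variables so that each one ends up on its own Cache Line.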

type NoPad struct {
    a uint64
    b uint64
    c uint64
}

type Pad struct {
    a  uint64
    _a [7]uint64
    b  uint64
    _b [7]uint64
    c  uint64
    _c [7]uint64
}

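Because Cache Line size differs across CPU architectures, the golang.org/x/sys/cpu package (part of the Go project's supplementary x/sys module, not the standard library) provides CacheLinePad, sized for the target architecture, for Cache Line padding: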

import "golang.org/x/sys/cpu"

type Pad struct {
    a  uint64
    _a cpu.CacheLinePad
    b  uint64
    _b cpu.CacheLinePad
    c  uint64
    _c cpu.CacheLinePad
}

How much does Cache Line padding actually affect performance? Let's measure it with a benchmark:

// falsesharing_test.go
package falsesharing

import (
    "sync/atomic"
    "testing"
)

type NoPad struct {
    a uint64
    b uint64
    c uint64
}

func (np *NoPad) Increase() {
    atomic.AddUint64(&np.a, 1)
    atomic.AddUint64(&np.b, 1)
    atomic.AddUint64(&np.c, 1)
}

type Pad struct {
    a  uint64
    _a [7]uint64
    b  uint64
    _b [7]uint64
    c  uint64
    _c [7]uint64
}

func (p *Pad) Increase() {
    atomic.AddUint64(&p.a, 1)
    atomic.AddUint64(&p.b, 1)
    atomic.AddUint64(&p.c, 1)
}

func BenchmarkPadIncrement(b *testing.B) {
    pad := &Pad{}

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            pad.Increase()
        }
    })
}

func BenchmarkNoPadIncrement(b *testing.B) {
    noPad := &NoPad{}

    b.RunParallel(func(pb *testing.PB) {
        for pb.Next() {
            noPad.Increase()
        }
    })
}

The benchmark results:

$ go test -bench .

BenchmarkPadIncrement-12        42052069                26.6 ns/op
BenchmarkNoPadIncrement-12      20085301                58.9 ns/op

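With Cache Line padding each Increase takes 26.6 ns, versus 58.9 ns without it: roughly 2.2x faster in this run with GOMAXPROCS=12 (the -12 suffix). Nothing in the source code hints at that cost, which is exactly why false sharing is called a silent performance killer.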
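In the benchmark above every goroutine increments all three fields, so each field is also genuinely shared between goroutines. To isolate false sharing on its own, one option is to give each goroutine a counter that no other goroutine touches. A minimal sketch, assuming 64-byte lines and 4 worker goroutines; the names `packed`, `slot`, `padded`, `run`, `runPacked` and `runPadded` are hypothetical, not from the article:

```go
package main

import (
	"sync"
	"sync/atomic"
	"time"
)

// Assumed cache line size (64 bytes, as on typical x86_64 CPUs).
const cacheLineSize = 64

// numWorkers goroutines each own exactly one counter.
const numWorkers = 4

// packed holds four independent counters in 32 contiguous bytes,
// so neighbouring counters will usually share a cache line.
type packed struct {
	n [numWorkers]uint64
}

// slot gives one counter a whole cache line to itself.
type slot struct {
	n uint64
	_ [cacheLineSize - 8]byte
}

type padded struct {
	s [numWorkers]slot
}

// run starts numWorkers goroutines; goroutine i calls incr(i) iters times.
// No counter is touched by more than one goroutine, so any extra cost in
// the packed layout comes from false sharing rather than true sharing.
func run(iters int, incr func(i int)) time.Duration {
	var wg sync.WaitGroup
	start := time.Now()
	for w := 0; w < numWorkers; w++ {
		wg.Add(1)
		go func(i int) {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				incr(i)
			}
		}(w)
	}
	wg.Wait()
	return time.Since(start)
}

// runPacked increments the packed counters and reports the elapsed time.
func runPacked(iters int) (*packed, time.Duration) {
	p := &packed{}
	d := run(iters, func(i int) { atomic.AddUint64(&p.n[i], 1) })
	return p, d
}

// runPadded increments the padded counters and reports the elapsed time.
func runPadded(iters int) (*padded, time.Duration) {
	p := &padded{}
	d := run(iters, func(i int) { atomic.AddUint64(&p.s[i].n, 1) })
	return p, d
}
```

Calling `runPacked(10_000_000)` and `runPadded(10_000_000)` and comparing the returned durations should typically show the padded layout finishing sooner on a multi-core machine, though the size of the gap depends on the CPU and core count. Wall-clock timing like this is noisy; for real measurements the `testing` benchmark harness used above is the better tool.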