[Go source] Reading and analyzing the sync.Once source code

demo

package main

import (
	"fmt"
	"sync"
)

func main() {
	var once sync.Once
	var wg sync.WaitGroup
	n, m := 0, 100
	wg.Add(m)
	for i := 0; i < m; i++ {
		go func() {
			defer wg.Done()
			once.Do(func() { n++ })
		}()
	}
	wg.Wait()
	fmt.Println(n)
}

Output: 1

Source code

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

package sync

import (
	"sync/atomic"
)

// Once is an object that will perform exactly one action.
type Once struct {
	// done indicates whether the action has been performed.
	// It is first in the struct because it is used in the hot path.
	// The hot path is inlined at every call site.
	// Placing done first allows more compact instructions on some architectures (amd64/x86),
	// and fewer instructions (to calculate offset) on other architectures.
	done uint32
	m    Mutex
}

// Do calls the function f if and only if Do is being called for the
// first time for this instance of Once. In other words, given
// 	var once Once
// if once.Do(f) is called multiple times, only the first call will invoke f,
// even if f has a different value in each invocation. A new instance of
// Once is required for each function to execute.
//
// Do is intended for initialization that must be run exactly once. Since f
// is niladic, it may be necessary to use a function literal to capture the
// arguments to a function to be invoked by Do:
// 	config.once.Do(func() { config.init(filename) })
//
// Because no call to Do returns until the one call to f returns, if f causes
// Do to be called, it will deadlock.
//
// If f panics, Do considers it to have returned; future calls of Do return
// without calling f.
//
func (o *Once) Do(f func()) {
	// Note: Here is an incorrect implementation of Do:
	//
	//	if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
	//		f()
	//	}
	//
	// Do guarantees that when it returns, f has finished.
	// This implementation would not implement that guarantee:
	// given two simultaneous calls, the winner of the cas would
	// call f, and the second would return immediately, without
	// waiting for the first's call to f to complete.
	// This is why the slow path falls back to a mutex, and why
	// the atomic.StoreUint32 must be delayed until after f returns.
    
	if atomic.LoadUint32(&o.done) == 0 {
		// Outlined slow-path to allow inlining of the fast-path.
		o.doSlow(f)
	}
}

func (o *Once) doSlow(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if o.done == 0 {
		defer atomic.StoreUint32(&o.done, 1)
		f()
	}
}
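
The note inside Do explains why a CAS-only implementation is wrong. The following is a minimal sketch of that failure, assuming a hypothetical type casOnce that copies the incorrect code from the comment; casOnce and returnsBeforeFinished are names made up for this illustration and are not part of the sync package.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// casOnce is a hypothetical type that copies the incorrect CAS-only Do
// from the comment in sync.Once. It exists only for illustration.
type casOnce struct {
	done uint32
}

func (o *casOnce) Do(f func()) {
	if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
		f()
	}
}

// returnsBeforeFinished reports whether a second call to Do returned
// while the first call's f was still running.
func returnsBeforeFinished() bool {
	var o casOnce
	started := make(chan struct{})
	release := make(chan struct{})
	finished := make(chan struct{})

	go o.Do(func() {
		close(started) // f is now running
		<-release      // stay inside f until the caller lets it continue
		close(finished)
	})

	<-started       // the first call has won the CAS and is inside f
	o.Do(func() {}) // loses the CAS and returns immediately

	early := false
	select {
	case <-finished:
	default:
		early = true // f has not finished, yet Do has already returned
	}

	close(release)
	<-finished
	return early
}

func main() {
	fmt.Println(returnsBeforeFinished())
}
```

With the real sync.Once, the second call would block on the mutex in doSlow until f returned, so this situation cannot arise.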

Questions and analysis

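Q: Why does Do check atomic.LoadUint32(&o.done) == 0 instead of reading o.done == 0 directly?
A: Do reads o.done without holding the lock, while doSlow writes o.done from another goroutine. A plain read there would be a data race with that write, and the Go memory model gives no guarantees for racy programs. A stale 0 would only cost performance, because the caller would enter doSlow unnecessarily. The more serious problem is correctness: a goroutine could observe done == 1 and return with no guarantee that the effects of f are visible to it. atomic.LoadUint32 pairs with the atomic.StoreUint32 in doSlow, so a goroutine that loads 1 is guaranteed to see everything f did. A goroutine that still loads 0 simply takes the slow path and rechecks done under the lock.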
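
The pairing of an atomic store with an atomic load can be sketched outside of sync.Once. This is a minimal illustration, not the library's code: the function publish and the variable msg are hypothetical names, with the plain write to msg standing in for the work done by f.

```go
package main

import (
	"fmt"
	"runtime"
	"sync/atomic"
)

// publish writes msg and then sets the flag with an atomic store.
// The reader polls the flag with an atomic load, so once it observes 1
// the earlier plain write to msg is guaranteed to be visible.
func publish() string {
	var done uint32
	var msg string

	go func() {
		msg = "initialized"          // plain write, like the work done by f
		atomic.StoreUint32(&done, 1) // publish the result, like doSlow
	}()

	for atomic.LoadUint32(&done) == 0 { // same check as the fast path in Do
		runtime.Gosched() // let the writer goroutine run
	}
	return msg
}

func main() {
	fmt.Println(publish())
}
```

If both accesses to done were plain reads and writes, the race detector (go run -race) would be expected to flag this program.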

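Q: Why does Do not read o.done under the lock instead of using atomic.LoadUint32(&o.done) == 0?
A: Reading o.done under the lock would be correct, because it would be synchronized with the write in doSlow. But then every call to Do, including the many calls made after initialization has finished, would pay for a Lock/Unlock pair and contend with every other caller for the same mutex. An atomic load is cheaper than that and never blocks, so the fast path uses atomic.LoadUint32(&o.done) == 0 and takes the lock only while done is still 0.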
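
For comparison, here is a sketch of the lock-only alternative discussed in this answer. The type lockedOnce and the helper runLocked are hypothetical names used only here. The result is correct, but every call to Do takes the mutex, even long after f has run.

```go
package main

import (
	"fmt"
	"sync"
)

// lockedOnce is a hypothetical lock-only variant of Once.
// It has no fast path: every call to Do locks the mutex.
type lockedOnce struct {
	m    sync.Mutex
	done bool
}

func (o *lockedOnce) Do(f func()) {
	o.m.Lock()
	defer o.m.Unlock()
	if !o.done {
		// A plain write is enough here because every read of done
		// also happens while holding the lock.
		defer func() { o.done = true }()
		f()
	}
}

// runLocked calls Do from several goroutines and returns how many
// times the function passed to Do actually ran.
func runLocked(goroutines int) int {
	var once lockedOnce
	var wg sync.WaitGroup
	n := 0
	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			once.Do(func() { n++ })
		}()
	}
	wg.Wait()
	return n
}

func main() {
	fmt.Println(runLocked(100))
}
```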

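Q: Why can doSlow read o.done == 0 directly?
A: Every write to o.done happens inside the critical section protected by o.m. The deferred atomic.StoreUint32 runs before the deferred Unlock, because deferred calls run in last-in, first-out order. While a goroutine holds the lock in doSlow, no other goroutine can be writing o.done, and the write made by the previous lock holder is visible through the Unlock/Lock pair. A plain read is therefore safe at that point.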

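Q: The lock is already held, so why use atomic.StoreUint32(&o.done, 1) rather than o.done = 1?
A: The lock only orders doSlow against other callers of doSlow. The fast path in Do reads o.done without taking the lock, so a plain o.done = 1 would race with that read. The store has to be atomic so that it pairs with the atomic.LoadUint32 in Do: a goroutine that loads 1 is then guaranteed to see the completed effects of f. This has nothing to do with context switches; a goroutine can be preempted around an atomic operation just as it can around a plain assignment. Note also that the store is deferred, so it runs only after f has returned or panicked.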
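
Because doSlow sets done through a deferred atomic.StoreUint32, the flag is set even when f panics, which matches the doc comment on Do. The sketch below checks that behavior against the real sync.Once; callsAfterPanic is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"sync"
)

// callsAfterPanic returns how many times the functions passed to Do ran
// when the first one panics.
func callsAfterPanic() int {
	var once sync.Once
	calls := 0

	func() {
		// Recover so that the panic from f does not stop the program.
		defer func() { _ = recover() }()
		once.Do(func() {
			calls++
			panic("boom")
		})
	}()

	// done was already set by the deferred store, so this f never runs.
	once.Do(func() { calls++ })
	return calls
}

func main() {
	fmt.Println(callsAfterPanic())
}
```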

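Q: Does sync.Once block every goroutine that calls it?
A: No. A goroutine that calls Do before o.done is set to 1 enters doSlow; one of them runs f, and the others wait on the mutex until f has returned. A goroutine that calls Do after o.done is set to 1 returns immediately from the fast path without blocking. How many goroutines block therefore depends on how long f runs and when each goroutine arrives: it may be all of the other callers, or only some of them.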
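
A small sketch of this behavior, assuming a slow setup function (the names observers, setup, and ready are hypothetical): every goroutine that returns from Do, whether it waited on the mutex or took the fast path, sees the result of the setup.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

// observers starts several goroutines that all call Do with a slow setup
// function and returns how many of them saw ready == true after Do returned.
func observers(goroutines int) int {
	var once sync.Once
	var wg sync.WaitGroup
	var sawReady int32
	ready := false

	setup := func() {
		time.Sleep(10 * time.Millisecond) // simulate slow initialization
		ready = true
	}

	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			once.Do(setup) // early callers wait here until setup returns
			if ready {     // safe to read: Do returns only after setup finished
				atomic.AddInt32(&sawReady, 1)
			}
		}()
	}
	wg.Wait()
	return int(sawReady)
}

func main() {
	fmt.Println(observers(10))
}
```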

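Q: Why does atomic.StoreUint32(&o.done, 1) let the read in Do see the new value?
A: This follows from the ordering guarantees of atomic operations, not from atomic operations blocking other memory accesses. In the Go memory model, if an atomic load observes the value written by an atomic store, the store is synchronized before the load, so everything written before the store (here, everything f did) is visible after the load. There is no promise that the new value is seen "immediately" everywhere; a goroutine that still loads 0 falls into doSlow, takes the lock, and sees done == 1 there.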

Key concepts

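Lock (mutex): Go's sync.Mutex is implemented in the Go runtime on top of atomic operations and runtime semaphores, rather than by calling an operating system lock API for every operation. The code between Lock and Unlock forms a critical section, and the variables accessed only inside it are the protected (critical) resource. Only one goroutine can execute a given critical section at a time; the others block until the lock is released, which serializes access to the resource. A goroutine holding the lock can still be preempted, and contended Lock calls park the waiting goroutines, so locking has a real cost. If the protected resource is accessed outside the critical section, that access must either take the same lock, and then contend for it, or use another form of synchronization; otherwise it is a data race.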
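
As a minimal sketch of a critical section (countWithMutex is a hypothetical helper name), the counter below is only touched between Lock and Unlock, so the increments are serialized.

```go
package main

import (
	"fmt"
	"sync"
)

// countWithMutex increments a shared counter from several goroutines.
// The increment is the critical section protected by mu.
func countWithMutex(goroutines int) int {
	var mu sync.Mutex
	var wg sync.WaitGroup
	n := 0

	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			mu.Lock()
			n++ // only one goroutine at a time executes this line
			mu.Unlock()
		}()
	}
	wg.Wait()
	return n
}

func main() {
	fmt.Println(countWithMutex(100))
}
```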

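Atomic operations (atomic): atomic operations are supported directly by the hardware. On x86, for example, read-modify-write operations such as compare-and-swap are compiled to LOCK-prefixed instructions, which make the access to that memory location indivisible. No other goroutine can observe a half-completed atomic operation. There is no critical section and no Lock/Unlock, and an atomic operation never parks a goroutine, so it is usually cheaper than a mutex. The trade-off is that it protects only a single memory word, and a goroutine can still be preempted before or after the operation.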
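
The same counter can be sketched with an atomic operation instead of a lock (countWithAtomic is a hypothetical helper name). It works here because the shared state is a single uint32.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// countWithAtomic increments a shared counter from several goroutines
// using atomic.AddUint32, with no mutex and no critical section.
func countWithAtomic(goroutines int) uint32 {
	var wg sync.WaitGroup
	var n uint32

	wg.Add(goroutines)
	for i := 0; i < goroutines; i++ {
		go func() {
			defer wg.Done()
			atomic.AddUint32(&n, 1) // indivisible read-modify-write
		}()
	}
	wg.Wait()
	return atomic.LoadUint32(&n)
}

func main() {
	fmt.Println(countWithAtomic(100))
}
```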