WaitGroup
// A WaitGroup must not be copied after first use.
type WaitGroup struct {
noCopy noCopy
// 64-bit value: high 32 bits are counter, low 32 bits are waiter count.
// 64-bit atomic operations require 64-bit alignment, but 32-bit
// compilers only guarantee that 64-bit fields are 32-bit aligned.
// For this reason on 32 bit architectures we need to check in state()
// if state1 is aligned or not, and dynamically "swap" the field order if
// needed.
state1 uint64 // counter (high 32 bits) + waiter count (low 32 bits)
state2 uint32 // sema field; when state1 is 64-bit aligned, &wg.state2 is semap
}
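The comment above refers to a state() helper that resolves the state and semaphore pointers. Reproduced from memory for the same Go release (roughly Go 1.18), it looks approximately like this; treat it as a sketch rather than the exact source:
func (wg *WaitGroup) state() (statep *uint64, semap *uint32) {
	if unsafe.Alignof(wg.state1) == 8 || uintptr(unsafe.Pointer(&wg.state1))%8 == 0 {
		// state1 is 64-bit aligned: state1 is the state word, state2 the sema.
		return &wg.state1, &wg.state2
	}
	// state1 is 32-bit but not 64-bit aligned: (&state1)+4 is 64-bit aligned,
	// so the layout is "swapped": the first 4 bytes hold the sema and the
	// following 8 bytes hold the state.
	state := (*[3]uint32)(unsafe.Pointer(&wg.state1))
	return (*uint64)(unsafe.Pointer(&state[1])), &state[0]
}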
The whole WaitGroup call flow can be summarized as follows:
When WaitGroup.Add(n) is called, the counter is incremented: counter += n.
When WaitGroup.Wait() is called, the waiter count is incremented (waiter++), and runtime_Semacquire(semap) is called to wait on the semaphore, suspending the current goroutine.
When WaitGroup.Done() is called, the counter is decremented (counter--). Once the counter drops to 0, the wait is over: runtime_Semrelease is called (once per waiter) to release the semaphore and wake the goroutines blocked in WaitGroup.Wait.
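A minimal usage example of that flow (the worker body and loop count are made up for illustration):
package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 3; i++ {
		wg.Add(1) // counter += 1
		go func(id int) {
			defer wg.Done() // counter -= 1; the last Done wakes the waiter
			fmt.Println("worker", id, "done")
		}(i)
	}
	wg.Wait() // waiter++; blocks until counter reaches 0
}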
Once
// Once is an object that will perform exactly one action.
//
// A Once must not be copied after first use.
type Once struct {
// done indicates whether the action has been performed.
// It is first in the struct because it is used in the hot path.
// The hot path is inlined at every call site.
// Placing done first allows more compact instructions on some architectures (amd64/386),
// and fewer instructions (to calculate offset) on other architectures.
done uint32
m Mutex
}
func (o *Once) Do(f func()) {
// Note: Here is an incorrect implementation of Do:
//
// if atomic.CompareAndSwapUint32(&o.done, 0, 1) {
// f()
// }
//
// Do guarantees that when it returns, f has finished.
// This implementation would not implement that guarantee:
// given two simultaneous calls, the winner of the cas would
// call f, and the second would return immediately, without
// waiting for the first's call to f to complete.
// This is why the slow path falls back to a mutex, and why
// the atomic.StoreUint32 must be delayed until after f returns.
if atomic.LoadUint32(&o.done) == 0 {
// Outlined slow-path to allow inlining of the fast-path.
o.doSlow(f)
}
}
func (o *Once) doSlow(f func()) {
o.m.Lock()
defer o.m.Unlock()
if o.done == 0 {
defer atomic.StoreUint32(&o.done, 1)
f()
}
}
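A small example of the guarantee described in the comment: every caller of Do returns only after the single execution of f has finished, so the initialization below is safe to rely on from many goroutines (the config variable and loadConfig function are illustrative only):
package main

import (
	"fmt"
	"sync"
)

var (
	once   sync.Once
	config map[string]string
)

func loadConfig() {
	fmt.Println("loading config exactly once")
	config = map[string]string{"env": "prod"}
}

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 5; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			once.Do(loadConfig) // only one goroutine actually runs loadConfig
			_ = config["env"]   // every goroutine sees the initialized map
		}()
	}
	wg.Wait()
}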
sync.Map
Reads and writes are split across the read and dirty fields. The misses field counts how many times a lookup falls through read and has to consult dirty; once the miss count grows large enough, the dirty map is promoted to read.
type Map struct {
mu Mutex
// read contains the portion of the map's contents that are safe for
// concurrent access (with or without mu held).
//
// The read field itself is always safe to load, but must only be stored with
// mu held.
//
// Entries stored in read may be updated concurrently without mu, but updating
// a previously-expunged entry requires that the entry be copied to the dirty
// map and unexpunged with mu held.
read atomic.Value // readOnly
// dirty contains the portion of the map's contents that require mu to be
// held. To ensure that the dirty map can be promoted to the read map quickly,
// it also includes all of the non-expunged entries in the read map.
//
// Expunged entries are not stored in the dirty map. An expunged entry in the
// clean map must be unexpunged and added to the dirty map before a new value
// can be stored to it.
//
// If the dirty map is nil, the next write to the map will initialize it by
// making a shallow copy of the clean map, omitting stale entries.
dirty map[any]*entry
// misses counts the number of loads since the read map was last updated that
// needed to lock mu to determine whether the key was present.
//
// Once enough misses have occurred to cover the cost of copying the dirty
// map, the dirty map will be promoted to the read map (in the unamended
// state) and the next store to the map will make a new dirty copy.
misses int
}
// readOnly is an immutable struct stored atomically in the Map.read field.
type readOnly struct {
m map[any]*entry
amended bool // true if the dirty map contains some key not in m.
}
func (m *Map) missLocked() {
m.misses++
if m.misses < len(m.dirty) {
return
}
m.read.Store(readOnly{m: m.dirty})
m.dirty = nil
m.misses = 0
}
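To make the read/dirty/misses interplay concrete, here is a small usage example; the keys and values are arbitrary:
package main

import (
	"fmt"
	"sync"
)

func main() {
	var m sync.Map

	m.Store("a", 1) // first write allocates the dirty map
	if v, ok := m.Load("a"); ok {
		fmt.Println("a =", v) // served lock-free from read once promoted
	}
	m.LoadOrStore("b", 2) // stores only if the key is absent
	m.Range(func(k, v any) bool {
		fmt.Println(k, v)
		return true // keep iterating
	})
	m.Delete("a")
}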
sync.Pool
Initializing a Pool: New
The first step is to create a Pool instance; the key point is to set the New function, which declares how a new pool element is created.
bufferPool := &sync.Pool{
New: func() any {
println("Create new instance")
return struct{}{}
},
}
Getting an object: Get
buffer := bufferPool.Get()
Get returns an object that already exists in the Pool; if there is none, it takes the slow path and ultimately falls back to the New function supplied at initialization to create a new object.
Returning an object: Put
bufferPool.Put(buffer)
After using the object, call Put to return it to the pool. Note that Put only puts the object back; when the objects in the pool are actually freed is invisible to, and not controllable by, the caller.
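Putting the three steps together, a typical cycle looks like the sketch below; the *bytes.Buffer element type is only an illustration, and resetting the object before Put is the caller's responsibility:
package main

import (
	"bytes"
	"sync"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) }, // slow-path constructor
}

func render(s string) string {
	buf := bufPool.Get().(*bytes.Buffer) // reuse a pooled buffer if one exists
	defer func() {
		buf.Reset()      // wipe state before handing it back
		bufPool.Put(buf) // the pool may still drop it at any GC
	}()
	buf.WriteString(s)
	return buf.String()
}

func main() {
	println(render("hello"))
}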
// A Pool is a set of temporary objects that may be individually saved and
// retrieved.
//
// Any item stored in the Pool may be removed automatically at any time without
// notification. If the Pool holds the only reference when this happens, the
// item might be deallocated.
How it works
Let’s say you have some kind of burst of activity, and you start allocating lots of jobs. Once the burst is over, you’ve offloaded a mountain of jobs to the pool, but there’s little chance you will need so many soon. That’s where sync.Pool starts to shine: it works hand in hand with the garbage collector to free unused jobs.
var (
allPoolsMu Mutex
// allPools is the set of pools that have non-empty primary
// caches. Protected by either 1) allPoolsMu and pinning or 2)
// STW.
allPools []*Pool
// oldPools is the set of pools that may have non-empty victim
// caches. Protected by STW.
oldPools []*Pool
)
func poolCleanup() {
// This function is called with the world stopped, at the beginning of a garbage collection.
// It must not allocate and probably should not call any runtime functions.
// Because the world is stopped, no pool user can be in a
// pinned section (in effect, this has all Ps pinned).
// Drop victim caches from all pools.
for _, p := range oldPools {
p.victim = nil
p.victimSize = 0
}
// Move primary cache to victim cache.
for _, p := range allPools {
p.victim = p.local
p.victimSize = p.localSize
p.local = nil
p.localSize = 0
}
// The pools with non-empty primary caches now have non-empty
// victim caches and no pools have primary caches.
oldPools, allPools = allPools, nil
}
poolCleanup moves items from local to victim.
In other terms, this means that items that have been in the pool for too long without being retrieved will eventually fade out. On the first garbage collection cycle, they will go from local to victim, and if they’re still not picked up, on the second cycle they will be trashed.
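A hedged illustration of that two-cycle fade-out, forcing GC by hand. Whether the first Get still finds the item depends on it being served from the victim cache on the same P, so treat this as a demonstration of the described behavior rather than a guaranteed result:
package main

import (
	"runtime"
	"sync"
)

func main() {
	p := sync.Pool{New: func() any { return "fresh" }}

	p.Put("cached")
	runtime.GC()              // primary cache -> victim cache
	println(p.Get().(string)) // usually still "cached", served from victim

	p.Put("cached again")
	runtime.GC()              // primary -> victim
	runtime.GC()              // victim dropped
	println(p.Get().(string)) // "fresh": the old item has faded out
}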
// A Pool must not be copied after first use.
type Pool struct {
noCopy noCopy
local unsafe.Pointer // local fixed-size per-P pool, actual type is [P]poolLocal
localSize uintptr // size of the local array
victim unsafe.Pointer // local from previous cycle
victimSize uintptr // size of victims array
// New optionally specifies a function to generate
// a value when Get would otherwise return nil.
// It may not be changed concurrently with calls to Get.
New func() any
}
type poolLocal struct {
poolLocalInternal
// Prevents false sharing on widespread platforms with
// 128 mod (cache line size) = 0 .
pad [128 - unsafe.Sizeof(poolLocalInternal{})%128]byte
}
// Local per-P Pool appendix.
type poolLocalInternal struct {
private any // Can be used only by the respective P.
shared poolChain // Local P can pushHead/popHead; any P can popTail.
}
poolLocalInternal is in essence what we call the "pool": it is composed of a private attribute, which holds a single pool item, while the rest of the items are stored in shared, a poolChain: a doubly linked list of ring buffers that behaves as a double-ended queue. poolLocal embeds poolLocalInternal and pads the struct out to a multiple of 128 bytes.
Purpose of poolLocal
A poolLocal is created for each P to minimize contention, taking advantage of Go's GMP scheduling model.
Purpose of the padding in poolLocal
We make the general assumption that CPU cache lines are 128 bytes maximum, or a lower power of 2, and the goal is that the poolLocal structure fills one or more cache lines. As we’ve seen previously, p.local is a contiguous block of memory (an array) with poolLocal objects for each P next to each other. False sharing would be very detrimental to the fast path here, as it would mean that for every change a P makes to its poolLocal object, other P’s might need to synchronize cache lines containing their own poolLocal; instead, by making sure cache lines are not shared by multiple poolLocal objects, we prevent this false sharing.
In other terms, each P can access and modify their own poolLocal object without causing cache invalidation in other cores.
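A minimal sketch of the same padding trick applied to a hypothetical per-P counter; the counterShard and paddedShard types and the 128-byte assumption are illustrative, not from the sync package:
package main

import "unsafe"

// counterShard plays the role of poolLocalInternal: the data one P touches.
type counterShard struct {
	n uint64
}

// paddedShard plays the role of poolLocal: pad each shard to a multiple of
// 128 bytes so that two shards never share a cache line.
type paddedShard struct {
	counterShard
	pad [128 - unsafe.Sizeof(counterShard{})%128]byte
}

func main() {
	shards := make([]paddedShard, 8)  // one per P, laid out contiguously
	println(unsafe.Sizeof(shards[0])) // 128: each shard owns its cache line(s)
}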
Get
- pin, get our poolLocal;
- empty private, and if it was not empty, use that item;
- in the event private was empty, pop shared, LIFO style, and use that item;
- in the event shared was empty, we'll go down the slow path;
- in the event even that did not work, we'll call New;
- unpin.
popHead(): we choose to reuse the latest allocations and let the oldest ones fade out.
When our p-local pool is empty, we’re going to try to steal an item from other P’s. If that does not work, we’ll try to look in all the victim pools.
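For reference, the fast path of Get mirrors those steps; the following is an abridged sketch of the Go 1.18-era implementation with the race-detector hooks elided, so details may differ in other Go versions:
func (p *Pool) Get() any {
	l, pid := p.pin() // disable preemption and fetch this P's poolLocal
	x := l.private
	l.private = nil
	if x == nil {
		// Prefer the head of the local shard for temporal locality of reuse.
		x, _ = l.shared.popHead()
		if x == nil {
			x = p.getSlow(pid) // steal from other Ps, then try the victim caches
		}
	}
	runtime_procUnpin()
	if x == nil && p.New != nil {
		x = p.New()
	}
	return x
}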
Put
// Put adds x to the pool.
func (p *Pool) Put(x any) {
if x == nil {
return
}
if race.Enabled {
if fastrandn(4) == 0 {
// Randomly drop x on floor.
return
}
race.ReleaseMerge(poolRaceAddr(x))
race.Disable()
}
l, _ := p.pin()
if l.private == nil {
l.private = x
x = nil
}
if x != nil {
l.shared.pushHead(x)
}
runtime_procUnpin()
if race.Enabled {
race.Enable()
}
}