golang 条件变量源码剖析

avatar
rpc工程师 @小厂

1. 条件变量介绍

Condition variables allow threads to wait until some event 

or condition has occurred.

条件变量是并发编程中很经典的一个手段,常用的条件变量有两种实现。

  1. 非阻塞式条件变量(Nonblocking condition variables),通知且继续(signal and continue)。
  2. 阻塞式条件变量(Blocking condition variables),通知且等待(signal and wait)。

两种方式的区别是发出通知的线程是否会立刻失去所有权,阻塞式条件变量的实现是会立刻失去,非阻塞式条件变量的实现式不会立刻失去。

条件变量在各语言/基础库中都有自己的实现,linux的pthread库和c++的std::condition_variable 都是非阻塞式条件变量的实现,而golang sync.Cond也是典型的非阻塞式条件变量的实现。

// Cond implements a condition variable, a rendezvous point

// for goroutines waiting for or announcing the occurrence

// of an event.

我们再来看一下go官方对cond的定义,上述内容是从go源码中摘出来的。翻译一下,意思如下:

sync.Cond 是golang对条件变量的实现,有两个关键点:

  1. 等待条件变量成立而进入等待状态的多个goroutine(wait 操作)。
  2. 通知事件的发生goroutine,使条件变量成立(signal/broadcast 操作)。

2. 使用

2.1 基本使用

package main



import (

   "fmt"

   "sync"

   "time"

)



// 唤醒检测标志

var flag = false



func CondTest(info string, c *sync.Cond) {

   // 使用条件变量前需要先加锁(wait()内部有释放锁操作)

   c.L.Lock()

   for flag == false {

      fmt.Println(info, "wait")

      // 挂起等待唤醒

      c.Wait()

   }

   fmt.Println(info, flag)

   c.L.Unlock()

}



func main() {

   m := sync.Mutex{}

   c := sync.NewCond(&m)

   // add 2 waiter

   go CondTest("go one", c)

   go CondTest("go two", c)



   // main

   time.Sleep(time.Second)

  

   c.L.Lock()

   flag = true

   c.L.Unlock()

   c.Broadcast() // 全部唤醒

   // c.Signal() // 唤醒1个

   fmt.Println("main broadcast")



   time.Sleep(time.Second)

}

2.2 开源库使用

没咋找到😂,sync.Cond的场景基本都可以被chan替换掉

2.3 适用场景

想象这么一个场景,有1个worker在异步的接收数据,剩下的n个waiter必须等待这个worker接收完数据才能继续下面的处理流程,这时我们很容易想到两种方案。

  1. 自旋锁+全局变量。

缺点是需要不断轮询对应的变量来判断是否满足条件,且较难支持单waiter通知的操作。

  1. select + chan。

Don't communicate by sharing memory, share memory by communicating.

按照go的哲学,在这种并发的场景下,更推荐chan,但是如果使用chan来操作,需要worker明确感知到等待的waiter数来进行处理,比如有n个waiter就需要notify n次(用close也可以),而使用sync.Cond就可以极大的简化这个操作,只需要调用Broadcast即可完成多waiter的通知,除此之外cond也提供了类似于chan send单信号的通知(Singal)。

一句话总结:多waiter单worker的场景都可以使用sync.Cond

3. 源码分析

代码部分基于golang 1.16 64位机

3.1 内存模型

image.png

// Cond implements a condition variable, a rendezvous point

// for goroutines waiting for or announcing the occurrence

// of an event.

//

// Each Cond has an associated Locker L (often a *Mutex or *RWMutex),

// which must be held when changing the condition and

// when calling the Wait method.

//

// A Cond must not be copied after first use.

type Cond struct {

    noCopy noCopy



    // L is held while observing or changing the condition

    L Locker



    notify  notifyList

    checker copyChecker

}



// A Locker represents an object that can be locked and unlocked.

type Locker interface {

   Lock()

   Unlock()

}



// Approximation of notifyList in runtime/sema.go. Size and alignment must

// agree.

// 每一个waiter都会有1个ticket,可以理解为waiter的唯一标识,单调递增

type notifyList struct {

   wait   uint32 //下一个waiter的最大ticket,单调递增

   notify uint32 //下一个待唤醒的waiter的ticket值(ticket 小于该值的waiter都 即将/已经 处于唤醒态)

   lock   uintptr // key field of the mutex

   head   unsafe.Pointer

   tail   unsafe.Pointer

}



type copyChecker uintptr



// noCopy may be embedded into structs which must not be copied

// after the first use.

//

// See https://golang.org/issues/8005#issuecomment-190753527

// for details.

type noCopy struct{}

3.2 核心接口

copy

函数定义:

func (c *copyChecker) check()



func (*noCopy) Lock()



func (*noCopy) Unlock()

实现:

// copyChecker holds back pointer to itself to detect object copying.

type copyChecker uintptr



// copyChecker会存放copyChecker的地址,通过这个地址判断是否被拷贝

func (c *copyChecker) check() {

   // 检测copyChecker的地址是否与copyChecker中存储的相同

   if uintptr(*c) != uintptr(unsafe.Pointer(c)) &&

      // 第一次调用会将copyChecker的地址存储在copyChecker中

      !atomic.CompareAndSwapUintptr((*uintptr)(c), 0, uintptr(unsafe.Pointer(c))) &&

      // 保证第一次调用check()不报错

      uintptr(*c) != uintptr(unsafe.Pointer(c)) {

      panic("sync.Cond is copied")

   }

}



// noCopy may be embedded into structs which must not be copied

// after the first use.

//

// See https://golang.org/issues/8005#issuecomment-190753527

// for details.

type noCopy struct{}



// Lock is a no-op used by -copylocks checker from `go vet`.

func (*noCopy) Lock()   {}

func (*noCopy) Unlock() {}

cond

函数定义:

//新建cond

func NewCond(l Locker) *Cond 



//等待

func (c *Cond) Wait() 



//单waiter唤醒

func (c *Cond) Signal() 



//全waiter唤醒

func (c *Cond) Broadcast() 

实现:

// NewCond returns a new Cond with Locker l.

func NewCond(l Locker) *Cond {

   return &Cond{L: l}

}



// Wait atomically unlocks c.L and suspends execution

// of the calling goroutine. After later resuming execution,

// Wait locks c.L before returning. Unlike in other systems,

// Wait cannot return unless awoken by Broadcast or Signal.

//

// Because c.L is not locked when Wait first resumes, the caller

// typically cannot assume that the condition is true when

// Wait returns. Instead, the caller should Wait in a loop:

//

//    c.L.Lock()

//    for !condition() {

//        c.Wait()

//    }

//    ... make use of condition ...

//    c.L.Unlock()

//

func (c *Cond) Wait() {

   // 检测cond是否被拷贝(如果拷贝则会panic)

   c.checker.check()

   // 得到waiter的唯一标识

   t := runtime_notifyListAdd(&c.notify)

   c.L.Unlock()

   // 将waiter唯一标识添加到等待通知的队列中

   runtime_notifyListWait(&c.notify, t)

   c.L.Lock()

}



// Signal wakes one goroutine waiting on c, if there is any.

//

// It is allowed but not required for the caller to hold c.L

// during the call.

func (c *Cond) Signal() {

   // 检测cond是否被拷贝(如果拷贝则会panic)

   c.checker.check()

   // 唤醒等待队列中的一个waiter(唤醒前需要加锁)

   runtime_notifyListNotifyOne(&c.notify)

}



// Broadcast wakes all goroutines waiting on c.

//

// It is allowed but not required for the caller to hold c.L

// during the call.

func (c *Cond) Broadcast() {

   // 检测cond是否被拷贝(如果拷贝则会panic)

   c.checker.check()

   // 唤醒等待队列中所有的waiter(唤醒前需要加锁)

   runtime_notifyListNotifyAll(&c.notify)

}

3.3 runtime实现

sync/runtime.go

以下代码中省略了一些不重要的逻辑,已使用//...标识出来

和sync.Mutex一样,sync.Cond在sync包内只有函数声明,具体的函数实现会在链接时link到runtime/sema.go。

//可以理解为并发安全的id生成器,为每一个waiter生成1个唯一标识

func runtime_notifyListAdd(l *notifyList) uint32



//等待事件(Signal/Broadcast)的发生,t为当前waiter的唯一标识

func runtime_notifyListWait(l *notifyList, t uint32)



//唤醒当前等待的全部waiter

func runtime_notifyListNotifyAll(l *notifyList)



//唤醒1个waiter

func runtime_notifyListNotifyOne(l *notifyList)



// 内存安全保证,在init时会执行检查,保证sync包的notifyList结构大小等于runtime包的notifyList

func runtime_notifyListCheck(size uintptr)

func init() {

   var n notifyList

   runtime_notifyListCheck(unsafe.Sizeof(n))

}

runtime/sema.go


// notifyListAdd adds the caller to a notify list such that it can receive

// notifications. The caller must eventually call notifyListWait to wait for

// such a notification, passing the returned ticket number.

//go:linkname notifyListAdd sync.runtime_notifyListAdd

//获取waiter的唯一标识,单调递增,也是实现fifo的基础

func notifyListAdd(l *notifyList) uint32 {

   // This may be called concurrently, for example, when called from

   // sync.Cond.Wait while holding a RWMutex in read mode.

   return atomic.Xadd(&l.wait, 1) - 1

}



// notifyListWait waits for a notification. If one has been sent since

// notifyListAdd was called, it returns immediately. Otherwise, it blocks.

//go:linkname notifyListWait sync.runtime_notifyListWait

//开始wait,等待被唤醒

func notifyListWait(l *notifyList, t uint32) {

   lockWithRank(&l.lock, lockRankNotifyList)



   // Return right away if this ticket has already been notified.

   // 如果当前waiter的编号小于notify,则无须等待,直接返回即可

   if less(t, l.notify) {

      unlock(&l.lock)

      return

   }



   //将当前goroutine加入到notifyList链表中,单链表,尾插法

   //sudog represents a g in a wait list

   s := acquireSudog()

   s.g = getg()

   s.ticket = t

   s.releasetime = 0

//...

   if l.tail == nil {

      l.head = s

   } else {

      l.tail.next = s

   }

   l.tail = s

   //调用gopark,将goroutine状态由 _Grunning切换为 _Gwaiting

   goparkunlock(&l.lock, waitReasonSyncCondWait, traceEvGoBlockCond, 3)

//...

   releaseSudog(s)

}



// notifyListNotifyAll notifies all entries in the list.

//go:linkname notifyListNotifyAll sync.runtime_notifyListNotifyAll

func notifyListNotifyAll(l *notifyList) {

   // Fast-path: if there are no new waiters since the last notification

   // we don't need to acquire the lock.

   // fastpath,如果当前的notify和wait一致,则代表无新的waiter,直接返回即可

   if atomic.Load(&l.wait) == atomic.Load(&l.notify) {

      return

   }



   // Pull the list out into a local variable, waiters will be readied

   // outside the lock.

   lockWithRank(&l.lock, lockRankNotifyList)

   // 将notifyList链表清空

   s := l.head

   l.head = nil

   l.tail = nil



   // Update the next ticket to be notified. We can set it to the current

   // value of wait because any previous waiters are already in the list

   // or will notice that they have already been notified when trying to

   // add themselves to the list.

//将notify的值置为wait的值,意思将当前所有waiter都已经可以被唤醒了

   atomic.Store(&l.notify, atomic.Load(&l.wait))

   unlock(&l.lock)



   // Go through the local list and ready all waiters.

   // 遍历链表,循环唤醒所有waiter

   for s != nil {

      next := s.next

      s.next = nil

      readyWithTime(s, 4)

      s = next

   }

}



// notifyListNotifyOne notifies one entry in the list.

//go:linkname notifyListNotifyOne sync.runtime_notifyListNotifyOne

func notifyListNotifyOne(l *notifyList) {

   // Fast-path: if there are no new waiters since the last notification

   // we don't need to acquire the lock at all.

   // fastpath,如果当前的notify和wait一致,则代表无新的waiter,直接返回即可

   if atomic.Load(&l.wait) == atomic.Load(&l.notify) {

      return

   }



   lockWithRank(&l.lock, lockRankNotifyList)



   // Re-check under the lock if we need to do anything.

   // 很经典的操作,加锁后再二次确认

   t := l.notify

   if t == atomic.Load(&l.wait) {

      unlock(&l.lock)

      return

   }



   // Update the next notify ticket number.

   // 标识下一个可唤醒的waiter

   atomic.Store(&l.notify, t+1)



   // Try to find the g that needs to be notified.

   // If it hasn't made it to the list yet we won't find it,

   // but it won't park itself once it sees the new notify number.

   //

   // This scan looks linear but essentially always stops quickly.

   // Because g's queue separately from taking numbers,

   // there may be minor reorderings in the list, but we

   // expect the g we're looking for to be near the front.

   // The g has others in front of it on the list only to the

   // extent that it lost the race, so the iteration will not

   // be too long. This applies even when the g is missing:

   // it hasn't yet gotten to sleep and has lost the race to

   // the (few) other g's that we find on the list.

   // 从waiter链表中找到需要唤醒的waiter,将对应waiter唤醒,并从链表中移除

   for p, s := (*sudog)(nil), l.head; s != nil; p, s = s, s.next {

      if s.ticket == t {

         n := s.next

         if p != nil {

            p.next = n

         } else {

            l.head = n

         }

         if n == nil {

            l.tail = p

         }

         unlock(&l.lock)

         s.next = nil

         // 将g的状态由 _Gwaiting切换到_Grunnable

         readyWithTime(s, 4)

         return

      }

   }

   unlock(&l.lock)

}



//go:linkname notifyListCheck sync.runtime_notifyListCheck

// 检查sync.notifyList和runtime.notifyList大小是否一致

func notifyListCheck(sz uintptr) {

   if sz != unsafe.Sizeof(notifyList{}) {

      print("runtime: bad notifyList size - sync=", sz, " runtime=", unsafe.Sizeof(notifyList{}), "\n")

      throw("bad notifyList size")

   }

}

4. 踩坑

u1s1 sync.cond确实一次都没用过,踩坑都是从网上扒的😓

  1. 不要copy “使用过的” cond,运行时会有panic。
func main() {

   // 创建一个cond1并使用

   cond1 := sync.NewCond(&sync.Mutex{})

   go func() {

      cond1.L.Lock()

      cond1.Wait()

   }()

   cond1.Signal()

   

   // 对cond1进行深拷贝

   var cond2 = new(sync.Cond)

   *cond2 = *cond1

   f := func(cond *sync.Cond, v int) {

      cond.L.Lock()

      for {

         fmt.Println(v)

         cond.Wait()

      }

   }

   go f(cond1, 1)

   

   // 当使用cond2的时候会报错(panic: sync.Cond is copied)

   go f(cond2, 2)

   time.Sleep(time.Second)

}

5. 参考文档

sync - The Go Programming Language (studygolang.com)

管程 - 维基百科,自由的百科全书 (wikipedia.org)

Golang sync.Cond 条件变量源码分析 | 编程沉思录 (cyhone.com)

Linux条件变量pthread_condition细节(为何先加锁,pthread_cond_wait为何先解锁,返回时又加锁)