Goroutine Scheduling


Go version used: go1.17

The executable

Finding the Go program entry point

This requires gdb. (figure: locating the entry point with gdb)

The scheduling loop

The GMP structures

Definitions:

  • G: goroutine, one unit of computation. It consists of the code to run plus its context: the current code position, the stack top and bottom addresses, its status, and so on.

  • M: machine, an OS thread, the execution entity. Code can only run on a CPU via a thread; these are the same threads as in C, created with the clone system call.

  • P: processor, a virtual processor. An M must acquire a P to execute code; otherwise it must go to sleep (the background monitor thread is the exception). You can also think of P as a token: only while holding it does an M have the right to run on a physical CPU core.

(figure: GMP overview)

  • The figure shows three Ms; each one's schedule loop keeps taking Gs out of the run queues on the left and executing them
  • A created thread may sometimes be idle (for example, when the consumer side finds no runnable goroutine and calls stopm); such an M is put on the midle (idle M) list
  • runnext holds the goroutine that is about to be executed

Producer side

(figure: producer side)

Consumer side

(figure: consumer side)

runtime.schedule

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
   _g_ := getg()
   
...

top:
   pp := _g_.m.p.ptr()
   pp.preempt = false

   if sched.gcwaiting != 0 {
      gcstopm()
      goto top
   }
   
   ...
   

   // Sanity check: if we are spinning, the run queue should be empty.
   // Check this before calling checkTimers, as that might call
   // goready to put a ready goroutine on the local run queue.
   if _g_.m.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
      throw("schedule: spinning with local work")
   }

   checkTimers(pp, 0)

   var gp *g
   var inheritTime bool

   if gp == nil && gcBlackenEnabled != 0 {
      gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
      if gp != nil {
         tryWakeP = true
      }
   }
   if gp == nil {
      // Check the global runnable queue once in a while to ensure fairness.
      // Otherwise two goroutines can completely occupy the local runqueue
      // by constantly respawning each other.
      if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
         lock(&sched.lock)
         gp = globrunqget(_g_.m.p.ptr(), 1)
         unlock(&sched.lock)
      }
   }
   if gp == nil {
      gp, inheritTime = runqget(_g_.m.p.ptr())
      // We can see gp != nil here even if the M is spinning,
      // if checkTimers added a local goroutine via goready.
   }
   if gp == nil {
      gp, inheritTime = findrunnable() // blocks until work is available
   }
   
   ...
 

   execute(gp, inheritTime)
}

schedule never returns.
schedtick: incremented on every scheduler call

  1. For fairness, every 61st schedtick the P takes a goroutine from the global queue; otherwise two goroutines could monopolize the local run queue by constantly respawning each other (see the comment in the code). gp = globrunqget(_g_.m.p.ptr(), 1) shows that only one G is taken from the head here.
  2. Get one from the local queue: runnext is checked first and returned if non-empty; otherwise a G is taken from the head of _p_.runq. This path synchronizes with atomic CAS rather than a mutex, because other Ps may be stealing from this queue concurrently. runqget's code shows that _p_.runq is a circular queue, and atomic.CasRel(&_p_.runqhead, h, h+1) shows items are always consumed from the head.
  3. If neither of the above finds anything, block in findrunnable until a goroutine is found. findrunnable proceeds as follows:
  4. Check the local queue again.
  5. Check the global queue with globrunqget(_p_, 0): take at most half the local queue's capacity (at most 128), run the first one, and put the rest into the local queue.
  6. Poll the network for goroutines whose I/O is ready.
  7. Steal from other Ps' local queues, possibly running other Ps' timers first (if running another P's timers readied anything, re-check our own local queue and, if a goroutine was put there in the meantime, return it directly).
  8. stealWork -> runqsteal steals half of another P's _p_.runq and runs the last stolen G.
  9. If all of the above fail, stopm puts the current M on the idle list.
runtime.runqget

// Get g from local runnable queue.
// If inheritTime is true, gp should inherit the remaining time in the
// current time slice. Otherwise, it should start a new time slice.
// Executed only by the owner P.
func runqget(_p_ *p) (gp *g, inheritTime bool) {
   // If there's a runnext, it's the next G to run.
   for {
      next := _p_.runnext
      if next == 0 {
         break
      }
      if _p_.runnext.cas(next, 0) {
         return next.ptr(), true
      }
   }

   for {
      h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
      t := _p_.runqtail
      if t == h {
         return nil, false
      }
      gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
      if atomic.CasRel(&_p_.runqhead, h, h+1) { // cas-release, commits consume
         return gp, false
      }
   }
}

runqget first checks runnext; if it is non-empty, that G is returned with inheritTime = true. Otherwise a G is taken from the head of the circular queue.

runtime.globrunqget

// Try get a batch of G's from the global runnable queue.
// sched.lock must be held.
func globrunqget(_p_ *p, max int32) *g {
   assertLockHeld(&sched.lock)

   if sched.runqsize == 0 {
      return nil
   }

   n := sched.runqsize/gomaxprocs + 1
   if n > sched.runqsize {
      n = sched.runqsize
   }
   if max > 0 && n > max {
      n = max
   }
   // take at most half the local queue's capacity (i.e. at most 128)
   if n > int32(len(_p_.runq))/2 {
      n = int32(len(_p_.runq)) / 2
   }

   sched.runqsize -= n
   // return the first G to run; the rest go into the local queue in order
   gp := sched.runq.pop()
   n--
   for ; n > 0; n-- {
      gp1 := sched.runq.pop()
      runqput(_p_, gp1, false)
   }
   return gp
}

runtime.findrunnable

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from local or global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
   _g_ := getg()

   // The conditions here and in handoffp must agree: if
   // findrunnable would return a G to run, handoffp must start
   // an M.

top:
   ...

   // local runq
   if gp, inheritTime := runqget(_p_); gp != nil {
      return gp, inheritTime
   }

   // global runq
   if sched.runqsize != 0 {
      lock(&sched.lock)
      gp := globrunqget(_p_, 0)
      unlock(&sched.lock)
      if gp != nil {
         return gp, false
      }
   }

   // Poll network.
   // This netpoll is only an optimization before we resort to stealing.
   // We can safely skip it if there are no waiters or a thread is blocked
   // in netpoll already. If there is any kind of logical race with that
   // blocked thread (e.g. it has already returned from netpoll, but does
   // not set lastpoll yet), this thread will do blocking netpoll below
   // anyway.
   if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
      if list := netpoll(0); !list.empty() { // non-blocking
         gp := list.pop()
         injectglist(&list)
         casgstatus(gp, _Gwaiting, _Grunnable)
         if trace.enabled {
            traceGoUnpark(gp, 0)
         }
         return gp, false
      }
   }

   // Spinning Ms: steal work from other Ps.
   //
   // Limit the number of spinning Ms to half the number of busy Ps.
   // This is necessary to prevent excessive CPU consumption when
   // GOMAXPROCS>>1 but the program parallelism is low.
   procs := uint32(gomaxprocs)
   if _g_.m.spinning || 2*atomic.Load(&sched.nmspinning) < procs-atomic.Load(&sched.npidle) {
      if !_g_.m.spinning {
         _g_.m.spinning = true
         atomic.Xadd(&sched.nmspinning, 1)
      }

      gp, inheritTime, tnow, w, newWork := stealWork(now)
      now = tnow
      if gp != nil {
         // Successfully stole.
         return gp, inheritTime
      }
      if newWork {
         // There may be new timer or GC work; restart to
         // discover.
         goto top
      }
      if w != 0 && (pollUntil == 0 || w < pollUntil) {
         // Earlier timer to wait for.
         pollUntil = w
      }
   }
   
...

runtime.stealWork

// stealWork attempts to steal a runnable goroutine or timer from any P.
//
// If newWork is true, new work may have been readied.
//
// If now is not 0 it is the current time. stealWork returns the passed time or
// the current time if now was passed as 0.
func stealWork(now int64) (gp *g, inheritTime bool, rnow, pollUntil int64, newWork bool) {
   pp := getg().m.p.ptr()

   ranTimer := false

   const stealTries = 4
   for i := 0; i < stealTries; i++ {
      stealTimersOrRunNextG := i == stealTries-1

      for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
         if sched.gcwaiting != 0 {
            // GC work may be available.
            return nil, false, now, pollUntil, true
         }
         p2 := allp[enum.position()]
         if pp == p2 {
            continue
         }

         // Steal timers from p2. This call to checkTimers is the only place
         // where we might hold a lock on a different P's timers. We do this
         // once on the last pass before checking runnext because stealing
         // from the other P's runnext should be the last resort, so if there
         // are timers to steal do that first.
         //
         // We only check timers on one of the stealing iterations because
         // the time stored in now doesn't change in this loop and checking
         // the timers for each P more than once with the same value of now
         // is probably a waste of time.
         //
         // timerpMask tells us whether the P may have timers at all. If it
         // can't, no need to check at all.
         if stealTimersOrRunNextG && timerpMask.read(enum.position()) {
            tnow, w, ran := checkTimers(p2, now)
            now = tnow
            if w != 0 && (pollUntil == 0 || w < pollUntil) {
               pollUntil = w
            }
            if ran {
               // Running the timers may have
               // made an arbitrary number of G's
               // ready and added them to this P's
               // local run queue. That invalidates
               // the assumption of runqsteal
               // that it always has room to add
               // stolen G's. So check now if there
               // is a local G to run.
               if gp, inheritTime := runqget(pp); gp != nil {
                  return gp, inheritTime, now, pollUntil, ranTimer
               }
               ranTimer = true
            }
         }

         // Don't bother to attempt to steal if p2 is idle.
         if !idlepMask.read(enum.position()) {
            if gp := runqsteal(pp, p2, stealTimersOrRunNextG); gp != nil {
               return gp, false, now, pollUntil, ranTimer
            }
         }
      }
   }

   // No goroutines found to steal. Regardless, running a timer may have
   // made some goroutine ready that we missed. Indicate the next timer to
   // wait for.
   return nil, false, now, pollUntil, ranTimer
}
  1. When i == stealTries-1, the other Ps' timers are also checked and run.
  2. While those timers were running, Gs may have been readied into the current P's local queue, so it checks and, if one appeared, returns it directly.
  3. Otherwise it steals from the other P's local run queue.

runtime.checkTimers

// checkTimers runs any timers for the P that are ready.
// If now is not 0 it is the current time.
// It returns the passed time or the current time if now was passed as 0.
// and the time when the next timer should run or 0 if there is no next timer,
// and reports whether it ran any timers.
// If the time when the next timer should run is not 0,
// it is always larger than the returned time.
// We pass now in and out to avoid extra calls of nanotime.
//go:yeswritebarrierrec
func checkTimers(pp *p, now int64) (rnow, pollUntil int64, ran bool) {
   // If it's not yet time for the first timer, or the first adjusted
   // timer, then there is nothing to do.
   next := int64(atomic.Load64(&pp.timer0When))
   nextAdj := int64(atomic.Load64(&pp.timerModifiedEarliest))
   if next == 0 || (nextAdj != 0 && nextAdj < next) {
      next = nextAdj
   }

   // no timers at all
   if next == 0 {
      // No timers to run or adjust.
      return now, 0, false
   }

   if now == 0 {
      now = nanotime()
   }
   
   // the next timer is not due yet
   if now < next {
      // Next timer is not ready to run, but keep going
      // if we would clear deleted timers.
      // This corresponds to the condition below where
      // we decide whether to call clearDeletedTimers.
      // if deleted timers are at most 1/4 of all timers, return without cleaning
      if pp != getg().m.p.ptr() || int(atomic.Load(&pp.deletedTimers)) <= int(atomic.Load(&pp.numTimers)/4) {
         return now, next, false
      }
   }

   lock(&pp.timersLock)

   if len(pp.timers) > 0 {
      adjusttimers(pp, now)
      for len(pp.timers) > 0 {
         // Note that runtimer may temporarily unlock
         // pp.timersLock.
         if tw := runtimer(pp, now); tw != 0 {
            if tw > 0 {
               pollUntil = tw
            }
            break
         }
         ran = true
      }
   }

   // If this is the local P, and there are a lot of deleted timers,
   // clear them out. We only do this for the local P to reduce
   // lock contention on timersLock.
   if pp == getg().m.p.ptr() && int(atomic.Load(&pp.deletedTimers)) > len(pp.timers)/4 {
      clearDeletedTimers(pp)
   }

   unlock(&pp.timersLock)

   return now, pollUntil, ran
}
  1. If the P has no timers, return immediately.
  2. If the next timer is not due yet:
    a. and deleted timers are at most 1/4 of the total, return directly.
  3. Otherwise take the lock, adjust the timers, and run any that are ready.
  4. If this is the local P and deleted timers exceed 1/4 of the total, clear out the deleted timers.

runtime.runqsteal

// Steal half of elements from local runnable queue of p2
// and put onto local runnable queue of p.
// Returns one of the stolen elements (or nil if failed).
func runqsteal(_p_, p2 *p, stealRunNextG bool) *g {
   t := _p_.runqtail
   n := runqgrab(p2, &_p_.runq, t, stealRunNextG)
   if n == 0 {
      return nil
   }
   n--
   gp := _p_.runq[(t+n)%uint32(len(_p_.runq))].ptr()
   if n == 0 {
      return gp
   }
   h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with consumers
   if t-h+n >= uint32(len(_p_.runq)) {
      throw("runqsteal: runq overflow")
   }
   atomic.StoreRel(&_p_.runqtail, t+n) // store-release, makes the item available for consumption
   return gp
}

// A gQueue is a dequeue of Gs linked through g.schedlink. A G can only
// be on one gQueue or gList at a time.
type gQueue struct {
   head guintptr
   tail guintptr
}

runtime.stopm

// Stops execution of the current m until new work is available.
// Returns with acquired P.
func stopm() {
   _g_ := getg()

   if _g_.m.locks != 0 {
      throw("stopm holding locks")
   }
   if _g_.m.p != 0 {
      throw("stopm holding p")
   }
   if _g_.m.spinning {
      throw("stopm spinning")
   }

   lock(&sched.lock)
   mput(_g_.m)
   unlock(&sched.lock)
   mPark()
   acquirep(_g_.m.nextp.ptr())
   _g_.m.nextp = 0
}

func mspinning() {
   // startm's caller incremented nmspinning. Set the new M's spinning.
   getg().m.spinning = true
}

runtime.execute

// Schedules gp to run on the current M.
// If inheritTime is true, gp inherits the remaining time in the
// current time slice. Otherwise, it starts a new time slice.
// Never returns.
//
// Write barriers are allowed because this is called immediately after
// acquiring a P in several places.
//
//go:yeswritebarrierrec
func execute(gp *g, inheritTime bool) {
   _g_ := getg()

   // Assign gp.m before entering _Grunning so running Gs have an
   // M.
   _g_.m.curg = gp
   gp.m = _g_.m
   casgstatus(gp, _Grunnable, _Grunning)
   gp.waitsince = 0
   gp.preempt = false
   gp.stackguard0 = gp.stack.lo + _StackGuard
   if !inheritTime {
      _g_.m.p.ptr().schedtick++
   }

   // Check whether the profiler needs to be turned on or off.
   hz := sched.profilehz
   if _g_.m.profilehz != hz {
      setThreadCPUProfiler(hz)
   }

   if trace.enabled {
      // GoSysExit has to happen when we have a P, but before GoStart.
      // So we emit it here.
      if gp.syscallsp != 0 && gp.sysblocktraced {
         traceGoSysExit(gp.sysexitticks)
      }
      traceGoStart()
   }

   gogo(&gp.sched)
}

execute runs the goroutine picked by schedule: after some preparation (marking it _Grunning, bumping schedtick), it transfers control to the goroutine on the current thread via runtime.gogo.

runtime.gogo

// func gogo(buf *gobuf)
// restore state from Gobuf; longjmp
TEXT runtime·gogo(SB), NOSPLIT, $0-8
   MOVQ   buf+0(FP), BX     // gobuf
   MOVQ   gobuf_g(BX), DX
   MOVQ   0(DX), CX     // make sure g != nil
   JMP    gogo<>(SB)

TEXT gogo<>(SB), NOSPLIT, $0
   get_tls(CX)
   MOVQ   DX, g(CX)
   MOVQ   DX, R14       // set the g register
   MOVQ   gobuf_sp(BX), SP   // restore SP; for a newly created G the return address on that stack is runtime.goexit
   MOVQ   gobuf_ret(BX), AX
   MOVQ   gobuf_ctxt(BX), DX
   MOVQ   gobuf_bp(BX), BP
   MOVQ   $0, gobuf_sp(BX)   // clear to help garbage collector
   MOVQ   $0, gobuf_ret(BX)
   MOVQ   $0, gobuf_ctxt(BX)
   MOVQ   $0, gobuf_bp(BX)
   MOVQ   gobuf_pc(BX), BX  // load the PC of the function to run (the function after the go keyword)
   JMP    BX                // jump to it
  • runtime.goexit's PC sits on the goroutine's stack as the fake return address
  • the PC of the function to run is loaded into BX; JMP transfers control to it, and when that function returns, execution "returns" into runtime.goexit

runtime·goexit

// The top-most function running on a goroutine
// returns to goexit+PCQuantum.
TEXT runtime·goexit(SB),NOSPLIT|TOPFRAME,$0-0
   BYTE   $0x90  // NOP
   CALL   runtime·goexit1(SB)    // does not return
   // traceback from goexit1 must hit code range of goexit
   BYTE   $0x90  // NOP

runtime.goexit1

// Finishes execution of the current goroutine.
func goexit1() {
   if raceenabled {
      racegoend()
   }
   if trace.enabled {
      traceGoEnd()
   }
   mcall(goexit0)
}

mcall


// mcall switches from the g to the g0 stack and invokes fn(g),
// where g is the goroutine that made the call.
// mcall saves g's current PC/SP in g->sched so that it can be restored later.
// It is up to fn to arrange for that later execution, typically by recording
// g in a data structure, causing something to call ready(g) later.
// mcall returns to the original goroutine g later, when g has been rescheduled.
// fn must not return at all; typically it ends by calling schedule, to let the m
// run other goroutines.
//
// mcall can only be called from g stacks (not g0, not gsignal).
//
// This must NOT be go:noescape: if fn is a stack-allocated closure,
// fn puts g on a run queue, and g executes before fn returns, the
// closure will be invalidated while it is still executing.
func mcall(fn func(*g))

mcall switches from g to the g0 stack and invokes fn(g); here, goexit0 runs on g0.

runtime.goexit0

// goexit continuation on g0.
func goexit0(gp *g) {
   _g_ := getg()

   casgstatus(gp, _Grunning, _Gdead)
   if isSystemGoroutine(gp, false) {
      atomic.Xadd(&sched.ngsys, -1)
   }
   gp.m = nil
   locked := gp.lockedm != 0
   gp.lockedm = 0
   _g_.m.lockedg = 0
   gp.preemptStop = false
   gp.paniconfault = false
   gp._defer = nil // should be true already but just in case.
   gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
   gp.writebuf = nil
   gp.waitreason = 0
   gp.param = nil
   gp.labels = nil
   gp.timer = nil

   if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
      // Flush assist credit to the global pool. This gives
      // better information to pacing if the application is
      // rapidly creating an exiting goroutines.
      assistWorkPerByte := float64frombits(atomic.Load64(&gcController.assistWorkPerByte))
      scanCredit := int64(assistWorkPerByte * float64(gp.gcAssistBytes))
      atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
      gp.gcAssistBytes = 0
   }

   dropg()

   if GOARCH == "wasm" { // no threads yet on wasm
      gfput(_g_.m.p.ptr(), gp)
      schedule() // never returns
   }

   if _g_.m.lockedInt != 0 {
      print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
      throw("internal lockOSThread error")
   }
   // put the dead G on the P's gfree list rather than destroying it, so it can be reused and allocation overhead reduced
   gfput(_g_.m.p.ptr(), gp)
   if locked {
      // The goroutine may have locked this thread because
      // it put it in an unusual kernel state. Kill it
      // rather than returning it to the thread pool.

      // Return to mstart, which will release the P and exit
      // the thread.
      if GOOS != "plan9" { // See golang.org/issue/22227.
         gogo(&_g_.m.g0.sched)
      } else {
         // Clear lockedExt on plan9 since we may end up re-using
         // this thread.
         _g_.m.lockedExt = 0
      }
   }
   schedule()
}

Handling blocking

1. Common blocking situations

channel

net read/write

time.Sleep

select

lock

None of these cases blocks the scheduling loop; instead the goroutine is suspended: the G is stored in some data structure and resumed once it is ready. It does not occupy a thread, and the thread re-enters schedule and keeps consuming the queues, executing other Gs.

2. How these cases park the goroutine

(figure: where a blocked G is parked in each case)

  1. channel: sendq and recvq hold the send/receive waiters; the G is wrapped in a sudog and placed on the corresponding queue.
  2. net read/write: the G is hung off the pollDesc's rg/wg fields.
  3. select: the figure shows three channels; each channel gets its own sudog wrapping the same G.
  4. time.Sleep: the G is stored in the timer's arg field.
  5. lock: a blocked lock acquisition wraps the G in a sudog that waits in a treap; a treap is a randomized balanced binary tree, and each of its nodes heads a linked list of sudogs waiting on the same address.

Why is it a sudog in some places and a plain g in others?

A sudog is needed where the waiting relation is many-to-many: in a select one G can wait on many channels, and one channel can have many waiting Gs, so the runtime allocates one sudog per wait edge (see the comment on type sudog in runtime2.go). Where a G can only wait on a single thing, the g itself suffices.

3. Blocking the runtime cannot intercept

CGO

A blocking cgo call (like a blocking system call) occupies the M itself: the runtime cannot park the G, so the thread stays blocked, and sysmon will eventually retake the P so other goroutines can continue running on another thread.