The Golang Scheduling Loop


Environment

Architecture: amd64
Go version: Go 1.18

Scheduling Flow


In Go, the core of the scheduling loop is the schedule() function in the runtime. Its job is to find runnable goroutines and manage their execution.

The workflow around schedule() is roughly:

  1. Create a goroutine with the go keyword: go func(){}() (see the sketch after this list)
  2. Put the goroutine on the local run queue of the P bound to the current M (if that local queue is full, it is pushed onto the global queue instead)
  3. Wake up or create an M to run the work (if the P's local queue has no runnable goroutine, findrunnable() falls back to the work-stealing mechanism: it picks another M's P at random and steals half of its local run queue; if that still yields nothing, it takes a batch of goroutines from the global queue)
  4. Enter the scheduling loop (the M runs the goroutine; when it finishes, the M fetches the next goroutine from its P, over and over)
  5. Clean up and re-enter the scheduling loop
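As an illustration of step 1, the sketch below only creates goroutines with the go keyword; each one lands on a P's run queue and is picked up by the scheduling loop described below. The GOMAXPROCS value and the WaitGroup are incidental choices for the demo, not part of the runtime internals:

package main

import (
    "fmt"
    "runtime"
    "sync"
)

func main() {
    runtime.GOMAXPROCS(4) // four Ps, so up to four Ms run goroutines in parallel

    var wg sync.WaitGroup
    for i := 0; i < 8; i++ {
        wg.Add(1)
        go func(id int) { // step 1: the new goroutine goes onto a P's local run queue
            defer wg.Done()
            fmt.Println("goroutine", id)
        }(i)
    }
    wg.Wait() // while main blocks, its M runs other runnable goroutines
}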

schedule

The scheduling loop function. It never returns.

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
  _g_ := getg()

  if _g_.m.locks != 0 {
    throw("schedule: holding locks")
  }

  if _g_.m.lockedg != 0 {
    stoplockedm()
    execute(_g_.m.lockedg.ptr(), false) // Never returns.
  }

  // We should not schedule away from a g that is executing a cgo call,
  // since the cgo call is using the m's g0 stack.
  if _g_.m.incgo {
    throw("schedule: in cgo")
  }

top:
  pp := _g_.m.p.ptr()
  pp.preempt = false

  if sched.gcwaiting != 0 {
    gcstopm()
    goto top
  }
  if pp.runSafePointFn != 0 {
    runSafePointFn()
  }

  // Sanity check: if we are spinning, the run queue should be empty.
  // Check this before calling checkTimers, as that might call
  // goready to put a ready goroutine on the local run queue.
  if _g_.m.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
    throw("schedule: spinning with local work")
  }

  checkTimers(pp, 0)

  var gp *g
  var inheritTime bool

  // Normal goroutines will check for need to wakeP in ready,
  // but GCworkers and tracereaders will not, so the check must
  // be done here instead.
  tryWakeP := false
  if trace.enabled || trace.shutdown {
    gp = traceReader()
    if gp != nil {
      casgstatus(gp, _Gwaiting, _Grunnable)
      traceGoUnpark(gp, 0)
      tryWakeP = true
    }
  }
  if gp == nil && gcBlackenEnabled != 0 {
    gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
    if gp != nil {
      tryWakeP = true
    }
  }
  if gp == nil {
    // Check the global runnable queue once in a while to ensure fairness.
    // Otherwise two goroutines can completely occupy the local runqueue
    // by constantly respawning each other.
    // Fairness: don't always pull from the P's local queue; every 61st schedtick (schedtick%61 == 0), take from the global queue instead
    if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
      lock(&sched.lock)
      gp = globrunqget(_g_.m.p.ptr(), 1)
      unlock(&sched.lock)
    }
  }
  // Get a g from the local run queue of the P bound to the current M
  if gp == nil {
    gp, inheritTime = runqget(_g_.m.p.ptr())
    // We can see gp != nil here even if the M is spinning,
    // if checkTimers added a local goroutine via goready.
  }
  // If there is still no runnable g, keep looking:
  // tries to steal from other P's, get g from local or global queue, poll network
  if gp == nil {
    gp, inheritTime = findrunnable() // blocks until work is available
  }

  // This thread is going to run a goroutine and is not spinning anymore,
  // so if it was marked as spinning we need to reset it now and potentially
  // start a new spinning M.
  if _g_.m.spinning {
    resetspinning()
  }

  if sched.disable.user && !schedEnabled(gp) {
    // Scheduling of this goroutine is disabled. Put it on
    // the list of pending runnable goroutines for when we
    // re-enable user scheduling and look again.
    lock(&sched.lock)
    if schedEnabled(gp) {
      // Something re-enabled scheduling while we
      // were acquiring the lock.
      unlock(&sched.lock)
    } else {
      sched.disable.runnable.pushBack(gp)
      sched.disable.n++
      unlock(&sched.lock)
      goto top
    }
  }

  // If about to schedule a not-normal goroutine (a GCworker or tracereader),
  // wake a P if there is one.
  if tryWakeP {
    wakep()
  }
  if gp.lockedm != 0 {
    // Hands off own p to the locked m,
    // then blocks waiting for a new p.
    startlockedm(gp)
    goto top
  }

  // Run the runnable g we found
  execute(gp, inheritTime)
}

findrunnable

Find a runnable goroutine, in this order:

  1. Get a goroutine from the local run queue
  2. Get a goroutine from the global run queue
  3. Get a goroutine whose I/O is ready from netpoll (non-blocking poll)
  4. Steal goroutines from other Ps
  5. If there really is nothing to schedule and we are in the GC mark phase, get a background (idle-time) mark worker g
  6. Check the global run queue once more
  7. If this M was spinning, recheck all sources, then block in netpoll waiting for an event

If no runnable goroutine turns up at all, stop the M, then search again in the order above.

func findrunnable() (gp *g, inheritTime bool) {
    _g_ := getg()

    // The conditions here and in handoffp must agree: if
    // findrunnable would return a G to run, handoffp must start
    // an M.

top:
    _p_ := _g_.m.p.ptr()
    if sched.gcwaiting != 0 {
        gcstopm()
        goto top
    }
    if _p_.runSafePointFn != 0 {
        runSafePointFn()
    }

    now, pollUntil, _ := checkTimers(_p_, 0)

    if fingwait && fingwake {
        if gp := wakefing(); gp != nil {
            ready(gp, 0, true)
        }
    }
    if *cgo_yield != nil {
        asmcgocall(*cgo_yield, nil)
    }

    // 1. Get a g from the local run queue
    // local runq
    if gp, inheritTime := runqget(_p_); gp != nil {
        return gp, inheritTime
    }

    // 2. Get a g from the global run queue
    // global runq
    if sched.runqsize != 0 {
        lock(&sched.lock)
        gp := globrunqget(_p_, 0)
        unlock(&sched.lock)
        if gp != nil {
            return gp, false
        }
    }

    // 3. Get a g that is ready (no longer blocked) from netpoll
    // Poll network.
    // This netpoll is only an optimization before we resort to stealing.
    // We can safely skip it if there are no waiters or a thread is blocked
    // in netpoll already. If there is any kind of logical race with that
    // blocked thread (e.g. it has already returned from netpoll, but does
    // not set lastpoll yet), this thread will do blocking netpoll below
    // anyway.
    if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
        if list := netpoll(0); !list.empty() { // non-blocking
            gp := list.pop()
            injectglist(&list)
            casgstatus(gp, _Gwaiting, _Grunnable)
            if trace.enabled {
                traceGoUnpark(gp, 0)
            }
            return gp, false
        }
    }

    // 4. Steal goroutines from other Ps
    // Spinning Ms: steal work from other Ps.
    //
    // Limit the number of spinning Ms to half the number of busy Ps.
    // This is necessary to prevent excessive CPU consumption when
    // GOMAXPROCS>>1 but the program parallelism is low.
    procs := uint32(gomaxprocs)
    if _g_.m.spinning || 2*atomic.Load(&sched.nmspinning) < procs-atomic.Load(&sched.npidle) {
        if !_g_.m.spinning {
            _g_.m.spinning = true
            atomic.Xadd(&sched.nmspinning, 1)
        }

        gp, inheritTime, tnow, w, newWork := stealWork(now)
        now = tnow
        if gp != nil {
            // Successfully stole.
            return gp, inheritTime
        }
        if newWork {
            // There may be new timer or GC work; restart to
            // discover.
            goto top
        }
        if w != 0 && (pollUntil == 0 || w < pollUntil) {
            // Earlier timer to wait for.
            pollUntil = w
        }
    }

    // 5. Nothing runnable: if we are in the GC mark phase, get a background mark worker g
    // We have nothing to do.
    //
    // If we're in the GC mark phase, can safely scan and blacken objects,
    // and have work to do, run idle-time marking rather than give up the
    // P.
    if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
        node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
        if node != nil {
            _p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
            gp := node.gp.ptr()
            casgstatus(gp, _Gwaiting, _Grunnable)
            if trace.enabled {
                traceGoUnpark(gp, 0)
            }
            return gp, false
        }
    }

    // wasm only:
    // If a callback returned and no other goroutine is awake,
    // then wake event handler goroutine which pauses execution
    // until a callback was triggered.
    gp, otherReady := beforeIdle(now, pollUntil)
    if gp != nil {
        casgstatus(gp, _Gwaiting, _Grunnable)
        if trace.enabled {
            traceGoUnpark(gp, 0)
        }
        return gp, false
    }
    if otherReady {
        goto top
    }

    // Before we drop our P, make a snapshot of the allp slice,
    // which can change underfoot once we no longer block
    // safe-points. We don't need to snapshot the contents because
    // everything up to cap(allp) is immutable.
    allpSnapshot := allp
    // Also snapshot masks. Value changes are OK, but we can't allow
    // len to change out from under us.
    idlepMaskSnapshot := idlepMask
    timerpMaskSnapshot := timerpMask

    // return P and block
    lock(&sched.lock)
    if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
        unlock(&sched.lock)
        goto top
    }
    // 6. Check the global run queue once more
    if sched.runqsize != 0 {
        gp := globrunqget(_p_, 0)
        unlock(&sched.lock)
        return gp, false
    }
    if releasep() != _p_ {
        throw("findrunnable: wrong p")
    }
    
    // Put the P onto the idle list
    pidleput(_p_)
    unlock(&sched.lock)

    // Delicate dance: thread transitions from spinning to non-spinning
    // state, potentially concurrently with submission of new work. We must
    // drop nmspinning first and then check all sources again (with
    // #StoreLoad memory barrier in between). If we do it the other way
    // around, another thread can submit work after we've checked all
    // sources but before we drop nmspinning; as a result nobody will
    // unpark a thread to run the work.
    //
    // This applies to the following sources of work:
    //
    // * Goroutines added to a per-P run queue.
    // * New/modified-earlier timers on a per-P timer heap.
    // * Idle-priority GC work (barring golang.org/issue/19112).
    //
    // If we discover new work below, we need to restore m.spinning as a signal
    // for resetspinning to unpark a new worker thread (because there can be more
    // than one starving goroutine). However, if after discovering new work
    // we also observe no idle Ps it is OK to skip unparking a new worker
    // thread: the system is fully loaded so no spinning threads are required.
    // Also see "Worker thread parking/unparking" comment at the top of the file.
    // 7. If this M was spinning, recheck all sources, then wait in netpoll for an event
    wasSpinning := _g_.m.spinning
    if _g_.m.spinning {
        _g_.m.spinning = false
        if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
            throw("findrunnable: negative nmspinning")
        }

        // Note that for correctness, only the last M transitioning from
        // spinning to non-spinning must perform these rechecks to
        // ensure no missed work. We are performing it on every M that
        // transitions as a conservative change to monitor effects on
        // latency. See golang.org/issue/43997.
        // Check all runqueues once again.
        _p_ = checkRunqsNoP(allpSnapshot, idlepMaskSnapshot)
        if _p_ != nil {
            acquirep(_p_)
            _g_.m.spinning = true
            atomic.Xadd(&sched.nmspinning, 1)
            goto top
        }

        // Check for idle-priority GC work again.
        _p_, gp = checkIdleGCNoP()
        if _p_ != nil {
            acquirep(_p_)
            _g_.m.spinning = true
            atomic.Xadd(&sched.nmspinning, 1)

            // Run the idle worker.
            _p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
            casgstatus(gp, _Gwaiting, _Grunnable)
            if trace.enabled {
                traceGoUnpark(gp, 0)
            }
            return gp, false
        }

        // Finally, check for timer creation or expiry concurrently with
        // transitioning from spinning to non-spinning.
        //
        // Note that we cannot use checkTimers here because it calls
        // adjusttimers which may need to allocate memory, and that isn't
        // allowed when we don't have an active P.
        pollUntil = checkTimersNoP(allpSnapshot, timerpMaskSnapshot, pollUntil)
    }

    // Poll network until next timer.
    if netpollinited() && (atomic.Load(&netpollWaiters) > 0 || pollUntil != 0) && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
        atomic.Store64(&sched.pollUntil, uint64(pollUntil))
        if _g_.m.p != 0 {
            throw("findrunnable: netpoll with p")
        }
        if _g_.m.spinning {
            throw("findrunnable: netpoll with spinning")
        }
        delay := int64(-1)
        if pollUntil != 0 {
            if now == 0 {
                now = nanotime()
            }
            delay = pollUntil - now
            if delay < 0 {
                delay = 0
            }
        }
        if faketime != 0 {
            // When using fake time, just poll.
            delay = 0
        }
        list := netpoll(delay) // block until new work is available
        atomic.Store64(&sched.pollUntil, 0)
        atomic.Store64(&sched.lastpoll, uint64(nanotime()))
        if faketime != 0 && list.empty() {
            // Using fake time and nothing is ready; stop M.
            // When all M's stop, checkdead will call timejump.
            stopm()
            goto top
        }
        lock(&sched.lock)
        _p_ = pidleget()
        unlock(&sched.lock)
        if _p_ == nil {
            injectglist(&list)
        } else {
            acquirep(_p_)
            if !list.empty() {
                gp := list.pop()
                injectglist(&list)
                casgstatus(gp, _Gwaiting, _Grunnable)
                if trace.enabled {
                    traceGoUnpark(gp, 0)
                }
                return gp, false
            }
            if wasSpinning {
                _g_.m.spinning = true
                atomic.Xadd(&sched.nmspinning, 1)
            }
            goto top
        }
    } else if pollUntil != 0 && netpollinited() {
        pollerPollUntil := int64(atomic.Load64(&sched.pollUntil))
        if pollerPollUntil == 0 || pollerPollUntil > pollUntil {
            netpollBreak()
        }
    }
    // Stop the M
    stopm()
    goto top
}

execute

Once a runnable goroutine has been obtained, execute() runs it:

func execute(gp *g, inheritTime bool) {
   _g_ := getg()
   // Assign gp.m before entering _Grunning so running Gs have an
   // M.
   _g_.m.curg = gp
   gp.m = _g_.m
   casgstatus(gp, _Grunnable, _Grunning)
   gp.waitsince = 0
   gp.preempt = false
   gp.stackguard0 = gp.stack.lo + _StackGuard
   if !inheritTime {
      _g_.m.p.ptr().schedtick++
   }
   // Check whether the profiler needs to be turned on or off.
   hz := sched.profilehz
   if _g_.m.profilehz != hz {
      setThreadCPUProfiler(hz)
   }
   if trace.enabled {
      // GoSysExit has to happen when we have a P, but before GoStart.
      // So we emit it here.
      if gp.syscallsp != 0 && gp.sysblocktraced {
         traceGoSysExit(gp.sysexitticks)
      }
      traceGoStart()
   }
   gogo(&gp.sched)
}

runtime·gogo

gogo is implemented in assembly; it switches from g0 to the g's stack and then jumps to the goroutine's function.

gobuf is the data structure that saves a goroutine's execution context.
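For reference, this is the gobuf definition, abridged from runtime/runtime2.go (Go 1.17/1.18; field types simplified). The assembly below reads and clears these fields through the gobuf_* symbol offsets:

type gobuf struct {
    sp   uintptr        // saved stack pointer
    pc   uintptr        // program counter to resume at
    g    guintptr       // the goroutine this context belongs to
    ctxt unsafe.Pointer // closure context (funcval), if any
    ret  uintptr        // saved return-value register (AX)
    lr   uintptr        // link register (unused on amd64)
    bp   uintptr        // saved frame pointer
}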

TEXT runtime·gogo(SB), NOSPLIT, $0-8
   MOVQ   buf+0(FP), BX   // FP points at the arguments, so buf+0(FP) is the first argument: the *gobuf
   MOVQ   gobuf_g(BX), DX // load gobuf.g into DX
   MOVQ   0(DX), CX       // dereference g to make sure it is not nil
   JMP    gogo<>(SB)      // jump to gogo<>

TEXT gogo<>(SB), NOSPLIT, $0
   get_tls(CX)          // load the TLS address into CX
   MOVQ   DX, g(CX)     // set the g register: store gobuf.g into TLS
   MOVQ   DX, R14       // also keep gobuf.g in register R14
   MOVQ   gobuf_sp(BX), SP    // restore SP from gobuf.sp: this switches to the g's stack
   MOVQ   gobuf_ret(BX), AX   // restore the return-value register from gobuf.ret
   MOVQ   gobuf_ctxt(BX), DX  // restore the closure context from gobuf.ctxt
   MOVQ   gobuf_bp(BX), BP    // restore the frame pointer from gobuf.bp
   MOVQ   $0, gobuf_sp(BX)    // clear gobuf.sp so the GC won't keep the old stack alive
   MOVQ   $0, gobuf_ret(BX)   // clear gobuf.ret
   MOVQ   $0, gobuf_ctxt(BX)  // clear gobuf.ctxt
   MOVQ   $0, gobuf_bp(BX)    // clear gobuf.bp
   MOVQ   gobuf_pc(BX), BX    // load gobuf.pc: the address of the g's function to run
   JMP    BX                  // jump there: the goroutine starts (or resumes) executing

runtime·goexit

TEXT runtime·goexit(SB),NOSPLIT|TOPFRAME,$0-0
   BYTE   $0x90  // NOP
   CALL   runtime·goexit1(SB)    // does not return
   // traceback from goexit1 must hit code range of goexit
   BYTE   $0x90  // NOP

A goroutine's stack is set up so that when its function returns, execution continues in goexit, which calls goexit1. goexit1 then switches to g0 via mcall and runs goexit0:

func goexit1() {
   if raceenabled {
      racegoend()
   }
   if trace.enabled {
      traceGoEnd()
   }
   mcall(goexit0) // switch to g0 and run goexit0
}

// goexit continuation on g0.
// gp is the goroutine that is exiting.
func goexit0(gp *g) {
   _g_ := getg()
   _p_ := _g_.m.p.ptr()

   casgstatus(gp, _Grunning, _Gdead) // mark gp as _Gdead
   gcController.addScannableStack(_p_, -int64(gp.stack.hi-gp.stack.lo)) // subtract the exiting goroutine's stack from the scannable-stack total
   if isSystemGoroutine(gp, false) {
      atomic.Xadd(&sched.ngsys, -1)
   }
   gp.m = nil // from here on, tear down the exiting goroutine's context
   locked := gp.lockedm != 0
   gp.lockedm = 0
   _g_.m.lockedg = 0
   gp.preemptStop = false
   gp.paniconfault = false
   gp._defer = nil // should be true already but just in case.
   gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
   gp.writebuf = nil
   gp.waitreason = 0
   gp.param = nil
   gp.labels = nil
   gp.timer = nil

   if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
      // Flush assist credit to the global pool. This gives
      // better information to pacing if the application is
      // rapidly creating and exiting goroutines.
      assistWorkPerByte := gcController.assistWorkPerByte.Load()
      scanCredit := int64(assistWorkPerByte * float64(gp.gcAssistBytes))
      atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
      gp.gcAssistBytes = 0
   }

   dropg() // detach the current M from the exiting g

   if GOARCH == "wasm" { // no threads yet on wasm
      gfput(_p_, gp)
      schedule() // never returns
   }

   if _g_.m.lockedInt != 0 {
      print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
      throw("internal lockOSThread error")
   }
   gfput(_p_, gp) // put the dead g on the P's gfree list for reuse
   if locked {
      // The goroutine may have locked this thread because
      // it put it in an unusual kernel state. Kill it
      // rather than returning it to the thread pool.

      // Return to mstart, which will release the P and exit
      // the thread.
      if GOOS != "plan9" { // See golang.org/issue/22227.
         gogo(&_g_.m.g0.sched)
      } else {
         // Clear lockedExt on plan9 since we may end up re-using
         // this thread.
         _g_.m.lockedExt = 0
      }
   }
   schedule()  // start the next scheduling round
}

Scheduling Strategies

Thread reuse: avoid frequently creating and destroying threads; reuse them instead (via work stealing and hand off, described below).

Preemptive scheduling

The monitor thread runs sysmon(), which watches the running goroutines and Ps. If a goroutine has occupied the CPU for too long, or a P is stuck in a system call, sysmon() forces a preemption so that other goroutines get a chance to run.

Unlike an OS scheduler, which slices threads by time, Go has no notion of a time slice. So if a goroutine makes no system calls, does no I/O, and never blocks on a channel operation, how does the M stop it and schedule the next runnable goroutine? The answer: Gs are scheduled preemptively. As noted above, sysmon "issues a preemption request to any G that has been running too long", and retake() carries it out:

func sysmon() {
    //........
    // retake P's blocked in syscalls
    // and preempt long running G's
    if retake(now) != 0 {
      idle = 0
    } else {
      idle++
    }
  }
}

const forcePreemptNS = 10 * 1000 * 1000 // 10ms

func retake(now int64) uint32 {
    // .......
    pd := &_p_.sysmontick
    s := _p_.status
    sysretake := false
    if s == _Prunning || s == _Psyscall {
      // Preempt G if it's running for too long.
      t := int64(_p_.schedtick)
      if int64(pd.schedtick) != t {
        pd.schedtick = uint32(t)
        pd.schedwhen = now
        // If this g has kept running past 10ms, flag it for preemption (preempt = true)
      } else if pd.schedwhen+forcePreemptNS <= now {
        preemptone(_p_)
        // In case of syscall, preemptone() doesn't
        // work, because there is no M wired to P.
        sysretake = true
      }
    }
      
  // .......
}
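A minimal sketch of this preemption in action (it assumes Go 1.14+ asynchronous preemption; on older releases the busy loop below would starve main forever on a single P):

package main

import (
    "fmt"
    "runtime"
    "time"
)

func main() {
    runtime.GOMAXPROCS(1) // one P: without preemption the busy loop would monopolize it

    go func() {
        for { // tight loop: no syscalls, no I/O, no channel operations
        }
    }()

    time.Sleep(100 * time.Millisecond)
    fmt.Println("main still runs: sysmon asked the busy goroutine to be preempted")
}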

Scheduling around blocking

Channel or network I/O

If a goroutine blocks on a channel operation or on network I/O, it is placed on a wait queue and the M moves on to the next runnable goroutine. If there is no runnable goroutine for the M, the M unbinds from its P and goes to sleep. When the I/O becomes ready or the channel operation completes, the waiting goroutine is woken, marked runnable, placed on some P's queue, and bound to an M to continue running (a small sketch follows).
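A small illustration of this behavior (the sleep durations are arbitrary, chosen only to make the ordering deterministic): the receiver parks in the channel's wait queue rather than holding its M, so other goroutines keep running, and the send makes it runnable again:

package main

import (
    "fmt"
    "time"
)

func main() {
    ch := make(chan int)

    go func() {
        v := <-ch // blocks: this goroutine parks in ch's wait queue, freeing the M
        fmt.Println("woken with", v)
    }()

    go func() {
        fmt.Println("the M keeps running other goroutines meanwhile")
    }()

    time.Sleep(10 * time.Millisecond)
    ch <- 42 // marks the waiting goroutine runnable; it is queued on a P again
    time.Sleep(10 * time.Millisecond)
}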

System call blocking

If a goroutine blocks in a system call, not only does the goroutine block: the M executing it also unbinds from its P (in practice sysmon takes the P away) and goes to sleep together with the goroutine. If there is an idle M, the P binds to it and continues running other goroutines; if there is no idle M but there are still goroutines to run, a new M is created.

When the goroutine completes its syscall, it tries to acquire an available P. If none is available, the goroutine is marked runnable and queued (on the global run queue), and the formerly sleeping M goes back to sleep. The sketch below tries to make the extra threads visible.
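A Linux-only sketch of this effect. syscall.Nanosleep is used because it blocks the thread in a real syscall instead of going through the netpoller; the goroutine count and sleep durations are arbitrary. The threadcreate profile shows that the runtime created extra Ms while the original ones were stuck in syscalls:

package main

import (
    "fmt"
    "runtime/pprof"
    "syscall"
    "time"
)

func main() {
    before := pprof.Lookup("threadcreate").Count()

    for i := 0; i < 20; i++ {
        go func() {
            ts := syscall.Timespec{Sec: 1}
            syscall.Nanosleep(&ts, nil) // blocking syscall: the M sleeps together with the g
        }()
    }

    time.Sleep(500 * time.Millisecond) // give sysmon time to retake the Ps and start new Ms
    after := pprof.Lookup("threadcreate").Count()
    fmt.Printf("OS threads created: %d -> %d\n", before, after)
}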

work stealing

When an M has no runnable goroutine, it tries to steal half of the goroutines from a P bound to another M, rather than destroying the thread.

hand off

When an M blocks, it releases its bound P (M and P detach), and the P is handed off to another idle thread so its goroutines keep running.
