Environment
Architecture | Go Version
---|---
amd64 | Go 1.18
Scheduling Flow
In Go, the core of the scheduling loop is the schedule() function, which belongs to the runtime. Its main task is to pick runnable goroutines and manage their execution. The overall workflow around schedule() is as follows:
- Create a goroutine with the go keyword, written as go func(){}() (see the sketch after this list)
- Put the new goroutine on the P's local run queue (if the local queue of the P bound to the current M is full, it is pushed onto the global queue instead)
- Wake up or create an M to run the work (if the P's local queue has no runnable goroutine, findrunnable() performs the work-stealing mechanism: it randomly picks another M's P and steals half of that P's local queue; if that still yields nothing, it takes a batch of goroutines from the global queue)
- Enter the scheduling loop (the M runs the goroutine; when it finishes, the M fetches the next goroutine from the P, and so on)
- Clean up and re-enter the scheduling loop
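Everything in the list above is driven by a plain go statement. A minimal runnable sketch (which M and P each goroutine lands on is decided entirely by the runtime):

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// GOMAXPROCS(0) just reads the current value: the number of Ps.
	fmt.Println("Ps (GOMAXPROCS):", runtime.GOMAXPROCS(0))

	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		// Each go statement creates a G and puts it on the current P's
		// local run queue (or the global queue if the local queue is full).
		go func(id int) {
			defer wg.Done()
			fmt.Println("goroutine", id, "running")
		}(i)
	}
	wg.Wait() // main parks here until every goroutine has finished
}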
schedule
The scheduling-loop function; it never returns.
// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
_g_ := getg()
if _g_.m.locks != 0 {
throw("schedule: holding locks")
}
if _g_.m.lockedg != 0 {
stoplockedm()
execute(_g_.m.lockedg.ptr(), false) // Never returns.
}
// We should not schedule away from a g that is executing a cgo call,
// since the cgo call is using the m's g0 stack.
if _g_.m.incgo {
throw("schedule: in cgo")
}
top:
pp := _g_.m.p.ptr()
pp.preempt = false
if sched.gcwaiting != 0 {
gcstopm()
goto top
}
if pp.runSafePointFn != 0 {
runSafePointFn()
}
// Sanity check: if we are spinning, the run queue should be empty.
// Check this before calling checkTimers, as that might call
// goready to put a ready goroutine on the local run queue.
if _g_.m.spinning && (pp.runnext != 0 || pp.runqhead != pp.runqtail) {
throw("schedule: spinning with local work")
}
checkTimers(pp, 0)
var gp *g
var inheritTime bool
// Normal goroutines will check for need to wakeP in ready,
// but GCworkers and tracereaders will not, so the check must
// be done here instead.
tryWakeP := false
if trace.enabled || trace.shutdown {
gp = traceReader()
if gp != nil {
casgstatus(gp, _Gwaiting, _Grunnable)
traceGoUnpark(gp, 0)
tryWakeP = true
}
}
if gp == nil && gcBlackenEnabled != 0 {
gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
if gp != nil {
tryWakeP = true
}
}
if gp == nil {
// Check the global runnable queue once in a while to ensure fairness.
// Otherwise two goroutines can completely occupy the local runqueue
// by constantly respawning each other.
// For fairness, don't always take from the P's local queue: when schedtick%61 == 0 (every 61st scheduling round), check the global queue first.
if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
lock(&sched.lock)
gp = globrunqget(_g_.m.p.ptr(), 1)
unlock(&sched.lock)
}
}
// Get a g from the local run queue of the P bound to the current M.
if gp == nil {
gp, inheritTime = runqget(_g_.m.p.ptr())
// We can see gp != nil here even if the M is spinning,
// if checkTimers added a local goroutine via goready.
}
// If the local queue had no runnable g, keep searching:
// Tries to steal from other P's, get g from local or global queue, poll network.
if gp == nil {
gp, inheritTime = findrunnable() // blocks until work is available
}
// This thread is going to run a goroutine and is not spinning anymore,
// so if it was marked as spinning we need to reset it now and potentially
// start a new spinning M.
if _g_.m.spinning {
resetspinning()
}
if sched.disable.user && !schedEnabled(gp) {
// Scheduling of this goroutine is disabled. Put it on
// the list of pending runnable goroutines for when we
// re-enable user scheduling and look again.
lock(&sched.lock)
if schedEnabled(gp) {
// Something re-enabled scheduling while we
// were acquiring the lock.
unlock(&sched.lock)
} else {
sched.disable.runnable.pushBack(gp)
sched.disable.n++
unlock(&sched.lock)
goto top
}
}
// If about to schedule a not-normal goroutine (a GCworker or tracereader),
// wake a P if there is one.
if tryWakeP {
wakep()
}
if gp.lockedm != 0 {
// Hands off own p to the locked m,
// then blocks waiting for a new p.
startlockedm(gp)
goto top
}
// Run the g we found.
execute(gp, inheritTime)
}
findrunnable
Finds a runnable goroutine, trying each source in order:
- the local run queue of the current P
- the global run queue
- netpoll, for goroutines whose I/O has become ready (non-blocking poll)
- work stealing from other Ps
- if there is nothing to schedule but the GC mark phase is active, an idle-time background mark worker g
- the global run queue, once more
- if the current M was spinning, a blocking netpoll wait for the next event
If no runnable goroutine is found at all, the M is stopped; when it is woken again, the search repeats in the same order.
func findrunnable() (gp *g, inheritTime bool) {
_g_ := getg()
// The conditions here and in handoffp must agree: if
// findrunnable would return a G to run, handoffp must start
// an M.
top:
_p_ := _g_.m.p.ptr()
if sched.gcwaiting != 0 {
gcstopm()
goto top
}
if _p_.runSafePointFn != 0 {
runSafePointFn()
}
now, pollUntil, _ := checkTimers(_p_, 0)
if fingwait && fingwake {
if gp := wakefing(); gp != nil {
ready(gp, 0, true)
}
}
if *cgo_yield != nil {
asmcgocall(*cgo_yield, nil)
}
// 1. Get a g from the local run queue.
// local runq
if gp, inheritTime := runqget(_p_); gp != nil {
return gp, inheritTime
}
// 2. Get a g from the global run queue.
// global runq
if sched.runqsize != 0 {
lock(&sched.lock)
gp := globrunqget(_p_, 0)
unlock(&sched.lock)
if gp != nil {
return gp, false
}
}
// 3. Get a g whose I/O is ready from netpoll (non-blocking).
// Poll network.
// This netpoll is only an optimization before we resort to stealing.
// We can safely skip it if there are no waiters or a thread is blocked
// in netpoll already. If there is any kind of logical race with that
// blocked thread (e.g. it has already returned from netpoll, but does
// not set lastpoll yet), this thread will do blocking netpoll below
// anyway.
if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
if list := netpoll(0); !list.empty() { // non-blocking
gp := list.pop()
injectglist(&list)
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
}
// 4. Steal g's from other Ps.
// Spinning Ms: steal work from other Ps.
//
// Limit the number of spinning Ms to half the number of busy Ps.
// This is necessary to prevent excessive CPU consumption when
// GOMAXPROCS>>1 but the program parallelism is low.
procs := uint32(gomaxprocs)
if _g_.m.spinning || 2*atomic.Load(&sched.nmspinning) < procs-atomic.Load(&sched.npidle) {
if !_g_.m.spinning {
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
}
gp, inheritTime, tnow, w, newWork := stealWork(now)
now = tnow
if gp != nil {
// Successfully stole.
return gp, inheritTime
}
if newWork {
// There may be new timer or GC work; restart to
// discover.
goto top
}
if w != 0 && (pollUntil == 0 || w < pollUntil) {
// Earlier timer to wait for.
pollUntil = w
}
}
// 5. If there is nothing to schedule but the GC mark phase is active, get an idle-time background mark worker g.
// We have nothing to do.
//
// If we're in the GC mark phase, can safely scan and blacken objects,
// and have work to do, run idle-time marking rather than give up the
// P.
if gcBlackenEnabled != 0 && gcMarkWorkAvailable(_p_) {
node := (*gcBgMarkWorkerNode)(gcBgMarkWorkerPool.pop())
if node != nil {
_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
gp := node.gp.ptr()
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
}
// wasm only:
// If a callback returned and no other goroutine is awake,
// then wake event handler goroutine which pauses execution
// until a callback was triggered.
gp, otherReady := beforeIdle(now, pollUntil)
if gp != nil {
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
if otherReady {
goto top
}
// Before we drop our P, make a snapshot of the allp slice,
// which can change underfoot once we no longer block
// safe-points. We don't need to snapshot the contents because
// everything up to cap(allp) is immutable.
allpSnapshot := allp
// Also snapshot masks. Value changes are OK, but we can't allow
// len to change out from under us.
idlepMaskSnapshot := idlepMask
timerpMaskSnapshot := timerpMask
// return P and block
lock(&sched.lock)
if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
unlock(&sched.lock)
goto top
}
// 6. Check the global run queue once more.
if sched.runqsize != 0 {
gp := globrunqget(_p_, 0)
unlock(&sched.lock)
return gp, false
}
if releasep() != _p_ {
throw("findrunnable: wrong p")
}
// Put the P on the idle list.
pidleput(_p_)
unlock(&sched.lock)
// Delicate dance: thread transitions from spinning to non-spinning
// state, potentially concurrently with submission of new work. We must
// drop nmspinning first and then check all sources again (with
// #StoreLoad memory barrier in between). If we do it the other way
// around, another thread can submit work after we've checked all
// sources but before we drop nmspinning; as a result nobody will
// unpark a thread to run the work.
//
// This applies to the following sources of work:
//
// * Goroutines added to a per-P run queue.
// * New/modified-earlier timers on a per-P timer heap.
// * Idle-priority GC work (barring golang.org/issue/19112).
//
// If we discover new work below, we need to restore m.spinning as a signal
// for resetspinning to unpark a new worker thread (because there can be more
// than one starving goroutine). However, if after discovering new work
// we also observe no idle Ps it is OK to skip unparking a new worker
// thread: the system is fully loaded so no spinning threads are required.
// Also see "Worker thread parking/unparking" comment at the top of the file.
// 7. If the current M was spinning, recheck all sources, then block in netpoll waiting for events.
wasSpinning := _g_.m.spinning
if _g_.m.spinning {
_g_.m.spinning = false
if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
throw("findrunnable: negative nmspinning")
}
// Note the for correctness, only the last M transitioning from
// spinning to non-spinning must perform these rechecks to
// ensure no missed work. We are performing it on every M that
// transitions as a conservative change to monitor effects on
// latency. See golang.org/issue/43997.
// Check all runqueues once again.
_p_ = checkRunqsNoP(allpSnapshot, idlepMaskSnapshot)
if _p_ != nil {
acquirep(_p_)
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
goto top
}
// Check for idle-priority GC work again.
_p_, gp = checkIdleGCNoP()
if _p_ != nil {
acquirep(_p_)
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
// Run the idle worker.
_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
// Finally, check for timer creation or expiry concurrently with
// transitioning from spinning to non-spinning.
//
// Note that we cannot use checkTimers here because it calls
// adjusttimers which may need to allocate memory, and that isn't
// allowed when we don't have an active P.
pollUntil = checkTimersNoP(allpSnapshot, timerpMaskSnapshot, pollUntil)
}
// Poll network until next timer.
if netpollinited() && (atomic.Load(&netpollWaiters) > 0 || pollUntil != 0) && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
atomic.Store64(&sched.pollUntil, uint64(pollUntil))
if _g_.m.p != 0 {
throw("findrunnable: netpoll with p")
}
if _g_.m.spinning {
throw("findrunnable: netpoll with spinning")
}
delay := int64(-1)
if pollUntil != 0 {
if now == 0 {
now = nanotime()
}
delay = pollUntil - now
if delay < 0 {
delay = 0
}
}
if faketime != 0 {
// When using fake time, just poll.
delay = 0
}
list := netpoll(delay) // block until new work is available
atomic.Store64(&sched.pollUntil, 0)
atomic.Store64(&sched.lastpoll, uint64(nanotime()))
if faketime != 0 && list.empty() {
// Using fake time and nothing is ready; stop M.
// When all M's stop, checkdead will call timejump.
stopm()
goto top
}
lock(&sched.lock)
_p_ = pidleget()
unlock(&sched.lock)
if _p_ == nil {
injectglist(&list)
} else {
acquirep(_p_)
if !list.empty() {
gp := list.pop()
injectglist(&list)
casgstatus(gp, _Gwaiting, _Grunnable)
if trace.enabled {
traceGoUnpark(gp, 0)
}
return gp, false
}
if wasSpinning {
_g_.m.spinning = true
atomic.Xadd(&sched.nmspinning, 1)
}
goto top
}
} else if pollUntil != 0 && netpollinited() {
pollerPollUntil := int64(atomic.Load64(&sched.pollUntil))
if pollerPollUntil == 0 || pollerPollUntil > pollUntil {
netpollBreak()
}
}
// Stop the M.
stopm()
goto top
}
execute
Once a runnable goroutine has been obtained, execute() runs it:
func execute(gp *g, inheritTime bool) {
_g_ := getg()
// Assign gp.m before entering _Grunning so running Gs have an
// M.
_g_.m.curg = gp
gp.m = _g_.m
casgstatus(gp, _Grunnable, _Grunning)
gp.waitsince = 0
gp.preempt = false
gp.stackguard0 = gp.stack.lo + _StackGuard
if !inheritTime {
_g_.m.p.ptr().schedtick++
}
// Check whether the profiler needs to be turned on or off.
hz := sched.profilehz
if _g_.m.profilehz != hz {
setThreadCPUProfiler(hz)
}
if trace.enabled {
// GoSysExit has to happen when we have a P, but before GoStart.
// So we emit it here.
if gp.syscallsp != 0 && gp.sysblocktraced {
traceGoSysExit(gp.sysexitticks)
}
traceGoStart()
}
gogo(&gp.sched)
}
runtime·gogo
gogo is implemented in assembly. It switches from the g0 stack to the target g's own stack and then jumps into the goroutine's code.
gobuf is the data structure that stores a goroutine's execution context.
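For reference, here is a trimmed copy of the gobuf declaration from runtime/runtime2.go (as of roughly Go 1.17/1.18); the gobuf_* symbols in the assembly below are offsets into this struct:

type gobuf struct {
	sp   uintptr        // stack pointer to restore
	pc   uintptr        // program counter to resume at
	g    guintptr       // the g this context belongs to
	ctxt unsafe.Pointer // closure context pointer
	ret  uintptr        // saved return value
	lr   uintptr        // link register (unused on amd64)
	bp   uintptr        // frame pointer, for framepointer-enabled architectures
}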
TEXT runtime·gogo(SB), NOSPLIT, $0-8
MOVQ buf+0(FP), BX // FP points at the first argument, so buf+0(FP) is the *gobuf
MOVQ gobuf_g(BX), DX // load gobuf.g into DX
MOVQ 0(DX), CX // touch the g: make sure it is not nil
JMP gogo<>(SB) // jump to the private gogo<> body
TEXT gogo<>(SB), NOSPLIT, $0
get_tls(CX) // load the TLS base address into CX
MOVQ DX, g(CX) // set the g register: store gobuf.g into TLS
MOVQ DX, R14 // also keep g in R14 (the g register on amd64)
MOVQ gobuf_sp(BX), SP // restore SP from gobuf.sp: switch to the g's stack
MOVQ gobuf_ret(BX), AX // restore the saved return value
MOVQ gobuf_ctxt(BX), DX // restore the closure context pointer
MOVQ gobuf_bp(BX), BP // restore the frame pointer
MOVQ $0, gobuf_sp(BX) // clear gobuf.sp to help the GC
MOVQ $0, gobuf_ret(BX) // clear gobuf.ret
MOVQ $0, gobuf_ctxt(BX) // clear gobuf.ctxt
MOVQ $0, gobuf_bp(BX) // clear gobuf.bp
MOVQ gobuf_pc(BX), BX // load gobuf.pc into BX: the address to resume the g at
JMP BX // jump there: execution continues in the goroutine's code
runtime·goexit
goexit is planted as the return address at the bottom of every goroutine's stack, so when the goroutine's entry function returns, execution falls into goexit, which tears the goroutine down:
TEXT runtime·goexit(SB),NOSPLIT|TOPFRAME,$0-0
BYTE $0x90 // NOP
CALL runtime·goexit1(SB) // does not return
// traceback from goexit1 must hit code range of goexit
BYTE $0x90 // NOP
func goexit1() {
if raceenabled {
racegoend()
}
if trace.enabled {
traceGoEnd()
}
mcall(goexit0) // switch to g0 and run goexit0
}
// goexit continuation on g0.
// gp is the goroutine that is exiting.
func goexit0(gp *g) {
_g_ := getg()
_p_ := _g_.m.p.ptr()
casgstatus(gp, _Grunning, _Gdead) // mark gp as _Gdead
gcController.addScannableStack(_p_, -int64(gp.stack.hi-gp.stack.lo)) // subtract the exiting goroutine's stack from the scannable-stack total
if isSystemGoroutine(gp, false) {
atomic.Xadd(&sched.ngsys, -1)
}
gp.m = nil // clear the exiting goroutine's context, field by field
locked := gp.lockedm != 0
gp.lockedm = 0
_g_.m.lockedg = 0
gp.preemptStop = false
gp.paniconfault = false
gp._defer = nil // should be true already but just in case.
gp._panic = nil // non-nil for Goexit during panic. points at stack-allocated data.
gp.writebuf = nil
gp.waitreason = 0
gp.param = nil
gp.labels = nil
gp.timer = nil
if gcBlackenEnabled != 0 && gp.gcAssistBytes > 0 {
// Flush assist credit to the global pool. This gives
// better information to pacing if the application is
// rapidly creating an exiting goroutines.
assistWorkPerByte := gcController.assistWorkPerByte.Load()
scanCredit := int64(assistWorkPerByte * float64(gp.gcAssistBytes))
atomic.Xaddint64(&gcController.bgScanCredit, scanCredit)
gp.gcAssistBytes = 0
}
dropg() // detach the current M from gp (the exiting goroutine)
if GOARCH == "wasm" { // no threads yet on wasm
gfput(_p_, gp)
schedule() // never returns
}
if _g_.m.lockedInt != 0 {
print("invalid m->lockedInt = ", _g_.m.lockedInt, "\n")
throw("internal lockOSThread error")
}
gfput(_p_, gp) // put the g object on the P's gfree list so it can be reused
if locked {
// The goroutine may have locked this thread because
// it put it in an unusual kernel state. Kill it
// rather than returning it to the thread pool.
// Return to mstart, which will release the P and exit
// the thread.
if GOOS != "plan9" { // See golang.org/issue/22227.
gogo(&_g_.m.g0.sched)
} else {
// Clear lockedExt on plan9 since we may end up re-using
// this thread.
_g_.m.lockedExt = 0
}
}
schedule() // start the next scheduling round
}
Scheduling Strategies
Thread reuse: instead of constantly creating and destroying threads, the scheduler reuses them (this is what work stealing and hand off, described below, achieve).
Preemptive scheduling
The monitor thread
The sysmon() function runs on a dedicated M, without a P. It keeps watch over the scheduler: it retakes Ps that have been blocked in system calls, and if a goroutine has been hogging its P for too long, it forces the goroutine to be preempted so that other goroutines get a chance to run.
Unlike an OS scheduler, which preempts threads on time slices, Go has no time-slice concept of its own. If a goroutine makes no system calls, performs no I/O, and never blocks on a channel operation, how does the M make it stop so the next runnable goroutine can run? The answer: the G is preemptively scheduled. As noted, sysmon "issues preemption requests to long-running G tasks", and retake is what carries this out:
func sysmon() {
	// ........
	for {
		// ........
		// retake P's blocked in syscalls
		// and preempt long running G's
		if retake(now) != 0 {
			idle = 0
		} else {
			idle++
		}
		// ........
	}
}
const forcePreemptNS = 10 * 1000 * 1000 // 10ms
func retake(now int64) uint32 {
// .......
pd := &_p_.sysmontick
s := _p_.status
sysretake := false
if s == _Prunning || s == _Psyscall {
// Preempt G if it's running for too long.
t := int64(_p_.schedtick)
if int64(pd.schedtick) != t {
pd.schedtick = uint32(t)
pd.schedwhen = now
// If the G has kept running on this P for more than 10ms, request preemption: preemptone sets the G's preempt flag.
} else if pd.schedwhen+forcePreemptNS <= now {
preemptone(_p_)
// In case of syscall, preemptone() doesn't
// work, because there is no M wired to P.
sysretake = true
}
}
// .......
}
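A quick way to see retake's effect. A small runnable demo, assuming Go 1.14 or later (where the preemption request is also delivered as an asynchronous signal, so even a tight loop with no function calls can be interrupted):

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	runtime.GOMAXPROCS(1) // a single P, so the busy goroutine could starve main

	go func() {
		for { // tight loop: no function calls, no cooperative preemption points
		}
	}()

	time.Sleep(200 * time.Millisecond)
	// On Go 1.14+ this line is reached: sysmon saw the goroutine running for
	// more than 10ms and preempted it. On Go <= 1.13 this program hangs.
	fmt.Println("main was scheduled: the busy loop was preempted")
}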
Scheduling around blocking
Channel or network I/O
If a goroutine blocks on a channel operation or on network I/O, it is placed on the corresponding wait queue, and the M moves on to run the next runnable goroutine. If there is no runnable goroutine for the M at that point, the M unbinds from its P and goes to sleep. When the I/O becomes ready or the channel operation completes, the waiting goroutine is woken, marked runnable, put on some P's run queue, and bound to an M to continue executing.
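A minimal sketch of that park/wake cycle (the runtime mechanics are invisible to the program; the receive below parks the goroutine on the channel's wait queue until the send makes it runnable again):

package main

import (
	"fmt"
	"time"
)

func main() {
	ch := make(chan int)

	go func() {
		// This receive blocks: the goroutine is parked on ch's wait queue,
		// and the M is free to run other goroutines in the meantime.
		v := <-ch
		fmt.Println("woken up with", v)
	}()

	time.Sleep(10 * time.Millisecond)
	ch <- 42 // marks the waiter runnable and puts it on a P's run queue
	time.Sleep(10 * time.Millisecond)
}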
Blocking system calls
If a goroutine blocks in a system call, not only does the goroutine block: the M running it also detaches from its P (in practice the P is retaken by sysmon) and sleeps together with the goroutine. If there is an idle M, the P binds to it and continues running other goroutines; if there is no idle M but there are still goroutines waiting to run, a new M is created.
When a goroutine blocked in a syscall completes the call, it tries to acquire an available P; if none is available, the goroutine is marked runnable and queued (on the global queue), and the M that was sleeping goes back to sleep.
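One rough, Unix-only way to watch this happen. A sketch, assuming Linux/Unix semantics: raw pipe fds from syscall.Pipe are not registered with the netpoller, so syscall.Read genuinely blocks the OS thread, unlike reads through os.File:

package main

import (
	"fmt"
	"runtime/pprof"
	"syscall"
	"time"
)

func main() {
	before := pprof.Lookup("threadcreate").Count()

	for i := 0; i < 8; i++ {
		var fds [2]int
		syscall.Pipe(fds[:]) // raw fds, invisible to the netpoller
		go func(fd int) {
			buf := make([]byte, 1)
			syscall.Read(fd, buf) // blocks this goroutine AND its M in read(2)
		}(fds[0])
	}

	time.Sleep(100 * time.Millisecond)
	after := pprof.Lookup("threadcreate").Count()
	fmt.Printf("OS threads: %d -> %d (new Ms replaced the blocked ones)\n", before, after)
}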
work stealing
When an M has no runnable goroutine, it tries to steal half of the goroutines from the P bound to another M, rather than destroying the thread.
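Work stealing is easiest to observe with the scheduler trace. A sketch (run with GODEBUG=schedtrace=1000,scheddetail=1; the per-P run-queue lengths it prints show the goroutines spreading from the spawning P to the others):

package main

import "time"

func main() {
	// Spawn CPU-bound goroutines from a single goroutine: they all start on
	// the creating P's local run queue, and idle Ps steal half of that queue.
	for i := 0; i < 64; i++ {
		go func() {
			end := time.Now().Add(time.Second)
			for time.Now().Before(end) {
				// busy loop
			}
		}()
	}
	time.Sleep(2 * time.Second)
}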
hand off
When an M blocks, it releases its bound P (the M and P detach) and hands the P over to another idle thread so that goroutines keep running.