schedule()
// One round of scheduler: find a runnable goroutine and execute it. // Never returns.
findRunnable :
->checkTimers(pp, 0) // runs any timers for the P that are ready.
->if trace enabled, g = traceReader
->nil && gcBlackenEnabled, g = findRunnableGCWorker // gc mark 阶段,findRunnableGCWorker returns a background mark worker for pp
->nil && 每进行61次调度就需要优先从全局运行队列中获取goroutine出来运行,因为如果只调度本地运行队列中的goroutine,则全局运行队列中的goroutine有可能得不到运行
->nil && Get g from local runnable queue.
->nil && Get g from global runq
->nil && Poll network //netpoll checks for ready network connections. Returns list of goroutines that become runnable.
->nil && steal work from other Ps
schedule() 调用时机
- The use of the keyword
go - Garbage collection
- System calls
- Synchronization and Orchestration
- atomic, mutex, or channel operation
1736 mstart1
4036 park_m // gopark
4065 goschedimpl
4141 preemptPark
4162 goyield_m
4181 goexit0
4756 exitsyscall // The goroutine g exited its system call. Arrange for it to run on a cpu again.
Never returns 的含义
在一个复杂的程序中,调度可能会进行无数次循环,也就是说会进行无数次没有返回的函数调用,大家都知道,每调用一次函数都会消耗一定的栈空间,而如果一直这样无返回的调用下去无论 g0 有多少栈空间终究是会耗尽的,那么这里是不是有问题?其实没有问题!关键点就在于,每次执行 mcall 切换到 g0 栈时都是切换到 g0.sched.sp 所指的固定位置,这之所以行得通,正是因为从 schedule 函数开始之后的一系列函数永远都不会返回,所以重用这些函数上一轮调度时所使用过的栈内存是没有问题的。
我再解释一下:栈空间在调用函数时会自动“增大”,而函数返回时,会自动“减小”,这里的增大和减小是指栈顶指针 SP 的变化。上述这些函数都没有返回,说明调用者不需要用到被调用者的返回值,有点像“尾递归”。
因为 g0 一直没有动过,所有它之前保存的 sp 还能继续使用。每一次调度循环都会覆盖上一次调度循环的栈数据,完美!
go statement 的实现(newproc)
// Create a new g running fn.
// Put it on the queue of g's waiting to run.
// The compiler turns a go statement into a call to this.
func newproc(fn *funcval) {
gp := getg()
pc := getcallerpc()
systemstack(func() {
newg := newproc1(fn, gp, pc)
_p_ := getg().m.p.ptr()
runqput(_p_, newg, true)
if mainStarted {
wakep()
}
})
}
// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the _p_.runnext slot.
// If the run queue is full, runnext puts g on the global queue.
// Executed only by the owner P.
// Tries to add one more P to execute G's.
// Called when a G is made runnable (newproc, ready).
func wakep() {
if atomic.Load(&sched.npidle) == 0 {
return
}
// be conservative about spinning threads
if atomic.Load(&sched.nmspinning) != 0 || !atomic.Cas(&sched.nmspinning, 0, 1) {
return
}
startm(nil, true)
}
// Schedules some M to run the p (creates an M if necessary).
func startm(_p_ *p, spinning bool)
goroutine 的抢占
cloud.tencent.com/developer/a…
If a goroutine is taking more than 10ms, Go will trigger the preemption of it (in sysmon).
At startup, the runtime creates a System Monitor (sysmon) thread. sysmon would preempt long running G's.
preemptone() 函数会沿着下面这条路径:
preemptone->preemptM->signalM->tgkill
每个 M 在初始化的时候都会设置信号处理函数:
initsig->setsig->sighandler
当线程收到 SIGURG 信号的时候,就会去执行 sighandler 函数,核心是 doSigPreempt 函数。
最精彩的部分就在 goschedImpl 函数。它首先将 goroutine 的状态从 running 改成 runnable;接着调 dropg 将 g 和 m 解绑;然后调用 globrunqput 将 goroutine 丢到全局可运行队列,由于是全局可运行队列,所以需要加锁。最后,调用 schedule() 函数进入调度循环。
// Tell the goroutine running on processor P to stop.
// This function is purely best-effort. It can incorrectly fail to inform the
// goroutine. It can inform the wrong goroutine. Even if it informs the
// correct goroutine, that goroutine might ignore the request if it is
// simultaneously executing newstack.
// No lock needs to be held.
// Returns true if preemption request was issued.
// The actual preemption will happen at some point in the future
// and will be indicated by the gp->status no longer being
// Grunning
func preemptone(_p_ *p) bool {
mp := _p_.m.ptr()
if mp == nil || mp == getg().m {
return false
}
gp := mp.curg
if gp == nil || gp == mp.g0 {
return false
}
gp.preempt = true
// Every call in a goroutine checks for stack overflow by
// comparing the current stack pointer to gp->stackguard0.
// Setting gp->stackguard0 to StackPreempt folds
// preemption into the normal stack overflow check.
gp.stackguard0 = stackPreempt
// Request an async preemption of this P.
if preemptMSupported && debug.asyncpreemptoff == 0 {
_p_.preempt = true
preemptM(mp)
}
return true
}
// If multiple threads are preempting the same M, it may send many
// signals to the same M such that it hardly make progress, causing
// live-lock problem. Apparently this could happen on darwin. See
// issue #37741.
// Only send a signal if there isn't already one pending.
signalM(mp, sigPreempt)
func signalM(mp *m, sig int) {
pthread_kill(pthread(mp.procid), uint32(sig))
}
schedt
type schedt struct {
// accessed atomically. keep at top to ensure alignment on 32-bit systems.
goidgen uint64
lastpoll uint64 // time of last network poll, 0 if currently polling
pollUntil uint64 // time to which current poll is sleeping
lock mutex
// When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
// sure to call checkdead().
midle muintptr // idle m's waiting for work
nmidle int32 // number of idle m's waiting for work
nmidlelocked int32 // number of locked m's waiting for work
mnext int64 // number of m's that have been created and next M ID
maxmcount int32 // maximum number of m's allowed (or die)
nmsys int32 // number of system m's not counted for deadlock
nmfreed int64 // cumulative number of freed m's
ngsys uint32 // number of system goroutines; updated atomically
pidle puintptr // idle p's
npidle uint32
nmspinning uint32 // See "Worker thread parking/unparking" comment in proc.go.
// Global runnable queue.
runq gQueue
runqsize int32
// disable controls selective disabling of the scheduler.
//
// Use schedEnableUser to control this.
//
// disable is protected by sched.lock.
disable struct {
// user disables scheduling of user goroutines.
user bool
runnable gQueue // pending runnable Gs
n int32 // length of runnable
}
// Global cache of dead G's.
gFree struct {
lock mutex
stack gList // Gs with stacks
noStack gList // Gs without stacks
n int32
}
// Central cache of sudog structs.
sudoglock mutex
sudogcache *sudog
// Central pool of available defer structs.
deferlock mutex
deferpool *_defer
// freem is the list of m's waiting to be freed when their
// m.exited is set. Linked through m.freelink.
freem *m
gcwaiting uint32 // gc is waiting to run
stopwait int32
stopnote note
sysmonwait uint32
sysmonnote note
// safepointFn should be called on each P at the next GC
// safepoint if p.runSafePointFn is set.
safePointFn func(*p)
safePointWait int32
safePointNote note
profilehz int32 // cpu profiling rate
procresizetime int64 // nanotime() of last change to gomaxprocs
totaltime int64 // ∫gomaxprocs dt up to procresizetime
// sysmonlock protects sysmon's actions on the runtime.
//
// Acquire and hold this mutex to block sysmon from interacting
// with the rest of the runtime.
sysmonlock mutex
// timeToRun is a distribution of scheduling latencies, defined
// as the sum of time a G spends in the _Grunnable state before
// it transitions to _Grunning.
//
// timeToRun is protected by sched.lock.
timeToRun timeHistogram
}