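0. Introduction
In the previous chapters we covered how the Go scheduler works. We mentioned that while the main goroutine is being set up, the runtime starts a thread that sits outside the scheduler and runs sysmon, also known as the system monitor. Let's look at what this task actually does.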
1. Sleep strategy
func sysmon() {
	...
	for {
		if idle == 0 { // start with 20us sleep...
			delay = 20
		} else if idle > 50 { // start doubling the sleep after 1ms...
			delay *= 2
		}
		if delay > 10*1000 { // up to 10ms
			delay = 10 * 1000
		}
		usleep(delay) // sleep for delay microseconds
		...
	}
}
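From the code above we can see that every iteration of the loop begins with a sleep, and the length of that sleep follows a few rules:
- The sleep is at least 20us.
- Once more than 50 consecutive iterations have passed without retake reclaiming or preempting anything (about 1ms at 20us per iteration), the system is considered fairly idle and the sleep doubles on every iteration.
- The sleep is capped at 10ms and stays there until idle is reset to 0, which happens when retake succeeds or when sysmon is woken from a deep sleep by a P leaving a syscall.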
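The back-off rules above can be reproduced outside the runtime. The following is a minimal sketch, not the runtime's code: nextDelay is a hypothetical helper that applies the same three branches to an idle counter and the previous delay, so the progression can be printed and checked.

```go
package main

import "fmt"

// nextDelay applies the same branches as the top of the sysmon loop.
// idle is the number of consecutive iterations in which retake did nothing;
// delay is the previous sleep in microseconds.
// (Hypothetical helper for illustration; it is not part of the runtime.)
func nextDelay(idle int, delay uint32) uint32 {
	if idle == 0 {
		delay = 20 // start with a 20us sleep
	} else if idle > 50 {
		delay *= 2 // after about 1ms of idleness, start doubling
	}
	if delay > 10*1000 {
		delay = 10 * 1000 // never sleep longer than 10ms
	}
	return delay
}

func main() {
	delay := uint32(0)
	for idle := 0; idle <= 60; idle++ {
		delay = nextDelay(idle, delay)
		if idle == 0 || idle >= 50 {
			fmt.Printf("idle=%d delay=%dus\n", idle, delay)
		}
	}
}
```

Starting from 20us, nine doublings give 10240us, so under these rules the 10ms cap is reached on the iteration where idle is 59.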
2. Running timers
func sysmon() {
	...
	for {
		...
		// If the world is stopped (STW), go to sleep for a while.
		// sysmon should not enter deep sleep if schedtrace is enabled so that
		// it can print that information at the right time.
		//
		// It should also not enter deep sleep if there are any active P's so
		// that it can retake P's from syscalls, preempt long running G's, and
		// poll the network if all P's are busy for long stretches.
		//
		// It should wakeup from deep sleep if any P's become active either due
		// to exiting a syscall or waking up due to a timer expiring so that it
		// can resume performing those duties. If it wakes from a syscall it
		// resets idle and delay as a bet that since it had retaken a P from a
		// syscall before, it may need to do it again shortly after the
		// application starts work again. It does not reset idle when waking
		// from a timer to avoid adding system load to applications that spend
		// most of their time sleeping.
		now := nanotime()
		if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
			lock(&sched.lock)
			if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
				syscallWake := false
				next, _ := timeSleepUntil()
				if next > now {
					atomic.Store(&sched.sysmonwait, 1)
					unlock(&sched.lock)
					// Make wake-up period small enough
					// for the sampling to be correct.
					sleep := forcegcperiod / 2
					if next-now < sleep {
						sleep = next - now
					}
					shouldRelax := sleep >= osRelaxMinNS
					if shouldRelax {
						osRelax(true)
					}
					syscallWake = notetsleep(&sched.sysmonnote, sleep)
					if shouldRelax {
						osRelax(false)
					}
					lock(&sched.lock)
					atomic.Store(&sched.sysmonwait, 0)
					noteclear(&sched.sysmonnote)
				}
				if syscallWake {
					idle = 0
					delay = 20
				}
			}
			unlock(&sched.lock)
		}
		...
	}
}
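Inside the monitor loop, nanotime gives the current time and timeSleepUntil gives the time at which the next timer is due. When the scheduler is waiting for a garbage collection (sched.gcwaiting) or every P is idle, and no timer is already due (next > now), sysmon can go into a deep sleep. The sleep lasts until the next timer is due, but never longer than forcegcperiod / 2.
The sleep itself is done with notetsleep, which parks the sysmon thread on sched.sysmonnote with a timeout. idle and delay are reset only when the wake-up came from a P exiting a syscall (syscallWake is true), not when the timeout simply expires.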
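The way the deep-sleep duration is chosen can be isolated into a few lines. This is a sketch under stated assumptions: deepSleepFor is a hypothetical helper, and forceGCPeriod is a stand-in for the runtime's forcegcperiod, assumed here to be 2 minutes.

```go
package main

import (
	"fmt"
	"time"
)

// forceGCPeriod stands in for the runtime's forcegcperiod.
// The 2-minute value is an assumption made for this sketch.
const forceGCPeriod = 2 * time.Minute

// deepSleepFor returns how long sysmon would sleep given the current time and
// the time of the next timer (both in nanoseconds), and whether it sleeps at all.
// (Hypothetical helper for illustration; it is not part of the runtime.)
func deepSleepFor(now, next int64) (time.Duration, bool) {
	if next <= now {
		return 0, false // a timer is already due, so skip the deep sleep
	}
	sleep := int64(forceGCPeriod / 2) // upper bound on the sleep
	if next-now < sleep {
		sleep = next - now // wake up in time for the next timer
	}
	return time.Duration(sleep), true
}

func main() {
	now := int64(0)
	for _, next := range []int64{int64(5 * time.Second), int64(10 * time.Minute), -1} {
		d, ok := deepSleepFor(now, next)
		fmt.Printf("next=%v sleep=%v sleeps=%v\n", time.Duration(next), d, ok)
	}
}
```

Bounding the sleep by half of the forced-GC period means sysmon still wakes often enough to notice that a time-triggered collection is due, even when no timer is pending.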
3. Polling the network
func sysmon() {
	...
	for {
		...
		// poll network if not polled for more than 10ms
		// (this is the netpoll from the previous post: run it once if it has not run for 10ms)
		lastpoll := int64(atomic.Load64(&sched.lastpoll))
		if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
			atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
			list := netpoll(0) // non-blocking - returns list of goroutines
			if !list.empty() {
				// Need to decrement number of idle locked M's
				// (pretending that one more is running) before injectglist.
				// Otherwise it can lead to the following situation:
				// injectglist grabs all P's but before it starts M's to run the P's,
				// another M returns from syscall, finishes running its G,
				// observes that there is no work to do and no other running M's
				// and reports deadlock.
				incidlelocked(-1)
				injectglist(&list)
				incidlelocked(1)
			}
		}
		if GOOS == "netbsd" && needSysmonWorkaround {
			// netpoll is responsible for waiting for timer
			// expiration, so we typically don't have to worry
			// about starting an M to service timers. (Note that
			// sleep for timeSleepUntil above simply ensures sysmon
			// starts running again when that timer expiration may
			// cause Go code to run again).
			//
			// However, netbsd has a kernel bug that sometimes
			// misses netpollBreak wake-ups, which can lead to
			// unbounded delays servicing timers. If we detect this
			// overrun, then startm to get something to handle the
			// timer.
			//
			// See issue 42515 and
			// https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
			if next, _ := timeSleepUntil(); next < now {
				startm(nil, false)
			}
		}
		...
	}
}
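As covered in the previous post, findrunnable polls the network poller (epollwait on Linux). If more than 10ms have passed since the last poll, sysmon also polls the network from inside its own loop with a non-blocking netpoll(0). Goroutines whose file descriptors have pending events are then added to the global run queue for scheduling (sysmon runs without a P, so there is no local run queue to put them on).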
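The staleness check that guards the poll can be sketched with the standard sync/atomic package instead of the runtime's internal atomics. claimPoll and pollIntervalNS are hypothetical names introduced for this example. The sketch assumes, as the lastpoll != 0 test in the excerpt suggests, that a zero timestamp means another thread is currently blocked in the poller.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollIntervalNS is the 10ms threshold from the sysmon excerpt, in nanoseconds.
const pollIntervalNS = 10 * 1000 * 1000

// claimPoll reports whether the caller should run a non-blocking poll now.
// It returns false if the timestamp is 0 (assumed to mean that another thread
// is blocked in the poller) or if the last poll is less than 10ms old.
// (Hypothetical helper for illustration; it is not part of the runtime.)
func claimPoll(lastpoll *int64, now int64) bool {
	last := atomic.LoadInt64(lastpoll)
	if last == 0 || last+pollIntervalNS >= now {
		return false
	}
	// Record the new poll time. As in the excerpt, the result of the
	// compare-and-swap is ignored and the poll goes ahead either way.
	atomic.CompareAndSwapInt64(lastpoll, last, now)
	return true
}

func main() {
	lastpoll := int64(1)
	fmt.Println(claimPoll(&lastpoll, 5*1000*1000))  // 5ms later
	fmt.Println(claimPoll(&lastpoll, 20*1000*1000)) // 20ms later
	fmt.Println(atomic.LoadInt64(&lastpoll))
}
```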
4. Preempting processors
func sysmon() {
	...
	for {
		...
		// retake P's blocked in syscalls
		// and preempt long running G's
		// (retake is covered in the "Cooperation and Preemption" post)
		if retake(now) != 0 {
			idle = 0
		} else {
			idle++
		}
		...
	}
}
We analyzed the logic of retake in the post on cooperation and preemption. Here is a summary:
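- When a P is in the _Prunning or _Psyscall state and more than 10ms have passed since it last scheduled a goroutine, runtime.preemptone is called to preempt that P.
- When a P is in the _Psyscall state, runtime.handoffp is called to hand the P off if either of the following holds:
  - the P's run queue is not empty, or there are no idle P's;
  - the syscall has lasted longer than 10ms.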
By retaking and preempting processors in its loop, sysmon keeps a goroutine from occupying a thread for too long and starving the others.
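The two rules summarized above can be written as a small decision function. This is a simplified sketch, not the runtime's retake: pView, decide and the field names are hypothetical, the tick bookkeeping that retake uses to measure the 10ms is replaced by plain timestamps, and "no idle P's" is modeled as a count of idle P's plus spinning M's, which is an assumption of this example.

```go
package main

import "fmt"

// forcePreemptNS is the 10ms threshold, in nanoseconds.
const forcePreemptNS = 10 * 1000 * 1000

type pStatus int

const (
	pRunning pStatus = iota
	pSyscall
)

// pView is a hypothetical snapshot of what is inspected for one P.
type pView struct {
	status      pStatus
	schedWhen   int64 // when the current goroutine was scheduled, in ns
	syscallWhen int64 // when the syscall was first observed, in ns
	runqEmpty   bool  // whether the P's local run queue is empty
}

// decide reports whether the P should be preempted and whether it should be
// handed off. idleOrSpinning counts idle P's plus spinning M's (an assumption).
func decide(p pView, now int64, idleOrSpinning int) (preempt, handoff bool) {
	// Running and syscall P's are both preempted after 10ms.
	if p.schedWhen+forcePreemptNS <= now {
		preempt = true
	}
	if p.status != pSyscall {
		return preempt, false
	}
	// Leave the P alone only if it has no queued work, something else could
	// pick up new work, and the syscall is still shorter than 10ms.
	if p.runqEmpty && idleOrSpinning > 0 && p.syscallWhen+forcePreemptNS > now {
		return preempt, false
	}
	return preempt, true
}

func main() {
	const ms = 1000 * 1000
	busy := pView{status: pRunning}
	preempt, handoff := decide(busy, 12*ms, 1)
	fmt.Println("running for 12ms:", preempt, handoff)

	blocked := pView{status: pSyscall, runqEmpty: false}
	preempt, handoff = decide(blocked, 1*ms, 1)
	fmt.Println("syscall with queued work:", preempt, handoff)
}
```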
5. Garbage collection
func sysmon() {
	...
	for {
		...
		// check if we need to force a GC
		if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
			lock(&forcegc.lock)
			forcegc.idle = 0
			var list gList
			list.push(forcegc.g)
			injectglist(&list)
			unlock(&forcegc.lock)
		}
		if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
			lasttrace = now
			schedtrace(debug.scheddetail > 0)
		}
		unlock(&sched.sysmonlock)
	}
}
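Finally, sysmon checks whether a forced garbage collection needs to be triggered. Garbage collection itself will be covered in a later post.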
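The time-based trigger tested in the excerpt comes down to comparing the time of the last collection against a fixed period. The sketch below is an approximation of that test, not the runtime's gcTrigger.test: timeTriggered is a hypothetical helper, forceGCPeriodNS assumes a 2-minute period, and treating a zero lastGC as "no collection has finished yet" is an assumption of this example.

```go
package main

import "fmt"

// forceGCPeriodNS stands in for the runtime's forcegcperiod.
// The 2-minute value is an assumption made for this sketch.
const forceGCPeriodNS int64 = 2 * 60 * 1e9

// timeTriggered reports whether a forced collection is due at time now.
// lastGC is the time the previous collection finished, in nanoseconds;
// 0 is taken to mean that no collection has finished yet.
// (Hypothetical helper for illustration; it is not part of the runtime.)
func timeTriggered(lastGC, now int64) bool {
	return lastGC != 0 && now-lastGC > forceGCPeriodNS
}

func main() {
	const second = int64(1e9)
	fmt.Println(timeTriggered(1*second, 60*second))  // one minute after the last GC
	fmt.Println(timeTriggered(1*second, 122*second)) // just over two minutes after
	fmt.Println(timeTriggered(0, 500*second))        // no GC has run yet
}
```

When the test passes and the forcegc helper goroutine is idle, the excerpt puts that goroutine on a list and hands it to injectglist, so the collection is started by a normally scheduled goroutine rather than by the sysmon thread itself.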
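Summary
The runtime relies on the system monitor to trigger thread preemption, network polling, and garbage collection, which keeps the Go runtime responsive. The system monitor helps with tail latency, reduces goroutine starvation in the scheduler, and helps timers fire as close to their deadlines as possible.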
References
Most of this post is based on "6.7 系统监控" (System Monitor).