Golang Scheduler (8): System Monitor

0. Introduction

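In the previous posts we covered how the Golang scheduler works. One detail mentioned there is that, while the main goroutine is being created, the runtime starts a new thread that is not managed by the scheduler and runs the sysmon task on it. This task is also called the system monitor. In this post we look at what it actually does.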

1. Sleep strategy

func sysmon() {
   ...

   for {
      if idle == 0 { // start with 20us sleep...
         delay = 20
      } else if idle > 50 { // start doubling the sleep after 1ms...
         delay *= 2
      }
      if delay > 10*1000 { // up to 10ms
         delay = 10 * 1000
      }
      usleep(delay) // sleep for delay microseconds

      ...
   }
}

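From the code above we can see that every iteration of the loop starts with a sleep, and the length of the sleep follows these rules:

The sleep is at least 20us;
Once idle exceeds 50, that is, after more than 50 consecutive cycles in which retake took back nothing (about 1ms at 20us per cycle), the system looks idle and the sleep time doubles on every cycle;
The sleep is capped at 10ms and stays there until idle is reset to 0, which happens when retake succeeds or when sysmon is woken from its deep sleep by a syscall exit.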
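To make the back-off schedule concrete, here is a minimal sketch that reproduces only the delay arithmetic from the excerpt above. The function name nextDelay and the driver loop are illustrative and not part of the runtime; the loop assumes that retake never succeeds, so idle simply keeps growing.

```go
package main

import "fmt"

// nextDelay mirrors the delay update at the top of the sysmon loop.
// idle is the number of consecutive cycles in which retake did nothing;
// delay is the previous sleep in microseconds.
// The name nextDelay is hypothetical, not a runtime function.
func nextDelay(idle int, delay uint32) uint32 {
	if idle == 0 {
		delay = 20 // start with a 20us sleep
	} else if idle > 50 {
		delay *= 2 // after about 1ms of idle cycles, start doubling
	}
	if delay > 10*1000 {
		delay = 10 * 1000 // cap at 10ms
	}
	return delay
}

func main() {
	var delay uint32
	// Assume retake never succeeds, so idle grows by one per cycle.
	for idle := 0; idle <= 60; idle++ {
		delay = nextDelay(idle, delay)
		if idle == 0 || idle >= 50 {
			fmt.Printf("idle=%d delay=%dus\n", idle, delay)
		}
	}
}
```

Under that assumption the delay stays at 20us through idle=50, doubles from idle=51 onward, and reaches the 10ms cap at idle=59.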

2. Running timers

func sysmon() {
   ...

   for {
      ...

      // If a GC stop-the-world is pending or all Ps are idle, sleep for a while.
      // sysmon should not enter deep sleep if schedtrace is enabled so that
      // it can print that information at the right time.
      //
      // It should also not enter deep sleep if there are any active P's so
      // that it can retake P's from syscalls, preempt long running G's, and
      // poll the network if all P's are busy for long stretches.
      //
      // It should wakeup from deep sleep if any P's become active either due
      // to exiting a syscall or waking up due to a timer expiring so that it
      // can resume performing those duties. If it wakes from a syscall it
      // resets idle and delay as a bet that since it had retaken a P from a
      // syscall before, it may need to do it again shortly after the
      // application starts work again. It does not reset idle when waking
      // from a timer to avoid adding system load to applications that spend
      // most of their time sleeping.
      now := nanotime()
      if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
         lock(&sched.lock)
         if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
            syscallWake := false
            next, _ := timeSleepUntil()
            if next > now {
               atomic.Store(&sched.sysmonwait, 1)
               unlock(&sched.lock)
               // Make wake-up period small enough
               // for the sampling to be correct.
               sleep := forcegcperiod / 2
               if next-now < sleep {
                  sleep = next - now
               }
               shouldRelax := sleep >= osRelaxMinNS
               if shouldRelax {
                  osRelax(true)
               }
               syscallWake = notetsleep(&sched.sysmonnote, sleep)
               if shouldRelax {
                  osRelax(false)
               }
               lock(&sched.lock)
               atomic.Store(&sched.sysmonwait, 0)
               noteclear(&sched.sysmonnote)
            }
            if syscallWake {
               idle = 0
               delay = 20
            }
         }
         unlock(&sched.lock)
      }

      ...
   }
}

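In the monitoring loop, nanotime gives the current time and timeSleepUntil gives the time at which the next timer needs to fire. When the scheduler is waiting to run garbage collection, or all processors are idle, and the next timer is still in the future (next > now), sysmon can go into a deeper sleep that lasts until that timer is due, capped at forcegcperiod / 2.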

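The sleep uses notetsleep, which parks the sysmon thread with a timeout. If the wake-up came from a syscall exit (syscallWake is true), idle and delay are reset; waking up because of a timer does not reset them, as the comment in the code explains.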
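The decision described above can be sketched as a pure function. This is a simplified model, not the runtime's code: it ignores debug.schedtrace and the locking, and the parameters gcWaiting, idlePs, maxProcs, now and next are hypothetical stand-ins for sched.gcwaiting, sched.npidle, gomaxprocs, nanotime() and timeSleepUntil(). The 2-minute value for forceGCPeriod is an assumption about the runtime's forcegcperiod.

```go
package main

import (
	"fmt"
	"time"
)

// forceGCPeriod stands in for the runtime's forcegcperiod
// (assumed here to be 2 minutes, in nanoseconds).
const forceGCPeriod = int64(2 * time.Minute)

// deepSleep reports how long (in nanoseconds) a sysmon-like loop would
// sleep, and whether it would enter the deep sleep at all.
// All parameter names are hypothetical stand-ins for runtime state.
func deepSleep(gcWaiting bool, idlePs, maxProcs int, now, next int64) (int64, bool) {
	if !gcWaiting && idlePs != maxProcs {
		return 0, false // some P is active: keep monitoring
	}
	if next <= now {
		return 0, false // a timer is already due
	}
	sleep := forceGCPeriod / 2
	if next-now < sleep {
		sleep = next - now // wake up in time for the next timer
	}
	return sleep, true
}

func main() {
	// All 4 Ps idle, next timer due in 5ms.
	s, ok := deepSleep(false, 4, 4, 0, int64(5*time.Millisecond))
	fmt.Println(time.Duration(s), ok)

	// GC waiting, next timer far in the future: capped at forceGCPeriod/2.
	s, ok = deepSleep(true, 0, 4, 0, int64(10*time.Minute))
	fmt.Println(time.Duration(s), ok)

	// One P still busy: no deep sleep.
	s, ok = deepSleep(false, 3, 4, 0, int64(5*time.Millisecond))
	fmt.Println(time.Duration(s), ok)
}
```

The cap at half of forceGCPeriod matches the comment in the excerpt: the wake-up period has to be short enough for the periodic checks to be sampled correctly.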

3. Polling the network

func sysmon() {
   ...

   for {
      ...
      // poll network if not polled for more than 10ms
      // (netpoll was covered in the previous post; run it once if it has not run for 10ms)
      lastpoll := int64(atomic.Load64(&sched.lastpoll))
      if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
         atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
         list := netpoll(0) // non-blocking - returns list of goroutines
         if !list.empty() {
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before injectglist.
            // Otherwise it can lead to the following situation:
            // injectglist grabs all P's but before it starts M's to run the P's,
            // another M returns from syscall, finishes running its G,
            // observes that there is no work to do and no other running M's
            // and reports deadlock.
            incidlelocked(-1)
            injectglist(&list)
            incidlelocked(1)
         }
      }
      if GOOS == "netbsd" && needSysmonWorkaround {
         // netpoll is responsible for waiting for timer
         // expiration, so we typically don't have to worry
         // about starting an M to service timers. (Note that
         // sleep for timeSleepUntil above simply ensures sysmon
         // starts running again when that timer expiration may
         // cause Go code to run again).
         //
         // However, netbsd has a kernel bug that sometimes
         // misses netpollBreak wake-ups, which can lead to
         // unbounded delays servicing timers. If we detect this
         // overrun, then startm to get something to handle the
         // timer.
         //
         // See issue 42515 and
         // https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
         if next, _ := timeSleepUntil(); next < now {
            startm(nil, false)
         }
      }
      ...
   }
}

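As covered in the previous post, findrunnable polls the network poller (epollwait). In addition, if more than 10ms have passed since the last network poll, sysmon polls the network itself inside its loop, checks for file descriptors with pending events, and puts the goroutines waiting on them onto the global run queue to be scheduled (sysmon has no P, so they have to go to the global queue).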
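The 10ms check can be sketched with the standard sync/atomic package. This is only an illustration: shouldPoll and pollInterval are hypothetical names, lastPoll stands in for sched.lastpoll, and, unlike the excerpt above (which ignores the result of Cas64), the sketch uses the compare-and-swap result to decide whether to poll.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollInterval is 10ms in nanoseconds, matching 10*1000*1000 in the excerpt.
const pollInterval = 10 * 1000 * 1000

// shouldPoll reports whether a sysmon-like caller should poll the network
// at time now, and if so records now as the last poll time.
// lastPoll is a hypothetical stand-in for sched.lastpoll; a value of 0 is
// treated as "no extra poll needed", as in the excerpt's lastpoll != 0 check.
func shouldPoll(lastPoll *int64, now int64) bool {
	last := atomic.LoadInt64(lastPoll)
	if last == 0 || last+pollInterval >= now {
		return false
	}
	// Claim this poll; if another thread updated lastPoll first, skip it.
	return atomic.CompareAndSwapInt64(lastPoll, last, now)
}

func main() {
	lastPoll := int64(1_000_000) // last poll at t = 1ms
	for _, now := range []int64{5_000_000, 12_000_000, 13_000_000} {
		fmt.Printf("now=%dms poll=%v\n", now/1_000_000, shouldPoll(&lastPoll, now))
	}
}
```

With the last poll at 1ms, only the call at 12ms polls: 5ms is too early, and 13ms is only 1ms after the poll that was just recorded.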

4. Preempting processors

func sysmon() {
   ...

   for {
      ...
      // retake P's blocked in syscalls
      // and preempt long running G's
      // (retake was covered in the post "Cooperation and Preemption")
      if retake(now) != 0 {
         idle = 0
      } else {
         idle++
      }
      ...
   }
}

We analyzed the logic of retake in the post "Cooperation and Preemption". To summarize:

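When a processor is in the _Prunning or _Psyscall state and 10ms have passed since its last scheduling event, runtime.preemptone is called to preempt the G running on it;
When a processor is in the _Psyscall state, runtime.handoffp is called to hand the processor off if either of the following holds:
1. the processor's run queue is not empty, or there are no idle processors and no spinning Ms;
2. the system call has been running for more than 10ms.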

By preempting processors in its loop, the system monitor keeps a goroutine from occupying a thread for too long and starving the others.
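The rules summarized above can be written as a small decision function. This is a simplified model under stated assumptions, not the runtime's retake: the types pStatus and pState, the function retakeDecision, and all field names are hypothetical, and the sketch leaves out the tick bookkeeping that retake uses to detect whether a P has made progress since the previous check.

```go
package main

import "fmt"

// pStatus is a hypothetical stand-in for the runtime's P states.
type pStatus int

const (
	pRunning pStatus = iota // models _Prunning
	pSyscall                // models _Psyscall
	pIdle                   // any other state
)

const tenMs = 10 * 1000 * 1000 // 10ms in nanoseconds

// pState is a hypothetical snapshot of one P as seen by sysmon.
type pState struct {
	status         pStatus
	sinceSchedNs   int64 // time since the last scheduling event
	sinceSyscallNs int64 // time spent in the current syscall
	runqEmpty      bool  // whether the local run queue is empty
}

// retakeDecision returns whether the P should get a preemption request
// and whether it should be handed off to another M.
func retakeDecision(p pState, idlePs, spinningMs int) (preempt, handoff bool) {
	if p.status != pRunning && p.status != pSyscall {
		return false, false
	}
	// Running (or in a syscall) for 10ms without a scheduling event.
	preempt = p.sinceSchedNs >= tenMs
	if p.status == pSyscall {
		// Leave the P alone only if it has no local work, someone else
		// can pick up new work, and the syscall is still short.
		keep := p.runqEmpty && idlePs+spinningMs > 0 && p.sinceSyscallNs < tenMs
		handoff = !keep
	}
	return preempt, handoff
}

func main() {
	longRunner := pState{status: pRunning, sinceSchedNs: 12_000_000}
	fmt.Println(retakeDecision(longRunner, 0, 0))

	shortSyscall := pState{status: pSyscall, sinceSyscallNs: 1_000_000, runqEmpty: true}
	fmt.Println(retakeDecision(shortSyscall, 1, 0))

	queuedWork := pState{status: pSyscall, sinceSyscallNs: 1_000_000, runqEmpty: false}
	fmt.Println(retakeDecision(queuedWork, 1, 0))
}
```

The short syscall with an empty run queue is left alone because an idle P exists to pick up new work; as soon as local work is queued, the P is handed off even though the syscall has only run for 1ms.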

5. Garbage collection

func sysmon() {
   ...

   for {
      ...
      // check if we need to force a GC
      if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
         lock(&forcegc.lock)
         forcegc.idle = 0
         var list gList
         list.push(forcegc.g)
         injectglist(&list)
         unlock(&forcegc.lock)
      }
      if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
         lasttrace = now
         schedtrace(debug.scheddetail > 0)
      }
      unlock(&sched.sysmonlock)
   }
}

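Finally, the system monitor checks whether a forced garbage collection is due; if so, it puts the forcegc goroutine onto the run queue through injectglist. Garbage collection itself will be covered in a later post.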
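As a rough illustration of the time-based trigger, the check can be modeled as follows. This is a sketch under assumptions: shouldForceGC is a hypothetical name, lastGC and helperIdle stand in for the runtime's last-GC timestamp and forcegc.idle, and the 2-minute forceGCPeriod is an assumed value for forcegcperiod. The real gcTrigger.test may apply further conditions that are not modeled here.

```go
package main

import (
	"fmt"
	"time"
)

// forceGCPeriod stands in for the runtime's forcegcperiod
// (assumed here to be 2 minutes, in nanoseconds).
const forceGCPeriod = int64(2 * time.Minute)

// shouldForceGC models the condition checked at the end of the sysmon loop:
// a time-based trigger fires and the forcegc helper goroutine is parked.
// lastGC == 0 is taken to mean that no GC has completed yet.
func shouldForceGC(lastGC, now int64, helperIdle bool) bool {
	timeTrigger := lastGC != 0 && now-lastGC > forceGCPeriod
	return timeTrigger && helperIdle
}

func main() {
	lastGC := int64(time.Second) // last GC finished at t = 1s
	for _, now := range []time.Duration{time.Minute, 3 * time.Minute} {
		fmt.Printf("now=%v force=%v\n", now, shouldForceGC(lastGC, int64(now), true))
	}
}
```

The helperIdle condition avoids waking the helper goroutine again while a previous forced collection request is still being handled.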

Summary

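The runtime uses the system monitor to trigger thread preemption, network polling, and garbage collection, which keeps the Go runtime responsive. The system monitor mitigates tail latency, reduces goroutine starvation in the scheduler, and keeps timers firing as close to their scheduled time as possible.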

References

Most of this article is based on "6.7 System Monitor" (6.7 系统监控).