Golang Scheduler (8): System Monitor


0. Introduction

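The previous chapters covered how the Golang scheduler works. One detail mentioned along the way is that, while the main goroutine is being created, the runtime starts a new thread that takes no part in scheduling and runs the sysmon task on it. This task is also called the system monitor. Let's look at what it actually does.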

1. Sleep strategy

func sysmon() {
   ...

   for {
      if idle == 0 { // start with 20us sleep...
         delay = 20
      } else if idle > 50 { // start doubling the sleep after 1ms...
         delay *= 2
      }
      if delay > 10*1000 { // up to 10ms
         delay = 10 * 1000
      }
      usleep(delay) // sleep for delay microseconds

      ...
   }
}

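From the code above we can see that every iteration of the loop starts with a sleep, and the duration follows these rules:

  1. Sleep for at least 20us;
  2. Once more than 50 consecutive iterations have passed without retaking any P (idle > 50, about 1ms at 20us per iteration), the system is considered quiet and the sleep doubles on every iteration;
  3. The sleep is capped at 10ms and stays there until idle is reset to 0, which happens when retake succeeds or when sysmon is woken from its deep sleep by a syscall exit.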
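The back-off rules above can be replayed outside the runtime. The following is a minimal sketch, not runtime code: nextDelay is a hypothetical helper that copies the three branches from the excerpt, and main simply pretends that retake never succeeds so that idle keeps growing.

```go
package main

import "fmt"

// nextDelay is a hypothetical helper that mirrors the back-off branches of
// sysmon shown above. idle is the number of consecutive iterations in which
// nothing was retaken; delay is the previous sleep in microseconds.
func nextDelay(idle int, delay uint32) uint32 {
	if idle == 0 { // start with a 20us sleep
		delay = 20
	} else if idle > 50 { // after about 1ms of idling, double the sleep
		delay *= 2
	}
	if delay > 10*1000 { // never sleep longer than 10ms
		delay = 10 * 1000
	}
	return delay
}

func main() {
	var delay uint32
	// Assume retake never succeeds, so idle grows by one per iteration.
	for idle := 0; idle <= 60; idle++ {
		delay = nextDelay(idle, delay)
		if idle == 0 || idle >= 50 {
			fmt.Printf("idle=%d delay=%dus\n", idle, delay)
		}
	}
}
```

Running it shows the delay holding at 20us through idle=50, doubling from idle=51 onward, and reaching the 10ms cap after nine doublings.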

2. Running timers

func sysmon() {
   ...

   for {
      ...

      // During STW, sleep for a while.
      // sysmon should not enter deep sleep if schedtrace is enabled so that
      // it can print that information at the right time.
      //
      // It should also not enter deep sleep if there are any active P's so
      // that it can retake P's from syscalls, preempt long running G's, and
      // poll the network if all P's are busy for long stretches.
      //
      // It should wakeup from deep sleep if any P's become active either due
      // to exiting a syscall or waking up due to a timer expiring so that it
      // can resume performing those duties. If it wakes from a syscall it
      // resets idle and delay as a bet that since it had retaken a P from a
      // syscall before, it may need to do it again shortly after the
      // application starts work again. It does not reset idle when waking
      // from a timer to avoid adding system load to applications that spend
      // most of their time sleeping.
      now := nanotime()
      if debug.schedtrace <= 0 && (sched.gcwaiting != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs)) {
         lock(&sched.lock)
         if atomic.Load(&sched.gcwaiting) != 0 || atomic.Load(&sched.npidle) == uint32(gomaxprocs) {
            syscallWake := false
            next, _ := timeSleepUntil()
            if next > now {
               atomic.Store(&sched.sysmonwait, 1)
               unlock(&sched.lock)
               // Make wake-up period small enough
               // for the sampling to be correct.
               sleep := forcegcperiod / 2
               if next-now < sleep {
                  sleep = next - now
               }
               shouldRelax := sleep >= osRelaxMinNS
               if shouldRelax {
                  osRelax(true)
               }
               syscallWake = notetsleep(&sched.sysmonnote, sleep)
               if shouldRelax {
                  osRelax(false)
               }
               lock(&sched.lock)
               atomic.Store(&sched.sysmonwait, 0)
               noteclear(&sched.sysmonnote)
            }
            if syscallWake { // woken by a syscall exit: restart the back-off
               idle = 0
               delay = 20
            }
         }
         unlock(&sched.lock)
      }

      ...
   }
}

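Inside the monitor loop, nanotime gives the current time and timeSleepUntil gives the time at which the next timer has to fire. When the scheduler is waiting for a garbage collection (sched.gcwaiting) or every processor is idle, and the next timer is still in the future, the system monitor can go into a longer sleep that lasts until that timer is due, capped at forcegcperiod / 2.

The sleep itself uses notetsleep (the timed variant, not notesleep), which puts the sysmon thread to sleep until the timeout expires or the note is signalled. If the wake-up came from a syscall exit (syscallWake), idle and delay are reset; a timer wake-up leaves them unchanged.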
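How long that sleep lasts can be worked out in isolation. This is a minimal sketch under stated assumptions: deepSleepNS and noTimer are hypothetical names, times are plain nanosecond counters, and the 2-minute value used for forceGCPeriodNS is an assumption of the sketch rather than something stated in the excerpt.

```go
package main

import (
	"fmt"
	"math"
)

// forceGCPeriodNS stands in for the runtime's forcegcperiod.
// The 2-minute value is an assumption of this sketch.
const forceGCPeriodNS int64 = 2 * 60 * 1e9

// noTimer is a hypothetical marker for "no timer pending": a deadline
// that is as far in the future as an int64 allows.
const noTimer int64 = math.MaxInt64

// deepSleepNS returns how long the monitor would sleep, in nanoseconds,
// following the excerpt: nothing if the next timer is already due,
// otherwise the time until that timer, capped at half the forced-GC period.
func deepSleepNS(now, next int64) int64 {
	if next <= now {
		return 0 // a timer is due, do not enter the long sleep
	}
	sleep := forceGCPeriodNS / 2
	if next-now < sleep {
		sleep = next - now
	}
	return sleep
}

func main() {
	now := int64(1_000_000_000)
	fmt.Println(deepSleepNS(now, now+5_000_000)) // timer due in 5ms
	fmt.Println(deepSleepNS(now, noTimer))       // no timer pending
	fmt.Println(deepSleepNS(now, now-1))         // timer already overdue
}
```

The cap matters for the case with no pending timer: the monitor still wakes up often enough to notice that a time-based garbage collection is due.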

3. Polling the network

func sysmon() {
   ...

   for {
      ...
      // poll network if not polled for more than 10ms
      // (this is the netpoll covered in the previous post; run it once if it has not run for 10ms)
      lastpoll := int64(atomic.Load64(&sched.lastpoll))
      if netpollinited() && lastpoll != 0 && lastpoll+10*1000*1000 < now {
         atomic.Cas64(&sched.lastpoll, uint64(lastpoll), uint64(now))
         list := netpoll(0) // non-blocking - returns list of goroutines
         if !list.empty() {
            // Need to decrement number of idle locked M's
            // (pretending that one more is running) before injectglist.
            // Otherwise it can lead to the following situation:
            // injectglist grabs all P's but before it starts M's to run the P's,
            // another M returns from syscall, finishes running its G,
            // observes that there is no work to do and no other running M's
            // and reports deadlock.
            incidlelocked(-1)
            injectglist(&list)
            incidlelocked(1)
         }
      }
      if GOOS == "netbsd" && needSysmonWorkaround {
         // netpoll is responsible for waiting for timer
         // expiration, so we typically don't have to worry
         // about starting an M to service timers. (Note that
         // sleep for timeSleepUntil above simply ensures sysmon
         // starts running again when that timer expiration may
         // cause Go code to run again).
         //
         // However, netbsd has a kernel bug that sometimes
         // misses netpollBreak wake-ups, which can lead to
         // unbounded delays servicing timers. If we detect this
         // overrun, then startm to get something to handle the
         // timer.
         //
         // See issue 42515 and
         // https://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=50094.
         if next, _ := timeSleepUntil(); next < now {
            startm(nil, false)
         }
      }
      ...
   }
}

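As the previous post explained, findrunnable already polls the network poller (epollwait). In addition, if more than 10ms have passed since the last poll, the system monitor polls the network itself inside its loop, checks for file descriptors with pending events, and hands the goroutines waiting on them over for scheduling. Because sysmon has no P, those goroutines are put on the global run queue.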
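The "poll only if nobody has polled for 10ms" check can be sketched with the standard sync/atomic package. This is an illustration, not the runtime's code: claimPoll and pollIntervalNS are hypothetical names, and timestamps are plain nanosecond counters. As in the excerpt, a zero timestamp skips the poll, and the result of the compare-and-swap is not used to decide whether to poll.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// pollIntervalNS is the 10ms threshold from the excerpt, in nanoseconds.
const pollIntervalNS int64 = 10 * 1000 * 1000

// claimPoll is a hypothetical helper. It reports whether the caller should
// poll the network now, and on success tries to record now as the time of
// the last poll.
func claimPoll(lastPoll *int64, now int64) bool {
	last := atomic.LoadInt64(lastPoll)
	if last == 0 || last+pollIntervalNS >= now {
		return false // skipped, or polled recently enough
	}
	// Another thread may update lastPoll concurrently; like the excerpt,
	// the sketch polls regardless of whether this CAS wins.
	atomic.CompareAndSwapInt64(lastPoll, last, now)
	return true
}

func main() {
	lastPoll := int64(1)
	fmt.Println(claimPoll(&lastPoll, 5_000_000))  // 5ms after the last poll
	fmt.Println(claimPoll(&lastPoll, 20_000_000)) // 20ms after the last poll
	fmt.Println(claimPoll(&lastPoll, 25_000_000)) // 5ms after the new poll
}
```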

4. Retaking processors

func sysmon() {
   ...

   for {
      ...
      // retake P's blocked in syscalls
      // and preempt long running G's
      // (retake is covered in the post "Cooperation and Preemption")
      if retake(now) != 0 {
         idle = 0
      } else {
         idle++
      }
      ...
   }
}

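The post "Cooperation and Preemption" analysed the logic of retake. In summary:

  1. When a processor is in the _Prunning or _Psyscall state and more than 10ms have passed since it last scheduled a goroutine, runtime.preemptone is called to preempt it;
  2. When a processor is in the _Psyscall state, runtime.handoffp is called to hand the processor off if either of the following holds:
    1. its run queue is not empty, or there are no idle processors;
    2. the syscall has been running for more than 10ms.

By retaking processors in its loop, the system monitor keeps a goroutine from occupying a thread for too long and starving the others.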
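The two decisions summarised above can be written as plain predicates. This is a simplified sketch, not the runtime's retake: pSnapshot and its fields are hypothetical, times are nanosecond counters, and the bookkeeping the real function performs (locking, tick counters, state transitions) is left out.

```go
package main

import "fmt"

// limitNS is the 10ms threshold used by both rules, in nanoseconds.
const limitNS int64 = 10 * 1000 * 1000

// pSnapshot is a hypothetical, simplified view of one processor.
type pSnapshot struct {
	inSyscall     bool  // true for _Psyscall, false for _Prunning
	runQueueEmpty bool  // no goroutines waiting in the local run queue
	idlePs        int   // number of idle processors in the system
	scheduledAt   int64 // when the current goroutine was scheduled
	syscallAt     int64 // when the current syscall was first observed
}

// shouldPreempt implements rule 1: the same goroutine has been running
// on this processor for more than 10ms.
func shouldPreempt(p pSnapshot, now int64) bool {
	return now-p.scheduledAt > limitNS
}

// shouldHandoff implements rule 2: a processor stuck in a syscall is
// handed off if it has local work, if no other processor is idle,
// or if the syscall has lasted more than 10ms.
func shouldHandoff(p pSnapshot, now int64) bool {
	if !p.inSyscall {
		return false
	}
	hasLocalWork := !p.runQueueEmpty
	noIdleP := p.idlePs == 0
	longSyscall := now-p.syscallAt > limitNS
	return hasLocalWork || noIdleP || longSyscall
}

func main() {
	now := int64(100_000_000)
	short := pSnapshot{inSyscall: true, runQueueEmpty: true, idlePs: 2,
		scheduledAt: now - 1_000_000, syscallAt: now - 1_000_000}
	long := short
	long.syscallAt = now - 20_000_000
	fmt.Println(shouldPreempt(short, now), shouldHandoff(short, now))
	fmt.Println(shouldHandoff(long, now))
}
```

A short syscall on an otherwise quiet system is left alone, since another idle processor can pick up any new work; the processor is taken away only when keeping it would delay runnable goroutines.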

5. Garbage collection

func sysmon() {
   ...

   for {
      ...
      // check if we need to force a GC
      if t := (gcTrigger{kind: gcTriggerTime, now: now}); t.test() && atomic.Load(&forcegc.idle) != 0 {
         lock(&forcegc.lock)
         forcegc.idle = 0
         var list gList
         list.push(forcegc.g)
         injectglist(&list)
         unlock(&forcegc.lock)
      }
      if debug.schedtrace > 0 && lasttrace+int64(debug.schedtrace)*1000000 <= now {
         lasttrace = now
         schedtrace(debug.scheddetail > 0)
      }
      unlock(&sched.sysmonlock)
   }
}

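Finally, the system monitor checks whether a forced garbage collection is due. Garbage collection itself will be covered in a later post.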
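The condition in the excerpt combines a time-based trigger with a check that the forced-GC helper goroutine is idle. The sketch below is a rough model of that check, not the runtime's gcTrigger.test: the function name, the 2-minute period, and the rule that a zero lastGC (no collection has run yet) never triggers are all assumptions of the sketch.

```go
package main

import "fmt"

// forceGCPeriodNS stands in for the runtime's forcegcperiod.
// The 2-minute value is an assumption of this sketch.
const forceGCPeriodNS int64 = 2 * 60 * 1e9

// shouldWakeForceGC is a hypothetical helper. It reports whether the
// monitor would queue the forced-GC helper goroutine: enough time has
// passed since the last collection and the helper is currently idle.
func shouldWakeForceGC(lastGC, now int64, helperIdle bool) bool {
	due := lastGC != 0 && now-lastGC > forceGCPeriodNS
	return due && helperIdle
}

func main() {
	lastGC := int64(1_000_000_000)
	fmt.Println(shouldWakeForceGC(lastGC, lastGC+forceGCPeriodNS/2, true))
	fmt.Println(shouldWakeForceGC(lastGC, lastGC+forceGCPeriodNS+1, true))
	fmt.Println(shouldWakeForceGC(lastGC, lastGC+forceGCPeriodNS+1, false))
}
```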

Summary

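The runtime relies on the system monitor to trigger thread preemption, network polling, and garbage collection, which keeps the Go runtime responsive. The system monitor helps keep tail latency down, reduces goroutine starvation in the scheduler, and keeps timers firing as close to their deadlines as possible.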

References

Most of this article is based on "6.7 System Monitor".