golang schedule, the go keyword, and preemption


schedule()

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.


findRunnable:

->checkTimers(pp, 0) // runs any timers for the P that are ready.

->if trace enabled, g = traceReader

->nil && gcBlackenEnabled, g = findRunnableGCWorker // during the GC mark phase, findRunnableGCWorker returns a background mark worker for pp

->nil && every 61st scheduling round, fetch a goroutine from the global run queue first; if we only ever scheduled from the local run queue, goroutines on the global run queue could starve

->nil && Get g from local runnable queue.

->nil && Get g from global runq

->nil && Poll network //netpoll checks for ready network connections. Returns list of goroutines that become runnable.

->nil && steal work from other Ps
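The priority order above can be sketched as a plain Go function. This is a toy model under stated assumptions: the real findRunnable lives in runtime/proc.go and works with *g, Ps, and locks; the names source and pickRunnable here are illustrative, not runtime APIs.

```go
package main

import "fmt"

// source stands in for one place a runnable goroutine can come from.
type source func() (g string, ok bool)

// pickRunnable mirrors findRunnable's priority order: timers, trace reader,
// GC mark worker, the every-61st-tick global-queue check, then local queue,
// global queue, netpoll, and finally work stealing.
func pickRunnable(schedTick int, timers, traceReader, gcWorker, local, global, netpoll, steal source) (string, bool) {
	timers() // checkTimers: run timers that are ready (side effect only here)
	if g, ok := traceReader(); ok {
		return g, true // trace reader runs first when tracing is enabled
	}
	if g, ok := gcWorker(); ok {
		return g, true // background GC mark worker during the mark phase
	}
	// Every 61st tick, check the global queue first so it cannot starve.
	if schedTick%61 == 0 {
		if g, ok := global(); ok {
			return g, true
		}
	}
	for _, s := range []source{local, global, netpoll, steal} {
		if g, ok := s(); ok {
			return g, true
		}
	}
	return "", false // the real findRunnable would park the M here
}

func main() {
	none := func() (string, bool) { return "", false }
	localG := func() (string, bool) { return "local-g", true }
	globalG := func() (string, bool) { return "global-g", true }

	g, _ := pickRunnable(1, none, none, none, localG, globalG, none, none)
	fmt.Println(g) // local queue wins on a normal tick

	g, _ = pickRunnable(61, none, none, none, localG, globalG, none, none)
	fmt.Println(g) // global queue wins on the 61st tick
}
```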

When schedule() is called

  • The use of the keyword go
  • Garbage collection
  • System calls
  • Synchronization and Orchestration
    • atomic, mutex, or channel operation
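A tiny program that touches several of these entry points: the go keyword (compiled to a newproc call), a channel operation (which may gopark/goready the goroutine), and an explicit yield via runtime.Gosched, which goes through goschedImpl.

```go
package main

import (
	"fmt"
	"runtime"
)

// roundTrip exercises three scheduler entry points in a few lines.
func roundTrip() int {
	ch := make(chan int)
	go func() { ch <- 42 }() // `go` compiles to a call to runtime.newproc
	runtime.Gosched()        // voluntary yield: goschedImpl -> schedule()
	return <-ch              // receive parks (gopark) until the sender runs
}

func main() {
	fmt.Println(roundTrip()) // prints 42
}
```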

1736  mstart1

4036  park_m // gopark

4065  goschedImpl

4141  preemptPark

4162  goyield_m

4181  goexit0

4756 exitsyscall  // The goroutine g exited its system call. Arrange for it to run on a cpu again.

What "Never returns" means

golang.design/go-question…

In a complex program, scheduling can loop countless times, which means countless function calls that never return. As everyone knows, each function call consumes some stack space, so if these non-returning calls kept piling up, g0's stack would eventually be exhausted no matter how large it was. Is that a problem? It is not, and the key is this: every mcall that switches to the g0 stack switches to the fixed position recorded in g0.sched.sp. This works precisely because none of the functions in the chain starting at schedule ever return, so reusing the stack memory those functions used in the previous scheduling round is safe.

To put it another way: stack space automatically "grows" when a function is called and "shrinks" when it returns, where growing and shrinking refer to movements of the stack pointer SP. None of these functions return, which means no caller ever needs a callee's return value, a bit like tail recursion.

Because g0's saved sp is never touched, it can keep being reused. Each scheduling round simply overwrites the stack data of the previous round. Perfect!
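The argument above can be modeled as a flat loop. This is a toy sketch, not runtime code: because schedule -> execute -> goexit0 -> schedule never returns, one round fully reuses the stack of the previous round, which in plain Go looks like iteration rather than ever-deepening recursion.

```go
package main

import "fmt"

// scheduleLoop models the non-returning scheduling chain as iteration. In the
// real runtime, goexit0 would mcall back to schedule, resetting SP to the
// fixed g0.sched.sp; here the loop's next iteration plays that role, so the
// "stack" used by one round is fully overwritten by the next.
func scheduleLoop(work []string) []string {
	var ran []string
	for _, g := range work { // each iteration = one scheduling round on g0's stack
		ran = append(ran, g) // "execute" the goroutine
	}
	return ran
}

func main() {
	fmt.Println(scheduleLoop([]string{"g1", "g2", "g3"})) // prints [g1 g2 g3]
}
```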

How the go statement is implemented (newproc)

// Create a new g running fn.
// Put it on the queue of g's waiting to run.
// The compiler turns a go statement into a call to this.
func newproc(fn *funcval) {
   gp := getg()
   pc := getcallerpc()
   systemstack(func() {
      newg := newproc1(fn, gp, pc)

      _p_ := getg().m.p.ptr()
      runqput(_p_, newg, true)

      if mainStarted {
         wakep()
      }
   })
}
// runqput tries to put g on the local runnable queue.
// If next is false, runqput adds g to the tail of the runnable queue.
// If next is true, runqput puts g in the _p_.runnext slot.
// If the run queue is full, runqput puts g on the global queue.
// Executed only by the owner P.
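The runnext behavior described in the comment can be sketched with a toy local queue. This is illustrative only: the real runqput uses a fixed-size lock-free ring of length 256, and on overflow it moves half the local queue plus g to the global queue, not just the new g as this sketch does.

```go
package main

import "fmt"

// localQueue is a toy stand-in for a P's run queue plus the runnext slot.
type localQueue struct {
	runnext string   // scheduled next, ahead of everything in runq
	runq    []string // FIFO local queue
	cap     int      // toy capacity (the real ring holds 256 Gs)
	global  []string // stand-in for the global run queue
}

func (q *localQueue) runqput(g string, next bool) {
	if next {
		// A new runnext kicks the old occupant back into the queue.
		if q.runnext != "" {
			q.put(q.runnext)
		}
		q.runnext = g
		return
	}
	q.put(g)
}

func (q *localQueue) put(g string) {
	if len(q.runq) < q.cap {
		q.runq = append(q.runq, g)
		return
	}
	q.global = append(q.global, g) // local queue full: spill to the global queue
}

func main() {
	q := &localQueue{cap: 2}
	q.runqput("g1", true)
	q.runqput("g2", true)  // g2 takes runnext, g1 moves to the queue
	q.runqput("g3", false)
	q.runqput("g4", false) // queue full (g1, g3): g4 spills to global
	fmt.Println(q.runnext, q.runq, q.global) // prints g2 [g1 g3] [g4]
}
```

newproc passes next=true, which is why a freshly created goroutine tends to run before older entries in the local queue.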
// Tries to add one more P to execute G's.
// Called when a G is made runnable (newproc, ready).
func wakep() {
   if atomic.Load(&sched.npidle) == 0 {
      return
   }
   // be conservative about spinning threads
   if atomic.Load(&sched.nmspinning) != 0 || !atomic.Cas(&sched.nmspinning, 0, 1) {
      return
   }
   startm(nil, true)
}
// Schedules some M to run the p (creates an M if necessary).

func startm(_p_ *p, spinning bool)

Preemption of goroutines

cloud.tencent.com/developer/a…

If a goroutine has been running for more than 10ms, Go will trigger its preemption (from sysmon).

At startup, the runtime creates a System Monitor (sysmon) thread. sysmon would preempt long running G's.
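The 10ms threshold check can be sketched as follows. forcePreemptNS is the real runtime constant name; retakeOne is a toy stand-in for the per-P check that sysmon's retake() performs.

```go
package main

import "fmt"

// forcePreemptNS matches the runtime's threshold: preempt a G that has been
// running for more than 10ms.
const forcePreemptNS = 10 * 1000 * 1000 // 10ms in nanoseconds

// retakeOne reports whether the G that started running at schedWhenNS should
// be preempted at time nowNS.
func retakeOne(schedWhenNS, nowNS int64) bool {
	return schedWhenNS+forcePreemptNS <= nowNS
}

func main() {
	start := int64(0)
	fmt.Println(retakeOne(start, 5*1000*1000))  // 5ms elapsed: false
	fmt.Println(retakeOne(start, 15*1000*1000)) // 15ms elapsed: true
}
```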

The preemptone() function follows this call path:

preemptone->preemptM->signalM->tgkill

Every M installs its signal handlers during initialization:

initsig->setsig->sighandler

When the thread receives the SIGURG signal, it runs the sighandler function, whose core is doSigPreempt.

The most interesting part is the goschedImpl function. It first changes the goroutine's status from running to runnable; then it calls dropg to unbind the g from its m; then it calls globrunqput to put the goroutine on the global runnable queue, which requires taking the lock since the queue is global. Finally, it calls schedule() to re-enter the scheduling loop.
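Those steps can be sketched as a toy model. The names mirror the runtime functions described above, but the types are simplified and dropg is elided; this is not the real implementation.

```go
package main

import (
	"fmt"
	"sync"
)

// G is a toy goroutine descriptor; the real one is runtime.g.
type G struct{ status string }

var (
	schedLock  sync.Mutex // stands in for sched.lock
	globalRunq []*G       // stands in for sched.runq
)

// goschedImplToy follows the steps described in the text.
func goschedImplToy(gp *G) {
	gp.status = "runnable" // 1. _Grunning -> _Grunnable (casgstatus)
	// 2. dropg: unbind g from m (elided in this toy model)
	schedLock.Lock() // 3. the global run queue requires sched.lock
	globalRunq = append(globalRunq, gp)
	schedLock.Unlock()
	// 4. schedule() would be called here and never return
}

func main() {
	g := &G{status: "running"}
	goschedImplToy(g)
	fmt.Println(g.status, len(globalRunq)) // prints: runnable 1
}
```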

// Tell the goroutine running on processor P to stop.
// This function is purely best-effort. It can incorrectly fail to inform the
// goroutine. It can inform the wrong goroutine. Even if it informs the
// correct goroutine, that goroutine might ignore the request if it is
// simultaneously executing newstack.
// No lock needs to be held.
// Returns true if preemption request was issued.
// The actual preemption will happen at some point in the future
// and will be indicated by the gp->status no longer being
// Grunning
func preemptone(_p_ *p) bool {
   mp := _p_.m.ptr()
   if mp == nil || mp == getg().m {
      return false
   }
   gp := mp.curg
   if gp == nil || gp == mp.g0 {
      return false
   }

   gp.preempt = true

   // Every call in a goroutine checks for stack overflow by
   // comparing the current stack pointer to gp->stackguard0.
   // Setting gp->stackguard0 to StackPreempt folds
   // preemption into the normal stack overflow check.
   gp.stackguard0 = stackPreempt

   // Request an async preemption of this P.
  
   if preemptMSupported && debug.asyncpreemptoff == 0 {
      _p_.preempt = true
      preemptM(mp)
   }

   return true
}
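How setting gp.stackguard0 = stackPreempt folds preemption into the stack check can be sketched like this. It is a toy model: the real check is emitted by the compiler in every function prologue, and the runtime then distinguishes a preemption request from a genuine overflow in newstack.

```go
package main

import "fmt"

// stackPreempt matches the runtime's 64-bit sentinel (uintptrMask & -1314):
// a value larger than any real stack pointer, so the prologue check below
// always fails once it is installed.
const stackPreempt = ^uintptr(1313)

type toyG struct{ stackguard0 uintptr }

// prologueWouldTrap models the check the compiler emits at the top of most
// functions: "if SP < stackguard0 { call morestack }". With a normal guard it
// only fires near stack exhaustion; with stackPreempt it fires immediately,
// diverting the goroutine into the runtime at the next function call.
func prologueWouldTrap(sp uintptr, g *toyG) bool {
	return sp < g.stackguard0
}

func main() {
	g := &toyG{stackguard0: 0x1000}           // normal guard near the stack bound
	fmt.Println(prologueWouldTrap(0x8000, g)) // plenty of stack: false
	g.stackguard0 = stackPreempt              // preemption requested
	fmt.Println(prologueWouldTrap(0x8000, g)) // same SP now "overflows": true
}
```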
// In preemptM: if multiple threads are preempting the same M, it may send
// many signals to the same M such that it hardly makes progress, causing a
// live-lock problem. Apparently this could happen on darwin. See
// issue #37741.
// Only send a signal if there isn't already one pending.
signalM(mp, sigPreempt)
func signalM(mp *m, sig int) {
   pthread_kill(pthread(mp.procid), uint32(sig))
}

schedt

type schedt struct {
   // accessed atomically. keep at top to ensure alignment on 32-bit systems.
   goidgen   uint64
   lastpoll  uint64 // time of last network poll, 0 if currently polling
   pollUntil uint64 // time to which current poll is sleeping

   lock mutex

   // When increasing nmidle, nmidlelocked, nmsys, or nmfreed, be
   // sure to call checkdead().

   midle        muintptr // idle m's waiting for work
   nmidle       int32    // number of idle m's waiting for work
   nmidlelocked int32    // number of locked m's waiting for work
   mnext        int64    // number of m's that have been created and next M ID
   maxmcount    int32    // maximum number of m's allowed (or die)
   nmsys        int32    // number of system m's not counted for deadlock
   nmfreed      int64    // cumulative number of freed m's

   ngsys uint32 // number of system goroutines; updated atomically

   pidle      puintptr // idle p's
   npidle     uint32
   nmspinning uint32 // See "Worker thread parking/unparking" comment in proc.go.

   // Global runnable queue.
   runq     gQueue
   runqsize int32

   // disable controls selective disabling of the scheduler.
   //
   // Use schedEnableUser to control this.
   //
   // disable is protected by sched.lock.
   disable struct {
      // user disables scheduling of user goroutines.
      user     bool
      runnable gQueue // pending runnable Gs
      n        int32  // length of runnable
   }

   // Global cache of dead G's.
   gFree struct {
      lock    mutex
      stack   gList // Gs with stacks
      noStack gList // Gs without stacks
      n       int32
   }

   // Central cache of sudog structs.
   sudoglock  mutex
   sudogcache *sudog

   // Central pool of available defer structs.
   deferlock mutex
   deferpool *_defer

   // freem is the list of m's waiting to be freed when their
   // m.exited is set. Linked through m.freelink.
   freem *m

   gcwaiting  uint32 // gc is waiting to run
   stopwait   int32
   stopnote   note
   sysmonwait uint32
   sysmonnote note

   // safepointFn should be called on each P at the next GC
   // safepoint if p.runSafePointFn is set.
   safePointFn   func(*p)
   safePointWait int32
   safePointNote note

   profilehz int32 // cpu profiling rate

   procresizetime int64 // nanotime() of last change to gomaxprocs
   totaltime      int64 // ∫gomaxprocs dt up to procresizetime

   // sysmonlock protects sysmon's actions on the runtime.
   //
   // Acquire and hold this mutex to block sysmon from interacting
   // with the rest of the runtime.
   sysmonlock mutex

   // timeToRun is a distribution of scheduling latencies, defined
   // as the sum of time a G spends in the _Grunnable state before
   // it transitions to _Grunning.
   //
   // timeToRun is protected by sched.lock.
   timeToRun timeHistogram
}