Starting from hello world


I haven't posted in a long time because of work. I remember starting this post just a month after switching from Java to Golang, and it has now sat in drafts for three years, orz. Back then I wrote plenty of CRUD code but didn't understand Golang's internals, which always felt a bit shaky. So I put together this article on one small corner of the golang runtime machinery.

Without further ado. The article is on the long side, so feel free to bookmark it first and come back later.

Dryness warning: what follows is dense.

Golang's runtime is one of the core components of the language. It manages and schedules goroutines and handles garbage collection, memory allocation, locks, and other low-level facilities.

  1. Goroutine

A goroutine is Golang's unit of concurrent execution. It is far lighter-weight than an OS thread and is cheap to create and manage. The goroutine implementation is one of the core parts of the runtime and is based on an M:N threading model: many goroutines are multiplexed onto a small number of OS threads, which makes better use of system resources.
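A minimal sketch of the M:N model from the user side (the GOMAXPROCS value and loop count are arbitrary illustration choices): we cap the number of Ps at 2 and spawn a thousand goroutines, which the runtime multiplexes onto a handful of threads.

package main

import (
	"fmt"
	"runtime"
	"sync"
)

func main() {
	// At most 2 Ps may execute Go code simultaneously.
	runtime.GOMAXPROCS(2)

	var wg sync.WaitGroup
	for i := 0; i < 1000; i++ {
		wg.Add(1)
		go func(id int) {
			defer wg.Done()
			_ = id * id // trivial work
		}(i)
	}
	// Many goroutines, few threads: the M:N model at work.
	fmt.Println("goroutines in flight:", runtime.NumGoroutine())
	wg.Wait()
}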

In runtime/runtime2.go (not a separate sched package), a struct named g describes a goroutine's state and bookkeeping. Abridged:

type g struct {
    ...
    atomicstatus uint32 //the goroutine's status
    m            *m     //the m currently running this goroutine
    ...
}

A goroutine's status can be one of the following, among others (the sketch after this list shows how these surface to user code):

  • _Gidle: the goroutine was just allocated and has not been initialized.
  • _Grunnable: the goroutine is on a run queue, ready to run but not yet scheduled.
  • _Grunning: the goroutine is executing.
  • _Gsyscall: the goroutine is executing a system call.
  • _Gdead: the goroutine has exited and is currently unused, awaiting reuse.
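User code can't read atomicstatus directly, but runtime.Stack dumps every goroutine with a human-readable state derived from it ([running], [sleep], [chan receive], and so on). A small sketch, with arbitrary sleep durations:

package main

import (
	"fmt"
	"runtime"
	"time"
)

func main() {
	ch := make(chan int)
	go func() { <-ch }()                    // blocks: internally _Gwaiting, shown as [chan receive]
	go func() { time.Sleep(time.Minute) }() // parked in a timer, shown as [sleep]

	time.Sleep(10 * time.Millisecond) // give both goroutines time to block

	buf := make([]byte, 1<<16)
	n := runtime.Stack(buf, true) // true = dump all goroutines, not just the caller
	fmt.Printf("%s", buf[:n])     // each goroutine's header line shows its state
}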

Goroutine scheduling is done by the scheduler, another core piece of the runtime, which assigns goroutines to available threads. The scheduler lives in runtime/proc.go; its most important function is schedule, which picks a runnable goroutine and hands it to a thread. We will walk through it below.

  2. Garbage collection

Garbage collection is another important part of the runtime. In Golang, GC is performed automatically by the runtime: it reclaims memory that is no longer in use and guards against leaks and memory exhaustion.

Golang uses a concurrent, tri-color mark-and-sweep collector. The algorithm has two phases: mark and sweep. In the mark phase, the collector traverses the object graph from the roots and marks every object that is still reachable. In the sweep phase, it frees the objects that were not marked and returns their memory to the heap.

The collector's implementation lives in runtime/mgc.go; a collection cycle is kicked off by gcStart (there is no single mgc function), which drives the mark phase, after which unmarked objects are swept and their memory is reclaimed.
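A hedged sketch of watching a cycle from user code: runtime.GC() forces a full collection and runtime.ReadMemStats exposes its effect (the allocation count and sizes here are arbitrary):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// Produce garbage: each slice becomes unreachable as soon as the
	// loop iteration ends.
	for i := 0; i < 1000; i++ {
		_ = make([]byte, 1<<10)
	}

	var before, after runtime.MemStats
	runtime.ReadMemStats(&before)
	runtime.GC() // force a full mark-and-sweep cycle
	runtime.ReadMemStats(&after)

	fmt.Println("GC cycles run:", after.NumGC-before.NumGC)
	fmt.Println("heap bytes reclaimed:", int64(before.HeapAlloc)-int64(after.HeapAlloc))
}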

  3. Memory allocation

Memory allocation is another important part of the runtime. In Golang, allocation is handled automatically by the runtime, which manages every allocation and release in the program.

In runtime/malloc.go, the mallocgc function performs the allocation: it carves a block out of the heap and returns a pointer to it. Allocated memory is then managed by the garbage collector and released back to the heap once it is no longer in use.
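A small sketch of an allocation reaching mallocgc: a value whose address escapes its function must live on the heap, which shows up in the runtime's Mallocs counter (the newOnHeap helper and the //go:noinline directive are illustration choices; go build -gcflags=-m prints the compiler's escape-analysis decisions):

package main

import (
	"fmt"
	"runtime"
)

//go:noinline
func newOnHeap() *int {
	x := 42
	return &x // x escapes, so it is allocated on the heap via mallocgc
}

func main() {
	var m1, m2 runtime.MemStats
	runtime.ReadMemStats(&m1)
	p := newOnHeap()
	runtime.ReadMemStats(&m2)
	fmt.Println(*p, "new heap objects:", m2.Mallocs-m1.Mallocs)
}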

  4. Locks

Locks are another core part of the runtime. In Golang, locks protect shared resources and prevent race conditions when multiple goroutines access the same resource at once.

In runtime/runtime2.go (with the platform implementations in runtime/lock_futex.go and runtime/lock_sema.go), a mutex struct describes the runtime-internal lock:

type mutex struct {
    //futex-based implementations treat this as a uint32 key,
    //sema-based implementations as an M* waitm
    key uintptr
}

On futex-based platforms such as Linux, the key takes one of these states:

  • mutex_unlocked: the lock is free.
  • mutex_locked: the lock is held.
  • mutex_sleeping: the lock is held and some thread is sleeping on it.

In runtime/lock_futex.go, the lock function acquires the lock and the unlock function releases it. Both rely on the kernel-level futex mechanism to block and wake threads.
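User code never touches the runtime mutex directly; the exported sync.Mutex is the user-facing lock, built on top of these runtime primitives (futex-backed on Linux). A minimal sketch of guarding shared state:

package main

import (
	"fmt"
	"sync"
)

// counter is shared state guarded by a mutex.
type counter struct {
	mu sync.Mutex
	n  int
}

func (c *counter) inc() {
	c.mu.Lock() // a contended Lock parks the goroutine via runtime primitives
	defer c.mu.Unlock()
	c.n++
}

func main() {
	var c counter
	var wg sync.WaitGroup
	for i := 0; i < 100; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			c.inc()
		}()
	}
	wg.Wait()
	fmt.Println(c.n) // always 100; without the lock this would be a data race
}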

In short, the runtime is one of the core components of Golang: it manages and schedules goroutines and handles garbage collection, memory allocation, locks, and other low-level facilities. Understanding it gives us a better grasp of Golang's concurrency model and helps us write concurrent programs more effectively.

Today we'll trace a simple hello world program through process startup and scheduling, and glimpse the tip of the golang runtime iceberg.

As anyone who writes code for a living knows, every program has a main function as its entry point, and golang (go 1.13.5 here) is no exception. But the user-defined main.main is not the real entry: before it runs there is a stretch of Plan 9 style assembly bootstrap code. Next we'll use gdb to locate the real entry point step by step and gradually work our way into the golang runtime.

I'll skip gdb installation. First, a golang version of hello world:

package main

func main() {
	println("hello world")
}

Next we compile with go build -gcflags "-N -l" -o hello hello.go (the -gcflags "-N -l" flags disable compiler optimizations and function inlining, so breakpoints and single-stepping map accurately onto source lines, and small functions and locals aren't optimized away), then run gdb hello:

root@4ff18d748169:/home/workspace# gdb hello
(gdb) info files
Symbols from "/home/workspace/hello".
Local exec file:
	`/home/workspace/hello', file type elf64-x86-64.
	Entry point: 0x44d730
	0x0000000000401000 - 0x0000000000452601 is .text
	0x0000000000453000 - 0x000000000048379f is .rodata
	0x0000000000483960 - 0x00000000004840dc is .typelink
	0x00000000004840e0 - 0x00000000004840e8 is .itablink
	0x00000000004840e8 - 0x00000000004840e8 is .gosymtab
	0x0000000000484100 - 0x00000000004c17f3 is .gopclntab
	0x00000000004c2000 - 0x00000000004c2020 is .go.buildinfo
	0x00000000004c2020 - 0x00000000004c2c08 is .noptrdata
	0x00000000004c2c20 - 0x00000000004c4ab0 is .data
	0x00000000004c4ac0 - 0x00000000004dff30 is .bss
	0x00000000004dff40 - 0x00000000004e2668 is .noptrbss
	0x0000000000400f9c - 0x0000000000401000 is .note.go.buildid
(gdb) b *0x44d730
Note: breakpoint 1 also set at pc 0x44d730.
Breakpoint 2 at 0x44d730: file /home/app/go/src/runtime/rt0_linux_amd64.s, line 8.

There it is: the real entry point of the go program. Next, let's step through how a go process initializes at startup.

Initialization

The assembly file name from the breakpoint leads us to the corresponding source:

#include "textflag.h"

TEXT _rt0_amd64_linux(SB),NOSPLIT,$-8
	JMP	_rt0_amd64(SB)

TEXT _rt0_amd64_linux_lib(SB),NOSPLIT,$0
	JMP	_rt0_amd64_lib(SB)

It jumps unconditionally to _rt0_amd64(SB). Set another gdb breakpoint to find where that lands:

(gdb) b _rt0_amd64
Breakpoint 8 at 0x449d60: file /home/app/go/src/runtime/asm_amd64.s, line 15.

And the corresponding source:

// _rt0_amd64 is common startup code for most amd64 systems when using
// internal linking. This is the entry point for the program from the
// kernel for an ordinary -buildmode=exe program. The stack holds the
// number of arguments and the C-style argv.
TEXT _rt0_amd64(SB),NOSPLIT,$-8
	MOVQ	0(SP), DI	// argc
	LEAQ	8(SP), SI	// argv
	JMP	runtime·rt0_go(SB)

Another unconditional jump, this time to runtime·rt0_go(SB). Same trick:

(gdb) b runtime.rt0_go
Breakpoint 3 at 0x449d70: file /home/app/go/src/runtime/asm_amd64.s, line 89.


TEXT runtime·rt0_go(SB),NOSPLIT,$0
... ...
//at startup there is always one thread running (the main thread);
//save the current stack and resources in g0,
//and the thread itself in m0
// set the per-goroutine and per-mach "registers"
	get_tls(BX)
	LEAQ	runtime·g0(SB), CX
	MOVQ	CX, g(BX)
	LEAQ	runtime·m0(SB), AX
	//bind m0 and g0 to each other
	// save m->g0 = g0
	MOVQ	CX, m_g0(AX)
	// save m0 to g0->m
	MOVQ	AX, g_m(CX)

	CLD				// convention is D is always left cleared
	CALL	runtime·check(SB)

	MOVL	16(SP), AX		// copy argc
	MOVL	AX, 0(SP)
	MOVQ	24(SP), AX		// copy argv
	MOVQ	AX, 8(SP)
	//process args
	CALL	runtime·args(SB)
	//OS init (os_linux.go): essentially just determines the number of CPUs
	CALL	runtime·osinit(SB)
	//scheduler init (proc.go)
	CALL	runtime·schedinit(SB)

	//create a new goroutine, then start the program
	// create a new goroutine to start program
	MOVQ	$runtime·mainPC(SB), AX		// entry
	PUSHQ	AX
	PUSHQ	$0			// arg size
	CALL	runtime·newproc(SB)
	POPQ	AX
	POPQ	AX
	//start this thread and kick off the scheduler
	// start this M
	CALL	runtime·mstart(SB)

	CALL	runtime·abort(SB)	// mstart should never return
	RET

	// Prevent dead-code elimination of debugCallV1, which is
	// intended to be called by debuggers.
	MOVQ	$runtime·debugCallV1(SB), AX
	RET

DATA	runtime·mainPC+0(SB)/8,$runtime·main(SB)
GLOBL	runtime·mainPC(SB),RODATA,$8

The initialization in asm_amd64.s is actually quite involved; here we'll only look at the steps we care about:

  • command-line argument handling
  • OS initialization
  • scheduler initialization

Command-line argument handling

(gdb) b runtime.args
Breakpoint 4 at 0x432b60: file /home/app/go/src/runtime/runtime1.go, line 60.

func args(c int32, v **byte) {
	argc = c
	argv = v
	sysargs(c, v)
}

The args function simply stashes argc and argv for later use.
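A hedged aside: the argv saved here is what eventually backs os.Args, after goargs() (called from schedinit, below) copies it into a Go slice:

package main

import (
	"fmt"
	"os"
)

func main() {
	// os.Args is ultimately derived from the argc/argv that runtime.args saved.
	fmt.Println("program:", os.Args[0])
	fmt.Println("arguments:", os.Args[1:])
}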

OS initialization

runtime.osinit essentially does one thing: determine the number of CPU cores (plus, on Linux, the huge page size).

(gdb) b runtime.osinit
Breakpoint 5 at 0x423030: file /home/app/go/src/runtime/os_linux.go, line 289.

func osinit() {
	ncpu = getproccount()
	physHugePageSize = getHugePageSize()
}
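That ncpu value surfaces to user code as runtime.NumCPU, and it is the default number of Ps unless GOMAXPROCS overrides it. A quick sketch:

package main

import (
	"fmt"
	"runtime"
)

func main() {
	fmt.Println("NumCPU:", runtime.NumCPU()) // the CPU count captured at startup
	// GOMAXPROCS(0) only reads the current value without changing it.
	fmt.Println("GOMAXPROCS:", runtime.GOMAXPROCS(0))
}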

Scheduler initialization

The comment above schedinit() already sketches the bootstrap sequence, and most of the runtime-environment setup we care about is driven from here.

(gdb) b runtime.schedinit
Breakpoint 6 at 0x427690: file /home/app/go/src/runtime/proc.go, line 529.
// The bootstrap sequence is:
//
//	call osinit
//	call schedinit
//	make & queue new G
//	call runtime·mstart
//
// The new G calls runtime·main.
func schedinit() {
	// raceinit must be the first call to race detector.
	// In particular, it must be done before mallocinit below calls racemapshadow.
	_g_ := getg()
	if raceenabled {
		_g_.racectx, raceprocctx0 = raceinit()
	}
	//cap on the number of OS threads
	sched.maxmcount = 10000

	tracebackinit()
	moduledataverify()
	//stack allocator initialization
	stackinit()
	//memory allocator initialization
	mallocinit()
	//scheduler bookkeeping for the current m
	mcommoninit(_g_.m)
	cpuinit()       // must run before alginit
	alginit()       // maps must not be used before this call
	modulesinit()   // provides activeModules
	typelinksinit() // uses maps, activeModules
	itabsinit()     // uses activeModules

	msigsave(_g_.m)
	initSigmask = _g_.m.sigmask
	//process command-line arguments and environment variables
	goargs()
	goenvs()
	//parse GODEBUG / GOTRACEBACK debug-related environment settings
	parsedebugvars()
	//garbage collector initialization
	gcinit()

	sched.lastpoll = uint64(nanotime())
	//determine the number of Ps from the CPU count and the GOMAXPROCS env var
	procs := ncpu //defaults to the number of CPUs
	if n, ok := atoi32(gogetenv("GOMAXPROCS")); ok && n > 0 {
		procs = n
	}
	//resize the set of Ps
	if procresize(procs) != nil {
		throw("unknown runnable goroutine during bootstrap")
	}

	// For cgocheck > 1, we turn on the write barrier at all times
	// and check all pointer writes. We can't do this until after
	// procresize because the write barrier needs a P.
	if debug.cgocheck > 1 {
		writeBarrier.cgo = true
		writeBarrier.enabled = true
		for _, p := range allp {
			p.wbBuf.reset()
		}
	}

	if buildVersion == "" {
		// Condition should never trigger. This code just serves
		// to ensure runtime·buildVersion is kept in the resulting binary.
		buildVersion = "unknown"
	}
	if len(modinfo) == 1 {
		// Condition should never trigger. This code just serves
		// to ensure runtime·modinfo is kept in the resulting binary.
		modinfo = ""
	}
}
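procresize is also what runs when user code changes the P count later: a hedged sketch (the value 4 is arbitrary):

package main

import (
	"fmt"
	"runtime"
)

func main() {
	// A positive argument stops the world, resizes the set of Ps via
	// procresize, and restarts the world; it returns the previous value.
	prev := runtime.GOMAXPROCS(4)
	fmt.Println("was:", prev, "now:", runtime.GOMAXPROCS(0))
}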

Per the comments above, go process startup roughly breaks down into:

  1. call osinit: runtime.osinit() determines the number of CPUs

  2. call schedinit: runtime.schedinit() initializes the scheduler, creates the Ps, and binds m0 to one of them

  3. make & queue new G: runtime.newproc creates a new goroutine (the main goroutine) whose task function is runtime.main, and puts it on the local queue of the P bound to m0 (see the sketch after this list; every go statement takes the same path)

  4. call runtime·mstart: runtime.mstart starts the m, which can then pull the runtime.main task from its P's local queue and schedule it
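Step 3 is not special to the bootstrap: every go statement in user code compiles down to a runtime.newproc call that creates a g and queues it, exactly as the assembly queued runtime.main. A minimal sketch:

package main

import "fmt"

func main() {
	done := make(chan struct{})
	// This go statement compiles to a call to runtime.newproc: a new g
	// is created and put on a run queue for the scheduler to pick up.
	go func() {
		fmt.Println("hello from a freshly queued goroutine")
		close(done)
	}()
	<-done // wait, so main doesn't exit before the goroutine runs
}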

Starting the scheduler

So how does CALL runtime·mstart(SB) start the m and kick off scheduling? The source hides no secrets:

(gdb) b runtime.mstart
Breakpoint 9 at 0x429150: file /home/app/go/src/runtime/proc.go, line 1146.

A look at the source:

// mstart is the entry-point for new Ms.
//
// This must not split the stack because we may not even have stack
// bounds set up yet.
//
// May run during STW (because it doesn't have a P yet), so write
// barriers are not allowed.
//
//go:nosplit
//go:nowritebarrierrec
func mstart() {
	//get g0
	_g_ := getg()
	... ...
	//the real work happens in mstart1()
	mstart1()
 	... ...
}

func mstart1() {
	//get g0
	_g_ := getg()
	//make sure this g is the system-stack g0: scheduling only runs on g0
	if _g_ != _g_.m.g0 {
		throw("bad runtime·mstart")
	}

	// Record the caller for use as the top of stack in mcall and
	// for terminating the thread.
	// We're never coming back to mstart1 after we call schedule,
	// so other calls can reuse the current frame.
	save(getcallerpc(), getcallersp())
	asminit()
	//initialize the m: mainly sets the thread's alternate signal stack and signal mask (not explored in depth here)
	minit()

	// Install signal handlers; after minit so that minit can
	// prepare the thread to be able to handle the signals.
	//if the m bound to _g_ is m0, run mstartm0()
	if _g_.m == &m0 {
		//the initial m needs some special handling, mainly installing the signal handlers
		mstartm0()
	}

	// if the m has a start function, run it; sysmon is one example
	// m0 has no mstartfn
	if fn := _g_.m.mstartfn; fn != nil {
		fn()
	}

	if _g_.m != &m0 {//any m other than m0 must bind a P here
		//bind the P
		acquirep(_g_.m.nextp.ptr())
		_g_.m.nextp = 0
	}
	// enter the scheduler; this never returns
	schedule()
}

mstart() merely calls mstart1(), which does the scheduler's pre-flight setup:

  • Call getg() to fetch the current g; if it is not g0, throw, because the scheduler only runs on g0
  • Initialize the m, mainly setting the thread's alternate signal stack and signal mask (not explored in depth here)
  • If the g's m is m0, do some extra setup via mstartm0(), chiefly installing the signal handlers
  • If the m has an initial task function (mstartfn), run it

m0 is the first thread started by the process. It is no different from an ordinary m, except that m0 is set up by the assembly bootstrap while ordinary m's are created by the runtime itself; a golang process has exactly one m0.

g0: every m has a g0, because every m has a system stack. g0 has the same structure as an ordinary g; the difference is that g0's stack is allocated by the system (for m0 on Linux it is the thread stack, 8MB by default) and can neither grow nor shrink, while an ordinary g starts with just a 2KB stack that can grow. g0 carries no task function and no normal state, and it cannot be preempted by the scheduler, because the scheduler itself runs on g0.
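A small sketch of the 2KB-but-growable property of ordinary g stacks: deep recursion just works, because the runtime grows the stack on demand (the depth and per-frame padding are arbitrary illustration choices):

package main

import "fmt"

// depth recurses n levels; each frame pins some stack, so the goroutine's
// stack must grow far past its initial 2KB.
func depth(n int) int {
	var pad [128]byte
	pad[0] = byte(n)
	if n == 0 {
		return int(pad[0])
	}
	return depth(n-1) + 1
}

func main() {
	fmt.Println(depth(100000)) // ~100k frames: fine on a growable goroutine stack
}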

The actual scheduling logic is in schedule():

// One round of scheduler: find a runnable goroutine and execute it.
// Never returns.
func schedule() {
	_g_ := getg()

	if _g_.m.locks != 0 {
		throw("schedule: holding locks")
	}

	if _g_.m.lockedg != 0 {
		stoplockedm()
		execute(_g_.m.lockedg.ptr(), false) // Never returns.
	}

	// We should not schedule away from a g that is executing a cgo call,
	// since the cgo call is using the m's g0 stack.
	if _g_.m.incgo {
		throw("schedule: in cgo")
	}

top:
	//if GC needs a stop-the-world, park the current m via gcstopm()
	if sched.gcwaiting != 0 {
		gcstopm()
		//once the STW ends, loop back to top
		goto top
	}
	if _g_.m.p.ptr().runSafePointFn != 0 {
		runSafePointFn()
	}

	var gp *g
	var inheritTime bool

	// Normal goroutines will check for need to wakeP in ready,
	// but GCworkers and tracereaders will not, so the check must
	// be done here instead.
	tryWakeP := false
	if trace.enabled || trace.shutdown {
		gp = traceReader()
		if gp != nil {
			casgstatus(gp, _Gwaiting, _Grunnable)
			traceGoUnpark(gp, 0)
			tryWakeP = true
		}
	}
	if gp == nil && gcBlackenEnabled != 0 {
		gp = gcController.findRunnableGCWorker(_g_.m.p.ptr())
		tryWakeP = tryWakeP || gp != nil
	}
	if gp == nil {
		// Check the global runnable queue once in a while to ensure fairness.
		// Otherwise two goroutines can completely occupy the local runqueue
		// by constantly respawning each other.
		//every 61st schedule tick, pull a goroutine from the global queue
		if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0 {
			lock(&sched.lock)
			gp = globrunqget(_g_.m.p.ptr(), 1)
			unlock(&sched.lock)
		}
	}
	if gp == nil {
		//pull a goroutine from the P's local queue
		gp, inheritTime = runqget(_g_.m.p.ptr())
		if gp != nil && _g_.m.spinning {
			throw("schedule: spinning with local work")
		}
	}
	if gp == nil {
		//findrunnable() tries everything to find a goroutine and won't return until it does
		gp, inheritTime = findrunnable() // blocks until work is available
	}

	// This thread is going to run a goroutine and is not spinning anymore,
	// so if it was marked as spinning we need to reset it now and potentially
	// start a new spinning M.
	if _g_.m.spinning {
		resetspinning()
	}

	if sched.disable.user && !schedEnabled(gp) {
		// Scheduling of this goroutine is disabled. Put it on
		// the list of pending runnable goroutines for when we
		// re-enable user scheduling and look again.
		lock(&sched.lock)
		if schedEnabled(gp) {
			// Something re-enabled scheduling while we
			// were acquiring the lock.
			unlock(&sched.lock)
		} else {
			sched.disable.runnable.pushBack(gp)
			sched.disable.n++
			unlock(&sched.lock)
			goto top
		}
	}

	// If about to schedule a not-normal goroutine (a GCworker or tracereader),
	// wake a P if there is one.
	if tryWakeP {
		if atomic.Load(&sched.npidle) != 0 && atomic.Load(&sched.nmspinning) == 0 {
			wakep()
		}
	}
	if gp.lockedm != 0 {
		// Hands off own p to the locked m,
		// then blocks waiting for a new p.
		startlockedm(gp)
		goto top
	}
	//found a goroutine: execute its task function
	execute(gp, inheritTime)
}

How the scheduler finds a goroutine

Once a thread starts, it needs a runnable goroutine to execute. Roughly:

  1. Every 61st schedule tick, pull from the global queue, so goroutines there don't starve.

  2. If the global queue yields nothing, pull from the local queue of the P bound to this m.

  3. If the local queue is also empty, call findrunnable(), which does not return until it finds work.

Fetching from the global queue

if _g_.m.p.ptr().schedtick%61 == 0 && sched.runqsize > 0: every 61st tick, globrunqget() pulls from the global queue. The logic is straightforward:

// Try get a batch of G's from the global runnable queue.
// Sched must be locked.
func globrunqget(_p_ *p, max int32) *g {
	if sched.runqsize == 0 {
		return nil
	}

	n := sched.runqsize/gomaxprocs + 1
	if n > sched.runqsize {
		n = sched.runqsize
	}
	if max > 0 && n > max {
		n = max
	}
	if n > int32(len(_p_.runq))/2 {
		n = int32(len(_p_.runq)) / 2
	}

	sched.runqsize -= n

	gp := sched.runq.pop()
	n--
	for ; n > 0; n-- {
		gp1 := sched.runq.pop()
		runqput(_p_, gp1, false)
	}
	return gp
}

Fetching from the local queue

If the global queue yields no task, try the local queue.

// Get g from local runnable queue.
// If inheritTime is true, gp should inherit the remaining time in the
// current time slice. Otherwise, it should start a new time slice.
// Executed only by the owner P.
func runqget(_p_ *p) (gp *g, inheritTime bool) {
	// If there's a runnext, it's the next G to run.
	for {
		next := _p_.runnext
		if next == 0 {
			break
		}
		if _p_.runnext.cas(next, 0) {
			return next.ptr(), true
		}
	}

	for {
		h := atomic.LoadAcq(&_p_.runqhead) // load-acquire, synchronize with other consumers
		t := _p_.runqtail
		if t == h {
			return nil, false
		}
		gp := _p_.runq[h%uint32(len(_p_.runq))].ptr()
		if atomic.CasRel(&_p_.runqhead, h, h+1) { // cas-release, commits consume
			return gp, false
		}
	}
}

findrunnable

If the local queue comes up empty too, findrunnable() goes hunting for a g; if it truly cannot find one, the m is put to sleep to wait for a wakeup.

// Finds a runnable goroutine to execute.
// Tries to steal from other P's, get g from global queue, poll network.
func findrunnable() (gp *g, inheritTime bool) {
	_g_ := getg()

	// The conditions here and in handoffp must agree: if
	// findrunnable would return a G to run, handoffp must start
	// an M.

top:
	_p_ := _g_.m.p.ptr()
	if sched.gcwaiting != 0 {
		gcstopm()
		goto top
	}
	if _p_.runSafePointFn != 0 {
		runSafePointFn()
	}
	if fingwait && fingwake {
		if gp := wakefing(); gp != nil {
			ready(gp, 0, true)
		}
	}
	if *cgo_yield != nil {
		asmcgocall(*cgo_yield, nil)
	}

	// local runq
	if gp, inheritTime := runqget(_p_); gp != nil {
		return gp, inheritTime
	}

	// global runq
	if sched.runqsize != 0 {
		lock(&sched.lock)
		gp := globrunqget(_p_, 0)
		unlock(&sched.lock)
		if gp != nil {
			return gp, false
		}
	}

	// Poll network.
	// This netpoll is only an optimization before we resort to stealing.
	// We can safely skip it if there are no waiters or a thread is blocked
	// in netpoll already. If there is any kind of logical race with that
	// blocked thread (e.g. it has already returned from netpoll, but does
	// not set lastpoll yet), this thread will do blocking netpoll below
	// anyway.
	if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Load64(&sched.lastpoll) != 0 {
		if list := netpoll(false); !list.empty() { // non-blocking
			gp := list.pop()
			injectglist(&list)
			casgstatus(gp, _Gwaiting, _Grunnable)
			if trace.enabled {
				traceGoUnpark(gp, 0)
			}
			return gp, false
		}
	}

	// Steal work from other P's.
	procs := uint32(gomaxprocs)
	if atomic.Load(&sched.npidle) == procs-1 {
		// Either GOMAXPROCS=1 or everybody, except for us, is idle already.
		// New work can appear from returning syscall/cgocall, network or timers.
		// Neither of that submits to local run queues, so no point in stealing.
		goto stop
	}
	// If number of spinning M's >= number of busy P's, block.
	// This is necessary to prevent excessive CPU consumption
	// when GOMAXPROCS>>1 but the program parallelism is low.
	if !_g_.m.spinning && 2*atomic.Load(&sched.nmspinning) >= procs-atomic.Load(&sched.npidle) {
		goto stop
	}
	if !_g_.m.spinning {
		_g_.m.spinning = true
		atomic.Xadd(&sched.nmspinning, 1)
	}
	for i := 0; i < 4; i++ {
		for enum := stealOrder.start(fastrand()); !enum.done(); enum.next() {
			if sched.gcwaiting != 0 {
				goto top
			}
			stealRunNextG := i > 2 // first look for ready queues with more than 1 g
			if gp := runqsteal(_p_, allp[enum.position()], stealRunNextG); gp != nil {
				return gp, false
			}
		}
	}

stop:

	// We have nothing to do. If we're in the GC mark phase, can
	// safely scan and blacken objects, and have work to do, run
	// idle-time marking rather than give up the P.
	if gcBlackenEnabled != 0 && _p_.gcBgMarkWorker != 0 && gcMarkWorkAvailable(_p_) {
		_p_.gcMarkWorkerMode = gcMarkWorkerIdleMode
		gp := _p_.gcBgMarkWorker.ptr()
		casgstatus(gp, _Gwaiting, _Grunnable)
		if trace.enabled {
			traceGoUnpark(gp, 0)
		}
		return gp, false
	}

	// wasm only:
	// If a callback returned and no other goroutine is awake,
	// then pause execution until a callback was triggered.
	if beforeIdle() {
		// At least one goroutine got woken.
		goto top
	}

	// Before we drop our P, make a snapshot of the allp slice,
	// which can change underfoot once we no longer block
	// safe-points. We don't need to snapshot the contents because
	// everything up to cap(allp) is immutable.
	allpSnapshot := allp

	// return P and block
	lock(&sched.lock)
	if sched.gcwaiting != 0 || _p_.runSafePointFn != 0 {
		unlock(&sched.lock)
		goto top
	}
	if sched.runqsize != 0 {
		gp := globrunqget(_p_, 0)
		unlock(&sched.lock)
		return gp, false
	}
	if releasep() != _p_ {
		throw("findrunnable: wrong p")
	}
	pidleput(_p_)
	unlock(&sched.lock)

	// Delicate dance: thread transitions from spinning to non-spinning state,
	// potentially concurrently with submission of new goroutines. We must
	// drop nmspinning first and then check all per-P queues again (with
	// #StoreLoad memory barrier in between). If we do it the other way around,
	// another thread can submit a goroutine after we've checked all run queues
	// but before we drop nmspinning; as the result nobody will unpark a thread
	// to run the goroutine.
	// If we discover new work below, we need to restore m.spinning as a signal
	// for resetspinning to unpark a new worker thread (because there can be more
	// than one starving goroutine). However, if after discovering new work
	// we also observe no idle Ps, it is OK to just park the current thread:
	// the system is fully loaded so no spinning threads are required.
	// Also see "Worker thread parking/unparking" comment at the top of the file.
	wasSpinning := _g_.m.spinning
	if _g_.m.spinning {
		_g_.m.spinning = false
		if int32(atomic.Xadd(&sched.nmspinning, -1)) < 0 {
			throw("findrunnable: negative nmspinning")
		}
	}

	// check all runqueues once again
	for _, _p_ := range allpSnapshot {
		if !runqempty(_p_) {
			lock(&sched.lock)
			_p_ = pidleget()
			unlock(&sched.lock)
			if _p_ != nil {
				acquirep(_p_)
				if wasSpinning {
					_g_.m.spinning = true
					atomic.Xadd(&sched.nmspinning, 1)
				}
				goto top
			}
			break
		}
	}

	// Check for idle-priority GC work again.
	if gcBlackenEnabled != 0 && gcMarkWorkAvailable(nil) {
		lock(&sched.lock)
		_p_ = pidleget()
		if _p_ != nil && _p_.gcBgMarkWorker == 0 {
			pidleput(_p_)
			_p_ = nil
		}
		unlock(&sched.lock)
		if _p_ != nil {
			acquirep(_p_)
			if wasSpinning {
				_g_.m.spinning = true
				atomic.Xadd(&sched.nmspinning, 1)
			}
			// Go back to idle GC check.
			goto stop
		}
	}

	// poll network
	if netpollinited() && atomic.Load(&netpollWaiters) > 0 && atomic.Xchg64(&sched.lastpoll, 0) != 0 {
		if _g_.m.p != 0 {
			throw("findrunnable: netpoll with p")
		}
		if _g_.m.spinning {
			throw("findrunnable: netpoll with spinning")
		}
		list := netpoll(true) // block until new work is available
		atomic.Store64(&sched.lastpoll, uint64(nanotime()))
		if !list.empty() {
			lock(&sched.lock)
			_p_ = pidleget()
			unlock(&sched.lock)
			if _p_ != nil {
				acquirep(_p_)
				gp := list.pop()
				injectglist(&list)
				casgstatus(gp, _Gwaiting, _Grunnable)
				if trace.enabled {
					traceGoUnpark(gp, 0)
				}
				return gp, false
			}
			injectglist(&list)
		}
	}
	stopm()
	goto top
}

Now think back to schedinit(): we already queued runtime.main as the initial task on the local queue of the P bound to m0. So when runqget pulls from that local queue, it necessarily comes back with runtime.main. Let's follow the trail.

The main goroutine's task

//asm_amd64.s
... ...
MOVQ	$runtime·mainPC(SB), AX		// entry
... ...

DATA	runtime·mainPC+0(SB)/8,$runtime·main(SB)
GLOBL	runtime·mainPC(SB),RODATA,$8

(gdb) b runtime.main
Breakpoint 7 at 0x426470: file /home/app/go/src/runtime/proc.go, line 113.

From the above we can see that the first goroutine go starts has runtime.main as its task function:

// The main goroutine.
func main() {
	g := getg() //get the main goroutine

	...

	if GOARCH != "wasm" { // no threads on wasm yet, so no sysmon
    //run sysmon on the system stack
		systemstack(func() {
      //allocate a new m to run sysmon, the background monitor that drives periodic GC and scheduling preemption
			newm(sysmon, nil)
		})
	}
 /*Lock the main goroutine onto the main OS thread during initialization.
  Most programs won't care, but a few do require certain calls to happen on the main thread;
  those can arrange for main.main to run there by calling runtime.LockOSThread during initialization to keep the lock.
  */
	lockOSThread()
	//make sure we are on the main thread
	if g.m != &m0 {
		throw("runtime.main not on m0")
	}
	//run the runtime's internal init functions, generated by the compiler
	doInit(&runtime_inittask) // must be before defer
  ...

	// Defer unlock so that runtime.Goexit during init does the unlock too.
	needUnlock := true
	defer func() {
		if needUnlock {
			unlockOSThread()
		}
	}()

  ...
  //enable GC: start a goroutine to do the GC sweeping
	gcenable()

	...
	//run the init tasks generated by the compiler, including every user-defined init function
	doInit(&main_inittask)

	close(main_init_done)

	needUnlock = false
	unlockOSThread()

	if isarchive || islibrary {
		// A program compiled with -buildmode=c-archive or c-shared
		// has a main, but it is not executed.
		return
	}
  //finally call the user-written main function in package main
	fn := main_main // make an indirect call, as the linker doesn't know the address of the main package when laying down the runtime
	fn()
	...
  //exit the process
	exit(0)

	for {
		var x *int32
		*x = 0
	}
}

At this point we have finally reached the place where our hand-written

package main

func main() {
	println("hello world")
}

main function gets called. Before that call, a few other things happen:

  • Create a new thread to run sysmon, which drives periodic GC and scheduling preemption
  • Check that we are running on the main thread (m0)
  • Run the runtime's init functions
  • Start a goroutine for GC sweeping (gcenable)
  • Run main_inittask: the compiler-generated init tasks, including all user-defined init functions (a sketch of the ordering follows this list)
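A tiny sketch of that ordering: any init function in package main is collected into main_inittask by the compiler and executed by doInit inside runtime.main, strictly before main.main runs:

package main

import "fmt"

// Collected into main_inittask and run by doInit in runtime.main.
func init() {
	fmt.Println("init: runs inside runtime.main's doInit")
}

func main() {
	fmt.Println("main: runs afterwards, via main_main")
}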

There's a little easter egg left at the end of func main():

for {
	var x *int32
	*x = 0
}

Anyone know what it's for? Share your thoughts in the comments 💡