Understanding the Mutex Source Code and Its Evolution

Introduction

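On versions: I chose to study the Go versions from after the sync/atomic package was introduced, because their code is more concise and clear.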

12b7875b (base version)

Source walkthrough

package sync

import (
	"runtime"
	"sync/atomic"
)

// A Mutex is a mutual exclusion lock.
// Mutexes can be created as part of other structures;
// the zero value for a Mutex is an unlocked mutex.
type Mutex struct {
	key  int32
	sema uint32
}

// A Locker represents an object that can be locked and unlocked.
type Locker interface {
	Lock()
	Unlock()
}

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
	if atomic.AddInt32(&m.key, 1) == 1 {
		// changed from 0 to 1; we hold lock
		return
	}
	runtime.Semacquire(&m.sema)
}

// Unlock unlocks m.
// It is a run-time error if m is not locked on entry to Unlock.
//
// A locked Mutex is not associated with a particular goroutine.
// It is allowed for one goroutine to lock a Mutex and then
// arrange for another goroutine to unlock it.
func (m *Mutex) Unlock() {
	switch v := atomic.AddInt32(&m.key, -1); {
	case v == 0:
		// changed from 1 to 0; no contention
		return
	case v == -1:
		// changed from 0 to -1: wasn't locked
		// (or there are 4 billion goroutines waiting)
		panic("sync: unlock of unlocked mutex")
	}
	runtime.Semrelease(&m.sema)
}

In commit 12b7875b, Mutex uses the sync/atomic package for the first time. With CAS and atomic add (XADD) wrapped by that package, the code reads smoothly and clearly.

In this early version, Mutex is implemented purely as a fair lock. The Mutex struct has two fields: the counter key and the semaphore sema.

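Suppose the goroutine currently acquiring the lock is goA. In Lock, goA atomically increments the counter key by 1. The counter's value has three meanings: 0 means unlocked, 1 means locked with nobody waiting, and a value n > 1 means the lock is held and n - 1 other goroutines are waiting for it. If the value after the increment is greater than 1, goA failed to grab the lock, so it calls runtime.Semacquire(&m.sema) to join the wait queue and sleep on the semaphore sema until it is woken.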

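runtime.Semacquire(&m.sema) appends goA to the semaphore's wait queue and keeps goA blocked inside Lock. Once goA is woken, it returns from Lock, which means goA now holds the lock: Unlock hands ownership directly to the goroutine it wakes.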

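When goA has finished its business logic, it calls Unlock, which atomically decrements key by 1. If the result is 0, there was no contention and Unlock simply returns. Note the boundary condition here: key can never legitimately be negative, because there cannot be -1 goroutines waiting. A result of -1 has two possible causes: either a mutex that was not locked is being unlocked, or so many goroutines are contending (about 4 billion, as the source comment says) that the int32 counter has wrapped all the way around. In other words, in this version the number of waiting goroutines is bounded by the range of the int32 counter. Hitting this boundary results in a panic. If the decremented value is still positive, goA calls runtime.Semrelease to wake the goroutine at the head of the wait queue, which closes the Lock/Unlock loop.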
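To make the counter-plus-semaphore idea concrete, here is a minimal sketch that mirrors the 12b7875b Lock/Unlock logic line by line. It assumes a buffered channel can stand in for runtime.Semacquire/Semrelease, which are internal to the runtime and not callable from user code; baseMutex, newBaseMutex and hammer are hypothetical names, and unlike sync.Mutex the zero value is not usable because the channel has to be allocated.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// baseMutex is a hypothetical re-creation of the 12b7875b design:
// key counts the lock holder plus waiters, sema hands the lock over.
// ASSUMPTION: a channel of capacity 1 replaces the runtime semaphore.
// At most one hand-off token is pending at a time, because the next
// legitimate Unlock can only happen after a waiter has consumed the
// previous token and entered the critical section.
type baseMutex struct {
	key  int32
	sema chan struct{}
}

func newBaseMutex() *baseMutex {
	return &baseMutex{sema: make(chan struct{}, 1)}
}

func (m *baseMutex) Lock() {
	if atomic.AddInt32(&m.key, 1) == 1 {
		return // changed from 0 to 1: we hold the lock
	}
	<-m.sema // sleep until an Unlock hands the lock to us
}

func (m *baseMutex) Unlock() {
	switch v := atomic.AddInt32(&m.key, -1); {
	case v == 0:
		return // changed from 1 to 0: nobody is waiting
	case v == -1:
		panic("unlock of unlocked mutex")
	}
	m.sema <- struct{}{} // wake one waiter; ownership passes to it directly
}

// hammer is a hypothetical stress helper: it increments a shared counter
// under the lock from many goroutines and returns the final value.
func hammer(l sync.Locker, goroutines, iters int) int {
	counter := 0
	var wg sync.WaitGroup
	for i := 0; i < goroutines; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := 0; j < iters; j++ {
				l.Lock()
				counter++
				l.Unlock()
			}
		}()
	}
	wg.Wait()
	return counter
}

func main() {
	fmt.Println(hammer(newBaseMutex(), 100, 1000)) // prints 100000
}
```

Every contended Lock in this sketch ends in a channel receive, i.e. the goroutine sleeps and must be rescheduled, which is where the per-waiter wake-up cost of this design comes from.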

Analysis

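The Mutex in commit 12b7875b is very simple, crude even. In one sentence: it is built on atomic operations on a counter, with the counter's value deciding whether to wait on or wake the queue.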

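Pros: simplicity. It implements a fair lock with the fewest possible variables. Simplicity also leaves plenty of room for later iterations; had the first fair lock been built in an extremely complex way, every later iteration would have been extremely hard as well.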

Cons:

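With this crude fair-lock implementation, goroutines in the wait queue may starve.
A goroutine that enters the wait queue gives up its CPU. Once woken, it needs one context switch before it can run again, so every goroutine that has to wait in the queue costs one context switch. Even though goroutine context switches happen entirely in user space, this overhead is not negligible.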

dd2074c8

Source walkthrough

// Copyright 2009 The Go Authors. All rights reserved.
// Use of this source code is governed by a BSD-style
// license that can be found in the LICENSE file.

// Package sync provides basic synchronization primitives such as mutual
// exclusion locks.  Other than the Once and WaitGroup types, most are intended
// for use by low-level library routines.  Higher-level synchronization is
// better done via channels and communication.
package sync

import (
	"runtime"
	"sync/atomic"
)

// A Mutex is a mutual exclusion lock.
// Mutexes can be created as part of other structures;
// the zero value for a Mutex is an unlocked mutex.
type Mutex struct {
	state int32
	sema  uint32
}

// A Locker represents an object that can be locked and unlocked.
type Locker interface {
	Lock()
	Unlock()
}

const (
	mutexLocked = 1 << iota // mutex is locked
	mutexWoken
	mutexWaiterShift = iota
)

// Lock locks m.
// If the lock is already in use, the calling goroutine
// blocks until the mutex is available.
func (m *Mutex) Lock() {
	// Fast path: grab unlocked mutex.
	if atomic.CompareAndSwapInt32(&m.state, 0, mutexLocked) {
		return
	}

	awoke := false
	for {
		old := m.state
		new := old | mutexLocked
		if old&mutexLocked != 0 {
			new = old + 1<<mutexWaiterShift
		}
		if awoke {
			// The goroutine has been woken from sleep,
			// so we need to reset the flag in either case.
			new &^= mutexWoken
		}
		if atomic.CompareAndSwapInt32(&m.state, old, new) {
			if old&mutexLocked == 0 {
				break
			}
			runtime.Semacquire(&m.sema)
			awoke = true
		}
	}
}

// Unlock unlocks m.
// It is a run-time error if m is not locked on entry to Unlock.
//
// A locked Mutex is not associated with a particular goroutine.
// It is allowed for one goroutine to lock a Mutex and then
// arrange for another goroutine to unlock it.
func (m *Mutex) Unlock() {
	// Fast path: drop lock bit.
	new := atomic.AddInt32(&m.state, -mutexLocked)
	if (new+mutexLocked)&mutexLocked == 0 {
		panic("sync: unlock of unlocked mutex")
	}

	old := new
	for {
		// If there are no waiters or a goroutine has already
		// been woken or grabbed the lock, no need to wake anyone.
		if old>>mutexWaiterShift == 0 || old&(mutexLocked|mutexWoken) != 0 {
			return
		}
		// Grab the right to wake someone.
		new = (old - 1<<mutexWaiterShift) | mutexWoken
		if atomic.CompareAndSwapInt32(&m.state, old, new) {
			runtime.Semrelease(&m.sema)
			return
		}
		old = m.state
	}
}

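In commit dd2074c8, the first thing we notice is that the counter field of the Mutex struct has been renamed from key to state, which suggests its meaning has changed from a counter to a state.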

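Next, compared with the previous version, besides the newly defined flags mutexLocked and mutexWoken there is a constant, mutexWaiterShift = iota, that is not a state flag at all: it marks where the wait-queue length is stored in state.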

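With these definitions in mind, we should revise our first impression. Renaming key to state does not turn a counter into a set of state flags; it layers state flags on top of the counter, i.e. state = counter + flags. One variable serving two purposes is a familiar pattern from binary network protocol headers such as TCP's: a single bit slot is a flag, while bits n through n+m hold a field such as a sequence number.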

[Figure: layout of the state field]

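mutexLocked = 1 << iota = 1: flag bit (bit 0) indicating whether the mutex is locked.
mutexWoken = 1 << 1 = 2: flag bit (bit 1) indicating that a goroutine in the wait queue has already been woken.
mutexWaiterShift = iota = 2: not a flag but a shift amount; bits 2 and up (state >> mutexWaiterShift) hold the wait-queue length.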
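The bit layout can be checked with a small decoder. The constants are copied from dd2074c8; describe is a hypothetical helper written for illustration, not part of package sync.

```go
package main

import "fmt"

// Constants copied from commit dd2074c8.
const (
	mutexLocked      = 1 << iota // 1: bit 0, lock held
	mutexWoken                   // 2: bit 1, a waiter has been woken
	mutexWaiterShift = iota      // 2: waiter count lives in bits 2 and up
)

// describe is a hypothetical helper that splits a state word into its parts.
func describe(state int32) (locked, woken bool, waiters int32) {
	return state&mutexLocked != 0, state&mutexWoken != 0, state >> mutexWaiterShift
}

func main() {
	// Locked, woken flag clear, 3 waiters: 3<<2 | 1 = 0b1101 = 13.
	state := int32(3<<mutexWaiterShift | mutexLocked)
	locked, woken, waiters := describe(state)
	fmt.Printf("state=%d (%04b) locked=%v woken=%v waiters=%d\n",
		state, state, locked, woken, waiters)
	// prints: state=13 (1101) locked=true woken=false waiters=3
}
```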

Lock

As before, suppose the goroutine trying to grab the lock is goA, running Lock.

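First, goA tries to CAS state from 0 to mutexLocked. If the CAS succeeds, goA holds the lock and returns immediately. This fast path only works when state is exactly 0, meaning the mutex is unlocked, the woken flag is clear and the wait queue is empty. As long as any goroutine is waiting for the lock, this attempt is bound to fail.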

If the CAS fails, goA enters the slow path, a for loop that keeps retrying:

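First, goA reads the current state into old and computes new by setting the lock bit with a bitwise OR (old | mutexLocked).
If old is already locked, goA must join the wait queue, so new instead increments the queue length: new = old + 1<<mutexWaiterShift = old + 4 = old + 0b100. Adding 4 skips the two bit slots occupied by the woken and locked flags and adds 1 to the wait-queue length.
goA reaches the loop body in two situations: on its first pass after entering Lock, or after being woken from the wait queue. In the latter case it must clear the mutexWoken flag. A small trick is used here: A &^ B (AND NOT) clears in A every bit that is set in B.
goA then tries to CAS state from old to new. Note that a successful CAS does not by itself mean goA holds the lock: it only means goA's update was applied atomically, so mutexLocked is set in state and, depending on the case, the wait-queue length was incremented or mutexWoken was cleared.
So when does goA actually win the lock? Only when old was unlocked. Only then does the mutexLocked bit it wrote mean the lock is goA's.
If old was locked, goA dutifully sleeps in the wait queue via runtime.Semacquire, and after waking it sets the local variable awoke = true so that the next iteration clears the mutexWoken flag.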

[Figure: Lock flowchart]
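The arithmetic of one slow-path iteration can be isolated in a pure function. lockStep is a hypothetical helper that computes the value goA would CAS into state and whether a successful CAS would mean goA owns the lock; it deliberately leaves out the CAS itself and the sleep on the semaphore.

```go
package main

import "fmt"

// Constants copied from commit dd2074c8.
const (
	mutexLocked = 1 << iota
	mutexWoken
	mutexWaiterShift = iota
)

// lockStep is a hypothetical pure helper reproducing the value computation of
// one iteration of the dd2074c8 Lock slow path.
func lockStep(old int32, awoke bool) (next int32, acquired bool) {
	next = old | mutexLocked // try to set the lock bit
	if old&mutexLocked != 0 {
		// Already locked: join the queue by adding 1 to the waiter count,
		// i.e. add 1<<mutexWaiterShift = 4 = 0b100.
		next = old + 1<<mutexWaiterShift
	}
	if awoke {
		next &^= mutexWoken // AND NOT: clear the woken flag we were woken with
	}
	// Only setting the lock bit on an unlocked state means we own the lock.
	return next, old&mutexLocked == 0
}

func main() {
	cases := []struct {
		old   int32
		awoke bool
	}{
		{0b0000, false}, // unlocked, empty queue
		{0b0001, false}, // locked, empty queue
		{0b0110, true},  // unlocked, woken set, 1 waiter; we are the woken one
		{0b0111, true},  // same, but a newcomer grabbed the lock first
	}
	for _, c := range cases {
		next, ok := lockStep(c.old, c.awoke)
		fmt.Printf("old=%04b awoke=%v -> new=%04b acquired=%v\n", c.old, c.awoke, next, ok)
	}
}
```

The last case shows the cost of letting newcomers compete: the woken goroutine finds the lock taken, re-queues itself (waiters go from 1 to 2) and clears mutexWoken so that the next Unlock can wake someone again.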

Unlock

After leaving the critical section, goA releases the lock by calling Unlock.

Because mutexLocked is the lowest bit of state, subtracting mutexLocked from state releases the lock: new := atomic.AddInt32(&m.state, -mutexLocked).
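atomic.AddInt32 returns only the new value, so the source reconstructs the old value as new+mutexLocked to check that the lock bit was really set; the check is equivalent to old&mutexLocked != 0. The hypothetical helper unlockFast reproduces just these two statements without the atomic operation. The second input shows why panicking is the only sensible response: subtracting 1 from an unlocked state borrows from the higher bits and corrupts both the woken flag and the waiter count.

```go
package main

import "fmt"

// Constants copied from commit dd2074c8.
const (
	mutexLocked = 1 << iota
	mutexWoken
	mutexWaiterShift = iota
)

// unlockFast is a hypothetical pure helper mirroring the first two statements
// of the dd2074c8 Unlock: subtract mutexLocked, then check the removed bit.
func unlockFast(state int32) (next int32, wasLocked bool) {
	next = state - mutexLocked
	return next, (next+mutexLocked)&mutexLocked != 0
}

func main() {
	// 0b1101: locked with 3 waiters. 0b1100: NOT locked, 3 waiters.
	for _, s := range []int32{0b1101, 0b1100} {
		next, ok := unlockFast(s)
		fmt.Printf("state=%04b -> new=%04b wasLocked=%v (waiters=%d woken=%v)\n",
			s, next, ok, next>>mutexWaiterShift, next&mutexWoken != 0)
	}
}
```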

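Note that at the instant state is unlocked, goroutines that tried to grab the lock before the unlock have already joined the wait queue (incrementing the wait-queue length), while a goroutine that tries after the unlock can succeed immediately. There are two ways to win the lock: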

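the fast path, when state is 0 (in particular, the wait queue is empty);
the slow path, when the CAS writes mutexLocked over an old value that was unlocked.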

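Compared with the base version, this version increases the lock's throughput. Even when the wait queue is non-empty, any goroutine calling Lock tries to write mutexLocked into state, i.e. tries to grab the lock, instead of blindly joining the wait queue.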

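On the other hand, a goroutine in the wait queue, once woken, must wait to be scheduled onto a CPU before it can re-enter the for loop, whereas a newly arriving goroutine is already running on a CPU and has most likely written mutexLocked by then. So although throughput improves at the mutex level, under heavy contention goroutines in the wait queue are less likely to get the lock.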

Back to Unlock. After clearing the lock bit, goA enters a for loop:

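If the wait queue is empty, or the lock has already been grabbed again, or a waiter has already been woken, there is nobody to wake and Unlock simply returns.
- An empty wait queue means no goroutine needs to be woken, so nothing more needs to be done.
- If mutexLocked is set, another goroutine grabbed the lock right after goA released it, and that goroutine will wake a waiter itself when it unlocks. If mutexWoken is set, a woken waiter is already on its way to compete for the lock, so waking another one would be wasted work.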
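Otherwise, goA tries to decrement the wait-queue length and set mutexWoken in a single CAS. If the CAS succeeds, goA has won the right to wake someone and calls runtime.Semrelease to wake one goroutine from the wait queue. If the CAS fails, state was changed concurrently (for example, a new goroutine grabbed the lock or joined the queue), so goA reloads state and retries; otherwise it could wake a goroutine needlessly or leave the wait-queue length wrong.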

Analysis

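Compared with the base version, this version increases the mutex's throughput by letting newly arriving goroutines compete for the lock directly, but it does not solve the problem of goroutines in the wait queue "starving".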

To be continued...