Go context source code and errgroup

The context concept

We often need a main goroutine to control, coordinate with, or cancel its child goroutines. For example:

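The main goroutine has to wait for all child goroutines to finish: how does a child notify the main goroutine?
A pipeline is used to raise computation throughput: each input passes serially through several subtask functions on the pipeline before a result comes out. If the whole task has not finished after the main goroutine has waited for some time, the input must be discarded. At that moment the task may be in the middle of some subtask, so how do we drop the input before the next subtask starts and stop wasting computation? In other words, how does the main goroutine tell a child to give up?
A network request is usually handled in a new goroutine, which in turn starts more goroutines for time-consuming work. Controlling such nested layers with bare select/chan quickly becomes too complicated.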

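Passing contextual information along is also a hard requirement. When handling an HTTP request, for instance, downstream code may need request-scoped parameters (user information, login device information, and so on), so some function parameter has to carry them downstream.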

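Controlling task timeouts is critical for a service. If a downstream task takes very long, many requests block on it, too many goroutines are alive at the same time, and they consume a large share of server resources, which can make the service unavailable. Context was created to solve these problems. It mainly carries contextual information: cancellation signals, timeouts, deadlines, and key-value pairs.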

The Context interface

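type Context interface {
	// Deadline reports the deadline, if any; ok == false means none is set.
	// Once a deadline is set and reached, the context is canceled automatically.
	Deadline() (deadline time.Time, ok bool)
	// Done returns a channel that is closed when the context is canceled.
	Done() <-chan struct{}
	// Err returns an error explaining why the context was canceled.
	Err() error
	// Value returns the value carried for key; it is safe for use by multiple goroutines.
	Value(key any) any
}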

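Deadline: reports whether a deadline is set and, if so, what it is. When a deadline is set, the Context is canceled automatically at that point in time.
Done: returns a channel; once the channel is closed, the Context has been canceled. If it returns nil, the Context can never be canceled.
Err: returns the reason the Context was canceled. If you only use the Context types from the context package, it returns Canceled (explicitly canceled) or DeadlineExceeded (canceled because of a timeout), and nil while the Context is still active.
Value: you will often see code use this method to read the value stored under a string key in ctx, since a Context can carry many key-value pairs. How are they stored? In a map? No: picture each Context as a node in a tree that holds a pointer to its parent, and a lookup walks from the current node up toward the root, one level at a time.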

emptyCtx

type emptyCtx struct{}

func (emptyCtx) Deadline() (deadline time.Time, ok bool) {
	return
}

func (emptyCtx) Done() <-chan struct{} {
	return nil
}

func (emptyCtx) Err() error {
	return nil
}

func (emptyCtx) Value(key any) any {
	return nil
}

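emptyCtx is the simplest, most basic implementation of the Context interface. It mainly serves as the root of a context tree. The commonly used context.Background and context.TODO both return a value that embeds emptyCtx.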

type backgroundCtx struct{ emptyCtx }

type todoCtx struct{ emptyCtx }

// Background returns a non-nil, empty [Context]. It is never canceled, has no
// values, and has no deadline. It is typically used by the main function,
// initialization, and tests, and as the top-level Context for incoming
// requests.
func Background() Context {
	return backgroundCtx{}
}

// TODO returns a non-nil, empty [Context]. Code should use context.TODO when
// it's unclear which Context to use or it is not yet available (because the
// surrounding function has not yet been extended to accept a Context
// parameter).
func TODO() Context {
	return todoCtx{}
}

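backgroundCtx mainly serves as the root of a context tree; it cannot be canceled and carries no values. todoCtx is a placeholder for when we are writing code and do not yet know, or are not sure, which Context to use.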

cancelCtx

Cancellation is arguably the core feature of this package. A cancelable context is defined as follows:

// A cancelCtx can be canceled. When canceled, it also cancels any children
// that implement canceler.
type cancelCtx struct {
	Context

	mu       sync.Mutex            // protects following fields
	done     atomic.Value          // of chan struct{}, created lazily, closed by first cancel call
	children map[canceler]struct{} // set to nil by the first cancel call
	err      error                 // set to non-nil by the first cancel call
	cause    error                 // set to non-nil by the first cancel call
}

func (c *cancelCtx) Value(key any) any {
	if key == &cancelCtxKey {
		return c
	}
	return value(c.Context, key)
}

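// Done returns the cancellation channel.
// The channel is stored in an atomic value.
// It is created lazily on first use; the lock plus a double check keeps creation thread-safe.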
func (c *cancelCtx) Done() <-chan struct{} {
	d := c.done.Load()
	if d != nil {
		return d.(chan struct{})
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	d = c.done.Load()
	if d == nil {
		d = make(chan struct{})
		c.done.Store(d)
	}
	return d.(chan struct{})
}

func (c *cancelCtx) Err() error {
	c.mu.Lock()
	err := c.err
	c.mu.Unlock()
	return err
}

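It embeds a Context value, which stores the parent Context of this cancelCtx.
done is the channel that carries the cancellation signal; child goroutines watch it to learn whether they should abandon their work.
children stores all cancelable child Contexts derived from this one.
err is set by the first cancellation.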

Next, following the path a user takes, let us look at its two core pieces of code.

// WithCancel returns a copy of parent with a new Done channel. The returned
// context's Done channel is closed when the returned cancel function is called
// or when the parent context's Done channel is closed, whichever happens first.
//
// Canceling this context releases resources associated with it, so code should
// call cancel as soon as the operations running in this Context complete.
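// WithCancel returns a cancelCtx, a context that can be canceled.
// Cancelable contexts are linked to one another in a tree.
// Canceling a context cancels all of its children, but not its parent or its siblings.
// The design is neat: returning a closure exposes the unexported cancel method of cancelCtx to callers.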
func WithCancel(parent Context) (ctx Context, cancel CancelFunc) {
	c := withCancel(parent)
	return c, func() { c.cancel(true, Canceled, nil) }
}

We normally call context.WithCancel with a parent context and get back a cancelable cancelCtx. As the code above shows, the cancelCtx is actually built in withCancel. Let us read on:

// Creating a new context requires a non-nil parent.
func withCancel(parent Context) *cancelCtx {
	if parent == nil {
		panic("cannot create context from nil parent")
	}
	c := &cancelCtx{}
	c.propagateCancel(parent, c)
	return c
}

// propagateCancel arranges for child to be canceled when parent is.
// It sets the parent context of cancelCtx.
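// It registers this cancelCtx by locating its parent context.
// Only contexts of the same kind (cancelCtx) are linked into one tree through the children map.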
func (c *cancelCtx) propagateCancel(parent Context, child canceler) {
	c.Context = parent

	done := parent.Done()
	if done == nil {
		return // parent is never canceled
	}

	// If the parent is already canceled, cancel the child immediately.
	select {
	case <-done:
		// parent is already canceled
		child.cancel(false, parent.Err(), Cause(parent))
		return
	default:
	}

	if p, ok := parentCancelCtx(parent); ok {
		// parent is a *cancelCtx, or derives from one.
		p.mu.Lock()
		if p.err != nil {
			// parent has already been canceled
			child.cancel(false, p.err, p.cause)
		} else {
			if p.children == nil {
				p.children = make(map[canceler]struct{})
			}
			p.children[child] = struct{}{}
		}
		p.mu.Unlock()
		return
	}

	if a, ok := parent.(afterFuncer); ok {
		// parent implements an AfterFunc method.
		c.mu.Lock()
		stop := a.AfterFunc(func() {
			child.cancel(false, parent.Err(), Cause(parent))
		})
		c.Context = stopCtx{
			Context: parent,
			stop:    stop,
		}
		c.mu.Unlock()
		return
	}

	goroutines.Add(1)
	// Start a goroutine that watches for the parent's cancellation.
	go func() {
		select {
		case <-parent.Done():
			child.cancel(false, parent.Err(), Cause(parent))
		case <-child.Done():
		}
	}()
}

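Starting from the parent passed in, propagateCancel looks up the tree for a node of type cancelCtx and adds the new child cancelCtx to that node's children. If no cancelCtx node is found (and the parent does not implement AfterFunc), it starts a new goroutine that waits for the parent to be canceled and then explicitly calls the child's cancel method. Either way, the parent and the child cancelCtx end up organized as a tree.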
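The effect of this tree structure can be observed from the outside. The sketch below, with hypothetical helper names isDone and cancelMiddle, builds a small tree, cancels one node in the middle, and reports which nodes are canceled: the node and its descendant are, while its parent and sibling are not.

```go
package main

import (
	"context"
	"fmt"
)

// isDone reports whether ctx has been canceled, without blocking.
func isDone(ctx context.Context) bool {
	select {
	case <-ctx.Done():
		return true
	default:
		return false
	}
}

// cancelMiddle builds root -> {sibling, parent -> child}, cancels only
// parent, and reports which of the four contexts ended up canceled.
func cancelMiddle() (root, sibling, parent, child bool) {
	rootCtx, cancelRoot := context.WithCancel(context.Background())
	defer cancelRoot()
	siblingCtx, cancelSibling := context.WithCancel(rootCtx)
	defer cancelSibling()
	parentCtx, cancelParent := context.WithCancel(rootCtx)
	childCtx, cancelChild := context.WithCancel(parentCtx)
	defer cancelChild()

	// Cancellation flows down to childCtx only.
	cancelParent()
	return isDone(rootCtx), isDone(siblingCtx), isDone(parentCtx), isDone(childCtx)
}

func main() {
	root, sibling, parent, child := cancelMiddle()
	fmt.Println(root, sibling, parent, child) // false false true true
}
```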

Now from the other direction: how does the cancel function returned by WithCancel cancel the child nodes?

// cancel closes c.done, cancels each of c's children, and, if
// removeFromParent is true, removes c from its parent's children.
// cancel sets c.cause to cause if this is the first time c is canceled.
// It cancels the current context and recursively cancels every child context.
func (c *cancelCtx) cancel(removeFromParent bool, err, cause error) {
	if err == nil {
		panic("context: internal error: missing cancel error")
	}
	if cause == nil {
		cause = err
	}
	c.mu.Lock()
	if c.err != nil {
		c.mu.Unlock()
		return // already canceled
	}
	c.err = err
	c.cause = cause
	d, _ := c.done.Load().(chan struct{})
	if d == nil {
		c.done.Store(closedchan)
	} else {
		close(d)
	}
	for child := range c.children {
		// NOTE: acquiring the child's lock while holding parent's lock.
		child.cancel(false, err, cause)
	}
	c.children = nil
	c.mu.Unlock()

	if removeFromParent {
		removeChild(c.Context, c)
	}
}

context design philosophy and practice

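Cancelation should be advisory. The caller does not know the callee's internals, so it should not interrupt or panic the callee. Instead, the caller tells the callee that the work is no longer needed, and the callee decides how to proceed. Implementation: a receive-only channel carries the cancellation between caller and callee; the caller signals by closing it, and the callee watches the channel to catch the cancellation.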

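Cancelation should be transitive. A cancellation should propagate. Implementation: a Context is safe for concurrent use and can be handed to several callees, and closing a channel is a broadcast; in addition, Contexts store their parent-child relationships, so a cancellation propagates downward automatically.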

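Every long-running or potentially blocking task should take a Context.
Do not store a Context in a struct; pass it as a parameter (there are exceptions: http.Request keeps its Context in the struct).
A function or method that takes a Context should take it as its first parameter.
Never pass a nil Context to a function or method; if you do not know what to pass, use context.TODO.
Use Context values only for data that must travel with the request; do not push everything through them.
Do not store a mutable value in Context.Value and then modify it.
A Context is safe for concurrent use, so it can be passed freely among goroutines.
Make a habit of calling the cancel function of every Context you create, especially ones with timeouts.
A Context should end when its Request ends.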

errgroup

To synchronize several goroutines, for example when the main goroutine has to wait for several downstream tasks to finish, sync.WaitGroup is the usual tool.

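package main

import (
	"fmt"
	"sync"
)

func main() {
	var wg sync.WaitGroup
	for i := 0; i < 4; i++ {
		wg.Add(1)
		// Pass i as an argument so each goroutine prints its own value,
		// even on Go versions before 1.22 where the loop variable is shared.
		go func(i int) {
			defer wg.Done()
			fmt.Println(i)
		}(i)
	}
	wg.Wait() // block until all four goroutines have called Done
	fmt.Println("end")
}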

errgroup builds on sync.WaitGroup to offer better concurrency control.

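It uses sync.WaitGroup to wait for the goroutines to finish.
It uses a buffered channel to limit the number of concurrent goroutines.
When tasks fail, it keeps only the first error and calls the cancel function derived from the context the user passed in.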

Let us look at its implementation in detail:

type token struct{}

type Group struct {
	// cancel function, created with the context package
	cancel func(error)
	wg     sync.WaitGroup
	// limits the number of concurrent goroutines
	sem     chan token
	errOnce sync.Once
	err     error
}

func WithContext(ctx context.Context) (*Group, context.Context) {
	ctx, cancel := withCancelCause(ctx)
	return &Group{cancel: cancel}, ctx
}

That is the core structure of errgroup. In essence it relies on sync.WaitGroup to wait for the goroutines, and it uses a buffered chan to limit how many run concurrently.
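The buffered-channel technique can be tried on its own with only the standard library. The sketch below is not errgroup's code; maxConcurrent is a hypothetical function that runs n short tasks behind a semaphore of the given size and returns the highest number of tasks it observed running at once.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
	"time"
)

type token struct{}

// maxConcurrent runs n tasks with at most limit in flight and returns
// the peak number of tasks that were running at the same time.
func maxConcurrent(n, limit int) int64 {
	sem := make(chan token, limit)
	var wg sync.WaitGroup
	var running, peak int64

	for i := 0; i < n; i++ {
		sem <- token{} // blocks while limit tasks are already in flight
		wg.Add(1)
		go func() {
			defer wg.Done()
			defer func() { <-sem }() // free the slot before wg.Done runs

			cur := atomic.AddInt64(&running, 1)
			for { // record a new peak if cur exceeds the current one
				old := atomic.LoadInt64(&peak)
				if cur <= old || atomic.CompareAndSwapInt64(&peak, old, cur) {
					break
				}
			}
			time.Sleep(time.Millisecond) // simulated work
			atomic.AddInt64(&running, -1)
		}()
	}
	wg.Wait()
	return atomic.LoadInt64(&peak)
}

func main() {
	fmt.Println(maxConcurrent(20, 3) <= 3) // true
}
```

Acquiring the slot before starting the goroutine, as Group.Go does, means the caller itself blocks when the limit is reached, which also bounds the number of goroutines that exist at any time.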

func (g *Group) Go(f func() error) {
	// If the semaphore channel is set, use it to limit concurrency.
	if g.sem != nil {
		g.sem <- token{}
	}

	// Add before starting the new goroutine.
	g.wg.Add(1)
	go func() {
		// defer guarantees the counter is decremented when f finishes.
		defer g.done()
		// Run f and record the error.
		if err := f(); err != nil {
			g.errOnce.Do(func() {
				g.err = err
				// Cancel the context, recording this error as the cause.
				if g.cancel != nil {
					g.cancel(g.err)
				}
			})
		}
	}()
}

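Note that errgroup does not record the errors of every goroutine it runs. It records only the first error among the concurrent goroutines and uses the context package to signal cancellation to the others.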

func (g *Group) Wait() error {
	g.wg.Wait()
	if g.cancel != nil {
		g.cancel(g.err)
	}
	return g.err
}

Finally, the Wait method also relies on sync.WaitGroup. It waits for every goroutine, calls cancel if one is set, and returns the recorded error.

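Overall, errgroup merely wraps sync.WaitGroup and a few context facilities to offer more convenient concurrency control. Its drawback is that it records only the first error; if every error is needed, we can define a slice to collect the errors from all goroutines.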