Kotlin Advanced Tutorial, Part 2: Coroutines

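This article does not cover usage; it is meant for readers who already have experience with coroutines.

It analyzes how coroutines work internally. The main topics are:

CoroutineContext source code analysis
The coroutine exception handling mechanism
How coroutine suspension works

The authors of coroutines have also explained the internals in detail; see: Kotlin 协程 (Kotlin Coroutines).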

1 CoroutineContext

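1.1 Source code analysis

The coroutineContext of lifecycleScope is SupervisorJob() + Dispatchers.Main.immediate. To understand what this "+" means, we have to read the CoroutineContext source.

CoroutineContext is the context of a coroutine; every CoroutineScope holds one.

The inheritance relationships are shown in the flowchart I drew below:

Here is the CoroutineContext source: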

// CoroutineContext.kt
public interface CoroutineContext {

    public operator fun <E : Element> get(key: Key<E>): E?

    public fun <R> fold(initial: R, operation: (R, Element) -> R): R
    
    public operator fun plus(context: CoroutineContext): CoroutineContext

    public fun minusKey(key: Key<*>): CoroutineContext
    
    public interface Key<E : Element>
    
    public interface Element : CoroutineContext { ... }
}

A Key uniquely identifies one kind of CoroutineContext element. Note the word "kind": each kind of element has exactly one Key. For example:

public interface ContinuationInterceptor : CoroutineContext.Element {
	companion object Key : CoroutineContext.Key<ContinuationInterceptor>
}

Both the get function and the minusKey function of CoroutineContext work by key.
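To make key-based lookup and removal concrete, here is a minimal sketch that uses only the standard library (kotlin.coroutines). UserId and TraceTag are hypothetical element types invented for this illustration; they are not part of any library.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext

// Hypothetical element types, each with its own companion Key.
class UserId(val id: Int) : AbstractCoroutineContextElement(UserId) {
    companion object Key : CoroutineContext.Key<UserId>
}

class TraceTag(val tag: String) : AbstractCoroutineContextElement(TraceTag) {
    companion object Key : CoroutineContext.Key<TraceTag>
}

fun lookupDemo(): List<Any?> {
    val single: CoroutineContext = UserId(7)
    val both: CoroutineContext = UserId(7) + TraceTag("t1")
    return listOf(
        single[UserId]?.id,                                 // found by key
        single[TraceTag],                                   // no element with this key
        single.minusKey(UserId) === EmptyCoroutineContext,  // removing the only element leaves nothing
        both[TraceTag]?.tag,                                // lookup in a combined context
        both.minusKey(TraceTag)[UserId]?.id,                // the other element survives removal
        both.minusKey(TraceTag)[TraceTag]                   // the removed kind is gone
    )
}

fun main() {
    println(lookupDemo()) // [7, null, true, t1, 7, null]
}
```

The class name stands in for the companion Key in both the constructor call and the lookup, which is why single[UserId] compiles.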

Now we can look at how "+" is implemented, in the plus function:

public operator fun plus(context: CoroutineContext): CoroutineContext =
        if (context === EmptyCoroutineContext) this else // fast path -- avoid lambda creation
            context.fold(this) { acc, element ->
                val removed = acc.minusKey(element.key)
                if (removed === EmptyCoroutineContext) element else {
                    // make sure interceptor is always last in the context (and thus is fast to get when present)
                    val interceptor = removed[ContinuationInterceptor]
                    if (interceptor == null) CombinedContext(removed, element) else {
                        val left = removed.minusKey(ContinuationInterceptor)
                        if (left === EmptyCoroutineContext) CombinedContext(element, interceptor) else
                            CombinedContext(CombinedContext(left, element), interceptor)
                    }
                }
            }

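Step 1: check whether the right-hand operand is EmptyCoroutineContext. If it is, return this directly.
Step 2: call fold. What fold does depends on the implementing type. There are two main implementations, CombinedContext.fold and Element.fold: CombinedContext is a class that implements CoroutineContext, and Element is a sub-interface of CoroutineContext.

Before looking at those implementations, look at the fold function for Kotlin collections: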

public inline fun <T, R> Iterable<T>.fold(initial: R, operation: (acc: R, T) -> R): R {
    var accumulator = initial
    for (element in this) accumulator = operation(accumulator, element)
    return accumulator
}

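This is an accumulation function. initial is the starting accumulated value and operation is the accumulator. operation runs once for every item in the collection; its first argument is the accumulated value and its second is the current item.
Now look at the fold methods of Element and CombinedContext: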

// Element
public override fun <R> fold(initial: R, operation: (R, Element) -> R): R =
    operation(initial, this)

            
// CombinedContext
public override fun <R> fold(initial: R, operation: (R, Element) -> R): R =
    operation(left.fold(initial, operation), element)

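An Element can be seen as a single item of a collection, so its fold calls operation exactly once: the first argument is the initial value passed in (the caller's accumulated value) and the second is the element itself.
CombinedContext is a linked list with two fields, left: CoroutineContext and element: Element. left plays the role of the accumulated value (everything added earlier) and element is the head of the list. When folding, operation is first applied across left, and the result is then accumulated with element.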
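The fold order described above can be observed with a small standard-library sketch. A, B and C are hypothetical marker elements made up for this illustration. Because CombinedContext.fold recurses into left before applying operation to element, fold visits the elements from the oldest to the newest.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext

// Three hypothetical element kinds, distinguishable by toString().
class A : AbstractCoroutineContextElement(A) {
    companion object Key : CoroutineContext.Key<A>
    override fun toString() = "A"
}

class B : AbstractCoroutineContextElement(B) {
    companion object Key : CoroutineContext.Key<B>
    override fun toString() = "B"
}

class C : AbstractCoroutineContextElement(C) {
    companion object Key : CoroutineContext.Key<C>
    override fun toString() = "C"
}

// Collects element names in the order fold hands them to the operation.
fun foldOrder(context: CoroutineContext): List<String> =
    context.fold(emptyList<String>()) { acc, element -> acc + element.toString() }

fun main() {
    println(foldOrder(A()))             // [A]        Element.fold: one call
    println(foldOrder(A() + B() + C())) // [A, B, C]  left is folded before element
}
```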

Next comes minusKey, which removes duplicate contexts. By common sense, if a newly added value duplicates an old one, the new value should be kept. The detailed explanation is in the comments.

// Element
public override fun minusKey(key: Key<*>): CoroutineContext =
    // Same key means same kind of element, so it is removed: return the empty context. Otherwise return this.
    if (this.key == key) EmptyCoroutineContext else this

// CombinedContext
public override fun minusKey(key: Key<*>): CoroutineContext {
    // First check by key whether element is of the requested kind. If so, drop element and return left.
    element[key]?.let { return left }
    // Reaching here means element is of a different kind than the requested key, so remove that kind from left.
    // left is either a single Element or a CombinedContext.
    val newLeft = left.minusKey(key)
    return when {
        // newLeft is the same object as left: nothing was removed, return this, i.e. [left, element].
        newLeft === left -> this
        // left was a single Element with this key and is empty after removal: return element.
        newLeft === EmptyCoroutineContext -> element
        // Neither empty nor identical to the old left: the old left was a CombinedContext containing the key,
        // so build a new CombinedContext: [newLeft, element].
        else -> CombinedContext(newLeft, element)
    }
}

The element[key] expression here uses the operator get() method. Here is its definition:

// Element
public override operator fun <E : Element> get(key: Key<E>): E? =
    @Suppress("UNCHECKED_CAST")
    if (this.key == key) this as E else null

// CombinedContext
override fun <E : Element> get(key: Key<E>): E? {
    var cur = this
    while (true) {
        cur.element[key]?.let { return it }
        val next = cur.left
        if (next is CombinedContext) {
            cur = next
        } else {
            return next[key]
        }
    }
}

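An Element is a single-item context: if the key matches, it returns itself. A CombinedContext is a multi-item context, and the logic of its get method shows that it is a linked list: element is the head of the list and left is the tail. To make this easier to follow, I drew a diagram:

This is a left-leaning linked list built by head insertion. The left end is fixed and the list grows to the right; every insertion goes at the rightmost position, and traversal yields the items in reverse order, so the last item inserted comes out first. The first node you reach is the head node (the last one inserted, element4 in the diagram). Some sources online, even Baidu Baike, say that head insertion yields the tail node first; that is a matter of interpretation, and it is not how I read it. This gives us a conclusion: **in a CombinedContext, element is the most recently added item and also the first one visited, so the context added last can be accessed first (ignoring the logic that always keeps the interceptor at the head).**

Back to the plus method. After de-duplication, if the accumulated value is empty, the current item element is returned directly. In other words, old context + new context of the same kind (Key) = new context.
If the accumulated value is not empty after de-duplication, the ContinuationInterceptor (the coroutine interceptor) has to be placed at the head of the list. ContinuationInterceptor intercepts the execution of a coroutine so that preparation can happen before it runs, such as switching threads. In kotlinx.coroutines its main implementation is CoroutineDispatcher, whose chief job is switching threads, so what usually ends up at the head is a dispatcher such as Dispatchers.IO or Dispatchers.Main, where it can be retrieved as quickly as possible.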

val interceptor = removed[ContinuationInterceptor]
if (interceptor == null) CombinedContext(removed, element) else {
    val left = removed.minusKey(ContinuationInterceptor)
    if (left === EmptyCoroutineContext) CombinedContext(element, interceptor) else
        CombinedContext(CombinedContext(left, element), interceptor)
}

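One language point first: the name of the class that contains a companion object can be used as a reference to that companion object, so removed[ContinuationInterceptor] can be read as removed.get(ContinuationInterceptor.Key). removed is the de-duplicated result, and this get call retrieves the coroutine interceptor (usually a dispatcher).
With CombinedContext explained above, this short piece of code is easy to follow: its purpose is to put the interceptor in the element position of the CombinedContext.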
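The relocation can be checked without kotlinx.coroutines. The sketch below assumes a hypothetical PassThroughInterceptor that does nothing except occupy the ContinuationInterceptor key; Marker and Extra are equally hypothetical. Even though the interceptor is added first, it ends up in the element position, which fold visits last and get visits first.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.CoroutineContext

// Hypothetical interceptor: returns every continuation unchanged.
class PassThroughInterceptor :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {
    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> = continuation
    override fun toString() = "Interceptor"
}

class Marker : AbstractCoroutineContextElement(Marker) {
    companion object Key : CoroutineContext.Key<Marker>
    override fun toString() = "Marker"
}

class Extra : AbstractCoroutineContextElement(Extra) {
    companion object Key : CoroutineContext.Key<Extra>
    override fun toString() = "Extra"
}

// fold visits left before element, so the last name is the one stored in the element position.
fun elementOrder(context: CoroutineContext): List<String> =
    context.fold(emptyList<String>()) { acc, element -> acc + element.toString() }

fun interceptorDemo(): Pair<List<String>, Boolean> {
    val context = PassThroughInterceptor() + Marker() + Extra()
    // The class name is shorthand for its companion object Key.
    val sameLookup = context[ContinuationInterceptor] === context.get(ContinuationInterceptor.Key)
    return elementOrder(context) to sameLookup
}

fun main() {
    println(interceptorDemo()) // ([Marker, Extra, Interceptor], true)
}
```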

Let's walk through an example, SupervisorJob() + Dispatchers.Main + CoroutineName("test"):

// Step 1: SupervisorJob() + Dispatchers.Main
public operator fun plus(context: CoroutineContext): CoroutineContext =
    if (context === EmptyCoroutineContext) this else
        // context.fold(this) runs Element.fold here. this is SupervisorJob and becomes the lambda's acc;
        // context is Dispatchers.Main and becomes the lambda's element.
        context.fold(this) { acc, element ->
            // acc (SupervisorJob) has nothing with element's key (Dispatchers.Main, whose key is ContinuationInterceptor.Key)
            val removed = acc.minusKey(element.key)
            if (removed === EmptyCoroutineContext) element else {
                // acc contains no interceptor, so build the CombinedContext directly: [SupervisorJob, Dispatchers.Main]
                val interceptor = removed[ContinuationInterceptor]
                if (interceptor == null) CombinedContext(removed, element) else {
                    val left = removed.minusKey(ContinuationInterceptor)
                    if (left === EmptyCoroutineContext) CombinedContext(element, interceptor) else
                        CombinedContext(CombinedContext(left, element), interceptor)
                }
            }
        }

// Step 2: CombinedContext(SupervisorJob, Dispatchers.Main) + CoroutineName("test")
public operator fun plus(context: CoroutineContext): CoroutineContext =
    if (context === EmptyCoroutineContext) this else
        // context.fold(this) runs Element.fold here. this is [SupervisorJob, Dispatchers.Main] and becomes the lambda's acc;
        // context is CoroutineName("test") and becomes the lambda's element.
        context.fold(this) { acc, element ->
            // acc has nothing with element's key (CoroutineName("test"), whose key is CoroutineName.Key)
            val removed = acc.minusKey(element.key)
            if (removed === EmptyCoroutineContext) element else {
                // acc contains an interceptor, so interceptor = Dispatchers.Main
                val interceptor = removed[ContinuationInterceptor]
                if (interceptor == null) CombinedContext(removed, element) else {
                    // Remove the interceptor first, so left = SupervisorJob
                    val left = removed.minusKey(ContinuationInterceptor)
                    if (left === EmptyCoroutineContext) CombinedContext(element, interceptor) else
                        // Build the new context [SupervisorJob, CoroutineName("test"), Dispatchers.Main]
                        CombinedContext(CombinedContext(left, element), interceptor)
                }
            }
        }

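Note that this example never reaches CombinedContext.fold. If the right-hand operand is a CombinedContext, the left field of that right-hand CombinedContext is accumulated into the left-hand result first, and then its element field is accumulated into what that produced.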
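A short standard-library sketch of that case follows. First, Second and Third are hypothetical elements invented for the illustration. Putting a CombinedContext on the right produces the same contents as adding the elements one at a time; the difference is only that the fold lambda inside plus runs once per element of the right-hand operand.

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext

class First : AbstractCoroutineContextElement(First) {
    companion object Key : CoroutineContext.Key<First>
    override fun toString() = "First"
}

class Second : AbstractCoroutineContextElement(Second) {
    companion object Key : CoroutineContext.Key<Second>
    override fun toString() = "Second"
}

class Third : AbstractCoroutineContextElement(Third) {
    companion object Key : CoroutineContext.Key<Third>
    override fun toString() = "Third"
}

// Lists the elements in fold order (left first, element last).
fun names(context: CoroutineContext): List<String> =
    context.fold(emptyList<String>()) { acc, element -> acc + element.toString() }

fun main() {
    val elementOnRight = (First() + Second()) + Third()   // right operand is a single Element
    val combinedOnRight = First() + (Second() + Third())  // right operand is a CombinedContext
    println(names(elementOnRight))  // [First, Second, Third]
    println(names(combinedOnRight)) // [First, Second, Third]
}
```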

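1.2 Summary and optimization notes

A few conclusions follow:

The context added last is also the first one visited.

Old context + new context of the same kind (Key) = new context.

The coroutine dispatcher is always at the head of the list.

So during development, keep the following in mind:

(1) When adding contexts, it is best to add the dispatcher last. Internally the structure is a left-leaning, head-inserted linked list in which the last item added is the first one visited, and the dispatcher is accessed often. The plus method contains logic that moves it to the head of the list; adding it last reduces how often that logic has to run.
(2) Avoid Element + CombinedContext where possible; prefer putting the CombinedContext on the left.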
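The "same Key: new replaces old" conclusion can be verified with the standard library alone. Name and Flag below are hypothetical elements made up for this sketch (Name is not kotlinx's CoroutineName).

```kotlin
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.CoroutineContext

// Hypothetical named element; two instances share the same Key.
class Name(val value: String) : AbstractCoroutineContextElement(Name) {
    companion object Key : CoroutineContext.Key<Name>
    override fun toString() = "Name($value)"
}

class Flag : AbstractCoroutineContextElement(Flag) {
    companion object Key : CoroutineContext.Key<Flag>
    override fun toString() = "Flag"
}

// Lists the elements in fold order (oldest first).
fun contents(context: CoroutineContext): List<String> =
    context.fold(emptyList<String>()) { acc, element -> acc + element.toString() }

fun main() {
    val older = Name("old")
    val newer = Name("new")
    println((older + newer) === newer)               // true: only the new element remains
    println(contents(older + Flag() + newer))        // [Flag, Name(new)]
    println((older + Flag() + newer)[Name]?.value)   // new
}
```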

2 Exception handling

2.1 Introduction

First, let's look at what the following code does when it runs.

val scope1 = lifecycleScope + CoroutineExceptionHandler { coroutineContext, throwable ->
    Log.e("ParentErrorHandler", "Caught error: $throwable")
}

val job1 = scope1.launch {
    Log.v("Job1", "job1 start, thread: ${Thread.currentThread().name}")
    withContext(Dispatchers.IO + SupervisorJob() + CoroutineExceptionHandler { coroutineContext, throwable ->
        Log.e("WithContextErrorHandler", "Caught error: $throwable")
    }) {
        Log.v("Job1", "withContext block execute, thread: ${Thread.currentThread().name}")
        throw RuntimeException("job1 runtimeException")
    }
    Log.v("Job1", "job1 complete, thread: ${Thread.currentThread().name}")
}

val job2 = scope1.launch {
    Log.d("Job2", "job2 start")
    delay(3000)
    Log.d("Job2", "job2 completed")
}

val job3 = scope1.launch(CoroutineExceptionHandler { coroutineContext, throwable ->
    Log.i("Job3ErrorHandler", "Caught error: $throwable")
}) {
    Log.i("Job3", "job3 start")
    throw RuntimeException("job3 runtimeException")
}

val scope2 = scope1 + Job()

val job4 = scope2.launch {
    Log.w("Job4", "job4 start")
    delay(3000)
    Log.w("Job4", "job4 completed")
}

val job5 = scope2.launch(CoroutineExceptionHandler { coroutineContext, throwable ->
    Log.e("Job5ErrorHandler", "Caught error: $throwable")
}) {
    Log.e("Job5", "job5 start")
    launch(CoroutineExceptionHandler { coroutineContext, throwable ->
        Log.e("ChildErrorHandler", "Caught error: $throwable")
    }) {
        throw RuntimeException("job5 child runtimeException")
    }
}

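The output is as follows:

Start with job1. We throw an exception inside withContext and then continue with more code after withContext, and the IDE shows a warning:

In another launch that also throws, however, the IDE shows no warning:

Let's mark this as Question 1.

Now look at the output of job1. job1 started on the main thread, then ran on DefaultDispatcher-worker-1, and then threw. The exception was received by ParentErrorHandler, not by WithContextErrorHandler. Mark this as Question 2.
Then we comment out the throw statement, and one more log line appears:

job1 automatically switched back from DefaultDispatcher-worker-1 to the main thread. Mark this as Question 3.

Next, job2 and job3. job2 completes after a 3-second delay, while job3 throws immediately. Its exception is received by the CoroutineExceptionHandler passed to launch, which differs from the withContext behavior; mark this as Question 4. After the exception, job2 was not cancelled: it completed after its 3-second delay.
Now compare job4 and job5. After the child job of job5 throws, the completion log of job4 is never printed, which means job4 was cancelled. The only difference from job3 is that scope2 is scope1 plus a Job(); mark this as Question 5. The exception thrown by the child of job5 was not handled by ChildErrorHandler but by Job5ErrorHandler; mark this as Question 6.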

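2.2 Exception handling mechanism

Exception handling is one of the harder parts of coroutines to understand. Google's official article already covers it well:
Exceptions in coroutines. The article is in English, and there is also an official translation: 协程中的取消和异常 | 异常处理详解.

Note that the translated article contains an error. Here is the original:

The original says that the failure of **child#1** here does not cause **child#2** to be cancelled. The translated article renders this differently, and that translation is wrong. My guess is that the original also contained the error at first, so the translation inherited it; someone later pointed it out in a comment and the original was corrected, but the translation was not.

The diagrams in the article are classics; once you understand them you have the core of exception handling:

This diagram shows how an exception propagates. If one Child fails, the exception is first propagated to the Parent; the Parent cancels all of its Children, then cancels itself, and the exception continues to propagate upward.

If the Parent uses a SupervisorJob, the result looks like this:

The exception from the Child no longer propagates upward.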
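The difference between the two diagrams can be reproduced in a few lines. This is only a sketch: it assumes kotlinx-coroutines-core is on the classpath, and siblingSurvives is a hypothetical helper written for this illustration. It starts two children under the given parent job, lets the first one fail immediately, and reports whether the second one was still able to finish.

```kotlin
import kotlinx.coroutines.*

// Returns true if the second child finished even though the first child failed.
fun siblingSurvives(parent: CompletableJob): Boolean = runBlocking {
    // Swallow the exception so the demo prints nothing to stderr.
    val handler = CoroutineExceptionHandler { _, _ -> }
    val scope = CoroutineScope(parent + Dispatchers.Default + handler)
    var finished = false

    scope.launch { throw RuntimeException("child #1 failed") }
    val sibling = scope.launch {
        delay(200)          // cancelled here if the parent gets cancelled
        finished = true
    }

    sibling.join()          // returns once the sibling completed or was cancelled
    scope.cancel()          // clean up the scope
    finished
}

fun main() {
    println("Job: sibling finished = ${siblingSurvives(Job())}")                      // false
    println("SupervisorJob: sibling finished = ${siblingSurvives(SupervisorJob())}")  // true
}
```

With a plain Job the failure cancels the parent, which in turn cancels the sibling; with a SupervisorJob the failure stays with the failing child.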

This means that passing a Job to the launch function is pointless.

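3 Suspension

The main benefit of suspending and resuming is that asynchronous code can be written in a synchronous form, and that code still runs in order. To achieve this, the synchronous-looking code first has to be converted into asynchronous code (the kind normally written with callbacks); then, when it runs, it has to pause at each suspension point and continue only after the asynchronous work has finished.

The first point is handled by the CPS transformation, which makes synchronous-looking code asynchronous. The second is implemented with a state machine.

3.1 CPS transformation

CPS (Continuation-Passing Style) is a programming style in which control flow is passed explicitly in the form of a Continuation, and the CPS transformation rewrites code into that style. Put simply, it is a style that drives control flow through callbacks, typically used for asynchronous calls. Its counterpart, synchronous execution without callbacks, is called direct style. Continuation shows up frequently in the Kotlin source; it is essentially a callback. Decompile a suspend function and you will see an extra Continuation parameter: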

// Source
private suspend fun a() = 1

// After decompiling to Java
private static final Object a(Continuation $completion) {
    return Boxing.boxInt(1);
}

public interface Continuation<in T> {
    public val context: CoroutineContext

    public fun resumeWith(result: Result<T>)
}

Every function marked suspend gets an extra Continuation parameter after compilation. Zhu Tao drew a diagram that explains the CPS transformation of suspend; see his article: Kotlin Jetpack 实战 | 09. 图解协程原理.
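The "Continuation is a callback" idea can be tried out with the standard library only. In the sketch below, fetchAnswer is a hypothetical callback-style API invented for the illustration; suspendCoroutine exposes the compiler-supplied Continuation so the callback can resume it, and startCoroutine takes the completion Continuation that receives the final result.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.resume
import kotlin.coroutines.startCoroutine
import kotlin.coroutines.suspendCoroutine

// Hypothetical callback-style API: the result is delivered through a callback.
fun fetchAnswer(callback: (Int) -> Unit) {
    callback(42)
}

// Direct-style wrapper: the hidden Continuation is resumed from the callback.
suspend fun fetchAnswerSuspending(): Int = suspendCoroutine { continuation ->
    fetchAnswer { value -> continuation.resume(value) }
}

// Starts a coroutine by hand and captures what the completion callback receives.
fun runAndCollect(): Int {
    var out = -1
    val block: suspend () -> Int = { fetchAnswerSuspending() + 1 }
    block.startCoroutine(object : Continuation<Int> {
        override val context: CoroutineContext = EmptyCoroutineContext
        override fun resumeWith(result: Result<Int>) {
            out = result.getOrThrow()
        }
    })
    return out
}

fun main() {
    println(runAndCollect()) // 43
}
```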

3.2 State machine

The authors of Kotlin coroutines have described the state machine implementation; see: Kotlin 协程#状态机.

// Source
val a = a()
val y = foo(a).await() // suspension point #1
b()
val z = bar(a, y).await() // suspension point #2
c(z)

// Pseudocode
class <anonymous_for_state_machine> extends SuspendLambda<...> {
    // Current state of the state machine
    int label = 0
    
    // Local variables of the coroutine
    A a = null
    Y y = null
    
    void resumeWith(Object result) {
        if (label == 0) goto L0
        if (label == 1) goto L1
        if (label == 2) goto L2
        else throw IllegalStateException()
        
      L0:
        // On this call, result is expected to be null
        a = a()
        label = 1
        result = foo(a).await(this) // 'this' is passed as the continuation
        if (result == COROUTINE_SUSPENDED) return // return if await suspended execution
      L1:
        // External code resumed the coroutine, passing in the result of .await()
        y = (Y) result
        b()
        label = 2
        result = bar(a, y).await(this) // 'this' is passed as the continuation
        if (result == COROUTINE_SUSPENDED) return // return if await suspended execution
      L2:
        // External code resumed the coroutine, passing in the result of .await()
        Z z = (Z) result
        c(z)
        label = -1 // no more steps
        return
    }          
}

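Now, when the coroutine starts, we call its resumeWith(): label is 0, so we jump to L0, do some work, set label to the next state, 1, call .await(), and return if the coroutine's execution is suspended. When we want to continue, we call resumeWith() again, and this time it proceeds to L1, does some work, sets the state to 2, calls .await(), and again returns if suspended. The next time it continues from L2 and sets the state to -1, which means "finished, there is no more work to do".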
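For readers who want to step through this, here is a hand-written, runnable analogue of the pseudocode in plain Kotlin. It is not what the compiler generates: Suspended stands in for COROUTINE_SUSPENDED, awaitLater is a hypothetical async call that always suspends, and the arithmetic (a = 1, y = a + 1, z = a + y) is made up so the result can be checked.

```kotlin
// Stand-in for COROUTINE_SUSPENDED: returned when a step has to wait.
object Suspended

class HandWrittenStateMachine(private val log: MutableList<String>) {
    private var label = 0   // current state
    private var a = 0       // locals that must survive a suspension
    private var y = 0

    // Resumptions scheduled by the fake async calls below.
    val pending = ArrayDeque<() -> Unit>()

    // Hypothetical async call: never completes inline, resumes the machine later.
    private fun awaitLater(value: Int): Any {
        pending.addLast { resumeWith(value) }
        return Suspended
    }

    fun resumeWith(result: Any?) {
        var res = result
        while (true) {                 // the loop emulates falling through to the next label
            when (label) {
                0 -> {
                    a = 1
                    log += "a()"
                    label = 1
                    res = awaitLater(a + 1)      // suspension point #1
                    if (res === Suspended) return
                }
                1 -> {
                    y = res as Int
                    log += "b()"
                    label = 2
                    res = awaitLater(a + y)      // suspension point #2
                    if (res === Suspended) return
                }
                2 -> {
                    val z = res as Int
                    log += "c($z)"
                    label = -1                   // no more steps
                    return
                }
                else -> throw IllegalStateException("already finished")
            }
        }
    }
}

fun runStateMachine(): List<String> {
    val log = mutableListOf<String>()
    val machine = HandWrittenStateMachine(log)
    machine.resumeWith(null)                     // start: runs state 0, then suspends
    while (machine.pending.isNotEmpty()) {
        machine.pending.removeFirst().invoke()   // "external code" resumes the coroutine
    }
    return log
}

fun main() {
    println(runStateMachine()) // [a(), b(), c(3)]
}
```

Each call to resumeWith runs exactly one segment between suspension points, and the fields a and y carry the local variables across those calls.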