Kotlin Coroutines Handbook: Job (Part 1)

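Note: in this article, "coroutine" refers specifically to Kotlin coroutines.

The previous article introduced suspend functions, which led to the Continuation interface and the concept of CPS transformation, and examined in detail how a suspend function goes from suspension to resumption.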

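Figure: AbstractCoroutine inheritance hierarchy

The inheritance hierarchy of AbstractCoroutine shows that a coroutine object plays three roles at once: Continuation, Job, and CoroutineScope. What does each of them contribute while a coroutine runs? In brief:

  • Continuation: hides callbacks, so asynchronous code can be written in a sequential style
  • Job: controls the coroutine's lifecycle and enables structured concurrency between coroutines
  • CoroutineScope: delimits the scope in which coroutines run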

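What is a Job?

A Job represents a task. It makes a coroutine cancellable, gives it a lifecycle, and lets it take part in structured concurrency. In everyday use, cancellability and structured concurrency matter most. In Android development in particular, combining coroutines with Lifecycle allows coroutines to be cancelled automatically.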

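The Job lifecycle

A Job has six lifecycle states: New, Active, Completing, Cancelling, Cancelled, and Completed. The caller of a coroutine usually holds a reference to it through the Job interface, which exposes three properties (isActive, isCompleted, and isCancelled) so that the outside world can observe the Job's internal state. The table below shows how these three properties map to the six lifecycle states.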

Figure: Job lifecycle table
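The mapping between states and the three properties can also be observed directly. The following is a minimal sketch, assuming the kotlinx.coroutines library is on the classpath; the helper names flags and observeLifecycle are made up for this illustration.

```kotlin
import kotlinx.coroutines.*

// Formats the three public flags of a Job as one compact string.
fun flags(job: Job): String =
    "isActive=${job.isActive} isCompleted=${job.isCompleted} isCancelled=${job.isCancelled}"

// Hypothetical helper: records the flags of a job at several lifecycle points.
fun observeLifecycle(): List<String> = runBlocking {
    val seen = mutableListOf<String>()

    // LAZY start keeps the job in the New state until start() or join() is called.
    val normal = launch(start = CoroutineStart.LAZY) { delay(10) }
    seen += "New: " + flags(normal)
    normal.start()
    seen += "Active: " + flags(normal)
    normal.join()
    seen += "Completed: " + flags(normal)

    // A job that is cancelled and then awaited ends up in the Cancelled state.
    val cancelled = launch { delay(Long.MAX_VALUE) }
    cancelled.cancelAndJoin()
    seen += "Cancelled: " + flags(cancelled)
    seen
}

fun main() = observeLifecycle().forEach(::println)
```

The transient Completing and Cancelling states are harder to catch from the outside, which is why the sketch only samples the four states that are easy to reach deterministically.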

A table alone does not make the lifecycle intuitive, so the following three examples show how it works in practice.

New => Active => Completed

    val job = launch(start = CoroutineStart.LAZY) {
        println("Active")   
    }
    println("New")  
    job.join()
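    println("Completed")

  1. A coroutine created with CoroutineStart.LAZY starts in the New state
  2. When join is called on the job, the coroutine enters the Active state and starts running its body
  3. When the body finishes there are no children to wait for, so the coroutine goes straight to the Completed state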

ParentActive => ChildActive => ParentCompleting => ChildCompleted => ParentCompleted

val parent = launch {
    println("ParentActive")
    val child = launch {
        println("ChildActive")
    }
    child.invokeOnCompletion {
        println("ChildCompleted")
    }
    println("ParentCompleting")
}
parent.invokeOnCompletion {
    println("ParentCompleted")
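}

  1. invokeOnCompletion registers a callback that fires when the coroutine reaches the Completed or Cancelled state
  2. The parent coroutine (parent) is launched and enters the Active state
  3. While running, the parent launches a child coroutine (child); the child enters the Active state and starts running its body
  4. The parent finishes its own code and enters the Completing state, where it waits for the child to finish
  5. The child finishes and enters the Completed state, and the parent then enters the Completed state as well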

ParentActive => ChildActive => ParentCancelling => ChildCancelled => ParentCancelled

val parent = launch {
    println("ParentActive")
    val child = launch {
        println("ChildActive")
    }
    child.invokeOnCompletion {
        println("ChildCancelled")
    }
    cancel()
    println("ParentCancelling")
}
parent.invokeOnCompletion {
    println("ParentCancelled")
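}

  1. Cancellation relies on CancellationException; because this exception represents normal cancellation, it is not surfaced to the outside as an error
  2. The parent coroutine is launched and enters the Active state
  3. The parent launches a child coroutine, which enters the Active state
  4. Before the parent has finished running, cancel is called to cancel it, and the cancellation is passed on to all of its children
  5. The parent is now in the Cancelling state and waits for its children to finish
  6. The child finishes and enters the Cancelled state, and the parent then enters the Cancelled state as well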

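Cancelling a Job

How is cancellation triggered? Either by calling cancel or by throwing an exception. When a coroutine is cancelled through cancel, the cancellation is passed from the parent to its children, which cancels them in turn; this is a recursive process. When a coroutine is cancelled because an exception was thrown, the cancellation propagates in both directions, upward and downward. Cancellation can therefore be examined along two dimensions:

  • Cancellation triggered by cancel
  • Cancellation triggered by an exception other than CancellationException

Cancellation triggered by cancel

Taking a call to cancel from inside a running coroutine as the example, the overall call chain is: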

CoroutineScope.cancel(cause: CancellationException?)
-> JobSupport.cancel(cause: CancellationException?)    // 1
-> JobSupport.cancelInternal(cause: Throwable)
-> JobSupport.cancelImpl(cause: Any?)    // 2
-> JobSupport.makeCancelling(cause: Any?)    // 3
-> JobSupport.tryMakeCancelling(state: Incomplete, rootCause: Throwable)    //4
-> JobSupport.notifyCancelling(list: NodeList, cause: Throwable)    //5

The functions marked with numbers above are analyzed below.

Mark 1

public override fun cancel(cause: CancellationException?) {
        cancelInternal(cause ?: defaultCancellationException())
    }

Cancelling through cancel relies on passing a CancellationException along. If no CancellationException is supplied to cancel, a default one is created.
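The effect of the optional cause can be seen from the outside through a completion handler. The following is a small sketch, assuming kotlinx.coroutines is available; cancelAndObserve is a hypothetical helper written for this illustration.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: cancels a job (optionally with an explicit cause) and
// returns the cause that its completion handler receives.
fun cancelAndObserve(cause: CancellationException?): Throwable? = runBlocking {
    var observed: Throwable? = null
    val job = launch { delay(Long.MAX_VALUE) }
    job.invokeOnCompletion { observed = it }
    job.cancel(cause)   // null means "use the default CancellationException"
    job.join()
    observed
}

fun main() {
    // Without an explicit cause, a default CancellationException is supplied.
    println(cancelAndObserve(null))
    // With an explicit cause, the handler sees that exception's message.
    println(cancelAndObserve(CancellationException("user left the screen")))
}
```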

Mark 2

// cause is Throwable or ParentJob when cancelChild was invoked
// returns true is exception was handled, false otherwise
internal fun cancelImpl(cause: Any?): Boolean {
        var finalState: Any? = COMPLETING_ALREADY
        if (onCancelComplete) {
            // make sure it is completing, if cancelMakeCompleting returns state it means it had make it
            // completing and had recorded exception
            finalState = cancelMakeCompleting(cause)
            if (finalState === COMPLETING_WAITING_CHILDREN) return true
        }
        if (finalState === COMPLETING_ALREADY) {
            finalState = makeCancelling(cause)
        }
        return when {
            finalState === COMPLETING_ALREADY -> true
            finalState === COMPLETING_WAITING_CHILDREN -> true
            finalState === TOO_LATE_TO_CANCEL -> false
            else -> {
                afterCompletion(finalState)
                true
            }
        }
    }

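In this function, finalState starts with the default value COMPLETING_ALREADY. According to the comment on onCancelComplete, that property is true only when the coroutine has no "body block", that is, no code of its own to execute. That is not the case here, so this branch is skipped and makeCancelling(cause) is called. See mark 3.

Mark 3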

    // transitions to Cancelling state
    // cause is Throwable or ParentJob when cancelChild was invoked
    // It contains a loop and never returns COMPLETING_RETRY, can return
    // COMPLETING_ALREADY -- if already completing or successfully made cancelling, added exception
    // COMPLETING_WAITING_CHILDREN -- if started waiting for children, added exception
    // TOO_LATE_TO_CANCEL -- too late to cancel, did not add exception
    // final state -- when completed, for call to afterCompletion
    private fun makeCancelling(cause: Any?): Any? {
        var causeExceptionCache: Throwable? = null // lazily init result of createCauseException(cause)
        loopOnState { state ->
            when (state) {
                is Finishing -> { // already finishing -- collect exceptions
                    val notifyRootCause = synchronized(state) {
                        if (state.isSealed) return TOO_LATE_TO_CANCEL // already sealed -- cannot add exception nor mark cancelled
                        // add exception, do nothing is parent is cancelling child that is already being cancelled
                        val wasCancelling = state.isCancelling // will notify if was not cancelling
                        // Materialize missing exception if it is the first exception (otherwise -- don't)
                        if (cause != null || !wasCancelling) {
                            val causeException = causeExceptionCache ?: createCauseException(cause).also { causeExceptionCache = it }
                            state.addExceptionLocked(causeException)
                        }
                        // take cause for notification if was not in cancelling state before
                        state.rootCause.takeIf { !wasCancelling }
                    }
                    notifyRootCause?.let { notifyCancelling(state.list, it) }
                    return COMPLETING_ALREADY
                }
                is Incomplete -> {
                    // Not yet finishing -- try to make it cancelling
                    val causeException = causeExceptionCache ?: createCauseException(cause).also { causeExceptionCache = it }
                    if (state.isActive) {
                        // active state becomes cancelling
                        if (tryMakeCancelling(state, causeException)) return COMPLETING_ALREADY
                    } else {
                        // non active state starts completing
                        val finalState = tryMakeCompleting(state, CompletedExceptionally(causeException))
                        when {
                            finalState === COMPLETING_ALREADY -> error("Cannot happen in $state")
                            finalState === COMPLETING_RETRY -> return@loopOnState
                            else -> return finalState
                        }
                    }
                }
                else -> return TOO_LATE_TO_CANCEL // already complete
            }
        }
    }

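Calling this method moves the Job into the Cancelling state. Start with loopOnState, which appears frequently in JobSupport, the implementation of the Job interface. Because several threads may try to modify the state at the same time, JobSupport reads the state in a loop: an update of the form _state.compareAndSet(state, cancelling) can fail, in which case the new state has to be read again before retrying.

Next comes the branching on state. Before going further, it helps to get familiar with the state objects used inside JobSupport, starting with how it implements isActive, isCompleted, and isCancelled from the Job interface: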

    public override val isActive: Boolean get() {
        val state = this.state
        return state is Incomplete && state.isActive
    }

    public final override val isCompleted: Boolean get() = state !is Incomplete

    public final override val isCancelled: Boolean get() {
        val state = this.state
        return state is CompletedExceptionally || (state is Finishing && state.isCancelling)
    }

From the code above we can conclude that, in JobSupport, state checks rely entirely on three types: Incomplete, CompletedExceptionally, and Finishing.

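  • If the state object's class implements the Incomplete interface, the coroutine has not completed yet
  • If the state object is a CompletedExceptionally (or a subclass), the coroutine is in the cancelled state
  • If the state object is a Finishing, the coroutine is either being cancelled or waiting for its children to finish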
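To make the relationship between these three types and the public properties concrete, here is a heavily simplified model. It is only a sketch: the classes below are hypothetical stand-ins that borrow the names of the internal types, and EmptyState and CompletedNormally are invented for this illustration. The three functions mirror the property getters quoted above.

```kotlin
// Hypothetical stand-ins for the internal JobSupport state types.
interface Incomplete { val isActive: Boolean }

// Models New (isActive = false) and Active (isActive = true).
class EmptyState(override val isActive: Boolean) : Incomplete

// Models Completing (isCancelling = false) and Cancelling (isCancelling = true).
class Finishing(val isCancelling: Boolean) : Incomplete {
    override val isActive: Boolean get() = !isCancelling
}

// Models the Cancelled final state.
class CompletedExceptionally(val cause: Throwable)

// Models the Completed final state (any value that is not Incomplete).
object CompletedNormally

fun isActive(state: Any): Boolean = state is Incomplete && state.isActive
fun isCompleted(state: Any): Boolean = state !is Incomplete
fun isCancelled(state: Any): Boolean =
    state is CompletedExceptionally || (state is Finishing && state.isCancelling)

fun main() {
    val states = listOf(
        "New" to EmptyState(false), "Active" to EmptyState(true),
        "Completing" to Finishing(false), "Cancelling" to Finishing(true),
        "Cancelled" to CompletedExceptionally(RuntimeException()), "Completed" to CompletedNormally
    )
    for ((name, s) in states) {
        println("$name: isActive=${isActive(s)} isCompleted=${isCompleted(s)} isCancelled=${isCancelled(s)}")
    }
}
```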

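Back to makeCancelling. When a coroutine is cancelled from the inside, it has not finished running, so its state is still an Incomplete. The line val causeException = causeExceptionCache ?: createCauseException(cause).also { causeExceptionCache = it } obtains the cancellation cause by calling createCauseException: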

 // cause is Throwable or ParentJob when cancelChild was invoked
    private fun createCauseException(cause: Any?): Throwable = when (cause) {
        is Throwable? -> cause ?: defaultCancellationException()
        else -> (cause as ParentJob).getChildJobCancellationCause()
    }

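When cause is a Throwable (or null), the method returns the exception itself or a default CancellationException. When cause is a ParentJob, it calls getChildJobCancellationCause on that ParentJob: the child asks its parent why it was cancelled, so the parent's cancellation cause is handed down to the child. Once the cause is available, tryMakeCancelling runs next; see mark 4.

Mark 4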

    // try make new Cancelling state on the condition that we're still in the expected state
    private fun tryMakeCancelling(state: Incomplete, rootCause: Throwable): Boolean {
        // 1
        assert { state !is Finishing } // only for non-finishing states
        assert { state.isActive } // only for active states
        // get state's list or else promote to list to correctly operate on child lists
        // 2
        val list = getOrPromoteCancellingList(state) ?: return false
        // Create cancelling state (with rootCause!)
        val cancelling = Finishing(list, false, rootCause)
        if (!_state.compareAndSet(state, cancelling)) return false
        // 3
        // Notify listeners
        notifyCancelling(list, rootCause)
        return true
    }

tryMakeCancelling performs three steps, matching the numbered comments in the code:

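  1. Assert the state, to make sure the method runs only in a valid state
  2. Obtain the NodeList of the state, promoting the state to a list if necessary (a NodeList stores a chain of JobNode objects, which act like references to the child coroutines and can be used to cancel them)
  3. Once the compareAndSet to the new Cancelling state succeeds, notify the listeners of the cancellation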
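The read, compare-and-set, retry pattern behind loopOnState and tryMakeCancelling can be sketched without any coroutine library. This is an illustration only: TinyJob and its state classes are hypothetical and far simpler than JobSupport (no children, no handler list).

```kotlin
import java.util.concurrent.atomic.AtomicReference

// Hypothetical, heavily simplified states; not the real JobSupport types.
sealed class State
object Active : State()
class Cancelling(val cause: Throwable) : State()
object Completed : State()

class TinyJob {
    private val state = AtomicReference<State>(Active)

    // Re-reads the state until the block returns a non-null result;
    // this mirrors the idea of loopOnState.
    private inline fun <T : Any> loopOnState(block: (State) -> T?): T {
        while (true) {
            block(state.get())?.let { return it }
        }
    }

    // Returns true if the job is (now) cancelling, false if it was too late.
    fun cancel(cause: Throwable): Boolean = loopOnState { current ->
        when (current) {
            // The CAS fails if another thread changed the state in between;
            // returning null makes the loop read the new state and retry.
            is Active -> if (state.compareAndSet(current, Cancelling(cause))) true else null
            is Cancelling -> true
            is Completed -> false
        }
    }

    val isCancelling: Boolean get() = state.get() is Cancelling
}

fun main() {
    val job = TinyJob()
    println(job.cancel(IllegalStateException("stop")))  // first call performs the transition
    println(job.isCancelling)
}
```

The design point is that no lock is held while deciding what to do: each attempt works on a snapshot of the state, and a failed compareAndSet simply means the snapshot was stale.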

notifyCancelling is analyzed under mark 5.

Mark 5

private fun notifyCancelling(list: NodeList, cause: Throwable) {
        // first cancel our own children
        onCancelling(cause)
        notifyHandlers<JobCancellingNode<*>>(list, cause)
        // then cancel parent
        cancelParent(cause) // tentative cancellation -- does not matter if there is no parent
    }

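onCancelling is an empty method in JobSupport; in the source examined here its only override is in ActorCoroutine, which is outside the scope of this analysis. The two calls that follow, notifyHandlers and cancelParent, notify the children and the parent of the cancellation respectively. Notifying the parent has no practical effect in this case, because the cause is a CancellationException.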

private inline fun <reified T: JobNode<*>> notifyHandlers(list: NodeList, cause: Throwable?) {
        var exception: Throwable? = null
        list.forEach<T> { node ->
            try {
                node.invoke(cause)
            } catch (ex: Throwable) {
                exception?.apply { addSuppressedThrowable(ex) } ?: run {
                    exception =  CompletionHandlerException("Exception in completion handler $node for $this", ex)
                }
            }
        }
        exception?.let { handleOnCompletionException(it) }
    }

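notifyHandlers iterates over the nodes in the NodeList and calls invoke on each one. In this example the node type is ChildHandleNode, which calls parentCancelled(parentJob: ParentJob) from the ChildJob interface. That method is also implemented by JobSupport and in turn calls cancelImpl, so cancelImpl is invoked recursively down the tree until every descendant is cancelled.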
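The recursive, downward propagation can be observed with a three-level hierarchy. The following is a sketch assuming kotlinx.coroutines; cancelTree and the started signal are names introduced for this example only.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: cancels only the top-level job and records what
// happens to its child and grandchild.
fun cancelTree(): List<String> = runBlocking {
    val events = mutableListOf<String>()
    val started = CompletableDeferred<Unit>()   // signals that the grandchild is running
    val parent = launch {
        launch {
            launch {
                started.complete(Unit)
                delay(Long.MAX_VALUE)
            }.invokeOnCompletion { cause -> events += "grandchild: ${cause is CancellationException}" }
            delay(Long.MAX_VALUE)
        }.invokeOnCompletion { cause -> events += "child: ${cause is CancellationException}" }
        delay(Long.MAX_VALUE)
    }
    started.await()          // wait until the whole hierarchy exists
    parent.cancelAndJoin()   // only the parent is cancelled explicitly
    events += "parent: ${parent.isCancelled}"
    events
}

fun main() = cancelTree().forEach(::println)
```

Neither the child nor the grandchild is cancelled explicitly, yet both completion handlers receive a CancellationException, which is consistent with the recursive cancelImpl calls described above.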

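Cancellation triggered by an exception

Cancellation caused by an exception propagates both upward and downward. How is that achieved? Consider the following code: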

    val parent = launch {
        println("ParentActive")
        val child = launch {
            println("before cancel")
            throw Exception("cancel")
            println("after cancel")
        }
        child.invokeOnCompletion {
            println("ChildCancelled")
        }
        child.join()
    }
    parent.invokeOnCompletion {
        println("ParentCancelled")
    } 

This code prints:

ParentActive
before cancel
ChildCancelled
ParentCancelled

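Exception-triggered cancellation has a lot in common with cancel-triggered cancellation. Understanding the whole process requires BaseContinuationImpl, which was covered in the previous article. The relevant excerpt is: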

 val outcome: Result<Any?> =
     try {
         val outcome = invokeSuspend(param)
         if (outcome === COROUTINE_SUSPENDED) return
         Result.success(outcome)
     } catch (exception: Throwable) {
         Result.failure(exception)
     }
 releaseIntercepted() // this state machine instance is terminating
 if (completion is BaseContinuationImpl) {
     // unrolling recursion via loop
     current = completion
     param = outcome
 } else {
     // top-level completion reached -- invoke and return
     completion.resumeWith(outcome)
     return
 }
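The try/catch in this excerpt means that an exception thrown by the coroutine body reaches the completion continuation as a Result.failure instead of escaping as a regular throw. A minimal sketch using only kotlin.coroutines from the standard library shows the effect; runAndCapture is a hypothetical helper, and the anonymous Continuation plays the role that AbstractCoroutine plays in the real library.

```kotlin
import kotlin.coroutines.Continuation
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.EmptyCoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical helper: starts a suspend block and returns whatever Result
// is delivered to its completion continuation.
fun runAndCapture(block: suspend () -> Int): Result<Int> {
    var captured: Result<Int>? = null
    block.startCoroutine(object : Continuation<Int> {
        override val context: CoroutineContext = EmptyCoroutineContext
        // Called once with either the return value or the thrown exception.
        override fun resumeWith(result: Result<Int>) {
            captured = result
        }
    })
    // The blocks used here never suspend, so the result is available immediately.
    return captured ?: error("coroutine suspended without resuming")
}

fun main() {
    println(runAndCapture { 42 })
    println(runAndCapture { throw IllegalStateException("boom") })
}
```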

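invokeSuspend is where the coroutine's code actually runs. If an exception is thrown inside invokeSuspend, execution of the body stops and resumeWith of AbstractCoroutine is called with the failure. The subsequent calls are: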

AbstractCoroutine.resumeWith(result: Result<T>)
-> JobSupport.makeCompletingOnce(proposedUpdate: Any?)
-> JobSupport.tryMakeCompleting(state: Any?, proposedUpdate: Any?)
-> JobSupport.tryMakeCompletingSlowPath(state: Incomplete, proposedUpdate: Any?)
-> JobSupport.notifyCancelling(list: NodeList, cause: Throwable)

After this series of calls we arrive back at notifyCancelling in JobSupport. notifyHandlers was analyzed in detail above, so the focus here is cancelParent:

 /**
     * The method that is invoked when the job is cancelled to possibly propagate cancellation to the parent.
     * Returns `true` if the parent is responsible for handling the exception, `false` otherwise.
     *
     * Invariant: never returns `false` for instances of [CancellationException], otherwise such exception
     * may leak to the [CoroutineExceptionHandler].
     */
    private fun cancelParent(cause: Throwable): Boolean {
        // Is scoped coroutine -- don't propagate, will be rethrown
        if (isScopedCoroutine) return true

        /* CancellationException is considered "normal" and parent usually is not cancelled when child produces it.
         * This allow parent to cancel its children (normally) without being cancelled itself, unless
         * child crashes and produce some other exception during its completion.
         */
        val isCancellation = cause is CancellationException
        val parent = parentHandle
        // No parent -- ignore CE, report other exceptions.
        if (parent === null || parent === NonDisposableHandle) {
            return isCancellation
        }

        // Notify parent but don't forget to check cancellation
        return parent.childCancelled(cause) || isCancellation
    }

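cancelParent first checks whether there is a parent. If there is, it calls childCancelled, which JobSupport implements: the parent checks whether the exception is something other than a CancellationException and, if so, calls its own cancelImpl. From there the flow is the same as for cancellation through cancel, except that the exception being passed is not a CancellationException.

Does cancel really stop execution?

Consider the following code and its output: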
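The upward and sideways effect of a failing child can be reproduced in a few lines. This is a sketch under stated assumptions: it uses kotlinx.coroutines, a separate root scope with its own Job so that the failure does not reach runBlocking, and a CoroutineExceptionHandler to observe the exception; failingChild is a name invented for the example.

```kotlin
import java.util.concurrent.CopyOnWriteArrayList
import kotlinx.coroutines.*

// Hypothetical helper: one child fails, and the effect on its parent and
// sibling is recorded.
fun failingChild(): List<String> = runBlocking {
    val events = CopyOnWriteArrayList<String>()   // thread-safe: handlers may run on other threads
    val handler = CoroutineExceptionHandler { _, e -> events += "handler: ${e.message}" }
    val scope = CoroutineScope(Job() + handler)   // root scope, independent of runBlocking

    val parent = scope.launch {
        launch { delay(Long.MAX_VALUE) }
            .invokeOnCompletion { events += "sibling cancelled: ${it is CancellationException}" }
        launch { throw IllegalStateException("boom") }   // fails with a non-cancellation exception
        delay(Long.MAX_VALUE)
    }
    parent.join()
    events += "parent cancelled: ${parent.isCancelled}"
    events.toList()
}

fun main() = failingChild().forEach(::println)
```

The failing child cancels its parent through childCancelled, and the parent in turn cancels the remaining sibling, so both directions of propagation show up in the recorded events.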

val parent = launch {
        println("ParentActive")
        val child = launch {
            println("ChildActive")
        }
        child.invokeOnCompletion {
            println("ChildCancelled")
        }
        println("parent: before cancel")
        cancel()
        println("parent: after cancel")
    }
    parent.invokeOnCompletion {
        println("ParentCancelled")
    }

This code prints:

ParentActive
parent: before cancel
parent: after cancel
ChildCancelled
ParentCancelled

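Why is "parent: after cancel" still printed even though cancel is called before it? As discussed in the previous article on continuations, all of the coroutine's code lives inside invokeSuspend, and the call to cancel and the println that follows it are ordinary statements at the same level. A successful cancel only changes the state of the coroutine's Job; it does not stop execution.

How can execution actually be stopped? One option is yield() (delay supports cancellation as well): calling yield() right after cancel makes the coroutine check for cancellation. How yield performs that check is not covered here. In this scenario, yield finds that the state is a Finishing object marked as cancelling and throws a CancellationException, which ends execution of the body.

Summary

This article focused on how Job is implemented in Kotlin coroutines, covering two important aspects of the Job interface: its lifecycle and cancellation. How Job achieves structured concurrency is covered in detail in the next article.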
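The difference a cancellable suspension point makes can be shown with a small variation of the example. This is a sketch assuming kotlinx.coroutines; cancelThenYield is a hypothetical helper that records events in a list instead of printing them.

```kotlin
import kotlinx.coroutines.*

// Hypothetical helper: cancels the coroutine from the inside and then
// reaches a cancellable suspension point.
fun cancelThenYield(): List<String> = runBlocking {
    val events = mutableListOf<String>()
    val job = launch {
        events += "before cancel"
        cancel()
        events += "after cancel"   // still runs: cancel() only changes the Job state
        yield()                    // checks for cancellation and throws CancellationException
        events += "after yield"    // never reached
    }
    job.join()
    events
}

fun main() = cancelThenYield().forEach(::println)
```

For CPU-bound code without suspension points, checking isActive or calling ensureActive() inside the loop serves the same purpose.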