0x03. CoroutineDispatcher Overview


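CoroutineDispatcher implements ContinuationInterceptor, which means it can intercept continuations. Dispatchers.kt defines several dispatchers: Default, IO, Main and Unconfined. All of them are subclasses of CoroutineDispatcher and can schedule coroutines. We pick one according to the nature of the work: Default suits CPU-bound tasks, IO suits IO-bound tasks, Main suits tasks that must run on the main thread (such as UI updates), and Unconfined schedules on the current thread.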

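As in the example at the end of the previous section, once we set the completion's context to Dispatchers.Default, both the coroutine body and the completion ran on the thread DefaultDispatcher-worker-2. The reason is that the intercepted step looks up the ContinuationInterceptor in the coroutine body's context and calls it, and the body's context comes from the context of the completion passed in earlier, which is our Dispatchers.Default.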

Once the interceptor has been retrieved, its interceptContinuation method is called. Here is the one Dispatchers.Default uses:

// CoroutineDispatcher

public final override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        DispatchedContinuation(this, continuation)

The implementation lives in the parent class and returns a DispatchedContinuation, yet another layer of wrapping. The next step is resumeWith, so let's look at that:

override fun resumeWith(result: Result<T>) {
    val context = continuation.context
    val state = result.toState()
    if (dispatcher.isDispatchNeeded(context)) {
        _state = state
        resumeMode = MODE_ATOMIC
        dispatcher.dispatch(context, this)
    } else {
        executeUnconfined(state, MODE_ATOMIC) {
            withCoroutineContext(this.context, countOrElement) {
                continuation.resumeWith(result)
            }
        }
    }
}
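The interception and dispatch steps shown so far can be imitated with the standard library alone. The following is a minimal sketch, not the library's implementation: ExecutorInterceptor and runOnPool are hypothetical names, the "dispatcher" is just a single-thread executor, and the wrapper always dispatches (as if isDispatchNeeded returned true).

```kotlin
import java.util.concurrent.CountDownLatch
import java.util.concurrent.Executors
import kotlin.coroutines.AbstractCoroutineContextElement
import kotlin.coroutines.Continuation
import kotlin.coroutines.ContinuationInterceptor
import kotlin.coroutines.CoroutineContext
import kotlin.coroutines.startCoroutine

// Hypothetical stand-in for a dispatcher: wraps every continuation so that
// resumption is posted to an executor instead of running inline.
class ExecutorInterceptor(private val post: (Runnable) -> Unit) :
    AbstractCoroutineContextElement(ContinuationInterceptor), ContinuationInterceptor {

    override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
        object : Continuation<T> {
            override val context: CoroutineContext get() = continuation.context
            override fun resumeWith(result: Result<T>) {
                // Plays the role of the "dispatch needed" branch: hand off to the pool.
                post(Runnable { continuation.resumeWith(result) })
            }
        }
}

// Starts a coroutine whose completion context carries the interceptor and
// returns the name of the thread the coroutine body ran on.
fun runOnPool(): String {
    val pool = Executors.newSingleThreadExecutor { r -> Thread(r, "toy-worker") }
    val done = CountDownLatch(1)
    var bodyThread = ""
    val body: suspend () -> Unit = { bodyThread = Thread.currentThread().name }
    body.startCoroutine(object : Continuation<Unit> {
        override val context: CoroutineContext = ExecutorInterceptor { pool.execute(it) }
        override fun resumeWith(result: Result<Unit>) { done.countDown() }
    })
    done.await()
    pool.shutdown()
    return bodyThread
}

fun main() {
    println(runOnPool()) // toy-worker
}
```

The body runs on the executor's thread only because the completion's context contains the interceptor, which is the same reason the earlier example ended up on a DefaultDispatcher worker.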

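CoroutineDispatcher.isDispatchNeeded returns true by default, so the first branch is the usual path; we will come back to the else branch later. dispatcher is the object on which interceptContinuation was called, which brings us back to Dispatchers.Default. Its dispatch looks like this: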

// Dispatcher

override fun dispatch(context: CoroutineContext, block: Runnable): Unit = coroutineScheduler.dispatch(block)

private var coroutineScheduler = createScheduler()

private fun createScheduler() =
    CoroutineScheduler(corePoolSize, maxPoolSize, idleWorkerKeepAliveNs, schedulerName)

// CoroutineScheduler
fun dispatch(block: Runnable, taskContext: TaskContext = NonBlockingContext, tailDispatch: Boolean = false) {
    trackTask() // this is needed for virtual time support
    // create the task
    val task = createTask(block, taskContext)
    // try to submit the task to the local queue and act depending on the result
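    // the Worker of the current thread (null if the caller is not a scheduler thread)
    val currentWorker = currentWorker()
    // submit the task to the local queue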
    val notAdded = currentWorker.submitToLocalQueue(task, tailDispatch)
    if (notAdded != null) {
        // fall back to the global queue
        if (!addToGlobalQueue(notAdded)) {
            // Global queue is closed in the last step of close/shutdown -- no more tasks should be accepted
            throw RejectedExecutionException("$schedulerName was terminated")
        }
    }
    val skipUnpark = tailDispatch && currentWorker != null
    // Checking 'task' instead of 'notAdded' is completely okay
    if (task.mode == TASK_NON_BLOCKING) {
        if (skipUnpark) return
        // signal a CPU thread to start working
        signalCpuWork()
    } else {
        // Increment blocking tasks anyway
        signalBlockingWork(skipUnpark = skipUnpark)
    }
}

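CoroutineScheduler is the coroutine scheduler, and Task and Worker are its two supporting types. A Task represents one unit of coroutine work and implements Runnable; a Worker represents an executor of tasks, which on the JVM extends Thread and is therefore bound to a thread. Each Worker has its own task queue and the scheduler has queues too: a Worker's queue is normally private to it, while the scheduler's queues are available to any Worker that needs work. dispatch first tries to add the task to the local queue of the current thread's Worker, falls back to the global queue only if that fails, and finally signals a CPU thread to start working (the notion of a CPU thread is explained below).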
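The "local queue first, global queue as fallback" rule can be sketched with a toy model. ToyWorker and ToyScheduler are hypothetical names, tasks are plain strings, and the local queue simply rejects new tasks when full; none of this is the library's code.

```kotlin
import java.util.concurrent.ConcurrentLinkedQueue

// Toy model: a worker with a small bounded local queue.
class ToyWorker(private val capacity: Int) {
    val localQueue = ArrayDeque<String>()

    // Returns null when the task was accepted, or the task itself when the
    // local queue is full (mirroring the "notAdded" result in dispatch).
    fun submitToLocalQueue(task: String): String? {
        if (localQueue.size >= capacity) return task
        localQueue.addLast(task)
        return null
    }
}

// Toy model: a scheduler-wide queue shared by all workers.
class ToyScheduler {
    val globalQueue = ConcurrentLinkedQueue<String>()

    // currentWorker is null when the caller is not one of the scheduler's threads.
    fun dispatch(task: String, currentWorker: ToyWorker?) {
        val notAdded = if (currentWorker == null) task else currentWorker.submitToLocalQueue(task)
        if (notAdded != null) globalQueue.add(notAdded)
        // A real scheduler would now wake up or create a worker.
    }
}

fun main() {
    val scheduler = ToyScheduler()
    val worker = ToyWorker(capacity = 2)
    listOf("a", "b", "c").forEach { scheduler.dispatch(it, worker) }
    scheduler.dispatch("d", currentWorker = null)
    println(worker.localQueue)     // [a, b]
    println(scheduler.globalQueue) // [c, d]
}
```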

Three calls in dispatch strike me as the most important: submitToLocalQueue, addToGlobalQueue and signalCpuWork. Let's go through them in order.

First, submitToLocalQueue:

private fun Worker?.submitToLocalQueue(task: Task, tailDispatch: Boolean): Task? {
    // the current thread has no Worker: hand the task back
    if (this == null) return task
    // the Worker has terminated and cannot be used
    if (state === WorkerState.TERMINATED) return task
    // a blocking Worker must not take non-blocking tasks
    if (task.mode == TASK_NON_BLOCKING && state === WorkerState.BLOCKING) {
        return task
    }
    mayHaveLocalTasks = true
    return localQueue.add(task, fair = tailDispatch)
}

Both Workers and Tasks come in blocking and non-blocking flavors. Their types are:

// worker state
enum class WorkerState {
    CPU_ACQUIRED,

    BLOCKING,

    PARKING,

    DORMANT,

    TERMINATED
}

// task mode
internal const val TASK_NON_BLOCKING = 0

internal const val TASK_PROBABLY_BLOCKING = 1

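A Task is created with a TaskContext, which carries the mode, and that context is supplied by the dispatcher when it dispatches. Tasks dispatched by the Default dispatcher are normally TASK_NON_BLOCKING, while tasks dispatched by the IO dispatcher are TASK_PROBABLY_BLOCKING.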

WorkerState is more involved because it keeps changing. A Worker starts in DORMANT and ends in TERMINATED, and while it is scheduling tasks it moves between CPU_ACQUIRED, BLOCKING and PARKING, as we will see shortly.

Next, addToGlobalQueue:

private fun addToGlobalQueue(task: Task): Boolean {
    return if (task.isBlocking) {
        globalBlockingQueue.addLast(task)
    } else {
        globalCpuQueue.addLast(task)
    }
}

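There are two global queues, one for blocking tasks and one for non-blocking tasks. Both are of type LockFreeTaskQueue, which stores tasks in an array-backed core and links to a larger core when the current one fills up. Because they may be read and written concurrently, they rely heavily on CAS operations for thread safety.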

Finally, signalCpuWork:

fun signalCpuWork() {
    if (tryUnpark()) return
    if (tryCreateWorker()) return
    tryUnpark()
}

private fun tryUnpark(): Boolean {
    while (true) {
        val worker = parkedWorkersStackPop() ?: return false
        if (worker.workerCtl.compareAndSet(PARKED, CLAIMED)) {
            LockSupport.unpark(worker)
            return true
        }
    }
}

private fun tryCreateWorker(state: Long = controlState.value): Boolean {
    val created = createdWorkers(state)
    val blocking = blockingTasks(state)
    val cpuWorkers = (created - blocking).coerceAtLeast(0)
    /*
        * We check how many threads are there to handle non-blocking work,
        * and create one more if we have not enough of them.
        */
    if (cpuWorkers < corePoolSize) {
        val newCpuWorkers = createNewWorker()
        // If we've created the first cpu worker and corePoolSize > 1 then create
        // one more (second) cpu worker, so that stealing between them is operational
        if (newCpuWorkers == 1 && corePoolSize > 1) createNewWorker()
        if (newCpuWorkers > 0) return true
    }
    return false
}

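signalCpuWork first calls tryUnpark, which pops a parked thread off parkedWorkersStack, updates its state and resumes it. If no parked Worker is available, it calls tryCreateWorker to create one; if enough threads already exist, it tries unpark once more. In tryCreateWorker the Worker count grows with the number of blocking tasks, and the difference between the two is treated as the number of CPU Workers. When there are many blocking tasks and that difference drops below the core pool size, a new Worker is created right away. As tasks are consumed the blocking count falls, the difference reaches the core pool size again, and no more Workers are created. An idle Worker that cannot find a task parks and eventually exits once its keep-alive time runs out.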
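To make the arithmetic concrete, here is the same comparison pulled out into a pure function. The name needsNewCpuWorker is hypothetical; the expression is the one used in tryCreateWorker above.

```kotlin
// Same check as in tryCreateWorker, extracted so individual cases can be tried.
fun needsNewCpuWorker(created: Int, blocking: Int, corePoolSize: Int): Boolean {
    val cpuWorkers = (created - blocking).coerceAtLeast(0)
    return cpuWorkers < corePoolSize
}

fun main() {
    // 4 workers, no blocking tasks, 4 cores: enough CPU workers already.
    println(needsNewCpuWorker(created = 4, blocking = 0, corePoolSize = 4)) // false
    // 3 of the 4 workers are tied up in blocking tasks: only 1 CPU worker left.
    println(needsNewCpuWorker(created = 4, blocking = 3, corePoolSize = 4)) // true
    // 9 workers, 3 blocking: 6 CPU workers, more than the 4 cores need.
    println(needsNewCpuWorker(created = 9, blocking = 3, corePoolSize = 4)) // false
}
```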

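A few words about the Workers in CoroutineScheduler. The scheduler itself implements Executor, so it can run tasks through execute just like a thread pool, and it takes similar configuration parameters: corePoolSize (number of core threads), maxPoolSize (maximum number of threads), idleWorkerKeepAliveNs (how long an idle thread may live) and schedulerName (the scheduler's name). Core threads are referred to as CPU Workers and the remaining threads as blocking Workers; the two are told apart by the WorkerState mentioned earlier. CPU_ACQUIRED means the Worker holds the right to run CPU tasks and will prefer CPU-bound work, while BLOCKING means it is running a blocking task.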

Notice that both the created-Worker count and the blocking-task count used in this calculation are read from controlState. So what is that?

private const val BLOCKING_SHIFT = 21 // 2M threads max
private const val CREATED_MASK: Long = (1L shl BLOCKING_SHIFT) - 1
private const val BLOCKING_MASK: Long = CREATED_MASK shl BLOCKING_SHIFT
private const val CPU_PERMITS_SHIFT = BLOCKING_SHIFT * 2
private const val CPU_PERMITS_MASK = CREATED_MASK shl CPU_PERMITS_SHIFT

private val controlState = atomic(corePoolSize.toLong() shl CPU_PERMITS_SHIFT)

private val createdWorkers: Int inline get() = (controlState.value and CREATED_MASK).toInt()
// ...

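It is an atomic wrapper around a Long. Three masks, CREATED_MASK, BLOCKING_MASK and CPU_PERMITS_MASK, split the value into 21-bit fields; from low to high, bit operations extract the number of created Workers, the number of blocking tasks and the number of available CPU permits. Packing several counters into one word so that they can be read and updated together atomically is a classic technique.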
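The layout is easier to see with a small round trip. The constants below are copied from the listing above; pack, blockingTasks and availableCpuPermits are helper functions written for this sketch (the listing only shows createdWorkers), so treat their exact form as an assumption.

```kotlin
const val BLOCKING_SHIFT = 21 // each field is 21 bits wide
const val CREATED_MASK: Long = (1L shl BLOCKING_SHIFT) - 1
const val BLOCKING_MASK: Long = CREATED_MASK shl BLOCKING_SHIFT
const val CPU_PERMITS_SHIFT = BLOCKING_SHIFT * 2
const val CPU_PERMITS_MASK: Long = CREATED_MASK shl CPU_PERMITS_SHIFT

// Hypothetical helper: builds a state word from its three fields.
fun pack(created: Int, blocking: Int, permits: Int): Long =
    created.toLong() or
        (blocking.toLong() shl BLOCKING_SHIFT) or
        (permits.toLong() shl CPU_PERMITS_SHIFT)

fun createdWorkers(state: Long): Int = (state and CREATED_MASK).toInt()
fun blockingTasks(state: Long): Int = ((state and BLOCKING_MASK) shr BLOCKING_SHIFT).toInt()
fun availableCpuPermits(state: Long): Int = ((state and CPU_PERMITS_MASK) shr CPU_PERMITS_SHIFT).toInt()

fun main() {
    val state = pack(created = 3, blocking = 1, permits = 8)
    println(listOf(createdWorkers(state), blockingTasks(state), availableCpuPermits(state))) // [3, 1, 8]

    // Adding 1 to the whole word bumps only the lowest field (created workers).
    val afterCreate = state + 1
    println(listOf(createdWorkers(afterCreate), blockingTasks(afterCreate), availableCpuPermits(afterCreate))) // [4, 1, 8]
}
```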

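Back to signalCpuWork: on a fresh scheduler there is nothing to unpark, so tryCreateWorker creates a Worker to run the task. Let's see how that Worker runs.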

override fun run() = runWorker()

private fun runWorker() {
    var rescanned = false
    // keep running until terminated
    while (!isTerminated && state != WorkerState.TERMINATED) {
        // fetch a task
        val task = findTask(mayHaveLocalTasks)
        if (task != null) {
            rescanned = false
            minDelayUntilStealableTaskNs = 0L
            // run it
            executeTask(task)
            continue
        } else {
            mayHaveLocalTasks = false
        }

        // non-zero means there are tasks that will become stealable soon: rescan once, then park briefly and retry
        if (minDelayUntilStealableTaskNs != 0L) {
            if (!rescanned) {
                rescanned = true
            } else {
                rescanned = false
                tryReleaseCpu(WorkerState.PARKING)
                interrupted()
                LockSupport.parkNanos(minDelayUntilStealableTaskNs)
                minDelayUntilStealableTaskNs = 0L
            }
            continue
        }

        // there really is nothing to do: try to park
        tryPark()
    }
    // release the CPU permit; the thread exits
    tryReleaseCpu(WorkerState.TERMINATED)
}

The overall flow is fairly clear. Three calls deserve attention: findTask, executeTask and tryPark. As before, we take them in order.

fun findTask(scanLocalQueue: Boolean): Task? {
    if (tryAcquireCpuPermit()) return findAnyTask(scanLocalQueue)
    // If we can't acquire a CPU permit -- attempt to find blocking task
    val task = if (scanLocalQueue) {
        localQueue.poll() ?: globalBlockingQueue.removeFirstOrNull()
    } else {
        globalBlockingQueue.removeFirstOrNull()
    }
    return task ?: trySteal(blockingOnly = true)
}

private fun tryAcquireCpuPermit(): Boolean = when {
    state == WorkerState.CPU_ACQUIRED -> true
    this@CoroutineScheduler.tryAcquireCpuPermit() -> {
        state = WorkerState.CPU_ACQUIRED
        true
    }
    else -> false
}

private fun findAnyTask(scanLocalQueue: Boolean): Task? {
    if (scanLocalQueue) {
        val globalFirst = nextInt(2 * corePoolSize) == 0
        if (globalFirst) pollGlobalQueues()?.let { return it }
        localQueue.poll()?.let { return it }
        if (!globalFirst) pollGlobalQueues()?.let { return it }
    } else {
        pollGlobalQueues()?.let { return it }
    }
    return trySteal(blockingOnly = false)
}

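findTask starts by trying to acquire a CPU permit, which I take to mean the right to run CPU tasks. CPU-bound tasks occupy the CPU for long stretches and are sensitive to context switches, so too many competing threads actually hurts performance. For this kind of task the scheduler therefore caps the number of executing threads at the core pool size: each acquisition decrements the number of available permits until it reaches 0, and that is what separates the two kinds of Worker. The two kinds also look for work slightly differently. A CPU Worker uses findAnyTask, which takes a CPU or blocking task from the local queue or the global queues, randomly choosing now and then to check the global queues first. A blocking Worker looks only for blocking tasks, in the local queue or the global blocking queue. What they share is that when neither the local nor the global queues yield a task, they steal one from another Worker's local queue to stay busy. This is called work stealing.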
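Permit acquisition as described above boils down to a CAS loop on the packed state word. The sketch below is an illustration under assumptions: PermitCounter is a hypothetical class, it uses java.util.concurrent.atomic.AtomicLong rather than the library's atomic wrapper, and it keeps only the permits field (bits 42 and up, matching CPU_PERMITS_SHIFT).

```kotlin
import java.util.concurrent.atomic.AtomicLong

const val PERMITS_SHIFT = 42             // assumption: same position as CPU_PERMITS_SHIFT
const val FIELD_MASK: Long = (1L shl 21) - 1

// Hypothetical permit counter: starts with corePoolSize permits.
class PermitCounter(corePoolSize: Int) {
    private val state = AtomicLong(corePoolSize.toLong() shl PERMITS_SHIFT)

    fun available(): Int = ((state.get() shr PERMITS_SHIFT) and FIELD_MASK).toInt()

    // Takes one permit if any is left; retries when another thread changed the word.
    fun tryAcquire(): Boolean {
        while (true) {
            val current = state.get()
            if (((current shr PERMITS_SHIFT) and FIELD_MASK) == 0L) return false
            if (state.compareAndSet(current, current - (1L shl PERMITS_SHIFT))) return true
        }
    }

    // Gives one permit back, e.g. before running a blocking task or parking.
    fun release() {
        state.addAndGet(1L shl PERMITS_SHIFT)
    }
}

fun main() {
    val permits = PermitCounter(corePoolSize = 2)
    println(permits.tryAcquire()) // true
    println(permits.tryAcquire()) // true
    println(permits.tryAcquire()) // false: both permits are taken
    permits.release()
    println(permits.available())  // 1
}
```

With two permits, at most two workers can be in the CPU_ACQUIRED role at once; a third worker falls through to the blocking-only search path.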

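Work stealing is a mechanism in which idle workers actively fetch tasks from others to improve throughput. Java offers ForkJoinPool for this, and the scheduler now implements the same idea. Here is the implementation: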

private fun trySteal(blockingOnly: Boolean): Task? {
    assert { localQueue.size == 0 }
    val created = createdWorkers
    // 0 to await an initialization and 1 to avoid excess stealing on single-core machines
    if (created < 2) {
        return null
    }

    var currentIndex = nextInt(created)
    var minDelay = Long.MAX_VALUE
    repeat(created) {
        ++currentIndex
        if (currentIndex > created) currentIndex = 1
        val worker = workers[currentIndex]
        if (worker !== null && worker !== this) {
            assert { localQueue.size == 0 }
            val stealResult = if (blockingOnly) {
                localQueue.tryStealBlockingFrom(victim = worker.localQueue)
            } else {
                localQueue.tryStealFrom(victim = worker.localQueue)
            }
            if (stealResult == TASK_STOLEN) {
                return localQueue.poll()
            } else if (stealResult > 0) {
                minDelay = min(minDelay, stealResult)
            }
        }
    }
    minDelayUntilStealableTaskNs = if (minDelay != Long.MAX_VALUE) minDelay else 0
    return null
}

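In short, it walks the workers array starting from a random index, calls tryStealFrom (or tryStealBlockingFrom) against each Worker's local queue, and uses the return value to decide what to do, finally returning either a runnable task or null. stealResult has three possible outcomes: TASK_STOLEN on success; NOTHING_TO_STEAL when the victim has no task available; or, when the time elapsed since the victim's most recently added task is below a threshold, the threshold minus that elapsed time. runWorker parks for that long. In other words, a task added to a Worker's local queue must wait at least the threshold before another Worker may steal it, presumably to reduce contention over short intervals.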
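The three outcomes can be illustrated with a single-slot toy victim. Everything here is an assumption made for the sketch: the numeric values of TASK_STOLEN and NOTHING_TO_STEAL, the 100,000 ns threshold, and the names TimedTask, ToyLocalQueue and the three-argument tryStealFrom. Only the decision logic follows the description above.

```kotlin
const val TASK_STOLEN = -1L            // assumed value
const val NOTHING_TO_STEAL = -2L       // assumed value
const val STEAL_THRESHOLD_NS = 100_000L // assumed threshold

class TimedTask(val name: String, val submissionTimeNs: Long)

// Toy victim: only the most recently added task is modelled.
class ToyLocalQueue {
    var lastScheduled: TimedTask? = null
}

// Returns TASK_STOLEN, NOTHING_TO_STEAL, or the remaining wait in nanoseconds.
fun tryStealFrom(victim: ToyLocalQueue, stolen: MutableList<TimedTask>, nowNs: Long): Long {
    val candidate = victim.lastScheduled ?: return NOTHING_TO_STEAL
    val age = nowNs - candidate.submissionTimeNs
    if (age < STEAL_THRESHOLD_NS) return STEAL_THRESHOLD_NS - age // too fresh, come back later
    victim.lastScheduled = null
    stolen.add(candidate)
    return TASK_STOLEN
}

fun main() {
    val victim = ToyLocalQueue().apply { lastScheduled = TimedTask("t", submissionTimeNs = 0L) }
    val stolen = mutableListOf<TimedTask>()
    println(tryStealFrom(victim, stolen, nowNs = 30_000L))  // 70000: wait this long
    println(tryStealFrom(victim, stolen, nowNs = 150_000L)) // -1: stolen
    println(tryStealFrom(victim, stolen, nowNs = 150_000L)) // -2: nothing left
}
```

A positive return value is what trySteal folds into minDelay, which in turn becomes the parkNanos duration in runWorker.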

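A quick word on the local queue, WorkQueue, without showing its code. It holds a fixed-size array of 128 slots and two indices, producerIndex and consumerIndex, marking how far elements have been written and how far they have been consumed. When the queue is full, add hands a task back so that it can go to the global queue. It also keeps lastScheduledTask, the most recently added task, and polling returns that task first, which gives the so-called semi-FIFO ordering.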
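A single-threaded sketch of the semi-FIFO behaviour follows. SemiFifoQueue is a hypothetical class: it replaces the ring buffer and its two indices with an ArrayDeque and uses no atomics, so it only demonstrates the ordering, not the concurrency.

```kotlin
// Toy semi-FIFO queue: one "most recent" slot in front of a bounded FIFO buffer.
class SemiFifoQueue(private val capacity: Int = 128) {
    private val buffer = ArrayDeque<String>()
    private var lastScheduled: String? = null

    // Returns null if everything fit, or the task that must go to the global queue.
    fun add(task: String): String? {
        val previous = lastScheduled
        lastScheduled = task                 // the newest task always takes the slot
        if (previous == null) return null
        if (buffer.size >= capacity) return previous
        buffer.addLast(previous)             // the displaced task joins the FIFO part
        return null
    }

    // The most recently added task comes out first, then the rest in FIFO order.
    fun poll(): String? {
        val last = lastScheduled
        if (last != null) {
            lastScheduled = null
            return last
        }
        return buffer.removeFirstOrNull()
    }
}

fun main() {
    val queue = SemiFifoQueue()
    listOf("a", "b", "c").forEach { queue.add(it) }
    println(listOf(queue.poll(), queue.poll(), queue.poll(), queue.poll())) // [c, a, b, null]
}
```

Running the newest task first tends to help when a coroutine resumes another one that it is about to wait for, while the FIFO part keeps older tasks from starving.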

Next is executeTask, which calls four methods in sequence. We look at each in turn:

  1. idleReset
private fun idleReset(mode: Int) {
    terminationDeadline = 0L // reset deadline for termination
    if (state == WorkerState.PARKING) {
        assert { mode == TASK_PROBABLY_BLOCKING }
        state = WorkerState.BLOCKING
    }
}

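This resets the termination deadline and moves a Worker in PARKING state to BLOCKING. terminationDeadline is normally set just before the Worker parks; after it resumes, it checks whether another Worker woke it up within that period, and if not it may consider terminating itself. If the Worker is in PARKING state here, it must have come back from the parked stack without obtaining a CPU permit, so it is a blocking Worker.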

  2. beforeTask
private fun beforeTask(taskMode: Int) {
    if (taskMode == TASK_NON_BLOCKING) return
    // Always notify about new work when releasing CPU-permit to execute some blocking task
    if (tryReleaseCpu(WorkerState.BLOCKING)) {
        signalCpuWork()
    }
}

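Before running a blocking task, the Worker gives up its CPU permit. If it actually held one, it calls signalCpuWork so that another Worker can take over CPU work.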

  3. runSafely
fun runSafely(task: Task) {
    try {
        task.run()
    } catch (e: Throwable) {
        val thread = Thread.currentThread()
        thread.uncaughtExceptionHandler.uncaughtException(thread, e)
    } finally {
        unTrackTask()
    }
}

Runs the task and passes any exception to the thread's uncaughtExceptionHandler.

  4. afterTask
private fun afterTask(taskMode: Int) {
    if (taskMode == TASK_NON_BLOCKING) return
    decrementBlockingTasks()
    val currentState = state
    // Shutdown sequence of blocking dispatcher
    if (currentState !== WorkerState.TERMINATED) {
        assert { currentState == WorkerState.BLOCKING } // "Expected BLOCKING state, but has $currentState"
        state = WorkerState.DORMANT
    }
}

Decrements the blocking-task count and resets the blocking Worker's state to DORMANT.

Last comes tryPark:

private fun tryPark() {
    if (!inStack()) {
        parkedWorkersStackPush(this)
        return
    }
    assert { localQueue.size == 0 }
    workerCtl.value = PARKED // Update value once
    while (inStack() && workerCtl.value == PARKED) { // Prevent spurious wakeups
        if (isTerminated || state == WorkerState.TERMINATED) break
        tryReleaseCpu(WorkerState.PARKING)
        interrupted() // Cleanup interruptions
        park()
    }
}

fun parkedWorkersStackPush(worker: Worker): Boolean {
    if (worker.nextParkedWorker !== NOT_IN_STACK) return false // already in stack, bail out

    parkedWorkersStack.loop { top ->
        val index = (top and PARKED_INDEX_MASK).toInt()
        val updVersion = (top + PARKED_VERSION_INC) and PARKED_VERSION_MASK
        val updIndex = worker.indexInArray
        assert { updIndex != 0 } // only this worker can push itself, cannot be terminated
        worker.nextParkedWorker = workers[index]

        if (parkedWorkersStack.compareAndSet(top, updVersion or updIndex.toLong())) return true
    }
}

private fun park() {
    if (terminationDeadline == 0L) terminationDeadline = System.nanoTime() + idleWorkerKeepAliveNs
    // actually park
    LockSupport.parkNanos(idleWorkerKeepAliveNs)
    if (System.nanoTime() - terminationDeadline >= 0) {
        terminationDeadline = 0L // if attempt to terminate worker fails we'd extend deadline again
        tryTerminateWorker()
    }
}

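If the Worker is not yet on the parked stack, it pushes itself and returns so that the caller scans once more. If it is already on the stack, it marks itself PARKED, releases its CPU permit, and then parks in a loop until another Worker wakes it or it times out and terminates. The parked stack was mentioned earlier, and here we finally see what it is: a single Long made of two parts. The low bits hold the array index of the Worker on top of the stack, and the high bits hold a version number that guards against the ABA problem under concurrency. The Worker being pushed also stores the previous top in its nextParkedWorker field, so that the rest of the stack can be restored when it is popped.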
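The index-plus-version encoding can be tried out in isolation. The following is a sketch under assumptions: ParkedStack is a hypothetical class, workers are represented only by their 1-based index (0 meaning "empty"), the link to the previous top lives in an IntArray instead of a nextParkedWorker field, and the demo is single-threaded.

```kotlin
import java.util.concurrent.atomic.AtomicLong

const val INDEX_BITS = 21
const val INDEX_MASK: Long = (1L shl INDEX_BITS) - 1
const val VERSION_INC: Long = 1L shl INDEX_BITS
const val VERSION_MASK: Long = INDEX_MASK.inv()

// Toy parked-worker stack: top = (version << 21) | index of the top worker.
class ParkedStack(maxWorkers: Int) {
    private val top = AtomicLong(0L)
    private val next = IntArray(maxWorkers + 1) // next[i] = worker below i on the stack

    fun push(index: Int) {
        while (true) {
            val current = top.get()
            next[index] = (current and INDEX_MASK).toInt() // remember the old top
            val updated = ((current + VERSION_INC) and VERSION_MASK) or index.toLong()
            if (top.compareAndSet(current, updated)) return
        }
    }

    // Returns the index of the popped worker, or 0 when the stack is empty.
    fun pop(): Int {
        while (true) {
            val current = top.get()
            val index = (current and INDEX_MASK).toInt()
            if (index == 0) return 0
            val updated = ((current + VERSION_INC) and VERSION_MASK) or next[index].toLong()
            if (top.compareAndSet(current, updated)) return index
        }
    }

    fun version(): Long = top.get() ushr INDEX_BITS
}

fun main() {
    val stack = ParkedStack(maxWorkers = 8)
    stack.push(3)
    stack.push(5)
    println(listOf(stack.pop(), stack.pop(), stack.pop())) // [5, 3, 0]
    println(stack.version()) // 4: two pushes and two successful pops
}
```

Because every successful update bumps the version, a CAS that read the top before a pop-and-repush of the same index fails instead of silently corrupting the links, which is the ABA protection mentioned above.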

On timeout the Worker calls tryTerminateWorker to terminate itself. That is mostly bookkeeping for state and the workers array, so I will skip it here.
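One detail in park() is worth a short illustration: the timeout test is written as a subtraction compared against zero rather than a direct comparison of the two timestamps. System.nanoTime values may be arbitrarily large or negative, and the subtraction form stays correct even when the deadline wraps around. The function name deadlineReached is hypothetical; the expression is the one from park().

```kotlin
// Overflow-safe "has the deadline passed?" check, written as in park().
fun deadlineReached(nowNs: Long, deadlineNs: Long): Boolean = nowNs - deadlineNs >= 0

fun main() {
    println(deadlineReached(nowNs = 1_000L, deadlineNs = 2_000L)) // false
    println(deadlineReached(nowNs = 2_500L, deadlineNs = 2_000L)) // true

    // A clock value close to Long.MAX_VALUE: adding the keep-alive wraps around.
    val now = Long.MAX_VALUE - 10
    val deadline = now + 60                       // overflows to a negative number
    println(deadlineReached(now, deadline))       // false: 60 ns are still left
    println(now >= deadline)                      // true: the naive comparison is wrong
}
```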

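The scheduler's implementation is undeniably complex. After all, it takes over part of the job of task scheduling from the operating system, and it is fair to call it one of the cores of coroutines.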