CoroutineDispatcher is a subclass of ContinuationInterceptor, which means it also has the ability to intercept. Dispatchers.kt defines several CoroutineDispatchers: Default, IO, Main, and Unconfined. All are CoroutineDispatcher subclasses capable of scheduling coroutines, and we choose one according to the nature of the task: Default suits CPU-bound work, IO suits IO-bound work, Main suits work that must run on the main thread (such as UI updates), and Unconfined schedules on the current thread.
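As a rough intuition for why the choice matters, here is a JDK analogy, not the real scheduler; the helper names are mine, and the 64-thread figure mirrors kotlinx.coroutines' default cap for Dispatchers.IO:

```kotlin
import java.util.concurrent.Executors
import java.util.concurrent.ExecutorService

// CPU-bound work gains nothing from more threads than cores, so a
// Default-like pool is sized to the core count (with a floor of 2).
fun defaultLikePoolSize(cores: Int): Int = cores.coerceAtLeast(2)

// IO tasks spend most of their time blocked, so an IO-like pool allows
// far more threads; 64 mirrors kotlinx.coroutines' default IO limit.
fun ioLikePoolSize(cores: Int): Int = maxOf(64, cores)

fun makePool(threads: Int): ExecutorService = Executors.newFixedThreadPool(threads)
```

The point is only the sizing asymmetry: adding threads helps blocked tasks but hurts compute-heavy ones.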
As in the example at the end of the previous section, once we set the completion's context to Dispatchers.Default, both the coroutine body and the completion run on DefaultDispatcher-worker-2. This is because the intercepted step pulls the ContinuationInterceptor out of the coroutine body's context and invokes it, and that context comes from the completion's context passed in earlier, which is exactly our Dispatchers.Default.
After retrieving the interceptor, its interceptContinuation method is called. Let's look at Dispatchers.Default's interceptContinuation:
// CoroutineDispatcher
public final override fun <T> interceptContinuation(continuation: Continuation<T>): Continuation<T> =
DispatchedContinuation(this, continuation)
The concrete implementation lives in the parent class and returns a DispatchedContinuation, yet another layer of wrapping. Next comes resumeWith; let's read on:
override fun resumeWith(result: Result<T>) {
val context = continuation.context
val state = result.toState()
if (dispatcher.isDispatchNeeded(context)) {
_state = state
resumeMode = MODE_ATOMIC
dispatcher.dispatch(context, this)
} else {
executeUnconfined(state, MODE_ATOMIC) {
withCoroutineContext(this.context, countOrElement) {
continuation.resumeWith(result)
}
}
}
}
CoroutineDispatcher's isDispatchNeeded returns true by default, so execution normally takes the first branch (we will analyze the else branch later). dispatcher is the object whose interceptContinuation was called, which brings us back to Dispatchers.Default; its dispatch looks like this:
// Dispatcher
override fun dispatch(context: CoroutineContext, block: Runnable): Unit = coroutineScheduler.dispatch(block)
private var coroutineScheduler = createScheduler()
private fun createScheduler() =
CoroutineScheduler(corePoolSize, maxPoolSize, idleWorkerKeepAliveNs, schedulerName)
// CoroutineScheduler
fun dispatch(block: Runnable, taskContext: TaskContext = NonBlockingContext, tailDispatch: Boolean = false) {
    trackTask() // this is needed for virtual time support
    // create the task
    val task = createTask(block, taskContext)
    // try to submit the task to the local queue and act depending on the result
    // the Worker of the current thread, if any
    val currentWorker = currentWorker()
    // submit the task
    val notAdded = currentWorker.submitToLocalQueue(task, tailDispatch)
    if (notAdded != null) {
        // fall back to the global queue
        if (!addToGlobalQueue(notAdded)) {
            // Global queue is closed in the last step of close/shutdown -- no more tasks should be accepted
            throw RejectedExecutionException("$schedulerName was terminated")
        }
    }
    val skipUnpark = tailDispatch && currentWorker != null
    // Checking 'task' instead of 'notAdded' is completely okay
    if (task.mode == TASK_NON_BLOCKING) {
        if (skipUnpark) return
        // wake up a CPU worker
        signalCpuWork()
    } else {
        // Increment blocking tasks anyway
        signalBlockingWork(skipUnpark = skipUnpark)
    }
}
CoroutineScheduler is the coroutine scheduler. Task and Worker are its related types: the former represents a coroutine task and implements Runnable; the latter represents an executor, which on the JVM extends Thread and is bound to a thread. Both each Worker and the scheduler have their own task queues, but the former's is generally private, while the latter's is available to any Worker that needs it. dispatch first tries to add the task to the current thread's Worker-local queue; only when that fails does it fall back to the global queue, and finally it signals a CPU thread to start working (the CPU-thread concept is explained shortly).
I see three important calls here: submitToLocalQueue, addToGlobalQueue, and signalCpuWork. Let's examine them in turn.
First, submitToLocalQueue:
private fun Worker?.submitToLocalQueue(task: Task, tailDispatch: Boolean): Task? {
    // no usable Worker on the current thread: hand the task back
    if (this == null) return task
    // the worker has terminated and is unusable
    if (state === WorkerState.TERMINATED) return task
    // a blocking worker must not take a non-blocking task
    if (task.mode == TASK_NON_BLOCKING && state === WorkerState.BLOCKING) {
        return task
    }
    mayHaveLocalTasks = true
    return localQueue.add(task, fair = tailDispatch)
}
Both Workers and Tasks come in blocking and non-blocking varieties. Their types look like this:
// worker state
enum class WorkerState {
CPU_ACQUIRED,
BLOCKING,
PARKING,
DORMANT,
TERMINATED
}
// task mode
internal const val TASK_NON_BLOCKING = 0
internal const val TASK_PROBABLY_BLOCKING = 1
A Task is initialized with a TaskContext, which carries the mode; the context is passed in by the dispatchers at dispatch time. Tasks dispatched by the Default dispatcher are generally TASK_NON_BLOCKING, while those dispatched by the IO dispatcher are TASK_PROBABLY_BLOCKING.
WorkerState is more involved because it changes over time. A Worker starts in DORMANT when created, becomes TERMINATED when it shuts down, and while scheduling tasks moves among CPU_ACQUIRED, BLOCKING, and PARKING, as we will see shortly.
Next, addToGlobalQueue:
private fun addToGlobalQueue(task: Task): Boolean {
return if (task.isBlocking) {
globalBlockingQueue.addLast(task)
} else {
globalCpuQueue.addLast(task)
}
}
There are two queues, holding blocking and non-blocking tasks respectively. Both are LockFreeTaskQueue instances, built on a linked list. Since they may be read and written concurrently, the implementation relies heavily on CAS operations for thread safety.
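The CAS pattern these queues rely on can be shown with a minimal lock-free stack. This is not LockFreeTaskQueue itself, just an illustration of the retry loop: read the current head, build the new node, and publish it with a single compareAndSet, looping on contention instead of taking a lock.

```kotlin
import java.util.concurrent.atomic.AtomicReference

class LockFreeStack<T> {
    private class Node<T>(val value: T, val next: Node<T>?)
    private val head = AtomicReference<Node<T>?>(null)

    fun push(value: T) {
        while (true) {
            val cur = head.get()
            // publish the new node atomically; retry if another thread won the race
            if (head.compareAndSet(cur, Node(value, cur))) return
        }
    }

    fun pop(): T? {
        while (true) {
            val cur = head.get() ?: return null
            // unlink the head atomically; retry on contention
            if (head.compareAndSet(cur, cur.next)) return cur.value
        }
    }
}
```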
Finally, signalCpuWork:
fun signalCpuWork() {
if (tryUnpark()) return
if (tryCreateWorker()) return
tryUnpark()
}
private fun tryUnpark(): Boolean {
while (true) {
val worker = parkedWorkersStackPop() ?: return false
if (worker.workerCtl.compareAndSet(PARKED, CLAIMED)) {
LockSupport.unpark(worker)
return true
}
}
}
private fun tryCreateWorker(state: Long = controlState.value): Boolean {
val created = createdWorkers(state)
val blocking = blockingTasks(state)
val cpuWorkers = (created - blocking).coerceAtLeast(0)
/*
* We check how many threads are there to handle non-blocking work,
* and create one more if we have not enough of them.
*/
if (cpuWorkers < corePoolSize) {
val newCpuWorkers = createNewWorker()
// If we've created the first cpu worker and corePoolSize > 1 then create
// one more (second) cpu worker, so that stealing between them is operational
if (newCpuWorkers == 1 && corePoolSize > 1) createNewWorker()
if (newCpuWorkers > 0) return true
}
return false
}
signalCpuWork first calls tryUnpark, which pops a parked thread off parkedWorkersStack, updates its state, and resumes it. If no Worker is available, tryCreateWorker creates one; if too many threads already exist, it tries unpark once more. In tryCreateWorker the Worker count is positively correlated with the blocking-task count, and the difference between the two is the CPU-worker count. When blocking tasks pile up and that difference drops below the core pool size, a new Worker is created immediately; as tasks are consumed and the difference rises above the core size, no more are created, and idle Workers that fail to find a task park themselves until they time out and exit.
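The sizing rule can be condensed into a small predicate (the function name is mine): workers serving CPU work equal created workers minus those stuck in blocking tasks, and a new worker is warranted only while that number is below corePoolSize.

```kotlin
// Sketch of tryCreateWorker's sizing check, extracted for clarity.
fun shouldCreateWorker(createdWorkers: Int, blockingTasks: Int, corePoolSize: Int): Boolean {
    // threads actually available for non-blocking (CPU) work
    val cpuWorkers = (createdWorkers - blockingTasks).coerceAtLeast(0)
    return cpuWorkers < corePoolSize
}
```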
Here let's look at the Worker inside CoroutineScheduler. The scheduler itself implements Executor, so like a thread pool it can run tasks via execute, and it takes thread-pool-like parameters: corePoolSize (core thread count), maxPoolSize (maximum thread count), idleWorkerKeepAliveNs (how long an idle thread may live), and schedulerName. Core threads in the scheduler are called CPU workers; the others are blocking workers, and the two are distinguished by the WorkerState mentioned earlier: CPU_ACQUIRED means the worker holds the right to execute CPU tasks and prefers CPU-bound work, while BLOCKING means it is running a blocking task.
We also saw that in these calculations the created-worker count and blocking-task count are both read from controlState. So what is it?
private const val BLOCKING_SHIFT = 21 // 2M threads max
private const val CREATED_MASK: Long = (1L shl BLOCKING_SHIFT) - 1
private const val BLOCKING_MASK: Long = CREATED_MASK shl BLOCKING_SHIFT
private const val CPU_PERMITS_SHIFT = BLOCKING_SHIFT * 2
private const val CPU_PERMITS_MASK = CREATED_MASK shl CPU_PERMITS_SHIFT
private val controlState = atomic(corePoolSize.toLong() shl CPU_PERMITS_SHIFT)
private val createdWorkers: Int inline get() = (controlState.value and CREATED_MASK).toInt()
// ...
It is an atomic wrapper around a Long. Three masks, CREATED_MASK, BLOCKING_MASK, and CPU_PERMITS_MASK, split it into 21-bit fields, and bit operations extract, from low to high, the number of created Workers, the number of blocking tasks, and the number of available CPU permits. Packing several counters into one atomic word is a classic technique.
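The shifts and masks above can be exercised directly; the pack and extractor helpers below are mine, added to show how three counters live in one Long:

```kotlin
// Field layout copied from the snippet: three 21-bit counters in one Long.
const val BLOCKING_SHIFT = 21
const val CREATED_MASK: Long = (1L shl BLOCKING_SHIFT) - 1
const val BLOCKING_MASK: Long = CREATED_MASK shl BLOCKING_SHIFT
const val CPU_PERMITS_SHIFT = BLOCKING_SHIFT * 2

// Extractors: mask out one field and shift it down to an Int.
fun createdWorkers(state: Long): Int = (state and CREATED_MASK).toInt()
fun blockingTasks(state: Long): Int = ((state and BLOCKING_MASK) shr BLOCKING_SHIFT).toInt()
fun availableCpuPermits(state: Long): Int = (state shr CPU_PERMITS_SHIFT).toInt()

// Pack three counters into one word (illustrative helper, not in the scheduler).
fun pack(created: Int, blocking: Int, cpuPermits: Int): Long =
    created.toLong() or
        (blocking.toLong() shl BLOCKING_SHIFT) or
        (cpuPermits.toLong() shl CPU_PERMITS_SHIFT)
```

The benefit is that all three counters can be read or CAS-updated together in a single atomic operation.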
Initially, a Worker is created right at this point to run the task. Let's see how it executes.
override fun run() = runWorker()
private fun runWorker() {
    var rescanned = false
    // keep running until terminated
    while (!isTerminated && state != WorkerState.TERMINATED) {
        // find a task
        val task = findTask(mayHaveLocalTasks)
        if (task != null) {
            rescanned = false
            minDelayUntilStealableTaskNs = 0L
            // execute it
            executeTask(task)
            continue
        } else {
            mayHaveLocalTasks = false
        }
        // non-zero means a stealable task still exists; rescan once or
        // park briefly, then try stealing again
        if (minDelayUntilStealableTaskNs != 0L) {
            if (!rescanned) {
                rescanned = true
            } else {
                rescanned = false
                tryReleaseCpu(WorkerState.PARKING)
                interrupted()
                LockSupport.parkNanos(minDelayUntilStealableTaskNs)
                minDelayUntilStealableTaskNs = 0L
            }
            continue
        }
        // really no tasks left; try to park
        tryPark()
    }
    // release the cpu permit; the thread exits
    tryReleaseCpu(WorkerState.TERMINATED)
}
The overall flow is fairly clear. Three spots are worth attention: findTask, executeTask, and tryPark. As usual, we go through them one by one.
fun findTask(scanLocalQueue: Boolean): Task? {
if (tryAcquireCpuPermit()) return findAnyTask(scanLocalQueue)
// If we can't acquire a CPU permit -- attempt to find blocking task
val task = if (scanLocalQueue) {
localQueue.poll() ?: globalBlockingQueue.removeFirstOrNull()
} else {
globalBlockingQueue.removeFirstOrNull()
}
return task ?: trySteal(blockingOnly = true)
}
private fun tryAcquireCpuPermit(): Boolean = when {
state == WorkerState.CPU_ACQUIRED -> true
this@CoroutineScheduler.tryAcquireCpuPermit() -> {
state = WorkerState.CPU_ACQUIRED
true
}
else -> false
}
private fun findAnyTask(scanLocalQueue: Boolean): Task? {
if (scanLocalQueue) {
val globalFirst = nextInt(2 * corePoolSize) == 0
if (globalFirst) pollGlobalQueues()?.let { return it }
localQueue.poll()?.let { return it }
if (!globalFirst) pollGlobalQueues()?.let { return it }
} else {
pollGlobalQueues()?.let { return it }
}
return trySteal(blockingOnly = false)
}
First the worker tries to acquire a CPU permit, which here means the right to execute CPU tasks. CPU-bound tasks occupy the CPU for long stretches and are sensitive to context switches, so too many competing threads actually degrade performance. The scheduler therefore caps the number of threads running such tasks at the core pool size: each permit acquisition decrements the available-permit counter until it reaches zero, and this is what separates the two kinds of worker. The two kinds also find tasks slightly differently. A CPU worker uses findAnyTask, randomly polling the local and global queues for either CPU or blocking tasks; a blocking worker looks in the local and global queues for blocking tasks only. What they share is that when neither queue yields a task, they go steal one from another Worker's local queue to improve throughput. This is work stealing.
Work stealing is a mechanism that improves utilization by having idle workers actively pull tasks; Java provides ForkJoinPool for this, and now the scheduler implements its own version. Let's look at the concrete implementation:
private fun trySteal(blockingOnly: Boolean): Task? {
assert { localQueue.size == 0 }
val created = createdWorkers
// 0 to await an initialization and 1 to avoid excess stealing on single-core machines
if (created < 2) {
return null
}
var currentIndex = nextInt(created)
var minDelay = Long.MAX_VALUE
repeat(created) {
++currentIndex
if (currentIndex > created) currentIndex = 1
val worker = workers[currentIndex]
if (worker !== null && worker !== this) {
assert { localQueue.size == 0 }
val stealResult = if (blockingOnly) {
localQueue.tryStealBlockingFrom(victim = worker.localQueue)
} else {
localQueue.tryStealFrom(victim = worker.localQueue)
}
if (stealResult == TASK_STOLEN) {
return localQueue.poll()
} else if (stealResult > 0) {
minDelay = min(minDelay, stealResult)
}
}
}
minDelayUntilStealableTaskNs = if (minDelay != Long.MAX_VALUE) minDelay else 0
return null
}
In essence it iterates the workers array starting from a random index, calls the steal method against each worker's queue, inspects the return value, and finally returns a runnable task or null. stealResult has three outcomes: TASK_STOLEN on success; NOTHING_TO_STEAL when the victim has no usable task; and, when the interval between the victim's most recently added task and the current time is below a threshold, the remaining time until it becomes stealable. runWorker parks for that duration, meaning a task must sit in a worker's local queue for at least the threshold before other workers may steal it, presumably to reduce short-term contention.
A brief note on the local WorkQueue, without the code. Internally it holds a fixed 128-slot array plus two indices, producerIndex and consumerIndex, marking how far production and consumption have advanced; when the queue is full, the task is returned to the caller and goes to the global queue instead. It also keeps the most recently added task in lastScheduledTask, and each dequeue checks that slot first, which yields the so-called semi-FIFO ordering.
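A much-simplified sketch of this semi-FIFO behavior, with an ArrayDeque standing in for the fixed ring buffer (the real WorkQueue is lock-free and its indices are atomics):

```kotlin
class SemiFifoQueue<T : Any>(private val capacity: Int = 128) {
    private val buffer = ArrayDeque<T>() // stand-in for the 128-slot array
    private var lastScheduled: T? = null // most recently added task

    // Returns null on success, or the task back if the buffer is full
    // (the caller then submits it to the global queue).
    fun add(task: T): T? {
        val previous = lastScheduled
        lastScheduled = task // the newest task takes the LIFO slot
        if (previous != null) {
            if (buffer.size >= capacity) {
                lastScheduled = previous // undo; hand the new task back
                return task
            }
            buffer.addLast(previous) // the older task falls back to FIFO order
        }
        return null
    }

    fun poll(): T? {
        lastScheduled?.let { lastScheduled = null; return it } // LIFO slot first
        return buffer.removeFirstOrNull() // then FIFO from the buffer
    }
}
```

Polling the freshest task first improves cache locality for the owner thread while older tasks still drain roughly in submission order.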
Next is executeTask, which invokes four methods in sequence; let's look at them one by one:
- idleReset
private fun idleReset(mode: Int) {
terminationDeadline = 0L // reset deadline for termination
if (state == WorkerState.PARKING) {
assert { mode == TASK_PROBABLY_BLOCKING }
state = WorkerState.BLOCKING
}
}
It resets the termination deadline and moves a PARKING worker to BLOCKING. terminationDeadline is normally set just before a worker parks and checked after it resumes: if no other worker woke it within that window, it may consider terminating itself. Here, if the worker is in the parking state, it must have been popped off the parked stack and failed to grab a CPU permit, so it is a blocking worker.
- beforeTask
private fun beforeTask(taskMode: Int) {
if (taskMode == TASK_NON_BLOCKING) return
// Always notify about new work when releasing CPU-permit to execute some blocking task
if (tryReleaseCpu(WorkerState.BLOCKING)) {
signalCpuWork()
}
}
The blocking worker releases its CPU permit before running the blocking task.
- runSafely
fun runSafely(task: Task) {
try {
task.run()
} catch (e: Throwable) {
val thread = Thread.currentThread()
thread.uncaughtExceptionHandler.uncaughtException(thread, e)
} finally {
unTrackTask()
}
}
It runs the task and catches any thrown exception.
- afterTask
private fun afterTask(taskMode: Int) {
if (taskMode == TASK_NON_BLOCKING) return
decrementBlockingTasks()
val currentState = state
// Shutdown sequence of blocking dispatcher
if (currentState !== WorkerState.TERMINATED) {
assert { currentState == WorkerState.BLOCKING } // "Expected BLOCKING state, but has $currentState"
state = WorkerState.DORMANT
}
}
It decrements the blocking-task count and resets the blocking worker's state.
Finally, the tryPark method:
private fun tryPark() {
if (!inStack()) {
parkedWorkersStackPush(this)
return
}
assert { localQueue.size == 0 }
workerCtl.value = PARKED // Update value once
while (inStack() && workerCtl.value == PARKED) { // Prevent spurious wakeups
if (isTerminated || state == WorkerState.TERMINATED) break
tryReleaseCpu(WorkerState.PARKING)
interrupted() // Cleanup interruptions
park()
}
}
fun parkedWorkersStackPush(worker: Worker): Boolean {
if (worker.nextParkedWorker !== NOT_IN_STACK) return false // already in stack, bail out
parkedWorkersStack.loop { top ->
val index = (top and PARKED_INDEX_MASK).toInt()
val updVersion = (top + PARKED_VERSION_INC) and PARKED_VERSION_MASK
val updIndex = worker.indexInArray
assert { updIndex != 0 } // only this worker can push itself, cannot be terminated
worker.nextParkedWorker = workers[index]
if (parkedWorkersStack.compareAndSet(top, updVersion or updIndex.toLong())) return true
}
}
private fun park() {
if (terminationDeadline == 0L) terminationDeadline = System.nanoTime() + idleWorkerKeepAliveNs
// actually park
LockSupport.parkNanos(idleWorkerKeepAliveNs)
if (System.nanoTime() - terminationDeadline >= 0) {
terminationDeadline = 0L // if attempt to terminate worker fails we'd extend deadline again
tryTerminateWorker()
}
}
If the worker is not yet in the parked stack, it pushes itself and returns to scan once more; if it is already in the stack, it marks itself PARKED, releases its CPU permit, and then parks in a loop until another worker wakes it or it times out and terminates. We mentioned the parked stack earlier, and now we finally see its true shape: a single Long in two parts, the low bits holding the array index of the top worker and the high bits holding a version number that guards against the ABA problem under concurrency. The worker pushed on top also records the previous top in its nextParkedWorker field, so the chain can be restored when popping.
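The index-plus-version encoding can be sketched like this (the constants and function names are mine; the real masks differ). Bumping the version on every push means a CAS cannot be fooled by the index merely returning to an old value:

```kotlin
import java.util.concurrent.atomic.AtomicLong

const val INDEX_BITS = 21
const val INDEX_MASK: Long = (1L shl INDEX_BITS) - 1 // low bits: worker index
const val VERSION_INC: Long = 1L shl INDEX_BITS      // high bits: version counter

val top = AtomicLong(0L)

fun push(index: Int) {
    while (true) {
        val cur = top.get()
        // bump the version, then install the new top index
        val next = ((cur + VERSION_INC) and INDEX_MASK.inv()) or index.toLong()
        if (top.compareAndSet(cur, next)) return
    }
}

fun topIndex(): Int = (top.get() and INDEX_MASK).toInt()
fun version(): Long = top.get() ushr INDEX_BITS
```

Without the version half, thread A could read top = worker 5, get preempted while others pop and re-push worker 5, and then succeed a CAS against a stack whose contents have silently changed.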
On timeout, tryTerminateWorker is called to terminate the worker; it is mostly bookkeeping of state and the workers array, so we won't analyze it here.
Without question, the scheduler's implementation is complex; after all, it takes over part of task scheduling from the operating system, and it is fair to call it one of the cores of coroutines.