
How does go-zero handle massive numbers of scheduled/delayed tasks? | 🏆 Juejin Tech Essay Contest: Double Festival Special

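A system often contains a large number of scheduled tasks, and those tasks run with some delay. If every task used its own scheduler to manage its lifecycle, that would waste CPU resources and be very inefficient. This article introduces delayed operations in go-zero, which let developers who schedule many tasks focus only on the business function to run and when to run it (immediately or after a delay). Delayed operations are usually implemented in one of two ways: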

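Timer: the timer maintains a priority queue, fires when the time point arrives, and keeps the tasks to be executed in a map
timingWheel in collection: maintains an array of task groups, where each slot holds a doubly linked list of tasks. Once started, a ticker executes the tasks of one slot at every fixed interval. The second option reduces the cost of maintaining a task from O(log n) per insertion or removal in a priority queue to O(1) in a doubly linked list. Executing tasks only requires walking the tasks stored at one time point, O(N), with no need to push and pop heap elements the way a priority queue does.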
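The slot layout described in the second option can be sketched roughly as follows. This is a minimal illustration, not go-zero's implementation: the `wheel` and `task` types and the `add` and `tick` methods are hypothetical names, and the sketch assumes every delay is shorter than one full turn of the wheel.

```go
package main

import (
	"container/list"
	"fmt"
)

// task is a hypothetical entry type for this sketch; it is not go-zero's timingEntry.
type task struct {
	key string
}

// wheel holds a fixed array of slots; each slot is a doubly linked list of tasks.
type wheel struct {
	slots []*list.List
	pos   int // index of the slot handled by the most recent tick
}

func newWheel(numSlots int) *wheel {
	w := &wheel{slots: make([]*list.List, numSlots), pos: numSlots - 1}
	for i := range w.slots {
		w.slots[i] = list.New()
	}
	return w
}

// add appends a task `steps` ticks ahead of the current position in O(1).
// Assumption: 0 < steps < len(w.slots), so no extra turns are needed.
func (w *wheel) add(key string, steps int) {
	idx := (w.pos + steps) % len(w.slots)
	w.slots[idx].PushBack(&task{key: key})
}

// tick advances one slot and returns the keys stored there, emptying the slot.
func (w *wheel) tick() []string {
	w.pos = (w.pos + 1) % len(w.slots)
	l := w.slots[w.pos]
	var due []string
	for e := l.Front(); e != nil; {
		next := e.Next()
		due = append(due, e.Value.(*task).key)
		l.Remove(e)
		e = next
	}
	return due
}

func main() {
	w := newWheel(4)
	w.add("a", 1)
	w.add("b", 3)
	w.add("c", 1)
	for i := 1; i <= 3; i++ {
		fmt.Println(i, w.tick())
	}
}
```

Adding a task is a single `PushBack`, and each tick only touches the list of the current slot, which is where the O(1) maintenance cost comes from.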

Let's first look at how go-zero itself uses timingWheel:

timingWheel in cache

We start with how the cache in collection uses timingWheel:

timingWheel, err := NewTimingWheel(time.Second, slots, func(k, v interface{}) {
 key, ok := k.(string)
 if !ok {
   return
 }
 cache.Del(key)
})
if err != nil {
 return nil, err
}

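Here the cache initialization also creates a timingWheel to handle key expiration. The parameters are, in order:

interval: the duration of one tick
numSlots: the number of time slots
execute: the function executed when a time point is reached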


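As this shows, the timer starts running as soon as the wheel is initialized and advances once per interval. Under the hood it keeps pulling tasks from the list in the current slot and hands them to execute.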

First check whether the key already exists in the data map:
If it exists, update its expiry -> MoveTimer()
If the key is being set for the first time -> SetTimer()

SetTimer() -> setTask():

task does not exist: getPosition -> pushBack to list -> setPosition

task exists: get from timers -> moveTask()

MoveTimer() -> moveTask()

Both call chains above end up calling the same function: moveTask()

func (tw *TimingWheel) moveTask(task baseEntry) {
	// timers: Map => look up [positionEntry{pos, task}] by key
	val, ok := tw.timers.Get(task.key)
	if !ok {
		return
	}

	timer := val.(*positionEntry)
	// delay < interval => the delay is shorter than a single tick; there is no finer granularity, so the task should run immediately
	if task.delay < tw.interval {
		threading.GoSafe(func() {
			tw.execute(timer.item.key, timer.item.value)
		})
		return
	}
	// if delay >= interval, compute the new pos and circle in the wheel from the delay
	pos, circle := tw.getPositionAndCircle(task.delay)
	if pos >= timer.pos {
		timer.item.circle = circle
		// record the offset between the old and new positions, used later to re-enqueue the task
		timer.item.diff = pos - timer.pos
	} else if circle > 0 {
		// move into the next layer: convert one circle into part of diff
		circle--
		timer.item.circle = circle
		// the slots form a ring array, so add numSlots [equivalent to wrapping into the next layer]
		timer.item.diff = tw.numSlots + pos - timer.pos
	} else {
		// the offset moved earlier and the task is still in the first layer:
		// mark the old task as removed and re-enqueue it to wait for execution
		timer.item.removed = true
		newItem := &timingEntry{
			baseEntry: task,
			value:     timer.item.value,
		}
		tw.slots[pos].PushBack(newItem)
		tw.setTimerPosition(pos, newItem)
	}
}

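The process above covers the following cases:

delay < interval: the delay is smaller than a single tick, so the task is treated as already due and must run immediately
For a changed delay:
- new >= old: update <newPos, newCircle, diff>
- newCircle > 0: compute diff and convert one circle into the next layer, hence diff + numSlots
- Otherwise the delay has simply been shortened: mark the old task as removed, re-add it to the list, and wait for the next loop to execute it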

The process above can be simplified as follows:

steps = d / interval
pos = steps % numSlots - 1
circle = (steps - 1) / numSlots

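Summary

timingWheel is driven by a timer. As time advances, it takes the tasks out of the list (a doubly linked list) in the current time slot and passes them to execute. Because it advances by a fixed interval, idle loops can occur: for a 60s task with interval = 1s, the loop runs 59 times with nothing to do.
To extend the time range, it layers tasks by circle, so the existing numSlots can be reused again and again. The timer keeps looping, and each pass moves tasks from an upper layer down one layer, so the loop eventually reaches and executes upper-layer tasks. This design overcomes the limit on long delays without creating any additional data structure.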

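go-zero also ships many other practical components and tools. Using them well can go a long way toward improving service performance and development efficiency. We hope this article is useful to you.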
