1. ScheduledThreadPoolExecutor, DelayQueue, and Timer

ScheduledThreadPoolExecutor

//new ScheduledThreadPoolExecutor(corePoolSize)
ScheduledExecutorService scheduledExecutorService = Executors.newScheduledThreadPool(5);
scheduledExecutorService.schedule(()->{ System.out.println("hello world3");
}, 3,  TimeUnit.SECONDS);
scheduledExecutorService.schedule(()->{ System.out.println("hello world1");
}, 1,  TimeUnit.SECONDS);
scheduledExecutorService.schedule(()->{ System.out.println("hello world2");
}, 2,  TimeUnit.SECONDS);

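ScheduledThreadPoolExecutor extends ThreadPoolExecutor. When a task is submitted, the Runnable is wrapped in a ScheduledFutureTask and stored in a DelayedWorkQueue. ScheduledFutureTask extends FutureTask and overrides run(), which is what lets it run a task periodically. DelayedWorkQueue works much like the priority queue PriorityQueue, except that each ScheduledFutureTask also records its own index in the heap array (PriorityQueue has to scan the array to find the index when an element is removed). The task with the nearest deadline sits at the head of the queue. A periodic task resets its time after each run and is put back into the queue. Because the structure is a min-heap, adding a task and removing a cancelled one are both O(log n).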

RunnableScheduledFuture<?>[] queue =
            new RunnableScheduledFuture<?>[INITIAL_CAPACITY];

private void siftUp(int k, RunnableScheduledFuture<?> key) {
    while (k > 0) {
        int parent = (k - 1) >>> 1;
        RunnableScheduledFuture<?> e = queue[parent];
        if (key.compareTo(e) >= 0)
            break;
        queue[k] = e;
        setIndex(e, k);
        k = parent;
    }
    queue[k] = key;
    setIndex(key, k);
}

private void siftDown(int k, RunnableScheduledFuture<?> key) {
    int half = size >>> 1;
    while (k < half) {
        int child = (k << 1) + 1;
        RunnableScheduledFuture<?> c = queue[child];
        int right = child + 1;
        if (right < size && c.compareTo(queue[right]) > 0)
            c = queue[child = right];
        if (key.compareTo(c) <= 0)
            break;
        queue[k] = c;
        setIndex(c, k);
        k = child;
    }
    queue[k] = key;
    setIndex(key, k);
}
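
The re-queueing of periodic tasks and the removal on cancel can both be observed through the public API. The following is a minimal sketch, assuming a hypothetical class name PeriodicCancelDemo and a 50 ms period chosen only for illustration. It calls setRemoveOnCancelPolicy(true), because by default a cancelled task stays in the work queue until its delay elapses.

```java
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.ScheduledThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PeriodicCancelDemo {
    // Runs a periodic task briefly, cancels it, and reports how many
    // entries are left in the executor's work queue afterwards.
    static int queueSizeAfterCancel() {
        ScheduledThreadPoolExecutor pool = new ScheduledThreadPoolExecutor(1);
        pool.setRemoveOnCancelPolicy(true); // remove from the heap on cancel
        AtomicInteger runs = new AtomicInteger();
        try {
            // After every run the task is re-inserted with a new deadline.
            ScheduledFuture<?> future = pool.scheduleAtFixedRate(
                    runs::incrementAndGet, 0, 50, TimeUnit.MILLISECONDS);
            Thread.sleep(200);
            future.cancel(false);
            return pool.getQueue().size();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            pool.shutdownNow();
        }
    }

    public static void main(String[] args) {
        System.out.println("queue size after cancel: " + queueSizeAfterCancel());
    }
}
```

With the policy left at its default, the cancelled entry would still be counted by getQueue().size() until its next deadline passed.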

DelayQueue

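If you want to build your own timing mechanism, you can also use DelayQueue directly, paired with a consumer thread. Internally it stores its elements in the priority queue PriorityQueue. Every element in a DelayQueue must implement the Delayed interface and override compareTo and getDelay.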

public static void main(String[] args) throws Exception {
	BlockingQueue<SampleTask> delayQueue = new DelayQueue<>();
	long now = System.currentTimeMillis();
	delayQueue.put(new SampleTask(now + 1000));
	delayQueue.put(new SampleTask(now + 20000));
	delayQueue.put(new SampleTask(now + 3000));
	for (int i = 0; i < 3; i++) {
		SampleTask poll = delayQueue.poll(2000, TimeUnit.MILLISECONDS);
		if(poll != null){
			System.out.println(new Date(poll.getTime()));
		}
	}
}

static class SampleTask implements Delayed {
	long time;
	public SampleTask(long time) {
		this.time = time;
	}
	public long getTime() {
		return time;
	}
	@Override
	public int compareTo(Delayed o) {
		return Long.compare(this.getDelay(TimeUnit.MILLISECONDS), o.getDelay(TimeUnit.MILLISECONDS));
	}
	@Override
	public long getDelay(TimeUnit unit) {
		return unit.convert(time - System.currentTimeMillis(), TimeUnit.MILLISECONDS);
	}
}
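
The sample above polls from the main thread. Pairing the queue with a dedicated consumer thread could look like the sketch below. The names DelayQueueWorkerDemo and Job and the delays are assumptions made for illustration. It also compares the stored deadlines directly rather than two getDelay results, which keeps the ordering independent of when the comparison runs.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class DelayQueueWorkerDemo {
    static final class Job implements Delayed {
        final String name;
        final long deadlineNanos;

        Job(String name, long delayMillis) {
            this.name = name;
            this.deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(deadlineNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            // Assumes the queue only ever holds Job instances.
            return Long.compare(deadlineNanos, ((Job) other).deadlineNanos);
        }
    }

    // Submits three jobs out of order and lets a worker thread take them.
    static List<String> drainInOrder() {
        DelayQueue<Job> queue = new DelayQueue<>();
        queue.put(new Job("c", 90));
        queue.put(new Job("a", 30));
        queue.put(new Job("b", 60));
        List<String> order = Collections.synchronizedList(new ArrayList<>());
        Thread worker = new Thread(() -> {
            try {
                for (int i = 0; i < 3; i++) {
                    order.add(queue.take().name); // blocks until the head expires
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drainInOrder());
    }
}
```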

Timer

For something simpler, you can use Timer directly.

public static void main(String[] args) {
        AtomicInteger atomicInteger = new AtomicInteger(-1);
        Timer timer = new Timer();
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                System.out.println("task1" + new Date());
            }
        }, 1000, 1000);
        timer.schedule(new TimerTask() {
            @Override
            public void run() {
                int i = 1 / atomicInteger.getAndIncrement();
                System.out.println("task2" + new Date());
            }
        }, 1000, 2000);
    }

task1Mon Jul 11 10:34:26 CST 2022
task2Mon Jul 11 10:34:26 CST 2022
task1Mon Jul 11 10:34:27 CST 2022
Exception in thread "Timer-0" java.lang.ArithmeticException: / by zero
	at com.example.nettydemo.demo1.concurrent.TimerTest$2.run(TimerTest.java:21)
	at java.util.TimerThread.mainLoop(Timer.java:555)
	at java.util.TimerThread.run(Timer.java:505)

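Internally, Timer starts a single TimerThread. Its queue is also a min-heap backed by an array. An exception thrown by a task is not caught: the thread terminates immediately, so the other tasks stop running too.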
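
One common workaround is to catch the exception inside the task itself so that it never reaches TimerThread. The sketch below is only an illustration: guarded and GuardedTimerDemo are hypothetical names, not part of java.util.Timer.

```java
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

class GuardedTimerDemo {
    // Wraps a task body so a RuntimeException cannot escape into TimerThread.
    static TimerTask guarded(Runnable body) {
        return new TimerTask() {
            @Override
            public void run() {
                try {
                    body.run();
                } catch (RuntimeException e) {
                    System.err.println("task failed: " + e);
                }
            }
        };
    }

    // Schedules a failing task, then a second task, and reports whether
    // the second one still got to run.
    static boolean laterTaskStillRuns() {
        Timer timer = new Timer(true); // daemon thread, so the JVM can exit
        CountDownLatch ranAfterFailure = new CountDownLatch(1);
        timer.schedule(guarded(() -> { throw new ArithmeticException("/ by zero"); }), 10);
        timer.schedule(guarded(ranAfterFailure::countDown), 50);
        try {
            return ranAfterFailure.await(2, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            timer.cancel();
        }
    }

    public static void main(String[] args) {
        System.out.println("later task ran: " + laterTaskStillRuns());
    }
}
```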

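All of the approaches above are built on a heap, so adding a task costs O(log n), and removing one from the heap costs at least that much. With a very large number of tasks this becomes a serious performance bottleneck. Next, let's see how a time wheel solves the problem.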

HashedWheelTimer

There are plenty of write-ups on time wheels online, so I will skip the diagrams here and just give a short summary.

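A time wheel is a ring structure similar to a clock face, with a fixed tick interval and a fixed number of slots. Every submitted task is assigned to a slot based on its deadline, but not every task in a slot is executed when the wheel reaches that slot. Each task carries a round attribute when it is assigned. Think of a second hand: it reaches the 6 position once every minute, in every hour of every day, but that does not mean every task stored in slot 6 is due each time the hand gets there. So the rough logic of a time wheel splits into two parts: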

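Assign each task to a slot.
Take the tasks out of the current slot and decide whether each one needs to run.

The advantage of a time wheel is that adding and cancelling a task are both O(1), and a single thread is enough to drive the wheel.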
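
The two parts can be sketched as a toy wheel that is advanced by hand instead of by a clock. This is an illustrative model only, not Netty's implementation: TinyWheel, Entry, and the 8-slot size are assumptions.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

class TinyWheel {
    private static final int SLOTS = 8; // hypothetical size, a power of two

    private static final class Entry {
        final String name;
        long rounds; // full revolutions still to wait
        Entry(String name, long rounds) { this.name = name; this.rounds = rounds; }
    }

    private final List<List<Entry>> wheel = new ArrayList<>();
    private long tick = 0;

    TinyWheel() {
        for (int i = 0; i < SLOTS; i++) wheel.add(new LinkedList<>());
    }

    // Part 1: assign the task to a slot and record its round count.
    void add(String name, long delayTicks) {
        long target = tick + delayTicks;
        long rounds = delayTicks / SLOTS;
        wheel.get((int) (target & (SLOTS - 1))).add(new Entry(name, rounds));
    }

    // Part 2: visit the current slot, fire what is due, age the rest.
    List<String> advance() {
        List<String> fired = new ArrayList<>();
        Iterator<Entry> it = wheel.get((int) (tick & (SLOTS - 1))).iterator();
        while (it.hasNext()) {
            Entry e = it.next();
            if (e.rounds <= 0) {
                fired.add(e.name + "@" + tick);
                it.remove();
            } else {
                e.rounds--;
            }
        }
        tick++;
        return fired;
    }

    static List<String> demo() {
        TinyWheel w = new TinyWheel();
        w.add("a", 2);  // slot 2, rounds 0
        w.add("b", 10); // also slot 2, but rounds 1
        List<String> log = new ArrayList<>();
        for (int i = 0; i < 12; i++) log.addAll(w.advance());
        return log;
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a@2, b@10]
    }
}
```

Both tasks land in slot 2, yet only "a" fires on the first visit. "b" waits one more revolution, which is exactly what the round attribute is for.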

Example

public static void main(String[] args) {
    Timer timer = new HashedWheelTimer();
    Timeout timeout1 = timer.newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) {
            System.out.println("task1: " + new Date());
        }
    }, 3, TimeUnit.SECONDS);
    if (!timeout1.isExpired()) {
        timeout1.cancel();
    }
    timer.newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) {
            System.out.println("task2: " + new Date());
        }
    }, 2, TimeUnit.SECONDS);
    timer.newTimeout(new TimerTask() {
        @Override
        public void run(Timeout timeout) throws InterruptedException {
            System.out.println("task3: " + new Date());
            Thread.sleep(5000);
            int i = 1/0;
        }
    }, 1, TimeUnit.SECONDS);
    // stop() terminates the timer immediately and skips the pending tasks
    //Set<Timeout> stop = timer.stop(); returns the tasks that have not run yet
}

task3: Mon Jul 11 10:30:00 CST 2022
10:30:05.966 [pool-1-thread-1] WARN io.netty.util.HashedWheelTimer - An exception was thrown by TimerTask.
java.lang.ArithmeticException: / by zero
	at com.example.nettydemo.demo1.concurrent.HashedWheelTimerTest$3.run(HashedWheelTimerTest.java:34)
	at io.netty.util.HashedWheelTimer$HashedWheelTimeout.expire(HashedWheelTimer.java:588)
	at io.netty.util.HashedWheelTimer$HashedWheelBucket.expireTimeouts(HashedWheelTimer.java:662)
	at io.netty.util.HashedWheelTimer$Worker.run(HashedWheelTimer.java:385)
	at java.lang.Thread.run(Thread.java:745)
task2: Mon Jul 11 10:30:05 CST 2022

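As the output shows, task 1 was cancelled, task 3 ran first, and task 2 only ran 5 s later. So tasks are executed serially, and an exception in one task does not affect the others.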

As usual, we start from the constructor.

HashedWheelTimer constructors

Timer timer = new HashedWheelTimer();

public HashedWheelTimer() {
    // Passes in a default ThreadFactory; you can equally supply your own
    this(Executors.defaultThreadFactory());
}

public HashedWheelTimer(ThreadFactory threadFactory) {
    this(threadFactory, 100, TimeUnit.MILLISECONDS);
}

public HashedWheelTimer(
        ThreadFactory threadFactory, long tickDuration, TimeUnit unit) {
    this(threadFactory, tickDuration, unit, 512);
}

// All overloads end up in this constructor, with tickDuration = 100, ticksPerWheel = 512, leakDetection = true
public HashedWheelTimer(
            ThreadFactory threadFactory,
            long tickDuration, TimeUnit unit, int ticksPerWheel, boolean leakDetection) {
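    // Initialize the slots; see HashedWheelTimer#createWheel below
    wheel = createWheel(ticksPerWheel);
    // The length is a power of two, so & (length - 1) is equivalent to modulo but cheaper
    mask = wheel.length - 1;
    // Convert to nanoseconds
    this.tickDuration = unit.toNanos(tickDuration);
    // Create the worker thread and hand it the worker task (a field initialized with new Worker())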
    workerThread = threadFactory.newThread(worker);
    leak = leakDetection || !workerThread.isDaemon() ? leakDetector.open(this) : null;
}

HashedWheelTimer#createWheel

 private static HashedWheelBucket[] createWheel(int ticksPerWheel) {
    ticksPerWheel = normalizeTicksPerWheel(ticksPerWheel);
    HashedWheelBucket[] wheel = new HashedWheelBucket[ticksPerWheel];
    for (int i = 0; i < wheel.length; i ++) {
        // Fill every slot with a HashedWheelBucket
        wheel[i] = new HashedWheelBucket();
    }
    return wheel;
}

Round up to the smallest power of two that is not less than ticksPerWheel:
private static int normalizeTicksPerWheel(int ticksPerWheel) {
    int normalizedTicksPerWheel = 1;
    while (normalizedTicksPerWheel < ticksPerWheel) {
        normalizedTicksPerWheel <<= 1;
    }
    return normalizedTicksPerWheel;
}

HashMap's approach is another reference:
 static final int tableSizeFor(int cap) {
        int n = cap - 1;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
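
To see why the rounding matters, here is a small sketch that checks the loop-based rounding against a bit-trick variant and confirms that `& mask` matches `%` once the length is a power of two. PowerOfTwoMask and viaHighestBit are hypothetical names. The Integer.highestOneBit variant is my own alternative, not code from Netty or HashMap.

```java
class PowerOfTwoMask {
    // Same loop as normalizeTicksPerWheel above.
    static int normalize(int ticksPerWheel) {
        int n = 1;
        while (n < ticksPerWheel) {
            n <<= 1;
        }
        return n;
    }

    // Alternative using Integer.highestOneBit (assumes cap >= 1 and
    // a result that still fits in a positive int).
    static int viaHighestBit(int cap) {
        return cap <= 1 ? 1 : Integer.highestOneBit(cap - 1) << 1;
    }

    // For a power-of-two length, masking and modulo give the same index.
    static boolean maskEqualsModulo(long tick, int length) {
        return (tick & (length - 1)) == tick % length;
    }

    public static void main(String[] args) {
        System.out.println(normalize(500));             // 512
        System.out.println(viaHighestBit(513));         // 1024
        System.out.println(1000 & 511);                 // 488
        System.out.println(maskEqualsModulo(1000, 512)); // true
    }
}
```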

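So the constructor creates a HashedWheelBucket[] array, which holds the slots, 512 by default. It also creates a thread and hands it the worker task, but does not start it yet. A reasonable guess is that the thread starts when the user submits the first task, after which the worker runs in an endless loop.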

HashedWheelBucket

private static final class HashedWheelBucket {
    private HashedWheelTimeout head;
    private HashedWheelTimeout tail;

    public void addTimeout(HashedWheelTimeout timeout) {
        assert timeout.bucket == null;
        timeout.bucket = this;
        if (head == null) {
            head = tail = timeout;
        } else {
            tail.next = timeout;
            timeout.prev = tail;
            tail = timeout;
        }
    }
}

As you can see, a HashedWheelBucket is simply a doubly linked list of HashedWheelTimeout nodes.
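
The prev/next links are what make removal cheap: a node can be unlinked without walking the list. The following is a simplified sketch of that idea with hypothetical classes (BucketListSketch, Node). It is not Netty's remove method.

```java
import java.util.ArrayList;
import java.util.List;

class BucketListSketch {
    static final class Node {
        final String name;
        Node prev, next;
        Node(String name) { this.name = name; }
    }

    private Node head, tail;

    // Append at the tail, mirroring addTimeout above.
    void add(Node n) {
        if (head == null) {
            head = tail = n;
        } else {
            tail.next = n;
            n.prev = tail;
            tail = n;
        }
    }

    // Unlink in O(1): only the neighbours' pointers change.
    void remove(Node n) {
        if (n.prev != null) n.prev.next = n.next; else head = n.next;
        if (n.next != null) n.next.prev = n.prev; else tail = n.prev;
        n.prev = n.next = null;
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (Node n = head; n != null; n = n.next) out.add(n.name);
        return out;
    }

    static List<String> demo() {
        BucketListSketch bucket = new BucketListSketch();
        Node a = new Node("a"), b = new Node("b"), c = new Node("c");
        bucket.add(a);
        bucket.add(b);
        bucket.add(c);
        bucket.remove(b); // no traversal needed
        return bucket.names();
    }

    public static void main(String[] args) {
        System.out.println(demo()); // [a, c]
    }
}
```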

Adding a task to the wheel: HashedWheelTimer#newTimeout

Next, let's look at how a newly submitted task is handled in HashedWheelTimer#newTimeout.

Timeout timeout1 = timer.newTimeout(new TimerTask() {
    @Override
    public void run(Timeout timeout) {
        System.out.println("task1: " + new Date());
    }
}, 3, TimeUnit.SECONDS);


public Timeout newTimeout(TimerTask task, long delay, TimeUnit unit) {
    // Start the worker thread; the first call may wait briefly. See HashedWheelTimer#start below
    start();
    // Compute the deadline in nanoseconds, relative to startTime
    long deadline = System.nanoTime() + unit.toNanos(delay) - startTime;
    // Wrap the task in a HashedWheelTimeout
    HashedWheelTimeout timeout = new HashedWheelTimeout(this, task, deadline);
    //Queue<HashedWheelTimeout> timeouts = PlatformDependent.newMpscQueue(); an MPSC (multi-producer, single-consumer) queue
    timeouts.add(timeout);
    return timeout;
}

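So the submitted task is wrapped in a HashedWheelTimeout and then added to an MPSC queue. Together with the start method below, this suggests that all of the processing logic lives in the Worker.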
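
The producer/consumer shape here is many caller threads adding and one worker draining. The sketch below imitates it with ConcurrentLinkedQueue from the JDK as a stand-in. That is an assumption for illustration only, since Netty obtains its queue from PlatformDependent.newMpscQueue(). ManyProducersOneConsumer is a hypothetical name.

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;

class ManyProducersOneConsumer {
    // Several threads add concurrently; afterwards a single consumer drains.
    static int produceAndDrain(int producers, int perProducer) {
        Queue<Integer> timeouts = new ConcurrentLinkedQueue<>();
        Thread[] threads = new Thread[producers];
        for (int p = 0; p < producers; p++) {
            threads[p] = new Thread(() -> {
                for (int i = 0; i < perProducer; i++) {
                    timeouts.add(i); // lock-free enqueue, like newTimeout()
                }
            });
            threads[p].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int drained = 0;
        while (timeouts.poll() != null) { // single consumer, like the worker
            drained++;
        }
        return drained;
    }

    public static void main(String[] args) {
        System.out.println(produceAndDrain(4, 1000)); // 4000
    }
}
```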

HashedWheelTimer#start

public void start() {
    switch (WORKER_STATE_UPDATER.get(this)) {
       // The CAS guards against starting the worker twice
        case WORKER_STATE_INIT:
            if (WORKER_STATE_UPDATER.compareAndSet(this, WORKER_STATE_INIT, WORKER_STATE_STARTED)) {
                workerThread.start();
            }
            break;
        ...
    }
    // startTime is a long and is initially 0
    while (startTime == 0) {
        try {
            // CountDownLatch startTimeInitialized = new CountDownLatch(1);
            // Wait for the latch to reach 0. The calling thread blocks here, so some other thread must call countDown.
            // That thread is the workerThread above, which runs the Worker, so let's look at the Worker logic next
            startTimeInitialized.await();
        } catch (InterruptedException ignore) {
            // Ignore - it will be ready very soon.
        }
    }
}
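
The start handshake (a CAS so that only one caller starts the worker, plus a latch so that every caller waits until startTime is set) can be reproduced with JDK classes alone. This is a simplified sketch under assumed names (LazyStartSketch, workerStarts). It uses AtomicInteger where the original uses a field updater.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;

class LazyStartSketch {
    private static final int INIT = 0, STARTED = 1;
    private final AtomicInteger state = new AtomicInteger(INIT);
    private final AtomicInteger workerStarts = new AtomicInteger();
    private final CountDownLatch startTimeInitialized = new CountDownLatch(1);
    private volatile long startTime;

    private final Thread worker = new Thread(() -> {
        workerStarts.incrementAndGet();
        long now = System.nanoTime();
        startTime = (now == 0) ? 1 : now; // 0 is reserved for "not started"
        startTimeInitialized.countDown(); // release every waiting caller
    });

    void start() {
        // Only the caller that wins the CAS actually starts the thread.
        if (state.compareAndSet(INIT, STARTED)) {
            worker.start();
        }
        while (startTime == 0) {
            try {
                startTimeInitialized.await();
            } catch (InterruptedException ignore) {
                // retry; the worker sets startTime almost immediately
            }
        }
    }

    // Calls start() from several threads and reports how often the worker ran.
    static int startsAfterConcurrentCalls(int callers) {
        LazyStartSketch timer = new LazyStartSketch();
        Thread[] threads = new Thread[callers];
        for (int i = 0; i < callers; i++) {
            threads[i] = new Thread(timer::start);
            threads[i].start();
        }
        try {
            for (Thread t : threads) t.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return timer.workerStarts.get();
    }

    public static void main(String[] args) {
        System.out.println(startsAfterConcurrentCalls(8)); // 1
    }
}
```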

Before looking at the Worker logic, a quick recap:

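A HashedWheelTimer maintains a HashedWheelBucket[] internally. Each array element is one slot, and each HashedWheelBucket is a linked list of HashedWheelTimeout nodes. A HashedWheelTimeout wraps the TimerTask that we pass in when submitting a task. A newly added task goes straight into the MPSC queue timeouts and, for the moment, has no connection to the HashedWheelBucket[].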

The wheel's main loop: HashedWheelTimer.Worker#run

HashedWheelTimer.Worker#run

public void run() {
    startTime = System.nanoTime();
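    // Release the calling thread: the worker is ready and tasks can be submitted
    startTimeInitialized.countDown();
    do {
        // Block until the next tick
        final long deadline = waitForNextTick();
        if (deadline > 0) {
            // mask equals the HashedWheelBucket[] length - 1; because the length is a power of two,
            // tick & mask is equivalent to tick % length but cheaper.
            // tick starts at 0, so idx is 0 on the first pass
            int idx = (int) (tick & mask);
            // Handle cancelled tasks. This resembles the JDK selector loop, where cancelling a key
            // only adds it to a set that is checked on every select.
            // Here the worker checks the cancelledTimeouts queue before every tick,
            // so that no unnecessary task gets executed
            processCancelledTasks();
            // Take the HashedWheelBucket for this tick (slot 0 on the first pass)
            HashedWheelBucket bucket = wheel[idx];
            // Move tasks from the MPSC queue into the slots' linked lists
            transferTimeoutsToBuckets();
            // Process the tasks in the current slot
            bucket.expireTimeouts(deadline);
            tick++;
        }
    } while (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_STARTED);
    // As long as the timer is not stopped, the loop above never exits
    // The timer has been stopped: start cleaning up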
    for (HashedWheelBucket bucket: wheel) {
        bucket.clearTimeouts(unprocessedTimeouts);
    }
    for (;;) {
        HashedWheelTimeout timeout = timeouts.poll();
        if (timeout == null) {
            break;
        }
        if (!timeout.isCancelled()) {
            unprocessedTimeouts.add(timeout);
        }
    }
    processCancelledTasks();
}

HashedWheelTimer.Worker#waitForNextTick

private long waitForNextTick() {
    // tickDuration defaults to 100 ms and tick starts at 0, so, as the name waitForNextTick suggests, the wheel moves to the next slot every 100 ms
    long deadline = tickDuration * (tick + 1);
    for (;;) {
        final long currentTime = System.nanoTime() - startTime;
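        // Sleep until the next tick deadline, rounding the sleep up to a whole millisecond, then return the current time
        // tickDuration defaults to 100 ms; the smaller it is, the finer the worker's precision. Adding 999999 ns rounds up so the thread does not wake too early and too often.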
        long sleepTimeMs = (deadline - currentTime + 999999) / 1000000;
        if (sleepTimeMs <= 0) {
            if (currentTime == Long.MIN_VALUE) {
                return -Long.MAX_VALUE;
            } else {
                return currentTime;
            }
        }
        try {
            Thread.sleep(sleepTimeMs);
        } catch (InterruptedException ignored) {
            if (WORKER_STATE_UPDATER.get(HashedWheelTimer.this) == WORKER_STATE_SHUTDOWN) {
                return Long.MIN_VALUE;
            }
        }
    }
}
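
The rounding in sleepTimeMs is easier to follow with concrete numbers. The sketch below isolates that one expression; SleepRounding and sleepMillis are hypothetical names, and the inputs assume the default 100 ms tick (100,000,000 ns).

```java
class SleepRounding {
    // Same arithmetic as in waitForNextTick: nanoseconds remaining,
    // rounded up to whole milliseconds.
    static long sleepMillis(long deadlineNanos, long currentNanos) {
        return (deadlineNanos - currentNanos + 999_999) / 1_000_000;
    }

    public static void main(String[] args) {
        long deadline = 100_000_000L; // first tick with a 100 ms tickDuration
        System.out.println(sleepMillis(deadline, 0L));           // 100
        System.out.println(sleepMillis(deadline, 99_500_000L));  // 1  (0.5 ms left, rounded up)
        System.out.println(sleepMillis(deadline, 100_000_000L)); // 0  (deadline reached, return)
        System.out.println(sleepMillis(deadline, 100_500_000L)); // 0  (already late, return)
    }
}
```

Without the + 999999, a remainder of 0.5 ms would truncate to 0 and the method would return before the deadline.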

HashedWheelTimer.Worker#transferTimeoutsToBuckets

private void transferTimeoutsToBuckets() {
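    // At most 100000 tasks are drained from the queue per tick (every 100 ms by default),
    // so it does not matter whether the queue is sorted or needs re-sorting.
    // This is the benefit of a time wheel: adding and removing tasks never has to reorder one huge queue.
    for (int i = 0; i < 100000; i++) {
        HashedWheelTimeout timeout = timeouts.poll();
        // Nothing left to take
        if (timeout == null) {
            break;
        }
        // Cancelled tasks are marked with a state flag, so again no other queue element is affected
        if (timeout.state() == HashedWheelTimeout.ST_CANCELLED) {
            continue;
        }
        // Total number of ticks after which this task becomes due
        long calculated = timeout.deadline / tickDuration;
        // tick is the current tick count; it starts at 0 and is incremented after every pass.
        // Suppose tick is 3 (the 4th tick is being processed) and calculated is 515: the task is not due
        // until the wheel's next revolution. Note that the task is still added to the slot's linked list
        // below, so remainingRounds is what decides whether it should run.
        timeout.remainingRounds = (calculated - tick) / wheel.length;
        // The deadline may already have passed, for example if a negative delay was given
        final long ticks = Math.max(calculated, tick);
        // Compute the slot
        int stopIndex = (int) (ticks & mask);
        HashedWheelBucket bucket = wheel[stopIndex];
        // Append to the linked list and give the HashedWheelTimeout a reference to its bucket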
        bucket.addTimeout(timeout);
    }
}
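
Plugging numbers into these formulas makes the placement concrete. The sketch below repeats the arithmetic with the default settings (100 ms tick, 512 slots); SlotPlacement and place are hypothetical names, and the deadlines are example values.

```java
class SlotPlacement {
    static final long TICK_NANOS = 100_000_000L; // 100 ms, the default tickDuration
    static final int WHEEL = 512;                // default number of slots

    // Returns {remainingRounds, stopIndex} for a deadline relative to startTime.
    static long[] place(long deadlineNanos, long tick) {
        long calculated = deadlineNanos / TICK_NANOS;
        long remainingRounds = (calculated - tick) / WHEEL;
        long ticks = Math.max(calculated, tick); // never schedule into the past
        long stopIndex = ticks & (WHEEL - 1);
        return new long[] {remainingRounds, stopIndex};
    }

    public static void main(String[] args) {
        // 51.5 s deadline at tick 3: calculated = 515, one extra revolution, slot 3
        System.out.println(java.util.Arrays.toString(place(51_500_000_000L, 3))); // [1, 3]
        // 2 s deadline at tick 3: calculated = 20, current revolution, slot 20
        System.out.println(java.util.Arrays.toString(place(2_000_000_000L, 3)));  // [0, 20]
        // Deadline already in the past: clamped to the current tick's slot
        System.out.println(java.util.Arrays.toString(place(-500_000_000L, 3)));   // [0, 3]
    }
}
```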

HashedWheelTimer.HashedWheelBucket#expireTimeouts

public void expireTimeouts(long deadline) {
    HashedWheelTimeout timeout = head;
    while (timeout != null) {
        boolean remove = false;
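        // remainingRounds <= 0 means the task is due within the current revolution; otherwise it belongs to a later one
        if (timeout.remainingRounds <= 0) {
            if (timeout.deadline <= deadline) {
                // The task runs inside a try/catch, so a failure does not affect other tasks
                timeout.expire();
            } else {
               ...
            }
            remove = true;
        } else if (timeout.isCancelled()) {
            remove = true;
        } else {
            // Not due yet: decrement remainingRounds so the task can run on a later visit to this slot
            timeout.remainingRounds --;
        }
        HashedWheelTimeout next = timeout.next;
        if (remove) {
           // Executed or cancelled tasks are removed from the linked list here.
           // The time wheel replaces the reorganization of one huge queue with small linked-list updates, which is why it performs well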
            remove(timeout);
        }
        timeout = next;
    }
}

Summary

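The performance advantage of HashedWheelTimer comes from not having to sort or reorganize one large queue. Tasks are spread across slots and tagged with a round attribute so that each one still fires at the right time. Compared with the JDK's ScheduledThreadPoolExecutor, however, it uses more memory, because every slot maintains its own linked list, much like a HashMap. Also, when there are no tasks for a long time, or the next task is due far in the future, the wheel does not block: it keeps ticking through empty slots, which wastes some resources. To avoid this idle spinning, see Kafka's implementation.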

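Kafka keeps every Bucket of the time wheel in a DelayQueue, ordered by bucket expiration time, so the bucket that expires soonest is at the head of the DelayQueue. A single thread reads from the DelayQueue, and as long as nothing has expired it simply stays blocked on the queue.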
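
The idea can be sketched with the JDK alone: only non-empty buckets go into a DelayQueue, and the driver thread blocks on take() instead of waking on every tick. This is an illustration of the description above, not Kafka's code; BucketQueueSketch, Bucket, and the slot labels are assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.DelayQueue;
import java.util.concurrent.Delayed;
import java.util.concurrent.TimeUnit;

class BucketQueueSketch {
    static final class Bucket implements Delayed {
        final String label;
        final long expiresAtNanos;

        Bucket(String label, long delayMillis) {
            this.label = label;
            this.expiresAtNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(delayMillis);
        }

        @Override
        public long getDelay(TimeUnit unit) {
            return unit.convert(expiresAtNanos - System.nanoTime(), TimeUnit.NANOSECONDS);
        }

        @Override
        public int compareTo(Delayed other) {
            // Assumes the queue only ever holds Bucket instances.
            return Long.compare(expiresAtNanos, ((Bucket) other).expiresAtNanos);
        }
    }

    // Drives the "wheel" by blocking on the next bucket expiration.
    static List<String> drive() {
        DelayQueue<Bucket> nonEmptyBuckets = new DelayQueue<>();
        nonEmptyBuckets.put(new Bucket("slot-7", 80));
        nonEmptyBuckets.put(new Bucket("slot-3", 40));
        List<String> fired = new ArrayList<>();
        try {
            while (!nonEmptyBuckets.isEmpty()) {
                // No wake-ups for empty slots: the thread sleeps until
                // the nearest bucket is actually due.
                fired.add(nonEmptyBuckets.take().label);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return fired;
    }

    public static void main(String[] args) {
        System.out.println(drive()); // [slot-3, slot-7]
    }
}
```

The trade-off is that inserting a bucket into the DelayQueue is O(log n) again, but n is the number of non-empty buckets rather than the number of tasks.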

Appendix: JDK built-in concurrent queues

Blocking queues:

A blocking queue blocks the caller when the queue is empty (on take) or full (on put).

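ArrayBlockingQueue: the most basic and most commonly used blocking queue. It is a bounded queue backed by an array, and its capacity must be specified at construction. It uses one ReentrantLock together with two Condition variables, notEmpty and notFull, to control concurrent access. A reader that finds the queue empty blocks until data arrives. A writer that finds the queue full likewise blocks until space frees up.
LinkedBlockingQueue: backed by a linked list. The queue can be bounded or effectively unbounded: specifying a capacity is optional, and the default is Integer.MAX_VALUE. It uses two ReentrantLocks, takeLock and putLock, plus the two Condition variables notEmpty and notFull, to control concurrent access. Having separate locks for taking and putting means readers and writers do not compete for the same lock, so LinkedBlockingQueue typically achieves higher throughput than ArrayBlockingQueue.
PriorityBlockingQueue: a priority queue implemented as a min-heap. Elements are ordered by priority, and each dequeue returns the highest-priority element. It uses one ReentrantLock and one Condition, notEmpty, to control concurrent access. No notFull is needed because PriorityBlockingQueue is unbounded, so put never blocks. The min-heap is stored in an array. When the number of elements reaches the capacity, the array is grown. During growth the lock is released first so that other elements can still be dequeued, and a CAS ensures that only one thread performs the resize.
DelayQueue: a blocking queue that supports delayed retrieval, often used for caches and scheduled-task dispatch. It stores its elements in the priority queue PriorityQueue. Every element must implement the Delayed interface and override compareTo and getDelay. Each element carries its own delay, and only elements whose delay has expired can be taken from the queue.
SynchronousQueue: also known as a queue without a buffer. Its distinguishing feature is that it does not store elements internally. Unlike ArrayBlockingQueue and LinkedBlockingQueue, SynchronousQueue relies directly on CAS operations for thread safety. Both put and take block: every put must wait for a matching take, and vice versa. SynchronousQueue can therefore be seen as a pairing point for producers and consumers, where each side waits until it is matched. The JDK thread pool created by Executors.newCachedThreadPool uses a SynchronousQueue: a newly submitted task is handed to an idle thread if one is available, otherwise a new thread is created to handle it.
LinkedTransferQueue: a special unbounded blocking queue that can be seen as a combination of LinkedBlockingQueue, SynchronousQueue (fair mode), and ConcurrentLinkedQueue. Unlike SynchronousQueue, LinkedTransferQueue can actually store data: on put, if a consumer is already waiting, the element is handed over directly, otherwise it is placed in the queue. Compared with LinkedBlockingQueue, LinkedTransferQueue uses lock-free CAS operations to improve performance further.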
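
A few of these properties can be checked with the non-blocking offer and poll variants, which return immediately instead of waiting. The class and method names below are hypothetical, and the sketch only exercises behaviour documented for these JDK classes.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.PriorityBlockingQueue;
import java.util.concurrent.SynchronousQueue;

class BlockingQueueBehaviour {
    // A bounded queue refuses offer() once full (put() would block instead).
    static boolean boundedRejectsWhenFull() {
        BlockingQueue<Integer> q = new ArrayBlockingQueue<>(1);
        q.offer(1);
        return !q.offer(2);
    }

    // SynchronousQueue stores nothing: offer() fails without a waiting taker.
    static boolean synchronousNeedsTaker() {
        return !new SynchronousQueue<Integer>().offer(1);
    }

    // Without an explicit capacity, LinkedBlockingQueue uses Integer.MAX_VALUE.
    static int defaultLinkedCapacity() {
        return new LinkedBlockingQueue<Integer>().remainingCapacity();
    }

    // PriorityBlockingQueue returns the smallest element first.
    static int priorityHead() {
        PriorityBlockingQueue<Integer> q = new PriorityBlockingQueue<>();
        q.add(5);
        q.add(1);
        q.add(3);
        return q.poll();
    }

    public static void main(String[] args) {
        System.out.println(boundedRejectsWhenFull());  // true
        System.out.println(synchronousNeedsTaker());   // true
        System.out.println(defaultLinkedCapacity());   // 2147483647
        System.out.println(priorityHead());            // 1
    }
}
```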

Non-blocking queues

Non-blocking queues do not block threads by taking locks, so they offer better concurrent performance.

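ConcurrentLinkedQueue: an unbounded, concurrent, non-blocking queue backed by a singly linked list. It can be thought of as the thread-safe counterpart of a plain linked-list queue. ConcurrentLinkedQueue uses CAS operations to guarantee thread safety, which is the foundation of non-blocking queue implementations, and it typically performs better than ArrayBlockingQueue and LinkedBlockingQueue.
ConcurrentLinkedDeque: also an unbounded, concurrent, non-blocking queue, backed by a doubly linked list. Unlike ConcurrentLinkedQueue, ConcurrentLinkedDeque is a double-ended queue that supports both FIFO and LIFO use: elements can be inserted and removed at the head as well as at the tail, which suits scenarios with multiple producers and multiple consumers.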
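
The difference between the two modes comes down to which end is used. Here is a minimal sketch, with the hypothetical class name DequeModes:

```java
import java.util.concurrent.ConcurrentLinkedDeque;
import java.util.concurrent.ConcurrentLinkedQueue;

class DequeModes {
    // FIFO: insert at the tail, remove from the head.
    static String fifo() {
        ConcurrentLinkedDeque<Integer> d = new ConcurrentLinkedDeque<>();
        d.addLast(1);
        d.addLast(2);
        d.addLast(3);
        return "" + d.pollFirst() + d.pollFirst() + d.pollFirst();
    }

    // LIFO: insert at the head, remove from the head.
    static String lifo() {
        ConcurrentLinkedDeque<Integer> d = new ConcurrentLinkedDeque<>();
        d.addFirst(1);
        d.addFirst(2);
        d.addFirst(3);
        return "" + d.pollFirst() + d.pollFirst() + d.pollFirst();
    }

    // Non-blocking: poll() on an empty queue returns null right away.
    static boolean emptyPollReturnsNull() {
        return new ConcurrentLinkedQueue<Integer>().poll() == null;
    }

    public static void main(String[] args) {
        System.out.println(fifo());                 // 123
        System.out.println(lifo());                 // 321
        System.out.println(emptyPollReturnsNull()); // true
    }
}
```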

Netty ⑤ source code - HashedWheelTimer