Thread Pool Source Code Analysis


Foreword

Hi everyone, I'm 傻鱼.

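I wrote this post to consolidate my own multithreading knowledge. It is aimed at readers who already understand threads reasonably well and know how to use a thread pool, but want to see what actually happens inside one. Read along with me.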

Getting started

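This analysis is based on the most fundamental thread pool, ThreadPoolExecutor. There are of course many other, more advanced thread pools, which I will analyze one by one in later posts.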

Creating a thread pool:

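Newcomers usually first meet thread pools through the interview question "what are the common kinds of thread pools?". In fact, the common pools returned by Executors (newFixedThreadPool, newSingleThreadExecutor, newCachedThreadPool) are all the core ThreadPoolExecutor class constructed with different arguments. So rather than memorizing how many common pools there are, I recommend reading the source directly to understand what each parameter means inside the pool. Here is how the single-thread pool is created: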

/**
 * Creates an Executor that uses a single worker thread operating
 * off an unbounded queue, and uses the provided ThreadFactory to
 * create a new thread when needed. Unlike the otherwise
 * equivalent {@code newFixedThreadPool(1, threadFactory)} the
 * returned executor is guaranteed not to be reconfigurable to use
 * additional threads.
 *
 * @param threadFactory the factory to use when creating new
 * threads
 *
 * @return the newly created single-threaded Executor
 * @throws NullPointerException if threadFactory is null
 */
public static ExecutorService newSingleThreadExecutor(ThreadFactory threadFactory) {
    return new FinalizableDelegatedExecutorService
        (new ThreadPoolExecutor(1, 1,
                                0L, TimeUnit.MILLISECONDS,
                                new LinkedBlockingQueue<Runnable>(),
                                threadFactory));
}

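Under the hood this is just a call to a ThreadPoolExecutor constructor (core = max = 1, an unbounded LinkedBlockingQueue). That 6-argument constructor in turn delegates to the full 7-argument constructor shown below, passing the default rejection handler (AbortPolicy).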
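You can check this claim on a real JDK. The sketch below reads the parameters back from pools created by the Executors factory methods. The class and method names are my own. The fixed and cached pools turn out to be plain ThreadPoolExecutor instances with different arguments. The single-thread pool is wrapped in a delegate, so it cannot be cast back and reconfigured.

```java
import java.util.Arrays;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class FactoryParams {
    // {corePoolSize, maximumPoolSize, keepAliveTime in seconds}
    static long[] describe(ThreadPoolExecutor pool) {
        return new long[] {
            pool.getCorePoolSize(),
            pool.getMaximumPoolSize(),
            pool.getKeepAliveTime(TimeUnit.SECONDS)
        };
    }

    public static void main(String[] args) {
        // newFixedThreadPool(n): core = max = n, unbounded LinkedBlockingQueue
        ThreadPoolExecutor fixed = (ThreadPoolExecutor) Executors.newFixedThreadPool(3);
        // newCachedThreadPool(): core = 0, max = Integer.MAX_VALUE, SynchronousQueue, 60 s keep-alive
        ThreadPoolExecutor cached = (ThreadPoolExecutor) Executors.newCachedThreadPool();
        // newSingleThreadExecutor(): wrapped in a delegate, so it is not a ThreadPoolExecutor
        ExecutorService single = Executors.newSingleThreadExecutor();

        System.out.println(Arrays.toString(describe(fixed)));             // [3, 3, 0]
        System.out.println(Arrays.toString(describe(cached)));            // [0, 2147483647, 60]
        System.out.println(fixed.getQueue().getClass().getSimpleName());  // LinkedBlockingQueue
        System.out.println(cached.getQueue().getClass().getSimpleName()); // SynchronousQueue
        System.out.println(single instanceof ThreadPoolExecutor);         // false

        fixed.shutdown();
        cached.shutdown();
        single.shutdown();
    }
}
```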

Next, let's create a thread pool in code:

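// DefaultThreadFactory(String) is assumed to be Netty's io.netty.util.concurrent.DefaultThreadFactory;
// the JDK's own DefaultThreadFactory is not public. No handler is passed, so the default AbortPolicy applies.
ExecutorService executor = new ThreadPoolExecutor(5, 10, 600L, TimeUnit.SECONDS,
        new ArrayBlockingQueue<>(100), new DefaultThreadFactory("poolDefault"));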

Here is the constructor that builds the pool:

/**
 * Creates a new {@code ThreadPoolExecutor} with the given initial
 * parameters.
 *
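 * @param corePoolSize the number of core threads to keep in the pool, even
 *        if they are idle, unless {@code allowCoreThreadTimeOut} is set.
 * @param maximumPoolSize the maximum number of threads allowed in the pool.
 * @param keepAliveTime when the number of threads is greater than the core
 *        size, the maximum time that excess idle threads wait for new tasks
 *        before terminating.
 * @param unit the time unit of the keepAliveTime argument.
 * @param workQueue the queue that holds tasks before they are executed.
 * @param threadFactory the factory the pool uses to create new threads.
 * @param handler the policy applied when a task cannot be accepted because
 *        the pool has reached maximumPoolSize and the work queue is full.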
 * @throws IllegalArgumentException if one of the following holds:<br>
 *         {@code corePoolSize < 0}<br>
 *         {@code keepAliveTime < 0}<br>
 *         {@code maximumPoolSize <= 0}<br>
 *         {@code maximumPoolSize < corePoolSize}
 * @throws NullPointerException if {@code workQueue}
 *         or {@code threadFactory} or {@code handler} is null
 */
public ThreadPoolExecutor(int corePoolSize,
                          int maximumPoolSize,
                          long keepAliveTime,
                          TimeUnit unit,
                          BlockingQueue<Runnable> workQueue,
                          ThreadFactory threadFactory,
                          RejectedExecutionHandler handler) {
    if (corePoolSize < 0 ||
        maximumPoolSize <= 0 ||
        maximumPoolSize < corePoolSize ||
        keepAliveTime < 0)
        throw new IllegalArgumentException();
    if (workQueue == null || threadFactory == null || handler == null)
        throw new NullPointerException();
    this.acc = System.getSecurityManager() == null ?
            null :
            AccessController.getContext();
    this.corePoolSize = corePoolSize;
    this.maximumPoolSize = maximumPoolSize;
    this.workQueue = workQueue;
    this.keepAliveTime = unit.toNanos(keepAliveTime);
    this.threadFactory = threadFactory;
    this.handler = handler;
}
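You can observe the argument checks at the top of the constructor directly. This small sketch uses a hypothetical helper, tryCreate, with the 5-argument constructor. That constructor delegates to the 7-argument one, passing the default thread factory and handler. The helper reports which exception, if any, is thrown.

```java
import java.util.concurrent.*;

public class CtorValidation {
    // Returns "ok" or the simple name of the exception thrown by the constructor
    static String tryCreate(int core, int max, long keepAlive, BlockingQueue<Runnable> queue) {
        try {
            ThreadPoolExecutor pool = new ThreadPoolExecutor(
                    core, max, keepAlive, TimeUnit.SECONDS, queue);
            pool.shutdown();
            return "ok";
        } catch (IllegalArgumentException | NullPointerException e) {
            return e.getClass().getSimpleName();
        }
    }

    public static void main(String[] args) {
        System.out.println(tryCreate(5, 10, 600, new ArrayBlockingQueue<>(100))); // ok
        System.out.println(tryCreate(5, 2, 600, new ArrayBlockingQueue<>(100)));  // IllegalArgumentException (max < core)
        System.out.println(tryCreate(0, 0, 600, new ArrayBlockingQueue<>(100)));  // IllegalArgumentException (max <= 0)
        System.out.println(tryCreate(5, 10, -1, new ArrayBlockingQueue<>(100)));  // IllegalArgumentException (keepAlive < 0)
        System.out.println(tryCreate(5, 10, 600, null));                          // NullPointerException
    }
}
```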

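After this step you have a new thread pool. When the pool is created, it does not call the thread factory to create any threads up front. Threads are created lazily, only once tasks arrive that need to run (unless you call prestartCoreThread() or prestartAllCoreThreads()).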
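Here is a quick way to see this lazy creation. It assumes the same sizes as the example pool, but uses Executors.defaultThreadFactory() instead of the Netty-style factory. getPoolSize() stays at 0 until the first task arrives, and prestartAllCoreThreads() starts the remaining core threads eagerly. The class name is made up.

```java
import java.util.Arrays;
import java.util.concurrent.*;

public class LazyThreads {
    // Returns {size before any task, size after first task, threads prestarted, final size}
    static int[] run() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(5, 10, 600L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(100), Executors.defaultThreadFactory());
        int before = pool.getPoolSize();                // 0: the factory has not been called yet

        pool.submit(() -> { }).get();                   // the first task triggers the first core thread
        int afterFirst = pool.getPoolSize();            // 1

        int prestarted = pool.prestartAllCoreThreads(); // eagerly start the remaining core threads
        int after = pool.getPoolSize();                 // 5

        pool.shutdown();
        return new int[] { before, afterFirst, prestarted, after };
    }

    public static void main(String[] args) throws Exception {
        System.out.println(Arrays.toString(run())); // [0, 1, 4, 5]
    }
}
```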

A new task enters the pool

Submit a task to the newly created pool:

executor.submit(() -> {
    System.out.println("thread id is: " + Thread.currentThread().getId());
    try {
        Thread.sleep(1000L);
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
});

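Follow the call through the type hierarchy. submit() is implemented in AbstractExecutorService, the parent class of ThreadPoolExecutor. It wraps the task in a RunnableFuture (a FutureTask) and then calls execute():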

/**
 * @throws RejectedExecutionException {@inheritDoc}
 * @throws NullPointerException       {@inheritDoc}
 */
public Future<?> submit(Runnable task) {
    if (task == null) throw new NullPointerException();
    RunnableFuture<Void> ftask = newTaskFor(task, null);
    execute(ftask);
    return ftask;
}
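Because submit() wraps the task in a FutureTask, an exception thrown by the task is captured in the Future instead of escaping to the worker thread. Here is a minimal sketch; the class name is hypothetical.

```java
import java.util.concurrent.*;

public class SubmitWrapsTask {
    static String observe() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        Future<?> f = pool.submit(failing);   // submit() -> newTaskFor() -> FutureTask
        String result;
        try {
            f.get();
            result = "no exception";
        } catch (ExecutionException e) {
            // FutureTask.run() caught the exception and stored it; get() rethrows it wrapped
            result = (f instanceof FutureTask) + " " + e.getCause().getMessage();
        }
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(observe()); // true boom
    }
}
```

This matters later in the article. A task that fails under execute() does let the exception escape runWorker(), and that kills the worker thread.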

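As the code below shows, once a task is submitted the pool basically goes through these 4 steps:

  1. Create a core thread (if fewer than corePoolSize threads are running).
  2. Put the task in the work queue.
  3. Create a non-core thread (if the queue is full, up to maximumPoolSize).
  4. Apply the rejection policy.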
/**
 * Executes the given task sometime in the future.  The task
 * may execute in a new thread or in an existing pooled thread.
 *
 * If the task cannot be submitted for execution, either because this
 * executor has been shutdown or because its capacity has been reached,
 * the task is handled by the current {@code RejectedExecutionHandler}.
 *
 * @param command the task to execute
 * @throws RejectedExecutionException at discretion of
 *         {@code RejectedExecutionHandler}, if the task
 *         cannot be accepted for execution
 * @throws NullPointerException if {@code command} is null
 */
public void execute(Runnable command) {
    if (command == null)
        throw new NullPointerException();
    /*
     * Proceed in 3 steps:
     *
     * 1. If fewer than corePoolSize threads are running, try to
     * start a new thread with the given command as its first
     * task.  The call to addWorker atomically checks runState and
     * workerCount, and so prevents false alarms that would add
     * threads when it shouldn't, by returning false.
     *
     * 2. If a task can be successfully queued, then we still need
     * to double-check whether we should have added a thread
     * (because existing ones died since last checking) or that
     * the pool shut down since entry into this method. So we
     * recheck state and if necessary roll back the enqueuing if
     * stopped, or start a new thread if there are none.
     *
     * 3. If we cannot queue task, then we try to add a new
     * thread.  If it fails, we know we are shut down or saturated
     * and so reject the task.
     */
    int c = ctl.get();
    // If fewer than corePoolSize workers are running, start a new core worker
    // with this command as its firstTask. A worker can be thought of as one thread.
    if (workerCountOf(c) < corePoolSize) {
        if (addWorker(command, true))
            return;
        c = ctl.get();
    }
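    // If the pool is running, try to put the task in the work queue.
    // Even if a core thread has just finished its previous task, the new task still goes
    // through the queue here; idle workers fetch their tasks from the queue.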
    if (isRunning(c) && workQueue.offer(command)) {
        // The task is queued; re-check in case the pool stopped running in the meantime
        int recheck = ctl.get();
        if (! isRunning(recheck) && remove(command))
            reject(command);
        // If no workers are left (they may all have died between the check and the enqueue),
        // start one with no first task so the queued task is not stranded without a worker
        else if (workerCountOf(recheck) == 0)
            addWorker(null, false);
    }
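    // The queue is full (or the pool is not running): try to add a non-core thread,
    // up to maximumPoolSize. If even that fails, apply the rejection policy.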
    else if (!addWorker(command, false))
        reject(command);
}
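execute() relies on ctl, a single AtomicInteger that packs the run state and the worker count into one int. The constants below mirror the JDK 8 source; later JDKs rename CAPACITY to COUNT_MASK. CtlSketch is only a standalone copy for experimenting, not the real field.

```java
public class CtlSketch {
    // High 3 bits = run state, low 29 bits = worker count
    static final int COUNT_BITS = Integer.SIZE - 3;      // 29
    static final int CAPACITY   = (1 << COUNT_BITS) - 1; // max worker count, also the count mask
    static final int RUNNING    = -1 << COUNT_BITS;      // negative, so it sorts below the others
    static final int SHUTDOWN   =  0 << COUNT_BITS;
    static final int STOP       =  1 << COUNT_BITS;
    static final int TIDYING    =  2 << COUNT_BITS;
    static final int TERMINATED =  3 << COUNT_BITS;

    static int runStateOf(int c)     { return c & ~CAPACITY; }
    static int workerCountOf(int c)  { return c & CAPACITY; }
    static int ctlOf(int rs, int wc) { return rs | wc; }
    static boolean isRunning(int c)  { return c < SHUTDOWN; }

    public static void main(String[] args) {
        int c = ctlOf(RUNNING, 3);
        System.out.println(workerCountOf(c));               // 3
        System.out.println(runStateOf(c) == RUNNING);       // true
        System.out.println(isRunning(c));                   // true
        System.out.println(isRunning(ctlOf(SHUTDOWN, 3)));  // false
        System.out.println(CAPACITY);                       // 536870911
    }
}
```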

addWorker() is the method that adds a thread; it is only called when a new thread is needed:

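/**
 * Checks if a new worker can be added with respect to current
 * pool state and the given bound (either core or maximum). If so,
 * the worker count is adjusted accordingly, and, if possible, a
 * new worker is created and started, running firstTask as its
 * first task. This method returns false if the pool is stopped or
 * eligible to shut down. It also returns false if the thread
 * factory fails to create a thread when asked.  If the thread
 * creation fails, either due to the thread factory returning
 * null, or due to an exception (typically OutOfMemoryError in
 * Thread.start()), we roll back cleanly.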
 *
 * @param firstTask the task the new thread should run first (or
 * null if none). Workers are created with an initial first task
 * (in method execute()) to bypass queuing when there are fewer
 * than corePoolSize threads (in which case we always start one),
 * or when the queue is full (in which case we must bypass queue).
 * Initially idle threads are usually created via
 * prestartCoreThread or to replace other dying workers.
 *
 * @param core if true use corePoolSize as bound, else
 * maximumPoolSize. (A boolean indicator is used here rather than a
 * value to ensure reads of fresh values after checking other pool
 * state).
 * @return true if successful
 */
private boolean addWorker(Runnable firstTask, boolean core) {
    retry:
    for (;;) {
        // Overall pool state (run state and worker count packed into ctl)
        int c = ctl.get();
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN &&
            ! (rs == SHUTDOWN &&
               firstTask == null &&
               ! workQueue.isEmpty()))
            return false;

        for (;;) {
            // Current number of workers
            int wc = workerCountOf(c);
            // Give up if the bound is reached: corePoolSize when adding a core thread, else maximumPoolSize
            if (wc >= CAPACITY ||
                wc >= (core ? corePoolSize : maximumPoolSize))
                return false;
            // CAS-increment the worker count to reserve a slot for the new thread
            if (compareAndIncrementWorkerCount(c))
                break retry;
            c = ctl.get();  // Re-read ctl
            if (runStateOf(c) != rs)
                continue retry;
            // else CAS failed due to workerCount change; retry inner loop
        }
    }

    boolean workerStarted = false;
    boolean workerAdded = false;
    Worker w = null;
    try {
        // The Worker constructor asks the ThreadFactory for a new thread
        w = new Worker(firstTask);
        final Thread t = w.thread;
        if (t != null) {
            // Take mainLock: the workers set and largestPoolSize are guarded by it.
            // Adding the worker to the set also keeps a strong reference to it.
            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Recheck while holding lock.
                // Back out on ThreadFactory failure or if
                // shut down before lock acquired.
                int rs = runStateOf(ctl.get());

                if (rs < SHUTDOWN ||
                    (rs == SHUTDOWN && firstTask == null)) {
                    if (t.isAlive()) // precheck that t is startable
                        throw new IllegalThreadStateException();
                    // workers is the set of all current Worker objects
                    workers.add(w);
                    int s = workers.size();
                    if (s > largestPoolSize)
                        largestPoolSize = s;
                    workerAdded = true;
                }
            } finally {
                mainLock.unlock();
            }
            if (workerAdded) {
                // Start the thread; it will execute Worker.run()
                t.start();
                workerStarted = true;
            }
        }
    } finally {
        if (! workerStarted)
            addWorkerFailed(w);
    }
    return workerStarted;
}
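The inner loop of addWorker() reserves a slot by CAS before any thread is created, and retries when another thread wins the race. Below is a simplified sketch of that pattern on a plain AtomicInteger. tryReserve is a hypothetical helper, and the real code also re-checks the run state. However many threads race, exactly `bound` reservations succeed.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CasReserve {
    // Hypothetical helper mirroring addWorker's inner loop: reserve one slot if count < bound
    static boolean tryReserve(AtomicInteger count, int bound) {
        for (;;) {
            int c = count.get();
            if (c >= bound)
                return false;                 // bound reached: caller must queue or reject
            if (count.compareAndSet(c, c + 1))
                return true;                  // slot reserved; now it is safe to create the thread
            // CAS lost a race with another thread: re-read and retry
        }
    }

    // Several threads compete for slots; returns how many reservations succeeded
    static int concurrentReservations(int threads, int attemptsEach, int bound)
            throws InterruptedException {
        AtomicInteger count = new AtomicInteger();
        AtomicInteger successes = new AtomicInteger();
        Thread[] ts = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            ts[i] = new Thread(() -> {
                for (int j = 0; j < attemptsEach; j++)
                    if (tryReserve(count, bound))
                        successes.incrementAndGet();
            });
            ts[i].start();
        }
        for (Thread t : ts)
            t.join();
        return successes.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(concurrentReservations(8, 100, 50)); // 50
    }
}
```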

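You now know what happens when a task is submitted to the pool. A short summary:

  1. If the pool has fewer than corePoolSize threads, a new thread is created with the task as its first task, until the core size is reached.
  2. Once the core size is reached, new tasks are put into the work queue.
  3. If a task cannot be queued (the queue is full), non-core threads are created, up to maximumPoolSize.
  4. If the core threads, the work queue and the maximum thread count are all saturated (or the pool has been shut down), the rejection policy is applied.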
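You can reproduce these four steps with a deliberately tiny pool: core 1, max 2, queue capacity 1, and tasks that block on a latch. The sizes and the class name are arbitrary choices for the demo. Because of step 3, the third task starts running before the second one, which is still waiting in the queue.

```java
import java.util.concurrent.*;

public class SubmitOrder {
    static String run() throws InterruptedException {
        // core = 1, max = 2, queue capacity = 1
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 2, 60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(1));
        CountDownLatch release = new CountDownLatch(1);
        Runnable blocker = () -> {
            try {
                release.await();           // keep the thread busy until we let it go
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };

        pool.execute(blocker); // 1. fewer than corePoolSize threads -> new core thread
        pool.execute(blocker); // 2. core size reached -> task goes into the queue
        pool.execute(blocker); // 3. queue full -> new non-core thread (up to max = 2)
        String state = pool.getPoolSize() + " threads, " + pool.getQueue().size() + " queued";
        try {
            pool.execute(blocker); // 4. everything saturated -> default AbortPolicy throws
            state += ", accepted";
        } catch (RejectedExecutionException e) {
            state += ", rejected";
        }
        release.countDown();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return state;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // 2 threads, 1 queued, rejected
    }
}
```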

Consuming the work queue

Once threads start consuming tasks, managing the pool becomes a matter of managing its internal workers. Let's focus on what happens inside a Worker.

/**
 * Creates with given first task and thread from ThreadFactory.
 * @param firstTask the first task (null if none)
 */
// Worker constructor (Worker is an inner class of ThreadPoolExecutor that implements Runnable)
Worker(Runnable firstTask) {
    setState(-1); // inhibit interrupts until runWorker
    this.firstTask = firstTask;
    this.thread = getThreadFactory().newThread(this);
}

/** Delegates main run loop to outer runWorker  */
// After t.start(), the new thread calls run() once the OS scheduler gives it CPU time
public void run() {
    runWorker(this);
}
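Since the Worker passes itself to ThreadFactory.newThread(), a custom factory sees every thread the pool creates. The sketch below uses a hypothetical lambda factory that names threads prefix-N, similar in spirit to the DefaultThreadFactory("poolDefault") used earlier. It also counts how often the factory is called.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

public class NamedFactory {
    // Hypothetical factory: threads are named prefix-1, prefix-2, ...
    static ThreadFactory named(String prefix, AtomicInteger created) {
        // r is the Worker itself; starting the thread ends up in Worker.run() -> runWorker()
        return r -> new Thread(r, prefix + "-" + created.incrementAndGet());
    }

    // Returns "<name of the thread that ran the task>/<number of threads created>"
    static String run() throws Exception {
        AtomicInteger created = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(2, 2, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>(), named("poolDefault", created));
        String name = pool.submit(() -> Thread.currentThread().getName()).get();
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return name + "/" + created.get();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // poolDefault-1/1
    }
}
```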

As you can see, every Worker's run() simply delegates to runWorker():

/**
 * Main worker run loop.  Repeatedly gets tasks from queue and
 * executes them, while coping with a number of issues:
 *
 * 1. We may start out with an initial task, in which case we
 * don't need to get the first one. Otherwise, as long as pool is
 * running, we get tasks from getTask. If it returns null then the
 * worker exits due to changed pool state or configuration
 * parameters.  Other exits result from exception throws in
 * external code, in which case completedAbruptly holds, which
 * usually leads processWorkerExit to replace this thread.
 *
 * 2. Before running any task, the lock is acquired to prevent
 * other pool interrupts while the task is executing, and then we
 * ensure that unless pool is stopping, this thread does not have
 * its interrupt set.
 *
 * 3. Each task run is preceded by a call to beforeExecute, which
 * might throw an exception, in which case we cause thread to die
 * (breaking loop with completedAbruptly true) without processing
 * the task.
 *
 * 4. Assuming beforeExecute completes normally, we run the task,
 * gathering any of its thrown exceptions to send to afterExecute.
 * We separately handle RuntimeException, Error (both of which the
 * specs guarantee that we trap) and arbitrary Throwables.
 * Because we cannot rethrow Throwables within Runnable.run, we
 * wrap them within Errors on the way out (to the thread's
 * UncaughtExceptionHandler).  Any thrown exception also
 * conservatively causes thread to die.
 *
 * 5. After task.run completes, we call afterExecute, which may
 * also throw an exception, which will also cause thread to
 * die. According to JLS Sec 14.20, this exception is the one that
 * will be in effect even if task.run throws.
 *
 * The net effect of the exception mechanics is that afterExecute
 * and the thread's UncaughtExceptionHandler have as accurate
 * information as we can provide about any problems encountered by
 * user code.
 *
 * @param w the worker
 */
final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        // Run the initial firstTask first, then keep fetching tasks from the work queue via getTask()
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted.  This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            // Set or clear the interrupt flag depending on whether the pool is stopping
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                try {
                    // Run the task
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}
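runWorker() calls the protected hooks beforeExecute() and afterExecute(), which a subclass can override, for example for logging or metrics. This sketch records what afterExecute() receives; HookedPool is a made-up name. For a failing task run via execute(), afterExecute() gets the Throwable. For a failing task run via submit(), it gets null, because FutureTask catches the exception inside task.run(). The failing execute() task also prints a stack trace from the default uncaught-exception handler.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.*;

public class HookedPool extends ThreadPoolExecutor {
    private final List<String> log = new CopyOnWriteArrayList<>();

    HookedPool() {
        super(1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        log.add("before");
    }

    @Override
    protected void afterExecute(Runnable r, Throwable t) {
        super.afterExecute(r, t);
        log.add("after:" + (t == null ? "null" : t.getClass().getSimpleName()));
    }

    static List<String> run() throws InterruptedException {
        HookedPool pool = new HookedPool();
        Runnable failing = () -> { throw new IllegalStateException("boom"); };
        pool.submit(failing);   // wrapped in a FutureTask: the exception stays inside the Future
        pool.execute(failing);  // not wrapped: the exception escapes task.run()
        pool.shutdown();
        pool.awaitTermination(5, TimeUnit.SECONDS);
        return new ArrayList<>(pool.log);
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(run()); // [before, after:null, before, after:IllegalStateException]
    }
}
```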

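At first glance, this suggests that a worker keeps looping until every task in the work queue has been executed. After that, the thread is torn down in processWorkerExit at the end.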

When all current tasks have been consumed

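But wait, isn't something off? Earlier we said core threads are not destroyed once all tasks have finished, unless allowCoreThreadTimeOut is set. So what did we miss? Let's dig further into the code and look at how a task is fetched from the blocking queue: getTask()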

/**
 * Performs blocking or timed wait for a task, depending on
 * current configuration settings, or returns null if this worker
 * must exit because of any of:
 * 1. There are more than maximumPoolSize workers (due to
 *    a call to setMaximumPoolSize).
 * 2. The pool is stopped.
 * 3. The pool is shutdown and the queue is empty.
 * 4. This worker timed out waiting for a task, and timed-out
 *    workers are subject to termination (that is,
 *    {@code allowCoreThreadTimeOut || workerCount > corePoolSize})
 *    both before and after the timed wait, and if the queue is
 *    non-empty, this worker is not the last thread in the pool.
 *
 * @return task, or null if the worker must exit, in which case
 *         workerCount is decremented
 */
private Runnable getTask() {
    boolean timedOut = false; // Did the last poll() time out?

    for (;;) {
        // Read the combined state (ctl)
        int c = ctl.get();
        // Extract the run state
        int rs = runStateOf(c);

        // Check if queue empty only if necessary.
        if (rs >= SHUTDOWN && (rs >= STOP || workQueue.isEmpty())) {
            decrementWorkerCount();
            return null;
        }

        int wc = workerCountOf(c);

        // Are workers subject to culling?
        boolean timed = allowCoreThreadTimeOut || wc > corePoolSize;

        if ((wc > maximumPoolSize || (timed && timedOut))
            && (wc > 1 || workQueue.isEmpty())) {
            if (compareAndDecrementWorkerCount(c))
                return null;
            continue;
        }

        try {
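            // Fetch a task. Note "timed": it is true when allowCoreThreadTimeOut is set or when there
            // are more than corePoolSize workers. Then the worker uses poll(keepAliveTime) and may time
            // out; otherwise it blocks in take() indefinitely.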
            Runnable r = timed ?
                workQueue.poll(keepAliveTime, TimeUnit.NANOSECONDS) :
                workQueue.take();
            if (r != null)
                return r;
            timedOut = true;
        } catch (InterruptedException retry) {
            timedOut = false;
        }
    }
}
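You can observe the effect of timed and keepAliveTime from outside. This sketch assumes core 1, max 3, a SynchronousQueue so that each blocked task needs its own thread, and a 100 ms keep-alive. These settings and the class name are my own. The pool grows to 3 and shrinks back to corePoolSize once the surplus threads time out in poll(). It then drops to 0 after allowCoreThreadTimeOut(true). The generous sleeps make the timing-based demo stable.

```java
import java.util.Arrays;
import java.util.concurrent.*;

public class KeepAliveDemo {
    // Returns {peak size, size after idling, size after allowCoreThreadTimeOut(true)}
    static int[] run() throws InterruptedException {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 3, 100L, TimeUnit.MILLISECONDS,
                new SynchronousQueue<>());
        CountDownLatch release = new CountDownLatch(1);
        for (int i = 0; i < 3; i++) {
            pool.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        int peak = pool.getPoolSize();             // 3: the queue cannot buffer, so each task got a thread
        release.countDown();
        Thread.sleep(1000);                        // far longer than keepAliveTime
        int afterIdle = pool.getPoolSize();        // 1: surplus workers timed out in poll()
        pool.allowCoreThreadTimeOut(true);         // now the last worker is "timed" as well
        Thread.sleep(1000);
        int afterCoreTimeout = pool.getPoolSize(); // 0
        pool.shutdown();
        return new int[] { peak, afterIdle, afterCoreTimeout };
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(Arrays.toString(run()));
    }
}
```

getTask() does not label individual threads as core or non-core. Whichever workers time out first are the ones that exit. The pool only guarantees that the count does not drop below corePoolSize while allowCoreThreadTimeOut is false.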

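Nothing special so far. Let's expand take(). Each blocking queue implements it differently. Here I'll start with the commonly used ArrayBlockingQueue (the one our example pool uses), and discuss how other blocking queues behave in later posts.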

public E take() throws InterruptedException {
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == 0)
            // Note: await on the notEmpty condition declared in the queue; the lock is released while waiting
            notEmpty.await();
        return dequeue();
    } finally {
        lock.unlock();
    }
}

Resuming consumption

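This explains why a thread waits when the work queue is empty. Inside the queue, the thread that acquired the lock awaits on the notEmpty condition. So when a worker cannot get a new task, it blocks until new data is offer()ed into the queue: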

/**
 * Inserts the specified element at the tail of this queue if it is
 * possible to do so immediately without exceeding the queue's capacity,
 * returning {@code true} upon success and {@code false} if this queue
 * is full.  This method is generally preferable to method {@link #add},
 * which can fail to insert an element only by throwing an exception.
 *
 * @throws NullPointerException if the specified element is null
 */
public boolean offer(E e) {
    checkNotNull(e);
    final ReentrantLock lock = this.lock;
    lock.lock();
    try {
        if (count == items.length)
            return false;
        else {
            enqueue(e);
            return true;
        }
    } finally {
        lock.unlock();
    }
}
/**
 * Inserts element at current put position, advances, and signals.
 * Call only when holding lock.
 */
private void enqueue(E x) {
    // assert lock.getHoldCount() == 1;
    // assert items[putIndex] == null;
    final Object[] items = this.items;
    items[putIndex] = x;
    if (++putIndex == items.length)
        putIndex = 0;
    count++;
    // Signal notEmpty: this wakes up one worker that was waiting in take()/poll()
    notEmpty.signal();
}
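The take()/offer() pair above is the classic lock-plus-condition pattern. Here is a stripped-down sketch of it, showing how a consumer blocked in take() is woken by offer(). MiniBlockingQueue is hypothetical and omits put(), poll() and the notFull condition.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical minimal bounded queue following the ArrayBlockingQueue pattern
public class MiniBlockingQueue<E> {
    private final Object[] items;          // circular buffer
    private int takeIndex, putIndex, count;
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition notEmpty = lock.newCondition();

    public MiniBlockingQueue(int capacity) {
        items = new Object[capacity];
    }

    // Like offer(): insert if there is room, never blocks
    public boolean offer(E e) {
        lock.lock();
        try {
            if (count == items.length)
                return false;
            items[putIndex] = e;
            if (++putIndex == items.length)
                putIndex = 0;
            count++;
            notEmpty.signal();             // wake one thread waiting in take()
            return true;
        } finally {
            lock.unlock();
        }
    }

    // Like take(): wait while the queue is empty
    @SuppressWarnings("unchecked")
    public E take() throws InterruptedException {
        lock.lockInterruptibly();
        try {
            while (count == 0)
                notEmpty.await();          // releases the lock until signalled
            E x = (E) items[takeIndex];
            items[takeIndex] = null;
            if (++takeIndex == items.length)
                takeIndex = 0;
            count--;
            return x;
        } finally {
            lock.unlock();
        }
    }
}
```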

Worker termination

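When timed is true, the worker calls poll(keepAliveTime, TimeUnit.NANOSECONDS) on the work queue instead of take(). Suppose poll() returns null because the timeout elapsed. getTask() then marks timedOut, and on its next iteration it decrements the worker count and returns null. That ends the worker's loop in runWorker() and starts the thread's teardown.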

public E poll(long timeout, TimeUnit unit) throws InterruptedException {
    long nanos = unit.toNanos(timeout);
    final ReentrantLock lock = this.lock;
    lock.lockInterruptibly();
    try {
        while (count == 0) {
            if (nanos <= 0)
                return null;
            nanos = notEmpty.awaitNanos(nanos);
        }
        return dequeue();
    } finally {
        lock.unlock();
    }
}
/**
 * Extracts element at current take position, advances, and signals.
 * Call only when holding lock.
 */
private E dequeue() {
    // assert lock.getHoldCount() == 1;
    // assert items[takeIndex] != null;
    final Object[] items = this.items;
    @SuppressWarnings("unchecked")
    E x = (E) items[takeIndex];
    items[takeIndex] = null;
    if (++takeIndex == items.length)
        takeIndex = 0;
    count--;
    if (itrs != null)
        itrs.elementDequeued();
    // Signal notFull to wake a producer blocked in put(); the element is returned to take()/poll()
    notFull.signal();
    return x;
}
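When the queue stays empty, the worker just sees poll() return null once the timeout is up. getTask() treats that null as timedOut. Here is a small sketch with a 200 ms timeout; the value and the class name are arbitrary.

```java
import java.util.concurrent.*;

public class PollTimeout {
    // Returns {result of poll, milliseconds waited}
    static Object[] run() throws InterruptedException {
        ArrayBlockingQueue<Runnable> queue = new ArrayBlockingQueue<>(1);
        long start = System.nanoTime();
        Runnable r = queue.poll(200, TimeUnit.MILLISECONDS); // empty queue: awaitNanos until the timeout
        long waitedMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        return new Object[] { r, waitedMs };
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] res = run();
        System.out.println(res[0]);                // null: getTask() would now set timedOut = true
        System.out.println("waited " + res[1] + " ms");
    }
}
```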

Tearing down the thread

/**
 * Performs cleanup and bookkeeping for a dying worker. Called
 * only from worker threads. Unless completedAbruptly is set,
 * assumes that workerCount has already been adjusted to account
 * for exit.  This method removes thread from worker set, and
 * possibly terminates the pool or replaces the worker if either
 * it exited due to user task exception or if fewer than
 * corePoolSize workers are running or queue is non-empty but
 * there are no workers.
 *
 * @param w the worker
 * @param completedAbruptly if the worker died due to user exception
 */
private void processWorkerExit(Worker w, boolean completedAbruptly) {
    if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
        decrementWorkerCount();
    final ReentrantLock mainLock = this.mainLock;
    mainLock.lock();
    try {
        completedTaskCount += w.completedTasks;
        workers.remove(w);
    } finally {
        mainLock.unlock();
    }

    tryTerminate();
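    // If the pool is still running (below STOP), make sure it keeps enough workers: a worker that
    // died from a task exception is replaced, and the pool keeps at least corePoolSize workers
    // (or at least 1 if core threads may time out but the queue still holds tasks).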
    int c = ctl.get();
    if (runStateLessThan(c, STOP)) {
        if (!completedAbruptly) {
            int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
            if (min == 0 && ! workQueue.isEmpty())
                min = 1;
            if (workerCountOf(c) >= min)
                return; // replacement not needed
        }
        addWorker(null, false);
    }
}
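You can see the replacement logic with a single-thread pool; the class name is hypothetical. A task submitted via execute() that throws kills its worker (completedAbruptly stays true). The next task therefore runs on a newly created Thread object. The default uncaught-exception handler prints the failing task's stack trace.

```java
import java.util.concurrent.*;

public class WorkerReplacement {
    // Returns true if the task after a failing execute() ran on a different thread
    static boolean run() throws Exception {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(1, 1, 0L, TimeUnit.MILLISECONDS,
                new LinkedBlockingQueue<>());
        Thread first = pool.submit(() -> Thread.currentThread()).get();
        // execute(): the exception escapes runWorker(), so the old worker dies
        pool.execute(() -> { throw new IllegalStateException("boom"); });
        // the old worker is gone, so this task runs on a newly created thread
        Thread second = pool.submit(() -> Thread.currentThread()).get();
        pool.shutdown();
        return first != second;
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run()); // true
    }
}
```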

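That completes the analysis of a thread's life cycle inside the pool. Next, I will analyze the source of several different blocking queue types and work through some hands-on examples.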

Blocking queue types and usage:

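1) ArrayBlockingQueue: a bounded blocking queue backed by an array, ordering elements FIFO. It keeps two int indexes (takeIndex and putIndex) marking the head and tail positions in the array. Producers and consumers share a single lock, so puts and takes cannot truly run in parallel, which gives relatively lower throughput.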

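2) LinkedBlockingQueue: an optionally bounded blocking queue backed by linked nodes. If no capacity is given, it defaults to Integer.MAX_VALUE. Elements are ordered FIFO. It uses separate locks for producers and consumers, so it generally achieves higher concurrency.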

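3) SynchronousQueue: a blocking queue that stores no elements (zero capacity) and can be created in fair or unfair mode. Each insert must wait for a corresponding remove by another thread, and vice versa.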

4) PriorityBlockingQueue: an unbounded blocking queue with priority ordering. By default elements are ordered by their natural order, or you can supply a Comparator.

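5) DelayQueue: an unbounded blocking queue of delayed elements. Each element specifies how long until it can be taken from the queue. It is commonly used in caching systems or scheduled-task systems.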

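6) LinkedTransferQueue: an unbounded blocking queue based on linked nodes. Compared with LinkedBlockingQueue, it adds transfer() and tryTransfer(), which hand an element directly to a consumer if one is already waiting to receive it.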

7) LinkedBlockingDeque: a double-ended blocking queue based on linked nodes; elements can be inserted and removed at both ends.
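A few of the traits above can be checked quickly; QueueTraits is a made-up name. They also explain the factory pools. With an unbounded LinkedBlockingQueue, offer() never fails, so the pool never grows past corePoolSize and maximumPoolSize has no effect. With a SynchronousQueue, offer() fails unless an idle worker is already waiting, so a cached pool creates a new thread whenever no idle worker is available.

```java
import java.util.concurrent.*;

public class QueueTraits {
    static String summary() {
        // LinkedBlockingQueue without a capacity: bounded only by Integer.MAX_VALUE
        boolean lbqUnbounded = new LinkedBlockingQueue<Runnable>().remainingCapacity() == Integer.MAX_VALUE;
        // SynchronousQueue stores nothing: offer() fails when no consumer is waiting
        boolean syncAccepted = new SynchronousQueue<Runnable>().offer(() -> { });
        // PriorityBlockingQueue: smallest element first (natural order), regardless of insertion order
        PriorityBlockingQueue<Integer> pq = new PriorityBlockingQueue<>();
        pq.add(5);
        pq.add(1);
        pq.add(3);
        String order = pq.poll() + " " + pq.poll() + " " + pq.poll();
        // ArrayBlockingQueue: fixed capacity chosen at construction
        ArrayBlockingQueue<Integer> abq = new ArrayBlockingQueue<>(2);
        String fills = abq.offer(1) + " " + abq.offer(2) + " " + abq.offer(3);
        return lbqUnbounded + "|" + syncAccepted + "|" + order + "|" + fills;
    }

    public static void main(String[] args) {
        System.out.println(summary()); // true|false|1 3 5|true true false
    }
}
```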

How do I know roughly what value to configure for each parameter?

Formula:

Ncpu = number of CPU cores

Ucpu = target CPU utilization, 0 <= Ucpu <= 1

W / C = ratio of wait time to compute time

To keep the CPU at the target utilization, the number of threads needed is:

Nthreads = Ncpu * Ucpu * (1 + W / C)
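Here is a worked example of the formula with a hypothetical workload. Each task waits 40 ms on I/O and computes for 10 ms (W / C = 4), on 8 cores at an 80% target utilization. That gives 8 * 0.8 * (1 + 4) = 32 threads. The helper name threads() is my own, and the result is a starting point to validate under real load, not a final setting.

```java
public class PoolSizing {
    // Nthreads = Ncpu * Ucpu * (1 + W / C), rounded to the nearest whole thread
    static long threads(int nCpu, double targetUtilization, double waitTime, double computeTime) {
        if (targetUtilization < 0 || targetUtilization > 1)
            throw new IllegalArgumentException("Ucpu must be in [0, 1]");
        return Math.round(nCpu * targetUtilization * (1 + waitTime / computeTime));
    }

    public static void main(String[] args) {
        System.out.println(threads(8, 0.8, 40, 10)); // 8 * 0.8 * 5 = 32
        System.out.println(threads(8, 1.0, 0, 10));  // CPU-bound (W / C = 0): one thread per core, 8
        // Ncpu on the current machine
        int n = Runtime.getRuntime().availableProcessors();
        System.out.println(threads(n, 0.8, 40, 10));
    }
}
```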