ThreadPoolExecutor: Internals and Tuning

ThreadPoolExecutor

ThreadPoolExecutor is a framework for scheduling tasks across multiple threads while keeping resource usage under control.

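Constructor parameters of TPE

  • int corePoolSize: the number of threads kept in the pool even when they are idle, unless allowCoreThreadTimeOut is set
  • int maximumPoolSize: the maximum number of threads allowed in the pool
  • long keepAliveTime: when the pool has more than core threads, the longest an idle excess thread waits for new work before it terminates
  • TimeUnit unit: the time unit of keepAliveTime
  • BlockingQueue<Runnable> workQueue: holds tasks that are waiting to be executed
  • ThreadFactory threadFactory: used to create new threads
  • RejectedExecutionHandler handler: invoked when both the thread limit and the queue capacity have been reached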
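
To make the parameter list concrete, here is a minimal sketch that passes all seven arguments to the constructor. The class name ConstructorDemo and the specific values (2, 4, 30 seconds, capacity 10) are made up for illustration only.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ConstructorDemo {
    // Builds a pool with the full seven-argument constructor (illustrative values).
    static ThreadPoolExecutor newPool() {
        return new ThreadPoolExecutor(
                2,                                    // corePoolSize
                4,                                    // maximumPoolSize
                30L, TimeUnit.SECONDS,                // keepAliveTime and its unit
                new ArrayBlockingQueue<>(10),         // workQueue, bounded to 10 tasks
                Executors.defaultThreadFactory(),     // threadFactory
                new ThreadPoolExecutor.AbortPolicy()  // handler
        );
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = newPool();
        System.out.println(pool.getCorePoolSize() + " / " + pool.getMaximumPoolSize());
        pool.shutdown();
    }
}
```

No thread is created by the constructor itself; threads appear as tasks are submitted.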

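How thread count and the queue interact

  • While the pool has fewer than core threads, every newly submitted task starts a new thread, even if other threads are idle.
  • Once the pool has at least core threads but fewer than max, a new task is queued first. Only if the queue is full is a new thread created to run it.

Corollaries:

  • Setting core equal to max gives a fixed-size thread pool.
  • Setting max to an essentially unbounded value such as Integer.MAX_VALUE lets the pool accommodate any number of concurrent tasks.
  • core and max can be changed while the pool is in use, through setCorePoolSize and setMaximumPoolSize.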
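
The last corollary can be tried out directly. The sketch below (ResizeDemo is a made-up name) resizes a live pool and then uses prestartAllCoreThreads to show that the new core size is in effect. It raises max before core because, as far as I know, some JDK versions reject a core size larger than max.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class ResizeDemo {
    // Grows a 1/1 pool to core=3, max=4 and returns how many core threads were then started.
    static int resizeAndPrestart() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 10L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        pool.setMaximumPoolSize(4); // raise max first so that core never exceeds max
        pool.setCorePoolSize(3);
        int started = pool.prestartAllCoreThreads(); // starts idle threads up to core
        pool.shutdown();
        return started;
    }

    public static void main(String[] args) {
        System.out.println("core threads started: " + resizeAndPrestart());
    }
}
```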

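Thread factory

The default thread factory creates all threads in the same ThreadGroup, with the same NORM_PRIORITY priority and non-daemon status. If the thread factory fails to create a thread (newThread returns null), the pool keeps running but might not be able to execute any tasks. (What about tasks that were already accepted?)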
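
A custom factory is the usual way to change those defaults. The following is only a sketch: NamedThreadFactory is a hypothetical class, and marking threads as daemon is just an example of something the default factory does not do.

```java
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.atomic.AtomicInteger;

class NamedThreadFactory implements ThreadFactory {
    private final String prefix;
    private final AtomicInteger sequence = new AtomicInteger(1);

    NamedThreadFactory(String prefix) {
        this.prefix = prefix;
    }

    @Override
    public Thread newThread(Runnable r) {
        // Give each thread a readable name such as "worker-1", "worker-2", ...
        Thread t = new Thread(r, prefix + "-" + sequence.getAndIncrement());
        t.setDaemon(true); // unlike the default factory, do not keep the JVM alive
        return t;
    }
}
```

An instance can be passed as the threadFactory constructor argument, for example new NamedThreadFactory("worker").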

keep-alive time

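If the pool currently has more than core threads, the excess threads terminate once they have been idle longer than the keep-alive time, which reduces resource consumption (which resources exactly?). If more tasks arrive later, new threads are created again. This parameter can also be changed dynamically with setKeepAliveTime. By default it applies only to threads beyond core, but calling allowCoreThreadTimeOut(true) makes core threads time out as well.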
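
A small experiment shows the effect of allowCoreThreadTimeOut. In this sketch (KeepAliveDemo and the 50 ms keep-alive are illustrative choices) a single core thread runs one task and is then allowed to time out; the method polls until the pool is empty or five seconds have passed.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class KeepAliveDemo {
    // Returns the pool size observed after the only core thread has had time to idle out.
    static int poolSizeAfterIdle() {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 50L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.allowCoreThreadTimeOut(true); // requires a keep-alive time greater than zero
        pool.execute(() -> { });           // starts the single core thread
        long deadline = System.nanoTime() + TimeUnit.SECONDS.toNanos(5);
        try {
            while (pool.getPoolSize() > 0 && System.nanoTime() < deadline) {
                Thread.sleep(10);
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        int size = pool.getPoolSize();
        pool.shutdown();
        return size;
    }

    public static void main(String[] args) {
        System.out.println("pool size after idling: " + poolSizeAfterIdle());
    }
}
```

Without the allowCoreThreadTimeOut call the core thread would stay alive and the loop would run until its deadline.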

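Queuing

Any BlockingQueue can be used to hand off and hold submitted tasks.

  • If fewer than core threads are running, the pool prefers to create a new thread for the task.
  • If core or more threads are running, the pool prefers to queue the task; if the queue cannot accept it, a new thread is created.
  • If the queue is full and the thread count has already reached max, the task is rejected.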
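
The three rules above can be observed with a deliberately tiny pool. This is a sketch with assumed values (core 1, max 2, queue capacity 1) and a made-up class name; every task blocks on a latch so that the workers stay busy while the sizes are read.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolGrowthDemo {
    // Returns {poolSize, queueSize} after each of three submissions,
    // followed by 1 if a fourth submission was rejected.
    static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 2, 1L, TimeUnit.SECONDS, new ArrayBlockingQueue<>(1));
        Runnable blocker = () -> {
            try {
                release.await(); // keep the worker busy until the measurements are done
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        int[] out = new int[7];
        for (int i = 0; i < 3; i++) {
            pool.execute(blocker);
            out[2 * i] = pool.getPoolSize();
            out[2 * i + 1] = pool.getQueue().size();
        }
        try {
            pool.execute(blocker); // queue is full and max is reached
        } catch (RejectedExecutionException e) {
            out[6] = 1;
        }
        release.countDown();
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```

The first task starts a core thread, the second is queued, the third finds the queue full and starts a non-core thread, and the fourth is rejected.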

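With an unbounded queue, such as a LinkedBlockingQueue without a capacity, new tasks wait in the queue whenever all core threads are busy and no additional threads are created. At most core threads ever run, so max has no effect.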
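
The same kind of experiment shows that max is ignored with an unbounded queue. In this sketch (UnboundedQueueDemo, core 2, max 10 and six tasks are all illustrative) the pool never grows beyond core.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class UnboundedQueueDemo {
    // Returns {poolSize, queueSize} after six blocking tasks were submitted.
    static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                2, 10, 1L, TimeUnit.SECONDS, new LinkedBlockingQueue<>());
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 6; i++) {
            pool.execute(blocker);
        }
        // Two tasks occupy the core threads; the other four wait in the queue.
        int[] out = {pool.getPoolSize(), pool.getQueue().size()};
        release.countDown();
        pool.shutdown();
        return out;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```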

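Bounded queues are harder to tune. A large queue with a small pool minimizes OS resource usage and context-switching overhead, but it can reduce throughput. If tasks block frequently (for example on I/O), more threads can be used. A small queue with a large pool keeps the CPUs busier, but scheduling overhead may grow enough to reduce throughput as well.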

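Rejecting tasks

When the pool has been shut down, or when it is saturated (bounded queue full and max threads reached), execute calls the rejectedExecution(Runnable, ThreadPoolExecutor) method of its RejectedExecutionHandler. Four handler policies are provided:

  • AbortPolicy (the default): rejects the task by throwing RejectedExecutionException.
  • CallerRunsPolicy: the thread that called execute runs the task itself, which acts as a simple negative-feedback mechanism that slows down submission (how is this control implemented?)
  • DiscardPolicy: silently drops the task.
  • DiscardOldestPolicy: drops the oldest task at the head of the queue, then retries execute.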
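
For CallerRunsPolicy, the feedback is simply that the submitting thread is busy running the rejected task and cannot submit anything else in the meantime. The sketch below (CallerRunsDemo is a made-up name, and the 1/1 pool with a one-slot queue is only there to force saturation) checks which thread ends up running the third task.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class CallerRunsDemo {
    // Returns true if the task submitted to a saturated pool ran on the submitting thread.
    static boolean ranInCaller() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS,
                new ArrayBlockingQueue<>(1),
                new ThreadPoolExecutor.CallerRunsPolicy());
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        pool.execute(blocker); // occupies the only worker
        pool.execute(blocker); // fills the only queue slot
        AtomicReference<Thread> runner = new AtomicReference<>();
        // Saturated: the handler runs this task synchronously on the calling thread.
        pool.execute(() -> runner.set(Thread.currentThread()));
        boolean result = runner.get() == Thread.currentThread();
        release.countDown();
        pool.shutdown();
        return result;
    }

    public static void main(String[] args) {
        System.out.println("ran in caller: " + ranInCaller());
    }
}
```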

Finalization

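A pool that is no longer referenced and has no remaining threads is shut down automatically. To make an unreferenced pool shut itself down, its threads must be allowed to die on their own: set a suitable keep-alive time together with zero core threads, or call allowCoreThreadTimeOut(true).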

State

The pool tracks a handful of internal run states:

    private static final int RUNNING    = -1 << COUNT_BITS;
    private static final int SHUTDOWN   =  0 << COUNT_BITS;
    private static final int STOP       =  1 << COUNT_BITS;
    private static final int TIDYING    =  2 << COUNT_BITS;
    private static final int TERMINATED =  3 << COUNT_BITS;
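
These constants only make sense together with COUNT_BITS and the helper methods that the later listings call (runStateOf, workerCountOf, ctlOf, isRunning). The standalone replica below is a sketch based on my reading of the JDK 8 source, where the run state lives in the top 3 bits of one int and the worker count in the lower 29. The class name CtlBits is made up, and the real fields are private to ThreadPoolExecutor.

```java
class CtlBits {
    static final int COUNT_BITS = Integer.SIZE - 3;      // 29 bits for the worker count
    static final int CAPACITY = (1 << COUNT_BITS) - 1;   // mask for the low 29 bits

    // Run states occupy the high 3 bits and are ordered numerically.
    static final int RUNNING    = -1 << COUNT_BITS;
    static final int SHUTDOWN   =  0 << COUNT_BITS;
    static final int STOP       =  1 << COUNT_BITS;
    static final int TIDYING    =  2 << COUNT_BITS;
    static final int TERMINATED =  3 << COUNT_BITS;

    static int runStateOf(int c)     { return c & ~CAPACITY; } // keep the high bits
    static int workerCountOf(int c)  { return c & CAPACITY; }  // keep the low bits
    static int ctlOf(int rs, int wc) { return rs | wc; }       // pack both into one int
    static boolean isRunning(int c)  { return c < SHUTDOWN; }  // only RUNNING is negative

    public static void main(String[] args) {
        int c = ctlOf(RUNNING, 5);
        System.out.println(isRunning(c) + " " + workerCountOf(c));
    }
}
```

Because the states are ordered, checks such as rs >= SHUTDOWN or runStateAtLeast(c, STOP) in the listings below are plain integer comparisons.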

Submitting a task

Let's go straight to the code:

public void execute(Runnable command) {
        if (command == null)
            throw new NullPointerException();
        /*
         * Proceed in 3 steps:
         *
         * 1. If fewer than corePoolSize threads are running, try to
         * start a new thread with the given command as its first
         * task.  The call to addWorker atomically checks runState and
         * workerCount, and so prevents false alarms that would add
         * threads when it shouldn't, by returning false.
         *
         * 2. If a task can be successfully queued, then we still need
         * to double-check whether we should have added a thread
         * (because existing ones died since last checking) or that
         * the pool shut down since entry into this method. So we
         * recheck state and if necessary roll back the enqueuing if
         * stopped, or start a new thread if there are none.
         *
         * 3. If we cannot queue task, then we try to add a new
         * thread.  If it fails, we know we are shut down or saturated
         * and so reject the task.
         */
        int c = ctl.get();
        // If there are fewer than core threads, prefer to create a worker (one worker represents one working thread)
        if (workerCountOf(c) < corePoolSize) {
            if (addWorker(command, true))
                return;
            c = ctl.get();
        }
        // If the pool is still running, put the task on the queue
        if (isRunning(c) && workQueue.offer(command)) {
            int recheck = ctl.get();
            if (! isRunning(recheck) && remove(command))
                reject(command);
            else if (workerCountOf(recheck) == 0)
                addWorker(null, false);
        }
        // The queue did not accept the task, so try a non-core thread (is this still FIFO?)
        else if (!addWorker(command, false))
            reject(command);
    }

As the code above shows, addWorker is the heart of the thread pool. Here is the method:

private boolean addWorker(Runnable firstTask, boolean core) {
        retry:
        for (;;) {
            int c = ctl.get();
            int rs = runStateOf(c);

            // Check if queue empty only if necessary.
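            // rs > SHUTDOWN: return false.
            // rs == SHUTDOWN: return false if firstTask != null, or if firstTask == null and workQueue is empty.
            // In other words: submitting a task after shutdown, or asking for a task-less worker while the queue is empty, fails because the worker would have nothing to do. After shutdown, addWorker can only succeed when firstTask == null and workQueue is not empty.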
            if (rs >= SHUTDOWN &&
                ! (rs == SHUTDOWN &&
                   firstTask == null &&
                   ! workQueue.isEmpty()))
                return false;

            for (;;) {
                int wc = workerCountOf(c);
                // The worker count must not exceed the limit
                if (wc >= CAPACITY ||
                    wc >= (core ? corePoolSize : maximumPoolSize))
                    return false;
                // The CAS succeeded, so leave both loops
                if (compareAndIncrementWorkerCount(c))
                    break retry;
                c = ctl.get();  // Re-read ctl
                if (runStateOf(c) != rs)
                    continue retry;
                // else CAS failed due to workerCount change; retry inner loop
            }
        }

        boolean workerStarted = false;
        boolean workerAdded = false;
        Worker w = null;
        try {
            w = new Worker(firstTask);
            final Thread t = w.thread;
            // t is null if the ThreadFactory failed to create a thread
            if (t != null) {
                final ReentrantLock mainLock = this.mainLock;
                mainLock.lock();
                try {
                    // Recheck while holding lock.
                    // Back out on ThreadFactory failure or if
                    // shut down before lock acquired.
                    int rs = runStateOf(ctl.get());

                    if (rs < SHUTDOWN ||
                        (rs == SHUTDOWN && firstTask == null)) {
                        if (t.isAlive()) // precheck that t is startable
                            throw new IllegalThreadStateException();
                        // workers is a HashSet
                        workers.add(w);
                        int s = workers.size();
                        if (s > largestPoolSize)
                            largestPoolSize = s;
                        workerAdded = true;
                    }
                } finally {
                    mainLock.unlock();
                }
                if (workerAdded) {
                    // The worker is registered, so start its thread
                    t.start();
                    workerStarted = true;
                }
            }
        } finally {
            if (! workerStarted)
                addWorkerFailed(w);
        }
        return workerStarted;
    }
    
    Worker(Runnable firstTask) {
            setState(-1); // inhibit interrupts until runWorker
            this.firstTask = firstTask;
            this.thread = getThreadFactory().newThread(this);
    }

So addWorker creates a Worker and starts a thread that runs it. Next, the Worker's run method:

public void run() {
            runWorker(this);
        }
final void runWorker(Worker w) {
        Thread wt = Thread.currentThread();
        Runnable task = w.firstTask;
        w.firstTask = null;
        w.unlock(); // allow interrupts
        boolean completedAbruptly = true;
        try {
            while (task != null || (task = getTask()) != null) {
                w.lock();
                // If pool is stopping, ensure thread is interrupted;
                // if not, ensure thread is not interrupted.  This
                // requires a recheck in second case to deal with
                // shutdownNow race while clearing interrupt
                
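                // If the pool is at least STOP, make sure this thread is interrupted. Otherwise Thread.interrupted() clears any pending interrupt, and the state is rechecked in case shutdownNow raced with the clear.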
                if ((runStateAtLeast(ctl.get(), STOP) ||
                     (Thread.interrupted() &&
                      runStateAtLeast(ctl.get(), STOP))) &&
                    !wt.isInterrupted())
                    wt.interrupt();
                try {
                    beforeExecute(wt, task);
                    Throwable thrown = null;
                    try {
                        task.run();
                    } catch (RuntimeException x) {
                        thrown = x; throw x;
                    } catch (Error x) {
                        thrown = x; throw x;
                    } catch (Throwable x) {
                        thrown = x; throw new Error(x);
                    } finally {
                        afterExecute(task, thrown);
                    }
                } finally {
                    task = null;
                    w.completedTasks++;
                    w.unlock();
                }
            }
            completedAbruptly = false;
        } finally {
            // Reached on normal exit and after any exception: a task that throws kills this worker
            processWorkerExit(w, completedAbruptly);
        }
    }
private void processWorkerExit(Worker w, boolean completedAbruptly) {
        if (completedAbruptly) // If abrupt, then workerCount wasn't adjusted
            decrementWorkerCount();

        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            completedTaskCount += w.completedTasks;
            workers.remove(w);
        } finally {
            mainLock.unlock();
        }

        tryTerminate();

        int c = ctl.get();
        if (runStateLessThan(c, STOP)) {
            if (!completedAbruptly) {
                int min = allowCoreThreadTimeOut ? 0 : corePoolSize;
                if (min == 0 && ! workQueue.isEmpty())
                    min = 1;
                if (workerCountOf(c) >= min)
                    return; // replacement not needed
            }
            addWorker(null, false);
        }
    }
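
The consequence of the finally block in runWorker can be checked from the outside: a task submitted with execute that throws takes its worker thread down, and processWorkerExit arranges a replacement. The sketch below uses made-up names (WorkerDeathDemo, quietFactory) and installs an uncaught-exception handler only to keep the output free of a stack trace.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadFactory;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class WorkerDeathDemo {
    // Returns true if the task following a failing task ran on a different thread.
    static boolean replacedAfterFailure() {
        ThreadFactory quietFactory = r -> {
            Thread t = new Thread(r);
            t.setUncaughtExceptionHandler((thread, error) -> { }); // swallow for the demo
            return t;
        };
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>(), quietFactory);
        AtomicReference<Thread> first = new AtomicReference<>();
        AtomicReference<Thread> second = new AtomicReference<>();
        CountDownLatch done = new CountDownLatch(1);
        pool.execute(() -> {
            first.set(Thread.currentThread());
            throw new IllegalStateException("boom"); // kills the worker thread
        });
        pool.execute(() -> {
            second.set(Thread.currentThread());
            done.countDown();
        });
        try {
            done.await(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        pool.shutdown();
        return first.get() != null && second.get() != null && first.get() != second.get();
    }

    public static void main(String[] args) {
        System.out.println("worker replaced: " + replacedAfterFailure());
    }
}
```

Even though the pool is limited to a single thread, the second task still runs, on a new thread.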

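Overall this is easy to follow, except for one question: what is w.unlock() for? The Worker constructor calls setState(-1) to inhibit interrupts until runWorker starts, and w.unlock() resets the state to 0 so that the worker can be interrupted from then on. Interrupt handling comes up again below. First, the ways to shut the pool down: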

public void shutdown() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            checkShutdownAccess();
            // Advance the run state to SHUTDOWN
            advanceRunState(SHUTDOWN);
            // Interrupt the threads of idle workers
            interruptIdleWorkers();
            onShutdown(); // hook for ScheduledThreadPoolExecutor
        } finally {
            mainLock.unlock();
        }
        tryTerminate();
    }

where:

private void advanceRunState(int targetState) {
        for (;;) {
            int c = ctl.get();
            // Nothing to do if the pool is already at least in this state
            if (runStateAtLeast(c, targetState) ||
                ctl.compareAndSet(c, ctlOf(targetState, workerCountOf(c))))
                break;
        }
    }
private void interruptIdleWorkers() {
        interruptIdleWorkers(false);
    }
private void interruptIdleWorkers(boolean onlyOne) {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            for (Worker w : workers) {
                Thread t = w.thread;
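                // The thread must not already be interrupted, and we must be able to take the worker's lock.
                // That is what idle means here: a worker holds its own lock while it runs a task, so tryLock fails on busy workers.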
                if (!t.isInterrupted() && w.tryLock()) {
                    try {
                        t.interrupt();
                    } catch (SecurityException ignore) {
                    } finally {
                        w.unlock();
                    }
                }
                if (onlyOne)
                    break;
            }
        } finally {
            mainLock.unlock();
        }
    }

As the code shows, shutdown() sets the pool state to SHUTDOWN and interrupts the threads of idle workers (those not processing a task at that moment). Note this fragment of runWorker:

if ((runStateAtLeast(ctl.get(), STOP) ||
                     (Thread.interrupted() &&
                      runStateAtLeast(ctl.get(), STOP))) &&
                    !wt.isInterrupted())
                    wt.interrupt();

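This check only cares about STOP and later states. shutdown() merely moves the pool to SHUTDOWN, so workers can keep taking tasks from the queue. Now look at this fragment of addWorker: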

if (rs >= SHUTDOWN &&
                ! (rs == SHUTDOWN &&
                   firstTask == null &&
                   ! workQueue.isEmpty()))
                return false;

Once the pool is in SHUTDOWN, no new worker can be added, unless its first task is null and the queue is not empty. At the end of shutdown() there is also a call to tryTerminate:

final void tryTerminate() {
        for (;;) {
            int c = ctl.get();
            // Nothing to do in these cases: still running, already terminating, or SHUTDOWN with tasks left in the queue
            if (isRunning(c) ||
                runStateAtLeast(c, TIDYING) ||
                (runStateOf(c) == SHUTDOWN && ! workQueue.isEmpty()))
                return;
            // Why ONLY_ONE?
            if (workerCountOf(c) != 0) { // Eligible to terminate
                interruptIdleWorkers(ONLY_ONE);
                return;
            }

            final ReentrantLock mainLock = this.mainLock;
            mainLock.lock();
            try {
                // Move to TIDYING, run the terminated() hook, then move to TERMINATED
                if (ctl.compareAndSet(c, ctlOf(TIDYING, 0))) {
                    try {
                        terminated();
                    } finally {
                        ctl.set(ctlOf(TERMINATED, 0));
                        termination.signalAll();
                    }
                    return;
                }
            } finally {
                mainLock.unlock();
            }
            // else retry on failed CAS
        }
    }

To sum up, shutdown() stops the pool from accepting new tasks, while tasks already in the queue are still processed.
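
That behaviour is easy to confirm from the outside. In this sketch (ShutdownDemo and the task counts are illustrative) one task is running and two are queued when shutdown() is called; a later submission is rejected, yet all three earlier tasks complete.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownDemo {
    // Returns {tasks completed, 1 if the submission after shutdown() was rejected}.
    static int[] run() {
        CountDownLatch release = new CountDownLatch(1);
        AtomicInteger completed = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        Runnable task = () -> {
            try {
                release.await();
                completed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        for (int i = 0; i < 3; i++) {
            pool.execute(task); // one task runs, two wait in the queue
        }
        pool.shutdown();
        int rejected = 0;
        try {
            pool.execute(task); // the pool no longer accepts new tasks
        } catch (RejectedExecutionException e) {
            rejected = 1;
        }
        release.countDown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return new int[] {completed.get(), rejected};
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```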

Next, shutdownNow, the more aggressive version of shutdown:

public List<Runnable> shutdownNow() {
        List<Runnable> tasks;
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            checkShutdownAccess();
            advanceRunState(STOP);
            interruptWorkers();
            tasks = drainQueue();
        } finally {
            mainLock.unlock();
        }
        tryTerminate();
        return tasks;
    }

Compared with shutdown, a few lines differ:

  1. interruptIdleWorkers becomes interruptWorkers
private void interruptWorkers() {
        final ReentrantLock mainLock = this.mainLock;
        mainLock.lock();
        try {
            for (Worker w : workers)
                // No worker lock needed: interrupt every started worker, busy or idle
                w.interruptIfStarted();
        } finally {
            mainLock.unlock();
        }
    }
void interruptIfStarted() {
            Thread t;
            if (getState() >= 0 && (t = thread) != null && !t.isInterrupted()) {
                try {
                    t.interrupt();
                } catch (SecurityException ignore) {
                }
            }
        }
  2. tasks = drainQueue();
// Empties the queue and returns the tasks that were never started
private List<Runnable> drainQueue() {
        BlockingQueue<Runnable> q = workQueue;
        ArrayList<Runnable> taskList = new ArrayList<Runnable>();
        q.drainTo(taskList);
        if (!q.isEmpty()) {
            for (Runnable r : q.toArray(new Runnable[0])) {
                if (q.remove(r))
                    taskList.add(r);
            }
        }
        return taskList;
    }

In summary, shutdownNow is stricter than shutdown in two ways: it interrupts the threads of all workers, not only the idle ones, and it empties the queue.
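
Both differences show up in a short experiment. In this sketch (ShutdownNowDemo is a made-up name) one task is blocked inside the worker and two more are queued when shutdownNow() is called; the method reports how many tasks were handed back and whether the running task saw the interrupt.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class ShutdownNowDemo {
    // Returns {number of never-started tasks handed back, 1 if the running task was interrupted}.
    static int[] run() {
        CountDownLatch started = new CountDownLatch(1);
        CountDownLatch never = new CountDownLatch(1); // never counted down
        AtomicInteger interrupted = new AtomicInteger();
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());
        pool.execute(() -> {
            started.countDown();
            try {
                never.await(); // blocks until shutdownNow interrupts the worker
            } catch (InterruptedException e) {
                interrupted.set(1);
            }
        });
        try {
            started.await();          // make sure the first task is running
            pool.execute(() -> { });  // queued behind the running task
            pool.execute(() -> { });  // queued behind the running task
            List<Runnable> pending = pool.shutdownNow();
            pool.awaitTermination(5, TimeUnit.SECONDS);
            return new int[] {pending.size(), interrupted.get()};
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(run()));
    }
}
```

Note that a running task is only stopped if it responds to interruption; shutdownNow cannot force a task to end.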