A Deep Dive into Java Concurrency: Dissecting Thread.join() via the JDK Source, the HotSpot C++ Source, and a Debug Session

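Basic semantics

If thread A executes thread.join(), it means: the current thread A waits until thread has terminated, and only then returns from thread.join().

Besides join(), Thread also provides two timed variants, join(long millis) and join(long millis, int nanos). With these, if thread has not terminated within the given timeout, the call returns anyway.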
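The timed variant can be observed with a small experiment. The sketch below is not from the original post; the class name JoinTimeoutDemo and the helper stillAliveAfterJoin are made up for illustration. It starts a worker that sleeps for a given time, joins it with a timeout, and reports whether the worker was still alive when join() returned.

```java
import java.util.concurrent.TimeUnit;

// Illustrative sketch; class and method names are hypothetical.
class JoinTimeoutDemo {
    /**
     * Starts a worker that sleeps for workMillis, waits for it with
     * join(timeoutMillis), and reports whether the worker was still
     * alive when join() returned.
     */
    static boolean stillAliveAfterJoin(long workMillis, long timeoutMillis) {
        Thread worker = new Thread(() -> {
            try {
                TimeUnit.MILLISECONDS.sleep(workMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "worker");
        worker.setDaemon(true); // do not keep the JVM alive for the demo
        worker.start();
        try {
            worker.join(timeoutMillis); // returns on termination or on timeout
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        boolean alive = worker.isAlive();
        worker.interrupt(); // let a still-sleeping worker exit promptly
        return alive;
    }

    public static void main(String[] args) {
        // Worker needs 2 s, we wait 100 ms: join() times out.
        System.out.println(stillAliveAfterJoin(2000, 100));
        // Worker needs 10 ms, we wait up to 2 s: join() returns once it has terminated.
        System.out.println(stillAliveAfterJoin(10, 2000));
    }
}
```

Note that join(0) means "wait forever", so a timeout of 0 is not a way to poll.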

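How it works

First, a look at thread states.

Thread states

Over its lifetime a Java thread can be in one of 6 states, and at any given moment it is in exactly one of them. The following is taken from the JDK 1.8 Thread.java source: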

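NEW: initial state; the thread has been constructed but start() has not been called yet.
RUNNABLE: the thread is executing in the JVM. The JVM's RUNNABLE state covers both the ready and the running states of the operating system.
BLOCKED: the thread is blocked waiting for a monitor lock, either to enter a synchronized block or method, or to re-enter one after calling Object.wait().
WAITING: waiting state. The following methods put a thread into it:
- Object.wait with no timeout
- Thread.join with no timeout
- LockSupport.park
TIMED_WAITING: waiting with a timeout; the thread returns to RUNNABLE on its own once the timeout elapses. The following methods put a thread into it:
- Thread.sleep
- Object.wait with timeout
- Thread.join with timeout
- LockSupport.parkNanos
- LockSupport.parkUntil
TERMINATED: the thread has completed execution.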
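Several of these states can be observed directly with Thread.getState(). The following is a minimal sketch, not part of the original post; ThreadStateDemo, reaches and observe are hypothetical names. Because a thread needs a moment to reach a state after start(), the helper polls rather than reading the state once.

```java
// Illustrative sketch; class and method names are hypothetical.
class ThreadStateDemo {
    /** Polls until the thread reaches the wanted state or about 5 s have passed. */
    static boolean reaches(Thread t, Thread.State wanted) {
        long deadline = System.nanoTime() + 5_000_000_000L;
        while (System.nanoTime() < deadline) {
            if (t.getState() == wanted) {
                return true;
            }
            Thread.yield();
        }
        return false;
    }

    /** Returns the sequence of states that were observed. */
    static String observe() {
        final Object lock = new Object();
        Thread waiter = new Thread(() -> {
            synchronized (lock) {
                try {
                    while (true) {
                        lock.wait(); // no timeout: WAITING
                    }
                } catch (InterruptedException e) {
                    // interrupted by the demo: leave the loop and terminate
                }
            }
        }, "waiter");
        Thread sleeper = new Thread(() -> {
            try {
                Thread.sleep(60_000); // with timeout: TIMED_WAITING
            } catch (InterruptedException e) {
                // interrupted by the demo: terminate
            }
        }, "sleeper");
        waiter.setDaemon(true);
        sleeper.setDaemon(true);

        StringBuilder seen = new StringBuilder();
        seen.append(waiter.getState()); // not started yet: NEW
        waiter.start();
        sleeper.start();
        seen.append(reaches(waiter, Thread.State.WAITING) ? " WAITING" : " ?");
        seen.append(reaches(sleeper, Thread.State.TIMED_WAITING) ? " TIMED_WAITING" : " ?");
        waiter.interrupt();
        sleeper.interrupt();
        seen.append(reaches(waiter, Thread.State.TERMINATED) ? " TERMINATED" : " ?");
        return seen.toString();
    }

    public static void main(String[] args) {
        System.out.println(observe());
    }
}
```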

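Next, the monitor.

Monitor

The monitor is Java's main mechanism for mutual exclusion and cooperation between threads; it can be thought of as the lock of an object. Every object has one, and only one, monitor.

In the HotSpot JVM the monitor is implemented by ObjectMonitor. Its main data structure is as follows (from ObjectMonitor.hpp in the HotSpot source, implemented in C++):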

ObjectMonitor() {
    _header       = NULL;
    _count        = 0; // counter
    _waiters      = 0,
    _recursions   = 0;
    _object       = NULL;
    _owner        = NULL;
    _WaitSet      = NULL; // threads in the wait state are added to _WaitSet
    _WaitSetLock  = 0 ;
    _Responsible  = NULL ;
    _succ         = NULL ;
    _cxq          = NULL ;
    FreeNext      = NULL ;
    _EntryList    = NULL ; // threads blocked waiting for the lock are added to this list
    _SpinFreq     = 0 ;
    _SpinClock    = 0 ;
    OwnerIsThread = 0 ;
  }

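Four fields of ObjectMonitor matter most here:

_owner: points to the thread that currently owns the ObjectMonitor
_EntryList: holds the threads in the blocked state
_WaitSet: holds the threads in the waiting state
_count: a counter

When several threads reach a synchronized section at the same time, they first enter the _EntryList. The thread that acquires the object's monitor becomes its owner: the monitor's _owner field is set to that thread and the counter _count is incremented by 1. If the thread calls wait(), it releases the monitor it holds, _owner is reset to NULL, _count is decremented by 1, and the thread enters the _WaitSet to wait to be woken up. When the owning thread finishes the synchronized section, it likewise releases the monitor (the lock) and resets these fields so that other threads can acquire it. (The original post illustrates this flow with a figure.)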
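The entry-list behavior is visible from Java as the BLOCKED state. The sketch below is an illustration added here, not taken from the original post; EntryListDemo and contenderStateWhileLockHeld are hypothetical names. One thread holds a monitor while a second one tries to enter a synchronized block on the same object.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative sketch; class and method names are hypothetical.
class EntryListDemo {
    /** Returns the state observed for a thread that contends for a held monitor. */
    static Thread.State contenderStateWhileLockHeld() {
        final Object lock = new Object();
        final CountDownLatch lockHeld = new CountDownLatch(1);
        final CountDownLatch mayRelease = new CountDownLatch(1);

        Thread owner = new Thread(() -> {
            synchronized (lock) {
                lockHeld.countDown(); // signal that the monitor is now owned
                try {
                    mayRelease.await(); // keep owning it until told otherwise
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }, "owner");
        Thread contender = new Thread(() -> {
            synchronized (lock) {
                // nothing to do; entering the block is the point
            }
        }, "contender");

        try {
            owner.start();
            lockHeld.await();
            contender.start();
            // Poll until the contender is parked on monitor entry (or 5 s pass).
            Thread.State observed = contender.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (observed != Thread.State.BLOCKED && System.nanoTime() < deadline) {
                Thread.yield();
                observed = contender.getState();
            }
            mayRelease.countDown(); // owner leaves the block, contender can enter
            owner.join();
            contender.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(contenderStateWhileLockHeld());
    }
}
```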

Implementation mechanism

A simple example.

import java.util.concurrent.TimeUnit;

public class ThreadA {
    public static void main(String[] args) {
        Runnable r = () -> {
            try {
                TimeUnit.SECONDS.sleep(5);
            } catch (Exception e) {
                e.printStackTrace();
            }
            System.out.println("Child thread finished");
        };
        Thread threadB = new Thread(r, "Son-Thread");
        // start the thread
        threadB.start();
        try {
            // call join(): the main thread waits here until threadB terminates
            threadB.join();
        } catch (InterruptedException e) {
            e.printStackTrace();
        }
        System.out.println("Main thread finished");
        System.out.println("~~~~~~~~~~~~~~~");
    }
}

How does the underlying implementation realize the semantics of join()? Take the example above.

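Because join(long millis) is a synchronized method, it locks on the current Thread instance, which here is threadB. Note that the started threadB does not hold the monitor of its own Thread object while it runs; as the HotSpot source further down shows, it acquires that monitor only briefly on exit, inside ensure_join(). How the native thread is started can be seen in the start0() source.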

    public final void join() throws InterruptedException {
        join(0);
    }
	...
    public final synchronized void join(long millis)
    throws InterruptedException {
	...
        if (millis == 0) {
            while (isAlive()) {
                wait(0);
            }
        }
     ...
    }

Object.java

    /**
     * The current thread must own this object's monitor. Causes the current thread to wait until either another thread invokes the method...
     * This method causes the current thread call it to place itself in the wait set for this object and then to relinquish any and all synchronization claims on this object.
     */
    public final native void wait(long timeout) throws InterruptedException;

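If threadB has already finished before join() is called, threadA acquires the lock, enters the synchronized method join(long millis), and calls threadB's isAlive(), which reports that threadB is no longer alive. join() then returns at once and threadA carries on with its own logic.
If threadB has not finished before join() is called, threadA acquires the lock, enters the synchronized method join(long millis), and isAlive() reports that threadB is still alive. threadA therefore calls the native method wait(), which releases the lock and waits (threadA enters the Wait Set of the monitor belonging to the threadB instance, and threadA's state is now WAITING). Releasing the lock allows threadB to acquire it when it exits, issue the notification, and terminate.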
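The second case can be checked from the outside. In this minimal sketch (added for illustration; JoinWaitingDemo and joinerStateWhileTargetAlive are hypothetical names), a "joiner" thread calls join() on a "target" thread that is kept alive with a latch, and a third thread reads the joiner's state.

```java
import java.util.concurrent.CountDownLatch;

// Illustrative sketch; class and method names are hypothetical.
class JoinWaitingDemo {
    /** Returns the state of a thread that is inside join() on a live target. */
    static Thread.State joinerStateWhileTargetAlive() {
        final CountDownLatch finish = new CountDownLatch(1);
        final Thread target = new Thread(() -> {
            try {
                finish.await(); // stay alive until the demo releases us
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "target");
        Thread joiner = new Thread(() -> {
            try {
                target.join(); // waits on the target's Thread object
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }, "joiner");

        try {
            target.start();
            joiner.start();
            // Poll until the joiner is waiting inside join() (or 5 s pass).
            Thread.State observed = joiner.getState();
            long deadline = System.nanoTime() + 5_000_000_000L;
            while (observed != Thread.State.WAITING && System.nanoTime() < deadline) {
                Thread.yield();
                observed = joiner.getState();
            }
            finish.countDown(); // target terminates, which wakes the joiner
            target.join();
            joiner.join();
            return observed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(joinerStateWhileTargetAlive());
    }
}
```

That the joiner itself terminates afterwards (the final joiner.join() returns) shows that it was woken when the target ended.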

    /**
     * This method is called by the system to give a Thread
     * a chance to clean up before it actually exits.
     */
    private void exit() {
        if (group != null) {
            group.threadTerminated(this);
            group = null;
        }
        /* Aggressively null out all reference fields: see bug 4006245 */
        target = null;
        /* Speed the release of some of these resources */
        threadLocals = null;
        inheritableThreadLocals = null;
        inheritedAccessControlContext = null;
        blocker = null;
        uncaughtExceptionHandler = null;
    }

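When threadB finishes, it runs exit() to clean up some resources. The comment in the source shows that the thread still actually exists at this point. So who wakes up threadA, which is in the WAITING state?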

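Incorrect explanation: many blog posts explain it roughly as follows. When threadB finishes it runs exit(), which calls notifyAll() for the other threads of the same thread group. Because threadA created threadB, the two share one thread group. And since a thread's group contains the thread itself once it has been started, threadB's group also contains threadA. Therefore threadA is notified when threadB ends.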

    public Thread(Runnable target) {
        init(null, target, "Thread-" + nextThreadNum(), 0);
    }
    ...
    private void init(ThreadGroup g, Runnable target, String name,
                      long stackSize, AccessControlContext acc,
                      boolean inheritThreadLocals) {
        ...
        Thread parent = currentThread();                     
        ...
            if (g == null) {
                g = parent.getThreadGroup();
            }
       ...
   }
   ...
    public synchronized void start() {
    	...
        group.add(this);
        ...
    }

ThreadGroup.java

    void threadTerminated(Thread t) {
        synchronized (this) {
            remove(t);
            if (nthreads == 0) {
                notifyAll();
            }
			...
        }
    }

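This explanation is wrong. Why? Because the condition if (nthreads == 0) is not met: threadA and threadB share one thread group, so after threadB has been removed threadA is still in the group and nthreads = 1.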
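The two facts this argument relies on can be checked with the public Thread API. The sketch below is an illustration added here, with hypothetical names (ThreadGroupDemo and its two helpers): a thread created without an explicit group gets its creator's group, and after termination getThreadGroup() returns null, which matches the group = null assignment in exit() shown earlier.

```java
// Illustrative sketch; class and method names are hypothetical.
class ThreadGroupDemo {
    /** A thread created without an explicit group gets its creator's group. */
    static boolean childSharesCreatorsGroup() {
        Thread child = new Thread(() -> { }, "child");
        return child.getThreadGroup() == Thread.currentThread().getThreadGroup();
    }

    /** After the thread has terminated, getThreadGroup() returns null. */
    static ThreadGroup groupAfterTermination() {
        Thread child = new Thread(() -> { }, "child");
        child.start();
        try {
            child.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return child.getThreadGroup();
    }

    public static void main(String[] args) {
        System.out.println(childSharesCreatorsGroup());
        System.out.println(groupAfterTermination());
    }
}
```

Since the creating thread is still running and still a member of that group, the group cannot be empty at the moment the child is removed from it.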

/jdk7/hotspot/src/os/linux/vm/os_linux.cpp

int ret = pthread_create(&tid, &attr, (void* (*)(void*)) java_start, thread);

static void *java_start(Thread *thread) {
  ...
  thread->run();
  return 0;
}

/jdk7/hotspot/src/share/vm/runtime/thread.cpp

void JavaThread::run() {
  ...
  thread_main_inner();
}

void JavaThread::thread_main_inner() {
  ...
  this->exit(false);
  delete this;
}

void JavaThread::exit(bool destroy_vm, ExitType exit_type) {
  ...
  // Notify waiters on thread object. This has to be done after exit() is called
  // on the thread (if the thread is the last thread in a daemon ThreadGroup the
  // group should have the destroyed bit set before waiters are notified).
  ensure_join(this);
  ...
}

static void ensure_join(JavaThread* thread) {
  // We do not need to grap the Threads_lock, since we are operating on ourself.
  Handle threadObj(thread, thread->threadObj());
  assert(threadObj.not_null(), "java thread object must exist");
  ObjectLocker lock(threadObj, thread);
  // Ignore pending exception (ThreadDeath), since we are exiting anyway
  thread->clear_pending_exception();
  // Thread is exiting. So set thread_status field in  java.lang.Thread class to TERMINATED.
  java_lang_Thread::set_thread_status(threadObj(), java_lang_Thread::TERMINATED);
  // Clear the native thread instance - this makes isAlive return false and allows the join()
  // to complete once we've done the notify_all below
  java_lang_Thread::set_thread(threadObj(), NULL);
  lock.notify_all(thread);
  // Ignore pending exception (ThreadDeath), since we are exiting anyway
  thread->clear_pending_exception();
}

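Correct explanation: at the end of the thread's native run() method, the native code makes the thread's isAlive() return false and calls notifyAll for all other threads waiting on this thread instance. In the C++ source above, it is the call lock.notify_all(thread) that notifies every thread waiting on the current thread's instance.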

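Besides reading the C++ source, we also wrote a demo to verify this: only after waitThread has finished are the other threads that called wait() on the waitThread instance woken up to continue.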

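// The original post omits the enclosing class declaration; the name WaitOnThreadDemo is illustrative.
public class WaitOnThreadDemo {
    /**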
     * Wait Thread wait thread.
     * Run Thread1 run thread outer.
     * Run Thread2 run thread outer.
     * Run Thread1before wait run thread inner.
     * Run Thread2before wait run thread inner.
     * exit: Wait Thread wait thread.
     * Run Thread2after wait run thread inner.
     * Run Thread1after wait run thread inner.
     */
    public static void main(String[] args) throws Exception {
        WaitThread waitRunner = new WaitThread();
        Thread waitThread = new Thread(waitRunner, "Wait Thread");

        waitThread.start();

        RunThread runRunner1 = new RunThread(waitThread);
        RunThread runRunner2 = new RunThread(waitThread);

        Thread runThread1 = new Thread(runRunner1, "Run Thread1");
        Thread runThread2 = new Thread(runRunner2, "Run Thread2");

        runThread1.start();
        runThread2.start();
    }

    static class WaitThread implements Runnable {
        @Override
        public void run() {
            long t1 = System.currentTimeMillis();
            System.out.println(Thread.currentThread().getName() + " wait thread.");
            while (true) {
                long t2 = System.currentTimeMillis();
                if (t2 - t1 > 10 * 1000) {
                    break;
                }
            }
            System.out.println("exit: " + Thread.currentThread().getName() + " wait thread.");
        }
    }

    static class RunThread implements Runnable {
        private final Thread thread;

        public RunThread(Thread thread) {
            this.thread = thread;
        }

        @Override
        public void run() {
            System.out.println(Thread.currentThread().getName() + " run thread outer.");
            synchronized (thread) {
                System.out.println(Thread.currentThread().getName() + "before wait run thread inner.");
                try {
                    thread.wait(0);
                } catch (InterruptedException e) {
                    e.printStackTrace();
                }
                System.out.println(Thread.currentThread().getName() + "after wait run thread inner.");
            }
        }
    }
}

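So when threadB ends, threadA is notified: it moves from the Wait Set of threadB's monitor to that monitor's Entry Set, its state becomes BLOCKED, and it waits to be scheduled and to acquire the monitor.
Once threadA holds the monitor again, it continues the while (isAlive()) loop. isAlive() is now false, so join() completes and threadA carries on with its own logic.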

With this design, Thread.join() implements the intended behavior: the current thread A returns from thread.join() only after thread has terminated.

Debug analysis

Let us step through the simple example above in the debugger, point by point:

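While the main thread is executing inside join(), the debugger shows its state as RUNNING.

When the child thread reaches exit(), threadB is still alive and its state is RUNNING.

threadA's state is WAIT.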

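When threadB reaches threadTerminated(), nthreads is 1, so notifyAll() is not executed at all. Even if notifyAll() were executed, it would not wake threadA, because the two objects are different: threadA waits on the threadB instance, while this notifyAll() is called on the thread group instance.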

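The classic wait/notify paradigm

Thread.join() follows exactly the waiting side of the classic wait/notify paradigm, while the thread's exit path (the native JavaThread::exit(), through ensure_join()) resembles the notifying side.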

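The classic wait/notify paradigm has two parts: the waiting side and the notifying side. The waiting side follows these rules:

Acquire the object's lock.
If the condition is not met, call the object's wait() method, and check the condition again after being notified.
When the condition is met, run the corresponding logic.

The corresponding pseudocode: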

synchronized (object) {
	while (condition is not met) {
		object.wait();
	}
	// corresponding handling logic
}

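The notifying side follows these rules:

Acquire the object's lock.
Change the condition.
Notify all threads waiting on the object.

The corresponding pseudocode: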

synchronized (object) {
	// change the condition
	object.notifyAll();
}
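Both halves of the paradigm can be put together in runnable Java. The following is a minimal sketch under assumptions of my own: DoneFlag and WaitNotifyParadigmDemo are hypothetical names, and the condition is a plain boolean guarded by the object's monitor. It mirrors the shape of join(), with awaitDone() playing the waiting side and markDone() the notifying side.

```java
// Illustrative sketch; class and method names are hypothetical.
class DoneFlag {
    private boolean done; // the condition, guarded by this object's monitor

    /** Waiting side: acquire the lock, re-check the condition in a loop, wait. */
    synchronized void awaitDone() throws InterruptedException {
        while (!done) {
            wait(); // releases the monitor while waiting
        }
    }

    /** Notifying side: acquire the lock, change the condition, notify everyone. */
    synchronized void markDone() {
        done = true;
        notifyAll();
    }
}

class WaitNotifyParadigmDemo {
    /** Returns a log of the order in which the two sides made progress. */
    static String run() {
        final DoneFlag flag = new DoneFlag();
        final StringBuffer log = new StringBuffer();
        Thread worker = new Thread(() -> {
            log.append("worker-done;");
            flag.markDone();
        }, "worker");
        worker.start();
        try {
            flag.awaitDone(); // returns only after markDone() has run
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        log.append("waiter-resumed");
        return log.toString();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints "worker-done;waiter-resumed"
    }
}
```

The while loop matters in two ways: if markDone() runs before awaitDone(), the waiter never calls wait() at all, and a wake-up that arrives while the condition is still false simply leads to another wait().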
