Inter-thread communication in Java: its core ideas and principles in one article

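1 Preface

Whether you have just started with Java or have been writing CRUD code in it for years, you surely know that threads are one of Java's most important concepts: through the Thread class or interfaces such as Runnable, Java lets you use operating-system-level thread resources. You also know that threads are independent: at run time each thread has its own virtual machine stack and native method stack, and they do not affect one another. That is one of the reasons we so often "spin up a separate thread" to run a function asynchronously.

But what if one thread needs to "communicate" with another? How is that done?

Some of you may say: our project has no need for inter-thread communication, and if it ever did, I would just bring in middleware such as Redis or Kafka, publish a message, listen for it on the other side, and communicate however I like. Fair enough, that does work, but it is rather like using an anti-aircraft gun to swat a mosquito: first, the dependency is too heavy; second, it is not concise.

Let me put it another way. Suppose you suddenly decide to change jobs, and the interviewer turns out to be a stickler for textbook questions who insists on asking: how is inter-thread communication implemented, and what is the principle behind it? Without a decent stock of knowledge, you may well be sent home to "wait for our call". Of course, we study and understand this material not just to floor the interviewer, but above all to enrich our own knowledge: to have more options at work, to solve problems quickly and precisely when they come up, and, when writing code or guiding others, to spot the risks accurately and work out how to avoid them in the business context.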

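2 So what exactly does inter-thread communication mean?

Inter-thread communication is not the same as the communication we talk about in business application development.

When we do CRUD work, communication usually means a Spring Boot project talking to a database or to middleware such as Nacos, or RPC between services. It mainly involves transferring concrete data and usually means a request over some layer of network protocol.

Inter-thread communication is more like passing a signal, with no concrete data content involved. Say two threads are minding their own business, and ThreadA wants to tell ThreadB something (for example, that a business condition has been met): it simply gives ThreadB a tap. But how does ThreadB know what the tap means? Resources are limited and the tap cannot carry any data, so it all depends on what the two agreed beforehand. In other words, the programmer has to settle this when writing the code: inter-thread communication is just "a tap", and its concrete meaning comes from the business scenario and the code itself, so that the tapped thread, on receiving the tap, runs the business logic you intend.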

image.png

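3 What forms of inter-thread communication are there, and how does the notification work?

Anyone who has studied operating systems and the JVM knows that operating-system threads cannot communicate with each other directly.

In the JVM, the run-time data areas fall into two categories and five regions, as shown in the figure below: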

image.png

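In the figure, the two regions with an orange background are shared by all threads, and the three regions with a purple background are private to each thread.

Every running thread has its own virtual machine stack and native method stack, and these never interfere with one another. To "interfere" on purpose, threads have to go through one of the two shared regions: the metaspace or the heap. Communicating through a shared piece of memory is an idea that appears in many places. Multi-core CPUs, for example, keep the data in their caches consistent through a cache-coherence protocol such as MESI or, more crudely, by locking the bus.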
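To make the "shared region" idea concrete, here is a minimal sketch in which one thread hands a value to another through an object on the heap. The class SharedHeapDemo and the method passValue are hypothetical names made up for this illustration; they do not come from any library.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical demo class; the names are illustrative only.
class SharedHeapDemo {

    // Hands a value from a writer thread to the calling thread through a heap object.
    static int passValue(int value) {
        // The AtomicInteger lives on the heap, so both threads can reach it.
        AtomicInteger shared = new AtomicInteger(0);

        // The writer's call frames live on its own private stack;
        // only the shared object is reachable from the other thread.
        Thread writer = new Thread(() -> shared.set(value));
        writer.start();
        try {
            writer.join(); // once join() returns, the writer's update is visible here
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for the writer", e);
        }
        return shared.get();
    }

    public static void main(String[] args) {
        System.out.println(passValue(42)); // prints 42
    }
}
```

Neither thread ever touches the other's stack; everything they exchange goes through the shared object.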

image.png

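That was a bit of a digression; back to the topic.

3.1 Inter-thread communication in Java

In Java there are three main ways for threads to communicate:

  • Communication based on wait/notify
  • Communication based on the thread interrupt mechanism
  • Communication based on other blocking/wake-up mechanisms

In fact, "communication based on wait/notify" and "communication based on other blocking/wake-up mechanisms" are essentially the same thing; they are listed separately because they differ at the coding level, that is, in the APIs used.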

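3.1.1 Communication based on wait/notify

First, anyone who has learned Java should know that wait and notify are methods of the Object class; in other words, every object in Java has them. Let's look at the source comments for wait and notify: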

/**
 * Wakes up a single thread that is waiting on this object's
 * monitor. If any threads are waiting on this object, one of them
 * is chosen to be awakened. The choice is arbitrary and occurs at
 * the discretion of the implementation. A thread waits on an object's
 * monitor by calling one of the {@code wait} methods.
 * <p>
 * The awakened thread will not be able to proceed until the current
 * thread relinquishes the lock on this object. The awakened thread will
 * compete in the usual manner with any other threads that might be
 * actively competing to synchronize on this object; for example, the
 * awakened thread enjoys no reliable privilege or disadvantage in being
 * the next thread to lock this object.
 * <p>
 * This method should only be called by a thread that is the owner
 * of this object's monitor. A thread becomes the owner of the
 * object's monitor in one of three ways:
 * <ul>
 * <li>By executing a synchronized instance method of that object.
 * <li>By executing the body of a {@code synchronized} statement
 *     that synchronizes on the object.
 * <li>For objects of type {@code Class,} by executing a
 *     synchronized static method of that class.
 * </ul>
 * <p>
 * Only one thread at a time can own an object's monitor.
 *
 * @throws  IllegalMonitorStateException  if the current thread is not
 *               the owner of this object's monitor.
 * @see        java.lang.Object#notifyAll()
 * @see        java.lang.Object#wait()
 */
public final native void notify();

/**
 * Wakes up all threads that are waiting on this object's monitor. A
 * thread waits on an object's monitor by calling one of the
 * {@code wait} methods.
 * <p>
 * The awakened threads will not be able to proceed until the current
 * thread relinquishes the lock on this object. The awakened threads
 * will compete in the usual manner with any other threads that might
 * be actively competing to synchronize on this object; for example,
 * the awakened threads enjoy no reliable privilege or disadvantage in
 * being the next thread to lock this object.
 * <p>
 * This method should only be called by a thread that is the owner
 * of this object's monitor. See the {@code notify} method for a
 * description of the ways in which a thread can become the owner of
 * a monitor.
 *
 * @throws  IllegalMonitorStateException  if the current thread is not
 *               the owner of this object's monitor.
 * @see        java.lang.Object#notify()
 * @see        java.lang.Object#wait()
 */
public final native void notifyAll();

/**
 * Causes the current thread to wait until either another thread invokes the
 * {@link java.lang.Object#notify()} method or the
 * {@link java.lang.Object#notifyAll()} method for this object, or a
 * specified amount of time has elapsed.
 * <p>
 * The current thread must own this object's monitor.
 * <p>
 * This method causes the current thread (call it <var>T</var>) to
 * place itself in the wait set for this object and then to relinquish
 * any and all synchronization claims on this object. Thread <var>T</var>
 * becomes disabled for thread scheduling purposes and lies dormant
 * until one of four things happens:
 * <ul>
 * <li>Some other thread invokes the {@code notify} method for this
 * object and thread <var>T</var> happens to be arbitrarily chosen as
 * the thread to be awakened.
 * <li>Some other thread invokes the {@code notifyAll} method for this
 * object.
 * <li>Some other thread {@linkplain Thread#interrupt() interrupts}
 * thread <var>T</var>.
 * <li>The specified amount of real time has elapsed, more or less.  If
 * {@code timeout} is zero, however, then real time is not taken into
 * consideration and the thread simply waits until notified.
 * </ul>
 * The thread <var>T</var> is then removed from the wait set for this
 * object and re-enabled for thread scheduling. It then competes in the
 * usual manner with other threads for the right to synchronize on the
 * object; once it has gained control of the object, all its
 * synchronization claims on the object are restored to the status quo
 * ante - that is, to the situation as of the time that the {@code wait}
 * method was invoked. Thread <var>T</var> then returns from the
 * invocation of the {@code wait} method. Thus, on return from the
 * {@code wait} method, the synchronization state of the object and of
 * thread {@code T} is exactly as it was when the {@code wait} method
 * was invoked.
 * <p>
 * A thread can also wake up without being notified, interrupted, or
 * timing out, a so-called <i>spurious wakeup</i>.  While this will rarely
 * occur in practice, applications must guard against it by testing for
 * the condition that should have caused the thread to be awakened, and
 * continuing to wait if the condition is not satisfied.  In other words,
 * waits should always occur in loops, like this one:
 * <pre>
 *     synchronized (obj) {
 *         while (&lt;condition does not hold&gt;)
 *             obj.wait(timeout);
 *         ... // Perform action appropriate to condition
 *     }
 * </pre>
 * (For more information on this topic, see Section 3.2.3 in Doug Lea's
 * "Concurrent Programming in Java (Second Edition)" (Addison-Wesley,
 * 2000), or Item 50 in Joshua Bloch's "Effective Java Programming
 * Language Guide" (Addison-Wesley, 2001).
 *
 * <p>If the current thread is {@linkplain java.lang.Thread#interrupt()
 * interrupted} by any thread before or while it is waiting, then an
 * {@code InterruptedException} is thrown.  This exception is not
 * thrown until the lock status of this object has been restored as
 * described above.
 *
 * <p>
 * Note that the {@code wait} method, as it places the current thread
 * into the wait set for this object, unlocks only this object; any
 * other objects on which the current thread may be synchronized remain
 * locked while the thread waits.
 * <p>
 * This method should only be called by a thread that is the owner
 * of this object's monitor. See the {@code notify} method for a
 * description of the ways in which a thread can become the owner of
 * a monitor.
 *
 * @param      timeout   the maximum time to wait in milliseconds.
 * @throws  IllegalArgumentException      if the value of timeout is
 *               negative.
 * @throws  IllegalMonitorStateException  if the current thread is not
 *               the owner of the object's monitor.
 * @throws  InterruptedException if any thread interrupted the
 *             current thread before or while the current thread
 *             was waiting for a notification.  The <i>interrupted
 *             status</i> of the current thread is cleared when
 *             this exception is thrown.
 * @see        java.lang.Object#notify()
 * @see        java.lang.Object#notifyAll()
 */
public final native void wait(long timeout) throws InterruptedException;

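Because these are all native methods, we cannot trace directly into the underlying implementation. Fortunately the original comments are there, and from them we can learn the recommended usage and characteristics of these methods. In summary:

  1. wait must be called while holding the object's monitor, that is, inside a synchronized method or block on that object, and should sit in a loop that re-checks the condition; it can throw InterruptedException
  2. notify and notifyAll wake up one or all of the threads waiting on the monitor that the synchronized block locks on, and they too must be called by the thread that owns that monitor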
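The ownership rule in the two points above is easy to check for yourself. The following is a small sketch, with a hypothetical class name MonitorOwnershipDemo, showing that notify fails with IllegalMonitorStateException when the caller does not hold the monitor and goes through when it does.

```java
// Hypothetical demo class; the names are illustrative only.
class MonitorOwnershipDemo {

    // Returns true if calling notify() without owning the monitor is rejected.
    static boolean notifyWithoutLockFails() {
        Object monitor = new Object();
        try {
            monitor.notify(); // not inside synchronized (monitor)
            return false;
        } catch (IllegalMonitorStateException e) {
            return true;
        }
    }

    // Returns true if calling notify() while owning the monitor is accepted.
    static boolean notifyWithLockSucceeds() {
        Object monitor = new Object();
        synchronized (monitor) {
            monitor.notify(); // legal; there are no waiters, so nothing is woken
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println(notifyWithoutLockFails()); // prints true
        System.out.println(notifyWithLockSucceeds()); // prints true
    }
}
```

The same exception is thrown by wait and notifyAll under the same circumstances, as the @throws clauses quoted above state.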

Here is a code example:

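public class WaitNotifyTest {

    private final Object monitor;

    public WaitNotifyTest(Object monitor) {
        this.monitor = monitor;
    }

    /**
     * Simulates business function 1.
     */
    public void doBusiness1() {
        // do your business

        // Lock on monitor; every thread calling doBusiness1 competes for this lock
        synchronized (this.monitor) {
            // do your business

            // Wait while the condition holds. The loop re-checks the condition
            // after every wake-up, which guards against spurious wakeups.
            while (condition1()) {
                try {
                    // The calling thread blocks inside wait(), releasing the
                    // monitor, until another thread wakes it up
                    this.monitor.wait();
                } catch (InterruptedException e) {
                    // Interrupted while waiting: restore the flag and give up;
                    // handle as your business requires
                    Thread.currentThread().interrupt();
                    return;
                }
            }

            // If execution gets here and the condition holds, wake up all waiting threads
            if (condition2()) {
                this.monitor.notifyAll();
            }
        }
    }

    /**
     * A condition check. Placeholder only: real logic must eventually
     * return false, otherwise waiting threads never leave the loop.
     */
    public boolean condition1() {
        // evaluate the condition
        return true;
    }

    /**
     * Another condition check.
     */
    public boolean condition2() {
        // evaluate the condition
        return true;
    }
}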

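When several threads run doBusiness1 at the same time, the system state may differ each time because of the business logic: data may not be ready yet, a request to another system may still be pending, and so on. A thread that finds the condition unmet has to wait, leaving other threads to run the logic; once the required state is reached, they notify the waiting threads to continue, just as the code above sketches.

All of them work against the same monitor, that is, the same object lock, because only something in a shared region such as the heap or the metaspace can be used to reach other threads.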
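The skeleton above leaves the conditions abstract. As a runnable illustration of the same pattern, here is a minimal sketch of a one-shot "gate": one thread waits until a flag is set, another sets it and notifies. The class ReadyGate and its methods are hypothetical names chosen for this example.

```java
// Hypothetical one-shot gate built on wait/notifyAll; the names are illustrative only.
class ReadyGate {
    private final Object monitor = new Object();
    private boolean ready = false; // guarded by monitor

    // Blocks the caller until markReady() has been called.
    void awaitReady() throws InterruptedException {
        synchronized (monitor) {
            while (!ready) {       // loop guards against spurious wakeups
                monitor.wait();    // releases the monitor while waiting
            }
        }
    }

    // Sets the flag and wakes every waiting thread.
    void markReady() {
        synchronized (monitor) {
            ready = true;
            monitor.notifyAll();
        }
    }

    // Starts a waiter, opens the gate, and reports whether the waiter got through.
    static boolean demo() {
        ReadyGate gate = new ReadyGate();
        boolean[] passed = {false};
        Thread waiter = new Thread(() -> {
            try {
                gate.awaitReady();
                passed[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        gate.markReady();
        try {
            waiter.join(); // join() makes the waiter's write to passed[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

Because the flag is checked under the lock, the sketch works whichever thread gets there first: if markReady runs before the waiter starts waiting, the waiter sees ready already set and never calls wait at all.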

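3.1.2 A brief look at how wait/notify works

Why does writing it this way achieve thread communication? Why must it be done inside a synchronized block? And why are wait and notify defined on Object, so that every object has them?

Three questions in a row. The story starts with synchronized, or, going back further, with how object locks work in the JVM.

In Java, as we all know, synchronized controls which thread may execute a piece of code within a single process: whichever thread gets in first obtains the right to execute, and all the others are blocked. We can apply synchronized to a class, a method, or an ordinary object, but in essence it always locks on an object.

In the JVM, every object, as opposed to a value of a primitive type, has an object header, and the header contains a field called the mark word, as shown in the figure below: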

image.png

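Coming back to the example in this section: the field monitor serves as the lock object. When a thread acquires the lock, information such as the ID of the owning thread is recorded. When the owning thread calls wait, it gives up the right to execute and, on the JVM side, is placed into the wait set until it is woken. When another thread calls notify/notifyAll, threads in the wait set are woken and carry on with the rest of their logic. As shown in the figure below: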

image.png

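Note that after notify/notifyAll is called, the woken threads do not resume their business method right away. They first re-enter the queue to compete again for the synchronized monitor lock, and then run one at a time. Think about it: the three threads in the figure above were executing one after another under synchronized. If, after all of them had called wait, a single notify let them all run at the same time, chaos would follow.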
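The point that a woken thread must win the monitor back before it continues can be observed directly. Below is a sketch under stated assumptions: ReacquireDemo is a hypothetical class, and the CountDownLatch is used only to make sure the waiter has entered the synchronized block before the notifier tries to.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.CountDownLatch;

// Hypothetical demo class; the names are illustrative only.
class ReacquireDemo {

    // Returns the order in which the notifier and the waiter logged their steps.
    static List<String> run() {
        Object monitor = new Object();
        boolean[] signalled = {false}; // guarded by monitor
        List<String> log = new CopyOnWriteArrayList<>();
        CountDownLatch waiterHoldsMonitor = new CountDownLatch(1);

        Thread waiter = new Thread(() -> {
            synchronized (monitor) {
                // The monitor is only released inside wait(), so the notifier
                // cannot enter its synchronized block before the waiter waits.
                waiterHoldsMonitor.countDown();
                try {
                    while (!signalled[0]) {
                        monitor.wait();
                    }
                    log.add("waiter resumed");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        waiter.start();

        try {
            waiterHoldsMonitor.await();
            synchronized (monitor) {
                signalled[0] = true;
                monitor.notifyAll();
                // The waiter has been woken but cannot return from wait()
                // until this block exits and releases the monitor.
                log.add("notifier still holds the monitor");
            }
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return log;
    }

    public static void main(String[] args) {
        System.out.println(run());
        // prints [notifier still holds the monitor, waiter resumed]
    }
}
```

The notifier's line always comes first, even though notifyAll was called before it was logged: the woken thread can only proceed once the notifier leaves its synchronized block.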

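3.2 Communication based on the thread interrupt mechanism

If you were paying attention, you noticed that the native wait method above declares an InterruptedException, and that this exception must be handled wherever wait is called. So what exactly is this interrupt?

3.2.1 How interrupt-based communication works

It is a mechanism of JVM threads. To trigger it you call the interrupt method of the Thread class. Let's look at the source first: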

/**
 * Interrupts this thread.
 *
 * <p> Unless the current thread is interrupting itself, which is
 * always permitted, the {@link #checkAccess() checkAccess} method
 * of this thread is invoked, which may cause a {@link
 * SecurityException} to be thrown.
 *
 * <p> If this thread is blocked in an invocation of the {@link
 * Object#wait() wait()}, {@link Object#wait(long) wait(long)}, or {@link
 * Object#wait(long, int) wait(long, int)} methods of the {@link Object}
 * class, or of the {@link #join()}, {@link #join(long)}, {@link
 * #join(long, int)}, {@link #sleep(long)}, or {@link #sleep(long, int)},
 * methods of this class, then its interrupt status will be cleared and it
 * will receive an {@link InterruptedException}.
 *
 * <p> If this thread is blocked in an I/O operation upon an {@link
 * java.nio.channels.InterruptibleChannel InterruptibleChannel}
 * then the channel will be closed, the thread's interrupt
 * status will be set, and the thread will receive a {@link
 * java.nio.channels.ClosedByInterruptException}.
 *
 * <p> If this thread is blocked in a {@link java.nio.channels.Selector}
 * then the thread's interrupt status will be set and it will return
 * immediately from the selection operation, possibly with a non-zero
 * value, just as if the selector's {@link
 * java.nio.channels.Selector#wakeup wakeup} method were invoked.
 *
 * <p> If none of the previous conditions hold then this thread's interrupt
 * status will be set. </p>
 *
 * <p> Interrupting a thread that is not alive need not have any effect.
 *
 * @throws  SecurityException
 *          if the current thread cannot modify this thread
 *
 * @revised 6.0
 * @spec JSR-51
 */
public void interrupt() {
    if (this != Thread.currentThread())
        checkAccess();

    synchronized (blockerLock) {
        Interruptible b = blocker;
        if (b != null) {
            interrupt0();           // Just to set the interrupt flag
            b.interrupt(this);
            return;
        }
    }
    interrupt0();
}

/**
 * Tests whether the current thread has been interrupted.  The
 * <i>interrupted status</i> of the thread is cleared by this method.  In
 * other words, if this method were to be called twice in succession, the
 * second call would return false (unless the current thread were
 * interrupted again, after the first call had cleared its interrupted
 * status and before the second call had examined it).
 *
 * <p>A thread interruption ignored because a thread was not alive
 * at the time of the interrupt will be reflected by this method
 * returning false.
 *
 * @return  <code>true</code> if the current thread has been interrupted;
 *          <code>false</code> otherwise.
 * @see #isInterrupted()
 * @revised 6.0
 */
public static boolean interrupted() {
    return currentThread().isInterrupted(true);
}

Leaving aside the I/O and Channel cases, the comment tells us that when interrupt is called on a thread that is currently blocked in one of the following methods:

  • Object#wait()
  • Object#wait(long)
  • Object#wait(long, int)
  • Thread#join()
  • Thread#join(long)
  • Thread#join(long, int)
  • Thread#sleep(long)
  • Thread#sleep(long, int)

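it receives an InterruptedException and its interrupt status is cleared (interrupt status is not discussed further in this article).

If you look at these methods closely, you will see that each of them declares InterruptedException, which means you have to handle it whenever you call them explicitly. This is precisely because, as the interrupt Javadoc explains, interrupt delivers a notification that triggers the InterruptedException in these methods.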
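The behavior described in the Javadoc can be seen with a few lines of code. This is a minimal sketch with a hypothetical class SleepInterruptDemo: a thread blocked in sleep is interrupted, receives the exception, and finds its interrupt status already cleared.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical demo class; the names are illustrative only.
class SleepInterruptDemo {

    // Interrupts a sleeping thread and reports what that thread observed.
    static String run() {
        AtomicReference<String> outcome = new AtomicReference<>("not interrupted");
        CountDownLatch started = new CountDownLatch(1);

        Thread sleeper = new Thread(() -> {
            started.countDown();
            try {
                Thread.sleep(60_000); // far longer than the demo needs
            } catch (InterruptedException e) {
                // The interrupt status was cleared when the exception was thrown.
                outcome.set("interrupted, flag=" + Thread.currentThread().isInterrupted());
            }
        });
        sleeper.start();

        try {
            started.await();
            sleeper.interrupt(); // the "tap" that ends the sleep early
            sleeper.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return outcome.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints interrupted, flag=false
    }
}
```

If the interrupt happens to arrive just before the thread enters sleep, the status is set first and sleep throws as soon as it is called, so the outcome is the same.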

3.2.2 How to use interrupt for communication

It is simple too. You can refer to the wait/notify code example above; here is an extended illustration:

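import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class InterruptTest {

    private final Map<String, Thread> threadPool;

    public InterruptTest() {
        this.threadPool = new ConcurrentHashMap<>(16);
    }

    /**
     * Business logic 1.
     * @param key key under which the calling thread is registered
     */
    public void doBusiness1(String key) {
        // First register the current thread in the shared container
        this.threadPool.put(key, Thread.currentThread());

        // do your business
        try {
            Thread.sleep(1000);
        } catch (InterruptedException e) {
            // Handle the interrupt, i.e. respond to the notification.
            // Remember: this is not an error; the exception is only the vehicle
            // for the notification, so run whatever business logic you need here
        }

        // Finally, deregister
        this.threadPool.remove(key);
    }

    /**
     * Interrupt logic.
     * @param keys keys of the threads to notify
     */
    public void interrupt(String... keys) {
        // Send an interrupt request to each key that meets the condition
        for (String key : keys) {
            if (condition1()) {
                this.threadPool.computeIfPresent(key, (k, v) -> { v.interrupt(); return v; });
            }
        }
    }

    /**
     * A condition check.
     */
    public boolean condition1() {
        // evaluate the condition
        return true;
    }
}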

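Remember! InterruptedException is not merely an ordinary exception; it is a way to receive a notification. In your business logic you can, by prior agreement, send notifications through the interrupt mechanism and receive them at calls such as wait(), join() and sleep(), running your own logic there, much like an instrumentation hook.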
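A thread that is not blocked in any of those methods can still receive the notification: as the interrupt Javadoc quoted earlier says, in that case the interrupt status is simply set, and the thread can poll it. Here is a sketch of that variant; PollingWorkerDemo is a hypothetical class and Thread.yield() merely stands in for a unit of non-blocking work.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; the names are illustrative only.
class PollingWorkerDemo {

    // Returns true if the worker noticed the interrupt by polling its status.
    static boolean run() {
        CountDownLatch started = new CountDownLatch(1);
        AtomicBoolean sawInterrupt = new AtomicBoolean(false);

        Thread worker = new Thread(() -> {
            started.countDown();
            // No blocking call here, so no InterruptedException is thrown;
            // the worker checks the interrupt status between units of work.
            while (!Thread.currentThread().isInterrupted()) {
                Thread.yield(); // stand-in for one unit of work
            }
            sawInterrupt.set(true);
        });
        worker.start();

        try {
            started.await();
            worker.interrupt(); // only sets the status; the worker decides how to react
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```

isInterrupted() is used rather than the static Thread.interrupted() because, per the Javadoc above, the latter clears the status as a side effect.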

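3.3 Communication based on other blocking/wake-up mechanisms

There are many other options. The most typical example is the reentrant lock in JUC, ReentrantLock, used together with Condition. The code has the same shape as the wait/notify example written with synchronized earlier; only the API changes.

In fact, the Lock APIs in JUC are modeled on the JVM's synchronized and wait/notify. Both boil down to two queues; in the Lock implementation these are a synchronization queue (a doubly linked list) and a wait queue (a singly linked list). So why can a Lock be used for communication? Because of Condition's signal/signalAll mechanism, which mirrors Object's notify/notifyAll; the principle is similar and is not repeated here.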
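For comparison with the synchronized version, here is a minimal sketch of a one-shot gate written with ReentrantLock and Condition. The class ConditionGate and its methods are hypothetical names for this illustration; lock()/unlock() play the role of the synchronized block, await() the role of wait(), and signalAll() the role of notifyAll().

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical one-shot gate built on Lock/Condition; the names are illustrative only.
class ConditionGate {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition opened = lock.newCondition();
    private boolean isOpen = false; // guarded by lock

    // Blocks the caller until open() has been called.
    void awaitOpen() throws InterruptedException {
        lock.lock();
        try {
            while (!isOpen) {    // loop guards against spurious wakeups
                opened.await();  // releases the lock while waiting
            }
        } finally {
            lock.unlock();
        }
    }

    // Opens the gate and wakes every waiting thread.
    void open() {
        lock.lock();
        try {
            isOpen = true;
            opened.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Starts a waiter, opens the gate, and reports whether the waiter got through.
    static boolean demo() {
        ConditionGate gate = new ConditionGate();
        boolean[] passed = {false};
        Thread waiter = new Thread(() -> {
            try {
                gate.awaitOpen();
                passed[0] = true;
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        waiter.start();
        gate.open();
        try {
            waiter.join(); // join() makes the waiter's write to passed[0] visible
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed[0];
    }

    public static void main(String[] args) {
        System.out.println(demo()); // prints true
    }
}
```

One practical difference from synchronized is that a single Lock can hand out several Condition objects, so different groups of waiters can be signalled separately.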

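4 Examples of inter-thread communication in practice

So now I know the mechanisms and principles of inter-thread communication. What is it good for? What does it have to do with my CRUD and API development? Nothing at all, it seems! Apart from flooring an interviewer, it appears to have no use.

True, you rarely come across it in ordinary business development, and it is hard to put to use in a project. But if you have worked on middleware or low-level tooling, or have read some low-level source code, you will easily spot it at work. Here are two common examples, taken from nowhere else but the JDK source: two JUC tools, the thread pool that everyone uses and the blocking queue that everyone uses as well.

4.1 Inter-thread communication in the thread pool

The thread pool uses inter-thread communication in quite a few places; here we pick one core method to illustrate: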

final void runWorker(Worker w) {
    Thread wt = Thread.currentThread();
    Runnable task = w.firstTask;
    w.firstTask = null;
    w.unlock(); // allow interrupts
    boolean completedAbruptly = true;
    try {
        while (task != null || (task = getTask()) != null) {
            w.lock();
            // If pool is stopping, ensure thread is interrupted;
            // if not, ensure thread is not interrupted.  This
            // requires a recheck in second case to deal with
            // shutdownNow race while clearing interrupt
            if ((runStateAtLeast(ctl.get(), STOP) ||
                 (Thread.interrupted() &&
                  runStateAtLeast(ctl.get(), STOP))) &&
                !wt.isInterrupted())
                wt.interrupt();
            try {
                beforeExecute(wt, task);
                Throwable thrown = null;
                try {
                    task.run();
                } catch (RuntimeException x) {
                    thrown = x; throw x;
                } catch (Error x) {
                    thrown = x; throw x;
                } catch (Throwable x) {
                    thrown = x; throw new Error(x);
                } finally {
                    afterExecute(task, thrown);
                }
            } finally {
                task = null;
                w.completedTasks++;
                w.unlock();
            }
        }
        completedAbruptly = false;
    } finally {
        processWorkerExit(w, completedAbruptly);
    }
}

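This method is the core loop that a Worker thread in the thread pool runs to execute tasks. Two communication mechanisms can be seen in it:

  • The lock (w.lock() / w.unlock())
  • Interrupt

The lock coordinates the worker with other threads: a worker holds its own lock for as long as a task is running, so other threads can tell a busy worker from an idle one. In spirit this is the same as the wait/notify approach described earlier. The interrupt check determines whether the current thread has been interrupted so that specific logic can run (here, making sure the worker is interrupted once the pool is stopping and is not interrupted otherwise), which is again the same idea as the interrupt-based communication described earlier.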
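From the outside, the interrupt side of this can be observed through the public ExecutorService API. The sketch below assumes nothing beyond java.util.concurrent; ShutdownNowDemo is a hypothetical class. A task blocks in sleep, and shutdownNow() interrupts the worker thread that is running it.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical demo class; the names are illustrative only.
class ShutdownNowDemo {

    // Returns true if the running task was notified through an interrupt.
    static boolean run() {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        CountDownLatch taskStarted = new CountDownLatch(1);
        AtomicBoolean taskInterrupted = new AtomicBoolean(false);

        pool.execute(() -> {
            taskStarted.countDown();
            try {
                Thread.sleep(60_000); // stand-in for a long blocking operation
            } catch (InterruptedException e) {
                taskInterrupted.set(true); // the pool "tapped" this worker
            }
        });

        try {
            taskStarted.await();
            pool.shutdownNow(); // interrupts the worker that is running the task
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return taskInterrupted.get();
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints true
    }
}
```

Whether a task actually stops is up to the task: the pool only delivers the interrupt, and a task that swallows or never checks it keeps running.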

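4.2 Inter-thread communication in blocking queues

Everyone is familiar with blocking queues. Their core behavior is this: when a producer adds an element and finds the queue full, it blocks until there is room and then adds it; when a consumer finds the queue empty, it blocks until there is data and then consumes it. The implementation of this idea relies on Lock and Condition, which are built on AQS (AQS itself is not covered here). The put and take methods of LinkedBlockingQueue look like this: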

public void put(E e) throws InterruptedException {
    if (e == null) throw new NullPointerException();
    // Note: convention in all put/take/etc is to preset local var
    // holding count negative to indicate failure unless set.
    int c = -1;
    Node<E> node = new Node<E>(e);
    final ReentrantLock putLock = this.putLock;
    final AtomicInteger count = this.count;
    putLock.lockInterruptibly();
    try {
        /*
         * Note that count is used in wait guard even though it is
         * not protected by lock. This works because count can
         * only decrease at this point (all other puts are shut
         * out by lock), and we (or some other waiting put) are
         * signalled if it ever changes from capacity. Similarly
         * for all other uses of count in other wait guards.
         */
        while (count.get() == capacity) {
            notFull.await();
        }
        enqueue(node);
        c = count.getAndIncrement();
        if (c + 1 < capacity)
            notFull.signal();
    } finally {
        putLock.unlock();
    }
    if (c == 0)
        signalNotEmpty();
}

public E take() throws InterruptedException {
    E x;
    int c = -1;
    final AtomicInteger count = this.count;
    final ReentrantLock takeLock = this.takeLock;
    takeLock.lockInterruptibly();
    try {
        while (count.get() == 0) {
            notEmpty.await();
        }
        x = dequeue();
        c = count.getAndDecrement();
        if (c > 1)
            notEmpty.signal();
    } finally {
        takeLock.unlock();
    }
    if (c == capacity)
        signalNotFull();
    return x;
}

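The two methods above are put (blocking insert) and take (blocking removal). They use Lock together with two Condition queues, notFull and notEmpty. Built on the synchronization queue and the wait queues, these let multiple threads communicate, and whichever thread obtains the right to execute then runs its own logic.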
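Seen from the caller's side, all of this signalling is hidden behind put and take. Here is a small usage sketch, with a hypothetical class BlockingQueueDemo: a queue of capacity 1 forces the producer to block in put until the consumer has taken the previous element.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical demo class; the names are illustrative only.
class BlockingQueueDemo {

    // Passes 1, 2, 3 from a producer thread to the calling thread.
    static List<Integer> run() {
        // Capacity 1: put() blocks on notFull whenever an element is still queued.
        BlockingQueue<Integer> queue = new LinkedBlockingQueue<>(1);

        Thread producer = new Thread(() -> {
            try {
                for (int i = 1; i <= 3; i++) {
                    queue.put(i);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        producer.start();

        List<Integer> received = new ArrayList<>();
        try {
            for (int i = 0; i < 3; i++) {
                received.add(queue.take()); // blocks on notEmpty while the queue is empty
            }
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(run()); // prints [1, 2, 3]
    }
}
```

Both put and take declare InterruptedException, so a thread blocked in either of them can still be "tapped" through the interrupt mechanism from section 3.2.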

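5 Summary

In Java, inter-thread communication is unlike inter-process communication in the traditional sense: it is simpler and more efficient, but it also comes with many restrictions. You will rarely see it directly in business development, yet it is certainly present inside the tools and APIs you use there.

We master this knowledge not just to floor the interviewer, nor to show off at a company tech talk, but above all to broaden our knowledge, so that when we face a choice between technical solutions or run into real problems and difficulties, we can make a precise decision right away and turn the tide.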