AQS Principles and the Condition Queue (Part 1) (Based on Java 8)

You are welcome to follow github.com/hsfxuebao/j… ; I hope it helps, and if you find it useful, please give it a Star.

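1. AbstractQueuedSynchronizer Definition

AbstractQueuedSynchronizer (AQS) is a synchronization tool in JUC that controls how threads acquire a single shared resource (state). It does so with a Sync Queue (a concurrency-safe CLH queue), a Condition Queue (an ordinary list), and the volatile variable state.

Main features:

1. It contains two queues internally (the Sync Queue and the Condition Queue); both are described in detail later.
2. AQS defines the main logic for acquiring (acquire) and releasing (release) a lock; subclasses only need to implement the corresponding template methods.
3. It supports shared and exclusive modes (shared mode uses only the Sync Queue; exclusive mode sometimes uses only the Sync Queue, but also uses the Condition Queue when a Condition is involved); exclusive mode is mutually exclusive.
4. It supports acquiring the exclusive lock without responding to interrupts (acquire), with interrupt response (acquireInterruptibly), and with a timeout (tryAcquireNanos); the shared-lock counterparts are acquireShared, acquireSharedInterruptibly, and tryAcquireSharedNanos.
5. Fair versus nonfair behavior is distinguished in the subclass's tryAcquire and tryAcquireShared.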
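Point 5 can be illustrated with a minimal sketch. The class names FairnessSketch, NonfairSync, and FairSync are hypothetical and chosen only for this example; the sketch assumes a non-reentrant lock where state is 0 (free) or 1 (held). The only difference between the two variants is that the fair one consults hasQueuedPredecessors() before attempting the CAS.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical names for illustration only; not part of the JDK.
class FairnessSketch {

    /** Nonfair: attempt the CAS immediately, regardless of who is already queued. */
    static class NonfairSync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int acquires) {
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int releases) {
            if (getState() == 0) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0);
            return true;
        }

        boolean isLocked() {
            return getState() == 1;
        }
    }

    /** Fair: refuse to acquire while another thread has been waiting longer. */
    static class FairSync extends NonfairSync {
        @Override
        protected boolean tryAcquire(int acquires) {
            if (hasQueuedPredecessors()) {
                return false; // someone is ahead of us in the Sync Queue
            }
            return super.tryAcquire(acquires);
        }
    }

    public static void main(String[] args) {
        FairSync fair = new FairSync();
        fair.acquire(1);                     // empty queue, so the fair path succeeds too
        System.out.println(fair.isLocked());
        fair.release(1);
        System.out.println(fair.isLocked());
    }
}
```

With an empty queue both variants behave the same; the difference only shows up when a thread calls acquire while others are already parked in the Sync Queue.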

Let's first look at a demo (an exclusive, non-reentrant lock):

import java.io.IOException;
import java.io.ObjectInputStream;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.Lock;

/**
 * A lock in exclusive mode that does not support reentrancy
 * Created by xujiankang on 2016/12/19.
 */
public class Mutex implements Lock, java.io.Serializable {

    // The sync object does all the hard work. We just forward to it
    private final Sync sync = new Sync();

    @Override
    public void lock() {
        sync.acquire(1);
    }

    // Acquire the lock, responding to interrupts
    @Override
    public void lockInterruptibly() throws InterruptedException {
        sync.acquireInterruptibly(1);
    }

    // Try to acquire the lock without blocking
    @Override
    public boolean tryLock() {
        return sync.tryAcquire(1);
    }

    /**
     * Try to acquire the lock, waiting up to the given timeout
     */
    @Override
    public boolean tryLock(long time, TimeUnit unit) throws InterruptedException {
        return sync.tryAcquireNanos(1, unit.toNanos(time));
    }

    /** Release the lock */
    @Override
    public void unlock() {
        sync.release(1);
    }

    @Override
    public Condition newCondition() {
        return sync.newCondition();
    }

    public boolean isLocked(){
        return sync.isHeldExclusively();
    }
    public boolean hasQueuedThreads(){
        return sync.hasQueuedThreads();
    }

    // internal helper class
    static class Sync extends AbstractQueuedSynchronizer{

        // Report whether the lock is held. The method must be named isHeldExclusively
        // so that it overrides the AQS method that ConditionObject relies on.
        @Override
        protected boolean isHeldExclusively(){
            return getState() == 1;
        }
        // Acquire the lock if state is zero
        @Override
        public boolean tryAcquire(int acquires){
            assert acquires == 1; // Otherwise unused
            if(compareAndSetState(0, 1)){
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }
        // Release the lock by setting state to zero
        @Override
        protected boolean tryRelease(int release){
            assert release == 1; // Otherwise unused
            if(getState() == 0){
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0);
            return true;
        }

        // Provides a Condition
        Condition newCondition(){
            return new ConditionObject();
        }

        // Deserializes properly
        private void readObject(ObjectInputStream s)throws IOException, ClassNotFoundException{
            s.defaultReadObject();
            setState(0); // reset to unlocked state
        }
    }

}

Subclasses of AQS usually need to implement the following methods:

# Exclusive mode
tryAcquire
tryRelease
isHeldExclusively
# Shared mode
tryAcquireShared
tryReleaseShared
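The Mutex demo covers the exclusive methods; for the shared pair, here is a minimal sketch modeled on the BooleanLatch example in the AbstractQueuedSynchronizer Javadoc. The class name BooleanLatchSketch and the demo() helper are assumptions made for this illustration. tryAcquireShared returns a negative value to mean "fail and queue" and a positive value to mean "succeed, and later shared acquirers may succeed too".

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical one-shot latch for illustration: state 0 = closed, 1 = open.
class BooleanLatchSketch {

    private static final class Sync extends AbstractQueuedSynchronizer {
        boolean isSignalled() {
            return getState() != 0;
        }

        @Override
        protected int tryAcquireShared(int ignore) {
            return isSignalled() ? 1 : -1;   // negative: enqueue and park
        }

        @Override
        protected boolean tryReleaseShared(int ignore) {
            setState(1);                     // open the latch
            return true;                     // true: wake the queued shared waiters
        }
    }

    private final Sync sync = new Sync();

    boolean isSignalled() { return sync.isSignalled(); }
    void signal() { sync.releaseShared(1); }
    void await() throws InterruptedException { sync.acquireSharedInterruptibly(1); }

    /** Starts two waiters, opens the latch, and returns how many waiters got through. */
    static int demo() {
        BooleanLatchSketch latch = new BooleanLatchSketch();
        AtomicInteger passed = new AtomicInteger();
        Runnable waitThenCount = () -> {
            try {
                latch.await();
                passed.incrementAndGet();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        Thread a = new Thread(waitThenCount);
        Thread b = new Thread(waitThenCount);
        a.start();
        b.start();
        latch.signal();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return passed.get();
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

Unlike the exclusive Mutex, a single releaseShared lets every waiter through, because each successful shared acquire propagates the wake-up to the next shared node.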

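The general flow for acquiring and releasing a lock is as follows:

# Acquiring the lock
Acquire:
while(!tryAcquire(arg)){ // tryAcquire is implemented by the subclass and changes the AQS state
   1. tryAcquire failed, so enqueue the thread in the Sync Queue
   2. If the current node is head.next, try to acquire the lock once more (tryAcquire)
   3. If that also fails, set the predecessor's waitStatus to SIGNAL and block
}

# Releasing the lock
Release:
if(tryRelease(arg)){ // tryRelease is implemented by the subclass and changes the AQS state
   1. Check whether the lock has been fully released
   2. If this node is marked SIGNAL, wake its successor so that it can try to acquire the AQS state
   3. Reset this node's waitStatus to 0
}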
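The flow above can be observed from outside with AQS's own monitoring methods. The following is a minimal sketch: the class AcquireReleaseDemo and its run() helper are hypothetical names, and the inner Sync repeats the non-reentrant logic of the Mutex demo so that the example stays self-contained. The main thread holds the lock, a second thread fails tryAcquire and is enqueued, and the release wakes it.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical demo class; Sync mirrors the non-reentrant Mutex logic.
class AcquireReleaseDemo {

    static class Sync extends AbstractQueuedSynchronizer {
        @Override
        protected boolean tryAcquire(int arg) {
            if (compareAndSetState(0, 1)) {
                setExclusiveOwnerThread(Thread.currentThread());
                return true;
            }
            return false;
        }

        @Override
        protected boolean tryRelease(int arg) {
            if (getState() == 0) {
                throw new IllegalMonitorStateException();
            }
            setExclusiveOwnerThread(null);
            setState(0);
            return true;
        }
    }

    /** Returns true if the second thread was seen in the Sync Queue and later left it. */
    static boolean run() {
        Sync sync = new Sync();
        sync.acquire(1);                        // main thread takes the lock
        Thread waiter = new Thread(() -> {
            sync.acquire(1);                    // tryAcquire fails: enqueue, then park
            sync.release(1);
        });
        waiter.start();
        while (!sync.hasQueuedThreads()) {      // poll until the waiter is enqueued
            Thread.yield();
        }
        boolean sawQueued = sync.getQueueLength() == 1;
        sync.release(1);                        // tryRelease succeeds, successor is woken
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return sawQueued && !sync.hasQueuedThreads();
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```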

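AQS as a whole is divided into the following parts:

Node: the node that holds a thread trying to acquire the lock. Nodes exist in both the Sync Queue and the Condition Queue and are distinguished mainly by their waitStatus value (described in detail below).
Condition Queue: used in exclusive mode; a node is appended to its tail only when Condition.awaitXX is called (PS: a thread must already hold the Lock before it can use a Condition).
Sync Queue: the CLH queue that stores Nodes in both exclusive and shared modes. Its main characteristics: the queue always has a dummy head node, whether a successor may acquire the lock is decided by its predecessor, and a predecessor wakes its sleeping successor when it releases the lock.
ConditionObject: used in exclusive mode; it makes a thread release the lock and join the Condition Queue, and performs the corresponding signal operations. For details, see "Java 8 Source Code Analysis: Condition".
Exclusive acquisition (acquire, release): used by ReentrantLock, for example. For details, see "Java 8 Source Code Analysis: ReentrantLock".
Shared acquisition (acquireShared, releaseShared): used by ReentrantReadWriteLock, Semaphore, and CountDownLatch, for example.
Each part is explained below.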

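1.1 An Overview of the AQS Framework

In the framework diagram, colored items are methods and uncolored items are attributes.
Overall, the AQS framework has five layers, running from shallow to deep, top to bottom: from the API that AQS exposes down to the underlying base data.
A custom synchronizer only needs to override the first-layer methods it requires; it does not need to care about the underlying implementation. When a custom synchronizer locks or unlocks, the call enters AQS through the first-layer API, goes through the second layer to acquire the lock, and, if acquisition fails, moves on to the wait-queue handling in the third and fourth layers. All of this relies on the base data provided by the fifth layer.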

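1.2 Principle Overview

The core idea of AQS: if the requested shared resource is free, the requesting thread becomes the active worker thread and the resource is marked as locked; if the resource is occupied, a mechanism for blocking, waiting, and waking threads is needed to hand out the lock. AQS implements this mechanism with a variant of the CLH queue, enqueueing threads that cannot acquire the lock for the moment.

CLH: the Craig, Landin, and Hagersten queue is a singly linked list. The queue in AQS is a CLH variant, a virtual doubly linked FIFO queue; AQS allocates the lock by wrapping each thread that requests the shared resource in a node.

[Figure: main principle diagram of AQS]

AQS uses a volatile int member variable to represent the synchronization state, queues resource requests in its built-in FIFO queue, and modifies the state value with CAS.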

1.3 AQS Data Structure

Let's first look at the most basic data structure in AQS, Node, which is the node of the CLH variant queue described above.

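The methods and attributes have the following meanings:

waitStatus: the state of the current node in the queue
thread: the thread held by this node
prev: the predecessor pointer
predecessor: returns the predecessor node, throwing NullPointerException if there is none
nextWaiter: points to the next node in CONDITION state, or marks the mode in the Sync Queue
next: the successor pointer

A thread waits for the lock in one of two modes:

SHARED: the thread waits for the lock in shared mode
EXCLUSIVE: the thread waits for the lock in exclusive mode

waitStatus takes the following values:

0: the default value when a Node is initialized
CANCELLED (1): the thread's request for the lock has been cancelled
SIGNAL (-1): the successor's thread needs to be woken up
CONDITION (-2): the node is in the Condition Queue, waiting to be signalled
PROPAGATE (-3): used in shared mode to keep a release propagating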

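1.4 Synchronization State

Having covered the data structure, let's look at the AQS synchronization state. AQS maintains a field named state, declared volatile, that reflects the current lock status of the critical resource.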

// java.util.concurrent.locks.AbstractQueuedSynchronizer

private volatile int state;
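Subclasses never touch this field directly; they go through the protected final methods getState, setState, and compareAndSetState. The following minimal sketch shows the CAS semantics; the class StateCasDemo and its helper methods are hypothetical wrappers added only so the protected methods can be called from a demo.

```java
import java.util.concurrent.locks.AbstractQueuedSynchronizer;

// Hypothetical subclass that only exposes the protected state accessors.
class StateCasDemo extends AbstractQueuedSynchronizer {

    int current() {
        return getState();
    }

    boolean cas(int expect, int update) {
        return compareAndSetState(expect, update);
    }

    static String run() {
        StateCasDemo s = new StateCasDemo();
        boolean first = s.cas(0, 1);    // state is 0, so the CAS succeeds
        boolean second = s.cas(0, 1);   // state is now 1, so the same CAS fails
        return first + "," + second + "," + s.current();
    }

    public static void main(String[] args) {
        System.out.println(run());      // prints "true,false,1"
    }
}
```

This is exactly the pattern tryAcquire in the Mutex demo relies on: among threads racing on compareAndSetState(0, 1), only one can win.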

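2. Inner Class Node

A Node represents a thread trying to acquire the lock and lives in the Condition Queue or the Sync Queue. Its two key fields are nextWaiter (which marks shared or exclusive mode) and waitStatus, which marks the node's state (PS: this is the key point; once you understand how waitStatus changes, you understand all of AQS).

[Figure: Node structure (Node.png)]

See the code: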

/**
 * A node representing a Thread in the Sync Queue or the Condition Queue
 */
static final class Node {
    /** Marker to indicate a node is waiting in shared mode (such nodes exist only in the Sync Queue) */
    static final Node SHARED = new Node();
    /** Marker to indicate a node is waiting in exclusive mode */
    static final Node EXCLUSIVE = null;

    /**
     * waitStatus value to indicate the thread has cancelled.
     * CANCELLED means the node has given up acquiring the lock (usually because of an interrupt or a timeout);
     * this value is usually set in cancelAcquire.
     */
    static final int CANCELLED = 1;

    /**
     * waitStatus value to indicate the successor's thread needs unparking.
     * SIGNAL marks that the successor of this node must be woken up
     * (PS: typically used in exclusive mode; shared mode sometimes uses PROPAGATE).
     */
    static final int SIGNAL = -1;

    /**
     * waitStatus value to indicate the thread is waiting on a condition,
     * i.e. the node is in the Condition Queue.
     */
    static final int CONDITION = -2;

    /**
     * waitStatus value to indicate the next acquireShared should
     * unconditionally propagate.
     * When a node acquires or releases the lock in shared mode, its final state is PROPAGATE
     * (PS: a shared-mode node may lose the head position to its successor, and thus be removed
     * from the Sync Queue, before it ever becomes PROPAGATE).
     */
    static final int PROPAGATE = -3;

    /**
     * Status field, taking only the values:
     *
     *  SIGNAL:     The successor of this node is (or will soon be)
     *              blocked (via park), so the current node must
     *              unpark its successor when it releases or
     *              cancels. To avoid races, acquire methods must
     *              first indicate they need a signal,
     *              then retry the atomic acquire, and then,
     *              on failure, block
     *  CANCELLED:  This node is cancelled due to timeout or interrupt
     *              Nodes never leave this state. In particular,
     *              a thread with cancelled node never again blocks
     *  CONDITION:  This node is currently on a condition queue.
     *              It will not be used as a sync queue node
     *              until transferred, at which time the status
     *              will be set to 0. (Use of this value here has
     *              nothing to do with other uses of the
     *              field, but simplifies mechanics)
     *  PROPAGATE:  A releaseShared should be propagated to other
     *              nodes. This is set (for head node only) in
     *              doReleaseShared to ensure propagation
     *              continues, even if other operations have
     *              since intervened
     *  0:          None of the above
     *
     * The values are arranged numerically to simplify use.
     * Non-negative values mean that a node doesn't need to
     * signal. So, most code doesn't need to check for particular
     * values, just for sign
     *
     * The field is initialized to 0 for normal sync nodes, and
     * CONDITION for condition nodes. It is modified using CAS
     * (or when possible, unconditional volatile writes)
     *
     */
    volatile int waitStatus;

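    /**
     * The predecessor of this node while it is in the Sync Queue (used mainly to skip CANCELLED nodes).
     * Note, according to the addWaiter method:
     *  1. If the node has been enqueued, prev != null always holds
     *  2. prev != null does not guarantee that the node is already in the Sync Queue
     */
    volatile Node prev;

    /**
     * The successor of this node in the Sync Queue, used mainly to wake the successor when the lock is released.
     * The successor marks its predecessor with SIGNAL to remind it to wake the successor on release.
     */
    volatile Node next;

    /** The thread that is trying to acquire the lock */
    volatile Thread thread;

    /**
     * Serves two purposes:
     *  1. In the Sync Queue, nextWaiter tells whether the node is in shared or exclusive mode
     *  2. In the Condition Queue, it links to the successor node (the Condition Queue is a singly linked list that does not support concurrent access)
     */
    Node nextWaiter;

    /** Whether this node is in shared mode */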
    final boolean isShared() {
        return nextWaiter == SHARED;
    }

    /**
     * Returns the predecessor of this node
     */
    final Node predecessor() throws NullPointerException{
        Node p = prev;
        if(p == null){
            throw new NullPointerException();
        }else{
            return p;
        }
    }

    Node(){
        // Used to establish initial head or SHARED marker
    }

    /**
     * Creates a Node for use in the Sync Queue
     */
    Node(Thread thread, Node mode){     // Used by addWaiter
        this.nextWaiter = mode;
        this.thread = thread;
    }

    /**
     * Creates a Node for use in the Condition Queue
     */
    Node(Thread thread, int waitStatus){ // Used by Condition
        this.waitStatus = waitStatus;
        this.thread = thread;
    }
}

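How waitStatus changes:

1. A thread has just entered the Sync Queue and finds the exclusive lock held by another thread. It marks its predecessor as SIGNAL and then tries once more to acquire the lock (by calling tryAcquire).
2. If tryAcquire fails, it checks whether the predecessor is marked SIGNAL and, if so, blocks. (Before blocking it makes sure the predecessor is marked SIGNAL, because on release the predecessor decides whether to wake its successor based on that mark; this is the exclusive case.)
3. The predecessor finishes with the lock and releases it; because it is marked SIGNAL, it wakes its successor.

waitStatus transitions:

1. Exclusive mode: 0 (initial) -> SIGNAL (set by the successor, meaning the successor must be woken on release) -> 0 (reset once the lock has been released)
2. Exclusive mode with a Condition: CONDITION (while in the Condition Queue) -> 0 (on transfer to the Sync Queue) -> SIGNAL (set by the successor, meaning the successor must be woken on release) -> 0 (reset once the lock has been released)
   Interrupts and timeouts may be involved in the cases above; they only add one more state, CANCELLED. Once a node becomes CANCELLED, it just waits to be removed.
3. Shared mode: 0 (initial) -> PROPAGATE (when acquiring or releasing the lock) (on acquire, setHeadAndPropagate is called to wake successors in a chain until an exclusive-mode node is reached)
4. Shared mode mixed with exclusive mode: 0 (initial) -> SIGNAL (set by the successor, meaning the successor must be woken on release) -> 0 (reset once the lock has been released)
These state changes happen mainly in doReleaseShared and shouldParkAfterFailedAcquire.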
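To make the 0 -> SIGNAL step concrete, here is a simplified single-threaded model of the decision that shouldParkAfterFailedAcquire makes. It is a sketch under stated assumptions, not the JDK code: WaitStatusModel and ToyNode are hypothetical names, the fields are plain (the real Node uses volatile fields), and the plain assignment to waitStatus stands in for the CAS that AQS performs.

```java
// Simplified, single-threaded model for illustration only.
class WaitStatusModel {
    static final int CANCELLED = 1;
    static final int SIGNAL = -1;

    static final class ToyNode {
        int waitStatus;          // 0 by default, like a fresh Sync Queue node
        ToyNode prev;
        ToyNode next;
        final String name;

        ToyNode(String name) {
            this.name = name;
        }
    }

    /** Returns true only when the predecessor is already marked SIGNAL, i.e. it is safe to park. */
    static boolean shouldPark(ToyNode pred, ToyNode node) {
        int ws = pred.waitStatus;
        if (ws == SIGNAL) {
            return true;                         // predecessor has promised to wake us
        }
        if (ws > 0) {                            // predecessor is cancelled: skip backwards
            do {
                node.prev = pred = pred.prev;
            } while (pred.waitStatus > 0);
            pred.next = node;
        } else {
            pred.waitStatus = SIGNAL;            // a CAS in the real code
        }
        return false;                            // caller retries tryAcquire before parking
    }

    static void link(ToyNode a, ToyNode b) {
        a.next = b;
        b.prev = a;
    }

    static String run() {
        ToyNode head = new ToyNode("head");
        ToyNode cancelled = new ToyNode("cancelled");
        ToyNode node = new ToyNode("node");
        link(head, cancelled);
        link(cancelled, node);
        cancelled.waitStatus = CANCELLED;

        boolean r1 = shouldPark(node.prev, node);   // skips the cancelled node
        boolean r2 = shouldPark(node.prev, node);   // marks head as SIGNAL
        boolean r3 = shouldPark(node.prev, node);   // head is SIGNAL: may park
        return r1 + "," + r2 + "," + r3 + "," + node.prev.name + "," + head.waitStatus;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints "false,false,true,head,-1"
    }
}
```

The two false results are the reason a thread always gets another tryAcquire attempt after marking its predecessor, which closes the window in which a release could be missed.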

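3. Internal Queue: Condition Queue

The Condition Queue is not concurrency-safe and is used only in exclusive mode. (PS: why can it be unsafe? A thread must hold the exclusive lock to operate on a Condition, so concurrent access does not need to be considered.)
While a Node is in the Condition Queue, only its waitStatus, thread, and nextWaiter fields are set; the others are null. waitStatus there can only be CONDITION or 0 (0 means the node is being transferred to the Sync Queue, or was interrupted or timed out). One point to note: when a waiting thread is interrupted or its wait times out, the node briefly exists in both the Condition Queue and the Sync Queue.

In the figure, nodes Node4, Node5, Node6, and Node7 all joined the Condition Queue by calling Condition.awaitXX (PS: after joining, the thread releases the lock it held).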
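Both properties (a waiter sits in the Condition Queue, and await releases the lock) can be observed through ReentrantLock, whose Condition is an AQS ConditionObject. This is a minimal sketch; ConditionQueueDemo and run() are hypothetical names. The main thread can only take the lock while the waiter is parked because await released it.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class for illustration.
class ConditionQueueDemo {

    /** Returns the Condition Queue length observed while one thread is waiting. */
    static int run() {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        boolean[] ready = {false};               // guarded by lock

        Thread waiter = new Thread(() -> {
            lock.lock();
            try {
                while (!ready[0]) {
                    cond.awaitUninterruptibly(); // releases the lock, joins the Condition Queue
                }
            } finally {
                lock.unlock();
            }
        });
        waiter.start();

        int observed = 0;
        while (true) {
            lock.lock();                         // possible only because await released the lock
            try {
                if (lock.hasWaiters(cond)) {
                    observed = lock.getWaitQueueLength(cond);
                    ready[0] = true;
                    cond.signal();               // moves the node to the Sync Queue
                    break;
                }
            } finally {
                lock.unlock();
            }
            Thread.yield();
        }
        try {
            waiter.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return observed;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```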

3.1. Enqueue Method addConditionWaiter

/**
 * Adds a new waiter to wait queue
 * Wraps the current thread in a Node and appends it to the Condition Queue.
 * Note that none of the Condition Queue operations below handle concurrency (the Sync Queue does support
 * concurrent operations). Why? Because a thread operating on a Condition already holds the AQS exclusive
 * lock, so concurrency does not need to be considered.
 * @return the new wait node
 */
private Node addConditionWaiter(){
    Node t = lastWaiter;                                // 1. Tail of the Condition Queue
    // If lastWaiter is cancelled, clean out.
    // 2. The tail has been cancelled, so clean it up.
    //    When is t.waitStatus != Node.CONDITION? When the waiting thread is interrupted or await times out:
    //    ConditionObject -> await -> checkInterruptWhileWaiting -> transferAfterCancelledWait
    //    executes "compareAndSetWaitStatus(node, Node.CONDITION, 0)".
    //    Note: when await times out or is interrupted, the node is not removed from the Condition Queue right away.
    //    It is removed either when another await calls addConditionWaiter, or near the end of await, which checks
    //    "node.nextWaiter != null" (PS: signal sets node.nextWaiter to null; interrupt and timeout do not).
    if(t != null && t.waitStatus != Node.CONDITION){
        unlinkCancelledWaiters();                        // 3. Remove nodes with "waitStatus != Node.CONDITION" (a Node in the Condition Queue has waitStatus CONDITION (normal) or 0 (signal/timeout/interrupt))
        t = lastWaiter;                                // 4. Re-read the latest lastWaiter
    }
    Node node = new Node(Thread.currentThread(), Node.CONDITION); // 5. Wrap the thread in a node for the Condition Queue
    if(t == null){
        firstWaiter = node;                           // 6. The Condition Queue is empty
    }else{
        t.nextWaiter = node;                          // 7. Append to the tail of the queue
    }
    lastWaiter = node;                                // 8. Update lastWaiter
    return node;
}

3.2. Removing Cancelled Nodes: unlinkCancelledWaiters

If a Node in the Condition Queue has a status other than CONDITION, it must have been interrupted or timed out.

/**
 * Called when addConditionWaiter puts a thread into the Condition Queue, or near the end of await,
 * to clean out nodes that remain in the Condition Queue after a timeout/interrupt.
 * The removal is rather neat: it introduces a trail node, which can be understood as the last valid node
 * seen so far while traversing the Condition Queue.
 */
private void unlinkCancelledWaiters(){
    Node t = firstWaiter;
    Node trail = null;
    while(t != null){
        Node next = t.nextWaiter;               // 1. Remember the next node first
        if(t.waitStatus != Node.CONDITION){   // 2. The node is invalid; in the Condition Queue, Node.waitStatus can only be CONDITION or 0 (caused by timeout/interrupt)
            t.nextWaiter = null;               // 3. Clear Node.nextWaiter
            if(trail == null){                  // 4. No valid node has been seen yet
                firstWaiter = next;            // 5. Point firstWaiter at next (next may be invalid too; this is only provisional)
            }else{
                trail.nextWaiter = next;       // 6. Assign next to trail.nextWaiter, which is what actually unlinks node t
            }
            if(next == null){                  // 7. next == null means the whole Condition Queue has been traversed
                lastWaiter = trail;
            }
        }else{
            trail = t;                         // 8. Record the valid node as trail
        }
        t = next;
    }
}
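The trail technique is easier to follow on a standalone list. The sketch below copies the loop into a toy class so that it can be run and inspected; UnlinkModel and W are hypothetical names, and statuses are set by hand instead of by interrupts or timeouts.

```java
import java.util.ArrayList;
import java.util.List;

// Toy singly linked list mirroring the Condition Queue layout; for illustration only.
class UnlinkModel {
    static final int CONDITION = -2;

    static final class W {
        int waitStatus;
        W nextWaiter;
        final String name;

        W(String name, int waitStatus) {
            this.name = name;
            this.waitStatus = waitStatus;
        }
    }

    W firstWaiter;
    W lastWaiter;

    void add(String name, int waitStatus) {
        W w = new W(name, waitStatus);
        if (lastWaiter == null) {
            firstWaiter = w;
        } else {
            lastWaiter.nextWaiter = w;
        }
        lastWaiter = w;
    }

    void unlinkCancelledWaiters() {
        W t = firstWaiter;
        W trail = null;                      // last valid node seen so far
        while (t != null) {
            W next = t.nextWaiter;
            if (t.waitStatus != CONDITION) {
                t.nextWaiter = null;
                if (trail == null) {
                    firstWaiter = next;      // provisional: next may be invalid too
                } else {
                    trail.nextWaiter = next; // unlink t
                }
                if (next == null) {
                    lastWaiter = trail;
                }
            } else {
                trail = t;
            }
            t = next;
        }
    }

    List<String> names() {
        List<String> out = new ArrayList<>();
        for (W w = firstWaiter; w != null; w = w.nextWaiter) {
            out.add(w.name);
        }
        return out;
    }

    static String run() {
        UnlinkModel q = new UnlinkModel();
        q.add("A", 0);
        q.add("B", CONDITION);
        q.add("C", 0);
        q.add("D", CONDITION);
        q.add("E", 0);
        q.unlinkCancelledWaiters();
        return q.names() + " last=" + q.lastWaiter.name;
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints "[B, D] last=D"
    }
}
```

The case of A shows step 5 (the head is invalid, so firstWaiter moves on), C shows step 6, and E shows step 7, where lastWaiter falls back to the trail node.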

3.3. Transferring a Node: transferForSignal

transferForSignal is the normal transfer method; it is called only when a node is woken up normally by a signal.

/**
 * Transfers a Node from the Condition Queue to the Sync Queue.
 * Before transferForSignal is called, the caller executes first.nextWaiter = null;
 * a node transferred because of a timeout / interrupt does not go through that step.
 * Both kinds of transfer set waitStatus to 0.
 */
final boolean transferForSignal(Node node){
    /**
     * If cannot change waitStatus, the node has been cancelled
     */
    if(!compareAndSetWaitStatus(node, Node.CONDITION, 0)){ // 1. Fails if the node has already been cancelled
        return false;
    }

    /**
     * Splice onto queue and try to set waitStatus of predecessor to
     * indicate that thread is (probably) waiting, If cancelled or
     * attempt to set waitStatus fails, wake up to resync (in which
     * case the waitStatus can be transiently and harmlessly wrong)
     */
    Node p = enq(node);                                 // 2. Enqueue in the Sync Queue; enq returns the predecessor
    int ws = p.waitStatus;
    // 3. ws > 0 means node's predecessor in the Sync Queue has been cancelled, so wake node's thread.
    //    If compareAndSetWaitStatus(p, ws, Node.SIGNAL) fails, the predecessor's status changed concurrently
    //    (for example, it was cancelled), so wake node's thread as well and let it resynchronize.
    if(ws > 0 || !compareAndSetWaitStatus(p, ws, Node.SIGNAL)){
        LockSupport.unpark(node.thread);
    }
    return true;
}

3.4. Transferring a Node: transferAfterCancelledWait

transferAfterCancelledWait is the transfer method called when a node waiting in await is interrupted or times out.

/**
 * Transfers a node that was woken in the Condition Queue by a timeout/interrupt
 */
final boolean transferAfterCancelledWait(Node node){
    if(compareAndSetWaitStatus(node, Node.CONDITION, 0)){ // 1. The node has not been signalled yet, so transfer it directly (afterwards the node exists in both the Sync Queue and the Condition Queue)
        enq(node);
        return true;
    }

    /**
     * If we lost out to a signal(), then we can't proceed
     * until it finishes its enq(). Cancelling during an
     * incomplete transfer is both race and transient, so just
     * spin
     */
    while(!isOnSyncQueue(node)){                // 2. Another thread has sent a signal and is in the middle of transferring this node to the Sync Queue (that transfer already set waitStatus to 0, which is why the CAS above failed)
        Thread.yield();                         //    Spin, using isOnSyncQueue to check whether the node has entered the Sync Queue
    }
    return false;
}
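From the caller's side, the timeout path is visible in the return value of a timed await. This is a minimal sketch using ReentrantLock, whose Condition is an AQS ConditionObject; TimedAwaitDemo and run() are hypothetical names and the 20 ms timeout is arbitrary. Nobody signals, so the wait times out, the node is moved to the Sync Queue by the cancelled-wait path, and the thread reacquires the lock before await returns.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical demo class for illustration.
class TimedAwaitDemo {

    /** Returns "<await result>,<lock held after await>". */
    static String run() {
        ReentrantLock lock = new ReentrantLock();
        Condition cond = lock.newCondition();
        lock.lock();
        try {
            boolean signalled;
            try {
                // No other thread ever signals, so this returns false after the timeout.
                signalled = cond.await(20, TimeUnit.MILLISECONDS);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                return "interrupted";
            }
            // await only returns after the lock has been reacquired.
            return signalled + "," + lock.isHeldByCurrentThread();
        } finally {
            lock.unlock();
        }
    }

    public static void main(String[] args) {
        System.out.println(run());   // prints "false,true"
    }
}
```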

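AQS Principles and the Sync Queue (Part 2) (Based on Java 8)

References:

AbstractQueuedSynchronizer Source Code Analysis (Based on Java 8)
The java.util.concurrent Synchronizer Framework (written by the author of AQS; strongly recommended)

Understanding the Principles and Applications of AQS Through the ReentrantLock Implementation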