Original article: The "Double-Checked Locking is Broken" Declaration

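Double-checked locking is widely used as a way to implement lazy initialization in a multithreaded environment.

Unfortunately, when implemented in Java, it does not work reliably in a platform-independent way without additional synchronization. When implemented in other languages, such as C++, whether it works depends on the memory model of the processor, the reorderings performed by the compiler, and the interaction between the compiler and the synchronization library. Since none of these are specified in a language such as C++, we cannot say in which situations it will work. Explicit memory barriers can be used to make double-checked locking work in C++, but these barriers are not available in Java.

First, let us explain the behavior we want:

// Single-threaded version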
class Foo {    
    private Helper helper = null;   
    public Helper getHelper() {     
        if (helper == null) {         
            helper = new Helper();     
        }
        return helper;     
    }   
}

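If this code is used in a multithreaded context, things go wrong. Most obviously, two or more Helper objects could be allocated. (We will describe other problems later.) The simplest fix is to synchronize the getHelper() method:

// Correct multithreaded version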
class Foo {
    private Helper helper = null;
    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }
}
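
The claim that the synchronized version allocates only one Helper can be checked with a small harness. The following is a minimal sketch, assuming Java 8 or later; the SynchronizedDemo class and its distinctHelpers method are hypothetical names introduced for illustration and are not part of the original article.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class Helper { }

class Foo {
    private Helper helper = null;

    // Every call takes the lock, so the check and the assignment are atomic.
    public synchronized Helper getHelper() {
        if (helper == null) {
            helper = new Helper();
        }
        return helper;
    }
}

// Hypothetical harness, not from the original article.
class SynchronizedDemo {
    // Starts `threads` threads that call getHelper() at about the same time
    // and returns how many distinct Helper instances they observed.
    static int distinctHelpers(int threads) {
        Foo foo = new Foo();
        Set<Helper> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await(); // wait until all threads are released together
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(foo.getHelper());
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```

Calling SynchronizedDemo.distinctHelpers(8) should report a single instance. Running the same harness against the unsynchronized version may or may not expose the race on any given run, which is part of what makes such bugs hard to find by testing.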

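The code above performs synchronization every time getHelper() is called. The double-checked locking idiom tries to avoid synchronization once the helper has been allocated:

// Broken multithreaded version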
class Foo {
    private Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null) {
                    helper = new Helper();
                }
            }
        }
        return helper;
    }
}

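Unfortunately, this code does not work in the presence of either optimizing compilers or shared-memory multiprocessors.

Why double-checked locking doesn't work

There are many reasons it doesn't work. The first few we describe are the more obvious ones. After understanding them, you may be tempted to "fix" the double-checked locking idiom. Your fix will not work, because there are more subtle problems. Understand those, devise a better fix, and it still may not work, because there are even more subtle reasons.

Many very smart people have spent a lot of time on this problem. There is no way to make it work without requiring each thread that accesses the helper object to perform synchronization.

The first reason it doesn't work

The most obvious reason is that the writes that initialize the Helper object and the write to the helper field can be performed, or perceived, out of order. Thus, a thread that invokes getHelper() could see a non-null reference to a helper object, but see the default values for its fields rather than the values set in the constructor.

If the compiler can prove that the constructor cannot throw an exception or perform synchronization, the writes that initialize the object and the write to the helper field can be freely reordered.

Even if the compiler does not reorder these writes, on a multiprocessor the processor or the memory system may reorder them, and a thread running on another processor may observe the result of that reordering.

Doug Lea has written a more detailed description of compiler-based reorderings.

A test case showing that it doesn't work

Paul Jakubik found an example of a use of double-checked locking that did not work correctly. The code referred to here has been cleaned up slightly.

When run on a system using the Symantec JIT, double-checked locking fails. In particular, the Symantec JIT compiles the statement singletons[i].reference = new Singleton(); to the following (the Symantec JIT uses a handle-based object allocation system):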

0206106A   mov         eax,0F97E78h
0206106F   call        01F6B210                  ; allocate space for
                                                 ; Singleton, return result in eax
02061074   mov         dword ptr [ebp],eax       ; EBP is &singletons[i].reference 
                                                ; store the unconstructed object here.
02061077   mov         ecx,dword ptr [eax]       ; dereference the handle to
                                                 ; get the raw pointer
02061079   mov         dword ptr [ecx],100h      ; Next 4 lines are
0206107F   mov         dword ptr [ecx+4],200h    ; Singleton's inlined constructor
02061086   mov         dword ptr [ecx+8],400h
0206108D   mov         dword ptr [ecx+0Ch],0F84030h

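As you can see, the assignment to singletons[i].reference is performed before the constructor for Singleton is called. This is completely legal under the Java memory model that existed at the time (before JDK 5), and also legal in C and C++, since neither language defined a memory model at the time.

A fix that doesn't work

Given the explanation above, a number of people have suggested the following code: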

// (Still) Broken multithreaded version
// "Double-Checked Locking" idiom
class Foo { 
  private Helper helper = null;
  public Helper getHelper() {
    if (helper == null) {
      Helper h;
      synchronized(this) {
        h = helper;
        if (h == null) 
            synchronized (this) {
              h = new Helper();
            } // release inner synchronization lock
        helper = h;
      } 
    }    
    return helper;
  }
  // other functions and members...
}

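This code puts the construction of the Helper object inside an inner synchronized block. The intuitive idea is that there should be a memory barrier at the point where synchronization is released, and that this barrier should prevent the initialization of the Helper object from being reordered with the assignment to the helper field.

Unfortunately, that intuition is completely wrong. The rules for synchronization don't work that way. The rule for a monitorexit (i.e., releasing synchronization) is that actions before the monitorexit must be performed before the monitor is released. However, there is no rule saying that actions after the monitorexit must be performed after the monitor is released. It is perfectly reasonable and legal for the compiler to move the assignment helper = h; inside the synchronized block, in which case we are back where we started. Many processors offer instructions that perform this kind of one-way memory barrier. Changing the semantics to require that releasing a lock be a full memory barrier would carry a performance penalty.

Translator's note on monitorexit: it is a bytecode instruction. A synchronized block is implemented with the monitorenter and monitorexit instructions. When monitorexit executes, the Java virtual machine decrements the lock object's counter by 1. When the counter reaches 0, the lock has been released.

Is double-checked locking worth all this optimization effort?

For most applications, the cost of simply making getHelper() synchronized is not high. You should consider this kind of detailed optimization only if you know that it is causing substantial overhead in your application. Very often, higher-level cleverness, such as using the built-in mergesort rather than an exchange sort (see the SPECJVM DB benchmark), will have much more impact.

Making it work for static singletons

If the singleton you are creating is static (i.e., only one Helper will ever be created), there is a simple and elegant solution.

Just define the singleton as a static field in a separate class. The semantics of Java guarantee that the field will not be initialized until it is referenced, and that any thread which accesses the field will see all of the writes resulting from initializing it.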

class HelperSingleton {   
    static Helper singleton = new Helper();   
}
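
To make the laziness visible, the holder can be nested inside the class that hands out the instance, and a counter can record when construction actually happens. This is a minimal sketch under stated assumptions: LazyHelper, HelperProvider, Holder, and the CONSTRUCTED counter are hypothetical names added for illustration and do not appear in the original article.

```java
import java.util.concurrent.atomic.AtomicInteger;

class LazyHelper {
    // Counts constructor calls so the moment of initialization can be observed.
    static final AtomicInteger CONSTRUCTED = new AtomicInteger();

    LazyHelper() {
        CONSTRUCTED.incrementAndGet();
    }
}

class HelperProvider {
    // The nested class is initialized only when Holder.INSTANCE is first read.
    // Class initialization is performed at most once and is thread-safe.
    private static class Holder {
        static final LazyHelper INSTANCE = new LazyHelper();
    }

    static LazyHelper getHelper() {
        return Holder.INSTANCE;
    }
}
```

Before the first call to HelperProvider.getHelper(), no LazyHelper has been constructed; afterwards exactly one exists, and every later call returns that same object without any explicit locking in user code.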

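It will work for 32-bit primitive values

Although the double-checked locking idiom cannot be used for references to objects, it can work for 32-bit primitive values (e.g., int or float). Note that it does not work for long or double, since unsynchronized reads and writes of 64-bit primitives are not guaranteed to be atomic.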

// Correct Double-Checked Locking for 32-bit primitives 
class Foo {    
    private int cachedHashCode = 0;   
    public int hashCode() {     
        int h = cachedHashCode;     
        if (h == 0) {
            synchronized(this) {
                if (cachedHashCode != 0) return cachedHashCode;
                h = computeHashCode();
                cachedHashCode = h;
            }
        }
        return h;
    }
    // other functions and members...
}

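In fact, assuming that the computeHashCode method always returns the same result and has no side effects (i.e., it is idempotent), you could even get rid of all of the synchronization.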

// Lazy initialization 32-bit primitives 
// Thread-safe if computeHashCode is idempotent 
class Foo {    
    private int cachedHashCode = 0;   
    public int hashCode() {     
        int h = cachedHashCode;     
        if (h == 0) {       
            h = computeHashCode();       
            cachedHashCode = h;       
        }     
        return h;     
    }   
    // other functions and members...   
}
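
A concrete instance of this unsynchronized pattern might look as follows. This is only a sketch: the CachedPoint class and its hash formula are assumptions made for illustration. The fields that feed the hash are final, so computeHashCode is idempotent, and a race between two threads merely results in the same value being computed and stored twice.

```java
final class CachedPoint {
    private final int x;
    private final int y;
    private int cachedHashCode = 0; // 0 means "not computed yet"

    CachedPoint(int x, int y) {
        this.x = x;
        this.y = y;
    }

    // Depends only on final fields and has no side effects.
    private int computeHashCode() {
        return 31 * x + y;
    }

    @Override
    public int hashCode() {
        int h = cachedHashCode;      // read the field once into a local
        if (h == 0) {
            h = computeHashCode();
            cachedHashCode = h;      // benign race: all writers store the same value
        }
        return h;
    }
}
```

One consequence of using 0 as the sentinel is that an object whose real hash is 0 recomputes it on every call; that costs time but does not affect correctness.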

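Making it work with explicit memory barriers

It is possible to make the double-checked locking pattern work if you have explicit memory barrier instructions. For example, if you are programming in C++, you can use the code from the book by Doug Schmidt et al.: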

// C++ implementation with explicit memory barriers
// Should work on any platform, including DEC Alphas
// From "Patterns for Concurrent and Distributed Objects",
// by Doug Schmidt
template <class TYPE, class LOCK> TYPE *
Singleton<TYPE, LOCK>::instance (void) {
    // First check
    TYPE* tmp = instance_;
    // Insert the CPU-specific memory barrier instruction
    // to synchronize the cache lines on multi-processor.
    asm ("memoryBarrier");
    if (tmp == 0) {
        // Ensure serialization (guard
        // constructor acquires lock_).
        Guard<LOCK> guard (lock_);
        // Double check.
        tmp = instance_;
        if (tmp == 0) {
                tmp = new TYPE;
                // Insert the CPU-specific memory barrier instruction
                // to synchronize the cache lines on multi-processor.
                asm ("memoryBarrier");
                instance_ = tmp;
        }
    }
    return tmp;
}

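Using thread-local storage

Alexander Terekhov (TEREKHOV@de.ibm.com) came up with a clever suggestion for implementing double-checked locking using thread-local storage. Each thread keeps a thread-local flag that records whether that thread has done the required synchronization.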

 class Foo {
     /** If perThreadInstance.get() returns a non-null value, this thread
            has done synchronization needed to see initialization
            of helper */
     private final ThreadLocal perThreadInstance = new ThreadLocal();
     private Helper helper = null;
     public Helper getHelper() {
         if (perThreadInstance.get() == null) createHelper();
         return helper;
     }
     private final void createHelper() {
         synchronized(this) {
             if (helper == null)
                 helper = new Helper();
         }
         // Any non-null value would do as the argument here
         perThreadInstance.set(perThreadInstance);
     }
}

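The performance of this technique depends heavily on which JDK implementation you use. In Sun's 1.2 implementation, ThreadLocal was very slow. It is significantly faster in 1.3, and is expected to be faster still in 1.4. Doug Lea analyzed the performance of some techniques for implementing lazy initialization.

Under the new Java memory model

As of JDK 5, there is a new Java memory model and thread specification.

Fixing double-checked locking using volatile

JDK 5 and later extend the semantics of volatile so that a write to a volatile field cannot be reordered with respect to any previous read or write, and a read of a volatile field cannot be reordered with respect to any following read or write. See Jeremy Manson's blog for more details.

With this change, the double-checked locking idiom can be made to work by declaring the helper field volatile. This does not work under JDK 1.4 and earlier.

// Works with acquire/release semantics for volatile (JDK 5 and later)
// Broken under the pre-JDK 5 semantics for volatile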
class Foo {
    private volatile Helper helper = null;
    public Helper getHelper() {
        if (helper == null) {
            synchronized(this) {
                if (helper == null)
                    helper = new Helper();
            }
        }
        return helper;
    }
}
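
A common variation copies the volatile field into a local variable so that the fast path performs only one volatile read. The sketch below assumes Java 8 or later (for the lambda and ConcurrentHashMap.newKeySet); Resource, VolatileFoo, and the VolatileDemo harness are hypothetical names introduced for illustration, not code from the original article.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;

class Resource { }

class VolatileFoo {
    private volatile Resource helper;

    Resource getHelper() {
        Resource h = helper;              // single volatile read on the fast path
        if (h == null) {
            synchronized (this) {
                h = helper;               // second check, now under the lock
                if (h == null) {
                    h = new Resource();
                    helper = h;           // volatile write publishes the object
                }
            }
        }
        return h;
    }
}

// Hypothetical harness: counts the distinct instances seen by concurrent callers.
class VolatileDemo {
    static int distinctInstances(int threads) {
        VolatileFoo foo = new VolatileFoo();
        Set<Resource> seen = ConcurrentHashMap.newKeySet();
        CountDownLatch start = new CountDownLatch(1);
        Thread[] workers = new Thread[threads];
        for (int i = 0; i < threads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    start.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return;
                }
                seen.add(foo.getHelper());
            });
            workers[i].start();
        }
        start.countDown();
        try {
            for (Thread t : workers) {
                t.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen.size();
    }
}
```

The local variable does not change the correctness argument; it only avoids re-reading the volatile field when the helper already exists.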

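Double-checked locking for immutable objects

If Helper is an immutable object, such that all of its fields are final, then double-checked locking works without volatile. The idea is that a reference to an immutable object (such as a String or an Integer) behaves in much the same way as an int or float; reads and writes of references to immutable objects are atomic.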
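
As a rough sketch of this case, assuming the JDK 5 and later final-field semantics and a constructor that does not let this escape, the holder can keep a plain (non-volatile) field as long as it reads that field only once on the fast path. ImmutableHelper, FinalFieldFoo, and the values 42 and "helper" are hypothetical and chosen only for illustration.

```java
final class ImmutableHelper {
    private final int id;        // every field is final
    private final String name;

    ImmutableHelper(int id, String name) {
        this.id = id;
        this.name = name;
    }

    int id() { return id; }
    String name() { return name; }
}

class FinalFieldFoo {
    private ImmutableHelper helper;   // deliberately not volatile

    ImmutableHelper getHelper() {
        ImmutableHelper h = helper;   // read the field exactly once without the lock
        if (h == null) {
            synchronized (this) {
                h = helper;
                if (h == null) {
                    h = new ImmutableHelper(42, "helper");
                    helper = h;
                }
            }
        }
        return h;                     // return the local, never re-read the field
    }
}
```

Returning the local rather than re-reading the field matters here: without volatile, a second unsynchronized read of the field is not guaranteed to see the same value as the first.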

A First Look at Design Patterns (2), Side Note: On the "Double-Checked Locking is Broken" Declaration (translation)