JVM Internals You May Not Have Seen Before: Class Loading

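The OOP-Klass model in the JVM

HotSpot uses the OOP-Klass model to represent Java objects.

An OOP, or OOPs (ordinary object pointer), represents an object's instance data and lives on the heap.

A Klass describes the concrete type of an instance. It is the VM-side counterpart of a Java class at the language level, and it is stored in Metaspace (JDK 8 and later).

Open the OpenJDK source. I am working from a JDK 12 build; a separate tutorial on building the JDK from source on macOS will follow. If you have not built it yourself, that is fine and does not affect what follows.

Open the Klass class.

Here we can see that Klass extends Metadata. Jumping to Metadata, we see that it in turn extends MetaspaceObj (objects allocated in Metaspace).

Klass itself has several subclasses: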

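  • InstanceKlass
    • InstanceMirrorKlass: describes instances of java.lang.Class
    • InstanceRefKlass: describes subclasses of java.lang.ref.Reference
    • InstanceClassLoaderKlass: used to walk the classes loaded by a given class loader
  • ArrayKlass
    • TypeArrayKlass: the data structure that describes arrays of primitive types
    • ObjArrayKlass: the data structure that describes arrays of reference types

The sections below go through them one by one.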

InstanceKlass

This is the structure the VM builds when a class loader loads a class file into memory.

// An InstanceKlass is the VM level representation of a Java class.
// It contains all information needed for at class at execution runtime.

//  InstanceKlass embedded field layout (after declared fields):
//    [EMBEDDED Java vtable             ] size in words = vtable_len
//    [EMBEDDED nonstatic oop-map blocks] size in words = nonstatic_oop_map_size
//      The embedded nonstatic oop-map blocks are short pairs (offset, length)
//      indicating where oops are located in instances of this klass.
//    [EMBEDDED implementor of the interface] only exist for interface
//    [EMBEDDED unsafe_anonymous_host klass] only exist for an unsafe anonymous class (JSR 292 enabled)
//    [EMBEDDED fingerprint       ] only if should_store_fingerprint()==true


// forward declaration for class -- see below for definition
struct JvmtiCachedClassFileData;

class InstanceKlass: public Klass {
  friend class VMStructs;
  friend class JVMCIVMStructs;
  friend class ClassFileParser;
  friend class CompileReplay;

 public:
  static const KlassID ID = InstanceKlassID;

 protected:
  InstanceKlass(const ClassFileParser& parser, unsigned kind, KlassID id = ID);

 public:
  InstanceKlass() { assert(DumpSharedSpaces || UseSharedSpaces, "only for CDS"); }

  // See "The Java Virtual Machine Specification" section 2.16.2-5 for a detailed description
  // of the class loading & initialization procedure, and the use of the states.
  enum ClassState {
    allocated,                          // allocated (but not yet linked)
    loaded,                             // loaded and inserted in class hierarchy (but not linked yet)
    linked,                             // successfully linked/verified (but not initialized yet)
    being_initialized,                  // currently running class initializer
    fully_initialized,                  // initialized (successfull final state)
    initialization_error                // error happened during initialization
  };

 private:
  static InstanceKlass* allocate_instance_klass(const ClassFileParser& parser, TRAPS);
...

Hands-on

This part uses a tool called HSDB (the HotSpot Debugger).

public class Test_1 {

    public static void main(String[] args) {
        while (true);
    }
}

class Test_1_A{
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
}

class Test_1_B{
    public static String str = "B str";

    static {
        System.out.println("B Static Block");
    }
}

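Run this code, find the process id with jps, and enter it in HSDB. Click Tools and choose the first item, Class Browser. It lists every class known to the JVM. Find this class; the 0x00000007c0060828 shown after its name is a memory address.

Click Tools, then Inspector, and enter the address you copied. You can see that the Java class Test_1 above corresponds to an InstanceKlass.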

InstanceMirrorKlass

Put plainly, InstanceMirrorKlass describes the Class object (the "mirror"), which lives on the heap.

ArrayKlass

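Like InstanceKlass, this stores metadata, but for array classes.

Static and dynamic data types

  • Dynamic data types are generated dynamically at run time

  • Static data types are the eight primitive types built into the JVM

In Java, arrays are dynamic data types.

Proof:

We again use jclasslib, the IDEA plugin from the earlier articles.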

public class Test_1 {

    public static void main(String[] args) {
        int[] arr = new int[1];
        //while (true);
    }
}

class Test_1_A{
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
}

class Test_1_B{
    public static String str = "B str";

    static {
        System.out.println("B Static Block");
    }
}

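Compile this code and look at the generated bytecode. Step one: iconst_1 pushes the int constant 1 onto the operand stack.

Step two: newarray 10 (10 is the type code for int). This instruction did not appear in the previous article, so let's see what it does: it creates an array of the given primitive type (int, float, char, ...) and pushes the reference to it onto the top of the stack.

Arrays of primitive types are represented in the JVM by TypeArrayKlass.

What about an array of a reference type?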

public class Test_2 {

    public static void main(String[] args) {
        int[] arr = new int[1];
        //while (true);
        Test_2[] test_2 = new Test_2[1];
    }
}

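As you can see, the reference-type array uses anewarray. It creates an array of a reference type (class, interface, or array) and pushes the reference to it onto the top of the stack.

Arrays of reference types are represented in the JVM by ObjArrayKlass.

Run the code above (with the while (true) loop enabled so that the process stays alive) and find its process id with jps. In HSDB, select the main thread and click the button to the right of the magnifying glass to view the thread's stack.

Here we can see 0x000000076ada7858. Click Tools, then Inspector, enter 0x000000076ada7858 and press Enter. It shows a TypeArrayKlass, which confirms that arrays of primitive types are represented in the JVM by TypeArrayKlass.

Enter 0x000000076ada78d8 and press Enter. It shows an ObjArrayKlass, which confirms that arrays of reference types are represented in the JVM by ObjArrayKlass.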
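The same point can be cross-checked from plain Java, without HSDB. The sketch below is only an illustration (the class name ArrayKlassDemo is made up for this example): array classes have no class file of their own, their names use descriptor syntax, and the VM creates them on demand. The comments about TypeArrayKlass and ObjArrayKlass restate the HSDB findings above; the Java API itself does not expose them.

```java
class ArrayKlassDemo {

    // Describe an array class using only java.lang.Class APIs.
    static String describe(Class<?> c) {
        return c.getName()
                + " isArray=" + c.isArray()
                + " component=" + c.getComponentType().getName()
                + " super=" + c.getSuperclass().getName();
    }

    public static void main(String[] args) {
        int[] primitives = new int[1];                    // newarray:  TypeArrayKlass in HotSpot
        ArrayKlassDemo[] refs = new ArrayKlassDemo[1];    // anewarray: ObjArrayKlass in HotSpot

        System.out.println(describe(primitives.getClass()));
        System.out.println(describe(refs.getClass()));
    }
}
```

For int[] the name is the descriptor [I, and for a reference array it has the form [L...; with the element class name inside.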

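The class loading process

Loading

  1. Obtain the class file for the class from its fully qualified name (nothing mandates where it must come from)
  2. Parse it into run-time data, i.e. an InstanceKlass instance, stored in the method area
  3. Create the Class object for the class on the heap, i.e. an instance described by InstanceMirrorKlass

When a class is loaded

Strictly speaking, the items below are the triggers for initialization, which in turn forces the class to be loaded first.

  1. new, getstatic, putstatic, invokestatic
  2. Reflection
  3. Initializing a subclass actively loads its superclass first
  4. The startup class (the class that contains main)
  5. With the dynamic language support added in JDK 1.7: if a java.lang.invoke.MethodHandle instance resolves to a method handle of kind REF_getStatic, REF_putStatic, or REF_invokeStatic, and the class that handle refers to has not been initialized yet, its initialization is triggered first

Preloaded classes: the wrapper classes, String, Thread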

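Where classes are loaded from

  1. From archives, e.g. JAR and WAR files
  2. From the network, e.g. Web Applets
  3. Generated at run time, e.g. dynamic proxies, CGLIB
  4. Generated from other files, e.g. JSP
  5. Read from a database
  6. Read from encrypted files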
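As an illustration of item 3, the following minimal sketch uses the JDK's java.lang.reflect.Proxy to obtain a class that exists only at run time and has no class file on disk. The class name GeneratedClassDemo and the helper newProxy are invented for this example; the name the JDK gives the generated class varies between versions, so the sketch prints it rather than assuming it.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.Proxy;

class GeneratedClassDemo {

    // Build a Runnable whose implementing class is generated by the JDK at run time.
    static Runnable newProxy(StringBuilder log) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            log.append(method.getName());   // record which interface method was called
            return null;                    // Runnable.run() returns void
        };
        return (Runnable) Proxy.newProxyInstance(
                GeneratedClassDemo.class.getClassLoader(),
                new Class<?>[] { Runnable.class },
                handler);
    }

    public static void main(String[] args) {
        StringBuilder log = new StringBuilder();
        Runnable r = newProxy(log);
        r.run();
        System.out.println("generated class: " + r.getClass().getName());
        System.out.println("is proxy class:  " + Proxy.isProxyClass(r.getClass()));
        System.out.println("calls seen:      " + log);
    }
}
```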

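Verification

  1. File format verification
  2. Metadata verification
  3. Bytecode verification
  4. Symbolic reference verification

Preparation

Allocate memory for static variables and give them their initial (zero) values.

Instance variables are assigned when an object is created, so they have no initial-value step in this phase. If a static field is also final and has a compile-time constant initializer, the compiler attaches a ConstantValue attribute to the field, and the preparation phase assigns that value directly, skipping the zero-value step.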

public class Test_3 {

    public static final int a = 10;
    public static int b = 10;

    public static void main(String[] args) {
        int[] arr = new int[1];
        
        Test_3[] test_2 = new Test_3[1];
       
    }
}

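After running the code above, look at the fields again. The final field a has a ConstantValue attribute that the non-final field b lacks, and its constant value index points to 10, so a is assigned 10 directly.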
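The zero value from the preparation phase can also be observed from Java code. The sketch below is a made-up illustration (PreparationDemo, Holder, and peek are not from the original): a static initializer calls a method that reads a field whose own initializer has not run yet. Note that the read of CONST only shows that javac inlines compile-time constants at the use site; it does not by itself show when the VM applies the ConstantValue attribute.

```java
class PreparationDemo {

    static class Holder {
        // Runs first in <clinit>; at this point 'late' still holds the
        // zero value it was given in the preparation phase.
        static int early = peek();
        static int late = 10;             // assigned in <clinit>, after 'early'
        static final int CONST = 10;      // compile-time constant (ConstantValue attribute)
        static int constSeenEarly;        // no initializer, keeps whatever peek() stored

        static int peek() {
            constSeenEarly = CONST;       // javac inlines the constant 10 here
            return late;                  // reads the field before its initializer has run
        }
    }

    public static void main(String[] args) {
        System.out.println("early = " + Holder.early);                    // 0
        System.out.println("late = " + Holder.late);                      // 10
        System.out.println("constSeenEarly = " + Holder.constSeenEarly);  // 10
    }
}
```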

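Resolution

Convert the symbolic references in the constant pool (references to entries in the run-time constant pool) into direct references (memory addresses).

The resolved information is stored in ConstantPoolCache instances.

  1. Class or interface resolution
  2. Field resolution
  3. Method resolution
  4. Interface method resolution

When resolution happens

Two possible approaches:

  1. Resolve the constant pool during the loading phase
  2. Resolve on first use

Go to the directory that contains the .class file and run javap -v xxx.class in a terminal.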

/Test_3.class
  Last modified Dec 23, 2020; size 617 bytes
  MD5 checksum eb92dd7cf9716f003d1f643143892fd3
  Compiled from "Test_3.java"
public class com.zzz.Test_3
  minor version: 0
  major version: 52
  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
  this_class: #2                          // com/zzz/Test_3
  super_class: #4                         // java/lang/Object
  interfaces: 0, fields: 2, methods: 3, attributes: 1
Constant pool:
   #1 = Methodref          #4.#31         // java/lang/Object."<init>":()V
   #2 = Class              #32            // com/zzz/Test_3
   #3 = Fieldref           #2.#33         // com/zzz/Test_3.b:I
   #4 = Class              #34            // java/lang/Object
   #5 = Utf8               a
   #6 = Utf8               I
   #7 = Utf8               ConstantValue
   #8 = Integer            10
   #9 = Utf8               b
  #10 = Utf8               <init>
  #11 = Utf8               ()V
  #12 = Utf8               Code
  #13 = Utf8               LineNumberTable
  #14 = Utf8               LocalVariableTable
  #15 = Utf8               this
  #16 = Utf8               Lcom/zzz/Test_3;
  #17 = Utf8               main
  #18 = Utf8               ([Ljava/lang/String;)V
  #19 = Utf8               args
  #20 = Utf8               [Ljava/lang/String;
  #21 = Utf8               arr
  #22 = Utf8               [I
  #23 = Utf8               test_2
  #24 = Utf8               [Lcom/zzz/Test_3;
  #25 = Utf8               StackMapTable
  #26 = Class              #22            // "[I"
  #27 = Class              #24            // "[Lcom/zzz/Test_3;"
  #28 = Utf8               <clinit>
  #29 = Utf8               SourceFile
  #30 = Utf8               Test_3.java
  #31 = NameAndType        #10:#11        // "<init>":()V
  #32 = Utf8               com/zzz/Test_3
  #33 = NameAndType        #9:#6          // b:I
  #34 = Utf8               java/lang/Object
{
  public static final int a;
    descriptor: I
    flags: (0x0019) ACC_PUBLIC, ACC_STATIC, ACC_FINAL
    ConstantValue: int 10

  public static int b;
    descriptor: I
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC

  public com.zzz.Test_3();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/zzz/Test_3;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=1, locals=3, args_size=1
         0: iconst_1
         1: newarray       int
         3: astore_1
         4: iconst_1
         5: anewarray     #2                  // class com/zzz/Test_3
         8: astore_2
         9: goto          9
      LineNumberTable:
        line 9: 0
        line 11: 4
        line 12: 9
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0      12     0  args   [Ljava/lang/String;
            4       8     1   arr   [I
            9       3     2 test_2   [Lcom/zzz/Test_3;
      StackMapTable: number_of_entries = 1
        frame_type = 253 /* append */
          offset_delta = 9
          locals = [ class "[I", class "[Lcom/zzz/Test_3;" ]

  static {};
    descriptor: ()V
    flags: (0x0008) ACC_STATIC
    Code:
      stack=1, locals=0, args_size=0
         0: bipush        10
         2: putstatic     #3                  // Field b:I
         5: return
      LineNumberTable:
        line 6: 0
}
SourceFile: "Test_3.java"

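This is the static constant pool of Test_3.class. Run the class, find the process with jps, attach HSDB to it, click Tools, then Class Browser, then public class com.zzz.Test_3 @0x00000007c0060828. Click the link under Constant Pool to view the run-time constant pool. At index 3 the class already points to a memory address, whereas in the static constant pool above the class entry still points to #32. This shows that the resolution phase has turned the symbolic reference into a direct reference.

Initialization

Run the static initializer blocks and complete the assignment of static variables.

Static field initializers and static blocks are compiled into a <clinit> method at the bytecode level. The statements in that method appear in the same order as they were written in the source.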

public class Test_3 {

    public static int a = 10;
    public static int b = 10;

    public static void main(String[] args) {
        int[] arr = new int[1];
        
        Test_3[] test_2 = new Test_3[1];
        
    }
}

Run the code above. jclasslib shows that the execution order matches the order in which the code was written.

Hands-on: static initialization order

public class Test_21 {

    public static void main(String[] args) {
        Test_21_A obj = Test_21_A.getInstance();

        System.out.println(obj.val1);
        System.out.println(obj.val2);
    }
}

class Test_21_A{
    public static int val1;

    public static int val2 = 1;

    public static Test_21_A instance = new Test_21_A();

    Test_21_A(){
        val1++;
        val2++;
    }

    public static Test_21_A getInstance(){
        return instance;
    }
}

The code above prints 1 and 2: val1 starts from its initial value of 0 and val2 from 1, and the constructor then increments both.

public class Test_21 {

    public static void main(String[] args) {
        Test_21_A obj = Test_21_A.getInstance();

        System.out.println(obj.val1);
        System.out.println(obj.val2);
    }
}

class Test_21_A{
    public static int val1;

    public static Test_21_A instance = new Test_21_A();

    Test_21_A(){
        val1++;
        val2++;
    }
    public static int val2 = 1;

    public static Test_21_A getInstance(){
        return instance;
    }
}

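What do you think the code above prints? The result is 1 and 1. The initializer of val2 now comes after the instance is created, so although val2++ did run, the later assignment val2 = 1 overwrites it.

The JVM loads classes lazily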

public class Test_2 {

    public static void main(String[] args) {
        System.out.println(Test_2_B.str);
    }
}

class Test_2_A{
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
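}

// Test_2_B declares no str of its own; it inherits the field from Test_2_A
class Test_2_B extends Test_2_A{
    static {
        System.out.println("B Static Block");
    }
}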

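Look at this code and guess the output. Only A Static Block and A str are printed. Why is B's static block not executed? As stated above, the JVM loads classes lazily. str is declared in Test_2_A, so reading Test_2_B.str is not an active use of Test_2_B, and Test_2_B's static block does not run.

Now let's change it: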

public class Test_2 {


    public static void main(String[] args) {
        System.out.println(Test_2_B.str);
    }
}

class Test_2_A{


    static {
        System.out.println("A Static Block");
    }
}

class Test_2_B extends Test_2_A{
    //public static String str = "B str";
    public static String str = "A str";

    static {
        System.out.println("B Static Block");
    }
}

With this change the result is clear: B Static Block is printed as well.

public class Test_2 {


    public static void main(String[] args) {
        Test_2_C arrs[] = new Test_2_C[1];
    }
}

class Test_2_A{


    static {
        System.out.println("A Static Block");
    }
}

class Test_2_B extends Test_2_A{
    //public static String str = "B str";
    public static String str = "A str";

    static {
        System.out.println("B Static Block");
    }
}

class Test_2_C{
    static {
        System.out.println("C Static Block");
    }
}

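Result: no output, because declaring an array of a type only creates the array type and does not initialize the element class.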

public class Test_2 {

    public static void main(String[] args) {
        System.out.println(Test_2_D.str);
    }
}

class Test_2_A{


    static {
        System.out.println("A Static Block");
    }
}

class Test_2_B extends Test_2_A{
    //public static String str = "B str";
    public static String str = "A str";

    static {
        System.out.println("B Static Block");
    }
}

class Test_2_C{
    static {
        System.out.println("C Static Block");
    }
}

class Test_2_D{
    public static final String str = "A str";

    static {
        System.out.println("D Static Block");
    }
}

Result: only A str is printed. What happened? Let's look at the class with javap -v.

public class com.zzz.Test_2
  minor version: 0
  major version: 52
  flags: (0x0021) ACC_PUBLIC, ACC_SUPER
  this_class: #6                          // com/zzz/Test_2
  super_class: #7                         // java/lang/Object
  interfaces: 0, fields: 0, methods: 2, attributes: 1
Constant pool:
   #1 = Methodref          #7.#21         // java/lang/Object."<init>":()V
   #2 = Fieldref           #22.#23        // java/lang/System.out:Ljava/io/PrintStream;
   #3 = Class              #24            // com/zzz/Test_2_D
   #4 = String             #25            // A str
   #5 = Methodref          #26.#27        // java/io/PrintStream.println:(Ljava/lang/String;)V
   #6 = Class              #28            // com/zzz/Test_2
   #7 = Class              #29            // java/lang/Object
   #8 = Utf8               <init>
   #9 = Utf8               ()V
  #10 = Utf8               Code
  #11 = Utf8               LineNumberTable
  #12 = Utf8               LocalVariableTable
  #13 = Utf8               this
  #14 = Utf8               Lcom/zzz/Test_2;
  #15 = Utf8               main
  #16 = Utf8               ([Ljava/lang/String;)V
  #17 = Utf8               args
  #18 = Utf8               [Ljava/lang/String;
  #19 = Utf8               SourceFile
  #20 = Utf8               Test_2.java
  #21 = NameAndType        #8:#9          // "<init>":()V
  #22 = Class              #30            // java/lang/System
  #23 = NameAndType        #31:#32        // out:Ljava/io/PrintStream;
  #24 = Utf8               com/zzz/Test_2_D
  #25 = Utf8               A str
  #26 = Class              #33            // java/io/PrintStream
  #27 = NameAndType        #34:#35        // println:(Ljava/lang/String;)V
  #28 = Utf8               com/zzz/Test_2
  #29 = Utf8               java/lang/Object
  #30 = Utf8               java/lang/System
  #31 = Utf8               out
  #32 = Utf8               Ljava/io/PrintStream;
  #33 = Utf8               java/io/PrintStream
  #34 = Utf8               println
  #35 = Utf8               (Ljava/lang/String;)V
{
  public com.zzz.Test_2();
    descriptor: ()V
    flags: (0x0001) ACC_PUBLIC
    Code:
      stack=1, locals=1, args_size=1
         0: aload_0
         1: invokespecial #1                  // Method java/lang/Object."<init>":()V
         4: return
      LineNumberTable:
        line 3: 0
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       5     0  this   Lcom/zzz/Test_2;

  public static void main(java.lang.String[]);
    descriptor: ([Ljava/lang/String;)V
    flags: (0x0009) ACC_PUBLIC, ACC_STATIC
    Code:
      stack=2, locals=1, args_size=1
         0: getstatic     #2                  // Field java/lang/System.out:Ljava/io/PrintStream;
         3: ldc           #4                  // String A str
         5: invokevirtual #5                  // Method java/io/PrintStream.println:(Ljava/lang/String;)V
         8: return
      LineNumberTable:
        line 6: 0
        line 7: 8
      LocalVariableTable:
        Start  Length  Slot  Name   Signature
            0       9     0  args   [Ljava/lang/String;
}
SourceFile: "Test_2.java"

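#4 = String #25 // A str shows that the instruction refers to a constant value. In other words, the constant str was written into Test_2's own constant pool at compile time, so Test_2_D is never touched at run time.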

import java.util.UUID;

public class Test_2 {

    public static void main(String[] args) {
        System.out.println(Test_2_E.uuid);
    }
}


class Test_2_E{
    public static final String uuid = UUID.randomUUID().toString();

    static {
        System.out.println("E Static Block");
    }
}

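Result: E Static Block is printed, followed by the generated UUID.

The difference from the previous str example is that str there is a compile-time constant. uuid is also final, but its value has to be computed at run time, so reading it is an active use of Test_2_E.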
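The cases above can be collected into one program that records which static initializers actually run. This is a minimal sketch with invented class names (PassiveUseDemo and the Pu* classes), not code from the original article. Each class logs its own name from its static block, so the log shows exactly which uses counted as active.

```java
import java.util.ArrayList;
import java.util.List;

class PassiveUseDemo {
    // Each static initializer below appends its class name here when it runs.
    static final List<String> LOG = new ArrayList<>();

    static List<String> run() {
        String viaChild = PuChild.str;       // field is declared in PuParent: only PuParent initializes
        PuElem[] array = new PuElem[1];      // creates the array type only, PuElem is not initialized
        String constant = PuConst.S;         // compile-time constant, inlined by javac
        String computed = PuComputed.S;      // computed at run time: active use of PuComputed
        return LOG;
    }

    public static void main(String[] args) {
        System.out.println(run());           // [PuParent, PuComputed]
    }
}

class PuParent {
    static String str = "parent str";
    static { PassiveUseDemo.LOG.add("PuParent"); }
}

class PuChild extends PuParent {
    static { PassiveUseDemo.LOG.add("PuChild"); }
}

class PuElem {
    static { PassiveUseDemo.LOG.add("PuElem"); }
}

class PuConst {
    static final String S = "constant";
    static { PassiveUseDemo.LOG.add("PuConst"); }
}

class PuComputed {
    static final String S = String.valueOf(System.nanoTime());
    static { PassiveUseDemo.LOG.add("PuComputed"); }
}
```

Because a class is initialized at most once per class loader, calling run() again leaves the log unchanged.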

public class Test_2 {

    static {
        System.out.println("Static Block");
    }

    public static void main(String[] args) throws ClassNotFoundException {
        Class<?> clazz = Class.forName("com.zzz.Test_2_A");
        
    }
}
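The snippet above loads a class reflectively. As a hedged companion sketch (ForNameDemo and ForNameTarget are names invented here), the program below contrasts the two standard overloads: Class.forName(name) initializes the class, while Class.forName(name, false, loader) only loads it. The result is cached so that repeated calls report what the first run observed.

```java
class ForNameDemo {
    static boolean targetInitialized;    // set by ForNameTarget's static block
    private static boolean[] observed;   // cached so repeated calls report the first run

    // Returns { initialized after the load-only call, initialized after forName(name) }.
    static synchronized boolean[] run() {
        if (observed == null) {
            try {
                // A class literal does not initialize the class, so this is a safe way to get its name.
                String name = ForNameTarget.class.getName();
                ClassLoader loader = ForNameDemo.class.getClassLoader();

                Class.forName(name, false, loader);   // initialize = false: load only
                boolean afterLoadOnly = targetInitialized;

                Class.forName(name);                  // one-argument form also initializes
                boolean afterForName = targetInitialized;

                observed = new boolean[] { afterLoadOnly, afterForName };
            } catch (ClassNotFoundException e) {
                throw new IllegalStateException(e);
            }
        }
        return observed.clone();
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println("after forName(name, false, loader): " + r[0]);   // false
        System.out.println("after forName(name):                " + r[1]);   // true
    }
}

class ForNameTarget {
    static {
        ForNameDemo.targetInitialized = true;
    }
}
```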

How reading a static variable is implemented

public class Test_3 {

    public static void main(String[] args) {
        System.out.println(Test_3_B.str);
        while (true);
    }
}

class Test_3_A{
    public static String str = "A str";

    static {
        System.out.println("A Static Block");
    }
}

class Test_3_B extends Test_3_A{
    static {
        System.out.println("B Static Block");
    }
}

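Run this code, find the process with jps, attach HSDB, and locate Test_3_A. Here we can see that the pointer for str is stored in the InstanceMirrorKlass instance (the Class object).

The value of the static variable str is stored in the StringTable (the string constant pool discussed earlier).

Check whether Test_3_B has str: it does not.

Possible implementations:

  1. Look in Test_3_B's mirror first. If the field is there, return it; if not, pass the request up the inheritance chain. Clearly the cost of this algorithm grows with the depth of the inheritance hierarchy, and its complexity is O(n)

  2. Use an additional data structure that stores entries in key-value form, so that a lookup costs O(1)

HotSpot uses the second approach. The additional data structure is ConstantPoolCache; the constant pool class ConstantPool has a field _cache that points to it. Each record in it is a ConstantPoolCacheEntry.

ConstantPoolCache mainly stores the resolved constant pool entries that certain bytecode instructions need, for example the entries used by [get|put]static, [get|put]field, and invoke[static|special|virtual|interface|dynamic].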
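The two strategies can be compared with a small conceptual model. It is written with Java reflection and a HashMap, which is an assumption made purely for illustration: HotSpot's ConstantPoolCache is a C++ structure and is not implemented this way. FieldLookupSketch, walk, and cached are hypothetical names.

```java
import java.lang.reflect.Field;
import java.util.HashMap;
import java.util.Map;

class FieldLookupSketch {
    static class Base { static String str = "A str"; }
    static class Derived extends Base { }

    static int classesVisited = 0;   // counts how many classes the walk had to inspect

    // Strategy 1: walk up the superclass chain until some class declares the field, O(depth).
    static Field walk(Class<?> start, String name) {
        for (Class<?> k = start; k != null; k = k.getSuperclass()) {
            classesVisited++;
            try {
                return k.getDeclaredField(name);
            } catch (NoSuchFieldException e) {
                // not declared here, keep climbing
            }
        }
        return null;
    }

    // Strategy 2: remember the outcome of the walk, keyed by a constant pool index.
    static final Map<Integer, Field> CACHE = new HashMap<>();

    static Field cached(int cpIndex, Class<?> start, String name) {
        return CACHE.computeIfAbsent(cpIndex, i -> walk(start, name));
    }

    public static void main(String[] args) {
        Field first = cached(3, Derived.class, "str");    // performs the walk once
        Field second = cached(3, Derived.class, "str");   // answered from the cache
        System.out.println("declared in: " + first.getDeclaringClass().getSimpleName());
        System.out.println("same entry:  " + (first == second));
        System.out.println("classes visited in total: " + classesVisited);
    }
}
```

The walk is paid only on the first access; every later access with the same index is a single map lookup.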

ConstantPoolCacheEntry

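The constant pool cache is a run-time data structure set aside for a constant pool. It holds the interpreter's run-time information for all field access and invoke bytecodes. The cache is created and initialized before the class is actively used, and each cache entry is filled in when it is resolved.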

ConstantPoolCacheEntry* base() const           { 
  return (ConstantPoolCacheEntry*)((address)this + in_bytes(base_offset()));
}

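This expression means: the address of the ConstantPoolCache object plus the size of the ConstantPoolCache object itself. In other words, the entries start right after the object's own fields.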
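The address arithmetic can be written out with plain numbers. The sizes below are hypothetical placeholders chosen for the illustration, not the real sizes of the HotSpot structures, and the assumption that entries sit back to back after the header is stated here as a simplification.

```java
class CacheLayoutSketch {
    // Hypothetical sizes, chosen only for this illustration.
    static final int HEADER_BYTES = 16;   // stands in for the size of ConstantPoolCache
    static final int ENTRY_BYTES = 32;    // stands in for the size of ConstantPoolCacheEntry

    // Address of the first entry: start of the object plus the header size.
    static long base(long cacheAddress) {
        return cacheAddress + HEADER_BYTES;
    }

    // Address of entry i, assuming entries are laid out back to back after the header.
    static long entryAt(long cacheAddress, int i) {
        return base(cacheAddress) + (long) i * ENTRY_BYTES;
    }

    public static void main(String[] args) {
        long cache = 0x1000L;   // made-up object address
        System.out.println("base    = 0x" + Long.toHexString(base(cache)));         // 0x1010
        System.out.println("entry 3 = 0x" + Long.toHexString(entryAt(cache, 3)));   // 0x1070
    }
}
```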

How the read happens

\openjdk\hotspot\src\share\vm\interpreter\bytecodeInterpreter.cpp

CASE(_getstatic):
        {
          u2 index;
          ConstantPoolCacheEntry* cache;
          index = Bytes::get_native_u2(pc+1);

          // QQQ Need to make this as inlined as possible. Probably need to
          // split all the bytecode cases out so c++ compiler has a chance
          // for constant prop to fold everything possible away.

          cache = cp->entry_at(index);
          if (!cache->is_resolved((Bytecodes::Code)opcode)) {
            CALL_VM(InterpreterRuntime::resolve_get_put(THREAD, (Bytecodes::Code)opcode),
                    handle_exception);
            cache = cp->entry_at(index);
          }
……

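The code shows that the interpreter fetches the ConstantPoolCacheEntry directly, and calls into the VM to resolve it only when the entry has not been resolved yet.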
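The control flow of that excerpt (fetch the entry, take the slow path only if it is unresolved, then fetch again) can be modelled in a few lines. This is a simplified sketch with invented names (LazyResolveSketch, Entry, resolve); the real resolution done by InterpreterRuntime::resolve_get_put is replaced here by a placeholder value.

```java
class LazyResolveSketch {

    static final class Entry {
        boolean resolved;   // false until the slow path has filled the entry
        Object value;       // placeholder for the resolved field information
    }

    final Entry[] entries;
    int slowPathCalls = 0;  // how often the "VM call" was needed

    LazyResolveSketch(int size) {
        entries = new Entry[size];
        for (int i = 0; i < size; i++) {
            entries[i] = new Entry();
        }
    }

    // Stands in for the slow path taken through CALL_VM in the excerpt above.
    private void resolve(int index) {
        slowPathCalls++;
        entries[index].value = "resolved#" + index;
        entries[index].resolved = true;
    }

    // Mirrors the shape of the _getstatic case: check, resolve if needed, re-read.
    Object getstatic(int index) {
        Entry cache = entries[index];
        if (!cache.resolved) {
            resolve(index);
            cache = entries[index];
        }
        return cache.value;
    }

    public static void main(String[] args) {
        LazyResolveSketch interpreter = new LazyResolveSketch(4);
        System.out.println(interpreter.getstatic(3));
        System.out.println(interpreter.getstatic(3));
        System.out.println("slow path calls: " + interpreter.slowPathCalls);   // 1
    }
}
```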