JVM Class Parsing: Parsing the Static Constant Pool (Part 1)


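In Java, javac first compiles a .java file into a .class bytecode file, and the JVM then parses that bytecode. The JVM specification (see the reference link) describes the parts that make up a class file, shown in the figure below.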

[Figure: structure of a class file]

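A class file mainly consists of the following parts: the constant pool, the superclass, interfaces, fields, and methods. This article focuses on how the JVM parses the static constant pool of a class file.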

The JVM parses the constant pool in the parse_constant_pool_entries method in share/classfile/classFileParser.cpp.

Body of the parse_constant_pool_entries method

void ClassFileParser::parse_constant_pool_entries(const ClassFileStream* const stream,
                                                  ConstantPool* cp,
                                                  const int length,
                                                  TRAPS) {
  assert(stream != NULL, "invariant");
  assert(cp != NULL, "invariant");

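  // Work on a local copy of the stream; its position is copied back at the end.
  const ClassFileStream cfs1 = *stream;
  const ClassFileStream* const cfs = &cfs1;

  // Remember the starting position for the assert at the end of the method.
  debug_only(const u1* const old_current = stream->current();)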

  // Used for batching symbol allocations.
  const char* names[SymbolTable::symbol_alloc_batch_size];
  int lengths[SymbolTable::symbol_alloc_batch_size];
  int indices[SymbolTable::symbol_alloc_batch_size];
  unsigned int hashValues[SymbolTable::symbol_alloc_batch_size];
  int names_count = 0;

  // parsing  Index 0 is unused
  for (int index = 1; index < length; index++) {
    // Each of the following case guarantees one more byte in the stream
    // for the following tag or the access_flags following constant pool,
    // so we don't need bounds-check for reading tag.
    const u1 tag = cfs->get_u1_fast();
    switch (tag) {
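      // Parsing of the individual constant pool entries is omitted here
      // and analyzed one by one below.
    } // end of switch
  } // end of for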

  // Allocate the remaining symbols
  if (names_count > 0) {
    SymbolTable::new_symbols(_loader_data,
                             constantPoolHandle(THREAD, cp),
                             names_count,
                             names,
                             lengths,
                             indices,
                             hashValues);
  }
  // Copy _current pointer of local copy back to stream.
  assert(stream->current() == old_current, "non-exclusive use of stream");
  stream->set_current(cfs1.current());
}

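Note that constant_pool_count in the class file is the length parameter of parse_constant_pool_entries. Index 0 of the constant pool is not used, so the loop starts at index 1 and parses each entry of the static constant pool up to index length - 1.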

Parsing constant pool entries and storing them in ConstantPool

First, here is the enum of constant pool tags defined by the JVM

enum {
    JVM_CONSTANT_Utf8                   = 1,
    JVM_CONSTANT_Unicode                = 2, /* unused */
    JVM_CONSTANT_Integer                = 3,
    JVM_CONSTANT_Float                  = 4,
    JVM_CONSTANT_Long                   = 5,
    JVM_CONSTANT_Double                 = 6,
    JVM_CONSTANT_Class                  = 7,
    JVM_CONSTANT_String                 = 8,
    JVM_CONSTANT_Fieldref               = 9,
    JVM_CONSTANT_Methodref              = 10,
    JVM_CONSTANT_InterfaceMethodref     = 11,
    JVM_CONSTANT_NameAndType            = 12,
    JVM_CONSTANT_MethodHandle           = 15,  // JSR 292
    JVM_CONSTANT_MethodType             = 16,  // JSR 292
    JVM_CONSTANT_Dynamic                = 17,
    JVM_CONSTANT_InvokeDynamic          = 18,
};
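To make the loop structure concrete, here is a minimal standalone sketch (not HotSpot code) that walks the raw bytes of a constant pool using the tag values above. The function name walk_constant_pool and its parameters are hypothetical; the payload sizes follow the class file format in the JVM specification. It starts at index 1 and, as the specification requires, lets Long and Double entries occupy two indices.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Hypothetical helper: returns the tag found at each constant pool index
// (0 marks an unused index). `bytes` begins right after constant_pool_count,
// and `count` is the value of constant_pool_count.
std::vector<std::uint8_t> walk_constant_pool(const std::vector<std::uint8_t>& bytes,
                                             int count) {
  std::vector<std::uint8_t> tags(count > 0 ? static_cast<std::size_t>(count) : 0, 0);
  std::size_t pos = 0;
  // Bounds check: make sure at least n more bytes are available.
  auto need = [&](std::size_t n) {
    if (bytes.size() - pos < n) throw std::runtime_error("truncated constant pool");
  };
  for (int index = 1; index < count; index++) {  // index 0 is unused
    need(1);
    const std::uint8_t tag = bytes[pos++];
    std::size_t payload = 0;
    switch (tag) {
      case 1:  // Utf8: u2 length followed by that many bytes
        need(2);
        payload = 2 + ((static_cast<std::size_t>(bytes[pos]) << 8) | bytes[pos + 1]);
        break;
      case 3: case 4:                       // Integer, Float
        payload = 4; break;
      case 5: case 6:                       // Long, Double
        payload = 8; break;
      case 7: case 8: case 16:              // Class, String, MethodType
        payload = 2; break;
      case 9: case 10: case 11: case 12:    // Fieldref, Methodref,
      case 17: case 18:                     // InterfaceMethodref, NameAndType,
        payload = 4; break;                 // Dynamic, InvokeDynamic
      case 15:                              // MethodHandle
        payload = 3; break;
      default:
        throw std::runtime_error("unknown constant pool tag");
    }
    need(payload);
    pos += payload;
    tags[static_cast<std::size_t>(index)] = tag;
    if (tag == 5 || tag == 6) index++;  // Long/Double take two indices
  }
  return tags;
}
```

For a pool with constant_pool_count = 5 containing a Utf8, a Class, and a Long entry, the returned vector has tags at indices 1 to 3 and zeros at indices 0 and 4.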

The following sections analyze how several of the more important constant pool entries are parsed

Parsing JVM_CONSTANT_Utf8

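  1. Read the UTF-8 length, record the current position of the UTF-8 buffer, check that the stream has enough bytes left, and skip utf8_length bytes.
  2. Call SymbolTable's lookup_only method to check whether the UTF-8 string already exists in the symbol table.
  • If it does not exist, result is NULL. The utf8_buffer pointer is stored in the names array, the length in lengths, the current constant pool index in the int array indices, and the string's hash in the unsigned int array hashValues; names_count is then incremented. When names_count reaches SymbolTable::symbol_alloc_batch_size (defined as 8), the 8 pending strings are created in the symbol table as one batch by calling SymbolTable::new_symbols.
  • If it exists, ConstantPool's symbol_at_put method is called directly to store the Symbol object that was already created in the symbol table.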
case JVM_CONSTANT_Utf8 : {
        cfs->guarantee_more(2, CHECK);  // utf8_length
        u2  utf8_length = cfs->get_u2_fast();
        const u1* utf8_buffer = cfs->current();
        assert(utf8_buffer != NULL, "null utf8 buffer");
        // Got utf8 string, guarantee utf8_length+1 bytes, set stream position forward.
        cfs->guarantee_more(utf8_length+1, CHECK);  // utf8 string, tag/access_flags
        cfs->skip_u1_fast(utf8_length);

        unsigned int hash;
        Symbol* const result = SymbolTable::lookup_only((const char*)utf8_buffer,
                                                        utf8_length,
                                                        hash);
        if (result == NULL) {
          names[names_count] = (const char*)utf8_buffer;
          lengths[names_count] = utf8_length;
          indices[names_count] = index;
          hashValues[names_count++] = hash;
          if (names_count == SymbolTable::symbol_alloc_batch_size) {
            SymbolTable::new_symbols(_loader_data,
                                     constantPoolHandle(THREAD, cp),
                                     names_count,
                                     names,
                                     lengths,
                                     indices,
                                     hashValues);
            names_count = 0;
          }
        } else {
          cp->symbol_at_put(index, result);
        }
        break;
      }
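The lookup-then-batch pattern above can be sketched outside HotSpot as follows. Everything here is a hypothetical stand-in: ToySymbolTable replaces SymbolTable with a plain hash map, symbols are small integers instead of Symbol*, and kBatchSize mirrors the batch size of 8 mentioned above. The point is only the control flow: hits are stored immediately, misses are queued, the queue is flushed when it is full, and whatever remains is flushed after the loop.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for the symbol table: maps a string to an integer id.
struct ToySymbolTable {
  std::unordered_map<std::string, int> ids;
  int batch_calls = 0;  // number of new_symbols() invocations

  // Lookup without inserting; returns -1 when the string is absent.
  int lookup_only(const std::string& s) const {
    auto it = ids.find(s);
    return it == ids.end() ? -1 : it->second;
  }

  // Creates every pending symbol in one call and writes it into the pool.
  void new_symbols(std::vector<int>& pool,
                   const std::vector<std::pair<int, std::string>>& pending) {
    batch_calls++;
    for (const auto& p : pending) {
      // emplace keeps the existing id if the same string appears twice.
      auto inserted = ids.emplace(p.second, static_cast<int>(ids.size()));
      pool[static_cast<std::size_t>(p.first)] = inserted.first->second;
    }
  }
};

constexpr std::size_t kBatchSize = 8;  // assumption: mirrors symbol_alloc_batch_size

// Interns all strings; utf8[i] is treated as constant pool index i + 1.
std::vector<int> intern_all(ToySymbolTable& table,
                            const std::vector<std::string>& utf8) {
  std::vector<int> pool(utf8.size() + 1, -1);  // slot 0 stays unused
  std::vector<std::pair<int, std::string>> pending;
  for (std::size_t i = 0; i < utf8.size(); i++) {
    const int index = static_cast<int>(i) + 1;
    const int found = table.lookup_only(utf8[i]);
    if (found >= 0) {
      pool[static_cast<std::size_t>(index)] = found;  // already in the table
      continue;
    }
    pending.emplace_back(index, utf8[i]);
    if (pending.size() == kBatchSize) {  // batch is full: flush it
      table.new_symbols(pool, pending);
      pending.clear();
    }
  }
  if (!pending.empty()) table.new_symbols(pool, pending);  // remaining symbols
  return pool;
}
```

Batching trades a little bookkeeping for fewer calls into the shared table, which is why the leftover entries need the extra flush after the loop.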

Next, let's look at ConstantPool's symbol_at_put method

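  1. Call tag_at_put, which stores the JVM_CONSTANT_Utf8 enum value at the constant pool index in _tags, a member of ConstantPool of type Array<u1>*.
  2. Call symbol_at_addr and store the pointer to the Symbol object in the slot for that constant pool index, located in the data area that directly follows the ConstantPool object.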
  void symbol_at_put(int which, Symbol* s) {
    assert(s->refcount() != 0, "should have nonzero refcount");
    tag_at_put(which, JVM_CONSTANT_Utf8);
    *symbol_at_addr(which) = s;
  }
  void tag_at_put(int which, jbyte t) {
    tags()->at_put(which, t);
  }
  Symbol** symbol_at_addr(int which) const {
    assert(is_within_bounds(which), "index out of bounds");
    return (Symbol**) &base()[which];
  }

  intptr_t* base() const {
    return (intptr_t*) (((char*) this) + sizeof(ConstantPool));
  }
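The base() computation relies on the slots living in the same allocation as the object, immediately after its fixed-size header. A minimal sketch of that layout, assuming a hypothetical ToyPool class that allocates with calloc (HotSpot uses its own metaspace allocation, which is not modeled here):

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical miniature of the layout: a fixed header followed directly by
// `length` pointer-sized slots in the same block of memory.
class ToyPool {
 public:
  // Allocates header + slots in one zeroed block and constructs the header.
  static ToyPool* create(int length) {
    const std::size_t bytes =
        sizeof(ToyPool) + static_cast<std::size_t>(length) * sizeof(std::intptr_t);
    void* mem = std::calloc(1, bytes);
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) ToyPool(length);
  }
  static void destroy(ToyPool* p) {
    p->~ToyPool();
    std::free(p);
  }

  int length() const { return static_cast<int>(_length); }

  // First slot: the address right after the header, as in ConstantPool::base().
  std::intptr_t* base() const {
    return reinterpret_cast<std::intptr_t*>(
        reinterpret_cast<char*>(const_cast<ToyPool*>(this)) + sizeof(ToyPool));
  }

  void slot_at_put(int which, std::intptr_t value) {
    assert(which >= 0 && which < length());
    base()[which] = value;
  }
  std::intptr_t slot_at(int which) const {
    assert(which >= 0 && which < length());
    return base()[which];
  }

 private:
  explicit ToyPool(int length) : _length(length) {}
  ~ToyPool() = default;

  // Pointer-sized header field so the slots that follow stay aligned.
  std::intptr_t _length;
};
```

Because the slot array is not a declared member, the object must always be created through create(); a ToyPool placed on the stack would have no room behind it.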

Parsing JVM_CONSTANT_Class

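Parsing a class entry: call get_u2_fast on the ClassFileStream to read name_index, the constant pool index of the Utf8 entry that holds the class name. ConstantPool's klass_index_at_put then stores name_index in the jint slot for the current index, in the data area that follows the ConstantPool object. Note that the tag recorded is JVM_CONSTANT_ClassIndex rather than JVM_CONSTANT_Class: the Utf8 entry it refers to may not have been parsed yet, so only the index is kept at this stage and it is resolved later.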

      case JVM_CONSTANT_Class : {
        cfs->guarantee_more(3, CHECK);  // name_index, tag/access_flags
        const u2 name_index = cfs->get_u2_fast();
        cp->klass_index_at_put(index, name_index);
        break;
      }
ConstantPool's klass_index_at_put method
  void klass_index_at_put(int which, int name_index) {
    tag_at_put(which, JVM_CONSTANT_ClassIndex);
    *int_at_addr(which) = name_index;
  }
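Storing only an index matters because a class file may refer to a Utf8 entry that appears later in the pool. The sketch below illustrates that two-step idea with a hypothetical ToyConstantPool; the types, the klass_name_at helper, and the tag values are illustrative assumptions rather than HotSpot definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Illustrative tags; the numeric values are assumptions for this sketch.
enum ToyTag { kUnused = 0, kUtf8 = 1, kClassIndex = 101 };

struct ToyEntry {
  ToyTag tag = kUnused;
  int index = 0;     // used by kClassIndex: index of the name entry
  std::string utf8;  // used by kUtf8: the string itself
};

struct ToyConstantPool {
  std::vector<ToyEntry> entries;

  explicit ToyConstantPool(int length)
      : entries(static_cast<std::size_t>(length)) {}

  void utf8_at_put(int which, const std::string& s) {
    ToyEntry& e = entries.at(static_cast<std::size_t>(which));
    e.tag = kUtf8;
    e.utf8 = s;
  }

  // Parse time: remember only where the name lives.
  void klass_index_at_put(int which, int name_index) {
    ToyEntry& e = entries.at(static_cast<std::size_t>(which));
    e.tag = kClassIndex;
    e.index = name_index;
  }

  // Later, once every entry has been parsed: follow the index to the name.
  const std::string& klass_name_at(int which) const {
    const ToyEntry& e = entries.at(static_cast<std::size_t>(which));
    assert(e.tag == kClassIndex);
    const ToyEntry& name = entries.at(static_cast<std::size_t>(e.index));
    assert(name.tag == kUtf8);
    return name.utf8;
  }
};
```

In this sketch the class entry at index 1 can be stored before the Utf8 entry at index 2 exists, and the name lookup succeeds once both have been filled in.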

Parsing JVM_CONSTANT_Fieldref

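Parsing a field reference: read the two indices class_index and name_and_type_index, store the JVM_CONSTANT_Fieldref enum value at the constant pool index in ConstantPool's _tags, and then store both indices in the slot for that constant pool index in the intptr_t* data area that follows the ConstantPool object. name_and_type_index goes into the high 16 bits of the int, and class_index into the low 16 bits.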

      case JVM_CONSTANT_Fieldref: {
        cfs->guarantee_more(5, CHECK);  // class_index, name_and_type_index, tag/access_flags
        const u2 class_index = cfs->get_u2_fast();
        const u2 name_and_type_index = cfs->get_u2_fast();
        cp->field_at_put(index, class_index, name_and_type_index);
        break;
      }
ConstantPool's field_at_put method
  void field_at_put(int which, int class_index, int name_and_type_index) {
    tag_at_put(which, JVM_CONSTANT_Fieldref);
    *int_at_addr(which) = ((jint) name_and_type_index<<16) | class_index;
  }
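The packing used by field_at_put, and the matching unpacking, can be checked in isolation. The helper names pack_ref, unpack_class_index, and unpack_name_and_type_index are hypothetical and exist only for this sketch; unsigned arithmetic is used so that indices with the top bit set shift cleanly.

```cpp
#include <cassert>
#include <cstdint>

// Packs two 16-bit constant pool indices into one 32-bit value:
// name_and_type_index in the high 16 bits, class_index in the low 16 bits.
std::int32_t pack_ref(std::uint16_t class_index, std::uint16_t name_and_type_index) {
  const std::uint32_t packed =
      (static_cast<std::uint32_t>(name_and_type_index) << 16) | class_index;
  return static_cast<std::int32_t>(packed);
}

// Recovers the low 16 bits.
std::uint16_t unpack_class_index(std::int32_t packed) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(packed) & 0xFFFFu);
}

// Recovers the high 16 bits.
std::uint16_t unpack_name_and_type_index(std::int32_t packed) {
  return static_cast<std::uint16_t>(static_cast<std::uint32_t>(packed) >> 16);
}
```

For example, class_index 2 and name_and_type_index 5 pack to 0x00050002, and both values come back unchanged after unpacking.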

JVM_CONSTANT_Methodref is parsed in a similar way.

Parsing JVM_CONSTANT_String

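Parsing a string constant: read string_index, the constant pool index of the Utf8 entry that holds the string's contents, and call string_index_at_put. It records a tag at the constant pool index in ConstantPool's _tags (in HotSpot this is the temporary JVM_CONSTANT_StringIndex tag rather than JVM_CONSTANT_String, analogous to JVM_CONSTANT_ClassIndex above) and stores string_index in the slot for that constant pool index in the intptr_t* data area that follows the ConstantPool object.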

      case JVM_CONSTANT_String : {
        cfs->guarantee_more(3, CHECK);  // string_index, tag/access_flags
        const u2 string_index = cfs->get_u2_fast();
        cp->string_index_at_put(index, string_index);
        break;
      }

After the constant pool has been parsed, the ConstantPool object has the following structure

[Figure: structure of the ConstantPool object after parsing]

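Summary
This article covered how the static constant pool of a Java .class bytecode file is parsed: the tag of each entry is stored in _tags, a member of the ConstantPool object of type Array<u1>*, and the corresponding data is stored at the matching constant pool index in the intptr_t* area that follows the ConstantPool object.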