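In Java, javac first compiles a .java file into a .class bytecode file, and the JVM then parses that bytecode. The JVM specification (see the reference link) describes the parts that make up a class file, as shown in the figure below.
A class file consists mainly of the following parts: the constant pool, the superclass, interfaces, fields, and methods. This article focuses on how the JVM parses the static constant pool of a class file.
The JVM parses the constant pool in the parse_constant_pool_entries method in share/classfile/classFileParser.cpp.
Body of the parse_constant_pool_entries method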
void ClassFileParser::parse_constant_pool_entries(const ClassFileStream* const stream, ConstantPool* cp, const int length,TRAPS) {
assert(stream != NULL, "invariant");
assert(cp != NULL, "invariant");
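const ClassFileStream cfs1 = *stream;
const ClassFileStream* const cfs = &cfs1;
// Remember the caller's stream position; the assert at the end of the method checks it.
debug_only(const u1* const old_current = stream->current();)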
// Used for batching symbol allocations.
const char* names[SymbolTable::symbol_alloc_batch_size];
int lengths[SymbolTable::symbol_alloc_batch_size];
int indices[SymbolTable::symbol_alloc_batch_size];
unsigned int hashValues[SymbolTable::symbol_alloc_batch_size];
int names_count = 0;
// parsing Index 0 is unused
for (int index = 1; index < length; index++) {
// Each of the following case guarantees one more byte in the stream
// for the following tag or the access_flags following constant pool,
// so we don't need bounds-check for reading tag.
const u1 tag = cfs->get_u1_fast();
switch (tag) {
// Parsing of the individual constant pool entries is omitted here; each case is analyzed below.
} // end of switch
} // end of for
// Allocate the remaining symbols
if (names_count > 0) {
SymbolTable::new_symbols(_loader_data,
constantPoolHandle(THREAD, cp),
names_count,
names,
lengths,
indices,
hashValues);
}
// Copy _current pointer of local copy back to stream.
assert(stream->current() == old_current, "non-exclusive use of stream");
stream->set_current(cfs1.current());
}
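Note that constant_pool_count in the class file is the length parameter of parse_constant_pool_entries. Index 0 of the constant pool is unused, so the loop starts at index 1 and parses each entry of the static constant pool in turn.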
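To make the origin of length concrete, here is a minimal sketch that reads the fixed-size header of a class file up to constant_pool_count. ByteReader, ClassHeader, read_header and last_valid_index are hypothetical names for illustration only; in HotSpot this job is done by ClassFileStream and ClassFileParser.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical big-endian reader (illustration only, not HotSpot code).
class ByteReader {
 public:
  explicit ByteReader(const std::vector<uint8_t>& data) : data_(data), pos_(0) {}

  uint8_t u1() {
    assert(pos_ < data_.size());  // bounds check on every byte
    return data_[pos_++];
  }
  uint16_t u2() {
    const uint16_t hi = u1();
    const uint16_t lo = u1();
    return static_cast<uint16_t>((hi << 8) | lo);
  }
  uint32_t u4() {
    const uint32_t hi = u2();
    const uint32_t lo = u2();
    return (hi << 16) | lo;
  }

 private:
  const std::vector<uint8_t>& data_;
  std::size_t pos_;
};

// The first fields of a class file, in file order.
struct ClassHeader {
  uint32_t magic;
  uint16_t minor_version;
  uint16_t major_version;
  uint16_t constant_pool_count;
};

ClassHeader read_header(const std::vector<uint8_t>& bytes) {
  ByteReader r(bytes);
  ClassHeader h{};
  h.magic = r.u4();
  h.minor_version = r.u2();
  h.major_version = r.u2();
  h.constant_pool_count = r.u2();
  return h;
}

// Index 0 is unused, so valid indices run from 1 to constant_pool_count - 1.
int last_valid_index(const ClassHeader& h) {
  return static_cast<int>(h.constant_pool_count) - 1;
}
```

For a header whose constant_pool_count is 16, the parsing loop would therefore cover indices 1 through 15.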
Parsing static constant pool entries and saving them into ConstantPool
First, here is the enum of constant pool tags defined by the JVM:
enum {
JVM_CONSTANT_Utf8 = 1,
JVM_CONSTANT_Unicode = 2, /* unused */
JVM_CONSTANT_Integer = 3,
JVM_CONSTANT_Float = 4,
JVM_CONSTANT_Long = 5,
JVM_CONSTANT_Double = 6,
JVM_CONSTANT_Class = 7,
JVM_CONSTANT_String = 8,
JVM_CONSTANT_Fieldref = 9,
JVM_CONSTANT_Methodref = 10,
JVM_CONSTANT_InterfaceMethodref = 11,
JVM_CONSTANT_NameAndType = 12,
JVM_CONSTANT_MethodHandle = 15, // JSR 292
JVM_CONSTANT_MethodType = 16, // JSR 292
JVM_CONSTANT_Dynamic = 17,
JVM_CONSTANT_InvokeDynamic = 18,
};
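The sections below analyze the most important of these constant pool entries.
Parsing JVM_CONSTANT_Utf8
- Read the UTF-8 length, record the current stream position as the start of the UTF-8 buffer, verify that the stream holds enough bytes, and skip utf8_length bytes.
- Call SymbolTable::lookup_only to check whether the symbol table already contains this UTF-8 string.
- If it does not, result is NULL. The utf8_buffer pointer is stored in the names array, the length in lengths, the current constant pool index in the int array indices, and the hash of the string in the unsigned int array hashValues; names_count is then incremented. When names_count reaches SymbolTable::symbol_alloc_batch_size (defined as 8), the 8 pending strings are created in the symbol table as one batch by calling SymbolTable::new_symbols.
- If it does, call ConstantPool::symbol_at_put directly to store the Symbol object that already exists in the symbol table.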
case JVM_CONSTANT_Utf8 : {
cfs->guarantee_more(2, CHECK); // utf8_length
u2 utf8_length = cfs->get_u2_fast();
const u1* utf8_buffer = cfs->current();
assert(utf8_buffer != NULL, "null utf8 buffer");
// Got utf8 string, guarantee utf8_length+1 bytes, set stream position forward.
cfs->guarantee_more(utf8_length+1, CHECK); // utf8 string, tag/access_flags
cfs->skip_u1_fast(utf8_length);
unsigned int hash;
Symbol* const result = SymbolTable::lookup_only((const char*)utf8_buffer,
utf8_length,
hash);
if (result == NULL) {
names[names_count] = (const char*)utf8_buffer;
lengths[names_count] = utf8_length;
indices[names_count] = index;
hashValues[names_count++] = hash;
if (names_count == SymbolTable::symbol_alloc_batch_size) {
SymbolTable::new_symbols(_loader_data,
constantPoolHandle(THREAD, cp),
names_count,
names,
lengths,
indices,
hashValues);
names_count = 0;
}
} else {
cp->symbol_at_put(index, result);
}
break;
}
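The batching logic in this case can be modelled outside the JVM. The following is a simplified sketch, not HotSpot code: ToySymbolTable, Pending and BatchInterner are hypothetical names, symbols are plain integer ids, and the batch size of 8 is taken from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical stand-in for the symbol table: text -> symbol id.
using ToySymbolTable = std::unordered_map<std::string, int>;

constexpr std::size_t kBatchSize = 8;  // assumed to mirror symbol_alloc_batch_size

// One Utf8 entry that was not found and is waiting for batch creation.
struct Pending {
  std::string text;
  int cp_index;
};

class BatchInterner {
 public:
  BatchInterner(ToySymbolTable& table, std::vector<int>& pool)
      : table_(table), pool_(pool), flush_count_(0) {}

  // Called once per Utf8 entry, like the JVM_CONSTANT_Utf8 case.
  void add(int cp_index, const std::string& text) {
    assert(cp_index > 0 && static_cast<std::size_t>(cp_index) < pool_.size());
    auto it = table_.find(text);
    if (it != table_.end()) {
      pool_[cp_index] = it->second;  // already known: store immediately
      return;
    }
    pending_.push_back({text, cp_index});  // unknown: defer to the next batch
    if (pending_.size() == kBatchSize) flush();
  }

  // Creates all pending symbols at once; also needed after the loop
  // to handle a final, partially filled batch.
  void flush() {
    if (pending_.empty()) return;
    for (const Pending& p : pending_) {
      auto inserted = table_.emplace(p.text, static_cast<int>(table_.size()));
      pool_[p.cp_index] = inserted.first->second;
    }
    pending_.clear();
    ++flush_count_;
  }

  int flush_count() const { return flush_count_; }

 private:
  ToySymbolTable& table_;
  std::vector<int>& pool_;
  std::vector<Pending> pending_;
  int flush_count_;
};
```

The explicit flush() after the last add corresponds to the "Allocate the remaining symbols" step at the end of parse_constant_pool_entries.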
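Next, look at ConstantPool's symbol_at_put method:
- It calls tag_at_put, which stores the JVM_CONSTANT_Utf8 enum value at the constant pool index in _tags, a member of ConstantPool of type Array<u1>*.
- It calls symbol_at_addr to locate the slot for that index in the storage that trails the ConstantPool object, and stores the Symbol pointer there.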
void symbol_at_put(int which, Symbol* s) {
assert(s->refcount() != 0, "should have nonzero refcount");
tag_at_put(which, JVM_CONSTANT_Utf8);
*symbol_at_addr(which) = s;
}
void tag_at_put(int which, jbyte t)
{
tags()->at_put(which, t);
}
Symbol** symbol_at_addr(int which) const {
assert(is_within_bounds(which), "index out of bounds");
return (Symbol**) &base()[which];
}
intptr_t* base() const {
return (intptr_t*) (((char*) this) + sizeof(ConstantPool));
}
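The base() method relies on the entry slots being laid out directly after the ConstantPool object in the same allocation. The sketch below illustrates that layout with a hypothetical MiniPool class; the allocation via calloc is an assumption made for the example and is not how HotSpot allocates metadata.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>
#include <new>

// Hypothetical miniature of "header object followed by its slots".
// alignas keeps sizeof(MiniPool) a multiple of the slot alignment,
// so the first slot after the header is correctly aligned.
class alignas(intptr_t) MiniPool {
 public:
  // One block of memory holds the header and `length` zeroed slots.
  static MiniPool* allocate(int length) {
    assert(length >= 0);
    const std::size_t bytes =
        sizeof(MiniPool) + static_cast<std::size_t>(length) * sizeof(intptr_t);
    void* mem = std::calloc(1, bytes);
    if (mem == nullptr) throw std::bad_alloc();
    return new (mem) MiniPool(length);
  }

  static void release(MiniPool* p) { std::free(p); }

  int length() const { return length_; }

  // Same idea as ConstantPool::base(): the slots start right after the header.
  intptr_t* base() const {
    char* self = reinterpret_cast<char*>(const_cast<MiniPool*>(this));
    return reinterpret_cast<intptr_t*>(self + sizeof(MiniPool));
  }

  // Address of the slot for one constant pool index.
  intptr_t* slot_addr(int which) const {
    assert(which >= 0 && which < length_);
    return &base()[which];
  }

 private:
  explicit MiniPool(int length) : length_(length) {}
  int length_;
};
```

Because every slot is one intptr_t wide, the same slot can hold either a pointer (such as a Symbol*) or a packed integer, which is what the entry types discussed here make use of.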
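Parsing JVM_CONSTANT_Class
Parsing a class entry: get_u2_fast on the ClassFileStream reads name_index, the constant pool index of the Utf8 entry that holds the class name. ConstantPool::klass_index_at_put then stores name_index as a jint in the slot for the current index at the tail of the ConstantPool object. At this stage the slot is tagged JVM_CONSTANT_ClassIndex rather than JVM_CONSTANT_Class.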
case JVM_CONSTANT_Class : {
cfs->guarantee_more(3, CHECK); // name_index, tag/access_flags
const u2 name_index = cfs->get_u2_fast();
cp->klass_index_at_put(index, name_index);
break;
}
ConstantPool's klass_index_at_put method
void klass_index_at_put(int which, int name_index) {
tag_at_put(which, JVM_CONSTANT_ClassIndex);
*int_at_addr(which) = name_index;
}
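Parsing JVM_CONSTANT_Fieldref
Parsing a field entry: the two indices class_index and name_and_type_index are read from the stream. As before, the JVM_CONSTANT_Fieldref enum value is stored at the constant pool index in ConstantPool's _tags. Both indices are then packed into the intptr_t slot for that index at the tail of the ConstantPool object: name_and_type_index occupies the high 16 bits of the int and class_index the low 16 bits.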
case JVM_CONSTANT_Fieldref: {
cfs->guarantee_more(5, CHECK); // class_index, name_and_type_index, tag/access_flags
const u2 class_index = cfs->get_u2_fast();
const u2 name_and_type_index = cfs->get_u2_fast();
cp->field_at_put(index, class_index, name_and_type_index);
break;
}
ConstantPool's field_at_put method
void field_at_put(int which, int class_index, int name_and_type_index) {
tag_at_put(which, JVM_CONSTANT_Fieldref);
*int_at_addr(which) = ((jint) name_and_type_index<<16) | class_index;
}
JVM_CONSTANT_Methodref is parsed in a similar way.
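The bit layout used by field_at_put can be checked with a small round trip. The helper names pack_ref, class_index_of and name_and_type_index_of are hypothetical and exist only for this sketch; they assume both indices fit in 16 bits, as u2 values read from the class file do.

```cpp
#include <cassert>
#include <cstdint>

// Packs two u2 indices into one 32-bit value:
// name_and_type_index in the high 16 bits, class_index in the low 16 bits.
int32_t pack_ref(uint16_t class_index, uint16_t name_and_type_index) {
  const uint32_t packed =
      (static_cast<uint32_t>(name_and_type_index) << 16) | class_index;
  return static_cast<int32_t>(packed);
}

// Extracts the low 16 bits.
uint16_t class_index_of(int32_t packed) {
  return static_cast<uint16_t>(static_cast<uint32_t>(packed) & 0xFFFFu);
}

// Extracts the high 16 bits; the unsigned cast avoids sign extension.
uint16_t name_and_type_index_of(int32_t packed) {
  return static_cast<uint16_t>(static_cast<uint32_t>(packed) >> 16);
}
```

With class_index 5 and name_and_type_index 12, the packed value is 0x000C0005, and both indices can be recovered without loss.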
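Parsing JVM_CONSTANT_String
Parsing a string entry: string_index, the constant pool index of the Utf8 entry that holds the string's contents, is read from the stream. string_index_at_put records a tag for the current index in ConstantPool's _tags (in HotSpot this is the temporary tag JVM_CONSTANT_StringIndex, by analogy with JVM_CONSTANT_ClassIndex above) and stores the value of string_index in the intptr_t slot for that index at the tail of the ConstantPool object.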
case JVM_CONSTANT_String : {
cfs->guarantee_more(3, CHECK); // string_index, tag/access_flags
const u2 string_index = cfs->get_u2_fast();
cp->string_index_at_put(index, string_index);
break;
}
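Storing only string_index is sufficient at this stage because the class file format allows an entry to refer to a Utf8 entry that appears later in the pool, so the text may not have been read yet. The toy model below illustrates this with hypothetical types (ToyTag, ToyEntry, ToyPool); it is a sketch of the idea, not of HotSpot's data structures.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical toy model; tag values follow the enum shown earlier.
enum ToyTag : uint8_t { kUnused = 0, kUtf8 = 1, kString = 8 };

struct ToyEntry {
  ToyTag tag = kUnused;
  uint16_t index = 0;  // for kString: index of the Utf8 entry
  std::string text;    // for kUtf8: the decoded characters
};

class ToyPool {
 public:
  explicit ToyPool(std::size_t length) : entries_(length) {}

  void utf8_at_put(std::size_t which, const std::string& s) {
    entries_.at(which).tag = kUtf8;
    entries_.at(which).text = s;
  }

  // Parse time: remember only where the text lives.
  void string_index_at_put(std::size_t which, uint16_t string_index) {
    entries_.at(which).tag = kString;
    entries_.at(which).index = string_index;
  }

  // Later: follow the stored index to the Utf8 entry.
  const std::string& string_at(std::size_t which) const {
    const ToyEntry& e = entries_.at(which);
    assert(e.tag == kString);
    const ToyEntry& target = entries_.at(e.index);
    assert(target.tag == kUtf8);
    return target.text;
  }

 private:
  std::vector<ToyEntry> entries_;
};
```

In this model, entry 1 can be registered as a String pointing at entry 3 before entry 3 has been filled in, and the lookup still succeeds once the whole pool has been read.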
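After the constant pool has been parsed, the ConstantPool object has the following structure:
Summary
This article covered how the static constant pool of a Java .class file is parsed: each entry's tag is saved in _tags, a member of the ConstantPool object of type Array<u1>*, and the corresponding data is saved in the slot for that constant pool index in the intptr_t* storage at the tail of the ConstantPool object.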