Android ART 虚拟机 AOT 和 JIT 内联机制(inline-cache)浅析
inline 简介
在 C 或者 C++ 语言中,有一种 inline_function 函数,顾名思义,inline_function 就是在编译的时候,直接用函数的代码替换函数调用,例如:
函数声明定义:
class InlineCache {
public:
static constexpr uint8_t kIndividualCacheSize = 5;
private:
uint32_t dex_pc_;
GcRoot<mirror::Class> classes_[kIndividualCacheSize];
friend class jit::JitCodeCache;
friend class ProfilingInfo;
DISALLOW_COPY_AND_ASSIGN(InlineCache);
};
void ArtInterpreterToInterpreterBridge(){}
void EnterInterpreterFromDeoptimize() {}
void EnterInterpreterFromInvoke(){}
JValue EnterInterpreterFromEntryPoint(){}
void ExecuteSwitchImplCpp(SwitchImplContext* ctx) {
inline void swap(int *m, int *n)
{
int tmp = *m;
*m = *n;
*n = tmp;
}
函数调用:
swap(&x, &y);
编译之后:
int tmp = x;
x = y;
y = tmp;
通过代码内联就可以省去一次函数调用,提高代码的执行效率。
inline_caching 简介
inline_caching 也是一种类似的优化方式,只不过 inline_caching 是专门针对动态类型语言做的优化。动态类型的语言在运行时存在一个方法查找的过程,如果每次调用方法都要进行方法查找,对于代码的运行效率影响还是比较大的,inline_caching 正是为此而生,其基本实现逻辑是:
-
首次调用方法,则进入方法查找的过程,然后将结果缓存到 CallSite。
-
非首次调用方法,则先在 CallSite 里查找是否有匹配的缓存;如果没有再进行方法查找,并再次缓存到 CallSite。
同时,由于无法确定某个类或接口总共会有多少了派生类,因此也不能无限制的缓存所有的情况,需要有一个合适缓存策略,这就是 inline_caching 的四种状态:
-
uninitialized:已缓存 0 个派生类的查找结果
-
monomorphic:有且仅缓存 1 个派生类的查找结果
-
polymorphic:有且缓存 2个以上,N个之内 个派生类的查找结果
-
megamorphic:缓存 N(及N个以上) 派生类的查找结果
其中 N 为最大缓存限制,超过 N 个则不再缓存查找结果。事实上,Android ART 虚拟机在 AOT 或者 JIT 编译的时候,也仅会针对 monomorphic 和 polymorphic 进行内联,对于 uninitialized 和 megamorphic 均不会进行处理。
ART 虚拟机 inline caching 的实现方式
上文说到,AOT 或 JIT 编译的时候,仅对 monomorphic 和 polymorphic 进行内联,那么是如何进行内联的呢?
- 在解释器解释执行的时候,记录统计方法的执行次数,达到一定次数则标记为 warm-method
- 记录下 warm-method 内 virtual-method 和 interface-method 方法调用的查找结果,并保存到 profile 文件,一般在 /data/misc/profiles/..
- 在 AOT 或 JIT 编译的时候,通过添加类型检查指令和分支指令,将 Callee(inlined-method) 方法的代码内联(替换)至 Caller(outer-method) 方法
举个栗子:
Implement polymorphic inlining.
// For example, before:
HInvokeVirtual
// After:
If (receiver == Foo) {
// inlined code.
} else if (receiver == Bar) {
// inlined code
} else {
// HInvokeVirtual or HDeoptimize(receiver != Baz)
}
另外也可以看到,修改之后的指令逻辑,最后有一个容错处理分支,也就是在未匹配到任何类型之后,会执行原来未优化的指令,从而保证代码不会产生异常。
记录过程
创建 ProfilingInfo
static inline JValue Execute(
Thread* self,
const CodeItemDataAccessor& accessor,
ShadowFrame& shadow_frame,
JValue result_register,
bool stay_in_interpreter = false) REQUIRES_SHARED(Locks::mutator_lock_) {
DCHECK(!shadow_frame.GetMethod()->IsAbstract());
DCHECK(!shadow_frame.GetMethod()->IsNative());
if (LIKELY(shadow_frame.GetDexPC() == 0)) { // Entering the method, but not via deoptimization.
instrumentation::Instrumentation* instrumentation = Runtime::Current()->GetInstrumentation();
ArtMethod *method = shadow_frame.GetMethod();
......
if (!stay_in_interpreter) {
jit::Jit* jit = Runtime::Current()->GetJit();
if (jit != nullptr) {
// 分发 MethodEnter 事件
jit->MethodEntered(self, shadow_frame.GetMethod());
if (jit->CanInvokeCompiledCode(method)) {
JValue result;
// Pop the shadow frame before calling into compiled code.
self->PopShadowFrame();
// Calculate the offset of the first input reg. The input registers are in the high regs.
// It's ok to access the code item here since JIT code will have been touched by the
// interpreter and compiler already.
uint16_t arg_offset = accessor.RegistersSize() - accessor.InsSize();
ArtInterpreterToCompiledCodeBridge(self, nullptr, &shadow_frame, arg_offset, &result);
// Push the shadow frame back as the caller will expect it.
self->PushShadowFrame(&shadow_frame);
return result;
}
}
}
}
......
}
void Jit::MethodEntered(Thread* thread, ArtMethod* method) {
Runtime* runtime = Runtime::Current();
if (UNLIKELY(runtime->UseJitCompilation() && runtime->GetJit()->JitAtFirstUse())) {
// 首次调用就启用JIT编译,默认该配置为false
// The compiler requires a ProfilingInfo object.
ProfilingInfo::Create(thread,
method->GetInterfaceMethodIfProxy(kRuntimePointerSize),
/* retry_allocation */ true);
JitCompileTask compile_task(method, JitCompileTask::kCompile);
compile_task.Run(thread);
return;
}
ProfilingInfo* profiling_info = method->GetProfilingInfo(kRuntimePointerSize);
// Update the entrypoint if the ProfilingInfo has one. The interpreter will call it
// instead of interpreting the method.
if ((profiling_info != nullptr) && (profiling_info->GetSavedEntryPoint() != nullptr)) {
// 已经记录并编译,更新方法信息
Runtime::Current()->GetInstrumentation()->UpdateMethodsCode(
method, profiling_info->GetSavedEntryPoint());
} else {
// 更新或创建 ProfilingInfo 对象
AddSamples(thread, method, 1, /* with_backedges */false);
}
}
class JitCompileTask FINAL : public Task {
void SetCounter(int16_t hotness_count) {
hotness_count_ = hotness_count;
}
void Jit::AddSamples(Thread* self, ArtMethod* method, uint16_t count, bool with_backedges) {
......
if (method->IsClassInitializer() || !method->IsCompilable()) {
// We do not want to compile such methods.
return;
}
if (hot_method_threshold_ == 0) {
// 默认为 10000,见https://android.googlesource.com/platform/art/+/refs/heads/pie-r2-release/runtime/jit/jit.cc
// Tests might request JIT on first use (compiled synchronously in the interpreter).
return;
}
......
// starting_count 为方法执行的次数记录
int32_t starting_count = method->GetCounter();
if (Jit::ShouldUsePriorityThreadWeight(self)) {
// 线程权重计算,主线程权重一般为:hot_method_threshold / 20
// 也就是说在主线程重复执行某个方法超过 20 次,就会达到JIT阈值
count *= priority_thread_weight_;
}
// new_count 为计算上权重之后,目前方法的总执行次数
int32_t new_count = starting_count + count; // int32 here to avoid wrap-around;
// Note: Native method have no "warm" state or profiling info.
if (LIKELY(!method->IsNative()) && starting_count < warm_method_threshold_) {
if ((new_count >= warm_method_threshold_) &&
(method->GetProfilingInfo(kRuntimePointerSize) == nullptr)) {
// 方法执行次数达到 warm_method_threshold ,默认为 hot_method_threshold_ / 2
// 则创建 ProfilingInfo 对象
bool success = ProfilingInfo::Create(self, method, /* retry_allocation */ false);
if (success) {
VLOG(jit) << "Start profiling " << method->PrettyMethod();
}
......
if (!success) {
// We failed allocating. Instead of doing the collection on the Java thread, we push
// an allocation to a compiler thread, that will do the collection.
thread_pool_->AddTask(self, new JitCompileTask(method, JitCompileTask::kAllocateProfile));
}
}
// Avoid jumping more than one state at a time.
new_count = std::min(new_count, hot_method_threshold_ - 1);
} else if (use_jit_compilation_) {
if (starting_count < hot_method_threshold_) {
if ((new_count >= hot_method_threshold_) &&
!code_cache_->ContainsPc(method->GetEntryPointFromQuickCompiledCode())) {
DCHECK(thread_pool_ != nullptr);
// 方法执行次数达到 hot_method_threshold_
// 触发JIT编译任务
thread_pool_->AddTask(self, new JitCompileTask(method, JitCompileTask::kCompile));
}
// Avoid jumping more than one state at a time.
new_count = std::min(new_count, osr_method_threshold_ - 1);
} else if (starting_count < osr_method_threshold_) {
if (!with_backedges) {
// If the samples don't contain any back edge, we don't increment the hotness.
return;
}
DCHECK(!method->IsNative()); // No back edges reported for native methods.
if ((new_count >= osr_method_threshold_) && !code_cache_->IsOsrCompiled(method)) {
DCHECK(thread_pool_ != nullptr);
thread_pool_->AddTask(self, new JitCompileTask(method, JitCompileTask::kCompileOsr));
}
}
}
// Update hotness counter
// 更新方法执行次数信息
method->SetCounter(new_count);
}
bool ProfilingInfo::Create(Thread* self, ArtMethod* method, bool retry_allocation) {
// 记录方法中虚方法或者接口方法的调用信息
// 后续的 inline-cache 正是针对这些方法做的缓存
// Walk over the dex instructions of the method and keep track of
// instructions we are interested in profiling.
DCHECK(!method->IsNative());
std::vector<uint32_t> entries;
for (const DexInstructionPcPair& inst : method->DexInstructions()) {
switch (inst->Opcode()) {
case Instruction::INVOKE_VIRTUAL:
case Instruction::INVOKE_VIRTUAL_RANGE:
case Instruction::INVOKE_VIRTUAL_QUICK:
case Instruction::INVOKE_VIRTUAL_RANGE_QUICK:
case Instruction::INVOKE_INTERFACE:
case Instruction::INVOKE_INTERFACE_RANGE:
entries.push_back(inst.DexPc());
break;
default:
break;
}
}
// We always create a `ProfilingInfo` object, even if there is no instruction we are
// interested in. The JIT code cache internally uses it.
// Allocate the `ProfilingInfo` object int the JIT's data space.
jit::JitCodeCache* code_cache = Runtime::Current()->GetJit()->GetCodeCache();
return code_cache->AddProfilingInfo(self, method, entries, retry_allocation) != nullptr;
}
-
当 ART 虚拟机通过解释器执行代码的时候,每一次执行方法的时候,即会向 jit 分发 MethodEnter 事件;
-
jit 接收到该事件之后,会计算并保存方法的总执行次数;
-
如果方法的总执行次数达到 warm_method_threshold 就会创建一个 ProfilingInfo 对象,用以缓存方法分发的信息;
-
如果方法的总执行次数达到 hot_method_threshold,则会触发 JIT 编译任务;
缓存方法调用信息
// Handles all invoke-XXX/range instructions except for invoke-polymorphic[/range].
// Returns true on success, otherwise throws an exception and returns false.
template<InvokeType type, bool is_range, bool do_access_check>
static inline bool DoInvoke(Thread* self,
ShadowFrame& shadow_frame,
const Instruction* inst,
uint16_t inst_data,
JValue* result) {
// Make sure to check for async exceptions before anything else.
if (UNLIKELY(self->ObserveAsyncException())) {
return false;
}
const uint32_t method_idx = (is_range) ? inst->VRegB_3rc() : inst->VRegB_35c();
const uint32_t vregC = (is_range) ? inst->VRegC_3rc() : inst->VRegC_35c();
ObjPtr<mirror::Object> receiver =
(type == kStatic) ? nullptr : shadow_frame.GetVRegReference(vregC);
ArtMethod* sf_method = shadow_frame.GetMethod();
ArtMethod* const called_method = FindMethodFromCode<type, do_access_check>(
method_idx, &receiver, sf_method, self);
// The shadow frame should already be pushed, so we don't need to update it.
if (UNLIKELY(called_method == nullptr)) {
CHECK(self->IsExceptionPending());
result->SetJ(0);
return false;
} else if (UNLIKELY(!called_method->IsInvokable())) {
called_method->ThrowInvocationTimeError();
result->SetJ(0);
return false;
} else {
jit::Jit* jit = Runtime::Current()->GetJit();
if (jit != nullptr && (type == kVirtual || type == kInterface)) {
// 分发虚方法或者接口方法的执行事件
jit->InvokeVirtualOrInterface(receiver, sf_method, shadow_frame.GetDexPC(), called_method);
}
// TODO: Remove the InvokeVirtualOrInterface instrumentation, as it was only used by the JIT.
if (type == kVirtual || type == kInterface) {
instrumentation::Instrumentation* instrumentation = Runtime::Current()->GetInstrumentation();
if (UNLIKELY(instrumentation->HasInvokeVirtualOrInterfaceListeners())) {
instrumentation->InvokeVirtualOrInterface(
self, receiver.Ptr(), sf_method, shadow_frame.GetDexPC(), called_method);
}
}
return DoCall<is_range, do_access_check>(called_method, self, shadow_frame, inst, inst_data,
result);
}
}
void Jit::InvokeVirtualOrInterface(ObjPtr<mirror::Object> this_object,
ArtMethod* caller,
uint32_t dex_pc,
ArtMethod* callee ATTRIBUTE_UNUSED) {
ScopedAssertNoThreadSuspension ants(__FUNCTION__);
DCHECK(this_object != nullptr);
ProfilingInfo* info = caller->GetProfilingInfo(kRuntimePointerSize);
if (info != nullptr) {
// 更新记录虚方法或接口方法的执行信息
info->AddInvokeInfo(dex_pc, this_object->GetClass());
}
}
void ProfilingInfo::AddInvokeInfo(uint32_t dex_pc, mirror::Class* cls) {
InlineCache* cache = GetInlineCache(dex_pc);
// 将虚方法或接口方法,实际调用的类信息记录在 cache 中
// 这里为什么只需要记录类信息呢?因为在 dex2oat 的时候,通过这个类就可以找到真正调用的方法
for (size_t i = 0; i < InlineCache::kIndividualCacheSize; ++i) {
mirror::Class* existing = cache->classes_[i].Read<kWithoutReadBarrier>();
mirror::Class* marked = ReadBarrier::IsMarked(existing);
if (marked == cls) {
// Receiver type is already in the cache, nothing else to do.
return;
} else if (marked == nullptr) {
// Cache entry is empty, try to put `cls` in it.
// Note: it's ok to spin on 'existing' here: if 'existing' is not null, that means
// it is a stalled heap address, which will only be cleared during SweepSystemWeaks,
// *after* this thread hits a suspend point.
GcRoot<mirror::Class> expected_root(existing);
GcRoot<mirror::Class> desired_root(cls);
auto atomic_root = reinterpret_cast<Atomic<GcRoot<mirror::Class>>*>(&cache->classes_[i]);
if (!atomic_root->CompareAndSetStrongSequentiallyConsistent(expected_root, desired_root)) {
// Some other thread put a class in the cache, continue iteration starting at this
// entry in case the entry contains `cls`.
--i;
} else {
// We successfully set `cls`, just return.
return;
}
}
}
// Unsuccessfull - cache is full, making it megamorphic. We do not DCHECK it though,
// as the garbage collector might clear the entries concurrently.
}
目标方法内代码指令执行的时候,如果是虚方法或者接口方法的调用,则将该调用信息记录到 InlineCache 中,主要是记录 receiver 的类型,也就是实际调用时派生类的信息。在 AOT 或 JIT 编译的时候,通过该派生类就可以找到真正调用的方法,同时也会根据该派生类信息生成类型检查指令。
保存profile文件
bool ProfileSaver::ProcessProfilingInfo(bool force_save, /*out*/uint16_t* number_of_new_methods) {
......
}
/**
* Serialization format:
* [profile_header, zipped[[profile_line_header1, profile_line_header2...],[profile_line_data1,
* profile_line_data2...]]]
* profile_header:
* magic,version,number_of_dex_files,uncompressed_size_of_zipped_data,compressed_data_size
* profile_line_header:
* dex_location,number_of_classes,methods_region_size,dex_location_checksum,num_method_ids
* profile_line_data:
* method_encoding_1,method_encoding_2...,class_id1,class_id2...,startup/post startup bitmap
* The method_encoding is:
* method_id,number_of_inline_caches,inline_cache1,inline_cache2...
* The inline_cache is:
* dex_pc,[M|dex_map_size], dex_profile_index,class_id1,class_id2...,dex_profile_index2,...
* dex_map_size is the number of dex_indeces that follows.
* Classes are grouped per their dex files and the line
* `dex_profile_index,class_id1,class_id2...,dex_profile_index2,...` encodes the
* mapping from `dex_profile_index` to the set of classes `class_id1,class_id2...`
* M stands for megamorphic or missing types and it's encoded as either
* the byte kIsMegamorphicEncoding or kIsMissingTypesEncoding.
* When present, there will be no class ids following.
**/
bool ProfileCompilationInfo::Save(int fd) {
.......
}
profile 文件保存的过程,此处不再深入,主要是保存 dex、method 和 class 的简略信息,AOT 编译时会通过读取该文件获取热代码的信息,感兴趣的同学可以自行阅读相关源码。
编译过程
bool HInliner::TryInline(HInvoke* invoke_instruction) {
if (invoke_instruction->IsInvokeUnresolved() ||
invoke_instruction->IsInvokePolymorphic()) {
return false; // Don't bother to move further if we know the method is unresolved or an
// invoke-polymorphic.
}
ScopedObjectAccess soa(Thread::Current());
uint32_t method_index = invoke_instruction->GetDexMethodIndex();
const DexFile& caller_dex_file = *caller_compilation_unit_.GetDexFile();
LOG_TRY() << caller_dex_file.PrettyMethod(method_index);
ArtMethod* resolved_method = invoke_instruction->GetResolvedMethod();
if (resolved_method == nullptr) {
DCHECK(invoke_instruction->IsInvokeStaticOrDirect());
DCHECK(invoke_instruction->AsInvokeStaticOrDirect()->IsStringInit());
LOG_FAIL_NO_STAT() << "Not inlining a String.<init> method";
return false;
}
ArtMethod* actual_method = nullptr;
// 尝试确认是否是编译时可以确定的方法
if (invoke_instruction->IsInvokeStaticOrDirect()) {
actual_method = resolved_method;
} else {
// Check if we can statically find the method.
actual_method = FindVirtualOrInterfaceTarget(invoke_instruction, resolved_method);
}
bool cha_devirtualize = false;
if (actual_method == nullptr) {
ArtMethod* method = TryCHADevirtualization(resolved_method);
if (method != nullptr) {
cha_devirtualize = true;
actual_method = method;
LOG_NOTE() << "Try CHA-based inlining of " << actual_method->PrettyMethod();
}
}
// 编译时可以确定的方法,比如 final 方法,直接尝试 inline-replace
if (actual_method != nullptr) {
// Single target.
bool result = TryInlineAndReplace(invoke_instruction,
actual_method,
ReferenceTypeInfo::CreateInvalid(),
/* do_rtp */ true,
cha_devirtualize);
if (result) {
// Successfully inlined.
if (!invoke_instruction->IsInvokeStaticOrDirect()) {
if (cha_devirtualize) {
// Add dependency due to devirtualization. We've assumed resolved_method
// has single implementation.
outermost_graph_->AddCHASingleImplementationDependency(resolved_method);
MaybeRecordStat(stats_, MethodCompilationStat::kCHAInline);
} else {
MaybeRecordStat(stats_, MethodCompilationStat::kInlinedInvokeVirtualOrInterface);
}
}
} else if (!cha_devirtualize && AlwaysThrows(compiler_driver_, actual_method)) {
// Set always throws property for non-inlined method call with single target
// (unless it was obtained through CHA, because that would imply we have
// to add the CHA dependency, which seems not worth it).
invoke_instruction->SetAlwaysThrows(true);
}
return result;
}
DCHECK(!invoke_instruction->IsInvokeStaticOrDirect());
// 编译时无法确定的方法调用,尝试 inline-cache
// Try using inline caches.
return TryInlineFromInlineCache(caller_dex_file, invoke_instruction, resolved_method);
}
bool HInliner::TryInlineFromInlineCache(const DexFile& caller_dex_file,
HInvoke* invoke_instruction,
ArtMethod* resolved_method)
REQUIRES_SHARED(Locks::mutator_lock_) {
if (Runtime::Current()->IsAotCompiler() && !kUseAOTInlineCaches) {
return false;
}
StackHandleScope<1> hs(Thread::Current());
Handle<mirror::ObjectArray<mirror::Class>> inline_cache;
InlineCacheType inline_cache_type = Runtime::Current()->IsAotCompiler()
? GetInlineCacheAOT(caller_dex_file, invoke_instruction, &hs, &inline_cache)
: GetInlineCacheJIT(invoke_instruction, &hs, &inline_cache);
switch (inline_cache_type) {
case kInlineCacheNoData: {
// 未加载到相关数据或加载失败
......
return false;
}
case kInlineCacheUninitialized: {
// Uninitialized 未统计到任何子类的方法调用
......
return false;
}
case kInlineCacheMonomorphic: {
// Monomorphic 即目前仅统计到 1 个子类方法调用
MaybeRecordStat(stats_, MethodCompilationStat::kMonomorphicCall);
if (UseOnlyPolymorphicInliningWithNoDeopt()) {
return TryInlinePolymorphicCall(invoke_instruction, resolved_method, inline_cache);
} else {
return TryInlineMonomorphicCall(invoke_instruction, resolved_method, inline_cache);
}
}
case kInlineCachePolymorphic: {
// Polymorphic 即目前仅统计到 2-4 个子类方法调用
MaybeRecordStat(stats_, MethodCompilationStat::kPolymorphicCall);
return TryInlinePolymorphicCall(invoke_instruction, resolved_method, inline_cache);
}
case kInlineCacheMegamorphic: {
// Megamorphic 即目前以统计到 5 个及以上的子类方法调用
// 见 https://android.googlesource.com/platform/art/+/refs/heads/pie-r2-release/runtime/jit/profiling_info.h
// kIndividualCacheSize = 5;
......
return false;
}
case kInlineCacheMissingTypes: {
// MissingTypes 即部分类加载失败
......
return false;
}
}
UNREACHABLE();
}
bool HInliner::TryInlineMonomorphicCall(HInvoke* invoke_instruction,
ArtMethod* resolved_method,
Handle<mirror::ObjectArray<mirror::Class>> classes) {
DCHECK(invoke_instruction->IsInvokeVirtual() || invoke_instruction->IsInvokeInterface())
<< invoke_instruction->DebugName();
// 解析class_index
dex::TypeIndex class_index = FindClassIndexIn(
GetMonomorphicType(classes), caller_compilation_unit_);
if (!class_index.IsValid()) {
LOG_FAIL(stats_, MethodCompilationStat::kNotInlinedDexCache)
<< "Call to " << ArtMethod::PrettyMethod(resolved_method)
<< " from inline cache is not inlined because its class is not"
<< " accessible to the caller";
return false;
}
ClassLinker* class_linker = caller_compilation_unit_.GetClassLinker();
PointerSize pointer_size = class_linker->GetImagePointerSize();
Handle<mirror::Class> monomorphic_type = handles_->NewHandle(GetMonomorphicType(classes));
// 解析实际调用的方法 resolved_method
resolved_method = ResolveMethodFromInlineCache(
monomorphic_type, resolved_method, invoke_instruction, pointer_size);
LOG_NOTE() << "Try inline monomorphic call to " << resolved_method->PrettyMethod();
if (resolved_method == nullptr) {
// Bogus AOT profile, bail.
DCHECK(Runtime::Current()->IsAotCompiler());
return false;
}
HInstruction* receiver = invoke_instruction->InputAt(0);
HInstruction* cursor = invoke_instruction->GetPrevious();
HBasicBlock* bb_cursor = invoke_instruction->GetBlock();
// 修改和替换指令
if (!TryInlineAndReplace(invoke_instruction,
resolved_method,
ReferenceTypeInfo::Create(monomorphic_type, /* is_exact */ true),
/* do_rtp */ false,
/* cha_devirtualize */ false)) {
return false;
}
// 添加 class 检查分支
// We successfully inlined, now add a guard.
AddTypeGuard(receiver,
cursor,
bb_cursor,
class_index,
monomorphic_type,
invoke_instruction,
/* with_deoptimization */ true);
// Run type propagation to get the guard typed, and eventually propagate the
// type of the receiver.
ReferenceTypePropagation rtp_fixup(graph_,
outer_compilation_unit_.GetClassLoader(),
outer_compilation_unit_.GetDexCache(),
handles_,
/* is_first_run */ false);
rtp_fixup.Run();
MaybeRecordStat(stats_, MethodCompilationStat::kInlinedMonomorphicCall);
return true;
}
HInstruction* HInliner::AddTypeGuard(HInstruction* receiver,
HInstruction* cursor,
HBasicBlock* bb_cursor,
dex::TypeIndex class_index,
Handle<mirror::Class> klass,
HInstruction* invoke_instruction,
bool with_deoptimization) {
ClassLinker* class_linker = caller_compilation_unit_.GetClassLinker();
// 添加 class解析指令,即 receiver.getClass() 方法
HInstanceFieldGet* receiver_class = BuildGetReceiverClass(
class_linker, receiver, invoke_instruction->GetDexPc());
if (cursor != nullptr) {
bb_cursor->InsertInstructionAfter(receiver_class, cursor);
} else {
bb_cursor->InsertInstructionBefore(receiver_class, bb_cursor->GetFirstInstruction());
}
const DexFile& caller_dex_file = *caller_compilation_unit_.GetDexFile();
bool is_referrer;
ArtMethod* outermost_art_method = outermost_graph_->GetArtMethod();
if (outermost_art_method == nullptr) {
DCHECK(Runtime::Current()->IsAotCompiler());
// We are in AOT mode and we don't have an ART method to determine
// if the inlined method belongs to the referrer. Assume it doesn't.
is_referrer = false;
} else {
is_referrer = klass.Get() == outermost_art_method->GetDeclaringClass();
}
// 添加 class 加载指令
// Note that we will just compare the classes, so we don't need Java semantics access checks.
// Note that the type index and the dex file are relative to the method this type guard is
// inlined into.
HLoadClass* load_class = new (graph_->GetAllocator()) HLoadClass(graph_->GetCurrentMethod(),
class_index,
caller_dex_file,
klass,
is_referrer,
invoke_instruction->GetDexPc(),
/* needs_access_check */ false);
// LoadKind 应为 KBssEntry
HLoadClass::LoadKind kind = HSharpening::ComputeLoadClassKind(
load_class, codegen_, compiler_driver_, caller_compilation_unit_);
DCHECK(kind != HLoadClass::LoadKind::kInvalid)
<< "We should always be able to reference a class for inline caches";
// Load kind must be set before inserting the instruction into the graph.
load_class->SetLoadKind(kind);
bb_cursor->InsertInstructionAfter(load_class, receiver_class);
// In AOT mode, we will most likely load the class from BSS, which will involve a call
// to the runtime. In this case, the load instruction will need an environment so copy
// it from the invoke instruction.
if (load_class->NeedsEnvironment()) {
DCHECK(Runtime::Current()->IsAotCompiler());
load_class->CopyEnvironmentFrom(invoke_instruction->GetEnvironment());
}
// 添加 class 比较检查指令
HNotEqual* compare = new (graph_->GetAllocator()) HNotEqual(load_class, receiver_class);
bb_cursor->InsertInstructionAfter(compare, load_class);
if (with_deoptimization) {
// 添加Deoptimize指令,当前置的内联分支未匹配到的时候,则会调用到该指令,作为最后的容错处理
HDeoptimize* deoptimize = new (graph_->GetAllocator()) HDeoptimize(
graph_->GetAllocator(),
compare,
receiver,
Runtime::Current()->IsAotCompiler()
? DeoptimizationKind::kAotInlineCache
: DeoptimizationKind::kJitInlineCache,
invoke_instruction->GetDexPc());
bb_cursor->InsertInstructionAfter(deoptimize, compare);
deoptimize->CopyEnvironmentFrom(invoke_instruction->GetEnvironment());
DCHECK_EQ(invoke_instruction->InputAt(0), receiver);
receiver->ReplaceUsesDominatedBy(deoptimize, deoptimize);
deoptimize->SetReferenceTypeInfo(receiver->GetReferenceTypeInfo());
}
return compare;
}
/**
* Instruction to load a Class object.
*/
class HLoadClass FINAL : public HInstruction {
......
}
void InstructionCodeGeneratorX86_64::VisitLoadClass(HLoadClass* cls) NO_THREAD_SAFETY_ANALYSIS {
HLoadClass::LoadKind load_kind = cls->GetLoadKind();
if (load_kind == HLoadClass::LoadKind::kRuntimeCall) {
// 调用 kQuickInitializeType 方法
codegen_->GenerateLoadClassRuntimeCall(cls);
return;
}
DCHECK(!cls->NeedsAccessCheck());
LocationSummary* locations = cls->GetLocations();
Location out_loc = locations->Out();
CpuRegister out = out_loc.AsRegister<CpuRegister>();
const ReadBarrierOption read_barrier_option = cls->IsInBootImage()
? kWithoutReadBarrier
: kCompilerReadBarrierOption;
bool generate_null_check = false;
switch (load_kind) {
case HLoadClass::LoadKind::kReferrersClass: {
......
break;
}
case HLoadClass::LoadKind::kBootImageLinkTimePcRelative:
......
break;
case HLoadClass::LoadKind::kBootImageAddress: {
......
break;
}
case HLoadClass::LoadKind::kBootImageClassTable: {
......
break;
}
case HLoadClass::LoadKind::kBssEntry: {
// 先尝试从 .bss 中加载,对于 AOT 或 JIT 内联场景下的 HLoadClass 指令,应为此项
Address address = Address::Absolute(CodeGeneratorX86_64::kDummy32BitOffset,
/* no_rip */ false);
Label* fixup_label = codegen_->NewTypeBssEntryPatch(cls);
// /* GcRoot<mirror::Class> */ out = *address /* PC-relative */
GenerateGcRootFieldLoad(cls, out_loc, address, fixup_label, read_barrier_option);
generate_null_check = true;
break;
}
case HLoadClass::LoadKind::kJitTableAddress: {
......
break;
}
default:
LOG(FATAL) << "Unexpected load kind: " << cls->GetLoadKind();
UNREACHABLE();
}
if (generate_null_check || cls->MustGenerateClinitCheck()) {
DCHECK(cls->CanCallRuntime());
// 先尝试从 .bss 中加载,若为 null 则调用 kQuickInitializeType 方法
SlowPathCode* slow_path = new (codegen_->GetScopedAllocator()) LoadClassSlowPathX86_64(
cls, cls, cls->GetDexPc(), cls->MustGenerateClinitCheck());
codegen_->AddSlowPath(slow_path);
if (generate_null_check) {
__ testl(out, out);
__ j(kEqual, slow_path->GetEntryLabel());
}
if (cls->MustGenerateClinitCheck()) {
GenerateClassInitializationCheck(slow_path, out);
} else {
__ Bind(slow_path->GetExitLabel());
}
}
}
void CodeGenerator::GenerateLoadClassRuntimeCall(HLoadClass* cls) {
DCHECK_EQ(cls->GetLoadKind(), HLoadClass::LoadKind::kRuntimeCall);
LocationSummary* locations = cls->GetLocations();
MoveConstant(locations->GetTemp(0), cls->GetTypeIndex().index_);
if (cls->NeedsAccessCheck()) {
CheckEntrypointTypes<kQuickInitializeTypeAndVerifyAccess, void*, uint32_t>();
InvokeRuntime(kQuickInitializeTypeAndVerifyAccess, cls, cls->GetDexPc());
} else if (cls->MustGenerateClinitCheck()) {
CheckEntrypointTypes<kQuickInitializeStaticStorage, void*, uint32_t>();
InvokeRuntime(kQuickInitializeStaticStorage, cls, cls->GetDexPc());
} else {
// 类加载指令最后编译的代码会编译成调用 art_quick_initialize_type 即对应于 artInitializeTypeFromCode 方法
CheckEntrypointTypes<kQuickInitializeType, void*, uint32_t>();
InvokeRuntime(kQuickInitializeType, cls, cls->GetDexPc());
}
}
从上面的源码可以看到,Android ART 虚拟机在编译时存在两种内联方式:
-
inline:即将 Callee(inlined-method) 方法内的代码内联至 Caller(outer-method) 方法中,替换掉原来对 Callee 方法的调用指令
-
inline-cache:即在分析热代码方法分发信息的基础上,通过添加类型检查和分支指令,直接将该方法的代码内联,从而省略方法查找和调用的过程
编译之后的指令,可以参考官方测试用例 Main.java
/// CHECK-START: int Main.inlineMonomorphicSubA(Super) inliner (before)
/// CHECK: InvokeVirtual method_name:Super.getValue
/// CHECK-START: int Main.inlineMonomorphicSubA(Super) inliner (after)
/// CHECK: <<SubARet:i\d+>> IntConstant 42
/// CHECK: <<Obj:l\d+>> NullCheck
/// CHECK: <<ObjClass:l\d+>> InstanceFieldGet [<<Obj>>] field_name:java.lang.Object.shadow$_klass_
/// CHECK: <<InlineClass:l\d+>> LoadClass class_name:SubA
/// CHECK: <<Test:z\d+>> NotEqual [<<InlineClass>>,<<ObjClass>>]
/// CHECK: <<DefaultRet:i\d+>> InvokeVirtual [<<Obj>>] method_name:Super.getValue
/// CHECK: <<Ret:i\d+>> Phi [<<SubARet>>,<<DefaultRet>>]
/// CHECK: Return [<<Ret>>]
/// CHECK-NOT: Deoptimize
public static int inlineMonomorphicSubA(Super a) {
return a.getValue();
}
执行过程
extern "C" mirror::Class* artInitializeTypeFromCode(uint32_t type_idx, Thread* self)
REQUIRES_SHARED(Locks::mutator_lock_) {
// Called when the .bss slot was empty or for main-path runtime call.
ScopedQuickEntrypointChecks sqec(self);
// 由于内联之后,实际的方法调用可能并不存在,存在的仅是方法内的代码
// 因此需要通过该方法来拿到内联之前的方法信息
auto caller_and_outer = GetCalleeSaveMethodCallerAndOuterMethod(
self, CalleeSaveType::kSaveEverythingForClinit);
ArtMethod* caller = caller_and_outer.caller;
// 触发类加载过程
ObjPtr<mirror::Class> result = ResolveVerifyAndClinit(dex::TypeIndex(type_idx),
caller,
self,
/* can_run_clinit */ false,
/* verify_access */ false);
if (LIKELY(result != nullptr) && CanReferenceBss(caller_and_outer.outer_method, caller)) {
// 将 result 缓存到 .bss 中
StoreTypeInBss(caller_and_outer.outer_method, dex::TypeIndex(type_idx), result);
}
return result.Ptr();
}
CallerAndOuterMethod GetCalleeSaveMethodCallerAndOuterMethod(Thread* self, CalleeSaveType type) {
CallerAndOuterMethod result;
ScopedAssertNoThreadSuspension ants(__FUNCTION__);
ArtMethod** sp = self->GetManagedStack()->GetTopQuickFrameKnownNotTagged();
// 解析 outer-method
auto outer_caller_and_pc = DoGetCalleeSaveMethodOuterCallerAndPc(sp, type);
result.outer_method = outer_caller_and_pc.first;
uintptr_t caller_pc = outer_caller_and_pc.second;
// 解析 inlined-method
result.caller =
DoGetCalleeSaveMethodCaller(result.outer_method, caller_pc, /* do_caller_check */ true);
return result;
}
static inline ArtMethod* DoGetCalleeSaveMethodCaller(ArtMethod* outer_method,
uintptr_t caller_pc,
bool do_caller_check)
REQUIRES_SHARED(Locks::mutator_lock_) {
ArtMethod* caller = outer_method;
if (LIKELY(caller_pc != reinterpret_cast<uintptr_t>(GetQuickInstrumentationExitPc()))) {
if (outer_method != nullptr) {
const OatQuickMethodHeader* current_code = outer_method->GetOatQuickMethodHeader(caller_pc);
DCHECK(current_code != nullptr);
DCHECK(current_code->IsOptimized());
uintptr_t native_pc_offset = current_code->NativeQuickPcOffset(caller_pc);
CodeInfo code_info = current_code->GetOptimizedCodeInfo();
MethodInfo method_info = current_code->GetOptimizedMethodInfo();
CodeInfoEncoding encoding = code_info.ExtractEncoding();
StackMap stack_map = code_info.GetStackMapForNativePcOffset(native_pc_offset, encoding);
DCHECK(stack_map.IsValid());
if (stack_map.HasInlineInfo(encoding.stack_map.encoding)) {
InlineInfo inline_info = code_info.GetInlineInfoOf(stack_map, encoding);
// 从 inline_info 中解析 inlined-method,此时该方法的代码已经被内联,且也没有此方法的栈帧
caller = GetResolvedMethod(outer_method,
method_info,
inline_info,
encoding.inline_info.encoding,
inline_info.GetDepth(encoding.inline_info.encoding) - 1);
}
}
if (kIsDebugBuild && do_caller_check) {
// Note that do_caller_check is optional, as this method can be called by
// stubs, and tests without a proper call stack.
NthCallerVisitor visitor(Thread::Current(), 1, true);
visitor.WalkStack();
CHECK_EQ(caller, visitor.caller);
}
} else {
// We're instrumenting, just use the StackVisitor which knows how to
// handle instrumented frames.
NthCallerVisitor visitor(Thread::Current(), 1, true);
visitor.WalkStack();
caller = visitor.caller;
}
return caller;
}
inline ArtMethod* GetResolvedMethod(ArtMethod* outer_method,
const MethodInfo& method_info,
const InlineInfo& inline_info,
const InlineInfoEncoding& encoding,
uint8_t inlining_depth)
REQUIRES_SHARED(Locks::mutator_lock_) {
DCHECK(!outer_method->IsObsolete());
// This method is being used by artQuickResolutionTrampoline, before it sets up
// the passed parameters in a GC friendly way. Therefore we must never be
// suspended while executing it.
ScopedAssertNoThreadSuspension sants(__FUNCTION__);
if (inline_info.EncodesArtMethodAtDepth(encoding, inlining_depth)) {
return inline_info.GetArtMethodAtDepth(encoding, inlining_depth);
}
uint32_t method_index = inline_info.GetMethodIndexAtDepth(encoding, method_info, inlining_depth);
// 特殊处理 String.charAt() 方法
if (inline_info.GetDexPcAtDepth(encoding, inlining_depth) == static_cast<uint32_t>(-1)) {
// "charAt" special case. It is the only non-leaf method we inline across dex files.
ArtMethod* inlined_method = jni::DecodeArtMethod(WellKnownClasses::java_lang_String_charAt);
DCHECK_EQ(inlined_method->GetDexMethodIndex(), method_index);
return inlined_method;
}
// 遍历解析 inlined-method
// Find which method did the call in the inlining hierarchy.
ClassLinker* class_linker = Runtime::Current()->GetClassLinker();
ArtMethod* method = outer_method;
for (uint32_t depth = 0, end = inlining_depth + 1u; depth != end; ++depth) {
DCHECK(!inline_info.EncodesArtMethodAtDepth(encoding, depth));
DCHECK_NE(inline_info.GetDexPcAtDepth(encoding, depth), static_cast<uint32_t>(-1));
method_index = inline_info.GetMethodIndexAtDepth(encoding, method_info, depth);
ArtMethod* inlined_method = class_linker->LookupResolvedMethod(method_index,
method->GetDexCache(),
method->GetClassLoader());
if (UNLIKELY(inlined_method == nullptr)) {
LOG(FATAL) << "Could not find an inlined method from an .oat file: "
<< method->GetDexFile()->PrettyMethod(method_index) << " . "
<< "This must be due to duplicate classes or playing wrongly with class loaders";
UNREACHABLE();
}
DCHECK(!inlined_method->IsRuntimeMethod());
// 检查 outer_method 和 inlined_method 是否在同一个 dex 中
if (UNLIKELY(inlined_method->GetDexFile() != method->GetDexFile())) {
// TODO: We could permit inlining within a multi-dex oat file and the boot image,
// even going back from boot image methods to the same oat file. However, this is
// not currently implemented in the compiler. Therefore crossing dex file boundary
// indicates that the inlined definition is not the same as the one used at runtime.
LOG(FATAL) << "Inlined method resolution crossed dex file boundary: from "
<< method->PrettyMethod()
<< " in " << method->GetDexFile()->GetLocation() << "/"
<< static_cast<const void*>(method->GetDexFile())
<< " to " << inlined_method->PrettyMethod()
<< " in " << inlined_method->GetDexFile()->GetLocation() << "/"
<< static_cast<const void*>(inlined_method->GetDexFile()) << ". "
<< "This must be due to duplicate classes or playing wrongly with class loaders";
UNREACHABLE();
}
method = inlined_method;
}
return method;
}
实际上,HLoadClass 指令最后会编译并调用 art_quick_initialize_type 方法,这个方法就是 artInitializeTypeFromCode(...) 方法,artInitializeTypeFromCode(...) 方法则会触发类加载及初始化逻辑。
同时,我们也看到由于内联之后,可能调用 Caller(inlined_method) 方法的指令已经被 Caller(inlined_method) 方法中的指令所替代,因此需要调用 GetCalleeSaveMethodCallerAndOuterMethod 方法,从方法栈帧和代码信息中解析出 Caller 方法(inlined_method)的信息。
也正是在这个过程中,存在一个检查 inlined_method 和 outer_method 是否在同一个 dex 中的逻辑,如果不在同一个dex中,则输出 “ Inlined method resolution crossed dex file boundary ” 信息。