I. Overview
This chapter analyzes async-profiler's CPU profiling implementation, covering:
- how the javaagent is attached;
- how the different CPU profiling engines are implemented.
Notes:
- async-profiler 4.0.0
- OpenJDK 17
Example
async-profiler supports several profiling modes:
- cpu: the default mode, also selectable with -e cpu. Collects call-stack samples to find the methods consuming CPU;
- lock: -e lock, analyzes lock contention;
- alloc: -e alloc, analyzes heap allocations;
- nativemem: -e nativemem, analyzes native (off-heap) memory allocations;
- wall: -e wall, samples all threads at a fixed interval regardless of whether they are running, sleeping, or blocked; useful for analyzing application startup time and similar scenarios;
This chapter focuses on the cpu mode: use -d to specify the collection duration in seconds and start CPU profiling against the target JVM process.
asprof -d 30 14047   (14047 is the Java process id)
The results go to standard output and contain two parts, both sorted by sample count in descending order: the full call stacks and the top-of-stack frames.
--- Execution profile ---
Total samples : 4
--- 20000000 ns (50.00%), 2 samples
[ 0] __GI___futex_abstimed_wait_cancelable64
[ 1] pthread_cond_timedwait@@GLIBC_2.17
[ 2] os::PlatformMonitor::wait
[ 3] Monitor::wait_without_safepoint_check
[ 4] WatcherThread::sleep
[ 5] WatcherThread::run
[ 6] Thread::call_run
[ 7] thread_native_entry
[ 8] start_thread
[ 9] thread_start
--- 10000000 ns (25.00%), 1 sample
[ 0] AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<548964ul, G1BarrierSet>, (AccessInternal::BarrierType)2, 548964ul>::oop_access_barrier
[ 1] JavaThread::sleep
[ 2] JVM_Sleep
[ 3] java.lang.Thread.sleep
[ 4] com.xxx.NativeMemBurnJob.normalJob
[ 5] jdk.internal.reflect.GeneratedMethodAccessor6.invoke
[ 6] jdk.internal.reflect.DelegatingMethodAccessorImpl.invoke
[ 7] java.lang.reflect.Method.invoke
[ 8] org.springframework.scheduling.support.ScheduledMethodRunnable.runInternal
[ 9] org.springframework.scheduling.support.ScheduledMethodRunnable.lambda$run$2
[10] org.springframework.scheduling.support.ScheduledMethodRunnable$$Lambda$859.0x00000002013c32c8.run
[11] io.micrometer.observation.Observation.observe
[12] org.springframework.scheduling.support.ScheduledMethodRunnable.run
[13] org.springframework.scheduling.config.Task$OutcomeTrackingRunnable.run
[14] org.springframework.scheduling.support.DelegatingErrorHandlingRunnable.run
[15] java.util.concurrent.Executors$RunnableAdapter.call
[16] java.util.concurrent.FutureTask.runAndReset
[17] java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run
[18] java.util.concurrent.ThreadPoolExecutor.runWorker
[19] java.util.concurrent.ThreadPoolExecutor$Worker.run
[20] java.lang.Thread.run
--- 10000000 ns (25.00%), 1 sample
[ 0] java.util.concurrent.ThreadPoolExecutor.getTask
[ 1] java.util.concurrent.ThreadPoolExecutor.runWorker
[ 2] java.util.concurrent.ThreadPoolExecutor$Worker.run
[ 3] java.lang.Thread.run
ns percent samples top
---------- ------- ------- ---
20000000 50.00% 2 __GI___futex_abstimed_wait_cancelable64
10000000 25.00% 1 AccessInternal::PostRuntimeDispatch<G1BarrierSet::AccessBarrier<548964ul, G1BarrierSet>, (AccessInternal::BarrierType)2, 548964ul>::oop_access_barrier
10000000 25.00% 1 java.util.concurrent.ThreadPoolExecutor.getTask
Specifying the -f option produces a flame graph:
asprof -d 30 -f flame.html 14047   (14047 is the Java process id)
The wider a method's frame, the more CPU it consumed, making it a likely performance bottleneck.
Makefile
Starting from the Makefile: the async-profiler build produces two main artifacts:
- asprof: the command-line tool, built from C/C++ sources;
- libasyncProfiler.so: the JVMTI-based javaagent, written in C++;
ASPROF=bin/asprof
LIB_PROFILER=lib/libasyncProfiler.$(SOEXT)
build/$(ASPROF): src/main/* src/jattach/* src/fdtransfer.h
$(CC) $(CPPFLAGS) $(CFLAGS) $(DEFS) -o $@ src/main/*.cpp src/jattach/*.c
$(STRIP) $@
build/$(LIB_PROFILER): $(SOURCES) $(HEADERS) $(RESOURCES) $(JAVA_HELPER_CLASSES)
for f in src/*.cpp; do echo '#include "'$$f'"'; done |\
$(CXX) $(CPPFLAGS) $(CXXFLAGS) $(DEFS) $(INCLUDES) -fPIC -g -shared -o $@ -xc++ - $(LIBS)
asprof itself consists of two parts:
- main: the entry point of asprof, which drives the overall command flow;
- jattach: the entry point for attaching the agent, which talks to the JVM;
Overall flow
From a process perspective, running asprof involves three processes:
- main: runs the asprof command;
- jattach: a child process forked by main to communicate with the JVM; the agent is attached twice, once to start and once to stop profiling;
- the JVM process: the user process being profiled, which runs async-profiler's libasyncProfiler.so;
From a JVM-thread perspective, two kinds of threads are involved:
- AttachListener: an internal JVM thread that dynamically loads the javaagent and runs its logic;
- the other JVM threads: async-profiler enables profiling for all JVM threads, and each thread captures its own call stack;
II. main
src/main/main.cpp: the program entry point of asprof
- parse the command-line arguments;
- run jattach to attach the agent with the start command; the arguments look like: start,quiet,file=<output file>,,log=<log file>;
- register signal handlers (for kill <pid> and Ctrl+C) so profiling can be stopped early;
- sleep for the duration given by -d;
- run jattach again to attach the agent with the stop command;
static void sigint_handler(int sig) {
end_time = 0;
}
int main(int argc, const char** argv) {
// ... argument parsing
// start,quiet,file=/tmp/asprof.{asprof pid}.{target pid},,log=/tmp/asprof-log.{asprof pid}.{target pid}
run_jattach(pid, String("start,quiet,file=") << file << "," << output << format << params << ",log=" << logfile);
fprintf(stderr, "Profiling for %d seconds\n", duration);
// compute the profiling end time
end_time = time_micros() + duration * 1000000ULL;
// register signal handlers: if the user kills the asprof process, end_time becomes 0 and the sleep loop exits early
signal(SIGINT, sigint_handler);
signal(SIGTERM, sigint_handler);
// sleep until the profiling duration elapses
while (time_micros() < end_time) {
if (kill(pid, 0) != 0) {
fprintf(stderr, "Process exited\n");
if (use_tmp_file) print_file(file, STDOUT_FILENO);
return 0;
}
sleep(1);
}
fprintf(stderr, end_time != 0 ? "Done\n" : "Interrupted\n");
signal(SIGINT, SIG_DFL);
// stop,file=/tmp/asprof.{asprof pid}.{target pid},,log=/tmp/asprof-log.{asprof pid}.{target pid}
run_jattach(pid, String("stop,file=") << file << "," << output << format << ",log=" << logfile);
}
src/main/main.cpp: run_jattach forks a child process to perform the attach, while the parent blocks until the child exits.
Four arguments are passed:
- load: a constant meaning "load a javaagent", which maps to the JDK's load_agent function;
- libpath: the location of libasyncProfiler.so;
- true: whether libpath is an absolute path;
- cmd.str(): the agent arguments;
static void run_jattach(int pid, String& cmd) {
pid_t child = fork();
if (child == -1) {
error("fork failed", errno);
}
if (child == 0) {
// child process
const char* argv[] = {"load", libpath.str(), libpath.str()[0] == '/' ? "true" : "false", cmd.str()};
exit(jattach(pid, 4, argv, 0));
} else {
// parent process
int ret = wait_for_exit(child);
if (ret != 0) {
print_file(logfile, STDERR_FILENO);
exit(WEXITSTATUS(ret));
}
print_file(logfile, STDERR_FILENO);
// if -f was not specified, read /tmp/asprof.{asprof pid}.{target pid} and write it to stdout;
// i.e. the javaagent writes to a tmp file, and the asprof process reads that tmp file and prints it to stdout
if (use_tmp_file) print_file(file, STDOUT_FILENO);
}
}
III. jattach
This is the agent-attach logic, written in C; it is essentially the same as what VirtualMachine#attach(pid) does.
Dynamic attach relies on the JVM process opening a unix domain socket so that the attaching process can communicate with it.
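For intuition, here is a minimal sketch of what connecting to that socket involves (an illustration only, not the real jattach connect_socket, which additionally handles PID namespaces and alternative temp directories):
// Minimal sketch (not the real jattach code): connect to the UNIX domain
// socket that the target JVM opens at /tmp/.java_pid<pid>.
#include <cstdio>
#include <cstring>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

static int connect_to_jvm(int pid) {
    int fd = socket(AF_UNIX, SOCK_STREAM, 0);
    if (fd == -1) {
        return -1;
    }

    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    snprintf(addr.sun_path, sizeof(addr.sun_path), "/tmp/.java_pid%d", pid);

    if (connect(fd, (struct sockaddr*)&addr, sizeof(addr)) == -1) {
        close(fd);
        return -1;
    }
    return fd;  // the "load" command and its arguments are then written to this fd
}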
jattach/jattach_hotspot.c: in the attaching process, jattach_hotspot loads the javaagent (libasyncProfiler.so).
1) check_socket: check whether the socket exists, normally the file /tmp/.java_pid{pid};
2) start_attach_mechanism: if the socket does not exist, trigger the attach mechanism;
3) connect_socket: connect to the socket;
4) write_command: the attaching process sends the load command to the JVM, naming the javaagent and its arguments;
5) read_response: once the JVM has finished running the javaagent, the attaching process reads the result from the socket;
int jattach_hotspot(int pid, int nspid, int argc, char** argv, int print_output) {
// check_socket: check whether the socket exists
// start_attach_mechanism: if the socket does not exist, trigger the attach mechanism
if (check_socket(nspid) != 0 && start_attach_mechanism(pid, nspid) != 0) {
perror("Could not start attach mechanism");
return 1;
}
// connect to the socket
int fd = connect_socket(nspid);
if (fd == -1) {
perror("Could not connect to socket");
return 1;
}
if (print_output) {
printf("Connected to remote JVM\n");
}
// write the command and its arguments to the socket
if (write_command(fd, argc, argv) != 0) {
perror("Error writing to socket");
close(fd);
return 1;
}
// read the response from the socket
int result = read_response(fd, argc, argv, print_output);
close(fd);
return result;
}
jattach/jattach_hotspot.c: start_attach_mechanism uses the .attach_pid file plus a SIGQUIT signal to ask the JVM process to open the socket.
1) create the file /proc/{pid}/cwd/.attach_pid{pid};
2) send SIGQUIT to the target JVM process;
3) sleep and call check_socket to see whether the socket file has appeared;
4) delete /proc/{pid}/cwd/.attach_pid{pid};
static int start_attach_mechanism(int pid, int nspid) {
char path[MAX_PATH];
snprintf(path, sizeof(path), "/proc/%d/cwd/.attach_pid%d", mnt_changed > 0 ? nspid : pid, nspid);
int fd = creat(path, 0660);
kill(pid, SIGQUIT);
struct timespec ts = {0, 20000000};
int result;
do {
nanosleep(&ts, NULL);
result = check_socket(nspid);
} while (result != 0 && (ts.tv_nsec += 20000000) < 500000000);
unlink(path);
return result;
}
src/hotspot/share/runtime/os.cpp: the JVM-side signal handling thread listens for SIGQUIT; if the .attach_pid file exists, it creates the AttachListener thread.
static void signal_thread_entry(JavaThread* thread, TRAPS) {
os::set_priority(thread, NearMaxPriority);
while (true) {
int sig;
{
sig = os::signal_wait();
}
if (sig == os::sigexitnum_pd()) {
return;
}
switch (sig) {
case SIGBREAK: { // SIGQUIT
if (!DisableAttachMechanism) {
// transition the state from AL_NOT_INITIALIZED to AL_INITIALIZING
AttachListenerState cur_state = AttachListener::transit_state(AL_INITIALIZING, AL_NOT_INITIALIZED);
if (cur_state == AL_INITIALIZING) {
continue;
} else if (cur_state == AL_NOT_INITIALIZED) {
// create the AttachListener thread
if (AttachListener::is_init_trigger()) {
continue;
} else {
AttachListener::set_state(AL_NOT_INITIALIZED);
}
} else if (AttachListener::check_socket_file()) {
continue;
}
}
}
// ...
}
}
}
// src/hotspot/os/linux/attachListener_linux.cpp
bool AttachListener::is_init_trigger() {
char fn[PATH_MAX + 1];
int ret;
struct stat64 st;
sprintf(fn, ".attach_pid%d", os::current_process_id());
RESTARTABLE(::stat64(fn, &st), ret);
if (ret == 0) {
// the .attach_pid file exists
if (os::Posix::matches_effective_uid_or_root(st.st_uid)) {
init(); // create the AttachListener thread
return true;
}
}
return false;
}
// src/hotspot/share/services/attachListener.cpp
void AttachListener::init() {
const char thread_name[] = "Attach Listener";
// ...
JavaThread* listener_thread = new JavaThread(&attach_listener_thread_entry);
// ...
Thread::start(listener_thread);
}
src/hotspot/share/services/attachListener.cpp: attach_listener_thread_entry is the body of the AttachListener thread: it creates the socket, reads a command from it, executes the matching function (the load command maps to load_agent), and finally writes the result back to the socket.
static AttachOperationFunctionInfo funcs[] = {
{ "agentProperties", get_agent_properties },
{ "datadump", data_dump },
{ "dumpheap", dump_heap },
{ "load", load_agent },
{ "properties", get_system_properties },
{ "threaddump", thread_dump },
{ "inspectheap", heap_inspection },
{ "setflag", set_flag },
{ "printflag", print_flag },
{ "jcmd", jcmd },
{ NULL, NULL }
};
static void attach_listener_thread_entry(JavaThread* thread, TRAPS) {
// create the socket --- /tmp/.java_pid{pid}
if (AttachListener::pd_init() != 0) {
AttachListener::set_state(AL_NOT_INITIALIZED);
return;
}
// mark the AttachListener state as AL_INITIALIZED
AttachListener::set_initialized();
for (;;) {
// read from the socket and wrap the request as an AttachOperation
AttachOperation* op = AttachListener::dequeue();
AttachOperationFunctionInfo* info = NULL;
for (int i=0; funcs[i].name != NULL; i++) {
const char* name = funcs[i].name;
// match the command to the function to execute
if (strcmp(op->name(), name) == 0) {
info = &(funcs[i]);
break;
}
}
// invoke the target function --- load_agent
res = (info->func)(op, &st);
// write the function result back to the socket
op->complete(res, &st);
}
}
src/hotspot/share/services/attachListener.cpp: load_agent receives the arguments of the load command: agent (the agent location), absParam (whether the path is absolute), and options (the arguments passed to the agent).
static jint load_agent(AttachOperation* op, outputStream* out) {
const char* agent = op->arg(0);
const char* absParam = op->arg(1);
const char* options = op->arg(2);
return JvmtiExport::load_agent_library(agent, absParam, options, out);
}
src/hotspot/share/prims/jvmtiExport.cpp: load_agent_library loads the agent library and invokes its Agent_OnAttach function.
jint JvmtiExport::load_agent_library(const char *agent, const char *absParam,
const char *options, outputStream* st) {
char ebuf[1024] = {0};
void* library = NULL;
jint result = JNI_ERR;
const char *on_attach_symbols[] = AGENT_ONATTACH_SYMBOLS;
size_t num_symbol_entries = ARRAY_SIZE(on_attach_symbols);
bool is_absolute_path = (absParam != NULL) && (strcmp(absParam,"true")==0);
// load the agent library
AgentLibrary *agent_lib = new AgentLibrary(agent, options, is_absolute_path, NULL);
if (!os::find_builtin_agent(agent_lib, on_attach_symbols, num_symbol_entries)) {
if (is_absolute_path) {
library = os::dll_load(agent, ebuf, sizeof ebuf);
}
}
if (library != NULL) {
agent_lib->set_os_lib(library);
agent_lib->set_valid();
}
if (agent_lib->valid()) {
OnAttachEntry_t on_attach_entry = NULL;
on_attach_entry = CAST_TO_FN_PTR(OnAttachEntry_t, os::find_agent_function(agent_lib, false, on_attach_symbols, num_symbol_entries));
extern struct JavaVM_ main_vm;
// invoke the agent's Agent_OnAttach function
result = (*on_attach_entry)(&main_vm, (char*)options, NULL);
}
return result;
}
IV. libasyncProfiler
src/vmEntry.cpp: the entry point of the javaagent shipped by async-profiler; from here on, everything runs inside the target JVM process.
extern "C" DLLEXPORT jint JNICALL
Agent_OnAttach(JavaVM* vm, char* options, void* reserved) {
Arguments args;
// parse the arguments
Error error = args.parse(options);
// initialize
if (!VM::init(vm, true)) {
return COMMAND_ERROR;
}
// run profiling
error = Profiler::instance()->run(args);
if (error) {
return COMMAND_ERROR;
}
return 0;
}
1. init
src/vmEntry.cpp: init performs the initialization and prepares the required data:
1) it runs only once, even across multiple attaches;
2) the JavaVM gives access to jvmti (JVM Tool Interface), used to call into the JVM;
3) dlopen locates libjvm.so and resolves the AsyncGetCallTrace function;
4) updateSymbols initializes the non-kernel symbols;
5) several hooks are registered via jvmti;
6) loadAllMethodIDs triggers jmethodID allocation;
bool VM::init(JavaVM* vm, bool attach) {
// runs only once across multiple attaches
if (_jvmti != NULL) return true;
_vm = vm;
// obtain the jvmti environment
if (_vm->GetEnv((void**)&_jvmti, JVMTI_VERSION_1_0) != 0) {
return false;
}
// locate libjvm.so
void* libjvm = RTLD_DEFAULT;
if (OS::isLinux() && (libjvm = dlopen("libjvm.so", RTLD_LAZY)) == NULL) {
libjvm = RTLD_DEFAULT;
}
// resolve AsyncGetCallTrace, needed later to capture Java stacks
_asyncGetCallTrace = (AsyncGetCallTrace)dlsym(libjvm, "AsyncGetCallTrace");
Profiler* profiler = Profiler::instance();
// initialize the JVM's non-kernel symbols, needed later for native stacks
if (VMStructs::libjvm() == NULL) {
profiler->updateSymbols(false);
VMStructs::init(profiler->findLibraryByAddress((const void*)_asyncGetCallTrace));
}
jvmtiCapabilities capabilities = {0};
capabilities.can_generate_all_class_hook_events = 1;
capabilities.can_retransform_classes = 1;
capabilities.can_retransform_any_class = isOpenJ9() ? 0 : 1;
capabilities.can_generate_vm_object_alloc_events = isOpenJ9() ? 1 : 0;
capabilities.can_get_bytecodes = 1;
capabilities.can_get_constant_pool = 1;
capabilities.can_get_source_file_name = 1;
capabilities.can_get_line_numbers = 1;
capabilities.can_generate_compiled_method_load_events = 1;
capabilities.can_generate_monitor_events = 1;
capabilities.can_generate_garbage_collection_events = 1;
capabilities.can_tag_objects = 1;
_jvmti->AddCapabilities(&capabilities);
jvmtiEventCallbacks callbacks = {0};
callbacks.VMInit = VMInit;
callbacks.VMDeath = VMDeath;
// 1
callbacks.ClassLoad = ClassLoad;
// 2
callbacks.ClassPrepare = ClassPrepare;
callbacks.ClassFileLoadHook = Instrument::ClassFileLoadHook;
callbacks.CompiledMethodLoad = Profiler::CompiledMethodLoad;
callbacks.DynamicCodeGenerated = Profiler::DynamicCodeGenerated;
callbacks.ThreadStart = Profiler::ThreadStart;
callbacks.ThreadEnd = Profiler::ThreadEnd;
callbacks.MonitorContendedEnter = LockTracer::MonitorContendedEnter;
callbacks.MonitorContendedEntered = LockTracer::MonitorContendedEntered;
callbacks.VMObjectAlloc = J9ObjectSampler::VMObjectAlloc;
callbacks.SampledObjectAlloc = ObjectSampler::SampledObjectAlloc;
callbacks.GarbageCollectionStart = ObjectSampler::GarbageCollectionStart;
callbacks.GarbageCollectionFinish = Profiler::GarbageCollectionFinish;
_jvmti->SetEventCallbacks(&callbacks, sizeof(callbacks));
_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_VM_DEATH, NULL);
// 1
_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_CLASS_LOAD, NULL);
// 2
_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_CLASS_PREPARE, NULL);
_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_DYNAMIC_CODE_GENERATED, NULL);
_jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_GARBAGE_COLLECTION_FINISH, NULL);
// 3
loadAllMethodIDs(jvmti(), jni());
_jvmti->GenerateEvents(JVMTI_EVENT_DYNAMIC_CODE_GENERATED);
_jvmti->GenerateEvents(JVMTI_EVENT_COMPILED_METHOD_LOAD);
return true;
}
Two of these hooks deserve special attention; both are implemented in vmEntry.h.
The ClassLoad hook is an empty implementation. Its only purpose is that when AsyncGetCallTrace later walks a Java stack, it checks whether a ClassLoad hook is registered and only succeeds if one is.
static void JNICALL ClassLoad(jvmtiEnv* jvmti, JNIEnv* jni, jthread thread, jclass klass) {
// Needed only for AsyncGetCallTrace support
}
See src/hotspot/share/prims/forte.cpp: only with a ClassLoad hook registered does should_post_class_load return true, allowing AsyncGetCallTrace to return a Java stack.
JNIEXPORT
void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* ucontext) {
JavaThread* thread;
if (!JvmtiExport::should_post_class_load()) {
trace->num_frames = ticks_no_class_load; // -1
return;
}
// ....
}
The ClassPrepare hook fires once a class's methods and fields have been fully prepared.
Its purpose is the same as loadAllMethodIDs: to trigger jmethodID allocation.
static void JNICALL ClassPrepare(jvmtiEnv* jvmti, JNIEnv* jni, jthread thread, jclass klass) {
loadMethodIDs(jvmti, jni, klass);
}
loadAllMethodIDs in init ensures that jmethodIDs are allocated for the classes already loaded;
the ClassPrepare hook ensures that classes loaded later also get their jmethodIDs allocated.
void VM::loadAllMethodIDs(jvmtiEnv* jvmti, JNIEnv* jni) {
jint class_count;
jclass* classes;
// iterate over all loaded classes
if (jvmti->GetLoadedClasses(&class_count, &classes) == 0) {
for (int i = 0; i < class_count; i++) {
// process all methods of this class
loadMethodIDs(jvmti, jni, classes[i]);
}
// release the classes array
jvmti->Deallocate((unsigned char*)classes);
}
}
void VM::loadMethodIDs(jvmtiEnv* jvmti, JNIEnv* jni, jclass klass) {
jint method_count;
jmethodID* methods;
// fetch all methods of the class (forcing jmethodID allocation), then release the array
if (jvmti->GetClassMethods(klass, &method_count, &methods) == 0) {
jvmti->Deallocate((unsigned char*)methods);
}
}
A jmethodID is essentially a pointer to a Method. jmethodIDs are cached, and GetClassMethods forces the Method pointers into that cache so that AsyncGetCallTrace can later return valid jmethodIDs; given jvmti plus a jmethodID, the method name can be resolved.
Creating a jmethodID in the JVM: see src/hotspot/share/oops/method.cpp#make_jmethod_id.
jmethodID Method::make_jmethod_id(ClassLoaderData* loader_data, Method* m) {
ClassLoaderData* cld = loader_data;
if (!SafepointSynchronize::is_at_safepoint()) {
MutexLocker ml(JmethodIdCreation_lock, Mutex::_no_safepoint_check_flag);
if (cld->jmethod_ids() == NULL) {
cld->set_jmethod_ids(new JNIMethodBlock());
}
// jmethodID is a pointer to Method*
return (jmethodID)cld->jmethod_ids()->add_method(m);
} else {
if (cld->jmethod_ids() == NULL) {
cld->set_jmethod_ids(new JNIMethodBlock());
}
// jmethodID is a pointer to Method*
return (jmethodID)cld->jmethod_ids()->add_method(m);
}
}
2. start
profiler.cpp: Profiler is a singleton that lives for the entire lifetime of the JVM process and is never destroyed; every attach operates on the same instance.
// The instance is not deleted on purpose, since profiler structures
// can be still accessed concurrently during VM termination
Profiler* const Profiler::_instance = new Profiler();
profiler.cpp: run uses standard output when starting, and the file passed in when stopping.
Error Profiler::run(Arguments& args) {
if (!args.hasOutputFile()) {
// start: standard output
LogWriter out;
return runInternal(args, out);
} else {
// stop: file output
MutexLocker ml(_state_lock);
FileWriter out(args.file());
if (!out.is_open()) {
return Error("Could not open output file");
}
return runInternal(args, out);
}
}
profiler.cpp: runInternal dispatches on the action: start runs the start method, while stop runs stop and then falls through to dump.
Error Profiler::runInternal(Arguments& args, Writer& out) {
switch (args._action) {
case ACTION_START:
case ACTION_RESUME: {
Error error = start(args, args._action == ACTION_START);
if (error) {
return error;
}
if (!args._quiet) {
out << "Profiling started\n";
}
break;
}
case ACTION_STOP: {
Error error = stop();
if (args._output == OUTPUT_NONE) {
if (error) {
return error;
}
if (!args._quiet) {
out << "Profiling stopped after " << uptime() << " seconds. No dump options specified\n";
}
break;
}
// Fall through
}
case ACTION_DUMP: {
Error error = dump(out, args);
if (error) {
return error;
}
break;
}
case ...
default:
break;
}
return Error::OK;
}
profiler.cpp: start
1) clear the caches: since Profiler is a singleton, each start must first clear what the previous profiling run cached;
2) allocate memory for recording call stacks (_calltrace_buffer); to reduce contention, 16 buffers are allocated and each thread is hashed to a different buffer when recording its stack;
3) updateSymbols refreshes the symbol tables used to map native/kernel method addresses to names;
4) select the CPU engine based on the event;
5) start the CPU engine;
6) if multiple events are enabled, start the other engines as well, e.g. alloc for allocation profiling (multiple events are only supported when exporting JFR with -f xxx.jfr);
Error Profiler::start(Arguments& args, bool reset) {
// set _event_mask based on the -e argument(s)
_event_mask = (args._event != NULL ? EM_CPU : 0) |
(args._alloc >= 0 ? EM_ALLOC : 0) |
(args._lock >= 0 ? EM_LOCK : 0) |
(args._wall >= 0 ? EM_WALL : 0) |
(args._nativemem >= 0 ? EM_NATIVEMEM : 0);
// clear cached data
if (reset || _start_time == 0) {
_total_samples = 0;
_total_stack_walk_time = 0;
memset(_failures, 0, sizeof(_failures));
lockAll();
_class_map.clear();
_thread_filter.clear();
_call_trace_storage.clear();
_add_event_frame = args._output != OUTPUT_JFR;
_add_thread_frame = args._threads && args._output != OUTPUT_JFR;
_add_sched_frame = args._sched;
unlockAll();
MutexLocker ml(_thread_names_lock);
_thread_names.clear();
_thread_ids.clear();
}
// allocate memory for recording call stacks
if (_max_stack_depth != args._jstackdepth) {
_max_stack_depth = args._jstackdepth;
size_t nelem = _max_stack_depth + MAX_NATIVE_FRAMES + RESERVED_FRAMES;
for (int i = 0; i < 16; i++) {
free(_calltrace_buffer[i]);
_calltrace_buffer[i] = (CallTraceBuffer*)calloc(nelem, sizeof(CallTraceBuffer));
if (_calltrace_buffer[i] == NULL) {
_max_stack_depth = 0;
return Error("Not enough memory to allocate stack trace buffers (try smaller jstackdepth)");
}
}
}
// refresh symbol tables for mapping native method addresses to names; only perf_events needs kernel symbols, the other engines need only regular symbols
updateSymbols(_engine == &perf_events && !args._alluser);
// select the CPU engine based on the event
_engine = selectEngine(args._event);
_cstack = args._cstack;
// each CPU engine implements its own start method
error = _engine->start(args);
// -e may specify multiple events, in which case additional background collectors are started
if (_event_mask & EM_ALLOC) {
_alloc_engine = selectAllocEngine(args._alloc, args._live);
error = _alloc_engine->start(args);
}
if (_event_mask & EM_LOCK) {
error = lock_tracer.start(args);
}
if (_event_mask & EM_WALL) {
error = wall_clock.start(args);
}
if (_event_mask & EM_NATIVEMEM) {
error = malloc_tracer.start(args);
}
_state = RUNNING;
_start_time = time(NULL);
_epoch++;
return Error::OK;
}
profiler.cpp: selectEngine picks the CPU profiling engine based on the event name.
1) by default (no event given) the event is EVENT_CPU: perf_events is used if the system allows it, otherwise it falls back to ctimer or wall;
2) -e wall/ctimer/itimer selects another engine explicitly;
3) -e <Java method> selects the instrument engine;
Engine* Profiler::selectEngine(const char* event_name) {
if (event_name == NULL) {
return &noop_engine;
} else if (strcmp(event_name, EVENT_CPU) == 0) {
if (FdTransferClient::hasPeer() || PerfEvents::supported()) {
return &perf_events; // default
} else if (CTimer::supported()) {
return &ctimer;
} else {
return &wall_clock;
}
} else if (strcmp(event_name, EVENT_WALL) == 0) {
if (VM::isOpenJ9()) {
return &j9_wall_clock;
} else {
return &wall_clock;
}
} else if (strcmp(event_name, EVENT_CTIMER) == 0) {
return &ctimer;
} else if (strcmp(event_name, EVENT_ITIMER) == 0) {
return &itimer;
} else if (strchr(event_name, '.') != NULL && strchr(event_name, ':') == NULL) {
return &instrument;
} else {
return &perf_events;
}
}
perf_events
perfEvents_linux.cpp: perf_events is Linux-only. Support is checked by actually calling perf_event_open, which is subject to permission restrictions:
1) outside containers, sysctl kernel.perf_event_paranoid=1 must be configured (the default is 2);
2) inside containers, it must be allowed through seccomp or some other mechanism;
bool PerfEvents::supported() {
struct perf_event_attr attr = {0};
attr.size = sizeof(attr);
attr.type = PERF_TYPE_SOFTWARE;
attr.config = PERF_COUNT_SW_CPU_CLOCK;
attr.sample_period = 1000000000;
attr.sample_type = PERF_SAMPLE_CALLCHAIN;
attr.disabled = 1;
int fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
if (fd == -1) {
return false;
}
close(fd);
return true;
}
perfEvents_linux.cpp: start.
1) if kernel.kptr_restrict=0 is not set, kernel symbol addresses are unavailable and only user-space methods can be captured;
2) adjustFDLimit raises the rlimit, because perf_events needs one fd per thread;
3) create the events array, sized to the current maximum thread id in the system;
4) register the signal handler signalHandler;
5) loop over every thread;
Error PerfEvents::start(Arguments& args) {
_event_type = PerfEventType::forName(args._event);
if (!setupThreadHook()) {
return Error("Could not set pthread hook");
}
_target_cpu = args._target_cpu;
_interval = args._interval ? args._interval : _event_type->default_interval;
_cstack = args._cstack;
_signal = args._signal == 0 ? OS::getProfilingSignal(0) : args._signal & 0xff;
_count_overrun = false;
// without kernel.kptr_restrict=0, only user-space stacks can be collected
_alluser = args._alluser;
_kernel_stack = !_alluser && _cstack != CSTACK_NO;
if (_kernel_stack && !Symbols::haveKernelSymbols()) {
Log::warn("Kernel symbols are unavailable due to restrictions. Try\n"
" sysctl kernel.perf_event_paranoid=1\n"
" sysctl kernel.kptr_restrict=0");
_kernel_stack = false;
_alluser = strcmp(args._event, EVENT_CPU) != 0 && !supported();
}
// raise the rlimit, since each thread consumes one fd
adjustFDLimit();
// create the events array, sized to the current maximum thread id
int max_events = OS::getMaxThreadId();
if (max_events != _max_events) {
free(_events);
_events = (PerfEvent*)calloc(max_events, sizeof(PerfEvent));
_max_events = max_events;
}
// register the signal handler
OS::installSignalHandler(_signal, signalHandler);
// Enable pthread hook before traversing currently running threads
enableThreadHook();
// process every thread
int err = createForAllThreads();
return Error::OK;
}
perfEvents_linux.cpp: createForThread enables perf for a single thread:
1) fill in the attr struct (sampling mode, based on the CPU clock), call perf_event_open, and get back an fd;
2) mmap a memory-mapped page for each fd, used later to read the perf data;
3) fcntl configures the fd to raise a signal that each thread handles asynchronously itself;
4) ioctl enables perf;
int PerfEvents::createForThread(int tid) {
if (!__sync_bool_compare_and_swap(&_events[tid]._fd, 0, -1)) {
return -1;
}
PerfEventType* event_type = _event_type;
struct perf_event_attr attr = {0};
attr.size = sizeof(attr);
// i.e. attr.type = PERF_TYPE_SOFTWARE;
attr.type = event_type->type;
// i.e. attr.config = PERF_COUNT_SW_CPU_CLOCK;
attr.config = event_type->config;
attr.config1 = event_type->config1;
attr.config2 = event_type->config2;
attr.precise_ip = 2;
// sample once every n CPU clock ticks
attr.sample_period = _interval;
// collect the call chain
attr.sample_type = PERF_SAMPLE_CALLCHAIN;
// start disabled; enabled later via ioctl
attr.disabled = 1;
attr.wakeup_events = 1;
// if only user-space methods are sampled, exclude kernel methods
if (_alluser) {
attr.exclude_kernel = 1;
}
if (!_kernel_stack) {
attr.exclude_callchain_kernel = 1;
}
// perf_event_open syscall, targeting thread tid
int fd = syscall(__NR_perf_event_open, &attr, tid, _target_cpu, -1, PERF_FLAG_FD_CLOEXEC);
// mmap a page for this fd
void* page = NULL;
if (_kernel_stack || _cstack == CSTACK_DEFAULT || _cstack == CSTACK_LBR) {
page = mmap(NULL, 2 * OS::page_size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
if (page == MAP_FAILED) {
Log::warn("perf_event mmap failed: %s", strerror(errno));
page = NULL;
}
}
// cache the thread's fd and page in the events array
_events[tid].reset();
_events[tid]._fd = fd;
_events[tid]._page = (struct perf_event_mmap_page*)page;
// fcntl: have the perf_event signal delivered to the thread itself
// ioctl: enable the perf_event
struct f_owner_ex ex;
ex.type = F_OWNER_TID;
ex.pid = tid;
int err;
if (fcntl(fd, F_SETFL, O_ASYNC) < 0 || fcntl(fd, F_SETSIG, _signal) < 0 || fcntl(fd, F_SETOWN_EX, &ex) < 0) {
err = errno;
Log::warn("perf_event fcntl failed: %s", strerror(err));
} else if (ioctl(fd, PERF_EVENT_IOC_RESET, 0) < 0 || ioctl(fd, PERF_EVENT_IOC_REFRESH, 1) < 0) {
err = errno;
Log::warn("perf_event ioctl failed: %s", strerror(err));
} else {
return 0;
}
// failure handling...
if (page != NULL) {
munmap(page, 2 * OS::page_size);
_events[tid]._page = NULL;
}
close(fd);
_events[tid]._fd = 0;
return err;
}
ctimer
ctimer.h: ctimer is also Linux-only. Unlike perf_events it cannot capture kernel call stacks, and it is the fallback when perf_events is unavailable.
#ifdef __linux__
class CTimer : public CpuEngine {
private:
static int _max_timers;
static int* _timers;
int createForThread(int tid);
void destroyForThread(int tid);
public:
const char* type() {
return "ctimer";
}
Error check(Arguments& args);
Error start(Arguments& args);
void stop();
static bool supported() {
return true;
}
};
#else
ctimer_linux.cpp: ctimer creates one timer per thread (via the timer_create syscall) that periodically raises a signal, handled by signalHandler.
Error CTimer::start(Arguments& args) {
if (!setupThreadHook()) {
return Error("Could not set pthread hook");
}
_interval = args._interval ? args._interval : DEFAULT_INTERVAL;
_cstack = args._cstack;
_signal = args._signal == 0 ? OS::getProfilingSignal(0) : args._signal & 0xff;
_count_overrun = true;
int max_timers = OS::getMaxThreadId();
if (max_timers != _max_timers) {
free(_timers);
_timers = (int*)calloc(max_timers, sizeof(int));
_max_timers = max_timers;
}
// register the signal handler
OS::installSignalHandler(_signal, signalHandler);
// Enable pthread hook before traversing currently running threads
enableThreadHook();
// loop over every thread of the target process and create a timer
int err = createForAllThreads();
return Error::OK;
}
int CTimer::createForThread(int tid) {
struct sigevent sev;
sev.sigev_value.sival_ptr = NULL;
sev.sigev_signo = _signal;
sev.sigev_notify = SIGEV_THREAD_ID;
((int*)&sev.sigev_notify)[1] = tid;
clockid_t clock = thread_cpu_clock(tid);
int timer;
if (syscall(__NR_timer_create, clock, &sev, &timer) < 0) {
return -1;
}
// Kernel timer ID may start with zero, but we use zero as an empty slot
if (!__sync_bool_compare_and_swap(&_timers[tid], 0, timer + 1)) {
// Lost race
syscall(__NR_timer_delete, timer);
return -1;
}
struct itimerspec ts;
ts.it_interval.tv_sec = (time_t)(_interval / 1000000000);
ts.it_interval.tv_nsec = _interval % 1000000000;
ts.it_value = ts.it_interval;
syscall(__NR_timer_settime, timer, 0, &ts, NULL);
return 0;
}
itimer
itimer works on both Linux and macOS.
itimer.cpp: itimer also registers the signalHandler function and uses setitimer to start a timer that periodically sends a signal. itimer can only deliver the signal to the current process as a whole, so the signals cannot be distributed evenly across threads.
Error ITimer::start(Arguments& args) {
_interval = args._interval ? args._interval : DEFAULT_INTERVAL;
_cstack = args._cstack;
_signal = SIGPROF;
_count_overrun = false;
OS::installSignalHandler(SIGPROF, signalHandler);
time_t sec = _interval / 1000000000;
suseconds_t usec = (_interval % 1000000000) / 1000;
struct itimerval tv = {{sec, usec}, {sec, usec}};
if (setitimer(ITIMER_PROF, &tv, NULL) != 0) {
return Error("ITIMER_PROF is not supported on this system");
}
return Error::OK;
}
wall
wall is typically combined with the -t option to diagnose application startup time.
wall samples threads at a fixed interval regardless of their state (running, sleeping, or blocked).
wallClock.cpp: start registers the signal handler and creates a dedicated thread with pthread_create.
Error WallClock::start(Arguments& args) {
// WALL_BATCH is the default
if (args._wall >= 0 || strcmp(args._event, EVENT_WALL) == 0) {
_mode = args._nobatch ? WALL_LEGACY : WALL_BATCH;
} else {
_mode = CPU_ONLY;
}
_interval = args._wall >= 0 ? args._wall : args._interval;
if (_interval == 0) {
_interval = _mode == CPU_ONLY ? DEFAULT_INTERVAL : DEFAULT_INTERVAL * 5;
}
_signal = args._signal == 0 ? OS::getProfilingSignal(1)
: ((args._signal >> 8) > 0 ? args._signal >> 8 : args._signal);
// register the signal handler
OS::installSignalHandler(_signal, signalHandler);
_running = true;
// create the timer thread
if (pthread_create(&_thread, NULL, threadEntry, this) != 0) {
return Error("Unable to create timer thread");
}
return Error::OK;
}
wallClock.cpp: this thread periodically (sleeping in between) samples the stacks of a batch of threads:
1) a signal is sent once a thread has consumed more than 10000 ns of CPU time;
2) if a thread never consumes CPU, a signal is still sent after it has been skipped 1000 times;
void WallClock::timerLoop() {
int self = OS::threadId();
ThreadFilter* thread_filter = Profiler::instance()->threadFilter();
bool thread_filter_enabled = thread_filter->enabled();
Mode mode = _mode;
ThreadSleepMap thread_sleep_state;
// list all threads of the current JVM process
ThreadList* thread_list = OS::listThreads();
_thread_cpu_time_buf.reset();
u64 cycle_start_time = OS::nanotime();
while (_running) {
bool enabled = _enabled;
// signal at most THREADS_PER_TICK = 8 threads per tick
for (int signaled_threads = 0; signaled_threads < THREADS_PER_TICK && thread_list->hasNext(); ) {
int thread_id = thread_list->next();
if (thread_id == self || thread_id <= 0) {
continue;
}
if (thread_filter_enabled && !thread_filter->accept(thread_id)) {
continue;
}
if (mode == CPU_ONLY) {
if (!enabled || OS::threadState(thread_id) == THREAD_SLEEPING) {
continue;
}
} else if (mode == WALL_BATCH) {
// default mode
// if (current cpu time - previous cpu time) <= 10000 ns, keep accumulating into ThreadSleepState
ThreadSleepState& tss = thread_sleep_state[thread_id];
u64 new_thread_cpu_time = enabled ? OS::threadCpuTime(thread_id) : 0;
if (new_thread_cpu_time != 0 && new_thread_cpu_time - tss.last_cpu_time <= RUNNABLE_THRESHOLD_NS) { // <= 10000ns
if (++tss.counter < MAX_IDLE_BATCH) { // accumulate at most 1000 times
if (tss.counter == 1) tss.start_time = TSC::ticks();
continue;
}
}
// JFR-related code omitted
// ...
}
// enough CPU time accumulated, signal the target thread
if (enabled && OS::sendSignalToThread(thread_id, _signal)) {
signaled_threads++;
}
}
// sleep for a while
u64 current_time = OS::nanotime();
if (thread_list->hasNext()) {
// this round has not yet iterated over all threads
long long sleep_time = cycle_start_time + (u64)_interval * thread_list->index() / thread_list->count() - current_time;
OS::sleep(sleep_time < MIN_INTERVAL ? MIN_INTERVAL : sleep_time);
} else {
// this round has finished iterating over all threads
cycle_start_time += (u64)_interval;
long long sleep_time = cycle_start_time - current_time;
if (sleep_time < MIN_INTERVAL) {
cycle_start_time = current_time + MIN_INTERVAL;
sleep_time = MIN_INTERVAL;
}
OS::sleep(sleep_time);
thread_list->update();
}
_thread_cpu_time_buf.drain(thread_sleep_state);
}
// ... resource cleanup
}
Java method
For -e com.x.y.z.XXXmethod (profiling a Java method), the original Java method is instrumented: a call to Instrument#recordSample is inserted at the start of the method.
import one.profiler.Instrument;
private void XXXmethod() {
Instrument.recordSample();
// original business logic
}
Instrument#recordSample is a native method provided by async-profiler.
public class Instrument {
private Instrument() {
}
public static native void recordSample();
}
The ClassFileLoadHook callback is registered during init in vmEntry.cpp.
bool VM::init(JavaVM* vm, bool attach) {
// runs only once across multiple attaches
if (_jvmti != NULL) return true;
_vm = vm;
// obtain the jvmti environment
if (_vm->GetEnv((void**)&_jvmti, JVMTI_VERSION_1_0) != 0) {
return false;
}
// ...
jvmtiEventCallbacks callbacks = {0};
callbacks.VMInit = VMInit;
callbacks.VMDeath = VMDeath;
callbacks.ClassLoad = ClassLoad;
callbacks.ClassPrepare = ClassPrepare;
// class file load callback
callbacks.ClassFileLoadHook = Instrument::ClassFileLoadHook;
_jvmti->SetEventCallbacks(&callbacks, sizeof(callbacks));
// ...
return true;
}
instrument.cpp: if profiling was not started with -e <Java method>, the class transformation here does nothing.
void JNICALL Instrument::ClassFileLoadHook(jvmtiEnv* jvmti, JNIEnv* jni,
jclass class_being_redefined, jobject loader,
const char* name, jobject protection_domain,
jint class_data_len, const u8* class_data,
jint* new_class_data_len, u8** new_class_data) {
// no -e <Java method> specified, skip
if (!_running) return;
if (name == NULL || strcmp(name, _target_class) == 0) {
BytecodeRewriter rewriter(class_data, class_data_len, _target_class);
rewriter.rewrite(new_class_data, new_class_data_len);
}
}
instrument.cpp: just like an ordinary javaagent, if the target class is already loaded it is retransformed here. In addition, check uses JNI to define the Instrument#recordSample native method.
Error Instrument::start(Arguments& args) {
// define the Instrument#recordSample native method
Error error = check(args);
// parse the target class from the event
setupTargetClassAndMethod(args._event);
_interval = args._interval ? args._interval : 1;
_calls = 0;
_running = true;
// enable the CLASS_FILE_LOAD_HOOK event
jvmtiEnv* jvmti = VM::jvmti();
jvmti->SetEventNotificationMode(JVMTI_ENABLE, JVMTI_EVENT_CLASS_FILE_LOAD_HOOK, NULL);
// if the target class is already loaded, retransform it to apply the instrumentation
retransformMatchedClasses(jvmti);
return Error::OK;
}
Error Instrument::check(Arguments& args) {
if (!_instrument_class_loaded) {
JNIEnv* jni = VM::jni();
const JNINativeMethod native_method = {(char*)"recordSample", (char*)"()V", (void*)recordSample};
jclass cls = jni->DefineClass(INSTRUMENT_NAME, NULL, (const jbyte*)INSTRUMENT_CLASS, INCBIN_SIZEOF(INSTRUMENT_CLASS));
// register the recordSample implementation via JNI
if (cls == NULL || jni->RegisterNatives(cls, &native_method, 1) != 0) {
jni->ExceptionDescribe();
return Error("Could not load Instrument class");
}
_instrument_class_loaded = true;
}
return Error::OK;
}
instrument.cpp: the native method is implemented as follows. Unlike the CPU engines above, no signal is involved; recordSample fires directly and records the call stack.
void JNICALL Instrument::recordSample(JNIEnv* jni, jobject unused) {
if (!_enabled) return;
if (_interval <= 1 || ((atomicInc(_calls) + 1) % _interval) == 0) {
ExecutionEvent event(TSC::ticks());
Profiler::instance()->recordSample(NULL, _interval, INSTRUMENTED_METHOD, &event);
}
}
3. Signal handling
The engines raise signals in different ways, but in the end a signal handler captures the thread's call stack; the thread that handles the signal is the JVM thread itself.
perfEvents_linux.cpp: perf_events
void PerfEvents::signalHandler(int signo, siginfo_t* siginfo, void* ucontext) {
if (siginfo->si_code <= 0) {
// Looks like an external signal; don't treat as a profiling event
return;
}
ExecutionEvent event(TSC::ticks());
u64 counter = readCounter(siginfo, ucontext);
Profiler::instance()->recordSample(ucontext, counter, PERF_SAMPLE, &event);
// reset the perf_event counter
ioctl(siginfo->si_fd, PERF_EVENT_IOC_RESET, 0);
ioctl(siginfo->si_fd, PERF_EVENT_IOC_REFRESH, 1);
}
cpuEngine.cpp: itimer/ctimer
void CpuEngine::signalHandler(int signo, siginfo_t* siginfo, void* ucontext) {
if (!_enabled) return;
ExecutionEvent event(TSC::ticks());
// Count missed samples when estimating total CPU time
u64 total_cpu_time = _count_overrun ? u64(_interval) * (1 + OS::overrun(siginfo)) : u64(_interval);
Profiler::instance()->recordSample(ucontext, total_cpu_time, EXECUTION_SAMPLE, &event);
}
wallClock.cpp: wall
void WallClock::signalHandler(int signo, siginfo_t* siginfo, void* ucontext) {
if (_mode == WALL_BATCH) { // default
WallClockEvent event;
event._start_time = TSC::ticks();
event._thread_state = getThreadState(ucontext);
event._samples = 1;
u64 trace = Profiler::instance()->recordSample(ucontext, _interval, WALL_CLOCK_SAMPLE, &event);
if (event._thread_state == THREAD_SLEEPING && trace != 0) {
_thread_cpu_time_buf.add(trace);
}
} else {
ExecutionEvent event(TSC::ticks());
event._thread_state = _mode == CPU_ONLY ? THREAD_UNKNOWN : getThreadState(ucontext);
Profiler::instance()->recordSample(ucontext, _interval, EXECUTION_SAMPLE, &event);
}
}
All the CPU engines eventually call Profiler#recordSample, just with different event types:
1) perf_events: PERF_SAMPLE
2) ctimer/itimer: EXECUTION_SAMPLE
3) wall: WALL_CLOCK_SAMPLE
4) Java method: INSTRUMENTED_METHOD
profiler.cpp: recordSample
1) acquire a lock (lock_index) and use _calltrace_buffer[lock_index]._asgct_frames to store the call stack;
2) getNativeTrace captures the non-Java (native) stack;
3) getJavaTraceAsync captures the Java stack;
4) release the lock;
Note: this is the current default stack-walking logic; it is expected to be replaced by --cstack vm (CSTACK_VM), see ISSUE#795.
u64 Profiler::recordSample(void* ucontext, u64 counter, EventType event_type, Event* event) {
// get the current thread id
int tid = fastThreadId();
// acquire a lock
u32 lock_index = getLockIndex(tid);
if (!_locks[lock_index].tryLock() &&
!_locks[lock_index = (lock_index + 1) % 16].tryLock() &&
!_locks[lock_index = (lock_index + 2) % 16].tryLock())
{
return 0;
}
// _calltrace_buffer stores the call stack
ASGCT_CallFrame* frames = _calltrace_buffer[lock_index]->_asgct_frames;
int num_frames = 0;
StackContext java_ctx = {0};
// native stack (non-Java frames)
if (hasNativeStack(event_type)) {
if (_cstack != CSTACK_NO) {
num_frames += getNativeTrace(ucontext, frames + num_frames, event_type, tid, &java_ctx);
}
}
// Java stack
if (_cstack == CSTACK_VMX) {
num_frames += StackWalker::walkVM(ucontext, frames + num_frames, _max_stack_depth, VM_EXPERT);
} else if (event_type <= WALL_CLOCK_SAMPLE) {
if (_cstack == CSTACK_VM) {
num_frames += StackWalker::walkVM(ucontext, frames + num_frames, _max_stack_depth, VM_NORMAL);
} else {
// use AsyncGetCallTrace (AGCT) to capture the Java stack
int java_frames = getJavaTraceAsync(ucontext, frames + num_frames, _max_stack_depth, &java_ctx);
num_frames += java_frames;
}
} else if (event_type >= ALLOC_SAMPLE && event_type <= ALLOC_OUTSIDE_TLAB && _alloc_engine == &alloc_tracer) {
// ..
} else if (event_type == MALLOC_SAMPLE) {
// ..
} else {
// -e <Java method> / lock
int start_depth = event_type == INSTRUMENTED_METHOD ? 1 : 0;
num_frames += getJavaTraceJvmti(jvmti_frames + num_frames, frames + num_frames, start_depth, _max_stack_depth);
}
// store the collected frames in the call-trace cache
u32 call_trace_id = _call_trace_storage.put(num_frames, frames, counter);
_locks[lock_index].unlock();
return (u64)tid << 32 | call_trace_id;
}
Capturing the native stack
profiler.cpp: getNativeTrace goes through PerfEvents#walk for perf_events and StackWalker#walkFP for ctimer/itimer to fill callchain with the native stack; convertNativeTrace then converts the method addresses to names and stores them in frames.
int Profiler::getNativeTrace(void* ucontext, ASGCT_CallFrame* frames, EventType event_type, int tid, StackContext* java_ctx) {
const void* callchain[MAX_NATIVE_FRAMES];
int native_frames;
if (event_type == PERF_SAMPLE) {
// perf_events
native_frames = PerfEvents::walk(tid, ucontext, callchain, MAX_NATIVE_FRAMES, java_ctx);
} else if (_cstack >= CSTACK_VM) {
return 0;
} else if (_cstack == CSTACK_DWARF) {
native_frames = StackWalker::walkDwarf(ucontext, callchain, MAX_NATIVE_FRAMES, java_ctx);
} else {
// ctimer/itimer
native_frames = StackWalker::walkFP(ucontext, callchain, MAX_NATIVE_FRAMES, java_ctx);
}
return convertNativeTrace(native_frames, callchain, frames, event_type);
}
// convert callchain to ASGCT_CallFrame: for native frames, method_id holds the method name and bci is the constant -10
int Profiler::convertNativeTrace(int native_frames, const void** callchain, ASGCT_CallFrame* frames, EventType event_type) {
int depth = 0;
for (int i = 0; i < native_frames; i++) {
const char* current_method_name = findNativeMethod(callchain[i]);
jmethodID current_method = (jmethodID)current_method_name;
frames[depth].bci = BCI_NATIVE_FRAME; // -10
frames[depth].method_id = current_method;
depth++;
}
return depth;
}
perfEvents_linux.cpp: read the perf data and record each ip (instruction pointer) into the callchain array; that array is the call stack.
int PerfEvents::walk(int tid, void* ucontext, const void** callchain, int max_depth, StackContext* java_ctx) {
PerfEvent* event = &_events[tid];
int depth = 0;
// the first mmap page is the metadata page
struct perf_event_mmap_page* page = event->_page;
if (page != NULL) {
// data lives in the range [tail, head)
u64 tail = page->data_tail;
u64 head = page->data_head;
rmb();
RingBuffer ring(page);
// iterate over the perf records
while (tail < head) {
// perf record header (64 bits)
struct perf_event_header* hdr = ring.seek(tail);
// found a perf sample record
if (hdr->type == PERF_RECORD_SAMPLE) {
// iterate over the nr instruction pointers
u64 nr = ring.next();
while (nr-- > 0) {
u64 ip = ring.next();
if (ip < PERF_CONTEXT_MAX) {
const void* iptr = (const void*)ip;
// reached a Java method, stop walking
if (CodeHeap::contains(iptr) || depth >= max_depth) {
java_ctx->pc = iptr;
goto stack_complete;
}
// record the ip into callchain
callchain[depth++] = iptr;
}
}
break;
}
tail += hdr->size;
}
stack_complete:
page->data_tail = head;
}
event->unlock();
// ...
return depth;
}
About the data layout of the mmap'd region:
1) the mapping size is (1 + 2^n) * pagesize, where n is set to 0 in createForThread and pagesize is typically 4096;
2) the first page is a metadata page; the perf data sits between data_tail and data_head (it is actually a ring buffer);
3) each perf record starts with a perf_event_header (type is the record type, size the record size); what follows depends on sample_type, here PERF_SAMPLE_CALLCHAIN, which carries two fields: nr = call-chain depth and ips[nr] = an array of instruction pointers, each being an instruction address (a layout sketch follows below);
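As a layout sketch (the field names follow the kernel ABI in linux/perf_event.h; the struct name CallchainSampleRecord is illustrative only, since the real record is variable-length inside the ring buffer):
#include <linux/perf_event.h>
#include <cstdint>

// One PERF_RECORD_SAMPLE entry as read by walk() when sample_type is
// PERF_SAMPLE_CALLCHAIN. Illustrative only: ips[] is variable-length and the
// record lives in the ring buffer that follows the metadata page.
struct CallchainSampleRecord {
    struct perf_event_header header;  // header.type == PERF_RECORD_SAMPLE, header.size = record size
    uint64_t nr;                      // call-chain depth
    uint64_t ips[1];                  // nr entries: instruction pointers, innermost frame first;
                                      // values >= PERF_CONTEXT_MAX are context markers, skipped by walk()
};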
stackWalker.cpp: for ctimer/itimer/wall, the walk starts from the current context (ucontext), unwinds the stack via the frame pointer (fp), and stores each call address (pc) into the callchain array.
int StackWalker::walkFP(void* ucontext, const void** callchain, int max_depth, StackContext* java_ctx) {
StackFrame frame(ucontext);
// program counter
const void* pc = (const void*)frame.pc();
// frame pointer of the current frame
uintptr_t fp = frame.fp();
// current stack pointer
uintptr_t sp = frame.sp();
uintptr_t bottom = (uintptr_t)&sp + MAX_WALK_SIZE;
int depth = 0;
// Walk until the bottom of the stack or until the first Java frame
while (depth < max_depth) {
// stop when a Java frame is reached
if (CodeHeap::contains(pc) && !(depth == 0 && frame.unwindAtomicStub(pc))) {
java_ctx->set(pc, sp, fp);
break;
}
// record the instruction address
callchain[depth++] = pc;
// Check if the next frame is below on the current stack
if (fp < sp || fp >= sp + MAX_FRAME_SIZE || fp >= bottom) {
break;
}
// Frame pointer must be word aligned
if (!aligned(fp)) {
break;
}
// follow the frame pointer back to the caller's return address
pc = stripPointer(SafeAccess::load((void**)fp + FRAME_PC_SLOT));
if (inDeadZone(pc)) {
break;
}
sp = fp + (FRAME_PC_SLOT + 1) * sizeof(void*);
fp = *(uintptr_t*)fp;
}
return depth;
}
profiler.cpp: findNativeMethod: for native stacks, method addresses are translated to names at collection time.
First the library containing the address is located, then a binary search within that library resolves the method name.
const char* Profiler::findNativeMethod(const void* address) {
CodeCache* lib = findLibraryByAddress(address);
return lib == NULL ? NULL : lib->binarySearch(address);
}
During the profiler's start phase, updateSymbols parses the symbols used by the current process and stores them in the Profiler instance's CodeCacheArray. A CodeCacheArray holds n CodeCaches, each representing the symbols of one library.
void Symbols::parseLibraries(CodeCacheArray* array, bool kernel_symbols) {
// parse kernel symbols
if (kernel_symbols && !haveKernelSymbols()) {
CodeCache* cc = new CodeCache("[kernel]");
parseKernelSymbols(cc);
if (haveKernelSymbols()) {
cc->sort();
array->add(cc);
} else {
delete cc;
}
}
std::unordered_map<u64, SharedLibrary> libs;
// collect all shared libraries
collectSharedLibraries(libs, MAX_NATIVE_LIBS - array->count());
for (auto& it : libs) {
u64 inode = it.first;
_parsed_inodes.insert(inode);
SharedLibrary& lib = it.second;
CodeCache* cc = new CodeCache(lib.file, array->count(), false, lib.map_start, lib.map_end);
// parse the library's methods into the CodeCache ...
free(lib.file);
cc->sort();
applyPatch(cc);
array->add(cc);
}
}
Kernel symbols come from /proc/kallsyms.
cat /proc/kallsyms | head
<address> <symbol type> <symbol name>
ffffc398f9210000 T _stext
ffffc398f9210000 t __pi__stext
ffffc398f9210000 T __irqentry_text_start
...
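For illustration, a minimal sketch of turning this file into an address-to-name table (an assumption of the general approach, not async-profiler's actual parseKernelSymbols):
#include <cstdio>
#include <map>
#include <string>

// Minimal sketch: read /proc/kallsyms and build an address -> symbol-name map.
// With kernel.kptr_restrict != 0 every address reads as 0 and nothing useful
// can be collected.
static std::map<unsigned long long, std::string> readKernelSymbols() {
    std::map<unsigned long long, std::string> symbols;
    FILE* f = fopen("/proc/kallsyms", "r");
    if (f == NULL) return symbols;

    char line[512];
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long long addr;
        char type;
        char name[256];
        // Each line: <address> <type> <name> [module]
        if (sscanf(line, "%llx %c %255s", &addr, &type, name) == 3 && addr != 0) {
            symbols[addr] = name;
        }
    }
    fclose(f);
    return symbols;
}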
Other shared libraries are discovered via /proc/{pid}/maps; the library files themselves are then parsed to obtain the method symbols.
cat /proc/<target pid>/maps
ffff99090000-ffff9a342000 r-xp 00000000 00:25 355243 /usr/lib/jvm/java-17-openjdk-arm64/lib/server/libjvm.so
ffff9a430000-ffff9a461000 rw-p 01390000 00:25 355243 /usr/lib/jvm/java-17-openjdk-arm64/lib/server/libjvm.so
ffff9a748000-ffff9a74a000 r--p 0002e000 00:25 3040794 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
ffff9a74a000-ffff9a74c000 rw-p 00030000 00:25 3040794 /usr/lib/aarch64-linux-gnu/ld-linux-aarch64.so.1
...
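Similarly, a simplified sketch of extracting the executable, file-backed mappings from /proc/<pid>/maps (the real collectSharedLibraries also tracks inodes and map offsets; parsing each library's ELF symbols is omitted here):
#include <cstdio>
#include <string>
#include <vector>

// Minimal sketch: list the executable, file-backed mappings of a process.
// Symbols would then be read from each mapped ELF file (.symtab/.dynsym),
// which this sketch leaves out.
struct MappedLibrary {
    unsigned long long start;
    unsigned long long end;
    std::string path;
};

static std::vector<MappedLibrary> listExecutableMappings(int pid) {
    std::vector<MappedLibrary> result;
    char maps_path[64];
    snprintf(maps_path, sizeof(maps_path), "/proc/%d/maps", pid);
    FILE* f = fopen(maps_path, "r");
    if (f == NULL) return result;

    char line[1024];
    while (fgets(line, sizeof(line), f) != NULL) {
        unsigned long long start, end;
        char perms[8];
        char path[512] = {0};
        // Each line: start-end perms offset dev inode [path]
        int n = sscanf(line, "%llx-%llx %7s %*s %*s %*s %511s", &start, &end, perms, path);
        if (n >= 4 && perms[2] == 'x' && path[0] == '/') {  // executable and file-backed
            result.push_back({start, end, std::string(path)});
        }
    }
    fclose(f);
    return result;
}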
Capturing the Java stack
For -e <Java method>, the Java stack is obtained directly with GetStackTrace provided by jvmti.
int Profiler::getJavaTraceJvmti(jvmtiFrameInfo* jvmti_frames, ASGCT_CallFrame* frames, int start_depth, int max_depth) {
int num_frames = 0;
if (VM::jvmti()->GetStackTrace(NULL, start_depth, max_depth, jvmti_frames, &num_frames) == 0 && num_frames > 0) {
for (int i = 0; i < num_frames; i++) {
jint bci = jvmti_frames[i].location;
frames[i].method_id = jvmti_frames[i].method;
frames[i].bci = bci;
LP64_ONLY(frames[i].padding = 0;)
}
}
return num_frames;
}
Apart from -e <Java method>, all engines (perf_events, ctimer, itimer, wall) capture the Java stack with AsyncGetCallTrace by default.
profiler.cpp: getJavaTraceAsync calls AsyncGetCallTrace exported by libjvm, passing an ASGCT_CallTrace; AsyncGetCallTrace fills the ASGCT_CallFrame array with the Java call stack.
int Profiler::getJavaTraceAsync(void* ucontext, ASGCT_CallFrame* frames, int max_depth, StackContext* java_ctx) {
JNIEnv* jni = VM::jni();
if (jni == NULL) {
return 0;
}
JitWriteProtection jit(false);
ASGCT_CallTrace trace = {jni, 0, frames};
// call AsyncGetCallTrace from libjvm.so
VM::_asyncGetCallTrace(&trace, max_depth, ucontext);
if (trace.num_frames > 0) {
return trace.num_frames;
}
// ... error handling
}
hotspot/share/prims/forte.cpp: the JVM's AsyncGetCallTrace entry point.
AsyncGetCallTrace has many constraints: a thread may only capture its own stack, should_post_class_load requires that a ClassLoad hook be registered via jvmti, no GC may be in progress, and so on.
JNIEXPORT
void AsyncGetCallTrace(ASGCT_CallTrace *trace, jint depth, void* ucontext) {
JavaThread* thread;
if (trace->env_id == NULL ||
(thread = JavaThread::thread_from_jni_environment(trace->env_id)) == NULL ||
thread->is_exiting()) {
trace->num_frames = ticks_thread_exit; // -8
return;
}
if (thread->in_deopt_handler()) {
trace->num_frames = ticks_deopt; // -9
return;
}
assert(JavaThread::current() == thread,
"AsyncGetCallTrace must be called by the current interrupted thread");
if (!JvmtiExport::should_post_class_load()) {
trace->num_frames = ticks_no_class_load; // -1
return;
}
if (Universe::heap()->is_gc_active()) {
trace->num_frames = ticks_GC_active; // -2
return;
}
switch (thread->thread_state()) {
// ...
case _thread_in_Java:
case _thread_in_Java_trans:
{
frame fr;
// first locate the topmost frame fr
if (!thread->pd_get_top_frame_for_signal_handler(&fr, ucontext, true)) {
trace->num_frames = ticks_unknown_Java; // -5 unknown frame
} else {
trace->num_frames = ticks_not_walkable_Java; // -6, non walkable frame by default
// walk down from the top frame and fill in trace
forte_fill_call_trace_given_top(thread, trace, depth, fr);
}
}
break;
default:
trace->num_frames = ticks_unknown_state; // -7
break;
}
}
hotspot/os_cpu/linux_aarch64/thread_linux_aarch64.cpp: pd_get_top_frame obtains the topmost frame; depending on the CPU architecture, it extracts the pc (program counter), sp (stack pointer), and fp (frame pointer) from the ucontext.
bool JavaThread::pd_get_top_frame(frame* fr_addr, void* ucontext, bool isInJava) {
// ...
ucontext_t* uc = (ucontext_t*) ucontext;
intptr_t* ret_fp;
intptr_t* ret_sp;
address addr = os::fetch_frame_from_context(uc, &ret_sp, &ret_fp);
frame ret_frame(ret_sp, ret_fp, addr);
*fr_addr = ret_frame;
return true;
//...
}
address os::fetch_frame_from_context(const void* ucVoid,
intptr_t** ret_sp, intptr_t** ret_fp) {
const ucontext_t* uc = (const ucontext_t*)ucVoid;
address epc = os::Posix::ucontext_get_pc(uc);
*ret_sp = os::Linux::ucontext_get_sp(uc);
*ret_fp = os::Linux::ucontext_get_fp(uc);
return epc;
}
typedef struct ucontext_t
{
unsigned long __ctx(uc_flags);
struct ucontext_t *uc_link;
stack_t uc_stack;
sigset_t uc_sigmask;
mcontext_t uc_mcontext;
} ucontext_t;
typedef struct
{
unsigned long long int __ctx(fault_address);
unsigned long long int __ctx(regs)[31]; // fp is the register at index 29
unsigned long long int __ctx(sp); // sp
unsigned long long int __ctx(pc); // pc
unsigned long long int __ctx(pstate);
unsigned char __reserved[4096] __attribute__ ((__aligned__ (16)));
} mcontext_t;
hotspot/share/prims/forte.cpp: forte_fill_call_trace_given_top walks the Java stack and fills the trace array with jmethodIDs.
static void forte_fill_call_trace_given_top(JavaThread* thd,
ASGCT_CallTrace* trace,
int depth,
frame top_frame) {
NoHandleMark nhm;
frame initial_Java_frame;
Method* method;
int bci = -1;
int count;
count = 0;
// starting from the top frame, find the first Java frame initial_Java_frame and its method
find_initial_Java_frame(thd, &top_frame, &initial_Java_frame, &method, &bci);
// ...
vframeStreamForte st(thd, initial_Java_frame, false);
// walk the Java stack
for (; !st.at_end() && count < depth; st.forte_next(), count++) {
bci = st.bci();
method = st.method();
// ...
// store the jmethodID into trace
trace->frames[count].method_id = method->find_jmethod_id_or_null();
// ...
}
trace->num_frames = count;
return;
}
4. stop
During profiling, Java jmethodIDs are cached in _call_trace_storage; at stop time they have to be converted to method names and written out in the requested format.
jattach attaches libasyncProfiler again, this time with the stop command; after stop completes, dump runs.
Error Profiler::runInternal(Arguments& args, Writer& out) {
switch (args._action) {
case ACTION_START:
case ACTION_RESUME: {
Error error = start(args, args._action == ACTION_START);
break;
}
case ACTION_STOP: {
Error error = stop();
// Fall through
}
case ACTION_DUMP: {
Error error = dump(out, args);
break;
}
// ...
default:
break;
}
return Error::OK;
}
Stopping the engines
profiler.cpp: stop shuts down the engines
Error Profiler::stop(bool restart) {
MutexLocker ml(_state_lock);
if (_state != RUNNING) {
return Error("Profiler is not active");
}
if (_event_mask & EM_WALL) wall_clock.stop();
if (_event_mask & EM_LOCK) lock_tracer.stop();
if (_event_mask & EM_ALLOC) _alloc_engine->stop();
if (_event_mask & EM_NATIVEMEM) malloc_tracer.stop();
_engine->stop();
// ...
_state = IDLE;
return Error::OK;
}
perfEvents_linux.cpp: the perf_events engine disables perf for every thread, releasing the fd and the mmap.
void PerfEvents::destroyForThread(int tid) {
if (tid >= _max_events) {
return;
}
PerfEvent* event = &_events[tid];
int fd = event->_fd;
if (fd > 0 && __sync_bool_compare_and_swap(&event->_fd, fd, 0)) {
ioctl(fd, PERF_EVENT_IOC_DISABLE, 0);
close(fd);
}
if (event->_page != NULL) {
event->lock();
munmap(event->_page, 2 * OS::page_size);
event->_page = NULL;
event->unlock();
}
}
ctimer_linux.cpp: ctimer deletes each thread's timer.
itimer and wall stop in a similar way, so they are not repeated here.
void CTimer::destroyForThread(int tid) {
int timer = _timers[tid];
if (timer != 0 && __sync_bool_compare_and_swap(&_timers[tid], timer--, 0)) {
syscall(__NR_timer_delete, timer);
}
}
instrument.cpp: for -e <Java method>, setting _running = false and retransforming the target class again restores the original class definition.
void Instrument::stop() {
_running = false;
jvmtiEnv* jvmti = VM::jvmti();
retransformMatchedClasses(jvmti); // undo transformation
jvmti->SetEventNotificationMode(JVMTI_DISABLE, JVMTI_EVENT_CLASS_FILE_LOAD_HOOK, NULL);
}
dump
profiler.cpp: dump supports several output formats; the flame graph is the most commonly used.
Note: asprof accepts -o to set the output format, and can also infer it from the -f file extension, e.g. -f x.html produces a flame graph and x.jfr produces JFR.
Error Profiler::dump(Writer& out, Arguments& args) {
MutexLocker ml(_state_lock);
switch (args._output) {
case OUTPUT_COLLAPSED:
// -o collapsed
dumpCollapsed(out, args);
break;
case OUTPUT_FLAMEGRAPH:
// -o flamegraph
dumpFlameGraph(out, args, false);
break;
case OUTPUT_TREE:
// -o tree
dumpFlameGraph(out, args, true);
break;
case OUTPUT_TEXT:
// -o text
dumpText(out, args);
break;
case OUTPUT_JFR:
// -o jfr
if (_state == RUNNING) {
lockAll();
_jfr.flush();
unlockAll();
}
break;
default:
return Error("No output format selected");
}
return Error::OK;
}
Whatever the format, method addresses eventually have to be converted to method names. Take the default text format as an example.
profiler.cpp: dumpText extracts the stacks cached in _call_trace_storage into a collection, then prints all call stacks followed by the top-of-stack methods, both sorted by sample count in descending order.
void Profiler::dumpText(Writer& out, Arguments& args) {
FrameName fn(args, args._style | STYLE_DOTTED, _epoch, _thread_names_lock, _thread_names);
char buf[1024] = {0};
// extract the stacks cached in _call_trace_storage into the samples vector
std::vector<CallTraceSample> samples;
u64 total_counter = 0;
{
std::map<u64, CallTraceSample> map;
_call_trace_storage.collectSamples(map);
samples.reserve(map.size());
for (std::map<u64, CallTraceSample>::const_iterator it = map.begin(); it != map.end(); ++it) {
CallTrace* trace = it->second.trace; // the call stack
u64 counter = it->second.counter; // the sample counter
if (trace == NULL || counter == 0) continue;
total_counter += counter;
if (trace->num_frames == 0 || excludeTrace(&fn, trace)) continue;
samples.push_back(it->second);
}
}
// Print summary
snprintf(buf, sizeof(buf) - 1,
"--- Execution profile ---\n"
"Total samples : %lld\n",
_total_samples);
out << buf;
double cpercent = 100.0 / total_counter;
const char* units_str = activeEngine()->units();
// print call stacks, sorted by sample count in descending order
if (args._dump_traces > 0) {
std::sort(samples.begin(), samples.end(), [](const CallTraceSample& a, const CallTraceSample& b) {
return a.counter > b.counter;
});
int max_count = args._dump_traces;
for (std::vector<CallTraceSample>::const_iterator it = samples.begin(); it != samples.end() && --max_count >= 0; ++it) {
snprintf(buf, sizeof(buf) - 1, "--- %lld %s (%.2f%%), %lld sample%s\n",
it->counter, units_str, it->counter * cpercent,
it->samples, it->samples == 1 ? "" : "s");
out << buf;
CallTrace* trace = it->trace;
for (int j = 0; j < trace->num_frames; j++) {
const char* frame_name = fn.name(trace->frames[j]);
snprintf(buf, sizeof(buf) - 1, " [%2d] %s\n", j, frame_name);
out << buf;
}
out << "\n";
}
}
// print top-of-stack methods, sorted by sample count in descending order
if (args._dump_flat > 0) {
std::map<std::string, MethodSample> histogram;
for (std::vector<CallTraceSample>::const_iterator it = samples.begin(); it != samples.end(); ++it) {
const char* frame_name = fn.name(it->trace->frames[0]);
histogram[frame_name].add(it->samples, it->counter);
}
std::vector<NamedMethodSample> methods(histogram.begin(), histogram.end());
std::sort(methods.begin(), methods.end(), sortByCounter);
snprintf(buf, sizeof(buf) - 1, "%12s percent samples top\n"
" ---------- ------- ------- ---\n", units_str);
out << buf;
int max_count = args._dump_flat;
for (std::vector<NamedMethodSample>::const_iterator it = methods.begin(); it != methods.end() && --max_count >= 0; ++it) {
snprintf(buf, sizeof(buf) - 1, "%12lld %6.2f%% %7lld %s\n",
it->second.counter, it->second.counter * cpercent, it->second.samples, it->first.c_str());
out << buf;
}
}
}
Every format relies on FrameName to convert a frame into a method name.
frameName.cpp:
For native methods, method_id already stores the method name, so it can be output directly (decodeNativeSymbol merely prefixes the library name; ignore that here);
for Java methods there is an extra cache layer.
const char* FrameName::name(ASGCT_CallFrame& frame, bool for_matching) {
switch (frame.bci) {
case BCI_NATIVE_FRAME:
// -10: native frame
return decodeNativeSymbol((const char*)frame.method_id);
// ...
default: {
// Java method; type_suffix can be ignored here
const char* type_suffix = typeSuffix(FrameType::decode(frame.bci));
// cache lookup
JMethodCache::iterator it = _cache.lower_bound(frame.method_id);
if (it != _cache.end() && it->first == frame.method_id) {
it->second[0] = _cache_epoch;
const char* name = it->second.c_str() + 1;
if (type_suffix != NULL) {
return _str.assign(name).append(type_suffix).c_str();
}
return name;
}
// cache miss: resolve method_id to a name
javaMethodName(frame.method_id);
_cache.insert(it, JMethodCache::value_type(frame.method_id, std::string(1, _cache_epoch) + _str));
if (type_suffix != NULL) {
_str += type_suffix;
}
return _str.c_str();
}
}
}
frameName.cpp: use jvmti to obtain the class name and method name from a jmethodID.
void FrameName::javaMethodName(jmethodID method) {
jclass method_class = NULL;
char* class_name = NULL;
char* method_name = NULL;
char* method_sig = NULL;
jvmtiEnv* jvmti = VM::jvmti();
jvmtiError err;
if ((err = jvmti->GetMethodName(method, &method_name, &method_sig, NULL)) == 0 &&
(err = jvmti->GetMethodDeclaringClass(method, &method_class)) == 0 &&
(err = jvmti->GetClassSignature(method_class, &class_name, NULL)) == 0) {
// Trim 'L' and ';' off the class descriptor like 'Ljava/lang/Object;'
javaClassName(class_name + 1, strlen(class_name) - 2, _style);
_str.append(".").append(method_name);
if (_style & STYLE_SIGNATURES) {
if (_style & STYLE_NO_SEMICOLON) {
for (char* s = method_sig; *s; s++) {
if (*s == ';') *s = '|';
}
}
_str.append(method_sig);
}
} else {
char buf[32];
snprintf(buf, sizeof(buf), "[jvmtiError %d]", err);
_str.assign(buf);
}
if (method_class) {
_jni->DeleteLocalRef(method_class);
}
jvmti->Deallocate((unsigned char*)class_name);
jvmti->Deallocate((unsigned char*)method_sig);
jvmti->Deallocate((unsigned char*)method_name);
}
Summary
This chapter has briefly analyzed how async-profiler implements CPU profiling.
async-profiler involves three processes:
1) the asprof process: the command-line tool, which parses the user's arguments and initiates the agent attach;
2) the jattach process: forked from asprof for every agent attach;
3) the JVM process: the process being profiled, which runs the agent logic;
The jattach agent-attach flow is as described above; the key logic lives in the libasyncProfiler.so agent.
The agent is attached twice: the first attach starts profiling, the second stops it and outputs the results.
Initialization runs first, and libasyncProfiler initializes only once no matter how many times it is attached:
1) obtain jvmti from the JavaVM passed in by the JVM;
2) locate the JVM library libjvm.so via dlopen and resolve AsyncGetCallTrace, used later for Java stacks;
3) updateSymbols initializes the non-kernel symbols, used later for native stacks;
4) register several hooks via jvmti: the ClassLoad hook so that AsyncGetCallTrace can be called successfully, and the ClassPrepare hook to keep jmethodIDs allocated;
5) loadAllMethodIDs triggers jmethodID allocation;
Next, the logic differs by the CPU engine the user selected:
1) by default perf_events is preferred, falling back to ctimer, and to wall as a last resort;
2) an engine can be chosen explicitly with -e, e.g. itimer or Java-method-level profiling;
perf_events:
1) perf_event_open is called for every thread of the target process so that signals are raised per thread;
2) it is subject to permission restrictions: outside containers sysctl kernel.perf_event_paranoid=1 must be set (default 2), inside containers seccomp or another mechanism must allow it; kernel symbols additionally require sysctl kernel.kptr_restrict=0;
ctimer:
1) one timer per thread (timer_create syscall) periodically raises a signal;
2) compared with perf_events it cannot sample kernel methods;
itimer:
1) setitimer starts a single timer that periodically sends a signal;
2) compared with ctimer, itimer can only signal the process as a whole, so signals cannot be spread evenly across threads;
wall:
1) pthread_create starts a sampling thread that periodically scans batches of threads and signals them, regardless of their state (running, sleeping, or blocked);
Java method:
1) based on bytecode instrumentation, a call to one.profiler.Instrument#recordSample is inserted before the target method's body;
2) Instrument#recordSample sends no signal; it records the Java stack directly;
Setting up these per-engine signalling mechanisms is the main task of the agent attach.
During profiling, signals keep arriving and call stacks keep being recorded.
Capturing the native stack:
1) perf_events: read the perf data directly to get the instruction addresses;
2) ctimer/itimer/wall: on receiving the signal, each thread unwinds via the frame pointer (fp) to collect instruction addresses (pc);
3) once native instruction addresses are collected, they are converted to method names (and cached) using the symbol information built at initialization;
Capturing the Java stack:
1) the Java-method engine uses jvmti's GetStackTrace;
2) the other engines default to the non-standard AsyncGetCallTrace exported by libjvm.so;
3) a Java stack is a list of jmethodIDs, and a jmethodID is in effect a Method pointer;
When profiling finishes, the agent is attached again with the stop command:
1) the CPU engines are stopped, e.g. perf_events disables perf for every thread;
2) dump writes the results in the format given by -o or -f (text by default); native frames already had their names resolved during profiling, while Java frames need jvmti calls such as GetMethodName with the jmethodID to resolve the names.