Method Trace on Android: Implementation and Applications

Method Trace in Android

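At the Java layer, Android offers developers two ready-to-use Method Trace APIs: the startMethodTracing family in the android.os.Debug class, and the beginSection family in the android.os.Trace class. The difference is that the Debug class can only monitor Java method calls, whereas the Trace class is built on atrace, so its traces cover both Java and native functions of the app and the system. Because atrace sits on top of ftrace, it can also capture detailed CPU activity.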

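This article focuses on Method Trace at the Java layer, so the emphasis is on the underlying implementation of Debug.startMethodTracing. In addition, the approach described in the Extensions section, which samples Java stacks to generate flame graphs, has been open-sourced on GitHub for reference: BlockCanary.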

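Update from the comments: I also wrote a demo to validate the approach of using StackVisitor at the native layer. It mainly follows the Sliver implementation shared by ByteDance; for the demo, see sliver.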

How the System Implements Method Trace

Let's get straight to the point. This section analyzes Method Trace in three phases: starting the trace, tracing in progress, and finishing the trace.

Starting a Trace

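When the Java layer calls Debug.startMethodTracing, the call eventually reaches the native layer, where the API actually invoked is the Trace::Start() function in art/runtime/trace.cc. Its parameters are: the File that collected data is written to, the buffer size, the flags, the TraceOutputMode (how data is written out), the TraceMode (how tracing is implemented), and the sampling interval interval_us.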

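The core logic of this function is to choose a different way of observing method calls depending on the value of TraceMode. TraceMode has two values, kMethodTracing and kSampling, which correspond to calling Debug.startMethodTracing and Debug.startMethodTracingSampling at the Java layer, respectively.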

void Trace::Start(std::unique_ptr<File>&& trace_file_in,
                  size_t buffer_size,
                  int flags,
                  TraceOutputMode output_mode,
                  TraceMode trace_mode,
                  int interval_us) {
  // ... (omitted)
  // Create the Trace instance.
  {
    // Required since EnableMethodTracing calls ConfigureStubs which visits class linker classes.
    gc::ScopedGCCriticalSection gcs(self,
                                    gc::kGcCauseInstrumentation,
                                    gc::kCollectorTypeInstrumentation);
    ScopedSuspendAll ssa(__FUNCTION__);
    MutexLock mu(self, *Locks::trace_lock_);
    if (the_trace_ != nullptr) {
      // A trace instance already exists, so this call is ignored.
      LOG(ERROR) << "Trace already in progress, ignoring this request";
        
    } else {
      enable_stats = (flags & kTraceCountAllocs) != 0;
      the_trace_ = new Trace(trace_file.release(), buffer_size, flags, output_mode, trace_mode);
      if (trace_mode == TraceMode::kSampling) {
        CHECK_PTHREAD_CALL(pthread_create, (&sampling_pthread_, nullptr, &RunSamplingThread,
                                            reinterpret_cast<void*>(interval_us)),
                                            "Sampling profiler thread");
        the_trace_->interval_us_ = interval_us;
      } else {
        runtime->GetInstrumentation()->AddListener(
            the_trace_,
            instrumentation::Instrumentation::kMethodEntered |
                instrumentation::Instrumentation::kMethodExited |
                instrumentation::Instrumentation::kMethodUnwind);
        // TODO: In full-PIC mode, we don't need to fully deopt.
        // TODO: We can only use trampoline entrypoints if we are java-debuggable since in that case
        // we know that inlining and other problematic optimizations are disabled. We might just
        // want to use the trampolines anyway since it is faster. It makes the story with disabling
        // jit-gc more complex though.
        runtime->GetInstrumentation()->EnableMethodTracing(
            kTracerInstrumentationKey, /*needs_interpreter=*/!runtime->IsJavaDebuggable());
      }
    }
  }}

if (the_trace_ != nullptr) {
      // A trace instance already exists, so this call is ignored.
      LOG(ERROR) << "Trace already in progress, ignoring this request";
        
    }

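As the implementation shows, before creating the Trace instance the function checks whether one already exists (the the_trace_ variable). If it does, the call is ignored; only if it does not is the Trace constructor invoked to create the instance. Consequently, calling Trace::Start again before the current trace has finished has no effect, which guarantees that only one trace is active at any time.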

After the instance has been created with new Trace(), Trace::Start picks a different way of observing method calls depending on the TraceMode: sampling (TraceMode::kSampling) or instrumentation (TraceMode::kMethodTracing).

Tracing in Progress

Sampling-Based Trace

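In sampling mode, pthread_create first creates a sampling worker thread that runs the Trace::RunSamplingThread(void* arg) function. Inside that function, all threads are obtained periodically through the Runtime object's GetThreadList, and GetSample is then called on each thread to capture its current call stack.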

void* Trace::RunSamplingThread(void* arg) {
  Runtime* runtime = Runtime::Current();
  intptr_t interval_us = reinterpret_cast<intptr_t>(arg);
  
  while (true) {
    // ... (omitted)
    {
      // ...
      runtime->GetThreadList()->ForEach(GetSample, the_trace);
    }
  }

  runtime->DetachCurrentThread();
  return nullptr;
}

Following GetSample further, it walks the stack through StackVisitor::WalkStack, which is the core of stack capture. The captured frames are stored in stack_trace, and the_trace->CompareAndUpdateStackTrace(thread, stack_trace) is then called to compare and process the data.

static void GetSample(Thread* thread, void* arg) REQUIRES_SHARED(Locks::mutator_lock_) {
  std::vector<ArtMethod*>* const stack_trace = Trace::AllocStackTrace();
  StackVisitor::WalkStack(
      [&](const art::StackVisitor* stack_visitor) REQUIRES_SHARED(Locks::mutator_lock_) {
        ArtMethod* m = stack_visitor->GetMethod();
        // Ignore runtime frames (in particular callee save).
        if (!m->IsRuntimeMethod()) {
          stack_trace->push_back(m);
        }
        return true;
      },
      thread,
      /* context= */ nullptr,
      art::StackVisitor::StackWalkKind::kIncludeInlinedFrames);
  Trace* the_trace = reinterpret_cast<Trace*>(arg);
    
  // Update the stack sample stored for this thread.
  the_trace->CompareAndUpdateStackTrace(thread, stack_trace);
}

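The main job of CompareAndUpdateStackTrace is to compare the newly captured stack_trace with the one from the previous sample in order to infer which methods were entered or exited.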

void Trace::CompareAndUpdateStackTrace(Thread* thread,
                                       std::vector<ArtMethod*>* stack_trace) {
  CHECK_EQ(pthread_self(), sampling_pthread_);
  std::vector<ArtMethod*>* old_stack_trace = thread->GetStackTraceSample();
  // Update the thread's stack trace sample.
  thread->SetStackTraceSample(stack_trace);
  // Read timer clocks to use for all events in this trace.
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  if (old_stack_trace == nullptr) {
    // If there's no previous stack trace sample for this thread, log an entry event for all
    // methods in the trace.
    for (auto rit = stack_trace->rbegin(); rit != stack_trace->rend(); ++rit) {
      LogMethodTraceEvent(thread, *rit, instrumentation::Instrumentation::kMethodEntered,
                          thread_clock_diff, wall_clock_diff);
    }
  } else {
    // If there's a previous stack trace for this thread, diff the traces and emit entry and exit
    // events accordingly.
    auto old_rit = old_stack_trace->rbegin();
    auto rit = stack_trace->rbegin();
    // Iterate bottom-up over both traces until there's a difference between them.
    while (old_rit != old_stack_trace->rend() && rit != stack_trace->rend() && *old_rit == *rit) {
      old_rit++;
      rit++;
    }
    // Iterate top-down over the old trace until the point where they differ, emitting exit events.
    for (auto old_it = old_stack_trace->begin(); old_it != old_rit.base(); ++old_it) {
      LogMethodTraceEvent(thread, *old_it, instrumentation::Instrumentation::kMethodExited,
                          thread_clock_diff, wall_clock_diff);
    }
    // Iterate bottom-up over the new trace from the point where they differ, emitting entry events.
    for (; rit != stack_trace->rend(); ++rit) {
      LogMethodTraceEvent(thread, *rit, instrumentation::Instrumentation::kMethodEntered,
                          thread_clock_diff, wall_clock_diff);
    }
    FreeStackTrace(old_stack_trace);
  }
}

The main comparison logic is as follows:

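First, thread->GetStackTraceSample retrieves the stack trace sample recorded last time, and thread->SetStackTraceSample stores the thread's newest sample.
If thread->GetStackTraceSample returns null, the newest stack trace is traversed directly and LogMethodTraceEvent is called for every method. All of these events are of type kMethodEntered, meaning the methods are already on the stack.
If thread->GetStackTraceSample returns a non-null sample, the newest and previous stack traces are compared to record what changed. Every sample except the first takes this path. The comparison works as follows: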

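-   Walk the old and new stack traces from the bottom of the stack and find the first position where their frames differ, recorded as old_rit and rit.
-   For the old stack trace, record an EXIT event for every method from the top of the stack down to old_rit.
-   For the new stack trace, record an Entered event for every method from rit up to the top of the stack.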

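For example, suppose the previous stack was A B C D E F G (ordered from bottom to top) and the newest one is A B C H B C D. Comparing the two shows that during this sampling interval D E F G were popped and H B C D were pushed. Across successive samples, as long as a method has not been popped (A B C here), its elapsed time is considered to have grown by one sampling interval. For instance, if B C are first found popped at the 5th sample and the sampling period is 50ms, the time spent in B C can be estimated as between 150ms and 250ms. The precision of method timing in sampling mode therefore depends on the sampling period: the shorter the period, the more precise the result.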
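The diff described above can be sketched in isolation. This is a minimal illustration rather than ART's code: `Event` and `DiffStacks` are hypothetical names, methods are plain strings instead of `ArtMethod*`, and stacks are stored bottom-of-stack first, whereas ART stores the top frame first and walks with reverse iterators.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// One emitted event: a method name plus whether it was entered or exited.
struct Event {
  std::string method;
  bool entered;  // true = ENTER, false = EXIT
};

// Diffs two stack samples (bottom-of-stack first) and returns the implied
// events: exits for old frames above the common prefix, innermost first,
// then enters for new frames above the common prefix, outermost first.
std::vector<Event> DiffStacks(const std::vector<std::string>& old_stack,
                              const std::vector<std::string>& new_stack) {
  // Length of the common prefix, counted from the bottom of the stack.
  std::size_t common = 0;
  while (common < old_stack.size() && common < new_stack.size() &&
         old_stack[common] == new_stack[common]) {
    ++common;
  }

  std::vector<Event> events;
  // Old frames that disappeared: walk from the top of the stack down.
  for (std::size_t i = old_stack.size(); i > common; --i) {
    events.push_back({old_stack[i - 1], false});
  }
  // New frames that appeared: walk from the divergence point up.
  for (std::size_t i = common; i < new_stack.size(); ++i) {
    events.push_back({new_stack[i], true});
  }
  return events;
}
```

For the stacks in the example, A B C D E F G followed by A B C H B C D, this yields exit events for G, F, E, D and then enter events for H, B, C, D.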

LogMethodTraceEvent writes out the method change events. Every event record has a fixed size in bytes. When TraceClockSource is kDual (both wall time and CPU time are recorded), each method event contains the following:

A 2-byte thread ID + a 4-byte EncodeTraceMethodAndAction value + a 4-byte thread_clock_diff + a 4-byte wall_clock_diff. Each event record therefore takes 14 bytes, and every field is written in little-endian byte order.
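To make the 14-byte layout concrete, here is a small sketch of how such a record could be packed. `Put2LE`, `Put4LE`, and `EncodeRecord` are hypothetical helpers written for this article, not ART functions. The way the method ID and action are combined into one 32-bit value (action in the low 2 bits) is my assumption about what EncodeTraceMethodAndAction produces and should be checked against the ART source.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Writes a 16-bit value in little-endian byte order.
inline void Put2LE(uint8_t* p, uint16_t v) {
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
}

// Writes a 32-bit value in little-endian byte order.
inline void Put4LE(uint8_t* p, uint32_t v) {
  p[0] = static_cast<uint8_t>(v);
  p[1] = static_cast<uint8_t>(v >> 8);
  p[2] = static_cast<uint8_t>(v >> 16);
  p[3] = static_cast<uint8_t>(v >> 24);
}

// Record size with the dual clock source: tid + method/action + two clocks.
constexpr std::size_t kDualClockRecordSize = 2 + 4 + 4 + 4;

// Assumption: the low 2 bits carry the action, the rest the method ID.
constexpr uint32_t kActionBits = 2;

std::array<uint8_t, kDualClockRecordSize> EncodeRecord(uint16_t tid,
                                                       uint32_t method_id,
                                                       uint32_t action,
                                                       uint32_t thread_clock_diff,
                                                       uint32_t wall_clock_diff) {
  assert(action < (1u << kActionBits));
  std::array<uint8_t, kDualClockRecordSize> rec{};
  Put2LE(rec.data(), tid);                                      // bytes 0..1
  Put4LE(rec.data() + 2, (method_id << kActionBits) | action);  // bytes 2..5
  Put4LE(rec.data() + 6, thread_clock_diff);                    // bytes 6..9
  Put4LE(rec.data() + 10, wall_clock_diff);                     // bytes 10..13
  return rec;
}
```

With a single clock source one of the two 4-byte clock fields is absent, which is why the ART code obtains the record size from GetRecordSize(clock_source_) instead of hard-coding it.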

void Trace::LogMethodTraceEvent(Thread* thread, ArtMethod* method,
                                instrumentation::Instrumentation::InstrumentationEvent event,
                                uint32_t thread_clock_diff, uint32_t wall_clock_diff) {
  // Ensure we always use the non-obsolete version of the method so that entry/exit events have the
  // same pointer value.
  method = method->GetNonObsoleteMethod();

  // Advance cur_offset_ atomically.
  int32_t new_offset;
  int32_t old_offset = 0;

  // In the non-streaming case, we do a busy loop here trying to get
  // an offset to write our record and advance cur_offset_ for the
  // next use.
  if (trace_output_mode_ != TraceOutputMode::kStreaming) {
    old_offset = cur_offset_.load(std::memory_order_relaxed);  // Speculative read
    do {
      new_offset = old_offset + GetRecordSize(clock_source_);
      if (static_cast<size_t>(new_offset) > buffer_size_) {
        overflow_ = true;
        return;
      }
    } while (!cur_offset_.compare_exchange_weak(old_offset, new_offset, std::memory_order_relaxed));
  }

  TraceAction action = kTraceMethodEnter;
  switch (event) {
    case instrumentation::Instrumentation::kMethodEntered:
      action = kTraceMethodEnter;
      break;
    case instrumentation::Instrumentation::kMethodExited:
      action = kTraceMethodExit;
      break;
    case instrumentation::Instrumentation::kMethodUnwind:
      action = kTraceUnroll;
      break;
    default:
      UNIMPLEMENTED(FATAL) << "Unexpected event: " << event;
  }

  uint32_t method_value = EncodeTraceMethodAndAction(method, action);

  // Write data into the tracing buffer (if not streaming) or into a
  // small buffer on the stack (if streaming) which we'll put into the
  // tracing buffer below.
  //
  // These writes to the tracing buffer are synchronised with the
  // future reads that (only) occur under FinishTracing(). The callers
  // of FinishTracing() acquire locks and (implicitly) synchronise
  // the buffer memory.
  uint8_t* ptr;
  static constexpr size_t kPacketSize = 14U;  // The maximum size of data in a packet.
  uint8_t stack_buf[kPacketSize];             // Space to store a packet when in streaming mode.
  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    ptr = stack_buf;
  } else {
    ptr = buf_.get() + old_offset;
  }

  Append2LE(ptr, thread->GetTid());
  Append4LE(ptr + 2, method_value);
  ptr += 6;

  if (UseThreadCpuClock()) {
    Append4LE(ptr, thread_clock_diff);
    ptr += 4;
  }
  if (UseWallClock()) {
    Append4LE(ptr, wall_clock_diff);
  }
  static_assert(kPacketSize == 2 + 4 + 4 + 4, "Packet size incorrect.");

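  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    // ... (streaming-mode bookkeeping omitted)
    WriteToBuf(stack_buf, sizeof(stack_buf));
  }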
}
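The non-streaming branch of LogMethodTraceEvent reserves space in the buffer with a compare-and-swap loop, so that several threads can log events without taking a lock. The sketch below isolates that pattern. `RecordBuffer` and `Reserve` are hypothetical names, and the class only hands out offsets; it does not store any bytes.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-capacity record area shared by several writer threads.
class RecordBuffer {
 public:
  RecordBuffer(std::size_t capacity, std::size_t record_size)
      : capacity_(capacity), record_size_(record_size) {
    assert(record_size_ > 0);
  }

  // Reserves space for one record. Returns the start offset of the reserved
  // slot, or -1 (after setting the overflow flag) if the record does not fit.
  int64_t Reserve() {
    std::size_t old_offset = cur_offset_.load(std::memory_order_relaxed);
    std::size_t new_offset;
    do {
      new_offset = old_offset + record_size_;
      if (new_offset > capacity_) {
        overflow_.store(true, std::memory_order_relaxed);
        return -1;
      }
      // On failure compare_exchange_weak reloads old_offset with the value
      // another thread published, and the loop retries from there.
    } while (!cur_offset_.compare_exchange_weak(old_offset, new_offset,
                                                std::memory_order_relaxed));
    return static_cast<int64_t>(old_offset);
  }

  bool overflowed() const { return overflow_.load(std::memory_order_relaxed); }

 private:
  const std::size_t capacity_;
  const std::size_t record_size_;
  std::atomic<std::size_t> cur_offset_{0};
  std::atomic<bool> overflow_{false};
};
```

Each successful caller owns a disjoint slot, so the record bytes can be written without further synchronization. As the comment in the ART code notes, the later reads are synchronized by the locks taken around FinishTracing().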

Instrumentation-Based Trace

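In TraceMode::kMethodTracing mode, tracing relies on the MethodTracing capability provided by art/runtime/instrumentation. When instrumentation->EnableMethodTracing is called, it internally calls runtime->GetClassLinker()->VisitClasses(&visitor) to iterate over the methods of every class and installs a stub on each method in order to observe method entry and exit.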

void Instrumentation::UpdateStubs() {
  // ...
  UpdateInstrumentationLevel(requested_level);
  // Visit every class. The visitor is an InstallStubsClassVisitor, which installs stubs for all classes.
  if (requested_level > InstrumentationLevel::kInstrumentNothing) {
    InstallStubsClassVisitor visitor(this);
    runtime->GetClassLinker()->VisitClasses(&visitor);
    //...
  } else {
    InstallStubsClassVisitor visitor(this);
    runtime->GetClassLinker()->VisitClasses(&visitor);
    MaybeRestoreInstrumentationStack();
  }
}

InstallStubsClassVisitor eventually calls InstallStubsForMethod, which hooks the entry and exit of each method.

void Instrumentation::InstallStubsForMethod(ArtMethod* method) {
  if (!method->IsInvokable() || method->IsProxyMethod()) {
    return;
  }

  if (IsProxyInit(method)) {
    return;
  }

  if (InterpretOnly(method)) {
    UpdateEntryPoints(method, GetQuickToInterpreterBridge());
    return;
  }

  if (EntryExitStubsInstalled()) {
    // Install the instrumentation entry point if needed.
    if (CodeNeedsEntryExitStub(method->GetEntryPointFromQuickCompiledCode(), method)) {
      UpdateEntryPoints(method, GetQuickInstrumentationEntryPoint());
    }
    return;
  }

  // We're being asked to restore the entrypoints after instrumentation.
  CHECK_EQ(instrumentation_level_, InstrumentationLevel::kInstrumentNothing);
  // We need to have the resolution stub still if the class is not initialized.
  if (NeedsClinitCheckBeforeCall(method) && !method->GetDeclaringClass()->IsVisiblyInitialized()) {
    UpdateEntryPoints(method, GetQuickResolutionStub());
    return;
  }
  UpdateEntryPoints(method, GetOptimizedCodeFor(method));
}

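The details of InstallStubs are somewhat complex and are not analyzed in this article. As one example, for Quick-compiled code, hooks are already reserved in advance through setMethodEntryHook and setMethodExitHook.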

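InstallStubsForMethod installs the entry and exit hooks for each method. The hook implementations call Instrumentation's MethodEnterEvent and MethodExitEvent respectively, which iterate over all listeners registered with Instrumentation and notify them of method entry and exit as events.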

void Instrumentation::MethodEnterEventImpl(Thread* thread, ArtMethod* method) const {
  if (HasMethodEntryListeners()) {
    for (InstrumentationListener* listener : method_entry_listeners_) {
      if (listener != nullptr) {
        listener->MethodEntered(thread, method);
      }
    }
  }
}

Therefore, by registering a listener through Instrumentation's AddListener function, Trace can monitor method entry and exit.

{   // Logic executed when the trace is not sampling-based
    
        runtime->GetInstrumentation()->AddListener(
            the_trace_,
            instrumentation::Instrumentation::kMethodEntered |
                instrumentation::Instrumentation::kMethodExited |
                instrumentation::Instrumentation::kMethodUnwind);
        // TODO: In full-PIC mode, we don't need to fully deopt.
        // TODO: We can only use trampoline entrypoints if we are java-debuggable since in that case
        // we know that inlining and other problematic optimizations are disabled. We might just
        // want to use the trampolines anyway since it is faster. It makes the story with disabling
        // jit-gc more complex though.
        runtime->GetInstrumentation()->EnableMethodTracing(
            kTracerInstrumentationKey, /*needs_interpreter=*/!runtime->IsJavaDebuggable());
 }

Once a method Enter or Exit has been observed, the logic is the same as in sampling mode: LogMethodTraceEvent is called to record the event.

void Trace::MethodEntered(Thread* thread, ArtMethod* method) {
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  LogMethodTraceEvent(thread, method, instrumentation::Instrumentation::kMethodEntered,
                      thread_clock_diff, wall_clock_diff);
}

void Trace::MethodExited(Thread* thread,
                         ArtMethod* method,
                         instrumentation::OptionalFrame frame ATTRIBUTE_UNUSED,
                         JValue& return_value ATTRIBUTE_UNUSED) {
  uint32_t thread_clock_diff = 0;
  uint32_t wall_clock_diff = 0;
  ReadClocks(thread, &thread_clock_diff, &wall_clock_diff);
  LogMethodTraceEvent(thread,
                      method,
                      instrumentation::Instrumentation::kMethodExited,
                      thread_clock_diff,
                      wall_clock_diff);
}
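The relationship between Instrumentation and Trace is a plain listener (observer) pattern with an event mask. The following sketch shows the shape of it under simplifying assumptions: `MethodListener`, `Dispatcher`, and `LoggingListener` are hypothetical names, methods are strings, and there is no threading or stub installation.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Event bits, loosely modelled on kMethodEntered / kMethodExited.
enum EventMask : uint32_t { kEntered = 1u << 0, kExited = 1u << 1 };

// Hypothetical listener interface.
class MethodListener {
 public:
  virtual ~MethodListener() = default;
  virtual void MethodEntered(const std::string& method) = 0;
  virtual void MethodExited(const std::string& method) = 0;
};

// Keeps one listener list per event type and fans events out to them.
class Dispatcher {
 public:
  void AddListener(MethodListener* listener, uint32_t events) {
    assert(listener != nullptr);
    if (events & kEntered) entry_listeners_.push_back(listener);
    if (events & kExited) exit_listeners_.push_back(listener);
  }
  void OnEnter(const std::string& method) const {
    for (MethodListener* l : entry_listeners_) l->MethodEntered(method);
  }
  void OnExit(const std::string& method) const {
    for (MethodListener* l : exit_listeners_) l->MethodExited(method);
  }

 private:
  std::vector<MethodListener*> entry_listeners_;
  std::vector<MethodListener*> exit_listeners_;
};

// Listener that appends to a flat event log, in the role Trace plays when it
// forwards each callback to LogMethodTraceEvent.
class LoggingListener : public MethodListener {
 public:
  void MethodEntered(const std::string& method) override {
    log.push_back("enter " + method);
  }
  void MethodExited(const std::string& method) override {
    log.push_back("exit " + method);
  }
  std::vector<std::string> log;
};
```

In this sketch the hooks would call `OnEnter` and `OnExit`; a listener registered with only `kEntered` never sees exit events, which mirrors how the mask passed to AddListener selects what Trace receives.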

Finish Trace

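When the Java layer calls Debug.stopMethodTracing(), the call reaches the FinishTracing function in trace.cc at the native layer. This function first assembles the header of the trace file, which records basic information such as the trace file version, the elapsed tracing time, the clock type (wall time or CPU time), the number of method calls, the VM type, and the process ID. As an optimization, method events do not record each method's name directly but an internally generated ID that maps to it, so this mapping has to be written into the header as well. This is done by calling DumpMethodList.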

void Trace::FinishTracing() {
  size_t final_offset = 0;
  std::set<ArtMethod*> visited_methods;
  if (trace_output_mode_ == TraceOutputMode::kStreaming) {
    // Clean up.
    MutexLock mu(Thread::Current(), *streaming_lock_);
    STLDeleteValues(&seen_methods_);
  } else {
    final_offset = cur_offset_.load(std::memory_order_relaxed);
    GetVisitedMethods(final_offset, &visited_methods);
  }

  // Compute elapsed time.
  uint64_t elapsed = MicroTime() - start_time_;

  std::ostringstream os;
  // Write the trace version.
  os << StringPrintf("%cversion\n", kTraceTokenChar);
  os << StringPrintf("%d\n", GetTraceVersion(clock_source_));
    
  os << StringPrintf("data-file-overflow=%s\n", overflow_ ? "true" : "false");
  // Write the clock type and the elapsed trace time.
  if (UseThreadCpuClock()) {
    if (UseWallClock()) {
      os << StringPrintf("clock=dual\n");
    } else {
      os << StringPrintf("clock=thread-cpu\n");
    }
  } else {
    os << StringPrintf("clock=wall\n");
  }
  os << StringPrintf("elapsed-time-usec=%" PRIu64 "\n", elapsed);  
  if (trace_output_mode_ != TraceOutputMode::kStreaming) {
    size_t num_records = (final_offset - kTraceHeaderLength) / GetRecordSize(clock_source_);
    os << StringPrintf("num-method-calls=%zd\n", num_records);
  }
  os << StringPrintf("clock-call-overhead-nsec=%d\n", clock_overhead_ns_);
  os << StringPrintf("vm=art\n");
  os << StringPrintf("pid=%d\n", getpid());
  if ((flags_ & kTraceCountAllocs) != 0) {
    os << "alloc-count=" << Runtime::Current()->GetStat(KIND_ALLOCATED_OBJECTS) << "\n";
    os << "alloc-size=" << Runtime::Current()->GetStat(KIND_ALLOCATED_BYTES) << "\n";
    os << "gc-count=" <<  Runtime::Current()->GetStat(KIND_GC_INVOCATIONS) << "\n";
  }
  // Write the thread information.
  os << StringPrintf("%cthreads\n", kTraceTokenChar);
  DumpThreadList(os);
  // Write the method information.
  os << StringPrintf("%cmethods\n", kTraceTokenChar);
  DumpMethodList(os, visited_methods);
  os << StringPrintf("%cend\n", kTraceTokenChar);
  std::string header(os.str());
  //.....  
}

Each entry in the method list has the structure methodId prettyMethodDescriptor methodName methodSignature methodDeclaringClassSourceFile, where methodId is the unique ID assigned to each recorded method during tracing.

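Similarly, because the trace records thread IDs rather than thread names, DumpThreadList is needed to record the mapping from thread ID to thread name. Since the header is only written when the trace ends, a thread may have exited during the trace. To make sure exited threads are still recorded, ThreadList::Unregister(Thread* self) in art/runtime/thread_list.cc calls Trace::StoreExitingThreadInfo when a thread exits, so that its information is saved ahead of time.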

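The resulting trace header is plain text: a version section holding the version number and key=value lines such as clock and elapsed-time-usec, followed by the threads and methods sections and an end marker, as assembled in FinishTracing above.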
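Because the header is line-oriented text, reading its options back takes very little code. The sketch below assumes that sections start with a line beginning with `*` (I am assuming that is the value of kTraceTokenChar) and that the first line after `*version` is the bare version number, as the FinishTracing code suggests. `ParseHeaderOptions` is a hypothetical helper, not part of ART or Android Studio.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>

// Collects the key=value options of the *version section of a trace header.
// The bare version number is stored under the key "version".
std::map<std::string, std::string> ParseHeaderOptions(const std::string& header) {
  std::map<std::string, std::string> options;
  std::istringstream in(header);
  std::string line;
  bool in_version_section = false;
  while (std::getline(in, line)) {
    if (!line.empty() && line[0] == '*') {
      if (in_version_section) break;  // Reached *threads or a later section.
      in_version_section = (line == "*version");
      continue;
    }
    if (!in_version_section) continue;
    const std::size_t eq = line.find('=');
    if (eq == std::string::npos) {
      // A line without '=' right after *version is the version number.
      if (options.count("version") == 0) options["version"] = line;
      continue;
    }
    options[line.substr(0, eq)] = line.substr(eq + 1);
  }
  return options;
}
```

A reader of the binary records would look at the `clock` option first, since it determines whether each record carries one or two 4-byte clock fields.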

Comparing the Two Trace Modes

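Having analyzed how the ART virtual machine implements method tracing in sampling and instrumentation modes, we can roughly compare their strengths and weaknesses. Instrumentation tracks every method Enter and Exit exactly, so its timing is the most accurate and no method is missed, but it also has a larger impact on performance. Sampling infers Enter and Exit from the difference between two samples, so short-lived methods may not be recorded at all, and the precision of the computed timing depends on the sampling interval. In exchange, its performance impact is lower.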

Method trace mode	Performance impact	Accuracy
Instrumentation	Higher	Accurate
Sampling	Low	Moderate

Extensions

How Android Studio Handles Trace Files

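A trace file generated by art/runtime/trace.cc can be dragged into AS and displayed directly as a Top Down view or a flame chart, so I looked into that part of the code. The source for parsing trace files is the VmTraceParser class in the perflib project. After the trace file is parsed, each method is converted into a JavaMethodModel object. JavaMethodModel is a subclass of CaptureNode, and the TopDownNode class merges identical CaptureNode objects, eventually producing a tree that can be rendered as the Top Down view.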

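The data structure behind the AS FlameChart is also converted from TopDownNode, since the two are essentially the same tree structure. The only difference is that the FlameChart draws methods that take longer as wider bars.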
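The merge into a tree can be sketched as follows. This is not Android Studio's code: `CallNode` and `AddSample` are hypothetical names, and the sketch weights nodes by the number of stack samples passing through them, which fits sampled data, whereas a tree built from enter and exit events would be weighted by measured durations.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <string>
#include <vector>

// One node of the merged call tree. `samples` counts how many stack samples
// passed through this node, which a flame graph would draw as the bar width.
struct CallNode {
  std::string method;
  uint64_t samples = 0;
  std::map<std::string, std::unique_ptr<CallNode>> children;
};

// Merges one stack sample (bottom-of-stack first) into the tree under `root`.
// Frames with the same method name under the same parent share one node.
void AddSample(CallNode& root, const std::vector<std::string>& stack) {
  CallNode* node = &root;
  ++node->samples;
  for (const std::string& method : stack) {
    std::unique_ptr<CallNode>& child = node->children[method];
    if (child == nullptr) {
      child = std::make_unique<CallNode>();
      child->method = method;
    }
    node = child.get();
    ++node->samples;
  }
}
```

Rendering top-down means printing the tree as nested rows. Rendering a flame chart means laying out each node's children side by side with a width proportional to `samples`.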

Applications of Method Trace

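In terms of presentation, a flame graph gives methods of different durations different widths, so slow methods can be located more quickly. From the source analysis above we already know that slow methods can be monitored by sampling stacks. The Java layer can itself collect stacks through Thread.getStackTrace, so even without native capabilities we can build a simple Method Trace solution. The implementation is straightforward: start a background thread, sample the main thread's stack at a fixed interval, and by default keep the stack records of the most recent n seconds.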
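The "keep the most recent n seconds" part is a small time-windowed buffer. The sketch below models only that retention logic, in C++ to match the other listings in this article. On Android the frames would come from Thread.getStackTrace in Java. `StackSample` and `RecentSamples` are hypothetical names, and timestamps are assumed to be non-decreasing.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <string>
#include <utility>
#include <vector>

struct StackSample {
  int64_t timestamp_ms;
  std::vector<std::string> frames;
};

// Keeps only the samples taken within the last `window_ms` milliseconds.
class RecentSamples {
 public:
  explicit RecentSamples(int64_t window_ms) : window_ms_(window_ms) {
    assert(window_ms_ > 0);
  }

  // Appends a sample and evicts everything older than the window.
  void Add(StackSample sample) {
    const int64_t cutoff = sample.timestamp_ms - window_ms_;
    samples_.push_back(std::move(sample));
    while (!samples_.empty() && samples_.front().timestamp_ms < cutoff) {
      samples_.pop_front();
    }
  }

  // Returns the samples with timestamps in [from_ms, to_ms], for example the
  // interval in which a slow launch or slow message was detected.
  std::vector<StackSample> Range(int64_t from_ms, int64_t to_ms) const {
    std::vector<StackSample> out;
    for (const StackSample& s : samples_) {
      if (s.timestamp_ms >= from_ms && s.timestamp_ms <= to_ms) out.push_back(s);
    }
    return out;
  }

  std::size_t size() const { return samples_.size(); }

 private:
  const int64_t window_ms_;
  std::deque<StackSample> samples_;
};
```

When a performance problem is detected, the monitor asks for the range covering the problem period and reports those samples, which is the data a flame graph is later built from.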

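As for concrete scenarios, here are a few cases we have already put into practice. When our APM system detects a slow app launch, slow message handling, a slow page launch, or an ANR, in many cases we can only obtain information from the final moment or some coarse per-stage timing, with no visibility into which methods the main thread was executing during that period. By filling in Method Trace data and reporting the stack samples from the period in which the performance problem occurred, the APM platform can show a flame graph for that period when a specific problem is analyzed. Taking app launch monitoring as an example, the corresponding flame graph can be shown for each slow-launch log sample.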

From the flame graph sample above, we can quickly determine that this slow launch was caused by Webview.init.

Case Study: Jank Monitoring

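Here is a concrete implementation case. In offline (pre-release) scenarios, taking jank as an example, we may want to receive a notification as soon as jank occurs and, on tapping it, see the stacks from the janky period displayed directly on the phone as a flame graph. The code for the final implementation is open-sourced on GitHub at github.com/Knight-ZXW/… .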

The demo currently collects only wall time and does not record thread CPU time. I will improve this when time permits.

Implementing Method Trace with StackVisitor

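The approach based on Thread.getStackTrace has a fairly large performance impact. ByteDance shared an approach that samples methods by calling StackVisitor, and I wrote a demo to validate the idea. The core code is on GitHub for reference: github.com/Knight-ZXW/… .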

Summary

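This article first used source analysis to gain a rough understanding of how Android implements Method Trace at the native layer. Instrumentation mode ultimately works by listening for method entry and exit events through Instrumentation's AddListener, while sampling mode starts a thread that runs periodically and captures the method stacks of all threads through StackVisitor. For production (online) use, we can follow the system's sampling approach and collect thread stacks efficiently by hooking and calling the StackVisitor APIs.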

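The Extensions section briefly covered how Android Studio handles Method Trace files. Whether the data comes from instrumentation or sampling, a flame graph lets us quickly locate blocking methods. If you are not very familiar with the native layer, you can also use Thread.getStackTrace at the Java layer to collect a thread's current stack. As an application of Method Trace, this article demonstrated stack sampling for Android jank monitoring (from the Looper Message perspective), which can display method calls directly on the device as a flame graph.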

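Finally, readers interested in APM can follow my performance monitoring column, where I will keep sharing posts on performance monitoring and optimization: juejin.cn/column/7107… . Previous articles in the column: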

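Article	Link
A Look at How Douyin Optimizes Threads	juejin.cn/post/721244…
Another Way to Monitor Android Looper Message Dispatch	juejin.cn/post/713974…
How to Collect System CPU Usage on Newer Android Versions	juejin.cn/post/713503…
Method Trace on Android: Implementation and Applications	juejin.cn/post/710713…
How to Fix Jank and ANRs Caused by SharedPreferences on Android	juejin.cn/post/705476…
Performance Monitoring Based on JVMTI	juejin.cn/post/694278…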