Google Breakpad Source Code Analysis (Part 2)

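Introduction

In the previous part, we saw that Breakpad consists of three main components:

  • Client: when a crash happens on the device, it generates a minidump file by default.
  • Symbol Dumper: this tool generates Breakpad's own symbol files. It must be run on the original library that still contains debug information.
  • Processor: this tool reads the minidump file produced by the Client, matches it against the symbol files produced by the Symbol Dumper, and outputs a human-readable C/C++ stack trace.

In this part we start with the Client, the piece that runs on the device. The goal is to learn how Breakpad detects a crash and how it generates the minidump file. The Client is platform-specific, so we analyze the Android implementation here; Android is based on Linux.

Breakpad Client Usage

To integrate the Breakpad Client on Android, you can follow this document; the process is fairly simple.

After integration, the following code obtains the minidump file generated after a crash: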

#include <cstdio>

#include "client/linux/handler/exception_handler.h"

static bool dumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                         void* context, bool succeeded) {
  printf("Dump path: %s\n", descriptor.path());
  return succeeded;
}

void crash() { volatile int* a = (int*)(NULL); *a = 1; }

int main(int argc, char* argv[]) {
  google_breakpad::MinidumpDescriptor descriptor("/tmp");
  google_breakpad::ExceptionHandler eh(descriptor, NULL, dumpCallback, NULL, true, -1);
  crash();
  return 0;
}

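Source Code Analysis

The Breakpad Client's work has two parts: signal registration and minidump generation.

Signal Registration

ExceptionHandler is the entry point of the whole flow, so we start our analysis there.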

// Runs before crashing: normal context.
ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
                                   FilterCallback filter,
                                   MinidumpCallback callback,
                                   void* callback_context,
                                   bool install_handler,
                                   const int server_fd)
    : filter_(filter),
      callback_(callback),
      callback_context_(callback_context),
      minidump_descriptor_(descriptor),
      crash_handler_(NULL) {
  // ...
  if (install_handler) {
    // ...
    // Install the signal handlers
    InstallHandlersLocked();
  }
  // ...
}

InstallHandlersLocked first saves the previous signal handlers and then registers the new ones:

// Runs before crashing: normal context.
// static
bool ExceptionHandler::InstallHandlersLocked() {
  if (handlers_installed)
    return false;

  // ...
  
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);

  // ...

  sa.sa_sigaction = SignalHandler;
  sa.sa_flags = SA_ONSTACK | SA_SIGINFO;

  // Register every signal we want to handle
  for (int i = 0; i < kNumHandledSignals; ++i) {
    if (sigaction(kExceptionSignals[i], &sa, NULL) == -1) {
      // At this point it is impractical to back out changes, and so failure to
      // install a signal is intentionally ignored.
    }
  }
  handlers_installed = true;
  return true;
}

These are the signals that get registered:

// The list of signals which we consider to be crashes. The default action for
// all these signals must be Core (see man 7 signal) because we rethrow the
// signal after handling it and expect that it'll be fatal.
const int kExceptionSignals[] = {
  SIGSEGV, SIGABRT, SIGFPE, SIGILL, SIGBUS, SIGTRAP
};
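The save-then-install pattern described above can be sketched in a few lines. This is a simplified illustration, not Breakpad's code: the `sketch` namespace and every name in it are hypothetical, it uses SIGUSR1 and SIGUSR2 so that it can run without crashing the process, and it leaves out SA_ONSTACK because it does not set up an alternate signal stack.

```cpp
#include <cassert>
#include <signal.h>
#include <string.h>

namespace sketch {  // hypothetical names, for illustration only

// Harmless stand-ins for the real crash signals.
const int kSignals[] = {SIGUSR1, SIGUSR2};
const int kNumSignals = sizeof(kSignals) / sizeof(kSignals[0]);

struct sigaction g_old_handlers[kNumSignals];
bool g_installed = false;
volatile sig_atomic_t g_last_signal = 0;

// Records which signal arrived; a real handler would write a dump here.
void Handler(int sig, siginfo_t* /*info*/, void* /*uc*/) {
  g_last_signal = sig;
}

bool InstallHandlers() {
  if (g_installed)
    return false;

  // Save the current dispositions first so they can be restored later.
  for (int i = 0; i < kNumSignals; ++i) {
    if (sigaction(kSignals[i], nullptr, &g_old_handlers[i]) == -1)
      return false;
  }

  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = Handler;
  sa.sa_flags = SA_SIGINFO;

  // As in the listing above, a failed registration is ignored.
  for (int i = 0; i < kNumSignals; ++i)
    sigaction(kSignals[i], &sa, nullptr);

  g_installed = true;
  return true;
}

void RestoreHandlers() {
  if (!g_installed)
    return;
  for (int i = 0; i < kNumSignals; ++i)
    sigaction(kSignals[i], &g_old_handlers[i], nullptr);
  g_installed = false;
}

}  // namespace sketch
```

Calling `sketch::InstallHandlers()` a second time returns false, mirroring the `handlers_installed` guard in the listing above.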

Signal Handling

When one of these signals is delivered, SignalHandler is called. SignalHandler is the callback that was passed in when the signals were registered.

// This function runs in a compromised context: see the top of the file.
// Runs on the crashing thread.
// static
void ExceptionHandler::SignalHandler(int sig, siginfo_t* info, void* uc) {

  // ...

  bool handled = false;
  for (int i = g_handler_stack_->size() - 1; !handled && i >= 0; --i) {
    handled = (*g_handler_stack_)[i]->HandleSignal(sig, info, uc);
  }

  // Upon returning from this signal handler, sig will become unmasked and then
  // it will be retriggered. If one of the ExceptionHandlers handled it
  // successfully, restore the default handler. Otherwise, restore the
  // previously installed handler. Then, when the signal is retriggered, it will
  // be delivered to the appropriate handler.
  if (handled) {
    InstallDefaultHandler(sig);
  } else {
    RestoreHandlersLocked();
  }

  // ...
}

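g_handler_stack_ holds the ExceptionHandler instances that were registered when the handlers were installed. A single process can have several ExceptionHandler instances, so SignalHandler walks the stack from the most recently added instance backwards, calling HandleSignal on each until one of them handles the signal.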
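The dispatch loop can be isolated as a small sketch. The `Handler` alias and `Dispatch` function are hypothetical names, and `std::function` is used only for readability; it would not be appropriate inside a real signal handler, where Breakpad works with raw pointers to ExceptionHandler objects.

```cpp
#include <cassert>
#include <functional>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

// A handler returns true when it has dealt with the signal.
using Handler = std::function<bool(int)>;

// Walks the stack from the newest handler to the oldest and stops at the
// first one that accepts the signal. Returns its index, or -1 if none did.
int Dispatch(const std::vector<Handler>& stack, int sig) {
  bool handled = false;
  int who = -1;
  for (int i = static_cast<int>(stack.size()) - 1; !handled && i >= 0; --i) {
    handled = stack[i](sig);
    if (handled)
      who = i;
  }
  return who;
}

}  // namespace sketch
```

Because the loop condition checks `!handled`, handlers registered earlier are never called once a later one has accepted the signal.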

HandleSignal in turn calls GenerateDump:

// This function runs in a compromised context: see the top of the file.
// Runs on the crashing thread.
bool ExceptionHandler::HandleSignal(int /*sig*/, siginfo_t* info, void* uc) {
  // ...
  return GenerateDump(&g_crash_context_);
}

// This function may run in a compromised context: see the top of the file.
bool ExceptionHandler::GenerateDump(CrashContext* context) {
  // ...

  // Create the child process with sys_clone
  const pid_t child = sys_clone(
      ThreadEntry, stack, CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL,
      NULL);
  if (child == -1) {
    sys_close(fdes[0]);
    sys_close(fdes[1]);
    return false;
  }

  // Close the read end of the pipe.
  sys_close(fdes[0]);
  // Allow the child to ptrace us
  sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
  SendContinueSignalToChild();
  int status = 0;
  // Wait for the child process to finish
  const int r = HANDLE_EINTR(sys_waitpid(child, &status, __WALL));

  sys_close(fdes[1]);

  // ...

  bool success = r != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
  if (callback_)
    success = callback_(minidump_descriptor_, callback_context_, success);
  return success;
}

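GenerateDump first calls sys_clone to create a child process that generates the minidump file. A child process is used because the crashed process is heavily constrained. For example, it may not be able to allocate enough memory, since the crash itself may have been caused by running out of memory.

The child's entry function is ThreadEntry: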

// This is the entry function for the cloned process. We are in a compromised
// context here: see the top of the file.
// static
int ExceptionHandler::ThreadEntry(void* arg) {
  const ThreadArgument* thread_arg = reinterpret_cast<ThreadArgument*>(arg);

  // Close the write end of the pipe. This allows us to fail if the parent dies
  // while waiting for the continue signal.
  sys_close(thread_arg->handler->fdes[1]);

  // Block here until the crashing process unblocks us when
  // we're allowed to use ptrace
  thread_arg->handler->WaitForContinueSignal();
  sys_close(thread_arg->handler->fdes[0]);

  return thread_arg->handler->DoDump(thread_arg->pid, thread_arg->context,
                                     thread_arg->context_size) == false;
}


// This function runs in a compromised context: see the top of the file.
// Runs on the cloned process.
void ExceptionHandler::WaitForContinueSignal() {
  int r;
  char receivedMessage;
  r = HANDLE_EINTR(sys_read(fdes[0], &receivedMessage, sizeof(char)));
  // ...
}

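ThreadEntry does not call DoDump right away to generate the minidump file. It first calls WaitForContinueSignal, which blocks until the parent writes to the pipe.

Meanwhile, in the crashing process, once sys_clone succeeds, the process allows the child to ptrace it: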

// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
SendContinueSignalToChild();

// This function runs in a compromised context: see the top of the file.
void ExceptionHandler::SendContinueSignalToChild() {
  static const char okToContinueMessage = 'a';
  int r;
  r = HANDLE_EINTR(sys_write(fdes[1], &okToContinueMessage, sizeof(char)));
  // ...
}

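It then calls SendContinueSignalToChild to tell the child to continue.

Communication with the child goes through the pipe whose file descriptors are stored in the fdes array: fdes[0] is the read end and fdes[1] is the write end. The ptrace permission is needed because generating the minidump file requires suspending every thread in the process, which we cover later.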
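The pipe handshake can be sketched with standard POSIX calls. This is a simplified stand-in under stated assumptions: it uses fork() instead of sys_clone, it skips the PR_SET_PTRACER step, and `RunChildAfterHandshake` is a hypothetical name. The child exits with 0 where the real child would call DoDump.

```cpp
#include <cassert>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace sketch {  // hypothetical names, for illustration only

// Forks a child that waits for a one-byte "continue" message on a pipe.
// Returns the child's exit code, or -1 on failure.
int RunChildAfterHandshake() {
  int fdes[2];
  if (pipe(fdes) == -1)
    return -1;

  const pid_t child = fork();
  if (child == -1) {
    close(fdes[0]);
    close(fdes[1]);
    return -1;
  }

  if (child == 0) {
    // Child: close the write end, then block until the parent sends a byte.
    close(fdes[1]);
    char msg = 0;
    ssize_t n;
    do {
      n = read(fdes[0], &msg, sizeof(msg));
    } while (n == -1 && errno == EINTR);
    close(fdes[0]);
    // The real child would generate the dump here.
    _exit(n == 1 && msg == 'a' ? 0 : 1);
  }

  // Parent: close the read end, finish any setup, then release the child.
  close(fdes[0]);
  const char ok = 'a';
  ssize_t w;
  do {
    w = write(fdes[1], &ok, sizeof(ok));
  } while (w == -1 && errno == EINTR);

  // Wait for the child to finish, retrying if interrupted.
  int status = 0;
  pid_t r;
  do {
    r = waitpid(child, &status, 0);
  } while (r == -1 && errno == EINTR);
  close(fdes[1]);

  if (r == -1 || !WIFEXITED(status))
    return -1;
  return WEXITSTATUS(status);
}

}  // namespace sketch
```

Closing the unused end on each side matters: if the parent dies before writing, the child's read returns 0 instead of blocking forever, and the child exits with a failure code.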

Generating the minidump File

DoDump ends up calling WriteMinidumpImpl, which starts generating the minidump file.

bool WriteMinidumpImpl(const char* minidump_path,
                       int minidump_fd,
                       off_t minidump_size_limit,
                       pid_t crashing_process,
                       const void* blob, size_t blob_size,
                       const MappingList& mappings,
                       const AppMemoryList& appmem,
                       bool skip_stacks_if_mapping_unreferenced,
                       uintptr_t principal_mapping_address,
                       bool sanitize_stacks) {
  LinuxPtraceDumper dumper(crashing_process);
  // ...
  MinidumpWriter writer(minidump_path, minidump_fd, context, mappings,
                        appmem, skip_stacks_if_mapping_unreferenced,
                        principal_mapping_address, sanitize_stacks, &dumper);
  // Set desired limit for file size of minidump (-1 means no limit).
  writer.set_minidump_size_limit(minidump_size_limit);
  if (!writer.Init())
    return false;
  return writer.Dump();
}

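LinuxDumper collects information about the current system, such as threads, loaded libraries, and memory. LinuxPtraceDumper is the LinuxDumper implementation based on ptrace; it mainly implements these two methods: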

// Suspend/resume all threads in the given process.
// Suspend all threads
virtual bool ThreadsSuspend() = 0;
// Resume all threads
virtual bool ThreadsResume() = 0;

MinidumpWriter implements the generation of the minidump file. It must first be initialized by calling Init:

bool Init() {
    // First initialize the LinuxDumper
    if (!dumper_->Init())
      return false;

    // Then call ThreadsSuspend to suspend all threads
    if (!dumper_->ThreadsSuspend() || !dumper_->LateInit())
      return false;

    // ...

    return true;
}

bool LinuxDumper::Init() {
  return ReadAuxv() && EnumerateThreads() && EnumerateMappings();
}

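LinuxDumper::Init calls three methods:

  • ReadAuxv

    Reads the /proc/{pid}/auxv file to get auxiliary information.

  • EnumerateThreads

    Reads the /proc/{pid}/task directory to get the threads of the current process.

  • EnumerateMappings

    Reads the /proc/{pid}/maps file to get the memory mappings loaded in the current process.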
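As a rough illustration of the thread enumeration step, the sketch below lists the numeric entries of /proc/{pid}/task. It assumes a Linux system with /proc mounted, and `ListThreads` is a hypothetical helper. Unlike Breakpad, it uses std::string and std::vector, which allocate memory and would not be suitable in a compromised context.

```cpp
#include <cassert>
#include <dirent.h>
#include <stdlib.h>
#include <string>
#include <sys/types.h>
#include <unistd.h>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

// Returns the thread IDs found under /proc/<pid>/task.
// An empty result means the directory could not be opened.
std::vector<pid_t> ListThreads(pid_t pid) {
  std::vector<pid_t> tids;
  const std::string path = "/proc/" + std::to_string(pid) + "/task";
  DIR* dir = opendir(path.c_str());
  if (!dir)
    return tids;

  while (struct dirent* entry = readdir(dir)) {
    // Each thread is a subdirectory named after its numeric ID;
    // "." and ".." fail the numeric check and are skipped.
    char* end = nullptr;
    const long tid = strtol(entry->d_name, &end, 10);
    if (end != entry->d_name && *end == '\0' && tid > 0)
      tids.push_back(static_cast<pid_t>(tid));
  }
  closedir(dir);
  return tids;
}

}  // namespace sketch
```

For a live process the list contains the main thread, whose thread ID equals the process ID.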

Once initialization succeeds, calling Dump generates the minidump file:

bool Dump() {
    // A minidump file contains a number of tagged streams. This is the number
    // of stream which we write.
    unsigned kNumWriters = 13;

    TypedMDRVA<MDRawDirectory> dir(&minidump_writer_);
    {
      // Ensure the header gets flushed, as that happens in the destructor.
      // If a crash occurs somewhere below, at least the header will be
      // intact.
      TypedMDRVA<MDRawHeader> header(&minidump_writer_);
      
      // ...
      
      header.get()->signature = MD_HEADER_SIGNATURE;
      header.get()->version = MD_HEADER_VERSION;
      header.get()->time_date_stamp = time(NULL);
      header.get()->stream_count = kNumWriters;
      header.get()->stream_directory_rva = dir.position();
    }

    unsigned dir_index = 0;
    MDRawDirectory dirent;

    if (!WriteThreadListStream(&dirent))
      return false;
    dir.CopyIndex(dir_index++, &dirent);

    if (!WriteMappings(&dirent))
      return false;
    dir.CopyIndex(dir_index++, &dirent);

    if (!WriteAppMemory())
      return false;

    if (!WriteMemoryListStream(&dirent))
      return false;
    dir.CopyIndex(dir_index++, &dirent);

    if (!WriteExceptionStream(&dirent))
      return false;
    dir.CopyIndex(dir_index++, &dirent);

    if (!WriteSystemInfoStream(&dirent))
      return false;
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_CPU_INFO;
    if (!WriteFile(&dirent.location, "/proc/cpuinfo"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_PROC_STATUS;
    if (!WriteProcFile(&dirent.location, GetCrashThread(), "status"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_LSB_RELEASE;
    if (!WriteFile(&dirent.location, "/etc/lsb-release"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_CMD_LINE;
    if (!WriteProcFile(&dirent.location, GetCrashThread(), "cmdline"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_ENVIRON;
    if (!WriteProcFile(&dirent.location, GetCrashThread(), "environ"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_AUXV;
    if (!WriteProcFile(&dirent.location, GetCrashThread(), "auxv"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_MAPS;
    if (!WriteProcFile(&dirent.location, GetCrashThread(), "maps"))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    dirent.stream_type = MD_LINUX_DSO_DEBUG;
    if (!WriteDSODebugStream(&dirent))
      NullifyDirectoryEntry(&dirent);
    dir.CopyIndex(dir_index++, &dirent);

    // If you add more directory entries, don't forget to update kNumWriters,
    // above.

    dumper_->ThreadsResume();
    return true;
  }

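The method looks long, but it is not that complicated. It first writes the minidump header, represented by MDRawHeader, which holds the signature, version, timestamp, stream count, and so on.

The rest of the file consists of 13 streams. For example, WriteThreadListStream writes the information for all threads and WriteMappings writes all memory mappings. The remaining streams follow the same pattern and cover memory contents, system information, and more.

Each stream's type is identified by stream_type. The stream_type values currently defined are: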

/* For (MDRawDirectory).stream_type */
typedef enum {
  MD_UNUSED_STREAM               =  0,
  MD_RESERVED_STREAM_0           =  1,
  MD_RESERVED_STREAM_1           =  2,
  MD_THREAD_LIST_STREAM          =  3,  /* MDRawThreadList */
  MD_MODULE_LIST_STREAM          =  4,  /* MDRawModuleList */
  MD_MEMORY_LIST_STREAM          =  5,  /* MDRawMemoryList */
  MD_EXCEPTION_STREAM            =  6,  /* MDRawExceptionStream */
  MD_SYSTEM_INFO_STREAM          =  7,  /* MDRawSystemInfo */
  MD_THREAD_EX_LIST_STREAM       =  8,
  MD_MEMORY_64_LIST_STREAM       =  9,
  MD_COMMENT_STREAM_A            = 10,
  MD_COMMENT_STREAM_W            = 11,
  MD_HANDLE_DATA_STREAM          = 12,
  MD_FUNCTION_TABLE_STREAM       = 13,
  MD_UNLOADED_MODULE_LIST_STREAM = 14,
  MD_MISC_INFO_STREAM            = 15,  /* MDRawMiscInfo */
  MD_MEMORY_INFO_LIST_STREAM     = 16,  /* MDRawMemoryInfoList */
  MD_THREAD_INFO_LIST_STREAM     = 17,
  MD_HANDLE_OPERATION_LIST_STREAM = 18,
  MD_TOKEN_STREAM                = 19,
  MD_JAVASCRIPT_DATA_STREAM      = 20,
  MD_SYSTEM_MEMORY_INFO_STREAM   = 21,
  MD_PROCESS_VM_COUNTERS_STREAM  = 22,
  MD_LAST_RESERVED_STREAM        = 0x0000ffff,

  /* Breakpad extension types.  0x4767 = "Gg" */
  MD_BREAKPAD_INFO_STREAM        = 0x47670001,  /* MDRawBreakpadInfo  */
  MD_ASSERTION_INFO_STREAM       = 0x47670002,  /* MDRawAssertionInfo */
  /* These are additional minidump stream values which are specific to
   * the linux breakpad implementation. */
  MD_LINUX_CPU_INFO              = 0x47670003,  /* /proc/cpuinfo      */
  MD_LINUX_PROC_STATUS           = 0x47670004,  /* /proc/$x/status    */
  MD_LINUX_LSB_RELEASE           = 0x47670005,  /* /etc/lsb-release   */
  MD_LINUX_CMD_LINE              = 0x47670006,  /* /proc/$x/cmdline   */
  MD_LINUX_ENVIRON               = 0x47670007,  /* /proc/$x/environ   */
  MD_LINUX_AUXV                  = 0x47670008,  /* /proc/$x/auxv      */
  MD_LINUX_MAPS                  = 0x47670009,  /* /proc/$x/maps      */
  MD_LINUX_DSO_DEBUG             = 0x4767000A,  /* MDRawDebug{32,64}  */

  /* Crashpad extension types. 0x4350 = "CP"
   * See Crashpad's minidump/minidump_extensions.h. */
  MD_CRASHPAD_INFO_STREAM        = 0x43500001,  /* MDRawCrashpadInfo  */
} MDStreamType;  /* MINIDUMP_STREAM_TYPE */

Each stream is written in two parts. The first is the stream's directory entry:

dirent->stream_type = MD_THREAD_LIST_STREAM;
dirent->location = list.location();

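We have already covered stream_type. location records where the stream's content lives, because a stream's directory entry and its content are stored separately: the directory entries for all streams are laid out first, followed by each stream's content in turn.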

typedef struct {
  uint32_t  data_size;
  MDRVA     rva;
} MDLocationDescriptor;  /* MINIDUMP_LOCATION_DESCRIPTOR */

In location, data_size is the length of the content and rva is its position within the minidump file.
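The relationship between a directory entry and its content can be shown with a toy in-memory writer. This is a heavily simplified sketch: `Location`, `DirEntry`, and `Writer` are hypothetical stand-ins for the Breakpad types, the directory starts at offset 0 because the header is omitted, and no alignment is applied.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <string>
#include <vector>

namespace sketch {  // hypothetical names, for illustration only

struct Location {
  uint32_t data_size;  // length of the stream content in bytes
  uint32_t rva;        // offset of the content from the start of the buffer
};

struct DirEntry {
  uint32_t stream_type;
  Location location;
};

class Writer {
 public:
  // Reserves room for `count` directory entries at the start of the buffer.
  explicit Writer(uint32_t count)
      : buf_(count * sizeof(DirEntry)), count_(count), next_(0) {}

  // Appends the content, then fills in the next free directory slot.
  bool AddStream(uint32_t type, const std::string& content) {
    if (next_ >= count_)
      return false;
    DirEntry e;
    e.stream_type = type;
    e.location.data_size = static_cast<uint32_t>(content.size());
    e.location.rva = static_cast<uint32_t>(buf_.size());
    buf_.insert(buf_.end(), content.begin(), content.end());
    memcpy(&buf_[next_ * sizeof(DirEntry)], &e, sizeof(e));
    ++next_;
    return true;
  }

  // Looks up a stream through its directory entry.
  std::string Read(uint32_t index) const {
    if (index >= next_)
      return std::string();
    DirEntry e;
    memcpy(&e, &buf_[index * sizeof(DirEntry)], sizeof(e));
    return std::string(buf_.begin() + e.location.rva,
                       buf_.begin() + e.location.rva + e.location.data_size);
  }

 private:
  std::vector<char> buf_;
  uint32_t count_;
  uint32_t next_;
};

}  // namespace sketch
```

A reader only needs the directory to find any stream: it reads the entry at a given index and then jumps to `rva` to read `data_size` bytes.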

We can use Breakpad's minidump_dump tool to convert a raw minidump file into a human-readable format.

MDRawHeader
  signature            = 0x504d444d
  version              = 0xa793
  stream_count         = 13
  stream_directory_rva = 0x20
  checksum             = 0x0
  time_date_stamp      = 0x61011550 2021-07-28 08:29:04
  flags                = 0x0

mDirectory[0]
MDRawDirectory
  stream_type        = 0x3 (MD_THREAD_LIST_STREAM)
  location.data_size = 11668
  location.rva       = 0xc0

mDirectory[1]
MDRawDirectory
  stream_type        = 0x4 (MD_MODULE_LIST_STREAM)
  location.data_size = 69880
  location.rva       = 0x10dca8

The remaining entries follow the same pattern.
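Two values in this output can be checked by hand. The sketch below uses a hypothetical `RawHeader` struct that mirrors the seven header fields printed by minidump_dump, assuming six 32-bit fields followed by a 64-bit flags field. That layout is 32 bytes, which is consistent with stream_directory_rva = 0x20, meaning the directory starts right after the header. `SignatureToString` decodes the signature bytes in little-endian order.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

namespace sketch {  // hypothetical names, for illustration only

// Mirrors the fields shown in the minidump_dump output above.
struct RawHeader {
  uint32_t signature;
  uint32_t version;
  uint32_t stream_count;
  uint32_t stream_directory_rva;
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};

// 6 * 4 + 8 = 32 bytes, i.e. 0x20.
static_assert(sizeof(RawHeader) == 0x20, "header is expected to be 32 bytes");

// Returns the four signature bytes in the order they appear in the file
// (least significant byte first).
std::string SignatureToString(uint32_t signature) {
  std::string out;
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<char>((signature >> (8 * i)) & 0xff));
  return out;
}

}  // namespace sketch
```

For the signature 0x504d444d shown above, the four bytes spell "MDMP".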

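Summary

To summarize the Breakpad Client workflow: it first registers handlers for the relevant signals. When a crash occurs and a handler receives the signal, it clones a child process and then allows that child to ptrace the crashing process. Finally, the child suspends all threads and generates the minidump file.

We can sum up the whole process with a comment from the Breakpad source: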

// The signal flow looks like this:
//
//   SignalHandler (uses a global stack of ExceptionHandler objects to find
//        |         one to handle the signal. If the first rejects it, try
//        |         the second etc...)
//        V
//   HandleSignal ----------------------------| (clones a new process which
//        |                                   |  shares an address space with
//   (wait for cloned                         |  the crashed process. This
//     process)                               |  allows us to ptrace the crashed
//        |                                   |  process)
//        V                                   V
//   (set signal handler to             ThreadEntry (static function to bounce
//    SIG_DFL and rethrow,                    |      back into the object)
//    killing the crashed                     |
//    process)                                V
//                                          DoDump  (writes minidump)
//                                            |
//                                            V
//                                         sys_exit
//