Introduction
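In the previous part, we saw that Breakpad consists of three main components:
- Client: when a crash happens on the device, it generates a minidump file by default.
- Symbol Dumper: this tool generates Breakpad's own symbol files. It must be run on the original libraries that still contain debug information.
- Processor: this tool reads the minidump file produced by the Client, matches it against the corresponding symbol file produced by the Symbol Dumper, and finally produces a human-readable C/C++ stack trace.
In this part we start with the Client, which runs on the device. The goal is to understand how Breakpad detects that a crash has happened and how it generates the minidump file. Because the Client is implemented separately for each operating system, we analyze it on Android. Android is based on Linux, so the code we read is Breakpad's Linux client.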
Breakpad Client usage
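To integrate the Breakpad Client into an Android project, follow Breakpad's Android integration document; the process is fairly simple.
After integration, the following code obtains the minidump file generated after a crash: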
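#include <stdio.h>

#include "client/linux/handler/exception_handler.h"

// Called after the minidump has been written.
static bool dumpCallback(const google_breakpad::MinidumpDescriptor& descriptor,
                         void* context, bool succeeded) {
  printf("Dump path: %s\n", descriptor.path());
  return succeeded;
}

// Write through a null pointer to trigger SIGSEGV.
void crash() { volatile int* a = (int*)(NULL); *a = 1; }

int main(int argc, char* argv[]) {
  // "/tmp" comes from the Linux sample. On Android, use a directory the app
  // can write to, such as its internal files directory.
  google_breakpad::MinidumpDescriptor descriptor("/tmp");
  google_breakpad::ExceptionHandler eh(descriptor, NULL, dumpCallback, NULL, true, -1);
  crash();
  return 0;
}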
Source code analysis
The work of the Breakpad Client consists of two parts: signal registration and minidump generation.
Signal registration
ExceptionHandler is the entry point of the whole flow, so we start our analysis there.
// Runs before crashing: normal context.
ExceptionHandler::ExceptionHandler(const MinidumpDescriptor& descriptor,
                                   FilterCallback filter,
                                   MinidumpCallback callback,
                                   void* callback_context,
                                   bool install_handler,
                                   const int server_fd)
    : filter_(filter),
      callback_(callback),
      callback_context_(callback_context),
      minidump_descriptor_(descriptor),
      crash_handler_(NULL) {
  // ...
  if (install_handler) {
    // ...
    // Register the signal handlers
    InstallHandlersLocked();
  }
  // ...
}
InstallHandlersLocked first saves the previous signal handlers and then installs the new one:
// Runs before crashing: normal context.
// static
bool ExceptionHandler::InstallHandlersLocked() {
  if (handlers_installed)
    return false;
  // ...
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  // ...
  sa.sa_sigaction = SignalHandler;
  sa.sa_flags = SA_ONSTACK | SA_SIGINFO;

  // Register every signal we want to catch
  for (int i = 0; i < kNumHandledSignals; ++i) {
    if (sigaction(kExceptionSignals[i], &sa, NULL) == -1) {
      // At this point it is impractical to back out changes, and so failure to
      // install a signal is intentionally ignored.
    }
  }
  handlers_installed = true;
  return true;
}
This is the list of signals that get registered:
// The list of signals which we consider to be crashes. The default action for
// all these signals must be Core (see man 7 signal) because we rethrow the
// signal after handling it and expect that it'll be fatal.
const int kExceptionSignals[] = {
  SIGSEGV, SIGABRT, SIGFPE, SIGILL, SIGBUS, SIGTRAP
};
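The excerpt above elides the step where the old handlers are saved. As a rough illustration of the save-then-install pattern, here is a minimal sketch; it is not Breakpad's code, and the names InstallCrashHandlers, RestoreCrashHandlers, and OnCrashSignal are hypothetical. It also installs an alternate signal stack with sigaltstack, because SA_ONSTACK only has an effect when such a stack exists, which matters when the crash is a stack overflow.

```cpp
#include <signal.h>
#include <stdlib.h>
#include <string.h>

// Same crash signals as kExceptionSignals above.
static const int kCrashSignals[] = {SIGSEGV, SIGABRT, SIGFPE,
                                    SIGILL,  SIGBUS,  SIGTRAP};
static const int kNumCrashSignals =
    sizeof(kCrashSignals) / sizeof(kCrashSignals[0]);

// Handlers that were active before ours, saved so they can be restored.
static struct sigaction g_old_handlers[kNumCrashSignals];
static bool g_handlers_installed = false;

// Hypothetical handler: a real one would write the minidump here.
static void OnCrashSignal(int sig, siginfo_t* info, void* uc) {
  (void)info;
  (void)uc;
  signal(sig, SIG_DFL);  // the re-triggered signal will then be fatal
}

// Gives signal handlers their own stack so SA_ONSTACK works even after a
// stack overflow. Note: the alternate stack is per thread; this only
// covers the calling thread.
static bool InstallAltStack() {
  static bool installed = false;
  if (installed) return true;
  const size_t kStackSize = 64 * 1024;  // assumption: generous fixed size
  stack_t ss;
  memset(&ss, 0, sizeof(ss));
  ss.ss_sp = malloc(kStackSize);
  if (ss.ss_sp == NULL) return false;
  ss.ss_size = kStackSize;
  if (sigaltstack(&ss, NULL) != 0) {
    free(ss.ss_sp);
    return false;
  }
  installed = true;
  return true;
}

bool InstallCrashHandlers() {
  if (g_handlers_installed) return false;
  if (!InstallAltStack()) return false;
  // Save the previous handlers first.
  for (int i = 0; i < kNumCrashSignals; ++i) {
    if (sigaction(kCrashSignals[i], NULL, &g_old_handlers[i]) == -1)
      return false;
  }
  struct sigaction sa;
  memset(&sa, 0, sizeof(sa));
  sigemptyset(&sa.sa_mask);
  sa.sa_sigaction = OnCrashSignal;
  sa.sa_flags = SA_ONSTACK | SA_SIGINFO;
  for (int i = 0; i < kNumCrashSignals; ++i)
    sigaction(kCrashSignals[i], &sa, NULL);  // failures ignored, as above
  g_handlers_installed = true;
  return true;
}

// Puts back whatever handlers were installed before InstallCrashHandlers.
void RestoreCrashHandlers() {
  if (!g_handlers_installed) return;
  for (int i = 0; i < kNumCrashSignals; ++i)
    sigaction(kCrashSignals[i], &g_old_handlers[i], NULL);
  g_handlers_installed = false;
}
```

Saving the old handlers is what makes the "otherwise, restore the previously installed handler" branch in SignalHandler possible: if this handler declines a signal, another library's handler still gets to see it.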
Signal handling
When one of these signals is delivered, SignalHandler, the callback passed in when the signals were registered, is called:
// This function runs in a compromised context: see the top of the file.
// Runs on the crashing thread.
// static
void ExceptionHandler::SignalHandler(int sig, siginfo_t* info, void* uc) {
  // ...
  bool handled = false;
  for (int i = g_handler_stack_->size() - 1; !handled && i >= 0; --i) {
    handled = (*g_handler_stack_)[i]->HandleSignal(sig, info, uc);
  }

  // Upon returning from this signal handler, sig will become unmasked and then
  // it will be retriggered. If one of the ExceptionHandlers handled it
  // successfully, restore the default handler. Otherwise, restore the
  // previously installed handler. Then, when the signal is retriggered, it will
  // be delivered to the appropriate handler.
  if (handled) {
    InstallDefaultHandler(sig);
  } else {
    RestoreHandlersLocked();
  }
  // ...
}
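g_handler_stack_ holds the ExceptionHandler instances that were registered. A single process can have several ExceptionHandler instances, so SignalHandler walks the stack from the most recently added one and calls HandleSignal on each until one of them handles the signal. It then installs the default handler (if the signal was handled) or restores the previous handlers (if not) and returns, so that the re-triggered signal reaches the right handler.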
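The "handle, then rethrow" scheme described in these comments can be observed in a small experiment: the handler does its work, reinstalls the default action, and returns, and the signal then kills the process as usual. The sketch below runs this in a forked child so the parent can inspect how the child died. It is an illustration, not Breakpad's code; RunCrashingChild and ChildHandler are hypothetical names. A real memory fault repeats by itself when the handler returns, because the faulting instruction runs again, but a signal sent with raise() or kill() does not, so the sketch re-raises it when si_code <= 0 (the value range used for signals sent by a process). Breakpad's handler makes a similar check, but the exact condition here is a simplification.

```cpp
#include <signal.h>
#include <string.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

static int g_report_fd = -1;  // write end of a pipe back to the parent

// Hypothetical handler: "handles" the crash by reporting one byte, then
// restores the default action so the signal is fatal when it fires again.
static void ChildHandler(int sig, siginfo_t* info, void* uc) {
  (void)uc;
  const char handled = 'h';
  if (write(g_report_fd, &handled, 1) != 1) { /* nothing useful to do */ }
  signal(sig, SIG_DFL);
  // A signal sent with raise()/kill() (si_code <= 0) does not repeat by
  // itself, so send it again. It stays blocked until this handler returns.
  if (info->si_code <= 0) raise(sig);
}

// Forks a child that installs ChildHandler and raises SIGSEGV.
// Returns the child's wait status; *seen receives the byte the handler sent.
int RunCrashingChild(char* seen) {
  int fds[2];
  if (pipe(fds) != 0) return -1;
  pid_t pid = fork();
  if (pid == -1) return -1;
  if (pid == 0) {
    close(fds[0]);
    g_report_fd = fds[1];
    struct rlimit no_core = {0, 0};
    setrlimit(RLIMIT_CORE, &no_core);  // avoid leaving a core file behind
    struct sigaction sa;
    memset(&sa, 0, sizeof(sa));
    sigemptyset(&sa.sa_mask);
    sa.sa_sigaction = ChildHandler;
    sa.sa_flags = SA_SIGINFO;
    sigaction(SIGSEGV, &sa, NULL);
    raise(SIGSEGV);
    _exit(0);  // only reached if the signal was not fatal
  }
  close(fds[1]);
  *seen = 0;
  if (read(fds[0], seen, 1) != 1) *seen = 0;
  close(fds[0]);
  int status = 0;
  waitpid(pid, &status, 0);
  return status;
}
```

Because the process still dies from the original signal, the parent (or, on Android, the system's crash reporting) sees the same termination reason it would have seen without Breakpad.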
HandleSignal calls GenerateDump:
// This function runs in a compromised context: see the top of the file.
// Runs on the crashing thread.
bool ExceptionHandler::HandleSignal(int /*sig*/, siginfo_t* info, void* uc) {
  // ...
  return GenerateDump(&g_crash_context_);
}

// This function may run in a compromised context: see the top of the file.
bool ExceptionHandler::GenerateDump(CrashContext* context) {
  // ...
  // Create the child process with sys_clone
  const pid_t child = sys_clone(
      ThreadEntry, stack, CLONE_FS | CLONE_UNTRACED, &thread_arg, NULL, NULL,
      NULL);
  if (child == -1) {
    sys_close(fdes[0]);
    sys_close(fdes[1]);
    return false;
  }

  // Close the read end of the pipe.
  sys_close(fdes[0]);
  // Allow the child to ptrace us
  sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
  SendContinueSignalToChild();
  int status = 0;
  // Wait for the child process to finish
  const int r = HANDLE_EINTR(sys_waitpid(child, &status, __WALL));

  sys_close(fdes[1]);

  // ...

  bool success = r != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
  if (callback_)
    success = callback_(minidump_descriptor_, callback_context_, success);
  return success;
}
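GenerateDump first calls sys_clone to create a child process that generates the minidump file. The work is done in a separate process for two reasons. First, the dumper uses ptrace to stop the crashing process's threads and read their registers and stacks, and a process cannot ptrace its own threads. Second, the crashing process runs in a compromised context with many restrictions; for example, it may not be able to allocate enough memory, because the crash itself may have been caused by running out of memory.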
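The clone-and-wait part of GenerateDump can be sketched with glibc's clone() wrapper standing in for Breakpad's sys_clone. This is a simplified sketch under stated assumptions: RunInClonedChild, ChildStackTop, and the 64 KiB stack size are hypothetical, and the child's stack is mapped ahead of time, since allocating inside a crashing process may not be possible. Because the clone flags request no termination signal (no SIGCHLD), the parent has to pass __WALL to waitpid, as GenerateDump does. The child's return value becomes its exit status, and 0 means success, matching how ThreadEntry returns DoDump(...) == false.

```cpp
#include <errno.h>
#include <sched.h>
#include <stdint.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical size for the child's stack.
static const size_t kChildStackSize = 64 * 1024;

// Maps the child stack once; a real handler would do this before any crash.
static uint8_t* ChildStackTop() {
  static uint8_t* top = nullptr;
  if (top == nullptr) {
    void* mem = mmap(nullptr, kChildStackSize, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) return nullptr;
    // The stack grows down on common Linux/Android CPUs, so use its top.
    top = static_cast<uint8_t*>(mem) + kChildStackSize;
  }
  return top;
}

// Runs fn(arg) in a cloned child; true if the child exited with status 0.
bool RunInClonedChild(int (*fn)(void*), void* arg) {
  uint8_t* stack_top = ChildStackTop();
  if (stack_top == nullptr) return false;
  // Same flags as GenerateDump. Without CLONE_VM the child gets its own copy
  // of the address space; CLONE_UNTRACED stops a tracer of this process from
  // forcing tracing onto the child.
  pid_t child = clone(fn, stack_top, CLONE_FS | CLONE_UNTRACED, arg);
  if (child == -1) return false;
  int status = 0;
  pid_t r;
  do {
    // No termination signal was requested in the flags, so __WALL is needed.
    r = waitpid(child, &status, __WALL);
  } while (r == -1 && errno == EINTR);
  return r != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```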
The entry point of the child process is ThreadEntry:
// This is the entry function for the cloned process. We are in a compromised
// context here: see the top of the file.
// static
int ExceptionHandler::ThreadEntry(void* arg) {
  const ThreadArgument* thread_arg = reinterpret_cast<ThreadArgument*>(arg);

  // Close the write end of the pipe. This allows us to fail if the parent dies
  // while waiting for the continue signal.
  sys_close(thread_arg->handler->fdes[1]);

  // Block here until the crashing process unblocks us when
  // we're allowed to use ptrace
  thread_arg->handler->WaitForContinueSignal();
  sys_close(thread_arg->handler->fdes[0]);

  return thread_arg->handler->DoDump(thread_arg->pid, thread_arg->context,
                                     thread_arg->context_size) == false;
}

// This function runs in a compromised context: see the top of the file.
// Runs on the cloned process.
void ExceptionHandler::WaitForContinueSignal() {
  int r;
  char receivedMessage;
  r = HANDLE_EINTR(sys_read(fdes[0], &receivedMessage, sizeof(char)));
  // ...
}
ThreadEntry does not call DoDump to generate the minidump right away. Instead, it calls WaitForContinueSignal, which blocks on a read from the pipe until the parent writes to it.
Meanwhile, in the parent (the crashing process), once sys_clone succeeds, the process allows the child to ptrace it:
// Allow the child to ptrace us
sys_prctl(PR_SET_PTRACER, child, 0, 0, 0);
SendContinueSignalToChild();

// This function runs in a compromised context: see the top of the file.
void ExceptionHandler::SendContinueSignalToChild() {
  static const char okToContinueMessage = 'a';
  int r;
  r = HANDLE_EINTR(sys_write(fdes[1], &okToContinueMessage, sizeof(char)));
  // ...
}
It then calls SendContinueSignalToChild to tell the child to continue.
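The parent and the child communicate through a pipe whose file descriptors are stored in the fdes array: fdes[0] is the read end and fdes[1] is the write end. The ptrace permission is needed because generating the minidump requires suspending all threads of the crashing process, which the child does with ptrace; we come back to this later. On kernels with the Yama security module enabled (ptrace_scope = 1), a process may by default only ptrace its own descendants. The dumper is a child of the crashing process, not its ancestor, so PR_SET_PTRACER must name it explicitly, and the pipe handshake ensures the child does not try to attach before that permission has been granted.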
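The handshake between the crashing process and the dumper child can be sketched with fork() and a pipe. This is an illustration under assumptions rather than Breakpad's code: fork() stands in for sys_clone, and GenerateDumpSketch and its dump_ok parameter are hypothetical (dump_ok simulates the result of DoDump). The parent grants ptrace permission with PR_SET_PTRACER before writing the continue byte, so by the time the child's read returns, attaching is allowed. If the parent dies first, the child's read returns 0 (end of file) instead of blocking forever, because the child closed its own copy of the write end.

```cpp
#include <errno.h>
#include <sys/prctl.h>
#include <sys/wait.h>
#include <unistd.h>

#ifndef PR_SET_PTRACER
#define PR_SET_PTRACER 0x59616d61  // Yama's value, for old headers
#endif

bool GenerateDumpSketch(bool dump_ok) {
  int fdes[2];
  if (pipe(fdes) != 0) return false;
  pid_t child = fork();
  if (child == -1) {
    close(fdes[0]);
    close(fdes[1]);
    return false;
  }
  if (child == 0) {
    // Child, playing the role of ThreadEntry.
    close(fdes[1]);  // so read() sees EOF if the parent dies
    char msg = 0;
    ssize_t r;
    do {
      r = read(fdes[0], &msg, 1);  // like WaitForContinueSignal
    } while (r == -1 && errno == EINTR);
    close(fdes[0]);
    if (r != 1) _exit(1);
    // A real dumper would PTRACE_ATTACH to the parent's threads here.
    _exit(dump_ok ? 0 : 1);
  }
  // Parent, playing the role of GenerateDump.
  close(fdes[0]);
  // May fail when Yama is not enabled; the result is ignored, as above.
  prctl(PR_SET_PTRACER, (unsigned long)child, 0UL, 0UL, 0UL);
  const char ok_to_continue = 'a';
  ssize_t w;
  do {
    w = write(fdes[1], &ok_to_continue, 1);  // like SendContinueSignalToChild
  } while (w == -1 && errno == EINTR);
  int status = 0;
  pid_t r;
  do {
    r = waitpid(child, &status, 0);
  } while (r == -1 && errno == EINTR);
  close(fdes[1]);
  return r != -1 && WIFEXITED(status) && WEXITSTATUS(status) == 0;
}
```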
Generating the minidump file
DoDump eventually calls WriteMinidumpImpl, which starts generating the minidump file:
bool WriteMinidumpImpl(const char* minidump_path,
                       int minidump_fd,
                       off_t minidump_size_limit,
                       pid_t crashing_process,
                       const void* blob, size_t blob_size,
                       const MappingList& mappings,
                       const AppMemoryList& appmem,
                       bool skip_stacks_if_mapping_unreferenced,
                       uintptr_t principal_mapping_address,
                       bool sanitize_stacks) {
  LinuxPtraceDumper dumper(crashing_process);
  // ...
  MinidumpWriter writer(minidump_path, minidump_fd, context, mappings,
                        appmem, skip_stacks_if_mapping_unreferenced,
                        principal_mapping_address, sanitize_stacks, &dumper);
  // Set desired limit for file size of minidump (-1 means no limit).
  writer.set_minidump_size_limit(minidump_size_limit);
  if (!writer.Init())
    return false;
  return writer.Dump();
}
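LinuxDumper collects information about the crashing process, such as its threads, loaded libraries, and memory. LinuxPtraceDumper is the ptrace-based implementation of LinuxDumper; the key methods it implements are these two: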
// Suspend/resume all threads in the given process.
// Suspend all threads
virtual bool ThreadsSuspend() = 0;
// Resume all threads
virtual bool ThreadsResume() = 0;
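To make "suspend via ptrace" concrete, here is a minimal sketch of attaching to and detaching from a single thread. It is not LinuxPtraceDumper's code: SuspendThread, ResumeThread, and SuspendResumeDemo are hypothetical names, and the demo attaches to its own forked child rather than to a crashed parent. PTRACE_ATTACH sends the target a SIGSTOP, waitpid with __WALL waits until it has actually stopped, and PTRACE_DETACH lets it run again. Suspending a whole process would repeat this for every thread ID found in /proc/{pid}/task. Note that some sandboxes and Yama settings forbid ptrace entirely.

```cpp
#include <errno.h>
#include <signal.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Stops thread `tid` by attaching to it with ptrace.
bool SuspendThread(pid_t tid) {
  if (ptrace(PTRACE_ATTACH, tid, nullptr, nullptr) != 0) return false;
  int status = 0;
  while (waitpid(tid, &status, __WALL) < 0) {
    if (errno != EINTR) {
      ptrace(PTRACE_DETACH, tid, nullptr, nullptr);
      return false;
    }
  }
  return WIFSTOPPED(status);
}

// Detaching lets the thread run again.
bool ResumeThread(pid_t tid) {
  return ptrace(PTRACE_DETACH, tid, nullptr, nullptr) == 0;
}

// Demo: suspend and resume a forked child that just waits for signals.
bool SuspendResumeDemo() {
  pid_t child = fork();
  if (child == -1) return false;
  if (child == 0) {
    for (;;) pause();
  }
  bool ok = SuspendThread(child) && ResumeThread(child);
  kill(child, SIGKILL);
  waitpid(child, nullptr, 0);
  return ok;
}
```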
MinidumpWriter implements minidump file generation. Its Init method must be called first:
bool Init() {
  // First, initialize the LinuxDumper
  if (!dumper_->Init())
    return false;
  // Then call ThreadsSuspend to suspend all threads
  if (!dumper_->ThreadsSuspend() || !dumper_->LateInit())
    return false;
  // ...
  return true;
}

bool LinuxDumper::Init() {
  return ReadAuxv() && EnumerateThreads() && EnumerateMappings();
}
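LinuxDumper::Init calls three methods:
- ReadAuxv reads /proc/{pid}/auxv to obtain the auxiliary vector, extra information the kernel passes to the process at startup (for example, the page size).
- EnumerateThreads reads the /proc/{pid}/task directory to obtain the threads of the process.
- EnumerateMappings reads /proc/{pid}/maps to obtain the memory mappings of the process, including its loaded libraries.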
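The three kinds of /proc data listed above are easy to read from ordinary code. The sketch below reads the current process's own entries through /proc/self instead of another process's, and its function names are hypothetical. It also uses standard C++ for brevity, whereas code running in a compromised context (as Breakpad's does) avoids heap allocation and most of libc. /proc/{pid}/auxv is a sequence of (type, value) pairs of native word size ending with AT_NULL, /proc/{pid}/task has one numeric subdirectory per thread ID, and each line of /proc/{pid}/maps describes one mapping.

```cpp
#include <dirent.h>
#include <elf.h>
#include <stdint.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#include <fstream>
#include <string>
#include <vector>

// Returns the value of auxv entry `type` (e.g. AT_PAGESZ), or 0 if absent.
uintptr_t ReadAuxvValue(uintptr_t type) {
  std::ifstream in("/proc/self/auxv", std::ios::binary);
  uintptr_t entry[2];  // {a_type, a_val}, each one native word
  while (in.read(reinterpret_cast<char*>(entry), sizeof(entry))) {
    if (entry[0] == AT_NULL) break;
    if (entry[0] == type) return entry[1];
  }
  return 0;
}

// Lists thread IDs: one numeric subdirectory per thread in /proc/self/task.
std::vector<pid_t> EnumerateThreadIds() {
  std::vector<pid_t> tids;
  DIR* dir = opendir("/proc/self/task");
  if (dir == nullptr) return tids;
  while (struct dirent* ent = readdir(dir)) {
    if (ent->d_name[0] >= '0' && ent->d_name[0] <= '9')
      tids.push_back(static_cast<pid_t>(atoi(ent->d_name)));
  }
  closedir(dir);
  return tids;
}

// Returns the lines of /proc/self/maps. Each line holds an address range,
// permissions, file offset, device, inode, and (optionally) a path.
std::vector<std::string> ReadMappings() {
  std::vector<std::string> lines;
  std::ifstream in("/proc/self/maps");
  for (std::string line; std::getline(in, line);) lines.push_back(line);
  return lines;
}
```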
Once initialization succeeds, calling Dump generates the minidump file:
bool Dump() {
  // A minidump file contains a number of tagged streams. This is the number
  // of stream which we write.
  unsigned kNumWriters = 13;

  TypedMDRVA<MDRawDirectory> dir(&minidump_writer_);
  {
    // Ensure the header gets flushed, as that happens in the destructor.
    // If a crash occurs somewhere below, at least the header will be
    // intact.
    TypedMDRVA<MDRawHeader> header(&minidump_writer_);
    // ...
    header.get()->signature = MD_HEADER_SIGNATURE;
    header.get()->version = MD_HEADER_VERSION;
    header.get()->time_date_stamp = time(NULL);
    header.get()->stream_count = kNumWriters;
    header.get()->stream_directory_rva = dir.position();
  }

  unsigned dir_index = 0;
  MDRawDirectory dirent;

  if (!WriteThreadListStream(&dirent))
    return false;
  dir.CopyIndex(dir_index++, &dirent);

  if (!WriteMappings(&dirent))
    return false;
  dir.CopyIndex(dir_index++, &dirent);

  if (!WriteAppMemory())
    return false;

  if (!WriteMemoryListStream(&dirent))
    return false;
  dir.CopyIndex(dir_index++, &dirent);

  if (!WriteExceptionStream(&dirent))
    return false;
  dir.CopyIndex(dir_index++, &dirent);

  if (!WriteSystemInfoStream(&dirent))
    return false;
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_CPU_INFO;
  if (!WriteFile(&dirent.location, "/proc/cpuinfo"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_PROC_STATUS;
  if (!WriteProcFile(&dirent.location, GetCrashThread(), "status"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_LSB_RELEASE;
  if (!WriteFile(&dirent.location, "/etc/lsb-release"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_CMD_LINE;
  if (!WriteProcFile(&dirent.location, GetCrashThread(), "cmdline"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_ENVIRON;
  if (!WriteProcFile(&dirent.location, GetCrashThread(), "environ"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_AUXV;
  if (!WriteProcFile(&dirent.location, GetCrashThread(), "auxv"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_MAPS;
  if (!WriteProcFile(&dirent.location, GetCrashThread(), "maps"))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  dirent.stream_type = MD_LINUX_DSO_DEBUG;
  if (!WriteDSODebugStream(&dirent))
    NullifyDirectoryEntry(&dirent);
  dir.CopyIndex(dir_index++, &dirent);

  // If you add more directory entries, don't forget to update kNumWriters,
  // above.

  dumper_->ThreadsResume();
  return true;
}
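This method looks long, but it is not complicated. It first writes the minidump header, represented by MDRawHeader, which records the signature, version, timestamp, number of streams, offset of the stream directory, and so on.
The body then consists of 13 streams. For example, WriteThreadListStream writes information about all threads, and WriteMappings writes all memory mappings (as the module list). The remaining calls follow the same pattern for the memory list, the exception, and system information, followed by copies of several /proc and /etc files. WriteAppMemory does not get a directory entry of its own, which is why there are exactly 13 CopyIndex calls.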
Each stream's type is identified by stream_type. These are the currently defined stream_type values:
/* For (MDRawDirectory).stream_type */
typedef enum {
  MD_UNUSED_STREAM = 0,
  MD_RESERVED_STREAM_0 = 1,
  MD_RESERVED_STREAM_1 = 2,
  MD_THREAD_LIST_STREAM = 3, /* MDRawThreadList */
  MD_MODULE_LIST_STREAM = 4, /* MDRawModuleList */
  MD_MEMORY_LIST_STREAM = 5, /* MDRawMemoryList */
  MD_EXCEPTION_STREAM = 6, /* MDRawExceptionStream */
  MD_SYSTEM_INFO_STREAM = 7, /* MDRawSystemInfo */
  MD_THREAD_EX_LIST_STREAM = 8,
  MD_MEMORY_64_LIST_STREAM = 9,
  MD_COMMENT_STREAM_A = 10,
  MD_COMMENT_STREAM_W = 11,
  MD_HANDLE_DATA_STREAM = 12,
  MD_FUNCTION_TABLE_STREAM = 13,
  MD_UNLOADED_MODULE_LIST_STREAM = 14,
  MD_MISC_INFO_STREAM = 15, /* MDRawMiscInfo */
  MD_MEMORY_INFO_LIST_STREAM = 16, /* MDRawMemoryInfoList */
  MD_THREAD_INFO_LIST_STREAM = 17,
  MD_HANDLE_OPERATION_LIST_STREAM = 18,
  MD_TOKEN_STREAM = 19,
  MD_JAVASCRIPT_DATA_STREAM = 20,
  MD_SYSTEM_MEMORY_INFO_STREAM = 21,
  MD_PROCESS_VM_COUNTERS_STREAM = 22,
  MD_LAST_RESERVED_STREAM = 0x0000ffff,

  /* Breakpad extension types. 0x4767 = "Gg" */
  MD_BREAKPAD_INFO_STREAM = 0x47670001, /* MDRawBreakpadInfo */
  MD_ASSERTION_INFO_STREAM = 0x47670002, /* MDRawAssertionInfo */
  /* These are additional minidump stream values which are specific to
   * the linux breakpad implementation. */
  MD_LINUX_CPU_INFO = 0x47670003, /* /proc/cpuinfo */
  MD_LINUX_PROC_STATUS = 0x47670004, /* /proc/$x/status */
  MD_LINUX_LSB_RELEASE = 0x47670005, /* /etc/lsb-release */
  MD_LINUX_CMD_LINE = 0x47670006, /* /proc/$x/cmdline */
  MD_LINUX_ENVIRON = 0x47670007, /* /proc/$x/environ */
  MD_LINUX_AUXV = 0x47670008, /* /proc/$x/auxv */
  MD_LINUX_MAPS = 0x47670009, /* /proc/$x/maps */
  MD_LINUX_DSO_DEBUG = 0x4767000A, /* MDRawDebug{32,64} */

  /* Crashpad extension types. 0x4350 = "CP"
   * See Crashpad's minidump/minidump_extensions.h. */
  MD_CRASHPAD_INFO_STREAM = 0x43500001, /* MDRawCrashpadInfo */
} MDStreamType; /* MINIDUMP_STREAM_TYPE */
Writing each stream involves two parts. The first is the stream's directory entry:
dirent->stream_type = MD_THREAD_LIST_STREAM;
dirent->location = list.location();
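stream_type was covered above; location records where the stream's content is. A stream's directory entry and its content are stored separately. In the file, the directory (one entry per stream) sits right after the header and before all stream contents: Dump reserves the directory up front, and after each stream's content is written, CopyIndex fills in that stream's entry.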
typedef struct {
  uint32_t data_size;
  MDRVA rva;
} MDLocationDescriptor; /* MINIDUMP_LOCATION_DESCRIPTOR */
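In location, data_size is the length of the content, and rva is the content's offset from the start of the minidump file (RVA stands for relative virtual address, a term inherited from the Windows minidump format).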
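To see how the header, the directory, and the location descriptors fit together, here is a small sketch that lays out a minidump-style file in memory. The structs mirror the fields described above and the MDRawHeader fields that minidump_dump prints; the names RawHeader, RawDirectory, Stream, and LayOut, and the rounding of every allocation up to 8 bytes, are assumptions made for illustration, and it also assumes a little-endian host, as on typical Android devices. With a 32-byte header and 13 directory entries of 12 bytes each, the directory starts at 0x20 and the first stream at 0xc0, which agrees with the minidump_dump output that follows.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <time.h>

#include <vector>

// Field layout as printed by minidump_dump for MDRawHeader.
struct RawHeader {
  uint32_t signature;             // 0x504d444d, "MDMP" in little-endian
  uint32_t version;               // 0xa793
  uint32_t stream_count;
  uint32_t stream_directory_rva;  // file offset of the directory
  uint32_t checksum;
  uint32_t time_date_stamp;
  uint64_t flags;
};
struct LocationDescriptor {
  uint32_t data_size;  // length of the stream content
  uint32_t rva;        // file offset of the stream content
};
struct RawDirectory {
  uint32_t stream_type;
  LocationDescriptor location;
};
static_assert(sizeof(RawHeader) == 32, "header is 32 bytes");
static_assert(sizeof(RawDirectory) == 12, "directory entry is 12 bytes");

// Assumption: every allocation is rounded up to a multiple of 8 bytes.
static uint32_t Align8(uint32_t n) { return (n + 7u) & ~7u; }

struct Stream {
  uint32_t type;
  std::vector<uint8_t> data;
};

// Lays out header, then directory, then stream contents; returns the bytes.
std::vector<uint8_t> LayOut(const std::vector<Stream>& streams) {
  RawHeader header;
  memset(&header, 0, sizeof(header));
  header.signature = 0x504d444d;
  header.version = 0xa793;
  header.stream_count = static_cast<uint32_t>(streams.size());
  header.stream_directory_rva = Align8(sizeof(RawHeader));
  header.time_date_stamp = static_cast<uint32_t>(time(nullptr));

  // Contents start after the whole directory has been reserved.
  uint32_t offset = Align8(static_cast<uint32_t>(
      header.stream_directory_rva + streams.size() * sizeof(RawDirectory)));
  std::vector<RawDirectory> dir(streams.size());
  for (size_t i = 0; i < streams.size(); ++i) {
    dir[i].stream_type = streams[i].type;
    dir[i].location.data_size = static_cast<uint32_t>(streams[i].data.size());
    dir[i].location.rva = offset;
    offset = Align8(offset + dir[i].location.data_size);
  }

  std::vector<uint8_t> file(offset, 0);
  memcpy(file.data(), &header, sizeof(header));
  if (!dir.empty())
    memcpy(file.data() + header.stream_directory_rva, dir.data(),
           dir.size() * sizeof(RawDirectory));
  for (size_t i = 0; i < streams.size(); ++i)
    if (!streams[i].data.empty())
      memcpy(file.data() + dir[i].location.rva, streams[i].data.data(),
             streams[i].data.size());
  return file;
}
```

A stream that could not be written, like the ones passed to NullifyDirectoryEntry above, would simply keep a directory slot with a data_size of 0.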
Breakpad's minidump_dump tool converts a raw minidump file into a human-readable form:
MDRawHeader
  signature            = 0x504d444d
  version              = 0xa793
  stream_count         = 13
  stream_directory_rva = 0x20
  checksum             = 0x0
  time_date_stamp      = 0x61011550 2021-07-28 08:29:04
  flags                = 0x0

mDirectory[0]
MDRawDirectory
  stream_type        = 0x3 (MD_THREAD_LIST_STREAM)
  location.data_size = 11668
  location.rva       = 0xc0

mDirectory[1]
MDRawDirectory
  stream_type        = 0x4 (MD_MODULE_LIST_STREAM)
  location.data_size = 69880
  location.rva       = 0x10dca8
The remaining entries follow the same pattern.
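Reading a minidump works in the opposite direction: read the header, jump to stream_directory_rva, and walk stream_count directory entries. The sketch below does this for an in-memory buffer and looks up one stream by type, which is roughly the first step a tool like minidump_dump has to take before decoding each stream. FindStream and its parameters are hypothetical names; the sketch assumes a little-endian host and checks only that offsets stay inside the buffer, not that stream contents are valid.

```cpp
#include <stddef.h>
#include <stdint.h>
#include <string.h>

static const uint32_t kMinidumpSignature = 0x504d444d;  // "MDMP"

// Reads a uint32 at `offset` if it lies inside the buffer.
static bool ReadU32(const uint8_t* buf, size_t size, size_t offset,
                    uint32_t* out) {
  if (offset > size || size - offset < 4) return false;
  memcpy(out, buf + offset, 4);  // host assumed little-endian
  return true;
}

// Finds the first stream of `type`; returns its size and offset via out-params.
bool FindStream(const uint8_t* buf, size_t size, uint32_t type,
                uint32_t* data_size, uint32_t* rva) {
  uint32_t signature = 0, stream_count = 0, dir_rva = 0;
  // Header offsets: signature 0, version 4, stream_count 8, directory rva 12.
  if (!ReadU32(buf, size, 0, &signature) || signature != kMinidumpSignature)
    return false;
  if (!ReadU32(buf, size, 8, &stream_count) ||
      !ReadU32(buf, size, 12, &dir_rva))
    return false;
  for (uint32_t i = 0; i < stream_count; ++i) {
    // Each entry: stream_type, location.data_size, location.rva (12 bytes).
    size_t entry = static_cast<size_t>(dir_rva) + static_cast<size_t>(i) * 12;
    uint32_t entry_type = 0, entry_size = 0, entry_rva = 0;
    if (!ReadU32(buf, size, entry, &entry_type) ||
        !ReadU32(buf, size, entry + 4, &entry_size) ||
        !ReadU32(buf, size, entry + 8, &entry_rva))
      return false;
    if (entry_type != type) continue;
    // Make sure the content actually lies inside the buffer.
    if (entry_rva > size || size - entry_rva < entry_size) return false;
    *data_size = entry_size;
    *rva = entry_rva;
    return true;
  }
  return false;
}
```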
Summary
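To summarize the Breakpad Client workflow: first, it registers a handler for each crash signal. When a crash occurs and one of those signals is received, the handler clones a child process and allows that child to ptrace the crashing process. The child then suspends all threads of the crashing process and generates the minidump file. Finally, the default handler is restored and the signal is re-triggered, which terminates the process.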
The following comment from the Breakpad source summarizes the whole process:
// The signal flow looks like this:
//
// SignalHandler (uses a global stack of ExceptionHandler objects to find
// | one to handle the signal. If the first rejects it, try
// | the second etc...)
// V
// HandleSignal ----------------------------| (clones a new process which
// | | shares an address space with
// (wait for cloned | the crashed process. This
// process) | allows us to ptrace the crashed
// | | process)
// V V
// (set signal handler to ThreadEntry (static function to bounce
// SIG_DFL and rethrow, | back into the object)
// killing the crashed |
// process) V
// DoDump (writes minidump)
// |
// V
// sys_exit
//