Understanding the Android Native Crash Handling Flow

1. Background

1.1 The debuggerd daemon

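Android has a mechanism for monitoring abnormal process exits. When a native crash occurs, or when the debuggerd command is invoked explicitly, the debuggerd daemon is notified. It captures information about the state of the process at the time of the crash and writes a tombstone file. At most 10 tombstone files are kept; once the limit is reached, the oldest file is overwritten. The files are written to /data/tombstones/tombstone_xx.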
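The "keep at most N files, overwrite the oldest" rule can be sketched roughly as below. This is only an illustration under my own assumptions: pick_tombstone_slot is a hypothetical helper, not tombstoned's actual code, and it takes the modification times as plain numbers instead of reading them from the file system.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not tombstoned's real code: given the modification
// time of each slot (0 means the file does not exist yet), return the index
// of the slot that the next tombstone should be written to.
std::size_t pick_tombstone_slot(const std::vector<std::int64_t>& mtimes) {
  assert(!mtimes.empty());
  std::size_t oldest = 0;
  for (std::size_t i = 0; i < mtimes.size(); ++i) {
    if (mtimes[i] == 0) return i;                // free slot: use it first
    if (mtimes[i] < mtimes[oldest]) oldest = i;  // otherwise track the oldest file
  }
  return oldest;  // every slot is in use: overwrite the oldest tombstone
}
```

With 10 slots, the first 10 crashes fill tombstone_00 through tombstone_09, and the eleventh crash reuses whichever slot has the oldest modification time.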

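Depending on what went wrong in a process, the Linux kernel sends it the corresponding signal. The process can catch the signal and handle it (usually by terminating). Android builds on this mechanism: it intercepts the signal, dumps information about the process, and writes it to a tombstone file.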
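As a minimal sketch of that mechanism, the snippet below installs a handler with sigaction() and then raises the signal itself, standing in for the kernel. The names on_signal and catch_signal_once are made up for this example; a real crash handler does far more work and has to restrict itself to async-signal-safe calls.

```cpp
#include <cassert>
#include <signal.h>

// Written by the handler; sig_atomic_t is safe to write from a signal handler.
static volatile sig_atomic_t g_last_signal = 0;

static void on_signal(int signo) {
  g_last_signal = signo;  // a real crash handler would dump process state here
}

// Installs on_signal for `signo`, raises the signal in the calling thread,
// and returns the signal number that the handler observed.
int catch_signal_once(int signo) {
  struct sigaction action = {};
  action.sa_handler = on_signal;
  sigemptyset(&action.sa_mask);
  int rc = sigaction(signo, &action, nullptr);
  assert(rc == 0);
  (void)rc;

  g_last_signal = 0;
  raise(signo);  // stands in for the kernel delivering the signal
  return g_last_signal;
}
```

Calling catch_signal_once(SIGUSR1) returns SIGUSR1, because raise() only returns after the handler has run.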

The debuggerd process opens a server socket. A client that needs debuggerd's services connects to that socket and sends a request; the server then performs the dump that corresponds to the request it received.
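The request/response idea can be tried out locally with a connected pair of SOCK_SEQPACKET sockets, the same socket type that the tombstoned sockets use. The DumpRequest struct and roundtrip_request below are hypothetical and exist only for this sketch; the real message layout lives in system/core/debuggerd/protocol.h.

```cpp
#include <cassert>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical request layout, for illustration only.
struct DumpRequest {
  int pid;
  int dump_type;
};

// Sends one request over a SOCK_SEQPACKET socket pair and returns the pid
// that the "server" end read, or -1 on error.
int roundtrip_request(int pid, int dump_type) {
  assert(pid > 0);
  int fds[2];
  if (socketpair(AF_UNIX, SOCK_SEQPACKET, 0, fds) != 0) return -1;

  DumpRequest request = {pid, dump_type};
  ssize_t sent = write(fds[0], &request, sizeof(request));  // client side

  DumpRequest received = {};
  ssize_t got = read(fds[1], &received, sizeof(received));  // server side

  close(fds[0]);
  close(fds[1]);
  if (sent != static_cast<ssize_t>(sizeof(request)) || got != sent) return -1;
  return received.pid;
}
```

SOCK_SEQPACKET preserves message boundaries, so one write() on the client side arrives as exactly one read() on the server side.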

1.2 The tombstoned daemon

Before Android 8.0, crashes were handled by the debuggerd and debuggerd64 daemons. Android 8.0 and later spawn a crash_dump32 or crash_dump64 process on demand. The overall flow is largely the same.

This article analyzes the newer tombstoned daemon.

2. Startup of the tombstoned daemon

After the init process starts, it parses the xxx.rc files, and the tombstoned daemon is started as part of that.

system/core/debuggerd/tombstoned/tombstoned.rc

service tombstoned /system/bin/tombstoned
    user tombstoned
    # Runs in the system group.
    group system

    # Don't start tombstoned until after the real /data is mounted.
    class late_start
    # Create three sockets.
    socket tombstoned_crash seqpacket 0666 system system
    socket tombstoned_intercept seqpacket 0666 system system
    socket tombstoned_java_trace seqpacket 0666 system system
    writepid /dev/cpuset/system-background/tasks

The names of the three sockets are defined as follows:

system/core/debuggerd/protocol.h

constexpr char kTombstonedCrashSocketName[] = "tombstoned_crash";
constexpr char kTombstonedJavaTraceSocketName[] = "tombstoned_java_trace";
constexpr char kTombstonedInterceptSocketName[] = "tombstoned_intercept";

After parsing the rc file, init launches the process, which runs main() in tombstoned.cpp:

system/core/debuggerd/tombstoned/tombstoned.cpp

int main(int, char* []) {
  umask(0137);

  // Don't try to connect to ourselves if we crash.
  // That is, a crashing tombstoned must not try to report the crash to its own sockets.
  struct sigaction action = {};
  action.sa_handler = [](int signal) {
    LOG(ERROR) << "received fatal signal " << signal;
    // On a fatal signal from the kernel, skip the dump and exit immediately.
    _exit(1);
  };
  // Install the handler for the crash signals.
  debuggerd_register_handlers(&action);

  // Get the server end of the tombstoned_intercept socket.
  int intercept_socket = android_get_control_socket(kTombstonedInterceptSocketName);
  // Get the server end of the tombstoned_crash socket.
  int crash_socket = android_get_control_socket(kTombstonedCrashSocketName);

  if (intercept_socket == -1 || crash_socket == -1) {
    PLOG(FATAL) << "failed to get socket from init";
  }
  // Make both listening sockets non-blocking.
  evutil_make_socket_nonblocking(intercept_socket);
  evutil_make_socket_nonblocking(crash_socket);

  event_base* base = event_base_new();
  if (!base) {
    LOG(FATAL) << "failed to create event_base";
  }

  intercept_manager = new InterceptManager(base, intercept_socket);
  // Listen for native crash dump requests.
  evconnlistener* tombstone_listener =
      evconnlistener_new(base, crash_accept_cb, CrashQueue::for_tombstones(), LEV_OPT_CLOSE_ON_FREE,
                         -1 /* backlog */, crash_socket);
  if (!tombstone_listener) {
    LOG(FATAL) << "failed to create evconnlistener for tombstones.";
  }

  if (kJavaTraceDumpsEnabled) {
    // Get the server end of the tombstoned_java_trace socket.
    const int java_trace_socket = android_get_control_socket(kTombstonedJavaTraceSocketName);
    if (java_trace_socket == -1) {
      PLOG(FATAL) << "failed to get socket from init";
    }
    // Make it non-blocking.
    evutil_make_socket_nonblocking(java_trace_socket);
    // Listen for Java trace dump requests (the ANR queue); crash_accept_cb is the accept callback.
    evconnlistener* java_trace_listener =
        evconnlistener_new(base, crash_accept_cb, CrashQueue::for_anrs(), LEV_OPT_CLOSE_ON_FREE,
                           -1 /* backlog */, java_trace_socket);
    if (!java_trace_listener) {
      LOG(FATAL) << "failed to create evconnlistener for java traces.";
    }
  }

  LOG(INFO) << "tombstoned successfully initialized";
 
  event_base_dispatch(base);
}

From this point on, the tombstoned daemon waits for other processes to connect.

3. The app process

3.1 Background: program startup

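Apart from a few programs such as init, which are statically linked, almost all Android programs are dynamically linked. Running a dynamically linked program requires a dynamic linker to load its shared libraries before the application code can execute.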

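Android's dynamic linker is linker, so startup begins there. The linker first bootstraps itself (it applies its own relocations), then loads the required shared libraries, and finally jumps to the program's entry point.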

We start from the dynamic linker's entry point:

3.2 begin.S

bionic/linker/arch/arm64/begin.S

ENTRY(_start)
  // Force unwinds to end in this function.
  .cfi_undefined x30

  mov x0, sp
  bl __linker_init // call __linker_init()

  /* linker init returns the _entry address in the main image */
  br x0
END(_start)

__linker_init() returns the entry address of the main program. How does it work internally?

3.3 __linker_init()

bionic/linker/linker_main.cpp

/*
 * This is the entry point for the linker, called from begin.S. This
 * method is responsible for fixing the linker's own relocations, and
 * then calling __linker_init_post_relocation().
 *
 * Because this method is called before the linker has fixed it's own
 * relocations, any attempt to reference an extern variable, extern
 * function, or other GOT reference will generate a segfault.
 */
extern "C" ElfW(Addr) __linker_init(void* raw_args) {
  // Initialize TLS early so system calls and errno work.
  KernelArgumentBlock args(raw_args);
  bionic_tcb temp_tcb = {};
  __libc_init_main_thread_early(args, &temp_tcb);

  // When the linker is run by itself (rather than as an interpreter for
  // another program), AT_BASE is 0.
  ElfW(Addr) linker_addr = getauxval(AT_BASE);
  if (linker_addr == 0) {
    // The AT_PHDR and AT_PHNUM aux values describe this linker instance, so use
    // the phdr to find the linker's base address.
    ElfW(Addr) load_bias;
    get_elf_base_from_phdr(
      reinterpret_cast<ElfW(Phdr)*>(getauxval(AT_PHDR)), getauxval(AT_PHNUM),
      &linker_addr, &load_bias);
  }

  ElfW(Ehdr)* elf_hdr = reinterpret_cast<ElfW(Ehdr)*>(linker_addr);
  ElfW(Phdr)* phdr = reinterpret_cast<ElfW(Phdr)*>(linker_addr + elf_hdr->e_phoff);

  soinfo tmp_linker_so(nullptr, nullptr, nullptr, 0, 0);

  tmp_linker_so.base = linker_addr;
  tmp_linker_so.size = phdr_table_get_load_size(phdr, elf_hdr->e_phnum);
  tmp_linker_so.load_bias = get_elf_exec_load_bias(elf_hdr);
  tmp_linker_so.dynamic = nullptr;
  tmp_linker_so.phdr = phdr;
  tmp_linker_so.phnum = elf_hdr->e_phnum;
  tmp_linker_so.set_linker_flag();

  // Prelink the linker so we can access linker globals.
  if (!tmp_linker_so.prelink_image()) __linker_cannot_link(args.argv[0]);

  // This might not be obvious... The reasons why we pass g_empty_list
  // in place of local_group here are (1) we do not really need it, because
  // linker is built with DT_SYMBOLIC and therefore relocates its symbols against
  // itself without having to look into local_group and (2) allocators
  // are not yet initialized, and therefore we cannot use linked_list.push_*
  // functions at this point.
  if (!tmp_linker_so.link_image(g_empty_list, g_empty_list, nullptr, nullptr)) __linker_cannot_link(args.argv[0]);
   
  // Continue in __linker_init_post_relocation().
  return __linker_init_post_relocation(args, tmp_linker_so);
}

3.3.1 __linker_init_post_relocation()

/*
 * This code is called after the linker has linked itself and fixed its own
 * GOT. It is safe to make references to externs and other non-local data at
 * this point. The compiler sometimes moves GOT references earlier in a
 * function, so avoid inlining this function (http://b/80503879).
 */
static ElfW(Addr) __attribute__((noinline))
__linker_init_post_relocation(KernelArgumentBlock& args, soinfo& tmp_linker_so) {
  // ... (omitted)
  // Continue in linker_main().
  ElfW(Addr) start_address = linker_main(args, exe_to_load);

  if (g_is_ldd) _exit(EXIT_SUCCESS);

  INFO("[ Jumping to _start (%p)... ]", reinterpret_cast<void*>(start_address));

  // Return the address that the calling assembly stub should jump to.
  return start_address;
}

3.3.2 linker_main()

bionic/linker/linker_main.cpp

static ElfW(Addr) linker_main(KernelArgumentBlock& args, const char* exe_to_load) {
  ProtectedDataGuard guard;

#if TIMING
  struct timeval t0, t1;
  gettimeofday(&t0, 0);
#endif
   // ...

  // Register the debuggerd signal handler.
  // This is where the crash signal handlers get installed.
  linker_debuggerd_init();

  g_linker_logger.ResetState();

  // Get a few environment variables.
  const char* LD_DEBUG = getenv("LD_DEBUG");
  if (LD_DEBUG != nullptr) {
    g_ld_debug_verbosity = atoi(LD_DEBUG);
  }

// ...

}

3.4 linker_debuggerd_init()

bionic/linker/linker_debuggerd_android.cpp

void linker_debuggerd_init() {
  // There may be a version mismatch between the bootstrap linker and the crash_dump in the APEX,
  // so don't pass in any process info from the bootstrap linker.
  debuggerd_callbacks_t callbacks = {
#if defined(__ANDROID_APEX__)
      .get_process_info = get_process_info,
#endif
      .post_dump = notify_gdb_of_libraries,
  };
  
  debuggerd_init(&callbacks);
}

3.5 debuggerd_init()

system/core/debuggerd/handler/debuggerd_handler.cpp

void debuggerd_init(debuggerd_callbacks_t* callbacks) {
  if (callbacks) {
    g_callbacks = *callbacks;
  }

  size_t thread_stack_pages = 8;
  void* thread_stack_allocation = mmap(nullptr, PAGE_SIZE * (thread_stack_pages + 2), PROT_NONE,
                                       MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (thread_stack_allocation == MAP_FAILED) {
    fatal_errno("failed to allocate debuggerd thread stack");
  }

  char* stack = static_cast<char*>(thread_stack_allocation) + PAGE_SIZE;
  if (mprotect(stack, PAGE_SIZE * thread_stack_pages, PROT_READ | PROT_WRITE) != 0) {
    fatal_errno("failed to mprotect debuggerd thread stack");
  }

  // Stack grows negatively, set it to the last byte in the page...
  stack = (stack + thread_stack_pages * PAGE_SIZE - 1);
  // and align it.
  stack -= 15;
  pseudothread_stack = stack;

  struct sigaction action;
  memset(&action, 0, sizeof(action));
  sigfillset(&action.sa_mask);
  // Set the callback that handles the signals.
  action.sa_sigaction = debuggerd_signal_handler;
  action.sa_flags = SA_RESTART | SA_SIGINFO;

  // Use the alternate signal stack if available so we can catch stack overflows.
  action.sa_flags |= SA_ONSTACK;

#define SA_EXPOSE_TAGBITS 0x00000800
  // Request that the kernel set tag bits in the fault address. This is necessary for diagnosing MTE
  // faults.
  action.sa_flags |= SA_EXPOSE_TAGBITS;

  debuggerd_register_handlers(&action);
}
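The stack setup at the top of debuggerd_init() is worth isolating: reserve the whole range as PROT_NONE, then open up only the middle pages, so that one inaccessible guard page remains on each side. The sketch below follows the same steps under my own assumptions; allocate_guarded_stack is a hypothetical name, and it queries the page size with sysconf() instead of using the PAGE_SIZE macro.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <sys/mman.h>
#include <unistd.h>

// Reserves `pages` usable pages with one inaccessible guard page on each
// side, and returns a 16-byte-aligned initial stack pointer just below the
// top of the usable region (stacks grow downwards). Returns nullptr on
// failure. The mapping is intentionally left in place for the caller to use.
char* allocate_guarded_stack(std::size_t pages) {
  assert(pages > 0);
  const std::size_t page_size = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
  void* allocation = mmap(nullptr, page_size * (pages + 2), PROT_NONE,
                          MAP_ANONYMOUS | MAP_PRIVATE, -1, 0);
  if (allocation == MAP_FAILED) return nullptr;

  char* usable = static_cast<char*>(allocation) + page_size;  // skip the low guard page
  if (mprotect(usable, page_size * pages, PROT_READ | PROT_WRITE) != 0) {
    munmap(allocation, page_size * (pages + 2));
    return nullptr;
  }

  // Step 16 bytes below the top of the usable region and align down to 16.
  std::uintptr_t top = reinterpret_cast<std::uintptr_t>(usable) + page_size * pages;
  return reinterpret_cast<char*>((top - 16) & ~static_cast<std::uintptr_t>(15));
}
```

A stack overflow in the pseudothread then hits the low guard page and faults immediately instead of silently corrupting neighbouring memory.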

3.5.1 debuggerd_register_handlers

system/core/debuggerd/include/debuggerd/handler.h

static void __attribute__((__unused__)) debuggerd_register_handlers(struct sigaction* action) {
  bool enabled = true;
#if ANDROID_DEBUGGABLE
  char value[PROP_VALUE_MAX] = "";
  enabled = !(__system_property_get("debug.debuggerd.disable", value) > 0 && !strcmp(value, "1"));
#endif
  // Install the handler for each fatal signal.
  if (enabled) {
    sigaction(SIGABRT, action, nullptr);
    sigaction(SIGBUS, action, nullptr);
    sigaction(SIGFPE, action, nullptr);
    sigaction(SIGILL, action, nullptr);
    sigaction(SIGSEGV, action, nullptr);
    sigaction(SIGSTKFLT, action, nullptr);
    sigaction(SIGSYS, action, nullptr);
    sigaction(SIGTRAP, action, nullptr);
  }

  sigaction(BIONIC_SIGNAL_DEBUGGER, action, nullptr);
}

At this point the process has installed its signal handlers. When it crashes, the kernel delivers the signal to the process, which invokes debuggerd_signal_handler().

3.5.2 debuggerd_signal_handler()

// Handler that does crash dumping by forking and doing the processing in the child.
// Do this by ptracing the relevant thread, and then execing debuggerd to do the actual dump.
// On a crash, a child is forked to dump the state of each thread, such as its registers and pc.
static void debuggerd_signal_handler(int signal_number, siginfo_t* info, void* context) {
   // ...
   
   // Only allow one thread to handle a signal at a time.
   // Take the lock: only one thread may proceed at a time.
  int ret = pthread_mutex_lock(&crash_mutex);
  if (ret != 0) {
    async_safe_format_log(ANDROID_LOG_INFO, "libc", "pthread_mutex_lock failed: %s", strerror(ret));
    return;
  } 
    
   // ...
   
   // Set PR_SET_DUMPABLE to 1, so that crash_dump can ptrace us.
  int orig_dumpable = prctl(PR_GET_DUMPABLE);
  // Make this (parent) process dumpable first, so that the child can ptrace it.
  if (prctl(PR_SET_DUMPABLE, 1) != 0) {
    fatal_errno("failed to set dumpable");
  }

  // ...

  // Essentially pthread_create without CLONE_FILES, so we still work during file descriptor
  // exhaustion.
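  // clone() here essentially creates a thread that does not share the fd table (no CLONE_FILES),
  // so the helper still works even if the parent has run out of file descriptors.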
  pid_t child_pid =
    clone(debuggerd_dispatch_pseudothread, pseudothread_stack,
          CLONE_THREAD | CLONE_SIGHAND | CLONE_VM | CLONE_CHILD_SETTID | CLONE_CHILD_CLEARTID,
          &thread_info, nullptr, nullptr, &thread_info.pseudothread_tid);
          
  if (child_pid == -1) {
    fatal_errno("failed to spawn debuggerd dispatch thread");
  }

  // Wait for the child to start...
  futex_wait(&thread_info.pseudothread_tid, -1);

  // and then wait for it to terminate.
  futex_wait(&thread_info.pseudothread_tid, child_pid);
    
  // Put the dumpable flag back to what it was before the dump.
  // Restore PR_SET_DUMPABLE to its original value.
  if (prctl(PR_SET_DUMPABLE, orig_dumpable) != 0) {
    fatal_errno("failed to restore dumpable");
  }

  // Restore PR_SET_PTRACER to its original value.
  if (restore_orig_ptracer && prctl(PR_SET_PTRACER, 0) != 0) {
    fatal_errno("failed to restore traceable");
  }

  //... 
  
}

clone() creates the pseudothread, which runs debuggerd_dispatch_pseudothread(). Let's look at that next.

3.5.3 debuggerd_dispatch_pseudothread

system/core/debuggerd/handler/debuggerd_handler.cpp


#if defined(__LP64__)
#define CRASH_DUMP_NAME "crash_dump64"
#else
#define CRASH_DUMP_NAME "crash_dump32"
#endif

#define CRASH_DUMP_PATH "/apex/com.android.runtime/bin/" CRASH_DUMP_NAME

static int debuggerd_dispatch_pseudothread(void* arg) {
  debugger_thread_info* thread_info = static_cast<debugger_thread_info*>(arg);

    // ...
 
  // Don't use fork(2) to avoid calling pthread_atfork handlers.
  // Fork a child process that will exec the crash_dump64 program.
  pid_t crash_dump_pid = __fork();
  if (crash_dump_pid == -1) {
    async_safe_format_log(ANDROID_LOG_FATAL, "libc",
                          "failed to fork in debuggerd signal handler: %s", strerror(errno));
  } else if (crash_dump_pid == 0) {
    // In the child process.
   //...

    execle(CRASH_DUMP_PATH, CRASH_DUMP_NAME, main_tid, pseudothread_tid, debuggerd_dump_type,
           nullptr, nullptr);
    async_safe_format_log(ANDROID_LOG_FATAL, "libc", "failed to exec crash_dump helper: %s",
                          strerror(errno));
    return 1;
  }

   // ...

}
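The "fork a helper, hand it the crash information through a pipe, wait for it" pattern can be sketched in isolation as follows. Everything here is a simplification under my own assumptions: fork_and_hand_off is a hypothetical name, the "crash info" is a single byte, the child echoes it back through its exit status instead of exec'ing crash_dump, and plain fork() is used for portability even though the real handler deliberately avoids it.

```cpp
#include <cassert>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Forks a helper, hands it one byte of "crash info" through a pipe, and
// returns the helper's exit status (which echoes that byte), or -1 on error.
int fork_and_hand_off(int info) {
  assert(info >= 0 && info <= 255);
  int pipe_fds[2];
  if (pipe(pipe_fds) != 0) return -1;

  pid_t child = fork();
  if (child == -1) {
    close(pipe_fds[0]);
    close(pipe_fds[1]);
    return -1;
  }
  if (child == 0) {
    // Child: read the info from the parent, then exit with it.
    close(pipe_fds[1]);
    unsigned char byte = 0;
    ssize_t n = read(pipe_fds[0], &byte, 1);
    _exit(n == 1 ? byte : 255);
  }

  // Parent: write the info, then wait for the helper to finish.
  close(pipe_fds[0]);
  unsigned char byte = static_cast<unsigned char>(info);
  ssize_t n = write(pipe_fds[1], &byte, 1);
  close(pipe_fds[1]);

  int status = 0;
  if (waitpid(child, &status, 0) != child || n != 1) return -1;
  return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In the real flow the read end of such a pipe is what crash_dump later consumes when it reads the crash info for the crashing thread.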

Here we consider a 64-bit program, so CRASH_DUMP_NAME is crash_dump64, and execution enters main() in crash_dump.cpp.

3.6 crash_dump64

system/core/debuggerd/crash_dump.cpp

int main(int argc, char** argv) {
  //...
  
     pid_t forkpid = fork();
  if (forkpid == -1) {
    PLOG(FATAL) << "fork failed";
  } else if (forkpid == 0) {
    fork_exit_read.reset();
  } else {
    // We need the pseudothread to live until we get around to verifying the vm pid against it.
    // The last thing it does is block on a waitpid on us, so wait until our child tells us to die.
    fork_exit_write.reset();
    char buf;
    TEMP_FAILURE_RETRY(read(fork_exit_read.get(), &buf, sizeof(buf)));
    _exit(0);
    
  }
  
  Initialize(argv);
  ParseArgs(argc, argv, &pseudothread_tid, &dump_type);

  // Die if we take too long.
  //
  // Note: processes with many threads and minidebug-info can take a bit to
  //       unwind, do not make this too small. b/62828735
  alarm(30);

  // Get the process name (aka cmdline).
  std::string process_name = get_process_name(g_target_thread);
  
  // Walk all threads and collect their registers, pc, and so on.
  // In order to reduce the duration that we pause the process for, we ptrace
  // the threads, fetch their registers and associated information, and then
  // fork a separate process as a snapshot of the process's address space.
  std::set<pid_t> threads;
  if (!android::procinfo::GetProcessTids(g_target_thread, &threads)) {
    PLOG(FATAL) << "failed to get process threads";
  }

  std::map<pid_t, ThreadInfo> thread_info;
  siginfo_t siginfo;
  std::string error;

  {
    ATRACE_NAME("ptrace");
    for (pid_t thread : threads) {
      // Trace the pseudothread separately, so we can use different options.
      // Skip the pseudothread here; it is traced separately.
      if (thread == pseudothread_tid) {
        continue;
      }

      if (!ptrace_seize_thread(target_proc_fd, thread, &error)) {
        bool fatal = thread == g_target_thread;
        LOG(fatal ? FATAL : WARNING) << error;
      }

      ThreadInfo info;
      info.pid = target_process;
      info.tid = thread;
      info.uid = getuid();
      info.process_name = process_name;
      info.thread_name = get_thread_name(thread);

      if (!ptrace_interrupt(thread, &info.signo)) {
        PLOG(WARNING) << "failed to ptrace interrupt thread " << thread;
        ptrace(PTRACE_DETACH, thread, 0, 0);
        continue;
      }

      if (thread == g_target_thread) { // Crashing thread: read its registers and crash info from the pipe.
        // Read the thread's registers along with the rest of the crash info out of the pipe.
        ReadCrashInfo(input_pipe, &siginfo, &info.registers, &abort_msg_address,
                      &fdsan_table_address);
        info.siginfo = &siginfo;
        info.signo = info.siginfo->si_signo;
      } else { // Any other thread: fetch its registers with RemoteGet.
        info.registers.reset(unwindstack::Regs::RemoteGet(thread));
        if (!info.registers) {
          PLOG(WARNING) << "failed to fetch registers for thread " << thread;
          ptrace(PTRACE_DETACH, thread, 0, 0);
          continue;
        }
      }

      thread_info[thread] = std::move(info);
    }
  }

  // Connect to the tombstoned daemon.
  {
    ATRACE_NAME("tombstoned_connect");
    LOG(INFO) << "obtaining output fd from tombstoned, type: " << dump_type;
    g_tombstoned_connected =
        tombstoned_connect(g_target_thread, &g_tombstoned_socket, &g_output_fd, dump_type);
  }

  if (g_tombstoned_connected) {
    if (TEMP_FAILURE_RETRY(dup2(g_output_fd.get(), STDOUT_FILENO)) == -1) {
      PLOG(ERROR) << "failed to dup2 output fd (" << g_output_fd.get() << ") to STDOUT_FILENO";
    }
  } else {
    unique_fd devnull(TEMP_FAILURE_RETRY(open("/dev/null", O_RDWR)));
    TEMP_FAILURE_RETRY(dup2(devnull.get(), STDOUT_FILENO));
    g_output_fd = std::move(devnull);
  }
  
  
  
  int signo = siginfo.si_signo;
  bool fatal_signal = signo != DEBUGGER_SIGNAL;
  bool backtrace = false;

  // si_value is special when used with DEBUGGER_SIGNAL.
  //   0: dump tombstone
  //   1: dump backtrace
  if (!fatal_signal) {
    int si_val = siginfo.si_value.sival_int;
    if (si_val == 0) {
      backtrace = false;
    } else if (si_val == 1) {
      backtrace = true;
    } else {
      LOG(WARNING) << "unknown si_value value " << si_val;
    }
  }

  // TODO: Use seccomp to lock ourselves down.
  unwindstack::UnwinderFromPid unwinder(256, vm_pid);
  if (!unwinder.Init(unwindstack::Regs::CurrentArch())) {
    LOG(FATAL) << "Failed to init unwinder object.";
  }
    // Decide what to dump based on the signal.
   if (backtrace) { 
    ATRACE_NAME("dump_backtrace");
    // Only a backtrace was requested.
    dump_backtrace(std::move(g_output_fd), &unwinder, thread_info, g_target_thread);
  } else {
    {
      ATRACE_NAME("fdsan table dump");
      populate_fdsan_table(&open_files, unwinder.GetProcessMemory(), fdsan_table_address);
    }

    {
      ATRACE_NAME("engrave_tombstone");
      // Write the tombstone file.
      engrave_tombstone(std::move(g_output_fd), &unwinder, thread_info, g_target_thread,
                        abort_msg_address, &open_files, &amfd_data);
    }
  }
  // The tombstone file has now been written.
  // For a fatal signal, notify AMS, which starts the framework-level crash flow.
  if (fatal_signal) {
    // Don't try to notify ActivityManager if it just crashed, or we might hang until timeout.
    if (thread_info[target_process].thread_name != "system_server") {
      activity_manager_notify(target_process, signo, amfd_data);
    }
  }

     // Close stdout before we notify tombstoned of completion.
     // Tell tombstoned that the dump is complete.
    close(STDOUT_FILENO);
    if (g_tombstoned_connected && !tombstoned_notify_completion(g_tombstoned_socket.get())) {
      LOG(ERROR) << "failed to notify tombstoned of completion";
    }

    return 0;

}

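The main responsibilities of the crash_dump process, as seen in the code above, are: ptrace each thread of the crashing process and collect its registers; connect to tombstoned to obtain an output fd; write either a backtrace or a full tombstone; notify ActivityManager if the signal was fatal; and finally tell tombstoned that the dump is complete.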

3.6.1 engrave_tombstone()

system/core/debuggerd/libdebuggerd/tombstone.cpp

void engrave_tombstone(unique_fd output_fd, unwindstack::Unwinder* unwinder,
                       const std::map<pid_t, ThreadInfo>& threads, pid_t target_thread,
                       uint64_t abort_msg_address, OpenFilesList* open_files,
                       std::string* amfd_data) {
  // don't copy log messages to tombstone unless this is a dev device
  bool want_logs = android::base::GetBoolProperty("ro.debuggable", false);

  log_t log;
  log.current_tid = target_thread;
  log.crashed_tid = target_thread;
  log.tfd = output_fd.get();
  log.amfd_data = amfd_data;

  _LOG(&log, logtype::HEADER, "*** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***\n");
  // Dump the header.
  dump_header_info(&log);
  // Dump the timestamp.
  dump_timestamp(&log, time(nullptr));

  auto it = threads.find(target_thread);
  if (it == threads.end()) {
    LOG(FATAL) << "failed to find target thread";
  }
  // Dump the crashing thread.
  dump_thread(&log, unwinder, it->second, abort_msg_address, true);

  if (want_logs) {
    // Dump the tail of the logs.
    dump_logs(&log, it->second.pid, 50);
  }

  for (auto& [tid, thread_info] : threads) {
    if (tid == target_thread) {
      continue;
    }
    // Dump every other thread.
    dump_thread(&log, unwinder, thread_info, 0, false);
  }

  if (open_files) {
    _LOG(&log, logtype::OPEN_FILES, "\nopen files:\n");
    // dump fd
    dump_open_files_list(&log, *open_files, "    ");
  }

  if (want_logs) {
    // Dump the full logs.
    dump_logs(&log, it->second.pid, 0);
  }
}

Comparing this against a generated tombstone file confirms that its sections appear in exactly this order.

3.6.2 dump_header_info()

static void dump_header_info(log_t* log) {
  auto fingerprint = GetProperty("ro.build.fingerprint", "unknown");
  auto revision = GetProperty("ro.revision", "unknown");

  _LOG(log, logtype::HEADER, "Build fingerprint: '%s'\n", fingerprint.c_str());
  _LOG(log, logtype::HEADER, "Revision: '%s'\n", revision.c_str());
  _LOG(log, logtype::HEADER, "ABI: '%s'\n", ABI_STRING);
}

This writes the build fingerprint, the revision, and the ABI.

3.6.3 dump_timestamp()

static void dump_timestamp(log_t* log, time_t time) {
  struct tm tm;
  localtime_r(&time, &tm);

  char buf[strlen("1970-01-01 00:00:00+0830") + 1];
  strftime(buf, sizeof(buf), "%F %T%z", &tm);
  _LOG(log, logtype::HEADER, "Timestamp: %s\n", buf);
}

This writes the timestamp.
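To see what the "%F %T%z" format produces, the same formatting can be pulled out into a small standalone function. This is a sketch rather than the tombstone code itself: format_timestamp is a name I made up, and it returns a std::string instead of writing to the log.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <time.h>

// Formats `when` in local time as date, time, and UTC offset,
// for example "2021-03-04 10:20:30+0800".
std::string format_timestamp(time_t when) {
  struct tm tm = {};
  localtime_r(&when, &tm);

  // Sized like the buffer in dump_timestamp(): the sample string plus the NUL.
  char buf[sizeof("1970-01-01 00:00:00+0830")];
  std::size_t written = strftime(buf, sizeof(buf), "%F %T%z", &tm);
  assert(written > 0);  // strftime returns 0 if the result did not fit
  (void)written;
  return std::string(buf);
}
```

%F expands to year-month-day, %T to hours:minutes:seconds, and %z to the numeric UTC offset, so the result is always 24 characters long for four-digit years.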

3.6.4 dump_thread() for the crashing thread

static bool dump_thread(log_t* log, unwindstack::Unwinder* unwinder, const ThreadInfo& thread_info,
                        uint64_t abort_msg_address, bool primary_thread) {
  log->current_tid = thread_info.tid;
  if (!primary_thread) {
    _LOG(log, logtype::THREAD, "--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---\n");
  }
  // Print the thread's identifying information.
  dump_thread_info(log, thread_info);

  // Dump the information about the crash signal.
  if (thread_info.siginfo) {
    // Dump the signal number and code.
    dump_signal_info(log, thread_info, unwinder->GetProcessMemory().get());
    // Dump the probable cause of the crash.
    dump_probable_cause(log, thread_info.siginfo, unwinder->GetMaps());
  }
  // For the crashing thread, dump the abort message.
  if (primary_thread) {
    dump_abort_message(log, unwinder->GetProcessMemory().get(), abort_msg_address);
  }
  // Dump the register values.
  dump_registers(log, thread_info.registers.get());
  // Unwind will mutate the registers, so make a copy first.
  // (the copy keeps the saved register values intact)
  std::unique_ptr<unwindstack::Regs> regs_copy(thread_info.registers->Clone());
  unwinder->SetRegs(regs_copy.get());
  // Unwind the call stack.
  unwinder->Unwind();
  if (unwinder->NumFrames() == 0) {
    _LOG(log, logtype::THREAD, "Failed to unwind");
  } else {
    // Print the backtrace.
    _LOG(log, logtype::BACKTRACE, "\nbacktrace:\n");
    log_backtrace(log, unwinder, "    ");

    _LOG(log, logtype::STACK, "\nstack:\n");
    // Dump the stack memory.
    dump_stack(log, unwinder->frames(), unwinder->GetMaps(), unwinder->GetProcessMemory().get());
  }

  if (primary_thread) {
    unwindstack::Maps* maps = unwinder->GetMaps();
    // Dump the memory near the addresses held in the registers.
    dump_memory_and_code(log, maps, unwinder->GetProcessMemory().get(),
                         thread_info.registers.get());
    if (maps != nullptr) {
      uint64_t addr = 0;
      siginfo_t* si = thread_info.siginfo;
      if (signal_has_si_addr(si)) {
        addr = reinterpret_cast<uint64_t>(si->si_addr);
      }
      // Dump all memory maps.
      dump_all_maps(log, unwinder, addr);
    }
  }

  log->current_tid = log->crashed_tid;
  return true;
}

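// The individual dump helpers follow.
// dump_thread_info() writes the pid, tid, thread name, and process name of the crashing process,
// followed by the uid it belongs to.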
static void dump_thread_info(log_t* log, const ThreadInfo& thread_info) {
  // Blacklist logd, logd.reader, logd.writer, logd.auditd, logd.control ...
  // TODO: Why is this controlled by thread name?
  if (thread_info.thread_name == "logd" ||
      android::base::StartsWith(thread_info.thread_name, "logd.")) {
    log->should_retrieve_logcat = false;
  }

  _LOG(log, logtype::HEADER, "pid: %d, tid: %d, name: %s  >>> %s <<<\n", thread_info.pid,
       thread_info.tid, thread_info.thread_name.c_str(), thread_info.process_name.c_str());
  _LOG(log, logtype::HEADER, "uid: %d\n", thread_info.uid);
}

// Print the signal information.
static void dump_signal_info(log_t* log, const ThreadInfo& thread_info,
                             unwindstack::Memory* process_memory) {
  char addr_desc[64];  // ", fault addr 0x1234"
  // The fault address.
  if (signal_has_si_addr(thread_info.siginfo)) {
    void* addr = thread_info.siginfo->si_addr;
    if (thread_info.siginfo->si_signo == SIGILL) {
      uint32_t instruction = {};
      process_memory->Read(reinterpret_cast<uint64_t>(addr), &instruction, sizeof(instruction));
      snprintf(addr_desc, sizeof(addr_desc), "%p (*pc=%#08x)", addr, instruction);
    } else {
      snprintf(addr_desc, sizeof(addr_desc), "%p", addr);
    }
  } else {
    snprintf(addr_desc, sizeof(addr_desc), "--------");
  }
    
  char sender_desc[32] = {};  // " from pid 1234, uid 666"
  if (signal_has_sender(thread_info.siginfo, thread_info.pid)) {
    get_signal_sender(sender_desc, sizeof(sender_desc), thread_info.siginfo);
  }
    // Print the signal number, the code, and the fault address.
  _LOG(log, logtype::HEADER, "signal %d (%s), code %d (%s%s), fault addr %s\n",
       thread_info.siginfo->si_signo, get_signame(thread_info.siginfo),
       thread_info.siginfo->si_code, get_sigcode(thread_info.siginfo), sender_desc, addr_desc);
}

// Dump the abort message.
static void dump_abort_message(log_t* log, unwindstack::Memory* process_memory, uint64_t address) {
  if (address == 0) {
    return;
  }

  size_t length;
  if (!process_memory->ReadFully(address, &length, sizeof(length))) {
    _LOG(log, logtype::HEADER, "Failed to read abort message header: %s\n", strerror(errno));
    return;
  }

  // The length field includes the length of the length field itself.
  if (length < sizeof(size_t)) {
    _LOG(log, logtype::HEADER, "Abort message header malformed: claimed length = %zd\n", length);
    return;
  }

  length -= sizeof(size_t);

  // The abort message should be null terminated already, but reserve a spot for NUL just in case.
  std::vector<char> msg(length + 1);
  if (!process_memory->ReadFully(address + sizeof(length), &msg[0], length)) {
    _LOG(log, logtype::HEADER, "Failed to read abort message: %s\n", strerror(errno));
    return;
  }

  _LOG(log, logtype::HEADER, "Abort message: '%s'\n", &msg[0]);
}
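The abort message is stored as a length-prefixed record in which the length field also counts itself, which is why dump_abort_message() subtracts sizeof(size_t) before reading the text. A small sketch of that layout, working on an in-memory buffer instead of the crashed process's memory, might look like this. Both function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <string.h>
#include <vector>

// Parses a buffer laid out as [size_t total_length][message bytes], where
// total_length counts the length field itself. Returns false if malformed.
bool parse_abort_message(const std::vector<char>& raw, std::string* out) {
  assert(out != nullptr);
  std::size_t total = 0;
  if (raw.size() < sizeof(total)) return false;
  memcpy(&total, raw.data(), sizeof(total));
  if (total < sizeof(total) || total > raw.size()) return false;

  const char* text = raw.data() + sizeof(total);
  std::size_t text_len = total - sizeof(total);
  // Stop at the first NUL, as printing with "%s" would.
  out->assign(text, strnlen(text, text_len));
  return true;
}

// Builds a buffer in that layout (helper for this example only).
std::vector<char> make_abort_message(const std::string& text) {
  std::size_t total = sizeof(std::size_t) + text.size() + 1;  // +1 for the NUL
  std::vector<char> raw(total);
  memcpy(raw.data(), &total, sizeof(total));
  memcpy(raw.data() + sizeof(total), text.c_str(), text.size() + 1);
  return raw;
}
```

A record whose claimed length is smaller than the header itself is rejected, which mirrors the "Abort message header malformed" branch above.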

// Dump the probable cause of the crash.
static void dump_probable_cause(log_t* log, const siginfo_t* si, unwindstack::Maps* maps) {
  std::string cause;
  if (si->si_signo == SIGSEGV && si->si_code == SEGV_MAPERR) {
    if (si->si_addr < reinterpret_cast<void*>(4096)) {
      cause = StringPrintf("null pointer dereference");
    } else if (si->si_addr == reinterpret_cast<void*>(0xffff0ffc)) {
      cause = "call to kuser_helper_version";
    } else if (si->si_addr == reinterpret_cast<void*>(0xffff0fe0)) {
      cause = "call to kuser_get_tls";
    } else if (si->si_addr == reinterpret_cast<void*>(0xffff0fc0)) {
      cause = "call to kuser_cmpxchg";
    } else if (si->si_addr == reinterpret_cast<void*>(0xffff0fa0)) {
      cause = "call to kuser_memory_barrier";
    } else if (si->si_addr == reinterpret_cast<void*>(0xffff0f60)) {
      cause = "call to kuser_cmpxchg64";
    }
  } else if (si->si_signo == SIGSEGV && si->si_code == SEGV_ACCERR) {
    unwindstack::MapInfo* map_info = maps->Find(reinterpret_cast<uint64_t>(si->si_addr));
    if (map_info != nullptr && map_info->flags == PROT_EXEC) {
      cause = "execute-only (no-read) memory access error; likely due to data in .text.";
    }
  } else if (si->si_signo == SIGSYS && si->si_code == SYS_SECCOMP) {
    cause = StringPrintf("seccomp prevented call to disallowed %s system call %d", ABI_STRING,
                         si->si_syscall);
  }

  if (!cause.empty()) _LOG(log, logtype::HEADER, "Cause: %s\n", cause.c_str());
}


// Dump the register values.
void dump_registers(log_t* log, unwindstack::Regs* regs) {
  // Split lr/sp/pc into their own special row.
  static constexpr size_t column_count = 4;
  std::vector<std::pair<std::string, uint64_t>> current_row;
  std::vector<std::pair<std::string, uint64_t>> special_row;

#if defined(__arm__) || defined(__aarch64__)
  static constexpr const char* special_registers[] = {"ip", "lr", "sp", "pc"};
#elif defined(__i386__)
  static constexpr const char* special_registers[] = {"ebp", "esp", "eip"};
#elif defined(__x86_64__)
  static constexpr const char* special_registers[] = {"rbp", "rsp", "rip"};
#else
  static constexpr const char* special_registers[] = {};
#endif

  regs->IterateRegisters([log, &current_row, &special_row](const char* name, uint64_t value) {
    auto row = &current_row;
    for (const char* special_name : special_registers) {
      if (strcmp(special_name, name) == 0) {
        row = &special_row;
        break;
      }
    }

    row->emplace_back(name, value);
    if (current_row.size() == column_count) {
      print_register_row(log, current_row);
      current_row.clear();
    }
  });

  if (!current_row.empty()) {
    print_register_row(log, current_row);
  }

  print_register_row(log, special_row);
}
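The row layout that dump_registers() produces is easier to see without the logging calls. The sketch below reproduces only the grouping logic, on register names alone; layout_registers and Row are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>
#include <vector>

using Row = std::vector<std::string>;

// Splits register names into rows of `columns`, keeping the names listed in
// `special` out of the regular rows and returning them as one final row.
std::vector<Row> layout_registers(const std::vector<std::string>& names,
                                  const std::set<std::string>& special,
                                  std::size_t columns) {
  assert(columns > 0);
  std::vector<Row> rows;
  Row current;
  Row special_row;
  for (const std::string& name : names) {
    if (special.count(name) != 0) {
      special_row.push_back(name);  // lr/sp/pc and friends go last
      continue;
    }
    current.push_back(name);
    if (current.size() == columns) {  // a regular row is full: flush it
      rows.push_back(current);
      current.clear();
    }
  }
  if (!current.empty()) rows.push_back(current);  // partially filled last row
  rows.push_back(special_row);
  return rows;
}
```

For the names x0 to x4 plus lr, sp, pc with four columns, this yields the rows {x0, x1, x2, x3}, {x4}, and {lr, sp, pc}, which matches the shape of the register section in a tombstone.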

3.6.5 dump_thread() for non-crashing threads

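Same as above, except that primary_thread is false, so a separator line is printed first and the abort message, the memory near the registers, and the memory maps are skipped.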

3.6.6 dump_logs()

// Dumps the logs generated by the specified pid to the tombstone, from both
// "system" and "main" log devices.  Ideally we'd interleave the output.
static void dump_logs(log_t* log, pid_t pid, unsigned int tail) {
  if (pid == getpid()) {
    // Cowardly refuse to dump logs while we're running in-process.
    return;
  }
    // Dump the "system" and "main" logs.
  dump_log_file(log, pid, "system", tail);
  dump_log_file(log, pid, "main", tail);
}


static void dump_log_file(log_t* log, pid_t pid, const char* filename, unsigned int tail) {
  bool first = true;
  logger_list* logger_list;

  if (!log->should_retrieve_logcat) {
    return;
  }

  logger_list = android_logger_list_open(
      android_name_to_log_id(filename), ANDROID_LOG_RDONLY | ANDROID_LOG_NONBLOCK, tail, pid);

  if (!logger_list) {
    ALOGE("Unable to open %s: %s\n", filename, strerror(errno));
    return;
  }

  while (true) {
    log_msg log_entry;
    ssize_t actual = android_logger_list_read(logger_list, &log_entry);

    if (actual < 0) {
      if (actual == -EINTR) {
        // interrupted by signal, retry
        continue;
      } else if (actual == -EAGAIN) {
        // non-blocking EOF; we're done
        break;
      } else {
        ALOGE("Error while reading log: %s\n", strerror(-actual));
        break;
      }
    } else if (actual == 0) {
      ALOGE("Got zero bytes while reading log: %s\n", strerror(errno));
      break;
    }

    // NOTE: if you ALOGV something here, this will spin forever,
    // because you will be writing as fast as you're reading.  Any
    // high-frequency debug diagnostics should just be written to
    // the tombstone file.

    if (first) {
      _LOG(log, logtype::LOGS, "--------- %slog %s\n",
        tail ? "tail end of " : "", filename);
      first = false;
    }

    // Msg format is: <priority:1><tag:N>\0<message:N>\0
    //
    // We want to display it in the same format as "logcat -v threadtime"
    // (although in this case the pid is redundant).
    char timeBuf[32];
    time_t sec = static_cast<time_t>(log_entry.entry.sec);
    struct tm tmBuf;
    struct tm* ptm;
    ptm = localtime_r(&sec, &tmBuf);
    strftime(timeBuf, sizeof(timeBuf), "%m-%d %H:%M:%S", ptm);

    if (log_entry.id() == LOG_ID_EVENTS) {
      if (!g_eventTagMap) {
        g_eventTagMap = android_openEventTagMap(nullptr);
      }
      AndroidLogEntry e;
      char buf[512];
      if (android_log_processBinaryLogBuffer(&log_entry.entry_v1, &e, g_eventTagMap, buf,
                                             sizeof(buf)) == 0) {
        _LOG(log, logtype::LOGS, "%s.%03d %5d %5d %c %-8.*s: %s\n", timeBuf,
             log_entry.entry.nsec / 1000000, log_entry.entry.pid, log_entry.entry.tid, 'I',
             (int)e.tagLen, e.tag, e.message);
      }
      continue;
    }

    char* msg = log_entry.msg();
    if (msg == nullptr) {
      continue;
    }
    unsigned char prio = msg[0];
    char* tag = msg + 1;
    msg = tag + strlen(tag) + 1;

    // consume any trailing newlines
    char* nl = msg + strlen(msg) - 1;
    while (nl >= msg && *nl == '\n') {
      *nl-- = '\0';
    }

    static const char* kPrioChars = "!.VDIWEFS";
    char prioChar = (prio < strlen(kPrioChars) ? kPrioChars[prio] : '?');

    // Look for line breaks ('\n') and display each text line
    // on a separate line, prefixed with the header, like logcat does.
    do {
      nl = strchr(msg, '\n');
      if (nl != nullptr) {
        *nl = '\0';
        ++nl;
      }

      _LOG(log, logtype::LOGS, "%s.%03d %5d %5d %c %-8s: %s\n", timeBuf,
           log_entry.entry.nsec / 1000000, log_entry.entry.pid, log_entry.entry.tid, prioChar, tag,
           msg);
    } while ((msg = nl));
  }

  android_logger_list_free(logger_list);
}
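
The payload layout noted in the comment above (`<priority:1><tag:N>\0<message:N>\0`) and the priority-letter lookup can be tried in isolation. The following is a minimal sketch rather than AOSP code: `ParsedLogMsg`, `PrioToChar` and `ParseLogPayload` are hypothetical names introduced here, and unlike `dump_log_file()` the sketch takes an explicit buffer length instead of relying on `strlen()`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical holder for one decoded text log record (not an AOSP type).
struct ParsedLogMsg {
  char prio_char = '?';
  std::string tag;
  std::string message;
};

// Maps a numeric priority to the letter logcat prints, using the same
// "!.VDIWEFS" table as dump_log_file(). Out-of-range values become '?'.
inline char PrioToChar(unsigned char prio) {
  static const char kPrioChars[] = "!.VDIWEFS";
  return prio < sizeof(kPrioChars) - 1 ? kPrioChars[prio] : '?';
}

// Decodes <priority:1><tag:N>\0<message:N>\0 from a bounded buffer and
// strips trailing newlines from the message. Returns false when the
// buffer is too short or the tag terminator is missing.
inline bool ParseLogPayload(const char* data, size_t len, ParsedLogMsg* out) {
  assert(out != nullptr);
  if (data == nullptr || len < 2) return false;
  const char* end = data + len;
  const char* tag = data + 1;
  const char* tag_end = static_cast<const char*>(
      std::memchr(tag, '\0', static_cast<size_t>(end - tag)));
  if (tag_end == nullptr) return false;
  const char* msg = tag_end + 1;
  const char* msg_end = static_cast<const char*>(
      std::memchr(msg, '\0', static_cast<size_t>(end - msg)));
  if (msg_end == nullptr) msg_end = end;  // tolerate a missing final '\0'
  while (msg_end > msg && msg_end[-1] == '\n') --msg_end;
  out->prio_char = PrioToChar(static_cast<unsigned char>(data[0]));
  out->tag.assign(tag, tag_end);
  out->message.assign(msg, msg_end);
  return true;
}
```

Passing a buffer whose first byte is 4 yields the letter `I`, which matches the `I` column in the log lines of a tombstone.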

3.7 activity_manager_notify()

system/core/debuggerd/crash_dump.cpp

static bool activity_manager_notify(pid_t pid, int signal, const std::string& amfd_data) {
  ATRACE_CALL();
  // Path of the AMS server socket: /data/system/ndebugsocket
  android::base::unique_fd amfd(socket_local_client(
      "/data/system/ndebugsocket", ANDROID_SOCKET_NAMESPACE_FILESYSTEM, SOCK_STREAM));
  if (amfd.get() == -1) {
    PLOG(ERROR) << "unable to connect to activity manager";
    return false;
  }

  struct timeval tv = {
    .tv_sec = 1,
    .tv_usec = 0,
  };
  if (setsockopt(amfd.get(), SOL_SOCKET, SO_SNDTIMEO, &tv, sizeof(tv)) == -1) {
    PLOG(ERROR) << "failed to set send timeout on activity manager socket";
    return false;
  }
  tv.tv_sec = 3;  // 3 seconds on handshake read
  if (setsockopt(amfd.get(), SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) == -1) {
    PLOG(ERROR) << "failed to set receive timeout on activity manager socket";
    return false;
  }

  // Activity Manager protocol: binary 32-bit network-byte-order ints for the
  // pid and signal number, followed by the raw text of the dump, culminating
  // in a zero byte that marks end-of-data.
  // Write the pid
  uint32_t datum = htonl(pid);
  if (!android::base::WriteFully(amfd, &datum, 4)) {
    PLOG(ERROR) << "AM pid write failed";
    return false;
  }
  // Write the signal number
  datum = htonl(signal);
  if (!android::base::WriteFully(amfd, &datum, 4)) {
    PLOG(ERROR) << "AM signal write failed";
    return false;
  }
  // Write the dump text, including the terminating zero byte
  if (!android::base::WriteFully(amfd, amfd_data.c_str(), amfd_data.size() + 1)) {
    PLOG(ERROR) << "AM data write failed";
    return false;
  }

  // 3 sec timeout reading the ack; we're fine if the read fails.
  char ack;
  android::base::ReadFully(amfd, &ack, 1);
  return true;
}

This notifies AMS over a socket so that it can run its crash handling flow.
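
The wire format described in the comment of activity_manager_notify() (two 32-bit network-byte-order integers, the raw dump text, then a zero byte) can be illustrated without a socket. This is a minimal sketch under that description only: `BuildAmCrashPacket` is a hypothetical helper, and the manual shifts stand in for the `htonl()` calls in the original.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Builds the byte stream [pid:4 big-endian][signal:4 big-endian][text][\0].
// Hypothetical helper for illustration; crash_dump writes these pieces
// straight to the socket with WriteFully() instead of buffering them.
inline std::vector<uint8_t> BuildAmCrashPacket(int32_t pid, int32_t signal,
                                               const std::string& dump_text) {
  std::vector<uint8_t> out;
  out.reserve(8 + dump_text.size() + 1);

  // Appends a 32-bit value most significant byte first (network order).
  auto append_u32 = [&out](uint32_t v) {
    out.push_back(static_cast<uint8_t>((v >> 24) & 0xFF));
    out.push_back(static_cast<uint8_t>((v >> 16) & 0xFF));
    out.push_back(static_cast<uint8_t>((v >> 8) & 0xFF));
    out.push_back(static_cast<uint8_t>(v & 0xFF));
  };

  append_u32(static_cast<uint32_t>(pid));
  append_u32(static_cast<uint32_t>(signal));
  out.insert(out.end(), dump_text.begin(), dump_text.end());
  out.push_back(0);  // end-of-data marker expected by the receiver

  assert(out.size() == 8 + dump_text.size() + 1);
  return out;
}
```

The trailing zero byte is why the original passes `amfd_data.size() + 1` to WriteFully(): the terminator of the C string doubles as the end-of-data marker.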

3.8 tombstoned_notify_completion()

Notifies the tombstoned daemon.

system/core/debuggerd/tombstoned/tombstoned_client.cpp

bool tombstoned_notify_completion(int tombstoned_socket) {
  TombstonedCrashPacket packet = {};
  // Send a packet of type kCompletedDump
  packet.packet_type = CrashPacketType::kCompletedDump;
  if (TEMP_FAILURE_RETRY(write(tombstoned_socket, &packet, sizeof(packet))) != sizeof(packet)) {
    return false;
  }
  return true;
}

In effect, this tells the tombstoned process that crash_dump has finished writing the tombstone file.

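3.9 Summary

Before Android 8.0, the tombstone file was generated by the debuggerd process. From Android 8.0 onward, the crash_dump process generates it, notifies the AMS server socket, and finally notifies the tombstoned process.

4. AMS receives the crash notification

Once the tombstone file has been generated, crash_dump notifies AMS over a socket if the signal was a crash signal. So where does AMS register the listener for this?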

4.1 startOtherServices()

In the SystemServer process, inside startOtherServices():

private void startOtherServices() {
    //...

    mActivityManagerService.systemReady(() -> {
                Slog.i(TAG, "Making services ready");
                traceBeginAndSlog("StartActivityManagerReadyPhase");
                mSystemServiceManager.startBootPhase(
                        SystemService.PHASE_ACTIVITY_MANAGER_READY);
                traceEnd();
                traceBeginAndSlog("StartObservingNativeCrashes");
                try {
                    // Register the native crash listener
                    mActivityManagerService.startObservingNativeCrashes();
                } catch (Throwable e) {
                    reportWtf("observing native crashes", e);
                }
                //...
              }
              
              //...
}

4.2 startObservingNativeCrashes()

public void startObservingNativeCrashes() {
    final NativeCrashListener ncl = new NativeCrashListener(this);
    ncl.start();
}

NativeCrashListener extends Thread. Its run() method looks like this:


// Must match the path defined in debuggerd.c.
static final String DEBUGGERD_SOCKET_PATH = "/data/system/ndebugsocket";
    
@Override
public void run() {
    final byte[] ackSignal = new byte[1];

    if (DEBUG) Slog.i(TAG, "Starting up");

    // The file system entity for this socket is created with 0777 perms, owned
    // by system:system. selinux restricts things so that only crash_dump can
    // access it.
    {
        File socketFile = new File(DEBUGGERD_SOCKET_PATH);
        if (socketFile.exists()) {
            socketFile.delete();
        }
    }

    try {
        // Create the server socket
        // DEBUGGERD_SOCKET_PATH = "/data/system/ndebugsocket";
        FileDescriptor serverFd = Os.socket(AF_UNIX, SOCK_STREAM, 0);
        final UnixSocketAddress sockAddr = UnixSocketAddress.createFileSystem(
                DEBUGGERD_SOCKET_PATH);
        Os.bind(serverFd, sockAddr);
        // Listen on serverFd
        Os.listen(serverFd, 1);
        Os.chmod(DEBUGGERD_SOCKET_PATH, 0777);

        while (true) {
            FileDescriptor peerFd = null;
            try {
                if (MORE_DEBUG) Slog.v(TAG, "Waiting for debuggerd connection");
                // Accept a client connection
                peerFd = Os.accept(serverFd, null /* peerAddress */);
                if (MORE_DEBUG) Slog.v(TAG, "Got debuggerd socket " + peerFd);
                if (peerFd != null) {
                    // Execution only gets here for a peer with system-level permission
                    // the reporting thread may take responsibility for
                    // acking the debugger; make sure we play along.
                    // Handle the native crash
                    consumeNativeCrashData(peerFd);
                }
            } catch (Exception e) {
                Slog.w(TAG, "Error handling connection", e);
            } finally {
               //...
            }
        }
    } catch (Exception e) {
        Slog.e(TAG, "Unable to init native debug socket!", e);
    }
}


Responsibilities:

  1. Create the "/data/system/ndebugsocket" server socket and keep listening on it.
  2. When a client sends data, consume it.

4.2.1 consumeNativeCrashData()

// Read a crash report from the connection
void consumeNativeCrashData(FileDescriptor fd) {
    if (MORE_DEBUG) Slog.i(TAG, "debuggerd connected");
    // The debuggerd side (crash_dump on Android 8.0 and later) is now connected
    final byte[] buf = new byte[4096];
    final ByteArrayOutputStream os = new ByteArrayOutputStream(4096);

    try {
        StructTimeval timeout = StructTimeval.fromMillis(SOCKET_TIMEOUT_MILLIS);
        Os.setsockoptTimeval(fd, SOL_SOCKET, SO_RCVTIMEO, timeout);
        Os.setsockoptTimeval(fd, SOL_SOCKET, SO_SNDTIMEO, timeout);
        // Only the crash_dump process is allowed to connect to this AMS socket.
        // The socket is guarded by an selinux neverallow rule that only
        // permits crash_dump to connect to it. This allows us to trust the
        // received values.
        
        // first, the pid and signal number
        // Read the 8-byte header: pid and signal number
        int headerBytes = readExactly(fd, buf, 0, 8);
        if (headerBytes != 8) {
            // protocol failure; give up
            Slog.e(TAG, "Unable to read from debuggerd");
            return;
        }

        int pid = unpackInt(buf, 0);
        int signal = unpackInt(buf, 4);
        if (DEBUG) {
            Slog.v(TAG, "Read pid=" + pid + " signal=" + signal);
        }

        // now the text of the dump
        // Then read the dump text itself
        if (pid > 0) {
            final ProcessRecord pr;
            synchronized (mAm.mPidsSelfLocked) {
                pr = mAm.mPidsSelfLocked.get(pid);
            }
            if (pr != null) {
                // Don't attempt crash reporting for persistent apps
                if (pr.isPersistent()) {
                    if (DEBUG) {
                        Slog.v(TAG, "Skipping report for persistent app " + pr);
                    }
                    return;
                }

                int bytes;
                do {
                    // get some data
                    bytes = Os.read(fd, buf, 0, buf.length);
                    if (bytes > 0) {
                        if (MORE_DEBUG) {
                            String s = new String(buf, 0, bytes, "UTF-8");
                            Slog.v(TAG, "READ=" + bytes + "> " + s);
                        }
                        // did we just get the EOD null byte?
                        if (buf[bytes-1] == 0) {
                            os.write(buf, 0, bytes-1);  // exclude the EOD token
                            break;
                        }
                        // no EOD, so collect it and read more
                        os.write(buf, 0, bytes);
                    }
                } while (bytes > 0);

                // Okay, we've got the report.
                if (DEBUG) Slog.v(TAG, "processing");

                // Mark the process record as being a native crash so that the
                // cleanup mechanism knows we're still submitting the report
                // even though the process will vanish as soon as we let
                // debuggerd proceed.
                synchronized (mAm) {
                    pr.setCrashing(true);
                    pr.forceCrashReport = true;
                }

                // Crash reporting is synchronous but we want to let debuggerd
                // go about it business right away, so we spin off the actual
                // reporting logic on a thread and let it take it's time.
                final String reportString = new String(os.toByteArray(), "UTF-8");
                // Report on a separate thread
                (new NativeCrashReporter(pr, signal, reportString)).start();
            } else {
                Slog.w(TAG, "Couldn't find ProcessRecord for pid " + pid);
            }
        } else {
            Slog.e(TAG, "Bogus pid!");
        }
    } catch (Exception e) {
        Slog.e(TAG, "Exception dealing with report", e);
        // ugh, fail.
    }
}

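NativeCrashListener reads the pid, the signal number, and the dump text sent by crash_dump (debuggerd before Android 8.0), then hands them to a NativeCrashReporter thread, which reports the crash to AMS.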
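
The receiving side of the protocol can be sketched in the same way: decode two big-endian integers from the 8-byte header, then accumulate chunks until one ends with the zero byte. The listing does not show the body of `unpackInt()`, so `UnpackInt` below is an assumption about what it must do, and `ReportCollector` is a hypothetical class that mirrors the `buf[bytes-1] == 0` check in the read loop. It is written in C++ rather than Java to match the other sketches.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>

// Reads a 32-bit big-endian integer starting at buf[offset].
// Assumed equivalent of the Java unpackInt() helper (body not shown above).
inline int32_t UnpackInt(const uint8_t* buf, size_t offset) {
  assert(buf != nullptr);
  uint32_t v = (uint32_t{buf[offset]} << 24) | (uint32_t{buf[offset + 1]} << 16) |
               (uint32_t{buf[offset + 2]} << 8) | uint32_t{buf[offset + 3]};
  return static_cast<int32_t>(v);
}

// Hypothetical accumulator mirroring the do/while read loop: a chunk whose
// last byte is zero carries the end-of-data marker, which is dropped.
class ReportCollector {
 public:
  // Feeds one chunk; returns true once the end-of-data byte has been seen.
  bool Feed(const uint8_t* chunk, size_t n) {
    if (done_ || n == 0) return done_;
    if (chunk[n - 1] == 0) {
      text_.append(reinterpret_cast<const char*>(chunk), n - 1);
      done_ = true;
    } else {
      text_.append(reinterpret_cast<const char*>(chunk), n);
    }
    return done_;
  }

  const std::string& text() const { return text_; }

 private:
  std::string text_;
  bool done_ = false;
};
```

Like the Java loop, this only inspects the last byte of each chunk, so it relies on the sender writing the zero byte as the final byte of the stream.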

4.2.2 NativeCrashReporter

NativeCrashReporter is also a thread. Its run() method:

@Override
    public void run() {
        try {
            CrashInfo ci = new CrashInfo();
            ci.exceptionClassName = "Native crash";
            ci.exceptionMessage = Os.strsignal(mSignal);
            ci.throwFileName = "unknown";
            ci.throwClassName = "unknown";
            ci.throwMethodName = "unknown";
            ci.stackTrace = mCrashReport;

            if (DEBUG) Slog.v(TAG, "Calling handleApplicationCrash()");
            // Rejoins the Java crash handling path, with crashType = "native_crash"
            mAm.handleApplicationCrashInner("native_crash", mApp, mApp.processName, ci);
            if (DEBUG) Slog.v(TAG, "<-- handleApplicationCrash() returned");
        } catch (Exception e) {
            Slog.e(TAG, "Unable to report native crash", e);
        }
    }

From here on, handling follows the same path as a Java crash. See the Java crash handling flow for details.

5. Anatomy of a tombstone file

** *** *** *** *** *** *** *** *** *** *** *** *** *** *** ***
// Output of dump_header_info()
dump_open_files_list
Build fingerprint: 'Android/sdk_phone_x86/generic_x86:11/RSR1.210210.001.A1/7193139:userdebug/dev-keys'
Revision: '0'
ABI: 'x86'
// Output of dump_timestamp()
Timestamp: 2022-08-21 10:22:58+0800

// Output of dump_thread_info()
pid: 6623, tid: 6623, name: .xxx.demo  >>> com.xxx.demo <<<
uid: 10122

// Output of dump_signal_info()
signal 11 (SIGSEGV), code 1 (SEGV_MAPERR), fault addr 0x0

// Output of dump_probable_cause()
Cause: null pointer dereference

  // dump_registers(log, thread_info.registers.get());
    eax 00000000  ebx bd5b4e88  ecx e3216818  edx 00000001
    edi bd5a966e  esi ffedff70
    ebp ffedffe8  esp ffedff70  eip bd589ec1
    
// Crash backtrace, written by log_backtrace(log, unwinder, "    ");
backtrace:
      #00 pc 00008ec1  /data/app/~~K6Bv_4byGWgutG6D4pB9_w==/com.xxx.demo-6IjoxXgs45RHBjfym04DyA==/lib/x86/libinsight.so (Java_com_zygote_insight_SecondActivity_tellJNI+257) (BuildId: 5def31cedbf8b046384064f99845240cf16b4a31)
      #01 pc 001422b2  /apex/com.android.art/lib/libart.so (art_quick_generic_jni_trampoline+82) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      #02 pc 0013baa2  /apex/com.android.art/lib/libart.so (art_quick_invoke_stub+338) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      #03 pc 001d0501  /apex/com.android.art/lib/libart.so (art::ArtMethod::Invoke(art::Thread*, unsigned int*, unsigned int, art::JValue*, char const*)+241) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      #04 pc 00386881  /apex/com.android.art/lib/libart.so (art::interpreter::ArtInterpreterToCompiledCodeBridge(art::Thread*, art::ArtMethod*, art::ShadowFrame*, unsigned short, art::JValue*)+385) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      ...
   
     // dump_memory_and_code(log, maps, unwinder->GetProcessMemory().get(),
   memory near ebp ([stack]):
    ffedffc8 00ee0048 c3b1266d ffee01d0 0000000c  H...m&..........
    ffedffd8 ffee0010 c3b1266d ffee01d0 0000000c  ....m&..........
    ffedffe8 ffee0010 ec94c2b3 f2781b90 ffee0004  ..........x.....
    ffedfff8 ffee0008 ffee0844 00000002 13495010  ....D........PI.
    ffee0008 13561a10 00000006 d3ed0c74 00000000  ..V.....t.......
    ffee0018 ffee0528 00000908 00000000 00000000  (...............
    ffee0028 00000000 00000000 00000000 13495010  .............PI.
    ffee0038 13561a10 0000000c ffee005c c3b1266d  ..V........m&..
    ffee0048 ffee01d0 ec945aa3 00000000 13495010  .....Z.......PI.
    ffee0058 13561a10 c3b1266a e4b02a10 ecfd21dc  ..V.j&...*...!..
    ffee0068 ffee00f8 ec9da502 d3ed0c74 ffee01c8  ........t.......
    ffee0078 00000008 e4b02a10 ffee0418 c3b1266a  .....*......j&..
    ffee0088 00000067 ec9da42a 00000000 f2740c10  g...*.........t.
    ffee0098 00000001 e4b02abc 000006f0 ecfd21dc  .....*.......!..
    ffee00a8 ffee00e8 ecb76a5b ec8acddc ec8a1f4f  ....[j......O...
    ffee00b8 13561a10 f2802c98 13561a10 00000000  ..V..,....V.....   
    
    ...
    
    // dump_all_maps() 
    memory map (2131 entries):
    12c00000-2abfffff rw-         0  18000000  [anon:dalvik-main space (region space)]
    6383c000-6383efff r--         0      3000  /system/bin/app_process32 (BuildId: c5eedbfb6130af84c3db8e121fb1202e) (load bias 0x1000)
    6383f000-63842fff r-x      2000      4000  /system/bin/app_process32 (BuildId: c5eedbfb6130af84c3db8e121fb1202e) (load bias 0x1000)
    63843000-63843fff r--      5000      1000  /system/bin/app_process32 (BuildId: c5eedbfb6130af84c3db8e121fb120) (load bias 0x1000)
    63844000-63844fff rw-      5000      1000  /system/bin/app_process32 (BuildId: c5eedbfb6130af84c3db8e121fb120) (load bias 0x1000)
    63845000-63845fff rw-         0      1000  [anon:.bss]
    70ed1000-710adfff rw-         0    1dd000  [anon:dalvik-/apex/com.android.art/javalib/boot.art]
    710ae000-710f0fff rw-         0     43000  [anon:dalvik-/apex/com.android.art/javalib/boot-core-libart.art]
    710f1000-7118efff rw-         0     9e000  [anon:dalvik-/apex/com.android.art/javalib/boot-core-icu4j.art]
    7118f000-711b7fff rw-         0     29000  [anon:dalvik-/apex/com.android.art/javalib/boot-okhttp.art]
    711b8000-711edfff rw-         0     36000  [anon:dalvik-/apex/com.android.art/javalib/boot-bouncycastle.art]
    711ee000-711f8fff rw-         0      b000  [anon:dalvik-/apex/com.android.art/javalib/boot-apache-xml.art]
    711f9000-7127afff r--         0     82000  /apex/com.android.art/javalib/x86/boot.oat (BuildId: 
    ...
    
    // dump_logs() 
    --------- tail end of log main
08-21 10:22:50.483  6623  6660 I insight : onAvailable(P:6623)(T:457)(C:NetworkModule)at (NetworkModule.java:68)
08-21 10:22:50.485  6623  6660 I insight : sFirstCheck: false, networkAvailable: true(P:6623)(T:457)(C:NetworkModule)at (NetworkModule.java:42)

08-21 10:22:50.486  6623  6660 D insight : No subscribers registered for event class com.tcloud.core.module.NetworkModule$OnNetworkChange(P:6623)(T:457)(C:CoreEventBus)at (CoreUtils.java:145)
08-21 10:22:50.487  6623  6660 D insight : No subscribers registered for event class org.greenrobot.eventbus.NoSubscriberEvent(P:6623)(T:457)(C:CoreEventBus)at (CoreUtils.java:145)

08-21 10:22:50.522  6623  6660 I insight : LeakCanary.install (P:6623)(T:2)(C:LeakHelper)at (LeakHelper.java:20)
08-21 10:22:50.600  6623  6705 D libEGL  : loaded /vendor/lib/egl/libEGL_emulation.so
08-21 10:22:50.607  6623  6705 D libEGL  : loaded /vendor/lib/egl/libGLESv1_CM_emulation.so
08-21 10:22:50.615  6623  6705 D libEGL  : loaded /vendor/lib/egl/libGLESv2_emulation.so
08-21 10:22:50.693  6623  6660 I insight : created: MainActivity(P:6623)(T:2)(C:AppLifeCycleHelper)at (AppLifeCycleHelper.java:37)
    ...
    
    // Dump of all other threads
    --- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
pid: 6623, tid: 6644, name: Signal Catcher  >>> com.demo.xxx <<<
uid: 10122
    eax fffffffc  ebx d9e5f070  ecx 00000000  edx 00000000
    edi 80000204  esi 00000008
    ebp d9e5f098  esp d9e5f038  eip f2ee0b99

backtrace:
      #00 pc 00000b99  [vdso] (__kernel_vsyscall+9)
      #01 pc 000ce821  /apex/com.android.runtime/lib/bionic/libc.so (__rt_sigtimedwait+33) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
      #02 pc 00086c55  /apex/com.android.runtime/lib/bionic/libc.so (sigwait+69) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
      #03 pc 00667d7d  /apex/com.android.art/lib/libart.so (art::SignalCatcher::WaitForSignal(art::Thread*, art::SignalSet&)+461) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      #04 pc 0066675f  /apex/com.android.art/lib/libart.so (art::SignalCatcher::Run(void*)+479) (BuildId: bf39832c4acabbc939d5c516b6f1d211)
      #05 pc 000e6974  /apex/com.android.runtime/lib/bionic/libc.so (__pthread_start(void*)+100) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
      #06 pc 00078567  /apex/com.android.runtime/lib/bionic/libc.so (__start_thread+71) (BuildId: 6e3a0180fa6637b68c0d181c343e6806)
--- --- --- --- --- --- --- --- --- --- --- --- --- --- --- ---
...
    
      // dump fds
      open files:
    fd 0: /dev/null (unowned)
    fd 1: /dev/null (unowned)
    fd 2: /dev/null (unowned)
    fd 3: socket:[154324] (unowned)
    fd 4: /sys/kernel/tracing/trace_marker (unowned)
    fd 5: /dev/binderfs/binder (unowned)
    fd 6: /apex/com.android.art/javalib/core-oj.jar (unowned)
    fd 7: /apex/com.android.art/javalib/core-libart.jar (unowned)
    fd 8: /apex/com.android.art/javalib/core-icu4j.jar (unowned)
    fd 9: /apex/com.android.art/javalib/okhttp.jar (unowned)
    fd 10: /apex/com.android.art/javalib/bouncycastle.jar (unowned)
    fd 11: /apex/com.android.art/javalib/apache-xml.jar (unowned)
    fd 12: /system/framework/framework.jar (unowned)
    fd 13: /system/framework/ext.jar (unowned)
    fd 14: /system/framework/telephony-common.jar (unowned)
    fd 15: /system/framework/voip-common.jar (unowned)
    fd 16: /system/framework/ims-common.jar (unowned)
    fd 17: /system/framework/framework-atb-backward-compatibility.jar (unowned)
    fd 18: /apex/com.android.conscrypt/javalib/conscrypt.jar (unowned)
    fd 19: /apex/com.android.media/javalib/updatable-media.jar (unowned)
    fd 20: /apex/com.android.mediaprovider/javalib/framework-mediaprovider.jar (unowned)
    fd 21: /apex/com.android.os.statsd/javalib/framework-statsd.jar (unowned)
    fd 22: /dev/null (unowned)
    fd 23: /dev/null (unowned)
    fd 24: /apex/com.android.permission/javalib/framework-permission.jar (unowned)
    fd 25: /apex/com.android.sdkext/javalib/framework-sdkextensions.jar (unowned)
    fd 26: /apex/com.android.wifi/javalib/framework-wifi.jar (unowned)
    fd 27: /apex/com.android.tethering/javalib/framework-tethering.jar (unowned)
    fd 28: /dev/null (unowned)
    fd 29: /system/framework/framework-res.apk (owned by ZipArchive 0xf2adc080)
    fd 30: /product/overlay/GoogleWebViewOverlay.apk (owned by ZipArchive 0xf2addb10)
    fd 31: /product/overlay/framework-res__auto_generated_rro_product.apk (owned by ZipArchive 0xf2ade3d0)
    fd 32: /product/overlay/EmulatorGmsConfigOverlay.apk (owned by ZipArchive 0xf2ade880)
    fd 33: /dev/null (unowned)
    fd 34: /dev/null (unowned)
    fd 35: /dev/null (unowned)
    fd 36: /dev/null (unowned)
    fd 37: pipe:[154328] (unowned)
    fd 38: pipe:[154328] (unowned)
    fd 39: anon_inode:[eventfd] (owned by unique_fd 0xe46f92dc)
    fd 40: anon_inode:[eventfd] (owned by unique_fd 0xe46f92f4)
    fd 41: socket:[154329] (owned by unique_fd 0xe46f92e8)
    fd 42: socket:[154330] (owned by unique_fd 0xe46f92ec)
    fd 43: socket:[156147] (owned by unique_fd 0xf2b70510)
    fd 44: socket:[155356] (owned by unique_fd 0xe46f92f0)
    fd 45: anon_inode:[eventfd] (owned by unique_fd 0xf24bc09c)
    fd 46: anon_inode:[eventpoll] (owned by unique_fd 0xf24bc0bc)
    fd 47: /data/app/~~K6Bv_4byGWgutG6D4pB9_w==/com.zygote.insight-6IjoxXgs45RHBjfym04DyA==/base.apk (owned by ZipArchive 0xf2acba00)
    ... 
    
    // main log buffer
    -------- log main
08-21 10:22:47.452  6623  6623 I .zygote.insigh: Late-enabling -Xcheck:jni
08-21 10:22:47.646  6623  6623 I .zygote.insigh: Unquickening 12 vdex files!
08-21 10:22:47.659  6623  6623 W .zygote.insigh: Unexpected CPU variant for X86 using defaults: x86
08-21 10:22:48.007  6623  6623 D ApplicationLoaders: Returning zygote-cached class loader: /system/framework/android.test.base.jar
08-21 10:22:48.013  6623  6623 I .zygote.insigh: The ClassLoaderContext is a special shared library.
08-21 10:22:48.526  6623  6623 D NetworkSecurityConfig: No Network Security Config specified, using platform default
08-21 10:22:48.529  6

...
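
When reading a tombstone, the most useful numbers are the frame lines in the backtrace. In the sample above, frame #00 reports `pc 00008ec1` while the register dump shows `eip bd589ec1`. Subtracting the two gives `0xbd581000`, which is presumably where libinsight.so was mapped, assuming a load bias of zero. The sketch below parses one frame line and performs that subtraction. `Frame`, `ParseFrameLine` and `LoadBase` are hypothetical names, and the parser only handles the `#NN pc HEX path ...` shape seen in this sample.

```cpp
#include <cassert>
#include <cstdint>
#include <sstream>
#include <string>

// Hypothetical representation of one backtrace frame.
struct Frame {
  int index = -1;
  uint64_t rel_pc = 0;  // pc as printed in the frame line
  std::string module;   // path of the mapped file, or e.g. "[vdso]"
};

// Parses a line shaped like "#00 pc 00008ec1  /path/lib.so (Func+257) ...".
// Leading whitespace is ignored; anything after the module path is skipped.
inline bool ParseFrameLine(const std::string& line, Frame* out) {
  assert(out != nullptr);
  std::istringstream in(line);
  std::string index_token, pc_word, pc_hex, module;
  if (!(in >> index_token >> pc_word >> pc_hex >> module)) return false;
  if (index_token.size() < 2 || index_token[0] != '#' || pc_word != "pc") {
    return false;
  }
  try {
    out->index = std::stoi(index_token.substr(1));
    out->rel_pc = std::stoull(pc_hex, nullptr, 16);
  } catch (...) {
    return false;  // non-numeric frame index or pc
  }
  out->module = module;
  return true;
}

// Estimates the mapping base from an absolute pc (register dump) and the
// relative pc (frame line). Assumes the load bias is zero.
inline uint64_t LoadBase(uint64_t abs_pc, uint64_t rel_pc) {
  return abs_pc - rel_pc;
}
```

The relative pc (`0x8ec1` here) is the value to give a symbolizer such as `addr2line -f -e` together with the unstripped libinsight.so in order to recover the source line.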

6. References

gityuan.com/2016/06/25/…

gaozhipeng.me/posts/stabi…