Can a corrupted core file still be salvaged? Yes, my friend, it can.

Preface

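The question I have been asked most often lately is why a core cannot be used in gdb or core-parser: neither gdb's info shared nor core-parser's map prints the shared libraries the process depends on, so the symbol files cannot be loaded. Other cases include a core that was truncated because capturing it timed out, or a file cut short when adb pull disconnected unexpectedly. I happen to have a few real examples on hand, so here is a consolidated answer.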

Rescuing a truncated core file

To check whether a core file has been truncated, run env core --load in core-parser and look for load segments flagged as truncated.

core-parser> env core --load | grep TRUNCATE
  629   [6ef5074000, 6ef659a000)  r--  0001526000  /data/app/~~7wq_KV-ZlFrJh2-Fc9c-QQ==/com.xxxx.android.xxxx-EeMQb9vwYZ9yi7tgADMFHw==/base.apk [EMPTY](TRUNCATE)
  630   [6ef659a000, 6ef6ec4000)  ---  0000000000  [] [EMPTY](TRUNCATE)
  631   [6ef6ec4000, 6ef6ec8000)  rw-  0000000000  [] [EMPTY](TRUNCATE)
  632   [6ef6ec8000, 6ef759a000)  ---  0000000000  [] [EMPTY](TRUNCATE)
  633   [6ef7642000, 6ef7a00000)  r--  00003be000  /data/app/~~G_8FGB4GB3s0i46o2c29dA==/com.xx.android.xxxx-FH2XnSmYVeSMlhUU5bxZug==/base.apk [EMPTY](TRUNCATE)
  634   [6ef7a00000, 6ef7c7e000)  rw-  0000000000  /dmabuf:screenshot [EMPTY](TRUNCATE)
...

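This core file, for example, has 9101 load segments in total, but truncation starts at segment 629, which means 8000+ load segments are unusable. Naturally, neither gdb nor core-parser can unwind a stack from it directly.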

core-parser> env core
  * r_debug: 0x7ba1dcbbf8
  * arm mode: thumb
  * mNote: 1
  * mLoad: 9101
  * mQuickLoad: 5947
  * mLinkMap: 0
core-parser> 
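
The same check can be approximated outside core-parser. The following is a minimal sketch, assuming a little-endian ELF64 core with fewer than 65535 program headers: a PT_LOAD segment is considered truncated when p_offset + p_filesz lies beyond the size of the file on disk. The function name truncated_loads is hypothetical, and this is not how core-parser itself is implemented.

```python
import struct

PT_LOAD = 1

def truncated_loads(header_bytes, file_size):
    """List PT_LOAD segments whose file contents run past file_size.

    header_bytes must hold the ELF64 header and the whole program header
    table. Returns (load_number, p_vaddr, missing_bytes) tuples, where
    load_number counts PT_LOAD entries starting from 1.
    """
    if header_bytes[:4] != b"\x7fELF" or header_bytes[4] != 2 or header_bytes[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    # ELF64 header: e_phoff at 0x20, e_phentsize at 0x36, e_phnum at 0x38.
    (e_phoff,) = struct.unpack_from("<Q", header_bytes, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", header_bytes, 0x36)
    found = []
    load_number = 0
    for i in range(e_phnum):
        entry = e_phoff + i * e_phentsize
        # Elf64_Phdr starts with p_type, p_flags, p_offset, p_vaddr, p_paddr, p_filesz.
        p_type, _flags, p_offset, p_vaddr, _paddr, p_filesz = struct.unpack_from(
            "<IIQQQQ", header_bytes, entry
        )
        if p_type != PT_LOAD:
            continue
        load_number += 1
        end = p_offset + p_filesz
        if p_filesz and end > file_size:
            found.append((load_number, p_vaddr, min(p_filesz, end - file_size)))
    return found
```

In practice only the first few hundred kilobytes of the core need to be read to cover the program header table, and file_size can come from os.path.getsize.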

In gdb, the same core file looks like this.

Program terminated with signal SIGABRT, Aborted.
#0  0x0000007b76f2ebe0 in ?? ()
[Current thread is 1 (LWP 2682)]

(gdb) info sharedlibrary 
No shared libraries loaded at this time.

(gdb) info auxv 
33   AT_SYSINFO_EHDR      System-supplied DSO's ELF header 0x7ba1dbe000
51   AT_MINSIGSTKSZ       Minimum stack size for signal delivery 0x1270
16   AT_HWCAP             Machine-dependent CPU capability hints 0xffffffff
6    AT_PAGESZ            System page size               4096
17   AT_CLKTCK            Frequency of times()           100
3    AT_PHDR              Program headers for program    0x5b8b3f0040
4    AT_PHENT             Size of program header entry   56
5    AT_PHNUM             Number of program headers      12
7    AT_BASE              Base address of interpreter    0x7ba1c5b000
8    AT_FLAGS             Flags                          0x0
9    AT_ENTRY             Entry point of program         0x5b8b3f4000
11   AT_UID               Real user ID                   0
12   AT_EUID              Effective user ID              0
13   AT_GID               Real group ID                  0
14   AT_EGID              Effective group ID             0
23   AT_SECURE            Boolean, was exec setuid-like? 1
25   AT_RANDOM            Address of 16 random bytes     0x7fe2b52ac8
26   AT_HWCAP2            Extension of AT_HWCAP          0x801af3ff
31   AT_EXECFN            File name of executable        0x7fe2b54fde <error: Cannot access memory at address 0x7fe2b54fde>
15   AT_PLATFORM          String identifying platform    0x7fe2b52ad8 <error: Cannot access memory at address 0x7fe2b52ad8>
0    AT_NULL              End of vector                  0x0
(gdb)

General method

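A few months ago I finished the Fakecore feature in core-parser. Many people probably assume it exists only for converting tombstone files. It does not: Fakecore was designed for repairing core files.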

Rebuilding the link_map

The fake family of commands can rebuild a basic map table, for example with fake map --auto.

core-parser> fake map --auto
New overlay [100000, 110000)
Create FAKE PHDR
New note overlay [7c750, 1a7338)
Create FAKE DYNAMIC
Create FAKE LINK64 MAP
0x7129489000 /apex/com.android.adbd/lib64/libadb_pairing_auth.so
0x712c400000 /apex/com.android.adbd/lib64/libadb_pairing_connection.so
0x712c551000 /apex/com.android.adbd/lib64/libadb_pairing_server.so
0x7b50649000 /apex/com.android.adbd/lib64/libadbconnection_client.so
0x7b06667000 /apex/com.android.adbd/lib64/libbase.so
0x713067b000 /apex/com.android.adbd/lib64/libc++.so
0x7132e60000 /apex/com.android.adbd/lib64/libcrypto.so
0x7b1f1ec000 /apex/com.android.adbd/lib64/libcrypto_utils.so
0x7b55f57000 /apex/com.android.adbd/lib64/libcutils.so
0x70e1084000 /apex/com.android.appsearch/lib64/libc++.so
0x6f9a653000 /apex/com.android.appsearch/lib64/libicing.so
0x70e1200000 /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
0x7b506d0000 /apex/com.android.art/lib64/libadbconnection.so
0x7b45e1a000 /apex/com.android.art/lib64/libandroidio.so
0x7b5ac00000 /apex/com.android.art/lib64/libart.so
...

Running map again now lists the libraries, but this shared-library table has not been calibrated yet. A Fakecore saved at this point cannot be used directly in gdb.

core-parser> map
NUM LINKMAP       REGION                   FLAGS  L_ADDR         NAME
  1 0x102000  [100000, 110000)  rw-   100000  [FAKECORE] [*](OVERLAY)(FAKE)
  2 0x102030  [    ???   ,    ???    )  ---   7129489000  /apex/com.android.adbd/lib64/libadb_pairing_auth.so
  3 0x102060  [    ???   ,    ???    )  ---   712c400000  /apex/com.android.adbd/lib64/libadb_pairing_connection.so
  4 0x102090  [    ???   ,    ???    )  ---   712c551000  /apex/com.android.adbd/lib64/libadb_pairing_server.so
  5 0x1020c0  [    ???   ,    ???    )  ---   7b50649000  /apex/com.android.adbd/lib64/libadbconnection_client.so
  6 0x1020f0  [    ???   ,    ???    )  ---   7b06667000  /apex/com.android.adbd/lib64/libbase.so
  7 0x102120  [    ???   ,    ???    )  ---   713067b000  /apex/com.android.adbd/lib64/libc++.so
  8 0x102150  [    ???   ,    ???    )  ---   7132e60000  /apex/com.android.adbd/lib64/libcrypto.so
  9 0x102180  [    ???   ,    ???    )  ---   7b1f1ec000  /apex/com.android.adbd/lib64/libcrypto_utils.so
 10 0x1021b0  [    ???   ,    ???    )  ---   7b55f57000  /apex/com.android.adbd/lib64/libcutils.so
 11 0x1021e0  [    ???   ,    ???    )  ---   70e1084000  /apex/com.android.appsearch/lib64/libc++.so
 12 0x102210  [    ???   ,    ???    )  ---   6f9a653000  /apex/com.android.appsearch/lib64/libicing.so
 13 0x102240  [    ???   ,    ???    )  ---   70e1200000  /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
 14 0x102270  [    ???   ,    ???    )  ---   7b506d0000  /apex/com.android.art/lib64/libadbconnection.so
 15 0x1022a0  [    ???   ,    ???    )  ---   7b45e1a000  /apex/com.android.art/lib64/libandroidio.so
 16 0x1022d0  [    ???   ,    ???    )  ---   7b5ac00000  /apex/com.android.art/lib64/libart.so
 ...
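
To make the rebuilt table concrete, here is a minimal sketch of a link_map chain in the shape debuggers expect: the five public fields of struct link_map from link.h (l_addr, l_name, l_ld, l_next, l_prev), each 8 bytes on a 64-bit target. The helpers build_chain and walk_chain are hypothetical names, and the layout is an illustration only. In the transcript above the entries sit 0x30 apart, so core-parser's real overlay is laid out differently from this 40-byte packing. l_ld is left at 0 to mirror the uncalibrated state.

```python
import struct

# l_addr, l_name, l_ld, l_next, l_prev (64-bit, little-endian)
LINK_MAP = struct.Struct("<QQQQQ")

def build_chain(base, libs):
    """Pack (l_addr, name) pairs into link_map entries followed by their names.

    The blob is meant to live at virtual address `base`; l_ld stays 0
    because it cannot be known before calibration.
    """
    entries_size = LINK_MAP.size * len(libs)
    names = b""
    name_addrs = []
    for _, name in libs:
        name_addrs.append(base + entries_size + len(names))
        names += name.encode() + b"\0"
    out = bytearray()
    for i, (l_addr, _) in enumerate(libs):
        here = base + i * LINK_MAP.size
        l_next = here + LINK_MAP.size if i + 1 < len(libs) else 0
        l_prev = here - LINK_MAP.size if i > 0 else 0
        out += LINK_MAP.pack(l_addr, name_addrs[i], 0, l_next, l_prev)
    return bytes(out) + names

def walk_chain(base, blob):
    """Follow l_next from the first entry, the way a debugger lists libraries."""
    result = []
    addr = base if blob else 0
    while addr:
        l_addr, l_name, l_ld, l_next, _prev = LINK_MAP.unpack_from(blob, addr - base)
        end = blob.index(b"\0", l_name - base)
        result.append((l_addr, blob[l_name - base:end].decode(), l_ld))
        addr = l_next
    return result
```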

Calibration

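The value of l_ld cannot be reconstructed from the core alone, and l_addr may differ from the real load address, so the original shared-library files are needed for calibration. Use fake map --sysroot with the directories that hold them, for example fake map --sysroot symbols/. Each calibrated library is reported with the word calibrate.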

core-parser> fake map --sysroot ./root:./apex
Mmap segment [7129489000, 712949d000) ./apex/com.android.adbd/lib64/libadb_pairing_auth.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_auth.so l_ld(71294e6dc8)
Mmap segment [712c400000, 712c424000) ./apex/com.android.adbd/lib64/libadb_pairing_connection.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_connection.so l_ld(712c4b6f88)
Mmap segment [712c551000, 712c569000) ./apex/com.android.adbd/lib64/libadb_pairing_server.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_server.so l_ld(712c5bb388)
Mmap segment [7b50649000, 7b50661000) ./apex/com.android.adbd/lib64/libadbconnection_client.so [0]
calibrate /apex/com.android.adbd/lib64/libadbconnection_client.so l_ld(7b506b6fe0)
Mmap segment [7b06667000, 7b0667b000) ./apex/com.android.adbd/lib64/libbase.so [0]
calibrate /apex/com.android.adbd/lib64/libbase.so l_ld(7b066b3258)
Mmap segment [713067b000, 713070b000) ./apex/com.android.adbd/lib64/libc++.so [0]
calibrate /apex/com.android.adbd/lib64/libc++.so l_ld(71307d8670)
Mmap segment [7132e60000, 7132ed0000) ./apex/com.android.adbd/lib64/libcrypto.so [0]
calibrate /apex/com.android.adbd/lib64/libcrypto.so l_ld(7132fdb0e0)
Mmap segment [7b1f1ec000, 7b1f1f0000) ./apex/com.android.adbd/lib64/libcrypto_utils.so [0]
calibrate /apex/com.android.adbd/lib64/libcrypto_utils.so l_ld(7b1f1f4020)
Mmap segment [7b55f57000, 7b55f63000) ./apex/com.android.adbd/lib64/libcutils.so [0]
calibrate /apex/com.android.adbd/lib64/libcutils.so l_ld(7b55f74208)
Mmap segment [70e1084000, 70e1114000) ./apex/com.android.appsearch/lib64/libc++.so [0]
calibrate /apex/com.android.appsearch/lib64/libc++.so l_ld(70e11e1670)
Mmap segment [6f9a653000, 6f9a6ab000) ./apex/com.android.appsearch/lib64/libicing.so [0]
calibrate /apex/com.android.appsearch/lib64/libicing.so l_ld(6f9a8d0ae0)
Mmap segment [70e1200000, 70e1248000) ./apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so [0]
calibrate /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so l_ld(70e12b8988)
Mmap segment [7b506d0000, 7b506dc000) ./apex/com.android.art/lib64/libadbconnection.so [0]
calibrate /apex/com.android.art/lib64/libadbconnection.so l_ld(7b506f4108)
Mmap segment [7b45e1a000, 7b45e1e000) ./apex/com.android.art/lib64/libandroidio.so [0]
calibrate /apex/com.android.art/lib64/libandroidio.so l_ld(7b45e22028)
Mmap segment [7b5ac00000, 7b5ae09000) ./apex/com.android.art/lib64/libart.so [0]
calibrate /apex/com.android.art/lib64/libart.so l_ld(7b5be27398)
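
Why the original library file is needed can be sketched as follows. In a loaded ELF object, l_ld is the runtime address of the dynamic section, which is the load bias plus the p_vaddr of the PT_DYNAMIC program header, and that p_vaddr only exists in the library file. The function expected_l_ld below is a hypothetical helper that assumes a little-endian ELF64 file and 4 KiB pages (the PageSize shown by core-parser). It illustrates the arithmetic and is not core-parser's actual code.

```python
import struct

PT_LOAD, PT_DYNAMIC = 1, 2
PAGE_MASK = ~0xFFF  # assumes 4 KiB pages

def iter_phdrs(elf):
    """Yield (p_type, p_vaddr) for every program header of an ELF64 file."""
    if elf[:4] != b"\x7fELF" or elf[4] != 2 or elf[5] != 1:
        raise ValueError("expected a little-endian ELF64 image")
    (e_phoff,) = struct.unpack_from("<Q", elf, 0x20)
    e_phentsize, e_phnum = struct.unpack_from("<HH", elf, 0x36)
    for i in range(e_phnum):
        p_type, _flags, _offset, p_vaddr = struct.unpack_from(
            "<IIQQ", elf, e_phoff + i * e_phentsize
        )
        yield p_type, p_vaddr

def expected_l_ld(elf, load_start):
    """Return (load_bias, l_ld) for a library whose first page is at load_start."""
    loads = [v for t, v in iter_phdrs(elf) if t == PT_LOAD]
    dynamic = [v for t, v in iter_phdrs(elf) if t == PT_DYNAMIC]
    if not loads or not dynamic:
        raise ValueError("library has no PT_LOAD or no PT_DYNAMIC header")
    # The load bias is the mapped start minus the lowest page-aligned p_vaddr.
    load_bias = load_start - (min(loads) & PAGE_MASK)
    return load_bias, load_bias + dynamic[0]
```

As a consistency check against the transcript: if libadb_pairing_auth.so has its lowest PT_LOAD at vaddr 0, the reported l_ld of 71294e6dc8 with a base of 7129489000 implies a PT_DYNAMIC p_vaddr of 0x5ddc8. That value is inferred by subtraction here, not read from the real file.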

Rebuilding the Fakecore

In most cases, once the two steps above are done, the current environment can be saved as a new core file with fake core -r.

core-parser> fake core -r
FakeCore: saved [coredump/core-HeapTaskDaemon-2653.fakecore]

Opening the saved Fakecore in gdb, the shared libraries are now listed:

(gdb) info sharedlibrary 
From                To                  Syms Read   Shared Object Library
                                        No          /apex/com.android.adbd/lib64/libadb_pairing_auth.so
                                        No          /apex/com.android.adbd/lib64/libadb_pairing_connection.so
                                        No          /apex/com.android.adbd/lib64/libadb_pairing_server.so
                                        No          /apex/com.android.adbd/lib64/libadbconnection_client.so
                                        No          /apex/com.android.adbd/lib64/libbase.so
                                        No          /apex/com.android.adbd/lib64/libc++.so
                                        No          /apex/com.android.adbd/lib64/libcrypto.so
                                        No          /apex/com.android.adbd/lib64/libcrypto_utils.so
                                        No          /apex/com.android.adbd/lib64/libcutils.so
                                        No          /apex/com.android.appsearch/lib64/libc++.so
                                        No          /apex/com.android.appsearch/lib64/libicing.so
                                        No          /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
                                        No          /apex/com.android.art/lib64/libadbconnection.so
                                        No          /apex/com.android.art/lib64/libandroidio.so
                                        No          /apex/com.android.art/lib64/libart.so

(gdb) set sysroot symbols/
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadbconnection_client.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libbase.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libc++.so...
...

(gdb) bt
#0  abort () at bionic/libc/bionic/abort.cpp:49
#1  0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

The same symbols directory can also be loaded in core-parser with sysroot:

core-parser> sysroot symbols
Mmap segment [7129489000, 712949d000) symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so [0]
Mmap segment [712949d000, 71294e5000) symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so [14000]
Read symbols[977] (/apex/com.android.adbd/lib64/libadb_pairing_auth.so)
Mmap segment [712c400000, 712c424000) symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so [0]
Mmap segment [712c424000, 712c4b4000) symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so [24000]
Read symbols[1645] (/apex/com.android.adbd/lib64/libadb_pairing_connection.so)
Mmap segment [712c551000, 712c569000) symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so [0]
Mmap segment [712c569000, 712c5b9000) symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so [18000]
Read symbols[1143] (/apex/com.android.adbd/lib64/libadb_pairing_server.so)
...

core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682") 
  x0  0x0000000000000000  x1  0x0000000000000a7a  x2  0x0000000000000006  x3  0x6f00007b0aa44670  
  x4  0x0000007b0a939000  x5  0x0000007b0a939000  x6  0x0000007b0a939000  x7  0x0000000000000001  
  x8  0x00000000000000f0  x9  0x0000000000000000  x10 0xffffff80fffffbdf  x11 0x0000007b76f4dfc4  
  x12 0x0200007200008390  x13 0x0000007b0aa44670  x14 0x0000000000000002  x15 0x02000072ffffffff  
  x16 0x0000007b76fd6240  x17 0x0000007b76fbadd0  x18 0x0000007b094f0000  x19 0xef00007b0aa44670  
  x20 0x2f00007b0aa44660  x21 0xaf00007b0aa44640  x22 0x6f00007b0aa44670  x23 0x0000000000000000  
  x24 0x0000000000000a7a  x25 0x0200007300000000  x26 0x0040000e400010ef  x27 0x0101010101010101  
  x28 0x00000007b0aa4467  fp  0x0000007b0aa44700  
  lr  0x0000007b76f2ebb4  sp  0x0000007b0aa44630  pc  0x0000007b76f2ebe0  pst 0x0000000000001000  
  Native: #0  0000007b76f2ebe0  abort+0x148

Other extended uses

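As shown, even after the repair the call stack still cannot be printed. The reason is that the segment holding this thread's stack was truncated as well, so the stack cannot be unwound.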

core-parser> rd 0x0000007b0aa44630
ERROR: Invalid address 0x7b0aa44630
core-parser> env core --load | grep 7b0aa4
  4888  [7b0a93e000, 7b0aa49000)  rw-  000010b000  [] [EMPTY](TRUNCATE)
  4889  [7b0aa49000, 7b0aa4d000)  ---  0000000000  [] [EMPTY](TRUNCATE)
  4890  [7b0aa4d000, 7b0ad60000)  ---  0000000000  [] [EMPTY](TRUNCATE)
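
The lookup done by hand above (grep for the segment that contains sp) can be scripted. This is a minimal sketch that parses lines in the format shown in the env core --load transcripts of this post. That format is assumed from those transcripts only and is not a documented, stable interface. The helper name find_segment is hypothetical. Addresses are masked with 0x7fffffffff first, the VabitsMask value core-parser prints for this target, so tagged pointers such as 0x6f00007b0aa44670 resolve as well.

```python
import re

# index  [start, end)  flags  filesz  name/status
LOAD_LINE = re.compile(
    r"^\s*(\d+)\s+\[([0-9a-f]+),\s*([0-9a-f]+)\)\s+(\S+)\s+([0-9a-f]+)\s+(.*)$"
)
VABITS_MASK = 0x7FFFFFFFFF  # taken from the "VabitsMask" line core-parser prints

def find_segment(lines, addr, mask=VABITS_MASK):
    """Return the load segment containing addr, or None if no line matches."""
    addr &= mask  # drop the pointer tag in the upper bits
    for line in lines:
        m = LOAD_LINE.match(line)
        if not m:
            continue  # skip "..." and other non-segment lines
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        if start <= addr < end:
            return {
                "index": int(m.group(1)),
                "start": start,
                "end": end,
                "flags": m.group(4),
                "truncated": "(TRUNCATE)" in m.group(6),
            }
    return None
```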

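This core file comes from a native crash in an Android app, so part of the current thread's stack memory can be recovered from the tombstone file. We can convert the tombstone into a fakecore, save the 7b0a93e000 segment from that fakecore, and then write it back into the original core file.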

Making the Fakecore

See the earlier post "Android 墓碑文件转 FakeCore 开源拉!" (Android tombstone to FakeCore conversion is now open source).

Tid: 2682
tagged_addr_ctrl 1
pac_enabled_keys f
x0  0x0000000000000000  x1  0x0000000000000a7a  x2  0x0000000000000006  x3  0x6f00007b0aa44670  
x4  0x0000007b0a939000  x5  0x0000007b0a939000  x6  0x0000007b0a939000  x7  0x0000000000000001  
x8  0x00000000000000f0  x9  0x0000000000000000  x10 0xffffff80fffffbdf  x11 0x0000007b76f4dfc4  
x12 0x0200007200008390  x13 0x0000007b0aa44670  x14 0x0000000000000002  x15 0x02000072ffffffff  
x16 0x0000007b76fd6240  x17 0x0000007b76fbadd0  x18 0x0000007b094f0000  x19 0xef00007b0aa44670  
x20 0x2f00007b0aa44660  x21 0xaf00007b0aa44640  x22 0x6f00007b0aa44670  x23 0x0000000000000000  
x24 0x0000000000000a7a  x25 0x0200007300000000  x26 0x0040000e400010ef  x27 0x0101010101010101  
x28 0x00000007b0aa4467  fp  0x0000007b0aa44700  
lr  0x0000007b76f2ebb4  sp  0x0000007b0aa44630  pc  0x0000007b76f2ebe0  pst 0x0000000000001000  
/apex/com.android.art/lib64/libart.so:4b8d363e411911cec9861f3127c06d28
/apex/com.android.art/lib64/libbase.so:99e384a650f746069995ae2825eb2eb9
/apex/com.android.runtime/lib64/bionic/hwasan/libc.so:d4a3d36d0a7d2f3a94f8a0160deec5b2
/system/framework/arm64/boot-core-libart.oat:a587436c5addf6ef9cee37ed3e3cdce4
/system/framework/arm64/boot.oat:0924577b1271219e6b02dfeec82c7dc2
Create Fakecore tombstones/tombstone_27.fakecore ...
Core load (0x7fca58002940) 
Core env:
  * Path: 
  * Machine: arm64
  * Bits: 64
  * PointSize: 8
  * PointMask: 0xffffffffffffffff
  * VabitsMask: 0x7fffffffff
  * PageSize: 0x1000
  * Remote: false
  * Thread: 2682
Switch android(0) env.
New overlay [100000, 104000)
Create FAKE PHDR
New note overlay [844a8, f5ae8)
Create FAKE DYNAMIC
Create FAKE LINK MAP
0x7b5ac00000 /apex/com.android.art/lib64/libart.so:4b8d363e411911cec9861f3127c06d28
0x7b5f4d4000 /apex/com.android.art/lib64/libbase.so:99e384a650f746069995ae2825eb2eb9
0x7b76ec7000 /apex/com.android.runtime/lib64/bionic/hwasan/libc.so:d4a3d36d0a7d2f3a94f8a0160deec5b2
0x71030000 /system/framework/arm64/boot-core-libart.oat:a587436c5addf6ef9cee37ed3e3cdce4
0x70d0c000 /system/framework/arm64/boot.oat:0924577b1271219e6b02dfeec82c7dc2
Create FAKE STRTAB
New overlay [7200000000, 72fffff000)
Core load (0x396f370) tombstones/tombstone_27.fakecore
Core env:
  * Path: tombstones/tombstone_27.fakecore
  * Machine: arm64
  * Bits: 64
  * PointSize: 8
  * PointMask: 0xffffffffffffffff
  * VabitsMask: 0x7fffffffff
  * PageSize: 0x1000
  * Remote: false
  * Thread: 2682
Switch android(0) env.

core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682") 
  x0  0x0000000000000000  x1  0x0000000000000a7a  x2  0x0000000000000006  x3  0x6f00007b0aa44670  
  x4  0x0000007b0a939000  x5  0x0000007b0a939000  x6  0x0000007b0a939000  x7  0x0000000000000001  
  x8  0x00000000000000f0  x9  0x0000000000000000  x10 0xffffff80fffffbdf  x11 0x0000007b76f4dfc4  
  x12 0x0200007200008390  x13 0x0000007b0aa44670  x14 0x0000000000000002  x15 0x02000072ffffffff  
  x16 0x0000007b76fd6240  x17 0x0000007b76fbadd0  x18 0x0000007b094f0000  x19 0xef00007b0aa44670  
  x20 0x2f00007b0aa44660  x21 0xaf00007b0aa44640  x22 0x6f00007b0aa44670  x23 0x0000000000000000  
  x24 0x0000000000000a7a  x25 0x0200007300000000  x26 0x0040000e400010ef  x27 0x0101010101010101  
  x28 0x00000007b0aa4467  fp  0x0000007b0aa44700  
  lr  0x0000007b76f2ebb4  sp  0x0000007b0aa44630  pc  0x0000007b76f2ebe0  pst 0x0000000000001000  
  Native: #00  0000007b76f2ebe0  
  Native: #01  0000007b76f2ebb0  
  Native: #02  0000007b5bb716d0  
  Native: #03  0000007b5f4f155c  
  Native: #04  0000007b5f4f0204  
  Native: #05  0000007b5b67c304  
  Native: #06  0000007b5b35c544  
  Native: #07  0000007b5b6503dc  
  Native: #08  0000007b5b53d154  
  Native: #09  0000007b5b4997f4  
  Native: #10  0000007b5b49918c  
  Native: #11  0000007b5b49de18  
  Native: #12  0000007b5b336330  
  Native: #13  0000000071045d00  
  Native: #14  6300007b0aa4549c  
core-parser>

Writing segment memory back

First, read the thread's stack segment from the fakecore converted from the tombstone.

core-parser> file 0x0000007b0aa44700
[7b0a93e000, 7b0aa49000)  0000000000000000  [anon:stack_and_tls:2682]
core-parser> rd 7b0a93e000 -e 7b0aa49000 -f 7b0a93e000.bin
Saved [7b0a93e000.bin].

Then map it back into the earlier core-parser environment.

core-parser> mmap 7b0a93e000 7b0a93e000.bin
Mmap segment [7b0a93e000, 7b0aa49000) 7b0a93e000.bin [0]

core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682") 
  x0  0x0000000000000000  x1  0x0000000000000a7a  x2  0x0000000000000006  x3  0x6f00007b0aa44670  
  x4  0x0000007b0a939000  x5  0x0000007b0a939000  x6  0x0000007b0a939000  x7  0x0000000000000001  
  x8  0x00000000000000f0  x9  0x0000000000000000  x10 0xffffff80fffffbdf  x11 0x0000007b76f4dfc4  
  x12 0x0200007200008390  x13 0x0000007b0aa44670  x14 0x0000000000000002  x15 0x02000072ffffffff  
  x16 0x0000007b76fd6240  x17 0x0000007b76fbadd0  x18 0x0000007b094f0000  x19 0xef00007b0aa44670  
  x20 0x2f00007b0aa44660  x21 0xaf00007b0aa44640  x22 0x6f00007b0aa44670  x23 0x0000000000000000  
  x24 0x0000000000000a7a  x25 0x0200007300000000  x26 0x0040000e400010ef  x27 0x0101010101010101  
  x28 0x00000007b0aa4467  fp  0x0000007b0aa44700  
  lr  0x0000007b76f2ebb4  sp  0x0000007b0aa44630  pc  0x0000007b76f2ebe0  pst 0x0000000000001000  
  Native: #00  0000007b76f2ebe0  abort+0x148
  Native: #01  0000007b76f2ebb0  abort+0x118
  Native: #02  0000007b5bb716d0  art::Runtime::Abort(char const*)+0x33c
  Native: #03  0000007b5f4f155c  android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+0xc4
  Native: #04  0000007b5f4f0204  android::base::LogMessage::~LogMessage()+0x378
  Native: #05  0000007b5b67c304  art::gc::collector::ConcurrentCopying::ProcessMarkStack()+0x1a8c
  Native: #06  0000007b5b35c544  art::gc::collector::ConcurrentCopying::CopyingPhase()+0x9f8
  Native: #07  0000007b5b6503dc  art::gc::collector::ConcurrentCopying::RunPhases()+0xcf4
  Native: #08  0000007b5b53d154  art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+0x224
  Native: #09  0000007b5b4997f4  art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool, unsigned int)+0x544
  Native: #10  0000007b5b49918c  art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool, unsigned int)+0xd0
  Native: #11  0000007b5b49de18  art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+0xa4
  Native: #12  0000007b5b336330  art::gc::TaskProcessor::RunAllTasks(art::Thread*)+0x408
  Native: #13  0000000071045d00  art_jni_trampoline+0x70
  Native: #14  6300007b0aa4549c
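
Why the backtrace stops at frame #0 without the stack segment and continues once it is mapped can be illustrated with frame-pointer unwinding. On AArch64, the frame record that fp points to holds the caller's fp followed by the saved lr, so every step of the walk has to read stack memory. The sketch below is a simplified model under that assumption. OverlayMemory and walk_frames are hypothetical names, and core-parser's real unwinder may work differently, for example by using unwind tables.

```python
import struct

class OverlayMemory:
    """Address space where later mappings take priority over earlier ones."""

    def __init__(self):
        self.segments = []  # list of (start_address, bytes)

    def map(self, start, data):
        self.segments.append((start, data))

    def read(self, addr, size):
        # Search newest mapping first so an overlay hides a truncated segment.
        for start, data in reversed(self.segments):
            if start <= addr and addr + size <= start + len(data):
                return data[addr - start:addr - start + size]
        return None  # address not backed by any data

def walk_frames(mem, fp, limit=64):
    """Collect saved return addresses by following the fp chain."""
    frames = []
    while fp and len(frames) < limit:
        record = mem.read(fp, 16)
        if record is None:
            break  # stack memory missing: unwinding cannot continue
        next_fp, lr = struct.unpack("<QQ", record)
        frames.append(lr)
        if next_fp <= fp:
            break  # frame records must move toward higher addresses
        fp = next_fp
    return frames
```

With an empty OverlayMemory the walk returns no frames, which mirrors the single-frame backtrace before the mmap. After the dumped stack is mapped at its original address, the chain can be followed.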

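The pure native stack can now be printed. This problem is related to the virtual machine, though, and sometimes parsing Java memory is unavoidable. We saw at the start that the first 600+ load segments of this core are intact, and they cover the entire Java heap, so a repair is worth attempting here too.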

Repairing VM-related memory segments

Before that, the SDK environment has to be set.

core-parser> env config --sdk 35
Switch android(35) env.

Use the dex command to check whether the relevant jar files are present in the original core file. They turn out to be truncated as well.

core-parser> dex
NUM DEXCACHE    REGION                   FLAGS NAME
  1 0x00000000  [7b8c81d000, 7b8c825000)  r--  /apex/com.android.rkpd/javalib/service-rkp.jar [EMPTY](TRUNCATE)
  2 0x00000000  [7b89cb7000, 7b89cbc000)  r--  /apex/com.android.virt/javalib/service-virtualization.jar [EMPTY](TRUNCATE)
  3 0x00000000  [7b8c8bb000, 7b8c8c0000)  r--  /apex/com.android.compos/javalib/service-compos.jar [EMPTY](TRUNCATE)
  4 0x00000000  [7b39cea000, 7b39de1000)  r--  /apex/com.android.uwb/javalib/service-uwb.jar [EMPTY](TRUNCATE)
  5 0x00000000  [7b65017000, 7b65080000)  r--  /apex/com.android.healthfitness/javalib/service-healthfitness.jar [EMPTY](TRUNCATE)
  6 0x00000000  [7b39fcc000, 7b3a043000)  r--  /apex/com.android.profiling/javalib/service-profiling.jar [EMPTY](TRUNCATE)
  7 0x00000000  [7b93c61000, 7b93c64000)  r--  /apex/com.android.ondevicepersonalization/javalib/service-ondevicepersonalization.jar [EMPTY](TRUNCATE)
  8 0x00000000  [7b8c929000, 7b8c93f000)  r--  /apex/com.android.adservices/javalib/service-adservices.jar [EMPTY](TRUNCATE)
  9 0x00000000  [7b89a7d000, 7b89a87000)  r--  /apex/com.android.scheduling/javalib/service-scheduling.jar [EMPTY](TRUNCATE)
 10 0x00000000  [7b89af0000, 7b89afa000)  r--  /apex/com.android.os.statsd/javalib/service-statsd.jar [EMPTY](TRUNCATE)
...

So we can run sysroot again, this time as sysroot ./apex:./root --dex. The --dex option means that only Java-related dependencies are loaded.

core-parser> sysroot ./apex:./root --dex
Mmap segment [7b8c81d000, 7b8c825000) ./apex/com.android.rkpd/javalib/service-rkp.jar [0]
Mmap segment [7b89cb7000, 7b89cbc000) ./apex/com.android.virt/javalib/service-virtualization.jar [0]
Mmap segment [7b39cea000, 7b39de1000) ./apex/com.android.uwb/javalib/service-uwb.jar [0]
Mmap segment [7b65017000, 7b65080000) ./apex/com.android.healthfitness/javalib/service-healthfitness.jar [0]
Mmap segment [7b39fcc000, 7b3a043000) ./apex/com.android.profiling/javalib/service-profiling.jar [0]
Mmap segment [7b93c61000, 7b93c64000) ./apex/com.android.ondevicepersonalization/javalib/service-ondevicepersonalization.jar [0]
Mmap segment [7b8c929000, 7b8c93f000) ./apex/com.android.adservices/javalib/service-adservices.jar [0]
Mmap segment [7b89a7d000, 7b89a87000) ./apex/com.android.scheduling/javalib/service-scheduling.jar [0]
Mmap segment [7b89af0000, 7b89afa000) ./apex/com.android.os.statsd/javalib/service-statsd.jar [0]
...

Running bt again now prints the Java frames. Some problems remain, because too many memory segments are missing for a complete repair.

core-parser> bt
"HeapTaskDaemon" sysTid=2682 WaitingPerformingGc
  | group="system" daemon=1 prio=5 target=0x6fab8c58 uncaught_exception=0x0
  | tid=6 sCount=0 flags=0 obj=0xa204e30 self=0x560000538b408000 env=0x8f0000428b413f40
  | stack=0x7b0a93e000-0x7b0a942000 stackSize=0x107730 handle=0x7b0aa45730
  | mutexes=0x560000538b408798 held="abort lock" 
  x0  0x0000000000000000  x1  0x0000000000000a7a  x2  0x0000000000000006  x3  0x6f00007b0aa44670  
  x4  0x0000007b0a939000  x5  0x0000007b0a939000  x6  0x0000007b0a939000  x7  0x0000000000000001  
  x8  0x00000000000000f0  x9  0x0000000000000000  x10 0xffffff80fffffbdf  x11 0x0000007b76f4dfc4  
  x12 0x0200007200008390  x13 0x0000007b0aa44670  x14 0x0000000000000002  x15 0x02000072ffffffff  
  x16 0x0000007b76fd6240  x17 0x0000007b76fbadd0  x18 0x0000007b094f0000  x19 0xef00007b0aa44670  
  x20 0x2f00007b0aa44660  x21 0xaf00007b0aa44640  x22 0x6f00007b0aa44670  x23 0x0000000000000000  
  x24 0x0000000000000a7a  x25 0x0200007300000000  x26 0x0040000e400010ef  x27 0x0101010101010101  
  x28 0x00000007b0aa4467  fp  0x0000007b0aa44700  
  lr  0x0000007b76f2ebb4  sp  0x0000007b0aa44630  pc  0x0000007b76f2ebe0  pst 0x0000000000001000  
  Native: #00  0000007b76f2ebe0  abort+0x148
  Native: #01  0000007b76f2ebb0  abort+0x118
  Native: #02  0000007b5bb716d0  art::Runtime::Abort(char const*)+0x33c
  Native: #03  0000007b5f4f155c  android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+0xc4
  Native: #04  0000007b5f4f0204  android::base::LogMessage::~LogMessage()+0x378
  Native: #05  0000007b5b67c304  art::gc::collector::ConcurrentCopying::ProcessMarkStack()+0x1a8c
  Native: #06  0000007b5b35c544  art::gc::collector::ConcurrentCopying::CopyingPhase()+0x9f8
  Native: #07  0000007b5b6503dc  art::gc::collector::ConcurrentCopying::RunPhases()+0xcf4
  Native: #08  0000007b5b53d154  art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+0x224
  Native: #09  0000007b5b4997f4  art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool, unsigned int)+0x544
  Native: #10  0000007b5b49918c  art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool, unsigned int)+0xd0
  Native: #11  0000007b5b49de18  art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+0xa4
  Native: #12  0000007b5b336330  art::gc::TaskProcessor::RunAllTasks(art::Thread*)+0x408
  Native: #13  0000000071045d00  art_jni_trampoline+0x70
  Native: #14  6300007b0aa4549c  
  JavaKt: #0  0000000000000000  dalvik.system.VMRuntime.runHeapTasks
  JavaKt: #1  0000007b5e99f13e  java.lang.Daemons$HeapTaskDaemon.runInternal
  JavaKt: #2  0000007b5e99e3de  java.lang.Daemons$Daemon.run
  JavaKt: #3  0000007b5a717b88  java.lang.Thread.run
core-parser> space
TYPE   REGION                  ADDRESS             NAME
  5  [0xa080000, 0x4a080000)  0xfc00004d8b3e1000  main space (region space)
  0  [0x6f7e4000, 0x6f923b08)  0xae0000468b3e0160  /system/framework/arm64/boot.art
  0  [0x6fab8000, 0x6fad5178)  0xc40000468b3e4a40  /system/framework/arm64/boot-core-libart.art
  0  [0x6fb00000, 0x6fb120a0)  0x430000468b3e4ba0  /system/framework/arm64/boot-okhttp.art
  0  [0x6fb2c000, 0x6fb4a260)  0xc30000468b3e02c0  /system/framework/arm64/boot-bouncycastle.art
  0  [0x6fb60000, 0x6fb60a18)  0x3a0000468b3e0420  /system/framework/arm64/boot-apache-xml.art
  0  [0x6fb64000, 0x70153f20)  0xbc0000468b3e0580  /system/framework/arm64/boot-framework.art
core-parser> p 0xa080000
Size: 0x18
Padding: 0x7
Object Name: java.lang.StringBuilder
  // extends java.lang.AbstractStringBuilder
    [0x10] byte coder = 0x0
    [0x0c] int count = 65
    [0x08] byte[] value = 0xa080078
  // extends java.lang.Object
    [0x04] private transient int shadow$_monitor_ = 0
    [0x00] private transient java.lang.Class shadow$_klass_ = 0x6f878668
core-parser>

Result of the repair

With the thread stack segment written back and the Java-related segments mapped, save the core-parser environment once more with fake core -r.

core-parser> fake core -r
FakeCore: saved [coredump/core-HeapTaskDaemon-2653.fakecore]

The saved Fakecore now produces a full backtrace in gdb:

(gdb) bt
#0  abort () at bionic/libc/bionic/abort.cpp:49
#1  0x0000007b5bb716d4 in art::Runtime::Abort (msg=<optimized out>) at art/runtime/runtime.cc:766
#2  0x0000007b5f4f1560 in std::__1::__function::__value_func<void (char const*)>::operator()[abi:nn180000](char const*&&) const (this=0x7b76f4dfc4 <sigaddset64(sigset64_t*, int)>,
    __args=@0x7b0aa44840: 0x6500003d8b865800 "Check failed: region_space_->IsLargeObject(to_ref) ") at prebuilts/clang/host/linux-x86/clang-r522817/include/c++/v1/__functional/function.h:425
#3  std::__1::function<void(char const*)>::operator() (this=0x7b76f4dfc4 <sigaddset64(sigset64_t*, int)>, __arg=0x6500003d8b865800 "Check failed: region_space_->IsLargeObject(to_ref) ")
    at prebuilts/clang/host/linux-x86/clang-r522817/include/c++/v1/__functional/function.h:978
#4  android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::operator()(char const*) const (this=<optimized out>, abort_message=<optimized out>) at system/libbase/logging.cpp:425
#5  android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*) (abort_message=<optimized out>) at system/libbase/logging.cpp:425
#6  0x0000007b5f4f0208 in android::base::LogMessage::~LogMessage (this=0xb600007b0aa44a40) at system/libbase/logging.cpp:513
#7  0x0000007b5b67c308 in art::gc::collector::ConcurrentCopying::ProcessMarkStackRef (this=0xaa00004f8b3e3000, to_ref=<optimized out>) at art/runtime/gc/collector/concurrent_copying.cc:2293
#8  art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0::operator()(art::mirror::Object*) const [clone .__uniq.219178288367021339109957061695789229157] (this=<optimized out>,
    ref=<optimized out>) at art/runtime/gc/collector/concurrent_copying.cc:2164
#9  art::gc::collector::ConcurrentCopying::ProcessThreadLocalMarkStacks<art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0>(bool, art::Closure*, art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0 const&) [clone .__uniq.219178288367021339109957061695789229157] (this=0xaa00004f8b3e3000, disable_weak_ref_access=false, checkpoint_callback=0x0, processor=...)
    at art/runtime/gc/collector/concurrent_copying.cc:2244
#10 art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:2160
#11 art::gc::collector::ConcurrentCopying::ProcessMarkStack (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:2142
#12 0x0000007b5b35c548 in art::gc::collector::ConcurrentCopying::CopyingPhase (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:1640
#13 0x0000007b5b6503e0 in art::gc::collector::ConcurrentCopying::RunPhases (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:257
#14 0x0000007b5b53d158 in art::gc::collector::GarbageCollector::Run (this=0xaa00004f8b3e3000, gc_cause=<optimized out>, clear_soft_references=<optimized out>)
    at art/runtime/gc/collector/garbage_collector.cc:220
#15 0x0000007b5b4997f8 in art::gc::Heap::CollectGarbageInternal (this=<optimized out>, gc_type=<optimized out>, gc_cause=<optimized out>, clear_soft_references=false, requested_gc_num=<optimized out>)
    at art/runtime/gc/heap.cc:3024
#16 0x0000007b5b499190 in art::gc::Heap::ConcurrentGC (this=0x5000004f8b3e0800, self=<optimized out>, cause=art::gc::kGcCauseBackground, force_full=false, requested_gc_num=4314)
    at art/runtime/gc/heap.cc:4341
#17 0x0000007b5b49de1c in art::gc::Heap::ConcurrentGCTask::Run (this=0x8400003c8b72e120, self=0x560000538b408000) at art/runtime/gc/heap.cc:4181
#18 0x0000007b5b336334 in art::gc::TaskProcessor::RunAllTasks (this=0x8e00003f8b3e9500, self=0x560000538b408000) at art/runtime/gc/task_processor.cc:158
#19 0x0000000071045d04 in dalvik::system::VMRuntime::nativeSetTargetHeapUtilization (this=...)
   from symbols/system/framework/arm64/boot-core-libart.oat
#20 0x000000007107ac0c in java::lang::Daemons$HeapTaskDaemon::runInternal (this=...) at java/lang/Daemons.java:734
#21 0x00000000710534d0 in java::lang::Daemons$Daemon::run (this=...) at java/lang/Daemons.java:122
#22 0x0000000070e6ad8c in java::lang::Thread::run (this=...)
   from symbols/system/framework/arm64/boot.oat
#23 0x0000007b5b2c3378 in art_quick_invoke_stub () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:672
#24 0x0000007b5b26e24c in art::ArtMethod::Invoke (this=0x6fa16b60, self=0x560000538b408000, args=0xa300007b0aa45490, args_size=4, result=0xe300007b0aa454b0, shorty=0x6300007b0aa454a0 "V")
    at art/runtime/art_method.cc:422
#25 0x0000007b5bb92408 in InvokeInstance (this=<optimized out>, self=<optimized out>, receiver=...) at art/runtime/art_method-inl.h:202
#26 0x0000007b5b3e0140 in art::Thread::CreateCallback (arg=0x560000538b408000) at art/runtime/thread.cc:681
#27 0x0000000000000000 in ?? ()

Finally

There are always more solutions than problems.