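Preface
The question I have been asked most often lately is why a core file cannot be used in gdb or core-parser: neither gdb's info shared nor core-parser's map lists the shared libraries the process depends on, so symbol files cannot be loaded. Other cases include a core capture that timed out and was cut short, or an adb pull that disconnected unexpectedly and left a truncated file. I happen to have a few such cases on hand, so this post answers those questions in one place.
Rescuing a Truncated Core File
To check whether a core file has been truncated, run env core --load in core-parser and see whether any load segment is flagged as truncated.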
core-parser> env core --load | grep TRUNCATE
629 [6ef5074000, 6ef659a000) r-- 0001526000 /data/app/~~7wq_KV-ZlFrJh2-Fc9c-QQ==/com.xxxx.android.xxxx-EeMQb9vwYZ9yi7tgADMFHw==/base.apk [EMPTY](TRUNCATE)
630 [6ef659a000, 6ef6ec4000) --- 0000000000 [] [EMPTY](TRUNCATE)
631 [6ef6ec4000, 6ef6ec8000) rw- 0000000000 [] [EMPTY](TRUNCATE)
632 [6ef6ec8000, 6ef759a000) --- 0000000000 [] [EMPTY](TRUNCATE)
633 [6ef7642000, 6ef7a00000) r-- 00003be000 /data/app/~~G_8FGB4GB3s0i46o2c29dA==/com.xx.android.xxxx-FH2XnSmYVeSMlhUU5bxZug==/base.apk [EMPTY](TRUNCATE)
634 [6ef7a00000, 6ef7c7e000) rw- 0000000000 /dmabuf:screenshot [EMPTY](TRUNCATE)
...
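This core file, for example, has 9101 load segments in total, but truncation begins at segment 629. That leaves more than 8000 load segments unusable, so neither gdb nor core-parser can unwind the stack from it directly.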
core-parser> env core
* r_debug: 0x7ba1dcbbf8
* arm mode: thumb
* mNote: 1
* mLoad: 9101
* mQuickLoad: 5947
* mLinkMap: 0
core-parser>
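The same check can be made outside core-parser by reading the ELF program headers directly: a PT_LOAD segment is truncated when p_offset + p_filesz runs past the end of the file. The following is a minimal sketch, assuming a little-endian ELF64 core with fewer than 65535 program headers (no PN_XNUM extension). The name find_truncated_loads is hypothetical and is not part of core-parser.

```python
import os
import struct

# ELF64 little-endian layouts (Elf64_Ehdr is 64 bytes, Elf64_Phdr is 56 bytes).
EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD = 1


def find_truncated_loads(path):
    """Return (phdr_index, vaddr, missing_bytes) for each PT_LOAD that runs past EOF."""
    file_size = os.path.getsize(path)
    truncated = []
    with open(path, "rb") as f:
        header = f.read(EHDR.size)
        if len(header) < EHDR.size:
            raise ValueError("file too short for an ELF64 header")
        fields = EHDR.unpack(header)
        ident, e_phoff, e_phentsize, e_phnum = fields[0], fields[5], fields[9], fields[10]
        if ident[:4] != b"\x7fELF" or ident[4] != 2 or ident[5] != 1:
            raise ValueError("not a little-endian ELF64 file")
        for index in range(e_phnum):
            f.seek(e_phoff + index * e_phentsize)
            raw = f.read(PHDR.size)
            if len(raw) < PHDR.size:
                break  # the program header table itself is cut off
            p_type, _, p_offset, p_vaddr, _, p_filesz, _, _ = PHDR.unpack(raw)
            end = p_offset + p_filesz
            if p_type == PT_LOAD and p_filesz > 0 and end > file_size:
                truncated.append((index, p_vaddr, end - file_size))
    return truncated
```

Calling find_truncated_loads on the path of a core file returns an empty list when every load segment is fully present in the file.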
Loading the same core file in gdb looks like this.
Program terminated with signal SIGABRT, Aborted.
#0 0x0000007b76f2ebe0 in ?? ()
[Current thread is 1 (LWP 2682)]
(gdb) info sharedlibrary
No shared libraries loaded at this time.
(gdb) info auxv
33 AT_SYSINFO_EHDR System-supplied DSO's ELF header 0x7ba1dbe000
51 AT_MINSIGSTKSZ Minimum stack size for signal delivery 0x1270
16 AT_HWCAP Machine-dependent CPU capability hints 0xffffffff
6 AT_PAGESZ System page size 4096
17 AT_CLKTCK Frequency of times() 100
3 AT_PHDR Program headers for program 0x5b8b3f0040
4 AT_PHENT Size of program header entry 56
5 AT_PHNUM Number of program headers 12
7 AT_BASE Base address of interpreter 0x7ba1c5b000
8 AT_FLAGS Flags 0x0
9 AT_ENTRY Entry point of program 0x5b8b3f4000
11 AT_UID Real user ID 0
12 AT_EUID Effective user ID 0
13 AT_GID Real group ID 0
14 AT_EGID Effective group ID 0
23 AT_SECURE Boolean, was exec setuid-like? 1
25 AT_RANDOM Address of 16 random bytes 0x7fe2b52ac8
26 AT_HWCAP2 Extension of AT_HWCAP 0x801af3ff
31 AT_EXECFN File name of executable 0x7fe2b54fde <error: Cannot access memory at address 0x7fe2b54fde>
15 AT_PLATFORM String identifying platform 0x7fe2b52ad8 <error: Cannot access memory at address 0x7fe2b52ad8>
0 AT_NULL End of vector 0x0
(gdb)
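General Approach
A few months ago I finished the Fakecore feature in core-parser. Many people probably assume it exists only for converting tombstone files. It does not: Fakecore was designed for repairing core files.
Rebuilding the link_map
The fake family of commands can rebuild a basic map table, for example with fake map --auto.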
core-parser> fake map --auto
New overlay [100000, 110000)
Create FAKE PHDR
New note overlay [7c750, 1a7338)
Create FAKE DYNAMIC
Create FAKE LINK64 MAP
0x7129489000 /apex/com.android.adbd/lib64/libadb_pairing_auth.so
0x712c400000 /apex/com.android.adbd/lib64/libadb_pairing_connection.so
0x712c551000 /apex/com.android.adbd/lib64/libadb_pairing_server.so
0x7b50649000 /apex/com.android.adbd/lib64/libadbconnection_client.so
0x7b06667000 /apex/com.android.adbd/lib64/libbase.so
0x713067b000 /apex/com.android.adbd/lib64/libc++.so
0x7132e60000 /apex/com.android.adbd/lib64/libcrypto.so
0x7b1f1ec000 /apex/com.android.adbd/lib64/libcrypto_utils.so
0x7b55f57000 /apex/com.android.adbd/lib64/libcutils.so
0x70e1084000 /apex/com.android.appsearch/lib64/libc++.so
0x6f9a653000 /apex/com.android.appsearch/lib64/libicing.so
0x70e1200000 /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
0x7b506d0000 /apex/com.android.art/lib64/libadbconnection.so
0x7b45e1a000 /apex/com.android.art/lib64/libandroidio.so
0x7b5ac00000 /apex/com.android.art/lib64/libart.so
...
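Running map again now shows the rebuilt entries, but the shared-library table has not been calibrated yet. A Fakecore saved at this point still cannot be used directly in gdb.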
core-parser> map
NUM LINKMAP REGION FLAGS L_ADDR NAME
1 0x102000 [100000, 110000) rw- 100000 [FAKECORE] [*](OVERLAY)(FAKE)
2 0x102030 [ ??? , ??? ) --- 7129489000 /apex/com.android.adbd/lib64/libadb_pairing_auth.so
3 0x102060 [ ??? , ??? ) --- 712c400000 /apex/com.android.adbd/lib64/libadb_pairing_connection.so
4 0x102090 [ ??? , ??? ) --- 712c551000 /apex/com.android.adbd/lib64/libadb_pairing_server.so
5 0x1020c0 [ ??? , ??? ) --- 7b50649000 /apex/com.android.adbd/lib64/libadbconnection_client.so
6 0x1020f0 [ ??? , ??? ) --- 7b06667000 /apex/com.android.adbd/lib64/libbase.so
7 0x102120 [ ??? , ??? ) --- 713067b000 /apex/com.android.adbd/lib64/libc++.so
8 0x102150 [ ??? , ??? ) --- 7132e60000 /apex/com.android.adbd/lib64/libcrypto.so
9 0x102180 [ ??? , ??? ) --- 7b1f1ec000 /apex/com.android.adbd/lib64/libcrypto_utils.so
10 0x1021b0 [ ??? , ??? ) --- 7b55f57000 /apex/com.android.adbd/lib64/libcutils.so
11 0x1021e0 [ ??? , ??? ) --- 70e1084000 /apex/com.android.appsearch/lib64/libc++.so
12 0x102210 [ ??? , ??? ) --- 6f9a653000 /apex/com.android.appsearch/lib64/libicing.so
13 0x102240 [ ??? , ??? ) --- 70e1200000 /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
14 0x102270 [ ??? , ??? ) --- 7b506d0000 /apex/com.android.art/lib64/libadbconnection.so
15 0x1022a0 [ ??? , ??? ) --- 7b45e1a000 /apex/com.android.art/lib64/libandroidio.so
16 0x1022d0 [ ??? , ??? ) --- 7b5ac00000 /apex/com.android.art/lib64/libart.so
...
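For reference, the table that debuggers walk is a doubly linked list of link_map records, each holding l_addr, l_name, l_ld, l_next and l_prev. The sketch below builds such a chain in a byte buffer and walks it back, using the five public 64-bit fields from <link.h>. It only illustrates the data structure: the 0x30 stride is taken from the LINKMAP column above, while the placement of the name strings and the function names are assumptions, not core-parser's actual layout.

```python
import struct

LINK_MAP = struct.Struct("<QQQQQ")  # l_addr, l_name, l_ld, l_next, l_prev
STRIDE = 0x30  # spacing between entries, as seen in the LINKMAP column above


def build_chain(base, libs):
    """Lay out link_map records at `base`, followed by NUL-terminated names."""
    image = bytearray(STRIDE * len(libs))
    for i, (l_addr, name) in enumerate(libs):
        l_name = base + len(image)  # the name is appended at the current end
        image += name.encode() + b"\0"
        l_next = base + STRIDE * (i + 1) if i + 1 < len(libs) else 0
        l_prev = base + STRIDE * (i - 1) if i > 0 else 0
        # l_ld stays 0 here: it cannot be known without the library file.
        LINK_MAP.pack_into(image, STRIDE * i, l_addr, l_name, 0, l_next, l_prev)
    return bytes(image)


def walk_chain(base, image, head):
    """Follow l_next from `head` and yield (l_addr, l_ld, name)."""
    entry = head
    while entry:
        l_addr, l_name, l_ld, l_next, _ = LINK_MAP.unpack_from(image, entry - base)
        end = image.index(b"\0", l_name - base)
        yield l_addr, l_ld, image[l_name - base:end].decode()
        entry = l_next
```

Every entry produced this way has l_ld equal to 0, which is the gap the calibration step has to fill.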
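Calibration
The l_ld value cannot be reconstructed from the core alone, and l_addr may differ from the real load address, so the table has to be calibrated against the original shared-library files. Use fake map --sysroot followed by the directories that hold those files, for example fake map --sysroot symbols/ (the session below passes ./root:./apex). Each library that gets calibrated prints a calibrate line.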
core-parser> fake map --sysroot ./root:./apex
Mmap segment [7129489000, 712949d000) ./apex/com.android.adbd/lib64/libadb_pairing_auth.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_auth.so l_ld(71294e6dc8)
Mmap segment [712c400000, 712c424000) ./apex/com.android.adbd/lib64/libadb_pairing_connection.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_connection.so l_ld(712c4b6f88)
Mmap segment [712c551000, 712c569000) ./apex/com.android.adbd/lib64/libadb_pairing_server.so [0]
calibrate /apex/com.android.adbd/lib64/libadb_pairing_server.so l_ld(712c5bb388)
Mmap segment [7b50649000, 7b50661000) ./apex/com.android.adbd/lib64/libadbconnection_client.so [0]
calibrate /apex/com.android.adbd/lib64/libadbconnection_client.so l_ld(7b506b6fe0)
Mmap segment [7b06667000, 7b0667b000) ./apex/com.android.adbd/lib64/libbase.so [0]
calibrate /apex/com.android.adbd/lib64/libbase.so l_ld(7b066b3258)
Mmap segment [713067b000, 713070b000) ./apex/com.android.adbd/lib64/libc++.so [0]
calibrate /apex/com.android.adbd/lib64/libc++.so l_ld(71307d8670)
Mmap segment [7132e60000, 7132ed0000) ./apex/com.android.adbd/lib64/libcrypto.so [0]
calibrate /apex/com.android.adbd/lib64/libcrypto.so l_ld(7132fdb0e0)
Mmap segment [7b1f1ec000, 7b1f1f0000) ./apex/com.android.adbd/lib64/libcrypto_utils.so [0]
calibrate /apex/com.android.adbd/lib64/libcrypto_utils.so l_ld(7b1f1f4020)
Mmap segment [7b55f57000, 7b55f63000) ./apex/com.android.adbd/lib64/libcutils.so [0]
calibrate /apex/com.android.adbd/lib64/libcutils.so l_ld(7b55f74208)
Mmap segment [70e1084000, 70e1114000) ./apex/com.android.appsearch/lib64/libc++.so [0]
calibrate /apex/com.android.appsearch/lib64/libc++.so l_ld(70e11e1670)
Mmap segment [6f9a653000, 6f9a6ab000) ./apex/com.android.appsearch/lib64/libicing.so [0]
calibrate /apex/com.android.appsearch/lib64/libicing.so l_ld(6f9a8d0ae0)
Mmap segment [70e1200000, 70e1248000) ./apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so [0]
calibrate /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so l_ld(70e12b8988)
Mmap segment [7b506d0000, 7b506dc000) ./apex/com.android.art/lib64/libadbconnection.so [0]
calibrate /apex/com.android.art/lib64/libadbconnection.so l_ld(7b506f4108)
Mmap segment [7b45e1a000, 7b45e1e000) ./apex/com.android.art/lib64/libandroidio.so [0]
calibrate /apex/com.android.art/lib64/libandroidio.so l_ld(7b45e22028)
Mmap segment [7b5ac00000, 7b5ae09000) ./apex/com.android.art/lib64/libart.so [0]
calibrate /apex/com.android.art/lib64/libart.so l_ld(7b5be27398)
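The calibrated values can be cross-checked by hand. For a shared object, l_addr is the load bias (the mapped start address minus the page-aligned p_vaddr of the lowest PT_LOAD), and l_ld is l_addr plus the p_vaddr of the PT_DYNAMIC segment. Below is a minimal sketch, assuming a little-endian ELF64 library and a mapping address that belongs to the lowest PT_LOAD. The helper name calibrate is hypothetical, and core-parser's own implementation may differ.

```python
import struct

EHDR = struct.Struct("<16sHHIQQQIHHHHHH")
PHDR = struct.Struct("<IIQQQQQQ")
PT_LOAD, PT_DYNAMIC = 1, 2
PAGE_MASK = ~0xFFF  # 4 KiB pages, matching AT_PAGESZ 4096 in the auxv dump


def calibrate(elf_bytes, mapped_start):
    """Return (l_addr, l_ld) for a library whose lowest segment is mapped at mapped_start."""
    fields = EHDR.unpack_from(elf_bytes, 0)
    if fields[0][:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    e_phoff, e_phentsize, e_phnum = fields[5], fields[9], fields[10]
    min_load_vaddr = None
    dynamic_vaddr = None
    for i in range(e_phnum):
        p_type, _, _, p_vaddr, _, _, _, _ = PHDR.unpack_from(
            elf_bytes, e_phoff + i * e_phentsize)
        if p_type == PT_LOAD:
            if min_load_vaddr is None or p_vaddr < min_load_vaddr:
                min_load_vaddr = p_vaddr
        elif p_type == PT_DYNAMIC:
            dynamic_vaddr = p_vaddr
    if min_load_vaddr is None or dynamic_vaddr is None:
        raise ValueError("no PT_LOAD or PT_DYNAMIC segment found")
    l_addr = mapped_start - (min_load_vaddr & PAGE_MASK)
    return l_addr, l_addr + dynamic_vaddr
```

As an illustration, a library mapped at 0x7129489000 whose PT_DYNAMIC p_vaddr is 0x5ddc8 gives l_ld 0x71294e6dc8. The 0x5ddc8 figure is inferred from the calibrate output for libadb_pairing_auth.so, not read from the file itself.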
Rebuilding the Fakecore
In most cases, once the two steps above are done, the current environment can be saved as a new core file with fake core -r.
core-parser> fake core -r
FakeCore: saved [coredump/core-HeapTaskDaemon-2653.fakecore]
Loading the saved Fakecore in gdb, info sharedlibrary now lists the libraries, and setting the sysroot loads their symbols.
(gdb) info sharedlibrary
From To Syms Read Shared Object Library
No /apex/com.android.adbd/lib64/libadb_pairing_auth.so
No /apex/com.android.adbd/lib64/libadb_pairing_connection.so
No /apex/com.android.adbd/lib64/libadb_pairing_server.so
No /apex/com.android.adbd/lib64/libadbconnection_client.so
No /apex/com.android.adbd/lib64/libbase.so
No /apex/com.android.adbd/lib64/libc++.so
No /apex/com.android.adbd/lib64/libcrypto.so
No /apex/com.android.adbd/lib64/libcrypto_utils.so
No /apex/com.android.adbd/lib64/libcutils.so
No /apex/com.android.appsearch/lib64/libc++.so
No /apex/com.android.appsearch/lib64/libicing.so
No /apex/com.android.appsearch/lib64/libprotobuf-cpp-lite.so
No /apex/com.android.art/lib64/libadbconnection.so
No /apex/com.android.art/lib64/libandroidio.so
No /apex/com.android.art/lib64/libart.so
(gdb) set sysroot symbols/
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libadbconnection_client.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libbase.so...
Reading symbols from symbols/apex/com.android.adbd/lib64/libc++.so...
...
(gdb) bt
#0 abort () at bionic/libc/bionic/abort.cpp:49
#1 0x0000000000000000 in ?? ()
Backtrace stopped: previous frame identical to this frame (corrupt stack?)
core-parser behaves the same way once the symbols are loaded with sysroot.
core-parser> sysroot symbols
Mmap segment [7129489000, 712949d000) symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so [0]
Mmap segment [712949d000, 71294e5000) symbols/apex/com.android.adbd/lib64/libadb_pairing_auth.so [14000]
Read symbols[977] (/apex/com.android.adbd/lib64/libadb_pairing_auth.so)
Mmap segment [712c400000, 712c424000) symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so [0]
Mmap segment [712c424000, 712c4b4000) symbols/apex/com.android.adbd/lib64/libadb_pairing_connection.so [24000]
Read symbols[1645] (/apex/com.android.adbd/lib64/libadb_pairing_connection.so)
Mmap segment [712c551000, 712c569000) symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so [0]
Mmap segment [712c569000, 712c5b9000) symbols/apex/com.android.adbd/lib64/libadb_pairing_server.so [18000]
Read symbols[1143] (/apex/com.android.adbd/lib64/libadb_pairing_server.so)
...
core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682")
x0 0x0000000000000000 x1 0x0000000000000a7a x2 0x0000000000000006 x3 0x6f00007b0aa44670
x4 0x0000007b0a939000 x5 0x0000007b0a939000 x6 0x0000007b0a939000 x7 0x0000000000000001
x8 0x00000000000000f0 x9 0x0000000000000000 x10 0xffffff80fffffbdf x11 0x0000007b76f4dfc4
x12 0x0200007200008390 x13 0x0000007b0aa44670 x14 0x0000000000000002 x15 0x02000072ffffffff
x16 0x0000007b76fd6240 x17 0x0000007b76fbadd0 x18 0x0000007b094f0000 x19 0xef00007b0aa44670
x20 0x2f00007b0aa44660 x21 0xaf00007b0aa44640 x22 0x6f00007b0aa44670 x23 0x0000000000000000
x24 0x0000000000000a7a x25 0x0200007300000000 x26 0x0040000e400010ef x27 0x0101010101010101
x28 0x00000007b0aa4467 fp 0x0000007b0aa44700
lr 0x0000007b76f2ebb4 sp 0x0000007b0aa44630 pc 0x0000007b76f2ebe0 pst 0x0000000000001000
Native: #0 0000007b76f2ebe0 abort+0x148
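Further Uses
As shown above, the stack still cannot be unwound after this repair. The segment holding this thread's stack was also truncated, so there is no stack memory to unwind through.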
core-parser> rd 0x0000007b0aa44630
ERROR: Invalid address 0x7b0aa44630
core-parser> env core --load | grep 7b0aa4
4888 [7b0a93e000, 7b0aa49000) rw- 000010b000 [] [EMPTY](TRUNCATE)
4889 [7b0aa49000, 7b0aa4d000) --- 0000000000 [] [EMPTY](TRUNCATE)
4890 [7b0aa4d000, 7b0ad60000) --- 0000000000 [] [EMPTY](TRUNCATE)
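Grepping for an address prefix works, but it relies on guessing how many leading hex digits the segment shares with the address. A small sketch of an exact lookup over a saved copy of the env core --load listing follows; it assumes the line format shown above, and find_segment is a hypothetical helper name.

```python
import re

# Matches lines such as:
# 4888 [7b0a93e000, 7b0aa49000) rw- 000010b000 [] [EMPTY](TRUNCATE)
LOAD_LINE = re.compile(r"^\s*(\d+)\s+\[([0-9a-f]+),\s*([0-9a-f]+)\)\s+(\S{3})\s")


def find_segment(listing, address):
    """Return (index, start, end, truncated) for the load line containing address, else None."""
    for line in listing.splitlines():
        m = LOAD_LINE.match(line)
        if not m:
            continue
        start, end = int(m.group(2), 16), int(m.group(3), 16)
        if start <= address < end:
            return int(m.group(1)), start, end, "(TRUNCATE)" in line
    return None
```

For the sp value 0x7b0aa44630, the lookup lands in load 4888, [7b0a93e000, 7b0aa49000), which is marked as truncated.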
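This core file comes from a native crash in an Android app, so part of the current thread's stack memory can be recovered from the tombstone file. We can convert the tombstone into a fakecore, save the segment at 7b0a93e000 from that fakecore, and write it back into the original core file.
Creating the Fakecore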
Tid: 2682
tagged_addr_ctrl 1
pac_enabled_keys f
x0 0x0000000000000000 x1 0x0000000000000a7a x2 0x0000000000000006 x3 0x6f00007b0aa44670
x4 0x0000007b0a939000 x5 0x0000007b0a939000 x6 0x0000007b0a939000 x7 0x0000000000000001
x8 0x00000000000000f0 x9 0x0000000000000000 x10 0xffffff80fffffbdf x11 0x0000007b76f4dfc4
x12 0x0200007200008390 x13 0x0000007b0aa44670 x14 0x0000000000000002 x15 0x02000072ffffffff
x16 0x0000007b76fd6240 x17 0x0000007b76fbadd0 x18 0x0000007b094f0000 x19 0xef00007b0aa44670
x20 0x2f00007b0aa44660 x21 0xaf00007b0aa44640 x22 0x6f00007b0aa44670 x23 0x0000000000000000
x24 0x0000000000000a7a x25 0x0200007300000000 x26 0x0040000e400010ef x27 0x0101010101010101
x28 0x00000007b0aa4467 fp 0x0000007b0aa44700
lr 0x0000007b76f2ebb4 sp 0x0000007b0aa44630 pc 0x0000007b76f2ebe0 pst 0x0000000000001000
/apex/com.android.art/lib64/libart.so:4b8d363e411911cec9861f3127c06d28
/apex/com.android.art/lib64/libbase.so:99e384a650f746069995ae2825eb2eb9
/apex/com.android.runtime/lib64/bionic/hwasan/libc.so:d4a3d36d0a7d2f3a94f8a0160deec5b2
/system/framework/arm64/boot-core-libart.oat:a587436c5addf6ef9cee37ed3e3cdce4
/system/framework/arm64/boot.oat:0924577b1271219e6b02dfeec82c7dc2
Create Fakecore tombstones/tombstone_27.fakecore ...
Core load (0x7fca58002940)
Core env:
* Path:
* Machine: arm64
* Bits: 64
* PointSize: 8
* PointMask: 0xffffffffffffffff
* VabitsMask: 0x7fffffffff
* PageSize: 0x1000
* Remote: false
* Thread: 2682
Switch android(0) env.
New overlay [100000, 104000)
Create FAKE PHDR
New note overlay [844a8, f5ae8)
Create FAKE DYNAMIC
Create FAKE LINK MAP
0x7b5ac00000 /apex/com.android.art/lib64/libart.so:4b8d363e411911cec9861f3127c06d28
0x7b5f4d4000 /apex/com.android.art/lib64/libbase.so:99e384a650f746069995ae2825eb2eb9
0x7b76ec7000 /apex/com.android.runtime/lib64/bionic/hwasan/libc.so:d4a3d36d0a7d2f3a94f8a0160deec5b2
0x71030000 /system/framework/arm64/boot-core-libart.oat:a587436c5addf6ef9cee37ed3e3cdce4
0x70d0c000 /system/framework/arm64/boot.oat:0924577b1271219e6b02dfeec82c7dc2
Create FAKE STRTAB
New overlay [7200000000, 72fffff000)
Core load (0x396f370) tombstones/tombstone_27.fakecore
Core env:
* Path: tombstones/tombstone_27.fakecore
* Machine: arm64
* Bits: 64
* PointSize: 8
* PointMask: 0xffffffffffffffff
* VabitsMask: 0x7fffffffff
* PageSize: 0x1000
* Remote: false
* Thread: 2682
Switch android(0) env.
core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682")
x0 0x0000000000000000 x1 0x0000000000000a7a x2 0x0000000000000006 x3 0x6f00007b0aa44670
x4 0x0000007b0a939000 x5 0x0000007b0a939000 x6 0x0000007b0a939000 x7 0x0000000000000001
x8 0x00000000000000f0 x9 0x0000000000000000 x10 0xffffff80fffffbdf x11 0x0000007b76f4dfc4
x12 0x0200007200008390 x13 0x0000007b0aa44670 x14 0x0000000000000002 x15 0x02000072ffffffff
x16 0x0000007b76fd6240 x17 0x0000007b76fbadd0 x18 0x0000007b094f0000 x19 0xef00007b0aa44670
x20 0x2f00007b0aa44660 x21 0xaf00007b0aa44640 x22 0x6f00007b0aa44670 x23 0x0000000000000000
x24 0x0000000000000a7a x25 0x0200007300000000 x26 0x0040000e400010ef x27 0x0101010101010101
x28 0x00000007b0aa4467 fp 0x0000007b0aa44700
lr 0x0000007b76f2ebb4 sp 0x0000007b0aa44630 pc 0x0000007b76f2ebe0 pst 0x0000000000001000
Native: #00 0000007b76f2ebe0
Native: #01 0000007b76f2ebb0
Native: #02 0000007b5bb716d0
Native: #03 0000007b5f4f155c
Native: #04 0000007b5f4f0204
Native: #05 0000007b5b67c304
Native: #06 0000007b5b35c544
Native: #07 0000007b5b6503dc
Native: #08 0000007b5b53d154
Native: #09 0000007b5b4997f4
Native: #10 0000007b5b49918c
Native: #11 0000007b5b49de18
Native: #12 0000007b5b336330
Native: #13 0000000071045d00
Native: #14 6300007b0aa4549c
core-parser>
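Unwinding needs readable stack memory, which is why the truncated core stopped at frame #0 while the tombstone-based Fakecore gets further. On AArch64 with frame pointers, the frame record at fp holds the caller's fp followed by the return address, so a walk is a chain of 16-byte reads. The sketch below shows that idea only: it assumes frame pointers are present, core-parser's real unwinder may work differently, and read_u64 is a hypothetical callback that returns None for unreadable memory. Tag bits are stripped with the VabitsMask value from the Core env output.

```python
VABITS_MASK = 0x7FFFFFFFFF  # VabitsMask from the Core env output


def fp_unwind(pc, fp, read_u64, max_frames=64):
    """Collect return addresses by following AArch64 frame records (fp -> [prev_fp, lr])."""
    frames = [pc]
    fp &= VABITS_MASK
    while fp and len(frames) < max_frames:
        prev_fp = read_u64(fp)      # caller's frame pointer
        ret = read_u64(fp + 8)      # return address into the caller
        if prev_fp is None or ret is None:
            break  # stack memory is missing, e.g. a truncated load segment
        frames.append(ret & VABITS_MASK)
        next_fp = prev_fp & VABITS_MASK
        if next_fp <= fp:
            break  # callers live at higher addresses; anything else is a broken chain
        fp = next_fp
    return frames
```

With no readable stack memory the function returns only the starting pc, which mirrors the single-frame backtrace from the truncated core.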
Writing the Segment Back
First, read the thread's stack segment from the fakecore generated from the tombstone.
core-parser> file 0x0000007b0aa44700
[7b0a93e000, 7b0aa49000) 0000000000000000 [anon:stack_and_tls:2682]
core-parser> rd 7b0a93e000 -e 7b0aa49000 -f 7b0a93e000.bin
Saved [7b0a93e000.bin].
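Before mapping the dump back, it is worth confirming that the file covers the whole segment: [7b0a93e000, 7b0aa49000) spans 0x10b000 bytes. Here is a small sketch of that check; check_dump is a hypothetical helper, not a core-parser command.

```python
import os


def check_dump(path, start, end):
    """Return True when the dump file is exactly as large as the [start, end) segment."""
    expected = end - start
    actual = os.path.getsize(path)
    if actual != expected:
        print(f"{path}: {actual:#x} bytes, expected {expected:#x}")
    return actual == expected
```

For the dump above the call would be check_dump("7b0a93e000.bin", 0x7b0a93e000, 0x7b0aa49000).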
Then map it back into the earlier core-parser session.
core-parser> mmap 7b0a93e000 7b0a93e000.bin
Mmap segment [7b0a93e000, 7b0aa49000) 7b0a93e000.bin [0]
core-parser> bt
ERROR: Please command "env config --sdk <SDK>!!"
Thread("2682")
x0 0x0000000000000000 x1 0x0000000000000a7a x2 0x0000000000000006 x3 0x6f00007b0aa44670
x4 0x0000007b0a939000 x5 0x0000007b0a939000 x6 0x0000007b0a939000 x7 0x0000000000000001
x8 0x00000000000000f0 x9 0x0000000000000000 x10 0xffffff80fffffbdf x11 0x0000007b76f4dfc4
x12 0x0200007200008390 x13 0x0000007b0aa44670 x14 0x0000000000000002 x15 0x02000072ffffffff
x16 0x0000007b76fd6240 x17 0x0000007b76fbadd0 x18 0x0000007b094f0000 x19 0xef00007b0aa44670
x20 0x2f00007b0aa44660 x21 0xaf00007b0aa44640 x22 0x6f00007b0aa44670 x23 0x0000000000000000
x24 0x0000000000000a7a x25 0x0200007300000000 x26 0x0040000e400010ef x27 0x0101010101010101
x28 0x00000007b0aa4467 fp 0x0000007b0aa44700
lr 0x0000007b76f2ebb4 sp 0x0000007b0aa44630 pc 0x0000007b76f2ebe0 pst 0x0000000000001000
Native: #00 0000007b76f2ebe0 abort+0x148
Native: #01 0000007b76f2ebb0 abort+0x118
Native: #02 0000007b5bb716d0 art::Runtime::Abort(char const*)+0x33c
Native: #03 0000007b5f4f155c android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+0xc4
Native: #04 0000007b5f4f0204 android::base::LogMessage::~LogMessage()+0x378
Native: #05 0000007b5b67c304 art::gc::collector::ConcurrentCopying::ProcessMarkStack()+0x1a8c
Native: #06 0000007b5b35c544 art::gc::collector::ConcurrentCopying::CopyingPhase()+0x9f8
Native: #07 0000007b5b6503dc art::gc::collector::ConcurrentCopying::RunPhases()+0xcf4
Native: #08 0000007b5b53d154 art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+0x224
Native: #09 0000007b5b4997f4 art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool, unsigned int)+0x544
Native: #10 0000007b5b49918c art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool, unsigned int)+0xd0
Native: #11 0000007b5b49de18 art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+0xa4
Native: #12 0000007b5b336330 art::gc::TaskProcessor::RunAllTasks(art::Thread*)+0x408
Native: #13 0000000071045d00 art_jni_trampoline+0x70
Native: #14 6300007b0aa4549c
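The pure native stack can now be printed. This problem is related to the virtual machine, though, so sooner or later the Java-side memory has to be examined as well. We also know from the start that 600+ load segments of this core are valid, and those cover the entire Java heap, so that part is worth trying to repair too.
Repairing VM-Related Memory Segments
Before doing that, the SDK environment must be configured.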
core-parser> env config --sdk 35
Switch android(35) env.
Use the dex command to check whether the relevant jar files are present in the original core file. They turn out to be truncated as well.
core-parser> dex
NUM DEXCACHE REGION FLAGS NAME
1 0x00000000 [7b8c81d000, 7b8c825000) r-- /apex/com.android.rkpd/javalib/service-rkp.jar [EMPTY](TRUNCATE)
2 0x00000000 [7b89cb7000, 7b89cbc000) r-- /apex/com.android.virt/javalib/service-virtualization.jar [EMPTY](TRUNCATE)
3 0x00000000 [7b8c8bb000, 7b8c8c0000) r-- /apex/com.android.compos/javalib/service-compos.jar [EMPTY](TRUNCATE)
4 0x00000000 [7b39cea000, 7b39de1000) r-- /apex/com.android.uwb/javalib/service-uwb.jar [EMPTY](TRUNCATE)
5 0x00000000 [7b65017000, 7b65080000) r-- /apex/com.android.healthfitness/javalib/service-healthfitness.jar [EMPTY](TRUNCATE)
6 0x00000000 [7b39fcc000, 7b3a043000) r-- /apex/com.android.profiling/javalib/service-profiling.jar [EMPTY](TRUNCATE)
7 0x00000000 [7b93c61000, 7b93c64000) r-- /apex/com.android.ondevicepersonalization/javalib/service-ondevicepersonalization.jar [EMPTY](TRUNCATE)
8 0x00000000 [7b8c929000, 7b8c93f000) r-- /apex/com.android.adservices/javalib/service-adservices.jar [EMPTY](TRUNCATE)
9 0x00000000 [7b89a7d000, 7b89a87000) r-- /apex/com.android.scheduling/javalib/service-scheduling.jar [EMPTY](TRUNCATE)
10 0x00000000 [7b89af0000, 7b89afa000) r-- /apex/com.android.os.statsd/javalib/service-statsd.jar [EMPTY](TRUNCATE)
...
So we can run sysroot again, this time as sysroot ./apex:./root --dex. The --dex option loads only the Java-related dependencies.
core-parser> sysroot ./apex:./root --dex
Mmap segment [7b8c81d000, 7b8c825000) ./apex/com.android.rkpd/javalib/service-rkp.jar [0]
Mmap segment [7b89cb7000, 7b89cbc000) ./apex/com.android.virt/javalib/service-virtualization.jar [0]
Mmap segment [7b39cea000, 7b39de1000) ./apex/com.android.uwb/javalib/service-uwb.jar [0]
Mmap segment [7b65017000, 7b65080000) ./apex/com.android.healthfitness/javalib/service-healthfitness.jar [0]
Mmap segment [7b39fcc000, 7b3a043000) ./apex/com.android.profiling/javalib/service-profiling.jar [0]
Mmap segment [7b93c61000, 7b93c64000) ./apex/com.android.ondevicepersonalization/javalib/service-ondevicepersonalization.jar [0]
Mmap segment [7b8c929000, 7b8c93f000) ./apex/com.android.adservices/javalib/service-adservices.jar [0]
Mmap segment [7b89a7d000, 7b89a87000) ./apex/com.android.scheduling/javalib/service-scheduling.jar [0]
Mmap segment [7b89af0000, 7b89afa000) ./apex/com.android.os.statsd/javalib/service-statsd.jar [0]
...
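Both sysroot forms used in this post, symbols and ./apex:./root, end up mapping a device path such as /apex/com.android.rkpd/javalib/service-rkp.jar to a host file. One way to get that behaviour is to try progressively shorter suffixes of the device path under each root. The sketch below is a guess that reproduces the Mmap segment lines shown here. It is not taken from core-parser's source, and resolve is a hypothetical name.

```python
import os


def resolve(device_path, sysroot):
    """Return the first existing host file for device_path under a ':'-separated sysroot."""
    parts = [p for p in device_path.split("/") if p]
    for root in sysroot.split(":"):
        # Try the full device path first, then drop leading components one at a time.
        for skip in range(len(parts)):
            candidate = os.path.join(root, *parts[skip:])
            if os.path.isfile(candidate):
                return candidate
    return None
```

A suffix search like this can pick the wrong file when two libraries share a name, such as the two libc++.so copies listed earlier, so longer suffixes are tried before shorter ones.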
Running bt again now prints the Java-side information. Some problems remain: too many memory segments are missing for a complete repair.
core-parser> bt
"HeapTaskDaemon" sysTid=2682 WaitingPerformingGc
| group="system" daemon=1 prio=5 target=0x6fab8c58 uncaught_exception=0x0
| tid=6 sCount=0 flags=0 obj=0xa204e30 self=0x560000538b408000 env=0x8f0000428b413f40
| stack=0x7b0a93e000-0x7b0a942000 stackSize=0x107730 handle=0x7b0aa45730
| mutexes=0x560000538b408798 held="abort lock"
x0 0x0000000000000000 x1 0x0000000000000a7a x2 0x0000000000000006 x3 0x6f00007b0aa44670
x4 0x0000007b0a939000 x5 0x0000007b0a939000 x6 0x0000007b0a939000 x7 0x0000000000000001
x8 0x00000000000000f0 x9 0x0000000000000000 x10 0xffffff80fffffbdf x11 0x0000007b76f4dfc4
x12 0x0200007200008390 x13 0x0000007b0aa44670 x14 0x0000000000000002 x15 0x02000072ffffffff
x16 0x0000007b76fd6240 x17 0x0000007b76fbadd0 x18 0x0000007b094f0000 x19 0xef00007b0aa44670
x20 0x2f00007b0aa44660 x21 0xaf00007b0aa44640 x22 0x6f00007b0aa44670 x23 0x0000000000000000
x24 0x0000000000000a7a x25 0x0200007300000000 x26 0x0040000e400010ef x27 0x0101010101010101
x28 0x00000007b0aa4467 fp 0x0000007b0aa44700
lr 0x0000007b76f2ebb4 sp 0x0000007b0aa44630 pc 0x0000007b76f2ebe0 pst 0x0000000000001000
Native: #00 0000007b76f2ebe0 abort+0x148
Native: #01 0000007b76f2ebb0 abort+0x118
Native: #02 0000007b5bb716d0 art::Runtime::Abort(char const*)+0x33c
Native: #03 0000007b5f4f155c android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*)+0xc4
Native: #04 0000007b5f4f0204 android::base::LogMessage::~LogMessage()+0x378
Native: #05 0000007b5b67c304 art::gc::collector::ConcurrentCopying::ProcessMarkStack()+0x1a8c
Native: #06 0000007b5b35c544 art::gc::collector::ConcurrentCopying::CopyingPhase()+0x9f8
Native: #07 0000007b5b6503dc art::gc::collector::ConcurrentCopying::RunPhases()+0xcf4
Native: #08 0000007b5b53d154 art::gc::collector::GarbageCollector::Run(art::gc::GcCause, bool)+0x224
Native: #09 0000007b5b4997f4 art::gc::Heap::CollectGarbageInternal(art::gc::collector::GcType, art::gc::GcCause, bool, unsigned int)+0x544
Native: #10 0000007b5b49918c art::gc::Heap::ConcurrentGC(art::Thread*, art::gc::GcCause, bool, unsigned int)+0xd0
Native: #11 0000007b5b49de18 art::gc::Heap::ConcurrentGCTask::Run(art::Thread*)+0xa4
Native: #12 0000007b5b336330 art::gc::TaskProcessor::RunAllTasks(art::Thread*)+0x408
Native: #13 0000000071045d00 art_jni_trampoline+0x70
Native: #14 6300007b0aa4549c
JavaKt: #0 0000000000000000 dalvik.system.VMRuntime.runHeapTasks
JavaKt: #1 0000007b5e99f13e java.lang.Daemons$HeapTaskDaemon.runInternal
JavaKt: #2 0000007b5e99e3de java.lang.Daemons$Daemon.run
JavaKt: #3 0000007b5a717b88 java.lang.Thread.run
core-parser> space
TYPE REGION ADDRESS NAME
5 [0xa080000, 0x4a080000) 0xfc00004d8b3e1000 main space (region space)
0 [0x6f7e4000, 0x6f923b08) 0xae0000468b3e0160 /system/framework/arm64/boot.art
0 [0x6fab8000, 0x6fad5178) 0xc40000468b3e4a40 /system/framework/arm64/boot-core-libart.art
0 [0x6fb00000, 0x6fb120a0) 0x430000468b3e4ba0 /system/framework/arm64/boot-okhttp.art
0 [0x6fb2c000, 0x6fb4a260) 0xc30000468b3e02c0 /system/framework/arm64/boot-bouncycastle.art
0 [0x6fb60000, 0x6fb60a18) 0x3a0000468b3e0420 /system/framework/arm64/boot-apache-xml.art
0 [0x6fb64000, 0x70153f20) 0xbc0000468b3e0580 /system/framework/arm64/boot-framework.art
core-parser> p 0xa080000
Size: 0x18
Padding: 0x7
Object Name: java.lang.StringBuilder
// extends java.lang.AbstractStringBuilder
[0x10] byte coder = 0x0
[0x0c] int count = 65
[0x08] byte[] value = 0xa080078
// extends java.lang.Object
[0x04] private transient int shadow$_monitor_ = 0
[0x00] private transient java.lang.Class shadow$_klass_ = 0x6f878668
core-parser>
Result of the Repair
With the thread stack segment written back and the Java-related segments loaded, save the core-parser environment once more with fake core -r.
core-parser> fake core -r
FakeCore: saved [coredump/core-HeapTaskDaemon-2653.fakecore]
In gdb, the newly saved Fakecore now produces a backtrace with symbols and source lines.
(gdb) bt
#0 abort () at bionic/libc/bionic/abort.cpp:49
#1 0x0000007b5bb716d4 in art::Runtime::Abort (msg=<optimized out>) at art/runtime/runtime.cc:766
#2 0x0000007b5f4f1560 in std::__1::__function::__value_func<void (char const*)>::operator()[abi:nn180000](char const*&&) const (this=0x7b76f4dfc4 <sigaddset64(sigset64_t*, int)>,
__args=@0x7b0aa44840: 0x6500003d8b865800 "Check failed: region_space_->IsLargeObject(to_ref) ") at prebuilts/clang/host/linux-x86/clang-r522817/include/c++/v1/__functional/function.h:425
#3 std::__1::function<void(char const*)>::operator() (this=0x7b76f4dfc4 <sigaddset64(sigset64_t*, int)>, __arg=0x6500003d8b865800 "Check failed: region_space_->IsLargeObject(to_ref) ")
at prebuilts/clang/host/linux-x86/clang-r522817/include/c++/v1/__functional/function.h:978
#4 android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::operator()(char const*) const (this=<optimized out>, abort_message=<optimized out>) at system/libbase/logging.cpp:425
#5 android::base::SetAborter(std::__1::function<void (char const*)>&&)::$_0::__invoke(char const*) (abort_message=<optimized out>) at system/libbase/logging.cpp:425
#6 0x0000007b5f4f0208 in android::base::LogMessage::~LogMessage (this=0xb600007b0aa44a40) at system/libbase/logging.cpp:513
#7 0x0000007b5b67c308 in art::gc::collector::ConcurrentCopying::ProcessMarkStackRef (this=0xaa00004f8b3e3000, to_ref=<optimized out>) at art/runtime/gc/collector/concurrent_copying.cc:2293
#8 art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0::operator()(art::mirror::Object*) const [clone .__uniq.219178288367021339109957061695789229157] (this=<optimized out>,
ref=<optimized out>) at art/runtime/gc/collector/concurrent_copying.cc:2164
#9 art::gc::collector::ConcurrentCopying::ProcessThreadLocalMarkStacks<art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0>(bool, art::Closure*, art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce()::$_0 const&) [clone .__uniq.219178288367021339109957061695789229157] (this=0xaa00004f8b3e3000, disable_weak_ref_access=false, checkpoint_callback=0x0, processor=...)
at art/runtime/gc/collector/concurrent_copying.cc:2244
#10 art::gc::collector::ConcurrentCopying::ProcessMarkStackOnce (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:2160
#11 art::gc::collector::ConcurrentCopying::ProcessMarkStack (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:2142
#12 0x0000007b5b35c548 in art::gc::collector::ConcurrentCopying::CopyingPhase (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:1640
#13 0x0000007b5b6503e0 in art::gc::collector::ConcurrentCopying::RunPhases (this=0xaa00004f8b3e3000) at art/runtime/gc/collector/concurrent_copying.cc:257
#14 0x0000007b5b53d158 in art::gc::collector::GarbageCollector::Run (this=0xaa00004f8b3e3000, gc_cause=<optimized out>, clear_soft_references=<optimized out>)
at art/runtime/gc/collector/garbage_collector.cc:220
#15 0x0000007b5b4997f8 in art::gc::Heap::CollectGarbageInternal (this=<optimized out>, gc_type=<optimized out>, gc_cause=<optimized out>, clear_soft_references=false, requested_gc_num=<optimized out>)
at art/runtime/gc/heap.cc:3024
#16 0x0000007b5b499190 in art::gc::Heap::ConcurrentGC (this=0x5000004f8b3e0800, self=<optimized out>, cause=art::gc::kGcCauseBackground, force_full=false, requested_gc_num=4314)
at art/runtime/gc/heap.cc:4341
#17 0x0000007b5b49de1c in art::gc::Heap::ConcurrentGCTask::Run (this=0x8400003c8b72e120, self=0x560000538b408000) at art/runtime/gc/heap.cc:4181
#18 0x0000007b5b336334 in art::gc::TaskProcessor::RunAllTasks (this=0x8e00003f8b3e9500, self=0x560000538b408000) at art/runtime/gc/task_processor.cc:158
#19 0x0000000071045d04 in dalvik::system::VMRuntime::nativeSetTargetHeapUtilization (this=...)
from symbols/system/framework/arm64/boot-core-libart.oat
#20 0x000000007107ac0c in java::lang::Daemons$HeapTaskDaemon::runInternal (this=...) at java/lang/Daemons.java:734
#21 0x00000000710534d0 in java::lang::Daemons$Daemon::run (this=...) at java/lang/Daemons.java:122
#22 0x0000000070e6ad8c in java::lang::Thread::run (this=...)
from symbols/system/framework/arm64/boot.oat
#23 0x0000007b5b2c3378 in art_quick_invoke_stub () at art/runtime/arch/arm64/quick_entrypoints_arm64.S:672
#24 0x0000007b5b26e24c in art::ArtMethod::Invoke (this=0x6fa16b60, self=0x560000538b408000, args=0xa300007b0aa45490, args_size=4, result=0xe300007b0aa454b0, shorty=0x6300007b0aa454a0 "V")
at art/runtime/art_method.cc:422
#25 0x0000007b5bb92408 in InvokeInstance (this=<optimized out>, self=<optimized out>, receiver=...) at art/runtime/art_method-inl.h:202
#26 0x0000007b5b3e0140 in art::Thread::CreateCallback (arg=0x560000538b408000) at art/runtime/thread.cc:681
#27 0x0000000000000000 in ?? ()
Closing
There are always more solutions than problems.