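This analysis is based on Android S (12).
Preface
I have already covered several native memory debugging tools: ASan, HWASan, and MTE. I had not planned to write about another one, but a few days ago "字节跳动终端技术" (ByteDance Terminal Technology) published an article, 《字节Android Native Crash治理之Memory Corruption工具原理与实践》 (on the principles and practice of ByteDance's Memory Corruption tool for Android native crashes), describing the value the tool delivers on the app side, and it is genuinely impressive. Its core mechanism, however, borrows from GWP-ASan. So that engineers at other app vendors have something to refer to, I think this article may be of some value. After all, good things that come from the open-source community should be available to everyone.
1. Introduction
Before MTE, all memory debugging tools shared one weakness: their performance and memory overhead made them hard to use in production. Hence the idea of sampling: out of every 1000 allocations, for example, we watch only one. With enough devices in the field, even a 1/1000 sampling rate catches most problems. That leaves the question of how to detect an illegal access. ASan and HWASan instrument the code before ldr/str instructions, but that requires recompiling, which makes it unsuitable for production. The alternative is to control page permissions: if a page is made neither readable nor writable, every later read or write to it raises SIGSEGV. This approach has its own drawback: the smallest unit it can protect is a page, while malloc requests are usually far smaller than a page, so a lot of memory is wasted. Fortunately, sampled allocations make up only a small share of all allocations, so even at one page each the added overhead is small.
Google built GWP-ASan on this idea. The name is a recursive acronym: "GWP-ASan Will Provide Allocation SANity". Why the leading G? Because Google developed it. The tool officially landed in Android R (11), after it had been thoroughly validated in the Chromium project.
GWP-ASan also borrows from earlier tools: "GWP-ASan is based on the classic Electric Fence Malloc Debugger, with a key adaptation."
2. Basic Principles
On Android, a call to malloc goes through a dispatch layer in the bionic library; the choice of allocator (jemalloc or scudo), for example, is made there. Sampling can therefore be done at dispatch time: on a sampling hit the allocation follows the GWP-ASan path, and on a miss it goes to the default allocator.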
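The core of GWP-ASan allocation is finding free space in the Guarded Pool Memory.
Memory in the Guarded Pool is laid out contiguously in units of pages. Guard Pages (pages that can be neither read nor written, used to detect overflow-type errors) alternate with Allocable Pages (pages that can serve allocations, called slots in the GWP-ASan implementation), and both the first and the last page are Guard Pages. As a result, every Allocable Page has a Guard Page on each side.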
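The alternating layout can be expressed as plain address arithmetic. The following is a minimal sketch, assuming one page per slot; `PoolLayout` and its members are hypothetical names for illustration, not the types used in the GWP-ASan source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical model of the pool: Guard, Slot, Guard, Slot, ..., Guard.
struct PoolLayout {
  uintptr_t base;    // address of the first Guard Page
  size_t page_size;  // for example 4096
  size_t num_slots;  // number of Allocable Pages

  // (num_slots + 1) Guard Pages plus num_slots slots.
  size_t total_bytes() const { return page_size * (2 * num_slots + 1); }

  // Slot i is preceded by (i + 1) Guard Pages and i earlier slots.
  uintptr_t slot_to_addr(size_t i) const {
    assert(i < num_slots);
    return base + page_size * (2 * i + 1);
  }

  // True if p lies anywhere inside the pool.
  bool contains(uintptr_t p) const {
    return p >= base && p < base + total_bytes();
  }

  // Pages at an even page offset from the base are Guard Pages.
  bool is_guard_page(uintptr_t p) const {
    assert(contains(p));
    return ((p - base) / page_size) % 2 == 0;
  }
};
```

With 32 slots this model covers 65 pages, and a simple range check such as `contains` is enough to tell whether a faulting address belongs to the pool.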
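A malloc request is usually smaller than a page, which raises the question of whether the allocation is left-aligned or right-aligned within its Allocable Page. Left alignment catches underflow, because an underflow touches the Guard Page on the left; right alignment conversely catches overflow. GWP-ASan picks the alignment at random, so with enough devices both the underflow and the overflow of the same object can be detected.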
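The placement rule can be sketched as follows. This is an illustration under assumptions: `place_in_slot` is a hypothetical helper, the alignment must be a power of two, and the random choice is passed in as a boolean so that the result is reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round addr up or down to a multiple of align (a power of two).
inline uintptr_t align_up(uintptr_t addr, size_t align) {
  return (addr + align - 1) & ~static_cast<uintptr_t>(align - 1);
}
inline uintptr_t align_down(uintptr_t addr, size_t align) {
  return addr & ~static_cast<uintptr_t>(align - 1);
}

// Place `size` bytes inside the slot [slot_start, slot_start + slot_size).
// right_align == false: start at the left edge (catches underflow).
// right_align == true:  end as close to the right edge as alignment allows
//                       (catches overflow).
uintptr_t place_in_slot(uintptr_t slot_start, size_t slot_size, size_t size,
                        size_t align, bool right_align) {
  assert(size <= slot_size);
  uintptr_t slot_end = slot_start + slot_size;
  uintptr_t p = right_align ? align_down(slot_end - size, align)
                            : align_up(slot_start, align);
  assert(p >= slot_start && p + size <= slot_end);
  return p;
}
```

Note that alignment can leave a small gap between the end of a right-aligned allocation and the Guard Page: a 66-byte request with 16-byte alignment in a 4096-byte slot starts at offset 0xfb0 and ends 14 bytes short of the page boundary, so an overflow of only a few bytes stays inside the slot and goes unnoticed.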
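Next comes Use-After-Free (UAF) detection.
When memory allocated by GWP-ASan is freed, the page it lives in is marked inaccessible (through the mprotect system call), so any later access to that memory is detected immediately and classified as a UAF.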
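The effect of revoking access on free can be observed with a small experiment. The sketch below assumes Linux and uses hypothetical helper names; it forks a child to perform the access so that the parent can observe the resulting SIGSEGV without crashing itself. This is only a way to make the fault visible and has nothing to do with how GWP-ASan reports errors.

```cpp
#include <cassert>
#include <csignal>
#include <cstddef>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

// Returns true if writing to p kills a forked child with SIGSEGV.
bool write_faults(volatile char* p) {
  pid_t pid = fork();
  assert(pid >= 0);
  if (pid == 0) {  // child: try the access, exit cleanly if it succeeds
    *p = 1;
    _exit(0);
  }
  int status = 0;
  pid_t waited = waitpid(pid, &status, 0);
  assert(waited == pid);
  (void)waited;
  return WIFSIGNALED(status) && WTERMSIG(status) == SIGSEGV;
}

// Map one anonymous read-write page that stands in for a slot.
char* map_slot(size_t page_size) {
  void* p = mmap(nullptr, page_size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  assert(p != MAP_FAILED);
  return static_cast<char*>(p);
}

// "Free" the slot the way the text describes: keep the mapping but
// revoke all access, so a stale pointer faults instead of silently working.
void release_slot(char* slot, size_t page_size) {
  int rc = mprotect(slot, page_size, PROT_NONE);
  assert(rc == 0);
  (void)rc;
}
```

Before `release_slot` a write through the pointer succeeds; afterwards the same write terminates the child with SIGSEGV, which is exactly the signal the tombstone path later inspects.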
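From a problem-solving point of view, however, detection alone is not enough. We also need the call stack that allocated the memory (and the call stack that freed it, if any). Only with this information can we tell where an overflow comes from and where a use after free originates. We therefore need another region of memory that stores the allocation and deallocation call stacks of every Allocable Page (slot); GWP-ASan calls it Metadata.
We also need to track which Allocable Pages are in use and which are free, which is done with a FreeSlots array. Because the internal structure of the Guarded Pool Memory is fixed, each Allocable Page can be given an index. The array then only needs to hold the indices of all free Allocable Pages: when a new allocation arrives, an index is picked from the array to house it. In total, GWP-ASan needs three regions to support allocation: the Guarded Pool Memory, the Metadata, and the FreeSlots array.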
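Once the information is collected, it still has to be reported when a problem occurs. A problem raises SIGSEGV, so the natural first thought is to hook the signal handler. On Android the default handling of SIGSEGV is to generate a tombstone file (starting with Android 12, an app can retrieve the tombstone of its own process through ApplicationExitInfo), so GWP-ASan adds logic to the tombstone generation path: when the fault address falls inside the Guarded Pool Memory, the corresponding Metadata is written to the tombstone.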
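3. Implementation
With the basic principles covered, we still need to look at the implementation in the source code, because only the details give real understanding.
3.1 How malloc Is Dispatched
A call to malloc in the bionic library actually runs the following code.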
extern "C" void* malloc(size_t bytes) {
  auto dispatch_table = GetDispatchTable();
  void *result;
  if (__predict_false(dispatch_table != nullptr)) {
    result = dispatch_table->malloc(bytes);  // call the implementation through the dispatch table
  } else {
    result = Malloc(malloc)(bytes);
  }
  if (__predict_false(result == nullptr)) {
    warning_log("malloc(%zu) failed: returning null pointer", bytes);
    return nullptr;
  }
  return MaybeTagPointer(result);  // put the 0xB4 tag into the top byte of the address
}
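Two points are worth noting:
- If a dispatch table exists, the actual implementation is called through it; otherwise the default path is taken.
- Every allocation on the native heap gets a tag in the top byte of its address. On Android 11 and 12 the default is the fixed tag 0xB4. The main purpose is to pave the way for MTE by exposing, ahead of time, compatibility problems that a tag in the high bits may cause.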
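The tagging itself is simple bit manipulation on the top byte. Below is a minimal sketch with hypothetical helper names (the real logic lives in bionic's MaybeTagPointer); it assumes 64-bit addresses and uses the 0xB4 value quoted above.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kTagShift = 56;                   // the tag occupies bits 56..63
constexpr uint64_t kTagMask = 0xffULL << kTagShift;
constexpr uint64_t kFixedTag = 0xB4;                 // fixed tag quoted in the text

// Put the fixed tag into the top byte of an address.
constexpr uint64_t tag_pointer(uint64_t addr) {
  return (addr & ~kTagMask) | (kFixedTag << kTagShift);
}

// Strip the top byte to recover the plain address.
constexpr uint64_t untag_pointer(uint64_t addr) { return addr & ~kTagMask; }

// Extract the tag byte.
constexpr uint8_t pointer_tag(uint64_t addr) {
  return static_cast<uint8_t>(addr >> kTagShift);
}
```

This matches what the tombstone in section 4 shows: register x0 holds b4000074118bdfb0, while the reported fault address is the untagged 0x74118bdfb0.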
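This raises a question: when is the dispatch table initialized?
When a library is loaded dynamically, all functions in it that are marked "constructor" run in turn, such as the function below.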
__attribute__((constructor(1))) static void __libc_preinit() {
// The linker has initialized its copy of the global stack_chk_guard, and filled in the main
// thread's TLS slot with that value. Initialize the local global stack guard with its value.
__stack_chk_guard = reinterpret_cast<uintptr_t>(__get_tls()[TLS_SLOT_STACK_GUARD]);
__libc_preinit_impl();
}
This function eventually calls MaybeInitGwpAsan.
bool MaybeInitGwpAsan(libc_globals* globals, bool force_init) {
  ...
  if (!force_init && !ShouldGwpAsanSampleProcess()) {  // should this process use GWP-ASan dispatch?
    return false;
  }
  ...
  if (GetDispatchTable() == nullptr) {
    atomic_store(&globals->current_dispatch_table, &gwp_asan_dispatch);  // switch the dispatch table
  }
  ...
  gwp_asan_initialize(NativeAllocatorDispatch(), nullptr, nullptr);  // initialize GWP-ASan
  return true;
}
This also breaks down into three steps.
- If the process has not been forced to enable GWP-ASan, ShouldGwpAsanSampleProcess decides whether to enable it. Internally, only 1 out of every 128 launched processes enables GWP-ASan.
static constexpr uint8_t kProcessSampleRate = 128;  // 1 in 128 processes enables GWP-ASan
bool ShouldGwpAsanSampleProcess() {
  uint8_t random_number;
  __libc_safe_arc4random_buf(&random_number, sizeof(random_number));
  return random_number % kProcessSampleRate == 0;
}
- Once the first step decides to enable GWP-ASan, the dispatch table is switched to gwp_asan_dispatch. From then on, a call to malloc actually calls gwp_asan_malloc.
static const MallocDispatch gwp_asan_dispatch __attribute__((unused)) = {
  gwp_asan_calloc,
  gwp_asan_free,
  Malloc(mallinfo),
  gwp_asan_malloc,
  gwp_asan_malloc_usable_size,
  Malloc(memalign),
  Malloc(posix_memalign),
#if defined(HAVE_DEPRECATED_MALLOC_FUNCS)
  Malloc(pvalloc),
#endif
  gwp_asan_realloc,
#if defined(HAVE_DEPRECATED_MALLOC_FUNCS)
  Malloc(valloc),
#endif
  gwp_asan_malloc_iterate,
  gwp_asan_malloc_disable,
  gwp_asan_malloc_enable,
  Malloc(mallopt),
  Malloc(aligned_alloc),
  Malloc(malloc_info),
};
- gwp_asan_initialize is called to initialize GWP-ASan. The NativeAllocatorDispatch() passed in is the platform's default malloc implementation; on Android 11, for example, the default implementation is scudo_malloc.
Once the dispatch table is set, every later call to malloc enters gwp_asan_malloc.
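The dispatch mechanism boils down to a table of function pointers that can be swapped atomically. The following is a minimal sketch with hypothetical names (`ToyDispatch`, `toy_malloc`, and so on); bionic's MallocDispatch has many more entries, and the hook here merely counts calls instead of sampling.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <cstdlib>

// Hypothetical two-entry dispatch table.
struct ToyDispatch {
  void* (*malloc_fn)(size_t);
  void (*free_fn)(void*);
};

// nullptr means "no table installed, use the default allocator".
static std::atomic<const ToyDispatch*> g_table{nullptr};
static int g_hooked_mallocs = 0;

// A hook that counts calls and then forwards to the default allocator.
static void* counting_malloc(size_t bytes) {
  ++g_hooked_mallocs;
  return std::malloc(bytes);
}
static void counting_free(void* p) { std::free(p); }

static const ToyDispatch kCountingDispatch = {counting_malloc, counting_free};

// Install (or remove, with nullptr) a dispatch table.
void toy_install(const ToyDispatch* table) {
  g_table.store(table, std::memory_order_release);
}

// Entry points: go through the table if one is installed.
void* toy_malloc(size_t bytes) {
  const ToyDispatch* t = g_table.load(std::memory_order_acquire);
  return t != nullptr ? t->malloc_fn(bytes) : std::malloc(bytes);
}
void toy_free(void* p) {
  const ToyDispatch* t = g_table.load(std::memory_order_acquire);
  if (t != nullptr) {
    t->free_fn(p);
  } else {
    std::free(p);
  }
}
```

Because callers only ever see the entry points, swapping the table changes the behavior of every later allocation without recompiling any caller, which is what makes this approach usable in production.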
3.2 Initialization
bool gwp_asan_initialize(const MallocDispatch* dispatch, bool*, const char*) {
prev_dispatch = dispatch;
Options Opts;
Opts.Enabled = true;
Opts.MaxSimultaneousAllocations = 32;
Opts.SampleRate = 2500;
Opts.InstallSignalHandlers = false;
Opts.InstallForkHandlers = true;
Opts.Backtrace = android_unsafe_frame_pointer_chase;
GuardedAlloc.init(Opts);
// TODO(b/149790891): The log line below causes ART tests to fail as they're
// not expecting any output. Disable the output for now.
// info_log("GWP-ASan has been enabled.");
__libc_shared_globals()->gwp_asan_state = GuardedAlloc.getAllocatorState();
__libc_shared_globals()->gwp_asan_metadata = GuardedAlloc.getMetadataRegion();
return true;
}
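gwp_asan_initialize mainly does two things:
- It sets the configuration options and calls GuardedAlloc.init to initialize the GWP-ASan allocator.
- It stores two pointers from the allocator in global variables, so that they are easy to reach while a tombstone is generated; important debugging information can be obtained through them.
Three configuration options matter most:
- MaxSimultaneousAllocations: the number of slots (pages) in the Guarded Pool Memory that are available for allocation.
- SampleRate: the sampling rate; on average 1 in every 2500 allocations is sampled and served from the Guarded Pool Memory.
- Backtrace: the function used to capture call stacks; here it is a frame-pointer (FP) based unwinder. In general it only works on 64-bit, because only there is the FP saved on the stack by default.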
void GuardedPoolAllocator::init(const options::Options &Opts) {
...
size_t PoolBytesRequired =
PageSize * (1 + State.MaxSimultaneousAllocations) +
State.MaxSimultaneousAllocations * State.maximumAllocationSize();
void *GuardedPoolMemory = reserveGuardedPool(PoolBytesRequired);
size_t BytesRequired =
roundUpTo(State.MaxSimultaneousAllocations * sizeof(*Metadata), PageSize);
Metadata = reinterpret_cast<AllocationMetadata *>(
map(BytesRequired, kGwpAsanMetadataName));
// Allocate memory and set up the free pages queue.
BytesRequired = roundUpTo(
State.MaxSimultaneousAllocations * sizeof(*FreeSlots), PageSize);
FreeSlots =
reinterpret_cast<size_t *>(map(BytesRequired, kGwpAsanFreeSlotsName));
// Multiply the sample rate by 2 to give a good, fast approximation for (1 /
// SampleRate) chance of sampling.
if (Opts.SampleRate != 1)
AdjustedSampleRatePlusOne = static_cast<uint32_t>(Opts.SampleRate) * 2 + 1;
else
AdjustedSampleRatePlusOne = 2;
initPRNG();
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
State.GuardedPagePool = reinterpret_cast<uintptr_t>(GuardedPoolMemory);
State.GuardedPagePoolEnd =
reinterpret_cast<uintptr_t>(GuardedPoolMemory) + PoolBytesRequired;
...
}
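The most important part of GuardedPoolAllocator::init is initializing the three regions. With MaxSimultaneousAllocations set to 32, the Guarded Pool Memory needs 65 pages, Metadata needs 5 pages, and FreeSlots needs 1 page, which adds up to 284 KiB with 4 KiB pages (1 KiB = 1024 bytes). All of them are allocated with mmap. The difference is that every page of the Guarded Pool Memory is inaccessible after initialization and has its protection changed only when it is actually allocated, while Metadata and FreeSlots are readable and writable right after initialization.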
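The page counts can be checked with a few lines of arithmetic. This sketch uses hypothetical helper names and assumes 4 KiB pages with one page per slot; the Metadata page count is taken from the text rather than derived, because it depends on the size of the metadata structure, which is not shown here.

```cpp
#include <cassert>
#include <cstddef>

// Round n up to the next multiple of boundary.
constexpr size_t round_up_to(size_t n, size_t boundary) {
  return (n + boundary - 1) / boundary * boundary;
}

// (slots + 1) Guard Pages plus one page per slot, mirroring
// PoolBytesRequired when maximumAllocationSize() equals the page size.
constexpr size_t pool_pages(size_t slots) { return (1 + slots) + slots; }

// FreeSlots stores one size_t index per slot, rounded up to whole pages.
constexpr size_t free_slots_pages(size_t slots, size_t page_size) {
  return round_up_to(slots * sizeof(size_t), page_size) / page_size;
}

// Convert a page count to KiB.
constexpr size_t total_kib(size_t pages, size_t page_size) {
  return pages * page_size / 1024;
}
```

For 32 slots this gives 65 pool pages and 1 FreeSlots page; adding the 5 Metadata pages yields 71 pages, or 284 KiB.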
3.3 Sampling
Sampling happens in gwp_asan_malloc.
void* gwp_asan_malloc(size_t bytes) {
if (__predict_false(GuardedAlloc.shouldSample())) {
if (void* result = GuardedAlloc.allocate(bytes)) {
return result;
}
}
return prev_dispatch->malloc(bytes);
}
When GuardedAlloc.shouldSample() returns true, the request is served by GuardedAlloc.allocate; otherwise it goes to prev_dispatch->malloc, that is, the default malloc implementation.
// Return whether the allocation should be randomly chosen for sampling.
GWP_ASAN_ALWAYS_INLINE bool shouldSample() {
// NextSampleCounter == 0 means we "should regenerate the counter".
// == 1 means we "should sample this allocation".
// AdjustedSampleRatePlusOne is designed to intentionally underflow. This
// class must be valid when zero-initialised, and we wish to sample as
// infrequently as possible when this is the case, hence we underflow to
// UINT32_MAX.
if (GWP_ASAN_UNLIKELY(getThreadLocals()->NextSampleCounter == 0))
getThreadLocals()->NextSampleCounter =
((getRandomUnsigned32() % (AdjustedSampleRatePlusOne - 1)) + 1) &
ThreadLocalPackedVariables::NextSampleCounterMask;
return GWP_ASAN_UNLIKELY(--getThreadLocals()->NextSampleCounter == 0);
}
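shouldSample generates a random NextSampleCounter for each thread and decrements it on every malloc call. When NextSampleCounter is 1 (0 after the decrement), shouldSample returns true and the allocation is served from the Guarded Pool Memory; in every other case it returns false. When NextSampleCounter is 0, a new random value is generated. This approach has two benefits:
- No random number has to be generated for every allocation, which improves efficiency.
- Each thread has its own counter, which avoids contention.
The default sampling rate on Android is 1/2500, so AdjustedSampleRatePlusOne is 5001. The factor of 2 is there because the counter is drawn uniformly from 1 to 2 × SampleRate, so its mean, which is the average gap between two sampled allocations, is about SampleRate.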
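The claim about the mean can be checked with a small simulation. The sketch below is a single-threaded model with hypothetical names: it replaces GWP-ASan's own PRNG with std::mt19937 and ignores the NextSampleCounterMask, so it illustrates the counting scheme rather than reproducing the real implementation.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Single-threaded model of the countdown sampler.
class Sampler {
 public:
  Sampler(uint32_t sample_rate, uint32_t seed)
      : adjusted_plus_one_(sample_rate == 1 ? 2 : sample_rate * 2 + 1),
        rng_(seed) {}

  bool should_sample() {
    // Counter exhausted: draw a new gap uniformly from [1, 2 * sample_rate].
    if (counter_ == 0)
      counter_ = static_cast<uint32_t>(rng_() % (adjusted_plus_one_ - 1)) + 1;
    // Sample exactly when the countdown reaches zero.
    return --counter_ == 0;
  }

 private:
  uint32_t adjusted_plus_one_;
  std::mt19937 rng_;
  uint32_t counter_ = 0;
};

// Count how many of `allocations` calls would be sampled.
uint64_t count_samples(uint32_t sample_rate, uint64_t allocations,
                       uint32_t seed) {
  Sampler sampler(sample_rate, seed);
  uint64_t hits = 0;
  for (uint64_t i = 0; i < allocations; ++i)
    hits += sampler.should_sample() ? 1 : 0;
  return hits;
}
```

With a rate of 2500, ten million simulated allocations should produce roughly 4000 samples, in line with the average gap of about 2500. With a rate of 1, every allocation is sampled.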
AdjustedSampleRatePlusOne = static_cast<uint32_t>(Opts.SampleRate) * 2 + 1;
3.4 Allocation
void *GuardedPoolAllocator::allocate(size_t Size, size_t Alignment) {
...
size_t Index;
{
ScopedLock L(PoolMutex);
Index = reserveSlot();
}
if (Index == kInvalidSlotID)
return nullptr;
uintptr_t SlotStart = State.slotToAddr(Index);
AllocationMetadata *Meta = addrToMetadata(SlotStart);
uintptr_t SlotEnd = State.slotToAddr(Index) + State.maximumAllocationSize();
uintptr_t UserPtr;
// Randomly choose whether to left-align or right-align the allocation, and
// then apply the necessary adjustments to get an aligned pointer.
if (getRandomUnsigned32() % 2 == 0)
UserPtr = alignUp(SlotStart, Alignment);
else
UserPtr = alignDown(SlotEnd - Size, Alignment);
assert(UserPtr >= SlotStart);
assert(UserPtr + Size <= SlotEnd);
...
allocateInGuardedPool(
reinterpret_cast<void *>(getPageAddr(UserPtr, PageSize)),
roundUpTo(Size, PageSize));
Meta->RecordAllocation(UserPtr, Size);
{
ScopedLock UL(BacktraceMutex);
Meta->AllocationTrace.RecordBacktrace(Backtrace);
}
return reinterpret_cast<void *>(UserPtr);
}
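Allocation consists of the following steps:
- Find a free slot and choose the alignment.
- Change the access permissions of the slot with mprotect.
- Record the call stack of this allocation in the corresponding Metadata structure.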
size_t GuardedPoolAllocator::reserveSlot() {
// Avoid potential reuse of a slot before we have made at least a single
// allocation in each slot. Helps with our use-after-free detection.
if (NumSampledAllocations < State.MaxSimultaneousAllocations)
return NumSampledAllocations++;
if (FreeSlotsLength == 0)
return kInvalidSlotID;
size_t ReservedIndex = getRandomUnsigned32() % FreeSlotsLength;
size_t SlotIndex = FreeSlots[ReservedIndex];
FreeSlots[ReservedIndex] = FreeSlots[--FreeSlotsLength];
return SlotIndex;
}
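The key to finding a free slot is obtaining its index, as reserveSlot shows. FreeSlots is used for this: a free index is picked at random from FreeSlots, and FreeSlotsLength is decremented. The first 32 allocations, however, do not search for a free slot; they simply return slot indices in order. The goal is to leave freed slots untouched for as long as possible (as long as a never-used slot exists, a previously freed one is not reused), which raises the chance of catching a UAF.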
void GuardedPoolAllocator::allocateInGuardedPool(void *Ptr, size_t Size) const {
assert((reinterpret_cast<uintptr_t>(Ptr) % State.PageSize) == 0);
assert((Size % State.PageSize) == 0);
Check(mprotect(Ptr, Size, PROT_READ | PROT_WRITE) == 0,
"Failed to allocate in guarded pool allocator memory");
MaybeSetMappingName(Ptr, Size, kGwpAsanAliveSlotName);
}
allocateInGuardedPool changes the access permissions of the allocated slot using the mprotect system call. After the change, the corresponding VMA is also renamed to "GWP-ASan Alive Slot".
// Name of actively-occupied slot mappings.
static constexpr const char *kGwpAsanAliveSlotName = "GWP-ASan Alive Slot";
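The captured call stack is essentially an array of PC values, and AllocationTrace.RecordBacktrace compresses that array to use memory more efficiently.
At this point the memory has been allocated and all the debugging information has been recorded.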
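The text does not say how the array is compressed. One scheme that suits this kind of data stores the difference between consecutive PCs, zigzag-encoded and written as variable-length integers, since neighboring frames often lie close together. The sketch below illustrates that idea under this assumption; the function names are hypothetical and the code has not been checked against the GWP-ASan source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Map signed deltas to small unsigned values: 0, -1, 1, -2 -> 0, 1, 2, 3.
inline uint64_t zigzag_encode(int64_t v) {
  return (static_cast<uint64_t>(v) << 1) ^ static_cast<uint64_t>(v >> 63);
}
inline int64_t zigzag_decode(uint64_t v) {
  return static_cast<int64_t>(v >> 1) ^ -static_cast<int64_t>(v & 1);
}

// Pack PCs as zigzag-encoded deltas, 7 payload bits per byte.
std::vector<uint8_t> pack_trace(const std::vector<uint64_t>& pcs) {
  std::vector<uint8_t> out;
  uint64_t prev = 0;
  for (uint64_t pc : pcs) {
    uint64_t z = zigzag_encode(static_cast<int64_t>(pc - prev));
    prev = pc;
    while (z >= 0x80) {  // high bit set means "more bytes follow"
      out.push_back(static_cast<uint8_t>(z | 0x80));
      z >>= 7;
    }
    out.push_back(static_cast<uint8_t>(z));
  }
  return out;
}

// Reverse of pack_trace.
std::vector<uint64_t> unpack_trace(const std::vector<uint8_t>& bytes) {
  std::vector<uint64_t> pcs;
  uint64_t prev = 0, z = 0;
  unsigned shift = 0;
  for (uint8_t b : bytes) {
    z |= static_cast<uint64_t>(b & 0x7f) << shift;
    if (b & 0x80) {  // continuation byte: keep accumulating
      shift += 7;
      continue;
    }
    prev += static_cast<uint64_t>(zigzag_decode(z));
    pcs.push_back(prev);
    z = 0;
    shift = 0;
  }
  return pcs;
}
```

For three PCs within the same few libraries, the packed form takes well under the 24 bytes needed for three raw 64-bit values, which matters when a trace is stored for every slot.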
3.5 Deallocation
void gwp_asan_free(void* mem) {
if (__predict_false(GuardedAlloc.pointerIsMine(mem))) {
GuardedAlloc.deallocate(mem);
return;
}
prev_dispatch->free(mem);
}
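Every native free goes through gwp_asan_free. If the address falls within the Guarded Pool Memory range, the memory was allocated by GWP-ASan; otherwise it belongs to the system default allocator. Memory allocated by GWP-ASan is released through GuardedAlloc.deallocate.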
void GuardedPoolAllocator::deallocate(void *Ptr) {
uintptr_t UPtr = reinterpret_cast<uintptr_t>(Ptr);
size_t Slot = State.getNearestSlot(UPtr);
uintptr_t SlotStart = State.slotToAddr(Slot);
AllocationMetadata *Meta = addrToMetadata(UPtr);
if (Meta->Addr != UPtr) {
// If multiple errors occur at the same time, use the first one.
ScopedLock L(PoolMutex);
trapOnAddress(UPtr, Error::INVALID_FREE);
}
// Intentionally scope the mutex here, so that other threads can access the
// pool during the expensive markInaccessible() call.
{
ScopedLock L(PoolMutex);
if (Meta->IsDeallocated) {
trapOnAddress(UPtr, Error::DOUBLE_FREE);
}
// Ensure that the deallocation is recorded before marking the page as
// inaccessible. Otherwise, a racy use-after-free will have inconsistent
// metadata.
Meta->RecordDeallocation();
...
}
deallocateInGuardedPool(reinterpret_cast<void *>(SlotStart),
State.maximumAllocationSize());
// And finally, lock again to release the slot back into the pool.
ScopedLock L(PoolMutex);
freeSlot(Slot);
}
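Besides releasing memory, GuardedAlloc.deallocate checks for two kinds of errors. One is "invalid free": the address being freed does not match the address that was allocated. The other is "double free": the memory is freed twice. When either is detected, GuardedAlloc.deallocate calls trapOnAddress to trigger SIGSEGV. The implementation is simple: it writes to the first address of the Guarded Pool Memory, which lies in a Guard Page, so a fault is guaranteed.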
void GuardedPoolAllocator::trapOnAddress(uintptr_t Address, Error E) {
State.FailureType = E;
State.FailureAddress = Address;
// Raise a SEGV by touching first guard page.
volatile char *p = reinterpret_cast<char *>(State.GuardedPagePool);
*p = 0;
__builtin_unreachable();
}
In addition, RecordDeallocation records the call stack at free time, deallocateInGuardedPool makes the page inaccessible again, and freeSlot writes the slot index back into FreeSlots and increments FreeSlotsLength.
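The interplay between reserving and returning slots can be modeled with a small class. This is a sketch with hypothetical names: it uses a std::vector where the real code uses a raw array plus FreeSlotsLength, and the random number is passed in so that the behavior is reproducible.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

constexpr size_t kInvalidSlot = static_cast<size_t>(-1);

// Model of the slot bookkeeping: fresh slots first, then random reuse.
class SlotPool {
 public:
  explicit SlotPool(size_t max_slots) : max_slots_(max_slots) {
    free_slots_.reserve(max_slots);
  }

  // `random` stands in for the allocator's random number.
  size_t reserve(size_t random) {
    // First pass: hand out never-used slots in order, so that freed
    // slots stay inaccessible for as long as possible.
    if (num_sampled_ < max_slots_) return num_sampled_++;
    if (free_slots_.empty()) return kInvalidSlot;
    // Pick a random free slot and fill the hole with the last entry.
    size_t pick = random % free_slots_.size();
    size_t slot = free_slots_[pick];
    free_slots_[pick] = free_slots_.back();
    free_slots_.pop_back();
    return slot;
  }

  // Return a slot to the free list.
  void release(size_t slot) {
    assert(slot < max_slots_ && free_slots_.size() < max_slots_);
    free_slots_.push_back(slot);
  }

 private:
  size_t max_slots_;
  size_t num_sampled_ = 0;
  std::vector<size_t> free_slots_;
};
```

When `reserve` returns `kInvalidSlot`, the caller falls back to the default allocator, which is what the null check in gwp_asan_malloc handles.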
3.6 Log Collection
When a process receives SIGSEGV, debuggerd_signal_handler takes over. Generating the tombstone involves calling dump_thread to collect per-thread information.
static bool dump_thread(log_t* log, unwindstack::Unwinder* unwinder, const ThreadInfo& thread_info,
const ProcessInfo& process_info, bool primary_thread) {
...
std::unique_ptr<GwpAsanCrashData> gwp_asan_crash_data;
std::unique_ptr<ScudoCrashData> scudo_crash_data;
if (primary_thread) {
gwp_asan_crash_data = std::make_unique<GwpAsanCrashData>(unwinder->GetProcessMemory().get(),
process_info, thread_info);
scudo_crash_data =
std::make_unique<ScudoCrashData>(unwinder->GetProcessMemory().get(), process_info);
}
if (primary_thread && gwp_asan_crash_data->CrashIsMine()) {
gwp_asan_crash_data->DumpCause(log);
}
...
if (primary_thread) {
if (gwp_asan_crash_data->HasDeallocationTrace()) {
gwp_asan_crash_data->DumpDeallocationTrace(log, unwinder);
}
if (gwp_asan_crash_data->HasAllocationTrace()) {
gwp_asan_crash_data->DumpAllocationTrace(log, unwinder);
}
...
}
...
return true;
}
If the thread is the one that crashed, GWP-ASan debugging information is collected as well. While gwp_asan_crash_data is constructed, __gwp_asan_diagnose_error is called to classify the problem.
gwp_asan::Error
__gwp_asan_diagnose_error(const gwp_asan::AllocatorState *State,
const gwp_asan::AllocationMetadata *Metadata,
uintptr_t ErrorPtr) {
if (!__gwp_asan_error_is_mine(State, ErrorPtr))
return Error::UNKNOWN;
if (State->FailureType != Error::UNKNOWN)
return State->FailureType;
// Let's try and figure out what the source of this error is.
if (State->isGuardPage(ErrorPtr)) {
size_t Slot = State->getNearestSlot(ErrorPtr);
const AllocationMetadata *SlotMeta =
addrToMetadata(State, Metadata, State->slotToAddr(Slot));
// Ensure that this slot was allocated once upon a time.
if (!SlotMeta->Addr)
return Error::UNKNOWN;
if (SlotMeta->Addr < ErrorPtr)
return Error::BUFFER_OVERFLOW;
return Error::BUFFER_UNDERFLOW;
}
// Access wasn't a guard page, check for use-after-free.
const AllocationMetadata *SlotMeta =
addrToMetadata(State, Metadata, ErrorPtr);
if (SlotMeta->IsDeallocated) {
return Error::USE_AFTER_FREE;
}
// If we have reached here, the error is still unknown.
return Error::UNKNOWN;
}
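"Invalid free" and "double free" have already been classified during deallocation, so this function mainly classifies underflow, overflow, and use after free. If the accessed address lies in a Guard Page, the problem is an overflow or underflow; if it lies in a slot and that slot has been freed, the problem is a UAF.
Finally, GwpAsanCrashData::DumpCause, GwpAsanCrashData::DumpDeallocationTrace, and GwpAsanCrashData::DumpAllocationTrace are called to print the problem description, the deallocation call stack, and the allocation call stack.
4. Usage
GWP-ASan has different default states in different processes:
- Every native process has a 1/128 chance of enabling GWP-ASan.
- The zygote process does not enable GWP-ASan.
- Every system app process (forked from zygote) has a 1/128 chance of enabling GWP-ASan.
- Ordinary app processes (forked from zygote) do not enable GWP-ASan by default.
Although ordinary apps do not enable GWP-ASan by default, it can be turned on in the manifest. The option can be enabled for all processes of an app or for a single process; see the official documentation for details.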
<application android:gwpAsanMode="always">
...
</application>
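In addition, in developer mode GWP-ASan can be turned on for a given app through the following option.
System > Developer options > App Compatibility Changes > [Selected App] > GWP_ASAN (disabled by default; can be switched on)
Below is a tombstone produced when a problem was detected with GWP-ASan enabled, for reference.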
pid: 1087, tid: 2343, name: xxxxxx >>> system_server <<<
uid: 1000
signal 11 (SIGSEGV), code 2 (SEGV_ACCERR), fault addr 0x74118bdfb0
Cause: [GWP-ASan]: Use After Free, 0 bytes into a 66-byte allocation at 0x74118bdfb0
x0 b4000074118bdfb0 x1 b4000074118bdfb0 x2 00000074378ba310 x3 000000730976ca88
x4 0000000000000fb0 x5 b4000074378c6a98 x6 000000000000043f x7 000000000000043f
x8 000000775330cc0b x9 5f2a0abf8db399f1 x10 0000000000000001 x11 0000000000000000
x12 0000000000000010 x13 00000000bba7c281 x14 0000000070742a85 x15 0000000000000000
x16 0000007753391a30 x17 000000774eecfb40 x18 00000072ff3d0000 x19 b4000074118bdfb0
x20 000000730976cb98 x21 b4000074118bdfb0 x22 b4000074d7886530 x23 000000730976d000
x24 000000730976ccb0 x25 000000730976ccb0 x26 000000730976cff8 x27 00000000000fc000
x28 0000007309674000 x29 000000730976cb10
lr 000000775332bac0 sp 000000730976cb10 pc 000000774eecfb50 pst 0000000080001000
backtrace:
#00 pc 0000000000049b50 /apex/com.android.runtime/lib64/bionic/libc.so (__strlen_aarch64+16) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#01 pc 0000000000052abc /system/lib64/libhidlbase.so (android::hardware::hidl_string::hidl_string(char const*)+56) (BuildId: dc455f7517968fbed17e7fbb3f377e0a)
...
deallocated by thread 2088:
#00 pc 0000000000047e2c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+80) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#01 pc 0000000000048550 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::deallocate(void*)+196) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#02 pc 00000000001917d0 /system/framework/arm64/boot-framework.oat (art_jni_trampoline+112) (BuildId: e1b3d2fbfd00203ea9de09dabda6081908273030)
#03 pc 002c402015c782c8 <unknown>
allocated by thread 2088:
#00 pc 0000000000047e2c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::AllocationMetadata::CallSiteInfo::RecordBacktrace(unsigned long (*)(unsigned long*, unsigned long))+80) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#01 pc 000000000004845c /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan::GuardedPoolAllocator::allocate(unsigned long, unsigned long)+624) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#02 pc 000000000003c298 /apex/com.android.runtime/lib64/bionic/libc.so (gwp_asan_malloc(unsigned long)+164) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#03 pc 000000000003ccc4 /apex/com.android.runtime/lib64/bionic/libc.so (malloc+76) (BuildId: 5a713cb8951c61263d81334f9bb67f02)
#04 pc 000000000004cb9c /system/lib64/libc++.so (operator new(unsigned long)+24) (BuildId: cadbf270a31efc098116752e111460e9)
#05 pc 00000000004a43e8 /apex/com.android.art/lib64/libart.so (art::JNI<false>::GetStringUTFChars(_JNIEnv*, _jstring*, unsigned char*)+656) (BuildId: 36a1504474f96e5f18cff2d0e43e0971)
#06 pc 00000000000b664c /system/lib64/libandroid_runtime.so (android::com_android_internal_app_ActivityTrigger_native_at_pauseActivity(_JNIEnv*, _jobject*, _jstring*)+56) (BuildId: 33d935916e24e56732544224c3cd4203)
#07 pc 00000000001917d0 /system/framework/arm64/boot-framework.oat (art_jni_trampoline+112) (BuildId: e1b3d2fbfd00203ea9de09dabda6081908273030)
#08 pc 002c402015c782c8 <unknown>
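5. Conclusion
Overall, the principle behind GWP-ASan is not complicated. Compared with the original GWP-ASan, ByteDance's MemCorruption tool adds the following:
- It uses an optimized libunwind_stack library on ARM32 devices, so call stacks can be recorded on 32-bit devices.
- It registers its own SIGSEGV handler, so a detected problem can be recorded without crashing the process.
- It adds a good deal of work on production deployment and dynamic configuration.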