iOS Crash Log Collection


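1. As an app developer, have you been through the following?

  • To make sure the app runs correctly, you and your testers do extensive testing before submitting to the App Store or shipping an enterprise build. After a round of testing and fixing, the app runs stably on the test devices, everyone releases the version with confidence, and you move on to the next iteration. Then, once the release reaches a certain install base, users start reporting crashes.

  • At that point, getting crash information from the production app becomes essential for analyzing and fixing the crashes. If your app's data is not highly confidential, you can integrate a common third-party SDK such as Bugly, Tingyun (听云), or Umeng (友盟). If your company has strict data privacy requirements, you need to build your own crash log collection system.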

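2. Crash cause analysis

Despite extensive testing during development, the variety of devices and OS versions makes it hard to rule out occasional problems such as:

  • Array out of bounds, inserting nil
  • unrecognized selector
  • NSString crash
  • NSNotification crash
  • KVO crash
  • Wild pointers
  • Threading issues
  • EXC_BAD_ACCESS
  • ......

iOS app crashes are generally caused by Mach exceptions and Objective-C exceptions. The crash capture flow is roughly as shown in the figure below: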

image.png

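3. NSException exceptions

Common NSException-related crashes include:

  • unrecognized selector crash
  • KVO crash
  • NSNotification crash (zombie objects)
  • NSTimer crash
  • Container crash (array out of bounds, inserting nil, etc.)
  • NSString crash (crashes in string operations)
  • Bad Access crash (wild pointers; strictly speaking this surfaces as a Mach exception or signal rather than an NSException)
  • UI not on Main Thread Crash (updating UI off the main thread)
  • ......

3.1 Catching exceptions

For application-level exceptions, use the system-provided NSGetUncaughtExceptionHandler to back up the existing exception handler, then use NSSetUncaughtExceptionHandler to install a custom handler that catches uncaught exceptions: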

//Backing up original exception handler
g_previousUncaughtExceptionHandler = NSGetUncaughtExceptionHandler();

//Setting new handler
NSSetUncaughtExceptionHandler(&handleUncaughtException);
3.2 Handling the exception
static void handleUncaughtException(NSException* exception) {
    handleException(exception, false);
}
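3.3 Compatibility

In general, avoid registering an uncaught exception handler through NSSetUncaughtExceptionHandler more than once in the same app, because each call replaces the previously registered handler.
However, common third-party frameworks such as Bugly, Tingyun, and Umeng ship their own crash handling, so multiple registrations are often unavoidable. To stay compatible: before setting your own handler with NSSetUncaughtExceptionHandler, fetch and back up the existing handler with NSGetUncaughtExceptionHandler. After your own handling finishes, be sure to call the backed-up handler so that the chain is not broken.

// After our own exception handling, always call the backed-up handler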
if (g_previousUncaughtExceptionHandler != NULL)
{
    //Calling original exception handler
    g_previousUncaughtExceptionHandler(exception);
}

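4. Mach exceptions

4.1 What is Mach?

In the figure below, Mach is the module closest to the bottom of the system. It is the microkernel core of XNU and provides some of the operating system's most critical functions: Mach manages processor resources such as CPU usage and memory, handles scheduling, enforces memory protection, and implements a message-centered infrastructure for untyped inter-process communication, both local and remote.

image.png

Mach exceptions are the lowest-level, kernel-level exceptions, and they can be handled in-process or across processes. Exception information is delivered as Mach messages over Mach IPC (Inter-Process Communication) ports. Any process with rights to the target process's Mach ports can register an exception handler at the thread, task, or host level, and the kernel searches these handlers in order (thread first, then task, then host). Each thread, task, and host has an array of exception ports for receiving exceptions. Part of the Mach API is exposed to user space, so a user-space developer can set the thread/task/host exception ports directly through the Mach API to catch Mach exceptions. The relevant APIs are:

  • task_get_exception_ports: get the exception ports currently registered for the task (so they can be backed up)
  • mach_port_allocate: create a new exception port for the task
  • mach_port_insert_right: insert a send right for the new port, which is needed before the port is passed to task_set_exception_ports
  • task_set_exception_ports: set the task's new exception port
  • mach_msg(): receive exception messages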
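4.2 Catching Mach exceptions

Create a dedicated exception-handling thread that listens for Mach exceptions and processes the exception information: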

image.png

4.2.1 Port structure
static struct
{
    exception_mask_t masks[EXC_TYPES_COUNT];
    exception_handler_t ports[EXC_TYPES_COUNT];
    exception_behavior_t behaviors[EXC_TYPES_COUNT];
    thread_state_flavor_t flavors[EXC_TYPES_COUNT];
    mach_msg_type_number_t count;
} g_previousExceptionPorts;
4.2.2 Back up the current exception ports
//Backing up original exception ports
kr = task_get_exception_ports(thisTask,
                              mask,
                              g_previousExceptionPorts.masks,
                              &g_previousExceptionPorts.count,
                              g_previousExceptionPorts.ports,
                              g_previousExceptionPorts.behaviors,
                              g_previousExceptionPorts.flavors);
4.2.3 Create a new exception port and install it as the task's exception port
//Allocating new port with receive rights
kr = mach_port_allocate(thisTask,
                        MACH_PORT_RIGHT_RECEIVE,
                        &g_exceptionPort);
                        
......

//Adding send rights to port
kr = mach_port_insert_right(thisTask,
                            g_exceptionPort,
                            g_exceptionPort,
                            MACH_MSG_TYPE_MAKE_SEND);
                            
......

//Installing port as exception handler
kr = task_set_exception_ports(thisTask,
                              mask,
                              g_exceptionPort,
                              (int)(EXCEPTION_DEFAULT | MACH_EXCEPTION_CODES),
                              THREAD_STATE_NONE);
4.2.4 Create the listener thread and start listening
//Creating primary exception thread
error = pthread_create(&g_primaryPThread,
                       &attr,
                       &handleExceptions,
                       kThreadPrimary);

......

pthread_attr_destroy(&attr);
g_primaryMachThread = pthread_mach_thread_np(g_primaryPThread);
ksmc_addReservedThread(g_primaryMachThread);
4.2.5 Receive exception messages and handle the exception
/** Our exception handler thread routine.
* Wait for an exception message, uninstall our exception port, record the
* exception information, and write a report.
*/
static void* handleExceptions(void* const userData)
{
    ......
}
4.2.6 Restore the previous ports
// Reinstall old exception ports.
for(mach_msg_type_number_t i = 0; i < g_previousExceptionPorts.count; i++)
{
    //Restoring port index i
    kr = task_set_exception_ports(thisTask,
                                  g_previousExceptionPorts.masks[i],
                                  g_previousExceptionPorts.ports[i],
                                  g_previousExceptionPorts.behaviors[i],
                                  g_previousExceptionPorts.flavors[i]);
}

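5. Unix signals

A signal is a software-level simulation of an interrupt. It is an asynchronous notification mechanism: a process does not know in advance when a signal will arrive.

BSD builds the UNIX signal mechanism on top of the Mach exception mechanism. A Mach exception that reaches the host level is converted by ux_exception into the corresponding Unix signal, which threadsignal then delivers to the faulting thread.

Given both Mach exceptions and Unix signals, we should catch Mach exceptions first, because Mach exception handling runs before Unix signal handling: if the Mach exception handler makes the program exit, the Unix signal never gets a chance to reach the thread.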

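5.1 Common signal types
SIGABRT, // abnormal termination, e.g. an uncaught NSException or a C abort() call
SIGFPE,  // erroneous arithmetic operation (floating-point or integer)
SIGILL,  // illegal instruction
SIGPIPE, // write to a pipe or socket with no reader
SIGBUS,  // bus error, e.g. a misaligned or otherwise invalid memory access
SIGSEGV, // invalid memory reference; corresponds to Mach EXC_BAD_ACCESS
SIGSYS,  // bad system call
SIGTRAP, // trace/breakpoint trap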
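
As a rough illustration of the Mach-to-signal translation described in this section, the sketch below mirrors the general shape of that mapping. The `DEMO_*` constants and `demo_signal_for_mach_exception` are hypothetical names defined locally so the snippet compiles without the Mach headers. Their numeric values are assumed to match the Mach headers, and the real ux_exception logic covers more cases than this.

```c
#include <signal.h>

/* Local stand-ins for Mach exception types; the values are assumed to
 * mirror <mach/exception_types.h>. */
enum {
    DEMO_EXC_BAD_ACCESS      = 1,
    DEMO_EXC_BAD_INSTRUCTION = 2,
    DEMO_EXC_ARITHMETIC      = 3,
    DEMO_EXC_BREAKPOINT      = 6
};

/* Assumed stand-in for KERN_INVALID_ADDRESS from <mach/kern_return.h>. */
enum { DEMO_KERN_INVALID_ADDRESS = 1 };

/* Returns the signal that roughly corresponds to a Mach exception,
 * or 0 when this simplified sketch does not cover the exception type. */
static int demo_signal_for_mach_exception(int exceptionType, long long code)
{
    switch (exceptionType) {
    case DEMO_EXC_BAD_ACCESS:
        /* An invalid address becomes SIGSEGV; other bad accesses become SIGBUS. */
        return code == DEMO_KERN_INVALID_ADDRESS ? SIGSEGV : SIGBUS;
    case DEMO_EXC_BAD_INSTRUCTION:
        return SIGILL;
    case DEMO_EXC_ARITHMETIC:
        return SIGFPE;
    case DEMO_EXC_BREAKPOINT:
        return SIGTRAP;
    default:
        return 0;
    }
}
```

A crash reporter that catches both layers can use a table like this to label a report with both the Mach exception and the signal it would have become.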
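5.2 Catching signals
5.2.1 The sigaction function
struct sigaction {
    void (*sa_handler)(int);                         /* simple handler */
    sigset_t sa_mask;                                /* signals blocked while the handler runs */
    int sa_flags;                                    /* e.g. SA_SIGINFO selects sa_sigaction */
    void (*sa_sigaction)(int, siginfo_t *, void *);  /* extended handler */
};
int sigaction(int sig, const struct sigaction *act, struct sigaction *oact);

Parameters:

  • sig: the signal to operate on
  • act: the new action to install for the signal
  • oact: receives the previous action for the signal
  • Return value: 0 on success, -1 if an error occurred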
5.2.2 Install a new handler for each signal (backing up the previous handler)
action.sa_sigaction = &handleSignal;
for(int i = 0; i < fatalSignalsCount; i++)
{
    // Assign our handler for fatalSignals[i], saving the previous action
    if(sigaction(fatalSignals[i], &action, &g_previousSignalHandlers[i]) != 0)
    {
        char sigNameBuff[30];
        const char* sigName = kssignal_signalName(fatalSignals[i]);
        if(sigName == NULL)
        {
            snprintf(sigNameBuff, sizeof(sigNameBuff), "%d", fatalSignals[i]);
            sigName = sigNameBuff;
        }

        // Try to reverse the damage
        for(i--;i >= 0; i--)
        {
            sigaction(fatalSignals[i], &g_previousSignalHandlers[i], NULL);
        }
        goto failed;
    }
}
5.2.3 Handling the signal
static void handleSignal(int sigNum, siginfo_t* signalInfo, void* userContext)
{
    // Trapped signal sigNum
    if(g_isEnabled)
    {
        thread_act_array_t threads = NULL;
        mach_msg_type_number_t numThreads = 0;
        ksmc_suspendEnvironment(&threads, &numThreads);
        ......
        kscm_handleException(crashContext);
        ksmc_resumeEnvironment(threads, numThreads);
    }

    //Re-raising signal for regular handlers to catch
    raise(sigNum);
}
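
The install, back up, and re-raise steps above can be sketched in portable POSIX C. This is a minimal sketch and not the code of any particular crash reporter: `install_fatal_signal_handlers`, `restore_previous_handlers`, `on_fatal_signal`, and `g_lastSignal` are hypothetical names. A real handler would write the crash report where the comment indicates, using only async-signal-safe calls.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L  /* exposes sigaction under strict C modes */
#endif
#include <signal.h>
#include <stddef.h>
#include <string.h>

static const int kFatalSignals[] = {
    SIGABRT, SIGBUS, SIGFPE, SIGILL, SIGPIPE, SIGSEGV, SIGSYS, SIGTRAP
};
enum { kFatalSignalCount = sizeof(kFatalSignals) / sizeof(kFatalSignals[0]) };

static struct sigaction g_previous[kFatalSignalCount];
static volatile sig_atomic_t g_lastSignal = 0;

/* Put back whatever actions were installed before ours. */
static void restore_previous_handlers(void)
{
    for (int i = 0; i < kFatalSignalCount; i++) {
        sigaction(kFatalSignals[i], &g_previous[i], NULL);
    }
}

static void on_fatal_signal(int sigNum, siginfo_t *info, void *context)
{
    (void)info;
    (void)context;
    g_lastSignal = sigNum;        /* a real reporter would record the crash here */
    restore_previous_handlers();  /* so the re-raised signal reaches the old action */
    raise(sigNum);
}

/* Returns 0 on success, -1 if any handler could not be installed. */
static int install_fatal_signal_handlers(void)
{
    struct sigaction action;
    memset(&action, 0, sizeof(action));
    sigemptyset(&action.sa_mask);
    action.sa_flags = SA_SIGINFO;
    action.sa_sigaction = on_fatal_signal;

    for (int i = 0; i < kFatalSignalCount; i++) {
        if (sigaction(kFatalSignals[i], &action, &g_previous[i]) != 0) {
            /* Undo the handlers installed so far. */
            while (--i >= 0) {
                sigaction(kFatalSignals[i], &g_previous[i], NULL);
            }
            return -1;
        }
    }
    return 0;
}
```

Restoring the previous actions before calling `raise` is the design choice that keeps other handlers (or the default action) in the chain, which mirrors the advice given for NSSetUncaughtExceptionHandler earlier.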

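Q: Why is EXC_CRASH not caught at the Mach level, while the corresponding SIGABRT is caught as a Unix signal instead?
A: The open-source framework PLCrashReporter gives this explanation: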

/* We still need to use signal handlers to catch SIGABRT in-process. The kernel sends an EXC_CRASH mach exception
* to denote SIGABRT termination. In that case, catching the Mach exception in-process leads to process deadlock
* in an uninterruptable wait. Thus, we fall back on BSD signal handlers for SIGABRT, and do not register for
* EXC_CRASH. */

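6. Symbol table resolution

With the steps above we can now obtain the crash log produced when the app crashes. For an app built for production, the stack trace contains only function addresses, so we need to symbolicate it against the symbol table to recover the symbol names in the stack.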

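6.1 Symbol table
6.1.1 What is a symbol table?

A symbol table maps memory addresses to function names, file names, and line numbers. Each element looks like this:
<start address> <end address> <function> [<file name:line number>]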
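
To make the element format concrete, here is a minimal sketch of looking up an address in such a table. `DemoSymbol`, `demo_find_symbol`, and the sample entries in `kDemoSymbols` are hypothetical and are not taken from a real dSYM. The sketch assumes the entries are sorted by start address and that each range is half-open, [start, end).

```c
#include <stddef.h>
#include <stdint.h>

/* One symbol table element: [start, end) -> function, plus optional file:line. */
typedef struct {
    uint64_t start;
    uint64_t end;
    const char *function;
    const char *fileAndLine;  /* may be NULL when no line info is available */
} DemoSymbol;

/* Binary search over entries sorted by start address.
 * Returns NULL when no range contains addr. */
static const DemoSymbol *demo_find_symbol(const DemoSymbol *table, size_t count,
                                          uint64_t addr)
{
    size_t lo = 0, hi = count;
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        if (addr < table[mid].start) {
            hi = mid;
        } else if (addr >= table[mid].end) {
            lo = mid + 1;
        } else {
            return &table[mid];
        }
    }
    return NULL;
}

/* Hypothetical sample data for illustration only. */
static const DemoSymbol kDemoSymbols[] = {
    { 0x1000, 0x1040, "-[ViewController viewDidLoad]", "ViewController.m:20" },
    { 0x1040, 0x10a0, "-[ViewController crashTest:]",  "ViewController.m:104" },
    { 0x2000, 0x2010, "main",                          "main.m:18" },
};
```

With this sample table, an address such as 0x1050 falls inside the second entry, while 0x10a0 lies in the gap between entries and yields no match.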

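6.1.2 Why configure a symbol table?

To locate the code where a user's app crashed quickly and accurately, Bugly uses the symbol table to resolve and restore the app's crash stack.

For example:

image.png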

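6.1.3 Obtaining the app's symbol table file
  • By default an Xcode project has the following setting, so the symbol table file is generated only when building for production: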

image.png

  • Open Xcode → Window → Organizer → select the released archive → right-click Show in Finder → locate the *.xcarchive file → right-click Show Package Contents:

image.png

image.png

  • dSYMs → *.app.dSYM → right-click Show Package Contents → Contents → Resources → DWARF → the symbol table file

image.png

image.png

  • Get the UUID of the symbol table file: dwarfdump --uuid <symbol table file>

image.png

  • View the symbol table contents: dwarfdump --arch arm64 --debug-pubnames Demo.app.dSYM

image.png

Excerpt of the output:

Demo.app.dSYM/Contents/Resources/DWARF/Demo(arm64): file format Mach-O arm64

.debug_pubnames contents:
length = 0x00000b57 version = 0x0002 unit_offset = 0x000a29a6 unit_size = 0x00001cfc
Offset Name
0x00000275 "-[EMGroupInfoViewController initWithGroupId:]"
0x000002c0 "-[EMGroupInfoViewController viewDidLoad]"
0x000002f7 "-[EMGroupInfoViewController reloadInfo]"
0x0000032e "-[EMGroupInfoViewController dealloc]"
0x00000365 "-[EMGroupInfoViewController _setupSubviews]"
0x0000039c "-[EMGroupInfoViewController numberOfSectionsInTableView:]"
0x000003e3 "-[EMGroupInfoViewController tableView:numberOfRowsInSection:]"
0x0000044e "-[EMGroupInfoViewController tableView:cellForRowAtIndexPath:]"
0x000004d9 "-[EMGroupInfoViewController groupOwnerDidUpdate:newOwner:oldOwner:]"
0x00000540 "-[EMGroupInfoViewController tableView:heightForHeaderInSection:]"
0x00000593 "-[EMGroupInfoViewController tableView:heightForFooterInSection:]"
0x000005e6 "-[EMGroupInfoViewController tableView:didSelectRowAtIndexPath:]"
0x00000717 "-[EMGroupInfoViewController multiDevicesGroupEventDidReceive:groupId:ext:]"
0x00000782 "-[EMGroupInfoViewController _resetGroup:]"
0x0000080b "-[EMGroupInfoViewController _fetchGroupWithId:isShowHUD:]"
0x00000876 "__57-[EMGroupInfoViewController _fetchGroupWithId:isShowHUD:]_block_invoke"
0x000008d6 "__copy_helper_block_e8_32w"
0x000008ff "__destroy_helper_block_e8_32w"
0x0000091e "-[EMGroupInfoViewController tableViewDidTriggerHeaderRefresh]"
0x00000956 "-[EMGroupInfoViewController handleGroupInfoUpdated:]"
0x000009b0 "-[EMGroupInfoViewController groupAnnouncementAction]"
0x000009f9 "__52-[EMGroupInfoViewController groupAnnouncementAction]_block_invoke"
0x00000a9b "__52-[EMGroupInfoViewController groupAnnouncementAction]_block_invoke_2"
0x00000aff "__52-[EMGroupInfoViewController groupAnnouncementAction]_block_invoke_3"
6.2 Symbolication with atos

atos resolves a runtime memory address to the corresponding file name, line number, and function name.

6.2.1 A typical crash log
Incident Identifier:   B7DEEC95-FCCA-42A6-ABA3-77AD8B743F6C
CrashReporter Key:     66c23a1c1d6377ff1cc81121d3ac82623457b377
Hardware Model:        iPhone10,2
Process:               Demo [35768]
Path:                  /private/var/containers/Bundle/Application/140CCFD5-86C9-4CE8-A6EC-08232859CF79/Demo.app/Demo
Identifier:            com.test.demo
Version:               2 (1.4.0)
Code Type:             ARM-64 (Native)
Role:                  Foreground
Parent Process:        launchd [1]

Date/Time:             2020-11-23 16:23:23.4944 +0800
Launch Time:           2020-11-23 16:22:23.4110 +0800
OS Version:            iPhone OS 14.2 (18B92)
Release Type:          User
Baseband Version:      6.02.01
Report Version:        104

Exception Type:        EXC_CRASH (SIGABRT)
Exception Codes:       0x0000000000000000, 0x0000000000000000
Exception Note:        EXC_CORPSE_NOTIFY
Triggered by Thread:   0

Thread 0 name:         Dispatch queue: com.apple.main-thread
Thread 0 Crashed:
0 libsystem_kernel.dylib 0x00000001c386d84c 0x1c3846000 + 161868
1 libsystem_pthread.dylib 0x00000001dedc99e8 0x1dedbf000 + 43496
2 libsystem_c.dylib 0x00000001a13d98f4 0x1a1366000 + 473332
3 libc++abi.dylib 0x00000001ac1dacc8 0x1ac1c8000 + 77000
4 libc++abi.dylib 0x00000001ac1ccca0 0x1ac1c8000 + 19616
5 libobjc.A.dylib 0x00000001ac0dee04 0x1ac0d8000 + 28164
6 Demo 0x0000000100c8fd0c 0x1006b8000 + 6126860
7 Demo 0x00000001014bbb44 0x1006b8000 + 14695236
8 libc++abi.dylib 0x00000001ac1da154 0x1ac1c8000 + 74068
9 libc++abi.dylib 0x00000001ac1dce68 0x1ac1c8000 + 85608
10 libobjc.A.dylib 0x00000001ac0ded04 0x1ac0d8000 + 27908
11 CoreFoundation 0x00000001986a6c7c 0x198613000 + 605308
12 GraphicsServices 0x00000001ae9c9598 0x1ae9c6000 + 13720
13 UIKitCore 0x000000019af90638 0x19a464000 + 11716152
14 UIKitCore 0x000000019af95bb8 0x19a464000 + 11738040
15 Demo 0x0000000100704aac 0x1006b8000 + 314028
16 libdyld.dylib 0x0000000198385588 0x198384000 + 5512

The key stack frames are not symbolicated; they show only raw addresses:

6 Demo 0x0000000100c8fd0c 0x1006b8000 + 6126860
7 Demo 0x00000001014bbb44 0x1006b8000 + 14695236
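
Each unsymbolicated frame carries the instruction address, the image load address, and the decimal offset between them. The sketch below parses one such line and checks that relationship. `DemoFrame` and `demo_parse_frame` are hypothetical names, and the sketch assumes the image name contains no spaces.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

typedef struct {
    int index;                 /* frame number */
    char image[64];            /* binary image name, e.g. "Demo" */
    uint64_t instructionAddr;  /* absolute address of the instruction */
    uint64_t loadAddr;         /* address the image was loaded at */
    uint64_t offset;           /* decimal offset from the load address */
} DemoFrame;

/* Parses "<index> <image> <instruction addr> <load addr> + <offset>".
 * Returns 1 when all five fields parse and are mutually consistent, else 0. */
static int demo_parse_frame(const char *line, DemoFrame *out)
{
    int fields = sscanf(line, "%d %63s %" SCNx64 " %" SCNx64 " + %" SCNu64,
                        &out->index, out->image,
                        &out->instructionAddr, &out->loadAddr, &out->offset);
    /* A well-formed frame satisfies: instruction address = load address + offset. */
    return fields == 5 && out->instructionAddr == out->loadAddr + out->offset;
}
```

The load address is what atos expects after `-l`, and the instruction address is the value to resolve, so a parser like this supplies both inputs for the symbolication step.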

Flow of the custom log format:

image.png

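6.2.2 Parsing the crash log

Log information (excerpt):

image.png

UUID of the log (the stack can be symbolicated correctly only when it matches the UUID of the symbol table file):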

image.png
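
Different tools may print the same UUID with different casing or with and without hyphens, so a collection backend usually normalizes both sides before comparing. The following is a minimal sketch under that assumption. `demo_normalize_uuid` and `demo_uuid_matches` are hypothetical helpers, and they expect the bare UUID string rather than a whole line of tool output.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies only the hex digits of a UUID string into out, upper-cased.
 * Returns the number of digits copied, or 0 if there are more than 32. */
static size_t demo_normalize_uuid(const char *uuid, char out[33])
{
    size_t n = 0;
    for (; *uuid != '\0'; uuid++) {
        if (isxdigit((unsigned char)*uuid)) {
            if (n == 32) {
                out[0] = '\0';
                return 0;
            }
            out[n++] = (char)toupper((unsigned char)*uuid);
        }
    }
    out[n] = '\0';
    return n;
}

/* Returns 1 when both strings contain the same 32 hex digits, ignoring
 * case and separators such as hyphens; returns 0 otherwise. */
static int demo_uuid_matches(const char *a, const char *b)
{
    char na[33], nb[33];
    if (demo_normalize_uuid(a, na) != 32 || demo_normalize_uuid(b, nb) != 32) {
        return 0;
    }
    return memcmp(na, nb, 32) == 0;
}
```

A check like this lets the backend pick the right symbol table file for a report and refuse to symbolicate when no uploaded file matches.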

Crash stack (excerpt):

image.png

Resolving the symbols:
Convert the decimal addresses to hexadecimal:
"object_addr": 4331880448 → 0x102334000
"instruction_addr": 4331909492 → 0x10233B174

Run atos on the command line: atos -arch arm64 -o MobApmDemo -l 0x102334000 0x10233B174
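
The decimal-to-hexadecimal conversion and the command assembly can be automated. The sketch below formats the atos invocation from the two decimal fields of the report. `demo_format_atos_command` is a hypothetical helper that only builds the command string and does not run atos.

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Builds "atos -arch <arch> -o <binary> -l <load addr> <instruction addr>"
 * with both addresses printed in hexadecimal.
 * Returns the number of characters snprintf would have written. */
static int demo_format_atos_command(char *buf, size_t size,
                                    const char *arch, const char *binary,
                                    uint64_t objectAddr, uint64_t instructionAddr)
{
    return snprintf(buf, size,
                    "atos -arch %s -o %s -l 0x%" PRIX64 " 0x%" PRIX64,
                    arch, binary, objectAddr, instructionAddr);
}
```

Called with "arm64", "MobApmDemo", 4331880448, and 4331909492, it produces the same command line as shown above.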

image.png

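6.2.3 Limitations of atos

atos is a command-line tool bundled with Xcode. It runs only on macOS and cannot be used on Linux or Windows. atos can only resolve symbols line by line, which is inefficient.

atosl is a Linux implementation of atos developed by Facebook, but in practice atosl does not support symbolication for the arm64 architecture.

6.3 Symbolication with symbolicatecrash

symbolicatecrash is an analysis tool bundled with Xcode that symbolicates a stack trace in batch. It is a wrapper around atos, resolves an entire crash file in one pass, includes system symbols in its output, and is convenient to use.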

6.3.1 Locating the symbolicatecrash tool

Use the find command to locate the symbolicatecrash tool:

find /Applications/Xcode.app -name symbolicatecrash -type f
// Output:
/Applications/Xcode.app/Contents/SharedFrameworks/DVTFoundation.framework/Versions/A/Resources/symbolicatecrash
6.3.2 Symbolicating a .crash file

Run the following commands:

export DEVELOPER_DIR="/Applications/Xcode.app/Contents/Developer"
./symbolicatecrash xxx.crash Demo.app.dSYM/Contents/Resources/DWARF/Demo -o out.crash

The generated out.crash file has all stack symbols resolved:

Incident Identifier:       2B157C99-2399-405A-8BB0-A0FD6253FAB6
CrashReporter Key:         b3e9ef79355f1c42b61d81edfd1f97c43cab232c
Hardware Model:            iPhone10,2
Process:                   Demo [41201]
Path:                      /private/var/containers/Bundle/Application/EC97E67A-F9C4-49EC-A3A6-F083FF742048/Demo.app/Demo
Version:                   126 (1.1.3)
Code Type:                 ARM-64
Parent Process:            ? [1]

Date/Time:                 2020-11-05 11:56:34.687 +0800
OS Version:                iOS 14.1 (18A8395)
Report Version:            104

Exception Type:            EXC_CRASH (SIGABRT)
Exception Codes:           0x00000000 at 0x0000000000000000
Crashed Thread:            0

Application Specific Information: 
*** Terminating app due to uncaught exception 'NSUnknownKeyException', reason: '[<NSObject 0x2836b8600> setValue:forUndefinedKey:]: this class is not key value coding-compliant for the key key.'

Thread 0 Crashed:
0 CoreFoundation 0x000000019eca9114 __exceptionPreprocess + 216
1 libobjc.A.dylib 0x00000001b2545cb4 objc_exception_throw + 56
2 CoreFoundation 0x000000019ebba4e0 -[NSObject+ 177376 (NSKindOfAdditions) isNSSet__] + 0
3 Foundation 0x000000019fe54728 -[NSObject+ 182056 (NSKeyValueCoding) setValue:forKey:] + 312
4 Demo 0x000000010475623c -[ViewController crashTest:] + 25148 (ViewController.m:104)
5 UIKitCore 0x00000001a14f291c -[UIApplication sendAction:to:from:forEvent:] + 96
6 Demo 0x0000000104908afc __nbsEventHookSendAction_block_invoke + 1232
7 UIKitCore 0x00000001a0e8a5bc -[UIControl sendAction:to:forEvent:] + 240
8 UIKitCore 0x00000001a0e8a900 -[UIControl _sendActionsForEvents:withEvent:] + 352
9 UIKitCore 0x00000001a0e89238 -[UIControl touchesEnded:withEvent:] + 532
10 UIKitCore 0x00000001a152d7f0 -[UIWindow _sendTouchesForEvent:] + 1244
11 UIKitCore 0x00000001a152f118 -[UIWindow sendEvent:] + 3824
12 UIKitCore 0x00000001a150a4fc -[UIApplication sendEvent:] + 744
13 UIKitCore 0x00000001a158c76c __dispatchPreprocessedEventFromEventQueue + 1032
14 UIKitCore 0x00000001a1590f0c __processEventQueue + 6440
15 UIKitCore 0x00000001a15881cc __eventFetcherSourceCallback + 156
16 CoreFoundation 0x000000019ec29240 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 24
17 CoreFoundation 0x000000019ec29140 __CFRunLoopDoSource0 + 204
18 CoreFoundation 0x000000019ec28488 __CFRunLoopDoSources0 + 256
19 CoreFoundation 0x000000019ec22a40 __CFRunLoopRun + 776
20 CoreFoundation 0x000000019ec22200 CFRunLoopRunSpecific + 572
21 GraphicsServices 0x00000001b4d9f598 GSEventRunModal + 160
22 UIKitCore 0x00000001a14ebbcc -[UIApplication _run] + 1052
23 UIKitCore 0x00000001a14f11a0 UIApplicationMain + 164
24 Demo 0x000000010475756c main + 30060 (main.m:18)
25 libdyld.dylib 0x000000019e901588 start + 4