Capturing and Symbolicating Thread Call Stacks

Note: the jdy_ and BS_ prefixes that appear in this article are the resource prefixes used by the original authors.

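I. Getting the Call Stack of an Arbitrary Thread

To get the call stack of the current thread, you can use an existing API directly: [NSThread callStackSymbols].

However, no API exists for getting the call stack of an arbitrary thread, so we have to implement it ourselves.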

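1. The call stack

What does a thread's call stack look like?

As I understand it, it should contain the thread's current execution address, and from that address it should be possible to trace back level by level to the thread's entry address. Read in reverse, this forms a chain: the thread entry calls some method, which makes further nested calls until execution reaches the current point.

In the accompanying figure, each level of method call corresponds to an activation record, also called an activation frame. In other words, the call stack is made up of frames, which we call stack frames. Each stack frame corresponds to one function call: the blue part is the stack frame of the DrawSquare function, which calls the DrawLine function (the green part) while it runs.

A stack frame consists of three parts: the function arguments, the return address, and the frame's local variables. The arguments are pushed first; the return address is pushed next, indicating where execution resumes once the current activation record finishes; the variables defined inside the function come last.

The Stack Pointer marks the current top of the stack. Because the stack grows downward on most operating systems, it actually holds the lowest address in use on the stack. The Frame Pointer points to a fixed location inside the current frame; on the platforms discussed here, the caller's saved Frame Pointer is stored at that location and the return address is stored right next to it.

On most operating systems each stack frame also stores the Frame Pointer of the previous stack frame. So as long as we know the Stack Pointer and Frame Pointer of the current frame, we can derive those of the previous frame, and by repeating this step walk all the way to the frame at the bottom of the stack. The frames form a chain.

Obviously, once a function call returns, its stack frame no longer exists.

The call stack is therefore an abstraction over the stack that expresses the calling relationships between methods; in general, the call stack can be parsed out of the stack.

So once we have a stack frame, we can backtrace through the return addresses.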
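The chain described above can be sketched in portable C. This is only an illustration under stated assumptions: DemoFrame and demo_walk_frames are hypothetical names, and the frames are ordinary structs built by hand rather than records read from a real thread's stack.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical frame record: a link to the caller's record plus a return address. */
typedef struct DemoFrame {
    const struct DemoFrame *previous;  /* caller's frame record, NULL at the thread entry */
    uintptr_t return_address;          /* where execution resumes in the caller */
} DemoFrame;

/* Follows the chain from `frame` toward the thread entry and stores each
 * return address in `out`. Returns the number of addresses written, which
 * never exceeds `limit`. */
int demo_walk_frames(const DemoFrame *frame, uintptr_t *out, int limit) {
    assert(out != NULL);
    int count = 0;
    while (frame != NULL && frame->return_address != 0 && count < limit) {
        out[count++] = frame->return_address;
        frame = frame->previous;
    }
    return count;
}
```

A real backtrace differs mainly in where the frame records come from: they are read out of the target thread's stack memory, starting at its frame pointer register.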

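2. Instruction pointer and base pointer

We now have two targets: ① the instruction currently being executed, and ② the structure of the current stack frame.

Taking x86 as an example, the registers serve the following purposes: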

SP/ESP/RSP: Stack pointer for top address of the stack.
BP/EBP/RBP: Stack base pointer for holding the address of the current stack frame.
IP/EIP/RIP: Instruction pointer. Holds the program counter, the current instruction address.

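As the list shows, the instruction pointer gives us the address of the current instruction, and the stack base pointer gives us the address of the current stack frame.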
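For the current thread only, GCC and Clang expose builtins that read the frame address and the return address without any operating system support. The sketch below assumes one of those compilers; DemoRegisters and demo_current_registers are hypothetical names, and the technique does not work for other threads, which is why the rest of the article goes through the kernel.

```c
#include <assert.h>
#include <stddef.h>

typedef struct DemoRegisters {
    void *frame_pointer;   /* frame address of demo_current_registers itself */
    void *return_address;  /* where execution resumes in its caller */
} DemoRegisters;

/* noinline keeps a real call frame in place so the builtins have something to report. */
__attribute__((noinline))
DemoRegisters demo_current_registers(void) {
    DemoRegisters regs;
    regs.frame_pointer = __builtin_frame_address(0);
    regs.return_address = __builtin_return_address(0);
    assert(regs.return_address != NULL);
    return regs;
}
```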

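So the question is: how do we get at these registers?

3. Thread execution state

When a thread is suspended, its context must be restored before it can continue running, so that context (for example, which instruction it had reached) has to be saved at suspension time.

There must therefore be a structure that stores a thread's run-time state. After some digging, I found the following: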

The function thread_get_state returns the execution state (e.g. the machine registers) of target_thread as specified by flavor.

Function - Return the execution state for a thread.

SYNOPSIS

kern_return_t thread_get_state
(
     thread_act_t             target_act,
     thread_state_flavor_t        flavor,
     thread_state_t            old_state,
     mach_msg_type_number_t *old_stateCnt
);

/*
 * THREAD_STATE_FLAVOR_LIST 0
 *  these are the supported flavors.
 *  (Defined in <mach/i386/thread_status.h>.)
 */
#define x86_THREAD_STATE32      1
#define x86_FLOAT_STATE32       2
#define x86_EXCEPTION_STATE32   3
#define x86_THREAD_STATE64      4
#define x86_FLOAT_STATE64       5
#define x86_EXCEPTION_STATE64   6
#define x86_THREAD_STATE        7
#define x86_FLOAT_STATE         8
#define x86_EXCEPTION_STATE     9
#define x86_DEBUG_STATE32       10
#define x86_DEBUG_STATE64       11
#define x86_DEBUG_STATE         12
#define THREAD_STATE_NONE       13
/* 14 and 15 are used for the internal x86_SAVED_STATE flavours */
#define x86_AVX_STATE32         16
#define x86_AVX_STATE64         17
#define x86_AVX_STATE           18

So we can use this API with the appropriate arguments to get the register information we want:

bool jdy_fillThreadStateIntoMachineContext(thread_t thread, _STRUCT_MCONTEXT * machineContext) {
    mach_msg_type_number_t state_count = x86_THREAD_STATE64_COUNT;
    kern_return_t kr = thread_get_state(thread, x86_THREAD_STATE64, (thread_state_t)&machineContext->__ss, &state_count);
    return (kr == KERN_SUCCESS);
}

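The flavor and the state count differ between architectures; this snippet hard-codes the x86_64 values x86_THREAD_STATE64 and x86_THREAD_STATE64_COUNT. It also introduces a structure called _STRUCT_MCONTEXT.

4. Registers on different platforms

_STRUCT_MCONTEXT has a different layout on each platform:

x86_64, e.g. the iPhone 6 simulator: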

_STRUCT_MCONTEXT64
{
    _STRUCT_X86_EXCEPTION_STATE64   __es;
    _STRUCT_X86_THREAD_STATE64  __ss;
    _STRUCT_X86_FLOAT_STATE64   __fs;
};

_STRUCT_X86_THREAD_STATE64
{
    __uint64_t  __rax;
    __uint64_t  __rbx;
    __uint64_t  __rcx;
    __uint64_t  __rdx;
    __uint64_t  __rdi;
    __uint64_t  __rsi;
    __uint64_t  __rbp;
    __uint64_t  __rsp;
    __uint64_t  __r8;
    __uint64_t  __r9;
    __uint64_t  __r10;
    __uint64_t  __r11;
    __uint64_t  __r12;
    __uint64_t  __r13;
    __uint64_t  __r14;
    __uint64_t  __r15;
    __uint64_t  __rip;
    __uint64_t  __rflags;
    __uint64_t  __cs;
    __uint64_t  __fs;
    __uint64_t  __gs;
};

x86_32, e.g. the iPhone 4s simulator:

_STRUCT_MCONTEXT32
{
    _STRUCT_X86_EXCEPTION_STATE32   __es;
    _STRUCT_X86_THREAD_STATE32  __ss;
    _STRUCT_X86_FLOAT_STATE32   __fs;
};

_STRUCT_X86_THREAD_STATE32
{
   unsigned int    __eax;
   unsigned int    __ebx;
   unsigned int    __ecx;
   unsigned int    __edx;
   unsigned int    __edi;
   unsigned int    __esi;
   unsigned int    __ebp;
   unsigned int    __esp;
   unsigned int    __ss;
   unsigned int    __eflags;
   unsigned int    __eip;
   unsigned int    __cs;
   unsigned int    __ds;
   unsigned int    __es;
   unsigned int    __fs;
   unsigned int    __gs;
};

ARM64, e.g. iPhone 5s:

_STRUCT_MCONTEXT64
{
	_STRUCT_ARM_EXCEPTION_STATE64	__es;
	_STRUCT_ARM_THREAD_STATE64	__ss;
	_STRUCT_ARM_NEON_STATE64	__ns;
};

_STRUCT_ARM_THREAD_STATE64
{
	__uint64_t    __x[29];	/* General purpose registers x0-x28 */
	void*         __opaque_fp;	/* Frame pointer x29 */
	void*         __opaque_lr;	/* Link register x30 */
	void*         __opaque_sp;	/* Stack pointer x31 */
	void*         __opaque_pc;	/* Program counter */
	__uint32_t    __cpsr;	/* Current program status register */
	__uint32_t    __opaque_flags;	/* Flags describing structure format */
};

ARMv7/v6, e.g. iPhone 4s:

_STRUCT_MCONTEXT32
{
	_STRUCT_ARM_EXCEPTION_STATE	__es;
	_STRUCT_ARM_THREAD_STATE	__ss;
	_STRUCT_ARM_VFP_STATE		__fs;
};

_STRUCT_ARM_THREAD_STATE
{
	__uint32_t	__r[13];	/* General purpose register r0-r12 */
	__uint32_t	__sp;		/* Stack pointer r13 */
	__uint32_t	__lr;		/* Link register r14 */
	__uint32_t	__pc;		/* Program counter r15 */
	__uint32_t	__cpsr;		/* Current program status register */
};

Compare this with the iOS ABI Function Call Guide, whose ARM64 chapter states:

The frame pointer register (x29) must always address a valid frame record, although some functions–such as leaf functions or tail calls–may elect not to create an entry in this list. As a result, stack traces will always be meaningful, even without debug information

And for ARMv7/v6 it states:

The function calling conventions used in the ARMv6 environment are the same as those used in the Procedure Call Standard for the ARM Architecture (release 1.07), with the following exceptions:
- The stack is 4-byte aligned at the point of function calls.
- Large data types (larger than 4 bytes) are 4-byte aligned.
- Register R7 is used as a frame pointer.
- Register R9 has special usage.

So, knowing the register layout on each of these platforms, we can write a fairly generic backtrace routine.

5. Implementation

/**
 * For the layout of stack frames, see:
 * https://en.wikipedia.org/wiki/Call_stack
 * http://www.cs.cornell.edu/courses/cs412/2008sp/lectures/lec20.pdf
 * http://eli.thegreenplace.net/2011/09/06/stack-frame-layout-on-x86-64/
 */
typedef struct JDYStackFrame {
    const struct JDYStackFrame* const previous;  // the previous (caller's) stack frame
    const uintptr_t returnAddress;  // return address: where execution resumes in the caller
} JDYStackFrame;


/// Backtraces the given thread, writing at most `limit` addresses into backtraceBuffer.
int jdy_backtraceThread(thread_t thread, uintptr_t *backtraceBuffer, int limit) {
    if (limit <= 0) return 0;

    _STRUCT_MCONTEXT mcontext;
    // Fetch the thread's machine context
    if (!jdy_fillThreadStateIntoMachineContext(thread, &mcontext)) {
        return 0;
    }

    int i = 0;
    uintptr_t pc = jdy_programCounterOfMachineContext(&mcontext);
    backtraceBuffer[i++] = pc;
    if (i == limit) return i;

    uintptr_t lr = jdy_linkRegisterOfMachineContext(&mcontext);
    if (lr != 0) {
        /* lr also holds a return address, so when lr is valid the buffer may end up with a duplicate entry */
        backtraceBuffer[i++] = lr;
        if (i == limit) return i;
    }

    JDYStackFrame frame = {0};
    uintptr_t fp = jdy_framePointerOfMachineContext(&mcontext);
    if (fp == 0 || jdy_copyMemory((void *)fp, &frame, sizeof(frame)) != KERN_SUCCESS) {
        return i;
    }

    while (i < limit) {
        backtraceBuffer[i++] = frame.returnAddress;
        if (frame.returnAddress == 0
            || frame.previous == NULL
            || jdy_copyMemory((void *)frame.previous, &frame, sizeof(frame)) != KERN_SUCCESS) {
            break;
        }
    }

    return i;
}

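II. The Traditional Approach That Fails

Can we use dispatch_async, performSelectorOnMainThread, or similar methods together with callStackSymbols to hop back to the main thread and grab its call stack?

A thread first has to start running and then, if necessary, start a runloop to keep itself alive. A runloop is essentially an endless loop that calls several functions in each iteration to check whether event sources such as source0, source1, timers, and dispatch queues have anything to process.

UI-related events are all source0, so __CFRunLoopDoSources0 is executed; when the events have been handled, the runloop goes to sleep.

Suppose we use dispatch_async. It wakes the runloop and gets processed, but by then __CFRunLoopDoSources0 has already finished, so the call stack of viewDidLoad can no longer be captured.

The performSelector family is also built on the runloop. It merely submits a task to the runloop, and that task still has to wait until the existing work completes before it runs, so it cannot capture a real-time call stack.

In short, any approach that involves the runloop, or that has to wait for viewDidLoad to finish, cannot work.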

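III. Mach Threads

Recall the earlier description of the stack: knowing the Stack Pointer and Frame Pointer is enough to determine a stack completely. Is there a way to get the Stack Pointer and Frame Pointer of every thread?

Yes. First, the system provides task_threads, which returns all threads. Note that these are the lowest-level Mach threads; their relationship to NSThread is explained in detail later.

For each thread, thread_get_state retrieves its register state, which is filled into a parameter of type _STRUCT_MCONTEXT. Two of this function's arguments vary with the CPU architecture, so the macros BS_THREAD_STATE_COUNT and BS_THREAD_STATE are defined to hide the differences between CPUs.

The _STRUCT_MCONTEXT structure stores the thread's Stack Pointer and the Frame Pointer of the topmost stack frame, from which the thread's entire call stack can be obtained.

In the project, the call stack is stored in the backtraceBuffer array. Each pointer in it corresponds to a stack frame, each stack frame corresponds to a function call, and every function has its own symbol name.

The next task is to obtain the symbol name of each function call from the address recorded for its stack frame.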

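IV. Relevant APIs and Data Structures

The backtrace above produces a list of addresses, so symbolication takes an address as input and yields a symbol as output. In practice we rely on the following dyld functions and data structures: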

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;


extern int dladdr(const void *, Dl_info *);

DESCRIPTION
     These routines provide additional introspection of dyld beyond that provided by dlopen() and dladdr()

     _dyld_image_count() returns the current number of images mapped in by dyld. Note that using this count
     to iterate all images is not thread safe, because another thread may be adding or removing images during
     the iteration.

     _dyld_get_image_header() returns a pointer to the mach header of the image indexed by image_index.  If
     image_index is out of range, NULL is returned.

     _dyld_get_image_vmaddr_slide() returns the virtual memory address slide amount of the image indexed by
     image_index. If image_index is out of range zero is returned.

     _dyld_get_image_name() returns the name of the image indexed by image_index. The C-string continues to
     be owned by dyld and should not be deleted.  If image_index is out of range NULL is returned.

To be able to tell whether symbolication succeeded, the interface is designed as:

bool jdy_symbolicateAddress(const uintptr_t addr, Dl_info *info)

The Dl_info structure is filled with the result.

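V. Algorithm

Symbolicating an address is fairly straightforward in principle: find the memory image the address belongs to, locate that image's symbol table, and then match the target address against the symbols in that table.

The outline below describes only the general direction and leaves out details such as the ASLR slide:

// ASLR slide: https://en.wikipedia.org/wiki/Address_space_layout_randomization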

/**
 * When the dynamic linker loads an image,
 * the image must be mapped into the virtual address space of the process at an unoccupied address.
 * The dynamic linker accomplishes this by adding a value "the virtual memory slide amount" to the base address of the image.
 */
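The slide arithmetic can be sketched as follows. This is a minimal illustration assuming a non-negative slide; demo_unslide and demo_slide are hypothetical helper names, and in a real process the slide of an image would come from _dyld_get_image_vmaddr_slide().

```c
#include <assert.h>
#include <stdint.h>

/* Converts a runtime address into the address recorded in the image file by
 * removing the slide that the dynamic linker added at load time. */
uintptr_t demo_unslide(uintptr_t runtime_address, uintptr_t slide) {
    assert(runtime_address >= slide);
    return runtime_address - slide;
}

/* The inverse: converts an address recorded in the image file into the
 * address at which that byte actually lives in the running process. */
uintptr_t demo_slide(uintptr_t file_address, uintptr_t slide) {
    return file_address + slide;
}
```

Addresses captured by the backtrace are runtime addresses, while segment ranges and symbol values in the image are recorded without the slide, so one side has to be converted before the two are compared.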

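① Finding the image that contains the address

At first I was pleasantly surprised to find an API for this, but unfortunately it is not available on iPhone: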

extern bool _dyld_image_containing_address(const void * address) __OSX_AVAILABLE_BUT_DEPRECATED(__MAC_10_3,__MAC_10_5,__IPHONE_NA,__IPHONE_NA);

So we have to do the check ourselves.

A segment defines a range of bytes in a Mach-O file and the addresses and memory protection attributes at which those bytes are mapped into virtual memory when the dynamic linker loads the application. As such, segments are always virtual memory page aligned. A segment contains zero or more sections.

We iterate over every segment and check whether the target address falls within its range:

/*
* The segment load command indicates that a part of this file is to be
* mapped into the task's address space.  The size of this segment in memory,
* vmsize, may be equal to or larger than the amount to map from this file,
* filesize.  The file is mapped starting at fileoff to the beginning of
* the segment in memory, vmaddr.  The rest of the memory of the segment,
* if any, is allocated zero fill on demand.  The segment's maximum virtual
* memory protection and initial virtual memory protection are specified
* by the maxprot and initprot fields.  If the segment has sections then the
* section structures directly follow the segment command and their size is
* reflected in cmdsize.
*/
struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};


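/**
 * @brief Checks whether a segment load command contains addr, based on the
 *        segment's virtual address and size. addr must already have the
 *        image's ASLR slide subtracted.
 */
bool jdy_segmentContainsAddress(const struct load_command *cmdPtr, const uintptr_t addr) {
    if (cmdPtr->cmd == LC_SEGMENT) {
        const struct segment_command *segPtr = (const struct segment_command *)cmdPtr;
        return addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize);
    }
    if (cmdPtr->cmd == LC_SEGMENT_64) {  // 64-bit images use segment_command_64
        const struct segment_command_64 *segPtr = (const struct segment_command_64 *)cmdPtr;
        return addr >= segPtr->vmaddr && addr < (segPtr->vmaddr + segPtr->vmsize);
    }
    return false;  // not a segment command
}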

In this way we can find the image that contains the target address.
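The per-image scan over load commands might look like the sketch below. To stay portable it uses stand-in structs (DemoLoadCommand, DemoSegment64) that mirror only the leading fields of the real load_command and segment_command_64 from <mach-o/loader.h>; these names and demo_image_contains are assumptions made for illustration. In a real process the command list starts right after the image's Mach header and its length comes from the header's ncmds field.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DEMO_LC_SEGMENT_64 0x19u  /* same value as LC_SEGMENT_64 */

/* Stand-in for the common prefix of every load command. */
typedef struct DemoLoadCommand {
    uint32_t cmd;
    uint32_t cmdsize;  /* total size of this command in bytes */
} DemoLoadCommand;

/* Stand-in for the leading fields of a 64-bit segment command. */
typedef struct DemoSegment64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
} DemoSegment64;

/* Returns true if `unslid_addr` lies inside any 64-bit segment among the
 * `ncmds` load commands that start at `commands`. */
bool demo_image_contains(const void *commands, uint32_t ncmds, uint64_t unslid_addr) {
    assert(commands != NULL);
    const uint8_t *cursor = commands;
    for (uint32_t i = 0; i < ncmds; i++) {
        DemoLoadCommand header;
        memcpy(&header, cursor, sizeof(header));
        if (header.cmd == DEMO_LC_SEGMENT_64) {
            DemoSegment64 seg;
            memcpy(&seg, cursor, sizeof(seg));
            if (unslid_addr >= seg.vmaddr && unslid_addr - seg.vmaddr < seg.vmsize) {
                return true;
            }
        }
        if (header.cmdsize < sizeof(header)) {
            break;  /* malformed command: stop rather than loop forever */
        }
        cursor += header.cmdsize;  /* commands are variable-sized, so step by cmdsize */
    }
    return false;
}
```

Running this check for each index below _dyld_image_count() would identify the image that owns the address, subject to the thread-safety caveat quoted earlier from the dyld documentation.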

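② Locating the symbol table of the target image

Symbol collection and symbol-table construction span the compile and link stages, so I will not go into them here. It is enough to know that, besides the code segment __TEXT and the data segment __DATA, there is also a __LINKEDIT segment that contains the symbol table: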

The __LINKEDIT segment contains raw data used by the dynamic linker, such as symbol, string, and relocation table entries.

So we first need to locate the __LINKEDIT segment. Again quoting Apple's official documentation:

Segments and sections are normally accessed by name. Segments, by convention, are named using all uppercase letters preceded by two underscores (for example, __TEXT); sections should be named using all lowercase letters preceded by two underscores (for example, __text). This naming convention is standard, although not required for the tools to operate correctly.

We iterate over every segment and compare its name with __LINKEDIT:

usr/include/mach-o/loader.h

#define SEG_LINKEDIT   "__LINKEDIT"

Next, we look for the symbol table.

From The Mac Hacker's Handbook: The LC_SYMTAB load command describes where to find the string and symbol tables within the __LINKEDIT segment. The offsets given are file offsets, so you subtract the file offset of the __LINKEDIT segment to obtain the virtual memory offset of the string and symbol tables. Adding the virtual memory offset to the virtual-memory address where the __LINKEDIT segment is loaded will give you the in-memory location of the string and symbol tables.

In other words, we need to combine the __LINKEDIT segment_command (see the structure above) with the LC_SYMTAB load_command (see the structure below) to locate the symbol table:

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

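As the quotation says, the offsets in LC_SYMTAB and __LINKEDIT are file offsets. To get the in-memory addresses of the symbol table and string table, we subtract the fileoff of __LINKEDIT from the symoff and stroff of LC_SYMTAB to get offsets within the segment, and then add the vmaddr of __LINKEDIT to get virtual addresses. To arrive at the actual memory addresses, the ASLR slide must be added as well.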
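As a worked version of that arithmetic, the sketch below computes the in-memory address of a table from the four inputs named in the paragraph. The function name demo_table_address is hypothetical, and the numbers used to exercise it are made up for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Computes where a __LINKEDIT table lives in memory.
 *   slide            ASLR slide of the image
 *   linkedit_vmaddr  vmaddr of the __LINKEDIT segment command
 *   linkedit_fileoff fileoff of the __LINKEDIT segment command
 *   table_fileoff    symoff or stroff from the LC_SYMTAB command */
uint64_t demo_table_address(uint64_t slide, uint64_t linkedit_vmaddr,
                            uint64_t linkedit_fileoff, uint64_t table_fileoff) {
    assert(table_fileoff >= linkedit_fileoff);
    uint64_t offset_in_segment = table_fileoff - linkedit_fileoff;
    return linkedit_vmaddr + offset_in_segment + slide;
}
```

For example, with a slide of 0x4000, a __LINKEDIT vmaddr of 0x100008000 and fileoff of 0x8000, a symoff of 0x8100 lies 0x100 bytes into the segment and therefore maps to 0x10000C100.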

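③ Finding the symbol in the symbol table that best matches the target address

We have finally found the symbol table. The code:

/**
 * @brief Finds the best-matching symbol for an address in the given symbol table.
 *        The address must already have vmaddr_slide subtracted.
 *        JDY_SymbolTableEntry is presumably a typedef of struct nlist / struct nlist_64.
 */
const JDY_SymbolTableEntry *jdy_findBestMatchSymbolForAddress(uintptr_t addr,
                                                              JDY_SymbolTableEntry *symbolTable,
                                                              uint32_t nsyms) {

    // 1. addr >= symbol.value: addr is the address of an instruction inside some function, so it must be
    //    greater than or equal to that function's entry address, which is the value of the symbol.
    // 2. symbol.value is nearest to addr: the entry address closest to addr is the more accurate match.

    const JDY_SymbolTableEntry *nearestSymbol = NULL;
    uintptr_t currentDistance = UINTPTR_MAX;

    for (uint32_t symIndex = 0; symIndex < nsyms; symIndex++) {
        uintptr_t symbolValue = symbolTable[symIndex].n_value;
        // Skip undefined symbols (value 0) and symbols that start after addr.
        if (symbolValue > 0 && symbolValue <= addr) {
            uintptr_t symbolDistance = addr - symbolValue;
            if (symbolDistance <= currentDistance) {
                currentDistance = symbolDistance;
                nearestSymbol = symbolTable + symIndex;
            }
        }
    }

    return nearestSymbol;
}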


/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

Once the matching nlist entry is found, its .n_un.n_strx field locates the corresponding symbol name in the string table.
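The string-table lookup can be sketched as a bounds-checked pointer offset. The helper name demo_symbol_name is hypothetical; string_table stands for the in-memory address derived from stroff, and strsize for the value of the same name in the LC_SYMTAB command.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the NUL-terminated name stored at index `n_strx` of the string
 * table, or NULL when the symbol has no name (index 0) or the index falls
 * outside the table. */
const char *demo_symbol_name(const char *string_table, uint32_t strsize, uint32_t n_strx) {
    assert(string_table != NULL);
    if (n_strx == 0 || n_strx >= strsize) {
        return NULL;
    }
    return string_table + n_strx;
}
```

C symbol names usually carry a leading underscore in the table (for example _main), which a caller may want to strip before display.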

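VI. Demystifying NSThread

We can now get all threads and their call stacks, but what if we want the stack of one particular thread? How do we establish the link between an NSThread and a kernel thread?

The GNUstep-base source includes an implementation of the Foundation library. There is no guarantee that NSThread uses this implementation, but NSThread.m still yields a lot of useful information.

Many articles state that NSThread is a wrapper around pthread, which raises two questions:

What is pthread?
How does NSThread wrap pthread?

The letter p in pthread is short for POSIX, which stands for Portable Operating System Interface.

Every operating system has its own threading model and exposes different thread APIs, which makes cross-platform thread management a problem. The purpose of POSIX is to provide the abstract pthread and its associated APIs; these APIs are implemented differently on different operating systems but provide the same functionality.

Functions such as thread_get_state and task_threads, provided by the Mach layer of the system, operate on kernel threads. Each kernel thread is uniquely identified by an id of type thread_t, whereas a pthread is uniquely identified by a pthread_t.

Converting between kernel threads and pthreads (that is, between thread_t and pthread_t) is easy, because pthread exists precisely to abstract kernel threads. On Darwin, pthread_mach_thread_np and pthread_from_mach_thread_np perform the two conversions.

Saying that NSThread wraps pthread is not very accurate: NSThread uses pthread in only a few places internally. A simplified version of NSThread's start method looks like this: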

- (void) start
{
     pthread_attr_t attr;
     pthread_t thr;
     errno = 0;
     pthread_attr_init(&attr);
     if (pthread_create(&thr, &attr, nsthreadLauncher, self)) {
         // Error Handling
     }
}

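NSThread does not even store the pthread_t of the newly created pthread.

The other place pthread is used is when an NSThread exits, where it calls pthread_exit(). Beyond that, pthread is barely noticeable.

In fact, every method in the performSelector family eventually ends up in the following catch-all method: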

- (void)performSelector:(SEL)aSelector 
               onThread:(NSThread *)thr 
             withObject:(nullable id)arg 
          waitUntilDone:(BOOL)wait 
                  modes:(nullable NSArray<NSString *> *)array API_AVAILABLE(macos(10.5), ios(2.0), watchos(2.0), tvos(9.0));

It is merely a wrapper: it looks up the runloop for the thread, and what actually gets called is this NSRunLoop method:

- (void) performSelector:(SEL)aSelector
                  target:(id)target
                argument:(id)argument
                   order:(NSUInteger)order
                   modes:(NSArray*)modes;

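This information is bundled into a Performer object and placed in the runloop to await execution.

VII. Mapping an NSThread to a Kernel Thread

Because the system provides no conversion method, and NSThread does not keep the thread's pthread_t, conventional means will not do.

One idea is to use performSelector to run code on the target thread and record its thread_t. That code must not run too late: if it runs only when the call stack is being printed, it disturbs the call stack. The best moment is thread creation. As mentioned above, the thread is created with pthread_create, and its callback nsthreadLauncher is implemented as follows: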

static void *nsthreadLauncher(void* thread)
{
    NSThread *t = (NSThread*)thread;
    [nc postNotificationName: NSThreadDidStartNotification object:t userInfo: nil];
    [t _setName: [t name]];
    [t main];
    [NSThread exit];
    return NULL;
}

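Surprisingly, the system posts a notification here. Its name is not public, but by observing all notifications we can learn that it is @"_NSThreadDidStartNotification", so we can observe this notification and call performSelector.

An NSThread is usually created with initWithTarget:selector:object:. The selector is executed inside the main method, and the thread exits when main returns. To keep the thread alive, a runloop must be started inside the selector that was passed in; for details see my article "A Deep Dive into Runloop and Keeping Threads Alive" (深入研究 Runloop 与线程保活).

So this approach is not practical: as explained earlier, performSelector depends on a running runloop, and the runloop cannot start before the main method runs.

Looking at the problem again, what we need is a link between the NSThread object and the kernel thread: some unique value of the NSThread object that the kernel thread also has.

NSThread has only three unique values: the object address, the sequence number, and the thread name: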

<NSThread: 0x144d095e0>{number = 1, name = main}

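The address is allocated on the heap and is of no use here, and I could not work out how the sequence number is computed, which leaves only the name. Fortunately pthread provides pthread_getname_np to read a thread's name, and the two names agree. Interested readers can read the implementation of setName, which calls the interface provided by pthread.

Here np stands for non-portable: the function is not part of POSIX and cannot be relied on across platforms.

The solution is then simple: given an NSThread, change its name to a random value (I chose a timestamp), then iterate over the pthreads and check for a matching name. Once the lookup is done, restore the original name.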
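The matching step reduces to a string search over the names of all threads. The sketch below shows only that step, in portable C, with a hypothetical helper demo_find_thread_by_name; it assumes the names have already been collected into an array. On Darwin they would presumably come from calling pthread_getname_np on the pthread_t obtained via pthread_from_mach_thread_np for each entry returned by task_threads.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the index of the thread whose name equals `tag`, or -1 if no
 * thread carries that name. Unnamed threads may appear as NULL or "". */
int demo_find_thread_by_name(const char *const *names, int count, const char *tag) {
    assert(tag != NULL);
    for (int i = 0; i < count; i++) {
        if (names[i] != NULL && strcmp(names[i], tag) == 0) {
            return i;
        }
    }
    return -1;
}
```

The index found this way identifies the Mach thread at the same position in the list returned by task_threads. The temporary name has to be unlikely to collide with an existing thread name, which is why a timestamp is a reasonable choice.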

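VIII. Mapping the Main Thread to a Kernel Thread

I thought the problem was fully solved, but there was one more pitfall: a name set on the main thread cannot be read back with pthread_getname_np.

Fortunately there is a workaround: obtain the main thread's thread_t in advance and compare against it.

This requires running code on the main thread to obtain the thread_t, and the best place is clearly the load method: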

static mach_port_t main_thread_id;
+ (void)load {
    main_thread_id = mach_thread_self();
}

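IX. Further Reading

iOS中线程Call Stack的捕获和解析（一） (Capturing and Symbolicating Thread Call Stacks on iOS, Part 1)
iOS中线程Call Stack的捕获和解析（二） (Capturing and Symbolicating Thread Call Stacks on iOS, Part 2)
获取任意线程调用栈的那些事 (Notes on Getting the Call Stack of Any Thread)
BSBacktraceLogger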