References: Toutiao iOS client launch speed optimization; Optimizing App launch time; Understanding the iOS App launch process in depth
Basics
Mach-O
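- Executable: an executable file.
- Dylib: a dynamic library.
- Bundle: a dynamic library that cannot be linked against; it can only be loaded through dlopen().
- Image: any one of Executable, Dylib, or Bundle. This article uses the term Image many times.
- Framework: a collection of a dynamic library together with its header files and resources.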
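Mach-O file structure
Source: Mach-O loader.h
- Mach-O header (mach header): describes the CPU architecture, the file type, and summary information about the load commands.
- Load commands (load command): describe how the data in the file is organized; different kinds of data are described by different load commands.
- Data: the data of every segment is stored here. Each segment has zero or more sections, which hold the actual data and code.
mach_header
Holds basic information, including the platform the file runs on, the file type, and the number of load commands.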
struct mach_header {
uint32_t magic;
cpu_type_t cputype;
cpu_subtype_t cpusubtype;
uint32_t filetype;
uint32_t ncmds;
uint32_t sizeofcmds;
uint32_t flags;
};
struct mach_header_64 {
uint32_t magic;
cpu_type_t cputype;
cpu_subtype_t cpusubtype;
uint32_t filetype;
uint32_t ncmds;
uint32_t sizeofcmds;
uint32_t flags;
uint32_t reserved;
};
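The 32-bit and 64-bit mach_header
magic:
Magic number, used to quickly tell whether the file is 64-bit or 32-bit
cputype:
CPU type, such as ARM or ARM64
cpusubtype:
The specific variant of that CPU type, such as armv7 or arm64e
filetype:
File type, such as an executable, a library, or a dSYM file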
/* Constants for the filetype field of the mach_header */
#define MH_OBJECT 0x1 /* relocatable object file: the intermediate result produced when the compiler compiles source code */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file, usually produced when an app crashes */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor, /usr/lib/dyld */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file: a non-standalone binary, usually produced with gcc -bundle */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static linking only, no section contents */
#define MH_DSYM 0xa /* companion file with only debug sections: symbols and debug info, commonly used to symbolicate stack traces */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts (kernel extensions) */
ncmds:
The number of load commands in the Mach-O file
sizeofcmds:
The total size in bytes of all the load commands in the Mach-O file
flags:
File flags: an integer holding a set of bit flags that describe properties of the Mach-O file
#define MH_NOUNDEFS 0x1 /* the object file has no undefined references; common for statically linked binaries */
#define MH_DYLDLINK 0x4 /* the file is input for dyld and cannot be statically link edited again */
#define MH_SPLIT_SEGS 0x20 /* the file has its read-only and read-write segments split */
#define MH_TWOLEVEL 0x80 /* the image uses two-level namespace bindings */
#define MH_FORCE_FLAT 0x100 /* the executable forces all images to use flat namespace bindings (mutually exclusive with MH_TWOLEVEL) */
#define MH_WEAK_DEFINES 0x8000 /* the binary contains weak symbol definitions */
#define MH_BINDS_TO_WEAK 0x10000 /* the binary binds to (uses) weak symbols */
#define MH_ALLOW_STACK_EXECUTION 0x20000 /* allow the stack to be executable */
#define MH_PIE 0x200000 /* load the main executable at a random address (ASLR); only used in MH_EXECUTE files */
#define MH_NO_HEAP_EXECUTION 0x1000000 /* mark the heap as non-executable, which helps against heap spray attacks */
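Because flags is a bit mask, individual properties are tested with a bitwise AND. The following is a minimal sketch of that idea, not code from dyld: the helper describe_flags and the EX_ prefixed constants are names invented here, with the values copied from the list above so the snippet compiles without <mach-o/loader.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of three flag bits from loader.h (the EX_ prefix is ours). */
#define EX_MH_DYLDLINK 0x4u
#define EX_MH_TWOLEVEL 0x80u
#define EX_MH_PIE      0x200000u

/* Hypothetical helper: writes the names of the recognised bits that are
 * set in flags into buf, each followed by a space. */
static void describe_flags(uint32_t flags, char *buf, size_t len) {
    static const struct { uint32_t bit; const char *name; } known[] = {
        { EX_MH_DYLDLINK, "DYLDLINK" },
        { EX_MH_TWOLEVEL, "TWOLEVEL" },
        { EX_MH_PIE,      "PIE" },
    };
    size_t used = 0;

    assert(buf != NULL && len > 0);
    buf[0] = '\0';
    for (size_t i = 0; i < sizeof known / sizeof known[0]; i++) {
        if ((flags & known[i].bit) && used < len) {
            int n = snprintf(buf + used, len - used, "%s ", known[i].name);
            if (n < 0)
                return;          /* formatting error: keep what we have */
            used += (size_t)n;   /* may exceed len if truncated; loop guard stops us */
        }
    }
}
```

A flags value of 0x00200085, for instance, has the DYLDLINK, TWOLEVEL and PIE bits set (plus MH_NOUNDEFS, which this sketch does not name).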
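dyld
The dynamic linker. When the kernel processes the LC_LOAD_DYLINKER load command, it starts the dynamic linker, which finds the dynamic libraries the process depends on and loads them into memory.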
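Address space randomization
Each time a process is launched, its address space layout is randomized.
For most applications, address space randomization is an implementation detail that has nothing to do with them, but for attackers it matters a great deal.
With the traditional approach, the virtual memory image of a program is identical on every launch, so an attacker can easily compromise the program by overwriting memory at known addresses. ASLR effectively mitigates this kind of attack. ASLR (Address Space Layout Randomization) is a protection mechanism that stops loading the program at a fixed base address, which makes it hard for shellcode to locate its targets.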
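In practice the randomization shows up as a single offset, usually called the slide: the difference between the address an image was actually loaded at and the address it was linked for. Every link-time address in the image moves by that same amount. The sketch below illustrates the arithmetic only; struct ex_image and both helpers are hypothetical names, not part of dyld.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one loaded image. */
struct ex_image {
    uint64_t preferred_base; /* link-time vmaddr of the __TEXT segment */
    uint64_t actual_base;    /* address the loader picked for this launch */
};

/* The slide is the signed distance between the two bases. */
static int64_t image_slide(const struct ex_image *img) {
    assert(img != NULL);
    return (int64_t)(img->actual_base - img->preferred_base);
}

/* Translate a link-time address into the address valid for this launch. */
static uint64_t slid_address(const struct ex_image *img, uint64_t link_time_addr) {
    return link_time_addr + (uint64_t)image_slide(img);
}
```

With a preferred base of 0x100000000 and an actual base of 0x104a3c000, the slide is 0x4a3c000, so a function linked at 0x100003f50 is found at 0x104a3ff50 in that launch.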
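Two-level namespace
This is a feature specific to dyld: each symbol reference also records which library the symbol comes from, so two different libraries can export the same symbol name. Its counterpart is the flat namespace.
reserved:
The 64-bit mach_header_64 has one more field than the 32-bit header, reserved, whose value is currently reserved by the system.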
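To make the role of magic concrete, here is a small sketch that looks at the first four bytes of a buffer and decides which of the two header layouts follows. The function name macho_header_size is made up for this example, and the constants are local copies of MH_MAGIC and MH_MAGIC_64. Byte-swapped files (MH_CIGAM) and fat files are deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define EX_MH_MAGIC    0xfeedfaceu /* 32-bit Mach-O, host byte order */
#define EX_MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O, host byte order */

/* Hypothetical helper: returns the size in bytes of the header that starts
 * at file (28 for mach_header, 32 for mach_header_64), or 0 when the magic
 * is not one of the two values handled here. */
static size_t macho_header_size(const void *file, size_t file_len) {
    uint32_t magic;

    assert(file != NULL);
    if (file_len < sizeof magic)
        return 0;
    memcpy(&magic, file, sizeof magic); /* memcpy avoids unaligned reads */
    if (magic == EX_MH_MAGIC)
        return 28; /* seven uint32_t fields */
    if (magic == EX_MH_MAGIC_64)
        return 32; /* the same seven fields plus reserved */
    return 0;
}
```

The two sizes follow directly from the struct definitions above: seven 4-byte fields for mach_header, and one more 4-byte field for mach_header_64.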
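load_command (load commands)
The load commands follow directly after the mach_header. They are read by the kernel loader or the dynamic linker when the Mach-O file is loaded and parsed. The basic load command structure is: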
struct load_command {
uint32_t cmd; /* type of load command */
uint32_t cmdsize; /* total size of command in bytes */
};
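cmd is the type of the load command, and cmdsize is the size of the load command.
cmd
Different cmd values denote different kinds of load commands, each with its own structure. Each kind of load command appends one or more fields after the load_command structure to carry its specific information.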
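Since every command starts with the same two fields, a reader can walk the whole list without understanding each command: read cmd, then skip cmdsize bytes to reach the next one. The following sketch shows that loop under the assumption that the commands sit in a plain byte buffer; count_commands and struct ex_load_command are illustrative names, and the bounds checks are an addition of this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Same layout as load_command; redefined so the sketch is standalone. */
struct ex_load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Counts the commands of type wanted among the first ncmds commands in
 * cmds (total bytes long). Returns -1 if the data is truncated or a
 * cmdsize is implausible. */
static int count_commands(const uint8_t *cmds, size_t total,
                          uint32_t ncmds, uint32_t wanted) {
    size_t off = 0;
    int found = 0;

    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct ex_load_command lc;

        if (total - off < sizeof lc)
            return -1;                       /* not enough bytes left */
        memcpy(&lc, cmds + off, sizeof lc);
        if (lc.cmdsize < sizeof lc || lc.cmdsize > total - off)
            return -1;                       /* malformed size */
        if (lc.cmd == wanted)
            found++;
        off += lc.cmdsize;                   /* next command starts here */
    }
    return found;
}
```

In a real file, ncmds and the total size would come from the ncmds and sizeofcmds fields of the header, and the buffer would start right after the header.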
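LC_SEGMENT
Marks a segment load command: the segment it describes must be mapped into the process address space.
LC_LOAD_DYLIB
Marks a dynamic library that must be loaded. It uses the dylib_command structure. The cmd types LC_ID_DYLIB, LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB and LC_REEXPORT_DYLIB are all represented by the dylib_command structure.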
struct dylib_command {
uint32_t cmd; /* LC_ID_DYLIB, LC_LOAD_{,WEAK_}DYLIB, LC_REEXPORT_DYLIB */
uint32_t cmdsize; /* includes pathname string */
struct dylib dylib; /* the library identification */
};
The dylib structure stores the details of the dynamic library to load.
struct dylib {
union lc_str name; /* library's path name */
uint32_t timestamp; /* library's build time stamp */
uint32_t current_version; /* library's current version number */
uint32_t compatibility_version; /* library's compatibility vers number*/
};
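- name is the full path of the dynamic library; the dynamic linker uses this path to load it.
- timestamp is the time stamp at which the dynamic library was built.
- current_version and compatibility_version give the library's current version and compatibility version numbers.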
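Two details of these fields are easy to miss. The version numbers are packed as X.Y.Z with 16 bits for X and 8 bits each for Y and Z, and name is not an inline string but an offset, counted from the start of the load command, to a NUL-terminated path. The sketch below decodes both; format_version and dylib_name are hypothetical helpers written for this article.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Formats a packed xxxx.yy.zz version number as "X.Y.Z". */
static void format_version(uint32_t packed, char *buf, size_t len) {
    assert(buf != NULL && len > 0);
    snprintf(buf, len, "%u.%u.%u",
             (unsigned)(packed >> 16),
             (unsigned)((packed >> 8) & 0xff),
             (unsigned)(packed & 0xff));
}

/* Returns the path stored in a dylib command, or NULL if the offset is out
 * of range or the string is not terminated inside the command.
 * command points at the first byte of the load command. */
static const char *dylib_name(const uint8_t *command, uint32_t cmdsize,
                              uint32_t name_offset) {
    assert(command != NULL);
    if (name_offset >= cmdsize)
        return NULL;
    if (memchr(command + name_offset, '\0', cmdsize - name_offset) == NULL)
        return NULL;
    return (const char *)(command + name_offset);
}
```

This packing is the same one the simulator check in dyld::_main() unpacks with mainMinOS >> 16 and (mainMinOS >> 8) & 0xFF.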
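LC_CODE_SIGNATURE
LC_CODE_SIGNATURE is the code signature load command; it describes the code signature of the Mach-O file. If the signature does not match the code (or, on iOS, is missing), the kernel kills the process immediately with SIGKILL. It is link-edit information and is represented by the linkedit_data_command structure.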
struct linkedit_data_command {
uint32_t cmd; /* LC_CODE_SIGNATURE or LC_SEGMENT_SPLIT_INFO */
uint32_t cmdsize; /* sizeof(struct linkedit_data_command) */
uint32_t dataoff; /* file offset of data in __LINKEDIT segment */
uint32_t datasize; /* file size of data in __LINKEDIT segment */
};
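- dataoff specifies the file offset of the data, which lies in the __LINKEDIT segment.
- datasize specifies the size of the data.
LC_SEGMENT
The segment load command LC_SEGMENT describes a segment of the Mach-O file. It is represented by the segment_command structure, defined as follows: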
/*
* The segment load command indicates that a part of this file is to be
* mapped into the task's address space. The size of this segment in memory,
* vmsize, maybe equal to or larger than the amount to map from this file,
* filesize. The file is mapped starting at fileoff to the beginning of
* the segment in memory, vmaddr. The rest of the memory of the segment,
* if any, is allocated zero fill on demand. The segment's maximum virtual
* memory protection and initial virtual memory protection are specified
* by the maxprot and initprot fields. If the segment has sections then the
* section structures directly follow the segment command and their size is
* reflected in cmdsize.
*/
struct segment_command { /* for 32-bit architectures */
uint32_t cmd; /* LC_SEGMENT */
uint32_t cmdsize; /* includes sizeof section structs */
char segname[16]; /* segment name */
uint32_t vmaddr; /* memory address of this segment */
uint32_t vmsize; /* memory size of this segment */
uint32_t fileoff; /* file offset of this segment */
uint32_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
/** The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
uint32_t cmd; /* LC_SEGMENT_64 */
uint32_t cmdsize; /* includes sizeof section_64 structs */
char segname[16]; /* segment name */
uint64_t vmaddr; /* memory address of this segment */
uint64_t vmsize; /* memory size of this segment */
uint64_t fileoff; /* file offset of this segment */
uint64_t filesize; /* amount to map from the file */
vm_prot_t maxprot; /* maximum VM protection */
vm_prot_t initprot; /* initial VM protection */
uint32_t nsects; /* number of sections in segment */
uint32_t flags; /* flags */
};
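A compiled executable program is divided into several segments, and different kinds of data are placed in different segments.
The program's code is called the code segment and is placed in a segment named __TEXT.
Upper-case __TEXT denotes a segment; lower-case __text denotes a section.
- segname is a 16-byte field that stores the segment name.
- vmaddr is the virtual memory address at which the segment is loaded.
- vmsize is the amount of virtual memory the segment occupies.
- fileoff is the offset of the segment's data within the file.
- filesize is the size of the segment's data in the file.
- maxprot is the maximum memory protection allowed for the segment's pages (readable, writable, executable, and so on). For the code segment, maxprot is set at build time to VM_PROT_READ, VM_PROT_WRITE and VM_PROT_EXECUTE.
- initprot is the initial memory protection of the pages. For the code segment it is set to VM_PROT_READ and VM_PROT_EXECUTE, which makes sense: in an ordinary application the code segment is normally not writable.
- nsects is the number of sections the segment contains. A segment can contain zero or more sections. The __PAGEZERO segment, for example, contains no sections; it is known as the null-pointer trap segment, is mapped at the first page of the virtual address space, and catches NULL pointer dereferences. When a segment contains sections, their descriptions are stored as an array right after the segment load command. A section is represented by the section structure (section_64 for 64-bit).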
struct section_64 { /* for 64-bit architectures */
char sectname[16]; /* name of this section */
char segname[16]; /* segment this section goes in */
uint64_t addr; /* memory address of this section */
uint64_t size; /* size in bytes of this section */
uint32_t offset; /* file offset of this section */
uint32_t align; /* section alignment (power of 2) */
uint32_t reloff; /* file offset of relocation entries */
uint32_t nreloc; /* number of relocation entries */
uint32_t flags; /* flags (section type and attributes)*/
uint32_t reserved1; /* reserved (for offset or index) */
uint32_t reserved2; /* reserved (for count or sizeof) */
uint32_t reserved3; /* reserved */
};
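- sectname is the name of the section,
- segname is the name of the segment the section belongs to,
- offset is the file offset of the section,
- align is the section's memory alignment boundary, as a power of 2,
- reloff is the file offset of the relocation entries,
- nreloc is the number of relocation entries,
- flags holds the section's type and attribute flags.
- flags (in the segment command) holds the flag bits of the segment. The possible values are: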
#define SG_HIGHVM 0x1
#define SG_FVMLIB 0x2
#define SG_NORELOC 0x4
#define SG_PROTECTED_VERSION_1 0x8
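The one worth noting is SG_PROTECTED_VERSION_1: when this bit is set, the segment is encrypted. Before macOS 10.6 the AES algorithm was used for encryption and decryption; from 10.6 on, Blowfish is used. The well-known iOS reverse-engineering tool class-dump ships a tool named deprotect for statically decrypting such segments.
- __TEXT: the code segment, read-only; it contains functions and read-only strings. Entries of the form __TEXT,__text in the figure above all belong to the code segment.
- __DATA: the data segment, readable and writable; it contains writable global variables and so on. Entries of the form __DATA,__data all belong to the data segment.
- __LINKEDIT: contains metadata about functions and variables (locations, offsets) as well as information such as the code signature.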
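Two practical points about the segment and section fields described above can be shown in a few lines. The protection fields are bit masks built from VM_PROT_READ (1), VM_PROT_WRITE (2) and VM_PROT_EXECUTE (4), and the 16-byte name fields are not guaranteed to be NUL-terminated when a name uses all 16 bytes, so comparisons need a length bound. The helpers prot_string and name_equals are names chosen for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local copies of the VM protection bits (the EX_ prefix is ours). */
#define EX_VM_PROT_READ    0x1
#define EX_VM_PROT_WRITE   0x2
#define EX_VM_PROT_EXECUTE 0x4

/* Renders a protection mask in "rwx" style; out must hold 4 bytes. */
static void prot_string(int prot, char out[4]) {
    assert(out != NULL);
    out[0] = (prot & EX_VM_PROT_READ)    ? 'r' : '-';
    out[1] = (prot & EX_VM_PROT_WRITE)   ? 'w' : '-';
    out[2] = (prot & EX_VM_PROT_EXECUTE) ? 'x' : '-';
    out[3] = '\0';
}

/* Compares a fixed 16-byte segname/sectname field with a C string.
 * strncmp with a bound of 16 never reads past the end of the field. */
static int name_equals(const char field[16], const char *name) {
    assert(field != NULL && name != NULL);
    return strncmp(field, name, 16) == 0;
}
```

An initprot of VM_PROT_READ | VM_PROT_EXECUTE, the value described for the code segment, renders as r-x.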
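For more details see the Mac OS X ABI Mach-O File Format Reference
Launch process
Starting from exec()
main() is the entry point of the program, but before the program starts the system calls exec(). The kernel maps the application into a new address space whose starting position is random on every launch (because of ASLR). It then marks the range from address 0x000000 up to that starting position as not readable, not writable and not executable. For a 32-bit process this range is at least 4KB; for a 64-bit process it is at least 4GB. NULL pointer dereferences and pointer truncation errors are both caught by it.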
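The 4GB figure for 64-bit processes is what makes truncation bugs detectable: any pointer that has lost its upper 32 bits necessarily lands inside the inaccessible range. The sketch below models that reasoning; struct ex_pagezero and both functions are illustrative only and do not touch real memory protection.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the inaccessible range [0, size). */
struct ex_pagezero {
    uint64_t size;
};

/* Non-zero when an access to addr would fault inside that range. */
static int hits_pagezero(const struct ex_pagezero *pz, uint64_t addr) {
    assert(pz != NULL);
    return addr < pz->size;
}

/* What a 64-bit pointer becomes after being stored in a 32-bit integer. */
static uint64_t truncate_to_32(uint64_t ptr) {
    return (uint32_t)ptr;
}
```

With size set to 4GB (1 << 32), both a NULL pointer and any truncated pointer are reported as hits, while the original untruncated stack or heap address is not.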
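dyld
Introduction
dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems. After the kernel has finished preparing the program, dyld takes over the remaining work. The source code can be downloaded.
Shared cache
On iOS, the dynamic libraries every program depends on have to be loaded into memory one by one by dyld (located at /usr/lib/dyld). Many system libraries, however, are used by almost every program, and loading them again for every launch would make launches slow. The shared cache mechanism was introduced to speed up launch and improve performance: all default dynamic libraries are merged into one large cache file, stored under /System/Library/Caches/com.apple.dyld/, with a separate file per architecture.
dyld loading process
__dyld_start
In __dyld_start, found in the dyld source file dyldStartup.s, there is a call instruction which, according to the comment, jumps to the dyldbootstrap::start() function: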
# call dyldbootstrap::start(app_mh, argc, argv, slide, dyld_mh, &startGlue)
movl %edx,(%esp) # param1 = app_mh
movl 4(%ebp),%eax
movl %eax,4(%esp) # param2 = argc
lea 8(%ebp),%eax
movl %eax,8(%esp) # param3 = argv
movl %ebx,12(%esp) # param4 = slide
movl %ecx,16(%esp) # param5 = actual load address
lea 28(%esp),%eax
movl %eax,20(%esp) # param6 = &startGlue
call __ZN13dyldbootstrap5startEPK12macho_headeriPPKclS2_Pm
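The dyldbootstrap::start() function does much of dyld's own initialization, including:
- rebaseDyld(): rebases dyld itself
- mach_init(): initializes Mach messaging
- __guard_setup(): sets up stack overflow protection (the stack canary)
Once initialization is done, this function calls dyld::_main() and passes its return value back to __dyld_start, which then calls the real main() function. The implementation of dyldbootstrap::start() can be found in dyldInitialization.cpp: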
//
// This is code to bootstrap dyld. This work in normally done for a program by dyld and crt.
// In dyld we have to do this manually.
//
uintptr_t start(const struct macho_header* appsMachHeader, int argc, const char* argv[],
intptr_t slide, const struct macho_header* dyldsMachHeader,
uintptr_t* startGlue)
{
// if kernel had to slide dyld, we need to fix up load sensitive locations
// we have to do this before using any global variables
slide = slideOfMainExecutable(dyldsMachHeader);
bool shouldRebase = slide != 0;
#if __has_feature(ptrauth_calls)
shouldRebase = true;
#endif
if ( shouldRebase ) {
rebaseDyld(dyldsMachHeader, slide);
}
// allow dyld to use mach messaging
mach_init();
// kernel sets up env pointer to be just past end of agv array
const char** envp = &argv[argc+1];
// kernel sets up apple pointer to be just past end of envp array
const char** apple = envp;
while(*apple != NULL) { ++apple; }
++apple;
// set up random value for stack canary
__guard_setup(apple);
#if DYLD_INITIALIZER_SUPPORT
// run all C++ initializers inside dyld
runDyldInitializers(dyldsMachHeader, slide, argc, argv, envp, apple);
#endif
// now that we are done bootstrapping dyld, call dyld's main
uintptr_t appsSlide = slideOfMainExecutable(appsMachHeader);
return dyld::_main(appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
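The pointer arithmetic that start() uses to find envp and apple relies on the kernel placing three NULL-terminated arrays back to back: argv, then envp, then apple. The following sketch reproduces just that step on an ordinary array so it can be tried in isolation; struct ex_vectors and locate_vectors are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Result of the search: where the two trailing vectors begin. */
struct ex_vectors {
    const char **envp;
    const char **apple;
};

/* Assumes the layout argv[0..argc-1], NULL, envp..., NULL, apple..., NULL. */
static struct ex_vectors locate_vectors(int argc, const char **argv) {
    struct ex_vectors v;

    assert(argv != NULL && argc >= 0);
    v.envp = &argv[argc + 1];   /* skip the arguments and their NULL */
    v.apple = v.envp;
    while (*v.apple != NULL)    /* walk to the NULL that ends envp */
        ++v.apple;
    ++v.apple;                  /* apple starts right after it */
    return v;
}
```

For a block such as {"prog", "-v", NULL, "HOME=/", NULL, "executable_path=/prog", NULL} with argc equal to 2, envp points at index 3 and apple at index 5.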
dyld::_main()
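References: dyld explained; Analysis of the dylib loading process; XNU and dyld source analysis: how Mach-O files and dynamic libraries are loaded (part 1); XNU and dyld source analysis: how Mach-O files and dynamic libraries are loaded (part 2)
The code is as follows: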
//
// Entry point for dyld. The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide,
int argc, const char* argv[], const char* envp[], const char* apple[],
uintptr_t* startGlue) {
if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
launchTraceID = dyld3::kdebug_trace_dyld_duration_start(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, (uint64_t)mainExecutableMH, 0, 0);
}
//------------- Step 1: set up the runtime environment -------------
// Grab the cdHash of the main executable from the environment
uint8_t mainExecutableCDHashBuffer[20];
const uint8_t* mainExecutableCDHash = nullptr;
if ( hexToBytes(_simple_getenv(apple, "executable_cdhash"), 40, mainExecutableCDHashBuffer) ) {
// get the cdHash of the main executable
mainExecutableCDHash = mainExecutableCDHashBuffer;
}
// Trace dyld's load
notifyKernelAboutImage((macho_header*)&__dso_handle, _simple_getenv(apple, "dyld_file"));
#if !TARGET_IPHONE_SIMULATOR
// Trace the main executable's load
notifyKernelAboutImage(mainExecutableMH, _simple_getenv(apple, "executable_file"));
#endif
uintptr_t result = 0;
// record the macho_header of the main executable
sMainExecutableMachHeader = mainExecutableMH;
// record the slide of the main executable
sMainExecutableSlide = mainExecutableSlide;
#if __MAC_OS_X_VERSION_MIN_REQUIRED
// if this is host dyld, check to see if iOS simulator is being run
const char* rootPath = _simple_getenv(envp, "DYLD_ROOT_PATH");
if ( (rootPath != NULL) ) {
// look to see if simulator has its own dyld
char simDyldPath[PATH_MAX];
strlcpy(simDyldPath, rootPath, PATH_MAX);
strlcat(simDyldPath, "/usr/lib/dyld_sim", PATH_MAX);
int fd = my_open(simDyldPath, O_RDONLY, 0);
if ( fd != -1 ) {
const char* errMessage = useSimulatorDyld(fd, mainExecutableMH, simDyldPath, argc, argv, envp, apple, startGlue, &result);
if ( errMessage != NULL )
halt(errMessage);
return result;
}
}
#endif
CRSetCrashLogMessage("dyld: launch started");
// set up the link context
setContext(mainExecutableMH, argc, argv, envp, apple);
// Pickup the pointer to the exec path.
// get the path of the main executable
sExecPath = _simple_getenv(apple, "executable_path");
// <rdar://problem/13868260> Remove interim apple[0] transition code from dyld
if (!sExecPath) sExecPath = apple[0];
if ( sExecPath[0] != '/' ) {
// have relative path, use cwd to make absolute
char cwdbuff[MAXPATHLEN];
if ( getcwd(cwdbuff, MAXPATHLEN) != NULL ) {
// maybe use static buffer to avoid calling malloc so early...
char* s = new char[strlen(cwdbuff) + strlen(sExecPath) + 2];
strcpy(s, cwdbuff);
strcat(s, "/");
strcat(s, sExecPath);
sExecPath = s;
}
}
// Remember short name of process for later logging
// get the process name
sExecShortName = ::strrchr(sExecPath, '/');
if ( sExecShortName != NULL )
++sExecShortName;
else
sExecShortName = sExecPath;
// configure process restrictions
configureProcessRestrictions(mainExecutableMH);
#if __MAC_OS_X_VERSION_MIN_REQUIRED
if ( !gLinkContext.allowEnvVarsPrint && !gLinkContext.allowEnvVarsPath && !gLinkContext.allowEnvVarsSharedCache ) {
pruneEnvironmentVariables(envp, &apple);
// set again because envp and apple may have changed or moved
setContext(mainExecutableMH, argc, argv, envp, apple);
}
else
#endif
{
// check the environment variables
checkEnvironmentVariables(envp);
defaultUninitializedFallbackPaths(envp);
}
#if __MAC_OS_X_VERSION_MIN_REQUIRED
if ( ((dyld3::MachOFile*)mainExecutableMH)->supportsPlatform(dyld3::Platform::iOSMac)
&& !((dyld3::MachOFile*)mainExecutableMH)->supportsPlatform(dyld3::Platform::macOS)) {
gLinkContext.rootPaths = parseColonList("/System/iOSSupport", NULL);
gLinkContext.marzipan = true;
if ( sEnv.DYLD_FALLBACK_LIBRARY_PATH == sLibraryFallbackPaths )
sEnv.DYLD_FALLBACK_LIBRARY_PATH = sRestrictedLibraryFallbackPaths;
if ( sEnv.DYLD_FALLBACK_FRAMEWORK_PATH == sFrameworkFallbackPaths )
sEnv.DYLD_FALLBACK_FRAMEWORK_PATH = sRestrictedFrameworkFallbackPaths;
}
#endif
// if DYLD_PRINT_OPTS is set, call printOptions() to print the arguments
if ( sEnv.DYLD_PRINT_OPTS )
printOptions(argv);
// if DYLD_PRINT_ENV is set, call printEnvironmentVariables() to print the environment variables
if ( sEnv.DYLD_PRINT_ENV )
printEnvironmentVariables(envp);
// get the host architecture information
getHostInfo(mainExecutableMH, mainExecutableSlide);
//------------- Step 2: load the shared cache -------------
// load shared cache
// check whether the shared region is disabled; on iOS it must be enabled
checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide);
#if TARGET_IPHONE_SIMULATOR
// <HACK> until <rdar://30773711> is fixed
gLinkContext.sharedRegionMode = ImageLoader::kUsePrivateSharedRegion;
// </HACK>
#endif
if ( gLinkContext.sharedRegionMode != ImageLoader::kDontUseSharedRegion ) {
mapSharedCache();
}
bool cacheCompatible = (sSharedCacheLoadInfo.loadAddress == nullptr) || (sSharedCacheLoadInfo.loadAddress->header.formatVersion == dyld3::closure::kFormatVersion);
if ( cacheCompatible && (sEnableClosures || inWhiteList(sExecPath)) ) {
const dyld3::closure::LaunchClosure* mainClosure = nullptr;
dyld3::closure::LoadedFileInfo mainFileInfo;
mainFileInfo.fileContent = mainExecutableMH;
mainFileInfo.path = sExecPath;
// FIXME: If we are saving this closure, this slice offset/length is probably wrong in the case of FAT files.
mainFileInfo.sliceOffset = 0;
mainFileInfo.sliceLen = std::numeric_limits<__typeof(mainFileInfo.sliceLen)>::max();
struct stat mainExeStatBuf;
if ( ::stat(sExecPath, &mainExeStatBuf) == 0 ) {
mainFileInfo.inode = mainExeStatBuf.st_ino;
mainFileInfo.mtime = mainExeStatBuf.st_mtime;
}
// check for closure in cache first
if ( sSharedCacheLoadInfo.loadAddress != nullptr ) {
mainClosure = sSharedCacheLoadInfo.loadAddress->findClosure(sExecPath);
if ( gLinkContext.verboseWarnings && (mainClosure != nullptr) )
dyld::log("dyld: found closure %p (size=%lu) in dyld shared cache\n", mainClosure, mainClosure->size());
}
#if !TARGET_IPHONE_SIMULATOR
if ( (mainClosure == nullptr) || !closureValid(mainClosure, mainFileInfo, mainExecutableCDHash, true, envp) ) {
mainClosure = nullptr;
if ( sEnableClosures || isStagedApp((dyld3::MachOFile*)mainExecutableMH, sExecPath) ) {
// if forcing closures, and no closure in cache, or it is invalid, check for cached closure
mainClosure = findCachedLaunchClosure(mainExecutableCDHash, mainFileInfo, envp);
if ( mainClosure == nullptr ) {
// if no cached closure found, build new one
mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp);
}
}
}
#endif
// try using launch closure
if ( mainClosure != nullptr ) {
CRSetCrashLogMessage("dyld3: launch started");
bool launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
mainExecutableSlide, argc, argv, envp, apple, &result, startGlue);
#if !TARGET_IPHONE_SIMULATOR
if ( !launched ) {
// closure is out of date, build new one
mainClosure = buildLaunchClosure(mainExecutableCDHash, mainFileInfo, envp);
if ( mainClosure != nullptr ) {
launched = launchWithClosure(mainClosure, sSharedCacheLoadInfo.loadAddress, (dyld3::MachOLoaded*)mainExecutableMH,
mainExecutableSlide, argc, argv, envp, apple, &result, startGlue);
}
}
#endif
if ( launched ) {
#if __has_feature(ptrauth_calls)
// start() calls the result pointer as a function pointer so we need to sign it.
result = (uintptr_t)__builtin_ptrauth_sign_unauthenticated((void*)result, 0, 0);
#endif
if (sSkipMain)
result = (uintptr_t)&fake_main;
return result;
}
else {
if ( gLinkContext.verboseWarnings )
dyld::log("dyld: unable to use closure %p\n", mainClosure);
}
}
}
else {
if ( gLinkContext.verboseWarnings )
dyld::log("dyld: not using closure because shared cache format version does not match dyld's\n");
}
// could not use closure info, launch old way
// install gdb notifier
stateToHandlers(dyld_image_state_dependents_mapped, sBatchHandlers)->push_back(notifyGDB);
stateToHandlers(dyld_image_state_mapped, sSingleHandlers)->push_back(updateAllImages);
// make initial allocations large enough that it is unlikely to need to be re-alloced
sImageRoots.reserve(16);
sAddImageCallbacks.reserve(4);
sRemoveImageCallbacks.reserve(4);
sAddLoadImageCallbacks.reserve(4);
sImageFilesNeedingTermination.reserve(16);
sImageFilesNeedingDOFUnregistration.reserve(8);
#if !TARGET_IPHONE_SIMULATOR
#ifdef WAIT_FOR_SYSTEM_ORDER_HANDSHAKE
// <rdar://problem/6849505> Add gating mechanism to dyld support system order file generation process
WAIT_FOR_SYSTEM_ORDER_HANDSHAKE(dyld::gProcessInfo->systemOrderFlag);
#endif
#endif
try {
// add dyld itself to UUID list
addDyldImageToUUIDList();
#if SUPPORT_ACCELERATE_TABLES
#if __arm64e__
// Disable accelerator tables when we have threaded rebase/bind, which is arm64e executables only for now.
if (sMainExecutableMachHeader->cpusubtype == CPU_SUBTYPE_ARM64_E)
sDisableAcceleratorTables = true;
#endif
bool mainExcutableAlreadyRebased = false;
if ( (sSharedCacheLoadInfo.loadAddress != nullptr) && !dylibsCanOverrideCache() && !sDisableAcceleratorTables && (sSharedCacheLoadInfo.loadAddress->header.accelerateInfoAddr != 0) ) {
struct stat statBuf;
if ( ::stat(IPHONE_DYLD_SHARED_CACHE_DIR "no-dyld2-accelerator-tables", &statBuf) != 0 )
sAllCacheImagesProxy = ImageLoaderMegaDylib::makeImageLoaderMegaDylib(&sSharedCacheLoadInfo.loadAddress->header, sSharedCacheLoadInfo.slide, mainExecutableMH, gLinkContext);
}
reloadAllImages:
#endif
CRSetCrashLogMessage(sLoadingCrashMessage);
//------------- Step 3: instantiate the main executable -------------
// instantiate ImageLoader for main executable
sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath);
gLinkContext.mainExecutable = sMainExecutable;
gLinkContext.mainExecutableCodeSigned = hasCodeSignatureLoadCommand(mainExecutableMH);
#if TARGET_IPHONE_SIMULATOR
// check main executable is not too new for this OS
{
if ( ! isSimulatorBinary((uint8_t*)mainExecutableMH, sExecPath) ) {
throwf("program was built for a platform that is not supported by this runtime");
}
uint32_t mainMinOS = sMainExecutable->minOSVersion();
// dyld is always built for the current OS, so we can get the current OS version
// from the load command in dyld itself.
uint32_t dyldMinOS = ImageLoaderMachO::minOSVersion((const mach_header*)&__dso_handle);
if ( mainMinOS > dyldMinOS ) {
#if TARGET_OS_WATCH
throwf("app was built for watchOS %d.%d which is newer than this simulator %d.%d",
mainMinOS >> 16, ((mainMinOS >> 8) & 0xFF),
dyldMinOS >> 16, ((dyldMinOS >> 8) & 0xFF));
#elif TARGET_OS_TV
throwf("app was built for tvOS %d.%d which is newer than this simulator %d.%d",
mainMinOS >> 16, ((mainMinOS >> 8) & 0xFF),
dyldMinOS >> 16, ((dyldMinOS >> 8) & 0xFF));
#else
throwf("app was built for iOS %d.%d which is newer than this simulator %d.%d",
mainMinOS >> 16, ((mainMinOS >> 8) & 0xFF),
dyldMinOS >> 16, ((dyldMinOS >> 8) & 0xFF));
#endif
}
}
#endif
#if __MAC_OS_X_VERSION_MIN_REQUIRED
// <rdar://problem/22805519> be less strict about old mach-o binaries
uint32_t mainSDK = sMainExecutable->sdkVersion();
gLinkContext.strictMachORequired = (mainSDK >= DYLD_MACOSX_VERSION_10_12) || gLinkContext.allowInsertFailures;
#else
// simulators, iOS, tvOS, and watchOS are always strict
gLinkContext.strictMachORequired = true;
#endif
#if SUPPORT_ACCELERATE_TABLES
sAllImages.reserve((sAllCacheImagesProxy != NULL) ? 16 : INITIAL_IMAGE_COUNT);
#else
sAllImages.reserve(INITIAL_IMAGE_COUNT);
#endif
// Now that shared cache is loaded, setup an versioned dylib overrides
#if SUPPORT_VERSIONED_PATHS
checkVersionedPaths();
#endif
// dyld_all_image_infos image list does not contain dyld
// add it as dyldPath field in dyld_all_image_infos
// for simulator, dyld_sim is in image list, need host dyld added
#if TARGET_IPHONE_SIMULATOR
// get path of host dyld from table of syscall vectors in host dyld
void* addressInDyld = gSyscallHelpers;
#else
// get path of dyld itself
void* addressInDyld = (void*)&__dso_handle;
#endif
char dyldPathBuffer[MAXPATHLEN+1];
int len = proc_regionfilename(getpid(), (uint64_t)(long)addressInDyld, dyldPathBuffer, MAXPATHLEN);
if ( len > 0 ) {
dyldPathBuffer[len] = '\0'; // proc_regionfilename() does not zero terminate returned string
if ( strcmp(dyldPathBuffer, gProcessInfo->dyldPath) != 0 )
gProcessInfo->dyldPath = strdup(dyldPathBuffer);
}
//------------- Step 4: load inserted dynamic libraries -------------
// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
// record count of inserted libraries so that a flat search will look at
// inserted libraries, then main, then others.
// record the number of inserted dynamic libraries
sInsertedDylibCount = sAllImages.size()-1;
//------------- Step 5: link the main executable -------------
// link main executable
gLinkContext.linkingMainExecutable = true;
#if SUPPORT_ACCELERATE_TABLES
if ( mainExcutableAlreadyRebased ) {
// previous link() on main executable has already adjusted its internal pointers for ASLR
// work around that by rebasing by inverse amount
sMainExecutable->rebase(gLinkContext, -mainExecutableSlide);
}
#endif
link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
sMainExecutable->setNeverUnloadRecursive();
if ( sMainExecutable->forceFlat() ) {
gLinkContext.bindFlat = true;
gLinkContext.prebindUsage = ImageLoader::kUseNoPrebinding;
}
//------------- Step 6: link inserted dynamic libraries -------------
// link any inserted libraries
// do this after linking main executable so that any dylibs pulled in by inserted
// dylibs (e.g. libSystem) will not be in front of dylibs the program uses
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
image->setNeverUnloadRecursive();
}
// only INSERTED libraries can interpose
// register interposing info after all inserted libraries are bound so chaining works
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->registerInterposing(gLinkContext);
}
}
// <rdar://problem/19315404> dyld should support interposition even without DYLD_INSERT_LIBRARIES
for (long i=sInsertedDylibCount+1; i < sAllImages.size(); ++i) {
ImageLoader* image = sAllImages[i];
if ( image->inSharedCache() )
continue;
image->registerInterposing(gLinkContext);
}
#if SUPPORT_ACCELERATE_TABLES
if ( (sAllCacheImagesProxy != NULL) && ImageLoader::haveInterposingTuples() ) {
// Accelerator tables cannot be used with implicit interposing, so relaunch with accelerator tables disabled
ImageLoader::clearInterposingTuples();
// unmap all loaded dylibs (but not main executable)
for (long i=1; i < sAllImages.size(); ++i) {
ImageLoader* image = sAllImages[i];
if ( image == sMainExecutable )
continue;
if ( image == sAllCacheImagesProxy )
continue;
image->setCanUnload();
ImageLoader::deleteImage(image);
}
// note: we don't need to worry about inserted images because if DYLD_INSERT_LIBRARIES was set we would not be using the accelerator table
sAllImages.clear();
sImageRoots.clear();
sImageFilesNeedingTermination.clear();
sImageFilesNeedingDOFUnregistration.clear();
sAddImageCallbacks.clear();
sRemoveImageCallbacks.clear();
sAddLoadImageCallbacks.clear();
sDisableAcceleratorTables = true;
sAllCacheImagesProxy = NULL;
sMappedRangesStart = NULL;
mainExcutableAlreadyRebased = true;
gLinkContext.linkingMainExecutable = false;
resetAllImages();
goto reloadAllImages;
}
#endif
// apply interposing to initial set of images
for(int i=0; i < sImageRoots.size(); ++i) {
sImageRoots[i]->applyInterposing(gLinkContext);
}
ImageLoader::applyInterposingToDyldCache(gLinkContext);
gLinkContext.linkingMainExecutable = false;
// Bind and notify for the main executable now that interposing has been registered
uint64_t bindMainExecutableStartTime = mach_absolute_time();
sMainExecutable->recursiveBindWithAccounting(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
uint64_t bindMainExecutableEndTime = mach_absolute_time();
ImageLoaderMachO::fgTotalBindTime += bindMainExecutableEndTime - bindMainExecutableStartTime;
gLinkContext.notifyBatch(dyld_image_state_bound, false);
// Bind and notify for the inserted images now interposing has been registered
if ( sInsertedDylibCount > 0 ) {
for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
ImageLoader* image = sAllImages[i+1];
image->recursiveBind(gLinkContext, sEnv.DYLD_BIND_AT_LAUNCH, true);
}
}
//------------- Step 7: perform weak symbol binding -------------
// <rdar://problem/12186933> do weak binding only after all inserted images linked
sMainExecutable->weakBind(gLinkContext);
// If cache has branch island dylibs, tell debugger about them
if ( (sSharedCacheLoadInfo.loadAddress != NULL) && (sSharedCacheLoadInfo.loadAddress->header.mappingOffset >= 0x78) && (sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset != 0) ) {
uint32_t count = sSharedCacheLoadInfo.loadAddress->header.branchPoolsCount;
dyld_image_info info[count];
const uint64_t* poolAddress = (uint64_t*)((char*)sSharedCacheLoadInfo.loadAddress + sSharedCacheLoadInfo.loadAddress->header.branchPoolsOffset);
// <rdar://problem/20799203> empty branch pools can be in development cache
if ( ((mach_header*)poolAddress)->magic == sMainExecutableMachHeader->magic ) {
for (int poolIndex=0; poolIndex < count; ++poolIndex) {
uint64_t poolAddr = poolAddress[poolIndex] + sSharedCacheLoadInfo.slide;
info[poolIndex].imageLoadAddress = (mach_header*)(long)poolAddr;
info[poolIndex].imageFilePath = "dyld_shared_cache_branch_islands";
info[poolIndex].imageFileModDate = 0;
}
// add to all_images list
addImagesToAllImages(count, info);
// tell gdb about new branch island images
gProcessInfo->notification(dyld_image_adding, count, info);
}
}
CRSetCrashLogMessage("dyld: launch, running initializers");
#if SUPPORT_OLD_CRT_INITIALIZATION
// Old way is to run initializers via a callback from crt1.o
if ( ! gRunInitializersOldWay )
initializeMainExecutable();
#else
// run all initializers
//------------- Step 8: run initializers -------------
initializeMainExecutable();
#endif
// notify any montoring proccesses that this process is about to enter main()
if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
}
notifyMonitoringDyldMain();
//------------- Step 9: find the entry point and return -------------
// find entry point for main executable
result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
if ( result != 0 ) {
// main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
else
halt("libdyld.dylib support not present for LC_MAIN");
}
else {
// main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
*startGlue = 0;
}
#if __has_feature(ptrauth_calls)
// start() calls the result pointer as a function pointer so we need to sign it.
result = (uintptr_t)__builtin_ptrauth_sign_unauthenticated((void*)result, 0, 0);
#endif
}
catch(const char* message) {
syncAllImages();
halt(message);
}
catch(...) {
dyld::log("dyld: launch failed\n");
}
CRSetCrashLogMessage("dyld2 mode");
if (sSkipMain) {
if (dyld3::kdebug_trace_dyld_enabled(DBG_DYLD_TIMING_LAUNCH_EXECUTABLE)) {
dyld3::kdebug_trace_dyld_duration_end(launchTraceID, DBG_DYLD_TIMING_LAUNCH_EXECUTABLE, 0, 0, 2);
}
result = (uintptr_t)&fake_main;
*startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
}
return result;
}
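Several of the first lines of _main() read values such as executable_path out of the apple vector and then derive a short process name from the path. As a rough sketch of what such lookups involve, the code below searches a NULL-terminated vector of "key=value" strings and extracts the text after the last slash. It is a stand-in written for this article, not the implementation of _simple_getenv(); vector_lookup and short_name are invented names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Returns the value of the first "key=value" entry whose key matches,
 * or NULL when there is none. vec must end with a NULL pointer. */
static const char *vector_lookup(const char **vec, const char *key) {
    size_t klen;

    assert(vec != NULL && key != NULL);
    klen = strlen(key);
    for (; *vec != NULL; ++vec) {
        /* the '=' check rejects keys that merely share a prefix */
        if (strncmp(*vec, key, klen) == 0 && (*vec)[klen] == '=')
            return *vec + klen + 1;
    }
    return NULL;
}

/* Same idea as the sExecShortName computation: the part after the last
 * '/', or the whole string when there is no '/'. */
static const char *short_name(const char *path) {
    const char *slash;

    assert(path != NULL);
    slash = strrchr(path, '/');
    return slash != NULL ? slash + 1 : path;
}
```

Given an entry "executable_path=/var/app/Demo", the lookup yields "/var/app/Demo" and the short name is "Demo".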
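Setting up the runtime environment
This step mainly sets up the run arguments, environment variables, and so on. At the start, the code assigns the parameter mainExecutableMH to sMainExecutableMachHeader. This is a macho_header structure representing the Mach-O header of the main executable; from the header the loader can parse the whole Mach-O file. Next, setContext() is called to set up the context, including callbacks, arguments, and flags. The callbacks are all implemented inside dyld itself; for example, loadLibrary() actually calls libraryLocator(), which is responsible for loading dynamic libraries.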
static void setContext(const macho_header* mainExecutableMH, int argc, const char* argv[], const char* envp[], const char* apple[]) {
gLinkContext.loadLibrary = &libraryLocator;
gLinkContext.terminationRecorder = &terminationRecorder;
...
}
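The pattern used by setContext() is a struct of function pointers that is filled in once and called through later, so the code that links images does not need to know which function actually locates a library. A heavily reduced sketch of that pattern follows. Everything in it (struct ex_link_context, ex_library_locator, ex_set_context, and the rule that only absolute paths succeed) is invented for illustration and does not reflect the real signatures in dyld.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, reduced link context: one callback slot. */
struct ex_link_context {
    int (*loadLibrary)(const char *path);
};

/* Stand-in locator: pretends that only absolute paths can be loaded. */
static int ex_library_locator(const char *path) {
    return path != NULL && path[0] == '/';
}

/* Installs the callback, mirroring the assignment style of setContext(). */
static void ex_set_context(struct ex_link_context *ctx) {
    assert(ctx != NULL);
    ctx->loadLibrary = &ex_library_locator;
}
```

After ex_set_context(&ctx), a caller simply writes ctx.loadLibrary(path) and reaches the locator through the pointer.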
configureProcessRestrictions()
Configures whether the process is restricted, based on the policy returned by AMFI (AppleMobileFileIntegrity).
static void configureProcessRestrictions(const macho_header* mainExecutableMH) {
uint64_t amfiInputFlags = 0;
#if TARGET_IPHONE_SIMULATOR
amfiInputFlags |= AMFI_DYLD_INPUT_PROC_IN_SIMULATOR;
#elif __MAC_OS_X_VERSION_MIN_REQUIRED
if ( hasRestrictedSegment(mainExecutableMH) )
amfiInputFlags |= AMFI_DYLD_INPUT_PROC_HAS_RESTRICT_SEG;
#elif __IPHONE_OS_VERSION_MIN_REQUIRED
if ( isFairPlayEncrypted(mainExecutableMH) )
amfiInputFlags |= AMFI_DYLD_INPUT_PROC_IS_ENCRYPTED;
#endif
uint64_t amfiOutputFlags = 0;
if ( amfi_check_dyld_policy_self(amfiInputFlags, &amfiOutputFlags) == 0 ) {
gLinkContext.allowAtPaths = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_AT_PATH);
gLinkContext.allowEnvVarsPrint = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_PRINT_VARS);
gLinkContext.allowEnvVarsPath = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_PATH_VARS);
gLinkContext.allowEnvVarsSharedCache = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_CUSTOM_SHARED_CACHE);
gLinkContext.allowClassicFallbackPaths = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_FALLBACK_PATHS);
gLinkContext.allowInsertFailures = (amfiOutputFlags & AMFI_DYLD_OUTPUT_ALLOW_FAILED_LIBRARY_INSERTION);
}
else {
#if __MAC_OS_X_VERSION_MIN_REQUIRED
// support chrooting from old kernel
bool isRestricted = false;
bool libraryValidation = false;
// any processes with setuid or setgid bit set or with __RESTRICT segment is restricted
if ( issetugid() || hasRestrictedSegment(mainExecutableMH) ) {
isRestricted = true;
}
bool usingSIP = (csr_check(CSR_ALLOW_TASK_FOR_PID) != 0);
uint32_t flags;
if ( csops(0, CS_OPS_STATUS, &flags, sizeof(flags)) != -1 ) {
// On OS X CS_RESTRICT means the program was signed with entitlements
if ( ((flags & CS_RESTRICT) == CS_RESTRICT) && usingSIP ) {
isRestricted = true;
}
// Library Validation loosens searching but requires everything to be code signed
if ( flags & CS_REQUIRE_LV ) {
isRestricted = false;
libraryValidation = true;
}
}
gLinkContext.allowAtPaths = !isRestricted;
gLinkContext.allowEnvVarsPrint = !isRestricted;
gLinkContext.allowEnvVarsPath = !isRestricted;
gLinkContext.allowEnvVarsSharedCache = !libraryValidation || !usingSIP;
gLinkContext.allowClassicFallbackPaths = !isRestricted;
gLinkContext.allowInsertFailures = false;
#else
halt("amfi_check_dyld_policy_self() failed\n");
#endif
}
}
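The flag handling above boils down to turning one bitmask into a set of booleans. A minimal sketch of that pattern follows; the flag names and values are hypothetical stand-ins, since the real AMFI_DYLD_OUTPUT_* values are not shown in this article.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical output flags standing in for AMFI_DYLD_OUTPUT_* (values are assumptions).
constexpr std::uint64_t kAllowAtPath    = 1u << 0;
constexpr std::uint64_t kAllowPathVars  = 1u << 1;
constexpr std::uint64_t kAllowPrintVars = 1u << 2;

struct Permissions {
    bool allowAtPaths;
    bool allowEnvVarsPath;
    bool allowEnvVarsPrint;
};

// Derive one boolean per permission from the policy bitmask.
Permissions permissionsFromFlags(std::uint64_t outputFlags) {
    Permissions p;
    p.allowAtPaths      = (outputFlags & kAllowAtPath) != 0;
    p.allowEnvVarsPath  = (outputFlags & kAllowPathVars) != 0;
    p.allowEnvVarsPrint = (outputFlags & kAllowPrintVars) != 0;
    return p;
}
```

Calling permissionsFromFlags(kAllowAtPath | kAllowPrintVars) would allow @-paths and DYLD_PRINT_* variables while leaving path variables disabled.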
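checkEnvironmentVariables()
Checks the environment variables. If neither path variables nor print variables are allowed, it returns immediately. Otherwise every DYLD_* entry is split into key and value and handed to processDyldEnvironmentVariable(), and LD_LIBRARY_PATH is parsed as a colon-separated list.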
static void checkEnvironmentVariables(const char* envp[])
{
if ( !gLinkContext.allowEnvVarsPath && !gLinkContext.allowEnvVarsPrint )
return;
const char** p;
for(p = envp; *p != NULL; p++) {
const char* keyEqualsValue = *p;
if ( strncmp(keyEqualsValue, "DYLD_", 5) == 0 ) {
const char* equals = strchr(keyEqualsValue, '=');
if ( equals != NULL ) {
strlcat(sLoadingCrashMessage, "\n", sizeof(sLoadingCrashMessage));
strlcat(sLoadingCrashMessage, keyEqualsValue, sizeof(sLoadingCrashMessage));
const char* value = &equals[1];
const size_t keyLen = equals-keyEqualsValue;
char key[keyLen+1];
strncpy(key, keyEqualsValue, keyLen);
key[keyLen] = '\0';
if ( (strncmp(key, "DYLD_PRINT_", 11) == 0) && !gLinkContext.allowEnvVarsPrint )
continue;
processDyldEnvironmentVariable(key, value, NULL);
}
}
else if ( strncmp(keyEqualsValue, "LD_LIBRARY_PATH=", 16) == 0 ) {
const char* path = &keyEqualsValue[16];
sEnv.LD_LIBRARY_PATH = parseColonList(path, NULL);
}
}
#if SUPPORT_LC_DYLD_ENVIRONMENT
checkLoadCommandEnvironmentVariables();
#endif // SUPPORT_LC_DYLD_ENVIRONMENT
#if SUPPORT_ROOT_PATH
// <rdar://problem/11281064> DYLD_IMAGE_SUFFIX and DYLD_ROOT_PATH cannot be used together
if ( (gLinkContext.imageSuffix != NULL && *gLinkContext.imageSuffix != NULL) && (gLinkContext.rootPaths != NULL) ) {
dyld::warn("Ignoring DYLD_IMAGE_SUFFIX because DYLD_ROOT_PATH is used.\n");
gLinkContext.imageSuffix = NULL; // this leaks allocations from parseColonList
}
#endif
}
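To make the key/value splitting in the loop above easier to follow, here is a simplified sketch that only collects the DYLD_* entries. collectDyldVars is a hypothetical helper, not a dyld function, and it uses std::string instead of the stack buffer dyld uses.

```cpp
#include <cassert>
#include <cstring>
#include <map>
#include <string>

// Collect KEY -> VALUE for every "DYLD_*=..." entry of a NULL-terminated envp array.
// When allowPrint is false, DYLD_PRINT_* keys are skipped, mirroring the check above.
std::map<std::string, std::string> collectDyldVars(const char* const envp[], bool allowPrint) {
    std::map<std::string, std::string> vars;
    for (const char* const* p = envp; *p != nullptr; ++p) {
        const char* entry = *p;
        if (std::strncmp(entry, "DYLD_", 5) != 0)
            continue;                          // not a dyld variable
        const char* equals = std::strchr(entry, '=');
        if (equals == nullptr)
            continue;                          // no '=': not a key=value pair
        std::string key(entry, equals - entry);
        if (!allowPrint && key.compare(0, 11, "DYLD_PRINT_") == 0)
            continue;                          // print variables are not permitted
        vars[key] = std::string(equals + 1);
    }
    return vars;
}
```

With an environment of HOME=/root, DYLD_PRINT_OPTS=1 and DYLD_LIBRARY_PATH=/a:/b, the function keeps both DYLD_* entries when printing is allowed and only DYLD_LIBRARY_PATH when it is not.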
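Finally, getHostInfo() is called to obtain the architecture of the current host.
Several DYLD_* environment variables are consulted along the way:
// If DYLD_PRINT_OPTS is set, call printOptions() to print the arguments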
if ( sEnv.DYLD_PRINT_OPTS )
printOptions(argv);
// If DYLD_PRINT_ENV is set, call printEnvironmentVariables() to print the environment variables
if ( sEnv.DYLD_PRINT_ENV )
printEnvironmentVariables(envp);
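These environment variables only need to be configured in Xcode to take effect. In the scheme editor (Product > Scheme > Edit Scheme..., then Run > Arguments), add the environment variable DYLD_PRINT_OPTS and set its Value to 1, as shown in the figure.
Run the app from Xcode and the details are printed to the console: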
Loading the shared cache
static void checkSharedRegionDisable(const dyld3::MachOLoaded* mainExecutableMH, uintptr_t mainExecutableSlide) {
...
// iOS cannot run without shared region
}
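This step first calls checkSharedRegionDisable() to check whether the shared cache is disabled. The iOS part of that function consists of a single comment, from which we can infer that iOS cannot run without the shared cache.
Next, mapSharedCache() is called to load the shared cache; mapSharedCache() in turn calls loadDyldCache(). As the code shows, loading the shared cache falls into three cases:
-
Map the cache into the current process only: mapCachePrivate()
-
The shared cache is already mapped: nothing more to do
-
This is the first process to load the shared cache: mapCacheSystemWide()
loadDyldCache() is implemented as follows: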
bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results) {
results->loadAddress = 0;
results->slide = 0;
results->errorMessage = nullptr;
#if TARGET_IPHONE_SIMULATOR
// simulator only supports mmap()ing cache privately into process
return mapCachePrivate(options, results);
#else
if ( options.forcePrivate ) {
// mmap cache into this process only
return mapCachePrivate(options, results);
}
else {
// fast path: when cache is already mapped into shared region
bool hasError = false;
if ( reuseExistingCache(options, results) ) {
hasError = (results->errorMessage != nullptr);
} else {
// slow path: this is first process to load cache
hasError = mapCacheSystemWide(options, results);
}
return hasError;
}
#endif
}
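The three cases can be condensed into a small decision function. This is only a sketch of the branching: the enum and the boolean parameters are hypothetical, and the real code performs the mapping rather than returning a label.

```cpp
#include <cassert>

enum class CacheAction { MapPrivate, ReuseExisting, MapSystemWide };

// Mirror the branching of loadDyldCache(): the simulator and forcePrivate both
// lead to a private mapping; otherwise reuse an existing mapping if there is one.
CacheAction chooseCacheAction(bool isSimulator, bool forcePrivate, bool alreadyMapped) {
    if (isSimulator || forcePrivate)
        return CacheAction::MapPrivate;
    return alreadyMapped ? CacheAction::ReuseExisting : CacheAction::MapSystemWide;
}
```

Only the very first process after boot would take the MapSystemWide path; every later process takes the fast ReuseExisting path.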
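Instantiating the main executable
This step creates an ImageLoader for the main executable's Mach-O, which is already in memory. instantiateFromLoadedImage() first calls isCompatibleMachO() to check the magic, cputype, cpusubtype and related fields of the Mach-O header and decide whether the file is compatible. If it is, ImageLoaderMachO::instantiateMainExecutable() is called to instantiate the main executable's ImageLoader. The code: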
// The kernel maps in main executable before dyld gets control. We need to
// make an ImageLoader* for the already mapped in main executable.
static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path) {
// try mach-o loader
if ( isCompatibleMachO((const uint8_t*)mh, path) ) {
ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
addImage(image);
return (ImageLoaderMachO*)image;
}
throw "main executable not a known format";
}
ImageLoaderMachO::instantiateMainExecutable() first calls sniffLoadCommands() to collect several pieces of data:
-
compressed: if the Mach-O contains an LC_DYLD_INFO or LC_DYLD_INFO_ONLY load command, it uses the compressed format. Code excerpt:
switch (cmd->cmd) {
case LC_DYLD_INFO:
case LC_DYLD_INFO_ONLY:
if ( cmd->cmdsize != sizeof(dyld_info_command) )
throw "malformed mach-o image: LC_DYLD_INFO size wrong";
dyldInfoCmd = (struct dyld_info_command*)cmd;
// LC_DYLD_INFO or LC_DYLD_INFO_ONLY is present, so this is a compressed-format Mach-O
*compressed = true;
break;
...
}
-
segCount: counted from the LC_SEGMENT_COMMAND load commands. The error thrown here also shows that an image may have at most 255 segments. Code excerpt:
case LC_SEGMENT_COMMAND:
segCmd = (struct macho_segment_command*)cmd;
...
if ( segCmd->vmsize != 0 )
*segCount += 1;
if ( *segCount > 255 )
dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);
-
libCount: counted from the LC_LOAD_DYLIB, LC_LOAD_WEAK_DYLIB, LC_REEXPORT_DYLIB and LC_LOAD_UPWARD_DYLIB load commands. An image may depend on at most 4095 libraries. Code excerpt:
case LC_LOAD_DYLIB:
case LC_LOAD_WEAK_DYLIB:
case LC_REEXPORT_DYLIB:
case LC_LOAD_UPWARD_DYLIB:
*libCount += 1;
if ( *libCount > 4095 )
dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);
-
codeSigCmd: the code-signature load command, obtained by parsing LC_CODE_SIGNATURE. Code excerpt:
case LC_CODE_SIGNATURE:
*codeSigCmd = (struct linkedit_data_command*)cmd;
break;
-
encryptCmd: the segment encryption information, obtained from LC_ENCRYPTION_INFO and LC_ENCRYPTION_INFO_64. Code excerpt:
case LC_ENCRYPTION_INFO:
...
*encryptCmd = (encryption_info_command*)cmd;
break;
case LC_ENCRYPTION_INFO_64:
...
*encryptCmd = (encryption_info_command*)cmd;
break;
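The excerpts above all live inside one loop over the load commands. A simplified, platform-independent sketch of such a loop is shown below. The struct and constants are stand-ins for the definitions in <mach-o/loader.h>, sniff() and appendCommand() are hypothetical names, and unlike dyld the sketch counts every segment command without checking vmsize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Stand-ins for <mach-o/loader.h>, so the sketch builds on any platform.
struct LoadCommand { std::uint32_t cmd; std::uint32_t cmdsize; };
constexpr std::uint32_t kReqDyld         = 0x80000000u;       // LC_REQ_DYLD
constexpr std::uint32_t kLoadDylib       = 0x0c;              // LC_LOAD_DYLIB
constexpr std::uint32_t kLoadWeakDylib   = 0x18 | kReqDyld;   // LC_LOAD_WEAK_DYLIB
constexpr std::uint32_t kSegment64       = 0x19;              // LC_SEGMENT_64
constexpr std::uint32_t kReexportDylib   = 0x1f | kReqDyld;   // LC_REEXPORT_DYLIB
constexpr std::uint32_t kDyldInfo        = 0x22;              // LC_DYLD_INFO
constexpr std::uint32_t kDyldInfoOnly    = 0x22 | kReqDyld;   // LC_DYLD_INFO_ONLY
constexpr std::uint32_t kLoadUpwardDylib = 0x23 | kReqDyld;   // LC_LOAD_UPWARD_DYLIB

struct SniffResult { bool compressed = false; unsigned segCount = 0; unsigned libCount = 0; };

// Hypothetical helper: append a header-only load command to a byte buffer.
void appendCommand(std::vector<std::uint8_t>& buf, std::uint32_t cmd) {
    LoadCommand lc{cmd, static_cast<std::uint32_t>(sizeof(LoadCommand))};
    const std::uint8_t* p = reinterpret_cast<const std::uint8_t*>(&lc);
    buf.insert(buf.end(), p, p + sizeof(lc));
}

// Walk ncmds load commands stored back to back in buf and tally what is found.
SniffResult sniff(const std::vector<std::uint8_t>& buf, std::uint32_t ncmds) {
    SniffResult r;
    std::size_t offset = 0;
    for (std::uint32_t i = 0; i < ncmds; ++i) {
        LoadCommand lc;
        if (buf.size() - offset < sizeof(lc))
            throw std::runtime_error("load command runs past end of buffer");
        std::memcpy(&lc, buf.data() + offset, sizeof(lc));
        if (lc.cmdsize < sizeof(lc) || lc.cmdsize > buf.size() - offset)
            throw std::runtime_error("malformed cmdsize");
        switch (lc.cmd) {
            case kDyldInfo:
            case kDyldInfoOnly:
                r.compressed = true;
                break;
            case kSegment64:
                if (++r.segCount > 255)
                    throw std::runtime_error("more than 255 segments");
                break;
            case kLoadDylib:
            case kLoadWeakDylib:
            case kReexportDylib:
            case kLoadUpwardDylib:
                if (++r.libCount > 4095)
                    throw std::runtime_error("more than 4095 dependent libraries");
                break;
            default:
                break;
        }
        offset += lc.cmdsize;   // cmdsize tells us where the next command starts
    }
    return r;
}
```

Advancing by cmdsize rather than by sizeof(LoadCommand) is what lets the walk skip over the variable-length payload of each command.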
ImageLoader is an abstract class whose subclasses instantiate a Mach-O file as an image. Once sniffLoadCommands() has finished parsing, the value of compressed decides which subclass is used:
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context) {
bool compressed;
unsigned int segCount;
unsigned int libCount;
const linkedit_data_command* codeSigCmd;
const encryption_info_command* encryptCmd;
sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
// instantiate concrete class based on content of load commands
if ( compressed )
return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
else
#if SUPPORT_CLASSIC_MACHO
return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
throw "missing LC_DYLD_INFO load command";
#endif
}
The process can be pictured as shown in the figure:
Taking ImageLoaderMachOCompressed::instantiateMainExecutable() as an example, the implementation looks like this:
// create image for main executable
ImageLoaderMachOCompressed* ImageLoaderMachOCompressed::instantiateMainExecutable(
const macho_header* mh, uintptr_t slide, const char* path,
unsigned int segCount, unsigned int libCount, const LinkContext& context) {
ImageLoaderMachOCompressed* image = ImageLoaderMachOCompressed::instantiateStart(mh, path, segCount, libCount);
// set slide for PIE programs
image->setSlide(slide);
// for PIE record end of program, to know where to start loading dylibs
if ( slide != 0 )
fgNextPIEDylibAddress = (uintptr_t)image->getEnd();
image->disableCoverageCheck();
image->instantiateFinish(context);
image->setMapped(context);
if ( context.verboseMapping ) {
dyld::log("dyld: Main executable mapped %s\n", path);
for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
const char* name = image->segName(i);
if ( (strcmp(name, "__PAGEZERO") == 0) || (strcmp(name, "__UNIXSTACK") == 0) )
dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segPreferredLoadAddress(i), image->segPreferredLoadAddress(i)+image->segSize(i));
else
dyld::log("%18s at 0x%08lX->0x%08lX\n", name, image->segActualLoadAddress(i), image->segActualEndAddress(i));
}
}
return image;
}
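In summary, there are four steps:
-
ImageLoaderMachOCompressed::instantiateStart() creates the ImageLoaderMachOCompressed object
-
image->disableCoverageCheck() disables the segment coverage check
-
image->instantiateFinish() first calls parseLoadCmds() to parse the load commands, then this->setDyldInfo() to set the dynamic linking information, and finally this->setSymbolTableInfo() to set the symbol table information
-
image->setMapped() registers notification callbacks, records timing, and so on
After ImageLoaderMachO::instantiateMainExecutable() returns, addImage() is called. It appends image to the global image list sAllImages and records the address ranges the image is mapped at (the kernel has already mapped the main executable, as the comment above notes). The code: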
static void addImage(ImageLoader* image) {
// add to master list
allImagesLock();
sAllImages.push_back(image);
allImagesUnlock();
// update mapped ranges
uintptr_t lastSegStart = 0;
uintptr_t lastSegEnd = 0;
for(unsigned int i=0, e=image->segmentCount(); i < e; ++i) {
if ( image->segUnaccessible(i) )
continue;
uintptr_t start = image->segActualLoadAddress(i);
uintptr_t end = image->segActualEndAddress(i);
if ( start == lastSegEnd ) {
// two segments are contiguous, just record combined segments
lastSegEnd = end;
}
else {
// non-contiguous segments, record last (if any)
if ( lastSegEnd != 0 )
addMappedRange(image, lastSegStart, lastSegEnd);
lastSegStart = start;
lastSegEnd = end;
}
}
if ( lastSegEnd != 0 )
addMappedRange(image, lastSegStart, lastSegEnd);
if ( gLinkContext.verboseLoading || (sEnv.DYLD_PRINT_LIBRARIES_POST_LAUNCH && (sMainExecutable!=NULL) && sMainExecutable->isLinked()) ) {
dyld::log("dyld: loaded: %s\n", image->getPath());
}
}
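The loop in addImage() merges segments that sit back to back in memory before recording them. The same logic, isolated into a hypothetical coalesce() function that returns the merged ranges instead of calling addMappedRange(), might look like this:

```cpp
#include <cassert>
#include <cstdint>
#include <utility>
#include <vector>

using Range = std::pair<std::uintptr_t, std::uintptr_t>;   // [start, end)

// Merge every segment whose start equals the previous segment's end.
std::vector<Range> coalesce(const std::vector<Range>& segments) {
    std::vector<Range> out;
    std::uintptr_t lastStart = 0;
    std::uintptr_t lastEnd = 0;
    for (const Range& seg : segments) {
        if (seg.first == lastEnd) {
            lastEnd = seg.second;              // contiguous: extend the current range
        } else {
            if (lastEnd != 0)
                out.emplace_back(lastStart, lastEnd);   // flush the previous range
            lastStart = seg.first;
            lastEnd = seg.second;
        }
    }
    if (lastEnd != 0)
        out.emplace_back(lastStart, lastEnd);  // flush the final range
    return out;
}
```

Three segments at 0x1000-0x2000, 0x2000-0x3000 and 0x5000-0x6000 would be recorded as two ranges, 0x1000-0x3000 and 0x5000-0x6000.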
At this point, instantiating the main executable is complete. ImageLoaderMachOClassic::instantiateMainExecutable() works analogously and is not covered in detail here.
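Loading inserted dynamic libraries
This step loads the dynamic libraries listed in the DYLD_INSERT_LIBRARIES environment variable. It first checks whether DYLD_INSERT_LIBRARIES names any libraries; if so, loadInsertedDylib() is called for each of them in turn. The code: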
// load any inserted libraries
if ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib)
loadInsertedDylib(*lib);
}
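loadInsertedDylib() sets up a LoadContext and then calls load().
load() is implemented as a series of loadPhase*() functions. loadPhase0() through loadPhase1() search for the dynamic library in the order shown in the figure and call different functions to continue processing.
When loadPhase5load() is reached, it first searches the shared cache. If the library is found there, ImageLoaderMachO::instantiateFromCache() instantiates the ImageLoader. Otherwise loadPhase5open() opens the file and reads its data into memory, then loadPhase6() instantiates the ImageLoader through ImageLoaderMachO::instantiateFromFile(). Finally, checkandAddImage() validates the image and adds it to the global image list.
The code of load():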
ImageLoader* load(const char* path, const LoadContext& context, unsigned& cacheIndex) {
...
// try all path permutations and check against existing loaded images
ImageLoader* image = loadPhase0(path, orgPath, context, cacheIndex, NULL);
if ( image != NULL ) {
CRSetCrashLogMessage2(NULL);
return image;
}
// try all path permutations and try open() until first success
std::vector<const char*> exceptions;
image = loadPhase0(path, orgPath, context, cacheIndex, &exceptions);
#if !TARGET_IPHONE_SIMULATOR
// <rdar://problem/16704628> support symlinks on disk to a path in dyld shared cache
if ( image == NULL)
image = loadPhase2cache(path, orgPath, context, cacheIndex, &exceptions);
#endif
...
}
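Linking the main executable
This step calls link() to apply dynamic fix-ups to the instantiated main executable so that the binary becomes runnable. link() internally calls ImageLoader::link(), which, as the source shows, does the following:
-
recursiveLoadLibraries() loads all dependent libraries into memory according to the LC_LOAD_DYLIB load commands
-
recursiveUpdateDepth() recursively updates the depth of each dependent library
-
recursiveRebase() recursively rebases the main executable and its dependent libraries, which is required because of ASLR
-
recursiveBind() binds symbols for the main executable and all the dynamic libraries it depends on
-
weakBind() performs weak symbol binding here when the image being linked is not the main executable; for the main executable it runs after link() returns, as analyzed later
-
recursiveGetDOFSections() and context.registerDOFs() register the DOF (DTrace Object Format) sections
The source of ImageLoader::link():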
void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath) {
...
uint64_t t0 = mach_absolute_time();
// recursively load the libraries the main executable depends on
this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
...
uint64_t t1 = mach_absolute_time();
context.clearAllDepths();
// recursively update the depth of the dependent libraries
this->recursiveUpdateDepth(context.imageCount());
uint64_t t2 = mach_absolute_time();
// recursively rebase
this->recursiveRebase(context);
uint64_t t3 = mach_absolute_time();
// recursively bind symbols
this->recursiveBind(context, forceLazysBound, neverUnload);
uint64_t t4 = mach_absolute_time();
if ( !context.linkingMainExecutable )
// weak symbol binding
this->weakBind(context);
uint64_t t5 = mach_absolute_time();
context.notifyBatch(dyld_image_state_bound, false);
uint64_t t6 = mach_absolute_time();
std::vector<DOFInfo> dofs;
// register the DOF sections
this->recursiveGetDOFSections(context, dofs);
context.registerDOFs(dofs);
uint64_t t7 = mach_absolute_time();
...
}
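Of the phases above, rebasing is the easiest to illustrate in isolation: every pointer stored inside the image was computed for the preferred load address, so the ASLR slide has to be added to each of them. The sketch below models the image as an array of pointer-sized slots and takes the list of slots to fix as a plain vector. That representation is an assumption made for illustration; dyld itself decodes the locations from the rebase information referenced by LC_DYLD_INFO.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Add the slide to every slot that holds an internal pointer.
// slotIndexes is a hypothetical, pre-decoded list of the slots that need fixing.
void applyRebase(std::vector<std::uintptr_t>& image,
                 const std::vector<std::size_t>& slotIndexes,
                 std::intptr_t slide) {
    for (std::size_t idx : slotIndexes) {
        assert(idx < image.size());            // a rebase must stay inside the image
        image[idx] += static_cast<std::uintptr_t>(slide);
    }
}
```

Slots that are not listed, such as plain integers, are left untouched, which is why the loader needs explicit rebase information rather than scanning for values that look like addresses.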
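Linking inserted dynamic libraries
This step works like linking the main executable: the dynamic libraries saved to sAllImages by the earlier addImage() calls are taken out in a loop and link() is called on each. Note that the first entry of sAllImages is the main executable's image, so indexing starts at i+1 to get a dynamic library's ImageLoader: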
ImageLoader* image = sAllImages[i+1];
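Next, registerInterposing() is called on each image. It walks the Mach-O's LC_SEGMENT_COMMAND load commands, reads the __DATA,__interpose section, and stores what it finds in fgInterposingTuples. Then applyInterposing() is called, which performs the replacement through the virtual function doInterpose(). Taking ImageLoaderMachOCompressed::doInterpose() as an example: it calls eachBind() and eachLazyBind() with interposeAt() as the handler, and interposeAt() calls interposedAddress() to look up the replacement address in fgInterposingTuples and patch the symbol address. The code: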
void ImageLoaderMachOCompressed::doInterpose(const LinkContext& context) {
// update prebound symbols
eachBind(context, &ImageLoaderMachOCompressed::interposeAt);
eachLazyBind(context, &ImageLoaderMachOCompressed::interposeAt);
}
uintptr_t ImageLoaderMachOCompressed::interposeAt(const LinkContext& context,
uintptr_t addr, uint8_t type, const char*,
uint8_t, intptr_t, long, const char*,
LastLookup*,bool runResolver) {
if ( type == BIND_TYPE_POINTER ) {
uintptr_t* fixupLocation = (uintptr_t*)addr;
uintptr_t curValue = *fixupLocation;
uintptr_t newValue = interposedAddress(context, curValue, this);
if ( newValue != curValue) {
*fixupLocation = newValue;
}
}
return 0;
}
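The lookup-and-patch idea behind interposing can be sketched without any dyld types. In the sketch below, InterposeTuple, lookupInterposed() and interposeSlot() are hypothetical names; the sketch also ignores which image a slot belongs to, which the real lookup takes into account.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One entry of an interposing table: calls to replacee should go to replacement.
struct InterposeTuple {
    std::uintptr_t replacement;
    std::uintptr_t replacee;
};

// Return the replacement for address, or address itself if it is not interposed.
std::uintptr_t lookupInterposed(const std::vector<InterposeTuple>& tuples, std::uintptr_t address) {
    for (const InterposeTuple& t : tuples) {
        if (t.replacee == address)
            return t.replacement;
    }
    return address;
}

// Patch one pointer-sized slot in place; returns true if the slot was changed.
bool interposeSlot(const std::vector<InterposeTuple>& tuples, std::uintptr_t* slot) {
    std::uintptr_t newValue = lookupInterposed(tuples, *slot);
    if (newValue == *slot)
        return false;
    *slot = newValue;
    return true;
}
```

Writing only when the value differs keeps untouched pages clean, which is the same reason the listing above compares newValue with curValue before storing.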
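Performing weak symbol binding
weakBind() first calls getCoalescedImages() to gather the weak symbols of all dynamic libraries into one list, then calls initializeCoalIterator() to sort the weak symbols that need binding. Next, incrementCoalIterator() reads the weak_bind_off and weak_bind_size fields of the dyld_info_command structure to determine the offset and size of the weak binding data, and finally the weak symbols are bound. The code: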
bool ImageLoaderMachOCompressed::incrementCoalIterator(CoalIterator& it) {
if (it.done)
return false;
if ( this->fDyldInfo->weak_bind_size == 0 ) {
/// hmmm, ld set MH_WEAK_DEFINES or MH_BINDS_TO_WEAK, but there is no weak binding info
it.done = true;
it.symbolName = "~~~";
return true;
}
const uint8_t* start = fLinkEditBase + fDyldInfo->weak_bind_off;
const uint8_t* p = start + it.curIndex;
const uint8_t* end = fLinkEditBase + fDyldInfo->weak_bind_off + this->fDyldInfo->weak_bind_size;
uintptr_t count;
uintptr_t skip;
uintptr_t segOffset;
while ( p < end ) {
uint8_t immediate = *p & BIND_IMMEDIATE_MASK;
uint8_t opcode = *p & BIND_OPCODE_MASK;
++p;
switch (opcode) {
case BIND_OPCODE_DONE:
it.done = true;
it.curIndex = p - start;
it.symbolName = "~~~"; // sorts to end
return true;
...
}
}
...
return true;
}
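Each byte of the weak binding stream packs an opcode in its high four bits and an immediate value in its low four bits, which is what the two masks in the loop separate. A small sketch of that split follows; the constant values match the BIND_* definitions in <mach-o/loader.h>, while BindByte and splitBindByte() are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint8_t kOpcodeMask    = 0xF0;   // BIND_OPCODE_MASK
constexpr std::uint8_t kImmediateMask = 0x0F;   // BIND_IMMEDIATE_MASK
constexpr std::uint8_t kOpcodeDone    = 0x00;   // BIND_OPCODE_DONE
constexpr std::uint8_t kOpcodeSetSymbolTrailingFlagsImm = 0x40;   // BIND_OPCODE_SET_SYMBOL_TRAILING_FLAGS_IMM

struct BindByte {
    std::uint8_t opcode;
    std::uint8_t immediate;
};

// Split one byte of the bind stream into its opcode and immediate parts.
BindByte splitBindByte(std::uint8_t b) {
    BindByte out;
    out.opcode = static_cast<std::uint8_t>(b & kOpcodeMask);
    out.immediate = static_cast<std::uint8_t>(b & kImmediateMask);
    return out;
}
```

A byte of 0x41 therefore decodes to the set-symbol opcode with immediate 1, and a byte of 0x00 is the terminating BIND_OPCODE_DONE.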
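Running initializers
This step is done by initializeMainExecutable(). dyld initializes the dynamic libraries first and the main executable afterwards. The function first calls runInitializers(), which in turn calls processInitializers() and recursiveInitialization(). Inside recursiveInitialization() we find a call to notifySingle():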
context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
Following notifySingle(), we see this handling code:
if ((state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
uint64_t t0 = mach_absolute_time();
(*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
uint64_t t1 = mach_absolute_time();
uint64_t t2 = mach_absolute_time();
uint64_t timeInObjC = t1-t0;
uint64_t emptyTime = (t2-t1)*100;
if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
timingInfo->addTime(image->getShortName(), timeInObjC);
}
}
The only thing we care about here is the sNotifyObjCInit callback. Looking for where it is assigned:
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped) {
// record functions to call
sNotifyObjCMapped = mapped;
sNotifyObjCInit = init;
sNotifyObjCUnmapped = unmapped;
...
}
Looking further for callers of registerObjCNotifiers(), we end up here:
void _dyld_objc_notify_register(_dyld_objc_notify_mapped mapped,
_dyld_objc_notify_init init,
_dyld_objc_notify_unmapped unmapped) {
dyld::registerObjCNotifiers(mapped, init, unmapped);
}
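The chain just traced is a plain callback registration: one side stores a function pointer, the other side invokes it later if one has been stored. A stripped-down sketch of that pattern, with hypothetical names and a simplified signature, could look like this:

```cpp
#include <cassert>

// Simplified callback type; the real init callback also receives the mach header.
using InitCallback = void (*)(const char* path);

static InitCallback sInitCallback = nullptr;

// Counterpart of registerObjCNotifiers(): remember the function to call later.
void registerInitCallback(InitCallback callback) {
    sInitCallback = callback;
}

// Counterpart of the check in notifySingle(): call the callback only if one
// has been registered. Returns true if the callback was invoked.
bool notifyDependentsInitialized(const char* path) {
    if (sInitCallback == nullptr)
        return false;
    sInitCallback(path);
    return true;
}
```

Until something registers a callback the notification is a no-op, which is why the question of who calls the registration function matters.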
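So who calls _dyld_objc_notify_register()? Static analysis of the dyld source alone cannot tell us, so we set a symbolic breakpoint on _dyld_objc_notify_register and observe.
In Xcode, open the Debug menu, choose Breakpoints, then Create Symbolic Breakpoint..., as shown in the figure:
In the dialog that appears, enter _dyld_objc_notify_register as the symbol, as shown in the figure:
Run the program. The breakpoint is hit, and the call stack shows that _objc_init in libobjc.A.dylib calls _dyld_objc_notify_register(), as shown in the figure:
Download the objc source code and find the _objc_init function: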
/***********************************************************************
* _objc_init
* Bootstrap initialization. Registers our image notifier with dyld.
* Called by libSystem BEFORE library initialization time
**********************************************************************/
void _objc_init(void) {
static bool initialized = false;
if (initialized) return;
initialized = true;
// fixme defer initialization until an objc-using image is found?
environ_init();
tls_init();
static_init();
lock_init();
exception_init();
_dyld_objc_notify_register(&map_images, load_images, unmap_image);
}
The init callback registered here is load_images(), which calls call_load_methods() to run all +load methods. Add the following code to the project and set a breakpoint in it to see the call stack:
+ (void)load {
NSLog(@"load");
}
After notifySingle() comes the call to doInitialization():
// initialize this image
// calls the constructors
bool hasInitializers = this->doInitialization(context);
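doInitialization() first calls doImageInit() to run the image's initialization routine, that is, the function recorded in LC_ROUTINES_COMMAND. It then calls doModInitFunctions() to parse and run the functions stored in the __DATA,__mod_init_func section, as shown in the figure.
__mod_init_func holds the constructors of global C++ objects and all C functions marked with __attribute__((constructor)).
Adding some code and running the program again confirms this, as shown in the figure: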
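A minimal sketch of the kind of code that could be added for this check is shown below. It assumes GCC or Clang, since __attribute__((constructor)) is a compiler extension, and all names in it are hypothetical. Both initializers append to a log that can be inspected once main() is running.

```cpp
#include <cassert>
#include <string>
#include <vector>

// A function-local static avoids depending on the order in which globals are set up.
std::vector<std::string>& initLog() {
    static std::vector<std::string> log;
    return log;
}

// C-style initializer: runs before main() because of the constructor attribute.
__attribute__((constructor))
static void cStyleInitializer() {
    initLog().push_back("constructor attribute");
}

// Global C++ object: its constructor also runs before main().
struct Tracer {
    Tracer() { initLog().push_back("global C++ object"); }
};
static Tracer gTracer;
```

By the time main() starts, initLog() already holds two entries. The sketch deliberately makes no claim about which of the two runs first.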
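Finding the entry point and returning
This step calls getEntryFromLC_MAIN() on the main executable's image to read the entry point from the LC_MAIN load command (older dyld sources name this function getThreadPC()). If there is no LC_MAIN, getEntryFromLC_UNIXTHREAD() (getMain() in older sources) reads the entry point from LC_UNIXTHREAD instead. The entry address is then returned to the caller, which jumps to it.
With that, the analysis of dyld's loading process is complete.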