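Preface

This is the second post in the "road to iOS mastery" series. It explains how the linker binds symbols to addresses. If you want to understand what happens when an iOS app launches, these posts should give you a clearer picture of the work involved. Likes and follows are welcome, and more material will follow for discussion and critique.

The study of app launch and the underlying frameworks is organized into five posts, in the following order:

Background and questions

Programmers who have worked on several projects often wonder why some projects compile quickly and others slowly, and why, once built, some apps launch quickly and others slowly. Once you understand what the linker does at build time and at launch time, you can trace these questions to their root. With that in mind, let's look at what the linker actually does during compilation and launch.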

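1. Preliminaries

Before discussing compilation and linking, a few basic concepts need to be covered.

1.1 Compiled and interpreted languages

Programming languages are commonly divided into compiled and interpreted languages, and the two are executed differently.

Compiled languages

A compiled language is translated by a compiler directly into machine code, which then runs directly on the CPU. This makes execution more efficient and faster. C, C++, and Objective-C all use a compiler to produce an executable file.

Interpreted languages

An interpreted language uses an interpreter, which translates the program while it runs, so it is generally slower than a compiled language.

1.2 Compilers and interpreters

Executing compiler-generated machine code is efficient, but the edit-compile-debug cycle is long.

Executing through an interpreter makes writing and debugging convenient, but execution is less efficient.

Compiler

A program that translates one programming language (the source language) into another (the target language) is called a compiler.

Interpreter

An interpreter interprets code at run time: it takes a piece of code, translates it into target code (typically bytecode), and then executes that target code one instruction at a time. Because the code is parsed only at run time, this is naturally slower than running a precompiled executable. The benefit is that once the program is running you can change the code and see the effect without recompiling and restarting, much like a hot update, which shortens the development and release cycle.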

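2. LLVM

2.1 LLVM overview

The official site describes the project as follows:

The LLVM Project is a collection of modular and reusable compiler and toolchain technologies. Despite its name, LLVM has little to do with traditional virtual machines. The name "LLVM" itself is not an acronym; it is the full name of the project.

[Further notes]
Chris Lattner, the creator of LLVM, is also the father of Swift.
The Association for Computing Machinery (ACM) gave its 2012 Software System Award to LLVM. Earlier recipients include Java, Apache, Mosaic, the World Wide Web, Smalltalk, UNIX, and Eclipse.

[Photo: Chris Lattner, creator of Swift and LLVM]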

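2.2 Architecture

2.2.1 Traditional compiler architecture

Frontend

Main tasks: lexical analysis, syntax analysis, semantic analysis, and intermediate code generation.

Optimizer

Main task: optimizing the intermediate code.

Backend

Main task: generating machine code.

2.2.2 LLVM architecture

Explanation

Different front ends and back ends share one intermediate representation, LLVM IR.
To support a new programming language, you only need to implement a new front end.
To support a new hardware target, you only need to implement a new back end.
The optimization stage is generic: it operates on the shared LLVM IR, so it needs no changes when a new language or a new hardware target is added.
By comparison, GCC's front end and back end are not cleanly separated and are coupled together, so supporting a new language or a new target platform in GCC is considerably harder.
LLVM is now used as a common infrastructure for implementing a wide range of statically compiled and runtime-compiled languages (for example the family of languages supported by GCC, Java, .NET, Python, Ruby, Scheme, Haskell, and D).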

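3. Clang

3.1 Clang overview

The official site describes the project as follows:

The Clang project provides a language front end and tooling infrastructure for languages in the C language family (C, C++, Objective-C/C++, OpenCL, CUDA, and RenderScript) for the LLVM project. Both a GCC-compatible compiler driver (clang) and an MSVC-compatible compiler driver (clang-cl.exe) are provided. You can get and build the source today.

In short: Clang is the C/C++/Objective-C compiler front end built on the LLVM architecture.

Compared with GCC, Clang has the following advantages:

Fast compilation: on some platforms Clang compiles significantly faster than GCC (compiling Objective-C in Debug mode is reported to be about 3 times faster than with GCC).
Low memory use: the AST that Clang generates takes roughly one fifth of the memory of GCC's.
A clear, simple design that is easy to understand and easy to extend.
Modular design: Clang is built as a set of libraries, which makes it easy to integrate into IDEs and reuse for other purposes.

3.2 Clang versus LLVM

LLVM in the broad sense: the entire LLVM architecture, with Clang as one of its front ends.

LLVM in the narrow sense: the LLVM back end (code optimization, target code generation, and so on).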

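4. Compilation

4.1 The compilation process

A compiler is divided into a front end and a back end. For Objective-C, C, and C++, the front end is Clang and the back end is LLVM.

The front end handles lexical analysis, syntax analysis, and intermediate code generation.
The back end takes the intermediate code as input, performs architecture-independent optimization, and then generates machine code for each target architecture.

[Figure: flow chart of the compilation process]

The stages in the flow chart are, in detail:

Preprocessing: Clang preprocesses the source, for example expanding macros in place, removing comments, and resolving conditional compilation.
Lexical analysis: the lexer reads the byte stream of the source file and groups it into a sequence of lexemes. For each lexeme it outputs a token and records its position in a Loc field.
Syntax analysis: the token stream produced by the lexer is parsed into an abstract syntax tree (AST), and every node is again tagged with its position in the source. The AST is more compact than the source text and faster to traverse, so static checks can run more quickly on it.
Static analysis: once the source has become an AST, the compiler can analyze the tree statically. Static analysis checks the code for mistakes, such as variables that are defined but never used, and so improves code quality. Type checking also happens here: assigning an object of a mismatched type to a property, for example, produces a warning about possibly incorrect use. Finally the AST is lowered to IR, a language closer to machine code but still platform independent, from which machine code for several platforms can be generated.
Intermediate code generation and optimization: LLVM optimizes the code at this stage, for example global variable optimization, loop optimization, and tail-recursion elimination. The optimized LLVM IR can be written out as a text file, xx.ll, and the back end then turns it into assembly and object code.
Linking: the linker combines the .o files produced by compilation with the libraries they depend on (.dylib, .a, .tbd) into a single Mach-O file. That Mach-O file is the executable, and producing it completes the build.

4.2 Hands-on

The demo code is as follows: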

#import <Foundation/Foundation.h>

#define aa 10
int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        NSObject *obj = [[NSObject alloc] init];
        id __weak obj1 = obj;
        NSLog(@"------%@--%d--",[obj1 class],aa);
        
    }
    return 0;
}

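4.2.1 Preprocessing

Change into the project directory.

Preprocessing command:

xcrun clang -E main.m

The end of the generated output looks like this (the expanded Foundation headers that precede it are omitted). During preprocessing, comments are removed, conditional compilation is resolved, and macros are expanded in place: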

int main(int argc, const char * argv[]) {
    @autoreleasepool {

        NSObject *obj = [[NSObject alloc] init];
        id __attribute__((objc_ownership(weak))) obj1 = obj;
        NSLog(@"------%@--%d--",[obj1 class],10);

    }
    return 0;
}

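4.2.2 Lexical analysis

Main task: the lexer reads the byte stream of the source file and groups it into a sequence of lexemes. For each lexeme it outputs a token and records its position in a Loc field.

Command:

xcrun clang -fmodules -fsyntax-only -Xclang -dump-tokens main.m

Output: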

annot_module_include '#import <Foundation/Foundation.h>

#define aa 10
int main(int argc, const char * argv[]) {
    @autoreleasep'  Loc=<main.m:8:1>
int 'int'  [StartOfLine] Loc=<main.m:11:1>
identifier 'main'  [LeadingSpace] Loc=<main.m:11:5>
l_paren '('  Loc=<main.m:11:9>
int 'int'  Loc=<main.m:11:10>
identifier 'argc'  [LeadingSpace] Loc=<main.m:11:14>
comma ','  Loc=<main.m:11:18>
const 'const'  [LeadingSpace] Loc=<main.m:11:20>
char 'char'  [LeadingSpace] Loc=<main.m:11:26>
star '*'  [LeadingSpace] Loc=<main.m:11:31>
identifier 'argv'  [LeadingSpace] Loc=<main.m:11:33>
l_square '['  Loc=<main.m:11:37>
r_square ']'  Loc=<main.m:11:38>
r_paren ')'  Loc=<main.m:11:39>
l_brace '{'  [LeadingSpace] Loc=<main.m:11:41>
at '@'  [StartOfLine] [LeadingSpace] Loc=<main.m:12:5>
identifier 'autoreleasepool'  Loc=<main.m:12:6>
l_brace '{'  [LeadingSpace] Loc=<main.m:12:22>
identifier 'NSObject'  [StartOfLine] [LeadingSpace] Loc=<main.m:14:9>
star '*'  [LeadingSpace] Loc=<main.m:14:18>
identifier 'obj'  Loc=<main.m:14:19>
equal '='  [LeadingSpace] Loc=<main.m:14:23>
l_square '['  [LeadingSpace] Loc=<main.m:14:25>
l_square '['  Loc=<main.m:14:26>
identifier 'NSObject'  Loc=<main.m:14:27>
identifier 'alloc'  [LeadingSpace] Loc=<main.m:14:36>
r_square ']'  Loc=<main.m:14:41>
identifier 'init'  [LeadingSpace] Loc=<main.m:14:43>
r_square ']'  Loc=<main.m:14:47>
semi ';'  Loc=<main.m:14:48>
identifier 'id'  [StartOfLine] [LeadingSpace] Loc=<main.m:15:9>
__attribute '__attribute__'  [LeadingSpace] Loc=<main.m:15:12 <Spelling=<built-in>:329:16>>
l_paren '('  Loc=<main.m:15:12 <Spelling=<built-in>:329:29>>
l_paren '('  Loc=<main.m:15:12 <Spelling=<built-in>:329:30>>
identifier 'objc_ownership'  Loc=<main.m:15:12 <Spelling=<built-in>:329:31>>
l_paren '('  Loc=<main.m:15:12 <Spelling=<built-in>:329:45>>
identifier 'weak'  Loc=<main.m:15:12 <Spelling=<built-in>:329:46>>
r_paren ')'  Loc=<main.m:15:12 <Spelling=<built-in>:329:50>>
r_paren ')'  Loc=<main.m:15:12 <Spelling=<built-in>:329:51>>
r_paren ')'  Loc=<main.m:15:12 <Spelling=<built-in>:329:52>>
identifier 'obj1'  [LeadingSpace] Loc=<main.m:15:19>
equal '='  [LeadingSpace] Loc=<main.m:15:24>
identifier 'obj'  [LeadingSpace] Loc=<main.m:15:26>
semi ';'  Loc=<main.m:15:29>
identifier 'NSLog'  [StartOfLine] [LeadingSpace] Loc=<main.m:16:9>
l_paren '('  Loc=<main.m:16:14>
at '@'  Loc=<main.m:16:15>
string_literal '"------%@--%d--"'  Loc=<main.m:16:16>
comma ','  Loc=<main.m:16:32>
l_square '['  Loc=<main.m:16:33>
identifier 'obj1'  Loc=<main.m:16:34>
identifier 'class'  [LeadingSpace] Loc=<main.m:16:39>
r_square ']'  Loc=<main.m:16:44>
comma ','  Loc=<main.m:16:45>
numeric_constant '10'  Loc=<main.m:16:46 <Spelling=main.m:10:12>>
r_paren ')'  Loc=<main.m:16:48>
semi ';'  Loc=<main.m:16:49>
r_brace '}'  [StartOfLine] [LeadingSpace] Loc=<main.m:18:5>
return 'return'  [StartOfLine] [LeadingSpace] Loc=<main.m:19:5>
numeric_constant '0'  [LeadingSpace] Loc=<main.m:19:12>
semi ';'  Loc=<main.m:19:13>
r_brace '}'  [StartOfLine] Loc=<main.m:20:1>
eof ''  Loc=<main.m:20:2>

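4.3 Intermediate code generation and optimization

Main task: LLVM optimizes the code at this stage, for example global variable optimization, loop optimization, and tail-recursion elimination, and the resulting LLVM IR is written to a text file, xx.ll.

Command:

clang -O3 -S -emit-llvm main.m -o main.ll

A new file, main.ll, appears in the directory next to main.m.

Contents of the file: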

; ModuleID = 'main.m'
source_filename = "main.m"
target datalayout = "e-m:o-p270:32:32-p271:32:32-p272:64:64-i64:64-f80:128-n8:16:32:64-S128"
target triple = "x86_64-apple-macosx10.15.0"

%struct._class_t = type { %struct._class_t*, %struct._class_t*, %struct._objc_cache*, i8* (i8*, i8*)**, %struct._class_ro_t* }
%struct._objc_cache = type opaque
%struct._class_ro_t = type { i32, i32, i32, i8*, i8*, %struct.__method_list_t*, %struct._objc_protocol_list*, %struct._ivar_list_t*, i8*, %struct._prop_list_t* }
%struct.__method_list_t = type { i32, i32, [0 x %struct._objc_method] }
%struct._objc_method = type { i8*, i8*, i8* }
%struct._objc_protocol_list = type { i64, [0 x %struct._protocol_t*] }
%struct._protocol_t = type { i8*, i8*, %struct._objc_protocol_list*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct.__method_list_t*, %struct._prop_list_t*, i32, i32, i8**, i8*, %struct._prop_list_t* }
%struct._ivar_list_t = type { i32, i32, [0 x %struct._ivar_t] }
%struct._ivar_t = type { i64*, i8*, i8*, i32, i32 }
%struct._prop_list_t = type { i32, i32, [0 x %struct._prop_t] }
%struct._prop_t = type { i8*, i8* }
%struct.__NSConstantString_tag = type { i32*, i32, i8*, i64 }

@"OBJC_CLASS_$_NSObject" = external global %struct._class_t
@"OBJC_CLASSLIST_REFERENCES_$_" = internal global %struct._class_t* @"OBJC_CLASS_$_NSObject", section "__DATA,__objc_classrefs,regular,no_dead_strip", align 8
@__CFConstantStringClassReference = external global [0 x i32]
@.str = private unnamed_addr constant [15 x i8] c"------%@--%d--\00", section "__TEXT,__cstring,cstring_literals", align 1
@_unnamed_cfstring_ = private global %struct.__NSConstantString_tag { i32* getelementptr inbounds ([0 x i32], [0 x i32]* @__CFConstantStringClassReference, i32 0, i32 0), i32 1992, i8* getelementptr inbounds ([15 x i8], [15 x i8]* @.str, i32 0, i32 0), i64 14 }, section "__DATA,__cfstring", align 8 #0
@llvm.compiler.used = appending global [1 x i8*] [i8* bitcast (%struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_" to i8*)], section "llvm.metadata"
; Function Attrs: ssp uwtable
define i32 @main(i32 %0, i8** nocapture readnone %1) local_unnamed_addr #1 {
  %3 = tail call i8* @llvm.objc.autoreleasePoolPush() #2
  %4 = load i8*, i8** bitcast (%struct._class_t** @"OBJC_CLASSLIST_REFERENCES_$_" to i8**), align 8
  %5 = tail call i8* @objc_alloc_init(i8* %4)
  %6 = tail call i8* @objc_opt_class(i8* %5)
  notail call void (i8*, ...) @NSLog(i8* bitcast (%struct.__NSConstantString_tag* @_unnamed_cfstring_ to i8*), i8* %6, i32 10)
  tail call void @llvm.objc.autoreleasePoolPop(i8* %3)
  ret i32 0
}
; Function Attrs: nounwind
declare i8* @llvm.objc.autoreleasePoolPush() #2
declare i8* @objc_alloc_init(i8*) local_unnamed_addr
declare void @NSLog(i8*, ...) local_unnamed_addr #3
declare i8* @objc_opt_class(i8*) local_unnamed_addr
; Function Attrs: nounwind
declare void @llvm.objc.autoreleasePoolPop(i8*) #2

attributes #0 = { "objc_arc_inert" }
attributes #1 = { ssp uwtable "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "min-legal-vector-width"="0" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #2 = { nounwind }
attributes #3 = { "correctly-rounded-divide-sqrt-fp-math"="false" "darwin-stkchk-strong-link" "disable-tail-calls"="false" "frame-pointer"="all" "less-precise-fpmad"="false" "no-infs-fp-math"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "no-trapping-math"="false" "probe-stack"="___chkstk_darwin" "stack-protector-buffer-size"="8" "target-cpu"="penryn" "target-features"="+cx16,+cx8,+fxsr,+mmx,+sahf,+sse,+sse2,+sse3,+sse4.1,+ssse3,+x87" "unsafe-fp-math"="false" "use-soft-float"="false" }

!llvm.module.flags = !{!0, !1, !2, !3, !4, !5, !6, !7}
!llvm.ident = !{!8}

!0 = !{i32 2, !"SDK Version", [3 x i32] [i32 10, i32 15, i32 6]}
!1 = !{i32 1, !"Objective-C Version", i32 2}
!2 = !{i32 1, !"Objective-C Image Info Version", i32 0}
!3 = !{i32 1, !"Objective-C Image Info Section", !"__DATA,__objc_imageinfo,regular,no_dead_strip"}
!4 = !{i32 4, !"Objective-C Garbage Collection", i32 0}
!5 = !{i32 1, !"Objective-C Class Properties", i32 64}
!6 = !{i32 1, !"wchar_size", i32 4}
!7 = !{i32 7, !"PIC Level", i32 2}
!8 = !{!"Apple clang version 12.0.0 (clang-1200.0.32.2)"}

4.4 Generating assembly

Command:

xcrun clang -S -o - main.m | open -f

Contents of the output:

	.section	__TEXT,__text,regular,pure_instructions
	.build_version macos, 10, 15	sdk_version 10, 15, 6
	.globl	_main                   ## -- Begin function main
	.p2align	4, 0x90
_main:                                  ## @main
	.cfi_startproc
## %bb.0:
	pushq	%rbp
	.cfi_def_cfa_offset 16
	.cfi_offset %rbp, -16
	movq	%rsp, %rbp
	.cfi_def_cfa_register %rbp
	subq	$32, %rsp
	movl	$0, -4(%rbp)
	movl	%edi, -8(%rbp)
	movq	%rsi, -16(%rbp)
	callq	_objc_autoreleasePoolPush
	movq	_OBJC_CLASSLIST_REFERENCES_$_(%rip), %rcx
	movq	%rcx, %rdi
	movq	%rax, -32(%rbp)         ## 8-byte Spill
	callq	_objc_alloc_init
	movq	%rax, -24(%rbp)
	movq	-24(%rbp), %rax
	movq	%rax, %rdi
	callq	_objc_opt_class
	leaq	L__unnamed_cfstring_(%rip), %rcx
	movq	%rcx, %rdi
	movq	%rax, %rsi
	movl	$10, %edx
	movb	$0, %al
	callq	_NSLog
	movq	-32(%rbp), %rdi         ## 8-byte Reload
	callq	_objc_autoreleasePoolPop
	xorl	%eax, %eax
	addq	$32, %rsp
	popq	%rbp
	retq
	.cfi_endproc
                                        ## -- End function
	.section	__DATA,__objc_classrefs,regular,no_dead_strip
	.p2align	3               ## @"OBJC_CLASSLIST_REFERENCES_$_"
_OBJC_CLASSLIST_REFERENCES_$_:
	.quad	_OBJC_CLASS_$_NSObject

	.section	__TEXT,__cstring,cstring_literals
L_.str:                                 ## @.str
	.asciz	"------%@--%d--"

	.section	__DATA,__cfstring
	.p2align	3               ## @_unnamed_cfstring_
L__unnamed_cfstring_:
	.quad	___CFConstantStringClassReference
	.long	1992                    ## 0x7c8
	.space	4
	.quad	L_.str
	.quad	14                      ## 0xe

	.section	__DATA,__objc_imageinfo,regular,no_dead_strip
L_OBJC_IMAGE_INFO:
	.long	0
	.long	64

.subsections_via_symbols

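The assembler takes the assembly code as input, converts it into machine code, and outputs an object file.

Next, run:

xcrun clang -fmodules -c main.m -o main.o

A new file, main.o, appears in the directory.

[Screenshot: contents of main.o]

4.5 Linking

The linker combines the .o files produced by compilation with the libraries they depend on (.dylib, .a, .tbd) into a single Mach-O file.

Command:

xcrun clang main.o -o main

This produces an executable Mach-O binary.

That is everything that happens when an app is compiled (Build in Xcode). On to the next part.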

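5. The linker

In one sentence: the linker's main job is to bind symbols to addresses.

5.1 What the linker does at build time

At build time the linker binds symbols for variables and functions to their addresses. A symbol here simply means a variable name or a function name.

A Mach-O file mainly contains code and data. The code is the definitions of functions; the data is the definitions of global variables, including their initial values. Every instance of code or data is referred to through a symbol.

==> This raises a question: why should the linker bind symbols to addresses, and what goes wrong without that binding?

Without the binding, the machine can only know which memory you are operating on if you write the memory address into every instruction yourself. Code written that way is very hard to read and maintain, and any later change becomes a nightmare for the developers. The root cause is that code and memory addresses are bound too early. The first remedy is assembly language, which defers the binding. As programming languages evolved, it became clear that any high-level language solves the early-binding problem while also sparing us from writing assembly, and so the binding ended up in the linker.

==> This raises a second question: why does the linker also merge the project's many Mach-O files into one?

Variables and functions in different files of a project depend on one another, so the linker has to bind symbols to addresses across all the Mach-O object files the project produces. Without this step, the Mach-O file generated from a single source file cannot run: when execution reaches a call to a function implemented in another file, the address of that function cannot be found and execution cannot continue. While linking multiple object files, the linker builds a symbol table that records every defined and every undefined symbol. If the same symbol is defined more than once, linking fails with a "duplicate symbol" error from ld; if a symbol cannot be found in any other object file or library, it fails with an "Undefined symbols" error.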

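To summarize, the linker does the following to the code:

It looks through the project's files for variables that an object file references but does not define.
It scans the project's files, collects every symbol definition and reference address, and puts them into a global symbol table.
It computes the merged lengths and positions, merges segments of the same type, and establishes the bindings.
It relocates the addresses of variables that live in different files.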

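While sorting out how function symbols call one another, the linker can also work out which functions are never called and remove them automatically. How does that work?

The linker starts from the main function, follows every reference, and marks each function it reaches as live. When it has finished, the functions that were never marked live are unused. Turning on the Dead Code Stripping setting tells the linker to remove this unused code automatically, and the setting is on by default.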

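5.2 Dynamic library linking with dyld: the linker's other major role

In real iOS development, many features come ready-made, and they are available not only to your app but to every other app as well: the GUI frameworks, I/O, networking, and so on. Linking these shared libraries to your Mach-O file is also the linker's job.

Shared libraries come in two kinds, static and dynamic. A static library is linked at build time: it is copied into your Mach-O file, so updating it requires rebuilding, and it cannot be loaded or updated dynamically. A dynamic library is linked at run time, and dyld makes this dynamic loading possible.

The Mach-O file is a build product, whereas a dynamic library is linked only at run time and its code is not copied into the Mach-O file during the build, so the Mach-O file does not contain the definitions of the library's symbols. Those symbols show up as "undefined", but their names and the paths of the libraries that provide them are recorded. When a dynamic library is brought in at run time through dlopen and dlsym, the recorded path is used to find the library, and the recorded symbol name is used to find the address to bind.

dlopen loads a shared library into the address space of the running process. The loaded library may itself have undefined symbols, which can cause further shared libraries to be loaded. dlopen can either resolve all references immediately or defer the work. It returns a handle to the opened library, and dlsym takes that handle together with a symbol name and returns the address of the function, which can then be called.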

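Loading begins by fixing up address offsets: iOS uses ASLR to randomize the load address as a defense against attacks, so pointers are first rebased by the slide. dyld then determines the non-lazy pointer addresses and binds symbols to them, loads all classes, and finally runs the +load methods and the functions marked with Clang's constructor attribute.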

5.3 Linking with dyld in practice

The demo consists of three files, shown in order: Person.h, Person.m, and main.m.

#import <Foundation/Foundation.h>
@interface Person : NSObject
- (void)eat;
@end

#import "Person.h"
@implementation Person
- (void)eat {
    NSLog(@"吃了苹果");
}
@end

#import <Foundation/Foundation.h>
#import "Person.h"

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        Person *person = [[Person alloc]init];
        [person eat];
    }
    return 0;
}

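5.3.1 Compiling the files

Commands:

xcrun clang -c Person.m
xcrun clang -c main.m

These commands add two files to the directory: main.o and Person.o.

5.3.2 Linking the object files into the executable a.out

Command:

xcrun clang main.o Person.o -Wl,`xcrun --show-sdk-path`/System/Library/Frameworks/Foundation.framework/Foundation

This produces the executable a.out in the directory.

Then run

xcrun nm -nm a.out

to list the symbols in a.out: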

(undefined) external _NSLog (from Foundation)
                 (undefined) external _OBJC_CLASS_$_NSObject (from libobjc)
                 (undefined) external _OBJC_METACLASS_$_NSObject (from libobjc)
                 (undefined) external ___CFConstantStringClassReference (from CoreFoundation)
                 (undefined) external __objc_empty_cache (from libobjc)
                 (undefined) external _objc_alloc_init (from libobjc)
                 (undefined) external _objc_autoreleasePoolPop (from libobjc)
                 (undefined) external _objc_autoreleasePoolPush (from libobjc)
                 (undefined) external _objc_msgSend (from libobjc)
                 (undefined) external dyld_stub_binder (from libSystem)
0000000100000000 (__TEXT,__text) [referenced dynamically] external __mh_execute_header
0000000100003ec0 (__TEXT,__text) external _main
0000000100003f20 (__TEXT,__text) non-external -[Person eat]
0000000100008020 (__DATA,__objc_const) non-external __OBJC_METACLASS_RO_$_Person
0000000100008068 (__DATA,__objc_const) non-external __OBJC_$_INSTANCE_METHODS_Person
0000000100008088 (__DATA,__objc_const) non-external __OBJC_CLASS_RO_$_Person
00000001000080e0 (__DATA,__objc_data) external _OBJC_METACLASS_$_Person
0000000100008108 (__DATA,__objc_data) external _OBJC_CLASS_$_Person
0000000100008130 (__DATA,__data) non-external __dyld_private

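An undefined symbol is one that is not defined in this file, so when the object files are linked against the Foundation framework dynamic library, the linker tries to resolve every undefined symbol.

The dylib format means the library is linked dynamically: it is not built into the executable at build time but linked when the program runs. As a result it does not count toward the size of the app package, and the library can be updated without updating the executable.

For the order in which the dynamic linker does its work, see the post [Dyld Linking].

In short, dyld does the following:

It starts from the Mach-O file being executed and, based on the undefined symbols in it, loads the corresponding dynamic libraries; the system sets up a shared cache to deal with the recursive dependencies among loaded libraries.
After loading, it binds the undefined symbols to the corresponding addresses in the dynamic libraries.
Finally it handles the +load methods; after the main function returns, the static terminators run.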

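5.4 The dynamic linker in practice: faster build-and-debug cycles

Native iOS code is normally debugged by rebuilding and relaunching the app over and over, so the more code a project has, the longer each build takes. You can speed up builds by precompiling part of the code into binaries and integrating those into the project, which avoids a full rebuild every time, but even then each build still requires relaunching the app and walking through the debugging flow again. The tool below speeds up this cycle.

Injection for Xcode [tool link]

John Holdsworth developed a tool called Injection that can load Swift or Objective-C code into an already running program and execute it there, which speeds up debugging without restarting the program.

To use it, clone the code and build InjectionPluginLite/InjectionPlugin.xcodeproj. To remove it, run the following line in a terminal: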

rm -rf ~/Library/Application\ Support/Developer/Shared/Xcode/Plug-ins/InjectionPlugin.xcplugin

Once it is built, build and run your project. Then add a new method:

- (void)injected
{
    NSLog(@"I've been injected: %@", self);
}

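Injection watches the source files for changes. When a file is modified, the Injection Server runs rebuildClass to recompile it and package it as a dynamic library, that is, a .dylib file. Once the library is built, the server uses the writeString method to notify the running app over a socket. The code of writeString is as follows: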

- (BOOL)writeString:(NSString *)string {
    const char *utf8 = string.UTF8String;
    uint32_t length = (uint32_t)strlen(utf8);
    if (write(clientSocket, &length, sizeof length) != sizeof length ||
        write(clientSocket, utf8, length) != length)
        return FALSE;
    return TRUE;
}

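The server sends and listens for socket messages in the background; the logic is in the runInBackground method in InjectionServer.mm. The client likewise starts a background task to send and listen for socket messages; its logic is in the runInBackground method in InjectionClient.mm.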

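When the client receives a message it calls the inject(tmpfile: String) method, which replaces classes dynamically at run time. For the concrete implementation of inject(tmpfile: String), see the source code [link].

Most of the code in inject(tmpfile: String) replaces old classes with new ones. Its tmpfile parameter is the file path of the dynamic library, so how does that library get loaded into the running executable? The answer is at the start of inject(tmpfile: String):

let newClasses = try SwiftEval.instance.loadAndInject(tmpfile: tmpfile)

Here is the implementation of SwiftEval.instance.loadAndInject(tmpfile: tmpfile):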

@objc func loadAndInject(tmpfile: String, oldClass: AnyClass? = nil) throws -> [AnyClass] {

    print("???? Loading .dylib - Ignore any duplicate class warning...")
    // load patched .dylib into process with new version of class
    guard let dl = dlopen("\(tmpfile).dylib", RTLD_NOW) else {
        throw evalError("dlopen() error: \(String(cString: dlerror()))")
    }
    print("???? Loaded .dylib - Ignore any duplicate class warning...")

    if oldClass != nil {
        // find patched version of class using symbol for existing

        var info = Dl_info()
        guard dladdr(unsafeBitCast(oldClass, to: UnsafeRawPointer.self), &info) != 0 else {
            throw evalError("Could not locate class symbol")
        }

        debug(String(cString: info.dli_sname))
        guard let newSymbol = dlsym(dl, info.dli_sname) else {
            throw evalError("Could not locate newly loaded class symbol")
        }

        return [unsafeBitCast(newSymbol, to: AnyClass.self)]
    }
    else {
        // grep out symbols for classes being injected from object file

        try injectGenerics(tmpfile: tmpfile, handle: dl)

        guard shell(command: """
            \(xcodeDev)/Toolchains/XcodeDefault.xctoolchain/usr/bin/nm \(tmpfile).o | grep -E ' S _OBJC_CLASS_\\$_| _(_T0|\\$S).*CN$' | awk '{print $3}' >\(tmpfile).classes
            """) else {
            throw evalError("Could not list class symbols")
        }
        guard var symbols = (try? String(contentsOfFile: "\(tmpfile).classes"))?.components(separatedBy: "\n") else {
            throw evalError("Could not load class symbol list")
        }
        symbols.removeLast()

        return Set(symbols.flatMap { dlsym(dl, String($0.dropFirst())) }).map { unsafeBitCast($0, to: AnyClass.self) }
    }
}

The code above contains the call that loads the dynamic library, dlopen:

guard let dl = dlopen("\(tmpfile).dylib", RTLD_NOW) else {
    throw evalError("dlopen() error: \(String(cString: dlerror()))")
}

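As the code shows, dlopen loads the tmpfile dynamic library into the running app and returns the handle dl. dlsym then looks up the address of a symbol in the tmpfile library, after which the class replacement can proceed. The corresponding dlsym call is: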

guard let newSymbol = dlsym(dl, info.dli_sname) else {
    throw evalError("Could not locate newly loaded class symbol")
}

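Once the class's methods have been replaced, the UI can be redrawn. The whole process needs neither a full rebuild nor an app restart, and that is how dynamic libraries make rapid debugging possible.

Summary: from the description and code above, the way Injection works can be drawn as follows:

[Figure: how Injection works]

Conclusion

This post covered the basics of compilers and linkers and where they are applied. Only by steadily building up an understanding of these low-level mechanisms and laying a solid foundation can you use them to improve development efficiency and give users a more stable, better-performing app.

This is the second post in the series on app launch internals. I hope it helps. Thank you for the likes and follows; let's keep improving together.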

