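In my view, anything can be analyzed from three angles: design, process, and result.
For App development, the design angle means taking the App designer's position: analyzing requirements, breaking the product down into functional modules, and designing how those modules relate to each other. The process angle means taking the developer's position and looking at the concrete development work. The result angle means taking a third party's position and analyzing the compiled output.
This article is an introduction to Mach-O files and mainly covers the Mach-O file structure.
Mach-O (Mach Object) is a file format for executables, object code, dynamic libraries, and core dumps. As a replacement for the a.out format, Mach-O offers better extensibility and faster access to the information in the symbol table.
It is mainly used on OS X and iOS, and plays the same role as the PE format on Windows or the ELF format on Linux.
Understanding the Mach-O format helps explain how programs built with Xcode run on top of Mach-O, and it is a foundation for lower-level debugging techniques.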
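File types
- Executable: the binary file of an application
- Dylib: a dynamic library, similar to a dynamic-link library (DLL) on Windows
- Bundle: a dynamic library that cannot be linked against. It can only be loaded at run time through dlopen(), for example Mac OS plug-ins
- Image: a general term for any Executable, Dylib, or Bundle
- Framework: a dynamic library packaged together with resources and header files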
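File structure
A Mach-O file consists of a Header, Load commands, and Segments. The figure below is the Mach-O structure diagram provided by Apple:
The figure comes from Apple's osx-abi-macho-file-format-reference, which can no longer be found on Apple's website. A copy is available on GitHub.
This article uses the arm64 architecture as its subject.
The figure below shows the concrete structure of a Mach-O file: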
Header
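The Header sits at the very beginning of a Mach-O file. It describes the file type, the architecture the file targets, the number of Load commands, the total size they occupy, and similar information.
Source: #import <mach-o/loader.h>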
/*
* The 64-bit mach header appears at the very beginning of object files for
* 64-bit architectures.
*/
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
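Most of the field names are self-explanatory. The filetype field deserves a closer look: the values seen most often are MH_EXECUTE, MH_DYLIB, MH_DYLINKER, and MH_BUNDLE. The full list of possible values follows.
Source: #import <mach-o/loader.h>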
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static */
/* linking only, no section contents */
#define MH_DSYM 0xa /* companion file with only debug */
/* sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */
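To make the Header fields concrete, here is a minimal sketch that checks whether a buffer starts with a 64-bit Mach-O header and reports its filetype by name. It is an illustration, not Apple's loader code. So that it compiles on any platform, it declares a local copy of struct mach_header_64 rather than including <mach-o/loader.h>. The helper names filetype_name and describe_header are hypothetical. It assumes the file's byte order matches the host's and does not handle fat (multi-architecture) files.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MH_MAGIC_64 0xfeedfacfu /* 64-bit Mach-O magic, value from loader.h */

/* Local copy of the header layout; cpu_type_t and cpu_subtype_t are 32-bit ints. */
struct mach_header_64 {
    uint32_t magic;
    int32_t  cputype;
    int32_t  cpusubtype;
    uint32_t filetype;
    uint32_t ncmds;
    uint32_t sizeofcmds;
    uint32_t flags;
    uint32_t reserved;
};

/* Hypothetical helper: map a filetype value to the macro name listed above. */
static const char *filetype_name(uint32_t filetype) {
    switch (filetype) {
    case 0x1: return "MH_OBJECT";
    case 0x2: return "MH_EXECUTE";
    case 0x3: return "MH_FVMLIB";
    case 0x4: return "MH_CORE";
    case 0x5: return "MH_PRELOAD";
    case 0x6: return "MH_DYLIB";
    case 0x7: return "MH_DYLINKER";
    case 0x8: return "MH_BUNDLE";
    case 0x9: return "MH_DYLIB_STUB";
    case 0xa: return "MH_DSYM";
    case 0xb: return "MH_KEXT_BUNDLE";
    default:  return "unknown";
    }
}

/* Hypothetical helper: returns the filetype name if buf begins with a
 * 64-bit Mach-O header in host byte order, or NULL otherwise. */
static const char *describe_header(const void *buf, size_t len) {
    struct mach_header_64 header;
    assert(buf != NULL);
    if (len < sizeof header)
        return NULL;                     /* too short to hold a header */
    memcpy(&header, buf, sizeof header); /* avoids unaligned access */
    if (header.magic != MH_MAGIC_64)
        return NULL;                     /* not a 64-bit Mach-O in host order */
    return filetype_name(header.filetype);
}
```

In practice buf would be the first bytes of a file read from disk; copying with memcpy rather than casting the pointer keeps the sketch safe for unaligned buffers.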
Load commands
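In the file layout, the Load commands come immediately after the Header. Their main job is to describe the file layout and the linking information. They can be thought of as an array of entries that each begin with a struct load_command, although the entries vary in size.
Specifically, they describe:
- The initial layout of the file in virtual memory
- The location of the symbol table
- The shared libraries referenced by the main program
The struct load_command type
Source: #import <mach-o/loader.h>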
/*
* The load commands directly follow the mach_header. The total size of all
* of the commands is given by the sizeofcmds field in the mach_header. All
* load commands must have as their first two fields cmd and cmdsize. The cmd
* field is filled in with a constant for that command type. Each command type
* has a structure specifically for it. The cmdsize field is the size in bytes
* of the particular load command structure plus anything that follows it that
* is a part of the load command (i.e. section structures, strings, etc.). To
* advance to the next load command the cmdsize can be added to the offset or
* pointer of the current load command. The cmdsize for 32-bit architectures
* MUST be a multiple of 4 bytes and for 64-bit architectures MUST be a multiple
* of 8 bytes (these are forever the maximum alignment of any load commands).
* The padded bytes must be zero. All tables in the object file must also
* follow these rules so the file can be memory mapped. Otherwise the pointers
* to these tables will not work well or at all on some machines. With all
* padding zeroed like objects will compare byte for byte.
*/
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
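The comment shows that cmd determines not only the type of a Load command but also its overall layout. In other words, struct load_command is only a base struct; the real data structure has to be chosen according to the value of cmd.
For example, the commonly used LC_SEGMENT_64 type (the __TEXT, __DATA, and __LINKEDIT segments discussed below are all of this type) has the following data structure:
Source: #import <mach-o/loader.h>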
/*
* The 64-bit segment load command indicates that a part of this file is to be
* mapped into a 64-bit task's address space. If the 64-bit segment has
* sections then section_64 structures directly follow the 64-bit segment
* command and their size is reflected in cmdsize.
*/
struct segment_command_64 { /* for 64-bit architectures */
    uint32_t  cmd;         /* LC_SEGMENT_64 */
    uint32_t  cmdsize;     /* includes sizeof section_64 structs */
    char      segname[16]; /* segment name */
    uint64_t  vmaddr;      /* memory address of this segment */
    uint64_t  vmsize;      /* memory size of this segment */
    uint64_t  fileoff;     /* file offset of this segment */
    uint64_t  filesize;    /* amount to map from the file */
    vm_prot_t maxprot;     /* maximum VM protection */
    vm_prot_t initprot;    /* initial VM protection */
    uint32_t  nsects;      /* number of sections in segment */
    uint32_t  flags;       /* flags */
};
The other cmd types are documented in loader.h; look for the macros that start with LC_.
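The rule from the loader.h comment, that adding cmdsize to the current offset reaches the next command, can be sketched as a small walker. This is an illustration under assumptions, not dyld's implementation. The helper name count_load_commands is hypothetical. The struct is redeclared locally so the block compiles without <mach-o/loader.h>. The buffer is assumed to hold exactly the sizeofcmds bytes that follow a 64-bit header, in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define LC_SEGMENT_64 0x19u /* value from loader.h */

/* Local copy of the common prefix shared by every load command. */
struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Hypothetical helper: counts the commands whose cmd equals `wanted`
 * among the `ncmds` commands stored in cmds[0 .. sizeofcmds).
 * Returns -1 if the command list is malformed. */
static int count_load_commands(const uint8_t *cmds, uint32_t sizeofcmds,
                               uint32_t ncmds, uint32_t wanted) {
    size_t offset = 0;
    int found = 0;
    assert(cmds != NULL);
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;
        if (sizeofcmds - offset < sizeof lc)
            return -1;                  /* ran out of bytes */
        memcpy(&lc, cmds + offset, sizeof lc);
        /* For 64-bit files cmdsize must be a multiple of 8 and must fit. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize % 8 != 0 ||
            lc.cmdsize > sizeofcmds - offset)
            return -1;
        if (lc.cmd == wanted)
            found++;
        offset += lc.cmdsize;           /* advance to the next command */
    }
    return found;
}
```

Calling count_load_commands(cmds, header.sizeofcmds, header.ncmds, LC_SEGMENT_64) would give the number of 64-bit segments; only cmd and cmdsize are read, so the walker works without knowing the layout of each command type.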
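LC_SEGMENT_64 needs special attention: the command can be followed by any number of struct section_64 entries that describe the sections the segment contains. In that case cmdsize covers not only the bytes of struct segment_command_64 but also the bytes of those section entries, and the nsects field gives the number of sections.
The data structure of a Section is:
Source: #import <mach-o/loader.h>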
/*
* A segment is made up of zero or more sections. Non-MH_OBJECT files have
* all of their segments with the proper sections in each, and padded to the
* specified segment alignment when produced by the link editor. The first
* segment of a MH_EXECUTE and MH_FVMLIB format file contains the mach_header
* and load commands of the object file before its first section. The zero
* fill sections are always last in their segment (in all formats). This
* allows the zeroed segment padding to be mapped into memory where zero fill
* sections might be. The gigabyte zero fill sections, those with the section
* type S_GB_ZEROFILL, can only be in a segment with sections of this type.
* These segments are then placed after all other segments.
*
* The MH_OBJECT format has all of its sections in one segment for
* compactness. There is no padding to a specified segment boundary and the
* mach_header and load commands are not part of the segment.
*
* Sections with the same section name, sectname, going into the same segment,
* segname, are combined by the link editor. The resulting section is aligned
* to the maximum alignment of the combined sections and is the new section's
* alignment. The combined sections are aligned to their original alignment in
* the combined section. Any padded bytes to get the specified alignment are
* zeroed.
*
* The format of the relocation entries referenced by the reloff and nreloc
* fields of the section structure for mach object files is described in the
* header file <reloc.h>.
*/
struct section_64 { /* for 64-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint64_t addr;         /* memory address of this section */
    uint64_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes)*/
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
    uint32_t reserved3;    /* reserved */
};
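The relationship between cmdsize, nsects, and the section_64 entries that trail a segment command can be sketched as follows. This is a minimal illustration: the helper name section_name_at is hypothetical, both structs are redeclared locally (with vm_prot_t written as a 32-bit int) so the block compiles without <mach-o/loader.h>, and the caller is assumed to pass a buffer that holds at least cmdsize bytes in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Local copies of the two layouts shown above. */
struct segment_command_64 {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint64_t vmaddr;
    uint64_t vmsize;
    uint64_t fileoff;
    uint64_t filesize;
    int32_t  maxprot;
    int32_t  initprot;
    uint32_t nsects;
    uint32_t flags;
};

struct section_64 {
    char     sectname[16];
    char     segname[16];
    uint64_t addr;
    uint64_t size;
    uint32_t offset;
    uint32_t align;
    uint32_t reloff;
    uint32_t nreloc;
    uint32_t flags;
    uint32_t reserved1;
    uint32_t reserved2;
    uint32_t reserved3;
};

/* Hypothetical helper: copies the name of section number `index` of the
 * LC_SEGMENT_64 command at `cmd` into out. Returns 0 on success, or -1
 * if the index is out of range or cmdsize cannot hold nsects sections. */
static int section_name_at(const uint8_t *cmd, uint32_t index, char out[17]) {
    struct segment_command_64 seg;
    struct section_64 sect;
    uint64_t needed;
    assert(cmd != NULL && out != NULL);
    memcpy(&seg, cmd, sizeof seg);
    /* cmdsize must cover the segment command plus all of its sections. */
    needed = sizeof seg + (uint64_t)seg.nsects * sizeof sect;
    if (seg.cmdsize < needed || index >= seg.nsects)
        return -1;
    /* Section entries start right after the segment command. */
    memcpy(&sect, cmd + sizeof seg + (size_t)index * sizeof sect, sizeof sect);
    memcpy(out, sect.sectname, 16);
    out[16] = '\0'; /* a 16-character sectname has no terminating NUL */
    return 0;
}
```

The output buffer is 17 bytes because sectname and segname are fixed 16-byte fields that are not NUL-terminated when the name uses all 16 characters.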
Segment
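Apart from the Header and the Load commands described above, the rest of a Mach-O file is made up of Segments.
A Segment must be aligned to a multiple of the memory page size, mainly because of how virtual memory is used.
- arm64: 16KB
- Other platforms: 4KB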
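The alignment rule amounts to rounding a size up to the next multiple of the page size. Below is a minimal sketch of that arithmetic; the helper name round_up_to_page is hypothetical, and the page size is assumed to be a power of two, which holds for both 16KB (0x4000) and 4KB (0x1000).

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: rounds size up to the next multiple of page_size.
 * page_size must be a non-zero power of two. */
static uint64_t round_up_to_page(uint64_t size, uint64_t page_size) {
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (size + page_size - 1) & ~(page_size - 1);
}
```

With a 16KB page, a segment holding a single byte of content still occupies 0x4000 bytes of virtual memory, while a size that is already a multiple of the page size is returned unchanged.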
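Common Segments
- __TEXT: the code segment. It holds the program's executable code and constant data. It is mapped readable and executable but not writable.
- __DATA: the data segment. It holds global variables and static variables. It is mapped readable and writable.
- __LINKEDIT: holds the information the dynamic linker needs to load the file, including the symbol table, the indirect symbol table, and the string table.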