iOS Source Code Analysis: A Look at Hooking Approaches on iOS


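On iOS, Objective-C methods are usually hooked through the runtime, but the runtime cannot hook C functions. fishhook, a powerful hooking tool for iOS, fills that gap by hooking C functions. This post walks through both approaches and analyzes the source code of some related third-party libraries.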

Hooking Objective-C methods with the runtime

The runtime provides two functions for hooking Objective-C methods: class_replaceMethod and method_exchangeImplementations.

/** 
 * Replaces the implementation of a method for a given class.
 * 
 * @param cls The class you want to modify.
 * @param name A selector that identifies the method whose implementation you want to replace.
 * @param imp The new implementation for the method identified by name for the class identified by cls.
 * @param types An array of characters that describe the types of the arguments to the method. 
 *  Since the function must take at least two arguments—self and _cmd, the second and third characters
 *  must be “@:” (the first character is the return type).
 * 
 * @return The previous implementation of the method identified by \e name for the class identified by \e cls.
 * 
 * @note This function behaves in two different ways:
 *  - If the method identified by \e name does not yet exist, it is added as if \c class_addMethod were called. 
 *    The type encoding specified by \e types is used as given.
 *  - If the method identified by \e name does exist, its \c IMP is replaced as if \c method_setImplementation were called.
 *    The type encoding specified by \e types is ignored.
 */
OBJC_EXPORT IMP _Nullable
class_replaceMethod(Class _Nullable cls, SEL _Nonnull name, IMP _Nonnull imp, 
                    const char * _Nullable types) 
/** 
 * Exchanges the implementations of two methods.
 * 
 * @param m1 Method to exchange with second method.
 * @param m2 Method to exchange with first method.
 * 
 * @note This is an atomic version of the following:
 *  \code 
 *  IMP imp1 = method_getImplementation(m1);
 *  IMP imp2 = method_getImplementation(m2);
 *  method_setImplementation(m1, imp2);
 *  method_setImplementation(m2, imp1);
 *  \endcode
 */
OBJC_EXPORT void
method_exchangeImplementations(Method _Nonnull m1, Method _Nonnull m2) 
    OBJC_AVAILABLE(10.5, 2.0, 9.0, 1.0, 2.0);

For example, the following code hooks UIButton's sendAction:to:forEvent: method to add some custom logic.

#import <objc/runtime.h>

@implementation UIButton (MyHook)

+ (void)load {
    static dispatch_once_t onceToken;
    dispatch_once(&onceToken, ^{
        Class cls = [self class];
        Method before   = class_getInstanceMethod(self, @selector(sendAction:to:forEvent:));
        Method after    = class_getInstanceMethod(self, @selector(cs_sendAction:to:forEvent:));
        method_exchangeImplementations(before, after);
    });
}

- (void)cs_sendAction:(SEL)action to:(id)target forEvent:(UIEvent *)event {
  /// Custom hook logic goes here.
  
  /// This calls the swizzled selector, whose implementation is now the original method.
  [self cs_sendAction:action to:target forEvent:event];
}

@end

method_exchangeImplementations is implemented as follows:

void method_exchangeImplementations(Method m1, Method m2)
{
    if (!m1  ||  !m2) return;

    mutex_locker_t lock(runtimeLock);

    IMP m1_imp = m1->imp;
    m1->imp = m2->imp;
    m2->imp = m1_imp;


    // RR/AWZ updates are slow because class is unknown
    // Cache updates are slow because class is unknown
    // fixme build list of classes whose Methods are known externally?

    flushCaches(nil);

    // Update custom RR and AWZ when a method changes its IMP
    updateCustomRR_AWZ(nil, m1);
    updateCustomRR_AWZ(nil, m2);
}
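
The pointer swap at the heart of method_exchangeImplementations can be modelled in plain C. The sketch below is a toy model, not the Objective-C runtime: toy_method, toy_exchange, and toy_send are hypothetical names introduced for illustration, and locking and cache flushing are left out.

```c
#include <string.h>

/* Toy stand-ins for the runtime's IMP and Method; not the real objc types. */
typedef const char *(*toy_imp)(void);
typedef struct { const char *sel; toy_imp imp; } toy_method;

static const char *original_impl(void) { return "original"; }
static const char *hook_impl(void)     { return "hook"; }

/* Mirrors the three-line pointer swap in method_exchangeImplementations. */
static void toy_exchange(toy_method *m1, toy_method *m2) {
    if (!m1 || !m2) return;
    toy_imp tmp = m1->imp;
    m1->imp = m2->imp;
    m2->imp = tmp;
}

/* Dispatch by selector name, like a heavily simplified objc_msgSend. */
static const char *toy_send(toy_method *table, int count, const char *sel) {
    for (int i = 0; i < count; i++) {
        if (strcmp(table[i].sel, sel) == 0) return table[i].imp();
    }
    return NULL;
}
```

With a table holding {"sendAction", original_impl} and {"cs_sendAction", hook_impl}, calling toy_exchange on the two entries makes toy_send(..., "sendAction") reach hook_impl and toy_send(..., "cs_sendAction") reach original_impl. This is why the seemingly recursive call inside cs_sendAction:to:forEvent: ends up in the original implementation.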

Some caveats

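  1. Runtime hooks are usually installed in +load. Installing them in +initialize is not necessarily thread-safe.
  2. If the hooked method's implementation relies on _cmd, the hook may cause problems, because after the exchange the original implementation is invoked under the replacement selector.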
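
The _cmd caveat can be illustrated with a small C model. This is only a sketch under the assumption that an IMP receives the selector it was invoked under as a hidden argument; the names cmd_method, cmd_send, and cmd_install are hypothetical and selectors are modelled as strings.

```c
#include <string.h>

/* Toy IMP: receives the selector it was invoked under, like the hidden _cmd. */
typedef const char *(*cmd_imp)(const char *cmd);
typedef struct { const char *sel; cmd_imp imp; } cmd_method;

static cmd_method cmd_table[2];

/* Look the selector up and pass it on to the implementation as its _cmd. */
static const char *cmd_send(const char *sel) {
    for (int i = 0; i < 2; i++) {
        if (strcmp(cmd_table[i].sel, sel) == 0) return cmd_table[i].imp(sel);
    }
    return NULL;
}

/* The original implementation simply reports the _cmd it observes. */
static const char *cmd_original(const char *cmd) { return cmd; }

/* The hook forwards to "itself", which maps to the original after the swap. */
static const char *cmd_hook(const char *cmd) {
    (void)cmd;
    return cmd_send("cs_doWork");
}

/* Register both methods, then exchange their implementations. */
static void cmd_install(void) {
    cmd_table[0] = (cmd_method){"doWork", cmd_original};
    cmd_table[1] = (cmd_method){"cs_doWork", cmd_hook};
    cmd_imp tmp = cmd_table[0].imp;
    cmd_table[0].imp = cmd_table[1].imp;
    cmd_table[1].imp = tmp;
}
```

After cmd_install(), cmd_send("doWork") returns "cs_doWork": the original body runs, but it sees the replacement selector rather than the one the caller used. Code that compares _cmd or passes it on would therefore behave differently once hooked.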

More on method swizzling

In fact, the usage shown earlier is imperfect and has a latent problem. A more careful version looks like this:

+ (BOOL)swizzleMethod:(SEL)origSel withMethod:(SEL)newSel {
  Method origMethod = class_getInstanceMethod(self, origSel);
  Method newMethod = class_getInstanceMethod(self, newSel);
  if (!origMethod || !newMethod) {
    return NO;
  }

  IMP newImpl = method_getImplementation(newMethod);
  const char *newTypeEncoding = method_getTypeEncoding(newMethod);

  // class_addMethod succeeds only if this class does not implement origSel itself.
  if (class_addMethod(self, origSel, newImpl, newTypeEncoding)) {
    IMP origImpl = method_getImplementation(origMethod);
    const char *origTypeEncoding = method_getTypeEncoding(origMethod);
    class_replaceMethod(self, newSel, origImpl, origTypeEncoding);
  } else {
    method_exchangeImplementations(origMethod, newMethod);
  }
  return YES;
}

Why call class_addMethod first, then use class_replaceMethod when it returns YES and method_exchangeImplementations when it returns NO?

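Because class_getInstanceMethod searches self first and, if nothing is found, walks up the inheritance chain until it finds the method. The Method it returns may therefore belong to a superclass, and calling method_exchangeImplementations directly would swizzle the superclass's method. class_addMethod returns YES only when the class does not implement the method itself, in which case the new implementation is added to the subclass and the superclass stays untouched.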
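
The add-first strategy can be checked against a toy class hierarchy in C. The sketch below is an illustration only: cls_class, cls_lookup, cls_add, cls_replace, and cls_swizzle are hypothetical stand-ins for the runtime's class structure, class_getInstanceMethod, class_addMethod, class_replaceMethod, and the swizzle helper.

```c
#include <stddef.h>
#include <string.h>

typedef const char *(*cls_imp)(void);
typedef struct { const char *sel; cls_imp imp; } cls_method;
typedef struct cls_class {
    struct cls_class *super;
    cls_method methods[4];
    int count;
} cls_class;

/* Like class_getInstanceMethod: search the class, then its superclasses. */
static cls_method *cls_lookup(cls_class *c, const char *sel) {
    for (; c; c = c->super)
        for (int i = 0; i < c->count; i++)
            if (strcmp(c->methods[i].sel, sel) == 0) return &c->methods[i];
    return NULL;
}

/* Like class_addMethod: fails if the class itself already implements sel. */
static int cls_add(cls_class *c, const char *sel, cls_imp imp) {
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->methods[i].sel, sel) == 0) return 0;
    if (c->count == 4) return 0;
    c->methods[c->count++] = (cls_method){sel, imp};
    return 1;
}

/* Like class_replaceMethod: set the IMP on the class itself, adding if absent. */
static void cls_replace(cls_class *c, const char *sel, cls_imp imp) {
    for (int i = 0; i < c->count; i++)
        if (strcmp(c->methods[i].sel, sel) == 0) { c->methods[i].imp = imp; return; }
    cls_add(c, sel, imp);
}

/* Add first; exchange in place only when the class already owns the method. */
static int cls_swizzle(cls_class *c, const char *orig, const char *repl) {
    cls_method *om = cls_lookup(c, orig), *nm = cls_lookup(c, repl);
    if (!om || !nm) return 0;
    cls_imp orig_imp = om->imp, new_imp = nm->imp;
    if (cls_add(c, orig, new_imp)) {
        cls_replace(c, repl, orig_imp);   /* the superclass entry stays untouched */
    } else {
        om->imp = new_imp;
        nm->imp = orig_imp;
    }
    return 1;
}

static const char *cls_parent_draw(void) { return "parent draw"; }
static const char *cls_child_hook(void)  { return "child hook"; }
```

If a parent class implements "draw" and a child class only implements "cs_draw", cls_swizzle on the child gives the child its own "draw" entry pointing at the hook, while a lookup of "draw" on the parent still returns the parent's implementation. A direct exchange would instead have rewritten the parent's entry and affected every other subclass.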

The Aspects library

Aspects is widely used on iOS for AOP (aspect-oriented programming).

Think of Aspects as method swizzling on steroids. It allows you to add code to existing methods per class or per instance, whilst thinking of the insertion point e.g. before/instead/after. Aspects automatically deals with calling super and is easier to use than regular method swizzling.

Aspects can hook a single instance, which makes it far more powerful than ordinary runtime swizzling.

However, the official Aspects documentation also lists some known issues and recommends against using Aspects in production.

Aspects hooks deep into the class hierarchy and creates dynamic subclasses, much like KVO. There's known issues with this approach, and to this date (February 2019) I STRICTLY DO NOT RECOMMEND TO USE Aspects IN PRODUCTION CODE. We use it for partial test mocks in, PSPDFKit, an iOS PDF framework that ships with apps like Dropbox or Evernote, it's also very useful for quickly hacking something up.

Aspects hooks methods through Objective-C message forwarding, rather than calling method_exchangeImplementations directly as typical runtime swizzling does.

Aspects uses Objective-C message forwarding to hook into messages. This will create some overhead. Don't add aspects to methods that are called a lot. Aspects is meant for view/controller code that is not called 1000 times per second.

Using AOP for screen-view tracking

[UIViewController aspect_hookSelector:@selector(viewDidLoad) withOptions:AspectPositionAfter usingBlock:^(id<AspectInfo> aspectInfo) {
    NSLog(@"statistics: viewDidLoad has been hooked.");
    NSLog(@"View Controller %@ didLoad\n\n", aspectInfo.instance);
} error:nil];

How Aspects works

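Aspects is very lightweight: its only two public methods live in a category on NSObject, and it also supports removing hooks. It is remarkably capable, and its source code is first-rate and well worth studying.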

The public interface of Aspects

Hooks support both instance methods and class methods, let us pass the actual hook logic in as a block, and use AspectOptions to control when the hook code runs.

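A hook can run before or after the original method, or replace it entirely. How this is implemented deserves extra attention: simply storing a block (as AFNetworking does) cannot cover both cases.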

/**
 Aspects uses Objective-C message forwarding to hook into messages. This will create some overhead. Don't add aspects to methods that are called a lot. Aspects is meant for view/controller code that is not called a 1000 times per second.

 Adding aspects returns an opaque token which can be used to deregister again. All calls are thread safe.
 */
@interface NSObject (Aspects)

/// Adds a block of code before/instead/after the current `selector` for a specific class.
///
/// @param block Aspects replicates the type signature of the method being hooked.
/// The first parameter will be `id<AspectInfo>`, followed by all parameters of the method.
/// These parameters are optional and will be filled to match the block signature.
/// You can even use an empty block, or one that simple gets `id<AspectInfo>`.
///
/// @note Hooking static methods is not supported.
/// @return A token which allows to later deregister the aspect.
+ (id<AspectToken>)aspect_hookSelector:(SEL)selector
                      withOptions:(AspectOptions)options
                       usingBlock:(id)block
                            error:(NSError **)error
{
    return aspect_add((id)self, selector, options, block, error);
}

/// Adds a block of code before/instead/after the current `selector` for a specific instance.
- (id<AspectToken>)aspect_hookSelector:(SEL)selector
                      withOptions:(AspectOptions)options
                       usingBlock:(id)block
                            error:(NSError **)error
{
    return aspect_add(self, selector, options, block, error);
}

@end

Data structures in Aspects

AspectOptions

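AspectOptions sets when the hook runs; by default it runs after the original implementation. A hook can also be made one-shot, so that it is removed after its first execution.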

typedef NS_OPTIONS(NSUInteger, AspectOptions) {
    AspectPositionAfter   = 0,            /// Called after the original implementation (default)
    AspectPositionInstead = 1,            /// Will replace the original implementation.
    AspectPositionBefore  = 2,            /// Called before the original implementation.
    
    AspectOptionAutomaticRemoval = 1 << 3 /// Will remove the hook after the first execution.
};
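
Because AspectPositionAfter is 0, the position cannot be tested with a plain bitwise AND against that constant; the low bits have to be masked out and compared. The C sketch below shows one way to decode such an option word. The 0x07 mask is an assumption about how the position bits are separated from the removal flag, and the opt_ names are hypothetical.

```c
/* Values copied from the AspectOptions enum above. */
enum {
    OPT_POSITION_AFTER    = 0,
    OPT_POSITION_INSTEAD  = 1,
    OPT_POSITION_BEFORE   = 2,
    OPT_AUTOMATIC_REMOVAL = 1 << 3
};

/* Assumption: the low three bits carry the position. */
#define OPT_POSITION_FILTER 0x07

/* Extract the position by masking, then compare against the enum values. */
static int opt_position(int options) {
    return options & OPT_POSITION_FILTER;
}

/* The removal flag is a real bit flag and can be tested directly. */
static int opt_removes_after_first_call(int options) {
    return (options & OPT_AUTOMATIC_REMOVAL) != 0;
}
```

For example, opt_position(OPT_POSITION_BEFORE | OPT_AUTOMATIC_REMOVAL) yields OPT_POSITION_BEFORE, and an option word containing only OPT_AUTOMATIC_REMOVAL decodes to the default OPT_POSITION_AFTER.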

AspectToken

Calling aspect_hookSelector returns an id object conforming to the AspectToken protocol. The protocol has a remove method that removes a previously added hook.

/// Opaque Aspect Token that allows to deregister the hook.
@protocol AspectToken <NSObject>

/// Deregisters an aspect.
/// @return YES if deregistration is successful, otherwise NO.
- (BOOL)remove;

@end

AspectInfo

AspectInfo holds the information needed to hook an Objective-C method. As the declaration shows, NSInvocation plays a key role.

/// The AspectInfo protocol is the first parameter of our block syntax.
@protocol AspectInfo <NSObject>

/// The instance that is currently hooked.
- (id)instance;

/// The original invocation of the hooked method.
- (NSInvocation *)originalInvocation;

/// All method arguments, boxed. This is lazily evaluated.
- (NSArray *)arguments;

@end

AspectIdentifier

AspectIdentifier holds everything about a single aspect, that is, a single method hook. Every hook creates one AspectIdentifier instance.

// Tracks a single aspect.
@interface AspectIdentifier : NSObject
+ (instancetype)identifierWithSelector:(SEL)selector object:(id)object options:(AspectOptions)options block:(id)block error:(NSError **)error;
- (BOOL)invokeWithInfo:(id<AspectInfo>)info;
@property (nonatomic, assign) SEL selector;
@property (nonatomic, strong) id block; /// the block to execute
@property (nonatomic, strong) NSMethodSignature *blockSignature; /// the block's method signature
@property (nonatomic, weak) id object;
@property (nonatomic, assign) AspectOptions options;
@end

AspectsContainer

AspectsContainer holds all the aspects of an object or class.

// Tracks all aspects for an object/class.
@interface AspectsContainer : NSObject
- (void)addAspect:(AspectIdentifier *)aspect withOptions:(AspectOptions)injectPosition;
- (BOOL)removeAspect:(id)aspect;
- (BOOL)hasAspects;
@property (atomic, copy) NSArray *beforeAspects;
@property (atomic, copy) NSArray *insteadAspects;
@property (atomic, copy) NSArray *afterAspects;
@end

AspectTracker

Used to track aspects.

@interface AspectTracker : NSObject
- (id)initWithTrackedClass:(Class)trackedClass parent:(AspectTracker *)parent;
@property (nonatomic, strong) Class trackedClass;
@property (nonatomic, strong) NSMutableSet *selectorNames;
@property (nonatomic, weak) AspectTracker *parentEntry;
@end

aspect_add and aspect_remove

static id aspect_add(id self, SEL selector, AspectOptions options, id block, NSError **error) {
    NSCParameterAssert(self);
    NSCParameterAssert(selector);
    NSCParameterAssert(block);

    __block AspectIdentifier *identifier = nil;
    // A spin lock guarantees thread safety; this is where Aspects' claimed thread safety comes from.
    aspect_performLocked(^{
        // Check whether the selector may be hooked (Aspects keeps a blacklist), whether the hook position is valid, and so on.
        if (aspect_isSelectorAllowedAndTrack(self, selector, options, error)) {
            // The AspectsContainer is stored as an associated object.
            AspectsContainer *aspectContainer = aspect_getContainerForObject(self, selector);
            // Wrap the aspect's information in an AspectIdentifier.
            identifier = [AspectIdentifier identifierWithSelector:selector object:self options:options block:block error:error];
            if (identifier) {
                // The AspectsContainer also stores the identifier; the position option is used here.
                [aspectContainer addAspect:identifier withOptions:options];

                // Modify the class to allow message interception.
                aspect_prepareClassAndHookSelector(self, selector, error);
            }
        }
    });
    return identifier;
}

static BOOL aspect_remove(AspectIdentifier *aspect, NSError **error) {
    NSCAssert([aspect isKindOfClass:AspectIdentifier.class], @"Must have correct type.");

    __block BOOL success = NO;
    aspect_performLocked(^{
        id self = aspect.object; // strongify
        if (self) {
            AspectsContainer *aspectContainer = aspect_getContainerForObject(self, aspect.selector);
            success = [aspectContainer removeAspect:aspect];

            aspect_cleanupHookedClassAndSelector(self, aspect.selector);
            // destroy token
            aspect.object = nil;
            aspect.block = nil;
            aspect.selector = NULL;
        }else {
            NSString *errrorDesc = [NSString stringWithFormat:@"Unable to deregister hook. Object already deallocated: %@", aspect];
            AspectError(AspectErrorRemoveObjectAlreadyDeallocated, errrorDesc);
        }
    });
    return success;
}

All of this is supporting code around the actual hook, including validation. For example:

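  1. Hooking retain, release, autorelease, and forwardInvocation: is forbidden.
  2. A hook on dealloc may only run before the original code. This is a given, since the object is destroyed once dealloc finishes.
  3. It checks whether the method has already been hooked, to avoid exceptions caused by hooking twice.
  4. The method must exist in the class.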

The block's signature (blockSignature) and NSInvocation

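Look at the method that creates an AspectIdentifier instance. The block is not simply stored; a method signature is introduced as well, evidently so that the block can be invoked through NSInvocation. How is that done?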

+ (instancetype)identifierWithSelector:(SEL)selector object:(id)object options:(AspectOptions)options block:(id)block error:(NSError **)error {
    NSCParameterAssert(block);
    NSCParameterAssert(selector);

    /// The block's method signature is needed.
    NSMethodSignature *blockSignature = aspect_blockMethodSignature(block, error); // TODO: check signature compatibility, etc.
    if (!aspect_isCompatibleBlockSignature(blockSignature, object, selector, error)) {
        return nil;
    }

    AspectIdentifier *identifier = nil;
    if (blockSignature) {
        identifier = [AspectIdentifier new];
        identifier.selector = selector;
        identifier.block = block;
        identifier.blockSignature = blockSignature;
        identifier.options = options;
        identifier.object = object; // weak
    }
    return identifier;
}

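How is the block's signature built? Note the AspectBlockRef data structure: the bridge cast AspectBlockRef layout = (__bridge void *)block; reinterprets the block as an AspectBlockRef, which implies that a block's underlying structure (memory layout) matches AspectBlockRef.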

static NSMethodSignature *aspect_blockMethodSignature(id block, NSError **error) {
    AspectBlockRef layout = (__bridge void *)block;
	  if (!(layout->flags & AspectBlockFlagsHasSignature)) {
        NSString *description = [NSString stringWithFormat:@"The block %@ doesn't contain a type signature.", block];
        AspectError(AspectErrorMissingBlockSignature, description);
        return nil;
    }
	  void *desc = layout->descriptor;
	  desc += 2 * sizeof(unsigned long int);
	  if (layout->flags & AspectBlockFlagsHasCopyDisposeHelpers) {
		  desc += 2 * sizeof(void *);
    }
	  if (!desc) {
        NSString *description = [NSString stringWithFormat:@"The block %@ doesn't has a type signature.", block];
        AspectError(AspectErrorMissingBlockSignature, description);
        return nil;
    }
	  const char *signature = (*(const char **)desc);
	  return [NSMethodSignature signatureWithObjCTypes:signature];
}

_AspectBlock

The AspectBlockRef struct exposes a block's internals as a plain struct, so that its signature can be read and the block can then be invoked through NSInvocation.

// Block internals.
typedef NS_OPTIONS(int, AspectBlockFlags) {
	AspectBlockFlagsHasCopyDisposeHelpers = (1 << 25),
	AspectBlockFlagsHasSignature          = (1 << 30)
};

typedef struct _AspectBlock {
	__unused Class isa;
	AspectBlockFlags flags;
	__unused int reserved;
	void (__unused *invoke)(struct _AspectBlock *block, ...);
	struct {
		unsigned long int reserved;
		unsigned long int size;
		// requires AspectBlockFlagsHasCopyDisposeHelpers
		void (*copy)(void *dst, const void *src);
		void (*dispose)(const void *);
		// requires AspectBlockFlagsHasSignature
		const char *signature;
		const char *layout;
	} *descriptor;
	// imported variables
} *AspectBlockRef;

Do AspectBlockRef and a block really correspond? Let's look at the block source code; note Block_layout in particular:

// Values for Block_layout->flags to describe block objects
enum {
    BLOCK_DEALLOCATING =      (0x0001),  // runtime
    BLOCK_REFCOUNT_MASK =     (0xfffe),  // runtime
    BLOCK_NEEDS_FREE =        (1 << 24), // runtime
    BLOCK_HAS_COPY_DISPOSE =  (1 << 25), // compiler
    BLOCK_HAS_CTOR =          (1 << 26), // compiler: helpers have C++ code
    BLOCK_IS_GC =             (1 << 27), // runtime
    BLOCK_IS_GLOBAL =         (1 << 28), // compiler
    BLOCK_USE_STRET =         (1 << 29), // compiler: undefined if !BLOCK_HAS_SIGNATURE
    BLOCK_HAS_SIGNATURE  =    (1 << 30), // compiler
    BLOCK_HAS_EXTENDED_LAYOUT=(1 << 31)  // compiler
};

#define BLOCK_DESCRIPTOR_1 1
struct Block_descriptor_1 {
    uintptr_t reserved;
    uintptr_t size;
};

#define BLOCK_DESCRIPTOR_2 1
struct Block_descriptor_2 {
    // requires BLOCK_HAS_COPY_DISPOSE
    void (*copy)(void *dst, const void *src);
    void (*dispose)(const void *);
};

#define BLOCK_DESCRIPTOR_3 1
struct Block_descriptor_3 {
    // requires BLOCK_HAS_SIGNATURE
    const char *signature;
    const char *layout;     // contents depend on BLOCK_HAS_EXTENDED_LAYOUT
};

struct Block_layout {
    void *isa;
    volatile int32_t flags; // contains ref count
    int32_t reserved; 
    void (*invoke)(void *, ...);
    struct Block_descriptor_1 *descriptor;
    // imported variables
};

As an aside, the block source code shows that:

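  1. A block also has an isa pointer.
  2. A block also has a reference count.
  3. A block's invoke is a function pointer to the C function that holds the block's actual body.
  4. There are three Block_descriptor structs, each carrying different information.
  5. A block's flags store information about the block, including the reference count and whether it has a signature (BLOCK_HAS_SIGNATURE).
  6. The signature is crucial: without it, the block cannot be executed through NSInvocation.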

We can verify this with the code below. To avoid clashing with the struct defined in Aspects, we name our own _MyBlock.

typedef NS_OPTIONS(int, MyBlockFlags) {
    MyBlockFlagsHasCopyDisposeHelpers = (1 << 25),
    MyBlockFlagsHasSignature          = (1 << 30)
};

/// So this is the underlying structure of a block
typedef struct _MyBlock {
    __unused Class isa;
    MyBlockFlags flags;
    __unused int reserved;
    void (__unused *invoke)(struct _MyBlock *block, ...);
    struct {
        unsigned long int reserved;
        unsigned long int size;
        // requires AspectBlockFlagsHasCopyDisposeHelpers
        void (*copy)(void *dst, const void *src);
        void (*dispose)(const void *);
        // requires AspectBlockFlagsHasSignature
        const char *signature;
        const char *layout;
    } *descriptor;
    // imported variables
} *MyBlockRef;

- (void)testBlock {
    void(^block1)(void) = ^{
        NSLog(@"block1");
    };
    block1();
    
    /// The underlying structure of the block:
    struct _MyBlock *myBlock = (__bridge struct _MyBlock *)block1;
    myBlock->invoke(myBlock); // prints block1
}

Here we executed the block via myBlock->invoke(myBlock), which confirms the block's underlying structure.

Getting a block's signature and using NSInvocation

The block runtime source (the runtime.c file) also reads the block's signature:

static struct Block_descriptor_3 * _Block_descriptor_3(struct Block_layout *aBlock)
{
    if (! (aBlock->flags & BLOCK_HAS_SIGNATURE)) return NULL;
    uint8_t *desc = (uint8_t *)aBlock->descriptor;
    desc += sizeof(struct Block_descriptor_1);
    if (aBlock->flags & BLOCK_HAS_COPY_DISPOSE) {
        desc += sizeof(struct Block_descriptor_2);
    }
    return (struct Block_descriptor_3 *)desc;
}

// Checks for a valid signature, not merely the BLOCK_HAS_SIGNATURE bit.
BLOCK_EXPORT
bool _Block_has_signature(void *aBlock) {
    return _Block_signature(aBlock) ? true : false;
}

BLOCK_EXPORT
const char * _Block_signature(void *aBlock)
{
    struct Block_descriptor_3 *desc3 = _Block_descriptor_3(aBlock);
    if (!desc3) return NULL;

    return desc3->signature;
}

We can verify the block signature with the following code:

- (void)testBlock {
    void(^block1)(void) = ^{
        NSLog(@"block1");
    };

    /// The key to executing a block through NSInvocation is obtaining the block's method signature
    NSMethodSignature *sign = [self blockSignature:block1];
    NSInvocation *invocation = [NSInvocation invocationWithMethodSignature:sign];
    invocation.target = block1;
    [invocation invoke]; // prints block1
}

- (NSMethodSignature *)blockSignature:(id)block {
    const char *sign = _Block_signature((__bridge void *)block);
    return [NSMethodSignature signatureWithObjCTypes:sign];
}

As shown, once we have the block's signature we can invoke the block through NSInvocation, with the same result as calling the block directly.

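This also explains how Aspects obtains the block signature. Its implementation is essentially the same, including the pointer offsets derived from the block's underlying structure.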

For the code that actually invokes the NSInvocation, see the invokeWithInfo: method later on.
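
The descriptor walk shared by _Block_descriptor_3 and aspect_blockMethodSignature can be reproduced in portable C against hand-built descriptors. This is a sketch, not the block runtime: the blk_ names are hypothetical, and it assumes the three descriptor parts are laid out back to back without padding, which holds when every field is pointer-sized.

```c
#include <stddef.h>
#include <stdint.h>

/* Flag bits with the same positions as BLOCK_HAS_COPY_DISPOSE / BLOCK_HAS_SIGNATURE. */
enum {
    BLK_HAS_COPY_DISPOSE = 1 << 25,
    BLK_HAS_SIGNATURE    = 1 << 30
};

struct blk_desc1 { uintptr_t reserved; uintptr_t size; };
struct blk_desc2 { void (*copy)(void *, const void *); void (*dispose)(const void *); };
struct blk_desc3 { const char *signature; const char *layout; };

/* The two descriptor shapes a compiler may emit for a block with a signature. */
struct blk_desc_plain   { struct blk_desc1 d1; struct blk_desc3 d3; };
struct blk_desc_helpers { struct blk_desc1 d1; struct blk_desc2 d2; struct blk_desc3 d3; };

/* Skip part 1, skip part 2 only if the helpers flag is set, then read part 3. */
static const char *blk_signature(int flags, const void *descriptor) {
    if (!(flags & BLK_HAS_SIGNATURE)) return NULL;
    const uint8_t *p = descriptor;
    p += sizeof(struct blk_desc1);
    if (flags & BLK_HAS_COPY_DISPOSE) p += sizeof(struct blk_desc2);
    return ((const struct blk_desc3 *)p)->signature;
}
```

Given a struct blk_desc_plain and the flag BLK_HAS_SIGNATURE, blk_signature returns the signature stored in d3; given a struct blk_desc_helpers, it returns the right pointer only when BLK_HAS_COPY_DISPOSE is also passed. That is why both implementations consult the flags before adding the second offset.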

aspect_prepareClassAndHookSelector

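This is the core of the actual hook. It also uses the runtime's class_replaceMethod function, except that what gets swapped in here is the system's message-forwarding entry point.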

static void aspect_prepareClassAndHookSelector(NSObject *self, SEL selector, NSError **error) {
    NSCParameterAssert(selector);
    Class klass = aspect_hookClass(self, error);
    Method targetMethod = class_getInstanceMethod(klass, selector);
    IMP targetMethodIMP = method_getImplementation(targetMethod);
    if (!aspect_isMsgForwardIMP(targetMethodIMP)) {
        // Make a method alias for the existing method implementation, it not already copied.
        const char *typeEncoding = method_getTypeEncoding(targetMethod);
        SEL aliasSelector = aspect_aliasForSelector(selector);
        if (![klass instancesRespondToSelector:aliasSelector]) {
            /// e.g. the implementation of aliasSelector points to the original viewDidLoad.
            __unused BOOL addedAlias = class_addMethod(klass, aliasSelector, method_getImplementation(targetMethod), typeEncoding);
            NSCAssert(addedAlias, @"Original implementation for %@ is already copied to %@ on %@", NSStringFromSelector(selector), NSStringFromSelector(aliasSelector), klass);
        }

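        /// e.g. klass now has a dynamically added aspects__viewDidLoad method.
        /// aspect_getMsgForwardIMP(self, selector) is (IMP) msgForwardIMP = 0x00007fff513f8400 (libobjc.A.dylib`_objc_msgForward)
        /// Point the selector's implementation at _objc_msgForward, so that calling the original method goes straight into message forwarding.
        /// The subclass's forwardInvocation: has already been replaced with __ASPECTS_ARE_BEING_CALLED__.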
        // We use forwardInvocation to hook in.
        class_replaceMethod(klass, selector, aspect_getMsgForwardIMP(self, selector), typeEncoding);
        AspectLog(@"Aspects: Installed hook for -[%@ %@].", klass, NSStringFromSelector(selector));
    }
}

There are two key points here:

Class klass = aspect_hookClass(self, error);

static Class aspect_hookClass(NSObject *self, NSError **error) {
    NSCParameterAssert(self);
	  Class statedClass = self.class;
	  Class baseClass = object_getClass(self);
	  NSString *className = NSStringFromClass(baseClass);

    // Already subclassed
	  if ([className hasSuffix:AspectsSubclassSuffix]) {
		    return baseClass;

        // We swizzle a class object, not a single object.
	  }else if (class_isMetaClass(baseClass)) {
        return aspect_swizzleClassInPlace((Class)self);
        // Probably a KVO'ed class. Swizzle in place. Also swizzle meta classes in place.
    }else if (statedClass != baseClass) {
        return aspect_swizzleClassInPlace(baseClass);
    }

    // Default case. Create dynamic subclass.
	  const char *subclassName = [className stringByAppendingString:AspectsSubclassSuffix].UTF8String;
	  Class subclass = objc_getClass(subclassName);

	  if (subclass == nil) {
        // Create the class dynamically
		    subclass = objc_allocateClassPair(baseClass, subclassName, 0);
		    if (subclass == nil) {
            NSString *errrorDesc = [NSString stringWithFormat:@"objc_allocateClassPair failed to allocate class %s.", subclassName];
            AspectError(AspectErrorFailedToAllocateClassPair, errrorDesc);
            return nil;
        }

		    aspect_swizzleForwardInvocation(subclass);
		    aspect_hookedGetClass(subclass, statedClass);
		    aspect_hookedGetClass(object_getClass(subclass), statedClass);
        // Register the new class
		    objc_registerClassPair(subclass);
	  }

    // isa swizzling
	  object_setClass(self, subclass);
	  return subclass;
}

This long stretch of code uses several familiar runtime APIs: object_getClass, objc_allocateClassPair, objc_registerClassPair, object_setClass, and so on.

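The code dynamically creates a subclass of the object's class. aspect_hookedGetClass(subclass, statedClass); then makes @selector(class) on the dynamically created subclass return the original class, and aspect_hookedGetClass(object_getClass(subclass), statedClass); does the same for the subclass's metaclass. This part is harder to follow.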

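object_setClass(self, subclass); stands out in particular: it is the same isa swizzling that KVO uses. Afterwards, the isa obtained through object_getClass(self) points to the subclass whose name contains the Aspects suffix. The benefit is that, for any instance or class, inspecting the isa pointer shows at a glance whether Aspects has hooked it, while external callers keep treating it as the original object. All hooking happens in the dynamically created subclass, so the object's original class is not modified unnecessarily.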
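
The effect of the dynamic subclass plus the overridden class method can be modelled with a few C structs. The sketch below is a toy under stated assumptions: the isa_ names are hypothetical, the "_Aspects_" suffix is assumed to be the value of AspectsSubclassSuffix, and the caller supplies storage for the new class instead of the runtime allocating it.

```c
#include <stdio.h>
#include <string.h>

#define ISA_SUFFIX "_Aspects_"  /* assumed value of AspectsSubclassSuffix */

typedef struct isa_class {
    const char *name;
    struct isa_class *super;
    struct isa_class *stated;   /* class that -class should report; NULL means this class */
} isa_class;

typedef struct { isa_class *isa; } isa_object;

/* Like object_getClass: the real isa. */
static isa_class *isa_real_class(const isa_object *o) { return o->isa; }

/* Like [obj class] once the class method has been overridden on the subclass. */
static isa_class *isa_stated_class(const isa_object *o) {
    return o->isa->stated ? o->isa->stated : o->isa;
}

/* Like the hasSuffix: check at the top of aspect_hookClass. */
static int isa_is_hooked(const isa_object *o) {
    size_t len = strlen(o->isa->name), suf = strlen(ISA_SUFFIX);
    return len >= suf && strcmp(o->isa->name + len - suf, ISA_SUFFIX) == 0;
}

/* Build "<name>_Aspects_" as a subclass and point the object's isa at it. */
static isa_class *isa_hook(isa_object *o, isa_class *subclass,
                           char *name_buf, size_t name_len) {
    if (isa_is_hooked(o)) return o->isa;      /* already subclassed */
    isa_class *stated = isa_stated_class(o);
    snprintf(name_buf, name_len, "%s%s", o->isa->name, ISA_SUFFIX);
    subclass->name = name_buf;
    subclass->super = o->isa;
    subclass->stated = stated;
    o->isa = subclass;                        /* the isa swizzle */
    return subclass;
}
```

After isa_hook on an object whose class is named "MyView", isa_real_class reports "MyView_Aspects_" while isa_stated_class still returns the original class, which mirrors the difference between object_getClass(self) and self.class described above.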

class_replaceMethod(klass, selector, aspect_getMsgForwardIMP(self, selector), typeEncoding)

If instances of the dynamically created class do not respond to the alias selector, it is added first:

__unused BOOL addedAlias = class_addMethod(klass, aliasSelector, method_getImplementation(targetMethod), typeEncoding);

This step is the typical runtime hook operation.

// We use forwardInvocation to hook in.
class_replaceMethod(klass, selector, aspect_getMsgForwardIMP(self, selector), typeEncoding);

aspect_swizzleForwardInvocation

The aspect_swizzleForwardInvocation(klass) function replaces the implementation of klass's forwardInvocation: method with __ASPECTS_ARE_BEING_CALLED__.

static NSString *const AspectsForwardInvocationSelectorName = @"__aspects_forwardInvocation:";
static void aspect_swizzleForwardInvocation(Class klass) {
    NSCParameterAssert(klass);
    // If there is no method, replace will act like class_addMethod.
    IMP originalImplementation = class_replaceMethod(klass, @selector(forwardInvocation:), (IMP)__ASPECTS_ARE_BEING_CALLED__, "v@:@");
    if (originalImplementation) {
        class_addMethod(klass, NSSelectorFromString(AspectsForwardInvocationSelectorName), originalImplementation, "v@:@");
    }
    AspectLog(@"Aspects: %@ is now aspect aware.", NSStringFromClass(klass));
}

__ASPECTS_ARE_BEING_CALLED__

__ASPECTS_ARE_BEING_CALLED__ is implemented as follows:

/// This is where the hook code actually runs.
// This is the swizzled forwardInvocation: method.
static void __ASPECTS_ARE_BEING_CALLED__(__unsafe_unretained NSObject *self, SEL selector, NSInvocation *invocation) {
    NSCParameterAssert(self);
    NSCParameterAssert(invocation);
    SEL originalSelector = invocation.selector;
	  SEL aliasSelector = aspect_aliasForSelector(invocation.selector);
    invocation.selector = aliasSelector;
    AspectsContainer *objectContainer = objc_getAssociatedObject(self, aliasSelector);
    AspectsContainer *classContainer = aspect_getContainerForClass(object_getClass(self), aliasSelector);
    AspectInfo *info = [[AspectInfo alloc] initWithInstance:self invocation:invocation];
    NSArray *aspectsToRemove = nil;

    /// aspect_invoke executes the blocks stored in each AspectIdentifier
    // Before hooks.
    aspect_invoke(classContainer.beforeAspects, info);
    aspect_invoke(objectContainer.beforeAspects, info);

    // Instead hooks.
    BOOL respondsToAlias = YES;
    if (objectContainer.insteadAspects.count || classContainer.insteadAspects.count) {
        aspect_invoke(classContainer.insteadAspects, info);
        aspect_invoke(objectContainer.insteadAspects, info);
    }else {
        Class klass = object_getClass(invocation.target);
        do {
            if ((respondsToAlias = [klass instancesRespondToSelector:aliasSelector])) {
                [invocation invoke];
                break;
            }
        }while (!respondsToAlias && (klass = class_getSuperclass(klass)));
    }

    // After hooks.
    aspect_invoke(classContainer.afterAspects, info);
    aspect_invoke(objectContainer.afterAspects, info);

    // If no hooks are installed, call original implementation (usually to throw an exception)
    if (!respondsToAlias) {
        invocation.selector = originalSelector;
        SEL originalForwardInvocationSEL = NSSelectorFromString(AspectsForwardInvocationSelectorName);
        if ([self respondsToSelector:originalForwardInvocationSEL]) {
            ((void( *)(id, SEL, NSInvocation *))objc_msgSend)(self, originalForwardInvocationSEL, invocation);
        }else {
            [self doesNotRecognizeSelector:invocation.selector];
        }
    }

    // Remove any hooks that are queued for deregistration.
    [aspectsToRemove makeObjectsPerformSelector:@selector(remove)];
}
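
The ordering enforced by __ASPECTS_ARE_BEING_CALLED__ (class hooks before instance hooks, and the original running only when no instead hook exists) can be sketched in C with arrays of function pointers. The fwd_ names are hypothetical, hooks are plain functions rather than blocks, and each hook just appends one character to a log buffer supplied by the caller.

```c
#include <stddef.h>
#include <string.h>

typedef void (*fwd_hook)(char *log);

/* Toy counterpart of AspectsContainer: three ordered hook lists. */
typedef struct {
    fwd_hook before[4];  size_t n_before;
    fwd_hook instead[4]; size_t n_instead;
    fwd_hook after[4];   size_t n_after;
} fwd_container;

static void fwd_run(const fwd_hook *hooks, size_t n, char *log) {
    for (size_t i = 0; i < n; i++) hooks[i](log);
}

/* Before hooks, then instead hooks or the original, then after hooks. */
static void fwd_dispatch(const fwd_container *cls, const fwd_container *obj,
                         fwd_hook original, char *log) {
    fwd_run(cls->before, cls->n_before, log);
    fwd_run(obj->before, obj->n_before, log);
    if (cls->n_instead || obj->n_instead) {
        fwd_run(cls->instead, cls->n_instead, log);
        fwd_run(obj->instead, obj->n_instead, log);
    } else {
        original(log);   /* stands in for [invocation invoke] on the alias */
    }
    fwd_run(cls->after, cls->n_after, log);
    fwd_run(obj->after, obj->n_after, log);
}

/* Sample hooks that record their position in the log. */
static void fwd_before(char *log)   { strcat(log, "b"); }
static void fwd_instead(char *log)  { strcat(log, "i"); }
static void fwd_after(char *log)    { strcat(log, "a"); }
static void fwd_original(char *log) { strcat(log, "O"); }
```

With a before hook on the class container and an after hook on the instance container, a dispatch records "bOa". Adding an instead hook to either container changes the record to "bia", because the original implementation is skipped.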

aspect_invoke

aspect_invoke runs, in order, the operations wrapped in the AspectIdentifier objects passed to it.

// This is a macro so we get a cleaner stack trace.
#define aspect_invoke(aspects, info) \
for (AspectIdentifier *aspect in aspects) {\
    [aspect invokeWithInfo:info];\
    if (aspect.options & AspectOptionAutomaticRemoval) { \
        aspectsToRemove = [aspectsToRemove?:@[] arrayByAddingObject:aspect]; \
    } \
}

invokeWithInfo: in turn makes full use of NSInvocation. For a refresher on NSInvocation, see the blog post on message-forwarding patterns in iOS.

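NSInvocation can send a message to any Objective-C object. Using it follows a fixed sequence of steps:

  1. Create an NSMethodSignature from the selector.
  2. Create the NSInvocation from that NSMethodSignature; this must be done with invocationWithMethodSignature:.
  3. Set the target and selector.
  4. Set the arguments. Argument indexes start at 2, because indexes 0 and 1 hold the target and the selector. An out-of-range index raises an error.
  5. Call invoke on the NSInvocation.
  6. If there is a return value, read it with getReturnValue:.

- (BOOL)invokeWithInfo:(id<AspectInfo>)info {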
    NSInvocation *blockInvocation = [NSInvocation invocationWithMethodSignature:self.blockSignature];
    NSInvocation *originalInvocation = info.originalInvocation;
    NSUInteger numberOfArguments = self.blockSignature.numberOfArguments;

    // Be extra paranoid. We already check that on hook registration.
    if (numberOfArguments > originalInvocation.methodSignature.numberOfArguments) {
        AspectLogError(@"Block has too many arguments. Not calling %@", info);
        return NO;
    }

    // The `self` of the block will be the AspectInfo. Optional.
    if (numberOfArguments > 1) {
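        // For a method, index 0 is the target and index 1 the selector; a block has no selector, so its first argument (the AspectInfo) sits at index 1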
        [blockInvocation setArgument:&info atIndex:1];
    }
    
	  void *argBuf = NULL;
    // Arguments other than target and selector start at index 2
    for (NSUInteger idx = 2; idx < numberOfArguments; idx++) {
        const char *type = [originalInvocation.methodSignature getArgumentTypeAtIndex:idx];
		    NSUInteger argSize;
		    NSGetSizeAndAlignment(type, &argSize, NULL);
        
        /// reallocf grows or shrinks argBuf to argSize bytes.
		    if (!(argBuf = reallocf(argBuf, argSize))) {
            AspectLogError(@"Failed to allocate memory for block invocation.");
			      return NO;
		    }
        
		    [originalInvocation getArgument:argBuf atIndex:idx];
		    [blockInvocation setArgument:argBuf atIndex:idx];
    }
    
    [blockInvocation invokeWithTarget:self.block];
    
    if (argBuf != NULL) {
        free(argBuf);
    }
    return YES;
}

Hooking C functions with fishhook

FOUNDATION_EXPORT void NSLog(NSString *format, ...) NS_FORMAT_FUNCTION(1,2) NS_NO_TAIL_CALL;

NSLog, for example, is not an Objective-C method, so it cannot be hooked through the runtime. This is where fishhook comes in.

Hooking NSLog

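// Declare a function pointer to hold the real address of the original NSLog; its signature must match the original function.
// After the original is hooked, the new function still needs to call it, otherwise the original behavior is lost.
static void (*orig_nslog)(NSString *format, ...);
void my_nslog(NSString *format, ...) {
    // Format the variadic arguments first so that callers passing format specifiers still work.
    va_list args;
    va_start(args, format);
    NSString *message = [[NSString alloc] initWithFormat:format arguments:args];
    va_end(args);
    // After rebinding, orig_nslog points to NSLog's real implementation.
    orig_nslog(@"我的NSLog: %@", message);
}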

struct rebinding rebinding_nslog = {"NSLog", my_nslog, (void *)&orig_nslog};
rebind_symbols((struct rebinding [1]){rebinding_nslog}, 1);

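The address of the original implementation must be saved in orig_nslog, and the replacement my_nslog must call the original function so that the system function keeps its full behavior.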

A rebinding struct describes one hook, and the rebind_symbols function performs the symbol rebinding.

NSLog(@"123");
struct rebinding rebinding_nslog = {"NSLog", my_nslog, (void *)&orig_nslog};
rebind_symbols((struct rebinding [1]){rebinding_nslog}, 1);
NSLog(@"123");
NSLog([NSString stringWithFormat:@"456 %d", 789]);

Output:

123
我的NSLog: 123
我的NSLog: 456 789

Hooking open/close

fishhook's sample code hooks the C functions open and close, so that custom code runs whenever the app opens its files (the app binary, plist files, .data files, and so on).

struct rebinding rebinding_close = {"close", my_close, (void *)&orig_close};
struct rebinding rebinding_open = {"open", my_open, (void *)&orig_open};

// rebinding is a struct that describes the symbol to rebind
rebind_symbols((struct rebinding[2]){rebinding_close, rebinding_open}, 2);

As before, the original implementations must be saved in orig_open and orig_close, and the replacement functions then call the originals.

static int (*orig_close)(int);
static int (*orig_open)(const char *, int, ...);
int my_close(int fd) {
    printf("Calling real close(%d)\n", fd);
    return orig_close(fd);
}
int my_open(const char *path, int oflag, ...) {
    va_list ap = {0};
    mode_t mode = 0;
    
    if ((oflag & O_CREAT) != 0) {
        // mode only applies to O_CREAT
        va_start(ap, oflag);
        mode = va_arg(ap, int);
        va_end(ap);
        printf("Calling real open('%s', %d, %d)\n", path, oflag, mode);
        return orig_open(path, oflag, mode);
    } else {
        printf("Calling real open('%s', %d)\n", path, oflag);
        return orig_open(path, oflag, mode);
    }
}

This official fishhook example is fairly straightforward.

Custom functions cannot be hooked

fishhook cannot hook functions defined in the app itself.

static void (*myFuncImp)(void);
void myFunc() {
    NSLog(@"myFunc");
}
void hookMyFunc() {
    NSLog(@"hookMyFunc");
}
myFunc();
struct rebinding rebinding_myFunc = {"myFunc", hookMyFunc, (void *)&myFuncImp};
rebind_symbols((struct rebinding [1]){rebinding_myFunc}, 1);
myFunc();

Output:

myFunc
myFunc

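The reason is that when an app calls a function from a system library, a pointer for that function is set up in the __DATA segment, and dyld binds the pointer to the function's implementation. For NSLog, for example, the image contains a function pointer that dyld points at NSLog's implementation in the Foundation framework. fishhook changes where that pointer points, redirecting it to the replacement implementation, and that is how a C function gets hooked. A call to a function defined in the same image does not go through such a pointer, so there is nothing for fishhook to rewrite.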
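
The difference can be modelled in plain C. The sketch below is not fishhook and does not touch Mach-O at all: imported_slot is a hypothetical stand-in for one entry of a symbol pointer section, and every name in it is invented for illustration. A call routed through the slot can be redirected by overwriting the slot, while a direct call cannot.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; none of these names come from fishhook. */
int real_impl(void)        { return 1; }   /* plays the function inside a dylib */
int replacement_impl(void) { return 2; }   /* plays the hook */

/* Plays one symbol pointer entry: calls to an imported function go
 * through a pointer like this one, which dyld would normally fill in. */
int (*imported_slot)(void) = real_impl;

/* A function defined in the same image is called directly, so there is
 * no pointer slot that a rebinding could overwrite. */
int local_func(void) { return 10; }

int call_imported(void) {
    assert(imported_slot != NULL);
    return imported_slot();     /* indirect call through the slot */
}

int call_local(void) {
    return local_func();        /* direct call, no slot involved */
}

/* "Rebinding": remember the old pointer, then overwrite the slot. */
int (*saved_original)(void) = NULL;

void rebind_slot(void) {
    saved_original = imported_slot;
    imported_slot = replacement_impl;
}
```

After rebind_slot() runs, call_imported() reaches replacement_impl, saved_original still reaches real_impl, and call_local() behaves exactly as before.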

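How fishhook works

The fishhook source relies on fairly deep knowledge of Mach-O. Readers who are not familiar with the format may want to read the post 对Mach-O文件的初步探索 (a first look at Mach-O files) first. fishhook hooks C functions by rebinding symbols.

For C functions that live in dynamic libraries, the function pointers are stored in two sections, __DATA.__la_symbol_ptr (lazy symbol pointers) and __DATA.__nl_symbol_ptr (non-lazy symbol pointers). The implementations themselves live in a dylib, so their addresses are only known once the app is running; filling them in is what is usually called symbol binding. After that, a call reads the pointer from one of these two sections and jumps to the implementation in the dylib. fishhook hooks a C function by overwriting the pointer stored in these sections so that it points to the new implementation, which is the symbol rebinding referred to here.

fishhook's official explanation, "How it works":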

dyld binds lazy and non-lazy symbols by updating pointers in particular sections of the __DATA segment of a Mach-O binary. fishhook re-binds these symbols by determining the locations to update for each of the symbol names passed to rebind_symbols and then writing out the corresponding replacements.

For a given image, the __DATA segment may contain two sections that are relevant for dynamic symbol bindings: __nl_symbol_ptr and __la_symbol_ptr. __nl_symbol_ptr is an array of pointers to non-lazily bound data (these are bound at the time a library is loaded) and __la_symbol_ptr is an array of pointers to imported functions that is generally filled by a routine called dyld_stub_binder during the first call to that symbol (it's also possible to tell dyld to bind these at launch). In order to find the name of the symbol that corresponds to a particular location in one of these sections, we have to jump through several layers of indirection. For the two relevant sections, the section headers (struct sections from <mach-o/loader.h>) provide an offset (in the reserved1 field) into what is known as the indirect symbol table. The indirect symbol table, which is located in the __LINKEDIT segment of the binary, is just an array of indexes into the symbol table (also in __LINKEDIT) whose order is identical to that of the pointers in the non-lazy and lazy symbol sections. So, given struct section nl_symbol_ptr, the corresponding index in the symbol table of the first address in that section is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table itself is an array of struct nlists (see <mach-o/nlist.h>), and each nlist contains an index into the string table in __LINKEDIT which where the actual symbol names are stored. So, for each pointer __nl_symbol_ptr and __la_symbol_ptr, we are able to find the corresponding symbol and then the corresponding string to compare against the requested symbol names, and if there is a match, we replace the pointer in the section with the replacement.

The process of looking up the name of a given entry in the lazy or non-lazy pointer tables looks like this:

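In other words, dyld binds both lazy and non-lazy symbols by writing pointers into specific sections of the __DATA segment. For each symbol name passed to rebind_symbols, fishhook works out which of those pointers belongs to it and overwrites that pointer with the replacement.

For a given image, two sections of the __DATA segment matter for dynamic binding: __nl_symbol_ptr, an array of pointers that are bound when the library is loaded, and __la_symbol_ptr, an array of pointers to imported functions that dyld_stub_binder normally fills in on the first call to each symbol (dyld can also be told to bind them at launch). Finding the symbol name for a given slot in these sections takes several hops. The section header (struct section in <mach-o/loader.h>) stores, in its reserved1 field, the section's starting offset into the indirect symbol table. The indirect symbol table lives in the __LINKEDIT segment and is simply an array of indexes into the symbol table (also in __LINKEDIT), in the same order as the pointers in the section. So for the nl_symbol_ptr section, the symbol table index of its first pointer is indirect_symbol_table[nl_symbol_ptr->reserved1]. The symbol table is an array of struct nlist (see <mach-o/nlist.h>), and each nlist holds an index into the string table in __LINKEDIT, where the actual symbol names are stored. For every pointer in __nl_symbol_ptr and __la_symbol_ptr we can therefore find the symbol, then its name, compare the name with the requested ones, and replace the pointer on a match.

Reference: 巧用符号表 - 探求 fishhook 原理(一)

__nl_symbol_ptr and __la_symbol_ptr are both arrays of indirect pointers. Each element determines which implementation actually runs when the corresponding function is called. fishhook recovers the symbol name for each indirect pointer, and when the name matches one passed to rebind_symbols it overwrites the address stored in that pointer, which completes the rebinding.

The figure shows how the name for a given entry in the lazy or non-lazy pointer table is looked up:

The two sections __DATA.__la_symbol_ptr (lazy symbol pointers) and __DATA.__nl_symbol_ptr (non-lazy symbol pointers) are the key here. The pointers in __la_symbol_ptr are bound lazily: the symbol's address is not resolved at load time but on the first call to the function, through a stub and dyld_stub_binder (the role the PLT plays in ELF). The pointers in __nl_symbol_ptr are not bound lazily; they are bound when the image is loaded.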
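
The chain of lookups can be sketched with three tiny in-memory tables. This is only an illustration under stated assumptions: mini_nlist, the mini_* tables, and both functions are hypothetical, the table contents are made up, and a real image keeps these tables in __LINKEDIT.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical miniature of struct nlist: only the string table offset. */
struct mini_nlist { uint32_t n_strx; };

/* String table: NUL-terminated names packed back to back.
 * Offsets: 1 -> "_close", 8 -> "_open", 14 -> "_NSLog". */
static const char mini_strtab[] = "\0_close\0_open\0_NSLog";

/* Symbol table: entry k records where the name of symbol k starts. */
static const struct mini_nlist mini_symtab[] = { {1}, {8}, {14} };

/* Indirect symbol table, shared by all pointer sections of the image. */
static const uint32_t mini_indirect_symtab[] = { 1, 2, 2, 0, 1 };

/* reserved1 is where a section's entries start in the indirect symbol
 * table; slot is the index of the pointer inside that section. */
const char *name_for_slot(uint32_t reserved1, uint32_t slot) {
    size_t count = sizeof mini_indirect_symtab / sizeof mini_indirect_symtab[0];
    assert((size_t)reserved1 + slot < count);
    uint32_t symtab_index  = mini_indirect_symtab[reserved1 + slot];
    uint32_t strtab_offset = mini_symtab[symtab_index].n_strx;
    return mini_strtab + strtab_offset;
}

/* Same comparison idea as fishhook: does this slot belong to `wanted`? */
int slot_has_name(uint32_t reserved1, uint32_t slot, const char *wanted) {
    return strcmp(name_for_slot(reserved1, slot), wanted) == 0;
}
```

With a section whose reserved1 is 2, slot 0 resolves to "_NSLog", slot 1 to "_close", and slot 2 to "_open"; those are the slots a rebinding for the matching name would overwrite.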

fishhook's public interface

The rebinding struct packages everything needed to hook one function.

/*
 * A structure representing a particular intended rebinding from a symbol
 * name to its replacement
 */
struct rebinding {
  const char *name;
  void *replacement;
  void **replaced;
};

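replacement points to the replacement implementation, and replaced receives the address of the original implementation.

The rebind_symbols function takes the array of rebinding structs to apply and the number of elements in it.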

/*
 * For each rebinding in rebindings, rebinds references to external, indirect
 * symbols with the specified name to instead point at replacement for each
 * image in the calling process as well as for all future images that are loaded
 * by the process. If rebind_functions is called more than once, the symbols to
 * rebind are added to the existing list of rebindings, and if a given symbol
 * is rebound more than once, the later rebinding will take precedence.
 */
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel);
int rebind_symbols(struct rebinding rebindings[], size_t rebindings_nel) {
  int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);
  if (retval < 0) {
    return retval;
  }
  // If this was the first call, register callback for image additions (which is also invoked for
  // existing images, otherwise, just run on existing images
  if (!_rebindings_head->next) {
    _dyld_register_func_for_add_image(_rebind_symbols_for_image);
  } else {
    uint32_t c = _dyld_image_count();
    for (uint32_t i = 0; i < c; i++) {
      _rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));
    }
  }
  return retval;
}

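The implementation of rebind_symbols has two parts. First, prepend_rebindings builds a linked-list node from the rebindings array and puts it at the head of the list, _rebindings_head.

Then, depending on whether this is the first call, it calls either _dyld_register_func_for_add_image or _rebind_symbols_for_image.

The linked list

The head of the list is _rebindings_head. Each node stores a pointer to an array of rebinding structs, the number of elements in that array (rebindings_nel), and a next pointer to the following node.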

struct rebindings_entry {
  struct rebinding *rebindings;
  size_t rebindings_nel;
  struct rebindings_entry *next;
};

static struct rebindings_entry *_rebindings_head;

rebind_symbols begins with this call:

int retval = prepend_rebindings(&_rebindings_head, rebindings, rebindings_nel);

The source of prepend_rebindings is as follows:

static int prepend_rebindings(struct rebindings_entry **rebindings_head,
                              struct rebinding rebindings[],
                              size_t nel) {
  struct rebindings_entry *new_entry = malloc(sizeof(struct rebindings_entry));
  if (!new_entry) {
    return -1;
  }
  /// Allocate memory for nel rebinding structs
  new_entry->rebindings = malloc(sizeof(struct rebinding) * nel);
  if (!new_entry->rebindings) {
    free(new_entry);
    return -1;
  }
  /// Copy the caller's rebindings array into new_entry->rebindings; the third argument is the number of bytes to copy
  memcpy(new_entry->rebindings, rebindings, sizeof(struct rebinding) * nel);
  new_entry->rebindings_nel = nel;

  /// Standard linked-list operation: insert the new node at the head,
  /// so the most recently added rebindings come first in the list
  new_entry->next = *rebindings_head;
  *rebindings_head = new_entry;
  
  return 0;
}
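
Inserting at the head is what makes a later rebinding of the same symbol take precedence, because a search from the head meets the newest node first. The sketch below models that with hypothetical mini_* types and an integer replacement_id in place of a function pointer; it mirrors the structure of prepend_rebindings but is not the fishhook code, and it never frees its nodes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical miniature of struct rebinding. */
struct mini_rebinding { const char *name; int replacement_id; };

struct mini_entry {
    struct mini_rebinding *rebindings;
    size_t nel;
    struct mini_entry *next;
};

/* Copy the caller's array into a new node and insert it at the head. */
int mini_prepend(struct mini_entry **head,
                 const struct mini_rebinding rebindings[], size_t nel) {
    assert(head != NULL);
    struct mini_entry *entry = malloc(sizeof *entry);
    if (!entry) return -1;
    entry->rebindings = malloc(sizeof *entry->rebindings * nel);
    if (!entry->rebindings) { free(entry); return -1; }
    memcpy(entry->rebindings, rebindings, sizeof *entry->rebindings * nel);
    entry->nel = nel;
    entry->next = *head;
    *head = entry;
    return 0;
}

/* Walk from the head; the first match wins. Returns -1 if not found. */
int mini_lookup(const struct mini_entry *head, const char *name) {
    for (const struct mini_entry *cur = head; cur; cur = cur->next) {
        for (size_t j = 0; j < cur->nel; j++) {
            if (strcmp(cur->rebindings[j].name, name) == 0) {
                return cur->rebindings[j].replacement_id;
            }
        }
    }
    return -1;
}
```

If "open" is registered first with id 1 and later with id 3, mini_lookup returns 3 for "open", while names that appear only in the older node are still found.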

_dyld_register_func_for_add_image

If _rebindings_head->next is NULL, this is the first call. In that case _dyld_register_func_for_add_image(_rebind_symbols_for_image); registers _rebind_symbols_for_image as a callback that dyld invokes once for every image that is already loaded and again each time a new image is loaded.

/*
 * The following functions allow you to install callbacks which will be called   
 * by dyld whenever an image is loaded or unloaded.  During a call to _dyld_register_func_for_add_image()
 * the callback func is called for every existing image.  Later, it is called as each new image
 * is loaded and bound (but initializers not yet run).  The callback registered with
 * _dyld_register_func_for_remove_image() is called after any terminators in an image are run
 * and before the image is un-memory-mapped.
 */
extern void _dyld_register_func_for_add_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide))    __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);
extern void _dyld_register_func_for_remove_image(void (*func)(const struct mach_header* mh, intptr_t vmaddr_slide)) __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);
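
The behavior described in the header comment explains why rebind_symbols only registers the callback on its first call: registration already covers the existing images, and the callback stays installed for future ones. A rough model of that contract, using hypothetical model_* functions and integer image ids rather than the real dyld API:

```c
#include <assert.h>
#include <stddef.h>

#define MODEL_MAX_IMAGES 8

typedef void (*model_add_image_cb)(int image_id);

static int model_loaded[MODEL_MAX_IMAGES];
static size_t model_loaded_count;
static model_add_image_cb model_cb;

/* Models registration: the callback runs at once for every image that
 * is already loaded, and is remembered for images loaded later. */
void model_register_add_image(model_add_image_cb cb) {
    assert(cb != NULL);
    model_cb = cb;
    for (size_t i = 0; i < model_loaded_count; i++) {
        cb(model_loaded[i]);
    }
}

/* Models the loader bringing in one more image. */
void model_load_image(int image_id) {
    if (model_loaded_count < MODEL_MAX_IMAGES) {
        model_loaded[model_loaded_count++] = image_id;
    }
    if (model_cb) {
        model_cb(image_id);
    }
}

/* Stand-in for _rebind_symbols_for_image: just counts invocations. */
static int model_rebind_calls;
void model_rebind(int image_id) { (void)image_id; model_rebind_calls++; }
int model_rebind_call_count(void) { return model_rebind_calls; }
```

Registering model_rebind after two images are loaded triggers two calls immediately, and loading a third image triggers one more.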

_rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));

This function is where fishhook rebinds symbols: it runs for each image loaded by dyld and performs the rebind on that image.

_rebind_symbols_for_image(_dyld_get_image_header(i), _dyld_get_image_vmaddr_slide(i));

static void _rebind_symbols_for_image(const struct mach_header *header,
                                      intptr_t slide) {
    rebind_symbols_for_image(_rebindings_head, header, slide);
}

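_dyld_get_image_header and _dyld_get_image_vmaddr_slide both take an image_index and return the image's header and its virtual address slide, respectively. The slide exists because of ASLR (address space layout randomization): for security reasons, the kernel shifts a Mach-O image by a random offset when mapping it into virtual memory.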

/*
 * The following functions allow you to iterate through all loaded images.  
 * This is not a thread safe operation.  Another thread can add or remove
 * an image during the iteration.  
 *
 * Many uses of these routines can be replace by a call to dladdr() which 
 * will return the mach_header and name of an image, given an address in 
 * the image. dladdr() is thread safe.
 */
extern uint32_t                    _dyld_image_count(void)                              __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);
extern const struct mach_header*   _dyld_get_image_header(uint32_t image_index)         __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);
extern intptr_t                    _dyld_get_image_vmaddr_slide(uint32_t image_index)   __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);
extern const char*                 _dyld_get_image_name(uint32_t image_index)           __OSX_AVAILABLE_STARTING(__MAC_10_1, __IPHONE_2_0);

fishhook's data structures

dl_info

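The dladdr function takes an address (here, the image's mach_header) and fills a dl_info struct with information about the image that contains that address and the nearest symbol.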

/*
 * Structure filled in by dladdr().
 */
typedef struct dl_info {
        const char      *dli_fname;     /* Pathname of shared object */
        void            *dli_fbase;     /* Base address of shared object */
        const char      *dli_sname;     /* Name of nearest symbol */
        void            *dli_saddr;     /* Address of nearest symbol */
} Dl_info;

extern int dladdr(const void *, Dl_info *);

segment_command_t

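The segment_command_64 struct corresponds to an LC_SEGMENT_64 load command in a Mach-O file. This command describes a segment of the file that is to be mapped into the process's address space.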

typedef struct segment_command_64 segment_command_t;

/*
 * The 64-bit segment load command indicates that a part of this file is to be
 * mapped into a 64-bit task's address space.  If the 64-bit segment has
 * sections then section_64 structures directly follow the 64-bit segment
 * command and their size is reflected in cmdsize.
 */
struct segment_command_64 { /* for 64-bit architectures */
	uint32_t	cmd;		/* LC_SEGMENT_64 */
	uint32_t	cmdsize;	/* includes sizeof section_64 structs */
	char		segname[16];	/* segment name */
	uint64_t	vmaddr;		/* memory address of this segment */
	uint64_t	vmsize;		/* memory size of this segment */
	uint64_t	fileoff;	/* file offset of this segment */
	uint64_t	filesize;	/* amount to map from the file */
	vm_prot_t	maxprot;	/* maximum VM protection */
	vm_prot_t	initprot;	/* initial VM protection */
	uint32_t	nsects;		/* number of sections in segment */
	uint32_t	flags;		/* flags */
};

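This matches what MachOView shows for a Mach-O file. For example, the LC_SEGMENT_64(__TEXT) command maps the code segment: its type is LC_SEGMENT_64, its cmdsize is 472, and its segment name is __TEXT.

A section is a collection of identical or similar information; .text, .data, and .bss are all different sections. A segment is made up of several sections with the same attributes. What we usually call the code segment and the data segment are segments in this sense.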

typedef struct section_64 section_t;

struct section_64 { /* for 64-bit architectures */
	char		sectname[16];	/* name of this section */
	char		segname[16];	/* segment this section goes in */
	uint64_t	addr;		/* memory address of this section */
	uint64_t	size;		/* size in bytes of this section */
	uint32_t	offset;		/* file offset of this section */
	uint32_t	align;		/* section alignment (power of 2) */
	uint32_t	reloff;		/* file offset of relocation entries */
	uint32_t	nreloc;		/* number of relocation entries */
	uint32_t	flags;		/* flags (section type and attributes)*/
	uint32_t	reserved1;	/* reserved (for offset or index) */
	uint32_t	reserved2;	/* reserved (for count or sizeof) */
	uint32_t	reserved3;	/* reserved */
};
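
Because the section headers directly follow the segment command, as the header comment on segment_command_64 states, the j-th section can be reached with the expression (section_t *)(cur + sizeof(segment_command_t)) + j that fishhook uses later on. The sketch below reproduces that pointer arithmetic on simplified, hypothetical structs (mini_segment, mini_section, mini_blob); the real structs have more fields, but the arithmetic is the same.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Simplified, hypothetical stand-ins for segment_command_64 and section_64. */
struct mini_segment {
    uint32_t cmd;
    uint32_t cmdsize;
    char     segname[16];
    uint32_t nsects;
    uint32_t flags;
};

struct mini_section {
    char     sectname[16];
    uint32_t reserved1;
    uint32_t reserved2;
};

/* One segment command followed immediately by its two section headers. */
struct mini_blob {
    struct mini_segment seg;
    struct mini_section sects[2];
};

void mini_blob_init(struct mini_blob *b) {
    memset(b, 0, sizeof *b);
    strcpy(b->seg.segname, "__DATA");
    b->seg.nsects = 2;
    b->seg.cmdsize = (uint32_t)sizeof *b;   /* command plus its sections */
    strcpy(b->sects[0].sectname, "__nl_symbol_ptr");
    strcpy(b->sects[1].sectname, "__la_symbol_ptr");
}

const struct mini_section *section_at(const struct mini_segment *seg, uint32_t j) {
    assert(j < seg->nsects);
    uintptr_t cur = (uintptr_t)seg;
    /* cur is an integer, so "+ sizeof(struct mini_segment)" skips exactly
     * the bytes of the segment command. The cast then yields a pointer to
     * the first section header, and "+ j" is pointer arithmetic that
     * advances by j * sizeof(struct mini_section) bytes. */
    return (const struct mini_section *)(cur + sizeof(struct mini_segment)) + j;
}
```

So sizeof(segment_command_t) is only used to step over the command itself; the stride between sections comes from the section_t pointer type.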

symtab_command

Corresponds to the LC_SYMTAB command, which records where the symbol table and the string table are located.

/*
 * The symtab_command contains the offsets and sizes of the link-edit 4.3BSD
 * "stab" style symbol table information as described in the header files
 * <nlist.h> and <stab.h>.
 */
struct symtab_command {
	uint32_t	cmd;		/* LC_SYMTAB */
	uint32_t	cmdsize;	/* sizeof(struct symtab_command) */
	uint32_t	symoff;		/* symbol table offset */
	uint32_t	nsyms;		/* number of symbol table entries */
	uint32_t	stroff;		/* string table offset */
	uint32_t	strsize;	/* string table size in bytes */
};

dysymtab_command

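The source of this struct is long, so it is not reproduced here. Its key fields are again offsets and entry counts, this time for the dynamic symbol table.

It corresponds to the LC_DYSYMTAB command and holds the symbol table information needed by the dynamic linker.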

/*
 * This is the second set of the symbolic information which is used to support
 * the data structures for the dynamically link editor.
 */

nlist_t

/*
 * This is the symbol table entry structure for 64-bit architectures.
 */
struct nlist_64 {
    union {
        uint32_t  n_strx; /* index into the string table */
    } n_un;
    uint8_t n_type;        /* type flag, see below */
    uint8_t n_sect;        /* section number or NO_SECT */
    uint16_t n_desc;       /* see <mach-o/stab.h> */
    uint64_t n_value;      /* value of this symbol (or stab offset) */
};

String Table

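strtab is the string table that stores symbol names such as function and variable names, each terminated by \0. The address of a symbol's name is the base address of strtab plus the offset recorded for that symbol in the symbol table.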

rebind_symbols_for_image(_rebindings_head, header, slide);

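Next comes the core of the rebind. The two for loops below are the genuinely hard part, and they only make sense with a solid understanding of the Mach-O file format.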

static void rebind_symbols_for_image(struct rebindings_entry *rebindings,
                                     const struct mach_header *header,
                                     intptr_t slide) {
  Dl_info info;
  if (dladdr(header, &info) == 0) {
    return;
  }

  /// Declare the variables that the first loop fills in; the second loop relies on them.
  segment_command_t *cur_seg_cmd;
  segment_command_t *linkedit_segment = NULL;
  struct symtab_command* symtab_cmd = NULL;
  struct dysymtab_command* dysymtab_cmd = NULL;
  
  /// Skip the Mach-O header first
  uintptr_t cur = (uintptr_t)header + sizeof(mach_header_t);
  /// Then walk every load command; cmdsize is the size of the load command in bytes.
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    /// Read the load command
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      /// LC_SEGMENT_ARCH_DEPENDENT is a fishhook macro: LC_SEGMENT_64 under __LP64__, LC_SEGMENT otherwise
      if (strcmp(cur_seg_cmd->segname, SEG_LINKEDIT) == 0) {
        /// __LINKEDIT holds metadata for functions and variables (locations, offsets), code signature data, and so on:
        /// the raw data used by the dynamic linker.
        linkedit_segment = cur_seg_cmd;
      }
    } else if (cur_seg_cmd->cmd == LC_SYMTAB) {
      /// Symbol table
      symtab_cmd = (struct symtab_command*)cur_seg_cmd;
    } else if (cur_seg_cmd->cmd == LC_DYSYMTAB) {
      /// Dynamic symbol table
      dysymtab_cmd = (struct dysymtab_command*)cur_seg_cmd;
    }
  }

  if (!symtab_cmd || !dysymtab_cmd || !linkedit_segment ||
      !dysymtab_cmd->nindirectsyms) {
    return;
  }

  // Find base symbol/string table addresses
  /// Compute the base address used to turn file offsets into addresses in memory.
  /// linkedit_segment->vmaddr is the virtual address of the __LINKEDIT segment.
  /// linkedit_segment->fileoff is the offset of the __LINKEDIT segment within the Mach-O file.
  /// So linkedit_segment->vmaddr - linkedit_segment->fileoff is the (unslid) address that file offset 0 corresponds to.
  /// slide is the offset of this image (Mach-O file) in virtual memory, introduced by ASLR.
  uintptr_t linkedit_base = (uintptr_t)slide + linkedit_segment->vmaddr - linkedit_segment->fileoff;
  /// The offsets recorded in LC_SYMTAB and LC_DYSYMTAB are file offsets, so they are added to linkedit_base.
  /// linkedit_base + symtab_cmd->symoff is the address of the symbol table.
  /// linkedit_base + symtab_cmd->stroff is the address of the string table.
  /// See the fields of symtab_command for the details.
  /// Treat this memory as an array of nlist_t; each nlist_t is one symbol table entry.
  nlist_t *symtab = (nlist_t *)(linkedit_base + symtab_cmd->symoff);
  /// Get the string table
  char *strtab = (char *)(linkedit_base + symtab_cmd->stroff);

  // Get indirect symbol table (array of uint32_t indices into symbol table)
  /// linkedit_base + dysymtab_cmd->indirectsymoff is the address of the indirect symbol table
  uint32_t *indirect_symtab = (uint32_t *)(linkedit_base + dysymtab_cmd->indirectsymoff);

  /// Walk every load command a second time
  cur = (uintptr_t)header + sizeof(mach_header_t);
  for (uint i = 0; i < header->ncmds; i++, cur += cur_seg_cmd->cmdsize) {
    cur_seg_cmd = (segment_command_t *)cur;
    if (cur_seg_cmd->cmd == LC_SEGMENT_ARCH_DEPENDENT) {
      if (strcmp(cur_seg_cmd->segname, SEG_DATA) != 0 &&
          strcmp(cur_seg_cmd->segname, SEG_DATA_CONST) != 0) {
        continue;
      }
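      /// This is the __DATA (or __DATA_CONST) segment.
      /// A segment load command is followed by its sections.
      /// Only the lazy and non-lazy symbol pointer sections matter, because they store the addresses of function implementations.
      for (uint j = 0; j < cur_seg_cmd->nsects; j++) {
        /// cur + sizeof(segment_command_t) skips the segment command itself; the section headers follow it directly,
        /// and "+ j" on a section_t pointer then advances by j * sizeof(section_t).
        section_t *sect =
          (section_t *)(cur + sizeof(segment_command_t)) + j;
        /// Lazy symbol pointer section
        if ((sect->flags & SECTION_TYPE) == S_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
        /// Non-lazy symbol pointer section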
        if ((sect->flags & SECTION_TYPE) == S_NON_LAZY_SYMBOL_POINTERS) {
          perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);
        }
      }
    }
  }
}

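The comments explaining this computation are placed next to the corresponding code. linkedit_base is the in-memory address that file offset 0 of the image corresponds to; the addresses of the symbol table, the string table, and the indirect symbol table are all computed from it.

The indirect symbol table (uint32_t indirectsymoff;) is harder to understand. Sections that contain symbol pointers or routine stubs each have an index into the indirect symbol table (plus an implied count derived from the section size and the fixed entry size) that covers every pointer and stub. For each section of these two types, the index into the indirect symbol table is stored in the reserved1 field of the section header (this field is used later). An indirect symbol table entry is simply a 32-bit index into the symbol table, identifying the symbol that the pointer or stub refers to. The indirect symbol table is ordered to match the entries in the section.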
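
A worked example with made-up numbers may make the arithmetic easier to follow. The struct and function names below are hypothetical, and the values are chosen only so that the result is easy to check by hand.

```c
#include <stdint.h>

/* Hypothetical stand-in for the two __LINKEDIT fields the computation needs. */
struct mini_linkedit {
    uint64_t vmaddr;   /* address of __LINKEDIT before the ASLR slide */
    uint64_t fileoff;  /* offset of __LINKEDIT within the file */
};

/* Address in memory that file offset 0 corresponds to. */
uint64_t mini_linkedit_base(uint64_t slide, const struct mini_linkedit *seg) {
    return slide + seg->vmaddr - seg->fileoff;
}

/* Any file offset that falls inside __LINKEDIT (symoff, stroff,
 * indirectsymoff) becomes an address by adding it to the base. */
uint64_t mini_table_address(uint64_t base, uint32_t file_offset) {
    return base + file_offset;
}
```

Assume vmaddr = 0x100008000, fileoff = 0x8000, and slide = 0x4000. Then the base is 0x4000 + 0x100008000 - 0x8000 = 0x100004000, and a symbol table at file offset 0x8100 is found at 0x10000C100. That is the slid start of __LINKEDIT (0x10000C000) plus the table's distance from the start of the segment in the file (0x100).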

/*
 * The sections that contain "symbol pointers" and "routine stubs" have
 * indexes and (implied counts based on the size of the section and fixed
 * size of the entry) into the "indirect symbol" table for each pointer
 * and stub.  For every section of these two types the index into the
 * indirect symbol table is stored in the section header in the field
 * reserved1.  An indirect symbol table entry is simply a 32bit index into
 * the symbol table to the symbol that the pointer or stub is referring to.
 * The indirect symbol table is ordered to match the entries in the section.
 */
uint32_t indirectsymoff; /* file offset to the indirect symbol table */
uint32_t nindirectsyms;  /* number of indirect symbol table entries */

INDIRECT_SYMBOL_LOCAL and INDIRECT_SYMBOL_ABS are two special indirect symbol table entries.

/*
 * An indirect symbol table entry is simply a 32bit index into the symbol table 
 * to the symbol that the pointer or stub is refering to.  Unless it is for a
 * non-lazy symbol pointer section for a defined symbol which strip(1) as 
 * removed.  In which case it has the value INDIRECT_SYMBOL_LOCAL.  If the
 * symbol was also absolute INDIRECT_SYMBOL_ABS is or'ed with that.
 */
#define INDIRECT_SYMBOL_LOCAL	0x80000000
#define INDIRECT_SYMBOL_ABS	0x40000000
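
These values are flags, not positions in the symbol table, so using one as an array index would read far outside the table. A minimal sketch of the check, using hypothetical MINI_* copies of the two constants so that it compiles without the Mach-O headers:

```c
#include <stdbool.h>
#include <stdint.h>

/* Copies of the two constants above, renamed to mark them as local to this sketch. */
#define MINI_INDIRECT_SYMBOL_LOCAL 0x80000000u
#define MINI_INDIRECT_SYMBOL_ABS   0x40000000u

/* True when the entry is one of the special markers and must not be
 * used as an index into the symbol table. */
bool mini_is_special_entry(uint32_t entry) {
    return entry == MINI_INDIRECT_SYMBOL_ABS ||
           entry == MINI_INDIRECT_SYMBOL_LOCAL ||
           entry == (MINI_INDIRECT_SYMBOL_LOCAL | MINI_INDIRECT_SYMBOL_ABS);
}
```

An ordinary entry such as 5 passes through and is used as a symbol table index; the three marker values are skipped.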

perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab);

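In the previous step, perform_rebinding_with_section(rebindings, sect, slide, symtab, strtab, indirect_symtab); was called for both the lazy and the non-lazy symbol pointer sections to carry out the rebinding. Here sect is one of the two sections nl_symbol_ptr and la_symbol_ptr.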

static void perform_rebinding_with_section(struct rebindings_entry *rebindings,
                                           section_t *section,
                                           intptr_t slide,
                                           nlist_t *symtab,
                                           char *strtab,
                                           uint32_t *indirect_symtab) {
  /// indirect_symtab + reserved1 is the start of this section's entries in the indirect symbol table: an array of uint32_t symbol table indexes.
  /// For symbol pointer sections, <mach-o/loader.h> defines reserved1 as the index into the indirect symbol table.
  uint32_t *indirect_symbol_indices = indirect_symtab + section->reserved1;
  /// section->addr + slide is the actual address of the section (for example la_symbol_ptr) in the mapped image.
  /// slide is again the ASLR offset of this image (Mach-O file) in virtual memory.
  /// Viewed as void **, the section is an array of pointers; element i holds the address bound to the i-th symbol.
  void **indirect_symbol_bindings = (void **)((uintptr_t)slide + section->addr);
  /// Walk the section one pointer at a time (4 or 8 bytes, depending on the CPU word size)
  for (uint i = 0; i < section->size / sizeof(void *); i++) {
    /// Read this slot's symbol table index from the indirect symbol table
    uint32_t symtab_index = indirect_symbol_indices[i];
    /// Skip the special entries
    if (symtab_index == INDIRECT_SYMBOL_ABS || symtab_index == INDIRECT_SYMBOL_LOCAL ||
        symtab_index == (INDIRECT_SYMBOL_LOCAL   | INDIRECT_SYMBOL_ABS)) {
      continue;
    }
    
    /// n_un.n_strx is the symbol's index into the string table; it can be treated as a byte offset
    uint32_t strtab_offset = symtab[symtab_index].n_un.n_strx;
    /// strtab + offset is the symbol name
    char *symbol_name = strtab + strtab_offset;
    struct rebindings_entry *cur = rebindings;
    while (cur) {
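      /// Check every element of this node's rebindings
      for (uint j = 0; j < cur->rebindings_nel; j++) {
        /// Compare the symbol name with the requested name. &symbol_name[1] skips the first character,
        /// which for C symbols is the leading underscore added by the toolchain.
        if (strlen(symbol_name) > 1 &&
            strcmp(&symbol_name[1], cur->rebindings[j].name) == 0) {
          if (cur->rebindings[j].replaced != NULL &&
              indirect_symbol_bindings[i] != cur->rebindings[j].replacement) {
            /// Save the original implementation first
            *(cur->rebindings[j].replaced) = indirect_symbol_bindings[i];
          }
          /// Install the replacement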
          indirect_symbol_bindings[i] = cur->rebindings[j].replacement;
          goto symbol_loop;
        }
      }
      cur = cur->next;
    }
  symbol_loop:;
  }
}

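A few questions about fishhook remain open for me:

  1. Why is &symbol_name[1] the string being compared?
  2. My understanding of indirect_symbol_bindings is not yet thorough.

Hooking objc_msgSend

Hooking objc_msgSend is another, deeper topic that involves a fair amount of assembly, and I still have too many open questions about it. For now, here are some references: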
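
On the first question, a likely explanation is that C symbols are stored in the Mach-O symbol table with a leading underscore (NSLog appears as _NSLog), while callers pass the plain C name to rebind_symbols; skipping the first character lines the two up. The function below isolates that comparison. symbol_matches is a hypothetical helper name, not part of fishhook.

```c
#include <stdbool.h>
#include <string.h>

/* Same test as in perform_rebinding_with_section: the symbol name must
 * be longer than one character, and everything after its first
 * character must equal the requested name. */
bool symbol_matches(const char *symbol_name, const char *requested) {
    return strlen(symbol_name) > 1 &&
           strcmp(&symbol_name[1], requested) == 0;
}
```

With this rule "_NSLog" matches the request "NSLog", but a request spelled "_NSLog" does not match, which is why names are passed to rebind_symbols without the underscore.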

  1. objc_msgSend_hook
  2. MTHawkeye
  3. amd64-and-va_arg
  4. IHI0055B_aapcs64
  5. ARM64FunctionCallingConventions

References

  1. Aspects
  2. fishhook
  3. Hook 原理之 fishhook 源码解析
  4. 巧用符号表 - 探求 fishhook 原理(一)
  5. 验证试验 - 探求 fishhook 原理(二)