iOS Class Structure: An Analysis of cache_t

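Preface

This article analyzes the X86_64 architecture. Some details differ on other architectures, so check them against the architecture you are actually running on.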

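1. Understanding cache_t

An Objective-C method call looks up an IMP (the pointer to the method's implementation) by its SEL (the selector). If every call had to walk through all of a class's methods, dispatch would be slow for classes with many methods. The cache_t struct exists to make this lookup faster. It holds a pointer to the first element of a hash table of bucket_t entries, each of which pairs the SEL and IMP of a method that has already been called. Only the start address is stored (as a uintptr_t integer), which saves memory and keeps the class from growing as more methods are cached; individual buckets are then reached by offsetting from that address.

A simplified diagram of the cache_t structure: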

cache_t结构.001.jpeg

1.1 Partial source of the cache_t struct

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;            // 8
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;              // 4 
#if __LP64__
            uint16_t                   _flags;                  // 2
#endif
            uint16_t                   _occupied;               // 2
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache; // 8
    };
    
    // True if this is the empty cache; checked on the first insert
    bool isConstantEmptyCache() const;
    bool canBeFreed() const;
    // The mask, which is capacity - 1
    mask_t mask() const;
    
    // _occupied++
    void incrementOccupied();
    // Set buckets and mask
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    // Allocate new memory
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    // Reclaim oldBuckets, given oldCapacity
    void collect_free(bucket_t *oldBuckets, mask_t oldCapacity);
    
public:
    // Total allocated capacity
    unsigned capacity() const;
    // Get buckets
    struct bucket_t *buckets() const;
    // Get the class
    Class cls() const;
    // Get the number of cached entries
    mask_t occupied() const;
    // Insert a called method into the memory that buckets points to
    void insert(SEL sel, IMP imp, id receiver);
    
    // Much of the code is omitted for brevity
}

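1.2 Selected cache_t member variables

  • _bucketsAndMaybeMask: what it holds depends on the architecture. On X86_64 it holds buckets; on arm64 the high 16 bits hold the mask and the low 48 bits hold buckets.
  • _maybeMask: the mask of the current cache, that is capacity - 1. Not used on arm64.
  • _occupied: the number of methods currently cached.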
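The arm64 layout described above (mask in the high 16 bits, buckets in the low 48) can be sketched with plain bit operations. This is a simplified illustration: the constant and function names are hypothetical, and the real runtime's shift and mask constants may differ by platform.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical constants for illustration: a 16-bit mask in the high bits and
// a buckets address in the low 48 bits, following the description in the text.
constexpr unsigned kMaskShift = 48;
constexpr std::uint64_t kBucketsBits = (std::uint64_t{1} << kMaskShift) - 1;

// Pack a buckets address and a mask into one 64-bit word.
constexpr std::uint64_t packBucketsAndMask(std::uint64_t buckets, std::uint16_t mask) {
    return (static_cast<std::uint64_t>(mask) << kMaskShift) | (buckets & kBucketsBits);
}

// Recover each half from the packed word.
constexpr std::uint64_t bucketsOf(std::uint64_t packed) { return packed & kBucketsBits; }
constexpr std::uint16_t maskOf(std::uint64_t packed) {
    return static_cast<std::uint16_t>(packed >> kMaskShift);
}

// Usage: the address and mask are sample values chosen for the illustration.
void example() {
    const std::uint64_t packed = packBucketsAndMask(0x000000010034f380ULL, 7);
    assert(packed == 0x000700010034f380ULL);
    assert(bucketsOf(packed) == 0x000000010034f380ULL);
    assert(maskOf(packed) == 7);
}
```

Keeping both values in one word lets a reader fetch them with a single load, which is presumably why the packed form is used where the address space leaves the high bits free.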

Verifying _bucketsAndMaybeMask on X86_64:

(lldb) p/x XJPerson.class
(Class) $0 = 0x0000000100008308 XJPerson
(lldb) p/x 0x0000000100008308 + 0x10
(long) $1 = 0x0000000100008318
(lldb) p (cache_t *)0x0000000100008318
(cache_t *) $2 = 0x0000000100008318
(lldb) p *$2
(cache_t) $3 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4298437504
    }
  }
   = {
     = {
      _maybeMask = {
        std::__1::atomic<unsigned int> = {
          Value = 0
        }
      }
      _flags = 32784
      _occupied = 0
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0000801000000000
      }
    }
  }
}
(lldb) p $3.buckets()
(bucket_t *) $4 = 0x000000010034f380
// Print _bucketsAndMaybeMask in hexadecimal
// to recover the buckets address
(lldb) p/x 4298437504                
(long) $5 = 0x000000010034f380       

// $4 == $5: _bucketsAndMaybeMask holds the address of the first bucket

Illustration: image.png

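1.3 Selected key cache_t functions

1.3.1 The cache_t::insert function

  1. Get the number of methods already cached (0 the first time) and add 1.
  2. Get the cache capacity, which is 0 the first time.
  3. Check whether this is the first method to be cached. If so, allocate capacity * sizeof(bucket_t) bytes, where capacity is 1 << INIT_CACHE_SIZE_LOG2 (INIT_CACHE_SIZE_LOG2 is 2 on X86_64 and 1 on arm64), store the bucket_t * start address in _bucketsAndMaybeMask, store the mask newCapacity - 1 in _maybeMask, and set _occupied to 0.
  4. Otherwise, check whether expansion is needed, that is, whether the new entry would push the cache past 3/4 or 7/8 of its capacity. If so, double the capacity (capped at the maximum), allocate new memory as in step 3, and reclaim the old cache memory.
  5. Hash the selector to find its slot, then use the do {} while () loop to check whether that slot is free. On a hash collision, keep moving to the next slot until a free one is found. If the whole table has been scanned without finding one, call bad_cache.
// Parts of the code are omitted for brevity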
void cache_t::insert(SEL sel, IMP imp, id receiver)
{
    // Number of methods already cached (0 the first time), plus 1
    mask_t newOccupied = occupied() + 1;
    // Cache capacity, 0 the first time
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    // Is this the first method to be cached?
    if (slowpath(isConstantEmptyCache())) {
        // Cache is read-only. Replace it.
        // INIT_CACHE_SIZE = 1 << INIT_CACHE_SIZE_LOG2,
        // INIT_CACHE_SIZE_LOG2 differs by architecture:
        // 2 on X86_64, 1 on arm64
        if (!capacity) capacity = INIT_CACHE_SIZE;
        // Allocate capacity * sizeof(bucket_t) bytes,
        // store the `bucket_t *` start address in `_bucketsAndMaybeMask`,
        // store the `mask` `newCapacity - 1` in `_maybeMask`,
        // and set _occupied to 0
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    // Not past 3/4 or 7/8 of the capacity (depending on architecture): use as-is
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
    }
#endif
    else {
        // Past 3/4 or 7/8 of the capacity: double it
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        // MAX_CACHE_SIZE = 1 << MAX_CACHE_SIZE_LOG2, MAX_CACHE_SIZE_LOG2 = 16
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        // Allocate capacity * sizeof(bucket_t) bytes,
        // store the `bucket_t *` start address in `_bucketsAndMaybeMask`,
        // store the `mask` `newCapacity - 1` in `_maybeMask`,
        // set _occupied to 0, and reclaim the old memory
        reallocate(oldCapacity, capacity, true);
    }
    
    // Get the buckets pointer
    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    // Hash the selector to get the starting slot
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) { // slot i is empty, so insert here
            // _occupied++: one more cached method
            incrementOccupied();
            // Set the sel, imp and cls of the bucket_t
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin)); // collision: try the next slot

    bad_cache(receiver, (SEL)sel);
}
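The capacity decision in cache_t::insert can be isolated into a small pure function. The sketch below uses the X86_64 values given in this article (initial capacity 4, end marker 1, 3/4 fill ratio, maximum 1 << 16). The names are hypothetical, the first branch tests capacity == 0 instead of isConstantEmptyCache(), and the CACHE_ALLOW_FULL_UTILIZATION branch is left out, so treat it as a simplified model and not the runtime's code.

```cpp
#include <cassert>
#include <cstdint>

// Values as stated in the article for X86_64.
constexpr std::uint32_t kInitCacheSize = 4;        // 1 << INIT_CACHE_SIZE_LOG2
constexpr std::uint32_t kMaxCacheSize  = 1u << 16; // 1 << MAX_CACHE_SIZE_LOG2
constexpr std::uint32_t kEndMarker     = 1;        // CACHE_END_MARKER

constexpr std::uint32_t fillRatio(std::uint32_t capacity) { return capacity * 3 / 4; }

struct Decision {
    std::uint32_t capacity; // capacity to use for this insert
    bool reallocated;       // true if a new, empty bucket array is allocated
};

// Hypothetical helper: given the current occupied count and capacity,
// decide what capacity the next insert will run with.
constexpr Decision decide(std::uint32_t occupied, std::uint32_t capacity) {
    const std::uint32_t newOccupied = occupied + 1;
    if (capacity == 0) return {kInitCacheSize, true};                              // first insert
    if (newOccupied + kEndMarker <= fillRatio(capacity)) return {capacity, false}; // still fits
    const std::uint32_t doubled = capacity * 2;
    return {doubled > kMaxCacheSize ? kMaxCacheSize : doubled, true};              // grow, capped
}

void example() {
    assert(decide(0, 0).capacity == 4 && decide(0, 0).reallocated);   // first allocation
    assert(decide(1, 4).capacity == 4 && !decide(1, 4).reallocated);  // 2 + 1 <= 3
    assert(decide(2, 4).capacity == 8 && decide(2, 4).reallocated);   // 3 + 1 > 3
    assert(decide(4, 8).capacity == 8 && !decide(4, 8).reallocated);  // 5 + 1 <= 6
}
```

With a capacity of 4 only two real entries fit before the third insert triggers growth, because the end marker counts toward the limit.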

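1.3.2 The cache_fill_ratio function

The load factor: the fraction of the bucket_t memory that may be occupied. The policy depends on the architecture: capacity * 7 / 8 on arm64 and capacity * 3 / 4 on X86_64. When usage would exceed this ratio, the cache is expanded to twice its capacity.

  • A 3/4 ratio is a good balance between space utilization and avoiding hash collisions.

  • A 7/8 ratio uses more of the cache and so reduces fragmentation and wasted space, at the cost of more hash collisions.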

// X86_64
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 3 / 4;
}
// arm64
// a ratio of 87.5%
static inline mask_t cache_fill_ratio(mask_t capacity) {
    return capacity * 7 / 8;
}
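A few worked values make the two ratios concrete. The sketch below restates both formulas under hypothetical names and adds a helper for the largest occupied count that still passes the check newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity). It ignores the CACHE_ALLOW_FULL_UTILIZATION branch, which can allow more entries in small caches.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint32_t fillRatioX86_64(std::uint32_t capacity) { return capacity * 3 / 4; }
constexpr std::uint32_t fillRatioArm64(std::uint32_t capacity)  { return capacity * 7 / 8; }

// Hypothetical helper: largest number of cached methods that fits without growth,
// assuming ratio >= endMarker.
constexpr std::uint32_t maxEntries(std::uint32_t ratio, std::uint32_t endMarker) {
    return ratio - endMarker;
}

void example() {
    // X86_64, end marker = 1
    assert(fillRatioX86_64(4) == 3 && maxEntries(fillRatioX86_64(4), 1) == 2);
    assert(fillRatioX86_64(8) == 6 && maxEntries(fillRatioX86_64(8), 1) == 5);
    // arm64, end marker = 0 (integer division: 2 * 7 / 8 == 1)
    assert(fillRatioArm64(2) == 1 && maxEntries(fillRatioArm64(2), 0) == 1);
    assert(fillRatioArm64(8) == 7 && maxEntries(fillRatioArm64(8), 0) == 7);
}
```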

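1.3.3 The reallocate function

Allocates a new block of newCapacity * sizeof(bucket_t) bytes, stores the bucket_t * start address in _bucketsAndMaybeMask, and stores the mask newCapacity - 1 in _maybeMask. freeOld says whether the old memory should be reclaimed: it is false on the first insert and true on later expansions, in which case collect_free is called to reclaim the old buckets.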

ALWAYS_INLINE
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld)
{
    bucket_t *oldBuckets = buckets();
    bucket_t *newBuckets = allocateBuckets(newCapacity);

    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this

    ASSERT(newCapacity > 0);
    ASSERT((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);

    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    if (freeOld) {
        collect_free(oldBuckets, oldCapacity);
    }
}

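1.3.4 The allocateBuckets function

Allocates newCapacity * sizeof(bucket_t) bytes and returns the new bucket_t pointer. When CACHE_END_MARKER is 1 (on __arm__, __x86_64__ and __i386__), an end marker is stored: the bucket at index capacity - 1 gets a sel of 1, an imp of either newBuckets (the start of the cache) or newBuckets - 1 depending on the architecture, and a class of nil.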

#if CACHE_END_MARKER

bucket_t *cache_t::endMarker(struct bucket_t *b, uint32_t cap)
{
    return (bucket_t *)((uintptr_t)b + bytesForCapacity(cap)) - 1;
}

bucket_t *cache_t::allocateBuckets(mask_t newCapacity)
{
    // Allocate one extra bucket to mark the end of the list.
    // This can't overflow mask_t because newCapacity is a power of 2.
    bucket_t *newBuckets = (bucket_t *)calloc(bytesForCapacity(newCapacity), 1);

    bucket_t *end = endMarker(newBuckets, newCapacity);

#if __arm__
    // End marker's sel is 1 and imp points BEFORE the first bucket.
    // This saves an instruction in objc_msgSend.
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)(newBuckets - 1), nil);
#else
    // End marker's sel is 1 and imp points to the first bucket.
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil);
#endif
    
    if (PrintCaches) recordNewCache(newCapacity);

    return newBuckets;
}

#else 
// An M1 iMac (arm64) takes this branch. There the initial capacity is 2, the fill ratio is 7/8, and CACHE_END_MARKER is 0, so no endMarker is set.
bucket_t *cache_t::allocateBuckets(mask_t newCapacity)
{
    if (PrintCaches) recordNewCache(newCapacity);

    return (bucket_t *)calloc(bytesForCapacity(newCapacity), 1);
}
#endif
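The end marker placement can be reproduced outside the runtime. The following is a minimal sketch of the non-__arm__ branch, using a hypothetical two-field Bucket in place of bucket_t and plain assignments in place of bucket_t::set.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdlib>

// Hypothetical stand-in for bucket_t: two pointer-sized fields.
struct Bucket {
    std::uintptr_t sel;
    std::uintptr_t imp;
};

std::size_t bytesForCapacity(std::uint32_t cap) { return sizeof(Bucket) * cap; }

// Address of the last bucket in an array of `cap` buckets.
Bucket *endMarker(Bucket *b, std::uint32_t cap) {
    return reinterpret_cast<Bucket *>(reinterpret_cast<std::uintptr_t>(b) + bytesForCapacity(cap)) - 1;
}

// Zero-filled array whose last slot has sel = 1 and imp = address of the first bucket.
// newCapacity must be at least 1. The caller releases the array with std::free.
Bucket *allocateBuckets(std::uint32_t newCapacity) {
    Bucket *newBuckets = static_cast<Bucket *>(std::calloc(bytesForCapacity(newCapacity), 1));
    if (!newBuckets) return nullptr;
    Bucket *end = endMarker(newBuckets, newCapacity);
    end->sel = 1;
    end->imp = reinterpret_cast<std::uintptr_t>(newBuckets);
    return newBuckets;
}

void example() {
    Bucket *b = allocateBuckets(4);
    assert(b != nullptr);
    assert(endMarker(b, 4) == &b[3]);                       // marker sits at index capacity - 1
    assert(b[3].sel == 1);
    assert(b[3].imp == reinterpret_cast<std::uintptr_t>(b));
    assert(b[0].sel == 0 && b[0].imp == 0);                 // other slots start empty
    std::free(b);
}
```

Because the marker occupies a real slot, a cache with capacity 4 has only 3 slots available for methods.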

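Notes on the expansion strategy:

Why are the old buckets discarded, instead of growing the allocation and appending new entries after the existing ones?

Answer: an allocated block cannot simply be extended in place, so the "expansion" here is really a replacement: a new, larger block is allocated and takes the place of the old one. The old entries are not carried over, for three reasons. First, moving every entry from the old buckets into the new ones costs time, since each one would have to be rehashed against the new mask. Second, Apple's caching policy favors recent calls. Suppose method A was called once and is very unlikely to be called again: keeping it across an expansion is pointless, and if A is called again it is simply cached again until the next expansion. Third, it keeps the number of cached methods from growing without bound, which would slow lookups down.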

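1.3.5 The setBucketsAndMask function

  • Stores the bucket_t * start address in _bucketsAndMaybeMask.
  • Stores the mask newCapacity - 1 in _maybeMask.
  • Sets _occupied to 0, because the buckets have only just been installed and no method is cached yet.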
void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds).
    // It is unsafe for objc_msgSend to see old buckets and new mask.
    // Therefore we write new buckets, wait a lot, then write new mask.
    // objc_msgSend reads mask first, then buckets.

#ifdef __arm__
    // ensure other threads see buckets contents before buckets pointer
    mega_barrier();

    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_relaxed);

    // ensure other threads see new buckets before new mask
    mega_barrier();

    _maybeMask.store(newMask, memory_order_relaxed);
    _occupied = 0;
#elif __x86_64__ || i386
    // ensure other threads see buckets contents before buckets pointer
    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_release);

    // ensure other threads see new buckets before new mask
    _maybeMask.store(newMask, memory_order_release);
    _occupied = 0;
#else
#error Don't know how to do setBucketsAndMask on this architecture.
#endif
}
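The ordering rule in the source comments (write new buckets, then new mask; read mask first, then buckets) can be expressed with standard C++ atomics. This is only a sketch of the idea with hypothetical types. It is not how objc_msgSend is written, which is hand-written assembly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct Bucket {
    std::uintptr_t sel;
    std::uintptr_t imp;
};

struct Snapshot {
    Bucket *buckets;
    std::uint32_t mask;
};

// Hypothetical cache header with separate buckets and mask fields (the X86_64 shape).
struct PublishedCache {
    std::atomic<Bucket *> buckets{nullptr};
    std::atomic<std::uint32_t> mask{0};

    // Writer: publish the buckets pointer first, then the mask.
    void publish(Bucket *newBuckets, std::uint32_t newMask) {
        buckets.store(newBuckets, std::memory_order_release);
        mask.store(newMask, std::memory_order_release);
    }

    // Reader: load the mask first, then the buckets. A reader that observes the
    // new mask is therefore guaranteed to observe the new buckets as well.
    Snapshot read() const {
        const std::uint32_t m = mask.load(std::memory_order_acquire);
        Bucket *b = buckets.load(std::memory_order_acquire);
        return {b, m};
    }
};

void example() {
    static Bucket storage[4] = {};
    PublishedCache cache;
    cache.publish(storage, 3);
    const Snapshot s = cache.read();
    assert(s.buckets == storage);
    assert(s.mask == 3);
}
```

The combination a reader can still see is an old mask with new buckets, which the source comment describes as safe: it produces a cache miss, not an out-of-bounds read.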

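1.3.6 The collect_free function

Queues the old bucket memory passed in on the garbage list (garbage_refs) and calls collectNolock to reclaim it.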

void cache_t::collect_free(bucket_t *data, mask_t capacity)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    runtimeLock.assertLocked();
#endif

    if (PrintCaches) recordDeadCache(capacity);

    _garbage_make_room ();
    garbage_byte_size += cache_t::bytesForCapacity(capacity);
    garbage_refs[garbage_count++] = data;
    cache_t::collectNolock(false);
}

1.3.7 The cache_hash function

The hash function: computes the slot at which a method is inserted.

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    uintptr_t value = (uintptr_t)sel;
#if CONFIG_USE_PREOPT_CACHES
    value ^= value >> 7;
#endif
    return (mask_t)(value & mask);
}
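Since the mask is capacity - 1 and the capacity is a power of two, value & mask keeps only the low bits of the selector address. A minimal sketch, with the preopt step controlled by a hypothetical parameter instead of CONFIG_USE_PREOPT_CACHES, and a made-up selector address:

```cpp
#include <cassert>
#include <cstdint>

// Same arithmetic as cache_hash; `preopt` stands in for CONFIG_USE_PREOPT_CACHES.
constexpr std::uint32_t cacheHash(std::uint64_t sel, std::uint32_t mask, bool preopt = false) {
    std::uint64_t value = sel;
    if (preopt) value ^= value >> 7;
    return static_cast<std::uint32_t>(value & mask);
}

void example() {
    const std::uint64_t sel = 0x1000045abULL; // hypothetical selector address
    assert(cacheHash(sel, 3) == 3);           // capacity 4: low 2 bits of 0x...ab
    assert(cacheHash(sel, 7) == 3);           // capacity 8: low 3 bits of 0x...ab
    assert(cacheHash(sel, 7, true) == 0);     // with the extra >> 7 mixing step
}
```

The same selector can land in a different slot after the capacity changes, which is one reason entries cannot simply be copied across an expansion.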

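1.3.8 The cache_next function

The probing function: after a hash collision it computes the next slot to try. With CACHE_END_MARKER it steps forward by one, and on arm64 it steps backward by one, wrapping around in both cases.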

#if CACHE_END_MARKER
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return (i+1) & mask;
}
#elif __arm64__
static inline mask_t cache_next(mask_t i, mask_t mask) {
    return i ? i-1 : mask;
}
#else
#error unexpected configuration
#endif
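Both variants visit every slot exactly once before returning to the starting point. The sketch below lists the probe order for each, using a hypothetical probeOrder helper that mirrors the do {} while () loop in cache_t::insert.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

constexpr std::uint32_t nextForward(std::uint32_t i, std::uint32_t mask)  { return (i + 1) & mask; }
constexpr std::uint32_t nextBackward(std::uint32_t i, std::uint32_t mask) { return i ? i - 1 : mask; }

// Hypothetical helper: slots visited when starting at `begin`, until the probe wraps around.
template <typename Next>
std::vector<std::uint32_t> probeOrder(std::uint32_t begin, std::uint32_t mask, Next next) {
    std::vector<std::uint32_t> order;
    std::uint32_t i = begin;
    do {
        order.push_back(i);
    } while ((i = next(i, mask)) != begin);
    return order;
}

void example() {
    // capacity 4, mask 3, starting at slot 2
    assert((probeOrder(2, 3, nextForward)  == std::vector<std::uint32_t>{2, 3, 0, 1}));
    assert((probeOrder(2, 3, nextBackward) == std::vector<std::uint32_t>{2, 1, 0, 3}));
}
```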

2. Understanding bucket_t

2.1 Partial source of the bucket_t struct

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__
    explicit_atomic<uintptr_t> _imp; // imp address, stored as a uintptr_t (unsigned long)
    explicit_atomic<SEL> _sel;       // sel
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    
    // imp encoding: (uintptr_t)newImp ^ (uintptr_t)cls
    // that is, the imp address XOR the class address, both taken as integers
    // the imp is stored in bucket_t as a uintptr_t (unsigned long)
    // Sign newImp, with &_imp, newSel, and cls as modifiers.
    uintptr_t encodeImp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, IMP newImp, UNUSED_WITHOUT_PTRAUTH SEL newSel, Class cls) const {
        if (!newImp) return 0;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        return (uintptr_t)
            ptrauth_auth_and_resign(newImp,
                                    ptrauth_key_function_pointer, 0,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, newSel, cls));
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        // imp address XOR class address, both taken as integers
        return (uintptr_t)newImp ^ (uintptr_t)cls;
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (uintptr_t)newImp;
#else
#error Unknown method cache IMP encoding.
#endif
    }
    
public:
    // Returns sel
    inline SEL sel() const { return _sel.load(memory_order_relaxed); }
    
    // imp decoding: (IMP)(imp ^ (uintptr_t)cls)
    // the stored value XOR the class address, then cast to IMP
    // Principle: c = a ^ b; a = c ^ b; -> b ^ a ^ b = a
    inline IMP imp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, Class cls) const {
        uintptr_t imp = _imp.load(memory_order_relaxed);
        if (!imp) return nil;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        SEL sel = _sel.load(memory_order_relaxed);
        return (IMP)
            ptrauth_auth_and_resign((const void *)imp,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, sel, cls),
                                    ptrauth_key_function_pointer, 0);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        // stored value XOR class address, then cast to IMP
        return (IMP)(imp ^ (uintptr_t)cls);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (IMP)imp;
#else
#error Unknown method cache IMP encoding.
#endif
    }
    
    // Declaration only; the implementation is covered in the function walkthrough below
    void set(bucket_t *base, SEL newSel, IMP newImp, Class cls);

// Parts of the code are omitted for brevity
}

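2.2 bucket_t member variables

  • _sel: the method's sel.
  • _imp: the address of the method's implementation, stored as an integer XORed with the class address. XOR it with the class address again and cast the result to IMP to recover the pointer.

2.3 Selected key bucket_t functions

2.3.1 The sel function

Returns the sel of the bucket_t.

// Returns sel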
    inline SEL sel() const { return _sel.load(memory_order_relaxed); }

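2.3.2 The encodeImp function

Encodes the IMP: the imp address XOR the class address, both taken as integers. The imp is stored in bucket_t as a uintptr_t (unsigned long).

// imp encoding: (uintptr_t)newImp ^ (uintptr_t)cls
    // that is, the imp address XOR the class address, both taken as integers
    // the imp is stored in bucket_t as a uintptr_t (unsigned long)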
    // Sign newImp, with &_imp, newSel, and cls as modifiers.
    uintptr_t encodeImp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, IMP newImp, UNUSED_WITHOUT_PTRAUTH SEL newSel, Class cls) const {
        if (!newImp) return 0;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        return (uintptr_t)
            ptrauth_auth_and_resign(newImp,
                                    ptrauth_key_function_pointer, 0,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, newSel, cls));
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        // imp address XOR class address, both taken as integers
        return (uintptr_t)newImp ^ (uintptr_t)cls;
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (uintptr_t)newImp;
#else
#error Unknown method cache IMP encoding.
#endif
    }

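2.3.3 The imp function

Decodes and returns the IMP: the stored value XOR the class address, cast to IMP. Together with the encoding function above it forms a symmetric encode/decode pair.

How it works: XORing a with b twice gives back a, that is: c = a ^ b; a = c ^ b; -> b ^ a ^ b = a

The class plays the role of the salt in this scheme. The class is used as the salt because these imps all belong to it.

// imp decoding: (IMP)(imp ^ (uintptr_t)cls)
    // the stored value XOR the class address, then cast to IMP
    // Principle: b ^ a ^ b = a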
    inline IMP imp(UNUSED_WITHOUT_PTRAUTH bucket_t *base, Class cls) const {
        uintptr_t imp = _imp.load(memory_order_relaxed);
        if (!imp) return nil;
#if CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_PTRAUTH
        SEL sel = _sel.load(memory_order_relaxed);
        return (IMP)
            ptrauth_auth_and_resign((const void *)imp,
                                    ptrauth_key_process_dependent_code,
                                    modifierForSEL(base, sel, cls),
                                    ptrauth_key_function_pointer, 0);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_ISA_XOR
        // stored value XOR class address, then cast to IMP
        return (IMP)(imp ^ (uintptr_t)cls);
#elif CACHE_IMP_ENCODING == CACHE_IMP_ENCODING_NONE
        return (IMP)imp;
#else
#error Unknown method cache IMP encoding.
#endif
    }

2.3.4 The bucket_t::set function

Sets the sel, imp and class of a bucket_t.

void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
    ASSERT(_sel.load(memory_order_relaxed) == 0 ||
           _sel.load(memory_order_relaxed) == newSel);

    // objc_msgSend uses sel and imp with no locks.
    // It is safe for objc_msgSend to see new imp but NULL sel
    // (It will get a cache miss but not dispatch to the wrong place.)
    // It is unsafe for objc_msgSend to see old imp and new sel.
    // Therefore we write new imp, wait a lot, then write new sel.
    
    uintptr_t newIMP = (impEncoding == Encoded
                        ? encodeImp(base, newImp, newSel, cls)
                        : (uintptr_t)newImp);

    if (atomicity == Atomic) {
        _imp.store(newIMP, memory_order_relaxed);
        
        if (_sel.load(memory_order_relaxed) != newSel) {
#ifdef __arm__
            mega_barrier();
            _sel.store(newSel, memory_order_relaxed);
#elif __x86_64__ || __i386__
            _sel.store(newSel, memory_order_release);
#else
#error Don't know how to do bucket_t::set on this architecture.
#endif
        }
    } else {
        _imp.store(newIMP, memory_order_relaxed);
        _sel.store(newSel, memory_order_relaxed);
    }
}

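2.4 Verifying imp encoding and decoding in lldb

  1. Add the relevant code after the bucket_t::set call inside the do {} while () loop of cache_t::insert.

image.png

  2. Set a breakpoint in the imp function and verify in lldb.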
(lldb) p imp
(uintptr_t) $0 = 48640
(lldb) p (IMP)(imp ^ (uintptr_t)cls)  // decode directly
(IMP) $1 = 0x0000000100003d20 (KCObjcBuild`-[XJPerson smileToLife])
// Decode step by step below, then encode again to verify
(lldb) p (uintptr_t)cls
(uintptr_t) $2 = 4295000864
(lldb) p 48640 ^ 4295000864
(long) $3 = 4294982944
(lldb) p/x 4294982944
(long) $4 = 0x0000000100003d20
(lldb) p (IMP)0x0000000100003d20
// Decoding succeeded
(IMP) $5 = 0x0000000100003d20 (KCObjcBuild`-[XJPerson smileToLife])
// Encode again to verify
(lldb) p 4294982944 ^ 4295000864
(long) $6 = 48640
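
The arithmetic in the session above can be replayed outside lldb. The sketch below models only the CACHE_IMP_ENCODING_ISA_XOR branch, with hypothetical function names and 64-bit integers in place of IMP and Class; the numbers are the ones printed by lldb.

```cpp
#include <cassert>
#include <cstdint>

// XOR encoding as in the ISA_XOR branch: a null imp stays 0.
constexpr std::uint64_t encodeImp(std::uint64_t imp, std::uint64_t cls) {
    return imp ? (imp ^ cls) : 0;
}

// Decoding is the same XOR applied again: (imp ^ cls) ^ cls == imp.
constexpr std::uint64_t decodeImp(std::uint64_t stored, std::uint64_t cls) {
    return stored ? (stored ^ cls) : 0;
}

void example() {
    const std::uint64_t cls    = 4295000864ULL; // (uintptr_t)cls from the session
    const std::uint64_t stored = 48640ULL;      // value of _imp in the bucket
    const std::uint64_t imp    = 4294982944ULL; // 0x0000000100003d20

    assert(decodeImp(stored, cls) == imp);
    assert(imp == 0x0000000100003d20ULL);
    assert(encodeImp(imp, cls) == stored);
}
```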

Illustration:

image.png

3. Verifying the cache_t structure dynamically in lldb

3.1 Sample code

@interface XJPerson : NSObject

- (void)loveEveryone;

- (void)smileToLife;

- (void)takeCareFamily;

@end

@implementation XJPerson

- (void)loveEveryone
{
    NSLog(@"%s", __func__);
}

- (void)smileToLife
{
    NSLog(@"%s", __func__);
}

- (void)takeCareFamily
{
    NSLog(@"%s", __func__);
}

@end

int main(int argc, const char * argv[]) {
    @autoreleasepool {

        XJPerson *p  = [XJPerson alloc];
        Class pClass = [XJPerson class];
        NSLog(@"%@",pClass);
        
    }
    return 0;
}

3.2 lldb debugging steps

(lldb) p/x XJPerson.class
(Class) $0 = 0x0000000100008308 XJPerson
(lldb) p/x 0x0000000100008308 + 0x10         // offset by 16 bytes to reach the cache_t (isa is 8 bytes, superclass is 8 bytes)
(long) $1 = 0x0000000100008318
(lldb) p (cache_t *)0x0000000100008318       // cast
(cache_t *) $2 = 0x0000000100008318          
(lldb) p *$2                                 // dereference and print the cache
(cache_t) $3 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4298437504
    }
  }
   = {
     = {
      _maybeMask = {                        // _maybeMask is 0: no method has been called yet, so no cache space has been allocated
        std::__1::atomic<unsigned int> = {
          Value = 0
        }
      }
      _flags = 32784
      _occupied = 0                        // _occupied: number of cached methods; none has been called yet
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0000801000000000
      }
    }
  }
}
(lldb) p [p loveEveryone]                  // call a method from lldb
2021-06-26 21:00:31.415209+0800 KCObjcBuild[2793:42005] -[XJPerson loveEveryone]
(lldb) p *$2                               // print the cache again
(cache_t) $4 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4302872208
    }
  }
   = {
     = {
      _maybeMask = {    // _maybeMask is 7 (capacity 8).
                        // On x86_64 the first allocation should have a capacity of 4,
                        // so this result is unexpected; the reason is analyzed later
        std::__1::atomic<unsigned int> = {
          Value = 7
        }
      }
      _flags = 32784
      _occupied = 1     // _occupied: number of cached methods
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0001801000000007
      }
    }
  }
}
(lldb) p [p takeCareFamily]             // call two more methods
2021-06-26 21:00:55.714319+0800 KCObjcBuild[2793:42005] -[XJPerson takeCareFamily]
(lldb) p [p smileToLife]
2021-06-26 21:01:16.249795+0800 KCObjcBuild[2793:42005] -[XJPerson smileToLife]
(lldb) p *$2                            // print the cache again
(cache_t) $5 = {
  _bucketsAndMaybeMask = {
    std::__1::atomic<unsigned long> = {
      Value = 4302872208
    }
  }
   = {
     = {
      _maybeMask = {    // _maybeMask is 7 (capacity 8).
                        // On x86_64 the first allocation should have a capacity of 4,
                        // so this result is unexpected; the reason is analyzed later
        std::__1::atomic<unsigned int> = {
          Value = 7
        }
      }
      _flags = 32784
      _occupied = 5     // _occupied: number of cached methods
                        // Only 3 methods were called, yet 5 are cached,
                        // so other, unknown methods may have been cached too
    }
    _originalPreoptCache = {
      std::__1::atomic<preopt_cache_t *> = {
        Value = 0x0005801000000007
      }
    }
  }
}

// Read the buckets by offsetting from the start address
// First bucket
(lldb) p $5.buckets()[0]
(bucket_t) $6 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 3359864
    }
  }
}
(lldb) p $6.sel()
(SEL) $7 = "respondsToSelector:"    // the first slot holds -[NSObject respondsToSelector:], which I did not call; lldb inserted it
(lldb) p $6.imp(nil, pClass)
(IMP) $8 = 0x000000010033c770 (libobjc.A.dylib`-[NSObject respondsToSelector:] at NSObject.mm:2307)

// Second bucket
(lldb) p $5.buckets()[1]
(bucket_t) $9 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48728
    }
  }
}
(lldb) p $9.sel()
(SEL) $10 = "takeCareFamily"    // the second slot holds -[XJPerson takeCareFamily]
(lldb) p $9.imp(nil, pClass)
(IMP) $11 = 0x0000000100003d50 (KCObjcBuild`-[XJPerson takeCareFamily])

// Third bucket
(lldb) p $5.buckets()[2]
(bucket_t) $12 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $12.sel()            // the third slot is empty: no method cached here yet
(SEL) $13 = <no value available>

// Fourth bucket
(lldb) p $5.buckets()[3]
(bucket_t) $14 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 48680
    }
  }
}
(lldb) p $14.sel()
(SEL) $15 = "smileToLife"    // the fourth slot holds -[XJPerson smileToLife]
(lldb) p $14.imp(nil, pClass)
(IMP) $16 = 0x0000000100003d20 (KCObjcBuild`-[XJPerson smileToLife])

// Fifth bucket
(lldb) p $5.buckets()[4]
(bucket_t) $17 = {
  _sel = {
    std::__1::atomic<objc_selector *> = (null) {
      Value = (null)
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 0
    }
  }
}
(lldb) p $17.sel()    // the fifth slot is empty: no method cached here yet
(SEL) $18 = <no value available>

// Sixth bucket
(lldb) p $5.buckets()[5]
(bucket_t) $19 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 3358936
    }
  }
}
(lldb) p $19.sel()
(SEL) $20 = "class"     // the sixth slot holds -[NSObject class]
(lldb) p $19.imp(nil, pClass)
(IMP) $21 = 0x000000010033c3d0 (libobjc.A.dylib`-[NSObject class] at NSObject.mm:2243)

// Seventh bucket
(lldb) p $5.buckets()[6]
(bucket_t) $22 = {
  _sel = {
    std::__1::atomic<objc_selector *> = "" {
      Value = ""
    }
  }
  _imp = {
    std::__1::atomic<unsigned long> = {
      Value = 49144
    }
  }
}
(lldb) p $22.sel()
(SEL) $23 = "loveEveryone"    // the seventh slot holds -[XJPerson loveEveryone]
(lldb) p $22.imp(nil, pClass)
(IMP) $24 = 0x0000000100003cf0 (KCObjcBuild`-[XJPerson loveEveryone])
(lldb) 

Illustration:

image.png image.png image.png image.png image.png

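3.3 Analysis of the lldb results

  • After one method call, _occupied is 1 and _maybeMask is 7, even though the first allocation actually has a capacity of 4. As noted in the allocateBuckets section, when CACHE_END_MARKER is 1 (as it is on __arm__, __x86_64__ and __i386__) an end marker is stored in the cache. In addition, before -[XJPerson loveEveryone] ran, lldb had inserted -[NSObject class] and -[NSObject respondsToSelector:]. When -[XJPerson loveEveryone] was about to be inserted, the cache was therefore judged to be past 3/4 of its capacity, so it was expanded first, the earlier entries were discarded, and -[XJPerson loveEveryone] was stored in the new cache.
  • After two more method calls, _occupied is 5 and _maybeMask is still 7. lldb again inserted -[NSObject class] and -[NSObject respondsToSelector:] ahead of our calls. Together with the two methods we called and the one already cached, that brings the cache to 5 methods.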
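
The sequence described in the two points above can be replayed with a small model. This is a sketch under stated assumptions: SimCache is hypothetical, it uses the X86_64 values from this article (initial capacity 4, end marker 1, 3/4 ratio), it ignores the maximum capacity, and it assumes lldb sends class and respondsToSelector: before each call, in that order. A selector that is already cached is treated as a hit and does not reach insert.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

struct SimCache {
    std::uint32_t capacity = 0;
    std::set<std::string> cached;

    void insert(const std::string &sel) {
        if (cached.count(sel)) return;                 // cache hit: nothing to insert
        const std::uint32_t newOccupied = static_cast<std::uint32_t>(cached.size()) + 1;
        if (capacity == 0) {
            capacity = 4;                              // first allocation
        } else if (newOccupied + 1 > capacity * 3 / 4) {
            capacity *= 2;                             // expand and drop the old entries
            cached.clear();
        }
        cached.insert(sel);
    }

    std::uint32_t occupied() const { return static_cast<std::uint32_t>(cached.size()); }
    std::uint32_t mask() const { return capacity ? capacity - 1 : 0; }
};

// Model of one `p [p someMethod]` call from lldb.
void callFromLldb(SimCache &cache, const std::string &sel) {
    cache.insert("class");
    cache.insert("respondsToSelector:");
    cache.insert(sel);
}

void example() {
    SimCache cache;
    callFromLldb(cache, "loveEveryone");
    assert(cache.occupied() == 1 && cache.mask() == 7);   // matches $4

    callFromLldb(cache, "takeCareFamily");
    callFromLldb(cache, "smileToLife");
    assert(cache.occupied() == 5 && cache.mask() == 7);   // matches $5
}
```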

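3.4 Verifying the methods lldb inserts automatically

Verification method 1:

  1. Add logging of sel, imp and receiver to the cache_t::insert function.

image.png

  2. Run the source and stop at a breakpoint right after the XJPerson instance p1 is created. The Xcode console will already contain a lot of output that we are not interested in, so clear it.

image.png

  3. Call a method on p1 from lldb with p [p1 smileToLife], then print p1 with p p1. Comparing the receiver in the logged output with the address of p1 shows that lldb really did insert the respondsToSelector: and class methods before our method call.

image.png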

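Summary:

  • Before calling the method the user typed, lldb may need to send other messages first. On the X86_64 setup tested here, the endMarker stored whenever a cache is allocated, plus the two methods lldb inserts ahead of ours, means that by the time our own method is inserted the runtime judges the fill ratio to be exceeded. It therefore expands the cache, stores the method we called (smileToLife) in the new cache, and reclaims the old one. That is why a single method call left _occupied at 1 and _maybeMask at 7.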

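Verification method 2:

  1. In cache_t::insert, before the expansion check, add code that prints the sel, imp and address of every bucket in the cache whenever one of our own methods is inserted.

image.png

  2. Run the source, stop at a breakpoint right after the XJPerson instance p1 is created, and call p [p1 smileToLife] from lldb. The console then lists every bucket in the cache: the class method, the respondsToSelector: method, and the endMarker, whose sel is 1, whose imp is the start address of the cache, and whose Class is nil.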

image.png

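4. Debugging cache_t by replicating the source

4.1 Advantages of replicating the source

Replicating the source structures mainly addresses three situations:

  • You have no source, or the source you downloaded cannot be run and debugged directly.
  • You are not comfortable with lldb, or find lldb debugging tedious and long-winded.
  • You only need small-scale sampling.

4.2 Steps:

  1. Define an xj_objc_class struct modeled on the source, adding the isa member that is implicit in the original.
  2. Define an xj_class_data_bits_t struct modeled on the source, keeping only the bits member.
  3. Define an xj_bucket_t struct modeled on the source.
  4. typedef uint32_t as mask_t.
  5. Define an xj_cache_t struct modeled on the source, keeping only the members that are needed and replacing the _bucketsAndMaybeMask member with an xj_bucket_t pointer.
  6. Create an instance of XJPerson, call its methods, and cast the XJPerson class to an xj_objc_class pointer.
  7. Print the number of cached methods and the cache mask.
  8. Use a for loop to print the sel and imp of each cached method.

4.3 Sample code: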

struct xj_bucket_t {
    SEL _sel;
    IMP _imp;
};

struct xj_class_data_bits_t {
    uintptr_t bits;
};

typedef uint32_t mask_t;
struct xj_cache_t {
    struct xj_bucket_t *_buckets;   // 8
    mask_t              _maybeMask; // 4
    uint16_t            _flags;     // 2
    uint16_t            _occupied;  // 2
};

struct xj_objc_class {
    Class isa;
    Class superclass;
    struct xj_cache_t cache;             // formerly cache pointer and vtable
    struct xj_class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};

int main(int argc, const char * argv[]) {
    @autoreleasepool {
        
        XJPerson *person = [XJPerson alloc];
        Class pClass = person.class;
        [person likeFood];
        [person enjoyLife];
        [person smileToLife];
        [person likeSwimming];
        [person loveEveryone];
        [person takeCareFamily];
        
        struct xj_objc_class *xj_class = (__bridge struct xj_objc_class *)(pClass);
        
        NSLog(@"occupied = %hu - mask = %u", xj_class->cache._occupied, xj_class->cache._maybeMask);
        
        for (mask_t i = 0; i < xj_class->cache._maybeMask; i++) {
            struct xj_bucket_t bucket = xj_class->cache._buckets[i];
            // _imp is printed as stored (the encoded value), and the "f" after %p is printed literally.
            NSLog(@"%@ - %pf", NSStringFromSelector(bucket._sel), bucket._imp);
        }
        
        NSLog(@"Hello, World!");
    }
    return 0;
}

************************ Output ************************

2021-06-27 16:16:14.561903+0800 cache_tAnalysis[2857:57028] -[XJPerson likeFood]
2021-06-27 16:16:14.562789+0800 cache_tAnalysis[2857:57028] -[XJPerson enjoyLife]
2021-06-27 16:16:14.562965+0800 cache_tAnalysis[2857:57028] -[XJPerson smileToLife]
2021-06-27 16:16:14.563129+0800 cache_tAnalysis[2857:57028] -[XJPerson likeSwimming]
2021-06-27 16:16:14.563242+0800 cache_tAnalysis[2857:57028] -[XJPerson loveEveryone]
2021-06-27 16:16:14.563362+0800 cache_tAnalysis[2857:57028] -[XJPerson takeCareFamily]
2021-06-27 16:16:14.563403+0800 cache_tAnalysis[2857:57028] occupied = 4 - mask = 7
2021-06-27 16:16:14.563495+0800 cache_tAnalysis[2857:57028] takeCareFamily - 0xba68f
2021-06-27 16:16:14.563553+0800 cache_tAnalysis[2857:57028] likeSwimming - 0xbdf8f
2021-06-27 16:16:14.563585+0800 cache_tAnalysis[2857:57028] (null) - 0x0f
2021-06-27 16:16:14.563620+0800 cache_tAnalysis[2857:57028] smileToLife - 0xbab8f
2021-06-27 16:16:14.563648+0800 cache_tAnalysis[2857:57028] (null) - 0x0f
2021-06-27 16:16:14.563673+0800 cache_tAnalysis[2857:57028] (null) - 0x0f
2021-06-27 16:16:14.563705+0800 cache_tAnalysis[2857:57028] loveEveryone - 0xba88f
2021-06-27 16:16:14.563732+0800 cache_tAnalysis[2857:57028] Hello, World!
Program ended with exit code: 0

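4.4 Analysis of the results:

  • Unlike lldb, this approach does not insert any extra methods automatically.
  • The runtime stores an endMarker when it allocates the cache, so only 3 of the 4 slots are left. Once likeFood and enjoyLife have been inserted, the cache has reached 3/4 of its capacity, so it is expanded before smileToLife is inserted and the earlier entries are discarded. That is why only 4 methods end up cached.

5. cache_t flowchart

cache_t结构分析流程图.jpg

6. The cache_t of a metaclass

The cache_t of a metaclass caches class methods. It works the same way, so it is not analyzed separately here.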