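The previous two posts, Exploring the Internals of Classes (Part 1) and (Part 2), covered isa, superclass, and bits in the class structure. This post explores cache.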

struct objc_class : objc_object {
    // omitted ...
    // Class ISA; // isa is inherited from objc_object, 8 bytes
    Class superclass; // 8 bytes
    cache_t cache;            // formerly cache pointer and vtable
    class_data_bits_t bits;   // class_rw_t * plus custom rr/alloc flags

    // omitted ...
    class_rw_t *data() const {
        return bits.data();
    }
    // omitted ...
};
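
The byte counts in the comments above are what place cache 16 bytes (0x10) into the class. A minimal sketch, assuming a 64-bit target; `MockClass` and `MockCache` are hypothetical stand-ins, not the runtime's types, and only their sizes matter:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for cache_t: one word plus 4 + 2 + 2 bytes.
struct MockCache {
    uintptr_t bucketsAndMaybeMask;
    uint32_t  maybeMask;
    uint16_t  flags;
    uint16_t  occupied;
};

// Hypothetical stand-in for objc_class, keeping the member order of the excerpt.
struct MockClass {
    void     *isa;         // inherited from objc_object in the real runtime
    void     *superclass;
    MockCache cache;
    uintptr_t bits;
};

// Offset of cache from the start of the class, in bytes.
constexpr std::size_t cacheOffset() { return offsetof(MockClass, cache); }

// Offset of bits, which follows the cache.
constexpr std::size_t bitsOffset() { return offsetof(MockClass, bits); }
```

On a 64-bit build `cacheOffset()` is 16 and `bitsOffset()` is 32, which matches adding 0x10 to a class address to reach its cache in lldb.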

I. cache_t structure analysis

1.1 `cache_t` source code analysis

Source:

struct cache_t {
private:
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied;
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
    
    /*
    Definition of CACHE_MASK_STORAGE:
    
    #define CACHE_MASK_STORAGE_OUTLINED 1
    #define CACHE_MASK_STORAGE_HIGH_16 2
    #define CACHE_MASK_STORAGE_LOW_4 3
    #define CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS 4

    #if defined(__arm64__) && __LP64__

    // arm64 macOS or simulator
    #if TARGET_OS_OSX || TARGET_OS_SIMULATOR
    #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
    #else    
    // arm64 devices
    #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_HIGH_16
    #endif
    
    // arm64 with 32-bit pointers (arm64_32)
    #elif defined(__arm64__) && !__LP64__
    #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_LOW_4
    // everything else, e.g. x86_64 macOS and x86_64 simulators
    #else
    #define CACHE_MASK_STORAGE CACHE_MASK_STORAGE_OUTLINED
    #endif
    */
    
    // ... the per-architecture code here mainly defines the masks used to store mask and buckets
    // everything else, e.g. x86_64 macOS and x86_64 simulators
    #if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
       ...
    // arm64 macOS or simulator
    #elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16_BIG_ADDRS
       ...
    // arm64 devices
    #elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
       ...
    // arm64 with 32-bit pointers (arm64_32)
    #elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
        ...
    #endif
    
public:
    void incrementOccupied();
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    unsigned capacity() const;
    struct bucket_t *buckets() const;
    Class cls() const;
    void insert(SEL sel, IMP imp, id receiver);  
    ...
};

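_bucketsAndMaybeMask is a pointer-sized value (8 bytes) and works much like bits in isa_t: a single word that can carry more than one piece of information.
The union holds a struct and a pointer, _originalPreoptCache:
- the struct has three members: _maybeMask, _flags, and _occupied
- _originalPreoptCache is the original (preoptimized) cache; it is rarely involved when exploring a class's method cache
cache_t exposes public methods for reading these values, and uses different masks per architecture to store and extract mask and buckets.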
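
To make the layout concrete, here is a rough sketch of the same shape with plain types. It assumes a 64-bit target where mask_t is 32 bits wide; `MockCacheT` and `MockPreoptCache` are hypothetical names, and the atomics of the real struct are left out:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

struct MockPreoptCache;  // hypothetical opaque type, only used through a pointer

struct MockCacheT {
    uintptr_t bucketsAndMaybeMask;            // 8 bytes
    union {
        struct {
            uint32_t maybeMask;               // 4 bytes (mask_t assumed 32-bit)
            uint16_t flags;                   // 2 bytes
            uint16_t occupied;                // 2 bytes
        } fields;                             // named here to stay standard C++
        MockPreoptCache *originalPreoptCache; // 8 bytes, overlaps the struct
    } u;
};

// Total size of the sketch: one word plus an 8-byte union.
constexpr std::size_t mockCacheSize() { return sizeof(MockCacheT); }
```

Because the union members overlap, the three small fields and the pointer share the same 8 bytes, so the whole cache occupies 16 bytes in this sketch.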

cache_t exposes buckets(), similar to the methods() accessor reached through class_data_bits_t: both return the value through a method. Here is the bucket_t source:

struct bucket_t {
private:
    // IMP-first is better for arm64e ptrauth and no worse for arm64.
    // SEL-first is better for armv7* and i386 and x86_64.
#if __arm64__ // arm64
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif
    ...
    // methods omitted
};

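bucket_t distinguishes arm64 from other architectures, but the members are the same, _sel and _imp; only their order differs.
Since bucket_t stores _sel and _imp, what the cache holds should be methods.

1.2 `cache_t` structure diagram

1.3 Verifying with `lldb`

1.3.1 Define a custom `YJPerson` class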

// YJPerson.h
@interface YJPerson : NSObject 
- (void)say1;
- (void)say2;
- (void)say3;
- (void)say4;
- (void)say5;
- (void)say6;
- (void)say7;
- (void)say8;
- (void)say9;
@end

// YJPerson.m
@implementation YJPerson 
- (void)say1 {  NSLog(@"Called: %s", __func__); }
- (void)say2 {  NSLog(@"Called: %s", __func__); }
- (void)say3 {  NSLog(@"Called: %s", __func__); }
- (void)say4 {  NSLog(@"Called: %s", __func__); }
- (void)say5 {  NSLog(@"Called: %s", __func__); }
- (void)say6 {  NSLog(@"Called: %s", __func__); }
- (void)say7 {  NSLog(@"Called: %s", __func__); }
- (void)say8 {  NSLog(@"Called: %s", __func__); }
- (void)say9 {  NSLog(@"Called: %s", __func__); }
@end

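1.3.2 Debugging with `lldb`

The cache sits 16 bytes (0x10) past the start of the class, because isa and superclass take 8 bytes each.
buckets() in cache_t returns the start address of a block of memory, which is also the address of the first bucket.
Printing the remaining buckets with p/x $3.buckets()[index] shows the _sel and _imp of each one.
No instance method has been called on the YJPerson object yet, so buckets holds no cached methods.

1.3.3 Calling `[person say1]` and continuing in `lldb`:

After say1 is called, _maybeMask and _occupied change. Has the method been cached? Keep debugging with that question in mind:

Summary:

Called methods are cached in buckets as bucket_t entries, but unlike an array, the entries are not stored in call order. Why?
After say1 is called, _maybeMask and _occupied become 3 and 1. Why exactly 3 and 1?
Let's continue with these questions in mind.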

II. cache_t source code analysis

With the questions above in mind, find the entry point for caching a method:

// in the cache_t struct
void insert(SEL sel, IMP imp, id receiver);

2.1 The insert implementation

void cache_t::insert(SEL sel, IMP imp, id receiver)
{
    ...
    // Use the cache as-is if until we exceed our expected fill ratio.

    // occupied() returns the current _occupied; on the first insert it is 0, so newOccupied = 1
    mask_t newOccupied = occupied() + 1; 
    // On the first insert oldCapacity = 0 and capacity = 0
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    // First insert: the cache is still empty
    if (slowpath(isConstantEmptyCache())) { 
        // Cache is read-only. Replace it.
        // INIT_CACHE_SIZE = 1 << 2 = 4 in this configuration, so the initial capacity is 4
        if (!capacity) capacity = INIT_CACHE_SIZE; 
        // Allocate the buckets; there is nothing old to free
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    // (occupied() + 1) + CACHE_END_MARKER <= 3/4 of the capacity
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
        // Worked example with capacity = 4 and CACHE_END_MARKER = 1 (the 1st insert took the branch above):
        // 2nd insert: newOccupied = 2, 2 + 1 = 3 <= 3 holds, nothing to do
        // 3rd insert: newOccupied = 3, 3 + 1 = 4 <= 3 fails, fall through to the branches below
    }
    // (occupied() + 1) + CACHE_END_MARKER <= total capacity
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
        // Small caches may be filled completely, so keep using the current buckets
        // e.g. capacity = 4, newOccupied = 3: not within the fill ratio, but still within the total capacity
    }
#endif
    // Over the fill ratio and full utilization does not apply
    else {
        // Expand; taking the 3/4 fill ratio as the example:
        // the 3rd insert above (capacity = 4, newOccupied = 3, 3 + 1 = 4 <= 3 fails) ends up here
        // If capacity is non-zero, double it; otherwise use the initial value 4
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        // Upper bound: MAX_CACHE_SIZE = 1 << 16 = 65536
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        // Allocate new buckets; true means the old buckets are reclaimed
        reallocate(oldCapacity, capacity, true); 
    }

    // Address of the first bucket, i.e. the start address returned by buckets()
    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    // Hash function picks the starting slot
    mask_t begin = cache_hash(sel, m); 
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    // do-while: the body runs once before the condition is checked
    do {
        // The slot at the hashed index is empty (no bucket stored, sel() == 0)
        if (fastpath(b[i].sel() == 0)) { 
            // _occupied++: one more slot in use
            incrementOccupied(); 
            // Write sel and imp into the bucket
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        // The sel is already cached in bucket i: nothing to do
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
        
        // Collision: probe the next slot with cache_next(i, m)
        // With an end marker the next index is (i + 1) & mask:
        // 1. inside the mask range it moves to the following bucket
        // 2. past the mask range it wraps, e.g. cache_next(7, 7) ==> 8 & 7 ==> 1000 & 0111 ==> 0
        // The arm64 variant shown in 2.2.3 walks backwards instead (i ? i-1 : mask)
    } while (fastpath((i = cache_next(i, m)) != begin));

    // No free slot after a full loop: report a bad cache
    bad_cache(receiver, (SEL)sel);
#endif // !DEBUG_TASK_THREADS (the matching #if is in the elided part above)
}

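To summarize, insert does three things:

Works out the capacity buckets needs; buckets is a hash table
Allocates space with reallocate()
Stores the bucket: cache_hash() computes the index, and cache_next() computes the next index on a hash collision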
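
The capacity rules answer the earlier question about why _maybeMask and _occupied read 3 and 1 after the first call. The following is a simplified simulation, not the runtime's code: it assumes INIT_CACHE_SIZE = 4, CACHE_END_MARKER = 1, and a 3/4 fill ratio (the configuration used in the comments above), ignores the full-utilization branch, and only counts entries. `CacheSim` and `simulateInsert` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of the two numbers insert() maintains.
struct CacheSim {
    uint32_t capacity = 0;
    uint32_t occupied = 0;
    uint32_t mask() const { return capacity ? capacity - 1 : 0; }
};

constexpr uint32_t kInitCacheSize = 4;        // INIT_CACHE_SIZE (assumed 1 << 2)
constexpr uint32_t kMaxCacheSize  = 1u << 16; // MAX_CACHE_SIZE
constexpr uint32_t kEndMarker     = 1;        // CACHE_END_MARKER (assumed 1)

// 3/4 fill ratio, as in the comments of the excerpt.
constexpr uint32_t fillRatio(uint32_t capacity) { return capacity * 3 / 4; }

// Simulate caching one selector that is not in the cache yet.
inline void simulateInsert(CacheSim &c) {
    uint32_t newOccupied = c.occupied + 1;
    if (c.capacity == 0) {
        // Empty cache: allocate the initial buckets.
        c.capacity = kInitCacheSize;
        c.occupied = 0;
    } else if (newOccupied + kEndMarker <= fillRatio(c.capacity)) {
        // Still within the fill ratio: keep the current buckets.
    } else {
        // Over the fill ratio: double the capacity and drop the old entries.
        uint32_t grown = c.capacity * 2;
        c.capacity = grown > kMaxCacheSize ? kMaxCacheSize : grown;
        c.occupied = 0;
    }
    c.occupied++;  // the new entry itself
}
```

Under these assumptions the first insert leaves capacity 4, mask 3, occupied 1; the second leaves occupied 2; the third doubles the capacity to 8 and restarts at occupied 1, since old entries are not copied over.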

2.2 Functions used by insert

2.2.1 The `reallocate` function

ALWAYS_INLINE
void cache_t::reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld)
{
    // Start address of the old buckets
    bucket_t *oldBuckets = buckets();
    // Start address of the newly allocated buckets
    bucket_t *newBuckets = allocateBuckets(newCapacity);
    
    // Cache's old contents are not propagated. 
    // This is thought to save cache memory at the cost of extra cache fills.
    // fixme re-measure this
    ASSERT(newCapacity > 0);
    ASSERT((uintptr_t)(mask_t)(newCapacity-1) == newCapacity-1);
    // Store the new buckets and mask: buckets is the start address of newBuckets, mask is newCapacity - 1
    // _occupied is reset to 0 because the buckets are freshly allocated
    setBucketsAndMask(newBuckets, newCapacity - 1);
    
    // When freeOld is true, reclaim the old memory
    if (freeOld) {
        collect_free(oldBuckets, oldCapacity);
    }
}

reallocate does three things:

allocateBuckets allocates the memory
setBucketsAndMask stores the new mask and buckets
collect_free releases the old memory, but only when freeOld is true

2.2.1.1 The `allocateBuckets` function

size_t cache_t::bytesForCapacity(uint32_t cap)
{
    // size of one bucket_t times cap
    return sizeof(bucket_t) * cap;
}

#if CACHE_END_MARKER // configurations with an end marker, e.g. x86_64 macOS and simulators
bucket_t *cache_t::endMarker(struct bucket_t *b, uint32_t cap)
{
    // (start address + allocated bytes), as a bucket_t *, minus 1 is the address of the last bucket
    return (bucket_t *)((uintptr_t)b + bytesForCapacity(cap)) - 1;
}

bucket_t *cache_t::allocateBuckets(mask_t newCapacity)
{
    // Allocate one extra bucket to mark the end of the list.
    // This can't overflow mask_t because newCapacity is a power of 2.
    // Allocate newCapacity * sizeof(bucket_t) bytes, zero-filled
    bucket_t *newBuckets = (bucket_t *)calloc(bytesForCapacity(newCapacity), 1);
    // Address of the last bucket; this is the important part
    bucket_t *end = endMarker(newBuckets, newCapacity);

#if __arm__
    // End marker's sel is 1 and imp points BEFORE the first bucket.
    // This saves an instruction in objc_msgSend.
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)(newBuckets - 1), nil);
#else
    // End marker's sel is 1 and imp points to the first bucket.
    // The last slot gets sel = 1 and imp = address of the first bucket, so it is taken
    end->set<NotAtomic, Raw>(newBuckets, (SEL)(uintptr_t)1, (IMP)newBuckets, nil);
#endif

    // Record the new cache when cache logging is enabled
    if (PrintCaches) recordNewCache(newCapacity);
    return newBuckets;
}
// The #else variant (no end marker) is omitted here
#endif

allocateBuckets does two things:

calloc(bytesForCapacity(newCapacity), 1) allocates newCapacity * sizeof(bucket_t) bytes
end->set writes sel = 1 and imp = the address of the first bucket into the last slot of that memory
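
The two steps can be sketched with plain structs. This is an illustration under assumptions, not the runtime's allocator: `MockBucket` and `allocateWithEndMarker` are hypothetical, the fields are plain integers rather than atomics, and only the non-`__arm__` branch (imp points to the first bucket) is modeled.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical bucket: a selector value and an imp value, both as integers.
struct MockBucket {
    uintptr_t sel;
    uintptr_t imp;
};

// Allocate `capacity` zeroed buckets and turn the last one into an end marker.
// Returns nullptr if capacity is 0 or the allocation fails.
inline MockBucket *allocateWithEndMarker(uint32_t capacity) {
    if (capacity == 0) return nullptr;
    auto *buckets = static_cast<MockBucket *>(
        std::calloc(capacity, sizeof(MockBucket)));
    if (!buckets) return nullptr;

    MockBucket *end = buckets + (capacity - 1);       // last slot
    end->sel = 1;                                     // marker selector
    end->imp = reinterpret_cast<uintptr_t>(buckets);  // points back to slot 0
    return buckets;
}
```

With capacity 4 the marker takes slot 3, leaving three usable slots, which is why the fill-ratio check adds CACHE_END_MARKER to newOccupied.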

2.2.1.2 The `setBucketsAndMask` function

void cache_t::setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask)
{
    // objc_msgSend uses mask and buckets with no locks.
    // It is safe for objc_msgSend to see new buckets but old mask.
    // (It will get a cache miss but not overrun the buckets' bounds).
    // It is unsafe for objc_msgSend to see old buckets and new mask.
    // Therefore we write new buckets, wait a lot, then write new mask.
    // objc_msgSend reads mask first, then buckets.

#ifdef __arm__ // 32-bit ARM
    // ensure other threads see buckets contents before buckets pointer
    mega_barrier();  // memory barrier: fixes the order in which other threads see the writes
    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_relaxed);

    // ensure other threads see new buckets before new mask
    mega_barrier();

    _maybeMask.store(newMask, memory_order_relaxed);
    _occupied = 0;
#elif __x86_64__ || i386 // Intel macOS and simulators
    // ensure other threads see buckets contents before buckets pointer
    // Write _bucketsAndMaybeMask:
    // (uintptr_t)newBuckets is the start address of the memory buckets() points to (the first bucket)
    _bucketsAndMaybeMask.store((uintptr_t)newBuckets, memory_order_release);

    // ensure other threads see new buckets before new mask
    // Write _maybeMask:
    // newMask = capacity - 1, so it is at most MAX_CACHE_SIZE - 1
    _maybeMask.store(newMask, memory_order_release);
    _occupied = 0; // reset to 0
#else
#error Don't know how to do setBucketsAndMask on this architecture.
#endif
}

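setBucketsAndMask writes _bucketsAndMaybeMask and _maybeMask, choosing the memory ordering per architecture. The variant shown here keeps the mask in its own field, _maybeMask.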
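
The ordering described in the source comments (write buckets, then mask; read mask, then buckets) can be sketched with standard atomics. This is an illustration of the idea only, assuming release stores and acquire loads; `PublishedCache`, `publish`, and `slotFor` are hypothetical names, and the real reader is objc_msgSend, written in assembly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct MockBucket {
    uintptr_t sel;
    uintptr_t imp;
};

struct PublishedCache {
    std::atomic<MockBucket *> buckets{nullptr};
    std::atomic<uint32_t>     mask{0};

    // Writer: publish the new buckets first, then the new mask.
    // A reader that sees new buckets with the old (smaller) mask stays in bounds.
    void publish(MockBucket *newBuckets, uint32_t newMask) {
        buckets.store(newBuckets, std::memory_order_release);
        mask.store(newMask, std::memory_order_release);
    }

    // Reader: load the mask first, then the buckets, and index with hash & mask.
    MockBucket *slotFor(uintptr_t hash) const {
        uint32_t m = mask.load(std::memory_order_acquire);
        MockBucket *b = buckets.load(std::memory_order_acquire);
        return b ? &b[hash & m] : nullptr;
    }
};
```

The combination to avoid is old buckets with a new, larger mask, because the index could then land past the end of the old allocation.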

2.2.2 The `collect_free` function

void cache_t::collect_free(bucket_t *data, mask_t capacity)
{
#if CONFIG_USE_CACHE_LOCK
    cacheUpdateLock.assertLocked();
#else
    runtimeLock.assertLocked();    
#endif

    if (PrintCaches) recordDeadCache(capacity);
    
    _garbage_make_room ();
    garbage_byte_size += cache_t::bytesForCapacity(capacity);
    garbage_refs[garbage_count++] = data;
    cache_t::collectNolock(false);
}

collect_free adds the old buckets to a garbage list and then calls collectNolock to reclaim the memory.
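
The excerpt does not show collectNolock, so the sketch below only illustrates the general pattern of a deferred free list: old blocks are queued and released later in one pass. The threshold and the `readersActive` flag are assumptions made for illustration, and `GarbageList` is a hypothetical type, not the runtime's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Hypothetical deferred-free list for retired bucket arrays.
struct GarbageList {
    std::vector<void *> refs;   // queued blocks (compare garbage_refs)
    std::size_t byteSize = 0;   // queued bytes (compare garbage_byte_size)
    std::size_t threshold;      // assumed: collect only above this many bytes

    explicit GarbageList(std::size_t t) : threshold(t) {}

    // Queue a retired block instead of freeing it immediately.
    void add(void *block, std::size_t bytes) {
        refs.push_back(block);
        byteSize += bytes;
    }

    // Free everything queued, unless there is too little garbage or a reader
    // might still be using an old block. Returns the number of blocks freed.
    std::size_t collect(bool force, bool readersActive) {
        if (!force && byteSize < threshold) return 0;
        if (readersActive) return 0;
        std::size_t freed = refs.size();
        for (void *block : refs) std::free(block);
        refs.clear();
        byteSize = 0;
        return freed;
    }
};
```

Deferring the free matters because, as the source comments note, objc_msgSend reads the cache without locks, so an old bucket array may still be in use at the moment it is replaced.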

2.2.3 The `cache_hash` and `cache_next` functions

The initial hash function, cache_hash:

mask_t m = capacity - 1; // m = capacity - 1
mask_t begin = cache_hash(sel, m);  // hash sel with capacity - 1 as the mask
mask_t i = begin;

static inline mask_t cache_hash(SEL sel, mask_t mask) 
{
    // Convert sel to an unsigned pointer-sized integer
    uintptr_t value = (uintptr_t)sel; 
#if CONFIG_USE_PREOPT_CACHES // devices with preoptimized caches
    value ^= value >> 7;
#endif
    // AND with mask, so the result is in the range 0 ... mask
    return (mask_t)(value & mask); 
}

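The rehash function, cache_next:

// On a collision the next index is computed from the current index and capacity - 1: cache_next(i, m)
static inline mask_t cache_next(mask_t i, mask_t mask) {
    // This is the variant that walks backwards; with an end marker the source uses (i+1) & mask instead.
    // If i is not 0, return i-1; otherwise return mask (capacity - 1).
    // In other words: check whether the collision happened at the very first bucket.
    // If not, step back by one (i-1); if so, jump to the last index, capacity - 1 (mask),
    // and keep stepping back until begin is reached again, which means a full loop found no free slot: bad cache.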
    return i ? i-1 : mask;
}
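
Both probing directions can be compared side by side. A small sketch, assuming selectors are treated as plain integers and leaving out the `value ^= value >> 7` step; `cacheHash`, `nextForward`, `nextBackward`, and `probeOrder` are hypothetical names for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using mask_t = uint32_t;

// Starting slot: the low bits of the selector value selected by the mask.
inline mask_t cacheHash(uintptr_t sel, mask_t mask) {
    return static_cast<mask_t>(sel & mask);
}

// Forward probing with wrap-around: (i + 1) & mask.
inline mask_t nextForward(mask_t i, mask_t mask) { return (i + 1) & mask; }

// Backward probing with wrap-around: i ? i - 1 : mask.
inline mask_t nextBackward(mask_t i, mask_t mask) { return i ? i - 1 : mask; }

// Order in which slots are visited before the probe returns to its start.
inline std::vector<mask_t> probeOrder(uintptr_t sel, mask_t mask, bool backward) {
    std::vector<mask_t> order;
    mask_t begin = cacheHash(sel, mask);
    mask_t i = begin;
    do {
        order.push_back(i);
        i = backward ? nextBackward(i, mask) : nextForward(i, mask);
    } while (i != begin);
    return order;
}
```

For a selector value ending in binary 10 and mask 3, the forward order is 2, 3, 0, 1 and the backward order is 2, 1, 0, 3. Either way every slot is visited exactly once, so slot positions depend on the selector's bits rather than on call order.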

2.2.4 The cache write function `set`

template<Atomicity atomicity, IMPEncoding impEncoding>
void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
    ASSERT(_sel.load(memory_order_relaxed) == 0 ||
           _sel.load(memory_order_relaxed) == newSel);
    // objc_msgSend uses sel and imp with no locks.
    // It is safe for objc_msgSend to see new imp but NULL sel
    // (It will get a cache miss but not dispatch to the wrong place.)
    // It is unsafe for objc_msgSend to see old imp and new sel.
    // Therefore we write new imp, wait a lot, then write new sel.
    // Encode the original imp (an XOR with the class in the configuration discussed here) and convert it to uintptr_t
    uintptr_t newIMP = (impEncoding == Encoded
                        ? encodeImp(base, newImp, newSel, cls)
                        : (uintptr_t)newImp);

    if (atomicity == Atomic) { // Atomic is the template argument here, not the property attribute
        _imp.store(newIMP, memory_order_relaxed);
        if (_sel.load(memory_order_relaxed) != newSel) {
#ifdef __arm__
            mega_barrier();
            _sel.store(newSel, memory_order_relaxed);
#elif __x86_64__ || __i386__
            _sel.store(newSel, memory_order_release);
#else
#error Don't know how to do bucket_t::set on this architecture.
#endif
        }
    } else {    
        // Write imp and sel with relaxed ordering
        _imp.store(newIMP, memory_order_relaxed);
        _sel.store(newSel, memory_order_relaxed);
    }
}

set writes sel and imp into the bucket; this is where the method is actually cached.

2.2.5 The `incrementOccupied` function

void cache_t::incrementOccupied() 
{
    _occupied++;
}

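_occupied is incremented by 1; it is the number of methods currently stored in the cache.

III. The `insert` call flow

The sections above covered what insert does. The next question is how calling an instance method ends up calling the cache's insert.

First set a breakpoint in insert, then run the source.

The call stack shows how insert is reached: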

_objc_msgSend_uncached -->
lookUpImpOrForward -->
log_and_fill_cache -->
cache_t::insert

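The stack only goes back as far as _objc_msgSend_uncached, yet what we called was [person say1], an instance method, and it ended in cache_t::insert.

So the path from _objc_msgSend_uncached to cache_t::insert is known, but the path from [person say1] to _objc_msgSend_uncached is not.

Switch to the disassembly view to investigate:

say1 is dispatched through objc_msgSend; Control + step into enters objc_msgSend, which turns out to call _objc_msgSend_uncached.

Putting it together, the underlying call flow of [person say1] is: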

objc_msgSend-->
_objc_msgSend_uncached -->
lookUpImpOrForward -->
log_and_fill_cache -->
cache_t::insert

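Summary

cache_t principle diagram

What each cache_t member means

_bucketsAndMaybeMask stores both buckets and mask in the HIGH_16 configurations (arm64); in the OUTLINED configuration (e.g. x86_64 macOS and simulators) it stores only buckets
_maybeMask is the mask used to compute an index in the hash and collision functions; _maybeMask = capacity - 1
_occupied grows with the number of cached methods and is reset to 0 when the cache expands
Cached methods disappear after an expansion because the old memory is reclaimed and its entries are not copied over
Buckets are stored out of call order because each position is hashed from sel and mask, so it is not fixed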
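
How one 64-bit word can carry both buckets and mask can be sketched as follows. This is a simplified illustration that assumes the mask sits in the top 16 bits and the pointer in the low 48 bits; the shift value and the names `pack`, `unpackBuckets`, and `unpackMask` are assumptions for illustration, and the real constants live in the per-architecture section elided from the cache_t excerpt.

```cpp
#include <cassert>
#include <cstdint>

constexpr unsigned kMaskShift   = 48;  // assumed: mask stored in the top 16 bits
constexpr uint64_t kBucketsMask = (uint64_t{1} << kMaskShift) - 1;

// Combine a buckets address and a 16-bit mask into one word.
constexpr uint64_t pack(uint64_t bucketsAddr, uint16_t mask) {
    return (uint64_t{mask} << kMaskShift) | (bucketsAddr & kBucketsMask);
}

// Recover the buckets address from the low bits.
constexpr uint64_t unpackBuckets(uint64_t word) { return word & kBucketsMask; }

// Recover the mask from the high bits.
constexpr uint16_t unpackMask(uint64_t word) {
    return static_cast<uint16_t>(word >> kMaskShift);
}
```

Storing both values in one word lets a reader obtain a consistent pair with a single load, which is the motivation usually given for this kind of packing.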

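Supplement: `imp` encoding and decoding

The imp stored in a bucket is the encoded address cast to uintptr_t; decoding restores the original imp.

`imp` encoding

b[i].set<Atomic, Encoded>(b, sel, imp, cls()) caches sel and imp. set calls encodeImp, which encodes imp as (uintptr_t)newImp ^ (uintptr_t)cls, that is, an XOR.
Whether the imp in a bucket ends up encoded depends, apart from the build configuration, on the cls argument of bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls). With a non-nil cls the imp is encoded; with nil it is effectively unencoded. The last bucket of a freshly allocated cache is written with set<NotAtomic, Raw> and cls = nil, so its imp is stored as-is (an XOR with nil would leave it unchanged anyway).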

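`imp` decoding

Decoding uses the same XOR as encoding.
In the lldb output above, the printed values come from calling imp(nil, cls()). That call applies one XOR to the last bucket's imp, so recovering the original address requires applying the XOR once more by hand.
XOR: for each pair of corresponding bits, the result is 0 if the bits are equal and 1 otherwise.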
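
Because XOR is its own inverse, encoding and decoding are the same operation. A minimal sketch with plain integers standing in for the imp and class addresses; the function names mirror the text, but the bodies are an illustration of the XOR scheme only and the addresses in the usage note are made up.

```cpp
#include <cassert>
#include <cstdint>

// Encode: XOR the imp address with the class address.
constexpr uint64_t encodeImp(uint64_t imp, uint64_t cls) { return imp ^ cls; }

// Decode: the same XOR restores the original imp.
constexpr uint64_t decodeImp(uint64_t stored, uint64_t cls) { return stored ^ cls; }
```

For example, with imp = 0x100003A40 and cls = 0x1000081E8, `decodeImp(encodeImp(imp, cls), cls)` gives back 0x100003A40, and encoding with cls = 0 leaves the imp unchanged. Applying the XOR one extra time, as in the lldb case above, encodes the value again, so a second manual XOR is needed to undo it.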

OC Internals (06): Exploring the Internals of Classes (Part 3)
