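In the previous two articles we explored isa and bits in the objc_class struct. Today we focus on the class's method cache, cache.
Under the hood a class is an objc_class struct, defined as follows: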

struct objc_class : objc_object {
    // Class ISA;   // 8 bytes, inherited from objc_object
    Class superclass;  // 8 bytes
    cache_t cache;             // formerly cache pointer and vtable
    class_data_bits_t bits;    // class_rw_t * plus custom rr/alloc flags
};

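So to explore cache_t, we only need to offset the starting address of objc_class by 16 bytes (8 for isa plus 8 for superclass), as shown in the figure. The printed cache_t matches its definition, but _occupied and _maybeMask are both 0, so it seems we cannot read the cached data from this output. What can the cache_t source code tell us?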
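The 16-byte offset is simply the size of the two members in front of cache. The sketch below uses hypothetical mock structs (mock_objc_class, mock_cache_t) that reproduce only the member sizes, not the runtime's real types. With them, offsetof reports where cache begins:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical mock of cache_t: only its 16-byte footprint matters here.
struct mock_cache_t {
    uintptr_t bucketsAndMaybeMask;  // 8 bytes on 64-bit
    uint64_t  unionStorage;         // stands in for the 8-byte union
};

// Hypothetical mock of objc_class; real runtime types are not used.
struct mock_objc_class {
    void *isa;                      // inherited from objc_object: 8 bytes
    void *superclass;               // 8 bytes
    mock_cache_t cache;             // starts right after the two pointers
    uintptr_t bits;                 // class_data_bits_t wraps a single word
};

// Offset of cache from the start of the class: sizeof(isa) + sizeof(superclass).
constexpr std::size_t cacheOffset() {
    return offsetof(mock_objc_class, cache);
}

#if defined(__LP64__)
static_assert(offsetof(mock_objc_class, cache) == 16, "cache sits 16 bytes in");
#endif
```

The lldb step above uses the same arithmetic: adding 0x10 to the class address lands on cache.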

struct cache_t {
    explicit_atomic<uintptr_t> _bucketsAndMaybeMask;
    union {
        struct {
            explicit_atomic<mask_t>    _maybeMask;
#if __LP64__
            uint16_t                   _flags;
#endif
            uint16_t                   _occupied; // 2 bytes: number of methods currently cached
        };
        explicit_atomic<preopt_cache_t *> _originalPreoptCache;
    };
    ... (some code omitted) ...
    
    // Whether this is the constant empty cache; checked on first use
    bool isConstantEmptyCache() const;
    bool canBeFreed() const;
    // The mask: length of the bucket_t list minus 1
    mask_t mask() const;
    // Increment the number of methods cached in the bucket_t list by 1
    void incrementOccupied();
    // Set buckets and mask
    void setBucketsAndMask(struct bucket_t *newBuckets, mask_t newMask);
    // Allocate new memory for the buckets
    void reallocate(mask_t oldCapacity, mask_t newCapacity, bool freeOld);
    // Free oldBuckets according to oldCapacity
    void collect_free(bucket_t *oldBuckets, mask_t oldCapacity);
    // Length (capacity) of the current bucket_t list
    unsigned capacity() const;
    // Get the buckets
    struct bucket_t *buckets() const;
    // Get the class
    Class cls() const;
    // Number of methods cached in the current bucket_t list
    mask_t occupied() const;
    // Insert a called method into the memory that buckets points to
    void insert(SEL sel, IMP imp, id receiver);
    ...
};

I. Analyzing the cache_t source

1. Definitions

Before the detailed source analysis, let's explain a few definitions that will make the later analysis easier. First:

__LP64__

#if __LP64__
    ......
#endif
    ......

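L stands for Long, P for Pointer, and 64 means that both long and pointers are 64 bits wide (int stays 32 bits). In OC, long takes 4 bytes on 32-bit systems and 8 bytes on 64-bit systems.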
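You can see the LP64 data model by comparing type sizes. The sketch below is only an illustration, not runtime code. isLP64Model is a hypothetical helper:

```cpp
#include <cassert>
#include <cstddef>

// LP64: long and pointers are 64 bits, int stays 32 bits.
// On a 32-bit (ILP32) target, int, long and pointers are all 32 bits.
constexpr bool isLP64Model() {
    return sizeof(long) == 8 && sizeof(void *) == 8 && sizeof(int) == 4;
}

#if defined(__LP64__)
static_assert(isLP64Model(), "__LP64__ implies 64-bit long and pointers");
#endif
```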

mask_t

#if __LP64__
typedef uint32_t mask_t;  // x86_64 & arm64 asm are less efficient with 16-bits
#else
typedef uint16_t mask_t;
#endif

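mask_t is 4 bytes on 64-bit systems and 2 bytes on 32-bit systems. Apple also helpfully notes in the comment that on x86_64 and arm64, assembly working with 16-bit (2-byte) values is less efficient.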

2. Size of the cache_t struct

The cache_t struct has two data parts: _bucketsAndMaybeMask and a union.

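_bucketsAndMaybeMask is a uintptr_t. Despite the name, uintptr_t is not a pointer but an unsigned integer type wide enough to hold one (unsigned long on Apple's 64-bit platforms), so it takes 8 bytes.
The union holds a struct and a pointer (preopt_cache_t *). The pointer is 8 bytes. The struct contains a 4-byte mask_t and two 2-byte uint16_t fields (_flags and _occupied), which also add up to 8 bytes. The union therefore occupies 8 bytes, and cache_t takes 16 bytes in total.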
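You can check the size arithmetic with a simplified copy of the data members. In this sketch, mock_cache_t is a hypothetical stand-in. The explicit_atomic wrappers are dropped, and the anonymous struct is named s so the code is standard C++:

```cpp
#include <cassert>
#include <cstdint>

#if defined(__LP64__)
typedef uint32_t mask_t;   // same choice as the runtime typedef above
#else
typedef uint16_t mask_t;
#endif

struct preopt_cache_t;     // opaque: only its pointer size matters here

// Simplified mirror of cache_t's data members.
struct mock_cache_t {
    uintptr_t _bucketsAndMaybeMask;            // 8 bytes on LP64
    union {
        struct {
            mask_t   _maybeMask;               // 4 bytes on LP64
#if defined(__LP64__)
            uint16_t _flags;                   // 2 bytes
#endif
            uint16_t _occupied;                // 2 bytes
        } s;
        preopt_cache_t *_originalPreoptCache;  // 8 bytes on LP64
    } u;
};

#if defined(__LP64__)
static_assert(sizeof(mask_t) == 4, "mask_t is 4 bytes on LP64");
static_assert(sizeof(mock_cache_t) == 16, "8-byte word + 8-byte union");
#endif
```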

3. cache_t members and core methods

_bucketsAndMaybeMask

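_bucketsAndMaybeMask is a masked value. buckets is the address of the bucket_t list, i.e. the hash table itself (also called the container or bucket address). bucketsMask is the mask used to extract that address, and its value depends on the layout. In the outlined layout used on x86_64 it is ~0ul. On 64-bit arm64, recent objc4 sources pack the mask into the high 16 bits, and bucketsMask keeps only the low 48 address bits. The hash table stores the currently cached selectors and implementations as bucket_t entries. Packing mask and buckets into one word saves space, and buckets() below shows how the address is recovered.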

_maybeMask

_maybeMask is the length of the bucket_t list minus 1, so it reflects the size of the container.

_occupied

_occupied is the number of entries currently stored in the bucket_t list, i.e. the number of methods currently cached.

bucket_t

struct bucket_t {
#if __arm64__
    explicit_atomic<uintptr_t> _imp;
    explicit_atomic<SEL> _sel;
#else
    explicit_atomic<SEL> _sel;
    explicit_atomic<uintptr_t> _imp;
#endif

    template <Atomicity, IMPEncoding>
    void set(bucket_t *base, SEL newSel, IMP newImp, Class cls);
};

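bucket_t stores the selector _sel and _imp, the (encoded) address of the method implementation. The layout again depends on the platform. The only difference is the order of sel and imp: imp comes first on arm64.
The core method of bucket_t is set, which writes a bucket_t's contents.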

buckets()

struct bucket_t *cache_t::buckets() const
{
    uintptr_t addr = _bucketsAndMaybeMask.load(memory_order_relaxed);
    return (bucket_t *)(addr & bucketsMask);
}

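buckets() returns the bucket_t list, i.e. the hash table that stores the cache. As mentioned above, _bucketsAndMaybeMask is a masked value. The function first loads _bucketsAndMaybeMask, then ANDs it with bucketsMask to get the address of the bucket_t list.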
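On 64-bit arm64, recent objc4 sources store the mask in the top 16 bits of _bucketsAndMaybeMask and the buckets address in the low 48 bits. In that layout, bucketsMask strips the mask off. The sketch below models this packing with plain integers. maskShift = 48 and the helper names pack, unpackBuckets and unpackMask are assumptions for illustration, not the runtime's API:

```cpp
#include <cassert>
#include <cstdint>

// Assumed high-16 layout: low 48 bits = buckets address, top 16 bits = mask.
constexpr uint64_t maskShift   = 48;
constexpr uint64_t bucketsMask = (uint64_t(1) << maskShift) - 1;

// Hypothetical helper: combine an address and a mask into one word.
constexpr uint64_t pack(uint64_t bucketsAddr, uint64_t mask) {
    return (mask << maskShift) | (bucketsAddr & bucketsMask);
}

// Counterpart of buckets(): AND with bucketsMask to recover the address.
constexpr uint64_t unpackBuckets(uint64_t packed) {
    return packed & bucketsMask;
}

// Counterpart of mask() in this layout: shift the top 16 bits down.
constexpr uint64_t unpackMask(uint64_t packed) {
    return packed >> maskShift;
}

// In the outlined layout (x86_64), bucketsMask is ~0ul instead: the AND is a
// no-op and the mask lives separately in _maybeMask.
static_assert(unpackMask(pack(0x7ffeefbff400ULL, 3)) == 3, "mask round-trips");
```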

II. Exploring the internals

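As shown above, the core of cache_t is buckets. So how does cache manage its entries? A global search for cache_t::insert turns up a comment that lists when (in which methods) the cache is read and when it is written. insert is one of the writers.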

insert

cache_t::insert() takes the selector sel, the implementation pointer imp, and the message receiver. Its source is:

void cache_t::insert(SEL sel, IMP imp, id receiver)
{
    runtimeLock.assertLocked();

    // Never cache before +initialize is done
    if (slowpath(!cls()->isInitialized())) {
        return;
    }

    if (isConstantOptimizedCache()) {
        _objc_fatal("cache_t::insert() called with a preoptimized cache for %s",
                    cls()->nameForLogging());
    }

#if DEBUG_TASK_THREADS
    return _collecting_in_critical();
#else
#if CONFIG_USE_CACHE_LOCK
    mutex_locker_t lock(cacheUpdateLock);
#endif

    ASSERT(sel != 0 && cls()->isInitialized());

    // Use the cache as-is if until we exceed our expected fill ratio.
    mask_t newOccupied = occupied() + 1;
    unsigned oldCapacity = capacity(), capacity = oldCapacity;
    if (slowpath(isConstantEmptyCache())) {
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;
        reallocate(oldCapacity, capacity, /* freeOld */false);
    }
    else if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
    }
#endif
    else {
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied();
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

    bad_cache(receiver, (SEL)sel);
#endif // !DEBUG_TASK_THREADS
}

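At first glance insert is long, but it splits into two parts: deciding whether buckets needs to grow, and inserting the entry. Let's go through them in order.
The first line of interest is mask_t newOccupied = occupied() + 1;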

occupied()

In newOccupied = occupied() + 1;, occupied() simply returns cache_t's member _occupied, the number of entries currently cached. On the first call the count is 0, so _occupied is 0 and newOccupied is 1.

Next comes unsigned oldCapacity = capacity(), capacity = oldCapacity;

oldCapacity, capacity

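oldCapacity comes from capacity(), which is built on mask(). mask() returns the mask, i.e. the length of the bucket_t list minus 1. This works like an array: if arr has length 5, its last index is 4. capacity() returns mask() + 1 when the mask is nonzero, and 0 otherwise.
capacity starts out equal to oldCapacity and represents the length of the bucket_t list, i.e. the size of the container.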
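The relationship between the mask and the capacity fits in one line. capacityFromMask below is a hypothetical helper that mirrors the logic of capacity(), assuming the LP64 definition of mask_t:

```cpp
#include <cassert>
#include <cstdint>

typedef uint32_t mask_t;  // LP64 definition from above

// Hypothetical helper mirroring capacity(): mask 0 means "no buckets yet".
constexpr unsigned capacityFromMask(mask_t mask) {
    return mask ? mask + 1 : 0;
}

static_assert(capacityFromMask(3) == 4, "mask 3 -> 4 buckets (indices 0..3)");
```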

If the cache is empty

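Next, insert checks whether cache_t is empty. If it is, the cache must be initialized:

- capacity is set to INIT_CACHE_SIZE (if it is 0).
- reallocate allocates the bucket memory and assigns initial values to the cache_t members.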

    if (slowpath(isConstantEmptyCache())) {  // is the cache empty?
        // Cache is read-only. Replace it.
        if (!capacity) capacity = INIT_CACHE_SIZE;   // empty: give capacity its initial value
        reallocate(oldCapacity, capacity, /* freeOld */false); // allocate a new bucket_t list
    }

So what is the value of INIT_CACHE_SIZE?

INIT_CACHE_SIZE

INIT_CACHE_SIZE is defined as 1 << INIT_CACHE_SIZE_LOG2, and INIT_CACHE_SIZE_LOG2 differs across architectures.

enum {
#if CACHE_END_MARKER || (__arm64__ && !__LP64__)
    INIT_CACHE_SIZE_LOG2 = 2,
#else
    INIT_CACHE_SIZE_LOG2 = 1,
#endif
    
    INIT_CACHE_SIZE      = (1 << INIT_CACHE_SIZE_LOG2),
    MAX_CACHE_SIZE_LOG2  = 16,
    MAX_CACHE_SIZE       = (1 << MAX_CACHE_SIZE_LOG2),
    FULL_UTILIZATION_CACHE_SIZE_LOG2 = 3,
    FULL_UTILIZATION_CACHE_SIZE = (1 << FULL_UTILIZATION_CACHE_SIZE_LOG2),
};

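CACHE_END_MARKER depends on the architecture: it is 1 on arm32, i386, and x86_64, and 0 on 64-bit arm64. So INIT_CACHE_SIZE is 1 << 1 = 2 on arm64, and 1 << 2 = 4 on arm32, i386, and x86_64. In other words, capacity (the initial length of the bucket_t list) is 4 on x86_64 and 2 on arm64.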
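The enum arithmetic can be reproduced as a function of CACHE_END_MARKER. initCacheSize is a hypothetical helper. The sketch ignores the arm64_32 case, which also uses 4 through the second half of the #if:

```cpp
#include <cassert>

// CACHE_END_MARKER: 1 on x86_64, i386 and arm32; 0 on 64-bit arm64.
constexpr unsigned initCacheSizeLog2(int cacheEndMarker) {
    return cacheEndMarker ? 2 : 1;   // mirrors the #if in the enum above
}

constexpr unsigned initCacheSize(int cacheEndMarker) {
    return 1u << initCacheSizeLog2(cacheEndMarker);
}

static_assert(initCacheSize(1) == 4, "x86_64: initial capacity 4");
static_assert(initCacheSize(0) == 2, "arm64: initial capacity 2");
```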
reallocate is then called with oldCapacity and capacity.

reallocate

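reallocate works in four steps:
- It first records the address of the old bucket_t list.
- It then allocates a new bucket_t list.
- setBucketsAndMask assigns the new buckets and mask to the cache_t members.
- freeOld decides whether the old bucket_t list is released.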
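The steps above can be modelled with a toy cache. ToyCache and Bucket are hypothetical simplifications. The real reallocate also handles the end marker and atomics, and it defers freeing through collect_free. What the toy does illustrate is that the new table starts empty and nothing is copied from the old one:

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical toy bucket: sel == 0 means empty.
struct Bucket {
    uintptr_t sel = 0;
    uintptr_t imp = 0;
};

// Hypothetical toy cache illustrating reallocate(); not the runtime's cache_t.
struct ToyCache {
    Bucket  *buckets  = nullptr;
    uint32_t mask     = 0;
    unsigned occupied = 0;

    ToyCache() = default;
    ToyCache(const ToyCache &) = delete;              // owns its buckets
    ToyCache &operator=(const ToyCache &) = delete;
    ~ToyCache() { delete[] buckets; }

    void setBucketsAndMask(Bucket *newBuckets, uint32_t newMask) {
        buckets  = newBuckets;
        mask     = newMask;
        occupied = 0;                                  // the new table starts empty
    }

    // In the runtime, freeOld is false when the old list is the shared
    // constant empty cache, which must never be freed.
    void reallocate(unsigned newCapacity, bool freeOld) {
        Bucket *oldBuckets = buckets;                  // 1. remember the old list
        Bucket *newBuckets = new Bucket[newCapacity](); // 2. allocate a zeroed list
        setBucketsAndMask(newBuckets, newCapacity - 1); // 3. install buckets and mask
        if (freeOld) delete[] oldBuckets;              // 4. release; entries are not copied
    }
};
```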

So when the cache is empty, a buckets list of length 2 is allocated on arm64, and one of length 4 on x86_64 and similar architectures.

If the cache is not empty

If the cache is not empty, there are three more branches, each handled differently.

Cache count <= cache_fill_ratio

if (fastpath(newOccupied + CACHE_END_MARKER <= cache_fill_ratio(capacity))) {
        // Cache is less than 3/4 or 7/8 full. Use it as-is.
    }
#if CACHE_ALLOW_FULL_UTILIZATION
    else if (capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity) {
        // Allow 100% cache utilization for small buckets. Use it as-is.
    }
#endif
    else {
        capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE;
        if (capacity > MAX_CACHE_SIZE) {
            capacity = MAX_CACHE_SIZE;
        }
        reallocate(oldCapacity, capacity, true);
    }

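newOccupied + CACHE_END_MARKER is the number of slots that will be in use after this insertion. As noted above, CACHE_END_MARKER is 0 on arm64 and 1 on x86_64. cache_fill_ratio(capacity) returns capacity * 3 / 4 on x86_64 (and i386/arm32), and capacity * 7 / 8 on 64-bit arm64.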

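Putting it together, no growth is needed in either of these cases:
- On arm64, the number of cached entries after this insertion is at most 7/8 of the bucket_t list length.
- On x86_64, that number plus 1 is at most 3/4 of the list length.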
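The fast-path check reduces to a couple of integer expressions. In this sketch, the Arch enum is a hypothetical stand-in for the compile-time #if branches, and fitsUnderFillRatio is a made-up name for the condition in insert():

```cpp
#include <cassert>

enum class Arch { x86_64, arm64 };   // hypothetical stand-in for the #if branches

// cache_fill_ratio(): 3/4 on x86_64, 7/8 on 64-bit arm64 (integer division).
constexpr unsigned cacheFillRatio(Arch arch, unsigned capacity) {
    return arch == Arch::arm64 ? capacity * 7 / 8 : capacity * 3 / 4;
}

// CACHE_END_MARKER: x86_64 reserves one bucket for the end marker.
constexpr unsigned cacheEndMarker(Arch arch) {
    return arch == Arch::arm64 ? 0 : 1;
}

// The fast-path condition in insert(): can the table be used as-is?
constexpr bool fitsUnderFillRatio(Arch arch, unsigned newOccupied, unsigned capacity) {
    return newOccupied + cacheEndMarker(arch) <= cacheFillRatio(arch, capacity);
}
```

Integer division truncates. For capacity 4 on x86_64 the limit is 3 slots, and for capacity 16 on arm64 it is 14.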

Optimization on arm64

CACHE_ALLOW_FULL_UTILIZATION is a macro whose value is 1 on arm64. It enables the branch capacity <= FULL_UTILIZATION_CACHE_SIZE && newOccupied + CACHE_END_MARKER <= capacity.
FULL_UTILIZATION_CACHE_SIZE is 1 << 3 = 8.
In that condition, capacity is the length of the bucket_t list. So on arm64, no growth is needed either when the list length is at most 8 and the number of cached entries does not exceed the list length.
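The extra arm64 branch matters most for tiny tables. With capacity 2, 2 * 7 / 8 truncates to 1, so without this branch the second method would already force a resize. Below is a minimal sketch of the check. fitsWithFullUtilization is a hypothetical name, and CACHE_END_MARKER is fixed at its arm64 value of 0:

```cpp
#include <cassert>

constexpr unsigned kFullUtilizationCacheSize = 1u << 3;  // FULL_UTILIZATION_CACHE_SIZE = 8
constexpr unsigned kCacheEndMarkerArm64 = 0;             // CACHE_END_MARKER on arm64

// The CACHE_ALLOW_FULL_UTILIZATION branch: small tables may be filled to 100%.
constexpr bool fitsWithFullUtilization(unsigned newOccupied, unsigned capacity) {
    return capacity <= kFullUtilizationCacheSize &&
           newOccupied + kCacheEndMarkerArm64 <= capacity;
}

// For capacity 2 the 7/8 check alone would allow only 2 * 7 / 8 == 1 entry.
static_assert(2 * 7 / 8 == 1, "integer division truncates");
```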

Growing the bucket_t list

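MAX_CACHE_SIZE is the maximum capacity, 1 << 16 = 65536.
capacity = capacity ? capacity * 2 : INIT_CACHE_SIZE; means that if none of the earlier conditions hold, capacity is set to INIT_CACHE_SIZE when it is 0 and doubled otherwise. The result is then clamped to MAX_CACHE_SIZE.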

Growth summary:

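Combining the code above: on arm64 (i.e. on real devices), the method cache container starts with length 2. While the container length is at most 8, it grows only when completely full; above 8, it grows at 7/8. In other words, a container of length 8 can hold 8 methods, while a container of length 16 must grow when the 15th method needs to be stored.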

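On x86_64, the container starts with length 4 and grows at 3/4. Here "3/4 growth" means the following: with a container of length 4, it grows when the 3rd entry needs to be stored; with length 8, it grows when the 6th entry needs to be stored. In other words, the container can hold only 3/4 of its length minus 1 methods, because one slot is taken by the end marker.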
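Both summaries can be folded into one decision function. capacityForInsert is a hypothetical helper written for illustration. It returns the capacity insert() would end up using, and it treats capacity 0 as the empty cache. That gives the same result as the separate isConstantEmptyCache() branch in the real code:

```cpp
#include <cassert>

enum class Arch { x86_64, arm64 };   // hypothetical stand-in for the #if branches

// Capacity insert() ends up using for the next entry, given the current
// occupied count and capacity (0 means the cache is still empty).
inline unsigned capacityForInsert(Arch arch, unsigned occupied, unsigned capacity) {
    const bool arm64 = (arch == Arch::arm64);
    const unsigned endMarker   = arm64 ? 0 : 1;                  // CACHE_END_MARKER
    const unsigned initSize    = arm64 ? 2 : 4;                  // INIT_CACHE_SIZE
    const unsigned maxSize     = 1u << 16;                       // MAX_CACHE_SIZE
    const unsigned fillLimit   = arm64 ? capacity * 7 / 8 : capacity * 3 / 4;
    const unsigned newOccupied = occupied + 1;

    if (newOccupied + endMarker <= fillLimit)
        return capacity;                                         // under the fill ratio
    if (arm64 && capacity <= 8 && newOccupied + endMarker <= capacity)
        return capacity;                                         // small arm64 table: fill to 100%
    const unsigned grown = capacity ? capacity * 2 : initSize;   // double (or initialize)
    return grown > maxSize ? maxSize : grown;                    // clamped to MAX_CACHE_SIZE
}
```

For example, capacityForInsert(Arch::x86_64, 2, 4) is 8: the 3rd entry does not fit in 4 slots on x86_64. capacityForInsert(Arch::arm64, 7, 8) stays at 8, because the 8th entry still fits in a length-8 table on arm64.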

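One more point: when the container grows, the previously cached methods are discarded. This is because reallocate allocates a fresh list and does not copy the old entries over.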

Storing into the bucket_t list

    bucket_t *b = buckets();
    mask_t m = capacity - 1;
    mask_t begin = cache_hash(sel, m);
    mask_t i = begin;

    // Scan for the first unused slot and insert there.
    // There is guaranteed to be an empty slot.
    do {
        if (fastpath(b[i].sel() == 0)) {
            incrementOccupied();
            b[i].set<Atomic, Encoded>(b, sel, imp, cls());
            return;
        }
        if (b[i].sel() == sel) {
            // The entry was added to the cache by some other thread
            // before we grabbed the cacheUpdateLock.
            return;
        }
    } while (fastpath((i = cache_next(i, m)) != begin));

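bucket_t *b = buckets(); gets the bucket_t list.
mask_t m = capacity - 1; m is the length of the (possibly grown) bucket_t list minus 1.
mask_t begin = cache_hash(sel, m); computes the starting index for sel in the bucket_t list.
The do-while loop performs the actual write into the buckets hash table.
b[i].sel() == 0 checks whether slot i of the bucket_t list is empty:
- If it is 0, the slot is empty and the entry is inserted: incrementOccupied() bumps _occupied, and set writes the bucket.
- If it equals sel, the method is already cached, so insert returns.
- If neither holds, a hash collision has occurred. It is resolved with open addressing: cache_next moves to the next probe index until an empty slot is found or the scan wraps back to begin.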
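The probing loop can be sketched as a small open-addressing table. Everything here is a simplified model under these assumptions:
- Selectors are plain integers.
- cacheHash is sel & m. objc4's cache_hash may also mix in higher bits when preoptimized caches are enabled.
- cacheNext walks backwards on arm64 and forwards elsewhere.
- The x86_64 end marker is not modelled.

Bucket, InsertResult, cacheHash, cacheNext and insertSel are hypothetical names:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

typedef uint32_t mask_t;

// Hypothetical toy bucket: sel == 0 marks an empty slot, as in the real cache.
struct Bucket {
    uintptr_t sel = 0;
    uintptr_t imp = 0;
};

enum class InsertResult { Inserted, AlreadyCached, TableFull };

// Starting index: the selector value masked into the table range.
inline mask_t cacheHash(uintptr_t sel, mask_t m) {
    return static_cast<mask_t>(sel & m);
}

// Next probe index: arm64 walks backwards, other architectures forwards.
inline mask_t cacheNext(mask_t i, mask_t m) {
#if defined(__arm64__) || defined(__aarch64__)
    return i ? i - 1 : m;
#else
    return (i + 1) & m;
#endif
}

// Linear-probing insert modelled on the do-while loop in cache_t::insert().
inline InsertResult insertSel(std::vector<Bucket> &b, unsigned &occupied,
                              uintptr_t sel, uintptr_t imp) {
    assert(!b.empty() && (b.size() & (b.size() - 1)) == 0);  // power-of-two size
    const mask_t m = static_cast<mask_t>(b.size() - 1);
    const mask_t begin = cacheHash(sel, m);
    mask_t i = begin;
    do {
        if (b[i].sel == 0) {            // empty slot: claim it
            ++occupied;
            b[i].sel = sel;
            b[i].imp = imp;
            return InsertResult::Inserted;
        }
        if (b[i].sel == sel) {          // already cached
            return InsertResult::AlreadyCached;
        }
    } while ((i = cacheNext(i, m)) != begin);  // collision: probe the next slot
    return InsertResult::TableFull;     // the real code calls bad_cache() here
}
```

With a table of 4 slots, selectors 0x10 and 0x14 both hash to slot 0. The second one lands in slot 1 when probing forwards (x86_64) and in slot 3 when probing backwards (arm64).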

set

As shown above, storing uses the set method, which writes a bucket_t into the bucket_t list.

template<Atomicity atomicity, IMPEncoding impEncoding>
void bucket_t::set(bucket_t *base, SEL newSel, IMP newImp, Class cls)
{
    ASSERT(_sel.load(memory_order_relaxed) == 0 ||
           _sel.load(memory_order_relaxed) == newSel);

    static_assert(offsetof(bucket_t,_imp) == 0 &&
                  offsetof(bucket_t,_sel) == sizeof(void *),
                  "bucket_t layout doesn't match arm64 bucket_t::set()");

    uintptr_t encodedImp = (impEncoding == Encoded
                            ? encodeImp(base, newImp, newSel, cls)
                            : (uintptr_t)newImp);

    // LDP/STP guarantees that all observers get
    // either imp/sel or newImp/newSel
    stp(encodedImp, (uintptr_t)newSel, this);
}

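This is a template function. stp (store pair) writes two registers to memory in one instruction, so set stores newImp and newSel into the bucket together. As the comment says, observers therefore see either the old pair or the new one. Before storing, encodeImp encodes the IMP (signing it on platforms with pointer authentication) and returns the encoded value.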
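On platforms without pointer authentication, objc4 can encode the IMP by XORing it with the class pointer (the CACHE_IMP_ENCODING_ISA_XOR configuration). The stored value is then not the raw function address. Below is a simplified sketch with encodeImpXor and decodeImpXor as hypothetical names:

```cpp
#include <cassert>
#include <cstdint>

// Simplified ISA ^ IMP encoding: XOR is its own inverse, so decoding with the
// same class pointer restores the original IMP. Pointer-authentication
// platforms sign the IMP instead; that path is not modelled here.
inline uintptr_t encodeImpXor(uintptr_t imp, uintptr_t cls) { return imp ^ cls; }
inline uintptr_t decodeImpXor(uintptr_t encoded, uintptr_t cls) { return encoded ^ cls; }
```

This is one reason a bucket's raw _imp value, dumped in lldb, may not match the function's address directly.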

Food for thought: inserting into the bucket_t list is really inserting into a hash table. Why does Apple use a hash table for the method cache?

III. Hash tables

Why use this structure?

Hash tables are fast: a lookup takes O(1) time on average, which avoids scanning an array. Trading space for time makes method calls faster.

Hash table growth

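Two factors drive hash table growth: the capacity of the bucket_t list and the load factor. When the number of stored entries exceeds the threshold (capacity × load factor), the table must grow. Here the capacity is capacity, and the load factor is 7/8 on arm64 and 3/4 on x86_64, as analyzed above.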

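Ways to resolve hash collisions:

Hash collisions are commonly resolved in one of the following ways:

- Open addressing: on a collision, look for the next empty slot. As long as the table is large enough, an empty slot can always be found. The next slot can be chosen by linear probing, quadratic probing, pseudo-random probing, and so on.
- Rehashing: on a collision, hash again with a second, third, ... hash function until the result no longer collides. This makes clustering less likely but costs more computation. Multiple hash functions can be derived by simple variations, for example by appending different values to the key before hashing.
- Chaining (separate chaining): put colliding elements in a linked list and store the list's head pointer in slot i of the table. Chaining suits workloads with frequent insertions and deletions.
- Public overflow area: split the table into a base table and an overflow table. Every element that collides in the base table goes into the overflow table, outside the main table.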

IV. Verifying with lldb

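My machine is an M1 Mac, so the architecture is arm64.
To prepare, we add 30 methods to the FMUserInfo class, call them one after another, and inspect the cache.
To make exploring insert easier, we first add a print statement in insert, then run the program: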

Before method1 runs, _occupied is 0 and _maybeMask is also 0. Next we run method1.

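_occupied is now 1, but _maybeMask is still 0. Isn't _maybeMask supposed to reflect the container size? It should not be 0. We can read the mask through cache_t's mask() instead. On this arm64 configuration, mask() takes the mask from the high bits of _bucketsAndMaybeMask rather than from _maybeMask, which is why the field itself stays 0.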

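This prints the mask. As mentioned earlier, the initial allocation on arm64 is 2, and the mask is the bucket_t list length minus 1, so the value is 1.
Let's run two more methods and watch how the values change.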

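After method3 runs, the cache grows. As described earlier, a container of length at most 8 grows only when it is full. After growing, the cached entries are cleared, so _occupied is back to 1.
Continuing with the same rules, the buckets container is full again after method6, and method7 triggers another growth. Checking in lldb: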

Next, by the same rule (containers of length at most 8 grow only when full), the container is full again after method14. Let's take a look:

_occupied is now 8, so the container is full and method15 should trigger growth. Sure enough, it grows again.

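Next, per the rules, the buckets container grows once the 7/8 threshold is exceeded. That happens when the 15th method needs to be stored in it, which is method29. Let's first look at the state after method28.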

14 entries are stored, so the next one (the 15th) triggers growth.

This confirms the growth rules described earlier.
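As a cross-check, the arm64 rules can be replayed in a few lines to predict the counts seen in lldb. CacheState, fitsArm64 and afterCalls are hypothetical names. This toy model assumes every call caches a new, distinct selector. It tracks only occupied and capacity, not the hash layout:

```cpp
#include <cassert>

// Hypothetical toy state: only the two counters are modelled.
struct CacheState {
    unsigned occupied = 0;
    unsigned capacity = 0;
};

// arm64 rule: CACHE_END_MARKER is 0, fill ratio 7/8, full utilization up to 8.
inline bool fitsArm64(unsigned occupied, unsigned capacity) {
    const unsigned newOccupied = occupied + 1;
    return newOccupied <= capacity * 7 / 8 ||            // under the 7/8 fill ratio
           (capacity <= 8 && newOccupied <= capacity);   // small table, fill to 100%
}

// State after calling `calls` distinct methods one after another.
inline CacheState afterCalls(unsigned calls) {
    CacheState s;
    for (unsigned n = 0; n < calls; ++n) {
        if (!fitsArm64(s.occupied, s.capacity)) {
            const unsigned grown = s.capacity ? s.capacity * 2 : 2;   // INIT_CACHE_SIZE 2
            s.capacity = grown > (1u << 16) ? (1u << 16) : grown;     // MAX_CACHE_SIZE
            s.occupied = 0;     // reallocate(): old entries are dropped, not copied
        }
        ++s.occupied;           // incrementOccupied() for the new entry
    }
    return s;
}
```

Under these rules:
- afterCalls(1) gives capacity 2 (mask 1).
- method3, method7, method15 and method29 are each the first call after a resize.
- afterCalls(14) leaves 8 of 8 slots used, and afterCalls(28) leaves 14 of 16.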

Exploring OC Classes: cache