A Guess at the Layout of a Class's cache_t on the New Systems


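Preface

This post looks at how the class method cache, cache_t, is laid out in memory on the new systems (macOS Catalina and iOS 13).

Everything here is a guess made within my limited ability; there is no source code to back it up.

It also assumes you already know the basics of a Class's cache_t, so the explanations move quickly.

In the title, "new systems" means macOS Catalina and iOS 13; x86_64 includes iOS 13 running in the simulator.
I can't read assembly, so the analysis below is guesswork. Corrections are welcome.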

Recap: cache_t on the old systems

struct cache_t {
    struct bucket_t *_buckets;
    mask_t _mask;
    mask_t _occupied;
}

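Start with the definition of cache_t: the buckets array pointer takes 8 bytes, mask takes 4 bytes, and occupied takes 4 bytes.

Now a quick example: create any class and dump its memory with x.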

(lldb) x a.class
0x100001138: 11 11 00 00 01 80 1d 00 40 71 fb 9e ff 7f 00 00  ........@q......
0x100001148: e0 c7 7b 00 01 00 00 00 07 00 00 00 01 00 00 00  ..{.............

e0 c7 7b 00 01 00 00 00 | 07 00 00 00 | 01 00 00 00

It reads cleanly: mask is 7 and occupied is 1, which matches the cache_t struct definition.
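The byte-by-byte reading above can be written down as a small decoder. This is only a sketch: the type and function names (old_cache_view, load_le, decode_old_cache) are made up for illustration, and the 8/4/4 split is the old layout described in this post.

```c
#include <assert.h>
#include <stdint.h>

/* Old layout as described above: 8-byte buckets, 4-byte mask, 4-byte occupied. */
typedef struct {
    uint64_t buckets;   /* bucket_t * kept as a raw 64-bit value */
    uint32_t mask;
    uint32_t occupied;
} old_cache_view;

/* Assemble an n-byte little-endian integer, independent of host endianness. */
static uint64_t load_le(const uint8_t *p, unsigned n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the 16 bytes that start at class offset 16. */
static old_cache_view decode_old_cache(const uint8_t bytes[16]) {
    old_cache_view v;
    v.buckets  = load_le(bytes, 8);
    v.mask     = (uint32_t)load_le(bytes + 8, 4);
    v.occupied = (uint32_t)load_le(bytes + 12, 4);
    return v;
}

/* Second row of the dump above (address 0x100001148). */
static const uint8_t old_dump_row[16] = {
    0xe0, 0xc7, 0x7b, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x07, 0x00, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00
};
```

Calling decode_old_cache(old_dump_row) gives buckets = 0x1007bc7e0, mask = 7 and occupied = 1, the same values read by eye.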

Spotting the problem

With the layout this clear, I tried the same thing again on Catalina: created a class and dumped it with x.

(lldb) x p.class
0x100001250: 28 12 00 00 01 00 00 00 18 b1 9b 92 ff 7f 00 00  (...............
0x100001260: a0 3d 08 02 01 00 00 00 03 00 00 00 10 80 02 00  .=..............

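a0 3d 08 02 01 00 00 00 | 03 00 00 00 | 10 80 02 00: read with the old layout, mask is 3 and occupied is 0x00028010. Simple and clear, so easy...

...wait, occupied is 163856? The computer must be broken. This is Apple's plot to make me buy the 16-inch!

Not convinced, I repeated the experiment on my iPhone 6s Plus running iOS 13 and read the x output.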

(lldb) x p1.class
0x101001540: 18 15 00 01 01 00 00 00 58 b2 5d f2 01 00 00 00  ........X.].....
0x101001550: 00 83 ba 81 02 00 03 00 00 00 00 00 10 80 02 00  ................

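After all that effort, the result is mask = 0 and occupied = 163856. Excuse me? mask can be 0 now? That is even stranger than macOS.

Apple has only open-sourced the runtime up to 10.14.5; there is no 10.15 source. What now?

The new cache_t on x86_64

With no source, the only option is to reverse engineer it.

Locate libobjc.A.dylib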

(lldb) image list
[  3] 17241F77-6A7A-39D7-8836-63E2725AA3C9 0x00007fff67948000 /usr/lib/libobjc.A.dylib 

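Go to /usr/lib and the file is there.

Copy it out, open IDA, drag it in, and search for cache. From the resulting list, and from what I know of the old cache_t, I settled on the cache_fill function and pressed F5 to generate pseudocode (the other functions did not look right).

cache_fill

Pseudocode for cache_fill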
signed __int64 __fastcall cache_fill(signed __int64 a1, signed __int64 a2, __int64 a3, void *a4)
{
  __int64 v4; // r15
  signed __int64 v5; // rdx
  signed __int64 result; // rax
  int v7; // er14
  unsigned int v8; // er12
  int v9; // er14
  __int64 v10; // r8
  unsigned int v11; // ebx
  signed __int64 v12; // rdx
  signed __int64 *v13; // rcx
  signed __int64 v14; // r13
  unsigned int v15; // er14
  __int64 v16; // rax
  __int64 v17; // rax
  unsigned int v18; // er14
  __int64 v19; // ST00_8
  __int64 v20; // rax
  void *ptr; // [rsp+10h] [rbp-30h]

  v4 = a3;
  v5 = a1;
  if ( !(*(_BYTE *)(a1 + 28) & 1) )
    v5 = *(_QWORD *)a1 & 0x7FFFFFFFFFF8LL;
  result = *(_QWORD *)(v5 + 32) & 0x7FFFFFFFFFF8LL;
  if ( *((_BYTE *)&_objc_empty_vtable_0.magic + result + 3) & 0x20 )
  {
    ptr = a4;
    v7 = *(unsigned __int16 *)(a1 + 30);
    if ( *(_DWORD *)(a1 + 24) )
      v8 = *(_DWORD *)(a1 + 24) + 1;
    else
      v8 = 0;
    if ( (unsigned __int8)cache_t::isConstantEmptyCache((cache_t *)(a1 + 16)) )
    {
      v15 = 4;
      if ( v8 )
        v15 = v8;
      v16 = *(_QWORD *)(a1 + 16);
      v17 = allocateBuckets(v15);
      v9 = v15 - 1;
      *(_QWORD *)(a1 + 16) = v17;
      *(_DWORD *)(a1 + 24) = v9;
      *(_WORD *)(a1 + 30) = 0;
    }
    else if ( v7 + 2 > 3 * (v8 >> 2) )
    {
      v18 = 4;
      if ( v8 )
        v18 = 2 * v8;
      if ( v18 >= 0x10000 )
        v18 = 0x10000;
      v19 = *(_QWORD *)(a1 + 16);
      v20 = allocateBuckets(v18);
      v9 = v18 - 1;
      *(_QWORD *)(a1 + 16) = v20;
      *(_DWORD *)(a1 + 24) = v9;
      *(_WORD *)(a1 + 30) = 0;
      cache_collect_free(v19, v8);
    }
    else
    {
      v9 = v8 - 1;
    }
    v10 = *(_QWORD *)(a1 + 16);
    v11 = v9 & a2;
    while ( 1 )
    {
      v12 = 16LL * v11;
      v13 = (signed __int64 *)(v10 + v12);
      if ( !*(_QWORD *)(v10 + v12) )
        break;
      result = *v13;
      if ( *v13 == a2 )
        return result;
      v11 = v9 & (v11 + 1);
      if ( v11 == (v9 & (unsigned int)a2) )
        cache_t::bad_cache(ptr);
    }
    ++*(_WORD *)(a1 + 30);
    v14 = v4 ^ a1;
    if ( !v4 )
      v14 = 0LL;
    *(_QWORD *)(v10 + v12 + 8) = v14;
    result = *v13;
    if ( *v13 != a2 )
      *v13 = a2;
  }
  return result;
}

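Apple changed the data layout, but the shape of the code should not have moved much. Comparing this pseudocode with the objc4-756 source, the two line up reasonably well. The branch taken when isConstantEmptyCache() returns true corresponds to the old reallocate path, and the branch guarded by v7 + 2 > 3 * (v8 >> 2) corresponds to the old expand path. In the old source both paths allocate a new bucket array and then set buckets and mask (the setBucketsAndMask method).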
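The sizing decisions in the cache_fill pseudocode can be condensed into one function. This is a sketch read off the decompiled branches, not Apple's code; next_capacity, INIT_CAPACITY and MAX_CAPACITY are names invented here, and the values 4 and 0x10000 are the constants that appear in the pseudocode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the values are the constants seen in the pseudocode. */
enum { INIT_CAPACITY = 4, MAX_CAPACITY = 0x10000 };

/*
 * capacity       : mask + 1, or 0 when mask is 0 (v8 in the pseudocode)
 * occupied       : current entry count (v7)
 * is_empty_cache : result of cache_t::isConstantEmptyCache()
 * Returns the capacity the cache has after this fill.
 */
static uint32_t next_capacity(uint32_t capacity, uint16_t occupied,
                              bool is_empty_cache) {
    /* Capacities are expected to be 0 or a power of two. */
    assert((capacity & (capacity - 1)) == 0);

    if (is_empty_cache)                       /* first allocation */
        return capacity ? capacity : INIT_CAPACITY;

    if ((uint32_t)occupied + 2 > 3 * (capacity >> 2)) {   /* grow branch */
        uint32_t grown = capacity ? 2 * capacity : INIT_CAPACITY;
        return grown >= MAX_CAPACITY ? MAX_CAPACITY : grown;
    }
    return capacity;                          /* enough room, keep as is */
}
```

With a capacity of 4, one cached entry leaves the size unchanged (1 + 2 is not greater than 3), while two entries trigger a doubling to 8.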

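Guessing the new layout

Time to put my guessing skills to work. Compare against the old code and guess at what the stores in these branches mean (the lines below come from the expand branch; the other branch performs the same three stores):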

v20 = allocateBuckets(v18);  // bucket_t *newBuckets = allocateBuckets(newCapacity);
v9 = v18 - 1;                // mask = newCapacity - 1
*(_QWORD *)(a1 + 16) = v20;  // v20 is newBuckets and a1 is the class address: store newBuckets at a1 + 16 (QWORD = 8 bytes)
*(_DWORD *)(a1 + 24) = v9;   // store mask at a1 + 24 (DWORD = 4 bytes)
*(_WORD *)(a1 + 30) = 0;     // store 0 at a1 + 30 (WORD = 2 bytes)

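From these stores, a reasonable guess is: the 8 bytes at offset 16 are the buckets array pointer, the 4 bytes at offset 24 are mask, and the 2 bytes at offset 30 are occupied.

Checking against the dump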

(lldb) x p.class
0x100001250: 28 12 00 00 01 00 00 00 18 b1 9b 92 ff 7f 00 00  (...............
0x100001260: a0 3d 08 02 01 00 00 00 03 00 00 00 10 80 02 00  .=..............

Take the dump from the problem above and read it with the new layout: mask is 3 and occupied is 2. That looks normal.

a0 3d 08 02 01 00 00 00 | 03 00 00 00 | 10 80 | 02 00
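The same split can be checked mechanically. The sketch below decodes the Catalina row with the guessed 8/4/2/2 layout; new_cache_view, load_le and decode_new_cache are illustrative names, and the middle field is simply called flags because its meaning is not yet known.

```c
#include <assert.h>
#include <stdint.h>

/* Guessed Catalina x86_64 layout: 8-byte buckets, 4-byte mask,
   2 unknown bytes (called flags here), 2-byte occupied. */
typedef struct {
    uint64_t buckets;
    uint32_t mask;
    uint16_t flags;
    uint16_t occupied;
} new_cache_view;

/* Assemble an n-byte little-endian integer, independent of host endianness. */
static uint64_t load_le(const uint8_t *p, unsigned n) {
    assert(n <= 8);
    uint64_t v = 0;
    for (unsigned i = 0; i < n; i++)
        v |= (uint64_t)p[i] << (8 * i);
    return v;
}

/* Decode the 16 bytes that start at class offset 16. */
static new_cache_view decode_new_cache(const uint8_t bytes[16]) {
    new_cache_view v;
    v.buckets  = load_le(bytes, 8);
    v.mask     = (uint32_t)load_le(bytes + 8, 4);
    v.flags    = (uint16_t)load_le(bytes + 12, 2);
    v.occupied = (uint16_t)load_le(bytes + 14, 2);
    return v;
}

/* Second row of the p.class dump (address 0x100001260). */
static const uint8_t catalina_dump_row[16] = {
    0xa0, 0x3d, 0x08, 0x02, 0x01, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00, 0x10, 0x80, 0x02, 0x00
};
```

Decoding catalina_dump_row yields mask = 3, flags = 0x8010 and occupied = 2, so the earlier 163856 was just flags and occupied read together as one 4-byte integer.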

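The two bytes in the middle

In the reading above, what is the 0x8010 in the middle? I had no idea where to start, but since it sits at byte 28, I searched the pseudocode for 28 with Ctrl+F, and there it was.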

  if ( !(*(_BYTE *)(a1 + 28) & 1) )
    v5 = *(_QWORD *)a1 & 0x7FFFFFFFFFF8LL;
  result = *(_QWORD *)(v5 + 32) & 0x7FFFFFFFFFF8LL;

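0x7FFFFFFFFFF8LL looks familiar: it is the value of both ISA_MASK and FAST_DATA_MASK, one used to extract isa and the other to extract rw. Fetching isa, then fetching rw? Compared with the old code, isn't this just the line if (!cls->isInitialized()) return;?

So guessing that !(*(_BYTE *)(a1 + 28) & 1) checks whether the class is a metaclass does not seem like a stretch.

0x10 & 1 = 0, false, so not a metaclass. And p.class is indeed not a metaclass.

Now try it on the metaclass: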

0x0000000100001228 & 0x00007ffffffffff8ULL = 0x100001228

(lldb) x 0x100001228
0x100001228: f0 b0 9b 92 ff 7f 00 00 f0 b0 9b 92 ff 7f 00 00  ................
0x100001238: 60 3d 08 02 01 00 00 00 03 00 00 00 31 e0 01 00  `=..........1...

0x31 & 1 = 1, true, so it is a metaclass.
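The two checks above amount to testing bit 0 of the byte at class offset 28. A minimal sketch follows, using the two dumps from this section; flag_bit0_is_set and FLAGS_BYTE_OFFSET are made-up names, and reading the bit as "is a metaclass" is this post's guess, not a confirmed fact.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; the offset 28 comes from the pseudocode. */
enum { FLAGS_BYTE_OFFSET = 28 };

/* Mirrors *(_BYTE *)(a1 + 28) & 1 from the cache_fill pseudocode. */
static bool flag_bit0_is_set(const uint8_t *class_bytes, size_t length) {
    assert(length > FLAGS_BYTE_OFFSET);
    return (class_bytes[FLAGS_BYTE_OFFSET] & 1) != 0;
}

/* x p.class (the class itself). */
static const uint8_t class_dump[32] = {
    0x28, 0x12, 0x00, 0x00, 0x01, 0x00, 0x00, 0x00,
    0x18, 0xb1, 0x9b, 0x92, 0xff, 0x7f, 0x00, 0x00,
    0xa0, 0x3d, 0x08, 0x02, 0x01, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00, 0x10, 0x80, 0x02, 0x00
};

/* x 0x100001228 (its metaclass). */
static const uint8_t metaclass_dump[32] = {
    0xf0, 0xb0, 0x9b, 0x92, 0xff, 0x7f, 0x00, 0x00,
    0xf0, 0xb0, 0x9b, 0x92, 0xff, 0x7f, 0x00, 0x00,
    0x60, 0x3d, 0x08, 0x02, 0x01, 0x00, 0x00, 0x00,
    0x03, 0x00, 0x00, 0x00, 0x31, 0xe0, 0x01, 0x00
};
```

The function returns false for class_dump (byte 28 is 0x10) and true for metaclass_dump (byte 28 is 0x31).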

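So does this field hold some of the rw flags? (I won't verify that here; or rather, I don't know how to.)

ISA??

Hold on. The isa extracted with the mask above seems to be identical to the raw value before masking.

Try it a few more times!

The first 8 bytes of classes and metaclasses now appear to hold a plain isa pointer.

Conclusion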

struct cache_t {
    struct bucket_t *_buckets;
    uint32_t _mask;
    uint16_t _flags;  // call it flags for now; the name is a placeholder
    uint16_t _occupied;
};
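As a consistency check, the guessed struct can be placed behind the isa and superclass words and its field offsets compared with the offsets used in the pseudocode (16, 24, 28, 30). This is a sketch with invented type names; pointers are modelled as uint64_t so the check does not depend on the host's pointer size.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Guessed Catalina x86_64 cache_t; pointers are modelled as uint64_t. */
typedef struct {
    uint64_t buckets;
    uint32_t mask;
    uint16_t flags;      /* placeholder name, meaning unconfirmed */
    uint16_t occupied;
} guessed_cache_x86_64;

/* First three members of a class as the dumps suggest: isa, superclass, cache. */
typedef struct {
    uint64_t isa;
    uint64_t superclass;
    guessed_cache_x86_64 cache;
} guessed_class_head;

#define CACHE_OFF offsetof(guessed_class_head, cache)

/* Offsets seen in the pseudocode: a1 + 16, a1 + 24, a1 + 28, a1 + 30. */
static_assert(CACHE_OFF + offsetof(guessed_cache_x86_64, buckets)  == 16, "buckets");
static_assert(CACHE_OFF + offsetof(guessed_cache_x86_64, mask)     == 24, "mask");
static_assert(CACHE_OFF + offsetof(guessed_cache_x86_64, flags)    == 28, "flags");
static_assert(CACHE_OFF + offsetof(guessed_cache_x86_64, occupied) == 30, "occupied");
static_assert(sizeof(guessed_cache_x86_64) == 16, "cache_t keeps its old size");
```

If the layout guess were inconsistent with the offsets in the pseudocode, this file would fail to compile.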

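The new cache_t on arm64

Locating libobjc.A.dylib

This needs a jailbroken iOS 13 device, so I updated and jailbroke the iPad I use for entertainment.

I went straight to where the dylib should be, ready to copy it... and the file is gone. Only a symlink is left.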

iPad:/usr/lib root# ls -al | grep objc
-rwxr-xr-x   1 root wheel   50080 Jul 14 15:37 libobjc-trampolines.dylib*
lrwxr-xr-x   1 root wheel      15 Jul 14 15:37 libobjc.dylib -> libobjc.A.dylib

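Brute-forcing the assembly

With no library file, the only option is to work through the assembly in Xcode. Add a symbolic breakpoint on cache_fill and wait for it to hit.

Assembly listing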
libobjc.A.dylib`cache_fill:
    0x1aa7000b8 <+0>:   stp    x26, x25, [sp, #-0x50]!
    0x1aa7000bc <+4>:   stp    x24, x23, [sp, #0x10]
    0x1aa7000c0 <+8>:   stp    x22, x21, [sp, #0x20]
    0x1aa7000c4 <+12>:  stp    x20, x19, [sp, #0x30]
    0x1aa7000c8 <+16>:  stp    x29, x30, [sp, #0x40]
    0x1aa7000cc <+20>:  add    x29, sp, #0x40            ; =0x40 
    0x1aa7000d0 <+24>:  mov    x22, x3
    0x1aa7000d4 <+28>:  mov    x21, x2
    0x1aa7000d8 <+32>:  mov    x19, x1
    0x1aa7000dc <+36>:  mov    x20, x0
    0x1aa7000e0 <+40>:  ldrb   w9, [x0, #0x1c]
    0x1aa7000e4 <+44>:  mov    x8, x0
    0x1aa7000e8 <+48>:  tbnz   w9, #0x2, 0x1aa7000f4     ; <+60>
    0x1aa7000ec <+52>:  ldr    x8, [x20]
    0x1aa7000f0 <+56>:  and    x8, x8, #0xffffffff8
    0x1aa7000f4 <+60>:  ldr    x8, [x8, #0x20]
    0x1aa7000f8 <+64>:  and    x8, x8, #0x7ffffffffff8
    0x1aa7000fc <+68>:  ldrb   w8, [x8, #0x3]
    0x1aa700100 <+72>:  tbz    w8, #0x5, 0x1aa7001c4     ; <+268>
    0x1aa700104 <+76>:  add    x23, x20, #0x10           ; =0x10 
    0x1aa700108 <+80>:  ldrh   w25, [x20, #0x1e]
    0x1aa70010c <+84>:  ldr    x8, [x20, #0x10]
    0x1aa700110 <+88>:  lsr    x8, x8, #48
    0x1aa700114 <+92>:  cbnz   x8, 0x1aa700120           ; <+104>
    0x1aa700118 <+96>:  mov    w24, #0x0
    0x1aa70011c <+100>: b      0x1aa70012c               ; <+116>
    0x1aa700120 <+104>: ldr    x8, [x23]
    0x1aa700124 <+108>: lsr    x8, x8, #48
    0x1aa700128 <+112>: add    w24, w8, #0x1             ; =0x1 
    0x1aa70012c <+116>: mov    x0, x23
->  0x1aa700130 <+120>: bl     0x1aa70002c               ; cache_t::isConstantEmptyCache()
    0x1aa700134 <+124>: cbnz   w0, 0x1aa7001dc           ; <+292>
    0x1aa700138 <+128>: lsr    w8, w24, #2
    0x1aa70013c <+132>: lsl    w8, w8, #1
    0x1aa700140 <+136>: add    w8, w8, w24, lsr #2
    0x1aa700144 <+140>: cmp    w8, w25
    0x1aa700148 <+144>: b.ls   0x1aa700208               ; <+336>
    0x1aa70014c <+148>: ldr    x8, [x23]
    0x1aa700150 <+152>: and    x8, x8, #0xfffffffffff
    0x1aa700154 <+156>: sub    w9, w24, #0x1             ; =0x1 
    0x1aa700158 <+160>: and    w10, w9, w19
    0x1aa70015c <+164>: mov    x11, x10
    0x1aa700160 <+168>: mov    w12, w11
    0x1aa700164 <+172>: add    x13, x8, w11, uxtw #4
    0x1aa700168 <+176>: add    x11, x13, #0x8            ; =0x8 
    0x1aa70016c <+180>: ldr    x13, [x13, #0x8]
    0x1aa700170 <+184>: cbz    x13, 0x1aa7001a4          ; <+236>
    0x1aa700174 <+188>: ldr    x11, [x11]
    0x1aa700178 <+192>: cmp    x11, x19
    0x1aa70017c <+196>: b.eq   0x1aa7001c4               ; <+268>
    0x1aa700180 <+200>: sub    w11, w12, #0x1            ; =0x1 
    0x1aa700184 <+204>: cmp    w12, #0x0                 ; =0x0 
    0x1aa700188 <+208>: csel   w11, w9, w11, eq
    0x1aa70018c <+212>: cmp    w11, w10
    0x1aa700190 <+216>: b.ne   0x1aa700160               ; <+168>
    0x1aa700194 <+220>: mov    x0, x22
    0x1aa700198 <+224>: mov    x1, x19
    0x1aa70019c <+228>: mov    x2, x20
    0x1aa7001a0 <+232>: bl     0x1aa71e098               ; cache_t::bad_cache(objc_object*, objc_selector*, objc_class*)
    0x1aa7001a4 <+236>: add    x8, x8, x12, lsl #4
    0x1aa7001a8 <+240>: ldrh   w9, [x20, #0x1e]
    0x1aa7001ac <+244>: add    w9, w9, #0x1              ; =0x1 
    0x1aa7001b0 <+248>: strh   w9, [x20, #0x1e]
    0x1aa7001b4 <+252>: eor    x9, x21, x20
    0x1aa7001b8 <+256>: cmp    x21, #0x0                 ; =0x0 
    0x1aa7001bc <+260>: csel   x9, xzr, x9, eq
    0x1aa7001c0 <+264>: stp    x9, x19, [x8]
    0x1aa7001c4 <+268>: ldp    x29, x30, [sp, #0x40]
    0x1aa7001c8 <+272>: ldp    x20, x19, [sp, #0x30]
    0x1aa7001cc <+276>: ldp    x22, x21, [sp, #0x20]
    0x1aa7001d0 <+280>: ldp    x24, x23, [sp, #0x10]
    0x1aa7001d4 <+284>: ldp    x26, x25, [sp], #0x50
    0x1aa7001d8 <+288>: ret    
    0x1aa7001dc <+292>: cmp    w24, #0x0                 ; =0x0 
    0x1aa7001e0 <+296>: orr    w8, wzr, #0x4
    0x1aa7001e4 <+300>: csel   w24, w8, w24, eq
    0x1aa7001e8 <+304>: ldr    xzr, [x20, #0x10]
    0x1aa7001ec <+308>: mov    x0, x24
    0x1aa7001f0 <+312>: bl     0x1aa6fffb8               ; allocateBuckets(unsigned int)
    0x1aa7001f4 <+316>: sub    w8, w24, #0x1             ; =0x1 
    0x1aa7001f8 <+320>: orr    x8, x0, x8, lsl #48
    0x1aa7001fc <+324>: str    x8, [x20, #0x10]
    0x1aa700200 <+328>: strh   wzr, [x20, #0x1e]
    0x1aa700204 <+332>: b      0x1aa70014c               ; <+148>
    0x1aa700208 <+336>: lsl    w8, w24, #1
    0x1aa70020c <+340>: cmp    w24, #0x0                 ; =0x0 
    0x1aa700210 <+344>: orr    w9, wzr, #0x4
    0x1aa700214 <+348>: csel   w8, w9, w8, eq
    0x1aa700218 <+352>: cmp    w8, #0x10, lsl #12        ; =0x10000 
    0x1aa70021c <+356>: orr    w9, wzr, #0x10000
    0x1aa700220 <+360>: csel   w25, w8, w9, lo
    0x1aa700224 <+364>: ldr    x26, [x20, #0x10]
    0x1aa700228 <+368>: mov    x0, x25
    0x1aa70022c <+372>: bl     0x1aa6fffb8               ; allocateBuckets(unsigned int)
    0x1aa700230 <+376>: sub    w8, w25, #0x1             ; =0x1 
    0x1aa700234 <+380>: orr    x8, x0, x8, lsl #48
    0x1aa700238 <+384>: str    x8, [x20, #0x10]
    0x1aa70023c <+388>: strh   wzr, [x20, #0x1e]
    0x1aa700240 <+392>: and    x0, x26, #0xfffffffffff
    0x1aa700244 <+396>: mov    x1, x24
    0x1aa700248 <+400>: bl     0x1aa700254               ; cache_collect_free(bucket_t*, unsigned int)
    0x1aa70024c <+404>: mov    x24, x25
    0x1aa700250 <+408>: b      0x1aa70014c               ; <+148>

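Key information

I'll skip how I located this. For someone who can't read assembly, it took over an hour to find roughly the right place.

Everything here is guesswork; corrections are welcome.

The guesses are written next to the code as comments to make them easier to follow.

// First read x20; it is used below for comparison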
(lldb) register read x20
     x20 = 0x0000000104acd540  (void *)0x0000000104acd518: LGPerson
(lldb) x 0x0000000104acd540
0x104acd540: 18 d5 ac 04 01 00 00 00 58 b2 5d f2 01 00 00 00  ........X.].....
0x104acd550: 90 fa 71 aa 01 00 00 00 00 00 00 00 10 80 00 00  ..q.............


  // register read shows x24 is 4; it is moved into x0 as the argument to allocateBuckets
0x1aa7001ec <+308>: mov    x0, x24

  // Allocate the buckets; the return value in x0 is 0x00000002802ab280 (address of the buckets array)
0x1aa7001f0 <+312>: bl     0x1aa6fffb8               ; allocateBuckets(unsigned int)

  // w24 is 4, so w8 = w24 - 1 = 3
  // Guess: mask = capacity - 1
0x1aa7001f4 <+316>: sub    w8, w24, #0x1             ; =0x1 
  
  // Shift x8 left by 48 bits, OR it with x0, and store the result in x8
  // Result: x8 = 0x00030002802ab280
  // So mask ends up in the top 2 bytes (high 16 bits) of the buckets word
0x1aa7001f8 <+320>: orr    x8, x0, x8, lsl #48

  // Store x8 at offset 16 of the class
  // Result:
  //  (lldb) x 0x0000000104acd540
  //  0x104acd540: 18 d5 ac 04 01 00 00 00 58 b2 5d f2 01 00 00 00  ........X.].....
  //  0x104acd550: 80 b2 2a 80 02 00 03 00 00 00 00 00 10 80 00 00  ..*.............
  // Compared with the earlier dump of x20, buckets and mask are both written at offset 16
0x1aa7001fc <+324>: str    x8, [x20, #0x10]

  // Store wzr (zero) as a halfword at offset 30
  // Result: the memory at x20 is unchanged, since those bytes were already 0
0x1aa700200 <+328>: strh   wzr, [x20, #0x1e]

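The key instruction is: 0x1aa7001f8 <+320>: orr x8, x0, x8, lsl #48

The 2-byte mask is placed in the top 16 bits of the 8-byte word that holds the buckets pointer; occupied is still the 2 bytes at offset 30.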
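What the orr x8, x0, x8, lsl #48 instruction computes can be reproduced in C. This is a sketch of the arithmetic only; pack_mask_and_buckets is a made-up name, and the operands are the register values observed in the debugger session above.

```c
#include <assert.h>
#include <stdint.h>

/* Combine a buckets address and a 16-bit mask the way
   "orr x8, x0, x8, lsl #48" does: mask goes into bits 48..63. */
static uint64_t pack_mask_and_buckets(uint64_t buckets, uint16_t mask) {
    /* The scheme only works if the address leaves the top 16 bits free. */
    assert((buckets >> 48) == 0);
    return buckets | ((uint64_t)mask << 48);
}
```

With buckets = 0x00000002802ab280 and mask = 3 the result is 0x00030002802ab280, which in little-endian memory is the byte sequence 80 b2 2a 80 02 00 03 00 seen at offset 16 of the class.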

Checking against the dump

Take the dump from the original problem and analyze it.

(lldb) x p1.class
0x101001540: 18 15 00 01 01 00 00 00 58 b2 5d f2 01 00 00 00  ........X.].....
0x101001550: 00 83 ba 81 02 00 03 00 00 00 00 00 10 80 02 00  ................

00 83 ba 81 02 00 | 03 00 | 00 00 00 00 10 80 | 02 00

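Hmm... following all that guesswork, this one seems to make sense now too.

Six bytes for buckets?

By the reasoning above, buckets has only 6 bytes left. How can an address fit in 6 bytes?

In the assembly above, both allocation paths jump to 0x1aa70014c. Compared with the old code, my guess is that this is the implementation of bucket_t *bucket = cache->find(key, receiver);

This is from a different run, so the class data has changed since the last dump.

// Dump the class data for reference; the address is currently in x20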
(lldb) x/4gx 0x0000000100385540
0x100385540: 0x0000000100385518 0x00000001f25db258
0x100385550: 0x000700028381d080 0x0000801000000000

  // Load the value x23 points to into x8
  // x23 points at 0x000700028381d080, which is exactly the buckets word of the class
0x1aa70014c <+148>: ldr    x8, [x23]
  // The buckets word is ANDed with 0xfffffffffff
0x1aa700150 <+152>: and    x8, x8, #0xfffffffffff

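0xfffffffffff: when reading buckets, this mask keeps the low 44 bits and clears the rest, which separates the pointer from the mask field.

mask occupies the top 16 bits, and the 4 bits in between are unused.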
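The bit accounting (16 + 4 + 44 = 64) and the split itself can be sketched as follows. BUCKETS_BITS_MASK and the function names are invented for illustration; the constant 0xfffffffffff and the shift by 48 are taken from the and and lsr instructions in the listing.

```c
#include <assert.h>
#include <stdint.h>

/* Constant from "and x8, x8, #0xfffffffffff"; the macro name is hypothetical. */
#define BUCKETS_BITS_MASK 0xfffffffffffULL

static_assert((BUCKETS_BITS_MASK >> 44) == 0, "pointer field is at most 44 bits");

/* Low 44 bits: the buckets address. */
static uint64_t unpack_buckets(uint64_t word) {
    return word & BUCKETS_BITS_MASK;
}

/* High 16 bits: the mask (the listing uses "lsr x8, x8, #48"). */
static uint16_t unpack_mask(uint64_t word) {
    return (uint16_t)(word >> 48);
}

/* Count set bits, used to confirm the width of the pointer field. */
static unsigned count_set_bits(uint64_t v) {
    unsigned n = 0;
    while (v) {
        n += (unsigned)(v & 1);
        v >>= 1;
    }
    return n;
}
```

For the word 0x000700028381d080 from the dump, unpack_buckets gives 0x28381d080 and unpack_mask gives 7; count_set_bits(BUCKETS_BITS_MASK) is 44, which leaves 64 - 16 - 44 = 4 bits unaccounted for.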

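Conclusion

cache_t now stores buckets and mask together in 8 bytes (a union, perhaps?): mask takes 2 bytes and the buckets pointer actually uses 44 bits.

occupied is still 2 bytes at offset 30 (in bytes) from the start of the class.

Summary

Treat this as a curiosity and don't take it as fact; it is all guesswork.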

On macOS Catalina:
    struct cache_t {
        struct bucket_t *_buckets;
        uint32_t _mask;
        uint16_t _flags;  // call it flags for now; the name is a placeholder
        uint16_t _occupied;
    };


On iOS 13:
    buckets and mask share 8 bytes, with mask taking the top 2 of them
    occupied still takes 2 bytes