Exploring the OC Class (Continued)

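In Exploring the OC Class we learned that instance methods live in the class's class_rw_t, class methods live in the class_rw_t of the corresponding metaclass, and ivars live in class_ro_t. The struct names already hint that one is read-write and the other read-only. Why is it designed this way? The WWDC session Advancements in the Objective-C runtime explains the notions of clean memory and dirty memory and how the runtime uses them to optimize memory, which is the reason behind the two data structures class_ro_t and class_rw_t. When a class is loaded from disk its information does not change; it can only change at runtime (for example, when a category is loaded). For that case Apple designed class_ro_t: because it is clean memory, the space it occupies can be evicted and read back from disk when needed, which saves memory. class_rw_t, by contrast, is created at runtime for the class and refers to the clean class_ro_t.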

In [Exploring the OC Class] we used LLDB to read the properties and methods out of the class's class_data_bits_t *bits. Next, let's look at ivars, properties, and their encodings.

Ivars, properties, and type encodings of a Class

@interface LGPerson : NSObject
{
    NSString * hobby;
}
//isa 8 0-7
@property(atomic)NSString * name; //8   8 9 10 11 12 13 14 15
@property(nonatomic, assign)int age; //4   16-19
    
-(void)say;
+(void)play;
@end

(lldb) x/5gx LGPerson.class
0x100008248: 0x0000000100008220 0x000000010036a140
0x100008258: 0x0000000100362380 0x0000802c00000000
0x100008268: 0x00000001013833b4
(lldb) p (class_data_bits_t *)0x100008268
(class_data_bits_t *) $2 = 0x0000000100008268
(lldb) p $2->data()
(class_rw_t *) $3 = 0x00000001013833b0
(lldb) p $3->methods()
(const method_array_t) $4 = {
  list_array_tt<method_t, method_list_t, method_list_t_authed_ptr> = {
     = {
      list = {
        ptr = 0x00000001000080d8
      }
      arrayAndFlag = 4295000280
    }
  }
}
(lldb) p $4.list
(const method_list_t_authed_ptr<method_list_t>) $5 = {
  ptr = 0x00000001000080d8
}
(lldb) p $5.ptr
(method_list_t *const) $6 = 0x00000001000080d8
(lldb) p *$6
(method_list_t) $7 = {
  entsize_list_tt<method_t, method_list_t, 4294901763, method_t::pointer_modifier> = (entsizeAndFlags = 27, count = 6)
}
(lldb) p $7.get(0).big()
(method_t::big) $8 = {
  name = "say"
  types = 0x0000000100003f62 "v16@0:8"
  imp = 0x0000000100003d90 (KCObjcBuild`-[LGPerson say])
}
(lldb) p $7.get(1).big()
(method_t::big) $9 = {
  name = "name"
  types = 0x0000000100003f78 "@16@0:8"
  imp = 0x0000000100003da0 (KCObjcBuild`-[LGPerson name])
}
(lldb) p $7.get(2).big()
(method_t::big) $10 = {
  name = ".cxx_destruct"
  types = 0x0000000100003f62 "v16@0:8"
  imp = 0x0000000100003e30 (KCObjcBuild`-[LGPerson .cxx_destruct])
}
(lldb) p $7.get(3).big()
(method_t::big) $11 = {
  name = "setName:"
  types = 0x0000000100003f80 "v24@0:8@16"
  imp = 0x0000000100003dc0 (KCObjcBuild`-[LGPerson setName:])
}
(lldb) p $7.get(4).big()
(method_t::big) $12 = {
  name = "age"
  types = 0x0000000100003f8b "i16@0:8"
  imp = 0x0000000100003df0 (KCObjcBuild`-[LGPerson age])
}
(lldb) p $7.get(5).big()
(method_t::big) $13 = {
  name = "setAge:"
  types = 0x0000000100003f93 "v20@0:8i16"
  imp = 0x0000000100003e10 (KCObjcBuild`-[LGPerson setAge:])
}

Method Encode String

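The LLDB output shows that each method is exposed as a big struct containing name, types, and imp. In types = 0x0000000100003f93 "v20@0:8i16", the types string contains some special symbols. What do they mean?

The meaning of each symbol can be found in Type Encodings.

Take v20@0:8i16 as an example: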

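  1. v -> void: the return type is void
  2. 20 -> all arguments together occupy 20 bytes
  3. @ -> object type (here, self of type id)
  4. 0 -> starts at byte offset 0
  5. : -> SEL
  6. 8 -> starts at byte offset 8
  7. i -> int
  8. 16 -> starts at byte offset 16

Put together: this is the type of the setAge: method, which returns void and takes 3 arguments occupying 20 bytes in total. The first argument is an object, namely self, starting at byte 0 (id is typedef struct objc_object *id, an 8-byte pointer). The second is the SEL, starting at byte 8 (SEL is typedef struct objc_selector *SEL, also an 8-byte pointer). The third is an int (4 bytes), starting at byte 16. 8 + 8 + 4 = 20 bytes, which matches exactly.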
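The breakdown above can be reproduced mechanically. The following is a minimal sketch in plain C (which also compiles as Objective-C) that splits a simple encoding such as v20@0:8i16 into its return type, frame size, and (type, offset) pairs. The names encoded_arg_t, encoded_method_t, and parse_method_encoding are hypothetical helpers, not runtime API, and the parser assumes single-character type codes only (no structs, pointers, or qualifiers).

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper types for illustration; not part of the ObjC runtime. */
typedef struct {
    char type;   /* single-character type code, e.g. '@', ':', 'i' */
    int  offset; /* byte offset of the argument in the frame */
} encoded_arg_t;

typedef struct {
    char return_type;       /* e.g. 'v' for void */
    int  frame_size;        /* total bytes occupied by all arguments */
    encoded_arg_t args[8];
    size_t arg_count;
} encoded_method_t;

/* Reads a run of decimal digits and advances the cursor past them. */
static int read_number(const char **cursor) {
    int value = 0;
    while (isdigit((unsigned char)**cursor)) {
        value = value * 10 + (**cursor - '0');
        (*cursor)++;
    }
    return value;
}

/* Parses encodings made only of single-character type codes.
 * Returns 0 on success, -1 on empty input or more than 8 arguments. */
int parse_method_encoding(const char *enc, encoded_method_t *out) {
    if (enc == NULL || *enc == '\0') return -1;
    const char *p = enc;
    out->return_type = *p++;
    out->frame_size = read_number(&p);
    out->arg_count = 0;
    while (*p != '\0') {
        if (out->arg_count == 8) return -1;
        out->args[out->arg_count].type = *p++;
        out->args[out->arg_count].offset = read_number(&p);
        out->arg_count++;
    }
    return 0;
}
```

Calling parse_method_encoding("v20@0:8i16", &m) fills m with a void return type, a 20-byte frame, and three arguments at offsets 0, 8, and 16, matching the list above.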

Property Encode String

Likewise, LLDB shows that each property is a property_t struct: struct property_t { const char *name; const char *attributes; };

(lldb) p/x LGPerson.class
(Class) $1 = 0x0000000100008258 LGPerson
(lldb) p (class_data_bits_t *)0x0000000100008278
(class_data_bits_t *) $2 = 0x0000000100008278
(lldb) p $2->data()
(class_rw_t *) $3 = 0x0000000100723020
(lldb) p $3->ro()
(const class_ro_t *) $4 = 0x00000001000080a0
(lldb) p *$4
(const class_ro_t) $5 = {
  flags = 388
  instanceStart = 8
  instanceSize = 32
  reserved = 0
   = {
    ivarLayout = 0x0000000100003f2b "\x01\x11"
    nonMetaclass = 0x0000000100003f2b
  }
  name = {
    std::__1::atomic<const char *> = "LGPerson" {
      Value = 0x0000000100003f22 "LGPerson"
    }
  }
  baseMethodList = 0x00000001000080e8
  baseProtocols = 0x0000000000000000
  ivars = 0x0000000100008180
  weakIvarLayout = 0x0000000000000000
  baseProperties = 0x00000001000081e8
  _swiftMetadataInitializer_NEVER_USE = {}
}
(lldb) p $5.baseProperties
(property_list_t *const) $6 = 0x00000001000081e8
(lldb) p *$6
(property_list_t) $7 = {
  entsize_list_tt<property_t, property_list_t, 0, PointerModifierNop> = (entsizeAndFlags = 16, count = 2)
}
(lldb) p $7.get(0)
(property_t) $8 = (name = "name", attributes = "T@\"NSString\",&,V_name")
(lldb) p $7.get(1)
(property_t) $9 = (name = "age", attributes = "Ti,N,V_age")

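It contains name and attributes. In (property_t) $8 = (name = "name", attributes = "T@\"NSString\",&,V_name"), the attributes string contains some special symbols. What do they mean?

They are documented in Declared Properties. The string starts with T, which introduces the type encoding, and ends with V, which introduces the name of the backing ivar. @ again means an object type, "NSString" is the class NSString, & means retain, that is, a strong reference, and _name is the ivar generated underneath for the name property. In the string for age, N means nonatomic. Overall, the property list contains two properties. The first is named name: it is atomic, of type NSString, backed by the ivar _name, and strongly referenced. The second is named age: it is nonatomic, of type int, backed by the ivar _age, and not a strong reference.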
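To make the attribute codes concrete, here is a minimal sketch in plain C that splits an attribute string on commas and records the codes discussed above (T, V, &, N, plus C for copy, which Declared Properties also documents). property_info_t and parse_property_attributes are hypothetical names for illustration, and the sketch assumes the type encoding itself contains no comma.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical summary struct for illustration; not runtime API. */
typedef struct {
    char type[64];      /* text after 'T', e.g. @"NSString" or i */
    char ivar_name[64]; /* text after 'V', e.g. _name */
    bool is_retain;     /* '&' present */
    bool is_copy;       /* 'C' present */
    bool is_nonatomic;  /* 'N' present */
} property_info_t;

/* Splits the attribute string on commas and records each recognized code.
 * Assumes the type encoding itself contains no comma. */
void parse_property_attributes(const char *attrs, property_info_t *out) {
    memset(out, 0, sizeof(*out));
    const char *p = attrs;
    while (*p != '\0') {
        size_t len = strcspn(p, ",");   /* length of the current field */
        switch (p[0]) {
        case 'T':
            snprintf(out->type, sizeof(out->type), "%.*s", (int)(len - 1), p + 1);
            break;
        case 'V':
            snprintf(out->ivar_name, sizeof(out->ivar_name), "%.*s", (int)(len - 1), p + 1);
            break;
        case '&': out->is_retain = true; break;
        case 'C': out->is_copy = true; break;
        case 'N': out->is_nonatomic = true; break;
        default: break;                 /* other codes are ignored in this sketch */
        }
        p += len;
        if (*p == ',') p++;             /* skip the separator */
    }
}
```

Feeding it the two strings from the LLDB output yields type @"NSString", ivar _name, retain for the first property, and type i, ivar _age, nonatomic for the second.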

Member variables / instance variables

@interface LGPerson : NSObject
{
    NSString * hobby;
    int age;
}

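Here hobby and age are member variables. Strictly speaking, hobby is an instance variable, because its type is an object type rather than a primitive type. The encoded strings for member variables and instance variables look much like those for methods.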

(ivar_t) $12 = {
  offset = 0x0000000100008218
  name = 0x0000000100003f33 "hobby"
  type = 0x0000000100003f78 "@\"NSString\""
  alignment_raw = 3
  size = 8
}
(lldb) p $11.get(1)
(ivar_t) $13 = {
  offset = 0x0000000100008220
  name = 0x0000000100003f39 "_age"
  type = 0x0000000100003f84 "i"
  alignment_raw = 2
  size = 4
}
(lldb) p $11.get(2)
(ivar_t) $14 = {
  offset = 0x0000000100008228
  name = 0x0000000100003f3e "_name"
  type = 0x0000000100003f78 "@\"NSString\""
  alignment_raw = 3
  size = 8
}

struct ivar_t {
#if __x86_64__
    // *offset was originally 64-bit on some x86_64 platforms.
    // We read and write only 32 bits of it.
    // Some metadata provides all 64 bits. This is harmless for unsigned 
    // little-endian values.
    // Some code uses all 64 bits. class_addIvar() over-allocates the 
    // offset for their benefit.
#endif
    int32_t *offset;
    const char *name;
    const char *type;
    // alignment is sometimes -1; use alignment() instead
    uint32_t alignment_raw;
    uint32_t size;

    uint32_t alignment() const {
        if (alignment_raw == ~(uint32_t)0) return 1U << WORD_SHIFT;
        return 1 << alignment_raw;
    }
};

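ivar_t contains offset, name, type, alignment_raw, and size, whose names describe their roles. The type strings are encoded the same way as for the properties and methods above, so they are not repeated here.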
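One field that is not self-explanatory is alignment_raw, which stores the base-2 logarithm of the alignment rather than the alignment itself. A minimal sketch of ivar_t::alignment() as a free C function follows; SKETCH_WORD_SHIFT is a stand-in that assumes WORD_SHIFT is 3 on 64-bit targets, and ivar_alignment is a hypothetical name.

```c
#include <stdint.h>

/* Assumption: WORD_SHIFT is 3 on 64-bit targets (8-byte words). */
#define SKETCH_WORD_SHIFT 3u

/* Mirrors ivar_t::alignment() from the listing above as a free function:
 * an all-ones raw value means "word aligned", otherwise 2^alignment_raw. */
uint32_t ivar_alignment(uint32_t alignment_raw) {
    if (alignment_raw == ~(uint32_t)0) return 1u << SKETCH_WORD_SHIFT;
    return 1u << alignment_raw;
}
```

Under that assumption, the LLDB values above decode as follows: hobby and _name have alignment_raw = 3, an 8-byte alignment, and _age has alignment_raw = 2, a 4-byte alignment, which agrees with their size fields.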

How property attributes affect the class

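In The Nature of an Object's isa, clang showed us that an object is an objc_object struct underneath, and that the generated getter and setter reach the property's storage by shifting the self pointer in memory and then reading or writing through it. Let's analyze this further.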

@interface LGPerson : NSObject
{
    // STRING   int  double  float char bool
    NSString *hobby; // string
    int a;
    NSObject *objc;  // struct
}

@property (nonatomic, copy) NSString *nickName;
@property (atomic, copy) NSString *acnickName;
@property (nonatomic) NSString *nnickName;
@property (atomic) NSString *anickName;

@property (nonatomic, strong) NSString *name;
@property (atomic, strong) NSString *aname;

@end

#ifndef _REWRITER_typedef_LGPerson
#define _REWRITER_typedef_LGPerson
typedef struct objc_object LGPerson;
typedef struct {} _objc_exc_LGPerson;
#endif

extern "C" unsigned long OBJC_IVAR_$_LGPerson$_nickName;
extern "C" unsigned long OBJC_IVAR_$_LGPerson$_nnickName;
extern "C" unsigned long OBJC_IVAR_$_LGPerson$_anickName;
extern "C" unsigned long OBJC_IVAR_$_LGPerson$_name;
extern "C" unsigned long OBJC_IVAR_$_LGPerson$_aname;
struct LGPerson_IMPL {
	struct NSObject_IMPL NSObject_IVARS;
	NSString *hobby;
	int a;
	NSObject *objc;
	NSString *_nickName;
	NSString *_acnickName;
	NSString *_nnickName;
	NSString *_anickName;
	NSString *_name;
	NSString *_aname;
};

static NSString * _I_LGPerson_nickName(LGPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_nickName)); }
extern "C" __declspec(dllimport) void objc_setProperty (id, SEL, long, id, bool, bool);

static void _I_LGPerson_setNickName_(LGPerson * self, SEL _cmd, NSString *nickName) { objc_setProperty (self, _cmd, __OFFSETOFIVAR__(struct LGPerson, _nickName), (id)nickName, 0, 1); }

extern "C" __declspec(dllimport) id objc_getProperty(id, SEL, long, bool);

static NSString * _I_LGPerson_acnickName(LGPerson * self, SEL _cmd) { typedef NSString * _TYPE;
return (_TYPE)objc_getProperty(self, _cmd, __OFFSETOFIVAR__(struct LGPerson, _acnickName), 1); }
static void _I_LGPerson_setAcnickName_(LGPerson * self, SEL _cmd, NSString *acnickName) { objc_setProperty (self, _cmd, __OFFSETOFIVAR__(struct LGPerson, _acnickName), (id)acnickName, 1, 1); }

static NSString * _I_LGPerson_nnickName(LGPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_nnickName)); }
static void _I_LGPerson_setNnickName_(LGPerson * self, SEL _cmd, NSString *nnickName) { (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_nnickName)) = nnickName; }

static NSString * _I_LGPerson_anickName(LGPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_anickName)); }
static void _I_LGPerson_setAnickName_(LGPerson * self, SEL _cmd, NSString *anickName) { (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_anickName)) = anickName; }

static NSString * _I_LGPerson_name(LGPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_name)); }
static void _I_LGPerson_setName_(LGPerson * self, SEL _cmd, NSString *name) { (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_name)) = name; }

static NSString * _I_LGPerson_aname(LGPerson * self, SEL _cmd) { return (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_aname)); }
static void _I_LGPerson_setAname_(LGPerson * self, SEL _cmd, NSString *aname) { (*(NSString **)((char *)self + OBJC_IVAR_$_LGPerson$_aname)) = aname; }

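Rewriting the OC file into C++ with clang (clang -rewrite-objc xx.m -o xx.cpp) makes it clear that LGPerson is an objc_object, that each property produces a corresponding underscore-prefixed member, and that matching getter and setter functions are generated.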

One difference stands out: some getters and setters read and write by shifting the self pointer in memory, while others go through the objc_setProperty and objc_getProperty functions.
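The pointer-shifting variant can be illustrated without the runtime. Below is a minimal sketch in plain C: struct person_impl is a hypothetical stand-in for LGPerson_IMPL (with C strings instead of NSString objects), and kNickNameOffset plays the role of OBJC_IVAR_$_LGPerson$_nickName. It shows the technique only, not the code clang emits.

```c
#include <stddef.h>

/* Hypothetical stand-in for the rewritten LGPerson_IMPL layout; strings are
 * plain C strings here instead of NSString objects. */
struct person_impl {
    void *isa;
    const char *hobby;
    int a;
    const char *nick_name;
};

/* Plays the role of OBJC_IVAR_$_LGPerson$_nickName: the ivar's byte offset. */
static const size_t kNickNameOffset = offsetof(struct person_impl, nick_name);

/* Getter: shift the self pointer by the ivar offset, then load. */
const char *person_nick_name(void *self) {
    return *(const char **)((char *)self + kNickNameOffset);
}

/* Setter: shift the self pointer by the ivar offset, then store. */
void person_set_nick_name(void *self, const char *value) {
    *(const char **)((char *)self + kNickNameOffset) = value;
}
```

Both accessors know nothing about the struct except one offset, which is why the rewritten getters and setters only need self and an OBJC_IVAR_$_ constant.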

Why?

We know that when the OC runtime calls a method it uses the SEL to find the IMP. Here there is an extra layer, objc_setProperty and objc_getProperty, which suggests that these functions bind the SEL to the real IMP internally.

extern "C" __declspec(dllimport) void objc_setProperty (id, SEL, long, id, bool, bool);
extern "C" __declspec(dllimport) id objc_getProperty(id, SEL, long, bool);

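The declarations suggest as much: the parameters include self and the SEL. But we never saw any runtime code that remaps the SEL to an IMP, so the mapping must be set up at compile time. So we open the LLVM source and search for objc_setProperty.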

In getSetPropertyFn we can see objc_setProperty being created. Next, look for where getSetPropertyFn is called.

 PropertyImplStrategy strategy(CGM, propImpl);
  switch (strategy.getKind()) {
  case PropertyImplStrategy::Native: {
    // We don't need to do anything for a zero-size struct.
    if (strategy.getIvarSize().isZero())
      return;

    Address argAddr = GetAddrOfLocalVar(*setterMethod->param_begin());

    LValue ivarLValue =
      EmitLValueForIvar(TypeOfSelfObject(), LoadObjCSelf(), ivar, /*quals*/ 0);
    Address ivarAddr = ivarLValue.getAddress(*this);

    // Currently, all atomic accesses have to be through integer
    // types, so there's no point in trying to pick a prettier type.
    llvm::Type *bitcastType =
      llvm::Type::getIntNTy(getLLVMContext(),
                            getContext().toBits(strategy.getIvarSize()));

    // Cast both arguments to the chosen operation type.
    argAddr = Builder.CreateElementBitCast(argAddr, bitcastType);
    ivarAddr = Builder.CreateElementBitCast(ivarAddr, bitcastType);

    // This bitcast load is likely to cause some nasty IR.
    llvm::Value *load = Builder.CreateLoad(argAddr);

    // Perform an atomic store.  There are no memory ordering requirements.
    llvm::StoreInst *store = Builder.CreateStore(load, ivarAddr);
    store->setAtomic(llvm::AtomicOrdering::Unordered);
    return;
  }

In the case PropertyImplStrategy::Native branch, the code has classImpl and propImpl available, and through

ObjCIvarDecl *ivar = propImpl->getPropertyIvarDecl();
ObjCMethodDecl *setterMethod = propImpl->getSetterMethodDecl();

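it obtains ivar and setterMethod, then reaches the address and value of the variable by offsetting. The cases PropertyImplStrategy::GetSetProperty and PropertyImplStrategy::SetPropertyAndExpressionGet, on the other hand, are remapped to objc_setProperty. So when is the kind of a PropertyImplStrategy decided? That leads to the PropertyImplStrategy constructor.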

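It turns out that the IMP is remapped when the property is declared copy. Searching for objc_getProperty in the same way shows that for a copy property that is not atomic, the getter does not need to be remapped.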
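The rule observed here can be written down as two small predicates. This is a simplified model in plain C of what this article observes, not clang's actual PropertyImplStrategy: property_attrs_t and both function names are hypothetical, and retain, weak, and non-object types are deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical model of the rule observed in this article; it is not clang's
 * PropertyImplStrategy and ignores retain, weak, and non-object types. */
typedef struct {
    bool is_copy;
    bool is_atomic;
} property_attrs_t;

/* Setter goes through objc_setProperty whenever the property is copy. */
bool setter_uses_objc_setProperty(property_attrs_t attrs) {
    return attrs.is_copy;
}

/* Getter goes through objc_getProperty only for atomic copy properties. */
bool getter_uses_objc_getProperty(property_attrs_t attrs) {
    return attrs.is_copy && attrs.is_atomic;
}
```

Checked against the rewritten output earlier: nickName (nonatomic, copy) uses objc_setProperty but a direct getter, acnickName (atomic, copy) uses both functions, and nnickName and anickName use neither.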

Where class methods are stored

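In [Exploring the OC Class] LLDB showed that instance methods are stored in the class and class methods in the corresponding metaclass. Let's print the same thing through the API: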

@interface LGPerson : NSObject
{
    NSObject *objc; 
    NSString *nickName;
}
@property (nonatomic, copy) NSString *name;
@property (nonatomic, strong) NSObject *obj;


- (void)sayHello;
+ (void)sayHappy;

@end

void lgInstanceMethod_classToMetaclass(Class pClass){
    
    const char *className = class_getName(pClass);
    Class metaClass = objc_getMetaClass(className);
    
    Method method1 = class_getInstanceMethod(pClass, @selector(sayHello));
    Method method2 = class_getInstanceMethod(metaClass, @selector(sayHello));

    Method method3 = class_getInstanceMethod(pClass, @selector(sayHappy));
    Method method4 = class_getInstanceMethod(metaClass, @selector(sayHappy));
    
    LGLog(@"%s - %p-%p-%p-%p",__func__,method1,method2,method3,method4);
}

In the output, looking up the instance method sayHello on metaClass returns nil, and looking up sayHappy as an instance method on pClass returns nil. Everything is as expected.

void lgClassMethod_classToMetaclass(Class pClass){
    
    const char *className = class_getName(pClass);
    Class metaClass = objc_getMetaClass(className);
    
    Method method1 = class_getClassMethod(pClass, @selector(sayHello));
    Method method2 = class_getClassMethod(metaClass, @selector(sayHello));

//    - (void)sayHello;
//    + (void)sayHappy;
    Method method3 = class_getClassMethod(pClass, @selector(sayHappy));
    Method method4 = class_getClassMethod(metaClass, @selector(sayHappy));
    
    LGLog(@"%s-%p-%p-%p-%p",__func__,method1,method2,method3,method4);
}

Next, run the same check with class_getClassMethod, using the function above:

Surprisingly, asking the metaclass for the class method also returns a result. Look at the source:

Method class_getClassMethod(Class cls, SEL sel)
{
    if (!cls  ||  !sel) return nil;

    return class_getInstanceMethod(cls->getMeta(), sel);
}

Class getMeta() {
        if (isMetaClassMaybeUnrealized()) return (Class)this;
        else return this->ISA();
    }

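At the bottom there is really no such thing as a class method: every method is an instance method, or more precisely a member function of a struct. When a class method is requested from a metaclass, getMeta() checks whether the receiver is already a metaclass; if it is, it returns itself, so the lookup is simply an instance-method lookup on that metaclass.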
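The same behaviour can be reproduced with a toy model. The sketch below is plain C with hypothetical names (mini_class_t and the mini_ functions); it keeps only an isa pointer, a metaclass flag, and a short method-name list, and it ignores superclass lookup, which the real class_getInstanceMethod performs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical miniature class model; the real objc_class is far richer. */
typedef struct mini_class {
    struct mini_class *isa;   /* class -> metaclass */
    bool is_meta;
    const char *methods[4];   /* NULL-terminated selector names on this class */
} mini_class_t;

/* Same shape as getMeta(): a metaclass is its own "meta". */
static mini_class_t *mini_get_meta(mini_class_t *cls) {
    return cls->is_meta ? cls : cls->isa;
}

/* Instance-method lookup: searches only the given class's own list. */
const char *mini_get_instance_method(mini_class_t *cls, const char *sel) {
    if (cls == NULL || sel == NULL) return NULL;
    for (size_t i = 0; i < 4 && cls->methods[i] != NULL; i++) {
        if (strcmp(cls->methods[i], sel) == 0) return cls->methods[i];
    }
    return NULL;
}

/* Class-method lookup is instance-method lookup on the metaclass. */
const char *mini_get_class_method(mini_class_t *cls, const char *sel) {
    if (cls == NULL || sel == NULL) return NULL;
    return mini_get_instance_method(mini_get_meta(cls), sel);
}
```

With sayHello stored on the class and sayHappy on its metaclass, mini_get_class_method finds sayHappy whether it is given the class or the metaclass, which is the result that looked surprising above.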

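Next we test by printing each method's IMP. Things get stranger: every lookup returns an IMP pointer, and the IMP printed for sayHello on the metaclass is the same as the one printed for sayHappy. Intuitively both of these IMPs should be nil, yet both are returned and they are identical.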

IMP class_getMethodImplementation(Class cls, SEL sel)
{
    IMP imp;

    if (!cls  ||  !sel) return nil;

    lockdebug_assert_no_locks_locked_except({ &loadMethodLock });

    imp = lookUpImpOrNilTryCache(nil, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER);

    // Translate forwarding function to C-callable external version
    if (!imp) {
        return _objc_msgForward;
    }

    return imp;
}

The source explains it: when no IMP is found, _objc_msgForward is returned in every case, which is why the returned IMPs are identical.
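A toy version of this fallback shows why two failed lookups compare equal. This is a minimal sketch in plain C; mini_msg_forward stands in for _objc_msgForward, and every name in it is hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef void (*mini_imp_t)(void);

/* Hypothetical stand-ins: two real implementations and one shared forwarder.
 * The counters only give each function a distinct body. */
static int hello_calls, happy_calls, forward_calls;
void mini_say_hello(void)   { hello_calls++; }
void mini_say_happy(void)   { happy_calls++; }
void mini_msg_forward(void) { forward_calls++; }

typedef struct {
    const char *sel;
    mini_imp_t imp;
} mini_method_t;

/* Plain lookup: returns NULL when the selector is not in the table. */
static mini_imp_t mini_lookup(const mini_method_t *table, size_t count,
                              const char *sel) {
    for (size_t i = 0; i < count; i++) {
        if (strcmp(table[i].sel, sel) == 0) return table[i].imp;
    }
    return NULL;
}

/* Same shape as class_getMethodImplementation: a miss is translated into
 * the shared forwarding function instead of NULL. */
mini_imp_t mini_get_method_implementation(const mini_method_t *table,
                                          size_t count, const char *sel) {
    if (table == NULL || sel == NULL) return NULL;
    mini_imp_t imp = mini_lookup(table, count, sel);
    return imp != NULL ? imp : mini_msg_forward;
}
```

Looking up sayHappy in a table that only holds sayHello, and sayHello in a table that only holds sayHappy, both return mini_msg_forward, so the two pointers are equal even though neither method was found.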

Summary:

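  1. Apple optimized memory in the OC runtime with the clean memory / dirty memory design, which gave rise to the two data types class_ro_t and class_rw_t. When a class is loaded from disk, it is stored as clean memory in class_ro_t. At runtime (categories may be added, the class may be modified) a dirty-memory class_rw_t is created. The clean memory can be evicted and loaded from disk again when needed, which saves memory.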

  2. By printing the methods, properties, and ivars from the class's memory, we learned how method and property types are encoded.

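  3. Analyzing OC objects further, by rewriting the OC file with clang and reading the LLVM source, we learned how property attributes affect the getter and setter. Whether the setter goes through objc_setProperty depends on the attributes: a copy property's setter goes through objc_setProperty; otherwise it assigns by shifting the self pointer.

Whether the getter goes through objc_getProperty also depends on the attributes: when a property is both atomic and copy, the getter goes through objc_getProperty; otherwise it reads by shifting the self pointer. The combination of retain and atomic likewise determines whether objc_getProperty is used.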

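  4. Printing the class's method storage through the API shows that at the bottom all methods become functions; there is no real distinction between instance methods and class methods. When a class method is requested from a metaclass, the instance method is returned directly, and when no IMP is found, _objc_msgForward is returned as the marker for message forwarding.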