From library files to loading classes into memory


Preface

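The previous articles covered how dyld handles the underlying image files: it maps them into the process. At that point, though, they are still just libraries with names. Their contents have not yet been turned into runtime data structures in memory, and until that happens the methods, protocols, member variables and so on inside them cannot be read. We also know these image files are in Mach-O format. So how does the system load these library files into memory? The most natural approach is to read the Mach-O data into memory and store it in table-like structures, each table holding the corresponding classes, and then use rw and ro to fill in the contents of each class.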

Initialization preparation


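environ_init(): reads the environment variables that affect the runtime. It can also print help for them if requested.

tls_init(): binds the thread-local storage keys, for example the destructors for per-thread data.

static_init(): runs C++ static constructors. libc calls _objc_init() before dyld would call our static constructors, so libobjc runs its own static constructors here instead of waiting for dyld.

lock_init(): has an empty body; the locks rely on C++ features for their setup.

exception_init(): initializes libobjc's exception handling system.

runtime_init(): initializes the runtime environment, mainly the unattachedCategories and allocatedClasses tables.

cache_init(): initializes the caching conditions.

_imp_implementationWithBlock_init(): starts the callback mechanism. Usually this does nothing because everything is initialized lazily, but for certain processes the trampolines dylib is loaded eagerly.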
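To make the static_init() step more concrete, here is a minimal sketch of "run our own initializers instead of waiting for the loader". It is not the runtime's code: the Initializer type, the kInitializers table and the two init functions are hypothetical stand-ins for the initializer list that the real runtime reads from its own image.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical stand-in for an initializer function stored in the image.
using Initializer = void (*)();

// Records the order in which the sketch's initializers ran.
static std::vector<std::string> g_initLog;

static void initLocks()  { g_initLog.push_back("locks"); }
static void initTables() { g_initLog.push_back("tables"); }

// Hypothetical table standing in for the image's initializer list.
static const Initializer kInitializers[] = { initLocks, initTables };

// Run every initializer ourselves, in table order, rather than relying on
// the dynamic loader to call them at some later point.
static std::size_t runOwnStaticInitializers() {
    const std::size_t count = sizeof(kInitializers) / sizeof(kInitializers[0]);
    for (std::size_t i = 0; i < count; i++) {
        assert(kInitializers[i] != nullptr);
        kInitializers[i]();
    }
    return count;
}
```

The point of the design is ordering: anything the rest of the setup depends on is guaranteed to be constructed before the later init steps run.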

map_images

That was just the groundwork. Now for the main part.

_dyld_objc_notify_register(&map_images, load_images, unmap_image);

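The first argument is written as &map_images, so dyld receives the address of the function. For a C function, &map_images and map_images evaluate to the same function pointer, so what gets stored is just that pointer: dyld calls back through it, and whatever map_images does at call time is what runs.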
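The registration pattern can be sketched in a few lines. Everything here is hypothetical (Registry, MappedCallback, onMapped); the real dyld callbacks have different signatures. The sketch only shows that registering stores a function pointer, and that taking the address explicitly makes no difference.

```cpp
#include <cassert>

// Hypothetical callback type; the real callbacks receive image lists.
using MappedCallback = int (*)(int imageCount);

// Hypothetical registry that remembers one callback.
struct Registry {
    MappedCallback mapped = nullptr;
};

static int onMapped(int imageCount) { return imageCount * 2; }

// Stores the function pointer; only the address is copied.
static void registerCallback(Registry &r, MappedCallback cb) {
    assert(cb != nullptr);
    r.mapped = cb;
}

// `&onMapped` and `onMapped` denote the same function pointer.
static bool sameAddress() {
    MappedCallback a = &onMapped;
    MappedCallback b = onMapped;
    return a == b;
}
```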

_read_images

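From map_images we step into map_images_nolock, with the goal of seeing how the image files get added to the tables. Inside map_images_nolock, as its name suggests, _read_images is the key function that does the reading.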

Overall flow


doneOnce

UnfixedSelectors

Selector fixup

static size_t UnfixedSelectors;
    {
        mutex_locker_t lock(selLock);
        for (EACH_HEADER) {
            if (hi->hasPreoptimizedSelectors()) continue;

            bool isBundle = hi->isBundle();
            SEL *sels = _getObjc2SelectorRefs(hi, &count);
            UnfixedSelectors += count;
            for (i = 0; i < count; i++) {
                const char *name = sel_cname(sels[i]);
                SEL sel = sel_registerNameNoLock(name, isBundle);
                if (sels[i] != sel) {
                    sels[i] = sel;
                }
            }
        }
    }

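Here the selector references are fixed up and rebound. The selector addresses recorded in the Mach-O image are not authoritative: they are fixed at compile time and shift once the image is loaded, so each reference is replaced with the unique selector registered at runtime. For images whose selectors dyld has already preoptimized, the dyld result is used and this step is skipped.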
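The idea behind the loop above can be sketched as plain string interning. This is a simplified illustration, not the runtime's implementation: Sel, selectorTable, registerName and fixupSelectorRefs are hypothetical names, and the real table and locking are more involved.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>

// Hypothetical stand-in for SEL: a pointer to a uniqued C string.
using Sel = const char *;

// Process-wide table of uniqued selector names (sketch only).
static std::unordered_set<std::string> &selectorTable() {
    static std::unordered_set<std::string> table;
    return table;
}

// Returns the canonical pointer for `name`, registering it on first use.
// Elements of std::unordered_set keep their address across rehashing.
static Sel registerName(const char *name) {
    assert(name != nullptr);
    auto result = selectorTable().insert(name);
    return result.first->c_str();
}

// Rewrites every reference so that equal names share one canonical address.
// Returns how many references had to be changed.
static std::size_t fixupSelectorRefs(Sel *refs, std::size_t count) {
    std::size_t changed = 0;
    for (std::size_t i = 0; i < count; i++) {
        Sel canonical = registerName(refs[i]);
        if (refs[i] != canonical) {
            refs[i] = canonical;
            changed++;
        }
    }
    return changed;
}
```

Once every reference points at the canonical copy, two selectors can be compared by pointer instead of by string content.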

readClass

for (EACH_HEADER) {
        if (! mustReadClasses(hi, hasDyldRoots)) {
            // Image is sufficiently optimized that we need not call readClass()
            continue;
        }

        classref_t const *classlist = _getObjc2ClassList(hi, &count);

        bool headerIsBundle = hi->isBundle();
        bool headerIsPreoptimized = hi->hasPreoptimizedClasses();

        for (i = 0; i < count; i++) {
            Class cls = (Class)classlist[i];
            Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);

            if (newCls != cls  &&  newCls) {
                // Class was moved but not deleted. Currently this occurs 
                // only when the new class resolved a future class.
                // Non-lazily realize the class below.
                resolvedFutureClasses = (Class *)
                    realloc(resolvedFutureClasses, 
                            (resolvedFutureClassCount+1) * sizeof(Class));
                resolvedFutureClasses[resolvedFutureClassCount++] = newCls;
            }
        }
    }

Here we can see the call to readClass.

Class readClass(Class cls, bool headerIsBundle, bool headerIsPreoptimized)
{
    const char *mangledName = cls->nonlazyMangledName();
    
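    const char *NXPersonName = "NXPerson";

    // Debug hook added for this study: match only our own class.
    // mangledName can be null (see the check further down), so test it first.
    if (mangledName && strcmp(mangledName, NXPersonName) == 0) {
        // An ordinary hand-written class: how does it get loaded?
        printf("%s -NX: class under study: - %s\n",__func__,mangledName);
    }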
    
    if (missingWeakSuperclass(cls)) {
        // No superclass (probably weak-linked). 
        // Disavow any knowledge of this subclass.
        if (PrintConnecting) {
            _objc_inform("CLASS: IGNORING class '%s' with "
                         "missing weak-linked superclass", 
                         cls->nameForLogging());
        }
        addRemappedClass(cls, nil);
        cls->setSuperclass(nil);
        return nil;
    }
    
    cls->fixupBackwardDeployingStableSwift();

    Class replacing = nil;
    if (mangledName != nullptr) {
        if (Class newCls = popFutureNamedClass(mangledName)) {
            // This name was previously allocated as a future class.
            // Copy objc_class to future class's struct.
            // Preserve future's rw data block.

            if (newCls->isAnySwift()) {
                _objc_fatal("Can't complete future class request for '%s' "
                            "because the real class is too big.",
                            cls->nameForLogging());
            }

            class_rw_t *rw = newCls->data();
            const class_ro_t *old_ro = rw->ro();
            memcpy(newCls, cls, sizeof(objc_class));

            // Manually set address-discriminated ptrauthed fields
            // so that newCls gets the correct signatures.
            newCls->setSuperclass(cls->getSuperclass());
            newCls->initIsa(cls->getIsa());

            rw->set_ro((class_ro_t *)newCls->data());
            newCls->setData(rw);
            freeIfMutable((char *)old_ro->getName());
            free((void *)old_ro);

            addRemappedClass(cls, newCls);

            replacing = cls;
            cls = newCls;
        }
    }
    
    if (headerIsPreoptimized  &&  !replacing) {
        // class list built in shared cache
        // fixme strict assert doesn't work because of duplicates
        // ASSERT(cls == getClass(name));
        ASSERT(mangledName == nullptr || getClassExceptSomeSwift(mangledName));
    } else {
        if (mangledName) { //some Swift generic classes can lazily generate their names
            addNamedClass(cls, mangledName, replacing);
        } else {
            Class meta = cls->ISA();
            const class_ro_t *metaRO = meta->bits.safe_ro();
            ASSERT(metaRO->getNonMetaclass() && "Metaclass with lazy name must have a pointer to the corresponding nonmetaclass.");
            ASSERT(metaRO->getNonMetaclass() == cls && "Metaclass nonmetaclass pointer must equal the original class.");
        }
        addClassTableEntry(cls);
    }

    // for future reference: shared cache never contains MH_BUNDLEs
    if (headerIsBundle) {
        cls->data()->flags |= RO_FROM_BUNDLE;
        cls->ISA()->data()->flags |= RO_FROM_BUNDLE;
    }
    
    return cls;
}

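At this point one might guess that ro and rw are handled here. But setting a breakpoint on our own class and stepping through shows that no such work happens for it: only addNamedClass and addClassTableEntry are reached. The rw code above sits inside the popFutureNamedClass branch, which an ordinary class does not enter, so ro and rw are not loaded here.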

addNamedClass inserts the class into the gdb_objc_realized_classes hash map. The name passed in is the mangledName from the caller.

static void addNamedClass(Class cls, const char *name, Class replacing = nil)
{
    runtimeLock.assertLocked();
    Class old;
    if ((old = getClassExceptSomeSwift(name))  &&  old != replacing) {
        inform_duplicate(name, old, cls);

        // getMaybeUnrealizedNonMetaClass uses name lookups.
        // Classes not found by name lookup must be in the
        // secondary meta->nonmeta table.
        addNonMetaClass(cls);
    } else {
        NXMapInsert(gdb_objc_realized_classes, name, cls);
    }
    ASSERT(!(cls->data()->flags & RO_META));

    // wrong: constructed classes are already realized when they get here
    // ASSERT(!cls->isRealized());
}

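So far the class has only been added to the table; presumably ro and rw are set up afterwards. addClassTableEntry, shown next, also registers the metaclass. readClass does nothing beyond this, so we continue with the other steps of _read_images using the same approach: set a breakpoint and look for the key function.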

static void addClassTableEntry(Class cls, bool addMeta = true)
{
    runtimeLock.assertLocked();

    // This class is allowed to be a known class via the shared cache or via
    // data segments, but it is not allowed to be in the dynamic table already.
    auto &set = objc::allocatedClasses.get();

    ASSERT(set.find(cls) == set.end());

    if (!isKnownClass(cls))
        set.insert(cls);
    if (addMeta)
        addClassTableEntry(cls->ISA(), false);
}
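The two tables touched by readClass can be pictured with a small sketch. It is a simplification under stated assumptions: ClassRec, ClassTables, addNamed and addTableEntry are hypothetical, and duplicate-name handling and the shared-cache check are left out.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical, heavily simplified class record.
struct ClassRec {
    std::string name;
    ClassRec *isa = nullptr;   // metaclass; nullptr means "none" in this sketch
};

struct ClassTables {
    std::unordered_map<std::string, ClassRec *> byName;  // name -> class
    std::unordered_set<ClassRec *> allocated;            // every known class
};

// Idea of addNamedClass: make the class findable by its name.
static void addNamed(ClassTables &t, ClassRec *cls) {
    assert(cls != nullptr);
    t.byName.emplace(cls->name, cls);
}

// Idea of addClassTableEntry: record the class and, exactly once,
// its metaclass (the recursive call passes addMeta = false).
static void addTableEntry(ClassTables &t, ClassRec *cls, bool addMeta = true) {
    assert(cls != nullptr);
    t.allocated.insert(cls);
    if (addMeta && cls->isa) addTableEntry(t, cls->isa, false);
}
```

Note that neither function looks inside the class: after both calls the runtime can find the class by name and recognise its address, and nothing more.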

Class newCls = readClass(cls, headerIsBundle, headerIsPreoptimized);

After readClass the runtime knows about the class, but the contents of the class are still not fully available.

realizeClassWithoutSwift

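Setting a breakpoint in _read_images that fires only for our own class shows that the only class-related function it goes through next is realizeClassWithoutSwift, so that is worth a closer look.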

static Class realizeClassWithoutSwift(Class cls, Class previously)
{
    runtimeLock.assertLocked();

    class_rw_t *rw;
    Class supercls;
    Class metacls;

    if (!cls) return nil;
    if (cls->isRealized()) {
        validateAlreadyRealizedClass(cls);
        return cls;
    }
    ASSERT(cls == remapClass(cls));

    // fixme verify class is not in an un-dlopened part of the shared cache?

    auto ro = (const class_ro_t *)cls->data();
    auto isMeta = ro->flags & RO_META;
    if (ro->flags & RO_FUTURE) {
        // This was a future class. rw data is already allocated.
        rw = cls->data();
        ro = cls->data()->ro();
        ASSERT(!isMeta);
        cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
    } else {
        // Normal class. Allocate writeable class data.
        rw = objc::zalloc<class_rw_t>();
        rw->set_ro(ro);
        rw->flags = RW_REALIZED|RW_REALIZING|isMeta;
        cls->setData(rw);
    }

    cls->cache.initializeToEmptyOrPreoptimizedInDisguise();

#if FAST_CACHE_META
    if (isMeta) cls->cache.setBit(FAST_CACHE_META);
#endif

    // Choose an index for this class.
    // Sets cls->instancesRequireRawIsa if indexes no more indexes are available
    cls->chooseClassArrayIndex();

    if (PrintConnecting) {
        _objc_inform("CLASS: realizing class '%s'%s %p %p #%u %s%s",
                     cls->nameForLogging(), isMeta ? " (meta)" : "", 
                     (void*)cls, ro, cls->classArrayIndex(),
                     cls->isSwiftStable() ? "(swift)" : "",
                     cls->isSwiftLegacy() ? "(pre-stable swift)" : "");
    }

    // Realize superclass and metaclass, if they aren't already.
    // This needs to be done after RW_REALIZED is set above, for root classes.
    // This needs to be done after class index is chosen, for root metaclasses.
    // This assumes that none of those classes have Swift contents,
    //   or that Swift's initializers have already been called.
    //   fixme that assumption will be wrong if we add support
    //   for ObjC subclasses of Swift classes.
    supercls = realizeClassWithoutSwift(remapClass(cls->getSuperclass()), nil);
    metacls = realizeClassWithoutSwift(remapClass(cls->ISA()), nil);

#if SUPPORT_NONPOINTER_ISA
    if (isMeta) {
        // Metaclasses do not need any features from non pointer ISA
        // This allows for a faspath for classes in objc_retain/objc_release.
        cls->setInstancesRequireRawIsa();
    } else {
        // Disable non-pointer isa for some classes and/or platforms.
        // Set instancesRequireRawIsa.
        bool instancesRequireRawIsa = cls->instancesRequireRawIsa();
        bool rawIsaIsInherited = false;
        static bool hackedDispatch = false;

        if (DisableNonpointerIsa) {
            // Non-pointer isa disabled by environment or app SDK version
            instancesRequireRawIsa = true;
        }
        else if (!hackedDispatch  &&  0 == strcmp(ro->getName(), "OS_object"))
        {
            // hack for libdispatch et al - isa also acts as vtable pointer
            hackedDispatch = true;
            instancesRequireRawIsa = true;
        }
        else if (supercls  &&  supercls->getSuperclass()  &&
                 supercls->instancesRequireRawIsa())
        {
            // This is also propagated by addSubclass()
            // but nonpointer isa setup needs it earlier.
            // Special case: instancesRequireRawIsa does not propagate
            // from root class to root metaclass
            instancesRequireRawIsa = true;
            rawIsaIsInherited = true;
        }

        if (instancesRequireRawIsa) {
            cls->setInstancesRequireRawIsaRecursively(rawIsaIsInherited);
        }
    }
// SUPPORT_NONPOINTER_ISA
#endif

    // Update superclass and metaclass in case of remapping
    cls->setSuperclass(supercls);
    cls->initClassIsa(metacls);

    // Reconcile instance variable offsets / layout.
    // This may reallocate class_ro_t, updating our ro variable.
    if (supercls  &&  !isMeta) reconcileInstanceVariables(cls, supercls, ro);

    // Set fastInstanceSize if it wasn't set already.
    cls->setInstanceSize(ro->instanceSize);

    // Copy some flags from ro to rw
    if (ro->flags & RO_HAS_CXX_STRUCTORS) {
        cls->setHasCxxDtor();
        if (! (ro->flags & RO_HAS_CXX_DTOR_ONLY)) {
            cls->setHasCxxCtor();
        }
    }
    
    // Propagate the associated objects forbidden flag from ro or from
    // the superclass.
    if ((ro->flags & RO_FORBIDS_ASSOCIATED_OBJECTS) ||
        (supercls && supercls->forbidsAssociatedObjects()))
    {
        rw->flags |= RW_FORBIDS_ASSOCIATED_OBJECTS;
    }

    // Connect this class to its superclass's subclass lists
    if (supercls) {
        addSubclass(supercls, cls);
    } else {
        addRootClass(cls);
    }

    // Attach categories
    methodizeClass(cls, previously);

    return cls;
}

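Set a breakpoint before stepping in and print ro: its method list shows no data yet, which means the class data has not been loaded into the class at this point. On first entry the else branch runs: a class_rw_t is allocated and pointed at ro through set_ro (the ro data itself is not duplicated), then installed with setData. This is where the class actually goes from ro to rw.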
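The else branch can be sketched as follows. This is a simplified model, not the runtime's layout: ReadOnlyData, ReadWriteData, ClassSketch and the kRealized value are hypothetical, and the real class keeps ro and rw behind a single data pointer rather than two separate fields.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct ReadOnlyData {            // stand-in for class_ro_t
    std::uint32_t flags;
    std::uint32_t instanceSize;
};

struct ReadWriteData {           // stand-in for class_rw_t
    std::uint32_t flags = 0;
    const ReadOnlyData *ro = nullptr;
};

constexpr std::uint32_t kRealized = 1u << 31;   // hypothetical flag value

struct ClassSketch {
    const ReadOnlyData *ro = nullptr;        // what the image provides
    std::unique_ptr<ReadWriteData> rw;       // allocated on realization
    bool isRealized() const { return rw && (rw->flags & kRealized); }
};

// First realization: allocate zeroed writable data and point it at ro.
static void realize(ClassSketch &cls) {
    if (cls.isRealized()) return;            // already done, nothing to do
    assert(cls.ro != nullptr);
    cls.rw.reset(new ReadWriteData());
    cls.rw->ro = cls.ro;                     // pointer only; ro is not copied
    cls.rw->flags = kRealized;
}
```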


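Reading on, from here the superclass and the metaclass are realized and the inheritance and isa relationships are bound. Still, there is no further operation that fills in ro or rw.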

    supercls = realizeClassWithoutSwift(remapClass(cls->getSuperclass()), nil);
    metacls = realizeClassWithoutSwift(remapClass(cls->ISA()), nil);
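These two recursive calls only terminate because the class is marked as realized before they run, which matters for root classes: a root metaclass has itself as its isa. A minimal sketch of that ordering, with a hypothetical Node type and counter:

```cpp
#include <cassert>

struct Node {
    Node *superclass = nullptr;
    Node *isa = nullptr;       // metaclass; a root metaclass points to itself
    bool realized = false;
};

// Counts how many nodes were actually realized (for illustration only).
static int g_realizeCount = 0;

static Node *realizeNode(Node *cls) {
    if (!cls) return nullptr;
    if (cls->realized) return cls;   // stops the recursion on cycles
    cls->realized = true;            // must be set before recursing
    g_realizeCount++;
    realizeNode(cls->superclass);    // walk up the inheritance chain
    realizeNode(cls->isa);           // then the metaclass chain
    return cls;
}
```

Realizing one subclass therefore pulls in its superclass, its metaclass and both roots, each exactly once.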

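What we want to find is where ro and rw get their values, but nothing prints all the way down to the return. The remaining candidate is methodizeClass. Printing ro and rw at the start of that function also shows no values, so the actual work is presumably done further inside. (Metaclasses need to be checked separately here as well.)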

methodizeClass

When a method list exists, the runtime calls prepareMethodLists on it, and inside that function it calls fixupMethodList.

static void fixupMethodList(method_list_t *mlist, bool bundleCopy, bool sort)
{
    runtimeLock.assertLocked();
    ASSERT(!mlist->isFixedUp());

    // fixme lock less in attachMethodLists ?
    // dyld3 may have already uniqued, but not sorted, the list
    if (!mlist->isUniqued()) {
        mutex_locker_t lock(selLock);
    
        // Unique selectors in list.
        for (auto& meth : *mlist) {
            const char *name = sel_cname(meth.name());
            meth.setName(sel_registerNameNoLock(name, bundleCopy));
        }
    }

    // Sort by selector address.
    // Don't try to sort small lists, as they're immutable.
    // Don't try to sort big lists of nonstandard size, as stable_sort
    // won't copy the entries properly.
    if (sort && !mlist->isSmallList() && mlist->entsize() == method_t::bigSize) {
        method_t::SortBySELAddress sorter;
        std::stable_sort(&mlist->begin()->big(), &mlist->end()->big(), sorter);
    }
    
    // Mark method list as uniqued and sorted.
    // Can't mark small lists, since they're immutable.
    if (!mlist->isSmallList()) {
        mlist->setFixedUp();
    }
}

1. Registers (uniques) the selector name of each method.

2. Sorts the methods by selector address.
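A sorted list pays off at lookup time, because a method can then be found by binary search on the selector address. The sketch below illustrates the two steps with standard algorithms; MethodSketch, bySelAddress, sortMethods and findMethod are hypothetical names, and the lookup shown is an assumption about how a sorted list is typically searched, not a copy of the runtime's code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <vector>

struct MethodSketch {
    const char *sel;   // uniqued selector pointer (stand-in for SEL)
    int impId;         // stand-in for the IMP
};

// Orders methods by the numeric value of the selector address.
static bool bySelAddress(const MethodSketch &a, const MethodSketch &b) {
    return reinterpret_cast<std::uintptr_t>(a.sel) <
           reinterpret_cast<std::uintptr_t>(b.sel);
}

// Step 2 of the fixup: stable sort by selector address.
static void sortMethods(std::vector<MethodSketch> &list) {
    std::stable_sort(list.begin(), list.end(), bySelAddress);
}

// Binary search over the sorted list; returns nullptr when not found.
static const MethodSketch *findMethod(const std::vector<MethodSketch> &list,
                                      const char *sel) {
    assert(std::is_sorted(list.begin(), list.end(), bySelAddress));
    MethodSketch key = { sel, 0 };
    auto it = std::lower_bound(list.begin(), list.end(), key, bySelAddress);
    if (it != list.end() && it->sel == sel) return &*it;
    return nullptr;
}
```

This only works because selectors were uniqued first: comparing addresses is meaningful only when equal names share one address.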


But ro and rw still cannot be printed at this point.

lazy Class

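Earlier, while debugging _read_images, we were able to step into realizeClassWithoutSwift. By default that call is not reached: according to the comment in the source at that call site, only non-lazy classes, meaning those that implement a +load method, are realized there.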


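That work is done when the class is loaded. Most classes, however, are only needed when they are first used, so there is no reason to load them all at startup. Since loading a class must go through realizeClassWithoutSwift, where is that function called from when the class is lazy?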

Set a breakpoint in realizeClassWithoutSwift that fires for our class and print the backtrace with bt. It traces back to lookUpImpOrForward -> realizeClassMaybeSwiftMaybeRelock.


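Think about when methods get called. After entering main, the first message sent to a class goes through lookUpImpOrForward, whether it is alloc or any other class method. So a lazy class is realized the first time one of its methods is used.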
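The lazy path amounts to "realize on first lookup". A minimal sketch, assuming a hypothetical LazyClass type whose method table is a plain map; the real slow path also deals with caches, initialization and forwarding, none of which is modelled here.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

struct LazyClass {
    bool realized = false;
    int realizeCalls = 0;                          // for illustration only
    std::unordered_map<std::string, int> methods;  // selector name -> IMP id
};

// Realizes the class once; later calls are no-ops.
static void realizeIfNeeded(LazyClass &cls) {
    if (cls.realized) return;
    cls.realized = true;
    cls.realizeCalls++;
}

// Sketch of the slow lookup path: realize first, then search.
// Returns -1 when the selector is not implemented.
static int lookUp(LazyClass &cls, const std::string &sel) {
    realizeIfNeeded(cls);
    assert(cls.realized);
    auto it = cls.methods.find(sel);
    return it == cls.methods.end() ? -1 : it->second;
}
```

A class that is never messaged never pays the realization cost, which is the reason for deferring it.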

RWE

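In the exploration so far, nothing has assigned rwe. From earlier study we know that rwe is an extension of rw which holds class information added dynamically, including categories. So we can write a category, compile the main file with xcrun to see its underlying implementation, and then find the corresponding handling in the runtime source: probe from the outside, then break through from the inside.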

The compiled output shows that a category also has its own struct. So how does the category struct get loaded into the main class?

In methodizeClass there is a comment, Attach categories, and below it a call to attachToClass. attachToClass calls attachCategories, which assigns rwe by calling extAllocIfNeeded.
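The "allocate the extension only when something dynamic arrives" idea can be sketched like this. RoSketch, RweSketch, RwSketch and attachCategory are hypothetical simplifications: method lists are plain string vectors here, and the assumption that category methods are placed ahead of the base methods is stated in the code comment.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct RoSketch {                      // stand-in for the read-only data
    std::vector<std::string> baseMethods;
};

struct RweSketch {                     // stand-in for the rw extension
    std::vector<std::string> methods;
};

struct RwSketch {
    const RoSketch *ro = nullptr;
    std::unique_ptr<RweSketch> ext;    // absent until something dynamic is added

    // Allocates the extension on first use and seeds it from ro.
    RweSketch *extAllocIfNeeded() {
        if (!ext) {
            assert(ro != nullptr);
            ext.reset(new RweSketch());
            ext->methods = ro->baseMethods;
        }
        return ext.get();
    }
};

// Attaching a category is what forces the extension into existence.
// Assumption in this sketch: category methods go in front of the base
// methods, so a lookup from the front finds them first.
static void attachCategory(RwSketch &rw,
                           const std::vector<std::string> &catMethods) {
    RweSketch *rwe = rw.extAllocIfNeeded();
    rwe->methods.insert(rwe->methods.begin(),
                        catMethods.begin(), catMethods.end());
}
```

Classes that never gain categories or dynamically added members never allocate the extension, which is why rwe stayed empty in the earlier steps.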


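A global search for attachCategories finds its callers: besides attachToClass there is load_categories_nolock. So categories are loaded after one of these two functions is called. When are they called? That is for next time.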

Afterword

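Work has been busy lately, so there hasn't been much time to write things up. Class loading is an important topic. Much of it is abstract and comes down to hunting for the right function, and one wrong turn makes it hard to find the way back. These are only notes on my experience; there is more exploring to do.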