iOS 底层探究:dyld加载分析中

1,090 阅读8分钟

这是我参与8月更文挑战的第22天,活动详情查看:8月更文挑战

承接上文,这篇文章让我们继续分析dyld的加载

3.3 mapSharedCache加载共享缓存

共享缓存专门缓存系统动态库,如:UIKitFoundation等。(自己的库、三方库不行)
mapSharedCache真正调用的是loadDyldCache

static void mapSharedCache(uintptr_t mainExecutableSlide)
{
    ……
    //真正调用的是loadDyldCache
    loadDyldCache(opts, &sSharedCacheLoadInfo);
    ……
}

3.3.1 loadDyldCache

bool loadDyldCache(const SharedCacheOptions& options, SharedCacheLoadInfo* results)
{
    results->loadAddress        = 0;
    results->slide              = 0;
    results->errorMessage       = nullptr;

#if TARGET_OS_SIMULATOR
    // simulator only supports mmap()ing cache privately into process
    return mapCachePrivate(options, results);
#else
    if ( options.forcePrivate ) {
        // mmap cache into this process only
        //仅加载到当前进程
        return mapCachePrivate(options, results);
    }
    else {
        // fast path: when cache is already mapped into shared region
        bool hasError = false;
        //已经加载不进行任何处理
        if ( reuseExistingCache(options, results) ) {
            hasError = (results->errorMessage != nullptr);
        } else {
            // slow path: this is first process to load cache
            //当前进程第一次加载
            hasError = mapCacheSystemWide(options, results);
        }
        return hasError;
    }
#endif
}

loadDyldCache3个逻辑:
1.仅加载到当前进程调用mapCachePrivate。不放入共享缓存,仅自己使用。
2.已经加载过不进行任何处理。
3.当前进程第一次加载调用mapCacheSystemWide

动态库的共享缓存在整个应用的启动过程中是最先被加载的。

3.4 instantiateFromLoadedImage 实例化主程序(创建image

static ImageLoaderMachO* instantiateFromLoadedImage(const macho_header* mh, uintptr_t slide, const char* path)
{
    //实例化image
    ImageLoader* image = ImageLoaderMachO::instantiateMainExecutable(mh, slide, path, gLinkContext);
    //将image添加到all Images中
    addImage(image);
    return (ImageLoaderMachO*)image;
//  throw "main executable not a known format";
}
  • 传入主程序的HeaderASLRpath实例化主程序生成image
  • image加入all images中。

实际上实例化真正调用的是ImageLoaderMachO::instantiateMainExecutable

// create image for main executable
ImageLoader* ImageLoaderMachO::instantiateMainExecutable(const macho_header* mh, uintptr_t slide, const char* path, const LinkContext& context)
{
    bool compressed;
    unsigned int segCount;
    unsigned int libCount;
    const linkedit_data_command* codeSigCmd;
    const encryption_info_command* encryptCmd;
    //获取Load Commands
    sniffLoadCommands(mh, path, false, &compressed, &segCount, &libCount, context, &codeSigCmd, &encryptCmd);
    // instantiate concrete class based on content of load commands
    //根据 compressed 确定用哪个子类进行加载image,ImageLoader是个抽象类,根据值选择对应的子类实例化image。
    if ( compressed ) 
        return ImageLoaderMachOCompressed::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
    else
#if SUPPORT_CLASSIC_MACHO
        return ImageLoaderMachOClassic::instantiateMainExecutable(mh, slide, path, segCount, libCount, context);
#else
        throw "missing LC_DYLD_INFO load command";
#endif
}
  • 调用sniffLoadCommands生成相关信息,比如compressed
  • 根据 compressed 确定用哪个子类进行加载imageImageLoader是个抽象类,根据值选择对应的子类实例化主程序。 sniffLoadCommands
void ImageLoaderMachO::sniffLoadCommands(const macho_header* mh, const char* path, bool inCache, bool* compressed,
                                            unsigned int* segCount, unsigned int* libCount, const LinkContext& context,
                                            const linkedit_data_command** codeSigCmd,
                                            const encryption_info_command** encryptCmd)
{
    //根据LC_DYLIB_INFO 和  LC_DYLD_INFO_ONLY 来获取的
    *compressed = false;
    //segment数量
    *segCount = 0;
    //lib数量
    *libCount = 0;
    //代码签名和加密
    *codeSigCmd = NULL;
    *encryptCmd = NULL;
        ……
    // fSegmentsArrayCount is only 8-bits
    //segCount 最多 256 个
    if ( *segCount > 255 )
        dyld::throwf("malformed mach-o image: more than 255 segments in %s", path);

    // fSegmentsArrayCount is only 8-bits
    //libCount最多 4096 个
    if ( *libCount > 4095 )
        dyld::throwf("malformed mach-o image: more than 4095 dependent libraries in %s", path);

    if ( needsAddedLibSystemDepency(*libCount, mh) )
        *libCount = 1;

    // dylibs that use LC_DYLD_CHAINED_FIXUPS have that load command removed when put in the dyld cache
    if ( !*compressed && (mh->flags & MH_DYLIB_IN_CACHE) )
        *compressed = true;
}
  • compressed是根据LC_DYLIB_INFOLC_DYLD_INFO_ONLY来获取的。
  • segCount最多256个。
  • libCount最多4096个。

3.5 loadInsertedDylib 插入&加载动态库

static void loadInsertedDylib(const char* path)
{
    unsigned cacheIndex;
    try {
    ……
        //调用load,加载动态库的真正函数
        load(path, context, cacheIndex);
    }
    ……
}
  • 根据上下文初始化配置调用load加载动态库。

3.6 ImageLoader::link链接主程序/动态库

void link(ImageLoader* image, bool forceLazysBound, bool neverUnload, const ImageLoader::RPathChain& loaderRPaths, unsigned cacheIndex)
{
    // add to list of known images.  This did not happen at creation time for bundles
    if ( image->isBundle() && !image->isLinked() )
        addImage(image);

    // we detect root images as those not linked in yet 
    if ( !image->isLinked() )
        addRootImage(image);
    
    // process images
    try {
        const char* path = image->getPath();
#if SUPPORT_ACCELERATE_TABLES
        if ( image == sAllCacheImagesProxy )
            path = sAllCacheImagesProxy->getIndexedPath(cacheIndex);
#endif
        //最终会调用到image的link
        image->link(gLinkContext, forceLazysBound, false, neverUnload, loaderRPaths, path);
    }
}
  • link最终调用的是ImageLoader::linkImageLoader::link
void ImageLoader::link(const LinkContext& context, bool forceLazysBound, bool preflightOnly, bool neverUnload, const RPathChain& loaderRPaths, const char* imagePath)
{   
    // clear error strings
    (*context.setErrorStrings)(0, NULL, NULL, NULL);
    //起始时间。用于记录时间间隔
    uint64_t t0 = mach_absolute_time();
    //递归加载主程序依赖的库,完成之后发通知。
    this->recursiveLoadLibraries(context, preflightOnly, loaderRPaths, imagePath);
    context.notifyBatch(dyld_image_state_dependents_mapped, preflightOnly);
……
    uint64_t t1 = mach_absolute_time();
    context.clearAllDepths();
    this->updateDepth(context.imageCount());

    __block uint64_t t2, t3, t4, t5;
    {
        dyld3::ScopedTimer(DBG_DYLD_TIMING_APPLY_FIXUPS, 0, 0, 0);
        t2 = mach_absolute_time();
        //Rebase修正ASLR
        this->recursiveRebaseWithAccounting(context);
        context.notifyBatch(dyld_image_state_rebased, false);

        t3 = mach_absolute_time();
        if ( !context.linkingMainExecutable )
            //绑定NoLazy符号
            this->recursiveBindWithAccounting(context, forceLazysBound, neverUnload);

        t4 = mach_absolute_time();
        if ( !context.linkingMainExecutable )
            //绑定弱符号
            this->weakBind(context);
        t5 = mach_absolute_time();
    }

    // interpose any dynamically loaded images
    if ( !context.linkingMainExecutable && (fgInterposingTuples.size() != 0) ) {
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_APPLY_INTERPOSING, 0, 0, 0);
        //递归应用插入的动态库
        this->recursiveApplyInterposing(context);
    }

    // now that all fixups are done, make __DATA_CONST segments read-only
    if ( !context.linkingMainExecutable )
        this->recursiveMakeDataReadOnly(context);

    if ( !context.linkingMainExecutable )
        context.notifyBatch(dyld_image_state_bound, false);
    uint64_t t6 = mach_absolute_time();

    if ( context.registerDOFs != NULL ) {
        std::vector<DOFInfo> dofs;
        this->recursiveGetDOFSections(context, dofs);
        //注册
        context.registerDOFs(dofs);
    }
    //计算结束时间.
    uint64_t t7 = mach_absolute_time();

    // clear error strings
    //配置环境变量,就可以看到dyld应用加载的时长。
    (*context.setErrorStrings)(0, NULL, NULL, NULL);
    fgTotalLoadLibrariesTime += t1 - t0;
    fgTotalRebaseTime += t3 - t2;
    fgTotalBindTime += t4 - t3;
    fgTotalWeakBindTime += t5 - t4;
    fgTotalDOF += t7 - t6;
    
    // done with initial dylib loads
    fgNextPIEDylibAddress = 0;
}
  • 修正ASLR
  • 绑定NoLazy符号。
  • 绑定弱符号。
  • 注册。
  • 记录时间,可以通过配置看到dyld应用加载时长。

3.7 initializeMainExecutable 初始化主程序

void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    //拿到所有的镜像文件
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        //从1开始到最后。(第0个为主程序)
        for(size_t i=1; i < rootCount; ++i) {
            //image初始化,调用 +load 和 构造函数
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    
    // run initializers for main executable and everything it brings up
    //调用主程序初始化
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    
    // register cxa_atexit() handler to run static terminators in all loaded images when this process exits
    if ( gLibSystemHelpers != NULL ) 
        (*gLibSystemHelpers->cxa_atexit)(&runAllStaticTerminators, NULL, NULL);

    // dump info if requested
    if ( sEnv.DYLD_PRINT_STATISTICS )
        ImageLoader::printStatistics((unsigned int)allImagesCount(), initializerTimes[0]);
    if ( sEnv.DYLD_PRINT_STATISTICS_DETAILS )
        ImageLoaderMachO::printStatisticsDetails((unsigned int)allImagesCount(), initializerTimes[0]);
}
  • 初始化images,下标从1开始,然后再初始化主程序(下标0runInitializers
  • 可以配置环境变量DYLD_PRINT_STATISTICSDYLD_PRINT_STATISTICS_DETAILS打印相关信息。 dyld ImageLoader::runInitializers(ImageLoader.cpp)
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    //加工初始化
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
  • up.count值设置为1然后调用processInitializersImageLoader::processInitializers
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    for (uintptr_t i=0; i < images.count; ++i) {
        //递归初始化
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    // If any upward dependencies remain, init them.
    if ( ups.count > 0 )
        processInitializers(context, thisThread, timingInfo, ups);
}
  • 最终调用了recursiveInitialization。、 ImageLoader::recursiveInitialization(ImageLoader.cpp)
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{
……
    if ( fState < dyld_image_state_dependents_initialized-1 ) {
        uint8_t oldState = fState;
        // break cycles
        fState = dyld_image_state_dependents_initialized-1;
        try {
            // initialize lower level libraries first
            //先初始化下级lib
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
……
                    else if ( dependentImage->fDepth >= fDepth ) {
                        //依赖文件递归初始化
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }       
……
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            //这里调用传递的状态是dyld_image_state_dependents_initialized,image传递的是自己。也就是最后调用了自己的+load。从libobjc.A.dylib开始调用。
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            
            // initialize this image
            //初始化镜像文件,调用c++构造函数。libSystem的libSystem_initializer就是在这里调用的。会调用到objc_init中。_dyld_objc_notify_register 中会调用自身的+load方法,然后c++构造函数。
            //1.调用libSystem_initializer->objc_init 注册回调。
            //2._dyld_objc_notify_register中调用 map_images,load_images,这里是首先初始化一些系统库,调用系统库的load_images。比如libdispatch.dylib,libsystem_featureflags.dylib,libsystem_trace.dylib,libxpc.dylib。
            //3.自身的c++构造函数
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            //这里调用不到+load方法。 notifySingle内部fState==dyld_image_state_dependents_initialized 才调用+load。
            context.notifySingle(dyld_image_state_initialized, this, NULL);
……
        }
……
    }
    recursiveSpinUnLock();
}
  • 整个过程是一个递归的过程,先调用依赖库的,再调用自己的。
  • 调用notifySingle最终调用到了objc中所有的+ load方法。这里第一个notifySingle调用的是+load方法,第二个notifySingle由于参数是dyld_image_state_initialized不会调用到+load方法。这里的dyld_image_state_dependents_initialized意思是依赖文件初始化完毕了,可以初始化自己了。
  • 调用doInitialization最终调用了c++的系统构造函数。先调用的是libSystem_initializer -> objc_init进行注册回调。在回调中调用了map_imagesload_images(+load)。这里的load_images是调用一些加载一些系统库,比如:libdispatch.dylib,libsystem_featureflags.dylib,libsystem_trace.dylib,libxpc.dylibc++系统构造函数
__attribute__((constructor)) void func() {
   printf("\n ---func--- \n");
}

⚠️这里也就说明了对于同一个image而言,+ load方法是比c++构造函数更早调用的。

dyld::notifySingledyld2.cpp
notifySingle对应一个函数,在setContext的时候赋值:

image.png

//调用到objc里面去
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
……

    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
        //回调指针 sNotifyObjCInit 是在 registerObjCNotifiers 中赋值的。这里执行会跑到objc的load_images中
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
……
}
  • notifySingle中找不到load image的调用(从堆栈信息中可以看到notifySingle之后是load image)。
  • 这个函数执行一个回调sNotifyObjCInit,条件是statedyld_image_state_dependents_initialized

搜索下回调sNotifyObjCInit的赋值操作,发现是在registerObjCNotifiers中赋值的 registerObjCNotifiers

//谁调用的 registerObjCNotifiers ? _dyld_objc_notify_register。这里赋值了三个参数 _dyld_objc_notify_mapped,_dyld_objc_notify_init,_dyld_objc_notify_unmapped
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    //第一个参数 map_images
    sNotifyObjCMapped   = mapped;
    //第二个参数 load_images
    sNotifyObjCInit     = init;
    //第三个参数 unmap_image
    sNotifyObjCUnmapped = unmapped;

    // call 'mapped' function with all images mapped so far
    try {
        //赋值后马上回调 map_images
        notifyBatchPartial(dyld_image_state_bound, true, NULL, false, true);
    }
    catch (const char* msg) {
        // ignore request to abort during registration
    }

    // <rdar://problem/32209809> call 'init' function on all images already init'ed (below libSystem)
    for (std::vector<ImageLoader*>::iterator it=sAllImages.begin(); it != sAllImages.end(); it++) {
        ImageLoader* image = *it;
        if ( (image->getState() == dyld_image_state_initialized) && image->notifyObjC() ) {
            dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
            //调用一些系统库的 load_images。
            (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
        }
    }
}
  • registerObjCNotifiers赋值来自于第二个参数_dyld_objc_notify_init
  • 赋值后里面调用了notifyBatchPartial(内部调用了sNotifyObjCMapped)。
  • 循环调用load_images,这里调用的是依赖的系统库的libdispatch.dylib,libsystem_featureflags.dylib,libsystem_trace.dylib,libxpc.dylib

搜索发现是_dyld_objc_notify_register调用的registerObjCNotifiers_dyld_objc_notify_registerdyldAPIs.cpp

//_objc_init中调用的。
//单个镜像文件的加载来到了这里->_dyld_objc_notify_register,打符号断点查看被objc-os.mm中 _objc_init 调用。
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
  • _dyld_objc_notify_register的调用者在dyld中找不到。 打符号断点_dyld_objc_notify_register排查调用情况:

image.png 可以看到是被_objc_init调用的。

_objc_init的调用在objc-os.mm中,查看源码:

void _objc_init(void)
{
    static bool initialized = false;
    if (initialized) return;
    initialized = true;
    
    // fixme defer initialization until an objc-using image is found?
    environ_init();
    tls_init();
    static_init();
    runtime_init();
    exception_init();
#if __OBJC2__
    cache_t::init();
#endif
    _imp_implementationWithBlock_init();

    //_objc_init 调用dyldAPIs.cpp 中_dyld_objc_notify_register,第二个参数是load_images
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
  • 证实是_objc_init调用了_dyld_objc_notify_register
  • 第一个参数是map_images,赋值给sNotifyObjCMapped
  • 第二个参数是load_images,赋值给sNotifyObjCInit
  • 第三个参数是unmap_image,赋值给sNotifyObjCUnmapped

这三个参数将在后面详细介绍是如何与dyld进行交互的。 ImageLoaderMachO::doInitialization(ImageLoaderMachO.cpp)

bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    doImageInit(context);
    //加载c++构造函数
    doModInitFunctions(context);
    
    CRSetCrashLogMessage2(NULL);
    
    return (fHasDashInit || fHasInitializers);
}

加上以下代码查看MachO文件:

__attribute__((constructor)) void func1() {
    printf("\n ---func1--- \n");
}

__attribute__((constructor)) void func2() {
    printf("\n ---func2--- \n");
}

会发现MachO中多了__mod_init_func

image.png

  • 调用doModInitFunctions函数加载c++构造函数(__attribute__((constructor))修饰的c函数)

ImageLoaderMachO::doModInitFunctions

image.png

  • 内部是对macho文件的一些读取操作。
  • 会进行__mod_init_func section的确认,与上面的验证符合。
  • 加载前必须加载完libSystem库。