"Interviewer Drives the Candidate Crazy in One Hour": Import System Interview Essentials, Principles Part


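A month has passed since my last article, "Interviewer Drives the Candidate Crazy in One Hour: Let's Talk About the Python Import System?". In that time I asked quite a few experienced engineers what they thought of it. All told, I collected feedback from nearly 10 of them and close to a hundred readers, and the verdict was unanimous: "The article is too dry and not suited to interviews. You've brought Java-style rote interview questions into Python. Seriously, interviews now have to go all the way down to the CPython source? Who can take that?"

Fine, I admit the title was a bit of clickbait. I also don't have much experience as an interviewer and wasn't sure how candidates are actually assessed, so the article drifted off topic. This time I'm back, drawing on everyone's feedback, to share interview questions that can be built out from the Import System. The material is split into two parts: the principles behind the Import System techniques we use every day, and solutions to real-world scenarios involving the Import System.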


First up is the principles part.

[Figure: find_spec flow]

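I. Principles

Unlike the previous article, this part does not walk through the CPython source to trace the underlying logic, so you can relax. "Principles" here is relative to the problems you meet in real projects. For example, if we were writing a web framework from scratch, the principles would be the network protocols at each layer, application servers, web servers, and so on. Some points were already covered in the previous article and are simply referenced here; if anything is unfamiliar, revisit "Interviewer Drives the Candidate Crazy in One Hour: Let's Talk About the Python Import System?".

1 Python Package Search Path

For Python developers, the first headache in the Import System is usually the package search path, and it is also a frequent interview question. So the first question to settle is: in what order does Python search for packages, and how should we make sense of it? The answer is in the import flowchart from the previous article, so let's revisit it.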

[Figure: find_spec flow]

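First, note that three list structures in the import flowchart cannot be ignored: sys.path, sys.meta_path, and sys.path_hooks.

To understand the package search path, we ultimately need to understand the order of the importers in sys.meta_path. The source code loops over the importers in sys.meta_path, passing path to each one. This path comes from parent_module.__path__; when it is empty (a top-level import), sys.path is used instead. Within sys.meta_path, the importer that actually uses sys.path is PathFinder. So our first conclusion is that the classes consulted first are the two importers ahead of PathFinder, namely BuiltinImporter and FrozenImporter, which cover built-in modules and frozen modules. The order so far is therefore: built-in modules -> frozen modules.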

[Figure: find_spec flow]
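The order described above can be checked from a running interpreter. The following is a minimal sketch, assuming a standard CPython build where the three default finders are still registered; other tools may have added their own finders around them, which is why the code looks the names up instead of assuming fixed positions.

```python
import importlib.util
import sys

# Names of the finders currently registered on sys.meta_path, in search order.
finder_names = [getattr(f, "__name__", type(f).__name__) for f in sys.meta_path]
print(finder_names)

# Built-in and frozen modules are tried before PathFinder ever looks at sys.path.
order = [finder_names.index(n)
         for n in ("BuiltinImporter", "FrozenImporter", "PathFinder")]
print(order == sorted(order))

# 'sys' is a built-in module, so its spec comes from BuiltinImporter.
sys_spec = importlib.util.find_spec("sys")
print(sys_spec.loader, sys_spec.origin)
```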

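The second stage is where the third importer, PathFinder, loops over the sys.path list. Since it iterates over sys.path, the order of sys.path is a sub-sequence of the overall search chain. Let's look at what sys.path contains: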

(base) [root@VM-0-8-centos ~]# python -m site
sys.path = [
    '/root', # project root directory
    '/root/miniconda3/lib/python38.zip', # standard library of the current environment
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/miniconda3/lib/python3.8/site-packages', # third-party packages of the current environment
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

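The above is a typical sys.path listing. From it we can see that the search order is

root directory -> standard library -> third-party packages

There are two things to watch out for here. The first is PYTHONPATH; as you can see above, we did not set it explicitly. Let's first look at what PYTHONPATH is (quoting the Python docs on PYTHONPATH):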

Augment the default search path for module files. The format is the same as the shell’s PATH: one or more directory pathnames separated by os.pathsep (e.g. colons on Unix or semicolons on Windows). Non-existent directories are silently ignored.

In addition to normal directories, individual PYTHONPATH entries may refer to zipfiles containing pure Python modules (in either source or compiled form). Extension modules cannot be imported from zipfiles.

The default search path is installation dependent, but generally begins with *prefix*/lib/python*version* (see PYTHONHOME above). It is always appended to PYTHONPATH.

An additional directory will be inserted in the search path in front of PYTHONPATH as described above under Interface options. The search path can be manipulated from within a Python program as the variable sys.path.

From this we learn that multiple paths can be given, separated by os.pathsep (a colon on Unix, a semicolon on Windows), and that they are inserted into sys.path. Let's set PYTHONPATH and check the result.

(base) [root@VM-0-8-centos ~]# export PYTHONPATH=/root/test  # set a specific path (it does not have to exist)
(base) [root@VM-0-8-centos ~]# python -m site
sys.path = [
    '/root',
    '/root/test',
    '/root/miniconda3/lib/python38.zip',
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/miniconda3/lib/python3.8/site-packages',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

As we can see, the PYTHONPATH entry has been added to sys.path, placed after the root directory and before the current environment's standard library.
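The same experiment can be scripted. Below is a minimal sketch that starts a fresh interpreter with PYTHONPATH set and reports where the entry lands in the child's sys.path. The helper name child_sys_path is made up for this illustration, and the exact index can vary with how the interpreter is launched.

```python
import json
import os
import subprocess
import sys
import tempfile

def child_sys_path(pythonpath):
    """Return sys.path as seen by a fresh interpreter started with PYTHONPATH set."""
    env = dict(os.environ, PYTHONPATH=pythonpath)
    out = subprocess.run(
        [sys.executable, "-c", "import json, sys; print(json.dumps(sys.path))"],
        env=env, capture_output=True, text=True, check=True,
    ).stdout
    # Use the last line in case a startup hook printed something first.
    return json.loads(out.strip().splitlines()[-1])

with tempfile.TemporaryDirectory() as tmp:
    extra = os.path.realpath(tmp)
    path = child_sys_path(extra)
    # Position of the PYTHONPATH entry inside the child's sys.path.
    position = [os.path.realpath(p) for p in path].index(extra)

print(position, path[:3])
```

With `-c`, the first entry of the child's sys.path is usually the empty string (the current directory), so the PYTHONPATH entry typically follows right after it, matching the "after the root directory, before the standard library" observation above.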

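One interesting point: setting PYTHONPATH by hand feels a bit like switching virtual environments. For different environments we would only need to point PYTHONPATH elsewhere to change which packages, and which versions of them, get imported, which amounts to a poor man's environment manager. Of course, managing virtual environments this way causes plenty of problems, and it differs from how real virtual environments work (a virtual environment mainly changes the package paths and the $PATH of the activated environment, whereas PYTHONPATH entries are placed ahead of the environment's own directories and therefore affect every virtual environment). We'll cover this properly in a later article.

With PYTHONPATH covered, let's turn to the other point to watch: .pth files. For an introduction see pep-0648; the gist is in its title:

Extensible customizations of the interpreter at startup

We can rely on .pth files to customize how packages are loaded, and the paths listed in a .pth file are added to sys.path.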

Note that pth files were originally developed to just add additional directories to sys.path, but they may also contain lines which start with "import", which will be passed to exec(). Users have exploited this feature to allow the customizations that they needed. See setuptools [4] or betterexceptions [5] as examples.

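So how do we use it? Go to the site-packages directory and create a .pth file. (Note that the site module only processes .pth files located in site directories, such as site-packages or the user site directory, and the paths they list are appended after the third-party packages, so site-packages is the usual place to put them.)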

(base) [root@VM-0-8-centos site-packages]# cd /root/miniconda3/lib/python3.8/site-packages
(base) [root@VM-0-8-centos site-packages]# echo "/root/test" > test.pth
(base) [root@VM-0-8-centos site-packages]# mkdir /root/test
mkdir: cannot create directory ‘/root/test’: File exists
(base) [root@VM-0-8-centos site-packages]# python -m site
sys.path = [
    '/root/miniconda3/lib/python3.8/site-packages',
    '/root/miniconda3/lib/python38.zip',
    '/root/miniconda3/lib/python3.8',
    '/root/miniconda3/lib/python3.8/lib-dynload',
    '/root/test',
]
USER_BASE: '/root/.local' (doesn't exist)
USER_SITE: '/root/.local/lib/python3.8/site-packages' (doesn't exist)
ENABLE_USER_SITE: True

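As you can see, we created a .pth file under site-packages and listed a path in it. When the interpreter starts, the site module scans the site directories (site-packages and, if enabled, the user site directory) for .pth files, parses each one, and appends the paths it contains to sys.path as additional search paths (as the PEP mentions, a .pth file can contain more than just paths).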
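The .pth mechanism can also be tried without touching the real site-packages. This is a minimal sketch that uses site.addsitedir() on a temporary directory standing in for site-packages; the file name demo.pth and both directories are made up for the illustration.

```python
import os
import site
import sys
import tempfile

site_dir = os.path.realpath(tempfile.mkdtemp())   # stands in for site-packages
target = os.path.realpath(tempfile.mkdtemp())     # directory we want on sys.path

# A .pth file lists one path per line; lines starting with "import" would be executed.
with open(os.path.join(site_dir, "demo.pth"), "w") as fh:
    fh.write(target + "\n")

before = list(sys.path)
site.addsitedir(site_dir)        # adds site_dir and processes every *.pth file in it
added = [p for p in sys.path if p not in before]
print(added)
```

Only paths that exist on disk are appended, which is why the target directory is created before addsitedir() is called.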

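To sum up, the overall search chain is: built-in modules -> frozen modules -> project root -> PYTHONPATH -> standard library -> third-party packages -> paths from .pth files.

[Figure: find_spec flow]

You may wonder: aren't Python's built-in modules and the standard library the same thing? Not quite. The official Python documentation says: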

Python’s standard library is very extensive, offering a wide range of facilities as indicated by the long table of contents listed below. The library contains built-in modules (written in C) that provide access to system functionality such as file I/O that would otherwise be inaccessible to Python programmers, as well as modules written in Python that provide standardized solutions for many problems that occur in everyday programming. Some of these modules are explicitly designed to encourage and enhance the portability of Python programs by abstracting away platform-specifics into platform-neutral APIs.

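It states that built-in modules are written in C and provide access to system functionality. For example, you will not find sys under the standard library path, because it is tied to the system and written in C. The asyncio module, on the other hand, can be found there, because it is written in Python.

This passage shows that built-in modules are not the same thing as the rest of the standard library, yet they are still classified as part of it. The point is that they share a category, not that they are the same in nature. Distinguishing them may look like nitpicking, and in most situations there is indeed no need to. But for understanding Python's module lookup order, the difference is crucial. We have already walked through the search order and seen that the project root sits between the built-in modules and the standard library. Now picture this: if we have local module files with the same names as a built-in module and a standard library module, which one gets shadowed?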
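One way to answer the question is to ask the finders directly. The sketch below creates a fake project root containing sys.py and colorsys.py, puts it at the front of sys.path, and walks sys.meta_path to see which finder answers first. The helper first_spec and the choice of colorsys as the pure-Python standard library module are assumptions made for the illustration.

```python
import importlib
import os
import sys
import tempfile

def first_spec(name):
    """Ask each finder on sys.meta_path in order; return the first spec found."""
    for finder in sys.meta_path:
        find_spec = getattr(finder, "find_spec", None)
        spec = find_spec(name, None) if find_spec else None
        if spec is not None:
            return spec
    return None

project = os.path.realpath(tempfile.mkdtemp())   # pretend project root
for name in ("sys", "colorsys"):                 # one built-in, one pure-Python module
    with open(os.path.join(project, name + ".py"), "w") as fh:
        fh.write("SHADOW = True\n")

sys.path.insert(0, project)
importlib.invalidate_caches()
try:
    builtin_origin = first_spec("sys").origin
    stdlib_origin = first_spec("colorsys").origin
finally:
    sys.path.remove(project)

print(builtin_origin)   # BuiltinImporter answers first; the local sys.py is never consulted
print(stdlib_origin)    # the local colorsys.py is found before the standard library copy
```

So a local file can shadow a pure-Python standard library module but not a built-in one. In a real program the sys.modules cache is checked even earlier, so a module that has already been imported is not looked up again at all.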

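2 Import Protocol and Hook Registration

In the previous article we traced the core flow of the Import System and learned that we can customize how modules are imported by hooking into the various stages of that flow. Now let's see how to use the Import System's Importer Protocol to write our own importer and register it as an import hook.

First we need to understand what the Importer Protocol is; see the explanation in pep-0302-specification-part-1-the-importer-protocol. The protocol mainly involves two parts, the finder and the loader. The methods that do the work are finder.find_spec and loader.exec_module (for Python 3.4 and later; earlier versions used find_module and load_module).

From the above, implementing a custom importer takes two pieces of work:

  • Implement the Finder protocol

  • Implement the Loader protocol

Before looking at how the two protocols are implemented, we need to look at what connects them: ModuleSpec (the module spec).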
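To make the division of labour concrete, here is a heavily simplified sketch of what a top-level import does with the two protocols. The function manual_import is hypothetical; the real machinery also handles locking, parent packages, namespace packages, and legacy loaders, all of which are left out here.

```python
import importlib.util
import sys

def manual_import(name):
    """Simplified top-level import: cache check, find_spec, create, exec."""
    if name in sys.modules:                          # 1. module cache
        return sys.modules[name]
    spec = None
    for finder in sys.meta_path:                     # 2. finders produce a ModuleSpec
        find_spec = getattr(finder, "find_spec", None)
        spec = find_spec(name, None) if find_spec else None
        if spec is not None:
            break
    if spec is None:
        raise ModuleNotFoundError(f"No module named {name!r}", name=name)
    module = importlib.util.module_from_spec(spec)   # 3. create module, set attributes
    sys.modules[name] = module
    try:
        spec.loader.exec_module(module)              # 4. loader runs the module code
    except BaseException:
        del sys.modules[name]
        raise
    return module

colorsys = manual_import("colorsys")
print(colorsys.__spec__.loader)
```

The finder never executes code and the loader never searches; the ModuleSpec is the only thing passed between them.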

2.1 ModuleSpec (Module Spec)

What is a module spec? Let's turn to the official documentation:

The import machinery uses a variety of information about each module during import, especially before loading. Most of the information is common to all modules. The purpose of a module’s spec is to encapsulate this import-related information on a per-module basis.

Using a spec during import allows state to be transferred between import system components, e.g. between the finder that creates the module spec and the loader that executes it. Most importantly, it allows the import machinery to perform the boilerplate operations of loading, whereas without a module spec the loader had that responsibility.

The module’s spec is exposed as the __spec__ attribute on a module object. See ModuleSpec for details on the contents of the module spec.

Put simply, it consolidates the information about a module. ModuleSpec was officially introduced in Python 3.4. According to PEP 451 -- A ModuleSpec Type for the Import System, ModuleSpec replaces the loader that finders used to return, decoupling the two and packaging the module information in one place.

Let's see what __spec__ contains:

(base) [root@VM-0-8-centos ~]# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.__spec__
ModuleSpec(name='sys', loader=<class '_frozen_importlib.BuiltinImporter'>)
>>> 

It includes the module's name and loader, plus other attributes that are not shown in the repr:

On ModuleSpec                 On Modules
name                          __name__
loader                        __loader__
parent                        __package__
origin                        __file__
cached                        __cached__
submodule_search_locations    __path__
loader_state                  -
has_location                  -

Each module attribute gives us the value of the corresponding ModuleSpec attribute, for example:

(base) [root@VM-0-8-centos ~]# python
Python 3.8.5 (default, Sep  4 2020, 07:30:14) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import sys
>>> sys.__name__
'sys'
>>> sys.__spec__.name
'sys'
>>> 

Compared with returning a loader directly, a ModuleSpec clearly carries richer information and is easier to build on.
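The mapping in the table can be checked on a module that lives on disk. This is a minimal sketch using the json package as an arbitrary example; getattr is used on the module side because some of these module-level attributes are being phased out in newer Python versions.

```python
import json

spec = json.__spec__
# (attribute on the spec, attribute on the module)
mapping = [
    ("name", "__name__"),
    ("loader", "__loader__"),
    ("parent", "__package__"),
    ("origin", "__file__"),
    ("submodule_search_locations", "__path__"),
]
for spec_attr, module_attr in mapping:
    on_spec = getattr(spec, spec_attr)
    on_module = getattr(json, module_attr, None)
    print(f"{spec_attr:28} {module_attr:14} same={on_spec == on_module}")

print("has_location:", spec.has_location)
```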

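2.2 Finders

Core idea: find_spec returns a ModuleSpec or None

The built-in finder classes live in sys.meta_path, and they all share one method, find_spec. The ModuleSpec object this method returns is what lets the loader go on to load the module. So to write a custom finder, the core task is to implement find_spec and return a ModuleSpec object for the requested module.

For customizing low-level classes like these, Python provides matching abstract classes that we can simply inherit from. For finders there are two: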

# importlib.abc.MetaPathFinder
class MetaPathFinder:
    def find_spec(self, fullname, path, target=None):
        '''
        An abstract method for finding a spec for the specified module. If this is a top-level import, path will be None. Otherwise, this is a search for a subpackage or module and path will be the value of __path__ from the parent package. If a spec cannot be found, None is returned. When passed in, target is a module object that the finder may use to make a more educated guess about what spec to return. importlib.util.spec_from_loader() may be useful for implementing concrete MetaPathFinders.
        '''

# importlib.abc.PathEntryFinder (note: no path argument here)
class PathEntryFinder:
    def find_spec(self, fullname, target=None):
        '''
        An abstract method for finding a spec for the specified module. The finder will search for the module only within the path entry to which it is assigned. If a spec cannot be found, None is returned. When passed in, target is a module object that the finder may use to make a more educated guess about what spec to return. importlib.util.spec_from_loader() may be useful for implementing concrete PathEntryFinders.
        '''

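At this point you may be wondering which of the two abstract classes to inherit from. The official documentation already explains it: the two are very similar, but their scopes differ.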

path entry finder

A finder returned by a callable on sys.path_hooks which knows how to locate modules given a path entry.

See importlib.abc.PathEntryFinder for the methods that path entry finders implement.

meta path finder

A finder returned by a search of sys.meta_path. Meta path finders are related to, but different from path entry finders.

See importlib.abc.MetaPathFinder for the methods that meta path finders implement.

The two are meant to be added to sys.path_hooks and sys.meta_path respectively, and the source code gives some hints of this:

def _register(abstract_cls, *classes):
    for cls in classes:
        # Register cls as a virtual subclass of the abstract class
        abstract_cls.register(cls)
        if _frozen_importlib is not None:
            try:
                frozen_cls = getattr(_frozen_importlib, cls.__name__)
            except AttributeError:
                frozen_cls = getattr(_frozen_importlib_external, cls.__name__)
            abstract_cls.register(frozen_cls)

_register(PathEntryFinder, machinery.FileFinder)

_register(MetaPathFinder, machinery.BuiltinImporter, machinery.FrozenImporter,
          machinery.PathFinder, machinery.WindowsRegistryFinder)

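PathEntryFinder registers FileFinder as its virtual subclass, and FileFinder is what one of the hooks in sys.path_hooks produces.

MetaPathFinder registers BuiltinImporter, FrozenImporter, and PathFinder as its virtual subclasses, and these are exactly the finders found in sys.meta_path.

Bonus question: what is the difference between subclassing an ABC directly and using register?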
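A short sketch can answer the bonus question. The classes Finder, Inherited, and Registered below are made up for the illustration; the last two lines apply the same check to the real import machinery classes.

```python
import abc
import importlib.abc
import importlib.machinery

class Finder(abc.ABC):
    @abc.abstractmethod
    def find(self, name): ...

    def describe(self):
        return f"{type(self).__name__} finder"

class Inherited(Finder):            # real subclass: Finder is in the MRO
    def find(self, name):
        return name

class Registered:                   # unrelated class, registered as a virtual subclass
    def find(self, name):
        return name

Finder.register(Registered)

# Both pass isinstance/issubclass checks...
print(isinstance(Inherited(), Finder), isinstance(Registered(), Finder))
# ...but only real inheritance puts the ABC in the MRO and inherits its methods.
print(Finder in Inherited.__mro__, Finder in Registered.__mro__)
print(hasattr(Inherited(), "describe"), hasattr(Registered(), "describe"))

# The same holds for the import machinery's own classes:
PathFinder = importlib.machinery.PathFinder
print(issubclass(PathFinder, importlib.abc.MetaPathFinder))
print(importlib.abc.MetaPathFinder in PathFinder.__mro__)
```

In short, register only affects isinstance and issubclass checks. A registered class inherits nothing from the ABC, and the ABC's abstract methods are not enforced on it.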

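Developers can choose whichever fits their needs; MetaPathFinder clearly has the broader scope.

To implement the finder protocol, we only need to create a class that inherits from MetaPathFinder, implement find_spec, and return a ModuleSpec for the requested module.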
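As an illustration, here is a minimal sketch of such a finder. DirectoryFinder, the temporary directory, and the module name demo_plugin_mod are all hypothetical. To stay short, the finder delegates the actual file search to the standard PathFinder, restricted to one extra directory, so no custom loader is needed.

```python
import importlib
import importlib.abc
import importlib.machinery
import os
import sys
import tempfile

class DirectoryFinder(importlib.abc.MetaPathFinder):
    """Hypothetical finder: resolves top-level modules from one extra directory."""

    def __init__(self, directory):
        self.directory = directory

    def find_spec(self, fullname, path, target=None):
        if path is not None:          # submodule import: leave it to the other finders
            return None
        # Reuse the standard path-based search, restricted to our directory.
        return importlib.machinery.PathFinder.find_spec(fullname, [self.directory])

plugin_dir = os.path.realpath(tempfile.mkdtemp())
with open(os.path.join(plugin_dir, "demo_plugin_mod.py"), "w") as fh:
    fh.write("GREETING = 'hello from plugin_dir'\n")

finder = DirectoryFinder(plugin_dir)
sys.meta_path.append(finder)          # consulted after the finders already registered
try:
    demo_plugin_mod = importlib.import_module("demo_plugin_mod")
finally:
    sys.meta_path.remove(finder)

print(demo_plugin_mod.GREETING, demo_plugin_mod.__spec__.origin)
```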

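2.3 Loaders

Core idea: exec_module is the key method. The core flow is the same for all loaders; different kinds of Loader are extended in different ways.

Compared with a finder's find_spec, a loader's exec_module is clearly more involved, because it deals with actually loading the module. Thanks to ModuleSpec, however, many implementation steps can now be skipped. Let's use the official documentation to see what the old load_module (the loading method before Python 3.4) had to do: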

If there is an existing module object named 'fullname' in sys.modules, the loader must use that existing module. (Otherwise, the reload() builtin will not work correctly.) If a module named 'fullname' does not exist in sys.modules, the loader must create a new module object and add it to sys.modules.

Note that the module object must be in sys.modules before the loader executes the module code. This is crucial because the module code may (directly or indirectly) import itself; adding it to sys.modules beforehand prevents unbounded recursion in the worst case and multiple loading in the best.

If the load fails, the loader needs to remove any module it may have inserted into sys.modules. If the module was already in sys.modules then the loader should leave it alone.

The __file__ attribute must be set. This must be a string, but it may be a dummy value, for example "". The privilege of not having a __file__ attribute at all is reserved for built-in modules.

The __name__ attribute must be set. If one uses imp.new_module() then the attribute is set automatically.

If it's a package, the __path__ variable must be set. This must be a list, but may be empty if __path__ has no further significance to the importer (more on this later).

The __loader__ attribute must be set to the loader object. This is mostly for introspection and reloading, but can be used for importer-specific extras, for example getting data associated with an importer.

The __package__ attribute must be set.

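The documentation shows that before load_module actually executes the module, we have to assign a large number of module attributes by hand and only then import the module. Here is the concrete example from the official documentation: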

# Consider using importlib.util.module_for_loader() to handle
# most of these details for you.
def load_module(self, fullname):
    # Get the module's code object
    code = self.get_code(fullname)
    ispkg = self.is_package(fullname)
    # Get (or create) the module, then set its attributes by hand
    mod = sys.modules.setdefault(fullname, imp.new_module(fullname))
    mod.__file__ = "<%s>" % self.__class__.__name__
    mod.__loader__ = self
    if ispkg:
        mod.__path__ = []
        mod.__package__ = fullname
    else:
        mod.__package__ = fullname.rpartition('.')[0]
    # Execute the code in the module's __dict__
    exec(code, mod.__dict__)
    return mod

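Since ModuleSpec was introduced, the attribute assignment is handled by library functions and we can skip it. This is how it is done now:

# Import directly through the module object derived from the ModuleSpec
def _new_module(name):
    # type(sys) is the module type, so this creates a new, empty module object
    return type(sys)(name)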

def _init_module_attrs(spec, module, *, override=False):
    # The passed-in module may be not support attribute assignment,
    # in which case we simply don't set the attributes.
    # __name__
    if (override or getattr(module, '__name__', None) is None):
        try:
            module.__name__ = spec.name
        except AttributeError:
            pass
    # __loader__
    if override or getattr(module, '__loader__', None) is None:
        loader = spec.loader
        if loader is None:
            # A backward compatibility hack.
            if spec.submodule_search_locations is not None:
                if _bootstrap_external is None:
                    raise NotImplementedError
                _NamespaceLoader = _bootstrap_external._NamespaceLoader

                loader = _NamespaceLoader.__new__(_NamespaceLoader)
                loader._path = spec.submodule_search_locations
                spec.loader = loader
                # While the docs say that module.__file__ is not set for
                # built-in modules, and the code below will avoid setting it if
                # spec.has_location is false, this is incorrect for namespace
                # packages.  Namespace packages have no location, but their
                # __spec__.origin is None, and thus their module.__file__
                # should also be None for consistency.  While a bit of a hack,
                # this is the best place to ensure this consistency.
                #
                # See # https://docs.python.org/3/library/importlib.html#importlib.abc.Loader.load_module
                # and bpo-32305
                module.__file__ = None
        try:
            module.__loader__ = loader
        except AttributeError:
            pass
    # __package__
    if override or getattr(module, '__package__', None) is None:
        try:
            module.__package__ = spec.parent
        except AttributeError:
            pass
    # __spec__
    try:
        module.__spec__ = spec
    except AttributeError:
        pass
    # __path__
    if override or getattr(module, '__path__', None) is None:
        if spec.submodule_search_locations is not None:
            try:
                module.__path__ = spec.submodule_search_locations
            except AttributeError:
                pass
    # __file__/__cached__
    if spec.has_location:
        if override or getattr(module, '__file__', None) is None:
            try:
                module.__file__ = spec.origin
            except AttributeError:
                pass

        if override or getattr(module, '__cached__', None) is None:
            if spec.cached is not None:
                try:
                    module.__cached__ = spec.cached
                except AttributeError:
                    pass
    return module

def module_from_spec(spec):
    """Create a module based on the provided spec."""
    # Typically loaders will not implement create_module().
    module = None
    if hasattr(spec.loader, 'create_module'):
        # If create_module() returns `None` then it means default
        # module creation should be used.
        module = spec.loader.create_module(spec)
    elif hasattr(spec.loader, 'exec_module'):
        raise ImportError('loaders that define exec_module() '
                          'must also define create_module()')
    if module is None:
        module = _new_module(spec.name)
    _init_module_attrs(spec, module)
    return module

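# Usage: create the module from its spec, then let the loader execute it
mod = importlib.util.module_from_spec(spec)
spec.loader.exec_module(mod)

# A loader's exec_module then only needs to compile and run the code
def exec_module(self, module):
    filename = self.get_filename(self.fullname)
    poc_code = self.get_data(filename)
    obj = compile(poc_code, filename, 'exec', dont_inherit=True, optimize=-1)
    exec(obj, module.__dict__)

With the basics of the exec_module protocol covered, let's follow the same approach we used for finders and see which official abstract classes we can build on. The official documentation (importlib.abc)[docs.python.org/3/library/i… gives the following class hierarchy: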

object
 +-- Finder (deprecated)
 |    +-- MetaPathFinder
 |    +-- PathEntryFinder
 +-- Loader
      +-- ResourceLoader --------+
      +-- InspectLoader          |
           +-- ExecutionLoader --+
                                 +-- FileLoader
                                 +-- SourceLoader

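The Loader abstract classes fall into three main kinds. The first is ResourceLoader, which has been superseded by ResourceReader since Python 3.7. The official recommendation for this class is to tie an implementation to a specific resource, that is, to load only from a specific package.

The other two loaders both derive from InspectLoader, so their underlying logic is the same: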

# importlib.abc
class Loader(metaclass=abc.ABCMeta):

    """Abstract base class for import loaders."""

    def create_module(self, spec):
        """Return a module to initialize and into which to load.

        This method should raise ImportError if anything prevents it
        from creating a new module.  It may return None to indicate
        that the spec should create the new module.
        """
        # By default, defer to default semantics for the new module.
        return None

    # We don't define exec_module() here since that would break
    # hasattr checks we do to support backward compatibility.

    def load_module(self, fullname):
        """Return the loaded module.

        The module must be added to sys.modules and have import-related
        attributes set properly.  The fullname is a str.

        ImportError is raised on failure.

        This method is deprecated in favor of loader.exec_module(). If
        exec_module() exists then it is used to provide a backwards-compatible
        functionality for this method.

        """
        if not hasattr(self, 'exec_module'):
            raise ImportError
        return _bootstrap._load_module_shim(self, fullname)

    def module_repr(self, module):
        """Return a module's repr.

        Used by the module type when the method does not raise
        NotImplementedError.

        This method is deprecated.

        """
        # The exception will cause ModuleType.__repr__ to ignore this method.
        raise NotImplementedError

class InspectLoader(Loader):

    """Abstract base class for loaders which support inspection about the
    modules they can load.

    This ABC represents one of the optional protocols specified by PEP 302.

    """

    def is_package(self, fullname):
        """Optional method which when implemented should return whether the
        module is a package.  The fullname is a str.  Returns a bool.

        Raises ImportError if the module cannot be found.
        """
        raise ImportError

    def get_code(self, fullname):
        """Method which returns the code object for the module.

        The fullname is a str.  Returns a types.CodeType if possible, else
        returns None if a code object does not make sense
        (e.g. built-in module). Raises ImportError if the module cannot be
        found.
        """
        source = self.get_source(fullname)
        if source is None:
            return None
        return self.source_to_code(source)

    @abc.abstractmethod
    def get_source(self, fullname):
        """Abstract method which should return the source code for the
        module.  The fullname is a str.  Returns a str.

        Raises ImportError if the module cannot be found.
        """
        raise ImportError

    @staticmethod
    def source_to_code(data, path='<string>'):
        """Compile 'data' into a code object.

        The 'data' argument can be anything that compile() can handle. The'path'
        argument should be where the data was retrieved (when applicable)."""
        return compile(data, path, 'exec', dont_inherit=True)

    exec_module = _bootstrap_external._LoaderBasics.exec_module
    load_module = _bootstrap_external._LoaderBasics.load_module

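The core functionality is clearly the exec_module method, but InspectLoader also implements several optional extensions on top of it; see pep-0302-Optional Extensions to the Importer Protocol.

2.4 Hook Registration

Once the import protocol is implemented, the custom importer has to be registered before it can be used. Depending on how it is registered, there are two kinds of hooks: meta hooks and path hooks.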

2.4.1 Meta Hooks

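Meta hooks are called at the very start of the import process. We can insert a meta hook at any position in sys.meta_path, including the very front, which makes it possible to override built-in modules, frozen modules, and so on.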
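The following is a minimal sketch of a meta hook, assuming a hypothetical DictImporter that serves modules from an in-memory dictionary and acts as both finder and loader. The module name virtual_demo_mod and its source are made up for the illustration.

```python
import importlib
import importlib.abc
import importlib.util
import sys

class DictImporter(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Hypothetical importer: serves modules from an in-memory {name: source} dict."""

    def __init__(self, sources):
        self.sources = sources

    def find_spec(self, fullname, path, target=None):
        if fullname not in self.sources:
            return None                       # let the next finder try
        return importlib.util.spec_from_loader(fullname, self, origin="<memory>")

    def create_module(self, spec):
        return None                           # use the default module object

    def exec_module(self, module):
        exec(self.sources[module.__name__], module.__dict__)

importer = DictImporter({"virtual_demo_mod": "ANSWER = 6 * 7\n"})
sys.meta_path.insert(0, importer)             # index 0: consulted before every other finder
try:
    virtual_demo_mod = importlib.import_module("virtual_demo_mod")
finally:
    sys.meta_path.remove(importer)

print(virtual_demo_mod.ANSWER, virtual_demo_mod.__spec__.origin)   # 42 <memory>
```

Because the importer sits at index 0, it would also win for any name that exists on disk, which is what makes overriding possible. A module already present in sys.modules is still returned from the cache without consulting any finder.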

2.4.2 Path hooks

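Path hooks, by contrast, are limited to the entries of sys.path. They are registered by inserting callables into sys.path_hooks. The finder a path hook produces for a path entry is stored in sys.path_importer_cache, and this cache is checked first each time before the path hooks are invoked.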
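Here is a minimal sketch of a path hook, under the assumption of a made-up "demo-virtual://" prefix that marks the sys.path entries our hook is responsible for. VirtualLoader, VirtualEntryFinder, virtual_path_hook, and the SOURCES dictionary are all hypothetical; the loader builds on InspectLoader so that only get_source has to be written.

```python
import importlib
import importlib.abc
import importlib.util
import sys

PREFIX = "demo-virtual://"                    # hypothetical marker for our path entries
SOURCES = {"hooked_demo_mod": "ORIGIN = 'path hook'\n"}

class VirtualLoader(importlib.abc.InspectLoader):
    def get_source(self, fullname):           # InspectLoader's exec_module builds on this
        return SOURCES[fullname]

    def is_package(self, fullname):
        return False

class VirtualEntryFinder(importlib.abc.PathEntryFinder):
    def __init__(self, entry):
        self.entry = entry

    def find_spec(self, fullname, target=None):
        if fullname not in SOURCES:
            return None
        return importlib.util.spec_from_loader(fullname, VirtualLoader(), origin=self.entry)

def virtual_path_hook(entry):
    if not (isinstance(entry, str) and entry.startswith(PREFIX)):
        raise ImportError("not a virtual entry")   # tells the machinery to try the next hook
    return VirtualEntryFinder(entry)

entry = PREFIX + "mods"
sys.path_hooks.append(virtual_path_hook)
sys.path.append(entry)
try:
    hooked_demo_mod = importlib.import_module("hooked_demo_mod")
    cached_finder = sys.path_importer_cache.get(entry)
finally:
    sys.path.remove(entry)
    sys.path_hooks.remove(virtual_path_hook)
    sys.path_importer_cache.pop(entry, None)

print(hooked_demo_mod.ORIGIN, type(cached_finder).__name__)
```

The hook is only called for path entries that are not yet in sys.path_importer_cache and that the hooks registered before it have rejected, which is why it is registered before the entry is added to sys.path.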

That's all for the principles part. Next time we'll bring in real scenarios and see which problems the Import System can solve in day-to-day work.