A Brief Look at How Flask Processes Requests

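I have been developing with the Flask framework lately. Most of that time goes into business-service code, but just as when I used Django, I remain curious about how the framework works internally. I have been reading the Flask source on and off to understand how it processes requests, but I never wrote it up as a complete workflow. This post summarizes what Flask does internally after it receives a request.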

Flask Framework

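Flask is a lightweight web application framework written in Python; its official documentation describes it as a microframework. "Lightweight" here is relative to Django, which describes itself as "The web framework for perfectionists with deadlines." Django really is a heavyweight tool: it ships with many convenient built-in components (Django forms, CSRF token protection, the Django ORM, the Django cache framework, and more) that cover most business scenarios and let users quickly build a fully featured web application. Flask takes the opposite approach: its design is more flexible, lighter, easier to extend, and more customizable. Flask therefore provides only the core functionality of a web application and leaves the remaining decisions to the user, such as which ORM to use, how to do form validation, or how to handle user login and session management. Mature solutions exist for all of these common problems: the Flask community maintains many extensions, each covering a different feature.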

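Back to the topic. When Flask is used as the web framework, you define an application object, which is an instance of the Flask class. Below is a simple way to create an app and start the application: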

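from flask import Flask
from flask_cors import CORS

# database, babel, security, construct_db_url and register_blueprint
# are defined elsewhere in the project.

def create_app(config):
    app = Flask(__name__)
    # Enable cross-origin resource sharing (CORS)
    CORS(app, resources=[r"/external-api/*"], origins=[r"https?:.*"])
    # Load the configuration file
    app.config.from_pyfile(config)
    # Configure the database connection
    app.config['SQLALCHEMY_DATABASE_URI'] = construct_db_url(config)
    # Initialize extensions
    database.init_app(app)
    babel.init_app(app)
    security.init_app(app)

    # Register the routes of each sub-module
    register_blueprint(app)

    return app

if __name__ == "__main__":
    app = create_app("config.py")
    app.run("0.0.0.0", port=5000)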

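The code above starts a simple Flask web app that listens for and handles requests on port 5000. This is for demonstration only. In production, nginx is usually placed in front of gunicorn or uwsgi to handle HTTP requests more efficiently, because Flask's built-in development server is not designed to handle concurrent production traffic. The Flask class defines a __call__ method, so app is a callable object. When a request arrives, a worker of a WSGI server such as gunicorn or uwsgi calls app to handle it: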

def __call__(self, environ, start_response):
    """Shortcut for :attr:`wsgi_app`."""
    return self.wsgi_app(environ, start_response)
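To make the calling convention concrete, here is a minimal sketch of a WSGI callable that uses only the standard library. TinyApp and call_app are hypothetical names for illustration; a real server such as gunicorn builds environ from the incoming HTTP request rather than from wsgiref's testing defaults.

```python
from wsgiref.util import setup_testing_defaults


class TinyApp:
    """A stand-in for the Flask class: callable, and delegating to wsgi_app."""

    def wsgi_app(self, environ, start_response):
        body = ("Hello from " + environ["PATH_INFO"]).encode("utf-8")
        headers = [
            ("Content-Type", "text/plain; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]  # a WSGI app returns an iterable of bytes

    def __call__(self, environ, start_response):
        return self.wsgi_app(environ, start_response)


def call_app(app, path):
    """Play the role of the WSGI server: build environ and call the app."""
    environ = {}
    setup_testing_defaults(environ)  # fills in the required WSGI keys
    environ["PATH_INFO"] = path
    captured = {}

    def start_response(status, headers, exc_info=None):
        captured["status"] = status
        captured["headers"] = headers

    body = b"".join(app(environ, start_response))
    return captured["status"], body


status, body = call_app(TinyApp(), "/ping")
```

The server only needs an object it can call with (environ, start_response), which is why defining __call__ is enough to plug an application object into gunicorn or uwsgi.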

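The wsgi_app method is the core of Flask's request handling and is defined as follows. For background on the WSGI protocol and the meaning of its parameters, see the article reposted earlier.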

def wsgi_app(self, environ, start_response):
    ctx = self.request_context(environ)
    error = None
    try:
        try:
            # Bind the contexts, then dispatch the request
            ctx.push()
            response = self.full_dispatch_request()
        except Exception as e:
            error = e
            response = self.make_response(self.handle_exception(e))
        except:
            error = sys.exc_info()[1]
            raise
        return response(environ, start_response)
    finally:
        if self.should_ignore_error(error):
            error = None
        # Always pop the context, whether or not an error occurred
        ctx.auto_pop(error)

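To understand each step in wsgi_app, you first need to understand Flask's notion of a context. When handling a request (implementing an API) with Flask, you typically use the current_app object to read configuration and the request object to read request parameters. Although current_app and request look like global variables, users do not need to worry about safety across processing units such as threads: in each processing unit, current_app and request are independent of each other. When a request arrives, the Flask app creates the context for the thread handling that request; current_app belongs to the application context (App Context) and request belongs to the request context (Request Context).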

App Context

The application context keeps track of the application-level data during a request, CLI command, or other activity. Rather than passing the application around to each function, the current_app and g proxies are accessed instead.

The application context is created and destroyed as necessary. When a Flask application begins handling a request, it pushes an application context and a request context. When the request ends it pops the request context then the application context. Typically, an application context will have the same lifetime as a request.

Request Context

The request context keeps track of the request-level data during a request. Rather than passing the request object to each function that runs during a request, the request and session proxies are accessed instead.

When the Flask application handles a request, it creates a Request object based on the environment it received from the WSGI server. Because a worker (thread, process, or coroutine depending on the server) handles only one request at a time, the request data can be considered global to that worker during that request. Flask uses the term context local for this.

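While a request is being handled, the app context tracks application-level data, such as the app's config, and the request context tracks request-level data, such as the request's parameters and method. When the Flask app receives a request, it pushes the app context and then the request context, binding both to the current processing unit. When the request is finished, it pops the request context and then the app context, which unbinds them from the processing unit and destroys them. For the duration of a request within one processing unit, request and session can be treated as global variables.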
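The push and pop order described above can be sketched with two plain lists. This is only an illustration of the ordering: app_ctx_stack, request_ctx_stack, and handle are hypothetical names, and unlike Flask's real stacks these lists are not isolated per thread.

```python
events = []
app_ctx_stack = []
request_ctx_stack = []


def handle(request_line):
    # Push order: application context first, then request context.
    app_ctx_stack.append({"app": "demo-app"})
    events.append("push app context")
    request_ctx_stack.append({"request": request_line})
    events.append("push request context")
    try:
        # While both contexts are on their stacks, the handler can
        # look at the top of each one.
        events.append("handle " + request_ctx_stack[-1]["request"])
        return "ok"
    finally:
        # Pop order is the reverse: request context first, then app context.
        request_ctx_stack.pop()
        events.append("pop request context")
        app_ctx_stack.pop()
        events.append("pop app context")


result = handle("GET /users")
```

After handle returns, both stacks are empty again, which corresponds to the contexts being unbound once the request is finished.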

Understanding Flask's Global Variables

Before digging into wsgi_app, you need to understand the core objects current_app, request, and session. Their definitions show that in Flask they are all instances of the LocalProxy class:

# flask/globals.py
current_app = LocalProxy(_find_app)
request = LocalProxy(partial(_lookup_req_object, 'request'))
session = LocalProxy(partial(_lookup_req_object, 'session'))

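LocalProxy is a proxy class defined in werkzeug. It overrides special methods such as __setitem__ and __getitem__, so any operation on a LocalProxy instance is forwarded to the object it proxies. The class is defined as follows (only part of it is shown):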

class LocalProxy(object):
  
    def __init__(self, local, name=None):
        object.__setattr__(self, '_LocalProxy__local', local)
        object.__setattr__(self, '__name__', name)
        if callable(local) and not hasattr(local, '__release_local__'):
            object.__setattr__(self, '__wrapped__', local)
    
    def _get_current_object(self):
        if not hasattr(self.__local, '__release_local__'):
            return self.__local() 
        try:
            return getattr(self.__local, self.__name__)
        except AttributeError:
            raise RuntimeError('no object bound to %s' % self.__name__)
            
    def __getattr__(self, name):
        if name == '__members__':
            return dir(self._get_current_object())
        return getattr(self._get_current_object(), name)
            
    def __setitem__(self, key, value):
        self._get_current_object()[key] = value
        
    __setattr__ = lambda x, n, v: setattr(x._get_current_object(), n, v)
    __getitem__ = lambda x, i: x._get_current_object()[i]

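In __init__, object.__setattr__ defines an instance attribute named __local whose value is the first argument passed when the LocalProxy was created. __local is what the LocalProxy actually proxies, and the __name__ attribute is defined to go with it. Note that in Python, a variable or method whose name starts with a double underscore is treated as private, and _LocalProxy__local is the name-mangled form through which such a private attribute is accessed. Python does not restrict access to private variables as strictly as C++ does: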

Python performs name mangling of private variables. Every member with double underscore will be changed to _object._class__variable. If so required, it can still be accessed from outside the class, but the practice should be refrained.
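The mangling rule is easy to check in a few lines. The sketch below uses two hypothetical classes, Account and Holder; Holder writes the mangled name through object.__setattr__ in the same way LocalProxy.__init__ does.

```python
class Account:
    def __init__(self, balance):
        # Inside the class body, __balance is rewritten to _Account__balance.
        self.__balance = balance

    def balance(self):
        return self.__balance


class Holder:
    def __init__(self, target):
        # Write the mangled name directly, bypassing any custom
        # __setattr__ the class might define.
        object.__setattr__(self, "_Holder__target", target)

    def target(self):
        # Mangled to self._Holder__target, so it finds the value set above.
        return self.__target


acct = Account(100)
mangled_value = acct._Account__balance  # works, but is discouraged

try:
    acct.__balance  # outside the class body no mangling is applied
    direct_access_failed = False
except AttributeError:
    direct_access_failed = True
```

Writing through object.__setattr__ matters for a proxy class because its own __setattr__ forwards every assignment to the proxied object.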

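Operations on a LocalProxy object are not forwarded directly to its __local attribute. Instead, _get_current_object() is called first, and the operation is forwarded to whatever it returns. Going back to the definitions of current_app, request, and session, the things they proxy are all functions rather than plain objects: current_app proxies the _find_app function, while request and session proxy partial applications of _lookup_req_object. Combined with the definition of _get_current_object(), this shows that what it returns is the result of calling these functions. This is essential to how Flask's global variables (current_app, request, and so on) stay isolated between processing units, as discussed later.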
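A stripped-down proxy makes the forwarding behavior easier to see. MiniProxy below is a hypothetical simplification and not werkzeug's LocalProxy: it supports only attribute and item access, and it always treats the proxied thing as a function to call.

```python
class MiniProxy:
    def __init__(self, lookup):
        # Store the lookup function under its mangled private name.
        object.__setattr__(self, "_MiniProxy__lookup", lookup)

    def _get_current_object(self):
        # Re-run the lookup on every access instead of caching the result.
        return self.__lookup()

    def __getattr__(self, name):
        return getattr(self._get_current_object(), name)

    def __getitem__(self, key):
        return self._get_current_object()[key]

    def __setitem__(self, key, value):
        self._get_current_object()[key] = value


stack = [{"path": "/first"}]


def current():
    return stack[-1]  # whatever is on top of the stack right now


proxy = MiniProxy(current)
snapshot = current()           # resolved once, then frozen
first = proxy["path"]
stack.append({"path": "/second"})
second = proxy["path"]         # the proxy follows the new top of the stack
```

The proxy object never changes, yet what it resolves to follows the top of the stack, while snapshot keeps pointing at the first dictionary.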

Next, here is how the functions proxied by current_app, request, and session are defined:

def _find_app():
    top = _app_ctx_stack.top
    if top is None:
        raise RuntimeError(_app_ctx_err_msg)
    return top.app

def _lookup_req_object(name):
    top = _request_ctx_stack.top
    if top is None:
        raise RuntimeError(_request_ctx_err_msg)
    return getattr(top, name)

def _lookup_app_object(name):
    top = _app_ctx_stack.top
    if top is None:
        raise RuntimeError(_app_ctx_err_msg)
    return getattr(top, name)

_request_ctx_stack = LocalStack()
_app_ctx_stack = LocalStack()

current_app = LocalProxy(_find_app)
request = LocalProxy(partial(_lookup_req_object, 'request'))
session = LocalProxy(partial(_lookup_req_object, 'session'))
g = LocalProxy(partial(_lookup_app_object, 'g'))

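The _find_app function proxied by current_app returns _app_ctx_stack.top.app, and the partial applications of _lookup_req_object proxied by request and session return the request and session attributes of _request_ctx_stack.top. As the names suggest, _app_ctx_stack and _request_ctx_stack hold the application context and request context described above. Both are LocalStack instances, a last-in, first-out data structure whose operations all happen at the top of the stack. It is defined as follows: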

class LocalStack(object):
    def __init__(self):
        self._local = Local()

    def __release_local__(self):
        self._local.__release_local__()

    def _get__ident_func__(self):
        return self._local.__ident_func__

    def _set__ident_func__(self, value):
        object.__setattr__(self._local, '__ident_func__', value)
    __ident_func__ = property(_get__ident_func__, _set__ident_func__)
    
    def __call__(self):
        def _lookup():
            rv = self.top
            if rv is None:
                raise RuntimeError('object unbound')
            return rv
        return LocalProxy(_lookup)

    def push(self, obj):
        rv = getattr(self._local, 'stack', None)
        if rv is None:
            self._local.stack = rv = []
        rv.append(obj)
        return rv

    def pop(self):
        stack = getattr(self._local, 'stack', None)
        if stack is None:
            return None
        elif len(stack) == 1:
            release_local(self._local)
            return stack[-1]
        else:
            return stack.pop()

    @property
    def top(self):
        try:
            return self._local.stack[-1]
        except (AttributeError, IndexError):
            return None

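LocalStack exposes the push and pop methods and the top property. Internally it delegates to its _local member and uses the _local.stack attribute to emulate a stack; _local is a Local object created at initialization. LocalStack also defines __call__, so it is a callable object. Since LocalStack builds on the Local type, here is a brief look at the definition of Local (partial):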

class Local(object):
    def __init__(self):
        object.__setattr__(self, '__storage__', {})
        object.__setattr__(self, '__ident_func__', get_ident)

    def __call__(self, proxy):
        return LocalProxy(self, proxy)

    def __release_local__(self):
        self.__storage__.pop(self.__ident_func__(), None)

    def __getattr__(self, name):
        try:
            return self.__storage__[self.__ident_func__()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        ident = self.__ident_func__()
        storage = self.__storage__
        try:
            storage[ident][name] = value
        except KeyError:
            storage[ident] = {name: value}

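Local maintains a dictionary named __storage__ that holds all of its data. The clever part of the design is the __ident_func__ attribute, which points to the get_ident function. get_ident returns a unique non-zero integer identifying the current processing unit, which can be a thread or a coroutine (greenlet). Local overrides __getattr__ and __setattr__, which are invoked when an attribute of a Local object is read or set. Both first obtain the identifier of the current processing unit through __ident_func__, then operate on the data structure (also a dictionary) that belongs to that unit. Because __ident_func__ identifies the current processing unit, data read or written is valid only for that unit and invisible to the others. This ensures thread (processing-unit) safety without locks or other synchronization.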
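The isolation can be reproduced with a small sketch that keys a dictionary by threading.get_ident(). MiniLocal is a hypothetical reduction of the Local class shown above, restricted to threads. The barrier keeps both worker threads alive at the same time, so their identifiers are guaranteed to differ.

```python
import threading


class MiniLocal:
    def __init__(self):
        # Bypass our own __setattr__ when creating the storage dictionary.
        object.__setattr__(self, "_storage", {})

    def __getattr__(self, name):
        # Only called when normal lookup fails, i.e. for per-thread names.
        try:
            return self._storage[threading.get_ident()][name]
        except KeyError:
            raise AttributeError(name)

    def __setattr__(self, name, value):
        # Each thread writes into the sub-dictionary keyed by its own ident.
        self._storage.setdefault(threading.get_ident(), {})[name] = value


local = MiniLocal()
seen = {}
barrier = threading.Barrier(2)


def worker(name):
    local.user = name
    barrier.wait(timeout=5)  # both threads have written before either reads
    seen[name] = local.user  # each thread still reads its own value


threads = [threading.Thread(target=worker, args=(n,)) for n in ("alice", "bob")]
for t in threads:
    t.start()
for t in threads:
    t.join()

main_has_user = hasattr(local, "user")  # the main thread never set it
```

Both threads assign to the same attribute name on the same object, but neither overwrites the other, and the main thread sees no user attribute at all.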

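Back in Flask, API or view functions frequently use the three global variables current_app, request, and session. "Global" here means shared by all processing units (threads or coroutines) in the process. When a Flask application runs as a single process with multiple threads, printing the id of current_app, request, or session always gives the same value. This often puzzles people new to Flask (it certainly puzzled me): how can different threads use the same variable to read different contexts and handle different requests, without having to deal with thread safety? The reason is that these global variables are built on the Local data structures defined in werkzeug and are bound to the current processing unit through proxying, or lazy loading. This keeps the interface simple and lets developers focus on business logic.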
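The apparent paradox (one object id, different values per thread) can be reproduced in a short sketch. It uses the standard library's threading.local in place of werkzeug's Local; RequestProxy, FakeRequest, and handle are hypothetical names.

```python
import threading

_ctx = threading.local()  # per-thread storage from the standard library


class FakeRequest:
    def __init__(self, path):
        self.path = path


class RequestProxy:
    def __getattr__(self, name):
        # Resolve against the top of the *current thread's* stack.
        return getattr(_ctx.stack[-1], name)


request = RequestProxy()  # one module-level object shared by all threads
results = {}
barrier = threading.Barrier(2)


def handle(path):
    _ctx.stack = [FakeRequest(path)]  # "push" this thread's request
    barrier.wait(timeout=5)           # both requests are now in flight
    results[path] = (id(request), request.path)
    _ctx.stack.pop()


threads = [threading.Thread(target=handle, args=(p,)) for p in ("/a", "/b")]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both threads report the same id(request), because there really is only one proxy object, yet request.path differs because the lookup happens inside each thread.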

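To summarize the roles of the three Local data structures: the Local class keys its internal storage, the __storage__ attribute, by __ident_func__, so operations in different processing units are isolated and do not interfere with each other. The LocalStack class holds a Local attribute and forwards operations on a LocalStack instance to it, so it offers the same isolation as Local; the difference is that LocalStack is a last-in, first-out stack and all operations happen at the top. The LocalProxy class provides late binding (lazy loading) for the object it proxies. For example, request is a LocalProxy that proxies the partial function partial(_lookup_req_object, 'request'). Every access to or operation on request calls that partial function, which returns _request_ctx_stack.top.request. Since _request_ctx_stack is a LocalStack, every access is bound to the current processing unit. Now suppose request did not use a LocalProxy to proxy partial(_lookup_req_object, 'request'), but pointed directly at the object that the partial function returns: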

request = partial(_lookup_req_object, 'request')()

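Then the value of request would be fixed: it would always point to whatever _request_ctx_stack.top.request was when it was first evaluated. Likewise, if session were defined this way, different processing units could not reach the session of their own context. The figure below shows how LocalProxy works: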

How wsgi_app Handles the Flask Request Flow

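Returning to wsgi_app, let us go through what it does step by step. First, ctx = self.request_context(environ) creates a RequestContext object for the current processing unit from the environ dictionary; it contains all information related to the current request. Next, the RequestContext's push method (ctx.push) is called to bind it to the current processing unit's context. push is implemented as follows: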

class RequestContext(object):
    # other methods
    def push(self):
        top = _request_ctx_stack.top
        if top is not None and top.preserved:
            top.pop(top._preserved_exc)

        app_ctx = _app_ctx_stack.top
        if app_ctx is None or app_ctx.app != self.app:
            app_ctx = self.app.app_context()
            app_ctx.push()
            self._implicit_app_ctx_stack.append(app_ctx)
        else:
            self._implicit_app_ctx_stack.append(None)

        _request_ctx_stack.push(self)
        self.session = self.app.open_session(self.request)
        if self.session is None:
            self.session = self.app.make_null_session()

We already know that _request_ctx_stack and _app_ctx_stack hold the request context and the application context, so push performs the following operations:

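It pushes the RequestContext object built from the current request's environ onto _request_ctx_stack, which binds the current request to _request_ctx_stack. This step is crucial. _request_ctx_stack is a LocalStack, which guarantees isolation between processing units. Combined with the definitions of request and session in globals.py, this is where the values that request and session point to become associated with the current processing unit; those values are in fact the RequestContext attributes of the same names.
It pushes an AppContext object onto _app_ctx_stack, which is also a LocalStack. From the definition of current_app in globals.py, this is likewise where the value that current_app points to becomes associated with the current processing unit. In the code, this happens before the RequestContext is pushed, and only when no AppContext for the same app is already on top of the stack.
It opens or creates the session object. By default Flask uses SecureCookieSessionInterface, which reads session data from a cookie. You can override Flask's session_interface to change how sessions are read and stored, for example with a custom redis_interface that persists sessions in Redis. As mentioned above, the global session object used in a processing unit is the session attribute of the RequestContext, which is the value obtained here through open_session.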
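The idea of a signed cookie session can be illustrated with the standard library alone. The sketch below is not how SecureCookieSessionInterface is implemented (Flask relies on the itsdangerous package for signing); SECRET_KEY and the cookie format are assumptions made for the example, and the open_session and save_session functions here only borrow the names of the interface methods.

```python
import base64
import hashlib
import hmac
import json

SECRET_KEY = b"demo-secret"  # hypothetical key, for illustration only


def _sign(payload):
    return hmac.new(SECRET_KEY, payload, hashlib.sha256).hexdigest()


def save_session(session):
    """Serialize the session and append a signature: '<payload>.<signature>'."""
    raw = json.dumps(session, sort_keys=True).encode("utf-8")
    payload = base64.urlsafe_b64encode(raw)
    return payload.decode("ascii") + "." + _sign(payload)


def open_session(cookie):
    """Return the session stored in the cookie, or an empty one if invalid."""
    if not cookie or "." not in cookie:
        return {}
    payload, signature = cookie.rsplit(".", 1)
    expected = _sign(payload.encode("utf-8"))
    # Constant-time comparison, done on bytes so odd input cannot raise.
    if not hmac.compare_digest(signature.encode("utf-8"), expected.encode("utf-8")):
        return {}
    return json.loads(base64.urlsafe_b64decode(payload.encode("utf-8")))


cookie = save_session({"user_id": 42})
restored = open_session(cookie)
tampered = open_session("x" + cookie)  # payload changed, signature no longer matches
```

Because the data lives in the cookie and is only signed, the client can read it but cannot modify it without invalidating the signature. A Redis-backed interface would instead store the data on the server and keep only a session id in the cookie.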

Once the request context and application context are bound, Flask.full_dispatch_request is called to dispatch the request:

def full_dispatch_request(self):
    self.try_trigger_before_first_request_functions()
    try:
        request_started.send(self)
        # A non-None value from a before_request hook skips the view
        rv = self.preprocess_request()
        if rv is None:
            rv = self.dispatch_request()
    except Exception as e:
        rv = self.handle_user_exception(e)
    return self.finalize_request(rv)

Reading the code, full_dispatch_request and the methods it calls mainly do the following:

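Before the request is handled, preprocessing runs, which includes calling the before_first_request and before_request hooks. These hooks are usually functions decorated with @app.before_first_request and @app.before_request, and they are called before the request is actually handled.
dispatch_request runs and matches the request URL to the corresponding view function, for example one defined in a blueprint. View functions usually contain our business logic. After the view function returns, a response object is built from its return value. As with preprocessing, the hooks registered through the after_request and after_this_request decorators are called before the response is returned.
save_session runs and stores the session created by open_session before the request was handled. Flask again uses SecureCookieSessionInterface by default, which signs the session data and stores it in a cookie that is returned to the client in the HTTP response. You can replace session_interface and override save_session to persist the session in Redis or another store.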
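The hook ordering in the steps above can be sketched without Flask. MiniApp is a hypothetical, heavily simplified stand-in: its hooks receive the path as an argument and the response is a plain dictionary, neither of which matches Flask's real signatures.

```python
class MiniApp:
    def __init__(self):
        self.before_funcs = []
        self.after_funcs = []
        self.routes = {}

    def before_request(self, func):
        self.before_funcs.append(func)
        return func

    def after_request(self, func):
        self.after_funcs.append(func)
        return func

    def route(self, path):
        def decorator(func):
            self.routes[path] = func
            return func
        return decorator

    def full_dispatch_request(self, path):
        rv = None
        # Preprocessing: the first hook that returns a value short-circuits.
        for func in self.before_funcs:
            rv = func(path)
            if rv is not None:
                break
        if rv is None:
            rv = self.routes[path]()  # dispatch to the matching view
        response = {"body": rv, "headers": {}}
        # Postprocessing: every after hook may modify the response.
        for func in reversed(self.after_funcs):
            response = func(response)
        return response


app = MiniApp()
calls = []


@app.before_request
def block_admin(path):
    calls.append("before")
    if path == "/admin":
        return "forbidden"


@app.route("/hello")
def hello():
    calls.append("view")
    return "hello"


@app.route("/admin")
def admin():
    calls.append("view")
    return "secret"


@app.after_request
def add_header(response):
    calls.append("after")
    response["headers"]["X-Demo"] = "1"
    return response


ok = app.full_dispatch_request("/hello")
blocked = app.full_dispatch_request("/admin")
```

The second call never reaches the admin view, because the before hook returned a value, but the after hook still runs on the resulting response.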

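Once these steps are complete, the current processing unit has finished one unit of work. The last step of wsgi_app therefore calls ctx.auto_pop, which pops the current context off the stack, mirroring the earlier push.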

Summary

Having walked through the source code for how Flask handles an HTTP request, the main steps can be summarized as follows:

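Receive the HTTP request forwarded by the WSGI gateway
Create the contexts (AppContext and RequestContext) from environ
Push the contexts so they are associated with the current processing unit
Dispatch the request (dispatch_request): find the view function matching the request URL and run the business logic
Build the response and pop the contexts
Return the response to the client through the WSGI gateway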

This can be shown in a simple UML diagram:

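In practice, Flask does more than the steps above when handling a request. For example, it sends signals at different stages of processing so that other modules can react to them in a decoupled way, which improves extensibility. Also, when the AppContext is bound, the request-level global variable g is created to share data within a request, and so on.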
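The signal mechanism is essentially the observer pattern. The sketch below is a hypothetical minimal Signal class, not the blinker library that Flask's signals are built on; it mirrors only the general connect/send shape.

```python
class Signal:
    def __init__(self, name):
        self.name = name
        self.receivers = []

    def connect(self, receiver):
        self.receivers.append(receiver)
        return receiver  # returning it allows use as a decorator

    def send(self, sender, **extra):
        # Receivers run one after another, in the sender's own thread.
        return [(r, r(sender, **extra)) for r in self.receivers]


request_started = Signal("request-started")
request_finished = Signal("request-finished")
log = []


@request_started.connect
def on_start(sender, **extra):
    log.append(("started", sender))


@request_finished.connect
def on_finish(sender, response=None, **extra):
    log.append(("finished", response))


def handle(app_name):
    request_started.send(app_name)
    response = "ok"
    request_finished.send(app_name, response=response)
    return response


result = handle("demo")
```

The request-handling code does not need to know who is listening, so subscribers such as logging or metrics modules can be added without touching it. In this sketch the receivers are called synchronously, so a slow receiver delays the request.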