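Hi everyone,
While using httpx to reach HTTPS services through an HTTP proxy, I ran into a connection pool exhaustion problem that is hard to reproduce but fatal: after the program has been running for a while, every request raises PoolTimeout, even though the proxy itself is healthy (browsing through it works fine).
After tracing through the source of httpx and the underlying httpcore, I found the root cause: when the CONNECT tunnel is established successfully but the subsequent TLS handshake fails, the connection object's state stays at ACTIVE permanently. It can neither be reused nor cleaned up by the pool, and these "zombie connections" eventually fill the entire pool.
I have submitted a fix and would appreciate feedback from the community:
PR link: github.com/encode/http…
Below is my full analysis, focusing on httpcore's state machine transitions and exception handling boundaries.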
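Deep dive: tracing the state machine and the exception flow
To pin down the root cause of PoolTimeout, I started from AsyncHTTPProxy and traced httpcore's request lifecycle line by line.
Connection pool scheduling and implementation details
The AsyncHTTPProxy class inherits from AsyncConnectionPool.
class AsyncHTTPProxy(AsyncConnectionPool):  # pragma: nocover
    """
    A connection pool that sends requests via an HTTP proxy.
    """
When a request enters the pool, AsyncConnectionPool.handle_async_request is invoked. It puts the request on a queue and enters a while True loop to wait for a connection to be assigned: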
# AsyncConnectionPool.handle_async_request
...
while True:
    with self._optional_thread_lock:
        # Assign incoming requests to available connections,
        # closing or creating new connections as required.
        closing = self._assign_requests_to_connections()
    await self._close_connections(closing)
    # Wait until this request has an assigned connection.
    connection = await pool_request.wait_for_connection(timeout=timeout)
    try:
        # Send the request on the assigned connection.
        response = await connection.handle_async_request(
            pool_request.request
        )
    except ConnectionNotAvailable:
        # In some cases a connection may initially be available to
        # handle a request, but then become unavailable.
        #
        # In this case we clear the connection and try again.
        pool_request.clear_connection()
    else:
        break  # pragma: nocover
...
The logic here: if the assigned connection turns out to be unavailable, ConnectionNotAvailable triggers a retry; otherwise the response is returned normally.
The core of connection assignment lives in _assign_requests_to_connections. On the first request the pool is empty, so the "creating new connection" branch is taken:
# AsyncConnectionPool._assign_requests_to_connections
...
if available_connections:
    # log: "reusing existing connection"
    connection = available_connections[0]
    pool_request.assign_to_connection(connection)
elif len(self._connections) < self._max_connections:
    # log: "creating new connection"
    connection = self.create_connection(origin)
    self._connections.append(connection)
    pool_request.assign_to_connection(connection)
elif idle_connections:
    # log: "closing idle connection"
    connection = idle_connections[0]
    self._connections.remove(connection)
    closing_connections.append(connection)
    # log: "creating new connection"
    connection = self.create_connection(origin)
    self._connections.append(connection)
    pool_request.assign_to_connection(connection)
...
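The branch order above can be modelled in a few lines. This is a minimal sketch, not httpcore's code: ToyPool, ToyConnection and State are hypothetical names, only a single origin is modelled (so the "closing idle connection" branch is left out, since every idle connection is also reusable), and a connection is marked ACTIVE at assignment time as a simplification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Tuple


class State(Enum):
    NEW = "new"
    ACTIVE = "active"
    IDLE = "idle"
    CLOSED = "closed"


@dataclass
class ToyConnection:
    # Hypothetical stand-in for a pooled connection.
    state: State = State.NEW


@dataclass
class ToyPool:
    # Hypothetical stand-in for the pool, single origin only.
    max_connections: int
    connections: List[ToyConnection] = field(default_factory=list)

    def assign(self) -> Tuple[str, Optional[ToyConnection]]:
        """Pick a connection for one request, following the branch order."""
        available = [c for c in self.connections if c.state is State.IDLE]
        if available:
            connection = available[0]
            branch = "reuse"
        elif len(self.connections) < self.max_connections:
            connection = ToyConnection()
            self.connections.append(connection)
            branch = "create"
        else:
            # No branch matched: the request stays queued.
            return "wait", None
        # Simplification: mark the connection busy right away.
        connection.state = State.ACTIVE
        return branch, connection
```

With max_connections=2, two calls create connections and a third has to wait until one of them becomes IDLE again. A connection that never leaves ACTIVE keeps its slot for good.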
Note that although the parent class AsyncConnectionPool defines create_connection, AsyncHTTPProxy overrides it and returns proxy-aware connections instead of a plain direct connection: AsyncForwardHTTPConnection for http origins, and AsyncTunnelHTTPConnection, which handles the proxy tunnel, for everything else.
def create_connection(self, origin: Origin) -> AsyncConnectionInterface:
    if origin.scheme == b"http":
        return AsyncForwardHTTPConnection(
            proxy_origin=self._proxy_url.origin,
            proxy_headers=self._proxy_headers,
            remote_origin=origin,
            keepalive_expiry=self._keepalive_expiry,
            network_backend=self._network_backend,
            proxy_ssl_context=self._proxy_ssl_context,
        )
    return AsyncTunnelHTTPConnection(
        proxy_origin=self._proxy_url.origin,
        proxy_headers=self._proxy_headers,
        remote_origin=origin,
        ssl_context=self._ssl_context,
        proxy_ssl_context=self._proxy_ssl_context,
        keepalive_expiry=self._keepalive_expiry,
        http1=self._http1,
        http2=self._http2,
        network_backend=self._network_backend,
    )
For HTTPS requests, create_connection returns an AsyncTunnelHTTPConnection instance. At this point only the object has been instantiated; the actual TCP connection and TLS handshake have not happened yet.
Establishing the tunnel
Back to the main loop of AsyncConnectionPool.handle_async_request. After _assign_requests_to_connections creates and assigns a connection, the code waits for the connection to be ready and then enters the try block to perform the actual request:
# AsyncConnectionPool.handle_async_request
...
connection = await pool_request.wait_for_connection(timeout=timeout)
try:
    # Send the request on the assigned connection.
    response = await connection.handle_async_request(
        pool_request.request
    )
except ConnectionNotAvailable:
    # In some cases a connection may initially be available to
    # handle a request, but then become unavailable.
    #
    # In this case we clear the connection and try again.
    pool_request.clear_connection()
else:
    break  # pragma: nocover
...
The connection here is the AsyncTunnelHTTPConnection instance created in the previous step. connection.handle_async_request leads into the second level of logic.
# AsyncConnectionPool.handle_async_request
...
    # Assign incoming requests to available connections,
    # closing or creating new connections as required.
    closing = self._assign_requests_to_connections()
await self._close_connections(closing)
...
The closing list returned by _assign_requests_to_connections is empty, because on first creation there are no expired connections to clean up. The request is then dispatched to the AsyncTunnelHTTPConnection instance and enters its handle_async_request method.
# AsyncConnectionPool.handle_async_request
...
# Wait until this request has an assigned connection.
connection = await pool_request.wait_for_connection(timeout=timeout)
try:
    # Send the request on the assigned connection.
    response = await connection.handle_async_request(
        pool_request.request
    )
...
connection.handle_async_request is AsyncTunnelHTTPConnection.handle_async_request. It first checks the self._connected flag: for a new connection, it builds an HTTP CONNECT request and sends it to the proxy server.
# AsyncTunnelHTTPConnection.handle_async_request
...
async with self._connect_lock:
    if not self._connected:
        target = b"%b:%d" % (self._remote_origin.host, self._remote_origin.port)
        connect_url = URL(
            scheme=self._proxy_origin.scheme,
            host=self._proxy_origin.host,
            port=self._proxy_origin.port,
            target=target,
        )
        connect_headers = merge_headers(
            [(b"Host", target), (b"Accept", b"*/*")], self._proxy_headers
        )
        connect_request = Request(
            method=b"CONNECT",
            url=connect_url,
            headers=connect_headers,
            extensions=request.extensions,
        )
        connect_response = await self._connection.handle_async_request(
            connect_request
        )
...
The CONNECT request is sent through self._connection.handle_async_request(). This self._connection is initialized in AsyncTunnelHTTPConnection's __init__.
# AsyncTunnelHTTPConnection.__init__
...
self._connection: AsyncConnectionInterface = AsyncHTTPConnection(
    origin=proxy_origin,
    keepalive_expiry=keepalive_expiry,
    network_backend=network_backend,
    socket_options=socket_options,
    ssl_context=proxy_ssl_context,
)
...
self._connection is an AsyncHTTPConnection instance (in connection.py). Calling its handle_async_request to send the CONNECT request actually goes through two layers of calls:
Layer 1: lazy connection establishment
AsyncHTTPConnection.handle_async_request first checks whether the underlying connection has been established. If not, it runs _connect() and then creates the actual protocol handler based on the negotiation result:
# AsyncHTTPConnection.handle_async_request
...
async with self._request_lock:
    if self._connection is None:
        stream = await self._connect(request)
        ssl_object = stream.get_extra_info("ssl_object")
        http2_negotiated = (
            ssl_object is not None
            and ssl_object.selected_alpn_protocol() == "h2"
        )
        if http2_negotiated or (self._http2 and not self._http1):
            from .http2 import AsyncHTTP2Connection
            self._connection = AsyncHTTP2Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
        else:
            self._connection = AsyncHTTP11Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
...
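Note that this self._connection is assigned an AsyncHTTP11Connection (or HTTP/2) instance.
Layer 2: protocol handling and state transitions
AsyncHTTPConnection then delegates the request to the newly created AsyncHTTP11Connection instance: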
# AsyncHTTPConnection.handle_async_request
...
return await self._connection.handle_async_request(request)
...
Inside AsyncHTTP11Connection, the constructor initializes self._state = HTTPConnectionState.NEW. In handle_async_request the state is switched to ACTIVE, and this transition is at the heart of the problem that follows:
# AsyncHTTP11Connection.handle_async_request
...
async with self._state_lock:
    if self._state in (HTTPConnectionState.NEW, HTTPConnectionState.IDLE):
        self._request_count += 1
        self._state = HTTPConnectionState.ACTIVE
        self._expire_at = None
    else:
        raise ConnectionNotAvailable()
...
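The guard above is a small state machine protected by a lock. A minimal sketch of the same pattern using only the standard library; ToyHTTP11Connection and begin_request are hypothetical names, and ConnectionNotAvailable is redefined locally rather than imported from httpcore:

```python
import asyncio
from enum import Enum


class State(Enum):
    NEW = 0
    ACTIVE = 1
    IDLE = 2
    CLOSED = 3


class ConnectionNotAvailable(Exception):
    """Local stand-in for httpcore's exception of the same name."""


class ToyHTTP11Connection:
    def __init__(self) -> None:
        self.state = State.NEW
        self.request_count = 0
        self._state_lock = asyncio.Lock()

    async def begin_request(self) -> None:
        # Only a NEW or IDLE connection may start a request.
        async with self._state_lock:
            if self.state in (State.NEW, State.IDLE):
                self.request_count += 1
                self.state = State.ACTIVE
            else:
                raise ConnectionNotAvailable()


async def demo() -> tuple:
    conn = ToyHTTP11Connection()
    await conn.begin_request()  # NEW -> ACTIVE
    try:
        await conn.begin_request()  # already ACTIVE, so this is rejected
    except ConnectionNotAvailable:
        rejected = True
    else:
        rejected = False
    return conn.state, conn.request_count, rejected
```

Once a connection is ACTIVE, the only way out is an explicit transition made by some other code path. The guard itself never resets the state.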
Once the request has been sent and the response headers have been parsed, handle_async_request returns a Response. Note that the content argument is HTTP11ConnectionByteStream(self, request):
# AsyncHTTP11Connection.handle_async_request
...
return Response(
    status=status,
    headers=headers,
    content=HTTP11ConnectionByteStream(self, request),
    extensions={
        "http_version": http_version,
        "reason_phrase": reason_phrase,
        "network_stream": network_stream,
    },
)
...
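This is a deferred-cleanup design: when the response headers are returned the connection state is still ACTIVE; reading the response body and the state transition (to IDLE) are postponed until HTTP11ConnectionByteStream.aclose() is called.
At this point the Response travels back up layer by layer while the connection is still ACTIVE. Every connection class in httpcore returns a Response from handle_async_request; that is its uniform interface design.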
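The deferred-cleanup idea can be illustrated with a toy model. This is a sketch under assumptions, not httpcore's implementation: ToyConnection, ToyByteStream and response_closed are hypothetical names that only mimic the ownership relationship between a connection and the body stream it hands out.

```python
import asyncio
from enum import Enum


class State(Enum):
    ACTIVE = 1
    IDLE = 2


class ToyConnection:
    def __init__(self) -> None:
        self.state = State.IDLE

    def handle_request(self) -> "ToyByteStream":
        # The "headers" are returned while the connection is still ACTIVE.
        self.state = State.ACTIVE
        return ToyByteStream(self)

    def response_closed(self) -> None:
        self.state = State.IDLE


class ToyByteStream:
    def __init__(self, connection: ToyConnection) -> None:
        self._connection = connection

    async def aclose(self) -> None:
        # The state change is deferred until the body stream is closed.
        self._connection.response_closed()


async def demo() -> tuple:
    conn = ToyConnection()
    body = conn.handle_request()
    before = conn.state
    await body.aclose()
    after = conn.state
    return before, after
```

The consequence that matters later: whoever holds the response is responsible for triggering the transition. If that caller bails out without closing anything, nothing else moves the state.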
Back in AsyncTunnelHTTPConnection.handle_async_request:
# AsyncTunnelHTTPConnection.handle_async_request
...
connect_response = await self._connection.handle_async_request(
    connect_request
)
...
Next the CONNECT response status is checked. If it is not a 2xx status code, aclose() is correctly called here to clean up:
# AsyncTunnelHTTPConnection.handle_async_request
...
if connect_response.status < 200 or connect_response.status > 299:
    reason_bytes = connect_response.extensions.get("reason_phrase", b"")
    reason_str = reason_bytes.decode("ascii", errors="ignore")
    msg = "%d %s" % (connect_response.status, reason_str)
    await self._connection.aclose()
    raise ProxyError(msg)
stream = connect_response.extensions["network_stream"]
...
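If CONNECT succeeds (a 2xx status, normally 200), the raw network stream is extracted from the response extensions and the parameters for the subsequent TLS handshake are prepared.
Next comes the exact spot where the problem occurs. The original code is:
# AsyncTunnelHTTPConnection.handle_async_request
...
async with Trace("start_tls", logger, request, kwargs) as trace:
    stream = await stream.start_tls(**kwargs)
    trace.return_value = stream
...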
Here stream.start_tls() is responsible for establishing the encrypted TLS channel with the target server.
To see where stream actually comes from, we have to go back several layers of calls.
stream comes from connect_response.extensions["network_stream"]. During CONNECT handling, this value is set by AsyncHTTP11Connection when it returns the Response:
# AsyncHTTP11Connection.handle_async_request
...
return Response(
    status=status,
    headers=headers,
    content=HTTP11ConnectionByteStream(self, request),
    extensions={
        "http_version": http_version,
        "reason_phrase": reason_phrase,
        "network_stream": network_stream,
    },
)
...
Specifically, once AsyncHTTP11Connection.handle_async_request() has processed the CONNECT request, it wraps the underlying _network_stream in an AsyncHTTP11UpgradeStream and puts it into the response's extensions.
# AsyncHTTP11Connection.handle_async_request
...
network_stream = self._network_stream
# CONNECT or Upgrade request
if (status == 101) or (
    (request.method == b"CONNECT") and (200 <= status < 300)
):
    network_stream = AsyncHTTP11UpgradeStream(network_stream, trailing_data)
...
This self._network_stream comes from the AsyncHTTP11Connection constructor:
# AsyncHTTP11Connection.__init__
...
self._network_stream = stream
...
That stream is in turn passed in by AsyncHTTPConnection when it creates the AsyncHTTP11Connection.
This happens in AsyncHTTPConnection.handle_async_request. _connect() creates the raw network stream, and then the ALPN negotiation result decides between HTTP/1.1 and HTTP/2:
# AsyncHTTPConnection.handle_async_request
...
async with self._request_lock:
    if self._connection is None:
        stream = await self._connect(request)
        ssl_object = stream.get_extra_info("ssl_object")
        http2_negotiated = (
            ssl_object is not None
            and ssl_object.selected_alpn_protocol() == "h2"
        )
        if http2_negotiated or (self._http2 and not self._http1):
            from .http2 import AsyncHTTP2Connection
            self._connection = AsyncHTTP2Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
        else:
            self._connection = AsyncHTTP11Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
...
Fine
The stream that AsyncHTTPConnection passes to AsyncHTTP11Connection comes from self._connect(). That method creates the raw TCP connection through self._network_backend.connect_tcp():
# AsyncHTTPConnection._connect
...
stream = await self._network_backend.connect_tcp(**kwargs)
...
async with Trace("start_tls", logger, request, kwargs) as trace:
    stream = await stream.start_tls(**kwargs)
    trace.return_value = stream
return stream
...
Note: if the proxy scheme is HTTPS, _connect() first completes the TLS handshake with the proxy (the first start_tls) and returns the already encrypted stream.
self._network_backend is initialized in the constructor and defaults to AutoBackend:
# AsyncHTTPConnection.__init__
...
self._network_backend: AsyncNetworkBackend = (
    AutoBackend() if network_backend is None else network_backend
)
...
AutoBackend is an adapter that picks a concrete backend based on the runtime environment (such as AnyIO or Trio):
# AutoBackend.connect_tcp
async def connect_tcp(
    self,
    host: str,
    port: int,
    timeout: float | None = None,
    local_address: str | None = None,
    socket_options: typing.Iterable[SOCKET_OPTION] | None = None,
) -> AsyncNetworkStream:
    await self._init_backend()
    return await self._backend.connect_tcp(
        host,
        port,
        timeout=timeout,
        local_address=local_address,
        socket_options=socket_options,
    )
The actual network I/O is performed by _backend (for example AnyIOBackend).
_init_backend detects which async library is currently running and falls back to AnyIOBackend unless it is trio:
# AutoBackend._init_backend
async def _init_backend(self) -> None:
    if not (hasattr(self, "_backend")):
        backend = current_async_library()
        if backend == "trio":
            from .trio import TrioBackend
            self._backend: AsyncNetworkBackend = TrioBackend()
        else:
            from .anyio import AnyIOBackend
            self._backend = AnyIOBackend()
So the actual return value of AutoBackend.connect_tcp() is provided by AnyIOBackend.connect_tcp().
AnyIOBackend.connect_tcp() ultimately returns an AnyIOStream object:
# AnyIOBackend.connect_tcp
...
return AnyIOStream(stream)
...
That object is passed back up layer by layer until it reaches AsyncHTTPConnection._connect().
# AsyncHTTPConnection._connect
...
stream = await self._network_backend.connect_tcp(**kwargs)
...
if self._origin.scheme in (b"https", b"wss"):
    ...
    async with Trace("start_tls", logger, request, kwargs) as trace:
        stream = await stream.start_tls(**kwargs)
        trace.return_value = stream
return stream
...
Note here: if the proxy uses HTTPS, _connect() first runs start_tls() to establish a TLS channel with the proxy (not with the target server), and what AnyIOStream.start_tls() returns is the TLS-wrapped stream. For an HTTP proxy, the unencrypted raw stream is returned directly.
It is worth mentioning that AnyIOStream.start_tls() automatically calls self.aclose() to close the underlying socket when an exception occurs. (See github.com/encode/http…, respect.)
# AnyIOStream.start_tls
...
try:
    with anyio.fail_after(timeout):
        ssl_stream = await anyio.streams.tls.TLSStream.wrap(
            self._stream,
            ssl_context=ssl_context,
            hostname=server_hostname,
            standard_compatible=False,
            server_side=False,
        )
except Exception as exc:  # pragma: nocover
    await self.aclose()
    raise exc
return AnyIOStream(ssl_stream)
...
The AnyIOStream then returns to AsyncHTTPConnection.handle_async_request and is finally passed as the stream argument to the AsyncHTTP11Connection constructor.
# AsyncHTTPConnection.handle_async_request
...
async with self._request_lock:
    if self._connection is None:
        stream = await self._connect(request)  # here
        ssl_object = stream.get_extra_info("ssl_object")
        http2_negotiated = (
            ssl_object is not None
            and ssl_object.selected_alpn_protocol() == "h2"
        )
        if http2_negotiated or (self._http2 and not self._http1):
            from .http2 import AsyncHTTP2Connection
            self._connection = AsyncHTTP2Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
        else:
            self._connection = AsyncHTTP11Connection(
                origin=self._origin,
                stream=stream,
                keepalive_expiry=self._keepalive_expiry,
            )
...
D.C. al Fine
With the full origin of stream traced, we return to the core of the problem:
# AsyncTunnelHTTPConnection.handle_async_request
...
async with Trace("start_tls", logger, request, kwargs) as trace:
    stream = await stream.start_tls(**kwargs)
    trace.return_value = stream
...
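At this point the TCP connection from the local machine to the proxy is established and the CONNECT request has returned 200. stream.start_tls() is responsible for establishing the TLS channel with the target server. The stream here is backed by the AnyIOStream object traced above, whose start_tls() calls self.aclose() to close the underlying socket on failure, but that cleanup stops at the transport layer.
Structural mismatch in the exception handling boundaries
In the normal request flow httpcore has several layers of exception protection. AsyncHTTP11Connection.handle_async_request uses an outer try-except block to ensure that whenever a network exception occurs while sending the request or receiving the response headers, _response_closed() is called to move _state from ACTIVE to CLOSED or IDLE.
# AsyncHTTP11Connection.handle_async_request
...
except BaseException as exc:
    with AsyncShieldCancellation():
        async with Trace("response_closed", logger, request) as trace:
            await self._response_closed()
    raise exc
...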
AsyncHTTPConnection has protection too, but its scope only extends up to the point where the TCP connection is established and the CONNECT request returns.
# AsyncHTTPConnection.handle_async_request
...
except BaseException as exc:
    self._connect_failed = True
    raise exc
...
However, in the tunnel setup flow of AsyncTunnelHTTPConnection.handle_async_request, there is a structural break in the control flow:
# AsyncTunnelHTTPConnection.handle_async_request
...
connect_response = await self._connection.handle_async_request(
    connect_request
)
...
At this point AsyncHTTP11Connection._state has already been set to ACTIVE. If the CONNECT request is rejected (for example with 407 Proxy Authentication Required), the code correctly calls aclose() to clean up:
# AsyncTunnelHTTPConnection.handle_async_request
...
if connect_response.status < 200 or connect_response.status > 299:
    reason_bytes = connect_response.extensions.get("reason_phrase", b"")
    reason_str = reason_bytes.decode("ascii", errors="ignore")
    msg = "%d %s" % (connect_response.status, reason_str)
    await self._connection.aclose()
    raise ProxyError(msg)
...
But if CONNECT returns 200 and the subsequent TLS handshake fails, there is no corresponding exception handling path.
# AsyncTunnelHTTPConnection.handle_async_request
...
async with Trace("start_tls", logger, request, kwargs) as trace:
    stream = await stream.start_tls(**kwargs)
    trace.return_value = stream
...
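As described above, stream is backed by an AnyIOStream object. If stream.start_tls() raises, AnyIOStream.start_tls() closes the underlying socket. That cleanup stops at the network layer, though: the AsyncHTTP11Connection above it is unaware of the failure and its _state remains ACTIVE, and AsyncTunnelHTTPConnection does not catch the exception to trigger self._connection.aclose().
This leaves the HTTP-layer connection state permanently out of sync with the real network-layer state: when the TLS handshake fails, the exception propagates straight up, no code path moves _state from ACTIVE to CLOSED, and the result is a zombie connection.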
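The desynchronization can be reproduced in miniature. The following is a sketch under stated assumptions rather than httpcore code: ToyStream, ToyHTTP11Connection and open_tunnel are hypothetical names, and the handshake failure is simulated by raising ssl.SSLError directly.

```python
import asyncio
import ssl
from enum import Enum


class State(Enum):
    ACTIVE = 1
    CLOSED = 2


class ToyStream:
    """Hypothetical transport that closes itself when the handshake fails."""

    def __init__(self) -> None:
        self.closed = False

    async def aclose(self) -> None:
        self.closed = True

    async def start_tls(self) -> "ToyStream":
        try:
            raise ssl.SSLError("simulated handshake failure")
        except Exception:
            # Transport-level cleanup only.
            await self.aclose()
            raise


class ToyHTTP11Connection:
    def __init__(self, stream: ToyStream) -> None:
        self.stream = stream
        self.state = State.ACTIVE  # as left behind by the CONNECT request

    async def aclose(self) -> None:
        self.state = State.CLOSED
        await self.stream.aclose()


async def open_tunnel(conn: ToyHTTP11Connection) -> None:
    # Mirrors the unguarded call: nothing here closes `conn` on failure.
    await conn.stream.start_tls()


async def demo() -> tuple:
    conn = ToyHTTP11Connection(ToyStream())
    try:
        await open_tunnel(conn)
    except ssl.SSLError:
        pass
    return conn.stream.closed, conn.state
```

After demo() the toy socket is closed while the toy connection still reports ACTIVE, which is the same mismatch described above.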
Does the exception keep propagating upward? Yes, and it eventually reaches the connection pool at the top of the call chain:
# AsyncConnectionPool.handle_async_request
...
try:
    # Send the request on the assigned connection.
    response = await connection.handle_async_request(
        pool_request.request
    )
except ConnectionNotAvailable:
    # In some cases a connection may initially be available to
    # handle a request, but then become unavailable.
    #
    # In this case we clear the connection and try again.
    pool_request.clear_connection()
else:
    break  # pragma: nocover
...
Only ConnectionNotAvailable is caught here for a retry. The exception raised by the failed TLS handshake is not caught and keeps propagating.
# AsyncConnectionPool.handle_async_request
...
except BaseException as exc:
    with self._optional_thread_lock:
        # For any exception or cancellation we remove the request from
        # the queue, and then re-assign requests to connections.
        self._requests.remove(pool_request)
        closing = self._assign_requests_to_connections()
    await self._close_connections(closing)
    raise exc from None
...
Here _assign_requests_to_connections() walks the pool and decides which connections need to be closed. It checks connection.is_closed() and connection.has_expired():
# AsyncConnectionPool._assign_requests_to_connections
...
# First we handle cleaning up any connections that are closed,
# have expired their keep-alive, or surplus idle connections.
for connection in list(self._connections):
    if connection.is_closed():
        # log: "removing closed connection"
        self._connections.remove(connection)
    elif connection.has_expired():
        # log: "closing expired connection"
        self._connections.remove(connection)
        closing_connections.append(connection)
    elif (
        connection.is_idle()
        and sum(connection.is_idle() for connection in self._connections)
        > self._max_keepalive_connections
    ):
        # log: "closing idle connection"
        self._connections.remove(connection)
        closing_connections.append(connection)
...
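The connection here is the AsyncTunnelHTTPConnection instance from before. These methods are delegated layer by layer: AsyncTunnelHTTPConnection → AsyncHTTPConnection → AsyncHTTP11Connection.
- is_closed() → False (_state == ACTIVE)
- has_expired() → False (readability is only checked when _state == IDLE)
So even when the exception reaches the very top, AsyncConnectionPool cannot recognize this broken connection and can only re-raise the exception.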
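To see why the cleanup pass skips the zombie, here is a minimal model of the two checks. ToyConnection and cleanup are hypothetical names, keep-alive expiry is left out, and "readable" stands for the condition that the peer has closed an idle socket.

```python
from enum import Enum
from typing import List


class State(Enum):
    ACTIVE = 1
    IDLE = 2
    CLOSED = 3


class ToyConnection:
    def __init__(self, state: State, socket_readable: bool = False) -> None:
        self.state = state
        # For an idle keep-alive socket, readable means the peer hung up.
        self.socket_readable = socket_readable

    def is_closed(self) -> bool:
        return self.state is State.CLOSED

    def has_expired(self) -> bool:
        # Readability is only consulted for IDLE connections.
        return self.state is State.IDLE and self.socket_readable


def cleanup(connections: List[ToyConnection]) -> List[ToyConnection]:
    """Return the connections that survive one cleanup pass."""
    return [c for c in connections if not (c.is_closed() or c.has_expired())]
```

A connection built as ToyConnection(State.ACTIVE, socket_readable=True) survives cleanup() even though its socket is dead. Each failed handshake leaves one such survivor behind, and once max_connections of them have accumulated, every new request waits until PoolTimeout.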
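Is there a layer above that?
I don't think so. The raise exc from None in the except BaseException block is the final exit: the exception goes straight to the code calling httpcore (httpx or the application). Besides, the further up the exception travels, the further it is from the context of the original connection object, so expecting higher layers to do this cleanup would not be reasonable.
The fix
The nature of the problem is now clear: when the TLS handshake fails, nothing along the exception path explicitly cleans up the AsyncHTTP11Connection state.
My fix is simple: add exception handling around the TLS handshake so that the connection is actively closed on failure:
# AsyncTunnelHTTPConnection.handle_async_request
...
try:
    async with Trace("start_tls", logger, request, kwargs) as trace:
        stream = await stream.start_tls(**kwargs)
        trace.return_value = stream
except Exception:
    # Close the underlying connection when TLS handshake fails to avoid
    # zombie connections occupying the connection pool
    await self._connection.aclose()
    raise
...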
The await self._connection.aclose() line forces AsyncHTTP11Connection._state from ACTIVE to CLOSED, so the pool's is_closed() check can recognize the connection and remove it on the next _assign_requests_to_connections() call.
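The effect of the guard can be checked on a toy model. This is only a sketch of the pattern: ToyConnection, failing_start_tls and open_tunnel_guarded are hypothetical names, and the handshake failure is simulated with ssl.SSLError.

```python
import asyncio
import ssl


class ToyConnection:
    """Hypothetical stand-in for the HTTP-layer state of a proxied connection."""

    def __init__(self) -> None:
        self.state = "ACTIVE"  # as left behind by the successful CONNECT

    async def aclose(self) -> None:
        self.state = "CLOSED"

    def is_closed(self) -> bool:
        return self.state == "CLOSED"


async def failing_start_tls() -> None:
    raise ssl.SSLError("simulated handshake failure")


async def open_tunnel_guarded(conn: ToyConnection) -> None:
    try:
        await failing_start_tls()
    except Exception:
        # Same shape as the patch: close first, then re-raise unchanged.
        await conn.aclose()
        raise


async def demo() -> tuple:
    pool = [ToyConnection()]
    try:
        await open_tunnel_guarded(pool[0])
    except ssl.SSLError:
        raised = True
    else:
        raised = False
    # The next cleanup pass can now drop the connection.
    pool = [c for c in pool if not c.is_closed()]
    return raised, len(pool)
```

The caller still sees the original exception, but the pool slot is released. One caveat about the sketch: except Exception does not catch asyncio.CancelledError, which derives from BaseException since Python 3.8, so a cancelled handshake would not pass through this handler.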
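Summary
This trace gave me a much clearer picture of httpcore's layered architecture. What makes this scenario special is that it sits right at the junction of several abstraction layers: the TCP connection to the proxy is established and the HTTP request has completed, but the TLS upgrade to the target has not yet succeeded. The exception path then crosses the Stream → Connection → Pool boundaries, and keeping state in sync becomes significantly harder. Problems like this are not rare in asynchronous network programming: when control flow is delegated across multiple objects, making sure every exit path synchronizes state correctly is a systemic challenge. My fix only fills in the state cleanup for this one path within the existing exception handling framework.
PR link: github.com/encode/http…
Thanks to the encode team for maintaining such an elegant codebase, and thanks to AI for assisting with this deep analysis.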