LangGraph Design and Implementation

Chapter 14 Runtime and Context

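14.1 Introduction

When you build graphs for LLM applications, node functions often need access to "runtime dependencies": the current user's identity, a database connection pool, API keys, or a shared vector store. These dependencies do not belong in the graph state (they do not change from step to step), and they should not be hard-coded into node functions either (they vary from one invocation to the next). The traditional approach passes them through closures or global variables, which causes pain in testing, multi-tenancy, and type safety.

The Runtime class and the ContextT generic (as found in LangGraph 1.1.6, the version this chapter is based on) address this problem. Runtime is an immutable data container built at the start of a graph run from what the caller passes in, and it is injected automatically into every node function. It carries context (user-defined run-scoped context), store (persistent storage), stream_writer (the custom stream writer), execution_info (execution metadata), and other runtime information.

Starting from Runtime's dataclass definition, this chapter analyzes the design of the ContextT generic, the information model of ExecutionInfo and ServerInfo, the essential difference between context and state, and how Runtime is injected inside the Pregel loop.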

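:::tip Key points of this chapter

- The full field list of the Runtime class: context, store, stream_writer, previous, execution_info, server_info
- Type propagation of the ContextT generic: end-to-end type safety from StateGraph to node functions
- The ExecutionInfo and ServerInfo information models: structured execution metadata
- Context vs. State: immutable dependencies vs. mutable state
- The Runtime injection mechanism: the full path from compilation to execution
:::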

14.2 Designing the Runtime class

14.2.1 Dataclass definition

Runtime is defined in langgraph/runtime.py as a generic, frozen dataclass:

```python
# Excerpt from langgraph/runtime.py (imports omitted)
@dataclass(**_DC_KWARGS)  # kw_only=True, slots=True, frozen=True
class Runtime(Generic[ContextT]):
    """Convenience class that bundles run-scoped context and other runtime utilities."""

    context: ContextT = field(default=None)
    """Static context for the graph run, like user_id, db_conn, etc."""

    store: BaseStore | None = field(default=None)
    """Store for the graph run, enabling persistence and memory."""

    stream_writer: StreamWriter = field(default=_no_op_stream_writer)
    """Function that writes to the custom stream."""

    previous: Any = field(default=None)
    """The previous return value for the given thread (functional API only)."""

    execution_info: ExecutionInfo | None = field(default=None)
    """Read-only execution information/metadata for the current node run."""

    server_info: ServerInfo | None = field(default=None)
    """Metadata injected by LangGraph Server. None for open-source."""
```

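_DC_KWARGS expands to kw_only=True, slots=True, frozen=True, which means:

- kw_only: every field must be passed as a keyword argument, which avoids positional ambiguity
- slots: the class uses __slots__, which saves memory and speeds up attribute access
- frozen: an instance cannot be modified after creation (assigning to a field raises dataclasses.FrozenInstanceError), which keeps the runtime safe to share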

14.2.2 Field semantics

```mermaid
graph TB
    Runtime[Runtime object]
    Runtime --> Context["context: ContextT<br/>user-defined context<br/>e.g. user_id, db_conn"]
    Runtime --> Store["store: BaseStore | None<br/>persistent storage<br/>cross-thread memory"]
    Runtime --> SW["stream_writer: StreamWriter<br/>custom stream writes<br/>emits intermediate results"]
    Runtime --> Prev["previous: Any<br/>return value of the previous run<br/>functional API only"]
    Runtime --> EI["execution_info: ExecutionInfo<br/>execution metadata<br/>checkpoint_id, task_id, etc."]
    Runtime --> SI["server_info: ServerInfo<br/>server metadata<br/>assistant_id, user, etc."]
```

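The six fields cover all the runtime information a node function is likely to need:

| Field | Type | Source | Mutability |
|---|---|---|---|
| context | ContextT (generic) | Passed in by the caller | Unchanged for the whole run |
| store | BaseStore | Configured when the graph is compiled | Reference fixed, contents mutable |
| stream_writer | StreamWriter | Injected by the framework | Separate per task |
| previous | Any | Read from the checkpoint | Read-only |
| execution_info | ExecutionInfo | Generated by the framework | Separate per task |
| server_info | ServerInfo | LangGraph Server | Read-only |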

14.2.3 Immutability: override and merge

Although Runtime is frozen, it provides two methods that create modified copies:

```python
def merge(self, other: Runtime[ContextT]) -> Runtime[ContextT]:
    """Merge two runtimes together. If a value is not provided in other,
    the value from self is used."""
    return Runtime(
        context=other.context or self.context,
        store=other.store or self.store,
        stream_writer=other.stream_writer
            if other.stream_writer is not _no_op_stream_writer
            else self.stream_writer,
        previous=self.previous if other.previous is None else other.previous,
        execution_info=other.execution_info or self.execution_info,
        server_info=other.server_info or self.server_info,
    )

def override(self, **overrides) -> Runtime[ContextT]:
    """Replace the runtime with a new runtime with the given overrides."""
    return replace(self, **overrides)
```

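merge combines two Runtime objects, for example when a subgraph inherits its parent's Runtime. override is what the framework uses during task preparation to inject specific fields (such as execution_info).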
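Here is a minimal sketch of what `override` amounts to, assuming it is the thin wrapper over `dataclasses.replace` shown above. `MiniRuntime` and the string stand-ins for the store and execution info are hypothetical; the point is that each call returns a new object, leaves the original untouched, and carries over every field that is not overridden.

```python
from dataclasses import dataclass, replace
from typing import Any

@dataclass(frozen=True)
class MiniRuntime:
    # Hypothetical, trimmed-down stand-in for langgraph's Runtime
    context: Any = None
    store: Any = None
    execution_info: Any = None

    def override(self, **overrides: Any) -> "MiniRuntime":
        # Same idea as Runtime.override: build a modified copy
        return replace(self, **overrides)

base = MiniRuntime(context={"user_id": "alice"}, store="shared-store")

# Per-task copy, similar to what happens during task preparation
task_rt = base.override(execution_info="task-1-info")

print(task_rt is base)                   # False: a new object
print(base.execution_info)               # None: the original is unchanged
print(task_rt.context is base.context)   # True: untouched fields are shared by reference

# replace() rejects names that are not fields of the dataclass
try:
    base.override(not_a_field=1)
except TypeError as exc:
    print("rejected:", type(exc).__name__)  # rejected: TypeError
```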

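14.3 The ContextT generic

14.3.1 Definition

```python
# langgraph/typing.py
ContextT = TypeVar("ContextT", bound=StateLike | None, default=None)
```

ContextT is a type variable with a default, bounded by StateLike | None. StateLike covers structured types such as TypedDict, pydantic BaseModel, and dataclasses. The default is None (a PEP 696 type-parameter default, supported natively by typing.TypeVar from Python 3.13 and by typing_extensions on earlier versions), so if you do not specify context_schema, the type of Runtime's context field is None.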

14.3.2 How the type propagates

```mermaid
flowchart LR
    Schema["context_schema=MyContext"] --> SG["StateGraph[State, MyContext]"]
    SG --> Compile["compile()"]
    Compile --> CSG["CompiledStateGraph[State, MyContext, ...]"]
    CSG --> Invoke["invoke(input, context=MyContext(...))"]
    Invoke --> RT["Runtime[MyContext]"]
    RT --> Node["node(state, runtime: Runtime[MyContext])"]
```

The type starts at StateGraph's context_schema argument and flows through compilation, invocation, and injection. IDEs and type checkers can therefore offer accurate completions and checks at every step.

14.3.3 Usage example

```python
from dataclasses import dataclass
from langgraph.graph import StateGraph
from langgraph.runtime import Runtime
from typing_extensions import TypedDict

@dataclass
class AppContext:
    user_id: str
    api_key: str
    is_admin: bool = False

class State(TypedDict, total=False):
    response: str

def my_node(state: State, runtime: Runtime[AppContext]) -> State:
    # The IDE knows runtime.context is an AppContext
    user_id = runtime.context.user_id
    if runtime.context.is_admin:
        return {"response": f"Admin {user_id}: full access"}
    return {"response": f"User {user_id}: limited access"}

graph = (
    StateGraph(state_schema=State, context_schema=AppContext)
    .add_node("my_node", my_node)
    .set_entry_point("my_node")
    .set_finish_point("my_node")
    .compile()
)

result = graph.invoke({}, context=AppContext(user_id="alice", api_key="sk-..."))
```

14.4 ExecutionInfo: execution metadata

14.4.1 Data structure

```python
@dataclass(frozen=True, slots=True)
class ExecutionInfo:
    """Read-only execution info/metadata for the current thread/run/node."""

    checkpoint_id: str
    """The checkpoint ID for the current execution."""

    checkpoint_ns: str
    """The checkpoint namespace for the current execution."""

    task_id: str
    """The task ID for the current execution."""

    thread_id: str | None = None
    """None when running without a checkpointer."""

    run_id: str | None = None
    """None when run_id is not provided in RunnableConfig."""

    node_attempt: int = 1
    """Current node execution attempt number (1-indexed)."""

    node_first_attempt_time: float | None = None
    """Unix timestamp for when the first attempt started."""
```

ExecutionInfo gives node functions the execution context they are likely to need, without making them work with the lower-level RunnableConfig directly.

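14.4.2 What each field is for

```mermaid
graph TB
    EI[ExecutionInfo]
    EI --> CID["checkpoint_id<br/>current checkpoint ID<br/>for state tracking"]
    EI --> CNS["checkpoint_ns<br/>checkpoint namespace<br/>identifies the subgraph level"]
    EI --> TID["task_id<br/>task ID<br/>uniquely identifies this execution"]
    EI --> ThID["thread_id<br/>thread ID<br/>identifies a multi-turn conversation"]
    EI --> RID["run_id<br/>run ID<br/>identifies a single invocation"]
    EI --> NA["node_attempt<br/>attempt number<br/>1 means the first execution"]
    EI --> NFAT["node_first_attempt_time<br/>time of the first attempt<br/>for timeout calculations"]
```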

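Typical usage:

```python
import logging

from langgraph.runtime import Runtime

logger = logging.getLogger(__name__)

def my_node(state: State, runtime: Runtime) -> State:
    info = runtime.execution_info
    if info is None:  # e.g. when the function is called outside a graph run
        return {}
    # Record the execution context in the logs
    logger.info("Thread=%s, Task=%s, Attempt=%d", info.thread_id, info.task_id, info.node_attempt)

    # Adjust behavior based on the attempt number
    if info.node_attempt > 1:
        logger.warning("Retrying, using fallback strategy")

    # Build a cache key scoped to this thread and task
    cache_key = f"{info.thread_id}:{info.task_id}"
    ...
```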

14.4.3 The patch method

ExecutionInfo is frozen, but it provides a patch method that creates a modified copy:

```python
def patch(self, **overrides: Any) -> ExecutionInfo:
    """Return a new execution info object with selected fields replaced."""
    return replace(self, **overrides)
```

The framework uses this method on retries to update node_attempt and node_first_attempt_time.

14.5 ServerInfo: server-side metadata

14.5.1 Data structure

```python
@dataclass(frozen=True, slots=True)
class ServerInfo:
    """Metadata injected by LangGraph Server."""

    assistant_id: str
    """The assistant ID for the current execution."""

    graph_id: str
    """The graph ID for the current execution."""

    user: BaseUser | None = None
    """The authenticated user, if any."""
```

ServerInfo is populated only when the graph runs on LangGraph Platform (the deployment service). In local open-source runs, runtime.server_info is always None.

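14.5.2 The BaseUser protocol

```python
# From langgraph_sdk.auth.types (simplified)
class BaseUser:
    """Authenticated-user protocol; supports both attribute and dict access."""
    identity: str  # unique identifier of the user
    # Both user.identity and user["identity"] work
```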

This lets node functions read user information safely in authenticated environments:

```python
def secure_node(state: State, runtime: Runtime) -> State:
    if runtime.server_info and runtime.server_info.user:
        user_id = runtime.server_info.user.identity
    else:
        user_id = "anonymous"
    ...
```
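The dual access style is easy to mimic with `__getitem__`. The sketch below uses hypothetical `SimpleUser` and `MiniServerInfo` classes (they are not `langgraph_sdk` types) to show both access paths, and pulls the anonymous fallback from `secure_node` out into a plain helper so it can be tested without a server.

```python
from dataclasses import dataclass
from typing import Any, Optional

@dataclass(frozen=True)
class SimpleUser:
    # Hypothetical user object supporting user.identity and user["identity"]
    identity: str

    def __getitem__(self, key: str) -> Any:
        # Delegate dict-style access to attributes
        try:
            return getattr(self, key)
        except AttributeError:
            raise KeyError(key) from None

@dataclass(frozen=True)
class MiniServerInfo:
    # Hypothetical stand-in for ServerInfo
    assistant_id: str
    graph_id: str
    user: Optional[SimpleUser] = None

def resolve_user_id(server_info: Optional[MiniServerInfo]) -> str:
    # Same branching as secure_node: server_info is None in open-source runs
    if server_info and server_info.user:
        return server_info.user.identity
    return "anonymous"

alice = SimpleUser(identity="alice")
print(alice.identity, alice["identity"])                              # alice alice
print(resolve_user_id(None))                                          # anonymous
print(resolve_user_id(MiniServerInfo("asst-1", "graph-1", alice)))    # alice
```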

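14.6 Context vs. State: the essential difference

14.6.1 Conceptual comparison

This is the key distinction for understanding LangGraph's runtime model:

```mermaid
graph LR
    subgraph "State"
        direction TB
        S1[Mutable] --> S2[Flows between nodes]
        S2 --> S3[Managed by Channels]
        S3 --> S4[Merged by reducers]
        S4 --> S5[Persisted by checkpoints]
    end

    subgraph "Context"
        direction TB
        C1[Immutable] --> C2[Fixed for the whole run]
        C2 --> C3[Provided by the caller]
        C3 --> C4[Not part of state management]
        C4 --> C5[Not persisted by checkpoints]
    end
```

| Dimension | State | Context |
|---|---|---|
| Mutability | Any node can update it | Unchanged for the whole run |
| How it flows | Passed between nodes through Channels | Injected into every node through Runtime |
| Persistence | Saved by checkpoints | Not saved |
| Typical contents | Message lists, intermediate results | User ID, API keys |
| Declared with | `state_schema=State` | `context_schema=Context` |
| Passed in with | `graph.invoke(input)` | `graph.invoke(input, context=ctx)` |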

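14.6.2 Why not put Context in State?

Putting runtime dependencies in State causes several problems:

- Checkpoint pollution: database connections and API keys should not be serialized into checkpoints, and many live resources cannot be serialized at all
- Type confusion: state fields should hold "data", not "tools"
- Security risk: checkpoints may be exported or shared, so sensitive values must never appear in them
- Semantic errors: a reducer should never apply something like operator.add to a user ID

By separating dependencies from data, Context removes these problems at the root.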
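Two of those problems are easy to reproduce in plain Python. The sketch below uses `pickle` as a stand-in for a checkpoint serializer (LangGraph's own serializer is different, so this only illustrates the general issue) and a `threading.Lock` as a stand-in for a live resource such as a connection pool. It also shows what `operator.add` does when a reducer is applied to a user ID string.

```python
import operator
import pickle
import threading

# A live resource standing in for a connection pool
state_with_resource = {"messages": ["hi"], "db_pool": threading.Lock()}

try:
    pickle.dumps(state_with_resource)
    serializable = True
except TypeError:
    serializable = False
print(serializable)  # False: locks cannot be pickled

# A reducer such as operator.add merging two writes of the same user ID
merged_user_id = operator.add("alice", "alice")
print(merged_user_id)  # alicealice: no longer a meaningful user ID
```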

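14.6.3 Why not put Context in Config?

Before LangGraph 0.6.0, runtime dependencies were passed through RunnableConfig.configurable (declared with the old config_schema parameter). This approach had several drawbacks:

- Not type-safe: configurable is a plain dict[str, Any], so generic type information is lost
- Muddled API: config's main job is carrying framework parameters such as thread_id and checkpoint_id
- Nested access: you have to write config["configurable"]["user_id"]

context_schema and Runtime[ContextT] provide a first-class, type-safe alternative:

```python
from langchain_core.runnables import RunnableConfig
from langgraph.runtime import Runtime

# Old style (legacy; config_schema is deprecated)
def my_node(state, config: RunnableConfig):
    user_id = config["configurable"]["user_id"]  # no type hints

# New style
def my_node(state, runtime: Runtime[AppContext]):
    user_id = runtime.context.user_id  # IDE autocompletion
```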
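When migrating from the configurable-based style, one option is a small helper that builds the typed context from an existing `configurable` dict. `context_from_configurable` below is hypothetical (not part of LangGraph). It keeps only keys that match the dataclass's fields, because `configurable` usually also carries framework keys such as `thread_id`.

```python
from dataclasses import dataclass, fields
from typing import Any, Mapping, Type, TypeVar

C = TypeVar("C")

@dataclass
class AppContext:
    user_id: str
    api_key: str
    is_admin: bool = False

def context_from_configurable(schema: Type[C], configurable: Mapping[str, Any]) -> C:
    """Hypothetical helper: build a dataclass context from a configurable dict."""
    names = {f.name for f in fields(schema)}
    return schema(**{k: v for k, v in configurable.items() if k in names})

legacy_config = {"configurable": {"thread_id": "t-1", "user_id": "alice", "api_key": "sk-test"}}
ctx = context_from_configurable(AppContext, legacy_config["configurable"])
print(ctx)  # AppContext(user_id='alice', api_key='sk-test', is_admin=False)
```

The resulting object can then be passed as `graph.invoke(input, context=ctx)`.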

14.7 The Runtime injection mechanism

14.7.1 Overview of the injection path

```mermaid
flowchart TB
    subgraph callLayer ["Caller"]
        Caller["graph.invoke(input, context=ctx)"]
    end

    subgraph pregelInit ["Pregel initialization"]
        Caller --> CreateRT["Create Runtime(context=ctx, store=store)"]
        CreateRT --> InjectConfig["Write config[CONF][CONFIG_KEY_RUNTIME]"]
    end

    subgraph taskPrep ["Task preparation"]
        InjectConfig --> PNT["prepare_next_tasks"]
        PNT --> PST["prepare_single_task / prepare_push_task_send"]
        PST --> Override["runtime.override(<br/>previous=...,<br/>store=...,<br/>execution_info=...)"]
        Override --> TaskConfig["Write task config[CONF][CONFIG_KEY_RUNTIME]"]
    end

    subgraph nodeExec ["Node execution"]
        TaskConfig --> GetRT["Node receives the runtime parameter"]
        GetRT --> UseRT["runtime.context.user_id"]
    end
```

14.7.2 Pregel initialization

When you call graph.invoke(input, context=ctx), Pregel's stream method wraps the context in a Runtime and stores it in the config:

```python
# Simplified logic from Pregel.stream
runtime = Runtime(context=context, store=self.store)
config = patch_configurable(config, {CONFIG_KEY_RUNTIME: runtime})
```

14.7.3 Task preparation

In prepare_single_task, the framework takes the Runtime out of the config and injects task-level information:

```python
# PULL-task branch of prepare_single_task
runtime = cast(
    Runtime, configurable.get(CONFIG_KEY_RUNTIME, DEFAULT_RUNTIME)
)
runtime = runtime.override(
    previous=checkpoint["channel_values"].get(PREVIOUS, None),
    store=store,
    execution_info=ExecutionInfo(
        checkpoint_id=checkpoint["id"],
        checkpoint_ns=task_checkpoint_ns,
        task_id=task_id,
        thread_id=configurable.get(CONFIG_KEY_THREAD_ID),
        run_id=str(rid) if (rid := config.get("run_id")) else None,
    ),
)
```

Each task gets a new Runtime instance (because Runtime is frozen, override builds a new object with dataclasses.replace), and its execution_info holds metadata specific to that task.
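The initialization and task-preparation stages described above can be condensed into a self-contained simulation. Everything in it (`MiniRuntime`, the string values of `CONF` and `CONFIG_KEY_RUNTIME`, `start_run`, `prepare_task`) is a hypothetical stand-in for the internals quoted in this section, not LangGraph code. It shows that each task ends up with its own Runtime object and execution info, while all tasks share the same context object by reference.

```python
from dataclasses import dataclass, replace
from typing import Any, Optional

CONF = "configurable"               # stand-in for the CONF constant
CONFIG_KEY_RUNTIME = "__runtime__"  # stand-in key, not the real constant's value

@dataclass(frozen=True)
class MiniRuntime:
    context: Any = None
    store: Any = None
    execution_info: Optional[dict] = None

    def override(self, **overrides: Any) -> "MiniRuntime":
        return replace(self, **overrides)

DEFAULT = MiniRuntime()

def start_run(config: dict, context: Any, store: Any) -> dict:
    # Stage 1: wrap the context in a Runtime and store it in the config
    runtime = MiniRuntime(context=context, store=store)
    return {**config, CONF: {**config.get(CONF, {}), CONFIG_KEY_RUNTIME: runtime}}

def prepare_task(config: dict, task_id: str) -> MiniRuntime:
    # Stage 2: derive a per-task Runtime carrying task-specific metadata
    base = config[CONF].get(CONFIG_KEY_RUNTIME, DEFAULT)
    return base.override(
        execution_info={"task_id": task_id, "thread_id": config[CONF].get("thread_id")}
    )

cfg = start_run({CONF: {"thread_id": "t-1"}}, context={"user_id": "alice"}, store=None)
rt_a = prepare_task(cfg, "task-a")
rt_b = prepare_task(cfg, "task-b")

print(rt_a is rt_b)                    # False: one Runtime per task
print(rt_a.context is rt_b.context)    # True: the context object is shared
print(rt_a.execution_info["task_id"])  # task-a
```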

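14.7.4 How node functions receive it

The framework inspects the node function's signature and injects the Runtime automatically when the function declares a runtime parameter:

```python
# A node function can declare a runtime parameter
def my_node(state: State, runtime: Runtime[AppContext]) -> State:
    ...

# Or fetch it manually with get_runtime()
from langgraph.runtime import get_runtime

def my_node(state: State) -> State:
    runtime = get_runtime(AppContext)  # typed as Runtime[AppContext]
    ...
```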
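Signature-based injection can be sketched with `inspect.signature`. The `call_node` helper below is hypothetical and far simpler than LangGraph's actual machinery; it only illustrates the idea that a node receives `runtime` when, and only when, its signature asks for it.

```python
import inspect
from typing import Any, Callable

def call_node(fn: Callable[..., Any], state: dict, runtime: Any) -> Any:
    """Hypothetical dispatcher: pass runtime only if the node declares it."""
    params = inspect.signature(fn).parameters
    if "runtime" in params:
        return fn(state, runtime=runtime)
    return fn(state)

def with_runtime(state: dict, runtime: Any) -> dict:
    return {"user": runtime["user_id"]}

def without_runtime(state: dict) -> dict:
    return {"count": state["count"] + 1}

fake_runtime = {"user_id": "alice"}  # a dict standing in for a Runtime object
print(call_node(with_runtime, {"count": 0}, fake_runtime))     # {'user': 'alice'}
print(call_node(without_runtime, {"count": 0}, fake_runtime))  # {'count': 1}
```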

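get_runtime reads the Runtime from the config of the current execution context (as returned by get_config):

```python
def get_runtime(context_schema: type[ContextT] | None = None) -> Runtime[ContextT]:
    # context_schema is only used for type hinting; it is not validated
    runtime = cast(Runtime[ContextT], get_config()[CONF].get(CONFIG_KEY_RUNTIME))
    return runtime
```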

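14.8 DEFAULT_RUNTIME and no-op defaults

```python
DEFAULT_RUNTIME = Runtime(
    context=None,
    store=None,
    stream_writer=_no_op_stream_writer,
    previous=None,
    execution_info=None,
)
```

DEFAULT_RUNTIME is the fallback used when no Runtime is present in the config, as in the configurable.get(CONFIG_KEY_RUNTIME, DEFAULT_RUNTIME) call in prepare_single_task. All of its fields are empty or no-ops, so code can read from it without special-casing. Note that runtime.context is still None here, so node code that reads attributes such as runtime.context.user_id must handle that case itself.

```python
def _no_op_stream_writer(_: Any) -> None: ...
```

With the no-op stream_writer, calling runtime.stream_writer(data) has no effect: the data is silently discarded. This design means node code does not need to check whether a writer is "available" before emitting custom stream events.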
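A practical consequence is that a node can call its writer unconditionally, and a test can swap in a collecting writer. The sketch below uses a hypothetical `MiniRuntime` with the same no-op default; with real LangGraph code one could do the same by constructing a Runtime with `stream_writer=...`, since `stream_writer` is a constructor field.

```python
from dataclasses import dataclass
from typing import Any, Callable

def _no_op_stream_writer(_: Any) -> None:
    """Discard everything, like LangGraph's default writer."""

@dataclass(frozen=True)
class MiniRuntime:
    # Hypothetical stand-in with only the field this example needs
    stream_writer: Callable[[Any], None] = _no_op_stream_writer

def summarize_node(state: dict, runtime: MiniRuntime) -> dict:
    # No availability check needed: the default writer is a no-op
    runtime.stream_writer({"status": "summarizing"})
    return {"summary": state["text"][:10]}

# Default runtime: the event is silently dropped
print(summarize_node({"text": "LangGraph runtime"}, MiniRuntime()))  # {'summary': 'LangGraph '}

# Test runtime: collect the events in a list
events: list = []
summarize_node({"text": "hello"}, MiniRuntime(stream_writer=events.append))
print(events)  # [{'status': 'summarizing'}]
```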

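14.9 Runtime propagation in subgraphs

14.9.1 merge semantics

When execution enters a subgraph, the subgraph may have its own store and context. Runtime's merge method combines the parent's and the subgraph's Runtime:

```python
def merge(self, other: Runtime[ContextT]) -> Runtime[ContextT]:
    return Runtime(
        context=other.context or self.context,        # child wins
        store=other.store or self.store,              # child wins
        stream_writer=other.stream_writer             # child wins
            if other.stream_writer is not _no_op_stream_writer
            else self.stream_writer,
        previous=self.previous if other.previous is None else other.previous,
        execution_info=other.execution_info or self.execution_info,
        server_info=other.server_info or self.server_info,
    )
```

The merge policy is "child overrides parent": if the subgraph provides its own context, that one is used; otherwise the parent's is inherited. Note that context, store, execution_info, and server_info are combined with `or`, so a falsy child value (for example an empty dict used as context) also falls back to the parent's; only previous is compared explicitly against None.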
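The `or`-based fallback tests truthiness, not `None`. A sketch with a hypothetical `MiniRuntime` that copies just the `context` and `previous` lines of `merge` makes the difference visible:

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class MiniRuntime:
    # Hypothetical stand-in mirroring two lines of Runtime.merge
    context: Any = None
    previous: Any = None

    def merge(self, other: "MiniRuntime") -> "MiniRuntime":
        return MiniRuntime(
            context=other.context or self.context,  # truthiness check
            previous=self.previous if other.previous is None else other.previous,  # explicit None check
        )

parent = MiniRuntime(context={"user_id": "alice"}, previous=5)

print(parent.merge(MiniRuntime(context={"user_id": "bob"})).context)  # {'user_id': 'bob'}
print(parent.merge(MiniRuntime()).context)             # {'user_id': 'alice'}
print(parent.merge(MiniRuntime(context={})).context)   # {'user_id': 'alice'}: {} is falsy
print(parent.merge(MiniRuntime(previous=0)).previous)  # 0: previous keeps falsy values
```

If a subgraph ever needs to pass a deliberately "empty" context, a dataclass instance avoids the pitfall, since dataclass instances are truthy unless they define `__bool__` or `__len__`.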

14.9.2 Propagation diagram

```mermaid
graph TB
    subgraph parentGraph [Parent graph]
        PR["Runtime[ParentCtx]<br/>context=ParentCtx(...)"]
        PR --> N1[Node A]
        PR --> SubEntry[Subgraph entry]
    end

    subgraph childGraph [Subgraph]
        SubEntry --> Merge["parent_rt.merge(child_rt)"]
        Merge --> CR["Runtime[ChildCtx]<br/>context inherited or overridden"]
        CR --> N2[Node B]
        CR --> N3[Node C]
    end
```

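14.10 Design decisions

14.10.1 Why is Runtime frozen?

A frozen dataclass brings three benefits, each with a limit worth knowing:

- Thread safety: fields cannot be reassigned, so nodes running concurrently can share one Runtime instance without racing on its attributes. The guarantee is shallow: a mutable context object, or the contents of the store, can still be modified.
- Semantic correctness: context stands for "runtime dependencies that do not change", and frozen enforces this both at runtime (assignment raises FrozenInstanceError) and for static type checkers.
- Hashability: with the default eq=True, a frozen dataclass gets a generated __hash__, which helps with caching and deduplication, but hash() only succeeds when every field value is itself hashable; a non-frozen @dataclass or a dict used as context makes it raise TypeError.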
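Frozen-ness is shallow, and the generated `__hash__` depends on every field being hashable. Both points can be checked directly with a hypothetical `MiniRuntime` and a plain, non-frozen `@dataclass` context of the kind used for `AppContext` earlier in this chapter:

```python
from dataclasses import dataclass
from typing import Any

@dataclass
class MutableContext:
    # Non-frozen dataclass: eq=True sets __hash__ to None
    user_id: str

@dataclass(frozen=True)
class MiniRuntime:
    context: Any = None

# Hashable when every field is hashable
print(hash(MiniRuntime(context="alice")) == hash(MiniRuntime(context="alice")))  # True

# Not hashable when the context is not
try:
    hash(MiniRuntime(context=MutableContext("alice")))
    hashable = True
except TypeError:
    hashable = False
print(hashable)  # False

# Shallow immutability: the field cannot be rebound, but its contents can change
rt = MiniRuntime(context=MutableContext("alice"))
rt.context.user_id = "mallory"
print(rt.context.user_id)  # mallory
```

Declaring the context itself with `@dataclass(frozen=True)` (and hashable field types) closes both gaps.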

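14.10.2 Why is execution_info part of Runtime rather than injected separately?

Putting execution_info in Runtime instead of injecting it as a separate parameter has two advantages:

- Fewer parameters: a node function needs only one runtime parameter to reach all runtime information
- Consistent lifecycle: all runtime information is created and passed along in the same object, so it shares one lifecycle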

14.10.3 How ToolRuntime relates to Runtime

langgraph.prebuilt also contains a ToolRuntime class designed specifically for tool functions:

```python
class ToolRuntime(_DirectlyInjectedToolArg, Generic[ContextT, StateT]):
    """Runtime context automatically injected into tools."""
    context: ContextT     # shared with Runtime
    store: BaseStore      # shared with Runtime
    stream_writer: StreamWriter  # shared with Runtime
    config: RunnableConfig       # tool-specific
    state: StateT                # tool-specific
    tool_call_id: str            # tool-specific
```

ToolRuntime shares Runtime's context, store, and stream_writer fields, and adds the tool-specific config, state, and tool_call_id. The two are complementary rather than related by inheritance: Runtime serves nodes, and ToolRuntime serves tools.

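14.11 Hands-on: a complete Runtime use case

14.11.1 A multi-tenant agent system

The following example uses Runtime to build a multi-tenant agent system in which each user gets isolated data and role-based access control: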

```python
from dataclasses import dataclass
from typing import TypedDict

from langgraph.graph import StateGraph, START, END
from langgraph.runtime import Runtime
from langgraph.store.memory import InMemoryStore

@dataclass
class TenantContext:
    """Multi-tenant context."""
    tenant_id: str
    user_id: str
    role: str  # "admin" | "editor" | "viewer"
    db_connection_string: str

class AgentState(TypedDict, total=False):
    messages: list
    response: str

store = InMemoryStore()

def access_control_node(state: AgentState, runtime: Runtime[TenantContext]) -> dict:
    """Access-control node: decides permissions based on the role."""
    ctx = runtime.context
    info = runtime.execution_info

    # Write an audit record to the Store
    if runtime.store is not None:
        runtime.store.put(
            ("audit", ctx.tenant_id),
            f"access_{info.task_id}",
            {
                "user": ctx.user_id,
                "role": ctx.role,
                "action": "query",
                "thread_id": info.thread_id,
            },
        )

    if ctx.role == "viewer":
        return {"response": "You have read-only access."}
    return {}  # no state update; processing continues

def route_after_access(state: AgentState) -> str:
    # Viewers already have a response: stop here so that
    # process_node does not overwrite it
    return END if state.get("response") else "process"

def process_node(state: AgentState, runtime: Runtime[TenantContext]) -> dict:
    """Business-logic node: uses tenant-isolated data."""
    ctx = runtime.context

    # Read configuration from the tenant's namespace
    tenant_cfg = None
    if runtime.store is not None:
        tenant_cfg = runtime.store.get(("tenants", ctx.tenant_id), "config")
    model_name = tenant_cfg.value["model"] if tenant_cfg else "default-model"

    # Report progress on the custom stream
    runtime.stream_writer({"status": "processing", "model": model_name})

    return {"response": f"Processed by {model_name} for tenant {ctx.tenant_id}"}

graph = (
    StateGraph(state_schema=AgentState, context_schema=TenantContext)
    .add_node("access_control", access_control_node)
    .add_node("process", process_node)
    .add_edge(START, "access_control")
    .add_conditional_edges("access_control", route_after_access, ["process", END])
    .add_edge("process", END)
    .compile(store=store)
)

# Each tenant invokes the graph with its own context
result = graph.invoke(
    {"messages": ["Hello"]},
    context=TenantContext(
        tenant_id="acme",
        user_id="alice",
        role="admin",
        db_connection_string="postgresql://acme:...",
    ),
)
```

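14.11.2 Runtime behavior on retries

When a node is configured with a RetryPolicy, the Runtime's execution_info is updated on every retry:

```python
from langgraph.runtime import Runtime

def flaky_node(state: AgentState, runtime: Runtime) -> dict:
    info = runtime.execution_info
    print(f"Attempt {info.node_attempt}")  # prints 1, then 2

    if info.node_attempt == 1:
        raise ConnectionError("Temporary failure")

    # The second attempt succeeds
    return {"response": "Success after retry"}

# Retries only happen if the node is registered with a retry policy
# (RetryPolicy is importable from langgraph.types), e.g.:
# .add_node("flaky", flaky_node, retry_policy=RetryPolicy(max_attempts=3))
```

The framework creates a new Runtime copy via runtime.patch_execution_info(node_attempt=2) and passes it to the retried execution.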

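14.12 Summary

This chapter examined LangGraph's Runtime and Context mechanism. Through its generic type parameter, Runtime[ContextT] turns runtime dependency injection from a "convention" into a "type-safe protocol". Its six fields (context, store, stream_writer, previous, execution_info, server_info) cover the runtime information a node function needs. Frozen semantics prevent fields from being reassigned while nodes run concurrently, and the override and merge methods provide immutable updates.

Separating Context from State is a key architectural decision in LangGraph: State is "data that changes from step to step", managed by Channels and persisted by checkpoints; Context is "dependencies that stay fixed for the whole run", injected through Runtime and never persisted. This separation keeps state management clean and gives sensitive values such as API keys a safe delivery path.

The next chapter explores the BaseStore interface and the InMemoryStore implementation to show how LangGraph provides cross-thread long-term memory.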