Building an Agentic Deep-Thinking RAG Pipeline to Solve Complex Queries: Defining the RAG State for the Central Agent System (3)

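To build our reasoning agent, we first need a way to manage its state. In our simple RAG chain, every step was stateless, but…

An intelligent agent, however, needs memory. It must remember the original question, the plan it created, and the evidence it has gathered so far.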

RAG State (Created by Fareed Khan)

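The RAGState will act as the central memory that is passed between every node in our LangGraph workflow. To build it, we will define a series of structured data classes, starting with the most basic building block: a single step in the research plan.

We want to define the atomic unit of our agent's plan. Each step must contain not only a question to answer but also the reasoning behind it and, crucially, the specific tool the agent should use. This forces the agent's planning process to be explicit and structured.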

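from typing import List, Literal, Optional, TypedDict

from langchain_core.documents import Document
# Pydantic v1 compatibility shim, as in the original; newer langchain-core releases deprecate it in favor of importing from pydantic directly
from langchain_core.pydantic_v1 import BaseModel, Field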

# Pydantic model for a single step in the agent's reasoning plan
class Step(BaseModel):
    # A specific, answerable sub-question for this research step
    sub_question: str = Field(description="A specific, answerable question for this step.")
    # The agent's justification for why this step is necessary
    justification: str = Field(description="A brief explanation of why this step is necessary to answer the main query.")
    # The specific tool to use for this step: either internal document search or external web search
    tool: Literal["search_10k", "search_web"] = Field(description="The tool to use for this step.")
    # A list of critical keywords to improve the accuracy of the search
    keywords: List[str] = Field(description="A list of critical keywords for searching relevant document sections.")
    # (Optional) A likely document section to perform a more targeted, filtered search within
    document_section: Optional[str] = Field(default=None, description="A likely document section title (e.g., 'Item 1A. Risk Factors') to search within. Only for 'search_10k' tool.")

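Our Step class, built on Pydantic's BaseModel, acts as a strict contract for our planner agent. The tool: Literal[...] field forces the LLM to make an explicit decision between using our internal knowledge (search_10k) and seeking external information (search_web).

Structured output like this is far more reliable than trying to parse a plan written in natural language.

Now that we have defined a single Step, we need a container for the whole sequence of steps. We will create a Plan class that is simply a list of Step objects. It represents the agent's complete, end-to-end research strategy.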

# Pydantic model for the overall plan, which is a list of individual steps
class Plan(BaseModel):
    # A list of Step objects that outlines the full research plan
    steps: List[Step] = Field(description="A detailed, multi-step plan to answer the user's query.")

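We have written a Plan class that gives structure to the entire research process. When we call the planner agent, we will ask it to return a JSON object that conforms to this schema. This ensures the agent's strategy is clear, ordered, and machine-readable before any retrieval action is taken.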
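As a rough illustration of what "conforming to this schema" means, the sketch below parses a hypothetical JSON plan using only the standard library and rejects any step whose tool is not one of the two allowed values. The sample JSON, the helper name `parse_plan`, and the hand-written checks are assumptions made for this example; in the pipeline itself, this validation is the job of the Pydantic Plan and Step models.

```python
import json
from typing import Any, Dict, List

# The two tool names allowed by the Step model's Literal field
ALLOWED_TOOLS = {"search_10k", "search_web"}
# Fields the Step model declares without marking them optional
REQUIRED_KEYS = {"sub_question", "justification", "tool", "keywords"}


def parse_plan(raw_json: str) -> List[Dict[str, Any]]:
    """Parse a JSON plan and apply checks similar to the Step schema (hypothetical helper)."""
    data = json.loads(raw_json)
    steps = data["steps"]
    for i, step in enumerate(steps):
        missing = REQUIRED_KEYS - set(step)
        if missing:
            raise ValueError(f"step {i} is missing fields: {sorted(missing)}")
        if step["tool"] not in ALLOWED_TOOLS:
            raise ValueError(f"step {i} uses unknown tool: {step['tool']!r}")
    return steps


# Hypothetical planner output, invented for this example
sample_plan = """
{"steps": [
  {"sub_question": "What risks does the company disclose in its 10-K?",
   "justification": "Risk disclosures live in the internal filing.",
   "tool": "search_10k",
   "keywords": ["risk factors"],
   "document_section": "Item 1A. Risk Factors"},
  {"sub_question": "What recent news relates to those risks?",
   "justification": "Recent events are not covered by the filing.",
   "tool": "search_web",
   "keywords": ["recent news"]}
]}
"""

plan_steps = parse_plan(sample_plan)
tools_used = [step["tool"] for step in plan_steps]
```

A plan that names any other tool raises a ValueError before retrieval starts, which is the same guarantee the Literal field is meant to provide.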

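Next, as our agent executes its plan, it needs a way to remember what it has learned. We will define a PastStep dictionary to store the result of each completed step. This will form the agent's research history, or lab notebook.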

# A TypedDict to store the results of a completed step in our research history
class PastStep(TypedDict):
    step_index: int              # The index of the completed step (e.g., 1, 2, 3)
    sub_question: str            # The sub-question that was addressed in this step
    retrieved_docs: List[Document] # The precise documents retrieved and reranked for this step
    summary: str                 # The agent's one-sentence summary of the findings from this step

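This PastStep structure is essential for the agent's self-critique loop. After each step, we will populate one of these dictionaries and add it to our state. The agent can then look over this growing list of summaries to understand what it knows and decide whether it has enough information to finish its task.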
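To make the growing history concrete, here is a minimal sketch of how a completed step might be recorded and how the summaries could be joined into text for a self-critique prompt. The helpers `record_step` and `format_history` are hypothetical, the findings are placeholders, and plain strings stand in for Document objects so the example runs without LangChain installed.

```python
from typing import List, TypedDict


class PastStep(TypedDict):
    step_index: int
    sub_question: str
    retrieved_docs: List[str]  # stand-in: plain strings instead of Document objects
    summary: str


def record_step(history: List[PastStep], sub_question: str,
                docs: List[str], summary: str) -> List[PastStep]:
    """Return a new history with one more completed step (hypothetical helper)."""
    entry: PastStep = {
        "step_index": len(history) + 1,  # 1-based, matching the example in PastStep
        "sub_question": sub_question,
        "retrieved_docs": docs,
        "summary": summary,
    }
    return history + [entry]


def format_history(history: List[PastStep]) -> str:
    """Join the per-step summaries into text a self-critique prompt could include."""
    return "\n".join(f"Step {s['step_index']}: {s['summary']}" for s in history)


# Placeholder data, invented for this example
history: List[PastStep] = []
history = record_step(history, "placeholder sub-question A", ["doc text"], "placeholder finding A")
history = record_step(history, "placeholder sub-question B", [], "placeholder finding B")
history_text = format_history(history)
```

Returning a new list rather than mutating the old one keeps each state update explicit, which fits the node-in, node-out style used in the rest of the graph.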

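Finally, we will bring all of these pieces together in the main RAGState dictionary. This is the central object that flows through our entire graph, holding the original query, the full plan, the history of past steps, and all the intermediate data for the current step being executed.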

# The main state dictionary that will be passed between all nodes in our LangGraph agent
class RAGState(TypedDict):
    original_question: str     # The initial, complex query from the user that starts the process
    plan: Plan                 # The multi-step plan generated by the Planner Agent
    past_steps: List[PastStep] # A cumulative history of completed research steps and their findings
    current_step_index: int    # The index of the current step in the plan being executed
    retrieved_docs: List[Document] # Documents retrieved in the current step (results of broad recall)
    reranked_docs: List[Document]  # Documents after precision reranking in the current step
    synthesized_context: str   # The concise, distilled context generated from the reranked docs
    final_answer: str          # The final, synthesized answer to the user's original question

This RAGState TypedDict is our agent's complete mind. Every node in our graph will receive this dictionary as input and return an updated version of it as output.

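For example, the plan node will populate the plan field, the retrieval node will populate the retrieved_docs field, and so on. This shared, persistent state is what makes possible the complex, iterative reasoning that our simple RAG chain lacked.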
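The state-in, state-out pattern can be sketched without LangGraph at all. In the example below, `SketchState` is a simplified stand-in for RAGState, and `plan_node` and `retrieval_node` are hypothetical stubs that return placeholder data; they only show how each node reads the shared dictionary and returns an updated copy. Starting `current_step_index` at 0 is also an assumption.

```python
from typing import Any, Dict, List, TypedDict


class SketchState(TypedDict, total=False):
    # Simplified stand-in for RAGState: plain types replace Plan and Document
    original_question: str
    plan: List[Dict[str, Any]]
    past_steps: List[Dict[str, Any]]
    current_step_index: int
    retrieved_docs: List[str]
    final_answer: str


def plan_node(state: SketchState) -> SketchState:
    """Hypothetical plan node: fills in the plan and resets the step counter."""
    stub_plan = [{"sub_question": state["original_question"], "tool": "search_10k"}]
    return {**state, "plan": stub_plan, "current_step_index": 0}


def retrieval_node(state: SketchState) -> SketchState:
    """Hypothetical retrieval node: fills in retrieved_docs for the current step."""
    step = state["plan"][state["current_step_index"]]
    fake_docs = [f"placeholder document for: {step['sub_question']}"]
    return {**state, "retrieved_docs": fake_docs}


# Run the two nodes by hand; in the real pipeline the graph does this wiring
state: SketchState = {"original_question": "placeholder question", "past_steps": []}
state = plan_node(state)
state = retrieval_node(state)
```

Each node touches only its own fields and passes everything else through unchanged, so later nodes can still see the original question and the plan.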

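Now that we have defined the blueprint for our agent's memory, we can build the first cognitive component of the system: the planner agent that populates this state.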