最近很多小伙伴都在问:AI模型到底是怎么"思考"的?为什么有些AI能解决复杂问题,有些却只能回答简单问题?
答案就在推理技术中。今天我们就来聊聊这个让AI真正"聪明"起来的技术。
一、文章核心内容概述
📌 1.1 推理技术的核心概念与理论基础
第17章系统阐述了智能体推理技术的核心概念,其本质在于将模型的内部推理过程显性化。
与传统单一输出模式不同,推理技术通过多步逻辑推理和问题分解,使智能体能够处理复杂问题。
核心理论基础包括:
计算资源分配原则:推理过程中增加计算资源投入,通过延长推理时间实现更深入的思考
问题分解策略:将复杂问题拆分为可管理的子问题,降低单步推理难度
透明化推理:使模型的决策过程可追溯、可审计,增强可信度
📌 1.2 主要推理方法体系
👉 1.2.1 思维链(Chain-of-Thought, CoT)
CoT是推理技术的基础,通过模拟人类逐步思考过程,引导模型生成中间推理步骤。
技术特点包括:
逐步推理机制:将复杂问题分解为逻辑连贯的子步骤
透明度提升:使模型决策过程可视化,便于调试和理解
应用灵活性:支持少样本学习和零样本推理
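上面"少样本/零样本"两种用法可以用一个极简的提示构造函数来示意。以下仅为演示性草图,`build_cot_prompt` 及其参数均为本文假设的命名,并非某个框架的真实 API:

```python
def build_cot_prompt(question, examples=None):
    """构造 CoT 提示:传入带推理过程的示例即为少样本,
    否则退化为零样本的"逐步思考"指令。"""
    parts = []
    for q, reasoning in (examples or []):
        # 少样本示例:每个问题都附带完整的中间推理步骤
        parts.append(f"问题:{q}\n推理:{reasoning}")
    # 末尾附上待解问题,并显式要求模型逐步思考
    parts.append(f"问题:{question}\n推理:让我们一步一步思考。")
    return "\n\n".join(parts)

# 零样本用法:只靠一句"逐步思考"指令触发推理
zero_shot = build_cot_prompt("23 * 17 等于多少?")

# 少样本用法:附带一个演示逐步推理的示例
few_shot = build_cot_prompt(
    "23 * 17 等于多少?",
    examples=[("12 * 11 等于多少?", "12 * 11 = 12 * 10 + 12 = 132。")],
)
```

两种用法只差在是否提供示例:零样本靠指令触发推理,少样本靠示例示范推理格式。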
👉 1.2.2 思维树(Tree-of-Thought, ToT)
ToT在CoT基础上引入多路径探索机制,形成树状推理结构:
多路径探索:同时考虑多个推理路径,避免单一思路局限
回溯与修正:支持在推理过程中回溯并选择更优路径
策略性规划:适用于需要复杂决策和规划的场景
👉 1.2.3 自我纠正(Self-correction)
自我纠正是推理质量保证的关键机制:
迭代优化循环:生成-评估-修正的闭环流程
质量评估标准:准确性、完整性、清晰度、一致性等多维度评估
主动改进机制:主动识别并修正推理中的缺陷
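这个"生成-评估-修正"闭环可以抽象成一个很小的控制循环。以下是示意性草图,`generate`/`evaluate`/`revise` 在真实系统中对应三次模型调用,这里用普通函数代替:

```python
def self_correct(generate, evaluate, revise, max_rounds=3):
    """自我纠正闭环:评估通过即输出,否则依据反馈修正,直到轮次上限。"""
    draft = generate()
    for _ in range(max_rounds):
        passed, feedback = evaluate(draft)
        if passed:
            return draft
        draft = revise(draft, feedback)  # 依据评估反馈主动修正缺陷
    return draft  # 达到上限仍未通过,返回当前草稿

# 玩具示例:质量标准是"不超过 20 个字符"
result = self_correct(
    generate=lambda: "这是一条明显超出长度限制、需要被反复压缩修正的初始草稿",
    evaluate=lambda text: (len(text) <= 20, "超出长度限制,请压缩"),
    revise=lambda text, feedback: text[:20],
)
```

`max_rounds` 上限是工程上的必要保护:保证闭环在评估标准无法满足时也能终止。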
👉 1.2.4 程序辅助语言模型(PALMs)
PALMs将符号推理与语言模型结合:
代码执行能力:通过生成和执行代码实现精确计算
确定性计算:利用编程语言的确定性保证计算准确性
混合推理模式:自然语言理解与符号计算的有机结合
👉 1.2.5 可验证奖励的强化学习(RLVR)
RLVR代表了推理技术的前沿发展方向:
自适应推理:根据问题复杂度动态调整推理深度
长序列推理:支持数千token的复杂推理轨迹
自主进化:通过具有可验证正确答案的问题(如数学、代码)进行试错训练,无需直接人工监督
👉 1.2.6 推理与行动(ReAct)
ReAct将推理与外部工具使用相结合:
思考-行动循环:推理→行动→观察→再推理的迭代过程
工具集成能力:支持数据库查询、API调用等外部工具使用
动态适应机制:根据环境反馈实时调整策略
📌 1.3 应用场景分类
1.3.1 复杂问答系统
多跳推理:需要整合多个信息源进行逻辑推导
信息合成:将分散信息整合为连贯答案
路径探索:考虑多种可能的推理路径
1.3.2 数学问题求解
问题分解:将复杂数学问题拆分为可计算子问题
代码验证:通过代码执行确保计算准确性
步骤可视化:展示完整的解题过程
1.3.3 代码调试与生成
逻辑分析:系统性地分析代码逻辑缺陷
迭代优化:基于测试结果持续改进代码质量
解释性输出:提供代码修改的详细理由
1.3.4 战略规划
多方案评估:比较不同策略的优劣
后果预测:分析各种决策可能带来的结果
实时调整:根据反馈动态优化计划
1.3.5 医疗诊断支持
系统化评估:按步骤分析症状、检查结果和病史
鉴别诊断:考虑多种可能的疾病诊断
工具辅助:利用医学数据库等外部资源
1.3.6 法律分析
文档分析:系统解读法律文件和判例
逻辑一致性:确保论证过程的逻辑严密性
深入研究:进行深入的法律研究和论证构建
二、技术启示分析
2.1 对复杂问题解决的价值
推理技术的核心价值在于提升AI系统处理复杂问题的能力。通过显性化推理过程,AI系统能够:
处理不确定性:在信息不完整的情况下进行合理推断
应对复杂性:分解和解决超出单步推理能力的问题
保证可靠性:通过多步验证提高输出的准确性和一致性
2.2 实际应用的技术优势
2.2.1 透明度与可解释性
推理技术使AI决策过程变得透明,这对于关键应用场景至关重要:
医疗诊断:医生可以理解AI的诊断依据
法律分析:律师可以审查AI的论证逻辑
代码生成:开发者可以理解AI的编程思路
2.2.2 适应性学习能力
推理技术赋予AI系统更强的学习适应性:
少样本学习:通过推理过程快速适应新任务
知识迁移:将已有知识应用于新领域
持续改进:通过自我纠正机制不断提升性能
2.2.3 人机协作新模式
推理技术为人机协作创造了新的可能性:
互补优势:AI处理复杂推理,人类提供领域知识
协同决策:人类与AI共同参与决策过程
能力扩展:扩展人类在复杂问题上的处理能力
2.3 技术发展趋势
2.3.1 从静态到动态推理
推理技术正从固定的推理模式向动态自适应方向发展:
自适应深度:根据问题复杂度动态调整推理步骤
多模态推理:结合文本、代码、图像等多种信息源
实时推理:支持流式数据的连续推理
2.3.2 从单一到协同推理
未来推理技术将更加注重多智能体协同:
分布式推理:多个智能体协作解决复杂问题
专业分工:不同智能体专注于特定推理任务
共识机制:协调不同智能体的推理结果
2.3.3 从封闭到开放环境
推理技术正在向开放环境扩展:
工具集成:与外部工具和系统的深度集成
环境感知:实时感知和响应环境变化
长期规划:支持跨越时间尺度的复杂规划
三、个人学习心得
3.1 理解过程中的关键突破
在学习模型推理技术的过程中,有几个关键概念的理解突破对我影响深远:
3.1.1 从"黑盒"到"白盒"的转变
传统AI模型往往被视为黑盒系统,而推理技术实现了向白盒系统的转变。这种转变不仅仅是技术上的进步,更是思维模式的革新。通过显性化推理过程,我们能够:
理解模型局限:明确识别模型在哪些方面存在不足
针对性改进:基于推理过程分析进行有针对性的优化
建立信任:透明的推理过程有助于建立用户信任
3.1.2 计算资源分配的哲学思考
推理技术强调在推理过程中增加计算资源投入,这一理念让我重新思考AI系统的设计哲学。传统上我们追求快速响应,但推理技术告诉我们,在某些场景下,深度思考比快速响应更有价值。这种权衡需要在具体应用中仔细考量。
3.1.3 自我纠正的工程意义
自我纠正机制不仅仅是技术实现,更体现了工程化的质量保证思想。通过建立系统化的评估和改进流程,AI系统能够持续提升输出质量,这种思想可以推广到其他AI系统设计中。
3.2 实践应用中的难点与对策
3.2.1 推理路径选择难题
在实际应用中,如何选择合适的推理路径是一个重要挑战。我的经验是:
建立评估标准:制定明确的推理质量评估指标
多路径比较:同时探索多个路径并进行比较
动态调整:根据中间结果动态调整推理策略
3.2.2 计算效率与推理深度的平衡
推理技术往往需要更多的计算资源,如何在效率和深度之间找到平衡是关键:
分层推理:根据问题复杂度采用不同深度的推理
早期终止:在确定无解时及时终止推理过程
缓存机制:重用相似的推理结果提高效率
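其中"早期终止"和"缓存机制"两条可以合起来看成一个带预算的记忆化推理循环。以下是纯演示的草图,`solve_subproblem` 代表一次昂贵的子问题推理调用,属假设命名:

```python
import functools

@functools.lru_cache(maxsize=1024)
def solve_subproblem(key):
    # 假想的子问题求解;真实系统中这里是一次昂贵的模型推理调用
    return f"answer({key})"

def reason_with_budget(steps, budget=10):
    """带预算的推理循环:重复子问题直接命中缓存,超出预算则提前终止。"""
    results = []
    for i, step in enumerate(steps):
        if i >= budget:
            break  # 早期终止:计算预算耗尽,停止继续深入
        results.append(solve_subproblem(step))  # 缓存:相同子问题只真正计算一次
    return results
```

这样,相似问题的重复子步骤不会重复消耗推理开销,预算上限又保证了最坏情况下的时延可控。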
3.2.3 领域知识的有效整合
将领域知识有效整合到推理过程中是另一个难点:
知识表示:选择合适的知识表示形式
推理规则:将领域规则转化为推理约束
验证机制:建立领域特定的验证标准
3.3 未来实践展望
基于对推理技术的深入理解,我对未来实践有以下展望:
3.3.1 技术融合创新
推理技术可以与其他AI技术深度融合:
与强化学习结合:将推理过程作为强化学习的策略网络
与知识图谱结合:利用结构化知识增强推理能力
与多模态学习结合:支持跨模态的复杂推理
3.3.2 行业应用深化
推理技术在各个行业都有广阔的应用前景:
金融风控:复杂的风险评估和决策支持
智能制造:生产过程的优化和故障诊断
教育个性化:自适应学习路径规划
3.3.3 开发工具完善
期待出现更多支持推理技术开发的工具:
可视化调试工具:直观展示推理过程和结果
性能分析工具:帮助优化推理效率和效果
标准化接口:简化推理技术的集成和使用
四、总结与展望
模型推理技术代表了AI发展的重要方向,它不仅是技术上的进步,更是AI系统设计理念的革新。通过显性化推理过程,AI系统变得更加透明、可靠和强大。
未来,随着计算能力的提升和算法的优化,推理技术将在更多复杂场景中发挥关键作用。同时,我们也需要关注推理技术的可解释性、安全性和伦理问题,确保这项技术能够为人类社会带来真正的价值。
作为技术人员,我们需要持续学习和实践,将推理技术与具体业务场景深度结合,推动AI技术在各个领域的创新应用。只有深入理解技术本质,才能更好地发挥其潜力,创造更大的价值。
写在最后
💬 互动提问
你觉得在你的工作中,哪些场景最适合应用推理技术?
欢迎在评论区分享你的想法和经验,我们一起探讨推理技术的更多可能性!
✅ 行动建议
今天就试试! 选择一个你熟悉的复杂问题,尝试用思维链的方式进行分析和解决。
关注我们,下期将带来更多AI技术干货分享!
技术提示:本文基于《智能体设计模式》第17章"推理技术"的学习总结,结合个人技术实践和思考撰写而成,适合AI技术爱好者和从业者参考。
Chapter 17: Reasoning Techniques | 第17章:推理技术
English: This chapter delves into advanced reasoning methodologies for intelligent agents, focusing on multi-step logical inferences and problem-solving. These techniques go beyond simple sequential operations, making the agent's internal reasoning explicit. This allows agents to break down problems, consider intermediate steps, and reach more robust and accurate conclusions. A core principle among these advanced methods is the allocation of increased computational resources during inference. This means granting the agent, or the underlying LLM, more processing time or steps to process a query and generate a response. Rather than a quick, single pass, the agent can engage in iterative refinement, explore multiple solution paths, or utilize external tools. This extended processing time during inference often significantly enhances accuracy, coherence, and robustness, especially for complex problems requiring deeper analysis and deliberation.
中文: 本章深入探讨智能代理的先进推理方法,专注于多步逻辑推理和问题解决。这些技术超越了简单的顺序操作,使代理的内部推理过程显性化。这使得代理能够分解问题、考虑中间步骤,并得出更加稳健和准确的结论。这些先进方法中的一个核心原则是在推理过程中分配更多的计算资源。这意味着给予代理或底层大语言模型(LLM)更多的处理时间或步骤来处理查询并生成响应。代理不再进行快速的单次传递,而是可以进行迭代优化、探索多种解决方案路径,或利用外部工具。这种在推理过程中的扩展处理时间通常会显著提高准确性、连贯性和稳健性,特别是对于需要更深入分析和深思熟虑的复杂问题。
Practical Applications & Use Cases | 实际应用与用例
English: Practical applications include:
- Complex Question Answering: Facilitating the resolution of multi-hop queries, which necessitate the integration of data from diverse sources and the execution of logical deductions, potentially involving the examination of multiple reasoning paths, and benefiting from extended inference time to synthesize information.
- Mathematical Problem Solving: Enabling the division of mathematical problems into smaller, solvable components, illustrating the step-by-step process, and employing code execution for precise computations, where prolonged inference enables more intricate code generation and validation.
- Code Debugging and Generation: Supporting an agent's explanation of its rationale for generating or correcting code, pinpointing potential issues sequentially, and iteratively refining the code based on test results (Self-Correction), leveraging extended inference time for thorough debugging cycles.
- Strategic Planning: Assisting in the development of comprehensive plans through reasoning across various options, consequences, and preconditions, and adjusting plans based on real-time feedback (ReAct), where extended deliberation can lead to more effective and reliable plans.
- Medical Diagnosis: Aiding an agent in systematically assessing symptoms, test outcomes, and patient histories to reach a diagnosis, articulating its reasoning at each phase, and potentially utilizing external instruments for data retrieval (ReAct). Increased inference time allows for a more comprehensive differential diagnosis.
- Legal Analysis: Supporting the analysis of legal documents and precedents to formulate arguments or provide guidance, detailing the logical steps taken, and ensuring logical consistency through self-correction. Increased inference time allows for more in-depth legal research and argument construction.
中文: 实际应用包括:
- 复杂问答: 促进多跳查询的解决,这需要整合来自不同来源的数据并执行逻辑推理,可能涉及检查多种推理路径,并受益于扩展的推理时间来综合信息。
- 数学问题解决: 使数学问题能够分解为更小的、可解决的组件,展示逐步过程,并采用代码执行进行精确计算,其中延长的推理能够实现更复杂的代码生成和验证。
- 代码调试与生成: 支持代理解释其生成或纠正代码的理由,按顺序指出潜在问题,并根据测试结果迭代优化代码(自我纠正),利用扩展的推理时间进行彻底的调试循环。
- 战略规划: 通过跨各种选项、后果和先决条件的推理来协助制定综合计划,并根据实时反馈调整计划(ReAct),其中扩展的深思熟虑可以带来更有效和可靠的计划。
- 医疗诊断: 帮助代理系统地评估症状、测试结果和患者病史以得出结论,在每个阶段阐明其推理,并可能利用外部工具进行数据检索(ReAct)。增加的推理时间允许进行更全面的鉴别诊断。
- 法律分析: 支持分析法律文件和先例以制定论据或提供指导,详细说明所采取的逻辑步骤,并通过自我纠正确保逻辑一致性。增加的推理时间允许进行更深入的法律研究和论据构建。
Reasoning techniques | 推理技术
English: To start, let's delve into the core reasoning techniques used to enhance the problem-solving abilities of AI models.
中文: 首先,让我们深入研究用于增强AI模型问题解决能力的核心推理技术。
Chain-of-Thought (CoT) prompting significantly enhances LLMs complex reasoning abilities by mimicking a step-by-step thought process (see Fig. 1). Instead of providing a direct answer, CoT prompts guide the model to generate a sequence of intermediate reasoning steps. This explicit breakdown allows LLMs to tackle complex problems by decomposing them into smaller, more manageable sub-problems. This technique markedly improves the model's performance on tasks requiring multi-step reasoning, such as arithmetic, common sense reasoning, and symbolic manipulation. A primary advantage of CoT is its ability to transform a difficult, single-step problem into a series of simpler steps, thereby increasing the transparency of the LLM's reasoning process. This approach not only boosts accuracy but also offers valuable insights into the model's decision-making, aiding in debugging and comprehension. CoT can be implemented using various strategies, including offering few-shot examples that demonstrate step-by-step reasoning or simply instructing the model to "think step by step." Its effectiveness stems from its ability to guide the model's internal processing toward a more deliberate and logical progression. As a result, Chain-of-Thought has become a cornerstone technique for enabling advanced reasoning capabilities in contemporary LLMs. This enhanced transparency and breakdown of complex problems into manageable sub-problems is particularly important for autonomous agents, as it enables them to perform more reliable and auditable actions in complex environments.
**思维链(Chain-of-Thought, CoT)**提示通过模仿逐步思维过程显著增强了大语言模型(LLMs)的复杂推理能力(见图1)。CoT提示不是提供直接答案,而是引导模型生成一系列中间推理步骤。这种显式分解使LLMs能够通过将复杂问题分解为更小、更易管理的子问题来解决它们。这种技术显著提高了模型在需要多步推理的任务上的表现,如算术、常识推理和符号操作。CoT的一个主要优势是它能够将困难的单步问题转化为一系列更简单的步骤,从而增加LLM推理过程的透明度。这种方法不仅提高了准确性,还为模型的决策制定提供了有价值的见解,有助于调试和理解。CoT可以通过各种策略实现,包括提供展示逐步推理的少样本示例,或简单地指示模型"逐步思考"。其有效性源于它能够引导模型的内部处理朝着更加深思熟虑和逻辑化的方向发展。因此,思维链已成为在现代LLMs中实现高级推理能力的基石技术。这种增强的透明度和将复杂问题分解为可管理子问题的能力对于自主代理特别重要,因为它使它们能够在复杂环境中执行更可靠和可审计的操作。
Fig. 1: CoT prompt alongside the detailed, step-by-step response generated by the agent. | 图1:CoT提示与代理生成的详细逐步响应
English: Let's see an example. It begins with a set of instructions that tell the AI how to think, defining its persona and a clear five-step process to follow. This is the prompt that initiates structured thinking.
Following that, the example shows the CoT process in action. The section labeled "Agent's Thought Process" is the internal monologue where the model executes the instructed steps. This is the literal "chain of thought." Finally, the "Agent's Final Answer" is the polished, comprehensive output generated as a result of that careful, step-by-step reasoning process.
中文: 让我们看一个例子。它从一组告诉AI如何思考的指令开始,定义其角色和一个清晰的五步过程。这是启动结构化思考的提示。
接下来,该示例展示了CoT过程的实际行动。标有"代理思维过程"的部分是模型执行指令步骤的内部独白。这就是字面意义上的"思维链"。最后,"代理最终答案"是经过仔细、逐步推理过程生成的精致、全面的输出。
You are an Information Retrieval Agent. Your goal is to answer the user's question comprehensively and accurately by thinking step-by-step. Here's the process you must follow:
1. **Analyze the Query:** Understand the core subject and specific requirements of the user's question. Identify key entities, keywords, and the type of information being sought.
2. **Formulate Search Queries (for Knowledge Base):** Based on your analysis, generate a list of precise search queries that you would use to retrieve relevant information from a knowledge base or external tools.
3. **Simulate Information Retrieval (Self-Correction/Reasoning):** For each search query, mentally consider what kind of information you expect to find. If you were to retrieve the content, what would be the most relevant snippets? Think about potential ambiguities or missing pieces.
4. **Synthesize Information:** Based on the simulated retrieval and your understanding of the user's original query, synthesize the gathered information into a coherent and complete answer. Ensure all aspects of the query are addressed.
5. **Review and Refine:** Before finalizing, critically evaluate your answer. Is it accurate? Is it comprehensive? Is it easy to understand? Is it concise? If not, identify what needs to be improved and how.
**User Query:** "Explain the main differences between classical computers and quantum computers, and briefly describe a potential application of quantum computing."
**Agent's Thought Process (Internal CoT Output - encouraged by the prompt):**
**Thought 1 (Analyze Query):** The user wants two things:
1. Main differences between classical and quantum computers.
2. One potential application of quantum computing.
Keywords: classical computers, quantum computers, differences, potential application.
**Thought 2 (Formulate Search Queries):**
- "differences classical vs quantum computers"
- "how quantum computers work"
- "how classical computers work"
- "applications of quantum computing"
**Thought 3 (Simulate Information Retrieval & Identify Key Concepts):**
- From "differences": Expect concepts like bits vs. qubits, superposition, entanglement, probabilistic nature, computational power for specific problems.
- From "how they work": Classical uses transistors/binary. Quantum uses quantum mechanics.
- From "applications": Expect drug discovery, material science, cryptography, optimization.
**Thought 4 (Synthesize Information):**
Classical computers use bits (0 or 1), processing information sequentially. Quantum computers use qubits, which can be 0, 1, or both simultaneously (superposition), and can be entangled, allowing for vastly different computational approaches. This enables them to solve certain problems (like factoring large numbers or simulating complex molecules) far more efficiently than classical computers. A potential application is in drug discovery, where quantum computers could simulate molecular interactions with unprecedented accuracy, accelerating the development of new medicines.
**Thought 5 (Review and Refine):**
- Differences are covered (bits vs qubits, superposition, entanglement, processing).
- Application is given (drug discovery).
- Is it brief? Yes.
- Is it clear? Yes.
- Looks good.
**Agent's Final Answer (Output to User):**
Classical computers process information using bits, which can be either a 0 or a 1 at any given time, performing operations sequentially. In contrast, quantum computers utilize qubits, which can exist in multiple states simultaneously through superposition and can be linked through entanglement, allowing for exponentially more complex computations. This fundamental difference enables quantum machines to tackle certain problems, such as simulating molecular structures or breaking complex encryption, that are intractable for even the most powerful classical supercomputers. A significant potential application of quantum computing lies in drug discovery, where its ability to precisely model molecular behavior could revolutionize the development of new pharmaceuticals.
Tree-of-Thought (ToT) is a reasoning technique that builds upon Chain-of-Thought (CoT). It allows large language models to explore multiple reasoning paths by branching into different intermediate steps, forming a tree structure (see Fig. 2). This approach supports complex problem-solving by enabling backtracking, self-correction, and exploration of alternative solutions. Maintaining a tree of possibilities allows the model to evaluate various reasoning trajectories before finalizing an answer. This iterative process enhances the model's ability to handle challenging tasks that require strategic planning and decision-making.
Fig.2: Example of Tree of Thoughts | 图2:思维树示例
中文: **思维树(Tree-of-Thought, ToT)**是一种建立在思维链(CoT)基础上的推理技术。它允许大语言模型通过分支到不同的中间步骤来探索多种推理路径,形成树状结构。这种方法通过支持回溯、自我纠正和替代解决方案的探索来支持复杂问题解决。维护可能性树允许模型在最终确定答案之前评估各种推理轨迹。这种迭代过程增强了模型处理需要战略规划和决策的挑战性任务的能力。
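ToT 的"扩展-评估-剪枝"过程可以用一个极简的束搜索(beam search)来示意。下面的玩具示例与任何真实模型无关,`expand` 和 `score` 分别模拟"生成候选思路"和"评估思路质量",均为假设命名:

```python
def tree_of_thought(root, expand, score, beam=2, depth=3):
    """ToT 示意:逐层扩展多条推理路径,按评分只保留前 beam 条,
    其余分支被放弃,相当于回溯到更有前景的路径上继续探索。"""
    frontier = [root]
    for _ in range(depth):
        candidates = [child for state in frontier for child in expand(state)]
        if not candidates:
            break
        candidates.sort(key=score, reverse=True)
        frontier = candidates[:beam]  # 剪枝:只沿最优的几条路径继续
    return max(frontier, key=score)

# 玩具问题:从 0 出发,每步 +1 或 +3,三步内尽量接近 7
best = tree_of_thought(
    root=0,
    expand=lambda s: [s + 1, s + 3],
    score=lambda s: -abs(7 - s),  # 离 7 越近评分越高
)
```

与线性 CoT 只走一条路不同,这里每层都同时维护多个候选,并在评估后放弃劣势分支。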
English: Self-correction, also known as self-refinement, is a crucial aspect of an agent's reasoning process, particularly within Chain-of-Thought prompting. It involves the agent's internal evaluation of its generated content and intermediate thought processes. This critical review enables the agent to identify ambiguities, information gaps, or inaccuracies in its understanding or solutions. This iterative cycle of reviewing and refining allows the agent to adjust its approach, improve response quality, and ensure accuracy and thoroughness before delivering a final output. This internal critique enhances the agent's capacity to produce reliable and high-quality results, as demonstrated in examples within the dedicated Chapter 4.
中文: 自我纠正,也称为自我优化,是代理推理过程中的一个关键方面,特别是在思维链提示中。它涉及代理对其生成内容和中间思维过程的内部评估。这种批判性审查使代理能够识别其理解或解决方案中的模糊性、信息缺口或不准确性。这种审查和优化的迭代循环允许代理调整其方法、提高响应质量,并在提供最终输出之前确保准确性和彻底性。这种内部批评增强了代理产生可靠和高质量结果的能力,如第4章中的示例所示。
English: This example demonstrates a systematic process of self-correction, crucial for refining AI-generated content. It involves an iterative loop of drafting, reviewing against original requirements, and implementing specific improvements. The illustration begins by outlining the AI's function as a "Self-Correction Agent" with a defined five-step analytical and revision workflow. Following this, a subpar "Initial Draft" of a social media post is presented. The "Self-Correction Agent's Thought Process" forms the core of the demonstration. Here, the Agent critically evaluates the draft according to its instructions, pinpointing weaknesses such as low engagement and a vague call to action. It then suggests concrete enhancements, including the use of more impactful verbs and emojis. The process concludes with the "Final Revised Content," a polished and notably improved version that integrates the self-identified adjustments.
中文: 这个示例展示了自我纠正的系统过程,对于优化AI生成的内容至关重要。它涉及起草、根据原始要求审查和实施具体改进的迭代循环。说明首先概述了AI作为"自我纠正代理"的功能,具有定义的五步分析和修订工作流程。接下来,呈现了一个次优的社交媒体帖子"初始草稿"。"自我纠正代理思维过程"构成了演示的核心。在这里,代理根据其指令批判性地评估草稿,指出诸如参与度低和号召性用语模糊等弱点。然后,它建议具体的增强措施,包括使用更有影响力的动词和表情符号。该过程以"最终修订内容"结束,这是一个经过润色的、明显改进的版本,整合了自我识别的调整。
You are a highly critical and detail-oriented Self-Correction Agent. Your task is to review a previously generated piece of content against its original requirements and identify areas for improvement. Your goal is to refine the content to be more accurate, comprehensive, engaging, and aligned with the prompt. Here's the process you must follow for self-correction:
1. **Understand Original Requirements:** Review the initial prompt/requirements that led to the content's creation. What was the *original intent*? What were the key constraints or goals?
2. **Analyze Current Content:** Read the provided content carefully.
3. **Identify Discrepancies/Weaknesses:** Compare the current content against the original requirements. Look for:
* **Accuracy Issues:** Are there any factual errors or misleading statements?
* **Completeness Gaps:** Does it fully address all aspects of the original prompt? Is anything missing?
* **Clarity & Coherence:** Is the language clear, concise, and easy to understand? Does it flow logically?
* **Tone & Style:** Does it match the desired tone and style (e.g., professional, engaging, concise)?
* **Engagement:** Is it captivating? Does it hold the reader's attention?
* **Redundancy/Verbosity:** Can any parts be condensed or removed without losing meaning?
4. **Propose Specific Improvements:** For each identified weakness, suggest concrete and actionable changes. Do not just state the problem; propose a solution.
5. **Generate Revised Content:** Based on your proposed improvements, rewrite the original content to incorporate all the necessary changes. Ensure the revised content is polished and ready for final use.
**Original Prompt/Requirements:** "Write a short, engaging social media post (max 150 characters) announcing a new eco-friendly product line: 'GreenTech Gadgets'."
**Initial Draft (Content to be corrected):** "We have new products. They are green and techy. Buy GreenTech Gadgets now!"
**Self-Correction Agent's Thought Process (Internal CoT Output - encouraged by the prompt):**
**Thought 1 (Understand Original Requirements):**
- Short, engaging social media post
- Max 150 characters
- Announce "GreenTech Gadgets" - a new eco-friendly product line
**Thought 2 (Analyze Current Content):**
"We have new products. They are green and techy. Buy GreenTech Gadgets now!" (74 characters)
**Thought 3 (Identify Discrepancies/Weaknesses):**
- **Engaging:** Not very engaging. Sounds generic.
- **Eco-friendly:** Implied by "green," but not explicitly highlighted as a benefit
- **Call to Action:** "Buy now!" is okay, but could be more compelling
- **Conciseness:** It's short, but could use stronger verbs/phrasing
- **Impact:** Lacks excitement for a new product launch
**Thought 4 (Propose Specific Improvements):**
- Make it more dynamic and exciting
- Emphasize "eco-friendly" and "innovation"
- Use stronger verbs and more active voice
- Add a clear, engaging call to action
- Ensure it stays under 150 characters
**Thought 5 (Generate Revised Content):**
Original: "We have new products. They are green and techy. Buy GreenTech Gadgets now!"
Revised idea 1: "🌱 Discover GreenTech Gadgets! Our new eco-friendly line blends innovation with sustainability. Go green, go smart! Shop now! #EcoFriendly #GreenTech" (148 characters)
This revision directly addresses eco-friendliness, uses emojis for engagement, adds a stronger call to action, and includes relevant hashtags while staying within the character limit.
**Self-Correction Agent's Final Revised Content (Output to User):**
🌱 Discover GreenTech Gadgets! Our new eco-friendly line blends innovation with sustainability. Go green, go smart! Shop now! #EcoFriendly #GreenTech
Fundamentally, this technique integrates a quality control measure directly into the Agent's content generation, yielding more refined, precise, and superior results that more effectively meet intricate user demands.
从根本上说,这种技术将质量控制措施直接集成到代理的内容生成中,产生更精细、精确和卓越的结果,更有效地满足复杂的用户需求。
Program-Aided Language Models (PALMs) integrate LLMs with symbolic reasoning capabilities. This integration allows the LLM to generate and execute code, such as Python, as part of its problem-solving process. PALMs offload complex calculations, logical operations, and data manipulation to a deterministic programming environment. This approach utilizes the strengths of traditional programming for tasks where LLMs might exhibit limitations in accuracy or consistency. When faced with symbolic challenges, the model can produce code, execute it, and convert the results into natural language. This hybrid methodology combines the LLM's understanding and generation abilities with precise computation, enabling the model to address a wider range of complex problems with potentially increased reliability and accuracy. This is important for agents as it allows them to perform more accurate and reliable actions by leveraging precise computation alongside their understanding and generation capabilities. An example is the use of external tools within Google's ADK for generating code.
**程序辅助语言模型(PALMs)**将LLMs与符号推理能力相结合。这种集成允许LLM生成和执行代码(如Python),作为其问题解决过程的一部分。PALMs将复杂计算、逻辑操作和数据操作转移到确定性编程环境中。这种方法利用传统编程的优势来处理LLMs可能在准确性或一致性方面表现出局限性的任务。当面临符号挑战时,模型可以生成代码、执行代码,并将结果转换为自然语言。这种混合方法将LLM的理解和生成能力与精确计算相结合,使模型能够处理更广泛的复杂问题,并可能提高可靠性和准确性。这对于代理很重要,因为它允许它们通过利用精确计算以及理解和生成能力来执行更准确和可靠的操作。一个例子是在Google的ADK中使用外部工具生成代码。
from google.adk.tools import agent_tool
from google.adk.agents import Agent
from google.adk.tools import google_search
from google.adk.code_executors import BuiltInCodeExecutor

# 专职搜索的子代理
search_agent = Agent(
    model='gemini-2.0-flash',
    name='SearchAgent',
    instruction="""
    You're a specialist in Google Search
    """,
    tools=[google_search],
)

# 专职代码执行的子代理
coding_agent = Agent(
    model='gemini-2.0-flash',
    name='CodeAgent',
    instruction="""
    You're a specialist in Code Execution
    """,
    # 注意:code_executor 接受单个执行器实例,而非列表
    code_executor=BuiltInCodeExecutor(),
)

# 根代理通过 AgentTool 将两个子代理包装为可调用的工具
root_agent = Agent(
    name="RootAgent",
    model="gemini-2.0-flash",
    description="Root Agent",
    tools=[
        agent_tool.AgentTool(agent=search_agent),
        agent_tool.AgentTool(agent=coding_agent),
    ],
)
Reinforcement Learning with Verifiable Rewards (RLVR): While effective, the standard Chain-of-Thought (CoT) prompting used by many LLMs is a somewhat basic approach to reasoning. It generates a single, predetermined line of thought without adapting to the complexity of the problem. To overcome these limitations, a new class of specialized "reasoning models" has been developed. These models operate differently by dedicating a variable amount of "thinking" time before providing an answer. This "thinking" process produces a more extensive and dynamic Chain-of-Thought that can be thousands of tokens long. This extended reasoning allows for more complex behaviors like self-correction and backtracking, with the model dedicating more effort to harder problems. The key innovation enabling these models is a training strategy called Reinforcement Learning with Verifiable Rewards (RLVR). By training the model on problems with known correct answers (like math or code), it learns through trial and error to generate effective, long-form reasoning. This allows the model to evolve its problem-solving abilities without direct human supervision. Ultimately, these reasoning models don't just produce an answer; they generate a "reasoning trajectory" that demonstrates advanced skills like planning, monitoring, and evaluation. This enhanced ability to reason and strategize is fundamental to the development of autonomous AI agents, which can break down and solve complex tasks with minimal human intervention.
**可验证奖励的强化学习(RLVR):**尽管有效,但许多LLMs使用的标准思维链(CoT)提示是一种相对基础的推理方法。它生成单一的、预先确定的思维线,而不适应问题的复杂性。为了克服这些限制,已经开发了一类新的专业"推理模型"。这些模型的运作方式不同,通过在提供答案之前投入可变的"思考"时间。这种"思考"过程产生更广泛和动态的思维链,可以长达数千个标记。这种扩展推理允许更复杂的行为,如自我纠正和回溯,模型在更难的问题上投入更多精力。使这些模型成为可能的关键创新是一种称为可验证奖励的强化学习(RLVR)的训练策略。通过在具有已知正确答案的问题(如数学或代码)上训练模型,它通过试错学习生成有效的长篇推理。这使得模型能够在没有直接人工监督的情况下发展其问题解决能力。最终,这些推理模型不仅产生答案;它们生成一个展示高级技能(如规划、监控和评估)的"推理轨迹"。这种增强的推理和策略制定能力对于开发能够分解和解决复杂任务的自主AI代理至关重要,且只需最少的人工干预。
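"可验证奖励"的关键是奖励函数只依赖可机械校验的最终结果,而非人工对每个推理步骤打分。下面用一个极简的奖励函数示意这一点,其中答案格式"最终答案:…"是本例假设的约定:

```python
import re

def verifiable_reward(trajectory, ground_truth):
    """RLVR 奖励示意:从推理轨迹中提取最终答案,与已知正确答案比对。
    奖励信号完全来自可验证的结果,无需人工逐步标注。"""
    match = re.search(r"最终答案[::]\s*(\S+)", trajectory)
    if match is None:
        return 0.0  # 没有按约定格式给出答案,不给奖励
    return 1.0 if match.group(1) == str(ground_truth) else 0.0
```

在训练中,模型生成的长推理轨迹只要最终答案可被验证为正确即获得奖励,模型由此通过试错学会有效的长链推理。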
ReAct (Reasoning and Acting, see Fig. 3, where KB stands for Knowledge Base) is a paradigm that integrates Chain-of-Thought (CoT) prompting with an agent's ability to interact with external environments through tools. Unlike generative models that produce a final answer, a ReAct agent reasons about which actions to take. This reasoning phase involves an internal planning process, similar to CoT, where the agent determines its next steps, considers available tools, and anticipates outcomes. Following this, the agent acts by executing a tool or function call, such as querying a database, performing a calculation, or interacting with an API.
ReAct(推理与行动,见图3,其中KB代表知识库)是一种将思维链(CoT)提示与代理通过工具与外部环境交互的能力相结合的范式。与产生最终答案的生成模型不同,ReAct代理推理要采取哪些行动。这个推理阶段涉及内部规划过程,类似于CoT,代理确定下一步、考虑可用工具并预测结果。随后,代理通过执行工具或函数调用来行动,例如查询数据库、执行计算或与API交互。
Fig.3: Reasoning and Act | 图3:推理与行动
ReAct以交错方式运作:代理执行行动,观察结果,并将此观察结果纳入后续推理。这种"思考、行动、观察、思考..."的迭代循环允许代理动态调整其计划、纠正错误,并实现需要与环境进行多次交互的目标。与线性CoT相比,这提供了更稳健和灵活的问题解决方法,因为代理会响应实时反馈。通过将语言模型的理解和生成能力与使用工具的能力相结合,ReAct使代理能够执行需要推理和实际执行的复杂任务。这种方法对于代理至关重要,因为它允许它们不仅进行推理,还能实际执行步骤并与动态环境交互。
English: ReAct operates in an interleaved manner: the agent executes an action, observes the outcome, and incorporates this observation into subsequent reasoning. This iterative loop of "Thought, Action, Observation, Thought..." allows the agent to dynamically adapt its plan, correct errors, and achieve goals requiring multiple interactions with the environment. This provides a more robust and flexible problem-solving approach compared to linear CoT, as the agent responds to real-time feedback. By combining language model understanding and generation with the capability to use tools, ReAct enables agents to perform complex tasks requiring both reasoning and practical execution. This approach is crucial for agents as it allows them to not only reason but also to practically execute steps and interact with dynamic environments.
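"思考→行动→观察→再思考"的交错循环可以抽象成下面的控制流草图。其中 `llm_decide` 代表模型的一次决策(这里用规则函数模拟),`tools` 是名称到工具函数的映射,所有命名均为演示用的假设:

```python
def react_loop(llm_decide, tools, goal, max_steps=5):
    """ReAct 循环示意:思考→行动→观察→再思考,直到模型给出最终答案。"""
    history = [("goal", goal)]
    for _ in range(max_steps):
        thought, action, arg = llm_decide(history)
        history.append(("thought", thought))
        if action == "finish":  # 模型判断信息已足够,直接作答
            return arg, history
        observation = tools[action](arg)  # 行动:调用外部工具
        history.append(("observation", observation))  # 观察结果纳入后续推理
    return None, history

# 玩具示例:先查询知识库,再基于观察到的事实作答
kb = {"法国的首都": "巴黎"}

def fake_llm(history):
    """用规则模拟模型决策:没有观察结果就先查库,否则直接作答。"""
    kind, content = history[-1]
    if kind == "observation":
        return "已获得事实,可以作答", "finish", content
    return "需要查询知识库", "lookup", "法国的首都"

answer, trace = react_loop(fake_llm, {"lookup": kb.get}, "法国的首都是哪里?")
```

`max_steps` 上限防止代理在环境反馈异常时无限循环,这也是真实 ReAct 实现中的常见保护。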
CoD(辩论链)是微软提出的正式AI框架,其中多个不同的模型协作和争论以解决问题,超越单个AI的"思维链"。该系统像AI委员会会议一样运作,不同模型提出初始想法,批评彼此的推理,并交换反驳论点。主要目标是通过利用集体智能来提高准确性、减少偏见并提高最终答案的整体质量。作为同行评审的AI版本,此方法创建了推理过程的透明和可信记录。最终,它代表了从单个代理提供答案到协作代理团队共同寻找更稳健和验证的解决方案的转变。
English: CoD (Chain of Debates) is a formal AI framework proposed by Microsoft where multiple, diverse models collaborate and argue to solve a problem, moving beyond a single AI's "chain of thought." This system operates like an AI council meeting, where different models present initial ideas, critique each other's reasoning, and exchange counterarguments. The primary goal is to enhance accuracy, reduce bias, and improve the overall quality of the final answer by leveraging collective intelligence. Functioning as an AI version of peer review, this method creates a transparent and trustworthy record of the reasoning process. Ultimately, it represents a shift from a solitary Agent providing an answer to a collaborative team of Agents working together to find a more robust and validated solution.
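CoD 的"各自作答→逐轮互评→表决共识"流程可以用一个玩具循环示意。下面的 `stubborn`/`conformist` 只是模拟不同辩论行为的规则函数,并非真实模型:

```python
from collections import Counter

def chain_of_debates(agents, question, rounds=2):
    """CoD 示意:各模型先独立作答,再逐轮参考他人答案修订,最后多数表决。"""
    answers = [agent(question, []) for agent in agents]
    for _ in range(rounds):
        answers = [
            agent(question, answers[:i] + answers[i + 1:])  # 只看其他代理的答案
            for i, agent in enumerate(agents)
        ]
    return Counter(answers).most_common(1)[0][0]  # 共识:多数答案

# 玩具示例:一个固执己见的代理,三个会向多数意见靠拢的代理
def stubborn(question, others):
    return "B"

def conformist(question, others):
    return Counter(others).most_common(1)[0][0] if others else "A"

consensus = chain_of_debates([stubborn, conformist, conformist, conformist], "问题略")
```

即使个别代理坚持错误答案,多数表决仍能得到稳健的共识,这正是集体智能降低偏差的直观体现。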
GoD(辩论图)是先进的代理框架,将讨论重新想象为动态、非线性网络,而不是简单的链。在此模型中,论点是单个节点,由表示"支持"或"反驳"等关系的边连接,反映真实辩论的多线程性质。这种结构允许新的探究线索动态分支、独立发展,甚至随时间合并。结论不是在序列末尾达成,而是通过在整个图中识别最稳健和得到充分支持的论点集群来达成。在这种情况下,"得到充分支持"指的是牢固建立和可验证的知识。这可以包括被认为是基本真理的信息,意味着它本质上是正确的并被广泛接受为事实。此外,它包括通过搜索基础获得的事实证据,其中信息根据外部来源和真实世界数据进行验证。最后,它还涉及在辩论期间由多个模型达成的共识,表明对所呈现信息的很高程度的协议和信心。这种综合方法确保为正在讨论的信息提供更稳健和可靠的基础。
English: GoD (Graph of Debates) is an advanced Agentic framework that reimagines discussion as a dynamic, non-linear network rather than a simple chain. In this model, arguments are individual nodes connected by edges that signify relationships like 'supports' or 'refutes,' reflecting the multi-threaded nature of real debate. This structure allows new lines of inquiry to dynamically branch off, evolve independently, and even merge over time. A conclusion is reached not at the end of a sequence, but by identifying the most robust and well-supported cluster of arguments within the entire graph. In this context, "well-supported" refers to knowledge that is firmly established and verifiable. This can include information considered to be ground truth, which means it is inherently correct and widely accepted as fact. Additionally, it encompasses factual evidence obtained through search grounding, where information is validated against external sources and real-world data. Finally, it also pertains to a consensus reached by multiple models during a debate, indicating a high degree of agreement and confidence in the information presented. This comprehensive approach ensures a more robust and reliable foundation for the information being discussed.
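GoD 中"论点为节点、支持/反驳为边、按最受支持的论点得出结论"的思路,可以用一个极简的净支持度计算来示意。真实系统中的"支持度"还会纳入搜索校验、多模型共识等信号,这里只保留图结构本身:

```python
def graph_of_debates(nodes, edges):
    """GoD 示意:论点为节点,边为 (来源, 关系, 目标);
    结论取净支持度(被支持次数 - 被反驳次数)最高的论点。"""
    support = {node: 0 for node in nodes}
    for src, relation, dst in edges:
        support[dst] += 1 if relation == "supports" else -1
    return max(support, key=support.get)

# 玩具论点图:B、C 支持 A,A 反驳 B
nodes = ["A", "B", "C"]
edges = [
    ("B", "supports", "A"),
    ("C", "supports", "A"),
    ("A", "refutes", "B"),
]
conclusion = graph_of_debates(nodes, edges)
```

与线性辩论链不同,图结构允许任意论点之间建立支持/反驳关系,结论来自全图中最稳健的论点而非序列末尾。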
**MASS(可选高级主题):**对多代理系统设计的深入分析表明,其有效性关键取决于用于编程单个代理的提示质量以及决定其交互的拓扑结构。设计这些系统的复杂性很大,因为它涉及庞大而复杂的搜索空间。为了应对这一挑战,开发了称为多代理系统搜索(MASS)的新框架,以自动化和优化MAS的设计。
English: MASS (optional advanced topic): An in-depth analysis of the design of multi-agent systems reveals that their effectiveness is critically dependent on both the quality of the prompts used to program individual agents and the topology that dictates their interactions. The complexity of designing these systems is significant, as it involves a vast and intricate search space. To address this challenge, a novel framework called Multi-Agent System Search (MASS) was developed to automate and optimize the design of MAS.
MASS采用多阶段优化策略,通过交错提示和拓扑优化来系统地导航复杂的设计空间(见图4)。
English: MASS employs a multi-stage optimization strategy that systematically navigates the complex design space by interleaving prompt and topology optimization (see Fig. 4).
**1. 块级提示优化:**过程从单个代理类型或"块"的提示本地优化开始,以确保每个组件在集成到更大系统之前有效执行其角色。这个初始步骤至关重要,因为它确保后续拓扑优化建立在表现良好的代理基础上,而不是遭受配置不良的代理的复合影响。例如,在为HotpotQA数据集优化时,"辩论者"代理的提示被创造性地设计为指示其充当"主要出版物的专家事实检查员"。其优化任务是仔细审查来自其他代理的建议答案,将它们与提供的上下文段落进行交叉引用,并识别任何不一致或不受支持的声明。在块级优化期间发现的这种专业角色扮演提示旨在使辩论者代理在放入更大工作流程之前高度有效地综合信息。
English: 1. Block-Level Prompt Optimization: The process begins with a local optimization of prompts for individual agent types, or "blocks," to ensure each component performs its role effectively before being integrated into a larger system. This initial step is crucial as it ensures that the subsequent topology optimization builds upon well-performing agents, rather than suffering from the compounding impact of poorly configured ones. For example, when optimizing for the HotpotQA dataset, the prompt for a "Debator" agent is creatively framed to instruct it to act as an "expert fact-checker for a major publication". Its optimized task is to meticulously review proposed answers from other agents, cross-reference them with provided context passages, and identify any inconsistencies or unsupported claims. This specialized role-playing prompt, discovered during block-level optimization, aims to make the debator agent highly effective at synthesizing information before it's even placed into a larger workflow.
**2. 工作流程拓扑优化:**本地优化后,MASS通过从可定制设计空间中选择和排列不同的代理交互来优化工作流程拓扑。为了使此搜索高效,MASS采用影响加权方法。此方法通过测量每个拓扑相对于基线代理的性能增益来计算其"增量影响",并使用这些分数引导搜索走向更有前景的组合。例如,在为MBPP编码任务优化时,拓扑搜索发现特定混合工作流程最有效。找到的最佳拓扑不是简单的结构,而是具有外部工具使用的迭代优化过程的组合。具体来说,它由一个预测器代理组成,该代理参与多轮反思,其代码由一个执行器代理验证,该执行器代理根据测试用例运行代码。这个发现的工作流程表明,对于编码,结合迭代自我纠正与外部验证的结构优于简单的MAS设计。
English: 2. Workflow Topology Optimization: Following local optimization, MASS optimizes the workflow topology by selecting and arranging different agent interactions from a customizable design space. To make this search efficient, MASS employs an influence-weighted method. This method calculates the "incremental influence" of each topology by measuring its performance gain relative to a baseline agent and uses these scores to guide the search toward more promising combinations. For instance, when optimizing for the MBPP coding task, the topology search discovers that a specific hybrid workflow is most effective. The best-found topology is not a simple structure but a combination of an iterative refinement process with external tool use. Specifically, it consists of one predictor agent that engages in several rounds of reflection, with its code being verified by one executor agent that runs the code against test cases. This discovered workflow shows that for coding, a structure that combines iterative self-correction with external verification is superior to simpler MAS designs.
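The "incremental influence" idea can be illustrated with a toy ranking function. All topology names and scores below are hypothetical; this is a sketch of the scoring principle, not the MASS implementation:

```python
def incremental_influence(topology_score, baseline_score):
    # Performance gain of a candidate topology over the single-agent baseline.
    return topology_score - baseline_score

def rank_topologies(scores, baseline):
    # scores: {topology_name: measured accuracy}; most promising first.
    return sorted(
        scores,
        key=lambda t: incremental_influence(scores[t], baseline),
        reverse=True,
    )

baseline = 0.62  # hypothetical single-agent accuracy
measured = {"reflect": 0.70, "debate": 0.67, "aggregate": 0.64, "tool-use": 0.73}
print(rank_topologies(measured, baseline))
```

The search then samples combinations biased toward the high-influence building blocks instead of enumerating the full, unconstrained design space.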
图4:(作者提供):多代理系统搜索(MASS)框架是一个三阶段优化过程,导航包含可优化提示(指令和演示)和可配置代理构建块(聚合、反思、辩论、总结和工具使用)的搜索空间。第一阶段,块级提示优化,独立优化每个代理模块的提示。第二阶段,工作流程拓扑优化,从影响加权设计空间中对有效系统配置进行采样,整合优化后的提示。最后阶段,工作流程级提示优化,在识别出第二阶段的最佳工作流程后,对整个多代理系统的提示进行第二轮优化。
English: Fig. 4: (Courtesy of the Authors): The Multi-Agent System Search (MASS) Framework is a three-stage optimization process that navigates a search space encompassing optimizable prompts (instructions and demonstrations) and configurable agent building blocks (Aggregate, Reflect, Debate, Summarize, and Tool-use). The first stage, Block-level Prompt Optimization, independently optimizes prompts for each agent module. Stage two, Workflow Topology Optimization, samples valid system configurations from an influence-weighted design space, integrating the optimized prompts. The final stage, Workflow-level Prompt Optimization, involves a second round of prompt optimization for the entire multi-agent system after the optimal workflow from Stage two has been identified.
**3. 工作流程级提示优化:**最后阶段涉及整个系统提示的全局优化。在识别出表现最佳的工作流程后,提示作为单个集成实体进行微调,以确保它们为编排量身定制,并且代理相互依赖性得到优化。例如,在为DROP数据集找到最佳拓扑后,最终优化阶段优化"预测器"代理的提示。最终优化的提示非常详细,首先向代理提供数据集本身的摘要,指出其专注于"提取问题回答"和"数字信息"。然后,它包括正确问答行为的少样本示例,并将核心指令设计为高风险场景:"您是一个高度专业化的AI,负责为紧急新闻报道提取关键数字信息。现场广播依赖于您的准确性和速度"。这种多方面的提示,结合元知识、示例和角色扮演,专门为最终工作流程调整,以最大化准确性。
English: 3. Workflow-Level Prompt Optimization: The final stage involves a global optimization of the entire system's prompts. After identifying the best-performing topology, the prompts are fine-tuned as a single, integrated entity to ensure they are tailored for orchestration and that agent interdependencies are optimized. As an example, after finding the best topology for the DROP dataset, the final optimization stage refines the "Predictor" agent's prompt. The final, optimized prompt is highly detailed, beginning by providing the agent with a summary of the dataset itself, noting its focus on "extractive question answering" and "numerical information". It then includes few-shot examples of correct question-answering behavior and frames the core instruction as a high-stakes scenario: "You are a highly specialized AI tasked with extracting critical numerical information for an urgent news report. A live broadcast is relying on your accuracy and speed". This multi-faceted prompt, combining meta-knowledge, examples, and role-playing, is tuned specifically for the final workflow to maximize accuracy.
关键发现和原则:实验表明,由MASS优化的MAS在一系列任务上显著优于现有的手动设计系统和其他自动化设计方法。从这项研究中得出的有效MAS的关键设计原则有三方面:
- 在组合代理之前,用高质量的提示优化单个代理。
- 通过组合有影响力的拓扑而不是探索无约束的搜索空间来构建MAS。
- 通过最终的、工作流程级联合优化来建模和优化代理之间的相互依赖性。
English: Key Findings and Principles: Experiments demonstrate that MAS optimized by MASS significantly outperform existing manually designed systems and other automated design methods across a range of tasks. The key design principles for effective MAS, as derived from this research, are threefold:
- Optimize individual agents with high-quality prompts before composing them.
- Construct MAS by composing influential topologies rather than exploring an unconstrained search space.
- Model and optimize the interdependencies between agents through a final, workflow-level joint optimization.
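The three principles map directly onto a staged pipeline. The sketch below is schematic: every optimizer body is a stub (the real MASS stages run measured evaluations), and the agent names and prompts are hypothetical.

```python
def optimize_block_prompts(agents):
    # Stage 1 (stub): tune each agent's prompt in isolation before composition.
    return {name: prompt + " [block-optimized]" for name, prompt in agents.items()}

def optimize_topology(prompts, candidates):
    # Stage 2 (stub): pick the best wiring; a placeholder rule stands in for
    # the influence-weighted search over measured scores.
    return max(candidates, key=len)

def optimize_workflow_prompts(prompts, topology):
    # Stage 3 (stub): jointly refine all prompts for the chosen topology.
    tag = "-".join(topology)
    return {name: p + f" [tuned for {tag}]" for name, p in prompts.items()}

agents = {"predictor": "Answer the question.", "debater": "Critique answers."}
prompts = optimize_block_prompts(agents)
topology = optimize_topology(prompts, [("predictor",), ("predictor", "debater")])
final = optimize_workflow_prompts(prompts, topology)
print(topology)
print(final["predictor"])
```

The ordering matters: stage 3 only runs once, on the single topology that survived stage 2, which keeps the joint prompt optimization tractable.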
English: Building on our discussion of key reasoning techniques, let's first examine a core performance principle: the Scaling Inference Law for LLMs. This law states that a model's performance predictably improves as the computational resources allocated to it increase. We can see this principle in action in complex systems like Deep Research, where an AI agent leverages these resources to autonomously investigate a topic by breaking it down into sub-questions, using Web search as a tool, and synthesizing its findings.
中文: 在我们讨论关键推理技术的基础上,让我们首先考察一个核心性能原则:LLM的推理扩展定律。该定律指出,模型的性能随着分配给它计算资源的增加而可预测地提高。我们可以在像深度研究这样的复杂系统中看到这一原则的实际应用,其中AI代理利用这些资源自主调查主题,将其分解为子问题,使用网络搜索作为工具,并综合其发现。
**深度研究。**术语"深度研究"描述了一类AI代理工具,旨在充当不知疲倦、有条不紊的研究助手。该领域的主要平台包括Perplexity AI、Google的Gemini研究能力和OpenAI在ChatGPT中的高级功能(见图5)。
English: Deep Research. The term "Deep Research" describes a category of AI Agentic tools designed to act as tireless, methodical research assistants. Major platforms in this space include Perplexity AI, Google's Gemini research capabilities, and OpenAI's advanced functions within ChatGPT (see Fig.5).
图5:用于信息收集的Google深度研究
English: Fig. 5: Google Deep Research for Information Gathering
这些工具引入的一个根本性转变是搜索过程本身的变化。标准搜索提供即时链接,将综合工作留给您。深度研究在不同的模型上运作。在这里,您为AI分配一个复杂的查询,并授予它"时间预算"——通常是几分钟。作为这种耐心的回报,您会收到一份详细的报告。
English: A fundamental shift introduced by these tools is the change in the search process itself. A standard search provides immediate links, leaving the work of synthesis to you. Deep Research operates on a different model. Here, you task an AI with a complex query and grant it a "time budget"—usually a few minutes. In return for this patience, you receive a detailed report.
在此期间,AI以代理方式代表您工作。它自主执行一系列复杂的步骤,这些步骤对个人来说将非常耗时:
English: During this time, the AI works on your behalf in an agentic way. It autonomously performs a series of sophisticated steps that would be incredibly time-consuming for a person:
- **初始探索:**它根据您的初始提示运行多个有针对性的搜索。
- **推理和优化:**它阅读并分析第一波结果,综合发现,并批判性地识别需要更多细节的缺口、矛盾或领域。
- **后续调查:**基于其内部推理,它进行新的、更细致的搜索来填补这些缺口并深化其理解。
- **最终综合:**经过几轮这种迭代搜索和推理后,它将所有验证的信息编译成单个、连贯和结构化的摘要。
English: 1. Initial Exploration: It runs multiple, targeted searches based on your initial prompt.
2. Reasoning and Refinement: It reads and analyzes the first wave of results, synthesizes the findings, and critically identifies gaps, contradictions, or areas that require more detail.
3. Follow-up Inquiry: Based on its internal reasoning, it conducts new, more nuanced searches to fill those gaps and deepen its understanding.
4. Final Synthesis: After several rounds of this iterative searching and reasoning, it compiles all the validated information into a single, cohesive, and structured summary.
这种系统方法确保全面和充分推理的响应,显著提高信息收集的效率和深度,从而促进更具代理性的决策制定。
English: This systematic approach ensures a comprehensive and well-reasoned response, significantly enhancing the efficiency and depth of information gathering, thereby facilitating more agentic decision-making.
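The four steps above can be sketched as a loop. The `search` and `find_gaps` functions are hypothetical stubs standing in for a real web-search API and an LLM reflection step:

```python
def search(query):
    # Stub for a real web-search call.
    return [f"result for {query!r}"]

def find_gaps(findings):
    # Stub for LLM reflection: return follow-up queries, or [] when coverage
    # looks sufficient (here, once three findings have been collected).
    return [] if len(findings) >= 3 else [f"follow-up #{len(findings)}"]

def deep_research(topic, max_rounds=5):
    findings = search(topic)                    # 1. initial exploration
    for _ in range(max_rounds):
        gaps = find_gaps(findings)              # 2. reasoning and refinement
        if not gaps:
            break
        for q in gaps:                          # 3. follow-up inquiry
            findings.extend(search(q))
    return " | ".join(findings)                 # 4. final synthesis

print(deep_research("inference scaling"))
```

The "time budget" the text mentions corresponds to `max_rounds` here: granting more rounds lets the agent close more gaps before synthesizing.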
推理扩展定律
这一关键原则规定了LLM性能与其在操作阶段(称为推理)期间分配的计算资源之间的关系。推理扩展定律不同于更熟悉的训练扩展定律,后者专注于模型质量在模型创建过程中如何随着数据量和计算能力的增加而提高。相反,该定律专门检查当LLM积极生成输出或答案时发生的动态权衡。
English: Scaling Inference Law
This critical principle dictates the relationship between an LLM's performance and the computational resources allocated during its operational phase, known as inference. The Inference Scaling Law differs from the more familiar scaling laws for training, which focus on how model quality improves with increased data volume and computational power during a model's creation. Instead, this law specifically examines the dynamic trade-offs that occur when an LLM is actively generating an output or answer.
该定律的一个基石是揭示,通过增加推理时的计算投资,通常可以从相对较小的LLM中获得卓越的结果。这不一定是使用更强大的GPU,而是采用更复杂或资源密集的推理策略。这种策略的主要示例是指示模型生成多个潜在答案——可能通过多样化束搜索或自一致性方法等技术——然后采用选择机制来识别最佳输出。这种迭代优化或多候选生成过程需要更多的计算周期,但可以显著提高最终响应的质量。
English: A cornerstone of this law is the revelation that superior results can frequently be achieved from a comparatively smaller LLM by augmenting the computational investment at inference time. This doesn't necessarily mean using a more powerful GPU, but rather employing more sophisticated or resource-intensive inference strategies. A prime example of such a strategy is instructing the model to generate multiple potential answers—perhaps through techniques like diverse beam search or self-consistency methods—and then employing a selection mechanism to identify the most optimal output. This iterative refinement or multiple-candidate generation process demands more computational cycles but can significantly elevate the quality of the final response.
这一原则为代理系统中LLM部署的明智和经济合理的决策提供了关键框架。它挑战了更大的模型将始终产生更好性能的直觉观念。该定律假设,在推理期间被授予更大量"思考预算"的较小模型,有时可以超越依赖更简单、计算密集度较低的生成过程的更大模型的性能。这里的"思考预算"指的是在推理过程中应用的额外计算步骤或复杂算法,允许较小模型在确定答案之前探索更广泛的可能性或应用更严格的内部检查。
English: This principle offers a crucial framework for informed and economically sound decision-making in the deployment of Agents systems. It challenges the intuitive notion that a larger model will always yield better performance. The law posits that a smaller model, when granted a more substantial "thinking budget" during inference, can occasionally surpass the performance of a much larger model that relies on a simpler, less computationally intensive generation process. The "thinking budget" here refers to the additional computational steps or complex algorithms applied during inference, allowing the smaller model to explore a wider range of possibilities or apply more rigorous internal checks before settling on an answer.
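One inference-time strategy the text names, self-consistency, is simple to sketch once generation is abstracted away. The answer strings below are simulated samples, not real model output:

```python
from collections import Counter

def self_consistency(samples):
    # Majority vote over the final answers of independently sampled
    # reasoning chains; more samples = a larger "thinking budget".
    votes = Counter(samples)
    return votes.most_common(1)[0][0]

# Simulated final answers from seven sampled chains of thought (hypothetical):
chains = ["42", "42", "41", "42", "40", "42", "42"]
print(self_consistency(chains))  # → 42
```

The extra compute goes into drawing the seven chains; the selection step itself is cheap, which is why this trade often favors a smaller model sampled many times.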
因此,推理扩展定律成为构建高效和成本效益的代理系统的基本原则。它为仔细平衡几个相互关联的因素提供了方法论:
- **模型大小:**较小模型在内存和存储方面的需求本身就较低。
- **响应延迟:**虽然推理时计算的增加会增加延迟,但该定律有助于识别性能增益超过这种增加的点,或如何战略性地应用计算以避免过度延迟。
- **运营成本:**部署和运行较大模型通常由于功耗和基础设施需求的增加而产生更高的持续运营成本。该定律展示了如何在不必要地增加这些成本的情况下优化性能。
English: Consequently, the Scaling Inference Law becomes fundamental to constructing efficient and cost-effective Agentic systems. It provides a methodology for meticulously balancing several interconnected factors:
- Model Size: Smaller models are inherently less demanding in terms of memory and storage.
- Response Latency: While increased inference-time computation can add to latency, the law helps identify the point at which the performance gains outweigh this increase, or how to strategically apply computation to avoid excessive delays.
- Operational Cost: Deploying and running larger models typically incurs higher ongoing operational costs due to increased power consumption and infrastructure requirements. The law demonstrates how to optimize performance without unnecessarily escalating these costs.
通过理解和应用推理扩展定律,开发者和组织可以做出战略选择,为特定代理应用带来最佳性能,确保计算资源分配到对LLM输出质量和效用影响最大的地方。这允许更细致和经济的AI部署方法,超越简单的"越大越好"范式。
English: By understanding and applying the Scaling Inference Law, developers and organizations can make strategic choices that lead to optimal performance for specific agentic applications, ensuring that computational resources are allocated where they will have the most significant impact on the quality and utility of the LLM's output. This allows for more nuanced and economically viable approaches to AI deployment, moving beyond a simple "bigger is better" paradigm.
实践代码示例
English: # Hands-On Code Example
Google开源的DeepSearch代码可通过gemini-fullstack-langgraph-quickstart存储库获得(图6)。该存储库为开发者提供了使用Gemini 2.5和LangGraph编排框架构建全栈AI代理的模板。这个开源栈有助于基于代理的架构实验,并可与本地LLMs(如Gemma)集成。它利用Docker和模块化项目脚手架进行快速原型设计。应该注意的是,此版本作为结构良好的演示,并非旨在作为生产就绪的后端。
English: The DeepSearch code, open-sourced by Google, is available through the gemini-fullstack-langgraph-quickstart repository (Fig. 6). This repository provides a template for developers to construct full-stack AI agents using Gemini 2.5 and the LangGraph orchestration framework. This open-source stack facilitates experimentation with agent-based architectures and can be integrated with local LLMs such as Gemma. It utilizes Docker and modular project scaffolding for rapid prototyping. It should be noted that this release serves as a well-structured demonstration and is not intended as a production-ready backend.
图6:(作者提供)具有多个反思步骤的DeepSearch示例
English: Fig. 6: (Courtesy of authors) Example of DeepSearch with multiple Reflection steps
该项目提供了一个全栈应用程序,具有React前端和LangGraph后端,专为高级研究和对话AI设计。LangGraph代理使用Google Gemini模型动态生成搜索查询,并通过Google搜索API集成网络研究。该系统采用反思推理来识别知识缺口,迭代优化搜索,并综合带有引用的答案。前端和后端支持热重载。

项目结构包括单独的frontend/和backend/目录。设置要求包括Node.js、npm、Python 3.8+和Google Gemini API密钥。在后端的.env文件中配置API密钥后,可以安装后端(使用pip install .)和前端(npm install)的依赖项。可以使用make dev并发运行开发服务器,或单独运行。

后端代理在backend/src/agent/graph.py中定义,生成初始搜索查询,进行网络研究,执行知识缺口分析,迭代优化查询,并使用Gemini模型综合带引用的答案。生产部署涉及后端服务器交付静态前端构建,需要Redis用于流式传输实时输出,以及Postgres数据库用于管理数据。可以使用docker-compose up构建和运行Docker镜像,docker-compose.yml示例还需要LangSmith API密钥。该应用程序利用React与Vite、Tailwind CSS、Shadcn UI、LangGraph和Google Gemini。项目根据Apache许可证2.0版授权。

English: This project provides a full-stack application featuring a React frontend and a LangGraph backend, designed for advanced research and conversational AI. A LangGraph agent dynamically generates search queries using Google Gemini models and integrates web research via the Google Search API. The system employs reflective reasoning to identify knowledge gaps, refine searches iteratively, and synthesize answers with citations. The frontend and backend support hot-reloading.

The project's structure includes separate frontend/ and backend/ directories. Requirements for setup include Node.js, npm, Python 3.8+, and a Google Gemini API key. After configuring the API key in the backend's .env file, dependencies for both the backend (using pip install .) and frontend (npm install) can be installed. Development servers can be run concurrently with make dev or individually.

The backend agent, defined in backend/src/agent/graph.py, generates initial search queries, conducts web research, performs knowledge gap analysis, refines queries iteratively, and synthesizes a cited answer using a Gemini model. Production deployment involves the backend server delivering a static frontend build and requires Redis for streaming real-time output and a Postgres database for managing data. A Docker image can be built and run using docker-compose up, which also requires a LangSmith API key for the docker-compose.yml example. The application utilizes React with Vite, Tailwind CSS, Shadcn UI, LangGraph, and Google Gemini. The project is licensed under the Apache License 2.0.
```python
from langgraph.graph import StateGraph, START, END

# 注意:OverallState、Configuration 以及各节点函数(generate_query、web_research、
# reflection、finalize_answer、continue_to_web_research、evaluate_research)
# 均在仓库的 backend/src/agent/ 目录中定义。
# Note: OverallState, Configuration, and the node functions are defined
# elsewhere in the repository's backend/src/agent/ directory.

# 创建我们的代理图 / Create our agent graph
builder = StateGraph(OverallState, config_schema=Configuration)

# 定义我们将循环的节点 / Define the nodes we will cycle between
builder.add_node("generate_query", generate_query)
builder.add_node("web_research", web_research)
builder.add_node("reflection", reflection)
builder.add_node("finalize_answer", finalize_answer)

# 设置入口点为`generate_query`,这意味着这个节点是第一个被调用的
# Set the entrypoint as `generate_query`; this node is called first
builder.add_edge(START, "generate_query")

# 添加条件边以在并行分支中继续使用搜索查询
# Add a conditional edge to continue with search queries in parallel branches
builder.add_conditional_edges(
    "generate_query", continue_to_web_research, ["web_research"]
)

# 对网络研究进行反思 / Reflect on the web research
builder.add_edge("web_research", "reflection")

# 评估研究 / Evaluate the research
builder.add_conditional_edges(
    "reflection", evaluate_research, ["web_research", "finalize_answer"]
)

# 最终确定答案 / Finalize the answer
builder.add_edge("finalize_answer", END)

graph = builder.compile(name="pro-search-agent")
```
图4:具有LangGraph的DeepSearch示例(来自后端/src/agent/graph.py的代码)
English: Fig.4: Example of DeepSearch with LangGraph (code from backend/src/agent/graph.py)
那么,代理如何思考?
English: # So, what do agents think?
总之,代理的思考过程是一种结构化方法,结合推理和行动来解决问题。这种方法允许代理显式规划其步骤,监控其进展,并与外部工具交互以收集信息。
English: In summary, an agent's thinking process is a structured approach that combines reasoning and acting to solve problems. This method allows an agent to explicitly plan its steps, monitor its progress, and interact with external tools to gather information.
其核心,代理的"思考"由强大的LLM促进。这个LLM生成一系列指导代理后续行动的想法。该过程通常遵循思考-行动-观察循环:
English: At its core, the agent's "thinking" is facilitated by a powerful LLM. This LLM generates a series of thoughts that guide the agent's subsequent actions. The process typically follows a thought-action-observation loop:
- **思考:**代理首先生成一个文本思考,分解问题、制定计划或分析当前情况。这种内部独白使代理的推理过程透明且可引导。
- **行动:**基于思考,代理从预定义的离散选项集中选择行动。例如,在问答场景中,行动空间可能包括在线搜索、从特定网页检索信息或提供最终答案。
- **观察:**然后,代理根据所采取行动从环境中接收反馈。这可能是网络搜索结果或网页内容。
English: 1. Thought: The agent first generates a textual thought that breaks down the problem, formulates a plan, or analyzes the current situation. This internal monologue makes the agent's reasoning process transparent and steerable.
2. Action: Based on the thought, the agent selects an action from a predefined, discrete set of options. For example, in a question-answering scenario, the action space might include searching online, retrieving information from a specific webpage, or providing a final answer.
3. Observation: The agent then receives feedback from its environment based on the action taken. This could be the results of a web search or the content of a webpage.
这个循环重复进行,每次观察都为下一次思考提供信息,直到代理确定它已达到最终解决方案并执行"完成"行动。
English: This cycle repeats, with each observation informing the next thought, until the agent determines that it has reached a final solution and performs a "finish" action.
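The thought-action-observation loop can be sketched as follows. Both `llm_think` and `run_tool` are hypothetical stubs: the first stands in for an LLM choosing among discrete actions, the second for a real tool such as a search API.

```python
def llm_think(state):
    # Stub for the LLM's reasoning step: first think of a search, then,
    # once an observation is available, finish with an answer.
    if "observation" not in state:
        return ("search", state["question"])   # Thought: I should look this up.
    return ("finish", state["observation"])    # Thought: I can answer now.

def run_tool(action, arg):
    # Stub tool call; a real agent would hit a search API here.
    return f"search result for {arg!r}"

def react_loop(question, max_steps=5):
    state = {"question": question}
    for _ in range(max_steps):
        action, arg = llm_think(state)                 # Thought → Action
        if action == "finish":
            return arg                                 # "finish" ends the loop
        state["observation"] = run_tool(action, arg)   # Observation feeds next thought
    return None

print(react_loop("What is the population of Paris?"))
```

Each observation is written back into `state`, which is exactly how the next thought is "informed" by the previous action's outcome.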
The effectiveness of this approach relies on the advanced reasoning and planning capabilities of the underlying LLM. To guide the agent, the ReAct framework often employs few-shot learning, where the LLM is provided with examples of human-like problem-solving trajectories. These examples demonstrate how to effectively combine thoughts and actions to solve similar tasks.
这种方法的有效性依赖于底层LLM的高级推理和规划能力。为了指导代理,ReAct框架通常采用小样本学习,为LLM提供类人问题解决轨迹的示例。这些示例展示了如何有效结合思考和行动来解决类似任务。
The frequency of an agent's thoughts can be adjusted depending on the task. For knowledge-intensive reasoning tasks like fact-checking, thoughts are typically interleaved with every action to ensure a logical flow of information gathering and reasoning. In contrast, for decision-making tasks that require many actions, such as navigating a simulated environment, thoughts may be used more sparingly, allowing the agent to decide when thinking is necessary.
代理思考的频率可以根据任务进行调整。对于像事实核查这样的知识密集型推理任务,思考通常会与每个行动交织在一起,以确保信息收集和推理的逻辑流程。相比之下,对于需要许多行动的决策任务(如在模拟环境中导航),思考可能更加节约使用,让代理自行决定何时需要思考。
At a Glance
概览
What: Complex problem-solving often requires more than a single, direct answer, posing a significant challenge for AI. The core problem is enabling AI agents to tackle multi-step tasks that demand logical inference, decomposition, and strategic planning. Without a structured approach, agents may fail to handle intricacies, leading to inaccurate or incomplete conclusions. These advanced reasoning methodologies aim to make an agent's internal "thought" process explicit, allowing it to systematically work through challenges.
什么:复杂问题解决通常需要的不只是单一、直接的答案,这给AI带来了重大挑战。核心问题是让AI代理能够处理需要逻辑推理、分解和战略规划的多步骤任务。如果没有结构化方法,代理可能无法处理复杂性,导致不准确或不完整的结论。这些高级推理方法旨在使代理的内部"思考"过程明确化,让其能够系统地应对挑战。
Why: The standardized solution is a suite of reasoning techniques that provide a structured framework for an agent's problem-solving process. Methodologies like Chain-of-Thought (CoT) and Tree-of-Thought (ToT) guide LLMs to break down problems and explore multiple solution paths. Self-Correction allows for the iterative refinement of answers, ensuring higher accuracy. Agentic frameworks like ReAct integrate reasoning with action, enabling agents to interact with external tools and environments to gather information and adapt their plans. This combination of explicit reasoning, exploration, refinement, and tool use creates more robust, transparent, and capable AI systems.
为什么:标准化解决方案是一套推理技术,为代理的问题解决过程提供结构化框架。Chain-of-Thought (CoT)和Tree-of-Thought (ToT)等方法指导LLM分解问题并探索多种解决方案路径。Self-Correction允许迭代优化答案,确保更高准确性。ReAct等代理框架将推理与行动相结合,使代理能够与外部工具和环境交互以收集信息并调整计划。这种显式推理、探索、优化和工具使用的结合创造了更强大、透明和能干的AI系统。
Rule of thumb: Use these reasoning techniques when a problem is too complex for a single-pass answer and requires decomposition, multi-step logic, interaction with external data sources or tools, or strategic planning and adaptation. They are ideal for tasks where showing the "work" or thought process is as important as the final answer.
经验法则:当问题过于复杂,无法通过单次回答解决,且需要分解、多步骤逻辑、与外部数据源或工具交互,或需要战略规划和适应时,请使用这些推理技术。它们非常适合那些"工作"或思考过程与最终答案同等重要的任务。
Visual summary
视觉总结
Fig. 7: Reasoning design pattern
图7:推理设计模式
Key Takeaways
关键要点
- By making their reasoning explicit, agents can formulate transparent, multi-step plans, which is the foundational capability for autonomous action and user trust.
通过使推理过程明确化,代理能够制定透明、多步骤的计划,这是自主行动和用户信任的基础能力。
- The ReAct framework provides agents with their core operational loop, empowering them to move beyond mere reasoning and interact with external tools to dynamically act and adapt within an environment.
ReAct框架为代理提供了核心操作循环,使其能够超越单纯的推理,与外部工具交互,在环境中动态行动和适应。
- The Scaling Inference Law implies an agent's performance is not just about its underlying model size, but its allocated "thinking time," allowing for more deliberate and higher-quality autonomous actions.
推理扩展定律表明,代理的性能不仅取决于其底层模型大小,还取决于其分配的"思考时间",从而实现更审慎和更高质量的自主行动。
- Chain-of-Thought (CoT) serves as an agent's internal monologue, providing a structured way to formulate a plan by breaking a complex goal into a sequence of manageable actions.
Chain-of-Thought (CoT)作为代理的内部独白,提供了一种结构化方法,通过将复杂目标分解为一系列可管理的行动来制定计划。
- Tree-of-Thought and Self-Correction give agents the crucial ability to deliberate, allowing them to evaluate multiple strategies, backtrack from errors, and improve their own plans before execution.
Tree-of-Thought和Self-Correction赋予代理关键的审议能力,使其能够评估多种策略,从错误中回溯,并在执行前改进自身计划。
- Collaborative frameworks like Chain of Debates (CoD) signal the shift from solitary agents to multi-agent systems, where teams of agents can reason together to tackle more complex problems and reduce individual biases.
Chain of Debates (CoD)等协作框架标志着从单一代理向多代理系统的转变,代理团队可以共同推理来解决更复杂的问题并减少个体偏见。
- Applications like Deep Research demonstrate how these techniques culminate in agents that can execute complex, long-running tasks, such as in-depth investigation, completely autonomously on a user's behalf.
深度研究等应用展示了这些技术如何最终形成能够完全自主地代表用户执行复杂、长期任务(如深入调查)的代理。
- To build effective teams of agents, frameworks like MASS automate the optimization of how individual agents are instructed and how they interact, ensuring the entire multi-agent system performs optimally.
为了构建有效的代理团队,MASS等框架自动化了如何指导单个代理以及它们如何交互的优化过程,确保整个多代理系统实现最佳性能。
- By integrating these reasoning techniques, we build agents that are not just automated but truly autonomous, capable of being trusted to plan, act, and solve complex problems without direct supervision.
通过整合这些推理技术,我们构建的代理不仅仅是自动化的,而是真正自主的,能够被信任来规划、行动和解决复杂问题,无需直接监督。
Conclusions
结论
Modern AI is evolving from passive tools into autonomous agents, capable of tackling complex goals through structured reasoning. This agentic behavior begins with an internal monologue, powered by techniques like Chain-of-Thought (CoT), which allows an agent to formulate a coherent plan before acting. True autonomy requires deliberation, which agents achieve through Self-Correction and Tree-of-Thought (ToT), enabling them to evaluate multiple strategies and independently improve their own work. The pivotal leap to fully agentic systems comes from the ReAct framework, which empowers an agent to move beyond thinking and start acting by using external tools. This establishes the core agentic loop of thought, action, and observation, allowing the agent to dynamically adapt its strategy based on environmental feedback.
现代AI正从被动工具发展为能够通过结构化推理处理复杂目标的自主代理。这种代理行为始于内部独白,由Chain-of-Thought (CoT)等技术提供支持,使代理能够在行动前制定连贯的计划。真正的自主性需要审议能力,代理通过Self-Correction和Tree-of-Thought (ToT)实现这一点,使其能够评估多种策略并独立改进自身工作。向完全代理系统的关键飞跃来自ReAct框架,它使代理能够超越思考,开始使用外部工具行动。这建立了思考、行动和观察的核心代理循环,让代理能够根据环境反馈动态调整其策略。
An agent's capacity for deep deliberation is fueled by the Scaling Inference Law, where more computational "thinking time" directly translates into more robust autonomous actions. The next frontier is the multi-agent system, where frameworks like Chain of Debates (CoD) create collaborative agent societies that reason together to achieve a common goal. This is not theoretical; agentic applications like Deep Research already demonstrate how autonomous agents can execute complex, multi-step investigations on a user's behalf. The overarching goal is to engineer reliable and transparent autonomous agents that can be trusted to independently manage and solve intricate problems. Ultimately, by combining explicit reasoning with the power to act, these methodologies are completing the transformation of AI into truly agentic problem-solvers.
代理的深度审议能力由推理扩展定律提供动力,其中更多的计算"思考时间"直接转化为更强大的自主行动。下一个前沿是多代理系统,其中Chain of Debates (CoD)等框架创建了协作代理社会,共同推理以实现共同目标。这不是理论性的;深度研究等代理应用已经展示了自主代理如何能够代表用户执行复杂的多步骤调查。总体目标是设计可靠且透明的自主代理,能够被信任来独立管理和解决复杂问题。最终,通过将显式推理与行动能力相结合,这些方法正在完成AI向真正代理问题解决者的转变。
References
参考文献
Relevant research includes:
相关研究包括:
- "Chain-of-Thought Prompting Elicits Reasoning in Large Language Models" by Wei et al. (2022)
Wei等人(2022年)的"Chain-of-Thought提示在大型语言模型中引发推理"
- "Tree of Thoughts: Deliberate Problem Solving with Large Language Models" by Yao et al. (2023)
Yao等人(2023年)的"思维树:用大型语言模型进行审慎的问题解决"
- "Program-Aided Language Models" by Gao et al. (2023)
Gao等人(2023年)的"程序辅助语言模型"
- "ReAct: Synergizing Reasoning and Acting in Language Models" by Yao et al. (2023)
Yao等人(2023年)的"ReAct:在语言模型中协同推理和行动"
- "Inference Scaling Laws: An Empirical Analysis of Compute-Optimal Inference for LLM Problem-Solving" (2024)
"推理扩展定律:LLM问题解决的最优推理计算实证分析"(2024年)
- "Multi-Agent Design: Optimizing Agents with Better Prompts and Topologies", arxiv.org/abs/2502.02…
"多代理设计:通过更好的提示和拓扑结构优化代理", arxiv.org/abs/2502.02…