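AI Agent Planning: Reflections from Theory to Practice
As a developer, I recently studied "Chapter 6: Planning", and it gave me a fresh understanding of how AI agents plan. Here are my notes and practical reflections.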
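📌 1. The Planning Pattern: The "Brain" and Core Driver of an AI Agent
After reading "Chapter 6: Planning", I came to see planning as the key step that takes an AI agent from simple tool to capable collaborator.
The planning pattern matches my own development experience: a system only shows genuinely intelligent behavior once it can decompose a complex goal into a sequence of interdependent operations.
This capability closely parallels system architecture design in modern software development. Just as we draw up an architectural blueprint before building a complex system, an AI agent uses planning to build the logical framework for its actions.
The difference is that an agent's plan is dynamic and can be adjusted in real time as the environment changes, which gave me a new perspective on "adaptive systems".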
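📌 2. The Art of Balancing Flexibility and Determinism
The chapter's "trade-off between flexibility and predictability" pinpoints the central challenge of applying the planning pattern in practice.
It made me reflect on past projects: in some cases we chased flexibility so hard that system behavior became unpredictable, while in others an overly rigid process limited the system's ability to adapt.
The rolling-wave planning principle of "detailed for the near term, coarse for the far term" was a real inspiration. We can borrow it for AI agent planning:
- ✅ Near-term tasks: plan in detail
- ✅ Long-term tasks: keep the plan coarse-grained
- ✅ As execution proceeds: keep refining and adjusting
This dynamic strategy preserves a sense of direction while leaving room for uncertainty.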
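The rolling-wave idea above can be sketched in a few lines. This is a minimal illustration, not something taken from the chapter: `PlanItem`, `refine_next`, and `naive_decompose` are hypothetical names, and the decomposer is a stand-in for whatever an LLM or a human planner would produce.

```python
from dataclasses import dataclass, field


@dataclass
class PlanItem:
    title: str
    detailed: bool = False  # True once the item has concrete steps
    steps: list = field(default_factory=list)


def refine_next(plan, horizon, decompose):
    """Detail the first `horizon` items and leave later items coarse."""
    for item in plan[:horizon]:
        if not item.detailed:
            item.steps = decompose(item.title)
            item.detailed = True
    return plan


def naive_decompose(title):
    # Hypothetical stand-in for an LLM or human planner.
    return [f"{title}: design", f"{title}: implement", f"{title}: verify"]


plan = [
    PlanItem("Set up data ingestion"),
    PlanItem("Build retrieval"),
    PlanItem("Add evaluation"),
]
refine_next(plan, horizon=1, decompose=naive_decompose)

# Once the first item is finished, drop it and detail the new near-term item.
plan.pop(0)
refine_next(plan, horizon=1, decompose=naive_decompose)
```

Only the item inside the horizon ever carries concrete steps, so later items can still be reshaped cheaply when new information arrives.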
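📌 3. From Theory to Practice: Implementing the Planning Pattern
The chapter's CrewAI example shows a concrete implementation path for the planning pattern. Task decomposition and sequential execution are the core steps, which means that state management and dependencies need careful thought when we design AI systems.
In real development we can follow the chapter's approach and break a complex business process, such as onboarding a new employee, into discrete, executable sub-tasks.
Benefits of this decomposition:
- 🚀 A more controllable system
- 🚀 Better maintainability
- 🚀 Easier debugging
For example, we can build a directed task graph to manage the dependencies between tasks and keep the plan logically sound.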
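One way to sketch such a directed task graph is with the standard library's `graphlib.TopologicalSorter`. The task names below are illustrative assumptions, not taken from the chapter.

```python
from graphlib import CycleError, TopologicalSorter

# Each key maps to the set of tasks it depends on (names are illustrative).
onboarding = {
    "create_accounts": set(),
    "order_laptop": set(),
    "assign_training": {"create_accounts"},
    "schedule_intro_meeting": {"create_accounts", "order_laptop"},
}


def execution_order(graph):
    """Return a dependency-respecting order, or None if the graph has a cycle."""
    try:
        return list(TopologicalSorter(graph).static_order())
    except CycleError:
        return None


order = execution_order(onboarding)
print(order)
```

A cyclic plan (task A waits on B while B waits on A) is rejected up front instead of stalling during execution, which is where much of the "logical soundness" comes from.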
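📌 4. Deep Research Systems: An Advanced Form of the Planning Pattern
Google DeepResearch and the OpenAI Deep Research API show how powerful the planning pattern is for complex information processing.
These systems do more than draw up a research plan: they adjust their search strategy dynamically through iterative loops, much as a human researcher would.
From an architectural point of view, the modular design of the DeepResearch system is worth borrowing:
Search module → Parsing module → Reasoning module → Synthesis module
It encapsulates capabilities such as search, parsing, reasoning, and synthesis as independent modules and coordinates them through an agentic workflow. This design improves extensibility and maintainability and offers a reference framework for building complex AI applications.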
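The module chain above can be sketched as plain functions that pass a shared state along. Everything here is a hypothetical placeholder (the stage bodies, the `run_pipeline` helper, the state keys); it shows only the composition idea, not how any real deep research system is built.

```python
def search(state):
    # Placeholder: a real module would call a search backend here.
    state["pages"] = [f"  page about {state['query']}  "]
    return state


def parse(state):
    state["snippets"] = [page.strip() for page in state["pages"]]
    return state


def reason(state):
    # Keep only the snippets that mention the query.
    state["relevant"] = [s for s in state["snippets"] if state["query"] in s]
    return state


def synthesize(state):
    state["report"] = "\n".join(state["relevant"])
    return state


def run_pipeline(query, stages):
    """Run each stage in order, threading one state dict through all of them."""
    state = {"query": query}
    for stage in stages:
        state = stage(state)
    return state


result = run_pipeline("agent planning", [search, parse, reason, synthesize])
print(result["report"])
```

Because every stage has the same signature, a stage can be swapped or tested in isolation without touching the others.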
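📌 5. Planning and Personal Productivity
As an engineer, I found that the chapter's planning concepts also speak to personal productivity.
According to a survey:
- 📊 74.7% of respondents make a plan before acting
- 📊 85.9% believe that making a plan improves efficiency
This lines up strikingly with the value of planning for AI agents: for humans and AI systems alike, an orderly plan is the foundation for reaching goals efficiently.
The approach of "splitting a big plan into small goals" mentioned in the text is both a core strategy in AI planning and an effective way to manage complex projects.
From my own experience: breaking a large technical project into measurable milestones improves execution efficiency and also boosts the team's motivation and sense of achievement.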
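📌 6. The Outlook for AI Planning and Its Technical Challenges
The "Quantum City" case mentioned in the text showed me the potential of AI planning for urban governance and complex systems management.
Through its AI-driven "Quantum City" project, Shanghai has built a digital twin system that makes urban planning more scientific and precise. Planning applications at this scale and in real time pose new challenges to traditional software development paradigms.
Technical challenges ahead:
- ⚠️ Algorithmic transparency
- ⚠️ Decision explainability
- ⚠️ System reliability
As Geoffrey Hinton has warned, we need to ensure that AI systems always serve human interests, and this matters especially in planning.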
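💡 Final Thoughts
Studying "Chapter 6: Planning" in depth convinced me that planning is the key bridge between AI's potential and real-world applications.
As engineers, we should work the chapter's ideas into our system design and development practice, and build AI systems that are both intelligent and reliable.
Planning is more than a technical implementation; it is a shift in mindset. It asks us to move from passive response to proactive design, and from isolated features to systems thinking.
I believe that mastering the essence of the planning pattern will help us build technical solutions with real value and impact in the AI era.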
Chapter 6: Planning
Intelligent behavior often involves more than just reacting to the immediate input. It requires foresight, breaking down complex tasks into smaller, manageable steps, and strategizing how to achieve a desired outcome. This is where the Planning pattern comes into play. At its core, planning is the ability for an agent or a system of agents to formulate a sequence of actions to move from an initial state towards a goal state.
Planning Pattern Overview
In the context of AI, it's helpful to think of a planning agent as a specialist to whom you delegate a complex goal. When you ask it to "organize a team offsite," you are defining the what—the objective and its constraints—but not the how. The agent's core task is to autonomously chart a course to that goal. It must first understand the initial state (e.g., budget, number of participants, desired dates) and the goal state (a successfully booked offsite), and then discover the optimal sequence of actions to connect them. The plan is not known in advance; it is created in response to the request.
A hallmark of this process is adaptability. An initial plan is merely a starting point, not a rigid script. The agent's real power is its ability to incorporate new information and steer the project around obstacles. For instance, if the preferred venue becomes unavailable or a chosen caterer is fully booked, a capable agent doesn't simply fail. It adapts. It registers the new constraint, re-evaluates its options, and formulates a new plan, perhaps by suggesting alternative venues or dates.
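The adapt-and-replan behavior described above can be sketched as a loop that records each failed option as a new constraint and plans again. The function names, venues, and dates are hypothetical, and `try_book` stands in for whatever booking tool an agent would actually call.

```python
def plan_offsite(venues, dates, unavailable):
    """Return the first (venue, date) pair not ruled out, or None."""
    for venue in venues:
        for date in dates:
            if (venue, date) not in unavailable:
                return venue, date
    return None


def book_with_replanning(venues, dates, try_book, max_attempts=5):
    unavailable = set()
    for _ in range(max_attempts):
        choice = plan_offsite(venues, dates, unavailable)
        if choice is None:
            return None  # every option is exhausted
        if try_book(*choice):
            return choice
        unavailable.add(choice)  # register the new constraint, then replan
    return None


# Hypothetical world state: the preferred venue is taken on the first date.
booked_out = {("Lakeside Lodge", "Sep 12")}


def try_book(venue, date):
    return (venue, date) not in booked_out


booking = book_with_replanning(
    ["Lakeside Lodge", "City Loft"], ["Sep 12", "Sep 19"], try_book
)
print(booking)
```

The first plan fails, the failure becomes a constraint, and the second plan succeeds with an alternative date instead of aborting.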
However, it is crucial to recognize the trade-off between flexibility and predictability. Dynamic planning is a specific tool, not a universal solution. When a problem's solution is already well-understood and repeatable, constraining the agent to a predetermined, fixed workflow is more effective. This approach limits the agent's autonomy to reduce uncertainty and the risk of unpredictable behavior, guaranteeing a reliable and consistent outcome. Therefore, the decision to use a planning agent versus a simple task-execution agent hinges on a single question: does the "how" need to be discovered, or is it already known?
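The closing question (is the "how" already known, or must it be discovered?) can be expressed as a small dispatch rule. This is a sketch under stated assumptions: `KNOWN_WORKFLOWS`, `steps_for`, and `toy_planner` are hypothetical names, and the planner is a stub in place of an LLM call.

```python
# Requests whose solution is well understood get a fixed, predictable workflow.
KNOWN_WORKFLOWS = {
    "reset_password": ["verify_identity", "send_reset_link", "confirm_change"],
}


def steps_for(request_type, planner):
    if request_type in KNOWN_WORKFLOWS:
        # The "how" is already known: return a copy of the fixed steps.
        return list(KNOWN_WORKFLOWS[request_type])
    # The "how" must be discovered: delegate to a planner.
    return planner(request_type)


def toy_planner(request_type):
    # Hypothetical stand-in for a dynamic planning agent.
    return [f"analyze {request_type}", f"draft plan for {request_type}"]


fixed = steps_for("reset_password", toy_planner)
dynamic = steps_for("organize_offsite", toy_planner)
```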
Practical Applications & Use Cases
The Planning pattern is a core computational process in autonomous systems, enabling an agent to synthesize a sequence of actions to achieve a specified goal, particularly within dynamic or complex environments. This process transforms a high-level objective into a structured plan composed of discrete, executable steps.
In domains such as procedural task automation, planning is used to orchestrate complex workflows. For example, a business process like onboarding a new employee can be decomposed into a directed sequence of sub-tasks, such as creating system accounts, assigning training modules, and coordinating with different departments. The agent generates a plan to execute these steps in a logical order, invoking necessary tools or interacting with various systems to manage dependencies.
Within robotics and autonomous navigation, planning is fundamental for state-space traversal. A system, whether a physical robot or a virtual entity, must generate a path or sequence of actions to transition from an initial state to a goal state. This involves optimizing for metrics such as time or energy consumption while adhering to environmental constraints, like avoiding obstacles or following traffic regulations.
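As a minimal illustration of state-space traversal, the sketch below uses breadth-first search on a small grid, where `1` marks an obstacle. BFS is swapped in here as the simplest planner that minimizes the number of moves; real navigation stacks typically use cost-aware methods, and the grid itself is made up for the example.

```python
from collections import deque


def shortest_path(grid, start, goal):
    """Return the shortest list of cells from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    queue = deque([start])
    came_from = {start: None}  # also serves as the visited set
    while queue:
        cell = queue.popleft()
        if cell == goal:
            path = []
            while cell is not None:  # walk parents back to the start
                path.append(cell)
                cell = came_from[cell]
            return path[::-1]
        r, c = cell
        for nxt in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            nr, nc = nxt
            inside = 0 <= nr < rows and 0 <= nc < cols
            if inside and grid[nr][nc] == 0 and nxt not in came_from:
                came_from[nxt] = cell
                queue.append(nxt)
    return None


grid = [
    [0, 0, 0],
    [1, 1, 0],
    [0, 0, 0],
]
path = shortest_path(grid, (0, 0), (2, 0))
print(path)
```

The initial state is the start cell, the goal state is the target cell, and the obstacle cells play the role of environmental constraints.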
This pattern is also critical for structured information synthesis. When tasked with generating a complex output like a research report, an agent can formulate a plan that includes distinct phases for information gathering, data summarization, content structuring, and iterative refinement. Similarly, in customer support scenarios involving multi-step problem resolution, an agent can create and follow a systematic plan for diagnosis, solution implementation, and escalation.
In essence, the Planning pattern allows an agent to move beyond simple, reactive actions to goal-oriented behavior. It provides the logical framework necessary to solve problems that require a coherent sequence of interdependent operations.
Hands-on code (CrewAI)
The following section demonstrates an implementation of the Planning pattern using the CrewAI framework. This pattern involves an agent that first formulates a multi-step plan to address a complex query and then executes that plan sequentially.
import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from langchain_openai import ChatOpenAI

# Load environment variables from .env file for security
load_dotenv()

# 1. Explicitly define the language model for clarity
llm = ChatOpenAI(model="gpt-4-turbo")

# 2. Define a clear and focused agent
planner_writer_agent = Agent(
    role='Article Planner and Writer',
    goal='Plan and then write a concise, engaging summary on a specified topic.',
    backstory=(
        'You are an expert technical writer and content strategist. '
        'Your strength lies in creating a clear, actionable plan before writing, '
        'ensuring the final summary is both informative and easy to digest.'
    ),
    verbose=True,
    allow_delegation=False,
    llm=llm  # Assign the specific LLM to the agent
)

# 3. Define a task with a more structured and specific expected output
topic = "The importance of Reinforcement Learning in AI"
high_level_task = Task(
    description=(
        f"1. Create a bullet-point plan for a summary on the topic: '{topic}'.\n"
        f"2. Write the summary based on your plan, keeping it around 200 words."
    ),
    expected_output=(
        "A final report containing two distinct sections:\n\n"
        "### Plan\n"
        "- A bulleted list outlining the main points of the summary.\n\n"
        "### Summary\n"
        "- A concise and well-structured summary of the topic."
    ),
    agent=planner_writer_agent,
)

# Create the crew with a clear process
crew = Crew(
    agents=[planner_writer_agent],
    tasks=[high_level_task],
    process=Process.sequential,
)

# Execute the task
print("## Running the planning and writing task ##")
result = crew.kickoff()
print("\n\n---\n## Task Result ##\n---")
print(result)
This code uses the CrewAI library to create an AI agent that plans and writes a summary on a given topic. It starts by importing the necessary libraries, including crewai and langchain_openai, and loading environment variables from a .env file. A ChatOpenAI language model is explicitly defined for use with the agent. An Agent named planner_writer_agent is created with a specific role and goal: to plan and then write a concise summary. The agent's backstory emphasizes its expertise in planning and technical writing. A Task is defined with a clear description to first create a plan and then write a summary on the topic "The importance of Reinforcement Learning in AI", with a specific format for the expected output. A Crew is assembled with the agent and task, set to process them sequentially. Finally, the crew.kickoff() method is called to execute the defined task and the result is printed.
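To make the plan-then-execute flow visible without any framework, here is a minimal sketch that separates the two phases into explicit calls. It differs from the CrewAI example, which asks for the plan and the summary in a single task. `parse_plan`, `plan_then_execute`, and `fake_llm` are hypothetical names, and `fake_llm` is a canned stand-in so the sketch runs without an API key.

```python
def parse_plan(plan_text):
    """Extract bullet items ('- ...' or '* ...') from a plan written as text."""
    steps = []
    for line in plan_text.splitlines():
        line = line.strip()
        if line.startswith(("- ", "* ")):
            steps.append(line[2:].strip())
    return steps


def plan_then_execute(topic, llm):
    # Phase 1: ask for a plan and turn it into discrete steps.
    plan_text = llm(f"Create a bullet-point plan for a summary on: {topic}")
    steps = parse_plan(plan_text)
    # Phase 2: execute the steps one by one, in order.
    sections = [llm(f"Write one or two sentences covering: {s}") for s in steps]
    return steps, " ".join(sections)


def fake_llm(prompt):
    # Hypothetical stand-in for a model call; returns canned text.
    if prompt.startswith("Create a bullet-point plan"):
        return "- Define reinforcement learning\n- Give one application"
    return f"[{prompt.split(': ', 1)[1]}]"


steps, summary = plan_then_execute("Reinforcement Learning in AI", fake_llm)
print(steps)
print(summary)
```

Keeping the plan as a list of steps, instead of free text inside one response, makes it possible to log, review, or edit the plan before anything is executed.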
Google DeepResearch
Google Gemini DeepResearch (see Fig.1) is an agent-based system designed for autonomous information retrieval and synthesis. It functions through a multi-step agentic pipeline that dynamically and iteratively queries Google Search to systematically explore complex topics. The system is engineered to process a large corpus of web-based sources, evaluate the collected data for relevance and knowledge gaps, and perform subsequent searches to address them. The final output consolidates the vetted information into a structured, multi-page summary with citations to the original sources.
Expanding on this, the system's operation is not a single query-response event but a managed, long-running process. It begins by deconstructing a user's prompt into a multi-point research plan (see Fig. 1), which is then presented to the user for review and modification. This allows for a collaborative shaping of the research trajectory before execution. Once the plan is approved, the agentic pipeline initiates its iterative search-and-analysis loop. This involves more than just executing a series of predefined searches; the agent dynamically formulates and refines its queries based on the information it gathers, actively identifying knowledge gaps, corroborating data points, and resolving discrepancies.
Fig. 1: Google Deep Research agent generating an execution plan for using Google Search as a tool.
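The iterative search-and-analysis loop described above can be sketched as follows. This is not how Gemini DeepResearch is implemented; it is a toy illustration in which `research`, `search`, and `refine` are hypothetical names and the "corpus" holds placeholder strings, not real findings.

```python
def research(questions, search, refine, max_rounds=3):
    """Search until every planned question has a finding or rounds run out."""
    findings = {}
    queries = {q: q for q in questions}  # current query text per question
    for _ in range(max_rounds):
        gaps = [q for q in questions if q not in findings]
        if not gaps:
            break  # no knowledge gaps remain
        for question in gaps:
            result = search(queries[question])
            if result is None:
                # Nothing found: reformulate and retry in the next round.
                queries[question] = refine(queries[question])
            else:
                findings[question] = result
    return findings


# Placeholder corpus standing in for a search backend.
corpus = {
    "topic A pricing": "placeholder finding 1",
    "topic A adoption statistics": "placeholder finding 2",
}

findings = research(
    ["topic A pricing", "topic A adoption"],
    search=corpus.get,
    refine=lambda query: query + " statistics",
)
print(findings)
```

The second question fails in the first round, is reformulated, and succeeds in the second, which is the gap-driven refinement in miniature.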
A key architectural component is the system's ability to manage this process asynchronously. This design ensures that the investigation, which can involve analyzing hundreds of sources, is resilient to single-point failures and allows the user to disengage and be notified upon completion. The system can also integrate user-provided documents, combining information from private sources with its web-based research. The final output is not merely a concatenated list of findings but a structured, multi-page report. During the synthesis phase, the model performs a critical evaluation of the collected information, identifying major themes and organizing the content into a coherent narrative with logical sections. The report is designed to be interactive, often including features like an audio overview, charts, and links to the original cited sources, allowing for verification and further exploration by the user. In addition to the synthesized results, the model explicitly returns the full list of sources it searched and consulted (see Fig.2). These are presented as citations, providing complete transparency and direct access to the primary information. This entire process transforms a simple query into a comprehensive, synthesized body of knowledge.
Fig. 2: An example of Deep Research plan being executed, resulting in Google Search being used as a tool to search various web sources.
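The asynchronous, failure-tolerant behavior described above can be illustrated with `asyncio`. This is a sketch under stated assumptions: `fetch_source` and `investigate` are hypothetical names, the source names are made up, and the `sleep(0)` call stands in for real network I/O.

```python
import asyncio


async def fetch_source(name):
    await asyncio.sleep(0)  # stands in for network I/O
    if name == "broken-source":
        raise RuntimeError("fetch failed")
    return f"notes from {name}"


async def investigate(sources):
    # return_exceptions=True keeps one failing source from aborting the rest.
    results = await asyncio.gather(
        *(fetch_source(s) for s in sources), return_exceptions=True
    )
    return [r for r in results if not isinstance(r, Exception)]


async def main():
    task = asyncio.create_task(investigate(["a", "broken-source", "b"]))
    # The callback plays the role of the completion notification.
    task.add_done_callback(lambda t: print("research finished"))
    # The caller is free to do other work here before collecting the result.
    return await task


notes = asyncio.run(main())
print(notes)
```

The failed source is dropped while the others complete, so a single bad fetch does not sink the whole investigation.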
By mitigating the substantial time and resource investment required for manual data acquisition and synthesis, Gemini DeepResearch provides a more structured and exhaustive method for information discovery. The system's value is particularly evident in complex, multi-faceted research tasks across various domains.
For instance, in competitive analysis, the agent can be directed to systematically gather and collate data on market trends, competitor product specifications, public sentiment from diverse online sources, and marketing strategies. This automated process replaces the laborious task of manually tracking multiple competitors, allowing analysts to focus on higher-order strategic interpretation rather than data collection (see Fig. 3).
Fig. 3: Final output generated by the Google Deep Research agent, analyzing on our behalf sources obtained using Google Search as a tool.
Similarly, in academic exploration, the system serves as a powerful tool for conducting extensive literature reviews. It can identify and summarize foundational papers, trace the development of concepts across numerous publications, and map out emerging research fronts within a specific field, thereby accelerating the initial and most time-consuming phase of academic inquiry.
The efficiency of this approach stems from the automation of the iterative search-and-filter cycle, which is a core bottleneck in manual research. Comprehensiveness is achieved by the system's capacity to process a larger volume and variety of information sources than is typically feasible for a human researcher within a comparable timeframe. This broader scope of analysis helps to reduce the potential for selection bias and increases the likelihood of uncovering less obvious but potentially critical information, leading to a more robust and well-supported understanding of the subject matter.
OpenAI Deep Research API
The OpenAI Deep Research API is a specialized tool designed to automate complex research tasks. It utilizes an advanced, agentic model that can independently reason, plan, and synthesize information from real-world sources. Unlike a simple Q&A model, it takes a high-level query and autonomously breaks it down into sub-questions, performs web searches using its built-in tools, and delivers a structured, citation-rich final report. The API provides direct programmatic access to this entire process, using, at the time of writing, models such as o3-deep-research-2025-06-26 for high-quality synthesis and the faster o4-mini-deep-research-2025-06-26 for latency-sensitive applications.
The Deep Research API is useful because it automates what would otherwise be hours of manual research, delivering professional-grade, data-driven reports suitable for informing business strategy, investment decisions, or policy recommendations. Its key benefits include:
- Structured, Cited Output: It produces well-organized reports with inline citations linked to source metadata, ensuring claims are verifiable and data-backed.
- Transparency: Unlike the abstracted process in ChatGPT, the API exposes all intermediate steps, including the agent's reasoning, the specific web search queries it executed, and any code it ran. This allows for detailed debugging, analysis, and a deeper understanding of how the final answer was constructed.
- Extensibility: It supports the Model Context Protocol (MCP), enabling developers to connect the agent to private knowledge bases and internal data sources, blending public web research with proprietary information.
To use the API, you call the client.responses.create method, specifying a model, an input prompt, and the tools the agent can use. The input typically includes a system_message that defines the agent's persona and desired output format, along with the user_query. You must also include the web_search_preview tool and can optionally add others like code_interpreter or custom MCP tools (see Chapter 10) for internal data.
from openai import OpenAI

# Initialize the client with your API key
client = OpenAI(api_key="YOUR_OPENAI_API_KEY")

# Define the agent's role and the user's research question
system_message = """You are a professional researcher preparing a structured, data-driven report. Focus on data-rich insights, use reliable sources, and include inline citations."""
user_query = "Research the economic impact of semaglutide on global healthcare systems."

# Create the Deep Research API call
response = client.responses.create(
    model="o3-deep-research-2025-06-26",
    input=[
        {
            "role": "developer",
            "content": [{"type": "input_text", "text": system_message}]
        },
        {
            "role": "user",
            "content": [{"type": "input_text", "text": user_query}]
        }
    ],
    reasoning={"summary": "auto"},
    tools=[{"type": "web_search_preview"}]
)

# Access and print the final report from the response
final_report = response.output[-1].content[0].text
print(final_report)

# --- ACCESS INLINE CITATIONS AND METADATA ---
print("--- CITATIONS ---")
annotations = response.output[-1].content[0].annotations
if not annotations:
    print("No annotations found in the report.")
else:
    for i, citation in enumerate(annotations):
        # The text span the citation refers to
        cited_text = final_report[citation.start_index:citation.end_index]
        print(f"Citation {i+1}:")
        print(f" Cited Text: {cited_text}")
        print(f" Title: {citation.title}")
        print(f" URL: {citation.url}")
        print(f" Location: chars {citation.start_index}–{citation.end_index}")
print("\n" + "="*50 + "\n")

# --- INSPECT INTERMEDIATE STEPS ---
print("--- INTERMEDIATE STEPS ---")

# 1. Reasoning Steps: Internal plans and summaries generated by the model.
try:
    reasoning_step = next(item for item in response.output if item.type == "reasoning")
    print("\n[Found a Reasoning Step]")
    for summary_part in reasoning_step.summary:
        print(f" - {summary_part.text}")
except StopIteration:
    print("\nNo reasoning steps found.")

# 2. Web Search Calls: The exact search queries the agent executed.
try:
    search_step = next(item for item in response.output if item.type == "web_search_call")
    print("\n[Found a Web Search Call]")
    print(f" Query Executed: '{search_step.action['query']}'")
    print(f" Status: {search_step.status}")
except StopIteration:
    print("\nNo web search steps found.")

# 3. Code Execution: Any code run by the agent using the code interpreter.
try:
    code_step = next(item for item in response.output if item.type == "code_interpreter_call")
    print("\n[Found a Code Execution Step]")
    print(" Code Input:")
    print(f" ```python\n{code_step.input}\n ```")
    print(" Code Output:")
    print(f" {code_step.output}")
except StopIteration:
    print("\nNo code execution steps found.")
This code snippet utilizes the OpenAI API to perform a "Deep Research" task. It starts by initializing the OpenAI client with your API key, which is crucial for authentication. Then, it defines the role of the AI agent as a professional researcher and sets the user's research question about the economic impact of semaglutide. The code constructs an API call to the o3-deep-research-2025-06-26 model, providing the defined system message and user query as input. It also requests an automatic summary of the reasoning and enables web search capabilities. After making the API call, it extracts and prints the final generated report.
Subsequently, it attempts to access and display inline citations and metadata from the report's annotations, including the cited text, title, URL, and location within the report. Finally, it inspects and prints details about the intermediate steps the model took, such as reasoning steps, web search calls (including the query executed), and any code execution steps if a code interpreter was used.
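The code above inspects only the first item of each step type. If all intermediate steps are of interest, one option is to bucket the output items by their `type` attribute. The helper below is a hypothetical sketch, not part of the OpenAI SDK; it is shown with stand-in objects so that it runs without an API call, and it assumes only that each item exposes a `type` attribute, as the code above already does.

```python
from collections import defaultdict
from types import SimpleNamespace


def group_by_type(output_items):
    """Bucket output items by their `type` attribute, preserving order."""
    groups = defaultdict(list)
    for item in output_items:
        groups[item.type].append(item)
    return dict(groups)


# Stand-ins using the type names the code above looks for;
# a real call would pass `response.output` instead.
fake_output = [
    SimpleNamespace(type="reasoning", label="r1"),
    SimpleNamespace(type="web_search_call", label="s1"),
    SimpleNamespace(type="web_search_call", label="s2"),
    SimpleNamespace(type="code_interpreter_call", label="c1"),
]

steps = group_by_type(fake_output)
for step_type, items in steps.items():
    print(step_type, len(items))
```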
At a Glance
What: Complex problems often cannot be solved with a single action and require foresight to achieve a desired outcome. Without a structured approach, an agentic system struggles to handle multifaceted requests that involve multiple steps and dependencies. This makes it difficult to break down high-level objectives into a manageable series of smaller, executable tasks. Consequently, the system fails to strategize effectively, leading to incomplete or incorrect results when faced with intricate goals.
Why: The Planning pattern offers a standardized solution by having an agentic system first create a coherent plan to address a goal. It involves decomposing a high-level objective into a sequence of smaller, actionable steps or sub-goals. This allows the system to manage complex workflows, orchestrate various tools, and handle dependencies in a logical order. LLMs are particularly well-suited for this, as they can generate plausible and effective plans based on their vast training data. This structured approach transforms a simple reactive agent into a strategic executor that can proactively work towards a complex objective and even adapt its plan if necessary.
Rule of thumb: Use this pattern when a user's request is too complex to be handled by a single action or tool. It is ideal for automating multi-step processes, such as generating a detailed research report, onboarding a new employee, or executing a competitive analysis. Apply the Planning pattern whenever a task requires a sequence of interdependent operations to reach a final, synthesized outcome.
Visual summary
Fig. 4: Planning design pattern
Key Takeaways
- Planning enables agents to break down complex goals into actionable, sequential steps.
- It is essential for handling multi-step tasks, workflow automation, and navigating complex environments.
- LLMs can perform planning by generating step-by-step approaches based on task descriptions.
- Explicitly prompting or designing tasks to require planning steps encourages this behavior in agent frameworks.
- Google Deep Research is an agent that analyzes, on our behalf, sources obtained using Google Search as a tool. It reflects, plans, and executes.
Conclusion
In conclusion, the Planning pattern is a foundational component that elevates agentic systems from simple reactive responders to strategic, goal-oriented executors. Modern large language models provide the core capability for this, autonomously decomposing high-level objectives into coherent, actionable steps. This pattern scales from straightforward, sequential task execution, as demonstrated by the CrewAI agent creating and following a writing plan, to more complex and dynamic systems. The Google DeepResearch agent exemplifies this advanced application, creating iterative research plans that adapt and evolve based on continuous information gathering. Ultimately, planning provides the essential bridge between human intent and automated execution for complex problems. By structuring a problem-solving approach, this pattern enables agents to manage intricate workflows and deliver comprehensive, synthesized results.
References
- Google DeepResearch (Gemini feature): gemini.google.com
- OpenAI, Introducing deep research: openai.com/index/intro…
- Perplexity, Introducing Perplexity Deep Research: www.perplexity.ai/hub/blog/in…