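DeepResearch Mechanism Explained (Part 1)
I. Why DeepResearch Is Needed
Traditional search (Google, or an ordinary AI Q&A assistant) usually returns only "point-like" or "linear" answers, which is not enough for complex, specialist research.
Core pain points compared
| Dimension | Traditional search / traditional AI | Deep Research |
|---|---|---|
| Operating cost | You keep changing keywords and click through dozens of links by hand. | Automated: it imitates an expert and runs multi-step searches on its own. |
| Content quality | Results are full of ads and low-quality content, so filtering is a burden. | Deep digging: it follows second- and third-level pages to reach specialist literature. |
| Fault tolerance | If the first page is wrong, the conclusion goes off track immediately (one-shot generation). | Self-checking: when it finds a contradiction it verifies automatically and can correct itself. |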
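II. How It Works
The "straight-line" logic of traditional AI
Ask a question --> retrieve / recall from memory --> generate the answer in one pass
- Weakness: there is no in-depth verification, so a single source can easily mislead it.
The "iterative loop" logic of Deep Research
Deep Research is no longer a simple "search, then answer"; it is a closed-loop research process:
- Make a plan: build an initial research path from the question.
- Execute in multiple steps:
  - Read web page A and web page B.
  - Conflict detection: if the data in A and B do not match, the system flags the discrepancy as a "doubt".
- Correct automatically:
  - Based on the doubt, automatically launch a third, verification search.
  - Compare the evidence from all sides and identify which source is wrong ("who is lying").
- Summarize: output a high-quality report that has been verified several times and is logically consistent.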
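The loop above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not the implementation of any particular product: `search` is a hypothetical callable that returns `(source, claimed value)` pairs, a "conflict" simply means the sources report different values, and the conflict is resolved by majority vote (a real system would weigh source reliability instead).

```python
from collections import Counter
from typing import Callable, List, Tuple

Evidence = Tuple[str, str]  # (source name, claimed value)


def research(question: str,
             search: Callable[[str], List[Evidence]],
             max_rounds: int = 3) -> dict:
    """Plan -> execute -> detect conflict -> verify -> summarize (toy version)."""
    plan = [question]            # 1. plan: start from the raw question
    evidence: List[Evidence] = []
    verified = False
    rounds = 0
    while plan and rounds < max_rounds:
        query = plan.pop(0)
        evidence.extend(search(query))          # 2. execute one search step
        rounds += 1
        values = {value for _, value in evidence}
        if len(values) > 1 and not verified:    # 3. conflict: sources disagree
            plan.append(f"verify: {question}")  # 4. schedule one verification search
            verified = True
    # 5. summarize: majority vote over all collected claims (an assumption)
    counts = Counter(value for _, value in evidence)
    answer, support = counts.most_common(1)[0] if counts else (None, 0)
    return {
        "answer": answer,
        "support": support,
        "rounds": rounds,
        "sources": [src for src, value in evidence if value == answer],
    }


def fake_search(query: str) -> List[Evidence]:
    """Hypothetical search backend used only for this demo."""
    if query.startswith("verify:"):
        return [("site C", "42")]
    return [("site A", "42"), ("site B", "40")]


result = research("what is X?", fake_search)
print(result)
```

In the demo, sites A and B disagree, so one verification search is triggered and site C tips the balance.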
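Summary of core advantages
"It is not searching; it is thinking."
- Imitates a human expert: it has a human researcher's skepticism and the habit of cross-checking multiple sources.
- Breaks out of the information cocoon: it can dig up industry reports that are buried deep, rather than stopping at SEO-optimized surface content.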
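III. Workflow
1. Role assignment
Before any actual search, Deep Research first models the task. The core idea is to turn a general-purpose AI into a "digital expert" for a specific domain.
(1) Core flow: identify and define
The system analyzes the semantics of the user's original question to determine two key dimensions:
- Agent role: who is best suited to answer this question? (For example: finance expert, business analyst, tour guide.)
- Task definition: what professional standards should that role follow?
(2) Prompt design
Prompt example: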
This task involves researching a given topic, regardless of its complexity or the availability of a definitive answer. The research is conducted by a specific server, defined by its type and role, with each server requiring distinct instructions.
Agent
The server is determined by the field of the topic and the specific name of the server that could be utilized to research the topic provided. Agents are categorized by their area of expertise, and each server type is associated with a corresponding emoji.
examples:
task: "should I invest in apple stocks?"
response:
{
"server": "💰 Finance Agent",
"agent_role_prompt": "You are a seasoned finance analyst AI assistant. Your primary goal is to compose comprehensive, astute, impartial, and methodically arranged financial reports based on provided data and trends."
}
task: "could reselling sneakers become profitable?"
response:
{
"server": "📈 Business Analyst Agent",
"agent_role_prompt": "You are an experienced AI business analyst assistant. Your main objective is to produce comprehensive, insightful, impartial, and systematically structured business reports based on provided business data, market trends, and strategic analysis."
}
task: "what are the most interesting sites in Tel Aviv?"
response:
{
"server": "🌍 Travel Agent",
"agent_role_prompt": "You are a world-travelled AI tour guide assistant. Your main purpose is to draft engaging, insightful, unbiased, and well-structured travel reports on given locations, including history, attractions, and cultural insights."
}
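The prompt for this stage usually has three parts:
- Task statement: makes clear that, however complex the question, it must be handled by a specific Server (that is, an agent).
- Expert library: groups Agents by domain and attaches an emoji to each as an identifier.
- Few-shot learning: JSON-formatted examples teach the AI how to output structured instructions.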
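A reply in the JSON shape shown in the examples still has to be parsed before the role prompt can be used. The sketch below shows one way to do that defensively. The function name `parse_agent_response` and the fallback `DEFAULT_AGENT` are assumptions made for illustration; only the two keys `server` and `agent_role_prompt` come from the prompt itself.

```python
import json

# Hypothetical fallback used when the model's reply cannot be parsed.
DEFAULT_AGENT = {
    "server": "Default Agent",
    "agent_role_prompt": (
        "You are an AI research assistant. Your goal is to write "
        "well-structured, objective reports on the given topic."
    ),
}


def parse_agent_response(raw: str) -> dict:
    """Extract `server` and `agent_role_prompt` from a model reply.

    The reply may wrap the JSON object in extra text, so only the span
    between the first "{" and the last "}" is parsed.
    """
    try:
        start, end = raw.index("{"), raw.rindex("}") + 1
        data = json.loads(raw[start:end])
    except ValueError:  # no braces found, or invalid JSON
        return dict(DEFAULT_AGENT)
    required = {"server", "agent_role_prompt"}
    if not isinstance(data, dict) or not required <= data.keys():
        return dict(DEFAULT_AGENT)
    return {key: data[key] for key in ("server", "agent_role_prompt")}


reply = (
    'Sure! {"server": "💰 Finance Agent", '
    '"agent_role_prompt": "You are a seasoned finance analyst AI assistant."}'
)
agent = parse_agent_response(reply)
print(agent["server"])
```

Falling back to a generic role rather than raising keeps the pipeline running when the model returns malformed JSON, such as a key with a missing closing quote.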
1. Core instructions
| English original | Chinese translation |
|---|---|
| This task involves researching a given topic, regardless of its complexity or the availability of a definitive answer. | 此项任务涉及对给定主题进行研究,无论其复杂程度如何,也无论是否存在明确的定论。 |
| The research is conducted by a specific server, defined by its type and role, with each server requiring distinct instructions. | 研究由特定的“服务器(Server)”执行,其性质由类型和角色定义,每个服务器都需要特定的操作指令。 |
| The server is determined by the field of the topic and the specific name of the server that could be utilized to research the topic provided. | 服务器的选择取决于主题所属的领域,以及能够用于研究该主题的特定服务器名称。 |
| Agents are categorized by their area of expertise, and each server type is associated with a corresponding emoji. | 智能体(Agents)按专业领域分类,每种服务器类型都关联了一个相应的图标(Emoji)。 |
2. Role definition examples
| Task | Role (Server) | Agent role prompt |
|---|---|---|
| "should I invest in apple stocks?" (我应该投资苹果股票吗?) | 💰 Finance Agent (金融专家) | EN: You are a seasoned finance analyst AI assistant. Your primary goal is to compose comprehensive, astute, impartial, and methodically arranged financial reports based on provided data and trends. ZH: 你是一位经验丰富的金融分析师 AI 助手。你的首要目标是根据提供的数据和趋势,撰写全面、敏锐、公正且条理清晰的财务报告。 |
| "could reselling sneakers become profitable?" (倒卖球鞋能盈利吗?) | 📈 Business Analyst Agent (商业分析专家) | EN: You are an experienced AI business analyst assistant. Your main objective is to produce comprehensive, insightful, impartial, and systematically structured business reports based on provided business data, market trends, and strategic analysis. ZH: 你是一位资深 AI 商业分析助手。你的主要目标是基于提供的商业数据、市场趋势和战略分析,生成全面、深入、公正且系统化的商业报告。 |
| "what are the most interesting sites in Tel Aviv?" (特拉维夫最有趣的景点有哪些?) | 🌍 Travel Agent (旅游专家) | EN: You are a world-travelled AI tour guide assistant. Your main purpose is to draft engaging, insightful, unbiased, and well-structured travel reports on given locations, including history, attractions, and cultural insights. ZH: 你是一位环球旅行 AI 导游助手。你的主要目的是针对特定地点起草引人入胜、见解深刻、不偏不倚且结构合理的旅游报告,内容涵盖历史、景点和文化洞察。 |
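(3) Why does this step matter?
- Domain alignment: the answering style for a finance question is completely different from that for travel advice; a preset role pushes the AI to use the field's terminology and reasoning.
- Standardized output: constraining the AI with `agent_role_prompt` makes it output a structured report rather than scattered conversation.
- Emoji identifier: in a multi-agent UI, the emoji (e.g., 💰, 📈) shows at a glance which expert the system is currently invoking.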
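2. Content query
(1) Pre-search
Before breaking the question into sub-queries, the system sends the user's original input to a search engine (such as Tavily) for a preliminary search. The page snippets this returns are the "foundation" for the deeper research that follows.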
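1. Getting past the LLM's knowledge cutoff
- Real-time awareness: compensates for the lag in the model's training data.
- Example scenario:
  - Question: the query is about "a chip newly released in 2026".
  - Pain point: the LLM's training data may stop at 2024, so it knows nothing about that chip.
  - What pre-search does: the system fetches the latest snippets about the chip from the web in real time, so the AI becomes aware of the current specifications or related policies.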
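2. Supplying precise context
Pre-search markedly improves search precision and keeps the system from generating broad, useless queries.
| Dimension | Without pre-search (decomposing from nothing) | With pre-search (grounded in context) |
|---|---|---|
| Search precision | Broad results (e.g., "China AI development") | Precise results (e.g., "China integrated computing power network 2026 construction progress") |
| Source of logic | Relies on the LLM's internal general knowledge | Picks up high-frequency keywords in the snippets (e.g., the "Data Elements X" initiative, 数据要素 X 行动) |
| Output quality | Tends to produce statements that are correct but empty | Produces sub-topics that are timely and specialist |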
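3. Avoiding "blind search" (drill-down strategy)
Pre-search is essentially a quick survey of what the web already says about the topic.
- The logic forms a closed loop:
  - Establish what is known: use the snippets to learn what public information already exists on the topic.
  - Locate the fuzzy spots: identify which key details are still unclear or disputed.
  - Steer the drill-down: force the subsequent sub-queries to go deeper instead of circling around shallow information.
Key takeaway:
> Pre-search is not meant to give the answer directly; it is meant to let the AI "open its eyes to the world". It lets the AI re-examine and decompose the user's question on top of real-time information from the 2026 web.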
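The pre-search step can be sketched as a function that turns raw search results into a compact context string. This is only an illustration: `search` is a hypothetical callable, and the result shape (dicts with `title`, `url`, and `content` keys) is an assumption, so check your search provider's actual response format before relying on it.

```python
from typing import Callable, Dict, List

SearchFn = Callable[[str], List[Dict[str, str]]]


def presearch_context(task: str, search: SearchFn, max_snippets: int = 5) -> str:
    """Run one search with the raw task and join the snippets into context."""
    lines = []
    for result in search(task)[:max_snippets]:
        # Collapse whitespace so each snippet stays on a single line.
        snippet = " ".join(result.get("content", "").split())
        if not snippet:
            continue  # skip results that carry no usable text
        title = result.get("title", "untitled")
        url = result.get("url", "")
        lines.append(f"[{len(lines) + 1}] {title} ({url}): {snippet}")
    return "\n".join(lines)


def fake_search(query: str) -> List[Dict[str, str]]:
    """Hypothetical search backend used only for this demo."""
    return [
        {"title": "Chip X launch", "url": "https://example.com/a",
         "content": "Chip X  ships\nin 2026."},
        {"title": "Empty page", "url": "https://example.com/b", "content": ""},
    ]


context = presearch_context("latest chip released in 2026", fake_search)
print(context)
```

The resulting string is what later gets injected as `{context}` when sub-queries are generated.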
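(2) Generating sub-queries
Using a dedicated prompt structure, the system asks the AI to play a "seasoned research assistant" and to refine its understanding with the retrieved page snippets (Context):
- Input: `{task}` (the original task) + `{context}` (the real-time snippets returned by pre-search).
- Processing: analyze the latest developments, technical details, or points of conflict in the snippets.
- Output: a set of more specific, deeper search strings.
Prompt example: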
Write 3 google search queries to search online that form an objective opinion from the following task: "{task}"
Assume the current date is {now} if required.
You are a seasoned research assistant tasked with generating search queries to find relevant information for the following task: "{task}".
Context: {context}
Use this context to inform and refine your search queries. The context provides real-time web information that can help you generate more specific and relevant queries. Consider any current events, recent developments, or specific details mentioned in the context that could enhance the search queries.
You must respond with a list of strings in the following format: ["query 1", "query 2", "query 3"].
The response should contain ONLY the list.
📝 Prompt: generating sub-queries (English and Chinese side by side)
| Module | English original | Chinese translation |
|---|---|---|
| Goal | Write 3 google search queries to search online that form an objective opinion from the following task: "{task}" | 请针对以下任务编写 3 条 Google 搜索查询词,以便形成客观的见解:"{task}"。 |
| Time anchor | Assume the current date is {now} if required. | 如有需要,请假设当前日期为 {now}。 |
| Role | You are a seasoned research assistant tasked with generating search queries to find relevant information for the following task: "{task}". | 你是一位资深研究助理,负责生成搜索查询词,为以下任务寻找相关信息:"{task}"。 |
| Context injection | Context: {context} | 背景信息:{context} (注:此处为先行预搜拿回的网页摘要) |
| Guidance | Use this context to inform and refine your search queries. | 请利用这些背景信息来指导并完善你的搜索查询词。 |
| Detail reinforcement | The context provides real-time web information that can help you generate more specific and relevant queries. | 背景信息提供了实时网页信息,可以帮助你生成更具体、更相关的查询词。 |
| Depth considerations | Consider any current events, recent developments, or specific details mentioned in the context that could enhance the search queries. | 考虑背景中提到的任何时事、最新进展或具体细节,以增强查询词的效果。 |
| Format constraint | You must respond with a list of strings in the following format: ["query 1", "query 2", "query 3"]. | 你必须按以下格式返回一个字符串列表:["查询词 1", "查询词 2", "查询词 3"]。 |
| Output restriction | The response should contain ONLY the list. | 回复中仅允许包含该列表内容。 |
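Filling in the template and reading back the model's list can be sketched as follows. The template is an abridged copy of the prompt above; the helper names, the date format, and the fallback to the original task when parsing fails are all assumptions made for illustration.

```python
import json
from datetime import datetime, timezone
from typing import List, Optional

# Abridged version of the sub-query prompt shown above.
QUERY_PROMPT = (
    "Write 3 google search queries to search online that form an objective "
    'opinion from the following task: "{task}"\n'
    "Assume the current date is {now} if required.\n"
    "Context: {context}\n"
    "Use this context to inform and refine your search queries.\n"
    "You must respond with a list of strings in the following format: "
    '["query 1", "query 2", "query 3"].\n'
    "The response should contain ONLY the list."
)


def build_query_prompt(task: str, context: str, now: Optional[str] = None) -> str:
    """Fill {task}, {context}, and {now} into the template."""
    now = now or datetime.now(timezone.utc).strftime("%B %d, %Y")
    return QUERY_PROMPT.format(task=task, context=context, now=now)


def parse_queries(reply: str, task: str, limit: int = 3) -> List[str]:
    """Parse the model's JSON list; fall back to the raw task on failure."""
    try:
        start, end = reply.index("["), reply.rindex("]") + 1
        queries = json.loads(reply[start:end])
    except ValueError:  # no list found, or invalid JSON
        return [task]
    cleaned = [q.strip() for q in queries if isinstance(q, str) and q.strip()]
    return cleaned[:limit] or [task]


prompt = build_query_prompt("chip X specs", "snippet text", now="March 02, 2026")
queries = parse_queries('["chip X benchmark", " chip X price ", "chip X TDP"]',
                        task="chip X specs")
print(queries)
```

Even though the prompt demands "ONLY the list", models sometimes add surrounding text, which is why the parser looks for the outermost brackets instead of parsing the whole reply.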
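(3) Deep crawling
Once the 3 targeted sub-queries have been generated, the system enters the most critical stage: automated traversal and full-text crawling.
Core execution flow:
- Concurrent queries: the system sends all 3 sub-queries to the search engine (e.g., Tavily or Google) at the same time.
- URL filtering: filter ads and dead redirects out of the search results and extract a list of the most authoritative and relevant target URLs.
- Automated crawling: a crawler visits the pages directly.
- Content cleaning: strip HTML tags, navigation bars, and ad pop-ups, keeping only the clean body text.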
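The four steps above can be sketched with the Python standard library alone. This is a simplified illustration, not a production crawler: `search` and `fetch` are hypothetical callables you would supply (a search client and an HTTP downloader), "filtering" is reduced to dropping non-HTTP links, blocked hosts, and duplicates, and "cleaning" only skips a fixed set of tags.

```python
from concurrent.futures import ThreadPoolExecutor
from html.parser import HTMLParser
from urllib.parse import urlparse


class _TextExtractor(HTMLParser):
    """Collect visible text, skipping scripts, styles, and page chrome."""
    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def clean_html(html):
    """Content cleaning: return body text without tags or navigation."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.parts)


def select_urls(results, blocked_hosts=frozenset(), limit=5):
    """URL filtering: keep unique http(s) links that are not on a blocked host."""
    seen, urls = set(), []
    for result in results:
        url = result.get("url", "")
        parts = urlparse(url)
        if parts.scheme not in ("http", "https") or parts.netloc in blocked_hosts:
            continue
        if url not in seen:
            seen.add(url)
            urls.append(url)
    return urls[:limit]


def deep_crawl(queries, search, fetch, limit=5):
    """Run all queries concurrently, then fetch and clean the selected pages."""
    with ThreadPoolExecutor(max_workers=max(1, len(queries))) as pool:
        result_lists = list(pool.map(search, queries))      # concurrent queries
        merged = [r for results in result_lists for r in results]
        urls = select_urls(merged, limit=limit)             # URL filtering
        pages = list(pool.map(fetch, urls))                 # automated crawling
    return {url: clean_html(html) for url, html in zip(urls, pages)}
```

Passing `search` and `fetch` in as parameters keeps the sketch independent of any specific search API or HTTP library, and makes it easy to test with stubs.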
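3. Content compression
Core steps
Step A: Chunking
The system does not treat a whole web page as a single object; it breaks the page into fine-grained units.
- Operation: cut a long article into small pieces of roughly a few hundred characters each.
- Purpose: keep the retrieval granularity fine enough that irrelevant information does not interfere with the core conclusion.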
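A minimal chunking sketch follows. The fixed character window matches the "few hundred characters" described above; the overlap between neighboring chunks is an added assumption (a common way to avoid cutting a sentence's context in half), and the default sizes are illustrative only.

```python
from typing import List


def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> List[str]:
    """Split text into windows of chunk_size characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
        start += step
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, overlap=1))
```

Real pipelines often split on sentence or paragraph boundaries instead of raw character offsets; the sliding window here is just the simplest variant.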
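Step B: Embedding
This is where the embedding model does its main work.
- Principle: map the user's query and all text chunks into a high-dimensional vector space, turning each of them into a vector of numbers.
- Significance: this turns "the meaning of text" into "a position in space", so that semantic similarity can be computed.
Step C: Similarity matching (vector similarity)
A mathematical measure (usually cosine similarity) finds the chunks that lie closest to the query in that space.
- Logic: even when no keywords are shared, chunks that are semantically close (e.g., "profit" and "earnings") can still be matched.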
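Cosine similarity itself is short enough to write out in plain Python. The three vectors below are hand-made toy values chosen for illustration, not the output of a real embedding model; they only show that vectors pointing in similar directions score close to 1 and unrelated ones score close to 0.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(a, b) = (a . b) / (|a| * |b|); returns 0.0 if either vector is zero."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Toy 3-dimensional "embeddings" (made-up numbers, for illustration only).
profit = [0.9, 0.1, 0.0]
earnings = [0.8, 0.2, 0.1]
weather = [0.0, 0.1, 0.9]

print(cosine_similarity(profit, earnings))  # high: similar direction
print(cosine_similarity(profit, weather))   # low: nearly orthogonal
```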
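Step D: Compression
This is the final filtering and packaging step.
- Strategy: according to a preset parameter (e.g., `max_results=10`), keep only the 10 chunks with the highest similarity scores.
- Result: concatenate these top chunks into one final string and return it to the LLM for further analysis.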
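The selection step can be sketched as a top-k filter. Note the stand-in: `bow_embed` is a toy bag-of-words "embedding" used only so the example runs without a model, and unlike a real embedding model it matches shared words, not synonyms. The function names and the blank-line separator are assumptions; only `max_results` comes from the text above.

```python
import math
from collections import Counter
from typing import Callable, List


def bow_embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: word counts as a sparse vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def compress(query: str, chunks: List[str], max_results: int = 10,
             embed: Callable[[str], Counter] = bow_embed) -> str:
    """Keep the max_results chunks most similar to the query and join them."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return "\n\n".join(ranked[:max_results])


chunks = [
    "wuhan housing prices fell in 2025",
    "recipe for hot dry noodles",
    "housing prices trend in wuhan districts",
]
print(compress("wuhan housing prices trend", chunks, max_results=2))
```

To use a real model, pass a different `embed` callable and adapt `cosine` to dense vectors; the ranking and truncation logic stays the same.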
4. Report generation
Prompt example:
{content}
---
Using the above information, answer the following query or task: "武汉房价趋势" in a detailed report --
The report should focus on the answer to the query, should be well structured, informative,
in-depth, and comprehensive, with facts and numbers if available and at least 1200 words.
You should strive to write the report as long as you can using all relevant and necessary information provided.
Please follow all of the following guidelines in your report:
- You MUST determine your own concrete and valid opinion based on the given information. Do NOT defer to general and meaningless conclusions.
- You MUST write the report with markdown syntax and APA format.
- Structure your report with clear markdown headers: use # for the main title, ## for major sections, and ### for subsections.
- Use markdown tables when presenting structured data or comparisons to enhance readability.
- You MUST prioritize the relevance, reliability, and significance of the sources you use. Choose trusted sources over less reliable ones.
- You must also prioritize new articles over older articles if the source can be trusted.
- You MUST NOT include a table of contents, but DO include proper markdown headers (# ## ###) to structure your report clearly.
- Use in-text citation references in APA format and make it with markdown hyperlink placed at the end of the sentence or paragraph that references them like this: ([in-text citation](url)).
- Don't forget to add a reference list at the end of the report in APA format and full url links without hyperlinks.
- You MUST write all used source urls at the end of the report as references, and make sure to not add duplicated sources, but only one reference for each.
Every url should be hyperlinked: [url website](url)
Additionally, you MUST include hyperlinks to the relevant URLs wherever they are referenced in the report:
eg: Author, A. A. (Year, Month Date). Title of web page. Website Name. [url website](url)
- Write the report in a Objective (impartial and unbiased presentation of facts and findings) tone.
You MUST write the report in the following language: Chinese.
Please do your best, this is very important to my career.
Assume that the current date is 2026-03-02.
Full English and Chinese table (Deep Research)
| Module | English original | Chinese translation |
|---|---|---|
| Task goal | Using the above information, answer the following query or task: "{task}" in a detailed report. | 利用上述信息,针对以下查询或任务撰写一份详细报告:"{task}"。 |
| Quality bar | The report should be well structured, informative, in-depth, and comprehensive, with facts and numbers if available. | 报告应结构良好、信息丰富、深入且全面,并尽可能包含事实和数据。 |
| Length constraint | At least 1200 words. You should strive to write the report as long as you can using all relevant information. | 篇幅至少 1200 字。应尽力利用所有相关且必要的信息,写得越详尽越好。 |
| Core opinion | You MUST determine your own concrete and valid opinion... Do NOT defer to general and meaningless conclusions. | 你必须根据已有信息得出自己具体且有效的观点。严禁推诿于笼统、无意义的结论。 |
| Structure rules | Structure your report with clear markdown headers: # for main title, ## for major sections, and ### for subsections. | 使用清晰的 Markdown 标题构建报告:# 用于主标题,## 用于主要章节,### 用于子章节。 |
| Data presentation | Use markdown tables when presenting structured data or comparisons to enhance readability. | 在展示结构化数据或进行对比时,必须使用 Markdown 表格以增强可读性。 |
| Source priority | You MUST prioritize the relevance, reliability, and significance of the sources you use. | 你必须优先考虑所用来源的相关性、可靠性和重要性。 |
| Recency | You must also prioritize new articles over older articles if the source can be trusted. | 在来源可靠的前提下,你必须优先使用新近的文章而非陈旧资料。 |
| In-text citation | Use in-text citation references in APA format and make it with markdown hyperlink: ([in-text citation](url)). | 使用 APA 格式进行文中引用,并辅以 Markdown 超链接,格式如:([引用说明](url))。 |
| Reference list | Add a reference list at the end in APA format and full url links without hyperlinks. | 在报告末尾添加符合 APA 规范的参考文献列表,包含完整的 URL(此处不带超链接)。 |
| Link completeness | You MUST write all used source urls at the end... make sure to not add duplicated sources. | 你必须在报告末尾列出所有引用的 URL,且确保不重复列出,每个来源仅限一条。 |
| Tone | Write the report in a Objective (impartial and unbiased) tone. | 请以客观(公正且无偏见地呈现事实与发现)的口吻撰写报告。 |
| Language and date | Write in Chinese. Assume that the current date is {now}. | 必须使用中文撰写。假设当前日期是 {now}。 |
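How the variable parts of the report prompt get filled in can be sketched as below. The template is heavily abridged from the prompt above, and the function name, parameter names, and defaults are assumptions made for illustration; the point is only that the compressed content, the task, the word count, the language, and the date are injected at call time.

```python
from datetime import date
from typing import Optional

# Heavily abridged version of the report prompt shown above.
REPORT_PROMPT = (
    "{content}\n"
    "---\n"
    'Using the above information, answer the following query or task: "{task}" '
    "in a detailed report --\n"
    "The report should focus on the answer to the query, should be well "
    "structured, informative, in-depth, and comprehensive, with facts and "
    "numbers if available and at least {min_words} words.\n"
    "You MUST write the report in the following language: {language}.\n"
    "Assume that the current date is {now}."
)


def build_report_prompt(task: str, content: str, language: str = "Chinese",
                        min_words: int = 1200,
                        today: Optional[date] = None) -> str:
    """Fill the compressed research content and settings into the template."""
    today = today or date.today()
    return REPORT_PROMPT.format(
        content=content,
        task=task,
        min_words=min_words,
        language=language,
        now=today.isoformat(),  # e.g. 2026-03-02
    )


report_prompt = build_report_prompt(
    "武汉房价趋势", "chunk 1\n\nchunk 2", today=date(2026, 3, 2)
)
print(report_prompt)
```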