A Python-Based RAG Cookbook: Agentic RAG with Dynamic Retrieval


Introduction

The chapter Agentic RAG with Dynamic Retrieval introduces the next stage in the evolution of Retrieval-Augmented Generation (RAG): static pipelines give way to adaptive, decision-making agents that reason about when to retrieve, how to retrieve, and what to retrieve. Unlike traditional RAG systems that rely on a fixed retriever and a fixed question-answering flow, Agentic RAG uses autonomous agents to dynamically select retrieval strategies, filter or re-rank documents, and even call external tools when needed.

This approach lets the system handle complex multi-step tasks, adapt to evolving user queries, and deliver more accurate, context-appropriate answers. By combining retrieval with agent reasoning, this chapter lays the groundwork for a smarter, more flexible, goal-oriented RAG architecture that moves beyond simple knowledge lookup toward interactive problem solving.


Structure

In this chapter, we will cover the following topics:

  • Software requirements
  • Self-querying agent
  • Task-oriented tool-augmented agent
  • Context-aware conversational agent
  • Dynamic re-ranking agent
  • Adaptive summarization agent
  • Chain-of-thought retrieval agent
  • Hybrid retrieval agent
  • Time-aware retrieval agent
  • Streaming retrieval agent
  • Query expansion agent

Objectives

By the end of this chapter, readers will have the understanding and practical skills needed to design retrieval-augmented systems that move beyond static pipelines and adopt adaptive, agent-driven approaches. The chapter shows how agents can dynamically choose retrieval strategies, rewrite queries, re-rank results, or invoke specialized tools depending on the task at hand. By mastering these techniques, readers will learn to build RAG systems that adjust intelligently to user intent, context, and domain-specific needs, ultimately delivering more accurate, explainable, and useful responses in real-world applications.


Software requirements

Each concept in this book is followed by a related hands-on recipe: runnable code written in Python. Every recipe includes code comments that explain what each line does.

The software environment required to run the recipes is as follows:

  • System: a computer with at least 16.0 GB of RAM
  • Operating system: Windows
  • Python: Python 3.13.3 or later
  • LangChain: 1.0.5
  • LLM model: llama3.2:3b from Ollama
  • Program input files: the input files used by the programs are available in the book's Git repository

To run a program, execute the Python command pip install <package name> to install the packages listed in each recipe. Once installation is complete, run the Python script (.py file) mentioned in the recipe in your development environment.

Figure 11.1 illustrates Agentic RAG:

Figure 11.1: Agentic RAG


Self-querying agent

A self-querying agent is a specialized retrieval approach that lets an LLM rewrite a user question into a structured query by extracting the search intent and relevant metadata filters. Rather than relying on raw similarity search alone, it interprets the natural-language query and turns it into conditions that constrain retrieval, such as filtering by date, author, topic, or document type.

This lets the system fetch only the most relevant context from the knowledge base while minimizing noise, making it especially powerful in scenarios where metadata plays a key role in refining results. In the context of RAG, the self-querying agent demonstrates how reasoning can be combined with structured filtering to produce more accurate, focused, and context-aware answers.

Recipe 112

This recipe shows how to implement a self-querying agent:

  • Create a text-generation pipeline with a small model for the demo
  • Create a simple dataset with metadata for the demo
  • Build a vector store from the documents using Hugging Face embeddings
  • The self-querying agent function processes the question, retrieves documents, and generates the answer
  • Print the results

Install the required packages:

pip install langchain langchain-community transformers sentence-transformers faiss-cpu

self_querying_agent.py

Refer to the following code:

# self_querying_agent.py
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers import pipeline
import json
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Create a text-generation pipeline with a small model for the demo
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create a simple dataset with metadata for the demo
docs = [
    Document(page_content="Intermittent fasting improves insulin sensitivity.",
metadata={"topic": "health"}),
    Document(page_content="The capital of France is Paris.",
metadata={"topic": "geography"}),
    Document(page_content="Python is a popular programming language for AI.",
metadata={"topic": "technology"}),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

# 4. Self-querying agent function: process the question, retrieve documents, and generate the answer
def self_query_agent(question: str):
    # Step 1: generate query + filters (as a JSON string)
    prompt = f"""
Convert the user question into JSON with fields:
- query: the search query text
- filters: metadata filters (dictionary, or {{}} if none)
User question: {question}
"""
    raw = llm.invoke(prompt)
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the first {...} block; if that also fails, use defaults
        start, end = raw.find("{"), raw.rfind("}")
        try:
            parsed = json.loads(raw[start:end+1]) if start != -1 else {"query": question, "filters": {}}
        except json.JSONDecodeError:
            parsed = {"query": question, "filters": {}}

    query = parsed.get("query", question)
    filters = parsed.get("filters", {})
    retrieved_docs = vectorstore.similarity_search(query, k=2, filter=filters or None)
    context = "\n\n".join([f"{d.page_content}" for d in retrieved_docs])

    answer_prompt = f"Answer the question using the context below.\n\nQuestion: {question}\n\nContext:\n{context}"
    answer = llm.invoke(answer_prompt)
    return answer, retrieved_docs, parsed

# Demo Run
if __name__ == "__main__":
    # Example query
    q1 = "What do we know about health and insulin sensitivity?"
    ans1, docs1, parsed1 = self_query_agent(q1)

    # 5. Print the results
    print("\nParsed Query:", parsed1)
    print("\nRetrieved Docs:", [d.page_content for d in docs1])
    print("\nAnswer:", ans1)

Output:

Parsed Query: {'query': 'What do we know about health and insulin sensitivity?', 'filters': {}}
Retrieved Docs: ['Intermittent fasting improves insulin sensitivity.', 'The capital of France is Paris.']
Answer: Intermittent fasting
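
In the run above, the parsed filters came back empty because t5-small rarely emits valid JSON. The filtering step itself, independent of the model, can be sketched in plain Python; apply_filters is an illustrative helper, not part of the recipe:

```python
# Sketch of self-query-style metadata filtering (illustrative helper,
# not from the recipe above): keep only documents whose metadata
# matches every key/value pair the agent extracted.
def apply_filters(docs, filters):
    if not filters:
        return docs
    return [d for d in docs
            if all(d["metadata"].get(k) == v for k, v in filters.items())]

docs = [
    {"text": "Intermittent fasting improves insulin sensitivity.",
     "metadata": {"topic": "health"}},
    {"text": "The capital of France is Paris.",
     "metadata": {"topic": "geography"}},
]

health_only = apply_filters(docs, {"topic": "health"})
print([d["text"] for d in health_only])
# → ['Intermittent fasting improves insulin sensitivity.']
```

With a well-behaved model, the same filters dictionary would instead be passed to FAISS through the filter argument of similarity_search, as the recipe does.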

Task-oriented tool-augmented agent

A task-oriented tool-augmented agent extends a RAG system with external tools it can call while solving a user query, letting it handle tasks that text retrieval alone cannot. Rather than only searching and generating an answer, it can dynamically decide when to invoke external tools such as calculators, knowledge-based lookups, or APIs to complete a task more effectively.

This task-oriented design gives the agent both reasoning and execution capabilities, so it can answer numerical, factual, or operational questions with greater reliability. In this chapter, the agent highlights how integrating tools into a retrieval-augmented workflow turns the system from a passive retriever into an active problem solver that can meet diverse real-world needs.

Recipe 113

This recipe shows how to implement a task-oriented tool-augmented agent:

  • Load a small local LLM for text generation using a Hugging Face pipeline
  • Create a simple dataset with metadata for the demo
  • Build a vector store from the documents using Hugging Face embeddings
  • Construct example tasks that demonstrate the agent's retrieval and calculation capabilities
  • Loop over each task to demonstrate the agent's retrieval and calculation capabilities, and print the results

Install the required packages:

pip install langchain langchain-community transformers sentence-transformers faiss-cpu

task_oriented_agent.py

Refer to the following code:

# task_oriented_agent.py
# A task-oriented agent that handles both document retrieval and simple math.
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers import pipeline
import re
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Load a small local LLM for text generation using a HuggingFace pipeline
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create a simple dataset for the demo
docs = [
    Document(page_content="The capital of France is Paris.",
metadata={"topic": "geography"}),
    Document(page_content="Python is a popular programming language for AI.",
metadata={"topic": "technology"}),
    Document(page_content="Intermittent fasting improves insulin sensitivity.",
metadata={"topic": "health"}),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

# Tool for simple math calculations
def calculator_tool(query: str) -> str:
    """Evaluate a simple math expression."""
    query = query.lower()
    query = query.replace("multiplied by", "*")
    query = query.replace("times", "*")
    query = query.replace("plus", "+")
    query = query.replace("minus", "-")
    query = query.replace("divided by", "/")
    query = re.sub(r"[^0-9+\-*/.() ]", "", query)
    try:
        return str(eval(query))
    except Exception as e:
        return f"Error: {e}"

# Tool for document retrieval and summarization
def doc_search_tool(query: str) -> str:
    """Retrieve the most relevant document and summarize it."""
    # Retrieve only the top-ranked document
    retrieved_docs = vectorstore.similarity_search(query, k=1)
    if not retrieved_docs:
        return "No relevant information found."

    context = retrieved_docs[0].page_content

    # Generate a concise answer in full sentences
    prompt = f"Answer the question using the following context. Make your answer concise and in full sentences.\n\nQuestion: {query}\nContext: {context}"
    return llm.invoke(prompt)

# Simple check for whether the task is math or document retrieval
def is_math_task(task: str) -> bool:
    task_lower = task.lower()
    if re.search(r"\d", task_lower):
        return True
    math_words = ["calculate", "plus", "minus", "multiply", "times", "divided by", "add"]
    return any(word in task_lower for word in math_words)

# Task-oriented agent function
def task_oriented_agent(task: str) -> str:
    if is_math_task(task):
        return calculator_tool(task)
    else:
        return doc_search_tool(task)

if __name__ == "__main__":
    # 4. Example tasks: demonstrate the agent's retrieval and calculation capabilities
    tasks = [
        "What is 25 multiplied by 4?",
        "Tell me about health and insulin sensitivity.",
        "What is the capital of France?"
    ]

    # 5. Loop over each task, demonstrate retrieval and calculation, and print the results
    for task in tasks:
        print("\n--- Task ---")
        print(task)
        result = task_oriented_agent(task)
        print("\nAnswer:", result)

Output:

--- Task ---
What is 25 multiplied by 4?
Answer: 100
--- Task ---
Tell me about health and insulin sensitivity.
Answer: Intermittent fasting improves insulin sensitivity. Context: Intermittent fasting improves insulin sensitivity.
--- Task ---
What is the capital of France?
Answer: Paris

Context-aware conversational agent

A context-aware conversational agent focuses on maintaining continuity and coherence across multi-turn interactions in a RAG system. Unlike simple retrievers that treat every query independently, this agent uses conversation history, user intent, and contextual cues to refine retrieval and generation. By grounding its responses in prior turns and relevant documents, it produces answers that feel more natural, consistent, and user-centric.

This approach is especially valuable in scenarios such as customer support, tutoring, or research assistance, where understanding context is essential to giving accurate and meaningful replies. In this chapter, it illustrates how RAG systems can evolve into smarter, more human-like conversational partners.

Recipe 114

This recipe shows how to implement a context-aware conversational agent:

  • Create a text-generation pipeline using a local model (e.g., t5-small)
  • Create a sample document store
  • Build a vector store from the documents using HuggingFace embeddings
  • Construct example inputs that test the agent's context awareness and math handling
  • Run the agent and print the responses

Install the required packages:

pip install transformers langchain-community faiss-cpu torch sentence-transformers

context_aware_conversational_agent.py

Refer to the following code:

# context_aware_conversational_agent.py
# A context-aware conversational agent that answers user queries with a local LLM,
# a document store, and a calculator tool, combined with conversation memory.
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers import pipeline
import re
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Create a text-generation pipeline using a local model (e.g., t5-small)
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create a sample document store
docs = [
    Document(page_content="The capital of France is Paris.", metadata={"topic": "geography"}),
    Document(page_content="Python is a popular programming language for AI.", metadata={"topic": "technology"}),
    Document(page_content="Intermittent fasting improves insulin sensitivity.", metadata={"topic": "health"}),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

# Calculator tool
def calculator_tool(query: str) -> str:
    query = query.lower()
    query = query.replace("multiplied by", "*").replace("times", "*")
    query = query.replace("plus", "+").replace("minus", "-")
    query = query.replace("divided by", "/")
    query = re.sub(r"[^0-9+\-*/.() ]", "", query)
    try:
        return str(eval(query))
    except Exception as e:
        return f"Error: {e}"

def is_math_task(task: str) -> bool:
    return bool(re.search(r"\d", task)) or any(
        word in task.lower() for word in ["plus", "minus", "multiply", "times", "divided by", "add"]
    )

# Document retrieval tool
def doc_search_tool(query: str) -> str:
    retrieved_docs = vectorstore.similarity_search(query, k=1)
    if not retrieved_docs:
        return "No relevant information found."
    context = retrieved_docs[0].page_content
    prompt = (
        f"Answer the question concisely in full sentences using the following context if relevant.\n\n"
        f"Context: {context}\n"
        f"Question: {query}\nAnswer:"
    )
    response = llm.invoke(prompt)
    # Strip echoed instructions
    response = re.sub(r"(Answer concisely.*|Benutzer:)", "", response, flags=re.IGNORECASE).strip()
    return response

# Conversation history for keeping context
conversation_history = []

# Context-aware conversational agent with memory and repetition avoidance
def context_aware_agent(user_input: str) -> str:
    """
    Generate a response by combining conversation history and relevant documents.
    Avoids repetition.
    """
    retrieved_info = doc_search_tool(user_input)

    # Build a minimal prompt using only the context
    prompt_for_llm = (
        f"Using the following context, answer the question concisely in full sentences.\n\n"
        f"Context: {retrieved_info}\n"
        f"Question: {user_input}\nAnswer:"
    )
    response = llm.invoke(prompt_for_llm)

    # Strip repeated input or echoed instructions
    response = re.sub(rf"{re.escape(user_input)}", "", response, flags=re.IGNORECASE).strip()
    response = re.sub(r"(Answer concisely.*|User:.*|Benutzer:)", "", response, flags=re.IGNORECASE).strip()

    # Update the conversation history
    conversation_history.append(f"User: {user_input}")
    conversation_history.append(f"Assistant: {response}")

    return response

# Math-aware conversational agent
def agent(user_input: str) -> str:
    if is_math_task(user_input):
        return calculator_tool(user_input)
    else:
        return context_aware_agent(user_input)

# Example usage
if __name__ == "__main__":
    # 4. Example inputs: test context awareness and math handling
    inputs = [
        "Hello! Can you tell me about health and insulin sensitivity?",
        "What is the capital of France?",
        "Also, what programming language is popular for AI?",
        "Thanks! And what is 10 plus 5?"
    ]

    # 5. Run the agent and print the responses
    for user_input in inputs:
        print("\n--- User ---")
        print(user_input)
        answer = agent(user_input)
        print("\n--- Assistant ---")
        print(answer)

Output:

--- User ---
Hello! Can you tell me about health and insulin sensitivity?
--- Assistant ---
Intermittent fasting improves insulin sensitivity.

--- User ---
What is the capital of France?
--- Assistant ---
France

--- User ---
Also, what programming language is popular for AI?
--- Assistant ---
Python

--- User ---
Thanks! And what is 10 plus 5?
--- Assistant ---
15
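
Note that the recipe appends turns to conversation_history but never feeds them back into the prompt, so follow-up questions are answered without memory. A minimal sketch of closing that gap, assuming a hypothetical build_contextual_prompt helper:

```python
# Sketch: fold the last few conversation turns into the prompt so that
# follow-up questions resolve against earlier context.
# build_contextual_prompt is an illustrative helper, not from the recipe.
def build_contextual_prompt(history, context, question, max_turns=3):
    recent = "\n".join(history[-2 * max_turns:])  # last user/assistant pairs
    return (
        f"Conversation so far:\n{recent}\n\n"
        f"Context: {context}\n"
        f"Question: {question}\nAnswer:"
    )

history = ["User: Tell me about France.",
           "Assistant: France is a country in Europe."]
prompt = build_contextual_prompt(
    history, "The capital of France is Paris.", "What is its capital?")
print(prompt)
```

The resulting prompt would replace prompt_for_llm inside context_aware_agent; capping the number of turns keeps the input within the model's context window.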

Dynamic re-ranking agent

A dynamic re-ranking agent strengthens a RAG system by intelligently reordering retrieved documents according to their contextual relevance to the user query. Rather than relying on static similarity scores alone, it adds an extra reasoning layer, such as semantic alignment, query intent, or answer quality, to dynamically prioritize the most useful evidence.

This ensures the generation model receives the most contextually appropriate documents, reducing noise and improving answer accuracy. In this chapter, it shows how adaptive re-ranking can substantially improve retrieval precision, especially for complex or ambiguous queries where the initial retrieval may mix content of varying quality.

Recipe 115

This recipe shows how to implement a dynamic re-ranking agent:

  • Load a small local LLM for text generation using a Hugging Face pipeline
  • Create a simple dataset with metadata for the demo
  • Build a vector store from the documents using Hugging Face embeddings
  • Dynamically re-rank the retrieved documents and print responses based on the ranking

Install the required packages:

pip install langchain langchain-community transformers torch faiss-cpu sentence-transformers

dynamic_reranking_agent.py

Refer to the following code:

# dynamic_reranking_agent.py
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers.pipelines import pipeline
import re
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Load a small local LLM for text generation using a HuggingFace pipeline
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create a simple dataset for the demo
docs = [
    Document(page_content="The capital of France is Paris.", metadata={"topic": "geography"}),
    Document(page_content="Python is a popular programming language for AI.", metadata={"topic": "technology"}),
    Document(page_content="Regular exercise helps improve insulin sensitivity.", metadata={"topic": "health"}),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

# Document retrieval with simple scoring
def filtered_similarity_search(query, k=2):
    results = vectorstore.similarity_search(query, k=k)
    query_words = set(query.lower().split())
    scored_docs = []
    for doc in results:
        score = len(set(doc.page_content.lower().split()) & query_words)
        scored_docs.append((score, doc))

    # Sort by score in descending order
    scored_docs.sort(reverse=True, key=lambda x: x[0])
    return scored_docs

# Calculator tool
def calculator_tool(query: str) -> str:
    query = query.lower()
    query = query.replace("multiplied by", "*").replace("times", "*")
    query = query.replace("plus", "+").replace("minus", "-")
    query = query.replace("divided by", "/")
    query = re.sub(r"[^0-9+\-*/.() ]", "", query)
    try:
        return str(eval(query))
    except Exception as e:
        return f"Error: {e}"

def is_math_task(task: str) -> bool:
    return bool(re.search(r"\d", task)) or any(
        word in task.lower() for word in ["plus", "minus", "multiply", "times", "divided by", "add"]
    )

# Agent function
def agent(user_input: str) -> str:
    if is_math_task(user_input):
        return calculator_tool(user_input)

    # Retrieve and rank relevant documents
    scored_docs = filtered_similarity_search(user_input, k=2)

    print("\n--- Retrieved & Ranked Documents ---")
    for i, (score, doc) in enumerate(scored_docs, start=1):
        print(f"{i}. Score: {score}, Content: {doc.page_content}")

    # Use only the top-ranked document as context to avoid repetition
    context = scored_docs[0][1].page_content if scored_docs else ""

    # Simple prompt: avoid repeating the question in multiple places
    prompt = f"Context: {context}\nAnswer the following question concisely:\n{user_input}"

    response = llm.invoke(prompt).strip()

    # Strip the repeated question or instructions T5 sometimes appends
    response = re.sub(r"^.*?Answer\s*[:-]?", "", response, flags=re.IGNORECASE).strip()

    return response

# Example conversation
if __name__ == "__main__":
    inputs = [
        "What is the capital of France?",
        "Which programming language is popular for AI?",
        "What is 10 plus 5?"
    ]

    # 4. Dynamically re-rank the retrieved results and print responses based on the ranking
    for user_input in inputs:
        print("\n--- User ---")
        print(user_input)
        answer = agent(user_input)
        print("\n--- Assistant ---")
        print(answer)

Output:

--- User ---
What is the capital of France?
--- Retrieved & Ranked Documents ---
1. Score: 4, Content: The capital of France is Paris.
2. Score: 1, Content: Python is a popular programming language for AI.
--- Assistant ---
The capital of France is Paris.

--- User ---
Which programming language is popular for AI?
--- Retrieved & Ranked Documents ---
1. Score: 5, Content: Python is a popular programming language for AI.
2. Score: 1, Content: The capital of France is Paris.
--- Assistant ---
Python is a popular programming language for AI.

--- User ---
What is 10 plus 5?
--- Assistant ---
15
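
The recipe ranks purely by word overlap. A slightly richer re-ranker can fuse the dense retriever's rank with keyword overlap into one score; the 0.5/0.5 weights and the fused_rerank helper below are illustrative assumptions, not part of the recipe:

```python
import re

# Sketch: fuse dense-retrieval rank (as reciprocal rank) with keyword
# overlap into a single re-ranking score. Weights are assumptions.
def fused_rerank(query, ranked_docs):
    """ranked_docs: document texts, best dense match first."""
    query_words = set(re.findall(r"\w+", query.lower()))
    scored = []
    for rank, text in enumerate(ranked_docs):
        dense_score = 1.0 / (rank + 1)  # reciprocal of dense rank
        overlap = len(query_words & set(re.findall(r"\w+", text.lower())))
        keyword_score = overlap / max(len(query_words), 1)
        scored.append((0.5 * dense_score + 0.5 * keyword_score, text))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [text for _, text in scored]

docs = ["Python is a popular programming language for AI.",
        "The capital of France is Paris."]
print(fused_rerank("What is the capital of France?", docs)[0])
# → The capital of France is Paris.
```

Here strong keyword overlap promotes the second document past the dense retriever's first choice, which is exactly the correction a re-ranking layer is meant to make.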

Adaptive summarization agent

An adaptive summarization agent is designed to produce concise, context-aware summaries that adjust to the user's query and information needs. Rather than generating fixed-length summaries, it dynamically tunes the level of detail: high-level overviews for broad queries, finer-grained insights for specific questions.

By drawing on retrieval results and applying flexible summarization strategies, this agent reduces the user's cognitive load, avoids overwhelming them with excessive detail, and ensures the most relevant points are highlighted. This chapter illustrates how adaptive summarization improves the user experience of RAG systems by balancing completeness with clarity.

Recipe 116

This recipe shows how to write an adaptive summarization agent program:

  • Create a text-generation pipeline using a local model (e.g., t5-small)
  • Create a sample document store
  • Build a vector store from the documents using Hugging Face embeddings
  • Construct example inputs that test the agent's adaptive summarization and math handling
  • Run the agent and print the responses

Install the required packages:

pip install langchain-community transformers faiss-cpu

adaptive_summarization_agent.py

Refer to the following code:

# adaptive_summarization_agent.py
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers.pipelines import pipeline
import re
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Create a text-generation pipeline using a local model (e.g., t5-small)
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create a sample document store
docs = [
    Document(page_content="The capital of France is Paris.",
metadata={"topic": "geography"}),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

# Calculator tool
def calculator_tool(query: str) -> str:
    query = query.lower()
    query = query.replace("multiplied by", "*").replace("times", "*")
    query = query.replace("plus", "+").replace("minus", "-")
    query = query.replace("divided by", "/")
    query = re.sub(r"[^0-9+\-*/.() ]", "", query)
    try:
        return str(eval(query))
    except Exception as e:
        return f"Error: {e}"

def is_math_task(task: str) -> bool:
    return bool(re.search(r"\d", task)) or any(
        word in task.lower() for word in ["plus", "minus", "multiply", "times", "divided by", "add"]
    )

# Adaptive summarization agent
def summarize_query(query: str) -> str:
    retrieved_docs = vectorstore.similarity_search(query, k=3)
    if not retrieved_docs:
        return "No relevant information found."

    # If the top-ranked document is clearly highly relevant, return it directly
    top_doc = retrieved_docs[0]
    if len(retrieved_docs) == 1 or query.lower() in top_doc.page_content.lower():
        return top_doc.page_content

    # Otherwise, summarize the top documents
    context = " ".join([doc.page_content for doc in retrieved_docs])
    prompt = f"Summarize the following information in 1-2 concise English sentences:\n{context}"
    response = generator(prompt)[0]["generated_text"].strip()
    return response

# Agent that combines math handling
def agent(user_input: str) -> str:
    if is_math_task(user_input):
        return calculator_tool(user_input)
    else:
        return summarize_query(user_input)

# Example conversation
if __name__ == "__main__":
    # 4. Example inputs: test adaptive summarization and math handling
    inputs = [
        "What is the capital of France?",
        "What is 10 plus 5?"
    ]

    # 5. Run the agent and print the responses
    for user_input in inputs:
        print("\n--- User ---")
        print(user_input)
        answer = agent(user_input)
        print("\n--- Assistant ---")
        print(answer)

Output:

--- User ---
What is the capital of France?
--- Assistant ---
The capital of France is Paris.

--- User ---
What is 10 plus 5?
--- Assistant ---
15
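
The recipe either returns the top document or summarizes, but does not yet vary the level of detail with the query's breadth. One simple way to do that is to derive a length budget from cues in the query; the cue words and token budgets below are illustrative assumptions:

```python
# Sketch: choose a summary length budget from cues in the query.
# Cue words and budgets are assumptions, not part of the recipe.
def summary_budget(query: str) -> int:
    q = query.lower()
    if any(w in q for w in ("overview", "briefly", "in short", "summary")):
        return 40    # broad query: keep the summary high-level and short
    if any(w in q for w in ("detail", "explain", "how", "why")):
        return 160   # specific query: allow a longer, finer-grained answer
    return 80        # default middle ground

print(summary_budget("Give me a brief overview of RAG"))    # → 40
print(summary_budget("Explain in detail how FAISS works"))  # → 160
```

The returned budget would then be passed as max_length (or max_new_tokens) when invoking the generator, so broad questions get overviews and specific ones get room for detail.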

Chain-of-thought retrieval agent

A chain-of-thought retrieval agent strengthens the reasoning power of a RAG system by explicitly breaking complex queries into smaller, logical steps before retrieving information. Rather than generating an answer directly from retrieved documents, it reasons through intermediate steps: posing clarifying sub-questions, identifying knowledge gaps, and gathering supporting evidence incrementally.

This structured reasoning process improves retrieval precision, reduces hallucinations, and keeps the final response firmly grounded in the source material. In this chapter, it shows how bringing chain-of-thought reasoning into retrieval makes RAG systems more transparent, more reliable, and better at handling multi-step or nuanced questions.

Recipe 117

This recipe shows how to write a chain-of-thought retrieval agent program:

  • Create a text-generation pipeline using a local model (e.g., t5-small)
  • Create sample documents with metadata
  • Build a vector store from the documents using Hugging Face embeddings
  • Return chain-of-thought-based responses for the queries and print the results

Install the required packages:

pip install langchain langchain_community transformers torch faiss-cpu

cot_rag_agent.py

Refer to the following code:

# cot_rag_agent.py
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_core.documents import Document
from transformers.pipelines import pipeline
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Create a text-generation pipeline using a local model (e.g., t5-small)
generator = pipeline("text2text-generation", model="t5-small", max_length=256)
llm = HuggingFacePipeline(pipeline=generator)

# 2. Create sample documents
rag_docs = [
    Document(
        page_content="RAG combines retrieval and generation to improve answer accuracy. "
                     "It retrieves relevant documents before generating responses.",
        metadata={"topic": "RAG basics"}
    ),
    Document(
        page_content="Dense retrieval uses embeddings to find semantically similar text chunks. "
                     "FAISS is commonly used for fast vector retrieval.",
        metadata={"topic": "Dense Retrieval"}
    ),
    Document(
        page_content="Chain-of-Thought prompting encourages step-by-step reasoning for complex queries. "
                     "It helps the model produce more logical answers.",
        metadata={"topic": "CoT"}
    ),
]

# 3. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(rag_docs, embeddings)

# Chain-of-thought retrieval agent
def cot_rag_agent(query: str) -> str:
    # Retrieve the top 2 most relevant RAG documents
    retrieved_docs = vectorstore.similarity_search(query, k=2)
    if not retrieved_docs:
        return "No relevant information found."

    # Chain-of-thought reasoning prompt
    context_text = " ".join([doc.page_content for doc in retrieved_docs])
    prompt = (
        f"Think step by step and answer the question using the context from the RAG documents.\n\n"
        f"Context: {context_text}\n"
        f"Question: {query}\nAnswer:"
    )

    # Generate the answer
    response = llm.invoke(prompt).strip()
    return response

if __name__ == "__main__":
    queries = [
        "What is RAG and how does it help?",
        "Explain dense retrieval in simple terms.",
        "Why use Chain-of-Thought prompting?"
    ]

    # 4. Return chain-of-thought-based responses for the queries and print the results
    for q in queries:
        print("\n--- User ---")
        print(q)
        answer = cot_rag_agent(q)
        print("\n--- Assistant ---")
        print(answer)

Output:

--- User ---
What is RAG and how does it help?
--- Assistant ---
Context: RAG combines retrieval and generation to improve answer accuracy. It retrieves relevant documents before generating responses. Dense retrieval uses embeddings to find semantically similar text chunks. FAISS is commonly used for fast vector retrieval.

--- User ---
Explain dense retrieval in simple terms.
--- Assistant ---
Dense retrieval uses embeddings to find semantically similar text chunks. FAISS is commonly used for fast vector retrieval. RAG combines retrieval and generation to improve answer accuracy. It retrieves relevant documents before generating responses.

--- User ---
Why use Chain-of-Thought prompting?
--- Assistant ---
Chain-of-Thought prompting encourage step-by-step reasoning for complex queries. It helps the model produce more logical answers. RAG combines retrieval and generation to improve answer accuracy.
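
As the outputs show, t5-small mostly echoes the context rather than reasoning stepwise. The decomposition step the section describes, splitting a compound question into sub-questions that each drive their own retrieval call, can be sketched with a deliberately naive rule (a real agent would ask the LLM to decompose):

```python
import re

# Sketch: split a compound question into sub-questions; each sub-question
# would then trigger its own retrieval pass. The splitting rule is a
# deliberately naive assumption for illustration.
def decompose(question: str):
    parts = re.split(r"\band\b|,", question)
    return [p.strip().rstrip("?") + "?" for p in parts if p.strip()]

for sub in decompose("What is RAG and how does dense retrieval work?"):
    print(sub)
# → What is RAG?
# → how does dense retrieval work?
```

Running vectorstore.similarity_search once per sub-question, then concatenating the per-step contexts, gives the model evidence for each reasoning step instead of one blended retrieval.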

Hybrid retrieval agent

A hybrid retrieval agent combines the strengths of dense and sparse retrieval to deliver more accurate, comprehensive results in a RAG system. Embedding-driven dense retrieval captures semantic meaning, while sparse retrieval (such as BM25) excels at keyword-based matching.

By fusing the two, the agent balances recall and precision, retrieving contextually relevant documents without missing important content that matches keywords exactly. This hybrid strategy lowers the risk of missing critical information and makes the system more robust across query types. The chapter emphasizes that hybrid retrieval makes RAG systems more effective in real-world settings, where queries vary widely in complexity, phrasing, and specificity.

Recipe 118

This recipe shows how to implement a hybrid retrieval agent:

  • Create a sample document store
  • Build a vector store from the documents using Hugging Face embeddings
  • Prepare queries for the program
  • Generate responses using hybrid retrieval (both dense and sparse techniques)

Install the required packages:

pip install langchain langchain_community transformers sentence-transformers faiss-cpu torch

hybrid_retrieval_agent.py

Refer to the following code:

# hybrid_retrieval_agent.py
from langchain_huggingface import HuggingFacePipeline
from langchain_community.vectorstores import FAISS
from langchain_core.documents import Document
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity
from langchain_huggingface import HuggingFaceEmbeddings
import re
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Create a sample document store
docs = [
    Document(page_content="RAG stands for Retrieval-Augmented Generation.", metadata={"topic": "RAG"}),
    Document(page_content="Dense embeddings capture semantic meaning of text.", metadata={"topic": "RAG"}),
    Document(page_content="BM25 is a sparse retrieval method based on keyword matching.", metadata={"topic": "RAG"}),
    Document(page_content="Hybrid retrieval combines dense and sparse methods for better results.", metadata={"topic": "RAG"}),
]

# 2. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="all-MiniLM-L6-v2")
vectorstore = FAISS.from_documents(docs, embeddings)

def dense_retrieval(query, k=2):
    results = vectorstore.similarity_search(query, k=k)
    return [doc.page_content for doc in results]

# Sparse retrieval (TF-IDF as a stand-in for BM25)
tfidf_vectorizer = TfidfVectorizer()
tfidf_matrix = tfidf_vectorizer.fit_transform([doc.page_content for doc in docs])

def sparse_retrieval(query, k=2):
    query_vec = tfidf_vectorizer.transform([query])
    scores = cosine_similarity(query_vec, tfidf_matrix)[0]
    top_indices = scores.argsort()[::-1][:k]
    return [docs[i].page_content for i in top_indices]

# Hybrid retrieval (dense + sparse)
def hybrid_retrieval(query, k=2):
    dense_results = dense_retrieval(query, k=5)
    sparse_results = sparse_retrieval(query, k=5)

    # Merge results and remove duplicates
    combined = list(dict.fromkeys(dense_results + sparse_results))
    return combined[:k]

# Agent
def agent(query):
    # Simple math detection
    if re.search(r"\d", query) or any(op in query for op in ["plus","minus","multiply","divided"]):
        try:
            return str(eval(query.replace("plus","+").replace("minus","-").replace("multiply","*").replace("divided","/")))
        except:
            return "Error in calculation"

    # Otherwise, perform hybrid retrieval
    results = hybrid_retrieval(query)
    return "Relevant info:\n" + "\n".join(results)

# Example
if __name__ == "__main__":
    # 3. Prepare queries for the program
    queries = [
        "What is RAG?",
        "Explain hybrid retrieval",
        "10 plus 5"
    ]

    # 4. Generate responses using hybrid retrieval (both dense and sparse techniques)
    for q in queries:
        print("\n--- User ---")
        print(q)
        print("\n--- Assistant ---")
        print(agent(q))

Output:

--- User ---
What is RAG?
--- Assistant ---
Relevant info:
RAG stands for Retrieval-Augmented Generation.
BM25 is a sparse retrieval method based on keyword matching.

--- User ---
Explain hybrid retrieval
--- Assistant ---
Relevant info:
Hybrid retrieval combines dense and sparse methods for better results.
BM25 is a sparse retrieval method based on keyword matching.

--- User ---
10 plus 5
--- Assistant ---
15
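
The recipe approximates sparse retrieval with TF-IDF; the BM25 scoring the section actually names can be sketched in plain Python. The k1 and b values are the usual defaults, and the tokenization (lowercased whitespace split) is a simplifying assumption:

```python
import math

# Sketch: BM25 scoring over a tiny corpus. k1/b are common defaults;
# tokenization is a naive lowercase split for illustration.
def bm25_scores(query, docs, k1=1.5, b=0.75):
    tokenized = [d.lower().split() for d in docs]
    avgdl = sum(len(t) for t in tokenized) / len(tokenized)
    n = len(docs)
    scores = []
    for tokens in tokenized:
        score = 0.0
        for term in query.lower().split():
            df = sum(term in t for t in tokenized)  # document frequency
            if df == 0:
                continue
            idf = math.log((n - df + 0.5) / (df + 0.5) + 1)
            tf = tokens.count(term)
            score += idf * tf * (k1 + 1) / (
                tf + k1 * (1 - b + b * len(tokens) / avgdl))
        scores.append(score)
    return scores

docs = ["bm25 is a sparse retrieval method based on keyword matching",
        "dense embeddings capture semantic meaning of text"]
print(bm25_scores("sparse retrieval", docs))
```

Swapping this (or a library such as rank_bm25) in for the TF-IDF step would make sparse_retrieval match the BM25 behavior described in the text.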

Time-aware retrieval agent

A time-aware retrieval agent enhances a RAG system by bringing temporal context into retrieval, so that responses are not only relevant but also current. Many queries, such as questions about financial events, legal updates, or health guidelines, require knowing when information was published or took effect.

By using timestamps in document metadata and applying time-based filtering or ordering, this agent prioritizes sources that are recent and contextually appropriate. This capability reduces outdated or misleading output and makes the system especially valuable in dynamic domains where knowledge evolves quickly. This chapter shows how time awareness adds a critical layer of precision and reliability to RAG-based applications.

Recipe 119

This recipe shows how to implement a time-aware retrieval agent.

Install the required packages:

pip install langchain-community faiss-cpu sentence-transformers

time_aware_retrieval_agent.py

Refer to the following code:

# time_aware_retrieval_agent.py
from datetime import datetime
from langchain_huggingface import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS
import warnings
warnings.filterwarnings("ignore", category=FutureWarning, module="torch")

# 1. Set up local embeddings using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")

# 2. Sample documents with timestamps
docs = [
    {"page_content": "RAG improves document-grounded responses.", "metadata": {"date": "2024-09-01"}},
    {"page_content": "FAISS is a library for efficient similarity search.", "metadata": {"date": "2025-01-10"}},
]

# 3. Build the FAISS vector store
vectorstore = FAISS.from_texts(
    texts=[doc["page_content"] for doc in docs],
    embedding=embeddings,
    metadatas=[doc["metadata"] for doc in docs]
)

# Time-aware retriever (prefers newer documents)
def time_aware_retrieve(query, top_k=1):
    retriever = vectorstore.as_retriever(search_kwargs={"k": top_k})
    results = retriever.invoke(query)

    # Sort by date (newest first)
    results.sort(key=lambda x: x.metadata.get("date", "1900-01-01"), reverse=True)
    return results

# 4. Example queries to simulate
queries = [
    "How does RAG work?",
    "Tell me about FAISS."
]

# 5. Run time-based retrieval for these queries and print the responses
for q in queries:
    results = time_aware_retrieve(q, top_k=1)
    print("\n--- User ---")
    print(q)
    print("\n--- Assistant ---")
    if results:
        for r in results:
            print(f"{r.page_content} (Date: {r.metadata['date']})")
    else:
        print("No relevant information found.")

Output:

--- User ---
How does RAG work?
--- Assistant ---
RAG improves document-grounded responses. (Date: 2024-09-01)

--- User ---
Tell me about FAISS.
--- Assistant ---
FAISS is a library for efficient similarity search. (Date: 2025-01-10)
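
One caveat in the recipe: with top_k=1 the post-retrieval sort can never change the result, since there is only one document to order. An alternative is to blend similarity and recency into a single score before ranking; the half-life and weight below are illustrative assumptions:

```python
from datetime import datetime

# Sketch: recency-weighted ranking. Each result carries a similarity
# score and an ISO date; recency decays exponentially with age.
# half_life_days and weight are illustrative assumptions.
def recency_weighted(results, now, half_life_days=180.0, weight=0.3):
    """results: list of (similarity, text, iso_date) tuples."""
    rescored = []
    for sim, text, date_str in results:
        age_days = (now - datetime.fromisoformat(date_str)).days
        recency = 0.5 ** (max(age_days, 0) / half_life_days)
        rescored.append(((1 - weight) * sim + weight * recency, text))
    rescored.sort(key=lambda pair: pair[0], reverse=True)
    return rescored

results = [(0.80, "Older but slightly closer match", "2024-09-01"),
           (0.78, "Fresher document", "2025-01-10")]
ranked = recency_weighted(results, now=datetime(2025, 2, 1))
print(ranked[0][1])
# → Fresher document
```

To use this with the recipe, retrieve with k greater than 1 via similarity_search_with_score, then rescore the candidates before taking the top result.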

Streaming retrieval agent

A streaming retrieval agent focuses on handling real-time or continuously updating information streams, which makes it especially useful where knowledge is highly dynamic and fast-moving, such as financial markets, live news, or sensor data. Rather than relying solely on a static document collection, it can retrieve, process, and integrate these inputs as the data arrives, ensuring responses reflect the latest available context.

By coupling streaming inputs with retrieval and generation, this agent keeps the system continuously up to date and responsive to rapidly changing user needs. The chapter emphasizes that streaming-aware retrieval gives RAG applications agility and immediacy, bridging the gap between static knowledge bases and live intelligence.

Recipe 120

This recipe shows how to implement a streaming retrieval agent:

  • Prepare sample documents that demonstrate streaming retrieval
  • Build a vector store from the documents using HuggingFace embeddings
  • Load a small local model
  • Prepare a query for streaming retrieval
  • Create a prompt for streaming output
  • Get the streaming agent's output and print it

Install the required packages:

pip install langchain langchain-community faiss-cpu sentence-transformers transformers

streaming_retrieval_agent.py

Refer to the following code:

# streaming_retrieval_agent.py
from langchain_community.vectorstores import FAISS
from langchain_huggingface import HuggingFaceEmbeddings
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, pipeline

# 1. Sample documents demonstrating streaming retrieval
docs = [
    "RAG improves answers by grounding them in retrieved documents.",
    "FAISS is a library for fast similarity search in vector databases.",
    "Python is widely used in artificial intelligence."
]

# 2. Build a vector store from the documents using HuggingFace embeddings
embeddings = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
vectorstore = FAISS.from_texts(docs, embedding=embeddings)

# 3. Load a small local model
model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)
pipe = pipeline("text2text-generation", model=model, tokenizer=tokenizer, max_length=128)

# 4. Prepare a query for streaming retrieval
query = "How does RAG work?"
print("\n--- User ---")
print(query)

retriever = vectorstore.as_retriever()
docs = retriever.invoke(query)   # newer API (replaces get_relevant_documents)
context = " ".join([d.page_content for d in docs])

# 5. Create the streaming prompt
prompt = f"Answer concisely using context: {context}. Question: {query}"

# 6. 获取流式智能体的输出并打印
print("\n--- Assistant (streaming) ---")
output = pipe(prompt, max_new_tokens=50, clean_up_tokenization_spaces=True)[0]["generated_text"]

# 模拟流式输出:先生成完整答案,再逐词打印
for word in output.split():
    print(word, end=" ", flush=True)
print()  # 输出结束后换行

输出:

--- User ---
How does RAG work?
--- Assistant (streaming) ---
improves answers by grounding them in retrieved documents
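需要说明的是,上面的示例是先生成完整答案、再逐词打印,属于对流式输出的模拟。若需要真正的逐 token 流式生成,transformers 提供了 `TextIteratorStreamer`:`generate` 在后台线程中运行,并把新解码出的文本片段不断推送给一个可迭代对象。以下是在同一个 flan-t5-small 模型上的简要示意(提示词为演示而设):

```python
# true_streaming_sketch.py
# 示意:使用 transformers 的 TextIteratorStreamer 实现边生成边输出
from threading import Thread
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM, TextIteratorStreamer

model_id = "google/flan-t5-small"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

prompt = "Answer concisely: How does RAG work?"
inputs = tokenizer(prompt, return_tensors="pt")

# streamer 是一个可迭代对象,generate 在后台线程中不断向其推送新 token
streamer = TextIteratorStreamer(tokenizer, skip_special_tokens=True)
thread = Thread(target=model.generate,
                kwargs=dict(**inputs, streamer=streamer, max_new_tokens=50))
thread.start()

pieces = []
for piece in streamer:          # 每次迭代拿到新解码出的文本片段
    print(piece, end="", flush=True)
    pieces.append(piece)
thread.join()
print()
```

这种方式下,用户在第一个 token 解码完成后就能看到输出,而不必等待整段答案生成完毕,这对交互式应用的感知延迟改善明显。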

查询扩展智能体

查询扩展智能体(query expansion agent)通过为用户查询补充额外上下文、同义词或相关概念,来增强检索效果。它不再只依赖原始查询中的精确措辞,而是会扩展搜索空间,以捕捉更多相关文档,并减少因词汇不匹配而错失重要信息的风险。

例如,一个关于 heart attack(心脏病发作)的查询,可能会被扩展为包含 myocardial infarction(心肌梗死)或 cardiac arrest(心搏骤停)等术语。在本章中,这类智能体展示了查询扩展如何提升召回率、平衡精确率,并使 RAG 系统能够在不同领域中更有效地理解用户意图。
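上面 heart attack 的例子可以用一个极简的骨架来说明"扩展提升召回"的机制。以下是假设性示意:`SYNONYMS` 是为演示手写的小型同义词表(正文示例 121 中改用 WordNet 自动获取同义词),`recall` 用朴素的词匹配近似"是否命中相关文档":

```python
# query_expansion_sketch.py
# 假设示意:手写同义词表演示查询扩展如何缓解词汇不匹配
SYNONYMS = {  # 假设的医学同义词表,仅作演示
    "heart attack": ["myocardial infarction", "cardiac arrest"],
}

def expand(query: str):
    """返回原始查询及其同义词替换后的变体。"""
    expansions = [query]
    for term, alts in SYNONYMS.items():
        if term in query.lower():
            expansions += [query.lower().replace(term, a) for a in alts]
    return expansions

docs = [
    "Myocardial infarction is caused by blocked coronary arteries.",
    "Exercise reduces the risk of cardiac arrest.",
]

def recall(queries, docs):
    """只要文档与任一查询共享词汇即视为命中(朴素词匹配,仅作演示)。"""
    hits = set()
    for q in queries:
        for d in docs:
            if set(q.lower().split()) & set(d.lower().rstrip(".").split()):
                hits.add(d)
    return hits

print(len(recall(["heart attack treatment"], docs)))        # 0:词汇不匹配,全部漏检
print(len(recall(expand("heart attack treatment"), docs)))  # 2:扩展后两篇文档均被召回
```

原始查询与两篇文档没有任何共同词汇,因而一篇都检不到;扩展出 myocardial infarction 与 cardiac arrest 两个变体后,两篇文档均被召回。代价是扩展也可能引入噪声,这正是正文所说的"提升召回率、平衡精确率"。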

示例 121

本示例展示了如何实现查询扩展智能体:

  • 创建一个带 ID 的简单数据集,用于演示
  • 创建查询示例,展示查询扩展
  • 对每个查询执行扩展,并使用稠密检索器获取响应
  • 打印查询及其返回结果

安装所需依赖包:

pip install sentence-transformers faiss-cpu nltk torch

query_expansion_agent.py

请参考以下代码:

# query_expansion_agent.py
import nltk
from nltk.corpus import wordnet as wn
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np

# ---------------- Setup ----------------
nltk.download("wordnet", quiet=True)
nltk.download("omw-1.4", quiet=True)
EMBED_MODEL = "sentence-transformers/all-MiniLM-L6-v2"
TOP_K = 2   # 每个扩展查询返回的结果数

# 1. 创建一个带 id 的简单数据集,用于演示
DOCS = [
    {"id": "doc1", "text": "Alice wrote about intermittent fasting and health."},
    {"id": "doc2", "text": "The 2008 financial crisis impacted global markets."},
    {"id": "doc3", "text": "The French Revolution began in 1789."},
    {"id": "doc4", "text": "Research shows fasting improves insulin sensitivity."},
    {"id": "doc5", "text": "Global financial markets collapsed in 2008."},
]

# ---------------- Query Expansion ----------------
def expand_query(query: str, max_synonyms: int = 2):
    tokens = query.lower().split()
    expansions = set([query])  # 包含原始查询
    for word in tokens:
        for syn in wn.synsets(word):
            for lemma in syn.lemmas()[:max_synonyms]:
                # 注意:replace 区分大小写,小写化后的 word 无法匹配原查询中
                # 首字母大写的词(如 Tell),此时 new_q 与原查询相同,被 set 去重
                new_q = query.replace(word, lemma.name().replace("_", " "))
                expansions.add(new_q)
    return list(expansions)

# ---------------- Dense Index ----------------
class DenseRetriever:
    def __init__(self, docs):
        self.docs = docs
        self.model = SentenceTransformer(EMBED_MODEL)
        self.embeddings = np.array(self.model.encode([d["text"] for d in docs], convert_to_numpy=True))
        self.index = faiss.IndexFlatL2(self.embeddings.shape[1])
        self.index.add(self.embeddings)

    def search(self, query: str, k: int = TOP_K):
        q_vec = self.model.encode([query], convert_to_numpy=True)
        D, I = self.index.search(q_vec, k)
        results = []
        for idx, dist in zip(I[0], D[0]):
            results.append({"id": self.docs[idx]["id"], "text": self.docs[idx]["text"], "score": float(dist)})
        return results

# ---------------- Query Expansion Agent ----------------
class QueryExpansionAgent:
    def __init__(self, docs):
        self.retriever = DenseRetriever(docs)

    def run(self, query: str):
        expanded = expand_query(query)
        print(f"Original Query: {query}")
        print(f"Expanded Queries: {expanded}\n")

        seen = {}
        for q in expanded:
            hits = self.retriever.search(q, k=TOP_K)
            for h in hits:
                if h["id"] not in seen or h["score"] < seen[h["id"]]["score"]:
                    seen[h["id"]] = h

        # 按相似度分数排序
        results = sorted(seen.values(), key=lambda x: x["score"])
        return results

# ---------------- Example ----------------
if __name__ == "__main__":
    agent = QueryExpansionAgent(DOCS)

    # 2. 创建查询以展示查询扩展
    queries = [
        "Tell me about health",
        "Financial events after 2005",
        "History before 1900"
    ]

    # 3. 对每个查询执行扩展,并使用稠密检索器获取响应
    # 4. 打印查询和返回结果
    for q in queries:
        results = agent.run(q)
        print(f"Results for '{q}':")
        for r in results[:3]:
            print(f"{r['text']} (id={r['id']}, score={r['score']:.4f})")
        print("-" * 60)

输出:

Original Query: Tell me about health
Expanded Queries: ['Tell me almost health', 'Tell me about wellness', 'Tell me approximately health', 'Tell me around health', 'Tell me about health', 'Tell Pine Tree State about health', 'Tell Maine about health', 'Tell me astir health']
Results for 'Tell me about health':
Alice wrote about intermittent fasting and health. (id=doc1, score=1.2788)
Research shows fasting improves insulin sensitivity. (id=doc4, score=1.5752)
------------------------------------------------------------
Original Query: Financial events after 2005
Expanded Queries: ['Financial events after 2005', 'Financial events later 2005', 'Financial case after 2005', 'Financial events subsequently 2005', 'Financial event after 2005', 'Financial consequence after 2005', 'Financial effect after 2005']
Results for 'Financial events after 2005':
The 2008 financial crisis impacted global markets. (id=doc2, score=1.0439)
Global financial markets collapsed in 2008. (id=doc5, score=1.1847)
------------------------------------------------------------
Original Query: History before 1900
Expanded Queries: ['History ahead 1900', 'History earlier 1900', 'History before 1900', 'History in front 1900']
Results for 'History before 1900':
The French Revolution began in 1789. (id=doc3, score=1.3599)
The 2008 financial crisis impacted global markets. (id=doc2, score=1.7255)
------------------------------------------------------------

结论

总的来说,本章强调了检索系统如何从静态流水线演进为具备自适应能力、能够做出决策的智能体,这些智能体能够围绕上下文、任务需求,以及时间或语义层面的细微差别进行推理。通过利用动态检索策略,这些智能体不仅仅是“取回信息”,而是能够智能地决定检索什么、何时检索,以及如何检索知识,从而更好地支持用户意图。

这一智能体层为系统带来了比传统 RAG 方法更高的灵活性、更好的准确性以及更丰富的交互能力。