🚀 RAG from Zero to Hands-On: A Practical Guide


From RAG fundamentals to real-world application

As someone who has worked on several AI projects, studying "Chapter 14: Knowledge Retrieval (RAG)" gave me a much more systematic picture of the RAG technology stack. RAG tackles a core pain point of large language models: their knowledge is static. By dynamically retrieving from external knowledge sources, it lets an LLM generate answers grounded in current, accurate information. This pattern is reshaping how AI applications are built.

📌 01 An Opening Story

I was building a medical Q&A system. Doctors wanted the latest treatment protocols; the model sounded authoritative, yet its answers were consistently out of date. That was when I first looked seriously at RAG. It lets the model "go look things up" before answering, and the results stabilized immediately.

📌 02 Value and How It Works

The core value of RAG is wiring the model up to external knowledge. Instead of relying only on stale training data, the model can also cite up-to-date sources.

The workflow is straightforward: convert the user's question into a vector, run a semantic search over the knowledge base, feed the relevant passages to the model alongside the original question, and let the model generate the answer. This makes the answers more accurate and more stable.
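
To make the flow concrete, here is a minimal, self-contained sketch. The character-frequency "embedding" is a toy stand-in for a real embedding model, and the final prompt would be sent to an LLM rather than printed; only the retrieve-augment-generate structure is the point.

import math

def embed(text: str) -> list[float]:
    # Toy stand-in for a real embedding model: a character-frequency vector.
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

KNOWLEDGE_BASE = [
    "RAG retrieves external documents before the model answers.",
    "Overlapping chunks keep context intact across chunk boundaries.",
]

def build_augmented_prompt(question: str, top_k: int = 1) -> str:
    query_vec = embed(question)                              # 1. vectorize the question
    ranked = sorted(KNOWLEDGE_BASE,                          # 2. semantic search
                    key=lambda doc: cosine(query_vec, embed(doc)),
                    reverse=True)
    context = "\n".join(ranked[:top_k])                      # 3. assemble the context
    return f"Context:\n{context}\n\nQuestion: {question}"    # 4. augmented prompt for the LLM

print(build_augmented_prompt("What does RAG retrieve?"))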

📌 03 Key Technical Components

Vectorization and semantic search are the foundation. We used OpenAI Embeddings with Weaviate, and accuracy improved noticeably over keyword search.

Document chunking is critical. Overlapping chunks preserve context instead of slicing key information apart, which matters especially in technical documentation.

👉 Module Example: A Five-Step Hands-On Guide

✅ Tip: build a basic RAG first and get retrieval plus generation working end to end. Only then consider more advanced approaches.

  1. Choose your data sources and their scope
  2. Clean and split the documents (use overlapping chunks; see the sketch after this list)
  3. Generate embeddings and write them to the vector store
  4. Design the retrieval strategy (semantic search, plus multi-path retrieval where needed)
  5. Assemble the prompt and context, then call the model to generate
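
As a sketch of step 2, overlapping chunking can be as simple as a sliding window over the text. The 500-character chunks and 50-character overlap below are illustrative defaults, not prescriptions:

def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into fixed-size chunks; each chunk repeats the last
    `overlap` characters of the previous one so context isn't cut off."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    return [text[i:i + chunk_size] for i in range(0, max(len(text) - overlap, 1), step)]

# Adjacent chunks share a 50-character overlap:
chunks = chunk_text("some long technical document ... " * 100)
print(len(chunks), chunks[0][-50:] == chunks[1][:50])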

📌 04 Real-World Scenarios

In healthcare, MetaSepsisKnowHub aggregates a large body of knowledge on sepsis and uses RAG to deliver personalized recommendations. In expert reviews it scored higher on factual accuracy and knowledge recall.

In retail, the 九数 algorithm platform used RAG for product consultations. When someone asks how two phone models differ, the system retrieves their specs and generates a comparison. Support load dropped noticeably and the user experience became more consistent.

📌 05 Advanced Directions

Graph RAG handles complex relationships. When information is scattered and multi-hop reasoning is required, it is more reliable; research from Ant Group reports better results on this class of problem.

Tongji's approach is also instructive: a general-purpose model handles intent understanding while specialized models handle deterministic reasoning; Monte Carlo tree search optimizes across multiple paths in retrieval and generation; and knowledge is layered into facts, information, and knowledge to match queries of different complexity.

Agentic RAG actively evaluates its sources. When two pieces of data contradict each other, it prefers the more authoritative one rather than stitching the contents together.

📌 06 Implementation Essentials

Performance and cost have to be weighed together. Intel and Volcano Engine's experience is that hardware acceleration combined with software optimization keeps costs in check: using a CPU's built-in accelerators, 16 vCPUs are enough to run a 7B model, which lowers the barrier to entry.

Knowledge-base quality sets the ceiling. We update it on a regular schedule and combine human review with automated validation; sources must be trustworthy and current.

📌 07 Lessons and Advice

Start simple. Get a basic RAG working first, then gradually try Graph RAG and Agentic RAG.

Evaluation has to keep pace: combine automated evaluation with human review, and continuously track accuracy, recall, and explainability.
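
For the automated side, even a simple retrieval metric such as recall@k is worth tracking continuously. A minimal sketch, assuming you have labeled query/relevant-chunk pairs:

def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of the known-relevant chunks that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for chunk_id in retrieved_ids[:k] if chunk_id in relevant_ids)
    return hits / len(relevant_ids)

# Two of the three relevant chunks were retrieved in the top 5 -> ~0.67
print(recall_at_k(["c1", "c9", "c3", "c4", "c7"], {"c1", "c3", "c8"}, k=5))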

In high-stakes domains such as healthcare and finance, provide provenance so every answer links back to its source. That makes the system more trustworthy.

In real-time conversation, balance latency against accuracy. Different scenarios call for different strategies.

📌 08 Closing Thoughts

RAG is evolving toward stronger reasoning systems. Resource usage keeps improving and the frameworks keep maturing; it will reach more critical industries.

If you're a developer, start with a small LangChain project. Get the pipeline running smoothly, then deepen it step by step. That path is stable and easy to review afterward.

👉 Q&A

Q1: In your work, which matters more, experience or AI?

A: It depends on the task. For tasks with clear rules, AI saves effort. For tasks that require judgment and accountability, experience matters more.

Q2: Won't RAG make the system more complex?

A: It adds a few stages, but the risk is manageable. Get retrieval and chunking right first, then optimize the strategy.

Q3: When should I consider Graph RAG?

A: Only when the relationships in your data are complex and multi-hop reasoning is required. Getting basic RAG running reliably comes first.

➡️ Action Items

Try it today: pick one document, clean and chunk it, build a small vector store, hook semantic retrieval up to a model, and get a single Q&A working end to end.


Chapter 14: Knowledge Retrieval (RAG)

LLMs exhibit substantial capabilities in generating human-like text. However, their knowledge base is typically confined to the data on which they were trained, limiting their access to real-time information, specific company data, or highly specialized details. Knowledge Retrieval (RAG, or Retrieval-Augmented Generation) addresses this limitation. RAG enables LLMs to access and integrate external, current, and context-specific information, thereby enhancing the accuracy, relevance, and factual basis of their outputs.

For AI agents, this is crucial as it allows them to ground their actions and responses in real-time, verifiable data beyond their static training. This capability enables them to perform complex tasks accurately, such as accessing the latest company policies to answer a specific question or checking current inventory before placing an order. By integrating external knowledge, RAG transforms agents from simple conversationalists into effective, data-driven tools capable of executing meaningful work.

Knowledge Retrieval (RAG) Pattern Overview

The Knowledge Retrieval (RAG) pattern significantly enhances the capabilities of LLMs by granting them access to external knowledge bases before generating a response. Instead of relying solely on their internal, pre-trained knowledge, RAG allows LLMs to "look up" information, much like a human might consult a book or search the internet. This process empowers LLMs to provide more accurate, up-to-date, and verifiable answers.

When a user poses a question or gives a prompt to an AI system using RAG, the query isn't sent directly to the LLM. Instead, the system first scours a vast external knowledge base—a highly organized library of documents, databases, or web pages—for relevant information. This search is not a simple keyword match; it's a "semantic search" that understands the user's intent and the meaning behind their words. This initial search pulls out the most pertinent snippets or "chunks" of information. These extracted pieces are then "augmented," or added, to the original prompt, creating a richer, more informed query. Finally, this enhanced prompt is sent to the LLM. With this additional context, the LLM can generate a response that is not only fluent and natural but also factually grounded in the retrieved data.

The RAG framework provides several significant benefits. It allows LLMs to access up-to-date information, thereby overcoming the constraints of their static training data. This approach also reduces the risk of "hallucination"—the generation of false information—by grounding responses in verifiable data. Moreover, LLMs can utilize specialized knowledge found in internal company documents or wikis. A vital advantage of this process is the capability to offer "citations," which pinpoint the exact source of information, thereby enhancing the trustworthiness and verifiability of the AI's responses.

To fully appreciate how RAG functions, it's essential to understand a few core concepts (see Fig.1):

Embeddings: In the context of LLMs, embeddings are numerical representations of text, such as words, phrases, or entire documents. These representations are in the form of a vector, which is a list of numbers. The key idea is to capture the semantic meaning and the relationships between different pieces of text in a mathematical space. Words or phrases with similar meanings will have embeddings that are closer to each other in this vector space. For instance, imagine a simple 2D graph. The word "cat" might be represented by the coordinates (2, 3), while "kitten" would be very close at (2.1, 3.1). In contrast, the word "car" would have a distant coordinate like (8, 1), reflecting its different meaning. In reality, these embeddings are in a much higher-dimensional space with hundreds or even thousands of dimensions, allowing for a very nuanced understanding of language.

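The toy coordinates above can be checked directly with cosine similarity, the standard closeness measure for embedding vectors (real embeddings simply have hundreds more dimensions). A minimal sketch:

import math

def cosine_similarity(a: tuple, b: tuple) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

cat, kitten, car = (2, 3), (2.1, 3.1), (8, 1)
print(cosine_similarity(cat, kitten))  # ~0.9999: nearly identical direction
print(cosine_similarity(cat, car))     # ~0.65: clearly less similar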

Text Similarity: Text similarity refers to the measure of how alike two pieces of text are. This can be at a surface level, looking at the overlap of words (lexical similarity), or at a deeper, meaning-based level. In the context of RAG, text similarity is crucial for finding the most relevant information in the knowledge base that corresponds to a user's query. For instance, consider the sentences: "What is the capital of France?" and "Which city is the capital of France?". While the wording is different, they are asking the same question. A good text similarity model would recognize this and assign a high similarity score to these two sentences, even though they only share a few words. This is often calculated using the embeddings of the texts.

Semantic Similarity and Distance: Semantic similarity is a more advanced form of text similarity that focuses purely on the meaning and context of the text, rather than just the words used. It aims to understand if two pieces of text convey the same concept or idea. Semantic distance is the inverse of this; a high semantic similarity implies a low semantic distance, and vice versa. In RAG, semantic search relies on finding documents with the smallest semantic distance to the user's query. For instance, the phrases "a furry feline companion" and "a domestic cat" have no words in common besides "a". However, a model that understands semantic similarity would recognize that they refer to the same thing and would consider them to be highly similar. This is because their embeddings would be very close in the vector space, indicating a small semantic distance. This is the "smart search" that allows RAG to find relevant information even when the user's wording doesn't exactly match the text in the knowledge base.

Fig.1: RAG Core Concepts: Chunking, Embeddings, and Vector Database

Chunking of Documents: Chunking is the process of breaking down large documents into smaller, more manageable pieces, or "chunks." For a RAG system to work efficiently, it cannot feed entire large documents into the LLM. Instead, it processes these smaller chunks. The way documents are chunked is important for preserving the context and meaning of the information. For instance, instead of treating a 50-page user manual as a single block of text, a chunking strategy might break it down into sections, paragraphs, or even sentences. For example, a section on "Troubleshooting" would be a separate chunk from the "Installation Guide." When a user asks a question about a specific problem, the RAG system can then retrieve the most relevant troubleshooting chunk, rather than the entire manual. This makes the retrieval process faster and the information provided to the LLM more focused and relevant to the user's immediate need. Once documents are chunked, the RAG system must employ a retrieval technique to find the most relevant pieces for a given query. The primary method is vector search, which uses embeddings and semantic distance to find chunks that are conceptually similar to the user's question. An older, but still valuable, technique is BM25, a keyword-based algorithm that ranks chunks based on term frequency without understanding semantic meaning. To get the best of both worlds, hybrid search approaches are often used, combining the keyword precision of BM25 with the contextual understanding of semantic search. This fusion allows for more robust and accurate retrieval, capturing both literal matches and conceptual relevance.

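One common way to fuse the two rankings is Reciprocal Rank Fusion (RRF), sketched below: each retriever contributes a score based only on a document's rank in its own list, so BM25 and vector rankings can be merged without normalizing their raw scores (k=60 is the constant conventionally used with RRF):

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists of chunk ids into a single fused ranking."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

bm25_ranking   = ["doc3", "doc1", "doc7"]  # keyword-based order
vector_ranking = ["doc1", "doc5", "doc3"]  # semantic order
print(reciprocal_rank_fusion([bm25_ranking, vector_ranking]))
# ['doc1', 'doc3', 'doc5', 'doc7'] -- documents ranked well by both lists rise to the top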

Vector databases: A vector database is a specialized type of database designed to store and query embeddings efficiently. After documents are chunked and converted into embeddings, these high-dimensional vectors are stored in a vector database. Traditional retrieval techniques, like keyword-based search, are excellent at finding documents containing exact words from a query but lack a deep understanding of language. They wouldn't recognize that "furry feline companion" means "cat." This is where vector databases excel. They are built specifically for semantic search. By storing text as numerical vectors, they can find results based on conceptual meaning, not just keyword overlap. When a user's query is also converted into a vector, the database uses highly optimized algorithms (like HNSW - Hierarchical Navigable Small World) to rapidly search through millions of vectors and find the ones that are "closest" in meaning. This approach is far superior for RAG because it uncovers relevant context even if the user's phrasing is completely different from the source documents. In essence, while other techniques search for words, vector databases search for meaning. This technology is implemented in various forms, from managed databases like Pinecone and Weaviate to open-source solutions such as Chroma DB, Milvus, and Qdrant. Even existing databases can be augmented with vector search capabilities, as seen with Redis, Elasticsearch, and Postgres (using the pgvector extension). The core retrieval mechanisms are often powered by libraries like Meta AI's FAISS or Google Research's ScaNN, which are fundamental to the efficiency of these systems.

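As a small, concrete example of the retrieval core, here is FAISS used directly (a sketch assuming the faiss-cpu and numpy packages; random vectors stand in for real chunk embeddings):

import numpy as np
import faiss

dim = 64                                        # embedding dimensionality
rng = np.random.default_rng(0)
chunk_vectors = rng.random((1000, dim), dtype=np.float32)  # stand-ins for chunk embeddings

index = faiss.IndexFlatL2(dim)                  # exact L2 search; HNSW variants trade exactness for speed
index.add(chunk_vectors)                        # store every chunk vector

query_vector = rng.random((1, dim), dtype=np.float32)
distances, ids = index.search(query_vector, 5)  # the 5 nearest chunks to the query
print(ids[0], distances[0])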

RAG's Challenges: Despite its power, the RAG pattern is not without its challenges. A primary issue arises when the information needed to answer a query is not confined to a single chunk but is spread across multiple parts of a document or even several documents. In such cases, the retriever might fail to gather all the necessary context, leading to an incomplete or inaccurate answer. The system's effectiveness is also highly dependent on the quality of the chunking and retrieval process; if irrelevant chunks are retrieved, it can introduce noise and confuse the LLM. Furthermore, effectively synthesizing information from potentially contradictory sources remains a significant hurdle for these systems. Besides that, another challenge is that RAG requires the entire knowledge base to be pre-processed and stored in specialized databases, such as vector or graph databases, which is a considerable undertaking. Consequently, this knowledge requires periodic reconciliation to remain up-to-date, a crucial task when dealing with evolving sources like company wikis. This entire process can have a noticeable impact on performance, increasing latency, operational costs, and the number of tokens used in the final prompt.

In summary, the Retrieval-Augmented Generation (RAG) pattern represents a significant leap forward in making AI more knowledgeable and reliable. By seamlessly integrating an external knowledge retrieval step into the generation process, RAG addresses some of the core limitations of standalone LLMs. The foundational concepts of embeddings and semantic similarity, combined with retrieval techniques like keyword and hybrid search, allow the system to intelligently find relevant information, which is made manageable through strategic chunking. This entire retrieval process is powered by specialized vector databases designed to store and efficiently query millions of embeddings at scale. While challenges in retrieving fragmented or contradictory information persist, RAG empowers LLMs to produce answers that are not only contextually appropriate but also anchored in verifiable facts, fostering greater trust and utility in AI.

Graph RAG: GraphRAG is an advanced form of Retrieval-Augmented Generation that utilizes a knowledge graph instead of a simple vector database for information retrieval. It answers complex queries by navigating the explicit relationships (edges) between data entities (nodes) within this structured knowledge base. A key advantage is its ability to synthesize answers from information fragmented across multiple documents, a common failing of traditional RAG. By understanding these connections, GraphRAG provides more contextually accurate and nuanced responses.

Use cases include complex financial analysis, connecting companies to market events, and scientific research for discovering relationships between genes and diseases. The primary drawback, however, is the significant complexity, cost, and expertise required to build and maintain a high-quality knowledge graph. This setup is also less flexible and can introduce higher latency compared to simpler vector search systems. The system's effectiveness is entirely dependent on the quality and completeness of the underlying graph structure. Consequently, GraphRAG offers superior contextual reasoning for intricate questions but at a much higher implementation and maintenance cost. In summary, it excels where deep, interconnected insights are more critical than the speed and simplicity of standard RAG.

Agentic RAG: An evolution of this pattern, known as Agentic RAG (see Fig.2), introduces a reasoning and decision-making layer to significantly enhance the reliability of information extraction. Instead of just retrieving and augmenting, an "agent"—a specialized AI component—acts as a critical gatekeeper and refiner of knowledge. Rather than passively accepting the initially retrieved data, this agent actively interrogates its quality, relevance, and completeness, as illustrated by the following scenarios.

First, an agent excels at reflection and source validation. If a user asks, "What is our company's policy on remote work?" a standard RAG might pull up a 2020 blog post alongside the official 2025 policy document. The agent, however, would analyze the documents' metadata, recognize the 2025 policy as the most current and authoritative source, and discard the outdated blog post before sending the correct context to the LLM for a precise answer.

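A sketch of that validation step, assuming each retrieved chunk carries metadata attached at indexing time (the fields and documents here are illustrative):

from datetime import date

retrieved = [
    {"text": "Remote work allowed 2 days/week.", "source": "2020 blog post",
     "published": date(2020, 5, 1), "authoritative": False},
    {"text": "Remote work allowed 3 days/week.", "source": "2025 HR policy",
     "published": date(2025, 1, 15), "authoritative": True},
]

def validate_sources(chunks: list[dict]) -> list[dict]:
    """Keep only the most authoritative, most recent chunk."""
    best = max(chunks, key=lambda c: (c["authoritative"], c["published"]))
    return [best]

context = validate_sources(retrieved)
print(context[0]["source"], "->", context[0]["text"])  # the 2025 policy wins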

Fig.2: Agentic RAG introduces a reasoning agent that actively evaluates, reconciles, and refines retrieved information to ensure a more accurate and trustworthy final response.

Second, an agent is adept at reconciling knowledge conflicts. Imagine a financial analyst asks, "What was Project Alpha's Q1 budget?" The system retrieves two documents: an initial proposal stating a €50,000 budget and a finalized financial report listing it as €65,000. An Agentic RAG would identify this contradiction, prioritize the financial report as the more reliable source, and provide the LLM with the verified figure, ensuring the final answer is based on the most accurate data.

Third, an agent can perform multi-step reasoning to synthesize complex answers. If a user asks, "How do our product's features and pricing compare to Competitor X's?" the agent would decompose this into separate sub-queries. It would initiate distinct searches for its own product's features, its pricing, Competitor X's features, and Competitor X's pricing. After gathering these individual pieces of information, the agent would synthesize them into a structured, comparative context before feeding it to the LLM, enabling a comprehensive response that a simple retrieval could not have produced.

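That decompose-search-synthesize loop might look like the sketch below, where the knowledge base and hard-coded sub-queries are illustrative (in a real agent, the LLM itself would generate the sub-queries):

def search(sub_query: str) -> str:
    # Hypothetical retrieval call returning the best-matching chunk's text.
    toy_kb = {
        "our product features": "Feature set: A, B, C.",
        "our product pricing": "Price: $10/month.",
        "Competitor X features": "Feature set: A, B.",
        "Competitor X pricing": "Price: $12/month.",
    }
    return toy_kb[sub_query]

question = "How do our product's features and pricing compare to Competitor X's?"
sub_queries = ["our product features", "our product pricing",
               "Competitor X features", "Competitor X pricing"]

# Gather each piece separately, then hand the synthesized context to the LLM.
context = "\n".join(f"{q}: {search(q)}" for q in sub_queries)
print(f"Using the facts below, compare the two products.\n{context}\n\nQuestion: {question}")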

Fourth, an agent can identify knowledge gaps and use external tools. Suppose a user asks, "What was the market's immediate reaction to our new product launched yesterday?" The agent searches the internal knowledge base, which is updated weekly, and finds no relevant information. Recognizing this gap, it can then activate a tool—such as a live web-search API—to find recent news articles and social media sentiment. The agent then uses this freshly gathered external information to provide an up-to-the-minute answer, overcoming the limitations of its static internal database.

Challenges of Agentic RAG: While powerful, the agentic layer introduces its own set of challenges. The primary drawback is a significant increase in complexity and cost. Designing, implementing, and maintaining the agent's decision-making logic and tool integrations requires substantial engineering effort and adds to computational expenses. This complexity can also lead to increased latency, as the agent's cycles of reflection, tool use, and multi-step reasoning take more time than a standard, direct retrieval process. Furthermore, the agent itself can become a new source of error; a flawed reasoning process could cause it to get stuck in useless loops, misinterpret a task, or improperly discard relevant information, ultimately degrading the quality of the final response.

In summary: Agentic RAG represents a sophisticated evolution of the standard retrieval pattern, transforming it from a passive data pipeline into an active, problem-solving framework. By embedding a reasoning layer that can evaluate sources, reconcile conflicts, decompose complex questions, and use external tools, agents dramatically improve the reliability and depth of the generated answers. This advancement makes the AI more trustworthy and capable, though it comes with important trade-offs in system complexity, latency, and cost that must be carefully managed.

Practical Applications & Use Cases

Knowledge Retrieval (RAG) is changing how Large Language Models (LLMs) are utilized across various industries, enhancing their ability to provide more accurate and contextually relevant responses.

Applications include:

  • Enterprise Search and Q&A: Organizations can develop internal chatbots that respond to employee inquiries using internal documentation such as HR policies, technical manuals, and product specifications. The RAG system extracts relevant sections from these documents to inform the LLM's response.

  • Customer Support and Helpdesks: RAG-based systems can offer precise and consistent responses to customer queries by accessing information from product manuals, frequently asked questions (FAQs), and support tickets. This can reduce the need for direct human intervention for routine issues.

  • Personalized Content Recommendation: Instead of basic keyword matching, RAG can identify and retrieve content (articles, products) that is semantically related to a user's preferences or previous interactions, leading to more relevant recommendations.

  • News and Current Events Summarization: LLMs can be integrated with real-time news feeds. When prompted about a current event, the RAG system retrieves recent articles, allowing the LLM to produce an up-to-date summary.

By incorporating external knowledge, RAG extends the capabilities of LLMs beyond simple communication to function as knowledge processing systems.

Hands-On Code Example (ADK)

To illustrate the Knowledge Retrieval (RAG) pattern, let's look at three examples.

First, here is how to use Google Search to perform RAG and ground an LLM's output in search results. Since RAG involves accessing external information, the Google Search tool is a direct example of a built-in retrieval mechanism that can augment an LLM's knowledge.

# The built-in google_search tool gives the agent a live retrieval step,
# grounding its answers in current search results.
from google.adk.tools import google_search
from google.adk.agents import Agent

search_agent = Agent(
    name="research_assistant",
    model="gemini-2.0-flash-exp",
    instruction="You help users research topics. When asked, use the Google Search tool.",
    tools=[google_search]
)

Second, this section explains how to utilize Vertex AI RAG capabilities within the Google ADK. The code provided demonstrates the initialization of VertexAiRagMemoryService from the ADK. This allows for establishing a connection to a Google Cloud Vertex AI RAG Corpus. The service is configured by specifying the corpus resource name and optional parameters such as SIMILARITY_TOP_K and VECTOR_DISTANCE_THRESHOLD. These parameters influence the retrieval process. SIMILARITY_TOP_K defines the number of top similar results to be retrieved. VECTOR_DISTANCE_THRESHOLD sets a limit on the semantic distance for the retrieved results. This setup enables agents to perform scalable and persistent semantic knowledge retrieval from the designated RAG Corpus. The process effectively integrates Google Cloud's RAG functionalities into an ADK agent, thereby supporting the development of responses grounded in factual data.

# Import the necessary VertexAiRagMemoryService class from the google.adk.memory module.
from google.adk.memory import VertexAiRagMemoryService

RAG_CORPUS_RESOURCE_NAME = "projects/your-gcp-project-id/locations/us-central1/ragCorpora/your-corpus-id"

# Define an optional parameter for the number of top similar results to retrieve.
# This controls how many relevant document chunks the RAG service will return.
SIMILARITY_TOP_K = 5

# Define an optional parameter for the vector distance threshold.
# This threshold determines the maximum semantic distance allowed for retrieved results;
# results with a distance greater than this value might be filtered out.
VECTOR_DISTANCE_THRESHOLD = 0.7

# Initialize an instance of VertexAiRagMemoryService.
# This sets up the connection to your Vertex AI RAG Corpus.
# - rag_corpus: Specifies the unique identifier for your RAG Corpus.
# - similarity_top_k: Sets the maximum number of similar results to fetch.
# - vector_distance_threshold: Defines the similarity threshold for filtering results.
memory_service = VertexAiRagMemoryService(
    rag_corpus=RAG_CORPUS_RESOURCE_NAME,
    similarity_top_k=SIMILARITY_TOP_K,
    vector_distance_threshold=VECTOR_DISTANCE_THRESHOLD
)

Hands-On Code Example (LangChain)

Third, let's walk through a complete example using LangChain.

import os
import requests
from typing import List, Dict, Any, TypedDict
from langchain_community.document_loaders import TextLoader
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_community.embeddings import OpenAIEmbeddings
from langchain_community.vectorstores import Weaviate
from langchain_openai import ChatOpenAI
from langchain.text_splitter import CharacterTextSplitter
from langchain.schema.runnable import RunnablePassthrough
from langgraph.graph import StateGraph, END
import weaviate
from weaviate.embedded import EmbeddedOptions
import dotenv

# Load environment variables (e.g., OPENAI_API_KEY)
dotenv.load_dotenv()

# Set your OpenAI API key (ensure it's loaded from .env or set here)
# os.environ["OPENAI_API_KEY"] = "YOUR_OPENAI_API_KEY"

# --- 1. Data Preparation (Preprocessing) ---
# Load data (use the raw file URL so we download plain text rather than GitHub's HTML page)
url = "https://raw.githubusercontent.com/langchain-ai/langchain/master/docs/docs/how_to/state_of_the_union.txt"
res = requests.get(url)
with open("state_of_the_union.txt", "w") as f:
    f.write(res.text)

loader = TextLoader('./state_of_the_union.txt')
documents = loader.load()

# Chunk documents
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = text_splitter.split_documents(documents)

# Embed and store chunks in Weaviate
client = weaviate.Client(
    embedded_options = EmbeddedOptions()
)

vectorstore = Weaviate.from_documents(
    client = client,
    documents = chunks,
    embedding = OpenAIEmbeddings(),
    by_text = False
)

# Define the retriever
retriever = vectorstore.as_retriever()

# Initialize LLM
llm = ChatOpenAI(model_name="gpt-3.5-turbo", temperature=0)

# --- 2. Define the State for LangGraph ---
class RAGGraphState(TypedDict):
    question: str
    documents: List[Document]
    generation: str

# --- 3. Define the Nodes (Functions) ---
def retrieve_documents_node(state: RAGGraphState) -> RAGGraphState:
    """Retrieves documents based on the user's question."""
    question = state["question"]
    documents = retriever.invoke(question)
    return {"documents": documents, "question": question, "generation": ""}

def generate_response_node(state: RAGGraphState) -> RAGGraphState:
    """Generates a response using the LLM based on retrieved documents."""
    question = state["question"]
    documents = state["documents"]
    
    # Prompt template for grounded question answering
    template = """You are an assistant for question-answering tasks. Use the following pieces of retrieved context to answer the question. If you don't know the answer, just say that you don't know. Use three sentences maximum and keep the answer concise.
    Question: {question}
    Context: {context}
    Answer: """
    
    prompt = ChatPromptTemplate.from_template(template)
    
    # Format the context from the documents
    context = "\n\n".join([doc.page_content for doc in documents])
    
    # Create the RAG chain
    rag_chain = prompt | llm | StrOutputParser()
    
    # Invoke the chain
    generation = rag_chain.invoke({"context": context, "question": question})
    
    return {"question": question, "documents": documents, "generation": generation}

# --- 4. Build the LangGraph Graph ---
workflow = StateGraph(RAGGraphState)

# Add nodes
workflow.add_node("retrieve", retrieve_documents_node)
workflow.add_node("generate", generate_response_node)

# Set the entry point
workflow.set_entry_point("retrieve")

# Add edges (transitions)
workflow.add_edge("retrieve", "generate")
workflow.add_edge("generate", END)

# Compile the graph
app = workflow.compile()

# --- 5. Run the RAG Application ---
if __name__ == "__main__":
    print("\n--- Running RAG Query ---")
    query = "What did the president say about Justice Breyer"
    inputs = {"question": query}
    for s in app.stream(inputs):
        print(s)
    
    print("\n--- Running another RAG Query ---")
    query_2 = "What did the president say about the economy?"
    inputs_2 = {"question": query_2}
    for s in app.stream(inputs_2):
        print(s)

This Python code illustrates a Retrieval-Augmented Generation (RAG) pipeline implemented with LangChain and LangGraph. The process begins with the creation of a knowledge base derived from a text document, which is segmented into chunks and transformed into embeddings. These embeddings are then stored in a Weaviate vector store, facilitating efficient information retrieval. A StateGraph in LangGraph is utilized to manage the workflow between two key functions: retrieve_documents_node and generate_response_node. The retrieve_documents_node function queries the vector store to identify relevant document chunks based on the user's input. Subsequently, the generate_response_node function utilizes the retrieved information and a predefined prompt template to produce a response using an OpenAI Large Language Model (LLM). The app.stream method allows the execution of queries through the RAG pipeline, demonstrating the system's capacity to generate contextually relevant outputs.

At a Glance

What: LLMs possess impressive text generation abilities but are fundamentally limited by their training data. This knowledge is static, meaning it doesn't include real-time information or private, domain-specific data. Consequently, their responses can be outdated, inaccurate, or lack the specific context required for specialized tasks. This gap restricts their reliability for applications demanding current and factual answers.

Why: The Retrieval-Augmented Generation (RAG) pattern provides a standardized solution by connecting LLMs to external knowledge sources. When a query is received, the system first retrieves relevant information snippets from a specified knowledge base. These snippets are then appended to the original prompt, enriching it with timely and specific context. This augmented prompt is then sent to the LLM, enabling it to generate a response that is accurate, verifiable, and grounded in external data. This process effectively transforms the LLM from a closed-book reasoner into an open-book one, significantly enhancing its utility and trustworthiness.

Rule of thumb: Use this pattern when you need an LLM to answer questions or generate content based on specific, up-to-date, or proprietary information that was not part of its original training data. It is ideal for building Q&A systems over internal documents, customer support bots, and applications requiring verifiable, fact-based responses with citations.

Visual summary

Knowledge Retrieval pattern: an AI agent queries and retrieves information from structured databases.

Fig. 3: Knowledge Retrieval pattern: an AI agent finds and synthesizes information from the public internet in response to user queries.

Key Takeaways

  • Knowledge Retrieval (RAG) enhances LLMs by allowing them to access external, up-to-date, and specific information.

  • The process involves Retrieval (searching a knowledge base for relevant snippets) and Augmentation (adding these snippets to the LLM's prompt).

  • RAG helps LLMs overcome limitations like outdated training data, reduces "hallucinations," and enables domain-specific knowledge integration.

  • RAG allows for attributable answers, as the LLM's response is grounded in retrieved sources.

  • GraphRAG leverages a knowledge graph to understand the relationships between different pieces of information, allowing it to answer complex questions that require synthesizing data from multiple sources.

  • Agentic RAG moves beyond simple information retrieval by using an intelligent agent to actively reason about, validate, and refine external knowledge, ensuring a more accurate and reliable answer.

  • Practical applications span enterprise search, customer support, legal research, and personalized recommendations.

Conclusion

In conclusion, Retrieval-Augmented Generation (RAG) addresses the core limitation of a Large Language Model's static knowledge by connecting it to external, up-to-date data sources. The process works by first retrieving relevant information snippets and then augmenting the user's prompt, enabling the LLM to generate more accurate and contextually aware responses. This is made possible by foundational technologies like embeddings, semantic search, and vector databases, which find information based on meaning rather than just keywords. By grounding outputs in verifiable data, RAG significantly reduces factual errors and allows for the use of proprietary information, enhancing trust through citations.

An advanced evolution, Agentic RAG, introduces a reasoning layer that actively validates, reconciles, and synthesizes retrieved knowledge for even greater reliability. Similarly, specialized approaches like GraphRAG leverage knowledge graphs to navigate explicit data relationships, allowing the system to synthesize answers to highly complex, interconnected queries. This agent can resolve conflicting information, perform multi-step queries, and use external tools to find missing data. While these advanced methods add complexity and latency, they drastically improve the depth and trustworthiness of the final response. Practical applications for these patterns are already transforming industries, from enterprise search and customer support to personalized content delivery. Despite the challenges, RAG is a crucial pattern for making AI more knowledgeable, reliable, and useful. Ultimately, it transforms LLMs from closed-book conversationalists into powerful, open-book reasoning tools.

References

  1. Lewis, P., et al. (2020). Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks. arxiv.org/abs/2005.11…

  2. Google AI for Developers Documentation. Retrieval Augmented Generation. cloud.google.com/vertex-ai/g…

  3. Retrieval-Augmented Generation with Graphs (GraphRAG). arxiv.org/abs/2501.00…

  4. Monigatti, L. "Retrieval-Augmented Generation (RAG): From Theory to LangChain Implementation." https://medium.com/data-science/retrieval-augmented-generation-rag-from-theory-to-langchain-implementation-4e9bd5f6a4f2

  5. Google Cloud Vertex AI RAG Corpus. https://cloud.google.com/vertex-ai/generative-ai/docs/rag-engine/manage-your-rag-corpus#corpus-management