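Combining RAG with Agents is a core capability for enterprise AI applications: it pairs the accuracy of knowledge retrieval with the flexibility of agent decision-making. This lesson covers three integration patterns and how to use them to build a production-grade system.
Time: 35 minutes | Difficulty: ⭐⭐⭐⭐ | Week 3 Day 20
📋 Learning objectives
- Understand the three core patterns for combining RAG and Agents, and where each one fits
- Wrap RAG as an Agent tool and integrate it into an AI service
- Use an Agent to enhance RAG with query rewriting and multi-step retrieval
- Build a unified architecture with dynamic routing and decision-making
- Implement a production-grade knowledge base system and apply performance optimizations
- Understand the performance characteristics and cost trade-offs of each pattern
🔗 Three integration patterns
Combining RAG and Agents is not a matter of stacking features; the architecture has to be chosen to match the business scenario.
Pattern overview
1. RAG as an Agent tool (RAG as Tool)
- The Agent is in control; RAG is one tool among many
- The Agent decides when to call RAG and how to handle the retrieved results
- Best for: multi-purpose assistants that switch between several capabilities
2. Agent-enhanced RAG
- RAG is the core flow; the Agent improves retrieval quality
- The Agent performs query rewriting, result validation, and multi-round retrieval
- Best for: knowledge-intensive applications with high retrieval-quality requirements
3. Unified architecture
- RAG and Agent are tightly integrated behind a routing decision
- The system picks a strategy per request and adapts the flow
- Best for: complex enterprise scenarios that need a high degree of flexibility
Architecture evolution path
Simple RAG system
↓
RAG + Agent tooling (fast integration)
↓
Agent-enhanced RAG (higher quality)
↓
Unified architecture (production-grade system)
🛠️ Pattern 1: RAG as an Agent tool
Wrap the RAG system as one of the Agent's tools and let the Agent decide, based on user intent, whether to use knowledge retrieval.
Core implementation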
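import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.stream.Collectors;
import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;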
@Component
public class KnowledgeTools {
private final EmbeddingStore<TextSegment> embeddingStore;
private final EmbeddingModel embeddingModel;
public KnowledgeTools(
EmbeddingStore<TextSegment> embeddingStore,
EmbeddingModel embeddingModel
) {
this.embeddingStore = embeddingStore;
this.embeddingModel = embeddingModel;
}
@Tool("Search the enterprise knowledge base; use for questions about products and technical documentation")
public String searchKnowledgeBase(
@P("The user's search query") String query,
@P("Number of results to return, default 5") Integer maxResults
) {
if (maxResults == null) {
maxResults = 5;
}
// Embed the query
Embedding queryEmbedding = embeddingModel.embed(query).content();
// Vector search
EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(maxResults)
.minScore(0.7)
.build();
EmbeddingSearchResult<TextSegment> searchResult =
embeddingStore.search(searchRequest);
// Format the matches, keeping score and source so the Agent can cite them
List<String> contexts = searchResult.matches().stream()
.map(match -> {
TextSegment segment = match.embedded();
double score = match.score();
String source = segment.metadata().getString("source");
return String.format(
"[Relevance: %.2f | Source: %s]\n%s",
score, source, segment.text()
);
})
.collect(Collectors.toList());
if (contexts.isEmpty()) {
return "No relevant content found in the knowledge base";
}
return "Retrieved the following relevant information:\n\n" + String.join("\n\n---\n\n", contexts);
}
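@Tool("Search the case-study library for success stories in a specific scenario")
public String searchCaseStudies(
@P("Scenario description or industry keywords") String scenario
) {
// Retrieval logic specific to the case-study library
Embedding queryEmbedding = embeddingModel.embed(
"Case study: " + scenario
).content();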
EmbeddingSearchResult<TextSegment> results = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(queryEmbedding)
.maxResults(3)
.filter(metadataKey("type").isEqualTo("case_study"))
.build()
);
return formatCaseStudies(results);
}
private String formatCaseStudies(EmbeddingSearchResult<TextSegment> results) {
return results.matches().stream()
.map(match -> {
TextSegment segment = match.embedded();
String company = segment.metadata().getString("company");
String industry = segment.metadata().getString("industry");
return String.format(
"[%s - %s industry]\n%s",
company, industry, segment.text()
);
})
.collect(Collectors.joining("\n\n"));
}
}
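Because the LLM fills in tool arguments itself, `maxResults` can arrive as `null`, zero, or an unreasonably large number. The following is a minimal sketch of guarding such a parameter before it reaches the vector store. `ToolArgs` and the upper bound of 20 are assumptions made for illustration; only the default of 5 comes from the tool description above.

```java
/** Hypothetical helper (not part of LangChain4j): normalises an LLM-supplied result count. */
final class ToolArgs {
    // Default taken from the tool description ("default 5")
    static final int DEFAULT_MAX_RESULTS = 5;
    // Assumed cap; tune it to what your embedding store and prompt budget can handle
    static final int UPPER_BOUND = 20;

    private ToolArgs() {}

    static int clampMaxResults(Integer requested) {
        if (requested == null) {
            return DEFAULT_MAX_RESULTS;
        }
        // Keep the value inside [1, UPPER_BOUND]
        return Math.max(1, Math.min(UPPER_BOUND, requested));
    }
}
```

Inside `searchKnowledgeBase`, the null check could then become a single call such as `int limit = ToolArgs.clampMaxResults(maxResults);`.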
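Agent service integration
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import org.springframework.stereotype.Service;
public interface IntelligentAssistant {
@SystemMessage("""
You are an enterprise assistant that can:
1. Answer product and technical questions (using the knowledge base tool)
2. Provide case studies and best practices (using the case-study tool)
3. Perform data analysis and calculations
4. Handle general conversation
Important rules:
- When the user asks about specific product features or technical details, you must search the knowledge base
- When the user needs industry cases or success stories, search the case-study library
- Answer based on the retrieved results; if retrieval returns nothing, say so honestly
- Do not make up information that is not in the knowledge base
""")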
String chat(@UserMessage String userMessage);
}
@Service
public class AssistantService {
private final IntelligentAssistant assistant;
public AssistantService(
ChatLanguageModel chatModel,
KnowledgeTools knowledgeTools
) {
this.assistant = AiServices.builder(IntelligentAssistant.class)
.chatLanguageModel(chatModel)
.tools(knowledgeTools)
.chatMemory(MessageWindowChatMemory.withMaxMessages(10))
.build();
}
public String processQuery(String userQuery) {
return assistant.chat(userQuery);
}
}
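Hands-on example: multiple cooperating tools
@Component
public class EnhancedKnowledgeTools {
private final EmbeddingStore<TextSegment> embeddingStore;
private final EmbeddingModel embeddingModel;
public EnhancedKnowledgeTools(
EmbeddingStore<TextSegment> embeddingStore,
EmbeddingModel embeddingModel
) {
this.embeddingStore = embeddingStore;
this.embeddingModel = embeddingModel;
}
@Tool("Search the technical documentation")
public String searchDocs(String query) {
// Technical documentation retrieval
return performSearch(query, "technical_docs");
}
@Tool("Search the API reference")
public String searchApiReference(String apiName) {
// Retrieval restricted to API documentation
return performSearch(apiName, "api_reference");
}
@Tool("Check whether a product feature is available")
public String checkFeatureAvailability(
@P("Feature name") String feature,
@P("Product version") String version
) {
// Feature lookup that includes the version in the query
String query = String.format(
"the %s feature in version %s",
feature, version
);
return performSearch(query, "feature_matrix");
}
private String performSearch(String query, String category) {
// Shared retrieval implementation, filtered by metadata category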
Embedding embedding = embeddingModel.embed(query).content();
EmbeddingSearchResult<TextSegment> results = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(3)
.filter(metadataKey("category").isEqualTo(category))
.minScore(0.75)
.build()
);
return formatResults(results, category);
}
private String formatResults(
EmbeddingSearchResult<TextSegment> results,
String category
) {
if (results.matches().isEmpty()) {
return "No relevant information found in " + category;
}
return results.matches().stream()
.map(match -> match.embedded().text())
.collect(Collectors.joining("\n\n---\n\n"));
}
}
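Strengths and limitations
Strengths:
- Simple to implement and quick to integrate
- The Agent decides on its own when to use RAG
- Easy to add other tools and capabilities
- Suits a wide variety of tasks
Limitations:
- The Agent may misjudge whether retrieval is needed
- Retrieval quality cannot be tuned in depth
- The retrieval strategy is relatively fixed
🧠 Pattern 2: Agent-enhanced RAG
Use the Agent's reasoning to optimize the RAG retrieval flow, improving retrieval precision and answer quality.
Query-rewriting Agent
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
public interface QueryRewriteAgent {
@SystemMessage("""
You are a query optimization expert who rewrites user queries to improve retrieval.
Optimization strategy:
1. Extract the core keywords and concepts
2. Expand synonyms and related terms
3. Remove ambiguity and make the query intent explicit
4. Produce 3 optimized query variants
Return JSON in this format:
{
"original": "the original query",
"intent": "analysis of the query intent",
"rewritten": ["variant 1", "variant 2", "variant 3"]
}
""")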
String rewriteQuery(@UserMessage String originalQuery);
}
@Service
public class EnhancedRagService {
// Constructor injection is omitted for brevity. parseJson, QueryRewriteResponse
// and ScoredSegment are small application helpers that are not shown here.
private final QueryRewriteAgent queryRewriter;
private final EmbeddingStore<TextSegment> embeddingStore;
private final EmbeddingModel embeddingModel;
private final ChatLanguageModel chatModel;
public String query(String userQuery) {
// Step 1: rewrite the query
String rewriteResult = queryRewriter.rewriteQuery(userQuery);
QueryRewriteResponse response = parseJson(rewriteResult);
// Step 2: retrieve for every query variant in parallel
List<String> allQueries = new ArrayList<>();
allQueries.add(response.getOriginal());
allQueries.addAll(response.getRewritten());
List<TextSegment> allResults = allQueries.parallelStream()
.flatMap(query -> retrieveDocuments(query, 3).stream())
.distinct()
.collect(Collectors.toList());
// Step 3: rerank the merged results
List<TextSegment> rerankedResults = rerankResults(
userQuery,
allResults
);
// Step 4: generate the answer
String answer = generateAnswer(userQuery, rerankedResults);
// Step 5: validate the answer against the retrieved segments
return validateAnswer(answer, rerankedResults, userQuery);
}
private List<TextSegment> retrieveDocuments(String query, int maxResults) {
Embedding embedding = embeddingModel.embed(query).content();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(maxResults)
.minScore(0.7)
.build()
);
return result.matches().stream()
.map(match -> match.embedded())
.collect(Collectors.toList());
}
private List<TextSegment> rerankResults(
String query,
List<TextSegment> candidates
) {
// Rerank with a cross-encoder or an LLM (this version scores each candidate with the LLM)
return candidates.stream()
.map(segment -> {
double score = calculateRelevanceScore(query, segment);
return new ScoredSegment(segment, score);
})
.sorted(Comparator.comparingDouble(ScoredSegment::getScore).reversed())
.map(ScoredSegment::getSegment)
.limit(5)
.collect(Collectors.toList());
}
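private double calculateRelevanceScore(String query, TextSegment segment) {
// Ask the LLM to rate relevance; this costs one model call per candidate
String prompt = String.format(
"Rate how relevant the following document segment is to the query (0 to 1):\n\n" +
"Query: %s\n\nDocument: %s\n\nReturn only the numeric score.",
query, segment.text()
);
String scoreStr = chatModel.generate(prompt);
try {
return Double.parseDouble(scoreStr.trim());
} catch (NumberFormatException e) {
// The model did not return a bare number; treat the segment as irrelevant
return 0.0;
}
}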
private String generateAnswer(
String query,
List<TextSegment> contexts
) {
String contextStr = contexts.stream()
.map(TextSegment::text)
.collect(Collectors.joining("\n\n---\n\n"));
String prompt = String.format("""
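Answer the user's question based on the retrieved document segments below:
Question: %s
Document segments:
%s
Requirements:
1. Answer only from the provided document segments
2. If the documents are not sufficient to answer, say so explicitly
3. Quote specific document content to support your answer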
""",
query, contextStr
);
return chatModel.generate(prompt);
}
private String validateAnswer(
String answer,
List<TextSegment> contexts,
String query
) {
String validationPrompt = String.format("""
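Verify the accuracy of the following answer:
Original question: %s
Generated answer: %s
Reference documents: %s
Points to check:
1. Is the answer based on the provided documents?
2. Does it contain factual errors or excessive speculation?
3. Does it fully answer the question?
If the answer has problems, return a corrected answer.
If the answer is accurate, return "VALIDATED: " followed by the original answer.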
""",
query,
answer,
contexts.stream()
.map(TextSegment::text)
.collect(Collectors.joining("\n"))
);
String validationResult = chatModel.generate(validationPrompt);
if (validationResult.startsWith("VALIDATED:")) {
return validationResult.substring("VALIDATED:".length()).trim();
}
return validationResult;
}
}
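`calculateRelevanceScore` relies on the model replying with a bare number, but replies such as "Score: 0.8" or "0.8 (highly relevant)" are common in practice. Below is a minimal sketch of a more tolerant parser, assuming that the first number in the reply is the score and that anything outside [0, 1] should be clamped. `ScoreParser` and its `fallback` parameter are hypothetical names, not part of LangChain4j.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts a relevance score from a free-form LLM reply. */
final class ScoreParser {
    // Matches forms like "7", "0.85", "1." and ".9", optionally negative
    private static final Pattern NUMBER = Pattern.compile("-?(?:\\d+\\.?\\d*|\\.\\d+)");

    private ScoreParser() {}

    /** Returns the first number in the reply clamped to [0, 1], or fallback if there is none. */
    static double parseScore(String reply, double fallback) {
        if (reply == null) {
            return fallback;
        }
        Matcher matcher = NUMBER.matcher(reply);
        if (!matcher.find()) {
            return fallback;
        }
        double value = Double.parseDouble(matcher.group());
        return Math.max(0.0, Math.min(1.0, value));
    }
}
```

A call such as `ScoreParser.parseScore(chatModel.generate(prompt), 0.0)` would then rank unparseable replies last instead of failing the whole request.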
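Multi-step retrieval Agent
public interface MultiStepRetrievalAgent {
@SystemMessage("""
You are a multi-step retrieval planner. For complex questions you need to:
1. Break the question into sub-questions
2. Plan the retrieval steps
3. Identify the dependencies of each step
Return JSON in this format:
{
"complexity": "simple|medium|complex",
"steps": [
{
"step": 1,
"query": "sub-query",
"purpose": "purpose of this retrieval",
"depends_on": []
}
]
}
""")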
String planRetrieval(@UserMessage String complexQuery);
}
@Service
public class MultiStepRagService {
private final MultiStepRetrievalAgent planningAgent;
private final EmbeddingStore<TextSegment> embeddingStore;
private final EmbeddingModel embeddingModel;
private final ChatLanguageModel chatModel;
public String processComplexQuery(String userQuery) {
// Step 1: plan the retrieval strategy
String planJson = planningAgent.planRetrieval(userQuery);
RetrievalPlan plan = parseRetrievalPlan(planJson);
if ("simple".equals(plan.getComplexity())) {
// Simple query: retrieve directly
return performSimpleRetrieval(userQuery);
}
// Step 2: run the multi-step retrieval
Map<Integer, List<TextSegment>> stepResults = new HashMap<>();
Map<Integer, String> stepSummaries = new HashMap<>();
for (RetrievalStep step : plan.getSteps()) {
// Make sure the steps this one depends on have already run
waitForDependencies(step.getDependsOn(), stepResults);
// Retrieve for this step
List<TextSegment> results = retrieveForStep(
step,
stepSummaries
);
stepResults.put(step.getStep(), results);
// Summarize the step so later steps can use it as context
String summary = summarizeStepResults(
step.getQuery(),
results
);
stepSummaries.put(step.getStep(), summary);
}
// Step 3: combine all results
return synthesizeAnswer(userQuery, stepResults, stepSummaries);
}
private List<TextSegment> retrieveForStep(
RetrievalStep step,
Map<Integer, String> previousSummaries
) {
// If the step has dependencies, add their summaries to the query as context
String enhancedQuery = step.getQuery();
for (int dep : step.getDependsOn()) {
enhancedQuery += "\nContext: " + previousSummaries.get(dep);
}
Embedding embedding = embeddingModel.embed(enhancedQuery).content();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(5)
.minScore(0.7)
.build()
);
return result.matches().stream()
.map(match -> match.embedded())
.collect(Collectors.toList());
}
private String summarizeStepResults(
String query,
List<TextSegment> results
) {
String context = results.stream()
.map(TextSegment::text)
.limit(3)
.collect(Collectors.joining("\n\n"));
String prompt = String.format(
"Summarize the key information in the following document segments for the query \"%s\" (within 50 words):\n\n%s",
query, context
);
return chatModel.generate(prompt);
}
private String synthesizeAnswer(
String originalQuery,
Map<Integer, List<TextSegment>> allResults,
Map<Integer, String> summaries
) {
String allContext = allResults.values().stream()
.flatMap(List::stream)
.map(TextSegment::text)
.distinct()
.collect(Collectors.joining("\n\n---\n\n"));
String stepSummary = summaries.entrySet().stream()
.sorted(Map.Entry.comparingByKey())
.map(e -> "步骤" + e.getKey() + ": " + e.getValue())
.collect(Collectors.joining("\n"));
String prompt = String.format("""
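Using the results of the multi-step retrieval, answer the user's question:
Original question: %s
Summary of retrieval steps:
%s
All retrieved documents:
%s
Provide a complete and accurate answer.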
""",
originalQuery, stepSummary, allContext
);
return chatModel.generate(prompt);
}
private void waitForDependencies(
List<Integer> dependencies,
Map<Integer, List<TextSegment>> completedSteps
) {
// Steps run sequentially on one thread, so a missing dependency can never
// show up later; fail fast instead of sleeping in a loop forever.
for (int dep : dependencies) {
if (!completedSteps.containsKey(dep)) {
throw new IllegalStateException(
"Retrieval plan is out of order: step " + dep + " has not run yet"
);
}
}
}
private String performSimpleRetrieval(String query) {
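// Fast path for simple queries. retrieveDocuments and generateAnswer are the
// helpers shown in EnhancedRagService; extract them into a shared component.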
List<TextSegment> results = retrieveDocuments(query, 5);
return generateAnswer(query, results);
}
}
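The planner returns steps with `depends_on` lists, but nothing guarantees that the model emits them in an executable order. One way to handle this is to order the steps topologically before running them. The sketch below uses Kahn's algorithm over a map from step number to its dependencies; `StepOrdering` and `executionOrder` are hypothetical names, and the map stands in for the `RetrievalPlan` type, whose definition is not shown in this lesson.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper: orders retrieval steps so that dependencies always run first. */
final class StepOrdering {
    private StepOrdering() {}

    /** @param dependsOn step number -> step numbers it depends on */
    static List<Integer> executionOrder(Map<Integer, List<Integer>> dependsOn) {
        // Number of unmet dependencies per step (TreeMap gives a stable start order)
        Map<Integer, Integer> unmet = new TreeMap<>();
        // Reverse edges: step -> steps that are waiting for it
        Map<Integer, List<Integer>> dependents = new HashMap<>();
        for (Map.Entry<Integer, List<Integer>> entry : dependsOn.entrySet()) {
            unmet.put(entry.getKey(), entry.getValue().size());
            for (Integer dep : entry.getValue()) {
                if (!dependsOn.containsKey(dep)) {
                    throw new IllegalArgumentException(
                        "Step " + entry.getKey() + " depends on unknown step " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }
        // Start with every step that has no dependencies
        Deque<Integer> ready = new ArrayDeque<>();
        for (Map.Entry<Integer, Integer> entry : unmet.entrySet()) {
            if (entry.getValue() == 0) {
                ready.add(entry.getKey());
            }
        }
        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int step = ready.poll();
            order.add(step);
            for (Integer next : dependents.getOrDefault(step, Collections.emptyList())) {
                // A step becomes ready once its last dependency has been scheduled
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != dependsOn.size()) {
            throw new IllegalStateException("Cyclic dependencies in retrieval plan");
        }
        return order;
    }
}
```

Running the steps in the returned order means a dependency check inside the loop should never fail for a valid plan, and a cyclic plan is rejected before any retrieval call is made.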
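Self-reflection and iterative retrieval
public interface ReflectionAgent {
@SystemMessage("""
You are an answer quality evaluator. Evaluate the answer produced by the RAG system:
Evaluation dimensions:
1. Does the answer fully address the question?
2. Is it grounded in the retrieved documents, without hallucination?
3. Is the information detailed enough?
4. Is more retrieval needed?
Return JSON:
{
"quality_score": 0-10,
"is_sufficient": true/false,
"issues": ["issue 1", "issue 2"],
"next_query": "suggested next query if more information is needed"
}
""")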
String evaluateAnswer(
@UserMessage String evaluation
);
}
@Service
public class IterativeRagService {
private final ReflectionAgent reflectionAgent;
private final EnhancedRagService ragService;
private final ChatLanguageModel chatModel; // used by synthesizeWithSupplemental
private static final int MAX_ITERATIONS = 3;
public String queryWithReflection(String userQuery) {
// Run the initial RAG query once; later iterations refine this answer
// instead of discarding it and starting over.
String currentAnswer = ragService.query(userQuery);
for (int iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
// Evaluate the quality of the current answer
String evaluationContext = String.format("""
User question: %s
Current answer: %s
Iteration: %d/%d
""",
userQuery, currentAnswer, iteration + 1, MAX_ITERATIONS
);
String evalJson = reflectionAgent.evaluateAnswer(evaluationContext);
EvaluationResult eval = parseEvaluation(evalJson);
// Stop as soon as the answer is good enough
if (eval.isSufficient() || eval.getQualityScore() >= 8) {
return currentAnswer;
}
// No follow-up query suggested: there is nothing more to retrieve
if (eval.getNextQuery() == null || eval.getNextQuery().isEmpty()) {
break;
}
// Run a supplementary retrieval and merge it into the answer
String supplemental = ragService.query(eval.getNextQuery());
currentAnswer = synthesizeWithSupplemental(
userQuery,
currentAnswer,
supplemental,
eval.getIssues()
);
}
return currentAnswer;
}
private String synthesizeWithSupplemental(
String query,
String originalAnswer,
String supplemental,
List<String> issues
) {
String prompt = String.format("""
Improve the original answer:
User question: %s
Original answer: %s
Issues found: %s
Supplementary information: %s
Produce an improved, complete answer.
""",
query,
originalAnswer,
String.join("; ", issues),
supplemental
);
return chatModel.generate(prompt);
}
}
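🏗️ Pattern 3: Unified architecture
Integrate RAG and Agent tightly to build a system that adapts its processing strategy per request.
Intelligent router
import dev.langchain4j.service.AiServices;
public interface IntelligentRouter {
@SystemMessage("""
You are a routing decision maker that selects the best processing strategy for a user request.
Available strategies:
1. DIRECT_ANSWER - answer directly without retrieval (common-knowledge questions)
2. SIMPLE_RAG - simple RAG retrieval (single-fact lookups)
3. ENHANCED_RAG - Agent-enhanced RAG (needs in-depth retrieval)
4. AGENT_TOOL - Agent-led with RAG as a tool (multi-step tasks)
5. HYBRID - hybrid strategy (complex analysis tasks)
Return JSON:
{
"strategy": "strategy name",
"confidence": 0-1,
"reasoning": "why this strategy was chosen",
"parameters": {
"max_iterations": 3,
"retrieval_depth": "shallow|deep"
}
}
""")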
String routeRequest(@UserMessage String request);
}
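The router returns the strategy as free text inside JSON, so the value may come back in a different case, with hyphens, or as something that is not a known strategy at all. Here is a minimal sketch of mapping that text onto an enum with a safe fallback. `RouteStrategies` and its nested `Strategy` enum are hypothetical names that mirror the five strategies listed in the prompt; the lesson's own `RoutingDecision` type is not shown.

```java
import java.util.Locale;

/** Hypothetical helper: maps the router's free-text strategy onto a known enum value. */
final class RouteStrategies {
    enum Strategy { DIRECT_ANSWER, SIMPLE_RAG, ENHANCED_RAG, AGENT_TOOL, HYBRID }

    private RouteStrategies() {}

    static Strategy parse(String raw, Strategy fallback) {
        if (raw == null) {
            return fallback;
        }
        // Accept variants such as " enhanced-rag " or "Simple Rag"
        String normalized = raw.trim()
            .toUpperCase(Locale.ROOT)
            .replace('-', '_')
            .replace(' ', '_');
        try {
            return Strategy.valueOf(normalized);
        } catch (IllegalArgumentException e) {
            // Unknown strategy name: use the caller's fallback instead of failing the request
            return fallback;
        }
    }
}
```

Choosing `SIMPLE_RAG` as the fallback is one reasonable default, since it is cheap and still grounds the answer in retrieved content.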
@Service
public class UnifiedIntelligentSystem {
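private final IntelligentRouter router;
private final ChatLanguageModel directAnswerModel;
private final EnhancedRagService enhancedRag;
private final AssistantService agentService;
private final HybridService hybridService;
// Used by performSimpleRag below
private final EmbeddingStore<TextSegment> embeddingStore;
private final EmbeddingModel embeddingModel;
public IntelligentResponse processRequest(String userRequest) {
// Step 1: routing decision (parseRouting maps the JSON to a RoutingDecision; not shown)
String routeJson = router.routeRequest(userRequest);
RoutingDecision decision = parseRouting(routeJson);
// Step 2: execute the chosen strategy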
String answer;
Map<String, Object> metadata = new HashMap<>();
switch (decision.getStrategy()) {
case DIRECT_ANSWER:
answer = directAnswerModel.generate(userRequest);
metadata.put("strategy", "direct");
metadata.put("retrieval_used", false);
break;
case SIMPLE_RAG:
answer = performSimpleRag(userRequest);
metadata.put("strategy", "simple_rag");
metadata.put("retrieval_used", true);
break;
case ENHANCED_RAG:
answer = enhancedRag.query(userRequest);
metadata.put("strategy", "enhanced_rag");
metadata.put("retrieval_used", true);
metadata.put("enhancement", "query_rewrite,rerank,validate");
break;
case AGENT_TOOL:
answer = agentService.processQuery(userRequest);
metadata.put("strategy", "agent_tool");
metadata.put("agent_controlled", true);
break;
case HYBRID:
answer = hybridService.processHybrid(
userRequest,
decision.getParameters()
);
metadata.put("strategy", "hybrid");
metadata.put("multi_phase", true);
break;
default:
answer = "Unable to determine a processing strategy";
metadata.put("strategy", "error");
}
// Step 3: build the response
return IntelligentResponse.builder()
.answer(answer)
.strategy(decision.getStrategy().name())
.confidence(decision.getConfidence())
.reasoning(decision.getReasoning())
.metadata(metadata)
.build();
}
private String performSimpleRag(String query) {
// Fast RAG path
Embedding embedding = embeddingModel.embed(query).content();
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(3)
.minScore(0.75)
.build()
);
String context = result.matches().stream()
.map(match -> match.embedded().text())
.collect(Collectors.joining("\n\n"));
return directAnswerModel.generate(
"Answer the question using only the context below.\n\nQuestion: " + query + "\n\nContext:\n" + context
);
}
}
@Data
@Builder
class IntelligentResponse {
private String answer;
private String strategy;
private double confidence;
private String reasoning;
private Map<String, Object> metadata;
}
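Hybrid processing service
@Service
public class HybridService {
private final EnhancedRagService ragService;
private final ChatLanguageModel chatModel;
// AgentExecutor is an application-specific class that is not shown in this lesson
private final AgentExecutor agentExecutor;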
public String processHybrid(
String userRequest,
Map<String, Object> parameters
) {
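// Phase 1: RAG retrieval for background knowledge
String ragContext = ragService.query(userRequest);
// Phase 2: Agent analysis and reasoning
String analysisPrompt = String.format("""
Analyze the user request using the retrieved background knowledge below:
User request: %s
Background knowledge:
%s
Please provide:
1. Key information extracted
2. A breakdown of the problem
3. Any additional actions that may be needed
End with a final line that reads exactly TOOL_REQUIRED: yes or TOOL_REQUIRED: no.
""",
userRequest, ragContext
);
String analysis = chatModel.generate(analysisPrompt);
// Phase 3: run Agent tool calls when the analysis asks for them
if (requiresToolExecution(analysis)) {
return agentExecutor.executeWithContext(
userRequest,
ragContext,
analysis
);
}
// Phase 4: synthesize the final answer
String synthesisPrompt = String.format("""
Combine the information below into a complete answer for the user:
User request: %s
Retrieved knowledge: %s
Analysis: %s
Provide a detailed, accurate, and practical answer.
""",
userRequest, ragContext, analysis
);
return chatModel.generate(synthesisPrompt);
}
private boolean requiresToolExecution(String analysis) {
// The prompt always produces an "additional actions" section, so matching on
// that heading would always return true; check the explicit marker instead.
return analysis.toUpperCase(java.util.Locale.ROOT).contains("TOOL_REQUIRED: YES");
}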
}
Adaptive system
@Service
public class AdaptiveIntelligentSystem {
private final UnifiedIntelligentSystem intelligentSystem;
private final MetricsCollector metricsCollector;
public String process(String userRequest) {
// Execute the request
IntelligentResponse response = intelligentSystem.processRequest(
userRequest
);
// Record metrics
metricsCollector.record(
userRequest,
response.getStrategy(),
response.getConfidence()
);
// If the router's confidence is low, try a fallback strategy
if (response.getConfidence() < 0.6) {
response = tryAlternativeStrategy(userRequest, response);
}
return response.getAnswer();
}
private IntelligentResponse tryAlternativeStrategy(
String request,
IntelligentResponse originalResponse
) {
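// Pick the fallback based on the original strategy
// (executeEnhancedRag and executeHybrid are thin wrappers that are not shown)
String originalStrategy = originalResponse.getStrategy();
if ("SIMPLE_RAG".equals(originalStrategy)) {
// Escalate to enhanced RAG
return executeEnhancedRag(request);
} else if ("ENHANCED_RAG".equals(originalStrategy)) {
// Escalate to the hybrid strategy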
return executeHybrid(request);
}
return originalResponse;
}
}
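The fallback logic above is an escalation ladder: each low-confidence result moves one step up to a more expensive strategy, and the top of the ladder has nowhere further to go. A minimal sketch of that idea as data rather than an `if`/`else` chain follows. `EscalationLadder`, `Level`, and the threshold parameter are hypothetical; the order of the levels follows the `SIMPLE_RAG` to `ENHANCED_RAG` to `HYBRID` progression used in `tryAlternativeStrategy`.

```java
import java.util.Optional;

/** Hypothetical helper: encodes the fallback order SIMPLE_RAG -> ENHANCED_RAG -> HYBRID. */
final class EscalationLadder {
    // Declaration order is the escalation order, cheapest first
    enum Level { SIMPLE_RAG, ENHANCED_RAG, HYBRID }

    private EscalationLadder() {}

    /** Returns the next, more expensive level, or empty when already at the top. */
    static Optional<Level> next(Level current) {
        Level[] all = Level.values();
        int nextIndex = current.ordinal() + 1;
        return nextIndex < all.length ? Optional.of(all[nextIndex]) : Optional.empty();
    }

    /** Escalates one level when confidence is below the threshold; otherwise keeps the level. */
    static Level escalateIfNeeded(Level current, double confidence, double threshold) {
        if (confidence >= threshold) {
            return current;
        }
        return next(current).orElse(current);
    }
}
```

With the 0.6 threshold from `process`, `escalateIfNeeded(Level.SIMPLE_RAG, 0.4, 0.6)` moves to `ENHANCED_RAG`, while a request already at `HYBRID` stays where it is.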
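📊 Comparison of the three patterns
| Dimension | RAG as tool | Agent-enhanced RAG | Unified architecture |
|---|---|---|---|
| Implementation complexity | ⭐⭐ Simple | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ Complex |
| Flexibility | ⭐⭐⭐⭐ High | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high |
| Retrieval quality | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐⭐ High |
| Reasoning ability | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high |
| Performance overhead | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Very high |
| Token consumption | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Very high |
| Response speed | ⭐⭐⭐⭐ Fast | ⭐⭐⭐ Medium | ⭐⭐ Slow |
| Extensibility | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High |
Choosing a pattern by scenario
RAG as tool
- ✅ Multi-purpose assistants
- ✅ Several capabilities are needed (calculation, lookup, API calls, and so on)
- ✅ Knowledge retrieval is one feature among many
- ✅ Fast iteration and prototyping
- ❌ Very high retrieval-quality requirements
- ❌ Pure knowledge Q&A
Agent-enhanced RAG
- ✅ Knowledge-intensive applications
- ✅ Retrieval accuracy is critical
- ✅ Complex queries that need multi-step retrieval
- ✅ High answer-quality requirements
- ❌ Latency-sensitive workloads
- ❌ Limited cost budget
Unified architecture
- ✅ Complex enterprise applications
- ✅ A high degree of flexibility is required
- ✅ A wide variety of tasks
- ✅ Higher latency and cost are acceptable
- ❌ Simple Q&A
- ❌ Teams without the capacity to maintain it
🚀 Production walkthrough: an intelligent knowledge base system
A complete enterprise knowledge base implementation that combines the strengths of all three patterns.
Core architecture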
@Service
public class ProductionKnowledgeSystem {
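// SLF4J logger, used by handleError
private static final Logger logger = LoggerFactory.getLogger(ProductionKnowledgeSystem.class);
private final IntelligentRouter router;
private final RagEngine ragEngine;
private final AgentOrchestrator agentOrchestrator;
private final IntelligentCacheManager cacheManager;
private final MetricsService metricsService;
private final FallbackHandler fallbackHandler;
private final ChatLanguageModel chatModel; // used by performSemanticValidation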
public KnowledgeResponse query(KnowledgeRequest request) {
long startTime = System.currentTimeMillis();
String requestId = UUID.randomUUID().toString();
try {
// Step 1: check the cache
Optional<String> cachedAnswer = cacheManager.get(
request.getQuery()
);
if (cachedAnswer.isPresent()) {
metricsService.recordCacheHit(requestId);
return buildResponse(
cachedAnswer.get(),
"CACHE",
System.currentTimeMillis() - startTime
);
}
// Step 2: routing (parseRouting maps the router's JSON to a RoutingDecision; not shown)
RoutingDecision routing = parseRouting(
router.routeRequest(request.getQuery())
);
metricsService.recordRouting(requestId, routing);
// Step 3: execute the chosen strategy
String answer = executeStrategy(request, routing);
// Step 4: validate answer quality
ValidationResult validation = validateAnswer(
request,
answer
);
if (validation.isValid()) {
// Step 5: cache only answers that passed validation
cacheManager.put(request.getQuery(), answer);
} else {
// Degrade to the fallback handler and keep its output out of the cache
answer = fallbackHandler.handle(request, validation);
}
// Step 6: record metrics
long duration = System.currentTimeMillis() - startTime;
metricsService.recordSuccess(requestId, duration);
return buildResponse(answer, routing.getStrategy().name(), duration);
} catch (Exception e) {
metricsService.recordError(requestId, e);
return handleError(request, e);
}
}
private String executeStrategy(
KnowledgeRequest request,
RoutingDecision routing
) {
return switch (routing.getStrategy()) {
case SIMPLE_RAG -> ragEngine.simpleRetrieval(request);
case ENHANCED_RAG -> ragEngine.enhancedRetrieval(request);
case AGENT_TOOL -> agentOrchestrator.process(request);
case HYBRID -> executeHybridStrategy(request, routing);
default -> throw new IllegalStateException(
"Unknown strategy: " + routing.getStrategy()
);
};
}
private String executeHybridStrategy(
KnowledgeRequest request,
RoutingDecision routing
) {
// Run RAG and the Agent in parallel
CompletableFuture<String> ragFuture = CompletableFuture.supplyAsync(
() -> ragEngine.enhancedRetrieval(request)
);
CompletableFuture<String> agentFuture = CompletableFuture.supplyAsync(
() -> agentOrchestrator.analyze(request)
);
// Wait for both to finish
CompletableFuture.allOf(ragFuture, agentFuture).join();
try {
String ragResult = ragFuture.get();
String agentAnalysis = agentFuture.get();
// Combine the two results
return synthesizeResults(request, ragResult, agentAnalysis);
} catch (Exception e) {
throw new RuntimeException("Hybrid execution failed", e);
}
}
private ValidationResult validateAnswer(
KnowledgeRequest request,
String answer
) {
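// Validate answer quality
List<String> issues = new ArrayList<>();
// Check 1: plausible answer length
if (answer.length() < 20) {
issues.add("Answer too short");
}
// Check 2: heuristic check for a source reference
String lowerAnswer = answer.toLowerCase(java.util.Locale.ROOT);
if (!lowerAnswer.contains("according to") && !lowerAnswer.contains("document")) {
issues.add("Missing source reference");
}
// Check 3: semantic validation with the LLM
boolean semanticValid = performSemanticValidation(
request.getQuery(),
answer
);
if (!semanticValid) {
issues.add("Semantic consistency issue");
}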
}
return ValidationResult.builder()
.isValid(issues.isEmpty())
.issues(issues)
.confidence(calculateConfidence(answer, issues))
.build();
}
private boolean performSemanticValidation(String query, String answer) {
String prompt = String.format("""
Decide whether the following answer actually answers the question (return only true or false):
Question: %s
Answer: %s
""",
query, answer
);
String result = chatModel.generate(prompt).trim();
return "true".equalsIgnoreCase(result);
}
private KnowledgeResponse handleError(
KnowledgeRequest request,
Exception error
) {
logger.error("Failed to process request: " + request.getQuery(), error);
// Try the degraded fallback path
try {
String fallbackAnswer = fallbackHandler.generateFallback(request);
return KnowledgeResponse.builder()
.answer(fallbackAnswer)
.strategy("FALLBACK")
.success(false)
.error(error.getMessage())
.build();
} catch (Exception fallbackError) {
return KnowledgeResponse.builder()
.answer("Sorry, the system cannot process your request right now. Please try again later.")
.strategy("ERROR")
.success(false)
.error(error.getMessage())
.build();
}
}
}
Caching strategy
@Component
public class IntelligentCacheManager {
private final RedisTemplate<String, String> redisTemplate;
private final EmbeddingModel embeddingModel;
private final ChatLanguageModel chatModel;
private static final String CACHE_PREFIX = "knowledge_cache:";
private static final Duration DEFAULT_TTL = Duration.ofHours(24);
public Optional<String> get(String query) {
// Strategy 1: exact match
String exactKey = CACHE_PREFIX + hashQuery(query);
String exactMatch = redisTemplate.opsForValue().get(exactKey);
if (exactMatch != null) {
return Optional.of(exactMatch);
}
// Strategy 2: semantic similarity match
Optional<String> semanticMatch = findSemanticMatch(query);
if (semanticMatch.isPresent()) {
return semanticMatch;
}
return Optional.empty();
}
public void put(String query, String answer) {
String key = CACHE_PREFIX + hashQuery(query);
// Store the exact-match entry
redisTemplate.opsForValue().set(key, answer, DEFAULT_TTL);
// Store the semantic vector index (storeSemanticIndex is a helper that is not shown)
Embedding embedding = embeddingModel.embed(query).content();
storeSemanticIndex(query, embedding, key);
}
private Optional<String> findSemanticMatch(String query) {
Embedding queryEmbedding = embeddingModel.embed(query).content();
// Look up similar query vectors in Redis
List<SimilarQuery> similar = searchSimilarQueries(
queryEmbedding,
0.95 // high similarity threshold
);
if (similar.isEmpty()) {
return Optional.empty();
}
// Ask the LLM to confirm that the cached answer can be reused
SimilarQuery best = similar.get(0);
boolean canReuse = verifyQueryEquivalence(query, best.getQuery());
if (canReuse) {
String cachedAnswer = redisTemplate.opsForValue().get(
best.getCacheKey()
);
return Optional.ofNullable(cachedAnswer);
}
return Optional.empty();
}
private boolean verifyQueryEquivalence(
String newQuery,
String cachedQuery
) {
String prompt = String.format("""
Decide whether the following two questions are essentially the same (return only true or false):
Question 1: %s
Question 2: %s
""",
newQuery, cachedQuery
);
String result = chatModel.generate(prompt).trim();
return "true".equalsIgnoreCase(result);
}
private String hashQuery(String query) {
return DigestUtils.sha256Hex(query.toLowerCase().trim());
}
}
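`searchSimilarQueries` is not shown above, so the semantic lookup can be hard to picture. The sketch below illustrates the core idea in memory: store one vector per cached query, compute cosine similarity against a new query vector, and reuse the cached answer only when the best match reaches the threshold. `SemanticCacheSketch` is a hypothetical class written for illustration; a production system would use a vector index (for example in Redis) instead of a linear scan, and would still apply the LLM equivalence check described above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

/** Hypothetical in-memory sketch of a similarity-thresholded answer cache. */
final class SemanticCacheSketch {
    private final List<float[]> vectors = new ArrayList<>();
    private final List<String> answers = new ArrayList<>();
    private final double threshold;

    SemanticCacheSketch(double threshold) {
        this.threshold = threshold;
    }

    void put(float[] queryVector, String answer) {
        vectors.add(queryVector.clone());
        answers.add(answer);
    }

    /** Returns the answer of the most similar cached query if it reaches the threshold. */
    Optional<String> lookup(float[] queryVector) {
        double best = -1.0;
        int bestIndex = -1;
        for (int i = 0; i < vectors.size(); i++) {
            double similarity = cosine(queryVector, vectors.get(i));
            if (similarity > best) {
                best = similarity;
                bestIndex = i;
            }
        }
        if (bestIndex >= 0 && best >= threshold) {
            return Optional.of(answers.get(bestIndex));
        }
        return Optional.empty();
    }

    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0; // a zero vector has no direction; treat it as dissimilar
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

A threshold of 0.95, as in `findSemanticMatch`, keeps the cache conservative: near-duplicate phrasings hit, while merely related questions fall through to the full pipeline.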
Monitoring and metrics
@Service
public class MetricsService {
private final MeterRegistry meterRegistry;
public void recordRouting(String requestId, RoutingDecision routing) {
Counter.builder("knowledge.routing")
.tag("strategy", routing.getStrategy().name())
.tag("confidence", String.format("%.1f", routing.getConfidence()))
.register(meterRegistry)
.increment();
}
public void recordCacheHit(String requestId) {
Counter.builder("knowledge.cache")
.tag("result", "hit")
.register(meterRegistry)
.increment();
}
public void recordSuccess(String requestId, long durationMs) {
Timer.builder("knowledge.query.duration")
.tag("result", "success")
.register(meterRegistry)
.record(Duration.ofMillis(durationMs));
}
public void recordError(String requestId, Exception error) {
Counter.builder("knowledge.errors")
.tag("type", error.getClass().getSimpleName())
.register(meterRegistry)
.increment();
}
public MetricsSummary getSummary() {
return MetricsSummary.builder()
.totalQueries(getTotalQueries())
.cacheHitRate(getCacheHitRate())
.averageLatency(getAverageLatency())
.errorRate(getErrorRate())
.strategyDistribution(getStrategyDistribution())
.build();
}
}
⚡ Performance optimization
Parallel retrieval
@Service
public class ParallelRetrievalService {
private final ExecutorService executorService;
private final List<EmbeddingStore<TextSegment>> shardedStores;
private final EmbeddingModel embeddingModel;
public List<TextSegment> parallelRetrieve(
String query,
int maxResults
) {
Embedding queryEmbedding = embeddingModel.embed(query).content();
// Query all shards in parallel
List<CompletableFuture<List<EmbeddingMatch<TextSegment>>>> futures =
shardedStores.stream()
.map(store -> CompletableFuture.supplyAsync(
() -> retrieveFromStore(store, queryEmbedding, maxResults),
executorService
))
.collect(Collectors.toList());
// Wait for every shard to finish
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
.join();
// Merge and sort by similarity score. The score lives on EmbeddingMatch,
// not on TextSegment, so the matches are kept until after sorting.
return futures.stream()
.map(CompletableFuture::join)
.flatMap(List::stream)
.sorted(Comparator.comparingDouble(
(EmbeddingMatch<TextSegment> match) -> match.score()).reversed())
.limit(maxResults)
.map(EmbeddingMatch::embedded)
.collect(Collectors.toList());
}
private List<EmbeddingMatch<TextSegment>> retrieveFromStore(
EmbeddingStore<TextSegment> store,
Embedding embedding,
int maxResults
) {
EmbeddingSearchResult<TextSegment> result = store.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(maxResults)
.minScore(0.7)
.build()
);
return result.matches();
}
}
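When there are many shards, sorting every returned match just to keep the top few does more work than needed. One alternative is a bounded min-heap that never holds more than `k` items. The sketch below shows that merge on a plain `Hit` type, a hypothetical stand-in for a scored match, so that it runs without any retrieval library.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.PriorityQueue;

/** Hypothetical helper: keeps the k best-scoring hits across all shard results. */
final class ShardMerge {
    /** Minimal stand-in for a scored match. */
    static final class Hit {
        final String text;
        final double score;

        Hit(String text, double score) {
            this.text = text;
            this.score = score;
        }
    }

    private ShardMerge() {}

    static List<Hit> topK(List<List<Hit>> shardResults, int k) {
        if (k <= 0) {
            return new ArrayList<>();
        }
        // Min-heap on score: the head is always the weakest hit kept so far
        PriorityQueue<Hit> heap =
            new PriorityQueue<>(Comparator.comparingDouble((Hit h) -> h.score));
        for (List<Hit> shard : shardResults) {
            for (Hit hit : shard) {
                heap.offer(hit);
                if (heap.size() > k) {
                    heap.poll(); // evict the current lowest score
                }
            }
        }
        // Only the k survivors are sorted, best first
        List<Hit> merged = new ArrayList<>(heap);
        merged.sort(Comparator.comparingDouble((Hit h) -> h.score).reversed());
        return merged;
    }
}
```

For a handful of shards with a few results each, the plain sort is just as good; the heap mainly pays off when the number of candidates is much larger than `k`.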
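Asynchronous processing
@Service
public class AsyncKnowledgeService {
private final UnifiedIntelligentSystem intelligentSystem;
// @Async only takes effect when @EnableAsync is present on a configuration class
@Async("knowledgeTaskExecutor")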
public CompletableFuture<String> queryAsync(String userQuery) {
try {
String result = intelligentSystem.processRequest(userQuery)
.getAnswer();
return CompletableFuture.completedFuture(result);
} catch (Exception e) {
return CompletableFuture.failedFuture(e);
}
}
// In a real project, declare this bean in a @Configuration class rather than in the service
@Bean(name = "knowledgeTaskExecutor")
public Executor knowledgeTaskExecutor() {
ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
executor.setCorePoolSize(10);
executor.setMaxPoolSize(50);
executor.setQueueCapacity(100);
executor.setThreadNamePrefix("knowledge-async-");
executor.setRejectedExecutionHandler(
new ThreadPoolExecutor.CallerRunsPolicy()
);
executor.initialize();
return executor;
}
}
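Asynchronous execution keeps request threads free, but a slow LLM call can still leave the caller waiting indefinitely. A minimal sketch of bounding the wait with `CompletableFuture.completeOnTimeout` (available since Java 9) is shown below. `TimedQuery`, the `Supplier` parameter, and the fallback text are assumptions for illustration; note that the underlying task is not cancelled when the timeout fires, it is only ignored.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: runs a query asynchronously and bounds how long the caller waits. */
final class TimedQuery {
    private TimedQuery() {}

    static String queryWithTimeout(Supplier<String> query, long timeoutMs, String fallback) {
        return CompletableFuture.supplyAsync(query)
            // If no result arrives in time, complete the future with the fallback value
            .completeOnTimeout(fallback, timeoutMs, TimeUnit.MILLISECONDS)
            // If the query itself throws, answer with the fallback as well
            .exceptionally(error -> fallback)
            .join();
    }
}
```

In the service above, the supplier could be something like `() -> intelligentSystem.processRequest(userQuery).getAnswer()`, with the fallback being a short "please retry" message.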
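Batch processing optimization
@Service
public class BatchProcessingService {
public List<String> batchQuery(List<String> queries) {
// Embed all queries in one batch call; embedAll expects TextSegment inputs
List<TextSegment> segments = queries.stream()
.map(TextSegment::from)
.collect(Collectors.toList());
List<Embedding> embeddings = embeddingModel.embedAll(segments)
.content();
// Retrieve in parallel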
List<CompletableFuture<String>> futures =
IntStream.range(0, queries.size())
.mapToObj(i -> CompletableFuture.supplyAsync(
() -> processWithEmbedding(
queries.get(i),
embeddings.get(i)
)
))
.collect(Collectors.toList());
// Wait for all queries to complete
return futures.stream()
.map(CompletableFuture::join)
.collect(Collectors.toList());
}
private String processWithEmbedding(String query, Embedding embedding) {
// Retrieve with the precomputed embedding
EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
EmbeddingSearchRequest.builder()
.queryEmbedding(embedding)
.maxResults(5)
.build()
);
return generateAnswer(query, result);
}
}
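💡 Hands-on exercises
Exercise 1: Implement a query intent classifier
Create an Agent that classifies the intent of a user query (fact lookup, procedure lookup, comparative analysis, and so on) and selects a retrieval strategy based on that intent.
Requirements:
- Support at least 5 query intents
- Design dedicated retrieval logic for each intent
- Implement a confidence estimate
Hint: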
public interface QueryIntentClassifier {
@SystemMessage("...")
String classifyIntent(@UserMessage String query);
}
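Exercise 2: Build a multi-turn conversational RAG system
Implement a RAG system that supports multi-turn conversation and can:
- Understand context and resolve references
- Use the conversation history to improve retrieval
- Track the conversation topic
Hints:
- Use ChatMemory to store the conversation history
- Implement context-aware query rewriting
- Maintain a topic vector for the conversation
Exercise 3: Implement adaptive retrieval depth
Adjust the retrieval depth (number of retrieval rounds and results per round) dynamically, based on question complexity and the quality of the results retrieved so far.
Hint: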
public class AdaptiveDepthRetriever {
public String retrieve(String query) {
int depth = estimateRequiredDepth(query);
// Implement the adaptive logic here
throw new UnsupportedOperationException("TODO: exercise 3");
}
}
Last updated: 2026-03-09 | Length: about 5,200 characters in the original Chinese | Estimated reading time: 35 minutes