20 - Best Practices for Combining RAG and Agents

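Combining RAG with agents is a core capability for enterprise AI applications: it pairs the accuracy of knowledge retrieval with the flexibility of agent decision-making. This lesson covers three integration patterns and shows how to use them to build a production-grade system.

Time: 35 minutes | Difficulty: ⭐⭐⭐⭐ | Week 3 Day 20


📋 Learning objectives

  • Understand the three core patterns for combining RAG and agents, and when each applies
  • Wrap RAG as an agent tool and integrate it cleanly
  • Use an agent to improve RAG through query rewriting and multi-step retrieval
  • Build a unified architecture with dynamic routing and decision-making
  • Implement a production-grade intelligent knowledge base and learn performance tuning techniques
  • Understand the performance characteristics and cost trade-offs of each pattern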

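🔗 Three integration patterns

Combining RAG and agents is not a matter of stacking features together; the architecture pattern should be chosen to fit the business scenario.

Pattern overview

1. RAG as an agent tool (RAG as Tool)

  • The agent is in control; RAG is one tool among many
  • The agent decides when to call RAG and how to handle the retrieved results
  • Best for: multi-purpose assistants that need to switch between capabilities

2. Agent-enhanced RAG

  • RAG is the core flow; the agent improves retrieval quality
  • The agent rewrites queries, validates results, and runs multi-round retrieval
  • Best for: knowledge-intensive applications with high retrieval-quality requirements

3. Unified architecture

  • RAG and agents are deeply integrated, with intelligent routing decisions
  • The system selects a strategy per request and adapts the flow
  • Best for: complex enterprise scenarios that need a high degree of flexibility

Architecture evolution path

Simple RAG system
    ↓
RAG as an agent tool (quick integration)
    ↓
Agent-enhanced RAG (higher quality)
    ↓
Unified architecture (production-grade system)

🛠️ Pattern 1: RAG as an agent tool

Wrap the RAG system as one of the agent's tools and let the agent decide, based on the user's intent, whether to use knowledge retrieval.

Core implementation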

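import dev.langchain4j.agent.tool.P;
import dev.langchain4j.agent.tool.Tool;
import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import org.springframework.stereotype.Component;
import java.util.List;
import java.util.stream.Collectors;

import static dev.langchain4j.store.embedding.filter.MetadataFilterBuilder.metadataKey;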

@Component
public class KnowledgeTools {

    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;

    public KnowledgeTools(
        EmbeddingStore<TextSegment> embeddingStore,
        EmbeddingModel embeddingModel
    ) {
        this.embeddingStore = embeddingStore;
        this.embeddingModel = embeddingModel;
    }

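    @Tool("Search the enterprise knowledge base for information about products and technical documentation")
    public String searchKnowledgeBase(
        @P("The user's search query") String query,
        @P("Number of results to return; defaults to 5") Integer maxResults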
    ) {
        if (maxResults == null) {
            maxResults = 5;
        }

        // Embed the query
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        // Vector search
        EmbeddingSearchRequest searchRequest = EmbeddingSearchRequest.builder()
            .queryEmbedding(queryEmbedding)
            .maxResults(maxResults)
            .minScore(0.7)
            .build();

        EmbeddingSearchResult<TextSegment> searchResult =
            embeddingStore.search(searchRequest);

        // Format the search results
        List<String> contexts = searchResult.matches().stream()
            .map(match -> {
                TextSegment segment = match.embedded();
                double score = match.score();
                String source = segment.metadata().getString("source");
                return String.format(
                    "[Relevance: %.2f | Source: %s]\n%s",
                    score, source, segment.text()
                );
            })
            .collect(Collectors.toList());

        if (contexts.isEmpty()) {
            return "No relevant content found in the knowledge base";
        }

        return "Retrieved the following relevant information:\n\n" + String.join("\n\n---\n\n", contexts);
    }

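    @Tool("Search the product case-study library for success stories in a given scenario")
    public String searchCaseStudies(
        @P("Scenario description or industry keywords") String scenario
    ) {
        // Retrieval logic specific to the case-study library
        Embedding queryEmbedding = embeddingModel.embed(
            "Case study: " + scenario
        ).content();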

        EmbeddingSearchResult<TextSegment> results = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(queryEmbedding)
                .maxResults(3)
                .filter(metadataKey("type").isEqualTo("case_study"))
                .build()
        );

        return formatCaseStudies(results);
    }

    private String formatCaseStudies(EmbeddingSearchResult<TextSegment> results) {
        return results.matches().stream()
            .map(match -> {
                TextSegment segment = match.embedded();
                String company = segment.metadata().getString("company");
                String industry = segment.metadata().getString("industry");
                return String.format(
                    "[%s - %s industry]\n%s",
                    company, industry, segment.text()
                );
            })
            .collect(Collectors.joining("\n\n"));
    }
}
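
Because the model fills in the tool arguments, `maxResults` can arrive as null, zero, or an unreasonably large number. Below is a minimal sketch of normalizing it before the search, in plain Java. The `ToolArgs` helper and the upper bound of 20 are illustrative assumptions, not part of LangChain4j or of the code above.

```java
/** Hypothetical helper: normalizes a result-count argument supplied by the model. */
final class ToolArgs {

    static final int DEFAULT_MAX_RESULTS = 5;  // the default stated in the tool description
    static final int UPPER_BOUND = 20;         // assumed cap; tune per deployment

    private ToolArgs() {}

    /** Returns the default for null or non-positive input, otherwise caps the value. */
    static int normalizeMaxResults(Integer requested) {
        if (requested == null || requested < 1) {
            return DEFAULT_MAX_RESULTS;
        }
        return Math.min(requested, UPPER_BOUND);
    }
}
```

Inside `searchKnowledgeBase`, the null check could then become a single call such as `int limit = ToolArgs.normalizeMaxResults(maxResults);`.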

Integrating the agent service

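import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.service.AiServices;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import org.springframework.stereotype.Service;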

public interface IntelligentAssistant {

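    @SystemMessage("""
        You are an enterprise assistant. You can:
        1. Answer product and technical questions (using the knowledge base tool)
        2. Provide case studies and best practices (using the case-study tool)
        3. Perform data analysis and calculations
        4. Handle general conversation

        Important rules:
        - When the user asks about specific product features or technical details, you must search the knowledge base
        - When the user needs industry cases or success stories, search the case-study library
        - Base your answers on the retrieved results; if retrieval returns nothing, say so honestly
        - Do not make up information that is not in the knowledge base
        """)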
    String chat(@UserMessage String userMessage);
}

@Service
public class AssistantService {

    private final IntelligentAssistant assistant;

    public AssistantService(
        ChatLanguageModel chatModel,
        KnowledgeTools knowledgeTools
    ) {
        this.assistant = AiServices.builder(IntelligentAssistant.class)
            .chatLanguageModel(chatModel)
            .tools(knowledgeTools)
            .chatMemory(MessageWindowChatMemory.withMaxMessages(10))
            .build();
    }

    public String processQuery(String userQuery) {
        return assistant.chat(userQuery);
    }
}

Worked example: several tools working together

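@Component
public class EnhancedKnowledgeTools {

    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;

    public EnhancedKnowledgeTools(
        EmbeddingStore<TextSegment> embeddingStore,
        EmbeddingModel embeddingModel
    ) {
        this.embeddingStore = embeddingStore;
        this.embeddingModel = embeddingModel;
    }

    @Tool("Search the technical documentation")
    public String searchDocs(String query) {
        // Technical documentation retrieval
        return performSearch(query, "technical_docs");
    }

    @Tool("Search the API reference")
    public String searchApiReference(String apiName) {
        // Retrieval restricted to the API reference
        return performSearch(apiName, "api_reference");
    }

    @Tool("Check whether a product feature is available")
    public String checkFeatureAvailability(
        @P("Feature name") String feature,
        @P("Product version") String version
    ) {
        // Feature lookup that includes the version
        String query = String.format(
            "The %s feature in version %s",
            feature, version
        );
        return performSearch(query, "feature_matrix");
    }

    private String performSearch(String query, String category) {
        // Shared retrieval implementation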
        Embedding embedding = embeddingModel.embed(query).content();

        EmbeddingSearchResult<TextSegment> results = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(3)
                .filter(metadataKey("category").isEqualTo(category))
                .minScore(0.75)
                .build()
        );

        return formatResults(results, category);
    }

    private String formatResults(
        EmbeddingSearchResult<TextSegment> results,
        String category
    ) {
        if (results.matches().isEmpty()) {
            return "No relevant information found in " + category;
        }

        return results.matches().stream()
            .map(match -> match.embedded().text())
            .collect(Collectors.joining("\n\n---\n\n"));
    }
}

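Strengths and limitations

Strengths:

  • Simple to implement and quick to integrate
  • The agent decides on its own when to use RAG
  • Easy to add other tools and capabilities
  • Suits varied task scenarios

Limitations:

  • The agent may misjudge whether retrieval is needed
  • Little room to optimize retrieval quality in depth
  • The retrieval strategy is relatively fixed

🧠 Pattern 2: Agent-enhanced RAG

This pattern uses the agent's reasoning to optimize the RAG retrieval flow, improving retrieval precision and answer quality.

Query-rewriting agent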

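import dev.langchain4j.data.embedding.Embedding;
import dev.langchain4j.data.segment.TextSegment;
import dev.langchain4j.model.chat.ChatLanguageModel;
import dev.langchain4j.model.embedding.EmbeddingModel;
import dev.langchain4j.service.SystemMessage;
import dev.langchain4j.service.UserMessage;
import dev.langchain4j.store.embedding.EmbeddingSearchRequest;
import dev.langchain4j.store.embedding.EmbeddingSearchResult;
import dev.langchain4j.store.embedding.EmbeddingStore;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;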

public interface QueryRewriteAgent {

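    @SystemMessage("""
        You are a query optimization expert. Rewrite user queries to improve retrieval.

        Optimization strategies:
        1. Extract the core keywords and concepts
        2. Expand with synonyms and related terms
        3. Resolve ambiguity and make the query intent explicit
        4. Produce 3 optimized query variants

        Return JSON:
        {
            "original": "the original query",
            "intent": "analysis of the query intent",
            "rewritten": ["variant 1", "variant 2", "variant 3"]
        }
        """)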
    String rewriteQuery(@UserMessage String originalQuery);
}

@Service
@RequiredArgsConstructor
public class EnhancedRagService {

    private final QueryRewriteAgent queryRewriter;
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;
    private final ChatLanguageModel chatModel;

    public String query(String userQuery) {
        // Step 1: rewrite the query
        String rewriteResult = queryRewriter.rewriteQuery(userQuery);
        QueryRewriteResponse response = parseJson(rewriteResult);

        // Step 2: retrieve for the original query and every variant in parallel
        List<String> allQueries = new ArrayList<>();
        allQueries.add(response.getOriginal());
        allQueries.addAll(response.getRewritten());

        List<TextSegment> allResults = allQueries.parallelStream()
            .flatMap(query -> retrieveDocuments(query, 3).stream())
            .distinct()
            .collect(Collectors.toList());

        // Step 3: rerank the merged results
        List<TextSegment> rerankedResults = rerankResults(
            userQuery,
            allResults
        );

        // Step 4: generate the answer
        String answer = generateAnswer(userQuery, rerankedResults);

        // Step 5: validate the answer
        return validateAnswer(answer, rerankedResults, userQuery);
    }

    private List<TextSegment> retrieveDocuments(String query, int maxResults) {
        Embedding embedding = embeddingModel.embed(query).content();

        EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(maxResults)
                .minScore(0.7)
                .build()
        );

        return result.matches().stream()
            .map(match -> match.embedded())
            .collect(Collectors.toList());
    }

    private List<TextSegment> rerankResults(
        String query,
        List<TextSegment> candidates
    ) {
        // Rerank with a cross-encoder or an LLM (an LLM is used here)
        return candidates.stream()
            .map(segment -> {
                double score = calculateRelevanceScore(query, segment);
                return new ScoredSegment(segment, score);
            })
            .sorted(Comparator.comparingDouble(ScoredSegment::getScore).reversed())
            .map(ScoredSegment::getSegment)
            .limit(5)
            .collect(Collectors.toList());
    }

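    private double calculateRelevanceScore(String query, TextSegment segment) {
        // Ask the LLM to rate relevance
        String prompt = String.format(
            "Rate the relevance of the following document segment to the query (0 to 1):\n\n" +
            "Query: %s\n\nDocument: %s\n\nReturn only the numeric score.",
            query, segment.text()
        );

        String scoreStr = chatModel.generate(prompt);
        try {
            return Double.parseDouble(scoreStr.trim());
        } catch (NumberFormatException e) {
            // The model returned something other than a bare number; treat it as irrelevant
            return 0.0;
        }
    }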

    private String generateAnswer(
        String query,
        List<TextSegment> contexts
    ) {
        String contextStr = contexts.stream()
            .map(TextSegment::text)
            .collect(Collectors.joining("\n\n---\n\n"));

        String prompt = String.format("""
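            Answer the user's question based on the retrieved document segments below:

            Question: %s

            Document segments:
            %s

            Requirements:
            1. Answer only from the provided document segments
            2. If the documents are insufficient to answer, say so explicitly
            3. Quote specific document content to support your answer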
            """,
            query, contextStr
        );

        return chatModel.generate(prompt);
    }

    private String validateAnswer(
        String answer,
        List<TextSegment> contexts,
        String query
    ) {
        String validationPrompt = String.format("""
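            Verify the accuracy of the following answer:

            Original question: %s

            Generated answer: %s

            Reference documents: %s

            Checks:
            1. Is the answer grounded in the provided documents?
            2. Does it contain factual errors or excessive speculation?
            3. Does it fully answer the question?

            If the answer has problems, return a corrected answer.
            If the answer is accurate, return "VALIDATED: " followed by the original answer.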
            """,
            query,
            answer,
            contexts.stream()
                .map(TextSegment::text)
                .collect(Collectors.joining("\n"))
        );

        String validationResult = chatModel.generate(validationPrompt);

        if (validationResult.startsWith("VALIDATED:")) {
            return validationResult.substring("VALIDATED:".length()).trim();
        }

        return validationResult;
    }
}
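
`calculateRelevanceScore` relies on the model returning a bare number, but `Double.parseDouble` cannot handle a reply such as "Score: 0.8". Below is a minimal sketch of a more tolerant parser in plain Java: it takes the first number found in the reply and clamps it to the 0 to 1 range the prompt asks for. The `ScoreParser` class and its fallback parameter are hypothetical names introduced for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: extracts a relevance score from a free-form model reply. */
final class ScoreParser {

    // First integer or decimal number in the text, with an optional minus sign
    private static final Pattern NUMBER = Pattern.compile("-?\\d+(?:\\.\\d+)?");

    private ScoreParser() {}

    /** Returns the first number in the reply clamped to [0, 1], or the fallback if none is found. */
    static double parseScore(String modelReply, double fallback) {
        if (modelReply == null) {
            return fallback;
        }
        Matcher matcher = NUMBER.matcher(modelReply);
        if (!matcher.find()) {
            return fallback;
        }
        double value = Double.parseDouble(matcher.group());
        return Math.max(0.0, Math.min(1.0, value));
    }
}
```

Choosing 0.0 as the fallback pushes unparseable replies to the bottom of the ranking, which is usually safer than letting one malformed reply abort the whole rerank.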

Multi-step retrieval agent

public interface MultiStepRetrievalAgent {

    @SystemMessage("""
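        You are a multi-step retrieval planner. For a complex question, you need to:
        1. Break the question into sub-questions
        2. Plan the retrieval steps
        3. Determine the dependencies between steps

        Return JSON:
        {
            "complexity": "simple|medium|complex",
            "steps": [
                {
                    "step": 1,
                    "query": "sub-query",
                    "purpose": "purpose of this retrieval",
                    "depends_on": []
                }
            ]
        }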
        """)
    String planRetrieval(@UserMessage String complexQuery);
}

@Service
@RequiredArgsConstructor
public class MultiStepRagService {

    private final MultiStepRetrievalAgent planningAgent;
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;
    private final ChatLanguageModel chatModel;

    public String processComplexQuery(String userQuery) {
        // Step 1: plan the retrieval strategy
        String planJson = planningAgent.planRetrieval(userQuery);
        RetrievalPlan plan = parseRetrievalPlan(planJson);

        if ("simple".equals(plan.getComplexity())) {
            // Simple query: retrieve directly
            return performSimpleRetrieval(userQuery);
        }

        // Step 2: run the multi-step retrieval
        Map<Integer, List<TextSegment>> stepResults = new HashMap<>();
        Map<Integer, String> stepSummaries = new HashMap<>();

        for (RetrievalStep step : plan.getSteps()) {
            // Check that the steps this one depends on have completed
            waitForDependencies(step.getDependsOn(), stepResults);

            // Run the retrieval
            List<TextSegment> results = retrieveForStep(
                step,
                stepSummaries
            );
            stepResults.put(step.getStep(), results);

            // Summarize the step
            String summary = summarizeStepResults(
                step.getQuery(),
                results
            );
            stepSummaries.put(step.getStep(), summary);
        }

        // Step 3: combine all results
        return synthesizeAnswer(userQuery, stepResults, stepSummaries);
    }

    private List<TextSegment> retrieveForStep(
        RetrievalStep step,
        Map<Integer, String> previousSummaries
    ) {
        // If the step has dependencies, append the summaries of earlier results as context
        String enhancedQuery = step.getQuery();
        for (int dep : step.getDependsOn()) {
            enhancedQuery += "\nContext: " + previousSummaries.get(dep);
        }

        Embedding embedding = embeddingModel.embed(enhancedQuery).content();

        EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(5)
                .minScore(0.7)
                .build()
        );

        return result.matches().stream()
            .map(match -> match.embedded())
            .collect(Collectors.toList());
    }

    private String summarizeStepResults(
        String query,
        List<TextSegment> results
    ) {
        String context = results.stream()
            .map(TextSegment::text)
            .limit(3)
            .collect(Collectors.joining("\n\n"));

        String prompt = String.format(
            "Summarize the key information in the following document segments for the query \"%s\" (under 50 words):\n\n%s",
            query, context
        );

        return chatModel.generate(prompt);
    }

    private String synthesizeAnswer(
        String originalQuery,
        Map<Integer, List<TextSegment>> allResults,
        Map<Integer, String> summaries
    ) {
        String allContext = allResults.values().stream()
            .flatMap(List::stream)
            .map(TextSegment::text)
            .distinct()
            .collect(Collectors.joining("\n\n---\n\n"));

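        String stepSummary = summaries.entrySet().stream()
            .sorted(Map.Entry.comparingByKey())
            .map(e -> "Step " + e.getKey() + ": " + e.getValue())
            .collect(Collectors.joining("\n"));

        String prompt = String.format("""
            Using the results of the multi-step retrieval, give a combined answer to the user's question:

            Original question: %s

            Summary of retrieval steps:
            %s

            All retrieved documents:
            %s

            Provide a complete and accurate answer.
            """,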
            originalQuery, stepSummary, allContext
        );

        return chatModel.generate(prompt);
    }

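    private void waitForDependencies(
        List<Integer> dependencies,
        Map<Integer, List<TextSegment>> completedSteps
    ) {
        // Steps run sequentially on the calling thread, so a dependency that is
        // missing now can never complete later. Fail fast instead of polling.
        for (int dep : dependencies) {
            if (!completedSteps.containsKey(dep)) {
                throw new IllegalStateException(
                    "Dependency step " + dep + " has not run; check the order of the plan"
                );
            }
        }
    }

    private String performSimpleRetrieval(String query) {
        // Fast path for simple queries. retrieveDocuments and generateAnswer are
        // assumed to be the same helpers shown in EnhancedRagService.
        List<TextSegment> results = retrieveDocuments(query, 5);
        return generateAnswer(query, results);
    }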
}
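
The planner returns steps with a `depends_on` list, and the loop above runs them in the order given. If the model emits steps out of order, sorting them by dependency first avoids running a step before its inputs exist. Below is a minimal sketch using Kahn's topological sort in plain Java. The `StepOrdering` class is a hypothetical helper that works on step ids only, independent of the `RetrievalStep` type.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.TreeMap;

/** Hypothetical helper: orders retrieval steps so that dependencies run first. */
final class StepOrdering {

    private StepOrdering() {}

    /** @param dependsOn step id mapped to the ids of the steps that must run before it */
    static List<Integer> executionOrder(Map<Integer, List<Integer>> dependsOn) {
        Map<Integer, Integer> unmet = new TreeMap<>();            // unmet dependency count per step
        Map<Integer, List<Integer>> dependents = new HashMap<>(); // reverse edges
        for (Map.Entry<Integer, List<Integer>> entry : dependsOn.entrySet()) {
            unmet.put(entry.getKey(), entry.getValue().size());
            for (Integer dep : entry.getValue()) {
                if (!dependsOn.containsKey(dep)) {
                    throw new IllegalArgumentException(
                        "Step " + entry.getKey() + " depends on unknown step " + dep);
                }
                dependents.computeIfAbsent(dep, k -> new ArrayList<>()).add(entry.getKey());
            }
        }

        // Steps with no unmet dependencies are ready; the lowest id goes first
        PriorityQueue<Integer> ready = new PriorityQueue<>();
        unmet.forEach((step, count) -> { if (count == 0) ready.add(step); });

        List<Integer> order = new ArrayList<>();
        while (!ready.isEmpty()) {
            int step = ready.poll();
            order.add(step);
            for (int next : dependents.getOrDefault(step, List.of())) {
                if (unmet.merge(next, -1, Integer::sum) == 0) {
                    ready.add(next);
                }
            }
        }
        if (order.size() != dependsOn.size()) {
            throw new IllegalArgumentException("Cyclic dependencies in retrieval plan");
        }
        return order;
    }
}
```

Rejecting cyclic or dangling dependencies up front turns a malformed plan into an immediate, explainable error rather than a stalled request.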

Self-reflection and iterative retrieval

public interface ReflectionAgent {

    @SystemMessage("""
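        You are an answer quality evaluator. Evaluate the answer produced by the RAG system:

        Evaluation criteria:
        1. Does the answer fully address the question?
        2. Is it grounded in the retrieved documents, with no hallucination?
        3. Is the information detailed enough?
        4. Is more retrieval needed?

        Return JSON:
        {
            "quality_score": 0-10,
            "is_sufficient": true/false,
            "issues": ["issue 1", "issue 2"],
            "next_query": "suggested next query if more information is needed"
        }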
        """)
    String evaluateAnswer(
        @UserMessage String evaluation
    );
}

@Service
@RequiredArgsConstructor
public class IterativeRagService {

    private final ReflectionAgent reflectionAgent;
    private final EnhancedRagService ragService;
    private final ChatLanguageModel chatModel;

    private static final int MAX_ITERATIONS = 3;

    public String queryWithReflection(String userQuery) {
        // Run the initial RAG query once. Later iterations refine this answer
        // rather than regenerating it and discarding the improvements.
        String currentAnswer = ragService.query(userQuery);

        for (int iteration = 0; iteration < MAX_ITERATIONS; iteration++) {
            // Evaluate answer quality
            String evaluationContext = String.format("""
                User question: %s

                Current answer: %s

                Iteration: %d/%d
                """,
                userQuery, currentAnswer, iteration + 1, MAX_ITERATIONS
            );

            String evalJson = reflectionAgent.evaluateAnswer(evaluationContext);
            EvaluationResult eval = parseEvaluation(evalJson);

            // Return as soon as the answer is good enough
            if (eval.isSufficient() || eval.getQualityScore() >= 8) {
                return currentAnswer;
            }

            // Without a suggested follow-up query there is nothing new to retrieve
            if (eval.getNextQuery() == null || eval.getNextQuery().isEmpty()) {
                break;
            }

            // Run a supplemental retrieval and merge it into the answer
            String supplemental = ragService.query(eval.getNextQuery());
            currentAnswer = synthesizeWithSupplemental(
                userQuery,
                currentAnswer,
                supplemental,
                eval.getIssues()
            );
        }

        return currentAnswer;
    }

    private String synthesizeWithSupplemental(
        String query,
        String originalAnswer,
        String supplemental,
        List<String> issues
    ) {
        String prompt = String.format("""
            Improve the original answer:

            User question: %s

            Original answer: %s

            Issues found: %s

            Supplemental information: %s

            Produce an improved, complete answer.
            """,
            query,
            originalAnswer,
            String.join("; ", issues),
            supplemental
        );

        return chatModel.generate(prompt);
    }
}

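🏗️ Pattern 3: Unified architecture

This pattern integrates RAG and agents deeply to build a system that adapts its decisions to each request.

Intelligent router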

import dev.langchain4j.service.AiServices;

public interface IntelligentRouter {

    @SystemMessage("""
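        You are a routing decision-maker. Choose the best processing strategy for each user request.

        Available strategies:
        1. DIRECT_ANSWER - answer directly without retrieval (common-knowledge questions)
        2. SIMPLE_RAG - simple RAG retrieval (single-fact lookups)
        3. ENHANCED_RAG - agent-enhanced RAG (in-depth retrieval needed)
        4. AGENT_TOOL - agent-led with RAG as a tool (multi-step tasks)
        5. HYBRID - hybrid strategy (complex analysis tasks)

        Return JSON:
        {
            "strategy": "strategy name",
            "confidence": 0-1,
            "reasoning": "why this strategy was chosen",
            "parameters": {
                "max_iterations": 3,
                "retrieval_depth": "shallow|deep"
            }
        }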
        """)
    String routeRequest(@UserMessage String request);
}

@Service
@RequiredArgsConstructor
public class UnifiedIntelligentSystem {

    private final IntelligentRouter router;
    private final ChatLanguageModel directAnswerModel;
    private final EnhancedRagService enhancedRag;
    private final AssistantService agentService;
    private final HybridService hybridService;
    // Used by performSimpleRag below
    private final EmbeddingStore<TextSegment> embeddingStore;
    private final EmbeddingModel embeddingModel;

    public IntelligentResponse processRequest(String userRequest) {
        // Step 1: routing decision
        String routeJson = router.routeRequest(userRequest);
        RoutingDecision decision = parseRouting(routeJson);

        // Step 2: execute the chosen strategy
        String answer;
        Map<String, Object> metadata = new HashMap<>();

        switch (decision.getStrategy()) {
            case DIRECT_ANSWER:
                answer = directAnswerModel.generate(userRequest);
                metadata.put("strategy", "direct");
                metadata.put("retrieval_used", false);
                break;

            case SIMPLE_RAG:
                answer = performSimpleRag(userRequest);
                metadata.put("strategy", "simple_rag");
                metadata.put("retrieval_used", true);
                break;

            case ENHANCED_RAG:
                answer = enhancedRag.query(userRequest);
                metadata.put("strategy", "enhanced_rag");
                metadata.put("retrieval_used", true);
                metadata.put("enhancement", "query_rewrite,rerank,validate");
                break;

            case AGENT_TOOL:
                answer = agentService.processQuery(userRequest);
                metadata.put("strategy", "agent_tool");
                metadata.put("agent_controlled", true);
                break;

            case HYBRID:
                answer = hybridService.processHybrid(
                    userRequest,
                    decision.getParameters()
                );
                metadata.put("strategy", "hybrid");
                metadata.put("multi_phase", true);
                break;

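            default:
                answer = "Unable to determine a processing strategy";
                metadata.put("strategy", "error");
        }

        // Step 3: build the response
        // getStrategy() is an enum (see the switch above), while the response field is a String
        return IntelligentResponse.builder()
            .answer(answer)
            .strategy(decision.getStrategy().name())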
            .confidence(decision.getConfidence())
            .reasoning(decision.getReasoning())
            .metadata(metadata)
            .build();
    }

    private String performSimpleRag(String query) {
        // Fast RAG path
        Embedding embedding = embeddingModel.embed(query).content();

        EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(3)
                .minScore(0.75)
                .build()
        );

        String context = result.matches().stream()
            .map(match -> match.embedded().text())
            .collect(Collectors.joining("\n\n"));

        return directAnswerModel.generate(
            "Answer the question using the information below.\n\nQuestion: " + query + "\n\nInformation:\n" + context
        );
    }
}

@Data
@Builder
class IntelligentResponse {
    private String answer;
    private String strategy;
    private double confidence;
    private String reasoning;
    private Map<String, Object> metadata;
}
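
The router returns the strategy as free text inside JSON, so the name may arrive in a different case, with stray whitespace, or as something unexpected. Below is a minimal sketch of mapping it to an enum with a safe fallback, in plain Java. The `StrategyParser` class is a hypothetical helper; the constants mirror the five strategy names listed in the router prompt.

```java
import java.util.Locale;

/** Hypothetical helper: maps the router's strategy string to an enum constant. */
final class StrategyParser {

    enum Strategy { DIRECT_ANSWER, SIMPLE_RAG, ENHANCED_RAG, AGENT_TOOL, HYBRID }

    private StrategyParser() {}

    /** Normalizes case, whitespace, and separators; returns the fallback for unknown names. */
    static Strategy parseOrDefault(String raw, Strategy fallback) {
        if (raw == null) {
            return fallback;
        }
        String normalized = raw.trim()
            .toUpperCase(Locale.ROOT)
            .replace('-', '_')
            .replace(' ', '_');
        try {
            return Strategy.valueOf(normalized);
        } catch (IllegalArgumentException e) {
            return fallback;
        }
    }
}
```

Falling back to a retrieval strategy such as `SIMPLE_RAG` for an unrecognized name keeps the request answerable instead of ending in an error message.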

Hybrid processing service

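@Service
@RequiredArgsConstructor
public class HybridService {

    private final EnhancedRagService ragService;
    private final ChatLanguageModel chatModel;
    private final AgentExecutor agentExecutor;

    public String processHybrid(
        String userRequest,
        Map<String, Object> parameters
    ) {
        // Phase 1: RAG retrieval for background knowledge
        String ragContext = ragService.query(userRequest);

        // Phase 2: agent analysis and reasoning
        String analysisPrompt = String.format("""
            Analyze the user request using the retrieved background knowledge below:

            User request: %s

            Background knowledge:
            %s

            Provide:
            1. Key information extracted
            2. A breakdown of the problem
            3. Any additional actions that may be needed

            End with a final line that reads exactly TOOLS_REQUIRED: yes
            or TOOLS_REQUIRED: no.
            """,
            userRequest, ragContext
        );

        String analysis = chatModel.generate(analysisPrompt);

        // Phase 3: run agent tool calls if needed
        if (requiresToolExecution(analysis)) {
            return agentExecutor.executeWithContext(
                userRequest,
                ragContext,
                analysis
            );
        }

        // Phase 4: generate the combined answer
        String synthesisPrompt = String.format("""
            Combine the information below into a complete answer for the user:

            User request: %s

            Retrieved knowledge: %s

            Analysis: %s

            Give a detailed, accurate, and practical answer.
            """,
            userRequest, ragContext, analysis
        );

        return chatModel.generate(synthesisPrompt);
    }

    private boolean requiresToolExecution(String analysis) {
        // Look for the explicit marker requested in the prompt. Matching on a
        // phrase such as "additional actions" would always succeed, because the
        // prompt asks the model to include that section in every analysis.
        return analysis.contains("TOOLS_REQUIRED: yes");
    }
}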

Adaptive system

@Service
@RequiredArgsConstructor
public class AdaptiveIntelligentSystem {

    private final UnifiedIntelligentSystem intelligentSystem;
    private final MetricsCollector metricsCollector;

    public String process(String userRequest) {
        // Execute the request
        IntelligentResponse response = intelligentSystem.processRequest(
            userRequest
        );

        // Collect metrics
        metricsCollector.record(
            userRequest,
            response.getStrategy(),
            response.getConfidence()
        );

        // If confidence is low, try a fallback strategy
        if (response.getConfidence() < 0.6) {
            response = tryAlternativeStrategy(userRequest, response);
        }

        return response.getAnswer();
    }

    private IntelligentResponse tryAlternativeStrategy(
        String request,
        IntelligentResponse originalResponse
    ) {
        // Pick a fallback strategy based on the original one
        String originalStrategy = originalResponse.getStrategy();

        if ("SIMPLE_RAG".equals(originalStrategy)) {
            // Escalate to enhanced RAG
            return executeEnhancedRag(request);
        } else if ("ENHANCED_RAG".equals(originalStrategy)) {
            // Escalate to the hybrid strategy
            return executeHybrid(request);
        }

        return originalResponse;
    }
}
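
The escalation rule in `tryAlternativeStrategy` can also be written as a lookup table, which keeps the ladder in one place as strategies are added. Below is a minimal sketch in plain Java. The `EscalationLadder` class is a hypothetical helper; the two escalation steps and the 0.6 confidence threshold used in the usage line come from the code above.

```java
import java.util.EnumMap;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper: decides which strategy to retry with when confidence is low. */
final class EscalationLadder {

    enum Strategy { DIRECT_ANSWER, SIMPLE_RAG, ENHANCED_RAG, AGENT_TOOL, HYBRID }

    // Each entry maps a strategy to the stronger one to try next
    private static final Map<Strategy, Strategy> NEXT = new EnumMap<>(Strategy.class);
    static {
        NEXT.put(Strategy.SIMPLE_RAG, Strategy.ENHANCED_RAG);
        NEXT.put(Strategy.ENHANCED_RAG, Strategy.HYBRID);
    }

    private EscalationLadder() {}

    /** Returns the next strategy, or empty if confidence is high enough or no escalation exists. */
    static Optional<Strategy> nextStrategy(Strategy current, double confidence, double threshold) {
        if (confidence >= threshold) {
            return Optional.empty();
        }
        return Optional.ofNullable(NEXT.get(current));
    }
}
```

A call such as `EscalationLadder.nextStrategy(Strategy.SIMPLE_RAG, 0.4, 0.6)` yields `ENHANCED_RAG`, while a strategy with no entry in the table yields an empty result and the original response is kept.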

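📊 Comparing the three patterns

More stars mean more of the named quantity, except for response latency, where more stars mean faster responses.

Dimension | RAG as a tool | Agent-enhanced RAG | Unified architecture
Implementation complexity | ⭐⭐ Low | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High
Flexibility | ⭐⭐⭐⭐ High | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high
Retrieval quality | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐⭐ High
Reasoning ability | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐ Medium | ⭐⭐⭐⭐⭐ Very high
Performance overhead | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Very high
Token consumption | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High | ⭐⭐⭐⭐⭐ Very high
Response latency | ⭐⭐⭐⭐ Fast | ⭐⭐⭐ Medium | ⭐⭐ Slow
Extensibility | ⭐⭐⭐⭐⭐ Very high | ⭐⭐⭐ Medium | ⭐⭐⭐⭐ High
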
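Choosing a pattern

RAG as a tool

  • ✅ Multi-purpose assistants
  • ✅ Several capabilities are needed (calculation, lookup, API calls, and so on)
  • ✅ Knowledge retrieval is one feature among many
  • ✅ Rapid iteration and prototyping
  • ❌ Retrieval quality requirements are extremely high
  • ❌ Pure knowledge Q&A

Agent-enhanced RAG

  • ✅ Knowledge-intensive applications
  • ✅ Retrieval accuracy is critical
  • ✅ Complex queries need multi-step retrieval
  • ✅ High answer-quality requirements
  • ❌ Latency-sensitive workloads
  • ❌ Limited cost budget

Unified architecture

  • ✅ Complex enterprise applications
  • ✅ A high degree of flexibility is required
  • ✅ Varied task scenarios
  • ✅ Higher latency and cost are acceptable
  • ❌ Simple Q&A
  • ❌ The team lacks the capacity to maintain it

🚀 Production example: an intelligent knowledge base

A complete enterprise-grade knowledge base that combines the strengths of all three patterns.

Core architecture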

@Service
@RequiredArgsConstructor
public class ProductionKnowledgeSystem {

    // Assumes SLF4J for logging
    private static final Logger logger =
        LoggerFactory.getLogger(ProductionKnowledgeSystem.class);

    private final IntelligentRouter router;
    private final RagEngine ragEngine;
    private final AgentOrchestrator agentOrchestrator;
    private final IntelligentCacheManager cacheManager;
    private final MetricsService metricsService;
    private final FallbackHandler fallbackHandler;
    private final ChatLanguageModel chatModel;

    public KnowledgeResponse query(KnowledgeRequest request) {
        long startTime = System.currentTimeMillis();
        String requestId = UUID.randomUUID().toString();

        try {
            // Step 1: check the cache
            Optional<String> cachedAnswer = cacheManager.get(
                request.getQuery()
            );
            if (cachedAnswer.isPresent()) {
                metricsService.recordCacheHit(requestId);
                return buildResponse(
                    cachedAnswer.get(),
                    "CACHE",
                    System.currentTimeMillis() - startTime
                );
            }

            // Step 2: route the request (the router returns JSON, parsed as in
            // UnifiedIntelligentSystem)
            RoutingDecision routing = parseRouting(
                router.routeRequest(request.getQuery())
            );
            metricsService.recordRouting(requestId, routing);

            // Step 3: execute the chosen strategy
            String answer = executeStrategy(request, routing);

            // Step 4: validate answer quality
            ValidationResult validation = validateAnswer(
                request,
                answer
            );

            if (!validation.isValid()) {
                // Fall back to a degraded strategy
                answer = fallbackHandler.handle(request, validation);
            }

            // Step 5: cache the result
            cacheManager.put(request.getQuery(), answer);

            // Step 6: record metrics
            long duration = System.currentTimeMillis() - startTime;
            metricsService.recordSuccess(requestId, duration);

            return buildResponse(answer, routing.getStrategy().name(), duration);

        } catch (Exception e) {
            metricsService.recordError(requestId, e);
            return handleError(request, e);
        }
    }

    private String executeStrategy(
        KnowledgeRequest request,
        RoutingDecision routing
    ) {
        // Case labels match the strategy names the router is prompted to return
        return switch (routing.getStrategy()) {
            case DIRECT_ANSWER -> chatModel.generate(request.getQuery());
            case SIMPLE_RAG -> ragEngine.simpleRetrieval(request);
            case ENHANCED_RAG -> ragEngine.enhancedRetrieval(request);
            case AGENT_TOOL -> agentOrchestrator.process(request);
            case HYBRID -> executeHybridStrategy(request, routing);
            default -> throw new IllegalStateException(
                "Unknown strategy: " + routing.getStrategy()
            );
        };
    }

    private String executeHybridStrategy(
        KnowledgeRequest request,
        RoutingDecision routing
    ) {
        // Run RAG and the agent in parallel
        CompletableFuture<String> ragFuture = CompletableFuture.supplyAsync(
            () -> ragEngine.enhancedRetrieval(request)
        );

        CompletableFuture<String> agentFuture = CompletableFuture.supplyAsync(
            () -> agentOrchestrator.analyze(request)
        );

        // join() waits for each result and rethrows any failure as an unchecked
        // CompletionException, so no checked-exception handling is needed here
        String ragResult = ragFuture.join();
        String agentAnalysis = agentFuture.join();

        // Combine both results
        return synthesizeResults(request, ragResult, agentAnalysis);
    }

    private ValidationResult validateAnswer(
        KnowledgeRequest request,
        String answer
    ) {
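        // Validate answer quality
        List<String> issues = new ArrayList<>();

        // Check 1: reasonable answer length
        if (answer.length() < 20) {
            issues.add("Answer too short");
        }

        // Check 2: contains a source reference
        // (keyword heuristic; the keywords must match the language the answers are written in)
        String lower = answer.toLowerCase();
        if (!lower.contains("according to") && !lower.contains("document")) {
            issues.add("Missing source reference");
        }

        // Check 3: semantic validation with the LLM
        boolean semanticValid = performSemanticValidation(
            request.getQuery(),
            answer
        );

        if (!semanticValid) {
            issues.add("Semantic consistency problem");
        }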

        return ValidationResult.builder()
            .isValid(issues.isEmpty())
            .issues(issues)
            .confidence(calculateConfidence(answer, issues))
            .build();
    }

    private boolean performSemanticValidation(String query, String answer) {
        String prompt = String.format("""
            Decide whether the following answer actually answers the question (return only true or false):

            Question: %s
            Answer: %s
            """,
            query, answer
        );

        String result = chatModel.generate(prompt).trim();
        return "true".equalsIgnoreCase(result);
    }

    private KnowledgeResponse handleError(
        KnowledgeRequest request,
        Exception error
    ) {
        logger.error("Failed to process request: " + request.getQuery(), error);

        // Try a degraded fallback
        try {
            String fallbackAnswer = fallbackHandler.generateFallback(request);
            return KnowledgeResponse.builder()
                .answer(fallbackAnswer)
                .strategy("FALLBACK")
                .success(false)
                .error(error.getMessage())
                .build();
        } catch (Exception fallbackError) {
            return KnowledgeResponse.builder()
                .answer("Sorry, the system cannot process your request right now. Please try again later.")
                .strategy("ERROR")
                .success(false)
                .error(error.getMessage())
                .build();
        }
    }
}
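
In `executeHybridStrategy`, a slow or failing branch holds up or fails the whole request. Below is a minimal sketch of bounding each branch with a timeout and a fallback value, using only `java.util.concurrent` (Java 9 or later for `completeOnTimeout`). The `ParallelHybrid` class, the suppliers standing in for the RAG and agent calls, and the output format are illustrative assumptions.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical helper: runs two calls in parallel, each bounded by a timeout. */
final class ParallelHybrid {

    private ParallelHybrid() {}

    /** Runs both calls concurrently and joins their results into one string. */
    static String run(Supplier<String> ragCall, Supplier<String> agentCall,
                      long timeoutMillis, String fallback) {
        CompletableFuture<String> rag = guarded(ragCall, timeoutMillis, fallback);
        CompletableFuture<String> agent = guarded(agentCall, timeoutMillis, fallback);
        return rag.thenCombine(agent, (r, a) -> "RAG: " + r + "\nAgent: " + a).join();
    }

    /** Completes with the fallback if the call times out or throws. */
    private static CompletableFuture<String> guarded(Supplier<String> call,
                                                     long timeoutMillis, String fallback) {
        return CompletableFuture.supplyAsync(call)
            .completeOnTimeout(fallback, timeoutMillis, TimeUnit.MILLISECONDS)
            .exceptionally(error -> fallback);
    }
}
```

Note that `completeOnTimeout` only stops waiting; the underlying task keeps running on its thread, so long model calls should also carry their own client-side timeout.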

Caching strategy

@Component
@RequiredArgsConstructor
public class IntelligentCacheManager {

    private final RedisTemplate<String, String> redisTemplate;
    private final EmbeddingModel embeddingModel;
    private final ChatLanguageModel chatModel;

    private static final String CACHE_PREFIX = "knowledge_cache:";
    private static final Duration DEFAULT_TTL = Duration.ofHours(24);

    public Optional<String> get(String query) {
        // Strategy 1: exact match on the normalized query hash
        String exactKey = CACHE_PREFIX + hashQuery(query);
        String exactMatch = redisTemplate.opsForValue().get(exactKey);
        if (exactMatch != null) {
            return Optional.of(exactMatch);
        }

        // Strategy 2: semantic similarity match
        Optional<String> semanticMatch = findSemanticMatch(query);
        if (semanticMatch.isPresent()) {
            return semanticMatch;
        }

        return Optional.empty();
    }

    public void put(String query, String answer) {
        String key = CACHE_PREFIX + hashQuery(query);

        // Store the exact-match entry
        redisTemplate.opsForValue().set(key, answer, DEFAULT_TTL);

        // Store the query embedding in the semantic index
        Embedding embedding = embeddingModel.embed(query).content();
        storeSemanticIndex(query, embedding, key);
    }

    private Optional<String> findSemanticMatch(String query) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        // Look up similar query vectors in Redis
        List<SimilarQuery> similar = searchSimilarQueries(
            queryEmbedding,
            0.95  // high similarity threshold
        );

        if (similar.isEmpty()) {
            return Optional.empty();
        }

        // Ask the LLM to confirm that the cached answer can be reused
        SimilarQuery best = similar.get(0);
        boolean canReuse = verifyQueryEquivalence(query, best.getQuery());

        if (canReuse) {
            String cachedAnswer = redisTemplate.opsForValue().get(
                best.getCacheKey()
            );
            return Optional.ofNullable(cachedAnswer);
        }

        return Optional.empty();
    }

    private boolean verifyQueryEquivalence(
        String newQuery,
        String cachedQuery
    ) {
        String prompt = String.format("""
            Decide whether the following two questions are essentially the same (reply with only true or false):

            Question 1: %s
            Question 2: %s
            """,
            newQuery, cachedQuery
        );

        String result = chatModel.generate(prompt).trim();
        return "true".equalsIgnoreCase(result);
    }

    private String hashQuery(String query) {
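        // Locale.ROOT (java.util.Locale) keeps the hash stable regardless of the JVM default locale
        return DigestUtils.sha256Hex(query.trim().toLowerCase(Locale.ROOT));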
    }
}
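
The `searchSimilarQueries` helper used by `findSemanticMatch` is not shown. A minimal sketch of the lookup it could perform follows, assuming the index is small enough to scan in memory with cosine similarity. `SemanticIndexSketch`, `Entry`, and `Scored` are hypothetical names; a production deployment would more likely delegate this to a vector index than to a linear scan.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

/** Hypothetical in-memory stand-in for the semantic index behind the cache. */
class SemanticIndexSketch {

    /** One indexed query: its text, its embedding, and the cache key of its answer. */
    record Entry(String query, float[] vector, String cacheKey) {}

    /** A candidate entry together with its cosine similarity to the incoming query. */
    record Scored(Entry entry, double similarity) {}

    private final List<Entry> entries = new ArrayList<>();

    void store(String query, float[] vector, String cacheKey) {
        entries.add(new Entry(query, vector, cacheKey));
    }

    /** Returns entries whose similarity reaches the threshold, best match first. */
    List<Scored> searchSimilar(float[] queryVector, double threshold) {
        List<Scored> hits = new ArrayList<>();
        for (Entry entry : entries) {
            double similarity = cosine(queryVector, entry.vector());
            if (similarity >= threshold) {
                hits.add(new Scored(entry, similarity));
            }
        }
        hits.sort(Comparator.comparingDouble(Scored::similarity).reversed());
        return hits;
    }

    /** Cosine similarity; returns 0.0 when either vector has zero length. */
    static double cosine(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vector dimensions differ");
        }
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += (double) a[i] * b[i];
            normA += (double) a[i] * a[i];
            normB += (double) b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}
```

Because results come back best-first, taking `get(0)` as the cache manager does picks the closest cached query. The LLM equivalence check then guards against near-duplicates that differ in a detail that matters, such as a version number.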

Monitoring and metrics

@Service
public class MetricsService {

    private final MeterRegistry meterRegistry;

    public void recordRouting(String requestId, RoutingDecision routing) {
        Counter.builder("knowledge.routing")
            .tag("strategy", routing.getStrategy())
            .tag("confidence", String.format("%.1f", routing.getConfidence()))
            .register(meterRegistry)
            .increment();
    }

    public void recordCacheHit(String requestId) {
        Counter.builder("knowledge.cache")
            .tag("result", "hit")
            .register(meterRegistry)
            .increment();
    }

    public void recordSuccess(String requestId, long durationMs) {
        Timer.builder("knowledge.query.duration")
            .tag("result", "success")
            .register(meterRegistry)
            .record(Duration.ofMillis(durationMs));
    }

    public void recordError(String requestId, Exception error) {
        Counter.builder("knowledge.errors")
            .tag("type", error.getClass().getSimpleName())
            .register(meterRegistry)
            .increment();
    }

    public MetricsSummary getSummary() {
        return MetricsSummary.builder()
            .totalQueries(getTotalQueries())
            .cacheHitRate(getCacheHitRate())
            .averageLatency(getAverageLatency())
            .errorRate(getErrorRate())
            .strategyDistribution(getStrategyDistribution())
            .build();
    }
}
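
`getSummary()` reports a cache hit rate, but the service above only records hits. A ratio also needs the miss count. The sketch below shows the arithmetic with plain JDK counters instead of Micrometer; `CacheStatsSketch` and `recordMiss` are hypothetical names that do not appear in the original code.

```java
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical hit/miss tracker showing how a cache hit rate can be derived. */
class CacheStatsSketch {

    // LongAdder handles concurrent increments from many request threads cheaply.
    private final LongAdder hits = new LongAdder();
    private final LongAdder misses = new LongAdder();

    void recordHit() {
        hits.increment();
    }

    void recordMiss() {
        misses.increment();
    }

    /** Hit rate in [0, 1]; returns 0.0 before any lookup has been recorded. */
    double hitRate() {
        long hitCount = hits.sum();
        long total = hitCount + misses.sum();
        return total == 0 ? 0.0 : (double) hitCount / total;
    }
}
```

In the Micrometer version, the equivalent would be a second `knowledge.cache` counter tagged `result=miss`, incremented wherever a lookup returns `Optional.empty()`.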

⚡ Performance optimization

Parallel retrieval

@Service
public class ParallelRetrievalService {

    private final ExecutorService executorService;
    private final List<EmbeddingStore<TextSegment>> shardedStores;
    // Needed by parallelRetrieve to embed the query
    private final EmbeddingModel embeddingModel;

    public List<TextSegment> parallelRetrieve(
        String query,
        int maxResults
    ) {
        Embedding queryEmbedding = embeddingModel.embed(query).content();

        // Query every shard in parallel
        List<CompletableFuture<List<EmbeddingMatch<TextSegment>>>> futures =
            shardedStores.stream()
                .map(store -> CompletableFuture.supplyAsync(
                    () -> retrieveFromStore(store, queryEmbedding, maxResults),
                    executorService
                ))
                .collect(Collectors.toList());

        // Wait for all shard queries to finish
        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
            .join();

        // Merge, sort by similarity score (highest first), and keep the global top results.
        // The score lives on EmbeddingMatch, not on TextSegment, so the matches are
        // unwrapped only after sorting.
        return futures.stream()
            .map(CompletableFuture::join)
            .flatMap(List::stream)
            .sorted(Comparator.comparingDouble(
                (EmbeddingMatch<TextSegment> match) -> match.score()
            ).reversed())
            .limit(maxResults)
            .map(EmbeddingMatch::embedded)
            .collect(Collectors.toList());
    }

    private List<EmbeddingMatch<TextSegment>> retrieveFromStore(
        EmbeddingStore<TextSegment> store,
        Embedding embedding,
        int maxResults
    ) {
        EmbeddingSearchResult<TextSegment> result = store.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(maxResults)
                .minScore(0.7)
                .build()
        );

        // Return the matches themselves so the caller can rank by score across shards
        return result.matches();
    }
}
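
Waiting on every shard with an unbounded `join()` means one slow or failing shard delays or fails the whole request. A possible mitigation is sketched below using only JDK classes: each shard gets a time budget and contributes an empty list if it times out or throws. `ShardTimeoutSketch`, `Hit`, and the timeout parameter are illustrative assumptions, and whether partial results are acceptable depends on the application.

```java
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.function.Supplier;

/** Hypothetical shard fan-out that tolerates slow or failing shards. */
class ShardTimeoutSketch {

    /** Simplified stand-in for a scored retrieval result. */
    record Hit(String text, double score) {}

    static List<Hit> topK(
        List<Supplier<List<Hit>>> shards,
        int k,
        long timeoutMs,
        ExecutorService pool
    ) {
        // Start every shard query first; collecting to a list forces all tasks to be submitted.
        List<CompletableFuture<List<Hit>>> futures = shards.stream()
            .map(shard -> CompletableFuture.supplyAsync(shard, pool)
                // A shard that exceeds its budget contributes no results.
                .completeOnTimeout(List.of(), timeoutMs, TimeUnit.MILLISECONDS)
                // A shard that throws also degrades to no results.
                .exceptionally(ex -> List.of()))
            .toList();

        // Merge whatever arrived and keep the k highest-scoring hits.
        return futures.stream()
            .map(CompletableFuture::join)
            .flatMap(List::stream)
            .sorted(Comparator.comparingDouble(Hit::score).reversed())
            .limit(k)
            .toList();
    }
}
```

Note that `completeOnTimeout` only completes the future; it does not interrupt the underlying task, so a stuck shard still occupies a pool thread until its own client timeout fires.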

Asynchronous processing

@Service
public class AsyncKnowledgeService {

    @Async("knowledgeTaskExecutor")
    public CompletableFuture<String> queryAsync(String userQuery) {
        try {
            String result = intelligentSystem.processRequest(userQuery)
                .getAnswer();
            return CompletableFuture.completedFuture(result);
        } catch (Exception e) {
            return CompletableFuture.failedFuture(e);
        }
    }

    @Bean(name = "knowledgeTaskExecutor")
    public Executor knowledgeTaskExecutor() {
        ThreadPoolTaskExecutor executor = new ThreadPoolTaskExecutor();
        executor.setCorePoolSize(10);
        executor.setMaxPoolSize(50);
        executor.setQueueCapacity(100);
        executor.setThreadNamePrefix("knowledge-async-");
        executor.setRejectedExecutionHandler(
            new ThreadPoolExecutor.CallerRunsPolicy()
        );
        executor.initialize();
        return executor;
    }
}

Batch processing optimization

@Service
public class BatchProcessingService {

    public List<String> batchQuery(List<String> queries) {
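        // Embed all queries in one batch call. embedAll accepts TextSegments,
        // so each query string is wrapped first.
        List<TextSegment> segments = queries.stream()
            .map(TextSegment::from)
            .collect(Collectors.toList());
        List<Embedding> embeddings = embeddingModel.embedAll(segments)
            .content();

        // Retrieve in parallel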
        List<CompletableFuture<String>> futures =
            IntStream.range(0, queries.size())
                .mapToObj(i -> CompletableFuture.supplyAsync(
                    () -> processWithEmbedding(
                        queries.get(i),
                        embeddings.get(i)
                    )
                ))
                .collect(Collectors.toList());

        // Wait for all queries to finish, preserving input order
        return futures.stream()
            .map(CompletableFuture::join)
            .collect(Collectors.toList());
    }

    private String processWithEmbedding(String query, Embedding embedding) {
        // Retrieve with the precomputed embedding
        EmbeddingSearchResult<TextSegment> result = embeddingStore.search(
            EmbeddingSearchRequest.builder()
                .queryEmbedding(embedding)
                .maxResults(5)
                .build()
        );

        return generateAnswer(query, result);
    }
}

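💡 Hands-on exercises

Exercise 1: Implement a query intent classifier

Create an Agent that classifies the intent of a user query (fact lookup, procedure lookup, comparative analysis, and so on) and selects a retrieval strategy based on that intent.

Requirements:

  • Support at least 5 query intents
  • Design dedicated retrieval logic for each intent
  • Implement confidence scoring

Hint: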

public interface QueryIntentClassifier {
    @SystemMessage("...")
    String classifyIntent(@UserMessage String query);
}
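
One possible starting point for the confidence requirement: have the classifier reply in a fixed `INTENT|confidence` format and parse it defensively. The reply format, the intent names, and the class `IntentParsingSketch` are assumptions made for illustration; the exercise leaves all of them open.

```java
import java.util.Locale;

/** Hypothetical parser for a classifier reply such as "COMPARISON|0.85". */
class IntentParsingSketch {

    /** Five example intents plus a fallback; the exercise does not prescribe these names. */
    enum Intent { FACT, PROCEDURE, COMPARISON, TROUBLESHOOTING, DEFINITION, UNKNOWN }

    record Classification(Intent intent, double confidence) {}

    /**
     * Parses the raw model reply. Malformed replies and replies below
     * minConfidence map to UNKNOWN so the caller can use a default strategy.
     */
    static Classification parse(String raw, double minConfidence) {
        if (raw == null) {
            return new Classification(Intent.UNKNOWN, 0.0);
        }
        String[] parts = raw.trim().split("\\|");
        if (parts.length != 2) {
            return new Classification(Intent.UNKNOWN, 0.0);
        }
        try {
            Intent intent = Intent.valueOf(parts[0].trim().toUpperCase(Locale.ROOT));
            double confidence = Double.parseDouble(parts[1].trim());
            if (Double.isNaN(confidence) || confidence < 0.0 || confidence > 1.0) {
                return new Classification(Intent.UNKNOWN, 0.0);
            }
            if (confidence < minConfidence) {
                // Keep the reported confidence so it can still be logged.
                return new Classification(Intent.UNKNOWN, confidence);
            }
            return new Classification(intent, confidence);
        } catch (IllegalArgumentException e) {
            // Covers both an unknown intent name and an unparsable number.
            return new Classification(Intent.UNKNOWN, 0.0);
        }
    }
}
```

The string returned by `classifyIntent` can be passed straight into `parse`, and a `switch` on the resulting `Intent` can then dispatch to the per-intent retrieval logic.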

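Exercise 2: Build a multi-turn conversational RAG system

Implement a RAG system that supports multi-turn conversation and can:

  • Understand context and resolve references (pronouns, "that one", and similar)
  • Use the conversation history to improve retrieval
  • Track the conversation topic

Hints:

  • Use ChatMemory to store the conversation history
  • Implement context-aware query rewriting
  • Maintain a conversation topic vector

Exercise 3: Implement adaptive retrieval depth

Adjust the retrieval depth (number of retrieval rounds and results per round) dynamically, based on the complexity of the question and the quality of the results retrieved so far.

Hint: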

public class AdaptiveDepthRetriever {
    public String retrieve(String query) {
        int depth = estimateRequiredDepth(query);
        // TODO: implement the adaptive logic
        throw new UnsupportedOperationException("Not implemented yet");
    }
}
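
A rough sketch of how the adaptive loop might be structured: estimate a round budget from the query, then widen the search each round until the best score is good enough. The word-count heuristic, the doubling of `k`, and the `Retriever` interface are all assumptions for illustration. The heuristic also splits on whitespace, so it would need a different complexity signal for Chinese queries.

```java
import java.util.List;

/** Hypothetical adaptive-depth loop: retry with a wider search until results are good enough. */
class AdaptiveDepthSketch {

    /** Simplified stand-in for a scored retrieval result. */
    record Hit(String text, double score) {}

    /** Minimal retrieval abstraction so the loop can be tried without a vector store. */
    interface Retriever {
        List<Hit> search(String query, int maxResults);
    }

    /** Crude complexity estimate: longer questions get a larger round budget. */
    static int estimateRequiredDepth(String query) {
        int words = query.trim().split("\\s+").length;
        if (words > 20) {
            return 3;
        }
        return words > 8 ? 2 : 1;
    }

    static List<Hit> retrieve(String query, Retriever retriever, double goodEnoughScore) {
        int maxRounds = estimateRequiredDepth(query);
        int k = 5;
        List<Hit> latest = List.of();
        for (int round = 1; round <= maxRounds; round++) {
            latest = retriever.search(query, k);
            double topScore = latest.stream().mapToDouble(Hit::score).max().orElse(0.0);
            if (topScore >= goodEnoughScore) {
                break; // Quality target reached; stop spending retrieval calls.
            }
            k *= 2; // Widen the next round.
        }
        return latest;
    }
}
```

A fuller solution would probably also rewrite the query between rounds rather than only raising `k`, since a larger result set does not help when the query itself misses the relevant documents.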

Last updated: 2026-03-09 | Word count: 5,200 | Estimated reading time: 35 minutes