AI Customer Service System Upgrade in Practice: Multi-Agent Routing + Multi-Turn Memory + Sensitive-Word Filtering

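From a single Agent to a division of labor among specialists, plus wiring up two real business scenarios along the way: ChatMemory and sensitive-word filtering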

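The conclusion first

The previous post set up the basic framework: emotion analysis → intent recognition → Agent tool calls. In real customer-service traffic, though, one do-everything Agent handling every request has obvious problems: the Prompt keeps growing, the tool list keeps growing, and eventually the model's attention starts to drift.

This post walks through the three changes made since then, one at a time: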

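Already in the previous post	New in this post
Single CustomerServiceAgent	3 specialist Agents + Router dispatch
Custom Redis-based ChatMemory	Spring AI's official JdbcChatMemoryRepository
LLM-only intent recognition	LLM recognition + keyword fallback as a double safeguard
No safety filtering	Bidirectional sensitive-word filtering (input + output)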

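The conclusion first: the value of Router + multiple Agents is not that it looks sophisticated, but that each Agent gets a focused Prompt and a clean tool set. Migrating multi-turn memory to Spring AI's official implementation turned out to be less work, and most of the pitfalls came from misunderstanding the API design.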

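1. Multi-Agent split: don't let one Agent carry everything

Why split

The original CustomerServiceAgent had to handle product inquiries, order lookups, refund requests, logistics tracking, complaint soothing, and more. Its Prompt exceeded 500 characters and it had 6 tools attached. Once it was running, the model frequently got confused: when a user said "this thing doesn't work well", it sometimes treated the message as a RAG question (searching the knowledge base) and sometimes as a complaint (preparing a handoff to a human). The behavior was erratic.

The fix was to split it into 3 Agents, each focused on one job: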

graph TB
    User[User message] --> Router[CustomerSupportRouter]
    Router -->|ANGRY / COMPLAINT| Complaint[ComplaintAgent<br>Soothing + human handoff first]
    Router -->|Post-sales intent| PostSales[PostSalesAgent<br>Order lookup + refund + logistics]
    Router -->|Pre-sales intent| PreSales[PreSalesAgent<br>Product inquiry + RAG knowledge base]
    Router -->|Failure fallback| Fallback[Keyword rules → human handoff]

BaseAgent design

The three Agents share one piece of ChatClient assembly logic, extracted into a base class:

public abstract class BaseAgent {

    protected final ChatClient chatClient;
    protected final ChatMemory chatMemory;

    protected BaseAgent(ChatClient chatClient, ChatMemory chatMemory) {
        this.chatClient = chatClient;
        this.chatMemory = chatMemory;
    }

    /**
     * Core chat method: memory Advisor + extra Advisors + tools
     *
     * @param extraAdvisors extra Advisors (e.g. a RAG Advisor) passed in by the subclass;
     *                      any element that is not an Advisor is treated as a tool object
     */
    protected String chatWithAdvisors(String userMessage, String systemPrompt,
                                      String sessionId, Object... extraAdvisors) {
        var memoryAdvisor = MessageChatMemoryAdvisor.builder(chatMemory)
                .conversationId(sessionId)
                .build();

        // Separate Advisors from tools
        List<Advisor> advisors = new ArrayList<>();
        advisors.add(memoryAdvisor);
        List<Object> tools = new ArrayList<>();

        for (Object extra : extraAdvisors) {
            if (extra instanceof Advisor a) {
                advisors.add(a);
            } else {
                tools.add(extra);
            }
        }

        // ... build the ChatClient call chain
    }
}

Routing rules

CustomerSupportRouter is a pure rule-based router. It makes no LLM call, so its latency is stable:

public String chat(RoutingContext ctx, String message,
                   String emotionStrategy, String sessionId) {
    // 1. Emotion first: ANGRY (or a complaint intent) goes straight to the complaint Agent
    if (ctx.emotionLevel() == EmotionLevel.ANGRY
            || ctx.intentType() == IntentType.COMPLAINT) {
        return complaintAgent.chat(message, emotionStrategy, sessionId);
    }

    // 2. Post-sales intent → post-sales Agent
    if (isPostSalesIntent(ctx.intentType())) {
        return postSalesAgent.chat(message, emotionStrategy, sessionId);
    }

    // 3. Everything else → pre-sales Agent (product inquiry + RAG as the default)
    return preSalesAgent.chat(message, emotionStrategy, sessionId);
}

private boolean isPostSalesIntent(IntentType intent) {
    return intent == IntentType.ORDER_QUERY
            || intent == IntentType.REFUND
            || intent == IntentType.LOGISTICS;
}

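RoutingContext carries three decision dimensions:

public record RoutingContext(
    EmotionLevel emotionLevel,    // from emotion analysis
    IntentType   intentType,      // from intent recognition
    SessionPhase sessionPhase     // current session phase (pre-sales / post-sales / in progress)
) {}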
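
To make the branch order concrete, here is a minimal, self-contained sketch of the same decision logic with the Agents stubbed out. The enum members other than ANGRY, COMPLAINT, ORDER_QUERY, REFUND, LOGISTICS and GENERAL, and the `AgentTarget` type, are assumptions made up for illustration; the real router returns the Agent's reply rather than a target enum. Note that `sessionPhase` is carried in the context but is not consulted by the branches shown in the article.

```java
import java.util.EnumSet;
import java.util.Set;

/** Sketch only: mirrors the article's routing order with stand-in types. */
final class RouterSketch {
    enum EmotionLevel { CALM, ANGRY }                          // CALM is an assumed value
    enum IntentType { ORDER_QUERY, REFUND, LOGISTICS, COMPLAINT, GENERAL }
    enum SessionPhase { PRE_SALES, POST_SALES, IN_PROGRESS }   // assumed values
    enum AgentTarget { COMPLAINT, POST_SALES, PRE_SALES }      // hypothetical, for illustration

    record RoutingContext(EmotionLevel emotionLevel, IntentType intentType,
                          SessionPhase sessionPhase) {}

    private static final Set<IntentType> POST_SALES_INTENTS =
            EnumSet.of(IntentType.ORDER_QUERY, IntentType.REFUND, IntentType.LOGISTICS);

    /** Emotion/complaint first, then post-sales intents, and pre-sales as the default. */
    static AgentTarget route(RoutingContext ctx) {
        if (ctx.emotionLevel() == EmotionLevel.ANGRY
                || ctx.intentType() == IntentType.COMPLAINT) {
            return AgentTarget.COMPLAINT;
        }
        if (POST_SALES_INTENTS.contains(ctx.intentType())) {
            return AgentTarget.POST_SALES;
        }
        return AgentTarget.PRE_SALES;
    }
}
```

Because the emotion check comes first, an angry user asking for a refund lands on the complaint Agent, not the post-sales Agent. That ordering is the main thing worth pinning down in a test.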

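Pitfall: the first design had 5 Agents (adding TechnicalSupportAgent and RecommendationAgent), but IntentType.RAG covered both technical support and product recommendation, so those two Agents would never be routed to. Trimming down to 3 turned out clearer.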

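Why not Spring AI Alibaba's LlmRoutingAgent

After spotting LlmRoutingAgent in the Spring AI Alibaba source I wanted to use it directly, only to find that spring-ai-alibaba-graph 1.1.2.1 did not exist on Maven Central (404). 1.1.2.2 did have it, but the project had not upgraded to that version at the time.

So I went with a hand-written rule router. For customer service, rule-based routing has real advantages:

Stable latency, with no extra LLM call needed for the routing decision
Predictable behavior, so QA can write test cases more easily
Easy troubleshooting, since the logs show exactly which branch was taken

The downside: adding a new intent type means changing code. Upgrading to LlmRoutingAgent can be re-evaluated later.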
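
One way to soften the "new intent means a code change" cost while staying rule-based is to keep the intent-to-Agent mapping in a lookup table, so that adding an intent is a one-line data change. The following is a sketch of that idea, not what the project does; `RoutingTableSketch` and `AgentTarget` are hypothetical names.

```java
import java.util.EnumMap;
import java.util.Map;

/** Sketch only: table-driven variant of the rule router. */
final class RoutingTableSketch {
    enum IntentType { ORDER_QUERY, REFUND, LOGISTICS, COMPLAINT, GENERAL }
    enum AgentTarget { COMPLAINT, POST_SALES, PRE_SALES }   // hypothetical

    private static final Map<IntentType, AgentTarget> TABLE = new EnumMap<>(IntentType.class);
    static {
        TABLE.put(IntentType.COMPLAINT, AgentTarget.COMPLAINT);
        TABLE.put(IntentType.ORDER_QUERY, AgentTarget.POST_SALES);
        TABLE.put(IntentType.REFUND, AgentTarget.POST_SALES);
        TABLE.put(IntentType.LOGISTICS, AgentTarget.POST_SALES);
        // Intents without an entry fall through to the pre-sales default below.
    }

    static AgentTarget targetFor(IntentType intent) {
        return TABLE.getOrDefault(intent, AgentTarget.PRE_SALES);
    }
}
```

The emotion check would still sit in front of this lookup; only the intent branches move into data.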

Intent recognition degradation strategy

LLM recognition is not 100% reliable, so a confidence threshold guards it:

private IntentClassifier.IntentResult classifyIntent(String message, String sessionId) {
    try {
        IntentClassifier.IntentResult result = intentClassifier.classify(message, sessionId);
        // Confidence too low: fall back to keyword matching
        if (result == null || result.confidence() < 0.3) {
            log.warn("Intent confidence too low ({}), degrading to keyword matching",
                     result != null ? result.confidence() : "null");
            return quickRuleMatch(message);
        }
        return result;
    } catch (Exception e) {
        log.warn("Intent recognition failed, degrading to keyword matching", e);
        return quickRuleMatch(message);
    }
}

private IntentClassifier.IntentResult quickRuleMatch(String message) {
    if (message.contains("退款") || message.contains("退货")) {
        return IntentClassifier.IntentResult.of(IntentType.REFUND, 0.8);
    }
    if (message.contains("订单") || message.contains("快递") || message.contains("物流")) {
        return IntentClassifier.IntentResult.of(IntentType.ORDER_QUERY, 0.8);
    }
    if (message.contains("转人工") || message.contains("人工客服")) {
        return IntentClassifier.IntentResult.of(IntentType.HUMAN_TRANSFER, 0.95);
    }
    return IntentClassifier.IntentResult.of(IntentType.GENERAL, 0.5);
}

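Pitfall: do not skip the keyword fallback. Under high concurrency GLM-4-Flash occasionally returns incomplete JSON; parsing confidence then fails, and without a fallback the request dies with an NPE.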
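
The degradation path can be exercised without an LLM by stubbing the classifier. Below is a minimal sketch under two assumptions: `confidence` is modeled as a boxed `Double` so that a failed parse can show up as null, and the classifier is a plain `Function` standing in for the project's `IntentClassifier`. The keyword rules and their order are copied from the article.

```java
import java.util.function.Function;

/** Sketch only: confidence threshold + keyword fallback with a stubbed classifier. */
final class IntentFallbackSketch {
    enum IntentType { REFUND, ORDER_QUERY, HUMAN_TRANSFER, GENERAL }

    /** Boxed Double on purpose: null represents "confidence could not be parsed". */
    record IntentResult(IntentType type, Double confidence) {}

    static final double MIN_CONFIDENCE = 0.3;

    static IntentResult classifyWithFallback(Function<String, IntentResult> llmClassifier,
                                             String message) {
        try {
            IntentResult result = llmClassifier.apply(message);
            // Check null explicitly so a missing confidence never reaches the comparison.
            if (result == null || result.type() == null || result.confidence() == null
                    || result.confidence() < MIN_CONFIDENCE) {
                return quickRuleMatch(message);
            }
            return result;
        } catch (RuntimeException e) {
            return quickRuleMatch(message);
        }
    }

    static IntentResult quickRuleMatch(String message) {
        if (message.contains("退款") || message.contains("退货")) {
            return new IntentResult(IntentType.REFUND, 0.8);
        }
        if (message.contains("订单") || message.contains("快递") || message.contains("物流")) {
            return new IntentResult(IntentType.ORDER_QUERY, 0.8);
        }
        if (message.contains("转人工") || message.contains("人工客服")) {
            return new IntentResult(IntentType.HUMAN_TRANSFER, 0.95);
        }
        return new IntentResult(IntentType.GENERAL, 0.5);
    }
}
```

Because the rules are checked top to bottom, a message that mentions both an order and a human agent matches ORDER_QUERY first. If explicit handoff requests should always win, the HUMAN_TRANSFER rule would need to move to the top.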

2. Multi-turn memory: replacing hand-written Redis with Spring AI's official implementation

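The original problem

In the original system, ConversationService manually called agent.recordMemory() after each turn to write to Redis, and the history was injected again when CustomerServiceAgent was constructed. With memory managed in two places, odd bugs showed up now and then, such as the second turn not receiving any history.

After switching to Spring AI's official MessageChatMemoryAdvisor, both reads and writes are handled automatically by the Advisor.

Understand the three layers before starting

Look at this relationship diagram first, otherwise the pieces are easy to mix up: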

graph TB
    Advisor[MessageChatMemoryAdvisor<br>before: read history and inject it into the prompt<br>after: write the current exchange] --> Memory
    Memory[MessageWindowChatMemory<br>implements ChatMemory<br>window truncation: maxMessages] --> Repo
    Repo[JdbcChatMemoryRepository<br>implements ChatMemoryRepository<br>actual reads and writes against PostgreSQL]

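Three layers, three responsibilities. Do not assign a JdbcChatMemoryRepository directly to a ChatMemory: it implements ChatMemoryRepository, which is a different interface.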
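
The layering is easier to hold in your head with a toy model. The sketch below uses simplified stand-in interfaces (`Repo`, `Memory`), not Spring AI's real `ChatMemoryRepository` and `ChatMemory`, and its truncation simply keeps the newest `maxMessages` entries. The real MessageWindowChatMemory may treat some message types differently, so take this as an illustration of who owns which responsibility and nothing more.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Sketch only: stand-ins for the storage layer and the windowing layer. */
final class MemoryLayersSketch {
    record Msg(String type, String content) {}

    /** Storage layer: plays the role of ChatMemoryRepository. */
    interface Repo {
        List<Msg> findByConversationId(String conversationId);
        void saveAll(String conversationId, List<Msg> messages);
    }

    /** Memory layer: plays the role of ChatMemory. This is the type an Advisor would depend on. */
    interface Memory {
        void add(String conversationId, Msg message);
        List<Msg> get(String conversationId);
    }

    static final class InMemoryRepo implements Repo {
        private final Map<String, List<Msg>> store = new HashMap<>();

        public List<Msg> findByConversationId(String conversationId) {
            return new ArrayList<>(store.getOrDefault(conversationId, List.of()));
        }

        public void saveAll(String conversationId, List<Msg> messages) {
            store.put(conversationId, new ArrayList<>(messages));
        }
    }

    /** Wraps a Repo and keeps only the newest maxMessages entries per conversation. */
    static final class WindowMemory implements Memory {
        private final Repo repo;
        private final int maxMessages;

        WindowMemory(Repo repo, int maxMessages) {
            this.repo = repo;
            this.maxMessages = maxMessages;
        }

        public void add(String conversationId, Msg message) {
            List<Msg> all = repo.findByConversationId(conversationId);
            all.add(message);
            int from = Math.max(0, all.size() - maxMessages);
            repo.saveAll(conversationId, all.subList(from, all.size()));
        }

        public List<Msg> get(String conversationId) {
            return repo.findByConversationId(conversationId);
        }
    }
}
```

In this model `Memory m = new InMemoryRepo();` does not compile, for the same reason the real assignment fails: the repository only stores, and the window wrapper is what satisfies the memory interface.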

Configuration code

@Configuration
public class ChatMemoryConfig {

    // Note: `dict` is the project's DB-backed config accessor; its declaration is omitted in this excerpt.

    @Bean
    public JdbcChatMemoryRepository jdbcChatMemoryRepository(JdbcTemplate jdbcTemplate) {
        // Auto-creates the table (idempotent); PostgreSQL dialect
        return JdbcChatMemoryRepository.builder()
                .jdbcTemplate(jdbcTemplate)
                .dialect(new PostgresChatMemoryRepositoryDialect())
                .build();
    }

    @Bean
    public ChatMemory chatMemory(JdbcChatMemoryRepository repository) {
        // maxMessages comes from DB config so ops can tune it (read once at startup).
        // One round = one user message + one assistant message, hence the * 2.
        int maxMessages = dict.getInt("session.history_max_rounds", 10) * 2;
        return MessageWindowChatMemory.builder()
                .chatMemoryRepository(repository)
                .maxMessages(maxMessages)
                .build();
    }
}

The auto-created table structure:

CREATE TABLE IF NOT EXISTS SPRING_AI_CHAT_MEMORY (
    conversation_id  VARCHAR(255)  NOT NULL,
    content          TEXT          NOT NULL,
    type             VARCHAR(50)   NOT NULL,   -- USER / ASSISTANT / SYSTEM
    "timestamp"      TIMESTAMP     NOT NULL DEFAULT CURRENT_TIMESTAMP
);
-- timestamp is a keyword in PG, so the column name is double-quoted
CREATE INDEX IF NOT EXISTS idx_chat_memory_conv_id
    ON SPRING_AI_CHAT_MEMORY (conversation_id, "timestamp");

Agent-side usage (minimal)

public String chatWithTools(String userMessage, String systemPrompt, String sessionId) {
    var memoryAdvisor = MessageChatMemoryAdvisor.builder(chatMemory)
            .conversationId(sessionId)   // sessionId isolates each user's history
            .build();

    return chatClient.prompt()
            .system(systemPrompt)
            .user(userMessage)
            .advisors(
                    memoryAdvisor,   // inject history first, then run RAG
                    ragAdvisor
            )
            .tools(tools)
            .call()
            .content();
}

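All manual recordMemory() calls are deleted, and ConversationService no longer has to deal with this.

Pitfall: in Spring AI 1.1.x, MessageChatMemoryAdvisor.Builder has no windowSize() method. The window size is controlled by maxMessages on MessageWindowChatMemory; don't look for this setting at the Advisor layer.

Pitfall summary: ChatMemory dependencies

Before this migration I mixed up Spring AI's ChatMemory-related classes several times. Here is the relationship table: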

Class / interface	Jar	Responsibility
`ChatMemory`	`spring-ai-model`	Top-level interface; the Advisor depends on it
`MessageWindowChatMemory`	`spring-ai-model`	`ChatMemory` implementation; handles window truncation
`ChatMemoryRepository`	`spring-ai-model`	Storage-layer interface
`JdbcChatMemoryRepository`	`spring-ai-model-chat-memory-repository-jdbc`	JDBC storage implementation
`MessageChatMemoryAdvisor`	`spring-ai-client-chat`	Advisor; reads and writes history automatically

The pom needs only this one dependency; the rest arrive as transitive dependencies:

<dependency>
    <groupId>org.springframework.ai</groupId>
    <artifactId>spring-ai-model-chat-memory-repository-jdbc</artifactId>
</dependency>

The spring-ai-model-chat-memory artifact does not need to be added separately; ChatMemory and MessageWindowChatMemory both live in spring-ai-model.

3. Sensitive-word filtering: bidirectional, with configuration stored in the DB

Design approach

The sensitive-word list lives in the system_config table (as JSON), is loaded and cached by SensitiveWordService, and is applied at two points in ConversationService:

sequenceDiagram
    participant U as User input
    participant S as SensitiveWordService
    participant A as Agent
    participant O as AI output

    U ->> S: filter(userInput)
    S -->> A: filtered text
    A -->> S: filter(aiResponse)
    S -->> O: filtered reply

Core implementation

@Service
public class SensitiveWordServiceImpl implements SensitiveWordService {

    // Loaded from the DB, cached in memory
    private volatile Set<String> sensitiveWords = new HashSet<>();

    @Override
    public String filter(String text) {
        if (!isEnabled() || text == null) return text;

        String result = text;
        String replaceChar = getReplaceChar();  // defaults to "***"
        for (String word : sensitiveWords) {
            result = result.replace(word, replaceChar);
        }
        return result;
    }

    @Override
    public boolean containsSensitiveWord(String text) {
        if (!isEnabled() || text == null) return false;
        return sensitiveWords.stream().anyMatch(text::contains);
    }
}
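
One subtlety of replacing word by word over a HashSet: when one sensitive word is a prefix of another (say "ab" and "abc"), the result depends on iteration order, and the longer word can end up only partly masked. A possible alternative, sketched here as an assumption and not as the project's implementation, compiles the list into a single regex alternation ordered longest-first, so each position is matched against the longest word first and the text is scanned once.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/** Sketch only: single-pass, longest-first sensitive-word masking. */
final class SensitiveWordFilterSketch {
    private final Pattern pattern;      // null when there is nothing to filter
    private final String replacement;

    SensitiveWordFilterSketch(Set<String> words, String replacement) {
        this.replacement = replacement;
        List<String> ordered = words.stream()
                .filter(w -> w != null && !w.isEmpty())
                .sorted(Comparator.<String>comparingInt(String::length).reversed())
                .toList();
        // Pattern.quote keeps words containing regex metacharacters literal.
        this.pattern = ordered.isEmpty() ? null
                : Pattern.compile(ordered.stream()
                        .map(Pattern::quote)
                        .collect(Collectors.joining("|")));
    }

    String filter(String text) {
        if (text == null || pattern == null) {
            return text;
        }
        // quoteReplacement protects replacements that contain '$' or '\'.
        return pattern.matcher(text).replaceAll(Matcher.quoteReplacement(replacement));
    }

    boolean containsSensitiveWord(String text) {
        return text != null && pattern != null && pattern.matcher(text).find();
    }
}
```

The compiled pattern would need rebuilding whenever the word list is reloaded, which fits naturally into a cache-refresh step.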

Hooking into ConversationService

public String process(String userId, String source, String message) {
    // 0. Filter the user input
    String filteredMessage = message;
    if (sensitiveWordService.isEnabled() && sensitiveWordService.isFilterUserInput()) {
        if (sensitiveWordService.containsSensitiveWord(message)) {
            filteredMessage = sensitiveWordService.filter(message);
            log.warn("User input contained sensitive words; filtered");
        }
    }

    // ... normal conversation flow ...

    // 9. Filter the AI output
    if (sensitiveWordService.isEnabled() && sensitiveWordService.isFilterAiOutput()) {
        response = sensitiveWordService.filter(response);
    }

    return response;
}

DB configuration example:

INSERT INTO system_config (config_type, config_key, config_value) VALUES
('JSON', 'sensitive.words',       '["违禁词1","违禁词2"]'),
('JSON', 'sensitive.replace_char','***'),
('JSON', 'sensitive.enabled',     'true'),
('JSON', 'sensitive.filter_user_input',  'true'),
('JSON', 'sensitive.filter_ai_output',   'true');

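Pitfall: the first load of the sensitive-word list depends on the DB connection. If the DB has no data yet when @PostConstruct runs, sensitiveWords will be an empty set. Add an isEmpty() check in the init method and log a warn when it is empty.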
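
A minimal sketch of what that init check could look like, written so the same method can later serve as a cache refresh. The `loader` supplier is a hypothetical stand-in for the system_config query, and `java.util.logging` is used only to keep the example free of external dependencies.

```java
import java.util.Collection;
import java.util.Set;
import java.util.function.Supplier;
import java.util.logging.Logger;

/** Sketch only: load-or-reload with an empty-list warning. */
final class SensitiveWordCacheSketch {
    private static final Logger LOG = Logger.getLogger(SensitiveWordCacheSketch.class.getName());

    private final Supplier<Collection<String>> loader;   // hypothetical: wraps the DB query
    private volatile Set<String> words = Set.of();

    SensitiveWordCacheSketch(Supplier<Collection<String>> loader) {
        this.loader = loader;
    }

    /** Call once at startup and again whenever the list changes. Returns the number of words loaded. */
    int reload() {
        Collection<String> loaded = loader.get();
        Set<String> next = (loaded == null) ? Set.of() : Set.copyOf(loaded);
        if (next.isEmpty()) {
            // Warn but keep serving: an empty list means "filter nothing", not "block everything".
            LOG.warning("Sensitive-word list is empty; filtering is effectively disabled");
        }
        words = next;   // single volatile write: readers see either the old set or the new one
        return next.size();
    }

    Set<String> current() {
        return words;
    }
}
```

Swapping in an immutable copy, instead of mutating the live set, means a request that is mid-filter never observes a half-updated list.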

4. The full conversation flow: tying the three pieces together

The complete execution chain of ConversationService.process() after the upgrade:

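User input
  │
  ▼
Sensitive-word filtering (user input)
  │
  ▼
Session create/restore (Redis + PostgreSQL)
  │
  ▼
Emotion pre-check (ANGRY → hand off to a human directly, skipping the Agents)
  │
  ▼
Intent recognition (GLM few-shot → accept if confidence ≥ 0.3, otherwise keyword fallback)
  │
  ▼
ConversationContext placed in a ThreadLocal (Tools read userId/sessionId from it)
  │
  ▼
CustomerSupportRouter.chat()
  ├─ ComplaintAgent (emotion or complaint intent)
  ├─ PostSalesAgent (order/refund/logistics)
  └─ PreSalesAgent (product/knowledge base/other)
      │
      ├─ MessageChatMemoryAdvisor (reads SPRING_AI_CHAT_MEMORY)
      ├─ KnowledgeRetrievalAdvisor (RAG retrieval)
      └─ @Tool calls (ReAct style)
  │
  ▼
Sensitive-word filtering (AI output)
  │
  ▼
Record messages to PostgreSQL + return the reply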
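
The ThreadLocal step in the chain above deserves care, because a context that is never cleared leaks into the next request served by the same pooled thread. A minimal sketch of a holder that always cleans up follows; the class and method names are hypothetical and the project's actual ConversationContext may look different.

```java
import java.util.function.Supplier;

/** Sketch only: bind a conversation context to the current thread for the duration of one call. */
final class ConversationContextSketch {
    record Context(String userId, String sessionId) {}

    private static final ThreadLocal<Context> HOLDER = new ThreadLocal<>();

    /** Runs body with ctx bound to this thread, and always unbinds it afterwards. */
    static <T> T callWith(Context ctx, Supplier<T> body) {
        HOLDER.set(ctx);
        try {
            return body.get();
        } finally {
            HOLDER.remove();   // runs on success and on exception
        }
    }

    /** What a Tool would call to find out which user and session it is acting for. */
    static Context current() {
        Context ctx = HOLDER.get();
        if (ctx == null) {
            throw new IllegalStateException("No conversation context bound to this thread");
        }
        return ctx;
    }
}
```

One caveat: a plain ThreadLocal is only visible on the thread that set it, so this pattern assumes tool calls run on the same thread as the request.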

Degradation chain design

try {
    // Main path: Router + Agent
    response = router.chat(routingContext, message, emotionStrategy, sessionId);
} catch (Exception e) {
    try {
        // First-level fallback: keyword rule routing
        response = fallbackRoute(userId, message, emotion);
    } catch (Exception e2) {
        // Second-level fallback: hand off to a human directly
        response = "系统异常，已为您转接人工客服。\n"
                 + humanTransferTool.transferToHuman("系统异常，自动转人工");
    }
}
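
Nested try/catch works for two levels but gets awkward if more are added. The same chain can be expressed as an ordered list of attempts, sketched below with hypothetical names; each attempt stands in for one of the calls above.

```java
import java.util.List;
import java.util.function.Supplier;

/** Sketch only: try each attempt in order and return the first one that succeeds. */
final class FallbackChainSketch {
    static String firstSuccessful(List<Supplier<String>> attempts, Supplier<String> lastResort) {
        for (Supplier<String> attempt : attempts) {
            try {
                return attempt.get();
            } catch (RuntimeException e) {
                // In real code: log which level failed, then fall through to the next one.
            }
        }
        // The last resort is deliberately not wrapped: if it throws, the caller has to know.
        return lastResort.get();
    }
}
```

In the article's snippet the final branch calls humanTransferTool.transferToHuman(), which could itself throw. Whichever style is used, it is worth deciding explicitly what the user sees when even the last level fails.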

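This approach vs. before the refactor

Dimension	Before	After
Number of Agents	1 do-everything Agent	3 specialist Agents + rule-based routing
Prompt length	500+ characters, a grab bag	≤ 200 characters per Agent, focused and clear
Multi-turn memory	Hand-written Redis + manual recordMemory()	Spring AI's official JdbcChatMemoryRepository, read and written automatically by the Advisor
Memory dependency	Redis (extra deployment cost)	PostgreSQL, shared with the business DB
Sensitive words	None	Bidirectional filtering, config stored in the DB, manageable from the ops console
Intent recognition	Single-point LLM recognition	LLM + keyword fallback as a double safeguard
Degradation strategy	None	Three levels: Router failure → keyword rules → human handoff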

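A few takeaways after finishing

On the multi-Agent split: at first 3 Agents felt like over-engineering, but once it was running the clearest win was much easier debugging. When the complaint Agent says something odd, I only need to read ComplaintAgent's Prompt and context instead of hunting through an 800-character catch-all Prompt.

On Spring AI ChatMemory: the official wrapper is less work than hand-written Redis, provided you understand the three-layer interface structure. The docs are not very intuitive here, and many people (me included) first try to use a JdbcChatMemoryRepository as a ChatMemory and get a compile error. Consider printing that relationship diagram and sticking it on your desk.

On sensitive-word filtering: it looks simple, but after finishing it a few operational details stood out: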

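Scenario	Suggested handling
User message contains a sensitive word	Replace it and reply normally; do not reject outright (avoids false positives hurting users)
AI output contains a sensitive word	Replace it and return, and also log a warn (so ops can review)
Sensitive-word list is empty	Log a warn prompting the admin to configure it; do not block normal requests
New sensitive word added	Trigger a cache refresh after updating the DB, otherwise it only takes effect after a service restart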

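The last row, cache refresh, is the part that is not finished yet: the SensitiveWordController management endpoints and the cache-refresh API are scheduled for the next iteration.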

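How to get the source code

In the WeChat official account 「亦暖筑序」, open the bottom menu item 【获取源码】 (Get Source Code) and pull directly from the Gitee repository.

Beyond what this post covers, the source also includes:

A complete KnowledgeRetrievalAdvisor implementation (hybridSearch: vector + keyword dual-path retrieval)
The few-shot Prompt template for intent recognition (with slot extraction, 6 intent types)
SensitiveWordService management endpoints (CRUD for sensitive words, still in progress)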

Appendix: pitfall quick-reference table

A roundup of the pitfalls in this post, for quick lookup:

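Pitfall	Symptom	Fix
Using 5 Agents	`IntentType.RAG` covers several Agents at once, so some Agents are never routed to	Trim to 3 so that IntentType and Agent map one-to-one
`spring-ai-alibaba-graph` 1.1.2.1	Maven download returns 404	Use the hand-written rule router `CustomerSupportRouter` instead
Assigning `JdbcChatMemoryRepository` to `ChatMemory`	Compile error: incompatible types	Wrap it in `MessageWindowChatMemory`, then assign that to `ChatMemory`
`windowSize()` not found on `MessageChatMemoryAdvisor`	Compile error: cannot find method	The window is controlled by `MessageWindowChatMemory.maxMessages`, not at the Advisor layer
`spring-ai-model-chat-memory` fails to download	`ChatMemory` symbol not found	It actually lives in `spring-ai-model`; no separate dependency needed
Table creation fails on `timestamp`	Conflict with a PG keyword	Double-quote the column name: `"timestamp"`
Sensitive-word list is empty at startup	DB has no data, or an initialization-order problem	Add an `isEmpty()` check, log a warn, do not block requests
LLM intent confidence fails to parse	Incomplete JSON returned under high concurrency, NPE	Add try-catch + keyword fallback; use rule matching when confidence is null
Lombok @Slf4j compilation fails with 36 errors	Looks like the annotation processor is not running	Root cause was a duplicate method definition in the code; fix code errors first, then investigate the annotation processor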

Every row in this table is a pitfall actually hit, not padding.

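Next steps

That wraps up this post. The system's main path is now reasonably complete.

But once the main path works, the first thing to add is not RAG but security: an AI endpoint with no authentication and no rate limiting is essentially running naked in production.

The next post covers exactly that:

[04] Hardening the AI Customer Service System: JWT Authentication + Bucket4j Three-Tier Rate Limiting

Topics: wiring a JWT Filter chain into Spring Security, three-tier token-bucket rate limiting (global / per user / LLM endpoint), a tracing Filter, and strict validation of production secrets. All of this is already fully implemented in the project; the next post takes it apart layer by layer.

The RAG knowledge base (vector retrieval, document chunking, hybrid retrieval) and the human-handoff flow (HumanTransferTool + the agent-workbench receiving side) are planned for separate later posts.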