Spring AI 1.1.2 + Neo4j: Enhancing RAG Retrieval with a Knowledge Graph (Part 2: Retrieval and Integration)

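Preface

In Part 1 we laid the groundwork for the knowledge graph: environment setup, entity model design, LLM-based entity extraction, and the Neo4j deployment. This part shows how to query that graph and plug the results into the RAG pipeline, producing a question-answering system enhanced by the knowledge graph.

Recap of Part 1:

✅ Set up the Spring AI 1.1.2 + Neo4j development environment
✅ Designed the entity node (KgEntityNode) and relationship (KgRelationship) models
✅ Implemented automatic extraction of entities and relationships with an LLM
✅ Deployed the Neo4j graph database and configured the base environment

Building on that, this part implements graph retrieval and the RAG integration end to end.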

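Implementation Steps

Implementing Knowledge Graph Retrieval

Retrieval starts by extracting keywords from the user's question and matching them against entity nodes in the graph. It then expands outward along relationship edges to discover related entities, and finally uses a mapping table to look up the knowledge-base documents those entities came from. The process is like following a vine to find the melon: start from one point in the knowledge network and reach everything connected to it.

⚠️ Note: this article extracts keywords with sliding-window slicing. The approach is simple but limited: it cannot match synonyms (for example, "退钱" will not match "退款", although both mean "refund"), and it produces some meaningless fragments.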
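To make the limitation concrete, here is a minimal standalone sketch of the same sliding-window idea used by the service further down. The class name KeywordWindowSketch and the helper addWindows are illustrative only and not part of the project; the regex and window sizes (4 and 2) are taken from the article's code.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: sliding-window keyword candidates for Chinese text.
class KeywordWindowSketch {

    // Runs of 2-8 consecutive Chinese characters (same pattern as the article)
    private static final Pattern CHINESE_PATTERN = Pattern.compile("[\\u4e00-\\u9fa5]{2,8}");

    static List<String> extractKeywords(String query) {
        Set<String> keywords = new LinkedHashSet<>(); // keeps insertion order, drops duplicates
        Matcher matcher = CHINESE_PATTERN.matcher(query);
        while (matcher.find()) {
            String segment = matcher.group();
            keywords.add(segment);            // the whole run is a candidate
            addWindows(segment, 4, keywords); // then every 4-character window
            addWindows(segment, 2, keywords); // then every 2-character window
        }
        return new ArrayList<>(keywords);
    }

    // Adds all windows of the given size, but only when the segment is longer than the window
    private static void addWindows(String segment, int size, Set<String> out) {
        if (segment.length() <= size) {
            return;
        }
        for (int i = 0; i + size <= segment.length(); i++) {
            out.add(segment.substring(i, i + size));
        }
    }
}
```

For the question "退款需要什么条件？" this yields 13 candidates: the full run "退款需要什么条件", five 4-character windows, and seven 2-character windows. Useful fragments such as "退款" and "条件" sit next to noise such as "款需" and "么条". A question phrased as "怎么退钱" never produces "退款", which is the synonym gap mentioned in the note.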

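Prerequisite: Create the Neo4j Full-Text Index

The full-text index must exist in Neo4j before graph retrieval can run. Open the Neo4j Browser (http://localhost:7474) and execute the following statement: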

CREATE FULLTEXT INDEX kg_entity_name_idx IF NOT EXISTS
FOR (n:KgEntity)
ON EACH [n.name];

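This index backs fuzzy matching on entity names and returns a relevance score with each hit. How well Chinese text is tokenized depends on the analyzer the index uses: the statement above relies on Neo4j's default analyzer, and a CJK-aware analyzer (for example cjk, configured through the index's fulltext.analyzer option) may segment Chinese names better. After creating the index, run SHOW INDEXES to check its status.

Here is the implementation: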

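// Project classes from Part 1 (KgEntityNode, KgEntityRepository, KgSearchService,
// AiKgEntitySource, AiKgEntitySourceDao) come from the project's own packages.
import com.baomidou.mybatisplus.core.conditions.query.LambdaQueryWrapper;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections4.CollectionUtils; // assumed source of CollectionUtils
import org.springframework.data.neo4j.core.Neo4jClient;
import org.springframework.stereotype.Service;

import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

@Slf4j
@Service
@RequiredArgsConstructor
public class KgSearchServiceImpl implements KgSearchService {

    private final KgEntityRepository kgEntityRepository;
    private final Neo4jClient neo4jClient;
    private final AiKgEntitySourceDao aiKgEntitySourceDao;

    /**
     * Regex for Chinese fragments: runs of 2-8 consecutive Chinese characters.
     * Longer runs are split into consecutive chunks of at most 8 characters.
     */
    private static final Pattern CHINESE_PATTERN = Pattern.compile("[\\u4e00-\\u9fa5]{2,8}");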

    @Override
    public List<Long> searchRelatedKnowledgeIds(String query, int depth, int topK) {
        // 1. Extract candidate keywords
        List<String> keywords = extractKeywords(query);
        if (keywords.isEmpty()) {
            return Collections.emptyList();
        }

        // 2. Look up matching entity nodes in the full-text index
        List<KgEntityNode> matchedEntities = searchEntities(keywords, topK);
        if (matchedEntities.isEmpty()) {
            return Collections.emptyList();
        }

        // 3. Collect the entityId of each matched entity
        List<String> entityIds = matchedEntities.stream()
                .map(KgEntityNode::getEntityId)
                .collect(Collectors.toList());

        // 4. Expand the subgraph from the matched entities, up to the given depth
        Set<String> allEntityIds = new LinkedHashSet<>(entityIds);
        List<String> relatedEntityIds = findSubGraph(entityIds, depth);
        if (CollectionUtils.isNotEmpty(relatedEntityIds)) {
            allEntityIds.addAll(relatedEntityIds);
        }

        // 5. Look up knowledgeId values through the entity-source mapping table
        List<AiKgEntitySource> sources = aiKgEntitySourceDao.selectList(
                new LambdaQueryWrapper<AiKgEntitySource>()
                        .in(AiKgEntitySource::getEntityId, allEntityIds)
                        .select(AiKgEntitySource::getKnowledgeId)
        );

        return sources.stream()
                .map(AiKgEntitySource::getKnowledgeId)
                .distinct()
                .collect(Collectors.toList());
    }

    /**
     * Extract candidate keywords (Chinese text only; Latin text and digits are ignored)
     */
    private List<String> extractKeywords(String query) {
        Set<String> keywords = new LinkedHashSet<>();
        Matcher matcher = CHINESE_PATTERN.matcher(query);

        while (matcher.find()) {
            String segment = matcher.group();
            // The original fragment is itself a candidate
            keywords.add(segment);

            // Split long fragments (more than 4 characters) with a sliding window
            if (segment.length() > 4) {
                for (int i = 0; i <= segment.length() - 4; i++) {
                    keywords.add(segment.substring(i, i + 4));
                }
            }
            if (segment.length() > 2) {
                for (int i = 0; i <= segment.length() - 2; i++) {
                    keywords.add(segment.substring(i, i + 2));
                }
            }
        }

        return new ArrayList<>(keywords);
    }

    /**
     * Search entities through the full-text index, deduplicated and capped at topK
     */
    private List<KgEntityNode> searchEntities(List<String> keywords, int topK) {
        Set<Long> seenIds = new HashSet<>();
        List<KgEntityNode> result = new ArrayList<>();

        for (String keyword : keywords) {
            if (result.size() >= topK) {
                break;
            }
            int remaining = topK - result.size();
            List<KgEntityNode> entities = kgEntityRepository.fullTextSearchByName(keyword, remaining);
            if (entities == null) {
                continue;
            }
            for (KgEntityNode entity : entities) {
                if (entity != null && entity.getId() != null && seenIds.add(entity.getId())) {
                    result.add(entity);
                }
            }
        }

        return result;
    }

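    /**
     * Query the subgraph around the given entities (Cypher)
     */
    private List<String> findSubGraph(List<String> entityIds, int depth) {
        // Safety limit: clamp depth to the range 1-5
        int safeDepth = Math.max(1, Math.min(depth, 5));

        // Cypher: start from the seed nodes and traverse relationships up to safeDepth hops.
        // safeDepth is a clamped int, so concatenating it into the query text is safe.
        String cypher = "MATCH (start:KgEntity) WHERE start.entityId IN $entityIds " +
                "MATCH (start)-[:KG_RELATION*1.." + safeDepth + "]-(related) " +
                "RETURN DISTINCT related.entityId AS entityId " +
                "LIMIT 50";  // cap the result size so the context does not grow too long

        return neo4jClient.query(cypher)
                .bind(entityIds).to("entityIds")
                .fetch()
                .all()
                .stream()
                .map(record -> record.get("entityId"))
                .filter(Objects::nonNull)  // skip nodes that lack an entityId property
                .map(Object::toString)
                .toList();
    }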
}

Repository definition:

public interface KgEntityRepository extends Neo4jRepository<KgEntityNode, Long> {

    /**
     * Search entity names through the full-text index, best score first
     */
    @Query("CALL db.index.fulltext.queryNodes('kg_entity_name_idx', $keyword) " +
            "YIELD node, score " +
            "RETURN node " +
            "ORDER BY score DESC " +
            "LIMIT $limit")
    List<KgEntityNode> fullTextSearchByName(@Param("keyword") String keyword, @Param("limit") int limit);

    /**
     * Delete all relationships associated with the given knowledge ID
     */
    @Query("MATCH (:KgEntity)-[r:KG_RELATION {knowledgeId: $knowledgeId}]->(:KgEntity) DELETE r")
    void deleteRelationsByKnowledgeId(@Param("knowledgeId") Long knowledgeId);

    /**
     * Delete orphan nodes that were created by the given knowledge ID and have no remaining relationships
     */
    @Query("MATCH (n:KgEntity {knowledgeId: $knowledgeId}) " +
            "WHERE NOT (n)-[:KG_RELATION]-() " +
            "DELETE n")
    void deleteOrphanNodesByKnowledgeId(@Param("knowledgeId") Long knowledgeId);

    /**
     * Delete all graph data
     */
    @Query("MATCH (n:KgEntity) DETACH DELETE n")
    void deleteAllGraphData();
}
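The repository hands keyword to the full-text query procedure unchanged, and that procedure interprets it as a Lucene query string. The extractor in this article only emits Chinese characters, so nothing needs escaping as written. If the keywords are ever extended to Latin text or raw user input, characters such as : or + would be read as query operators. The following is a minimal sketch of an escaping helper under that assumption; LuceneEscapeSketch and escape are hypothetical names, not part of the project, and the character list is based on the special characters of Lucene's classic query syntax.

```java
// Illustrative sketch: backslash-escape characters that Lucene's classic
// query syntax treats as operators, so a keyword is matched literally.
class LuceneEscapeSketch {

    // Characters with special meaning in the classic Lucene query syntax
    private static final String SPECIAL = "+-&|!(){}[]^\"~*?:\\/";

    static String escape(String keyword) {
        StringBuilder sb = new StringBuilder(keyword.length() + 8);
        for (int i = 0; i < keyword.length(); i++) {
            char c = keyword.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\'); // prefix the special character with a backslash
            }
            sb.append(c);
        }
        return sb.toString();
    }
}
```

A caller would apply it just before the repository call, for example fullTextSearchByName(LuceneEscapeSketch.escape(keyword), remaining). Pure Chinese keywords pass through unchanged.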

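The retrieval flow comes down to five steps: (1) extract keyword fragments from the user's question with a regex and sliding windows; (2) match entity nodes in the Neo4j full-text index, ordered by relevance; (3) expand outward from the matched entities along relationship edges, to a configurable depth that the code clamps to 1-5; (4) collect the IDs of all related entities, capped at 50; (5) look up the corresponding knowledge-base document IDs through the mapping table. That completes the mapping from "user question" to "related knowledge".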
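The expansion step concatenates the depth into the Cypher text instead of binding it as a parameter. To my knowledge, Cypher does not accept a parameter as the bound of a variable-length pattern such as *1..n, which is why the value has to be clamped before it reaches the query string. The sketch below isolates that piece so it can be checked on its own; SubGraphCypherSketch, clampDepth, and buildCypher are illustrative names, while the query text and the limits (5 and 50) are the ones used in the article.

```java
// Illustrative sketch: build the subgraph query with a clamped traversal depth.
class SubGraphCypherSketch {

    static final int MAX_DEPTH = 5;     // upper bound on hops, as in the article
    static final int RESULT_LIMIT = 50; // cap on returned entity IDs, as in the article

    // Force the requested depth into the range 1..MAX_DEPTH
    static int clampDepth(int depth) {
        return Math.max(1, Math.min(depth, MAX_DEPTH));
    }

    // The depth is an int that has been clamped, so inlining it cannot inject Cypher.
    // The entity IDs stay a bound parameter ($entityIds).
    static String buildCypher(int depth) {
        int safeDepth = clampDepth(depth);
        return "MATCH (start:KgEntity) WHERE start.entityId IN $entityIds "
                + "MATCH (start)-[:KG_RELATION*1.." + safeDepth + "]-(related) "
                + "RETURN DISTINCT related.entityId AS entityId "
                + "LIMIT " + RESULT_LIMIT;
    }
}
```

Because the relationship pattern has no arrow, the traversal follows KG_RELATION edges in both directions. A requested depth of 0 becomes 1 and a requested depth of 99 becomes 5.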

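Integrating into the RAG Pipeline

The core idea of the RAG integration is dual-path retrieval: vector retrieval first finds semantically similar documents, graph retrieval then discovers related knowledge, and the two result sets are merged and deduplicated. The key point is that graph retrieval does not return documents directly. It returns the related knowledge IDs, which are then used for a second pass against the vector store with a similarity threshold of 0.6. Relatedness comes from the graph, relevance comes from vector similarity, and documents that are linked in the graph but off-topic for the question are filtered out as noise.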
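The merge logic can be sketched without any Spring dependency. The example below is an in-memory stand-in under stated assumptions: ScoredDoc, DualPathMergeSketch, and merge are hypothetical names, and the similarity scores are made-up inputs that a real vector store would compute. Only the 0.6 threshold and the deduplicate-by-content rule come from the article.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch: merge first-pass vector hits with graph-selected documents.
class DualPathMergeSketch {

    // A document with a precomputed similarity to the question (assumed input)
    record ScoredDoc(long id, String text, double score) {}

    static List<String> merge(List<ScoredDoc> vectorHits,
                              List<ScoredDoc> corpus,
                              Set<Long> graphIds,
                              double threshold) {
        Set<String> seen = new LinkedHashSet<>(); // preserves order, removes duplicate content

        // Path 1: documents found by plain vector retrieval
        for (ScoredDoc doc : vectorHits) {
            seen.add(doc.text());
        }
        // Path 2: documents the graph points to, kept only if similar enough
        for (ScoredDoc doc : corpus) {
            if (graphIds.contains(doc.id()) && doc.score() >= threshold) {
                seen.add(doc.text());
            }
        }
        return new ArrayList<>(seen);
    }
}
```

With a threshold of 0.6, a graph-linked document scoring 0.71 is added to the context, one scoring 0.35 is dropped, and a document found by both paths appears once.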

Here is the implementation:

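// Project classes (DynamicChatClientFactory, AiModelConfigService, AiRolePromptService,
// AiModelConfig, AiRolePrompt, ChatService, KgSearchService) come from the project's own packages.
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.collections4.CollectionUtils; // assumed source of CollectionUtils
import org.apache.commons.lang3.StringUtils;            // assumed source of StringUtils
import org.springframework.ai.chat.client.ChatClient;
import org.springframework.ai.chat.memory.ChatMemory;
import org.springframework.ai.document.Document;
import org.springframework.ai.vectorstore.SearchRequest;
import org.springframework.ai.vectorstore.VectorStore;
import org.springframework.stereotype.Service;
import reactor.core.publisher.Flux;

import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

@Slf4j
@Service
@RequiredArgsConstructor
public class ChatServiceImpl implements ChatService {

    private final DynamicChatClientFactory dynamicChatClientFactory;
    private final AiModelConfigService aiModelConfigService;
    private final AiRolePromptService aiRolePromptService;
    private final VectorStore vectorStore;
    private final KgSearchService kgSearchService;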

    @Override
    public Flux<String> chatStream(String message, String conversationId) {
        // Build the ChatClient
        ChatClient chatClient = dynamicChatClientFactory.buildDefaultClient();

        // Load the role configuration
        AiModelConfig config = aiModelConfigService.getDefaultConfig();
        AiRolePrompt rolePrompt = null;
        boolean enableRag = true;
        if (config != null && config.getRoleId() != null) {
            rolePrompt = aiRolePromptService.getById(config.getRoleId());
            if (rolePrompt != null && rolePrompt.getIsRagEnabled() != null) {
                enableRag = rolePrompt.getIsRagEnabled() == 1;
            }
        }

        var prompt = chatClient.prompt()
                .advisors(a -> a.param(ChatMemory.CONVERSATION_ID, conversationId));

        String finalUserInput = message;

        // RAG-enhanced retrieval
        if (enableRag && rolePrompt != null) {
            // 1. Vector retrieval settings
            int topK = rolePrompt.getTopK() != null && rolePrompt.getTopK() > 0
                    ? rolePrompt.getTopK() : 5;
            Double threshold = rolePrompt.getSimilarityThreshold();

            SearchRequest.Builder builder = SearchRequest.builder()
                    .query(message)
                    .topK(topK);
            if (threshold != null) {
                builder = builder.similarityThreshold(threshold);
            }

            // Metadata filter (optional)
            if (StringUtils.isNotBlank(rolePrompt.getCustomFilter())) {
                builder = builder.filterExpression(rolePrompt.getCustomFilter());
            }

            SearchRequest searchRequest = builder.build();

            // 2. Run the vector retrieval
            var docs = vectorStore.similaritySearch(searchRequest);

            List<String> contextParts = new ArrayList<>();
            Set<String> seenContents = new LinkedHashSet<>();
            for (Document doc : docs) {
                String content = doc.getText();
                if (content == null) {
                    content = doc.toString();
                }
                if (seenContents.add(content)) {
                    contextParts.add(content);
                }
            }

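            // 3. Graph-enhanced retrieval (optional)
            boolean enableGraph = rolePrompt.getIsGraphEnabled() != null
                    && rolePrompt.getIsGraphEnabled() == 1;
            if (enableGraph) {
                // Null-safe reads; the fallbacks (depth 2, topK 5) are assumed defaults
                Integer depthCfg = rolePrompt.getGraphDepth();
                Integer topKCfg = rolePrompt.getGraphTopK();
                int graphDepth = depthCfg != null && depthCfg > 0 ? depthCfg : 2;
                int graphTopK = topKCfg != null && topKCfg > 0 ? topKCfg : 5;

                // 3.1 Find the related knowledge IDs through the graph
                List<Long> knowledgeIds = kgSearchService.searchRelatedKnowledgeIds(
                        message, graphDepth, graphTopK);

                if (CollectionUtils.isNotEmpty(knowledgeIds)) {
                    // 3.2 Key step: take the IDs found by the graph back to the vector store
                    //     for a second search, filtered by similarity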
                    SearchRequest graphSearchRequest = SearchRequest.builder()
                            .query(message)
                            .topK(Math.min(knowledgeIds.size(), 20))
                            .similarityThreshold(0.6)
                            .filterExpression("id in [" + knowledgeIds.stream()
                                    .map(String::valueOf)
                                    .collect(Collectors.joining(",")) + "]")
                            .build();

                    List<Document> graphDocs = vectorStore.similaritySearch(graphSearchRequest);

                    for (Document doc : graphDocs) {
                        String content = doc.getText();
                        if (content != null && seenContents.add(content)) {
                            contextParts.add(content);
                        }
                    }
                }
            }

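            // 4. Merge the context and build the prompt
            String context = String.join("\n\n---\n\n", contextParts);

            // Note: if no RAG template is configured, the retrieved context is not
            // added to the prompt and the raw message is sent as-is.
            String template = rolePrompt.getRagTemplate();
            if (StringUtils.isNotBlank(template)) {
                finalUserInput = template
                        .replace("{context}", context)
                        .replace("{query}", message);
            }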
        }

        // 5. Stream the generated answer
        return prompt.user(finalUserInput)
                .stream()
                .content();
    }
}

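The RAG flow has five steps:

1. Configure the vector retrieval parameters (topK, similarity threshold, metadata filter)
2. Run vector retrieval to fetch semantically similar documents
3. If graph retrieval is enabled, find the related knowledge IDs through the graph and run a second search against the vector store (a similarity threshold of 0.6 filters out noise)
4. Merge and deduplicate the results of vector retrieval and graph retrieval
5. Fill the template with the context and the question to build the final prompt, then stream the output

The result is dual-path enhancement: vector retrieval (find what is similar) plus graph retrieval (find what is related).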
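Two small pieces of string assembly carry steps 3 and 5: the ID filter passed to the second vector search, and the template substitution. The sketch below pulls them out as pure functions so they can be checked in isolation. PromptAssemblySketch, buildIdFilter, and fillTemplate are illustrative names, and the template text in the usage note is a made-up example. The filter format, the separator, and the {context} and {query} placeholders are the ones in the article's code. The filter assumes the vector store keeps the knowledge ID under a metadata key named id, as the article's code does.

```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative sketch: the string-building steps of the RAG flow.
class PromptAssemblySketch {

    // Builds a filter such as "id in [3,7,9]" from the graph-selected knowledge IDs
    static String buildIdFilter(List<Long> knowledgeIds) {
        return "id in [" + knowledgeIds.stream()
                .map(String::valueOf)
                .collect(Collectors.joining(",")) + "]";
    }

    // Joins the context parts with a visible separator and fills the template placeholders
    static String fillTemplate(String template, List<String> contextParts, String query) {
        String context = String.join("\n\n---\n\n", contextParts);
        return template
                .replace("{context}", context)
                .replace("{query}", query);
    }
}
```

For example, a template of "Context:\n{context}\n\nQuestion: {query}" with the parts "A" and "B" produces a prompt in which the two parts are separated by a --- line. One design caveat: the replacements run in sequence, so if a retrieved document happens to contain the literal text {query}, that text is also replaced by the question.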

Configuration

Complete application.yml

spring:
  # Spring AI settings
  ai:
    openai:
      api-key: ${OPENAI_API_KEY}
      base-url: https://api.siliconflow.cn  # endpoint of an OpenAI-compatible API
      embedding:
        options:
          model: Qwen/Qwen3-Embedding-0.6B  # embedding model

  # Neo4j settings
  neo4j:
    uri: bolt://localhost:7687
    authentication:
      username: neo4j
      password: your_password

# Custom settings
lanjii:
  ai:
    # Entity extraction settings for the knowledge graph
    graph:
      extraction:
        base-url: https://api.siliconflow.cn
        api-key: ${OPENAI_API_KEY}
        model: Qwen/Qwen2.5-7B-Instruct  # a small model is recommended here to keep extraction cost low
        timeout-seconds: 180  # timeout in seconds

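Results Compared

Traditional RAG (vector retrieval only)

User question: "What are the conditions for a refund?"

Retrieved:

Hits documents that are semantically close to "refund"
May miss related knowledge such as "order status" and "payment method"

Knowledge-graph-enhanced RAG

User question: "What are the conditions for a refund?"

Retrieved:

Vector retrieval: hits the "refund process" document
Graph retrieval:
- Matched entities: refund, refund request
- Expanded relations: order status, payment method, refund time limit, review process
- Reverse lookup: fetches the documents linked to those entities, then keeps the ones that pass the similarity threshold

Advantage: the answer draws not only on documents directly about refunds but also on upstream and downstream knowledge such as orders and payments, which makes it more complete.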
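The expansion in this comparison can be reproduced on a toy graph held in memory. The sketch below is only an illustration: the edges are invented for the example (they are not the project's data), and GraphExpansionSketch, undirected, and expand are hypothetical names. It performs a breadth-first expansion with a hop limit, which reaches the same set of nodes as the undirected *1..depth traversal combined with the seed entities, ignoring the LIMIT 50 cap.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Illustrative sketch: depth-limited expansion over an in-memory undirected graph.
class GraphExpansionSketch {

    // Builds an undirected adjacency map from pairs of entity names
    static Map<String, List<String>> undirected(String[][] edges) {
        Map<String, List<String>> adjacency = new LinkedHashMap<>();
        for (String[] edge : edges) {
            adjacency.computeIfAbsent(edge[0], k -> new ArrayList<>()).add(edge[1]);
            adjacency.computeIfAbsent(edge[1], k -> new ArrayList<>()).add(edge[0]);
        }
        return adjacency;
    }

    // Returns the seeds plus every entity reachable within the clamped number of hops
    static Set<String> expand(Map<String, List<String>> adjacency,
                              Collection<String> seeds, int depth) {
        int safeDepth = Math.max(1, Math.min(depth, 5)); // same 1-5 clamp as the service
        Set<String> visited = new LinkedHashSet<>(seeds);
        List<String> frontier = new ArrayList<>(seeds);
        for (int hop = 0; hop < safeDepth && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String node : frontier) {
                for (String neighbor : adjacency.getOrDefault(node, List.of())) {
                    if (visited.add(neighbor)) { // true only the first time a node is seen
                        next.add(neighbor);
                    }
                }
            }
            frontier = next;
        }
        return visited;
    }
}
```

With the invented edges refund-refund request, refund-order status, refund-payment method, and order status-logistics, expanding from "refund" at depth 1 reaches four entities, and depth 2 additionally reaches "logistics". Raising the depth widens the context quickly, which is one reason the article pairs the expansion with a similarity threshold.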

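Source Code and Online Demo

Full source: gitee.com/leven2018/l…

Stars ⭐ and forks are welcome. The project contains all the code covered in this article (MCP integration, dynamic multi-model switching, the RAG knowledge base, and more).

Online demo: http://106.54.167.194/admin/index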

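Summary

Knowledge-graph-enhanced RAG built on Spring AI 1.1.2 + Neo4j addresses a weakness of traditional RAG, which looks only at semantic similarity and ignores how pieces of knowledge relate to each other. Its main advantages are:

Discovering implicit relations: graph traversal surfaces related knowledge that vector retrieval misses
Improving recall: related documents are pulled in through the graph, while the similarity threshold on the second pass protects precision
Improving explainability: entities and relationships make the retrieval process more transparent

Good fits:

✅ Enterprise knowledge-base Q&A (for example customer service, internal documentation)
✅ Policy and regulation consulting (for example tax, law)
✅ Product manual lookup (for example operating guides, troubleshooting)
✅ Medical and health consulting (for example diseases, drugs, treatment plans)
✅ Education and training (for example courses, exams, learning paths)

Poor fits:

❌ Scenarios with very strict freshness requirements (building the graph introduces a delay)
❌ Scenarios where the knowledge has no clear relations (for example news feeds)
❌ Pure fact lookups (for example weather, stock prices)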