Knowledge-Graph RAG with NestJS: Building a Graph Database (Apache AGE) from Ontology Engineering

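1. Introduction

This project is built on the NestJS framework. Taking ontology engineering as its prerequisite, it implements knowledge-graph-based RAG on PostgreSQL with the Apache AGE extension.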

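2. Introduction to Ontology Engineering

This section briefly introduces ontology engineering and its relationship to knowledge graphs.

2.1 What is ontology engineering?

As the name suggests, it is the engineering work of building an "ontology".

    What is an "ontology"?

Ontology: a structured knowledge model; a formal, machine-readable description of the concepts in a domain, the relationships between those concepts, and the constraints on them.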

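A complete domain ontology contains five kinds of core elements:

    Class: a category of core entities in the domain, e.g. "person", "disease"

    Property/Relation: an association involving a concept, of two kinds:

        (1) Data property (DataProperty): links a concept to a basic data value, e.g. "a person has an age"

        (2) Object property (ObjectProperty): links a concept to another concept, e.g. "a person has a disease"

    Individual: a concrete instance of a concept, e.g. "Zhang San" is an instance of "person"

    Axiom: a rule that holds in the domain, e.g. "a medical record can belong to only one patient"

    Constraint: a restriction on a property or relation, e.g. "age must not be negative"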
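To make the five elements concrete, here is a minimal TypeScript sketch of how they could be represented as plain data. All type and field names (OntologyClass, functional, min, and so on) are hypothetical and chosen for illustration only; a real OWL ontology is far more expressive than this.

```typescript
// Hypothetical shapes for illustration; none of these names come from the project.
interface OntologyClass { id: string; parent?: string }   // Class, with an optional superclass
interface ObjectProperty {
  id: string;
  domain: string;
  range: string;
  functional?: boolean;                                    // axiom-like flag: at most one value per subject
}
interface DataProperty { id: string; domain: string; min?: number } // constraint: lower bound on the value
interface Individual { id: string; type: string; data: Record<string, number> }

interface Ontology {
  classes: OntologyClass[];
  objectProperties: ObjectProperty[];
  dataProperties: DataProperty[];
  individuals: Individual[];
}

const medicalOntology: Ontology = {
  classes: [{ id: 'person' }, { id: 'patient', parent: 'person' }, { id: 'disease' }],
  objectProperties: [{ id: 'hasDisease', domain: 'person', range: 'disease' }],
  dataProperties: [{ id: 'hasAge', domain: 'person', min: 0 }],
  individuals: [{ id: 'ZhangSan', type: 'patient', data: { hasAge: 30 } }],
};

// Walk up the class hierarchy: is `cls` equal to, or a subclass of, `ancestor`?
// Assumes the hierarchy contains no cycles.
function isSubclassOf(onto: Ontology, cls: string, ancestor: string): boolean {
  let current: string | undefined = cls;
  while (current !== undefined) {
    if (current === ancestor) return true;
    const currentId: string = current;
    current = onto.classes.find(c => c.id === currentId)?.parent;
  }
  return false;
}
```

With this data, isSubclassOf(medicalOntology, 'patient', 'person') holds, which is the kind of fact a reasoner derives from the class hierarchy.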

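2.2 How ontology engineering relates to knowledge graphs

Knowledge graph = ontology (schema) + instance data (individuals).

Ontology engineering focuses on building and maintaining the schema layer:

    Defining the classes: person, doctor, patient, disease, symptom, ...

    Defining the relations: hasSymptom, hasAge, treat, ...

    Defining the constraints: the value of hasAge must not be negative, ...

Once the schema is fixed, instances (individuals) are added over time; these normalized, standardized instances make up the knowledge graph.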

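3. How to Build an Ontology

3.1 Requirements analysis

Define the application scenario of the ontology (e.g. "medical diagnostic reasoning").

Define the domain scope it covers (e.g. "only diseases and the drugs that treat them", or "every concept in the hospital system, including doctors, patients, and diseases").

Define the user requirements (e.g. "support automated symptom matching").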

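3.2 Domain knowledge acquisition

Sources: domain experts, specialist literature, existing data

Target outputs:

    A list of domain concepts (classes)

    A list of concept attributes (data properties)

    A list of relations between concepts (object properties/relations)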

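3.3 Ontology modeling

Goal: turn knowledge described in natural language into a formal representation that machines can interpret.

Method: use an ontology editor such as Protege (optionally with a graph database such as Neo4j for storage) and a formal language such as RDF or OWL to define concepts, relations, and axioms.

    (1) Build the concept hierarchy (e.g. "person → patient", "person → doctor").

    (2) Define attributes and relations (e.g. the "age" attribute of "person"; the "treat" relation between the "drug" and "disease" classes).

    (3) Add axioms and constraints (e.g. "age ≥ 0").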

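Common tools and technology stack

Ontology editing and storage tools:

    Protege: free and open source, supports OWL, and ships with a built-in reasoner; among the most commonly used tools in academia and industry.

    Neo4j: a graph database rather than an ontology editor; suited to storing large ontologies (concepts and relations stored as a graph) and supports efficient queries.

Formal languages:

    RDF (Resource Description Framework): a triple-based model for describing resources (concepts, instances) and their relationships; it is the underlying data format for ontologies.

    OWL (Web Ontology Language): the most widely used ontology language; built on RDF, it supports class hierarchies and logical constraints.

Reasoners:

    HermiT, FaCT++: used for ontology consistency checking and logical inference (e.g. automatically deriving subclass relationships, detecting contradictions).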

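This project uses Protege to build the ontology, with "medical diagnosis" as its domain. (The ontology here is deliberately simple and is meant only as a learning reference.)

    Under the top-level class owl:Thing (the default top-level class of an OWL ontology, representing the set of all individuals), three classes are defined: Department, MedicalEntity, and person.

    Department is the medical administration class. It has a subclass major representing medical specialties, which in turn has the subclasses orthopedics and dentistry representing doctors' specialties. Selecting dentistry in Protege shows, in the description panel on the right, that it is a subclass of major.

    MedicalEntity (medical entities) and person (people) are organized in the same way and are not described further.

    With the classes in place, the next step is to define properties. These are divided into object properties (associations between concepts) and data properties (associations between a concept and a basic data value).

    This project defines the following object properties: causeSymptom (causes a symptom), designTreatment (designs a treatment plan), hasDisease (has a disease), hasMajor (has a medical specialty), hasSymptom (has a symptom), hasTreatment (has a treatment plan), and treat (treats).

    Selecting causeSymptom shows that its domain is disease and its range is symptom. In other words, the causeSymptom object property applies to the disease class and points to the symptom class,

    giving the relation chain: disease → causeSymptom → symptom

    Every other object property likewise has its own domain and range, which describe the relationship between the former and the latter.

    This project defines one data property, hasAge. Its domain is patient and its range is xsd:integer[>= 0 , <= 120], meaning that hasAge values on the patient class are restricted to integers from 0 to 120 inclusive.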
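The range restriction on hasAge can be mirrored in application code. The following is a minimal sketch, assuming the check is done in TypeScript before a value is written; the names IntegerRange, hasAgeRange, and isValidAge are hypothetical and not part of the project.

```typescript
// Hypothetical helper mirroring the range xsd:integer[>= 0 , <= 120] from the ontology.
interface IntegerRange { min: number; max: number }

const hasAgeRange: IntegerRange = { min: 0, max: 120 };

// True only for integers inside the inclusive range.
function isValidAge(value: unknown, range: IntegerRange = hasAgeRange): boolean {
  return (
    typeof value === 'number' &&
    Number.isInteger(value) &&
    value >= range.min &&
    value <= range.max
  );
}
```

For example, isValidAge(20) is true, while isValidAge(121), isValidAge(-1), and isValidAge(20.5) are all false.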

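    At this point the classes, object properties, and data properties have all been defined, so the skeleton of the ontology is complete. Its contents can be revised later as project requirements change.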

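    The next step is to add individuals (instances) to the ontology. This project adds several of them.

    For example, toothDecay is an individual of the disease class (its Types entry is disease). It has the causeSymptom object property, whose value is the individual toothache, an instance of the symptom class.

That is: toothDecay → causeSymptom → toothache

which corresponds to the schema-level chain: disease → causeSymptom → symptom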
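The correspondence between the instance-level triple and the schema-level chain can be checked mechanically. Below is a minimal sketch; the lookup tables and the function name conformsToSchema are hypothetical. Note that it treats domain and range as validation rules, which is a simplification: in OWL they are normally used to infer types rather than to reject data.

```typescript
type ClassName = 'disease' | 'symptom' | 'patient';

interface PropertySchema { domain: ClassName; range: ClassName }

// Only causeSymptom's domain and range are stated in the text, so only it is listed.
const objectPropertySchemas: Record<string, PropertySchema> = {
  causeSymptom: { domain: 'disease', range: 'symptom' },
};

// Class membership of the individuals described in the text.
const classOfIndividual: Record<string, ClassName> = {
  toothDecay: 'disease',
  toothache: 'symptom',
  PatientKate: 'patient',
};

// Does (subject, property, object) fit the property's declared domain and range?
function conformsToSchema(subject: string, property: string, object: string): boolean {
  const schema = objectPropertySchemas[property];
  if (!schema) return false; // unknown property
  return (
    classOfIndividual[subject] === schema.domain &&
    classOfIndividual[object] === schema.range
  );
}
```

Here conformsToSchema('toothDecay', 'causeSymptom', 'toothache') is true, while swapping subject and object yields false.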

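    Take PatientKate as another example. It is an individual of the patient class with the hasSymptom object property, whose value is toothache, and the hasTreatment object property, whose value is PatientKateTreatment. It also has the data property hasAge, with the value 20.

    That concludes the walkthrough of building the ontology. The following sections cover the development work built on top of it.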

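3.4 How is the ontology used once it is built?

    After the ontology is built in Protege, it can be exported as a TTL (Turtle) file or an OWL file, whose content is expressed in a formal language. The file can be parsed into triples to build a knowledge graph, or it can be created, read, updated, and deleted directly with tools or libraries that support RDF/OWL semantics.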

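4. Implementation

4.1 Project architecture

[Figure: architecture of the ontology-engineering RAG system]

    This project is a knowledge-graph question-answering system built on NestJS + PostgreSQL + Apache AGE + an LLM. It uses dual-mode storage and a hybrid retrieval strategy:

Core flow:

    1. Data ingestion: TTL files are parsed with N3.js and stored in both a relational table and the graph database

    2. Knowledge retrieval: the graph database is queried first for exact entity-relation lookup; if that fails or finds nothing, retrieval falls back to fuzzy search over the relational table

    3. Answer generation: the LLM is called to generate a natural-language answer from the retrieved knowledge-graph context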

4.2 Document parsing module

import { Injectable } from '@nestjs/common';
import { Parser as N3Parser } from 'n3';
import * as fs from 'fs';
import { Parser, Triple } from './parser.interface';

@Injectable()
export class TtlParserService implements Parser {
  async parse(filePath: string): Promise<Triple[]> {
    const parser = new N3Parser();
    const triples: Triple[] = [];

    const fileContent = await fs.promises.readFile(filePath, 'utf-8');

    return new Promise((resolve, reject) => {
      parser.parse(fileContent, (error, quad, prefixes) => {
        if (error) {
          reject(error);
        } else if (quad) {
          triples.push({
            subject: quad.subject.value,
            predicate: quad.predicate.value,
            object: quad.object.value,
          });
        } else {
          // A null quad means parsing has finished
          resolve(triples);
        }
      });
    });
  }
}

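    This TTL parser service uses the N3 library to parse an RDF Turtle file into a structured array of triples. The file is read asynchronously, and N3's per-quad callback is wrapped in a Promise that resolves once parsing completes.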
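One thing worth noting about the parser: it keeps only each term's value, so a literal such as the age 20 and an IRI such as http://example.org/toothache look the same downstream. The sketch below shows one possible way to keep that distinction, using the termType field that RDF/JS terms (and therefore N3 quads) expose. The interfaces TermLike and QuadLike are local stand-ins, and the objectIsLiteral field and the http://example.org/ namespace are assumptions, not part of the project's Triple type.

```typescript
// Minimal structural stand-ins for the RDF/JS term and quad shapes (subset only).
interface TermLike { termType: string; value: string }
interface QuadLike { subject: TermLike; predicate: TermLike; object: TermLike }

interface TypedTriple {
  subject: string;
  predicate: string;
  object: string;
  objectIsLiteral: boolean; // hypothetical extra field
}

// Convert a quad to a triple while remembering whether the object is a literal.
function toTypedTriple(quad: QuadLike): TypedTriple {
  return {
    subject: quad.subject.value,
    predicate: quad.predicate.value,
    object: quad.object.value,
    objectIsLiteral: quad.object.termType === 'Literal',
  };
}

// Example quad for "PatientKate hasAge 20" (namespace assumed for illustration).
const kateAge: QuadLike = {
  subject: { termType: 'NamedNode', value: 'http://example.org/PatientKate' },
  predicate: { termType: 'NamedNode', value: 'http://example.org/hasAge' },
  object: { termType: 'Literal', value: '20' },
};

const kateAgeTriple = toTypedTriple(kateAge);
```

With such a flag, a later import step could store literal objects as node properties instead of creating a separate node for "20". Whether that is desirable depends on how the graph is queried.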

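4.3 Knowledge base module

Once a TTL file has been parsed, the data must be written to the database. This project stores it in two forms:

Relational storage (PostgreSQL):

Triples are stored in the kg_triples table

orIgnore() is used to skip duplicate rows

Serves as a backup and as the fallback path

Graph storage (Apache AGE):

Triples are converted into a graph structure

Nodes and relationships are created with Cypher statements

MERGE keeps the operation idempotent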
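Both orIgnore() and MERGE serve the same goal: importing the same file twice must not create duplicates. The in-memory sketch below illustrates that idempotent behavior. It is not the project's storage code, and it assumes uniqueness is defined on the (subject, predicate, object) combination, which the post does not state explicitly.

```typescript
interface StoredTriple { subject: string; predicate: string; object: string }

// In-memory stand-in for a table with a unique key on (subject, predicate, object).
// That unique key is an assumption made for this illustration.
class InMemoryTripleStore {
  private readonly keys = new Set<string>();
  private readonly rows: StoredTriple[] = [];

  // Insert new triples and silently skip ones already present; returns how many were added.
  insertOrIgnore(triples: StoredTriple[]): number {
    let added = 0;
    for (const t of triples) {
      const key = JSON.stringify([t.subject, t.predicate, t.object]);
      if (this.keys.has(key)) continue; // duplicate: ignore
      this.keys.add(key);
      this.rows.push(t);
      added += 1;
    }
    return added;
  }

  get size(): number {
    return this.rows.length;
  }
}
```

Calling insertOrIgnore twice with the same triples adds rows only the first time, so re-running an import leaves the store unchanged.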

The ingestion code is as follows:

async storeTriples(triples: TripleInterface[], source?: string): Promise<void> {
    // Nothing to store: avoid issuing an INSERT with an empty value list
    if (triples.length === 0) return;

    // ========== Step 1: store in the relational database ==========
    // Convert the triples into the database entity shape
    const entities = triples.map(triple => ({
      subject: triple.subject,      // subject (e.g. http://example.org/Patient1)
      predicate: triple.predicate,  // predicate (e.g. http://example.org/hasDisease)
      object: triple.object,        // object (e.g. http://example.org/Diabetes)
      source: source || 'unknown',  // data source
      confidence: 1.0,              // confidence (defaults to 1.0, i.e. fully certain)
    }));

    // Bulk insert into the relational database
    // orIgnore() keeps duplicates from raising errors (idempotent operation)
    await this.kgTripleRepository
      .createQueryBuilder()
      .insert()
      .into(KGTriple)
      .values(entities)
      .orIgnore()  // skip rows that already exist, avoiding key conflicts
      .execute();

    // ========== Step 2: store in the graph database (if available) ==========
    if (this.isAgeEnabled) {
      this.logger.log(`Starting to store ${triples.length} triples into graph...`);
      await this.storeTriplesInGraph(triples);
      this.logger.log(`Finished storing triples into graph.`);
    }
  }

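  /**
   * Store triples in the graph database in batches
   * 
   * Batching avoids:
   * 1. a single oversized transaction causing memory problems
   * 2. too many tiny transactions degrading performance
   * 
   * @param triples - the triples to store
   */
  private async storeTriplesInGraph(triples: TripleInterface[]) {
    // 100 triples per batch, balancing performance and stability
    const batchSize = 100;

    // Walk through all triples one batch at a time
    for (let i = 0; i < triples.length; i += batchSize) {
      const batch = triples.slice(i, i + batchSize);
      await this.processGraphBatch(batch);
    }
  }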

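  /**
   * Process one batch of triples: convert it to a graph structure and store it in Apache AGE
   * 
   * Graph conversion logic:
   * - the subject and object of each triple become nodes labeled Entity
   * - the predicate of each triple becomes a RELATIONSHIP edge between those nodes
   * - MERGE keeps the operation idempotent (existing nodes and edges are matched and updated, missing ones are created)
   * 
   * @param batch - one batch of triples
   */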
  private async processGraphBatch(batch: TripleInterface[]) {
    /**
     * Extract a readable name from a URI
     * Examples:
     * - "http://example.org/ontology#Patient1" → "Patient1"
     * - "http://example.org/ontology/Patient1" → "Patient1"
     * - "Patient1" → "Patient1"
     */
    const extractName = (uri: string) => {
      if (uri.includes('#')) return uri.split('#').pop();  // URI with a # fragment
      if (uri.includes('/')) return uri.split('/').pop();  // URI with / path segments
      return uri;  // not a URI: return it unchanged
    };

    // Parameter data for the Cypher UNWIND
    const params = {
      batch: batch.map(t => ({
        s: t.subject,                    // full subject URI
        s_name: extractName(t.subject),  // readable subject name
        p: t.predicate,                  // full predicate URI
        p_name: extractName(t.predicate),// readable predicate name
        o: t.object,                     // full object URI
        o_name: extractName(t.object)    // readable object name
      }))
    };

    /**
     * Build the Cypher query
     * 
     * Query logic:
     * 1. UNWIND $batch as row - expand the batch into one row per triple
     * 2. MERGE (s:Entity {uri: row.s}) - match or create the subject node (identified by its URI)
     * 3. SET s.name = row.s_name - set the node's readable name
     * 4. MERGE (o:Entity {uri: row.o}) - match or create the object node
     * 5. MERGE (s)-[r:RELATIONSHIP {uri: row.p}]->(o) - match or create the relationship edge
     * 
     * How MERGE behaves:
     * - if a matching node/edge exists, it is returned
     * - if not, a new one is created
     * - this is what makes the operation idempotent
     */
    const query = `
      SELECT * FROM cypher('${this.graphName}', $$
        UNWIND $batch as row
        MERGE (s:Entity {uri: row.s})
        SET s.name = row.s_name
        MERGE (o:Entity {uri: row.o})
        SET o.name = row.o_name
        MERGE (s)-[r:RELATIONSHIP {uri: row.p}]->(o)
        SET r.name = row.p_name
      $$, $1) as (a agtype);
    `;

    // LOAD 'age' and search_path are per-session settings, and separate
    // dataSource.query() calls may be served by different pooled connections.
    // Running everything on one QueryRunner keeps all statements on one connection.
    const queryRunner = this.dataSource.createQueryRunner();
    try {
      await queryRunner.connect();
      await queryRunner.query(`LOAD 'age';`);
      await queryRunner.query(`SET search_path = ag_catalog, "$user", public;`);

      try {
        // Run the Cypher query; parameters are passed as a JSON string
        await queryRunner.query(query, [JSON.stringify(params)]);
      } catch (error) {
        this.logger.error(`Failed to store graph batch: ${error.message}`);
        // Log the first item of the failed batch for debugging
        if (batch.length > 0) {
          this.logger.debug(`First item in failed batch: ${JSON.stringify(batch[0])}`);
        }
      }
    } finally {
      // Always return the connection to the pool
      await queryRunner.release();
    }
  }
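Two small pieces of the ingestion code are easy to try in isolation: the name extraction and the batching. The sketch below re-implements both as standalone functions (toBatches is a hypothetical name; extractName follows the logic shown above) so that their edge cases can be checked without a database.

```typescript
// Same logic as the extractName helper in the ingestion code, as a standalone function.
function extractName(uri: string): string {
  if (uri.includes('#')) return uri.split('#').pop() ?? uri; // text after the last '#'
  if (uri.includes('/')) return uri.split('/').pop() ?? uri; // text after the last '/'
  return uri;                                                // not a URI: unchanged
}

// Split a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (batchSize <= 0) throw new RangeError('batchSize must be positive');
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

const exampleNames = [
  extractName('http://example.org/ontology#Patient1'), // "Patient1"
  extractName('http://example.org/ontology/Patient1'), // "Patient1"
  extractName('Patient1'),                             // "Patient1"
  extractName('http://example.org/'),                  // "" (trailing slash leaves nothing)
];
```

The last case is worth keeping in mind: a URI that ends in a slash yields an empty name. Likewise, literal values that happen to contain a slash (a date such as 2024/01/01, for instance) would be shortened by the same rule.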

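    At query time, the graph database is tried first with a Cypher query. If that query fails or returns nothing, retrieval falls back to the relational database, whose LIKE-based fuzzy matching copes better with imprecise input.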
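When user input is placed into a LIKE pattern, the wildcard characters % and _ in that input should be escaped so they are matched literally. The following is a minimal sketch of such a helper; toLikePattern and the SQL string are hypothetical and only assume the kg_triples table with subject and object columns described earlier. PostgreSQL uses the backslash as the default LIKE escape character.

```typescript
// Escape LIKE wildcards (% and _) and the escape character itself,
// then wrap the keyword in % so the pattern means "contains".
function toLikePattern(keyword: string): string {
  const escaped = keyword.replace(/[\\%_]/g, ch => `\\${ch}`);
  return `%${escaped}%`;
}

// Hypothetical fallback query; the pattern is passed as parameter $1, never concatenated.
const fuzzySearchSql =
  'SELECT subject, predicate, object FROM kg_triples ' +
  'WHERE subject ILIKE $1 OR object ILIKE $1';

const fuzzySearchParams = [toLikePattern('tooth')];
```

Passing the pattern as a bound parameter keeps the query safe from SQL injection, while the escaping keeps an input such as "50%" from matching everything that starts with "50".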

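The knowledge-graph query code is shown below. The LLM first extracts entities from the user's question, and each extracted entity is then looked up in the database.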

async queryEntity(entityName: string): Promise<{
    success: boolean;
    triples: Triple[];
    count: number;
    message: string;
  }> {
    try {
      let triples: Triple[] = [];

      // 1. Try the graph database first (Apache AGE)
      try {
        triples = await this.kgStore.searchByGraph(entityName);

        if (triples.length > 0) {
          return {
            success: true,
            triples,
            count: triples.length,
            message: `Found ${triples.length} triples for entity ${entityName} (via Graph DB)`,
          };
        }
      } catch (graphError) {
        // Graph query failed: swallow the error and fall back to the relational query
      }

      // 2. Fallback: query the relational database as a safety net
      triples = await this.queryEngine.queryEntity(entityName);

      // Nothing found: retry with the name expanded to a URI (http://example.org/<entity name>)
      if (triples.length === 0) {
        const uriEntity = `http://example.org/${entityName}`;
        triples = await this.queryEngine.queryEntity(uriEntity);
      }

      // Still nothing: try fuzzy matching in the database (recommended)
      if (triples.length === 0) {
        triples = await this.kgStore.searchByKeyword(entityName);
      }

      return {
        success: true,
        triples,
        count: triples.length,
        message: `Found ${triples.length} triples for entity ${entityName}`,
      };
    } catch (error) {
      return {
        success: false,
        triples: [],
        count: 0,
        message: `Entity query failed: ${error.message}`,
      };
    }
  }
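The lookup order in queryEntity (graph, exact match, URI match, fuzzy match) is an instance of a general "first non-empty result wins" pattern. The sketch below expresses that pattern as a small reusable helper. The name firstNonEmpty is hypothetical, and unlike the code above it treats an error in any step, not just the graph step, as "no result".

```typescript
type Lookup<T> = () => Promise<T[]>;

// Run the lookups in order and return the first non-empty result.
// A lookup that throws is treated as having found nothing.
async function firstNonEmpty<T>(lookups: Lookup<T>[]): Promise<T[]> {
  for (const lookup of lookups) {
    try {
      const rows = await lookup();
      if (rows.length > 0) return rows;
    } catch {
      // fall through to the next lookup
    }
  }
  return [];
}

// Usage with stand-in lookups: the graph step fails, the exact match is empty,
// and the fuzzy match finally returns a hit.
const demoResult: Promise<string[]> = firstNonEmpty<string>([
  async () => { throw new Error('graph database unavailable'); },
  async () => [],
  async () => ['toothDecay -> causeSymptom -> toothache'],
]);
```

Swallowing every error keeps the helper simple, but it also hides real failures; in practice the caught errors would probably be logged before moving on.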

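4.4 Question-answering module

The core service of the QA system is shown below. Its strategy-selection step chooses among a KG strategy (knowledge graph), a RAG strategy, and a hybrid strategy (KG + RAG).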

  async ask(request: QaRequestDto): Promise<QaResponseDto> {
    const startTime = Date.now();

    try {
      // 1. Preprocess the question
      const cleanedQuestion = this.preprocessQuestion(request.question);

      // 2. Select the strategy
      const strategy = this.selectStrategy(request.mode);
      const strategyConfig = this.buildStrategyConfig(request);

      // 3. Run the strategy to obtain the context and the answer
      const strategyResult = await strategy.execute(cleanedQuestion, strategyConfig);

      // 4. Post-process the answer
      const processedAnswer = this.postprocessAnswer(strategyResult.answer);

      // 5. Record the query history
      await this.recordQueryHistory({...});

      // 6. Build the response
      const response: QaResponseDto = {
        answer: processedAnswer,
        sources: strategyResult.sources,
        strategy: request.mode,
        confidence: strategyResult.confidence,
        timestamp: new Date().toISOString(),
        metadata: {
          executionTime: strategyResult.executionTime,
          totalTime: Date.now() - startTime,
        },
      };

      return response;
    } catch (error) {
      // ... error handling
    }
  }
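The post does not show selectStrategy itself. One common way to implement such a step is a lookup table from mode to strategy object, sketched below. Everything here is an assumption for illustration: the mode names 'rag' and 'hybrid', the default of 'hybrid', and the stub strategies (only the 'kg' label appears in the project code).

```typescript
type QaMode = 'kg' | 'rag' | 'hybrid';

interface QaStrategy {
  readonly label: QaMode;
  execute(question: string): Promise<string>;
}

// Stub factory standing in for the real strategy classes.
function makeStubStrategy(label: QaMode): QaStrategy {
  return {
    label,
    execute: async (question: string) => `[${label}] ${question}`,
  };
}

const strategyTable: Record<QaMode, QaStrategy> = {
  kg: makeStubStrategy('kg'),
  rag: makeStubStrategy('rag'),
  hybrid: makeStubStrategy('hybrid'),
};

// Pick the strategy for the requested mode; fall back to 'hybrid' when none is given.
function selectStrategy(mode?: QaMode): QaStrategy {
  return strategyTable[mode ?? 'hybrid'];
}
```

Because every strategy exposes the same execute method, the ask method can stay unchanged when a new strategy is added to the table.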

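The KG strategy consists of the following steps:

Entity extraction: the LLM extracts named entities from the question

Relation extraction: the LLM identifies the relation types implied by the question

Graph query: the knowledge graph is searched for triples matching those entities and relations

Answer generation: the triples are passed to the LLM as context to generate the answer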
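For the entity-extraction step, the LLM's reply has to be turned into a clean list of names, and model output is not always well-formed JSON. The sketch below shows one defensive way to do that. The prompt wording and the function names buildEntityPrompt and parseEntityList are hypothetical; the project's actual extractEntities implementation is not shown in the post.

```typescript
// Hypothetical prompt asking the model to answer with a JSON array only.
function buildEntityPrompt(question: string): string {
  return [
    'Extract the named entities from the question below.',
    'Reply with a JSON array of strings and nothing else.',
    `Question: ${question}`,
  ].join('\n');
}

// Parse the model reply into a de-duplicated list of entity names.
// Tolerates extra text around the array; returns [] when nothing usable is found.
function parseEntityList(raw: string): string[] {
  const match = raw.match(/\[[\s\S]*\]/); // from the first '[' to the last ']'
  if (!match) return [];
  try {
    const parsed: unknown = JSON.parse(match[0]);
    if (!Array.isArray(parsed)) return [];
    const cleaned = parsed
      .filter((item): item is string => typeof item === 'string')
      .map(item => item.trim())
      .filter(item => item.length > 0);
    return Array.from(new Set(cleaned));
  } catch {
    return []; // malformed JSON
  }
}
```

For a reply such as Entities: ["PatientKate", "toothache", "toothache"], the function returns the two distinct names PatientKate and toothache.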

The question-answering code for the KG strategy is as follows:

  async execute(question: string, config?: Record<string, any>): Promise<QaStrategyResult> {
   // Excerpt: maxDepth, temperature, maxTokens, confidence and executionTime are
   // defined in parts of the method that are not shown here.

   // 1. Extract entities and relations from the question (using the LLM)
   const entities = await this.extractEntities(question);
   const relations = await this.extractRelations(question);

   // 2. Query the knowledge graph
   const kgResults = await this.queryKnowledgeGraph(entities, relations, maxDepth);

   // 3. Generate the answer from the graph results
   const context = this.assembleKGContext(kgResults);
   const prompt = this.buildKGPrompt(question, context);
   const answer = await this.llmService.generate(prompt, undefined, {
     temperature,
     maxTokens,
   });

   // 4. Build the source information
   const sources: QaSource[] = kgResults.map(triple => ({
     type: 'kg' as const,
     content: `${triple.subject} -> ${triple.predicate} -> ${triple.object}`,
     // ...
   }));

   return { answer, sources, strategy: 'kg', confidence, executionTime };
 }
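The helpers assembleKGContext and buildKGPrompt are called above but not shown. The sketch below is one plausible shape for them, under the assumption that the context is simply one line per triple with URIs shortened to their local names. The function names localName, assembleContext, and buildPrompt, the prompt wording, and the limit of 50 triples are all hypothetical.

```typescript
interface KgTriple { subject: string; predicate: string; object: string }

// Keep only the part of a URI after its last '#' or '/'.
function localName(uri: string): string {
  const cut = Math.max(uri.lastIndexOf('#'), uri.lastIndexOf('/'));
  return cut >= 0 ? uri.slice(cut + 1) : uri;
}

// Render triples as "subject -> predicate -> object" lines, de-duplicated and capped.
function assembleContext(triples: KgTriple[], maxTriples = 50): string {
  const lines = triples.map(
    t => `${localName(t.subject)} -> ${localName(t.predicate)} -> ${localName(t.object)}`,
  );
  return Array.from(new Set(lines)).slice(0, maxTriples).join('\n');
}

// Combine the facts and the question into a single prompt for the LLM.
function buildPrompt(question: string, context: string): string {
  return [
    'Answer the question using only the knowledge-graph facts below.',
    'If the facts are insufficient, say so.',
    '',
    'Facts:',
    context || '(none)',
    '',
    `Question: ${question}`,
  ].join('\n');
}
```

Capping the number of triples keeps the prompt within the model's context window, and telling the model to admit when the facts are insufficient reduces the chance of an answer that is not grounded in the graph.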

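4.5 Other services

The embedding model used in this project is text-embedding-v4 from Alibaba Cloud Bailian, with a vector dimension of 1024 (the vector store must be configured with the same dimension). The LLM is qwen-turbo. Both the embedding model and the LLM are called through the Alibaba Cloud Bailian API.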
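Because a dimension mismatch between the embedding model and the vector store usually surfaces only as a database error at insert time, it can help to check the length right after the embedding call. The following is a minimal sketch; EMBEDDING_DIM and assertEmbeddingDim are hypothetical names, and 1024 is the dimension stated above.

```typescript
// Dimension stated in the post for text-embedding-v4.
const EMBEDDING_DIM = 1024;

// Throw early if a vector does not have the expected number of dimensions.
function assertEmbeddingDim(vector: number[], expected: number = EMBEDDING_DIM): void {
  if (vector.length !== expected) {
    throw new Error(`Embedding has ${vector.length} dimensions, expected ${expected}`);
  }
}
```

Calling assertEmbeddingDim on each vector before it is stored turns a confusing storage error into a clear message that names both dimensions.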