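- Project codename: EnterpriseInsight-KG
- Domain: enterprise registration / business intelligence / government regulation
- Core scenario: intelligent question answering (KBQA + GraphRAG)
- Ontology version: v1.0.0
- Status: implementation phase (detailed design frozen; DDL under controlled change)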
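1. Ontology Scope and Exclusions
1.1 In Scope
The ontology covers the following knowledge:
- Subjects: enterprises, natural persons, government agencies, social organizations, products, brands
- Relations: equity, positions held, outbound investment, related-party transactions, upstream/downstream, judicial, administrative, news co-occurrence
- Events: registration changes, court cases, administrative penalties, tenders, financing, public-opinion events
- Attributes: basic registration information, credit ratings, abnormal-operation records, patents, trademarks, qualifications and licenses
- Space: administrative regions, addresses, industrial parks
- Time: every relation and event carries a validity period
1.2 Out of Scope
Explicitly not modeled (to avoid scope creep):
- ❌ Personal private information (sensitive fields such as national ID number, home address, mobile number): only hashes are stored
- ❌ Detailed corporate financial statements: managed by the data warehouse; the graph only references key indicators
- ❌ SKU-level product granularity: modeled only down to product series
- ❌ Full text of legal provisions: modeled only as citation relations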
2. Upper Ontology
Thing
├── Subject ← can act as the subject or object of a relation
│   ├── LegalEntity (legal person)
│   │   ├── Enterprise
│   │   ├── GovernmentAgency
│   │   └── SocialOrganization (associations, foundations, etc.)
│   ├── NaturalPerson
│   └── UnincorporatedEntity (partnerships, individual businesses)
├── IntangibleAsset
│   ├── Brand
│   ├── Product (products and services)
│   ├── Patent
│   ├── Trademark
│   ├── Qualification (qualifications and licenses)
│   └── Standard
├── Event ← something that happens at a point in time or over an interval
│   ├── BusinessChange (registration change)
│   ├── LegalCase (court case)
│   ├── AdminPenalty (administrative penalty)
│   ├── Tender
│   ├── FinancingEvent
│   └── PublicOpinionEvent
├── Place
│   ├── AdministrativeRegion
│   ├── Address (specific address)
│   └── IndustrialPark
├── Concept (abstract concept)
│   ├── Industry (industry classification, national standard GB/T 4754-2017)
│   ├── BusinessScope (business scope item)
│   ├── Tag
│   └── Topic (public-opinion topic)
└── Document ← knowledge sources
    ├── PublicAnnouncement
    ├── NewsArticle
    ├── LegalDocument
    └── InternalReport
Maximum inheritance depth: 4 levels (Thing not counted).
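3. Detailed Entity Type Definitions
Naming convention: node labels use `PascalCase` and properties use `snake_case`. Every entity must carry the `_meta` fields (see §6).
3.1 Enterprise (core entity)
Identifiers
- Business key: `unified_credit_code` (Unified Social Credit Code, 18 characters)
- Secondary key: `registration_no` (registration number under the old system, before 2015)
- System key: `uuid`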
Property table
| Property | Type | Required | Index | Constraint | Description |
|---|---|---|---|---|---|
| uuid | String | ✓ | Unique | UUID v4 | System primary key |
| unified_credit_code | String | ✓ | Unique | `^[0-9A-HJ-NPQRTUWXY]{18}$` | Unified Social Credit Code |
| registration_no | String | | Range | Legacy registration number | Enterprises registered before 2015 |
| name | String | ✓ | Text | Length ≤ 200 | Registered name |
| aliases | List<String> | | Fulltext | | Former names / short names / English names |
| legal_representative_name | String | | Range | | Legal representative's name (redundant field, kept for search) |
| registered_capital | Float | | Range | ≥ 0 | Registered capital (10,000 currency units) |
| paid_in_capital | Float | | Range | ≥ 0 | Paid-in capital (10,000 currency units) |
| capital_currency | String | | | ISO 4217 | Currency, default CNY |
| enterprise_type | String | ✓ | Range | Enum, see below | Enterprise type |
| establishment_date | Date | ✓ | Range | | Date of establishment |
| business_term_start | Date | | | | Start of business term |
| business_term_end | Date | | | | End of business term (null = indefinite) |
| registration_authority | String | | Range | | Registration authority |
| registration_status | String | ✓ | Range | Enum, see below | Registration status |
| industry_code | String | ✓ | Range | GB/T 4754-2017 | National economic industry code |
| industry_name | String | | Range | | Industry name (redundant) |
| business_scope | String | | Fulltext | | Full text of the business scope |
| email | String | | | RFC 5322 | |
| phone | String | | | E.164 | |
| website | String | | | URL | |
| is_listed | Boolean | | Range | | Whether the enterprise is listed |
| stock_code | String | | Range | | Stock code (if any) |
| stock_exchange | String | | | Enum: SSE/SZSE/HKEX/NASDAQ/NYSE | Stock exchange |
| staff_size | String | | Range | Enum: <50, 50-100, 100-500, 500-1000, 1000-5000, >5000 | Headcount band |
| is_high_tech | Boolean | | | | High-tech enterprise |
| is_specialized_new | Boolean | | | | "Specialized, refined, distinctive and innovative" enterprise |
| credit_rating | String | | Range | AAA/AA/A/B/C/D | Credit rating |
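The `unified_credit_code` constraint in the table can be checked before a record enters the graph. The sketch below applies the regular expression from the table and, as an addition that this document does not require, the check-character rule from GB 32100-2015. The function name `is_valid_uscc` and the `verify_check_digit` switch are illustrative assumptions, not part of the project code.

```python
import re

# Characters allowed in a Unified Social Credit Code (I, O, S, V, Z are excluded).
USCC_CHARSET = "0123456789ABCDEFGHJKLMNPQRTUWXY"
# Same pattern as the "Constraint" column of the Enterprise property table.
USCC_PATTERN = re.compile(r"^[0-9A-HJ-NPQRTUWXY]{18}$")
# Position weights from GB 32100-2015: 3**i mod 31 for the first 17 characters.
USCC_WEIGHTS = [pow(3, i, 31) for i in range(17)]


def is_valid_uscc(code: str, verify_check_digit: bool = True) -> bool:
    """Return True if `code` matches the format (and optionally the check character)."""
    if not USCC_PATTERN.fullmatch(code):
        return False
    if not verify_check_digit:
        return True
    total = sum(USCC_CHARSET.index(ch) * w for ch, w in zip(code[:17], USCC_WEIGHTS))
    expected = USCC_CHARSET[(31 - total % 31) % 31]
    return code[17] == expected


if __name__ == "__main__":
    print(is_valid_uscc("91350100M000100Y43"))
```

Keeping the check-character step optional lets an ingestion job record format-valid but checksum-invalid codes with a lower `confidence` instead of dropping them; whether that is desirable is a project decision.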
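Enum definitions:
enterprise_type:
  - LIMITED_LIABILITY            # Limited liability company
  - JOINT_STOCK                  # Joint-stock company
  - WHOLLY_FOREIGN_OWNED         # Wholly foreign-owned enterprise
  - SINO_FOREIGN_JOINT_VENTURE   # Sino-foreign joint venture
  - PARTNERSHIP                  # Partnership
  - SOLE_PROPRIETORSHIP          # Sole proprietorship
  - INDIVIDUAL_BUSINESS          # Individual business
  - STATE_OWNED                  # State-owned enterprise
  - COLLECTIVE                   # Collective enterprise
  - OTHER
registration_status:
  - IN_BUSINESS    # Operating / on the register
  - CANCELLED      # Deregistered
  - REVOKED        # License revoked
  - SUSPENDED      # Business suspended
  - LIQUIDATING    # In liquidation
  - MIGRATED_OUT   # Moved to another jurisdiction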
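Derived labels (multi-label)
- `:Enterprise:Listed`: listed company
- `:Enterprise:HighTech`: high-tech enterprise
- `:Enterprise:SpecializedNew`: "specialized, refined, distinctive and innovative" enterprise
- `:Enterprise:Abnormal`: on the abnormal-operation list
- `:Enterprise:Untrustworthy`: on the serious-violation and dishonesty list
- `:Enterprise:KeyMonitored`: under key monitoring (business flag)
Derived labels are applied by data-maintenance rules and are never set by hand.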
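A rule that applies derived labels might look like the sketch below. The three Boolean properties come from the Enterprise property table. The risk-flag names (`on_abnormal_list`, `on_serious_dishonesty_list`, `key_monitored`) are hypothetical inputs invented for this illustration, since the document does not say which data drives those labels.

```python
# Flag-to-label mapping taken from the Enterprise property table.
PROPERTY_LABELS = {
    "is_listed": "Listed",
    "is_high_tech": "HighTech",
    "is_specialized_new": "SpecializedNew",
}

# Hypothetical risk-list flags; these names are assumptions for this sketch.
RISK_FLAG_LABELS = {
    "on_abnormal_list": "Abnormal",
    "on_serious_dishonesty_list": "Untrustworthy",
    "key_monitored": "KeyMonitored",
}


def derive_enterprise_labels(props, risk_flags=()):
    """Return the full label list for an Enterprise node, base label first."""
    labels = ["Enterprise"]
    labels += [label for key, label in PROPERTY_LABELS.items() if props.get(key) is True]
    labels += [label for flag, label in RISK_FLAG_LABELS.items() if flag in risk_flags]
    return labels


if __name__ == "__main__":
    print(derive_enterprise_labels({"is_listed": True}, {"key_monitored"}))
```

Because the function is a pure mapping from properties to labels, a maintenance job can recompute labels at any time and overwrite the previous set, which matches the rule that labels are never edited manually.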
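Source system priority
National registry NECIPS > provincial enterprise credit publicity systems > third-party data vendors (Tianyancha / Qichacha / Qixinbao) > in-house crawling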
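When several sources report different values for the same property, the priority order can be applied as a simple ranking. This is a minimal sketch: it assumes the source codes from §6 (`NECIPS`, `SAIC_<province>`, `TYC`, `QCC`, `QXB`, `CRAWLER_<domain>`), assumes that the `SAIC_` codes correspond to the provincial tier, and places any unlisted source last. The helper names are illustrative.

```python
THIRD_PARTY_VENDORS = {"TYC", "QCC", "QXB"}


def source_rank(source: str) -> int:
    """Lower rank means higher priority; unknown sources rank last (assumption)."""
    if source == "NECIPS":
        return 0
    if source.startswith("SAIC_"):
        return 1
    if source in THIRD_PARTY_VENDORS:
        return 2
    if source.startswith("CRAWLER_"):
        return 3
    return 4


def pick_value(candidates):
    """Pick the non-null value reported by the highest-priority source.

    `candidates` is a list of (source_code, value) pairs for one property.
    """
    usable = [(source_rank(src), value) for src, value in candidates if value is not None]
    if not usable:
        return None
    return min(usable, key=lambda pair: pair[0])[1]


if __name__ == "__main__":
    print(pick_value([("TYC", "Old Name Ltd."), ("NECIPS", "New Name Ltd.")]))
```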
Completeness requirements
- Fill rate of required fields ≥ 99%
- Fill rate of industry code ≥ 95%
- Fill rate of business scope ≥ 90%
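The fill-rate thresholds above can be verified on a batch of enterprise records with a few lines of code. The sketch treats `None` and the empty string as unfilled, and applies the 95% threshold to `industry_code` even though it is also marked as required; both choices are assumptions, as are the function names.

```python
# Fields marked as required in the Enterprise property table.
REQUIRED_FIELDS = [
    "uuid", "unified_credit_code", "name", "enterprise_type",
    "establishment_date", "registration_status",
]
# Field-specific thresholds stated in the completeness requirements.
SPECIFIC_THRESHOLDS = {"industry_code": 0.95, "business_scope": 0.90}
REQUIRED_THRESHOLD = 0.99


def fill_rate(records, field):
    """Share of records in which `field` is neither None nor an empty string."""
    if not records:
        return 0.0
    filled = sum(1 for rec in records if rec.get(field) not in (None, ""))
    return filled / len(records)


def check_completeness(records):
    """Return {field: (rate, passed)} for every field that has a threshold."""
    thresholds = {f: REQUIRED_THRESHOLD for f in REQUIRED_FIELDS}
    thresholds.update(SPECIFIC_THRESHOLDS)
    return {
        field: (fill_rate(records, field), fill_rate(records, field) >= minimum)
        for field, minimum in thresholds.items()
    }


if __name__ == "__main__":
    sample = [{"name": "A", "business_scope": "retail"}, {"name": "", "business_scope": None}]
    print(check_completeness(sample)["name"])
```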
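3.2 NaturalPerson
⚠️ Compliance note: personal information must be processed in accordance with the Personal Information Protection Law. The graph stores only public position information related to enterprises. It does not store full national ID numbers or unpublished contact details.
Identifiers
- Business key: `person_hash` (SHA-256 of name + last 4 digits of the ID number + birth year and month)
- Secondary key: the `(name, gender, birth_year)` triple
The full national ID number is not used as a key, which avoids concentrating PII in one store. Different people who share a name are disambiguated by other dimensions (associated enterprises, region, birth year).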
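One way to compute `person_hash` is sketched below. The document only names the three inputs. The `|` separator, the NFKC normalization of the name, and the `YYYY-MM` format for the birth year and month are assumptions made for this illustration, and `compute_person_hash` is a hypothetical helper name.

```python
import hashlib
import re
import unicodedata

_ID_TAIL = re.compile(r"\d{3}[\dX]")               # last 4 characters of the ID number
_YEAR_MONTH = re.compile(r"\d{4}-(0[1-9]|1[0-2])")  # assumed YYYY-MM format


def compute_person_hash(name: str, id_tail4: str, birth_year_month: str) -> str:
    """SHA-256 over normalized name, ID tail and birth year-month (hex digest)."""
    normalized_name = unicodedata.normalize("NFKC", name).strip()
    tail = id_tail4.strip().upper()
    if not normalized_name:
        raise ValueError("name must not be empty")
    if not _ID_TAIL.fullmatch(tail):
        raise ValueError("id_tail4 must be 4 characters: digits, optionally ending in X")
    if not _YEAR_MONTH.fullmatch(birth_year_month):
        raise ValueError("birth_year_month must look like YYYY-MM")
    payload = "|".join((normalized_name, tail, birth_year_month))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    print(compute_person_hash("张三", "123X", "1980-05"))
```

Normalizing before hashing matters because the hash is the business key: two spellings of the same record that differ only in whitespace or character width would otherwise produce two nodes.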
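Property table
| Property | Type | Required | Index | Description |
|---|---|---|---|---|
| uuid | String | ✓ | Unique | System primary key |
| person_hash | String | ✓ | Unique | Identity hash |
| name | String | ✓ | Text | Name |
| aliases | List<String> | | Fulltext | Aliases / former names |
| gender | String | | Range | M/F/U |
| birth_year | Integer | | Range | Birth year (year only) |
| nationality | String | | Range | ISO 3166-1 alpha-2 |
| id_card_tail | String | | | Last 4 digits of the ID number (for disambiguation) |
| is_pep | Boolean | | Range | Whether the person is a politically exposed person |
| is_sanctioned | Boolean | | Range | Whether the person is sanctioned |
| is_executed_dishonest | Boolean | | Range | Whether the person is listed as a dishonest judgment debtor |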
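Derived labels
- `:NaturalPerson:Executive`: has served as an enterprise executive
- `:NaturalPerson:Shareholder`: holds enterprise equity
- `:NaturalPerson:LegalRep`: has served as a legal representative
- `:NaturalPerson:HighRisk`: matched against a risk list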
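3.3 GovernmentAgency
| Property | Type | Required | Index | Description |
|---|---|---|---|---|
| uuid | String | ✓ | Unique | |
| agency_code | String | ✓ | Unique | Agency code |
| name | String | ✓ | Text | Agency name |
| level | String | ✓ | Range | Central / province / city / district or county / township |
| agency_type | String | ✓ | Range | Party committee / people's congress / government / CPPCC / supervision / judiciary / other |
| parent_code | String | | Range | Code of the superior agency |
| administrative_region_code | String | ✓ | Range | GB/T 2260 administrative region code |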
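3.4 SocialOrganization
Covers associations, foundations, social groups and private non-enterprise units.
| Property | Type | Required | Description |
|---|---|---|---|
| unified_credit_code | String | ✓ | Unified Social Credit Code |
| name | String | ✓ | |
| org_type | String | ✓ | Social group / foundation / private non-enterprise unit / public institution |
| registration_authority | String | | Registration and administration authority |
| competent_unit | String | | Competent business authority |
| establishment_date | Date | | |
| purpose | String | | Purpose and scope of activities |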
3.5 Brand / Product / Patent / Trademark / Qualification
Brand
properties:
  uuid: String, unique
  name: String, required, text-indexed
  english_name: String
  category: String          # Industry category
  founded_year: Integer
Product
properties:
  uuid: String, unique
  name: String, required, text-indexed
  category: String, required
  description: Text
  launch_date: Date
Patent
properties:
  uuid: String, unique
  patent_no: String, unique            # Patent number
  title: String, required, fulltext
  patent_type: String                  # Invention / utility model / design
  application_date: Date
  publication_date: Date
  grant_date: Date
  status: String                       # Pending / granted / lapsed / transferred
  ipc_classification: List<String>     # International Patent Classification
Trademark
properties:
  uuid: String, unique
  trademark_no: String, unique         # Registration number
  name: String, required
  nice_class: String                   # Nice Classification
  application_date: Date
  registration_date: Date
  expiration_date: Date
  status: String                       # Registered / pending / invalid / renewed
Qualification (qualifications and licenses)
properties:
  uuid: String, unique
  qualification_no: String, unique
  qualification_type: String, required # Administrative license / qualification grade / certification
  qualification_name: String, required
  issuing_authority: String            # Issuing authority
  issue_date: Date
  expiration_date: Date
  status: String                       # VALID/EXPIRED/REVOKED
3.6 Event classes
Events are the core of the knowledge graph's intelligence scenarios. All events share one abstraction with the following common properties:
common_event_properties:
  uuid: String, unique
  event_id: String, unique         # Business event ID
  event_type: String, required     # Event type enum
  event_date: Date, required       # Date of occurrence
  publish_date: Date               # Date made public
  source_doc_id: String            # Source document
  source_url: String               # Source URL
  confidence: Float, [0,1]
  status: String                   # PENDING/CONFIRMED/DISPUTED
3.6.1 BusinessChange (registration change event)
event_type: BusinessChange
extra_properties:
  change_item: String, required    # Changed item: legal representative / registered capital / business scope / shareholders / ...
  before_value: String             # Value before the change
  after_value: String              # Value after the change
relations:
  - (BusinessChange)-[:CHANGE_OF]->(Enterprise)
  - (BusinessChange)-[:APPROVED_BY]->(GovernmentAgency)
3.6.2 LegalCase (court case)
event_type: LegalCase
extra_properties:
  case_no: String, unique          # Case number
  case_type: String                # Civil / criminal / administrative / enforcement / bankruptcy
  case_reason: String              # Cause of action
  court_name: String               # Trial court
  trial_level: String              # First instance / second instance / retrial
  trial_date: Date
  judgment_date: Date
  judgment_result: String
  amount_involved: Float           # Amount in dispute
relations:
  - (LegalCase)-[:HAS_PARTY {role: 'PLAINTIFF|DEFENDANT|THIRD_PARTY'}]->(Subject)
  - (LegalCase)-[:JUDGED_BY]->(GovernmentAgency)
  - (LegalCase)-[:CITES_LAW]->(LegalDocument)
3.6.3 AdminPenalty (administrative penalty)
event_type: AdminPenalty
extra_properties:
  decision_no: String, unique      # Decision document number
  violation_type: String           # Type of violation
  violation_description: Text
  penalty_type: List<String>       # Warning / fine / confiscation / rectification order / license revocation
  penalty_amount: Float
  decision_date: Date
relations:
  - (AdminPenalty)-[:PUNISHES]->(Subject)
  - (AdminPenalty)-[:ISSUED_BY]->(GovernmentAgency)
3.6.4 Tender
event_type: Tender
extra_properties:
  tender_no: String, unique
  project_name: String, required
  tender_type: String              # Open tender / invited tender / competitive consultation
  budget_amount: Float
  winning_amount: Float
  publish_date: Date
  bid_opening_date: Date
relations:
  - (Tender)-[:TENDERED_BY]->(Subject)   # Tendering party
  - (Tender)-[:WON_BY]->(Subject)        # Winning bidder
  - (Tender)-[:BID_BY]->(Subject)        # Bidder
  - (Tender)-[:AGENT_BY]->(Subject)      # Tendering agency
3.6.5 FinancingEvent
event_type: FinancingEvent
extra_properties:
  round: String                    # Angel/Pre-A/A/B/C/D/IPO/Pre-IPO/strategic
  amount: Float
  currency: String                 # ISO 4217
  valuation: Float                 # Post-money valuation
  announcement_date: Date
relations:
  - (FinancingEvent)-[:FINANCED]->(Enterprise)        # Investee
  - (FinancingEvent)-[:LED_BY]->(Subject)             # Lead investor
  - (FinancingEvent)-[:CO_INVESTED_BY]->(Subject)     # Co-investor
3.6.6 PublicOpinionEvent
event_type: PublicOpinionEvent
extra_properties:
  title: String, required
  summary: Text
  sentiment: String                # POSITIVE/NEUTRAL/NEGATIVE
  severity: String                 # LOW/MEDIUM/HIGH/CRITICAL
  topic: String                    # Topic: product quality / financial fraud / management turmoil / ...
  first_seen: DateTime
  peak_date: DateTime
  spread_score: Float
relations:
  - (PublicOpinionEvent)-[:ABOUT]->(Subject)
  - (PublicOpinionEvent)-[:REPORTED_IN]->(NewsArticle)
  - (PublicOpinionEvent)-[:CATEGORIZED_AS]->(Topic)
3.7 Place classes
AdministrativeRegion
properties:
  uuid: String, unique
  region_code: String, unique      # GB/T 2260
  name: String, required
  level: String                    # Country / province / city / district or county / township / village
  parent_code: String
relations:
  - (AdministrativeRegion)-[:BELONGS_TO]->(AdministrativeRegion)   # Superior region
Address (specific address)
properties:
  uuid: String, unique
  full_address: String, required, fulltext
  province_code: String
  city_code: String
  district_code: String
  street: String
  coord: Point                     # Longitude and latitude
  geohash: String, indexed
3.8 Concept classes
Industry
properties:
  uuid: String, unique
  code: String, unique             # National economic industry classification code, GB/T 4754-2017
  name: String, required
  level: Integer                   # 1 = section, 2 = division, 3 = group, 4 = class
  parent_code: String
relations:
  - (Industry)-[:BELONGS_TO]->(Industry)
Tag
properties:
  name: String
  category: String                 # Business tag / risk tag / capability tag / ...
  weight: Float
  # Node Key: (name, category)
Topic (public-opinion topic)
properties:
  uuid: String, unique
  name: String, required
  category: String
  description: Text
3.9 Document classes
PublicAnnouncement / NewsArticle / LegalDocument / InternalReport
Common properties:
common_document_properties:
  uuid: String, unique
  doc_id: String, unique
  title: String, required, fulltext
  content: Text, fulltext          # Full text (optional; not stored when sensitive)
  summary: String                  # Summary
  publish_date: DateTime
  source: String                   # Source system
  source_url: String
  doc_type: String
  language: String                 # zh-CN/en/...
  hash: String                     # Content hash, used for deduplication
relations:
  - (Document)-[:MENTIONS]->(Subject)      # Mentioned entity
  - (Document)-[:CITES]->(Document)        # Cites another document
  - (Document)-[:AUTHORED_BY]->(Subject)   # Author
4. Complete Relation Type Definitions
4.1 Equity and control relations
HOLDS_SHARE (shareholding)
direction: (:Subject)-[:HOLDS_SHARE]->(:LegalEntity)
cardinality: N:N
properties:
  percentage: Float, required      # Shareholding ratio (0-1)
  subscribed_capital: Float        # Subscribed contribution (10,000 currency units)
  paid_in_capital: Float           # Paid-in contribution
  capital_type: String             # Cash / in kind / intellectual property / ...
  share_type: String               # Common / preferred
  valid_from: Date, required
  valid_to: Date                   # null = currently held
  source: String
  confidence: Float
ACTUAL_CONTROLS (actual control, derived relation)
direction: (:Subject)-[:ACTUAL_CONTROLS]->(:LegalEntity)
cardinality: N:N
derived: true
properties:
  control_path: List<String>       # Control path (list of UUIDs)
  control_ratio: Float             # Cumulative actual control ratio
  derived_at: DateTime
  derivation_rule: String          # Derivation rule version
BENEFICIAL_OWNER_OF (ultimate beneficial owner, derived relation)
direction: (:NaturalPerson)-[:BENEFICIAL_OWNER_OF]->(:LegalEntity)
cardinality: N:N
derived: true
properties:
  beneficial_ratio: Float
  path_depth: Integer
  derived_at: DateTime
4.2 Position relations
SERVES_AS (position held)
direction: (:NaturalPerson)-[:SERVES_AS]->(:LegalEntity)
cardinality: N:N
properties:
  position: String, required       # Position: chairman / general manager / director / supervisor / head of finance / ...
  is_legal_rep: Boolean            # Whether the person is the legal representative
  is_executive: Boolean            # Whether the person is an executive
  valid_from: Date, required
  valid_to: Date
  source: String
LEGAL_REP_OF (legal representative, derived)
direction: (:NaturalPerson)-[:LEGAL_REP_OF]->(:LegalEntity)
cardinality: 1:N                   # One person may represent several entities
derived: true
# Equivalent to SERVES_AS {is_legal_rep: true, valid_to: null}
4.3 Inter-enterprise relations
BRANCH_OF (branch)
direction: (:Enterprise)-[:BRANCH_OF]->(:Enterprise)
properties:
  branch_type: String              # Branch company / office / representative office
PARENT_OF / SUBSIDIARY_OF (parent and subsidiary, derived)
Derived from HOLDS_SHARE relations whose percentage exceeds 0.5 (more than 50%).
SUPPLIES (supply)
direction: (:Enterprise)-[:SUPPLIES]->(:Enterprise)
properties:
  product_category: String
  cooperation_start: Date
  cooperation_end: Date
  amount_range: String             # Cooperation amount band
  source_evidence: String          # Evidence source (e.g. announcement / prospectus)
COMPETES_WITH (competition)
direction: (:Enterprise)-[:COMPETES_WITH]->(:Enterprise)
properties:
  industry_code: String
  competition_score: Float         # Competition score computed by an algorithm
  derived: Boolean
COOPERATES_WITH (cooperation / strategic cooperation)
direction: (:Enterprise)-[:COOPERATES_WITH]->(:Enterprise)
properties:
  cooperation_type: String         # Strategic cooperation / technical cooperation / joint venture / ...
  start_date: Date
  end_date: Date
  description: Text
4.4 Geographic and industry relations
REGISTERED_AT (registered address)
direction: (:LegalEntity)-[:REGISTERED_AT]->(:Address)
properties:
  valid_from: Date
  valid_to: Date
LOCATED_IN (located in an administrative region)
direction: (:Address)-[:LOCATED_IN]->(:AdministrativeRegion)
direction_alt: (:LegalEntity)-[:LOCATED_IN]->(:AdministrativeRegion)   # Derived
IN_INDUSTRY (industry membership)
direction: (:Enterprise)-[:IN_INDUSTRY]->(:Industry)
properties:
  is_primary: Boolean              # Whether this is the primary industry
4.5 Asset relations
OWNS_PATENT / OWNS_TRADEMARK
direction: (:Subject)-[:OWNS_PATENT|OWNS_TRADEMARK]->(:Patent|:Trademark)
properties:
  ownership_type: String           # SOLE/JOINT
  valid_from: Date
  valid_to: Date
HOLDS_QUALIFICATION
direction: (:LegalEntity)-[:HOLDS_QUALIFICATION]->(:Qualification)
PRODUCES / OPERATES_BRAND
direction: (:Enterprise)-[:PRODUCES]->(:Product)
direction: (:Enterprise)-[:OPERATES_BRAND]->(:Brand)
4.6 Document and mention relations
MENTIONS
direction: (:Document)-[:MENTIONS]->(:Subject|:Event)
properties:
  positions: List<[start, end]>    # Positions of the mention within the document
  frequency: Integer
  confidence: Float
  sentiment: String                # POSITIVE/NEUTRAL/NEGATIVE
4.7 Relation summary table
| # | Relation | Start | End | Cardinality | Derived |
|---|---|---|---|---|---|
| 1 | HOLDS_SHARE | Subject | LegalEntity | N:N | No |
| 2 | ACTUAL_CONTROLS | Subject | LegalEntity | N:N | ✓ |
| 3 | BENEFICIAL_OWNER_OF | NaturalPerson | LegalEntity | N:N | ✓ |
| 4 | SERVES_AS | NaturalPerson | LegalEntity | N:N | No |
| 5 | LEGAL_REP_OF | NaturalPerson | LegalEntity | 1:N | ✓ |
| 6 | BRANCH_OF | Enterprise | Enterprise | N:1 | No |
| 7 | PARENT_OF | Enterprise | Enterprise | N:N | ✓ |
| 8 | SUBSIDIARY_OF | Enterprise | Enterprise | N:N | ✓ |
| 9 | SUPPLIES | Enterprise | Enterprise | N:N | No |
| 10 | COMPETES_WITH | Enterprise | Enterprise | N:N | Depends |
| 11 | COOPERATES_WITH | Enterprise | Enterprise | N:N | No |
| 12 | REGISTERED_AT | LegalEntity | Address | N:N | No |
| 13 | LOCATED_IN | Address/LegalEntity | AdministrativeRegion | N:1 | No for Address; derived for LegalEntity (see §4.4) |
| 14 | IN_INDUSTRY | Enterprise | Industry | N:N | No |
| 15 | OWNS_PATENT | Subject | Patent | N:N | No |
| 16 | OWNS_TRADEMARK | Subject | Trademark | N:N | No |
| 17 | HOLDS_QUALIFICATION | LegalEntity | Qualification | N:N | No |
| 18 | PRODUCES | Enterprise | Product | N:N | No |
| 19 | OPERATES_BRAND | Enterprise | Brand | N:N | No |
| 20 | CHANGE_OF | BusinessChange | Enterprise | N:1 | No |
| 21 | HAS_PARTY | LegalCase | Subject | N:N | No |
| 22 | JUDGED_BY | LegalCase | GovernmentAgency | N:1 | No |
| 23 | PUNISHES | AdminPenalty | Subject | N:N | No |
| 24 | ISSUED_BY | AdminPenalty | GovernmentAgency | N:1 | No |
| 25 | TENDERED_BY | Tender | Subject | N:1 | No |
| 26 | WON_BY | Tender | Subject | N:N | No |
| 27 | BID_BY | Tender | Subject | N:N | No |
| 28 | FINANCED | FinancingEvent | Enterprise | N:1 | No |
| 29 | LED_BY | FinancingEvent | Subject | N:N | No |
| 30 | CO_INVESTED_BY | FinancingEvent | Subject | N:N | No |
| 31 | ABOUT | PublicOpinionEvent | Subject | N:N | No |
| 32 | REPORTED_IN | PublicOpinionEvent | NewsArticle | N:N | No |
| 33 | MENTIONS | Document | Subject/Event | N:N | No |
| 34 | CITES | Document | Document | N:N | No |
| 35 | TAGGED_AS | Any | Tag | N:N | No |
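5. Derived Relation Rules
Derived relations are computed by scheduled jobs (Airflow DAGs) according to rules, and the rules are version-controlled.
5.1 ACTUAL_CONTROLS derivation rule v1
// Rule: if A directly or indirectly holds more than 50% of B's equity, A actually controls B
// Note: the ratio is evaluated per path (product of percentages); parallel paths are not summed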
MATCH path = (a:Subject)-[r:HOLDS_SHARE*1..6]->(b:LegalEntity)
WHERE all(rel IN r WHERE rel.valid_to IS NULL OR rel.valid_to > date())
WITH a, b, path,
reduce(s = 1.0, rel IN r | s * rel.percentage) AS control_ratio
WHERE control_ratio > 0.5
WITH a, b, max(control_ratio) AS max_ratio
MERGE (a)-[c:ACTUAL_CONTROLS]->(b)
SET c.control_ratio = max_ratio,
c.derived_at = datetime(),
c.derivation_rule = 'v1.0',
c.derived = true;
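The path-product logic of this rule can be prototyped outside the database, which is convenient for unit-testing a rule version before it runs as a scheduled job. The sketch below assumes the input is a list of `(holder, company, percentage)` tuples that have already been filtered to currently valid `HOLDS_SHARE` relations. Unlike the Cypher pattern, it never revisits a node on a path, a simplification that avoids cycles. The function name is hypothetical.

```python
from collections import defaultdict


def derive_actual_controls(edges, threshold=0.5, max_depth=6):
    """Return {(controller, controlled): max path ratio} for ratios above threshold.

    The ratio of a path is the product of its percentages; as in the v1 rule,
    each path is judged on its own and the best qualifying path is kept.
    """
    outgoing = defaultdict(list)
    for holder, company, pct in edges:
        outgoing[holder].append((company, pct))

    best = {}

    def walk(start, node, ratio, depth, visited):
        for target, pct in outgoing.get(node, []):
            if target in visited:  # skip cycles (simplification, see lead-in)
                continue
            path_ratio = ratio * pct
            if path_ratio > threshold:
                key = (start, target)
                best[key] = max(best.get(key, 0.0), path_ratio)
            if depth + 1 < max_depth:
                walk(start, target, path_ratio, depth + 1, visited | {target})

    for holder in list(outgoing):
        walk(holder, holder, 1.0, 0, {holder})
    return best


if __name__ == "__main__":
    shares = [("A", "B", 0.8), ("B", "C", 0.7), ("A", "D", 0.3)]
    print(sorted(derive_actual_controls(shares)))
```

In the example, A controls C through B because 0.8 × 0.7 = 0.56 exceeds 0.5, while the 30% stake in D does not qualify.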
5.2 BENEFICIAL_OWNER_OF derivation rule v1
// Rule: a natural person ultimately holds ≥ 25% of an enterprise through the shareholding chain
MATCH path = (p:NaturalPerson)-[r:HOLDS_SHARE*1..8]->(e:LegalEntity)
WHERE all(rel IN r WHERE rel.valid_to IS NULL)
WITH p, e, path,
reduce(s = 1.0, rel IN r | s * rel.percentage) AS beneficial_ratio,
length(path) AS depth
WHERE beneficial_ratio >= 0.25
WITH p, e, max(beneficial_ratio) AS max_ratio, min(depth) AS min_depth
MERGE (p)-[bo:BENEFICIAL_OWNER_OF]->(e)
SET bo.beneficial_ratio = max_ratio,
bo.path_depth = min_depth,
bo.derived_at = datetime(),
bo.derived = true;
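A small worked example may help when reading the rule above. The figures are invented for illustration: person P holds 60% of X, X holds 50% of E, and P also holds 20% of E directly.

```latex
% Ratio of a shareholding path p = (r_1, ..., r_k): product of the percentages
\[
  \mathrm{ratio}(p) = \prod_{i=1}^{k} \mathrm{percentage}(r_i)
\]
% Path 1: P -> X -> E (depth 2)
\[
  \mathrm{ratio}(p_1) = 0.60 \times 0.50 = 0.30 \ge 0.25 \quad \text{(kept)}
\]
% Path 2: P -> E (depth 1)
\[
  \mathrm{ratio}(p_2) = 0.20 < 0.25 \quad \text{(filtered out before aggregation)}
\]
% Stored on BENEFICIAL_OWNER_OF: maximum ratio and minimum depth over the kept paths
\[
  \mathrm{beneficial\_ratio} = 0.30, \qquad \mathrm{path\_depth} = 2
\]
```

Because the threshold is applied to each path before aggregation, the direct 20% stake neither adds to the 30% nor lowers the stored depth to 1.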
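5.3 Maintenance principles for derived relations
- Fully rebuildable: derived relations can be wiped and recomputed without affecting business continuity
- No external writes: APIs and ETL jobs must not write derived relations
- Versioned: when a rule changes, old relations are kept and tagged with their rule version so that the switch can be rolled out gradually
- Observable: every recomputation records `derived_at` and the number of affected nodes and relations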
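6. Metadata Specification (mandatory)
Every node and every non-derived relation must carry the following metadata fields:
_meta:
  uuid: String, unique             # System primary key
  source: String, required         # Code of the source system
  source_id: String                # Primary key in the source system
  source_record_url: String        # Link to the source record (for audit)
  created_at: DateTime, required   # Time of ingestion into the graph
  updated_at: DateTime, required   # Time of last update
  confidence: Float, [0,1]         # Confidence
  status: String                   # ACTIVE/DEPRECATED/MERGED/DELETED
  version: Integer                 # Version number
  merged_from: List<String>        # UUIDs of merged records
  extracted_by: String             # Extractor and version: rule_v1.0 / uie_v0.3 / llm_claude_opus_4_7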
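To enforce the `_meta` contract in an ingestion pipeline, the fields can be mirrored in a small validated structure. This is a sketch under assumptions: the defaults (`status="ACTIVE"`, `version=1`, generated UUID v4) and the rule that `updated_at` may not precede `created_at` are not stated in this document, and the class name `Meta` is illustrative.

```python
import uuid as uuid_lib
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional

ALLOWED_STATUS = {"ACTIVE", "DEPRECATED", "MERGED", "DELETED"}


@dataclass
class Meta:
    source: str                                  # required: source system code
    created_at: datetime                         # required: ingestion time
    updated_at: datetime                         # required: last update time
    uuid: str = field(default_factory=lambda: str(uuid_lib.uuid4()))
    source_id: Optional[str] = None
    source_record_url: Optional[str] = None
    confidence: Optional[float] = None           # must lie in [0, 1] when present
    status: str = "ACTIVE"                       # assumed default
    version: int = 1                             # assumed default
    merged_from: List[str] = field(default_factory=list)
    extracted_by: Optional[str] = None

    def __post_init__(self):
        if not self.source:
            raise ValueError("source is required")
        if self.confidence is not None and not 0.0 <= self.confidence <= 1.0:
            raise ValueError("confidence must be within [0, 1]")
        if self.status not in ALLOWED_STATUS:
            raise ValueError(f"unknown status: {self.status}")
        if self.updated_at < self.created_at:
            raise ValueError("updated_at must not precede created_at")


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    print(Meta(source="NECIPS", created_at=now, updated_at=now, confidence=0.98))
```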
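Source code conventions:
NECIPS              # National Enterprise Credit Information Publicity System
SAIC_<province>     # Provincial and municipal market regulators
JUDICIAL            # China Judgments Online
CREDIT_CHINA        # Credit China
GOV_CN              # Chinese government portal (gov.cn)
TYC                 # Tianyancha
QCC                 # Qichacha
QXB                 # Qixinbao
INTERNAL            # In-house systems
CRAWLER_<domain>    # In-house crawlers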
7. Neo4j Schema DDL (production-executable)
File: schema/changesets/001_init_constraints.cypher
// ====================================================================
// 001. Initial constraints (uniqueness + Node Key)
// ====================================================================
// Enterprise
CREATE CONSTRAINT enterprise_uuid_unique IF NOT EXISTS
FOR (n:Enterprise) REQUIRE n.uuid IS UNIQUE;
CREATE CONSTRAINT enterprise_uscc_unique IF NOT EXISTS
FOR (n:Enterprise) REQUIRE n.unified_credit_code IS UNIQUE;
CREATE CONSTRAINT enterprise_name_not_null IF NOT EXISTS
FOR (n:Enterprise) REQUIRE n.name IS NOT NULL;
// NaturalPerson
CREATE CONSTRAINT person_uuid_unique IF NOT EXISTS
FOR (n:NaturalPerson) REQUIRE n.uuid IS UNIQUE;
CREATE CONSTRAINT person_hash_unique IF NOT EXISTS
FOR (n:NaturalPerson) REQUIRE n.person_hash IS UNIQUE;
// GovernmentAgency
CREATE CONSTRAINT agency_uuid_unique IF NOT EXISTS
FOR (n:GovernmentAgency) REQUIRE n.uuid IS UNIQUE;
CREATE CONSTRAINT agency_code_unique IF NOT EXISTS
FOR (n:GovernmentAgency) REQUIRE n.agency_code IS UNIQUE;
// SocialOrganization
CREATE CONSTRAINT social_org_uuid_unique IF NOT EXISTS
FOR (n:SocialOrganization) REQUIRE n.uuid IS UNIQUE;
CREATE CONSTRAINT social_org_uscc_unique IF NOT EXISTS
FOR (n:SocialOrganization) REQUIRE n.unified_credit_code IS UNIQUE;
// Patent / Trademark / Qualification
CREATE CONSTRAINT patent_no_unique IF NOT EXISTS
FOR (n:Patent) REQUIRE n.patent_no IS UNIQUE;
CREATE CONSTRAINT trademark_no_unique IF NOT EXISTS
FOR (n:Trademark) REQUIRE n.trademark_no IS UNIQUE;
CREATE CONSTRAINT qualification_no_unique IF NOT EXISTS
FOR (n:Qualification) REQUIRE n.qualification_no IS UNIQUE;
// Address & Region
CREATE CONSTRAINT region_code_unique IF NOT EXISTS
FOR (n:AdministrativeRegion) REQUIRE n.region_code IS UNIQUE;
CREATE CONSTRAINT address_uuid_unique IF NOT EXISTS
FOR (n:Address) REQUIRE n.uuid IS UNIQUE;
// Industry
CREATE CONSTRAINT industry_code_unique IF NOT EXISTS
FOR (n:Industry) REQUIRE n.code IS UNIQUE;
// Tag - composite Node Key
CREATE CONSTRAINT tag_composite_key IF NOT EXISTS
FOR (n:Tag) REQUIRE (n.name, n.category) IS NODE KEY;
// Event classes
CREATE CONSTRAINT business_change_id_unique IF NOT EXISTS
FOR (n:BusinessChange) REQUIRE n.event_id IS UNIQUE;
CREATE CONSTRAINT legal_case_no_unique IF NOT EXISTS
FOR (n:LegalCase) REQUIRE n.case_no IS UNIQUE;
CREATE CONSTRAINT admin_penalty_no_unique IF NOT EXISTS
FOR (n:AdminPenalty) REQUIRE n.decision_no IS UNIQUE;
CREATE CONSTRAINT tender_no_unique IF NOT EXISTS
FOR (n:Tender) REQUIRE n.tender_no IS UNIQUE;
CREATE CONSTRAINT financing_id_unique IF NOT EXISTS
FOR (n:FinancingEvent) REQUIRE n.event_id IS UNIQUE;
CREATE CONSTRAINT opinion_id_unique IF NOT EXISTS
FOR (n:PublicOpinionEvent) REQUIRE n.event_id IS UNIQUE;
// Document
CREATE CONSTRAINT document_id_unique IF NOT EXISTS
FOR (n:Document) REQUIRE n.doc_id IS UNIQUE;
CREATE CONSTRAINT document_hash_unique IF NOT EXISTS
FOR (n:Document) REQUIRE n.hash IS UNIQUE;
File: schema/changesets/002_indexes.cypher
// ====================================================================
// 002. Indexes
// ====================================================================
// Range Index - equality and range lookups
CREATE INDEX enterprise_name_range IF NOT EXISTS
FOR (n:Enterprise) ON (n.name);
CREATE INDEX enterprise_status_industry IF NOT EXISTS
FOR (n:Enterprise) ON (n.registration_status, n.industry_code);
CREATE INDEX enterprise_establishment_date IF NOT EXISTS
FOR (n:Enterprise) ON (n.establishment_date);
CREATE INDEX person_name_range IF NOT EXISTS
FOR (n:NaturalPerson) ON (n.name);
CREATE INDEX address_geohash IF NOT EXISTS
FOR (n:Address) ON (n.geohash);
CREATE INDEX event_date_range IF NOT EXISTS
FOR (n:Event) ON (n.event_date);
// Text Index - CONTAINS / STARTS WITH
CREATE TEXT INDEX enterprise_name_text IF NOT EXISTS
FOR (n:Enterprise) ON (n.name);
CREATE TEXT INDEX person_name_text IF NOT EXISTS
FOR (n:NaturalPerson) ON (n.name);
// Point Index
CREATE POINT INDEX address_coord_point IF NOT EXISTS
FOR (n:Address) ON (n.coord);
// Full-text Index - Chinese tokenization (cjk analyzer)
CREATE FULLTEXT INDEX subject_fulltext IF NOT EXISTS
FOR (n:Enterprise|NaturalPerson|GovernmentAgency|SocialOrganization)
ON EACH [n.name, n.aliases]
OPTIONS {
indexConfig: {
`fulltext.analyzer`: 'cjk',
`fulltext.eventually_consistent`: true
}
};
CREATE FULLTEXT INDEX document_fulltext IF NOT EXISTS
FOR (n:Document) ON EACH [n.title, n.summary, n.content]
OPTIONS {
indexConfig: {
`fulltext.analyzer`: 'cjk',
`fulltext.eventually_consistent`: true
}
};
CREATE FULLTEXT INDEX business_scope_fulltext IF NOT EXISTS
FOR (n:Enterprise) ON EACH [n.business_scope]
OPTIONS {
indexConfig: {
`fulltext.analyzer`: 'cjk'
}
};
// Vector Index - entity embeddings, used for entity alignment and RAG
CREATE VECTOR INDEX enterprise_embedding IF NOT EXISTS
FOR (n:Enterprise) ON n.embedding
OPTIONS {
indexConfig: {
`vector.dimensions`: 1024,
`vector.similarity_function`: 'cosine'
}
};
CREATE VECTOR INDEX person_embedding IF NOT EXISTS
FOR (n:NaturalPerson) ON n.embedding
OPTIONS {
indexConfig: {
`vector.dimensions`: 1024,
`vector.similarity_function`: 'cosine'
}
};
// Relationship property indexes (supported since Neo4j 4.3)
CREATE INDEX share_valid_from IF NOT EXISTS
FOR ()-[r:HOLDS_SHARE]-() ON (r.valid_from);
CREATE INDEX serves_as_valid IF NOT EXISTS
FOR ()-[r:SERVES_AS]-() ON (r.valid_from, r.valid_to);
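File: schema/changesets/003_initial_data.cypher
// ====================================================================
// 003. Initial data: industry classification (excerpt of GB/T 4754-2017), administrative regions (GB/T 2260)
// Real deployments bulk-import these from CSV; this file is only a placeholder
// ====================================================================
// Industry classification (section level)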
UNWIND $industries AS row
MERGE (i:Industry {code: row.code})
SET i.name = row.name,
i.level = row.level,
i.parent_code = row.parent_code,
i.uuid = coalesce(i.uuid, randomUUID()),
i.created_at = coalesce(i.created_at, datetime()),
i.updated_at = datetime();
// Link every industry to its parent via parent_code (run as a separate statement)
MATCH (child:Industry), (parent:Industry {code: child.parent_code})
WHERE child.parent_code IS NOT NULL
MERGE (child)-[:BELONGS_TO]->(parent);
7.1 Schema change management
schema/
├── README.md
├── changelog.yaml                 # Liquibase-style change log
├── changesets/
│   ├── 001_init_constraints.cypher
│   ├── 002_indexes.cypher
│   ├── 003_initial_data.cypher
│   ├── 004_add_event_label_hierarchy.cypher
│   └── ...
├── rollback/
│   ├── 001_drop_constraints.cypher
│   └── ...
└── seeds/                         # Initial dictionary data
    ├── industries_gbt4754.csv
    ├── regions_gbt2260.csv
    └── pep_list.csv
Example change log, changelog.yaml:
databaseChangeLog:
  - changeSet:
      id: "001"
      author: ontology-team
      comment: "Initial constraints for Enterprise/NaturalPerson/Event"
      changes:
        - cypherFile: changesets/001_init_constraints.cypher
      rollback:
        - cypherFile: rollback/001_drop_constraints.cypher
  - changeSet:
      id: "002"
      author: ontology-team
      comment: "Indexes including fulltext (CJK) and vector indexes"
      changes:
        - cypherFile: changesets/002_indexes.cypher
      preconditions:
        - sqlCheck:
            expectedResult: 1
            sql: "SHOW CONSTRAINTS YIELD name WHERE name = 'enterprise_uscc_unique' RETURN count(*)"
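8. Business Query Samples (validating ontology usability)
The 30 queries below (15 Cypher queries, Q1 to Q15, and the 15 natural-language KBQA questions in §8.5) are used to validate the ontology design. If any of them cannot be expressed, the ontology is incomplete.
8.1 Basic fact queries
// Q1: basic registration information of an enterprise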
MATCH (e:Enterprise {unified_credit_code: $uscc})
RETURN e;
// Q2: current legal representative of an enterprise
MATCH (p:NaturalPerson)-[r:SERVES_AS]->(e:Enterprise {unified_credit_code: $uscc})
WHERE r.is_legal_rep = true AND r.valid_to IS NULL
RETURN p, r;
// Q3: current shareholders of an enterprise (by shareholding ratio, descending)
MATCH (s)-[r:HOLDS_SHARE]->(e:Enterprise {unified_credit_code: $uscc})
WHERE r.valid_to IS NULL
RETURN s, r.percentage AS pct
ORDER BY pct DESC;
8.2 Relationship queries
// Q4: all enterprises controlled by a natural person
MATCH (p:NaturalPerson {name: $name})-[:ACTUAL_CONTROLS]->(e:Enterprise)
RETURN e.name, e.unified_credit_code;
// Q5: parent-company chain of an enterprise (upward)
MATCH path = (e:Enterprise {unified_credit_code: $uscc})-[:SUBSIDIARY_OF*1..5]->(parent)
RETURN path;
// Q6: all subsidiaries of an enterprise (downward, full set)
MATCH (e:Enterprise {unified_credit_code: $uscc})<-[:SUBSIDIARY_OF*1..5]-(child)
RETURN DISTINCT child;
// Q7: whether two natural persons share a position or an investment in the same entity
MATCH (p1:NaturalPerson {name: $n1}), (p2:NaturalPerson {name: $n2})
MATCH (p1)-[:SERVES_AS|HOLDS_SHARE]->(e)<-[:SERVES_AS|HOLDS_SHARE]-(p2)
RETURN DISTINCT e;
// Q8: shortest association path between two enterprises
MATCH (a:Enterprise {unified_credit_code: $a}),
(b:Enterprise {unified_credit_code: $b})
MATCH p = shortestPath((a)-[*..6]-(b))
RETURN p;
8.3 Risk queries
// Q9: all court cases involving an enterprise (most recent first)
MATCH (e:Enterprise {unified_credit_code: $uscc})<-[:HAS_PARTY]-(c:LegalCase)
RETURN c
ORDER BY c.judgment_date DESC;
// Q10: all administrative penalties of an enterprise and the enterprises it controls
// (*0..3 includes the enterprise itself; traversal goes downward only)
MATCH (e:Enterprise {unified_credit_code: $uscc})
MATCH (e)-[:ACTUAL_CONTROLS*0..3]->(target:Enterprise)
MATCH (target)<-[:PUNISHES]-(p:AdminPenalty)
RETURN DISTINCT target.name, p
ORDER BY p.decision_date DESC;
// Q11: enterprises controlled by dishonest judgment debtors
MATCH (p:NaturalPerson {is_executed_dishonest: true})-[:ACTUAL_CONTROLS]->(e:Enterprise)
RETURN p.name, collect(e.name) AS controlled;
8.4 Intelligence and profile queries
// Q12: full profile of an enterprise (everything in one graph view)
MATCH (e:Enterprise {unified_credit_code: $uscc})
OPTIONAL MATCH (e)<-[r1:SERVES_AS]-(p:NaturalPerson)
WHERE r1.valid_to IS NULL
OPTIONAL MATCH (e)<-[r2:HOLDS_SHARE]-(s)
WHERE r2.valid_to IS NULL
OPTIONAL MATCH (e)-[:SUBSIDIARY_OF]->(parent)
OPTIONAL MATCH (e)<-[:SUBSIDIARY_OF]-(child)
OPTIONAL MATCH (e)-[:IN_INDUSTRY]->(i)
OPTIONAL MATCH (e)-[:REGISTERED_AT]->(addr)
OPTIONAL MATCH (e)<-[:FINANCED]-(fe:FinancingEvent)
RETURN e, collect(DISTINCT p) AS execs,
collect(DISTINCT s) AS shareholders,
collect(DISTINCT parent) AS parents,
collect(DISTINCT child) AS subsidiaries,
i, addr, collect(DISTINCT fe) AS financings;
// Q13: industry graph: top 20 enterprises by financing in an industry over the last 3 years
MATCH (e:Enterprise)-[:IN_INDUSTRY]->(i:Industry {code: $code})
MATCH (e)<-[:FINANCED]-(fe:FinancingEvent)
WHERE fe.announcement_date > date() - duration({years: 3})
RETURN e, sum(fe.amount) AS total_financing
ORDER BY total_financing DESC LIMIT 20;
// Q14: all enterprises under the same beneficial owner (detecting one person behind many enterprises)
MATCH (p:NaturalPerson)-[:BENEFICIAL_OWNER_OF]->(e:Enterprise)
WITH p, collect(e) AS enterprises
WHERE size(enterprises) >= 5
RETURN p, enterprises;
// Q15: related-party transaction screening: supply relations between a shareholder and its investee
MATCH (a)-[:HOLDS_SHARE]->(b:Enterprise)
MATCH (a)-[s:SUPPLIES]->(b)
RETURN a, b, s;
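8.5 Coverage of typical KBQA questions
The following 15 natural-language questions can all be supported by the ontology (see the KBQA design document for details):
- Who is the legal representative of Alibaba?
- In which enterprises does Zhang San serve as a director?
- Who is the actual controller of ByteDance?
- What are the subsidiaries of CATL?
- What lawsuits did a given company have in 2023?
- How are BYD and CATL related?
- Which companies were put on the abnormal-operation list in the past year?
- Which companies in the new-energy vehicle industry raised a Series B or later round in the past 5 years?
- How many high-tech enterprises are there in a given region?
- Which enterprises does a given person ultimately benefit from?
- What administrative penalty records does this company have?
- Who are the inventor and the applicant of a given patent?
- Which enterprise owns a given brand?
- Which tender projects has this company won?
- What are this company's most recent negative public-opinion events?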
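9. Ontology Evolution Governance
9.1 Roles and responsibilities
| Role | Authority | Responsibility |
|---|---|---|
| Ontologist | Proposes and approves ontology changes | Ontology consistency, documentation completeness |
| Data owner | Reviews changes within their domain | Business correctness |
| Platform architect | Reviews Schema DDL | Performance and scalability |
| QA | Verifies regression tests | Quality assurance |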
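9.2 Change process
[Proposal: issue] → [Impact assessment: document] → [Schema review meeting: meeting minutes] → [Staging validation: staging tests] → [Production release: DDL rollout] → [Regression verification: regression test suite]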
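9.3 SLA by change type
| Type | Review period | Required approval |
|---|---|---|
| PATCH (description changes) | 1 day | Ontologist |
| MINOR (additions) | 3 days | Ontologist + data owner |
| MAJOR (breaking changes) | 2 weeks | Full review board + project manager |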
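9.4 Known evolution directions from v1.0 to v2.0
- Introduce bitemporal modeling (valid time plus transaction time), which financial scenarios require
- Add an ESG dimension (environmental / social / governance scores)
- Introduce a dedicated event subclass for related-party transactions
- Introduce cross-border investment (modeling of overseas parent companies and VIE structures)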
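10. Accompanying Deliverables
- ✅ This design document (`01_领域本体详细设计书.md`)
- ✅ Schema DDL (`schema/changesets/*.cypher`)
- ⏳ OWL file (for cross-system exchange, optionally generated)
- ⏳ Entity dictionary CSVs (tags, industries, administrative regions)
- ⏳ Business query library (`docs/queries/`, the executable version of §8 of this document)
- ⏳ Sample test data (`tests/fixtures/`, complete graphs of 30 typical enterprises)
Document status: ✅ frozen; DDL is now under controlled change
Next step: see 02_代码工程脚手架.md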