Knowledge Graphs with ChatGPT


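GPT is fascinating, and I wanted to evaluate that fascination first-hand. This post extends some custom code I wrote earlier to convert natural language into a "meaningful" representation in a graph. The text I used back then was relatively simple: it was about treating common ailments with natural plants. To set some context, I did my nltk/NLP learning the hard way, painstakingly converting the contents of a book into a graph, which took more than a few weekends and plenty of Stack Overflow and d3.js threads to fix my mistakes and clean up the code.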

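With that experience behind me, I set out this evening to "test drive" the GPT engine. I tried two relatively simple use cases. The first summarizes the NIST SP 800-53 control catalog as a simple graph. The second leans more on human imagination: I wanted to narrate the simple story of "The Boy Who Cried Wolf" in Neo4j.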

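Use case 1, the simpler one, is driven by NIST's own information, which explicitly names the control families. To make this understandable to any reader, I first asked for a way to divide these families among people, process, and technology.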

Plain text

Dividing the NIST 800-53 control families into categories of people, process, and technology can be a useful way to understand how different aspects of information security are related to each other. Here is one way that the control families could be divided into these categories:

People:
Awareness and Training
Personnel Security

Process:
Access Control
Audit and Accountability
Security Assessment and Authorization
Configuration Management
Contingency Planning
Identification and Authentication
Incident Response
Maintenance
Media Protection
Planning
Risk Assessment
System and Services Acquisition
Program Management

Technology:
Physical and Environmental Protection
System and Communication Protection
System and Information Integrity

It's worth noting that this division is somewhat subjective, and there may be other ways to categorize the control families based on different criteria or perspectives. However, I hope that this provides a useful starting point for thinking about how the different control families relate to each other.

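With that classification in hand, I wanted to drill into the technology controls only (to keep the scope relatively constrained) and represent them in a graph. After no more than 5 to 7 iterations, I was in sync with the GPT engine on how to distinguish physical, technology, and process items, which produced the following code/graph.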

Cypher

// Describe the three technology control families.
// MERGE is used here instead of MATCH so the script also works on an empty
// database; use MATCH if the ControlFamily nodes already exist.
MERGE (n:ControlFamily {name:'Physical and Environmental Protection'})
SET n.description = 'Physical and environmental protections for devices and equipment that have the capability to store, process, or transmit sensitive data.';

MERGE (m:ControlFamily {name:'System and Communications Protection'})
SET m.description = 'Protection measures for information systems, networks, and communications channels.';

MERGE (o:ControlFamily {name:'System and Information Integrity'})
SET o.description = 'Measures to ensure the integrity, availability, and confidentiality of information on systems and networks.';

// Each statement below ends with a semicolon so the script can be run in one go.
// Physical and Environmental Protection: process items, then physical items.
MATCH (n:ControlFamily {name:'Physical and Environmental Protection'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Visitor Control'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Physical Access Authorizations'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Access Monitoring and Accountability'});

MATCH (n:ControlFamily {name:'Physical and Environmental Protection'})
MERGE (n)-[:HAS_PHYSICAL_ITEM]->(:PhysicalItem {name:'Transmission Media'})
MERGE (n)-[:HAS_PHYSICAL_ITEM]->(:PhysicalItem {name:'Workstations and Servers'})
MERGE (n)-[:HAS_PHYSICAL_ITEM]->(:PhysicalItem {name:'Network Devices'});

// System and Communications Protection: process items.
MATCH (n:ControlFamily {name:'System and Communications Protection'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'System and Communications Protection Policies and Procedures'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Identification and Authentication'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'System and Communications Protection Mechanisms'});

// System and Information Integrity: technology items, then process items.
MATCH (n:ControlFamily {name:'System and Information Integrity'})
MERGE (n)-[:HAS_TECHNOLOGY_ITEM]->(:TechnologyItem {name:'Data At Rest'})
MERGE (n)-[:HAS_TECHNOLOGY_ITEM]->(:TechnologyItem {name:'Data in Transit'})
MERGE (n)-[:HAS_TECHNOLOGY_ITEM]->(:TechnologyItem {name:'Integrity Verification'});

MATCH (n:ControlFamily {name:'System and Information Integrity'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Security Continuous Monitoring'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Configuration Management'})
MERGE (n)-[:HAS_PROCESS_ITEM]->(:ProcessItem {name:'Information Integrity Program'});

It is basic, but it gives you a sense of how easy it is to script a story for "this" kind of data.
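If you would rather script this than paste statements by hand, the same kind of Cypher can be generated from a nested dictionary. The following is a minimal Python sketch under a few assumptions: the helper names `quote` and `family_to_cypher`, the dictionary layout, and the trimmed item lists are made up for illustration and were not produced by the GPT engine. It MERGEs the family node as well so that the output runs on an empty database, and it only builds query strings without connecting to Neo4j.

```python
# Hypothetical helper: turn a {family: {relationship: [item names]}} mapping
# into Cypher statements shaped like the ones above. All names are illustrative.

# Relationship type -> label of the item node it points at.
ITEM_LABELS = {
    "HAS_PROCESS_ITEM": "ProcessItem",
    "HAS_PHYSICAL_ITEM": "PhysicalItem",
    "HAS_TECHNOLOGY_ITEM": "TechnologyItem",
}

# A trimmed-down subset of the items from the Cypher above.
FAMILIES = {
    "Physical and Environmental Protection": {
        "HAS_PROCESS_ITEM": ["Visitor Control"],
        "HAS_PHYSICAL_ITEM": ["Transmission Media", "Network Devices"],
    },
    "System and Information Integrity": {
        "HAS_TECHNOLOGY_ITEM": ["Data At Rest", "Data in Transit"],
    },
}


def quote(value: str) -> str:
    """Wrap a string in single quotes, escaping backslashes and quotes."""
    return "'" + value.replace("\\", "\\\\").replace("'", "\\'") + "'"


def family_to_cypher(family: str, items: dict) -> str:
    """Build one Cypher statement that merges a family and its items."""
    lines = [f"MERGE (n:ControlFamily {{name:{quote(family)}}})"]
    for rel_type, names in items.items():
        label = ITEM_LABELS[rel_type]
        for name in names:
            lines.append(
                f"MERGE (n)-[:{rel_type}]->(:{label} {{name:{quote(name)}}})"
            )
    return "\n".join(lines) + ";"


# One statement per control family.
statements = [family_to_cypher(f, i) for f, i in FAMILIES.items()]

if __name__ == "__main__":
    print("\n\n".join(statements))
```

String building is used here only so the output can be compared with the statements above. In real code, passing the names as query parameters through a Neo4j driver would be the safer choice.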

That was the easier task. If we now take OpenAI one level up in difficulty, here is an interesting example.

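Use case 2 is a nice little story called "The Boy Who Cried Wolf". It is about a boy who keeps crying wolf to fool the villagers. By the time a real wolf attacks his flock, he has lost his credibility with them and his cries go unheeded.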

Here is what you get if you ask the engine to generate a graph for this story without giving it any specific context.

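Let's assume we need something better. I wanted to describe the actors separately, call out the actions separately, and, more importantly, narrate the sequence in a way that is easier for a general audience to consume. For all the sophistication and speed on offer, it still took me about 15 iterations to arrive at the code below. Maybe this is why we need to get better at "prompt engineering" and context setting, but this is definitely a level above the previous example.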

Cypher

// Actors in the story.
CREATE (:Character {name: 'Boy', gender: 'Male', age: 10})
CREATE (:Character {name: 'Villagers', count: 35})
CREATE (:Character {name: 'Sheep', species: 'Ovis aries'})
CREATE (:Character {name: 'Wolf', species: 'Canis lupus'});

// Actions between the actors; the timestamp property records the order of events.
MATCH (b:Character {name: 'Boy'}),
      (v:Character {name: 'Villagers'}),
      (s:Character {name: 'Sheep'}),
      (w:Character {name: 'Wolf'})
CREATE (b)-[:cries {timestamp: 1}]->(w)
CREATE (v)-[:fooled_by {timestamp: 2}]->(b)
CREATE (v)-[:scold {timestamp: 3}]->(b)
CREATE (w)-[:attacks {timestamp: 4}]->(s)
CREATE (b)-[:cries {timestamp: 5}]->(w)
CREATE (v)-[:ignore {timestamp: 6}]->(b);

// Read the story back as a table, in chronological order.
MATCH (c:Character)-[r]->(c2:Character)
RETURN c.name, r.timestamp, type(r), c2.name
ORDER BY r.timestamp;

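This gives you a graph that looks like this. Neo4j Bloom might help display the sequence better, but for now I use a simple table to explain the order. Getting the engine to treat the information as a sequence (which explains the timestamps above) took a lot of iterations.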
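To make the table concrete without a running database, here is a small Python sketch of what the final query returns. It assumes the six relationships above are the only ones in the graph and simply re-implements the `ORDER BY r.timestamp` step in memory. The `Event` tuple and the `timeline` function are hypothetical names, and nothing here talks to Neo4j.

```python
from typing import NamedTuple


class Event(NamedTuple):
    """One relationship from the story graph (hypothetical in-memory form)."""
    source: str
    action: str
    target: str
    timestamp: int


# The six relationships created by the Cypher above, deliberately shuffled
# to show that the ordering comes from the timestamp, not insertion order.
EVENTS = [
    Event("Wolf", "attacks", "Sheep", 4),
    Event("Boy", "cries", "Wolf", 1),
    Event("Villagers", "ignore", "Boy", 6),
    Event("Villagers", "fooled_by", "Boy", 2),
    Event("Boy", "cries", "Wolf", 5),
    Event("Villagers", "scold", "Boy", 3),
]


def timeline(events):
    """Return (source, timestamp, action, target) rows sorted by timestamp."""
    ordered = sorted(events, key=lambda e: e.timestamp)
    return [(e.source, e.timestamp, e.action, e.target) for e in ordered]


# Same column order as RETURN c.name, r.timestamp, type(r), c2.name.
rows = timeline(EVENTS)

if __name__ == "__main__":
    for source, ts, action, target in rows:
        print(f"{ts}  {source:<10} {action:<10} {target}")
```

The point of the sketch is that the graph itself has no notion of order. The sequence exists only because every relationship carries a timestamp that the query sorts on.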

To summarize

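  • Large language models remain fascinating. The GPT engine is humbling not only for the speed and acceptable accuracy of the code it produces, but also for its uncanny ability to grasp context. I have not played with GPT-4 yet to compare, but I hear it is exponentially better.
  • The number of iterations did go up as we moved from relatively simple text comprehension (use case 1) to something more "human" through storytelling. I just hope that distinction remains, but that is only my hope :-)
  • Finally, you should know what you want out of the model. It is like carrying a Transformers robot in the back of your car, or riding inside one.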