# Building a Natural-Language Query Interface for a Graph Database with Azure Cosmos DB and Apache Gremlin

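## Introduction
As data grows in volume and complexity, storing and querying large-scale graph data efficiently becomes a challenge. Azure Cosmos DB for Apache Gremlin is a graph database service that can store very large numbers of vertices and edges and query them with millisecond latency. This article shows how to use a large language model (LLM) to put a natural-language interface in front of such a graph database, so that users can ask questions in plain language instead of writing Gremlin queries by hand.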

## Main content

### Setting up the environment
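First, install the Python libraries needed to talk to Azure Cosmos DB. The code in this article also imports LangChain, the Azure OpenAI integration, and `nest_asyncio`, so they are installed here as well. In a notebook cell, prefix the command with `!`.

```bash
pip3 install gremlinpython langchain langchain-community langchain-openai nest_asyncio
```

Create an Azure Cosmos DB graph database instance with `/type` as the partition key. The basic connection settings are:

```python
cosmosdb_name = "mycosmosdb"            # Cosmos DB account name
cosmosdb_db_id = "graphtesting"         # database ID
cosmosdb_db_graph_id = "mygraph"        # graph (container) ID
cosmosdb_access_Key = "longstring=="    # placeholder for the account access key
```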
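Hardcoding the access key is fine for a quick demo but easy to leak. As a minimal sketch, the same settings could be read from environment variables instead. The variable names (`COSMOSDB_NAME`, `COSMOSDB_DB_ID`, `COSMOSDB_GRAPH_ID`, `COSMOSDB_ACCESS_KEY`) and the `load_settings` helper are assumptions made for illustration; they are not defined by Azure or LangChain.

```python
import os
from typing import Mapping, NamedTuple


class CosmosGraphSettings(NamedTuple):
    name: str
    db_id: str
    graph_id: str
    access_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> CosmosGraphSettings:
    """Read connection settings from the environment (hypothetical variable names)."""
    if "COSMOSDB_ACCESS_KEY" not in env:
        # Fail early instead of attempting a connection with an empty key.
        raise RuntimeError("COSMOSDB_ACCESS_KEY is not set")
    return CosmosGraphSettings(
        name=env.get("COSMOSDB_NAME", "mycosmosdb"),
        db_id=env.get("COSMOSDB_DB_ID", "graphtesting"),
        graph_id=env.get("COSMOSDB_GRAPH_ID", "mygraph"),
        access_key=env["COSMOSDB_ACCESS_KEY"],
    )
```

The defaults mirror the sample values used in this article; only the key is mandatory.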

Then use `GremlinGraph` to connect to Azure Cosmos DB:

```python
import nest_asyncio
from langchain.chains.graph_qa.gremlin import GremlinQAChain
from langchain_community.graphs import GremlinGraph

graph = GremlinGraph(
    # Gremlin endpoint of the Cosmos DB account
    url=f"wss://{cosmosdb_name}.gremlin.cosmos.azure.com:443/",
    # The username is the resource path of the graph
    username=f"/dbs/{cosmosdb_db_id}/colls/{cosmosdb_db_graph_id}",
    password=cosmosdb_access_Key,
)
```
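The two f-strings above follow a fixed pattern: the endpoint is derived from the account name, and the username is the resource path of the graph. A small sketch that builds and sanity-checks both values is shown here; the helper names `gremlin_endpoint` and `gremlin_username` are hypothetical and exist only in this example.

```python
def gremlin_endpoint(account_name: str) -> str:
    """Build the Gremlin WebSocket endpoint for a Cosmos DB account name."""
    if not account_name or "/" in account_name or "." in account_name:
        raise ValueError(f"unexpected account name: {account_name!r}")
    return f"wss://{account_name}.gremlin.cosmos.azure.com:443/"


def gremlin_username(db_id: str, graph_id: str) -> str:
    """Build the resource path that is passed as the Gremlin username."""
    for part in (db_id, graph_id):
        if not part or "/" in part:
            raise ValueError(f"unexpected identifier: {part!r}")
    return f"/dbs/{db_id}/colls/{graph_id}"


print(gremlin_endpoint("mycosmosdb"))
print(gremlin_username("graphtesting", "mygraph"))
```

Validating the pieces before connecting tends to give a clearer error than a failed WebSocket handshake.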

### Populating the database

Assuming the database is empty, it can be populated with a `GraphDocument`:

```python
from langchain_community.graphs.graph_document import GraphDocument, Node, Relationship
from langchain_core.documents import Document

# The source text that the graph is derived from
source_doc = Document(
    page_content="Matrix is a movie where Keanu Reeves, Laurence Fishburne and Carrie-Anne Moss acted."
)

# One movie vertex and three actor vertices
movie = Node(id="The Matrix", properties={"label": "movie", "title": "The Matrix"})
actor1 = Node(id="Keanu Reeves", properties={"label": "actor", "name": "Keanu Reeves"})
actor2 = Node(id="Laurence Fishburne", properties={"label": "actor", "name": "Laurence Fishburne"})
actor3 = Node(id="Carrie-Anne Moss", properties={"label": "actor", "name": "Carrie-Anne Moss"})

# Each actor is connected to the movie by an "ActedIn" edge
relationships = [
    {"source": actor1, "target": movie, "label": "ActedIn", "id": 5},
    {"source": actor2, "target": movie, "label": "ActedIn", "id": 6},
    {"source": actor3, "target": movie, "label": "ActedIn", "id": 7},
]

graph_doc = GraphDocument(
    nodes=[movie, actor1, actor2, actor3],
    relationships=[
        Relationship(id=rel["id"], type=rel["label"], source=rel["source"], target=rel["target"])
        for rel in relationships
    ],
    source=source_doc,
)

nest_asyncio.apply()  # works around event-loop issues when gremlinpython runs inside a notebook
graph.add_graph_documents([graph_doc])
```
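To make the effect of `add_graph_documents` more concrete, the sketch below builds Gremlin statements that would create comparable vertices and edges. It is an illustration under stated assumptions, not the exact statements `GremlinGraph` sends: the helper names are hypothetical, and writing the label into a `type` property is assumed here only because the graph uses `/type` as its partition key.

```python
def escape(value: str) -> str:
    """Escape backslashes and single quotes for a Gremlin string literal."""
    return value.replace("\\", "\\\\").replace("'", "\\'")


def add_vertex_query(node_id: str, label: str, properties: dict) -> str:
    """Build an addV statement; 'type' carries the partition key value."""
    parts = [
        f"g.addV('{escape(label)}')",
        f".property('id', '{escape(node_id)}')",
        f".property('type', '{escape(label)}')",
    ]
    for key, value in properties.items():
        if key == "label":  # already used as the vertex label
            continue
        parts.append(f".property('{escape(key)}', '{escape(str(value))}')")
    return "".join(parts)


def add_edge_query(source_id: str, target_id: str, label: str) -> str:
    """Build an addE statement connecting two existing vertices by id."""
    return (
        f"g.V('{escape(source_id)}')"
        f".addE('{escape(label)}')"
        f".to(g.V('{escape(target_id)}'))"
    )


print(add_vertex_query("The Matrix", "movie", {"label": "movie", "title": "The Matrix"}))
print(add_edge_query("Keanu Reeves", "The Matrix", "ActedIn"))
```

String building like this is only suitable for trusted input; for user-supplied values, parameter bindings are the safer choice.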

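### Refreshing the schema and querying

Refresh the schema information whenever the structure of the database has changed:

```python
graph.refresh_schema()
print(graph.schema)
```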

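Create a Gremlin QA chain and query the database:

```python
from langchain_openai import AzureChatOpenAI

# Assumes the Azure OpenAI endpoint, API key and API version are already
# configured, for example through environment variables.
chain = GremlinQAChain.from_llm(
    AzureChatOpenAI(
        temperature=0,
        azure_deployment="gpt-4-turbo",
    ),
    graph=graph,
    verbose=True,
)

print(chain.invoke("Who played in The Matrix?"))
# chain.run is deprecated in recent LangChain releases; invoke is used for both questions
print(chain.invoke("How many people played in The Matrix?"))
```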
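For the first question, the LLM might generate a traversal along the lines of `g.V().has('title', 'The Matrix').in('ActedIn').values('name')`; this is an assumption, and the actual generated query can differ between runs and models. The plain-Python sketch below mirrors what such a traversal computes on the sample data, without needing a database connection. The `vertices`, `edges`, and `actors_in` names are made up for this illustration.

```python
# In-memory copy of the sample graph from this article
vertices = {
    "The Matrix": {"label": "movie", "title": "The Matrix"},
    "Keanu Reeves": {"label": "actor", "name": "Keanu Reeves"},
    "Laurence Fishburne": {"label": "actor", "name": "Laurence Fishburne"},
    "Carrie-Anne Moss": {"label": "actor", "name": "Carrie-Anne Moss"},
}
edges = [
    ("Keanu Reeves", "ActedIn", "The Matrix"),
    ("Laurence Fishburne", "ActedIn", "The Matrix"),
    ("Carrie-Anne Moss", "ActedIn", "The Matrix"),
]


def actors_in(title: str) -> list:
    """Follow incoming 'ActedIn' edges of the movie and collect actor names."""
    movie_ids = {vid for vid, props in vertices.items() if props.get("title") == title}
    return sorted(
        vertices[source]["name"]
        for source, label, target in edges
        if label == "ActedIn" and target in movie_ids
    )


print(actors_in("The Matrix"))       # answers "Who played in The Matrix?"
print(len(actors_in("The Matrix")))  # answers "How many people played in The Matrix?"
```

Comparing the chain's answer against a hand-computed result like this is a simple way to check that the generated Gremlin query is doing what the question asked.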

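## Common issues and solutions

Connection issues: in some regions the connection to the service may be unstable. Routing traffic through an API proxy service can improve stability.

Stale schema: after changing the structure of the database, call `graph.refresh_schema()` again so that the LLM generates queries against the current schema.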
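Besides a proxy, transient connection failures can often be absorbed by retrying with exponential backoff. The following is a minimal sketch; the `with_retries` helper is hypothetical and not part of LangChain or gremlinpython, and catching every `Exception` is a simplification that real code would likely narrow.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_retries(
    operation: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, retrying on failure with exponentially growing delays."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
    raise AssertionError("unreachable")


# Intended usage with the objects from this article (not executed here):
#   with_retries(lambda: graph.add_graph_documents([graph_doc]))
#   with_retries(graph.refresh_schema)
```

The `sleep` parameter exists so the delay can be replaced in tests; in normal use the default `time.sleep` applies.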

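## Summary

This article showed how to combine Gremlin on Azure Cosmos DB with an LLM to query a graph database in natural language: connect with `GremlinGraph`, populate the graph from a `GraphDocument`, refresh the schema, and ask questions through `GremlinQAChain`.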
