# Optimizing Query Retrieval with the RePhraseQuery Retriever: Preprocess User Input with Ease!
## Introduction
In information retrieval, handling user input efficiently is a key problem. `RePhraseQuery` is a simple yet powerful technique: it applies a large language model (LLM) between the user's input and the retrieval query to preprocess that input. This article shows how to use `RePhraseQueryRetriever` to optimize the retrieval process, with practical code examples.
## Main Content
### Setting Up Your Vector Store
Before starting, make sure you have a vector store to hold the processed data. The setup steps are as follows:
```python
import logging

from langchain.retrievers import RePhraseQueryRetriever
from langchain_chroma import Chroma
from langchain_community.document_loaders import WebBaseLoader
from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Configure logging so re-phrased queries are visible
logging.basicConfig()
logging.getLogger("langchain.retrievers.re_phraser").setLevel(logging.INFO)

# Load the data
loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split the text into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Create the vector store
vectorstore = Chroma.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())
```
### Using the Default Prompt
The default prompt template converts a natural language query into a vector store query:
```python
DEFAULT_TEMPLATE = """You are an assistant tasked with taking a natural language \
query from a user and converting it into a query for a vectorstore. \
In this process, you strip out information that is not relevant for \
the retrieval task. Here is the user query: {question}"""

llm = ChatOpenAI(temperature=0)
retriever_from_llm = RePhraseQueryRetriever.from_llm(
    retriever=vectorstore.as_retriever(), llm=llm
)
```
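Conceptually, the retriever does two things: it formats the user question into the prompt and sends it to the LLM, then passes the LLM's simplified answer to the underlying retriever. The following is a minimal pure-Python sketch of that flow, not the real LangChain implementation; `stub_llm`, `stub_retriever`, and the toy index are illustrative stand-ins for the actual LLM and vector store.

```python
# A minimal sketch of the re-phrase-then-retrieve flow. The stubs below
# stand in for a real LLM and vector store; they are purely illustrative.
TEMPLATE = (
    "You are an assistant tasked with taking a natural language query "
    "from a user and converting it into a query for a vectorstore. "
    "Here is the user query: {question}"
)

def rephrase_and_retrieve(question, llm, retriever):
    # Step 1: format the prompt and let the LLM simplify the query
    simplified = llm(TEMPLATE.format(question=question))
    # Step 2: hand the simplified query to the underlying retriever
    return retriever(simplified)

# Stub LLM: pretend it strips the greeting and keeps only the topic
stub_llm = lambda prompt: "approaches to Task Decomposition"

# Stub retriever: exact-match lookup over a one-document toy index
toy_index = {"approaches to Task Decomposition": ["<doc about task decomposition>"]}
stub_retriever = lambda query: toy_index.get(query, [])

docs = rephrase_and_retrieve(
    "Hi I'm Lance. What are the approaches to Task Decomposition?",
    stub_llm,
    stub_retriever,
)
print(docs)  # ['<doc about task decomposition>']
```

The point of the sketch: without the re-phrase step, the raw greeting-laden query would miss the index entirely; the LLM's job is to strip everything irrelevant to retrieval.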
### Customizing the Prompt
You can customize the prompt template to fit your needs, for example, converting queries into "pirate speech":
```python
from langchain.chains import LLMChain
from langchain_core.prompts import PromptTemplate

QUERY_PROMPT = PromptTemplate(
    input_variables=["question"],
    template="""You are an assistant tasked with taking a natural language query from a user
and converting it into a query for a vectorstore. In the process, strip out all
information that is not relevant for the retrieval task and return a new, simplified
question for vectorstore retrieval. The new user query should be in pirate speech.
Here is the user query: {question} """,
)

llm = ChatOpenAI(temperature=0)
llm_chain = LLMChain(llm=llm, prompt=QUERY_PROMPT)
retriever_from_llm_chain = RePhraseQueryRetriever(
    retriever=vectorstore.as_retriever(), llm_chain=llm_chain
)
```
### Code Example
Here is a complete retrieval example, with the logged re-phrased queries shown after each call:
```python
docs = retriever_from_llm.invoke(
    "Hi I'm Lance. What are the approaches to Task Decomposition?"
)
```
```
INFO:langchain.retrievers.re_phraser:Re-phrased question: The user query can be converted into a query for a vectorstore as follows:
"approaches to Task Decomposition"
```
```python
docs = retriever_from_llm_chain.invoke(
    "Hi I'm Lance. What is Maximum Inner Product Search?"
)
```
```
INFO:langchain.retrievers.re_phraser:Re-phrased question: Ahoy matey! What be Maximum Inner Product Search, ye scurvy dog?
```
### Common Problems and Solutions
- **API access restrictions**: Because of network restrictions, API calls may run into regional limits. Routing requests through an API proxy service can improve access stability.
- **Prompt not producing good results**: Try adjusting the prompt template's wording so it better fits your application scenario.
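For the API-restriction case, many OpenAI-compatible proxy services expose the same interface under a different base URL, which `ChatOpenAI` accepts via its `base_url` parameter. A configuration sketch follows; the endpoint `https://api.example-proxy.com/v1` is a hypothetical placeholder, so substitute your own provider's URL.

```python
from langchain_openai import ChatOpenAI

# Use an API proxy service to improve access stability.
# The base_url below is a hypothetical placeholder endpoint.
llm = ChatOpenAI(
    temperature=0,
    base_url="https://api.example-proxy.com/v1",  # your proxy endpoint here
)
```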
## Summary and Further Learning Resources
`RePhraseQueryRetriever` is a flexible and powerful tool that can improve the efficiency of information retrieval. With customized prompts, you can tune its behavior to your specific needs.
If this article helped you, feel free to like it and follow my blog. Your support keeps me writing!