LangChain v0.2 Documentation (3): How to Stream

Streaming is critical in making applications based on LLMs feel responsive to end-users.

Important LangChain primitives like chat models, output parsers, prompts, retrievers, and agents implement the LangChain Runnable Interface.

This interface provides two general approaches to stream content:

  1. sync stream and async astream: a default implementation of streaming that streams the final output from the chain.

  2. async astream_events and async astream_log: these provide a way to stream both intermediate steps and final output from the chain.

Let's take a look at both approaches, and try to understand how to use them.

Using Stream

All Runnable objects implement a sync method called stream and an async variant called astream.

These methods are designed to stream the final output in chunks, yielding each chunk as soon as it is available.

Streaming is only possible if all steps in the program know how to process an input stream; i.e., process an input chunk one at a time, and yield a corresponding output chunk.

The complexity of this processing can vary, from straightforward tasks like emitting tokens produced by an LLM, to more challenging ones like streaming parts of JSON results before the entire JSON is complete.

The best place to start exploring streaming is with the single most important component in LLM apps -- the LLMs themselves!

LLMs and Chat Models

Large language models and their chat variants are the primary bottleneck in LLM-based apps.

Large language models can take several seconds to generate a complete response to a query. This is far slower than the ~200-300 ms threshold at which an application feels responsive to an end user.

The key strategy to make the application feel more responsive is to show intermediate progress; viz., to stream the output from the model token by token.

We will show examples of streaming using a chat model. We'll use OpenAI here:

pip install -qU langchain-openai
import getpass
import os

os.environ["OPENAI_API_KEY"] = getpass.getpass()

from langchain_openai import ChatOpenAI

model = ChatOpenAI(model="gpt-3.5-turbo-0125")

Let's start with the sync stream API:

chunks = []
for chunk in model.stream("what color is the sky?"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)
The| sky| appears| blue| during| the| day|.|

Alternatively, if you're working in an async environment, you may consider using the async astream API:

chunks = []
async for chunk in model.astream("what color is the sky?"):
    chunks.append(chunk)
    print(chunk.content, end="|", flush=True)
The| sky| appears| blue| during| the| day|.|

Let's inspect one of the chunks:

chunks[0]
AIMessageChunk(content='The', id='run-b36bea64-5511-4d7a-b6a3-a07b3db0c8e7')

We got back something called an AIMessageChunk. This chunk represents a part of an AIMessage.

Message chunks are additive by design -- one can simply add them up to get the state of the response so far!

chunks[0] + chunks[1] + chunks[2] + chunks[3] + chunks[4]

AIMessageChunk(content='The sky appears blue during', id='run-b36bea64-5511-4d7a-b6a3-a07b3db0c8e7')
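
Since message chunks support the + operator, you can also fold the whole stream back into the final message. A minimal sketch, reusing the chunks list collected above:

from functools import reduce
from operator import add

# Summing every chunk reconstructs the complete response message.
full_message = reduce(add, chunks)
print(full_message.content)  # The sky appears blue during the day.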

Chains

Virtually all LLM applications involve more steps than just a call to a language model.

Let's build a simple chain using LangChain Expression Language (LCEL) that combines a prompt, model and a parser and verify that streaming works.

We will use StrOutputParser to parse the output from the model. This is a simple parser that extracts the content field from an AIMessageChunk, giving us the token returned by the model.

TIP

LCEL is a declarative way to specify a "program" by chaining together different LangChain primitives. Chains created using LCEL benefit from an automatic implementation of stream and astream allowing streaming of the final output. In fact, chains created with LCEL implement the entire standard Runnable interface.

from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_template("tell me a joke about {topic}")
parser = StrOutputParser()
chain = prompt | model | parser

async for chunk in chain.astream({"topic": "parrot"}):
    print(chunk, end="|", flush=True)

API Reference: StrOutputParser | ChatPromptTemplate

Here|'s| a| joke| about| a| par|rot|:|A man| goes| to| a| pet| shop| to| buy| a| par|rot|.| The| shop| owner| shows| him| two| stunning| pa|rr|ots| with| beautiful| pl|um|age|.|"|There|'s| a| talking| par|rot| an|d a| non|-|talking| par|rot|,"| the| owner| says|.| "|The| talking| par|rot| costs| $|100|,| an|d the| non|-|talking| par|rot| is| $|20|."|The| man| says|,| "|I|'ll| take| the| non|-|talking| par|rot| at| $|20|."|He| pays| an|d leaves| with| the| par|rot|.| As| he|'s| walking| down| the| street|,| the| par|rot| looks| up| at| him| an|d says|,| "|You| know|,| you| really| are| a| stupi|d man|!"|The| man| is| stun|ne|d an|d looks| at| the| par|rot| in| dis|bel|ief|.| The| par|rot| continues|,| "|Yes|
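
As the tip above notes, chains built with LCEL implement the entire standard Runnable interface, so the same chain also supports invoke and batch. A quick sketch (outputs elided):

joke = chain.invoke({"topic": "parrot"})  # one input, one complete string
jokes = chain.batch([{"topic": "parrot"}, {"topic": "owl"}])  # a list of strings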

Note that we're getting streaming output even though we're using parser at the end of the chain above. The parser operates on each streaming chunk individually. Many of the LCEL primitives also support this kind of transform-style passthrough streaming, which can be very convenient when constructing apps.

Custom functions can be designed to return generators, which are able to operate on streams.
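
For example, here is a minimal sketch of a generator that upper-cases each string chunk as it flows through; piping a generator function into a chain coerces it into a streaming-capable runnable (an async variant appears later in this guide):

def _uppercase_streaming(input_stream):
    """Consume one string chunk at a time and yield a transformed chunk."""
    for chunk in input_stream:
        yield chunk.upper()

upper_chain = prompt | model | StrOutputParser() | _uppercase_streaming

for chunk in upper_chain.stream({"topic": "parrot"}):
    print(chunk, end="|", flush=True)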

Certain runnables, like prompt templates and chat models, cannot process individual chunks and instead aggregate all previous steps. Such runnables can interrupt the streaming process.

NOTE

The LangChain Expression Language allows you to separate the construction of a chain from the mode in which it is used (e.g., sync/async, batch/streaming, etc.). If this is not relevant to what you're building, you can also rely on a standard imperative programming approach by calling invoke, batch, or stream on each component individually, assigning the results to variables, and then using them downstream as you see fit.
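
For instance, a minimal sketch of that imperative style, reusing the prompt, model, and parser defined earlier:

# Run each component yourself instead of composing them with LCEL.
prompt_value = prompt.invoke({"topic": "parrot"})  # format the prompt
for chunk in model.stream(prompt_value):  # stream the model response
    print(parser.invoke(chunk), end="|", flush=True)  # parse each chunk to text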

Working with Input Streams

What if you wanted to stream JSON from the output as it was being generated?

If you were to rely on json.loads to parse the partial JSON, the parsing would fail, as the partial JSON wouldn't be valid JSON.
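
The failure is easy to reproduce with a prefix cut mid-value:

import json

partial = '{"countries": [{"name": "Fr'  # a prefix of the final JSON
json.loads(partial)  # raises json.JSONDecodeError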

You'd likely be at a complete loss as to what to do and claim that it wasn't possible to stream JSON.

Well, turns out there is a way to do it -- the parser needs to operate on the input stream, and attempt to "auto-complete" the partial JSON into a valid state.
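
To build some intuition for what "auto-complete" means here, consider a toy completer (purely illustrative; this is not LangChain's actual implementation) that closes an unterminated string and any unbalanced brackets:

def autocomplete_json(partial: str) -> str:
    """Toy completer: close an open string, then any open brackets."""
    closers = []       # closing bracket for every bracket still open
    in_string = False  # are we inside a string literal?
    escaped = False    # was the previous character a backslash?
    for ch in partial:
        if in_string:
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
        elif ch == '"':
            in_string = True
        elif ch in "{[":
            closers.append("}" if ch == "{" else "]")
        elif ch in "}]":
            closers.pop()
    # Close the dangling string (if any), then the brackets, innermost first.
    return partial + ('"' if in_string else "") + "".join(reversed(closers))

import json

print(json.loads(autocomplete_json('{"countries": [{"name": "Fr')))
# {'countries': [{'name': 'Fr'}]}

A real streaming parser must also handle trickier prefixes (e.g., a dangling key with no value yet), but the idea is the same: keep the partial output in a parseable state at every step.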

Let's see such a parser in action to understand what this means.

from langchain_core.output_parsers import JsonOutputParser

chain = (
    model | JsonOutputParser()
)  # Due to a bug in older versions of LangChain, JsonOutputParser did not stream results from some models
async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`"
):
    print(text, flush=True)

API Reference: JsonOutputParser

{}
{'countries': []}
{'countries': [{}]}
{'countries': [{'name': ''}]}
{'countries': [{'name': 'France'}]}
{'countries': [{'name': 'France', 'population': 67}]}
{'countries': [{'name': 'France', 'population': 67413}]}
{'countries': [{'name': 'France', 'population': 67413000}]}
{'countries': [{'name': 'France', 'population': 67413000}, {}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain'}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {'name': ''}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {'name': 'Japan'}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {'name': 'Japan', 'population': 125}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {'name': 'Japan', 'population': 125584}]}
{'countries': [{'name': 'France', 'population': 67413000}, {'name': 'Spain', 'population': 47351567}, {'name': 'Japan', 'population': 125584000}]}

Now, let's break streaming. We'll use the previous example and append an extraction function at the end that extracts the country names from the finalized JSON.

from langchain_core.output_parsers import (
    JsonOutputParser,
)

# A function that operates on finalized inputs
# rather than on an input_stream
def _extract_country_names(inputs):
    """A function that does not operates on input streams and breaks streaming."""
    if not isinstance(inputs, dict):
        return ""

    if "countries" not in inputs:
        return ""

    countries = inputs["countries"]

    if not isinstance(countries, list):
        return ""

    country_names = [
        country.get("name") for country in countries if isinstance(country, dict)
    ]
    return country_names

chain = model | JsonOutputParser() | _extract_country_names

async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`"
):
    print(text, end="|", flush=True)

API Reference: JsonOutputParser

['France', 'Spain', 'Japan']|

Generator Functions

Let's fix the streaming using a generator function that can operate on the input stream.

from langchain_core.output_parsers import JsonOutputParser

async def _extract_country_names_streaming(input_stream):
    """A function that operates on input streams."""
    country_names_so_far = set()

    async for input in input_stream:
        if not isinstance(input, dict):
            continue

        if "countries" not in input:
            continue

        countries = input["countries"]

        if not isinstance(countries, list):
            continue

        for country in countries:
            name = country.get("name")
            if not name:
                continue
            if name not in country_names_so_far:
                yield name
                country_names_so_far.add(name)

chain = model | JsonOutputParser() | _extract_country_names_streaming

async for text in chain.astream(
    "output a list of the countries france, spain and japan and their populations in JSON format. "
    'Use a dict with an outer key of "countries" which contains a list of countries. '
    "Each country should have the key `name` and `population`",
):
    print(text, end="|", flush=True)

API Reference: JsonOutputParser

France|Spain|Japan|

NOTE

Because the code above is relying on JSON auto-completion, you may see partial names of countries (e.g., Sp and Spain), which is not what one would want for an extraction result!

We're focusing on streaming concepts, not necessarily the results of the chains.
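
That said, one possible mitigation (assuming the model always emits each country's population after its name, as the prompt requests) is to yield a name only once its sibling population field has arrived, since by then the name string can no longer grow:

async def _extract_complete_country_names(input_stream):
    """Like the streaming extractor above, but waits for the sibling
    `population` key before yielding, so names are never partial."""
    seen = set()
    async for state in input_stream:
        if not isinstance(state, dict):
            continue
        countries = state.get("countries")
        if not isinstance(countries, list):
            continue
        for country in countries:
            if not isinstance(country, dict):
                continue
            name = country.get("name")
            # Once `population` exists, `name` must have been fully closed.
            if name and "population" in country and name not in seen:
                yield name
                seen.add(name)

chain = model | JsonOutputParser() | _extract_complete_country_names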

Non-streaming components

Some built-in components like Retrievers do not offer any streaming. What happens if we try to stream them? 🤨

from langchain_community.vectorstores import FAISS
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnablePassthrough
from langchain_openai import OpenAIEmbeddings

template = """Answer the question based only on the following context:
{context}

Question: {question}
"""
prompt = ChatPromptTemplate.from_template(template)

vectorstore = FAISS.from_texts(
    ["harrison worked at kensho", "harrison likes spicy food"],
    embedding=OpenAIEmbeddings(),
)
retriever = vectorstore.as_retriever()

chunks = [chunk for chunk in retriever.stream("where did harrison work?")]
chunks

API Reference: FAISS | StrOutputParser | ChatPromptTemplate | RunnablePassthrough | OpenAIEmbeddings

[[Document(page_content='harrison worked at kensho'),
  Document(page_content='harrison likes spicy food')]]

Stream just yielded the final result from that component.

This is OK 🥹! Not all components have to implement streaming -- in some cases streaming is either unnecessary, difficult or just doesn't make sense.

TIP

An LCEL chain constructed with non-streaming components will still be able to stream in many cases, with streaming of partial output starting after the last non-streaming step in the chain.

retrieval_chain = (
    {
        "context": retriever.with_config(run_name="Docs"),
        "question": RunnablePassthrough(),
    }
    | prompt
    | model
    | StrOutputParser()
)
for chunk in retrieval_chain.stream(
    "Where did harrison work? " "Write 3 made up sentences about this place."
):
    print(chunk, end="|", flush=True)
Base|d on| the| given| context|,| Harrison| worke|d at| K|ens|ho|.|

Here| are| |3| |made| up| sentences| about| this| place|:|

1|.| K|ens|ho| was| a| cutting|-|edge| technology| company| known| for| its| innovative| solutions| in| artificial| intelligence| an|d data| analytics|.|

2|.| The| modern| office| space| at| K|ens|ho| feature|d open| floor| plans|,| collaborative| work|sp|aces|,| an|d a| vib|rant| atmosphere| that| fos|tere|d creativity| an|d team|work|.|

3|.| With| its| prime| location| in| the| heart| of| the| city|,| K|ens|ho| attracte|d top| talent| from| aroun|d the| worl|d,| creating| a| diverse| an|d dynamic| work| environment|.|

Now that we've seen how stream and astream work, let's venture into the world of streaming events. 🏞️
