langchain v0.2文档(6)如何运行自定义函数(中英对照)


This guide assumes familiarity with basic LangChain concepts such as Runnables and chaining with the LangChain Expression Language (LCEL).

You can use arbitrary functions as Runnables. This is useful for formatting or when you need functionality not provided by other LangChain components. Custom functions used as Runnables are called RunnableLambdas.

您可以使用任意函数作为 Runnable。这对于格式化或当您需要其他 LangChain 组件未提供的功能时非常有用,并且用作 Runnable 的自定义函数称为 RunnableLambda。

Note that all inputs to these functions need to be a SINGLE argument. If you have a function that accepts multiple arguments, you should write a wrapper that accepts a single dict input and unpacks it into multiple arguments.

请注意,这些函数的所有输入都必须是单个参数。如果您有一个接受多个参数的函数,则应该编写一个包装器来接受单个字典输入并将其解包为多个参数。
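Independent of LangChain, the wrapper pattern is plain Python; the `concat` and `concat_wrapper` names below are hypothetical, used only for this sketch:

```python
# A hypothetical function that takes multiple arguments
def concat(a: str, b: str) -> str:
    return a + b

# Wrapper with the single-dict signature that RunnableLambda expects:
# it unpacks the dict into the underlying multi-argument call
def concat_wrapper(inputs: dict) -> str:
    return concat(inputs["a"], inputs["b"])

print(concat_wrapper({"a": "foo", "b": "bar"}))  # foobar
```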

This guide will cover:

  • How to explicitly create a runnable from a custom function using the RunnableLambda constructor and the convenience @chain decorator(如何使用 RunnableLambda 构造函数和便捷的 @chain 装饰器从自定义函数显式创建可运行对象)
  • Coercion of custom functions into runnables when used in chains (在链中使用时将自定义函数强制转换为可运行对象)
  • How to accept and use run metadata in your custom function (如何在自定义函数中接受和使用运行元数据)
  • How to stream with custom functions by having them return generators (如何通过让自定义函数返回生成器来进行流式传输)

Using the constructor

Below, we explicitly wrap our custom logic using the RunnableLambda constructor: (下面,我们使用 RunnableLambda 构造函数显式包装我们的自定义逻辑:)

%pip install -qU langchain langchain_openai

import os
from getpass import getpass

os.environ["OPENAI_API_KEY"] = getpass()
from operator import itemgetter

from langchain_core.prompts import ChatPromptTemplate
from langchain_core.runnables import RunnableLambda
from langchain_openai import ChatOpenAI

def length_function(text):
    return len(text)

def _multiple_length_function(text1, text2):
    return len(text1) * len(text2)

def multiple_length_function(_dict):
    return _multiple_length_function(_dict["text1"], _dict["text2"])

model = ChatOpenAI()

prompt = ChatPromptTemplate.from_template("what is {a} + {b}")

chain1 = prompt | model

chain = (
    {
        "a": itemgetter("foo") | RunnableLambda(length_function),
        "b": {"text1": itemgetter("foo"), "text2": itemgetter("bar")}
        | RunnableLambda(multiple_length_function),
    }
    | prompt
    | model
)

chain.invoke({"foo": "bar", "bar": "gah"})

API Reference: ChatPromptTemplate | RunnableLambda | ChatOpenAI

AIMessage(content='3 + 9 equals 12.', response_metadata={'token_usage': {'completion_tokens': 8, 'prompt_tokens': 14, 'total_tokens': 22}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': 'fp_c2295e73ad', 'finish_reason': 'stop', 'logprobs': None}, id='run-73728de3-e483-49e3-ad54-51bd9570e71a-0')

Let's define a custom function of our own to test this. Note that `say_good_morning` receives the plain string produced by `itemgetter("name")`, so its parameter is a string, not a dict, and `llm_say_good_morning` should return the result of `chain.invoke`:

def say_good_morning(name):
    print("========= log ===========")
    return "Good morning! " + name

def llm_say_good_morning(text):
    prompt = ChatPromptTemplate.from_template("It's 6 a.m. What should we say to {text}?")

    chain = (
        {"text": itemgetter("name") | RunnableLambda(say_good_morning)}
        | prompt
        | model
    )

    return chain.invoke({"name": text})

The convenience @chain decorator

You can also turn an arbitrary function into a chain by adding a @chain decorator. This is functionally equivalent to wrapping the function in a RunnableLambda constructor as shown above. Here's an example:

您还可以通过添加 @chain 装饰器将任意函数转换为链。这在功能上相当于将函数包装在 RunnableLambda 构造函数中,如上所示。这是一个例子:

from langchain_core.output_parsers import StrOutputParser
from langchain_core.runnables import chain

prompt1 = ChatPromptTemplate.from_template("Tell me a joke about {topic}")
prompt2 = ChatPromptTemplate.from_template("What is the subject of this joke: {joke}")

@chain
def custom_chain(text):
    prompt_val1 = prompt1.invoke({"topic": text})
    output1 = ChatOpenAI().invoke(prompt_val1)
    parsed_output1 = StrOutputParser().invoke(output1)
    chain2 = prompt2 | ChatOpenAI() | StrOutputParser()
    return chain2.invoke({"joke": parsed_output1})

custom_chain.invoke("bears")

API Reference: StrOutputParser | chain

'The subject of the joke is the bear and his girlfriend.'

Above, the @chain decorator is used to convert custom_chain into a runnable, which we invoke with the .invoke() method.

上面,@chain 装饰器用于将 custom_chain 转换为可运行对象,我们使用 .invoke() 方法调用它。

If you are using tracing with LangSmith, you should see a custom_chain trace there, with the calls to OpenAI nested underneath.

如果您使用 LangSmith 进行跟踪,您应该会在其中看到一个 custom_chain 跟踪,并且对 OpenAI 的调用嵌套在下面。

Automatic coercion in chains

When using custom functions in chains with the pipe operator (|), you can omit the RunnableLambda or @chain constructor and rely on coercion. Here's a simple example with a function that takes the output from the model and returns the first five letters of it:

当在链中使用带有管道运算符 (|) 的自定义函数时,您可以省略 RunnableLambda 或 @chain 构造函数并依赖强制转换。下面是一个简单的示例,其中有一个函数,它获取模型的输出并返回其前五个字母:

prompt = ChatPromptTemplate.from_template("tell me a story about {topic}")

model = ChatOpenAI()

chain_with_coerced_function = prompt | model | (lambda x: x.content[:5])

chain_with_coerced_function.invoke({"topic": "bears"})
'Once '

Note that we didn't need to wrap the custom function (lambda x: x.content[:5]) in a RunnableLambda constructor because the model on the left of the pipe operator is already a Runnable. The custom function is coerced into a runnable. See this section for more information.

请注意,我们不需要将自定义函数 (lambda x: x.content[:5]) 包装在 RunnableLambda 构造函数中,因为管道运算符左侧的模型已经是 Runnable。自定义函数被强制转换为 Runnable。有关更多信息,请参阅本节。
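The idea behind coercion can be sketched in plain Python. The `SketchRunnable` class below is only a rough illustration of the mechanism, not LangChain's actual implementation:

```python
class SketchRunnable:
    """A toy stand-in for a LangChain Runnable (illustrative only)."""

    def __init__(self, fn):
        self.fn = fn

    def invoke(self, x):
        return self.fn(x)

    def __or__(self, other):
        # Coerce a plain callable on the right-hand side into a runnable
        if callable(other) and not isinstance(other, SketchRunnable):
            other = SketchRunnable(other)
        # The composed runnable pipes self's output into other
        return SketchRunnable(lambda x: other.invoke(self.invoke(x)))

# The lambda on the right is coerced because the left side is a runnable
chain = SketchRunnable(str.upper) | (lambda s: s[:5])
print(chain.invoke("once upon a time"))  # 'ONCE ' (first five characters)
```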

Passing run metadata

Runnable lambdas can optionally accept a RunnableConfig parameter, which they can use to pass callbacks, tags, and other configuration information to nested runs.

Runnable lambda 可以选择接受 RunnableConfig 参数,它们可以使用该参数将回调、标签和其他配置信息传递给嵌套运行。

import json

from langchain_core.runnables import RunnableConfig

def parse_or_fix(text: str, config: RunnableConfig):
    fixing_chain = (
        ChatPromptTemplate.from_template(
            "Fix the following text:\n\n```text\n{input}\n```\nError: {error}"
            " Don't narrate, just respond with the fixed data."
        )
        | model
        | StrOutputParser()
    )
    for _ in range(3):
        try:
            return json.loads(text)
        except Exception as e:
            text = fixing_chain.invoke({"input": text, "error": e}, config)
    return "Failed to parse"

from langchain_community.callbacks import get_openai_callback

with get_openai_callback() as cb:
    output = RunnableLambda(parse_or_fix).invoke(
        "{foo: bar}", {"tags": ["my-tag"], "callbacks": [cb]}
    )
    print(output)
    print(cb)

API Reference: RunnableConfig | get_openai_callback

{'foo': 'bar'}
Tokens Used: 62
    Prompt Tokens: 56
    Completion Tokens: 6
Successful Requests: 1
Total Cost (USD): $9.6e-05

Streaming

RunnableLambda is best suited for code that does not need to support streaming. If you need to support streaming (i.e., be able to operate on chunks of inputs and yield chunks of outputs), use RunnableGenerator instead, as in the example below.

RunnableLambda 最适合不需要支持流式传输的代码。如果您需要支持流式传输(即能够对输入块进行操作并产生输出块),请使用 RunnableGenerator,如下例所示。

You can use generator functions (i.e. functions that use the yield keyword and behave like iterators) in a chain.

您可以在链中使用生成器函数(即使用 yield 关键字且行为类似于迭代器的函数)。

The signature of these generators should be Iterator[Input] -> Iterator[Output], or for async generators: AsyncIterator[Input] -> AsyncIterator[Output].

这些生成器的签名应该是 Iterator[Input] -> Iterator[Output];对于异步生成器则是 AsyncIterator[Input] -> AsyncIterator[Output]。
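A minimal, model-free illustration of such a generator signature (the `upper_chunks` transform is hypothetical, for this sketch only):

```python
from typing import Iterator

def upper_chunks(chunks: Iterator[str]) -> Iterator[str]:
    # Operates on each streamed chunk as it arrives, preserving streaming
    for chunk in chunks:
        yield chunk.upper()

print(list(upper_chunks(iter(["he", "llo"]))))  # ['HE', 'LLO']
```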

These are useful for:

  • implementing a custom output parser 实现自定义输出解析器
  • modifying the output of a previous step, while preserving streaming capabilities 修改上一步的输出,同时保留流式传输功能

Here's an example of a custom output parser for comma-separated lists. First, we create a chain that generates such a list as text: 以下是逗号分隔列表的自定义输出解析器的示例。首先,我们创建一个以文本形式生成此类列表的链:

from typing import Iterator, List

prompt = ChatPromptTemplate.from_template(
    "Write a comma-separated list of 5 animals similar to: {animal}. Do not include numbers"
)

str_chain = prompt | model | StrOutputParser()

for chunk in str_chain.stream({"animal": "bear"}):
    print(chunk, end="", flush=True)
lion, tiger, wolf, gorilla, panda

Next, we define a custom function that will aggregate the currently streamed output and yield it when the model generates the next comma in the list: 接下来,我们定义一个自定义函数,它将聚合当前流式传输的输出,并在模型生成列表中的下一个逗号时产生它:

# This is a custom parser that splits an iterator of llm tokens
# into a list of strings separated by commas
def split_into_list(input: Iterator[str]) -> Iterator[List[str]]:
    # hold partial input until we get a comma
    buffer = ""
    for chunk in input:
        # add current chunk to buffer
        buffer += chunk
        # while there are commas in the buffer
        while "," in buffer:
            # split buffer on comma
            comma_index = buffer.index(",")
            # yield everything before the comma
            yield [buffer[:comma_index].strip()]
            # save the rest for the next iteration
            buffer = buffer[comma_index + 1 :]
    # yield the last chunk
    yield [buffer.strip()]

list_chain = str_chain | split_into_list

for chunk in list_chain.stream({"animal": "bear"}):
    print(chunk, flush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['raccoon']
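The parser can also be exercised in isolation by feeding it a plain iterator of string chunks instead of model output; the definition is repeated here so the sketch is self-contained:

```python
from typing import Iterator, List

def split_into_list(input: Iterator[str]) -> Iterator[List[str]]:
    buffer = ""
    for chunk in input:
        buffer += chunk
        while "," in buffer:
            comma_index = buffer.index(",")
            yield [buffer[:comma_index].strip()]
            buffer = buffer[comma_index + 1:]
    yield [buffer.strip()]

# Simulated token stream: chunk boundaries need not align with commas
chunks = iter(["li", "on, ti", "ger, wolf"])
print(list(split_into_list(chunks)))  # [['lion'], ['tiger'], ['wolf']]
```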

Invoking it gives a full array of values: 调用它会给出一个完整的值数组:

list_chain.invoke({"animal": "bear"})
['lion', 'tiger', 'wolf', 'gorilla', 'raccoon']

Async version

If you are working in an async environment, here is an async version of the above example: 如果您在异步环境中工作,以下是上述示例的异步版本:

from typing import AsyncIterator

async def asplit_into_list(
    input: AsyncIterator[str],
) -> AsyncIterator[List[str]]:  # async def
    buffer = ""
    async for (
        chunk
    ) in input:  # `input` is a `async_generator` object, so use `async for`
        buffer += chunk
        while "," in buffer:
            comma_index = buffer.index(",")
            yield [buffer[:comma_index].strip()]
            buffer = buffer[comma_index + 1 :]
    yield [buffer.strip()]

list_chain = str_chain | asplit_into_list

async for chunk in list_chain.astream({"animal": "bear"}):
    print(chunk, flush=True)
['lion']
['tiger']
['wolf']
['gorilla']
['panda']
await list_chain.ainvoke({"animal": "bear"})

['lion', 'tiger', 'wolf', 'gorilla', 'panda']
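Likewise, the async parser can be driven without a model by pairing it with a hand-written async generator (the `fake_stream` helper is hypothetical, used only for this sketch):

```python
import asyncio
from typing import AsyncIterator, List

async def asplit_into_list(input: AsyncIterator[str]) -> AsyncIterator[List[str]]:
    buffer = ""
    async for chunk in input:
        buffer += chunk
        while "," in buffer:
            comma_index = buffer.index(",")
            yield [buffer[:comma_index].strip()]
            buffer = buffer[comma_index + 1:]
    yield [buffer.strip()]

async def fake_stream():
    # Simulated async token stream
    for chunk in ["li", "on, ti", "ger"]:
        yield chunk

async def main():
    return [item async for item in asplit_into_list(fake_stream())]

print(asyncio.run(main()))  # [['lion'], ['tiger']]
```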