langchain (LCEL) v0.2 docs (12): How to add fallbacks to a runnable


How to add fallbacks to a runnable

When working with language models, you may often encounter issues from the underlying APIs, whether rate limiting or downtime. Therefore, as you move your LLM applications into production, it becomes more and more important to safeguard against these failures. That's why we've introduced the concept of fallbacks.


A fallback is an alternative plan that may be used in an emergency.


Crucially, fallbacks can be applied not only at the LLM level but at the level of the whole runnable. This is important because different models often require different prompts. So if your call to OpenAI fails, you don't just want to send the same prompt to Anthropic - you probably want to use a different prompt template and send a different version there.


Fallback for LLM API Errors

This is perhaps the most common use case for fallbacks. A request to an LLM API can fail for a variety of reasons - the API could be down, you could have hit rate limits, and so on. Using fallbacks can help protect against these kinds of failures.


IMPORTANT: By default, many of the LLM wrappers catch errors and retry. You will most likely want to turn those retries off when working with fallbacks; otherwise the first wrapper will keep retrying instead of failing.


%pip install --upgrade --quiet langchain langchain-openai langchain-anthropic
from langchain_anthropic import ChatAnthropic
from langchain_openai import ChatOpenAI

API Reference: ChatAnthropic | ChatOpenAI

First, let's mock out what happens if we hit a RateLimitError from OpenAI.

from unittest.mock import patch

import httpx
from openai import RateLimitError

request = httpx.Request("GET", "/")
response = httpx.Response(200, request=request)
error = RateLimitError("rate limit", response=response, body="")
# Note that we set max_retries = 0 to avoid retrying on RateLimits, etc
openai_llm = ChatOpenAI(model="gpt-3.5-turbo-0125", max_retries=0)
anthropic_llm = ChatAnthropic(model="claude-3-haiku-20240307")
llm = openai_llm.with_fallbacks([anthropic_llm])
# Let's use just the OpenAI LLM first, to show that we run into an error
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(openai_llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")
Hit error
# Now let's try with fallbacks to Anthropic
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(llm.invoke("Why did the chicken cross the road?"))
    except RateLimitError:
        print("Hit error")
content=' I don\'t actually know why the chicken crossed the road, but here are some possible humorous answers:\n\n- To get to the other side!\n\n- It was too chicken to just stand there. \n\n- It wanted a change of scenery.\n\n- It wanted to show the possum it could be done.\n\n- It was on its way to a poultry farmers\' convention.\n\nThe joke plays on the double meaning of "the other side" - literally crossing the road to the other side, or the "other side" meaning the afterlife. So it\'s an anti-joke, with a silly or unexpected pun as the answer.' additional_kwargs={} example=False
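By default, with_fallbacks triggers on any exception. If you only want to fall back on specific errors, recent langchain-core releases accept an exceptions_to_handle keyword argument; the sketch below assumes that argument is available in your installed version.

# Only fall back to Anthropic on rate-limit errors; any other
# exception propagates to the caller unchanged.
llm_rate_limited_only = openai_llm.with_fallbacks(
    [anthropic_llm],
    exceptions_to_handle=(RateLimitError,),
)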

We can use our "LLM with Fallbacks" as we would a normal LLM.

from langchain_core.prompts import ChatPromptTemplate

prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
chain = prompt | llm
with patch("openai.resources.chat.completions.Completions.create", side_effect=error):
    try:
        print(chain.invoke({"animal": "kangaroo"}))
    except RateLimitError:
        print("Hit error")

API Reference: ChatPromptTemplate

content=" I don't actually know why the kangaroo crossed the road, but I can take a guess! Here are some possible reasons:\n\n- To get to the other side (the classic joke answer!)\n\n- It was trying to find some food or water \n\n- It was trying to find a mate during mating season\n\n- It was fleeing from a predator or perceived threat\n\n- It was disoriented and crossed accidentally \n\n- It was following a herd of other kangaroos who were crossing\n\n- It wanted a change of scenery or environment \n\n- It was trying to reach a new habitat or territory\n\nThe real reason is unknown without more context, but hopefully one of those potential explanations does the joke justice! Let me know if you have any other animal jokes I can try to decipher." additional_kwargs={} example=False

Fallback for Sequences

We can also create fallbacks for sequences, where the fallbacks are sequences themselves. Here we do that with two different models: ChatOpenAI and then the standard OpenAI completion model (which is not a chat model). Because OpenAI is not a chat model, you likely want a different prompt.


# First let's create a chain with a ChatModel
# We add in a string output parser here so the outputs between the two are the same type
from langchain_core.output_parsers import StrOutputParser

chat_prompt = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You're a nice assistant who always includes a compliment in your response",
        ),
        ("human", "Why did the {animal} cross the road"),
    ]
)
# Here we're going to use a bad model name to easily create a chain that will error
chat_model = ChatOpenAI(model="gpt-fake")
bad_chain = chat_prompt | chat_model | StrOutputParser()

API Reference: StrOutputParser

# Now let's create a chain with the normal OpenAI model
from langchain_core.prompts import PromptTemplate
from langchain_openai import OpenAI

prompt_template = """Instructions: You should always include a compliment in your response.

Question: Why did the {animal} cross the road?"""
prompt = PromptTemplate.from_template(prompt_template)
llm = OpenAI()
good_chain = prompt | llm

API Reference: PromptTemplate | OpenAI

# We can now create a final chain which combines the two
chain = bad_chain.with_fallbacks([good_chain])
chain.invoke({"animal": "turtle"})
'\n\nAnswer: The turtle crossed the road to get to the other side, and I have to say he had some impressive determination.'
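The fallbacks argument is a list, and the alternatives are tried in order until one succeeds. A minimal sketch of chaining several alternatives (gpt-fake-2 is a second intentionally broken model name, used here only for illustration):

# Fallbacks run in order: second_bad_chain errors too, so the final
# good_chain ends up handling the request.
second_bad_chain = chat_prompt | ChatOpenAI(model="gpt-fake-2") | StrOutputParser()
chain = bad_chain.with_fallbacks([second_bad_chain, good_chain])
chain.invoke({"animal": "turtle"})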

Fallback for Long Inputs

One of the big limiting factors of LLMs is their context window. Usually, you can count and track the length of prompts before sending them to an LLM, but in situations where that is hard or complicated, you can fall back to a model with a longer context window.
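When counting is feasible, LangChain models expose a token-counting helper; a minimal sketch, assuming get_num_tokens behaves as in current langchain-core releases:

# Check the prompt length up front if you prefer explicit routing
# over automatic fallbacks.
n_tokens = ChatOpenAI().get_num_tokens("What is the next number: one, two, ...")
print(n_tokens)

When counting is hard, the fallback approach below handles it automatically.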


short_llm = ChatOpenAI()
long_llm = ChatOpenAI(model="gpt-3.5-turbo-16k")
llm = short_llm.with_fallbacks([long_llm])
inputs = "What is the next number: " + ", ".join(["one", "two"] * 3000)
try:
    print(short_llm.invoke(inputs))
except Exception as e:
    print(e)
This model's maximum context length is 4097 tokens. However, your messages resulted in 12012 tokens. Please reduce the length of the messages.
try:
    print(llm.invoke(inputs))
except Exception as e:
    print(e)
content='The next number in the sequence is two.' additional_kwargs={} example=False

Fallback to Better Model

Oftentimes we ask models to output in a specific format (like JSON). Models like GPT-3.5 can do this okay, but sometimes struggle. This naturally points to fallbacks - we can first try GPT-3.5 (faster, cheaper), and if parsing fails, fall back to GPT-4.


from langchain.output_parsers import DatetimeOutputParser

API Reference: DatetimeOutputParser

prompt = ChatPromptTemplate.from_template(
    "what time was {event} (in %Y-%m-%dT%H:%M:%S.%fZ format - only return this value)"
)
# In this case we are going to do the fallbacks on the LLM + output parser level
# Because the error will get raised in the OutputParser
openai_35 = ChatOpenAI() | DatetimeOutputParser()
openai_4 = ChatOpenAI(model="gpt-4") | DatetimeOutputParser()
only_35 = prompt | openai_35
fallback_4 = prompt | openai_35.with_fallbacks([openai_4])
try:
    print(only_35.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")
Error: Could not parse datetime string: The Super Bowl in 1994 took place on January 30th at 3:30 PM local time. Converting this to the specified format (%Y-%m-%dT%H:%M:%S.%fZ) results in: 1994-01-30T15:30:00.000Z
try:
    print(fallback_4.invoke({"event": "the superbowl in 1994"}))
except Exception as e:
    print(f"Error: {e}")

1994-01-30 15:30:00