LangChain: A Framework Built for LLM-Based Applications (Agents)


Modern Applications: Agentic Applications

Applications built on LLMs are now commonly called Agents, or more precisely, Agentic Applications. The LLM is the brain, capable of making decisions. By giving our application such a brain, it gains the ability to decide for itself: through natural-language conversation with the user, it can get things done. This is what we now call an Agent.

The Problem LangChain Solves

Building an LLM-based application by hand means writing code for multiple rounds of prompt interaction with the LLM, parsing the LLM's output, and handling context, all of which requires a large amount of glue code.

To eliminate this repetitive work and encourage well-structured development, LangChain emerged. It makes it easy to build an LLM application quickly and provides shared conventions for the LLM development ecosystem.

The Application's Dialogue with the Model: Prompts and Response Parsing


Prompts

LangChain turns prompt string templates into reusable prompt-template objects, in the programming sense.

The Raw Approach

With hand-written code, you define a prompt template and then fill it with variables:

# Variable
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse,\
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

# Variable
style = """American English \
in a calm and respectful tone"""

# Assemble the final prompt from the template and the variables
prompt = f"""Translate the text \
that is delimited by triple backticks \
into a style that is {style}.
text: ```{customer_email}```
"""

The final prompt sent to the LLM is:

Translate the text that is delimited by triple backticks into a style that is
American English in a calm and respectful tone . 
text: ``` Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie!
And to make matters worse,the warranty don't cover the cost of cleaning up me kitchen. 
I need yer help right now, matey! ```

The LangChain Way

Turn the prompt template into an object.

In this prompt string template, the variables to be defined are marked with {} placeholders, instead of using an f-string (formatted string literal) as before.

# Prompt template
template_string = """Translate the text \
that is delimited by triple backticks \
into a style that is {style}. \
text: ```{text}```
"""

# Turn the prompt template into an object

from langchain.prompts import ChatPromptTemplate

prompt_template = ChatPromptTemplate.from_template(template_string)

The prompt template is now a reusable object. Note that the variables have become the template's input variables, which fits programming habits nicely and makes for a good developer experience.


prompt_template.messages[0].prompt

"""Output
PromptTemplate(
    input_variables=['style', 'text'], 
    output_parser=None, 
    partial_variables={}, 
    template='Translate the text that is delimited by triple backticks into a style that is {style}. text: ```{text}```\n', 
    template_format='f-string', 
    validate_template=True
)
"""

Fill in the variables to form a complete prompt for the LLM:

# Variable
customer_style = """American English \
in a calm and respectful tone
"""
# Variable
customer_email = """
Arrr, I be fuming that me blender lid \
flew off and splattered me kitchen walls \
with smoothie! And to make matters worse, \
the warranty don't cover the cost of \
cleaning up me kitchen. I need yer help \
right now, matey!
"""

# Assemble the final prompt
customer_messages = prompt_template.format_messages(
                    style=customer_style,
                    text=customer_email)
                  
# Send
from langchain.chat_models import ChatOpenAI
chat = ChatOpenAI()
customer_response = chat(customer_messages)

The prompt that gets sent:

print(type(customer_messages[0]))
# <class 'langchain.schema.HumanMessage'>

print(customer_messages[0])
"""Output
content="Translate the text that is delimited by triple backticks into a style that is American English in a calm and respectful tone.
text: ```Arrr, I be fuming that me blender lid flew off and splattered me kitchen walls with smoothie! And to make matters worse, the warranty don't cover the cost of cleaning up me kitchen. I need yer help right now, matey!```"
"""

Recap

Instead of assembling the final prompt by string concatenation, LangChain abstracts it into a prompt-template object.



Response Parsing

When building a complex LLM application, we almost always instruct the LLM to output in a specific format.

ReAct (Reasoning/Thought, Action, Observation) is a chained reasoning loop. As a side note, tool calling happens inside this loop: the LLM autonomously decides which tool to call, the tool call is executed, the result is merged into the context, and the next round of reasoning begins, repeating until the task is done and the loop exits.
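The loop described above can be sketched as follows. Here `stub_llm` and the `calculator` tool are stand-ins for a real LLM and real tools; the names and the `Action: tool[input]` convention are illustrative, not a LangChain API.

```python
# A minimal sketch of the ReAct loop: think, act, observe, repeat.

def stub_llm(context: str) -> str:
    """Pretend LLM: asks for the calculator once, then answers."""
    if "Observation:" not in context:
        return "Thought: I need to compute 2+3\nAction: calculator[2+3]"
    return "Final Answer: 5"

TOOLS = {"calculator": lambda expr: str(eval(expr))}  # toy tool for the sketch

def react_loop(question: str, max_steps: int = 5) -> str:
    context = f"Question: {question}"
    for _ in range(max_steps):
        reply = stub_llm(context)
        if reply.startswith("Final Answer:"):
            return reply.removeprefix("Final Answer:").strip()
        # Parse "Action: tool[input]" and execute the chosen tool
        action = reply.split("Action:")[1].strip()
        tool_name, tool_input = action.split("[", 1)
        observation = TOOLS[tool_name](tool_input.rstrip("]"))
        # Merge the tool result into the context for the next round
        context += f"\n{reply}\nObservation: {observation}"
    return "gave up"

print(react_loop("What is 2+3?"))  # 5
```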

The LLM's reply is a string, and we need to parse it into the attributes of programmatic objects so that the result can be processed in code. We have the LLM reply with a JSON string; in Python we turn it into a dict, in Java into the fields of a POJO:

{
  "gift": false,
  "delivery_days": 5,
  "price_value": "pretty affordable!"
}
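In Python, the conversion is one call to the standard library. A minimal sketch:

```python
import json

# Turn the JSON string replied by the LLM into a Python dict
reply = '{"gift": false, "delivery_days": 5, "price_value": "pretty affordable!"}'
data = json.loads(reply)
print(type(data))             # <class 'dict'>
print(data["delivery_days"])  # 5
```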

The String the LLM Replies With

Build the prompt using the LangChain prompt-template approach above, and send it to the LLM:

# Variable
customer_review = """  
This leaf blower is pretty amazing. It has four settings:  
candle blower, gentle breeze, windy city, and tornado.  
It arrived in two days, just in time for my wife's  
anniversary present.  
I think my wife liked it so much she was speechless.  
So far I've been the only one using it, and I've been  
using it every other morning to clear the leaves on our lawn.  
It's slightly more expensive than the other leaf blowers  
out there, but I think it's worth it for the extra features.  
"""

# Prompt string template
review_template = """  
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else?  
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product  
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price,  
and output them as a comma separated Python list.

Format the output as JSON with the following keys:  
gift  
delivery_days  
price_value

text: {text}  
"""

prompt_template = ChatPromptTemplate.from_template(review_template)

messages = prompt_template.format_messages(text=customer_review)
chat = ChatOpenAI(temperature=0.0, model=llm_model)
response = chat(messages)
print(response.content)

The result is a JSON-shaped string, but response.content is a plain str, so trying to access a field directly with response.content.get('gift') raises an error ❌️

{  
"gift": true,  
"delivery_days": 2,  
"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]  
}

From a Format-Constrained String to a Native Type

The format-constrained string is usually JSON, returned by the LLM, which LangChain turns into a Python dict.

Define the response schema and build an output parser from it. The parser can produce format instructions, which we add to the prompt to guide the LLM to generate the specified format.

from langchain.output_parsers import ResponseSchema
from langchain.output_parsers import StructuredOutputParser

# Define the schema for each response field
gift_schema = ResponseSchema(name="gift",
                             description="Was the item purchased\
                             as a gift for someone else? \
                             Answer True if yes,\
                             False if not or unknown.")
delivery_days_schema = ResponseSchema(name="delivery_days",
                                      description="How many days\
                                      did it take for the product\
                                      to arrive? If this \
                                      information is not found,\
                                      output -1.")
price_value_schema = ResponseSchema(name="price_value",
                                    description="Extract any\
                                    sentences about the value or \
                                    price, and output them as a \
                                    comma separated Python list.")
                                    
# Build the output parser
response_schemas = [gift_schema, 
                    delivery_days_schema,
                    price_value_schema]
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)

The parser produces format instructions, which we add to the prompt.

format_instructions = output_parser.get_format_instructions()
print(format_instructions)

"""Generated format instructions
The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
{
	"gift": "string",  // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
	"delivery_days": "string",  // How many days did it take for the product to arrive? If this information is not found, output -1.
	"price_value": "string"  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
"""

The complete prompt finally passed to the LLM:

# Prompt template
review_template_2 = """\
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? \
Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product \
to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, \
and output them as a comma separated Python list.

text: {text}

{format_instructions}
"""

# Prompt template object
prompt = ChatPromptTemplate.from_template(template=review_template_2)
# The complete prompt for the LLM
messages = prompt.format_messages(text=customer_review, 
                                format_instructions=format_instructions)

print(messages[0].content)

"""Complete prompt
For the following text, extract the following information:

gift: Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.

delivery_days: How many days did it take for the product to arrive? If this information is not found, output -1.

price_value: Extract any sentences about the value or price, and output them as a comma separated Python list.

text: This leaf blower is pretty amazing. It has four settings:candle blower, gentle breeze, windy city, and tornado. It arrived in two days, just in time for my wife's anniversary present. I think my wife liked it so much she was speechless. So far I've been the only one using it, and I've been using it every other morning to clear the leaves on our lawn. It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features.

The output should be a markdown code snippet formatted in the following schema, including the leading and trailing "```json" and "```":
{
	"gift": "string",  // Was the item purchased as a gift for someone else? Answer True if yes, False if not or unknown.
	"delivery_days": "string",  // How many days did it take for the product to arrive? If this information is not found, output -1.
	"price_value": "string"  // Extract any sentences about the value or price, and output them as a comma separated Python list.
}
"""

The LLM still replies with a string, but one that follows the format instructions in our prompt. With this fixed format in place, the parser can handle it.

response = chat(messages)
print(response.content)

"""LLM response
```json
{
	"gift": true,
	"delivery_days": 2,
	"price_value": ["It's slightly more expensive than the other leaf blowers out there, but I think it's worth it for the extra features."]
}
```
"""

Call the parser to turn the response into a Python dict:

output_dict = output_parser.parse(response.content)
print(type(output_dict))  # dict

print(output_dict.get('delivery_days')) # 2
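Conceptually, what the parser does here boils down to stripping the markdown fence and parsing the JSON body. A minimal stdlib sketch of the same idea (not LangChain's actual implementation):

```python
import json
import re

def parse_fenced_json(text: str) -> dict:
    """Extract and parse the JSON body of a ```json ... ``` snippet."""
    match = re.search(r"```(?:json)?\s*(\{.*\})\s*```", text, re.DOTALL)
    return json.loads(match.group(1))

reply = '```json\n{"gift": true, "delivery_days": 2}\n```'
result = parse_fenced_json(reply)
print(result["delivery_days"])  # 2
```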

Recap

Prompt templates are usually used in combination with output parsers.


Agent Memory

Large language models are "stateless".

  1. From the LLM's perspective: every interaction between the Agent and the LLM is independent. Each API call or inference request is a standalone computation that retains no internal state from previous calls.
  2. From the Agent's perspective: within one conversation the Agent appears to have memory, which is achieved by supplying the conversation history as the LLM's Context.

Memory: this is one of the Agent's core responsibilities. The Agent maintains a message list (the conversation history) and, on every request, inserts that list (in full, or truncated/summarized) into the context of the new LLM request.
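The mechanism above can be sketched in a few lines. The stub below keeps a message list and replays it as context on each turn; the names are illustrative, not a LangChain API.

```python
# Minimal memory sketch: the "agent" maintains a history list and
# rebuilds the full context for every new request.

history: list[tuple[str, str]] = []

def build_context(user_input: str) -> str:
    lines = [f"{role}: {text}" for role, text in history]
    lines.append(f"Human: {user_input}")
    lines.append("AI:")
    return "\n".join(lines)

def chat_turn(user_input: str, llm_reply: str) -> str:
    context = build_context(user_input)   # full history goes into the request
    history.append(("Human", user_input))
    history.append(("AI", llm_reply))
    return context

chat_turn("Hi, my name is Pkmer", "Hello Pkmer!")
second = chat_turn("What is my name?", "Your name is Pkmer.")
print(second)
# Human: Hi, my name is Pkmer
# AI: Hello Pkmer!
# Human: What is my name?
# AI:
```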

ConversationBufferMemory

Giving the agent memory means, in other words, adding context to what is sent to the LLM.

from langchain.chat_models import ChatOpenAI
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory

llm = ChatOpenAI(temperature=0.0, model=llm_model)
# Create the memory container
memory = ConversationBufferMemory()
conversation = ConversationChain(
    llm=llm, 
    memory = memory, # 🚀 set the memory here
    verbose=True  # 🚀 print the context sent to the LLM at runtime
)

In the agent's interaction with the LLM, you can see the conversation content continually appended to the context:


# --------------------- ☝ First turn ---------------------
# Interact with the LLM
conversation.predict(input="Hi, my name is Pkmer")
# The LLM's reply
"Hello Pkmer! It's nice to meet you. How can I assist you today?"

"""Log output (☝ first turn)
> Entering new ConversationChain chain...  
> Prompt after formatting:  
> The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:

Human: Hi, my name is Pkmer  
AI:

> Finished chain.
"""
# --------------------- 🥈 Second turn ---------------------
conversation.predict(input="What is 1+1?")
# LLM reply
1 + 1 equals 2. Is there anything else you would like to know?

"""Log output (🥈 second turn)
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:  
Human: Hi, my name is Pkmer  
AI: Hello Pkmer! It's nice to meet you. How can I assist you today?  
Human: What is 1+1?  
AI:
"""

# --------------------- 🥉 Third turn ---------------------

conversation.predict(input="What is my name?")
# LLM reply
Your name is Pkmer. Is there anything else you would like to know or discuss?

"""Log output (🥉 third turn)
The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
Human: Hi, my name is Pkmer
AI: Hello Pkmer! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1 + 1 equals 2. Is there anything else you would like to know?
Human: What is my name?
AI:
"""

Inspect the current memory:

# View the current conversation content
print(memory.buffer)

Human: Hi, my name is Pkmer
AI: Hello Pkmer! It's nice to meet you. How can I assist you today?
Human: What is 1+1?
AI: 1 + 1 equals 2. Is there anything else you would like to know?
Human: What is my name?
AI: Your name is Pkmer. Is there anything else you would like to know or discuss?
# As a dict
memory.load_memory_variables({})

{'history': "Human: Hi, my name is Pkmer\nAI: Hello Pkmer! It's nice to meet you. How can I assist you today?\nHuman: What is 1+1?\nAI: 1 + 1 equals 2. Is there anything else you would like to know?\nHuman: What is my name?\nAI: Your name is Pkmer. Is there anything else you would like to know or discuss?"}

Built-in Memory Types

An LLM's Context Window is finite. When the accumulated conversation history exceeds that limit, the program errors out, so LangChain also ships several other built-in implementations:

  • ConversationBufferMemory: stores all messages and extracts them into a single variable.
  • ConversationBufferWindowMemory: keeps the list of interactions in the conversation, but only uses the most recent K of them.
  • ConversationTokenBufferMemory: keeps a buffer of recent interactions in memory, using token length (rather than interaction count) to decide when to flush old ones.
  • ConversationSummaryMemory: generates a summary of the conversation over time.

Additional Memory Types

  • Vector Data Memory (vector store)
  • ConversationEntityMemory (entity memory)
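The windowing idea behind ConversationBufferWindowMemory can be sketched as follows; this is a re-implementation of the concept for illustration, not LangChain's code.

```python
# Minimal window-memory sketch: only the last k exchanges are kept,
# so anything older silently drops out of the context.

class WindowMemory:
    def __init__(self, k: int):
        self.k = k
        self.exchanges: list[tuple[str, str]] = []  # (human, ai) pairs

    def save(self, human: str, ai: str) -> None:
        self.exchanges.append((human, ai))
        self.exchanges = self.exchanges[-self.k:]  # drop everything older

    def buffer(self) -> str:
        return "\n".join(f"Human: {h}\nAI: {a}" for h, a in self.exchanges)

m = WindowMemory(k=1)
m.save("Hi, my name is Pkmer", "Hello Pkmer!")
m.save("What is 1+1?", "1 + 1 equals 2.")
print(m.buffer())
# Human: What is 1+1?
# AI: 1 + 1 equals 2.
```

With k=1 the first exchange is forgotten, so a later "What is my name?" could no longer be answered from context.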

Chains

Chains are a way of orchestrating interactions with the LLM.


The Basic LLMChain

The basic unit for building chains, and the foundation for building more complex chains.

from langchain.chains import LLMChain

prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe \
    a company that makes {product}?"
)
chain = LLMChain(llm=llm, prompt=prompt)
product = "Queen Size Sheet Set"
chain.run(product)
# LLM output: Regal Linens Co.

Sequential Chains

A sequential chain is another type of Chain. The core idea is to combine multiple chains so that the output of one chain becomes the input of the next.

There are two kinds of sequential chains:

  1. SimpleSequentialChain: single input / single output
  2. SequentialChain: multiple inputs / multiple outputs
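The "output of one chain feeds the next" idea can be sketched with plain functions standing in for chains (illustrative stubs, not LangChain code):

```python
# Minimal sequential-chain sketch: each step consumes the previous
# step's output, exactly the SimpleSequentialChain contract.

def name_company(product: str) -> str:          # stands in for chain 1
    return f"Best name for a {product} maker"

def describe_company(company_name: str) -> str:  # stands in for chain 2
    return f"Description of: {company_name}"

def simple_sequential(*steps):
    def run(x):
        for step in steps:   # pipe the value through each step in order
            x = step(x)
        return x
    return run

pipeline = simple_sequential(name_company, describe_company)
print(pipeline("Queen Size Sheet Set"))
# Description of: Best name for a Queen Size Sheet Set maker
```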

SimpleSequentialChain

Only suitable for the single input / single output case, because the prompt template contains exactly one variable.


from langchain.chains import SimpleSequentialChain

# prompt template 1
first_prompt = ChatPromptTemplate.from_template(
    "What is the best name to describe "
    "a company that makes {product}?"
)

# Chain 1
chain_one = LLMChain(llm=llm, prompt=first_prompt)
#-----------------------------------------------------------------

# prompt template 2
second_prompt = ChatPromptTemplate.from_template(
    "Write a 20 words description for the following "
    "company: {company_name}"
)

# chain 2
chain_two = LLMChain(llm=llm, prompt=second_prompt)

# Combine them
overall_simple_chain = SimpleSequentialChain(
    chains=[chain_one, chain_two],
    verbose=True
)
overall_simple_chain.run(product)

Output

# Intermediate step
Regal Linens Co.  
Regal Linens Co. offers luxurious and high-quality linens for bedrooms and bathrooms, providing elegance and comfort for your home.

# Final result returned to the program
Regal Linens Co. offers luxurious and high-quality linens for bedrooms and bathrooms, providing elegance and comfort for your home.

SequentialChain

SequentialChain allows multiple inputs and multiple outputs, but note: each chain's input variable name must match the output_key of the chain whose output it consumes.

The chain below is wired with the following inputs and outputs:

from langchain.chains import SequentialChain

# prompt template 1: translate to english
first_prompt = ChatPromptTemplate.from_template(
    "Translate the following review to english:"
    "\n\n{Review}"
)

# chain 1: input= Review and output= English_Review
chain_one = LLMChain(llm=llm, prompt=first_prompt,
                     output_key="English_Review")

second_prompt = ChatPromptTemplate.from_template(
    "Can you summarize the following review in 1 sentence:"
    "\n\n{English_Review}"
)

# chain 2: input= English_Review and output= summary
chain_two = LLMChain(llm=llm, prompt=second_prompt,
                     output_key="summary")

# prompt template 3: detect the review's language
third_prompt = ChatPromptTemplate.from_template(
    "What language is the following review:\n\n{Review}"
)

# chain 3: input= Review and output= language
chain_three = LLMChain(llm=llm, prompt=third_prompt,
                       output_key="language")

# prompt template 4: follow up message
fourth_prompt = ChatPromptTemplate.from_template(
    "Write a follow up response to the following "
    "summary in the specified language:"
    "\n\nSummary: {summary}\n\nLanguage: {language}"
)

# chain 4: input= summary, language and output= followup_message
chain_four = LLMChain(llm=llm, prompt=fourth_prompt,
                      output_key="followup_message")

# overall_chain: input= Review
# and output= English_Review, summary, language, followup_message
overall_chain = SequentialChain(
    chains=[chain_one, chain_two, chain_three, chain_four],
    input_variables=["Review"],
    output_variables=["English_Review", "summary", "language", "followup_message"],
    verbose=False
)

Router Chains


Routes between multiple different prompt templates.


Define the prompt string templates:

# Define the prompt string templates
physics_template = """You are a very smart physics professor. \
You are great at answering questions about physics in a concise\
and easy to understand manner. \
When you don't know the answer to a question you admit\
that you don't know.

Here is a question:
{input}"""


math_template = """You are a very good mathematician. \
You are great at answering math questions. \
You are so good because you are able to break down \
hard problems into their component parts, 
answer the component parts, and then put them together\
to answer the broader question.

Here is a question:
{input}"""

history_template = """You are a very good historian. \
You have an excellent knowledge of and understanding of people,\
events and contexts from a range of historical periods. \
You have the ability to think, reflect, debate, discuss and \
evaluate the past. You have a respect for historical evidence\
and the ability to make use of it to support your explanations \
and judgements.

Here is a question:
{input}"""


computerscience_template = """ You are a successful computer scientist.\
You have a passion for creativity, collaboration,\
forward-thinking, confidence, strong problem-solving capabilities,\
understanding of theories and algorithms, and excellent communication \
skills. You are great at answering coding questions. \
You are so good because you know how to solve a problem by \
describing the solution in imperative steps \
that a machine can easily interpret and you know how to \
choose a solution that has a good balance between \
time complexity and space complexity. 

Here is a question:
{input}"""

Assemble the data so that the for loop below can process it uniformly:

prompt_infos = [
    {
        "name": "physics", 
        "description": "Good for answering questions about physics", 
        "prompt_template": physics_template
    },
    {
        "name": "math", 
        "description": "Good for answering math questions", 
        "prompt_template": math_template
    },
    {
        "name": "History", 
        "description": "Good for answering history questions", 
        "prompt_template": history_template
    },
    {
        "name": "computer science", 
        "description": "Good for answering computer science questions", 
        "prompt_template": computerscience_template
    }
]

The key step: assemble the dictionary. This is essentially the strategy pattern.

# Assemble the dict
destination_chains = {}
for p_info in prompt_infos:
    name = p_info["name"]
    prompt_template = p_info["prompt_template"]
    prompt = ChatPromptTemplate.from_template(template=prompt_template)
    chain = LLMChain(llm=llm, prompt=prompt)
    destination_chains[name] = chain  

# Will be used later in the router LLMChain's prompt
destinations = [f"{p['name']}: {p['description']}" for p in prompt_infos]
destinations_str = "\n".join(destinations)

The router LLM. LangChain ships a prompt template for the router: from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

Using format, we substitute our assembled destinations_str into that template.

The prompt essentially tells the LLM to decide on its own which of our defined categories the user's input belongs to; if it cannot decide, it returns DEFAULT.

from langchain.prompts import PromptTemplate
from langchain.chains.router.llm_router import LLMRouterChain, RouterOutputParser
from langchain.chains.router.multi_prompt_prompt import MULTI_PROMPT_ROUTER_TEMPLATE

# Substitute in our own destinations
router_template = MULTI_PROMPT_ROUTER_TEMPLATE.format(
    destinations=destinations_str
)
router_prompt = PromptTemplate(
    template=router_template,
    input_variables=["input"],
    output_parser=RouterOutputParser(),
)

router_chain = LLMRouterChain.from_llm(llm, router_prompt)


Run it

from langchain.chains.router import MultiPromptChain

# The default chain handles inputs the router cannot classify (DEFAULT)
default_prompt = ChatPromptTemplate.from_template("{input}")
default_chain = LLMChain(llm=llm, prompt=default_prompt)

chain = MultiPromptChain(router_chain=router_chain,
                         destination_chains=destination_chains,
                         default_chain=default_chain,
                         verbose=True)
                        
chain.run("What is black body radiation?")

Result

# Intermediate log
physics: {'input': 'What is black body radiation?'}
# Output
Black body radiation refers to the electromagnetic radiation emitted by a perfect black body, which is an idealized physical body that absorbs all incident electromagnetic radiation and emits radiation at all frequencies. The radiation emitted by a black body depends only on its temperature and follows a specific distribution known as Planck's law. This type of radiation is important in understanding concepts such as thermal radiation and the behavior of objects at different temperatures.
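The routing mechanism above reduces to a dictionary dispatch, i.e. the strategy pattern. A minimal sketch, with stubs standing in for the router LLM and the destination chains (illustrative names, not LangChain code):

```python
# Minimal routing sketch: the router's only job is to pick a name;
# a dict maps that name to the handler (strategy pattern).

destination_chains = {
    "physics": lambda q: f"[physics expert] {q}",
    "math": lambda q: f"[math expert] {q}",
}
default_chain = lambda q: f"[generalist] {q}"

def stub_router(question: str) -> str:
    """Stands in for the router LLM choosing a destination name."""
    return "physics" if "radiation" in question else "DEFAULT"

def route(question: str) -> str:
    name = stub_router(question)
    handler = destination_chains.get(name, default_chain)
    return handler(question)

print(route("What is black body radiation?"))
# [physics expert] What is black body radiation?
```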

Summary

LangChain is not another large model, nor a system with its own memory or reasoning ability. It is a "flow orchestrator" tailor-made for LLMs.

LangChain is responsible for exactly one thing: at the right time, delivering the right context, in the right format, to the large model, and then putting the model's answer in the right place.