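Study Notes: Model I/O, Part 2

Output Parsing

LangChain's output parsers make it easier to extract structured information from model output, which can considerably speed up building applications on top of language models.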

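Suppose we want the model's reply to contain two fields:

description: the descriptive copy for the flower
reason: an explanation of why the copy was written this way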

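The model might then return something like this:

A: "The copy is: Fall in love at first sight! For just 50 yuan you can own this bouquet of roses, full of romance, and let your loved one feel your sincere affection. Why put it this way? Because love is priceless, and to a couple in the thick of romance, 50 yuan will feel well worth it."

That free-form reply is not what we need for data processing. What we want is something like the following Python dictionary.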

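B: {"description": "Fall in love at first sight! For just 50 yuan you can own this bouquet of roses, full of romance, and let your loved one feel your sincere affection.", "reason": "Because love is priceless, and to a couple in the thick of romance, 50 yuan will feel well worth it."}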

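So how do we get automatically from the loose prose of A to the clearly structured data of B? This is where LangChain's output parsers come in.

Below, we rework the program using a LangChain output parser.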

# Set the OpenAI API key
import os
os.environ["OPENAI_API_KEY"] = 'your-openai-api-key'

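# Import the prompt template class from LangChain
from langchain.prompts import PromptTemplate
# Create the original prompt template. The Chinese text reads: "You are a professional
# flower shop copywriter. Can you provide an attractive short description for
# {flower_name}, priced at {price} yuan?"
prompt_template = """您是一位专业的鲜花店文案撰写员。
对于售价为 {price} 元的 {flower_name} ，您能提供一个吸引人的简短描述吗？
{format_instructions}"""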

# Call the model through LangChain
from langchain_openai import OpenAI
# Create the model instance
model = OpenAI(model_name='gpt-3.5-turbo-instruct')

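# Import the structured output parser and ResponseSchema
from langchain.output_parsers import StructuredOutputParser, ResponseSchema
# Define the response schema we want to receive
# ("鲜花的描述文案" = "descriptive copy for the flower";
#  "为什么要这样写这个文案" = "why the copy was written this way")
response_schemas = [
    ResponseSchema(name="description", description="鲜花的描述文案"),
    ResponseSchema(name="reason", description="为什么要这样写这个文案")
]
# Create the output parser
output_parser = StructuredOutputParser.from_response_schemas(response_schemas)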

# Get the format instructions
format_instructions = output_parser.get_format_instructions()
# Build the prompt from the original template, injecting the parser's format instructions
prompt = PromptTemplate.from_template(
    prompt_template,
    partial_variables={"format_instructions": format_instructions})

# Prepare the data (rose, lily, carnation)
flowers = ["玫瑰", "百合", "康乃馨"]
prices = ["50", "30", "20"]

# Create an empty DataFrame to store the results
import pandas as pd
df = pd.DataFrame(columns=["flower", "price", "description", "reason"])  # declare the column names first

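for flower, price in zip(flowers, prices):
    # Build the model input from the prompt
    # (named prompt_text to avoid shadowing the built-in input)
    prompt_text = prompt.format(flower_name=flower, price=price)

    # Get the model's output
    output = model.invoke(prompt_text)

    # Parse the model's output (the result is a dict)
    parsed_output = output_parser.parse(output)

    # Add "flower" and "price" to the parsed output
    parsed_output['flower'] = flower
    parsed_output['price'] = price

    # Append the parsed output to the DataFrame as a new row
    df.loc[len(df)] = parsed_output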

# Print the rows as a list of dicts
print(df.to_dict(orient='records'))

# Save the DataFrame to a CSV file
df.to_csv("flowers_with_descriptions.csv", index=False)

Output

[{'flower': '玫瑰', 'price': '50', 'description': 'Luxuriate in the beauty of this 50 yuan rose, with its deep red petals and delicate aroma.', 'reason': 'This description emphasizes the elegance and beauty of the rose, which will be sure to draw attention.'}, 
{'flower': '百合', 'price': '30', 'description': '30元的百合，象征着坚定的爱情，带给你的是温暖而持久的情感！', 'reason': '百合是象征爱情的花，写出这样的描述能让顾客更容易感受到百合所带来的爱意。'}, 
{'flower': '康乃馨', 'price': '20', 'description': 'This beautiful carnation is the perfect way to show your love and appreciation. Its vibrant pink color is sure to brighten up any room!', 'reason': 'The description is short, clear and appealing, emphasizing the beauty and color of the carnation while also invoking a sense of love and appreciation.'}]

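The code first defines the output structure. We want the model's answer to contain two parts: the descriptive copy for the flower (description) and the reason for writing the copy that way (reason). So we define a list named response_schemas containing two ResponseSchema objects, one for each part.

From this list, we create an output parser with the StructuredOutputParser.from_response_schemas method.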

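Next, we call the parser's get_format_instructions() method to obtain the format instructions (format_instructions), and combine them with the original string template to create a new prompt template that carries the expected output structure. We then use the new template to build the model input and get the model's output. The model will follow these instructions as closely as it can, which makes its output easier for the parser to handle.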
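To make the idea of format instructions concrete, here is a minimal standard-library sketch of the pattern. It is not LangChain's implementation: build_format_instructions, make_prompt, SCHEMAS and the English template text are hypothetical names and wording introduced only for illustration.

```python
# A sketch of "format instructions as a partial variable", standard library only.
# All names below are hypothetical; this is not LangChain's own code.

FENCE = "`" * 3  # three backticks, the delimiter the model is asked to use

# Hypothetical stand-in for ResponseSchema objects: (name, description) pairs.
SCHEMAS = [
    ("description", "descriptive copy for the flower"),
    ("reason", "why the copy was written this way"),
]

def build_format_instructions(schemas):
    """Return text asking the model for a fenced JSON object with the given keys."""
    fields = "\n".join(f'\t"{name}": string  // {desc}' for name, desc in schemas)
    return (
        "Reply with a markdown code snippet in exactly this form:\n\n"
        + FENCE + "json\n{\n" + fields + "\n}\n" + FENCE
    )

TEMPLATE = (
    "You are a professional flower shop copywriter.\n"
    "Write a short, attractive description for {flower_name} priced at {price} yuan.\n"
    "{format_instructions}"
)

def make_prompt(template, **partial_variables):
    """Pre-fill some template variables and return a function that fills the rest."""
    def fill(**variables):
        return template.format(**partial_variables, **variables)
    return fill

fill_prompt = make_prompt(
    TEMPLATE, format_instructions=build_format_instructions(SCHEMAS)
)
prompt_text = fill_prompt(flower_name="rose", price="50")
print(prompt_text)
```

The format instructions are fixed once, while flower_name and price change per call, which is why it is convenient to pre-fill them as a partial variable.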

For each flower and price pair, we call output_parser.parse(output) to parse the model's text into the format defined earlier, that is, a Python dictionary containing the values of the description and reason fields.
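The parsing step can be pictured roughly as follows. This is a simplified sketch under the assumption that the model replies with a JSON object inside a fenced block. The helper parse_structured_output and the sample reply are hypothetical, and LangChain's own parser may differ in detail.

```python
import json
import re

FENCE = "`" * 3  # three backticks

def parse_structured_output(text, expected_keys):
    """Extract the first fenced JSON object from text and check that all keys exist."""
    pattern = re.escape(FENCE) + r"(?:json)?\s*(\{.*?\})\s*" + re.escape(FENCE)
    match = re.search(pattern, text, re.DOTALL)
    if match is None:
        raise ValueError("no fenced JSON object found in model output")
    data = json.loads(match.group(1))
    missing = [key for key in expected_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys in model output: {missing}")
    return data

# A made-up reply, written by hand for this example (not real model output).
raw_output = (
    "Here is the copy:\n"
    + FENCE + "json\n"
    + '{"description": "A rose for 50 yuan.", "reason": "Short and direct."}\n'
    + FENCE
)

parsed = parse_structured_output(raw_output, ["description", "reason"])
print(parsed["description"])
print(parsed["reason"])
```

Because a model can ignore the format instructions, raising an error on malformed replies lets the calling loop decide whether to retry or skip that item.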

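Finally, we collect everything into a pandas DataFrame (this requires the pandas library to be installed). The DataFrame holds the values of four fields: flower, price, description and reason. Of these, description and reason are parsed from the model's output by output_parser, while flower and price are added by us.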

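We can print the DataFrame's contents and process it further in the program, for example by saving it to a CSV file. This is possible because the data is no longer vague, unstructured text but clearly structured, formatted data. The output parser does much of the work here.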
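Once each reply is a dictionary, writing tabular output needs very little code. The sketch below uses only the standard library's csv module rather than pandas, to show that the structure itself does the work. The records are hand-written placeholders, not real model output, and records_to_csv is a hypothetical helper.

```python
import csv
import io

# Placeholder records standing in for parsed model output.
records = [
    {"flower": "rose", "price": "50",
     "description": "placeholder description", "reason": "placeholder reason"},
    {"flower": "lily", "price": "30",
     "description": "placeholder description", "reason": "placeholder reason"},
]

def records_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

csv_text = records_to_csv(records, ["flower", "price", "description", "reason"])
print(csv_text)
```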

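Summary

To sum up, the benefits of using the LangChain framework are:

Template management: a large project may have many different prompt templates, and LangChain helps you manage them and keep the code clear and tidy.
Variable extraction and checking: LangChain can automatically extract the variables in a template and check them, so that you do not forget to fill one in.
Model switching: to try a different model, you often only need to change the model name and can leave the rest of the code untouched.
Output parsing: a LangChain prompt template can embed the definition of the output format, so the formatted output is easier to handle in later processing.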
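The variable extraction and checking point can be illustrated with the standard library alone. This is only a sketch of the idea, not LangChain's implementation: extract_variables and check_variables are hypothetical helpers, and the template text is an English paraphrase.

```python
from string import Formatter

def extract_variables(template):
    """Return the named placeholders of a str.format-style template, in order of first use."""
    names = []
    for _, field_name, _, _ in Formatter().parse(template):
        if field_name and field_name not in names:
            names.append(field_name)
    return names

def check_variables(template, supplied):
    """Raise KeyError if any placeholder in the template has no supplied value."""
    missing = [name for name in extract_variables(template) if name not in supplied]
    if missing:
        raise KeyError(f"missing template variables: {missing}")

template = (
    "Write a short description for {flower_name} priced at {price} yuan.\n"
    "{format_instructions}"
)

variables = extract_variables(template)
print(variables)

# Passes silently because every placeholder has a value.
check_variables(
    template,
    {"flower_name": "rose", "price": "50", "format_instructions": "..."},
)
```

Checking before formatting turns a forgotten variable into an explicit error message that names the missing fields.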