Youth Training Camp X Doubao MarsCode Technical Training Camp, Lesson 3 | Doubao MarsCode AI Practice

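ChatGPT can remember what you said earlier because it uses a memory mechanism: it records the context of the conversation so far and passes that context to the model as part of the prompt on each new call. Memory is essential when building a chatbot.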

Using ConversationChain

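The defining feature of this chain is its conversation format, which labels each turn with an AI prefix or a Human prefix. This format is tightly coupled with the memory mechanism.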

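# Legacy import path used in this course; newer LangChain releases
# expose this class from the langchain_openai package.
from langchain import OpenAI
from langchain.chains import ConversationChain

# Initialize the large language model
llm = OpenAI(
    temperature=0.5,
    model_name="gpt-3.5-turbo-instruct"
)

# Initialize the conversation chain
conv_chain = ConversationChain(llm=llm)

# Print the conversation prompt template
print(conv_chain.prompt.template)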

The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context. If the AI does not know the answer to a question, it truthfully says it does not know.

Current conversation:
{history}
Human: {input}
AI:

This prompt sets up a basic frame for the conversation between the human (us) and the AI (gpt-3.5-turbo-instruct in this example): "The following is a friendly conversation between a human and an AI. The AI is talkative and provides lots of specific details from its context."

The prompt also tries to reduce hallucination, that is, to keep the model from making things up, by stating:

"If the AI does not know the answer to a question, it truthfully says it does not know."

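Next come two parameters, {history} and {input}.

{history} is where the conversation memory is stored, that is, the record of the dialogue so far between the human and the AI.
{input} is where the new input goes; think of it as what you type into the text box when chatting with ChatGPT.

Both parameters are passed to the LLM through the prompt template, and the output we expect back is simply the continuation of the conversation.

So with the {history} parameter plus the Human and AI prefixes, we can store the past dialogue in the prompt template and pass it to the model as part of the new prompt on every turn. That is how the memory mechanism works.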
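To make the substitution concrete, here is a minimal sketch in plain Python, without LangChain. The names TEMPLATE, render_history, and build_prompt are hypothetical helpers invented for this illustration; the template text is copied from the printout above, and the real chain may assemble its prompt differently under the hood.

```python
# Hypothetical stand-in for the chain's prompt template
# (text copied from the printed template above).
TEMPLATE = (
    "The following is a friendly conversation between a human and an AI. "
    "The AI is talkative and provides lots of specific details from its context. "
    "If the AI does not know the answer to a question, "
    "it truthfully says it does not know."
    "\n\nCurrent conversation:\n{history}\nHuman: {input}\nAI:"
)


def render_history(turns):
    """Join (human, ai) pairs into 'Human: ...' / 'AI: ...' lines."""
    lines = []
    for human, ai in turns:
        lines.append(f"Human: {human}")
        lines.append(f"AI: {ai}")
    return "\n".join(lines)


def build_prompt(turns, new_input):
    """Fill {history} with past turns and {input} with the new message."""
    return TEMPLATE.format(history=render_history(turns), input=new_input)


# One earlier exchange, followed by a new user message.
turns = [("Hi, I need flowers.", "Sure, what is the occasion?")]
prompt = build_prompt(turns, "A birthday.")
print(prompt)
```

The prompt ends with a bare "AI:" line, so the most natural completion for the model is the AI's next reply, which is the "continuation of the conversation" described above.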

Using ConversationBufferMemory

In LangChain, ConversationBufferMemory (buffer memory) implements the simplest form of memory.

Below, we add ConversationBufferMemory to the conversation chain.

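# Legacy import paths used in this course; newer LangChain releases
# expose the OpenAI class from the langchain_openai package.
from langchain import OpenAI
from langchain.chains import ConversationChain
from langchain.chains.conversation.memory import ConversationBufferMemory

# Initialize the large language model
llm = OpenAI(
    temperature=0.5,
    model_name="gpt-3.5-turbo-instruct"
)

# Initialize the conversation chain with buffer memory
conversation = ConversationChain(
    llm=llm,
    memory=ConversationBufferMemory()
)

# Day 1 conversation
# Round 1
conversation("My sister's birthday is tomorrow, and I need a birthday bouquet.")
print("Memory after the first exchange:", conversation.memory.buffer)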

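With memory, the LLM can see what was said earlier. Storing everything verbatim like this is simple and gives the LLM the most information possible, but every new prompt now carries more tokens (the entire chat history), which means slower responses and higher cost. And once the conversation reaches the LLM's token limit (its context window), a conversation that is too long can no longer be remembered in full. For text-davinci-003 and the original gpt-3.5-turbo, that window is roughly 4K tokens (about 4,096), shared between the prompt and the completion.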
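The growth in prompt size can be seen in a small simulation. This is only a sketch under loud assumptions: BufferMemory is a hypothetical stand-in rather than LangChain's class, fake_llm is a stub that returns a fixed reply instead of calling a model, and rough_token_count treats each whitespace-separated word as one token, which real tokenizers do not do.

```python
class BufferMemory:
    """Hypothetical minimal buffer that keeps every turn verbatim."""

    def __init__(self):
        self.lines = []

    def save(self, human, ai):
        # Append the latest exchange; nothing is ever dropped.
        self.lines.append(f"Human: {human}")
        self.lines.append(f"AI: {ai}")

    @property
    def buffer(self):
        return "\n".join(self.lines)


def rough_token_count(text):
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())


def fake_llm(prompt):
    # Stub standing in for a real model call.
    return "Noted, tell me more."


CONTEXT_LIMIT = 4096  # context window size mentioned in the text

memory = BufferMemory()
sizes = []
for user_input in [
    "My sister's birthday is tomorrow.",
    "She likes roses.",
    "What did I say about my sister?",
]:
    # Each prompt carries the whole history plus the new message.
    prompt = f"{memory.buffer}\nHuman: {user_input}\nAI:"
    sizes.append(rough_token_count(prompt))
    memory.save(user_input, fake_llm(prompt))

# Room left in the context window after each round.
budget_left = [CONTEXT_LIMIT - n for n in sizes]
print("Approximate prompt sizes per round:", sizes)
print("Approximate budget left per round:", budget_left)
```

Each round's prompt is larger than the last because nothing is ever removed from the buffer, so the remaining budget only shrinks. That is the trade-off described above: maximum information for the model, paid for in tokens.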