Agentic Design Patterns, CH08: Memory Management

Original English source: Chapter 8: Memory Management

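Effective memory management is crucial for intelligent agents to retain information. Like humans, agents need different types of memory to operate efficiently. This chapter examines memory management, focusing on the immediate (short-term) and persistent (long-term) memory requirements of agents.

In agent systems, memory refers to an agent's ability to retain and use information from past interactions, observations, and learning experiences. This capability lets agents make informed decisions, maintain conversational context, and improve over time. Agent memory is generally divided into two main types: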

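Short-Term Memory (Contextual Memory): Similar to working memory, this holds information that is currently being processed or was recently accessed. For agents built on large language models (LLMs), short-term memory lives primarily in the context window. The window contains recent messages, agent replies, tool results, and agent reflections from the current interaction, all of which shape the LLM's subsequent responses and actions. The context window has limited capacity, which restricts how much recent information the agent can access directly. Managing short-term memory efficiently means keeping the most relevant information within this limited space, for example by summarizing older conversation segments or emphasizing key details. Models with "long context" windows simply enlarge this short-term memory so that more information fits in a single interaction. The context is still ephemeral, however: it is lost when the session ends, and processing it on every call can be costly and inefficient. Agents therefore need a separate kind of memory to achieve true persistence, recall information from past interactions, and build a lasting knowledge base.
Long-Term Memory (Persistent Memory): This acts as a repository for information the agent needs to keep across interactions, tasks, or extended periods, much like a long-term knowledge base. The data is typically stored outside the agent's immediate processing environment, commonly in databases, knowledge graphs, or vector databases. In a vector database, information is converted into numerical vectors and stored, so the agent can retrieve data by semantic similarity rather than exact keyword match, a process known as semantic search. When the agent needs information from long-term memory, it queries the external store, retrieves the relevant data, and integrates it into the short-term context for immediate use, combining prior knowledge with the current interaction.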
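
As a rough illustration of the flow described above, the following sketch retrieves entries from a long-term store by similarity and merges them into a size-limited short-term context. The bag-of-words `embed` function, the `LongTermStore` class, and the word-count budget are all assumptions made for this example; a real system would more likely use a learned embedding model, a vector database, and token counts.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: a bag-of-words count vector (stand-in for a real model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class LongTermStore:
    """Hypothetical long-term memory kept outside the context window."""

    def __init__(self) -> None:
        self._items = []  # list of (text, vector) pairs

    def add(self, text: str) -> None:
        self._items.append((text, embed(text)))

    def search(self, query: str, top_k: int = 1) -> list:
        q = embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:top_k]]


def build_context(recent: list, recalled: list, max_words: int) -> list:
    """Recalled facts first, then as many of the newest messages as still fit."""
    budget = max_words - sum(len(m.split()) for m in recalled)
    kept = []
    for message in reversed(recent):  # walk from newest to oldest
        cost = len(message.split())
        if cost > budget:
            break
        kept.append(message)
        budget -= cost
    return recalled + list(reversed(kept))


store = LongTermStore()
store.add("user prefers vegetarian food")
store.add("user lives in Berlin")

recent_messages = ["hello there", "I am hungry", "what food should I order tonight"]
recalled = store.search("what food should I order tonight", top_k=1)
context = build_context(recent_messages, recalled, max_words=14)
print(context)
```

With a budget of 14 words, the oldest message is dropped while the recalled fact and the two newest messages are kept. Dropping old messages is the simplest policy; summarizing them instead would preserve more information at the cost of an extra model call.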

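Practical Applications & Use Cases

Memory management is vital for agents to track information over time and act intelligently. It is essential for agents to go beyond basic question answering. Applications include: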

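Chatbots and Conversational AI: Maintaining conversational flow relies on short-term memory. A chatbot must remember the user's earlier inputs to give coherent responses. Long-term memory lets a chatbot recall user preferences, past issues, or prior discussions, providing personalized and continuous interactions.
Task-Oriented Agents: Agents that manage multi-step tasks need short-term memory to track previous steps, current progress, and overall goals. This information may live in the task context or in temporary storage. Long-term memory is crucial for accessing user-specific data that is not in the immediate context.
Personalized Experiences: Agents that offer tailored interactions use long-term memory to store and retrieve user preferences, past behaviors, and personal information. This lets them adapt their responses and suggestions.
Learning and Improvement: Agents can refine their performance by learning from past interactions. Successful strategies, mistakes, and new information are stored in long-term memory to support future adaptation. Reinforcement learning agents store learned strategies or knowledge in this way.
Information Retrieval (RAG): Agents designed to answer questions access a knowledge base, their long-term memory, often through Retrieval-Augmented Generation (RAG). The agent retrieves relevant documents or data to ground its responses.
Autonomous Systems: Robots and self-driving cars need memory for maps, routes, object locations, and learned behaviors. This includes short-term memory for the immediate surroundings and long-term memory for general knowledge of the environment.

Memory enables agents to maintain history, learn, personalize interactions, and handle complex, time-dependent problems.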

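Hands-On Code Example (ADK)

The Google Agent Development Kit (ADK) offers a structured approach to managing context and memory, and includes components for practical use. A solid understanding of ADK's Session, State, and Memory is essential for building agents that need to retain information.

As in human interaction, agents must be able to recall earlier exchanges to hold coherent, natural conversations. ADK simplifies context management through three core concepts and their associated services.

Each interaction with an agent can be treated as a separate conversation thread, and the agent may need data from earlier interactions. ADK structures this as follows: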

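Session: A single chat thread that records the messages and actions (Events) of a specific interaction and stores temporary data (State) relevant to that conversation.
State (session.state): Data stored within a Session, holding information relevant only to the current, active chat thread.
Memory: A searchable repository of information drawn from past chats or external sources, serving as a retrieval resource for data beyond the current conversation.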

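ADK provides dedicated services for the key components needed to build complex, stateful, context-aware agents. SessionService manages chat threads (Session objects) by handling their creation, recording, and termination, while MemoryService handles the storage and retrieval of long-term knowledge (Memory).

Both SessionService and MemoryService come with several configuration options, so you can choose a storage method that fits your application. In-memory options are available for testing, but their data does not survive a restart. For persistent storage and scalability, ADK also supports database-backed and cloud-based services.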

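Session: Keeping Track of Each Chat

A Session object in ADK tracks and manages a single chat thread. When a conversation with an agent begins, SessionService creates a Session object, represented as google.adk.sessions.Session. The object encapsulates all data for that conversation thread: unique identifiers (id, app_name, user_id), a chronological record of events as Event objects, a storage area for session-specific temporary data called state, and a timestamp of the last update (last_update_time). Developers usually interact with Session objects indirectly through SessionService. SessionService manages the session lifecycle, which includes starting new sessions, resuming previous ones, recording session activity (including state updates), identifying active sessions, and removing session data. ADK provides several SessionService implementations with different storage mechanisms for session history and temporary data. One is InMemorySessionService, which is suitable for testing but does not persist data across application restarts.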

# Example: Using InMemorySessionService
# This is suitable for local development and testing where data 
# persistence across application restarts is not required.
from google.adk.sessions import InMemorySessionService
session_service = InMemorySessionService()

If you want reliable persistence to a database you manage, use DatabaseSessionService.

# Example: Using DatabaseSessionService
# This is suitable for production or development requiring persistent storage.
# You need to configure a database URL (e.g., for SQLite, PostgreSQL, etc.).
# Requires: pip install google-adk[sqlalchemy] and a database driver (e.g., psycopg2 for PostgreSQL)
from google.adk.sessions import DatabaseSessionService
# Example using a local SQLite file:
db_url = "sqlite:///./my_agent_data.db"
session_service = DatabaseSessionService(db_url=db_url)

In addition, VertexAiSessionService uses Vertex AI infrastructure for scalable production deployments on Google Cloud.

# Example: Using VertexAiSessionService
# This is suitable for scalable production on Google Cloud Platform, leveraging
# Vertex AI infrastructure for session management.
# Requires: pip install google-adk[vertexai] and GCP setup/authentication
from google.adk.sessions import VertexAiSessionService

PROJECT_ID = "your-gcp-project-id" # Replace with your GCP project ID
LOCATION = "us-central1" # Replace with your desired GCP location
# The app_name used with this service should correspond to the Reasoning Engine ID or name
REASONING_ENGINE_APP_NAME = "projects/your-gcp-project-id/locations/us-central1/reasoningEngines/your-engine-id" # Replace with your Reasoning Engine resource name

session_service = VertexAiSessionService(project=PROJECT_ID, location=LOCATION)
# When using this service, pass REASONING_ENGINE_APP_NAME to service methods:
# session_service.create_session(app_name=REASONING_ENGINE_APP_NAME, ...)
# session_service.get_session(app_name=REASONING_ENGINE_APP_NAME, ...)
# session_service.append_event(session, event, app_name=REASONING_ENGINE_APP_NAME)
# session_service.delete_session(app_name=REASONING_ENGINE_APP_NAME, ...)

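Choosing the right SessionService matters because it determines how the agent's interaction history and temporary data are stored and how long they persist.

Each message exchange follows a cycle. A message arrives; the Runner uses SessionService to fetch or create a Session; the agent processes the message using the Session's context (state and past interactions), generates a response, and possibly updates the state; the Runner wraps the result in an Event; and session_service.append_event records the new event and updates the state in storage. The Session then waits for the next message. Ideally, the session is terminated with delete_session when the interaction ends. This cycle shows how SessionService maintains continuity by managing Session-specific history and temporary data.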
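
The cycle above can be sketched with a small in-memory stand-in. `ToySessionService`, `Session`, and `Event` here are hypothetical classes written only for this illustration; they borrow method names from the text but are not the ADK implementations, and their signatures are simplified assumptions. The sketch also shows why a change made directly on a retrieved session object may never reach storage.

```python
import copy
import time
from dataclasses import dataclass, field


@dataclass
class Event:
    """One recorded step in the conversation, optionally carrying state changes."""
    author: str
    text: str
    state_delta: dict = field(default_factory=dict)


@dataclass
class Session:
    id: str
    events: list = field(default_factory=list)
    state: dict = field(default_factory=dict)
    last_update_time: float = 0.0


class ToySessionService:
    """Hypothetical in-memory stand-in for a session service (not the ADK class)."""

    def __init__(self) -> None:
        self._sessions = {}

    def create_session(self, session_id: str) -> Session:
        self._sessions[session_id] = Session(id=session_id, last_update_time=time.time())
        return self.get_session(session_id)

    def get_session(self, session_id: str) -> Session:
        # Hand out a copy so callers cannot change stored data by accident.
        return copy.deepcopy(self._sessions[session_id])

    def append_event(self, session_id: str, event: Event) -> None:
        stored = self._sessions[session_id]
        stored.events.append(event)             # record history
        stored.state.update(event.state_delta)  # apply state changes
        stored.last_update_time = time.time()   # refresh metadata

    def delete_session(self, session_id: str) -> None:
        self._sessions.pop(session_id, None)


service = ToySessionService()
service.create_session("s1")

# One turn: user message in, agent response out, state updated via the event.
service.append_event("s1", Event(author="user", text="Hello"))
service.append_event(
    "s1", Event(author="agent", text="Hi there!", state_delta={"last_greeting": "Hi there!"})
)

snapshot = service.get_session("s1")
print(len(snapshot.events), snapshot.state)

# Mutating the retrieved copy bypasses append_event, so the change is not stored.
snapshot.state["unsaved"] = True
persisted_state = service.get_session("s1").state
print("unsaved" in persisted_state)

service.delete_session("s1")
```

Routing every change through `append_event` keeps history, state, and the timestamp consistent with one another, which is the property the cycle relies on.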

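State: The Session's Scratchpad

In ADK, each Session, which represents a chat thread, includes a state component that works like the agent's temporary working memory for that conversation. While session.events records the full chat history, session.state stores and updates dynamic data points relevant to the current chat.

Fundamentally, session.state behaves as a dictionary that stores data as key-value pairs. Its core purpose is to let the agent keep and manage details needed for a coherent conversation, such as user preferences, task progress, incrementally collected data, or conditional flags that influence later agent behavior.

State consists of string keys paired with values of serializable Python types: strings, numbers, booleans, lists, and dictionaries containing these basic types. State is dynamic and evolves throughout the conversation. Whether these changes persist depends on the configured SessionService.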
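
One way to catch unsuitable values early is to check a candidate state dictionary before storing it. The helper below is a minimal sketch that uses JSON serializability as a proxy for "serializable"; that proxy and the `validate_state` name are assumptions of this example, since the exact requirement depends on the SessionService in use.

```python
import json


def validate_state(state: dict) -> list:
    """Return the keys whose values cannot be serialized to JSON."""
    bad_keys = []
    for key, value in state.items():
        try:
            json.dumps(value)
        except TypeError:
            bad_keys.append(key)
    return bad_keys


candidate_state = {
    "user:name": "Sam",
    "attempts": 3,
    "tags": ["a", "b"],
    "callback": print,   # functions are not serializable
    "seen_ids": {1, 2},  # sets are not JSON types
}
print(validate_state(candidate_state))
```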

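State can be organized with key prefixes that define the scope and persistence of the data. Keys without a prefix are session-specific.

The user: prefix associates data with a user ID and shares it across all of that user's sessions.
The app: prefix marks data shared among all users of the application.
The temp: prefix marks data that is valid only for the current processing turn and is not stored persistently.

The agent accesses all state data through a single session.state dictionary. SessionService takes care of retrieving, merging, and persisting the data. State should be updated when an Event is added to the session history via session_service.append_event(). This ensures that changes are tracked accurately, saved correctly by persistent services, and handled safely.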
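
To make the prefix rules concrete, here is a minimal sketch of how a service could route keys by prefix when persisting and then merge the scopes back into one dictionary for the agent. `ScopedStateStore` and its methods are hypothetical and written for this illustration only; they are not how ADK implements state internally.

```python
class ScopedStateStore:
    """Hypothetical store that routes state keys by prefix (not ADK's implementation)."""

    def __init__(self) -> None:
        self.app_state = {}
        self.user_state = {}     # user_id -> dict
        self.session_state = {}  # (user_id, session_id) -> dict

    def apply_delta(self, user_id: str, session_id: str, delta: dict) -> None:
        """Persist a state delta, routing each key by its prefix."""
        for key, value in delta.items():
            if key.startswith("temp:"):
                continue  # valid for the current turn only, never stored
            if key.startswith("app:"):
                self.app_state[key] = value
            elif key.startswith("user:"):
                self.user_state.setdefault(user_id, {})[key] = value
            else:
                self.session_state.setdefault((user_id, session_id), {})[key] = value

    def merged_view(self, user_id: str, session_id: str) -> dict:
        """Build the single dictionary an agent would read for this session."""
        view = dict(self.app_state)
        view.update(self.user_state.get(user_id, {}))
        view.update(self.session_state.get((user_id, session_id), {}))
        return view


store = ScopedStateStore()
store.apply_delta("alice", "s1", {
    "app:theme": "dark",
    "user:language": "en",
    "task_status": "active",
    "temp:validation_needed": True,
})

# A second session for the same user sees app and user data, but not session data.
alice_second_session = store.merged_view("alice", "s2")
print(alice_second_session)

# Another user only sees the app-level data.
bob_view = store.merged_view("bob", "s9")
print(bob_view)
```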

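1. The Simple Way: Using output_key (for Agent Text Replies): This is the easiest method if you only want to save the agent's final text response directly to state. When setting up your LlmAgent, specify the output_key you want to use. The Runner recognizes it and automatically creates the actions needed to save the response to state when it appends the event. The following code example updates state via output_key.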

# Import necessary classes from the Google Agent Development Kit (ADK)
from google.adk.agents import LlmAgent
from google.adk.sessions import InMemorySessionService, Session
from google.adk.runners import Runner
from google.genai.types import Content, Part

# Define an LlmAgent with an output_key.
greeting_agent = LlmAgent(
   name="Greeter",
   model="gemini-2.0-flash",
   instruction="Generate a short, friendly greeting.",
   output_key="last_greeting"
)

# --- Setup Runner and Session ---
app_name, user_id, session_id = "state_app", "user1", "session1"
session_service = InMemorySessionService()
runner = Runner(
   agent=greeting_agent,
   app_name=app_name,
   session_service=session_service
)
session = session_service.create_session(
   app_name=app_name,
   user_id=user_id,
   session_id=session_id
)

print(f"Initial state: {session.state}")

# --- Run the Agent ---
user_message = Content(parts=[Part(text="Hello")])
print("\n--- Running the agent ---")
for event in runner.run(
   user_id=user_id,
   session_id=session_id,
   new_message=user_message
):
   if event.is_final_response():
     print("Agent responded.")

# --- Check Updated State ---
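# Check the state *after* the runner has finished processing all events.
# Identifiers are passed as keyword arguments; in ADK releases where
# get_session is a coroutine, the call must also be awaited.
updated_session = session_service.get_session(
   app_name=app_name, user_id=user_id, session_id=session_id
)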
print(f"\nState after agent run: {updated_session.state}")

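Behind the scenes, the Runner sees your output_key and automatically creates the necessary action with a state_delta when it calls append_event.

2. The Standard Way: Using EventActions.state_delta (for More Complicated Updates): When you need to do something more complex, such as updating several keys at once, saving data that is not just text, targeting a specific scope like user: or app:, or making updates not tied to the agent's final text reply, you build a dictionary of state changes (the state_delta) yourself and include it in the EventActions of the Event you append. The example below applies such changes from inside a tool, through tool_context.state: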

import time
from google.adk.tools.tool_context import ToolContext
from google.adk.sessions import InMemorySessionService

# --- Define the Recommended Tool-Based Approach ---
def log_user_login(tool_context: ToolContext) -> dict:
   """
   Updates the session state upon a user login event.
   This tool encapsulates all state changes related to a user login.
   Args:
       tool_context: Automatically provided by ADK, gives access to session state.
   Returns:
       A dictionary confirming the action was successful.
   """
   # Access the state directly through the provided context.
   state = tool_context.state
  
   # Get current values or defaults, then update the state.
   # This is much cleaner and co-locates the logic.
   login_count = state.get("user:login_count", 0) + 1
   state["user:login_count"] = login_count
   state["task_status"] = "active"
   state["user:last_login_ts"] = time.time()
   state["temp:validation_needed"] = True
  
   print("State updated from within the `log_user_login` tool.")
  
   return {
       "status": "success",
       "message": f"User login tracked. Total logins: {login_count}."
   }

# --- Demonstration of Usage ---
# In a real application, an LLM Agent would decide to call this tool.
# Here, we simulate a direct call for demonstration purposes.

# 1. Setup
session_service = InMemorySessionService()
app_name, user_id, session_id = "state_app_tool", "user3", "session3"
session = session_service.create_session(
   app_name=app_name,
   user_id=user_id,
   session_id=session_id,
   state={"user:login_count": 0, "task_status": "idle"}
)
print(f"Initial state: {session.state}")

# 2. Simulate a tool call (in a real app, the ADK Runner does this)
# We create a ToolContext manually just for this standalone example.
from google.adk.tools.tool_context import InvocationContext
mock_context = ToolContext(
   invocation_context=InvocationContext(
       app_name=app_name, user_id=user_id, session_id=session_id,
       session=session, session_service=session_service
   )
)

# 3. Execute the tool
log_user_login(mock_context)

# 4. Check the updated state
updated_session = session_service.get_session(
   app_name=app_name, user_id=user_id, session_id=session_id
)
print(f"State after tool execution: {updated_session.state}")

# Encapsulating the state changes in a tool keeps the code
# cleaner and more robust than mutating state from outside.

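This code demonstrates a tool-based approach to managing user session state in an application. It defines a function, log_user_login, that acts as a tool and is responsible for updating the session state when a user logs in.

The function receives a ToolContext object, supplied by ADK, to access and modify the session's state dictionary. Inside the tool, it increments user:login_count, sets task_status to "active", records user:last_login_ts (a timestamp), and adds a temporary flag, temp:validation_needed.

The demonstration part of the code simulates how the tool would be used. It sets up an in-memory session service and creates an initial session with some predefined state. A ToolContext is then created manually to mimic the environment in which the ADK Runner would execute the tool, and log_user_login is called with this mock context. Finally, the session is retrieved again to show that the tool's execution has updated the state. The goal is to show how encapsulating state changes inside tools makes the code cleaner and better organized than manipulating state directly outside of tools.

Note that directly modifying the session.state dictionary after retrieving a session is strongly discouraged, because it bypasses the standard event-processing mechanism. Such direct changes are not recorded in the session's event history, may not be persisted by the selected SessionService, can cause concurrency issues, and do not update essential metadata such as timestamps. The recommended ways to update session state are the output_key parameter on an LlmAgent (specifically for the agent's final text response) or including state changes in EventActions.state_delta when appending an event via session_service.append_event(). session.state should primarily be used for reading existing data.

To recap, when designing your state: keep it simple, use basic data types, give keys clear names and use prefixes correctly, avoid deep nesting, and always update state through the append_event process.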

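Memory: Long-Term Knowledge with MemoryService

In agent systems, the Session component maintains the current chat history (events) and temporary data (state) for a single conversation. For agents to retain information across multiple interactions or access external data, however, long-term knowledge management is required. MemoryService provides this.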

# Example: Using InMemoryMemoryService
# This is suitable for local development and testing where data 
# persistence across application restarts is not required. 
# Memory content is lost when the app stops.
from google.adk.memory import InMemoryMemoryService
memory_service = InMemoryMemoryService()

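Session and State can be thought of as short-term memory for a single chat session, whereas the long-term knowledge managed by MemoryService is a persistent, searchable repository. This repository may hold information from many past interactions or from external sources. MemoryService, as defined by the BaseMemoryService interface, sets the standard for managing this searchable long-term knowledge. Its main functions are adding information, which extracts content from a session and stores it using the add_session_to_memory method, and retrieving information, which lets an agent query the store and receive relevant data using the search_memory method.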
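
The two operations can be pictured with a small stand-in. `ToyMemoryService` below is a hypothetical class that reuses the method names from the text for readability; its argument shapes and its keyword-overlap search are assumptions made for this sketch and do not reproduce the BaseMemoryService signatures.

```python
class ToyMemoryService:
    """Hypothetical long-term memory using simple keyword overlap for search."""

    def __init__(self) -> None:
        self._entries = []  # list of (session_id, author, text)

    def add_session_to_memory(self, session_id: str, events: list) -> None:
        """Copy the (author, text) events of a finished session into memory."""
        for author, text in events:
            self._entries.append((session_id, author, text))

    def search_memory(self, query: str) -> list:
        """Return stored texts that share at least one word with the query."""
        query_words = set(query.lower().split())
        return [
            text
            for _, _, text in self._entries
            if query_words & set(text.lower().split())
        ]


memory = ToyMemoryService()
memory.add_session_to_memory("s1", [
    ("user", "My favorite project is Alpha"),
    ("agent", "Noted, I will remember that"),
])

# Later, in a different session, the agent looks up what it knows.
results = memory.search_memory("which project was it")
print(results)
```

Keyword overlap is enough for a local test double; a production service would typically swap it for semantic search without changing the two-method shape.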

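ADK offers several implementations for this long-term knowledge store. InMemoryMemoryService provides temporary storage suitable for testing, but its data is not kept across application restarts. For production, VertexAiRagMemoryService is typically used. It builds on Google Cloud's Retrieval-Augmented Generation (RAG) service to provide scalable, persistent storage with semantic search (see also Chapter 14 on RAG).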

# Example: Using VertexAiRagMemoryService
# This is suitable for scalable production on GCP, leveraging
# Vertex AI RAG (Retrieval Augmented Generation) for persistent, 
# searchable memory.
# Requires: pip install google-adk[vertexai], GCP 
# setup/authentication, and a Vertex AI RAG Corpus.
from google.adk.memory import VertexAiRagMemoryService

# The resource name of your Vertex AI RAG Corpus
RAG_CORPUS_RESOURCE_NAME = "projects/your-gcp-project-id/locations/us-central1/ragCorpora/your-corpus-id" # Replace with your Corpus resource name

# Optional configuration for retrieval behavior
SIMILARITY_TOP_K = 5 # Number of top results to retrieve
VECTOR_DISTANCE_THRESHOLD = 0.7 # Threshold for vector similarity

memory_service = VertexAiRagMemoryService(
   rag_corpus=RAG_CORPUS_RESOURCE_NAME,
   similarity_top_k=SIMILARITY_TOP_K,
   vector_distance_threshold=VECTOR_DISTANCE_THRESHOLD
)
# When using this service, methods like add_session_to_memory 
# and search_memory will interact with the specified Vertex AI 
# RAG Corpus.

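Hands-On Code Example (LangChain & LangGraph)

In LangChain and LangGraph, memory is a key component for building intelligent, natural-feeling conversational applications. It lets an agent remember information from past interactions, learn from feedback, and adapt to user preferences. LangChain's memory feature does this by enriching the current prompt with stored history and recording the latest exchange for future use. As agents take on more complex tasks, this capability becomes essential for both efficiency and user satisfaction.

Short-Term Memory: This is thread-scoped, meaning it tracks the ongoing conversation within a single session or thread. It provides immediate context, but a full history can crowd the LLM's context window and lead to errors or degraded performance. LangGraph manages short-term memory as part of the agent's state, which is persisted by a checkpointer so that a thread can be resumed at any time.

Long-Term Memory: This stores user-specific or application-level data across sessions and is shared between conversation threads. It is saved in custom "namespaces" and can be recalled at any time in any thread. LangGraph provides stores to save and recall long-term memories, so agents can retain knowledge indefinitely.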
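
The difference in scope can be shown with two toy classes. `ToyCheckpointer` and `ToyStore` are hypothetical stand-ins written for this sketch; they only mimic the idea of thread-scoped history versus namespaced, cross-thread storage and are not the LangGraph classes.

```python
from typing import Optional


class ToyCheckpointer:
    """Hypothetical short-term memory: one message history per thread."""

    def __init__(self) -> None:
        self._threads = {}

    def load(self, thread_id: str) -> list:
        return list(self._threads.get(thread_id, []))

    def save(self, thread_id: str, messages: list) -> None:
        self._threads[thread_id] = list(messages)


class ToyStore:
    """Hypothetical long-term memory: values keyed by (namespace, key)."""

    def __init__(self) -> None:
        self._data = {}

    def put(self, namespace: tuple, key: str, value: dict) -> None:
        self._data[(namespace, key)] = value

    def get(self, namespace: tuple, key: str) -> Optional[dict]:
        return self._data.get((namespace, key))


checkpointer = ToyCheckpointer()
store = ToyStore()

# Thread 1: history is saved per thread, while a preference goes to the store.
history = checkpointer.load("thread-1")
history.append("user: call me Sam")
checkpointer.save("thread-1", history)
store.put(("sam", "profile"), "name", {"preferred_name": "Sam"})

# Thread 2: a fresh thread starts with empty history but can still read the store.
new_thread_history = checkpointer.load("thread-2")
remembered = store.get(("sam", "profile"), "name")
print(new_thread_history)
print(remembered)
```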

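LangChain provides several tools for managing conversation history, ranging from manual control to automatic integration within chains.

ChatMessageHistory: Manual Memory Management. For direct, simple control over a conversation's history outside of a formal chain, the ChatMessageHistory class is ideal. It lets you track dialogue exchanges by hand.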

from langchain.memory import ChatMessageHistory

# Initialize the history object
history = ChatMessageHistory()

# Add user and AI messages
history.add_user_message("I'm heading to New York next week.")
history.add_ai_message("Great! It's a fantastic city.")

# Access the list of messages
print(history.messages)

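ConversationBufferMemory: Automated Memory for Chains. To integrate memory directly into a chain, ConversationBufferMemory is a common choice. It holds a buffer of the conversation and makes it available to your prompt. Its behavior can be customized with two key parameters:

memory_key: A string that names the prompt variable that will hold the chat history. Defaults to "history".
return_messages: A boolean that determines the format of the history.
- If False (the default), it returns a single formatted string, which suits standard LLMs.
- If True, it returns a list of message objects, which is the recommended format for chat models.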

from langchain.memory import ConversationBufferMemory

# Initialize memory
memory = ConversationBufferMemory()

# Save a conversation turn
memory.save_context({"input": "What's the weather like?"}, {"output": "It's sunny today."})

# Load the memory as a string
print(memory.load_memory_variables({}))

Integrating this memory into an LLMChain gives the model access to the conversation history so it can produce contextually relevant responses.

from langchain_openai import OpenAI
from langchain.chains import LLMChain
from langchain.prompts import PromptTemplate
from langchain.memory import ConversationBufferMemory

# 1. Define LLM and Prompt
llm = OpenAI(temperature=0)
template = """You are a helpful travel agent.

Previous conversation:
{history}

New question: {question}
Response:"""
prompt = PromptTemplate.from_template(template)

# 2. Configure Memory
# The memory_key "history" matches the variable in the prompt
memory = ConversationBufferMemory(memory_key="history")

# 3. Build the Chain
conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

# 4. Run the Conversation
response = conversation.predict(question="I want to book a flight.")
print(response)
response = conversation.predict(question="My name is Sam, by the way.")
print(response)
response = conversation.predict(question="What was my name again?")
print(response)

For better results with chat models, set return_messages=True so the history is passed as a structured list of message objects.

from langchain_openai import ChatOpenAI
from langchain.chains import LLMChain
from langchain.memory import ConversationBufferMemory
from langchain_core.prompts import (
   ChatPromptTemplate,
   MessagesPlaceholder,
   SystemMessagePromptTemplate,
   HumanMessagePromptTemplate,
)

# 1. Define Chat Model and Prompt
llm = ChatOpenAI()
prompt = ChatPromptTemplate(
   messages=[
       SystemMessagePromptTemplate.from_template("You are a friendly assistant."),
       MessagesPlaceholder(variable_name="chat_history"),
       HumanMessagePromptTemplate.from_template("{question}")
   ]
)

# 2. Configure Memory
# return_messages=True is essential for chat models
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# 3. Build the Chain
conversation = LLMChain(llm=llm, prompt=prompt, memory=memory)

# 4. Run the Conversation
response = conversation.predict(question="Hi, I'm Jane.")
print(response)
response = conversation.predict(question="Do you remember my name?")
print(response)

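Types of Long-Term Memory: Long-term memory lets a system retain information across conversations, providing deeper context and personalization. By analogy with human memory, it can be divided into three types:

Semantic Memory: Remembering Facts. This involves retaining specific facts and concepts, such as user preferences or domain knowledge. It is used to ground an agent's responses, leading to more personalized and relevant interactions. The information can be managed as a continuously updated user "profile" (a JSON document) or as a "collection" of individual factual documents.
Episodic Memory: Remembering Experiences. This involves recalling past events or actions. For AI agents, episodic memory is often used to remember how to accomplish a task. In practice it is frequently implemented through few-shot example prompting, where the agent learns from past successful interaction sequences to perform tasks correctly.
Procedural Memory: Remembering Rules. This is the memory of how to perform tasks: the agent's core instructions and behaviors, often contained in its system prompt. It is common for agents to modify their own prompts to adapt and improve. One effective technique is "Reflection", in which the agent is given its current instructions and recent interactions and is then asked to refine those instructions.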
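
As a rough sketch of how the three types might meet in a single prompt, the function below combines instructions (procedural), a user profile (semantic), and past question-answer pairs used as few-shot examples (episodic). The `build_prompt` helper, the prompt layout, and the sample data are all assumptions made for this illustration.

```python
def build_prompt(instructions: str, profile: dict, examples: list, question: str) -> str:
    """Assemble a prompt from procedural, semantic, and episodic memory."""
    lines = [instructions, "", "Known facts about the user:"]
    lines += [f"- {key}: {value}" for key, value in profile.items()]
    lines += ["", "Examples of past successful interactions:"]
    for past_question, past_answer in examples:
        lines += [f"Q: {past_question}", f"A: {past_answer}"]
    lines += ["", f"Q: {question}", "A:"]
    return "\n".join(lines)


# Procedural memory: the agent's current instructions (its system prompt).
instructions = "You are a concise travel assistant."
# Semantic memory: a continuously updated user profile.
profile = {"home_airport": "SFO", "seat": "aisle"}
# Episodic memory: past interactions reused as few-shot examples.
examples = [("Find me a flight to Boston", "Searching SFO to BOS, aisle seats only.")]

prompt = build_prompt(instructions, profile, examples, "Find me a flight to Denver")
print(prompt)
```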

The following pseudocode shows how an agent might use reflection to update its procedural memory stored in a LangGraph BaseStore.

from langgraph.store.base import BaseStore

# Pseudocode: `State`, `prompt_template`, and `llm` are assumed to be defined elsewhere.

# Node that updates the agent's instructions
def update_instructions(state: State, store: BaseStore):
   namespace = ("agent_instructions",)
   # Get the current instructions from the store
   current_instructions = store.get(namespace, "agent_a")

   # Create a prompt to ask the LLM to reflect on the conversation
   # and generate new, improved instructions
   prompt = prompt_template.format(
       instructions=current_instructions.value["instructions"],
       conversation=state["messages"]
   )

   # Get the new instructions from the LLM
   # (assumes the LLM is configured to return structured output)
   output = llm.invoke(prompt)
   new_instructions = output['new_instructions']

   # Save the updated instructions back to the same namespace and key
   store.put(namespace, "agent_a", {"instructions": new_instructions})

# Node that uses the instructions to generate a response
def call_model(state: State, store: BaseStore):
   namespace = ("agent_instructions",)
   # Retrieve the latest instructions from the store
   instructions = store.get(namespace, "agent_a")

   # Use the retrieved instructions to format the prompt
   prompt = prompt_template.format(instructions=instructions.value["instructions"])
   # ... application logic continues

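LangGraph stores long-term memories as JSON documents in a store. Each memory is organized under a custom namespace (like a folder) and a distinct key (like a filename). This hierarchical structure makes information easy to organize and retrieve. The following code shows how to use InMemoryStore to put, get, and search for memories.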

from langgraph.store.memory import InMemoryStore

# A placeholder for a real embedding function
def embed(texts: list[str]) -> list[list[float]]:
   # In a real application, use a proper embedding model
   return [[1.0, 2.0] for _ in texts]

# Initialize an in-memory store. For production, use a database-backed store.
store = InMemoryStore(index={"embed": embed, "dims": 2})

# Define a namespace for a specific user and application context
user_id = "my-user"
application_context = "chitchat"
namespace = (user_id, application_context)

# 1. Put a memory into the store
store.put(
   namespace,
   "a-memory",  # The key for this memory
   {
       "rules": [
           "User likes short, direct language",
           "User only speaks English & python",
       ],
       "my-key": "my-value",
   },
)

# 2. Get the memory by its namespace and key
item = store.get(namespace, "a-memory")
print("Retrieved Item:", item)

# 3. Search for memories within the namespace, filtering by content
# and sorting by vector similarity to the query.
items = store.search(
   namespace,
   filter={"my-key": "my-value"},
   query="language preferences"
)
print("Search Results:", items)

Vertex Memory Bank

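Memory Bank is a managed service in the Vertex AI Agent Engine that gives agents persistent, long-term memory. The service uses Gemini models to analyze conversation histories asynchronously and extract key facts and user preferences.

This information is stored persistently, organized by a defined scope such as user ID, and intelligently updated to consolidate new data and resolve contradictions. When a new session starts, the agent retrieves relevant memories either through a full data recall or through a similarity search using embeddings. This process lets an agent maintain continuity across sessions and personalize responses based on recalled information.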
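
The consolidation idea can be illustrated with a deliberately simple model: facts are grouped by user scope, and a newer value replaces an older one that contradicts it. `ToyMemoryBank` is a hypothetical class for this sketch only. It is not the Memory Bank API, the latest-value-wins rule is an assumption, and the model-based extraction step is not modeled at all.

```python
class ToyMemoryBank:
    """Hypothetical scoped fact store; not the Vertex AI Memory Bank API."""

    def __init__(self) -> None:
        self._facts = {}  # user_id -> {fact_name: value}

    def consolidate(self, user_id: str, extracted: dict) -> None:
        """Merge newly extracted facts; a newer value replaces a conflicting older one."""
        self._facts.setdefault(user_id, {}).update(extracted)

    def recall_all(self, user_id: str) -> dict:
        """Full recall of everything stored for one user scope."""
        return dict(self._facts.get(user_id, {}))


bank = ToyMemoryBank()
# Facts extracted from an earlier conversation (extraction itself is not modeled here).
bank.consolidate("user-1", {"diet": "vegetarian", "city": "Paris"})
# A later conversation contradicts the stored city, so the newer value wins.
bank.consolidate("user-1", {"city": "Lyon"})

user_one_memories = bank.recall_all("user-1")
user_two_memories = bank.recall_all("user-2")
print(user_one_memories)
print(user_two_memories)
```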

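The agent's runner interacts with VertexAiMemoryBankService, which is initialized first. The service automatically stores memories generated during the agent's conversations. Each memory is tagged with a unique USER_ID and APP_NAME so that it can be retrieved accurately later.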

from google.adk.memory import VertexAiMemoryBankService

agent_engine_id = agent_engine.api_resource.name.split("/")[-1]

memory_service = VertexAiMemoryBankService(
   project="PROJECT_ID",
   location="LOCATION",
   agent_engine_id=agent_engine_id
)

session = await session_service.get_session(
   app_name=app_name,
   user_id="USER_ID",
   session_id=session.id
)
await memory_service.add_session_to_memory(session)

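Memory Bank integrates seamlessly with the Google ADK and works out of the box. For users of other agent frameworks, such as LangGraph and CrewAI, Memory Bank also offers support through direct API calls. Online code examples demonstrating these integrations are available for interested readers.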

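At a Glance

What

Agent systems need to remember information from past interactions to perform complex tasks and provide coherent experiences. Without a memory mechanism, agents are stateless: they cannot maintain conversational context, learn from experience, or personalize responses for users. This fundamentally limits them to simple, one-shot interactions and leaves them unable to handle multi-step processes or evolving user needs. The core problem is how to manage both the immediate, temporary information of a single conversation and the broad, persistent knowledge gathered over time.

Why

The standardized solution is a dual-component memory system that distinguishes short-term from long-term storage. Short-term, contextual memory holds recent interaction data within the LLM's context window to maintain conversational flow. For information that must persist, long-term memory uses external databases, often vector stores, for efficient semantic retrieval. Agent frameworks such as the Google ADK provide specific components for this, for example Session for the conversation thread and State for its temporary data. A dedicated MemoryService interfaces with the long-term knowledge base, letting the agent retrieve relevant past information and incorporate it into its current context.

Rule of Thumb

Use this pattern when an agent needs to do more than answer a single question. It is essential for agents that must maintain context throughout a conversation, track progress in multi-step tasks, or personalize interactions by recalling user preferences and history. Implement memory management whenever an agent is expected to learn or adapt based on past successes, failures, or newly acquired information.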

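Key Takeaways

Memory is essential for agents to track information, learn, and personalize interactions.
Conversational AI relies on short-term memory for immediate context within a single chat and on long-term memory for persistent knowledge across sessions.
Short-term memory (the immediate stuff) is temporary, usually limited by the LLM's context window or by how the framework passes context.
Long-term memory (the stuff that sticks around) saves information across chats using external storage such as vector databases and is accessed through retrieval.
Frameworks like ADK have dedicated components for managing memory, such as Session (the chat thread), State (temporary chat data), and MemoryService (searchable long-term knowledge).
ADK's SessionService handles the entire lifecycle of a chat session, including its history (events) and temporary data (state).
ADK's session.state is a dictionary for temporary chat data. Prefixes (user:, app:, temp:) indicate where the data belongs and whether it persists.
In ADK, update state by using EventActions.state_delta or output_key when adding events, not by modifying the state dictionary directly.
ADK's MemoryService writes information into long-term storage and lets agents retrieve it, often through tools.
LangChain offers practical tools such as ConversationBufferMemory that automatically inject a single conversation's history into the prompt, so the agent can recall immediate context.
LangGraph enables advanced long-term memory by using a store to save and retrieve semantic facts, episodic experiences, and even updatable procedural rules across user sessions.
Memory Bank is a managed service that gives agents persistent, long-term memory by automatically extracting, storing, and recalling user-specific information, enabling personalized, continuous conversations across frameworks such as Google's ADK, LangGraph, and CrewAI.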

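Conclusion

This chapter examined the essential work of memory management in agent systems and clarified the difference between short-lived context and knowledge that persists over time. We discussed how these types of memory are set up and how they are applied in building smarter agents that can remember things. We looked in detail at how Google ADK provides concrete components such as Session, State, and MemoryService to handle this. Now that we have covered how agents remember things in both the short and long term, we can move on to how they learn and adapt. The next pattern, "Learning and Adaptation", is about how an agent changes the way it thinks, acts, or what it knows based on new experiences or data.

References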

ADK Memory, google.github.io/adk-docs/se…
LangGraph Memory, langchain-ai.github.io/langgraph/c…
Vertex AI Agent Engine Memory Bank, cloud.google.com/blog/produc…