Agentic Design Patterns, CH07: Multi-Agent Collaboration

Original English source: Chapter 7: Multi-Agent

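A monolithic agent architecture can work well for well-defined problems, but its capabilities are often limited when it faces complex, multi-domain tasks. The Multi-Agent Collaboration pattern addresses these limits by structuring the system as a cooperating ensemble of distinct, specialized agents. The approach rests on task decomposition: a high-level objective is split into discrete sub-problems, and each sub-problem is assigned to the agent whose tools, data access, or reasoning capabilities suit it best.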

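For example, a complex research query can be decomposed and assigned to a Research Agent for information retrieval, a Data Analysis Agent for statistical processing, and a Synthesis Agent for producing the final report. The effectiveness of such a system comes not only from the division of labor but, more critically, from the mechanisms the agents use to communicate. This requires standardized communication protocols and a shared ontology, so that agents can exchange data, delegate sub-tasks, and coordinate their actions to keep the final output coherent.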
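
The research example above can be sketched in plain Python. This is a minimal illustration, not a framework implementation: the `Message` dataclass stands in for the shared message format, and the three agent functions, their names, and the mock documents are all assumptions made for the sketch.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Message:
    """A minimal shared message format that every agent reads and writes."""
    sender: str
    payload: dict


def research_agent(query: str) -> Message:
    # Hypothetical retrieval step: a real agent would call a search tool here.
    documents = [{"title": f"Study on {query}", "score": s} for s in (0.9, 0.7, 0.8)]
    return Message(sender="research", payload={"query": query, "documents": documents})


def data_analysis_agent(msg: Message) -> Message:
    # Statistical processing over whatever the research agent handed over.
    scores = [doc["score"] for doc in msg.payload["documents"]]
    stats = {"count": len(scores), "mean_score": round(mean(scores), 2)}
    return Message(sender="analysis", payload={**msg.payload, "stats": stats})


def synthesis_agent(msg: Message) -> str:
    # Turns the accumulated payload into the final report text.
    stats = msg.payload["stats"]
    return (
        f"Report on '{msg.payload['query']}': {stats['count']} sources, "
        f"mean relevance {stats['mean_score']}."
    )


def run_research_pipeline(query: str) -> str:
    """Decompose the query into three sub-tasks and hand each result onward."""
    return synthesis_agent(data_analysis_agent(research_agent(query)))
```

Because every agent consumes and produces the same `Message` shape, any one of them can be swapped out without touching the others, which is the practical benefit of agreeing on a protocol up front.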

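This distributed architecture brings several advantages, including greater modularity, scalability, and robustness, because the failure of one agent does not necessarily bring down the whole system. Collaboration also produces synergy: the collective performance of the multi-agent system can exceed what any single agent in the ensemble could achieve.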

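Overview

The Multi-Agent Collaboration pattern involves designing systems in which multiple independent or semi-independent agents work together toward a common goal. Each agent typically has a clearly defined role, specific goals aligned with the overall objective, and possibly access to different tools or knowledge bases. The power of the pattern lies in the interaction and synergy among these agents.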

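Collaboration can take several forms:

Sequential Handoffs: One agent completes a task and passes its output to another agent for the next step in a pipeline (similar to the Planning pattern, but explicitly involving different agents).
Parallel Processing: Multiple agents work on different parts of a problem at the same time, and their results are combined afterwards.
Debate and Consensus: Agents with different perspectives and information sources discuss and evaluate options, eventually reaching a consensus or a better-informed decision.
Hierarchical Structures: A manager agent dynamically assigns tasks to worker agents based on their tool access or plugin capabilities, and synthesizes their results. Each agent can also be responsible for a related group of tools, rather than one agent handling every tool.
Expert Teams: Agents with expertise in different domains (for example, a researcher, a writer, and an editor) collaborate to produce a complex output.
Critic-Reviewer: Agents first produce an initial output such as a plan, a draft, or an answer. A second group of agents then evaluates that output rigorously for policy adherence, safety, compliance, correctness, quality, and alignment with organizational goals. The original creator or a final agent revises the output based on this feedback. The pattern is particularly effective for code generation, research writing, logic checking, and ensuring ethical alignment. Its benefits include greater robustness, higher quality, and a lower likelihood of hallucinations or errors.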
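
The Critic-Reviewer form above can be sketched as a small feedback loop. This is a simplified illustration under stated assumptions: the two critics, the `revise` function, and the 12-word limit are hypothetical stand-ins for LLM-backed agents and real review policies.

```python
from typing import Callable, List, Tuple

# A critic returns a list of objections; an empty list means it is satisfied.
Critic = Callable[[str], List[str]]


def length_critic(draft: str) -> List[str]:
    return ["too long"] if len(draft.split()) > 12 else []


def banned_words_critic(draft: str) -> List[str]:
    return ["contains 'guaranteed'"] if "guaranteed" in draft.lower() else []


def revise(draft: str, feedback: List[str]) -> str:
    """Stand-in for the producer agent's revision step."""
    if "contains 'guaranteed'" in feedback:
        draft = draft.replace("guaranteed", "expected")
    if "too long" in feedback:
        draft = " ".join(draft.split()[:12])
    return draft


def review_loop(draft: str, critics: List[Critic], max_rounds: int = 3) -> Tuple[str, int]:
    """Collect feedback from every critic and revise until none objects.

    Returns the final draft and the number of revision rounds performed.
    If max_rounds is reached, the last revision is returned without re-review.
    """
    for rounds_done in range(max_rounds):
        feedback = [issue for critic in critics for issue in critic(draft)]
        if not feedback:
            return draft, rounds_done
        draft = revise(draft, feedback)
    return draft, max_rounds
```

Capping the number of rounds is a deliberate design choice: without it, a producer and a critic that never agree would loop indefinitely.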

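A multi-agent system (see Figure 1) fundamentally consists of three things: a division of roles and responsibilities among agents, communication channels through which agents exchange information, and a task flow or interaction protocol that guides their collaborative effort.

Figure 1: Example of a multi-agent system

Frameworks such as Crew AI and Google ADK are designed to support this paradigm by providing structures for defining agents, tasks, and the flow of interaction among them. The approach is particularly effective for challenges that require varied expertise, involve several discrete stages, or benefit from parallel processing and cross-checking of information between agents.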

Practical Applications and Use Cases

Multi-agent collaboration is a powerful pattern that applies across many domains:

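Complex Research and Analysis: A team of agents can collaborate on a research project. One agent might specialize in searching academic databases, another in summarizing findings, a third in identifying trends, and a fourth in synthesizing the information into a report. This mirrors how a human research team operates.
Software Development: Imagine agents collaborating to build software. One agent acts as a requirements analyst, another as a code generator, a third as a tester, and a fourth as a documentation writer. They pass outputs to one another to build and verify each component.
Creative Content Generation: Creating a marketing campaign might involve a market research agent, a copywriting agent, a graphic design agent (using image generation tools), and a social media scheduling agent, all working together.
Financial Analysis: A multi-agent system can analyze financial markets. Separate agents might fetch stock data, analyze news sentiment, perform technical analysis, and generate investment recommendations.
Customer Support: A front-line support agent handles initial inquiries and escalates complex issues to specialist agents (for example, a technical expert or a billing expert) when needed, which is a sequential handoff driven by problem complexity.
Supply Chain Optimization: Agents can represent different nodes in a supply chain (suppliers, manufacturers, distributors) and collaborate to optimize inventory levels, logistics, and scheduling as demand changes or disruptions occur.
Network Analysis and Remediation: Autonomous operations benefit greatly from an agentic architecture, particularly for fault localization. Multiple agents can collaborate to triage and remediate issues and to recommend the best course of action. These agents can also integrate with traditional machine learning models and tools, building on existing systems while adding the benefits of generative AI.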

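The ability to define specialized agents and carefully orchestrate their relationships lets developers build systems that are more modular and scalable, and that can tackle complex problems a single, integrated agent could not solve.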
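
The customer support use case above, where a front-line agent escalates to specialists, can be sketched as follows. This is only an illustration: the FAQ entries, the keyword table, and all function names are hypothetical, and a real system would route with an LLM or a classifier rather than keyword matching.

```python
from typing import Callable, Dict, Optional


def frontline_agent(question: str) -> Optional[str]:
    """Answers simple questions; returns None to signal that escalation is needed."""
    faq = {"opening hours": "We are open 9:00-17:00 on weekdays."}
    for topic, answer in faq.items():
        if topic in question.lower():
            return answer
    return None


def billing_specialist(question: str) -> str:
    return f"[billing] Reviewing the charge mentioned in: {question}"


def technical_specialist(question: str) -> str:
    return f"[technical] Investigating the fault described in: {question}"


# Hypothetical routing table mapping trigger keywords to specialist agents.
SPECIALISTS: Dict[str, Callable[[str], str]] = {
    "invoice": billing_specialist,
    "refund": billing_specialist,
    "error": technical_specialist,
    "crash": technical_specialist,
}


def handle_ticket(question: str) -> str:
    """Sequential handoff: front line first, then a specialist, then a human."""
    answer = frontline_agent(question)
    if answer is not None:
        return answer
    for keyword, specialist in SPECIALISTS.items():
        if keyword in question.lower():
            return specialist(question)
    return "[human] No specialist matched; forwarding to a human operator."
```

Each specialist only ever sees the questions that reach it, so its prompt, tools, and data access can stay narrow.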

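Multi-Agent Collaboration: Exploring Interrelationships and Communication Structures

Understanding the intricate ways in which agents interact and communicate is fundamental to designing effective multi-agent systems. As Figure 2 shows, there is a spectrum of interrelationship and communication models, ranging from the simplest single-agent scenario to complex, custom-built collaborative frameworks. Each model has its own strengths and challenges, which affect the overall efficiency, robustness, and adaptability of the system.

Figure 2: Agents communicate and interact in various ways.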

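Single Agent: At the most basic level, a "Single Agent" operates autonomously without direct interaction or communication with other entities. The model is easy to implement and manage, but its capabilities are inherently limited to the scope and resources of that one agent. It suits tasks that can be decomposed into independent sub-problems, each solvable by a single self-sufficient agent.
Network: The "Network" model is a significant step toward collaboration, with multiple agents interacting directly with one another in a decentralized way. Communication is typically peer-to-peer, allowing agents to share information, resources, and even tasks. The model improves resilience, because the failure of one agent does not necessarily cripple the whole system. In a large, unstructured network, however, managing communication overhead and keeping decisions consistent can be challenging.
Supervisor: In the "Supervisor" model, a dedicated agent, the supervisor, oversees and coordinates the activities of a group of subordinate agents. The supervisor acts as the central hub for communication, task allocation, and conflict resolution. This hierarchy gives clear lines of authority and can simplify management and control. It also introduces a single point of failure (the supervisor), which can become a bottleneck when overwhelmed by many subordinates or by complex tasks.
Supervisor as a Tool: This model is a nuanced extension of the "Supervisor" concept in which the supervisor's role is less about direct command and control and more about providing resources, guidance, or analytical support to other agents. The supervisor may offer tools, data, or computational services that let other agents complete their tasks more effectively, without dictating their every action. The aim is to draw on the supervisor's capabilities while avoiding rigid top-down control.
Hierarchical: The "Hierarchical" model extends the supervisor concept into a multi-layered organizational structure. It involves several levels of supervisors, with higher-level supervisors overseeing lower-level ones and a set of operational agents at the bottom doing the work. The structure suits complex problems that can be decomposed into sub-problems, each managed by a specific layer of the hierarchy. It offers a structured approach to scalability and complexity management, allowing distributed decision-making within defined boundaries.
Custom: The "Custom" model offers the greatest flexibility in multi-agent system design. It lets designers build interrelationship and communication structures tailored precisely to the needs of a particular problem or application. This can mean a hybrid that combines elements of the models above, or an entirely new design that emerges from the unique constraints and opportunities of the environment. Custom models usually arise from the need to optimize for specific performance metrics, to cope with highly dynamic environments, or to build domain-specific knowledge into the system architecture. Designing and implementing one generally requires a deep understanding of multi-agent system principles and careful consideration of communication protocols, coordination mechanisms, and emergent behavior.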
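
The Supervisor and Hierarchical models above can be sketched with one small class. This is a plain-Python illustration, not any framework's API: the `Supervisor` class, the worker functions, and the "skill: description" task convention are all assumptions made for the sketch.

```python
from typing import Callable, Dict, List, Tuple

Worker = Callable[[str], str]


class Supervisor:
    """Central hub that routes each task to the worker registered for a skill."""

    def __init__(self, name: str, workers: Dict[str, Worker]):
        self.name = name
        self.workers = workers
        self.log: List[Tuple[str, str]] = []  # every dispatched (skill, task) pair

    def dispatch(self, skill: str, task: str) -> str:
        if skill not in self.workers:
            raise KeyError(f"{self.name} has no worker for skill '{skill}'")
        self.log.append((skill, task))
        return self.workers[skill](task)

    def __call__(self, task: str) -> str:
        """Lets a supervisor serve as a worker of a higher-level supervisor.

        Assumed convention: nested tasks are written as "skill: description".
        """
        skill, _, description = task.partition(":")
        return self.dispatch(skill.strip(), description.strip())


def summarizer(task: str) -> str:
    return f"summary of {task}"


def translator(task: str) -> str:
    return f"translation of {task}"


# Supervisor model: one hub, two workers.
team_lead = Supervisor("team_lead", {"summarize": summarizer, "translate": translator})

# Hierarchical model: a higher-level supervisor treats the team lead as a worker.
director = Supervisor("director", {"docs": team_lead})
```

For example, `director.dispatch("docs", "summarize: quarterly report")` passes through both layers before reaching `summarizer`. Every request flows through the hub's `dispatch` method, which makes the single point of failure described above easy to see.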

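In short, choosing the interrelationship and communication model for a multi-agent system is a critical design decision. Each model has distinct advantages and disadvantages, and the best choice depends on factors such as task complexity, the number of agents, the desired level of autonomy, the need for robustness, and the acceptable communication overhead. Future work on multi-agent systems will likely continue to explore and refine these models, and to develop new paradigms of collaborative intelligence.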

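Hands-On Code Example (Crew AI)

This Python code uses the CrewAI framework to define an AI-powered crew that generates a blog post about AI trends. It first sets up the environment by loading API keys from a .env file. The core of the application is the definition of two agents: a researcher that finds and summarizes AI trends, and a writer that turns the research into a blog post.

Two tasks are defined accordingly: one for researching the trends and one for writing the blog post, with the writing task depending on the output of the research task. The agents and tasks are then assembled into a Crew with a sequential process, so the tasks run in order. The Crew is initialized with the agents, the tasks, and a language model (specifically "gemini-2.0-flash"). The main function runs the Crew by calling its kickoff() method, which coordinates the agents to produce the desired result. Finally, the code prints the final result of the Crew's execution, which is the generated blog post.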

import os
from dotenv import load_dotenv
from crewai import Agent, Task, Crew, Process
from langchain_google_genai import ChatGoogleGenerativeAI

def setup_environment():
   """Loads environment variables and checks for the required API key."""
   load_dotenv()
   if not os.getenv("GOOGLE_API_KEY"):
       raise ValueError("GOOGLE_API_KEY not found. Please set it in your .env file.")

def main():
   """
   Initializes and runs the AI crew for content creation using the latest Gemini model.
   """
   setup_environment()

   # Define the language model to use.
   # Updated to a model from the Gemini 2.0 series for better performance and features.
   # For cutting-edge (preview) capabilities, you could use "gemini-2.5-flash".
   llm = ChatGoogleGenerativeAI(model="gemini-2.0-flash")

   # Define Agents with specific roles and goals
   researcher = Agent(
       role='Senior Research Analyst',
       goal='Find and summarize the latest trends in AI.',
       backstory="You are an experienced research analyst with a knack for identifying key trends and synthesizing information.",
       verbose=True,
       allow_delegation=False,
   )

   writer = Agent(
       role='Technical Content Writer',
       goal='Write a clear and engaging blog post based on research findings.',
       backstory="You are a skilled writer who can translate complex technical topics into accessible content.",
       verbose=True,
       allow_delegation=False,
   )

   # Define Tasks for the agents
   research_task = Task(
       description="Research the top 3 emerging trends in Artificial Intelligence in 2024-2025. Focus on practical applications and potential impact.",
       expected_output="A detailed summary of the top 3 AI trends, including key points and sources.",
       agent=researcher,
   )

   writing_task = Task(
       description="Write a 500-word blog post based on the research findings. The post should be engaging and easy for a general audience to understand.",
       expected_output="A complete 500-word blog post about the latest AI trends.",
       agent=writer,
       context=[research_task],
   )

   # Create the Crew
   blog_creation_crew = Crew(
       agents=[researcher, writer],
       tasks=[research_task, writing_task],
       process=Process.sequential,
       llm=llm,
       verbose=True  # Enable detailed crew execution logs (recent CrewAI versions expect a boolean here)
   )

   # Execute the Crew
   print("## Running the blog creation crew with Gemini 2.0 Flash... ##")
   try:
       result = blog_creation_crew.kickoff()
       print("\n------------------\n")
       print("## Crew Final Output ##")
       print(result)
   except Exception as e:
       print(f"\nAn unexpected error occurred: {e}")


if __name__ == "__main__":
   main()

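Next, we look at further examples in the Google ADK framework, focusing on hierarchical, parallel, and sequential coordination paradigms, as well as implementing an agent as an operational tool.

Hands-On Code Example (ADK)

The following code shows how to establish a hierarchical agent structure in Google ADK by creating parent-child relationships. It defines two kinds of agents: LlmAgent and a custom TaskExecutor agent derived from BaseAgent. TaskExecutor is designed for specific non-LLM tasks; in this example it simply yields a "Task finished successfully" event. An LlmAgent named greeter is initialized with a specified model and an instruction to act as a friendly greeter. The custom TaskExecutor is instantiated as task_doer. A parent LlmAgent named coordinator is then created, also with a model and instructions. Its instructions direct it to delegate greetings to greeter and task execution to task_doer. Both are added as sub-agents of coordinator, which establishes the parent-child relationship. The code then asserts that the relationship is set up correctly and finally prints a message indicating that the agent hierarchy was created successfully.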

from google.adk.agents import LlmAgent, BaseAgent
from google.adk.agents.invocation_context import InvocationContext
from google.adk.events import Event
from google.genai import types
from typing import AsyncGenerator

# Implement a custom agent by extending BaseAgent
class TaskExecutor(BaseAgent):
   """A specialized agent with custom, non-LLM behavior."""
   name: str = "TaskExecutor"
   description: str = "Executes a predefined task."

   async def _run_async_impl(self, context: InvocationContext) -> AsyncGenerator[Event, None]:
       """Custom implementation logic for the task."""
       # This is where your custom logic would go.
       # For this example, we'll just yield a simple event.
       # Event.content expects a types.Content object rather than a plain string.
       yield Event(
           author=self.name,
           content=types.Content(parts=[types.Part(text="Task finished successfully.")]),
       )

# Define individual agents with proper initialization
# LlmAgent requires a model to be specified.
greeter = LlmAgent(
   name="Greeter",
   model="gemini-2.0-flash-exp",
   instruction="You are a friendly greeter."
)
task_doer = TaskExecutor() # Instantiate our concrete custom agent

# Create a parent agent and assign its sub-agents
# The parent agent's description and instructions should guide its delegation logic.
coordinator = LlmAgent(
   name="Coordinator",
   model="gemini-2.0-flash-exp",
   description="A coordinator that can greet users and execute tasks.",
   instruction="When asked to greet, delegate to the Greeter. When asked to perform a task, delegate to the TaskExecutor.",
   sub_agents=[
       greeter,
       task_doer
   ]
)

# The ADK framework automatically establishes the parent-child relationships.
# These assertions will pass if checked after initialization.
assert greeter.parent_agent == coordinator
assert task_doer.parent_agent == coordinator

print("Agent hierarchy created successfully.")

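The following code shows how to use LoopAgent in the Google ADK framework to build an iterative workflow. It defines two agents: ConditionChecker and ProcessingStep. ConditionChecker is a custom agent that checks the "status" value in the session state. If "status" is "completed", ConditionChecker escalates an event to stop the loop; otherwise it yields an event that lets the loop continue. ProcessingStep is an LlmAgent that uses the "gemini-2.0-flash-exp" model. Its instruction is to perform its task and, if it is the final step, to set the session "status" to "completed". A LoopAgent named StatusPoller is created with max_iterations=10 and includes both ProcessingStep and a ConditionChecker instance as sub-agents. The LoopAgent runs these sub-agents in order for up to 10 iterations and stops early if ConditionChecker finds the status to be "completed".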

import asyncio
from typing import AsyncGenerator
from google.adk.agents import LoopAgent, LlmAgent, BaseAgent
from google.adk.events import Event, EventActions
from google.adk.agents.invocation_context import InvocationContext
from google.genai import types

# Best Practice: Define custom agents as complete, self-describing classes.
class ConditionChecker(BaseAgent):
   """A custom agent that checks for a 'completed' status in the session state."""
   name: str = "ConditionChecker"
   description: str = "Checks if a process is complete and signals the loop to stop."

   async def _run_async_impl(
       self, context: InvocationContext
   ) -> AsyncGenerator[Event, None]:
       """Checks state and yields an event to either continue or stop the loop."""
       status = context.session.state.get("status", "pending")
       is_done = (status == "completed")

       if is_done:
           # Escalate to terminate the loop when the condition is met.
           yield Event(author=self.name, actions=EventActions(escalate=True))
       else:
           # Yield a simple event to continue the loop.
           # Event.content expects a types.Content object rather than a plain string.
           yield Event(
               author=self.name,
               content=types.Content(parts=[types.Part(text="Condition not met, continuing loop.")]),
           )

# The LlmAgent must have a model and clear instructions.
process_step = LlmAgent(
   name="ProcessingStep",
   model="gemini-2.0-flash-exp",
   instruction="You are a step in a longer process. Perform your task. If you are the final step, update session state by setting 'status' to 'completed'."
)

# The LoopAgent orchestrates the workflow.
poller = LoopAgent(
   name="StatusPoller",
   max_iterations=10,
   sub_agents=[
       process_step,
       ConditionChecker() # Instantiating the well-defined custom agent.
   ]
)

# This poller will now execute 'process_step' 
# and then 'ConditionChecker'
# repeatedly until the status is 'completed' or 10 iterations 
# have passed.
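
The control flow that the loop example relies on can be sketched without the framework. This plain-Python version is an assumption-laden simplification: steps are ordinary functions that share a state dictionary and return True to request a stop (standing in for the escalate signal), and the three-unit workload in `processing_step` is hypothetical.

```python
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Step = Callable[[State], bool]  # a step returns True to ask the loop to stop


def processing_step(state: State) -> bool:
    """Does one unit of work and marks the job completed after the third unit."""
    done = state.get("units_done", 0) + 1
    state["units_done"] = done
    if done >= 3:
        state["status"] = "completed"
    return False  # this step never stops the loop itself


def condition_checker(state: State) -> bool:
    """Requests a stop once the shared state reports completion."""
    return state.get("status", "pending") == "completed"


def run_loop(steps: List[Step], state: State, max_iterations: int = 10) -> Tuple[int, bool]:
    """Runs the steps in order, repeatedly, until one requests a stop.

    Returns (iterations_run, stopped_early).
    """
    for iteration in range(1, max_iterations + 1):
        for step in steps:
            if step(state):
                return iteration, True
    return max_iterations, False
```

The iteration cap plays the same role as max_iterations=10 in the example above: it bounds the work even if the completion flag is never set.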

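The following code shows the SequentialAgent pattern in Google ADK, which is meant for building linear workflows. Using the google.adk.agents library, it defines a sequential agent pipeline made of two agents, step1 and step2. step1 is named "Step1_Fetch", and its output is stored in the session state under the key "data". step2 is named "Step2_Process", and its instruction is to analyze the information stored in session.state["data"] and provide a summary. A SequentialAgent named "MyPipeline" orchestrates the execution of these sub-agents. When the pipeline runs with an initial input, step1 executes first and its response is saved to the session state under the key "data". step2 then executes and, following its instruction, uses the information that step1 wrote to the state. This structure produces a workflow in which one agent's output becomes the next agent's input, a common pattern when building multi-step AI or data processing pipelines.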

from google.adk.agents import SequentialAgent, Agent

# This agent's output will be saved to session.state["data"]
# A model is needed at run time; the name below follows the chapter's other examples.
step1 = Agent(name="Step1_Fetch", model="gemini-2.0-flash-exp", output_key="data")

# This agent will use the data from the previous step.
# We instruct it on how to find and use this data.
step2 = Agent(
   name="Step2_Process",
   model="gemini-2.0-flash-exp",
   instruction="Analyze the information found in state['data'] and provide a summary."
)

pipeline = SequentialAgent(
   name="MyPipeline",
   sub_agents=[step1, step2]
)

# When the pipeline is run with an initial input, Step1 will execute,
# its response will be stored in session.state["data"], and then
# Step2 will execute, using the information from the state as instructed.

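The following code shows the ParallelAgent pattern in Google ADK, which runs several agent tasks concurrently. data_gatherer is designed to run two sub-agents at the same time: weather_fetcher and news_fetcher. The weather_fetcher agent is instructed to fetch the weather for a given location, and its result is stored in session.state["weather_data"]. Similarly, the news_fetcher agent is instructed to retrieve the top news story for a given topic, and its result is stored in session.state["news_data"]. Each sub-agent is configured to use the "gemini-2.0-flash-exp" model. The ParallelAgent coordinates the execution of these sub-agents so that they work in parallel. Once the run completes, the results from weather_fetcher and news_fetcher are available in the session state under those two keys.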

from google.adk.agents import Agent, ParallelAgent

# It's better to define the fetching logic as tools for the agents
# For simplicity in this example, we'll embed the logic in the agent's instruction.
# In a real-world scenario, you would use tools.

# Define the individual agents that will run in parallel
weather_fetcher = Agent(
   name="weather_fetcher",
   model="gemini-2.0-flash-exp",
   instruction="Fetch the weather for the given location and return only the weather report.",
   output_key="weather_data"  # The result will be stored in session.state["weather_data"]
)

news_fetcher = Agent(
   name="news_fetcher",
   model="gemini-2.0-flash-exp",
   instruction="Fetch the top news story for the given topic and return only that story.",
   output_key="news_data"      # The result will be stored in session.state["news_data"]
)

# Create the ParallelAgent to orchestrate the sub-agents
data_gatherer = ParallelAgent(
   name="data_gatherer",
   sub_agents=[
       weather_fetcher,
       news_fetcher
   ]
)
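
The fan-out and the way results land under separate state keys can be sketched with the standard library alone. This is not ADK code: the coroutines below are mock fetchers, `gather_data` is a hypothetical helper, and the short sleeps merely stand in for slow tool or model calls.

```python
import asyncio
from typing import Awaitable, Dict


async def weather_fetcher(location: str) -> str:
    await asyncio.sleep(0.01)  # stands in for a slow tool or model call
    return f"Sunny in {location}"


async def news_fetcher(topic: str) -> str:
    await asyncio.sleep(0.01)
    return f"Top story about {topic}"


async def gather_data(location: str, topic: str) -> Dict[str, str]:
    """Runs both fetchers concurrently and collects results under output keys."""
    state: Dict[str, str] = {}

    async def run_and_store(output_key: str, work: Awaitable[str]) -> None:
        # Each agent writes to its own key, so concurrent writes never collide.
        state[output_key] = await work

    await asyncio.gather(
        run_and_store("weather_data", weather_fetcher(location)),
        run_and_store("news_data", news_fetcher(topic)),
    )
    return state


final_state = asyncio.run(gather_data("Paris", "AI"))
print(final_state["weather_data"])
print(final_state["news_data"])
```

Giving every parallel agent a distinct output key is the important design choice here; two agents writing to the same key would overwrite each other in an order that is not guaranteed.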

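The following code shows the "Agent as a Tool" paradigm in Google ADK, which lets one agent use another agent's capabilities in a way that resembles a function call. Specifically, the code uses Google's LlmAgent and AgentTool classes to define an image generation system. It consists of two agents: the parent artist_agent and the child image_generator_agent. The generate_image function is a simple tool that simulates image creation and returns mock image data. image_generator_agent is responsible for using that tool with the text prompt it receives. The role of artist_agent is to first invent a creative image prompt and then call image_generator_agent through the AgentTool wrapper. AgentTool acts as the bridge that lets one agent use another as a tool. When artist_agent calls image_tool, AgentTool invokes image_generator_agent with the prompt the artist came up with. image_generator_agent then calls the generate_image function with that prompt. Finally, the generated image (or mock data) is returned back up through the agents. The architecture demonstrates a layered agent system in which a higher-level agent orchestrates lower-level, specialized agents to carry out tasks.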

from google.adk.agents import LlmAgent
from google.adk.tools import agent_tool
from google.genai import types

# 1. A simple function tool for the core capability.
# This follows the best practice of separating actions from reasoning.
def generate_image(prompt: str) -> dict:
   """
   Generates an image based on a textual prompt.

   Args:
       prompt: A detailed description of the image to generate.

   Returns:
       A dictionary with the status and the generated image bytes.
   """
   print(f"TOOL: Generating image for prompt: '{prompt}'")
   # In a real implementation, this would call an image generation API.
   # For this example, we return mock image data.
   mock_image_bytes = b"mock_image_data_for_a_cat_wearing_a_hat"
   return {
       "status": "success",
       # The tool returns the raw bytes, the agent will handle the Part creation.
       "image_bytes": mock_image_bytes,
       "mime_type": "image/png"
   }

# 2. An LlmAgent that wraps the tool.
# It passes the request it receives to `generate_image` as the prompt.
image_generator_agent = LlmAgent(
   name="ImageGen",
   model="gemini-2.0-flash",
   description="Generates an image based on a detailed text prompt.",
   instruction=(
       "You are an image generation specialist. Your task is to take the user's request "
       "and use the `generate_image` tool to create the image. "
       "The user's entire request should be used as the 'prompt' argument for the tool. "
       "After the tool returns the image bytes, you MUST output the image."
   ),
   tools=[generate_image]
)

# 3. Wrap the agent in an AgentTool.
# The parent agent sees the wrapped agent's own name and description
# ("ImageGen" and the description set on image_generator_agent above).
image_tool = agent_tool.AgentTool(agent=image_generator_agent)

# 4. The parent agent invents a prompt and delegates image generation to the tool.
artist_agent = LlmAgent(
   name="Artist",
   model="gemini-2.0-flash",
   instruction=(
       "You are a creative artist. First, invent a creative and descriptive prompt for an image. "
       "Then, use the `ImageGen` tool to generate the image using your prompt."
   ),
   tools=[image_tool]
)
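
The idea behind wrapping an agent as a tool can be sketched without the framework. In this simplified illustration, `SimpleAgent` and `AgentAsTool` are hypothetical classes, not ADK types, and the artist's prompt is hard-coded where a real agent would generate it with a model.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class SimpleAgent:
    """A hypothetical agent: a name, a description, and a run function."""
    name: str
    description: str
    run: Callable[[str], str]


@dataclass
class AgentAsTool:
    """Exposes an agent through a tool-like interface (name, description, call)."""
    agent: SimpleAgent

    @property
    def name(self) -> str:
        return self.agent.name

    @property
    def description(self) -> str:
        return self.agent.description

    def __call__(self, request: str) -> str:
        return self.agent.run(request)


def generate_image(prompt: str) -> str:
    # Mock image generation; a real tool would call an image API here.
    return f"<mock image for: {prompt}>"


image_agent = SimpleAgent(
    name="ImageGen",
    description="Generates an image from a text prompt.",
    run=generate_image,
)


def artist(tools: Dict[str, AgentAsTool]) -> str:
    """Parent agent: invents a prompt, then calls the wrapped agent like a function."""
    prompt = "a cat wearing a hat, watercolor"
    return tools["ImageGen"](prompt)


toolbox = {tool.name: tool for tool in [AgentAsTool(image_agent)]}
print(artist(toolbox))
```

The parent only depends on the tool's name and description, so the child agent behind it can be replaced without changing the parent.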

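At a Glance

What

Complex problems often exceed the capabilities of a single, monolithic LLM-based agent. A lone agent may lack the range of specialized skills, or the access to specific tools, needed to handle every part of a multifaceted task. This limitation creates a bottleneck that reduces the system's overall effectiveness and scalability. As a result, complex multi-domain objectives are handled inefficiently and can end in incomplete or suboptimal results.

Why

The Multi-Agent Collaboration pattern offers a standardized solution by building a system of multiple cooperating agents. It breaks a complex problem into smaller, more manageable sub-problems. Each sub-problem is then assigned to a specialized agent that has exactly the tools and capabilities needed to solve it. The agents work together through well-defined communication protocols and interaction models such as sequential handoffs, parallel workflows, or hierarchical delegation. This distributed, agentic approach creates synergy, letting the group achieve outcomes that no single agent could.

Rule of Thumb

Use this pattern when a task is too much for a single agent and can be decomposed into distinct sub-tasks that require specialized skills or tools. It is well suited to problems that benefit from diverse expertise, parallel processing, or a structured multi-stage workflow, such as complex research and analysis, software development, or creative content generation.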

Visual Summary

[Figure: visual summary of the multi-agent collaboration pattern]

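Key Takeaways

Multi-agent collaboration means multiple agents working together to achieve a common goal.
The pattern relies on specialized roles, distributed tasks, and communication between agents.
Collaboration can take the form of sequential handoffs, parallel processing, debate, or hierarchical structures.
The pattern is well suited to complex problems that require diverse expertise or involve several distinct stages.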

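Conclusion

This chapter explored the Multi-Agent Collaboration pattern and showed the benefits of orchestrating multiple specialized agents within one system. We examined several collaboration models and emphasized the pattern's key role in tackling complex, multifaceted problems that span many domains. Understanding how agents collaborate leads naturally to the question of how they interact with their external environment.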
