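OpenClaw Agents System: A Complete Technical Walkthrough of Multi-Agent Architecture and Intelligent Orchestration

Overview

The Agents system is the core brain of OpenClaw. It manages multiple independent AI agent instances, each with its own configuration, tool set, skill library, and context management strategy. This article analyzes its five core subsystems:

Multi-agent configuration management - how multiple independent agents are defined and organized
System prompt construction - how the full instruction set for the LLM is generated dynamically
Compaction (context compression) - how the token budget is managed in very long conversations
Model fallback - how a backup model takes over automatically when the primary model fails
Tool preparation and filtering - how each agent is given the correct tool set

Core value:

🎯 Multi-tenant support: one OpenClaw instance hosts multiple independent agents, each with its own configuration
🧠 Smart prompt engineering: the system prompt is built dynamically from the current context
🔄 Adaptive compaction: the context of very long conversations is managed automatically to avoid token overflow
🛡️ High-availability fallback: when the primary model fails, a backup model takes over without interrupting the conversation
🔧 Precise tool management: a 9-layer policy filter ensures an agent can only use the tools it is authorized for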

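Part 1: Multi-Agent Configuration Management

1.1 Configuration Architecture

OpenClaw uses a two-layer configuration structure: global defaults under agents.defaults, and per-agent entries under agents.list that can override them.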

// config.json
{
  "agents": {
    // Global defaults (inherited by every agent)
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": ["anthropic/claude-sonnet-4-5"]
      },
      "workspace": "~/workspace",
      "thinkingDefault": "low",
      "contextTokens": 200000,
      "compaction": { "mode": "safeguard" }
    },

    // Agent list (each entry may override defaults)
    "list": [
      {
        "id": "default",
        "default": true,
        "name": "Main Assistant"
      },
      {
        "id": "code-reviewer",
        "name": "Code Review Expert",
        "model": "anthropic/claude-sonnet-4-5",
        "workspace": "~/projects",
        "skills": ["git", "code-review"],
        "tools": {
          "profile": "coding"
        }
      },
      {
        "id": "research",
        "name": "Research Assistant",
        "model": "openai/gpt-4o",
        "skills": ["web-search", "arxiv"],
        "tools": {
          "profile": "full"
        }
      }
    ]
  }
}

1.2 Agent Scope Resolution

Core file: src/agents/agent-scope.ts

Key function: `resolveSessionAgentIds()`

// Resolve which agent a session uses
export function resolveSessionAgentIds(params: {
  sessionKey?: string;      // "agent:code-reviewer:abc123"
  config?: OpenClawConfig;
  agentId?: string;         // explicitly specified
}): {
  defaultAgentId: string;   // "default"
  sessionAgentId: string;   // "code-reviewer"
}

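Resolution priority (highest first):

The explicit agentId parameter
The agent ID parsed from sessionKey (format: agent:<agentId>:<uuid>)
The agent marked default: true in the config
The first agent in the list
Final fallback: "default"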
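
The priority order above can be sketched as a short chain of fallbacks. This is a minimal illustration, not the actual code in agent-scope.ts: the types `AgentEntry` and `SketchConfig` and the three function names are hypothetical, and the session key is assumed to follow the agent:<agentId>:<uuid> format described above.

```typescript
// Hypothetical, simplified types: the real OpenClawConfig has many more fields.
interface AgentEntry { id: string; default?: boolean }
interface SketchConfig { agents?: { list?: AgentEntry[] } }

// Extracts "<agentId>" from "agent:<agentId>:<rest>"; returns undefined otherwise.
function parseAgentIdFromSessionKey(sessionKey?: string): string | undefined {
  const parts = sessionKey?.split(":") ?? [];
  return parts.length >= 3 && parts[0] === "agent" && parts[1] ? parts[1] : undefined;
}

// Steps 3-5: agent marked default, else first in the list, else "default".
function resolveDefaultAgentIdSketch(cfg?: SketchConfig): string {
  const list = cfg?.agents?.list ?? [];
  return list.find((a) => a.default)?.id ?? list[0]?.id ?? "default";
}

// Steps 1-2 first, then the default chain. Empty strings are treated as "not set".
function resolveSessionAgentIdSketch(params: {
  sessionKey?: string;
  config?: SketchConfig;
  agentId?: string;
}): string {
  return (
    params.agentId?.trim() ||
    parseAgentIdFromSessionKey(params.sessionKey) ||
    resolveDefaultAgentIdSketch(params.config)
  );
}
```

With a config whose list is [{ id: "a" }, { id: "b", default: true }], a call with no agentId and no session key resolves to "b"; adding sessionKey "agent:y:1" resolves to "y".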

Config merge logic: `resolveAgentConfig()`

// Get an agent's effective config (defaults + agent overrides)
export function resolveAgentConfig(
  cfg: OpenClawConfig,
  agentId: string
): ResolvedAgentConfig | undefined {
  const entry = findAgentEntry(cfg, agentId);
  if (!entry) {
    return undefined;  // unknown agent ID: nothing to resolve
  }
  const defaults = cfg.agents.defaults;
  return {
    name: entry.name,
    workspace: entry.workspace,
    model: entry.model ?? defaults.model,
    skills: entry.skills ?? defaults.skills,
    tools: entry.tools ?? defaults.tools,
    memorySearch: entry.memorySearch ?? defaults.memorySearch,
    // ... other fields
  };
}

1.3 Agent Working Directory Isolation

Core function: resolveAgentWorkspaceDir()

export function resolveAgentWorkspaceDir(cfg: OpenClawConfig, agentId: string) {
  const configured = resolveAgentConfig(cfg, agentId)?.workspace;

  if (configured) {
    return resolveUserPath(configured);  // "~/projects" → "/Users/user/projects"
  }

  // The default agent uses defaults.workspace
  const defaultAgentId = resolveDefaultAgentId(cfg);
  if (agentId === defaultAgentId) {
    return resolveUserPath(cfg.agents.defaults.workspace ?? "~/workspace");
  }

  // Every other agent gets an isolated directory automatically
  const stateDir = resolveStateDir();
  return path.join(stateDir, `workspace-${agentId}`);
  // e.g. ~/.openclaw/workspace-code-reviewer
}

Resulting isolation (for agents that do not set their own workspace; an agent that does, such as code-reviewer with "~/projects" in 1.1, uses the configured path instead):

default agent → ~/workspace
code-reviewer agent → ~/.openclaw/workspace-code-reviewer
research agent → ~/.openclaw/workspace-research

1.4 Configuration Inheritance Example

Assume the following configuration:

{
  "agents": {
    "defaults": {
      "model": "anthropic/claude-opus-4-6",
      "thinkingDefault": "low",
      "tools": { "profile": "full" }
    },
    "list": [
      {
        "id": "code-reviewer",
        "model": "anthropic/claude-sonnet-4-5",
        // thinkingDefault is not set, so "low" is inherited
        "tools": { "profile": "coding" }  // override
      }
    ]
  }
}

Effective configuration:

model: "anthropic/claude-sonnet-4-5" ✅ agent override
thinkingDefault: "low" ✅ inherited from defaults
tools.profile: "coding" ✅ agent override
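
The inheritance rule can be reproduced with the nullish-coalescing pattern that `resolveAgentConfig()` uses. The sketch below is an illustration under simplified, hypothetical types (`AgentSettings`, `mergeAgentSettings`), not OpenClaw's real config types.

```typescript
// Hypothetical, trimmed-down settings shape used only for this illustration.
interface ToolsConfig { profile?: string; deny?: string[] }
interface AgentSettings { model?: string; thinkingDefault?: string; tools?: ToolsConfig }

// Field-by-field merge: an agent value wins, otherwise the default is inherited.
function mergeAgentSettings(defaults: AgentSettings, override: AgentSettings): AgentSettings {
  return {
    model: override.model ?? defaults.model,
    thinkingDefault: override.thinkingDefault ?? defaults.thinkingDefault,
    // Whole-object replacement: `??` does not deep-merge nested objects.
    tools: override.tools ?? defaults.tools,
  };
}

const effective = mergeAgentSettings(
  { model: "anthropic/claude-opus-4-6", thinkingDefault: "low", tools: { profile: "full" } },
  { model: "anthropic/claude-sonnet-4-5", tools: { profile: "coding" } },
);
console.log(effective);
```

One consequence of this pattern is worth noting: because `tools` is replaced as a whole, an agent that sets only `tools.deny` would not inherit `tools.profile` from defaults under this sketch.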

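Part 2: System Prompt Construction

2.1 System Prompt Architecture

Core file: src/agents/system-prompt.ts

The system prompt is the complete instruction set given to the LLM. It is assembled from several dynamic sections: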

┌─────────────────────────────────────┐
│ You are a personal assistant...    │  1. Identity
├─────────────────────────────────────┤
│ ## Tooling                          │  2. Available tools
│ - read: Read file contents          │
│ - write: Create files               │
│ - exec: Run shell commands          │
├─────────────────────────────────────┤
│ ## Safety                           │  3. Safety constraints
│ No self-preservation goals...       │
├─────────────────────────────────────┤
│ ## Skills (mandatory)               │  4. Skills injection
│ <available_skills>                  │
│   - git (location: skills/git.md)   │
│ </available_skills>                 │
├─────────────────────────────────────┤
│ ## Memory Recall                    │  5. Memory system instructions
│ Before answering: run memory_search │
├─────────────────────────────────────┤
│ ## Workspace                        │  6. Working directory
│ Working directory: ~/workspace      │
├─────────────────────────────────────┤
│ ## Documentation                    │  7. Documentation path
│ OpenClaw docs: ~/.openclaw/docs     │
├─────────────────────────────────────┤
│ # Project Context                   │  8. Project context files
│ ## CLAUDE.md                        │     (BOOTSTRAP.md, SOUL.md, etc.)
│ [file content...]                   │
├─────────────────────────────────────┤
│ ## Silent Replies                   │  9. Special reply formats
│ When nothing to say: 🕊️_NO_REPLY    │
├─────────────────────────────────────┤
│ ## Runtime                          │ 10. Runtime information
│ agent=default | model=opus-4 | ...  │
└─────────────────────────────────────┘

2.2 Core Function: `buildAgentSystemPrompt()`

export function buildAgentSystemPrompt(params: {
  workspaceDir: string;
  defaultThinkLevel?: ThinkLevel;
  reasoningLevel?: ReasoningLevel;
  extraSystemPrompt?: string;
  ownerNumbers?: string[];
  toolNames?: string[];              // ["read", "write", "exec"]
  toolSummaries?: Record<string, string>;
  modelAliasLines?: string[];
  userTimezone?: string;
  contextFiles?: EmbeddedContextFile[];  // BOOTSTRAP.md etc.
  skillsPrompt?: string;             // generated by the Skills system
  docsPath?: string;
  promptMode?: PromptMode;           // "full" | "minimal" | "none"
  runtimeInfo?: {
    agentId?: string;
    model?: string;
    channel?: string;
    capabilities?: string[];
  };
  // ... other parameters
}): string

2.3 The Three PromptMode Values

type PromptMode = "full" | "minimal" | "none";

Mode	Use case	Sections included
full	Main agent (direct user interaction)	All sections (Tooling, Skills, Memory, Docs, Safety, etc.)
minimal	Sub-agents (created via `sessions_spawn`)	Tooling, Workspace, Runtime only (no Skills, Memory, Docs)
none	Bare-bones scenarios (testing/debugging)	Identity statement only: "You are a personal assistant..."
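
The table can be read as a mapping from mode to section list. The sketch below encodes exactly the rows above; the `Section` union and the `sectionsForMode` helper are hypothetical names for illustration, and the real builder may order or gate sections differently.

```typescript
type PromptMode = "full" | "minimal" | "none";
// Section names follow the architecture diagram in 2.1 (illustrative subset).
type Section =
  | "Identity" | "Tooling" | "Safety" | "Skills"
  | "Memory" | "Workspace" | "Docs" | "Runtime";

const ALL_SECTIONS: Section[] = [
  "Identity", "Tooling", "Safety", "Skills", "Memory", "Workspace", "Docs", "Runtime",
];

// Returns the sections a given mode would include, per the table above.
function sectionsForMode(mode: PromptMode): Section[] {
  switch (mode) {
    case "full":
      return [...ALL_SECTIONS];
    case "minimal":
      return ["Tooling", "Workspace", "Runtime"];
    case "none":
      return ["Identity"];
  }
}
```

For example, sectionsForMode("minimal") leaves out Skills, Memory, and Docs, which keeps sub-agent prompts short.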

minimal mode example (sub-agents). Each optional section builder returns an empty list when the prompt mode is minimal:

// Inside each optional section builder (Skills, Memory, Docs, Silent Replies, Heartbeats):
if (promptMode === "minimal") {
  return [];  // skip this section for sub-agents
}

2.4 Dynamic Tool List Injection

function buildToolLines(params: {
  toolNames: string[];
  toolSummaries: Record<string, string>;
}): string[] {
  const { toolNames, toolSummaries } = params;
  const coreToolSummaries: Record<string, string> = {
    read: "Read file contents",
    write: "Create or overwrite files",
    edit: "Make precise edits to files",
    apply_patch: "Apply multi-file patches",
    grep: "Search file contents for patterns",
    find: "Find files by glob pattern",
    exec: "Run shell commands (pty available for TTY-required CLIs)",
    process: "Manage background exec sessions",
    web_search: "Search the web (Brave API)",
    memory_search: "Search memory (MEMORY.md + memory/*.md)",
    sessions_spawn: "Spawn an isolated sub-agent session",
    // ... other tools
  };

  // Display order; the remaining tool names are elided in this excerpt
  const toolOrder = ["read", "write", "edit", "exec", "process" /* , ... */];
  const enabledTools = toolOrder.filter(tool => toolNames.includes(tool));

  return enabledTools.map(tool => {
    // Built-in summary first, then any caller-supplied summary
    const summary = coreToolSummaries[tool] ?? toolSummaries[tool];
    return summary ? `- ${tool}: ${summary}` : `- ${tool}`;
  });
}

Generated output:

## Tooling
Tool availability (filtered by policy):
- read: Read file contents
- write: Create or overwrite files
- edit: Make precise edits to files
- exec: Run shell commands (pty available for TTY-required CLIs)
- memory_search: Search memory (MEMORY.md + memory/*.md)

2.5 Skills Injection

function buildSkillsSection(params: {
  skillsPrompt?: string;  // generated by the Skills system
  readToolName: string;
}): string[] {
  const { skillsPrompt, readToolName } = params;
  if (!skillsPrompt?.trim()) {
    return [];
  }

  return [
    "## Skills (mandatory)",
    "Before replying: scan <available_skills> <description> entries.",
    `- If exactly one skill clearly applies: read its SKILL.md at <location> with \`${readToolName}\`, then follow it.`,
    "- If multiple could apply: choose the most specific one, then read/follow it.",
    "- If none clearly apply: do not read any SKILL.md.",
    "Constraints: never read more than one skill up front; only read after selecting.",
    skillsPrompt,  // insert the skill list
    "",
  ];
}

Example skillsPrompt (generated by the Skills system):

<available_skills>
<skill>
  <name>git</name>
  <emoji>🔀</emoji>
  <description>Git version control operations</description>
  <location>skills/git.md</location>
</skill>
<skill>
  <name>code-review</name>
  <emoji>🔍</emoji>
  <description>Code review and quality checks</description>
  <location>skills/code-review.md</location>
</skill>
</available_skills>

2.6 Memory Instruction Injection

function buildMemorySection(params: {
  isMinimal: boolean;
  availableTools: Set<string>;
  citationsMode?: MemoryCitationsMode;  // "on" | "off"
}): string[] {
  const { isMinimal, availableTools, citationsMode } = params;
  if (isMinimal || !availableTools.has("memory_search")) {
    return [];
  }

  const lines = [
    "## Memory Recall",
    "Before answering anything about prior work, decisions, dates, people, preferences, or todos: run memory_search on MEMORY.md + memory/*.md; then use memory_get to pull only the needed lines.",
  ];

  if (citationsMode === "off") {
    lines.push("Citations are disabled: do not mention file paths or line numbers in replies.");
  } else {
    lines.push("Citations: include Source: <path#line> when it helps the user verify memory snippets.");
  }

  return lines;
}

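2.7 Project Context File Injection

const contextFiles: EmbeddedContextFile[] = [
  { path: "BOOTSTRAP.md", content: "# Project Setup\n..." },
  { path: "CLAUDE.md", content: "# Repository Guidelines\n..." },
  { path: "SOUL.md", content: "# Personality\nBe concise..." }
];

// Assumed pre-filter for this excerpt: keep only files that have content
const validContextFiles = contextFiles.filter(file => file.content.trim().length > 0);
const lines: string[] = [];

// Inject into the system prompt
if (validContextFiles.length > 0) {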
  const hasSoulFile = validContextFiles.some(file =>
    file.path.toLowerCase() === "soul.md"
  );

  lines.push("# Project Context", "");

  if (hasSoulFile) {
    lines.push(
      "If SOUL.md is present, embody its persona and tone. Avoid stiff, generic replies."
    );
  }

  for (const file of validContextFiles) {
    lines.push(`## ${file.path}`, "", file.content, "");
  }
}

Generated output:

# Project Context

If SOUL.md is present, embody its persona and tone. Avoid stiff, generic replies.

## BOOTSTRAP.md

# Project Setup
This is a TypeScript project using Bun...

## CLAUDE.md

# Repository Guidelines
...

## SOUL.md

# Personality
Be concise and technical. Avoid emojis unless requested.

2.8 Runtime Info Line

export function buildRuntimeLine(
  runtimeInfo?: {
    agentId?: string;
    host?: string;
    os?: string;
    model?: string;
    channel?: string;
    repoRoot?: string;
  },
  runtimeChannel?: string,
  runtimeCapabilities: string[] = [],
  defaultThinkLevel?: ThinkLevel
): string {
  return `Runtime: ${[
    runtimeInfo?.agentId ? `agent=${runtimeInfo.agentId}` : "",
    runtimeInfo?.host ? `host=${runtimeInfo.host}` : "",
    runtimeInfo?.repoRoot ? `repo=${runtimeInfo.repoRoot}` : "",
    runtimeInfo?.os ? `os=${runtimeInfo.os}` : "",
    runtimeInfo?.model ? `model=${runtimeInfo.model}` : "",
    runtimeChannel ? `channel=${runtimeChannel}` : "",
    runtimeChannel ? `capabilities=${runtimeCapabilities.join(",")}` : "",
    `thinking=${defaultThinkLevel ?? "off"}`,
  ]
    .filter(Boolean)
    .join(" | ")}`;
}

Example output:

Runtime: agent=code-reviewer | host=macbook-pro | repo=/Users/user/project | os=darwin | model=claude-sonnet-4-5 | channel=telegram | capabilities=inlineButtons,threads | thinking=low

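Part 3: Compaction (Context Compression)

3.1 Why Is Compaction Needed?

Problem: very long conversations overflow the token limit

User: Create a web server
Assistant: [generates 200 lines of code]
User: Add database support
Assistant: [generates 150 lines of code]
User: Add an authentication system
Assistant: [generates 300 lines of code]
... repeated for 50 rounds ...
User: Refactor the routing system  ← token overflow!

How the context grows:

Each turn = user message + assistant message + tool calls + tool results
Tool results can contain large amounts of file content
As a rough guide, a 200K-token context holds about 50-100 such turns; the real number depends on how large the tool results are

Solution: Compaction (compress older messages)

3.2 When Compaction Is Triggered

Core file: src/agents/compaction.ts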

// Current context usage
const currentTokens = estimateMessagesTokens(messages);
const contextWindow = 200000;  // Claude Opus 4
const usage = currentTokens / contextWindow;

if (usage > 0.8) {  // above 80%
  // Trigger compaction
  await compactAndSummarize(messages);
}
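
The trigger depends on a token estimate. The article does not show how `estimateMessagesTokens()` works, so the sketch below substitutes a common rough heuristic (about 4 characters per token) purely as an assumption; `SketchMessage`, `estimateTokensSketch`, and `shouldCompact` are hypothetical names.

```typescript
// Hypothetical message shape for this sketch; real AgentMessage objects are richer.
interface SketchMessage { role: "user" | "assistant"; content: string }

// Assumption: roughly 4 characters per token. A real tokenizer would be more accurate,
// especially for non-English text and code.
function estimateTokensSketch(messages: SketchMessage[]): number {
  return messages.reduce((sum, m) => sum + Math.ceil(m.content.length / 4), 0);
}

// True when estimated usage exceeds the threshold share of the context window.
function shouldCompact(
  messages: SketchMessage[],
  contextWindow: number,
  threshold = 0.8,
): boolean {
  return estimateTokensSketch(messages) / contextWindow > threshold;
}
```

A cheap estimate like this errs in both directions, which is one reason to trigger at 80% rather than waiting until the window is nearly full.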

3.3 Two Compaction Modes

Mode 1: Default (the Pi Agent default compaction)

export type AgentCompactionMode = "default" | "safeguard";

// config.json
{
  "agents": {
    "defaults": {
      "compaction": {
        "mode": "default",
        "reserveTokens": 4096,        // tokens reserved for the summary
        "keepRecentTokens": 100000    // most recent tokens to keep verbatim
      }
    }
  }
}

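How it works:

Keep the most recent keepRecentTokens worth of messages
Summarize the older messages into a short summary
Insert the summary at the start of the message history

Mode 2: Safeguard (OpenClaw's enhanced mode)

{
  "compaction": {
    "mode": "safeguard",
    "maxHistoryShare": 0.5,     // history may use at most 50% of the context
    "reserveTokensFloor": 2048  // minimum number of reserved tokens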
  }
}

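How it works:

Compute the history budget: budgetTokens = contextWindow * maxHistoryShare = 200K * 0.5 = 100K
If the current messages exceed the budget, drop old messages chunk by chunk
Generate a summary of the dropped messages
Repair orphaned tool_result entries (to prevent API errors)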
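
The last step deserves a concrete picture: once a tool_use is dropped, a tool_result that refers to it has no partner, and providers typically reject such a history. The sketch below shows one way to remove such orphans. The `Block` and `Msg` shapes and the `dropOrphanToolResults` name are assumptions for illustration; this is not the actual `repairToolUseResultPairing()` implementation.

```typescript
// Hypothetical content-block shapes, loosely modeled on tool-calling chat APIs.
type Block =
  | { type: "text"; text: string }
  | { type: "tool_use"; id: string }
  | { type: "tool_result"; toolUseId: string };
interface Msg { role: "user" | "assistant"; blocks: Block[] }

// Removes tool_result blocks whose tool_use no longer appears earlier in the history,
// and drops messages that end up empty.
function dropOrphanToolResults(messages: Msg[]): Msg[] {
  const seenToolUseIds = new Set<string>();
  const repaired: Msg[] = [];
  for (const msg of messages) {
    const blocks = msg.blocks.filter((block) => {
      if (block.type === "tool_use") {
        seenToolUseIds.add(block.id);
        return true;
      }
      if (block.type === "tool_result") {
        return seenToolUseIds.has(block.toolUseId);
      }
      return true;
    });
    if (blocks.length > 0) repaired.push({ ...msg, blocks });
  }
  return repaired;
}
```

In this sketch an orphaned result is simply discarded; an alternative design would keep it as plain text so the information survives.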

3.4 Core Algorithm: `pruneHistoryForContextShare()`

export function pruneHistoryForContextShare(params: {
  messages: AgentMessage[];
  maxContextTokens: number;  // 200K
  maxHistoryShare?: number;  // 0.5
  parts?: number;            // 2
}): {
  messages: AgentMessage[];       // messages kept
  droppedMessagesList: AgentMessage[];  // messages dropped
  droppedChunks: number;          // number of chunks dropped
  droppedMessages: number;        // number of messages dropped
  droppedTokens: number;          // number of tokens dropped
  keptTokens: number;             // number of tokens kept
  budgetTokens: number;           // token budget
}

Algorithm steps:

const budgetTokens = maxContextTokens * (maxHistoryShare ?? 0.5);  // 100K

let keptMessages = messages;
const droppedMessagesList: AgentMessage[] = [];

while (estimateMessagesTokens(keptMessages) > budgetTokens) {
  // 1. Split the messages into N chunks of roughly equal token weight
  const chunks = splitMessagesByTokenShare(keptMessages, parts);

  // 2. Drop the first chunk (the oldest messages)
  const [dropped, ...rest] = chunks;
  droppedMessagesList.push(...dropped);

  // 3. Repair orphaned tool_result entries
  //    (a tool_use ended up in dropped while its tool_result is still in rest)
  const repaired = repairToolUseResultPairing(rest.flat());

  keptMessages = repaired.messages;
}

return { messages: keptMessages, droppedMessagesList, ... };

Example (parts=2):

Initial messages: [M1, M2, M3, M4, M5, M6, M7, M8]  (150K tokens)
Budget: 100K tokens

Round 1:
  Chunks: [M1, M2, M3, M4] | [M5, M6, M7, M8]
  Dropped: [M1, M2, M3, M4]  (60K tokens)
  Kept: [M5, M6, M7, M8]  (90K tokens) ✅ within budget
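
To make the loop concrete, here is a self-contained sketch with pre-counted tokens per message. The splitting rule (close a chunk once it reaches its equal share of the total) is an assumption, since the article does not show `splitMessagesByTokenShare()`; tool-pair repair is omitted, and all names are hypothetical.

```typescript
// Hypothetical message shape with a pre-computed token count.
interface SizedMsg { id: string; tokens: number }

const totalTokens = (msgs: SizedMsg[]): number => msgs.reduce((s, m) => s + m.tokens, 0);

// Splits into at most `parts` contiguous chunks of roughly equal token weight.
function splitByTokenShare(msgs: SizedMsg[], parts: number): SizedMsg[][] {
  const target = totalTokens(msgs) / parts;
  const chunks: SizedMsg[][] = [];
  let current: SizedMsg[] = [];
  let acc = 0;
  for (const m of msgs) {
    current.push(m);
    acc += m.tokens;
    // Close the chunk once it reaches its share, keeping the last chunk open.
    if (acc >= target && chunks.length < parts - 1) {
      chunks.push(current);
      current = [];
      acc = 0;
    }
  }
  if (current.length > 0) chunks.push(current);
  return chunks;
}

// Drops the oldest chunk repeatedly until the kept history fits the budget.
function pruneSketch(msgs: SizedMsg[], budgetTokens: number, parts = 2) {
  let kept = msgs;
  const dropped: SizedMsg[] = [];
  while (kept.length > 0 && totalTokens(kept) > budgetTokens) {
    const chunks = splitByTokenShare(kept, parts);
    dropped.push(...(chunks[0] ?? []));
    kept = ([] as SizedMsg[]).concat(...chunks.slice(1));
  }
  return { kept, dropped, keptTokens: totalTokens(kept) };
}
```

Because each round discards about 1/parts of what remains, a larger `parts` value prunes in finer steps and tends to keep more recent history, at the cost of more iterations.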

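3.5 Staged Summarization: `summarizeInStages()`

When many messages are dropped, summarizing them in stages keeps any single summarization request from becoming too long:

export async function summarizeInStages(params: {
  messages: AgentMessage[];       // messages to summarize
  model: Model;
  apiKey: string;
  signal: AbortSignal;
  reserveTokens: number;
  maxChunkTokens: number;         // max tokens per summarization request
  contextWindow: number;
  parts?: number;                 // number of parts to split into (default 2)
}): Promise<string>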

Algorithm steps:

// 1. Split the messages into N parts of roughly equal token weight
const splits = splitMessagesByTokenShare(messages, parts);
// splits = [[M1,M2,M3], [M4,M5,M6]]

// 2. Summarize each part separately
const partialSummaries: string[] = [];
for (const chunk of splits) {
  const summary = await summarizeWithFallback({
    messages: chunk,
    model,
    apiKey,
    signal,
    reserveTokens,
    maxChunkTokens,
    contextWindow
  });
  partialSummaries.push(summary);
}
// partialSummaries = ["Part 1 summary...", "Part 2 summary..."]

// 3. Merge the partial summaries
const summaryMessages: AgentMessage[] = partialSummaries.map(summary => ({
  role: "user",
  content: summary
}));

const finalSummary = await summarizeWithFallback({
  messages: summaryMessages,
  model,
  apiKey,
  signal,
  reserveTokens,
  maxChunkTokens,
  contextWindow,
  customInstructions: "Merge these partial summaries into a single cohesive summary. Preserve decisions, TODOs, open questions, and any constraints."
});

return finalSummary;

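Example (parts=2):

Part 1 messages (M1-M3):
  "The user asked for a web server; the assistant generated Express code..."

Part 2 messages (M4-M6):
  "The user asked for a database; the assistant integrated MongoDB..."

Merged summary:
  "The project started from scratch with a web server (Express), then added MongoDB database support. Open TODO: add an authentication system."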

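3.6 Identifier Preservation Policy

Problem: summarization can lose important identifiers (UUIDs, hashes, file names, and so on)

export type AgentCompactionIdentifierPolicy = "strict" | "off" | "custom";

{
  "compaction": {
    "identifierPolicy": "strict",  // default: preserve strictly
    "identifierInstructions": "..."  // custom instructions for "custom" mode
  }
}

Instruction used in "strict" mode: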

Preserve all opaque identifiers exactly as written (no shortening or reconstruction), including UUIDs, hashes, IDs, tokens, API keys, hostnames, IPs, ports, URLs, and file names.

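Effect:

❌ Bad summary (identifiers lost):
"The user asked to modify a config file, and the assistant made the change."

✅ Good summary (identifiers preserved):
"The user asked to modify config/production.yaml, and the assistant changed database.host from localhost to db-prod-01.example.com."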

3.7 The Complete Compaction Flow

┌────────────────────────────────────┐
│ 1. Detect context usage > 80%      │
├────────────────────────────────────┤
│ 2. Compute history budget (50%)    │
├────────────────────────────────────┤
│ 3. Prune messages (pruneHistory)   │
│    - Drop old messages in chunks   │
│    - Repair orphaned tool_result   │
├────────────────────────────────────┤
│ 4. Summarize dropped messages      │
│    - Staged summary (large sets)   │
│    - Keep identifiers (strict)     │
├────────────────────────────────────┤
│ 5. Insert the summary message      │
│    role: "assistant"               │
│    content: "Summary: ..."         │
├────────────────────────────────────┤
│ 6. Send compacted context to LLM   │
└────────────────────────────────────┘

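Token budget example:

Context window: 200K tokens
History budget: 100K tokens (50%)
System prompt: 5K tokens
Current turn: 20K tokens
Available for history: 75K tokens

Before compaction: 120K tokens (history) → over budget
After compaction: 70K tokens (history + summary) → within budget ✅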
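
The arithmetic in this example can be written out explicitly. The sketch follows the example's accounting, in which the system prompt and the current turn are subtracted from the 50% history budget; whether OpenClaw accounts for them in exactly this way is an assumption, and `historyHeadroom` is a hypothetical helper.

```typescript
// Computes the history budget and what is left for older messages,
// following the accounting used in the example above.
function historyHeadroom(p: {
  contextWindow: number;
  maxHistoryShare: number;
  systemPromptTokens: number;
  currentTurnTokens: number;
}): { budget: number; available: number } {
  const budget = Math.floor(p.contextWindow * p.maxHistoryShare);
  return { budget, available: budget - p.systemPromptTokens - p.currentTurnTokens };
}

const headroom = historyHeadroom({
  contextWindow: 200000,
  maxHistoryShare: 0.5,
  systemPromptTokens: 5000,
  currentTurnTokens: 20000,
});
// budget = 100000, available = 100000 - 5000 - 20000 = 75000
console.log(headroom, 120000 > headroom.available, 70000 <= headroom.available);
```

With these numbers, 120K tokens of history exceeds the 75K available, while the compacted 70K fits.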

Part 4: Model Fallback

4.1 Why Is Model Fallback Needed?

Common model failure scenarios:

Error type	HTTP status	Example
rate_limit	429	"Rate limit exceeded: 60 requests/minute"
billing	402	"Insufficient credits"
auth	401/403	"Invalid API key"
timeout	408/502/503/504	"Request timeout after 120s"
format	400	"Invalid tool schema"
model_not_found	404	"Unknown model: gpt-5"

Solution: switch to a backup model automatically

4.2 Configuring the Model Fallback Chain

{
  "agents": {
    "defaults": {
      "model": {
        "primary": "anthropic/claude-opus-4-6",
        "fallbacks": [
          "anthropic/claude-sonnet-4-5",
          "openai/gpt-4o",
          "anthropic/claude-opus-3-5"
        ]
      }
    }
  }
}

4.3 Core Function: `runWithModelFallback()`

Core file: src/agents/model-fallback.ts

export async function runWithModelFallback<T>(params: {
  cfg: OpenClawConfig | undefined;
  provider: string;
  model: string;
  agentDir?: string;
  fallbacksOverride?: string[];
  run: (provider: string, model: string) => Promise<T>;
  onError?: (attempt: {
    provider: string;
    model: string;
    error: unknown;
    attempt: number;
    total: number;
  }) => void | Promise<void>;  // awaited by the fallback loop
}): Promise<{
  result: T;
  provider: string;
  model: string;
  attempts: FallbackAttempt[];
}>

Algorithm flow:

// 1. Resolve the candidate model chain
const candidates = resolveFallbackCandidates({
  cfg,
  provider: "anthropic",
  model: "claude-opus-4-6",
  fallbacksOverride
});
// candidates = [
//   { provider: "anthropic", model: "claude-opus-4-6" },
//   { provider: "anthropic", model: "claude-sonnet-4-5" },
//   { provider: "openai", model: "gpt-4o" },
//   { provider: "anthropic", model: "claude-opus-3-5" }
// ]

const attempts: FallbackAttempt[] = [];
let lastError: unknown;

// 2. Try each candidate model in order
for (let i = 0; i < candidates.length; i++) {
  const candidate = candidates[i];

  try {
    // 3. Try calling the model
    const result = await params.run(candidate.provider, candidate.model);

    // Success → return the result
    return {
      result,
      provider: candidate.provider,
      model: candidate.model,
      attempts
    };

  } catch (err) {
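    // 4. Failure handling

    // 4a. User abort → rethrow immediately
    if (shouldRethrowAbort(err)) {
      throw err;
    }

    // 4b. Context overflow → rethrow immediately (falling back cannot fix it)
    const errMessage = err instanceof Error ? err.message : String(err);
    if (isLikelyContextOverflowError(errMessage)) {
      throw err;
    }

    // 4c. Convert to a normalized FailoverError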
    const normalized = coerceToFailoverError(err, {
      provider: candidate.provider,
      model: candidate.model
    });

    // 4d. Record the failed attempt
    attempts.push({
      provider: candidate.provider,
      model: candidate.model,
      error: normalized.message,
      reason: normalized.reason,  // "rate_limit", "billing", etc.
      status: normalized.status,
      code: normalized.code
    });

    // 4e. Notify the external error handler
    await params.onError?.({
      provider: candidate.provider,
      model: candidate.model,
      error: normalized,
      attempt: i + 1,
      total: candidates.length
    });

    lastError = normalized;
  }
}

// 5. Every candidate failed → throw a combined error (the last failure is kept as the cause)
throw new Error(
  `All models failed (${attempts.length}): ` +
  attempts.map(a => `${a.provider}/${a.model}: ${a.error} (${a.reason})`).join(" | "),
  { cause: lastError }
);

4.4 Error Classification: FailoverError

Core file: src/agents/failover-error.ts

export class FailoverError extends Error {
  readonly reason: FailoverReason;  // error category
  readonly provider?: string;
  readonly model?: string;
  readonly profileId?: string;      // Auth profile ID
  readonly status?: number;         // HTTP status code
  readonly code?: string;           // error code ("ETIMEDOUT" etc.)
}

export type FailoverReason =
  | "billing"           // insufficient balance
  | "rate_limit"        // rate limited
  | "auth"              // authentication failed (temporary)
  | "auth_permanent"    // authentication failed (permanent)
  | "timeout"           // timed out
  | "format"            // malformed request
  | "model_not_found"   // model does not exist
  | "session_expired"   // session expired
  | "unknown";

Classification logic:

export function resolveFailoverReasonFromError(err: unknown): FailoverReason | null {
  // 1. Classify by HTTP status code
  const status = getStatusCode(err);
  if (status === 402) return "billing";
  if (status === 429) return "rate_limit";
  if (status === 401 || status === 403) {
    const msg = getErrorMessage(err);
    return isAuthPermanentErrorMessage(msg) ? "auth_permanent" : "auth";
  }
  if (status === 408 || status === 502 || status === 503 || status === 504) {
    return "timeout";
  }

  // 2. Classify by error code (may be absent)
  const code = getErrorCode(err);
  if (code && ["ETIMEDOUT", "ECONNRESET", "ECONNABORTED"].includes(code)) {
    return "timeout";
  }

  // 3. Classify by error message
  const message = getErrorMessage(err);
  if (/rate limit/i.test(message)) return "rate_limit";
  if (/insufficient.*credit/i.test(message)) return "billing";
  if (/invalid.*api.*key/i.test(message)) return "auth";

  return null;  // unclassified error
}

4.5 Auth Profile Rotation

OpenClaw supports several Auth Profiles (API keys) per provider. When one key fails, it switches to the next one automatically.

Configuration example:

{
  "providers": {
    "anthropic": {
      "authProfiles": [
        { "id": "main", "apiKey": "sk-ant-xxx1" },
        { "id": "backup", "apiKey": "sk-ant-xxx2" },
        { "id": "fallback", "apiKey": "sk-ant-xxx3" }
      ],
      "authOrder": ["main", "backup", "fallback"]
    }
  }
}

Rotation logic:

// 1. Try the "main" profile
try {
  await callLLM(provider, model, { apiKey: "sk-ant-xxx1" });
} catch (err) {
  if (err instanceof FailoverError && err.reason === "rate_limit") {
    // 2. Put "main" into a cooldown period
    markAuthProfileFailure(authStore, "main", {
      reason: "rate_limit",
      cooldownMinutes: 5  // retry after 5 minutes
    });

    // 3. Switch to the "backup" profile
    await callLLM(provider, model, { apiKey: "sk-ant-xxx2" });
  } else {
    throw err;  // other errors are not handled in this simplified excerpt
  }
}

Cooldown management:

export function isProfileInCooldown(
  authStore: AuthProfileStore,
  profileId: string
): boolean {
  const profile = authStore.profiles.get(profileId);
  if (!profile || !profile.cooldownUntil) {
    return false;
  }
  return Date.now() < profile.cooldownUntil;
}
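
Putting the cooldown check and the rotation order together, a minimal in-memory sketch might look like the following. The store shape mirrors the `profiles` map and `cooldownUntil` field used above; `markFailureSketch`, `inCooldownSketch`, and `pickProfile` are hypothetical helpers, and passing `now` explicitly is a choice made here to keep the example deterministic.

```typescript
// Simplified store: profile ID → cooldown state (assumed shape for this sketch).
interface ProfileState { cooldownUntil?: number }
interface SketchAuthStore { profiles: Map<string, ProfileState> }

// Marks a profile as failed and puts it into cooldown for the given minutes.
function markFailureSketch(
  store: SketchAuthStore,
  profileId: string,
  cooldownMinutes: number,
  now: number,
): void {
  const state = store.profiles.get(profileId) ?? {};
  state.cooldownUntil = now + cooldownMinutes * 60 * 1000;
  store.profiles.set(profileId, state);
}

function inCooldownSketch(store: SketchAuthStore, profileId: string, now: number): boolean {
  const until = store.profiles.get(profileId)?.cooldownUntil;
  return until !== undefined && now < until;
}

// Returns the first profile in authOrder that is not cooling down, if any.
function pickProfile(store: SketchAuthStore, authOrder: string[], now: number): string | undefined {
  return authOrder.find((id) => !inCooldownSketch(store, id, now));
}
```

For instance, after marking "main" with a 5-minute cooldown at time t, pickProfile(store, ["main", "backup", "fallback"], t + 1000) yields "backup", and the same call at t + 5 minutes yields "main" again.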

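4.6 Fallback Decision Logic

function resolveCooldownDecision(params: {
  candidate: ModelCandidate;
  isPrimary: boolean;            // whether this is the first candidate model
  requestedModel: boolean;       // whether this is the model the user asked for
  hasFallbackCandidates: boolean;
  now: number;
  authStore: AuthProfileStore;
  profileIds: string[];
}): CooldownDecision {
  const { candidate, isPrimary, requestedModel, authStore, profileIds } = params;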
  const allProfilesInCooldown = profileIds.every(id =>
    isProfileInCooldown(authStore, id)
  );

  if (!allProfilesInCooldown) {
    return { type: "attempt" };  // a profile is available → attempt
  }

  // Every profile is in cooldown
  const inferredReason = resolveProfilesUnavailableReason(authStore, profileIds);

  // Permanent errors → skip
  if (["auth_permanent", "billing"].includes(inferredReason)) {
    return {
      type: "skip",
      reason: inferredReason,
      error: `Provider ${candidate.provider} has ${inferredReason} issue`
    };
  }

  // Primary model + explicitly requested by the user → attempt as a probe
  if (isPrimary && requestedModel) {
    return {
      type: "attempt",
      reason: inferredReason,
      markProbe: true
    };
  }

  // Everything else → skip
  return {
    type: "skip",
    reason: inferredReason,
    error: `Provider ${candidate.provider} is in cooldown`
  };
}

4.7 Fallback Example

Scenario: the Anthropic API hits its rate limit

Request: provider="anthropic", model="claude-opus-4-6"

Attempt 1: anthropic/claude-opus-4-6
  Error: 429 Rate Limit Exceeded
  Reason: rate_limit
  → Mark all anthropic profiles as cooling down for 5 minutes
  → Continue to the next fallback

Attempt 2: anthropic/claude-sonnet-4-5
  Status: skipped (provider is in cooldown)
  → Continue to the next fallback

Attempt 3: openai/gpt-4o
  Status: success ✅
  Returns: { result, provider: "openai", model: "gpt-4o", attempts: [...] }

User notification:

⚠️ Primary model (anthropic/claude-opus-4-6) failed: Rate Limit Exceeded
✅ Switched to fallback: openai/gpt-4o
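
The scenario can be replayed with a small simulation. The sketch below is synchronous for brevity (real model calls are asynchronous) and tracks cooldown per provider in a plain Set rather than per auth profile; both simplifications, and every name in it, are assumptions made for illustration.

```typescript
interface Candidate { provider: string; model: string }
interface AttemptLog extends Candidate { outcome: "failed" | "skipped"; error?: string }

// Tries candidates in order; a rate-limit failure cools down the whole provider,
// so later candidates from that provider are skipped.
function runWithFallbackSketch<T>(
  candidates: Candidate[],
  run: (c: Candidate) => T,
): { result: T; used: Candidate; attempts: AttemptLog[] } {
  const cooledProviders = new Set<string>();
  const attempts: AttemptLog[] = [];
  for (const c of candidates) {
    if (cooledProviders.has(c.provider)) {
      attempts.push({ ...c, outcome: "skipped" });
      continue;
    }
    try {
      const result = run(c);
      return { result, used: c, attempts };
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      if (/rate limit/i.test(message)) cooledProviders.add(c.provider);
      attempts.push({ ...c, outcome: "failed", error: message });
    }
  }
  throw new Error(`All models failed (${attempts.length})`);
}

const outcome = runWithFallbackSketch(
  [
    { provider: "anthropic", model: "claude-opus-4-6" },
    { provider: "anthropic", model: "claude-sonnet-4-5" },
    { provider: "openai", model: "gpt-4o" },
  ],
  (c) => {
    if (c.provider === "anthropic") throw new Error("429 Rate Limit Exceeded");
    return `reply from ${c.model}`;
  },
);
console.log(outcome.used.model, outcome.attempts.map((a) => a.outcome));
```

Skipping the second Anthropic model avoids spending a request that would very likely hit the same limit.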

Part 5: Tool Preparation and Filtering

5.1 Tool Preparation Flow

Core file: src/agents/pi-tools.ts

export function createOpenClawCodingTools(options?: {
  agentId?: string;
  sessionKey?: string;
  config?: OpenClawConfig;
  modelProvider?: string;
  modelId?: string;
  sandbox?: SandboxContext | null;
  workspaceDir?: string;
  messageProvider?: string;
  senderIsOwner?: boolean;
  // ... other parameters
}): AnyAgentTool[]

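Preparation steps:

1. Create the base tool set
   ├─ Pi Coding Agent tools (read, write, edit, grep, find)
   ├─ OpenClaw tools (exec, process, web_search, memory_search)
   └─ Channel tools (message, sessions_spawn, subagents)

2. Sandbox wrapping (if enabled)
   ├─ Create sandboxed versions (sandboxedRead, sandboxedWrite, sandboxedEdit)
   └─ Configure the filesystem bridge

3. Tool policy filtering (9 layers)
   ├─ Profile policy (minimal, coding, messaging, full)
   ├─ Provider Profile policy (for a specific model provider)
   ├─ Global policy (global allow/deny)
   ├─ Global Provider policy
   ├─ Agent policy (agent-level override)
   ├─ Agent Provider policy
   ├─ Group policy (group/channel level)
   ├─ Sandbox policy
   └─ Subagent policy (sub-agent restrictions)

4. Message provider policy
   └─ Remove incompatible tools (e.g. the voice channel removes tts)

5. Authorization check
   └─ Owner-only tool filtering (e.g. gateway.restart)

6. Schema normalization
   ├─ Anthropic compatibility handling
   ├─ Gemini compatibility handling (constraint keywords removed)
   └─ OpenAI compatibility handling

7. Hook wrapping
   └─ beforeToolCall hook (loop detection etc.)

8. Abort signal wrapping
   └─ Allows cancelling a tool that is currently running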

5.2 Tool Policy Filtering (9 Layers)

Core file: src/agents/tool-policy-pipeline.ts

export function applyToolPolicyPipeline(params: {
  tools: AnyAgentTool[];
  steps: Array<{
    policy: ToolPolicy | undefined;
    label: string;
  }>;
}): AnyAgentTool[]

Layer 1-2: Profile policy

{
  "tools": {
    "profile": "coding",  // minimal | coding | messaging | full
    "alsoAllow": ["web_search"]  // additional tools to allow
  }
}

Predefined profiles:

const TOOL_PROFILES: Record<string, ToolPolicy> = {
  minimal: {
    allow: ["read", "write", "edit", "grep", "find", "ls"]
  },
  coding: {
    allow: [
      "read", "write", "edit", "apply_patch",
      "grep", "find", "ls",
      "exec", "process",
      "web_search", "web_fetch"
    ]
  },
  messaging: {
    allow: [
      "read", "write",
      "message", "sessions_list", "sessions_send",
      "cron", "web_search"
    ]
  },
  full: {
    allow: "*"  // all tools
  }
};

Layer 3-4: Global policy

{
  "tools": {
    "allow": ["*"],  // allow everything by default
    "deny": ["gateway"]  // disabled globally
  }
}

Layer 5-6: Agent policy

{
  "agents": {
    "list": [
      {
        "id": "code-reviewer",
        "tools": {
          "profile": "coding",
          "deny": ["web_search"]  // disabled at the agent level
        }
      }
    ]
  }
}

Layer 7: Group policy

{
  "telegram": {
    "groups": [
      {
        "chatId": "-1001234567890",
        "tools": {
          "allow": ["read", "write", "grep"]  // group-level restriction
        }
      }
    ]
  }
}

Layer 8: Sandbox policy

{
  "sandbox": {
    "tools": {
      "deny": ["gateway", "exec"]  // disabled inside the sandbox
    }
  }
}

Layer 9: Subagent policy

const subagentPolicy = resolveSubagentToolPolicy(
  config,
  subagentDepth  // 1, 2, 3...
);

// Depth 1: deny ["gateway", "sessions_spawn"]
// Depth 2+: deny ["gateway", "sessions_spawn", "cron", "message"]
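
The depth rule in the two comments above can be expressed as a small function. This sketch only restates those comments; `subagentDenyListSketch` is a hypothetical name, and how `resolveSubagentToolPolicy()` combines the list with the user's config is not shown in this article.

```typescript
// Returns the tools denied to a sub-agent at the given nesting depth,
// following the two comments above (depth 0 = a top-level agent, no extra denies).
function subagentDenyListSketch(depth: number): string[] {
  if (depth <= 0) return [];
  const base = ["gateway", "sessions_spawn"];
  return depth === 1 ? base : [...base, "cron", "message"];
}

console.log(subagentDenyListSketch(1)); // denies gateway and sessions_spawn
console.log(subagentDenyListSketch(2)); // additionally denies cron and message
```

The deeper the nesting, the narrower the tool set, which limits how far a delegated task can reach.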

5.3 Policy Filtering Example

Configuration:

{
  "tools": {
    "profile": "coding",
    "alsoAllow": ["memory_search"],
    "deny": ["apply_patch"]
  },
  "agents": {
    "list": [
      {
        "id": "safe-agent",
        "tools": {
          "deny": ["exec"]  // additionally disable exec
        }
      }
    ]
  }
}

Filtering flow (for safe-agent):

Initial tools: [read, write, edit, apply_patch, exec, process, web_search, memory_search, ...]
(grep, find, ls, and web_fetch are also part of the coding profile; they are left out below for brevity)

1. Profile "coding"
   → allowed: [read, write, edit, apply_patch, exec, process, web_search]
   → removed: [memory_search] ❌

2. alsoAllow ["memory_search"]
   → added: [memory_search] ✅

3. Global deny ["apply_patch"]
   → removed: [apply_patch] ❌

4. Agent deny ["exec"]
   → removed: [exec] ❌

Final tools: [read, write, edit, process, web_search, memory_search]
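
The same flow can be reproduced with a small pipeline over tool names. In this sketch a step with no allow list allows everything, and `alsoAllow` is merged into the profile step's allow set, because a pure filter cannot re-add a tool once it is removed. Those rules and the `PolicySketch` shape are assumptions, not the real `applyToolPolicyPipeline()` semantics.

```typescript
// Hypothetical policy shape for illustration.
interface PolicySketch { allow?: string[]; alsoAllow?: string[]; deny?: string[] }

// Applies one policy step: keep tools that are allowed and not denied.
function applyPolicySketch(tools: string[], policy: PolicySketch): string[] {
  const allowAll = !policy.allow || policy.allow.includes("*");
  const allowed = new Set([...(policy.allow ?? []), ...(policy.alsoAllow ?? [])]);
  const denied = new Set(policy.deny ?? []);
  return tools.filter((t) => (allowAll || allowed.has(t)) && !denied.has(t));
}

// Runs the steps in order; each step can only narrow the tool set.
function applyPipelineSketch(tools: string[], steps: PolicySketch[]): string[] {
  return steps.reduce((current, step) => applyPolicySketch(current, step), tools);
}

const codingProfile = [
  "read", "write", "edit", "apply_patch", "grep", "find", "ls",
  "exec", "process", "web_search", "web_fetch",
];

const finalTools = applyPipelineSketch(
  ["read", "write", "edit", "apply_patch", "exec", "process", "web_search", "memory_search", "message"],
  [
    { allow: codingProfile, alsoAllow: ["memory_search"] }, // profile + alsoAllow
    { deny: ["apply_patch"] },                              // global deny
    { deny: ["exec"] },                                     // agent deny
  ],
);
console.log(finalTools);
```

Because every step only removes tools, the order of deny-only layers does not change the result; only the allow-bearing layers decide what can survive at all.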

5.4 Owner-only Tools

Some tools are available only to authorized users (owners):

const OWNER_ONLY_TOOLS = [
  "gateway",      // restart the gateway
  "cron",         // scheduled jobs
  "nodes"         // node management
];

export function applyOwnerOnlyToolPolicy(
  tools: AnyAgentTool[],
  senderIsOwner: boolean
): AnyAgentTool[] {
  if (senderIsOwner) {
    return tools;  // owners may use every tool
  }

  return tools.filter(tool =>
    !OWNER_ONLY_TOOLS.includes(tool.name)
  );
}

Owner check:

// Read the list of authorized users from the config
const ownerNumbers = config.agents.defaults.identity?.ownerNumbers ?? [];
// ownerNumbers = ["+1234567890", "+0987654321"]

const senderE164 = "+1234567890";  // sender's phone number (E.164)
const senderIsOwner = ownerNumbers.includes(senderE164);

5.5 Sandboxed Tools

When an agent runs in a sandbox (a Docker container), file operations go through a bridge:

const sandbox: SandboxContext = {
  enabled: true,
  containerName: "openclaw-sandbox-abc123",
  workspaceDir: "/workspace",          // path on the host
  containerWorkdir: "/sandbox/workspace",  // path inside the container
  fsBridge: {
    read: async (path) => { /* read from the host */ },
    write: async (path, content) => { /* write to the host */ }
  }
};

// Create sandboxed versions of the tools
const sandboxedRead = createSandboxedReadTool({
  root: sandbox.workspaceDir,
  bridge: sandbox.fsBridge
});

const sandboxedWrite = createSandboxedWriteTool({
  root: sandbox.workspaceDir,
  bridge: sandbox.fsBridge
});

const sandboxedEdit = createSandboxedEditTool({
  root: sandbox.workspaceDir,
  bridge: sandbox.fsBridge
});

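File path mapping:

User request: read("src/index.ts")

In the sandbox:
  1. The agent, running in the container, resolves: /sandbox/workspace/src/index.ts
  2. fsBridge reads the host file: /workspace/src/index.ts
  3. The content is returned to the agent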
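
The mapping in step 2 is a prefix swap, but it needs a guard so that paths such as ../ cannot escape the workspace. The sketch below shows one way to do this with a small POSIX-style normalizer; both helper names are hypothetical, and the real fsBridge may enforce this differently.

```typescript
// Collapses "." and ".." segments in an absolute POSIX-style path.
function normalizePosixSketch(p: string): string {
  const out: string[] = [];
  for (const seg of p.split("/")) {
    if (seg === "" || seg === ".") continue;
    if (seg === "..") out.pop();
    else out.push(seg);
  }
  return "/" + out.join("/");
}

// Maps a container path to the matching host path, rejecting anything
// that falls outside the container workdir after normalization.
function containerToHostPath(
  containerPath: string,
  containerWorkdir: string,
  hostWorkspace: string,
): string {
  const root = normalizePosixSketch(containerWorkdir);
  const target = normalizePosixSketch(containerPath);
  if (target !== root && !target.startsWith(root + "/")) {
    throw new Error(`Path escapes sandbox workspace: ${containerPath}`);
  }
  return normalizePosixSketch(hostWorkspace + target.slice(root.length));
}

console.log(
  containerToHostPath("/sandbox/workspace/src/index.ts", "/sandbox/workspace", "/workspace"),
);
```

Normalizing before the prefix check is the important design point: checking the raw string would let /sandbox/workspace/../etc/passwd pass.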

Part 6: How Agents, Tools, Skills, and Memory Fit Together

6.1 The Complete Message Handling Flow

User message arrives
     ↓
┌─────────────────────────────────────┐
│ 1. Routing system                   │
│    - Resolve agentId from source    │
│    - Pick the agent to handle it    │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 2. Agent scope resolution           │
│    - resolveSessionAgentIds()       │
│    - resolveAgentConfig()           │
│    - resolveAgentWorkspaceDir()     │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 3. Tool preparation                 │
│    - createOpenClawCodingTools()    │
│    - 9-layer policy filtering       │
│    - Sandbox wrapping (if enabled)  │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 4. Skills loading                   │
│    - loadSkillEntries()             │
│    - filterSkillEntries()           │
│    - buildSkillsPrompt()            │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 5. System prompt construction       │
│    - buildAgentSystemPrompt()       │
│    - Inject tool list               │
│    - Inject skills prompt           │
│    - Inject project context files   │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 6. Compaction check                 │
│    - estimateMessagesTokens()       │
│    - Over threshold → compact       │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 7. Model fallback call              │
│    - runWithModelFallback()         │
│    - Primary fails → fall back      │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 8. LLM processing                   │
│    - Read the system prompt         │
│    - Select a skill (if relevant)   │
│    - Call tools                     │
│    - Generate the response          │
└─────────────────────────────────────┘
     ↓
┌─────────────────────────────────────┐
│ 9. Memory write (if needed)         │
│    - Update MEMORY.md               │
│    - Create memory/<date>.md        │
│    - Index new content (vector+FTS) │
└─────────────────────────────────────┘
     ↓
User receives the response

6.2 Configuration Inheritance and Overrides

Global Defaults
     ↓
Agent Overrides
     ↓
Session Overrides
     ↓
Tool Call Context

Example:

config.json:
  agents.defaults.model.primary: "anthropic/claude-opus-4-6"
  agents.defaults.thinkingDefault: "low"
  agents.list[1].model: "anthropic/claude-sonnet-4-5"  ← agent override

Effective settings (agent "code-reviewer"):
  model: "anthropic/claude-sonnet-4-5"  ✅ taken from the agent config
  thinkingDefault: "low"                ✅ inherited from defaults
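
The layering can be sketched as an ordered merge in which each later layer overrides only the keys it defines. This is a minimal illustration, not the body of `resolveAgentConfig()`: the `AgentSettings` type, `SETTING_KEYS`, and `mergeLayers` are hypothetical names, and the model is flattened to a single string for brevity.

```typescript
// Hypothetical settings shape for illustration; not OpenClaw's real schema.
type AgentSettings = {
  model?: string;
  thinkingDefault?: string;
  workspace?: string;
};

const SETTING_KEYS = ["model", "thinkingDefault", "workspace"] as const;

// Apply layers in order; a later layer overrides an earlier one,
// but only for the keys it actually defines.
function mergeLayers(...layers: AgentSettings[]): AgentSettings {
  const result: AgentSettings = {};
  for (const layer of layers) {
    for (const key of SETTING_KEYS) {
      const value = layer[key];
      if (value !== undefined) {
        result[key] = value;
      }
    }
  }
  return result;
}

const globalDefaults: AgentSettings = {
  model: "anthropic/claude-opus-4-6",
  thinkingDefault: "low",
};
const agentOverrides: AgentSettings = { model: "anthropic/claude-sonnet-4-5" };
const sessionOverrides: AgentSettings = {}; // nothing overridden in this session

const effective = mergeLayers(globalDefaults, agentOverrides, sessionOverrides);
console.log(effective);
```

Skipping undefined values is the design choice that matters here: an agent entry that omits `thinkingDefault` leaves the default in place rather than erasing it.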

6.3 Isolation Between Agents

Each agent has its own:

Resource	Example path
Workspace directory	`~/.openclaw/workspace-<agentId>`
Agent directory	`~/.openclaw/agents/<agentId>/agent`
Session history	`~/.openclaw/agents/<agentId>/sessions/*.jsonl`
Memory file	`~/.openclaw/agents/<agentId>/agent/MEMORY.md`
Skills	`~/.openclaw/agents/<agentId>/agent/skills/`
Auth Profiles	`~/.openclaw/agents/<agentId>/auth-profiles.json`

Example:

~/.openclaw/
├── agents/
│   ├── default/
│   │   ├── agent/
│   │   │   ├── MEMORY.md
│   │   │   ├── memory/
│   │   │   └── skills/
│   │   ├── sessions/
│   │   │   ├── abc123.jsonl
│   │   │   └── def456.jsonl
│   │   └── auth-profiles.json
│   │
│   ├── code-reviewer/
│   │   ├── agent/
│   │   │   ├── MEMORY.md
│   │   │   └── skills/
│   │   ├── sessions/
│   │   └── auth-profiles.json
│   │
│   └── research/
│       ├── agent/
│       ├── sessions/
│       └── auth-profiles.json
│
├── workspace-default/          # workspace of the default agent
├── workspace-code-reviewer/    # workspace of the code-reviewer agent
└── workspace-research/         # workspace of the research agent
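
Because every path above is keyed by the agent ID, the layout can be derived with one small function. The sketch below only mirrors the table and tree shown here. `agentPaths` and the `AgentPaths` type are hypothetical names, and the real `resolveAgentWorkspaceDir()` may also honor a configured `workspace` value, which this sketch ignores.

```typescript
import * as os from "os";
import * as path from "path";

// Hypothetical result shape; field names are chosen for this example.
type AgentPaths = {
  workspaceDir: string;
  agentDir: string;
  sessionsDir: string;
  memoryFile: string;
  skillsDir: string;
  authProfilesFile: string;
};

// Derive every per-agent location from the agent ID.
// stateDir defaults to ~/.openclaw, as in the tree above.
function agentPaths(
  agentId: string,
  stateDir: string = path.join(os.homedir(), ".openclaw"),
): AgentPaths {
  const agentRoot = path.join(stateDir, "agents", agentId);
  const agentDir = path.join(agentRoot, "agent");
  return {
    workspaceDir: path.join(stateDir, `workspace-${agentId}`),
    agentDir,
    sessionsDir: path.join(agentRoot, "sessions"),
    memoryFile: path.join(agentDir, "MEMORY.md"),
    skillsDir: path.join(agentDir, "skills"),
    authProfilesFile: path.join(agentRoot, "auth-profiles.json"),
  };
}

console.log(agentPaths("code-reviewer"));
```

Two different agent IDs never produce the same path, which is what gives each agent its own sessions, memory, and credentials.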

Part 7: Advanced Features

7.1 Subagents

An agent can create subagents through the sessions_spawn tool:

// User request
"Please spawn a sub-agent to analyze this large codebase."

// The LLM calls the tool
sessions_spawn({
  agentId: "code-reviewer",
  runtime: "subagent",
  message: "Analyze the authentication module in src/auth/",
  thinking: "medium"
})

// Subagent created
sessionKey: "agent:code-reviewer:subagent:abc123"
parentSession: "agent:default:xyz789"
depth: 1

Subagent limits:

{
  "subagents": {
    "maxSpawnDepth": 2,           // at most 2 levels of nesting
    "maxChildrenPerAgent": 5,     // at most 5 subagents per agent
    "maxConcurrent": 1,           // 1 running at a time
    "archiveAfterMinutes": 60,    // auto-archive after 60 minutes
    "model": "anthropic/claude-sonnet-4-5",  // subagents use a cheaper model
    "thinking": "off"             // thinking mode disabled
  }
}

Subagent tool restrictions (Layer 9):

// Depth 1
deny: ["gateway", "sessions_spawn"]

// Depth 2+
deny: ["gateway", "sessions_spawn", "cron", "message"]
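
The depth-based deny lists and the spawn limits can be sketched as two small checks. The function names (`deniedToolsForDepth`, `filterToolsForDepth`, `canSpawn`) are hypothetical, and the sketch treats the deny list and the depth limit as independent checks, which is an assumption: the text does not say how the two interact.

```typescript
type SubagentLimits = {
  maxSpawnDepth: number;
  maxChildrenPerAgent: number;
};

// Deny list by nesting depth, copied from the Layer 9 rules above.
// Depth 0 is the top-level agent, which this layer does not restrict.
function deniedToolsForDepth(depth: number): string[] {
  if (depth <= 0) return [];
  if (depth === 1) return ["gateway", "sessions_spawn"];
  return ["gateway", "sessions_spawn", "cron", "message"];
}

// Remove every tool that the deny list for this depth names.
function filterToolsForDepth(tools: string[], depth: number): string[] {
  const denied = new Set(deniedToolsForDepth(depth));
  return tools.filter((tool) => !denied.has(tool));
}

// A spawn is allowed only if the child stays within the depth limit
// and the parent still has room for another child.
function canSpawn(
  parentDepth: number,
  currentChildren: number,
  limits: SubagentLimits,
): boolean {
  const childDepth = parentDepth + 1;
  return (
    childDepth <= limits.maxSpawnDepth &&
    currentChildren < limits.maxChildrenPerAgent
  );
}

const limits: SubagentLimits = { maxSpawnDepth: 2, maxChildrenPerAgent: 5 };
console.log(filterToolsForDepth(["read", "cron", "sessions_spawn"], 1));
console.log(canSpawn(0, 0, limits));
```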

7.2 Heartbeat

An agent can proactively run check tasks on a schedule:

{
  "heartbeat": {
    "every": "30m",           // every 30 minutes
    "activeHours": {
      "start": "09:00",       // run only between 09:00 and 18:00
      "end": "18:00",
      "timezone": "user"
    },
    "model": "anthropic/claude-haiku-3-5",  // use an inexpensive model
    "session": "main",
    "target": "telegram",
    "to": "+1234567890",
    "prompt": "Check if there are any urgent tasks. If nothing needs attention, reply HEARTBEAT_OK.",
    "ackMaxChars": 30,
    "lightContext": true      // load only HEARTBEAT.md
  }
}

Heartbeat flow:

Every 30 minutes (within active hours)
     ↓
Build a lightweight context (HEARTBEAT.md only)
     ↓
Call the LLM (using the Haiku model)
     ↓
The LLM reads HEARTBEAT.md and checks for tasks
     ↓
If there are no tasks:
  Response: "HEARTBEAT_OK"
  → Discarded (not sent to the user)

If there are tasks:
  Response: "⚠️ Critical bug in production!"
  → Sent to the user
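
The two decisions in this flow, whether to run at all and whether to forward the reply, can be sketched as follows. All function names are hypothetical. The sketch assumes a same-day active window, ignores the timezone setting, and assumes that `ackMaxChars` caps how long a reply may be and still count as a bare acknowledgement; the text does not define that field.

```typescript
type ActiveHours = { start: string; end: string }; // "HH:MM" strings

// Convert "HH:MM" to minutes since midnight.
function toMinutes(hhmm: string): number {
  const parts = hhmm.split(":");
  return Number(parts[0]) * 60 + Number(parts[1]);
}

// True when the given time falls inside the window.
// Assumes start < end on the same day; timezone handling is omitted.
function isWithinActiveHours(nowMinutes: number, hours: ActiveHours): boolean {
  return nowMinutes >= toMinutes(hours.start) && nowMinutes < toMinutes(hours.end);
}

// A reply is treated as an acknowledgement, and dropped, when it contains
// HEARTBEAT_OK and is no longer than ackMaxChars (assumed semantics).
function shouldDeliver(reply: string, ackMaxChars: number): boolean {
  const trimmed = reply.trim();
  const isAck = trimmed.includes("HEARTBEAT_OK") && trimmed.length <= ackMaxChars;
  return !isAck;
}

const hours: ActiveHours = { start: "09:00", end: "18:00" };
console.log(isWithinActiveHours(toMinutes("12:30"), hours));
console.log(shouldDeliver("HEARTBEAT_OK", 30));
console.log(shouldDeliver("⚠️ Critical bug in production!", 30));
```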

7.3 Context Pruning

Before Compaction runs, tool results can be pruned first:

{
  "contextPruning": {
    "mode": "cache-ttl",
    "ttl": "1h",                    // tool results expire after 1 hour
    "keepLastAssistants": 3,        // keep the last 3 assistant messages
    "softTrimRatio": 0.7,           // soft-trim at 70% context usage
    "hardClearRatio": 0.85,         // hard-clear at 85% context usage
    "minPrunableToolChars": 1000,   // prune only results of at least 1000 characters
    "tools": {
      "allow": ["read", "grep"],    // prune results from these tools only
      "deny": ["memory_search"]     // never prune results from these tools
    },
    "softTrim": {
      "maxChars": 5000,             // trim to 5000 characters
      "headChars": 2000,            // keep the first 2000 characters
      "tailChars": 2000             // keep the last 2000 characters
    },
    "hardClear": {
      "enabled": true,
      "placeholder": "[Tool result omitted due to context budget]"
    }
  }
}

Pruning example:

Tool result (15000 characters):
{
  "tool": "read",
  "result": "[15000 characters of file content]"
}

After soft trim (at most 5000 characters):
{
  "tool": "read",
  "result": "[first 2000 characters]... [middle omitted] ...[last 2000 characters]"
}

After hard clear (original content dropped, placeholder only):
{
  "tool": "read",
  "result": "[Tool result omitted due to context budget]"
}
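
A minimal sketch of the soft-trim and hard-clear decision, assuming the ratios are compared against the current context usage. It covers only the size and ratio rules: the TTL, `keepLastAssistants`, and the tool allow/deny lists are left out. `PruneConfig`, `OMISSION_MARKER`, `softTrim`, and `pruneToolResult` are hypothetical names.

```typescript
type PruneConfig = {
  softTrimRatio: number;
  hardClearRatio: number;
  minPrunableToolChars: number;
  softTrim: { maxChars: number; headChars: number; tailChars: number };
  hardClear: { enabled: boolean; placeholder: string };
};

const OMISSION_MARKER = "... [middle omitted] ...";

// Keep the head and tail of an oversized result and drop the middle.
function softTrim(text: string, cfg: PruneConfig["softTrim"]): string {
  if (text.length <= cfg.maxChars) return text;
  const head = text.slice(0, cfg.headChars);
  const tail = cfg.tailChars > 0 ? text.slice(-cfg.tailChars) : "";
  return head + OMISSION_MARKER + tail;
}

// Pick the pruning action from the context usage ratio (0..1).
function pruneToolResult(text: string, usageRatio: number, cfg: PruneConfig): string {
  if (text.length < cfg.minPrunableToolChars) return text; // too small to bother
  if (cfg.hardClear.enabled && usageRatio >= cfg.hardClearRatio) {
    return cfg.hardClear.placeholder;
  }
  if (usageRatio >= cfg.softTrimRatio) return softTrim(text, cfg.softTrim);
  return text;
}

const pruneConfig: PruneConfig = {
  softTrimRatio: 0.7,
  hardClearRatio: 0.85,
  minPrunableToolChars: 1000,
  softTrim: { maxChars: 5000, headChars: 2000, tailChars: 2000 },
  hardClear: { enabled: true, placeholder: "[Tool result omitted due to context budget]" },
};

const bigResult = "x".repeat(15000);
console.log(pruneToolResult(bigResult, 0.75, pruneConfig).length);
console.log(pruneToolResult(bigResult, 0.9, pruneConfig));
```

With `headChars` and `tailChars` both at 2000, a trimmed result holds 4000 characters of original content plus the marker, which stays under the 5000-character `maxChars` budget.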

7.4 Memory Flush

Before Compaction runs, important information is automatically written to Memory:

{
  "compaction": {
    "memoryFlush": {
      "enabled": true,
      "softThresholdTokens": 10000,      // trigger when within 10K tokens of the compaction threshold
      "forceFlushTranscriptBytes": "2mb",  // force a flush once the session file exceeds 2 MB
      "prompt": "Before we compress the conversation, save any important decisions, TODOs, or context to MEMORY.md.",
      "systemPrompt": "Focus on: project decisions, open questions, key identifiers, and action items."
    }
  }
}

Memory Flush flow:

Context usage reaches 70% (within 10K tokens of the compaction threshold)
     ↓
A Memory Flush message is inserted automatically:
  role: "user"
  content: "Before we compress the conversation, save important info to MEMORY.md."
     ↓
The LLM calls the memory_search and write tools
     ↓
Important information is written to MEMORY.md
     ↓
The Compaction flow continues
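
The trigger condition can be sketched as a pure function over the two thresholds in the config. `shouldFlushMemory`, `MemoryFlushConfig`, and `FlushInput` are hypothetical names, the byte limit is passed as a number rather than parsed from "2mb", and the caller is assumed to know the compaction threshold in tokens.

```typescript
type MemoryFlushConfig = {
  enabled: boolean;
  softThresholdTokens: number;
  forceFlushTranscriptBytes: number;
};

type FlushInput = {
  usedTokens: number;                 // current context size in tokens
  compactionThresholdTokens: number;  // point at which Compaction would start
  transcriptBytes: number;            // size of the session file on disk
};

// Flush when the context is within softThresholdTokens of the compaction
// threshold, or when the transcript file has grown past the byte limit.
function shouldFlushMemory(input: FlushInput, cfg: MemoryFlushConfig): boolean {
  if (!cfg.enabled) return false;
  const nearThreshold =
    input.usedTokens >= input.compactionThresholdTokens - cfg.softThresholdTokens;
  const transcriptTooLarge = input.transcriptBytes >= cfg.forceFlushTranscriptBytes;
  return nearThreshold || transcriptTooLarge;
}

const flushConfig: MemoryFlushConfig = {
  enabled: true,
  softThresholdTokens: 10000,
  forceFlushTranscriptBytes: 2 * 1024 * 1024, // "2mb" from the config above
};

console.log(
  shouldFlushMemory(
    { usedTokens: 141000, compactionThresholdTokens: 150000, transcriptBytes: 400000 },
    flushConfig,
  ),
);
```

The 150000-token threshold in the usage line is an illustrative value only, not a documented default.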

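Summary

Core Architecture

The Agents system is OpenClaw's central coordinator. It orchestrates each request through 5 subsystems:

Multi-agent configuration management - multi-tenancy, independent configuration, workspace isolation
System prompt construction - dynamically generates a context-tailored instruction set (tools, skills, memory, context)
Compaction - adaptive context compression, staged summarization, identifier preservation
Model Fallback - automatic model fallback, error classification, Auth Profile rotation
Tool preparation - 9-layer policy filtering, sandbox wrapping, Owner authorization

Key Data Flow

Config file (config.json)
     ↓
Multi-agent configuration (agent-scope.ts)
     ↓
Tool preparation (pi-tools.ts) + skill loading (skills/)
     ↓
System prompt construction (system-prompt.ts)
     ↓
Compaction check (compaction.ts)
     ↓
Model fallback call (model-fallback.ts)
     ↓
LLM processing
     ↓
Tool execution + Memory write
     ↓
Response to the user

Core File List

File	Purpose	Lines of code
`agent-scope.ts`	Agent scope resolution, config merging	~280 lines
`system-prompt.ts`	System prompt construction, dynamic injection	~700 lines
`compaction.ts`	Context compression, staged summarization	~450 lines
`model-fallback.ts`	Model fallback, error handling	~570 lines
`failover-error.ts`	Error classification, FailoverError	~260 lines
`pi-tools.ts`	Tool preparation, policy filtering	~540 lines
`tool-policy-pipeline.ts`	9-layer policy pipeline	~200 lines
`context.ts`	Context window management	~190 lines

Relationship to Other Systems

Agents system
     ├─ calls → Tools system (tool registration, execution)
     ├─ calls → Skills system (skill loading, injection)
     ├─ calls → Memory system (search, write)
     ├─ called by ← Gateway system (message routing)
     └─ called by ← Routing system (agent selection)

Design Strengths

Flexible multi-agent architecture - one instance supports multiple independent agents
Intelligent context management - adaptive compression plus Memory Flush
Highly available model fallback - automatic switch to a backup when the primary model fails
Fine-grained tool control - 9-layer policy filtering restricts each agent to its authorized tools
Dynamic system prompts - instructions generated from the current context

Together, these design choices make OpenClaw's Agents system a highly scalable, highly available, multi-tenant AI agent platform.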
