From Tuning Parameters to Tuning Water: An In-Depth OpenClaw Technical Guide to Scientifically "Farming" Your AI Crayfish

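As crayfish farming moves from the pond to the command line, we are witnessing an infrastructure revolution.

Chapter 1: OpenClaw Architecture: The Anatomy of This "Lobster"

1.1 Core Architecture: Not Just Another ChatBot Wrapper

Architecture Overview

OpenClaw is designed as a persistent agent system, which sets it apart from conventional session-based AI: the agent's state outlives any single conversation. The architecture persists memory across several layers: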

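┌──────────────────────────────────────────────────────────────────┐
│                   Agent Runtime (Claw Runtime)                   │
├──────────────────────────────────────────────────────────────────┤
│ Skill Scheduler │ Memory Manager │ Tool Executor │ Comm Gateway  │
├──────────────────────────────────────────────────────────────────┤
│             Memory Persistence Layer (Memory Stack)              │
│  • Short-term: current session context (max 128K tokens)         │
│  • Mid-term: MEMORY.md (structured memory store)                 │
│  • Long-term: vector database (Chroma/Weaviate)                  │
│  • Logs: Markdown log files organized by date                    │
├──────────────────────────────────────────────────────────────────┤
│                 Skill Ecosystem (Skill Registry)                 │
│  • Core skills: file operations, network requests, system calls  │
│  • Community skills: 300+ skill packages on GitHub               │
│  • Custom skills: user-written TypeScript/JS modules             │
└──────────────────────────────────────────────────────────────────┘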
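
One way to picture the layered memory stack is a chain of stores consulted from fastest to slowest. The sketch below is a hypothetical illustration, not OpenClaw's implementation: `MemoryLayer`, `MapLayer`, `LayeredMemory`, and the promote-on-hit behavior are all assumptions, and real long-term recall would use vector similarity rather than exact keys.

```typescript
// Hypothetical sketch: these names are illustrative, not OpenClaw APIs.
interface MemoryLayer {
  name: string;
  get(key: string): string | undefined;
  set(key: string, value: string): void;
}

// A trivial in-process layer backed by a Map. Real layers would wrap
// the session context, MEMORY.md, or a vector database.
class MapLayer implements MemoryLayer {
  private store = new Map<string, string>();
  constructor(public name: string) {}
  get(key: string): string | undefined { return this.store.get(key); }
  set(key: string, value: string): void { this.store.set(key, value); }
}

class LayeredMemory {
  // Layers are ordered from fastest (short-term) to slowest (long-term).
  constructor(private layers: MemoryLayer[]) {}

  // Return the first hit and copy it into every faster layer.
  recall(key: string): { value: string; source: string } | undefined {
    for (let i = 0; i < this.layers.length; i++) {
      const value = this.layers[i].get(key);
      if (value !== undefined) {
        for (let j = 0; j < i; j++) this.layers[j].set(key, value);
        return { value, source: this.layers[i].name };
      }
    }
    return undefined;
  }
}

const shortTermLayer = new MapLayer('short-term');
const midTermLayer = new MapLayer('mid-term');
const longTermLayer = new MapLayer('long-term');
const layeredMemory = new LayeredMemory([shortTermLayer, midTermLayer, longTermLayer]);

longTermLayer.set('preferred-language', 'TypeScript');
const firstRecall = layeredMemory.recall('preferred-language');  // found in long-term
const secondRecall = layeredMemory.recall('preferred-language'); // found in short-term after promotion
```

Promoting a hit into the faster layers is only one possible policy; it trades extra writes for cheaper repeated lookups.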

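Tech Stack Highlights

  • Runtime: Node.js 22+, ES modules
  • Process management: a Gateway daemon keeps agents running 24/7
  • Memory system: SQLite plus an OpenAI-compatible Embeddings API
  • Skill architecture: an extended Web Worker model that runs each skill in isolation

1.2 Configuration File Deep Dive: Your "Lobster" DNA

The core configuration file, ~/.openclaw/config.yaml, in detail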

# OpenClaw main configuration file
version: '2026.1'
claw:
  name: "code_assistant"  # agent identifier
  model_provider: "openai"  # supported: openai, anthropic, local (llama.cpp)
  model: "gpt-4o-mini"  # default model, recommended for everyday tasks
  
  # Memory system
  memory:
    type: "hybrid"  # hybrid|vector|file
    vector_store: 
      provider: "chroma"  # chroma|weaviate|qdrant
      path: "~/.openclaw/vector_db"
    file_memory:
      path: "~/.openclaw/memories"
      auto_summarize: true
      summary_interval: 1000  # summarize automatically every 1,000 memories
    
  # Skill system
  skills:
    auto_discover: true
    trusted_registries:
      - "https://registry.openclaw.ai"
      - "https://github.com/openclaw-community"
    security_level: "strict"  # strict|moderate|permissive
  
  # Execution environment
  execution:
    timeout: 300000  # per-skill execution timeout in ms (5 minutes)
    max_concurrent: 3  # maximum number of skills running concurrently
    sandbox: 
      enabled: true
      level: "isolated"  # isolated|restricted|trusted
    
  # Developer options
  developer:
    debug: false
    log_level: "info"  # debug|info|warn|error
    telemetry: false  # whether to send anonymous usage data
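
Because several of these fields accept only a fixed set of values, it can help to check a parsed configuration before starting an agent. The following is a minimal sketch under stated assumptions: the field names mirror the YAML above, but `ClawSettings`, `validateClawSettings`, and the specific checks are hypothetical and not part of OpenClaw. Parsing the YAML into an object is left out.

```typescript
// Hypothetical sketch: the shape mirrors part of config.yaml; the checks are assumptions.
interface ClawSettings {
  skills: { security_level: string };
  execution: {
    timeout: number;         // milliseconds
    max_concurrent: number;
    sandbox: { enabled: boolean; level: string };
  };
}

const SECURITY_LEVELS: readonly string[] = ['strict', 'moderate', 'permissive'];
const SANDBOX_LEVELS: readonly string[] = ['isolated', 'restricted', 'trusted'];

// Collect every problem instead of stopping at the first one.
function validateClawSettings(cfg: ClawSettings): string[] {
  const problems: string[] = [];
  if (SECURITY_LEVELS.indexOf(cfg.skills.security_level) === -1) {
    problems.push(`skills.security_level: unknown value "${cfg.skills.security_level}"`);
  }
  if (SANDBOX_LEVELS.indexOf(cfg.execution.sandbox.level) === -1) {
    problems.push(`execution.sandbox.level: unknown value "${cfg.execution.sandbox.level}"`);
  }
  if (!Number.isInteger(cfg.execution.timeout) || cfg.execution.timeout <= 0) {
    problems.push('execution.timeout: must be a positive integer (milliseconds)');
  }
  if (!Number.isInteger(cfg.execution.max_concurrent) || cfg.execution.max_concurrent < 1) {
    problems.push('execution.max_concurrent: must be an integer >= 1');
  }
  return problems;
}

// The values from the configuration file above pass all checks.
const articleSettings: ClawSettings = {
  skills: { security_level: 'strict' },
  execution: { timeout: 300000, max_concurrent: 3, sandbox: { enabled: true, level: 'isolated' } },
};
const articleProblems = validateClawSettings(articleSettings);
```

Returning a list of problems rather than throwing lets a caller report every misconfigured field in one pass.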

Sensitive settings go in environment variables (stored in a .env file):

# API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
LOCAL_LLM_URL=http://localhost:8080  # local model endpoint

# Keys for advanced features
TAVILY_API_KEY=...  # web search
SERPAPI_KEY=...     # search engine results
GITHUB_TOKEN=...    # GitHub integration

# Storage
OPENCLAW_DATA_DIR=~/.openclaw
OPENCLAW_CACHE_DIR=~/.cache/openclaw
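
A missing key typically surfaces only when the first model call fails, so a startup check can be useful. This is a small sketch, assuming each `model_provider` value needs exactly the variable shown; the `REQUIRED_ENV` mapping and the `missingEnvVars` helper are hypothetical, not an OpenClaw API.

```typescript
// Hypothetical mapping from model_provider to the variables it needs (an assumption).
const REQUIRED_ENV: Record<string, string[]> = {
  openai: ['OPENAI_API_KEY'],
  anthropic: ['ANTHROPIC_API_KEY'],
  local: ['LOCAL_LLM_URL'],
};

// Return the names of required variables that are unset or blank.
function missingEnvVars(
  provider: string,
  env: Record<string, string | undefined>
): string[] {
  const required = REQUIRED_ENV[provider];
  if (!required) {
    throw new Error(`Unknown model provider: ${provider}`);
  }
  return required.filter(name => (env[name] ?? '').trim() === '');
}

// Example with an explicit environment object.
const exampleEnv = { OPENAI_API_KEY: 'sk-example', ANTHROPIC_API_KEY: '' };
const missingForOpenAI = missingEnvVars('openai', exampleEnv);       // nothing missing
const missingForAnthropic = missingEnvVars('anthropic', exampleEnv); // the key is blank
```

In a Node.js process the second argument would normally be `process.env`; taking it as a parameter keeps the function easy to test.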

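1.3 Process Management and Monitoring: Keeping the "Lobster" Running Reliably

The Gateway Daemon in Detail

At the heart of OpenClaw is the Gateway daemon, which is responsible for:

  1. Managing the agent lifecycle
  2. Hot-loading and unloading skills
  3. Synchronizing the memory system to persistent storage
  4. Maintaining the external communication interfaces

Starting the Gateway

# Install the Gateway service
openclaw gateway --install

# Start the Gateway (as a system service)
sudo systemctl start openclaw-gateway

# Start in development mode (with hot reload)
openclaw gateway --dev --watch

Process Monitoring and Logs

# Check the Gateway status
openclaw gateway --status

# Follow the log in real time
tail -f ~/.openclaw/logs/gateway.log

# Detailed runtime metrics
openclaw metrics --format=json
# Sample output: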
# {
#   "uptime": 1234567,
#   "active_skills": 5,
#   "memory_usage_mb": 245,
#   "total_requests": 1234,
#   "avg_response_time_ms": 345
# }

Health Check Endpoint

The Gateway exposes a RESTful health check endpoint:

curl http://localhost:7456/health
# Returns: {"status":"healthy","version":"2026.1.2","agents":3}

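# Detailed system state. The URL is quoted so the shell does not treat "?" as a glob.
# Note: /debug/pprof/goroutine follows the path convention of Go's net/http/pprof;
# confirm that your Gateway build actually serves it.
curl 'http://localhost:7456/debug/pprof/goroutine?debug=2'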
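
A monitoring script usually wants to do more than print the health response. The sketch below validates a response body shaped like the sample above and turns it into a yes/no answer. `GatewayHealth`, `parseGatewayHealth`, and `isGatewayHealthy` are hypothetical names, and the assumption that "healthy" is the only good status comes solely from the sample output.

```typescript
// Hypothetical sketch: the shape is taken from the sample /health response above.
interface GatewayHealth {
  status: string;
  version: string;
  agents: number;
}

// Parse and validate the response body; throw on anything unexpected.
function parseGatewayHealth(body: string): GatewayHealth {
  const data: unknown = JSON.parse(body);
  if (typeof data !== 'object' || data === null) {
    throw new Error('Health response is not a JSON object');
  }
  const { status, version, agents } = data as Record<string, unknown>;
  if (typeof status !== 'string' || typeof version !== 'string' || typeof agents !== 'number') {
    throw new Error('Health response is missing status, version, or agents');
  }
  return { status, version, agents };
}

// Healthy means the Gateway says so and at least minAgents agents are running.
function isGatewayHealthy(health: GatewayHealth, minAgents = 1): boolean {
  return health.status === 'healthy' && health.agents >= minAgents;
}

const sampleHealth = parseGatewayHealth('{"status":"healthy","version":"2026.1.2","agents":3}');
const sampleIsHealthy = isGatewayHealthy(sampleHealth);
```

The body string would come from an HTTP client such as `fetch`; it is passed in here so the parsing logic can be exercised without a running Gateway.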

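Chapter 2: Skill Development in Practice: Building Your Own Toolchain

2.1 Skill Architecture Deep Dive

Standard Skill Module Layout

my-awesome-skill/
├── package.json          # skill metadata
├── src/
│   ├── index.ts         # main entry point
│   ├── types.ts         # TypeScript type definitions
│   ├── handlers/        # handler functions
│   └── utils/           # utility functions
├── config/
│   └── skill.yaml       # skill configuration
├── tests/               # unit tests
└── README.md           # usage documentation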

Key package.json Settings

{
  "name": "@yourname/skill-git-helper",
  "version": "1.0.0",
  "description": "Skill that enhances Git operations",
  "type": "module",
  "main": "dist/index.js",
  "claw": {
    "type": "action",
    "permissions": [
      "filesystem:read",
      "filesystem:write",
      "network:internal",
      "process:execute"
    ],
    "triggers": [
      {
        "type": "command",
        "command": "git.*",
        "description": "Handles git-related commands"
      },
      {
        "type": "cron",
        "schedule": "0 */2 * * *",
        "description": "Commits automatically every 2 hours"
      }
    ],
    "input_schema": {
      "type": "object",
      "properties": {
        "operation": {
          "type": "string",
          "enum": ["commit", "push", "branch", "status"]
        },
        "message": {
          "type": "string",
          "description": "Commit message"
        }
      }
    }
  }
}
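
To make the `triggers` and `permissions` fields more concrete, here is a small sketch of how a dispatcher might use them. It assumes that the `command` field is a regular expression that must match the whole command name; the manifest above does not say whether it is a regex or a glob, and `matchesCommandTrigger` and `missingPermissions` are hypothetical helpers.

```typescript
// Hypothetical sketch: not OpenClaw's dispatcher.
interface CommandTrigger {
  type: 'command';
  command: string;        // assumed to be a regular expression
  description?: string;
}

// Anchor the pattern so it must match the entire command name.
function matchesCommandTrigger(trigger: CommandTrigger, commandName: string): boolean {
  return new RegExp(`^(?:${trigger.command})$`).test(commandName);
}

// Return the permissions an action needs that the manifest does not declare.
function missingPermissions(declared: string[], needed: string[]): string[] {
  const granted = new Set(declared);
  return needed.filter(permission => !granted.has(permission));
}

const gitTrigger: CommandTrigger = { type: 'command', command: 'git.*' };
const declaredPermissions = ['filesystem:read', 'filesystem:write', 'network:internal', 'process:execute'];

const matchesGitCommit = matchesCommandTrigger(gitTrigger, 'git.commit'); // true
const matchesGitlab = matchesCommandTrigger(gitTrigger, 'gitlab');        // also true
const notDeclared = missingPermissions(declaredPermissions, ['filesystem:read', 'network:external']);
```

Under the regex reading, `git.*` also matches names such as `gitlab`, because `.` matches any character and `*` allows zero repetitions. If only a dotted `git.` namespace is intended, a pattern such as `git\..*` would be stricter.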

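2.2 Building Your First Production-Grade Skill

Example: Developing a Smart Git Assistant Skill

// src/index.ts
import { Skill, ActionContext } from '@openclaw/core';
import { exec, execFile } from 'child_process';
import { promisify } from 'util';
import { analyzeCommitMessage, generateBranchName } from './ai-helper';

const execute = promisify(exec);
// execFile bypasses the shell, so AI-generated text can be passed as an argument safely
const executeFile = promisify(execFile);

export default new Skill({
  name: 'git-ai-helper',
  version: '1.0.0',
  
  // Skill description, used by the agent for automatic discovery
  description: 'AI-enhanced Git assistant with smart commits, branch management, and more',
  
  // Actions the skill can perform
  actions: {
    // Smart commit: generate a suitable commit message automatically
    async smartCommit(ctx: ActionContext, params: { 
      stageAll?: boolean;
      emoji?: boolean;
    } = {}) {
      const { stageAll = true, emoji = true } = params;
      
      try {
        // 1. Read the git status
        const { stdout: statusOutput } = await execute('git status --porcelain');
        // Do not trim() the whole output: the first status column may be a space,
        // and stripping it would shift the path offset used by substring(3)
        const changes = statusOutput.split('\n').filter(line => line.trim());
        
        if (changes.length === 0) {
          return { success: false, message: 'No changes to commit' };
        }
        
        // 2. Stage files
        if (stageAll) {
          await execute('git add .');
        } else {
          // Let the AI choose which files to stage
          const filesToStage = await ctx.ask(
            'The following files have changes. Choose the files to commit:\n' + 
            changes.map(c => `- ${c.substring(3)}`).join('\n')
          );
          // ... handle the selection
        }
        
        // 3. Generate the commit message with AI
        const diff = await this.getStagedDiff();
        const commitMessage = await analyzeCommitMessage(diff, { emoji });
        
        // 4. Commit. The message is passed as an argument rather than interpolated
        //    into a shell string, so quotes or $(...) in it cannot run commands
        await executeFile('git', ['commit', '-m', commitMessage]);
        
        // 5. Record the commit in the memory system
        await ctx.memory.add({
          type: 'git_commit',
          message: commitMessage,
          timestamp: new Date().toISOString(),
          files: changes.map(c => c.substring(3))
        });
        
        return { 
          success: true, 
          message: `Commit succeeded: ${commitMessage}`,
          commitMessage 
        };
        
      } catch (error) {
        ctx.logger.error('Git commit failed:', error);
        return { 
          success: false, 
          error: error instanceof Error ? error.message : String(error),
          suggestion: 'Check your git configuration and network connection'
        };
      }
    },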
    
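    // Automatic branch management
    async createFeatureBranch(ctx: ActionContext, params: { 
      issue?: string;
      description: string;
    }) {
      const { issue, description } = params;
      
      // Use AI to generate a meaningful branch name
      const branchName = await generateBranchName(description, { issue });
      
      // The name is interpolated into shell commands below, so accept only
      // letters, digits, and . _ / - (and no leading dash)
      if (!/^[\w.\/][\w.\/-]*$/.test(branchName)) {
        throw new Error(`Unsafe branch name: ${branchName}`);
      }
      
      // Create the new branch and switch to it
      await execute(`git checkout -b ${branchName}`);
      
      // Set the upstream branch (if a remote exists)
      if (await this.hasRemote()) {
        await execute(`git push -u origin ${branchName}`);
      }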
      
      return {
        success: true,
        branch: branchName,
        commands: [
          `git checkout -b ${branchName}`,
          `git push -u origin ${branchName}`
        ]
      };
    },
    
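    // Automatic PR generation
    async generatePullRequest(ctx: ActionContext, params: {
      title?: string;
      template?: string;
    }) {
      // Get the current branch and the diff against main
      const { stdout: currentBranch } = await execute('git branch --show-current');
      const { stdout: diffMain } = await execute('git diff main HEAD --stat');
      
      // Generate the PR description with AI
      const prDescription = await ctx.llm.generate(`
        Generate a PR description based on the following changes:
        Branch: ${currentBranch.trim()}
        Change statistics: ${diffMain}
        
        The PR description should contain these sections:
        1. Summary of changes
        2. What was modified
        3. Testing suggestions
        4. Related issues
        
        Use Markdown.
      `);
      
      // The GitHub API could be integrated here to create the PR automatically
      // const pr = await githubApi.createPR(...);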
      
      return {
        success: true,
        branch: currentBranch.trim(),
        description: prDescription,
        // prUrl: pr.html_url
      };
    }
  },
  
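  // Scheduled tasks
  schedules: {
    // Pull the latest code automatically every day
    autoPull: {
      cron: '0 9,18 * * *',  // every day at 09:00 and 18:00
      async handler(ctx) {
        try {
          await execute('git pull --rebase');
          ctx.logger.info('Automatic pull succeeded');
        } catch (error) {
          ctx.logger.warn('Automatic pull failed; there may be conflicts to resolve manually');
        }
      }
    },
    
    // Clean up stale branches every week
    cleanupBranches: {
      cron: '0 0 * * 0',  // every Sunday at 00:00
      async handler(ctx) {
        // Filter in JavaScript: piping through grep -v makes exec reject when
        // nothing is left (grep exits with 1) and also drops every branch
        // whose name merely contains "main"
        const { stdout: branches } = await execute('git branch --merged main');
        
        const toDelete = branches
          .split('\n')
          .filter(line => line.trim() && !line.startsWith('*'))  // skip the current branch
          .map(line => line.trim())
          .filter(name => name !== 'main');
        
        for (const branchName of toDelete) {
          await execute(`git branch -d ${branchName}`);
          ctx.logger.info(`Deleted merged branch: ${branchName}`);
        }
      }
    }
  },
  
  // Helper methods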
  methods: {
    async getStagedDiff(): Promise<string> {
      const { stdout } = await execute('git diff --cached --no-color');
      return stdout;
    },
    
    async hasRemote(): Promise<boolean> {
      try {
        await execute('git remote get-url origin');
        return true;
      } catch {
        return false;
      }
    }
  }
});

Skill Test Suite

// tests/git-helper.test.ts
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { exec, execFile } from 'child_process';
import gitSkill from '../src/index';

// Replace child_process with callback-style mocks, so that promisify(...)
// in the skill resolves with { stdout, stderr }
vi.mock('child_process', () => ({ exec: vi.fn(), execFile: vi.fn() }));
// Stub the AI helper so the tests never call a real model
vi.mock('../src/ai-helper', () => ({
  analyzeCommitMessage: vi.fn(async () => 'AI-generated commit message'),
  generateBranchName: vi.fn(async () => 'feature/test'),
}));

// Make every mocked git call answer with the stdout chosen by stdoutFor.
// The last argument is always the callback; the rest form the command line.
function mockGit(stdoutFor: (cmd: string) => string) {
  const impl = (...args: any[]) => {
    const callback = args[args.length - 1];
    const cmd = args.slice(0, -1).flat().join(' ');
    callback(null, { stdout: stdoutFor(cmd), stderr: '' });
  };
  vi.mocked(exec).mockImplementation(impl as any);
  vi.mocked(execFile).mockImplementation(impl as any);
}

describe('Git AI Helper Skill', () => {
  let mockContext: any;
  
  beforeEach(() => {
    vi.clearAllMocks();
    mockContext = {
      memory: { add: vi.fn() },
      logger: { info: vi.fn(), warn: vi.fn(), error: vi.fn() },
      llm: { generate: vi.fn(async () => 'AI-generated commit message') },
    };
  });
  
  describe('smartCommit', () => {
    it('commits changes successfully', async () => {
      // Porcelain format: two status columns, a space, then the path
      mockGit(cmd =>
        cmd.includes('status') ? 'M  src/index.ts\nA  src/utils.ts\n' : ''
      );
      
      const result = await gitSkill.actions.smartCommit(mockContext);
      
      expect(result.success).toBe(true);
      expect(mockContext.memory.add).toHaveBeenCalled();
    });
    
    it('reports when there is nothing to commit', async () => {
      mockGit(() => '');
      
      const result = await gitSkill.actions.smartCommit(mockContext);
      
      expect(result.success).toBe(false);
      expect(result.message).toContain('No changes to commit');
    });
  });
});

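2.3 Publishing and Distributing Skills

Publishing to the OpenClaw Skill Marketplace

  1. Prepare the release
# Build the skill
npm run build

# Run the tests
npm test

# Generate the skill manifest and package
openclaw skill pack --output dist/skill.tar.gz
  2. Publish to the registry
# Log in to the skill registry
openclaw registry login --token YOUR_PUBLISH_TOKEN

# Publish the skill
openclaw registry publish dist/skill.tar.gz \
  --visibility=public \
  --category="developer-tools"
  3. Manage versions

    OpenClaw uses semantic versioning. When updating a skill, increment:

    • Major version: for incompatible API changes
    • Minor version: for backward-compatible new functionality
    • Patch version: for backward-compatible bug fixes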
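
The three rules translate directly into how the version string changes. A minimal sketch follows, handling only plain MAJOR.MINOR.PATCH strings without pre-release or build suffixes; `bumpVersion` is an illustrative helper, not an OpenClaw command.

```typescript
type ReleaseKind = 'major' | 'minor' | 'patch';

// Increment one component and reset the components to its right,
// as semantic versioning requires.
function bumpVersion(version: string, kind: ReleaseKind): string {
  const match = /^(\d+)\.(\d+)\.(\d+)$/.exec(version);
  if (!match) {
    throw new Error(`Not a plain MAJOR.MINOR.PATCH version: ${version}`);
  }
  const major = Number(match[1]);
  const minor = Number(match[2]);
  const patch = Number(match[3]);

  if (kind === 'major') return `${major + 1}.0.0`;
  if (kind === 'minor') return `${major}.${minor + 1}.0`;
  return `${major}.${minor}.${patch + 1}`;
}

const afterBreakingChange = bumpVersion('1.4.2', 'major'); // "2.0.0"
const afterNewFeature = bumpVersion('1.4.2', 'minor');     // "1.5.0"
const afterBugFix = bumpVersion('1.4.2', 'patch');         // "1.4.3"
```

Note that a minor bump resets the patch number and a major bump resets both, which is easy to forget when editing package.json by hand.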

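Chapter 3: Multi-Agent System Architecture: Building Your "Lobster Farm"

3.1 Multi-Agent Architecture Design Patterns

Role-Based Agent Decomposition

# agents/orchestrator/config.yaml
agents:
  # Code expert agent
  code_expert:
    soul: |
      You are a senior full-stack engineer, fluent in TypeScript, Python, and Go.
      Your responsibilities:
      1. Writing, reviewing, and refactoring code
      2. Technical architecture design
      3. Performance optimization
      4. Code quality assurance
      
      Working principles:
      - Prefer TypeScript
      - Follow the SOLID principles
      - Write complete unit tests
      - Practice documentation-driven development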
    model: "gpt-4"
    skills:
      - "git-helper"
      - "code-reviewer"
      - "test-generator"
      - "docker-builder"
    workspace: "/workspace/code"
    
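  # System operations agent
  sysops:
    soul: |
      You are a system operations expert, focused on:
      1. Server deployment and maintenance
      2. Containerization and orchestration
      3. Monitoring and alerting
      4. Security hardening
      
      Working principles:
      - Principle of least privilege
      - Infrastructure as code
      - Automate everything
      - Security first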
    model: "claude-3-opus"
    skills:
      - "docker-manager"
      - "k8s-operator"
      - "monitoring-setup"
      - "security-scanner"
    workspace: "/workspace/infra"
    
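  # Project management agent
  project_manager:
    soul: |
      You are an agile project manager, responsible for:
      1. Task breakdown and assignment
      2. Progress tracking
      3. Risk management
      4. Team coordination
      
      Working principles:
      - Transparent communication
      - Data-driven decisions
      - Continuous improvement
      - Value first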
    model: "gpt-4o"
    skills:
      - "task-manager"
      - "jira-connector"
      - "meeting-scheduler"
      - "report-generator"
    workspace: "/workspace/projects"
    
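  # Coordinator agent
  coordinator:
    soul: |
      You are the team coordinator, responsible for:
      1. Receiving external requests
      2. Routing tasks
      3. Consolidating results
      4. Quality checks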
    model: "gpt-4o"
    skills:
      - "message-router"
      - "quality-checker"
      - "result-aggregator"

3.2 Inter-Agent Communication

An Event-Based Communication System

// src/event-bus.ts
import { EventEmitter } from 'events';
import { v4 as uuidv4 } from 'uuid';

export interface AgentMessage {
  id: string;
  from: string;
  to: string | string[];
  type: 'request' | 'response' | 'broadcast' | 'error';
  payload: any;
  timestamp: Date;
  correlationId?: string;
  requiresResponse?: boolean;
  timeout?: number;
}

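// An agent registered on the bus, together with its message handler
interface AgentConnection {
  id: string;
  handler: (msg: AgentMessage) => Promise<any>;
}

export class AgentEventBus extends EventEmitter {
  private agents: Map<string, AgentConnection> = new Map();
  private pendingResponses: Map<string, Function> = new Map();
  
  // Register an agent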
  registerAgent(agentId: string, handler: (msg: AgentMessage) => Promise<any>) {
    this.agents.set(agentId, { id: agentId, handler });
  }
  
  // Send a message
  async sendMessage(message: AgentMessage): Promise<AgentMessage | null> {
    const messageId = uuidv4();
    const fullMessage = {
      ...message,
      id: messageId,
      timestamp: new Date(),
    };
    
    // Broadcast: deliver to every agent except the sender
    if (message.to === 'broadcast') {
      const promises = Array.from(this.agents.values())
        .filter(agent => agent.id !== message.from)
        .map(agent => agent.handler(fullMessage).catch(err => ({
          id: uuidv4(),
          from: agent.id,
          to: message.from,
          type: 'error',
          payload: { error: err.message },
          timestamp: new Date(),
        })));
      
      const results = await Promise.allSettled(promises);
      return {
        id: uuidv4(),
        from: 'system',
        to: message.from,
        type: 'response',
        payload: results,
        timestamp: new Date(),
      };
    }
    
    // Unicast: deliver to a single agent
    const agent = this.agents.get(message.to as string);
    if (!agent) {
      throw new Error(`Agent ${message.to} not found`);
    }
    
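    // If a response is required, race the handler against a timeout
    if (message.requiresResponse) {
      return new Promise((resolve, reject) => {
        // Settle the promise and always clear the timer and the pending entry,
        // so a fast response does not leave a live timer behind
        const finish = (settle: () => void) => {
          clearTimeout(timeoutId);
          this.pendingResponses.delete(messageId);
          settle();
        };
        
        const timeoutId = setTimeout(() => {
          finish(() => reject(new Error(`Response timeout for message ${messageId}`)));
        }, message.timeout || 30000);
        
        this.pendingResponses.set(messageId, (response: AgentMessage) => {
          finish(() => resolve(response));
        });
        
        agent.handler(fullMessage).then(
          response => finish(() => resolve(response)),
          err => finish(() => reject(err))
        );
      });
    }
    
    // No response required: just deliver the message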
    return agent.handler(fullMessage);
  }
  
  // Request-response pattern
  async request<T = any>(
    from: string,
    to: string,
    action: string,
    params: any = {},
    options: { timeout?: number } = {}
  ): Promise<T> {
    const message: AgentMessage = {
      id: uuidv4(),
      from,
      to,
      type: 'request',
      payload: { action, params },
      timestamp: new Date(),  // required by AgentMessage; sendMessage overwrites it
      requiresResponse: true,
      timeout: options.timeout || 30000,
    };
    
    const response = await this.sendMessage(message);
    if (response?.type === 'error') {
      throw new Error(response.payload.error);
    }
    
    return response?.payload as T;
  }
}

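// Usage example
// reviewCode and deployToK8s stand for application-defined helpers that are not shown here
const bus = new AgentEventBus();

// Agent 1: code expert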
bus.registerAgent('code_expert', async (msg) => {
  if (msg.payload.action === 'reviewCode') {
    const code = msg.payload.params.code;
    const issues = await reviewCode(code);
    return {
      id: uuidv4(),
      from: 'code_expert',
      to: msg.from,
      type: 'response',
      payload: { issues, suggestions: [] },
    };
  }
});

// Agent 2: system operations
bus.registerAgent('sysops', async (msg) => {
  if (msg.payload.action === 'deploy') {
    const result = await deployToK8s(msg.payload.params.service);
    return {
      id: uuidv4(),
      from: 'sysops',
      to: msg.from,
      type: 'response',
      payload: result,
    };
  }
});

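// The coordinator agent dispatches the tasks
async function handleDeployRequest(serviceConfig: { code: string; [key: string]: unknown }) {
  // 1. Ask the code expert to review the code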
  const reviewResult = await bus.request(
    'coordinator',
    'code_expert',
    'reviewCode',
    { code: serviceConfig.code }
  );
  
  if (reviewResult.issues.length > 0) {
    return { status: 'failed', issues: reviewResult.issues };
  }
  
  // 2. Ask system operations to deploy
  const deployResult = await bus.request(
    'coordinator',
    'sysops',
    'deploy',
    { service: serviceConfig }
  );
  
  // 3. Consolidate the results
  return {
    status: 'success',
    codeReview: reviewResult,
    deployment: deployResult,
  };
}

3.3 Load Balancing and Failover

Intelligent Routing Strategy

// src/router.ts
export class IntelligentRouter {
  private agentStats: Map<string, AgentStatistics> = new Map();
  private skillMap: Map<string, string[]> = new Map(); // skill -> list of agents
  
  // Choose the best agent based on several factors
  async selectBestAgent(
    requiredSkills: string[],
    context: RoutingContext
  ): Promise<string | null> {
    
    const candidateAgents = this.findAgentsWithSkills(requiredSkills);
    
    if (candidateAgents.length === 0) {
      return null;
    }
    
    if (candidateAgents.length === 1) {
      return candidateAgents[0];
    }
    
    // Score every candidate
    const scoredAgents = await Promise.all(
      candidateAgents.map(async agentId => ({
        agentId,
        score: await this.calculateAgentScore(agentId, context),
      }))
    );
    
    // Pick the agent with the highest score
    scoredAgents.sort((a, b) => b.score - a.score);
    return scoredAgents[0].agentId;
  }
  
  private async calculateAgentScore(
    agentId: string,
    context: RoutingContext
  ): Promise<number> {
    const stats = this.agentStats.get(agentId) || this.getDefaultStats();
    const agentConfig = this.getAgentConfig(agentId);
    
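    let score = 100; // base score
    
    // 1. Load factor (3-30 points, because the load ratio is capped at 0.9)
    const loadFactor = 1 - Math.min(stats.currentLoad / stats.maxCapacity, 0.9);
    score += loadFactor * 30;
    
    // Fall back to defaults only when the agent has no history yet. Using
    // `value || default` would also replace a genuine 0, so an agent whose
    // requests all failed would be scored as if 95% had succeeded.
    const hasHistory = stats.totalRequests > 0;
    
    // 2. Response time factor (0-25 points)
    const avgResponseTime = hasHistory ? stats.avgResponseTime : 1000;
    const responseFactor = Math.max(0, 1 - avgResponseTime / 5000);
    score += responseFactor * 25;
    
    // 3. Success rate factor (0-20 points)
    const successRate = hasHistory ? stats.successRate : 0.95;
    score += successRate * 20;
    
    // 4. Skill match (0-15 points)
    const skillMatch = this.calculateSkillMatch(agentId, context.requiredSkills);
    score += skillMatch * 15;
    
    // 5. Cost factor (0-10 points)
    const costFactor = 1 - (agentConfig.costPerRequest || 0) / 0.1;
    score += Math.max(0, costFactor) * 10;
    
    // 6. Specialization bonus (a specialist agent handling a task in its specialty)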
    if (agentConfig.specialization === context.taskType) {
      score += 20;
    }
    
    return score;
  }
  
  // Update the agent's statistics after a task completes
  updateAgentStats(agentId: string, result: TaskResult) {
    const stats = this.agentStats.get(agentId) || this.getDefaultStats();
    
    stats.totalRequests++;
    stats.totalTime += result.duration;
    stats.avgResponseTime = stats.totalTime / stats.totalRequests;
    
    if (result.success) {
      stats.successfulRequests++;
    } else {
      stats.failedRequests++;
    }
    
    stats.successRate = stats.successfulRequests / stats.totalRequests;
    stats.currentLoad = result.currentLoad;
    
    this.agentStats.set(agentId, stats);
  }
}
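
To see how the weights interact, it helps to run the scoring formula on concrete numbers. The standalone function below restates the arithmetic of calculateAgentScore with all inputs passed in directly; `ScoreInputs`, `routingScore`, and the two sample agents are made up for illustration.

```typescript
// Standalone restatement of the scoring arithmetic; the inputs are hypothetical.
interface ScoreInputs {
  currentLoad: number;
  maxCapacity: number;
  avgResponseTimeMs: number;
  successRate: number;      // 0..1
  skillMatch: number;       // 0..1
  costPerRequest: number;   // same unit as the 0.1 ceiling in the formula
  specialized: boolean;     // does the agent's specialization match the task type?
}

function routingScore(x: ScoreInputs): number {
  let score = 100;                                                    // base score
  score += (1 - Math.min(x.currentLoad / x.maxCapacity, 0.9)) * 30;   // load
  score += Math.max(0, 1 - x.avgResponseTimeMs / 5000) * 25;          // response time
  score += x.successRate * 20;                                        // success rate
  score += x.skillMatch * 15;                                         // skill match
  score += Math.max(0, 1 - x.costPerRequest / 0.1) * 10;              // cost
  if (x.specialized) score += 20;                                     // specialization bonus
  return score;
}

// 100 + 24 + 20 + 19 + 15 + 8 + 20 = 206 (up to floating-point rounding)
const idleSpecialistScore = routingScore({
  currentLoad: 2, maxCapacity: 10, avgResponseTimeMs: 1000,
  successRate: 0.95, skillMatch: 1, costPerRequest: 0.02, specialized: true,
});

// 100 + 3 + 5 + 16 + 7.5 + 5 + 0 = 136.5 (up to floating-point rounding)
const busyGeneralistScore = routingScore({
  currentLoad: 9, maxCapacity: 10, avgResponseTimeMs: 4000,
  successRate: 0.8, skillMatch: 0.5, costPerRequest: 0.05, specialized: false,
});
```

With a base of 100 and at most 30 + 25 + 20 + 15 + 10 + 20 = 120 points on top, scores top out at 220. Since every agent receives the same base, only the 120 variable points affect the ranking.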

Chapter 4: Advanced Topics and Optimization Strategies

4.1 Performance Optimization in Practice

Memory System Optimization

// Optimized memory system configuration
const memoryConfig = {
  // Tiered storage strategy
  storage: {
    levels: [
      {
        name: 'hot',
        maxSize: 1000,  // 1,000 records
        ttl: 3600000,   // 1 hour
        storage: 'memory',
      },
      {
        name: 'warm', 
        maxSize: 10000, // 10,000 records
        ttl: 86400000,  // 24 hours
        storage: 'sqlite',
        index: ['timestamp', 'type', 'tags'],
      },
      {
        name: 'cold',
        maxSize: 100000, // 100,000 records
        ttl: 604800000,  // 7 days
        storage: 'vector',
        compression: true,
      },
    ],
  },
  
  // Intelligent memory management
  management: {
    // Automatic summarization
    autoSummarize: {
      enabled: true,
      threshold: 50,  // summarize after every 50 related memories
      model: 'gpt-4o-mini',
      maxTokens: 500,
    },
    
    // Memory importance scoring
    importanceScoring: {
      factors: [
        { field: 'accessCount', weight: 0.3 },
        { field: 'recency', weight: 0.25 },
        { field: 'relevanceScore', weight: 0.25 },
        { field: 'userFeedback', weight: 0.2 },
      ],
      
      // Adapt the weights to observed usage patterns
      adaptiveWeights: true,
      learningRate: 0.1,
    },
    
    // Cache tuning
    caching: {
      enabled: true,
      strategy: 'lru',  // least-recently-used eviction
      maxSize: 100,
      preload: ['frequent', 'recent'],
    },
  },
  
  // Vector search tuning
  vectorSearch: {
    indexType: 'hnsw',  // Hierarchical Navigable Small World
    m: 16,  // connections per node
    efConstruction: 200,  // size of the dynamic candidate list at build time
    efSearch: 100,  // size of the dynamic candidate list at query time
    
    // Quantization
    quantization: {
      enabled: true,
      type: 'product',  // Product Quantization
      nbits: 8,  // bits per sub-vector
      nsub: 16,  // number of sub-vectors
    },
    
    // Filtering
    filters: {
      metadata: true,
      timeRange: true,
      importanceThreshold: 0.7,
    },
  },
};
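
Two pieces of this configuration reduce to simple arithmetic: the importance score is a weighted sum of four signals, and the storage tier follows from a record's age. The sketch below assumes that every signal is already normalized to the range 0 to 1 and that each TTL is measured from the record's creation time. Neither assumption is stated by the configuration itself, and `importanceScore` and `storageTier` are hypothetical helpers.

```typescript
// Hypothetical sketch: signals are assumed to be normalized to 0..1.
interface MemorySignals {
  accessCount: number;
  recency: number;
  relevanceScore: number;
  userFeedback: number;
}

// Weights copied from importanceScoring.factors; they sum to 1.
const IMPORTANCE_WEIGHTS: MemorySignals = {
  accessCount: 0.3,
  recency: 0.25,
  relevanceScore: 0.25,
  userFeedback: 0.2,
};

function importanceScore(s: MemorySignals): number {
  return (
    s.accessCount * IMPORTANCE_WEIGHTS.accessCount +
    s.recency * IMPORTANCE_WEIGHTS.recency +
    s.relevanceScore * IMPORTANCE_WEIGHTS.relevanceScore +
    s.userFeedback * IMPORTANCE_WEIGHTS.userFeedback
  );
}

const HOUR_MS = 3_600_000;

// Map a record's age to the tier whose TTL still covers it.
function storageTier(ageMs: number): 'hot' | 'warm' | 'cold' | 'expired' {
  if (ageMs < HOUR_MS) return 'hot';            // ttl 3600000
  if (ageMs < 24 * HOUR_MS) return 'warm';      // ttl 86400000
  if (ageMs < 7 * 24 * HOUR_MS) return 'cold';  // ttl 604800000
  return 'expired';
}

// 0.27 + 0.125 + 0.2 + 0.12 = 0.715, just above importanceThreshold (0.7)
const sampleImportance = importanceScore({
  accessCount: 0.9, recency: 0.5, relevanceScore: 0.8, userFeedback: 0.6,
});
const sampleTier = storageTier(2 * HOUR_MS); // two hours old
```

Because the weights sum to 1, the score stays in the same 0 to 1 range as its inputs, which is what makes a fixed threshold such as 0.7 meaningful.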

Optimizing LLM Calls

class OptimizedLLMClient {
  private cache = new Map<string, { response: any; timestamp: number }>();
  private requestQueue: Array<{ 
    prompt: string; 
    resolve: Function; 
    reject: Function;
  }> = [];
  private isProcessing = false;
  
  constructor(
    private model: string,
    private options: {
      batchSize: number;
      maxConcurrent: number;
      cacheTtl: number;
      timeout: number;
    }
  ) {}
  
  // Handle a batch of prompts
  async generateBatch(
    prompts: string[],
    options?: any
  ): Promise<string[]> {
    
    // 1. Check the cache
    const cacheResults: (string | null)[] = [];
    const toFetch: { prompt: string; index: number }[] = [];
    
    prompts.forEach((prompt, index) => {
      const cacheKey = this.getCacheKey(prompt, options);
      const cached = this.cache.get(cacheKey);
      
      if (cached && Date.now() - cached.timestamp < this.options.cacheTtl) {
        cacheResults[index] = cached.response;
      } else {
        cacheResults[index] = null;
        toFetch.push({ prompt, index });
      }
    });
    
    if (toFetch.length === 0) {
      return cacheResults as string[];
    }
    
    // 2. Fetch the uncached prompts in batches
    const batchResults = await this.processBatch(
      toFetch.map(item => item.prompt),
      options
    );
    
    // 3. Merge the results and fill the cache
    toFetch.forEach((item, i) => {
      cacheResults[item.index] = batchResults[i];
      const cacheKey = this.getCacheKey(item.prompt, options);
      this.cache.set(cacheKey, {
        response: batchResults[i],
        timestamp: Date.now(),
      });
    });
    
    return cacheResults as string[];
  }
  
  private async processBatch(
    prompts: string[],
    options?: any
  ): Promise<string[]> {
    // Choose the batch size dynamically
    const batchSize = this.calculateOptimalBatchSize(prompts);
    const batches: string[][] = [];
    
    for (let i = 0; i < prompts.length; i += batchSize) {
      batches.push(prompts.slice(i, i + batchSize));
    }
    
    const results: string[] = [];
    
    // Run the batches one after another. Within a batch, send one request per
    // prompt concurrently, so batchSize bounds the number of in-flight calls.
    // Prompts cannot share a request: a chat-completion call treats its
    // messages as one conversation and returns a single completion.
    for (const [batchIndex, batch] of batches.entries()) {
      await Promise.all(
        batch.map(async (prompt, i) => {
          const originalIndex = batchIndex * batchSize + i;
          try {
            results[originalIndex] = await this.callLLMWithRetry({
              model: this.model,
              messages: [{ role: 'user' as const, content: prompt }],
              ...options,
            });
          } catch (error) {
            // The request failed even after retries: use the fallback response
            console.warn(`Prompt ${originalIndex} failed, using the fallback response`);
            results[originalIndex] = this.getFallbackResponse(prompt);
          }
        })
      );
    }
    
    return results;
  }
  
  private calculateOptimalBatchSize(prompts: string[]): number {
    const avgLength = prompts.reduce((sum, p) => sum + p.length, 0) / prompts.length;
    
    // Adjust the batch size to the average prompt length (in characters)
    if (avgLength < 100) return 10;
    if (avgLength < 500) return 5;
    if (avgLength < 1000) return 3;
    return 1;
  }
  
  private async callLLMWithRetry(
    request: any,
    retries = 3
  ): Promise<any> {
    for (let i = 0; i < retries; i++) {
      try {
        const controller = new AbortController();
        const timeoutId = setTimeout(
          () => controller.abort(),
          this.options.timeout
        );
        
        const response = await fetch('https://api.openai.com/v1/chat/completions', {
          method: 'POST',
          headers: {
            'Content-Type': 'application/json',
            'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
          },
          body: JSON.stringify(request),
          signal: controller.signal,
        });
        
        clearTimeout(timeoutId);
        
        if (!response.ok) {
          if (response.status === 429 && i < retries - 1) {
            // Rate limited: exponential backoff with jitter, capped at 10 s
            const delay = Math.min(1000 * Math.pow(2, i) + Math.random() * 1000, 10000);
            await new Promise(resolve => setTimeout(resolve, delay));
            continue;
          }
          throw new Error(`HTTP ${response.status}: ${response.statusText}`);
        }
        
        const data = await response.json();
        return data.choices[0].message.content;
        
      } catch (error) {
        if (i === retries - 1) throw error;
        
        // Any other error: linear backoff
        await new Promise(resolve => 
          setTimeout(resolve, 1000 * (i + 1))
        );
      }
    }
    
    throw new Error('All retries failed');
  }
}
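
The delay used after an HTTP 429 is easier to reason about when the jitter is separated from the deterministic part. The helper below restates the same expression with the jitter passed in as a parameter; `rateLimitBackoffMs` is an illustrative name, not part of the client above.

```typescript
// Same formula as the 429 branch above: min(1000 * 2^attempt + jitter, 10000).
// The jitter is a parameter so the deterministic part can be inspected.
function rateLimitBackoffMs(attempt: number, jitterMs: number): number {
  return Math.min(1000 * Math.pow(2, attempt) + jitterMs, 10000);
}

// Delays without jitter for attempts 0..4: 1000, 2000, 4000, 8000, 10000 (capped)
const backoffSchedule = [0, 1, 2, 3, 4].map(attempt => rateLimitBackoffMs(attempt, 0));

// In production the jitter would be random, for example Math.random() * 1000
const jitteredDelay = rateLimitBackoffMs(1, Math.random() * 1000);
```

The random jitter spreads out retries from clients that were throttled at the same moment. With the default of three retries, only attempts 0 and 1 ever wait, so the 10-second cap comes into play only if `retries` is raised.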

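4.2 Security Hardening Strategies

Skill Sandboxing and Permission Control

// src/sandbox.ts
// Note: the vm2 project was discontinued in 2023 after sandbox-escape
// vulnerabilities that its maintainers said could not be fixed. Treat this
// class as an illustration of the permission model, not as a security
// boundary; the vm2 maintainers point to isolated-vm as an alternative.
import { VM, VMScript } from 'vm2';
import { createHash } from 'crypto';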

export class SkillSandbox {
  private vm: VM;
  private permissions: Set<string>;
  private callHistory: Array<{ action: string; timestamp: Date }> = [];
  private rateLimits: Map<string, { count: number; resetAt: Date }> = new Map();
  
  constructor(
    private readonly skillId: string,
    permissions: string[] = [],
    // Stored as a property: validatePath, exec, and run read this.options
    private readonly options: SandboxOptions = {}
  ) {
    this.permissions = new Set(permissions);
    
    this.vm = new VM({
      timeout: options.timeout || 5000,
      sandbox: this.createSandboxObject(),
      eval: false,
      wasm: false,
      fixAsync: true,
      
      // Strict limits
      ...options.restrictions,
    });
  }
  
  private createSandboxObject(): any {
    const sandbox: any = {};
    
    // Basic utilities (safe versions)
    sandbox.console = {
      log: (...args: any[]) => this.log('info', args),
      error: (...args: any[]) => this.log('error', args),
      warn: (...args: any[]) => this.log('warn', args),
    };
    
    // Guarded fetch
    sandbox.fetch = async (url: string, init?: RequestInit) => {
      this.checkPermission('network:external');
      this.checkRateLimit('fetch');
      
      const parsedUrl = new URL(url);
      
      // Check the domain against the allowlist
      if (!this.isAllowedDomain(parsedUrl.hostname)) {
        throw new Error(`Domain ${parsedUrl.hostname} is not allowed`);
      }
      
      // Set an identifying User-Agent header
      const safeInit = {
        ...init,
        headers: {
          ...init?.headers,
          'User-Agent': 'OpenClaw-Skill/1.0',
        },
      };
      
      return global.fetch(url, safeInit);
    };
    
    // Guarded file system access
    sandbox.fs = {
      readFile: async (path: string, encoding = 'utf-8') => {
        this.checkPermission('filesystem:read');
        this.validatePath(path, 'read');
        
        const { promises } = require('fs');
        return promises.readFile(path, encoding);
      },
      
      writeFile: async (path: string, data: any) => {
        this.checkPermission('filesystem:write');
        this.validatePath(path, 'write');
        
        const { promises } = require('fs');
        return promises.writeFile(path, data);
      },
      
      readdir: async (path: string) => {
        this.checkPermission('filesystem:read');
        this.validatePath(path, 'read');
        
        const { promises } = require('fs');
        return promises.readdir(path);
      },
      
      stat: async (path: string) => {
        this.checkPermission('filesystem:read');
        this.validatePath(path, 'read');
        
        const { promises } = require('fs');
        return promises.stat(path);
      },
    };
    
    // Guarded child process execution
    sandbox.exec = async (command: string, options?: any) => {
      this.checkPermission('process:execute');
      this.validateCommand(command);
      
      const { exec } = require('child_process');
      const { promisify } = require('util');
      const execAsync = promisify(exec);
      
      return execAsync(command, {
        timeout: 30000,
        cwd: this.options.workspace,
        ...options,
      });
    };
    
    return sandbox;
  }
  
  private checkPermission(permission: string): void {
    if (!this.permissions.has(permission)) {
      throw new Error(`Permission denied: ${permission}`);
    }
  }
  
  private checkRateLimit(action: string): void {
    const now = new Date();
    const limit = this.rateLimits.get(action);
    
    if (limit) {
      if (now < limit.resetAt) {
        if (limit.count >= this.getRateLimit(action)) {
          throw new Error(`Rate limit exceeded for action: ${action}`);
        }
        limit.count++;
      } else {
        this.rateLimits.set(action, { count: 1, resetAt: this.getNextReset(action) });
      }
    } else {
      this.rateLimits.set(action, { 
        count: 1, 
        resetAt: this.getNextReset(action) 
      });
    }
  }
  
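  private validatePath(path: string, operation: 'read' | 'write'): void {
    const { resolve, relative, isAbsolute, sep } = require('path');
    const workspace = resolve(this.options.workspace);
    const resolved = resolve(workspace, path);
    
    // Guard against path traversal. A plain startsWith(workspace) check would
    // accept sibling directories such as "/workspace-evil" for "/workspace",
    // so compare the relative path instead.
    const rel = relative(workspace, resolved);
    if (rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel)) {
      throw new Error(`Path traversal attempt detected: ${path}`);
    }
    
    // Check the file type restrictions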
    const ext = require('path').extname(path).toLowerCase();
    const forbiddenExts = ['.exe', '.dll', '.so', '.sh', '.bat', '.cmd'];
    
    if (forbiddenExts.includes(ext)) {
      throw new Error(`Forbidden file extension: ${ext}`);
    }
  }
  
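  private validateCommand(command: string): void {
    // A denylist like this catches only obvious mistakes; it is easy to bypass
    // and is no substitute for the permission checks
    const dangerousPatterns = [
      /rm\s+-rf/,
      /mkfs/,
      /dd\s+if=/,
      />\s*\/dev\/sda/,
      /chmod\s+[0-7]{3,4}\s+/,
    ];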
    
    for (const pattern of dangerousPatterns) {
      if (pattern.test(command)) {
        throw new Error(`Dangerous command detected: ${command}`);
      }
    }
  }
  
  async run(code: string, context: any = {}): Promise<any> {
    try {
      // Verify the code signature
      const hash = createHash('sha256').update(code).digest('hex');
      if (!this.verifyCodeSignature(hash)) {
        throw new Error('Code signature verification failed');
      }
      
      // Prepare the script
      const script = new VMScript(`
        (function() {
          "use strict";
          ${code}
        })();
      `);
      
      // 添加上下文
      this.vm.setGlobal('context', context);
      
      // 执行代码
      const startTime = Date.now();
      const result = await this.vm.run(script);
      const executionTime = Date.now() - startTime;
      
      // 记录执行日志
      this.logExecution({
        skillId: this.skillId,
        executionTime,
        resultHash: createHash('sha256')
          .update(JSON.stringify(result))
          .digest('hex')
          .slice(0, 16),
      });
      
      return result;
      
    } catch (error) {
      this.log('error', ['Sandbox execution failed:', error.message, error.stack]);
      throw error;
    }
  }
}
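
The path check in the sandbox is easy to get subtly wrong, so here is a minimal standalone sketch of one way to test containment. The helper name `isInsideWorkspace` and the sample paths are hypothetical, not part of OpenClaw. It uses `path.relative` rather than a string prefix check, so a sibling directory that merely shares the workspace prefix is rejected.

```typescript
import * as path from 'node:path';

// Hypothetical helper: true only if `candidate` resolves inside `workspace`.
export function isInsideWorkspace(workspace: string, candidate: string): boolean {
  const root = path.resolve(workspace);
  const resolved = path.resolve(root, candidate);
  const relative = path.relative(root, resolved);

  // '' means the workspace root itself. A result of '..', a leading '../',
  // or an absolute path (another drive on Windows) means the path escaped.
  if (relative === '') return true;
  if (relative === '..' || relative.startsWith('..' + path.sep)) return false;
  return !path.isAbsolute(relative);
}

console.log(isInsideWorkspace('/srv/workspace', 'notes/todo.md'));       // true
console.log(isInsideWorkspace('/srv/workspace', '../workspace-evil/x')); // false
console.log(isInsideWorkspace('/srv/workspace', '../../etc/passwd'));    // false
```

The second call is the case a naive `startsWith` check lets through: `/srv/workspace-evil/x` begins with the string `/srv/workspace` but lies outside the directory.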

4.3 Monitoring and Observability

A complete monitoring setup

# monitoring/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - "alerts.yml"

scrape_configs:
  - job_name: 'openclaw-agents'
    static_configs:
      - targets: ['agent1:7456', 'agent2:7456', 'agent3:7456']
    metrics_path: '/metrics'
    
  - job_name: 'openclaw-gateway'
    static_configs:
      - targets: ['gateway:7457']
      
  - job_name: 'openclaw-skills'
    static_configs:
      - targets: ['skill-runner:7458']

# alerts.yml
groups:
  - name: openclaw-alerts
    rules:
      - alert: HighErrorRate
        expr: rate(openclaw_skill_errors_total[5m]) > 0.1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High error rate detected"
          description: "Error rate is {{ $value }} per second"
          
      - alert: MemoryUsageHigh
        expr: process_resident_memory_bytes / 1024 / 1024 > 1024
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "High memory usage"
          description: "Memory usage is {{ $value }}MB"
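
To make the `HighErrorRate` threshold concrete: `rate(...[5m]) > 0.1` fires when the counter grows by more than 0.1 errors per second on average, which is roughly 30 errors over the 5-minute window. The sketch below is a deliberately simplified stand-in for PromQL's `rate()`. It ignores counter resets and the window-edge extrapolation Prometheus performs, and the `Sample` type and sample values are made up for illustration.

```typescript
interface Sample {
  timestampSec: number;
  value: number; // cumulative counter value
}

// Simplified rate: average per-second increase between the first and last sample.
export function simpleRate(samples: Sample[]): number {
  if (samples.length < 2) return 0;
  const first = samples[0];
  const last = samples[samples.length - 1];
  const elapsed = last.timestampSec - first.timestampSec;
  return elapsed > 0 ? (last.value - first.value) / elapsed : 0;
}

// 36 new errors over 300 seconds
const errorSamples: Sample[] = [
  { timestampSec: 0, value: 100 },
  { timestampSec: 300, value: 136 },
];
const errorRate = simpleRate(errorSamples);
console.log(errorRate, errorRate > 0.1); // 0.12 true
```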
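
// src/monitoring/telemetry.ts
import { Meter, Histogram, Counter } from '@opentelemetry/api';
import { MeterProvider } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';

export class OpenClawTelemetry {
  private meter: Meter;
  private requestHistogram: Histogram;
  private errorCounter: Counter;
  private skillExecutionCounter: Counter;
  
  constructor(serviceName: string) {
    // Describe this service with OpenTelemetry resource attributes
    const resource = new Resource({
      [SemanticResourceAttributes.SERVICE_NAME]: serviceName,
      [SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
      'openclaw.agent.id': process.env.AGENT_ID || 'unknown',
    });
    
    // Create the Meter from a provider that carries the resource.
    // No metric reader is attached here; add one (for example a Prometheus
    // exporter) so the recorded metrics are actually published.
    const meterProvider = new MeterProvider({ resource });
    this.meter = meterProvider.getMeter('openclaw');
    
    // Define the instruments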
    this.requestHistogram = this.meter.createHistogram('openclaw_request_duration_ms', {
      description: 'Duration of requests in milliseconds',
      unit: 'ms',
    });
    
    this.errorCounter = this.meter.createCounter('openclaw_errors_total', {
      description: 'Total number of errors',
    });
    
    this.skillExecutionCounter = this.meter.createCounter('openclaw_skill_executions_total', {
      description: 'Total skill executions',
    });
  }
  
  // Track a request: record its duration, outcome, and any error
  trackRequest<T>(
    action: string,
    fn: () => Promise<T>
  ): Promise<T> {
    const startTime = Date.now();
    
    return fn()
      .then(result => {
        const duration = Date.now() - startTime;
        
        this.requestHistogram.record(duration, {
          action,
          status: 'success',
        });
        
        this.skillExecutionCounter.add(1, { action });
        
        return result;
      })
      .catch(error => {
        const duration = Date.now() - startTime;
        
        this.requestHistogram.record(duration, {
          action,
          status: 'error',
          error_type: error.constructor.name,
        });
        
        this.errorCounter.add(1, {
          action,
          error_type: error.constructor.name,
        });
        
        throw error;
      });
  }
  
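  // Business metrics. Instruments are created once and reused; creating a new
  // instrument (and a new gauge callback) on every call would leak callbacks.
  private memoryUsage = new Map<string, number>();
  private memoryGaugeRegistered = false;
  private vectorSearchHistogram?: Histogram;
  private vectorSearchResultsCounter?: Counter;
  
  recordMemoryUsage(memoryType: string, bytes: number) {
    this.memoryUsage.set(memoryType, bytes);
    if (this.memoryGaugeRegistered) return;
    this.memoryGaugeRegistered = true;
    
    this.meter.createObservableGauge('openclaw_memory_usage_bytes', {
      description: 'Memory usage in bytes',
    }).addCallback(observableResult => {
      // Report the latest value recorded for each memory type
      for (const [type, value] of this.memoryUsage) {
        observableResult.observe(value, { type });
      }
    });
  }
  
  recordVectorSearch(query: string, results: number, duration: number) {
    this.vectorSearchHistogram ??= this.meter.createHistogram('openclaw_vector_search_duration_ms', {
      description: 'Vector search duration',
      unit: 'ms',
    });
    this.vectorSearchResultsCounter ??= this.meter.createCounter('openclaw_vector_search_results_total', {
      description: 'Total vector search results',
    });
    
    const attributes = { query_length: query.length.toString() };
    this.vectorSearchHistogram.record(duration, attributes);
    this.vectorSearchResultsCounter.add(results, attributes);
  }
}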
}

// Distributed tracing
import { trace, Span, SpanStatusCode } from '@opentelemetry/api';

export function withTracing<T>(
  spanName: string,
  attributes: Record<string, any>,
  fn: (span: Span) => Promise<T>
): Promise<T> {
  const tracer = trace.getTracer('openclaw');
  
  return tracer.startActiveSpan(spanName, async (span) => {
    try {
      // Attach the caller-supplied attributes
      Object.entries(attributes).forEach(([key, value]) => {
        span.setAttribute(key, value);
      });
      
      // Run the wrapped function
      const result = await fn(span);
      
      span.setStatus({ code: SpanStatusCode.OK });
      return result;
      
    } catch (error) {
      // Thrown values are not guaranteed to be Error instances
      const err = error instanceof Error ? error : new Error(String(error));
      
      span.setStatus({
        code: SpanStatusCode.ERROR,
        message: err.message,
      });
      
      span.recordException(err);
      throw error;
      
    } finally {
      span.end();
    }
  });
}

// Usage example (Request and doComplexOperation are application-defined)
const telemetry = new OpenClawTelemetry('my-agent');

async function processRequest(request: Request) {
  return withTracing(
    'process_request',
    {
      'request.id': request.id,
      'request.type': request.type,
      'user.id': request.userId,
    },
    async (span) => {
      // The API-level Span exposes no start time, so capture one here
      const startedAt = Date.now();
      
      // Business logic
      const result = await telemetry.trackRequest(
        'complex_operation',
        async () => {
          // Run the expensive operation
          return await doComplexOperation(request);
        }
      );
      
      // Record a custom event
      span.addEvent('request_processed', {
        'result.size': JSON.stringify(result).length,
        'processing.time': Date.now() - startedAt,
      });
      
      return result;
    }
  );
}

Chapter 5: Production Deployment and Operations

5.1 Production Deployment Architecture

A Kubernetes-based deployment

# k8s/openclaw-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: openclaw
  labels:
    name: openclaw
---
# k8s/openclaw-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: openclaw-config
  namespace: openclaw
data:
  config.yaml: |
    gateway:
      port: 7456
      workers: 4
      max_requests: 1000
      
    memory:
      type: hybrid
      vector_store:
        provider: qdrant
        host: qdrant.openclaw.svc.cluster.local
        port: 6333
        
    skills:
      auto_update: true
      update_interval: 3600
      
  agents.yaml: |
    agents:
      - id: coordinator
        model: gpt-4
        skills: [router, orchestrator]
        
      - id: coder
        model: claude-3-opus
        skills: [code-writer, code-reviewer]
        
      - id: researcher
        model: gpt-4
        skills: [web-search, summarizer]
---
# k8s/openclaw-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
  name: openclaw-secrets
  namespace: openclaw
type: Opaque
stringData:
  openai-api-key: "${OPENAI_API_KEY}"
  anthropic-api-key: "${ANTHROPIC_API_KEY}"
  qdrant-api-key: "${QDRANT_API_KEY}"
---
# k8s/gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openclaw-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      containers:
      - name: gateway
        image: openclaw/gateway:2026.1.2
        ports:
        - containerPort: 7456
        env:
        - name: NODE_ENV
          value: production
        - name: OPENCLAW_CONFIG_PATH
          value: /etc/openclaw/config.yaml
        - name: OPENAI_API_KEY
          valueFrom:
            secretKeyRef:
              name: openclaw-secrets
              key: openai-api-key
        volumeMounts:
        - name: config
          mountPath: /etc/openclaw
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "2Gi"
            cpu: "1000m"
        livenessProbe:
          httpGet:
            path: /health
            port: 7456
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /ready
            port: 7456
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: openclaw-config
---
# k8s/gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  selector:
    app: openclaw-gateway
  ports:
  - port: 7456
    targetPort: 7456
  type: ClusterIP
---
# k8s/agent-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-agent-hpa
  namespace: openclaw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-gateway
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
      - type: Percent
        value: 100
        periodSeconds: 60
---
# k8s/qdrant-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: openclaw
spec:
  serviceName: qdrant
  replicas: 3
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
      - name: qdrant
        image: qdrant/qdrant:latest
        ports:
        - containerPort: 6333
        - containerPort: 6334
        volumeMounts:
        - name: data
          mountPath: /qdrant/storage
        resources:
          requests:
            memory: "1Gi"
            cpu: "500m"
          limits:
            memory: "4Gi"
            cpu: "2000m"
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "fast-ssd"
      resources:
        requests:
          storage: 100Gi
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  tls:
  - hosts:
    - openclaw.yourcompany.com
    secretName: openclaw-tls
  rules:
  - host: openclaw.yourcompany.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: openclaw-gateway
            port:
              number: 7456
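
To see what the HPA above does with its 70% CPU and 80% memory targets, it helps to work through the scaling formula from the Kubernetes documentation: `desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue)`. No scaling happens while the ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to `minReplicas`/`maxReplicas`. The sketch below covers a single metric only. With several metrics the controller takes the largest proposal, and the `behavior` policies then rate-limit the change; neither is modeled here. The function and type names are illustrative.

```typescript
interface HpaInput {
  currentReplicas: number;
  currentUtilization: number; // average utilization across pods, in percent
  targetUtilization: number;  // e.g. 70 for the CPU target above
  minReplicas: number;
  maxReplicas: number;
}

export function desiredReplicas(input: HpaInput, tolerance = 0.1): number {
  const ratio = input.currentUtilization / input.targetUtilization;
  // Within tolerance of the target: keep the current replica count
  const proposed =
    Math.abs(ratio - 1) <= tolerance
      ? input.currentReplicas
      : Math.ceil(input.currentReplicas * ratio);
  return Math.min(input.maxReplicas, Math.max(input.minReplicas, proposed));
}

const base = { targetUtilization: 70, minReplicas: 3, maxReplicas: 10 };
console.log(desiredReplicas({ ...base, currentReplicas: 3, currentUtilization: 105 })); // 5
console.log(desiredReplicas({ ...base, currentReplicas: 3, currentUtilization: 72 }));  // 3
console.log(desiredReplicas({ ...base, currentReplicas: 8, currentUtilization: 140 })); // 10
```

The last call proposes 16 replicas and is capped at `maxReplicas: 10`, which is the situation where raising the ceiling or the pod resource limits becomes the next tuning step.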

5.2 CI/CD Pipeline Configuration

# .github/workflows/openclaw-ci.yml
name: OpenClaw CI/CD

on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]

env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}

jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x, 22.x]
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Node.js
      uses: actions/setup-node@v4
      with:
        node-version: ${{ matrix.node-version }}
        cache: 'npm'
    
    - name: Install dependencies
      run: npm ci
    
    - name: Run linter
      run: npm run lint
    
    - name: Run tests
      run: npm test
      env:
        NODE_ENV: test
        OPENAI_API_KEY: ${{ secrets.TEST_OPENAI_KEY }}
    
    - name: Run integration tests
      run: npm run test:integration
      if: matrix.node-version == '22.x'
    
    - name: Upload coverage
      uses: codecov/codecov-action@v3
      with:
        file: ./coverage/lcov.info
        flags: unittests

  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    
    steps:
    - uses: actions/checkout@v4
    
    - name: Setup Docker Buildx
      uses: docker/setup-buildx-action@v3
    
    - name: Login to Container Registry
      uses: docker/login-action@v3
      with:
        registry: ${{ env.REGISTRY }}
        username: ${{ github.actor }}
        password: ${{ secrets.GITHUB_TOKEN }}
    
    - name: Extract metadata
      id: meta
      uses: docker/metadata-action@v5
      with:
        images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
        tags: |
          type=ref,event=branch
          type=ref,event=pr
          type=semver,pattern={{version}}
          type=semver,pattern={{major}}.{{minor}}
          type=semver,pattern={{major}}
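          # Tag with the full commit SHA and no prefix, so the tag matches the
          # ${{ github.sha }} image reference used by the production deploy job
          type=sha,format=long,prefix=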
    
    - name: Build and push
      uses: docker/build-push-action@v5
      with:
        context: .
        push: ${{ github.event_name != 'pull_request' }}
        tags: ${{ steps.meta.outputs.tags }}
        labels: ${{ steps.meta.outputs.labels }}
        cache-from: type=gha
        cache-to: type=gha,mode=max

  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment: staging
    
    steps:
    - name: Checkout
      uses: actions/checkout@v4
    
    - name: Deploy to Staging
      uses: appleboy/ssh-action@master
      with:
        host: ${{ secrets.STAGING_HOST }}
        username: ${{ secrets.STAGING_USERNAME }}
        key: ${{ secrets.STAGING_SSH_KEY }}
        script: |
          cd /opt/openclaw
          git pull origin develop
          docker-compose pull
          docker-compose up -d
          docker system prune -f
    
    - name: Run smoke tests
      run: |
        curl -f ${{ secrets.STAGING_URL }}/health || exit 1
        curl -f ${{ secrets.STAGING_URL }}/ready || exit 1

  deploy-production:
    needs: [test, build]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    
    steps:
    - name: Checkout
      uses: actions/checkout@v4
    
    - name: Deploy to Production
      uses: azure/k8s-deploy@v4
      with:
        namespace: openclaw
        manifests: |
          k8s/openclaw-namespace.yaml
          k8s/openclaw-config.yaml
          k8s/openclaw-secrets.yaml
          k8s/gateway-deployment.yaml
          k8s/gateway-service.yaml
          k8s/agent-hpa.yaml
          k8s/qdrant-statefulset.yaml
          k8s/ingress.yaml
        images: |
          ghcr.io/${{ github.repository }}:${{ github.sha }}
        kubectl-version: 'latest'
    
    - name: Verify deployment
      run: |
        kubectl rollout status deployment/openclaw-gateway -n openclaw --timeout=300s
        kubectl rollout status statefulset/qdrant -n openclaw --timeout=300s
    
    - name: Run health checks
      run: |
        for i in {1..30}; do
          if curl -f https://openclaw.yourcompany.com/health; then
            echo "Health check passed"
            exit 0
          fi
          echo "Waiting for service to be ready..."
          sleep 10
        done
        echo "Health check failed"
        exit 1

5.3 Backup and Disaster Recovery

# backup/backup-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: openclaw-backup
  namespace: openclaw
spec:
  schedule: "0 2 * * *"  # daily at 02:00
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: backup
            image: postgres:15-alpine
            env:
            - name: PGHOST
              value: postgres.openclaw.svc.cluster.local
            - name: PGDATABASE
              value: openclaw
            - name: PGUSER
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: username
            - name: PGPASSWORD
              valueFrom:
                secretKeyRef:
                  name: postgres-credentials
                  key: password
            - name: S3_ENDPOINT
              value: s3.amazonaws.com
            - name: S3_BUCKET
              value: openclaw-backups
            - name: S3_PREFIX
              value: "daily"
            command:
            - /bin/sh
            - -c
            - |
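              # Back up PostgreSQL, Qdrant, and configuration, then upload to S3.
              # Assumes the image provides pg_dump, curl, and the aws CLI; the stock
              # postgres:15-alpine image does not ship the aws CLI.
              set -eu
              TIMESTAMP=$(date +%Y%m%d_%H%M%S)
              QDRANT_URL="http://qdrant.openclaw.svc.cluster.local:6333"
              
              # 1. Back up the database
              pg_dump -Fc openclaw > /tmp/openclaw_${TIMESTAMP}.dump
              
              # 2. Back up the vector database. Creating a snapshot returns JSON
              #    metadata, so read the snapshot name and download the file by name.
              SNAPSHOT_NAME=$(curl -sf -X POST "${QDRANT_URL}/collections/openclaw/snapshots" | \
                sed -n 's/.*"name" *: *"\([^"]*\)".*/\1/p')
              curl -sf "${QDRANT_URL}/collections/openclaw/snapshots/${SNAPSHOT_NAME}" \
                -o /tmp/qdrant_${TIMESTAMP}.snapshot
              
              # 3. Back up configuration (requires /etc/openclaw to be mounted in this pod)
              tar -czf /tmp/config_${TIMESTAMP}.tar.gz /etc/openclaw/
              
              # 4. Upload to S3
              aws s3 cp /tmp/openclaw_${TIMESTAMP}.dump s3://${S3_BUCKET}/${S3_PREFIX}/
              aws s3 cp /tmp/qdrant_${TIMESTAMP}.snapshot s3://${S3_BUCKET}/${S3_PREFIX}/
              aws s3 cp /tmp/config_${TIMESTAMP}.tar.gz s3://${S3_BUCKET}/${S3_PREFIX}/
              
              # 5. Remove local temporary files
              rm -f /tmp/*_${TIMESTAMP}.*
              
              # 6. Delete backups older than 7 days. Dates are compared as YYYYMMDD
              #    integers; "@epoch" input works with both GNU and BusyBox date.
              CUTOFF=$(date -d "@$(( $(date +%s) - 7 * 86400 ))" +%Y%m%d)
              aws s3 ls s3://${S3_BUCKET}/${S3_PREFIX}/ | \
                awk '{print $4}' | \
                while read -r file; do
                  filedate=$(echo "$file" | grep -oE '[0-9]{8}' | head -n 1)
                  if [ -n "$filedate" ] && [ "$filedate" -lt "$CUTOFF" ]; then
                    aws s3 rm "s3://${S3_BUCKET}/${S3_PREFIX}/$file"
                  fi
                done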
            volumeMounts:
            - name: aws-credentials
              mountPath: /root/.aws
              readOnly: true
          volumes:
          - name: aws-credentials
            secret:
              secretName: aws-backup-credentials
          restartPolicy: OnFailure

# disaster-recovery/restore-script.sh
#!/bin/bash
set -euo pipefail

# Restore script. Usage: ./restore-script.sh [YYYYMMDD]
RESTORE_DATE=${1:-$(date +%Y%m%d)}
BUCKET_PREFIX="s3://openclaw-backups/daily"
WORKDIR=$(mktemp -d)

echo "Starting OpenClaw restore for date: ${RESTORE_DATE}"

# 1. Stop the gateway. Qdrant stays up because its snapshot is restored
#    through the HTTP API, which needs a running pod.
echo "Stopping services..."
kubectl scale deployment openclaw-gateway --replicas=0 -n openclaw

# 2. Download the backups from S3. `aws s3 cp` does not expand wildcards in
#    the source key, so filter with --exclude/--include instead.
echo "Downloading backup files..."
aws s3 cp "${BUCKET_PREFIX}/" "${WORKDIR}/" --recursive \
  --exclude "*" --include "*_${RESTORE_DATE}_*"
# If several backups exist for the date, the latest timestamp wins
DB_DUMP=$(ls "${WORKDIR}"/openclaw_"${RESTORE_DATE}"_*.dump | sort | tail -n 1)
QDRANT_SNAPSHOT=$(ls "${WORKDIR}"/qdrant_"${RESTORE_DATE}"_*.snapshot | sort | tail -n 1)
CONFIG_ARCHIVE=$(ls "${WORKDIR}"/config_"${RESTORE_DATE}"_*.tar.gz | sort | tail -n 1)

# 3. Restore PostgreSQL (no -it flags: the script may run without a TTY)
echo "Restoring database..."
kubectl exec postgres-0 -n openclaw -- psql -c "DROP DATABASE IF EXISTS openclaw;"
kubectl exec postgres-0 -n openclaw -- psql -c "CREATE DATABASE openclaw;"
kubectl cp "${DB_DUMP}" postgres-0:/tmp/openclaw.dump -n openclaw
kubectl exec postgres-0 -n openclaw -- pg_restore -d openclaw /tmp/openclaw.dump

# 4. Restore Qdrant by uploading the snapshot through a port-forward, which
#    avoids depending on curl being present in the Qdrant image
echo "Restoring vector database..."
QDRANT_POD=$(kubectl get pods -n openclaw -l app=qdrant -o jsonpath='{.items[0].metadata.name}')
kubectl port-forward "pod/${QDRANT_POD}" 6333:6333 -n openclaw &
PORT_FORWARD_PID=$!
sleep 3
curl -sf -X POST "http://localhost:6333/collections/openclaw/snapshots/upload" \
  -F "snapshot=@${QDRANT_SNAPSHOT}"
kill "${PORT_FORWARD_PID}"

# 5. Restore configuration into the working directory, not the local /etc
echo "Restoring configuration..."
tar -xzf "${CONFIG_ARCHIVE}" -C "${WORKDIR}"
kubectl create configmap openclaw-config --from-file="${WORKDIR}/etc/openclaw" -n openclaw --dry-run=client -o yaml | \
  kubectl apply -f -

# 6. Restart services
echo "Restarting services..."
kubectl scale deployment openclaw-gateway --replicas=3 -n openclaw
kubectl rollout status deployment/openclaw-gateway -n openclaw --timeout=300s

rm -rf "${WORKDIR}"
echo "Restore complete!"
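
The 7-day retention rule in the backup job is worth checking outside the shell, since date arithmetic is where such scripts tend to go wrong. Below is a small sketch of the same rule as a pure function. It assumes the `<name>_<YYYYMMDD>_<HHMMSS>.<ext>` naming produced by the backup job; the function name `expiredBackups` and the sample keys are illustrative only.

```typescript
// Returns the keys whose embedded date is older than the retention window.
export function expiredBackups(keys: string[], now: Date, retentionDays = 7): string[] {
  const cutoff = new Date(now.getTime() - retentionDays * 24 * 60 * 60 * 1000);
  // YYYYMMDD in UTC; fixed-width digit strings compare correctly as text
  const cutoffStamp = cutoff.toISOString().slice(0, 10).replace(/-/g, '');

  return keys.filter((key) => {
    const match = key.match(/_(\d{8})_\d{6}\./);
    // Keys that do not follow the naming scheme are never deleted
    return match !== null && match[1] < cutoffStamp;
  });
}

const backupKeys = [
  'openclaw_20260101_020000.dump',
  'qdrant_20260108_020000.snapshot',
  'config_20260110_020000.tar.gz',
  'README.txt',
];
console.log(expiredBackups(backupKeys, new Date('2026-01-10T03:00:00Z')));
// [ 'openclaw_20260101_020000.dump' ]
```

In practice an S3 lifecycle rule on the bucket prefix achieves the same retention without any script logic, and may be the simpler choice.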

Chapter 6: Cost Optimization and Performance Tuning

6.1 Hybrid Model Strategy

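// Intelligent model routing
interface ModelSpecs {
  cost: number;        // relative cost
  performance: number; // quality score, 0..1
  speed: number;       // relative speed; higher is faster
}

class ModelRouter {
  private models: Record<string, ModelSpecs> = {
    'gpt-4': { cost: 0.03, performance: 0.9, speed: 1 },
    'gpt-4o': { cost: 0.015, performance: 0.85, speed: 2 },
    'gpt-4o-mini': { cost: 0.003, performance: 0.7, speed: 4 },
    'claude-3-opus': { cost: 0.075, performance: 0.95, speed: 0.8 },
    'claude-3-sonnet': { cost: 0.015, performance: 0.85, speed: 2 },
    'claude-3-haiku': { cost: 0.001, performance: 0.6, speed: 5 },
    'local-llama': { cost: 0.0001, performance: 0.5, speed: 0.5 },
  };
  
  // Speed ranges from 0.5 to 5 while performance is 0..1. Without
  // normalization the fastest model wins almost every comparison.
  private maxSpeed = Math.max(...Object.values(this.models).map((m) => m.speed));
  
  // taskType is accepted for future routing rules but is not used in scoring yet
  async selectModel(
    taskType: string,
    complexity: number,
    urgency: number,
    budget?: number
  ): Promise<string> {
    
    const candidates = Object.entries(this.models)
      .map(([name, specs]) => ({
        name,
        ...specs,
        score: this.calculateScore(specs, complexity, urgency, budget),
      }))
      .sort((a, b) => b.score - a.score);
    
    return candidates[0].name;
  }
  
  private calculateScore(
    specs: ModelSpecs,
    complexity: number,
    urgency: number,
    budget?: number
  ): number {
    let score = 0;
    
    // Quality requirement (weight scales with task complexity)
    const performanceWeight = complexity * 0.6;
    score += specs.performance * performanceWeight;
    
    // Speed requirement (weight scales with urgency), normalized to 0..1
    const speedWeight = urgency * 0.4;
    score += (specs.speed / this.maxSpeed) * speedWeight;
    
    // Budget constraint: penalize only the amount over budget
    if (budget !== undefined) {
      const costPenalty = Math.max(0, (specs.cost - budget) * 100);
      score -= costPenalty;
    }
    
    return score;
  }
}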
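
To get a feel for how the weights interact, here is a self-contained sketch of the same weighted-score idea on three of the models from the table above. It keeps the 0.6/0.4 weights and the over-budget penalty from the router. Normalizing speed to 0..1 and the free-standing `pick` function are assumptions made for this illustration, and the input values are invented.

```typescript
interface Specs {
  cost: number;
  performance: number;
  speed: number;
}

const models: Record<string, Specs> = {
  'gpt-4o-mini': { cost: 0.003, performance: 0.7, speed: 4 },
  'claude-3-opus': { cost: 0.075, performance: 0.95, speed: 0.8 },
  'claude-3-haiku': { cost: 0.001, performance: 0.6, speed: 5 },
};
const maxSpeed = Math.max(...Object.values(models).map((m) => m.speed));

function score(s: Specs, complexity: number, urgency: number, budget?: number): number {
  const quality = s.performance * complexity * 0.6;
  const speed = (s.speed / maxSpeed) * urgency * 0.4;
  // Only the amount over budget is penalized
  const penalty = budget === undefined ? 0 : Math.max(0, (s.cost - budget) * 100);
  return quality + speed - penalty;
}

export function pick(complexity: number, urgency: number, budget?: number): string {
  const ranked = Object.entries(models).sort(
    ([, a], [, b]) => score(b, complexity, urgency, budget) - score(a, complexity, urgency, budget),
  );
  return ranked[0][0];
}

console.log(pick(1.0, 0.2));       // complex, not urgent  -> claude-3-opus
console.log(pick(0.2, 1.0));       // simple, urgent       -> claude-3-haiku
console.log(pick(1.0, 0.2, 0.01)); // complex, tight budget -> gpt-4o-mini
```

The third call shows the budget penalty at work: the strongest model is 0.065 over budget, which costs it 6.5 points and pushes the choice down to the best model that fits.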

6.2 Intelligent Caching Strategy

// Multi-level intelligent cache
class IntelligentCache {
  private layers = {
    // L1: in-memory cache (fastest)
    memory: new Map<string, { data: any; expires: number; accessCount: number }>(),
    
    // L2: Redis cache (distributed)
    redis: RedisClient,
    
    // L3: disk cache (persistent)
    disk: DiskCache,
  };
  
  // Cache lookup with fallback to the fetch function
  async getWithCache<T>(
    key: string,
    fetchFn: () => Promise<T>,
    options: CacheOptions = {}
  ): Promise<T> {
    const {
      ttl = 3600,
      staleWhileRevalidate = false,
      prefreshThreshold = 0.8, // threshold for refreshing ahead of expiry