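When "raising lobsters" moves from the pond to the command line, we are witnessing an infrastructure revolution
Chapter 1: OpenClaw Architecture: The Anatomy of This "Lobster"
1.1 Core Architecture: Not Just Another ChatBot Wrapper
Architecture overview:
OpenClaw's core design idea is a "persistent Agent system", which differs fundamentally from conventional session-based AI. Its architecture persists memory across several layers: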
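┌──────────────────────────────────────────────────────────────────────────┐
│ Agent runtime layer (Claw Runtime)                                       │
├──────────────────────────────────────────────────────────────────────────┤
│ Skill scheduler │ Memory manager │ Tool executor │ Communication gateway │
├──────────────────────────────────────────────────────────────────────────┤
│ Memory persistence layer (Memory Stack)                                  │
│ • Short-term memory: current session context (max 128K tokens)           │
│ • Mid-term memory: MEMORY.md (structured memory store)                   │
│ • Long-term memory: vector database (Chroma/Weaviate)                    │
│ • Logging: Markdown log files organized by date                          │
├──────────────────────────────────────────────────────────────────────────┤
│ Skill ecosystem (Skill Registry)                                         │
│ • Core skills: file operations, network requests, system calls           │
│ • Community skills: 300+ skill packages in GitHub repositories           │
│ • Custom skills: user-developed TypeScript/JS modules                    │
└──────────────────────────────────────────────────────────────────────────┘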
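Tech stack highlights:
- Runtime: Node.js 22+, using the ES module standard
- Process management: a Gateway daemon keeps the Agent running 24/7
- Memory system: SQLite plus an OpenAI-compatible Embeddings API
- Skill architecture: an extended Web Worker model that executes skills in isolation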
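The memory stack described above can be read as a lookup order: try the fastest layer first and fall back to slower ones. The following is a minimal sketch of that idea only. The `MemoryLayer` interface, `mapLayer`, and `recall` are hypothetical names invented for illustration, not OpenClaw's API, and each layer is backed by an in-memory `Map` rather than MEMORY.md or a vector database.

```typescript
// Hypothetical types for illustration; not OpenClaw's actual API.
interface MemoryLayer {
  name: string;
  lookup(key: string): string | undefined;
}

// Wrap a plain Map as a layer; a real layer would read MEMORY.md or query a vector store.
function mapLayer(name: string, entries: Record<string, string>): MemoryLayer {
  const store = new Map(Object.entries(entries));
  return { name, lookup: (key) => store.get(key) };
}

// Walk the layers in order and report which one answered.
function recall(
  layers: MemoryLayer[],
  key: string,
): { layer: string; value: string } | undefined {
  for (const layer of layers) {
    const value = layer.lookup(key);
    if (value !== undefined) return { layer: layer.name, value };
  }
  return undefined;
}

// Example data, ordered from short-term to long-term.
const layers = [
  mapLayer("short-term", { "current-task": "refactor router" }),
  mapLayer("mid-term", { "preferred-language": "TypeScript" }),
  mapLayer("long-term", { "first-project": "git-helper" }),
];
```

Calling `recall(layers, "preferred-language")` returns the value together with the name of the layer that held it, which makes it easy to log how often each layer is hit.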
1.2 Configuration File Deep Dive: Your "Lobster" DNA
The core configuration file, ~/.openclaw/config.yaml, in detail:
# OpenClaw main configuration file
version: '2026.1'
claw:
  name: "code_assistant"        # Agent identity
  model_provider: "openai"      # supported: openai, anthropic, local (llama.cpp)
  model: "gpt-4o-mini"          # default model, recommended for everyday tasks
# Memory system
memory:
  type: "hybrid"                # hybrid|vector|file
  vector_store:
    provider: "chroma"          # chroma|weaviate|qdrant
    path: "~/.openclaw/vector_db"
  file_memory:
    path: "~/.openclaw/memories"
    auto_summarize: true
    summary_interval: 1000      # summarize automatically every 1000 memories
# Skill system
skills:
  auto_discover: true
  trusted_registries:
    - "https://registry.openclaw.ai"
    - "https://github.com/openclaw-community"
  security_level: "strict"      # strict|moderate|permissive
# Execution environment
execution:
  timeout: 300000               # per-skill execution timeout (ms)
  max_concurrent: 3             # maximum number of concurrently running skills
  sandbox:
    enabled: true
    level: "isolated"           # isolated|restricted|trusted
# Developer options
developer:
  debug: false
  log_level: "info"             # debug|info|warn|error
  telemetry: false              # whether to send anonymous usage data
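Several fields in this file accept only a fixed set of values, listed in the inline comments. A small check at startup can catch typos before the Gateway runs. This is a sketch under assumptions: it takes the already-parsed YAML as a plain object, assumes `memory`, `skills`, `execution`, and `developer` are top-level sections, and `validateConfig` is a hypothetical helper rather than something OpenClaw ships.

```typescript
// Allowed values copied from the comments in the config file above.
const ALLOWED = {
  "memory.type": ["hybrid", "vector", "file"],
  "skills.security_level": ["strict", "moderate", "permissive"],
  "execution.sandbox.level": ["isolated", "restricted", "trusted"],
  "developer.log_level": ["debug", "info", "warn", "error"],
} as const;

type Config = Record<string, unknown>;

// Read a dotted path such as "execution.sandbox.level" from a nested object.
function getPath(config: Config, path: string): unknown {
  return path.split(".").reduce<unknown>(
    (node, key) =>
      node && typeof node === "object" ? (node as Config)[key] : undefined,
    config,
  );
}

// Return one readable error per field whose value is outside its allowed set.
function validateConfig(config: Config): string[] {
  const errors: string[] = [];
  for (const [path, allowed] of Object.entries(ALLOWED)) {
    const value = getPath(config, path);
    const ok =
      typeof value === "string" && (allowed as readonly string[]).includes(value);
    if (!ok) {
      errors.push(
        `${path}: expected one of ${allowed.join("|")}, got ${JSON.stringify(value)}`,
      );
    }
  }
  return errors;
}
```

An empty result means every enumerated field is valid. Missing fields are reported too, so the check doubles as a completeness test for those four settings.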
Sensitive settings go in environment variables (stored in a .env file):
# API keys
OPENAI_API_KEY=sk-...
ANTHROPIC_API_KEY=sk-ant-...
GROQ_API_KEY=gsk_...
LOCAL_LLM_URL=http://localhost:8080 # local model endpoint
# Keys for advanced features
TAVILY_API_KEY=... # web search
SERPAPI_KEY=... # search engine results
GITHUB_TOKEN=... # GitHub integration
# Storage
OPENCLAW_DATA_DIR=~/.openclaw
OPENCLAW_CACHE_DIR=~/.cache/openclaw
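Because a missing key usually surfaces only when a skill first calls the provider, it can help to check the .env contents up front. The sketch below parses `KEY=value` lines, including the trailing `# comment` style used above, and lists required keys that are absent or empty. `parseEnv` and `missingKeys` are hypothetical helpers written for this example; real projects would more commonly rely on an existing dotenv loader, whose parsing rules may differ.

```typescript
// Parse KEY=value lines. Blank lines and full-line comments are skipped,
// and a trailing " # comment" is stripped from the value.
function parseEnv(text: string): Record<string, string> {
  const result: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).replace(/\s+#.*$/, "").trim();
    result[key] = value;
  }
  return result;
}

// List the required keys that are absent or set to an empty string.
function missingKeys(env: Record<string, string>, required: string[]): string[] {
  return required.filter((key) => !env[key]);
}
```

Note that this parser does not expand `~`, so a value such as `~/.openclaw` is returned literally and has to be resolved by the caller.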
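1.3 Process Management and Monitoring: Keeping the "Lobster" Running Reliably
The Gateway daemon in detail:
The Gateway daemon is the core of OpenClaw. It is responsible for:
- Agent lifecycle management
- Hot loading and unloading of skills
- Persisting and synchronizing the memory system
- Maintaining external communication interfaces
Starting the Gateway:
# Install the Gateway service
openclaw gateway --install
# Start the Gateway (as a system service)
sudo systemctl start openclaw-gateway
# Start in development mode (with hot reload)
openclaw gateway --dev --watch
Process monitoring and logs:
# Check Gateway status
openclaw gateway --status
# Follow the log in real time
tail -f ~/.openclaw/logs/gateway.log
# Detailed runtime metrics
openclaw metrics --format=json
# Example output:
# {
# "uptime": 1234567,
# "active_skills": 5,
# "memory_usage_mb": 245,
# "total_requests": 1234,
# "avg_response_time_ms": 345
# }
Health check endpoint:
The Gateway exposes a RESTful health check interface:
curl http://localhost:7456/health
# Returns: {"status":"healthy","version":"2026.1.2","agents":3}
# Detailed system state. Note: /debug/pprof/goroutine is a Go runtime profiling path,
# so whether a Node.js-based Gateway serves it is unverified. The URL is quoted so the shell does not treat "?" as a glob.
curl 'http://localhost:7456/debug/pprof/goroutine?debug=2'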
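A monitoring script typically needs to turn the health response into a yes/no decision. The sketch below does that for the JSON shape shown in the example response above. It is an assumption that the body always has exactly these three fields, and `parseHealth` and `isReady` are hypothetical helper names.

```typescript
// Shape of the example /health response shown above.
interface HealthReport {
  status: string;
  version: string;
  agents: number;
}

// Parse the body defensively: anything that is not the expected JSON shape yields undefined.
function parseHealth(body: string): HealthReport | undefined {
  try {
    const data = JSON.parse(body);
    if (
      typeof data?.status === "string" &&
      typeof data?.version === "string" &&
      typeof data?.agents === "number"
    ) {
      return { status: data.status, version: data.version, agents: data.agents };
    }
  } catch {
    // Not valid JSON; fall through to undefined.
  }
  return undefined;
}

// Ready means the status is "healthy" and at least minAgents agents are registered.
function isReady(body: string, minAgents = 1): boolean {
  const report = parseHealth(body);
  return report !== undefined && report.status === "healthy" && report.agents >= minAgents;
}
```

Treating malformed output as "not ready" keeps a half-started Gateway that returns an HTML error page from being counted as healthy.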
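Chapter 2: Skill Development in Practice: Building Your Own Toolchain
2.1 Skill Architecture Deep Dive
Standard layout of a skill module:
my-awesome-skill/
├── package.json # skill metadata
├── src/
│ ├── index.ts # main entry point
│ ├── types.ts # TypeScript type definitions
│ ├── handlers/ # handler functions
│ └── utils/ # utility functions
├── config/
│ └── skill.yaml # skill configuration
├── tests/ # unit tests
└── README.md # usage documentation
Key settings in package.json: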
{
"name": "@yourname/skill-git-helper",
"version": "1.0.0",
"description": "Git操作增强技能",
"type": "module",
"main": "dist/index.js",
"claw": {
"type": "action",
"permissions": [
"filesystem:read",
"filesystem:write",
"network:internal",
"process:execute"
],
"triggers": [
{
"type": "command",
"command": "git.*",
"description": "处理git相关命令"
},
{
"type": "cron",
"schedule": "0 */2 * * *",
"description": "每2小时自动提交"
}
],
"input_schema": {
"type": "object",
"properties": {
"operation": {
"type": "string",
"enum": ["commit", "push", "branch", "status"]
},
"message": {
"type": "string",
"description": "提交信息"
}
}
}
}
}
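The manifest above declares a command trigger pattern (`git.*`) and an `input_schema` whose `operation` field is an enum. A dispatcher has to apply both before calling the skill. The sketch below shows one plausible reading. It is an assumption that the trigger pattern is a regular expression that must match the whole command (the source does not say how OpenClaw anchors it), and `matchesTrigger` and `isOperation` are hypothetical names.

```typescript
interface CommandTrigger {
  type: "command";
  command: string; // regular expression source, e.g. "git.*"
}

// Assumption: the pattern must match the entire command, so it is anchored here.
function matchesTrigger(trigger: CommandTrigger, command: string): boolean {
  return new RegExp(`^(?:${trigger.command})$`).test(command);
}

// The enum from input_schema.properties.operation.
const OPERATIONS = ["commit", "push", "branch", "status"] as const;
type Operation = (typeof OPERATIONS)[number];

// Type guard: narrows unknown input to one of the declared operations.
function isOperation(value: unknown): value is Operation {
  return typeof value === "string" && (OPERATIONS as readonly string[]).includes(value);
}
```

With anchoring, `git status` matches but `digit` does not. An unanchored search would accept any command that merely contains "git".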
2.2 Developing Your First Production-Grade Skill
Example: developing a smart Git assistant skill:
// src/index.ts
import { Skill, ActionContext } from '@openclaw/core';
import { exec } from 'child_process';
import { promisify } from 'util';
import { analyzeCommitMessage, generateBranchName } from './ai-helper';
const execute = promisify(exec);
export default new Skill({
name: 'git-ai-helper',
version: '1.0.0',
// 技能描述,用于Agent自动发现
description: 'AI增强的Git操作助手,支持智能提交、分支管理等',
// 定义技能可执行的操作
actions: {
// 智能提交:自动生成合适的提交信息
async smartCommit(ctx: ActionContext, params: {
stageAll?: boolean;
emoji?: boolean;
} = {}) {
const { stageAll = true, emoji = true } = params;
try {
// 1. 获取git状态
const { stdout: statusOutput } = await execute('git status --porcelain');
const changes = statusOutput.trim().split('\n').filter(line => line);
if (changes.length === 0) {
return { success: false, message: '没有可提交的更改' };
}
// 2. 暂存文件
if (stageAll) {
await execute('git add .');
} else {
// 让AI选择要暂存的文件
const filesToStage = await ctx.ask(
'以下文件有更改,请选择要提交的文件:\n' +
changes.map(c => `- ${c.substring(3)}`).join('\n')
);
// ... 处理选择逻辑
}
// 3. 使用AI生成提交信息
const diff = await this.getStagedDiff();
const commitMessage = await analyzeCommitMessage(diff, { emoji });
// 4. 执行提交
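// Single-quote the AI-generated message so the shell cannot interpret quotes, $(...) or backticks inside it
await execute(`git commit -m '${commitMessage.replace(/'/g, "'\\''")}'`);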
// 5. 记录到记忆系统
await ctx.memory.add({
type: 'git_commit',
message: commitMessage,
timestamp: new Date().toISOString(),
files: changes.map(c => c.substring(3))
});
return {
success: true,
message: `提交成功: ${commitMessage}`,
commitMessage
};
} catch (error) {
ctx.logger.error('Git提交失败:', error);
return {
success: false,
error: error.message,
suggestion: '请检查git配置和网络连接'
};
}
},
// 自动分支管理
async createFeatureBranch(ctx: ActionContext, params: {
issue?: string;
description: string;
}) {
const { issue, description } = params;
// 使用AI生成有意义的分支名
const branchName = await generateBranchName(description, { issue });
// 创建并切换到新分支
await execute(`git checkout -b ${branchName}`);
// 设置上游分支(如果需要)
if (await this.hasRemote()) {
await execute(`git push -u origin ${branchName}`);
}
return {
success: true,
branch: branchName,
commands: [
`git checkout -b ${branchName}`,
`git push -u origin ${branchName}`
]
};
},
// PR自动生成
async generatePullRequest(ctx: ActionContext, params: {
title?: string;
template?: string;
}) {
// 获取当前分支和diff
const { stdout: currentBranch } = await execute('git branch --show-current');
const { stdout: diffMain } = await execute('git diff main HEAD --stat');
// 使用AI生成PR描述
const prDescription = await ctx.llm.generate(`
基于以下变更生成PR描述:
分支: ${currentBranch.trim()}
变更统计: ${diffMain}
请生成包含以下部分的PR描述:
1. 变更概述
2. 修改内容
3. 测试建议
4. 相关Issue
使用Markdown格式。
`);
// 这里可以集成GitHub API自动创建PR
// const pr = await githubApi.createPR(...);
return {
success: true,
branch: currentBranch.trim(),
description: prDescription,
// prUrl: pr.html_url
};
}
},
// 定时任务
schedules: {
// 每天自动拉取最新代码
autoPull: {
cron: '0 9,18 * * *', // 每天9点和18点
async handler(ctx) {
try {
await execute('git pull --rebase');
ctx.logger.info('自动拉取代码成功');
} catch (error) {
ctx.logger.warn('自动拉取失败,可能有冲突需要手动解决');
}
}
},
// 每周清理过时分支
cleanupBranches: {
cron: '0 0 * * 0', // 每周日0点
async handler(ctx) {
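// "|| true" keeps grep's exit status 1 (no lines left) from rejecting the promise;
// the anchored pattern skips only the branch named main, not every name containing "main"
const { stdout: branches } = await execute(
'git branch --merged main | grep -vE "^[* ] +main$" || true'
);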
const toDelete = branches.trim().split('\n').filter(b => b);
for (const branch of toDelete) {
const branchName = branch.trim().replace('* ', '');
if (branchName) {
await execute(`git branch -d ${branchName}`);
ctx.logger.info(`已删除已合并分支: ${branchName}`);
}
}
}
}
},
// 辅助方法
methods: {
async getStagedDiff(): Promise<string> {
const { stdout } = await execute('git diff --cached --no-color');
return stdout;
},
async hasRemote(): Promise<boolean> {
try {
await execute('git remote get-url origin');
return true;
} catch {
return false;
}
}
}
});
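The skill above builds shell commands from text that an AI model generates (commit messages and branch names), so that text must not be able to break out of its argument. The sketch below shows POSIX single-quote escaping, the same technique a string-based `exec` call needs. `shellQuote` and `buildCommitCommand` are hypothetical helpers. Where possible, Node's `child_process.execFile` with an argument array avoids the shell entirely and is the simpler option.

```typescript
// Wrap a string in single quotes for a POSIX shell.
// Inside single quotes nothing is special except the quote itself,
// which is written as '\'' (close quote, escaped quote, reopen quote).
function shellQuote(arg: string): string {
  return "'" + arg.replace(/'/g, "'\\''") + "'";
}

// Build the commit command with the message passed as one literal argument.
function buildCommitCommand(message: string): string {
  return `git commit -m ${shellQuote(message)}`;
}
```

With this quoting, a message such as `fix: $(rm -rf /)` reaches git as literal text instead of being executed as a command substitution.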
Skill test suite:
// tests/git-helper.test.ts
import { describe, it, expect, beforeEach, vi } from 'vitest';
import { exec } from 'child_process';
// src/index.ts default-exports a Skill instance, so it is imported as a value, not constructed.
// Assumes the Skill class exposes `actions` and binds `methods` (getStagedDiff, hasRemote) onto `this`.
import gitSkill from '../src/index';

// Replace exec with a mock; vitest hoists vi.mock so this applies before src/index.ts loads
vi.mock('child_process', () => ({ exec: vi.fn() }));
// Stub the AI helper so the tests make no model calls
vi.mock('../src/ai-helper', () => ({
  analyzeCommitMessage: vi.fn(async () => 'AI-generated commit message'),
  generateBranchName: vi.fn(async () => 'feature/test'),
}));

// promisify(exec) passes a Node-style callback, so the mock answers through it
function stubGit(statusOutput: string) {
  vi.mocked(exec).mockImplementation(((cmd: string, callback: Function) => {
    callback(null, { stdout: cmd.includes('status') ? statusOutput : '', stderr: '' });
  }) as any);
}

describe('Git AI Helper Skill', () => {
  let mockContext: any;
  beforeEach(() => {
    mockContext = {
      memory: { add: vi.fn() },
      logger: { info: vi.fn(), error: vi.fn() },
      llm: { generate: vi.fn(async () => 'AI-generated commit message') },
    };
  });
  describe('smartCommit', () => {
    it('commits changes successfully', async () => {
      stubGit('M src/index.ts\nA src/utils.ts');
      const result = await gitSkill.actions.smartCommit(mockContext);
      expect(result.success).toBe(true);
      expect(mockContext.memory.add).toHaveBeenCalled();
    });
    it('returns an explanatory message when nothing has changed', async () => {
      stubGit('');
      const result = await gitSkill.actions.smartCommit(mockContext);
      expect(result.success).toBe(false);
      // Matches the message string returned by smartCommit in src/index.ts
      expect(result.message).toContain('没有可提交的更改');
    });
  });
});
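2.3 Publishing and Distributing Skills
Publishing to the OpenClaw skill marketplace:
- Prepare the release:
# Build the skill
npm run build
# Run the tests
npm test
# Generate the skill manifest and package
openclaw skill pack --output dist/skill.tar.gz
- Publish to the registry:
# Log in to the skill registry
openclaw registry login --token YOUR_PUBLISH_TOKEN
# Publish the skill
openclaw registry publish dist/skill.tar.gz \
--visibility=public \
--category="developer-tools"
- Version management:
OpenClaw uses semantic versioning. Skill updates should follow these rules:
- Major version: incompatible API changes
- Minor version: backward-compatible new functionality
- Patch version: backward-compatible bug fixes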
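The three rules above can be checked mechanically before publishing, for example to warn when a release that removes an action is tagged as a patch. This is a minimal sketch that handles plain `MAJOR.MINOR.PATCH` strings only (pre-release and build suffixes are deliberately rejected). `parseVersion` and `classifyBump` are hypothetical helpers.

```typescript
type Bump = "major" | "minor" | "patch" | "none";

// Parse "MAJOR.MINOR.PATCH" into three numbers; anything else is an error.
function parseVersion(v: string): [number, number, number] {
  const m = /^(\d+)\.(\d+)\.(\d+)$/.exec(v);
  if (!m) throw new Error(`Not a MAJOR.MINOR.PATCH version: ${v}`);
  return [Number(m[1]), Number(m[2]), Number(m[3])];
}

// Report which component changed first, reading from the most significant one.
function classifyBump(from: string, to: string): Bump {
  const [major1, minor1, patch1] = parseVersion(from);
  const [major2, minor2, patch2] = parseVersion(to);
  if (major2 !== major1) return "major";
  if (minor2 !== minor1) return "minor";
  if (patch2 !== patch1) return "patch";
  return "none";
}
```

A publish script could compare the result with the kind of change recorded in the changelog and refuse to continue when they disagree.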
Chapter 3: Multi-Agent System Architecture: Building Your "Lobster Farm"
3.1 Multi-Agent Architecture Design Patterns
Role-based Agent decomposition pattern:
# agents/orchestrator/config.yaml
agents:
# 代码专家Agent
code_expert:
soul: |
你是资深全栈工程师,擅长TypeScript、Python、Go。
你的职责:
1. 代码编写、审查、重构
2. 技术架构设计
3. 性能优化
4. 代码质量保证
工作原则:
- 优先使用TypeScript
- 遵循SOLID原则
- 编写完整的单元测试
- 文档驱动开发
model: "gpt-4"
skills:
- "git-helper"
- "code-reviewer"
- "test-generator"
- "docker-builder"
workspace: "/workspace/code"
# 系统运维Agent
sysops:
soul: |
你是系统运维专家,专注于:
1. 服务器部署与维护
2. 容器化与编排
3. 监控与告警
4. 安全加固
工作原则:
- 最小权限原则
- 基础设施即代码
- 自动化一切
- 安全第一
model: "claude-3-opus"
skills:
- "docker-manager"
- "k8s-operator"
- "monitoring-setup"
- "security-scanner"
workspace: "/workspace/infra"
# 项目管理Agent
project_manager:
soul: |
你是敏捷项目经理,负责:
1. 任务分解与分配
2. 进度跟踪
3. 风险管理
4. 团队协调
工作原则:
- 透明沟通
- 数据驱动决策
- 持续改进
- 价值优先
model: "gpt-4o"
skills:
- "task-manager"
- "jira-connector"
- "meeting-scheduler"
- "report-generator"
workspace: "/workspace/projects"
# 协调者Agent
coordinator:
soul: |
你是团队协调者,负责:
1. 接收外部请求
2. 任务路由
3. 结果整合
4. 质量检查
model: "gpt-4o"
skills:
- "message-router"
- "quality-checker"
- "result-aggregator"
3.2 Inter-Agent Communication
Event-based communication system:
// src/event-bus.ts
import { EventEmitter } from 'events';
import { v4 as uuidv4 } from 'uuid';
export interface AgentMessage {
id: string;
from: string;
to: string | string[];
type: 'request' | 'response' | 'broadcast' | 'error';
payload: any;
timestamp: Date;
correlationId?: string;
requiresResponse?: boolean;
timeout?: number;
}
export class AgentEventBus extends EventEmitter {
private agents: Map<string, AgentConnection> = new Map();
private pendingResponses: Map<string, Function> = new Map();
// 注册Agent
registerAgent(agentId: string, handler: (msg: AgentMessage) => Promise<any>) {
this.agents.set(agentId, { id: agentId, handler });
}
// 发送消息
async sendMessage(message: AgentMessage): Promise<AgentMessage | null> {
const messageId = uuidv4();
const fullMessage = {
...message,
id: messageId,
timestamp: new Date(),
};
// 广播消息
if (message.to === 'broadcast') {
const promises = Array.from(this.agents.values())
.filter(agent => agent.id !== message.from)
.map(agent => agent.handler(fullMessage).catch(err => ({
id: uuidv4(),
from: agent.id,
to: message.from,
type: 'error',
payload: { error: err.message },
timestamp: new Date(),
})));
const results = await Promise.allSettled(promises);
return {
id: uuidv4(),
from: 'system',
to: message.from,
type: 'response',
payload: results,
timestamp: new Date(),
};
}
// 单播消息
const agent = this.agents.get(message.to as string);
if (!agent) {
throw new Error(`Agent ${message.to} not found`);
}
// If a response is required, race the handler against a timeout
if (message.requiresResponse) {
return new Promise((resolve, reject) => {
// Clear the timer and the pending entry on every exit path so neither leaks
const settle = () => {
clearTimeout(timeoutId);
this.pendingResponses.delete(messageId);
};
const timeoutId = setTimeout(() => {
settle();
reject(new Error(`Response timeout for message ${messageId}`));
}, message.timeout || 30000);
this.pendingResponses.set(messageId, resolve);
agent.handler(fullMessage)
.then(resolve, reject)
.finally(settle);
});
}
// 不需要响应,直接发送
return agent.handler(fullMessage);
}
// 请求-响应模式
async request<T = any>(
from: string,
to: string,
action: string,
params: any = {},
options: { timeout?: number } = {}
): Promise<T> {
const message: AgentMessage = {
id: uuidv4(),
from,
to,
type: 'request',
payload: { action, params },
timestamp: new Date(), // required by AgentMessage; sendMessage overwrites it on dispatch
requiresResponse: true,
timeout: options.timeout || 30000,
};
const response = await this.sendMessage(message);
if (response?.type === 'error') {
throw new Error(response.payload.error);
}
return response?.payload as T;
}
}
// 使用示例
const bus = new AgentEventBus();
// Agent 1: 代码专家
bus.registerAgent('code_expert', async (msg) => {
if (msg.payload.action === 'reviewCode') {
const code = msg.payload.params.code;
const issues = await reviewCode(code);
return {
id: uuidv4(),
from: 'code_expert',
to: msg.from,
type: 'response',
payload: { issues, suggestions: [] },
};
}
});
// Agent 2: 系统运维
bus.registerAgent('sysops', async (msg) => {
if (msg.payload.action === 'deploy') {
const result = await deployToK8s(msg.payload.params.service);
return {
id: uuidv4(),
from: 'sysops',
to: msg.from,
type: 'response',
payload: result,
};
}
});
// 协调者Agent发送任务
async function handleDeployRequest(serviceConfig) {
// 1. 请求代码专家审查代码
const reviewResult = await bus.request(
'coordinator',
'code_expert',
'reviewCode',
{ code: serviceConfig.code }
);
if (reviewResult.issues.length > 0) {
return { status: 'failed', issues: reviewResult.issues };
}
// 2. 请求系统运维部署
const deployResult = await bus.request(
'coordinator',
'sysops',
'deploy',
{ service: serviceConfig }
);
// 3. 整合结果
return {
status: 'success',
codeReview: reviewResult,
deployment: deployResult,
};
}
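The `AgentMessage` interface declares a `correlationId`, which is what lets a reply that arrives later, or over a different transport, be matched to the request that caused it. The sketch below shows that bookkeeping in isolation. `PendingRequests` is a hypothetical helper, not part of the `AgentEventBus` above, and the current time is passed in explicitly so timeouts are deterministic in tests.

```typescript
// Hypothetical helper for illustration; not part of the AgentEventBus above.
interface Waiter {
  deadline: number;
  onReply: (payload: unknown) => void;
  onTimeout: () => void;
}

class PendingRequests {
  private waiting = new Map<string, Waiter>();

  // Remember a request until a reply with the same id arrives or it expires.
  track(
    id: string,
    timeoutMs: number,
    now: number,
    onReply: (payload: unknown) => void,
    onTimeout: () => void,
  ): void {
    this.waiting.set(id, { deadline: now + timeoutMs, onReply, onTimeout });
  }

  // Route a reply to its request; returns false for unknown or already-settled ids.
  deliver(correlationId: string, payload: unknown): boolean {
    const waiter = this.waiting.get(correlationId);
    if (!waiter) return false;
    this.waiting.delete(correlationId);
    waiter.onReply(payload);
    return true;
  }

  // Fail every request whose deadline has passed; returns how many expired.
  expire(now: number): number {
    let expired = 0;
    for (const [id, waiter] of this.waiting) {
      if (now >= waiter.deadline) {
        this.waiting.delete(id);
        waiter.onTimeout();
        expired++;
      }
    }
    return expired;
  }
}
```

Each entry is removed before its callback runs, so a request settles exactly once: a late reply after a timeout is reported as undeliverable rather than resolving twice.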
3.3 Load Balancing and Failover
Smart routing strategy:
// src/router.ts
export class IntelligentRouter {
private agentStats: Map<string, AgentStatistics> = new Map();
private skillMap: Map<string, string[]> = new Map(); // 技能 -> Agent列表
// 根据多种因素选择最佳Agent
async selectBestAgent(
requiredSkills: string[],
context: RoutingContext
): Promise<string | null> {
const candidateAgents = this.findAgentsWithSkills(requiredSkills);
if (candidateAgents.length === 0) {
return null;
}
if (candidateAgents.length === 1) {
return candidateAgents[0];
}
// 评分算法
const scoredAgents = await Promise.all(
candidateAgents.map(async agentId => ({
agentId,
score: await this.calculateAgentScore(agentId, context),
}))
);
// 选择最高分的Agent
scoredAgents.sort((a, b) => b.score - a.score);
return scoredAgents[0].agentId;
}
private async calculateAgentScore(
agentId: string,
context: RoutingContext
): Promise<number> {
const stats = this.agentStats.get(agentId) || this.getDefaultStats();
const agentConfig = this.getAgentConfig(agentId);
let score = 100; // 基础分
// 1. 负载因子 (0-30分)
const loadFactor = 1 - Math.min(stats.currentLoad / stats.maxCapacity, 0.9);
score += loadFactor * 30;
// 2. 响应时间因子 (0-25分)
const avgResponseTime = stats.avgResponseTime || 1000;
const responseFactor = Math.max(0, 1 - avgResponseTime / 5000);
score += responseFactor * 25;
// 3. 成功率因子 (0-20分)
const successRate = stats.successRate ?? 0.95; // ?? keeps a measured rate of 0; || would replace it with the default
score += successRate * 20;
// 4. 技能匹配度 (0-15分)
const skillMatch = this.calculateSkillMatch(agentId, context.requiredSkills);
score += skillMatch * 15;
// 5. 成本因子 (0-10分)
const costFactor = 1 - (agentConfig.costPerRequest || 0) / 0.1;
score += Math.max(0, costFactor) * 10;
// 6. 专业化加分 (专业Agent在处理专业任务时加分)
if (agentConfig.specialization === context.taskType) {
score += 20;
}
return score;
}
// 更新Agent统计信息
updateAgentStats(agentId: string, result: TaskResult) {
const stats = this.agentStats.get(agentId) || this.getDefaultStats();
stats.totalRequests++;
stats.totalTime += result.duration;
stats.avgResponseTime = stats.totalTime / stats.totalRequests;
if (result.success) {
stats.successfulRequests++;
} else {
stats.failedRequests++;
}
stats.successRate = stats.successfulRequests / stats.totalRequests;
stats.currentLoad = result.currentLoad;
this.agentStats.set(agentId, stats);
}
}
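To see how the weights in `calculateAgentScore` trade off, it helps to run the arithmetic on two concrete agents. The sketch below reproduces only the first three factors (load, response time, success rate) with the same constants as the router above. Skill match, cost, and the specialization bonus are left out, and `scoreAgent` and `pickBest` are hypothetical standalone functions.

```typescript
interface AgentStats {
  currentLoad: number;
  maxCapacity: number;
  avgResponseTime: number; // milliseconds
  successRate: number; // 0..1
}

// Base 100, plus up to 30 for spare capacity, 25 for speed, and 20 for reliability.
function scoreAgent(s: AgentStats): number {
  const loadFactor = 1 - Math.min(s.currentLoad / s.maxCapacity, 0.9);
  const responseFactor = Math.max(0, 1 - s.avgResponseTime / 5000);
  return 100 + loadFactor * 30 + responseFactor * 25 + s.successRate * 20;
}

// Return the id with the highest score, or undefined for an empty candidate set.
function pickBest(candidates: Record<string, AgentStats>): string | undefined {
  let best: string | undefined;
  let bestScore = -Infinity;
  for (const [id, stats] of Object.entries(candidates)) {
    const score = scoreAgent(stats);
    if (score > bestScore) {
      best = id;
      bestScore = score;
    }
  }
  return best;
}

const idle: AgentStats = { currentLoad: 2, maxCapacity: 10, avgResponseTime: 1000, successRate: 1 };
const busy: AgentStats = { currentLoad: 9, maxCapacity: 10, avgResponseTime: 4000, successRate: 0.5 };
```

For `idle` the factors contribute 24 + 20 + 20 on top of the base, about 164. For `busy` they contribute 3 + 5 + 10, about 118. Because the load ratio is capped at 0.9, even a saturated agent keeps 3 of the 30 load points.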
Chapter 4: Advanced Topics and Optimization Strategies
4.1 Performance Optimization in Practice
Memory system optimization:
// 记忆系统优化配置
const memoryConfig = {
// 分级存储策略
storage: {
levels: [
{
name: 'hot',
maxSize: 1000, // 1000条记录
ttl: 3600000, // 1小时
storage: 'memory',
},
{
name: 'warm',
maxSize: 10000, // 10000条记录
ttl: 86400000, // 24小时
storage: 'sqlite',
index: ['timestamp', 'type', 'tags'],
},
{
name: 'cold',
maxSize: 100000, // 100000条记录
ttl: 604800000, // 7天
storage: 'vector',
compression: true,
},
],
},
// 智能记忆管理
management: {
// 自动总结策略
autoSummarize: {
enabled: true,
threshold: 50, // 每50条相关记忆触发总结
model: 'gpt-4o-mini',
maxTokens: 500,
},
// 记忆重要性评分
importanceScoring: {
factors: [
{ field: 'accessCount', weight: 0.3 },
{ field: 'recency', weight: 0.25 },
{ field: 'relevanceScore', weight: 0.25 },
{ field: 'userFeedback', weight: 0.2 },
],
// 基于使用模式的自适应权重
adaptiveWeights: true,
learningRate: 0.1,
},
// 缓存优化
caching: {
enabled: true,
strategy: 'lru', // LRU缓存淘汰
maxSize: 100,
preload: ['frequent', 'recent'],
},
},
// 向量搜索优化
vectorSearch: {
indexType: 'hnsw', // Hierarchical Navigable Small World
m: 16, // 每个节点的连接数
efConstruction: 200, // 构建时的动态列表大小
efSearch: 100, // 搜索时的动态列表大小
// 量化压缩
quantization: {
enabled: true,
type: 'product', // Product Quantization
nbits: 8, // 每个子向量的比特数
nsub: 16, // 子向量数量
},
// 过滤优化
filters: {
metadata: true,
timeRange: true,
importanceThreshold: 0.7,
},
},
};
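The `importanceScoring.factors` block in the configuration above describes a weighted sum, and `filters.importanceThreshold` describes a cut-off on it. The sketch below makes that arithmetic explicit. It is an assumption that every signal is already normalized to the range 0 to 1 (the source does not say how `accessCount` or `recency` are scaled), and `importance` and `passesFilter` are hypothetical names.

```typescript
// Weights copied from importanceScoring.factors above; they sum to 1.
const WEIGHTS = { accessCount: 0.3, recency: 0.25, relevanceScore: 0.25, userFeedback: 0.2 };
type Signals = Record<keyof typeof WEIGHTS, number>;

// Keep each signal inside [0, 1] so one outlier cannot dominate the sum.
function clamp01(x: number): number {
  return Math.min(1, Math.max(0, x));
}

// Weighted sum of the normalized signals; the result is also in [0, 1].
function importance(signals: Signals): number {
  let total = 0;
  for (const field of Object.keys(WEIGHTS) as (keyof typeof WEIGHTS)[]) {
    total += WEIGHTS[field] * clamp01(signals[field]);
  }
  return total;
}

// Apply the importanceThreshold from the vectorSearch filters (0.7 in the config above).
function passesFilter(signals: Signals, threshold = 0.7): boolean {
  return importance(signals) >= threshold;
}
```

With these weights a memory that is frequently accessed, recent, and relevant scores about 0.8 and passes the 0.7 threshold even without any user feedback, while frequent access plus positive feedback alone (about 0.5) does not.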
LLM call optimization:
class OptimizedLLMClient {
private cache = new Map<string, { response: any; timestamp: number }>();
private requestQueue: Array<{
prompt: string;
resolve: Function;
reject: Function;
}> = [];
private isProcessing = false;
constructor(
private model: string,
private options: {
batchSize: number;
maxConcurrent: number;
cacheTtl: number;
timeout: number;
}
) {}
// 批处理请求
async generateBatch(
prompts: string[],
options?: any
): Promise<string[]> {
// 1. 检查缓存
const cacheResults: (string | null)[] = [];
const toFetch: { prompt: string; index: number }[] = [];
prompts.forEach((prompt, index) => {
const cacheKey = this.getCacheKey(prompt, options);
const cached = this.cache.get(cacheKey);
if (cached && Date.now() - cached.timestamp < this.options.cacheTtl) {
cacheResults[index] = cached.response;
} else {
cacheResults[index] = null;
toFetch.push({ prompt, index });
}
});
if (toFetch.length === 0) {
return cacheResults as string[];
}
// 2. 批处理请求
const batchResults = await this.processBatch(
toFetch.map(item => item.prompt),
options
);
// 3. 合并结果
toFetch.forEach((item, i) => {
cacheResults[item.index] = batchResults[i];
const cacheKey = this.getCacheKey(item.prompt, options);
this.cache.set(cacheKey, {
response: batchResults[i],
timestamp: Date.now(),
});
});
return cacheResults as string[];
}
private async processBatch(
prompts: string[],
options?: any
): Promise<string[]> {
// 动态批处理策略
const batchSize = this.calculateOptimalBatchSize(prompts);
const batches: string[][] = [];
for (let i = 0; i < prompts.length; i += batchSize) {
batches.push(prompts.slice(i, i + batchSize));
}
const results: string[] = [];
// 并发处理批次
const batchPromises = batches.map(async (batch, batchIndex) => {
try {
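// A chat completion returns one reply per request, so each prompt is sent as its own
// request and the requests within a batch run concurrently
const batchResults: string[] = await Promise.all(
batch.map(prompt =>
this.callLLMWithRetry({
model: this.model,
messages: [{ role: 'user' as const, content: prompt }],
...options,
})
)
);
// Put each result back at its original position
batchResults.forEach((result: string, index: number) => {
const originalIndex = batchIndex * batchSize + index;
results[originalIndex] = result;
});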
} catch (error) {
// 批次失败,回退到单个请求
console.warn(`批次 ${batchIndex} 失败,回退到单个请求`);
for (let i = 0; i < batch.length; i++) {
try {
const result = await this.callLLMWithRetry({
model: this.model,
messages: [{ role: 'user' as const, content: batch[i] }],
...options,
});
const originalIndex = batchIndex * batchSize + i;
results[originalIndex] = result;
} catch (singleError) {
// 单个请求也失败,使用后备方案
results[batchIndex * batchSize + i] = this.getFallbackResponse(batch[i]);
}
}
}
});
await Promise.allSettled(batchPromises);
return results;
}
private calculateOptimalBatchSize(prompts: string[]): number {
const avgLength = prompts.reduce((sum, p) => sum + p.length, 0) / prompts.length;
// 根据平均长度动态调整批次大小
if (avgLength < 100) return 10;
if (avgLength < 500) return 5;
if (avgLength < 1000) return 3;
return 1;
}
private async callLLMWithRetry(
request: any,
retries = 3
): Promise<any> {
for (let i = 0; i < retries; i++) {
try {
const controller = new AbortController();
const timeoutId = setTimeout(
() => controller.abort(),
this.options.timeout
);
const response = await fetch('https://api.openai.com/v1/chat/completions', {
method: 'POST',
headers: {
'Content-Type': 'application/json',
'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
},
body: JSON.stringify(request),
signal: controller.signal,
});
clearTimeout(timeoutId);
if (!response.ok) {
if (response.status === 429 && i < retries - 1) {
// 速率限制,指数退避
const delay = Math.min(1000 * Math.pow(2, i) + Math.random() * 1000, 10000);
await new Promise(resolve => setTimeout(resolve, delay));
continue;
}
throw new Error(`HTTP ${response.status}: ${response.statusText}`);
}
const data = await response.json();
return data.choices[0].message.content;
} catch (error) {
if (i === retries - 1) throw error;
// 其他错误,线性退避
await new Promise(resolve =>
setTimeout(resolve, 1000 * (i + 1))
);
}
}
throw new Error('All retries failed');
}
}
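The retry loop in `callLLMWithRetry` uses exponential backoff with jitter for rate-limit responses: the delay doubles with each attempt, a random component spreads out clients that were throttled together, and the total is capped at 10 seconds. The sketch below isolates that formula. `backoffDelayMs` is a hypothetical helper, and the random source is a parameter purely so the function can be tested deterministically.

```typescript
// Delay before retry number `attempt` (0-based), using the same constants as the client above:
// 1000 ms base, doubling per attempt, up to 1000 ms of jitter, capped at 10000 ms.
function backoffDelayMs(attempt: number, random: () => number = Math.random): number {
  const exponential = 1000 * Math.pow(2, attempt);
  const jitter = random() * 1000;
  return Math.min(exponential + jitter, 10000);
}

// The delays for the first `count` attempts, useful for estimating worst-case wait time.
function backoffSchedule(count: number, random: () => number = Math.random): number[] {
  const delays: number[] = [];
  for (let attempt = 0; attempt < count; attempt++) {
    delays.push(backoffDelayMs(attempt, random));
  }
  return delays;
}
```

With jitter disabled the schedule is 1000, 2000, 4000, 8000 and then stays at the 10000 ms cap, so the cap takes effect from the fifth attempt on.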
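4.2 Security Hardening Strategies
Skill sandboxing and permission control:
// src/sandbox.ts
// Caution: vm2 has had several critical sandbox-escape vulnerabilities; check its current advisories before relying on it as a security boundary
import { VM, VMScript } from 'vm2';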
import { createHash } from 'crypto';
export class SkillSandbox {
private vm: VM;
private permissions: Set<string>;
private callHistory: Array<{ action: string; timestamp: Date }> = [];
private rateLimits: Map<string, { count: number; resetAt: Date }> = new Map();
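constructor(
// Stored as fields because validatePath, exec, and run read this.skillId and this.options
private readonly skillId: string,
permissions: string[] = [],
private readonly options: SandboxOptions = {}
) {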
this.permissions = new Set(permissions);
this.vm = new VM({
timeout: options.timeout || 5000,
sandbox: this.createSandboxObject(),
eval: false,
wasm: false,
fixAsync: true,
// 严格限制
...options.restrictions,
});
}
private createSandboxObject(): any {
const sandbox: any = {};
// 基础工具函数(安全版本)
sandbox.console = {
log: (...args: any[]) => this.log('info', args),
error: (...args: any[]) => this.log('error', args),
warn: (...args: any[]) => this.log('warn', args),
};
// 安全的fetch
sandbox.fetch = async (url: string, init?: RequestInit) => {
this.checkPermission('network:external');
this.checkRateLimit('fetch');
const parsedUrl = new URL(url);
// 检查允许的域名
if (!this.isAllowedDomain(parsedUrl.hostname)) {
throw new Error(`Domain ${parsedUrl.hostname} is not allowed`);
}
// 添加安全头
const safeInit = {
...init,
headers: {
...init?.headers,
'User-Agent': 'OpenClaw-Skill/1.0',
},
};
return global.fetch(url, safeInit);
};
// 安全的文件系统访问
sandbox.fs = {
readFile: async (path: string, encoding = 'utf-8') => {
this.checkPermission('filesystem:read');
this.validatePath(path, 'read');
const { promises } = require('fs');
return promises.readFile(path, encoding);
},
writeFile: async (path: string, data: any) => {
this.checkPermission('filesystem:write');
this.validatePath(path, 'write');
const { promises } = require('fs');
return promises.writeFile(path, data);
},
readdir: async (path: string) => {
this.checkPermission('filesystem:read');
this.validatePath(path, 'read');
const { promises } = require('fs');
return promises.readdir(path);
},
stat: async (path: string) => {
this.checkPermission('filesystem:read');
this.validatePath(path, 'read');
const { promises } = require('fs');
return promises.stat(path);
},
};
// 安全的子进程执行
sandbox.exec = async (command: string, options?: any) => {
this.checkPermission('process:execute');
this.validateCommand(command);
const { exec } = require('child_process');
const { promisify } = require('util');
const execAsync = promisify(exec);
return execAsync(command, {
timeout: 30000,
cwd: this.options.workspace,
...options,
});
};
return sandbox;
}
private checkPermission(permission: string): void {
if (!this.permissions.has(permission)) {
throw new Error(`Permission denied: ${permission}`);
}
}
private checkRateLimit(action: string): void {
const now = new Date();
const limit = this.rateLimits.get(action);
if (limit) {
if (now < limit.resetAt) {
if (limit.count >= this.getRateLimit(action)) {
throw new Error(`Rate limit exceeded for action: ${action}`);
}
limit.count++;
} else {
this.rateLimits.set(action, { count: 1, resetAt: this.getNextReset(action) });
}
} else {
this.rateLimits.set(action, {
count: 1,
resetAt: this.getNextReset(action)
});
}
}
private validatePath(path: string, operation: 'read' | 'write'): void {
const resolved = require('path').resolve(this.options.workspace, path);
const workspace = require('path').resolve(this.options.workspace);
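// Prevent path traversal. A bare startsWith(workspace) would also accept sibling directories such as "/workspace-evil"
const { relative, isAbsolute, sep } = require('path');
const rel = relative(workspace, resolved);
if (rel === '..' || rel.startsWith('..' + sep) || isAbsolute(rel)) {
throw new Error(`Path traversal attempt detected: ${path}`);
}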
// 检查文件类型限制
const ext = require('path').extname(path).toLowerCase();
const forbiddenExts = ['.exe', '.dll', '.so', '.sh', '.bat', '.cmd'];
if (forbiddenExts.includes(ext)) {
throw new Error(`Forbidden file extension: ${ext}`);
}
}
private validateCommand(command: string): void {
const dangerousPatterns = [
/rm\s+-rf/,
/mkfs/,
/dd\s+if=/,
/>\s*\/dev\/sda/, // slashes inside a regex literal must be escaped
/chmod\s+[0-7]{3,4}\s+/,
];
for (const pattern of dangerousPatterns) {
if (pattern.test(command)) {
throw new Error(`Dangerous command detected: ${command}`);
}
}
}
async run(code: string, context: any = {}): Promise<any> {
try {
// 代码签名验证
const hash = createHash('sha256').update(code).digest('hex');
if (!this.verifyCodeSignature(hash)) {
throw new Error('Code signature verification failed');
}
// 准备执行环境
const script = new VMScript(`
(function() {
"use strict";
${code}
})();
`);
// 添加上下文
this.vm.setGlobal('context', context);
// 执行代码
const startTime = Date.now();
const result = await this.vm.run(script);
const executionTime = Date.now() - startTime;
// 记录执行日志
this.logExecution({
skillId: this.skillId,
executionTime,
resultHash: createHash('sha256')
.update(JSON.stringify(result))
.digest('hex')
.slice(0, 16),
});
return result;
} catch (error) {
this.log('error', ['Sandbox execution failed:', error.message, error.stack]);
throw error;
}
}
}
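The `checkRateLimit` method above implements a fixed-window limiter: each action gets a counter that resets when its window ends. The sketch below shows the same technique as a standalone class. `FixedWindowLimiter` is a hypothetical name, the limit and window length are constructor arguments rather than the `getRateLimit`/`getNextReset` lookups the sandbox refers to, and the current time is passed in so behavior is reproducible.

```typescript
// Fixed-window rate limiter: at most `limit` calls per action in each window of `windowMs`.
class FixedWindowLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private windows = new Map<string, { count: number; resetAt: number }>();

  constructor(limit: number, windowMs: number) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true and counts the call if it fits in the current window, false otherwise.
  tryAcquire(action: string, now: number): boolean {
    const current = this.windows.get(action);
    if (!current || now >= current.resetAt) {
      // First call for this action, or the previous window has ended: open a new window.
      this.windows.set(action, { count: 1, resetAt: now + this.windowMs });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count++;
    return true;
  }
}
```

Returning a boolean instead of throwing leaves the policy to the caller: the sandbox can reject the call, queue it, or log and continue. A fixed window permits short bursts of up to twice the limit across a window boundary. A sliding window or token bucket avoids that at the cost of more state.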
4.3 Monitoring and Observability
A complete monitoring setup:
# monitoring/prometheus/prometheus.yml
global:
scrape_interval: 15s
evaluation_interval: 15s
rule_files:
- "alerts.yml"
scrape_configs:
- job_name: 'openclaw-agents'
static_configs:
- targets: ['agent1:7456', 'agent2:7456', 'agent3:7456']
metrics_path: '/metrics'
- job_name: 'openclaw-gateway'
static_configs:
- targets: ['gateway:7457']
- job_name: 'openclaw-skills'
static_configs:
- targets: ['skill-runner:7458']
# alerts.yml
groups:
- name: openclaw-alerts
rules:
- alert: HighErrorRate
expr: rate(openclaw_skill_errors_total[5m]) > 0.1
for: 5m
labels:
severity: warning
annotations:
summary: "High error rate detected"
description: "Error rate is {{ $value }} per second"
- alert: MemoryUsageHigh
expr: process_resident_memory_bytes / 1024 / 1024 > 1024
for: 2m
labels:
severity: critical
annotations:
summary: "High memory usage"
description: "Memory usage is {{ $value }}MB"
// src/monitoring/telemetry.ts
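import { Meter, Histogram, Counter } from '@opentelemetry/api';
import { MeterProvider } from '@opentelemetry/sdk-metrics';
import { Resource } from '@opentelemetry/resources';
import { SemanticResourceAttributes } from '@opentelemetry/semantic-conventions';
export class OpenClawTelemetry {
private meter: Meter;
private requestHistogram: Histogram;
private errorCounter: Counter;
private skillExecutionCounter: Counter;
constructor(serviceName: string) {
// Initialize OpenTelemetry (written against the 1.x SDK packages, where Resource is a class)
const resource = new Resource({
[SemanticResourceAttributes.SERVICE_NAME]: serviceName,
[SemanticResourceAttributes.SERVICE_VERSION]: '1.0.0',
'openclaw.agent.id': process.env.AGENT_ID || 'unknown',
});
// Build the MeterProvider from the resource; the original referenced an undefined meterProvider.
// Without a metric reader attached, recorded values are not exported anywhere.
const meterProvider = new MeterProvider({ resource });
// Create the Meter
this.meter = meterProvider.getMeter('openclaw');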
// 定义指标
this.requestHistogram = this.meter.createHistogram('openclaw_request_duration_ms', {
description: 'Duration of requests in milliseconds',
unit: 'ms',
});
this.errorCounter = this.meter.createCounter('openclaw_errors_total', {
description: 'Total number of errors',
});
this.skillExecutionCounter = this.meter.createCounter('openclaw_skill_executions_total', {
description: 'Total skill executions',
});
}
// 追踪请求
trackRequest<T>(
action: string,
fn: () => Promise<T>
): Promise<T> {
const startTime = Date.now();
return fn()
.then(result => {
const duration = Date.now() - startTime;
this.requestHistogram.record(duration, {
action,
status: 'success',
});
this.skillExecutionCounter.add(1, { action });
return result;
})
.catch(error => {
const duration = Date.now() - startTime;
this.requestHistogram.record(duration, {
action,
status: 'error',
error_type: error.constructor.name,
});
this.errorCounter.add(1, {
action,
error_type: error.constructor.name,
});
throw error;
});
}
// 业务指标
recordMemoryUsage(memoryType: string, bytes: number) {
this.meter.createObservableGauge('openclaw_memory_usage_bytes', {
description: 'Memory usage in bytes',
}).addCallback(observableResult => {
observableResult.observe(bytes, { type: memoryType });
});
}
recordVectorSearch(query: string, results: number, duration: number) {
this.meter.createHistogram('openclaw_vector_search_duration_ms', {
description: 'Vector search duration',
unit: 'ms',
}).record(duration, { query_length: query.length.toString() });
this.meter.createCounter('openclaw_vector_search_results_total', {
description: 'Total vector search results',
}).add(results, { query_length: query.length.toString() });
}
}
// 分布式追踪
import { trace, Span, SpanStatusCode } from '@opentelemetry/api';
export function withTracing<T>(
spanName: string,
attributes: Record<string, any>,
fn: (span: Span) => Promise<T>
): Promise<T> {
const tracer = trace.getTracer('openclaw');
return tracer.startActiveSpan(spanName, async (span) => {
try {
// 设置属性
Object.entries(attributes).forEach(([key, value]) => {
span.setAttribute(key, value);
});
// 执行函数
const result = await fn(span);
span.setStatus({ code: SpanStatusCode.OK });
return result;
} catch (error) {
span.setStatus({
code: SpanStatusCode.ERROR,
message: error.message,
});
span.recordException(error);
throw error;
} finally {
span.end();
}
});
}
// 使用示例
const telemetry = new OpenClawTelemetry('my-agent');
async function processRequest(request: Request) {
return withTracing(
'process_request',
{
'request.id': request.id,
'request.type': request.type,
'user.id': request.userId,
},
async (span) => {
// The Span API interface does not expose a start time, so measure it locally
const startedAt = Date.now();
// Business logic
const result = await telemetry.trackRequest(
'complex_operation',
async () => {
// Run the complex operation
return await doComplexOperation(request);
}
);
// Record a custom event
span.addEvent('request_processed', {
'result.size': JSON.stringify(result).length,
'processing.time': Date.now() - startedAt,
});
return result;
}
);
}
Chapter 5: Deployment and Operations in Practice
5.1 Production Deployment Architecture
Kubernetes-based deployment:
# k8s/openclaw-namespace.yaml
apiVersion: v1
kind: Namespace
metadata:
name: openclaw
labels:
name: openclaw
---
# k8s/openclaw-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: openclaw-config
namespace: openclaw
data:
config.yaml: |
gateway:
port: 7456
workers: 4
max_requests: 1000
memory:
type: hybrid
vector_store:
provider: qdrant
host: qdrant.openclaw.svc.cluster.local
port: 6333
skills:
auto_update: true
update_interval: 3600
agents.yaml: |
agents:
- id: coordinator
model: gpt-4
skills: [router, orchestrator]
- id: coder
model: claude-3-opus
skills: [code-writer, code-reviewer]
- id: researcher
model: gpt-4
skills: [web-search, summarizer]
---
# k8s/openclaw-secrets.yaml
apiVersion: v1
kind: Secret
metadata:
name: openclaw-secrets
namespace: openclaw
type: Opaque
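stringData:
# kubectl does not expand ${...}; render this file first (for example with envsubst), otherwise the literal placeholder is stored
openai-api-key: "${OPENAI_API_KEY}"
anthropic-api-key: "${ANTHROPIC_API_KEY}"
qdrant-api-key: "${QDRANT_API_KEY}"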
---
# k8s/gateway-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  replicas: 3
  selector:
    matchLabels:
      app: openclaw-gateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: openclaw-gateway
    spec:
      containers:
        - name: gateway
          image: openclaw/gateway:2026.1.2
          ports:
            - containerPort: 7456
          env:
            - name: NODE_ENV
              value: production
            - name: OPENCLAW_CONFIG_PATH
              value: /etc/openclaw/config.yaml
            - name: OPENAI_API_KEY
              valueFrom:
                secretKeyRef:
                  name: openclaw-secrets
                  key: openai-api-key
          volumeMounts:
            - name: config
              mountPath: /etc/openclaw
          resources:
            requests:
              memory: "512Mi"
              cpu: "250m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          livenessProbe:
            httpGet:
              path: /health
              port: 7456
            initialDelaySeconds: 30
            periodSeconds: 10
          readinessProbe:
            httpGet:
              path: /ready
              port: 7456
            initialDelaySeconds: 5
            periodSeconds: 5
      volumes:
        - name: config
          configMap:
            name: openclaw-config
---
# k8s/gateway-service.yaml
apiVersion: v1
kind: Service
metadata:
  name: openclaw-gateway
  namespace: openclaw
spec:
  selector:
    app: openclaw-gateway
  ports:
    - port: 7456
      targetPort: 7456
  type: ClusterIP
---
# k8s/agent-hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: openclaw-agent-hpa
  namespace: openclaw
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: openclaw-gateway
  minReplicas: 3
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 60
      policies:
        - type: Percent
          value: 100
          periodSeconds: 60
---
# k8s/qdrant-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: qdrant
  namespace: openclaw
spec:
  serviceName: qdrant
  replicas: 3
  selector:
    matchLabels:
      app: qdrant
  template:
    metadata:
      labels:
        app: qdrant
    spec:
      containers:
        - name: qdrant
          image: qdrant/qdrant:latest
          ports:
            - containerPort: 6333
            - containerPort: 6334
          volumeMounts:
            - name: data
              mountPath: /qdrant/storage
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "4Gi"
              cpu: "2000m"
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: [ "ReadWriteOnce" ]
        storageClassName: "fast-ssd"
        resources:
          requests:
            storage: 100Gi
---
# k8s/ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: openclaw-ingress
  namespace: openclaw
  annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
spec:
  tls:
    - hosts:
        - openclaw.yourcompany.com
      secretName: openclaw-tls
  rules:
    - host: openclaw.yourcompany.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: openclaw-gateway
                port:
                  number: 7456
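The gateway Deployment above probes /health for liveness and /ready for readiness on port 7456. The article does not show how the gateway serves these paths, so the following is only a minimal sketch of the distinction, assuming a plain Node.js HTTP server. The names probeStatus and state are hypothetical and are not part of OpenClaw.

```typescript
import { createServer, type IncomingMessage, type ServerResponse } from "node:http";

// Hypothetical readiness flag: set to true once dependencies
// (for example the vector store) have been reached successfully.
const state = { ready: false };

// Pure routing function, so the probe semantics can be checked without a socket.
function probeStatus(path: string, ready: boolean): number {
  if (path === "/health") return 200;               // liveness: the process is up
  if (path === "/ready") return ready ? 200 : 503;  // readiness: safe to route traffic
  return 404;
}

const server = createServer((req: IncomingMessage, res: ServerResponse) => {
  const status = probeStatus(req.url ?? "/", state.ready);
  res.writeHead(status, { "Content-Type": "application/json" });
  res.end(JSON.stringify({ status }));
});

// To serve the probes on the port used in the manifest:
// server.listen(7456);
```

Keeping /health independent of downstream dependencies avoids restart loops when a dependency is briefly unavailable, while a 503 from /ready only removes the pod from the Service endpoints until it recovers.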
5.2 CI/CD Pipeline Configuration
# .github/workflows/openclaw-ci.yml
name: OpenClaw CI/CD
on:
  push:
    branches: [ main, develop ]
  pull_request:
    branches: [ main ]
env:
  REGISTRY: ghcr.io
  IMAGE_NAME: ${{ github.repository }}
jobs:
  test:
    runs-on: ubuntu-latest
    strategy:
      matrix:
        node-version: [20.x, 22.x]
    steps:
      - uses: actions/checkout@v4
      - name: Setup Node.js
        uses: actions/setup-node@v4
        with:
          node-version: ${{ matrix.node-version }}
          cache: 'npm'
      - name: Install dependencies
        run: npm ci
      - name: Run linter
        run: npm run lint
      - name: Run tests
        run: npm test
        env:
          NODE_ENV: test
          OPENAI_API_KEY: ${{ secrets.TEST_OPENAI_KEY }}
      - name: Run integration tests
        run: npm run test:integration
        if: matrix.node-version == '22.x'
      - name: Upload coverage
        uses: codecov/codecov-action@v3
        with:
          file: ./coverage/lcov.info
          flags: unittests
  build:
    needs: test
    runs-on: ubuntu-latest
    if: github.event_name == 'push'
    # GITHUB_TOKEN needs package write access to push to ghcr.io
    permissions:
      contents: read
      packages: write
    steps:
      - uses: actions/checkout@v4
      - name: Setup Docker Buildx
        uses: docker/setup-buildx-action@v3
      - name: Login to Container Registry
        uses: docker/login-action@v3
        with:
          registry: ${{ env.REGISTRY }}
          username: ${{ github.actor }}
          password: ${{ secrets.GITHUB_TOKEN }}
      - name: Extract metadata
        id: meta
        uses: docker/metadata-action@v5
        with:
          images: ${{ env.REGISTRY }}/${{ env.IMAGE_NAME }}
          # type=sha yields a "sha-<short sha>" tag; the raw tag adds the full
          # commit SHA that the production deploy job references.
          tags: |
            type=ref,event=branch
            type=ref,event=pr
            type=semver,pattern={{version}}
            type=semver,pattern={{major}}.{{minor}}
            type=semver,pattern={{major}}
            type=sha
            type=raw,value=${{ github.sha }}
      - name: Build and push
        uses: docker/build-push-action@v5
        with:
          context: .
          push: ${{ github.event_name != 'pull_request' }}
          tags: ${{ steps.meta.outputs.tags }}
          labels: ${{ steps.meta.outputs.labels }}
          cache-from: type=gha
          cache-to: type=gha,mode=max
  deploy-staging:
    needs: build
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/develop'
    environment: staging
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      - name: Deploy to Staging
        uses: appleboy/ssh-action@master
        with:
          host: ${{ secrets.STAGING_HOST }}
          username: ${{ secrets.STAGING_USERNAME }}
          key: ${{ secrets.STAGING_SSH_KEY }}
          script: |
            cd /opt/openclaw
            git pull origin develop
            docker-compose pull
            docker-compose up -d
            docker system prune -f
      - name: Run smoke tests
        run: |
          curl -f ${{ secrets.STAGING_URL }}/health || exit 1
          curl -f ${{ secrets.STAGING_URL }}/ready || exit 1
  deploy-production:
    needs: [test, build]
    runs-on: ubuntu-latest
    if: github.ref == 'refs/heads/main'
    environment: production
    steps:
      - name: Checkout
        uses: actions/checkout@v4
      # Assumes cluster credentials were configured in an earlier step
      # (for example with azure/k8s-set-context); none is shown here.
      - name: Deploy to Production
        uses: azure/k8s-deploy@v4
        with:
          namespace: openclaw
          manifests: |
            k8s/openclaw-namespace.yaml
            k8s/openclaw-config.yaml
            k8s/openclaw-secrets.yaml
            k8s/gateway-deployment.yaml
            k8s/gateway-service.yaml
            k8s/agent-hpa.yaml
            k8s/qdrant-statefulset.yaml
            k8s/ingress.yaml
          images: |
            ghcr.io/${{ github.repository }}:${{ github.sha }}
          kubectl-version: 'latest'
      - name: Verify deployment
        run: |
          kubectl rollout status deployment/openclaw-gateway -n openclaw --timeout=300s
          kubectl rollout status statefulset/qdrant -n openclaw --timeout=300s
      - name: Run health checks
        run: |
          for i in {1..30}; do
            if curl -f https://openclaw.yourcompany.com/health; then
              echo "Health check passed"
              exit 0
            fi
            echo "Waiting for service to be ready..."
            sleep 10
          done
          echo "Health check failed"
          exit 1
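The final health-check step polls the /health endpoint up to 30 times at 10-second intervals. The same polling logic can be sketched as a reusable function, for instance for smoke tests driven from a Node.js script. This is an illustrative sketch only: waitForHealthy and httpCheck are hypothetical names, and the global fetch it relies on assumes Node.js 18 or later.

```typescript
// A check resolves to true when the service is considered healthy.
type Check = () => Promise<boolean>;

// Poll `check` up to `attempts` times, waiting `delayMs` between tries.
// Resolves to true on the first success, false if every attempt fails.
async function waitForHealthy(
  check: Check,
  attempts = 30,
  delayMs = 10_000
): Promise<boolean> {
  for (let i = 1; i <= attempts; i++) {
    try {
      if (await check()) return true;
    } catch {
      // Network errors count as "not ready yet", like a failing `curl -f`.
    }
    if (i < attempts) {
      await new Promise<void>((resolve) => setTimeout(resolve, delayMs));
    }
  }
  return false;
}

// HTTP-based check: healthy when the endpoint answers with a 2xx status.
const httpCheck = (url: string): Check => async () => (await fetch(url)).ok;
```

A caller would pass something like httpCheck("https://openclaw.yourcompany.com/health") and exit with a non-zero status when the result is false.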
5.3 Backup and Disaster Recovery
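# backup/backup-job.yaml
apiVersion: batch/v1
kind: CronJob
metadata:
  name: openclaw-backup
  namespace: openclaw
spec:
  schedule: "0 2 * * *"  # every day at 02:00
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        spec:
          containers:
            - name: backup
              # NOTE: the stock postgres image provides pg_dump but not the AWS CLI.
              # This job assumes a derived image that also ships curl and aws.
              image: postgres:15-alpine
              env:
                - name: PGHOST
                  value: postgres.openclaw.svc.cluster.local
                - name: PGDATABASE
                  value: openclaw
                - name: PGUSER
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: username
                - name: PGPASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: postgres-credentials
                      key: password
                - name: S3_ENDPOINT
                  value: s3.amazonaws.com
                - name: S3_BUCKET
                  value: openclaw-backups
                - name: S3_PREFIX
                  value: "daily"
              command:
                - /bin/sh
                - -c
                - |
                  # Abort on the first error so a failed dump never triggers cleanup
                  set -eu
                  TIMESTAMP=$(date +%Y%m%d_%H%M%S)
                  QDRANT_URL="http://qdrant.openclaw.svc.cluster.local:6333"
                  # 1. Back up PostgreSQL
                  pg_dump -Fc openclaw > /tmp/openclaw_${TIMESTAMP}.dump
                  # 2. Back up the vector database: create a snapshot, then download it
                  #    (the POST returns JSON metadata, not the snapshot file itself)
                  SNAPSHOT_NAME=$(curl -sf -X POST "${QDRANT_URL}/collections/openclaw/snapshots" | \
                    sed -n 's/.*"name":"\([^"]*\)".*/\1/p')
                  [ -n "${SNAPSHOT_NAME}" ] || { echo "Qdrant snapshot failed" >&2; exit 1; }
                  curl -sf "${QDRANT_URL}/collections/openclaw/snapshots/${SNAPSHOT_NAME}" \
                    -o /tmp/qdrant_${TIMESTAMP}.snapshot
                  # 3. Back up configuration files (-h follows the ConfigMap symlinks)
                  tar -czhf /tmp/config_${TIMESTAMP}.tar.gz /etc/openclaw/
                  # 4. Upload to S3
                  aws s3 cp /tmp/openclaw_${TIMESTAMP}.dump s3://${S3_BUCKET}/${S3_PREFIX}/
                  aws s3 cp /tmp/qdrant_${TIMESTAMP}.snapshot s3://${S3_BUCKET}/${S3_PREFIX}/
                  aws s3 cp /tmp/config_${TIMESTAMP}.tar.gz s3://${S3_BUCKET}/${S3_PREFIX}/
                  # 5. Remove local temporary files
                  rm -f /tmp/*_${TIMESTAMP}.*
                  # 6. Delete backups older than 7 days by comparing YYYYMMDD stamps
                  #    (date -d @epoch works with both GNU and BusyBox date)
                  CUTOFF=$(date -d "@$(( $(date +%s) - 7 * 86400 ))" +%Y%m%d)
                  aws s3 ls s3://${S3_BUCKET}/${S3_PREFIX}/ | \
                    awk '{print $4}' | \
                    while read -r file; do
                      filedate=$(echo "$file" | grep -oE '[0-9]{8}' | head -n 1)
                      if [ -n "$filedate" ] && [ "$filedate" -lt "$CUTOFF" ]; then
                        aws s3 rm "s3://${S3_BUCKET}/${S3_PREFIX}/$file"
                      fi
                    done
              volumeMounts:
                - name: aws-credentials
                  mountPath: /root/.aws
                  readOnly: true
                # Mount the ConfigMap so step 3 has files to archive
                - name: config
                  mountPath: /etc/openclaw
                  readOnly: true
          volumes:
            - name: aws-credentials
              secret:
                secretName: aws-backup-credentials
            - name: config
              configMap:
                name: openclaw-config
          restartPolicy: OnFailure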
---
#!/bin/bash
# disaster-recovery/restore-script.sh
# Usage: restore-script.sh [YYYYMMDD]   (defaults to today)
set -euo pipefail
RESTORE_DATE=${1:-$(date +%Y%m%d)}
BUCKET_PATH="s3://openclaw-backups/daily"
echo "Starting OpenClaw restore for date: ${RESTORE_DATE}"

# `aws s3 cp` does not expand wildcards, so resolve the newest matching key first.
fetch_backup() {
  local prefix=$1 dest=$2 key
  key=$(aws s3 ls "${BUCKET_PATH}/${prefix}_${RESTORE_DATE}_" | awk '{print $4}' | sort | tail -n 1 || true)
  [ -n "${key}" ] || { echo "No ${prefix} backup found for ${RESTORE_DATE}" >&2; exit 1; }
  aws s3 cp "${BUCKET_PATH}/${key}" "${dest}"
}

# 1. Stop the gateway. Qdrant and PostgreSQL stay up so they can accept the restore.
echo "Stopping services..."
kubectl scale deployment openclaw-gateway --replicas=0 -n openclaw

# 2. Download the backup files from S3
echo "Downloading backup files..."
fetch_backup openclaw /tmp/openclaw.dump
fetch_backup qdrant /tmp/qdrant.snapshot
fetch_backup config /tmp/config.tar.gz

# 3. Restore PostgreSQL (no -it: the script may run without a TTY)
echo "Restoring database..."
kubectl exec postgres-0 -n openclaw -- psql -c "DROP DATABASE IF EXISTS openclaw;"
kubectl exec postgres-0 -n openclaw -- psql -c "CREATE DATABASE openclaw;"
kubectl cp /tmp/openclaw.dump postgres-0:/tmp/ -n openclaw
kubectl exec postgres-0 -n openclaw -- pg_restore -d openclaw /tmp/openclaw.dump

# 4. Restore Qdrant (assumes curl is available inside the Qdrant container)
echo "Restoring vector database..."
QDRANT_POD=$(kubectl get pods -n openclaw -l app=qdrant -o jsonpath='{.items[0].metadata.name}')
kubectl cp /tmp/qdrant.snapshot "${QDRANT_POD}":/tmp/ -n openclaw
kubectl exec "${QDRANT_POD}" -n openclaw -- \
  curl -sf -X POST "http://localhost:6333/collections/openclaw/snapshots/upload" \
  -F snapshot=@/tmp/qdrant.snapshot

# 5. Restore configuration (extract to a scratch directory, not over the local /etc)
echo "Restoring configuration..."
RESTORE_DIR=$(mktemp -d)
tar -xzf /tmp/config.tar.gz -C "${RESTORE_DIR}"
kubectl create configmap openclaw-config --from-file="${RESTORE_DIR}/etc/openclaw" -n openclaw --dry-run=client -o yaml | \
  kubectl apply -f -

# 6. Restart services
echo "Restarting services..."
kubectl rollout status statefulset/qdrant -n openclaw --timeout=300s
kubectl scale deployment openclaw-gateway --replicas=3 -n openclaw
kubectl rollout status deployment/openclaw-gateway -n openclaw --timeout=300s
echo "Restore complete!"
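The backup job keeps seven days of history by reading the date stamp embedded in each file name. As an illustration, that retention rule can be written as a pure function, which is easier to unit-test than a shell loop. The sketch below assumes file names of the form prefix_YYYYMMDD_HHMMSS.ext as produced by the backup job, and it compares at day granularity in UTC. The names backupDate and expiredBackups are hypothetical.

```typescript
const DAY_MS = 86_400_000;

// Parse the YYYYMMDD stamp out of names like "openclaw_20260115_020000.dump".
// Returns null for names that do not follow the backup naming scheme.
function backupDate(fileName: string): Date | null {
  const m = /_(\d{4})(\d{2})(\d{2})_\d{6}\./.exec(fileName);
  if (!m) return null;
  return new Date(Date.UTC(Number(m[1]), Number(m[2]) - 1, Number(m[3])));
}

// Return the files whose date stamp is more than `retentionDays` days before `now`.
// Files without a recognizable stamp are never selected for deletion.
function expiredBackups(
  fileNames: string[],
  now: Date,
  retentionDays = 7
): string[] {
  const today = Date.UTC(now.getUTCFullYear(), now.getUTCMonth(), now.getUTCDate());
  const cutoff = today - retentionDays * DAY_MS;
  return fileNames.filter((name) => {
    const date = backupDate(name);
    return date !== null && date.getTime() < cutoff;
  });
}
```

Treating unparseable names as "keep" is a deliberately conservative choice: a retention bug then leaves extra files behind instead of deleting unrelated objects.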
Chapter 6: Cost Optimization and Performance Tuning
6.1 Hybrid Model Strategy
// Cost-aware model routing
interface ModelSpec {
  cost: number;        // relative cost (lower is cheaper)
  performance: number; // relative output quality
  speed: number;       // relative speed (higher is faster)
}

class ModelRouter {
  private models: Record<string, ModelSpec> = {
    'gpt-4': { cost: 0.03, performance: 0.9, speed: 1 },
    'gpt-4o': { cost: 0.015, performance: 0.85, speed: 2 },
    'gpt-4o-mini': { cost: 0.003, performance: 0.7, speed: 4 },
    'claude-3-opus': { cost: 0.075, performance: 0.95, speed: 0.8 },
    'claude-3-sonnet': { cost: 0.015, performance: 0.85, speed: 2 },
    'claude-3-haiku': { cost: 0.001, performance: 0.6, speed: 5 },
    'local-llama': { cost: 0.0001, performance: 0.5, speed: 0.5 },
  };

  async selectModel(
    taskType: string, // not yet used by the scoring function
    complexity: number,
    urgency: number,
    budget?: number
  ): Promise<string> {
    // Score every model and return the name of the highest-scoring one
    const candidates = Object.entries(this.models)
      .map(([name, specs]) => ({
        name,
        ...specs,
        score: this.calculateScore(specs, complexity, urgency, budget),
      }))
      .sort((a, b) => b.score - a.score);
    return candidates[0].name;
  }

  private calculateScore(
    specs: ModelSpec,
    complexity: number,
    urgency: number,
    budget?: number
  ): number {
    let score = 0;
    // Quality requirement: weight grows with task complexity
    const performanceWeight = complexity * 0.6;
    score += specs.performance * performanceWeight;
    // Speed requirement: weight grows with urgency.
    // Note: speed (0.5 to 5) is on a wider scale than performance (0 to 1),
    // so it dominates the score when urgency is high.
    const speedWeight = urgency * 0.4;
    score += specs.speed * speedWeight;
    // Cost constraint: penalize models that exceed the budget
    if (budget !== undefined) {
      const costPenalty = Math.max(0, (specs.cost - budget) * 100);
      score -= costPenalty;
    }
    return score;
  }
}
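To see how the three terms of the score interact, the sketch below re-implements the same formula as a standalone function and applies it to three of the models from the table. It is illustrative only: score and pick are hypothetical helper names, and complexity and urgency are assumed to lie in the range 0 to 1, which the article does not state explicitly.

```typescript
interface Spec {
  cost: number;
  performance: number;
  speed: number;
}

// Same formula as calculateScore: weighted quality plus weighted speed,
// minus a penalty for exceeding the budget.
function score(spec: Spec, complexity: number, urgency: number, budget?: number): number {
  let s = spec.performance * complexity * 0.6 + spec.speed * urgency * 0.4;
  if (budget !== undefined) s -= Math.max(0, (spec.cost - budget) * 100);
  return s;
}

// Subset of the model table, copied from the router above.
const specs: Record<string, Spec> = {
  "claude-3-opus": { cost: 0.075, performance: 0.95, speed: 0.8 },
  "gpt-4o-mini": { cost: 0.003, performance: 0.7, speed: 4 },
  "claude-3-haiku": { cost: 0.001, performance: 0.6, speed: 5 },
};

// Return the name of the highest-scoring model.
function pick(complexity: number, urgency: number, budget?: number): string {
  return Object.entries(specs).reduce((best, current) =>
    score(current[1], complexity, urgency, budget) >
    score(best[1], complexity, urgency, budget)
      ? current
      : best
  )[0];
}

console.log(pick(1, 0));        // complex, not urgent, no budget cap
console.log(pick(1, 0, 0.01));  // same task with a tight budget
console.log(pick(0.2, 1));      // simple but urgent
```

With no budget, a complex non-urgent task selects claude-3-opus on quality alone. Adding a budget of 0.01 turns the opus cost overrun into a large penalty, so gpt-4o-mini wins. For a simple urgent task the speed term dominates and claude-3-haiku is chosen.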
6.2 Intelligent Caching Strategy
// Multi-level cache
class IntelligentCache {
  private layers = {
    // L1: in-memory cache (fastest)
    memory: new Map<string, { data: any; expires: number; accessCount: number }>(),
    // L2: Redis cache (distributed)
    redis: RedisClient,
    // L3: disk cache (persistent)
    disk: DiskCache,
  };
  // Read through the cache layers, falling back to fetchFn on a miss
  async getWithCache<T>(
    key: string,
    fetchFn: () => Promise<T>,
    options: CacheOptions = {}
  ): Promise<T> {
    const {
      ttl = 3600,
      staleWhileRevalidate = false,
      prefreshThreshold = 0.8, // early-refresh threshold