OpenClaw Error Handling Best Practices


Common Error Types

1. API Errors

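| Error code | Cause | Solution |
| --- | --- | --- |
| 401 | Invalid API key | Check the configuration |
| 429 | Rate limit exceeded | Add retries or degrade |
| 500 | Server error | Retry and notify |
| 503 | Service unavailable | Switch to the backup model |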
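
To make the table concrete, here is a minimal sketch of how a status code could be mapped to a handling strategy. The function name `strategyForStatus` and the action strings are hypothetical labels chosen for illustration; they are not part of OpenClaw.

```javascript
// Hypothetical helper: maps an HTTP status code to the strategy from the table.
// The `action` strings are illustrative names, not OpenClaw identifiers.
function strategyForStatus(status) {
  switch (status) {
    case 401: // invalid API key: retrying cannot help, the config must be fixed
      return { retry: false, action: "check_config" };
    case 429: // rate limited: retry later or degrade
      return { retry: true, action: "degrade" };
    case 500: // server error: retry and tell someone
      return { retry: true, action: "notify" };
    case 503: // service unavailable: move to the backup model
      return { retry: false, action: "switch_model" };
    default: // anything else: just record it
      return { retry: false, action: "log" };
  }
}
```

Keeping the mapping in one place means the retry loop only has to ask whether `retry` is true, instead of repeating status checks at every call site.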

2. Network Errors

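| Error | Cause | Solution |
| --- | --- | --- |
| ETIMEDOUT | Connection timed out | Increase the timeout |
| ECONNREFUSED | Service not running | Check the service status |
| ENOTFOUND | DNS resolution failed | Check the network or proxy |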
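
In Node.js, system-level network errors carry the code in `error.code`, so the table can be turned into a lookup. The sketch below assumes that property is present; `hintForNetworkError` and the hint strings are hypothetical names made up for this example.

```javascript
// Hints taken from the table above, keyed by the Node.js system error code.
const NETWORK_ERROR_HINTS = new Map([
  ["ETIMEDOUT", "Connection timed out: increase the timeout"],
  ["ECONNREFUSED", "Service not running: check the service status"],
  ["ENOTFOUND", "DNS resolution failed: check the network or proxy"],
]);

// Hypothetical helper: returns a hint for a known network error, or null
// when the error has no recognized `code`.
function hintForNetworkError(error) {
  const code = error ? error.code : undefined;
  return NETWORK_ERROR_HINTS.get(code) ?? null;
}
```

A `Map` is used instead of a plain object so that unrelated keys such as `"toString"` never match by accident.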

3. Business Errors

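| Error | Cause | Solution |
| --- | --- | --- |
| context_too_long | Context limit exceeded | Trim the conversation history |
| invalid_response | Malformed response | Retry and log |
| skill_error | Skill execution failed | Degrade gracefully |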
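
Trimming the history for `context_too_long` could look roughly like the sketch below. It assumes messages are objects with a string `content` field and measures the budget in characters for simplicity; real limits are counted in tokens, and `trimHistory` is a hypothetical helper, not an OpenClaw function.

```javascript
// Hypothetical helper: keeps the most recent messages whose combined
// content length fits within `maxChars`, preserving their original order.
function trimHistory(messages, maxChars) {
  const kept = [];
  let used = 0;
  // Walk backwards so the newest messages are kept first.
  for (let i = messages.length - 1; i >= 0; i--) {
    const size = messages[i].content.length;
    if (used + size > maxChars) break;
    kept.unshift(messages[i]);
    used += size;
  }
  return kept;
}
```

Dropping the oldest turns first is only one possible policy; summarizing old turns is another option the article does not cover.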

Error Handling Configuration

Basic Configuration

# ~/.openclaw/config.yaml
errorHandling:
  retry:
    enabled: true
    maxAttempts: 3
    backoff: "exponential"  # exponential backoff
    initialDelay: 1000  # 1 second, in milliseconds
    maxDelay: 30000  # 30 seconds, in milliseconds

  fallback:
    enabled: true
    model: "deepseek-chat"  # backup model
    message: "Service temporarily unavailable, please try again later"

  logging:
    enabled: true
    level: "error"
    file: "~/.openclaw/logs/error.log"
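
To see what these retry settings mean in practice, the sketch below computes the waits between attempts. It assumes that "exponential" means doubling from `initialDelay`, which matches the retry code later in this article; `backoffDelays` is a hypothetical helper, not something OpenClaw provides.

```javascript
// Hypothetical helper: lists the wait (in ms) before each retry.
// With maxAttempts attempts there are maxAttempts - 1 waits.
function backoffDelays({ maxAttempts, initialDelay, maxDelay }) {
  const delays = [];
  for (let i = 0; i < maxAttempts - 1; i++) {
    // Double the delay each time, but never exceed maxDelay.
    delays.push(Math.min(initialDelay * 2 ** i, maxDelay));
  }
  return delays;
}

// The values from the basic configuration above.
console.log(backoffDelays({ maxAttempts: 3, initialDelay: 1000, maxDelay: 30000 }));
// prints [ 1000, 2000 ]
```

With the default of three attempts, `maxDelay` never comes into play; it only matters once `maxAttempts` reaches seven or more.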

Advanced Configuration

errorHandling:
  # Per-category error handlers
  handlers:
    - type: "api_error"
      actions:
        - retry
        - fallback
        - notify

    - type: "network_error"
      actions:
        - retry
        - checkNetwork

    - type: "business_error"
      actions:
        - log
        - userMessage

  # Degradation strategy
  degradation:
    rateLimit:
      threshold: 10  # 10 requests per minute
      action: "queue"  # queue requests

    errorRate:
      threshold: 0.1  # 10% error rate
      action: "circuit_breaker"  # trip the circuit breaker
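
The configuration does not say how the error rate is measured. One plausible reading, sketched below, is the share of failures among the last N requests. The class `ErrorRateWindow` and its window size are assumptions made for illustration, not OpenClaw internals.

```javascript
// Hypothetical sketch: tracks the outcomes of the last `size` requests.
class ErrorRateWindow {
  constructor(size = 100) {
    this.size = size;
    this.outcomes = []; // true = success, false = failure
  }

  // Record one request outcome and drop the oldest once the window is full.
  record(success) {
    this.outcomes.push(success);
    if (this.outcomes.length > this.size) this.outcomes.shift();
  }

  // Fraction of failures in the window (0 when nothing was recorded yet).
  rate() {
    if (this.outcomes.length === 0) return 0;
    const failures = this.outcomes.filter((ok) => !ok).length;
    return failures / this.outcomes.length;
  }

  // True when the rate is strictly above the threshold (0.1 = 10%).
  shouldTrip(threshold = 0.1) {
    return this.rate() > threshold;
  }
}
```

A count-based window is easy to reason about; a time-based window (for example, the last 60 seconds) would track bursts more faithfully at the cost of storing timestamps.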

Code Examples

Retry Logic

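// Resolve after `ms` milliseconds.
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// Call `fn` up to `maxAttempts` times, backing off exponentially between attempts.
async function withRetry(fn, maxAttempts = 3) {
  for (let i = 0; i < maxAttempts; i++) {
    try {
      return await fn();
    } catch (error) {
      // Out of attempts: surface the last error to the caller.
      if (i === maxAttempts - 1) throw error;

      // Wait 1 s, 2 s, 4 s, ... capped at 30 s.
      const delay = Math.min(1000 * Math.pow(2, i), 30000);
      await sleep(delay);
    }
  }
}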
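
One refinement the retry loop above does not include is jitter: when many clients fail at the same moment, identical backoff delays make them all retry together. The sketch below shows the "full jitter" variant, which picks a random delay between 0 and the exponential cap. This is an added technique, not part of the original code, and `jitteredDelay` is a hypothetical name.

```javascript
// Hypothetical helper: exponential backoff with full jitter.
// `random` is injectable so the function can be tested deterministically.
function jitteredDelay(attempt, base = 1000, max = 30000, random = Math.random) {
  // Same cap as plain exponential backoff: base * 2^attempt, at most `max`.
  const cap = Math.min(base * 2 ** attempt, max);
  // Pick a whole number of milliseconds in [0, cap).
  return Math.floor(random() * cap);
}
```

To use it, the fixed `delay` computation in a retry loop would be replaced by a call such as `jitteredDelay(i)`.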

Fallback Handling

async function chatWithFallback(message) {
  try {
    // Try the primary model first
    return await openclaw.chat(message, { model: "claude-sonnet" });
  } catch (error) {
    console.error("Primary model failed, switching to backup:", error.message);

    // Fall back to DeepSeek
    return await openclaw.chat(message, { model: "deepseek-chat" });
  }
}
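
The same idea extends to more than one backup. The sketch below tries an ordered list of models and reports every failure if none succeeds. The `chat` parameter is a hypothetical injected function with the shape `(model, message) => Promise`; it stands in for a real client call and is not a confirmed OpenClaw API.

```javascript
// Hypothetical sketch: try each model in order and return the first reply.
async function chatWithFallbacks(chat, message, models) {
  const errors = [];
  for (const model of models) {
    try {
      return await chat(model, message);
    } catch (error) {
      // Remember why this model failed, then move on to the next one.
      errors.push(`${model}: ${error.message}`);
    }
  }
  // Every model failed: include all reasons so the log is actionable.
  throw new Error(`All models failed (${errors.join("; ")})`);
}
```

Injecting `chat` keeps the fallback order testable without network access, for example by passing a fake that throws for the primary model.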

Circuit Breaker

class CircuitBreaker {
  constructor(threshold = 5, timeout = 60000) {
    this.failures = 0;
    this.threshold = threshold;
    this.timeout = timeout;
    this.state = "closed";  // closed/open/half-open
    this.lastFailure = null;
  }

  async execute(fn) {
    if (this.state === "open") {
      if (Date.now() - this.lastFailure > this.timeout) {
        this.state = "half-open";
      } else {
        throw new Error("Circuit breaker is open");
      }
    }

    try {
      const result = await fn();
      this.onSuccess();
      return result;
    } catch (error) {
      this.onFailure();
      throw error;
    }
  }

  onSuccess() {
    this.failures = 0;
    this.state = "closed";
  }

  onFailure() {
    this.failures++;
    this.lastFailure = Date.now();

    if (this.failures >= this.threshold) {
      this.state = "open";
    }
  }
}

Monitoring and Alerting

Prometheus Metrics

metrics:
  errors_total:
    type: counter
    labels: [type, model, channel]

  error_rate:
    type: gauge
    labels: [model]

  retry_total:
    type: counter
    labels: [model, success]
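
To illustrate what a labeled counter such as `errors_total` records, here is a minimal in-process sketch. It is not the Prometheus client library; `LabeledCounter` is a hypothetical class that only shows how each distinct label combination gets its own running total.

```javascript
// Hypothetical sketch of a counter keyed by label values.
class LabeledCounter {
  constructor(labelNames) {
    this.labelNames = labelNames;
    this.values = new Map();
  }

  // Build a stable key from the label values, in declared order.
  keyFor(labels) {
    return this.labelNames
      .map((name) => `${name}=${labels[name] ?? ""}`)
      .join(",");
  }

  // Add `amount` to the series identified by `labels`.
  inc(labels, amount = 1) {
    const key = this.keyFor(labels);
    this.values.set(key, (this.values.get(key) ?? 0) + amount);
  }

  // Current total for the series, or 0 if it was never incremented.
  get(labels) {
    return this.values.get(this.keyFor(labels)) ?? 0;
  }
}
```

Because every label combination is a separate series, labels with many possible values multiply the number of series to store and query.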

Alert Rules

alerts:
  - name: "high_error_rate"
    expr: "error_rate > 0.1"
    duration: 5m
    severity: "warning"
    message: "Error rate exceeds 10%"

  - name: "api_down"
    expr: "errors_total{type='api_error'} > 100"
    duration: 1m
    severity: "critical"
    message: "High volume of API errors, please investigate"

Best Practices Summary

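| Scenario | Strategy |
| --- | --- |
| Transient errors | Retry with exponential backoff |
| Persistent errors | Degrade and notify |
| API rate limiting | Queue and warm up |
| Service unavailable | Trip the circuit breaker and switch to the backup |
| Data errors | Log and escalate for manual intervention |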

Summary: good error handling keeps the system stable under failure and gives users a better experience.

If you need help with configuration, you can reach me on WeChat at yanghu-dev. Service page: yang1002378395-cmyk.github.io/openclaw-in…