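Part 12: Four Core Skills Distilled from OpenAI's Practice: Durable Execution, Closed-Loop Testing, Architectural Constraints, Runtime Policies

1. Introduction: From Experiment to Practice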

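OpenAI's Harness Engineering experiment did more than show that AI Agents can carry out large-scale software development on their own. More importantly, it yielded a reusable methodology.

In his blog post, Ryan Lopopolo summarized four core skills:

Durable Execution
Closed-Loop Testing
Architectural Constraints
Runtime Policies

These four skills form the technical foundation of Harness Engineering. This article examines each one in depth and offers practical guidance for putting it to work.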

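2. Skill 1: Durable Execution

2.1 Why Durable Execution?

AI Agents are taking on increasingly complex tasks:

Large codebase refactoring (hours to days)
Complex system generation (millions of lines of code)
Deep code analysis (cross-module dependencies)

These tasks run for a long time, and failures such as process crashes, resource exhaustion, and network interruptions are unavoidable.

The core goal of durable execution: let an Agent's task resume from where it left off instead of starting over.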
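
As a minimal sketch of this resume-from-where-it-left-off idea (an illustration, not OpenAI's implementation), the Python below saves the index of the next step after every completed step and skips finished steps on restart. The function names `save_checkpoint` and `run_with_checkpoints` and the JSON file layout are assumptions made for the example.

```python
import json
import os
import tempfile
from pathlib import Path


def save_checkpoint(path: Path, state: dict) -> None:
    """Write the state to a temp file in the same directory, then rename it over the target."""
    with tempfile.NamedTemporaryFile("w", dir=path.parent, delete=False) as tmp:
        json.dump(state, tmp)
    os.replace(tmp.name, path)  # readers see either the old or the new file, never half of one


def run_with_checkpoints(steps, path: Path):
    """Run steps in order, skipping any that a previous run already completed."""
    if path.exists():
        state = json.loads(path.read_text())  # resume from the last checkpoint
    else:
        state = {"next_step": 0, "results": []}
    for index in range(state["next_step"], len(steps)):
        state["results"].append(steps[index]())  # may raise; earlier progress is already saved
        state["next_step"] = index + 1
        save_checkpoint(path, state)
    return state["results"]
```

If a step raises partway through, calling `run_with_checkpoints` again with the same path continues from the failed step, and completed steps are not re-run.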

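2.2 The Three-Layer Architecture of Durable Execution

┌──────────────────────────────────────────────────────────────────────────┐
│              Three-Layer Architecture of Durable Execution               │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Layer 3: Application-layer persistence                                  │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Business state (current goal, completed work)                   │  │
│  │  • Context (relevant code, document references)                    │  │
│  │  • Intermediate artifacts (code snippets, analysis results)        │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↑                                     │
│  Layer 2: Execution-layer persistence                                    │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Execution plan state (current step, pending steps)              │  │
│  │  • Activity state (running activities, inputs and outputs)         │  │
│  │  • Wait state (pending external events, timers)                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↑                                     │
│  Layer 1: System-layer persistence                                       │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Workflow instance ID and metadata                               │  │
│  │  • Resource allocation (assigned Agents, compute resources)        │  │
│  │  • Timeout and retry configuration                                 │  │
│  │  • Checkpoint storage location and policy                          │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘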

2.3 Designing a Checkpoint Strategy

# checkpoint-config.yaml
checkpoint:
  # Trigger conditions
  triggers:
    # Time-based: every 5 minutes
    - type: time_interval
      interval: 5m

    # Event-based: a key activity completed
    - type: event
      events:
        - activity_completed
        - file_written
        - external_api_called

    # State-based: an important state change
    - type: state_change
      paths:
        - business_state.current_goal
        - execution_state.active_activities

  # Checkpoint contents
  include:
    full_state: true           # full state
    event_history: last_100    # the 100 most recent events
    generated_artifacts: references  # artifact references only (contents are not stored)

  # Storage
  storage:
    type: distributed_storage
    backend: s3               # S3 / GCS / Azure Blob
    replication: 3
    encryption: true

  # Retention policy
  retention:
    successful: 7d            # keep successful tasks for 7 days
    failed: 30d               # keep failed tasks for 30 days
    max_checkpoints_per_workflow: 50

  # Recovery
  recovery:
    auto_retry: true          # retry automatically
    max_retries: 3            # maximum number of retries
    backoff_strategy: exponential  # exponential backoff
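
One way the three trigger types in this configuration might be evaluated is sketched below. The class `CheckpointTrigger` and its method are hypothetical. The event names, watched paths, and 5-minute interval are copied from the configuration, and the rule "checkpoint if any trigger fires" is an assumption.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

CHECKPOINT_EVENTS = {"activity_completed", "file_written", "external_api_called"}
WATCHED_PATHS = {"business_state.current_goal", "execution_state.active_activities"}
INTERVAL_SECONDS = 5 * 60


@dataclass
class CheckpointTrigger:
    last_checkpoint_at: float = 0.0

    def should_checkpoint(self, now: float, event: Optional[str] = None,
                          changed_paths: Iterable[str] = ()) -> bool:
        """Return True (and reset the timer) if any of the three triggers fires."""
        time_due = now - self.last_checkpoint_at >= INTERVAL_SECONDS
        event_hit = event in CHECKPOINT_EVENTS
        state_hit = bool(WATCHED_PATHS & set(changed_paths))
        if time_due or event_hit or state_hit:
            self.last_checkpoint_at = now
            return True
        return False
```

Passing the clock value in as `now` keeps the logic deterministic, so the trigger can be exercised without waiting five real minutes.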

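2.4 Best Practices for Durable Execution

| Practice | Description | Effect |
|---|---|---|
| Frequent checkpoints | Every 5-10 minutes | Minimizes work lost to a failure |
| Incremental saves | Save only what changed | Reduces storage and transfer overhead |
| Asynchronous saves | Do not block the main flow | Minimizes performance impact |
| Idempotent design | Repeated execution has no side effects | Guarantees correctness after recovery |
| State validation | Verify integrity after recovery | Prevents execution from continuing on corrupted state |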
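
Two of these practices, state validation and idempotent design, can be illustrated in a few lines. This is a sketch under assumptions: the envelope format (`payload` plus `sha256`) and the helper names `seal`, `unseal`, and `run_once` are invented for the example.

```python
import hashlib
import json


def seal(state: dict) -> dict:
    """Wrap a state with a SHA-256 digest of its canonical JSON form."""
    payload = json.dumps(state, sort_keys=True)
    return {"payload": payload, "sha256": hashlib.sha256(payload.encode()).hexdigest()}


def unseal(envelope: dict) -> dict:
    """Return the state, or raise if the payload no longer matches its digest."""
    digest = hashlib.sha256(envelope["payload"].encode()).hexdigest()
    if digest != envelope["sha256"]:
        raise ValueError("checkpoint is corrupted; refusing to resume from it")
    return json.loads(envelope["payload"])


def run_once(completed: dict, key: str, action):
    """Idempotent execution: the action registered under `key` runs at most once."""
    if key not in completed:
        completed[key] = action()
    return completed[key]
```

After a recovery, replaying a step through `run_once` with the same key returns the recorded result instead of repeating the side effect, provided the `completed` ledger is itself part of the checkpointed state.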

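3. Skill 2: Closed-Loop Testing

3.1 Why Closed-Loop Testing?

Traditional testing is "open-loop":

Human writes tests → run tests → review results → fix manually

Closed-loop testing is "automatic":

Agent generates code → tests run automatically → feedback goes to the Agent → Agent fixes automatically

The core goal of closed-loop testing: enable the Agent to verify and correct its own work.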

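3.2 Architecture of Closed-Loop Testing

┌──────────────────────────────────────────────────────────────────────────┐
│                     Closed-Loop Testing Architecture                     │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Trigger layer                                                     │  │
│  │  • Triggered automatically on code commit                          │  │
│  │  • Scheduled full test runs                                        │  │
│  │  • Manual trigger                                                  │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↓                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Execution layer                                                   │  │
│  │                                                                    │  │
│  │  Stage 1: Static checks (seconds)                                  │  │
│  │  ├── Syntax check (compile/interpret)                              │  │
│  │  ├── Code style check (linter)                                     │  │
│  │  ├── Static analysis (complexity, dependencies)                    │  │
│  │  └── Security scan (vulnerability detection)                       │  │
│  │                                                                    │  │
│  │  Stage 2: Dynamic tests (minutes)                                  │  │
│  │  ├── Unit tests (functions/modules)                                │  │
│  │  ├── Integration tests (module interaction)                        │  │
│  │  └── Contract tests (interface compatibility)                      │  │
│  │                                                                    │  │
│  │  Stage 3: System tests (hours)                                     │  │
│  │  ├── End-to-end tests (full workflows)                             │  │
│  │  ├── Performance tests (response time, throughput)                 │  │
│  │  └── Security tests (penetration testing)                          │  │
│  │                                                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↓                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Analysis layer                                                    │  │
│  │  • Aggregate test results                                          │  │
│  │  • Classify and locate errors                                      │  │
│  │  • Generate structured feedback                                    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↓                                     │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  Feedback layer                                                    │  │
│  │  • Send feedback to the Agent                                      │  │
│  │  • Agent analyzes and fixes                                        │  │
│  │  • Re-trigger tests (loop)                                         │  │
│  │  • Escalate to a human once max iterations are exceeded            │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘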
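
The loop in the feedback layer can be sketched as a small driver function. This is an assumption-laden illustration: `generate` stands in for a call to the Agent, `run_tests` for the execution and analysis layers, and the default of three iterations is arbitrary.

```python
from typing import Callable, List, Tuple


def closed_loop(generate: Callable[[List[str]], str],
                run_tests: Callable[[str], List[str]],
                max_iterations: int = 3) -> Tuple[str, int]:
    """Generate, test, and feed failures back until tests pass or the budget runs out.

    Returns ("passed", n) or ("escalated", n), where n is the number of iterations used.
    """
    feedback: List[str] = []
    for iteration in range(1, max_iterations + 1):
        code = generate(feedback)    # the Agent sees the failures from the previous round
        feedback = run_tests(code)   # an empty list means every test passed
        if not feedback:
            return "passed", iteration
    return "escalated", max_iterations  # budget exhausted: hand off to a human
```

The iteration cap is what keeps the loop from running forever when the Agent cannot converge on a fix.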

3.3 Keys to Feedback Design

{
  "test_run_id": "tr_20260323_001",
  "status": "failed",
  "stage": "unit_test",
  
  "summary": {
    "total": 50,
    "passed": 48,
    "failed": 2
  },
  
  "failures": [
    {
      "test_file": "src/auth/login.test.ts",
      "test_name": "should reject weak password",
      "line": 45,
      "error_type": "AssertionError",
      "message": "Expected ValidationError but got null",
      
      "context": {
        "input": { "password": "123" },
        "expected": "ValidationError: Password too weak",
        "actual": "null"
      },
      
      "stack_trace": [
        "at validatePassword (src/auth/validator.ts:23)",
        "at Object.<anonymous> (src/auth/login.test.ts:45)"
      ],
      
      "suggestion": {
        "description": "Password validation regex is too permissive",
        "file": "src/auth/validator.ts",
        "lines": "20-30",
        "reference": "See password policy doc: /docs/security.md"
      },
      
      "related_code": {
        "file": "src/auth/validator.ts",
        "content": "function validatePassword(pwd: string) {\n  // TODO: implement strong validation\n  return true;\n}"
      }
    }
  ],
  
  "recommended_fixes": [
    {
      "priority": "high",
      "file": "src/auth/validator.ts",
      "action": "implement proper password validation",
      "hint": "Use regex: /^(?=.*[a-z])(?=.*[A-Z])(?=.*\\d).{8,}$/"
    }
  ]
}
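
A structured report like this still has to be turned into something the Agent can act on. The sketch below shows one possible rendering into a short repair brief. The function name `render_feedback` and the output layout are assumptions, and only fields that appear in the JSON above are read.

```python
def render_feedback(report: dict) -> str:
    """Turn a structured test report into a compact repair brief for the Agent."""
    summary = report["summary"]
    lines = [f"{report['stage']}: {summary['failed']} of {summary['total']} tests failed"]
    for failure in report.get("failures", []):
        lines.append(f"- {failure['test_file']}:{failure['line']} {failure['test_name']}")
        lines.append(f"  {failure['error_type']}: {failure['message']}")
        suggestion = failure.get("suggestion")
        if suggestion:
            lines.append(f"  look at {suggestion['file']} lines {suggestion['lines']}")
    return "\n".join(lines)
```

Leading with the file, line, and error type puts the most actionable details first, with the longer context (stack trace, related code) available in the full report if the Agent needs it.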

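3.4 Best Practices for Closed-Loop Testing

| Practice | Description | Effect |
|---|---|---|
| Fail fast | Run low-cost checks first | Fewer wasted iterations |
| Layered testing | Fast to slow, simple to complex | Maximizes efficiency |
| Clear feedback | Specific, actionable suggestions | The Agent fixes issues more efficiently |
| Iteration limit | Cap the number of fix attempts | Avoids infinite loops |
| Human fallback | Escalate complex problems | Quality assurance |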
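
Fail-fast and layered testing combine naturally: order the stages from cheapest to most expensive and stop at the first failure. A minimal sketch, assuming each stage is a zero-argument callable that returns True on success (the `run_stages` name and the stage tuple shape are illustrative):

```python
from typing import Callable, List, Tuple

Stage = Tuple[str, Callable[[], bool]]


def run_stages(stages: List[Stage]) -> Tuple[bool, List[str]]:
    """Run stages in the given order and stop at the first one that fails.

    Returns (all_passed, names_of_stages_that_ran).
    """
    ran: List[str] = []
    for name, check in stages:
        ran.append(name)
        if not check():
            return False, ran  # fail fast: the slower stages never start
    return True, ran
```

With stages ordered as lint, unit tests, then end-to-end tests, a lint failure costs seconds rather than the hours a full system run would take.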

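4. Skill 3: Architectural Constraints

4.1 Why Architectural Constraints?

AI Agents are powerful but "undisciplined":

They may generate code that violates project conventions
They may choose unsuitable technical approaches
They may break the existing architecture

The core goal of architectural constraints: unleash the AI's capabilities while keeping its behavior within expected boundaries.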

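4.2 Layered Design of Constraints

┌──────────────────────────────────────────────────────────────────────────┐
│               Layered Design of Architectural Constraints                │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Layer 4: Business constraints (most specific)                           │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Domain model conventions (entities, value objects, aggregates)  │  │
│  │  • Business rule validation (invariants, processes)                │  │
│  │  • Data consistency requirements (transaction boundaries)          │  │
│  │  • Compliance requirements (GDPR, SOX, etc.)                       │  │
│  │  Enforced by: domain model validators, business rule engines       │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↑                                     │
│  Layer 3: Security constraints                                           │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Input validation (all input must be validated)                  │  │
│  │  • Sensitive data handling (encryption, masking)                   │  │
│  │  • Access control (RBAC/ABAC)                                      │  │
│  │  • Secure coding standards (OWASP Top 10 protections)              │  │
│  │  Enforced by: security scanners, static analysis                   │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↑                                     │
│  Layer 2: Code constraints                                               │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Coding style (indentation, naming, comments)                    │  │
│  │  • Code quality (complexity, duplication)                          │  │
│  │  • Design patterns (recommended/forbidden patterns)                │  │
│  │  • Documentation standards (function comments, README)             │  │
│  │  Enforced by: linters, formatters, static analysis                 │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                    ↑                                     │
│  Layer 1: Architecture constraints (most fundamental)                    │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  • Project structure (directory layout, file naming)               │  │
│  │  • Module boundaries (responsibilities, interface definitions)     │  │
│  │  • Tech stack (allowed languages, frameworks, libraries)           │  │
│  │  • Dependency rules (direction of inter-module dependencies)       │  │
│  │  Enforced by: project templates, scaffolding, architecture tests   │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘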

4.3 Example Constraint Configuration

# architecture-constraints.yaml
project_structure:
  root:
    required_directories:
      - src/components/     # UI components
      - src/services/       # business logic
      - src/utils/          # utility functions
      - src/types/          # type definitions
      - tests/unit/         # unit tests
      - tests/integration/  # integration tests
      - docs/               # documentation
      - scripts/            # scripts and tooling

    forbidden_patterns:
      - "src/*.js"        # source files must live in subdirectories
      - "tests/*.ts"      # test files must live in subdirectories
      - "node_modules"    # dependencies must not be committed

module_boundaries:
  rules:
    - name: "ui-cannot-import-business"
      from: "src/components/**"
      allow: ["src/types/**", "src/utils/**"]
      forbid: ["src/services/**"]
      message: "The UI layer must not depend directly on the business logic layer"

    - name: "business-can-use-utils"
      from: "src/services/**"
      allow: ["src/utils/**", "src/types/**"]
      message: "The business layer may use the utility layer"

tech_stack:
  languages:
    allowed: [typescript, javascript]
    forbidden: [python, java]

  frameworks:
    frontend: react
    backend: express
    orm: prisma

  forbidden_libraries:
    - jquery          # outdated
    - lodash          # prefer native APIs
    - moment          # prefer date-fns

code_quality:
  complexity:
    max_cyclomatic: 10
    max_cognitive: 15
    max_function_length: 50

  coverage:
    unit: 80
    integration: 60

  duplication:
    max_lines: 6
    max_tokens: 50

security:
  required:
    - input_validation
    - output_encoding
    - authentication
    - authorization

  forbidden:
    - eval
    - innerHTML
    - document.write
    - hardcoded_secrets
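
To show how a `module_boundaries` rule could be checked automatically, here is a minimal sketch. The `check_import` function is hypothetical, only the `forbid` list is enforced (the `allow` list is treated as documentation), and `fnmatchcase` lets `*` match across `/`, which is looser than strict glob semantics.

```python
from fnmatch import fnmatchcase
from typing import List, Optional

# One rule copied from the configuration above, expressed as a plain dict.
RULES = [
    {
        "name": "ui-cannot-import-business",
        "from": "src/components/**",
        "forbid": ["src/services/**"],
        "message": "The UI layer must not depend directly on the business logic layer",
    },
]


def check_import(importer: str, imported: str, rules: List[dict] = RULES) -> Optional[str]:
    """Return the message of the first rule this import violates, or None."""
    for rule in rules:
        if not fnmatchcase(importer, rule["from"]):
            continue  # the rule does not apply to this importing file
        if any(fnmatchcase(imported, pattern) for pattern in rule.get("forbid", [])):
            return f"{rule['name']}: {rule['message']}"
    return None
```

Run over every import edge in the codebase as part of CI, a check like this turns the boundary rule into a failing build rather than a review comment.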

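4.4 Best Practices for Architectural Constraints

| Practice | Description | Effect |
|---|---|---|
| Constraints as code | Versioned and auditable | Traceable and reusable |
| Progressive tightening | Start loose, tighten gradually | Avoids over-restricting from the start |
| Automated verification | Integrated into CI/CD | Ensures constraints are actually enforced |
| Documentation in sync | Constraints and docs stay consistent | Fewer misunderstandings |
| Regular review | Refine constraints based on practice | Continuous improvement |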

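5. Skill 4: Runtime Policies

5.1 Why Runtime Policies?

An Agent's execution needs to be managed:

When should which task run?
How are resources allocated?
What happens when something goes wrong?
When is human intervention needed?

The core goal of runtime policies: make Agent execution controllable, observable, and open to intervention.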

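5.2 The Four Dimensions of Runtime Policies

┌──────────────────────────────────────────────────────────────────────────┐
│                   Four Dimensions of Runtime Policies                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  1. Scheduling                                                     │  │
│  │                                                                    │  │
│  │  • Priority queue: Critical > High > Medium > Low                  │  │
│  │  • Resource matching: match task needs to Agent capabilities       │  │
│  │  • Dependency management: run tasks in dependency order            │  │
│  │  • Load balancing: spread work evenly, avoid single-node overload  │  │
│  │  • Preemption: higher priority may preempt lower-priority tasks    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  2. Resources                                                      │  │
│  │                                                                    │  │
│  │  • Quotas: resource limits per task and per user                   │  │
│  │  • Autoscaling: adjust the number of Agents to match load          │  │
│  │  • Reclamation: release idle resources automatically               │  │
│  │  • Cost optimization: Spot instances, reserved instances, etc.     │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  3. Fault tolerance                                                │  │
│  │                                                                    │  │
│  │  • Retries: exponential backoff, maximum retry count               │  │
│  │  • Failover: migrate tasks when an Agent fails                     │  │
│  │  • Circuit breaking: pause when the failure rate is too high       │  │
│  │  • Degradation: reduce service quality when resources run short    │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
│  ┌────────────────────────────────────────────────────────────────────┐  │
│  │  4. Collaboration                                                  │  │
│  │                                                                    │  │
│  │  • Approval flow: critical operations need human confirmation      │  │
│  │  • Notifications: alert the right people about key events          │  │
│  │  • Escalation: hand off to humans when automation fails            │  │
│  │  • Knowledge sharing: Agents share lessons and best practices      │  │
│  └────────────────────────────────────────────────────────────────────┘  │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘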
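
The first two scheduling ideas, priority ordering and dependency management, can be combined in a short sketch using Python's standard `heapq` and `graphlib` (Python 3.9+). The `schedule` function and the task dictionary shape are assumptions. Resource matching, load balancing, and preemption are left out.

```python
import heapq
from graphlib import TopologicalSorter

PRIORITY = {"critical": 0, "high": 1, "medium": 2, "low": 3}


def schedule(tasks: dict) -> list:
    """Order tasks so dependencies run first and, among ready tasks, higher priority wins.

    `tasks` maps a task name to {"priority": str, "after": [names it depends on]}.
    Every name listed under "after" must itself be a key of `tasks`.
    """
    sorter = TopologicalSorter({name: spec.get("after", []) for name, spec in tasks.items()})
    sorter.prepare()  # raises graphlib.CycleError on circular dependencies
    ready, order = [], []
    while sorter.is_active():
        for name in sorter.get_ready():  # tasks whose dependencies are all done
            heapq.heappush(ready, (PRIORITY[tasks[name]["priority"]], name))
        _, name = heapq.heappop(ready)   # lowest number = highest priority
        order.append(name)
        sorter.done(name)
    return order
```

Note that a critical task still waits for a low-priority dependency in this sketch. A production scheduler might raise the dependency's priority to compensate, which is not shown here.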

5.3 Example Runtime Policy Configuration

# runtime-policies.yaml
scheduling:
  priority_levels:
    - name: critical
      weight: 100
      max_wait_time: 1m
      preemption: true
    
    - name: high
      weight: 50
      max_wait_time: 5m
      preemption: false
    
    - name: medium
      weight: 20
      max_wait_time: 30m
      preemption: false
    
    - name: low
      weight: 10
      max_wait_time: 2h
      preemption: false
  
  resource_matching:
    cpu_weight: 0.3
    memory_weight: 0.3
    skill_match_weight: 0.4
  
  dependencies:
    resolution_strategy: topological_sort
    max_dependency_depth: 10

resource:
  quotas:
    per_task:
      max_cpu: 4
      max_memory: 8Gi
      max_duration: 24h
      max_storage: 100Gi
    
    per_user:
      max_concurrent_tasks: 5
      max_daily_tasks: 50
  
  autoscaling:
    min_agents: 2
    max_agents: 50
    scale_up_threshold: 80  # CPU utilization (%)
    scale_down_threshold: 30
    cooldown_period: 5m

fault_tolerance:
  retry:
    max_attempts: 3
    backoff_strategy: exponential
    initial_delay: 1s
    max_delay: 60s
  
  circuit_breaker:
    failure_threshold: 5
    recovery_timeout: 60s
    half_open_max_calls: 3
  
  checkpoint:
    enabled: true
    interval: 5m
    max_checkpoints: 50

collaboration:
  approval_required:
    - production_deploy
    - database_migration
    - api_breaking_change
    - security_policy_change
  
  notifications:
    channels:
      - type: slack
        webhook: ${SLACK_WEBHOOK}
        events: [task_failed, approval_needed]
      - type: email
        recipients: [team@company.com]
        events: [daily_summary, weekly_report]
  
  escalation:
    rules:
      - condition: "retry_exhausted"
        action: "notify_lead"
        timeout: 10m
      
      - condition: "security_alert"
        action: "page_oncall"
        timeout: 0
      
      - condition: "cost_anomaly"
        action: "notify_finance"
        threshold: 200  # 200% over budget
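
The `fault_tolerance` section can be made concrete with a small sketch of exponential backoff and a circuit breaker. The defaults mirror the values in the configuration. The class and function names are hypothetical, `max_attempts` is assumed to count the first try, and `half_open_max_calls` is not modeled.

```python
import time


def backoff_delays(max_attempts=3, initial_delay=1.0, max_delay=60.0):
    """Seconds to wait before each retry: doubling from initial_delay, capped at max_delay."""
    return [min(initial_delay * 2 ** n, max_delay) for n in range(max_attempts - 1)]


class CircuitBreaker:
    """Opens after consecutive failures; permits a trial call once the timeout has passed."""

    def __init__(self, failure_threshold=5, recovery_timeout=60.0, clock=time.monotonic):
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at = None  # None means the circuit is closed

    def allow(self) -> bool:
        if self.opened_at is None:
            return True
        return self.clock() - self.opened_at >= self.recovery_timeout  # half-open trial

    def record(self, success: bool) -> None:
        if success:
            self.failures, self.opened_at = 0, None  # close the circuit again
            return
        self.failures += 1
        if self.failures >= self.failure_threshold:
            self.opened_at = self.clock()  # open (or re-open) the circuit
```

Injecting the clock makes the breaker testable without sleeping. In real use the default `time.monotonic` avoids surprises from wall-clock adjustments.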

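5.4 Best Practices for Runtime Policies

| Practice | Description | Effect |
|---|---|---|
| Policies as configuration | Adjustable at runtime, no restart needed | High flexibility |
| Layered policies | Global, project, and task levels | Fine-grained control |
| Monitor policy effectiveness | Data-driven optimization | Continuous improvement |
| Graceful degradation | Preserve core functionality when resources run short | High reliability |
| Human fallback | A way out when automated handling fails | Quality assurance |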

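6. How the Four Skills Work Together

┌──────────────────────────────────────────────────────────────────────────┐
│                    How the Four Skills Work Together                     │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│                        ┌─────────────────────────┐                       │
│                        │    Runtime Policies     │                       │
│                        │                         │                       │
│                        │  Directs and schedules  │                       │
│                        └────────────┬────────────┘                       │
│                                     │                                    │
│             ┌───────────────────────┼───────────────────────┐            │
│             │                       │                       │            │
│             ▼                       ▼                       ▼            │
│  ┌────────────────────┐  ┌────────────────────┐  ┌────────────────────┐  │
│  │      Durable       │  │    Closed-Loop     │  │   Architectural    │  │
│  │     Execution      │  │      Testing       │  │    Constraints     │  │
│  │                    │  │                    │  │                    │  │
│  │ Keeps it reliable  │  │  Ensures quality   │  │ Defines boundaries │  │
│  └──────────┬─────────┘  └──────────┬─────────┘  └──────────┬─────────┘  │
│             │                       │                       │            │
│             └───────────────────────┼───────────────────────┘            │
│                                     │                                    │
│                                     ▼                                    │
│                        ┌─────────────────────────┐                       │
│                        │        AI Agent         │                       │
│                        │                         │                       │
│                        │  Efficient, reliable,   │                       │
│                        │    high-quality work    │                       │
│                        └─────────────────────────┘                       │
│                                                                          │
│  Runtime policies are the brain and the other three are the limbs;       │
│  together they serve the Agent.                                          │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘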

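7. Rollout Roadmap

┌──────────────────────────────────────────────────────────────────────────┐
│                   Rollout Roadmap for the Four Skills                    │
├──────────────────────────────────────────────────────────────────────────┤
│                                                                          │
│  Phase 1: Foundations (1-2 weeks)                                        │
│  ├── Set up project templates and scaffolding                            │
│  │   └── Implements the base layer of architecture constraints           │
│  ├── Configure the linter and formatter                                  │
│  │   └── Implements code constraints                                     │
│  └── Build a basic CI/CD pipeline                                        │
│      └── Implements the simplest closed-loop test                        │
│                                                                          │
│  Phase 2: Core capabilities (2-4 weeks)                                  │
│  ├── Implement checkpointing                                             │
│  │   └── Basic durable-execution capability                              │
│  ├── Round out the test suite                                            │
│  │   └── Unit tests → integration tests → end-to-end tests               │
│  ├── Establish the feedback mechanism                                    │
│  │   └── Test results flow back to the Agent automatically               │
│  └── Configure runtime policies                                          │
│      └── Basic scheduling, resource, and fault-tolerance policies        │
│                                                                          │
│  Phase 3: Optimization (4-8 weeks)                                       │
│  ├── Optimize the checkpoint strategy                                    │
│  │   └── Incremental saves, compression, encryption                      │
│  ├── Improve feedback quality                                            │
│  │   └── Structured, actionable feedback                                 │
│  ├── Refine constraint rules                                             │
│  │   └── Keep tuning them based on practice                              │
│  └── Mature the runtime policies                                         │
│      └── Autoscaling, smarter scheduling                                 │
│                                                                          │
│  Phase 4: Scaling out (8-12 weeks)                                       │
│  ├── Build a library of Harness templates                                │
│  ├── Reuse and customize across projects                                 │
│  ├── Data-driven continuous optimization                                 │
│  └── Team training and rollout                                           │
│                                                                          │
└──────────────────────────────────────────────────────────────────────────┘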

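8. Conclusion: Skills Are Practice Distilled

The four core skills distilled by OpenAI are not theoretical constructs; they are lessons drawn from five months of practice:

Durable execution addresses reliability
Closed-loop testing addresses quality
Architectural constraints address controllability
Runtime policies address management

These four skills reinforce one another and together form the technical foundation of Harness Engineering.

Master these four skills and you hold the key to making AI Agents work efficiently, reliably, and at scale.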

References and Further Reading

Harness engineering: leveraging Codex in an agent-first world - OpenAI
Building Secure Systems - secure system design
Site Reliability Engineering - Google SRE