23. Building Your First Harness: A Hands-On Guide from Scratch

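Introduction: From Theory to Practice

After the systematic introduction in the previous 21 posts, you should now have a well-rounded understanding of Harness Engineering. It is time to put it into practice: this post walks you through building a real, usable Harness from scratch, step by step.

We will use a concrete business scenario as the example: building a Harness that automatically adds a new payment method to an e-commerce system. The scenario is realistic enough to cover the core elements of Harness design, yet not so complex that you get lost in the details.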

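I. Scenario Analysis: What Problem Are We Solving?

1.1 Business Background

Suppose you work on an e-commerce platform that needs to support multiple payment methods (Alipay, WeChat Pay, credit cards, and so on). Whenever the business needs to integrate a new payment method, the development workflow usually includes:

Creating the payment processor class
Implementing the payment interface
Writing unit tests
Adding integration tests
Updating configuration files
Writing documentation

Traditional approach: a senior engineer needs roughly 2-3 days to finish.

Harness approach: once the environment is designed, the Agent finishes in roughly 2-3 hours, plus about 30 minutes of human review.

1.2 Task Breakdown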

yaml

task: "Add a new payment method: Apple Pay"
complexity: "medium"
expected_files:
  - "src/payment/apple_pay_processor.py"
  - "tests/unit/test_apple_pay_processor.py"
  - "tests/integration/test_apple_pay_flow.py"
  - "docs/payment_methods/apple_pay.md"

constraints:
  - "Must implement the PaymentProcessor interface"
  - "Must support testing in a sandbox environment"
  - "Must comply with the PCI DSS security standard"
  - "Code coverage must be >90%"
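
One way to make `expected_files` actionable is to check, after a run, which of the listed paths actually exist. The sketch below assumes the task spec has already been loaded into a dict (for example with a YAML parser); `missing_expected_files` is a hypothetical helper name, not part of any framework.

```python
from pathlib import Path
from typing import Dict, List


def missing_expected_files(task: Dict, project_root: str) -> List[str]:
    """Return the entries of task["expected_files"] that do not exist under project_root."""
    root = Path(project_root)
    return [
        rel for rel in task.get("expected_files", [])
        if not (root / rel).is_file()
    ]
```

An empty result means every expected file was produced; anything else is a natural signal to fail the run or send it to review.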

II. Step 1: Designing the Harness's Core Structure

2.1 Harness Configuration File

Create harnesses/payment_integration.yaml:

yaml

# Harness definition file
harness:
  name: "payment_integration"
  version: "1.0.0"
  description: "Automation Harness for integrating new payment methods"

  # Trigger conditions
  triggers:
    - type: "issue_label"
      value: "new-payment-method"
    - type: "manual"

  # Input parameters
  inputs:
    payment_method:
      type: "string"
      description: "Name of the payment method"
      required: true
    payment_provider:
      type: "string"
      description: "Payment provider"
      required: true
    sandbox_credentials:
      type: "secret"
      description: "Sandbox environment credentials"
      required: true
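
The `inputs` section above reads like a small schema, so the Harness can reject a run before any step starts. Here is a minimal sketch of that check. It assumes the schema has been loaded into a dict, and it treats both `string` and `secret` as plain text; `validate_inputs` is a hypothetical helper.

```python
from typing import Any, Dict, List


def validate_inputs(schema: Dict[str, Dict[str, Any]], provided: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the inputs are valid."""
    problems = []
    for name, spec in schema.items():
        value = provided.get(name)
        if value is None or value == "":
            if spec.get("required", False):
                problems.append(f"missing required input: {name}")
            continue
        # Both "string" and "secret" are treated as text in this sketch.
        if spec.get("type") in ("string", "secret") and not isinstance(value, str):
            problems.append(f"input {name} must be a string")
    unknown = sorted(set(provided) - set(schema))
    problems.extend(f"unknown input: {name}" for name in unknown)
    return problems


schema = {
    "payment_method": {"type": "string", "required": True},
    "payment_provider": {"type": "string", "required": True},
    "sandbox_credentials": {"type": "secret", "required": True},
}
print(validate_inputs(schema, {"payment_method": "ApplePay", "payment_provider": "Apple"}))
# prints ['missing required input: sandbox_credentials']
```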

2.2 Constraint Definitions

yaml

# constraints.yaml
constraints:
  # Architecture constraints
  architecture:
    required_interface: "PaymentProcessor"
    file_location: "src/payment/"
    naming_convention: "{payment_method}_processor.py"

  # Code quality constraints
  code_quality:
    max_complexity: 10
    max_line_length: 100
    type_hints: "required"
    docstrings: "required"

  # Security constraints
  security:
    no_hardcoded_secrets: true
    use_vault_for_credentials: true
    sanitize_all_inputs: true

  # Testing constraints
  testing:
    unit_test_coverage: 90
    integration_test_required: true
    e2e_test_required: false
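
Constraints only help if something enforces them. As a rough illustration, the sketch below checks three of the `code_quality` rules (`max_line_length`, `docstrings`, `type_hints`) with the standard-library `ast` module. It is an assumption about how enforcement could look, not a replacement for a real linter, and it does not cover `max_complexity`; `check_code_quality` is a hypothetical name.

```python
import ast
from typing import List


def check_code_quality(source: str, max_line_length: int = 100) -> List[str]:
    """Report over-long lines and functions lacking docstrings or type hints."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_line_length:
            violations.append(f"line {lineno}: longer than {max_line_length} characters")
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                violations.append(f"{node.name}: missing docstring")
            args = node.args.posonlyargs + node.args.args + node.args.kwonlyargs
            untyped = [
                a.arg for a in args
                if a.annotation is None and a.arg not in ("self", "cls")
            ]
            if untyped or node.returns is None:
                violations.append(f"{node.name}: incomplete type hints")
    return violations
```

Feeding the violations back to the Agent as part of the next prompt is one simple way to turn a static constraint file into a feedback signal.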

III. Step 2: Building the Durable Execution Layer

3.1 Checkpoint Design

python

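# harness/payment_integration.py
from dataclasses import dataclass
from typing import Optional
from enum import Enum
import json
import hashlib

class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

class RecoverableError(Exception):
    """A step failed, but the run can be resumed from that step."""

    def __init__(self, message: str, step_id: str, cause: Optional[Exception] = None):
        super().__init__(message)
        self.step_id = step_id
        self.cause = cause

@dataclass
class Checkpoint:
    step_id: str
    status: StepStatus
    inputs: dict
    outputs: dict
    error_message: Optional[str] = None
    timestamp: str = ""
    
    def to_dict(self):
        return {
            "step_id": self.step_id,
            "status": self.status.value,
            "inputs": self.inputs,
            "outputs": self.outputs,
            "error_message": self.error_message,
            "timestamp": self.timestamp
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Checkpoint":
        """Rebuild a Checkpoint from the dict produced by to_dict()."""
        return cls(**{**data, "status": StepStatus(data["status"])})

class PaymentIntegrationHarness:
    def __init__(self, storage_backend):
        self.storage = storage_backend
        self.checkpoints = {}
        
    def execute(self, task_inputs: dict) -> dict:
        """Main execution flow, with checkpoints."""
        execution_id = self._generate_execution_id(task_inputs)
        
        # Each _<step> method takes (task_inputs, results) and returns a dict.
        # Their bodies are not shown in this article; they can delegate to the
        # step classes from section 5.1.
        steps = [
            ("analyze_requirements", self._analyze_requirements),
            ("design_interface", self._design_interface),
            ("implement_processor", self._implement_processor),
            ("write_unit_tests", self._write_unit_tests),
            ("write_integration_tests", self._write_integration_tests),
            ("run_tests", self._run_tests),
            ("generate_documentation", self._generate_documentation),
            ("final_review", self._final_review)
        ]
        
        results = {}
        
        for step_id, step_func in steps:
            # Check whether a completed checkpoint already exists
            existing = self._load_checkpoint(execution_id, step_id)
            if existing and existing.status == StepStatus.COMPLETED:
                print(f"⏭️  Skipping step {step_id} (cached)")
                results[step_id] = existing.outputs
                continue
            
            # Run the step
            print(f"▶️  Running step: {step_id}")
            try:
                with self._checkpoint_context(execution_id, step_id, task_inputs) as checkpoint:
                    outputs = step_func(task_inputs, results)
                    checkpoint.outputs = outputs
                    checkpoint.status = StepStatus.COMPLETED
                    results[step_id] = outputs
                    print(f"✅ Step {step_id} completed")
                    
            except Exception as e:
                print(f"❌ Step {step_id} failed: {str(e)}")
                # The caller can retry or hand over to a human
                raise RecoverableError(f"Step {step_id} failed", step_id, e) from e
        
        return results
    
    def _checkpoint_context(self, execution_id: str, step_id: str, inputs: dict):
        """Open the checkpoint context manager from section 3.2 for one step."""
        # Imported lazily to avoid a circular import with checkpoint_context.py
        from harness.checkpoint_context import checkpoint_context
        return checkpoint_context(self, execution_id, step_id, inputs)
    
    def _generate_execution_id(self, inputs: dict) -> str:
        """Derive a stable execution ID from the inputs."""
        content = json.dumps(inputs, sort_keys=True)
        return hashlib.md5(content.encode()).hexdigest()[:12]
    
    def _load_checkpoint(self, execution_id: str, step_id: str) -> Optional[Checkpoint]:
        """Load a checkpoint from memory, falling back to storage."""
        key = f"{execution_id}:{step_id}"
        if key not in self.checkpoints:
            # Assumption: the storage backend offers load(key), returning the
            # saved dict or None. Without this, a new process cannot resume.
            data = self.storage.load(key)
            if data is not None:
                self.checkpoints[key] = Checkpoint.from_dict(data)
        return self.checkpoints.get(key)
    
    def _save_checkpoint(self, checkpoint: Checkpoint, execution_id: str):
        """Save a checkpoint."""
        key = f"{execution_id}:{checkpoint.step_id}"
        self.checkpoints[key] = checkpoint
        # Persist to storage
        self.storage.save(key, checkpoint.to_dict())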
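
The harness only relies on its storage backend for saving checkpoint dicts under a string key. The article does not show `harness/storage.py`, so the following is a guess at what a file-based backend could look like: one JSON file per checkpoint, written atomically. The `load` method and the file naming scheme are assumptions made for this sketch.

```python
import json
from pathlib import Path
from typing import Optional


class FileStorage:
    """Stores each checkpoint as one JSON file inside base_dir."""

    def __init__(self, base_dir: str):
        self.base_dir = Path(base_dir)
        self.base_dir.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys look like "<execution_id>:<step_id>"; ":" is not allowed in
        # file names on Windows, so it is replaced.
        return self.base_dir / (key.replace(":", "__") + ".json")

    def save(self, key: str, data: dict) -> None:
        path = self._path(key)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(data, ensure_ascii=False, indent=2), encoding="utf-8")
        tmp.replace(path)  # rename over the target so readers never see a partial file

    def load(self, key: str) -> Optional[dict]:
        path = self._path(key)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))
```

Writing to a temporary file and renaming it matters here: a crash in the middle of a write should never leave a half-written checkpoint that a later run mistakes for a completed step.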

3.2 Context Manager Implementation

python

# harness/checkpoint_context.py
from contextlib import contextmanager
from datetime import datetime

from harness.payment_integration import Checkpoint, StepStatus

@contextmanager
def checkpoint_context(harness, execution_id: str, step_id: str, inputs: dict):
    """Context manager that records a Checkpoint for one step."""
    checkpoint = Checkpoint(
        step_id=step_id,
        status=StepStatus.IN_PROGRESS,
        inputs=inputs,
        outputs={},
        timestamp=datetime.now().isoformat()
    )
    
    try:
        yield checkpoint
    except Exception as e:
        # Record the failure, then re-raise so the caller can react
        checkpoint.status = StepStatus.FAILED
        checkpoint.error_message = str(e)
        harness._save_checkpoint(checkpoint, execution_id)
        raise
    else:
        # The caller sets the status to COMPLETED inside the with block
        harness._save_checkpoint(checkpoint, execution_id)
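
To see why recording results per step pays off, it helps to strip the idea down to its core. The toy below is a deliberately simplified stand-in for the harness (no storage backend, no status enum; all names are made up for illustration): a second call skips whatever the first call already finished.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, Callable[[Dict], Dict]]


def run_steps(steps: List[Step], completed: Dict[str, Dict]) -> List[str]:
    """Run steps in order, skipping any whose output is already in `completed`.

    Returns the ids of the steps that were actually executed in this call.
    """
    executed = []
    for step_id, func in steps:
        if step_id in completed:
            continue  # finished in an earlier run
        completed[step_id] = func(completed)  # an exception leaves `completed` untouched
        executed.append(step_id)
    return executed


attempts = {"count": 0}


def flaky_tests(_results: Dict) -> Dict:
    """Fails on the first call only, like a sandbox that times out once."""
    attempts["count"] += 1
    if attempts["count"] == 1:
        raise RuntimeError("sandbox timeout")
    return {"passed": True}


steps = [("implement", lambda r: {"file": "processor.py"}), ("run_tests", flaky_tests)]
store: Dict[str, Dict] = {}

try:
    run_steps(steps, store)
except RuntimeError:
    pass  # the first run fails at run_tests; "implement" is already recorded

print(run_steps(steps, store))  # prints ['run_tests']
```

The expensive `implement` step runs once even though the pipeline was started twice, which is the property that keeps LLM cost and latency under control when a late step is flaky.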

IV. Step 3: Building a Closed-Loop Testing System

4.1 Test Harness Configuration

yaml

# test_harness.yaml
test_harness:
  name: "payment_integration_tests"
  
  # Unit test configuration
  unit_tests:
    framework: "pytest"
    coverage:
      target: 90
      fail_under: 85
    auto_generate:
      enabled: true
      strategies:
        - "boundary_value_analysis"
        - "equivalence_partitioning"
        - "error_guessing"
    
  # Integration test configuration
  integration_tests:
    environment:
      type: "docker_compose"
      services:
        - name: "postgres"
          image: "postgres:15"
          env:
            POSTGRES_DB: "payment_test"
        - name: "redis"
          image: "redis:7"
        - name: "payment_mock_server"
          build: "./mocks/payment"
    
    test_cases:
      - name: "successful_payment_flow"
        steps:
          - "create_payment_intent"
          - "process_payment"
          - "verify_webhook"
          - "confirm_completion"
          
      - name: "failed_payment_handling"
        steps:
          - "create_payment_intent"
          - "simulate_failure"
          - "verify_error_handling"
          - "confirm_rollback"
    
  # Feedback loop configuration
  feedback_loop:
    on_test_failure:
      action: "auto_fix"
      max_attempts: 3
      strategies:
        - "fix_implementation"
        - "update_test_expectations"
        - "add_missing_mocks"
    
    on_persistent_failure:
      action: "escalate"
      notify:
        - "team_lead"
        - "harness_owner"
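
The `feedback_loop` section describes a bounded retry: try to fix, re-run the tests, and escalate once `max_attempts` is exhausted. A minimal sketch of that control flow follows. It assumes the test runner and the fix strategies are plain callables and that at least one strategy is supplied; `run_feedback_loop` is a hypothetical name.

```python
from typing import Callable, Dict, List


def run_feedback_loop(
    run_tests: Callable[[], Dict],
    strategies: List[Callable[[Dict], None]],
    max_attempts: int = 3,
) -> Dict:
    """Re-run the tests after each fix attempt; escalate if they still fail."""
    result = run_tests()
    attempts = 0
    while not result["success"] and attempts < max_attempts:
        # Cycle through the strategies in their configured order.
        strategy = strategies[attempts % len(strategies)]
        strategy(result)
        attempts += 1
        result = run_tests()
    action = "done" if result["success"] else "escalate"
    return {"action": action, "fix_attempts": attempts, "last_result": result}
```

The hard cap is the important part of the design: without it, an Agent that keeps "fixing" the same failure can loop indefinitely and burn tokens without ever reaching a human.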

4.2 Test Executor

python

# harness/test_executor.py
import subprocess
import json
from typing import Dict, List

class TestExecutor:
    # Note: _collect_coverage, _start_test_environment and
    # _stop_test_environment are called below but not shown in this article.
    def __init__(self, config: dict):
        self.config = config
        
    def run_all_tests(self, project_path: str) -> Dict:
        """Run all tests and collect the results."""
        results = {
            "unit_tests": self._run_unit_tests(project_path),
            "integration_tests": self._run_integration_tests(project_path),
            "coverage": self._collect_coverage(project_path)
        }
        
        return results
    
    def _run_unit_tests(self, project_path: str) -> Dict:
        """Run the unit tests."""
        # --json-report and --json-report-file are provided by the
        # pytest-json-report plugin, which must be installed.
        cmd = [
            "pytest",
            "tests/unit/",
            "-v",
            "--json-report",
            "--json-report-file=/tmp/unit_test_report.json"
        ]
        
        result = subprocess.run(
            cmd,
            cwd=project_path,
            capture_output=True,
            text=True
        )
        
        # Parse the report
        with open("/tmp/unit_test_report.json") as f:
            report = json.load(f)
        
        return {
            "passed": report.get("summary", {}).get("passed", 0),
            "failed": report.get("summary", {}).get("failed", 0),
            "skipped": report.get("summary", {}).get("skipped", 0),
            "success": result.returncode == 0,
            "details": report.get("tests", [])
        }
    
    def _run_integration_tests(self, project_path: str) -> Dict:
        """Run the integration tests."""
        # Start the test environment
        self._start_test_environment(project_path)
        
        try:
            # --integration is not a built-in pytest option; the project must
            # register it itself (for example via pytest_addoption in conftest.py).
            cmd = [
                "pytest",
                "tests/integration/",
                "-v",
                "--integration"
            ]
            
            result = subprocess.run(
                cmd,
                cwd=project_path,
                capture_output=True,
                text=True
            )
            
            return {
                "success": result.returncode == 0,
                "output": result.stdout,
                "errors": result.stderr if result.returncode != 0 else None
            }
        finally:
            # Tear down the environment
            self._stop_test_environment(project_path)
    
    def analyze_failures(self, test_results: Dict) -> List[Dict]:
        """Analyze why tests failed."""
        failures = []
        
        # Analyze unit test failures
        for test in test_results.get("unit_tests", {}).get("details", []):
            if test.get("outcome") == "failed":
                failures.append({
                    "type": "unit_test",
                    "test_name": test.get("nodeid"),
                    "error": test.get("call", {}).get("longrepr", ""),
                    "suggested_fix": self._suggest_fix(test)
                })
        
        return failures
    
    def _suggest_fix(self, failed_test: Dict) -> str:
        """Suggest a fix based on the kind of failure."""
        error_msg = failed_test.get("call", {}).get("longrepr", "")
        
        if "AssertionError" in error_msg:
            return "Check the implementation logic or update the test expectations"
        elif "ImportError" in error_msg or "ModuleNotFoundError" in error_msg:
            return "Add the missing dependency or fix the import path"
        elif "Mock" in error_msg:
            return "Complete the test mock setup"
        else:
            return "Needs manual analysis"
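
The executor calls `_collect_coverage`, which is not shown. One plausible shape for it is to read a JSON coverage report and compare the total against the `target` and `fail_under` thresholds from the test configuration. The sketch below assumes the report contains `{"totals": {"percent_covered": ...}}`, which is the layout I would expect from coverage.py's `coverage json` command; treat that layout as an assumption and verify it against your own report file.

```python
import json
from typing import Dict


def evaluate_coverage(report_path: str, target: float = 90.0, fail_under: float = 85.0) -> Dict:
    """Read a coverage JSON report and compare the total against two thresholds."""
    with open(report_path, encoding="utf-8") as f:
        report = json.load(f)
    # Assumption: the report has the shape {"totals": {"percent_covered": <float>}}.
    percent = float(report.get("totals", {}).get("percent_covered", 0.0))
    if percent < fail_under:
        status = "fail"
    elif percent < target:
        status = "warn"
    else:
        status = "pass"
    return {"percent": percent, "status": status}
```

Keeping a "warn" band between the two thresholds lets the Harness ask the Agent for more tests without treating the run as a hard failure.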

V. Step 4: Implementing the Agent Execution Logic

5.1 Core Execution Steps

python

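# harness/steps.py
import os
from abc import ABC, abstractmethod
from typing import Dict, List

class HarnessStep(ABC):
    """Base class for Harness steps."""
    
    @abstractmethod
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        pass

    def _call_llm(self, prompt: str) -> str:
        # Call the LLM API. Every step uses it, so it lives on the base class.
        raise NotImplementedError("connect your LLM client here")

class AnalyzeRequirementsStep(HarnessStep):
    """Requirements analysis step."""
    
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        payment_method = inputs.get("payment_method")
        provider = inputs.get("payment_provider")
        
        # Use the LLM to analyze the requirements
        prompt = f"""
        Analyze the integration requirements for the following payment method:
        - Payment method: {payment_method}
        - Provider: {provider}
        
        Please provide:
        1. Core functional requirements
        2. Required API endpoints
        3. Security considerations
        4. Test scenarios
        """
        
        analysis = self._call_llm(prompt)
        
        return {
            "requirements": analysis,
            "estimated_complexity": self._estimate_complexity(analysis),
            "required_apis": self._extract_apis(analysis)
        }
    
    def _estimate_complexity(self, analysis: str) -> str:
        # Estimate complexity from the analysis (left to the reader)
        raise NotImplementedError
    
    def _extract_apis(self, analysis: str) -> List[str]:
        # Extract the required APIs (left to the reader)
        raise NotImplementedError

class ImplementProcessorStep(HarnessStep):
    """Processor implementation step."""
    
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        requirements = previous_results.get("analyze_requirements", {})
        
        prompt = f"""
        Implement a payment processor based on the following requirements:
        
        Requirements: {requirements}
        
        Constraints:
        - Must implement the PaymentProcessor interface
        - Must use type hints
        - Must include complete error handling
        - Code must follow PEP 8
        
        Please generate the complete Python code.
        """
        
        code = self._call_llm(prompt)
        
        # Validate the code
        validated_code = self._validate_code(code)
        
        # Write it to a file
        file_path = self._write_file(validated_code, inputs)
        
        return {
            "file_path": file_path,
            "code": validated_code,
            "lines_of_code": len(validated_code.splitlines())
        }
    
    def _validate_code(self, code: str) -> str:
        """Validate the generated code (placeholder: returns it unchanged)."""
        # Syntax check
        # Style check
        # Security check
        return code
    
    def _write_file(self, code: str, inputs: Dict) -> str:
        """Write the processor source file."""
        payment_method = inputs.get("payment_method", "").lower()
        filename = f"src/payment/{payment_method}_processor.py"
        
        # Make sure the target directory exists before writing
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, "w", encoding="utf-8") as f:
            f.write(code)
        
        return filename

class WriteTestsStep(HarnessStep):
    """Test writing step."""
    
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        implementation = previous_results.get("implement_processor", {})
        
        prompt = f"""
        Write comprehensive unit tests for the following payment processor:
        
        Code file: {implementation.get('file_path')}
        
        Requirements:
        1. Use the pytest framework
        2. Cover all public methods
        3. Include edge-case tests
        4. Use mocks to isolate external dependencies
        5. Target coverage >90%
        
        Please generate the complete test code.
        """
        
        test_code = self._call_llm(prompt)
        
        # Write the test file
        test_file = self._write_test_file(test_code, inputs)
        
        return {
            "test_file": test_file,
            "test_code": test_code
        }

    def _write_test_file(self, test_code: str, inputs: Dict) -> str:
        """Write the unit test file next to the other unit tests."""
        payment_method = inputs.get("payment_method", "").lower()
        filename = f"tests/unit/test_{payment_method}_processor.py"
        
        os.makedirs(os.path.dirname(filename), exist_ok=True)
        with open(filename, "w", encoding="utf-8") as f:
            f.write(test_code)
        
        return filename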
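
The code-validation hook in the processor step is where cheap, deterministic checks belong, before anything is written to disk. As a sketch under stated assumptions, the functions below strip a surrounding Markdown fence (LLMs often wrap code in one), run a syntax check with `ast.parse`, and flag lines that look like hard-coded secrets. The regular expression is a crude heuristic chosen for this example, not a substitute for a real secret scanner.

```python
import ast
import re
from typing import List

FENCE = "`" * 3  # a Markdown code fence, built this way to keep the example readable
FENCE_RE = re.compile(r"^" + FENCE + r"[a-zA-Z]*\n(.*?)\n" + FENCE + r"\s*$", re.DOTALL)
SECRET_RE = re.compile(r"""(?i)\b(api_key|secret|password|token)\b\s*=\s*["'][^"']+["']""")


def extract_code(llm_output: str) -> str:
    """Return the code inside a single surrounding Markdown fence, if there is one."""
    match = FENCE_RE.match(llm_output.strip())
    return match.group(1) if match else llm_output


def validate_code(code: str) -> List[str]:
    """Return a list of problems found in generated code; empty means it passed."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return [f"syntax error on line {exc.lineno}: {exc.msg}"]
    return [
        f"line {lineno}: possible hard-coded secret"
        for lineno, line in enumerate(code.splitlines(), start=1)
        if SECRET_RE.search(line)
    ]
```

Returning a list of problems instead of raising makes it easy to paste the findings into the next prompt and let the Agent repair its own output.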

5.2 Step Orchestration

python

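# harness/orchestrator.py
from typing import Dict, List

# DesignInterfaceStep, WriteIntegrationTestsStep, RunTestsStep,
# GenerateDocumentationStep and FinalReviewStep are not shown in section 5.1;
# they are assumed to follow the same HarnessStep pattern.
from harness.steps import (
    AnalyzeRequirementsStep,
    DesignInterfaceStep,
    ImplementProcessorStep,
    WriteTestsStep,
    WriteIntegrationTestsStep,
    RunTestsStep,
    GenerateDocumentationStep,
    FinalReviewStep,
)

class StepOrchestrator:
    """Step orchestrator."""
    
    def __init__(self):
        self.steps = {
            "analyze_requirements": AnalyzeRequirementsStep(),
            "design_interface": DesignInterfaceStep(),
            "implement_processor": ImplementProcessorStep(),
            "write_unit_tests": WriteTestsStep(),
            "write_integration_tests": WriteIntegrationTestsStep(),
            "run_tests": RunTestsStep(),
            "generate_documentation": GenerateDocumentationStep(),
            "final_review": FinalReviewStep()
        }
        
        # step id -> ids of the steps it depends on
        self.dependencies = {
            "analyze_requirements": [],
            "design_interface": ["analyze_requirements"],
            "implement_processor": ["design_interface"],
            "write_unit_tests": ["implement_processor"],
            "write_integration_tests": ["implement_processor"],
            "run_tests": ["write_unit_tests", "write_integration_tests"],
            "generate_documentation": ["run_tests"],
            "final_review": ["generate_documentation"]
        }
    
    def get_execution_order(self) -> List[str]:
        """Return the step ids in topologically sorted order."""
        # Depth-first traversal: a step is appended only after all of its
        # dependencies. Note that this version does not detect cycles.
        visited = set()
        order = []
        
        def visit(step_id):
            if step_id in visited:
                return
            visited.add(step_id)
            for dep in self.dependencies.get(step_id, []):
                visit(dep)
            order.append(step_id)
        
        for step_id in self.steps:
            visit(step_id)
        
        return order
    
    def execute_step(self, step_id: str, inputs: Dict, results: Dict) -> Dict:
        """Execute a single step."""
        step = self.steps[step_id]
        return step.execute(inputs, results)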
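
If you would rather not maintain a hand-written topological sort, Python 3.9+ ships one in the standard library. The sketch below uses `graphlib.TopologicalSorter`, which takes the same "step -> dependencies" mapping and also reports circular dependencies. It is an alternative to the orchestrator's own traversal, and `execution_order` is a hypothetical helper name.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return step ids so that every step comes after all of its dependencies."""
    try:
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the nodes that form the cycle.
        raise ValueError(f"circular step dependency: {exc.args[1]}") from exc
```

The exact order of independent steps (such as the two test-writing steps) is not guaranteed, so code that consumes the result should rely only on the dependency guarantee.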

VI. Step 5: Integration and Running

6.1 Main Entry Point

python

#!/usr/bin/env python3
# main.py
import argparse
import sys
import yaml
from harness.payment_integration import PaymentIntegrationHarness, RecoverableError
from harness.storage import FileStorage

def main():
    parser = argparse.ArgumentParser(description="Payment Integration Harness")
    parser.add_argument("--payment-method", required=True, help="Name of the payment method")
    parser.add_argument("--provider", required=True, help="Payment provider")
    parser.add_argument("--config", default="harnesses/payment_integration.yaml", 
                       help="Harness configuration file")
    # Note: --resume is parsed but not used below. Steps with a completed
    # checkpoint are skipped because the execution ID is derived from the inputs.
    parser.add_argument("--resume", help="Resume from the given checkpoint")
    
    args = parser.parse_args()
    
    # Load the configuration
    with open(args.config) as f:
        config = yaml.safe_load(f)
    
    # Initialize storage
    storage = FileStorage("./.harness_checkpoints")
    
    # Initialize the Harness
    harness = PaymentIntegrationHarness(storage)
    
    # Prepare the inputs
    inputs = {
        "payment_method": args.payment_method,
        "payment_provider": args.provider,
        "config": config
    }
    
    # Run
    print(f"🚀 Starting Harness: integrating {args.payment_method}")
    print("=" * 50)
    
    try:
        results = harness.execute(inputs)
        
        print("=" * 50)
        print("✅ Harness run completed!")
        print("📁 Generated files:")
        for step, result in results.items():
            if result and "file_path" in result:
                print(f"   - {result['file_path']}")
        
        print("\n📊 Stats:")
        print(f"   - Total steps: {len(results)}")
        print(f"   - Successful steps: {sum(1 for r in results.values() if r)}")
        
    except RecoverableError as e:
        print(f"❌ Harness run interrupted: {e}")
        print(f"💡 You can resume from step '{e.step_id}'")
        print(f"   Command: python main.py --resume {e.step_id} ...")
        sys.exit(1)

if __name__ == "__main__":
    main()

6.2 Example Run

bash

# Run the Harness
$ python main.py \
    --payment-method "ApplePay" \
    --provider "Apple"

🚀 Starting Harness: integrating ApplePay
==================================================
▶️  Running step: analyze_requirements
✅ Step analyze_requirements completed
▶️  Running step: design_interface
✅ Step design_interface completed
▶️  Running step: implement_processor
✅ Step implement_processor completed
▶️  Running step: write_unit_tests
✅ Step write_unit_tests completed
▶️  Running step: write_integration_tests
✅ Step write_integration_tests completed
▶️  Running step: run_tests
✅ Step run_tests completed
   - Unit tests: 15 passed, 0 failed
   - Coverage: 94%
▶️  Running step: generate_documentation
✅ Step generate_documentation completed
▶️  Running step: final_review
✅ Step final_review completed
==================================================
✅ Harness run completed!
📁 Generated files:
   - src/payment/applepay_processor.py
   - tests/unit/test_applepay_processor.py
   - tests/integration/test_applepay_flow.py
   - docs/payment_methods/apple_pay.md

📊 Stats:
   - Total steps: 8
   - Successful steps: 8

VII. Step 6: Verification and Iteration

7.1 Human Review Checklist

yaml

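# review_checklist.yaml
final_review:
  code_quality:
    - "Does the code follow the project conventions?"
    - "Are the type hints complete?"
    - "Is error handling thorough?"
    - "Is logging appropriate?"

  security:
    - "Has all sensitive information been removed?"
    - "Is input validation sufficient?"
    - "Are permission checks correct?"

  testing:
    - "Does test coverage meet the target?"
    - "Are edge cases covered?"
    - "Are mocks used appropriately?"

  documentation:
    - "Is the API documentation complete?"
    - "Are the usage examples correct?"
    - "Are the deployment instructions clear?"

  approval:
    auto_merge_conditions:
      - "All checks pass"
      - "Coverage >90%"
      - "No security warnings"
    require_human_review: true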
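
The `approval` block combines automatic conditions with a human-review flag. How the two interact is not spelled out, so the sketch below encodes one reading of it: the automatic conditions act as a gate, and `require_human_review` decides whether passing the gate leads to a merge or to a reviewer's queue. The function name and the three return values are made up for illustration.

```python
from typing import Dict


def review_decision(
    checks: Dict[str, bool],
    coverage: float,
    security_warnings: int,
    require_human_review: bool = True,
) -> str:
    """Return "needs_fixes", "human_review", or "auto_merge"."""
    conditions_met = all(checks.values()) and coverage > 90 and security_warnings == 0
    if not conditions_met:
        return "needs_fixes"
    return "human_review" if require_human_review else "auto_merge"


print(review_decision({"lint": True, "tests": True}, coverage=94, security_warnings=0))
# prints human_review
```

With this reading, a reviewer only ever sees changes that already passed every automated check, which is what keeps the review short.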

7.2 Metrics and Improvement

python

# harness/metrics.py
from datetime import datetime
from typing import Dict

class HarnessMetrics:
    """Collects Harness metrics."""
    # Note: _calculate_total_time, _count_llm_calls, _calculate_token_usage
    # and _calculate_trends are used below but not shown in this article.
    
    def __init__(self):
        self.metrics = []
    
    def record_execution(self, execution_id: str, results: Dict):
        """Record the metrics of one execution."""
        metric = {
            "execution_id": execution_id,
            "timestamp": datetime.now().isoformat(),
            "total_steps": len(results),
            "successful_steps": sum(1 for r in results.values() if r),
            "total_time": self._calculate_total_time(results),
            # `r and ...` guards against steps that returned None
            "files_generated": len([r for r in results.values() if r and "file_path" in r]),
            "test_coverage": results.get("run_tests", {}).get("coverage", 0),
            "llm_calls": self._count_llm_calls(results),
            "token_usage": self._calculate_token_usage(results)
        }
        
        self.metrics.append(metric)
        return metric
    
    def generate_report(self) -> Dict:
        """Generate a Harness performance report."""
        if not self.metrics:
            return {}
        
        return {
            "total_executions": len(self.metrics),
            "success_rate": sum(1 for m in self.metrics if m["successful_steps"] == m["total_steps"]) / len(self.metrics),
            "avg_execution_time": sum(m["total_time"] for m in self.metrics) / len(self.metrics),
            "avg_coverage": sum(m["test_coverage"] for m in self.metrics) / len(self.metrics),
            "avg_llm_calls": sum(m["llm_calls"] for m in self.metrics) / len(self.metrics),
            "improvement_trends": self._calculate_trends()
        }
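
The report refers to improvement trends without defining them. One simple definition, offered here purely as an assumption, is to compare the average of the newer half of the runs with the average of the older half. The sketch applies that idea to `test_coverage`; a positive value means coverage is improving.

```python
from statistics import mean
from typing import Dict, List


def coverage_trend(metrics: List[Dict]) -> float:
    """Mean test_coverage of the newer half of the runs minus that of the older half."""
    if len(metrics) < 2:
        return 0.0  # not enough data to talk about a trend
    mid = len(metrics) // 2
    older = mean(m["test_coverage"] for m in metrics[:mid])
    newer = mean(m["test_coverage"] for m in metrics[mid:])
    return newer - older


runs = [{"test_coverage": c} for c in (80, 84, 90, 94)]
print(coverage_trend(runs))  # prints 10
```

The same function works for `total_time` or `llm_calls` if you swap the key, where a negative trend is the good direction.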

VIII. Complete Project Structure

payment-harness/
├── harness/
│   ├── __init__.py
│   ├── payment_integration.py    # Main Harness class
│   ├── checkpoint_context.py     # Checkpoint context manager
│   ├── test_executor.py          # Test executor
│   ├── steps.py                  # Execution steps
│   ├── orchestrator.py           # Step orchestrator
│   ├── storage.py                # Storage backend
│   └── metrics.py                # Metrics collection
├── harnesses/
│   └── payment_integration.yaml  # Harness configuration
├── constraints/
│   └── constraints.yaml          # Constraint definitions
├── test_harness/
│   └── test_harness.yaml         # Test configuration
├── templates/
│   ├── processor_template.py     # Code template
│   └── test_template.py          # Test template
├── src/                          # Generated code
├── tests/                        # Generated tests
├── docs/                         # Generated documentation
├── .harness_checkpoints/         # Checkpoint storage
├── main.py                       # Main entry point
└── requirements.txt

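IX. Key Lessons Learned

9.1 Design Principles

Progressive complexity: start with simple scenarios and add constraints gradually
Explicit over implicit: every constraint and strategy should be configurable
Recoverable failures: every step should support checkpoints and retries
Human in the loop: keep human review at key decision points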
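
Of these principles, "recoverable failures" is the one that most directly turns into code. Checkpoints were covered in section 3; for the retry half, a small helper with exponential backoff is often enough. The sketch below is one possible shape. The parameter names are assumptions, and `sleep` is injectable so the helper can be tested without waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry(
    func: Callable[[], T],
    attempts: int = 3,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or `attempts` calls have failed."""
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

Restricting `retry_on` to transient errors such as timeouts is worth the effort: retrying a deterministic failure only delays the moment a human gets involved.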

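9.2 Common Pitfalls

| Pitfall | Symptom | Solution |
| --- | --- | --- |
| Over-engineering | The Harness is more complex than the task itself | Start simple and extend as needed |
| Insufficient constraints | The Agent generates low-quality code | Add explicit constraints and checkpoints |
| Delayed feedback | Problems surface only after tests fail | Validate early and give fast feedback |
| Neglected maintenance | The Harness itself becomes a burden | Maintain the Harness as a product |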

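9.3 Next Steps

After finishing your first Harness, you can:

Add more steps: code review, performance testing, and so on
Support more scenarios: generalize the Harness to other kinds of tasks
Build a Harness library: create reusable Harness templates
Integrate with CI/CD: make Harness execution part of your automated pipeline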

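Conclusion: From 0 to 1, and Then to 100

Building the first Harness is the hardest step. Once you have finished this payment-integration Harness, you have the core patterns of Harness Engineering in hand: constraint design, durable execution, closed-loop testing, and human review.

Remember that a Harness is not a one-off script but a product that keeps evolving. As your understanding of the task deepens, keep refining your Harness so that it becomes smarter, more reliable, and easier to use.

Now go build your first Harness!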