Part 23: Building Your First Harness: A Hands-On Guide from Scratch

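Introduction: From Theory to Practice

The previous 21 posts gave a systematic introduction to Harness Engineering, so you should now have a solid overall picture of it. It is time to put that into practice: this post walks you through building a real, usable Harness from scratch, one step at a time.

We will use a concrete business scenario as the example: building a Harness that automatically adds a new payment method to an e-commerce system. The scenario is realistic enough to cover the core elements of Harness design, yet not so complex that you get lost in the details.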

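1. Scenario Analysis: What Problem Are We Solving?

1.1 Business Background

Suppose you work on an e-commerce platform that needs to support multiple payment methods (Alipay, WeChat Pay, credit cards, and so on). Each time the business needs to onboard a new payment method, the development workflow usually includes:

  1. Create the payment processor class
  2. Implement the payment interface
  3. Write unit tests
  4. Add integration tests
  5. Update configuration files
  6. Write documentation

Traditional approach: a senior engineer needs 2-3 days to finish.

Harness approach: once the environment is designed, the Agent finishes in 2-3 hours, plus 30 minutes of human review.

1.2 Task Breakdown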

yaml

task: "Add a new payment method: Apple Pay"
complexity: "medium"
expected_files:
  - "src/payment/apple_pay_processor.py"
  - "tests/unit/test_apple_pay_processor.py"
  - "tests/integration/test_apple_pay_flow.py"
  - "docs/payment_methods/apple_pay.md"

constraints:
  - "Must implement the PaymentProcessor interface"
  - "Must support sandbox-environment testing"
  - "Must comply with PCI DSS security requirements"
  - "Code coverage must be > 90%"
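
The expected file names in the task spec use snake_case (apple_pay), whereas simply lowercasing an input such as "ApplePay" would give applepay. Here is a minimal sketch of one way to derive the paths consistently. The helpers `to_snake_case` and `expected_files` are hypothetical names introduced for illustration; they are not part of the Harness built in this post.

```python
import re
from typing import List


def to_snake_case(name: str) -> str:
    """Convert names such as 'ApplePay' or 'Apple Pay' to 'apple_pay'."""
    # Insert an underscore at each lowercase/digit -> uppercase boundary
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name.strip())
    # Collapse every run of non-alphanumeric characters into one underscore
    name = re.sub(r"[^A-Za-z0-9]+", "_", name)
    return name.strip("_").lower()


def expected_files(payment_method: str) -> List[str]:
    """Build the four paths listed under expected_files for one method."""
    slug = to_snake_case(payment_method)
    return [
        f"src/payment/{slug}_processor.py",
        f"tests/unit/test_{slug}_processor.py",
        f"tests/integration/test_{slug}_flow.py",
        f"docs/payment_methods/{slug}.md",
    ]


for path in expected_files("ApplePay"):
    print(path)
```

Normalizing the name once, in one place, keeps the task spec, the generated files, and the review step pointing at the same paths.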

2. Step One: Design the Core Structure of the Harness

2.1 Harness Configuration File

Create harnesses/payment_integration.yaml:

yaml

# Harness definition file
harness:
  name: "payment_integration"
  version: "1.0.0"
  description: "Automated Harness for integrating new payment methods"

  # Trigger conditions
  triggers:
    - type: "issue_label"
      value: "new-payment-method"
    - type: "manual"

  # Input parameters
  inputs:
    payment_method:
      type: "string"
      description: "Payment method name"
      required: true
    payment_provider:
      type: "string"
      description: "Payment provider"
      required: true
    sandbox_credentials:
      type: "secret"
      description: "Sandbox environment credentials"
      required: true
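
The `inputs` section only has value if something enforces it before the Agent starts. The sketch below shows one possible check, assuming the schema has already been loaded into a plain dict. `validate_inputs` and `INPUT_SCHEMA` are hypothetical names, and treating both "string" and "secret" as `str` values is an assumption.

```python
from typing import Any, Dict, List

# Mirrors the `inputs` section of the config, written as a plain dict
INPUT_SCHEMA: Dict[str, Dict[str, Any]] = {
    "payment_method": {"type": "string", "required": True},
    "payment_provider": {"type": "string", "required": True},
    "sandbox_credentials": {"type": "secret", "required": True},
}


def validate_inputs(schema: Dict[str, Dict[str, Any]],
                    values: Dict[str, Any]) -> List[str]:
    """Return a list of problems; an empty list means the inputs are valid."""
    errors = []
    for name, spec in schema.items():
        value = values.get(name)
        if value is None or value == "":
            if spec.get("required", False):
                errors.append(f"missing required input: {name}")
            continue
        # Assumption: "string" and "secret" inputs are both carried as str
        if spec.get("type") in ("string", "secret") and not isinstance(value, str):
            errors.append(f"input {name} must be a string")
    return errors


problems = validate_inputs(
    INPUT_SCHEMA,
    {"payment_method": "ApplePay", "payment_provider": "Apple"},
)
print(problems)
```

Failing fast here is cheaper than discovering a missing credential halfway through the integration-test step.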

2.2 Constraint Definitions

yaml

# constraints.yaml
constraints:
  # Architecture constraints
  architecture:
    required_interface: "PaymentProcessor"
    file_location: "src/payment/"
    naming_convention: "{payment_method}_processor.py"

  # Code quality constraints
  code_quality:
    max_complexity: 10
    max_line_length: 100
    type_hints: "required"
    docstrings: "required"

  # Security constraints
  security:
    no_hardcoded_secrets: true
    use_vault_for_credentials: true
    sanitize_all_inputs: true

  # Testing constraints
  testing:
    unit_test_coverage: 90
    integration_test_required: true
    e2e_test_required: false
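
Declared constraints need a checker, otherwise they are only documentation. As a rough illustration, the sketch below enforces three of the `code_quality` entries (`max_line_length`, `docstrings`, `type_hints`) using the standard-library `ast` module. The function name `check_code_quality` is hypothetical, and `max_complexity` is deliberately left out because it would need a complexity tool.

```python
import ast
from typing import List


def check_code_quality(source: str, max_line_length: int = 100) -> List[str]:
    """Check a few of the code_quality constraints on Python source text."""
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        if len(line) > max_line_length:
            violations.append(f"line {lineno}: longer than {max_line_length} characters")

    tree = ast.parse(source)
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            if ast.get_docstring(node) is None:
                violations.append(f"{node.name}: missing docstring")
            if node.returns is None:
                violations.append(f"{node.name}: missing return type hint")
            for arg in node.args.args:
                # self/cls are conventionally left unannotated
                if arg.arg not in ("self", "cls") and arg.annotation is None:
                    violations.append(f"{node.name}: argument {arg.arg} lacks a type hint")
    return violations


sample = "def pay(amount):\n    return amount\n"
for violation in check_code_quality(sample):
    print(violation)
```

The violation strings can be fed straight back into the Agent's next prompt, which is what turns a constraint into a feedback signal.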

3. Step Two: Build the Durable Execution Layer

3.1 Checkpoint Design

python

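# harness/payment_integration.py
from dataclasses import dataclass
from typing import Optional, List
from enum import Enum
import json
import hashlib


class RecoverableError(Exception):
    """Raised when a step fails; records the step so the run can be resumed."""

    def __init__(self, message: str, step_id: str, cause: Exception):
        super().__init__(message)
        self.step_id = step_id
        self.cause = cause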

class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"

@dataclass
class Checkpoint:
    step_id: str
    status: StepStatus
    inputs: dict
    outputs: dict
    error_message: Optional[str] = None
    timestamp: str = ""
    
    def to_dict(self):
        return {
            "step_id": self.step_id,
            "status": self.status.value,
            "inputs": self.inputs,
            "outputs": self.outputs,
            "error_message": self.error_message,
            "timestamp": self.timestamp
        }

class PaymentIntegrationHarness:
    def __init__(self, storage_backend):
        self.storage = storage_backend
        self.checkpoints = {}
        
    def execute(self, task_inputs: dict) -> dict:
        """Main execution flow, with a Checkpoint around every step."""
        # Imported here to avoid a circular import with harness/checkpoint_context.py
        from harness.checkpoint_context import checkpoint_context

        execution_id = self._generate_execution_id(task_inputs)

        # Each self._<step> method is assumed to delegate to the matching
        # HarnessStep in harness/steps.py; these methods are not shown here.
        steps = [
            ("analyze_requirements", self._analyze_requirements),
            ("design_interface", self._design_interface),
            ("implement_processor", self._implement_processor),
            ("write_unit_tests", self._write_unit_tests),
            ("write_integration_tests", self._write_integration_tests),
            ("run_tests", self._run_tests),
            ("generate_documentation", self._generate_documentation),
            ("final_review", self._final_review)
        ]

        results = {}

        for step_id, step_func in steps:
            # Skip steps that already have a completed Checkpoint
            existing = self._load_checkpoint(execution_id, step_id)
            if existing and existing.status == StepStatus.COMPLETED:
                print(f"⏭️  Skipping step {step_id} (cached)")
                results[step_id] = existing.outputs
                continue

            # Run the step
            print(f"▶️  Running step: {step_id}")
            try:
                with checkpoint_context(self, execution_id, step_id, task_inputs) as checkpoint:
                    outputs = step_func(task_inputs, results)
                    checkpoint.outputs = outputs
                    checkpoint.status = StepStatus.COMPLETED
                    results[step_id] = outputs
                    print(f"✅ Step {step_id} completed")

            except Exception as e:
                print(f"❌ Step {step_id} failed: {e}")
                # Retry or hand off to a human here; for now raise a resumable error
                raise RecoverableError(f"Step {step_id} failed", step_id, e) from e

        return results
    
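    def _generate_execution_id(self, inputs: dict) -> str:
        """Derive a stable execution ID from the inputs."""
        content = json.dumps(inputs, sort_keys=True)
        # MD5 serves only as a short fingerprint here, not as a security measure
        return hashlib.md5(content.encode()).hexdigest()[:12]

    def _load_checkpoint(self, execution_id: str, step_id: str) -> Optional[Checkpoint]:
        """Look up the Checkpoint for this execution and step."""
        key = f"{execution_id}:{step_id}"
        # Only the in-memory cache is consulted here. To resume across
        # processes, fall back to the storage backend (its read API is not
        # shown in this post).
        return self.checkpoints.get(key)

    def _save_checkpoint(self, checkpoint: Checkpoint, execution_id: str):
        """Save a Checkpoint."""
        key = f"{execution_id}:{checkpoint.step_id}"
        self.checkpoints[key] = checkpoint
        # Persist to the storage backend
        self.storage.save(key, checkpoint.to_dict())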
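
The Harness above only calls `storage.save(key, data)`; the storage backend itself is never shown. For checkpoints to survive a process restart, the backend also needs a way to read them back. The sketch below is one possible file-based backend. The `load` method, the one-JSON-file-per-key layout, and the key-to-file-name mapping are all assumptions made for illustration.

```python
import json
import tempfile
from pathlib import Path
from typing import Optional


class FileStorage:
    """Stores each checkpoint as one JSON file under a root directory."""

    def __init__(self, root: str):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key: str) -> Path:
        # Keys look like "<execution_id>:<step_id>"; ':' is not portable in file names
        return self.root / (key.replace(":", "__") + ".json")

    def save(self, key: str, data: dict) -> None:
        path = self._path(key)
        tmp = path.with_suffix(".tmp")
        tmp.write_text(json.dumps(data, ensure_ascii=False), encoding="utf-8")
        # Rename into place so readers never see a half-written file
        tmp.replace(path)

    def load(self, key: str) -> Optional[dict]:
        path = self._path(key)
        if not path.exists():
            return None
        return json.loads(path.read_text(encoding="utf-8"))


with tempfile.TemporaryDirectory() as tmpdir:
    storage = FileStorage(tmpdir)
    storage.save("abc123:run_tests", {"status": "completed"})
    print(storage.load("abc123:run_tests"))
```

With a `load` method like this available, `_load_checkpoint` could fall back to the backend whenever the in-memory cache misses, which is what makes resuming a failed run possible.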

3.2 Context Manager Implementation

python

# harness/checkpoint_context.py
from contextlib import contextmanager
from datetime import datetime

from harness.payment_integration import Checkpoint, StepStatus

@contextmanager
def checkpoint_context(harness, execution_id: str, step_id: str, inputs: dict):
    """Context manager that records a Checkpoint for one step."""
    checkpoint = Checkpoint(
        step_id=step_id,
        status=StepStatus.IN_PROGRESS,
        inputs=inputs,
        outputs={},
        timestamp=datetime.now().isoformat()
    )
    
    try:
        yield checkpoint
    except Exception as e:
        checkpoint.status = StepStatus.FAILED
        checkpoint.error_message = str(e)
        harness._save_checkpoint(checkpoint, execution_id)
        raise
    else:
        harness._save_checkpoint(checkpoint, execution_id)
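
The execution loop leaves open what happens after a step fails: retry, or hand off to a human. Since LLM calls and sandbox APIs often fail transiently, a bounded retry with exponential backoff is a common first choice. The helper below is a sketch; `run_with_retry` and its parameters are hypothetical and do not appear in the Harness code above.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def run_with_retry(
    func: Callable[[], T],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: let the caller escalate
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("max_attempts must be at least 1")


attempts = {"count": 0}


def flaky_step() -> str:
    """Fails twice, then succeeds, to simulate a transient error."""
    attempts["count"] += 1
    if attempts["count"] < 3:
        raise RuntimeError("transient failure")
    return "ok"


print(run_with_retry(flaky_step, base_delay=0.01))
```

Passing `sleep` in as a parameter keeps the helper easy to test without real delays. Inside `execute`, a call such as `step_func(task_inputs, results)` could be wrapped in a lambda and passed to this helper.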

4. Step Three: Build a Closed-Loop Testing System

4.1 Test Harness Configuration

yaml

# test_harness.yaml
test_harness:
  name: "payment_integration_tests"
  
  # Unit test configuration
  unit_tests:
    framework: "pytest"
    coverage:
      target: 90
      fail_under: 85
    auto_generate:
      enabled: true
      strategies:
        - "boundary_value_analysis"
        - "equivalence_partitioning"
        - "error_guessing"
    
  # Integration test configuration
  integration_tests:
    environment:
      type: "docker_compose"
      services:
        - name: "postgres"
          image: "postgres:15"
          env:
            POSTGRES_DB: "payment_test"
        - name: "redis"
          image: "redis:7"
        - name: "payment_mock_server"
          build: "./mocks/payment"
    
    test_cases:
      - name: "successful_payment_flow"
        steps:
          - "create_payment_intent"
          - "process_payment"
          - "verify_webhook"
          - "confirm_completion"
          
      - name: "failed_payment_handling"
        steps:
          - "create_payment_intent"
          - "simulate_failure"
          - "verify_error_handling"
          - "confirm_rollback"
    
  # Feedback loop configuration
  feedback_loop:
    on_test_failure:
      action: "auto_fix"
      max_attempts: 3
      strategies:
        - "fix_implementation"
        - "update_test_expectations"
        - "add_missing_mocks"
    
    on_persistent_failure:
      action: "escalate"
      notify:
        - "team_lead"
        - "harness_owner"
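
The `feedback_loop` section describes a policy: on test failure, try an automatic fix up to `max_attempts` times, then escalate. The control flow can be sketched as below. `run_feedback_loop` and the `fixers` mapping are hypothetical, and applying one strategy per attempt in the configured order is an assumption, since the config does not say how strategies are chosen.

```python
from typing import Callable, Dict, List


def run_feedback_loop(
    run_tests: Callable[[], bool],
    fixers: Dict[str, Callable[[], None]],
    strategies: List[str],
    max_attempts: int = 3,
) -> Dict[str, object]:
    """Run tests; on failure apply one fix strategy per attempt, then escalate."""
    applied: List[str] = []
    if run_tests():
        return {"status": "passed", "applied": applied}
    for attempt in range(max_attempts):
        # Cycle through the configured strategies in order
        strategy = strategies[attempt % len(strategies)]
        fixers[strategy]()
        applied.append(strategy)
        if run_tests():
            return {"status": "fixed", "applied": applied}
    # on_persistent_failure: hand over to a human instead of looping forever
    return {"status": "escalate", "applied": applied}


# Toy example: the tests start passing once the mocks are added
state = {"mocks_added": False}
result = run_feedback_loop(
    run_tests=lambda: state["mocks_added"],
    fixers={
        "fix_implementation": lambda: None,
        "add_missing_mocks": lambda: state.update(mocks_added=True),
    },
    strategies=["fix_implementation", "add_missing_mocks"],
)
print(result)
```

The hard cap on attempts is the important part: without it, an Agent that cannot fix the problem will keep spending tokens on it.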

4.2 Test Executor

python

# harness/test_executor.py
import subprocess
import json
from typing import Dict, List, Tuple

class TestExecutor:
    def __init__(self, config: dict):
        self.config = config
        
    def run_all_tests(self, project_path: str) -> Dict:
        """Run all tests and collect the results."""
        # _collect_coverage and the test-environment helpers used below are
        # not shown in this post.
        results = {
            "unit_tests": self._run_unit_tests(project_path),
            "integration_tests": self._run_integration_tests(project_path),
            "coverage": self._collect_coverage(project_path)
        }

        return results

    def _run_unit_tests(self, project_path: str) -> Dict:
        """Run the unit tests."""
        # The --json-report options come from the pytest-json-report plugin
        cmd = [
            "pytest",
            "tests/unit/",
            "-v",
            "--json-report",
            "--json-report-file=/tmp/unit_test_report.json"
        ]

        result = subprocess.run(
            cmd,
            cwd=project_path,
            capture_output=True,
            text=True
        )

        # Parse the report; it may be missing if pytest crashed before writing it
        try:
            with open("/tmp/unit_test_report.json") as f:
                report = json.load(f)
        except (FileNotFoundError, json.JSONDecodeError):
            report = {}

        return {
            "passed": report.get("summary", {}).get("passed", 0),
            "failed": report.get("summary", {}).get("failed", 0),
            "skipped": report.get("summary", {}).get("skipped", 0),
            "success": result.returncode == 0,
            "details": report.get("tests", [])
        }
    
    def _run_integration_tests(self, project_path: str) -> Dict:
        """Run the integration tests."""
        # Start the test environment
        self._start_test_environment(project_path)

        try:
            # --integration is a project-specific flag; it has to be
            # registered through pytest_addoption in conftest.py
            cmd = [
                "pytest",
                "tests/integration/",
                "-v",
                "--integration"
            ]

            result = subprocess.run(
                cmd,
                cwd=project_path,
                capture_output=True,
                text=True
            )

            return {
                "success": result.returncode == 0,
                "output": result.stdout,
                "errors": result.stderr if result.returncode != 0 else None
            }
        finally:
            # Tear down the environment
            self._stop_test_environment(project_path)
    
    def analyze_failures(self, test_results: Dict) -> List[Dict]:
        """Analyze why tests failed."""
        failures = []

        # Analyze unit test failures
        for test in test_results.get("unit_tests", {}).get("details", []):
            if test.get("outcome") == "failed":
                failures.append({
                    "type": "unit_test",
                    "test_name": test.get("nodeid"),
                    "error": test.get("call", {}).get("longrepr", ""),
                    "suggested_fix": self._suggest_fix(test)
                })

        return failures

    def _suggest_fix(self, failed_test: Dict) -> str:
        """Suggest a fix based on the type of failure."""
        error_msg = failed_test.get("call", {}).get("longrepr", "")

        if "AssertionError" in error_msg:
            return "Check the implementation logic or update the test expectations"
        elif "ImportError" in error_msg:
            return "Add the missing dependency or fix the import path"
        elif "Mock" in error_msg:
            return "Complete the test Mock setup"
        else:
            return "Needs manual analysis"
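
`run_all_tests` relies on a `_collect_coverage` helper that is not shown, and the test config defines two thresholds (`target: 90`, `fail_under: 85`). One way to connect them is to read the JSON report that coverage.py writes with `coverage json` and classify its `totals.percent_covered` value. The sketch below does this; the function name `evaluate_coverage` is hypothetical, and reading the gap between the two thresholds as a "warn" band is an assumption about what the config intends.

```python
import json
from typing import Dict


def evaluate_coverage(report: Dict, target: float = 90, fail_under: float = 85) -> Dict:
    """Classify total coverage against the target / fail_under thresholds."""
    percent = report.get("totals", {}).get("percent_covered", 0.0)
    if percent >= target:
        verdict = "pass"
    elif percent >= fail_under:
        verdict = "warn"  # below target but above the hard floor
    else:
        verdict = "fail"
    return {"percent": round(percent, 1), "verdict": verdict}


# A trimmed-down report in the shape that `coverage json` writes to coverage.json
sample_report = json.loads('{"totals": {"percent_covered": 87.5}}')
print(evaluate_coverage(sample_report))
```

In the real executor the dict would come from `json.load` on the report file inside `project_path` rather than from an inline string.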

5. Step Four: Implement the Agent Execution Logic

5.1 Core Execution Steps

python

# harness/steps.py
from abc import ABC, abstractmethod
from typing import Dict, Any, List

class HarnessStep(ABC):
    """Base class for Harness steps."""

    @abstractmethod
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        pass

    def _call_llm(self, prompt: str) -> str:
        # Call the LLM API. Defined on the base class because every step
        # below uses it; the actual client is left to the reader.
        raise NotImplementedError

class AnalyzeRequirementsStep(HarnessStep):
    """Requirements analysis step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        payment_method = inputs.get("payment_method")
        provider = inputs.get("payment_provider")

        # Use the LLM to analyze the requirements
        prompt = f"""
        Analyze the integration requirements for the following payment method:
        - Payment method: {payment_method}
        - Provider: {provider}

        Please provide:
        1. Core functional requirements
        2. Required API endpoints
        3. Security considerations
        4. Test scenarios
        """

        analysis = self._call_llm(prompt)

        return {
            "requirements": analysis,
            "estimated_complexity": self._estimate_complexity(analysis),
            "required_apis": self._extract_apis(analysis)
        }

    def _estimate_complexity(self, analysis: str) -> str:
        # Estimate complexity from the analysis
        pass

    def _extract_apis(self, analysis: str) -> List[str]:
        # Extract the required APIs
        pass

class ImplementProcessorStep(HarnessStep):
    """Processor implementation step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        requirements = previous_results.get("analyze_requirements", {})

        prompt = f"""
        Implement a payment processor based on the following requirements:

        Requirements: {requirements}

        Constraints:
        - Must implement the PaymentProcessor interface
        - Must use type hints
        - Must include complete error handling
        - Code must follow PEP 8

        Please generate the complete Python code.
        """

        code = self._call_llm(prompt)

        # Validate the code
        validated_code = self._validate_code(code)

        # Write it to a file
        file_path = self._write_file(validated_code, inputs)

        return {
            "file_path": file_path,
            "code": validated_code,
            "lines_of_code": len(validated_code.split("\n"))
        }

    def _validate_code(self, code: str) -> str:
        """Validate the generated code (placeholder: returns it unchanged)."""
        # Syntax check
        # Style check
        # Security check
        return code

    def _write_file(self, code: str, inputs: Dict) -> str:
        """Write the code to a file."""
        payment_method = inputs.get("payment_method", "").lower()
        filename = f"src/payment/{payment_method}_processor.py"

        with open(filename, "w") as f:
            f.write(code)

        return filename

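class WriteTestsStep(HarnessStep):
    """Test-writing step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        implementation = previous_results.get("implement_processor", {})

        prompt = f"""
        Write comprehensive unit tests for the following payment processor:

        Code file: {implementation.get('file_path')}

        Requirements:
        1. Use the pytest framework
        2. Cover all public methods
        3. Include boundary-case tests
        4. Use Mocks to isolate external dependencies
        5. Target coverage > 90%

        Please generate the complete test code.
        """

        test_code = self._call_llm(prompt)

        # Write the test file (_write_test_file is not shown in this post)
        test_file = self._write_test_file(test_code, inputs)

        return {
            # "file_path" lets main.py and the metrics count this file too
            "file_path": test_file,
            "test_file": test_file,
            "test_code": test_code
        }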
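
`_validate_code` above is a placeholder that returns the code unchanged. In practice, LLM replies often wrap code in a Markdown fence and occasionally contain syntax errors, so two cheap checks are worth running before anything is written to disk. The sketch below shows both using only the standard library. `extract_code` and `syntax_error` are hypothetical helpers, and the assumption is that the reply contains at most one relevant fenced block.

```python
import ast
import re
from typing import Optional

FENCE = "`" * 3  # built at runtime so this example contains no literal fence


def extract_code(llm_output: str) -> str:
    """Return the body of the first fenced block, or the whole text if none."""
    pattern = re.compile(FENCE + r"(?:python)?[ \t]*\n(.*?)" + FENCE, re.DOTALL)
    match = pattern.search(llm_output)
    return (match.group(1) if match else llm_output).strip() + "\n"


def syntax_error(code: str) -> Optional[str]:
    """Return None if the code parses as Python, else a short description."""
    try:
        ast.parse(code)
    except SyntaxError as exc:
        return f"line {exc.lineno}: {exc.msg}"
    return None


reply = (
    "Here is the processor:\n"
    + FENCE + "python\n"
    + "def pay(amount: int) -> int:\n"
    + "    return amount\n"
    + FENCE + "\nLet me know if you need changes."
)
code = extract_code(reply)
print(code)
print(syntax_error(code))
```

A non-`None` result from `syntax_error` is a good trigger for re-prompting the Agent with the error message, rather than letting the failure surface later in the test step.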

5.2 Step Orchestration

python

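# harness/orchestrator.py
from typing import Dict, List

# The step classes are assumed to live in harness/steps.py; only three of
# them are shown in this post.
from harness.steps import (
    AnalyzeRequirementsStep, DesignInterfaceStep, ImplementProcessorStep,
    WriteTestsStep, WriteIntegrationTestsStep, RunTestsStep,
    GenerateDocumentationStep, FinalReviewStep,
)

class StepOrchestrator:
    """Step orchestrator."""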
    
    def __init__(self):
        self.steps = {
            "analyze_requirements": AnalyzeRequirementsStep(),
            "design_interface": DesignInterfaceStep(),
            "implement_processor": ImplementProcessorStep(),
            "write_unit_tests": WriteTestsStep(),
            "write_integration_tests": WriteIntegrationTestsStep(),
            "run_tests": RunTestsStep(),
            "generate_documentation": GenerateDocumentationStep(),
            "final_review": FinalReviewStep()
        }
        
        self.dependencies = {
            "analyze_requirements": [],
            "design_interface": ["analyze_requirements"],
            "implement_processor": ["design_interface"],
            "write_unit_tests": ["implement_processor"],
            "write_integration_tests": ["implement_processor"],
            "run_tests": ["write_unit_tests", "write_integration_tests"],
            "generate_documentation": ["run_tests"],
            "final_review": ["generate_documentation"]
        }
    
    def get_execution_order(self) -> List[str]:
        """Return the steps in topologically sorted execution order."""
        # Depth-first topological sort; note that it does not detect cycles
        visited = set()
        order = []
        
        def visit(step_id):
            if step_id in visited:
                return
            visited.add(step_id)
            for dep in self.dependencies.get(step_id, []):
                visit(dep)
            order.append(step_id)
        
        for step_id in self.steps:
            visit(step_id)
        
        return order
    
    def execute_step(self, step_id: str, inputs: Dict, results: Dict) -> Dict:
        """Execute a single step."""
        step = self.steps[step_id]
        return step.execute(inputs, results)
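
The hand-written depth-first sort in `get_execution_order` works for the dependency graph shown, but it would silently accept a circular dependency. On Python 3.9 and later, the standard library's `graphlib.TopologicalSorter` does the same job and raises `CycleError` when the graph has a cycle. The sketch below is an alternative to the method above, not the post's own implementation; `execution_order` is a hypothetical free function.

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List


def execution_order(dependencies: Dict[str, List[str]]) -> List[str]:
    """Return step IDs so that every step comes after its dependencies."""
    try:
        # The mapping is node -> predecessors, the same shape as self.dependencies
        return list(TopologicalSorter(dependencies).static_order())
    except CycleError as exc:
        # exc.args[1] holds the list of nodes that form the cycle
        raise ValueError(f"circular step dependency: {exc.args[1]}") from exc


dependencies = {
    "analyze_requirements": [],
    "design_interface": ["analyze_requirements"],
    "implement_processor": ["design_interface"],
    "write_unit_tests": ["implement_processor"],
    "write_integration_tests": ["implement_processor"],
    "run_tests": ["write_unit_tests", "write_integration_tests"],
}
print(execution_order(dependencies))
```

Steps with no ordering between them, such as the two test-writing steps, may come out in either order. That also marks them as candidates for running in parallel.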

6. Step Five: Integration and Running

6.1 Main Entry Point

python

#!/usr/bin/env python3
# main.py
import argparse
import yaml
# RecoverableError is assumed to be defined next to the Harness class
from harness.payment_integration import PaymentIntegrationHarness, RecoverableError
from harness.storage import FileStorage

def main():
    parser = argparse.ArgumentParser(description="Payment Integration Harness")
    parser.add_argument("--payment-method", required=True, help="Payment method name")
    parser.add_argument("--provider", required=True, help="Payment provider")
    parser.add_argument("--config", default="harnesses/payment_integration.yaml",
                       help="Harness configuration file")
    # Note: --resume is parsed but not yet used by the code below
    parser.add_argument("--resume", help="Resume from the given Checkpoint")

    args = parser.parse_args()

    # Load the configuration
    with open(args.config) as f:
        config = yaml.safe_load(f)

    # Initialize storage
    storage = FileStorage("./.harness_checkpoints")

    # Initialize the Harness
    harness = PaymentIntegrationHarness(storage)

    # Prepare the inputs
    inputs = {
        "payment_method": args.payment_method,
        "payment_provider": args.provider,
        "config": config
    }

    # Execute
    print(f"🚀 Starting Harness: integrating {args.payment_method}")
    print("=" * 50)

    try:
        results = harness.execute(inputs)

        print("=" * 50)
        print("✅ Harness run completed!")
        print("📁 Generated files:")
        for step, result in results.items():
            # Guard against steps that returned nothing
            if result and "file_path" in result:
                print(f"   - {result['file_path']}")

        print("\n📊 Statistics:")
        print(f"   - Total steps: {len(results)}")
        print(f"   - Successful steps: {sum(1 for r in results.values() if r)}")

    except RecoverableError as e:
        print(f"❌ Harness run interrupted: {e}")
        print(f"💡 You can resume from step '{e.step_id}'")
        print(f"   Command: python main.py --resume {e.step_id} ...")

if __name__ == "__main__":
    main()

6.2 Example Run

bash

# Run the Harness
$ python main.py \
    --payment-method "ApplePay" \
    --provider "Apple"

🚀 Starting Harness: integrating ApplePay
==================================================
▶️  Running step: analyze_requirements
✅ Step analyze_requirements completed
▶️  Running step: design_interface
✅ Step design_interface completed
▶️  Running step: implement_processor
✅ Step implement_processor completed
▶️  Running step: write_unit_tests
✅ Step write_unit_tests completed
▶️  Running step: write_integration_tests
✅ Step write_integration_tests completed
▶️  Running step: run_tests
✅ Step run_tests completed
   - Unit tests: 15 passed, 0 failed
   - Coverage: 94%
▶️  Running step: generate_documentation
✅ Step generate_documentation completed
▶️  Running step: final_review
✅ Step final_review completed
==================================================
✅ Harness run completed!
📁 Generated files:
   - src/payment/applepay_processor.py
   - tests/unit/test_applepay_processor.py
   - tests/integration/test_applepay_flow.py
   - docs/payment_methods/apple_pay.md

📊 Statistics:
   - Total steps: 8
   - Successful steps: 8

7. Step Six: Validation and Iteration

7.1 Human Review Checklist

yaml

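# review_checklist.yaml
final_review:
  code_quality:
    - "Does the code follow the project conventions?"
    - "Are the type hints complete?"
    - "Is error handling thorough?"
    - "Is logging appropriate?"

  security:
    - "Has sensitive information been removed?"
    - "Is input validation sufficient?"
    - "Are permission checks correct?"

  testing:
    - "Does test coverage meet the target?"
    - "Are boundary cases covered?"
    - "Are Mocks used appropriately?"

  documentation:
    - "Is the API documentation complete?"
    - "Are the usage examples correct?"
    - "Are the deployment instructions clear?"

  approval:
    auto_merge_conditions:
      - "All checks pass"
      - "Coverage > 90%"
      - "No security warnings"
    require_human_review: true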
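
The `approval` section combines automated conditions with a human gate. A sketch of how those rules might be evaluated follows. `ReviewState` and `can_merge` are hypothetical names, and treating `require_human_review: true` as "the automated conditions are necessary but not sufficient" is one interpretation of the config.

```python
from dataclasses import dataclass


@dataclass
class ReviewState:
    all_checks_passed: bool
    coverage_percent: float
    security_warnings: int
    human_approved: bool = False


def can_merge(state: ReviewState, require_human_review: bool = True,
              min_coverage: float = 90.0) -> bool:
    """Apply the auto-merge conditions, plus human sign-off when required."""
    automated_ok = (
        state.all_checks_passed
        and state.coverage_percent > min_coverage
        and state.security_warnings == 0
    )
    if require_human_review:
        return automated_ok and state.human_approved
    return automated_ok


pending = ReviewState(all_checks_passed=True, coverage_percent=94.0, security_warnings=0)
print(can_merge(pending))  # automated checks pass, but no human approval yet

pending.human_approved = True
print(can_merge(pending))
```

Keeping the gate in code makes it testable, and flipping `require_human_review` later becomes a reviewed change rather than a silent one.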

7.2 Metrics and Improvement

python

# harness/metrics.py
from datetime import datetime
from typing import Dict


class HarnessMetrics:
    """Collects Harness metrics."""

    def __init__(self):
        self.metrics = []

    def record_execution(self, execution_id: str, results: Dict):
        """Record the metrics for one execution."""
        # The _calculate_* and _count_* helpers are not shown in this post
        metric = {
            "execution_id": execution_id,
            "timestamp": datetime.now().isoformat(),
            "total_steps": len(results),
            "successful_steps": sum(1 for r in results.values() if r),
            "total_time": self._calculate_total_time(results),
            # Guard against steps that returned nothing
            "files_generated": len([r for r in results.values() if r and "file_path" in r]),
            "test_coverage": (results.get("run_tests") or {}).get("coverage", 0),
            "llm_calls": self._count_llm_calls(results),
            "token_usage": self._calculate_token_usage(results)
        }

        self.metrics.append(metric)
        return metric

    def generate_report(self) -> Dict:
        """Generate a Harness performance report."""
        if not self.metrics:
            return {}
        
        return {
            "total_executions": len(self.metrics),
            "success_rate": sum(1 for m in self.metrics if m["successful_steps"] == m["total_steps"]) / len(self.metrics),
            "avg_execution_time": sum(m["total_time"] for m in self.metrics) / len(self.metrics),
            "avg_coverage": sum(m["test_coverage"] for m in self.metrics) / len(self.metrics),
            "avg_llm_calls": sum(m["llm_calls"] for m in self.metrics) / len(self.metrics),
            "improvement_trends": self._calculate_trends()
        }
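
The report refers to a `_calculate_trends` helper without showing it. A simple option is to compare the average of the newer half of the recorded runs with the older half. The sketch below does that as a free function. `calculate_trends` and the choice of metric keys are assumptions, and a negative value for `total_time` or `llm_calls` would indicate improvement.

```python
from statistics import mean
from typing import Dict, List, Sequence


def calculate_trends(
    metrics: List[Dict],
    keys: Sequence[str] = ("total_time", "test_coverage", "llm_calls"),
) -> Dict[str, float]:
    """Compare the mean of the newer half of runs with the older half."""
    if len(metrics) < 2:
        return {}  # not enough runs to compare
    half = len(metrics) // 2
    older, newer = metrics[:half], metrics[half:]
    return {
        key: mean(m[key] for m in newer) - mean(m[key] for m in older)
        for key in keys
    }


history = [
    {"total_time": 100, "test_coverage": 90, "llm_calls": 10},
    {"total_time": 80, "test_coverage": 94, "llm_calls": 8},
]
print(calculate_trends(history))
```

The list is assumed to be in chronological order, which holds for `self.metrics` because `record_execution` only ever appends.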

8. Complete Project Structure

payment-harness/
├── harness/
│   ├── __init__.py
│   ├── payment_integration.py    # Main Harness class
│   ├── checkpoint_context.py     # Checkpoint context manager
│   ├── test_executor.py          # Test executor
│   ├── steps.py                  # Execution steps
│   ├── orchestrator.py           # Step orchestrator
│   ├── storage.py                # Storage backend
│   └── metrics.py                # Metrics collection
├── harnesses/
│   └── payment_integration.yaml  # Harness configuration
├── constraints/
│   └── constraints.yaml          # Constraint definitions
├── test_harness/
│   └── test_harness.yaml         # Test configuration
├── templates/
│   ├── processor_template.py     # Code template
│   └── test_template.py          # Test template
├── src/                          # Generated code
├── tests/                        # Generated tests
├── docs/                         # Generated documentation
├── .harness_checkpoints/         # Checkpoint storage
├── main.py                       # Main entry point
└── requirements.txt

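9. Key Lessons

9.1 Design Principles

  1. Progressive complexity: start with a simple scenario and add constraints step by step
  2. Explicit over implicit: every constraint and strategy should be configurable
  3. Recoverable failure: every step should support Checkpoints and retries
  4. Human in the loop: keep human review at key decision points

9.2 Common Pitfalls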

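| Pitfall | Symptom | Solution |
| --- | --- | --- |
| Over-engineering | The Harness is more complex than the task itself | Start simple and extend as needed |
| Insufficient constraints | The Agent generates low-quality code | Add explicit constraints and checkpoints |
| Delayed feedback | Problems surface only after the tests fail | Validate early and feed back quickly |
| Neglected maintenance | The Harness itself becomes a burden | Maintain the Harness like a product |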

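9.3 Next Steps

After finishing your first Harness, you can:

  1. Add more steps: code review, performance testing, and so on
  2. Support more scenarios: generalize the Harness to other kinds of tasks
  3. Build a Harness library: create reusable Harness templates
  4. Integrate with CI/CD: make Harness runs part of the automated pipeline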

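Conclusion: From 0 to 1, and On to 100

Building the first Harness is the hardest step. Once you have finished this payment-integration Harness, you will have the core patterns of Harness Engineering in hand: constraint design, durable execution, closed-loop testing, and human review.

Remember that a Harness is not a one-off script but a product that keeps evolving. As your understanding of the task deepens, keep refining your Harness so that it becomes smarter, more reliable, and easier to use.

Now go build your first Harness!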