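Introduction: From Theory to Practice
After the systematic walkthrough in the previous 21 posts, you should now have a solid overall understanding of Harness Engineering. It's time to get hands-on: this post takes you step by step through building a real, working Harness from scratch.
We'll use a concrete business scenario as the running example: building a Harness that "automatically adds a new payment method" to an e-commerce system. The scenario is realistic enough to cover the core elements of Harness design, yet not so complex that you get lost in the details.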
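1. Scenario Analysis: What Problem Are We Solving?
1.1 Business Background
Suppose you work on an e-commerce platform that needs to support multiple payment methods (Alipay, WeChat Pay, credit cards, and so on). Whenever the business needs to onboard a new payment method, the development workflow typically includes:
- Creating a payment processor class
- Implementing the payment interface
- Writing unit tests
- Adding integration tests
- Updating configuration files
- Writing documentation
Traditional approach: a senior engineer needs 2 to 3 days.
Harness approach: once the environment has been designed, the Agent finishes in 2 to 3 hours, followed by 30 minutes of human review.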
1.2 Task Breakdown
```yaml
task: "Add a new payment method: Apple Pay"
complexity: "medium"
expected_files:
  - "src/payment/apple_pay_processor.py"
  - "tests/unit/test_apple_pay_processor.py"
  - "tests/integration/test_apple_pay_flow.py"
  - "docs/payment_methods/apple_pay.md"
constraints:
  - "Must implement the PaymentProcessor interface"
  - "Must support testing in a sandbox environment"
  - "Must comply with the PCI DSS security standard"
  - "Code coverage must be > 90%"
```
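A task spec becomes useful once the Harness can check a finished run against it. The sketch below is a minimal illustration only. It assumes the spec has already been parsed into a dict (for example with a YAML loader) and that coverage is available as a percentage. `check_task_spec` is a hypothetical helper name, and the 90% threshold is hard-coded because the spec states that rule as prose.

```python
import os
from typing import Dict, List


def check_task_spec(spec: Dict, repo_root: str, coverage_percent: float) -> List[str]:
    """Return a list of problems; an empty list means the run satisfies the spec.

    Hypothetical helper: checks that every expected file exists and that
    coverage clears the "> 90%" rule from the task spec.
    """
    problems = []
    for rel_path in spec.get("expected_files", []):
        if not os.path.isfile(os.path.join(repo_root, rel_path)):
            problems.append(f"missing file: {rel_path}")
    # The spec expresses the coverage rule as prose, so it is hard-coded here
    min_coverage = 90.0
    if coverage_percent <= min_coverage:
        problems.append(f"coverage {coverage_percent}% is not > {min_coverage}%")
    return problems
```

The check is strict (`>` rather than `>=`) to match the wording "must be > 90%".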
2. Step One: Design the Core Structure of the Harness
2.1 Harness Configuration File
Create harnesses/payment_integration.yaml:
```yaml
# Harness definition file
harness:
  name: "payment_integration"
  version: "1.0.0"
  description: "Automated Harness for integrating new payment methods"

# Trigger conditions
triggers:
  - type: "issue_label"
    value: "new-payment-method"
  - type: "manual"

# Input parameters
inputs:
  payment_method:
    type: "string"
    description: "Name of the payment method"
    required: true
  payment_provider:
    type: "string"
    description: "Payment provider"
    required: true
  sandbox_credentials:
    type: "secret"
    description: "Sandbox environment credentials"
    required: true
```
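Before any step runs, the Harness should reject incomplete inputs. Here is a minimal sketch of validating task inputs against the `inputs` schema above. It assumes the YAML has already been parsed into a dict and treats `secret` as a string that must never be echoed back in error messages. `validate_inputs` and the choice to raise `ValueError` are illustrative, not a prescribed API.

```python
from typing import Any, Dict

# Schema type name -> Python type; "secret" values are strings that must never be echoed
_TYPE_MAP = {"string": str, "secret": str}


def validate_inputs(schema: Dict[str, Dict[str, Any]], values: Dict[str, Any]) -> None:
    """Raise ValueError listing every missing or mistyped input; return None if all is well."""
    errors = []
    for name, spec in schema.items():
        type_name = spec.get("type", "string")
        value = values.get(name)
        if value is None or value == "":
            if spec.get("required", False):
                errors.append(f"missing required input: {name}")
            continue
        if not isinstance(value, _TYPE_MAP.get(type_name, object)):
            # Report the type only, never the value: it may be a secret
            errors.append(f"input {name} must be of type {type_name}")
    if errors:
        raise ValueError("; ".join(errors))
```

Collecting every error before raising lets the Harness report all input problems at once, which is friendlier than failing on the first missing field.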
2.2 Constraint Definitions
```yaml
# constraints.yaml
constraints:
  # Architecture constraints
  architecture:
    required_interface: "PaymentProcessor"
    file_location: "src/payment/"
    naming_convention: "{payment_method}_processor.py"
  # Code quality constraints
  code_quality:
    max_complexity: 10
    max_line_length: 100
    type_hints: "required"
    docstrings: "required"
  # Security constraints
  security:
    no_hardcoded_secrets: true
    use_vault_for_credentials: true
    sanitize_all_inputs: true
  # Testing constraints
  testing:
    unit_test_coverage: 90
    integration_test_required: true
    e2e_test_required: false
```
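Constraints only matter if something enforces them. The following minimal sketch assumes the `constraints:` mapping above has been loaded into a dict. It does two things:

- It derives the expected file name from `naming_convention`. Converting a CamelCase name such as `ApplePay` to snake_case is our assumption, chosen to match the `apple_pay_processor.py` file in the task spec.
- It flags lines longer than `max_line_length`.

Complexity, type-hint, and docstring checks would normally be delegated to dedicated linters and are left out.

```python
import re
from typing import Dict, List


def to_snake_case(name: str) -> str:
    """'ApplePay' or 'Apple Pay' -> 'apple_pay'."""
    name = re.sub(r"[\s\-]+", "_", name.strip())  # spaces/hyphens -> underscores
    name = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name)  # split camel-case humps
    return name.lower()


def expected_path(constraints: Dict, payment_method: str) -> str:
    """Apply file_location + naming_convention to a payment method name."""
    arch = constraints["architecture"]
    filename = arch["naming_convention"].format(payment_method=to_snake_case(payment_method))
    return arch["file_location"] + filename


def line_length_violations(source: str, constraints: Dict) -> List[int]:
    """Return the 1-based numbers of lines that exceed max_line_length."""
    limit = constraints["code_quality"]["max_line_length"]
    return [i for i, line in enumerate(source.splitlines(), start=1) if len(line) > limit]
```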
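3. Step Two: Build the Durable Execution Layer
3.1 Checkpoint Design
```python
# harness/payment_integration.py
import hashlib
import json
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, List, Optional, Tuple

from harness.orchestrator import StepOrchestrator


class StepStatus(Enum):
    PENDING = "pending"
    IN_PROGRESS = "in_progress"
    COMPLETED = "completed"
    FAILED = "failed"


class RecoverableError(Exception):
    """A step failed; step_id records where a rerun will pick up."""

    def __init__(self, message: str, step_id: str, cause: Exception):
        super().__init__(message)
        self.step_id = step_id
        self.cause = cause


@dataclass
class Checkpoint:
    step_id: str
    status: StepStatus
    inputs: dict
    outputs: dict
    error_message: Optional[str] = None
    timestamp: str = ""

    def to_dict(self) -> dict:
        return {
            "step_id": self.step_id,
            "status": self.status.value,
            "inputs": self.inputs,
            "outputs": self.outputs,
            "error_message": self.error_message,
            "timestamp": self.timestamp,
        }

    @classmethod
    def from_dict(cls, data: dict) -> "Checkpoint":
        """Rebuild a Checkpoint from what to_dict() persisted."""
        return cls(
            step_id=data["step_id"],
            status=StepStatus(data["status"]),
            inputs=data["inputs"],
            outputs=data["outputs"],
            error_message=data.get("error_message"),
            timestamp=data.get("timestamp", ""),
        )


class PaymentIntegrationHarness:
    def __init__(self, storage_backend):
        self.storage = storage_backend
        self.checkpoints: Dict[str, Checkpoint] = {}  # in-memory cache

    def execute(self, task_inputs: dict) -> dict:
        """Main execution loop, with a checkpoint around every step."""
        execution_id = self._generate_execution_id(task_inputs)
        # The eight steps (analyze_requirements ... final_review) are HarnessStep
        # classes (Section 5); the orchestrator returns them in dependency order
        orchestrator = StepOrchestrator()
        steps: List[Tuple[str, Callable]] = [
            (step_id, orchestrator.steps[step_id].execute)
            for step_id in orchestrator.get_execution_order()
        ]
        results = {}
        for step_id, step_func in steps:
            # Skip any step that already has a COMPLETED checkpoint
            existing = self._load_checkpoint(execution_id, step_id)
            if existing and existing.status == StepStatus.COMPLETED:
                print(f"⏭️ Skipping step {step_id} (cached)")
                results[step_id] = existing.outputs
                continue
            # Run the step
            print(f"▶️ Running step: {step_id}")
            try:
                with self._checkpoint_context(execution_id, step_id, task_inputs) as checkpoint:
                    outputs = step_func(task_inputs, results)
                    checkpoint.outputs = outputs
                    checkpoint.status = StepStatus.COMPLETED
                results[step_id] = outputs
                print(f"✅ Step {step_id} completed")
            except Exception as e:
                print(f"❌ Step {step_id} failed: {e}")
                # Retry later or hand over to a human; completed steps stay cached
                raise RecoverableError(f"Step {step_id} failed", step_id, e) from e
        return results

    def _checkpoint_context(self, execution_id: str, step_id: str, inputs: dict):
        # Imported lazily: checkpoint_context.py itself imports Checkpoint from this module
        from harness.checkpoint_context import checkpoint_context
        return checkpoint_context(self, execution_id, step_id, inputs)

    def _generate_execution_id(self, inputs: dict) -> str:
        """Derive a deterministic ID: identical inputs map to the same execution."""
        content = json.dumps(inputs, sort_keys=True)
        return hashlib.md5(content.encode()).hexdigest()[:12]

    def _load_checkpoint(self, execution_id: str, step_id: str) -> Optional[Checkpoint]:
        """Load a checkpoint from the in-memory cache, falling back to storage."""
        key = f"{execution_id}:{step_id}"
        if key in self.checkpoints:
            return self.checkpoints[key]
        # Assumes the storage backend exposes load(key) -> dict | None
        data = self.storage.load(key)
        return Checkpoint.from_dict(data) if data else None

    def _save_checkpoint(self, checkpoint: Checkpoint, execution_id: str):
        """Save a checkpoint in memory and persist it to storage."""
        key = f"{execution_id}:{checkpoint.step_id}"
        self.checkpoints[key] = checkpoint
        self.storage.save(key, checkpoint.to_dict())
```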
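The Harness only assumes a storage backend with `save(key, dict)`. The entry point later imports a `FileStorage` from `harness/storage.py` that the post does not show. Below is a minimal sketch of what such a class might look like, assuming one JSON file per checkpoint. The `load(key)` method is an addition of ours that lets a rerun read checkpoints back from disk.

```python
import json
import os
from typing import Optional


class FileStorage:
    """Hypothetical JSON-file storage backend: one file per checkpoint key."""

    def __init__(self, root: str):
        self.root = root
        os.makedirs(root, exist_ok=True)

    def _path(self, key: str) -> str:
        # Keys look like "<execution_id>:<step_id>"; ':' is not portable in file names
        return os.path.join(self.root, key.replace(":", "__") + ".json")

    def save(self, key: str, data: dict) -> None:
        # Write to a temp file and rename, so a crash never leaves half-written JSON
        tmp = self._path(key) + ".tmp"
        with open(tmp, "w", encoding="utf-8") as f:
            json.dump(data, f, ensure_ascii=False, indent=2)
        os.replace(tmp, self._path(key))

    def load(self, key: str) -> Optional[dict]:
        try:
            with open(self._path(key), encoding="utf-8") as f:
                return json.load(f)
        except FileNotFoundError:
            return None
```

`os.replace` is atomic on the same filesystem, which is what makes the "write then rename" pattern safe for checkpoints.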
3.2 Context Manager Implementation
```python
# harness/checkpoint_context.py
from contextlib import contextmanager
from datetime import datetime

from harness.payment_integration import Checkpoint, StepStatus


@contextmanager
def checkpoint_context(harness, execution_id: str, step_id: str, inputs: dict):
    """Wrap one step: mark it IN_PROGRESS, then persist COMPLETED or FAILED."""
    checkpoint = Checkpoint(
        step_id=step_id,
        status=StepStatus.IN_PROGRESS,
        inputs=inputs,
        outputs={},
        timestamp=datetime.now().isoformat(),
    )
    try:
        yield checkpoint
    except Exception as e:
        # Record the failure before re-raising so it can be inspected on resume
        checkpoint.status = StepStatus.FAILED
        checkpoint.error_message = str(e)
        harness._save_checkpoint(checkpoint, execution_id)
        raise
    else:
        # The step body set outputs and COMPLETED status; persist them
        harness._save_checkpoint(checkpoint, execution_id)
```
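Persisting both outcomes is what makes a rerun cheap. The self-contained toy below reproduces the pattern without the surrounding classes. A dict stands in for storage, and the second step fails on its first attempt. When the steps are run again, the completed step is skipped and only the failed one is retried. The names `run`, `flaky_step`, and `analyze` are illustrative only.

```python
from contextlib import contextmanager
from typing import Callable, Dict, List, Tuple

store: Dict[str, dict] = {}  # stands in for the persistent storage backend
calls: List[str] = []        # records which step bodies actually ran


@contextmanager
def checkpoint(step_id: str):
    record = {"status": "in_progress", "outputs": {}}
    try:
        yield record
    except Exception as e:
        store[step_id] = {"status": "failed", "error": str(e)}  # persist the failure
        raise
    else:
        store[step_id] = record  # persist the success


def run(steps: List[Tuple[str, Callable[[], dict]]]) -> None:
    for step_id, fn in steps:
        if store.get(step_id, {}).get("status") == "completed":
            continue  # cached: do not redo finished work
        with checkpoint(step_id) as record:
            record["outputs"] = fn()
            record["status"] = "completed"


attempts = {"n": 0}


def analyze() -> dict:
    calls.append("analyze")
    return {"ok": True}


def flaky_step() -> dict:
    calls.append("implement")
    attempts["n"] += 1
    if attempts["n"] == 1:
        raise RuntimeError("LLM timeout")  # simulated transient failure
    return {"file": "src/payment/apple_pay_processor.py"}


steps = [("analyze", analyze), ("implement", flaky_step)]
try:
    run(steps)
except RuntimeError:
    pass
run(steps)  # rerun: "analyze" is skipped, "implement" is retried
print(calls)  # ['analyze', 'implement', 'implement']
```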
4. Step Three: Build a Closed-Loop Testing System
4.1 Test Harness Configuration
```yaml
# test_harness.yaml
test_harness:
  name: "payment_integration_tests"

  # Unit test configuration
  unit_tests:
    framework: "pytest"
    coverage:
      target: 90
      fail_under: 85
    auto_generate:
      enabled: true
      strategies:
        - "boundary_value_analysis"
        - "equivalence_partitioning"
        - "error_guessing"

  # Integration test configuration
  integration_tests:
    environment:
      type: "docker_compose"
      services:
        - name: "postgres"
          image: "postgres:15"
          env:
            POSTGRES_DB: "payment_test"
        - name: "redis"
          image: "redis:7"
        - name: "payment_mock_server"
          build: "./mocks/payment"
    test_cases:
      - name: "successful_payment_flow"
        steps:
          - "create_payment_intent"
          - "process_payment"
          - "verify_webhook"
          - "confirm_completion"
      - name: "failed_payment_handling"
        steps:
          - "create_payment_intent"
          - "simulate_failure"
          - "verify_error_handling"
          - "confirm_rollback"

  # Feedback loop configuration
  feedback_loop:
    on_test_failure:
      action: "auto_fix"
      max_attempts: 3
      strategies:
        - "fix_implementation"
        - "update_test_expectations"
        - "add_missing_mocks"
    on_persistent_failure:
      action: "escalate"
      notify:
        - "team_lead"
        - "harness_owner"
```
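The `feedback_loop` section describes a bounded retry. The Harness tries up to `max_attempts` automatic fixes and escalates if the tests still fail. The sketch below shows that control flow under a few assumptions:

- The test runner and the fixer are passed in as callables.
- Escalation is represented as a returned list of roles to notify; a real Harness would actually page them.
- Strategies are tried in rotation.

The strategy names come from the config. `feedback_loop` and `LoopOutcome` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Sequence


@dataclass
class LoopOutcome:
    passed: bool
    attempts: int = 0
    strategies_tried: List[str] = field(default_factory=list)
    escalate_to: List[str] = field(default_factory=list)


def feedback_loop(
    run_tests: Callable[[], bool],
    apply_fix: Callable[[str], None],
    strategies: Sequence[str],
    max_attempts: int = 3,
    notify: Sequence[str] = ("team_lead", "harness_owner"),
) -> LoopOutcome:
    """Run tests; on failure apply one fix strategy per attempt, then escalate."""
    outcome = LoopOutcome(passed=run_tests())
    for attempt in range(max_attempts):
        if outcome.passed:
            break
        strategy = strategies[attempt % len(strategies)]  # cycle through the strategies
        apply_fix(strategy)
        outcome.attempts += 1
        outcome.strategies_tried.append(strategy)
        outcome.passed = run_tests()
    if not outcome.passed:
        outcome.escalate_to = list(notify)  # on_persistent_failure: escalate
    return outcome
```

In the Harness, `apply_fix` would prompt the Agent with the failure analysis and the chosen strategy.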
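4.2 Test Executor
```python
# harness/test_executor.py
import json
import subprocess
from typing import Dict, List


class TestExecutor:
    def __init__(self, config: dict):
        self.config = config

    def run_all_tests(self, project_path: str) -> Dict:
        """Run all test suites and collect the results."""
        return {
            "unit_tests": self._run_unit_tests(project_path),
            "integration_tests": self._run_integration_tests(project_path),
            "coverage": self._collect_coverage(project_path),
        }

    def _run_unit_tests(self, project_path: str) -> Dict:
        """Run unit tests (needs the pytest-json-report and pytest-cov plugins)."""
        report_file = "/tmp/unit_test_report.json"
        cmd = [
            "pytest",
            "tests/unit/",
            "-v",
            "--cov=src",  # writes the .coverage data file read by _collect_coverage
            "--json-report",
            f"--json-report-file={report_file}",
        ]
        result = subprocess.run(cmd, cwd=project_path, capture_output=True, text=True)
        # Parse the machine-readable report instead of scraping stdout
        with open(report_file) as f:
            report = json.load(f)
        summary = report.get("summary", {})
        return {
            "passed": summary.get("passed", 0),
            "failed": summary.get("failed", 0),
            "skipped": summary.get("skipped", 0),
            "success": result.returncode == 0,
            "details": report.get("tests", []),
        }

    def _run_integration_tests(self, project_path: str) -> Dict:
        """Run integration tests against the docker-compose environment."""
        self._start_test_environment(project_path)
        try:
            # "--integration" is not a built-in pytest flag; it assumes a custom
            # option registered via pytest_addoption in the project's conftest.py
            cmd = ["pytest", "tests/integration/", "-v", "--integration"]
            result = subprocess.run(cmd, cwd=project_path, capture_output=True, text=True)
            return {
                "success": result.returncode == 0,
                "output": result.stdout,
                "errors": result.stderr if result.returncode != 0 else None,
            }
        finally:
            # Always tear the environment down, even if the tests crash
            self._stop_test_environment(project_path)

    def _start_test_environment(self, project_path: str) -> None:
        # Assumes a docker-compose file in project_path matching test_harness.yaml
        subprocess.run(["docker", "compose", "up", "-d", "--wait"], cwd=project_path, check=True)

    def _stop_test_environment(self, project_path: str) -> None:
        subprocess.run(["docker", "compose", "down", "-v"], cwd=project_path, check=False)

    def _collect_coverage(self, project_path: str) -> float:
        """Overall coverage percentage, exported from the .coverage data file."""
        report_file = "/tmp/coverage.json"
        subprocess.run(["coverage", "json", "-o", report_file], cwd=project_path, check=True)
        with open(report_file) as f:
            return json.load(f)["totals"]["percent_covered"]

    def analyze_failures(self, test_results: Dict) -> List[Dict]:
        """Analyze why tests failed."""
        failures = []
        # Unit test failures, taken from the pytest-json-report details
        for test in test_results.get("unit_tests", {}).get("details", []):
            if test.get("outcome") == "failed":
                failures.append({
                    "type": "unit_test",
                    "test_name": test.get("nodeid"),
                    "error": test.get("call", {}).get("longrepr", ""),
                    "suggested_fix": self._suggest_fix(test),
                })
        return failures

    def _suggest_fix(self, failed_test: Dict) -> str:
        """Suggest a fix based on the failure type."""
        error_msg = failed_test.get("call", {}).get("longrepr", "")
        if "AssertionError" in error_msg:
            return "Check the implementation logic or update the test expectations"
        elif "ImportError" in error_msg:
            return "Add the missing dependency or fix the import path"
        elif "Mock" in error_msg:
            return "Complete the test mock setup"
        else:
            return "Needs manual analysis"
```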
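The test configuration defines two coverage thresholds, `target: 90` and `fail_under: 85`, without saying how they interact. One plausible reading, which is our assumption, is a three-way gate:

- At or above `target`, the run passes.
- Between the two thresholds, the feedback loop should ask for more tests.
- Below `fail_under`, the run fails.

The sketch reads the JSON report that coverage.py writes with `coverage json`. That report's `totals.percent_covered` field holds the overall percentage.

```python
import json
from typing import Literal

Verdict = Literal["pass", "improve", "fail"]


def read_total_coverage(report_path: str) -> float:
    """Overall coverage from a coverage.py JSON report (`coverage json -o PATH`)."""
    with open(report_path, encoding="utf-8") as f:
        return float(json.load(f)["totals"]["percent_covered"])


def coverage_verdict(percent: float, target: float = 90, fail_under: float = 85) -> Verdict:
    if percent >= target:
        return "pass"     # meets the target
    if percent >= fail_under:
        return "improve"  # above the hard floor, but ask the Agent for more tests
    return "fail"         # below the hard floor
```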
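5. Step Four: Implement the Agent Execution Logic
5.1 Core Execution Steps
```python
# harness/steps.py
import os
import re
from abc import ABC, abstractmethod
from typing import Dict, List


def to_snake_case(name: str) -> str:
    """'ApplePay' or 'Apple Pay' -> 'apple_pay', matching the naming convention."""
    name = re.sub(r"[\s\-]+", "_", name.strip())
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", "_", name).lower()


class HarnessStep(ABC):
    """Base class for Harness steps."""

    @abstractmethod
    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        ...

    def _call_llm(self, prompt: str) -> str:
        # Shared by every step; plug your LLM client in here
        raise NotImplementedError("connect an LLM client")

    @staticmethod
    def _write(path: str, content: str) -> str:
        """Write content to path, creating parent directories as needed."""
        os.makedirs(os.path.dirname(path), exist_ok=True)
        with open(path, "w", encoding="utf-8") as f:
            f.write(content)
        return path


class AnalyzeRequirementsStep(HarnessStep):
    """Requirements analysis step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        payment_method = inputs.get("payment_method")
        provider = inputs.get("payment_provider")
        # Ask the LLM to analyze the requirements
        prompt = f"""
        Analyze the integration requirements for this payment method:
        - Payment method: {payment_method}
        - Provider: {provider}
        Please provide:
        1. Core functional requirements
        2. Required API endpoints
        3. Security considerations
        4. Test scenarios
        """
        analysis = self._call_llm(prompt)
        return {
            "requirements": analysis,
            "estimated_complexity": self._estimate_complexity(analysis),
            "required_apis": self._extract_apis(analysis),
        }

    def _estimate_complexity(self, analysis: str) -> str:
        # Placeholder: estimate complexity from the analysis
        raise NotImplementedError

    def _extract_apis(self, analysis: str) -> List[str]:
        # Placeholder: extract the required APIs from the analysis
        raise NotImplementedError


class ImplementProcessorStep(HarnessStep):
    """Processor implementation step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        requirements = previous_results.get("analyze_requirements", {})
        prompt = f"""
        Implement a payment processor based on the following requirements:
        Requirements: {requirements}
        Constraints:
        - Must implement the PaymentProcessor interface
        - Must use type hints
        - Must include complete error handling
        - Code must follow PEP 8
        Please generate complete Python code.
        """
        code = self._call_llm(prompt)
        validated_code = self._validate_code(code)
        file_path = self._write_file(validated_code, inputs)
        return {
            "file_path": file_path,
            "code": validated_code,
            "lines_of_code": len(validated_code.split("\n")),
        }

    def _validate_code(self, code: str) -> str:
        """Validate the generated code."""
        # Syntax check
        # Style check
        # Security check
        return code

    def _write_file(self, code: str, inputs: Dict) -> str:
        """Write e.g. src/payment/apple_pay_processor.py."""
        name = to_snake_case(inputs.get("payment_method", ""))
        return self._write(f"src/payment/{name}_processor.py", code)


class WriteTestsStep(HarnessStep):
    """Test-writing step."""

    def execute(self, inputs: Dict, previous_results: Dict) -> Dict:
        implementation = previous_results.get("implement_processor", {})
        # Pass the code itself, not just its path: the LLM cannot read files
        prompt = f"""
        Write comprehensive unit tests for this payment processor:
        Code file: {implementation.get('file_path')}
        Code:
        {implementation.get('code')}
        Requirements:
        1. Use the pytest framework
        2. Cover all public methods
        3. Include boundary-case tests
        4. Use mocks to isolate external dependencies
        5. Target coverage > 90%
        Please generate complete test code.
        """
        test_code = self._call_llm(prompt)
        test_file = self._write_test_file(test_code, inputs)
        return {
            "test_file": test_file,
            "test_code": test_code,
        }

    def _write_test_file(self, test_code: str, inputs: Dict) -> str:
        """Write e.g. tests/unit/test_apple_pay_processor.py."""
        name = to_snake_case(inputs.get("payment_method", ""))
        return self._write(f"tests/unit/test_{name}_processor.py", test_code)
```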
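`_validate_code` above only lists its three checks as comments. The following minimal sketch covers the first and third checks using only the standard library. It uses `ast.parse` for syntax and a regex that flags hardcoded-secret-looking assignments, backing the `no_hardcoded_secrets` constraint. The regex is a deliberately crude heuristic of our own. A real Harness would delegate style and security checks to dedicated tools such as a linter or a secret scanner.

```python
import ast
import re
from typing import List

# Crude heuristic: a credential-sounding name assigned a string literal of 8+ chars
_SECRET_PATTERN = re.compile(
    r"""(?i)\b(api_key|secret|password|token|private_key)\w*\s*=\s*["'][^"']{8,}["']"""
)


def validate_generated_code(code: str) -> List[str]:
    """Return a list of problems found in LLM-generated Python source."""
    problems = []
    try:
        ast.parse(code)  # syntax check without executing anything
    except SyntaxError as e:
        problems.append(f"syntax error at line {e.lineno}: {e.msg}")
    for lineno, line in enumerate(code.splitlines(), start=1):
        if _SECRET_PATTERN.search(line):
            problems.append(f"possible hardcoded secret at line {lineno}")
    return problems
```

Returning a list of problems rather than raising lets the Harness feed every issue back to the Agent in one retry prompt.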
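5.2 Step Orchestration
```python
# harness/orchestrator.py
from typing import Dict, List

# DesignInterfaceStep, WriteIntegrationTestsStep, RunTestsStep,
# GenerateDocumentationStep and FinalReviewStep are not shown in this post;
# they are assumed to live in harness/steps.py and follow the HarnessStep pattern
from harness.steps import (
    AnalyzeRequirementsStep,
    DesignInterfaceStep,
    FinalReviewStep,
    GenerateDocumentationStep,
    ImplementProcessorStep,
    RunTestsStep,
    WriteIntegrationTestsStep,
    WriteTestsStep,
)


class StepOrchestrator:
    """Step orchestrator."""

    def __init__(self):
        self.steps = {
            "analyze_requirements": AnalyzeRequirementsStep(),
            "design_interface": DesignInterfaceStep(),
            "implement_processor": ImplementProcessorStep(),
            "write_unit_tests": WriteTestsStep(),
            "write_integration_tests": WriteIntegrationTestsStep(),
            "run_tests": RunTestsStep(),
            "generate_documentation": GenerateDocumentationStep(),
            "final_review": FinalReviewStep(),
        }
        # step -> the steps it depends on
        self.dependencies = {
            "analyze_requirements": [],
            "design_interface": ["analyze_requirements"],
            "implement_processor": ["design_interface"],
            "write_unit_tests": ["implement_processor"],
            "write_integration_tests": ["implement_processor"],
            "run_tests": ["write_unit_tests", "write_integration_tests"],
            "generate_documentation": ["run_tests"],
            "final_review": ["generate_documentation"],
        }

    def get_execution_order(self) -> List[str]:
        """Return the steps in topologically sorted order (dependencies first)."""
        # Depth-first topological sort; assumes the dependency graph has no cycles
        visited = set()
        order: List[str] = []

        def visit(step_id: str) -> None:
            if step_id in visited:
                return
            visited.add(step_id)
            for dep in self.dependencies.get(step_id, []):
                visit(dep)
            order.append(step_id)  # appended only after all of its dependencies

        for step_id in self.steps:
            visit(step_id)
        return order

    def execute_step(self, step_id: str, inputs: Dict, results: Dict) -> Dict:
        """Execute a single step."""
        return self.steps[step_id].execute(inputs, results)
```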
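The hand-written DFS above assumes the dependency graph is acyclic. If a typo made two steps depend on each other, it would silently return an order that runs a step before its dependency. Python 3.9+ ships `graphlib.TopologicalSorter`, which raises `CycleError` instead. It can also report which steps are ready at the same time; here that is the two test-writing steps, which could run in parallel. A sketch using the same dependency map:

```python
from graphlib import CycleError, TopologicalSorter
from typing import Dict, List

dependencies: Dict[str, List[str]] = {
    "analyze_requirements": [],
    "design_interface": ["analyze_requirements"],
    "implement_processor": ["design_interface"],
    "write_unit_tests": ["implement_processor"],
    "write_integration_tests": ["implement_processor"],
    "run_tests": ["write_unit_tests", "write_integration_tests"],
    "generate_documentation": ["run_tests"],
    "final_review": ["generate_documentation"],
}


def execution_batches(deps: Dict[str, List[str]]) -> List[List[str]]:
    """Group steps into batches; steps in the same batch do not depend on each other."""
    sorter = TopologicalSorter(deps)  # maps each node to its predecessors
    sorter.prepare()  # raises CycleError if the graph contains a cycle
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted for a deterministic order
        batches.append(ready)
        sorter.done(*ready)
    return batches
```

For this graph the fourth batch is `["write_integration_tests", "write_unit_tests"]`; every other batch holds a single step.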
6. Step Five: Integrate and Run
6.1 Main Entry Point
```python
#!/usr/bin/env python3
# main.py
import argparse

import yaml

from harness.payment_integration import PaymentIntegrationHarness, RecoverableError
from harness.storage import FileStorage


def main():
    parser = argparse.ArgumentParser(description="Payment Integration Harness")
    parser.add_argument("--payment-method", required=True, help="Name of the payment method")
    parser.add_argument("--provider", required=True, help="Payment provider")
    parser.add_argument("--config", default="harnesses/payment_integration.yaml",
                        help="Harness configuration file")
    args = parser.parse_args()

    # Load the configuration
    with open(args.config) as f:
        config = yaml.safe_load(f)

    # Initialize storage and the Harness
    storage = FileStorage("./.harness_checkpoints")
    harness = PaymentIntegrationHarness(storage)

    # Prepare the inputs; the execution ID is a hash of them, so rerunning
    # with identical arguments picks up the existing checkpoints
    inputs = {
        "payment_method": args.payment_method,
        "payment_provider": args.provider,
        "config": config,
    }

    print(f"🚀 Starting Harness: integrating {args.payment_method}")
    print("=" * 50)
    try:
        results = harness.execute(inputs)
        print("=" * 50)
        print("✅ Harness execution completed!")
        print("📁 Generated files:")
        for result in results.values():
            for key in ("file_path", "test_file"):
                if result and key in result:
                    print(f"   - {result[key]}")
        print("\n📊 Statistics:")
        print(f"   - Total steps: {len(results)}")
        print(f"   - Successful steps: {sum(1 for r in results.values() if r)}")
    except RecoverableError as e:
        print(f"❌ Harness execution interrupted: {e}")
        print(f"💡 Step '{e.step_id}' failed; completed steps are checkpointed.")
        print("   Rerun the same command to resume from that step.")


if __name__ == "__main__":
    main()
```
6.2 Example Run
```bash
# Run the Harness
$ python main.py \
    --payment-method "ApplePay" \
    --provider "Apple"

🚀 Starting Harness: integrating ApplePay
==================================================
▶️ Running step: analyze_requirements
✅ Step analyze_requirements completed
▶️ Running step: design_interface
✅ Step design_interface completed
▶️ Running step: implement_processor
✅ Step implement_processor completed
▶️ Running step: write_unit_tests
✅ Step write_unit_tests completed
▶️ Running step: write_integration_tests
✅ Step write_integration_tests completed
▶️ Running step: run_tests
✅ Step run_tests completed
   - Unit tests: 15 passed, 0 failed
   - Coverage: 94%
▶️ Running step: generate_documentation
✅ Step generate_documentation completed
▶️ Running step: final_review
✅ Step final_review completed
==================================================
✅ Harness execution completed!
📁 Generated files:
   - src/payment/apple_pay_processor.py
   - tests/unit/test_apple_pay_processor.py
   - tests/integration/test_apple_pay_flow.py
   - docs/payment_methods/apple_pay.md

📊 Statistics:
   - Total steps: 8
   - Successful steps: 8
```
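7. Step Six: Verify and Iterate
7.1 Human Review Checklist
```yaml
# review_checklist.yaml
final_review:
  code_quality:
    - "Does the code follow project conventions?"
    - "Are type hints complete?"
    - "Is error handling thorough?"
    - "Is logging appropriate?"
  security:
    - "Has all sensitive information been removed?"
    - "Is input validation sufficient?"
    - "Are permission checks correct?"
  testing:
    - "Does test coverage meet the target?"
    - "Are edge cases covered?"
    - "Are mocks used appropriately?"
  documentation:
    - "Is the API documentation complete?"
    - "Are the usage examples correct?"
    - "Are the deployment instructions clear?"

approval:
  auto_merge_conditions:
    - "All checks pass"
    - "Coverage > 90%"
    - "No security warnings"
  require_human_review: true
```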
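The `approval` block mixes machine-checkable conditions with a human gate. The minimal sketch below evaluates it, assuming upstream steps have produced a small summary. `ReviewInputs`, its fields, and `merge_decision` are hypothetical names. Because `require_human_review` is true, even a clean run only becomes eligible for merge and still waits for a reviewer.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ReviewInputs:
    checks_passed: bool     # all automated checks green
    coverage: float         # percent
    security_warnings: int  # count reported by the security scan


def merge_decision(r: ReviewInputs, require_human_review: bool = True) -> str:
    """Evaluate auto_merge_conditions, then apply the human-review gate."""
    unmet: List[str] = []
    if not r.checks_passed:
        unmet.append("checks failing")
    if not r.coverage > 90:
        unmet.append(f"coverage {r.coverage}% <= 90%")
    if r.security_warnings:
        unmet.append(f"{r.security_warnings} security warning(s)")
    if unmet:
        return "blocked: " + ", ".join(unmet)
    return "awaiting human review" if require_human_review else "auto-merge"
```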
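7.2 Metrics and Improvement
```python
# harness/metrics.py
from datetime import datetime
from typing import Dict, List


class HarnessMetrics:
    """Collects per-execution metrics for a Harness."""

    def __init__(self):
        self.metrics: List[Dict] = []

    def record_execution(self, execution_id: str, results: Dict) -> Dict:
        """Record the metrics of one execution."""
        outputs = [r for r in results.values() if isinstance(r, dict)]
        metric = {
            "execution_id": execution_id,
            "timestamp": datetime.now().isoformat(),
            "total_steps": len(results),
            "successful_steps": sum(1 for r in results.values() if r),
            # Assumes steps report "duration_seconds", "llm_calls" and
            # "token_usage" in their outputs; steps that don't count as 0
            "total_time": self._sum_field(outputs, "duration_seconds"),
            "files_generated": sum(1 for r in outputs if "file_path" in r),
            "test_coverage": results.get("run_tests", {}).get("coverage", 0),
            "llm_calls": self._sum_field(outputs, "llm_calls"),
            "token_usage": self._sum_field(outputs, "token_usage"),
        }
        self.metrics.append(metric)
        return metric

    @staticmethod
    def _sum_field(outputs: List[Dict], key: str) -> float:
        return sum(r.get(key, 0) for r in outputs)

    def generate_report(self) -> Dict:
        """Generate a Harness performance report."""
        if not self.metrics:
            return {}
        n = len(self.metrics)

        def avg(key: str) -> float:
            return sum(m[key] for m in self.metrics) / n

        return {
            "total_executions": n,
            "success_rate": sum(1 for m in self.metrics if m["successful_steps"] == m["total_steps"]) / n,
            "avg_execution_time": avg("total_time"),
            "avg_coverage": avg("test_coverage"),
            "avg_llm_calls": avg("llm_calls"),
            "improvement_trends": self._calculate_trends(),
        }

    def _calculate_trends(self) -> Dict:
        """Change between the first and the latest execution (a deliberately simple trend)."""
        if len(self.metrics) < 2:
            return {}
        first, last = self.metrics[0], self.metrics[-1]
        return {key: last[key] - first[key] for key in ("total_time", "test_coverage", "llm_calls")}
```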
8. Complete Project Structure
```
payment-harness/
├── harness/
│   ├── __init__.py
│   ├── payment_integration.py     # Main Harness class
│   ├── checkpoint_context.py      # Checkpoint context manager
│   ├── test_executor.py           # Test executor
│   ├── steps.py                   # Execution steps
│   ├── orchestrator.py            # Step orchestrator
│   ├── storage.py                 # Storage backend
│   └── metrics.py                 # Metrics collection
├── harnesses/
│   └── payment_integration.yaml   # Harness configuration
├── constraints/
│   └── constraints.yaml           # Constraint definitions
├── test_harness/
│   └── test_harness.yaml          # Test configuration
├── templates/
│   ├── processor_template.py      # Code template
│   └── test_template.py           # Test template
├── src/                           # Generated code
├── tests/                         # Generated tests
├── docs/                          # Generated documentation
├── .harness_checkpoints/          # Checkpoint storage
├── main.py                        # Main entry point
└── requirements.txt
```
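9. Key Takeaways
9.1 Design Principles
- Progressive complexity: start with a simple scenario and add constraints step by step
- Explicit over implicit: every constraint and strategy should be configurable
- Recoverable failures: every step should support checkpoints and retries
- Human in the loop: keep human review at key decision points
9.2 Common Pitfalls

| Pitfall | Symptom | Remedy |
|---|---|---|
| Over-engineering | The Harness is more complex than the task itself | Start simple and extend as needed |
| Insufficient constraints | The Agent produces low-quality code | Add explicit constraints and checkpoints |
| Delayed feedback | Problems surface only after tests fail | Validate early and get feedback fast |
| Neglected maintenance | The Harness itself becomes a burden | Maintain the Harness as a product |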
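9.3 Next Steps
After completing your first Harness, you can:
- Add more steps: code review, performance testing, and so on
- Support more scenarios: generalize the Harness to other kinds of tasks
- Build a Harness library: create reusable Harness templates
- Integrate with CI/CD: make Harness execution part of your automated pipeline
Conclusion: From 0 to 1, Then to 100
Building your first Harness is the hardest step. Once you have completed this payment-integration Harness, you have mastered the core patterns of Harness Engineering: constraint design, durable execution, closed-loop testing, and human review.
Remember, a Harness is not a one-off script but a product that keeps evolving. As your understanding of the task deepens, keep refining your Harness so that it becomes smarter, more reliable, and easier to use.
Now go build your first Harness!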