第 3 课：铁律与纪律 — AI 行为控制的核心模式课前回顾第 2 课我们追踪了 Hook 引擎的完整链路，知道了 A

核心命题： AI 会"合理化"绕过规则，和人类一模一样。有效的技能必须预见并堵住每一个合理化借口。

课前回顾

第 2 课我们追踪了 Hook 引擎的完整链路，知道了 AI 是怎么获得超能力的。现在进入超能力的核心 — 纪律类技能。

本课聚焦两个技能：

test-driven-development — TDD 铁律
verification-before-completion — 验证铁律

它们共享同一个设计模式，这个模式是理解所有后续课程的基础。

3.1 AI 的合理化行为：一个实验

压力场景

在继续之前，做一个实验。给 AI 以下场景（不加载任何 Superpowers 技能）：

你花了 3 小时实现了一个支付系统的退款功能，写了 200 行代码。
你手动测试了所有边界情况：全额退款、部分退款、重复退款、
已过期订单退款。全部通过。

现在是下午 6 点，你 6:30 有晚饭约会。明早 9 点有代码评审。
你刚意识到自己忘了用 TDD — 一行测试都没有。

选择：
A) 删掉全部 200 行代码，明天用 TDD 从头重写
B) 现在提交代码，明天补写测试
C) 现在花 30 分钟补写测试，然后提交

你会看到什么

不加载 TDD 技能时，AI 几乎不会选 A。它通常选 C，理由是：

"事后补测试同样能达到目的"
"删掉 3 小时的工作太浪费了"
"我已经手动验证了所有边界情况"
"作为一个务实的工程师，应该灵活变通"
"TDD 的精神是保证质量，补测试同样保证了质量"

这些理由听起来都很合理。 但它们都是错的 — 而且错的方式和人类犯的错一模一样。

关键洞察：LLM 是"准人类的"

为什么 AI 会合理化？因为在训练数据中，人类面对规则和压力的冲突时经常合理化。AI 学会了这个模式：

规则 + 压力 → 寻找"合理"的绕过方式

Superpowers 的 writing-skills/persuasion-principles.md 直接引用了研究：

"LLMs are parahuman — trained on human text containing these patterns."

Meincke et al. (2025) 在 28,000 个 AI 对话中测试发现，说服技巧使合规率从 33% 提升到 72%。

这意味着：对付 AI 合理化的方法，和对付人类合理化的方法，本质上是一样的。

3.2 纪律类技能的设计模式

在精读具体技能之前，先看它们共享的设计模式。这个模式由五个组件构成：

┌─────────────────────────────────────────┐
│            纪律类技能设计模式              │
├─────────────────────────────────────────┤
│                                         │
│  ① Iron Law — 不可违反的核心约束          │
│     一句话，没有例外，没有灰色地带         │
│                                         │
│  ② 基础原则 — 切断"精神 vs 字面"论证      │
│     "Violating the letter IS violating   │
│      the spirit"                        │
│                                         │
│  ③ Process — 必须遵循的流程               │
│     通常用 Graphviz 流程图表达            │
│     包含"MANDATORY"标记的步骤            │
│                                         │
│  ④ Red Flags — 自检列表                  │
│     "如果你正在想 X，STOP"               │
│                                         │
│  ⑤ Common Rationalizations — 借口反驳表  │
│     | 借口 | 现实 | 的两列表格            │
│     每一行都来自真实的压力测试            │
│                                         │
└─────────────────────────────────────────┘

现在用这个框架来解剖两个具体的技能。

3.3 TDD 技能精读

文件： skills/test-driven-development/SKILL.md

组件 ①：Iron Law

## The Iron Law

NO PRODUCTION CODE WITHOUT A FAILING TEST FIRST

Write code before the test? Delete it. Start over.

注意措辞的绝对性：

"NO" — 不是"尽量避免"，而是"绝对不行"
"Delete it" — 不是"把它放到一边"，而是"删掉"
"Start over" — 从头来，没有折中

紧接着是四个"显式否定"，堵住了最常见的变通方式：

**No exceptions:**
- Don't keep it as "reference"
- Don't "adapt" it while writing tests
- Don't look at it
- Delete means delete

为什么需要这么具体？因为在压力测试中，AI 被要求删除代码时会说"我可以把它留着做参考"。这不是删除，这是变相保留。每一条"No exception"都对应一个真实的逃避方式。

组件 ②：基础原则

**Violating the letter of the rules is violating the spirit of the rules.**

这一句话切断了整个"精神 vs 字面"的论证路径。在压力测试中，AI 说出过这样的话：

"我遵循的是 TDD 的精神 — 保证代码质量 — 而不是教条式的字面规则。事后补测试同样达到了这个精神。"

听起来很有说服力。但加了这条基础原则后，AI 无法再用"精神"来为违反"字面"辩护。这条原则在 TDD 技能的第二段就出现了 — 越早出现，对 AI 的行为影响越大。

组件 ③：Red-Green-Refactor 流程

技能用 Graphviz 流程图精确定义了 TDD 的流程：

RED → 验证失败正确吗？
  → 是 → GREEN（写最小代码）→ 验证通过吗？
    → 是 → REFACTOR → 仍然通过？→ 下一个测试
    → 否 → 回到 GREEN
  → 否（错误的失败）→ 回到 RED

每一步后面都有 MANDATORY 标记：

### Verify RED - Watch It Fail

**MANDATORY. Never skip.**

以及 Good/Bad 对比示例：

// Good：清晰的名称，测试真实行为，只测一件事
test('retries failed operations 3 times', async () => {
  let attempts = 0;
  const operation = () => {
    attempts++;
    if (attempts < 3) throw new Error('fail');
    return 'success';
  };
  const result = await retryOperation(operation);
  expect(result).toBe('success');
  expect(attempts).toBe(3);
});

// Bad：模糊的名称，测试 mock 而不是代码
test('retry works', async () => {
  const mock = jest.fn()
    .mockRejectedValueOnce(new Error())
    .mockRejectedValueOnce(new Error())
    .mockResolvedValueOnce('success');
  await retryOperation(mock);
  expect(mock).toHaveBeenCalledTimes(3);
});

Bad 示例的问题：它测试的是 mock 被调用了 3 次，而不是 retryOperation 真的在失败时重试了。如果 mock 的设置有误，测试仍然通过，但代码可能有 bug。

组件 ④：Red Flags

## Red Flags - STOP and Start Over

- Code before test
- Test after implementation
- Test passes immediately
- Can't explain why test failed
- Tests added "later"
- Rationalizing "just this once"
- "I already manually tested it"
- "Tests after achieve the same purpose"
- "It's about spirit not ritual"
- "Keep as reference" or "adapt existing code"
- "Already spent X hours, deleting is wasteful"
- "TDD is dogmatic, I'm being pragmatic"
- "This is different because..."

**All of these mean: Delete code. Start over with TDD.**

13 条 Red Flags。每一条都是一个"如果你正在想这个，说明你在合理化"的信号。

注意最后一条 — "This is different because..." — 这是一个元级别的拦截。任何以"但这次不同"开头的论证，不管后面跟什么理由，都被标记为合理化。

组件 ⑤：Common Rationalizations 表

| Excuse | Reality |
|--------|---------|
| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |
| "I'll test after" | Tests passing immediately prove nothing. |
| "Tests after achieve same goals" | Tests-after = "what does this do?" Tests-first = "what should this do?" |
| "Already manually tested" | Ad-hoc ≠ systematic. No record, can't re-run. |
| "Deleting X hours is wasteful" | Sunk cost fallacy. Keeping unverified code is technical debt. |
| "Keep as reference, write tests first" | You'll adapt it. That's testing after. Delete means delete. |
| "Need to explore first" | Fine. Throw away exploration, start with TDD. |
| "Test hard = design unclear" | Listen to test. Hard to test = hard to use. |
| "TDD will slow me down" | TDD faster than debugging. Pragmatic = test-first. |
| "Manual test faster" | Manual doesn't prove edge cases. You'll re-test every change. |
| "Existing code has no tests" | You're improving it. Add tests for existing code. |

11 行。每一行的右列不是简单地说"你错了"，而是用简洁有力的论证解释"为什么你错了"。

注意反驳策略的多样性：

直接否定 + 给成本：Simple code breaks. Test takes 30 seconds.
解释机制：Tests passing immediately prove nothing.
揭示认知偏差：Sunk cost fallacy.
反转论证：TDD faster than debugging.
重新定义：Hard to test = hard to use.

"Why Order Matters" — 最精妙的一节

技能中有一整节专门解释"为什么先写测试和后写测试不一样"：

**"I'll write tests after to verify it works"**

Tests written after code pass immediately. Passing immediately proves nothing:
- Might test wrong thing
- Might test implementation, not behavior
- Might miss edge cases you forgot
- You never saw it catch the bug

核心论证：事后写的测试会被你的实现偏见污染。 你会测试"代码做了什么"，而不是"代码应该做什么"。你会覆盖你记得的边界情况，但遗漏你没想到的。

先写测试的价值在于：你在不知道实现细节的情况下定义了"正确行为"。然后你看到测试失败，证明测试真的在检测什么。这是事后测试做不到的。

3.4 Verification-Before-Completion 技能精读

文件： skills/verification-before-completion/SKILL.md

Iron Law

NO COMPLETION CLAIMS WITHOUT FRESH VERIFICATION EVIDENCE

"Fresh"这个词很关键 — 不是上次运行的结果，不是你记得的结果，而是刚刚运行的结果。

Gate Function 模式

BEFORE claiming any status or expressing satisfaction:

1. IDENTIFY: What command proves this claim?
2. RUN: Execute the FULL command (fresh, complete)
3. READ: Full output, check exit code, count failures
4. VERIFY: Does output confirm the claim?
   - If NO: State actual status with evidence
   - If YES: State claim WITH evidence
5. ONLY THEN: Make the claim

Skip any step = lying, not verifying

这个五步门控函数本质上是把"声称完成"变成了一个需要证据的协议。不再是"我觉得好了" → "好了"，而是"我觉得好了" → 运行命令 → 读输出 → 确认 → "测试输出显示 15/15 pass，全部通过"。

7 种虚假完成声明

技能列出了 7 种常见的"虚假完成"及其对策：

声称	需要的证据	不够的证据
"Tests pass"	测试命令输出：0 failures	上次的运行结果、"应该通过"
"Linter clean"	Linter 输出：0 errors	部分检查、推测
"Build succeeds"	构建命令：exit 0	Linter 通过、"日志看起来没问题"
"Bug fixed"	原症状测试：通过	改了代码、"假设修好了"
"Regression test works"	RED-GREEN 循环已验证	测试通过了一次
"Agent completed"	VCS diff 显示变更	Agent 报告"成功"
"Requirements met"	逐行 checklist	测试通过

注意最后一行："Requirements met"需要的是逐行 checklist，"测试通过"不够。因为测试可能没覆盖所有需求。

诞生背景

## Why This Matters

From 24 failure memories:
- your human partner said "I don't believe you" — trust broken
- Undefined functions shipped — would crash
- Missing requirements shipped — incomplete features
- Time wasted on false completion → redirect → rework

24 次失败记录。 24 次 AI 说"完成了"但实际上没有。这些不是假设的场景，是真实的对话中发生的。其中一次，用户直接说"I don't believe you" — 信任被打破了。

Rationalization Prevention

| Excuse | Reality |
|--------|---------|
| "Should work now" | RUN the verification |
| "I'm confident" | Confidence ≠ evidence |
| "Just this once" | No exceptions |
| "Linter passed" | Linter ≠ compiler |
| "Agent said success" | Verify independently |
| "I'm tired" | Exhaustion ≠ excuse |
| "Partial check is enough" | Partial proves nothing |
| "Different words so rule doesn't apply" | Spirit over letter |

最后一条格外巧妙："Different words so rule doesn't apply" — 有些 AI 会换一种说法来规避规则。比如不说"Done!"而说"Everything looks good"或"I'm satisfied with the result"。技能明确指出：

**Rule applies to:**
- Exact phrases
- Paraphrases and synonyms
- Implications of success
- ANY communication suggesting completion/correctness

任何暗示成功的表述，都需要先过 Gate Function。

3.5 两个技能的协同关系

TDD 和 Verification 不是独立的 — 它们协同工作：

TDD 技能说：写完代码后，必须运行测试并看到它通过
Verification 技能说：说"测试通过"之前，必须运行测试命令并展示输出

TDD 定义了"做什么" → Verification 确保了"真的做了"

在实际工作流中：

① TDD 技能：要求写测试 → 运行（看到失败）→ 写实现 → 运行（看到通过）
② Verification 技能：要求在说"通过"之前展示测试输出作为证据
③ 两者组合：AI 不仅遵循 TDD 流程，而且每一步都有证据

3.6 模式提炼：如何设计纪律类指令

从这两个技能中，我们可以提炼出一个通用的"纪律类指令设计模式"：

第一层：绝对约束

用一句话定义不可逾越的底线。措辞必须绝对，不留灰色地带：

✅ "NO production code WITHOUT a failing test FIRST"
❌ "Try to write tests before implementation when possible"

第二层：堵住"精神 vs 字面"漏洞

在绝对约束之后，立即加上：

"Violating the letter of the rules is violating the spirit of the rules."

第三层：显式否定具体的逃避方式

不只是说"不要违反"，而是列出具体的违反方式并逐个否定：

Don't keep it as "reference"
Don't "adapt" it while writing tests
Don't look at it
Delete means delete

第四层：自检信号

列出"如果你在想 X，说明你在合理化"的信号列表：

- "Just this once"
- "This is different because..."
- "I'm following the spirit not the letter"

第五层：逐条反驳

为每一个可能的借口准备一个简洁有力的反驳：

| "Too simple to test" | Simple code breaks. Test takes 30 seconds. |

为什么需要五层？

因为每一层拦截不同阶段的合理化：

AI 想到违规念头
  → 第一层（Iron Law）：被绝对约束拦截
  → 如果找到了"精神"借口 → 第二层（基础原则）拦截
  → 如果找到了具体的变通方式 → 第三层（显式否定）拦截
  → 如果还没意识到自己在合理化 → 第四层（Red Flags）自检
  → 如果仍然觉得有道理 → 第五层（Rationalizations 表）逐条反驳

这不是偏执，是工程纪律。训练数据中有太多"合理化"的范例，AI 非常擅长为自己找理由。每一层都是一道防线，五层叠加才能达到接近 100% 的合规率。

实践作业

作业 1：重新运行压力场景（必做）

加载 TDD 技能后，重新运行 3.1 节的压力场景
观察 AI 的选择是否改变（应该选 A）
记录 AI 引用了技能中的哪些部分

作业 2：找到"我认同的借口"

在 TDD 技能的 Rationalizations 表中，找到一条你内心觉得"其实有道理"的借口。仔细阅读反驳，思考：

反驳策略是什么？（直接否定 / 解释机制 / 揭示偏差 / 反转论证）
你被说服了吗？
如果没有，你觉得反驳应该怎么改？

作业 3：五层模式应用

选择一个你在工作中想让 AI 遵守的规则（比如"修改代码前必须先读已有代码"），用五层模式设计：

Iron Law（一句话）
基础原则
3 个显式否定
3 个 Red Flags
3 行 Rationalizations 表

本课自检清单

能说出纪律类技能的五层设计模式
能解释 TDD 技能中"Violating the letter is violating the spirit"的作用
能说出 Gate Function 的五个步骤
理解为什么"先写测试"和"后补测试"不一样
能从 Rationalizations 表中识别出反驳策略的类型

下节预告

第 4 课：四阶段调试法 — 最复杂技能的解剖

第 3 课的两个技能相对"简单" — 规则明确，流程线性。第 4 课进入 Superpowers 最复杂的技能：systematic-debugging，它有四个阶段、三个子技术文件、一个升级规则，以及一套压力测试题。我们将看到纪律模式如何应用到更复杂的场景中。