Lesson 3: TodoWrite: Make the Agent Think Before It Acts (the Planning System)

Series Introduction

This is Lesson 3 of the series **"Dissecting the Claude Code Architecture in 12 Lessons"**.

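The first two lessons laid the foundation: Lesson 1 built the Agent Loop, and Lesson 2 extended it to multiple tools with a dispatch map. Starting with this lesson we enter phase two: planning and knowledge.

The motto of Lesson 3:

"An agent without a plan goes wherever its feet take it."

You will find that no matter how many tools it has, an Agent without a planning system still crashes on complex tasks. This lesson gives the Agent a "prefrontal cortex": think first, then act.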

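How Badly an Agent Without a Plan Fails

Give your Agent a task: "Add user authentication to this project, including registration, login, password reset, and JWT tokens."

What does an Agent without a planning system do?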

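Round 1: created the user model                 ✓
Round 2: wrote the registration endpoint        ✓
Round 3: tool returned a long error dump...
Round 4: fixed the bug                          ✓
Round 5: wrote the registration endpoint again  ← duplicate!
Round 6: started the login endpoint             ✓
Round 7: jumped to writing unit tests           ← what about password reset?
Round 8: "All features are complete"            ← password reset was never done!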

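Three typical failure modes:

Repetition: redoing finished work, because the context has grown so long that the model forgot it was already done
Skipped steps: passing over required work, because an error in the middle pulled the model's attention away
Drift: moving to task B before task A is finished, because there is no explicit progress tracking

Why does it get worse as the conversation grows? Because the result of every tool call fills the context window. Once the message list swells to tens of thousands of tokens, the system prompt's influence is diluted and the model's "attention" is drowned in intermediate results.

This is not a dumb model; it is an architectural flaw. The model has no external "progress bar" telling it where it is.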

[Figure: an Agent without planning: repetition, skipped steps, drift]

TodoManager Architecture

The solution is surprisingly simple: give the Agent a todo list.

+----------+     +-------------+     +---------+
|   User   | --> |     LLM     | --> |  Tools  |
|  prompt  |     |             |     | execute |
+----------+     +------+------+     +----+----+
                        ^                 |
                        |  tool_result    |
                        +-----------------+
                        |
                 +------+------+
                 | TodoManager |
                 |-------------|
                 | ☐ Register  |
                 | ▶ Login     |  ← only one in_progress at a time
                 | ☐ Pwd reset |
                 | ✓ User model|
                 +-------------+

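TodoManager does three things:

Maintains a list of items with status: each item is pending, in_progress, or completed
Registers itself as a tool in the dispatch map: the model updates progress through the todo tool
Forces the model to review progress with a nag reminder: after 3 consecutive rounds without a todo update, a reminder is injected

The control flow of the Agent Loop does not change. TodoManager is layered on top of the loop.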

The Three Core Mechanisms, Taken Apart

Mechanism 1: a list of items with status

class TodoManager:
    def __init__(self):
        self.todos: list[dict] = []

    def update(self, items: list[dict]):
        """
        Update the todo list.
        Core constraint: only one item may be in_progress at a time.
        """
        in_progress = [i for i in items if i["status"] == "in_progress"]
        if len(in_progress) > 1:
            return "Error: Only one item can be in_progress at a time."
        self.todos = items
        return self.render()

    def render(self) -> str:
        """Render the todo list so the model sees overall progress after every call."""
        if not self.todos:
            return "(no todos)"
        lines = []
        icons = {"pending": "☐", "in_progress": "▶", "completed": "✓"}
        for item in self.todos:
            icon = icons.get(item["status"], "?")
            lines.append(f"  {icon} {item['content']}")
        return "\n".join(lines)

The status flow is simple: pending → in_progress → completed. But the constraint that **only one item may be in_progress at a time** is the heart of the design; a later section analyzes why.
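To see the constraint in action, here is a minimal usage sketch. It repeats a compact copy of the class so that it runs on its own; the item contents are made-up examples, not taken from the course repository.

```python
class TodoManager:
    """Compact copy of the class above, repeated so this snippet is self-contained."""

    def __init__(self):
        self.todos = []

    def update(self, items):
        in_progress = [i for i in items if i["status"] == "in_progress"]
        if len(in_progress) > 1:
            return "Error: Only one item can be in_progress at a time."
        self.todos = items
        return self.render()

    def render(self):
        if not self.todos:
            return "(no todos)"
        icons = {"pending": "☐", "in_progress": "▶", "completed": "✓"}
        return "\n".join(
            f"  {icons.get(i['status'], '?')} {i['content']}" for i in self.todos
        )


manager = TodoManager()

# A valid plan: exactly one item is in progress, so it is stored and rendered.
ok = manager.update([
    {"content": "User model", "status": "completed"},
    {"content": "Login endpoint", "status": "in_progress"},
    {"content": "Password reset", "status": "pending"},
])

# An invalid plan: two items in progress. The update is rejected and the
# previously stored list is left untouched.
rejected = manager.update([
    {"content": "Login endpoint", "status": "in_progress"},
    {"content": "Password reset", "status": "in_progress"},
])

print(ok)
print(rejected)
```

Because a rejected update returns an error string instead of raising, the model receives the message as an ordinary tool result and can correct its next call.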

Mechanism 2: registering the todo tool in the dispatch map

Recall the dispatch map from Lesson 2. Adding a tool only takes a registration:

# Tool definition
TODO_TOOL = {
    "name": "todo",
    "description": "Update the task plan. Each item has 'content' and 'status' (pending/in_progress/completed). Use this to track progress on multi-step tasks. Only ONE item should be in_progress at a time.",
    "input_schema": {
        "type": "object",
        "properties": {
            "items": {
                "type": "array",
                "items": {
                    "type": "object",
                    "properties": {
                        "content": {"type": "string"},
                        "status": {
                            "type": "string",
                            "enum": ["pending", "in_progress", "completed"],
                        },
                    },
                    "required": ["content", "status"],
                },
            }
        },
        "required": ["items"],
    },
}

# Register in the dispatch map
todo_manager = TodoManager()

TOOL_HANDLERS = {
    "bash": run_bash,
    "read": read_file,
    "write": write_file,
    "edit": edit_file,
    "todo": lambda items: todo_manager.update(items),  # +1 line
}

From 4 tools to 5. Not one line of the dispatch code changes.
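The dispatch step itself can be sketched as a lookup by name followed by a keyword-argument call. The `dispatch` helper, the `echo` tool, and the unknown-tool error message below are illustrative assumptions; the lesson's loop indexes `TOOL_HANDLERS` directly.

```python
def dispatch(handlers, name, tool_input):
    """Look up a handler by tool name and call it with the model's arguments."""
    handler = handlers.get(name)
    if handler is None:
        # Assumption: return an error string so the model can see the mistake
        # and retry, instead of crashing the loop with a KeyError.
        return f"Error: Unknown tool '{name}'"
    return handler(**tool_input)


# Hypothetical stand-in handlers, for illustration only.
plan = []


def set_todos(items):
    plan[:] = items
    return f"{len(items)} item(s) stored"


HANDLERS = {
    "echo": lambda text: text,
    "todo": set_todos,  # adding a tool is one more dictionary entry
}

print(dispatch(HANDLERS, "todo", {"items": [{"content": "A", "status": "pending"}]}))
print(dispatch(HANDLERS, "missing", {}))
```

The keys of `tool_input` must match the handler's parameter names, which is why the todo schema's single property is called `items`.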

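Mechanism 3: Nag Reminder, forcing the model to review progress

This is the most underrated mechanism. During intensive tool use, the model "forgets" that it has a todo list to keep up to date. The fix is crude but effective: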

def maybe_inject_reminder(messages: list, rounds_since_todo: int):
    """
    If the model has gone 3 consecutive rounds without calling the todo tool, inject a reminder.
    Injection point: the start of the last message's content.
    """
    if rounds_since_todo >= 3 and len(messages) > 0:
        last = messages[-1]
        if isinstance(last.get("content"), list):
            last["content"].insert(0, {
                "type": "text",
                "text": "<reminder>Update your todos to track progress.</reminder>",
            })
        rounds_since_todo = 0
    return rounds_since_todo

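Why inject instead of putting this in the system prompt? Because the system prompt is static and its influence decays over a long conversation. Injecting into the last message places the reminder where the model's "attention" is most concentrated.

This pattern is used widely in the Claude Code source, and not only for todos: much behavior correction relies on injecting reminders at runtime.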
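The counter logic can be exercised without any API call. The sketch below is an assumption-laden variation, not the course's code: `next_round`, `make_result`, and `NAG_AFTER` are hypothetical names, and unlike the function above it resets the counter only when a reminder was actually inserted.

```python
REMINDER = "<reminder>Update your todos to track progress.</reminder>"
NAG_AFTER = 3  # rounds without a todo call before nagging, as in the text


def next_round(messages, rounds_since_todo, used_todo):
    """Update the counter for one finished round and inject a reminder if due."""
    rounds_since_todo = 0 if used_todo else rounds_since_todo + 1
    if rounds_since_todo >= NAG_AFTER and messages:
        last = messages[-1]
        if isinstance(last.get("content"), list):
            # Same placement as the text describes: start of the last message.
            last["content"].insert(0, {"type": "text", "text": REMINDER})
            rounds_since_todo = 0
    return rounds_since_todo


def make_result(i):
    """Build a fake tool_result block (hypothetical ids, illustration only)."""
    return {"type": "tool_result", "tool_use_id": f"id_{i}", "content": "ok"}


messages = []
counter = 0
for i in range(3):  # three rounds in a row without a todo call
    messages.append({"role": "user", "content": [make_result(i)]})
    counter = next_round(messages, counter, used_todo=False)

print(messages[-1]["content"][0])
```

One caveat worth checking against your provider: Anthropic's tool-use documentation asks for tool_result blocks to come before any text in the same user message, so if a request is rejected, appending the reminder after the results is the variation to try.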

Full Code: Layering Planning on the Agent Loop

Below is the core logic after integration into the Agent Loop. Note that the loop's skeleton (while True plus the stop_reason check) is unchanged:

import os
import subprocess
import anthropic
from dotenv import load_dotenv

load_dotenv()
client = anthropic.Anthropic()
MODEL = os.getenv("MODEL_ID", "claude-sonnet-4-20250514")

SYSTEM = """You are a coding agent with planning capabilities.
For multi-step tasks, ALWAYS use the todo tool first to create a plan,
then update item statuses as you work through them.
Only mark an item completed after verifying the result."""

# ── TodoManager ──────────────────────────────────────
class TodoManager:
    def __init__(self):
        self.todos = []

    def update(self, items):
        in_progress = [i for i in items if i["status"] == "in_progress"]
        if len(in_progress) > 1:
            return "Error: Only one item can be in_progress at a time."
        self.todos = items
        return self.render()

    def render(self):
        if not self.todos:
            return "(no todos)"
        icons = {"pending": "☐", "in_progress": "▶", "completed": "✓"}
        return "\n".join(
            f"  {icons.get(i['status'], '?')} {i['content']}"
            for i in self.todos
        )

todo_manager = TodoManager()

# ── Tools ────────────────────────────────────────────
def run_bash(command):
    dangerous = ["rm -rf /", "sudo", "shutdown", "reboot", "> /dev/"]
    if any(d in command for d in dangerous):
        return "Error: Dangerous command blocked"
    try:
        r = subprocess.run(command, shell=True, capture_output=True, text=True, timeout=120)
        out = (r.stdout + r.stderr).strip()
        return out[:50000] if out else "(no output)"
    except subprocess.TimeoutExpired:
        return "Error: Timeout (120s)"

TOOL_HANDLERS = {
    "bash":  lambda command: run_bash(command),
    "todo":  lambda items: todo_manager.update(items),
}

TOOLS = [
    {
        "name": "bash",
        "description": "Run a shell command.",
        "input_schema": {
            "type": "object",
            "properties": {"command": {"type": "string"}},
            "required": ["command"],
        },
    },
    {
        "name": "todo",
        "description": "Update the task plan. Each item has 'content' (string) and 'status' (pending/in_progress/completed). Only ONE item should be in_progress at a time.",
        "input_schema": {
            "type": "object",
            "properties": {
                "items": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {
                            "content": {"type": "string"},
                            "status": {
                                "type": "string",
                                "enum": ["pending", "in_progress", "completed"],
                            },
                        },
                        "required": ["content", "status"],
                    },
                }
            },
            "required": ["items"],
        },
    },
]

# ── Agent Loop (same structure as Lesson 1) ───────────
def agent_loop(query: str):
    messages = [{"role": "user", "content": query}]
    rounds_since_todo = 0

    while True:
        response = client.messages.create(
            model=MODEL, system=SYSTEM,
            messages=messages, tools=TOOLS, max_tokens=8000,
        )
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason != "tool_use":
            for block in response.content:
                if hasattr(block, "text"):
                    print(f"\n{block.text}")
            return

        results = []
        used_todo = False
        for block in response.content:
            if block.type == "tool_use":
                handler = TOOL_HANDLERS[block.name]
                output = handler(**block.input)
                if block.name == "todo":
                    used_todo = True
                print(f"[{block.name}] {output[:200]}")
                results.append({
                    "type": "tool_result",
                    "tool_use_id": block.id,
                    "content": output,
                })

        messages.append({"role": "user", "content": results})

        # ── Nag Reminder ─────────────────────────────
        if used_todo:
            rounds_since_todo = 0
        else:
            rounds_since_todo += 1

        if rounds_since_todo >= 3:
            last = messages[-1]
            if isinstance(last.get("content"), list):
                last["content"].insert(0, {
                    "type": "text",
                    "text": "<reminder>Update your todos to track progress.</reminder>",
                })
            rounds_since_todo = 0

The code grows from ~120 lines in s01 to ~160 lines. Everything added is TodoManager and the nag reminder; the structure of the Agent Loop does not budge.

In Practice: With Planning vs Without

The same task, "Create a user management module with CRUD":

Without planning (the s01/s02 Agent):

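Round 1: create user_model.py        ✓
Round 2: create user_service.py      ✓
Round 3: create user_router.py       ✓ (but only create was written)
Round 4: hit an import error, started fixing the bug
Round 5: fixed, carried on... and wrote create again
Round 6: "Done! The user management module is complete"  ← what about read/update/delete?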

With planning (the s03 Agent):

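Round 1: [todo] create the plan
  ▶ Create the User data model
  ☐ Implement the Create endpoint
  ☐ Implement the Read endpoint
  ☐ Implement the Update endpoint
  ☐ Implement the Delete endpoint
  ☐ Verify all endpoints

Round 2: [bash] create user_model.py
Round 3: [todo] update progress
  ✓ Create the User data model
  ▶ Implement the Create endpoint
  ☐ Implement the Read endpoint
  ...

Round 7: [todo] update progress
  ✓ Create the User data model
  ✓ Implement the Create endpoint
  ✓ Implement the Read endpoint
  ✓ Implement the Update endpoint
  ▶ Implement the Delete endpoint
  ☐ Verify all endpoints

Round 9: [todo] all done
  ✓ Create the User data model
  ✓ Implement the Create endpoint
  ✓ Implement the Read endpoint
  ✓ Implement the Update endpoint
  ✓ Implement the Delete endpoint
  ✓ Verify all endpoints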

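The difference is obvious at a glance: in this run the Agent with a plan misses nothing, repeats nothing, and leaves a record of every step.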

[Figure: execution traces of an Agent with planning vs without]

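Insight: Why "Only One in_progress at a Time"

This constraint looks small, but it is the soul of the whole planning system.

What happens if several items may be in_progress?

The model advances several tasks in parallel: halfway through A it jumps to B, and halfway through B it comes back to A. This is the classic trap of human multitasking: context switches are expensive, and each one means reloading mental state.

For an LLM it is worse: the model has no "working memory" in which to park state. Its entire context is the message list. If it advances 3 tasks at once, each task's intermediate state is scattered across the message list, and the model has to piece together every task's current progress from a mass of context.

What forcing a single in_progress item achieves:

Linear progress: A is finished before B starts, with no interleaving
Clear progress: one look at the todo list shows where things stand
Lower context load: the model only needs to track the state of the current task
Localized failures: if something goes wrong, it is in the current in_progress item

This matches the "limit WIP (Work In Progress)" principle from software engineering: limiting the number of parallel tasks is central to the Kanban method. For an Agent, the best WIP limit is 1.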
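Viewed as a Kanban rule, the check is a count of active items against a limit. The sketch below generalizes the hard-coded 1 into a hypothetical `wip_limit` parameter; `check_wip` and its error text are illustrative and do not come from the course code.

```python
from collections import Counter


def check_wip(items, wip_limit=1):
    """Return an error string if too many items are in progress, else None."""
    counts = Counter(item["status"] for item in items)
    active = counts["in_progress"]  # Counter returns 0 for a missing key
    if active > wip_limit:
        return f"Error: {active} items in_progress, limit is {wip_limit}."
    return None


board = [
    {"content": "A", "status": "in_progress"},
    {"content": "B", "status": "in_progress"},
    {"content": "C", "status": "pending"},
]

print(check_wip(board))               # rejected under the default limit of 1
print(check_wip(board, wip_limit=2))  # accepted under a looser limit
```

Keeping the limit as a parameter makes it easy to experiment, but the lesson's argument is that 1 is the value to use for a single Agent.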

Up and Running in Five Minutes

# If you have not cloned it yet
git clone https://github.com/shareAI-lab/learn-claude-code
cd learn-claude-code

# Install dependencies (if not installed yet)
pip install -r requirements.txt

# Make sure .env is configured
# ANTHROPIC_API_KEY=sk-ant-...
# MODEL_ID=claude-sonnet-4-20250514

# Start Lesson 3
python agents/s03_todo_write.py

After startup you will see the s03 >> prompt. The following are real run logs:

Task 1: refactor an existing file

s03 >> Refactor the file hello.py: add type hints, docstrings, and a main guard
> read_file:
def greet(name: str = "World") -> str:
    """Return a greeting message for the given name.
    Args:
        name: The name to greet. Defaults to "World".
    Returns:
        A greeting string in ...
> todo:
[x] #1: Read hello.py
[ ] #2: Add type hints
[ ] #3: Add docstrings
[ ] #4: Add main guard
(1/4 completed)
> todo:
[x] #1: Read hello.py
[x] #2: Add type hints — already present (name: str, -> str, -> None)
[x] #3: Add docstrings — already present (Google-style on greet and main)
[x] #4: Add main guard — already present
All three refactoring items are already in place in `hello.py`.
No modifications were necessary.

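The Agent reads the file, drafts a plan (4 items), checks each item, finds everything already in place, and marks them all completed in one batch. An Agent with a plan does not "change for the sake of change": once it confirms no change is needed, it stops.

Task 2: create a complete Python package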

s03 >> Create a Python package with __init__.py, utils.py, and tests/test_utils.py
> todo:
[>] #1: Create package directory and tests subdirectory
[ ] #2: Create __init__.py
[ ] #3: Create utils.py
[ ] #4: Create tests/__init__.py and tests/test_utils.py
[ ] #5: Verify structure and run tests
> bash:
(no output)
> todo:
Error: Only one task can be in_progress at a time
> todo:
[x] #1: Create package directory and tests subdirectory
[>] #2: Create __init__.py, utils.py, and test files
(1/2 completed)
> write_file: Wrote 160 bytes
> write_file: Wrote 1598 bytes
> write_file: Wrote 0 bytes
> write_file: Wrote 2481 bytes
> bash:
mypackage/__init__.py
mypackage/utils.py
mypackage/tests/__init__.py
mypackage/tests/test_utils.py
> todo:
[x] #1: Create package directory and tests subdirectory
[x] #2: Create __init__.py, utils.py, and test files
(2/2 completed)
All 22 tests pass ✅

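Note the Error: Only one task can be in_progress at a time in the middle: that is the WIP limit at work. The model tried to advance two tasks at once, was rejected by TodoManager, and immediately adjusted its strategy by merging steps. The constraint is guidance, not punishment.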
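The run logs use a numbered format with a completion count, which differs from the icon-based render() shown earlier in this lesson. A renderer that would produce lines in that shape might look like the sketch below; `render_numbered` is a hypothetical name, and the format is inferred from the logs rather than copied from the repository.

```python
def render_numbered(todos):
    """Render todos as '[x] #1: ...' lines plus a '(done/total completed)' footer."""
    markers = {"pending": "[ ]", "in_progress": "[>]", "completed": "[x]"}
    lines = [
        f"{markers.get(t['status'], '[?]')} #{n}: {t['content']}"
        for n, t in enumerate(todos, start=1)
    ]
    done = sum(1 for t in todos if t["status"] == "completed")
    lines.append(f"({done}/{len(todos)} completed)")
    return "\n".join(lines)


todos = [
    {"content": "Create package directory and tests subdirectory", "status": "completed"},
    {"content": "Create __init__.py, utils.py, and test files", "status": "in_progress"},
]

print(render_numbered(todos))
```

The footer gives the model an explicit count of what remains, a second progress signal on top of the per-item markers.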

Task 3: project-wide code style review

s03 >> Review all Python files and fix any style issues
> bash:
(lists all Python files)
> todo:
[>] #1: Read all Python files
[ ] #2: Run flake8/pylint style checks
[ ] #3: Fix identified style issues
(0/3 completed)
> read_file: (reads 20+ Python files one by one)
> todo:
[x] #1: Read all Python files
[>] #2: Run flake8/pylint style checks
[ ] #3: Fix identified style issues
(1/3 completed)
> bash: (installs flake8, runs the checks, finds E241/E272/F401 and other issues)
> todo:
[x] #1: Read all Python files
[x] #2: Run flake8/pylint style checks
[>] #3: Fix style issues in mypackage/utils.py, greet.py, agents/*.py
(2/3 completed)
> edit_file: Edited mypackage/utils.py (removed unused imports)
> edit_file: Edited greet.py (added missing blank lines)
> edit_file: Edited agents/s03_todo_write.py (fixed whitespace issues)
> bash: (runs flake8 again and confirms everything passes)
> todo:
[x] #1: Read all Python files
[x] #2: Run flake8/pylint style checks
[x] #3: Fix style issues — all fixed, flake8 clean
(3/3 completed)

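This task spanned 20+ files and several kinds of style issues. The Agent advanced linearly along its todo plan throughout: read everything first, then check for problems, then fix them one by one and verify. No skipped steps, nothing missed.

Summary of Changes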

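Component	s02 (previous lesson)	s03 (this lesson)
Tools	4 (bash/read/write/edit)	5 (+todo)
Planning system	None	TodoManager (pending/in_progress/completed)
Progress tracking	None	todo tool + render output
Behavior correction	None	nag reminder (injected after 3 rounds without an update)
WIP limit	None	At most 1 in_progress at a time
Code size	~140 lines	~160 lines
Agent Loop	Unchanged	Unchanged

There are only three core changes: the TodoManager class, the todo tool definition and registration, and the nag reminder injection logic. The structure of the Agent Loop has not changed since Lesson 1.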

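Next Lesson Preview

Lesson 3 gave the Agent the ability to plan, but all the work is still carried by "one person". In complex tasks, some subtasks are better handed to an independent Agent with a cleaner context.

Lesson 4: Subagent. When one Agent is not enough, send a squad. The core mechanism is a lightweight subagent dispatcher: the main Agent delegates a subtask, the subagent runs it independently, and the result is reported back.

# Preview: subagent dispatch in s04
def dispatch_subagent(task: str) -> str:
    """Start an independent Agent Loop to run a subtask."""
    sub_messages = [{"role": "user", "content": task}]
    return agent_loop(sub_messages)  # reuse the same loop!

The same loop, but now it can be called in a nested way. One Agent becomes a team.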

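This is Lesson 3 of the series "Dissecting the Claude Code Architecture in 12 Lessons: Mastering Agent Harness Engineering from Scratch". Follow Claw开发者 so you do not miss later lessons.

Full code and an interactive learning platform: github.com/shareAI-lab…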


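Series Contents

Lesson 1: Build Your First AI Agent in 20 Lines of Python
Lesson 2: Giving the Agent Tools: the dispatch map Pattern in Detail
Lesson 3: TodoWrite: Make the Agent Think Before It Acts (the Planning System) (this article)
Lesson 4: Subagent: Breaking Down Large Tasks, Context Isolation
Lesson 5: Loading Domain Knowledge on Demand: the Skill Mechanism
Lesson 6: Unlimited Conversation: a Three-Layer Context Compression Strategy
Lesson 7: Task Persistence: a File-Level DAG Task Graph
Lesson 8: Background Execution: Async Tasks and a Notification Queue
Lesson 9: Agent Teams: Multi-Agent Collaboration with Teams and a Mailbox System
Lesson 10: Team Protocols: State-Machine-Driven Negotiation
Lesson 11: Autonomous Agents: Self-Organized Task Claiming
Lesson 12: Ultimate Isolation: Parallel Execution with Worktrees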