Lesson 12: Write Tools (FileEdit, FileWrite)

Module 4: Tool System | Prerequisite: Lesson 11 | Estimated study time: 75 minutes

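Learning Objectives

After completing this lesson, you will be able to:

Explain FileEditTool's exact string replacement mechanism, including the uniqueness check and the replace_all mode
Describe how curly-quote normalization and API tag desanitization make sure an edit hits its target
Explain FileWriteTool's enforced "read before write" policy and the safety reasoning behind it
Understand the role of file modification timestamp tracking in safe concurrent editing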

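12.1 Write Tools vs. Read-Only Tools

The core difference between write tools and read-only tools is not just whether they modify files. It shows up across the whole safety and state-management layer: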

┌──────────────────────────────────────────────────────────┐
│         Read-only tools vs. write tools                  │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  Read-only tools             Write tools                 │
│  ───────────────             ───────────                 │
│  isReadOnly() = true         isReadOnly() not defined    │
│  checkReadPermission         checkWritePermission        │
│  No file-state tracking      readFileState tracking      │
│  No LSP notifications        LSP didChange/didSave       │
│  No undo support             fileHistory backup          │
│  No diff generation          structuredPatch output      │
│  Auto-allowed (most modes)   Requires user confirmation  │
│                                                          │
└──────────────────────────────────────────────────────────┘

12.2 FileEditTool: Exact String Replacement

File Location

tools/FileEditTool/
├── FileEditTool.ts    # core logic (~550 lines)
├── constants.ts       # tool name and constants
├── types.ts           # Zod schema definitions
├── utils.ts           # edit helper functions (~776 lines)
├── prompt.ts          # prompt generation
└── UI.tsx             # diff rendering component

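Input Parameters

z.strictObject({
  file_path: z.string(),     // absolute path of the file to modify
  old_string: z.string(),    // text to replace
  new_string: z.string(),    // replacement text
  replace_all: z.boolean().default(false).optional(), // replace every match instead of requiring a unique one
})

The schema looks simple, but a large amount of validation and safety logic sits behind it.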

The Complete validateInput Chain

FileEditTool has the strictest input validation of any tool. The full chain is:

validateInput() chain
  │
  ├── 1. Secret check: checkTeamMemSecrets(fullFilePath, new_string)
  │       └── prevents writing sensitive data into team memory files
  │
  ├── 2. No-op check: old_string === new_string ?
  │       └── "No changes to make" (errorCode: 1)
  │
  ├── 3. Path permission check: matchingRuleForInput(path, context, 'edit', 'deny')
  │       └── "File is in a denied directory" (errorCode: 2)
  │
  ├── 4. UNC path safety: startsWith('\\\\') || startsWith('//')
  │       └── skips filesystem operations to avoid leaking NTLM credentials
  │
  ├── 5. File size check: size > 1 GiB ?
  │       └── "File is too large to edit" (errorCode: 10)
  │
  ├── 6. File encoding detection: UTF-8 or UTF-16LE
  │       └── supports BOM detection
  │
  ├── 7. File does not exist + old_string is non-empty:
  │       └── suggests similar file names (errorCode: 4)
  │
  ├── 8. File exists + old_string is empty + file has content:
  │       └── "Cannot create new file - file already exists" (errorCode: 3)
  │
  ├── 9. Notebook check: .ipynb extension
  │       └── "Use NotebookEditTool instead" (errorCode: 5)
  │
  ├── 10. Read-before-edit check: readFileState.get(fullFilePath)
  │       └── "File has not been read yet" (errorCode: 6)
  │
  ├── 11. Stale timestamp check: lastWriteTime > readTimestamp
  │       └── "File has been modified since read" (errorCode: 7)
  │
  ├── 12. String lookup (with quote normalization): findActualString(file, old_string)
  │       └── "String to replace not found" (errorCode: 8)
  │
  ├── 13. Uniqueness check: matches > 1 && !replace_all
  │       └── "Found N matches, set replace_all to true" (errorCode: 9)
  │
  └── 14. Claude settings file validation: validateInputForSettingsFileEdit()
          └── prevents invalid edits to settings files
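Steps 12 and 13 of the chain can be sketched in isolation. The following is a minimal illustration, not the tool's actual code: `countOccurrences`, `checkMatches`, and the `MatchCheck` shape are hypothetical names, and quote normalization is left out. Only the error codes and messages are taken from the chain above.

```typescript
// Hypothetical result shape for this illustration only.
interface MatchCheck {
  ok: boolean
  matches: number
  errorCode?: number
  message?: string
}

// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle === '') return 0
  return haystack.split(needle).length - 1
}

// Mirrors steps 12 (lookup) and 13 (uniqueness) of the validation chain.
function checkMatches(file: string, oldString: string, replaceAll: boolean): MatchCheck {
  const matches = countOccurrences(file, oldString)
  if (matches === 0) {
    return { ok: false, matches, errorCode: 8, message: 'String to replace not found' }
  }
  if (matches > 1 && !replaceAll) {
    return {
      ok: false,
      matches,
      errorCode: 9,
      message: `Found ${matches} matches, set replace_all to true`,
    }
  }
  return { ok: true, matches }
}
```

With the content `a b a`, looking up `a` is rejected with errorCode 9 unless `replace_all` is true, which is why the model is encouraged to include enough surrounding context to make `old_string` unique.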

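Curly-Quote Normalization

This is a particularly neat feature. Claude models always output straight quotes (" and '), but many files (especially documentation and literary text) use curly quotes (\u201C \u201D \u2018 \u2019).

findActualString() in utils.ts handles this mismatch: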

export function findActualString(
  fileContent: string,
  searchString: string,
): string | null {
  // Step 1: exact match
  if (fileContent.includes(searchString)) {
    return searchString
  }

  // Step 2: match after normalizing quotes on both sides
  const normalizedSearch = normalizeQuotes(searchString)
  const normalizedFile = normalizeQuotes(fileContent)

  const searchIndex = normalizedFile.indexOf(normalizedSearch)
  if (searchIndex !== -1) {
    // Return the original text from the file (with curly quotes), not the normalized form
    return fileContent.substring(searchIndex, searchIndex + searchString.length)
  }

  return null
}
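The function above relies on normalizeQuotes(), which the lesson does not show. A minimal sketch follows, assuming it maps each of the four curly quotes to its straight counterpart one character at a time. That one-to-one mapping is what lets findActualString() reuse `searchString.length` on the original file content. `normalizeQuotesSketch` and `findActualStringSketch` are illustrative names, not the real implementation.

```typescript
// Assumed mapping: each curly quote becomes one straight quote (same length).
const CURLY_TO_STRAIGHT: Record<string, string> = {
  '\u201C': '"',
  '\u201D': '"',
  '\u2018': "'",
  '\u2019': "'",
}

function normalizeQuotesSketch(text: string): string {
  return text.replace(/[\u201C\u201D\u2018\u2019]/g, ch => CURLY_TO_STRAIGHT[ch] ?? ch)
}

// Condensed version of the two-step lookup shown above.
function findActualStringSketch(fileContent: string, searchString: string): string | null {
  if (fileContent.includes(searchString)) return searchString
  const index = normalizeQuotesSketch(fileContent).indexOf(normalizeQuotesSketch(searchString))
  if (index === -1) return null
  // Lengths are preserved by normalization, so the same offsets apply to the original.
  return fileContent.substring(index, index + searchString.length)
}

const file = 'She said \u201CHello\u201D.'
const found = findActualStringSketch(file, 'said "Hello"')
// `found` is the file's own text: said \u201CHello\u201D
```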

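When a match is found through quote normalization, preserveQuoteStyle() also converts the straight quotes in new_string to curly quotes, so the file's typographic style stays consistent: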

export function preserveQuoteStyle(
  oldString: string,
  actualOldString: string,
  newString: string,
): string {
  if (oldString === actualOldString) return newString  // no normalization happened

  // Detect which kinds of curly quotes the matched file text uses
  const hasDoubleQuotes = actualOldString.includes('\u201C') || actualOldString.includes('\u201D')
  const hasSingleQuotes = actualOldString.includes('\u2018') || actualOldString.includes('\u2019')

  let result = newString
  if (hasDoubleQuotes) result = applyCurlyDoubleQuotes(result)
  if (hasSingleQuotes) result = applyCurlySingleQuotes(result)
  return result
}

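Applying curly quotes relies on a heuristic to tell opening quotes from closing quotes: a quote preceded by whitespace, the start of the text, or an opening bracket is an opening quote; anything else is a closing quote. Apostrophes in contractions (such as don't) get special handling:

// Letters on both sides → contraction apostrophe → use the right single curly quote
const prevIsLetter = prev !== undefined && /\p{L}/u.test(prev)
const nextIsLetter = next !== undefined && /\p{L}/u.test(next)
if (prevIsLetter && nextIsLetter) {
  result.push(RIGHT_SINGLE_CURLY_QUOTE)  // contraction
}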
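The opening/closing heuristic for double quotes can be sketched as follows. This is an assumption-based illustration: the exact set of characters the real applyCurlyDoubleQuotes() treats as an opening context is not shown in this lesson, so `isOpeningContext` and the bracket set below are hypothetical.

```typescript
const LEFT_DOUBLE = '\u201C'
const RIGHT_DOUBLE = '\u201D'

// Opening context (assumed): start of text, whitespace, or an opening bracket.
function isOpeningContext(chars: string[], index: number): boolean {
  const prev = index > 0 ? chars[index - 1] : undefined
  return prev === undefined || /[\s(\[{]/.test(prev)
}

function applyCurlyDoubleQuotesSketch(text: string): string {
  const chars = Array.from(text)
  return chars
    .map((ch, i) => {
      if (ch !== '"') return ch
      return isOpeningContext(chars, i) ? LEFT_DOUBLE : RIGHT_DOUBLE
    })
    .join('')
}

const styled = applyCurlyDoubleQuotesSketch('She whispered "Goodbye"')
// The first quote follows a space (opening); the second follows a letter (closing).
```

The heuristic looks only at the preceding character, so it needs no quote-pairing state and still behaves sensibly when new_string contains an unbalanced quote.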

API Tag Desanitization

The Claude API sanitizes certain XML tags, replacing them with short aliases. When the model emits these sanitized tags in an edit, FileEditTool has to restore the originals:

const DESANITIZATIONS: Record<string, string> = {
  '<fnr>': '<function_results>',
  '<n>': '<name>',
  '</n>': '</name>',
  '<o>': '<output>',
  '</o>': '</output>',
  '<e>': '<error>',
  '</e>': '</error>',
  '<s>': '<system>',
  '</s>': '</system>',
  '\n\nH:': '\n\nHuman:',
  '\n\nA:': '\n\nAssistant:',
  // ...more mappings
}
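How such a table might be applied can be sketched like this. The lesson does not show at which point or to which fields the real tool applies the mapping, so `desanitize` below is a hypothetical helper operating on an arbitrary string, using a three-entry sample of the table above.

```typescript
// A small sample of the mapping shown above; the real table has more entries.
const DESANITIZATIONS_SAMPLE: Record<string, string> = {
  '<fnr>': '<function_results>',
  '<n>': '<name>',
  '</n>': '</name>',
}

// Hypothetical helper: restores aliases and reports which ones were replaced.
function desanitize(text: string): { result: string; applied: string[] } {
  let result = text
  const applied: string[] = []
  for (const [alias, original] of Object.entries(DESANITIZATIONS_SAMPLE)) {
    if (result.includes(alias)) {
      // split/join replaces every occurrence without regex escaping concerns
      result = result.split(alias).join(original)
      applied.push(alias)
    }
  }
  return { result, applied }
}

const restored = desanitize('<n>foo</n>')
// restored.result is '<name>foo</name>'
```

Recording which aliases were replaced is useful if the same substitutions then need to be applied to a second string, for example so that old_string and new_string stay consistent with each other.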

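call() Core Flow: Atomic Read-Modify-Write

call() execution flow
  │
  ├── 1. expandPath(): path normalization
  │
  ├── 2. discoverSkillDirsForPaths(): skill directory discovery
  │
  ├── 3. diagnosticTracker.beforeFileEdited(): LSP diagnostics preparation
  │
  ├── 4. mkdir(dirname): make sure the parent directory exists
  │
  ├── 5. fileHistoryTrackEdit(): back up the original file (undo support)
  │
  │  ┌─── atomic critical section begins ───┐
  │  │                                      │
  ├──├── 6. readFileSyncWithMetadata(): synchronous read of the current content
  │  │
  ├──├── 7. Stale timestamp check: second confirmation
  │  │
  ├──├── 8. findActualString(): lookup with curly-quote normalization
  │  │
  ├──├── 9. preserveQuoteStyle(): keep the file's quote style
  │  │
  ├──├── 10. getPatchForEdit(): generate the diff patch
  │  │
  ├──├── 11. writeTextContent(): write to disk
  │  │                                      │
  │  └─── atomic critical section ends ─────┘
  │
  ├── 12. LSP notifications: didChange + didSave
  │
  ├── 13. notifyVscodeFileUpdated(): VS Code integration
  │
  ├── 14. readFileState.set(): update the read state
  │
  └── 15. Logging and analytics events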

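A source comment specifically stresses the atomicity of the critical section:

Please avoid async operations between here and writing to disk to preserve atomicity

There is no await between steps 6 and 11. Because JavaScript runs one task at a time, no other edit in the same process can interleave between the read and the write.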
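Why a single await inside the critical section matters can be shown with a small simulation. This sketch uses a hypothetical in-memory `disk` map instead of the real filesystem, and `unsafeAppend`/`safeAppend` are illustrative names. It only models interleaving inside one JavaScript process.

```typescript
// Hypothetical in-memory "disk", used only for this illustration.
const disk = new Map<string, string>()

// Unsafe: yields to the event loop between the read and the write.
async function unsafeAppend(path: string, suffix: string): Promise<void> {
  const current = disk.get(path) ?? ''
  await Promise.resolve() // another edit can run here and is then overwritten
  disk.set(path, current + suffix)
}

// Safe: async work happens before the critical section, never inside it.
async function safeAppend(path: string, suffix: string): Promise<void> {
  await Promise.resolve()
  const current = disk.get(path) ?? '' // read
  disk.set(path, current + suffix)     // write, with no await in between
}

async function demo(): Promise<{ unsafe: string; safe: string }> {
  disk.set('u.txt', 'x')
  await Promise.all([unsafeAppend('u.txt', 'A'), unsafeAppend('u.txt', 'B')])
  disk.set('s.txt', 'x')
  await Promise.all([safeAppend('s.txt', 'A'), safeAppend('s.txt', 'B')])
  return { unsafe: disk.get('u.txt') ?? '', safe: disk.get('s.txt') ?? '' }
}
```

In the unsafe variant both calls read `x` before either writes, so the result is `xB` and the first edit is lost. In the safe variant the result is `xAB`. This guarantee does not extend to other processes, which is what the timestamp check is for.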

Creating New Files

When old_string is empty and the file does not exist, FileEditTool behaves like a file-creation tool:

// inside validateInput
if (old_string === '') {
  if (fileContent === null) {
    return { result: true }  // missing file + empty old_string = create a new file
  }
  if (fileContent.trim() !== '') {
    return { result: false, message: 'Cannot create new file - file already exists.' }
  }
}

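12.3 FileWriteTool: Full-File Overwrite

File Location

tools/FileWriteTool/
├── FileWriteTool.ts    # core logic (~300 lines)
├── prompt.ts           # prompt
└── UI.tsx              # rendering component

Input Parameters

z.strictObject({
  file_path: z.string(),  // absolute path
  content: z.string(),     // complete file content
})

The Enforced "Read Before Write" Policy

FileWriteTool's prompt states the requirement explicitly: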

export function getWriteToolDescription(): string {
  return `Writes a file to the local filesystem.

Usage:
- This tool will overwrite the existing file if there is one at the provided path.
- If this is an existing file, you MUST use the Read tool first to read the file's contents.
  This tool will fail if you did not read the file first.
- Prefer the Edit tool for modifying existing files — it only sends the diff.
  Only use this tool to create new files or for complete rewrites.
- NEVER create documentation files (*.md) or README files unless explicitly requested.`
}

The policy is enforced in validateInput:

const readTimestamp = toolUseContext.readFileState.get(fullFilePath)
if (!readTimestamp || readTimestamp.isPartialView) {
  return {
    result: false,
    message: 'File has not been read yet. Read it first before writing to it.',
    errorCode: 2,
  }
}
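The check above can be exercised in isolation with a small sketch. It assumes the check only applies when the target file already exists (creating a new file needs no prior read, as the prompt implies). `ReadEntry` and `checkReadBeforeWrite` are hypothetical names. The message and errorCode are copied from the snippet above.

```typescript
// Hypothetical, reduced entry shape for this illustration.
interface ReadEntry {
  content: string
  timestamp: number
  isPartialView?: boolean
}

interface ValidationResult {
  result: boolean
  message?: string
  errorCode?: number
}

function checkReadBeforeWrite(
  readFileState: Map<string, ReadEntry>,
  fullFilePath: string,
  fileExists: boolean,
): ValidationResult {
  if (!fileExists) return { result: true } // assumed: new files need no prior read
  const entry = readFileState.get(fullFilePath)
  if (!entry || entry.isPartialView) {
    return {
      result: false,
      message: 'File has not been read yet. Read it first before writing to it.',
      errorCode: 2,
    }
  }
  return { result: true }
}

const state = new Map<string, ReadEntry>()
state.set('/tmp/a.txt', { content: 'hello', timestamp: 1000 })
state.set('/tmp/b.txt', { content: 'hel', timestamp: 1000, isPartialView: true })
```

A partial view is treated the same as no read at all: the model has not seen the whole file, so a full overwrite could silently drop the unseen part.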

Choosing Between FileWriteTool and FileEditTool

┌──────────────────────────────────────────────────────────┐
│         When to use FileEdit vs. FileWrite               │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  FileEditTool (preferred)                                │
│  ├── Modifies part of an existing file                   │
│  ├── Sends only the diff; low token cost                 │
│  ├── Permission prompt shows the exact change            │
│  └── Empty old_string + missing file = new file          │
│                                                          │
│  FileWriteTool                                           │
│  ├── Creates brand-new files (preferred way)             │
│  ├── Rewrites a file's content completely                │
│  ├── Sends full content; high token cost                 │
│  └── Permission prompt shows a full-file diff            │
│                                                          │
└──────────────────────────────────────────────────────────┘

Stale Timestamp Detection

FileWriteTool and FileEditTool share the same timestamp mechanism to avoid overwriting changes the user made outside the tool:

// Case: the file exists
const fileStat = await fs.stat(fullFilePath)
const fileMtimeMs = fileStat.mtimeMs

// Look up the timestamp recorded by the last read
const readTimestamp = toolUseContext.readFileState.get(fullFilePath)

// The file was modified after it was read
if (fileMtimeMs > readTimestamp.timestamp) {
  // Windows special case: cloud sync or antivirus software may change the timestamp without changing the content
  const isFullRead = readTimestamp.offset === undefined && readTimestamp.limit === undefined
  if (isFullRead && currentContent === readTimestamp.content) {
    // Content is unchanged, safe to continue
  } else {
    return { result: false, message: FILE_UNEXPECTEDLY_MODIFIED_ERROR }
  }
}

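This double check (timestamp plus content comparison) is worth noting:

Timestamp changed?
  │
  ├── No → safe, continue editing
  │
  └── Yes → did the content change too?
        │
        ├── No (Windows false positive, full reads only) → safe, continue editing
        │
        └── Yes → block! "File has been unexpectedly modified"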
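The decision tree can be written as a pure function, which makes the three outcomes easy to test. This is a sketch built from the snippet above; `checkStaleness`, `Decision`, and the reduced `ReadEntry` shape are illustrative names.

```typescript
// Reduced entry shape, following the fields used in the snippet above.
interface ReadEntry {
  content: string
  timestamp: number
  offset?: number
  limit?: number
}

type Decision = 'proceed' | 'blocked'

function checkStaleness(entry: ReadEntry, fileMtimeMs: number, currentContent: string): Decision {
  // Timestamp unchanged: nothing touched the file since the read.
  if (fileMtimeMs <= entry.timestamp) return 'proceed'
  // Timestamp changed: fall back to a content comparison, which is only
  // meaningful when the stored content covers the whole file.
  const isFullRead = entry.offset === undefined && entry.limit === undefined
  return isFullRead && currentContent === entry.content ? 'proceed' : 'blocked'
}

const fullRead: ReadEntry = { content: 'abc', timestamp: 100 }
const partialRead: ReadEntry = { content: 'abc', timestamp: 100, offset: 1, limit: 10 }
```

Note that a partial read with a newer timestamp is always blocked, even if the part that was read is unchanged, because the stored content cannot prove the rest of the file is intact.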

12.4 A Brief Look at NotebookEditTool

NotebookEditTool is dedicated to editing Jupyter Notebook (.ipynb) files:

z.strictObject({
  notebook_path: z.string(),
  cell_id: z.string().optional(),      // ID of the cell to edit
  new_source: z.string(),              // new cell content
  cell_type: z.enum(['code', 'markdown']).optional(),
  edit_mode: z.enum(['replace', 'insert', 'delete']).optional(),
})

There are three edit modes:

replace: replace the content of the specified cell
insert: insert a new cell after the specified cell
delete: delete the specified cell
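The three modes map naturally onto operations over the notebook's cell array. The sketch below is an assumption-based illustration, not NotebookEditTool's code: the `Cell` shape is reduced to three fields, and `applyCellEdit` along with its `newId` parameter are hypothetical.

```typescript
// Reduced cell shape for this illustration; real notebook cells carry more fields.
interface Cell {
  id: string
  cell_type: 'code' | 'markdown'
  source: string
}

type EditMode = 'replace' | 'insert' | 'delete'

// Returns a new array and leaves the input untouched.
function applyCellEdit(
  cells: Cell[],
  cellId: string,
  mode: EditMode,
  newSource: string,
  cellType: Cell['cell_type'] = 'code',
  newId = 'new-cell', // hypothetical ID for inserted cells
): Cell[] {
  const index = cells.findIndex(c => c.id === cellId)
  if (index === -1) throw new Error(`Cell not found: ${cellId}`)
  switch (mode) {
    case 'replace':
      return cells.map((c, i) => (i === index ? { ...c, source: newSource } : c))
    case 'insert':
      return [
        ...cells.slice(0, index + 1),
        { id: newId, cell_type: cellType, source: newSource },
        ...cells.slice(index + 1),
      ]
    case 'delete':
      return cells.filter((_, i) => i !== index)
  }
}

const cells: Cell[] = [
  { id: 'a', cell_type: 'code', source: 'x = 1' },
  { id: 'b', cell_type: 'markdown', source: '# Title' },
]
```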

FileEditTool's validateInput detects .ipynb files and directs the model to NotebookEditTool:

if (fullFilePath.endsWith('.ipynb')) {
  return {
    result: false,
    message: `File is a Jupyter Notebook. Use the ${NOTEBOOK_EDIT_TOOL_NAME} to edit this file.`,
    errorCode: 5,
  }
}

12.5 readFileState: Read/Write State Tracking

readFileState is a Map<string, ReadFileStateEntry> managed by ToolUseContext and shared by all file-operation tools:

┌──────────────────────────────────────────────────────────┐
│              readFileState lifecycle                     │
├──────────────────────────────────────────────────────────┤
│                                                          │
│  FileReadTool.call()                                     │
│  └── readFileState.set(path, {                           │
│        content: "file content",                          │
│        timestamp: mtimeMs,                               │
│        offset: 1,      // Read sets a line number        │
│        limit: undefined                                  │
│      })                                                  │
│                                                          │
│  FileEditTool.call()                                     │
│  └── readFileState.set(path, {                           │
│        content: "content after edit",                    │
│        timestamp: newMtimeMs,                            │
│        offset: undefined,  // Edit sets undefined        │
│        limit: undefined                                  │
│      })                                                  │
│                                                          │
│  FileWriteTool.call()                                    │
│  └── readFileState.set(path, {                           │
│        content: "content after write",                   │
│        timestamp: newMtimeMs,                            │
│        offset: undefined,                                │
│        limit: undefined                                  │
│      })                                                  │
│                                                          │
│  FileReadTool dedup check                                │
│  └── existingState.offset !== undefined ?                │
│        // Only Read sets offset; Edit/Write leave        │
│        // offset=undefined, so no dedup happens,         │
│        // which avoids pointing at stale pre-edit        │
│        // content                                        │
│                                                          │
└──────────────────────────────────────────────────────────┘

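The elegant part of this design: because Read and Edit/Write set the offset field differently, deduplication only applies to Read-after-Read repeats and never wrongly deduplicates a Read that follows an Edit.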
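The offset-based distinction can be sketched as a small predicate. Only the `offset !== undefined` test comes from the lesson. The mtime comparison and the function name `canDeduplicateRead` are assumptions added to make the sketch self-contained, and the real check may also compare the requested range.

```typescript
// Entry shape following the lifecycle described above.
interface ReadFileStateEntry {
  content: string
  timestamp: number
  offset?: number
  limit?: number
}

function canDeduplicateRead(
  existing: ReadFileStateEntry | undefined,
  currentMtimeMs: number,
): boolean {
  if (!existing) return false
  // Entries written by Edit/Write have offset === undefined: never dedupe them,
  // because an earlier Read result in the conversation shows pre-edit content.
  if (existing.offset === undefined) return false
  // Assumed extra condition: the file must not have changed since the Read.
  return existing.timestamp === currentMtimeMs
}

const afterRead: ReadFileStateEntry = { content: 'v1', timestamp: 100, offset: 1 }
const afterEdit: ReadFileStateEntry = { content: 'v2', timestamp: 200 }
```

Here a repeated Read against `afterRead` at mtime 100 can be answered without resending the content, while a Read following the Edit (`afterEdit`) always goes back to disk.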

12.6 Diff Generation and Display

FileEditTool uses structuredPatch from the diff library to generate diffs:

export function getPatchForEdit({
  filePath, fileContents, oldString, newString, replaceAll = false,
}): { patch: StructuredPatchHunk[]; updatedFile: string } {

  // Apply the edit
  let updatedFile = applyEditToFile(fileContents, oldString, newString, replaceAll)

  // Convert leading tabs to spaces (for display only; updatedFile keeps its tabs)
  const patch = getPatchFromContents({
    filePath,
    oldContent: convertLeadingTabsToSpaces(fileContents),
    newContent: convertLeadingTabsToSpaces(updatedFile),
  })

  return { patch, updatedFile }
}
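The two helpers used above are not shown in this lesson. A minimal sketch of what they might look like follows. `applyEditSketch` and `leadingTabsToSpacesSketch` are illustrative names, the tab width of 2 is an arbitrary assumption, and the sketch assumes a non-empty oldString.

```typescript
// Applies one edit; assumes oldString is non-empty.
function applyEditSketch(
  content: string,
  oldString: string,
  newString: string,
  replaceAll = false,
): string {
  if (replaceAll) {
    // split/join replaces every occurrence literally
    return content.split(oldString).join(newString)
  }
  // A replacer function keeps sequences such as `$&` in newString literal;
  // passing newString directly would let String.replace interpret them.
  return content.replace(oldString, () => newString)
}

// Converts only the tabs at the start of each line; tabs inside a line are kept.
function leadingTabsToSpacesSketch(text: string, width = 2): string {
  return text.replace(/^\t+/gm, tabs => ' '.repeat(tabs.length * width))
}

const edited = applyEditSketch('price: a, tax: a', 'a', '$&', false)
// Only the first "a" is replaced, and "$&" is inserted verbatim.
```

Separating the written content (`updatedFile`) from the displayed content (tab-converted) means the display transform can never leak into the file on disk.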

Diff snippet generation is size-limited:

const DIFF_SNIPPET_MAX_BYTES = 8192  // 8KB

A source comment explains why:

Format-on-save of a large file previously injected the entire file per turn (observed max 16.1KB, ~14K tokens/session). 8KB preserves meaningful context while bounding worst case.

12.7 Edit Equivalence Check

FileEditTool implements an inputsEquivalent() method, used during speculation (optimistic updates) to decide whether two edits produce the same result:

export function areFileEditsEquivalent(
  edits1: FileEdit[],
  edits2: FileEdit[],
  originalContent: string,
): boolean {
  // Fast path: the edits are literally identical
  if (edits1.length === edits2.length &&
      edits1.every((e1, i) => {
        const e2 = edits2[i]
        return e2 && e1.old_string === e2.old_string &&
               e1.new_string === e2.new_string &&
               e1.replace_all === e2.replace_all
      })) {
    return true
  }

  // Semantic comparison: apply both edit sets and compare the final results
  const result1 = getPatchForEdits({ filePath: 'temp', fileContents: originalContent, edits: edits1 })
  const result2 = getPatchForEdits({ filePath: 'temp', fileContents: originalContent, edits: edits2 })

  return result1.updatedFile === result2.updatedFile
}

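This enables an important optimization: when Claude produces a predicted edit during streaming output and later confirms the same edit, the system does not need to execute it again.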
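The semantic path is the interesting one: two edits with different old_string values can still be equivalent. The sketch below illustrates that with a simplified stand-in for getPatchForEdits(). `applyEdits` and `editsEquivalentSketch` are hypothetical names, and the sketch skips diff generation entirely.

```typescript
interface FileEdit {
  old_string: string
  new_string: string
  replace_all?: boolean
}

// Simplified stand-in: applies edits in order and returns the final text.
function applyEdits(content: string, edits: FileEdit[]): string {
  return edits.reduce(
    (text, e) =>
      e.replace_all
        ? text.split(e.old_string).join(e.new_string)
        : text.replace(e.old_string, () => e.new_string),
    content,
  )
}

function editsEquivalentSketch(a: FileEdit[], b: FileEdit[], original: string): boolean {
  return applyEdits(original, a) === applyEdits(original, b)
}

const original = 'const x = 1'
const wide: FileEdit[] = [{ old_string: 'x = 1', new_string: 'x = 2' }]
const narrow: FileEdit[] = [{ old_string: '1', new_string: '2' }]
const other: FileEdit[] = [{ old_string: 'x', new_string: 'y' }]
```

`wide` and `narrow` differ literally but both yield `const x = 2`, so they count as equivalent against this original, while `other` does not. Equivalence is always relative to a specific original content.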

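Exercises

Exercise 1: Curly-Quote Edge Cases

Consider the following file content (using curly quotes):

She said \u201CHello\u201D and he replied \u201CHi\u201D.

The old_string that Claude outputs is She said "Hello" (with straight quotes).

How does findActualString() find the match?
If new_string is She whispered "Goodbye", what does preserveQuoteStyle() return?

Exercise 2: Concurrent Edit Safety Analysis

Two agents (the main agent and a sub-agent) edit the same file at the same time. Agent A reads the file at t=1, Agent B reads it at t=2, and Agent A edits it at t=3.

What happens when Agent B tries to edit at t=4?
How does the behavior differ if Agent B only did a partial read (isPartialView=true)?

Exercise 3: Verify the 14-Step Validation Chain

Read the validateInput source of FileEditTool and find the error condition behind each of these error codes:

errorCode 1, 3, 6, 8, 9

Then consider: why do some errors use behavior: 'ask' instead of rejecting outright?

Exercise 4: Design an Atomicity Test

Design a test scenario that verifies the atomicity of FileEditTool's critical section. Hint: what happens if another process modifies the file between step 6 (reading the file) and step 11 (writing the file)?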

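Lesson Summary

Key point	Content
FileEditTool	Exact string replacement, 14-step validation chain, curly-quote normalization, API tag desanitization
FileWriteTool	Full-file overwrite, enforced "read before write", prefers Edit for modifying existing files
NotebookEditTool	Dedicated to Jupyter Notebooks, three edit modes (replace/insert/delete)
readFileState	File-state tracking shared across tools; Read and Edit/Write are told apart by the offset field
Atomicity guarantee	No await inside the critical section, timestamp plus content double check, tolerance for Windows false positives
Diff generation	structuredPatch plus an 8KB snippet limit; tab-to-space conversion is for display only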

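Next Lesson

Lesson 13: BashTool, Shell Execution and Security Defenses. BashTool is the largest and most complex tool in the whole tool system (main file 1143 lines, security module 2592 lines, permission module 2621 lines). We will take a close look at command safety checks, dangerous-command detection, the sandbox mechanism, timeout management, and background task execution.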