AI Memory Tag System Design: Why 4 Tags Are Not Enough and You Need a Two-Dimensional System with 21 Combinations (7 topics × 3 stages)

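When this applies: you are running an AI memory system such as Hindsight, Mem0, or Zep, and as memories pile up, consolidation (memory merging) gets messier: business decisions end up mixed into infra configuration, and a debugging session from 7 days ago buries an important decision from 30 days ago. This article lays out a complete two-dimensional tag design, plus the pitfalls hit along the way.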

1. The problem: 4 tags = zero discrimination

Many AI memory systems start out with a tag configuration like this:

{
  "tags": ["business", "infra", "dev_tools", "code"]
}

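Looks clean enough? It falls apart within a week:

Date	Memory content	Tag
6/1	"Xianyu virtual product lost ¥50+"	business
6/2	"Moved from Xianyu to a Django deployment on Tencent Cloud"	business? infra? dev_tools?
6/3	"Hindsight PATCH side effect wiped 221 records"	code? dev_tools?
6/4	"Decided on an activation-code subscription model"	business
6/5	"Docker container restart policy"	infra

The core conflict: a single-dimension tag can only answer "what is this about?"; it cannot answer "how long should this memory be kept?".

The result: when the consolidation job runs, 250 memories tagged business get merged together. Some are permanent decisions, some are progress notes from 7 days ago, some are drafts from 30 days ago. Merging does not make things clearer; it makes them blurrier.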

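2. The solution: two dimensions = topic × stage = 21 combinations

2.1 Dimension one: topic (what is this?)

Seven topics cover all of the AI assistant's day-to-day memory types:

topic	What it stores	Example
`business`	Monetization decisions, pricing strategy, revenue direction	"Decided to deploy Django + activation-code subscriptions on Tencent Cloud's public network"
`infra`	Server / network / Docker configuration	"Jetson AGX Orin runs the Hindsight container"
`dev_tools`	Hermes / Skills / AI tool configuration	"retain_every_n_turns set to 20"
`code`	Key design decisions, bug root causes	"Timeline controller is missing a PATCH route"
`content`	Article publishing, multi-platform distribution strategy	"Articles are not published by default; Sir decides on publishing to CSDN → Juejin → Zhihu → WeChat Official Account"
`research`	Competitor research, market analysis	"CSDN download channel: 1540 results for AI Agent source code"
`reflection`	Meta-cognitive reflection	"I assumed X / actually Y / root cause Z / improvement W" (the 我以为 X / 实际 Y / 根因 Z / 改进 W journal format)

Why 7 and not 4: with the original four, reflection memories (reflections on the AI system's own behavior) had nowhere to go; they were either mixed into code or lost outright. After adding reflection, content, and research, discrimination went from "almost none" to "basically enough".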

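2.2 Dimension two: stage (how long to keep it)

Three stages control the memory lifecycle:

stage	Meaning	Retention policy
`decision`	Long-term decisions and preferences	Kept permanently
`process`	Process and progress notes from the last 7 days	Archived automatically after 7 days
`reference`	Disposable reference material and history	Purged after 30 days

Why stage is needed: without it, consolidation cannot tell what to merge from what to discard. Take "6/1 tried SSH key, got rejected" and "6/1 decided to use key authentication instead of passwords": the former is process (due to be archived after 7 days), the latter is decision (kept permanently). A single-dimension tag cannot tell them apart.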
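The retention rules in the stage table can be sketched as a small lookup. This is a minimal illustration, not Hindsight's own cleanup logic: the function name `retention_action` and the return labels `keep` / `archive` / `purge` are made up for this example, and it assumes every memory carries a creation date.

```python
from datetime import date, timedelta

# Retention windows from the stage table; None means "never expires".
RETENTION_DAYS = {"decision": None, "process": 7, "reference": 30}
# What happens once the window has passed (illustrative labels only).
EXPIRY_ACTION = {"process": "archive", "reference": "purge"}


def retention_action(stage: str, created: date, today: date) -> str:
    """Return 'keep', 'archive', or 'purge' for a memory in the given stage."""
    if stage not in RETENTION_DAYS:
        raise ValueError(f"unknown stage: {stage!r}")
    window = RETENTION_DAYS[stage]
    if window is None:
        return "keep"
    age = today - created
    return EXPIRY_ACTION[stage] if age > timedelta(days=window) else "keep"


today = date(2026, 6, 12)
# "6/1 tried SSH key, got rejected" is a process memory, now 11 days old.
print(retention_action("process", date(2026, 6, 1), today))   # prints: archive
# "6/1 decided to use key authentication" is a decision memory.
print(retention_action("decision", date(2026, 6, 1), today))  # prints: keep
```

The two SSH memories share a date and a subject, yet only the stage tag tells the policy which one may expire.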

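2.3 Combined effect: 21 distinct buckets

7 topics × 3 stages = 21 combinations

The consolidation job can now:

Merge by topic:business + stage:decision → only permanent business decisions get merged
Merge by topic:infra + stage:process → only infrastructure process notes from the last 7 days get merged
Never merge across topics → business memories never get mixed into infra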
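To make the 21 buckets concrete, here is a rough sketch of grouping memories by their (topic, stage) pair before any merging happens. The memory dictionaries and the helper `bucket_memories` are hypothetical stand-ins for illustration; Hindsight's internal consolidation is not shown here.

```python
from collections import defaultdict
from itertools import product

TOPICS = ["business", "infra", "dev_tools", "code", "content", "research", "reflection"]
STAGES = ["decision", "process", "reference"]

# Each (topic, stage) pair is one consolidation bucket: 7 x 3 = 21.
BUCKETS = set(product(TOPICS, STAGES))


def bucket_memories(memories):
    """Group memories by (topic, stage); each group is consolidated on its own."""
    groups = defaultdict(list)
    for mem in memories:
        key = (mem["topic"], mem["stage"])
        if key not in BUCKETS:
            raise ValueError(f"unknown topic/stage pair: {key}")
        groups[key].append(mem["text"])
    return dict(groups)


memories = [
    {"topic": "business", "stage": "decision", "text": "Use an activation-code subscription model"},
    {"topic": "infra", "stage": "process", "text": "Tried SSH key, got rejected"},
    {"topic": "infra", "stage": "decision", "text": "Use key authentication, not passwords"},
]

print(len(BUCKETS))  # prints: 21
for key, texts in bucket_memories(memories).items():
    print(key, texts)
```

The two infra memories land in different buckets, so the rejected SSH attempt is never merged into the permanent authentication decision.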

3. Configuration

3.1 entity_labels configuration (Hindsight Bank Config)

{
  "updates": {
    "entity_labels": {
      "topic": ["business", "infra", "dev_tools", "code", "content", "research", "reflection"],
      "stage": ["decision", "process", "reference"]
    }
  }
}

⚠️ Version pitfall:

Hindsight 0.7.1 uses the flat format: {"entity_labels": {...}}
Hindsight 0.8.0+ requires an updates wrapper: {"updates": {"entity_labels": {...}}}

Error signature: {"detail":[{"type":"missing","loc":["body","updates"]}]} means you sent the 0.7.1 format to a server running 0.8.0.
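One way to avoid sending the wrong shape is to build the request body from the server version. The sketch below assumes that every version below 0.8.0 takes the flat format (the article only confirms this for 0.7.1) and that version strings are plain dotted integers; `build_patch_body` and `is_missing_updates_error` are illustrative helpers, not part of Hindsight.

```python
import json


def parse_version(version: str) -> tuple:
    """Turn '0.8.0' into (0, 8, 0) so versions compare numerically."""
    return tuple(int(part) for part in version.split("."))


def build_patch_body(entity_labels: dict, hindsight_version: str) -> dict:
    """Wrap entity_labels in 'updates' for 0.8.0+, keep it flat for older versions."""
    body = {"entity_labels": entity_labels}
    if parse_version(hindsight_version) >= (0, 8, 0):
        return {"updates": body}
    return body


def is_missing_updates_error(response: dict) -> bool:
    """True if an error body reports that the 'updates' field is missing."""
    details = response.get("detail", [])
    if not isinstance(details, list):
        return False
    return any(
        isinstance(item, dict)
        and item.get("type") == "missing"
        and item.get("loc") == ["body", "updates"]
        for item in details
    )


labels = {"stage": ["decision", "process", "reference"]}
print(json.dumps(build_patch_body(labels, "0.7.1")))
print(json.dumps(build_patch_body(labels, "0.8.0")))
```

If a PATCH comes back with the error signature above, `is_missing_updates_error` gives a cheap signal that the flat format was sent to a 0.8.0+ server.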

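3.2 retain_mission: telling the LLM when to apply which tag

entity_labels defines which tags exist, but the LLM does not apply them automatically when extracting memories; you have to spell it out in retain_mission.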

You are a memory extractor for an AI assistant. Extract facts worth keeping long-term.

【STORE - worth retaining】
- business: monetization decisions, pricing strategies, finalized revenue directions
- infra: confirmed server/network/Docker configs, deployment plans, architectural decisions
- dev_tools: final Hermes/Skills/AI tool configurations, rule changes
- code: key design decisions, root causes and fixes
- content: publishing strategies, cross-platform distribution decisions
- research: research conclusions, market analysis results, tech selection decisions
- reflection: meta-cognitive errors, reflection journal entries of the form
  '我以为 X / 实际 Y / 根因 Z / 改进 W'

【IGNORE - do not store】
- Intermediate debugging on any topic, CLI trial-and-error, SSH attempts
- Draft iteration of articles
- "try this", "how about this", "still broken" mid-process operations
- Repeated troubleshooting steps, transient log output

【Rule】
- Is this turn searching for an answer or has the answer been decided? Former → skip. Latter → store.
- When unsure, skip.
- One fact = one complete conclusion. Do not store the derivation process.

【Fallback for meta-cognitive facts】
- If a fact describes the AI system itself (Hindsight engine, retain behavior, SOUL config,
  bank config, PATCH/DELETE side effects, ambiguity resolution, tag/label system) AND matches
  none of the 7 topics above, DEFAULT-tag it as topic:reflection. Do not skip the topic field.

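Why the Fallback section exists: in testing, the LLM left topic:reflection off memories about the AI system's own behavior, because its semantic reasoning treated them as system noise and skipped them. After adding the Fallback section with an explicit list, the missed tags stopped.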
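A simple way to notice missed tags is to audit extracted facts for a missing topic. The sketch below is a hypothetical client-side check, not a Hindsight feature: it assumes each fact is a dictionary with a `text` field and a `tags` list in the 0.8.0+ `key:value` format.

```python
TOPICS = {"business", "infra", "dev_tools", "code", "content", "research", "reflection"}


def topic_of(tags):
    """Return the topic value from a list of 'key:value' tags, or None if absent."""
    for tag in tags:
        key, sep, value = tag.partition(":")
        if sep and key == "topic" and value in TOPICS:
            return value
    return None


def find_untagged(facts):
    """Return the text of every fact that carries no valid topic tag."""
    return [fact["text"] for fact in facts if topic_of(fact.get("tags", [])) is None]


facts = [
    {"text": "PATCH on bank config can drop existing facts", "tags": ["stage:decision"]},
    {"text": "Use key authentication, not passwords", "tags": ["topic:infra", "stage:decision"]},
]

for text in find_untagged(facts):
    print("missing topic:", text)
```

Running a check like this before and after adding the Fallback section is one way to confirm that the prompt change actually closed the gap.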

3.3 observations_mission: isolating consolidation by topic

Group consolidation strictly by topic. Observations from different topics must never be merged.
Prefer many narrow observations over few broad ones.

What these two sentences do: business memories never get merged with infra memories. Without them, consolidation mashes all 250 tagged memories into a single lump.

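4. Changing config safely: always compare a baseline before and after PATCH

Hard-won lesson: when modifying the Hindsight Bank Config, a PATCH can silently trigger an internal data rebuild that wipes a large number of existing memories.

Measured data (2026-06-12)

Phase	Total facts	Notes
Before PATCH	229	Baseline
After PATCH (immediately)	59	Dropped by 170

Root cause: PATCHing entity_labels triggered Hindsight's internal data rebuild/cleanup logic, which is not documented.

The three-step safe PATCH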

# Step 1: capture a baseline before the PATCH
BEFORE=$(curl -s http://localhost:8888/v1/default/banks/hermes/stats | \
  python3 -c 'import json,sys; print(json.load(sys.stdin)["total_nodes"])')
echo "before: $BEFORE"

# Step 2: run the PATCH
curl -s -X PATCH http://localhost:8888/v1/default/banks/hermes/config \
  -H 'Content-Type: application/json' \
  --data @/tmp/patch.json

# Step 3: verify immediately after the PATCH
AFTER=$(curl -s http://localhost:8888/v1/default/banks/hermes/stats | \
  python3 -c 'import json,sys; print(json.load(sys.stdin)["total_nodes"])')
echo "after: $AFTER delta=$((AFTER - BEFORE))"
# ✅ Expected: delta=0 or delta>0 (the count only grows)
# ❌ If delta<0 → stop immediately and investigate

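5. The tag format switch between versions

After upgrading to Hindsight 0.8.0, the separator in tag strings changed from = to a colon (:):

Version	Format	Example
0.7.x	`topic=infra`	`["topic=infra", "stage=decision"]`
0.8.0+	`topic:infra`	`["topic:infra", "stage:decision"]`

Old data is not converted automatically. If both formats coexist in your memory store:

Searching topic:infra → only matches new data written by 0.8.0+
Searching topic=infra → only matches old data written by 0.7.x

The fix: try both formats when searching, and do not conclude "no data" just because one format returned zero hits.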
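The "try both formats" rule can be wrapped in a small helper. This is a sketch under assumptions: `search` stands for whatever tag-lookup call your client exposes (here a plain dictionary lookup plays that role), and `tag_variants`, `normalize_tag`, and `search_both` are names invented for the example.

```python
def tag_variants(key: str, value: str) -> list:
    """Return the tag in the 0.8.0+ format and the 0.7.x format, in that order."""
    return [f"{key}:{value}", f"{key}={value}"]


def normalize_tag(tag: str) -> str:
    """Rewrite a 0.7.x 'key=value' tag as 'key:value'; leave other tags untouched."""
    key, sep, value = tag.partition("=")
    if sep and ":" not in key:
        return f"{key}:{value}"
    return tag


def search_both(search, key: str, value: str) -> list:
    """Run the lookup once per tag format and concatenate the hits."""
    hits = []
    for tag in tag_variants(key, value):
        hits.extend(search(tag))
    return hits


# In-memory stand-in for a memory store holding both old and new data.
store = {
    "topic=infra": ["old: Docker container restart policy"],
    "topic:infra": ["new: Jetson AGX Orin runs the Hindsight container"],
}

print(search_both(lambda tag: store.get(tag, []), "topic", "infra"))
print(normalize_tag("stage=decision"))  # prints: stage:decision
```

`normalize_tag` is only useful if you later decide to migrate old records; the search helper works without touching stored data.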

6. A reusable configuration template

The configuration below can be pasted directly into your Hindsight Bank Config PATCH request (0.8.0+ format):

{
  "updates": {
    "entity_labels": {
      "topic": [
        "business",
        "infra",
        "dev_tools",
        "code",
        "content",
        "research",
        "reflection"
      ],
      "stage": ["decision", "process", "reference"]
    },
    "retain_mission": "You are a memory extractor for an AI assistant. Extract facts worth keeping long-term.\n\n【STORE - worth retaining】\n- business: monetization decisions, pricing strategies, finalized revenue directions\n- infra: confirmed server/network/Docker configs, deployment plans, architectural decisions\n- dev_tools: final Hermes/Skills/AI tool configurations, rule changes\n- code: key design decisions, root causes and fixes\n- content: publishing strategies, cross-platform distribution decisions\n- research: research conclusions, market analysis results, tech selection decisions\n- reflection: meta-cognitive errors, reflection journal entries of the form '我以为 X / 实际 Y / 根因 Z / 改进 W'\n\n【IGNORE - do not store】\n- Intermediate debugging, CLI trial-and-error, SSH attempts\n- Draft iteration of articles\n- \"try this\", \"how about this\", \"still broken\" mid-process operations\n- Repeated troubleshooting steps, transient log output\n\n【Rule】\n- Is this turn searching for an answer or has the answer been decided? Former → skip. Latter → store.\n- When unsure, skip.\n- One fact = one complete conclusion. Do not store the derivation process.\n\n【Fallback for meta-cognitive facts】\n- If a fact describes the AI system itself (Hindsight engine, retain behavior, SOUL config, bank config, PATCH/DELETE side effects) AND matches none of the 7 topics above, DEFAULT-tag it as topic:reflection. Do not skip the topic field.",
    "observations_mission": "Group consolidation strictly by topic. Observations from different topics must never be merged. Prefer many narrow observations over few broad ones."
  }
}

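7. Summary

Aspect	Old scheme (single dimension, 4 tags)	New scheme (two dimensions, 21 combinations)
Discrimination	0 (all memories lumped together)	21 combinations
consolidation	Merges indiscriminately	Isolated by topic
Lifecycle management	None	decision permanent / process 7 days / reference 30 days
Meta-cognitive reflection	Lost or mixed into code	Stored separately under topic:reflection
Reusability	Reconfigured every time	One template, reused everywhere

Core principle: more tags are not automatically better. The tags need to answer two independent questions: "what is this?" (topic) and "how long should it be kept?" (stage). Only two independent dimensions, combined orthogonally, give you useful discrimination.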

📌 In this series:

Your AI memory is rotting, and Hindsight's consolidation will not save you

This article: AI memory tag system design

Next: Hindsight 0.8.0 upgrade pitfalls (coming soon)