Giving an AI Agent a Memory System: Honcho + DeepSeek, Every Pitfall Recorded

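After two days of tinkering, I finally gave my AI the ability to "remember who I am". This post records the whole process of building a self-hosted Honcho memory service from scratch, including the DeepSeek compatibility fix, a way to download models from inside mainland China, and a locally deployed embedding model.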

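1. Why do you need a memory system?

The most annoying thing about using an AI agent (Claude, GPT, Hermes, and so on): every new session, it has no idea who you are.

Does any of this sound familiar?

Every time, you have to repeat "I'm a backend developer and I use Python"
The tool usage you taught it last time is forgotten the next time
Switch to a different agent and everything starts from zero

There are plenty of memory solutions out there. After comparing them, I picked Honcho: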

Solution	Cross-session	Cross-agent	Self-hosted	Cost
Local SQLite	✅	❌	✅	Free
Mem0	✅	❌	❌	Paid API
Honcho	✅	✅	✅	Only an LLM key

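Honcho's Peer model natively lets multiple agents share one set of user memories while each agent keeps its own behavior pattern, which is exactly what I needed.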
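To make the shared-memory idea concrete, here is a toy sketch in plain Python. It is not Honcho's API: the class names `UserMemory` and `AgentPeer` and their fields are made up for illustration, and the sketch assumes only what the paragraph above says (one memory per user, shared by several agents, each with its own behavior).

```python
from dataclasses import dataclass, field


@dataclass
class UserMemory:
    """Facts about one user, shared by every agent (hypothetical model)."""
    facts: list[str] = field(default_factory=list)

    def remember(self, fact: str) -> None:
        # Store each fact once, in the order it was learned.
        if fact not in self.facts:
            self.facts.append(fact)


@dataclass
class AgentPeer:
    """One agent: its own style, plus a reference to the shared memory."""
    name: str
    style: str
    memory: UserMemory

    def build_context(self) -> str:
        known = "; ".join(self.memory.facts) or "nothing yet"
        return f"[{self.name}, {self.style}] known about user: {known}"


shared = UserMemory()
coder = AgentPeer("coder", "terse", shared)
writer = AgentPeer("writer", "friendly", shared)

# A fact learned through one agent is visible to the other.
coder.memory.remember("backend developer, uses Python")
print(writer.build_context())
```

The point of the sketch is only the shape: both agents hold a reference to the same memory object, while `style` stays per agent.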

2. Architecture overview

┌─────────────────────────────────────────────────────────────────┐
│                          Hermes Agent                           │
│                (honcho_profile / honcho_search)                 │
└────────────────────────────────┬────────────────────────────────┘
                                 │ HTTP :8000
┌────────────────────────────────▼────────────────────────────────┐
│                     Honcho Server (Docker)                      │
│  ┌─────────┐  ┌─────────────────────┐  ┌─────────────────────┐  │
│  │   API   │  │       Deriver       │  │      Dialectic      │  │
│  │ FastAPI │  │ (memory extraction) │  │ (reasoned answers)  │  │
│  └────┬────┘  └──────────┬──────────┘  └──────────┬──────────┘  │
│       │                  │                        │             │
│  ┌────▼──────────────────▼────────────────────────▼──────────┐  │
│  │                    DeepSeek API (LLM)                     │  │
│  └───────────────────────────────────────────────────────────┘  │
│  ┌───────────────────────────────────────────────────────────┐  │
│  │          PostgreSQL + pgvector (vector storage)           │  │
│  └─────────────────────────────┬─────────────────────────────┘  │
└────────────────────────────────┼────────────────────────────────┘
                                 │
┌────────────────────────────────▼────────────────────────────────┐
│           Local embedding service (fastembed + ONNX)            │
│           jina-embeddings-v2-base-zh · 768 dimensions           │
│           port :8080                                            │
└─────────────────────────────────────────────────────────────────┘

3. Deploying Honcho

3.1 Clone and configure

git clone https://github.com/plastic-labs/honcho
cd honcho
cp .env.template .env

3.2 Configure DeepSeek as the LLM

Set the following in .env:

# LLM
LLM_OPENAI_API_KEY=sk-your-deepseek-key

# Deriver: memory extraction
DERIVER_MODEL_CONFIG__MODEL=deepseek-chat
DERIVER_MODEL_CONFIG__OVERRIDES__BASE_URL=https://api.deepseek.com/v1

# Dialectic: reasoned answers (all levels)
DIALECTIC_LEVELS__low__MODEL_CONFIG__MODEL=deepseek-chat
DIALECTIC_LEVELS__low__MODEL_CONFIG__OVERRIDES__BASE_URL=https://api.deepseek.com/v1

# Summary / Dream are configured the same way...
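Because every component repeats the same two keys, the lines can be generated rather than typed by hand. The helper below is a small sketch: `render_llm_env` is a hypothetical name, and it uses only the two prefixes that appear in the config above. Prefixes for other components are not listed here, so confirm them in .env.template before adding them.

```python
def render_llm_env(prefixes, model, base_url):
    """Return .env lines that point each config prefix at one model."""
    lines = []
    for prefix in prefixes:
        lines.append(f"{prefix}__MODEL={model}")
        lines.append(f"{prefix}__OVERRIDES__BASE_URL={base_url}")
    return lines


# Only the two prefixes shown in the article's .env excerpt.
PREFIXES = [
    "DERIVER_MODEL_CONFIG",
    "DIALECTIC_LEVELS__low__MODEL_CONFIG",
]

env_lines = render_llm_env(PREFIXES, "deepseek-chat", "https://api.deepseek.com/v1")
for line in env_lines:
    print(line)
```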

3.3 Start

docker compose up -d

Four containers start: api, deriver, database (pgvector), and redis.

4. Pitfall #1: DeepSeek does not support `json_schema`

After startup, the Deriver logs were flooded with errors:

BadRequestError: This response_format type is unavailable now

Cause: when it calls the LLM, Honcho's Deriver uses OpenAI's json_schema structured-output mode. DeepSeek's API does not support json_schema, only json_object.

Fix: patch the Honcho source file src/llm/backends/openai.py and add a fallback to the _create_structured_response method:

# Needs `from openai import BadRequestError` at the top of the module.
async def _create_structured_response(self, *, params, response_format):
    # First attempt: OpenAI-style json_schema structured output.
    structured_params = dict(params)
    structured_params["response_format"] = {
        "type": "json_schema",
        "json_schema": {
            "name": response_format.__name__,
            "schema": response_format.model_json_schema(),
        },
    }
    try:
        return await self._client.chat.completions.create(**structured_params)
    except BadRequestError:
        # DeepSeek doesn't support json_schema, fall back to json_object.
        # json_object only guarantees valid JSON, not the schema itself.
        structured_params["response_format"] = {"type": "json_object"}
        return await self._client.chat.completions.create(**structured_params)

After the change, rebuild the images:

docker compose build api deriver
docker compose up -d --force-recreate api deriver

✅ The Deriver works now.
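The fallback pattern can be tried offline before touching the Honcho source. The sketch below is an illustration only: `FakeDeepSeek` and `FakeBadRequest` are stand-ins invented for this demo (not part of the openai package or Honcho), and the required-key check at the end is an extra safeguard I am assuming is useful, since json_object mode does not enforce a schema.

```python
import asyncio
import json


class FakeBadRequest(Exception):
    """Stand-in for openai.BadRequestError in this offline demo."""


class FakeDeepSeek:
    """Hypothetical client that rejects json_schema, like the log above."""

    def __init__(self):
        self.calls = []

    async def create(self, **params):
        kind = params["response_format"]["type"]
        self.calls.append(kind)
        if kind == "json_schema":
            raise FakeBadRequest("This response_format type is unavailable now")
        return '{"facts": ["uses Python"]}'


async def structured_call(client, params, schema_name, schema):
    """Try json_schema first, then retry once with json_object."""
    attempt = dict(params)
    attempt["response_format"] = {
        "type": "json_schema",
        "json_schema": {"name": schema_name, "schema": schema},
    }
    try:
        raw = await client.create(**attempt)
    except FakeBadRequest:
        attempt["response_format"] = {"type": "json_object"}
        raw = await client.create(**attempt)
    data = json.loads(raw)
    # json_object only guarantees valid JSON, so check required keys here.
    missing = [key for key in schema.get("required", []) if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data


SCHEMA = {"type": "object", "required": ["facts"]}
client = FakeDeepSeek()
result = asyncio.run(
    structured_call(client, {"model": "deepseek-chat"}, "Facts", SCHEMA)
)
print(client.calls, result)
```

One trade-off of this design is that every structured call against DeepSeek pays for one failed request first. Caching the outcome of the first failure would avoid that, at the cost of a little more state.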

5. Pitfall #2: Embedding API returns 401

With the Deriver fixed, a new error appeared:

openai.AuthenticationError: Error code: 401
Incorrect API key provided: sk-87225...
You can find your API key at https://platform.openai.com/account/api-keys

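Cause: DeepSeek has no embedding API! Honcho defaults to OpenAI's text-embedding-3-small, so it sent the DeepSeek key to api.openai.com.

Options compared:

Option	Speed	Cost	Reachable from mainland China
OpenAI Embedding	Fast	Paid	❌ Needs a proxy
SiliconFlow	Fast	Free, 2000 requests/day	✅
Local ONNX model	Very fast	Free	✅

I ended up going with local deployment.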

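6. Pitfall #3: Downloading HuggingFace models from mainland China

Loading the model with fastembed hung on the download and never moved: HuggingFace is blocked in mainland China.

Solution: use an HF mirror site: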

export HF_ENDPOINT=https://hf-mirror.com
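The same setting can be applied from inside a Python script. A minimal sketch, assuming (as I understand it) that huggingface_hub reads HF_ENDPOINT when it is imported, so the variable has to be set before the import; `use_hf_mirror` and `download_model` are hypothetical helper names.

```python
import os


def use_hf_mirror(endpoint: str = "https://hf-mirror.com") -> str:
    """Set HF_ENDPOINT unless the caller already chose one."""
    os.environ.setdefault("HF_ENDPOINT", endpoint)
    return os.environ["HF_ENDPOINT"]


def download_model(repo_id: str) -> str:
    """Download a model snapshot through the configured endpoint."""
    use_hf_mirror()
    # Imported late on purpose: the endpoint must be set before this import.
    from huggingface_hub import snapshot_download
    return snapshot_download(repo_id)


# Example (needs network access and huggingface_hub installed):
# print(download_model("Qdrant/bge-small-zh-v1.5"))
```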

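A pitfall within the pitfall: model file symlinks

After downloading the model with huggingface_hub, fastembed could not read it. The HF cache organizes files with symlinks (model.onnx -> ../../blobs/xxx), so a plain cp -r copies the links themselves and they end up broken.

The right way: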

# Download the model
HF_ENDPOINT=https://hf-mirror.com python3 -c "
from huggingface_hub import snapshot_download
snapshot_download('Qdrant/bge-small-zh-v1.5')
"

# Copy into the fastembed cache (-L dereferences symlinks)
mkdir -p /tmp/fastembed_cache/models--Qdrant--bge-small-zh-v1.5/snapshots/
cp -rL ~/.cache/huggingface/hub/models--Qdrant--bge-small-zh-v1.5/snapshots/* \
      /tmp/fastembed_cache/models--Qdrant--bge-small-zh-v1.5/snapshots/
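If the copy step lives in a Python setup script instead of the shell, `shutil.copytree` can do the same job as `cp -rL`. This is a sketch under that assumption; `copy_resolving_symlinks` is a name invented here, and the paths in the usage comment are the ones from the commands above.

```python
import shutil
from pathlib import Path


def copy_resolving_symlinks(src: Path, dst: Path) -> list[Path]:
    """Copy src into dst, replacing symlinks with the files they point to."""
    # symlinks=False (the default) copies link targets, like `cp -rL`.
    shutil.copytree(src, dst, symlinks=False, dirs_exist_ok=True)
    # Report any symlink that survived, which would signal a problem.
    return [p for p in dst.rglob("*") if p.is_symlink()]


# Example with the article's paths:
# name = "models--Qdrant--bge-small-zh-v1.5"
# src = Path.home() / ".cache/huggingface/hub" / name / "snapshots"
# dst = Path("/tmp/fastembed_cache") / name / "snapshots"
# assert copy_resolving_symlinks(src, dst) == []
```

Returning the list of leftover symlinks makes the failure mode from this section visible right away instead of surfacing later as a model-loading error.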

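Model choice

I started with bge-small-zh-v1.5 (512 dimensions) and later upgraded to jina-embeddings-v2-base-zh (768 dimensions):

Property	bge-small-zh	jina-v2-base-zh
Size	90 MB	~500 MB
Dimensions	512	768
Chinese optimization	Average	✅ Purpose-built
Load time	0.2 s	0.5 s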

Embedding service code

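import uvicorn
from fastapi import FastAPI
from fastembed import TextEmbedding
from pydantic import BaseModel

MODEL_NAME = "jinaai/jina-embeddings-v2-base-zh"
# Load the ONNX model once at startup.
model = TextEmbedding(model_name=MODEL_NAME)
app = FastAPI()

class EmbeddingRequest(BaseModel):
    # OpenAI-style payload: a single string or a list of strings.
    input: str | list[str]
    model: str = MODEL_NAME

@app.post("/v1/embeddings")
def create_embeddings(req: EmbeddingRequest):
    # Plain def: FastAPI runs it in a worker thread, so the blocking
    # embed() call does not stall the event loop.
    texts = [req.input] if isinstance(req.input, str) else req.input
    embeddings = list(model.embed(texts))
    return {
        "object": "list",
        "data": [{"object": "embedding", "index": i, "embedding": emb.tolist()}
                 for i, emb in enumerate(embeddings)],
        "model": MODEL_NAME,
        # Token counts are not tracked; zeros keep the response shape.
        "usage": {"prompt_tokens": 0, "total_tokens": 0},
    }

if __name__ == "__main__":
    # Lets `python3 embedding_server.py` start the server on port 8080.
    uvicorn.run(app, host="0.0.0.0", port=8080)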

Start it:

HF_ENDPOINT=https://hf-mirror.com python3 embedding_server.py
# Listens on 0.0.0.0:8080
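Before pointing Honcho at the service, it helps to call it directly and confirm the vector dimension. The client below is a sketch using only the standard library; it assumes the service is reachable on localhost at the port noted above, and the helper names (`build_request`, `parse_vectors`, `embed`) are my own.

```python
import json
import urllib.request

# Assumption: the service runs on this machine, on the port noted above.
EMBEDDING_URL = "http://localhost:8080/v1/embeddings"


def build_request(texts: list[str]) -> urllib.request.Request:
    """Build an OpenAI-style embeddings POST request."""
    body = json.dumps({"input": texts}).encode("utf-8")
    return urllib.request.Request(
        EMBEDDING_URL,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_vectors(payload: dict, expected_dim: int = 768) -> list[list[float]]:
    """Return vectors ordered by index, checking the dimension."""
    items = sorted(payload["data"], key=lambda item: item["index"])
    vectors = [item["embedding"] for item in items]
    for vec in vectors:
        if len(vec) != expected_dim:
            raise ValueError(f"expected {expected_dim} dims, got {len(vec)}")
    return vectors


def embed(texts: list[str]) -> list[list[float]]:
    """Send texts to the local service and return their vectors."""
    with urllib.request.urlopen(build_request(texts), timeout=30) as resp:
        return parse_vectors(json.load(resp))


# Example (needs the server running):
# print(len(embed(["hello"])[0]))
```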

Point Honcho at the local embedding service

EMBEDDING_MODEL_CONFIG__TRANSPORT=openai
EMBEDDING_MODEL_CONFIG__MODEL=jinaai/jina-embeddings-v2-base-zh
EMBEDDING_MODEL_CONFIG__OVERRIDES__BASE_URL=http://host.docker.internal:8080/v1
EMBEDDING_MODEL_CONFIG__OVERRIDES__API_KEY=local

7. Final result

Self-check results

=== Docker containers ===
honcho-api-1:       Up  ·  healthy
honcho-deriver-1:   Up  ·  zero errors
honcho-database-1:  Up  ·  healthy
honcho-redis-1:     Up  ·  healthy

=== Local embedding ===
{"status":"ok"}  ·  768 dimensions  ·  process uses 11% of memory

=== Memory features ===
honcho_profile   →  OK
honcho_search    →  OK
honcho_reasoning →  OK

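Resource usage

Component	Memory	Notes
Jina Embedding	~900 MB	ONNX runtime + model
PostgreSQL	~200 MB	pgvector
Redis	~50 MB	Message queue
API + Deriver	~200 MB	Python processes
Total	~1.4 GB	Plenty for a personal dev machine

Cost

DeepSeek API: memory extraction + reasoning, roughly ¥0.5-2 per day in everyday use
Embedding: runs locally, free
Total: about ¥15-60 per month, cheaper than the SaaS options I compared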
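The totals above are simple sums, and it is easy to rerun them with your own numbers. A quick sketch, assuming a 30-day month and the per-component figures from the resource table:

```python
DAILY_COST_CNY = (0.5, 2.0)   # DeepSeek API range per day, from the estimate above
DAYS_PER_MONTH = 30           # assumption: a 30-day month

monthly = tuple(cost * DAYS_PER_MONTH for cost in DAILY_COST_CNY)
print(f"monthly: ¥{monthly[0]:.0f}-{monthly[1]:.0f}")

# Approximate memory per component in MB, from the resource table.
MEMORY_MB = {
    "Jina Embedding": 900,
    "PostgreSQL": 200,
    "Redis": 50,
    "API + Deriver": 200,
}
total_mb = sum(MEMORY_MB.values())
print(f"memory: {total_mb} MB (about {total_mb / 1000:.2f} GB)")
```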

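8. Summary

The key takeaways from this round of tinkering:

DeepSeek works, but json_schema needs a fix: adding a json_object fallback is enough
Running embeddings locally was the right call: free, low latency, unaffected by the network
An HF mirror is a must: HF_ENDPOINT=https://hf-mirror.com solved the download problems
The symlink trap: use cp -rL, not cp -r
The Jina model suits Chinese better than BGE: 768 dimensions and better results

Now every time I open Hermes, it remembers my preferences, project background, and habits. The AI finally no longer has "the memory of a goldfish" 🐟.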
