Lesson 34: Serialization, Caching, and Storage Systems

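Learning Objectives

Study in depth the serialization mechanism of the Serializable base class, the dynamic loading system (load/index.ts), the LLM cache system, the storage implementation layer (InMemoryStore / LocalFileStore / EncoderBackedStore), and the push/pull capabilities of LangChain Hub.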

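34.1 Why Serialization Is Needed

Components in LangChain.js (prompts, chains, model configurations, and so on) need to support:

Persistence: save the configuration as JSON and restore it directly next time
Transport: pass component definitions between client and server
Hub sharing: push prompts to LangChain Hub so a team can reuse them
Tracing: record structured information about components in LangSmith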

34.2 The Serializable Base Class

Source location: libs/langchain-core/src/load/serializable.ts:97

export abstract class Serializable implements SerializableInterface {
  lc_serializable = false;  // not serializable by default; subclasses opt in

  lc_kwargs: SerializedFields;  // record of the constructor arguments

  abstract lc_namespace: string[];  // namespace, e.g. ["langchain_core", "prompts"]

  static lc_name(): string {  // class name (overridable so it survives code minification)
    return this.name;
  }

  get lc_id(): string[] {  // final identifier: [...namespace, className]
    return [...this.lc_namespace, get_lc_unique_name(this.constructor as typeof Serializable)];
  }

  get lc_secrets(): { [key: string]: string } | undefined { return undefined; }
  get lc_attributes(): SerializedFields | undefined { return undefined; }
  get lc_aliases(): { [key: string]: string } | undefined { return undefined; }
  get lc_serializable_keys(): string[] | undefined { return undefined; }

  toJSON(): Serialized {
    if (!this.lc_serializable) return this.toJSONNotImplemented();
    // serialize kwargs, replacing secrets with sentinel values
    // ...
    return { lc: 1, type: "constructor", id: this.lc_id, kwargs: escapedKwargs };
  }
}

Serialized output format (kwargs keys are written in snake_case):

{
  "lc": 1,
  "type": "constructor",
  "id": ["langchain_core", "prompts", "chat", "ChatPromptTemplate"],
  "kwargs": {
    "messages": [/* ... */],
    "input_variables": ["question"]
  }
}

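Secret handling: sensitive values such as API keys are never serialized as plain text:

// a subclass declares its secret mapping (constructor field -> environment variable)
get lc_secrets() {
  return { "openaiApiKey": "OPENAI_API_KEY" };
}

// after serialization the secret is replaced by a sentinel value
// "openai_api_key": { "lc": 1, "type": "secret", "id": ["OPENAI_API_KEY"] }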
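To make the mechanism concrete, here is a minimal, self-contained sketch of how a toJSON()-style method can assemble the constructor record and swap secrets for sentinels. It is a simplified re-implementation written for study, not LangChain's code. MiniSerializable, FakeChatModel and toSnakeCase are hypothetical names. The real implementation also handles aliases, nested and inherited secrets, and escaping.

```typescript
// Hypothetical, simplified re-implementation for study (not LangChain's code)
type SecretSentinel = { lc: 1; type: "secret"; id: string[] };
type ConstructorRecord = {
  lc: 1;
  type: "constructor";
  id: string[];
  kwargs: Record<string, unknown>;
};

// camelCase -> snake_case, the style used for kwargs keys in the output
function toSnakeCase(key: string): string {
  return key.replace(/[A-Z]/g, (ch) => `_${ch.toLowerCase()}`);
}

abstract class MiniSerializable {
  abstract lcNamespace: string[];
  lcKwargs: Record<string, unknown>; // constructor arguments, kept for rebuilding

  constructor(kwargs: Record<string, unknown>) {
    this.lcKwargs = kwargs;
  }

  // Overridable class name; hard-coding it keeps ids stable under minification
  static lcName(): string {
    return this.name;
  }

  // Maps constructor-argument names to environment-variable names
  get lcSecrets(): Record<string, string> {
    return {};
  }

  get lcId(): string[] {
    const ctor = this.constructor as typeof MiniSerializable;
    return [...this.lcNamespace, ctor.lcName()];
  }

  toJSON(): ConstructorRecord {
    const secrets = this.lcSecrets;
    const kwargs: Record<string, unknown> = {};
    for (const [key, value] of Object.entries(this.lcKwargs)) {
      if (Object.prototype.hasOwnProperty.call(secrets, key)) {
        // Replace the secret value with a sentinel that only names the env variable
        const sentinel: SecretSentinel = { lc: 1, type: "secret", id: [secrets[key]] };
        kwargs[toSnakeCase(key)] = sentinel;
      } else {
        kwargs[toSnakeCase(key)] = value;
      }
    }
    return { lc: 1, type: "constructor", id: this.lcId, kwargs };
  }
}

class FakeChatModel extends MiniSerializable {
  lcNamespace = ["my_pkg", "chat_models"];

  static lcName(): string {
    return "FakeChatModel";
  }

  get lcSecrets(): Record<string, string> {
    return { openaiApiKey: "OPENAI_API_KEY" };
  }
}

const model = new FakeChatModel({ modelName: "demo", openaiApiKey: "sk-123" });
// The printed JSON contains the sentinel, never the value "sk-123"
console.log(JSON.stringify(model.toJSON(), null, 2));
```

The important property is that the output only records which environment variable to consult later. The actual key stays outside the serialized document.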

34.3 The Dynamic Loading System

Source location: libs/langchain-core/src/load/index.ts

The load() function restores serialized JSON into LangChain object instances:

export async function load<T>(
  text: string,                              // JSON string
  secretsMap?: Record<string, string>,       // map of secret values
  optionalImportsMap?: OptionalImportMap,    // optional imports map
  importMap?: Record<string, any>,           // imports map
  secretsFromEnv?: boolean                   // whether to read secrets from environment variables
): Promise<T>

Security warning: deserialization instantiates classes and calls their constructors. Never call load() on untrusted input.

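Loading steps:

Parse the JSON and validate the lc version number and the type field
Look up the class that matches the id array (via import_map.ts and import_constants.ts)
Resolve secrets: fill them in from secretsMap or from environment variables
Undo escaping: restore plain objects wrapped in __lc_escaped__
Call the constructor with the kwargs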
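The steps above can be sketched as a small recursive reviver. This is a deliberately simplified, hypothetical loader, not the library's load(). The names loadSketch, revive and PromptLike, and the Map-based whitelist, are illustrative assumptions. It shows two key ideas: only classes present in an explicit import map can be constructed, and secret sentinels are resolved from a map the caller supplies. Version checks and escape handling are left out.

```typescript
// Hypothetical, simplified loader for study (not LangChain's load())
type Ctor = new (kwargs: Record<string, unknown>) => unknown;

function isRecord(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

function revive(node: unknown, importMap: Map<string, Ctor>, secrets: Record<string, string>): unknown {
  if (Array.isArray(node)) return node.map((item) => revive(item, importMap, secrets));
  if (!isRecord(node)) return node;

  // Secret sentinel -> value supplied by the caller
  if (node.lc === 1 && node.type === "secret" && Array.isArray(node.id)) {
    const name = String(node.id[0]);
    if (!Object.prototype.hasOwnProperty.call(secrets, name)) {
      throw new Error(`Missing secret: ${name}`);
    }
    return secrets[name];
  }

  // Constructor record: only whitelisted ids can be instantiated
  if (node.lc === 1 && node.type === "constructor" && Array.isArray(node.id)) {
    const key = node.id.join("/");
    const ctor = importMap.get(key); // a Map avoids prototype keys like "constructor"
    if (ctor === undefined) throw new Error(`Class not in import map: ${key}`);
    const kwargs = revive(node.kwargs ?? {}, importMap, secrets);
    return new ctor(isRecord(kwargs) ? kwargs : {});
  }

  // Any other object: revive its fields recursively
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(node)) out[k] = revive(v, importMap, secrets);
  return out;
}

// Parse the text, then revive the tree
function loadSketch<T>(text: string, importMap: Map<string, Ctor>, secrets: Record<string, string> = {}): T {
  return revive(JSON.parse(text), importMap, secrets) as T;
}

class PromptLike {
  template: string;
  apiKey: string;
  constructor(kwargs: Record<string, unknown>) {
    this.template = String(kwargs["template"]);
    this.apiKey = String(kwargs["api_key"]);
  }
}

const importMap = new Map<string, Ctor>([["my_pkg/prompts/PromptLike", PromptLike]]);
const text = JSON.stringify({
  lc: 1,
  type: "constructor",
  id: ["my_pkg", "prompts", "PromptLike"],
  kwargs: { template: "Hi {name}", api_key: { lc: 1, type: "secret", id: ["MY_KEY"] } },
});
const restored = loadSketch<PromptLike>(text, importMap, { MY_KEY: "s3cret" });
```

Even in this toy version, the whitelist is the security boundary: an id that is not in the map is rejected before any constructor runs.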

import_map.ts is the registry of serializable classes in langchain-core (simplified excerpt):

// libs/langchain-core/src/load/import_map.ts
export { ChatPromptTemplate } from "../prompts/chat.js";
export { AIMessage, HumanMessage } from "../messages/index.js";
// ...

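Injection protection: during serialization, any plain object that contains an lc key is escaped as { "__lc_escaped__": {...} }. On deserialization it is restored as a plain object and is never instantiated as a LangChain object.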
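A minimal sketch of the escaping idea, assuming only the `__lc_escaped__` wrapper key described above. escapeValue and unescapeValue are hypothetical helper names. The real implementation may differ in which values it wraps and where it recurses.

```typescript
// Hypothetical helpers illustrating the escaping idea (not LangChain's code)
const ESCAPE_KEY = "__lc_escaped__";

function isPlainObject(v: unknown): v is Record<string, unknown> {
  return typeof v === "object" && v !== null && !Array.isArray(v);
}

// Serialization side: wrap user data that could be mistaken for an lc record.
// Objects that already contain ESCAPE_KEY are wrapped too, so unescaping is unambiguous.
function escapeValue(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(escapeValue);
  if (!isPlainObject(value)) return value;
  if ("lc" in value || ESCAPE_KEY in value) return { [ESCAPE_KEY]: value };
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(value)) out[k] = escapeValue(v);
  return out;
}

// Deserialization side: an escaped wrapper is returned verbatim as plain data,
// so nothing inside it is ever treated as a constructor record
function unescapeValue(value: unknown): unknown {
  if (Array.isArray(value)) return value.map(unescapeValue);
  if (!isPlainObject(value)) return value;
  const keys = Object.keys(value);
  if (keys.length === 1 && keys[0] === ESCAPE_KEY) return value[ESCAPE_KEY];
  const out: Record<string, unknown> = {};
  for (const [k, v] of Object.entries(value)) out[k] = unescapeValue(v);
  return out;
}

// User metadata that happens to look like a serialized constructor call
const metadata = {
  note: "hi",
  payload: { lc: 1, type: "constructor", id: ["some_pkg", "Evil"], kwargs: {} },
};
const stored = escapeValue(metadata);
const roundTripped = unescapeValue(stored);
```

Escaping on write and unwrapping before any class lookup on read means attacker-controlled metadata can round-trip as data, but cannot become a constructor call.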

34.4 The Caching System

Source location: libs/langchain-core/src/caches/index.ts

LLM calls are expensive (latency plus token cost). The cache system stores generation results keyed by (prompt + llmKey):

export abstract class BaseCache<T = Generation[]> {
  protected keyEncoder: HashKeyEncoder = defaultHashKeyEncoder;

  abstract lookup(prompt: string, llmKey: string): Promise<T | null>;
  abstract update(prompt: string, llmKey: string, value: T): Promise<void>;

  // replace the key-encoding function with a custom one
  makeDefaultKeyEncoder(keyEncoderFn: HashKeyEncoder): void {
    this.keyEncoder = keyEncoderFn;
  }
}

InMemoryCache：

export class InMemoryCache<T = Generation[]> extends BaseCache<T> {
  private cache: Map<string, T>;

  constructor(map?: Map<string, T>) {
    super();
    this.cache = map ?? new Map<string, T>();
  }

  lookup(prompt: string, llmKey: string): Promise<T | null> {
    return Promise.resolve(this.cache.get(this.keyEncoder(prompt, llmKey)) ?? null);
  }

  async update(prompt: string, llmKey: string, value: T): Promise<void> {
    this.cache.set(this.keyEncoder(prompt, llmKey), value);
  }

  // Global shared cache: each call returns a new instance backed by the same module-level GLOBAL_MAP
  static global(): InMemoryCache {
    return new InMemoryCache(GLOBAL_MAP);
  }
}

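Cache key strategy: by default the key is the SHA-256 hash of prompt + "_" + llmKey. This gives every key the same length and makes hash collisions practically negligible.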
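Here is a sketch of such a key encoder using Node's built-in crypto module, assuming the `prompt + "_" + llmKey` join described above. hashKey is a hypothetical name. The second check shows why SHA-256 alone does not make keys unique. The "_" join itself is ambiguous, so in principle two different (prompt, llmKey) pairs can produce the same string before hashing.

```typescript
import { createHash } from "node:crypto";

// Join the parts with "_" and hash with SHA-256 -> 64 hex characters
function hashKey(...parts: string[]): string {
  return createHash("sha256").update(parts.join("_")).digest("hex");
}

const k1 = hashKey("hello", "model=gpt");
console.log(k1.length); // 64

// Ambiguity of the join: both pairs hash the same string "a_b_c"
console.log(hashKey("a_b", "c") === hashKey("a", "b_c")); // true
```

For the same llmKey the ambiguity cannot occur, because equal joined strings then imply equal prompts. That is the common case of one model instance with a fixed configuration.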

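Usage:

import { InMemoryCache } from "@langchain/core/caches";
import { ChatOpenAI } from "@langchain/openai";

const model = new ChatOpenAI({
  cache: InMemoryCache.global(),  // enable the global cache
});

// A second call with the same prompt (and the same model parameters) hits the cache
const result1 = await model.invoke("Hello");  // calls the LLM
const result2 = await model.invoke("Hello");  // cache hit, no LLM call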
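Behind each call is a cache-aside pattern: try lookup(), and only on a miss call the model and then update(). The sketch below uses a hypothetical cachedGenerate helper, a Map-based cache and a fake model function in place of a real LLM. In LangChain this logic lives inside the base language model classes, and llmKey is derived from the model's call parameters.

```typescript
// Simplified stand-in for BaseCache
interface SimpleCache<T> {
  lookup(prompt: string, llmKey: string): Promise<T | null>;
  update(prompt: string, llmKey: string, value: T): Promise<void>;
}

// Map-backed cache; keys are joined without hashing, for brevity
class MapCache<T> implements SimpleCache<T> {
  private map = new Map<string, T>();
  async lookup(prompt: string, llmKey: string): Promise<T | null> {
    return this.map.get(`${prompt}_${llmKey}`) ?? null;
  }
  async update(prompt: string, llmKey: string, value: T): Promise<void> {
    this.map.set(`${prompt}_${llmKey}`, value);
  }
}

// Hypothetical helper: consult the cache before calling the model
async function cachedGenerate(
  cache: SimpleCache<string>,
  llmKey: string,
  prompt: string,
  callModel: (prompt: string) => Promise<string>
): Promise<string> {
  const hit = await cache.lookup(prompt, llmKey);
  if (hit !== null) return hit; // cache hit: no model call
  const result = await callModel(prompt); // cache miss: call the model
  await cache.update(prompt, llmKey, result);
  return result;
}

// Fake model that counts how often it is actually invoked
let modelCalls = 0;
const fakeModel = async (p: string): Promise<string> => {
  modelCalls += 1;
  return `echo: ${p}`;
};
```

Calling cachedGenerate twice with the same prompt and llmKey increments modelCalls only once. Changing llmKey (for example a different temperature) produces a miss.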

Serialization helpers, which convert generations to and from the JSON-friendly StoredGeneration form:

export function serializeGeneration(generation: Generation): StoredGeneration {
  return {
    text: generation.text,
    message: (generation as ChatGeneration).message?.toDict(),
  };
}

export function deserializeStoredGeneration(stored: StoredGeneration): Generation {
  return {
    text: stored.text,
    message: stored.message ? mapStoredMessageToChatMessage(stored.message) : undefined,
  };
}
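These helpers matter when a cache keeps its values outside the process. There a message object cannot be stored as-is: plain JSON would drop its class and methods. Below is a minimal sketch of that pattern. The types are simplified, hypothetical stand-ins (Msg, StoredMsg, Gen, StoredGen, StringBackedCache) for the message and generation types and for a string-only backend such as a remote key-value store.

```typescript
// Simplified, hypothetical stand-ins for message / stored message / generation types
type StoredMsg = { type: string; data: { content: string } };

class Msg {
  type: string;
  content: string;
  constructor(type: string, content: string) {
    this.type = type;
    this.content = content;
  }
  // Plain-JSON form, in the spirit of message.toDict()
  toDict(): StoredMsg {
    return { type: this.type, data: { content: this.content } };
  }
}

type Gen = { text: string; message?: Msg };
type StoredGen = { text: string; message?: StoredMsg };

function serializeGen(gen: Gen): StoredGen {
  const out: StoredGen = { text: gen.text };
  if (gen.message !== undefined) out.message = gen.message.toDict();
  return out;
}

function deserializeGen(stored: StoredGen): Gen {
  return stored.message !== undefined
    ? { text: stored.text, message: new Msg(stored.message.type, stored.message.data.content) }
    : { text: stored.text };
}

// A cache whose backend can only hold strings; `key` is an already-encoded cache key
class StringBackedCache {
  private backend = new Map<string, string>();

  async lookup(key: string): Promise<Gen[] | null> {
    const raw = this.backend.get(key);
    if (raw === undefined) return null;
    return (JSON.parse(raw) as StoredGen[]).map(deserializeGen);
  }

  async update(key: string, value: Gen[]): Promise<void> {
    this.backend.set(key, JSON.stringify(value.map(serializeGen)));
  }
}
```

After a round trip, lookup() returns real Msg instances again rather than plain objects. That is the job serializeGeneration / deserializeStoredGeneration perform for LangChain's own types.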

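34.5 The Storage Implementation Layer

Source location: libs/langchain/src/storage/

The higher-level langchain package provides concrete implementations of BaseStore. The BaseStore abstraction itself, together with InMemoryStore, is exported from @langchain/core/stores.

LocalFileStore: file-system storage

Source location: libs/langchain/src/storage/file_system.ts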

import * as fs from "node:fs/promises";
import * as path from "node:path";
import { BaseStore } from "@langchain/core/stores";

export class LocalFileStore extends BaseStore<string, Uint8Array> {
  rootPath: string;

  // Security check: guard against path traversal
  private getFullPath(key: string): string {
    if (!/^[a-zA-Z0-9_.\-/]+$/.test(key)) {
      throw new Error(`Invalid characters in key: ${key}`);
    }
    const fullPath = path.resolve(this.rootPath, `${key}.txt`);
    if (!fullPath.startsWith(path.resolve(this.rootPath))) {
      throw new Error("Invalid key: path traversal detected");
    }
    return fullPath;
  }

  // Atomic write: write a temporary file first, then rename it over the target
  private async writeFileAtomically(content: Uint8Array, fullPath: string) {
    const tempPath = `${fullPath}.${Date.now()}-${Math.random().toString(16).slice(2)}.tmp`;
    await fs.writeFile(tempPath, content);
    await fs.rename(tempPath, fullPath);
  }

  // Factory method (creates the directory and cleans up leftover temporary files)
  static async fromPath(rootPath: string): Promise<LocalFileStore> { /* ... */ }

  // mget / mset / mdelete / yieldKeys omitted
}
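One subtlety in the prefix check shown above: `startsWith(path.resolve(rootPath))` on its own also accepts sibling directories whose names begin with the root's name. For the root `/data/store`, the key `../store-evil/x` passes the character whitelist and resolves to `/data/store-evil/x.txt`. A stricter containment test based on Node's `path.relative` rejects that case. It is sketched here with a hypothetical isInsideRoot helper; the outputs in the comments assume a POSIX system.

```typescript
import * as path from "node:path";

// Hypothetical helper: does `${key}.txt` resolve to a path inside rootPath?
function isInsideRoot(rootPath: string, key: string): boolean {
  const root = path.resolve(rootPath);
  const full = path.resolve(root, `${key}.txt`);
  const rel = path.relative(root, full);
  // Outside the root, `rel` is ".." or starts with "../" (or is absolute on
  // Windows when the drive differs); inside, it is a plain relative path
  return rel !== "" && rel !== ".." && !rel.startsWith(`..${path.sep}`) && !path.isAbsolute(rel);
}

const root = "/data/store";
const naive = path.resolve(root, "../store-evil/x.txt").startsWith(path.resolve(root));
console.log(naive); // true: the sibling directory passes a bare prefix check
console.log(isInsideRoot(root, "../store-evil/x")); // false
console.log(isInsideRoot(root, "docs/a")); // true
```

An equivalent fix is to compare against `path.resolve(rootPath) + path.sep` instead of the bare root path.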

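Design highlights:

Path traversal protection: strict validation of both the key characters and the resolved path
Atomic writes: a temporary file plus rename, so an interrupted write cannot corrupt the stored data
Per-key lock (withKeyLock): serializes concurrent operations on the same key through a promise chain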
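The per-key lock idea can be sketched as a map from each key to the tail of a promise chain, where every new operation on a key waits for the previous one to settle. KeyedMutex and its withKeyLock method are a hypothetical standalone version, not the library's implementation.

```typescript
// Hypothetical per-key lock built on promise chaining
class KeyedMutex {
  private tails = new Map<string, Promise<void>>();

  async withKeyLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
    const previous = this.tails.get(key) ?? Promise.resolve();
    // Start fn only after the previous operation on this key has settled
    const run = previous.then(() => fn());
    // The stored tail never rejects, so one failure does not block later callers
    const tail = run.then(
      () => undefined,
      () => undefined
    );
    this.tails.set(key, tail);
    try {
      return await run;
    } finally {
      // Drop the entry if nothing newer has been queued behind this operation
      if (this.tails.get(key) === tail) this.tails.delete(key);
    }
  }
}
```

Operations on different keys never wait for each other, because each key has its own chain. Cleaning up finished entries keeps the map from growing with every key ever touched.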

EncoderBackedStore: the encoder adapter layer

Source location: libs/langchain/src/storage/encoder_backed.ts

export class EncoderBackedStore<K, V, SerializedType = any> extends BaseStore<K, V> {
  store: BaseStore<string, SerializedType>;         // underlying store
  keyEncoder: (key: K) => string;                   // key encoding
  valueSerializer: (value: V) => SerializedType;    // value serialization
  valueDeserializer: (value: SerializedType) => V;  // value deserialization

  async mget(keys: K[]): Promise<(V | undefined)[]> {
    const encodedKeys = keys.map(this.keyEncoder);
    const values = await this.store.mget(encodedKeys);
    return values.map((v) => v === undefined ? undefined : this.valueDeserializer(v));
  }

  async mset(keyValuePairs: [K, V][]): Promise<void> {
    const encodedPairs = keyValuePairs.map(
      ([key, value]) => [this.keyEncoder(key), this.valueSerializer(value)] as [string, SerializedType]
    );
    return this.store.mset(encodedPairs);
  }
}
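To see the adapter pattern in isolation, here is a standalone sketch. A minimal key-value interface stands in for BaseStore. SimpleStore, MemoryStore and EncodedStore are hypothetical names, and the real classes also implement mdelete and yieldKeys. The adapter wraps a string-valued store so callers can work with structured keys and values.

```typescript
// Minimal stand-in for the BaseStore interface (mget / mset only)
interface SimpleStore<K, V> {
  mget(keys: K[]): Promise<(V | undefined)[]>;
  mset(pairs: [K, V][]): Promise<void>;
}

class MemoryStore<V> implements SimpleStore<string, V> {
  private data = new Map<string, V>();
  async mget(keys: string[]): Promise<(V | undefined)[]> {
    return keys.map((k) => this.data.get(k));
  }
  async mset(pairs: [string, V][]): Promise<void> {
    for (const [k, v] of pairs) this.data.set(k, v);
  }
}

// Adapter: encode keys and serialize values on the way in, decode on the way out
class EncodedStore<K, V, S> implements SimpleStore<K, V> {
  private inner: SimpleStore<string, S>;
  private encodeKey: (key: K) => string;
  private serialize: (value: V) => S;
  private deserialize: (raw: S) => V;

  constructor(opts: {
    inner: SimpleStore<string, S>;
    encodeKey: (key: K) => string;
    serialize: (value: V) => S;
    deserialize: (raw: S) => V;
  }) {
    this.inner = opts.inner;
    this.encodeKey = opts.encodeKey;
    this.serialize = opts.serialize;
    this.deserialize = opts.deserialize;
  }

  async mget(keys: K[]): Promise<(V | undefined)[]> {
    const raw = await this.inner.mget(keys.map((k) => this.encodeKey(k)));
    return raw.map((r) => (r === undefined ? undefined : this.deserialize(r)));
  }

  async mset(pairs: [K, V][]): Promise<void> {
    await this.inner.mset(pairs.map(([k, v]): [string, S] => [this.encodeKey(k), this.serialize(v)]));
  }
}

// Usage: user records keyed by numeric id, stored as JSON strings
type User = { name: string; age: number };
const users = new EncodedStore<number, User, string>({
  inner: new MemoryStore<string>(),
  encodeKey: (id) => `user:${id}`,
  serialize: (u) => JSON.stringify(u),
  deserialize: (s) => JSON.parse(s) as User,
});
```

Because the adapter only depends on the inner store's interface, swapping MemoryStore for a file- or network-backed store requires no change to callers.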

Convenience factory: build a document store on top of a byte store:

export function createDocumentStoreFromByteStore(store: BaseStore<string, Uint8Array>) {
  const encoder = new TextEncoder();
  const decoder = new TextDecoder();
  return new EncoderBackedStore({
    store,
    keyEncoder: (key: string) => key,
    valueSerializer: (doc: Document) =>
      encoder.encode(JSON.stringify({ pageContent: doc.pageContent, metadata: doc.metadata })),
    valueDeserializer: (bytes: Uint8Array) =>
      new Document(JSON.parse(decoder.decode(bytes))),
  });
}

34.6 LangChain Hub

Source location: libs/langchain/src/hub/

The Hub is a remote prompt repository provided by LangSmith, with version management and team sharing.

Push: publish to the Hub

export async function push(
  repoFullName: string,         // "owner/repo-name"
  runnable: Runnable,           // the prompt / chain to push
  options?: {
    apiKey?: string;
    isPublic?: boolean;
    description?: string;
    tags?: string[];
    parentCommitHash?: string;  // update on top of a specific commit
  }
): Promise<string>  // resolves to the Hub URL

Pull: fetch from the Hub

export async function pull<T extends Runnable>(
  ownerRepoCommit: string,     // "owner/repo-name" or "owner/repo-name:commit-hash"
  options?: {
    includeModel?: boolean;     // whether to also instantiate the associated model
    modelClass?: typeof BaseLanguageModel;  // model class to use for non-OpenAI models
    secrets?: Record<string, string>;
    secretsFromEnv?: boolean;
  }

Internal flow:

basePull() fetches a PromptCommit object through the LangSmith Client
load() deserializes the manifest into LangChain objects
If there is an output schema, bindOutputSchema() binds it automatically

const loadedPrompt = await load<T>(
  JSON.stringify(promptObject.manifest),
  options?.secrets,
  generateOptionalImportMap(options?.modelClass),
  generateModelImportMap(options?.modelClass),
  options?.secretsFromEnv
);
return bindOutputSchema(loadedPrompt);

Usage example:

import { pull, push } from "langchain/hub";
import { ChatPromptTemplate } from "@langchain/core/prompts";

// Push a prompt to the Hub
const prompt = ChatPromptTemplate.fromMessages([
  ["system", "You are a translation assistant"],
  ["human", "Translate the following text into {language}: {text}"],
]);
const url = await push("my-org/translator", prompt, { isPublic: false });

// Pull it back from the Hub
const pulled = await pull<ChatPromptTemplate>("my-org/translator");
const result = await pulled.invoke({ language: "French", text: "Hello, world" });

34.7 System Hierarchy

Serializable (serialization base class)
├── Runnable (executable abstraction)
│   ├── BaseChatModel, BaseRetriever, BasePromptTemplate ...
│   └── toJSON() -> Hub push
│
└── BaseStore<K,V> (generic key-value storage)
    ├── InMemoryStore (in memory)
    ├── LocalFileStore (file system)
    └── EncoderBackedStore (encoding adapter)

BaseCache<T> (LLM cache; a standalone abstract class, not a Serializable subclass)
└── InMemoryCache (in-memory cache)

load/index.ts -> deserialization -> Hub pull

34.8 Source Reading Roadmap

Priority	File	Focus
P0	`langchain-core/src/load/serializable.ts`	`toJSON()`, lc_secrets, lc_id
P0	`langchain-core/src/load/index.ts`	the `load()` function, security model
P1	`langchain-core/src/caches/index.ts`	BaseCache, InMemoryCache, key encoding
P1	`langchain/src/storage/file_system.ts`	LocalFileStore atomic writes, path safety
P2	`langchain/src/storage/encoder_backed.ts`	EncoderBackedStore adapter pattern
P2	`langchain/src/hub/index.ts` + `base.ts`	push/pull, model import maps
P3	`langchain-core/src/load/import_map.ts`	registry of serializable classes

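34.9 Hands-on Exercises

Basic: create a ChatPromptTemplate and call toJSON() to inspect the structure of the serialized output
Intermediate: combine LocalFileStore and EncoderBackedStore to build a persistent document store
Advanced: add serialization support to a custom Runnable (set lc_serializable = true and define lc_namespace), then verify that a toJSON() and load() round trip yields an equivalent object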

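Lesson Summary

Level	What you should master
🟢 Basic	Understand what serialization is for: persistence, transport, Hub sharing
🔵 Intermediate	Master the core members of the Serializable base class (lc_namespace/lc_secrets/toJSON)
🟡 Advanced	Understand the dynamic loading mechanism and security model of `load()`
🟠 Senior	Analyze the layered design of the cache and storage systems (BaseCache vs BaseStore)
🔴 Architect	Design a secure serialization scheme: secret isolation, injection protection, an import whitelist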

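Next Lesson

Lesson 35 digs into the streaming architecture and multi-runtime support: how stream/transform/streamEvents are implemented, and the compatibility strategy across Node/Deno/Bun/Browser/Edge.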