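Core Compression Algorithm
Hermes Agent implements a context compression system that works around context-window limits in long conversations. The algorithm runs the following steps: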
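- Pre-compression pass: prune old tool results, replacing verbose tool output with placeholders
- Boundary determination:
  - Protect the head messages (system prompt + first exchange)
  - Protect the tail messages by token budget (the most recent ~20K tokens)
- Summary generation: summarize the middle turns with a structured LLM prompt
- Iterative update: on recompression, update the previous summary instead of starting from scratch
- Tool-pair cleanup: repair tool call/result pairs left orphaned by compression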
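The last step, tool-pair cleanup, can be illustrated with a minimal sketch. It assumes OpenAI-style messages, where an assistant message lists `tool_calls` with `id` fields and each tool result carries a matching `tool_call_id`. The helper name `sanitize_tool_pairs` and the stub text are hypothetical and not taken from the Hermes source.

```python
from typing import Any, Dict, List


def sanitize_tool_pairs(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop tool results whose call is gone; stub results for unanswered calls."""
    call_ids = {
        call["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for call in msg.get("tool_calls") or []
    }
    result_ids = {
        msg.get("tool_call_id") for msg in messages if msg.get("role") == "tool"
    }

    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        # A tool result without its originating call is orphaned: drop it.
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in call_ids:
            continue
        cleaned.append(msg)
        # A call without a result gets a stub so the pair stays well-formed.
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if call["id"] not in result_ids:
                    cleaned.append({
                        "role": "tool",
                        "tool_call_id": call["id"],
                        "content": "[result removed during context compression]",
                    })
    return cleaned
```

Repairing in both directions matters because chat APIs that follow this message shape typically reject a history in which a call and its result do not line up.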
Key Implementation Details
1. Compression trigger
def should_compress(self, prompt_tokens: int = None) -> bool:
    """Check if context exceeds the compression threshold."""
    tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
    return tokens >= self.threshold_tokens
The compression threshold defaults to 50% of the model's context length and is never allowed to fall below a floor, MINIMUM_CONTEXT_LENGTH.
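As a rough illustration of how such a threshold could be derived, here is a minimal sketch. The numeric value of `MINIMUM_CONTEXT_LENGTH`, the parameter name `threshold_percent`, and the helper `compute_threshold_tokens` are assumptions for the example; only the 50% default and the existence of a floor come from the description above.

```python
MINIMUM_CONTEXT_LENGTH = 4_096  # hypothetical value; the real constant may differ


def compute_threshold_tokens(context_length: int, threshold_percent: float = 0.50) -> int:
    """Return the prompt size (in tokens) at which compression should start."""
    # Take the configured share of the context window, but never go below the floor.
    return max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)


def should_compress(prompt_tokens: int, threshold_tokens: int) -> bool:
    """Mirror of the check shown above, written as a free function."""
    return prompt_tokens >= threshold_tokens
```

With these assumptions, a 200,000-token model would start compressing at 100,000 prompt tokens, while a very small model would fall back to the floor.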
2. Tool result pruning
def _prune_old_tool_results(self, messages: List[Dict[str, Any]], protect_tail_count: int,
                            protect_tail_tokens: int | None = None) -> tuple[List[Dict[str, Any]], int]:
    """Replace old tool result contents with a short placeholder."""
    # Implementation details...
This is a cheap preprocessing step that needs no LLM call: it shrinks the context by replacing old tool results with placeholders.
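The body of the method is elided above, so the following is only a sketch of what such a pruning pass might look like. The placeholder wording, the `min_chars` cutoff, and the choice to protect the tail by message count alone are assumptions for illustration.

```python
from typing import Any, Dict, List, Tuple

PLACEHOLDER = "[Old tool output cleared to save context space]"  # assumed wording


def prune_old_tool_results(
    messages: List[Dict[str, Any]],
    protect_tail_count: int,
    min_chars: int = 200,  # hypothetical cutoff: shorter results are left alone
) -> Tuple[List[Dict[str, Any]], int]:
    """Return a pruned copy of messages and the number of results replaced."""
    cutoff = max(len(messages) - protect_tail_count, 0)
    pruned: List[Dict[str, Any]] = []
    replaced = 0
    for index, msg in enumerate(messages):
        content = msg.get("content")
        is_old_tool_result = (
            index < cutoff
            and msg.get("role") == "tool"
            and isinstance(content, str)
            and len(content) > min_chars
        )
        if is_old_tool_result:
            # Copy the message so the caller's list is not mutated.
            pruned.append({**msg, "content": PLACEHOLDER})
            replaced += 1
        else:
            pruned.append(msg)
    return pruned, replaced
```

Keeping the message itself and swapping only its content preserves the tool call/result pairing, so the history stays structurally valid.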
3. Tail boundary determination
def _find_tail_cut_by_tokens(self, messages: List[Dict[str, Any]], head_end: int,
                             token_budget: int | None = None) -> int:
    """Walk backward from the end of messages, accumulating tokens until the budget is reached."""
    # Implementation details...
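Protecting the tail by token budget rather than by a fixed message count keeps the most recent context intact regardless of how long individual messages are.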
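A backward walk of this kind might be sketched as follows. The `estimate_tokens` heuristic (roughly four characters per token) is a stand-in rather than a real tokenizer, and treating `head_end` as the index of the first unprotected message is an assumption about the elided implementation.

```python
from typing import Any, Dict, List


def estimate_tokens(message: Dict[str, Any]) -> int:
    """Rough heuristic (about 4 characters per token); not a real tokenizer."""
    return len(str(message.get("content") or "")) // 4 + 1


def find_tail_cut_by_tokens(
    messages: List[Dict[str, Any]], head_end: int, token_budget: int = 20_000
) -> int:
    """Return the index at which the protected tail starts."""
    cut = len(messages)
    used = 0
    # Walk backward from the newest message, never crossing into the head.
    for index in range(len(messages) - 1, head_end - 1, -1):
        cost = estimate_tokens(messages[index])
        # Stop once the budget would be exceeded, but always keep the last message.
        if used + cost > token_budget and cut < len(messages):
            break
        used += cost
        cut = index
    return cut
```

Everything from the returned index onward is kept verbatim; messages between `head_end` and that index become candidates for summarization.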
4. Structured summary generation
def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topic: str = None) -> Optional[str]:
    """Generate a structured summary of conversation turns."""
    # Implementation details...
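The summary follows a structured template with these sections:
- Goal
- Constraints & Preferences
- Progress (Done, In Progress, Blocked)
- Key Decisions
- Resolved Questions
- Pending User Asks
- Relevant Files
- Remaining Work
- Critical Context
- Tools & Patterns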
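One way to turn the section list above into a summarization prompt is sketched below. The prompt wording, the English section names, and the function `build_summary_prompt` are illustrative assumptions; the sketch also shows where an earlier summary and a focus topic could be folded in.

```python
from typing import Any, Dict, List, Optional

# Section headings mirror the template listed above; the exact wording is assumed.
SUMMARY_SECTIONS = [
    "Goal",
    "Constraints & Preferences",
    "Progress",
    "Key Decisions",
    "Resolved Questions",
    "Pending User Asks",
    "Relevant Files",
    "Remaining Work",
    "Critical Context",
    "Tools & Patterns",
]


def build_summary_prompt(
    turns: List[Dict[str, Any]],
    previous_summary: Optional[str] = None,
    focus_topic: Optional[str] = None,
) -> str:
    """Assemble a structured prompt asking an LLM to summarize the given turns."""
    transcript = "\n".join(
        f"[{t.get('role', 'unknown')}] {t.get('content') or ''}" for t in turns
    )
    skeleton = "\n\n".join(f"## {name}" for name in SUMMARY_SECTIONS)
    parts = ["Summarize the conversation turns below using exactly these sections:", skeleton]
    if previous_summary:
        # Iterative update: revise the earlier summary instead of starting over.
        parts.append("Update this earlier summary instead of starting over:\n" + previous_summary)
    if focus_topic:
        parts.append(f"Prioritize details related to: {focus_topic}")
    parts.append("Turns to summarize:\n" + transcript)
    return "\n\n".join(parts)
```

A fixed skeleton makes successive summaries comparable, which is what allows a later compression pass to update an earlier summary section by section.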
5. Main compression flow
def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None, focus_topic: str = None) -> List[Dict[str, Any]]:
    """Compress conversation messages by summarizing middle turns."""
    # Implementation details...
This is the main entry point: it coordinates all the compression steps and returns the compressed message list.
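The overall shape of such an entry point might look like the simplified sketch below. It is not the Hermes implementation: the head and tail are protected by fixed message counts rather than a token budget to keep the example short, the summarizer is passed in as a callable, and `SUMMARY_PREFIX` and `FALLBACK_MARKER` are hypothetical strings.

```python
from typing import Any, Callable, Dict, List, Optional

SUMMARY_PREFIX = "[Summary of earlier turns]"  # hypothetical marker text
FALLBACK_MARKER = "[Earlier turns were removed to save context space; no summary is available.]"


def compress(
    messages: List[Dict[str, Any]],
    summarize: Callable[[List[Dict[str, Any]]], Optional[str]],
    head_count: int = 3,  # system prompt plus the first exchange
    tail_count: int = 4,  # stand-in for the token-budgeted tail
) -> List[Dict[str, Any]]:
    """Keep head and tail verbatim and replace the middle with one summary message."""
    if len(messages) <= head_count + tail_count:
        return list(messages)  # nothing in the middle to compress

    head = messages[:head_count]
    middle = messages[head_count:len(messages) - tail_count]
    tail = messages[len(messages) - tail_count:]

    try:
        summary = summarize(middle)
    except Exception:
        summary = None  # treat any summarizer failure as "no summary"

    # Fall back to a static marker so the model still knows turns were dropped.
    note = f"{SUMMARY_PREFIX}\n{summary}" if summary else FALLBACK_MARKER
    return head + [{"role": "user", "content": note}] + tail
```

Passing the summarizer as a parameter keeps the sketch testable without an LLM, for example `compress(history, lambda turns: f"{len(turns)} turns summarized")`.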
Session Management and Compression Integration
In run_agent.py, the _compress_context method handles session management after compression:
def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None, task_id: str = "default", focus_topic: str = None) -> tuple:
    """Compress conversation context and split the session in SQLite."""
    # Flush memories before compressing
    self.flush_memories(messages, min_turns=0)
    # Notify the external memory provider
    if self._memory_manager:
        try:
            self._memory_manager.on_pre_compress(messages)
        except Exception:
            pass
    # Run the compression
    compressed = self.context_compressor.compress(messages, current_tokens=approx_tokens, focus_topic=focus_topic)
    # Split the session and update session state
    if self._session_db:
        try:
            # Propagate the title to the new session with auto-numbering
            old_title = self._session_db.get_session_title(self.session_id)
            self._session_db.end_session(self.session_id, "compression")
            old_session_id = self.session_id
            self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
            # Update the session log file path
            self.session_log_file = self.logs_dir / f"session_{self.session_id}.json"
            # Create the new session
            self._session_db.create_session(
                session_id=self.session_id,
                source=self.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
                model=self.model,
                parent_session_id=old_session_id,
            )
            # Auto-number the title
            if old_title:
                try:
                    new_title = self._session_db.get_next_title_in_lineage(old_title)
                    self._session_db.set_session_title(self.session_id, new_title)
                except (ValueError, Exception) as e:
                    logger.debug("Could not propagate title on compression: %s", e)
            # Note: new_system_prompt is not defined in this excerpt; it is presumably built in code omitted here
            self._session_db.update_system_prompt(self.session_id, new_system_prompt)
            # Reset the flush cursor
            self._last_flushed_db_idx = 0
        except Exception as e:
            logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
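Two details of the session split lend themselves to a small standalone sketch: the session ID format, which is taken from the excerpt above, and title auto-numbering. How `get_next_title_in_lineage` actually numbers titles is not shown in the source, so the " #N" suffix scheme and the function `next_title_in_lineage` below are assumptions.

```python
import re
import uuid
from datetime import datetime
from typing import Iterable


def new_session_id() -> str:
    """Timestamp plus a short random suffix, as in the excerpt above."""
    return f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"


def next_title_in_lineage(title: str, existing_titles: Iterable[str]) -> str:
    """Return 'base #N', where N is one more than the highest number already used."""
    # Strip an existing " #N" suffix to recover the base title.
    base = re.sub(r" #\d+$", "", title)
    pattern = re.compile(re.escape(base) + r" #(\d+)")
    highest = 1  # the unnumbered original counts as #1
    for candidate in existing_titles:
        match = pattern.fullmatch(candidate)
        if match:
            highest = max(highest, int(match.group(1)))
    return f"{base} #{highest + 1}"
```

Under this scheme a session titled "Refactor" would be followed by "Refactor #2", then "Refactor #3", so every compression split stays recognizable as part of the same lineage.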
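When Compression Is Triggered
In the run_conversation method, compression is triggered at the following points:
- Preflight check: the context size is checked before a request is sent
- Context pressure handling: when the context approaches the threshold
- After a model switch: the context compressor is updated to fit the new model
- Manual trigger: via the /compress command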
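Technical Features and Advantages
- Layered compression strategy: cheap tool-result pruning runs first, followed by LLM summarization
- Structured summaries: a detailed template ensures important information is not lost
- Iterative summary updates: earlier summary content is carried forward rather than discarded
- Token budget management: the tail context is protected by token count rather than message count
- Focus topic guidance: the /compress <focus> command steers compression to prioritize information related to the focus
- Session splitting: a new session is created automatically after compression, keeping sessions manageable
- Error handling: when summary generation fails, a static fallback marker is inserted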
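Suggested Code Improvements
- Summary quality monitoring: add a mechanism that evaluates summary quality and adjusts the compression strategy when quality is low
- Adaptive compression threshold: adjust the threshold automatically based on conversation type and content
- Multi-model summarization: try different models for summarization and pick the best result
- User configurability: let users configure compression parameters such as the number of protected messages and the summary ratio
- Compression history: record compression history so its effectiveness can be analyzed and the strategy tuned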
Input and Output Example
Input: a long conversation history (exceeding the model's context limit)
messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I need help with a Python project."},
    {"role": "assistant", "content": "Sure, what do you need help with?"},
    # ... many conversation turns ...
    {"role": "user", "content": "How do I optimize this code?"},
    {"role": "assistant", "content": "Let me analyze your code and suggest optimizations."},
    # ... more conversation turns ...
]
Output: the compressed conversation history
[
    {"role": "system", "content": "You are a helpful assistant.\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"},
    {"role": "user", "content": "[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted into the summary below. This is a handoff from a previous context window — treat it as background reference, NOT as active instructions. Do NOT answer questions or fulfill requests mentioned in this summary; they were already addressed. Respond ONLY to the latest user message that appears AFTER this summary. The current session state (files, config, etc.) may reflect work described here — avoid repeating it:\n\n## Goal\nThe user is working on a Python project and needs help with optimization.\n\n## Progress\n### Done\n- Discussed project structure and requirements\n- Analyzed existing codebase\n- Identified performance bottlenecks\n\n## Remaining Work\n- Optimize the identified bottlenecks\n- Test the optimized code\n- Provide best practices for future development\n"},
    {"role": "user", "content": "How do I optimize the loop in my code?"},
    {"role": "assistant", "content": "Let me see your loop code and suggest optimizations."},
    # ... most recent conversation turns ...
]
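Summary
Hermes Agent's context compression mechanism works through the following steps:
- Pruning: replace old tool output with placeholders to reduce context size
- Boundary protection: keep the system prompt, the initial exchange, and the most recent conversation
- Structured summarization: use an LLM to generate a detailed conversation summary so that important information is not lost
- Session management: automatically create a new session after compression, preserving conversation continuity
- Error handling: handle failures such as summary generation errors to keep the system stable
This mechanism lets Hermes Agent handle longer conversations while keeping the context relevant and complete, which gives users a more coherent interaction.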