Hermes Agent Context Compression Mechanism Analysis

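Core Compression Algorithm

Hermes Agent implements a context compression system to cope with context window limits in long conversations. The algorithm proceeds in the following steps: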

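Pre-compression pass: prune old tool results, replacing verbose tool output with placeholders
Boundary determination:
- Protect the head messages (system prompt + first exchange)
- Protect the tail messages by token budget (roughly the most recent 20K tokens)
Summary generation: summarize the middle turns with a structured LLM prompt
Iterative update: on recompression, update the previous summary rather than starting from scratch
Tool-pair cleanup: repair tool call/result pairs orphaned by compression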
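
The tool-pair cleanup step is easy to get wrong, so here is a minimal sketch of it. It assumes OpenAI-style messages, where an assistant message carries `tool_calls` entries with an `id` and each tool result carries a matching `tool_call_id`. The function name `sanitize_tool_pairs` and the stub text are hypothetical and are not taken from the Hermes source.

```python
from typing import Any, Dict, List

STUB = "[Result removed during context compression]"  # hypothetical placeholder text


def sanitize_tool_pairs(messages: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Drop orphaned tool results and add stub results for unanswered tool calls."""
    call_ids = {
        call["id"]
        for msg in messages
        if msg.get("role") == "assistant"
        for call in msg.get("tool_calls") or []
    }
    result_ids = {
        msg.get("tool_call_id") for msg in messages if msg.get("role") == "tool"
    }

    cleaned: List[Dict[str, Any]] = []
    for msg in messages:
        if msg.get("role") == "tool" and msg.get("tool_call_id") not in call_ids:
            continue  # the call that produced this result is no longer present
        cleaned.append(msg)
        if msg.get("role") == "assistant":
            for call in msg.get("tool_calls") or []:
                if call["id"] not in result_ids:
                    # the result is gone; add a stub so the pair stays well-formed
                    cleaned.append(
                        {"role": "tool", "tool_call_id": call["id"], "content": STUB}
                    )
    return cleaned
```

Most chat APIs reject a tool result with no matching call, or a call with no result, which is why a pass of this kind would run after the middle turns are replaced by a summary.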

Key Implementation Details

1. Compression trigger condition

def should_compress(self, prompt_tokens: int = None) -> bool:
    """Check if context exceeds the compression threshold."""
    tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
    return tokens >= self.threshold_tokens

The compression threshold defaults to 50% of the model's context length, with a floor of MINIMUM_CONTEXT_LENGTH.
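
As a rough illustration of that rule, the threshold could be computed as below. The helper name `compute_threshold_tokens` is hypothetical, and the value given to `MINIMUM_CONTEXT_LENGTH` is a placeholder chosen for the example, not a value confirmed by the source.

```python
MINIMUM_CONTEXT_LENGTH = 64_000  # placeholder value for illustration only


def compute_threshold_tokens(context_length: int, threshold_percent: float = 0.50) -> int:
    """Return the prompt size (in tokens) at which compression should trigger."""
    return max(int(context_length * threshold_percent), MINIMUM_CONTEXT_LENGTH)


print(compute_threshold_tokens(200_000))  # half of the window
print(compute_threshold_tokens(100_000))  # the floor applies instead
```

With these assumed numbers, a 200K-token model compresses at 100K tokens, while a 100K-token model compresses at the 64K floor rather than at 50K.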

2. Tool result pruning

def _prune_old_tool_results(self, messages: List[Dict[str, Any]], protect_tail_count: int,
                           protect_tail_tokens: int | None = None) -> tuple[List[Dict[str, Any]], int]:
    """Replace old tool result contents with a short placeholder."""
    # Implementation details...

This is a cheap preprocessing step that needs no LLM call: it shrinks the context by replacing old tool results with placeholders.
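
Since the body of the method is elided, the following is only a sketch of how such pruning might look as a standalone function. It protects the tail by message count alone; the `min_chars` cutoff and the placeholder wording are assumptions made for the example.

```python
from typing import Any, Dict, List, Tuple

PLACEHOLDER = "[Old tool output cleared to save context space]"  # hypothetical wording


def prune_old_tool_results(
    messages: List[Dict[str, Any]],
    protect_tail_count: int,
    min_chars: int = 200,
) -> Tuple[List[Dict[str, Any]], int]:
    """Replace long tool results outside the protected tail with a placeholder.

    Returns the new message list and the number of results pruned.
    """
    cutoff = max(len(messages) - protect_tail_count, 0)
    pruned = 0
    result: List[Dict[str, Any]] = []
    for index, msg in enumerate(messages):
        content = msg.get("content")
        if (
            index < cutoff
            and msg.get("role") == "tool"
            and isinstance(content, str)
            and len(content) > min_chars
        ):
            msg = {**msg, "content": PLACEHOLDER}  # copy, so the input is not mutated
            pruned += 1
        result.append(msg)
    return result, pruned
```

Returning the pruned count alongside the list matches the tuple return type in the signature above and lets a caller decide whether the cheap pass freed enough space on its own.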

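3. Tail boundary determination

def _find_tail_cut_by_tokens(self, messages: List[Dict[str, Any]], head_end: int,
                            token_budget: int | None = None) -> int:
    """Walk backward from the end of messages, accumulating tokens until the budget is reached."""
    # Implementation details...

The tail is protected by a token budget rather than a fixed message count, so the amount of recent context kept does not depend on how long individual messages are.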
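
The backward walk described in the docstring can be sketched as follows. The token estimate (about four characters per token plus a small per-message overhead) is a crude assumption, and `estimate_tokens` is a hypothetical helper, not the estimator Hermes uses.

```python
from typing import Any, Dict, List


def estimate_tokens(message: Dict[str, Any]) -> int:
    """Rough estimate: four characters per token plus a fixed per-message overhead."""
    return len(str(message.get("content") or "")) // 4 + 10


def find_tail_cut_by_tokens(
    messages: List[Dict[str, Any]], head_end: int, token_budget: int = 20_000
) -> int:
    """Return the index at which the protected tail starts."""
    cut = len(messages)
    used = 0
    # Walk backward, never crossing into the protected head.
    for index in range(len(messages) - 1, head_end - 1, -1):
        cost = estimate_tokens(messages[index])
        # Stop once the budget would be exceeded, but always keep the last message.
        if used + cost > token_budget and cut < len(messages):
            break
        used += cost
        cut = index
    return cut
```

A production version would likely also need to avoid placing the cut between a tool call and its result; this sketch ignores that concern.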

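4. Structured summary generation

def _generate_summary(self, turns_to_summarize: List[Dict[str, Any]], focus_topic: str = None) -> Optional[str]:
    """Generate a structured summary of conversation turns."""
    # Implementation details...

The summary is generated from a structured template with the following sections:

Goal
Constraints and preferences
Progress (done, in progress, blocked)
Key decisions
Resolved questions
Pending user requests
Relevant files
Remaining work
Critical context
Tools and patterns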
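
To make the template and the iterative-update behavior concrete, here is a sketch of a prompt builder. The section titles are English renderings of the list above, and the instruction wording, the function name `build_summary_prompt`, and its parameters are all assumptions; the actual Hermes prompt is not shown in this analysis.

```python
from typing import Optional

SECTIONS = [
    "Goal",
    "Constraints and Preferences",
    "Progress",
    "Key Decisions",
    "Resolved Questions",
    "Pending User Requests",
    "Relevant Files",
    "Remaining Work",
    "Critical Context",
    "Tools and Patterns",
]


def build_summary_prompt(
    transcript: str,
    previous_summary: Optional[str] = None,
    focus_topic: Optional[str] = None,
) -> str:
    """Build the summarizer prompt; update a prior summary when one exists."""
    template = "\n".join(f"## {name}" for name in SECTIONS)
    if previous_summary:
        # Iterative update: extend the earlier summary instead of starting over.
        task = "Update the previous summary with the new turns, keeping details that still matter."
        body = f"PREVIOUS SUMMARY:\n{previous_summary}\n\nNEW TURNS:\n{transcript}"
    else:
        task = "Summarize the conversation turns below."
        body = f"TURNS TO SUMMARIZE:\n{transcript}"
    if focus_topic:
        task += f" Prioritize information related to: {focus_topic}."
    return f"{task}\n\nUse these sections:\n{template}\n\n{body}"
```

Feeding the previous summary back in means details from the first compression can survive a second one, instead of being summarized from an already lossy summary message.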

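5. Main compression flow

def compress(self, messages: List[Dict[str, Any]], current_tokens: int = None, focus_topic: str = None) -> List[Dict[str, Any]]:
    """Compress conversation messages by summarizing middle turns."""
    # Implementation details...

This is the main entry point for compression: it coordinates all of the compression steps and returns the compressed message list.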
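
The core of that flow, replacing the middle turns with a single summary message, can be sketched as below. The boundaries `head_end` and `tail_start` are assumed to have been computed already, the summarizer is passed in as a plain callable, and the prefix text is an abbreviated stand-in for the real handoff notice.

```python
from typing import Any, Callable, Dict, List

SUMMARY_PREFIX = "[CONTEXT COMPACTION] Earlier turns were compacted into the summary below.\n\n"

Message = Dict[str, Any]


def compress_middle(
    messages: List[Message],
    head_end: int,
    tail_start: int,
    summarize: Callable[[List[Message]], str],
) -> List[Message]:
    """Keep the head and tail verbatim; replace the middle with one summary message."""
    if tail_start <= head_end:
        return list(messages)  # nothing between head and tail to summarize
    middle = messages[head_end:tail_start]
    summary_message = {"role": "user", "content": SUMMARY_PREFIX + summarize(middle)}
    return messages[:head_end] + [summary_message] + messages[tail_start:]
```

Passing the summarizer as a parameter keeps the sketch testable without an LLM; in the real method the summary comes from `_generate_summary`.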

Session Management and Compression Integration

In run_agent.py, the _compress_context method handles the session management around compression:

def _compress_context(self, messages: list, system_message: str, *, approx_tokens: int = None, task_id: str = "default", focus_topic: str = None) -> tuple:
    """Compress conversation context and split the session in SQLite."""
    # Flush memories before compression
    self.flush_memories(messages, min_turns=0)

    # Notify the external memory provider
    if self._memory_manager:
        try:
            self._memory_manager.on_pre_compress(messages)
        except Exception:
            pass

    # Run the compression
    compressed = self.context_compressor.compress(messages, current_tokens=approx_tokens, focus_topic=focus_topic)

    # Split the session and update bookkeeping
    if self._session_db:
        try:
            # Propagate the title to the new session with auto-numbering
            old_title = self._session_db.get_session_title(self.session_id)
            self._session_db.end_session(self.session_id, "compression")
            old_session_id = self.session_id
            self.session_id = f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"
            # Update the session log file path
            self.session_log_file = self.logs_dir / f"session_{self.session_id}.json"
            # Create the new session, linked to the old one as its parent
            self._session_db.create_session(
                session_id=self.session_id,
                source=self.platform or os.environ.get("HERMES_SESSION_SOURCE", "cli"),
                model=self.model,
                parent_session_id=old_session_id,
            )
            # Auto-number the title
            if old_title:
                try:
                    new_title = self._session_db.get_next_title_in_lineage(old_title)
                    self._session_db.set_session_title(self.session_id, new_title)
                except (ValueError, Exception) as e:
                    logger.debug("Could not propagate title on compression: %s", e)
            # NOTE: new_system_prompt is not defined in this excerpt; presumably it is built in code omitted here
            self._session_db.update_system_prompt(self.session_id, new_system_prompt)
            # Reset the flush cursor
            self._last_flushed_db_idx = 0
        except Exception as e:
            logger.warning("Session DB compression split failed — new session will NOT be indexed: %s", e)
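
Two small pieces of the session split can be illustrated in isolation. The session ID format below mirrors the f-string in the excerpt. The title numbering is a guess: the `#N` suffix scheme and the function `next_title_in_lineage` are assumptions, since the behavior of `get_next_title_in_lineage` is not shown, and a real implementation would presumably also check the database for titles already in use.

```python
import re
import uuid
from datetime import datetime


def new_session_id() -> str:
    """Timestamp plus a short random suffix, in the format used by the excerpt."""
    return f"{datetime.now().strftime('%Y%m%d_%H%M%S')}_{uuid.uuid4().hex[:6]}"


def next_title_in_lineage(title: str) -> str:
    """Append or increment a numeric suffix (assumed scheme: 'Title #2', 'Title #3')."""
    match = re.fullmatch(r"(.+?)\s*#(\d+)", title)
    if match:
        return f"{match.group(1)} #{int(match.group(2)) + 1}"
    return f"{title} #2"
```

Under this assumed scheme, a session titled "Refactor parser" would be followed by "Refactor parser #2" after the first compression and "Refactor parser #3" after the second.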

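Compression Trigger Points

In the run_conversation method, compression is triggered at the following points:

Pre-flight check: the context size is checked before a request is sent
Context pressure handling: when the context approaches the threshold
After a model switch: the context compressor is updated to match the new model
Manual trigger: via the /compress command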
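
The interaction between the pre-flight check and a model switch can be sketched with a small state holder. The class name `CompressionTrigger` and the method `update_model` are hypothetical, and the minimum-threshold floor is left out for brevity.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class CompressionTrigger:
    context_length: int
    threshold_percent: float = 0.50
    last_prompt_tokens: int = 0

    @property
    def threshold_tokens(self) -> int:
        # Derived from the current model, so it changes when the model does.
        return int(self.context_length * self.threshold_percent)

    def should_compress(self, prompt_tokens: Optional[int] = None) -> bool:
        tokens = prompt_tokens if prompt_tokens is not None else self.last_prompt_tokens
        return tokens >= self.threshold_tokens

    def update_model(self, context_length: int) -> None:
        """Called after a model switch so the threshold tracks the new window."""
        self.context_length = context_length


trigger = CompressionTrigger(context_length=200_000)
before = trigger.should_compress(90_000)   # below the 100K threshold
trigger.update_model(128_000)
after = trigger.should_compress(90_000)    # now above the 64K threshold
```

The same 90K-token conversation needs no compression on the larger model but does after switching to the smaller one, which is why the compressor has to be refreshed on a model switch.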

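Technical Features and Advantages

Layered compression strategy: cheap tool-result pruning runs first, followed by LLM summarization
Structured summaries: a detailed template keeps important information from being lost
Iterative summary updates: information from the previous summary is carried forward rather than discarded
Token budget management: the tail context is protected by token count rather than message count
Focus topic guidance: the /compress <focus> command steers compression toward retaining information related to the focus
Session splitting: a new session is created automatically after compression, which keeps sessions manageable
Error handling: when summary generation fails, a static fallback marker is inserted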
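
The fallback behavior in the last item might look like the sketch below. It treats both an exception and a `None` return as failure, which fits the `Optional[str]` return type of `_generate_summary`; the wording of the fallback marker and the name `summary_or_fallback` are assumptions.

```python
import logging
from typing import Any, Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

Message = Dict[str, Any]


def summary_or_fallback(
    turns: List[Message],
    summarize: Callable[[List[Message]], Optional[str]],
) -> str:
    """Return the LLM summary, or a static marker when summarization fails."""
    try:
        summary = summarize(turns)
    except Exception as exc:
        logger.warning("Summary generation failed: %s", exc)
        summary = None
    if summary:
        return summary
    # Static marker: tells the model that context was dropped without a summary.
    return (
        f"[{len(turns)} earlier conversation turns were removed to free context "
        "space; no summary is available.]"
    )
```

Inserting a marker rather than silently dropping the turns lets the model know that earlier context existed, so it can ask the user instead of assuming nothing happened.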

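Suggested Code Improvements

Summary quality monitoring: add a mechanism to assess summary quality and adjust the compression strategy when quality is low
Adaptive compression threshold: adjust the threshold automatically according to conversation type and content
Multi-model summarization: try summarizing with different models and select the best result
User configurability: let users configure compression parameters such as the number of protected messages and the summary ratio
Compression history: record each compression so that its effectiveness can be analyzed and the strategy tuned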
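
As one possible shape for the compression-history suggestion, the sketch below records token and message counts per compression and reports an average ratio. Nothing like this is described as existing in Hermes; the class and field names are invented for the example.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class CompressionRecord:
    tokens_before: int
    tokens_after: int
    messages_before: int
    messages_after: int

    @property
    def ratio(self) -> float:
        """Fraction of tokens remaining after compression (lower means more savings)."""
        return self.tokens_after / self.tokens_before if self.tokens_before else 1.0


@dataclass
class CompressionHistory:
    records: List[CompressionRecord] = field(default_factory=list)

    def add(self, record: CompressionRecord) -> None:
        self.records.append(record)

    def average_ratio(self) -> float:
        if not self.records:
            return 1.0
        return sum(r.ratio for r in self.records) / len(self.records)


history = CompressionHistory()
history.add(CompressionRecord(100_000, 25_000, 80, 12))
history.add(CompressionRecord(100_000, 75_000, 40, 30))
```

A consistently high average ratio would suggest that compression is triggering but freeing little space, which is the kind of signal the adaptive-threshold suggestion could build on.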

Input and Output Example

Input: a long conversation history (past the compression threshold)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I need help with a Python project."},
    {"role": "assistant", "content": "Sure, what do you need help with?"},
    # ... many more conversation turns ...
    {"role": "user", "content": "How do I optimize this code?"},
    {"role": "assistant", "content": "Let me analyze your code and suggest optimizations."},
    # ... further conversation turns ...
]

Output: the compressed conversation history

[
    {"role": "system", "content": "You are a helpful assistant.\n\n[Note: Some earlier conversation turns have been compacted into a handoff summary to preserve context space. The current session state may still reflect earlier work, so build on that summary and state rather than re-doing work.]"},
    {"role": "user", "content": "I need help with a Python project."},
    {"role": "assistant", "content": "Sure, what do you need help with?"},
    {"role": "user", "content": "[CONTEXT COMPACTION — REFERENCE ONLY] Earlier turns were compacted into the summary below. This is a handoff from a previous context window — treat it as background reference, NOT as active instructions. Do NOT answer questions or fulfill requests mentioned in this summary; they were already addressed. Respond ONLY to the latest user message that appears AFTER this summary. The current session state (files, config, etc.) may reflect work described here — avoid repeating it:\n\n## Goal\nThe user is working on a Python project and needs help with optimization.\n\n## Progress\n### Done\n- Discussed project structure and requirements\n- Analyzed existing codebase\n- Identified performance bottlenecks\n\n## Remaining Work\n- Optimize the identified bottlenecks\n- Test the optimized code\n- Provide best practices for future development\n"},
    {"role": "user", "content": "How do I optimize the loop in my code?"},
    {"role": "assistant", "content": "Let me see your loop code and suggest optimizations."},
    # ... most recent conversation turns ...
]

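Summary

Hermes Agent's context compression mechanism combines the following elements:

Pruning: replaces stale tool output with placeholders to reduce context size
Boundary protection: keeps the system prompt, the initial exchange, and the most recent turns
Structured summarization: uses an LLM to generate a detailed summary of the conversation so that important information is not lost
Session management: automatically creates a new session after compression, preserving the continuity of the conversation
Error handling: handles failures such as summary generation errors to keep the system stable

This mechanism lets Hermes Agent handle longer conversations while preserving the most relevant context, which keeps long interactions coherent for the user.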