Hermes Agent API Calls and Error-Handling Mechanisms: An Analysis

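The API call path is a core part of Hermes Agent. It handles API interaction with LLM providers, error handling, and recovery strategies. The code is built to be robust: when a call fails, it classifies the failure and tries to recover rather than aborting.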

Core Features

1. API Calls and Retry Mechanism

while retry_count < max_retries:
    try:
        self._reset_stream_delivery_tracking()
        api_kwargs = self._build_api_kwargs(api_messages)
        # ...
        if _use_streaming:
            response = self._interruptible_streaming_api_call(
                api_kwargs, on_first_delta=_stop_spinner
            )
        else:
            response = self._interruptible_api_call(api_kwargs)
        # ...

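Streaming API calls: streaming is preferred even when there is no streaming consumer, because it provides better health checking and timeout detection
Retry mechanism: transient errors are retried with exponential backoff and jitter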
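
The excerpt above does not show how the backoff delay is computed. As a minimal sketch, assuming a "full jitter" scheme (a random delay up to an exponentially growing ceiling), a retry loop of this kind could look like the following. `backoff_delay`, `call_with_retries`, and their parameters are hypothetical names used for illustration, not Hermes Agent's actual helpers.

```python
import random
import time


def backoff_delay(attempt, base=1.0, cap=60.0, rng=random.random):
    """Full-jitter delay: uniform in [0, min(cap, base * 2**attempt))."""
    ceiling = min(cap, base * (2 ** attempt))
    return rng() * ceiling


def call_with_retries(fn, max_retries=3, sleep=time.sleep,
                      retryable=(TimeoutError, ConnectionError)):
    """Call fn() until it succeeds or max_retries attempts have failed.

    Only exceptions listed in `retryable` (an assumption of this sketch)
    trigger a retry; anything else propagates immediately.
    """
    retry_count = 0
    while True:
        try:
            return fn()
        except retryable:
            retry_count += 1
            if retry_count >= max_retries:
                raise  # budget exhausted: surface the last error
            sleep(backoff_delay(retry_count - 1))
```

Passing `sleep` and `rng` as parameters keeps the loop testable without real waiting or randomness.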

2. Error Handling and Classification

classified = classify_api_error(
    api_error,
    provider=getattr(self, "provider", "") or "",
    model=getattr(self, "model", "") or "",
    approx_tokens=approx_tokens,
    context_length=_ctx_len,
    num_messages=len(api_messages) if api_messages else 0,
)

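Error classification: API errors are sorted into types such as rate limiting, context overflow, and payload too large
Targeted handling: each error type gets its own recovery strategy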
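
The internals of `classify_api_error` are not shown in the excerpt. The sketch below illustrates the general idea under stated assumptions: it maps an HTTP status code and error message to a reason, using the standard meanings of 429, 413, 401, and 403. The `Reason` enum, the `Classified` dataclass, and the `classify` function are hypothetical and much simpler than whatever the real classifier does.

```python
from dataclasses import dataclass
from enum import Enum


class Reason(Enum):
    RATE_LIMIT = "rate_limit"
    CONTEXT_OVERFLOW = "context_overflow"
    PAYLOAD_TOO_LARGE = "payload_too_large"
    AUTH = "auth"
    UNKNOWN = "unknown"


@dataclass
class Classified:
    reason: Reason
    retryable: bool


def classify(status_code, message="", approx_tokens=0, context_length=0):
    """Map a failed call to a reason (illustrative rules only)."""
    text = message.lower()
    if status_code == 429:
        return Classified(Reason.RATE_LIMIT, True)
    if status_code == 413:
        return Classified(Reason.PAYLOAD_TOO_LARGE, True)
    if status_code in (401, 403):
        return Classified(Reason.AUTH, False)
    # Treat a 400 as context overflow if the message says so, or if the
    # estimated prompt size exceeds the known context window.
    over_budget = context_length > 0 and approx_tokens > context_length
    if status_code == 400 and ("context length" in text or over_budget):
        return Classified(Reason.CONTEXT_OVERFLOW, True)
    return Classified(Reason.UNKNOWN, False)
```

The caller can then branch on `classified.reason`, which is the pattern the recovery strategies below rely on.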

3. Recovery Strategies

3.1 Context Compression

if is_payload_too_large:
    compression_attempts += 1
    if compression_attempts > max_compression_attempts:
        # ...
    original_len = len(messages)
    messages, active_system_prompt = self._compress_context(
        messages, system_message, approx_tokens=approx_tokens,
        task_id=effective_task_id,
    )
    # ...

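When a payload-too-large or context-length-exceeded error occurs, the conversation history is compressed automatically, up to a maximum number of attempts
After compression a new session is created so that the history stays continuous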
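
What `_compress_context` does internally is not shown. One common approach, sketched here as an assumption rather than as the actual implementation, is to keep the system prompt and the most recent turns verbatim and collapse everything older into a single summary message. `compress_messages` and its placeholder summarizer are hypothetical; a real summarizer would typically call a model, and this sketch does not model the new-session step.

```python
def compress_messages(messages, keep_last=4, summarize=None):
    """Collapse older turns into one summary message.

    Keeps all system messages and the last `keep_last` non-system
    messages unchanged; everything in between becomes one summary.
    """
    keep_last = max(1, keep_last)
    if summarize is None:
        # Placeholder summarizer (assumption of this sketch).
        def summarize(msgs):
            return f"[Summary of {len(msgs)} earlier messages]"

    system = [m for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]
    if len(rest) <= keep_last:
        return list(messages)  # nothing worth compressing

    older, recent = rest[:-keep_last], rest[-keep_last:]
    summary = {"role": "user", "content": summarize(older)}
    return system + [summary] + recent
```

Bounding the number of compression attempts, as the excerpt does with `max_compression_attempts`, matters because each pass loses detail and a request that still does not fit after several passes is unlikely to fit at all.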

3.2 Model Fallback

if is_rate_limited and self._fallback_index < len(self._fallback_chain):
    # ...
    self._emit_status("⚠️ Rate limited — switching to fallback provider...")
    if self._try_activate_fallback():
        retry_count = 0
        compression_attempts = 0
        primary_recovery_attempted = False
        continue

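On rate limiting or other errors that cannot be recovered in place, the agent switches to a fallback model and resets its retry and compression counters
A fallback chain of models keeps the service available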
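
The excerpt checks `self._fallback_index` against `self._fallback_chain` and calls `self._try_activate_fallback()`, but the method body is not shown. A minimal sketch of how such a chain might advance, assuming each entry is simply a provider or model name, is given below. The `FallbackChain` class is hypothetical.

```python
class FallbackChain:
    """Walks an ordered list of fallbacks, one step per activation."""

    def __init__(self, primary, fallbacks):
        self.active = primary
        self._fallback_chain = list(fallbacks)
        self._fallback_index = 0

    def has_fallback(self):
        return self._fallback_index < len(self._fallback_chain)

    def try_activate_fallback(self):
        """Switch to the next fallback; return False if none remain."""
        if not self.has_fallback():
            return False
        self.active = self._fallback_chain[self._fallback_index]
        self._fallback_index += 1
        return True


chain = FallbackChain("primary-model", ["backup-a", "backup-b"])
while chain.try_activate_fallback():
    # In a real loop, per-provider counters (retries, compression
    # attempts) would be reset here before calling the API again.
    pass
```

Because the index only moves forward, the chain can never loop back to a provider that has already failed within the same request.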

3.3 Credential Pool Rotation

recovered_with_pool, has_retried_429 = self._recover_with_credential_pool(
    status_code=status_code,
    has_retried_429=has_retried_429,
    classified_reason=classified.reason,
    error_context=error_context,
)
if recovered_with_pool:
    continue

On authentication errors or rate limits, the agent tries another credential from the credential pool before giving up
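
The body of `_recover_with_credential_pool` is not part of the excerpt. The sketch below shows one plausible rotation scheme under the assumption that a credential which has failed is not reused within the same request. `CredentialPool` and its methods are hypothetical names.

```python
class CredentialPool:
    """Round-robin over API keys, skipping ones already marked exhausted."""

    def __init__(self, keys):
        if not keys:
            raise ValueError("credential pool must not be empty")
        self._keys = list(keys)
        self._exhausted = set()
        self._current = 0

    @property
    def current(self):
        return self._keys[self._current]

    def rotate(self):
        """Mark the current key exhausted and move to the next usable one.

        Returns True if a different key is now active, False if every
        key in the pool has been exhausted.
        """
        self._exhausted.add(self._current)
        for offset in range(1, len(self._keys)):
            idx = (self._current + offset) % len(self._keys)
            if idx not in self._exhausted:
                self._current = idx
                return True
        return False
```

A `False` return is the natural point to hand over to the next strategy, such as model fallback.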

4. Token Usage Tracking and Session Management

if hasattr(response, 'usage') and response.usage:
    canonical_usage = normalize_usage(
        response.usage,
        provider=self.provider,
        api_mode=self.api_mode,
    )
    prompt_tokens = canonical_usage.prompt_tokens
    completion_tokens = canonical_usage.output_tokens
    total_tokens = canonical_usage.total_tokens
    # ...
    self.session_prompt_tokens += prompt_tokens
    self.session_completion_tokens += completion_tokens
    self.session_total_tokens += total_tokens
    # ...

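Tracks token usage for each API call and accumulates it per session
Persists token counts to the session database
Estimates the cost of API calls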
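
The cost estimate itself is elided in the excerpt. As a rough sketch, assuming pricing is quoted per million input and output tokens (a common convention, but an assumption here), session accounting could be modeled as follows. `SessionUsage` and the price parameters are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SessionUsage:
    """Running token totals for one session."""
    prompt_tokens: int = 0
    completion_tokens: int = 0

    @property
    def total_tokens(self):
        return self.prompt_tokens + self.completion_tokens

    def add(self, prompt_tokens, completion_tokens):
        """Accumulate the usage reported by one API response."""
        self.prompt_tokens += prompt_tokens
        self.completion_tokens += completion_tokens

    def estimated_cost(self, input_price_per_mtok, output_price_per_mtok):
        """Cost in the same currency unit as the per-million-token prices."""
        return (
            self.prompt_tokens * input_price_per_mtok
            + self.completion_tokens * output_price_per_mtok
        ) / 1_000_000


usage = SessionUsage()
usage.add(prompt_tokens=1000, completion_tokens=500)
usage.add(prompt_tokens=2000, completion_tokens=500)
```

Deriving `total_tokens` from the two counters, instead of storing a third field, avoids the totals drifting apart if a provider reports an inconsistent sum.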

5. Special-Case Error Handling

5.1 Reasoning Budget Exhaustion

_thinking_exhausted = (
    not _trunc_has_tool_calls
    and _has_think_tags
    and (
        (_trunc_content is not None and not self._has_content_after_think_block(_trunc_content))
        or _trunc_content is None
    )
)

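Detects when a truncated response has no tool calls and no content after its thinking block, meaning the model spent all of its output tokens on reasoning and left none for the actual answer
Provides a user-friendly error message and suggested fixes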
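
The helper `_has_content_after_think_block` is referenced but not shown. The sketch below mirrors the boolean condition from the excerpt and fills in the helper under one assumption: reasoning is wrapped in literal `<think>...</think>` tags, and a response cut off mid-reasoning may leave the block unclosed. Both function names are hypothetical stand-ins.

```python
import re

# A closed think block, and an unclosed one running to the end of the text.
_THINK_BLOCK = re.compile(r"<think>.*?</think>", re.DOTALL)
_OPEN_THINK = re.compile(r"<think>.*", re.DOTALL)


def has_content_after_think_block(content):
    """True if any visible text remains once think blocks are removed."""
    stripped = _THINK_BLOCK.sub("", content)
    stripped = _OPEN_THINK.sub("", stripped)  # truncated mid-reasoning
    return bool(stripped.strip())


def thinking_exhausted(content, has_tool_calls, has_think_tags):
    """Same shape as the condition in the excerpt above."""
    return (
        not has_tool_calls
        and has_think_tags
        and (content is None or not has_content_after_think_block(content))
    )
```

Checking for tool calls first matters: a response that reasons and then calls a tool has produced useful output even if it contains no visible text.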

5.2 Thinking-Block Signature Errors

if (
    classified.reason == FailoverReason.thinking_signature
    and not thinking_sig_retry_attempted
):
    thinking_sig_retry_attempted = True
    for _m in messages:
        if isinstance(_m, dict):
            _m.pop("reasoning_details", None)
    # ...

Handles invalid Anthropic thinking-block signatures
Removes `reasoning_details` from every message, then retries once
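
The excerpt removes `reasoning_details` in place with `pop`. If the original messages need to survive the retry untouched (for logging or a later fallback, for instance), a non-mutating variant is a small change. This is a sketch of that alternative, not the code the agent uses; `strip_reasoning_details` is a hypothetical name.

```python
def strip_reasoning_details(messages):
    """Return shallow copies of dict messages without 'reasoning_details'.

    Non-dict entries are passed through unchanged, matching the
    isinstance check in the excerpt above.
    """
    cleaned = []
    for m in messages:
        if isinstance(m, dict):
            cleaned.append(
                {k: v for k, v in m.items() if k != "reasoning_details"}
            )
        else:
            cleaned.append(m)
    return cleaned
```

Either variant should be paired with a one-shot guard like `thinking_sig_retry_attempted`, since stripping the blocks a second time cannot change the outcome.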

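Technical Highlights

Layered error handling: fine-grained error classification with a targeted strategy for each class
Automatic context management: oversized contexts are compressed automatically so the conversation can continue
Resilient recovery: credential pools, model fallback, and related mechanisms improve reliability
User experience first: detailed error messages and recovery suggestions
Resource tracking: token usage and cost estimates are recorded so users can see what they consume
Interrupt handling: long-running operations respond to user interrupts, which keeps the agent interactive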

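Suggested Code Improvements

Modular error handling: split the error-handling logic into smaller modules to improve maintainability
Asynchronous error handling: consider running long recovery operations asynchronously
Adaptive retry strategy: adjust retries based on error type and historical success rate
Error prediction: use historical data to anticipate likely error types and act before they occur
Monitoring and alerting: add more detailed monitoring and alerts so problems are found and fixed quickly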
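
To make the adaptive retry suggestion concrete, here is one possible shape for it, offered as a sketch only. It tracks per-reason outcomes and shrinks or grows the retry budget once enough samples exist. `AdaptiveRetryPolicy`, the thresholds (20% and 80%), and the minimum sample count are all arbitrary assumptions and would need tuning against real traffic.

```python
class AdaptiveRetryPolicy:
    """Hypothetical policy: scale the retry budget by observed success rate."""

    def __init__(self, base_retries=3, min_retries=1, max_retries=6,
                 min_samples=5):
        self.base_retries = base_retries
        self.min_retries = min_retries
        self.max_retries = max_retries
        self.min_samples = min_samples
        self._stats = {}  # reason -> [successes, attempts]

    def record(self, reason, succeeded):
        """Record whether a retry for this error reason eventually worked."""
        stats = self._stats.setdefault(reason, [0, 0])
        stats[1] += 1
        if succeeded:
            stats[0] += 1

    def retries_for(self, reason):
        successes, attempts = self._stats.get(reason, (0, 0))
        if attempts < self.min_samples:
            return self.base_retries  # not enough data yet
        rate = successes / attempts
        if rate < 0.2:
            return self.min_retries   # retries rarely help: fail fast
        if rate > 0.8:
            return self.max_retries   # retries usually help: be patient
        return self.base_retries
```

The minimum sample count keeps a single unlucky failure from collapsing the retry budget for an otherwise recoverable error type.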

Input/Output Example

Input: the user sends a complex request that pushes the context length past the model's limit

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "I need help with a complex Python project. Here's the code... [very long code]"},
    # ... more conversation turns, pushing the context past the limit ...
    {"role": "user", "content": "Now I want to optimize this code."}
]

Output: the system compresses the context automatically and continues the conversation

⚠️  Context length exceeded — stepping down: 1000000 → 200000 tokens
🗜️ Context too large (~250000 tokens) — compressing (1/3)...
🗜️ Compressed 50 → 10 messages, retrying...
⏱️  API call completed in 4.2s

Final response:

I've analyzed your code and identified several optimization opportunities:

1. **Algorithm optimization**: The nested loops in your `process_data` function can be replaced with vectorized operations using NumPy.

2. **Memory usage**: You're loading the entire dataset into memory. Consider using generators or batch processing for large files.

3. **Caching**: Implement memoization for frequently called functions with expensive computations.

Would you like me to implement any of these optimizations?

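Summary

This code is a core part of Hermes Agent and implements a robust API call and error-handling system. Its recovery strategies (context compression, model fallback, and credential pool rotation) let the agent keep serving requests under a wide range of errors, while detailed error messages and clear prompts keep the user informed.

As a result, Hermes Agent can handle complex conversations and stay available and reliable even when it hits API errors or resource limits. The design is a useful reference for building an AI agent that recovers from failures and behaves consistently for its users.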