How to Fix OpenClaw's "LLM request timed out": 4 Fixes After 3 Days of Debugging

Last week I was running batch text analysis with OpenClaw when the console started spamming LLM request timed out, and the whole pipeline ground to a halt.

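If you have hit the same problem, the root cause is usually one of these: a single request carries so many tokens that inference runs past the timeout, concurrency has maxed out the rate limit, or the model service OpenClaw calls underneath is itself unstable. The fixes are to tune the timeout and retry policy, split long text, throttle requests through a queue, or switch to a more stable API channel. Below are the 4 fixes I put together after two days of troubleshooting, all of which I have tested myself.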

Why this happens

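OpenClaw is essentially a token-consuming tool that calls a large-model API underneath to do inference. LLM request timed out means exactly what it says: the request it sent did not get a response within the allowed time, so the connection was dropped.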

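Common triggers:

Too many tokens in one request: input plus output goes beyond the model's sweet spot, inference takes longer, and the default timeout is exceeded
Concurrency too high: a burst of requests in a short window trips the rate limit, and requests wait in the queue for too long
Jitter in the underlying model service: the model API that OpenClaw depends on has its own latency, especially at peak hours
Network path problems: your server → OpenClaw → model provider is a long chain, and a slowdown at any hop can cause a timeout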

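graph LR
    A[Your code] -->|request| B[OpenClaw]
    B -->|forward| C[Underlying LLM API]
    C -->|inference| D[Model response]
    D -->|return| B
    B -->|return| A

    style B fill:#f9a825,stroke:#f57f17
    C -.->|Timeout risk 1: slow model inference| E[Timeout!]
    B -.->|Timeout risk 2: rate-limit queueing| E
    A -.->|Timeout risk 3: network latency| E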

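Fix 1: Raise the timeout and add retries

This is the most direct fix. Often nothing has actually crashed; the default timeout is simply too short.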

import time
import requests
from requests.adapters import HTTPAdapter
from urllib3.util.retry import Retry

def call_openclaw_with_retry(payload, max_retries=3, timeout=120):
    """Call OpenClaw with retries and an explicit timeout."""
    session = requests.Session()

    # Retry policy: retry automatically on 429/500/502/503/504
    retry_strategy = Retry(
        total=max_retries,
        backoff_factor=2,  # exponential backoff: the wait roughly doubles on each retry
        status_forcelist=[429, 500, 502, 503, 504],
        # urllib3 does not retry POST by default, so opt in (needs urllib3 >= 1.26)
        allowed_methods=["POST"],
    )
    adapter = HTTPAdapter(max_retries=retry_strategy)
    session.mount("https://", adapter)

    try:
        response = session.post(
            "https://api.openclaw.example/v1/chat/completions",
            json=payload,
            timeout=timeout,  # raised from the default 30s to 120s
            headers={"Authorization": "Bearer your-key"}
        )
        response.raise_for_status()
        return response.json()
    except requests.exceptions.Timeout:
        print(f"Request timed out ({timeout}s) after up to {max_retries} retries")
        return None
    except requests.exceptions.RequestException as e:
        print(f"Request failed: {e}")
        return None

# Usage
result = call_openclaw_with_retry({
    "model": "gpt-5",
    "messages": [{"role": "user", "content": "your prompt"}],
    "max_tokens": 2000
})

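After raising the timeout from 30s to 120s and adding 3 retries with exponential backoff, the success rate of my batch jobs went from 60% to around 85%. The remaining 15% still failed, which told me the timeout setting was not the only problem.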
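To make the backoff schedule easier to see, here is a minimal hand-rolled sketch of the same idea using only the standard library. It is not what urllib3 does internally; `retry_with_backoff`, `flaky_call`, and the injected `sleep` parameter are names I made up for illustration, and the fake call stands in for a real OpenClaw request.

```python
import time

def retry_with_backoff(fn, max_retries=3, base_delay=2.0, sleep=time.sleep):
    """Call fn(); on TimeoutError wait base_delay * 2**attempt, then try again."""
    for attempt in range(max_retries + 1):
        try:
            return fn()
        except TimeoutError:
            if attempt == max_retries:
                raise  # out of retries: let the caller see the failure
            sleep(base_delay * 2 ** attempt)  # 2s, 4s, 8s with the defaults


# Demo: a fake call that times out twice and then succeeds.
calls = {"n": 0}

def flaky_call():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("simulated timeout")
    return "ok"

waits = []  # record the delays instead of actually sleeping
result = retry_with_backoff(flaky_call, sleep=waits.append)
print(result, waits)  # ok [2.0, 4.0]
```

Passing `sleep` as a parameter keeps the demo instant; in real use you would leave it at the default `time.sleep`.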

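Fix 2: Split long text and cap tokens per request

This turned out to be the real culprit. To save effort, I had been throwing an entire document (roughly 8000 tokens) at the model in one go for summary plus analysis. Inference time shot up to 40-60 seconds, and the chance of a timeout was very high.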

import time

def chunk_text(text, max_chunk_size=2000):
    """Split text on paragraph breaks into chunks of up to max_chunk_size characters.

    Note: this counts characters, which is only a rough proxy for tokens.
    """
    paragraphs = text.split('\n\n')
    chunks = []
    current_chunk = ""

    for para in paragraphs:
        if len(current_chunk) + len(para) < max_chunk_size:
            current_chunk += para + "\n\n"
        else:
            if current_chunk:
                chunks.append(current_chunk.strip())
            # A single paragraph longer than max_chunk_size still becomes one oversized chunk
            current_chunk = para + "\n\n"

    if current_chunk:
        chunks.append(current_chunk.strip())

    return chunks

def process_long_document(document, system_prompt):
    """Process a long document chunk by chunk, then merge the results."""
    chunks = chunk_text(document, max_chunk_size=2000)
    results = []

    for i, chunk in enumerate(chunks):
        print(f"Processing chunk {i+1}/{len(chunks)}...")
        # call_openclaw_with_retry is the helper defined in Fix 1
        result = call_openclaw_with_retry({
            "model": "gpt-5",
            "messages": [
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": chunk}
            ],
            "max_tokens": 1000  # cap the output length
        })
        if result:  # failed chunks are skipped, so the merged output may have gaps
            results.append(result["choices"][0]["message"]["content"])
        time.sleep(1)  # pause between chunks to stay under the rate limit

    return "\n\n".join(results)

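Keeping each request within about 3000 tokens (2000 input + 1000 output) brought the timeout rate below 5%. The cost is more requests per document, but the stability is worth it.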
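One gap worth knowing about: chunk_text keeps a paragraph whole even when that single paragraph is longer than the limit, so one giant paragraph can still produce an oversized request. Below is a rough sketch of a stricter variant that hard-slices such paragraphs. `split_oversized` and `chunk_text_strict` are hypothetical names of mine, the limit is still in characters rather than tokens, and hard slicing can cut mid-sentence, so treat it as a fallback rather than a polished splitter.

```python
def split_oversized(paragraph, max_chunk_size=2000):
    """Cut one over-long paragraph into slices of at most max_chunk_size characters."""
    return [paragraph[i:i + max_chunk_size]
            for i in range(0, len(paragraph), max_chunk_size)]


def chunk_text_strict(text, max_chunk_size=2000):
    """Paragraph-based chunking where no chunk exceeds max_chunk_size characters."""
    chunks, current = [], ""
    for para in text.split("\n\n"):
        # Only paragraphs that exceed the limit get hard-sliced.
        pieces = split_oversized(para, max_chunk_size) if len(para) > max_chunk_size else [para]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if len(candidate) <= max_chunk_size:
                current = candidate      # still fits: keep growing this chunk
            else:
                chunks.append(current)   # full: close it and start a new one
                current = piece
    if current:
        chunks.append(current)
    return chunks


# Demo: one 4500-character paragraph followed by two short ones.
doc = "a" * 4500 + "\n\n" + "b" * 300 + "\n\n" + "c" * 300
chunks = chunk_text_strict(doc)
print([len(c) for c in chunks])  # [2000, 2000, 1104]
```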

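Fix 3: Add a request queue and limit concurrency

Firing 20 requests at once is basically asking for trouble. The rate limit underneath OpenClaw leaves most of them waiting in line, and they wait until they time out.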

import asyncio
from asyncio import Semaphore

import aiohttp  # third-party; imported at module level so both functions can use it

async def call_with_semaphore(semaphore, payload, session):
    """Async call gated by a semaphore to cap concurrency."""
    async with semaphore:
        try:
            async with session.post(
                "https://api.openclaw.example/v1/chat/completions",
                json=payload,
                timeout=aiohttp.ClientTimeout(total=120)
            ) as resp:
                return await resp.json()
        except asyncio.TimeoutError:
            print("Request timed out, skipping")
            return None

async def batch_process(payloads, max_concurrent=3):
    """Process payloads in bulk while limiting the number of in-flight requests."""
    semaphore = Semaphore(max_concurrent)  # at most 3 requests at the same time

    async with aiohttp.ClientSession(
        headers={"Authorization": "Bearer your-key"}
    ) as session:
        tasks = [call_with_semaphore(semaphore, p, session) for p in payloads]
        results = await asyncio.gather(*tasks)

    return results

# Usage: 20 tasks, but only 3 run at once
# asyncio.run(batch_process(my_payloads, max_concurrent=3))

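Dropping concurrency from 20 to 3 cut the timeout rate from 40% to 8%. Total run time went up, but the results were complete and I no longer had to keep rerunning failed tasks.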
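If you want to convince yourself that the semaphore really caps the number of in-flight requests, here is a small network-free sketch. `fake_request` and `run_demo` are made-up names, and `asyncio.sleep` stands in for the real HTTP call; the demo just tracks the peak number of tasks running inside the semaphore.

```python
import asyncio

async def fake_request(semaphore, stats, delay=0.01):
    """Pretend request: counts how many tasks are inside the semaphore at once."""
    async with semaphore:
        stats["active"] += 1
        stats["peak"] = max(stats["peak"], stats["active"])
        await asyncio.sleep(delay)  # stands in for the real network call
        stats["active"] -= 1
        return "done"

async def run_demo(total=20, max_concurrent=3):
    semaphore = asyncio.Semaphore(max_concurrent)
    stats = {"active": 0, "peak": 0}
    results = await asyncio.gather(
        *(fake_request(semaphore, stats) for _ in range(total))
    )
    return results, stats["peak"]

results, peak = asyncio.run(run_demo())
print(len(results), peak)  # 20 3
```

All 20 tasks finish, but never more than 3 are running at the same moment, which is the behavior batch_process relies on.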

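Fix 4: Switch to a more stable API channel

The first three fixes are, frankly, all ways of putting up with the problem. If the model service OpenClaw calls underneath is itself unstable, no amount of parameter tuning gets at the root cause.

For timeout-sensitive tasks, I ended up going straight through a more stable API aggregation channel. ofox.ai is an AI model aggregation platform: one API key gives access to 50+ models including GPT-5, Claude Opus 4.6, Gemini 3, and DeepSeek V3, with low-latency direct connections (around 300ms), redundancy across multiple providers (Azure, Bedrock, Alibaba Cloud, Volcano Engine), Alipay and WeChat Pay support, and pay-as-you-go billing.

The code change is tiny. Only base_url and api_key change: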

from openai import OpenAI

# Calling the model behind OpenClaw used to time out a lot.
# Switch to the aggregation endpoint: one key for all models.
client = OpenAI(
    api_key="your-ofox-key",
    base_url="https://api.ofox.ai/v1"
)

response = client.chat.completions.create(
    model="gpt-5",  # or claude-opus-4.6, deepseek-v3, etc.
    messages=[
        {"role": "user", "content": "your prompt"}
    ],
    max_tokens=2000,
    timeout=60  # 60s is normally more than enough
)

print(response.choices[0].message.content)

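With the same prompt and the same model, the average response time through the aggregation endpoint was 3-8 seconds, and I ran 200 items with zero timeouts. That is the difference channel stability makes.

My final choice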

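Fix	Effort	Timeout improvement	Best for
Raise timeout + retry	Small	Moderate (85% success rate)	Occasional timeouts, light workloads
Split long text	Medium	Good (95% success rate)	Large token count per request
Concurrency control	Medium	Good (92% success rate)	Batch jobs
Switch to a stable API channel	Small	Best (99%+ success rate)	High stability requirements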

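I now use Fix 2 plus Fix 4: split long text into chunks first, then send them through the stable aggregation API channel. After a week of batch jobs, timeout errors dropped from dozens a day to zero.

One last piece of honest advice: LLM request timed out looks simple, but behind it there may be several problems stacked together: token count, concurrency, the network path, and the stability of the model service. Don't tune in just one direction. First add logging that records the token count and response time of every request, find the real bottleneck, and only then start changing things. It will save you a lot of time.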
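As a starting point for that logging, here is a minimal sketch of a timing wrapper. `timed_call`, `records`, and `fake_call` are hypothetical names; the wrapper assumes an OpenAI-style payload with a `messages` list and a response dict that may carry a `usage.total_tokens` field, which you should check against what your endpoint actually returns. Input size is logged in characters because that is all we know before the call.

```python
import logging
import time

logging.basicConfig(level=logging.INFO, format="%(asctime)s %(message)s")
log = logging.getLogger("llm_timing")

records = []  # one dict per request, handy for later analysis

def timed_call(call_fn, payload):
    """Run call_fn(payload) and record input size, elapsed time, and token usage."""
    input_chars = sum(len(m.get("content", "")) for m in payload.get("messages", []))
    start = time.perf_counter()
    result = None
    try:
        result = call_fn(payload)
        return result
    finally:
        elapsed = time.perf_counter() - start
        usage = (result or {}).get("usage", {})  # assumed OpenAI-style usage block
        record = {
            "input_chars": input_chars,
            "total_tokens": usage.get("total_tokens"),  # None if not reported
            "elapsed_s": elapsed,
            "ok": result is not None,
        }
        records.append(record)
        log.info("input_chars=%(input_chars)d total_tokens=%(total_tokens)s "
                 "elapsed=%(elapsed_s).2fs ok=%(ok)s", record)


# Demo with a fake call in place of a real request.
def fake_call(payload):
    return {"usage": {"total_tokens": 42}, "choices": []}

response = timed_call(fake_call, {"messages": [{"role": "user", "content": "hello"}]})
```

In real use you would pass something like call_openclaw_with_retry as `call_fn`; a `None` result shows up as `ok=False`, so slow and failed requests are easy to pick out of the log.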