Agent 性能优化：降低 Token 消耗的 5 个技巧Agent 性能优化：降低 Token 消耗的 5 个技巧 �

Agent 性能优化：降低 Token 消耗的 5 个技巧

系列文章： 《AI Agent 开发实战》第 7 期
难度等级： ⭐⭐⭐⭐
预计耗时： 35 分钟

🎯 本文目标

学会优化 AI Agent 性能：

✅ 减少 Token 消耗
✅ 提高响应速度
✅ 降低 API 成本
✅ 提升用户体验

📊 成本分析

Token 消耗构成

总 Token = 输入 Token + 输出 Token

输入 Token：
- 系统提示词（100-500）
- 对话历史（可变）
- 工具描述（100-1000）
- 用户输入（可变）

输出 Token：
- Agent 回复（100-1000）

成本计算

假设：

GPT-4: $0.03/1K tokens (输入)
GPT-4: $0.06/1K tokens (输出)
日均调用：1000 次
平均消耗：2000 tokens/次

月度成本：

日消耗：1000 * 2000 = 2,000,000 tokens
月消耗：2M * 30 = 60M tokens
月成本：60M * $0.03/1000 ≈ $1,800

优化后（减少 50%）：

月成本：$900
节省：$900/月

🔧 优化技巧

技巧 1：精简提示词

❌ 低效：

instruction = """
你是一个智能助手，你的任务是帮助用户解决问题。
你应该友好、专业、准确地回答用户的问题。
如果遇到问题，要诚实地告诉用户你不知道。
不要编造信息，不要提供错误的建议。
你的回答应该简洁明了，不要啰嗦。
...（500 字）
"""

✅ 高效：

instruction = "你是助手，友好专业，准确回答，不知则说不知"

效果：

提示词：500 tokens → 20 tokens
节省：96%
效果：相当

技巧 2：智能裁剪对话历史

❌ 低效：

# 保留所有历史
full_history = get_all_messages()

✅ 高效：

# 只保留最近 N 轮
recent_history = get_recent_messages(max_turns=10)

# 或者使用摘要
if len(history) > threshold:
    summary = summarize_old_messages()
    context = summary + recent_messages

效果：

历史：2000 tokens → 500 tokens
节省：75%

技巧 3：优化工具描述

❌ 低效：

tool_description = """
这个工具可以用来搜索网络信息。
当你需要查询最新的数据、新闻、或者其他实时信息时，
应该使用这个工具。使用方法是传入一个搜索关键词，
工具会返回相关的搜索结果。搜索结果包括标题、链接和摘要。
...（300 字）
"""

✅ 高效：

tool_description = "搜索网络：query=关键词，返回搜索结果"

效果：

描述：300 tokens → 15 tokens
节省：95%

技巧 4：流式输出

❌ 低效：

# 等待完整回复
response = agent.run(prompt)
print(response)

✅ 高效：

# 流式输出
for chunk in agent.run_stream(prompt):
    print(chunk, end='')
    # 用户可以提前看到内容

效果：

用户体验提升
可减少不必要的生成

技巧 5：缓存结果

❌ 低效：

# 每次都调用 API
def answer(question):
    return agent.run(question)

✅ 高效：

from functools import lru_cache

@lru_cache(maxsize=1000)
def answer(question):
    return agent.run(question)

效果：

重复问题：0 tokens
缓存命中率：30-50%

💻 实战优化

案例 1：客服机器人

优化前：

class CustomerServiceBot:
    def __init__(self):
        self.instruction = """
        你是一个专业的客服助手...（1000 字）
        """
        self.memory = Memory(max_turns=50)
    
    def chat(self, user_input):
        return self.agent.run(f"{self.instruction}\n用户：{user_input}")

Token 消耗： 1500/次

优化后：

class CustomerServiceBot:
    def __init__(self):
        self.instruction = "客服助手，专业解答产品问题"
        self.memory = Memory(max_turns=10)
        self.cache = {}
    
    def chat(self, user_input):
        # 缓存
        if user_input in self.cache:
            return self.cache[user_input]
        
        # 精简上下文
        context = f"{self.instruction}\n{self.get_recent()}"
        response = self.agent.run(f"{context}\n用户：{user_input}")
        
        # 缓存结果
        self.cache[user_input] = response
        return response

Token 消耗： 500/次
节省： 67%

案例 2：数据分析助手

优化前：

def analyze_data(data):
    prompt = f"""
    请分析以下数据：
    {data}
    
    分析要求：
    1. 计算基本统计指标
    2. 找出异常值
    3. 识别趋势
    4. 提供建议
    ...（详细要求）
    """
    return agent.run(prompt)

Token 消耗： 3000/次

优化后：

def analyze_data(data):
    # 数据摘要
    summary = {
        "count": len(data),
        "mean": sum(data)/len(data),
        "min": min(data),
        "max": max(data)
    }
    
    prompt = f"分析数据：{summary}，找异常和趋势"
    return agent.run(prompt)

Token 消耗： 800/次
节省： 73%

🎓 高级优化

1. 模型选择策略

class SmartModelSelector:
    def __init__(self):
        self.simple_model = "gpt-3.5-turbo"  # 便宜
        self.complex_model = "gpt-4"         # 强大
    
    def select_model(self, task: str):
        # 简单任务用小模型
        if self.is_simple(task):
            return self.simple_model
        else:
            return self.complex_model
    
    def is_simple(self, task: str) -> bool:
        simple_keywords = ["问候", "感谢", "简单计算"]
        return any(k in task for k in simple_keywords)

效果：

简单任务成本： $0.002 →$ 0.0005
节省：75%

2. 批量处理

# ❌ 逐个处理
for question in questions:
    answer = agent.run(question)

# ✅ 批量处理
batch_prompt = "\n".join([f"Q{i}: {q}" for i, q in enumerate(questions)])
batch_response = agent.run(f"回答以下问题：{batch_prompt}")
answers = parse_batch_response(batch_response)

效果：

API 调用：100 次 → 1 次
节省：90%

3. 提前终止

def run_with_early_stop(prompt, max_tokens=1000):
    response = ""
    for chunk in agent.run_stream(prompt):
        response += chunk
        
        # 提前终止条件
        if "总结：" in response and len(response) > 500:
            break
        
        if len(response) > max_tokens:
            break
    
    return response

📊 优化效果对比

优化技巧	优化前	优化后	节省
精简提示词	500 tokens	20 tokens	96%
裁剪历史	2000 tokens	500 tokens	75%
优化工具	300 tokens	15 tokens	95%
缓存结果	100% 调用	50% 调用	50%
模型选择	$0.03/次	$0.005/次	83%
综合优化	3000 tokens	600 tokens	80%

💰 成本节省计算

优化前

日均调用：1000 次
平均 Token：3000/次
日消耗：3,000,000 tokens
月消耗：90,000,000 tokens
月成本：$2,700

优化后

日均调用：1000 次
平均 Token：600/次
日消耗：600,000 tokens
月消耗：18,000,000 tokens
月成本：$540

节省

月节省：$2,160
年节省：$25,920

📚 系列总结

AI Agent 系列（7 篇）完成！

期数	主题	字数	状态
第 1 期	30 分钟搭建第一个 Agent	4,471 字	✅
第 2 期	记忆系统实现	8,227 字	✅
第 3 期	工具调用能力	9,008 字	✅
第 4 期	多 Agent 协作	8,592 字	✅
第 5 期	LangChain vs Google ADK	5,042 字	✅
第 6 期	Agent 自动写代码	待发布
第 7 期	性能优化	6,000+ 字	✅

总计： 约 41,000 字

AI Agent 系列完结！感谢支持！🎉

觉得有用？点赞 👍 收藏 ⭐ 关注 ➕ 三连支持一下！