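[CrewAI Series 7] A 14-Year Testing Veteran: I Used an AI Agent for Performance Testing and Found 1 Critical Bottleneck

**Author**: 测试员周周 ("Tester Zhouzhou"), a 14-year testing/QA veteran

Series: CrewAI Multi-Agent Testing Framework in Practice (part 7 of a planned 24)

Length: about 4,500 characters (Chinese original)

Reading time: 11 minutes

Disclaimer: all test data in this article comes from real runs, and the code comes from my own system.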

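Opening: A Performance Test Result That Surprised Me

Today I ran performance tests against my own crewai-web-platform system.

Before testing, I was confident:

FastAPI is a fast framework, so performance should be fine
It's deployed locally, so there's no network latency
The endpoints are simple and just return JSON

The results surprised me:

Scenario 1: health check endpoint (simple GET)
✅ P95: 105ms
✅ QPS: 166

Scenario 2: detailed health check (complex GET)
❌ P95: 2137ms (20x that of scenario 1!)
❌ QPS: 9.38 (down 94%!)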

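Same system, comparable load (10 vs 20 concurrent users), so why is one endpoint 20 times slower? Answering that question is exactly what performance testing is for.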

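1. Three Common Misconceptions About Performance Testing

Misconception 1: Works correctly = performs well

❌ Wrong assumption: "The endpoint returns 200, so it should be fine."

✅ Reality: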

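The endpoint returns 200, but the response takes 3 seconds
Many users give up while waiting
The system doesn't crash, but the experience is terrible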

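Functional testing ensures the system works; performance testing ensures it works well.

Misconception 2: Optimize after launch

❌ Wrong approach: "Ship first, optimize when problems show up."

✅ The real cost:

Users are already lost by the time you react
Emergency fixes can introduce new bugs
Architectural problems are hard to fix after the fact

Misconception 3: Performance testing is complicated

❌ Conventional view:
You have to learn JMeter
You have to write load-test scripts
You have to analyze complex reports
✅ The AI Agent approach:
Describe the test scenario in natural language
The AI runs the test and generates the report
The bottleneck analysis comes with concrete suggestions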

2. PerformanceTestTool: My Performance Testing Weapon

This is the tool I implemented in the crewai-web-platform system:

2.1 Core Code (Condensed)

Full source: /crewai-web-platform/backend/app/tools/performance_test.py

```python
from crewai.tools import BaseTool
import requests, time, uuid, threading
from concurrent.futures import ThreadPoolExecutor, as_completed
import statistics
from typing import Dict, Any, List, Optional

class PerformanceTestTool(BaseTool):
    """Performance testing tool (optimized version)"""

    name: str = "performance_test"
    description: str = "Runs API performance tests with support for concurrent requests and load testing"

    def _run(
        self, url: str, method: str = "GET",
        concurrent_users: int = 10, iterations: int = 100,
        body: Optional[Dict[str, Any]] = None,
        headers: Optional[Dict[str, str]] = None,
        enable_cache_bust: bool = True
    ) -> Dict[str, Any]:
        """Run a performance test (optimized version)"""

        # 1. Concurrent start control (CountDownLatch equivalent)
        start_gate = threading.Event()

        response_times = []
        success_count = error_count = 0
        status_codes = {}
        lock = threading.Lock()

        # 2. Accurate QPS: measure from the first request sent to the last response received
        actual_start_time = actual_end_time = None
        time_lock = threading.Lock()

        def send_request(request_id):
            # request_id identifies the request; it is not used in this condensed excerpt
            nonlocal success_count, error_count, actual_start_time, actual_end_time

            start_gate.wait()  # wait for the concurrent start signal

            with time_lock:
                if actual_start_time is None:
                    actual_start_time = time.time()

            try:
                # 3. Cache busting + dynamic parameters
                test_url = url
                if enable_cache_bust:
                    test_url = f"{url}{'&' if '?' in url else '?'}_cache_bust={uuid.uuid4()}"

                req_start = time.time()
                response = requests.request(method, test_url, json=body,
                                            headers=headers, timeout=60)
                req_end = time.time()

                with lock:
                    response_times.append((req_end - req_start) * 1000)
                    if response.status_code == 200:
                        success_count += 1
                    else:
                        error_count += 1
                    status_codes[response.status_code] = \
                        status_codes.get(response.status_code, 0) + 1

                with time_lock:
                    actual_end_time = time.time()
            except Exception:
                with lock:
                    error_count += 1
                with time_lock:
                    actual_end_time = time.time()

        # 4. Concurrent execution
        with ThreadPoolExecutor(max_workers=concurrent_users) as executor:
            futures = [executor.submit(send_request, i) for i in range(iterations)]
            start_gate.set()  # all tasks submitted: release them together
            for future in as_completed(futures):
                try:
                    future.result()
                except Exception:
                    pass  # already counted inside send_request; don't count twice

        # 5. Compute results
        actual_total_time = (actual_end_time - actual_start_time) if actual_start_time else 0

        if response_times:
            sorted_times = sorted(response_times)
            return {
                "total_requests": iterations,
                "success_rate": f"{success_count/iterations*100:.2f}%",
                "p95_response_time_ms": round(sorted_times[int(len(sorted_times)*0.95)], 2),
                "qps": round(iterations / actual_total_time, 2) if actual_total_time > 0 else None,
                # ... other metrics
            }
        return {"error": "All requests failed"}
```

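💡 Key optimizations:

| Optimization | Implementation | Effect |
| --- | --- | --- |
| Concurrent start control | `threading.Event()` | Worker threads begin sending requests at the same moment |
| Accurate QPS calculation | Record first/last timestamps | Measures throughput only over the window from the first request sent to the last response, so thread-pool setup before the gate opens is not counted |
| Cache busting | UUID query parameter | Keeps server-side caching from inflating the results |
| Dynamic parameterization | `request_id` | Simulates real user behavior (`request_id` is passed to each task but unused in the condensed code above) |
| Improved exception handling | Counted inside the worker | Avoids double counting |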
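The start gate from the table can be tried on its own. This is a minimal, standard-library-only sketch rather than the tool's code: `time.sleep` stands in for the HTTP request, and the helper name `gated_start_spread` is made up for illustration. It records when each worker gets past the gate and returns the spread between the earliest and latest start.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def gated_start_spread(workers: int, work_seconds: float = 0.01) -> float:
    """Release `workers` tasks through one shared gate and return the spread
    (latest minus earliest) of their start times, in seconds."""
    start_gate = threading.Event()
    start_times = []
    lock = threading.Lock()

    def task() -> None:
        start_gate.wait()                  # every worker blocks here until set()
        with lock:
            start_times.append(time.perf_counter())
        time.sleep(work_seconds)           # stand-in for the real HTTP request

    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(task) for _ in range(workers)]
        time.sleep(0.05)                   # give the pool a moment to start its threads
        start_gate.set()                   # open the gate: all workers proceed together
        for future in futures:
            future.result()

    return max(start_times) - min(start_times)


if __name__ == "__main__":
    spread_ms = gated_start_spread(10) * 1000
    print(f"start-time spread across 10 workers: {spread_ms:.2f} ms")
```

Without the gate, each task starts as soon as the pool creates its thread, so early requests can already be finishing while later threads are still starting, which lowers the real peak concurrency.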

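The cache-busting f-string in the tool decides between `?` and `&` by checking whether the URL already contains `?`. For a URL with a fragment such as `http://localhost:8000/search?q=a#top`, that appends the parameter after `#top`, and browsers and HTTP clients never send the fragment to the server. Below is a sketch of a more robust alternative (not the tool's actual code; the helper name `add_cache_bust` is hypothetical) that rebuilds the query with `urllib.parse`.

```python
import uuid
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def add_cache_bust(url: str, param: str = "_cache_bust") -> str:
    """Return `url` with a unique query parameter added (hypothetical helper).

    Existing query parameters and any #fragment are preserved.
    """
    parts = urlsplit(url)
    query = parse_qsl(parts.query, keep_blank_values=True)
    query.append((param, uuid.uuid4().hex))      # fresh value on every call
    return urlunsplit(parts._replace(query=urlencode(query)))


if __name__ == "__main__":
    print(add_cache_bust("http://localhost:8000/health"))
    print(add_cache_bust("http://localhost:8000/search?q=a#top"))
```

For the second URL the result has the form `...search?q=a&_cache_bust=<hex>#top`, so the parameter stays in the query string the server actually receives.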

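3. Real Tests: How Does My System Perform?

Test time: 2026-04-24 15:03:44 (optimized version)
System under test: crewai-web-platform (FastAPI backend)
Test environment: local deployment
Optimizations: CountDownLatch-style concurrency control + cache busting + accurate QPS calculation

3.1 Scenario 1: Baseline Test on a Simple Endpoint

```python
# Test configuration
url = "http://localhost:8000/health"
concurrent_users = 10
iterations = 50
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

| Metric | Value | Assessment |
| --- | --- | --- |
| P95 response time | 105.86ms | ✅ Good |
| Success rate | 100.00% | ✅ Perfect |
| QPS | 166.74 | ✅ High |

Conclusion: the simple endpoint performs well, as expected.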

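3.2 Scenario 2: Performance Test on a Complex Endpoint

```python
# Test configuration
url = "http://localhost:8000/health/detailed"
concurrent_users = 20
iterations = 50
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

| Metric | Value | Assessment |
| --- | --- | --- |
| P95 response time | 2137.9ms | ❌ Dangerous |
| Success rate | 100.00% | ✅ Perfect |
| QPS | 9.38 | ❌ Very low |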

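Compared with scenario 1 (note that this run used 20 concurrent users instead of 10):

Response time: 105ms → 2137ms (about 20x)
QPS: 166 → 9.38 (down 94%)

Locating the problem:

What does the detailed health check endpoint do?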

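```python
import os

import psutil
from fastapi import FastAPI

app = FastAPI()

@app.get("/health/detailed")
async def detailed_health_check():
    # 1. Collect CPU usage (blocks for 0.1 s)
    cpu_percent = psutil.cpu_percent(interval=0.1)

    # 2. Collect memory info
    memory = psutil.virtual_memory()

    # 3. Collect disk info
    disk = psutil.disk_usage('/')

    # 4. Collect process info
    process = psutil.Process(os.getpid())
    process_memory = process.memory_info().rss
    # ... (building the response is omitted in this excerpt)
```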

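Bottleneck found: psutil.cpu_percent(interval=0.1) blocks for 0.1 seconds on every call. Because the endpoint is declared with async def, that blocking call runs on the event loop itself, so while one request is sampling CPU, no other request can make progress (assuming a single server worker process). With 20 concurrent users the requests effectively queue up one behind another: 20 × 0.1 s ≈ 2 s, which matches the 2137ms P95. Throughput is likewise capped near 1 / 0.1 s = 10 requests per second, consistent with the measured 9.38 QPS.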
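The effect of a blocking call inside an `async def` handler can be reproduced without FastAPI. In this sketch, plain `asyncio` coroutines play the role of request handlers, and `time.sleep(0.1)` stands in for `psutil.cpu_percent(interval=0.1)` (an assumption: both simply block the calling thread for 0.1 s). Twenty concurrent "requests" that block inside the coroutine run one after another, while the same work offloaded with `asyncio.to_thread` overlaps.

```python
import asyncio
import time

BLOCK_SECONDS = 0.1   # stand-in for psutil.cpu_percent(interval=0.1)
REQUESTS = 20         # same concurrency as scenario 2


async def blocking_handler() -> None:
    time.sleep(BLOCK_SECONDS)                           # blocks the whole event loop


async def offloaded_handler() -> None:
    await asyncio.to_thread(time.sleep, BLOCK_SECONDS)  # runs in a worker thread


async def run_concurrently(handler) -> float:
    """Start REQUESTS handlers at once and return the total wall-clock time."""
    start = time.perf_counter()
    await asyncio.gather(*(handler() for _ in range(REQUESTS)))
    return time.perf_counter() - start


async def main() -> tuple[float, float]:
    blocked = await run_concurrently(blocking_handler)
    offloaded = await run_concurrently(offloaded_handler)
    return blocked, offloaded


if __name__ == "__main__":
    blocked, offloaded = asyncio.run(main())
    print(f"blocking inside async handlers: {blocked:.2f}s")  # roughly 2 s
    print(f"offloaded to worker threads:    {offloaded:.2f}s")
```

In FastAPI specifically, declaring the endpoint with a plain `def` instead of `async def` also moves it off the event loop, because FastAPI runs synchronous path operations in a thread pool.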

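3.3 Scenario 3: High-Concurrency Stress Test

```python
# Test configuration
url = "http://localhost:8000/health"
concurrent_users = 50
iterations = 100
enable_cache_bust = True  # enable cache busting
```

Actual results (optimized version):

| Metric | Value | Assessment |
| --- | --- | --- |
| P95 response time | 364.29ms | ✅ Good |
| Success rate | 100.00% | ✅ Perfect |
| QPS | 212.37 | ✅ Even higher |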

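Unexpected finding: raising concurrency from 10 to 50 made QPS go up rather than down (166 → 212)! My reading: FastAPI's async model keeps the server busier when more requests are in flight. Put more precisely, at 10 concurrent users the server was not yet saturated, so adding in-flight requests raised throughput. The cost shows up in latency: by Little's law (in-flight requests ≈ QPS × mean response time), 50 users at 212 QPS implies a mean of roughly 50 / 212 ≈ 0.24 s per request, and P95 rose from 105ms to 364ms. Past the saturation point, extra concurrency only adds latency. (Scenario 3 also ran 100 iterations instead of 50, so the two runs are not perfectly comparable.)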

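4. Performance Comparison at a Glance

| Scenario | Endpoint | Concurrency | P95 | QPS | Bottleneck |
| --- | --- | --- | --- | --- | --- |
| Scenario 1 | `/health` | 10 | 105ms | 166 | None |
| Scenario 2 | `/health/detailed` | 20 | 2137ms | 9.38 | Blocking psutil call |
| Scenario 3 | `/health` | 50 | 364ms | 212 | None |

Key findings: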

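1. The simple endpoint performs well: P95 < 150ms, QPS > 160
2. The complex endpoint has a critical bottleneck: the blocking psutil call causes a 20x slowdown
3. The system handles concurrency well: at 50 concurrent users, P95 stays under 400ms
4. The optimized QPS figure is more accurate: it only counts the window from the first request sent to the last response, so thread-pool setup doesn't skew it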

5. Five Practical Performance Optimization Tips

Based on these tests, here are five optimization tips (some are already implemented in my system):

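Tip 1: Avoid blocking calls

```python
import psutil

# ❌ Wrong (blocks for 0.1 s)
cpu_percent = psutil.cpu_percent(interval=0.1)

# ✅ Right (non-blocking): compares against the CPU times of the previous call.
# Note: the very first call returns a meaningless 0.0 and should be ignored.
cpu_percent = psutil.cpu_percent(interval=None)
```

My system already implements this optimization.

Tips 2 & 3: The ultimate fix (async + caching)

Combining asynchronous execution with caching, I refactored the monitoring code to remove the blocking problem entirely: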

```python
import time
import asyncio

import psutil
from fastapi import FastAPI

app = FastAPI()

class SystemMonitor:
    def __init__(self):
        self._cache = {"data": None, "expires_at": 0}

    def get_system_info_sync(self):
        """Synchronous collection (slow; runs in a background thread)"""
        return {
            "cpu": psutil.cpu_percent(interval=0.1),  # 💡 a background thread can afford an interval, so the reading is more accurate
            "memory": psutil.virtual_memory().percent,
            "disk": psutil.disk_usage('/').percent,
            "timestamp": time.time()
        }

    async def get_system_info(self, cache_ttl=5.0):
        """Fetch system info asynchronously (with caching)"""
        now = time.time()
        # 1. Check the cache
        if now < self._cache["expires_at"] and self._cache["data"] is not None:
            return self._cache["data"]

        # 2. Run the slow collection in a thread pool so it doesn't block the event loop
        loop = asyncio.get_running_loop()
        new_data = await loop.run_in_executor(None, self.get_system_info_sync)

        # 3. Update the cache
        self._cache["data"] = new_data
        self._cache["expires_at"] = now + cache_ttl
        return new_data

# Using it in FastAPI
monitor = SystemMonitor()

@app.get("/health")
async def health_check():
    try:
        # 3-second timeout so a stuck collection can't hang the endpoint
        data = await asyncio.wait_for(monitor.get_system_info(), timeout=3.0)
        return {"status": "healthy", "metrics": data}
    except asyncio.TimeoutError:
        return {"status": "degraded", "message": "Metrics collection timeout"}
```

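What the refactor achieves:

| Optimization | How it works | Effect |
| --- | --- | --- |
| Doesn't block the event loop | `run_in_executor` hands the work to a background thread | FastAPI keeps serving other requests |
| Less frequent collection | 5-second TTL cache | Avoids calling `psutil` on every request |
| Accurate data | The background thread can afford `interval=0.1` | Avoids the meaningless 0.0 returned by a first non-blocking call |
| Protection against stalls | `asyncio.wait_for` timeout | A stuck collection doesn't hang the endpoint |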
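One gap remains in `SystemMonitor`: when the cache expires, every request that arrives during the refresh starts its own collection. Below is a sketch of a single-flight variant that guards the refresh with an `asyncio.Lock`. The class name `TTLCache` and the `fake_collect` function are hypothetical, and a call counter replaces `psutil` so the behavior can be checked.

```python
import asyncio
import time
from typing import Any, Callable, Optional


class TTLCache:
    """Async TTL cache that lets only one caller refresh at a time (hypothetical)."""

    def __init__(self, collect: Callable[[], Any], ttl: float = 5.0):
        self._collect = collect            # blocking function, run in a worker thread
        self._ttl = ttl
        self._data: Optional[Any] = None
        self._expires_at = 0.0
        self._lock = asyncio.Lock()

    def _fresh(self) -> bool:
        return self._data is not None and time.monotonic() < self._expires_at

    async def get(self) -> Any:
        if self._fresh():
            return self._data              # fast path: no lock needed
        async with self._lock:
            if self._fresh():              # another caller refreshed while we waited
                return self._data
            self._data = await asyncio.to_thread(self._collect)
            self._expires_at = time.monotonic() + self._ttl
            return self._data


async def main() -> tuple[list, int]:
    calls = 0

    def fake_collect() -> dict:
        nonlocal calls
        calls += 1
        time.sleep(0.1)                    # stand-in for psutil.cpu_percent(interval=0.1)
        return {"cpu": 12.5}

    cache = TTLCache(fake_collect, ttl=5.0)
    results = await asyncio.gather(*(cache.get() for _ in range(20)))
    return results, calls


if __name__ == "__main__":
    results, calls = asyncio.run(main())
    print(len(results), calls)             # 20 1
```

The second freshness check inside the lock is what stops the other 19 waiting callers from collecting again once the first refresh has finished.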

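Tip 4: Set timeouts

```python
import asyncio

from fastapi import FastAPI

app = FastAPI()

# Timeout around the metrics-collection step.
# collect_system_metrics() stands for your own async collection coroutine (not shown).
@app.get("/health/detailed")
async def detailed_health_check():
    try:
        # 5-second timeout
        result = await asyncio.wait_for(
            collect_system_metrics(),
            timeout=5.0
        )
        return result
    except asyncio.TimeoutError:
        return {"status": "timeout", "message": "Collection timed out"}
```

This is a recommendation only; my system doesn't implement it yet.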
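One caveat for this tip: `asyncio.wait_for` can only cancel a coroutine at an `await` point. If `collect_system_metrics()` calls `psutil` directly on the event loop, the timeout cannot fire until the blocking call returns. The sketch below shows the difference, with `time.sleep(0.5)` standing in for a stuck collection (an assumption for illustration); `blocking_collect`, `offloaded_collect`, and `timed` are hypothetical names.

```python
import asyncio
import time


async def blocking_collect() -> str:
    time.sleep(0.5)                            # blocks the event loop; no await point
    return "done"


async def offloaded_collect() -> str:
    await asyncio.to_thread(time.sleep, 0.5)   # sleeps in a worker thread
    return "done"


async def timed(collect, timeout: float = 0.1) -> tuple[str, float]:
    """Run collect() under wait_for and report (result, elapsed seconds)."""
    start = time.perf_counter()
    try:
        result = await asyncio.wait_for(collect(), timeout=timeout)
    except asyncio.TimeoutError:
        result = "timeout"
    return result, time.perf_counter() - start


if __name__ == "__main__":
    print(asyncio.run(timed(blocking_collect)))   # ('done', ~0.5): the timeout never fires
    print(asyncio.run(timed(offloaded_collect)))  # ('timeout', ~0.1)
```

This is why the timeout works well together with `run_in_executor` or `asyncio.to_thread`: once the slow work runs in a thread, the event loop stays free to enforce the deadline.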

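Tip 5: Monitor continuously

```python
import time

from fastapi import FastAPI, Request

app = FastAPI()

# Performance-monitoring middleware: report server-side processing time
@app.middleware("http")
async def add_performance_metrics(request: Request, call_next):
    start_time = time.perf_counter()
    response = await call_next(request)
    process_time = (time.perf_counter() - start_time) * 1000  # milliseconds
    response.headers["X-Process-Time"] = str(process_time)
    return response
```

This is a recommendation only; my system doesn't implement it yet.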

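6. Five Performance Testing Pitfalls to Avoid (General Lessons)

The five pitfalls below are general lessons from my 14 years in testing and apply well beyond this article's scenarios:

Pitfall 1: Looking only at the average

❌ Wrong: "Average response time is 100ms, target met!"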

✅ Right:
P95 response time is 2000ms, so about 5% of requests take 2 seconds or longer!

The average can lie; P95 won't.
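A small illustration of how the average can mislead, using made-up latencies (not data from this article): 94 fast requests and 6 very slow ones. The percentile uses the nearest-rank method, one of several common definitions, and `p95_nearest_rank` is a hypothetical helper name.

```python
import math
import statistics


def p95_nearest_rank(samples_ms: list[float]) -> float:
    """95th percentile using the nearest-rank method."""
    if not samples_ms:
        raise ValueError("need at least one sample")
    ordered = sorted(samples_ms)
    rank = math.ceil(0.95 * len(ordered))   # 1-based rank
    return ordered[rank - 1]


# Made-up latencies: 94 requests at 30 ms, 6 requests (6%) at 1200 ms
samples = [30.0] * 94 + [1200.0] * 6

print(f"mean = {statistics.mean(samples):.1f} ms")   # mean = 100.2 ms
print(f"p95  = {p95_nearest_rank(samples):.1f} ms")  # p95  = 1200.0 ms
```

The mean looks like it meets a 100ms target, while P95 exposes the slow tail. With exactly 5% slow requests (95 fast, 5 slow), the nearest-rank P95 would land on the last fast sample and hide the tail, which is one reason to track P99 as well.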

Pitfall 2: Unreasonable concurrency settings

❌ Wrong: test with 10 concurrent users, see no problems, ship it

✅ Right:
Set concurrency to 1.5-2x the expected traffic
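To turn "expected traffic" into a concurrency number, one common approach (not something the tool above does for you) is Little's law: concurrent users ≈ target QPS × mean response time, assuming each virtual user sends its next request immediately with no think time. The function name and the traffic figures below are hypothetical.

```python
import math


def required_concurrency(target_qps: float, mean_latency_s: float,
                         headroom: float = 1.5) -> int:
    """Little's law (N = X * R) scaled by a safety factor, rounded up.

    Assumes closed-loop virtual users with no think time between requests.
    """
    if target_qps <= 0 or mean_latency_s <= 0 or headroom <= 0:
        raise ValueError("all inputs must be positive")
    return math.ceil(target_qps * mean_latency_s * headroom)


# Hypothetical peak: 100 QPS expected, 0.2 s mean latency
print(required_concurrency(100, 0.2, headroom=1.5))  # 30
print(required_concurrency(100, 0.2, headroom=2.0))  # 40
```

The `headroom` factor is where the 1.5-2x margin goes; if real users pause between requests, the number of simulated users needed for the same QPS grows accordingly.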

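Pitfall 3: Ignoring network latency

❌ Wrong: testing only locally, never against the production environment

✅ Right:
Load-test in production or a production-like environment

Pitfall 4: No baseline comparison

❌ Wrong: test once and call it done

✅ Right:
Re-run the load test after every optimization and let the data speak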
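"Let the data speak" is easier when the comparison is automated. Here is a minimal sketch of a regression check against a stored baseline. The metric keys mirror the tool's result dict (`p95_response_time_ms`, `qps`); the function name, the 10% tolerance, and the numbers are assumptions for illustration.

```python
def compare_to_baseline(baseline: dict, current: dict,
                        tolerance: float = 0.10) -> list[str]:
    """Return readable regressions beyond `tolerance` (10% by default)."""
    regressions = []
    # Latency: higher is worse
    b_p95, c_p95 = baseline["p95_response_time_ms"], current["p95_response_time_ms"]
    if c_p95 > b_p95 * (1 + tolerance):
        regressions.append(f"P95 {b_p95}ms -> {c_p95}ms")
    # Throughput: lower is worse
    b_qps, c_qps = baseline["qps"], current["qps"]
    if c_qps < b_qps * (1 - tolerance):
        regressions.append(f"QPS {b_qps} -> {c_qps}")
    return regressions


# Hypothetical runs of the same endpoint before and after a code change
baseline = {"p95_response_time_ms": 100.0, "qps": 200.0}
after_change = {"p95_response_time_ms": 130.0, "qps": 190.0}
print(compare_to_baseline(baseline, after_change))  # ['P95 100.0ms -> 130.0ms']
```

Comparing runs of the same endpoint at the same concurrency keeps the check meaningful; a tolerance band avoids flagging normal run-to-run noise.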

Pitfall 5: Not simulating real scenarios

❌ Wrong: load-testing only one endpoint

✅ Right:
Simulate complete user journeys
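One simple way to go beyond a single endpoint is to plan sessions from a weighted set of journeys. The sketch below only builds the request plan; the journey names, endpoint paths, and weights are all hypothetical, and each planned path could then be sent by a load generator such as the tool above.

```python
import random

# Hypothetical journeys: each is the ordered list of paths one user session hits
JOURNEYS = {
    "browse":   ["/login", "/api/tasks", "/api/tasks/1"],
    "run_test": ["/login", "/api/crews", "/api/crews/1/run", "/api/crews/1/report"],
}
# Assumed share of sessions per journey (ideally taken from real traffic data)
WEIGHTS = {"browse": 0.7, "run_test": 0.3}


def plan_sessions(users: int, seed: int = 42) -> list[tuple[str, list[str]]]:
    """Pick one journey per virtual user according to WEIGHTS."""
    rng = random.Random(seed)              # seeded so plans are reproducible
    names = list(JOURNEYS)
    chosen = rng.choices(names, weights=[WEIGHTS[n] for n in names], k=users)
    return [(name, JOURNEYS[name]) for name in chosen]


if __name__ == "__main__":
    sessions = plan_sessions(1000)
    share = sum(name == "run_test" for name, _ in sessions) / len(sessions)
    print(f"run_test share: {share:.2f}")  # close to the assumed 0.30
```

Each virtual user then walks its journey's paths in order, so the endpoints are exercised in roughly the mix and sequence that real users would produce.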

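7. Summary

Key points:

1. Concurrent testing - ThreadPoolExecutor provides the concurrency
2. Statistical metrics - average, P95, P99, QPS
3. Thread safety - a Lock protects shared data
4. Bottleneck location - comparison tests reveal the problem
5. Continuous optimization - decisions based on data

Takeaway:

Performance testing is not a box-ticking exercise; it is the last line of defense for system stability.

Fourteen years of testing have taught me that bugs aren't eliminated by testing; they are prevented by design.

Good performance, though, is earned under load.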

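📚 Series Index

| No. | Article | Status |
| --- | --- | --- |
| 01 | Getting Started with CrewAI | ✅ |
| 02 | Agent Role Design Methodology | ✅ |
| 03 | Multi-Agent Collaboration Workflow | ✅ |
| 04 | Implementing APITestTool | ✅ |
| 05 | Integrating DatabaseTestTool | ✅ |
| 06 | Hands-On Test Tool Development | ✅ |
| 07 | Implementing PerformanceTestTool | ✅ This article |
| 08 | Integrating Selenium into UITestTool | 📝 Next |
| 09 | Testing and Validating the Tools | 📝 |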

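From the author:

In 14 years of testing, I've gone from manual testing to automation and now to AI-assisted testing.

The tools keep changing, but the core of testing hasn't: ensure quality and take responsibility for users.

If you also work in testing, or you're interested in AI + testing, follow my WeChat public account 「测试员周周」.

Next up: integrating Selenium into UITestTool so AI can drive the browser for UI testing.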