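1. Introduction to performance testing

First, let's take a closer look at performance testing.

What is performance testing?

First we need to understand what performance means. Performance is a system's ability to keep delivering its functionality under high concurrency. The system must not crash or throw errors, its response time must stay within an acceptable range, and it must sustain a certain number of requests per second.
How is performance testing done?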

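Enterprise performance testing is mostly done with tools. Well-known examples include open-source or commercial load-testing tools such as JMeter and LoadRunner. The internet has grown rapidly in recent years: according to the 47th Statistical Report on China's Internet Development, China has 989 million internet users. At the same time, business scenarios have become more complex, spanning more and more domains such as e-commerce, group buying, short video, instant messaging, maps, ride-hailing, and food delivery. Keeping systems stable is getting harder and harder.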

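Take an e-commerce platform as an example. A typical scenario, from a user placing an order to returning it, involves at least 10 steps, and a performance problem in any one of them can severely hurt the user experience.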

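Performance testing has answers for these increasingly complex scenarios and requirements too: Meituan has introduced Quake and ByteDance has introduced Rhino, both full-link load-testing platforms. Unfortunately, neither tool is open source.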

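Today's tool is locust, a load-testing tool written in Python 3. It uses micro-threads (coroutines, implemented with gevent) to issue highly concurrent requests.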
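To illustrate why coroutines make high concurrency cheap, here is a minimal sketch. It uses Python's standard asyncio as a stand-in; locust itself is built on gevent greenlets, so this is an analogy rather than locust's internals. The function names are hypothetical. Each simulated user waits 0.1 s as if for a network response; because all users share one thread and yield control while waiting, 1,000 of them should finish in roughly 0.1 s instead of the 100 s a sequential loop would take.

```python
import asyncio
import time


async def fake_request(user_id: int, latency: float) -> int:
    # Simulate waiting on network I/O; await hands control to other coroutines
    await asyncio.sleep(latency)
    return user_id


async def run_users(n_users: int, latency: float) -> list[int]:
    # Start every simulated user concurrently inside a single thread
    return await asyncio.gather(*(fake_request(i, latency) for i in range(n_users)))


def main(n_users: int = 1000, latency: float = 0.1) -> tuple[int, float]:
    start = time.perf_counter()
    results = asyncio.run(run_users(n_users, latency))
    elapsed = time.perf_counter() - start
    return len(results), elapsed


if __name__ == "__main__":
    count, elapsed = main()
    print(f"{count} requests finished in {elapsed:.2f}s")
```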

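2. How to use locust

Install: pip3 install locust
Verify the installation: import locust in Python (for example python3 -c "import locust"), or run locust -V to print the installed version.
Getting started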

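# api_stress.py
from locust import HttpUser, task, between


class ApiStress(HttpUser):

    # Each simulated user waits a random 1-2 seconds after finishing a task before starting the next
    wait_time = between(1, 2)

    # A task method
    @task
    def hello(self):

        # self.client is an HTTP session built on requests.Session; use it to send HTTP requests
        hello_res = self.client.get("/hello")

        # Assert on the HTTP status code; if it is not 200, an AssertionError is raised
        assert hello_res.status_code == 200, "HTTP status code is not 200"

    # Runs once when each simulated user starts
    def on_start(self):
        print("start to run stress")

    # Runs once when each simulated user stops
    def on_stop(self):
        print("stop to run stress")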

Start the locust test from the command line:

locust -f api_stress.py

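Open the local web UI that locust starts at http://127.0.0.1:8089/ in a browser, then configure the number of users, the spawn rate (users added per second), and the host to test.

Start the test and view the results in the browser.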

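3. The locust charts

The results home page (the Statistics tab) lists every requested endpoint. A few key fields:

Line 90%: the 90th-percentile response time, i.e. 90% of requests completed within this time

Max: the maximum response time

Current RPS: the current number of requests per second (QPS)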
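These three fields can be reproduced from raw timings. The sketch below is a simplified illustration using the nearest-rank percentile; locust rounds response times into buckets before computing percentiles and averages Current RPS over a recent time window, so its figures can differ slightly. The helper names and the sample timings are hypothetical.

```python
import math


def percentile(values: list[float], pct: float) -> float:
    # Nearest-rank percentile: the smallest sample such that pct% of samples are <= it
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


def summarize(response_ms: list[float], window_s: float) -> dict:
    # response_ms: response times (ms) of the requests completed during the last window_s seconds
    return {
        "p90_ms": percentile(response_ms, 90),
        "max_ms": max(response_ms),
        "rps": len(response_ms) / window_s,
    }


if __name__ == "__main__":
    # Hypothetical sample: 10 requests completed within a 2-second window
    sample = [120, 95, 101, 180, 99, 110, 130, 250, 105, 98]
    print(summarize(sample, window_s=2.0))  # p90 180, max 250, rps 5.0
```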

Results on the Charts tab

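Analyzing this chart reveals most load-testing problems. Under a stable QPS, response times should not fluctuate, and a chart like this one is usually exactly what a performance engineer hopes to see: it shows the system is very stable at a load of QPS = 66. That is not conclusive on its own, though. We still need to weigh server-side metrics such as CPU, memory, I/O, and JVM statistics. For example, if memory usage is 10% at QPS = 66 and is still 10% after the load drops to QPS = 33, there may be a problem (for instance, memory that is not released as the load falls).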

4. Anatomy of a locust script

Every locust script that sends HTTP requests has to subclass HttpUser:

class ApiStress(HttpUser):

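This serves two purposes. First, it marks the script as an HTTP load-test script. Second, HttpUser initializes a client based on requests.Session, so sending an HTTP request only takes a call to self.client.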

wait_time = between(1, 2)

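wait_time is the pause each simulated user takes after finishing one task before starting the next one. between(1, 2) picks a random pause between 1 and 2 seconds each time.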
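A minimal sketch of what between(1, 2) does and of how the wait time caps throughput. The between() below is an illustrative re-implementation, not locust's source, and estimate_rps() is a hypothetical back-of-the-envelope helper that assumes each task sends exactly one request.

```python
import random


def between(min_wait: float, max_wait: float):
    # Return a function that draws a uniformly random wait in [min_wait, max_wait] seconds
    # (mirrors the idea behind locust's between(); illustrative only)
    def wait_time() -> float:
        return random.uniform(min_wait, max_wait)
    return wait_time


def estimate_rps(users: int, avg_wait_s: float, avg_response_s: float) -> float:
    # Each user loops: run one task (assumed to be one request), then wait.
    # One loop therefore takes about avg_response_s + avg_wait_s seconds.
    return users / (avg_response_s + avg_wait_s)


if __name__ == "__main__":
    wait = between(1, 2)
    print([round(wait(), 2) for _ in range(5)])
    print(round(estimate_rps(100, 1.5, 0.05), 1))
```

With 100 users, a 1.5 s average wait, and a 50 ms response time, each user completes a loop about every 1.55 s, so the expected load is roughly 100 / 1.55 ≈ 64.5 requests per second. To generate more load, add users or shorten wait_time.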

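# A task method with a weight of 10
@task(10)
def hello(self):

    # self.client is an HTTP session built on requests.Session; use it to send HTTP requests
    hello_res = self.client.get("/hello")

    # Assert on the HTTP status code; if it is not 200, an AssertionError is raised
    assert hello_res.status_code == 200, "HTTP status code is not 200"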

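@task marks the method as a task; here the task's weight is 10. When a user has several tasks, locust picks one at random in proportion to the weights, so a task with a higher weight runs more often.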
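To get a feel for what a weight means, the sketch below simulates picking tasks with probability proportional to their weights. The function name and the second task "world" with weight 1 are hypothetical; locust makes a similar weighted random choice internally, but this is not its code.

```python
import random
from collections import Counter


def simulate_task_picks(weights: dict[str, int], picks: int, seed: int = 42) -> Counter:
    # Pick a task name `picks` times, each with probability weight / total_weight
    rng = random.Random(seed)
    names = list(weights)
    chosen = rng.choices(names, weights=[weights[n] for n in names], k=picks)
    return Counter(chosen)


if __name__ == "__main__":
    # hello has weight 10; a hypothetical "world" task has weight 1
    counts = simulate_task_picks({"hello": 10, "world": 1}, picks=11_000)
    print(counts)  # hello should be picked roughly 10 times as often as world
```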

self.client.get sends an HTTP GET request; self.client.post sends a POST request in the same way.

    # Runs once when each simulated user starts
    def on_start(self):
        print("start to run stress")

    # Runs once when each simulated user stops
    def on_stop(self):
        print("stop to run stress")

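on_start runs once when each simulated user starts, so if the API requires authentication, the login step can go here. on_stop runs when the user stops, so any data cleanup needed after the load test belongs here.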

5. How to use locust for full-link load testing

The locust script:

from locust import HttpUser, between, task


class GoodOrderFull(HttpUser):

    def on_start(self):
        pass

    def on_stop(self):
        pass

    @task
    def good_order_full(self):
        # 1. Browse products and take the first product's id
        good_list_api_res = self.client.get("/good_list")
        assert good_list_api_res.status_code == 200
        good_id = good_list_api_res.json()["data"]["list"][0]["id"]
        # 2. Create an order for that product
        order_res = self.client.post("/create_order", json={"id": good_id})
        assert order_res.status_code == 200
        order_no = order_res.json().get("data")
        # 3. Query the order list and check that the new order is returned
        order_list_res = self.client.get("/order_list", params={"orderNo": order_no})
        assert order_list_res.status_code == 200
        assert order_list_res.json()["data"]["list"][0]["orderNo"] == order_no

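Order flow: browse products, create an order, then view the order.
Check the locust results

Check the server status

Use Arthas to check the memory usage of the JVM process

Use the top command to check the server's CPU and memory usage; CPU usage has climbed noticeably.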

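Performance analysis: the load test above shows that as the test runs longer and the data volume grows, the API's response time keeps increasing. The test does not pass, so we need to find the bottleneck.

Start the analysis with Arthas, a Java diagnostic tool, by launching it on the server: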

java -jar arthas.jar

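Use the trace command to see how time is spent along the call path, then check the result:

trace com.wujing.sds.controller.GoodController orderList

Most of the time is spent inside the orderList method, so trace one level deeper into the service:

trace com.wujing.sds.service.OrderService orderList

The result shows that the time goes mainly to the SQL query, whose conditions filter on orderNo and goodId.

Checking the indexes shows that the table has no index on these columns, so try adding one to see whether that fixes the problem.

Add the index: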

alter table `kl_order` add key `order_no_good_id_key`(order_no, good_id);

Check the execution plan with EXPLAIN to confirm that the query uses the index:

explain select * from kl_order where kl_order.order_no='KL1406198246645473281' and good_id=1;
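To reproduce the effect of the index locally, here is a minimal sketch using Python's built-in sqlite3 instead of MySQL. SQLite's EXPLAIN QUERY PLAN output differs from MySQL's EXPLAIN, and SQLite creates indexes with CREATE INDEX rather than ALTER TABLE ... ADD KEY, but the idea is the same. The simplified table (only id, order_no, good_id) and the generated rows are assumptions for the demo. Before the index the plan should be a full table SCAN; after creating the composite index on (order_no, good_id) it should become a SEARCH using that index.

```python
import sqlite3


def query_plan(conn: sqlite3.Connection, sql: str, params: tuple) -> str:
    # Each EXPLAIN QUERY PLAN row has 4 columns; the 4th is the human-readable detail
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return "; ".join(row[3] for row in rows)


def main() -> tuple[str, str]:
    conn = sqlite3.connect(":memory:")
    # Simplified stand-in for kl_order: only the columns this query touches
    conn.execute(
        "CREATE TABLE kl_order (id INTEGER PRIMARY KEY, order_no TEXT, good_id INTEGER)"
    )
    conn.executemany(
        "INSERT INTO kl_order (order_no, good_id) VALUES (?, ?)",
        ((f"KL{i:019d}", i % 100) for i in range(10_000)),
    )
    sql = "SELECT * FROM kl_order WHERE order_no = ? AND good_id = ?"
    params = (f"KL{42:019d}", 42)

    before = query_plan(conn, sql, params)  # expected: full table scan
    # SQLite counterpart of MySQL's ALTER TABLE ... ADD KEY
    conn.execute("CREATE INDEX order_no_good_id_key ON kl_order (order_no, good_id)")
    after = query_plan(conn, sql, params)  # expected: search using the new index
    conn.close()
    return before, after


if __name__ == "__main__":
    before, after = main()
    print("before:", before)
    print("after: ", after)
```

On MySQL itself, the key column of the EXPLAIN output should show order_no_good_id_key once the query uses the index.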

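Run the load test again to see whether the problem is fixed.

Check the call-path timing again with Arthas.

This confirms that the performance problem is fixed: the root cause was the missing database index.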

Full-link load testing series: locust usage and performance analysis in practice
