A Summary of Computer Performance Evaluation

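These are my notes from preparing for a final exam. They cover some of the theoretical foundations of computer performance evaluation. After reading them, you should have a deeper understanding of computer systems.

Let's start with two questions.

  1. Why do we need to evaluate performance?

    Because we want the best performance for the least money.

  2. Why do we need to characterize performance?

    Because we need to know how many resources a given function requires. For example, how much bandwidth is needed to watch a live stream smoothly?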

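Terminology

  • System: any collection of software, hardware, and so on
  • Metric: a criterion used to evaluate the performance of a system
  • Workload: how users use the system. For example, a user compiling a program is a workload, and the compile time is a metric.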

Important metrics

  1. Latency
  2. Throughput
  3. Utilization (how much of a resource is in use)
  4. Reliability (probability of errors)
  5. Availability (the fraction of time the system is up; related to the mean time to failure)
  6. Cost/performance ratio
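
As a rough illustration of how some of these metrics can be computed, here is a minimal Python sketch. The function names `measure` and `availability` are hypothetical, the percentile uses a simple nearest-rank approximation, and the availability formula MTTF / (MTTF + MTTR) is the common textbook definition rather than something stated in these notes (MTTR, the mean time to repair, is an added assumption).

```python
import statistics
import time


def measure(workload, repetitions=100):
    """Run `workload` repeatedly and summarize a few basic metrics."""
    latencies = []
    start = time.perf_counter()
    for _ in range(repetitions):
        t0 = time.perf_counter()
        workload()
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start

    ordered = sorted(latencies)
    return {
        # Latency: time taken by one request.
        "mean_latency_s": statistics.mean(latencies),
        # Simple nearest-rank approximation of the 95th percentile.
        "p95_latency_s": ordered[int(0.95 * (len(ordered) - 1))],
        # Throughput: completed requests per unit of time.
        "throughput_per_s": repetitions / elapsed,
        # Utilization: fraction of wall-clock time spent inside the workload.
        "utilization": sum(latencies) / elapsed,
    }


def availability(mttf_hours, mttr_hours):
    """Fraction of time the system is up (assumed definition)."""
    return mttf_hours / (mttf_hours + mttr_hours)


if __name__ == "__main__":
    # Hypothetical workload: summing a range of integers.
    print(measure(lambda: sum(range(10_000))))
    print(availability(mttf_hours=999, mttr_hours=1))
```

Because this loop issues requests back to back, utilization will be close to 1; with an open workload that has idle gaps between requests it would be lower.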

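System capacity

Capacity can itself be treated as a metric. As the load increases, throughput changes, and the shape of the throughput curve describes the system's capacity.

  • Usable capacity
    • The highest throughput achieved in practice, where the curve stops rising
  • Knee capacity
    • The point at which the slope of the throughput curve flattens
  • Nominal capacity
    • The theoretical capacity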
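
To make the knee and the saturation level concrete, here is a small Python sketch that locates them on a throughput curve. The data points are invented purely for illustration, and the rule used for the knee (the first point after which the slope drops below half of the initial slope) is only one possible heuristic, not a standard definition.

```python
def knee_index(loads, throughputs, slope_fraction=0.5):
    """Index of the first point after which the curve's slope falls below
    `slope_fraction` times its initial slope (a heuristic for the knee)."""
    initial_slope = (throughputs[1] - throughputs[0]) / (loads[1] - loads[0])
    for i in range(1, len(loads) - 1):
        slope = (throughputs[i + 1] - throughputs[i]) / (loads[i + 1] - loads[i])
        if slope < slope_fraction * initial_slope:
            return i
    return len(loads) - 1


# Hypothetical measurements: offered load vs. achieved throughput (requests/s).
loads = [10, 20, 30, 40, 50, 60]
throughputs = [10, 20, 29, 33, 35, 35]

k = knee_index(loads, throughputs)
print("knee at load", loads[k], "with throughput", throughputs[k])
print("throughput stops rising at", max(throughputs))
```

With these made-up numbers the knee is found at a load of 30 (throughput 29), and throughput saturates at 35.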

Workload levels

  1. At full capacity (best case)
  2. Beyond capacity (worst case)
  3. At the load level observed in the real world

Exercise

What metric and workload would you choose to compare:

  • Two computers with different CPUs: x86 and ARM
  • Two versions of the same operating system: Ubuntu 16.04 vs. Ubuntu 22.04
  • Two hardware components: Two hard drives
  • Two languages: C vs. Python

Possible answers:

  a. Metric: supported compilers, execution time. Workload: run different kinds of programs.
  b. Metric: response time to open some apps, network performance. Workload: open the same set of apps on both versions.
  c. Metric: read/write speed. Workload: write a program that generates representative I/O requests.
  d. Metric: code size, execution time, and the time needed to write a program with the same functionality. Workload: a representative set of programs implemented in both C and Python.

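Evaluation methods

  1. Measurement (on the real system)

    Pros: the data is realistic.

    Cons: expensive, and very hard to apply to a new system that is still being designed.

  2. Simulation (building a model of the system)

    Pros: relatively easy to implement.

    Cons: for large systems, it may take a long time to produce trustworthy results.

  3. Analytical modeling (mathematical calculation)

    Pros: simple (pen and paper are enough).

    Cons: hard to account for every relevant factor.

Never trust the result of any one of these three methods until it has been validated by at least one of the other two.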
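
The validation rule above can be sketched in Python by cross-checking an analytical result against a simulation. The example assumes a single-server queue with exponentially distributed interarrival and service times (an M/M/1 queue), which these notes do not mention; it is chosen only because its mean response time has the well-known closed form 1 / (μ − λ). The function names and parameter values are hypothetical.

```python
import random


def analytical_mean_response(arrival_rate, service_rate):
    """Mean time in system for an M/M/1 queue: 1 / (mu - lambda)."""
    return 1.0 / (service_rate - arrival_rate)


def simulated_mean_response(arrival_rate, service_rate, n_jobs=200_000, seed=1):
    """Simulate a single FIFO server and return the mean time in system."""
    rng = random.Random(seed)
    arrival = 0.0         # arrival time of the current job
    server_free_at = 0.0  # time at which the server finishes its current job
    total = 0.0
    for _ in range(n_jobs):
        arrival += rng.expovariate(arrival_rate)
        start = max(arrival, server_free_at)  # wait if the server is busy
        server_free_at = start + rng.expovariate(service_rate)
        total += server_free_at - arrival     # waiting time plus service time
    return total / n_jobs


if __name__ == "__main__":
    lam, mu = 0.5, 1.0  # hypothetical rates (jobs per second)
    print("analytical:", analytical_mean_response(lam, mu))
    print("simulated: ", simulated_mean_response(lam, mu))
```

If the two numbers disagree by more than the expected statistical noise, either the model or the simulator is wrong, which is exactly what the cross-check is meant to reveal.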

Exercise

What methodology would you choose:

  • To select a personal computer for yourself?
  • To select 1000 servers to host a service of your company?

Answer

  1. To compare two personal computers, measurement is a good approach. We would compare the performance of each computer under workloads that are representative for the user. If the user's main activity is programming, for example, we can look at the compile time and execution time of some programs.

  2. Making measurements on 1000 servers would be difficult, and if the servers are not available to us it is impossible. Analytical modeling and simulation are better approaches here. Analytical modeling is cheaper and faster; simulation can take longer to run.

Steps of a performance evaluation

  1. State goals and define the system
  2. List services and outcomes
  3. Select metrics
  4. List parameters
  5. Select factors to study
  6. Select evaluation technique
  7. Select workload
  8. Design experiments
  9. Analyze and interpret data
  10. Present results

Exercise

Choose a system for performance study. Briefly describe the system and list:

  • Services and outcomes

  • Performance metrics

  • System parameters

  • Workload parameters

  • Factors and their ranges

  • Evaluation technique

  • Workload generation

Justify your choices.

Answer:

Of course, this question has many correct answers. Here we study a web server that returns the requested pages.

  • Service: respond to client requests. Outcomes: correct response, wrong response (servers badly synced), and no response (server overloaded).
  • Performance metrics: response time, number of requests served per second, error rate, availability.
  • System parameters: everything that can influence performance but stays fixed, such as the CPU, the OS version, and the software versions used.
  • Workload parameters: the parameters we can vary during the experiments, such as network quality, CPU cores used, request size, and number of users. We cannot try every combination.
  • Factors: the workload parameters that we will actually vary.
  • Evaluation technique: measurement is a good choice if the system is available; otherwise, simulation and analytical modeling can be used. Using two different approaches is always good.
  • Workload generation: write a client that sends requests to the server, or use an existing one if available.
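
To see why not every combination can be tried, here is a small Python sketch that enumerates a full factorial experiment plan for two of the factors named above. The levels and the number of repetitions are hypothetical values picked for illustration, and `full_factorial` is not a library function.

```python
import itertools

# Hypothetical factors and levels for the web server study.
factors = {
    "concurrent_users": [1, 10, 100],
    "request_size_kb": [1, 64],
}
repetitions = 3  # repeat each configuration to estimate variability


def full_factorial(factors, repetitions):
    """Yield one dict per experiment run, covering every combination of levels."""
    names = list(factors)
    for levels in itertools.product(*(factors[name] for name in names)):
        for run in range(repetitions):
            yield dict(zip(names, levels), run=run)


plan = list(full_factorial(factors, repetitions))
print(len(plan), "runs in total")
for experiment in plan[:4]:
    print(experiment)
```

Here 3 × 2 levels with 3 repetitions already gives 18 runs; every additional factor multiplies that count, which is why only a few workload parameters are promoted to factors.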

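3R

  1. Repeatability: the same team, in the same environment, using the same software, obtains the same result.

  2. Reproducibility: a different team, in a different environment, using the same software, obtains the same result.

  3. Replicability: a different team, without using the same software, obtains a result that agrees to within a stated precision.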
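
In practice, "the same result" for a performance measurement usually means "the same up to some precision". A minimal Python sketch of such a check follows; the function name `within_precision`, the 5% tolerance, and the sample values are all hypothetical.

```python
import statistics


def within_precision(results, tolerance=0.05):
    """True if every result is within `tolerance` (relative) of the mean."""
    mean = statistics.mean(results)
    return all(abs(r - mean) <= tolerance * abs(mean) for r in results)


# Hypothetical response times (ms) reported by independent runs or teams.
print(within_precision([10.1, 9.9, 10.0]))  # all within 5% of the mean
print(within_precision([10.0, 12.0]))       # differ by more than 5%
```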