User-centric performance metrics

Original: web.dev/user-centri…

We've all heard how important performance is. But when we talk about performance—and about making websites "fast"—what specifically do we mean?

The truth is performance is relative:

  • A site might be fast for one user (on a fast network with a powerful device) but slow for another user (on a slow network with a low-end device).
  • Two sites may finish loading in the exact same amount of time, yet one may seem to load faster (if it loads content progressively rather than waiting until the end to display anything).
  • A site might appear to load quickly but then respond slowly (or not at all) to user interaction.

So when talking about performance, it's important to be precise and to refer to performance in terms of objective criteria that can be quantitatively measured. These criteria are known as metrics.

But just because a metric is based on objective criteria and can be quantitatively measured, it doesn't necessarily mean those measurements are useful.

Defining metrics

Historically, web performance has been measured with the load event. However, even though load is a well-defined moment in a page's lifecycle, that moment doesn't necessarily correspond with anything the user cares about.

For example, a server could respond with a minimal page that "loads" immediately but then defers fetching content and displaying anything on the page until several seconds after the load event fires. While such a page might technically have a fast load time, that time would not correspond to how a user actually experiences the page loading.
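
The gap described above can be made concrete with a small sketch. The timeline values and the names `PageTimeline` and `waitAfterLoad` are hypothetical, chosen only to illustrate how a fast load event can coexist with a slow experience.

```typescript
// Timestamps in milliseconds since navigation start (hypothetical sample data).
interface PageTimeline {
  loadEventEnd: number;       // when the load event finished
  mainContentVisible: number; // when the user could actually see the content
}

// How long the user kept waiting after the load event had already fired.
function waitAfterLoad(t: PageTimeline): number {
  return Math.max(0, t.mainContentVisible - t.loadEventEnd);
}

// A minimal shell that "loads" at 300 ms but shows its content at 4300 ms.
const shellPage: PageTimeline = { loadEventEnd: 300, mainContentVisible: 4300 };

console.log(`load event: ${shellPage.loadEventEnd} ms`);
console.log(`extra wait after load: ${waitAfterLoad(shellPage)} ms`); // 4000 ms
```

Judged by the load event alone, this page takes 300 ms; from the user's point of view it takes more than four seconds.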

Over the past few years, members of the Chrome team—in collaboration with the W3C Web Performance Working Group—have been working to standardize a set of new APIs and metrics that more accurately measure how users experience the performance of a web page.

To help ensure the metrics are relevant to users, we frame them around a few key questions:

  • Is it happening? -- Did the navigation start successfully? Has the server responded?
  • Is it useful? -- Has enough content rendered that users can engage with it?
  • Is it usable? -- Can users interact with the page, or is it busy?
  • Is it delightful? -- Are the interactions smooth and natural, free of lag and jank?

How metrics are measured

Performance metrics are generally measured in one of two ways:

  • In the lab: using tools to simulate a page load in a consistent, controlled environment
  • In the field: on real users actually loading and interacting with the page

Neither of these options is necessarily better or worse than the other—in fact you generally want to use both to ensure good performance.
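
One way to use both sources together is to compare a single lab measurement against a summary of field samples. The sketch below is only an illustration: the sample values are invented, and summarizing field data by its 75th percentile is an assumption of this example, not something the text above prescribes.

```typescript
// Nearest-rank percentile over a list of samples, with p in (0, 100].
function percentile(samples: number[], p: number): number {
  if (samples.length === 0) throw new Error("no samples");
  const sorted = [...samples].sort((a, b) => a - b);
  const rank = Math.ceil((p / 100) * sorted.length);
  return sorted[Math.max(0, rank - 1)];
}

// Hypothetical data: one controlled lab run and eight real-user page loads.
const labFcpMs = 1200;
const fieldFcpMs = [900, 1100, 1300, 1800, 2500, 4200, 950, 3100];

const fieldMedian = percentile(fieldFcpMs, 50);
const fieldP75 = percentile(fieldFcpMs, 75);

console.log(`lab: ${labFcpMs} ms, field median: ${fieldMedian} ms, field p75: ${fieldP75} ms`);
```

In this made-up data set the lab number looks comfortable while a quarter of real page loads are more than twice as slow, which is the kind of difference a lab run on its own would not reveal.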

In the lab

Testing performance in the lab is essential when developing new features. Before features are released in production, it's impossible to measure their performance characteristics on real users, so testing them in the lab before the feature is released is the best way to prevent performance regressions.

In the field

On the other hand, while testing in the lab is a reasonable proxy for performance, it isn't necessarily reflective of how all users experience your site in the wild.

The performance of a site can vary dramatically based on a user's device capabilities and their network conditions. It can also vary based on whether (or how) a user is interacting with the page.

Moreover, page loads may not be deterministic. For example, sites that load personalized content or ads may experience vastly different performance characteristics from user to user. A lab test will not capture those differences.

The only way to truly know how your site performs for your users is to actually measure its performance as those users are loading and interacting with it. This type of measurement is commonly referred to as Real User Monitoring—or RUM for short.
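
A minimal sketch of the reporting side of RUM might look like the following. The `RumSample` shape, the function names, and the `/rum` endpoint are hypothetical; `navigator.sendBeacon` is a standard browser API, and the code checks for it at runtime so the same file also runs outside a browser.

```typescript
// One measurement taken on a real user's page load (hypothetical shape).
interface RumSample {
  metric: string;  // for example "FCP" or "LCP"
  valueMs: number; // measured value in milliseconds
  page: string;    // path of the page that was measured
}

// Serializes a batch of samples into a JSON payload.
function buildRumPayload(samples: RumSample[]): string {
  return JSON.stringify({ sentAt: Date.now(), samples });
}

// Queues the payload with navigator.sendBeacon when the runtime provides it.
// Returns false when no beacon API is available or the browser rejects the request.
function reportRum(endpoint: string, samples: RumSample[]): boolean {
  const nav = (globalThis as any).navigator;
  if (!nav || typeof nav.sendBeacon !== "function") return false;
  return nav.sendBeacon(endpoint, buildRumPayload(samples));
}

// Hypothetical samples collected during one visit.
const visitSamples: RumSample[] = [
  { metric: "FCP", valueMs: 1340, page: "/products" },
  { metric: "LCP", valueMs: 2875, page: "/products" },
];

console.log(buildRumPayload(visitSamples));
```

In a browser, `reportRum("/rum", visitSamples)` would typically be called when the page is being hidden, so that the data is sent even if the user navigates away.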

Types of metrics

There are several other types of metrics that are relevant to how users perceive performance.

  • Perceived load speed: how quickly a page can load and render all of its visual elements to the screen.
  • Load responsiveness: how quickly a page can load and execute any required JavaScript code in order for components to respond quickly to user interaction.
  • Runtime responsiveness: after page load, how quickly can the page respond to user interaction.
  • Visual stability: do elements on the page shift in ways that users don't expect and potentially interfere with their interactions?
  • Smoothness: do transitions and animations render at a consistent frame rate and flow fluidly from one state to the next?

Given all the above types of performance metrics, it's hopefully clear that no single metric is sufficient to capture all the performance characteristics of a page.

Important metrics to measure

  • First contentful paint (FCP): measures the time from when the page starts loading to when any part of the page's content is rendered on the screen. (lab, field)
  • Largest contentful paint (LCP): measures the time from when the page starts loading to when the largest text block or image element is rendered on the screen. (lab, field)
  • First input delay (FID): measures the time from when a user first interacts with your site (i.e. when they click a link, tap a button, or use a custom, JavaScript-powered control) to the time when the browser is actually able to respond to that interaction. (field)
  • Time to Interactive (TTI): measures the time from when the page starts loading to when it's visually rendered, its initial scripts (if any) have loaded, and it's capable of reliably responding to user input quickly. (lab)
  • Total blocking time (TBT): measures the total amount of time between FCP and TTI where the main thread was blocked for long enough to prevent input responsiveness. (lab)
  • Cumulative layout shift (CLS): measures the cumulative score of all unexpected layout shifts that occur between when the page starts loading and when its lifecycle state changes to hidden. (lab, field)
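
Three of these definitions reduce to simple arithmetic once the underlying entries are available. The sketch below works on plain objects whose field names (`startTime`, `duration`, `processingStart`, `value`, `hadRecentInput`) mirror those on the browser's performance entries; the sample data is invented. The 50 ms cutoff comes from the published TBT definition rather than from the list above, and counting only tasks that start between FCP and TTI is a simplification made for this example.

```typescript
// Minimal shapes mirroring fields the browser exposes on performance entries.
interface FirstInput { startTime: number; processingStart: number; }
interface LongTask { startTime: number; duration: number; }
interface LayoutShift { value: number; hadRecentInput: boolean; }

// Portion of a task beyond this many milliseconds counts as blocking.
const BLOCKING_THRESHOLD_MS = 50;

// FID: time between the first interaction and the moment the browser starts handling it.
function firstInputDelay(e: FirstInput): number {
  return e.processingStart - e.startTime;
}

// TBT (simplified): for tasks starting between FCP and TTI, sum the time beyond 50 ms.
function totalBlockingTime(tasks: LongTask[], fcp: number, tti: number): number {
  return tasks
    .filter((t) => t.startTime >= fcp && t.startTime < tti)
    .reduce((sum, t) => sum + Math.max(0, t.duration - BLOCKING_THRESHOLD_MS), 0);
}

// CLS as described above: sum of shift scores, skipping shifts caused by recent user input.
function cumulativeLayoutShift(shifts: LayoutShift[]): number {
  return shifts
    .filter((s) => !s.hadRecentInput)
    .reduce((sum, s) => sum + s.value, 0);
}

// Hypothetical entries for one page load (times in milliseconds).
const fid = firstInputDelay({ startTime: 3000, processingStart: 3085 });
const tbt = totalBlockingTime(
  [
    { startTime: 1000, duration: 120 }, // 70 ms blocking
    { startTime: 1500, duration: 40 },  // under the threshold
    { startTime: 2000, duration: 250 }, // 200 ms blocking
    { startTime: 6000, duration: 300 }, // starts after TTI, ignored
  ],
  800,  // FCP
  5000, // TTI
);
const cls = cumulativeLayoutShift([
  { value: 0.05, hadRecentInput: false },
  { value: 0.2, hadRecentInput: true }, // expected shift, excluded
  { value: 0.1, hadRecentInput: false },
]);

console.log({ fid, tbt, cls: cls.toFixed(2) });
```

In a browser, entries of these kinds can be collected with a `PerformanceObserver`; keeping the calculations as pure functions makes them easy to check in the lab before trusting them in the field.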

While this list includes metrics measuring many of the various aspects of performance relevant to users, it does not include everything (e.g. runtime responsiveness and smoothness are not currently covered).

In some cases, new metrics will be introduced to cover missing areas, but in other cases the best metrics are ones specifically tailored to your site.

Custom metrics

The performance metrics listed above are good for getting a general understanding of the performance characteristics of most sites on the web. They're also good for having a common set of metrics for sites to compare their performance against their competitors.

However, there may be times when a specific site is unique in some way that requires additional metrics to capture the full performance picture. For example, the LCP metric is intended to measure when a page's main content has finished loading, but there could be cases where the largest element is not part of the page's main content and thus LCP may not be relevant.

To address such cases, the Web Performance Working Group has also standardized lower-level APIs that can be useful for implementing your own custom metrics.

Refer to the guide on Custom Metrics to learn how to use these APIs to measure performance characteristics specific to your site.
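
As one illustration, the User Timing API (`performance.mark` and `performance.measure`) is among the lower-level APIs standardized by that working group. The sketch below uses it to time a site-specific step; the helper `measureStep` and the "hero-render" name are hypothetical, and the loop is only a stand-in for real work.

```typescript
// Times a named step with User Timing marks and returns its duration in milliseconds.
function measureStep(name: string, work: () => void): number {
  const startMark = `${name}:start`;
  const endMark = `${name}:end`;
  performance.mark(startMark);
  work();
  performance.mark(endMark);
  performance.measure(name, startMark, endMark);
  // Read back the most recent measure recorded under this name.
  const entries = performance.getEntriesByName(name, "measure");
  return entries[entries.length - 1].duration;
}

// Stand-in for the site-specific work being timed (hypothetical).
let checksum = 0;
const heroRenderMs = measureStep("hero-render", () => {
  for (let i = 0; i < 100_000; i++) checksum += i;
});

console.log(`hero-render took ${heroRenderMs.toFixed(2)} ms (checksum ${checksum})`);
```

Because the measurement is stored as a regular performance entry, it can be read later alongside the built-in entries and reported in the same way as any other metric.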
