Prometheus: Getting Started and Caveats

What is Prometheus?

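Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. Since its inception in 2012, many companies and organizations have adopted Prometheus, and the project has a very active developer and user community. Prometheus collects and stores its metrics as time series data: each metric value is stored with the timestamp at which it was recorded, alongside optional key-value pairs called labels.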

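Prometheus has the following features:

A multi-dimensional data model: time series are identified by a metric name and key-value pairs
A built-in time series database (TSDB)
PromQL, a query language that supports very complex queries and analysis, which is valuable for both graphing and alerting
Time series collection via a pull model over HTTP
Pushgateway support for collecting data from short-lived jobs
Targets discovered through service discovery or static configuration
Integration with Grafana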

The diagram below illustrates the Prometheus architecture and some of its ecosystem components:

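Prometheus metric types

Prometheus divides collected data into four types: Counter, Gauge, Histogram, and Summary. This is only a logical classification: the Prometheus server does not use the type information of the collected data and treats it all as untyped time series.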

Counter

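Counter is the counter type, suited to monotonically increasing values such as the total number of requests, completed tasks, or errors. A counter only ever goes up, with one exception: when the process that exposes it restarts, it starts again from 0. Query functions such as rate() and increase() detect these resets and compensate for them, so a restart does not distort the result.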
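A counter exposed by a restarted process starts again from 0, and query functions such as rate() and increase() compensate for that. The following is a minimal Python sketch of the reset-handling idea only; the function name and sample values are made up for illustration, and the real PromQL functions additionally extrapolate to the edges of the query window, which is omitted here.

```python
def counter_increase(samples):
    """Total increase over a list of raw counter samples.

    Any drop between two consecutive samples is treated as a reset to 0,
    so the value after the drop counts in full as new increase.
    """
    total = 0.0
    for prev, cur in zip(samples, samples[1:]):
        if cur >= prev:
            total += cur - prev
        else:
            # The counter went backwards: the process restarted from 0.
            total += cur
    return total


# Hypothetical scrape results; the process restarts between 150 and 5.
samples = [100, 120, 150, 5, 30]
print(counter_increase(samples))  # 20 + 30 + 5 + 25 = 80.0
```

Without the reset branch, the drop from 150 to 5 would show up as a large negative step and the computed increase would be wrong.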

Gauge

A Gauge represents a value that can go up or down, such as CPU and memory usage or I/O volume.

Histogram

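A Histogram is a cumulative histogram, typically used to describe the long-tail behavior of a monitored value. For example, suppose a Histogram is used to analyze API response times, with the bucket upper bounds [30ms, 100ms, 300ms, 1s, 3s, 5s, 10s] dividing response times into 8 buckets (the eighth being the implicit +Inf bucket). Because the buckets are cumulative, each bucket counts every observation less than or equal to its upper bound. An observed response time of 200ms therefore adds 1 to the buckets with upper bounds 300ms, 1s, 3s, 5s, 10s and +Inf, and leaves the 30ms and 100ms buckets unchanged. Plotting the upper bounds on the horizontal axis and each bucket's count on the vertical axis gives the cumulative histogram of API response times.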
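The cumulative bucket behavior can be sketched in a few lines of Python. This is an illustration under assumptions, not the code of any Prometheus client library: the class name is invented, and the bounds are the ones from the example above expressed in seconds, with +Inf added as the last bucket.

```python
import math

# Upper bounds from the example above, in seconds, plus the implicit +Inf bucket.
BOUNDS_SECONDS = [0.03, 0.1, 0.3, 1, 3, 5, 10, math.inf]


class CumulativeHistogram:
    """Toy histogram whose buckets count observations <= their upper bound."""

    def __init__(self, bounds):
        self.bounds = list(bounds)
        self.bucket_counts = [0] * len(self.bounds)
        self.sum = 0.0
        self.count = 0

    def observe(self, value):
        # Every bucket whose upper bound is >= value is incremented.
        for i, upper in enumerate(self.bounds):
            if value <= upper:
                self.bucket_counts[i] += 1
        self.sum += value
        self.count += 1


h = CumulativeHistogram(BOUNDS_SECONDS)
h.observe(0.2)  # a 200ms response
print(h.bucket_counts)  # [0, 0, 1, 1, 1, 1, 1, 1]
```

The 30ms and 100ms buckets stay at 0, while every bucket from 300ms upward, including +Inf, is incremented. The +Inf bucket always equals the total number of observations.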

Summary

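A Summary is similar to a Histogram, but it records quantiles of the monitored value. What is a quantile? For example, suppose an HTTP endpoint is called 100 times, yielding 100 response times. Sort those 100 values in ascending order; the 0.9 quantile (the 90% position) is then the 90th value. Quantiles can be approximated from a Histogram, but the result is only an estimate, whereas a Summary computes them on the client side and is more accurate for that single instance. However, a Summary costs more to compute on the client, and its precomputed quantiles cannot be averaged or otherwise aggregated with those of other series, so it is usually used on its own.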
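To make the difference concrete, the sketch below computes an exact quantile from raw values and compares it with an estimate derived from cumulative bucket counts by linear interpolation inside the matching bucket. The interpolation follows the general idea behind histogram-based quantile estimation; the function names, latency values, and bucket bounds are assumptions chosen for illustration.

```python
import math


def exact_quantile(values, q):
    """Nearest-rank quantile: the ceil(q * n)-th smallest value."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def cumulative_buckets(values, bounds):
    """(upper_bound, count of values <= upper_bound) for each bound."""
    return [(upper, sum(1 for v in values if v <= upper)) for upper in bounds]


def estimate_quantile(buckets, q):
    """Estimate a quantile from cumulative buckets by linear interpolation."""
    rank = q * buckets[-1][1]
    lower, prev_count = 0.0, 0
    for upper, count in buckets:
        if count >= rank:
            if math.isinf(upper) or count == prev_count:
                return lower
            fraction = (rank - prev_count) / (count - prev_count)
            return lower + (upper - lower) * fraction
        lower, prev_count = upper, count
    return lower


# Hypothetical response times in milliseconds (100 requests).
latencies = [10] * 50 + [120] * 30 + [250] * 10 + [400] * 10
buckets = cumulative_buckets(latencies, [30, 100, 300, 1000, math.inf])

print(exact_quantile(latencies, 0.9))   # 250
print(estimate_quantile(buckets, 0.9))  # 300.0
```

The estimate can only say that the 0.9 quantile lies somewhere in the (100ms, 300ms] bucket, so its accuracy depends on how well the bucket bounds match the real distribution.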

Displaying data in the Prometheus UI

Data format

codelab_api_http_requests_in_progress{container="example-app", endpoint="web", instance="10.1.0.128:8080", job="example-app", namespace="default", pod="example-app-cddd79b89-dgcnr", service="example-app"}	0
codelab_api_http_requests_in_progress{container="example-app", endpoint="web", instance="10.1.0.129:8080", job="example-app", namespace="default", pod="example-app-cddd79b89-x99w5", service="example-app"}	2
codelab_api_http_requests_in_progress{container="example-app", endpoint="web", instance="10.1.0.130:8080", job="example-app", namespace="default", pod="example-app-cddd79b89-chkk6", service="example-app"}	2

Time series view

Prometheus Configuration

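Prometheus is configured through command-line flags and a configuration file. The command-line flags configure immutable system parameters (such as storage locations and the amount of data to keep on disk and in memory), while the configuration file defines everything related to scrape jobs and their instances, as well as which rule files to load.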

The configuration file can specify, among other things:

global
alerting
rule_files
scrape_configs
remote_write
remote_read

global

global:
  # How frequently to scrape targets by default.
  [ scrape_interval: <duration> | default = 1m ]

  # How long until a scrape request times out.
  [ scrape_timeout: <duration> | default = 10s ]

  # How frequently to evaluate rules.
  [ evaluation_interval: <duration> | default = 1m ]

  # The labels to add to any time series or alerts when communicating with
  # external systems (federation, remote storage, Alertmanager).
  external_labels:
    [ <labelname>: <labelvalue> ... ]

alerting

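Historically, Alertmanager was configured through -alertmanager.xxx command-line flags, but that approach is inflexible: the settings cannot be reloaded dynamically, and alert attributes cannot be defined dynamically.

The alerting section addresses this and makes Alertmanager easier to manage. It has two main parameters:

alert_relabel_configs: relabeling rules that modify alert attributes dynamically.
alertmanagers: configuration for discovering Alertmanager instances dynamically.

Its configuration schema is: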

# Alerting specifies settings related to the Alertmanager.
alerting:
  alert_relabel_configs:
    [ - <relabel_config> ... ]
  alertmanagers:
    [ - <alertmanager_config> ... ]

rule_files

# Rule files specifies a list of globs. Rules and alerts are read from
# all matching files.
rule_files:
  [ - <filepath_glob> ... ]

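rule_files lists the rule files to load. It accepts multiple entries, and each entry may be a glob pattern.

A typical configuration looks like this: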

rule_files:
  - "rules/node.rules"
  - "rules2/*.rules"

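Prometheus supports two types of rules, which are configured and then evaluated at a regular interval: recording rules and alerting rules. Rule files can be reloaded at runtime without restarting Prometheus.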

alerting rule

# The name of the alert. Must be a valid label value.
alert: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and all resultant time series become
# pending/firing alerts.
expr: <string>

# Alerts are considered firing once they have been returned for this long.
# Alerts which have not yet fired for long enough are considered pending.
[ for: <duration> | default = 0s ]

# How long an alert will continue firing after the condition that triggered it
# has cleared.
[ keep_firing_for: <duration> | default = 0s ]

# Labels to add or overwrite for each alert.
labels:
  [ <labelname>: <tmpl_string> ]

# Annotations to add to each alert.
annotations:
  [ <labelname>: <tmpl_string> ]

Example:

groups:
- name: example
  rules:
  - alert: HighRequestLatency
    expr: job:request_latency_seconds:mean5m{job="myjob"} > 0.5
    for: 10m
    labels:
      severity: page
    annotations:
      summary: High request latency

recording rule

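Recording rules precompute expressions that are needed frequently or are expensive to evaluate, and save the result as a new set of time series. Querying the precomputed result is usually much faster than evaluating the original expression every time it is needed. This is especially useful for dashboards, which repeat the same queries on every refresh. It is a way of trading space for time.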

# The name of the time series to output to. Must be a valid metric name.
record: <string>

# The PromQL expression to evaluate. Every evaluation cycle this is
# evaluated at the current time, and the result recorded as a new set of
# time series with the metric name as given by 'record'.
expr: <string>

# Labels to add or overwrite before storing the result.
labels:
  [ <labelname>: <labelvalue> ]

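Example: suppose we have an application that records the processing time of every request. The following query gives the application's average request processing time: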

avg(http_request_duration_seconds_sum{job="myapp"}) / avg(http_request_duration_seconds_count{job="myapp"})

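This query computes the average processing time per request. Because it divides the raw counters without applying rate(), the result is the average since the counters started rather than over a recent window. If we need this value often, running the query by hand every time is tedious, so we can use a Prometheus recording rule to create a new metric that is always available. First, reference the rule file in the Prometheus configuration file: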

rule_files:
  - /path/to/rules.yml

Then add the following content to /path/to/rules.yml:

groups:
- name: myapp_rules
  rules:
  - record: myapp_request_duration_seconds_avg
    expr: avg(http_request_duration_seconds_sum{job="myapp"}) / avg(http_request_duration_seconds_count{job="myapp"})

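This configuration defines a rule group named myapp_rules containing a recording rule named myapp_request_duration_seconds_avg. The rule evaluates the query above and stores the result in the new metric myapp_request_duration_seconds_avg. Finally, reload the Prometheus configuration for the change to take effect. In the Prometheus web UI, the new metric can then be queried with: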

myapp_request_duration_seconds_avg
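The same metric can also be read programmatically through the Prometheus HTTP API, whose instant-query endpoint is /api/v1/query. The sketch below uses only the Python standard library; the server address http://localhost:9090 and the helper names are assumptions, and error handling is kept to a minimum.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def instant_query_url(base_url, promql):
    """Build the URL of an instant query against the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url, promql, timeout=5):
    """Run an instant query and return the list of result series."""
    with urlopen(instant_query_url(base_url, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"query failed: {payload}")
    return payload["data"]["result"]


# Assumed server address; adjust to your environment.
print(instant_query_url("http://localhost:9090", "myapp_request_duration_seconds_avg"))
```

Calling instant_query("http://localhost:9090", "myapp_request_duration_seconds_avg") against a running server returns one entry per matching series, each with its labels and latest value.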

scrape_configs

# A list of scrape configurations.
scrape_configs:
  [ - <scrape_config> ... ]

Commonly used fields of a <scrape_config>:

# The job name assigned to scraped metrics by default.
job_name: <job_name>

# How frequently to scrape targets from this job.
[ scrape_interval: <duration> | default = <global_config.scrape_interval> ]

# Per-scrape timeout when scraping this job.
[ scrape_timeout: <duration> | default = <global_config.scrape_timeout> ]

# The HTTP resource path on which to fetch metrics from targets.
[ metrics_path: <path> | default = /metrics ]
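On the target side, each scrape is an HTTP GET of metrics_path, answered in the Prometheus text exposition format. The sketch below shows roughly what such an endpoint looks like using only the Python standard library. It is an illustration, not a replacement for an official client library: the gauge name is borrowed from the UI example earlier in this article, and IN_PROGRESS, render_metrics, and serve are hypothetical names.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

IN_PROGRESS = 0  # hypothetical application state updated elsewhere


def render_metrics(in_progress):
    """Render one gauge in the Prometheus text exposition format."""
    name = "codelab_api_http_requests_in_progress"
    return (
        f"# HELP {name} Requests currently being served.\n"
        f"# TYPE {name} gauge\n"
        f"{name} {in_progress}\n"
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        # Only the default metrics_path is served.
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(IN_PROGRESS).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Blocks forever, answering scrapes on the given port."""
    HTTPServer(("", port), MetricsHandler).serve_forever()


print(render_metrics(2), end="")
```

Calling serve() starts the endpoint; a scrape job whose target is this host on port 8080 would then collect the gauge at every scrape_interval.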

remote_write

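The remote_write configuration lets Prometheus send its data to any storage system that accepts the Prometheus remote write format, directly or through an adapter, for example Graphite, InfluxDB, or OpenTSDB. One benefit is that monitoring data is kept in more than one place, which improves its durability. Prometheus local storage is not highly available, and third-party storage makes up for that.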

# Settings related to the experimental remote write feature.
remote_write:
  [ - <remote_write> ... ]

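The benefits include, but are not limited to:

Data from multiple Prometheus servers can be merged into one central store for more comprehensive monitoring and analysis.
Prometheus can be integrated with other tools, such as Grafana, for broader monitoring and analysis.
Query capacity can be extended to handle larger volumes of monitoring data.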

remote_read

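With remote_read configured, Prometheus can run queries, aggregations, and alerting rules across local and remote data without needing to know where the data is stored.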

# Settings related to the experimental remote read feature.
remote_read:
  [ - <remote_read> ... ]

Prometheus Operator

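Prometheus Operator is a Kubernetes Operator that automates deploying, managing, and running Prometheus and related components in a Kubernetes cluster. It makes a Prometheus monitoring system easier and more efficient to deploy and manage, and improves its reliability and maintainability. Its main benefits are: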

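Simpler deployment: Prometheus Operator provides a simple way to deploy and manage Prometheus and related components, reducing the deployment and management workload.
High availability: Prometheus Operator can create and manage multiple Prometheus instances automatically and relies on the failover mechanisms of Kubernetes to keep the monitoring system highly available.
Automated configuration management: Prometheus Operator can configure Prometheus and related components such as Alertmanager automatically, reducing manual configuration work.
Customization: Prometheus Operator offers a rich set of options that can be tuned to fit different scenarios.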

Prometheus Operator defines, among others, the following four custom resources:

Prometheus
ServiceMonitor
Alertmanager
PrometheusRule

Prometheus Operator architecture diagram

Prometheus

apiVersion: monitoring.coreos.com/v1
kind: Prometheus
metadata:
  name: example
spec:
  serviceAccountName: prometheus
  replicas: 2
  alerting:
    alertmanagers:
    - namespace: default
      name: alertmanager-example
      port: web
  serviceMonitorSelector:
    matchLabels:
      team: frontend
  ruleSelector:
    matchLabels:
      role: alert-rules
      prometheus: example
  ruleNamespaceSelector:
    matchLabels:
      team: frontend

ServiceMonitor

apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-app
  labels:
    team: frontend
spec:
  selector:
    matchLabels:
      app: example-app
  endpoints:
  - port: web

Alertmanager

apiVersion: monitoring.coreos.com/v1
kind: Alertmanager
metadata:
  name: example
  namespace: default
spec:
  replicas: 3
  alertmanagerConfiguration:
    name: config-example

PrometheusRule

apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  creationTimestamp: null
  labels:
    prometheus: example
    role: alert-rules
  name: prometheus-example-rules
spec:
  groups:
  - name: ./example.rules
    rules:
    - alert: ExampleAlert
      expr: vector(1)

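Prometheus query optimization

Prometheus client

Avoid high cardinality

Avoid queries that combine high-cardinality labels: combining several high-cardinality labels multiplies the number of series the index has to track and slows queries down, so avoid it wherever possible.

In Prometheus, a high-cardinality label is a label that takes a large number of unique values within a metric. For example, an HTTP request metric might have a url label holding a large number of distinct URLs. Every unique combination of label values is a separate time series, and Prometheus keeps recent series in memory, so high-cardinality labels have a large impact on memory and disk usage and can cause performance problems.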
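A quick back-of-the-envelope calculation shows why one high-cardinality label matters. Each unique combination of label values is its own time series, so the upper bound on the series count of a metric is the product of the number of values each label can take. The label names and value counts below are invented for illustration.

```python
import math


def max_series(label_value_counts):
    """Upper bound on series for one metric: product of per-label value counts."""
    return math.prod(label_value_counts.values())


# Hypothetical HTTP request metric without a url label.
base = {"method": 4, "status": 5, "instance": 10}
print(max_series(base))  # 200

# The same metric once a url label with 1000 distinct values is added.
with_url = dict(base, url=1000)
print(max_series(with_url))  # 200000
```

Not every combination necessarily occurs in practice, so this is a ceiling rather than a measurement, but it shows how a single unbounded label multiplies everything else.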

Avoid unnecessary queries

Avoid querying metrics and labels you do not need; this reduces query time and resource consumption.

Recording Rules

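Use recording rules for pre-aggregation: Prometheus evaluates recording rules at a regular interval and stores the aggregated result as new time series, which reduces the computation and the amount of data returned at query time.

Suppose we have a cluster of several servers, each running one or more applications, and we want to monitor CPU usage. We can use a recording rule to aggregate the CPU usage of all applications on each server into one time series per server, reducing query load and improving performance. For example: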

groups:
- name: cpu_usage
  rules:
  - record: cluster_cpu_usage
    expr: sum by (server) (app_cpu_usage)

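This rule sums the application CPU usage on each server and produces one time series per server, distinguished by the server label, whose value is the total CPU usage of all applications on that server. Cluster-level queries can then work from these per-server series, for example sum(cluster_cpu_usage), instead of reading every application series on every server.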
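What sum by (server) does to the data can be mimicked in plain Python: group the input series by the value of one label and add up the values in each group. The helper name, label values, and CPU figures below are made up for illustration.

```python
from collections import defaultdict


def sum_by(samples, label):
    """Group (labels, value) samples by one label and sum each group."""
    totals = defaultdict(float)
    for labels, value in samples:
        totals[labels[label]] += value
    return dict(totals)


# Hypothetical app_cpu_usage series: one per application per server.
app_cpu_usage = [
    ({"server": "s1", "app": "api"}, 0.25),
    ({"server": "s1", "app": "worker"}, 0.5),
    ({"server": "s2", "app": "api"}, 0.125),
    ({"server": "s2", "app": "worker"}, 0.25),
]

print(sum_by(app_cpu_usage, "server"))  # {'s1': 0.75, 's2': 0.375}
```

Four input series collapse into two output series, one per server. With many applications per server, the recorded result is correspondingly smaller than the raw data a dashboard would otherwise have to read.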

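Avoid frequent queries

Frequent queries consume a lot of resources. Reduce them where possible, for example by letting alerting rules watch a condition instead of polling for it manually.

Avoid broad regular-expression matching

Regular-expression matchers cost more CPU and memory than exact matchers, especially when a pattern matches many series. Prefer exact label matchers where they suffice.

Avoid unnecessary storage

Avoid collecting labels you do not need; this reduces storage space and query time.

Minimize the query range

Narrow the time range and the label matchers as much as possible; this greatly reduces the resources a query needs.

Reduce the amount of data returned

Use aggregation operators: Prometheus provides aggregation operators such as sum, avg, max, and min, which reduce the amount of data a query returns.

Use label filtering: labels in the Prometheus data model can be used to filter data, reducing the amount of data that has to be read. For example, job="node" selects only the series whose job label is node.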

Prometheus server

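Optimize the storage configuration: storage settings such as the retention period determine how much data Prometheus keeps on disk, and therefore how much a long-range query may have to read.
Keep Prometheus for short-term data: Prometheus holds the most recent samples in memory and older ones in blocks on local disk, which makes queries over recent data fast.
Give the server enough memory: Prometheus memory-maps its on-disk blocks, so spare memory lets the operating system page cache keep frequently read index and chunk data close at hand.
Rely on the TSDB storage format: the TSDB stores samples in compressed, block-based form, which greatly reduces storage space and read time.
Use remote storage for long-term data: through remote write, or companion systems built on object stores such as Amazon S3 or Google Cloud Storage, older data can be moved off the Prometheus server. This mainly helps retention and scale rather than raw read speed.
Use federation: with federation, one Prometheus server scrapes selected (ideally pre-aggregated) series from other Prometheus servers, so global queries run against a smaller data set.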