Prometheus Pushgateway: Pushing Monitoring Metrics
Prometheus architecture
We know that Prometheus works in pull mode; let's briefly look at the scenarios in which the Pushgateway (push) mode is useful:
Scenario 1: the targets and Prometheus sit in different subnets or behind a firewall, so Prometheus cannot scrape the metric data directly; in that case we need a mode in which the targets actively push their data.
Scenario 2: short-lived jobs, for example scheduled (cron-style) collection tasks, which may have finished before the next scrape; these push their metrics to the Pushgateway instead.
Drawbacks:
- Metrics from many nodes are funnelled through a single Pushgateway; if the Pushgateway goes down, the impact is much wider.
- The "up" scrape status that Prometheus records refers only to the Pushgateway itself, so it cannot tell you whether each individual target behind it is healthy.
- Because the Pushgateway can persist all metrics pushed to it, Prometheus keeps scraping stale data even after a target has gone offline; unwanted groups have to be cleaned up by hand (see the example below).
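For that last point, stale groups can be removed through the Pushgateway's HTTP API: a DELETE against the same job/label path the data was pushed to removes that group. A quick example against the Pushgateway used later in this article (192.168.10.30:9091); the wipe call additionally requires starting the Pushgateway with --web.enable-admin-api:
# delete one metric group, addressed by the grouping labels it was pushed with
curl -X DELETE http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/192.168.10.30
# wipe all metric groups at once (only allowed when --web.enable-admin-api is set)
curl -X PUT http://192.168.10.30:9091/api/v1/admin/wipe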
Installing Pushgateway
Binary installation:
<https://github.com/prometheus/pushgateway/releases/download/v1.6.0/pushgateway-1.6.0.linux-amd64.tar.gz>
tar -xvf pushgateway-1.6.0.linux-amd64.tar.gz -C /opt/app
[root@prometheus app]# cd pushgateway/
[root@prometheus pushgateway]# ls
data disk_usage_metris.sh LICENSE NOTICE pushgateway
Systemd service unit file:
vim /usr/lib/systemd/system/pushgateway.service
[Unit]
Description=Pushgateway
After=network.target
[Service]
Type=simple
ExecStart=/opt/app/pushgateway/pushgateway --persistence.file="/opt/app/pushgateway/data"
Restart=on-failure
[Install]
WantedBy=multi-user.target
The Pushgateway can be started simply by executing the binary. To change the listen address, pass the --web.listen-address flag (for example 0.0.0.0:9091 or :9091). By default the Pushgateway does not retain metrics; the --persistence.file flag lets us specify a file in which pushed metrics are saved, so they are still there after the Pushgateway restarts.
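With the unit file saved, a typical sequence to bring the service up under systemd:
systemctl daemon-reload
systemctl enable --now pushgateway
systemctl status pushgateway --no-pager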
Docker deployment:
docker run -d -p 9091:9091 prom/pushgateway
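To keep pushed metrics across container restarts, the persistence flags can be passed to the container as well; a sketch, assuming a host directory /opt/pushgateway-data chosen here only for illustration (the image runs as an unprivileged user, so the directory must be writable by it):
docker run -d --name pushgateway \
  -p 9091:9091 \
  -v /opt/pushgateway-data:/pushgateway-data \
  prom/pushgateway \
  --persistence.file=/pushgateway-data/pushgateway.data \
  --persistence.interval=5m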
Kubernetes deployment:
To run the Pushgateway inside a Kubernetes cluster, use a resource manifest like the following:
# pushgateway.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pushgateway-data
  namespace: kube-mon
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: pushgateway-data
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.4.3
          imagePullPolicy: IfNotPresent
          args:
            - "--persistence.file=/data/pushgateway.data"
          ports:
            - containerPort: 9091
              name: http
          volumeMounts:
            - mountPath: "/data"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
            limits:
              cpu: 100m
              memory: 500Mi
---
apiVersion: v1
kind: Service
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    app: pushgateway
  type: NodePort
  ports:
    - name: http
      port: 9091
      targetPort: http
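Apply the manifest and check that the Pod and the NodePort Service come up:
kubectl apply -f pushgateway.yaml
kubectl -n kube-mon get pods,svc -l app=pushgateway
# the web UI is then reachable on any node at the assigned NodePort, e.g. http://<node-ip>:<node-port>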
Open the Pushgateway web UI in a browser (it listens on port 9091):
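The same check can be done from the command line, here using the 192.168.10.30:9091 address that the rest of this article targets:
# HTTP 200 means the UI is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.30:9091/
# pushed metric groups (and the gateway's own metrics) are exposed here
curl -s http://192.168.10.30:9091/metrics | head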
Basic usage
The Pushgateway supports two ways of pushing data: through a Prometheus client SDK or through its HTTP API.
Client SDK push
Prometheus itself provides SDKs for multiple languages. With an SDK the application generates the metrics in code and pushes them to the Pushgateway. This requires support in the client code, and it is the officially recommended approach. The officially maintained SDKs currently cover:
- Go
- Java or Scala
- Python
- Ruby
Pushing with Python (turning the results of a PostgreSQL query into Prometheus metrics):
#!/usr/bin/env python3
import psycopg2
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
import schedule
import time

# PostgreSQL connection settings
db_host = "192.168.10.30"
db_name = "postgres"
db_user = "postgres"
db_password = "123456"

# Health check: make sure the database answers a trivial query
def check_health():
    try:
        conn = psycopg2.connect(
            host=db_host,
            database=db_name,
            user=db_user,
            password=db_password
        )
        cur = conn.cursor()
        cur.execute("SELECT 1")
        result = cur.fetchone()
        cur.close()
        conn.close()
        return result[0] == 1
    except Exception:
        return False

def job():
    # Only query if the database passes the health check
    if check_health():
        # Run the query
        conn = psycopg2.connect(
            host=db_host,
            database=db_name,
            user=db_user,
            password=db_password
        )
        cur = conn.cursor()
        sql = "SELECT id, name, age FROM person_info"
        cur.execute(sql)
        # Create the Prometheus metric
        registry = CollectorRegistry()
        gauge = Gauge('person_info', 'Custom query metrics', ['instance', 'name', 'age'], registry=registry)
        # Turn every row into a labelled sample (the gauge value is the row id)
        for row in cur.fetchall():
            gauge.labels(instance="192.168.10.30", age=row[2], name=row[1]).set(row[0])
        # Push the metrics to the Pushgateway
        push_to_gateway('192.168.10.30:9091', job='pushgateway_name', registry=registry)
        # Close the connection
        cur.close()
        conn.close()
        print("Execution Push succeeded")
    else:
        print("Database health check failed.")

schedule.every().day.at("15:30").do(job)
schedule.every().day.at("15:35").do(job)
schedule.every().day.at("15:40").do(job)
schedule.every().day.at("15:55").do(job)

while True:
    schedule.run_pending()
    time.sleep(30)
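The script relies on three third-party packages; a typical install (psycopg2-binary avoids building against libpq):
pip install psycopg2-binary prometheus-client schedule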
Pushing via the API with a shell script
[root@prometheus pushgateway]# cat disk_usage_metris.sh
#!/bin/bash
hostname="192.168.10.30"
metrics=""
# Collect the usage percentage of every mounted filesystem
for line in `df | awk 'NR>1{print $NF "=" int($(NF-1))}'`
do
    disk_name=`echo $line | awk -F'=' '{print $1}'`
    disk_usage=`echo $line | awk -F'=' '{print $2}'`
    metrics="$metrics\ndisk_usage{instance=\"$hostname\",job=\"disk\",disk_name=\"$disk_name\"} $disk_usage"
done
# Push the samples in the Prometheus text exposition format
echo -e "# HELP disk_usage Disk usage percentage per mount point.\n# TYPE disk_usage gauge\n$metrics" | curl --data-binary @- http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/$hostname
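After running the script, the pushed group can be checked directly on the Pushgateway, before Prometheus ever scrapes it:
bash disk_usage_metris.sh
curl -s http://192.168.10.30:9091/metrics | grep '^disk_usage'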
Prometheus configuration (the job goes under scrape_configs):
- job_name: 'pushgateway_name'
  scrape_interval: 30s
  honor_labels: true
  static_configs:
    - targets:
        - 192.168.10.30:9091
      labels:
        instance: pushgateway_instance
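Before reloading Prometheus it is worth validating the merged configuration with promtool; the config path below assumes a default installation and is only illustrative:
promtool check config /etc/prometheus/prometheus.yml
# reload the running server with SIGHUP (or POST to /-/reload if --web.enable-lifecycle is set)
kill -HUP $(pidof prometheus)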
Verifying the setup:
Run the shell script and the Python script above, then confirm that the pushed series show up in the Pushgateway and in Prometheus; the queries below illustrate the check.
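The pushed series can then be queried through the Prometheus HTTP API (the same expressions work in the web UI); the Prometheus address 192.168.10.30:9090 is an assumption here:
# the gauge pushed by the shell script
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=disk_usage'
# the gauge pushed by the Python script
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=person_info'
# scrape health of the Pushgateway target itself
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=up{job="pushgateway_name"}'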