Pushing Monitoring Metrics with Prometheus Pushgateway

Prometheus architecture

(Figure: Prometheus architecture)

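Prometheus collects metrics with a pull model. Here are the typical scenarios where Pushgateway is useful:

Scenario 1: The collection target and Prometheus are in different subnets or separated by a firewall, so Prometheus cannot scrape the metrics directly. In this case the target has to push its metrics actively.

Scenario 2: Short-lived jobs, such as scheduled collection tasks, may finish before Prometheus ever scrapes them. Such jobs push their results to Pushgateway, and Prometheus scrapes Pushgateway instead.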
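Scenario 2 can be sketched with the Python client library. This is a minimal sketch, not a definitive implementation: the metric names, the job name `nightly_batch`, and the gateway address `192.168.10.30:9091` are assumptions chosen for illustration.

```python
import time

from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


def build_batch_registry(duration_seconds, finished_at):
    """Build a registry that describes one finished batch run."""
    registry = CollectorRegistry()
    duration = Gauge(
        "batch_job_duration_seconds",
        "Duration of the last batch run in seconds.",
        registry=registry,
    )
    last_success = Gauge(
        "batch_job_last_success_timestamp_seconds",
        "Unix time at which the batch job last succeeded.",
        registry=registry,
    )
    duration.set(duration_seconds)
    last_success.set(finished_at)
    return registry


def run_and_push(work, gateway="192.168.10.30:9091", job="nightly_batch"):
    """Run `work`, then push its duration and completion time once."""
    started = time.time()
    work()
    finished = time.time()
    registry = build_batch_registry(finished - started, finished)
    # The gateway address and job name are assumed values.
    push_to_gateway(gateway, job=job, registry=registry)


# Example call (needs a reachable Pushgateway):
# run_and_push(lambda: time.sleep(2))
```

Pushing a last-success timestamp lets an alert fire when the job has not succeeded for too long, which works even though the job itself is never scraped.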

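Drawbacks:

  • Metrics from many nodes are aggregated in Pushgateway, so if Pushgateway goes down, the impact is much wider.
  • The up status that Prometheus records only reflects Pushgateway itself, not the health of each individual target.
  • Pushgateway keeps every metric pushed to it until that metric is deleted, so even after a monitored target has been taken offline, Prometheus keeps scraping its old data. Unwanted data has to be cleaned up from Pushgateway manually.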
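The manual cleanup mentioned in the last point can be done through the Pushgateway HTTP API by sending a DELETE request to the URL of the metric group. The following is a minimal sketch using only the Python standard library. The helper names `group_url` and `delete_group` are hypothetical, and the sketch assumes label values contain no `/` characters.

```python
import urllib.parse
import urllib.request


def group_url(base_url, job, grouping_key=None):
    """Build the URL of one metric group: /metrics/job/<job>/<label>/<value>..."""
    parts = ["metrics", "job", urllib.parse.quote(job, safe="")]
    for name, value in (grouping_key or {}).items():
        parts += [name, urllib.parse.quote(str(value), safe="")]
    return base_url.rstrip("/") + "/" + "/".join(parts)


def delete_group(base_url, job, grouping_key=None, timeout=5):
    """Delete all metrics of one group from Pushgateway; returns the HTTP status."""
    request = urllib.request.Request(
        group_url(base_url, job, grouping_key), method="DELETE"
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (needs a reachable Pushgateway at this assumed address):
# delete_group("http://192.168.10.30:9091", "pushgateway_name",
#              {"instance": "192.168.10.30"})
```

The Python client library offers `delete_from_gateway` for the same purpose; the standard-library version is shown here only to make the underlying HTTP call visible.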

Installing Pushgateway

Binary installation:

<https://github.com/prometheus/pushgateway/releases/download/v1.6.0/pushgateway-1.6.0.linux-amd64.tar.gz>

tar -xvf pushgateway-1.6.0.linux-amd64.tar.gz -C /opt/app
[root@prometheus app]# mv pushgateway-1.6.0.linux-amd64 pushgateway
[root@prometheus app]# cd pushgateway/
[root@prometheus pushgateway]# ls
data  disk_usage_metris.sh  LICENSE  NOTICE  pushgateway

systemd unit file:

vim /usr/lib/systemd/system/pushgateway.service
[Unit]
Description=Pushgateway
After=network.target
[Service]
Type=simple
ExecStart=/opt/app/pushgateway/pushgateway --persistence.file="/opt/app/pushgateway/data"
Restart=on-failure
[Install]
WantedBy=multi-user.target

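Running the Pushgateway binary directly is enough to start it. To change the listen address, use the --web.listen-address flag (for example 0.0.0.0:9091). By default Pushgateway does not persist metrics. The --persistence.file flag lets you specify a file in which pushed metrics are stored, so that they are still present after Pushgateway restarts.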

Docker deployment:

docker run -d -p 9091:9091 prom/pushgateway

Kubernetes deployment:

To deploy Pushgateway in a Kubernetes cluster, use a resource manifest like the following:

# pushgateway.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pushgateway-data
  namespace: kube-mon
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: pushgateway-data
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.4.3
          imagePullPolicy: IfNotPresent
          args:
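            # /data is the mounted volume (a directory), so point the flag at a file inside it
            - "--persistence.file=/data/pushgateway.data"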
          ports:
            - containerPort: 9091
              name: http
          volumeMounts:
            - mountPath: "/data"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
            limits:
              cpu: 100m
              memory: 500Mi
---
apiVersion: v1
kind: Service
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    app: pushgateway
  type: NodePort
  ports:
    - name: http
      port: 9091
      targetPort: http

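Open the Pushgateway home page:

(Figure: Pushgateway home page)

Basic usage

Pushgateway accepts data in two ways: pushes from a Prometheus client SDK and pushes through its HTTP API.

Pushing with a client SDK

Prometheus provides client SDKs for several languages. An application uses the SDK to build its metrics and push them to Pushgateway. This requires support in the client code, and it is the officially recommended approach. The officially maintained SDKs currently cover: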

  • Go
  • Java or Scala
  • Python
  • Ruby

Pushing with Python (converting the result of a PostgreSQL query into Prometheus metrics)

#!/usr/bin/env python3

import time

import psycopg2
import schedule
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway


# PostgreSQL connection settings
db_host = "192.168.10.30"
db_name = "postgres"
db_user = "postgres"
db_password = "123456"


def connect():
    # Open a new connection with the settings above
    return psycopg2.connect(
        host=db_host,
        database=db_name,
        user=db_user,
        password=db_password
    )


# Health check: True if the database answers a trivial query
def check_health():
    try:
        conn = connect()
        try:
            cur = conn.cursor()
            cur.execute("SELECT 1")
            result = cur.fetchone()
            cur.close()
            return result[0] == 1
        finally:
            # Always release the connection, even if the query fails
            conn.close()
    except psycopg2.Error:
        return False


def job():
    # Check the database health before querying
    if not check_health():
        print("Database health check failed.")
        return

    # Run the query and close the connection as soon as the rows are fetched
    conn = connect()
    try:
        cur = conn.cursor()
        sql = "SELECT id, name, age FROM person_info"
        cur.execute(sql)
        rows = cur.fetchall()
        cur.close()
    finally:
        conn.close()

    # Create the Prometheus metric
    registry = CollectorRegistry()
    gauge = Gauge('person_info', 'Custom query metrics', ['instance', 'name', 'age'], registry=registry)

    # Convert each row into a sample: name and age become labels, id is the value
    for row in rows:
        gauge.labels(instance="192.168.10.30", age=row[2], name=row[1]).set(row[0])

    # Push the metrics to Pushgateway
    push_to_gateway('192.168.10.30:9091', job='pushgateway_name', registry=registry)

    print("Execution Push succeeded")


# Run the job at fixed times every day
schedule.every().day.at("15:30").do(job)
schedule.every().day.at("15:35").do(job)
schedule.every().day.at("15:40").do(job)
schedule.every().day.at("15:55").do(job)


while True:
    schedule.run_pending()
    time.sleep(30)

Pushing through the API with a shell script

[root@prometheus pushgateway]# cat disk_usage_metris.sh
#!/bin/bash

hostname="192.168.10.30"

metrics=""
for line in `df |awk 'NR>1{print $NF "=" int($(NF-1))}'`
do
  disk_name=`echo $line|awk -F'=' '{print $1}'`
  disk_usage=`echo $line|awk -F'=' '{print $2}'`
  metrics="$metrics\ndisk_usage{instance=\"$hostname\",job=\"disk\",disk_name=\"$disk_name\"} $disk_usage"
done

# The HELP/TYPE lines must name the metric that is actually pushed (disk_usage, a gauge)
echo -e "# HELP disk_usage Disk usage percentage per mount point.\n# TYPE disk_usage gauge$metrics" | curl --data-binary @- http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/$hostname
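
The same API push can be sketched in Python with only the standard library, which makes the text exposition format explicit. This is a hedged illustration rather than part of the original setup: the helper names `render_disk_metrics` and `push_text` are hypothetical, and the sketch assumes mount point names contain no quotes or backslashes that would need escaping.

```python
import urllib.request


def render_disk_metrics(usage_by_mount):
    """Render {mount point: used percent} in the Prometheus text format."""
    lines = [
        "# HELP disk_usage Disk usage percentage per mount point.",
        "# TYPE disk_usage gauge",
    ]
    for mount, percent in sorted(usage_by_mount.items()):
        lines.append(f'disk_usage{{disk_name="{mount}"}} {percent}')
    # The body sent to Pushgateway must end with a newline.
    return "\n".join(lines) + "\n"


def push_text(url, body, timeout=5):
    """POST a text-format body to a Pushgateway group URL; returns the HTTP status."""
    request = urllib.request.Request(url, data=body.encode("utf-8"), method="POST")
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status


# Example call (needs a reachable Pushgateway at this assumed address):
# push_text(
#     "http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/192.168.10.30",
#     render_disk_metrics({"/": 45, "/boot": 20}),
# )
```

The job and instance labels are taken from the URL path, so the body only needs the labels that are specific to each sample.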

Prometheus configuration

  - job_name: 'pushgateway_name'
    scrape_interval: 30s
    honor_labels: true
    static_configs:
        - targets:
          - 192.168.10.30:9091
          labels:
            instance: pushgateway_instance

Verification:

Run the shell script and the Python script above.

(Figure: verification on the Pushgateway page)

(Figure: verification in Prometheus)
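
The check in Prometheus can also be scripted against its HTTP query API. The sketch below assumes Prometheus listens on `192.168.10.30:9090`, which the article does not state, and the helper names `extract_samples` and `query_prometheus` are hypothetical.

```python
import json
import urllib.parse
import urllib.request


def extract_samples(payload):
    """Turn an instant-query response into a list of (labels, value) pairs."""
    if payload.get("status") != "success":
        raise ValueError(f"query failed: {payload.get('error', 'unknown error')}")
    samples = []
    for item in payload["data"]["result"]:
        _timestamp, value = item["value"]  # the value arrives as a string
        samples.append((item["metric"], float(value)))
    return samples


def query_prometheus(base_url, expr, timeout=5):
    """Run an instant query through /api/v1/query and return its samples."""
    url = base_url.rstrip("/") + "/api/v1/query?" + urllib.parse.urlencode({"query": expr})
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return extract_samples(json.load(response))


# Example call (the Prometheus address is an assumption):
# for labels, value in query_prometheus("http://192.168.10.30:9090", "disk_usage"):
#     print(labels.get("disk_name"), value)
```

If the pushed metrics show up here with the job and instance labels that were pushed, `honor_labels: true` is doing its work in the scrape configuration.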