Prometheus Pushgateway: Pushing Monitoring Metrics
Prometheus architecture
We know that Prometheus works in pull mode; let's briefly look at the scenarios in which the Pushgateway (push) mode is useful:
Scenario 1: the targets and Prometheus sit in different subnets or behind a firewall, so Prometheus cannot scrape the metric data directly; in that case we need a mode in which the targets actively push their data.
Scenario 2: short-lived jobs, for example scheduled (cron-style) collection tasks, which may have finished before the next scrape; these push their metrics to the Pushgateway instead.
Drawbacks:
- Metrics from many nodes are funnelled through a single Pushgateway; if the Pushgateway goes down, the impact is much wider.
- The "up" scrape status that Prometheus records refers only to the Pushgateway itself, so it cannot tell you whether each individual target behind it is healthy.
- Because the Pushgateway can persist all metrics pushed to it, Prometheus keeps scraping stale data even after a target has gone offline; unwanted groups have to be cleaned up by hand (see the example below).
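For that last point, stale groups can be removed through the Pushgateway's HTTP API: a DELETE against the same job/label path the data was pushed to removes that group. A quick example against the Pushgateway used later in this article (192.168.10.30:9091); the wipe call additionally requires starting the Pushgateway with --web.enable-admin-api:
# delete one metric group, addressed by the grouping labels it was pushed with
curl -X DELETE http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/192.168.10.30
# wipe all metric groups at once (only allowed when --web.enable-admin-api is set)
curl -X PUT http://192.168.10.30:9091/api/v1/admin/wipe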
Installing Pushgateway
Binary installation:
<https://github.com/prometheus/pushgateway/releases/download/v1.6.0/pushgateway-1.6.0.linux-amd64.tar.gz>
tar -xvf pushgateway-1.6.0.linux-amd64.tar.gz -C /opt/app
[root@prometheus app]# cd pushgateway/
[root@prometheus pushgateway]# ls
data disk_usage_metris.sh LICENSE NOTICE pushgateway
Systemd service unit file:
vim /usr/lib/systemd/system/pushgateway.service
[Unit]
Description=Pushgateway
After=network.target
[Service]
Type=simple
ExecStart=/opt/app/pushgateway/pushgateway --persistence.file="/opt/app/pushgateway/data"
Restart=on-failure
[Install]
WantedBy=multi-user.target
The Pushgateway can be started simply by executing the binary. To change the listen address, pass the --web.listen-address flag (for example 0.0.0.0:9091 or :9091). By default the Pushgateway does not retain metrics; the --persistence.file flag lets us specify a file in which pushed metrics are saved, so they are still there after the Pushgateway restarts.
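With the unit file saved, a typical sequence to bring the service up under systemd:
systemctl daemon-reload
systemctl enable --now pushgateway
systemctl status pushgateway --no-pager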
Docker deployment:
docker run -d -p 9091:9091 prom/pushgateway
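To keep pushed metrics across container restarts, the persistence flags can be passed to the container as well; a sketch, assuming a host directory /opt/pushgateway-data chosen here only for illustration (the image runs as an unprivileged user, so the directory must be writable by it):
docker run -d --name pushgateway \
  -p 9091:9091 \
  -v /opt/pushgateway-data:/pushgateway-data \
  prom/pushgateway \
  --persistence.file=/pushgateway-data/pushgateway.data \
  --persistence.interval=5m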
Kubernetes deployment:
To run the Pushgateway inside a Kubernetes cluster, use a resource manifest like the following:
# pushgateway.yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pushgateway-data
  namespace: kube-mon
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: local-path
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    matchLabels:
      app: pushgateway
  template:
    metadata:
      labels:
        app: pushgateway
    spec:
      volumes:
        - name: data
          persistentVolumeClaim:
            claimName: pushgateway-data
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.4.3
          imagePullPolicy: IfNotPresent
          args:
            - "--persistence.file=/data/pushgateway.data"
          ports:
            - containerPort: 9091
              name: http
          volumeMounts:
            - mountPath: "/data"
              name: data
          resources:
            requests:
              cpu: 100m
              memory: 500Mi
            limits:
              cpu: 100m
              memory: 500Mi
---
apiVersion: v1
kind: Service
metadata:
  name: pushgateway
  namespace: kube-mon
  labels:
    app: pushgateway
spec:
  selector:
    app: pushgateway
  type: NodePort
  ports:
    - name: http
      port: 9091
      targetPort: http
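Apply the manifest and check that the Pod and the NodePort Service come up:
kubectl apply -f pushgateway.yaml
kubectl -n kube-mon get pods,svc -l app=pushgateway
# the web UI is then reachable on any node at the assigned NodePort, e.g. http://<node-ip>:<node-port>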
Open the Pushgateway web UI in a browser (it listens on port 9091):
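The same check can be done from the command line, here using the 192.168.10.30:9091 address that the rest of this article targets:
# HTTP 200 means the UI is reachable
curl -s -o /dev/null -w "%{http_code}\n" http://192.168.10.30:9091/
# pushed metric groups (and the gateway's own metrics) are exposed here
curl -s http://192.168.10.30:9091/metrics | head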
Basic usage
The Pushgateway supports two ways of pushing data: through a Prometheus client SDK or through its HTTP API.
Client SDK push
Prometheus itself provides SDKs for multiple languages. With an SDK the application generates the metrics in code and pushes them to the Pushgateway. This requires support in the client code, and it is the officially recommended approach. The officially maintained SDKs currently cover:
- Go
- Java or Scala
- Python
- Ruby
Pushing with Python (turning the results of a PostgreSQL query into Prometheus metrics):
#!/usr/bin/env python3
import psycopg2
from prometheus_client import CollectorRegistry, Gauge, push_to_gateway
import schedule
import time

# PostgreSQL connection settings
db_host = "192.168.10.30"
db_name = "postgres"
db_user = "postgres"
db_password = "123456"

# Health check: make sure the database answers a trivial query
def check_health():
    try:
        conn = psycopg2.connect(
            host=db_host,
            database=db_name,
            user=db_user,
            password=db_password
        )
        cur = conn.cursor()
        cur.execute("SELECT 1")
        result = cur.fetchone()
        cur.close()
        conn.close()
        return result[0] == 1
    except Exception:
        return False

def job():
    # Only query if the database passes the health check
    if check_health():
        # Run the query
        conn = psycopg2.connect(
            host=db_host,
            database=db_name,
            user=db_user,
            password=db_password
        )
        cur = conn.cursor()
        sql = "SELECT id, name, age FROM person_info"
        cur.execute(sql)
        # Create the Prometheus metric
        registry = CollectorRegistry()
        gauge = Gauge('person_info', 'Custom query metrics', ['instance', 'name', 'age'], registry=registry)
        # Turn every row into a labelled sample (the gauge value is the row id)
        for row in cur.fetchall():
            gauge.labels(instance="192.168.10.30", age=row[2], name=row[1]).set(row[0])
        # Push the metrics to the Pushgateway
        push_to_gateway('192.168.10.30:9091', job='pushgateway_name', registry=registry)
        # Close the connection
        cur.close()
        conn.close()
        print("Execution Push succeeded")
    else:
        print("Database health check failed.")

schedule.every().day.at("15:30").do(job)
schedule.every().day.at("15:35").do(job)
schedule.every().day.at("15:40").do(job)
schedule.every().day.at("15:55").do(job)

while True:
    schedule.run_pending()
    time.sleep(30)
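The script relies on three third-party packages; a typical install (psycopg2-binary avoids building against libpq):
pip install psycopg2-binary prometheus-client schedule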
Pushing via the API with a shell script
[root@prometheus pushgateway]# cat disk_usage_metris.sh
#!/bin/bash
hostname="192.168.10.30"
metrics=""
# Collect the usage percentage of every mounted filesystem
for line in `df | awk 'NR>1{print $NF "=" int($(NF-1))}'`
do
    disk_name=`echo $line | awk -F'=' '{print $1}'`
    disk_usage=`echo $line | awk -F'=' '{print $2}'`
    metrics="$metrics\ndisk_usage{instance=\"$hostname\",job=\"disk\",disk_name=\"$disk_name\"} $disk_usage"
done
# Push the samples in the Prometheus text exposition format
echo -e "# HELP disk_usage Disk usage percentage per mount point.\n# TYPE disk_usage gauge\n$metrics" | curl --data-binary @- http://192.168.10.30:9091/metrics/job/pushgateway_name/instance/$hostname
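After running the script, the pushed group can be checked directly on the Pushgateway, before Prometheus ever scrapes it:
bash disk_usage_metris.sh
curl -s http://192.168.10.30:9091/metrics | grep '^disk_usage'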
Prometheus configuration (the job goes under scrape_configs):
- job_name: 'pushgateway_name'
  scrape_interval: 30s
  honor_labels: true
  static_configs:
    - targets:
        - 192.168.10.30:9091
      labels:
        instance: pushgateway_instance
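Before reloading Prometheus it is worth validating the merged configuration with promtool; the config path below assumes a default installation and is only illustrative:
promtool check config /etc/prometheus/prometheus.yml
# reload the running server with SIGHUP (or POST to /-/reload if --web.enable-lifecycle is set)
kill -HUP $(pidof prometheus)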
Verifying the setup:
Run the shell script and the Python script above, then confirm that the pushed series show up in the Pushgateway and in Prometheus; the queries below illustrate the check.
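The pushed series can then be queried through the Prometheus HTTP API (the same expressions work in the web UI); the Prometheus address 192.168.10.30:9090 is an assumption here:
# the gauge pushed by the shell script
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=disk_usage'
# the gauge pushed by the Python script
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=person_info'
# scrape health of the Pushgateway target itself
curl -s -G http://192.168.10.30:9090/api/v1/query --data-urlencode 'query=up{job="pushgateway_name"}'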