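Our project needs finer-grained monitoring of the Flink cluster's state. This article describes the overall metrics monitoring approach and the specific deployment and configuration details.
Flink's built-in metrics, together with our own custom business metrics, are all pushed to a Prometheus Pushgateway, which is then scraped by a self-hosted or Tencent Cloud Prometheus server. From there the data can be grouped, aggregated, and displayed in Grafana dashboards.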
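On the Flink side, the push is handled by the Pushgateway metrics reporter that ships with Flink. The following is a minimal sketch of the relevant entries in flink-conf.yaml, not the configuration actually used in this project. It assumes a recent Flink release with the factory-based reporter setup (older releases use different keys) and a Pushgateway reachable in-cluster at http://pushgateway:9091. The job name flink-metrics and the 60-second interval are illustrative placeholders.

```yaml
# Sketch only: the host URL, job name, and interval below are assumptions.
metrics.reporter.promgateway.factory.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterFactory
# Assumed in-cluster address of the Pushgateway Service
metrics.reporter.promgateway.hostUrl: http://pushgateway:9091
# Hypothetical job name under which metrics are grouped in the Pushgateway
metrics.reporter.promgateway.jobName: flink-metrics
# Append a random suffix so JobManager and TaskManagers do not overwrite each other's groups
metrics.reporter.promgateway.randomJobNameSuffix: true
# Keep the pushed metrics in the Pushgateway after the Flink process exits
metrics.reporter.promgateway.deleteOnShutdown: false
# How often Flink pushes metrics
metrics.reporter.promgateway.interval: 60 SECONDS
```

With deleteOnShutdown set to false, stale metric groups remain in the Pushgateway until they are deleted explicitly, so whether to keep this setting depends on how the dashboards treat finished jobs.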
Deploying Pushgateway
deploy.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  labels:
    app: pushgateway
  annotations:
    prometheus.io/scrape: "true"
    # Pushgateway listens on 9091, matching containerPort below
    prometheus.io/port: "9091"
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: pushgateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "25%"
      maxUnavailable: "25%"
  template:
    metadata:
      name: pushgateway
      labels:
        app: pushgateway
    spec:
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.5.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            initialDelaySeconds: 600
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
            httpGet:
              path: /
              port: 9091
          ports:
            - name: "app-port"
              containerPort: 9091
          resources:
            limits:
              memory: "1000Mi"
              cpu: 1
            requests:
              memory: "1000Mi"
              cpu: 1
service.yaml
apiVersion: v1
kind: Service
metadata:
  name: pushgateway
  labels:
    app: pushgateway
spec:
  selector:
    app: pushgateway
  ports:
    - name: pushgateway
      port: 9091
      targetPort: 9091
Configuring Prometheus
Prometheus is deployed with the official chart, so the scrape job is configured directly in the values file.
additionalScrapeConfigs:
  # Scrape job in the Prometheus configuration that pulls metrics from the Pushgateway
  - job_name: 'pushgateway'
    scrape_interval: 60s
    metrics_path: /metrics
    static_configs:
      - targets: ["pushgateway-dev.xxxxxx.com"]
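One setting worth considering for this job is honor_labels. By default, when a scraped series carries a job or instance label that conflicts with the labels Prometheus attaches to the target, the scraped ones are renamed to exported_job and exported_instance. The Prometheus documentation recommends honor_labels: true for Pushgateway targets so that the labels pushed by the clients are kept. The sketch below is the same job with that single line added; it is a suggestion, not part of the original setup.

```yaml
additionalScrapeConfigs:
  - job_name: 'pushgateway'
    # Assumption: keep the job/instance labels pushed by Flink instead of renaming them to exported_*
    honor_labels: true
    scrape_interval: 60s
    metrics_path: /metrics
    static_configs:
      - targets: ["pushgateway-dev.xxxxxx.com"]
```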
Configuring alerts in Grafana
Issues encountered
308 Permanent Redirect
echo "some_metric 3.14" | curl --data-binary @- xxxxxx.com/metrics/job…
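The nginx.ingress.kubernetes.io/ssl-redirect annotation defaults to true, so when TLS is enabled on the Ingress, plain HTTP requests are answered with a 308 redirect to HTTPS. Setting the annotation to false on the Ingress resolved the problem.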
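For reference, a minimal sketch of an Ingress with that annotation follows. It assumes the ingress-nginx controller and the Service defined in service.yaml. The Ingress name, the ingressClassName value, and the TLS secret name pushgateway-tls are hypothetical placeholders, and the host is the same elided domain used in the scrape config.

```yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pushgateway            # hypothetical name
  annotations:
    # Disable the default HTTP -> HTTPS 308 redirect so plain HTTP pushes are accepted
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx      # assumption: ingress-nginx is registered under this class
  tls:
    - hosts:
        - pushgateway-dev.xxxxxx.com
      secretName: pushgateway-tls   # hypothetical secret
  rules:
    - host: pushgateway-dev.xxxxxx.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pushgateway
                port:
                  number: 9091
```

An alternative that keeps the redirect enabled is to push over https:// from the client, in which case the annotation does not need to change.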