A Hands-On Guide to ArgoCD: Deploying Pushgateway to Collect Flink Metrics


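Our project needs finer-grained monitoring of the Flink cluster's state. This post walks through the overall metrics monitoring approach and the specific deployment and configuration details.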


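Flink's built-in metrics, together with any custom business metrics, are pushed to a Prometheus Pushgateway. A self-hosted or Tencent Cloud Prometheus server then scrapes them from the gateway, after which they can be grouped, aggregated, and displayed in Grafana dashboards.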
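On the Flink side, the push is done by the Pushgateway metrics reporter. The fragment below is a minimal sketch of the relevant flink-conf.yaml entries, assuming Flink 1.16 or later (older releases configure the reporter with separate host and port keys instead of hostUrl). The reporter name promgateway, the job name, and the in-cluster address http://pushgateway:9091 are illustrative assumptions, not values taken from this setup.

```yaml
# flink-conf.yaml (sketch; assumes Flink 1.16+ and the Prometheus reporter jar on the classpath)
metrics.reporter.promgateway.factory.class: org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporterFactory
# Assumed address: the "pushgateway" Service on port 9091, reachable from the Flink pods
metrics.reporter.promgateway.hostUrl: http://pushgateway:9091
# Hypothetical job name; a random suffix keeps JobManager and TaskManagers from overwriting each other
metrics.reporter.promgateway.jobName: flink-metrics
metrics.reporter.promgateway.randomJobNameSuffix: true
# Remove this process's metrics from the gateway when it shuts down
metrics.reporter.promgateway.deleteOnShutdown: true
# How often metrics are pushed
metrics.reporter.promgateway.interval: 60 SECONDS
```

With randomJobNameSuffix enabled, each Flink process pushes under its own job label, so stale groups can accumulate on the gateway if a process dies without a clean shutdown.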

Deploying Pushgateway

deploy.yaml

apiVersion: apps/v1
kind: Deployment
metadata:
  name: pushgateway
  labels:
    app: pushgateway
  annotations:
    prometheus.io/scrape: "true"
    # Pushgateway listens on 9091 (see containerPort below), not 8080
    prometheus.io/port: "9091"
spec:
  replicas: 1
  revisionHistoryLimit: 0
  selector:
    matchLabels:
      app: pushgateway
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: "25%"
      maxUnavailable: "25%"
  template:
    metadata:
      name: pushgateway
      labels:
        app: pushgateway
    spec:
      containers:
        - name: pushgateway
          image: prom/pushgateway:v1.5.1
          imagePullPolicy: IfNotPresent
          livenessProbe:
            initialDelaySeconds: 600
            periodSeconds: 10
            successThreshold: 1
            failureThreshold: 10
            httpGet:
              path: /
              port: 9091
          ports:
            - name: "app-port"
              containerPort: 9091
          resources:
            limits:
              memory: "1000Mi"
              cpu: 1
            requests:
              memory: "1000Mi"
              cpu: 1

service.yaml


apiVersion: v1
kind: Service
metadata:
  name: pushgateway
  labels:
    app: pushgateway
spec:
  selector:
    app: pushgateway
  ports:
    - name: pushgateway
      port: 9091
      targetPort: 9091
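Since these manifests are delivered through ArgoCD, the two files can be tracked by an Application resource. The manifest below is a minimal sketch under stated assumptions: the repository URL, the path, the target namespace, and the sync policy are all hypothetical placeholders and are not taken from this setup.

```yaml
# application.yaml (sketch; repoURL, path, and namespaces are placeholders)
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: pushgateway
  namespace: argocd            # namespace where ArgoCD itself runs (assumed)
spec:
  project: default
  source:
    repoURL: https://git.example.com/ops/k8s-manifests.git   # hypothetical repository
    targetRevision: HEAD
    path: pushgateway          # directory containing deploy.yaml and service.yaml
  destination:
    server: https://kubernetes.default.svc
    namespace: monitoring      # hypothetical target namespace
  syncPolicy:
    automated:
      prune: true              # delete resources that were removed from Git
      selfHeal: true           # revert manual drift in the cluster
    syncOptions:
      - CreateNamespace=true
```

Automated sync is a design choice rather than a requirement; leaving out the syncPolicy block keeps syncing manual.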

Configuring Prometheus

Prometheus is deployed with the official chart, so the scrape job is configured directly in the values file:

    additionalScrapeConfigs:
    # Scrape job through which Prometheus pulls metrics from the Pushgateway
    - job_name: 'pushgateway'
      scrape_interval: 60s
      metrics_path: /metrics
      static_configs:
      - targets: ["pushgateway-dev.xxxxxx.com"]
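The Pushgateway documentation recommends setting honor_labels: true on this scrape job so that the job and instance labels pushed by clients are not overwritten by the labels Prometheus attaches to the scrape target. The variant below is a sketch that assumes the kube-prometheus-stack chart (where the key lives under prometheus.prometheusSpec) and an in-cluster target. The namespace monitoring is a hypothetical placeholder.

```yaml
# values.yaml (sketch; assumes the kube-prometheus-stack chart layout)
prometheus:
  prometheusSpec:
    additionalScrapeConfigs:
      - job_name: 'pushgateway'
        # Keep the job/instance labels that Flink pushed instead of the scrape target's labels
        honor_labels: true
        scrape_interval: 60s
        metrics_path: /metrics
        static_configs:
          # Assumed in-cluster address: <service>.<namespace>.svc:<port>
          - targets: ["pushgateway.monitoring.svc:9091"]
```

Scraping the Service directly also avoids going through the Ingress, so the redirect problem described later does not affect Prometheus itself.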

Configuring alerts in Grafana

Troubleshooting

308 Permanent Redirect

echo "some_metric 3.14" | curl --data-binary @- xxxxxx.com/metrics/job…

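The nginx.ingress.kubernetes.io/ssl-redirect annotation defaults to true, so when TLS is enabled on the Ingress, plain HTTP requests (curl uses HTTP when the URL has no scheme) are answered with a 308 redirect to HTTPS.

Setting the annotation to false on the Ingress resolved the problem.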
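For reference, the fix amounts to a single annotation on the Ingress that exposes the Pushgateway. The manifest below is a sketch: the host name, the ingress class, and the resource name are placeholders, and only the annotation itself comes from the fix described above.

```yaml
# ingress.yaml (sketch; host and ingressClassName are placeholders)
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: pushgateway
  annotations:
    # Stop ingress-nginx from answering HTTP requests with a 308 redirect to HTTPS
    nginx.ingress.kubernetes.io/ssl-redirect: "false"
spec:
  ingressClassName: nginx
  rules:
    - host: pushgateway-dev.example.com   # placeholder host
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: pushgateway
                port:
                  number: 9091
```

An alternative that keeps the redirect in place is to have clients push to the https:// URL explicitly.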

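References

Flink metrics collection approach

Pushgateway on k8s: deployment YAML

Using PushGateway with Prometheus for metric reporting and collection

Stream Compute Service Oceanus: integrating with Prometheus for custom monitoring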