Prometheus Deployment in Detail (with Alertmanager and Grafana)


1. Getting to Know Prometheus

1.1 Prometheus overview

Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.

Prometheus is an open-source monitoring and alerting toolkit, originally developed at SoundCloud and now in wide use across the industry. It joined the CNCF in 2016 as the foundation's second hosted project, after Kubernetes, and later became its second project to graduate. Riding the popularity of Kubernetes, Prometheus has been adopted by more and more companies and has become the monitoring system of choice for a new generation of infrastructure.

Prometheus features

  • A multi-dimensional data model of time series identified by metric name and key/value labels
  • PromQL, a flexible query language for querying this multi-dimensional data
  • No reliance on distributed storage; single server nodes are autonomous
  • Time series collected with a pull model over HTTP
  • Pushing time series is also supported, via an intermediary gateway
  • Targets discovered through service discovery or static configuration
  • Built-in graphing of multi-dimensional data, plus Grafana integration for dashboards

1.2 Prometheus architecture

(Prometheus architecture diagram)

The Prometheus architecture consists of the following components:

  • prometheus-server: the main server, which scrapes and stores metrics from exporters and exposes the PromQL query language

    • Retrieval: the scraping module; pulls data from exporters and the Pushgateway and applies processing rules to what it collects
    • TSDB: the time-series database that stores the scraped data, locally by default
    • HTTP server: serves the query API and web UI, on port 9090 by default, where you can query metrics and draw graphs
    • PromQL: a convenient query language for aggregating data, exporting it, and feeding display integrations
  • Data collection: the collection layer supports two modes, pull and push

    • Jobs/exporters: collect host and container metrics, scraped over HTTP; many exporters exist for different data sources
    • Short-lived jobs: transient tasks suited to instantaneous metrics; they may already have exited when the server tries to scrape them, so they report actively
    • Pushgateway: the push gateway; short-lived jobs push their data to it, and the server then pulls from the gateway
  • Data display: PromQL powers the presentation layer, which includes the Prometheus UI, Grafana, and API clients

    • Prometheus Web UI: the built-in query and graphing UI, served over HTTP on port 9090
    • Grafana: an excellent open-source visualization framework that pulls data from Prometheus and renders it with dashboard templates
    • API Clients: client SDKs in several languages, including Go, Python, and Java, for building monitoring tooling
  • Alerting: the server pushes firing alerts to Alertmanager, which deduplicates and groups them, then sends notifications via channels such as:

    • PagerDuty
    • Email (delivered through SMTP)
    • Others, such as webhooks
  • Service discovery: targets are discovered through third-party integrations such as DNS, Consul, and Kubernetes; for example, Prometheus can query the Kubernetes API server for the target list and then scrape each target on a schedule.
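As a sketch of how such service discovery looks in practice, the fragment below (illustrative only, not part of the manifests deployed later) uses Prometheus's built-in `kubernetes_sd_configs` mechanism to discover pods and keep only those annotated for scraping; the `prometheus.io/*` annotation names follow the same convention used in this article's Service manifests:

```yaml
scrape_configs:
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod              # discover every pod through the Kubernetes API server
    relabel_configs:
      # keep only pods annotated with prometheus.io/scrape: "true"
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"
      # if a prometheus.io/port annotation is present, scrape that port
      - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
        action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        target_label: __address__
```

Note that Prometheus needs RBAC permissions (a ServiceAccount with list/watch access to pods) for this to work in-cluster.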

Deploying Prometheus

Deploying the Alertmanager alerting service

apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-conf
  namespace: bigdata
data:
  alertmanager.yml: |-
    route:
      group_by: [ 'alertname' ]
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: [ 'alertname', 'dev', 'instance' ]
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-alert
    alertname: prometheus-alert
  name: prometheus-alert
  namespace: bigdata
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-alert
      alertname: prometheus-alert
  template:
    metadata:
      labels:
        app: prometheus-alert
        alertname: prometheus-alert
    spec:
      containers:
        - image: prom/alertmanager:v0.23.0
          name: prometheus-alert
          env:
            - name: TZ
              value: "Asia/Shanghai"
          ports:
            - containerPort: 9093
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: pod-time
              mountPath: /etc/localtime
            - name: alertmanager-conf
              mountPath: /etc/alertmanager
            - name: alertmanager-template
              mountPath: /etc/alertmanager/tmpl
      volumes:
        - name: alertmanager-conf
          configMap:
            name: alertmanager-conf
        - name: pod-time
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
        - name: alertmanager-template
          configMap:
            name: alter-template  # notification-template ConfigMap; must be created separately (not shown in this article)
            
---
apiVersion: v1
kind: Service
metadata:
  labels:
    alertname: prometheus-alert
  name: alertmanager-svc
  namespace: bigdata
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9093'
spec:
  type: NodePort
  ports:
    - name: http
      port: 9093
      targetPort: 9093
  selector:
    app: prometheus-alert
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: alertmanager-igs
  namespace: bigdata
spec:
  rules:
    - host: alertmanager.study.com
      http:
        paths:
          - backend:
              service:
                name: alertmanager-svc
                port:
                  number: 9093
            path: /
            pathType: ImplementationSpecific
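The Deployment above mounts a ConfigMap named `alter-template` that is never defined in this article; it must exist before the pod can start. A minimal sketch of what it might contain is shown below; the file name and template content are assumptions for illustration (Alertmanager templates use Go's template syntax):

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: alter-template        # name referenced by the alertmanager Deployment above
  namespace: bigdata
data:
  email.tmpl: |-
    {{ define "email.html" }}
    {{ range .Alerts }}
    Alert: {{ .Labels.alertname }}<br>
    Instance: {{ .Labels.instance }}<br>
    Summary: {{ .Annotations.summary }}<br>
    {{ end }}
    {{ end }}
```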

Creating the Prometheus ConfigMap

apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: bigdata
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 30s
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
    rule_files:
      - /etc/prometheus/rules/*.yaml
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["alertmanager-svc:9093"]            
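The `rule_files` glob above expects rule files under `/etc/prometheus/rules`, and the Deployment in the next section mounts that path from a ConfigMap named `prometheus-rule`, which is not shown in this article and must be created first. A minimal sketch of such a ConfigMap, with one illustrative alerting rule:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-rule       # name referenced by the prometheus Deployment below
  namespace: bigdata
data:
  node-rules.yaml: |
    groups:
      - name: instance-health
        rules:
          - alert: InstanceDown
            expr: up == 0          # the target failed its last scrape
            for: 1m
            labels:
              severity: critical
            annotations:
              summary: "Instance {{ $labels.instance }} is down"
```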

Creating the PVC required by Prometheus

The PV here is backed by NFS.

apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv1
  namespace: bigdata
spec:
  # persistent volume capacity
  capacity:
    storage: 200Mi
  # access modes
  accessModes:
    - ReadWriteMany
  # reclaim policy
  persistentVolumeReclaimPolicy: Retain
  # storage class name
  storageClassName: nfs-pv
  # NFS configuration
  nfs:
    # NFS server address
    server: 192.168.199.91
    # NFS path; this directory must already exist on the NFS server
    path: "/data/promethues/grafana"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc1
  namespace: bigdata
spec:
  # storage class of the claim (must match the PV above)
  storageClassName: nfs-pv
  # access modes
  accessModes:
    - ReadWriteMany
  # resource request
  resources:
    requests:
      storage: 200Mi  # capacity

Deploying Prometheus

apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: bigdata
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      volumes:
        - name: config-volume
          configMap:
            name:  prometheus-config
            defaultMode: 420
        - name: prometheus-rule
          configMap:
            name: prometheus-rule
        - name: pod-time
          hostPath:
            path: /etc/localtime
        - name: storage-volume
          persistentVolumeClaim:
            claimName: nfs-pvc1
      containers:
        - name: prometheus-server
          image: prom/prometheus:v2.34.0
          ports:
            - containerPort: 9090
              protocol: TCP
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus/
            - name: storage-volume
              mountPath: /prometheus
            - name: pod-time
              mountPath: /etc/localtime
            - name: prometheus-rule
              mountPath: /etc/prometheus/rules
          imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: bigdata
  labels:
    app: prometheus
spec:
  ports:
    - port: 9090
  selector:
    app: prometheus
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: bigdata
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: prometheus.study.com
      http:
        paths:
          - path: /
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 9090
            pathType: ImplementationSpecific

Deploying Grafana

Creating the Grafana PV and PVC

apiVersion: v1
kind: PersistentVolume
metadata:
  name: bigdata-grafana-pv
  namespace: bigdata
spec:
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: bigdata-grafana-pv
  nfs:
    server: 192.168.1.1
    path: "/data/bigdata/grafana"   # NFS directory; must already exist on the NFS server
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bigdata-grafana-pvc
  namespace: bigdata
spec:
  storageClassName: bigdata-grafana-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Mi  # capacity

Deploying Grafana

apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: bigdata
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      volumes:

        - name: pod-time
          hostPath:
            path: /etc/localtime
        - name: grafana-pvc
          persistentVolumeClaim:
            claimName: bigdata-grafana-pvc
      containers:
        - image: grafana/grafana-oss:6.7.6
          name: grafana-core
          imagePullPolicy: IfNotPresent
          volumeMounts:
          - name: pod-time
            mountPath: /etc/localtime
          - name: grafana-pvc
            mountPath: /var/lib/grafana
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: bigdata
  labels:
    app: grafana
    component: core
spec:
  #  type: NodePort
  ports:
    - port: 3000
  selector:
    app: grafana
    component: core
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: bigdata
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: grafana.study.com
      http:
        paths:
          - path: /
            backend:
              service:
                name: grafana
                port:
                  number: 3000
            pathType: ImplementationSpecific
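Once Grafana is running, the Prometheus data source can be added through the UI, or provisioned declaratively by mounting a file under /etc/grafana/provisioning/datasources (Grafana's default provisioning directory). A hedged sketch of such a provisioning file; the file itself and the `isDefault` choice are assumptions, while the URL points at the `prometheus-svc` Service defined earlier:

```yaml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # in-cluster DNS name of the prometheus Service in the bigdata namespace
    url: http://prometheus-svc.bigdata.svc:9090
    isDefault: true
```

With the data source in place, community dashboards (for example the popular node-exporter dashboards) can be imported by ID from grafana.com.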