1. Getting to Know Prometheus
1.1 Introduction to Prometheus
Prometheus is an open-source systems monitoring and alerting toolkit originally built at SoundCloud. It is now a standalone open source project and maintained independently of any company. To emphasize this, and to clarify the project's governance structure, Prometheus joined the Cloud Native Computing Foundation in 2016 as the second hosted project, after Kubernetes.
Prometheus is an open-source toolkit for both monitoring and alerting. Originally developed at SoundCloud, it is now used widely across the industry; it joined the CNCF in 2016 as the second hosted project after Kubernetes, and went on to become the second project to graduate from the CNCF. Riding on the popularity of Kubernetes, Prometheus has been adopted by more and more companies and has established itself as the monitoring system of the new generation.
Prometheus features:
- A multi-dimensional data model of time series identified by metric name and key/value label pairs
- PromQL, a convenient query language for slicing and aggregating this multi-dimensional data
- No reliance on distributed storage; single server nodes are autonomous
- Time series collection via a pull model over HTTP
- Pushing time series is supported via an intermediary gateway
- Targets discovered via service discovery or static configuration
- Built-in multi-dimensional graphing, plus Grafana integration for dashboards
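For example, a single PromQL expression can filter and aggregate a metric across its labels. A minimal sketch, assuming the standard node_exporter metric node_cpu_seconds_total is being scraped:

# Per-instance CPU usage over the last 5 minutes, excluding idle time
sum by (instance) (rate(node_cpu_seconds_total{mode!="idle"}[5m]))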
1.2 Prometheus Architecture
(Figure: Prometheus architecture)
Prometheus architecture:
- prometheus-server: the main Prometheus server, which scrapes and stores data from exporters and serves the PromQL query language
  - Retrieval: the scraping module; collects data from exporters and the pushgateway and applies processing rules to it
  - TSDB: the storage layer; a time-series database that persists the data Retrieval collects, on local disk by default
  - HTTP server: exposes an HTTP query API and a built-in dashboard, on port 9090 by default, where you can query metrics and draw graphs
  - PromQL: the convenient PromQL query language, used for aggregation, data export, and integration with display front ends
- Data collection: two collection modes, pull and push
  - Jobs/exporters: collect performance metrics from hosts and containers; scraped over HTTP, with many different data types supported
  - Short-lived jobs: transient tasks that may have already exited by the time the server scrapes, so they report their metrics proactively
  - Pushgateway: a push gateway; short-lived jobs push their data to it, and the server then pulls from the gateway
- Data display: built on PromQL, including the Prometheus UI, Grafana, and API clients
  - Prometheus Web UI: the default query and graphing UI that Prometheus ships with, served over HTTP on port 9090
  - Grafana: an excellent open-source visualization framework that reads data from Prometheus and renders it with dashboard templates
  - API clients: SDKs for multiple languages, including Go, Python, and Java, for building your own monitoring tooling
- Alerting: the server evaluates alerting rules and pushes alerts to Alertmanager, which deduplicates and groups them before sending notifications via:
  - PagerDuty
  - Email, via SMTP
  - Others, such as webhooks
- Service discovery: target discovery through third-party integrations such as DNS, Consul, and Kubernetes; for example, Prometheus can query the Kubernetes apiserver for the list of targets and then scrape them periodically.
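As an illustration of that last point, a scrape job can discover targets from the Kubernetes apiserver instead of a static list. A minimal prometheus.yml fragment using the standard kubernetes_sd_configs mechanism (the job name and annotation convention here are illustrative assumptions):

scrape_configs:
  # Hypothetical job: discover every Pod in the cluster via the apiserver,
  # then keep only Pods annotated with prometheus.io/scrape: "true".
  - job_name: "kubernetes-pods"
    kubernetes_sd_configs:
      - role: pod
    relabel_configs:
      - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
        action: keep
        regex: "true"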
P8s Deployment
Deploy the alerting service, Alertmanager
apiVersion: v1
kind: ConfigMap
metadata:
  name: alertmanager-conf
  namespace: bigdata
data:
  alertmanager.yml: |-
    route:
      group_by: ['alertname']
      group_wait: 30s
      group_interval: 5m
      repeat_interval: 1h
      receiver: 'web.hook'
    receivers:
      - name: 'web.hook'
        webhook_configs:
          - url: 'http://127.0.0.1:5001/'
    inhibit_rules:
      - source_match:
          severity: 'critical'
        target_match:
          severity: 'warning'
        equal: ['alertname', 'dev', 'instance']
---
apiVersion: apps/v1
kind: Deployment
metadata:
  labels:
    app: prometheus-alert
    alertname: prometheus-alert
  name: prometheus-alert
  namespace: bigdata
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus-alert
      alertname: prometheus-alert
  template:
    metadata:
      labels:
        app: prometheus-alert
        alertname: prometheus-alert
    spec:
      containers:
        - image: prom/alertmanager:v0.23.0
          name: prometheus-alert
          env:
            - name: TZ
              value: "Asia/Shanghai"
          ports:
            - containerPort: 9093
          resources:
            limits:
              cpu: 200m
              memory: 200Mi
            requests:
              cpu: 100m
              memory: 100Mi
          volumeMounts:
            - name: pod-time
              mountPath: /etc/localtime
            - name: alertmanager-conf
              mountPath: /etc/alertmanager
            - name: alertmanager-template
              mountPath: /etc/alertmanager/tmpl
      volumes:
        - name: alertmanager-conf
          configMap:
            name: alertmanager-conf
        - name: pod-time
          hostPath:
            path: /usr/share/zoneinfo/Asia/Shanghai
        - name: alertmanager-template
          configMap:
            name: alter-template  # referenced here but not defined in this document; see the sketch below
---
apiVersion: v1
kind: Service
metadata:
  labels:
    alertname: prometheus-alert
  name: alertmanager-svc
  namespace: bigdata
  annotations:
    prometheus.io/scrape: 'true'
    prometheus.io/port: '9093'  # the port Alertmanager actually serves on
spec:
  type: NodePort
  ports:
    - name: http
      port: 9093
      targetPort: 9093
  selector:
    app: prometheus-alert
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    kubernetes.io/ingress.class: nginx
  name: alertmanager-igs
  namespace: bigdata
spec:
  rules:
    - host: alertmanager.study.com
      http:
        paths:
          - backend:
              service:
                name: alertmanager-svc
                port:
                  number: 9093
            path: /
            pathType: ImplementationSpecific
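The Deployment above mounts a ConfigMap named alter-template that is not defined anywhere in this section; it must exist, or the Pod will fail to start. A minimal sketch of what it might contain (the file name and template body are assumptions, not taken from the original):

apiVersion: v1
kind: ConfigMap
metadata:
  name: alter-template
  namespace: bigdata
data:
  # Hypothetical notification template; to take effect it must also be
  # referenced from alertmanager.yml via a `templates:` entry such as
  # "/etc/alertmanager/tmpl/*.tmpl".
  email.tmpl: |-
    {{ define "email.subject" }}[{{ .Status }}] {{ .CommonLabels.alertname }}{{ end }}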
Deploy P8s
apiVersion: v1
kind: ConfigMap
metadata:
  name: prometheus-config
  namespace: bigdata  # must live in the same namespace as the Prometheus Deployment that mounts it
data:
  prometheus.yml: |
    global:
      scrape_interval: 15s
      evaluation_interval: 30s
    scrape_configs:
      - job_name: "prometheus"
        static_configs:
          - targets: ["localhost:9090"]
    rule_files:
      - /etc/prometheus/rules/*.yaml
    alerting:
      alertmanagers:
        - static_configs:
            - targets: ["alertmanager-svc:9093"]
Deploy the PVC that P8s needs
The PV here is backed by NFS.
apiVersion: v1
kind: PersistentVolume
metadata:
  name: nfs-pv1  # PersistentVolumes are cluster-scoped, so no namespace is needed
spec:
  # Volume capacity
  capacity:
    storage: 200Mi
  # Access modes
  accessModes:
    - ReadWriteMany
  # Reclaim policy
  persistentVolumeReclaimPolicy: Retain
  # Storage class name
  storageClassName: nfs-pv
  # NFS configuration
  nfs:
    # NFS server address
    server: 192.168.199.91
    # NFS path; this directory must already exist on the NFS server
    path: "/data/promethues/grafana"
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: nfs-pvc1
  namespace: bigdata
spec:
  # Must match the PV's storageClassName (nfs-pv) for the claim to bind
  storageClassName: nfs-pv
  # Access modes
  accessModes:
    - ReadWriteMany
  # Requested capacity
  resources:
    requests:
      storage: 200Mi
Deploy P8s
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus
  namespace: bigdata
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
  template:
    metadata:
      labels:
        app: prometheus
    spec:
      volumes:
        - name: config-volume
          configMap:
            name: prometheus-config
            defaultMode: 420
        - name: prometheus-rule
          configMap:
            name: prometheus-rule  # the rules ConfigMap sketched above
        - name: pod-time
          hostPath:
            path: /etc/localtime
        - name: storage-volume
          persistentVolumeClaim:
            claimName: nfs-pvc1  # the PVC created above
      containers:
        - name: prometheus-server
          image: prom/prometheus:v2.34.0
          ports:
            - containerPort: 9090
              protocol: TCP
          resources: {}
          volumeMounts:
            - name: config-volume
              mountPath: /etc/prometheus/
            - name: storage-volume
              mountPath: /prometheus
            - name: pod-time
              mountPath: /etc/localtime
            - name: prometheus-rule
              mountPath: /etc/prometheus/rules
          imagePullPolicy: IfNotPresent
---
apiVersion: v1
kind: Service
metadata:
  name: prometheus-svc
  namespace: bigdata
  labels:
    app: prometheus
spec:
  ports:
    - port: 9090
  selector:
    app: prometheus
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: prometheus-ingress
  namespace: bigdata
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: prometheus.study.com  # Ingress hostnames must be lowercase
      http:
        paths:
          - path: /
            backend:
              service:
                name: prometheus-svc
                port:
                  number: 9090
            pathType: ImplementationSpecific
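Note that the `alerting` block in prometheus.yml only tells Prometheus where to send alerts; it does not scrape Alertmanager's own metrics, and the annotation-based discovery implied by the Service's prometheus.io/scrape annotation needs a kubernetes_sd_configs job like the one sketched in section 1.2. With only what this section defines, a simple static job would do (a sketch to merge into the scrape_configs above):

    scrape_configs:
      # ... existing jobs ...
      - job_name: "alertmanager"
        static_configs:
          # Service DNS name, resolvable within the bigdata namespace
          - targets: ["alertmanager-svc:9093"]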
Deploy Grafana
Create the Grafana PV
apiVersion: v1
kind: PersistentVolume
metadata:
  name: bigdata-grafana-pv  # PersistentVolumes are cluster-scoped
spec:
  capacity:
    storage: 200Mi
  accessModes:
    - ReadWriteMany
  persistentVolumeReclaimPolicy: Retain
  storageClassName: bigdata-grafana-pv
  nfs:
    server: 192.168.1.1
    path: "/data/bigdata/grafana"  # this directory must already exist on the NFS server
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: bigdata-grafana-pvc
  namespace: bigdata
spec:
  storageClassName: bigdata-grafana-pv
  accessModes:
    - ReadWriteMany
  resources:
    requests:
      storage: 200Mi  # requested capacity
Deploy Grafana
apiVersion: apps/v1
kind: Deployment
metadata:
  name: grafana-core
  namespace: bigdata
  labels:
    app: grafana
    component: core
spec:
  replicas: 1
  selector:
    matchLabels:
      app: grafana
  template:
    metadata:
      labels:
        app: grafana
        component: core
    spec:
      volumes:
        - name: pod-time
          hostPath:
            path: /etc/localtime
        - name: grafana-pvc
          persistentVolumeClaim:
            claimName: bigdata-grafana-pvc
      containers:
        - image: grafana/grafana-oss:6.7.6
          name: grafana-core
          imagePullPolicy: IfNotPresent
          volumeMounts:
            - name: pod-time
              mountPath: /etc/localtime
            - name: grafana-pvc
              mountPath: /var/lib/grafana  # Grafana's data directory (dashboards, SQLite DB)
---
apiVersion: v1
kind: Service
metadata:
  name: grafana
  namespace: bigdata
  labels:
    app: grafana
    component: core
spec:
  # type: NodePort
  ports:
    - port: 3000
  selector:
    app: grafana
    component: core
---
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: grafana
  namespace: bigdata
  annotations:
    kubernetes.io/ingress.class: nginx
spec:
  rules:
    - host: grafana.jjiy.com
      http:
        paths:
          - path: /
            backend:
              service:
                name: grafana
                port:
                  number: 3000
            pathType: ImplementationSpecific
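Once Grafana is up, it still needs Prometheus added as a data source. This can be done by hand in the UI, or declaratively with Grafana's standard datasource provisioning mechanism. A minimal sketch, assuming a hypothetical ConfigMap that would additionally have to be mounted at /etc/grafana/provisioning/datasources in the Deployment above (that mount is not shown in the original):

apiVersion: v1
kind: ConfigMap
metadata:
  name: grafana-datasources  # hypothetical name, not part of the original manifests
  namespace: bigdata
data:
  prometheus.yaml: |-
    apiVersion: 1
    datasources:
      - name: Prometheus
        type: prometheus
        access: proxy
        # in-cluster Service DNS for the Prometheus Service defined above
        url: http://prometheus-svc.bigdata.svc:9090
        isDefault: true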