Service Monitoring and Alerting Stack


Prometheus

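Overview: Prometheus is a service monitoring system. It collects metrics from configured targets at a given interval, evaluates rule expressions, displays the results, and triggers alerts when specified conditions are observed.

Installing Prometheus with Docker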

docker pull prom/prometheus
docker run -d -p 9090:9090 --restart=always --name prometheus -v /home/work/prometheus.yml:/etc/prometheus/prometheus.yml -v /home/work/rules:/usr/local/prometheus/rules prom/prometheus 

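Scrape configuration

In the Spring Boot project, add the micrometer-registry-prometheus dependency, enable Actuator, and expose the monitoring endpoints by setting management.endpoints.web.exposure.include: "*". The Prometheus configuration file (prometheus.yml, mounted into the container above) then looks like this: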

# Global settings
global:
  # How often to scrape targets
  scrape_interval: 15s
  # Timeout for each scrape
  scrape_timeout: 10s
# Alertmanager instances that receive alerts
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - ip:9093
# Alert rule files
rule_files:
  - "/usr/local/prometheus/rules/*.rules"

# Scrape jobs
scrape_configs:
  # Web service
  - job_name: 'book'
    # Path to scrape metrics from
    metrics_path: '/api/actuator/prometheus'
    # Address of the service to scrape
    static_configs:
      - targets: ['ip:8081']
  # MySQL
  - job_name: 'mysql'
    static_configs:
      - targets: ['ip:9104']
  # Redis
  - job_name: 'redis'
    static_configs:
      - targets: ['ip:9121']
  # Nginx
  - job_name: 'nginx'
    static_configs:
      - targets: ['ip:9913']
  # Linux (one target per host)
  - job_name: 'linux'
    static_configs:
      - targets: ['ip:9100','ip:9100']
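On the application side, the Spring Boot settings mentioned above might look roughly like the following application.yml. This is only a sketch: the port 8081 and the /api context path are assumptions inferred from the 'book' scrape job (target ip:8081, metrics_path /api/actuator/prometheus), not settings shown in the original project.

```yaml
# application.yml (sketch; port and context path are assumptions
# inferred from the 'book' scrape job above)
server:
  port: 8081
  servlet:
    context-path: /api

management:
  endpoints:
    web:
      exposure:
        # Expose all Actuator endpoints, including /actuator/prometheus
        include: "*"
```

Exposing every endpoint with "*" is convenient for a demo; listing only the endpoints you need (for example health and prometheus) is a more conservative choice.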

Alert rule configuration

groups:
- name: Warning
  rules:
  - alert: book-status
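    # up == 0 for the job named "book" means the service is down
    expr: up{job="book"} == 0
    # How long the condition must hold before the alert fires
    # Inactive -> Pending -> Firing
    # The alert notification is sent once the state becomes Firing
    for: 10s
    labels:
      status: Warning
      severity: 1
    annotations:
      summary: "{{$labels.instance}}: service down"
      description: "{{$labels.instance}}: service has been down for more than 10s"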
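The same pattern can be extended to the other scrape jobs. The sketch below is an illustration rather than part of the original setup: the group name, alert name, and the 1m duration are hypothetical, while the job names come from prometheus.yml above. To be picked up by the rule_files pattern, the file would need a .rules suffix and live in the mounted /home/work/rules directory.

```yaml
# exporters.rules (hypothetical file name; sketch only)
groups:
  - name: ExporterDown
    rules:
      - alert: exporter-status
        # Matches any of the exporter jobs defined in prometheus.yml
        expr: up{job=~"mysql|redis|nginx|linux"} == 0
        # Stay Pending for 1 minute before moving to Firing
        for: 1m
        labels:
          status: Warning
          severity: "1"
        annotations:
          summary: "{{ $labels.job }} exporter on {{ $labels.instance }} is down"
          description: "{{ $labels.instance }} could not be scraped for more than 1m"
```

A longer "for" value than 10s reduces noise from a single failed scrape, at the cost of a slower notification.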

Alertmanager

Installing Alertmanager with Docker

docker pull prom/alertmanager
docker run -d -p 9093:9093 --restart=always --name alertmanager -v /home/work/alertmanager/alertmanager.yml:/etc/alertmanager/alertmanager.yml -v /home/work/alertmanager:/alertmanager prom/alertmanager

alertmanager.yml
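The contents of this file are not shown in the original. As a minimal sketch, assuming alerts are forwarded to the prometheus-webhook-dingtalk service described in the next section (port 8060, target name webhook1), it could look like this. The receiver name and all timing values are illustrative assumptions.

```yaml
# alertmanager.yml (sketch; receiver name and timings are assumptions)
global:
  resolve_timeout: 5m

route:
  # Send everything to the DingTalk webhook receiver
  receiver: dingtalk
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 1m
  repeat_interval: 1h

receivers:
  - name: dingtalk
    webhook_configs:
      # "webhook1" must match the target name configured in prometheus-webhook-dingtalk
      - url: 'http://ip:8060/dingtalk/webhook1/send'
        send_resolved: true
```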

DingTalk alerts

Installing prometheus-webhook-dingtalk with Docker

# v0.3.0
docker pull timonwong/prometheus-webhook-dingtalk:v0.3.0
docker run -d -p 8060:8060 -v /home/work/alertmanager/template:/usr/share/prometheus-webhook-dingtalk/template --name webhook-dingding timonwong/prometheus-webhook-dingtalk:v0.3.0 --template.file=/usr/share/prometheus-webhook-dingtalk/template/default.tmpl --ding.profile="webhook1=https://oapi.dingtalk.com/robot/send?access_token=xx" 
# latest
# Requires a config.yml
docker run -d -p 8060:8060 -v /home/work/alertmanager/template:/usr/share/prometheus-webhook-dingtalk/template -v /home/work/webhook:/usr/share/prometheus-webhook-dingtalk --restart=always --name webhook-dingding timonwong/prometheus-webhook-dingtalk --config.file=/usr/share/prometheus-webhook-dingtalk/config.yml

DingTalk config.yml

# Custom template file
templates:
  - /usr/share/prometheus-webhook-dingtalk/template/default.tmpl
targets:
  webhook1:
    url: https://oapi.dingtalk.com/robot/send?access_token=xx
    mention:
      all: true
      # mobiles: ['123']

Custom template file: see the linked template.

Alert result

[Image: 告警.png, screenshot of the alert]

Advanced Nginx log monitoring with Loki (Nginx-Loki)

  • Docker installation
# Download loki-config.yaml
wget https://raw.githubusercontent.com/grafana/loki/v2.1.0/cmd/loki/loki-local-config.yaml -O loki-config.yaml
docker run -d -v $(pwd):/mnt/config -p 3100:3100 --name loki grafana/loki:2.1.0 -config.file=/mnt/config/loki-config.yaml
# Download promtail-config.yaml
wget https://raw.githubusercontent.com/grafana/loki/v2.1.0/cmd/promtail/promtail-docker-config.yaml -O promtail-config.yaml
docker run -d -v $(pwd):/mnt/config -v /usr/local/nginx/logs:/usr/local/nginx/logs --name promtail grafana/promtail:2.1.0 -config.file=/mnt/config/promtail-config.yaml
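The downloaded promtail-config.yaml usually needs to be edited so that Promtail reads the Nginx logs mounted above and pushes them to the Loki container. The following is a sketch under stated assumptions: ip stands for the Loki host as in the other configs, and the job label and *.log glob are illustrative choices.

```yaml
# promtail-config.yaml (sketch; labels and log glob are assumptions)
server:
  http_listen_port: 9080
  grpc_listen_port: 0

# Where Promtail remembers how far it has read each file
positions:
  filename: /tmp/positions.yaml

clients:
  # Push endpoint of the Loki container started above
  - url: http://ip:3100/loki/api/v1/push

scrape_configs:
  - job_name: nginx
    static_configs:
      - targets:
          - localhost
        labels:
          job: nginx
          # Matches the directory mounted into the promtail container
          __path__: /usr/local/nginx/logs/*.log
```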

Connecting the data to Grafana

Search for a dashboard template that fits your needs.
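Before a dashboard can show anything, Grafana needs Prometheus and Loki as data sources. They can be added through the UI; as an alternative, here is a sketch of a Grafana provisioning file. The file name, data source names, and the ip placeholder are assumptions, and the ports follow the containers started earlier.

```yaml
# provisioning/datasources/monitoring.yml (hypothetical file name; sketch only)
apiVersion: 1

datasources:
  - name: Prometheus
    type: prometheus
    access: proxy
    # Prometheus container from the first section
    url: http://ip:9090
    isDefault: true
  - name: Loki
    type: loki
    access: proxy
    # Loki container from the Nginx-Loki section
    url: http://ip:3100
```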

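Grafana visualization results

Overview: Grafana is an open-source analytics and monitoring solution for your data.

The process for creating your own dashboard is shown below.

[Image: 创建一个监控面板.gif, creating a dashboard]

Here are some templates I find useful (Grafana dashboard IDs in parentheses):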

  • Linux server monitoring (10795, 8919)
  • JVM (10280, 4701)
  • Nginx / Nginx-Loki (2949, 12559)
  • Redis (11835)
  • MySQL (7362)