Technology Selection for the Company Development Platform: Monitoring (Prometheus + Grafana)

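How to choose a suitable monitoring platform

  • 1. Does it support many kinds of monitoring targets (databases, operating systems, middleware, in-house applications written in Java, PHP, C#, and so on)?
  • 2. Does it offer rich built-in reports, custom reports, and alert notifications?
  • 3. Can it be extended with custom monitoring plugins?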

Building the monitoring platform

Using Prometheus

Prometheus: collects many kinds of metrics and stores them as time series.
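
To make "time series" concrete: each scrape returns plain-text sample lines, and every distinct combination of metric name and labels is one series. The following is a minimal Python sketch of reading such lines. It handles only the simple `name{labels} value` form, leaves escape sequences inside label values untouched, and the `http_requests_total` metric is a made-up example.

```python
import re

# One sample line of the Prometheus text exposition format:
#   metric_name{label="value",...} 1.0 [optional timestamp]
SAMPLE_RE = re.compile(
    r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)(?:\s+-?\d+)?$'
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Return (name, labels, value) for a sample line, or None for comments and blank lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = SAMPLE_RE.match(line)
    if match is None:
        raise ValueError(f"not a sample line: {line!r}")
    name, raw_labels, value = match.groups()
    labels = dict(LABEL_RE.findall(raw_labels or ""))
    return name, labels, float(value)


# Hypothetical scrape output; only the `up` series mirrors a job used later in this post.
EXAMPLE = '''
# HELP http_requests_total Total HTTP requests (made-up metric).
# TYPE http_requests_total counter
http_requests_total{method="get",code="200"} 1027
up{job="node",instance="192.168.0.106:9100"} 1
'''

samples = [s for s in map(parse_sample, EXAMPLE.splitlines()) if s]
for name, labels, value in samples:
    print(name, labels, value)
```

Prometheus attaches the scrape timestamp itself when a line carries none, which is why the sketch ignores the optional trailing timestamp.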

Architecture

Installation and configuration

  • 1. Download
wget https://github.com/prometheus/prometheus/releases/download/v2.13.0/prometheus-2.13.0.linux-amd64.tar.gz
  • 2. Configure and start
tar -zxvf prometheus-2.13.0.linux-amd64.tar.gz
cd prometheus-2.13.0.linux-amd64
./prometheus --help    # list the startup options
./prometheus --config.file='prometheus.yml'    # start with the given configuration file
Access URL: http://localhost:9090
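  • 2.1 Configuration file in detail

prometheus.yml reference

prometheus.yml:
# global settings
global:
# alerting (Alertmanager) settings
alerting:
# rule files
rule_files:
# scrape jobs (data collection)
scrape_configs:
 # job name
 - job_name:
   # how often to scrape
   scrape_interval:
   # per-scrape timeout
   scrape_timeout:
   # path to scrape metrics from (default: /metrics)
   metrics_path:
   # how to resolve conflicts between scraped labels and labels attached by Prometheus
   honor_labels:
   # protocol scheme used for requests (default: http)
   scheme:
   # URL parameters for the HTTP request
   params:
   # HTTP basic authentication (set either password or password_file, not both)
   basic_auth:
    username:
    password:
    password_file:
   # bearer token authentication
   bearer_token:
   # statically configured targets for this job
   static_configs:
    # target hosts
    - targets: []
      # labels added to every target in this group
      labels: {}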
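
Some of the fields above constrain each other: Prometheus rejects a job whose `scrape_timeout` exceeds its `scrape_interval`, and the authentication options are mutually exclusive. Below is a rough Python sketch of those checks on a job written as a dict. The `check_job` helper is hypothetical and not part of Prometheus. It assumes the job falls back to the stock global defaults (`1m` interval, `10s` timeout) when a value is unset, and it only understands single-unit durations such as `15s`.

```python
import re

UNIT_SECONDS = {"ms": 0.001, "s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def parse_duration(text):
    """Convert a single-unit Prometheus duration such as '15s' or '1m' to seconds."""
    match = re.fullmatch(r"(\d+)(ms|s|m|h|d|w|y)", text)
    if match is None:
        raise ValueError(f"unsupported duration: {text!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def check_job(job):
    """Return a list of human-readable problems found in one scrape job (hypothetical helper)."""
    problems = []
    if not job.get("job_name"):
        problems.append("job_name is required")
    # Assumption: unset values fall back to the stock global defaults.
    interval = parse_duration(job.get("scrape_interval", "1m"))
    timeout = parse_duration(job.get("scrape_timeout", "10s"))
    if timeout > interval:
        problems.append("scrape_timeout is greater than scrape_interval")
    auth = job.get("basic_auth", {})
    if "password" in auth and "password_file" in auth:
        problems.append("basic_auth: set password or password_file, not both")
    if auth and "bearer_token" in job:
        problems.append("use basic_auth or bearer_token, not both")
    return problems


job = {"job_name": "node", "scrape_interval": "5s", "scrape_timeout": "10s"}
print(check_job(job))
```

For a real configuration file, `promtool check config prometheus.yml` (shipped in the same tarball) performs the authoritative validation.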

Using Grafana

Grafana: an open-source analytics and monitoring system

Architecture

Usage (Linux)

  • 1. Install
wget https://dl.grafana.com/oss/release/grafana-6.4.2.linux-amd64.tar.gz
tar -zxvf grafana-6.4.2.linux-amd64.tar.gz
  • 2. Run
cd grafana-6.4.2.linux-amd64/bin
./grafana-server
http://localhost:3000
  • 2.1 Configuration
defaults.ini: default configuration
custom.ini or grafana.ini: custom configuration
-----------------------------------
paths: storage paths
server: server settings
database: database settings
remote_cache: cache settings
security: security settings
users: user settings
dashboards: dashboard settings
smtp: email (SMTP) settings
log: logging settings
alerting: alerting settings
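
The sections listed above are ordinary INI sections, and Grafana also lets any key be overridden through an environment variable named `GF_<SECTION>_<KEY>`. Here is a small Python sketch of both ideas. The sample file content is invented for illustration, and `env_override_name` is a hypothetical helper that only reproduces the naming rule.

```python
import configparser

# Invented sample content in the style of custom.ini; the values are placeholders.
SAMPLE_INI = """
[server]
http_port = 3000

[smtp]
enabled = true
host = smtp.example.com:465
"""


def load_settings(text):
    """Parse INI text into {section: {key: value}}; all values stay strings."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(text)
    return {section: dict(parser[section]) for section in parser.sections()}


def env_override_name(section, key):
    """Name of the environment variable that overrides [section] key in Grafana."""
    return f"GF_{section}_{key}".upper().replace(".", "_").replace("-", "_")


settings = load_settings(SAMPLE_INI)
print(settings["server"]["http_port"])
print(env_override_name("security", "admin_password"))
```

Environment overrides are convenient when the same custom.ini is shared across environments and only a few values, such as passwords, differ.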

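Building the monitoring platform

Using Prometheus alerting

Alerting reports on the state of the monitored servers, applications, and data, which is what monitoring is ultimately for.

Reference:
Alertmanager alert notification configuration

  • 1. alertmanager.yml parameters
# global settings (mail server, Slack, WeChat, and so on)
global: 
# notification template files
templates:
# routing tree that decides how alerts are dispatched; nodes are matched depth-first, left to right
# in practice: when, how, and to whom a notification is sent
route:
# notification receivers
receivers: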
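
The depth-first, left-to-right matching of the `route` tree can be sketched in a few lines of Python. This is a simplified model, not Alertmanager's code: it supports only exact-match `match` blocks and the `continue` flag, and the child routes and the receivers `ops-team` and `pager` are hypothetical.

```python
def label_match(labels, matcher):
    """True when every label in the matcher has exactly that value on the alert."""
    return all(labels.get(name) == value for name, value in matcher.items())


def find_receivers(route, labels, inherited=None):
    """Walk the routing tree depth-first and return the receivers that get the alert."""
    receiver = route.get("receiver", inherited)  # children inherit the parent's receiver
    matched = []
    for child in route.get("routes", []):
        if not label_match(labels, child.get("match", {})):
            continue
        matched.extend(find_receivers(child, labels, receiver))
        if not child.get("continue", False):
            break  # the first matching sibling wins unless it sets continue: true
    # When no child matches, the current node handles the alert itself.
    return matched or [receiver]


# Hypothetical tree: the root falls back to the 'forEmail' receiver.
ROUTE = {
    "receiver": "forEmail",
    "routes": [
        {"match": {"team": "node"}, "receiver": "ops-team"},
        {"match": {"severity": "critical"}, "receiver": "pager"},
    ],
}

print(find_receivers(ROUTE, {"team": "node", "severity": "critical"}))  # ['ops-team']
print(find_receivers(ROUTE, {"severity": "critical"}))                  # ['pager']
print(find_receivers(ROUTE, {"alertname": "x"}))                        # ['forEmail']
```

Because siblings are tried in order, more specific routes usually go first, with the root receiver acting as the catch-all.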
-------------------------------------
Reference:
global:
  resolve_timeout: 5m
  smtp_smarthost: 'smtp.exmail.qq.com:465' # SMTP server (host:port)
  smtp_from: 'xxxxx' # sender address
  smtp_auth_username: 'xxxx' # mailbox user name
  smtp_auth_password: 'xxxx' # mailbox password or authorization code
  smtp_require_tls: false # do not require STARTTLS
route:
  group_by: ['alertname']
  group_wait: 10s
  group_interval: 10s
  repeat_interval: 1h
  receiver: 'forEmail'
receivers:
# email receiver
- name: 'forEmail'
  email_configs:
  - to: 'xxxx'
    send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']
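
The `inhibit_rules` entry above mutes a `warning` alert while a `critical` alert with the same `alertname`, `dev`, and `instance` values is firing. The following Python sketch shows that logic under simplifying assumptions: exact-match labels only, and alerts represented as plain dicts. The sample alerts are invented.

```python
def label_match(labels, matcher):
    return all(labels.get(name) == value for name, value in matcher.items())


def is_inhibited(alert, firing_alerts, rule):
    """True when some firing source alert suppresses `alert` under `rule`."""
    if not label_match(alert, rule["target_match"]):
        return False
    for source in firing_alerts:
        if source is alert or not label_match(source, rule["source_match"]):
            continue
        # A missing label counts as an empty value, so labels absent on both sides are equal.
        if all(source.get(name, "") == alert.get(name, "") for name in rule["equal"]):
            return True
    return False


RULE = {
    "source_match": {"severity": "critical"},
    "target_match": {"severity": "warning"},
    "equal": ["alertname", "dev", "instance"],
}

# Invented alerts for illustration.
critical = {"alertname": "node-up", "instance": "192.168.0.106:9100", "severity": "critical"}
warning = {"alertname": "node-up", "instance": "192.168.0.106:9100", "severity": "warning"}
other = {"alertname": "node-up", "instance": "192.168.0.107:9100", "severity": "warning"}
firing = [critical, warning, other]

print(is_inhibited(warning, firing, RULE))  # True
print(is_inhibited(other, firing, RULE))    # False
```

Neither sample alert carries a `dev` label, yet the first warning is still inhibited, because a label missing on both sides compares as equal.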

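Alert rule configuration

1. Enable alerting in Prometheus (prometheus.yml)
# where to send alerts
alerting:
  alertmanagers:
  - static_configs:
    - targets: ["localhost:9093"]
2. Alert rule configuration
# in prometheus.yml: which rule files to load
rule_files:
    - "/data/monitor/prometheus-2.13.0.linux-amd64/rule/*.rules"
# in each rule file:
groups:
- name: # group name
  rules: 
  - alert: # alert name
    expr:  # PromQL expression that defines the alert condition
    # how long the condition must hold before the alert fires
    for:
    labels:
      severity:
      team:
    # descriptive information attached to the alert
    annotations:
      summary:
--------------------------------
Reference:
Server-down monitoring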
groups:
- name: node-up
  rules:
  - alert: node-up
    expr: up{job="node"} == 0
    for: 15s
    labels:
      severity: 1
      team: node
    annotations:
      summary: "{{ $labels.instance }} has been down for more than 15s!"
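
The `for: 15s` clause means the alert first becomes pending and only fires once the expression has stayed true for 15 seconds across consecutive rule evaluations. Here is a minimal Python sketch of that state machine, assuming a hypothetical 5-second evaluation interval and a made-up sequence of results.

```python
def alert_states(evaluations, hold_for):
    """Map (timestamp_seconds, condition_true) evaluations to alert states."""
    active_since = None
    states = []
    for timestamp, condition_true in evaluations:
        if not condition_true:
            active_since = None  # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = timestamp
        held = timestamp - active_since
        states.append("firing" if held >= hold_for else "pending")
    return states


# Made-up results of `up{job="node"} == 0`, evaluated every 5 seconds.
evaluations = [(0, True), (5, True), (10, True), (15, True), (20, False), (25, True)]
print(alert_states(evaluations, hold_for=15))
# ['pending', 'pending', 'pending', 'firing', 'inactive', 'pending']
```

A short `for` value reacts quickly but also notifies on brief blips; the timer restarts from zero after every recovery.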

Using common Prometheus exporters

Java applications (Spring Boot)
  • 1. Add the dependencies and configuration to the Spring Boot application
    <dependency>
        <groupId>org.springframework.boot</groupId>
        <artifactId>spring-boot-starter-actuator</artifactId>
    </dependency>
    <dependency>
        <groupId>io.micrometer</groupId>
        <artifactId>micrometer-registry-prometheus</artifactId>
        <version>1.1.3</version>
    </dependency>
        
    @Bean
    MeterRegistryCustomizer<MeterRegistry> configurer(
            @Value("${spring.application.name}") String applicationName) {
        return (registry) -> registry.config().commonTags("application", applicationName);
    }
management:
  endpoints:
    web:
      exposure:
        include: '*'
  metrics:
    tags:
      application: ${spring.application.name}
  • 2. Prometheus scrape configuration
- job_name: 'springboot_prometheus'
  scrape_interval: 5s
  metrics_path: '/actuator/prometheus' # scrape endpoint
  static_configs:
  - targets: ['192.168.0.106:8080','192.168.0.106:8081']  # multiple application instances
Server monitoring (Linux/Windows)
1. Download
wget https://github.com/prometheus/node_exporter/releases/download/v0.18.1/node_exporter-0.18.1.linux-386.tar.gz
2. Start
tar -zxvf node_exporter-0.18.1.linux-386.tar.gz
cd node_exporter-0.18.1.linux-386
./node_exporter
3. Scrape configuration
- job_name: 'node'
  static_configs:
  - targets: ['192.168.0.106:9100']
Databases
MongoDB monitoring configuration [TBD]
https://github.com/percona/mongodb_exporter/releases
https://devconnected.com/mongodb-monitoring-with-grafana-prometheus/#b_Installing_the_MongoDB_exporter
1. Download
wget https://github.com/percona/mongodb_exporter/releases/download/v0.10.0/mongodb_exporter-0.10.0.linux-amd64.tar.gz
2. Monitoring configuration

3. Scrape configuration
- job_name: 'mongoDb'
  static_configs:
  - targets: ['192.168.0.106:9001']
Redis
1. Download and configure
https://github.com/oliver006/redis_exporter
2. Monitoring configuration (Grafana dashboard)
https://grafana.com/grafana/dashboards/763
3. Scrape configuration
  - job_name: 'Redis'
    static_configs:
    - targets: ['192.168.0.106:9121']
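MySQL
1. Download the exporter
https://github.com/prometheus/mysqld_exporter
1.1 Create a dedicated user for the exporter in MySQL and grant it privileges:
CREATE USER 'exporter'@'localhost' IDENTIFIED BY 'exporter' WITH MAX_USER_CONNECTIONS 3;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'localhost';
Note: 'exporter'@'localhost' only matches local connections. If the exporter connects over the network (as with host=192.168.0.102 below), create the user for the exporter's host instead.
1.2 Exporter configuration (create a my.conf file)
[client]
# IP address of the MySQL server
host=192.168.0.102
port=3306
# the user created in step 1.1
user=exporter
password=exporter
1.3 Start the exporter
./mysqld_exporter --config.my-cnf=/path/to/my.conf
2. Dashboard configuration
https://grafana.com/grafana/dashboards/7362
https://grafana.com/grafana/dashboards/6239
3. Scrape configuration
  - job_name: 'Mysql'
    static_configs:
    - targets: ['192.168.0.106:9104']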
PostgreSQL
# 1. Download the PostgreSQL exporter

wget https://github.com/prometheus-community/postgres_exporter/releases/download/v0.9.0/postgres_exporter-0.9.0.linux-amd64.tar.gz

# 2. PostgreSQL database setup
CREATE USER postgres_exporter PASSWORD '123456';
ALTER USER postgres_exporter SET SEARCH_PATH TO postgres_exporter,pg_catalog;

CREATE SCHEMA postgres_exporter AUTHORIZATION postgres_exporter;

CREATE FUNCTION postgres_exporter.f_select_pg_stat_activity()
RETURNS setof pg_catalog.pg_stat_activity
LANGUAGE sql
SECURITY DEFINER
AS $$
  SELECT * from pg_catalog.pg_stat_activity;
$$;

CREATE FUNCTION postgres_exporter.f_select_pg_stat_replication()
RETURNS setof pg_catalog.pg_stat_replication
LANGUAGE sql
SECURITY DEFINER
AS $$
  SELECT * from pg_catalog.pg_stat_replication;
$$;

CREATE VIEW postgres_exporter.pg_stat_replication
AS
  SELECT * FROM postgres_exporter.f_select_pg_stat_replication();

CREATE VIEW postgres_exporter.pg_stat_activity
AS
  SELECT * FROM postgres_exporter.f_select_pg_stat_activity();

GRANT SELECT ON postgres_exporter.pg_stat_replication TO postgres_exporter;
GRANT SELECT ON postgres_exporter.pg_stat_activity TO postgres_exporter;

# 3. Start the exporter

vim pg_export.sh # create the startup script
#!/bin/sh
export DATA_SOURCE_NAME="user=postgres_exporter host=127.0.0.1 password=123456 port=5432 dbname=test_monitor sslmode=disable"
./postgres_exporter
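
The `DATA_SOURCE_NAME` value is a space-separated list of `key=value` pairs, so a password containing a space or a quote has to be quoted. The sketch below assembles such a string in Python, assuming the usual libpq keyword/value quoting rules (single quotes around the value, backslash before `'` and `\`). `build_dsn` is a hypothetical helper, not part of postgres_exporter.

```python
def build_dsn(**params):
    """Join connection parameters into a keyword/value connection string."""
    parts = []
    for key, value in params.items():
        text = str(value)
        # Quote empty values and values containing a space, quote, or backslash.
        if text == "" or any(ch in text for ch in " '\\"):
            text = "'" + text.replace("\\", "\\\\").replace("'", "\\'") + "'"
        parts.append(f"{key}={text}")
    return " ".join(parts)


dsn = build_dsn(
    user="postgres_exporter",
    host="127.0.0.1",
    password="123456",
    port=5432,
    dbname="test_monitor",
    sslmode="disable",
)
print(dsn)
```

With the values from the startup script, this reproduces the same connection string on a single line.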

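# 4. Prometheus job configuration
- job_name: 'postgres'
  static_configs:
  - targets: ['192.168.3.32:9187']

 http://192.168.3.32:9187/metrics # view the collected metrics
# 5. Custom queries for postgres_exporter
Notes:
1. Modify the postgres_exporter source code to add the query statements
2. Build and package the source, upload it, and update the queries.yaml file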
Message middleware
  • RabbitMQ
 Prerequisite: RabbitMQ is installed (in a test environment, Docker simplifies this)
 https://www.cnblogs.com/yufeng218/p/9452621.html
 1. Download and configure
 https://github.com/kbudde/rabbitmq_exporter
 1.1 Via a configuration file
 1.2 Via environment variables (see the documentation for more parameters)
 For example: RABBIT_USER=admin RABBIT_PASSWORD=admin RABBIT_URL=http://192.168.0.105:15672 SKIP_QUEUES="RPC_.*" MAX_QUEUES=5000 ./rabbitmq_exporter
 2. Dashboard template
 https://grafana.com/grafana/dashboards/4279
 3. Scrape configuration
   - job_name: 'RabbitMQ'
     static_configs:
     - targets: ['192.168.0.106:9419']

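Other middleware

  • docker

Application servers

  • Nginx
1. Install nginx-module-vts, which exposes Nginx traffic statistics (requires building Nginx from source) [prerequisite for monitoring]
https://github.com/vozlt/nginx-module-vts
Configure it in nginx.conf: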
http {
    vhost_traffic_status_zone;
    server {
        location /status {
            vhost_traffic_status_display;
            vhost_traffic_status_display_format html;
        }
    }
}
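Check that /status/format/json returns data; if it does, the configuration works
2. Download nginx-vts-exporter and configure it
Reference: https://hnlq715.github.io/nginx-vts-exporter/
./nginx-vts-exporter --help    # show all options
./nginx-vts-exporter -nginx.scrape_uri=http://localhost/status/format/json
nginx.scrape_uri: URI from which the exporter reads the metrics (must match the status path configured in nginx.conf)
3. Scrape configuration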
  - job_name: 'nginx'
    static_configs:
    - targets: ['192.168.0.106:9913']