Overview

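White-box monitoring: the day-to-day monitoring of host resource usage, container state, and the runtime data of databases and middleware. These make up the infrastructure that supports the business and its services. White-box monitoring exposes their actual internal state, and watching the metrics lets us anticipate likely problems and address potential risk factors in advance.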

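Black-box monitoring: testing the externally visible behavior of a service from the user's point of view. Common black-box checks include HTTP probes, TCP probes, DNS, and ICMP, which test the reachability of sites and services, service connectivity, and access latency.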

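Comparison: black-box monitoring differs from white-box monitoring mainly in being failure-oriented. When a failure occurs, black-box monitoring detects it quickly, whereas white-box monitoring focuses on proactively finding or predicting potential problems. A complete monitoring setup should find potential problems from the white-box side and quickly detect problems that have already occurred from the black-box side.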
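
As a rough illustration of the black-box idea, the sketch below checks a service purely from the outside by attempting a TCP connection and timing it. The function name `tcp_probe` and its default timeout are assumptions made for this example; it is not how blackbox_exporter is implemented.

```python
import socket
import time
from typing import Tuple


def tcp_probe(host: str, port: int, timeout: float = 3.0) -> Tuple[bool, float]:
    """Try to open a TCP connection; return (reachable, elapsed_seconds)."""
    start = time.monotonic()
    try:
        # Only the externally visible behavior is tested: can we connect?
        with socket.create_connection((host, port), timeout=timeout):
            reachable = True
    except OSError:
        # Refused, timed out, unresolvable host, and so on.
        reachable = False
    return reachable, time.monotonic() - start
```

Calling `tcp_probe("localhost", 9090)` would report whether something is listening on the Prometheus port and how long the connection attempt took, without knowing anything about the process behind it.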

How Prometheus Works

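The basic Prometheus workflow is as follows:

Prometheus Server reads its configuration, parses the static targets (static_configs), and uses the service discovery rules (xxx_sd_configs) to collect the targets to monitor automatically.

Prometheus Server scrapes the targets periodically (scrape_interval), pulling metrics over HTTP.

The HTTP request from Prometheus Server reaches an exporter such as Node Exporter, which returns a text response. Each non-comment line holds one complete time-series sample: Name + Labels + Sample (a floating-point value with a timestamp). The data comes from official exporters, custom SDK instrumentation, or custom endpoints.

Prometheus Server receives the response, applies relabeling (metric_relabel_configs at this stage; relabel_configs is applied to targets before the scrape), then stores the result in the TSDB and builds an inverted index.

A separate periodic task in Prometheus Server (evaluation_interval) evaluates the configured rules one by one against their thresholds. If a rule's condition is met and stays met for longer than the configured duration, the alert fires and is sent to Alertmanager, a standalone component.

Alertmanager receives the alert and decides, based on its configured policies, whether a notification should be sent. If so, it delivers the notification along the configured routes to receivers such as email, WeChat, Slack, PagerDuty, or a webhook.

When time-series data is queried through the web UI or the HTTP API, a PromQL expression is used. After evaluating it, Prometheus Server returns an instant vector (N series with one sample each), a range vector (N series with M samples each), or a scalar (a single floating-point value).

Grafana, an open-source analytics and visualization tool, is used to display the data graphically.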
Author: WeiyiGeek www.bilibili.com/read/cv1329… Source: bilibili
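
The alerting step in the workflow (a condition must hold for a minimum duration before the alert fires) can be pictured as a small state machine. The sketch below is a simplified model for illustration only; the class name `AlertState` and its fields are hypothetical and do not mirror Prometheus's internal code.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AlertState:
    """Tracks one alert through inactive -> pending -> firing."""
    for_seconds: float                    # how long the condition must hold
    active_since: Optional[float] = None  # when the condition first became true
    state: str = "inactive"

    def evaluate(self, condition_true: bool, now: float) -> str:
        """Run one evaluation cycle and return the resulting state."""
        if not condition_true:
            # The condition cleared: reset the timer.
            self.active_since = None
            self.state = "inactive"
        else:
            if self.active_since is None:
                self.active_since = now
            held = now - self.active_since
            self.state = "firing" if held >= self.for_seconds else "pending"
        return self.state
```

With a 60-second evaluation interval and `for_seconds=120`, a condition that becomes true at t=0 would be pending at t=0 and t=60 and would fire at t=120; only then would the alert be handed to Alertmanager.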

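Installation and Usage

Installation

Installing with Docker

Create a directory named prometheus (/data/prometheus in the docker run command below) and edit the configuration file prometheus.yml in it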

global:
  scrape_interval:     60s
  evaluation_interval: 60s
 
scrape_configs:
  - job_name: prometheus
    static_configs:
      - targets: ['localhost:9090']
        labels:
          instance: prometheus
  - job_name: 'node_161'
 
    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.
 
    static_configs:
    - targets: ['172.171.100.161:19100']
    - targets: ['172.171.100.157:19100']
  - job_name: springboot-minio
    honor_timestamps: true
    scrape_interval: 15s
    scrape_timeout: 10s
    metrics_path: /actuator/prometheus
    scheme: http
    follow_redirects: true
    static_configs:
    - targets:
      - 172.171.100.157:18099

Start the container

docker run  -d  -p 9090:9090   -v /data/prometheus/prometheus.yml:/etc/prometheus/prometheus.yml   prom/prometheus

Check the result

1. http://localhost:9090/targets
2. http://localhost:9090/graph
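
Besides the web pages above, the same data can be read through the Prometheus HTTP API. The sketch below builds an instant-query URL for `/api/v1/query` and parses a response whose `resultType` is `vector`. The helper names are made up for this example, and the code only handles the instant-vector case.

```python
import json
from urllib.parse import urlencode


def instant_query_url(base_url: str, promql: str) -> str:
    """Build the URL for an instant query against /api/v1/query."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def parse_instant_vector(body: str) -> dict:
    """Map each series' label set (a sorted tuple of pairs) to its float value."""
    payload = json.loads(body)
    if payload["status"] != "success" or payload["data"]["resultType"] != "vector":
        raise ValueError("not a successful instant-vector response")
    return {
        # Each series carries exactly one sample: [timestamp, "value"].
        tuple(sorted(series["metric"].items())): float(series["value"][1])
        for series in payload["data"]["result"]
    }
```

For example, `instant_query_url("http://localhost:9090", "up")` gives a URL that can be fetched with any HTTP client, and the JSON body can then be passed to `parse_instant_vector`.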

Installing from the binary tarball

(1) Download the Prometheus package
prometheus.io/download/ (choose the Linux amd64 build)

(2) Extract

# tar xf prometheus-2.27.1.linux-amd64.tar.gz -C /data/
# cd /data
# mv prometheus-2.27.1.linux-amd64/ prometheus

(3) Configure Prometheus as a systemd service

vim /usr/lib/systemd/system/prometheus.service

[Unit]
Description=Prometheus
After=network.target

[Service]
Type=simple
Environment="GOMAXPROCS=4"
#User=prometheus
#Group=prometheus
ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/data/prometheus/prometheus \
  --config.file=/data/prometheus/prometheus.yml \
  --storage.tsdb.path=/data/prometheus/data \
  --storage.tsdb.retention.time=30d \
  --web.console.libraries=/data/prometheus/console_libraries \
  --web.console.templates=/data/prometheus/consoles \
  --web.listen-address=0.0.0.0:9090 \
  --web.read-timeout=5m \
  --web.max-connections=10 \
  --query.max-concurrency=20 \
  --query.timeout=2m \
  --web.enable-lifecycle
PrivateTmp=true
PrivateDevices=true
ProtectHome=true
NoNewPrivileges=true
LimitNOFILE=infinity
ReadWriteDirectories=/data/prometheus
ProtectSystem=full

SyslogIdentifier=prometheus
Restart=always

[Install]
WantedBy=multi-user.target

(4) Start Prometheus

systemctl daemon-reload

systemctl start prometheus

systemctl status prometheus

# Check that the configuration file is valid
./promtool check config prometheus.yml

node-exporter

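Used to monitor server nodes. It is installed on each monitored server; once started, it exposes host metrics that Prometheus scrapes.

Download node-exporter

URL: github.com/prometheus/…

Version: node_exporter-0.18.1.linux-amd64. Extract it under /data/ (the service file below expects the directory to be /data/node_exporter).

Write the service unit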

# vim /etc/systemd/system/node_exporter.service
[Unit]
Description=prometheus node_exporter Daemon
Documentation=https://github.com/prometheus/node_exporter
Requires=network.target
After=network.target

[Service]
Type=simple
WorkingDirectory=/data/node_exporter
ExecStart=/data/node_exporter/node_exporter --log.level=info --web.listen-address=:19100
TimeoutSec=30
Restart=on-failure

[Install]
WantedBy=default.target

systemctl daemon-reload && systemctl start node_exporter   # reload unit files and start
systemctl enable node_exporter    # start on boot
systemctl stop node_exporter      # stop, when needed
systemctl restart node_exporter   # restart, when needed
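
Once node_exporter is running, its /metrics endpoint returns plain text in which each non-comment line is one sample: a name, optional labels, a value, and an optional timestamp. The sketch below parses a single such line. It is a simplified reading of the format for illustration (for instance, escape sequences inside label values are kept as written), and `parse_sample` is a name chosen for this example.

```python
import re
from typing import Dict, Optional, Tuple

LINE_RE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)'   # metric name
    r'(?:\{(?P<labels>.*)\})?'               # optional {label="value",...}
    r'\s+(?P<value>\S+)'                     # sample value
    r'(?:\s+(?P<ts>-?\d+))?\s*$'             # optional timestamp
)
LABEL_RE = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

Sample = Tuple[str, Dict[str, str], float, Optional[int]]


def parse_sample(line: str) -> Optional[Sample]:
    """Parse one exposition line; return None for blank and comment lines."""
    line = line.strip()
    if not line or line.startswith("#"):
        return None
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError(f"unparseable line: {line!r}")
    labels = dict(LABEL_RE.findall(match.group("labels") or ""))
    ts = match.group("ts")
    return (
        match.group("name"),
        labels,
        float(match.group("value")),
        int(ts) if ts else None,
    )
```

Feeding it a line such as `node_cpu_seconds_total{cpu="0",mode="idle"} 12345.67` yields the metric name, a label dictionary, and the float value, which is the Name + Labels + Sample structure Prometheus stores.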

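Black-box monitoring: blackbox_exporter

Prometheus asks blackbox_exporter to probe the target services from the outside and scrapes the probe results as metrics.

Installing from the binary tarball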

# wget https://github.com/prometheus/blackbox_exporter/releases/download/v0.16.0/blackbox_exporter-0.16.0.linux-amd64.tar.gz

# tar xf blackbox_exporter-0.16.0.linux-amd64.tar.gz -C /usr/local/
# ln -s /usr/local/blackbox_exporter-0.16.0.linux-amd64/ /usr/local/blackbox_exporter

# Manage the blackbox_exporter service with systemd
# vim /usr/lib/systemd/system/blackbox_exporter.service
[Unit]
Description=blackbox_exporter
After=network.target

[Service]
User=root
Type=simple
ExecStart=/usr/local/blackbox_exporter/blackbox_exporter --config.file=/usr/local/blackbox_exporter/blackbox.yml
Restart=on-failure

[Install]
WantedBy=multi-user.target

# systemctl daemon-reload
# systemctl start blackbox_exporter.service 
# systemctl enable blackbox_exporter.service

Configuration

blackbox.yml

modules:
  http_2xx:
    prober: http
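  http_3xx:
    # With no http settings this module behaves like http_2xx (redirects
    # followed, 2xx expected); set valid_status_codes to require a 3xx.
    prober: http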
  http_post_2xx:
    prober: http
    http:
      method: POST
  tcp_connect:
    prober: tcp
  pop3s_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^+OK"
      tls: true
      tls_config:
        insecure_skip_verify: false
  ssh_banner:
    prober: tcp
    tcp:
      query_response:
      - expect: "^SSH-2.0-"
  irc_banner:
    prober: tcp
    tcp:
      query_response:
      - send: "NICK prober"
      - send: "USER prober prober prober :prober"
      - expect: "PING :([^ ]+)"
        send: "PONG ${1}"
      - expect: "^:[^ ]+ 001"
  icmp:
    prober: icmp

Integrating with Prometheus

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - 172.171.100.174:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
  - "rules/*.yml"
  # - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'
    file_sd_configs:
    - files:
      - targets/prometheus-*.yaml
      refresh_interval: 1m

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
    - targets: ['localhost:9090']

  - job_name: 'nodes'
    file_sd_configs:
    - files:
      - targets/node-*.yaml
      refresh_interval: 1m
  - job_name: "http-prod-200"
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/data/prometheus/blackbox/http-prod-200.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.171.100.174:9115
  - job_name: "http-prod-302"
    metrics_path: /probe
    params:
      module: [http_3xx]  # Probe with the http_3xx module.
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/data/prometheus/blackbox/http-prod-302.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.171.100.174:9115
  - job_name: "http-test-200"
    metrics_path: /probe
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    file_sd_configs:
    - refresh_interval: 1m
      files:
      - "/data/prometheus/blackbox/http-test-200.yml"
    relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.171.100.174:9115

http-prod-200.yml

- targets:
  - https://www.alibaba.com
  - https://www.tencent.com
  - https://www.baidu.com
  - http://www.test-nginx.com

http-prod-302.yml

- targets:
  - https://uc.lxyun.cn
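
To see what the three relabel_configs steps in the http-prod-* jobs do to each target from these files, the sketch below replays them on a plain dictionary of labels and then builds the URL that would be scraped. It is an illustrative model, not Prometheus code; the function names are hypothetical, and the order of the query parameters in a real request may differ.

```python
from typing import Dict
from urllib.parse import urlencode


def relabel_blackbox_target(address: str, module: str, exporter: str) -> Dict[str, str]:
    """Apply the three relabel steps of the http-prod-* jobs to one target."""
    # Before relabeling: the discovered target and the job's `params.module`.
    labels = {"__address__": address, "__param_module": module}
    # Step 1: copy the discovered address into the ?target= URL parameter.
    labels["__param_target"] = labels["__address__"]
    # Step 2: keep the probed URL as the instance label on stored series.
    labels["instance"] = labels["__param_target"]
    # Step 3: point the actual scrape at blackbox_exporter instead.
    labels["__address__"] = exporter
    return labels


def scrape_url(labels: Dict[str, str], metrics_path: str = "/probe") -> str:
    """Build the URL that would be requested for the relabeled target."""
    params = {
        key[len("__param_"):]: value
        for key, value in labels.items()
        if key.startswith("__param_")
    }
    return f"http://{labels['__address__']}{metrics_path}?{urlencode(params)}"
```

For the target `https://www.baidu.com`, the request therefore goes to the exporter at 172.171.100.174:9115 with the site passed as the `target` parameter, while the stored series keep the site URL as their `instance` label.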

Integrating with Grafana

Dashboard ID: 7587

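Phases reported by the HTTP probe (the phase label of probe_http_duration_seconds):

- resolve: duration of DNS resolution
- connect: duration of TCP connection establishment
- tls: duration of TLS negotiation (I believe this includes the TCP connection establishment time)
- processing: time between establishing the connection and receiving the first byte of the response
- transfer: duration of the response transfer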

Monitoring Actuator

Used to monitor applications, for example Java services.

Application configuration

<dependency>
  <groupId>org.springframework.boot</groupId>
  <artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<dependency>
  <groupId>io.micrometer</groupId>
  <artifactId>micrometer-registry-prometheus</artifactId>
</dependency>

management:
  endpoints:
    web:
      exposure:
        include: 'prometheus'
  metrics:
    tags:
      application: ${spring.application.name}

Access: http://ip:port/actuator/prometheus

Prometheus configuration

- job_name: 'spring'
  # How often to scrape
  scrape_interval: 15s
  # Timeout for each scrape
  scrape_timeout: 10s
  # Path to scrape
  metrics_path: '/actuator/prometheus'
  # Address of the service to scrape; set it to the host and port of the Spring Boot application above.
  static_configs:
  - targets: ['172.171.100.157:18099']

Grafana dashboard ID: 12900

MySQL monitoring

Monitoring MySQL requires installing mysqld_exporter on the monitored machine

mysqld_exporter download: prometheus.io/download/

[root@xinsz08-20 ~]# mv mysqld_exporter-0.12.1.linux-amd64 mysqld_exporter
[root@xinsz08-20 ~]# cd mysqld_exporter
# Add the credentials file
[root@xinsz08-20 mysqld_exporter]# vim .my.cnf

[client]
user=root
password=123456

# Start (the path must match the file created above)
[root@xinsz08-20 mysqld_exporter]# nohup ./mysqld_exporter --config.my-cnf=.my.cnf &

Check that the port (9104) is listening


[root@zmedu-17 prometheus-2.16.0.linux-amd64]# vim prometheus.yml 
  - job_name: 'mysql-lxy'
    static_configs:
    - targets: ['172.171.100.172:9104']

Grafana dashboard ID: 7362
