小知识，大挑战！本文正在参与“程序员必备小知识”创作活动。

基于Docker部署Prometheus

第一阶段：入门

1.监控平台的使用背景

日志监控：监控项目的日志输出。目前我们项目的日志一般都是输出到某个日志文件中，并且制定了一些日志策略，需要聚合项目日志统一查询，分析。

服务监控：监控服务的状态，内存使用情况。如Java项目，可以对JVM的状态进行监控。

资源监控：监控服务器的资源使用情况，如CPU使用率，内存使用率，磁盘使用率等。

2.安装步骤

2.1 安装可视化工具Grafana

Grafana可以将监控数据转为可视化的图表，可以将Prometheus ，Loki，elasticsearch等作为数据源，我们这次使用了loki，Prometheus作为数据源分别来监控日志和服务器资源。查看官网

部署命令

docker run -d --name grafana  -p 3000:3000 grafana/grafana grafana

安装验证

浏览器中访问 IP:3000 ，访问grafana。
登录。默认用户admin，密码admin

2.2 安装Prometheus

开源实时监控告警解决方案。查看官网

安装配置

mkdir /home/monitor/prometheus

创建配置文件 prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

部署命令

docker run --name  prometheus   -p 9090:9090 -d     -v /home/monitor/prometheus/:/etc/prometheus     prom/prometheus

验证安装

浏览器访问 IP:9090

查看监控目标

2.3 Grafana中配置Prometheus为数据源

我们需要在Grafana添加Prometheus为数据源，Grafana会将监控数据图表化

配置Prometheus数据源

进入数据源配置页面

点击 Add data source 添加数据源

选择 Prometheus 为数据源

输入Prometheus的 IP和端口信息

Save & test 保存配置

2.4 安装 Node Exporter

采集服务器主机的运行数据。如果需要监控哪一台机器，就在哪一台机器上面部署 Node Exporter。

修改Prometheus配置文件，添加 Node Exporter 配置

添加配置

  - job_name: 'server'
    static_configs:
      - targets: ['yourip:9100']

修改后的 prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'server'
    static_configs:
      - targets: ['yourip:9100']

重启Prometheus

docker restart prometheus

部署命令

docker run -d --name node_exporter   -p 9100:9100     -v "/:/host:ro,rslave"   quay.io/prometheus/node-exporter   --path.rootfs=/host

导入 Node Exporter 控制面板，ID: 8919

配置选项

验证安装与配置

2.5 安装 mysqld-exporter

采集数据库的运行数据。同上，需要监控哪天机器上的数据库，就需要在哪台机器上部署

修改Prometheus配置文件，添加 mysqld-exporter 配置

添加配置

  - job_name: 'Mysql'
    static_configs:
      - targets: ['yourip:9104']

修改后的 prometheus.yml

# my global config
global:
  scrape_interval:     15s # Set the scrape interval to every 15 seconds. Default is every 1 minute.
  evaluation_interval: 15s # Evaluate rules every 15 seconds. The default is every 1 minute.
  # scrape_timeout is set to the global default (10s).

# Alertmanager configuration
alerting:
  alertmanagers:
    - static_configs:
        - targets:
          # - alertmanager:9093

# Load rules once and periodically evaluate them according to the global 'evaluation_interval'.
rule_files:
# - "first_rules.yml"
# - "second_rules.yml"

# A scrape configuration containing exactly one endpoint to scrape:
# Here it's Prometheus itself.
scrape_configs:
  # The job name is added as a label `job=<job_name>` to any timeseries scraped from this config.
  - job_name: 'prometheus'

    # metrics_path defaults to '/metrics'
    # scheme defaults to 'http'.

    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'server'
    static_configs:
      - targets: ['yourip:9100']
      
  - job_name: 'Mysql'
    static_configs:
      - targets: ['yourip:9104']

重启Prometheus

docker restart prometheus

导入 MySQL 控制面板，ID: 7362

导入的方式参考 Node Exporter 。

2.6 部署 Grafana Loki

Loki 是 Grafana Labs 团队最新的开源项目，是一个水平可扩展，高可用性，多租户的日志聚合系统。它的设计非常经济高效且易于操作，因为它不会为日志内容编制索引，而是为每个日志流编制一组标签，专门为 Prometheus 和 Kubernetes 用户做了相关优化。该项目受 Prometheus 启发，官方的介绍就是： Like Prometheus,But For Logs.，类似于 Prometheus 的日志系统。官方部署文档

配置文件

创建loki配置文件

cd /home/monitor/loki

loki-config.yaml

auth_enabled: false

server:
  http_listen_port: 3100
  grpc_listen_port: 9096

ingester:
  wal:
    enabled: true
    dir: /tmp/wal
  lifecycler:
    address: 127.0.0.1
    ring:
      kvstore:
        store: inmemory
      replication_factor: 1
    final_sleep: 0s
  chunk_idle_period: 1h       # Any chunk not receiving new logs in this time will be flushed
  max_chunk_age: 1h           # All chunks will be flushed when they hit this age, default is 1h
  chunk_target_size: 1048576  # Loki will attempt to build chunks up to 1.5MB, flushing first if chunk_idle_period or max_chunk_age is reached first
  chunk_retain_period: 30s    # Must be greater than index read cache TTL if using an index cache (Default index read cache TTL is 5m)
  max_transfer_retries: 0     # Chunk transfers disabled

schema_config:
  configs:
    - from: 2020-10-24
      store: boltdb-shipper
      object_store: filesystem
      schema: v11
      index:
        prefix: index_
        period: 24h

storage_config:
  boltdb_shipper:
    active_index_directory: /tmp/loki/boltdb-shipper-active
    cache_location: /tmp/loki/boltdb-shipper-cache
    cache_ttl: 24h         # Can be increased for faster performance over longer query periods, uses more disk space
    shared_store: filesystem
  filesystem:
    directory: /tmp/loki/chunks

compactor:
  working_directory: /tmp/loki/boltdb-shipper-compactor
  shared_store: filesystem

limits_config:
  reject_old_samples: true
  reject_old_samples_max_age: 168h

chunk_store_config:
  max_look_back_period: 0s

table_manager:
  retention_deletes_enabled: false
  retention_period: 0s

ruler:
  storage:
    type: local
    local:
      directory: /tmp/loki/rules
  rule_path: /tmp/loki/rules-temp
  alertmanager_url: http://localhost:9093
  ring:
    kvstore:
      store: inmemory
  enable_api: true

部署命令

docker run -d -v $(pwd):/mnt/config --name loki -p 3100:3100 grafana/loki:2.3.0 -config.file=/mnt/config/loki-config.yaml

Grafana中配置Loki

数据源选择Loki

配置IP和端口，Save & test

2.6.1 部署 promtail 监听日志

Promtail is an agent which ships the contents of local logs to a private Loki instance or Grafana Cloud. It is usually deployed to every machine that has applications needed to be monitored. 官方文档

如果需要监听哪一台服务器，就需要在哪一台服务器上安装

配置文件

创建 promtail-config.yaml

cd /home/monitor/loki

配置内容：

server:
  http_listen_port: 9080
  grpc_listen_port: 0

positions:
  filename: /tmp/positions.yaml

clients:
  - url: http://loki:3100/loki/api/v1/push  # Loki的地址，只需要将ip替换成你部署的

scrape_configs:
- job_name: system
  static_configs:
  - targets:
      - localhost
    labels:
      job: varlogs
      __path__: /home/abc/def/logs  # 需要监听的日志目录

部署命令

docker run -d --privileged=true -v $(pwd):/mnt/config -v /home/abc/def/logs:/home/abc/def/logs grafana/promtail:2.3.0 -config.file=/mnt/config/promtail-config.yaml

-v /home/abc/def/logs:/home/abc/def/logs ：日志的挂载目录，你需要监听的日志目录

Prometheus---简单易用的监控平台

基于Docker部署Prometheus

第一阶段 ：入门

1.监控平台的使用背景

2.安装步骤

2.1 安装可视化工具Grafana

2.2 安装Prometheus

2.3 Grafana中配置Prometheus为数据源

2.4 安装 Node Exporter

2.5 安装 mysqld-exporter

2.6 部署 Grafana Loki

2.6.1 部署 promtail 监听日志

第一阶段：入门