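Using Grafana
Add a data source
Create a test dashboard
NIC test data
Panel operations
- Set the unit
- Rename a panel
- Set series aliases
- Sort series
- Duplicate a series
- Mute a series
- Duplicate a panel
- Set an alert threshold line
Configure tables
Query with variables
Nested variables
A dashboard has to load its variables when it opens, which is one of the reasons dashboards open slowly
Import the node_exporter template from the dashboard marketplace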
Two import modes
- Import by URL
- Import from a JSON file
Monitoring Grafana itself
Grafana also exposes its own metrics, so add a scrape for them
http://grafana.prome.me:3000/metrics
Add it to the Prometheus scrape pool
- targets:
  - 172.20.70.205:3000
Import the dashboard from the marketplace
Monitoring MySQL
Deploy mysqld_exporter on the MySQL host
Deploy mysqld_exporter with Ansible
ansible-playbook -i host_file service_deploy.yaml -e "tgz=mysqld_exporter-0.12.1.linux-amd64.tar.gz" -e "app=mysqld_exporter"
Here is the Ansible deployment file:
service_deploy.yaml
- name: install
  hosts: all
  user: root
  gather_facts: false
  vars:
    local_path: /opt/tgzs
    app_dir: /opt/app
  tasks:
    - name: create the app directory
      file: path={{ app_dir }}/{{ app }} state=directory
    - name: create the package directory
      file: path={{ local_path }} state=directory
    - name: copy config and service
      copy:
        src: '{{ item.src }}'
        dest: '{{ item.dest }}'
        owner: root
        group: root
        mode: '0644'
        force: true
      with_items:
        - { src: '{{ local_path }}/{{ tgz }}', dest: '{{ local_path }}/{{ tgz }}' }
        - { src: '{{ local_path }}/{{ app }}.service', dest: '/etc/systemd/system/{{ app }}.service' }
      register: result
    - name: Show debug info
      debug: var=result verbosity=0
    - name: tar gz
      # Remove any old unpacked copy, unpack the tarball, then copy its files into the app directory.
      shell: |
        rm -rf /root/{{ app }}*
        tar xf {{ local_path }}/{{ tgz }} -C /root/
        /bin/cp -far /root/{{ app }}*/* {{ app_dir }}/{{ app }}/
      register: result
    - name: Show debug info
      debug: var=result verbosity=0
    - name: restart service
      systemd:
        name: "{{ item }}"
        state: restarted
        daemon_reload: yes
        enabled: yes
      with_items:
        - '{{ app }}'
      register: result
    - name: Show debug info
      debug: var=result verbosity=0
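The same playbook is reused for every exporter in these notes; only the `tgz` and `app` extra vars change. A minimal Python sketch of that parameterization, assuming the `host_file` inventory and `service_deploy.yaml` names used here. The helper `deploy_command` is hypothetical, and the sketch only builds the command line, it does not run Ansible:

```python
import shlex
from typing import List


def deploy_command(tgz: str, app: str,
                   inventory: str = "host_file",
                   playbook: str = "service_deploy.yaml") -> List[str]:
    """Build the ansible-playbook argument list for one exporter (hypothetical helper)."""
    return [
        "ansible-playbook", "-i", inventory, playbook,
        "-e", f"tgz={tgz}",   # tarball name under local_path
        "-e", f"app={app}",   # directory name and systemd unit name
    ]


cmd = deploy_command("mysqld_exporter-0.12.1.linux-amd64.tar.gz", "mysqld_exporter")
print(shlex.join(cmd))
```

Passing the argument list to `subprocess.run(cmd, check=True)` would execute it; building a list rather than a string avoids shell quoting problems.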
After deployment the service does not start and reports the following error:
Mar 30 10:41:10 prome_master_01 mysqld_exporter[30089]: time="2021-03-30T10:41:10+08:00" level=fatal msg="failed reading ini file: open .my.cnf: no such file or directory" source="mysqld_exporter.go:264"
Create a user for metric collection and grant it privileges
mysql -uroot -p123123
CREATE USER 'exporter'@'%' IDENTIFIED BY '123123' ;
GRANT PROCESS, REPLICATION CLIENT, SELECT ON *.* TO 'exporter'@'%';
FLUSH PRIVILEGES;
Method 1: set the DATA_SOURCE_NAME environment variable in the mysqld_exporter service file
# tcp with no address means localhost
Environment=DATA_SOURCE_NAME=exporter:123123@tcp/
Restart the mysqld_exporter service
systemctl daemon-reload
systemctl restart mysqld_exporter
Check the mysqld_exporter status and logs
[root@prome_master_01 logs]# systemctl status mysqld_exporter -l
● mysqld_exporter.service - mysqld_exporter Exporter
Loaded: loaded (/etc/systemd/system/mysqld_exporter.service; enabled; vendor preset: disabled)
Active: active (running) since Tue 2021-03-30 10:53:08 CST; 11min ago
Main PID: 30158 (mysqld_exporter)
CGroup: /system.slice/mysqld_exporter.service
└─30158 /opt/app/mysqld_exporter/mysqld_exporter
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg="Starting mysqld_exporter (version=0.12.1, branch=HEAD, revision=48667bf7c3b438b5e93b259f3d17b70a7c9aff96)" source="mysqld_exporter.go:257"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg="Build context (go=go1.12.7, user=root@0b3e56a7bc0a, date=20190729-12:35:58)" source="mysqld_exporter.go:258"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg="Enabled scrapers:" source="mysqld_exporter.go:269"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.global_status" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.global_variables" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.slave_status" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.info_schema.innodb_cmp" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.info_schema.innodb_cmpmem" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg=" --collect.info_schema.query_response_time" source="mysqld_exporter.go:273"
Mar 30 10:53:08 prome_master_01 mysqld_exporter[30158]: time="2021-03-30T10:53:08+08:00" level=info msg="Listening on :9104" source="mysqld_exporter.go:283"
Method 2: start the service with a my.cnf file, or give DATA_SOURCE_NAME an explicit host and port
Environment=DATA_SOURCE_NAME='exporter:123123@(localhost:3306)/'
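Both ways of supplying credentials can be sketched in Python. The DSN strings follow the two forms shown in these notes; the `[client]` section layout is the usual MySQL option-file convention, which mysqld_exporter reads from `~/.my.cnf` unless `--config.my-cnf` points elsewhere. The helper names `build_dsn` and `render_my_cnf` are hypothetical, and the credentials are the test values from above:

```python
import configparser
import io


def build_dsn(user: str, password: str, host: str = "", port: int = 3306) -> str:
    """Return a DATA_SOURCE_NAME value; an empty host falls back to localhost."""
    if not host:
        return f"{user}:{password}@tcp/"
    return f"{user}:{password}@({host}:{port})/"


def render_my_cnf(user: str, password: str, host: str = "localhost", port: int = 3306) -> str:
    """Render a [client] section that could be saved as a .my.cnf file."""
    cfg = configparser.ConfigParser()
    cfg["client"] = {"user": user, "password": password, "host": host, "port": str(port)}
    buf = io.StringIO()
    cfg.write(buf)
    return buf.getvalue()


print(build_dsn("exporter", "123123"))
print(build_dsn("exporter", "123123", "localhost"))
print(render_my_cnf("exporter", "123123"))
```

A file keeps the password out of the unit file and out of `systemctl show` output, which is the main reason to prefer it over the environment variable.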
Add the mysqld_exporter target to the scrape pool
static_configs:
  - targets:
    - 172.20.70.205:9104
Import the mysqld dashboard in Grafana
Enable collectors as needed
Process monitoring with process-exporter
Try out atop
yum -y install atop
Deploy process-exporter on the host
Deploy process-exporter with Ansible
ansible-playbook -i host_file service_deploy.yaml -e "tgz=process-exporter-0.7.5.linux-amd64.tar.gz" -e "app=process-exporter"
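Prepare the configuration file process-exporter.yaml
- It specifies how processes are selected; the example below matches every process that has a command line
cat <<EOF >/opt/app/process-exporter/process-exporter.yaml
process_names:
  - name: "{{.Comm}}"
    cmdline:
    - '.+'
EOF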
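To make the effect of this configuration concrete, here is a simplified Python model of the grouping: every process whose command line matches all `cmdline` regexes is counted under its `{{.Comm}}` name. The process list is made up for illustration, and the matching logic is an approximation of what process-exporter does, not its actual implementation:

```python
import re
from collections import Counter

# Hypothetical process list: (comm, full command line).
# Kernel threads have an empty command line, so '.+' does not match them.
processes = [
    ("prometheus", "/opt/app/prometheus/prometheus --config.file=prometheus.yml"),
    ("mysqld", "/usr/sbin/mysqld --daemonize"),
    ("mysqld", "/usr/sbin/mysqld --defaults-file=/etc/my2.cnf"),
    ("kworker/0:1", ""),
]

cmdline_patterns = [re.compile(r".+")]  # same regex as in process-exporter.yaml


def group_name(comm, cmdline):
    """Return the group name (the comm) if every cmdline regex matches, else None."""
    if all(p.search(cmdline) for p in cmdline_patterns):
        return comm
    return None


names = (group_name(comm, cmdline) for comm, cmdline in processes)
groups = Counter(name for name in names if name is not None)
print(dict(groups))
```

The per-group process count computed here corresponds to what the exporter reports as `namedprocess_namegroup_num_procs`, with the group name in the `groupname` label.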
Add the process-exporter targets to the scrape pool
static_configs:
  - targets:
    - 172.20.70.205:9256
    - 172.20.70.215:9256
Import the process-exporter dashboard in Grafana
- Variable substitution
- label_values(namedprocess_namegroup_num_procs, instance)
- label_values(namedprocess_namegroup_cpu_seconds_total{instance=~"$host"},groupname)
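The second query depends on the first: `$host` is filled in from the value selected for the host variable before the query runs. A rough Python imitation of that substitution, assuming a variable named `host` and a multi-value selection. Real Grafana also escapes regex metacharacters in the values, which this sketch leaves out:

```python
def interpolate(query, variables):
    """Replace $name with the selected value(s); several values become a regex alternation."""
    for name, selected in variables.items():
        if len(selected) == 1:
            value = selected[0]
        else:
            value = "(" + "|".join(selected) + ")"
        query = query.replace("$" + name, value)
    return query


group_query = (
    'label_values(namedprocess_namegroup_cpu_seconds_total'
    '{instance=~"$host"},groupname)'
)

one_host = interpolate(group_query, {"host": ["172.20.70.205:9256"]})
two_hosts = interpolate(
    group_query, {"host": ["172.20.70.205:9256", "172.20.70.215:9256"]}
)
print(one_host)
print(two_hosts)
```

Because the group query cannot run until `$host` has a value, nested variables are resolved one after another when the dashboard opens, which adds to its load time.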
Black-box probing with blackbox_exporter
Deploy blackbox_exporter on the host
Deploy blackbox_exporter with Ansible
ansible-playbook -i host_file service_deploy.yaml -e "tgz=blackbox_exporter-0.18.0.linux-amd64.tar.gz" -e "app=blackbox_exporter"
Access blackbox_exporter from a browser
Probe an HTTP target from the browser
http://172.20.70.205:9115/probe?target=https://www.baidu.com&module=http_2xx&debug=true
Interpreting the result
Logs for the probe:
ts=2021-03-30T07:28:17.405299592Z caller=main.go:304 module=http_2xx target=https://www.baidu.com level=info msg="Beginning probe" probe=http timeout_seconds=119.5
ts=2021-03-30T07:28:17.40563586Z caller=http.go:342 module=http_2xx target=https://www.baidu.com level=info msg="Resolving target address" ip_protocol=ip6
ts=2021-03-30T07:28:17.414113889Z caller=http.go:342 module=http_2xx target=https://www.baidu.com level=info msg="Resolved target address" ip=110.242.68.4
ts=2021-03-30T07:28:17.414249109Z caller=client.go:252 module=http_2xx target=https://www.baidu.com level=info msg="Making HTTP request" url=https://110.242.68.4 host=www.baidu.com
ts=2021-03-30T07:28:17.459576352Z caller=main.go:119 module=http_2xx target=https://www.baidu.com level=info msg="Received HTTP response" status_code=200
ts=2021-03-30T07:28:17.459696667Z caller=main.go:119 module=http_2xx target=https://www.baidu.com level=info msg="Response timings for roundtrip" roundtrip=0 start=2021-03-30T15:28:17.414370915+08:00 dnsDone=2021-03-30T15:28:17.414370915+08:00 connectDone=2021-03-30T15:28:17.423500145+08:00 gotConn=2021-03-30T15:28:17.449441723+08:00 responseStart=2021-03-30T15:28:17.459467652+08:00 end=2021-03-30T15:28:17.459684294+08:00
ts=2021-03-30T07:28:17.459886914Z caller=main.go:304 module=http_2xx target=https://www.baidu.com level=info msg="Probe succeeded" duration_seconds=0.054504338
Metrics that would have been returned:
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.008485086
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.054504338
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_http_content_length Length of http content response
# TYPE probe_http_content_length gauge
probe_http_content_length 227
# HELP probe_http_duration_seconds Duration of http request by phase, summed over all redirects
# TYPE probe_http_duration_seconds gauge
probe_http_duration_seconds{phase="connect"} 0.009129316
probe_http_duration_seconds{phase="processing"} 0.01002596
probe_http_duration_seconds{phase="resolve"} 0.008485086
probe_http_duration_seconds{phase="tls"} 0.035070878
probe_http_duration_seconds{phase="transfer"} 0.000216612
# HELP probe_http_redirects The number of redirects
# TYPE probe_http_redirects gauge
probe_http_redirects 0
# HELP probe_http_ssl Indicates if SSL was used for the final redirect
# TYPE probe_http_ssl gauge
probe_http_ssl 1
# HELP probe_http_status_code Response HTTP status code
# TYPE probe_http_status_code gauge
probe_http_status_code 200
# HELP probe_http_uncompressed_body_length Length of uncompressed response body
# TYPE probe_http_uncompressed_body_length gauge
probe_http_uncompressed_body_length 227
# HELP probe_http_version Returns the version of HTTP of the probe response
# TYPE probe_http_version gauge
probe_http_version 1.1
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 4.37589817e+08
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_ssl_earliest_cert_expiry Returns earliest SSL cert expiry in unixtime
# TYPE probe_ssl_earliest_cert_expiry gauge
probe_ssl_earliest_cert_expiry 1.627277462e+09
# HELP probe_ssl_last_chain_expiry_timestamp_seconds Returns last SSL chain expiry in timestamp seconds
# TYPE probe_ssl_last_chain_expiry_timestamp_seconds gauge
probe_ssl_last_chain_expiry_timestamp_seconds 1.627277462e+09
# HELP probe_ssl_last_chain_info Contains SSL leaf certificate information
# TYPE probe_ssl_last_chain_info gauge
probe_ssl_last_chain_info{fingerprint_sha256="2ed189349f818f3414132ebea309e36f620d78a0507a2fa523305f275062d73c"} 1
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# HELP probe_tls_version_info Contains the TLS version used
# TYPE probe_tls_version_info gauge
probe_tls_version_info{version="TLS 1.2"} 1
Module configuration:
prober: http
http:
  ip_protocol_fallback: true
tcp:
  ip_protocol_fallback: true
icmp:
  ip_protocol_fallback: true
dns:
  ip_protocol_fallback: true
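How the HTTP trace maps to the phases of an HTTP request
- DNS resolution time: DNSDone - DNSStart
- TLS handshake time: gotConn - DNSDone (this span also covers the TCP connect)
- Connect time with TLS: connectDone - DNSDone
- Connect time without TLS: gotConn - DNSDone
- processing (server-side processing time): responseStart - gotConn
- transfer (data transfer time): end - responseStart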
// tt holds the callbacks that record a timestamp for each event.
trace := &httptrace.ClientTrace{
	DNSStart:             tt.DNSStart,
	DNSDone:              tt.DNSDone,
	ConnectStart:         tt.ConnectStart,
	ConnectDone:          tt.ConnectDone,
	GotConn:              tt.GotConn,
	GotFirstResponseByte: tt.GotFirstResponseByte,
}
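The phase formulas can be checked against the "Response timings for roundtrip" log line from the probe output. The Python sketch below copies those timestamps and applies the formulas for a TLS connection. Nanoseconds are parsed by hand because `datetime` only keeps microseconds, and the helper `to_ns` is hypothetical:

```python
# Timestamps copied from the "Response timings for roundtrip" log line.
timings = {
    "dnsDone": "2021-03-30T15:28:17.414370915+08:00",
    "connectDone": "2021-03-30T15:28:17.423500145+08:00",
    "gotConn": "2021-03-30T15:28:17.449441723+08:00",
    "responseStart": "2021-03-30T15:28:17.459467652+08:00",
    "end": "2021-03-30T15:28:17.459684294+08:00",
}


def to_ns(ts):
    """Nanoseconds since midnight for a timestamp like 2021-03-30T15:28:17.414370915+08:00."""
    clock = ts.split("T")[1].split("+")[0]
    hms, frac = clock.split(".")
    hours, minutes, seconds = (int(x) for x in hms.split(":"))
    whole = (hours * 60 + minutes) * 60 + seconds
    return whole * 1_000_000_000 + int(frac.ljust(9, "0"))


t = {name: to_ns(value) for name, value in timings.items()}
is_tls = True  # the probed URL was https

phases = {
    "connect": (t["connectDone"] if is_tls else t["gotConn"]) - t["dnsDone"],
    "processing": t["responseStart"] - t["gotConn"],
    "transfer": t["end"] - t["responseStart"],
}
if is_tls:
    phases["tls"] = t["gotConn"] - t["dnsDone"]

for name, ns in sorted(phases.items()):
    print(f"{name:<10} {ns / 1e9:.9f}s")
```

The results agree with the `probe_http_duration_seconds` values in the probe output to within a microsecond, which confirms that the `tls` phase in this version is measured from DNSDone and therefore includes the connect time.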
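blackbox_exporter needs the target and module parameters passed in. A first attempt is to add it to the scrape pool like this:
- job_name: 'blackbox-http'
  # metrics path; note that it is not always /metrics
  metrics_path: /probe
  # parameters passed with the request
  params:
    module: [http_2xx]  # Look for a HTTP 200 response.
    target: [prometheus.io,www.baidu.com,172.20.70.205:3000]
  static_configs:
    - targets:
      - 172.20.70.205:9115
With this configuration the instance label carries only the blackbox_exporter address, not the address of the probed target:
probe_duration_seconds{instance="172.20.70.205:9115", job="blackbox-http"}
See the notes in 015 on multi-instance scraping.
Add the blackbox_exporter jobs to the scrape pool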
scrape_configs:
  - job_name: 'blackbox-http'
    # metrics path; note that it is not always /metrics
    metrics_path: /probe
    # parameters passed with the request
    params:
      module: [http_2xx]  # Look for a HTTP 200 response.
    static_configs:
      - targets:
        - http://prometheus.io        # Target to probe with http.
        - https://www.baidu.com       # Target to probe with https.
        - http://172.20.70.205:3000   # Target to probe with http on port 3000.
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 127.0.0.1:9115  # The blackbox exporter's real hostname:port.
  - job_name: 'blackbox-ssh'
    # metrics path; note that it is not always /metrics
    metrics_path: /probe
    # parameters passed with the request
    params:
      module: [ssh_banner]  # Expect an SSH banner over TCP.
    static_configs:
      - targets:
        - 172.20.70.205:22  # Target to probe over TCP (host:port).
        - 172.20.70.215:22  # Target to probe over TCP (host:port).
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.20.70.205:9115  # The blackbox exporter's real hostname:port.
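The three relabel rules are easier to follow when applied by hand to one target. This Python sketch mimics them on a plain dictionary of labels. It is a simplified model of relabeling (no regex, no separator handling), and the function name `relabel` is hypothetical:

```python
def relabel(target, exporter_addr):
    """Apply the three relabel rules from the config to a single static target."""
    labels = {"__address__": target}
    # Rule 1: copy __address__ into the URL parameter "target".
    labels["__param_target"] = labels["__address__"]
    # Rule 2: copy that parameter into the instance label.
    labels["instance"] = labels["__param_target"]
    # Rule 3: point the actual scrape at the blackbox exporter.
    labels["__address__"] = exporter_addr
    return labels


labels = relabel("https://www.baidu.com", "127.0.0.1:9115")
for name, value in sorted(labels.items()):
    print(f"{name} = {value}")
```

The order matters: `__address__` has to be copied before the last rule overwrites it, otherwise the probed target would be lost and `instance` would again show the exporter address.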
Import the blackbox_exporter dashboard in Grafana
SSH probing, based on TCP
- Probe from the browser
# Use the ssh_banner module to probe 172.20.70.215:22
http://172.20.70.205:9115/probe?module=ssh_banner&target=172.20.70.215:22
# Interpreting the result
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 2.5331e-05
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.02228226
# HELP probe_failed_due_to_regex Indicates if probe failed due to regex
# TYPE probe_failed_due_to_regex gauge
probe_failed_due_to_regex 0
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 9.51584696e+08
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
# The ssh_banner module explained
# It probes over TCP and expects the response to start with SSH-2.0-
ssh_banner:
  prober: tcp
  tcp:
    query_response:
    - expect: "^SSH-2.0-"
# This matches what telnet shows
[root@prome_master_01 blackbox_exporter]# telnet 172.20.70.215 22
Trying 172.20.70.215...
Connected to 172.20.70.215.
Escape character is '^]'.
SSH-2.0-OpenSSH_7.4
Protocol mismatch.
Connection closed by foreign host.
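The `expect` check is an ordinary regular expression applied to what the server sends first. A minimal Python sketch of that check using the banner from the telnet session; the function name `banner_ok` is hypothetical, and the second input is an invented non-SSH reply:

```python
import re

EXPECT = re.compile(r"^SSH-2.0-")  # pattern from the ssh_banner module


def banner_ok(first_line):
    """Return True if the line received from the server matches the expected banner."""
    return EXPECT.search(first_line) is not None


print(banner_ok("SSH-2.0-OpenSSH_7.4"))       # banner seen in the telnet session
print(banner_ok("HTTP/1.1 400 Bad Request"))  # some other service on the port
```

Note that the dot in the pattern is unescaped, so it matches any character; `^SSH-2\.0-` would be the strict form.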
- Configuration
- job_name: 'blackbox-ssh'
  # metrics path; note that it is not always /metrics
  metrics_path: /probe
  # parameters passed with the request
  params:
    module: [ssh_banner]  # Expect an SSH banner over TCP.
  static_configs:
    - targets:
      - 172.20.70.205:22  # Target to probe over TCP (host:port).
      - 172.20.70.215:22  # Target to probe over TCP (host:port).
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 172.20.70.205:9115  # The blackbox exporter's real hostname:port.
- blackbox-ping configuration
- job_name: 'blackbox-ping'
  # metrics path; note that it is not always /metrics
  metrics_path: /probe
  # parameters passed with the request
  params:
    module: [icmp]  # Send an ICMP echo (ping).
  static_configs:
    - targets:
      - 192.168.26.112   # Target to ping.
      - 114.114.114.114  # Target to ping.
  relabel_configs:
    - source_labels: [__address__]
      target_label: __param_target
    - source_labels: [__param_target]
      target_label: instance
    - target_label: __address__
      replacement: 192.168.26.112:9115  # The blackbox exporter's real hostname:port.
Running a ping probe
- Access from the browser
http://172.20.70.205:9115/probe?module=icmp&target=www.baidu.com
# Interpreting the result
# HELP probe_dns_lookup_time_seconds Returns the time taken for probe dns lookup in seconds
# TYPE probe_dns_lookup_time_seconds gauge
probe_dns_lookup_time_seconds 0.195704171
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.378563375
# HELP probe_icmp_duration_seconds Duration of icmp request by phase
# TYPE probe_icmp_duration_seconds gauge
probe_icmp_duration_seconds{phase="resolve"} 0.195704171
probe_icmp_duration_seconds{phase="rtt"} 0.182456226
probe_icmp_duration_seconds{phase="setup"} 0.000145827
# HELP probe_icmp_reply_hop_limit Replied packet hop limit (TTL for ipv4)
# TYPE probe_icmp_reply_hop_limit gauge
probe_icmp_reply_hop_limit 49
# HELP probe_ip_addr_hash Specifies the hash of IP address. It's useful to detect if the IP address changes.
# TYPE probe_ip_addr_hash gauge
probe_ip_addr_hash 2.282787449e+09
# HELP probe_ip_protocol Specifies whether probe ip protocol is IP4 or IP6
# TYPE probe_ip_protocol gauge
probe_ip_protocol 4
# HELP probe_success Displays whether or not the probe was a success
# TYPE probe_success gauge
probe_success 1
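To read such a result programmatically, the text format can be parsed line by line. The sketch below is a deliberately small hand-written parser that handles only the simple `name{labels} value` lines seen here; for real use, a maintained parser such as the one in the Python `prometheus_client` package would be the safer choice. The sample is a subset of the output above:

```python
import re

SAMPLE = """\
# HELP probe_duration_seconds Returns how long the probe took to complete in seconds
# TYPE probe_duration_seconds gauge
probe_duration_seconds 0.378563375
probe_icmp_duration_seconds{phase="resolve"} 0.195704171
probe_icmp_duration_seconds{phase="rtt"} 0.182456226
probe_icmp_duration_seconds{phase="setup"} 0.000145827
probe_success 1
"""

LINE = re.compile(
    r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(?P<labels>[^}]*)\})?\s+(?P<value>\S+)$'
)


def parse(text):
    """Map 'name{labels}' to a float value, skipping comments and blank lines."""
    samples = {}
    for line in text.splitlines():
        if not line or line.startswith("#"):
            continue
        m = LINE.match(line)
        if m is None:
            continue
        key = m["name"] + ("{" + m["labels"] + "}" if m["labels"] else "")
        samples[key] = float(m["value"])
    return samples


metrics = parse(SAMPLE)
phases = sum(v for k, v in metrics.items() if k.startswith("probe_icmp_duration_seconds"))
print("success:", metrics["probe_success"] == 1.0)
print("phases sum:", round(phases, 9), "total:", metrics["probe_duration_seconds"])
```

The three ICMP phases add up to slightly less than `probe_duration_seconds`, and here more than half of the total is DNS resolution rather than round-trip time.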
How the SSH probe flows
prometheus --> blackbox_exporter, called as http://192.168.0.112:9115/probe?module=ssh_banner&target=192.168.0.127%3A22 --> 192.168.0.127:22
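The `%3A` in that URL is simply the URL-encoded colon of `192.168.0.127:22`. A short Python sketch that builds the same probe URL from its parts; the helper name `probe_url` is hypothetical:

```python
from urllib.parse import urlencode


def probe_url(exporter, module, target):
    """Build the /probe URL that is requested from blackbox_exporter."""
    query = urlencode({"module": module, "target": target})
    return f"http://{exporter}/probe?{query}"


url = probe_url("192.168.0.112:9115", "ssh_banner", "192.168.0.127:22")
print(url)
```

Encoding matters most for HTTP targets, whose own `://`, `?` and `&` characters would otherwise be mistaken for parts of the probe URL.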
Scraping multiple instances with redis_exporter
Project address
Deploy redis_exporter with Ansible
ansible-playbook -i host_file service_deploy.yaml -e "tgz=redis_exporter-v1.20.0.linux-amd64.tar.gz" -e "app=redis_exporter"
Add redis_exporter to the scrape pool, following the same pattern as blackbox_exporter
scrape_configs:
  ## config for the multiple Redis targets that the exporter will scrape
  - job_name: 'redis_exporter_targets'
    static_configs:
      - targets:
        - redis://172.20.70.205:6379
        - redis://172.20.70.205:6479
    metrics_path: /scrape
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: 172.20.70.215:9121
[root@prome-master01 prometheus]# systemctl start redis_6379
[root@prome-master01 prometheus]# systemctl start redis_6479
Copy the Grafana JSON from the project and import it as a dashboard
Result screenshot