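04 - Just Playing Around First - Setting Up and Learning Grafana
Learning Grafana, from installation through building dashboards to triggering alerts. It also includes…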

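Preface

A few days ago, a bout of acute appendicitis landed me straight in the hospital.

After the surgery I was finally discharged, so the haphazard learning journey continues.

Goals for this round of self-study

Learn to install Grafana
Learn to create and edit a monitoring panel
Learn to monitor some basic machine metrics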

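Download and installation

Official release

grafana.com/get/

The cloud offering isn't needed for now (it requires signing up); the OSS (Open Source Software) build further down is the real subject here.

I hadn't researched much beforehand, so I just downloaded the first one listed.

The download page also explains how to install Grafana on the various Linux distributions; just follow the steps.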

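Mirror sites

I did wonder: what if you don't have a proxy to get past the firewall?

Assuming that there must be a mirror inside China, I searched for mirror sites.

Tsinghua University and Alibaba Cloud both host one; I didn't look any further.

mirrors.tuna.tsinghua.edu.cn/help/grafan…

developer.aliyun.com/mirror/?ser…

Both pages document the steps in detail; just follow along.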

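`CentOS 7` and an error from the Alibaba mirror

One small problem: installing directly with yum on CentOS 7 failed with an HTTP 416 error.

I'm not sure how yum issues its Range requests under the hood, so I tried downloading the package first and installing it afterwards.

wget 'https://mirrors.aliyun.com/grafana/yum/rpm/Packages/grafana-9.3.6-1.x86_64.rpm'
sudo yum localinstall grafana-9.3.6-1.x86_64.rpm

What, it's done? Then never mind; with the install finished, I moved straight on to the next step.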

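First run: a troubleshooting log

(Starting the service, enabling it at boot, and so on can be done once up front. For convenience, you can also add an entry to the hosts file that maps the Linux VM's IP to a hostname for Grafana.)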
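The steps in that aside could look like the sketch below. The unit name `grafana-server` is the one the official RPM installs; the IP `192.168.1.10` and the hostname `grafanatest` are placeholders made up for illustration, not values from this setup.

```shell
# Start Grafana now and enable it at boot (run on the Linux VM).
start_grafana() {
    sudo systemctl daemon-reload
    sudo systemctl start grafana-server
    sudo systemctl enable grafana-server
}

# Print a hosts-file line mapping an IP address to a hostname.
hosts_entry() {
    printf '%s %s\n' "$1" "$2"
}

# Example with placeholder values: prints "192.168.1.10 grafanatest"
hosts_entry 192.168.1.10 grafanatest
```

The printed line goes into `/etc/hosts` on Linux or macOS, or into `C:\Windows\System32\drivers\etc\hosts` on Windows.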

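No luck, the page won't load.

You can try telnet to see what's going on (if Windows says the command doesn't exist, you can probably enable it in Control Panel).

Still no luck; keep digging.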
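If telnet isn't available, the same reachability check can be sketched with bash alone. This is a minimal helper under the assumption that `bash` and the coreutils `timeout` command are present on the client; the hostname in the usage comment is a placeholder.

```shell
# Return 0 if a TCP connection to host:port succeeds within 3 seconds.
# Uses bash's /dev/tcp pseudo-device, so the check itself runs in bash.
port_open() {
    timeout 3 bash -c "exec 3<>/dev/tcp/$1/$2" 2>/dev/null
}

# Example with a placeholder hostname:
# port_open grafanatest 3000 && echo reachable || echo blocked
```

A failure here only says the connection was refused, rejected, or timed out; it doesn't say which firewall or service is responsible.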

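Postmortem: blocked by `iptables`

What eventually fixed it was adding an ACCEPT rule and putting it first in the chain; only then did the request get through.

sudo iptables -I INPUT 1 -p tcp --dport 3000 -j ACCEPT

In the iptables listing, port 3000 is displayed as hbci.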

[watcher@functiona ~]$ sudo iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:hbci
2    ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain
3    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:domain
4    ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps
5    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:bootps
6    ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
7    ACCEPT     all  --  anywhere             anywhere            
8    INPUT_direct  all  --  anywhere             anywhere            
9    INPUT_ZONES_SOURCE  all  --  anywhere             anywhere            
10   INPUT_ZONES  all  --  anywhere             anywhere            
11   DROP       all  --  anywhere             anywhere             ctstate INVALID
12   REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

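Some searching later, I found that this port number already has a registered service name.

It's the same idea as 80 being http and 443 being https: the listing shows well-known ports by name rather than by number.

unix.stackexchange.com/questions/3…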
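The name comes from the services database on the machine. Here is a small sketch for looking it up, assuming the usual `/etc/services` layout of `name port/protocol`; the helper name `service_name` is made up for this example.

```shell
# Print the service name registered for a port/protocol pair in a
# services-format file (defaults to /etc/services).
service_name() {
    port="$1"; proto="$2"; file="${3:-/etc/services}"
    awk -v want="$port/$proto" '$1 !~ /^#/ && $2 == want { print $1; exit }' "$file"
}

# Example: service_name 3000 tcp
```

To see plain port numbers in the listing instead, add `-n`: `sudo iptables -L INPUT -n --line-numbers`.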

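Postmortem 2: after a Linux reboot, the iptables rules were reset

Generally a Linux server doesn't get rebooted; as a server it has to keep services running around the clock.

Still, there will be times when it needs maintenance and a reboot.

After rebooting (I shut the VM down in VMware and powered it on again), the ACCEPT rule I had added was gone.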

[watcher@functiona ~]$ sudo iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     udp  --  anywhere             anywhere             udp dpt:domain
2    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:domain
3    ACCEPT     udp  --  anywhere             anywhere             udp dpt:bootps
4    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:bootps
5    ACCEPT     all  --  anywhere             anywhere             ctstate RELATED,ESTABLISHED
6    ACCEPT     all  --  anywhere             anywhere            
7    INPUT_direct  all  --  anywhere             anywhere            
8    INPUT_ZONES_SOURCE  all  --  anywhere             anywhere            
9    INPUT_ZONES  all  --  anywhere             anywhere            
10   DROP       all  --  anywhere             anywhere             ctstate INVALID
11   REJECT     all  --  anywhere             anywhere             reject-with icmp-host-prohibited

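Time to find a once-and-for-all fix.

Plain `iptables` with an added rule: still reset after a reboot

Searching showed that the default firewall on CentOS 7 is firewalld, although the commands it calls underneath are still iptables.

Following a suggestion from Doubao (an AI assistant), I decided to try plain iptables.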

# Install the iptables service
sudo yum install iptables-services

# Stop and disable the default firewall
sudo systemctl stop firewalld
sudo systemctl disable firewalld

# Start the iptables service and enable it at boot
sudo systemctl start iptables
sudo systemctl enable iptables

I added a rule to allow port 3000, and access worked in testing.

[watcher@functiona ~]$  sudo iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
[watcher@functiona ~]$  sudo iptables -I INPUT 1 -p tcp --dport 3000 -j ACCEPT
[watcher@functiona ~]$  sudo iptables -L INPUT --line-numbers
Chain INPUT (policy ACCEPT)
num  target     prot opt source               destination         
1    ACCEPT     tcp  --  anywhere             anywhere             tcp dpt:hbci

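Reboot, then try again.

Result: no good, the rule was gone again.

Add the rule and save it: the custom rule survives a reboot

I wondered whether I needed to write my custom rule into the rules file myself, so that when Linux reboots or the service restarts, the service reads that file and picks the custom rule up automatically.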

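# Add a new rule that accepts requests to port 3000
sudo iptables -I INPUT 1 -p tcp --dport 3000 -j ACCEPT
# Save the current rules to the iptables rules file (run through a root shell so the redirect also has root permission)
sudo sh -c 'iptables-save > /etc/sysconfig/iptables'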

This time the rule was still there after a reboot. Nice.
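Before rebooting, it's possible to check that the rule actually landed in the saved file. The sketch below assumes the file is in `iptables-save` format, where the rule appears as a line like `-A INPUT -p tcp -m tcp --dport 3000 -j ACCEPT`; the helper name `rule_saved` is made up for this example.

```shell
# Return 0 if an iptables-save style file contains an ACCEPT rule in the
# INPUT chain for the given TCP destination port.
# Reading /etc/sysconfig/iptables itself usually needs root.
rule_saved() {
    port="$1"; file="${2:-/etc/sysconfig/iptables}"
    grep -Eq -- "^-A INPUT .*-p tcp .*--dport $port( .*)? -j ACCEPT" "$file"
}

# Example (from a root shell): rule_saved 3000 && echo saved
```

For setups that keep firewalld instead, the equivalent persistent change is `firewall-cmd --permanent --add-port=3000/tcp` followed by `firewall-cmd --reload`.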

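Notes after getting in

On first login, the username and password are both admin.

After logging in, you are asked to change the password.

~~This is a practice project, so I didn't bother.~~

Changing the language

The UI can actually be switched to Chinese.

But this is about learning the technology, and English is unavoidable sooner or later; anything I don't know I can look up in a dictionary.

I decided to stay with the English interface.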

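Preparation 1: setting up a database

I had assumed I could create a new dashboard in one click; then, on the prompt for creating a new panel, I found there was no data source.

Only then did I realize I hadn't even set up a database.

So, let's set one up.

There are many kinds to choose from; I decided to start with the first one listed, Prometheus, which is described as a time-series database.

PS: I later found there's a common combo here: k8s + Docker + Prometheus + Grafana.

That's a lot of new things at once, so I'll set the first two aside, finish learning and practicing the latter two, and come back to the first two later.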

Download and install, start to finish

prometheus.io/download/

# Download the latest release tarball
wget https://github.com/prometheus/prometheus/releases/download/v3.3.0/prometheus-3.3.0.linux-amd64.tar.gz

# Extract it under /opt
sudo tar -xvf ~/prometheus-3.3.0.linux-amd64.tar.gz -C /opt

# As with MySQL, create a dedicated user and group so Prometheus runs without permission problems
sudo groupadd --system prometheus
sudo useradd -s /sbin/nologin --system -g prometheus prometheus

# Move the binaries to /usr/local/bin
sudo mv /opt/prometheus-3.3.0.linux-amd64/prometheus /usr/local/bin/
sudo mv /opt/prometheus-3.3.0.linux-amd64/promtool /usr/local/bin/

# Create the config and data directories
sudo mkdir -p /etc/prometheus
sudo mkdir -p /var/lib/prometheus

# Move the config file into the config directory
sudo mv /opt/prometheus-3.3.0.linux-amd64/prometheus.yml /etc/prometheus

# Set ownership of the directories and files
sudo chown -R prometheus:prometheus /usr/local/bin/prometheus
sudo chown -R prometheus:prometheus /usr/local/bin/promtool
sudo chown -R prometheus:prometheus /etc/prometheus
sudo chown -R prometheus:prometheus /var/lib/prometheus

I had planned to edit the config file next, but it already has default content, so I'm leaving it alone for now.

# Create the Prometheus unit file so the service can start at boot
sudo touch /usr/lib/systemd/system/prometheus.service
# Set ownership
sudo chown prometheus:prometheus /usr/lib/systemd/system/prometheus.service

I filled in the service file as shown below; all of it came from the AI.

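[Unit]
# Description of the service, so its purpose is easy to identify
Description=Prometheus
# The service wants the network to be online
Wants=network-online.target
# Start this service only after the network is online
After=network-online.target

[Service]
# Run the service as the prometheus user
User=prometheus
# Run the service as the prometheus group
Group=prometheus
# Simple service type: the ExecStart command is the main process
Type=simple
# Command that starts the service, with the Prometheus config file path and data storage path
ExecStart=/usr/local/bin/prometheus \
    --config.file /etc/prometheus/prometheus.yml \
    --storage.tsdb.path /var/lib/prometheus/

[Install]
# Start the service as part of multi-user mode, i.e. at boot once it is enabled
WantedBy=multi-user.target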

Once it's written, run enable, start, and status to check on it; normally it will be up.
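Spelled out, the enable/start/status sequence might look like the sketch below. The unit name `prometheus` follows the service file above; the health check uses Prometheus's `/-/healthy` endpoint and assumes `curl` is installed and Prometheus listens on its default port 9090.

```shell
# Reload unit files, then enable Prometheus at boot, start it, and show status.
start_prometheus() {
    sudo systemctl daemon-reload
    sudo systemctl enable prometheus
    sudo systemctl start prometheus
    systemctl status prometheus --no-pager
}

# Return 0 if the Prometheus health endpoint answers successfully.
prometheus_healthy() {
    curl -fsS --max-time 5 "${1:-http://localhost:9090}/-/healthy" >/dev/null 2>&1
}

# Example: start_prometheus && prometheus_healthy && echo "Prometheus is up"
```

If the service fails to start, `promtool check config /etc/prometheus/prometheus.yml` is a quick way to rule out a broken config file.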

Then add a hosts entry yourself, or remember the Linux machine's IP, and open it directly.

http://prometheustest:9090/query

If the page loads and renders, it worked.

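Preparation 2: setting up a collector

After setting up Prometheus, I found that it is a database: it doesn't collect machine metrics by itself.

No big deal; I asked again what else needs installing.

`Node Exporter`, start to finish

Node Exporter is the exporter from the Prometheus project that collects hardware and OS metrics exposed by *NIX kernels.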

wget https://github.com/prometheus/node_exporter/releases/download/v1.9.1/node_exporter-1.9.1.linux-amd64.tar.gz

tar -zxvf node_exporter-1.9.1.linux-amd64.tar.gz

sudo mv ./node_exporter-1.9.1.linux-amd64/node_exporter /usr/local/bin/

sudo chown prometheus:prometheus /usr/local/bin/node_exporter

Next, write the unit file, enable it, and check whether the exporter has started collecting.

[watcher@functiona bin]$ sudo vim /etc/systemd/system/node_exporter.service
[Unit]
Description=Node Exporter
Wants=network-online.target
After=network-online.target

[Service]
User=prometheus
Group=prometheus
Type=simple
ExecStart=/usr/local/bin/node_exporter

[Install]
WantedBy=multi-user.target


[watcher@functiona bin]$ sudo systemctl daemon-reload
[watcher@functiona bin]$ sudo systemctl start node_exporter
[watcher@functiona bin]$ sudo systemctl enable node_exporter
Created symlink from /etc/systemd/system/multi-user.target.wants/node_exporter.service to /etc/systemd/system/node_exporter.service.

If metrics show up, Node Exporter is running.
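One way to check from the command line is to fetch the metrics page and count the host metrics in it. This sketch assumes Node Exporter is listening on its default port 9100; the helper name `count_node_metrics` is made up for this example.

```shell
# Count metric samples whose name starts with "node_" in Prometheus
# text-format input read from stdin (HELP/TYPE lines start with "#").
count_node_metrics() {
    grep -c '^node_'
}

# Example, run on the monitored machine:
# curl -s http://localhost:9100/metrics | count_node_metrics
```

A count well above zero means the exporter is serving host metrics.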

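Next, add Node Exporter to the Prometheus config file as a scrape target, so the database can pull its data.

sudo vim /etc/prometheus/prometheus.yml

# Add the following snippet under scrape_configs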
  - job_name: "node_exporter"
    static_configs:
      - targets: ["localhost:9100"]

After saving, restart Prometheus and look at the scrape targets again; Node Exporter now shows up.
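The targets can also be listed without the web UI, through Prometheus's `/api/v1/targets` endpoint. The sketch below pulls the job names out with `grep` and `cut`, which assumes the response is compact JSON; `jq` would be the more robust tool if it is installed. The helper name `target_jobs` is made up for this example.

```shell
# Print the distinct job names found in the JSON returned by
# Prometheus's /api/v1/targets endpoint (read from stdin).
target_jobs() {
    grep -o '"job":"[^"]*"' | cut -d'"' -f4 | sort -u
}

# Example, assuming Prometheus on its default port:
# curl -s http://localhost:9090/api/v1/targets | target_jobs
```

With the config from this post, the list should include `node_exporter` alongside the default `prometheus` job.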

Tying it together in `Grafana`

Then, using some basic queries suggested by Doubao, I could query CPU and memory usage.
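The exact queries aren't recorded here, so the two below are common PromQL expressions over Node Exporter metrics, not necessarily the ones used in this post. The sketch assumes Prometheus on its default port and `curl` on the machine; `promql` and `mem_used_percent` are helper names made up for this example.

```shell
# Two commonly used PromQL expressions over Node Exporter metrics.
CPU_BUSY_QUERY='100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)'
MEM_USED_QUERY='(1 - node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes) * 100'

# Run an instant query against the Prometheus HTTP API.
promql() {
    curl -fsS --get --data-urlencode "query=$1" \
        "${2:-http://localhost:9090}/api/v1/query"
}

# The same arithmetic as MEM_USED_QUERY, for two plain numbers.
mem_used_percent() {
    awk -v total="$1" -v avail="$2" 'BEGIN { printf "%.1f\n", (1 - avail / total) * 100 }'
}

mem_used_percent 1000 250   # prints 75.0
# Example: promql "$MEM_USED_QUERY"
```

Both expressions can also be pasted as-is into the Prometheus query page or a Grafana panel.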

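Prometheus looks to be working.

Now add the data source directly in Grafana, leaving everything at its defaults.

After filling in the default URL, it connected.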
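This was done through the UI, but the same step can be scripted against Grafana's HTTP API. The sketch below is an alternative, not what this post did: it assumes the initial admin/admin login is still in place, Grafana on port 3000, and Prometheus at `http://localhost:9090`; the helper names are made up for this example.

```shell
# Build the JSON body for Grafana's "create data source" API call.
datasource_payload() {
    printf '{"name":"%s","type":"prometheus","url":"%s","access":"proxy"}\n' "$1" "$2"
}

# POST the payload to Grafana using HTTP basic auth.
add_datasource() {
    datasource_payload "$1" "$2" |
        curl -fsS -u admin:admin -H 'Content-Type: application/json' \
            -X POST --data @- "${3:-http://localhost:3000}/api/datasources"
}

# Example: add_datasource Prometheus http://localhost:9090
```

If the admin password has been changed, the `-u` value needs to change with it.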

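Creating a `dashboard`

The settings for a new dashboard contain a lot of new things.

To make building charts easier, I decided to use Code mode.

Things like renaming series labels through the Legend field can be explored on your own.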
