Kubernetes Deployment and Operations 06: Deploying Metrics Server


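Kubernetes currently ships about three releases a year, and because it iterates so quickly, most tutorials on the market are out of date. More importantly, Kubernetes 1.20 announced the deprecation of Docker as a container runtime, and Kubernetes 1.24 removed that support entirely. From Kubernetes 1.24 onward, Docker can no longer be used directly as the container runtime for Pods on Kubernetes nodes. Tutorials for recent Kubernetes versions that use the containerd runtime directly are still scarce, and the container images Kubernetes depends on require separate configuration before they can be pulled, which raises the barrier for beginners. This tutorial is published online so that it can stay in sync with the latest Kubernetes release, and it aims to be self-contained so that readers can study and practice with it.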

Keywords: Kubernetes 1.32; containerd; nerdctl; Debian 12


Overall Plan

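To make the later material easier to follow, this part builds on the environment from [Kubernetes Deployment and Operations 05: Adding, Removing, and Resetting Nodes] to deploy Metrics Server on Kubernetes. The overall plan is as follows: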

VM name          IP address        Hostname   Domain       CPU cores   Memory   Role
k8s_Master1_2G   192.168.152.200   master1    master.rz    2           2GB      master
K8s_Worker1_2G   192.168.152.201   worker1    worker1.rz   1           2GB      worker
K8s_Worker2_2G   192.168.152.202   worker2    worker2.rz   1           2GB      worker

Software versions:

Software         Version
Metrics Server   0.7.2

As described in [Kubernetes Deployment and Operations 02: Nerdctl Rootful Deployment], the components of the Kubernetes base environment and their versions are:

- nerdctl: v1.7.7
- containerd: v1.7.22
- runc: v1.1.14
- CNI plugins: v1.5.1
- BuildKit: v0.15.2
- Stargz Snapshotter: v0.15.1
- imgcrypt: v1.1.11
- RootlessKit: v2.3.1
- slirp4netns: v1.3.1
- bypass4netns: v0.4.1
- fuse-overlayfs: v1.13
- containerd-fuse-overlayfs: v1.0.8
- Kubo (IPFS): v0.29.0
- Tini: v0.19.0
- buildg: v0.4.1

The Kubernetes version is 1.32.

Background

Metrics Server is a scalable, efficient source of container resource metrics for the autoscaling pipelines built into Kubernetes.

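Metrics Server collects resource metrics from the kubelets and exposes them in the Kubernetes apiserver through the Metrics API, for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. Put simply, Metrics Server is the cluster's aggregator of monitoring data: once it is installed, users can access that data through a standard API path (/apis/metrics.k8s.io). Note that Metrics Server is not part of kube-apiserver. It is deployed separately and served alongside kube-apiserver through the aggregation layer (kube-aggregator), which works as a plugin mechanism. When an API request arrives, kube-aggregator inspects which API is being requested and routes the request to the API server that handles it.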

               GET /apis/metrics.k8s.io/v1beta1
                              |
                       kube-aggregator
       |                      |                      |
kube-apiserver          metrics-server     another-add-on-apiserver
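Because the Metrics API is served through the aggregation layer, it can be queried like any other API path once Metrics Server is running. The sketch below only builds the request path. `metrics_api_path` is a hypothetical helper name introduced for illustration and is not part of kubectl.

```shell
# Hypothetical helper: build a Metrics API path for nodes or pods.
#   metrics_api_path nodes
#   metrics_api_path pods kube-system
metrics_api_path() {
  resource="$1"
  namespace="${2:-}"
  base="/apis/metrics.k8s.io/v1beta1"
  if [ "$resource" = "pods" ] && [ -n "$namespace" ]; then
    # Pod metrics can be scoped to a single namespace.
    printf '%s/namespaces/%s/pods\n' "$base" "$namespace"
  else
    printf '%s/%s\n' "$base" "$resource"
  fi
}
```

On a cluster where Metrics Server is healthy, the path can be passed to `kubectl get --raw "$(metrics_api_path nodes)"`, which returns the raw JSON that `kubectl top` summarizes.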

Metrics Server features:

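  • A single deployment that works on most clusters;
  • Fast autoscaling, collecting metrics every 15 seconds;
  • Resource efficiency, using 1 millicore of CPU and 2 MB of memory per node in the cluster;
  • Scalable support for clusters of up to 5,000 nodes.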

Metrics Server use cases (required for horizontal scaling with horizontalpodautoscalers.autoscaling):

  • Horizontal autoscaling based on CPU/memory;
  • Automatically adjusting or recommending the resources that containers need.

Hands-on Practice

Preparation

1) Start all three virtual machines: K8s_Master1_2G, K8s_Worker1_2G, and K8s_Worker2_2G.

2) On the master node master1, create the folder /root/software/metrics-server0.7.2 to hold the related files:

root@master1:~# mkdir /root/software/metrics-server0.7.2

root@master1:~# cd /root/software/metrics-server0.7.2/

root@master1:~/software/metrics-server0.7.2#

Deploying Metrics Server

[Practice 01: Metrics Server Deployment] Install Metrics Server in the cluster

1) Download the Metrics Server YAML manifest:

root@master1:~/software/metrics-server0.7.2# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml
--2025-01-30 13:14:38--  https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml
Resolving github.com (github.com)... 20.205.243.166
...
--2025-01-30 13:16:41-- (37.4 MB/s) - ‘components.yaml’ saved [4307/4307]

root@master1:~/software/metrics-server0.7.2# ll
total 8
-rw-r--r-- 1 root root 4307 Jan 30 13:16 components.yaml

root@master1:~/software/metrics-server0.7.2# cp components.yaml components.yaml.bak

root@master1:~/software/metrics-server0.7.2# ll
total 16
-rw-r--r-- 1 root root 4307 Jan 30 13:16 components.yaml
-rw-r--r-- 1 root root 4307 Jan 30 13:18 components.yaml.bak

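If the download fails, see the Baidu Netdisk link at the end of this tutorial, which provides the files and resources used here. Take components.yaml from the companion resources and upload it manually to the /root/software/metrics-server0.7.2/ folder.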

2) Edit the components.yaml manifest and change the image field:

        image: registry.aliyuncs.com/google_containers/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent

The original value of image is registry.k8s.io/metrics-server/metrics-server:v0.7.2. Change it to a mirror that is reachable from mainland China: registry.aliyuncs.com/google_containers/metrics-server:v0.7.2
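The same edit can be scripted instead of made by hand. The following is a minimal sketch that assumes the manifest still contains the upstream image reference. `use_mirror_image` is a hypothetical function name, not something shipped with Metrics Server.

```shell
# Hypothetical helper: rewrite the upstream Metrics Server image
# reference to the Aliyun mirror, keeping whatever tag is present.
use_mirror_image() {
  manifest="$1"
  sed 's#registry\.k8s\.io/metrics-server/metrics-server:#registry.aliyuncs.com/google_containers/metrics-server:#' \
    "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}
```

It would be called as `use_mirror_image /root/software/metrics-server0.7.2/components.yaml`. Only the registry and repository prefix is rewritten, so the indentation and the v0.7.2 tag are left untouched.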

3) Deploy Metrics Server with kubectl apply. The readiness probe will fail with HTTP status code 500.

root@master1:~/software/metrics-server0.7.2# kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created

root@master1:~/software/metrics-server0.7.2# cd

root@master1:~# kubectl -n kube-system get pods
NAME                                       READY   STATUS 
calico-kube-controllers-5745477d4d-x4dbj   1/1     Running
calico-node-d9sv5                          1/1     Running
calico-node-ntx28                          1/1     Running
calico-node-zpbc2                          1/1     Running
coredns-6766b7b6bb-hqgsd                   1/1     Running
coredns-6766b7b6bb-w8ztk                   1/1     Running
etcd-master1                               1/1     Running
kube-apiserver-master1                     1/1     Running
kube-controller-manager-master1            1/1     Running
kube-proxy-b8n2k                           1/1     Running
kube-proxy-h2v5l                           1/1     Running
kube-proxy-jf5tm                           1/1     Running
kube-scheduler-master1                     1/1     Running
metrics-server-554dcb6944-kplgb            0/1     Running

root@master1:~# kubectl -n kube-system describe pod metrics-server-554dcb6944-kplgb
Name:                 metrics-server-554dcb6944-kplgb
Namespace:            kube-system
...
Events:
  Type     Reason     Age                 From      Message
  ----     ------     ----                ----      -------
...
  Warning  Unhealthy  2s (x14 over 2m9s)  kubelet   Readiness probe failed: HTTP probe failed with statuscode: 500

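The READY column for metrics-server-554dcb6944-kplgb shows 0/1. Cause: the kubelet serving certificates on the nodes have not been issued through TLS bootstrap (they are not signed by the cluster CA), so metrics-server fails when it accesses resources on each node.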

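There are two ways to resolve this error:

  • Edit components.yaml: in the container's args list, add a line - --kubelet-insecure-tls after - --metric-resolution=15s. This explicitly allows the container to connect to the kubelets insecurely, without verifying their certificates;
  • Enable TLS bootstrap certificate issuance (the recommended solution).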
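For the first option, the flag can be inserted with a small script rather than by hand. This is a sketch only, assuming the args list contains a `- --metric-resolution=` entry as in the v0.7.2 manifest. `add_insecure_tls_flag` is a hypothetical name, and this option is suited to test clusters because it skips certificate verification.

```shell
# Hypothetical helper: insert "- --kubelet-insecure-tls" right after the
# "- --metric-resolution=..." argument, reusing that line's indentation.
add_insecure_tls_flag() {
  manifest="$1"
  # Do nothing if the flag is already present.
  if grep -q -- '--kubelet-insecure-tls' "$manifest"; then
    return 0
  fi
  awk '
    { print }
    /^[ \t]*- --metric-resolution=/ {
      indent = $0
      sub(/-.*/, "", indent)   # keep only the leading whitespace
      print indent "- --kubelet-insecure-tls"
    }
  ' "$manifest" > "$manifest.tmp" && mv "$manifest.tmp" "$manifest"
}
```

Running it twice leaves a single copy of the flag, so it is safe to call before every `kubectl apply`.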

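4) Enable TLS bootstrap certificate issuance on the master node (master1) and the worker nodes (worker1, worker2) by adding the following key/value pair to the kubelet YAML configuration. On every node (master1, worker1, worker2), edit /var/lib/kubelet/config.yaml (generated when the control plane was created) and append serverTLSBootstrap: true at the end of the file, as shown below: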

...
syncFrequency: 0s
volumeStatsAggPeriod: 0s
serverTLSBootstrap: true
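Since the same change has to be made on three nodes, it may help to script it so that repeated runs do not append the key twice. The sketch below assumes a flat, top-level key, as in the kubelet configuration shown above. `enable_server_tls_bootstrap` is a hypothetical helper name.

```shell
# Hypothetical helper: make sure the kubelet config contains
# "serverTLSBootstrap: true" exactly once.
enable_server_tls_bootstrap() {
  config="$1"
  if grep -q '^serverTLSBootstrap:' "$config"; then
    # Key already present: force its value to true, keeping the file in place.
    sed 's/^serverTLSBootstrap:.*/serverTLSBootstrap: true/' "$config" > "$config.tmp" \
      && cat "$config.tmp" > "$config" && rm -f "$config.tmp"
  else
    # Add a newline first if the file does not end with one.
    if [ -n "$(tail -c 1 "$config")" ]; then
      printf '\n' >> "$config"
    fi
    printf 'serverTLSBootstrap: true\n' >> "$config"
  fi
}
```

On each node this would be run as `enable_server_tls_bootstrap /var/lib/kubelet/config.yaml`, followed by the kubelet restart in the next step.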

5) Restart the kubelet service on every node (master1, worker1, and worker2):

root@master1:~# systemctl daemon-reload

root@master1:~# systemctl restart kubelet.service

root@master1:~# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
     Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; preset: enable
    Drop-In: /usr/lib/systemd/system/kubelet.service.d
             └─10-kubeadm.conf
     Active: active (running) since Thu 2025-01-30 13:47:46 CST; 9s ago
       Docs: https://kubernetes.io/docs/
   Main PID: 54601 (kubelet)
      Tasks: 12 (limit: 2264)
     Memory: 41.2M
        CPU: 600ms
     CGroup: /system.slice/kubelet.service
             └─54601 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/boot

6) On the master node master1, check the CONDITION of the nodes' certificate signing requests:

root@master1:~# kubectl get csr
NAME        SIGNERNAME            REQUESTOR             CONDITION
csr-d5qp6   .../kubelet-serving   system:node:worker1   Pending
...
csr-tgwhr   .../kubelet-serving   system:node:worker2   Pending
csr-w9f77   .../kubelet-serving   system:node:master1   Pending

Some output is omitted due to layout constraints.

7) Manually approve the node certificate signing requests (every csr whose CONDITION is Pending):

root@master1:~# kubectl certificate approve csr-d5qp6
certificatesigningrequest.certificates.k8s.io/csr-d5qp6 approved

root@master1:~# kubectl certificate approve csr-tgwhr
certificatesigningrequest.certificates.k8s.io/csr-tgwhr approved

root@master1:~# kubectl certificate approve csr-w9f77
certificatesigningrequest.certificates.k8s.io/csr-w9f77 approved
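With more nodes, approving each request by name becomes tedious. The sketch below filters the output of `kubectl get csr` down to the names of pending kubelet-serving requests. `pending_kubelet_serving_csrs` is a hypothetical helper, and it assumes CONDITION is the last column, as in the output above.

```shell
# Hypothetical helper: read `kubectl get csr` output on stdin and print
# the names of kubelet-serving requests whose CONDITION is Pending.
pending_kubelet_serving_csrs() {
  awk 'NR > 1 && $NF == "Pending" && /kubelet-serving/ { print $1 }'
}
```

A possible use is `kubectl get csr | pending_kubelet_serving_csrs | xargs -r -n 1 kubectl certificate approve`. It is worth reviewing the REQUESTOR column first, since approving a request issues a certificate to whoever submitted it.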

8) On the master node master1, check the CONDITION of the nodes' certificate signing requests again:

root@master1:~# kubectl get csr
NAME        SIGNERNAME            REQUESTOR             CONDITION
csr-d5qp6   .../kubelet-serving   system:node:worker1   Approved,Issued
csr-tgwhr   .../kubelet-serving   system:node:worker2   Approved,Issued
csr-w9f77   .../kubelet-serving   system:node:master1   Approved,Issued

9) On the master node master1, run kubectl apply again to formally deploy Metrics Server:

root@master1:~# kubectl apply -f /root/software/metrics-server0.7.2/components.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged

10) Check the status of the Kubernetes core components. The metrics-server Pod should now show 1/1 in the READY column:

root@master1:~# kubectl -n kube-system get pods
NAME                                       READY   STATUS 
calico-kube-controllers-5745477d4d-x4dbj   1/1     Running
calico-node-d9sv5                          1/1     Running
calico-node-ntx28                          1/1     Running
calico-node-zpbc2                          1/1     Running
coredns-6766b7b6bb-hqgsd                   1/1     Running
coredns-6766b7b6bb-w8ztk                   1/1     Running
etcd-master1                               1/1     Running
kube-apiserver-master1                     1/1     Running
kube-controller-manager-master1            1/1     Running
kube-proxy-b8n2k                           1/1     Running
kube-proxy-h2v5l                           1/1     Running
kube-proxy-jf5tm                           1/1     Running
kube-scheduler-master1                     1/1     Running
metrics-server-554dcb6944-kplgb            1/1     Running

11) Metrics Server can now be used to view resource usage:

root@master1:~# kubectl -n kube-system top pods
NAME                                       CPU(cores)   MEMORY(bytes)
calico-kube-controllers-5745477d4d-x4dbj   5m           27Mi
calico-node-d9sv5                          19m          187Mi
calico-node-ntx28                          21m          188Mi
calico-node-zpbc2                          23m          187Mi
coredns-6766b7b6bb-hqgsd                   2m           21Mi
coredns-6766b7b6bb-w8ztk                   1m           22Mi
etcd-master1                               16m          71Mi
kube-apiserver-master1                     29m          279Mi
kube-controller-manager-master1            14m          81Mi
kube-proxy-b8n2k                           1m           21Mi
kube-proxy-h2v5l                           1m           32Mi
kube-proxy-jf5tm                           1m           21Mi
kube-scheduler-master1                     6m           39Mi
metrics-server-554dcb6944-kplgb            2m           18Mi

root@master1:~# kubectl top nodes
NAME      CPU(cores)   CPU(%)   MEMORY(bytes)   MEMORY(%)
master1   136m         6%       1202Mi          65%
worker1   36m          1%       766Mi           41%
worker2   35m          1%       732Mi           39%
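The tabular output of `kubectl top nodes` is easy to post-process. As an illustration, the sketch below lists nodes whose memory usage exceeds a given percentage. `nodes_over_memory` is a hypothetical helper, and it assumes MEMORY(%) is the fifth column, as in the output above.

```shell
# Hypothetical helper: read `kubectl top nodes` output on stdin and print
# the names of nodes whose MEMORY(%) is above the given threshold.
nodes_over_memory() {
  threshold="$1"
  awk -v limit="$threshold" '
    NR > 1 {
      pct = $5
      sub(/%/, "", pct)            # "65%" -> "65"
      if (pct + 0 > limit + 0) print $1
    }
  '
}
```

For example, `kubectl top nodes | nodes_over_memory 60` applied to the output above would print only master1, which sits at 65%.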

Wrap-up

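1) Use the poweroff command to shut down all three virtual machines.

2) In VMware Workstation Pro, take a snapshot of each of the three virtual machines, named "Metrics Server0.7.2部署完成" (Metrics Server 0.7.2 deployment complete).

3) In the AI era, operations work is moving from traditional CPU servers to GPU servers and edge devices, so operations developers and enthusiasts also need to keep up with the new technology stack. The software packages, configuration files, and other resources used in this tutorial can be downloaded directly from Baidu Netdisk: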

  • Baidu Netdisk shared file: Kubernetes1.32
  • Link: https://pan.baidu.com/s/18XeGQ28BDPjHh8JKj0uZFQ?pwd=6x17 Extraction code: 6x17