Kubernetes currently ships roughly three releases per year, and because of this rapid cadence most tutorials on the market are outdated. More importantly, starting with Kubernetes 1.20 the project announced the gradual deprecation of Docker as a container runtime, with support removed entirely in Kubernetes 1.24. This means that from Kubernetes 1.24 onward, Docker can no longer be used as the container runtime for running Pods on Kubernetes nodes. As a result, up-to-date tutorials that use the containerd runtime directly are almost nonexistent; on top of that, the images Kubernetes depends on require extra configuration to pull in some regions, which raises the bar for beginners considerably. This tutorial is published on the internet so it can stay in sync with the latest Kubernetes releases, and it aims to be as self-contained as possible for easy study and hands-on practice.
Keywords: Kubernetes 1.32; containerd; nerdctl; Debian 12
Overall Plan
To ease later study, this part builds on the environment from [Kubernetes Deployment and Operations 05: Adding, Removing, and Resetting Nodes] to walk through deploying the Kubernetes Metrics Server. The overall plan is as follows:
| VM Name | IP Address | Hostname | Domain | CPU Cores | Memory | Role |
|---|---|---|---|---|---|---|
| K8s_Master1_2G | 192.168.152.200 | master1 | master.rz | 2 | 2GB | master |
| K8s_Worker1_2G | 192.168.152.201 | worker1 | worker1.rz | 1 | 2GB | worker |
| K8s_Worker2_2G | 192.168.152.202 | worker2 | worker2.rz | 1 | 2GB | worker |
Software version:
| Software | Version |
|---|---|
| Metrics Server | 0.7.2 |
See [Kubernetes Deployment and Operations 02: Rootful nerdctl Deployment]; the components and versions of the Kubernetes base environment are as follows:
- nerdctl: v1.7.7
- containerd: v1.7.22
- runc: v1.1.14
- CNI plugins: v1.5.1
- BuildKit: v0.15.2
- Stargz Snapshotter: v0.15.1
- imgcrypt: v1.1.11
- RootlessKit: v2.3.1
- slirp4netns: v1.3.1
- bypass4netns: v0.4.1
- fuse-overlayfs: v1.13
- containerd-fuse-overlayfs: v1.0.8
- Kubo (IPFS): v0.29.0
- Tini: v0.19.0
- buildg: v0.4.1
The Kubernetes version is 1.32.
Theory
Metrics Server is a scalable, efficient source of container resource metrics for Kubernetes' built-in autoscaling pipelines.
Metrics Server collects resource metrics from kubelets and exposes them through the Metrics API in the Kubernetes apiserver for use by the Horizontal Pod Autoscaler and the Vertical Pod Autoscaler. Put simply, Metrics Server is the aggregator of the cluster's monitoring data: once installed, users can access that data through a standard API (/apis/metrics.k8s.io). Note that Metrics Server is not part of kube-apiserver; it is deployed independently and serves alongside kube-apiserver through the Aggregator plugin mechanism. When an API request arrives, the kube-aggregator front end determines which API the request targets and routes it to the corresponding backend.
GET /apis/metrics.k8s.io/v1beta1
                 |
          kube-aggregator
         /       |        \
kube-apiserver   metrics-server   another-add-on-apiserver
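The routing shown above is configured by an APIService object. The one at the bottom of components.yaml registers metrics-server with the aggregator; the sketch below is based on the upstream v0.7.2 manifest, so verify the exact content against your downloaded file:

```yaml
apiVersion: apiregistration.k8s.io/v1
kind: APIService
metadata:
  labels:
    k8s-app: metrics-server
  name: v1beta1.metrics.k8s.io
spec:
  group: metrics.k8s.io          # requests to /apis/metrics.k8s.io/...
  groupPriorityMinimum: 100
  insecureSkipTLSVerify: true
  service:                       # ...are proxied to this Service
    name: metrics-server
    namespace: kube-system
  version: v1beta1
  versionPriority: 100
```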
Features of Metrics Server:
- A single deployment that works on most clusters;
- Fast autoscaling support, collecting metrics every 15 seconds;
- Resource efficient: uses about 1m of CPU and 2 MB of memory per cluster node;
- Scalable: supports clusters of up to 5,000 nodes.
Metrics Server capabilities (required for horizontalpodautoscalers.autoscaling horizontal scaling):
- Horizontal autoscaling based on CPU/memory;
- Automatic adjustment of, or suggestions for, the resources containers need.
Hands-on Practice
Preparation
1) Power on all three VMs: K8s_Master1_2G, K8s_Worker1_2G, and K8s_Worker2_2G.
2) On the master node master1, create the /root/software/metrics-server0.7.2 directory to store the related files:
root@master1:~# mkdir /root/software/metrics-server0.7.2
root@master1:~# cd /root/software/metrics-server0.7.2/
root@master1:~/software/metrics-server0.7.2#
Metrics Server Deployment
[Practice 01: Metrics Server Deployment]
Install Metrics Server in the cluster.
1) Download the Metrics Server YAML manifest:
root@master1:~/software/metrics-server0.7.2# wget https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml
--2025-01-30 13:14:38-- https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.7.2/components.yaml
Resolving github.com (github.com)... 20.205.243.166
...
--2025-01-30 13:16:41-- (37.4 MB/s) - ‘components.yaml’ saved [4307/4307]
root@master1:~/software/metrics-server0.7.2# ll
total 8
-rw-r--r-- 1 root root 4307 Jan 30 13:16 components.yaml
root@master1:~/software/metrics-server0.7.2# cp components.yaml components.yaml.bak
root@master1:~/software/metrics-server0.7.2# ll
total 16
-rw-r--r-- 1 root root 4307 Jan 30 13:16 components.yaml
-rw-r--r-- 1 root root 4307 Jan 30 13:18 components.yaml.bak
If the download fails, see the Baidu Netdisk link at the end of this tutorial, which provides the files and resources used throughout. Manually upload the components.yaml file from the companion resources into the /root/software/metrics-server0.7.2/ directory.
2) Edit the components.yaml manifest and modify the image field:
140 │ image: registry.aliyuncs.com/google_containers/metrics-server:v0.7.2
141 │ imagePullPolicy: IfNotPresent
The original image value is registry.k8s.io/metrics-server/metrics-server:v0.7.2; change it to registry.aliyuncs.com/google_containers/metrics-server:v0.7.2, a mirror reachable from mainland China.
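This edit can also be done with a one-line sed. The sketch below runs against a scratch copy (a hypothetical two-line excerpt of components.yaml); on the real file you would run the same sed against components.yaml itself:

```shell
# Create a minimal stand-in for the image section of components.yaml.
cat > /tmp/components-demo.yaml <<'EOF'
        image: registry.k8s.io/metrics-server/metrics-server:v0.7.2
        imagePullPolicy: IfNotPresent
EOF
# Replace the upstream registry path with the mainland-China mirror, in place.
sed -i 's#registry.k8s.io/metrics-server/metrics-server#registry.aliyuncs.com/google_containers/metrics-server#' /tmp/components-demo.yaml
grep 'image:' /tmp/components-demo.yaml
```

Using `#` as the sed delimiter avoids having to escape the slashes in the registry paths.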
3) Deploy Metrics Server with kubectl apply. A readiness probe failure with HTTP status code 500 will appear:
root@master1:~/software/metrics-server0.7.2# kubectl apply -f components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
root@master1:~/software/metrics-server0.7.2# cd
root@master1:~# kubectl -n kube-system get pods
NAME READY STATUS
calico-kube-controllers-5745477d4d-x4dbj 1/1 Running
calico-node-d9sv5 1/1 Running
calico-node-ntx28 1/1 Running
calico-node-zpbc2 1/1 Running
coredns-6766b7b6bb-hqgsd 1/1 Running
coredns-6766b7b6bb-w8ztk 1/1 Running
etcd-master1 1/1 Running
kube-apiserver-master1 1/1 Running
kube-controller-manager-master1 1/1 Running
kube-proxy-b8n2k 1/1 Running
kube-proxy-h2v5l 1/1 Running
kube-proxy-jf5tm 1/1 Running
kube-scheduler-master1 1/1 Running
metrics-server-554dcb6944-kplgb 0/1 Running
root@master1:~# kubectl -n kube-system describe pod metrics-server-554dcb6944-kplgb
Name: metrics-server-554dcb6944-kplgb
Namespace: kube-system
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
...
Warning Unhealthy 2s (x14 over 2m9s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 500
The READY column for metrics-server-554dcb6944-kplgb shows 0/1. Cause: the kubelet serving certificates were not issued through TLS bootstrapping, so metrics-server hits TLS errors when accessing resources on each node.
This error can be resolved in either of two ways:
- Modify components.yaml: in spec.containers.args, add the line - --kubelet-insecure-tls after - --metric-resolution=15s, explicitly allowing the container to access kubelets insecurely at startup;
- Enable TLS bootstrap certificate signing (the recommended fix).
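For reference, the first option would look roughly like this in the metrics-server container's args. This is a sketch based on the v0.7.2 manifest; the surrounding flags may differ slightly in your copy:

```yaml
    spec:
      containers:
      - args:
        - --cert-dir=/tmp
        - --secure-port=10250
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls   # added line: skip TLS verification of kubelet serving certs
```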
4) Enable TLS bootstrap certificate signing on the master (master1) and worker (worker1 and worker2) nodes by appending the following key/value pair to the kubelet configuration. On every node (master1, worker1, and worker2), edit /var/lib/kubelet/config.yaml (generated when the control plane was created) and append serverTLSBootstrap: true at the end of the file, as shown below:
...
47 │ syncFrequency: 0s
48 │ volumeStatsAggPeriod: 0s
49 │ serverTLSBootstrap: true
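The edit above can also be scripted. The sketch below works on a scratch copy (/tmp/kubelet-config-demo.yaml is a stand-in; on a real node the target is /var/lib/kubelet/config.yaml):

```shell
cfg=/tmp/kubelet-config-demo.yaml   # stand-in for /var/lib/kubelet/config.yaml
printf 'syncFrequency: 0s\nvolumeStatsAggPeriod: 0s\n' > "$cfg"
# Append the key only if it is not already present, so the script is safe to re-run.
grep -q '^serverTLSBootstrap:' "$cfg" || echo 'serverTLSBootstrap: true' >> "$cfg"
grep -q '^serverTLSBootstrap:' "$cfg" || echo 'serverTLSBootstrap: true' >> "$cfg"   # second run is a no-op
tail -n 1 "$cfg"   # -> serverTLSBootstrap: true
```

The grep guard keeps the operation idempotent, which matters when the same script is run on all three nodes more than once.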
5) Restart the kubelet service on every node (master1, worker1, and worker2):
root@master1:~# systemctl daemon-reload
root@master1:~# systemctl restart kubelet.service
root@master1:~# systemctl status kubelet.service
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; preset: enable
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Thu 2025-01-30 13:47:46 CST; 9s ago
Docs: https://kubernetes.io/docs/
Main PID: 54601 (kubelet)
Tasks: 12 (limit: 2264)
Memory: 41.2M
CPU: 600ms
CGroup: /system.slice/kubelet.service
└─54601 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/boot
6) On the master node master1, check the CONDITION of the certificate signing requests (CSRs):
root@master1:~# kubectl get csr
NAME SIGNERNAME REQUESTOR CONDITION
csr-d5qp6 .../kubelet-serving system:node:worker1 Pending
...
csr-tgwhr .../kubelet-serving system:node:worker2 Pending
csr-w9f77 .../kubelet-serving system:node:master1 Pending
Part of the output is omitted for layout reasons.
7) Manually approve the certificate signing requests (every CSR whose CONDITION is Pending):
root@master1:~# kubectl certificate approve csr-d5qp6
certificatesigningrequest.certificates.k8s.io/csr-d5qp6 approved
root@master1:~# kubectl certificate approve csr-tgwhr
certificatesigningrequest.certificates.k8s.io/csr-tgwhr approved
root@master1:~# kubectl certificate approve csr-w9f77
certificatesigningrequest.certificates.k8s.io/csr-w9f77 approved
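Approving each CSR by name gets tedious on larger clusters. The name-filtering step can be automated; the sketch below extracts the Pending CSR names from sample output (the names are the ones shown earlier in this tutorial). On a live cluster you would pipe the result into `xargs -r kubectl certificate approve`:

```shell
# Sample 'kubectl get csr' output.
csr_list='NAME        SIGNERNAME                     REQUESTOR             CONDITION
csr-d5qp6   kubernetes.io/kubelet-serving  system:node:worker1   Pending
csr-tgwhr   kubernetes.io/kubelet-serving  system:node:worker2   Pending
csr-w9f77   kubernetes.io/kubelet-serving  system:node:master1   Approved,Issued'
# Keep only rows whose last column is exactly "Pending" and print the CSR name.
pending=$(echo "$csr_list" | awk '$NF == "Pending" {print $1}')
echo "$pending"
# On a live cluster:
#   kubectl get csr | awk '$NF == "Pending" {print $1}' | xargs -r kubectl certificate approve
```

`xargs -r` skips running the approve command entirely when no Pending CSRs remain.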
8) Check the CSR CONDITION on master1 again:
root@master1:~# kubectl get csr
NAME SIGNERNAME REQUESTOR CONDITION
csr-d5qp6 .../kubelet-serving system:node:worker1 Approved,Issued
csr-tgwhr .../kubelet-serving system:node:worker2 Approved,Issued
csr-w9f77 .../kubelet-serving system:node:master1 Approved,Issued
9) Run kubectl apply on master1 again to finish deploying Metrics Server:
root@master1:~# kubectl apply -f /root/software/metrics-server0.7.2/components.yaml
serviceaccount/metrics-server unchanged
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader unchanged
clusterrole.rbac.authorization.k8s.io/system:metrics-server unchanged
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader unchanged
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator unchanged
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server unchanged
service/metrics-server unchanged
deployment.apps/metrics-server configured
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io unchanged
10) Check the status and resource usage of the core Kubernetes components:
root@master1:~# kubectl -n kube-system get pods
NAME READY STATUS
calico-kube-controllers-5745477d4d-x4dbj 1/1 Running
calico-node-d9sv5 1/1 Running
calico-node-ntx28 1/1 Running
calico-node-zpbc2 1/1 Running
coredns-6766b7b6bb-hqgsd 1/1 Running
coredns-6766b7b6bb-w8ztk 1/1 Running
etcd-master1 1/1 Running
kube-apiserver-master1 1/1 Running
kube-controller-manager-master1 1/1 Running
kube-proxy-b8n2k 1/1 Running
kube-proxy-h2v5l 1/1 Running
kube-proxy-jf5tm 1/1 Running
kube-scheduler-master1 1/1 Running
metrics-server-554dcb6944-kplgb 1/1 Running
11) Metrics Server can now report resource usage:
root@master1:~# kubectl -n kube-system top pods
NAME CPU(cores) MEMORY(bytes)
calico-kube-controllers-5745477d4d-x4dbj 5m 27Mi
calico-node-d9sv5 19m 187Mi
calico-node-ntx28 21m 188Mi
calico-node-zpbc2 23m 187Mi
coredns-6766b7b6bb-hqgsd 2m 21Mi
coredns-6766b7b6bb-w8ztk 1m 22Mi
etcd-master1 16m 71Mi
kube-apiserver-master1 29m 279Mi
kube-controller-manager-master1 14m 81Mi
kube-proxy-b8n2k 1m 21Mi
kube-proxy-h2v5l 1m 32Mi
kube-proxy-jf5tm 1m 21Mi
kube-scheduler-master1 6m 39Mi
metrics-server-554dcb6944-kplgb 2m 18Mi
root@master1:~# kubectl top nodes
NAME CPU(cores) CPU(%) MEMORY(bytes) MEMORY(%)
master1 136m 6% 1202Mi 65%
worker1 36m 1% 766Mi 41%
worker2 35m 1% 732Mi 39%
Wrap-up
1) Shut down all three VMs with the poweroff command.
2) In VMware Workstation Pro, take a snapshot of each of the three VMs, named "Metrics Server 0.7.2 deployment complete".
3) In the AI era, operations work is shifting from traditional CPU servers toward GPU servers and edge devices, and DevOps engineers and technical enthusiasts alike need to keep up with the new stack. The software packages, configuration files, and other resources used in this lesson can be downloaded directly from Baidu Netdisk:
- Baidu Netdisk share: Kubernetes1.32
- Link: https://pan.baidu.com/s/18XeGQ28BDPjHh8JKj0uZFQ?pwd=6x17 (access code: 6x17)