Installing a Kubernetes Worker Node on CentOS 7.9 (2009)


Installing the OS

Download the image: CentOS-7-x86_64-Minimal-2009.iso

mirrors.tuna.tsinghua.edu.cn/centos/7.9.…

Perform a standard installation in VMware; for resources I used just 1 vCPU + 1 GB RAM + 20 GB disk.

Initial configuration

Hostname configuration

Give each worker a distinct hostname: edit /etc/hostname and reboot. I spun up three machines, named k8s-wk0, k8s-wk1, and k8s-wk2.

Also be sure to point each k8s-wk# name at the machine's own address in /etc/hosts; otherwise kubeadm join will report errors like these:

[WARNING Hostname]: hostname "k8s-wk0" could not be reached
[WARNING Hostname]: hostname "k8s-wk0": lookup k8s-wk0 on 192.168.154.2:53: server misbehaving
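As a sketch, both steps can be scripted (values taken from my setup below; hostnamectl is an alternative to editing /etc/hostname and rebooting):

hostnamectl set-hostname k8s-wk0              # persists to /etc/hostname, takes effect immediately
echo "192.168.154.130 k8s-wk0" >> /etc/hosts  # map the hostname to this machine's address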

IP configuration

After installation, ip address shows no IPv4 address by default; you have to configure it manually in /etc/sysconfig/network-scripts/ifcfg-ens33.

The stock file looks like this:

TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=a045e402-1757-4e9a-8c34-e7978e3a5230
DEVICE=ens33
ONBOOT=no

Change the last line to ONBOOT=yes and restart networking with systemctl restart network. The machine then gets an IPv4 address, yum update succeeds, and the network is working.

The above obtains an address dynamically via DHCP, and IP drift can cause trouble later, so I recommend also changing the following entries to pin the IP. For the concrete values, refer to what DHCP actually assigned on the earlier lease:

BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.154.130
NETMASK=255.255.255.0
GATEWAY=192.168.154.2
DNS1=192.168.154.2
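
To apply and verify the change (assuming the interface is ens33 as above):

systemctl restart network      # load the new static configuration
ip -4 addr show ens33          # confirm the expected IPv4 address
ping -c 1 mirrors.aliyun.com   # confirm DNS and the default route work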

Then ssh in from your development machine, set up passwordless login, and so on; after that there's no need to touch this machine's console directly.
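A minimal sketch of the passwordless login setup, run from the development machine (the worker IP is the one pinned above):

ssh-keygen -t ed25519             # skip if you already have a key pair
ssh-copy-id root@192.168.154.130  # append your public key to the worker's authorized_keys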

Kernel and firewall settings for Kubernetes

cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF

cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
# a drop-in here is better than modifying /etc/sysctl.conf
net.ipv4.ip_forward = 1
EOF

sysctl --system
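If sysctl --system complains that the net.bridge.* keys don't exist, the module isn't loaded yet; load it and check:

modprobe br_netfilter                        # load now, without rebooting
lsmod | grep br_netfilter                    # the module should be listed
sysctl net.bridge.bridge-nf-call-iptables    # should print 1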

Disable the firewall:

systemctl stop firewalld
systemctl disable firewalld

Disabling swap

Edit /etc/fstab and comment out the swap entry. On a default CentOS 7 install it is the /dev/mapper/centos-swap line; commented out it looks like:

# /dev/mapper/centos-swap  swap  swap  defaults  0 0

Then turn swap off immediately:

swapoff -a

Once it's off, check memory with free -m; the Swap row should show all zeros.
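If you'd rather script the fstab edit, here is a one-liner sketch; it comments out any uncommented line with a swap field, so review the file afterwards:

sed -ri 's/^([^#].*\sswap\s)/#\1/' /etc/fstab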

Disabling SELinux

setenforce 0
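Note that setenforce 0 only lasts until the next reboot. To make it persistent, also switch the config file to permissive mode (the same change the kubeadm docs make):

sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config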

Preinstalling software

yum install -y open-vm-tools

Installing containerd

yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io

Generate the default containerd configuration:

mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml

Edit /etc/containerd/config.toml:

registry.k8s.io/pause:3.6
# change to >>>
registry.aliyuncs.com/google_containers/pause:3.9

SystemdCgroup = false
# change to >>>
SystemdCgroup = true
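The same two edits as a scripted sketch (the patterns match the stock config generated above):

sed -i 's#registry.k8s.io/pause:3.6#registry.aliyuncs.com/google_containers/pause:3.9#' /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml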

Enable the containerd service at boot

systemctl enable containerd
systemctl restart containerd

Installing Kubernetes

Configure the package repository:

cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF

Install:

yum update -y
# yum list kubeadm kubectl kubelet --showduplicates | sort -r  # list all available versions
yum install -y kubelet-1.28.2 kubeadm-1.28.2 kubectl-1.28.2

# lock the versions
yum install -y yum-plugin-versionlock
yum versionlock kubeadm-1.28.2 kubectl-1.28.2 kubelet-1.28.2
yum versionlock ls  # show the locked versions

Point the default CRI at containerd:

cat <<EOF | sudo tee /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
EOF
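To confirm crictl can reach containerd through that endpoint:

crictl version    # should report the client version plus containerd's RuntimeName/RuntimeVersion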

Enable kubelet at boot:

systemctl enable kubelet.service

Joining the cluster

Find the join command that was logged when the master was created and run it:

kubeadm join 192.168.154.128:6443 --token xxxxxx.xxxxxx \
	--discovery-token-ca-cert-hash sha256:xxxxxxxx 

If you can't find it anymore, run this on the master node to regenerate it:

kubeadm token create --print-join-command

Then list nodes from the master with kubectl get no; the node has joined:

k8s-wk0     NotReady   <none>          5m32s   v1.28.2

Here I hit a snag: the node stayed NotReady. Logging into k8s-wk0 and checking kubelet with systemctl status kubelet -l showed this:

● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since 五 2024-01-05 19:14:39 CST; 30s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 1395 (kubelet)
    Tasks: 8
   Memory: 34.2M
   CGroup: /system.slice/kubelet.service
           └─1395 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9

1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218284    1395 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="calico-system/calico-node-x7dtq"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218298    1395 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="calico-system/calico-node-x7dtq"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218325    1395 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-node-x7dtq_calico-system(7ccc22e2-6f02-4036-8b9a-76185cc20b73)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"calico-node-x7dtq_calico-system(7ccc22e2-6f02-4036-8b9a-76185cc20b73)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="calico-system/calico-node-x7dtq" podUID="7ccc22e2-6f02-4036-8b9a-76185cc20b73"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218421    1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.217414    1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218022    1395 dns.go:284] "Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218036    1395 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-proxy-jlvd8"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218045    1395 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-proxy-jlvd8"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218063    1395 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-proxy-jlvd8_kube-system(fbda2935-da6a-48bc-b469-d46687c6cbc8)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"kube-proxy-jlvd8_kube-system(fbda2935-da6a-48bc-b469-d46687c6cbc8)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="kube-system/kube-proxy-jlvd8" podUID="fbda2935-da6a-48bc-b469-d46687c6cbc8"
1月 05 19:15:10 k8s-wk0 kubelet[1395]: E0105 19:15:10.217450    1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"

The key error is "open /run/systemd/resolve/resolv.conf: no such file or directory", which then keeps calico from coming up on the worker node.

It wasn't clear to me why at first; after searching around online and asking ChatGPT, the cause turned out to be:

CentOS 7 uses NetworkManager to manage network connections by default, rather than systemd-resolved. The reason there is no resolv.conf at /run/systemd/resolve/resolv.conf is that the systemd-resolved service is missing.

In fact, NetworkManager generates and manages /etc/resolv.conf directly, whereas systemd-resolved keeps its configuration in /run/systemd/resolve/resolv.conf and symlinks it to /etc/resolv.conf. The kubelet here looks for /run/systemd/resolve/resolv.conf, so it can't find the file.

Some solutions I found online use the following workaround, which works temporarily but doesn't survive a reboot:

mkdir -p /run/systemd/resolve
ln -s /etc/resolv.conf /run/systemd/resolve/resolv.conf
systemctl restart kubelet

The proper fix:

yum install -y systemd-resolved
systemctl enable systemd-resolved
systemctl start systemd-resolved
systemctl restart kubelet
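Once systemd-resolved is running, the file the kubelet wants should exist:

ls -l /run/systemd/resolve/resolv.conf   # generated by systemd-resolved
cat /run/systemd/resolve/resolv.conf     # should contain usable nameservers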

Check kubelet again with systemctl status kubelet -l and it's healthy. Looking at pods from the master node, the worker's calico pods are coming up:

huangwc21@alfred-pc:~$ k get po -A
NAMESPACE          NAME                                           READY   STATUS              RESTARTS       AGE
...
calico-system      calico-node-x7dtq                              0/1     Init:1/2            0              17m
calico-system      csi-node-driver-b228k                          0/2     ContainerCreating   0              17m

Then wait for the images to pull; once calico is installed, the worker node turns Ready:

huangwc21@alfred-pc:~$ k get no -owide
NAME        STATUS   ROLES           AGE   VERSION   INTERNAL-IP       EXTERNAL-IP   OS-IMAGE                KERNEL-VERSION                 CONTAINER-RUNTIME
alfred-pc   Ready    control-plane   15d   v1.28.2   192.168.154.128   <none>        Ubuntu 22.04.3 LTS      6.2.0-39-generic               containerd://1.6.26
k8s-wk0     Ready    <none>          18m   v1.28.2   192.168.154.130   <none>        CentOS Linux 7 (Core)   3.10.0-1160.105.1.el7.x86_64   containerd://1.6.26