Installing the OS
Download the image: CentOS-7-x86_64-Minimal-2009.iso
mirrors.tuna.tsinghua.edu.cn/centos/7.9.…
Do a standard installation in VMware; for resources, just 1 vCPU + 1 GB RAM + 20 GB disk is enough.
Initial configuration
Hostname configuration
Give each worker its own hostname by editing /etc/hostname, then reboot. I spun up three machines, named k8s-wk0, k8s-wk1, and k8s-wk2.
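For example, on the first worker (hostnamectl is a one-line alternative that writes /etc/hostname for you; adjust the name per machine):
hostnamectl set-hostname k8s-wk0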
Also be sure to map k8s-wk# to the local address in /etc/hosts, otherwise kubeadm join will report errors like these:
[WARNING Hostname]: hostname "k8s-wk0" could not be reached
[WARNING Hostname]: hostname "k8s-wk0": lookup k8s-wk0 on 192.168.154.2:53: server misbehaving
IP configuration
After installation, ip address shows no IPv4 address by default, so it has to be configured manually in /etc/sysconfig/network-scripts/ifcfg-ens33. The original file looks like this:
TYPE=Ethernet
PROXY_METHOD=none
BROWSER_ONLY=no
BOOTPROTO=dhcp
DEFROUTE=yes
IPV4_FAILURE_FATAL=no
IPV6INIT=yes
IPV6_AUTOCONF=yes
IPV6_DEFROUTE=yes
IPV6_FAILURE_FATAL=no
IPV6_ADDR_GEN_MODE=stable-privacy
NAME=ens33
UUID=a045e402-1757-4e9a-8c34-e7978e3a5230
DEVICE=ens33
ONBOOT=no
Change the last line to ONBOOT=yes and restart networking with systemctl restart network; the machine then picks up an IPv4 address and yum update succeeds, so networking is done.
The above obtains the IP dynamically via DHCP, and IP drift may cause trouble later, so it's recommended to also change the entries below to pin the address. For the concrete values, refer to what a DHCP lease actually assigned:
BOOTPROTO=static
ONBOOT=yes
IPADDR=192.168.154.130
NETMASK=255.255.255.0
GATEWAY=192.168.154.2
DNS1=192.168.154.2
Then ssh in from the dev machine, set up passwordless login, and so on; after that there's no need to touch this machine's console directly.
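A minimal sketch of that passwordless login, run from the dev machine (assuming root login over ssh is allowed and using k8s-wk0's address; repeat for each worker):
ssh-keygen -t ed25519              # skip if a key already exists
ssh-copy-id root@192.168.154.130
ssh root@192.168.154.130           # should log in without prompting for a password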
Kubernetes-related firewall settings
cat <<EOF | tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
modprobe br_netfilter  # load it now; the modules-load.d entry only takes effect at boot
cat <<EOF | tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
# setting ip_forward here is better than modifying /etc/sysctl.conf
net.ipv4.ip_forward = 1
EOF
sysctl --system
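A quick optional check that the module and sysctls took effect:
lsmod | grep br_netfilter
sysctl net.bridge.bridge-nf-call-iptables net.ipv4.ip_forward  # both should print 1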
Disable the firewall:
systemctl stop firewalld
systemctl disable firewalld
Disabling swap
Edit /etc/fstab and comment out the swap line:
# /swapfile none swap sw 0 0
Then turn swap off immediately:
swapoff -a
Once it's off, check memory with free -m; the Swap row should read 0, which confirms success.
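Both steps as a non-interactive sketch (assuming GNU sed; double-check /etc/fstab afterwards):
sed -ri 's|^([^#].*\sswap\s.*)$|# \1|' /etc/fstab
swapoff -a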
Disabling SELinux
setenforce 0
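setenforce 0 only lasts until the next reboot; to keep SELinux permissive persistently, also update its config file (the standard step from the kubeadm docs):
sed -i 's/^SELINUX=enforcing$/SELINUX=permissive/' /etc/selinux/config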
Preinstalled software
yum install -y open-vm-tools  # VMware guest tools
Installing containerd
yum install -y yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum -y install containerd.io
Generate the default containerd config:
mkdir -p /etc/containerd
containerd config default | tee /etc/containerd/config.toml
Edit /etc/containerd/config.toml, making two changes:
registry.k8s.io/pause:3.6
# change to >>>
registry.aliyuncs.com/google_containers/pause:3.9
SystemdCgroup = false
# change to >>>
SystemdCgroup = true
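The same two edits as a scripted sketch (assuming the stock config generated above; verify with grep afterwards):
sed -i 's|registry.k8s.io/pause:3.6|registry.aliyuncs.com/google_containers/pause:3.9|' /etc/containerd/config.toml
sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml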
Enable the containerd service at boot:
systemctl enable containerd
systemctl restart containerd
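A quick optional sanity check that the runtime is up:
systemctl is-active containerd  # should print: active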
Installing Kubernetes
Configure the yum repo:
cat <<EOF | tee /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
Install:
yum update -y
# yum list kubeadm kubectl kubelet --showduplicates | sort -r  # list all available versions
yum install -y kubelet-1.28.2 kubeadm-1.28.2 kubectl-1.28.2
# pin the versions
yum install -y yum-plugin-versionlock
yum versionlock kubeadm-1.28.2 kubectl-1.28.2 kubelet-1.28.2
yum versionlock ls  # show the locked versions
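Optionally verify that the pinned versions are what actually got installed:
kubeadm version -o short    # v1.28.2
kubelet --version           # Kubernetes v1.28.2
kubectl version --client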
Set containerd as the default CRI endpoint:
cat <<EOF | tee /etc/crictl.yaml
runtime-endpoint: "unix:///run/containerd/containerd.sock"
timeout: 0
debug: false
EOF
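crictl ships with these packages (via the cri-tools dependency), so a quick way to confirm the endpoint works:
crictl info | head  # should dump runtime status JSON, not a connection error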
Enable kubelet at boot:
systemctl enable kubelet.service
Joining the cluster
Find the join command that was logged when the master was created and run it:
kubeadm join 192.168.154.128:6443 --token xxxxxx.xxxxxx \
--discovery-token-ca-cert-hash sha256:xxxxxxxx
If it's been lost (or the token has expired; kubeadm tokens are valid for 24 hours by default), run this on the master node to get a fresh one:
kubeadm token create --print-join-command
Then list the nodes from the master with kubectl get no; the new node has joined:
k8s-wk0 NotReady <none> 5m32s v1.28.2
Here I hit a pitfall: the node stayed NotReady. I went into k8s-wk0 and checked the kubelet status with systemctl status kubelet -l, which reported this:
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /usr/lib/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since 五 2024-01-05 19:14:39 CST; 30s ago
Docs: https://kubernetes.io/docs/
Main PID: 1395 (kubelet)
Tasks: 8
Memory: 34.2M
CGroup: /system.slice/kubelet.service
└─1395 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --container-runtime-endpoint=unix:///var/run/containerd/containerd.sock --pod-infra-container-image=registry.aliyuncs.com/google_containers/pause:3.9
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218284 1395 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="calico-system/calico-node-x7dtq"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218298 1395 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="calico-system/calico-node-x7dtq"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218325 1395 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"calico-node-x7dtq_calico-system(7ccc22e2-6f02-4036-8b9a-76185cc20b73)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"calico-node-x7dtq_calico-system(7ccc22e2-6f02-4036-8b9a-76185cc20b73)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="calico-system/calico-node-x7dtq" podUID="7ccc22e2-6f02-4036-8b9a-76185cc20b73"
1月 05 19:15:06 k8s-wk0 kubelet[1395]: E0105 19:15:06.218421 1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.217414 1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218022 1395 dns.go:284] "Could not open resolv conf file." err="open /run/systemd/resolve/resolv.conf: no such file or directory"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218036 1395 kuberuntime_sandbox.go:45] "Failed to generate sandbox config for pod" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-proxy-jlvd8"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218045 1395 kuberuntime_manager.go:1166] "CreatePodSandbox for pod failed" err="open /run/systemd/resolve/resolv.conf: no such file or directory" pod="kube-system/kube-proxy-jlvd8"
1月 05 19:15:08 k8s-wk0 kubelet[1395]: E0105 19:15:08.218063 1395 pod_workers.go:1300] "Error syncing pod, skipping" err="failed to \"CreatePodSandbox\" for \"kube-proxy-jlvd8_kube-system(fbda2935-da6a-48bc-b469-d46687c6cbc8)\" with CreatePodSandboxError: \"Failed to generate sandbox config for pod \\\"kube-proxy-jlvd8_kube-system(fbda2935-da6a-48bc-b469-d46687c6cbc8)\\\": open /run/systemd/resolve/resolv.conf: no such file or directory\"" pod="kube-system/kube-proxy-jlvd8" podUID="fbda2935-da6a-48bc-b469-d46687c6cbc8"
1月 05 19:15:10 k8s-wk0 kubelet[1395]: E0105 19:15:10.217450 1395 pod_workers.go:1300] "Error syncing pod, skipping" err="network is not ready: container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized" pod="calico-system/csi-node-driver-b228k" podUID="658dff40-bb63-491b-8209-697620f4268d"
The crux of this pitfall is "open /run/systemd/resolve/resolv.conf: no such file or directory", which then prevents calico from being installed on the worker node.
Why this happens wasn't obvious; after searching around and asking ChatGPT, the cause turned out to be: CentOS 7 uses NetworkManager to manage network connections by default rather than systemd-resolved, so /run/systemd/resolve/resolv.conf is missing simply because the systemd-resolved service isn't installed.
In fact, the default NetworkManager generates and manages /etc/resolv.conf directly, whereas systemd-resolved keeps its configuration in /run/systemd/resolve/resolv.conf and symlinks it to /etc/resolv.conf. The kubelet looks for /run/systemd/resolve/resolv.conf by default (likely because the kubeadm-generated kubelet config came from the Ubuntu master, where systemd-resolved is standard), so it fails to find the file:
Some fixes found online take the approach below, which works temporarily; but /run is a tmpfs that's wiped at boot, so the symlink disappears after a reboot:
mkdir -p /run/systemd/resolve
ln -s /etc/resolv.conf /run/systemd/resolve/resolv.conf
systemctl restart kubelet
The thorough fix:
yum install -y systemd-resolved
systemctl enable systemd-resolved
systemctl start systemd-resolved
systemctl restart kubelet
Check the kubelet status again with systemctl status kubelet -l and it's now healthy; then, viewing the pods from the master node, the worker's calico pods can be seen coming up:
huangwc21@alfred-pc:~$ k get po -A
NAMESPACE NAME READY STATUS RESTARTS AGE
...
calico-system calico-node-x7dtq 0/1 Init:1/2 0 17m
calico-system csi-node-driver-b228k 0/2 ContainerCreating 0 17m
Then wait for the images to pull; once calico is installed, the worker node turns Ready:
huangwc21@alfred-pc:~$ k get no -owide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
alfred-pc Ready control-plane 15d v1.28.2 192.168.154.128 <none> Ubuntu 22.04.3 LTS 6.2.0-39-generic containerd://1.6.26
k8s-wk0 Ready <none> 18m v1.28.2 192.168.154.130 <none> CentOS Linux 7 (Core) 3.10.0-1160.105.1.el7.x86_64 containerd://1.6.26