A k8s cluster with 3 master nodes. One master needs to be taken offline for maintenance; here are the steps.
Taking master1 offline
Delete the master1 node
With one of the 3 masters gone, the remaining 2 keep the cluster running fine (etcd still has quorum with 2 of 3 members), so staying in this state for a day or two is not a problem.
kubectl drain paas-m-k8s-master-1 --delete-local-data --force --ignore-daemonsets
kubectl delete node paas-m-k8s-master-1
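After the delete, the node should disappear from the node list. A quick sanity check; the output below is hypothetical sample data standing in for what `kubectl get nodes` would print on a surviving master:

```shell
# Hypothetical 'kubectl get nodes' output after the delete
# (on the cluster, run: kubectl get nodes):
NODES='paas-m-k8s-master-2   Ready   master   120d   v1.18.20
paas-m-k8s-master-3   Ready   master   120d   v1.18.20'

# The drained node must no longer be registered.
if printf '%s\n' "$NODES" | grep -q 'paas-m-k8s-master-1'; then
  echo "master-1 is still registered"
else
  echo "master-1 removed"
fi
```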
Clean up the etcd cluster
Exec into an etcd container
kubectl -n kube-system exec -it etcd-paas-m-k8s-master-2 -- /bin/sh
Check the member list
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
The list still shows 3 members; remove the one that was taken offline, using its member ID from the output:
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778
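After the remove, `member list` should report only two members. A sketch of checking the count; the sample output below is hypothetical (made-up IDs and addresses), and on the real node you would pipe the actual `etcdctl member list` output into `wc -l` instead:

```shell
# Hypothetical 'etcdctl member list' output after removing master-1:
MEMBER_LIST='2c18b1a9e3f01234, started, paas-m-k8s-master-2, https://10.138.0.2:2380, https://10.138.0.2:2379
8d45fe0b7a6c5678, started, paas-m-k8s-master-3, https://10.138.0.3:2380, https://10.138.0.3:2379'

# One line per member; expect 2 after the removal.
COUNT=$(printf '%s\n' "$MEMBER_LIST" | wc -l)
echo "etcd members remaining: $COUNT"
```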
master-1 then had some disk repairs done, after which it can rejoin the cluster.
master1 rejoins the cluster
Reset master1 first
kubeadm reset
Configure name resolution for apiserver.cluster.local
Edit /etc/hosts and add:
<IP of a surviving master> apiserver.cluster.local
kubeadm join will use this name.
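The edit can be scripted. In this sketch the IP is a placeholder and HOSTS defaults to a scratch file so it is safe to dry-run; set HOSTS=/etc/hosts on the real node:

```shell
MASTER_IP=10.138.0.2              # placeholder: IP of a surviving master
HOSTS=${HOSTS:-/tmp/hosts.demo}   # set HOSTS=/etc/hosts on the node itself
touch "$HOSTS"

# Append the mapping only if it is not already present (idempotent).
grep -q 'apiserver\.cluster\.local' "$HOSTS" || \
  echo "$MASTER_IP apiserver.cluster.local" >> "$HOSTS"
```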
Generate the join command on master2
[root@paas-m-k8s-master-2 ~]# kubeadm init phase upload-certs --upload-certs
I0110 10:10:11.254956 12245 version.go:252] remote version is much newer: v1.29.0; falling back to: stable-1.18
W0110 10:10:13.812440 12245 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
[root@paas-m-k8s-master-2 ~]# kubeadm token create --print-join-command
W0110 10:11:40.990463 14694 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join apiserver.cluster.local:6443 --token yubedv.0rg185no5jgqwn07 --discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482
Join master1 to the cluster
Run this on the machine being joined.
--control-plane and --certificate-key are only needed when adding a master (control-plane) node.
kubeadm join apiserver.cluster.local:6443 \
--token yubedv.0rg185no5jgqwn07 \
--discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482 \
--control-plane --certificate-key 23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
Success.
Summary of problems encountered
- apiserver.cluster.local does not resolve
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://apiserver.cluster.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: dial tcp: lookup apiserver.cluster.local on 10.138.xx.xx:53: no such host
This is a name-resolution problem: apiserver.cluster.local cannot be found.
Fix:
Add this entry to /etc/hosts:
<IP of a surviving master> apiserver.cluster.local
- kubelet port already in use
Port 10250 is in use
kubelet is still running; kubeadm join needs to start kubelet itself.
Run kubeadm reset to reset the configuration.
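Before rerunning kubeadm join you can check whether the port is still held. A bash-only sketch using /dev/tcp (something like `ss -lntp | grep 10250` works too):

```shell
PORT=10250
# A successful TCP connect means something (usually a leftover kubelet)
# is still listening on the port.
if (exec 3<>"/dev/tcp/127.0.0.1/$PORT") 2>/dev/null; then
  STATUS="in use"
else
  STATUS="free"
fi
echo "port $PORT is $STATUS"
```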
- etcd directory not empty
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
Just delete it:
rm -rf /var/lib/etcd
kubeadm reset also cleans this up.
- etcd health check fails
The stale etcd member record from before still exists. List the members:
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
Then remove the stale member (ID taken from the list output):
etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778