Removing and re-joining a master node in a k8s cluster

We have a k8s cluster with 3 master nodes. One master needs to go offline for maintenance; here are the steps.

Taking master1 offline

Delete the master1 node

Taking one of the 3 masters down leaves 2 running, which is basically fine: a 3-member etcd cluster needs 2 members for quorum, so the cluster keeps working, but it cannot survive another failure. Holding out like this for a day or two is no big deal.

kubectl drain paas-m-k8s-master-1 --delete-local-data --force --ignore-daemonsets
kubectl delete node paas-m-k8s-master-1
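
Before moving on, a quick sanity check with standard kubectl commands confirms the node is gone and the surviving masters are healthy:

kubectl get nodes -o wide                 # master-1 should no longer be listed
kubectl get pods -n kube-system -o wide   # control-plane pods should be Running on master-2/3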

Clean up the etcd cluster

Exec into an etcd container (here, the one on master-2):

kubectl -n kube-system exec -it etcd-paas-m-k8s-master-2 -- /bin/sh

Check the member list:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list
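
If you'd rather not eyeball the output, the member ID can be extracted directly. A small sketch, assuming the etcd member name matches the node name (kubeadm sets it that way by default) and that grep/cut exist in the etcd image:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list | grep paas-m-k8s-master-1 | cut -d',' -f1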

The list still shows 3 members; remove the one that was taken down (its member ID here is 7eab7c23b19f6778):

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778
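
Re-running member list should now show only two members. etcdctl's endpoint health subcommand makes a handy extra check:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key endpoint health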

Then master-1 went off for some disk repairs, after which it can rejoin the cluster.

Re-joining master1 to the cluster

Reset master1:

kubeadm reset
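
Note that kubeadm reset does not flush iptables rules or remove CNI configuration (its own output says as much); if a fully clean slate is wanted, something like the following finishes the job:

iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
rm -rf /etc/cni/net.d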

Configure name resolution for apiserver.cluster.local

Edit /etc/hosts and add:

<IP of a surviving master> apiserver.cluster.local

This name is used during kubeadm join.
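
For example (the IP below is a placeholder; substitute the real address of master-2 or master-3):

# /etc/hosts on master-1
10.138.0.12   apiserver.cluster.local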

Generate the join command on master2

[root@paas-m-k8s-master-2 ~]# kubeadm init phase upload-certs --upload-certs
I0110 10:10:11.254956   12245 version.go:252] remote version is much newer: v1.29.0; falling back to: stable-1.18
W0110 10:10:13.812440   12245 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
[root@paas-m-k8s-master-2 ~]# kubeadm token create --print-join-command
W0110 10:11:40.990463   14694 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join apiserver.cluster.local:6443 --token yubedv.0rg185no5jgqwn07     --discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482

Join master1 to the cluster

Run this on the machine being added.

--control-plane and --certificate-key are only needed when joining as a master (control-plane) node:

kubeadm join apiserver.cluster.local:6443 \
--token yubedv.0rg185no5jgqwn07 \
--discovery-token-ca-cert-hash sha256:be87c7200420224f1f8d439a5f058de7be88282eec1fc833b346b38c62ddf482 \
--control-plane --certificate-key 23d8e27402b4f982d9ec894c37b1a3271c9f27bef2e653ca471426cc57025324
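
One timing note: the certificates uploaded by upload-certs (and thus the --certificate-key) expire after two hours, so rerun that phase if the join is delayed. On success, kubeadm prints instructions for setting up kubectl on the new master, roughly:

mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config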

The join succeeds.

Summary of problems encountered

  1. apiserver.cluster.local fails to resolve
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://apiserver.cluster.local:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s: dial tcp: lookup apiserver.cluster.local on 10.138.xx.xx:53: no such host

It's a name-resolution problem: apiserver.cluster.local cannot be found.

Fix:

Put the entry straight into /etc/hosts:

<IP of a surviving master> apiserver.cluster.local

  2. kubelet's port is already in use
Port 10250 is in use

The kubelet is still alive; kubeadm join needs to start its own kubelet.

Use kubeadm reset to reset the configuration, or inspect and stop the kubelet by hand, as sketched below.
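
A quick way to see who holds the port before resetting (ss ships with iproute2; lsof -i :10250 works too):

ss -lntp | grep 10250    # typically shows the running kubelet
systemctl stop kubelet   # kubeadm will manage it again on join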

  3. etcd data directory not empty
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty

Just delete it:

rm -rf /var/lib/etcd

kubeadm reset also takes care of this.

  4. etcd health check fails

The cause is that the old etcd member record still exists. Check with:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member list

Then remove it:

etcdctl --endpoints=127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/peer.crt --key=/etc/kubernetes/pki/etcd/peer.key member remove 7eab7c23b19f6778
