System environment
- Docker version: 19.03.8
- kubeadm version: 1.16.15
- Kubernetes version: 1.16.15
- Number of Kubernetes master nodes: 3
Procedure
- Check the master nodes in the cluster
[root@k8s-portal-master1 olami]# kubectl get nodes |grep master
k8s-portal-master1 Ready master 367d v1.16.15
k8s-portal-master2 Ready master 367d v1.16.15
k8s-portal-master3 Ready master 367d v1.16.15
Because the k8s-portal-master1 node had become abnormal, it was removed from the cluster and reset; now it needs to rejoin the cluster as a master (control-plane) node.
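For reference, the control-plane join command used in the steps below can be regenerated at any time on a healthy master. A minimal sketch (the token and CA-cert hash printed here belong to this specific cluster):

# On a healthy master (e.g. k8s-portal-master2): print a fresh join command
kubeadm token create --print-join-command
# Append --control-plane to the printed command to join as a master node;
# the shared certificates are copied over manually later in this procedure,
# so no --certificate-key is used here.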
- Joining the cluster fails after the reset
[root@k8s-portal-master1 ~]# kubeadm join 10.3.175.168:6443 --token b8mdec.mh10mojlfl4zdqrd --discovery-token-ca-cert-hash sha256:23921aa3bd9d8acd048633613a9174c4d52caf404739a67b71bd55075a52mq56 --control-plane
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.0. Latest validated version: 18.09
[WARNING Hostname]: hostname "k8s-portal-master1" could not be reached
[WARNING Hostname]: hostname "k8s-portal-master1": lookup k8s-portal-master1 on 119.29.29.29:53: no such host
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-portal-master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.3.175.165 10.3.175.168 10.3.175.165 10.3.175.166 10.3.175.167 10.3.175.168]
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
error execution phase check-etcd: etcd cluster is not healthy: failed to dial endpoint https://10.3.175.165:2379 with maintenance client: context deadline exceeded
To see the stack trace of this error execute with --v=5 or higher
The join hangs at check-etcd, and the log shows that the etcd health check failed. The message "error execution phase check-etcd" tells us the error occurs in the etcd-check phase of the join, and this is what prevents the master from rejoining the original Kubernetes cluster.
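The failing endpoint can be confirmed directly from one of the surviving masters. A minimal sketch, assuming etcdctl is available inside the etcd Pod and using the standard kubeadm certificate paths that this cluster also uses below:

# Ask etcd to health-check every known endpoint, including the stale
# https://10.3.175.165:2379 member of the removed node
kubectl -n kube-system exec etcd-k8s-portal-master2 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     --endpoints=https://10.3.175.165:2379,https://10.3.175.166:2379,https://10.3.175.167:2379 \
     endpoint health'
# Expected: 10.3.175.166 and 10.3.175.167 report healthy, while 10.3.175.165
# times out, matching the "context deadline exceeded" error from kubeadm.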
- Check the cluster state with kubectl from another master
[root@k8s-portal-master2 ~]# kubectl get nodes |grep master
k8s-portal-master2 Ready master 367d v1.16.15
k8s-portal-master3 Ready master 367d v1.16.15
k8s-portal-master1 is indeed missing from the node list, confirming that it failed to join the cluster.
- Inspect the cluster's kubeadm-config
[root@k8s-portal-master2 ~]# kubectl describe configmaps kubeadm-config -n kube-system
Name: kubeadm-config
Namespace: kube-system
Labels: <none>
Annotations: <none>
Data
====
ClusterConfiguration:
----
apiServer:
  certSANs:
  - 10.3.175.165
  - 10.3.175.166
  - 10.3.175.167
  - 10.3.175.168
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: 10.3.175.168:6443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: gcr.azk8s.cn/google_containers
kind: ClusterConfiguration
kubernetesVersion: v1.16.15
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
ClusterStatus:
----
apiEndpoints:
  k8s-portal-master1:
    advertiseAddress: 10.3.175.165
    bindPort: 6443
  k8s-portal-master2:
    advertiseAddress: 10.3.175.166
    bindPort: 6443
  k8s-portal-master3:
    advertiseAddress: 10.3.175.167
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus
Events: <none>
The k8s-portal-master1 entry is still present in the kubeadm-config ConfigMap, which means etcd still holds state about k8s-portal-master1. The cluster was built with kubeadm and etcd runs co-located with the masters (a stacked etcd topology), so every master node hosts an etcd container instance. When the master node was removed from the cluster, its etcd member was never deleted from the etcd cluster, so the stale member still appears in the etcd member list.
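A quick way to confirm the stacked topology on a healthy master: the local etcd member is just a static Pod whose manifest sits next to the other control-plane components, in the same manifest folder that the kubeadm join output above refers to (standard kubeadm layout; paths may differ if the cluster was customized):

# On k8s-portal-master2 or master3: the kubelet runs everything in this
# directory as static Pods, including the node-local etcd member
ls /etc/kubernetes/manifests/
# Typically: etcd.yaml  kube-apiserver.yaml  kube-controller-manager.yaml  kube-scheduler.yaml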
- Get the etcd Pods in the cluster
[root@k8s-portal-master2 ~]# kubectl get pods -n kube-system | grep etcd
etcd-k8s-portal-master2 1/1 Running 0 88m
etcd-k8s-portal-master3 1/1 Running 0 124m
- Exec into an etcd Pod and remove the stale member
[root@k8s-portal-master2 ~]# kubectl exec -it etcd-k8s-portal-master2 sh -n kube-system
# export ETCDCTL_API=3
# alias etcdctl='etcdctl --endpoints=https://127.0.0.1:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/server.crt --key=/etc/kubernetes/pki/etcd/server.key'
# etcdctl member list
81823df8357bcc71, started, k8s-portal-master3, https://10.3.175.167:2380, https://10.3.175.167:2379
9d7d493298ff2c5f, started, k8s-portal-master1, https://10.3.175.165:2380, https://10.3.175.165:2379
fac8c4b57ce3b0af, started, k8s-portal-master2, https://10.3.175.166:2380, https://10.3.175.166:2379
# etcdctl member remove 9d7d493298ff2c5f
Member 9d7d493298ff2c5f removed from cluster bd092b6d7796dffd
# etcdctl member list
81823df8357bcc71, started, k8s-portal-master3, https://10.3.175.167:2380, https://10.3.175.167:2379
fac8c4b57ce3b0af, started, k8s-portal-master2, https://10.3.175.166:2380, https://10.3.175.166:2379
#
# exit
- Reset the k8s-portal-master1 environment
[root@k8s-portal-master1 ~]# kubeadm reset
[reset] Reading configuration from the cluster...
[reset] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
W1215 12:49:00.069254 4316 reset.go:96] [reset] Unable to fetch the kubeadm-config ConfigMap from cluster: failed to get node registration: failed to get node name from kubelet config: open /etc/kubernetes/kubelet.conf: no such file or directory
[reset] WARNING: Changes made to this host by 'kubeadm init' or 'kubeadm join' will be reverted.
[reset] Are you sure you want to proceed? [y/N]: y
[preflight] Running pre-flight checks
W1215 12:49:02.730590 4316 removeetcdmember.go:79] [reset] No kubeadm config, using etcd pod spec to get data directory
[reset] No etcd config found. Assuming external etcd
[reset] Please, manually reset etcd to prevent further issues
[reset] Stopping the kubelet service
[reset] Unmounting mounted directories in "/var/lib/kubelet"
W1215 12:49:02.733156 4316 cleanupnode.go:99] [reset] Failed to evaluate the "/var/lib/kubelet" directory. Skipping its unmount and cleanup: lstat /var/lib/kubelet: no such file or directory
[reset] Deleting contents of config directories: [/etc/kubernetes/manifests /etc/kubernetes/pki]
[reset] Deleting files: [/etc/kubernetes/admin.conf /etc/kubernetes/kubelet.conf /etc/kubernetes/bootstrap-kubelet.conf /etc/kubernetes/controller-manager.conf /etc/kubernetes/scheduler.conf]
[reset] Deleting contents of stateful directories: [/etc/cni/net.d /var/lib/dockershim /var/run/kubernetes /var/lib/cni]
The reset process does not reset or clean up iptables rules or IPVS tables.
If you wish to reset iptables, you must do so manually by using the "iptables" command.
If your cluster was setup to utilize IPVS, run ipvsadm --clear (or similar)
to reset your system's IPVS tables.
The reset process does not clean your kubeconfig files and you must remove them manually.
Please, check the contents of the $HOME/.kube/config file.
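Before rejoining, it helps to finish the cleanup that kubeadm reset leaves to the operator and to copy the cluster-wide certificates from a healthy master. A minimal sketch, assuming root SSH access between the masters; adjust IPs and paths to your environment:

# On k8s-portal-master1: finish the cleanup that kubeadm reset skips
rm -rf /var/lib/etcd $HOME/.kube/config
iptables -F && iptables -t nat -F && iptables -t mangle -F && iptables -X
ipvsadm --clear   # only if the cluster uses IPVS

# On k8s-portal-master2: copy the shared control-plane certificates and keys
# (the files kubeadm expects when joining with --control-plane and no --certificate-key)
ssh root@10.3.175.165 "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/{ca.crt,ca.key,sa.key,sa.pub,front-proxy-ca.crt,front-proxy-ca.key} \
    root@10.3.175.165:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/{ca.crt,ca.key} root@10.3.175.165:/etc/kubernetes/pki/etcd/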
- After copying the required certificates, rejoin the cluster
[root@k8s-portal-master1 ~]# kubeadm join 10.3.175.168:6443 --token b8mdec.mh10mojlfl4zdqrd --discovery-token-ca-cert-hash sha256:23921aa3bd9d8acd048633613a9174c4d52caf404739a67b71bd55075a52mq56 --control-plane
[preflight] Running pre-flight checks
[WARNING SystemVerification]: this Docker version is not on the list of validated versions: 20.10.0. Latest validated version: 18.09
[WARNING Hostname]: hostname "k8s-portal-master1" could not be reached
[WARNING Hostname]: hostname "k8s-portal-master1": lookup k8s-portal-master1 on 119.29.29.29:53: no such host
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [k8s-portal-master1 localhost] and IPs [10.3.175.165 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [k8s-portal-master1 kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.3.175.165 10.3.175.168 10.3.175.165 10.3.175.166 10.3.175.167 10.3.175.168]
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Using existing kubeconfig file: "/etc/kubernetes/admin.conf"
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.16" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
{"level":"warn","ts":"2020-12-15T12:52:08.222+0800","caller":"clientv3/retry_interceptor.go:61","msg":"retrying of unary invoker failed","target":"passthrough:///https://10.3.175.165:2379","attempt":0,"error":"rpc error: code = DeadlineExceeded desc = context deadline exceeded"}
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[mark-control-plane] Marking the node k8s-portal-master1 as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node k8s-portal-master1 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
This node has joined the cluster and a new control plane instance was created:
* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane (master) label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.
To start administering your cluster from this node, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
Run 'kubectl get nodes' to see this node join the cluster.
- Check the master nodes in the cluster again
[root@k8s-portal-master1 olami]# kubectl get nodes |grep master
k8s-portal-master1 Ready master 2m v1.16.15
k8s-portal-master2 Ready master 367d v1.16.15
k8s-portal-master3 Ready master 367d v1.16.15
k8s-portal-master1 has joined the cluster normally and the cluster is back to a healthy state.
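As a final check, it is worth confirming that the rejoined node's etcd member is back and started; a minimal sketch reusing the same etcdctl invocation as earlier:

# The etcd static Pod should be running on master1 again
kubectl -n kube-system get pods | grep etcd

# And the member list should show three started members once more
kubectl -n kube-system exec etcd-k8s-portal-master2 -- sh -c \
  'ETCDCTL_API=3 etcdctl \
     --endpoints=https://127.0.0.1:2379 \
     --cacert=/etc/kubernetes/pki/etcd/ca.crt \
     --cert=/etc/kubernetes/pki/etcd/server.crt \
     --key=/etc/kubernetes/pki/etcd/server.key \
     member list'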