Kubernetes Recovery


Restoring a master node

[root@sc-master-3 nginx]#  kubeadm init phase upload-certs --upload-certs
I0906 07:57:31.078641  352435 version.go:256] remote version is much newer: v1.28.1; falling back to: stable-1.24
[upload-certs] Storing the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[upload-certs] Using certificate key:
f85303a67adc35b6aae6a4f4ac214063101fa69b38920c6bb8786875be95efe9
[root@sc-master-3 nginx]# kubeadm token create --print-join-command
kubeadm join 172.70.21.9:16443 --token 6tdoc5.zi2lmp74r23q0nue --discovery-token-ca-cert-hash sha256:ace0c89c5c1d184c12c81573c23a1dc9e2ce91c2c9de69dc866a874915426142 

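Combine the output of the two commands: append --control-plane and --certificate-key <certificate key> to the printed join command, then run the result on the node being restored (sc-master-2 here).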
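The splice can be sketched as a small shell helper. This is only a sketch, not part of the original procedure: the function name `build_control_plane_join` is hypothetical, and the commented usage assumes the certificate key is the last line that `kubeadm init phase upload-certs --upload-certs` writes to standard output, as in the transcript above. The uploaded certificates and their key expire after two hours, so the combined command should be used promptly.

```shell
# Build a control-plane join command from the two outputs shown above.
#   $1: the line printed by `kubeadm token create --print-join-command`
#   $2: the certificate key printed by `kubeadm init phase upload-certs --upload-certs`
build_control_plane_join() {
  join_cmd=$1
  cert_key=$2
  # The printed join command ends with a trailing space; trim it first.
  join_cmd=$(printf '%s' "$join_cmd" | sed 's/[[:space:]]*$//')
  printf '%s --control-plane --certificate-key %s\n' "$join_cmd" "$cert_key"
}

# Usage on a healthy master (commented out because it needs a running cluster):
# key=$(kubeadm init phase upload-certs --upload-certs 2>/dev/null | tail -n 1)
# join=$(kubeadm token create --print-join-command)
# build_control_plane_join "$join" "$key"
```

The printed line is then run as root on the node that is rejoining.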

[root@sc-master-2 ~]# kubeadm join 172.70.21.9:16443 --token 6tdoc5.zi2lmp74r23q0nue --discovery-token-ca-cert-hash sha256:ace0c89c5c1d184c12c81573c23a1dc9e2ce91c2c9de69dc866a874915426142  --control-plane --certificate-key f85303a67adc35b6aae6a4f4ac214063101fa69b38920c6bb8786875be95efe9
[preflight] Running pre-flight checks
        [WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks before initializing the new control plane instance
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[download-certs] Downloading the certificates in Secret "kubeadm-certs" in the "kube-system" Namespace
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local sc-master-2] and IPs [10.96.0.1 172.70.21.12 172.70.21.9]
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [localhost sc-master-2] and IPs [172.70.21.12 127.0.0.1 ::1]
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [localhost sc-master-2] and IPs [172.70.21.12 127.0.0.1 ::1]
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Valid certificates and keys now exist in "/etc/kubernetes/pki"
[certs] Using the existing "sa" key
[kubeconfig] Generating kubeconfig files
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
W0906 08:12:44.381724   32000 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "admin.conf" kubeconfig file
W0906 08:12:44.592614   32000 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
W0906 08:12:44.736257   32000 endpoint.go:57] [endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[check-etcd] Checking that the etcd cluster is healthy
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
The 'update-status' phase is deprecated and will be removed in a future release. Currently it performs no operation
[mark-control-plane] Marking the node sc-master-2 as control-plane by adding the labels: [node-role.kubernetes.io/control-plane node.kubernetes.io/exclude-from-external-load-balancers]
[mark-control-plane] Marking the node sc-master-2 as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule node-role.kubernetes.io/control-plane:NoSchedule]

This node has joined the cluster and a new control plane instance was created:

* Certificate signing request was sent to apiserver and approval was received.
* The Kubelet was informed of the new secure connection details.
* Control plane label and taint were applied to the new node.
* The Kubernetes control plane instances scaled up.
* A new etcd member was added to the local/stacked etcd cluster.

To start administering your cluster from this node, you need to run the following as a regular user:

        mkdir -p $HOME/.kube
        sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
        sudo chown $(id -u):$(id -g) $HOME/.kube/config

Run 'kubectl get nodes' to see this node join the cluster.

[root@sc-master-2 ~]# mkdir -p $HOME/.kube
[root@sc-master-2 ~]# cp -i /etc/kubernetes/admin.conf $HOME/.kube/config

The node has now rejoined the cluster as a master (control-plane) node.

Joining a worker node
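The original gives no transcript for this step. As a hedged sketch: a worker joins with the plain join command, that is, without --control-plane and --certificate-key. The helper name `print_worker_join` is hypothetical; `--ttl` is the token lifetime flag of `kubeadm token create`, whose default is 24h.

```shell
# Run on a healthy master: print a fresh join command for a worker node.
#   $1 (optional): token lifetime passed to --ttl, defaults to 24h
print_worker_join() {
  kubeadm token create --ttl "${1:-24h}" --print-join-command
}

# Run the printed command as root on the worker, for example:
#   kubeadm join 172.70.21.9:16443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>
# Then check from a master that the node has registered:
#   kubectl get nodes
```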

etcd operations

etcd health status

[root@sc-master-1 ~]# ETCDCTL_API=3 etcdctl \
--endpoints="https://172.70.21.11:2379,https://172.70.21.12:2379,https://172.70.21.13:2379" \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt  endpoint health
https://172.70.21.13:2379 is healthy: successfully committed proposal: took = 11.422783ms
https://172.70.21.11:2379 is healthy: successfully committed proposal: took = 11.322822ms
https://172.70.21.12:2379 is healthy: successfully committed proposal: took = 13.347309ms
[root@sc-master-1 ~]#  ETCDCTL_API=3 etcdctl -w table --endpoints="https://172.70.21.11:2379,https://172.70.21.12:2379,https://172.70.21.13:2379" \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt   endpoint status
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
|         ENDPOINT          |        ID        | VERSION | DB SIZE | IS LEADER | IS LEARNER | RAFT TERM | RAFT INDEX | RAFT APPLIED INDEX | ERRORS |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
| https://172.70.21.11:2379 | ebded2f087891c8b |   3.5.6 |   46 MB |     false |      false |         7 |    3023086 |            3023086 |        |
| https://172.70.21.12:2379 | ba484335423d616e |   3.5.6 |   46 MB |     false |      false |         7 |    3023086 |            3023086 |        |
| https://172.70.21.13:2379 | 92b3f37e068a4333 |   3.5.6 |   45 MB |      true |      false |         7 |    3023087 |            3023087 |        |
+---------------------------+------------------+---------+---------+-----------+------------+-----------+------------+--------------------+--------+
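Every etcdctl call in this article repeats the same endpoints and TLS flags. A small wrapper, sketched below, keeps them in one place. The function name `etcdctl_k8s` and the `ETCD_ENDPOINTS` variable are hypothetical; the certificate paths and addresses are the ones used in the transcripts above.

```shell
# Endpoints of the three masters from this article; override by exporting ETCD_ENDPOINTS.
ETCD_ENDPOINTS=${ETCD_ENDPOINTS:-https://172.70.21.11:2379,https://172.70.21.12:2379,https://172.70.21.13:2379}

# Call etcdctl with the kubeadm-generated client certificate and pass all
# remaining arguments through unchanged.
etcdctl_k8s() {
  ETCDCTL_API=3 etcdctl \
    --endpoints="$ETCD_ENDPOINTS" \
    --cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
    --key=/etc/kubernetes/pki/apiserver-etcd-client.key \
    --cacert=/etc/kubernetes/pki/etcd/ca.crt \
    "$@"
}

# Usage (commented out because it needs a reachable etcd cluster):
# etcdctl_k8s endpoint health
# etcdctl_k8s -w table endpoint status
```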

Listing and removing etcd members

[root@sc-master-1 ~]#  ETCDCTL_API=3 etcdctl -w table --endpoints="https://172.70.21.11:2379,https://172.70.21.12:2379,https://172.70.21.13:2379" \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt   member list 
+------------------+---------+-------------+---------------------------+---------------------------+------------+
|        ID        | STATUS  |    NAME     |        PEER ADDRS         |       CLIENT ADDRS        | IS LEARNER |
+------------------+---------+-------------+---------------------------+---------------------------+------------+
| 92b3f37e068a4333 | started | sc-master-3 | https://172.70.21.13:2380 | https://172.70.21.13:2379 |      false |
| ba484335423d616e | started | sc-master-2 | https://172.70.21.12:2380 | https://172.70.21.12:2379 |      false |
| ebded2f087891c8b | started | sc-master-1 | https://172.70.21.11:2380 | https://172.70.21.11:2379 |      false |
+------------------+---------+-------------+---------------------------+---------------------------+------------+
ETCDCTL_API=3 etcdctl --endpoints="https://172.70.21.11:2379,https://172.70.21.12:2379,https://172.70.21.13:2379" \
--cert=/etc/kubernetes/pki/apiserver-etcd-client.crt \
--key=/etc/kubernetes/pki/apiserver-etcd-client.key \
--cacert=/etc/kubernetes/pki/etcd/ca.crt  member remove ebded2f087891c8b
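The member ID passed to `member remove` above was copied by hand from the table. As a sketch, the lookup can be done by node name instead. The helper name `member_id_by_name` is hypothetical, and it assumes the default (non-table) output of `etcdctl member list`, which prints one comma-separated line per member in the same column order as the table.

```shell
# Print the ID of the etcd member whose NAME column equals $1.
# Reads the default `etcdctl member list` output on stdin, one member per line:
#   ID, STATUS, NAME, PEER ADDRS, CLIENT ADDRS, IS LEARNER
member_id_by_name() {
  awk -F', ' -v name="$1" '$3 == name { print $1 }'
}

# Usage (commented out; <flags> stands for the --endpoints/--cert/--key/--cacert
# options used in the commands above):
# id=$(ETCDCTL_API=3 etcdctl <flags> member list | member_id_by_name sc-master-1)
# [ -n "$id" ] && ETCDCTL_API=3 etcdctl <flags> member remove "$id"
```

Checking that the ID is non-empty before calling `member remove` avoids issuing the command with a missing argument when the name does not match.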

Backing up master node data

A. Master node data backup

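A master node backup consists of three parts:

1. All files under /etc/kubernetes/ (certificates and manifest files).

2. The .kube/config file in the user's home directory (kubectl connection credentials).

3. All files under /var/lib/kubelet/ (connection credentials for plugins and containers).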
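The three parts above can be collected into one archive. The following is a minimal sketch, not the author's script: the function name `backup_master` and the archive naming scheme are hypothetical. When given absolute paths, GNU tar strips the leading `/` from member names (and says so on stderr), so the archive can later be unpacked relative to the filesystem root.

```shell
# Archive the master node state into a timestamped tarball and print its path.
#   $1:   destination directory for the archive
#   $2..: paths to back up; defaults to the three locations listed above
backup_master() {
  dest_dir=$1
  shift
  if [ "$#" -eq 0 ]; then
    set -- /etc/kubernetes "$HOME/.kube/config" /var/lib/kubelet
  fi
  archive="$dest_dir/k8s-master-$(uname -n)-$(date +%Y%m%d-%H%M%S).tar.gz"
  mkdir -p "$dest_dir" || return 1
  # -p keeps file permissions, which matters for the private keys under pki/.
  tar -czpf "$archive" "$@" || return 1
  printf '%s\n' "$archive"
}

# Usage (commented out; run as root on the master):
# backup_master /var/backups/k8s
```

The archive contains private keys and admin credentials, so it should be stored with the same care as the node itself.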

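Restoring master node components

The master node components can be restored with the following steps:

    1. Perform a fresh installation using the original installation script (kubeadm reset, iptables -X…).

    2. Stop the kubelet service: systemctl stop kubelet.service.

    3. Remove the add-on containers (coredns, flannel, dashboard).

    4. Restore the etcd data (see the operations in Chapter 1).

    5. Restore the three backed-up paths in order.

    6. Start the kubelet service again: systemctl start kubelet.service.

    7. Wait a few minutes (time for a coffee) until all components have started, then verify the cluster.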
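Steps 2, 5 and 6 above can be sketched as one helper. This is an illustration under stated assumptions, not the author's procedure: the function name `restore_master` is hypothetical, the backup is assumed to be a gzip-compressed tar archive whose member paths are relative to the filesystem root, and steps 1, 3 and 4 (reinstall, add-on cleanup, etcd restore) are not covered.

```shell
# Restore a backup archive between a kubelet stop and start.
#   $1: tar.gz archive with member paths relative to the filesystem root
#   $2: extraction root, defaults to /
restore_master() {
  archive=$1
  root=${2:-/}
  [ -f "$archive" ] || { echo "archive not found: $archive" >&2; return 1; }
  systemctl stop kubelet.service || return 1
  # If extraction fails, kubelet is deliberately left stopped for inspection.
  tar -xzpf "$archive" -C "$root" || return 1
  systemctl start kubelet.service
}

# Usage and verification (commented out; run as root on the rebuilt master,
# <archive> stands for the backup file):
# restore_master <archive>
# kubectl get nodes
# kubectl get pods -n kube-system
```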

Reference (etcd): www.infvie.com/ops-notes/k…