Fixing expired certificates on a dev cluster

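Today I logged in to my dev environment and found that kubectl had stopped working. The API server looked unhealthy: connections to port 6443 were timing out or being refused.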

[root@dev-1 root]# kubectl get pod 
E0731 16:12:11.008773    3160 memcache.go:265] couldn't get current server API group list: Get "https://10.40.30.125:6443/api?timeout=32s": net/http: TLS handshake timeout
E0731 16:12:14.576349    3160 memcache.go:265] couldn't get current server API group list: Get "https://10.40.30.125:6443/api?timeout=32s": dial tcp 10.40.30.125:6443: connect: connection refused - error from a previous attempt: read tcp 10.40.30.125:41220->10.40.30.125:6443: read: connection reset by peer
E0731 16:12:14.576932    3160 memcache.go:265] couldn't get current server API group list: Get "https://10.40.30.125:6443/api?timeout=32s": dial tcp 10.40.30.125:6443: connect: connection refused
E0731 16:12:14.578884    3160 memcache.go:265] couldn't get current server API group list: Get "https://10.40.30.125:6443/api?timeout=32s": dial tcp 10.40.30.125:6443: connect: connection refused
E0731 16:12:14.580557    3160 memcache.go:265] couldn't get current server API group list: Get "https://10.40.30.125:6443/api?timeout=32s": dial tcp 10.40.30.125:6443: connect: connection refused

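Checking the kube-apiserver container showed that it had exited and been restarted many times (the container name suffix had reached 143). A look at the logs pointed to a certificate problem.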

[root@dev-1 root]# docker ps -a |grep api
ae18b7325416   771ffcf9ca63                                          "kube-apiserver --ad…"   2 seconds ago        Up 1 second                               k8s_kube-apiserver_kube-apiserver-dev-1_kube-system_f3db9fee3bdc5df10da40f9cf7397e2d_143
07692db807ed   771ffcf9ca63                                          "kube-apiserver --ad…"   41 seconds ago       Exited (1) 20 seconds ago                 k8s_kube-apiserver_kube-apiserver-dev-1_kube-system_f3db9fee3bdc5df10da40f9cf7397e2d_142

[root@dev-1 root]# docker logs 92c8421d3ad3 -f --tail 10 
W0731 08:13:15.960932       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-07-31T08:13:15Z is after 2023-07-28T05:38:26Z". Reconnecting...
W0731 08:13:16.528727       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-07-31T08:13:16Z is after 2023-07-28T05:38:26Z". Reconnecting...
W0731 08:13:17.601180       1 clientconn.go:1223] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379  <nil> 0 <nil>}. Err :connection error: desc = "transport: authentication handshake failed: x509: certificate has expired or is not yet valid: current time 2023-07-31T08:13:17Z is after 2023-07-28T05:38:26Z". Reconnecting...
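When the log is noisy, it can help to pull just the expiry timestamp out of the x509 errors. Below is a minimal sketch that assumes the log lines have the same wording as the ones above; `x509_expiry_from_log` is a hypothetical helper name, not part of any Kubernetes tool.

```shell
#!/usr/bin/env bash
# Hypothetical helper: extract the "is after <timestamp>" expiry from x509 error lines.
# Assumes the message wording matches the kube-apiserver log shown above.
x509_expiry_from_log() {
  # Reads log text on stdin and prints each distinct expiry timestamp once.
  sed -n 's/.*certificate has expired or is not yet valid: current time [^ ]* is after \([0-9TZ:-]*\).*/\1/p' | sort -u
}

# Usage (the container ID is a placeholder):
#   docker logs --tail 50 <container-id> 2>&1 | x509_expiry_from_log
```

The same date can also be read straight from the certificate file, for example with `openssl x509 -noout -enddate -in /etc/kubernetes/pki/apiserver-etcd-client.crt`, assuming the default kubeadm PKI path.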

Next, use kubeadm to check whether the certificates have actually expired.

Confirming the certificates

[root@dev-1 root]# kubeadm certs check-expiration
[check-expiration] Reading configuration from the cluster...
[check-expiration] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[check-expiration] Error reading configuration from the Cluster. Falling back to default configuration

CERTIFICATE                EXPIRES                  RESIDUAL TIME   CERTIFICATE AUTHORITY   EXTERNALLY MANAGED
admin.conf                 Jul 28, 2023 05:38 UTC   <invalid>                               no      
apiserver                  Jul 28, 2023 05:38 UTC   <invalid>       ca                      no      
apiserver-etcd-client      Jul 28, 2023 05:38 UTC   <invalid>       etcd-ca                 no      
apiserver-kubelet-client   Jul 28, 2023 05:38 UTC   <invalid>       ca                      no      
controller-manager.conf    Jul 28, 2023 05:38 UTC   <invalid>                               no      
etcd-healthcheck-client    Jul 28, 2023 05:38 UTC   <invalid>       etcd-ca                 no      
etcd-peer                  Jul 28, 2023 05:38 UTC   <invalid>       etcd-ca                 no      
etcd-server                Jul 28, 2023 05:38 UTC   <invalid>       etcd-ca                 no      
front-proxy-client         Jul 28, 2023 05:38 UTC   <invalid>       front-proxy-ca          no      
scheduler.conf             Jul 28, 2023 05:38 UTC   <invalid>                               no      

CERTIFICATE AUTHORITY   EXPIRES                  RESIDUAL TIME   EXTERNALLY MANAGED
ca                      Jul 25, 2032 05:38 UTC   8y              no      
etcd-ca                 Jul 25, 2032 05:38 UTC   8y              no      
front-proxy-ca          Jul 25, 2032 05:38 UTC   8y              no    
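To pick out only the expired entries from that table, a small filter is enough. This is a sketch that assumes the output format shown above, where expired rows carry `<invalid>` in the RESIDUAL TIME column; `expired_certs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: list the certificates that kubeadm reports as expired.
expired_certs() {
  # Reads `kubeadm certs check-expiration` output on stdin and prints the
  # first column of every row that contains "<invalid>".
  awk '/<invalid>/ { print $1 }'
}

# Usage:
#   kubeadm certs check-expiration 2>/dev/null | expired_certs
```

An empty result means no row was marked `<invalid>`, which makes this easy to run from a periodic check.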

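Sure enough, every certificate expired on Jul 28, 2023, while the CAs remain valid until 2032. Renewing the certificates fixes the problem: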

[root@dev-1 root]# kubeadm certs renew all 
[renew] Reading configuration from the cluster...
[renew] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[renew] Error reading configuration from the Cluster. Falling back to default configuration

certificate embedded in the kubeconfig file for the admin to use and for kubeadm itself renewed
certificate for serving the Kubernetes API renewed
certificate the apiserver uses to access etcd renewed
certificate for the API server to connect to kubelet renewed
certificate embedded in the kubeconfig file for the controller manager to use renewed
certificate for liveness probes to healthcheck etcd renewed
certificate for etcd nodes to communicate with each other renewed
certificate for serving etcd renewed
certificate for the front proxy client renewed
certificate embedded in the kubeconfig file for the scheduler manager to use renewed

Done renewing certificates. You must restart the kube-apiserver, kube-controller-manager, kube-scheduler and etcd, so that they can use the new certificates.
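The output above asks for a restart of the control plane components. I did not record how that was done here. One common approach for kubeadm static pods is to move the manifests out of the manifest directory, wait for the kubelet to notice, and then move them back. The sketch below assumes the usual `/etc/kubernetes/manifests` location and a 20 second wait; `restart_static_pods` is a hypothetical helper name, and both defaults should be checked against your own node.

```shell
#!/usr/bin/env bash
# Sketch: restart kubeadm static pods by moving their manifests away and back.
# The default directory and wait time are assumptions, not values from this cluster.
restart_static_pods() {
  local manifest_dir="${1:-/etc/kubernetes/manifests}"
  local wait_seconds="${2:-20}"
  local stash
  stash="$(mktemp -d)" || return 1

  # With the manifests gone, the kubelet stops the static pods.
  mv "$manifest_dir"/*.yaml "$stash"/ || return 1
  sleep "$wait_seconds"

  # With the manifests back, the kubelet starts new pods, which load the renewed certificates.
  mv "$stash"/*.yaml "$manifest_dir"/ || return 1
  rmdir "$stash"
}

# Usage (as root on the control plane node):
#   restart_static_pods
```

The directory and wait time are parameters so the function can be tried against a scratch directory before it touches a real node.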



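Finally, copy the renewed admin.conf over the kubeconfig that kubectl uses, because the old file still embeds the expired client certificate:

[root@dev-1 root]# cp /etc/kubernetes/admin.conf /root/.kube/config 
cp: overwrite ‘/root/.kube/config’? yes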
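If the old kubeconfig contains anything worth keeping, such as extra contexts, it may be safer to back it up before overwriting. This is a minimal sketch; `install_kubeconfig` is a hypothetical helper name, and the default paths simply mirror the command above.

```shell
#!/usr/bin/env bash
# Sketch: keep a timestamped backup of the old kubeconfig before replacing it.
install_kubeconfig() {
  local src="${1:-/etc/kubernetes/admin.conf}"
  local dst="${2:-$HOME/.kube/config}"

  # Back up the existing file, if there is one, next to the original.
  if [ -f "$dst" ]; then
    command cp -p "$dst" "$dst.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi

  # `command` bypasses any interactive `cp -i` alias, so there is no overwrite prompt.
  command cp -f "$src" "$dst"
}

# Usage:
#   install_kubeconfig
```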