21 k8s Cluster Maintenance and Tuning


1. Quick Reference

1.1 NodePort ports in use

[root@VM-16-14-centos data]# netstat -nlpt  | grep -Po ':::\K\d+(?=.+kube-proxy)' | sort -rn | xargs -n8
10256
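
The single port printed above, 10256, is kube-proxy's default health-check port rather than a NodePort, and the grep pattern only matches listeners bound to the IPv6 wildcard address (:::). As a rough sketch, the same extraction can be written as a small awk filter that also picks up IPv4 listeners. The function name kube_proxy_ports is made up for this example, and the filter assumes the usual netstat -nlpt column layout (local address in column 4).

```shell
# Hypothetical helper (not part of any Kubernetes tooling).
# Reads netstat -nlpt style text on stdin and prints the ports
# that kube-proxy is listening on, highest first.
kube_proxy_ports() {
  awk '/kube-proxy/ {
         n = split($4, parts, ":")   # local address, e.g. ":::10256" or "127.0.0.1:10249"
         print parts[n]              # the port is the last colon-separated field
       }' | sort -rn
}

# Usage on a node (same pipeline shape as the command above):
#   netstat -nlpt | kube_proxy_ports | xargs -n8
```

Splitting on the last colon instead of matching ::: is a design choice for this sketch, so the result may list more ports than the original one-liner.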

2. Errors

2.1 Scheduler Unhealthy

  • One of the alerts that fired after deploying Prometheus: KubeSchedulerDown (1 active)


  • Cause: the default configuration sets --port=0, which disables the component's insecure (HTTP) serving port.

[root@VM-16-14-centos ~]# vim /etc/kubernetes/manifests/kube-controller-manager.yaml

  • Comment out the - --port=0 line (line 26)

[root@VM-16-14-centos ~]# vim /etc/kubernetes/manifests/kube-scheduler.yaml

  • Comment out the - --port=0 line (line 19)

  • Then restart kubelet on the master:

[root@VM-16-14-centos ~]# systemctl restart kubelet.service
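
The manual vim edit of the two manifests could also be scripted. The following is only a sketch: the function name comment_out_port_zero is hypothetical, and it assumes the flag appears on its own line exactly as - --port=0. Trying it on a copy outside /etc/kubernetes/manifests first is advisable, because kubelet treats every file in that directory as a static Pod manifest.

```shell
# Hypothetical helper: comment out the "- --port=0" argument in a manifest.
# Indentation is preserved; lines that are already commented are left alone.
comment_out_port_zero() {
  manifest=$1
  tmp=$(mktemp) || return 1
  # \1 is the leading whitespace, \2 is the flag itself.
  if sed 's/^\([[:space:]]*\)\(- --port=0\)[[:space:]]*$/\1# \2/' "$manifest" > "$tmp"; then
    cat "$tmp" > "$manifest"   # rewrite in place, keeping owner and mode
  fi
  rm -f "$tmp"
}

# Usage, mirroring the two edits above:
#   comment_out_port_zero /etc/kubernetes/manifests/kube-controller-manager.yaml
#   comment_out_port_zero /etc/kubernetes/manifests/kube-scheduler.yaml
```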

2.2 Node NotReady / Kubelet stopped posting node status.

[root@VM-16-14-centos ~]# kubectl describe node

Conditions:
  Type                 Status    LastHeartbeatTime                 LastTransitionTime                Reason              Message
  ----                 ------    -----------------                 ------------------                ------              -------
  NetworkUnavailable   False     Wed, 27 Oct 2021 10:46:28 +0800   Wed, 27 Oct 2021 10:46:28 +0800   FlannelIsUp         Flannel is running on this node
  MemoryPressure       Unknown   Sat, 30 Oct 2021 00:35:18 +0800   Sat, 30 Oct 2021 00:37:44 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  DiskPressure         Unknown   Sat, 30 Oct 2021 00:35:18 +0800   Sat, 30 Oct 2021 00:37:44 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  PIDPressure          Unknown   Sat, 30 Oct 2021 00:35:18 +0800   Sat, 30 Oct 2021 00:37:44 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
  Ready                Unknown   Sat, 30 Oct 2021 00:35:18 +0800   Sat, 30 Oct 2021 00:37:44 +0800   NodeStatusUnknown   Kubelet stopped posting node status.
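
Before reading the full kubectl describe node output, it can help to list only the nodes that are not Ready. A minimal sketch, assuming the default kubectl get nodes column order (NAME, then STATUS); the function name not_ready_nodes is invented for this example.

```shell
# Hypothetical helper: reads "kubectl get nodes --no-headers" output on stdin
# and prints the names of nodes whose STATUS is not Ready.
# "Ready,SchedulingDisabled" still counts as Ready here.
not_ready_nodes() {
  awk '$2 != "Ready" && $2 !~ /^Ready,/ { print $1 }'
}

# Usage:
#   kubectl get nodes --no-headers | not_ready_nodes
```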
  
[root@VM-16-4-centos ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sun 2021-10-24 18:09:59 CST; 5 days ago
     Docs: https://kubernetes.io/docs/
 Main PID: 59205 (kubelet)
    Tasks: 28 (limit: 23722)
   Memory: 201.4M
   CGroup: /system.slice/kubelet.service
           └─59205 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --network->

10月 30 00:55:27 VM-16-4-centos kubelet[59205]: E1030 00:55:27.840207   59205 remote_image.go:114] "PullImage from image service failed" err="rpc error: code = Unknown desc = Error respons>
10月 30 00:55:27 VM-16-4-centos kubelet[59205]: E1030 00:55:27.840255   59205 kuberuntime_image.go:51] "Failed to pull image" err="rpc error: code = Unknown desc = Error response from daem>
10月 30 00:55:27 VM-16-4-centos kubelet[59205]: E1030 00:55:27.840347   59205 kuberuntime_manager.go:895] container &Container{Name:kube-state-metrics,Image:k8s.gcr.io/kube-state-metrics/k>
10月 30 00:55:27 VM-16-4-centos kubelet[59205]: E1030 00:55:27.840391   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:55:40 VM-16-4-centos kubelet[59205]: E1030 00:55:40.827330   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:55:52 VM-16-4-centos kubelet[59205]: E1030 00:55:52.826881   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:56:07 VM-16-4-centos kubelet[59205]: E1030 00:56:07.832414   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:56:22 VM-16-4-centos kubelet[59205]: E1030 00:56:22.827684   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:56:36 VM-16-4-centos kubelet[59205]: E1030 00:56:36.826836   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">
10月 30 00:56:50 VM-16-4-centos kubelet[59205]: E1030 00:56:50.827561   59205 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics\">


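  • kubelet status again, a few seconds after the service was restarted (note the new Main PID and start time):

[root@VM-16-4-centos ~]# systemctl status kubelet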
● kubelet.service - kubelet: The Kubernetes Node Agent
   Loaded: loaded (/usr/lib/systemd/system/kubelet.service; enabled; vendor preset: disabled)
  Drop-In: /usr/lib/systemd/system/kubelet.service.d
           └─10-kubeadm.conf
   Active: active (running) since Sat 2021-10-30 00:57:32 CST; 3s ago
     Docs: https://kubernetes.io/docs/
 Main PID: 4165835 (kubelet)
    Tasks: 13 (limit: 23722)
   Memory: 158.2M
   CGroup: /system.slice/kubelet.service
           └─4165835 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --networ>

10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.675939 4165835 secret.go:195] Couldn't get secret ingress-nginx/ingress-nginx-admission: failed to sync secret cache: timed>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.675977 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/d816894a-063a-4375-b363-8019>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.676914 4165835 secret.go:195] Couldn't get secret ingress-nginx/ingress-nginx-admission: failed to sync secret cache: timed>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.676965 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/6db0c84f-17f3-4165-93d7-3485>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677386 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboards: failed to sync configmap cache: time>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677430 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677521 4165835 secret.go:195] Couldn't get secret monitoring/prometheus-k8s: failed to sync secret cache: timed out waiting>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677557 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/9382ba05-e129-4e2f-bf29-75f8>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677571 4165835 configmap.go:200] Couldn't get configMap monitoring/adapter-config: failed to sync configmap cache: timed ou>
10月 30 00:57:35 VM-16-4-centos kubelet[4165835]: E1030 00:57:35.677603 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/d3b8a6f8-4caa-4f7e-aaf2-2>

[root@VM-16-4-centos ~]# journalctl -e -u kubelet
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193666 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193680 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboard-node-rsrc-use: failed to sync configma>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193709 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193724 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboard-namespace-by-workload: failed to sync >
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193753 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193765 4165835 configmap.go:200] Couldn't get configMap monitoring/adapter-config: failed to sync configmap cache: timed ou>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193800 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/03d8ad53-d0e7-4233-8fc5-e>
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193819 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboard-node-cluster-rsrc-use: failed to sync >
10月 30 00:57:37 VM-16-4-centos kubelet[4165835]: E1030 00:57:37.193852 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.208158 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboard-workload-total: failed to sync configm>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.208262 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209228 4165835 secret.go:195] Couldn't get secret monitoring/grafana-datasources: failed to sync secret cache: timed out wa>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209234 4165835 configmap.go:200] Couldn't get configMap monitoring/prometheus-k8s-rulefiles-0: failed to sync configmap cac>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209270 4165835 configmap.go:200] Couldn't get configMap monitoring/grafana-dashboard-alertmanager-overview: failed to sync >
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209276 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/9382ba05-e129-4e2f-bf29-7>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209293 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/4873ae3f-ffc3-49ea-91b0-c26c>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.209307 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/4873ae3f-ffc3-49ea-91b0-c>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217507 4165835 secret.go:195] Couldn't get secret monitoring/prometheus-k8s: failed to sync secret cache: timed out waiting>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217545 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/9382ba05-e129-4e2f-bf29-75f8>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217560 4165835 configmap.go:200] Couldn't get configMap monitoring/adapter-config: failed to sync configmap cache: timed ou>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217584 4165835 secret.go:195] Couldn't get secret monitoring/prometheus-k8s-web-config: failed to sync secret cache: timed >
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217610 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/9382ba05-e129-4e2f-bf29-75f8>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217615 4165835 secret.go:195] Couldn't get secret monitoring/alertmanager-main-tls-assets: failed to sync secret cache: tim>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217643 4165835 secret.go:195] Couldn't get secret ingress-nginx/ingress-nginx-admission: failed to sync secret cache: timed>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217647 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/a9a38983-958b-47c8-bfa6-b823>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217670 4165835 secret.go:195] Couldn't get secret ingress-nginx/ingress-nginx-admission: failed to sync secret cache: timed>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217678 4165835 configmap.go:200] Couldn't get configMap monitoring/adapter-config: failed to sync configmap cache: timed ou>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217670 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/6db0c84f-17f3-4165-93d7-3485>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217703 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/d3b8a6f8-4caa-4f7e-aaf2-2>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217705 4165835 secret.go:195] Couldn't get secret monitoring/prometheus-k8s-tls-assets: failed to sync secret cache: timed >
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.217729 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/9382ba05-e129-4e2f-bf29-75f8>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.218024 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/secret/d816894a-063a-4375-b363-8019>
10月 30 00:57:39 VM-16-4-centos kubelet[4165835]: E1030 00:57:39.218045 4165835 nestedpendingoperations.go:301] Operation for "{volumeName:kubernetes.io/configmap/03d8ad53-d0e7-4233-8fc5-e>
10月 30 00:57:45 VM-16-4-centos kubelet[4165835]: E1030 00:57:45.374208 4165835 server.go:273] "Unable to authenticate the request due to an error" err="context canceled"
10月 30 00:57:45 VM-16-4-centos kubelet[4165835]: I1030 00:57:45.565261 4165835 request.go:665] Waited for 10.889919834s due to client-side throttling, not priority and fairness, request: >
10月 30 00:58:01 VM-16-4-centos kubelet[4165835]: E1030 00:58:01.741420 4165835 remote_image.go:114] "PullImage from image service failed" err="rpc error: code = Unknown desc = Error respo>
10月 30 00:58:01 VM-16-4-centos kubelet[4165835]: E1030 00:58:01.741488 4165835 kuberuntime_image.go:51] "Failed to pull image" err="rpc error: code = Unknown desc = Error response from da>
10月 30 00:58:01 VM-16-4-centos kubelet[4165835]: E1030 00:58:01.741595 4165835 kuberuntime_manager.go:895] container &Container{Name:kube-state-metrics,Image:k8s.gcr.io/kube-state-metrics>
10月 30 00:58:01 VM-16-4-centos kubelet[4165835]: E1030 00:58:01.741658 4165835 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics>
10月 30 00:58:02 VM-16-4-centos kubelet[4165835]: E1030 00:58:02.011300 4165835 pod_workers.go:765] "Error syncing pod, skipping" err="failed to \"StartContainer\" for \"kube-state-metrics>
[root@VM-16-4-centos ~]#
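
The journal output above is long and repetitive, so grouping the error lines by the source location that logged them makes the dominant failures easier to spot. This is a sketch only: kubelet_error_sources is a made-up name, and it assumes the klog line shape seen above (an E followed by the four-digit date, then a file.go:line token).

```shell
# Hypothetical helper: reads kubelet journal lines on stdin and prints
# "<count> <file.go:line>" for error-level (E....) entries, most frequent first.
kubelet_error_sources() {
  awk '/ E[0-9][0-9][0-9][0-9] / && match($0, /[A-Za-z_]+\.go:[0-9]+/) {
         print substr($0, RSTART, RLENGTH)   # e.g. secret.go:195
       }' | sort | uniq -c | sort -rn | awk '{ print $1, $2 }'
}

# Usage:
#   journalctl -u kubelet --no-pager | kubelet_error_sources
```

In the excerpt above, such a summary would be dominated by the secret/configmap cache sync timeouts right after the restart and the repeated kube-state-metrics image pull failures.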

3. Tuning

3.1 NodePort error: The range of valid ports is 30000-32767

[root@VM-16-14-centos data]# vim /etc/kubernetes/manifests/kube-apiserver.yaml
After the line
    - --service-cluster-ip-range=10.254.0.0/16
add
    - --service-node-port-range=1-65535
    

[root@VM-16-14-centos data]# systemctl daemon-reload

[root@VM-16-14-centos data]# systemctl restart kubelet
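
The manifest edit above could also be scripted. A minimal sketch, assuming the manifest contains a --service-cluster-ip-range line as shown; the function name add_node_port_range is hypothetical. It reuses the indentation of the existing line and does nothing if the flag is already present.

```shell
# Hypothetical helper: insert "- --service-node-port-range=<range>" right
# after the --service-cluster-ip-range argument of a kube-apiserver manifest.
add_node_port_range() {
  manifest=$1
  range=$2
  # Nothing to do if the flag is already set.
  grep -q -- '--service-node-port-range=' "$manifest" && return 0
  tmp=$(mktemp) || return 1
  if awk -v range="$range" '
       { print }
       /--service-cluster-ip-range=/ {
         indent = $0
         sub(/-.*/, "", indent)   # keep only the leading whitespace
         print indent "- --service-node-port-range=" range
       }' "$manifest" > "$tmp"; then
    cat "$tmp" > "$manifest"      # rewrite in place, keeping owner and mode
  fi
  rm -f "$tmp"
}

# Usage, matching the edit above:
#   add_node_port_range /etc/kubernetes/manifests/kube-apiserver.yaml 1-65535
```

Backup copies are better kept outside /etc/kubernetes/manifests, since kubelet treats files in that directory as static Pod manifests.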
