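1. Problem background
On a self-built k8s cluster the kubectl top command does not work: neither kubectl top node nor kubectl top pod can show node or pod resource usage, so per-pod resource consumption cannot be checked accurately. Running kubectl top pod fails with:
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Cause: Metrics Server is not installed. It must be installed to collect node/pod resource usage data before kubectl top node/pod can report it.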
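One way to confirm this diagnosis is to check whether the metrics API is registered at all. Metrics Server serves the APIService v1beta1.metrics.k8s.io, so its absence suggests the component is missing. The sketch below assumes kubectl is already pointed at the affected cluster; the helper name metrics_api_status is made up for illustration.

```shell
# Report whether the metrics API (served by Metrics Server) is registered.
# metrics_api_status is an illustrative helper name, not part of kubectl.
metrics_api_status() {
  if kubectl get apiservice v1beta1.metrics.k8s.io >/dev/null 2>&1; then
    echo "registered"
  else
    echo "missing"
  fi
}

# Usage (run against the cluster):
#   metrics_api_status
```

"registered" only means the APIService object exists; it does not prove the backing pod is healthy.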
2. Solution
1. Metrics Server installation references
kubernetes-sigs.github.io/metrics-ser…
github.com/kubernetes-… (metrics-server helm chart configuration)
Before installing Metrics Server, check that its version is compatible with your k8s version.
helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
helm search repo metrics-server
NAME                            CHART VERSION   APP VERSION   DESCRIPTION
metrics-server/metrics-server   3.11.0          0.6.4         Metrics Server is a scalable, efficient source ...
[root@aivpp et-industry-prometheus]# helm search repo metrics-server --versions
NAME                            CHART VERSION   APP VERSION   DESCRIPTION
metrics-server/metrics-server   3.11.0          0.6.4         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.10.0          0.6.3         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.9.0           0.6.3         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.8.4           0.6.2         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.8.3           0.6.2         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.8.2           0.6.1         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.8.1           0.6.1         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.8.0           0.6.0         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.7.0           0.5.2         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.6.0           0.5.1         Metrics Server is a scalable, efficient source ...
metrics-server/metrics-server   3.5.0           0.5.0         Metrics Server is a scalable, efficient source ...
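2. Install the Metrics Server component
Note: the local k8s cluster runs version 1.18, which needs a Metrics Server 0.5.x release to be compatible. Version used here: metrics-server/metrics-server chart 3.7.0 (app version 0.5.2)
1. Download the helm chart: metrics-server-3.7.0.tgz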
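The version choice can be written down as a small lookup. The sketch below encodes only the two data points this article gives (k8s 1.18 with chart 3.7.0, k8s 1.19+ with chart 3.11.0) and returns "unknown" for anything else. pick_chart_version is a hypothetical helper, and its argument is the minor version only (for example 18, or 18+ as some distributions report it).

```shell
# Map a Kubernetes minor version to a metrics-server chart version.
# Only the pairs stated in this article are encoded:
#   1.18  -> chart 3.7.0  (app 0.5.2)
#   1.19+ -> chart 3.11.0 (app 0.6.4)
# pick_chart_version is a hypothetical helper name.
pick_chart_version() {
  minor=$(printf '%s' "$1" | tr -cd '0-9')   # strip suffixes such as "+"
  if [ -z "$minor" ]; then
    echo "unknown"
  elif [ "$minor" -ge 19 ]; then
    echo "3.11.0"
  elif [ "$minor" -eq 18 ]; then
    echo "3.7.0"
  else
    echo "unknown"   # older clusters are not covered by this article
  fi
}

# Usage:
#   pick_chart_version 18
```

The server's minor version appears in the output of kubectl version. For clusters outside these two cases, consult the compatibility table in the metrics-server README.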
helm fetch kubernetes-sigs.github.io/metrics-ser…metrics-server/metrics-server --version=3.7.0
tar -zxvf metrics-server-3.7.0.tgz
helm upgrade --install metrics-server metrics-server/metrics-server
2. Modify the helm chart configuration: add namespace to the templates
metrics-server-3.7.0
vi templates/serviceaccount.yaml
{{- if .Values.serviceAccount.create -}}
apiVersion: v1
kind: ServiceAccount
metadata:
  name: {{ template "metrics-server.serviceAccountName" . }}
  namespace: {{ .Release.Namespace }}
  {{- with .Values.serviceAccount.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  labels:
    {{- include "metrics-server.labels" . | nindent 4 }}
{{- end -}}
vi templates/service.yaml
apiVersion: v1
kind: Service
metadata:
  name: {{ include "metrics-server.fullname" . }}
  namespace: {{ .Release.Namespace }}
  {{- with .Values.service.annotations }}
  annotations:
    {{- toYaml . | nindent 4 }}
  {{- end }}
  labels:
    {{- include "metrics-server.labels" . | nindent 4 }}
    {{- with .Values.service.labels -}}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  type: {{ .Values.service.type }}
  ports:
    - name: https
      port: {{ .Values.service.port }}
      protocol: TCP
      targetPort: https
  selector:
    {{- include "metrics-server.selectorLabels" . | nindent 4 }}
vi templates/pdb.yaml
{{- if .Values.podDisruptionBudget.enabled -}}
apiVersion: {{ include "metrics-server.pdb.apiVersion" . }}
kind: PodDisruptionBudget
metadata:
  name: {{ include "metrics-server.fullname" . }}
  namespace: {{ .Release.Namespace }}
  labels:
    {{- include "metrics-server.labels" . | nindent 4 }}
spec:
  {{- if .Values.podDisruptionBudget.minAvailable }}
  minAvailable: {{ .Values.podDisruptionBudget.minAvailable }}
  {{- end }}
  {{- if .Values.podDisruptionBudget.maxUnavailable }}
  maxUnavailable: {{ .Values.podDisruptionBudget.maxUnavailable }}
  {{- end }}
  selector:
    matchLabels:
      {{- include "metrics-server.selectorLabels" . | nindent 6 }}
{{- end -}}
vi templates/deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: {{ include "metrics-server.fullname" . }}
  namespace: {{ .Release.Namespace }}
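After editing several templates by hand it is easy to miss one. As a rough check, the sketch below lists the template files that never mention .Release.Namespace. The helper name templates_without_namespace is an assumption for illustration, and it expects to be run from the extracted chart directory unless a templates path is passed in.

```shell
# List template files that do not reference .Release.Namespace.
# Cluster-scoped templates (for example ClusterRole or APIService) are
# expected to appear in this list, so treat the output as a hint only.
# templates_without_namespace is an illustrative helper name.
templates_without_namespace() {
  dir=${1:-templates}
  grep -L -F '.Release.Namespace' "$dir"/*.yaml
}

# Usage, from the chart directory:
#   templates_without_namespace
```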
vi values.yaml
image:
  # repository: k8s.gcr.io/metrics-server/metrics-server
  repository: registry.cn-hangzhou.aliyuncs.com/google_containers/metrics-server
  # Overrides the image tag whose default is v{{ .Chart.AppVersion }}
  tag: "v0.5.2"
  pullPolicy: IfNotPresent
args:
  # default arguments
  - --secure-port=4443
  - --cert-dir=/tmp
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  # skip kubelet TLS certificate verification => fixes the kubelet HTTPS TLS certificate verification failure
  - --kubelet-insecure-tls
resources:
  requests:
    cpu: 100m
    memory: 200Mi
  limits:
    cpu: 200m
    memory: 200Mi
3. Install metrics-server from the helm chart
tar -zxvf metrics-server-3.7.0.tgz
Install metrics-server
helm install metrics-server . -n monitor
Problem: the Pod never becomes ready; the metrics-server pod stays in the 0/1 Running state.
Error log:
I1011 12:07:51.215241 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
E1011 12:07:57.718982 1 scraper.go:139] "Failed to scrape node" err="Get \"https://192.168.0.22:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 192.168.0.22 because it doesn't contain any IP SANs" node="aivpp"
I1011 12:08:01.216519 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
I1011 12:08:11.215193 1 server.go:188] "Failed probe" probe="metric-storage-ready" err="not metrics to serve"
E1011 12:08:12.715475 1 scraper.go:139] "Failed to scrape node" err="Get \"https://192.168.0.22:10250/stats/summary?only_cpu_and_memory=true\": x509: cannot validate certificate for 192.168.0.22 because it doesn't contain any IP SANs" node="aivpp"
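Cause: kubelet's port 10250 serves HTTPS, so metrics-server verifies the kubelet's TLS certificate when it connects. Here the certificate contains no IP SANs, so it cannot be validated for 192.168.0.22. blog.csdn.net/avatar_2009…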
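To see how often this particular failure occurs, as opposed to other scrape errors, the log can be filtered for the x509 message. The sketch below reads log lines on stdin and counts the matching ones; count_ip_san_errors is an illustrative name, and the usage line assumes the Deployment is called metrics-server in the monitor namespace, as in this article.

```shell
# Count scrape failures caused by a kubelet certificate without IP SANs.
# Reads metrics-server log lines on stdin and prints the number of matches.
# count_ip_san_errors is an illustrative helper name.
count_ip_san_errors() {
  grep -c "doesn't contain any IP SANs"
}

# Usage:
#   kubectl -n monitor logs deployment/metrics-server | count_ip_san_errors
```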
Fix: skip kubelet TLS certificate verification by adding the flag in values.yaml
vi values.yaml
args:
  # default arguments
  - --secure-port=4443
  - --cert-dir=/tmp
  - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
  - --kubelet-use-node-status-port
  - --metric-resolution=15s
  # skip kubelet TLS certificate verification
  - --kubelet-insecure-tls
Reference:
github.com/kubernetes-…
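An alternative to editing the chart's values.yaml in place is a small override file passed with -f. This is a sketch under one assumption: that the chart appends the args list to its own built-in default arguments, so the override only needs the extra flag. The file name insecure-tls-values.yaml and the helper name write_insecure_tls_values are arbitrary.

```shell
# Write an override values file that adds only the extra flag.
# Assumes the chart appends "args" to its built-in default arguments.
# write_insecure_tls_values and the default file name are illustrative.
write_insecure_tls_values() {
  cat > "${1:-insecure-tls-values.yaml}" <<'EOF'
args:
  # skip verification of the kubelet serving certificate
  - --kubelet-insecure-tls
EOF
}

# Usage, from the extracted chart directory:
#   write_insecure_tls_values
#   helm upgrade --install metrics-server . -n monitor -f insecure-tls-values.yaml
```

Keeping the change in a separate file leaves the chart untouched, which makes later chart upgrades easier to diff. The upstream documentation describes --kubelet-insecure-tls as intended for testing purposes only.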
Verify the fix:
Upgrade metrics-server
helm upgrade metrics-server . -n monitor
metrics-server now starts normally; the kubelet certificate verification failure is fixed.
The kubectl top command works normally
kubectl top pod --sort-by=memory
kubectl top pod --sort-by=cpu
kubectl top node
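The verification can be scripted so that kubectl top only runs once the Deployment is actually ready. The sketch below assumes the release is named metrics-server (so the Deployment has the same name) and lives in the monitor namespace, as in this article. verify_metrics_server is an illustrative name, and the 120s timeout is an arbitrary choice.

```shell
# Wait for the metrics-server Deployment to become ready, then query node metrics.
# Assumes Deployment "metrics-server"; the namespace defaults to "monitor".
# verify_metrics_server is an illustrative helper name.
verify_metrics_server() {
  ns=${1:-monitor}
  kubectl -n "$ns" rollout status deployment/metrics-server --timeout=120s || return 1
  kubectl top node
}

# Usage:
#   verify_metrics_server
#   verify_metrics_server some-other-namespace
```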
Uninstall metrics-server
helm uninstall metrics-server -n monitor
FAQ:
1. metrics-server version compatibility: chart version 3.11.0 is not compatible with k8s 1.18; it requires 1.19+
helm fetch kubernetes-sigs.github.io/metrics-ser… metrics-server/metrics-server --version=3.11.0
tar -zxvf metrics-server-3.11.0.tgz
Install
[root@xxx metrics-server]# helm install metrics-server . -n monitor
The install fails because the k8s version does not match:
Error: unable to build kubernetes objects from release manifest: error validating "": error validating data: ValidationError(Deployment.spec.template.spec.containers[0].securityContext): unknown field "seccompProfile" in io.k8s.api.core.v1.SecurityContext
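The error arises because the securityContext.seccompProfile field was added in Kubernetes 1.19, so a 1.18 API server rejects the manifest. A pre-flight check could ask the API server whether it knows the field before attempting the install. The sketch below relies on kubectl explain failing for a field the server does not know; supports_seccomp_profile is an illustrative name.

```shell
# Succeed only if the API server's Pod schema includes
# containers[].securityContext.seccompProfile, the field from the error above.
# supports_seccomp_profile is an illustrative helper name.
supports_seccomp_profile() {
  kubectl explain pod.spec.containers.securityContext.seccompProfile >/dev/null 2>&1
}

# Usage:
#   if supports_seccomp_profile; then
#     echo "seccompProfile is known to this cluster"
#   else
#     echo "seccompProfile is unknown here; use the older chart (3.7.0 in this article)"
#   fi
```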