Monitoring the control-plane components of a Kubernetes cluster is essential: it gives us visibility into the overall load on the cluster, and, combined with alerting, it lets administrators notice and handle problems with the core components quickly, keeping the cluster stable throughout its lifecycle.
1. How does Prometheus automatically discover Kubernetes metrics endpoints?
Prometheus can collect metrics from a Kubernetes cluster in two ways: through the CRD approach (servicemonitors.monitoring.coreos.com), which selects targets mainly by label matching, or through scrape_configs, where Prometheus keeps pulling from the concrete targets selected by the configured "relabel_configs" at every "scrape_interval".
- Configure permissions:
RBAC in Kubernetes can grant permissions on resource objects, for example get, list, and watch on the cluster's Pods, and it can also grant access to raw API paths such as /healthz or /api. The official explanation of non_resource_urls reads:
non_resource_urls - (Optional) NonResourceURLs is a set of partial urls that a user should have access to. *s are allowed, but only as the full, final step in the path. Since non-resource URLs are not namespaced, this field is only applicable for ClusterRoles referenced from a ClusterRoleBinding. Rules can either apply to API resources (such as "pods" or "secrets") or non-resource URL paths (such as "/api"), but not both.
Since Prometheus actively scrapes these endpoints, the ServiceAccount it runs under must first be granted the corresponding RBAC permissions:
# clusterrole.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: monitor
rules:
  - apiGroups: [""]
    resources:
      - nodes
      - pods
      - endpoints
      - services
    verbs: ["get", "list", "watch"]
  - nonResourceURLs: ["/metrics"]
    verbs: ["get"]
# clusterrolebinding.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: prometheus-api-monitor
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: monitor
subjects:
  - kind: ServiceAccount
    name: prometheus-operator-nx-prometheus
    namespace: monitor
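A quick, optional way to sanity-check the binding is to impersonate the ServiceAccount from the ClusterRoleBinding above with kubectl auth can-i:
# both commands should print "yes" once the ClusterRole and ClusterRoleBinding are applied
kubectl auth can-i get /metrics --as=system:serviceaccount:monitor:prometheus-operator-nx-prometheus
kubectl auth can-i watch nodes --as=system:serviceaccount:monitor:prometheus-operator-nx-prometheus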
- Scraping the apiserver's own metrics:
Add a "scrape_config" in Prometheus (or "additionalScrapeConfigs" when using prometheus-operator) that discovers the kubernetes Endpoints in the default namespace:
- job_name: 'kubernetes-apiservers'
  kubernetes_sd_configs:
    - role: endpoints
  scheme: https
  tls_config:
    ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
  bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
  relabel_configs:
    - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
      action: keep
      regex: default;kubernetes;https
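After Prometheus reloads its configuration, a simple way to confirm the apiserver endpoints are actually being scraped is the built-in up series (the job label matches the job_name above):
# each healthy apiserver endpoint should report 1
up{job="kubernetes-apiservers"}
# number of apiserver instances currently reachable
sum(up{job="kubernetes-apiservers"})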
- Scraping controller-manager and scheduler metrics:
controller-manager and scheduler already expose a metrics endpoint themselves, so it is enough to edit their static Pod files under the manifests directory and add the matching annotations; Prometheus will then pick them up:
# Prometheus-side configuration
kubernetes_sd_configs:
  - role: pod
relabel_configs:
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
    action: keep
    regex: "true"
  - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
    action: replace
    regex: ([^:]+)(?::\d+)?;(\d+)
    replacement: $1:$2
    target_label: __address__
  - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_path]
    action: replace
    target_label: __metrics_path__
    regex: "(.+)"
  - action: labelmap
    regex: __meta_kubernetes_pod_label_(.+)
  - source_labels: [__meta_kubernetes_namespace]
    action: replace
    target_label: kubernetes_namespace
  - source_labels: [__meta_kubernetes_pod_name]
    action: replace
    target_label: kubernetes_pod_name
# controller-manager static Pod manifest:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10252"
# scheduler static Pod manifest:
metadata:
  annotations:
    prometheus.io/scrape: "true"
    prometheus.io/port: "10251"
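Before relying on the annotations, it is worth confirming that those ports really serve metrics. A minimal check from a control-plane node, assuming the legacy insecure ports 10252/10251 are still enabled (newer Kubernetes releases removed them in favour of HTTPS on 10257/10259, in which case the annotations and the check below need to be adjusted accordingly):
# run on a control-plane node; both should return Prometheus-format metrics
curl -s http://127.0.0.1:10252/metrics | head   # kube-controller-manager
curl -s http://127.0.0.1:10251/metrics | head   # kube-scheduler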
- Scraping etcd metrics:
etcd runs directly on the hosts, so we first create an Endpoints object and bind it to a headless Service, then use a ServiceMonitor to match that Service and collect etcd's metrics:
# service.yaml
apiVersion: v1
kind: Service
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
spec:
  type: ClusterIP
  clusterIP: None
  ports:
    - name: port
      port: 2379
      protocol: TCP
# endpoint.yaml
apiVersion: v1
kind: Endpoints
metadata:
  name: etcd-k8s
  namespace: kube-system
  labels:
    k8s-app: etcd
subsets:
  - addresses:
      - ip: xx.xx.xx.xx
      - ip: xx.xx.xx.xx
      - ip: xx.xx.xx.xx
    ports:
      - name: port
        port: 2379
        protocol: TCP
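Because the Endpoints object is maintained by hand here, it is easy to end up with a Service that has no backing addresses; a quick check that the headless Service and the manual Endpoints are wired together:
# the ENDPOINTS column should list the etcd node IPs on port 2379
kubectl -n kube-system get service etcd-k8s
kubectl -n kube-system get endpoints etcd-k8s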
# servicemonitor.yaml (the relevant certificates must be configured first)
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: etcd-k8s
  namespace: monitor
  labels:
    k8s-app: etcd-k8s
    release: prometheus-operator-nx
spec:
  jobLabel: k8s-app
  endpoints:
    - port: port
      interval: 30s
      scheme: https
      tlsConfig:
        caFile: /ca.pem
        certFile: /server.pem
        keyFile: /server-key.pem
        insecureSkipVerify: true
  selector:
    matchLabels:
      k8s-app: etcd
  namespaceSelector:
    matchNames:
      - kube-system
# Finally, how the etcd secret is created; the etcd certificates are mounted into Prometheus for the connection
apiVersion: v1
data:
  ca.pem: xx
  server.pem: xx
  server-key.pem: xx
kind: Secret
metadata:
  name: etcd-certs
  namespace: monitor
type: Opaque
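Instead of base64-encoding the certificate contents by hand, the same Secret can be created straight from the certificate files; the file paths below are placeholders for wherever your etcd certificates actually live. With prometheus-operator, the Secret also has to be listed in the Prometheus CR's spec.secrets so that it gets mounted into the Prometheus pods (under /etc/prometheus/secrets/<secret-name>/), and the caFile/certFile/keyFile paths in the ServiceMonitor should point at that mount location.
# create the secret from files (paths are placeholders for your etcd certificate locations)
kubectl -n monitor create secret generic etcd-certs \
  --from-file=ca.pem=/path/to/etcd/ca.pem \
  --from-file=server.pem=/path/to/etcd/server.pem \
  --from-file=server-key.pem=/path/to/etcd/server-key.pem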
2. Which control-plane metrics should I focus on?
- apiserver (for latency, percentiles show how latency is distributed much better than averages do; see the PromQL sketches after this list):
  - apiserver_request_duration_seconds: processing time of read (non-LIST) requests, read (LIST) requests, and write requests
  - apiserver_request_total: apiserver QPS, the success rate of read and write requests, and the number of failed requests broken down by status code
  - apiserver_current_inflight_requests: read and write requests currently in flight
  - apiserver_dropped_requests_total: requests dropped by the apiserver
- controller-manager:
  - leader_election_master_status: whether a leader exists
  - xxx_depth: depth of the work queues currently being reconciled
- scheduler:
  - leader_election_master_status: whether a leader exists
  - scheduler_schedule_attempts_total: helps spot a scheduler that cannot do its job; "Number of attempts to schedule pods, by the result. 'unschedulable' means a pod could not be scheduled, while 'error' means an internal scheduler problem"
  - scheduler_e2e_scheduling_duration_seconds_sum: scheduler scheduling latency (this metric is deprecated)
  - rest_client_requests_total: number of client requests (of secondary importance); "Number of HTTP requests, partitioned by status code, method, and host"
- etcd:
  - etcd_server_has_leader: whether etcd has a leader
  - etcd_server_leader_changes_seen_total: number of etcd leader changes; if the leader changes too often, the likely causes are unstable connections or an overloaded etcd cluster
  - etcd_server_proposals_failed_total: a proposal has to go through the full raft protocol, and this metric counts failed proposals; most failures come from a failed leader election or from the cluster lacking election candidates
  - etcd_disk_wal_fsync_duration_seconds_sum / etcd_disk_backend_commit_duration_seconds_sum: etcd's disk holds all of Kubernetes' critical state, and high disk-sync latency slows down every cluster operation; these metrics give the average disk-sync latency
  - etcd_debugging_mvcc_db_total_size_in_bytes: database size on each etcd node
  - etcd_network_peer_sent_bytes_total / etcd_network_peer_received_bytes_total: per-node send/receive throughput between etcd peers
  - grpc_server_started_total: call rate of each etcd gRPC method
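A few example PromQL expressions for the metrics above, as a starting point (a rough sketch; label filters and thresholds will need adjusting to your cluster):
# apiserver: p99 request latency per verb (percentiles instead of averages)
histogram_quantile(0.99, sum(rate(apiserver_request_duration_seconds_bucket{verb!~"WATCH|CONNECT"}[5m])) by (le, verb))
# apiserver: share of requests answered with a 5xx status code
sum(rate(apiserver_request_total{code=~"5.."}[5m])) / sum(rate(apiserver_request_total[5m]))
# scheduler: rate of pods that could not be scheduled
sum(rate(scheduler_schedule_attempts_total{result="unschedulable"}[5m]))
# etcd: p99 WAL fsync latency per member (should stay in the low milliseconds)
histogram_quantile(0.99, sum(rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m])) by (le, instance))
# etcd: leader changes during the last hour
increase(etcd_server_leader_changes_seen_total[1h])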
The Kubernetes maintenance view we ended up with consists of a dedicated apiserver dashboard, an etcd dashboard, a combined control-plane dashboard, and certificate monitoring; during day-to-day operation the panels keep being adjusted around the parameters that actually matter, and "good enough" is the goal.
References:
- registry.terraform.io/providers/h…
- prometheus.io/docs/promet…
- www.datadoghq.com/blog/kubern…
- sysdig.com/blog/monito…
- sysdig.com/blog/monito…