Kubernetes Healthcheck

Pod health is checked with two kinds of probes: LivenessProbe and ReadinessProbe.

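  • LivenessProbe: determines whether the container is alive. If the probe finds the container unhealthy, the kubelet kills the container and then acts according to the container's restart policy. If a container defines no LivenessProbe, the kubelet treats the probe as always returning Success.
  • ReadinessProbe: determines whether the container has finished starting (is in the ready state) and can accept requests. If the probe fails, the Pod is marked as not Ready and is removed from the endpoints of the Services that select it.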
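
To make the difference between the two probes concrete, here is a minimal sketch of a Pod that declares both. The Pod name `probe-demo`, the `nginx:1.25` image, and the delay values are assumptions chosen for illustration only.

```yaml
# Hypothetical Pod for illustration; name, image and delays are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probe-demo
  labels:
    app: probe-demo
spec:
  containers:
  - name: web
    image: nginx:1.25
    ports:
    - containerPort: 80
    # Liveness: repeated failures make the kubelet kill and restart the container.
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 15
    # Readiness: failures only mark the Pod as not Ready; the container keeps running.
    readinessProbe:
      httpGet:
        path: /
        port: 80
      initialDelaySeconds: 5
```

After applying it, `kubectl get pod probe-demo` should show readiness failures as `0/1` in the READY column, while liveness failures show up as an increasing RESTARTS count.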

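1) ExecAction: runs a command inside the container. An exit code of 0 means the container is healthy. In the example below, the first probe runs 15 seconds after the container starts, and each probe times out after 1 second. A timeout counts as a failed probe, and the container is restarted once the failure threshold is reached.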

```yaml
kind: ...
spec:
  containers:
  - name: ..
    livenessProbe:
      exec:
        command:
        - cat
        - /tmp/health
      initialDelaySeconds: 15
      timeoutSeconds: 1
```

2) TCPSocketAction: performs a TCP check against the container's IP address and port. If a TCP connection can be established, the container is considered healthy.

```yaml
...
spec:
  containers:
  - name:
    ...
    livenessProbe:
      tcpSocket:
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
```

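3) HTTPGetAction: sends an HTTP GET request to the container's IP address, port, and path. If the response status code is greater than or equal to 200 and less than 400, the container is considered healthy. In the example below, the request goes to port 80 and path /_status/healthz on the container's IP (the Pod IP is used unless host is set).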

```yaml
...
spec:
  containers:
  - ports:
    ...
    livenessProbe:
      httpGet:
        path: /_status/healthz
        port: 80
      initialDelaySeconds: 30
      timeoutSeconds: 1
```

The probe fields have the following meanings:

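  • initialDelaySeconds: how long to wait, in seconds, after the container has started before the first probe runs.

  • periodSeconds: how often the probe runs. Defaults to 10 seconds; the minimum is 1 second.

  • timeoutSeconds: how long to wait, in seconds, for a response after the probe is sent. Defaults to 1 second; the minimum is 1 second. A probe that times out counts as a failure. Once failureThreshold consecutive liveness probes have failed, the kubelet considers the container unable to serve and restarts it.

  • successThreshold: the number of consecutive successes required, after a failure, for the probe to be considered successful again. Defaults to 1.

  • failureThreshold: the number of consecutive failures required, after a success, for the probe to be considered failed. Defaults to 3.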

  • httpGet attributes

    • host: host name or IP to connect to
    • scheme: connection scheme, HTTP or HTTPS; defaults to HTTP
    • path: request path
    • httpHeaders: custom request headers
    • port: request port
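
The fields above can be combined in a single probe. The following is a sketch only: the Pod name, the image, the `X-Probe` header, and all numeric values are assumptions picked to make the arithmetic easy to follow.

```yaml
# Hypothetical Pod for illustration; names, header and numbers are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: probe-fields-demo
spec:
  containers:
  - name: web
    image: nginx:1.25
    livenessProbe:
      httpGet:
        scheme: HTTP            # HTTP (default) or HTTPS
        path: /
        port: 80
        httpHeaders:            # custom request headers
        - name: X-Probe
          value: liveness
        # host is omitted, so the kubelet connects to the Pod IP
      initialDelaySeconds: 10   # first probe about 10 s after the container starts
      periodSeconds: 5          # then one probe every 5 s
      timeoutSeconds: 2         # no response within 2 s counts as a failure
      successThreshold: 1       # must be 1 for liveness probes
      failureThreshold: 3       # 3 consecutive failures trigger a restart
```

With these values, a container that stops responding is restarted after roughly failureThreshold × periodSeconds = 3 × 5 = 15 seconds of consecutive failures.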

The field definitions, as reported by kubectl explain:

```text
$ kubectl explain pod.spec.containers.livenessProbe
KIND:     Pod
VERSION:  v1

RESOURCE: livenessProbe <Object>

DESCRIPTION:
     Periodic probe of container liveness. Container will be restarted if the
     probe fails. Cannot be updated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

     Probe describes a health check to be performed against a container to
     determine whether it is alive or ready to receive traffic.

FIELDS:
   exec <Object>
     One and only one of the following should be specified. Exec specifies the
     action to take.

   failureThreshold <integer>
     Minimum consecutive failures for the probe to be considered failed after
     having succeeded. Defaults to 3. Minimum value is 1.

   httpGet  <Object>
     HTTPGet specifies the http request to perform.

   initialDelaySeconds  <integer>
     Number of seconds after the container has started before liveness probes
     are initiated. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes

   periodSeconds    <integer>
     How often (in seconds) to perform the probe. Default to 10 seconds. Minimum
     value is 1.

   successThreshold <integer>
     Minimum consecutive successes for the probe to be considered successful
     after having failed. Defaults to 1. Must be 1 for liveness. Minimum value
     is 1.

   tcpSocket    <Object>
     TCPSocket specifies an action involving a TCP port. TCP hooks not yet
     supported

   timeoutSeconds   <integer>
     Number of seconds after which the probe times out. Defaults to 1 second.
     Minimum value is 1. More info:
     https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle#container-probes
```

startupProbe

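startupProbe is a newer addition. It mainly addresses containers being restarted by the livenessProbe during startup, when the service takes a long time to start or fails to start normally. While a startupProbe is defined and has not yet succeeded, the liveness and readiness probes are not run.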

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: busybox-lifecycles-nginx-sleep
spec:
  replicas: 2
  selector:
    matchLabels:
      app: busybox-lifecycles-nginx-sleep
      env-o: nginx-sleep
  template:
    metadata:
      labels:
        app: busybox-lifecycles-nginx-sleep
        env-o: nginx-sleep
      annotations:
        consul.hashicorp.com/connect-inject: "true"
    spec:
      terminationGracePeriodSeconds: 120
      containers:
      - image: slzcc/terminal-ctl:ubuntu-20.04
        imagePullPolicy: Always
        command:
          - nginx
          - -g
          - daemon off;
        name: busybox
        # Liveness: restart the container after 10 consecutive failed TCP checks.
        livenessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - nc -z -v -n 127.0.0.1 80
          failureThreshold: 10
          initialDelaySeconds: 5
          periodSeconds: 10
          successThreshold: 1
          timeoutSeconds: 5
        # Readiness: same TCP check, gates traffic from the Service.
        readinessProbe:
          exec:
            command:
            - /bin/bash
            - -c
            - nc -z -v -n 127.0.0.1 80
          initialDelaySeconds: 5
          periodSeconds: 10
        # Startup: allows 3 x 10 s for the first successful response.
        # Note: the other probes and the Service use port 80; with 801 this
        # probe only passes if something in the container listens on 801.
        startupProbe:
          httpGet:
            path: /
            port: 801
          failureThreshold: 3
          periodSeconds: 10
      restartPolicy: Always
---
apiVersion: v1
kind: Service
metadata:
  name: busybox-lifecycles-nginx-sleep
  labels:
    app: busybox-lifecycles-nginx-sleep
spec:
  ports:
   - name: http
     port: 80
     targetPort: 80
     protocol: TCP
  selector:
    app: busybox-lifecycles-nginx-sleep
```
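
One way to size a startupProbe is to treat failureThreshold × periodSeconds as the longest startup time the container is allowed. The sketch below assumes a hypothetical slow-starting service; the Pod name, image, and numbers are illustrative and not taken from the manifest above.

```yaml
# Hypothetical Pod for illustration; name, image and numbers are assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: slow-start-demo
spec:
  containers:
  - name: app
    image: nginx:1.25
    # Startup budget: up to 30 x 10 s = 300 s to answer the first probe.
    startupProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 30
    # Takes over once the startup probe has succeeded:
    # about 3 x 10 s = 30 s of consecutive failures triggers a restart.
    livenessProbe:
      httpGet:
        path: /
        port: 80
      periodSeconds: 10
      failureThreshold: 3
```

Keeping the liveness thresholds tight and moving the startup allowance into the startupProbe means a hung container is still detected quickly after startup, without the liveness probe killing a container that is merely slow to boot.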