Graceful node shutdown

Node heartbeats

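Introduction to the node Lease

Kubernetes uses the Lease API to communicate kubelet node heartbeats to the Kubernetes API server. Each kubelet heartbeat is an update request against the node's Lease object that refreshes the Lease's spec.renewTime field. This is how the control plane can detect node failures.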
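As a rough illustration of the heartbeat idea, the sketch below models a Lease as a small Python object whose renew time is refreshed on every heartbeat. The `Lease` class, the `heartbeat` and `is_expired` helpers, and the 40-second duration are all assumptions made for this example; they are not the real Kubernetes client API or necessarily the values a cluster uses.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Lease:
    """Toy stand-in for a node Lease; only the fields used in this sketch."""
    holder_identity: str
    lease_duration_seconds: int
    renew_time: datetime


def heartbeat(lease: Lease, now: datetime) -> None:
    """Model one kubelet heartbeat: an update that refreshes renew_time."""
    lease.renew_time = now


def is_expired(lease: Lease, now: datetime) -> bool:
    """True when no renewal has been seen within the lease duration."""
    age = now - lease.renew_time
    return age > timedelta(seconds=lease.lease_duration_seconds)


start = datetime(2024, 1, 1, tzinfo=timezone.utc)
# 40 seconds is an illustrative duration chosen for this example.
lease = Lease("node-1", lease_duration_seconds=40, renew_time=start)
heartbeat(lease, start + timedelta(seconds=10))
```

A reader of the Lease only needs to compare the renew time with the current time, which is why the heartbeat can be such a small write.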

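Introduction to the Node Controller

When a node becomes unreachable, the node controller updates the Ready condition in the Node's .status. In this case, it sets the NodeReady condition to Unknown.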
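The rule can be sketched as a small function: keep the status the kubelet reported while heartbeats are fresh, and switch to Unknown once they stop arriving. The function name `ready_condition` and the 40-second grace window are hypothetical, chosen only to make the example concrete.

```python
from datetime import datetime, timedelta, timezone


def ready_condition(reported: str, last_heartbeat: datetime,
                    now: datetime, grace: timedelta) -> str:
    """Return the Ready status a controller would record in this toy model.

    The kubelet-reported value is kept while heartbeats are fresh; once the
    last heartbeat is older than the grace window, the status is "Unknown".
    """
    if now - last_heartbeat > grace:
        return "Unknown"
    return reported


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
grace = timedelta(seconds=40)  # illustrative value, not read from a cluster
fresh = ready_condition("True", t0, t0 + timedelta(seconds=15), grace)
stale = ready_condition("True", t0, t0 + timedelta(seconds=60), grace)
```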

How the two relate

Kubelet is periodically computing NodeStatus every 10s (as it is now), but that will be independent from reporting status.
Kubelet is reporting NodeStatus if:
a. there was a meaningful change in it (initially we can probably assume that every change is meaningful, including e.g. images on the node)
b. or it didn't report it over the last node-status-update-period seconds
Kubelet creates and periodically updates its own Lease object, and the frequency of those updates is independent from the NodeStatus update frequency.
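The reporting rule in (a) and (b) above reduces to a single boolean check. The following is a minimal sketch, assuming a hypothetical helper `should_report_status` and an example period of 300 seconds; neither is taken from the kubelet source.

```python
def should_report_status(changed: bool,
                         seconds_since_last_report: float,
                         update_period_seconds: float) -> bool:
    """Decide whether the computed NodeStatus should be sent.

    Rule (a): report whenever the status changed.
    Rule (b): otherwise report once the last report is at least
    update_period_seconds old.
    """
    return changed or seconds_since_last_report >= update_period_seconds


# 300 is an example period used only for this sketch.
report_on_change = should_report_status(True, 1, 300)
skip_when_quiet = should_report_status(False, 10, 300)
report_when_due = should_report_status(False, 300, 300)
```

Computing the status every 10s while reporting it only under these two conditions is what decouples the cheap local computation from the more expensive API write.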

In the meantime, we will change NodeController to treat both updates of the NodeStatus object and updates of the new Lease object corresponding to a given node as a health signal from that Kubelet. This will make it work for both old and new Kubelets.
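One way to picture this is to take the most recent of the two timestamps as the node's last sign of life. The sketch below assumes a hypothetical helper `last_health_signal`; an old kubelet that never writes a Lease is modelled by passing `None` for the lease time.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def last_health_signal(status_update: Optional[datetime],
                       lease_renew: Optional[datetime]) -> Optional[datetime]:
    """Return the newest of the two signals, or None if neither exists."""
    signals = [t for t in (status_update, lease_renew) if t is not None]
    return max(signals) if signals else None


t0 = datetime(2024, 1, 1, tzinfo=timezone.utc)
# New kubelet: the Lease is renewed more recently than NodeStatus.
new_kubelet = last_health_signal(t0, t0 + timedelta(seconds=50))
# Old kubelet: only NodeStatus updates are available.
old_kubelet = last_health_signal(t0, None)
```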

github.com/kubernetes/…

GracefulNodeShutdown

Current state

Currently, when a node shuts down, pods do not follow the expected pod termination lifecycle and are not terminated gracefully, which can cause issues for some workloads.

Goal

Make the kubelet aware of the underlying node shutdown event and trigger pod termination with a grace period sufficient for pods to shut down properly.

Behavior when handling node shutdown

Update the Node's Ready condition to false, with the reason "Node is shutting down"
Gracefully terminate all non-critical system pods with a gracePeriodOverride computed as min(podSpec.terminationGracePeriodSeconds, ShutdownGracePeriod-ShutdownGracePeriodCriticalPods)
Gracefully terminate all critical system pods with a gracePeriodOverride of ShutdownGracePeriodCriticalPods seconds

Kubelet will use the same existing killPod function to perform the termination of pods, using gracePeriodOverride to set the appropriate grace period. During the termination process, normal pod termination processes will apply, e.g. preStop Hooks will be called, SIGTERM to containers delivered, etc.
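The grace period arithmetic from the steps above can be worked through in a few lines. This is only a sketch of the formula as quoted; the function name `grace_period_override` and the 30s/10s settings are assumptions for the example and are not the kubelet's own code or defaults.

```python
def grace_period_override(pod_termination_grace_seconds: int,
                          shutdown_grace_period: int,
                          shutdown_grace_period_critical_pods: int,
                          critical: bool) -> int:
    """Compute the override used when terminating a pod during shutdown.

    Critical pods get the window reserved for them; other pods get the
    smaller of their own grace period and the remaining shutdown window.
    """
    if critical:
        return shutdown_grace_period_critical_pods
    remaining = shutdown_grace_period - shutdown_grace_period_critical_pods
    return min(pod_termination_grace_seconds, remaining)


# Example settings: a 30s total shutdown window, 10s of it for critical pods.
slow_pod = grace_period_override(60, 30, 10, critical=False)
fast_pod = grace_period_override(5, 30, 10, critical=False)
critical_pod = grace_period_override(60, 30, 10, critical=True)
```

With these example numbers, a pod that asks for 60 seconds is capped at the 20 seconds left for non-critical pods, while a pod that only needs 5 seconds keeps its own value.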

github.com/kubernetes/…

tolerationSeconds

Overview

node.kubernetes.io/not-ready

node.kubernetes.io/unreachable

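The node controller is also responsible for adding the taints above to a node when it detects a node problem, such as the node being unreachable or not ready.

tolerationSeconds specifies how long a Pod may keep running on the node after one of the taints above has been added to it.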
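A small sketch of that timing for a Pod that tolerates the taint: the Pod may stay until the time the taint was added plus its tolerationSeconds. The helper `eviction_deadline` is hypothetical, the 300-second value is just an example, and treating a missing tolerationSeconds as "tolerate indefinitely" is modelled here by returning `None`.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


def eviction_deadline(taint_added: datetime,
                      toleration_seconds: Optional[int]) -> Optional[datetime]:
    """Return when a tolerating Pod stops being allowed on the node.

    None means the toleration has no time limit, so there is no deadline.
    """
    if toleration_seconds is None:
        return None
    return taint_added + timedelta(seconds=toleration_seconds)


tainted_at = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
# 300 seconds is an example value for this sketch.
deadline = eviction_deadline(tainted_at, 300)
no_deadline = eviction_deadline(tainted_at, None)
```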