OpenKruise: PodUnavailableBudget


Tutorial overview: OpenKruise

Docs: openkruise.io/zh/docs/use…

Detailed design: github.com/openkruise/…

pkg/webhook/podunavailablebudget/

pkg/controller/podunavailablebudget/

pkg/webhook/pod/validating/pod_unavailable_budget.go

Webhook-PUBValidating

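This webhook validates the PUB object's own spec.

The only function worth following is Handle() => validatingPodUnavailableBudgetFn, which does two things:

  1. Validates the spec itself: validatePodUnavailableBudgetSpec
  2. Checks for conflicts between PUBs: validatePubConflict (a pod must not be managed by more than one PUB)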
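
The conflict rule in step 2 can be illustrated with a minimal sketch. It does not reproduce the real validatePubConflict: the types targetRef and pubSpec, the helper selectorsMayOverlap, and the rule "two label sets overlap unless they disagree on a shared key" are all assumptions made for illustration, and only matchLabels-style selectors are modeled.

```go
package main

import "fmt"

// targetRef and pubSpec are hypothetical, simplified stand-ins for the real API types.
type targetRef struct {
	Group, Kind, Name string
}

type pubSpec struct {
	Name      string
	TargetRef *targetRef
	Selector  map[string]string // matchLabels only (assumption)
}

// selectorsMayOverlap reports whether a single pod could satisfy both label sets.
// Two sets can only be disjoint if they require different values for the same key.
func selectorsMayOverlap(a, b map[string]string) bool {
	for k, va := range a {
		if vb, ok := b[k]; ok && vb != va {
			return false
		}
	}
	return true
}

// findConflict returns the name of the first existing PUB that could manage
// the same pods as pub, so that no pod ends up under two PUBs.
func findConflict(pub pubSpec, others []pubSpec) (string, bool) {
	for _, o := range others {
		if o.Name == pub.Name {
			continue // the object being updated is not a conflict with itself
		}
		switch {
		case pub.TargetRef != nil && o.TargetRef != nil:
			if *pub.TargetRef == *o.TargetRef {
				return o.Name, true
			}
		case pub.Selector != nil && o.Selector != nil:
			if selectorsMayOverlap(pub.Selector, o.Selector) {
				return o.Name, true
			}
		}
	}
	return "", false
}

func main() {
	existing := []pubSpec{{Name: "web-pub", Selector: map[string]string{"app": "web"}}}
	incoming := pubSpec{Name: "web-pub-2", Selector: map[string]string{"app": "web", "tier": "fe"}}
	name, conflict := findConflict(incoming, existing)
	fmt.Println(name, conflict)
}
```

In this model the second PUB is rejected because a pod labeled app=web, tier=fe would match both selectors.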

Controller

Reconcile

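Reconcile itself only updates pub.status from the state of the pods the PUB covers, so that the status stays accurate.

Reconcile delegates all of the work to syncPodUnavailableBudget:

  1. control.GetPodsForPub() fetches the pods managed by the PUB

  2. desiredAvailable, err := r.getDesiredAvailableForPub(pub, expectedCount): computes the desired number of available pods

  3. disruptedPods, unavailablePods, recheckTime = r.buildDisruptedAndUnavailablePods(pods, pubClone, currentTime)

    1. disruptedPods: pods that have already been evicted/deleted

      • Pods are classified by their disruptionTime: an entry older than DeletionTimeout is dropped with a warning event, otherwise it is kept and its expected deletion time becomes a candidate for recheckTime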

        • expectedDeletion := disruptionTime.Time.Add(DeletionTimeout)
          if expectedDeletion.Before(currentTime) {
              // The deletion deadline has passed: warn and stop tracking the pod as disrupted.
              r.recorder.Eventf(pod, corev1.EventTypeWarning, "NotDeleted", "Pod was expected by PUB %s/%s to be deleted but it wasn't",
                  pub.Namespace, pub.Name)
          } else {
              // Still within the deadline: keep the entry and remember the earliest recheck time.
              resultDisruptedPods[pod.Name] = disruptionTime
              if recheckTime == nil || expectedDeletion.Before(*recheckTime) {
                  recheckTime = &expectedDeletion
              }
          }
          
    2. unavailablePods: pods that are currently unavailable

  4. Count the available pods: currentAvailable := countAvailablePods(pods, disruptedPods, unavailablePods, control). It counts the pods that appear in neither disruptedPods nor unavailablePods

  5. Update pub.Status:

    1. unavailableAllowed := currentAvailable - desiredAvailable

Webhook-PodValidating-pub

pkg/webhook/pod/validating/pod_unavailable_budget.go

Call stack

|- Handle()
 |- validatingPodFn()
   |- case admissionv1.Update, admissionv1.Delete, admissionv1.Create:
      allowed, reason, err = podUnavailableBudgetValidatingPod()

podUnavailableBudgetValidatingPod

  1. Look up the pod's workload: workload, err := p.finders.GetScaleAndSelectorForRef(ref.APIVersion, ref.Kind, newPod.Namespace, ref.Name, ref.UID)

  2. Pods in any of the following states skip the PUB check entirely

    1. pod.Status.Phase == Succeeded
    2. pod.Status.Phase == Failed
    3. pod.Status.Phase == Pending
    4. pod.Status.Phase == ""
    5. !pod.ObjectMeta.DeletionTimestamp.IsZero() (already marked for deletion)
  3. Look up the PUB that covers the pod: pub, err := pubcontrol.GetPodUnavailableBudgetForPod(p.Client, p.finders, newPod)

  4. control.IsPodUnavailableChanged(oldPod, newPod) decides whether the operation would make the pod unavailable. If it returns false, pod.spec is unchanged, so the request is allowed immediately (return true) and the PUB check is skipped

    1. In practice it just checks whether the pod spec changed: true if it did, false otherwise
  5. Otherwise, hand off to return pubcontrol.PodUnavailableBudgetValidatePod(p.Client, newPod, control, pubcontrol.Operation(req.Operation), dryRun)

PodUnavailableBudgetValidatePod returns a bool (allowed) that says whether the operation is permitted:

  1. The pod is not ready, or the pod is already in pub.Status.DisruptedPods: return true

  2. err := checkAndDecrement(pod.Name, pubClone, operation)

    1. pub.Status.UnavailableAllowed <= 0: return an error (the operation is rejected)
    2. Otherwise pub.Status.UnavailableAllowed--, return nil (the operation is allowed)
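
The two rules above amount to a small budget counter, sketched below. The pubStatus type, the bool-valued DisruptedPods map, and the simplified signatures are assumptions; the real functions also take the operation type and persist the updated status, which this sketch leaves out.

```go
package main

import (
	"errors"
	"fmt"
)

// pubStatus is a hypothetical, simplified stand-in for pub.Status.
type pubStatus struct {
	UnavailableAllowed int32
	DisruptedPods      map[string]bool
}

var errBudgetExhausted = errors.New("unavailable budget exhausted")

// checkAndDecrement consumes one unit of budget, or fails if none is left.
func checkAndDecrement(st *pubStatus) error {
	if st.UnavailableAllowed <= 0 {
		return errBudgetExhausted
	}
	st.UnavailableAllowed--
	return nil
}

// validatePod mirrors the two rules: pods that are already unavailable or
// already counted as disrupted pass for free, everything else costs budget.
func validatePod(podName string, ready bool, st *pubStatus) (bool, error) {
	if !ready || st.DisruptedPods[podName] {
		return true, nil
	}
	if err := checkAndDecrement(st); err != nil {
		return false, err
	}
	return true, nil
}

func main() {
	st := &pubStatus{UnavailableAllowed: 1, DisruptedPods: map[string]bool{}}
	for _, name := range []string{"pod-1", "pod-2"} {
		allowed, err := validatePod(name, true, st)
		fmt.Println(name, allowed, err)
	}
}
```

With a budget of 1, the first request is allowed and the second is rejected until the controller's next reconcile raises UnavailableAllowed again.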