Before reading this section, make sure you are familiar with what sidecars do and how they are used.

Tutorial series: OpenKruise

OpenKruise docs: openkruise.io/zh/docs/use…

SidecarSet

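SidecarSet decouples the definition and lifecycle of sidecar containers from the application containers, and injects the sidecars into the pods matched by a selector.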

Injection at pod creation time
In-place upgrade at runtime (including hot upgrade)

e.g. vedb-proxy-agent

selector:
    matchLabels:
      cluster_name: vedbm-1gbjnnthtmwp
      component: proxy
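
To make the matchLabels semantics concrete, here is a minimal sketch of how such a selector can be evaluated against a pod's labels. The helper matchesLabels is hypothetical and written only for illustration; it is not the Kruise implementation, which relies on the Kubernetes label selector machinery.

```go
package main

import "fmt"

// matchesLabels (hypothetical helper) reports whether podLabels contains
// every key/value pair listed in matchLabels. An empty matchLabels map
// matches every pod.
func matchesLabels(matchLabels, podLabels map[string]string) bool {
	for key, want := range matchLabels {
		if got, ok := podLabels[key]; !ok || got != want {
			return false
		}
	}
	return true
}

func main() {
	selector := map[string]string{
		"cluster_name": "vedbm-1gbjnnthtmwp",
		"component":    "proxy",
	}
	podLabels := map[string]string{
		"cluster_name": "vedbm-1gbjnnthtmwp",
		"component":    "proxy",
		"zone":         "a", // extra labels do not prevent a match
	}
	fmt.Println(matchesLabels(selector, podLabels))
}
```

A pod only needs to carry all of the selector's pairs; any additional labels on the pod are ignored.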

Reconcile

Initialization phase

Fetch all pods managed by this SidecarSet: pods, err := p.getMatchingPods(sidecarSet)
Register the latest revision: latestRevision, collisionCount, err := p.registerLatestRevision(sidecarSet, pods)
1. This function manages the revision list that belongs to the SidecarSet and guarantees that the latest entry in that list (revisions, err := p.historyController.ListControllerRevisions(sidecarSet, selector)) is correct

Recompute the current state of the pods and store it in the status struct, which looks like this:

// MatchedPods: all matched pods number
// UpdatedPods: updated pods number
// ReadyPods: ready pods number
// UpdatedReadyPods: updated and ready pods number
// UnavailablePods: MatchedPods - UpdatedReadyPods
return &appsv1alpha1.SidecarSetStatus{
        ObservedGeneration: sidecarset.Generation,
        MatchedPods:        matchedPods,
        UpdatedPods:        updatedPods,
        ReadyPods:          readyPods,
        UpdatedReadyPods:   updatedAndReady,
        LatestRevision:     latestRevision.Name,
        CollisionCount:     pointer.Int32Ptr(collisionCount),
    }
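
As a rough illustration of how these counters relate to each other, the sketch below derives them from a list of pods. The types podState and statusCounts are hypothetical simplifications introduced for this example; the real controller works on corev1.Pod objects and decides "updated" and "ready" with its own helpers.

```go
package main

import "fmt"

// podState is a hypothetical, simplified view of one matched pod.
type podState struct {
	Updated bool // sidecars already at the latest revision
	Ready   bool // pod reports ready
}

// statusCounts mirrors the counters described in the comments above.
type statusCounts struct {
	MatchedPods      int32
	UpdatedPods      int32
	ReadyPods        int32
	UpdatedReadyPods int32
	UnavailablePods  int32
}

// countStatus walks the matched pods once and fills in every counter.
func countStatus(pods []podState) statusCounts {
	var s statusCounts
	for _, p := range pods {
		s.MatchedPods++
		if p.Updated {
			s.UpdatedPods++
		}
		if p.Ready {
			s.ReadyPods++
		}
		if p.Updated && p.Ready {
			s.UpdatedReadyPods++
		}
	}
	// Same formula as the comment: MatchedPods - UpdatedReadyPods.
	s.UnavailablePods = s.MatchedPods - s.UpdatedReadyPods
	return s
}

func main() {
	pods := []podState{
		{Updated: true, Ready: true},
		{Updated: true, Ready: false},
		{Updated: false, Ready: true},
	}
	fmt.Printf("%+v\n", countStatus(pods))
}
```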

Do

The flow is straightforward and consists of the following steps:

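For containers that need a hot upgrade, initialize the empty container (empty_container).
Check whether the SidecarSet currently meets the update conditions; if it does not, return immediately.
1. The update strategy is not NotUpdate: Spec.UpdateStrategy.Type != "NotUpdate"
2. The update is not yet complete: status.UpdatedPods < status.MatchedPods
3. The rollout is not paused: sidecarSet.Spec.UpdateStrategy.Paused == false
Update the sidecars.
1. Use the update strategy to select the pods to update: upgradePods := NewStrategy().GetNextUpgradePods(control, pods)
2. Update the containers in place: func updatePodSidecarContainer(control sidecarcontrol.SidecarControl, pod *corev1.Pod)
  1. Volume sharing and environment variable sharing
    1. Collect all shared volumes (spec.containers[i].shareVolumePolicy.type = enabled)
    2. Collect the shared environment variables
    3. Merge the volumes and env of the container being updated with the shared volumes and envs collected above
  2. Update the container information in the pod struct, including image, volumes, and envs
3. For hot-upgrade containers, also update the pod-level annotations to mark which container is the working one
4. Apply the pod update to k8s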
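
The three update conditions in the list above can be sketched as a single guard function. The types updateStrategy and status and the function shouldUpdate are hypothetical stand-ins for the corresponding fields of the SidecarSet spec and status; only the three comparisons themselves come from the text.

```go
package main

import "fmt"

// updateStrategy is a hypothetical subset of Spec.UpdateStrategy.
type updateStrategy struct {
	Type   string
	Paused bool
}

// status is a hypothetical subset of SidecarSetStatus.
type status struct {
	MatchedPods int32
	UpdatedPods int32
}

// shouldUpdate returns true only when all three conditions hold:
// the strategy is not NotUpdate, some pods are still outdated,
// and the rollout is not paused.
func shouldUpdate(s updateStrategy, st status) bool {
	if s.Type == "NotUpdate" {
		return false
	}
	if st.UpdatedPods >= st.MatchedPods {
		return false // every matched pod is already updated
	}
	if s.Paused {
		return false
	}
	return true
}

func main() {
	strategy := updateStrategy{Type: "RollingUpdate", Paused: false}
	fmt.Println(shouldUpdate(strategy, status{MatchedPods: 5, UpdatedPods: 2}))
	fmt.Println(shouldUpdate(strategy, status{MatchedPods: 5, UpdatedPods: 5}))
}
```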

Hot upgrade

apiVersion: apps.kruise.io/v1alpha1
kind: SidecarSet
metadata:
  name: hotupgrade-sidecarset
spec:
  selector:
    matchLabels:
      app: hotupgrade
  containers:
  - name: sidecar
    image: openkruise/hotupgrade-sample:sidecarv1
    imagePullPolicy: Always
    lifecycle:
      postStart:
        exec:
          command:
          - /bin/sh
          - /migrate.sh
    upgradeStrategy:
      upgradeType: HotUpgrade
      hotUpgradeEmptyImage: openkruise/hotupgrade-sample:empty

Detection: isSidecarSetHasHotUpgradeContainer(sidecarSet) (that is, sidecarContainer.UpgradeStrategy.UpgradeType == "HotUpgrade")
Reset the empty container
1. The empty container alternates in a ping-pong fashion, so the pod annotation "kruise.io/sidecarset-working-hotupgrade-container" is used to tell which container is currently the working container
2. Reset the empty container's image and annotations with flipPodSidecarContainer(control, pod)
Perform the hot upgrade: p.updatePods(control, pods) => updatePodSidecarAndHash(control, pod)
1. As in the regular upgrade flow, collect the envs and volumes
2. Note the call newContainer := control.UpgradeSidecarContainer(&sidecarContainer, pod): it generates the ping-pong name for the hot-upgrade container, which supports the alternating use shown in the figure above
3. Update the container information in the pod struct, then update the annotations that record which hot-upgrade container is the working one
  1. kruise.io/sidecarset-working-hotupgrade-container
  2. version.sidecarset.kruise.io/
  3. versionalt.sidecarset.kruise.io/newContainer.Name
  4. versionalt.sidecarset.kruise.io/oldSidecar/
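
The ping-pong alternation can be sketched as follows. This is only an illustration under assumptions: the "-1" and "-2" name suffixes and the helpers hotUpgradePair and nextWorking are hypothetical and are not taken from the text above; the real naming is decided inside control.UpgradeSidecarContainer.

```go
package main

import "fmt"

// hotUpgradePair returns the two container names that take turns being the
// working container. The "-1"/"-2" suffixes are an assumption made for this
// sketch.
func hotUpgradePair(sidecar string) (string, string) {
	return sidecar + "-1", sidecar + "-2"
}

// nextWorking returns the container that should receive the new image:
// whichever of the pair is not currently the working container.
func nextWorking(sidecar, working string) string {
	first, second := hotUpgradePair(sidecar)
	if working == first {
		return second
	}
	return first
}

func main() {
	working := "sidecar-1"
	for round := 1; round <= 3; round++ {
		target := nextWorking("sidecar", working)
		fmt.Printf("upgrade %d: new image goes into %s, %s is reset to the empty image\n",
			round, target, working)
		working = target
	}
}
```

Each upgrade writes the new image into the idle container and then flips the working marker, so traffic can be migrated before the old container is reset to the empty image.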

Update strategy

SidecarSet supports several update strategies. They all share the entry point func (p *spreadingStrategy) GetNextUpgradePods, which returns the pods that need to be updated.

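Filter all eligible pods

All pods are filtered by the following conditions:

The pod has not been updated yet: its revision differs from the latest revision of the SidecarSet
The pod matches the SidecarSet selector (note: if selector == nil, the pod is always treated as a match)
The sidecar and pod versions are compatible. Because k8s only allows the image to be updated in place, every field other than image must be identical between the pod and the corresponding sidecar. To check this, both the SidecarSet and the pod carry the annotation "kruise.io/sidecarset-hash-without-image", which records a hash of all fields except image. Comparing these two hash values is therefore sufficient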
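
The idea behind a hash that ignores the image can be sketched like this. The container type and the hashWithoutImage function are hypothetical, and SHA-256 over JSON is chosen only for illustration; the hash function Kruise actually uses is not stated above.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"encoding/json"
	"fmt"
)

// container is a hypothetical, heavily reduced container spec.
type container struct {
	Name  string   `json:"name"`
	Image string   `json:"image"`
	Args  []string `json:"args,omitempty"`
}

// hashWithoutImage hashes the container list with every Image field
// blanked, so two specs that differ only in image get the same hash.
func hashWithoutImage(containers []container) (string, error) {
	stripped := make([]container, len(containers))
	copy(stripped, containers)
	for i := range stripped {
		stripped[i].Image = ""
	}
	raw, err := json.Marshal(stripped)
	if err != nil {
		return "", err
	}
	sum := sha256.Sum256(raw)
	return hex.EncodeToString(sum[:]), nil
}

func main() {
	v1 := []container{{Name: "sidecar", Image: "sidecar:v1", Args: []string{"--port=80"}}}
	v2 := []container{{Name: "sidecar", Image: "sidecar:v2", Args: []string{"--port=80"}}}
	h1, _ := hashWithoutImage(v1)
	h2, _ := hashWithoutImage(v2)
	// Equal hashes mean only the image changed, so an in-place update is possible.
	fmt.Println(h1 == h2)
}
```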

Sort pods [rollout order control]

waitUpgradedIndexes = SortUpdateIndexes(strategy, pods, waitUpgradedIndexes)

First, sort by priority:

    //Sort Pods with default sequence
    //  - Unassigned < assigned
    //  - PodPending < PodUnknown < PodRunning
    //  - Not ready < ready
    //  - Been ready for empty time < less time < more time
    //  - Pods with containers with higher restart counts < lower restart counts
    //  - Empty creation time pods < newer pods < older pods
    sort.Slice(waitUpdateIndexes, sidecarcontrol.GetPodsSortFunc(pods, waitUpdateIndexes))
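
To show the shape of such an index-based comparator, here is a reduced sketch that implements only three of the rules listed in the comment (unassigned before assigned, not ready before ready, more restarts before fewer). The podInfo type and sortWaiting function are hypothetical; the real ordering comes from sidecarcontrol.GetPodsSortFunc.

```go
package main

import (
	"fmt"
	"sort"
)

// podInfo is a hypothetical, reduced view of a pod used for ordering.
type podInfo struct {
	Name     string
	Assigned bool
	Ready    bool
	Restarts int
}

// sortWaiting orders the indexes in idx so that pods that are cheaper to
// disturb come first. Only the index slice is reordered; pods is untouched.
func sortWaiting(pods []podInfo, idx []int) {
	sort.SliceStable(idx, func(a, b int) bool {
		x, y := pods[idx[a]], pods[idx[b]]
		if x.Assigned != y.Assigned {
			return !x.Assigned // unassigned < assigned
		}
		if x.Ready != y.Ready {
			return !x.Ready // not ready < ready
		}
		return x.Restarts > y.Restarts // higher restart count first
	})
}

func main() {
	pods := []podInfo{
		{Name: "healthy", Assigned: true, Ready: true},
		{Name: "crashing", Assigned: true, Ready: false, Restarts: 7},
		{Name: "pending", Assigned: false},
	}
	idx := []int{0, 1, 2}
	sortWaiting(pods, idx)
	for _, i := range idx {
		fmt.Println(pods[i].Name)
	}
}
```

Sorting indexes instead of the pods themselves matches the waitUpdateIndexes pattern in the snippet above: later stages can keep referring to pods by position.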

Then scatter the pods according to the scatter strategy:

scatter := parseUpdateScatterTerms(strategy.ScatterStrategy, pods)
waitUpdateIndexes = updatesort.NewScatterSorter(scatter).Sort(pods, waitUpdateIndexes)

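Compute the number of pods to update in this round

The number is derived from the batched-rollout (partition) and max-unavailable rules: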

needToUpgradeCount := calculateUpgradeCount(control, waitUpgradedIndexes, pods)
if needToUpgradeCount < len(waitUpgradedIndexes) {
  waitUpgradedIndexes = waitUpgradedIndexes[:needToUpgradeCount]
}
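
A simplified way to think about this calculation is sketched below. It is an approximation under assumptions, not the body of calculateUpgradeCount: the function upgradeCount and its parameters are hypothetical, partition is assumed to mean the number of pods that should stay on the old version, and pods that are already unavailable are assumed to consume the max-unavailable budget.

```go
package main

import "fmt"

// upgradeCount returns how many waiting pods may be upgraded in this round.
//   waiting:        pods still on an old revision
//   partition:      pods that should remain on the old revision (assumption)
//   maxUnavailable: upper bound on simultaneously unavailable pods
//   unavailable:    pods that are unavailable right now
func upgradeCount(waiting, partition, maxUnavailable, unavailable int) int {
	byPartition := waiting - partition
	byBudget := maxUnavailable - unavailable
	n := byPartition
	if byBudget < n {
		n = byBudget
	}
	if n < 0 {
		n = 0 // nothing may be upgraded this round
	}
	return n
}

func main() {
	// 10 outdated pods, keep 4 on the old version, at most 3 unavailable,
	// 1 pod is already unavailable: the budget allows 2 more.
	fmt.Println(upgradeCount(10, 4, 3, 1))
}
```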

Return the result

The pods referenced by waitUpgradedIndexes are returned as the result.

Container Launch Priority

The ContainerLaunchPriority feature requires the PodWebhook feature gate to be enabled (it is enabled by default unless explicitly turned off).

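Priority definition

Entry function: containerLaunchPriorityInitialization

First, check whether the pod carries the annotation apps.kruise.io/container-launch-priority set to ordered. If it does, a descending priority array is generated whose maximum value is 0
If a container defines the environment variable KRUISE_CONTAINER_PRIORITY, all values in the priority array are first reset to 0 and that container's priority is set to the value of the variable: priority, priorityFlag, err := h.getPriority(pod)
At this point we have a priority array in which a larger number means a higher priority. Based on this array, a KRUISE_CONTAINER_BARRIER environment variable is injected into each container to describe its priority: h.setPodEnv(priority, pod). The naming rule is: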

- name: KRUISE_CONTAINER_BARRIER
  valueFrom:
    configMapKeyRef:
      name: {pod-name}-barrier
      key: "p_0"
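
The ordered case and the naming rule can be sketched together. The helpers orderedPriorities, barrierKey, and barrierName are hypothetical; the "p_<priority>" key format is inferred from the p_0 example above, and assigning priorities in container declaration order is an assumption.

```go
package main

import "fmt"

// orderedPriorities returns a descending priority array with maximum 0:
// the first container gets 0, the next -1, and so on (assumed order).
func orderedPriorities(n int) []int {
	priorities := make([]int, n)
	for i := range priorities {
		priorities[i] = -i
	}
	return priorities
}

// barrierKey builds the ConfigMap key referenced by KRUISE_CONTAINER_BARRIER.
func barrierKey(priority int) string {
	return fmt.Sprintf("p_%d", priority)
}

// barrierName builds the name of the per-pod barrier ConfigMap.
func barrierName(podName string) string {
	return podName + "-barrier"
}

func main() {
	containers := []string{"sidecar", "app", "logger"}
	for i, p := range orderedPriorities(len(containers)) {
		fmt.Printf("%s -> configMap %s, key %s\n",
			containers[i], barrierName("demo-pod"), barrierKey(p))
	}
}
```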

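Priority-based launch

Entry function: func (r *ReconcileContainerLaunchPriority) Reconcile(_ context.Context, request reconcile.Request) (res reconcile.Result, err error)

Kruise checks whether the ConfigMap {pod.Name}-barrier exists and tries to create it if it does not
When first created, the ConfigMap is empty, so every container reports CreateContainerConfigError because the referenced key cannot be found
Kruise then uses the key written into each container during the "Priority definition" step to decide which container to start next (functions findNextPatchKey ==> getLaunchPriority)
1. The logic iterates over the KRUISE_CONTAINER_BARRIER key of every container that is not yet ready, takes the trailing number (for example, p_0 yields 0), and finds the current maximum
With the highest priority number among the unready containers in hand, Kruise builds a key-value pair of the form p_0:true and patches it into the ConfigMap
Once the corresponding container's environment variable can read its key from the ConfigMap, its state transitions to ready and creation begins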
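
The selection step can be sketched as follows. The containerState type and the functions parsePriority and nextPatchKey are hypothetical stand-ins for findNextPatchKey and getLaunchPriority; only the rule itself (take the trailing number of each unready container's key and pick the largest) comes from the description above.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// containerState is a hypothetical, reduced view of one container.
type containerState struct {
	Name       string
	BarrierKey string // key of KRUISE_CONTAINER_BARRIER, e.g. "p_0"
	Ready      bool
}

// parsePriority extracts the integer after the last underscore of the key.
func parsePriority(key string) (int, error) {
	i := strings.LastIndex(key, "_")
	if i < 0 {
		return 0, fmt.Errorf("barrier key %q has no priority suffix", key)
	}
	return strconv.Atoi(key[i+1:])
}

// nextPatchKey returns the barrier key with the highest priority among the
// containers that are not ready yet. ok is false when nothing is left.
func nextPatchKey(containers []containerState) (key string, ok bool) {
	best := 0
	for _, c := range containers {
		if c.Ready || c.BarrierKey == "" {
			continue
		}
		p, err := parsePriority(c.BarrierKey)
		if err != nil {
			continue // skip malformed keys in this sketch
		}
		if !ok || p > best {
			best, key, ok = p, c.BarrierKey, true
		}
	}
	return key, ok
}

func main() {
	containers := []containerState{
		{Name: "sidecar", BarrierKey: "p_0", Ready: true},
		{Name: "app", BarrierKey: "p_-1"},
		{Name: "logger", BarrierKey: "p_-2"},
	}
	if key, ok := nextPatchKey(containers); ok {
		fmt.Printf("patch %s:true into the barrier ConfigMap\n", key)
	}
}
```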
