04. How kube-controller-manager processes data when creating a Deployment


Overview

When learning Kubernetes, it is not enough to understand the concepts; we also need to dig into how the whole system is actually implemented, and only once we can read its source code can we say we really know it. In this series I walk through how Kubernetes turns a Deployment into running resources, explaining the overall implementation mechanism alongside the relevant source code.

Previous articles

As covered in earlier articles, creating a Deployment starts with kubectl sending a request to the apiserver, and the apiserver persists the resource into etcd. The component that actually acts on the resource is the controller-manager. From the previous analysis we know that at startup it launches the various controllers and creates the corresponding informers, and the informers watch for changes. But once a change is observed, how is it handled? This article explains how the controller-manager processes a newly created Deployment resource.

Preface

Basic introduction to controller-manager

In the controller pattern, every operation on an object triggers an event, and the controller then runs a syncLoop. The controller relies on informers to watch for events and perform ListWatch operations; the informer basics were covered in the previous article, kube-controller-manager处理创建deployment资源之informer监听处理. Let's first review the structure of the controller-manager.

In essence, a Deployment controls a ReplicaSet, the ReplicaSet controls Pods, and the controllers drive each object toward its desired state.

(Figure: the structure of controller-manager — informer, workqueue, and worker)

The whole creation process involves:

  • Deployment-controller-manager
  • Replicaset-controller-manager

Processing the data

What we created is a Deployment resource, so the question is:

  1. How does the Deployment resource ultimately end up producing Pod resources?

Let's start with the deployment-controller-manager.

deployment-controller-manager

Let's go straight to the addDeployment method in the source code.

// The deployment controller's handler for Deployment add events
func (dc *DeploymentController) addDeployment(obj interface{}) {
	d := obj.(*apps.Deployment)
	klog.V(4).InfoS("Adding deployment", "deployment", klog.KObj(d))
	dc.enqueueDeployment(d)
}

// Add the event's key to the workqueue
func (dc *DeploymentController) enqueue(deployment *apps.Deployment) {
	key, err := controller.KeyFunc(deployment)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("couldn't get key for object %#v: %v", deployment, err))
		return
	}
  // See the DeploymentController struct below: the queue here is a workqueue
	dc.queue.Add(key)
}
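
Where is addDeployment registered? It is hooked up as an informer event handler when the controller is constructed. Below is a trimmed sketch of NewDeploymentController (Kubernetes/pkg/controller/deployment/deployment_controller.go) with error handling and most fields omitted, so treat it as an outline rather than the full function:

// Simplified sketch of NewDeploymentController: register the event handlers
// that feed the workqueue. Error handling and most fields are omitted.
func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
	dc := &DeploymentController{
		client: client,
		queue:  workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
	}
	...
	dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addDeployment, // the method we just looked at
		UpdateFunc: dc.updateDeployment,
		DeleteFunc: dc.deleteDeployment,
	})
	rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		AddFunc:    dc.addReplicaSet,
		UpdateFunc: dc.updateReplicaSet,
		DeleteFunc: dc.deleteReplicaSet,
	})
	podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
		DeleteFunc: dc.deletePod,
	})
	...
	return dc, nil
}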

Compare this with the structure of DeploymentController:

// DeploymentController is responsible for synchronizing Deployment objects stored
// in the system with actual running replica sets and pods.
type DeploymentController struct {
	// rsControl is used for adopting/releasing replica sets.
	rsControl     controller.RSControlInterface
	client        clientset.Interface
	eventRecorder record.EventRecorder

	// To allow injection of syncDeployment for testing.
	syncHandler func(dKey string) error
	// used for unit testing
	enqueueDeployment func(deployment *apps.Deployment)

	// dLister can list/get deployments from the shared informer's store
	dLister appslisters.DeploymentLister
	// rsLister can list/get replica sets from the shared informer's store
	rsLister appslisters.ReplicaSetLister
	// podLister can list/get pods from the shared informer's store
	podLister corelisters.PodLister

	// dListerSynced returns true if the Deployment store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	dListerSynced cache.InformerSynced
	// rsListerSynced returns true if the ReplicaSet store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	rsListerSynced cache.InformerSynced
	// podListerSynced returns true if the pod store has been synced at least once.
	// Added as a member to the struct to allow injection for testing.
	podListerSynced cache.InformerSynced

	// Deployments that need to be synced
  // This is the workqueue referred to in the analysis above
	queue workqueue.RateLimitingInterface
}

So what we are using is a workqueue, which matches the diagram shown earlier. In practice the controller-manager pushes the event onto the workqueue, but what happens once it is there?

// Run begins watching and syncing.
// The key method that runs the DeploymentController
func (dc *DeploymentController) Run(workers int, stopCh <-chan struct{}) {
	defer utilruntime.HandleCrash()
	defer dc.queue.ShutDown()

	klog.InfoS("Starting controller", "controller", "deployment")
	defer klog.InfoS("Shutting down controller", "controller", "deployment")
  
  // 1. Wait for the informer caches to sync
	if !cache.WaitForNamedCacheSync("deployment", stopCh, dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) {
		return
	}
  // 2. Start workers goroutines (5 by default) that run the worker method
	for i := 0; i < workers; i++ {
		go wait.Until(dc.worker, time.Second, stopCh)
	}

	<-stopCh
}

// 3. The worker method keeps calling processNextWorkItem
// worker runs a worker thread that just dequeues items, processes them, and marks them done.
// It enforces that the syncHandler is never invoked concurrently with the same key.
func (dc *DeploymentController) worker() {
	for dc.processNextWorkItem() {
	}
}

func (dc *DeploymentController) processNextWorkItem() bool {
  // 4. Take a key off the queue
	key, quit := dc.queue.Get()
	if quit {
		return false
	}
	defer dc.queue.Done(key)
  // 5. Run the sync handler
	err := dc.syncHandler(key.(string))
	dc.handleErr(err, key)

	return true
}

As we can see, once the deployment-controller-manager is running it keeps consuming the workqueue. The real work is done by syncDeployment, which was wired in as the sync handler when the controller was constructed.
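That wiring, also in NewDeploymentController, looks roughly like this (trimmed):

	// Inside NewDeploymentController: syncDeployment is the sync handler and
	// enqueue is the enqueueDeployment function used by the event handlers.
	dc.syncHandler = dc.syncDeployment
	dc.enqueueDeployment = dc.enqueue

Now let's look at syncDeployment itself.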

func (dc *DeploymentController) syncDeployment(key string) error {
  // Parse the key to get the namespace and name
	namespace, name, err := cache.SplitMetaNamespaceKey(key)
  ...
	// Fetch the Deployment from the informer's indexer
	deployment, err := dc.dLister.Deployments(namespace).Get(name)
  ...
  // Work on a deep copy of the deployment
	d := deployment.DeepCopy()
  // Warn if the deployment selects all pods (an empty selector)
	everything := metav1.LabelSelector{}
	if reflect.DeepEqual(d.Spec.Selector, &everything) {
		dc.eventRecorder.Eventf(d, v1.EventTypeWarning, "SelectingAll", "This deployment is selecting all pods. A non-empty selector is required.")
		if d.Status.ObservedGeneration < d.Generation {
			d.Status.ObservedGeneration = d.Generation
			dc.client.AppsV1().Deployments(d.Namespace).UpdateStatus(context.TODO(), d, metav1.UpdateOptions{})
		}
		return nil
	}

	// List ReplicaSets owned by this Deployment, while reconciling ControllerRef
	// through adoption/orphaning.
  // Get the ReplicaSets owned by this Deployment
	rsList, err := dc.getReplicaSetsForDeployment(d)
	if err != nil {
		return err
	}
	// List all Pods owned by this Deployment, grouped by their ReplicaSet.
	// Current uses of the podMap are:
	//
	// * check if a Pod is labeled correctly with the pod-template-hash label.
	// * check that no old Pods are running in the middle of Recreate Deployments.
  // Get the Pods owned by this Deployment, grouped by ReplicaSet
	podMap, err := dc.getPodMapForDeployment(d, rsList)
	if err != nil {
		return err
	}
  // Check whether this is a scaling event
  scalingEvent, err := dc.isScalingEvent(d, rsList)
  if err != nil {
		return err
	}
  // If so, scale the replica sets accordingly
	if scalingEvent {
		return dc.sync(d, rsList)
	}
  // Dispatch on the deployment's rollout strategy; the actual creation logic lives in these branches
	switch d.Spec.Strategy.Type {
	case apps.RecreateDeploymentStrategyType:
		return dc.rolloutRecreate(d, rsList, podMap)
  // RollingUpdate is the default strategy
	case apps.RollingUpdateDeploymentStrategyType:
		return dc.rolloutRolling(d, rsList)
	}
	return fmt.Errorf("unexpected deployment strategy type: %s", d.Spec.Strategy.Type)
}

Tracing the call chain down from dc.sync(d, rsList), we arrive at the code that updates the ReplicaSet.

Kubernetes/pkg/controller/deployment/sync.go

func (dc *DeploymentController) scaleReplicaSet(rs *apps.ReplicaSet, newScale int32, deployment *apps.Deployment, scalingOperation string) (bool, *apps.ReplicaSet, error) {

	sizeNeedsUpdate := *(rs.Spec.Replicas) != newScale

	annotationsNeedUpdate := deploymentutil.ReplicasAnnotationsNeedUpdate(rs, *(deployment.Spec.Replicas), *(deployment.Spec.Replicas)+deploymentutil.MaxSurge(*deployment))

	scaled := false
	var err error
	if sizeNeedsUpdate || annotationsNeedUpdate {
		rsCopy := rs.DeepCopy()
		*(rsCopy.Spec.Replicas) = newScale
		deploymentutil.SetReplicasAnnotations(rsCopy, *(deployment.Spec.Replicas), *(deployment.Spec.Replicas)+deploymentutil.MaxSurge(*deployment))
    // Finally, update the ReplicaSet
		rs, err = dc.client.AppsV1().ReplicaSets(rsCopy.Namespace).Update(context.TODO(), rsCopy, metav1.UpdateOptions{})
		if err == nil && sizeNeedsUpdate {
			scaled = true
			dc.eventRecorder.Eventf(deployment, v1.EventTypeNormal, "ScalingReplicaSet", "Scaled %s replica set %s to %d", scalingOperation, rs.Name, newScale)
		}
	}
	return scaled, rs, err
}

From the source we can see that the Deployment checks whether its ReplicaSets exist, which brings us one small step closer to the eventual Pod creation. getReplicaSetsForDeployment is what establishes the relationship between the two.

func (dc *DeploymentController) getReplicaSetsForDeployment(d *apps.Deployment) ([]*apps.ReplicaSet, error) {
	...
  // Associate the Deployment with its ReplicaSets
	cm := controller.NewReplicaSetControllerRefManager(dc.rsControl, d, deploymentSelector, controllerKind, canAdoptFunc)
	return cm.ClaimReplicaSets(rsList)
}

In the end the rolling-update strategy is used and the ReplicaSet resource is updated. During a rolling update the controller keeps updating the objects, checking Pod status, and updating that status.

// rolloutRolling implements the logic for rolling a new replica set.
func (dc *DeploymentController) rolloutRolling(d *apps.Deployment, rsList []*apps.ReplicaSet) error {
	newRS, oldRSs, err := dc.getAllReplicaSetsAndSyncRevision(d, rsList, true)
	if err != nil {
		return err
	}
	allRSs := append(oldRSs, newRS)

	// Scale up, if we can.
	scaledUp, err := dc.reconcileNewReplicaSet(allRSs, newRS, d)
	if err != nil {
		return err
	}
	if scaledUp {
		// Update DeploymentStatus
		return dc.syncRolloutStatus(allRSs, newRS, d)
	}

	// Scale down, if we can.
	scaledDown, err := dc.reconcileOldReplicaSets(allRSs, controller.FilterActiveReplicaSets(oldRSs), newRS, d)
	if err != nil {
		return err
	}
	if scaledDown {
		// Update DeploymentStatus
		return dc.syncRolloutStatus(allRSs, newRS, d)
	}

	if deploymentutil.DeploymentComplete(d, &d.Status) {
		if err := dc.cleanupDeployment(oldRSs, d); err != nil {
			return err
		}
	}
	// Sync deployment status
	return dc.syncRolloutStatus(allRSs, newRS, d)
}

To briefly summarize, the processing logic of the deployment-controller-manager is:

  1. Call getReplicaSetsForDeployment to get the ReplicaSets in the cluster that relate to this Deployment; ReplicaSets that match but are not yet owned by the Deployment are adopted by setting their ownerReferences field (a sketch of that reference follows this list), and ReplicaSets that are owned but no longer match have the ownerReference removed;
  2. Call getPodMapForDeployment to get the Pods associated with the Deployment, grouped by rs.UID;
  3. Check the Deployment's DeletionTimestamp field to see whether this is a delete operation;
  4. Run checkPausedConditions to check whether the Deployment is paused and add the appropriate condition;
  5. Call getRollbackTo to check whether the Deployment carries the annotation "deprecated.deployment.rollback.to"; if so, call dc.rollback to perform the rollback;
  6. Call dc.isScalingEvent to check whether the Deployment is in the middle of scaling;
  7. Finally, check whether this is an update and execute the corresponding logic according to the update strategy, Recreate or RollingUpdate;
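
As a small illustration of step 1, the controller reference used during adoption is built with metav1.NewControllerRef; what ends up in the ReplicaSet's metadata.ownerReferences looks roughly like this (a sketch, not the exact adoption code path):

	// controllerKind is defined in the deployment controller as
	// apps.SchemeGroupVersion.WithKind("Deployment").
	controllerRef := metav1.NewControllerRef(d, controllerKind)
	// controllerRef is roughly:
	//   apiVersion:         apps/v1
	//   kind:               Deployment
	//   name:               d.Name
	//   uid:                d.UID
	//   controller:         true
	//   blockOwnerDeletion: true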

RealRSControl

In the deployment-controller-manager, what we see being used is RealRSControl — not the replicaset-controller-manager. So what does it actually do? Let's look directly at the addReplicaSet function.

// addReplicaSet enqueues the deployment that manages a ReplicaSet when the ReplicaSet is created.
func (dc *DeploymentController) addReplicaSet(obj interface{}) {
	rs := obj.(*apps.ReplicaSet)

	if rs.DeletionTimestamp != nil {
		// On a restart of the controller manager, it's possible for an object to
		// show up in a state that is already pending deletion.
		dc.deleteReplicaSet(rs)
		return
	}

	// If it has a ControllerRef, that's all that matters.
	if controllerRef := metav1.GetControllerOf(rs); controllerRef != nil {
		d := dc.resolveControllerRef(rs.Namespace, controllerRef)
		if d == nil {
			return
		}
		klog.V(4).InfoS("ReplicaSet added", "replicaSet", klog.KObj(rs))
		dc.enqueueDeployment(d)
		return
	}

	// Otherwise, it's an orphan. Get a list of all matching Deployments and sync
	// them to see if anyone wants to adopt it.
	ds := dc.getDeploymentsForReplicaSet(rs)
	if len(ds) == 0 {
		return
	}
	klog.V(4).InfoS("Orphan ReplicaSet added", "replicaSet", klog.KObj(rs))
	for _, d := range ds {
		dc.enqueueDeployment(d)
	}
}

Here, too, the event is pushed onto the workqueue, which means every event related to the Deployment is observed and responded to, mostly by updating its status. I won't go into further detail here.
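
As for what RealRSControl itself does: it is only a thin wrapper the deployment controller uses to patch ReplicaSets when adopting or releasing them (the rsControl field in the struct shown earlier). A simplified sketch based on Kubernetes/pkg/controller/controller_utils.go:

// RSControlInterface is what the deployment controller uses to patch ReplicaSets
// (e.g. to set or remove ownerReferences during adopt/release).
type RSControlInterface interface {
	PatchReplicaSet(namespace, name string, data []byte) error
}

// RealRSControl is the default implementation, backed by the real clientset.
type RealRSControl struct {
	KubeClient clientset.Interface
	Recorder   record.EventRecorder
}

func (r RealRSControl) PatchReplicaSet(namespace, name string, data []byte) error {
	_, err := r.KubeClient.AppsV1().ReplicaSets(namespace).Patch(context.TODO(), name, types.StrategicMergePatchType, data, metav1.PatchOptions{})
	return err
}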

replicaset-controller-manager

We now know the Deployment produced an update event for the ReplicaSet, so how does the replicaset-controller-manager handle it? The ReplicaSet controller's own plumbing is very similar to the Deployment controller's, so let's focus on the sync logic that runs after an add-ReplicaSet event is observed.

Kubernetes/pkg/controller/replicaset/replica_set.go

func (rsc *ReplicaSetController) syncReplicaSet(key string) error {
	startTime := time.Now()
	defer func() {
		klog.V(4).Infof("Finished syncing %v %q (%v)", rsc.Kind, key, time.Since(startTime))
	}()

	namespace, name, err := cache.SplitMetaNamespaceKey(key)
	if err != nil {
		return err
	}
  // 1. Get the rs object from the informer cache by ns/name
  // If the rs has already been deleted, just delete its entry from expectations
	rs, err := rsc.rsLister.ReplicaSets(namespace).Get(name)
	if apierrors.IsNotFound(err) {
		klog.V(4).Infof("%v %v has been deleted", rsc.Kind, key)
		rsc.expectations.DeleteExpectations(key)
		return nil
	}
	if err != nil {
		return err
	}
  // 2. Decide whether this rs needs a real sync
	rsNeedsSync := rsc.expectations.SatisfiedExpectations(key)
	selector, err := metav1.LabelSelectorAsSelector(rs.Spec.Selector)
	if err != nil {
		utilruntime.HandleError(fmt.Errorf("error converting pod selector to selector: %v", err))
		return nil
	}
	// list all pods to include the pods that don't match the rs`s selector
	// anymore but has the stale controller ref.
	// TODO: Do the List and Filter in a single pass, or use an index.
  // 3. List all pods
	allPods, err := rsc.podLister.Pods(rs.Namespace).List(labels.Everything())
	if err != nil {
		return err
	}
	// Ignore inactive pods.
  // 4. Filter out inactive pods; pods being deleted or in failed state are not active
	filteredPods := controller.FilterActivePods(allPods)

	// NOTE: filteredPods are pointing to objects from cache - if you need to
	// modify them, you need to copy it first.
  // 5. Check every pod, adopting or releasing as needed, and end up with the pod list owned by this rs
	filteredPods, err = rsc.claimPods(rs, selector, filteredPods)
	if err != nil {
		return err
	}

  // 6. If a sync is needed, run manageReplicas to create/delete pods
	var manageReplicasErr error
	if rsNeedsSync && rs.DeletionTimestamp == nil {
		manageReplicasErr = rsc.manageReplicas(filteredPods, rs)
	}
	rs = rs.DeepCopy()
  // 7. Calculate the rs's current status
	newStatus := calculateStatus(rs, filteredPods, manageReplicasErr)

	// Always updates status as pods come up or die.
  // 8. Update the rs status
	updatedRS, err := updateReplicaSetStatus(rsc.kubeClient.AppsV1().ReplicaSets(rs.Namespace), rs, newStatus)
	if err != nil {
		// Multiple things could lead to this update failing. Requeuing the replica set ensures
		// Returning an error causes a requeue without forcing a hotloop
		return err
	}
	// Resync the ReplicaSet after MinReadySeconds as a last line of defense to guard against clock-skew.
  // 9. Decide whether to re-enqueue the rs into the delay queue
	if manageReplicasErr == nil && updatedRS.Spec.MinReadySeconds > 0 &&
		updatedRS.Status.ReadyReplicas == *(updatedRS.Spec.Replicas) &&
		updatedRS.Status.AvailableReplicas != *(updatedRS.Spec.Replicas) {
		rsc.queue.AddAfter(key, time.Duration(updatedRS.Spec.MinReadySeconds)*time.Second)
	}
	return manageReplicasErr
}

// The key method for creating/deleting pods
func (rsc *ReplicaSetController) manageReplicas(filteredPods []*v1.Pod, rs *apps.ReplicaSet) error {
	  ...
    // This call is where the pods get created
		successfulCreations, err := slowStartBatch(diff, controller.SlowStartInitialBatchSize, func() error {
			err := rsc.podControl.CreatePods(rs.Namespace, &rs.Spec.Template, rs, metav1.NewControllerRef(rs, rsc.GroupVersionKind))
			if err != nil {
				if apierrors.HasStatusCause(err, v1.NamespaceTerminatingCause) {
					// if the namespace is being terminated, we don't have to do
					// anything because any creation will fail
					return nil
				}
			}
			return err
		})
  ...
}

syncReplicaSet is the controller's core method; it drives the objects the controller owns toward their desired state. Its main logic is:

  1. Get the rs object by ns/name;
  2. Call expectations.SatisfiedExpectations to decide whether a real sync needs to run;
  3. List all pods;
  4. Filter the pods by label to get the list associated with this rs; orphan pods whose labels match the rs are adopted, and owned pods whose labels no longer match are released;
  5. Call manageReplicas to reconcile the pods, i.e. add/delete pods;
  6. Calculate the rs's current status and update it;
  7. If the rs has the MinReadySeconds field set, add the rs to the delay queue;

manageReplicas is the key call inside syncReplicaSet. It calculates how many pods the replicaSet needs to create or delete and calls the apiserver accordingly; at this stage it only issues the API calls and does not guarantee the pods actually run. Pod creation happens in slow-start batches, and if a batch fails to create all of its pods, the remaining pods are not created in that sync. A single sync can create or delete at most 500 pods; anything above that limit is handled in the next syncLoop.
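
The batching behavior comes from slowStartBatch. A lightly trimmed version (from Kubernetes/pkg/controller/replicaset/replica_set.go) shows the idea: start with a small batch, double it while everything succeeds, and stop as soon as a batch reports an error:

// slowStartBatch calls fn a total of count times, in batches that start at
// initialBatchSize and double after each fully successful batch. If any call
// in a batch fails, the remaining batches are skipped.
func slowStartBatch(count int, initialBatchSize int, fn func() error) (int, error) {
	remaining := count
	successes := 0
	for batchSize := integer.IntMin(remaining, initialBatchSize); batchSize > 0; batchSize = integer.IntMin(2*batchSize, remaining) {
		errCh := make(chan error, batchSize)
		var wg sync.WaitGroup
		wg.Add(batchSize)
		for i := 0; i < batchSize; i++ {
			go func() {
				defer wg.Done()
				if err := fn(); err != nil {
					errCh <- err
				}
			}()
		}
		wg.Wait()
		curSuccesses := batchSize - len(errCh)
		successes += curSuccesses
		if len(errCh) > 0 {
			// A failure in this batch: report the successes so far and stop.
			return successes, <-errCh
		}
		remaining -= batchSize
	}
	return successes, nil
}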

Because one rs maps to many pods, failures are relatively likely, so pod creation needs extra care. For this, rs introduces an expectations mechanism: before manageReplicas runs, rsc.expectations.SatisfiedExpectations(key) decides whether the sync should proceed at all.

// SatisfiedExpectations returns true if the required adds/dels for the given controller have been observed.
// Add/del counts are established by the controller at sync time, and updated as controllees are observed by the controller
// manager.
// The function called from the syncReplicaSet method
func (r *ControllerExpectations) SatisfiedExpectations(controllerKey string) bool {
  // 1. If the key exists, check whether its expectations are fulfilled or have expired
	if exp, exists, err := r.GetExpectations(controllerKey); exists {
    // 3. If add <= 0 and del <= 0, the locally observed state has reached the desired state
		if exp.Fulfilled() {
			klog.V(4).Infof("Controller expectations fulfilled %#v", exp)
			return true
    // 4. Check whether the entry has expired; ExpectationsTimeout defaults to 5*time.Minute
		} else if exp.isExpired() {
			klog.V(4).Infof("Controller expectations expired %#v", exp)
			return true
		} else {
			klog.V(4).Infof("Controller still waiting on expectations %#v", exp)
			return false
		}
	} else if err != nil {
		klog.V(2).Infof("Error encountered while checking expectations %#v, forcing sync", err)
	} else {
    // 2. The rs may be newly created, so a sync is needed
		// When a new controller is created, it doesn't have expectations.
		// When it doesn't see expected watch events for > TTL, the expectations expire.
		//	- In this case it wakes up, creates/deletes controllees, and sets expectations again.
		// When it has satisfied expectations and no controllees need to be created/destroyed > TTL, the expectations expire.
		//	- In this case it continues without setting expectations till it needs to create/delete controllees.
		klog.V(4).Infof("Controller %v either never recorded expectations, or the ttl expired.", controllerKey)
	}
	// Trigger a sync if we either encountered and error (which shouldn't happen since we're
	// getting from local store) or this controller hasn't established expectations.
	return true
}

The whole call flow can be sketched roughly as follows.

create rs
  --> syncReplicaSet
  --> SatisfiedExpectations    (rsKey is not yet in expectations, so rsNeedsSync is true)
  --> manageReplicas           (work out the add/del pod diff, create an expectations object
                                and set its add/del counts, then call slowStartBatch to create
                                pods in batches / delete the surplus pods, and update the
                                expectations object as results come in)
  --> updateReplicaSetStatus   (update the rs status subresource)

The expectations mechanism

We mentioned expectations in syncReplicaSet earlier, but what exactly is it for?

Besides the informer cache, rs keeps another local cache: expectations. It records, for every rs, how many pods still need to be added or deleted. If both counts are 0, the rs has created or deleted all the pods it expected; otherwise some create/delete operation failed in a previous syncLoop, and the controller waits for the expectations to be satisfied or to expire before syncing that rs again.
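
How those counts get set can be seen in manageReplicas: before issuing the creations it records how many it is about to attempt, and afterwards it lowers the count for any pods it never even tried to create. A trimmed sketch of the creation branch (simplified from the manageReplicas excerpt above):

	// diff here is the number of pods that still need to be created.
	rsc.expectations.ExpectCreations(rsKey, diff)
	successfulCreations, err := slowStartBatch(diff, controller.SlowStartInitialBatchSize, func() error {
		return rsc.podControl.CreatePods(rs.Namespace, &rs.Spec.Template, rs, metav1.NewControllerRef(rs, rsc.GroupVersionKind))
	})
	// Pods we never attempted must not stay "expected", otherwise this rs would
	// not sync again until the expectations expire (5 minutes by default).
	if skippedPods := diff - successfulCreations; skippedPods > 0 {
		for i := 0; i < skippedPods; i++ {
			rsc.expectations.CreationObserved(rsKey)
		}
	}
	return err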

But does every update event really require a sync? For updates to anything other than rs.Spec.Replicas, a sync is not actually necessary, because changes to other spec fields or to status do not require creating or deleting pods.

The conditions that trigger a sync of a replicaSet object are:

  1. rs-related events: AddRS, UpdateRS, DeleteRS;
  2. pod-related events: AddPod, UpdatePod, DeletePod;
  3. synchronization of the informer's two-level cache;

Let's look at the source.

type UIDTrackingControllerExpectations struct {
	ControllerExpectationsInterface
	// TODO: There is a much nicer way to do this that involves a single store,
	// a lock per entry, and a ControlleeExpectationsInterface type.
	uidStoreLock sync.Mutex
	// Store used for the UIDs associated with any expectation tracked via the
	// ControllerExpectationsInterface.
	uidStore cache.Store
}

We can see that expectations is essentially just another local cache. Taking AddPod as an example, let's see how expectations handles it.

// When a pod is created, enqueue the replica set that manages it and update its expectations.
func (rsc *ReplicaSetController) addPod(obj interface{}) {
	...
	// If it has a ControllerRef, that's all that matters.
	if controllerRef := metav1.GetControllerOf(pod); controllerRef != nil {
    ...
    // Update the expectations for rsKey, decrementing its add count by 1
		rsc.expectations.CreationObserved(rsKey)
		rsc.enqueueReplicaSet(rs)
		return
	}
  ...
}

// CreationObserved atomically decrements the `add` expectation count of the given controller.
func (r *ControllerExpectations) CreationObserved(controllerKey string) {
	r.LowerExpectations(controllerKey, 1, 0)
}

// Decrements the expectation counts of the given controller.
func (r *ControllerExpectations) LowerExpectations(controllerKey string, add, del int) {
	if exp, exists, err := r.GetExpectations(controllerKey); err == nil && exists {
		exp.Add(int64(-add), int64(-del))
		// The expectations might've been modified since the update on the previous line.
		klog.V(4).Infof("Lowered expectations %#v", exp)
	}
}

As the code above shows, before a sync really starts, the expectations mechanism decides whether the sync should run at all, because the event handlers also update the expectations counts: addPod calls rsc.expectations.CreationObserved to decrement the add count for rsKey, and deletePod calls rsc.expectations.DeletionObserved to decrement the del count. By the time the sync runs, it only proceeds if the controllerKey (name or ns/name) satisfies the expectations; updatePod does not touch expectations at all. In other words, a sync is only triggered when pods actually need to be created or deleted, and the purpose of the expectations mechanism is to cut down on unnecessary sync operations.
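
For completeness, the delete side is symmetrical: deletePod lowers the del count (recording the exact pod key, since the controller uses the UID-tracking variant) and re-enqueues the rs. A trimmed sketch:

// When a pod is deleted, lower the del expectation of the rs that manages it
// and re-enqueue the rs so that syncReplicaSet runs again.
func (rsc *ReplicaSetController) deletePod(obj interface{}) {
	...
	rsc.expectations.DeletionObserved(rsKey, controller.PodKey(pod))
	...
}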

Summary

  1. Creating a Deployment first goes through the deployment-controller-manager's core method syncDeployment, which creates the ReplicaSet; the replicaSet-controller-manager's core method syncReplicaSet then creates the corresponding pods.
  2. The controller-manager watches informer events and adds them to a workqueue; workers then take the events off the queue and run the syncHandler.
  3. The rs control object used inside the deployment-controller-manager is RealRSControl.
  4. The ReplicaSet-controller-manager introduces the expectations mechanism, whose purpose is to reduce unnecessary sync operations.

Finally, two questions to figure out later:

  1. Does a pod-controller-manager exist inside kube-controller-manager?
  2. How is the state of a pod controlled?

Closing remarks

In this "dissecting Kubernetes" series, kube-controller-manager is analyzed in two parts; this article focused on how the controller-manager handles the data it watches. It is worth studying carefully, especially if you want to develop an operator. There are bound to be places where this article is not rigorous; please bear with me, take what is useful (if anything), and discard the rest. If you are interested, you can follow my WeChat public account: gungunxi. My WeChat ID: lcomedy2021.

References