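Overview
Learning Kubernetes means more than understanding its concepts: we also need to understand how Kubernetes is actually implemented, and only once we can follow the source code can we really claim to know it well. In this series I use the creation of a deployment resource as a running example, with source code walkthroughs, to explain how Kubernetes works internally.
Previous articles
As earlier articles mentioned, creating a deployment resource starts with kubectl sending a request to apiserver, which stores the requested resource in etcd. The component that actually controls the resource is controller-manager. This article explains how controller-manager handles a newly created deployment resource. Because there is a lot to cover, I split the kube-controller-manager analysis into two parts; this part focuses on how controller-manager detects resource changes.
Preface
A brief introduction to controller-manager
In the controller pattern, every operation on an object triggers an event, and the controller then runs one syncLoop iteration. The controller relies on an informer to watch for events and to perform ListWatch; I will cover informer fundamentals separately later.
A deployment essentially controls replicaSets, and a replicaSet controls pods; controllers then drive each of these objects toward its desired state.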
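The "drive toward the desired state" idea can be illustrated with a toy reconcile loop. This is only a minimal sketch: the `state` type and the `reconcile` and `syncUntilDone` functions are hypothetical names made up for illustration, not Kubernetes APIs.

```go
package main

import "fmt"

// state is a hypothetical stand-in for a resource: desired is what the
// spec asks for, actual is what currently exists in the cluster.
type state struct {
	desired int
	actual  int
}

// reconcile performs one sync step, moving actual one step toward desired.
// It reports whether the two now match.
func reconcile(s *state) bool {
	switch {
	case s.actual < s.desired:
		s.actual++ // pretend to create one pod
	case s.actual > s.desired:
		s.actual-- // pretend to delete one pod
	}
	return s.actual == s.desired
}

// syncUntilDone keeps reconciling until the state has converged and
// returns how many steps were needed.
func syncUntilDone(s *state) int {
	steps := 0
	for s.actual != s.desired {
		reconcile(s)
		steps++
	}
	return steps
}

func main() {
	s := &state{desired: 3, actual: 0}
	steps := syncUntilDone(s)
	fmt.Printf("converged after %d steps, actual=%d\n", steps, s.actual)
}
```

A real controller does not loop until convergence in one go; it reacts to events and re-syncs each time, but the comparison of desired and actual state is the same idea.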
The whole creation process involves:
- Informer
- Deployment-controller-manager
- Replicaset-controller-manager
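Source code analysis
To read the source code productively, I start with a few questions:
- How does controller-manager watch resources?
- How do controller-manager and apiserver interact?
- How do the individual controllers coordinate so that the resources described by a deployment actually get created?
We also learned earlier that the main job of controller-manager in k8s is to watch resources and then act on them. I therefore split the source analysis into two parts and look for the answers to the questions above along the way.
Startup flow
Before the analysis, it helps to know how kube-controller-manager starts up.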
// Run runs the KubeControllerManagerOptions. This should never exit.
func Run(c *config.CompletedConfig, stopCh <-chan struct{}) error {
...
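// Build the client builders; the clients they produce are what the controllers use to list and watch apiserver resources.
clientBuilder, rootClientBuilder := createClientBuilders(c)
// Init function that starts the service account token controller
saTokenControllerInitFunc := serviceAccountTokenControllerStarter{rootClientBuilder: rootClientBuilder}.startServiceAccountTokenController
// run starts controller-manager: it starts the controllers one by one using the initializers returned by ControllerInitializersFunc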
run := func(ctx context.Context, startSATokenController InitFunc, initializersFunc ControllerInitializersFunc) {
controllerContext, err := CreateControllerContext(c, rootClientBuilder, clientBuilder, ctx.Done())
if err != nil {
klog.Fatalf("error building controller context: %v", err)
}
controllerInitializers := initializersFunc(controllerContext.LoopMode)
if err := StartControllers(controllerContext, startSATokenController, controllerInitializers, unsecuredMux); err != nil {
klog.Fatalf("error starting controllers: %v", err)
}
controllerContext.InformerFactory.Start(controllerContext.Stop)
controllerContext.ObjectOrMetadataInformerFactory.Start(controllerContext.Stop)
close(controllerContext.InformersStarted)
select {}
}
// No leader election, run directly
if !c.ComponentConfig.Generic.LeaderElection.LeaderElect {
run(context.TODO(), saTokenControllerInitFunc, NewControllerInitializers)
panic("unreachable")
}
...
}
All controllers are registered in NewControllerInitializers. Since we are following what happens when a deployment is applied, we focus on DeploymentController and ReplicaSetController.
func NewControllerInitializers(loopMode ControllerLoopMode) map[string]InitFunc {
controllers := map[string]InitFunc{}
...
controllers["deployment"] = startDeploymentController
controllers["replicaset"] = startReplicaSetController
...
return controllers
}
func startDeploymentController(ctx ControllerContext) (http.Handler, bool, error) {
dc, err := deployment.NewDeploymentController(
// The informer factory hands out the informer for each resource type
ctx.InformerFactory.Apps().V1().Deployments(),
ctx.InformerFactory.Apps().V1().ReplicaSets(),
ctx.InformerFactory.Core().V1().Pods(),
ctx.ClientBuilder.ClientOrDie("deployment-controller"),
)
if err != nil {
return nil, true, fmt.Errorf("error creating Deployment controller: %v", err)
}
go dc.Run(int(ctx.ComponentConfig.DeploymentController.ConcurrentDeploymentSyncs), ctx.Stop)
return nil, true, nil
}
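The registration-then-start idea behind NewControllerInitializers can be sketched with a plain map of names to start functions. This is an illustrative sketch only: `InitFunc` here has a simplified, hypothetical signature, and `newInitializers` and `startAll` are made-up helpers, not the functions from kube-controller-manager.

```go
package main

import (
	"fmt"
	"sort"
)

// InitFunc is a simplified, hypothetical version of a controller start
// function: it starts one controller and reports whether it was enabled.
type InitFunc func() (enabled bool, err error)

// newInitializers registers one start function per controller name.
// Each function just records its name in started so we can observe it.
func newInitializers(started *[]string) map[string]InitFunc {
	mk := func(name string) InitFunc {
		return func() (bool, error) {
			*started = append(*started, name)
			return true, nil
		}
	}
	return map[string]InitFunc{
		"deployment": mk("deployment"),
		"replicaset": mk("replicaset"),
	}
}

// startAll calls every registered start function. The names are sorted
// only to make this sketch deterministic.
func startAll(inits map[string]InitFunc) error {
	names := make([]string, 0, len(inits))
	for name := range inits {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		if _, err := inits[name](); err != nil {
			return fmt.Errorf("error starting %q: %w", name, err)
		}
	}
	return nil
}

func main() {
	var started []string
	if err := startAll(newInitializers(&started)); err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("started:", started)
}
```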
Let's first look at what the deployment controller object contains.
// NewDeploymentController creates a new DeploymentController.
func NewDeploymentController(dInformer appsinformers.DeploymentInformer, rsInformer appsinformers.ReplicaSetInformer, podInformer coreinformers.PodInformer, client clientset.Interface) (*DeploymentController, error) {
// Create the event broadcaster, used later to record events
eventBroadcaster := record.NewBroadcaster()
eventBroadcaster.StartStructuredLogging(0)
eventBroadcaster.StartRecordingToSink(&v1core.EventSinkImpl{Interface: client.CoreV1().Events("")})
if client != nil && client.CoreV1().RESTClient().GetRateLimiter() != nil {
// Register a metric that tracks usage of the client's rate limiter
if err := ratelimiter.RegisterMetricAndTrackRateLimiterUsage("deployment_controller", client.CoreV1().RESTClient().GetRateLimiter()); err != nil {
return nil, err
}
}
// Build the deployment controller
dc := &DeploymentController{
client: client,
eventRecorder: eventBroadcaster.NewRecorder(scheme.Scheme, v1.EventSource{Component: "deployment-controller"}),
queue: workqueue.NewNamedRateLimitingQueue(workqueue.DefaultControllerRateLimiter(), "deployment"),
}
// rsControl is the helper the controller uses to operate on replicasets through the client
dc.rsControl = controller.RealRSControl{
KubeClient: client,
Recorder: dc.eventRecorder,
}
// Register event handlers on the deployment informer
dInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: dc.addDeployment,
UpdateFunc: dc.updateDeployment,
// This will enter the sync loop and no-op, because the deployment has been deleted from the store.
DeleteFunc: dc.deleteDeployment,
})
// Register event handlers on the replicaset informer
rsInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
AddFunc: dc.addReplicaSet,
UpdateFunc: dc.updateReplicaSet,
DeleteFunc: dc.deleteReplicaSet,
})
// Register an event handler on the pod informer
podInformer.Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
DeleteFunc: dc.deletePod,
})
dc.syncHandler = dc.syncDeployment
dc.enqueueDeployment = dc.enqueue
dc.dLister = dInformer.Lister()
dc.rsLister = rsInformer.Lister()
dc.podLister = podInformer.Lister()
dc.dListerSynced = dInformer.Informer().HasSynced
dc.rsListerSynced = rsInformer.Informer().HasSynced
dc.podListerSynced = podInformer.Informer().HasSynced
return dc, nil
}
Here the controller registers handlers on the deployment, replicaset, and pod informers so that it is notified when those resources change.
// Run begins watching and syncing.
func (dc *DeploymentController) Run(workers int, stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()
defer dc.queue.ShutDown()
klog.InfoS("Starting controller", "controller", "deployment")
defer klog.InfoS("Shutting down controller", "controller", "deployment")
if !cache.WaitForNamedCacheSync("deployment", stopCh, dc.dListerSynced, dc.rsListerSynced, dc.podListerSynced) {
return
}
for i := 0; i < workers; i++ {
go wait.Until(dc.worker, time.Second, stopCh)
}
<-stopCh
}
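At run time the controller starts workers that consume the items in the queue; the description in the resource watching section that follows should help make this clearer. Having analyzed the deployment controller this far, we know that it uses the deployment, replicaset, and pod informers. An informer's main job is to watch resources, so how does it actually do that?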
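The "workers consume the queue" step in Run can be sketched with goroutines draining a channel. This is a simplified sketch under stated assumptions: a buffered channel stands in for the rate-limited workqueue, and `runWorkers` and `processAll` are hypothetical helpers, not client-go functions.

```go
package main

import (
	"fmt"
	"sync"
)

// runWorkers starts the given number of goroutines. Each one keeps taking
// keys from queue and passing them to syncHandler until queue is closed.
func runWorkers(workers int, queue <-chan string, syncHandler func(key string)) {
	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for key := range queue {
				syncHandler(key)
			}
		}()
	}
	wg.Wait()
}

// processAll enqueues all keys, runs the workers, and returns how many
// times each key was synced.
func processAll(keys []string, workers int) map[string]int {
	queue := make(chan string, len(keys))
	for _, k := range keys {
		queue <- k
	}
	close(queue)

	var mu sync.Mutex
	seen := map[string]int{}
	runWorkers(workers, queue, func(key string) {
		mu.Lock()
		defer mu.Unlock()
		seen[key]++
	})
	return seen
}

func main() {
	seen := processAll([]string{"default/nginx", "default/redis", "kube-system/dns"}, 2)
	fmt.Println("synced keys:", len(seen))
}
```

The real workqueue additionally deduplicates keys and applies rate limiting and retries, none of which this sketch attempts.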
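Watching resources
How it works
The sensor part of a controller consists mainly of the Reflector, the Informer, and the Indexer.
The Reflector obtains k8s resource data by running List & Watch against kube-apiserver. For each piece of data it receives, it puts a Delta record into the Delta queue; the record holds the resource object itself plus the type of event that happened to it. The Informer keeps popping Delta records off the Delta queue, handing the event to the event callback functions on one hand and the resource object to the Indexer on the other.
The Indexer records resources in a cache. The controller part consists mainly of event handler functions and workers. The event handlers receive the add, update, and delete events for resources from the Informer and decide, according to the controller's logic, whether an event needs handling. For events that do, they put the relevant information onto a work queue, where a worker from the worker pool picks it up later.
When handling a resource object, a worker generally uses the object's name to fetch the latest version of the resource from the cache again.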
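The flow described above (Reflector to Delta queue, Informer to Indexer and handlers, handlers to work queue, worker back to the cache) can be traced in a small single-threaded sketch. All types and method names here (`pipeline`, `observe`, `drain`, `work`) are hypothetical and only mirror the roles, not the client-go implementation.

```go
package main

import "fmt"

// DeltaType is the kind of event attached to an object.
type DeltaType string

const (
	Added   DeltaType = "Added"
	Deleted DeltaType = "Deleted"
)

// Delta pairs an event type with the object it refers to.
type Delta struct {
	Type DeltaType
	Key  string // "<namespace>/<name>"
	Obj  string // toy object payload
}

// pipeline holds toy stand-ins for the Delta queue, the Indexer cache,
// and the controller's work queue.
type pipeline struct {
	deltas    []Delta
	indexer   map[string]string
	workqueue []string
}

func newPipeline() *pipeline {
	return &pipeline{indexer: map[string]string{}}
}

// observe plays the Reflector: it records what it saw as a Delta.
func (p *pipeline) observe(d Delta) { p.deltas = append(p.deltas, d) }

// drain plays the Informer: it pops every Delta, updates the cache, and
// lets the "event handler" enqueue only the object's key.
func (p *pipeline) drain() {
	for _, d := range p.deltas {
		switch d.Type {
		case Added:
			p.indexer[d.Key] = d.Obj
		case Deleted:
			delete(p.indexer, d.Key)
		}
		p.workqueue = append(p.workqueue, d.Key)
	}
	p.deltas = nil
}

// work plays the worker: for each key it re-reads the latest object
// from the cache instead of trusting the event payload.
func (p *pipeline) work() []string {
	var out []string
	for _, key := range p.workqueue {
		if obj, ok := p.indexer[key]; ok {
			out = append(out, "sync "+key+" -> "+obj)
		} else {
			out = append(out, "sync "+key+" -> gone")
		}
	}
	p.workqueue = nil
	return out
}

func main() {
	p := newPipeline()
	p.observe(Delta{Type: Added, Key: "default/nginx", Obj: "v1"})
	p.drain()
	fmt.Println(p.work())
}
```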
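client-go components
- Reflector: the reflector watches a specific k8s API resource, which can be a built-in resource or a custom resource. This is implemented in the ListAndWatch method. When the reflector receives a notification through the watch API that a new resource instance exists, it fetches the newly created object with the corresponding list API and puts it into the Delta Fifo queue inside the watchHandler function.
- Informer: the informer pops objects from the Delta Fifo queue. The function that does this is processLoop. The job of this base controller is to save the object for later retrieval and to invoke our controller, passing it the object.
- Indexer: the indexer provides indexing over objects. A typical use case is building an index based on object labels. An Indexer can maintain indexes based on several index functions. It uses a thread-safe data store to hold objects and their keys. The Store defines a default function named MetaNamespaceKeyFunc, which generates an object's key as the <namespace>/<name> combination for that object.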
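The key format and the label-based index described for the Indexer can be sketched as follows. This is an illustrative sketch: `object`, `namespaceKey`, and `indexer` are hypothetical types written for this example, and the sketch assumes that objects without a namespace are keyed by name alone.

```go
package main

import (
	"fmt"
	"sort"
)

// object is a hypothetical, stripped-down resource.
type object struct {
	Namespace string
	Name      string
	Labels    map[string]string
}

// namespaceKey builds the "<namespace>/<name>" key described above.
// Objects with an empty namespace are keyed by name only (an assumption
// of this sketch).
func namespaceKey(o object) string {
	if o.Namespace == "" {
		return o.Name
	}
	return o.Namespace + "/" + o.Name
}

// indexer stores objects by key and keeps one extra index on the "app" label.
// Label changes on re-add are not cleaned up in this sketch.
type indexer struct {
	objects map[string]object
	byApp   map[string]map[string]bool
}

func newIndexer() *indexer {
	return &indexer{
		objects: map[string]object{},
		byApp:   map[string]map[string]bool{},
	}
}

// Add stores the object and records its key under its "app" label, if any.
func (ix *indexer) Add(o object) {
	key := namespaceKey(o)
	ix.objects[key] = o
	if app, ok := o.Labels["app"]; ok {
		if ix.byApp[app] == nil {
			ix.byApp[app] = map[string]bool{}
		}
		ix.byApp[app][key] = true
	}
}

// ByApp returns the sorted keys of all objects carrying the given "app" label.
func (ix *indexer) ByApp(app string) []string {
	keys := make([]string, 0, len(ix.byApp[app]))
	for k := range ix.byApp[app] {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys
}

func main() {
	ix := newIndexer()
	ix.Add(object{Namespace: "default", Name: "nginx-1", Labels: map[string]string{"app": "nginx"}})
	ix.Add(object{Namespace: "default", Name: "nginx-2", Labels: map[string]string{"app": "nginx"}})
	fmt.Println(ix.ByApp("nginx"))
}
```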
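Source code walkthrough
Let's first look at how an informer is constructed. The code uses the factory pattern, a creational design pattern that encapsulates and manages object creation. The source is a good opportunity to study it.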
//k8s.io/client-go/informers/factory.go
// NewSharedInformerFactoryWithOptions constructs a new instance of a SharedInformerFactory with additional options.
func NewSharedInformerFactoryWithOptions(client kubernetes.Interface, defaultResync time.Duration, options ...SharedInformerOption) SharedInformerFactory {
// Create the factory instance, which carries the client, the informers map, and related fields
factory := &sharedInformerFactory{
client: client,
namespace: v1.NamespaceAll,
defaultResync: defaultResync,
informers: make(map[reflect.Type]cache.SharedIndexInformer),
startedInformers: make(map[reflect.Type]bool),
customResync: make(map[reflect.Type]time.Duration),
}
// Apply all options
for _, opt := range options {
factory = opt(factory)
}
return factory
}
// InternalInformerFor returns the SharedIndexInformer for obj using an internal
// client. The factory uses this method to produce the informer for each resource type.
func (f *sharedInformerFactory) InformerFor(obj runtime.Object, newFunc internalinterfaces.NewInformerFunc) cache.SharedIndexInformer {
f.lock.Lock()
defer f.lock.Unlock()
informerType := reflect.TypeOf(obj)
informer, exists := f.informers[informerType]
if exists {
return informer
}
resyncPeriod, exists := f.customResync[informerType]
if !exists {
resyncPeriod = f.defaultResync
}
informer = newFunc(f.client, resyncPeriod)
f.informers[informerType] = informer
return informer
}
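The caching behavior of InformerFor (create once per type, then reuse) can be reproduced with the standard library alone. This is a minimal sketch; `factory`, `informer`, `deployment`, and `replicaSet` are hypothetical toy types, not the client-go ones.

```go
package main

import (
	"fmt"
	"reflect"
	"sync"
)

// informer is a toy stand-in for a shared informer.
type informer struct{ kind string }

// factory caches one informer per object type, guarded by a mutex.
type factory struct {
	mu        sync.Mutex
	informers map[reflect.Type]*informer
	created   int // how many informers were actually constructed
}

func newFactory() *factory {
	return &factory{informers: map[reflect.Type]*informer{}}
}

// informerFor returns the cached informer for obj's type, or builds one
// with newFunc and caches it on first use.
func (f *factory) informerFor(obj interface{}, newFunc func() *informer) *informer {
	f.mu.Lock()
	defer f.mu.Unlock()

	t := reflect.TypeOf(obj)
	if inf, ok := f.informers[t]; ok {
		return inf
	}
	inf := newFunc()
	f.created++
	f.informers[t] = inf
	return inf
}

// Toy resource types used only as map keys via reflection.
type deployment struct{}
type replicaSet struct{}

func main() {
	f := newFactory()
	a := f.informerFor(&deployment{}, func() *informer { return &informer{kind: "Deployment"} })
	b := f.informerFor(&deployment{}, func() *informer { return &informer{kind: "Deployment"} })
	f.informerFor(&replicaSet{}, func() *informer { return &informer{kind: "ReplicaSet"} })
	fmt.Println("same instance:", a == b, "created:", f.created)
}
```

Sharing one informer per type is what lets several controllers watch the same resource without each opening its own connection to the server.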
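Through the factory, the function that ultimately creates the deployment informer is NewFilteredDeploymentInformer. It lives in k8s.io/client-go/informers/apps/v1 and calls cache.NewSharedIndexInformer from the k8s.io/client-go/tools/cache package. The informer carries the ListWatch used to list and watch the resource.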
// NewFilteredDeploymentInformer constructs a new informer for Deployment type.
// Always prefer using an informer factory to get a shared informer instead of getting an independent
// one. This reduces memory footprint and number of connections to the server.
func NewFilteredDeploymentInformer(client kubernetes.Interface, namespace string, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions internalinterfaces.TweakListOptionsFunc) cache.SharedIndexInformer {
// This ultimately builds a SharedIndexInformer
return cache.NewSharedIndexInformer(
&cache.ListWatch{
ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.AppsV1().Deployments(namespace).List(context.TODO(), options)
},
WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
if tweakListOptions != nil {
tweakListOptions(&options)
}
return client.AppsV1().Deployments(namespace).Watch(context.TODO(), options)
},
},
&appsv1.Deployment{},
resyncPeriod,
indexers,
)
}
The result is a sharedIndexInformer instance. Let's look at its structure.
type sharedIndexInformer struct {
// indexer provides indexed storage for objects
indexer Indexer
// controller drives the resource watch and the queue processing loop
controller Controller
processor *sharedProcessor
cacheMutationDetector MutationDetector
listerWatcher ListerWatcher
// objectType is an example object of the type this informer is
// expected to handle. Only the type needs to be right, except
// that when that is `unstructured.Unstructured` the object's
// `"apiVersion"` and `"kind"` must also be right.
objectType runtime.Object
// resyncCheckPeriod is how often we want the reflector's resync timer to fire so it can call
// shouldResync to check if any of our listeners need a resync.
resyncCheckPeriod time.Duration
// defaultEventHandlerResyncPeriod is the default resync period for any handlers added via
// AddEventHandler (i.e. they don't specify one and just want to use the shared informer's default
// value).
defaultEventHandlerResyncPeriod time.Duration
// clock allows for testability
clock clock.Clock
started, stopped bool
startedLock sync.Mutex
// blockDeltas gives a way to stop all event distribution so that a late event handler
// can safely join the shared informer.
blockDeltas sync.Mutex
// Called whenever the ListAndWatch drops the connection with an error.
watchErrorHandler WatchErrorHandler
}
With the informer created, what happens when it runs?
func (s *sharedIndexInformer) Run(stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()
// Create the DeltaFIFO instance
fifo := NewDeltaFIFOWithOptions(DeltaFIFOOptions{
KnownObjects: s.indexer,
EmitDeltaTypeReplaced: true,
})
cfg := &Config{
Queue: fifo,
ListerWatcher: s.listerWatcher,
ObjectType: s.objectType,
FullResyncPeriod: s.resyncCheckPeriod,
RetryOnError: false,
ShouldResync: s.processor.shouldResync,
// Register HandleDeltas as the callback; when a resource changes it stores the object in the local Indexer
Process: s.HandleDeltas,
WatchErrorHandler: s.watchErrorHandler,
}
// Initialize the controller
func() {
s.startedLock.Lock()
defer s.startedLock.Unlock()
s.controller = New(cfg)
s.controller.(*controller).clock = s.clock
s.started = true
}()
// Separate stop channel because Processor should be stopped strictly after controller
processorStopCh := make(chan struct{})
var wg wait.Group
defer wg.Wait() // Wait for Processor to stop
defer close(processorStopCh) // Tell Processor to stop
// Run the cache mutation detector, which checks whether cached objects have been mutated
wg.StartWithChannel(processorStopCh, s.cacheMutationDetector.Run)
// Run the processor's run method
wg.StartWithChannel(processorStopCh, s.processor.run)
defer func() {
s.startedLock.Lock()
defer s.startedLock.Unlock()
s.stopped = true // Don't want any new listeners
}()
// Run the controller
s.controller.Run(stopCh)
}
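In this code DeltaFIFO, Controller, and indexer, the concepts introduced earlier, finally show up. The core of the startup logic is:
- Initializing the DeltaFIFO
- Initializing the Controller
- Running processor.run, which starts each processorListener so that it keeps consuming notifications
- Running controller.Run
Let's take a closer look at the controller's Run method.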
// Run begins processing items, and will continue until a value is sent down stopCh or it is closed.
// It's an error to call Run more than once.
// Run blocks; call via go.
func (c *controller) Run(stopCh <-chan struct{}) {
defer utilruntime.HandleCrash()
go func() {
<-stopCh
c.config.Queue.Close()
}()
// Create a Reflector instance. As mentioned earlier, this object is key: it watches apiserver for data changes and puts the data into the DeltaFIFO
r := NewReflector(
c.config.ListerWatcher,
c.config.ObjectType,
// The DeltaFIFO is passed in here as the Reflector's store
c.config.Queue,
c.config.FullResyncPeriod,
)
r.ShouldResync = c.config.ShouldResync
r.WatchListPageSize = c.config.WatchListPageSize
r.clock = c.clock
if c.config.WatchErrorHandler != nil {
r.watchErrorHandler = c.config.WatchErrorHandler
}
c.reflectorMutex.Lock()
c.reflector = r
c.reflectorMutex.Unlock()
var wg wait.Group
// Run the Reflector: it calls reflector.ListAndWatch, which runs r.list, r.watch, and r.watchHandler to mirror the data served by apiserver into the queue
wg.StartWithChannel(stopCh, r.Run)
// Run c.processLoop: the reflector adds data to the queue, and processLoop keeps consuming it
wait.Until(c.processLoop, time.Second, stopCh)
wg.Wait()
}
// Run repeatedly uses the reflector's ListAndWatch to fetch all the
// objects and subsequent deltas.
// Run will exit when stopCh is closed.
func (r *Reflector) Run(stopCh <-chan struct{}) {
klog.V(2).Infof("Starting reflector %s (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
wait.BackoffUntil(func() {
if err := r.ListAndWatch(stopCh); err != nil {
r.watchErrorHandler(r, err)
}
}, r.backoffManager, true, stopCh)
klog.V(2).Infof("Stopping reflector %s (%s) from %s", r.expectedTypeName, r.resyncPeriod, r.name)
}
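The list-then-watch idea behind ListAndWatch can be sketched with an in-memory stand-in for the server: list once to get a snapshot and a resource version, then apply only the events newer than that version. Everything here (`fakeAPI`, `reflector`, `listOnce`, `watchOnce`) is hypothetical and greatly simplified; a real watch is a long-lived stream, not a one-shot call.

```go
package main

import (
	"fmt"
	"sort"
)

// event is one change recorded by the fake server.
type event struct {
	Type string // "ADDED" or "DELETED"
	Key  string
	RV   int // resource version at which the change happened
}

// fakeAPI is a hypothetical in-memory stand-in for kube-apiserver.
type fakeAPI struct {
	rv     int
	events []event
}

// apply records a change and bumps the resource version.
func (a *fakeAPI) apply(typ, key string) {
	a.rv++
	a.events = append(a.events, event{Type: typ, Key: key, RV: a.rv})
}

// list returns the keys that exist right now plus the current resource version.
func (a *fakeAPI) list() ([]string, int) {
	live := map[string]bool{}
	for _, e := range a.events {
		if e.Type == "ADDED" {
			live[e.Key] = true
		} else {
			delete(live, e.Key)
		}
	}
	keys := make([]string, 0, len(live))
	for k := range live {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	return keys, a.rv
}

// watch returns every event newer than the given resource version.
func (a *fakeAPI) watch(rv int) []event {
	var out []event
	for _, e := range a.events {
		if e.RV > rv {
			out = append(out, e)
		}
	}
	return out
}

// reflector mirrors the server's state into a local store.
type reflector struct {
	store  map[string]bool
	lastRV int
}

// listOnce replaces the store with a fresh snapshot.
func (r *reflector) listOnce(a *fakeAPI) {
	keys, rv := a.list()
	r.store = map[string]bool{}
	for _, k := range keys {
		r.store[k] = true
	}
	r.lastRV = rv
}

// watchOnce applies the events that happened after the last seen version.
func (r *reflector) watchOnce(a *fakeAPI) {
	for _, e := range a.watch(r.lastRV) {
		if e.Type == "ADDED" {
			r.store[e.Key] = true
		} else {
			delete(r.store, e.Key)
		}
		r.lastRV = e.RV
	}
}

func main() {
	api := &fakeAPI{}
	api.apply("ADDED", "default/nginx")
	r := &reflector{}
	r.listOnce(api)
	api.apply("ADDED", "default/redis")
	r.watchOnce(api)
	fmt.Println(len(r.store), r.lastRV)
}
```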
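Following our scenario of creating a deployment, the Reflector learns about the new deployment resource through watch. I extract just the core code here to show what happens when watch delivers an Added event.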
k8s.io/client-go/tools/cache/reflector.go
// watchHandler watches w and keeps *resourceVersion up to date.
func (r *Reflector) watchHandler(start time.Time, w watch.Interface, resourceVersion *string, errc chan error, stopCh <-chan struct{}) error {
...
// On an Added event, add the object to the store.
case watch.Added:
err := r.store.Add(event.Object)
if err != nil {
utilruntime.HandleError(fmt.Errorf("%s: unable to add watch event object (%#v) to store: %v", r.name, event.Object, err))
}
...
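This is where data is written into the store, and the store here is the DeltaFIFO that was passed in when the Reflector was constructed. How is the data in the DeltaFIFO processed? As we saw, besides running the Reflector, the controller's Run method also calls wait.Until(c.processLoop, time.Second, stopCh). Let's see what that does.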
func (c *controller) processLoop() {
for {
// Pop takes the next item off the queue and hands it to Process, which, as configured in the informer's Run method, is the HandleDeltas function.
obj, err := c.config.Queue.Pop(PopProcessFunc(c.config.Process))
if err != nil {
if err == ErrFIFOClosed {
return
}
if c.config.RetryOnError {
// This is the safe way to re-enqueue.
c.config.Queue.AddIfNotPresent(obj)
}
}
}
}
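Next, let's see how an item taken from the DeltaFIFO is processed. As described in the principle section, what goes into the DeltaFIFO is a Delta record (the resource object plus its event type), and Pop hands each popped item to the Process callback, which is HandleDeltas: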
func (s *sharedIndexInformer) HandleDeltas(obj interface{}) error {
s.blockDeltas.Lock()
defer s.blockDeltas.Unlock()
// from oldest to newest
for _, d := range obj.(Deltas) {
...
// First store the data in the indexer
if err := s.indexer.Add(d.Object); err != nil {
return err
}
// Call s.processor.distribute, which calls listener.add to pass the watched resource on to each listener
s.processor.distribute(addNotification{newObj: d.Object}, false)
...
}
return nil
}
HandleDeltas stores the object in the indexer and also sends out a notification for the resource.
func (p *sharedProcessor) distribute(obj interface{}, sync bool) {
p.listenersLock.RLock()
defer p.listenersLock.RUnlock()
if sync {
for _, listener := range p.syncingListeners {
// Call listener.add to pass the watched resource to the listener
listener.add(obj)
}
} else {
for _, listener := range p.listeners {
listener.add(obj)
}
}
}
Having seen the processor's distribute method, we need to go back to wg.StartWithChannel(processorStopCh, s.processor.run) in the informer. Let's take a look.
func (p *sharedProcessor) run(stopCh <-chan struct{}) {
func() {
p.listenersLock.RLock()
defer p.listenersLock.RUnlock()
for _, listener := range p.listeners {
p.wg.Start(listener.run)
// This is the counterpart of the listener.add call seen earlier
p.wg.Start(listener.pop)
}
p.listenersStarted = true
}()
<-stopCh
p.listenersLock.RLock()
defer p.listenersLock.RUnlock()
for _, listener := range p.listeners {
close(listener.addCh) // Tell .pop() to stop. .pop() will tell .run() to stop
}
p.wg.Wait() // Wait for all .pop() and .run() to stop
}
Next, the listener.run function:
func (p *processorListener) run() {
// this call blocks until the channel is closed. When a panic happens during the notification
// we will catch it, **the offending item will be skipped!**, and after a short delay (one second)
// the next notification will be attempted. This is usually better than the alternative of never
// delivering again.
stopCh := make(chan struct{})
wait.Until(func() {
for next := range p.nextCh {
switch notification := next.(type) {
case updateNotification:
p.handler.OnUpdate(notification.oldObj, notification.newObj)
case addNotification:
p.handler.OnAdd(notification.newObj)
case deleteNotification:
p.handler.OnDelete(notification.oldObj)
default:
utilruntime.HandleError(fmt.Errorf("unrecognized notification: %T", next))
}
}
// the only way to get here is if the p.nextCh is empty and closed
close(stopCh)
}, 1*time.Second, stopCh)
}
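The article shows add (via distribute) and run but not pop, the piece that connects addCh to nextCh. The following is a simplified sketch of how such a hand-off can work with an in-memory buffer between the two channels. The `listener` type and `deliver` helper are hypothetical, and unlike a production implementation this version flushes any buffered notifications when addCh is closed, a choice made here so the example is deterministic.

```go
package main

import "fmt"

// listener is a toy notification pipe: add writes to addCh, pop buffers
// and forwards to nextCh, and run consumes nextCh.
type listener struct {
	addCh  chan string
	nextCh chan string
}

// add hands a notification to pop. It only blocks until pop receives it.
func (l *listener) add(n string) { l.addCh <- n }

// pop moves notifications from addCh to nextCh, buffering them in memory
// so that a slow consumer never blocks add for long.
func (l *listener) pop() {
	defer close(l.nextCh)
	var pending []string
	for {
		// out stays nil (and its select case disabled) while nothing is pending.
		var out chan string
		var head string
		if len(pending) > 0 {
			out = l.nextCh
			head = pending[0]
		}
		select {
		case out <- head:
			pending = pending[1:]
		case n, ok := <-l.addCh:
			if !ok {
				// Sketch-only behavior: flush what is left, then stop.
				for _, p := range pending {
					l.nextCh <- p
				}
				return
			}
			pending = append(pending, n)
		}
	}
}

// run calls handle for every notification until nextCh is closed.
func (l *listener) run(handle func(string)) {
	for n := range l.nextCh {
		handle(n)
	}
}

// deliver pushes all notifications through a listener and returns them
// in the order the handler received them.
func deliver(notifications []string) []string {
	l := &listener{addCh: make(chan string), nextCh: make(chan string)}
	var got []string
	done := make(chan struct{})
	go l.pop()
	go func() {
		defer close(done)
		l.run(func(n string) { got = append(got, n) })
	}()
	for _, n := range notifications {
		l.add(n)
	}
	close(l.addCh)
	<-done
	return got
}

func main() {
	fmt.Println(deliver([]string{"add default/nginx", "update default/nginx"}))
}
```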
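As we can see, processorListener simply receives notifications and dispatches them. How they are handled depends on the handler. At this point the only piece missing from the full resource watching flow is the handler, and we already saw it in the startup flow: the handlers such as dc.addDeployment that NewDeploymentController registers through AddEventHandler. So how does the controller process the data when a deployment is added? I will analyze the core processing logic of controller-manager in the next article.
Summary
- kube-controller-manager starts multiple controllers, and the deployment controller among them uses several informers to watch resources.
- Once api-server has stored the deployment data in etcd, controller-manager watches it through the reflector, which puts the data it receives into the DeltaFIFO. The informer consumes the DeltaFIFO, stores the resource data in the indexer (its local cache), and sends out notifications for the events; when the controller receives a notification, it puts the event onto the workqueue. How the data in the workqueue is then processed is up to the controller.
- The code uses the factory pattern to create the various informers, which simplifies informer creation.
Something to think about: how does the deployment controller handle the ADD event it receives?
Closing remarks
In this "Kubernetes dissected" series, kube-controller-manager is analyzed in two parts. This part covered how resource watching works, a mechanism that is used in many places in kubernetes. The article is bound to contain some imprecise statements; I ask for your understanding, and please take what is useful (if anything) and discard the rest. If you are interested, you can follow my WeChat official account: gungunxi. My WeChat ID: lcomedy2021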