What is kube-controller-manager
According to the official Kubernetes documentation:
The Kubernetes controller manager is a daemon that embeds the core control loops shipped with Kubernetes. In applications of robotics and automation, a control loop is a non-terminating loop that regulates the state of the system. In Kubernetes, a controller is a control loop that watches the shared state of the cluster through the apiserver and makes changes attempting to move the current state towards the desired state.
A controller is essentially an event-driven state machine. Commonly used controllers include:
- node
- daemonset
- deployment
- statefulset
- service
- ...
Their implementation frameworks are largely the same; only the business logic differs, as the conceptual sketch below suggests.
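As a rough, purely conceptual sketch (getDesiredState, getCurrentState, and reconcile are hypothetical placeholders, not real Kubernetes APIs), the control loop that the official description refers to looks like this:
```go
// Conceptual control loop: observe the shared state, diff it against the
// desired state, and act to move the current state toward the desired one.
// In real controllers the loop is driven by informer events and a workqueue
// rather than busy polling.
for {
    desired := getDesiredState() // e.g. the spec stored in the apiserver
    current := getCurrentState() // e.g. the observed state of the world
    if !reflect.DeepEqual(current, desired) {
        reconcile(current, desired) // create/update/delete to close the gap
    }
}
```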
walk-around
It is called a walk-around rather than a source code analysis because this series does not intend to dissect kube-controller-manager line by line. Instead, it looks at the machinery around it: the dynamic informer, its leader election mechanism, the Kubernetes GC mechanism, and so on. To avoid misleading anyone into expecting a deep dive into kube-controller-manager itself, it is not called a source code analysis.
controller-manager
The entry point of controller-manager is cmd/kube-controller-manager/controller-manager.go. Like the other components, it builds its command with cobra and then executes it. The command is initialized in app.NewControllerManagerCommand(), and the options are initialized in options.NewKubeControllerManagerOptions().
```go
// NewKubeControllerManagerOptions creates a new KubeControllerManagerOptions with a default config.
func NewKubeControllerManagerOptions() (*KubeControllerManagerOptions, error) {
    componentConfig, err := NewDefaultComponentConfig(ports.InsecureKubeControllerManagerPort)
    if err != nil {
        return nil, err
    }

    s := KubeControllerManagerOptions{
        Generic:         cmoptions.NewGenericControllerManagerConfigurationOptions(&componentConfig.Generic),
        KubeCloudShared: cmoptions.NewKubeCloudSharedOptions(&componentConfig.KubeCloudShared),
        ServiceController: &cmoptions.ServiceControllerOptions{
            ServiceControllerConfiguration: &componentConfig.ServiceController,
        },
        AttachDetachController: &AttachDetachControllerOptions{
            &componentConfig.AttachDetachController,
        },
        CSRSigningController: &CSRSigningControllerOptions{
            &componentConfig.CSRSigningController,
        },
        DaemonSetController: &DaemonSetControllerOptions{
            &componentConfig.DaemonSetController,
        },
        DeploymentController: &DeploymentControllerOptions{
            &componentConfig.DeploymentController,
        },
        DeprecatedFlags: &DeprecatedControllerOptions{
            &componentConfig.DeprecatedController,
        },
        EndpointController: &EndpointControllerOptions{
            &componentConfig.EndpointController,
        },
        GarbageCollectorController: &GarbageCollectorControllerOptions{
            &componentConfig.GarbageCollectorController,
        },
        HPAController: &HPAControllerOptions{
            &componentConfig.HPAController,
        },
        JobController: &JobControllerOptions{
            &componentConfig.JobController,
        },
        NamespaceController: &NamespaceControllerOptions{
            &componentConfig.NamespaceController,
        },
        NodeIPAMController: &NodeIPAMControllerOptions{
            &componentConfig.NodeIPAMController,
        },
        NodeLifecycleController: &NodeLifecycleControllerOptions{
            &componentConfig.NodeLifecycleController,
        },
        PersistentVolumeBinderController: &PersistentVolumeBinderControllerOptions{
            &componentConfig.PersistentVolumeBinderController,
        },
        PodGCController: &PodGCControllerOptions{
            &componentConfig.PodGCController,
        },
        ReplicaSetController: &ReplicaSetControllerOptions{
            &componentConfig.ReplicaSetController,
        },
        ReplicationController: &ReplicationControllerOptions{
            &componentConfig.ReplicationController,
        },
        ResourceQuotaController: &ResourceQuotaControllerOptions{
            &componentConfig.ResourceQuotaController,
        },
        SAController: &SAControllerOptions{
            &componentConfig.SAController,
        },
        TTLAfterFinishedController: &TTLAfterFinishedControllerOptions{
            &componentConfig.TTLAfterFinishedController,
        },
        SecureServing: apiserveroptions.NewSecureServingOptions().WithLoopback(),
        InsecureServing: (&apiserveroptions.DeprecatedInsecureServingOptions{
            BindAddress: net.ParseIP(componentConfig.Generic.Address),
            BindPort:    int(componentConfig.Generic.Port),
            BindNetwork: "tcp",
        }).WithLoopback(),
        Authentication: apiserveroptions.NewDelegatingAuthenticationOptions(),
        Authorization:  apiserveroptions.NewDelegatingAuthorizationOptions(),
    }

    s.Authentication.RemoteKubeConfigFileOptional = true
    s.Authorization.RemoteKubeConfigFileOptional = true
    s.Authorization.AlwaysAllowPaths = []string{"/healthz"}

    // Set the PairName but leave certificate directory blank to generate in-memory by default
    s.SecureServing.ServerCert.CertDirectory = ""
    s.SecureServing.ServerCert.PairName = "kube-controller-manager"
    s.SecureServing.BindPort = ports.KubeControllerManagerPort

    gcIgnoredResources := make([]garbagecollectorconfig.GroupResource, 0, len(garbagecollector.DefaultIgnoredResources()))
    for r := range garbagecollector.DefaultIgnoredResources() {
        gcIgnoredResources = append(gcIgnoredResources, garbagecollectorconfig.GroupResource{Group: r.Group, Resource: r.Resource})
    }
    s.GarbageCollectorController.GCIgnoredResources = gcIgnoredResources

    return &s, nil
}
```
This covers almost all the native Kubernetes resources we care about, such as daemonset, statefulset, and deployment, along with mechanisms of interest such as GC. As with the other components, once a controller-manager has been created it is run via Run(), with stopCh set to NeverStop: it manages all the controllers, and the controllers manage all the resources, so even when the user wants to tear Kubernetes down it still has cleanup work to do and must not be interrupted.
The source of Run() is too long to paste here, but a few tips about it:
- The first controller started is the ServiceAccountTokenController, because it maintains the service account resources.
- Before starting the controllers, controller-manager checks the apiserver's health.
- kube-controller-manager works together with cloud-controller-manager, but only references it, keeping the two decoupled.
- As discussed earlier, informers are the cornerstone of resource synchronization in Kubernetes. controller-manager uses not only the sharedIndexInformer but also another informer from client-go, the dynamicInformer, which serves the GC and namespace controllers.
- leader election is worth studying.
We will walk through these points one by one in later posts; this article covers the dynamic informer.
dynamicInformer
Recall the shared informer: typically you obtain a sharedInformerFactory from a clientset and then instantiate informers for concrete resource types. But what if we want to watch several kinds of resources at once? client-go provides the dynamicInformer for that.
Judging from how controller-manager uses it, the dynamic informer works much like an ordinary informer, as the following code shows:
```go
dynamicClient := dynamic.NewForConfigOrDie(rootClientBuilder.ConfigOrDie("dynamic-informers"))
dynamicInformers := dynamicinformer.NewDynamicSharedInformerFactory(dynamicClient, ResyncPeriod(s)())
```
The dynamic informer's source lives in k8s.io/client-go/dynamic/dynamicinformer/informer.go; its factory struct is:
```go
type dynamicSharedInformerFactory struct {
    client        dynamic.Interface
    defaultResync time.Duration
    namespace     string

    lock      sync.Mutex
    informers map[schema.GroupVersionResource]informers.GenericInformer
    // startedInformers is used for tracking which informers have been started.
    // This allows Start() to be called multiple times safely.
    startedInformers map[schema.GroupVersionResource]bool
    tweakListOptions TweakListOptionsFunc
}
```
Compare this with the sharedInformerFactory struct:
```go
type sharedInformerFactory struct {
    client           kubernetes.Interface
    namespace        string
    tweakListOptions internalinterfaces.TweakListOptionsFunc
    lock             sync.Mutex
    defaultResync    time.Duration
    customResync     map[reflect.Type]time.Duration

    informers map[reflect.Type]cache.SharedIndexInformer
    // startedInformers is used for tracking which informers have been started.
    // This allows Start() to be called multiple times safely.
    startedInformers map[reflect.Type]bool
}
```
Their similarities and differences:
- The overall structures are very similar: both carry a client, namespace, tweakListOptions, resync settings, startedInformers, and so on.
- The dynamic factory's informers are of type informers.GenericInformer; a genericInformer bundles a cache.SharedIndexInformer together with a schema.GroupResource.
- In the informers/startedInformers maps, sharedInformerFactory can key by reflect.Type because each of its informers targets a concrete type, while the dynamic factory keys by GroupVersionResource.
The question then becomes: how does the dynamicInformer hook into the sharedIndexInformer, and what role does the GVR play here? First, look at the methods DynamicSharedInformerFactory defines:
```go
// DynamicSharedInformerFactory provides access to a shared informer and lister for dynamic client
type DynamicSharedInformerFactory interface {
    Start(stopCh <-chan struct{})
    ForResource(gvr schema.GroupVersionResource) informers.GenericInformer
    WaitForCacheSync(stopCh <-chan struct{}) map[schema.GroupVersionResource]bool
}
```
WaitForCacheSync is self-explanatory, so let's look at Start and ForResource. Starting with ForResource: among the parameters it forwards (see below), the client is straightforward, the gvr we will come back to, namespace selects which namespace to watch, defaultResync needs no introduction, the IndexFunc was covered in detail in the earlier informer analysis, and tweakListOptions we'll skip. So what ties the dynamicInformer to an ordinary informer is evidently the GVR.
```go
func (f *dynamicSharedInformerFactory) ForResource(gvr schema.GroupVersionResource) informers.GenericInformer {
    // ... (lock handling and cache-hit check omitted)
    informer = NewFilteredDynamicInformer(f.client, gvr, f.namespace, f.defaultResync, cache.Indexers{cache.NamespaceIndex: cache.MetaNamespaceIndexFunc}, f.tweakListOptions)
    f.informers[key] = informer
    return informer
}
```
A GVR has the type schema.GroupVersionResource and holds three things:
- group: the API group, e.g. ""
- version: e.g. v1
- resource: e.g. pods
Together these three unambiguously describe a Kubernetes resource, native or custom (CR), so via a GVR the dynamicInformer can explicitly watch exactly the resources you want. A common way to produce a GVR is WithResource, e.g. v1.SchemeGroupVersion.WithResource("deployments"), which builds the GVR struct:
```go
func (gv GroupVersion) WithResource(resource string) GroupVersionResource {
    return GroupVersionResource{Group: gv.Group, Version: gv.Version, Resource: resource}
}
```
Alternatively, you can fill in the GVR's fields yourself; they are, after all, just strings. If constructing the struct by hand feels inelegant, there is also schema.ParseResourceArg(), e.g. schema.ParseResourceArg("deployments.v1.apps"); the schema package here is "k8s.io/apimachinery/pkg/runtime/schema".
```go
// GroupVersionResource unambiguously identifies a resource. It doesn't anonymously include GroupVersion
// to avoid automatic coercion. It doesn't use a GroupVersion to avoid custom marshalling
type GroupVersionResource struct {
    Group    string
    Version  string
    Resource string
}
```
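To make this concrete, here is a small self-contained sketch (the apps/v1 deployments GVR is only an example) showing the three ways of building the same GVR described above:
```go
package main

import (
    "fmt"

    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/runtime/schema"
)

func main() {
    // 1. Fill in the struct by hand: all three fields are plain strings.
    byHand := schema.GroupVersionResource{Group: "apps", Version: "v1", Resource: "deployments"}

    // 2. Derive it from an existing GroupVersion.
    fromGV := appsv1.SchemeGroupVersion.WithResource("deployments")

    // 3. Parse the "resource.version.group" form; the *GroupVersionResource
    // return value is non-nil only when all three segments are present.
    parsed, _ := schema.ParseResourceArg("deployments.v1.apps")

    fmt.Println(byHand == fromGV, byHand == *parsed) // true true
}
```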
Back to NewFilteredDynamicInformer: this function uses the GVR to build a sharedIndexInformer that watches one specific resource. Note that client.Resource(gvr) produces a dynamicResourceClient, which wraps a dynamic client, and the dynamic client is essentially a REST client.
```go
func NewFilteredDynamicInformer(client dynamic.Interface, gvr schema.GroupVersionResource, namespace string, resyncPeriod time.Duration, indexers cache.Indexers, tweakListOptions TweakListOptionsFunc) informers.GenericInformer {
    return &dynamicInformer{
        gvr: gvr,
        informer: cache.NewSharedIndexInformer(
            &cache.ListWatch{
                ListFunc: func(options metav1.ListOptions) (runtime.Object, error) {
                    if tweakListOptions != nil {
                        tweakListOptions(&options)
                    }
                    return client.Resource(gvr).Namespace(namespace).List(options)
                },
                WatchFunc: func(options metav1.ListOptions) (watch.Interface, error) {
                    if tweakListOptions != nil {
                        tweakListOptions(&options)
                    }
                    return client.Resource(gvr).Namespace(namespace).Watch(options)
                },
            },
            &unstructured.Unstructured{},
            resyncPeriod,
            indexers,
        ),
    }
}
```
So how does manipulating data through this REST client differ from using a typed client? Compare two Create signatures. The dynamic REST client: func (c *dynamicResourceClient) Create(obj *unstructured.Unstructured, opts metav1.CreateOptions, subresources ...string) (*unstructured.Unstructured, error). A typed client (this one generated for our own CRD): func (c *mpiruns) Create(mpirun *v1alpha1.Mpirun) (result *v1alpha1.Mpirun, err error). Because the latter targets one specific resource, it can take a concrete struct; the former traffics in the Unstructured type.
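For instance, creating an object through the dynamic client means building the payload as an Unstructured map by hand. A minimal sketch (a ConfigMap is used here for brevity; dynamicClient is assumed to be a dynamic.Interface, and the Create signature matches the client-go version quoted in this article):
```go
cm := &unstructured.Unstructured{
    Object: map[string]interface{}{
        "apiVersion": "v1",
        "kind":       "ConfigMap",
        "metadata":   map[string]interface{}{"name": "demo", "namespace": "default"},
        "data":       map[string]interface{}{"hello": "world"},
    },
}
gvr := schema.GroupVersionResource{Version: "v1", Resource: "configmaps"}
// Both the payload and the result are *unstructured.Unstructured.
created, err := dynamicClient.Resource(gvr).Namespace("default").Create(cm, metav1.CreateOptions{})
```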
The package comment explains the idea well: to process data structures generically, the data is decoded from JSON into an unstructured form, essentially an interface{} (a JSON-compatible map underneath); the receiving program can then handle it however it likes and convert it into whatever concrete type it needs.
```go
// Unstructured allows objects that do not have Golang structs registered to be manipulated
// generically. This can be used to deal with the API objects from a plug-in. Unstructured
// objects still have functioning TypeMeta features-- kind, version, etc.
//
// WARNING: This object has accessors for the v1 standard metadata. You *MUST NOT* use this
// type if you are dealing with objects that are not in the server meta v1 schema.
//
// TODO: make the serialization part of this type distinct from the field accessors.
// +k8s:deepcopy-gen:interfaces=k8s.io/apimachinery/pkg/runtime.Object
// +k8s:deepcopy-gen=true
type Unstructured struct {
    // Object is a JSON compatible map with string, float, int, bool, []interface{}, or
    // map[string]interface{}
    // children.
    Object map[string]interface{}
}
```
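Once you do know the concrete type, the runtime package can turn the generic map back into a typed struct. A minimal sketch (using a Deployment as the example target; toDeployment is a hypothetical helper, not a library function):
```go
import (
    appsv1 "k8s.io/api/apps/v1"
    "k8s.io/apimachinery/pkg/apis/meta/v1/unstructured"
    "k8s.io/apimachinery/pkg/runtime"
)

// toDeployment converts the JSON-compatible map carried by an Unstructured
// into a typed Deployment via reflection.
func toDeployment(u *unstructured.Unstructured) (*appsv1.Deployment, error) {
    var d appsv1.Deployment
    err := runtime.DefaultUnstructuredConverter.FromUnstructured(u.Object, &d)
    return &d, err
}
```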
For example, when registering a callback with the dynamic informer, the handler receives an obj interface{}. We can assert it to *unstructured.Unstructured and then use the helpers in k8s.io/apimachinery/pkg/apis/meta/v1/unstructured/unstructured.go, such as GetName() and GetNamespace(), to read the resource's information. Going deeper here would get into how Go's json package interacts with interface{}, which we won't expand on.
If you have read the pytorch operator source, you will find that even though it defines a CRD, it still handles data with the unstructured package rather than a sharedIndexInformer explicitly generated for the CRD with code-generator. In my opinion, if you want to work with a concrete data structure there is no need for the unstructured package; it is better suited to scenarios where the shape of the data is fuzzy. Back to the question at the start of this section: to watch many kinds of resources, especially multiple custom resources, with only sharedIndexInformer we would have to:
- create multiple sharedInformerFactories
- register the callbacks
- call Start() on each factory
With the dynamic informer we can instead:
- create a single factory
- create one generic informer per GVR
- register the callbacks
- call Start() once on the factory
Isn't the latter a bit simpler? A sketch of it follows.
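A minimal sketch of the second approach (the two GVRs and the empty AddFunc are placeholders; config and stopCh are assumed to exist):
```go
dc, err := dynamic.NewForConfig(config)
if err != nil {
    klog.Fatalf("failed to create dynamic client: %v", err)
}
factory := dynamicinformer.NewDynamicSharedInformerFactory(dc, 0)

// One generic informer per GVR, all registered on the same factory.
for _, gvr := range []schema.GroupVersionResource{
    {Group: "apps", Version: "v1", Resource: "deployments"},
    {Group: "", Version: "v1", Resource: "pods"},
} {
    factory.ForResource(gvr).Informer().AddEventHandler(cache.ResourceEventHandlerFuncs{
        AddFunc: func(obj interface{}) { /* handle the event */ },
    })
}

// A single Start() launches every informer created above.
factory.Start(stopCh)
```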
dynamic informer in practice
In practice, when the watched objects need complex processing, I lean toward sharedIndexInformer. For example, when writing one of our operators, the business logic around the CRD's fields was fairly involved, and a sharedIndexInformer made it much more straightforward to read the information of the watched resources.
For another small task, however, I tried the dynamic informer, for three reasons:
- My task was to collect the cluster's GPU usage. In the simplest case a few lines of script around kubectl describe nodes would do, but our scheduler is relatively complex: policies such as resource reservation would corrupt the correctness of the data obtained from describe, so I needed to watch the CRD resource podgroups (see kube-batch).
- We have added many new features on top of kube-batch and are still iterating quickly, so the code is unstable; I don't even know which fields might be added to or removed from the CRD. So I don't want to care what the CRD looks like at all; I only want to know how to read the data I need.
- Our code lives on GitLab, and go module currently cannot pull packages from private GitLab repositories (see the related issues here and here). We could use replace in go.mod, or a submodule, but I wanted a cleaner, more elegant way around it.
With the dynamic informer, my code became the following (excerpted):
```go
func main() {
    stopCh := signals.SetupSignalHandler()

    dc, err := dynamic.NewForConfig(config)
    if err != nil {
        klog.Fatalf("failed to create dynamic client: %v", err)
    }
    dynamicInformerFactory := dynamicinformer.NewDynamicSharedInformerFactory(dc, 0)
    gvr, _ := schema.ParseResourceArg("podgroups.v1alpha1.scheduling.incubator.k8s.io")
    podGroupsV1alpha1Informer := dynamicInformerFactory.ForResource(*gvr)

    metricsServer = server.NewMetricsServer(podGroupsV1alpha1Informer)

    dynamicInformerFactory.Start(stopCh)
    metricsServer.Run(stopCh)
}
```
I no longer need to worry about how to import the CRD package, which sidesteps the GitLab problem; nor do I need to care what the CRD struct looks like. Of course, I still have to know whether the value I want is a string or an int and the path it lives at; unstructured still works in JSON terms, so these basics remain necessary. For example, say I want to read a podgroup's queue (known to be a string at the path /spec/queue). I can register a handler like this in the eventHandler:
```go
func handleObject(obj interface{}) {
    unstruct, ok := obj.(*unstructured.Unstructured)
    if !ok {
        klog.Errorf("failed to convert obj to unstructured")
        return
    }
    klog.Infof("watch event of %s", unstruct.GetName())

    // NestedString returns (value, found, error); check both so that a
    // missing field is not silently read as "".
    queue, found, err := unstructured.NestedString(unstruct.Object, "spec", "queue")
    if err != nil || !found {
        klog.Errorf("failed to get queue (found=%v): %v", found, err)
        return
    }
    klog.Infof("queue is %s", queue)
}
```
The benefits:
- When the CRD's name, apiVersion, and so on change, I don't need to rework a SharedInformerFactory; updating the GVR is enough.
- When some fields of the CRD change, my code is untouched as long as the path to the data I care about stays the same; and even if the structure changes, I don't need to know the details, only to maintain a data type and a data path.
The output when it runs:
```
I1218 11:15:21.936303   98792 server.go:113] watch event of test1
I1218 11:15:21.936347   98792 server.go:119] queue is default
```
Summary
- controller-manager is the brain of Kubernetes, governing the lifecycle of its resources
- The dynamic informer decouples resources from the components that care about them