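In the previous post we saw that initializing a kube-apiserver starts with creating kubeAPIServerConfig, and that the core of kubeAPIServerConfig is building genericConfig. Building genericConfig covers etcd configuration, authentication, authorization, audit, admission, and flow control initialization. This post covers the last three: audit, admission, and flow control.
Audit initialization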
// cmd/kube-apiserver/app/server.go
func buildGenericConfig(
s *options.ServerRunOptions,
proxyTransport *http.Transport,
) (...) {
lastErr = s.Audit.ApplyTo(genericConfig)
if lastErr != nil {
return
}
}
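ApplyTo initializes auditing in the following steps:
- Create the policy rule evaluator
- Create the log backend (stores audit logs locally)
- Create the webhook backend (stores or processes audit logs remotely)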
func (o *AuditOptions) ApplyTo(
c *server.Config,
) error {
...
// 1. Build policy evaluator
evaluator, err := o.newPolicyRuleEvaluator()
if err != nil {
return err
}
// 2. Build log backend
var logBackend audit.Backend
w, err := o.LogOptions.getWriter()
if err != nil {
return err
}
if w != nil {
if evaluator == nil {
klog.V(2).Info("No audit policy file provided, no events will be recorded for log backend")
} else {
logBackend = o.LogOptions.newBackend(w)
}
}
// 3. Build webhook backend
var webhookBackend audit.Backend
if o.WebhookOptions.enabled() {
if evaluator == nil {
klog.V(2).Info("No audit policy file provided, no events will be recorded for webhook backend")
} else {
if c.EgressSelector != nil {
var egressDialer utilnet.DialFunc
egressDialer, err = c.EgressSelector.Lookup(egressselector.ControlPlane.AsNetworkContext())
if err != nil {
return err
}
webhookBackend, err = o.WebhookOptions.newUntruncatedBackend(egressDialer)
} else {
webhookBackend, err = o.WebhookOptions.newUntruncatedBackend(nil)
}
if err != nil {
return err
}
}
}
groupVersion, err := schema.ParseGroupVersion(o.WebhookOptions.GroupVersionString)
if err != nil {
return err
}
// 4. Apply dynamic options.
var dynamicBackend audit.Backend
if webhookBackend != nil {
// if only webhook is enabled wrap it in the truncate options
dynamicBackend = o.WebhookOptions.TruncateOptions.wrapBackend(webhookBackend, groupVersion)
}
// 5. Set the policy rule evaluator
c.AuditPolicyRuleEvaluator = evaluator
// 6. Join the log backend with the webhooks
c.AuditBackend = appendBackend(logBackend, dynamicBackend)
if c.AuditBackend != nil {
klog.V(2).Infof("Using audit backend: %s", c.AuditBackend)
}
return nil
}
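Creating the audit policy
The audit policy is the set of rules extracted from the user-supplied audit configuration file. It decides what gets written to the audit log: which requests, at which stage, on which resources, and for which operations. The policy file is specified with the following flag: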
--audit-policy-file="/path/to/audit/policy"
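The file mainly defines which events are recorded and what data each record includes. See the official documentation for the configuration details.
As mentioned earlier for authentication and authorization, both run on the DefaultBuildHandlerChain handler chain before the request is processed. Auditing is hooked into the same chain and also runs before the request is processed: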
func DefaultBuildHandlerChain(apiHandler http.Handler, c *Config) http.Handler {
...
handler = filterlatency.TrackCompleted(handler)
handler = genericapifilters.WithAudit(handler, c.AuditBackend, c.AuditPolicyRuleEvaluator, c.LongRunningFunc)
handler = filterlatency.TrackStarted(handler, "audit")
...
}
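To make the wrapping order concrete, here is a minimal sketch of how such a chain behaves. The withStep helper and the three step names are hypothetical stand-ins, not the real filters; the only point is that the filter applied last becomes the outermost one and therefore runs first.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withStep is a hypothetical filter: it records its name, then calls the next handler.
func withStep(name string, trace *[]string, next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		*trace = append(*trace, name)
		next.ServeHTTP(w, r)
	})
}

// buildChain wraps apiHandler step by step. Each assignment wraps the
// previous handler, so the last filter applied is the first one to run.
func buildChain(apiHandler http.Handler, trace *[]string) http.Handler {
	handler := apiHandler
	handler = withStep("authorization", trace, handler)
	handler = withStep("audit", trace, handler)
	handler = withStep("authentication", trace, handler)
	return handler
}

// runChain sends one request through the chain and returns the execution order.
func runChain() []string {
	var trace []string
	api := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		trace = append(trace, "api")
	})
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, "/api/v1/pods", nil)
	buildChain(api, &trace).ServeHTTP(rec, req)
	return trace
}

func main() {
	fmt.Println(runChain()) // prints "[authentication audit authorization api]"
}
```

This also explains why, in the real chain, TrackStarted is applied after the filter it measures: being the outer wrapper, it runs before that filter, while TrackCompleted runs right after it.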
Auditing distinguishes the following stages:
const (
StageRequestReceived Stage = "RequestReceived"
StageResponseStarted Stage = "ResponseStarted"
StageResponseComplete Stage = "ResponseComplete"
StagePanic Stage = "Panic"
)
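The file given by --audit-policy-file can specify, per rule, at which stages events are recorded and which stages are omitted. The audit filter installed in DefaultBuildHandlerChain records the RequestReceived stage before the request is handed to the next handler.
Creating the audit backends
A backend here is the driver that stores audit logs. It is defined by the following interfaces: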
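The way a policy maps a request and a stage to a decision can be sketched as follows. This is a simplified model, not the apiserver's evaluator: the rule struct, matches, evaluate, and examplePolicy are hypothetical names, only two of the four stages are declared, and we assume that the first matching rule wins and that an unmatched request is not audited.

```go
package main

import "fmt"

type Level string
type Stage string

const (
	LevelNone     Level = "None"
	LevelMetadata Level = "Metadata"
	LevelRequest  Level = "Request"

	StageRequestReceived  Stage = "RequestReceived"
	StageResponseComplete Stage = "ResponseComplete"
)

// rule is a hypothetical, simplified audit policy rule.
type rule struct {
	Level      Level
	Verbs      []string // empty matches every verb
	Resources  []string // empty matches every resource
	OmitStages []Stage  // stages that must not be recorded
}

// matches reports whether v is in list; an empty list matches everything.
func matches(list []string, v string) bool {
	if len(list) == 0 {
		return true
	}
	for _, item := range list {
		if item == v {
			return true
		}
	}
	return false
}

// evaluate returns the level of the first matching rule and whether an
// event should be recorded at the given stage.
func evaluate(rules []rule, verb, resource string, stage Stage) (Level, bool) {
	for _, r := range rules {
		if !matches(r.Verbs, verb) || !matches(r.Resources, resource) {
			continue
		}
		if r.Level == LevelNone {
			return LevelNone, false
		}
		for _, s := range r.OmitStages {
			if s == stage {
				return r.Level, false
			}
		}
		return r.Level, true
	}
	return LevelNone, false // assumption: no matching rule means no audit
}

// examplePolicy is an illustrative rule set, not a recommended policy.
func examplePolicy() []rule {
	return []rule{
		{Level: LevelNone, Verbs: []string{"get", "list", "watch"}, Resources: []string{"events"}},
		{Level: LevelRequest, Resources: []string{"pods"}, OmitStages: []Stage{StageRequestReceived}},
		{Level: LevelMetadata},
	}
}

func main() {
	rules := examplePolicy()
	level, record := evaluate(rules, "create", "pods", StageRequestReceived)
	fmt.Println(level, record) // prints "Request false"
	level, record = evaluate(rules, "create", "pods", StageResponseComplete)
	fmt.Println(level, record) // prints "Request true"
}
```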
type Sink interface {
ProcessEvents(events ...*auditinternal.Event) bool
}
type Backend interface {
Sink
Run(stopCh <-chan struct{}) error
Shutdown()
String() string
}
The local log backend is created as follows:
// vendor/k8s.io/apiserver/pkg/server/options/audit.go
var logBackend audit.Backend
w, err := o.LogOptions.getWriter()
if err != nil {
return err
}
if w != nil {
if evaluator == nil {
klog.V(2).Info("No audit policy file provided, no events will be recorded for log backend")
} else {
logBackend = o.LogOptions.newBackend(w)
}
}
func (o *AuditLogOptions) getWriter() (io.Writer, error) {
if !o.enabled() {
return nil, nil
}
if o.Path == "-" {
return os.Stdout, nil
}
if err := o.ensureLogFile(); err != nil {
return nil, fmt.Errorf("ensureLogFile: %w", err)
}
return &lumberjack.Logger{
Filename: o.Path,
MaxAge: o.MaxAge,
MaxBackups: o.MaxBackups,
MaxSize: o.MaxSize,
Compress: o.Compress,
}, nil
}
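As shown, audit logs are written to a local file through lumberjack.Logger, which implements the io.Writer interface (or to standard output when the path is "-").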
func (o *AuditLogOptions) newBackend(w io.Writer) audit.Backend {
groupVersion, _ := schema.ParseGroupVersion(o.GroupVersionString)
log := pluginlog.NewBackend(w, o.Format, groupVersion)
log = o.BatchOptions.wrapBackend(log)
log = o.TruncateOptions.wrapBackend(log, groupVersion)
return log
}
func NewBackend(out io.Writer, format string, groupVersion schema.GroupVersion) audit.Backend {
return &backend{
out: out,
format: format,
encoder: audit.Codecs.LegacyCodec(groupVersion),
}
}
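backend implements the Sink interface, so writing an audit log entry only requires calling the ProcessEvents method of the backend created at initialization. ProcessEvents ultimately calls the Write method of lumberjack.Logger.
Creating the webhook backend
The webhook backend is similar to the log backend, except that it sends audit events to the address specified in a configuration file. The receiving service can analyze the events or persist them to external storage. The webhook backend is defined as follows: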
type backend struct {
w *webhook.GenericWebhook
name string
}
type GenericWebhook struct {
RestClient *rest.RESTClient
RetryBackoff wait.Backoff
ShouldRetry func(error) bool
}
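GenericWebhook's RestClient member is a REST client from client-go. It is initialized from the file given by the --audit-webhook-config-file flag. That file uses the kubeconfig format, so the webhook endpoint is described the same way a cluster's API server address would be. This backend also implements the Backend interface.
An example configuration file follows; see the official documentation for details.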
apiVersion: v1
kind: Config
clusters:
- name: kube-auditing
  cluster:
    server: https://host:443/audit/webhook/event # webhook backend address
    insecure-skip-tls-verify: true
contexts:
- context:
    cluster: kube-auditing
    user: ""
  name: default-context
current-context: default-context
preferences: {}
users: []
var webhookBackend audit.Backend
if o.WebhookOptions.enabled() {
if evaluator == nil {
klog.V(2).Info("No audit policy file provided, no events will be recorded for webhook backend")
} else {
if c.EgressSelector != nil {
var egressDialer utilnet.DialFunc
egressDialer, err = c.EgressSelector.Lookup(egressselector.ControlPlane.AsNetworkContext())
if err != nil {
return err
}
webhookBackend, err = o.WebhookOptions.newUntruncatedBackend(egressDialer)
} else {
webhookBackend, err = o.WebhookOptions.newUntruncatedBackend(nil)
}
if err != nil {
return err
}
}
}
func (o *AuditWebhookOptions) newUntruncatedBackend(customDial utilnet.DialFunc) (audit.Backend, error) {
groupVersion, _ := schema.ParseGroupVersion(o.GroupVersionString)
webhook, err := pluginwebhook.NewBackend(o.ConfigFile, groupVersion, webhook.DefaultRetryBackoffWithInitialDelay(o.InitialBackoff), customDial)
if err != nil {
return nil, fmt.Errorf("initializing audit webhook: %v", err)
}
webhook = o.BatchOptions.wrapBackend(webhook)
return webhook, nil
}
func NewBackend(kubeConfigFile string, groupVersion schema.GroupVersion, retryBackoff wait.Backoff, customDial utilnet.DialFunc) (audit.Backend, error) {
w, err := loadWebhook(kubeConfigFile, groupVersion, retryBackoff, customDial)
if err != nil {
return nil, err
}
return &backend{w: w, name: PluginName}, nil
}
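Conceptually, the webhook backend serializes a batch of events and sends it to the configured endpoint, retrying on failure. The following is a minimal sketch of that idea using only net/http. Everything in it is an assumption made for illustration: the event struct, the retry count without backoff, the example URL, and the flakyTransport stub that replaces a real remote service so the example runs without a network.

```go
package main

import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// event is a hypothetical, heavily reduced audit event.
type event struct {
	Stage string `json:"stage"`
	Verb  string `json:"verb"`
}

// webhookBackend posts batches of events to url, retrying a fixed number of times.
type webhookBackend struct {
	url     string
	client  *http.Client
	retries int
}

// ProcessEvents reports true as soon as one attempt gets a 2xx response.
func (b *webhookBackend) ProcessEvents(events ...*event) bool {
	body, err := json.Marshal(events)
	if err != nil {
		return false
	}
	for attempt := 0; attempt <= b.retries; attempt++ {
		resp, err := b.client.Post(b.url, "application/json", bytes.NewReader(body))
		if err != nil {
			continue
		}
		resp.Body.Close()
		if resp.StatusCode >= 200 && resp.StatusCode < 300 {
			return true
		}
	}
	return false
}

// flakyTransport stands in for the remote webhook: it answers 503 for the
// first `failures` requests and 200 afterwards.
type flakyTransport struct {
	failures int
	calls    int
}

func (t *flakyTransport) RoundTrip(req *http.Request) (*http.Response, error) {
	if req.Body != nil {
		req.Body.Close()
	}
	t.calls++
	status := http.StatusOK
	if t.calls <= t.failures {
		status = http.StatusServiceUnavailable
	}
	return &http.Response{StatusCode: status, Header: make(http.Header), Body: http.NoBody, Request: req}, nil
}

// deliver sends one event and returns how many requests were made and
// whether delivery finally succeeded.
func deliver(failures, retries int) (int, bool) {
	transport := &flakyTransport{failures: failures}
	b := &webhookBackend{
		url:     "http://audit.example.invalid/events", // placeholder address
		client:  &http.Client{Transport: transport},
		retries: retries,
	}
	ok := b.ProcessEvents(&event{Stage: "ResponseComplete", Verb: "create"})
	return transport.calls, ok
}

func main() {
	calls, ok := deliver(2, 3)
	fmt.Println(calls, ok) // prints "3 true"
}
```

The real backend waits between attempts according to its RetryBackoff setting; the sketch retries immediately to stay short.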
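Once the log backend and the webhook backend are created, they are combined into a union struct, which also implements the Backend interface (when only one of them exists, it is used directly). The result is assigned to genericConfig.AuditBackend. When audit events are processed, union's ProcessEvents method is called, and its for loop invokes each backend in turn: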
c.AuditBackend = appendBackend(logBackend, dynamicBackend)
func appendBackend(existing, newBackend audit.Backend) audit.Backend {
if existing == nil {
return newBackend
}
if newBackend == nil {
return existing
}
return audit.Union(existing, newBackend)
}
func Union(backends ...Backend) Backend {
if len(backends) == 1 {
return backends[0]
}
return union{backends}
}
type union struct {
backends []Backend
}
func (u union) ProcessEvents(events ...*auditinternal.Event) bool {
success := true
for _, backend := range u.backends {
success = backend.ProcessEvents(events...) && success
}
return success
}
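One detail of the loop above is easy to miss: the backend call sits on the left of &&, so it is always evaluated, and a failure in one backend does not stop the others from receiving the events. A small sketch with hypothetical fake backends (event, sink, and fakeBackend are illustrative names) shows this:

```go
package main

import "fmt"

// event is a hypothetical, heavily reduced audit event.
type event struct{ Verb string }

// sink mirrors the shape of the Sink interface with the simplified event type.
type sink interface {
	ProcessEvents(events ...*event) bool
}

// fakeBackend counts the events it receives and returns a fixed result.
type fakeBackend struct {
	ok   bool
	seen int
}

func (f *fakeBackend) ProcessEvents(events ...*event) bool {
	f.seen += len(events)
	return f.ok
}

// union fans events out to every backend.
type union struct {
	backends []sink
}

func (u union) ProcessEvents(events ...*event) bool {
	success := true
	for _, b := range u.backends {
		// The call is evaluated before `&& success`, so an earlier
		// failure never skips a later backend.
		success = b.ProcessEvents(events...) && success
	}
	return success
}

func main() {
	failing := &fakeBackend{ok: false}
	healthy := &fakeBackend{ok: true}
	u := union{backends: []sink{failing, healthy}}

	ok := u.ProcessEvents(&event{Verb: "get"}, &event{Verb: "list"})
	fmt.Println(ok, failing.seen, healthy.seen) // prints "false 2 2"
}
```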
That completes audit initialization.
Admission initialization
err = s.Admission.ApplyTo(
genericConfig,
versionedInformers,
kubeClientConfig,
utilfeature.DefaultFeatureGate,
pluginInitializers...)
if err != nil {
lastErr = fmt.Errorf("failed to initialize admission: %v", err)
return
}
Let's first look at how s.Admission is created:
// cmd/kube-apiserver/app/options/options.go
func NewServerRunOptions() *ServerRunOptions {
s := ServerRunOptions{
...
Admission: kubeoptions.NewAdmissionOptions(),
...
}
}
func NewAdmissionOptions() *AdmissionOptions {
options := genericoptions.NewAdmissionOptions()
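// Register all admission plugins
RegisterAllAdmissionPlugins(options.Plugins)
// Set the order in which admission plugins run
options.RecommendedPluginOrder = AllOrderedPlugins
// Set the admission plugins that are disabled by default
options.DefaultOffPlugins = DefaultOffAdmissionPlugins()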
return &AdmissionOptions{
GenericAdmission: options,
}
}
Initializing s.Admission does three things:
- Register the admission plugins. Take the TaintNodesByCondition admission plugin as an example:
// pkg/kubeapiserver/options/plugins.go
func RegisterAllAdmissionPlugins(plugins *admission.Plugins) {
...
nodetaint.Register(plugins)
...
}
// plugin/pkg/admission/nodetaint/admission.go
func Register(plugins *admission.Plugins) {
plugins.Register(PluginName, func(config io.Reader) (admission.Interface, error) {
return NewPlugin(), nil
})
}
func (ps *Plugins) Register(name string, plugin Factory) {
ps.lock.Lock()
defer ps.lock.Unlock()
if ps.registry != nil {
_, found := ps.registry[name]
if found {
klog.Fatalf("Admission plugin %q was registered twice", name)
}
} else {
ps.registry = map[string]Factory{}
}
klog.V(1).InfoS("Registered admission plugin", "plugin", name)
ps.registry[name] = plugin
}
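Clearly, registration just fills a map whose key is the plugin name and whose value is the function that creates the plugin (the factory pattern).
- Set the order in which admission plugins run
- Set the admission plugins that are disabled by default (DefaultOffPlugins)
Next, let's see what ApplyTo does: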
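The registration step from the first item can be reduced to a few lines. The sketch below is a simplified model with hypothetical names (plugin, factory, registry, nodeTaint); unlike the real Register, which aborts the process on a duplicate name, it returns an error so the behavior is easy to observe.

```go
package main

import (
	"fmt"
	"sync"
)

// plugin is a hypothetical stand-in for an admission plugin.
type plugin interface {
	Handles(op string) bool
}

// factory creates a plugin instance on demand.
type factory func() (plugin, error)

// registry maps plugin names to their factories.
type registry struct {
	mu        sync.Mutex
	factories map[string]factory
}

// Register stores the factory under name and rejects duplicates.
func (r *registry) Register(name string, f factory) error {
	r.mu.Lock()
	defer r.mu.Unlock()
	if r.factories == nil {
		r.factories = map[string]factory{}
	}
	if _, found := r.factories[name]; found {
		return fmt.Errorf("admission plugin %q was registered twice", name)
	}
	r.factories[name] = f
	return nil
}

// New looks up the factory and builds the plugin; found is false for unknown names.
func (r *registry) New(name string) (p plugin, found bool, err error) {
	r.mu.Lock()
	defer r.mu.Unlock()
	f, found := r.factories[name]
	if !found {
		return nil, false, nil
	}
	p, err = f()
	return p, true, err
}

// nodeTaint is an illustrative plugin that only handles CREATE.
type nodeTaint struct{}

func (nodeTaint) Handles(op string) bool { return op == "CREATE" }

func main() {
	var reg registry
	newNodeTaint := func() (plugin, error) { return nodeTaint{}, nil }

	fmt.Println(reg.Register("TaintNodesByCondition", newNodeTaint)) // <nil>
	fmt.Println(reg.Register("TaintNodesByCondition", newNodeTaint)) // duplicate error

	p, found, err := reg.New("TaintNodesByCondition")
	fmt.Println(found, err, p.Handles("CREATE"))
}
```

Storing factories rather than instances means nothing is constructed until the set of enabled plugins is known, which is what NewFromPlugins relies on later.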
func (a *AdmissionOptions) ApplyTo(
c *server.Config,
informers informers.SharedInformerFactory,
kubeAPIServerClientConfig *rest.Config,
features featuregate.FeatureGate,
pluginInitializers ...admission.PluginInitializer,
) error {
...
admissionChain, err := a.Plugins.NewFromPlugins(pluginNames, pluginsConfigProvider, initializersChain, a.Decorators)
if err != nil {
return err
}
c.AdmissionControl = admissionmetrics.WithStepMetrics(admissionChain)
return nil
}
First, NewFromPlugins:
func (ps *Plugins) NewFromPlugins(pluginNames []string, configProvider ConfigProvider, pluginInitializer PluginInitializer, decorator Decorator) (Interface, error) {
handlers := []Interface{}
mutationPlugins := []string{}
validationPlugins := []string{}
for _, pluginName := range pluginNames {
pluginConfig, err := configProvider.ConfigFor(pluginName)
if err != nil {
return nil, err
}
plugin, err := ps.InitPlugin(pluginName, pluginConfig, pluginInitializer)
if err != nil {
return nil, err
}
if plugin != nil {
if decorator != nil {
handlers = append(handlers, decorator.Decorate(plugin, pluginName))
} else {
handlers = append(handlers, plugin)
}
if _, ok := plugin.(MutationInterface); ok {
mutationPlugins = append(mutationPlugins, pluginName)
}
if _, ok := plugin.(ValidationInterface); ok {
validationPlugins = append(validationPlugins, pluginName)
}
}
}
...
return newReinvocationHandler(chainAdmissionHandler(handlers)), nil
}
func (ps *Plugins) InitPlugin(name string, config io.Reader, pluginInitializer PluginInitializer) (Interface, error) {
...
plugin, found, err := ps.getPlugin(name, config)
...
}
func (ps *Plugins) getPlugin(name string, config io.Reader) (Interface, bool, error) {
ps.lock.Lock()
defer ps.lock.Unlock()
f, found := ps.registry[name]
if !found {
return nil, false, nil
}
config1, config2, err := splitStream(config)
if err != nil {
return nil, true, err
}
if !PluginEnabledFn(name, config1) {
return nil, true, nil
}
ret, err := f(config2)
return ret, true, err
}
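This loop calls the plugin factories registered in the map earlier, collects the resulting plugin instances in a slice of Interface, and then converts the slice to the chainAdmissionHandler type, which implements the following two interfaces: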
type MutationInterface interface {
Interface
// Admit makes an admission decision based on the request attributes.
// Context is used only for timeout/deadline/cancellation and tracing information.
Admit(ctx context.Context, a Attributes, o ObjectInterfaces) (err error)
}
// ValidationInterface is an abstract, pluggable interface for Admission Control decisions.
type ValidationInterface interface {
Interface
// Validate makes an admission decision based on the request attributes. It is NOT allowed to mutate
// Context is used only for timeout/deadline/cancellation and tracing information.
Validate(ctx context.Context, a Attributes, o ObjectInterfaces) (err error)
}
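Now the key point: when a resource such as a Pod is created, the mutating admission plugins run first, followed by the validating ones. These plugins are the instances created by the factory functions during initialization. Since there are several of them, they are linked into a chain in order and executed one after another in a for loop: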
func (admissionHandler chainAdmissionHandler) Admit(ctx context.Context, a Attributes, o ObjectInterfaces) error {
for _, handler := range admissionHandler {
if !handler.Handles(a.GetOperation()) {
continue
}
if mutator, ok := handler.(MutationInterface); ok {
err := mutator.Admit(ctx, a, o)
if err != nil {
return err
}
}
}
return nil
}
// Validate performs an admission control check using a chain of handlers, and returns immediately on first error
func (admissionHandler chainAdmissionHandler) Validate(ctx context.Context, a Attributes, o ObjectInterfaces) error {
for _, handler := range admissionHandler {
if !handler.Handles(a.GetOperation()) {
continue
}
if validator, ok := handler.(ValidationInterface); ok {
err := validator.Validate(ctx, a, o)
if err != nil {
return err
}
}
}
return nil
}
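The two loops above can be exercised with a toy chain. In this sketch the pod struct, the onCreate helper, and the two plugins (defaultTeamLabel, requireName) are hypothetical; the chain type follows the same pattern of checking Handles and then type-asserting each handler to the mutating or validating interface.

```go
package main

import (
	"errors"
	"fmt"
)

// pod is a hypothetical, heavily reduced object under admission.
type pod struct {
	Name   string
	Labels map[string]string
}

type handler interface{ Handles(op string) bool }

type mutator interface {
	handler
	Admit(p *pod) error
}

type validator interface {
	handler
	Validate(p *pod) error
}

// chain runs its handlers in order and stops at the first error.
type chain []handler

func (c chain) Admit(op string, p *pod) error {
	for _, h := range c {
		if !h.Handles(op) {
			continue
		}
		if m, ok := h.(mutator); ok {
			if err := m.Admit(p); err != nil {
				return err
			}
		}
	}
	return nil
}

func (c chain) Validate(op string, p *pod) error {
	for _, h := range c {
		if !h.Handles(op) {
			continue
		}
		if v, ok := h.(validator); ok {
			if err := v.Validate(p); err != nil {
				return err
			}
		}
	}
	return nil
}

// onCreate is embedded by plugins that only handle CREATE.
type onCreate struct{}

func (onCreate) Handles(op string) bool { return op == "CREATE" }

// defaultTeamLabel is a mutating plugin: it fills in a missing "team" label.
type defaultTeamLabel struct{ onCreate }

func (defaultTeamLabel) Admit(p *pod) error {
	if p.Labels == nil {
		p.Labels = map[string]string{}
	}
	if _, ok := p.Labels["team"]; !ok {
		p.Labels["team"] = "unassigned"
	}
	return nil
}

// requireName is a validating plugin: it rejects pods without a name.
type requireName struct{ onCreate }

func (requireName) Validate(p *pod) error {
	if p.Name == "" {
		return errors.New("pod name must not be empty")
	}
	return nil
}

// admit runs the mutating pass first, then the validating pass.
func admit(c chain, op string, p *pod) error {
	if err := c.Admit(op, p); err != nil {
		return err
	}
	return c.Validate(op, p)
}

func main() {
	c := chain{requireName{}, defaultTeamLabel{}}
	p := &pod{Name: "web"}
	fmt.Println(admit(c, "CREATE", p), p.Labels["team"])
	fmt.Println(admit(c, "CREATE", &pod{}))
}
```

Note that the position of a plugin in the slice only matters relative to plugins of the same kind: the whole mutating pass finishes before any validator runs.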
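When NewFromPlugins returns, the admission plugin chain is complete. The chain is then wrapped in a metrics layer, so metrics code runs before and after the plugins execute. The metrics shown below mainly measure how long the plugins take: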
func WithMetrics(i admission.Interface, observer ObserverFunc, extraLabels ...string) admission.Interface {
return &pluginHandlerWithMetrics{
Interface: i,
observer: observer,
extraLabels: extraLabels,
}
}
func (p pluginHandlerWithMetrics) Admit(ctx context.Context, a admission.Attributes, o admission.ObjectInterfaces) error {
mutatingHandler, ok := p.Interface.(admission.MutationInterface)
if !ok {
return nil
}
start := time.Now()
err := mutatingHandler.Admit(ctx, a, o)
p.observer(ctx, time.Since(start), err != nil, a, stepAdmit, p.extraLabels...)
return err
}
After wrapping, the metrics-enabled admission chain is assigned to c.AdmissionControl, and admission initialization is complete.
Flow control initialization
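Kubernetes has two flow control mechanisms:
- MaxInFlightLimit: counts the requests currently in flight and, once the limit is reached, responds with "Too many requests, please try again later"
- APIPriorityAndFairness: unlike MaxInFlightLimit, it throttles based on priority and fairness and distinguishes requests by importance. The official documentation describes it as follows:
API Priority and Fairness (APF) is an alternative that improves upon the max-inflight limits described above. APF classifies and isolates requests in a more fine-grained way. It also introduces a limited amount of queuing, so that no requests are rejected in cases of very brief bursts. Requests are dispatched from queues using a fair queuing technique so that, for example, a poorly behaved controller need not starve others (even at the same priority level).
Flow control is also handled on the DefaultBuildHandlerChain chain. As the code shows, when c.FlowControl is non-nil the APIPriorityAndFairness feature performs flow control; otherwise MaxInFlightLimit is used: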
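To get an intuition for why queuing per flow protects well-behaved clients, consider the following sketch. It is deliberately much simpler than APF: it uses plain round-robin over per-flow queues, whereas the real feature classifies requests into priority levels and uses a more elaborate fair queuing algorithm. The request struct and dispatchFair are hypothetical names.

```go
package main

import "fmt"

// request is a hypothetical queued request, tagged with the flow it belongs to.
type request struct {
	Flow string
	ID   int
}

// dispatchFair picks up to n requests by visiting the per-flow queues
// round-robin, so a flow with many pending requests cannot crowd out
// a flow with few.
func dispatchFair(pending []request, n int) []request {
	queues := map[string][]request{}
	var order []string // flows in order of first appearance
	for _, r := range pending {
		if _, ok := queues[r.Flow]; !ok {
			order = append(order, r.Flow)
		}
		queues[r.Flow] = append(queues[r.Flow], r)
	}

	var out []request
	for len(out) < n {
		progressed := false
		for _, flow := range order {
			q := queues[flow]
			if len(q) == 0 {
				continue
			}
			out = append(out, q[0])
			queues[flow] = q[1:]
			progressed = true
			if len(out) == n {
				break
			}
		}
		if !progressed {
			break // every queue is empty
		}
	}
	return out
}

func main() {
	pending := []request{
		{"noisy-controller", 0}, {"noisy-controller", 1},
		{"noisy-controller", 2}, {"noisy-controller", 3},
		{"quiet-controller", 0},
	}
	// With two free slots, the quiet controller still gets one of them.
	fmt.Println(dispatchFair(pending, 2))
}
```

With a single shared FIFO queue, both slots would have gone to the noisy controller.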
func DefaultBuildHandlerChain(apiHandler http.Handler, c *Config) http.Handler {
...
if c.FlowControl != nil {
workEstimatorCfg := flowcontrolrequest.DefaultWorkEstimatorConfig()
requestWorkEstimator := flowcontrolrequest.NewWorkEstimator(
c.StorageObjectCountTracker.Get, c.FlowControl.GetInterestedWatchCount, workEstimatorCfg)
handler = filterlatency.TrackCompleted(handler)
handler = genericfilters.WithPriorityAndFairness(handler, c.LongRunningFunc, c.FlowControl, requestWorkEstimator)
handler = filterlatency.TrackStarted(handler, "priorityandfairness")
} else {
handler = genericfilters.WithMaxInFlightLimit(handler, c.MaxRequestsInFlight, c.MaxMutatingRequestsInFlight, c.LongRunningFunc)
}
...
}
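The fallback in the else branch is conceptually a counting semaphore in front of the handler. The sketch below shows that idea with a buffered channel; it is not the apiserver's implementation, which keeps separate limits for mutating and read-only requests and exempts long-running ones. withMaxInFlight and probe are hypothetical names.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
)

// withMaxInFlight rejects requests with 429 once `limit` requests are
// already being served.
func withMaxInFlight(next http.Handler, limit int) http.Handler {
	slots := make(chan struct{}, limit)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		select {
		case slots <- struct{}{}: // a slot is free: take it
			defer func() { <-slots }()
			next.ServeHTTP(w, r)
		default: // all slots are taken: reject immediately
			http.Error(w, "Too many requests, please try again later.", http.StatusTooManyRequests)
		}
	})
}

// probe returns the status seen by a request that arrives while the only
// slot is busy, and by one that arrives after the slot has been released.
func probe() (busy, idle int) {
	entered := make(chan struct{}, 2)
	release := make(chan struct{})
	slow := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		entered <- struct{}{}
		<-release
	})
	h := withMaxInFlight(slow, 1)

	done := make(chan struct{})
	go func() {
		h.ServeHTTP(httptest.NewRecorder(), httptest.NewRequest(http.MethodGet, "/", nil))
		close(done)
	}()
	<-entered // the first request now holds the only slot

	rec := httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	busy = rec.Code

	close(release) // let the first request finish
	<-done

	rec = httptest.NewRecorder()
	h.ServeHTTP(rec, httptest.NewRequest(http.MethodGet, "/", nil))
	idle = rec.Code
	return busy, idle
}

func main() {
	busy, idle := probe()
	fmt.Println(busy, idle) // prints "429 200"
}
```

Rejecting immediately, with no queue, is what makes this mechanism coarse: a short burst from one client is enough to turn away everyone else.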
Since version 1.20, this feature is enabled by default:
if utilfeature.DefaultFeatureGate.Enabled(genericfeatures.APIPriorityAndFairness) && s.GenericServerRunOptions.EnablePriorityAndFairness {
genericConfig.FlowControl, lastErr = BuildPriorityAndFairness(s, clientgoExternalClient, versionedInformers)
}
As for what flow control initialization actually does, we will cover it in detail in a later post dedicated to flow control.