Top 3 Advanced K8s Interview Questions

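1. The Complete Pod Creation Flow

What it tests

Understanding of how the core K8s components work together

Answer

When you run kubectl create -f pod.yaml, the full flow is as follows: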

1) kubectl → API Server

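kubectl converts the YAML to JSON and sends an HTTP POST request to the API Server
The API Server runs the request through three gates:
- Authentication: verifies who the caller is (client certificates, tokens, OIDC, etc.)
- Authorization: checks whether the caller is allowed to perform the action (typically RBAC)
- Admission Control: mutating webhooks run first, then validating webhooks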

2) Persisting to etcd

The API Server writes the Pod object to etcd
At this point the Pod's phase is Pending and its nodeName field is empty

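3) Scheduling by the Scheduler

The Scheduler watches the API Server and picks up Pods that have no node assigned
Filtering phase (historically called Predicates): rules out nodes that cannot run the Pod
- Whether enough resources (CPU, memory) are available
- Whether NodeSelector / NodeAffinity match
- Whether the node's Taints are covered by the Pod's Tolerations
Scoring phase (historically called Priorities): ranks the remaining nodes
- Resource balance, affinity weights, and so on
The highest-scoring node wins; the Scheduler binds the Pod to it through the API Server, which sets the Pod's nodeName field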
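
The filter-then-score logic can be illustrated with a small Go sketch. It is a toy model under stated assumptions, not the real kube-scheduler: the Node and PodSpec types, the single boolean taint, and the "most free resources wins" scoring rule are all hypothetical simplifications.

```go
package main

import "fmt"

// Node is a simplified, hypothetical view of a cluster node.
type Node struct {
	Name       string
	FreeCPU    int // millicores still allocatable
	FreeMemory int // MiB still allocatable
	Labels     map[string]string
	Tainted    bool // true if the node carries a NoSchedule taint
}

// PodSpec is a simplified, hypothetical view of a Pod's scheduling inputs.
type PodSpec struct {
	CPU          int
	Memory       int
	NodeSelector map[string]string
	Tolerates    bool // true if the Pod tolerates the taint
}

// feasible is the filtering phase: resources, node selector, taints.
func feasible(p PodSpec, n Node) bool {
	if n.FreeCPU < p.CPU || n.FreeMemory < p.Memory {
		return false
	}
	for k, v := range p.NodeSelector {
		if n.Labels[k] != v {
			return false
		}
	}
	return !n.Tainted || p.Tolerates
}

// score is the scoring phase: a toy rule that prefers the node with the
// most resources left after placing the Pod.
func score(p PodSpec, n Node) int {
	return (n.FreeCPU - p.CPU) + (n.FreeMemory - p.Memory)
}

// schedule returns the chosen node name, or "" if no node is feasible.
func schedule(p PodSpec, nodes []Node) string {
	best, bestScore := "", -1
	for _, n := range nodes {
		if !feasible(p, n) {
			continue
		}
		if s := score(p, n); s > bestScore {
			best, bestScore = n.Name, s
		}
	}
	return best
}

func main() {
	nodes := []Node{
		{Name: "node-a", FreeCPU: 500, FreeMemory: 512, Labels: map[string]string{"disk": "ssd"}},
		{Name: "node-b", FreeCPU: 4000, FreeMemory: 8192, Labels: map[string]string{"disk": "hdd"}},
		{Name: "node-c", FreeCPU: 2000, FreeMemory: 4096, Labels: map[string]string{"disk": "ssd"}},
		{Name: "node-d", FreeCPU: 8000, FreeMemory: 16384, Labels: map[string]string{"disk": "ssd"}, Tainted: true},
	}
	pod := PodSpec{CPU: 1000, Memory: 1024, NodeSelector: map[string]string{"disk": "ssd"}}
	fmt.Println(schedule(pod, nodes)) // prints "node-c"
}
```

Here node-a lacks CPU, node-b fails the node selector, and node-d is tainted, so only node-c survives filtering. An empty result corresponds to a Pod that stays Pending.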

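4) Kubelet creates the containers

The Kubelet on the target node watches for Pods assigned to it
It calls the CRI (Container Runtime Interface) to create the containers
- containerd / CRI-O → runc
The CNI (Container Network Interface) plugin configures networking; it is invoked by the container runtime while the Pod sandbox is set up
- Assigns the Pod IP and sets up routing rules
Storage is mounted through CSI (Container Storage Interface) if the Pod uses volumes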

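5) Status reporting and service discovery

The Kubelet keeps reporting the Pod's status to the API Server
Once the probes pass:
- readinessProbe succeeds → the Pod is added to the Endpoints of matching Services
- Services can then forward traffic to the Pod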

┌──────────┐    ┌────────────┐    ┌────────┐    ┌───────────┐    ┌─────────┐
│ kubectl  │───▶│ API Server │───▶│  etcd  │    │ Scheduler │    │ Kubelet │
└──────────┘    └────────────┘    └────────┘    └───────────┘    └─────────┘
                      │                              │                 │
                      │◀─────────── watch ───────────┤                 │
                      │                              │                 │
                      │◀──────────────────── watch ──┼─────────────────┤
                      │                              │                 │
                      │         bind Pod to Node     │                 │
                      │◀─────────────────────────────┤                 │
                      │                              │                 │
                      │                              │      CRI/CNI/CSI│
                      │                              │           ┌─────▼─────┐
                      │                              │           │ Container │
                      │                              │           └───────────┘

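2. How Service Traffic Forwarding Works (How ClusterIP Works)

What it tests

Network model, kube-proxy modes, iptables/IPVS internals

Answer

1) The relationship between Service and Endpoints

A Service selects Pods through its selector
The Endpoints Controller maintains the Endpoints object automatically
The Endpoints object lists the IP:Port of every ready Pod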

# Service
apiVersion: v1
kind: Service
metadata:
  name: my-service
spec:
  selector:
    app: my-app
  ports:
    - port: 80
      targetPort: 8080

# Auto-generated Endpoints
apiVersion: v1
kind: Endpoints
metadata:
  name: my-service
subsets:
  - addresses:
      - ip: 10.244.1.5
      - ip: 10.244.2.8
    ports:
      - port: 8080
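
How a selector turns into an address list can be sketched in Go. This is only an illustration of the matching rule (every selector key must match, and the Pod must be ready); the Pod type and the readyAddresses function are hypothetical and are not the Endpoints Controller's actual code.

```go
package main

import "fmt"

// Pod is a simplified, hypothetical view of a Pod.
type Pod struct {
	IP     string
	Labels map[string]string
	Ready  bool
}

// matches reports whether every key/value in selector is present in labels.
func matches(selector, labels map[string]string) bool {
	for k, v := range selector {
		if labels[k] != v {
			return false
		}
	}
	return true
}

// readyAddresses returns the IPs of ready Pods selected by the selector.
// A Service without a selector gets no automatically managed Endpoints,
// so an empty selector yields nothing here.
func readyAddresses(selector map[string]string, pods []Pod) []string {
	if len(selector) == 0 {
		return nil
	}
	var ips []string
	for _, p := range pods {
		if p.Ready && matches(selector, p.Labels) {
			ips = append(ips, p.IP)
		}
	}
	return ips
}

func main() {
	pods := []Pod{
		{IP: "10.244.1.5", Labels: map[string]string{"app": "my-app"}, Ready: true},
		{IP: "10.244.2.8", Labels: map[string]string{"app": "my-app"}, Ready: true},
		{IP: "10.244.3.9", Labels: map[string]string{"app": "my-app"}, Ready: false},
		{IP: "10.244.1.7", Labels: map[string]string{"app": "other"}, Ready: true},
	}
	fmt.Println(readyAddresses(map[string]string{"app": "my-app"}, pods))
	// prints [10.244.1.5 10.244.2.8]
}
```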

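2) The three kube-proxy modes

Mode	How it works	Trade-offs
userspace	kube-proxy itself proxies the traffic in user space	Poor performance; deprecated and removed in v1.26
iptables	DNAT via iptables rules	Default mode on Linux; performance degrades as the number of rules grows
IPVS	Uses the kernel IPVS module	High performance; supports multiple load-balancing algorithms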

3) iptables mode in detail

Traffic path:

Client Pod → ClusterIP:Port
    │
    ▼
PREROUTING chain
    │
    ▼
KUBE-SERVICES chain (matches the ClusterIP)
    │
    ▼
KUBE-SVC-XXXX chain (the chain for this Service)
    │
    ▼ (picks a backend at random)
KUBE-SEP-YYYY chain (the chain for one Endpoint)
    │
    ▼ (DNAT)
Target Pod IP:TargetPort

Core iptables rules (example):

# Match the Service ClusterIP and jump to the Service chain
-A KUBE-SERVICES -d 10.96.0.100/32 -p tcp --dport 80 -j KUBE-SVC-XXXX

# Service chain: pick a backend at random (probabilistic load balancing)
-A KUBE-SVC-XXXX -m statistic --mode random --probability 0.5 -j KUBE-SEP-AAA
-A KUBE-SVC-XXXX -j KUBE-SEP-BBB

# Endpoint chain: DNAT to the actual Pod
-A KUBE-SEP-AAA -p tcp -j DNAT --to-destination 10.244.1.5:8080
-A KUBE-SEP-BBB -p tcp -j DNAT --to-destination 10.244.2.8:8080
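
The 0.5 in the rules above is not arbitrary. Because the rules are evaluated in order, rule i of n only sees the traffic that earlier rules did not take, so it needs probability 1/(n-i) for every backend to end up with an equal 1/n share. The following Go sketch works through that arithmetic; the function names are illustrative, and the final value 1.0 stands for the last, unconditional rule.

```go
package main

import "fmt"

// ruleProbabilities returns the per-rule probability for n endpoints,
// evaluated in order. The last entry is 1.0 (an unconditional jump).
func ruleProbabilities(n int) []float64 {
	probs := make([]float64, n)
	for i := range probs {
		probs[i] = 1.0 / float64(n-i)
	}
	return probs
}

// effectiveShares converts sequential per-rule probabilities into the
// overall fraction of traffic each endpoint receives.
func effectiveShares(probs []float64) []float64 {
	shares := make([]float64, len(probs))
	remaining := 1.0 // traffic not yet claimed by an earlier rule
	for i, p := range probs {
		shares[i] = remaining * p
		remaining -= shares[i]
	}
	return shares
}

func main() {
	probs := ruleProbabilities(3)
	fmt.Printf("%.3f\n", probs)                  // [0.333 0.500 1.000]
	fmt.Printf("%.3f\n", effectiveShares(probs)) // [0.333 0.333 0.333]
}
```

With two endpoints this yields 0.5 followed by an unconditional rule, which matches the example rules.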

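4) Advantages of IPVS mode

Better performance: Services are looked up in a hash table, O(1), instead of walking a rule list
More load-balancing algorithms:
- rr (round robin)
- lc (least connection)
- dh (destination hashing)
- sh (source hashing)
- sed (shortest expected delay)
- nq (never queue)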
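
To make the difference between two of these schedulers concrete, here is a minimal Go sketch contrasting rr and sh. It only models the observable behaviour (rr cycles through backends, sh pins a client IP to one backend); the kernel's IPVS implementation uses its own data structures and hash function, and the FNV hash and type names here are assumptions made for illustration.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// roundRobin cycles through the backends in order (the "rr" idea).
type roundRobin struct {
	backends []string
	next     int
}

func (r *roundRobin) pick() string {
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

// sourceHash maps a client IP to a fixed backend (the "sh" idea), so the
// same client keeps reaching the same Pod while the backend set is stable.
func sourceHash(backends []string, clientIP string) string {
	h := fnv.New32a()
	h.Write([]byte(clientIP))
	return backends[h.Sum32()%uint32(len(backends))]
}

func main() {
	backends := []string{"10.244.1.5:8080", "10.244.2.8:8080"}

	rr := &roundRobin{backends: backends}
	fmt.Println(rr.pick(), rr.pick(), rr.pick()) // alternates between the two backends

	first := sourceHash(backends, "10.244.3.20")
	second := sourceHash(backends, "10.244.3.20")
	fmt.Println(first == second) // prints true: same source, same backend
}
```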

Enabling IPVS:

# kube-proxy ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: kube-proxy
  namespace: kube-system
data:
  config.conf: |
    apiVersion: kubeproxy.config.k8s.io/v1alpha1
    kind: KubeProxyConfiguration
    mode: "ipvs"
    ipvs:
      scheduler: "rr"

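3. How to Guarantee Zero Downtime When a Pod Is Evicted or Deleted

What it tests

Graceful termination, keeping production stable

Answer

1) Pod termination flow

kubectl delete pod
        │
        ▼
┌────────────────────────────────────────────────────────────────┐
│  1. The Pod is marked Terminating                              │
│  2. It is removed from Endpoints (no new traffic is routed)    │
│  3. The preStop hook runs                                      │
│  4. SIGTERM is sent to the container's main process            │
│  5. Wait up to terminationGracePeriodSeconds (default 30s)     │
│  6. On timeout, SIGKILL is sent to force termination           │
└────────────────────────────────────────────────────────────────┘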

2) Key configuration

apiVersion: v1
kind: Pod
spec:
  terminationGracePeriodSeconds: 60  # graceful termination window
  containers:
    - name: app
      lifecycle:
        preStop:
          exec:
            command: ["/bin/sh", "-c", "sleep 10"]  # wait for traffic to drain
          # Alternatively, use an HTTP hook instead of exec (a hook takes only one handler):
          # httpGet:
          #   path: /shutdown
          #   port: 8080

3) The key to zero downtime

The catch: Endpoints removal and preStop run in parallel!

                    ┌─────────────────────────────────┐
                    │     Pod starts terminating      │
                    └─────────────────────────────────┘
                                   │
                    ┌──────────────┴──────────────┐
                    ▼                             ▼
          ┌─────────────────┐           ┌─────────────────┐
          │ Endpoints       │           │ preStop runs,   │
          │ removal (async, │           │ then SIGTERM    │
          │ with delay)     │           │                 │
          └─────────────────┘           └─────────────────┘

Solution: add a sleep to preStop

lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "sleep 15"]

Why:

The Endpoints update takes time to propagate to kube-proxy on every node
The sleep ensures traffic has stopped arriving before the application shuts down

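4) Graceful shutdown in the application

The application must handle SIGTERM correctly:

// Go example
package main

import (
    "context"
    "log"
    "net/http"
    "os"
    "os/signal"
    "syscall"
    "time"
)

func main() {
    srv := &http.Server{Addr: ":8080"}
    go func() { // serve in the background; ErrServerClosed is the normal result of Shutdown
        if err := srv.ListenAndServe(); err != http.ErrServerClosed {
            log.Fatalf("listen: %v", err)
        }
    }()

    // Wait for a termination signal
    quit := make(chan os.Signal, 1)
    signal.Notify(quit, syscall.SIGTERM, syscall.SIGINT)
    <-quit
    // Graceful shutdown: stop accepting new requests and wait for in-flight ones to finish
    ctx, cancel := context.WithTimeout(context.Background(), 30*time.Second)
    defer cancel()
    if err := srv.Shutdown(ctx); err != nil {
        log.Printf("shutdown: %v", err)
    }
}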

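5) PodDisruptionBudget (PDB)

Protects application availability during voluntary evictions:

apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2        # keep at least 2 Pods available
  # Or, instead of minAvailable (the two fields are mutually exclusive):
  # maxUnavailable: 1    # allow at most 1 Pod to be unavailable
  selector:
    matchLabels:
      app: my-app

Where it applies:

Node maintenance with kubectl drain
Cluster autoscaling (scale-down)
Rolling node upgrades (a Deployment's own rolling update is governed by its strategy, not by the PDB)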
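
Whether an eviction is admitted comes down to simple arithmetic: the number of currently healthy Pods minus the number that must stay healthy. The Go sketch below illustrates that calculation for both forms of the budget; the function names are hypothetical and percentages are left out for brevity.

```go
package main

import "fmt"

// clampZero keeps the result from going negative.
func clampZero(n int) int {
	if n < 0 {
		return 0
	}
	return n
}

// allowedWithMinAvailable: evictions allowed when the PDB sets minAvailable.
func allowedWithMinAvailable(healthy, minAvailable int) int {
	return clampZero(healthy - minAvailable)
}

// allowedWithMaxUnavailable: evictions allowed when the PDB sets
// maxUnavailable; expected is the desired replica count.
func allowedWithMaxUnavailable(healthy, expected, maxUnavailable int) int {
	return clampZero(healthy - (expected - maxUnavailable))
}

func main() {
	fmt.Println(allowedWithMinAvailable(3, 2))      // 1: one Pod may be evicted
	fmt.Println(allowedWithMinAvailable(2, 2))      // 0: a drain has to wait
	fmt.Println(allowedWithMaxUnavailable(3, 3, 1)) // 1
	fmt.Println(allowedWithMaxUnavailable(2, 3, 1)) // 0: already one Pod down
}
```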

6) Deployment rolling update strategy

apiVersion: apps/v1
kind: Deployment
spec:
  replicas: 4
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 1   # at most 1 Pod unavailable during the update
      maxSurge: 1         # at most 1 extra Pod during the update
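
The two knobs bound the rollout from both sides: available Pods never drop below replicas - maxUnavailable, and the total Pod count never exceeds replicas + maxSurge. A small Go sketch of that arithmetic, assuming absolute numbers rather than percentages (the type and function names are illustrative):

```go
package main

import "fmt"

// bounds describes the limits a rolling update stays within.
type bounds struct {
	MinAvailable int // fewest Pods that remain available at any time
	MaxTotal     int // most Pods (old + new) that exist at any time
}

// rolloutBounds computes the limits from absolute maxUnavailable/maxSurge.
func rolloutBounds(replicas, maxUnavailable, maxSurge int) bounds {
	return bounds{
		MinAvailable: replicas - maxUnavailable,
		MaxTotal:     replicas + maxSurge,
	}
}

func main() {
	// replicas: 4, maxUnavailable: 1, maxSurge: 1
	fmt.Printf("%+v\n", rolloutBounds(4, 1, 1)) // {MinAvailable:3 MaxTotal:5}

	// replicas: 3, maxUnavailable: 0, maxSurge: 1 (capacity never dips)
	fmt.Printf("%+v\n", rolloutBounds(3, 0, 1)) // {MinAvailable:3 MaxTotal:4}
}
```

Setting maxUnavailable to 0 trades rollout speed for capacity: each old Pod is removed only after a new one has become ready.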

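7) Complete zero-downtime configuration

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-app
spec:
  replicas: 3
  selector:                # required: must match the Pod template labels
    matchLabels:
      app: my-app
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0    # never take a Pod down before its replacement is ready
      maxSurge: 1
  template:
    metadata:
      labels:
        app: my-app        # also matched by the PDB selector
    spec:
      terminationGracePeriodSeconds: 60
      containers:
        - name: app
          image: my-app:1.0.0      # hypothetical image name
          readinessProbe:          # readiness probe
            httpGet:
              path: /health
              port: 8080
            initialDelaySeconds: 5
            periodSeconds: 5
          lifecycle:
            preStop:
              exec:
                command: ["/bin/sh", "-c", "sleep 15"]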
---
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: my-app-pdb
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: my-app

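Summary

Question	Key topics
Pod creation flow	Component cooperation, scheduling, CRI/CNI/CSI
Service traffic forwarding	kube-proxy modes, iptables/IPVS, DNAT
Zero-downtime guarantees	Graceful termination, preStop, PDB, rolling updates

These three questions cover three core areas of K8s (scheduling, networking, and lifecycle management) and are effective at distinguishing how deeply a candidate understands K8s.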