Kubernetes Container Orchestration from Basics to Production: Building a Highly Available Cloud-Native Application Platform

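As of 2026, Kubernetes is the de facto standard for container orchestration. According to the latest CNCF survey report:

| Metric | Data |
|---|---|
| Kubernetes adoption | 96% of production environments |
| Average cluster size | 127 nodes per production cluster |
| Containerized applications | 89% of new applications are containerized |
| Developer satisfaction | 91% satisfied with Kubernetes |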

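This article covers four areas (basic concepts, core resources, application deployment, and production practice) and shares a Kubernetes playbook that has been validated in several production environments.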


1. Kubernetes Core Concepts at a Glance

1.1 Architecture

┌─────────────────────────────────────────────────────────────────┐
│                 Kubernetes Cluster Architecture                 │
├─────────────────────────────────────────────────────────────────┤
│                                                                 │
│  ┌─────────────────────┐         ┌─────────────────────────┐   │
│  │   Control Plane     │         │      Worker Nodes       │   │
│  ├─────────────────────┤         ├─────────────────────────┤   │
│  │  • API Server       │◄───────►│  • Kubelet              │   │
│  │  • etcd             │         │  • Kube-proxy           │   │
│  │  • Scheduler        │         │  • Container Runtime    │   │
│  │  • Controller Mgr   │         │  • Pods                 │   │
│  └─────────────────────┘         └─────────────────────────┘   │
│                                                                 │
│  ┌─────────────────────────────────────────────────────────┐   │
│  │                    Cluster Network                       │   │
│  │              (Calico/Flannel/Cilium)                    │   │
│  └─────────────────────────────────────────────────────────┘   │
│                                                                 │
└─────────────────────────────────────────────────────────────────┘

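1.2 Core Resource Objects

| Resource | Purpose | Typical use |
|---|---|---|
| Pod | Smallest schedulable unit | Deploying a group of containers |
| Deployment | Stateless applications | Web services, APIs |
| StatefulSet | Stateful applications | Databases, caches |
| DaemonSet | Node-level agents | Log collection, monitoring |
| Service | Service discovery and load balancing | Exposing services |
| Ingress | External entry point | HTTP/HTTPS routing |
| ConfigMap | Configuration management | Application configuration |
| Secret | Sensitive data management | Passwords, certificates |
| PersistentVolume | Persistent storage | Data persistence |
| Namespace | Resource isolation | Separating environments |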

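2. Quick Start: Building Your First Kubernetes Cluster

2.1 Local Development Environment (Recommended)

# Option 1: Docker Desktop (Mac/Windows)
# Install Docker Desktop, then enable Kubernetes in its settings

# Option 2: Kind (Kubernetes in Docker)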
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind

# Create the cluster (kind-config.yaml is listed at the end of this section)
kind create cluster --name my-cluster --config kind-config.yaml

# Option 3: Minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube

# Start the cluster
minikube start --cpus=4 --memory=4096

# kind-config.yaml - Kind cluster configuration
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
  - role: control-plane
    kubeadmConfigPatches:
      - |
        kind: InitConfiguration
        nodeRegistration:
          kubeletExtraArgs:
            node-labels: "ingress-ready=true"
    extraPortMappings:
      - containerPort: 80
        hostPort: 80
        protocol: TCP
      - containerPort: 443
        hostPort: 443
        protocol: TCP
  - role: worker
  - role: worker

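2.2 Production Deployment (kubeadm)

# Run on all nodes - environment preparation
# Disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab

# Load the kernel modules that the bridge sysctl settings below depend on
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter

# Configure kernel parameters
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system

# Install the container runtime (containerd)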
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd

# Install kubeadm, kubelet, and kubectl
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p -m 755 /etc/apt/keyrings   # may not exist on older Debian/Ubuntu releases
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl

# Run on the control plane node - initialize the cluster
sudo kubeadm init \
  --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=192.168.1.100 \
  --control-plane-endpoint=192.168.1.100

# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config

# Install the network plugin (Calico)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml

# Print the join command for worker nodes
kubeadm token create --print-join-command

# Run on worker nodes - join the cluster
sudo kubeadm join 192.168.1.100:6443 \
  --token <token> \
  --discovery-token-ca-cert-hash sha256:<hash>

3. Core Resources in Depth, with Hands-On Examples

3.1 Pod Basics

# pod-basic.yaml - basic Pod configuration
apiVersion: v1
kind: Pod
metadata:
  name: nginx-pod
  labels:
    app: nginx
    version: v1.0
  annotations:
    description: "Basic Nginx Pod example"
spec:
  # Restart policy
  restartPolicy: Always
  
  # Container definitions
  containers:
    - name: nginx
      image: nginx:1.25-alpine
      imagePullPolicy: IfNotPresent
      
      # Exposed ports
      ports:
        - containerPort: 80
          protocol: TCP
          name: http
      
      # Resource requests and limits
      resources:
        requests:
          memory: "64Mi"
          cpu: "100m"
        limits:
          memory: "128Mi"
          cpu: "200m"
      
      # Health checks
      livenessProbe:
        httpGet:
          path: /
          port: 80
        initialDelaySeconds: 30
        periodSeconds: 10
        timeoutSeconds: 5
        failureThreshold: 3
      
      readinessProbe:
        httpGet:
          path: /   # the stock nginx image serves no /health endpoint, so probe / instead
          port: 80
        initialDelaySeconds: 5
        periodSeconds: 5
        timeoutSeconds: 3
        failureThreshold: 3
      
      # Environment variables
      env:
        - name: ENV
          value: "production"
        - name: CONFIG_PATH
          value: "/etc/config"
      
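      # Volume mounts
      volumeMounts:
        - name: config-volume
          mountPath: /etc/config
        - name: data-volume
          # An empty emptyDir mounted here hides the image's default index.html;
          # write content into it (for example from an init container), or GET / returns 403 and the probes fail
          mountPath: /usr/share/nginx/html
  
  # Volume definitions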
  volumes:
    - name: config-volume
      configMap:
        name: nginx-config
    - name: data-volume
      emptyDir: {}
  
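  # Node selection
  nodeSelector:
    disktype: ssd
  
  # Tolerations
  tolerations:
    - key: "node-role.kubernetes.io/control-plane"   # current kubeadm releases no longer apply the legacy "master" taint
      operator: "Exists"
      effect: "NoSchedule"

# Pod commands
kubectl apply -f pod-basic.yaml          # create the Pod
kubectl get pods                         # list Pods
kubectl get pods -o wide                 # list Pods with node and IP columns
kubectl describe pod nginx-pod           # show Pod details and events
kubectl logs nginx-pod                   # view logs
kubectl exec -it nginx-pod -- /bin/sh    # open a shell (the alpine image ships sh, not bash)
kubectl delete pod nginx-pod             # delete the Pod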
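
The manifests in the following sections set namespace: production, database, logging, or monitoring, and the API server rejects them if the namespace does not exist yet. A minimal sketch that writes a manifest for those four namespaces; the file name namespaces.yaml is an assumption, and the final kubectl step is left as a comment because it needs a running cluster:

```shell
# Write one Namespace document per namespace used later in this article.
# The output file name is an assumption made for this sketch.
out=namespaces.yaml
: > "$out"
for ns in production database logging monitoring; do
  cat >> "$out" <<EOF
---
apiVersion: v1
kind: Namespace
metadata:
  name: ${ns}
EOF
done

cat "$out"

# Against a running cluster, apply it before any namespaced manifest:
# kubectl apply -f namespaces.yaml
```

Keeping namespaces in their own file makes it easy to apply them first in a deployment pipeline.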

3.2 Deployment: Stateless Applications

# deployment.yaml - production-grade Deployment configuration
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-app
  namespace: production
  labels:
    app: web-app
    tier: frontend
spec:
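  # Replica count
  replicas: 3
  
  # Selector
  selector:
    matchLabels:
      app: web-app
  
  # Update strategy
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1           # max Pods above the desired replica count during a rollout
      maxUnavailable: 0     # max Pods that may be unavailable during a rollout
  
  # Pod template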
  template:
    metadata:
      labels:
        app: web-app
        version: v2.0
      annotations:
        prometheus.io/scrape: "true"
        prometheus.io/port: "8080"
    spec:
      # Service account
      serviceAccountName: web-app-sa
      
      # Security context
      securityContext:
        runAsNonRoot: true
        runAsUser: 1000
        fsGroup: 1000
      
      # Containers
      containers:
        - name: web-app
          image: myregistry/web-app:v2.0
          imagePullPolicy: Always
          
          ports:
            - containerPort: 8080
              name: http
          
          resources:
            requests:
              memory: "256Mi"
              cpu: "250m"
            limits:
              memory: "512Mi"
              cpu: "500m"
          
          # Health checks
          livenessProbe:
            httpGet:
              path: /health/live
              port: 8080
            initialDelaySeconds: 30
            periodSeconds: 15
            timeoutSeconds: 5
            failureThreshold: 3
          
          readinessProbe:
            httpGet:
              path: /health/ready
              port: 8080
            initialDelaySeconds: 10
            periodSeconds: 10
            timeoutSeconds: 3
            failureThreshold: 3
          
          # Environment variables
          env:
            - name: DATABASE_URL
              valueFrom:
                secretKeyRef:
                  name: db-secret
                  key: url
            - name: REDIS_HOST
              valueFrom:
                configMapKeyRef:
                  name: app-config
                  key: redis-host
          
          # Volume mounts
          volumeMounts:
            - name: config
              mountPath: /app/config
              readOnly: true
            - name: logs
              mountPath: /app/logs
      
      # Volumes
      volumes:
        - name: config
          configMap:
            name: app-config
        - name: logs
          emptyDir: {}
      
      # Affinity
      affinity:
        podAntiAffinity:
          preferredDuringSchedulingIgnoredDuringExecution:
            - weight: 100
              podAffinityTerm:
                labelSelector:
                  matchLabels:
                    app: web-app
                topologyKey: kubernetes.io/hostname
      
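      # Image pull secrets
      imagePullSecrets:
        - name: registry-secret

# Deployment commands (the Deployment lives in the production namespace)
kubectl apply -f deployment.yaml                          # create or update
kubectl get deployments -n production                     # list Deployments
kubectl rollout status deployment/web-app -n production   # watch rollout progress
kubectl rollout history deployment/web-app -n production  # show revision history
kubectl rollout undo deployment/web-app -n production     # roll back to the previous revision
kubectl scale deployment/web-app -n production --replicas=5  # scale manually
kubectl set image deployment/web-app -n production web-app=myregistry/web-app:v2.1  # update the image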
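
With replicas: 3, maxSurge: 1, and maxUnavailable: 0, a rolling update may temporarily run one extra Pod but never drops below three available Pods. The arithmetic can be sketched as follows; rollout_bounds is a hypothetical helper, and it handles only absolute values, not the percentage form these fields also accept:

```shell
# rollout_bounds REPLICAS MAX_SURGE MAX_UNAVAILABLE
# Prints the Pod-count bounds a RollingUpdate strategy allows during a rollout.
rollout_bounds() {
  local replicas=$1 max_surge=$2 max_unavailable=$3
  # Upper bound: desired replicas plus the surge allowance.
  local max_total=$(( replicas + max_surge ))
  # Lower bound: desired replicas minus the unavailability allowance.
  local min_available=$(( replicas - max_unavailable ))
  echo "max_total=${max_total} min_available=${min_available}"
}

# Values from the Deployment above
rollout_bounds 3 1 0
```

Setting maxUnavailable to 0 trades rollout speed for capacity: each old Pod is removed only after a new one reports ready.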

3.3 StatefulSet: Stateful Applications

# statefulset.yaml - MySQL StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: database
spec:
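  serviceName: mysql-headless   # must match the headless Service defined in section 3.4
  replicas: 3                   # three independent MySQL instances; replication has to be configured separately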
  selector:
    matchLabels:
      app: mysql
  
  # Update strategy
  updateStrategy:
    type: RollingUpdate
  
  # Pod template
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
        - name: mysql
          image: mysql:8.0
          ports:
            - containerPort: 3306
              name: mysql
          
          env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: root-password
            - name: MYSQL_DATABASE
              value: "appdb"
          
          resources:
            requests:
              memory: "1Gi"
              cpu: "500m"
            limits:
              memory: "2Gi"
              cpu: "1000m"
          
          volumeMounts:
            - name: data
              mountPath: /var/lib/mysql
            - name: config
              mountPath: /etc/mysql/conf.d
          
          livenessProbe:
            exec:
              command:
                - mysqladmin
                - ping
                - -h
                - localhost
            initialDelaySeconds: 30
            periodSeconds: 10
          
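          readinessProbe:
            exec:
              command:
                - sh
                - -c
                # authenticate with the root password set above; an anonymous query is rejected
                - mysql -h localhost -uroot -p"$MYSQL_ROOT_PASSWORD" -e "SELECT 1"
            initialDelaySeconds: 10
            periodSeconds: 5
      
      # Volume backing the "config" mount above; the ConfigMap name mysql-config is an assumption
      volumes:
        - name: config
          configMap:
            name: mysql-config
  
  # Persistent storage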
  volumeClaimTemplates:
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        storageClassName: fast-ssd
        resources:
          requests:
            storage: 50Gi
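
A StatefulSet gives each replica a stable ordinal name, and its governing headless Service gives each Pod a stable DNS record of the form <pod>.<service>.<namespace>.svc.<cluster-domain>. The sketch below prints those names for three replicas. statefulset_dns is a hypothetical helper, mysql-headless is the headless Service from section 3.4 (the names apply only when serviceName points at that Service), and cluster.local assumes the default cluster domain:

```shell
# statefulset_dns NAME SERVICE NAMESPACE REPLICAS
# Prints the per-Pod DNS names a StatefulSet gets from its headless Service.
statefulset_dns() {
  local name=$1 service=$2 namespace=$3 replicas=$4
  local i
  for (( i = 0; i < replicas; i++ )); do
    # cluster.local is the default cluster domain (an assumption here)
    echo "${name}-${i}.${service}.${namespace}.svc.cluster.local"
  done
}

statefulset_dns mysql mysql-headless database 3
```

Clients that need a specific instance, for example a designated primary, can address it by one of these names instead of going through a load-balanced Service.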

3.4 Service: Service Discovery

# service.yaml - common Service types
---
# ClusterIP - access from inside the cluster
apiVersion: v1
kind: Service
metadata:
  name: web-app-internal
  namespace: production
spec:
  type: ClusterIP
  selector:
    app: web-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP

---
# NodePort - expose on a port of every node
apiVersion: v1
kind: Service
metadata:
  name: web-app-nodeport
  namespace: production
spec:
  type: NodePort
  selector:
    app: web-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      nodePort: 30080
      protocol: TCP

---
# LoadBalancer - cloud load balancer
apiVersion: v1
kind: Service
metadata:
  name: web-app-lb
  namespace: production
  annotations:
    service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
    service.beta.kubernetes.io/aws-load-balancer-internal: "false"
spec:
  type: LoadBalancer
  selector:
    app: web-app
  ports:
    - name: http
      port: 80
      targetPort: 8080
      protocol: TCP
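  # Static external IP, if needed (the loadBalancerIP field is deprecated since Kubernetes 1.24; prefer provider annotations)
  # loadBalancerIP: 203.0.113.1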

---
# Headless Service - used by the StatefulSet
apiVersion: v1
kind: Service
metadata:
  name: mysql-headless
  namespace: database
spec:
  clusterIP: None  # key setting: no cluster IP is allocated
  selector:
    app: mysql
  ports:
    - name: mysql
      port: 3306
      targetPort: 3306

3.5 Ingress: External Access

# ingress.yaml - NGINX Ingress configuration
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-app-ingress
  namespace: production
  annotations:
    # rewrite-target: / is omitted: without capture groups it would rewrite every request path to /
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    # ingress-nginx rate limiting: requests per minute per client IP
    nginx.ingress.kubernetes.io/limit-rpm: "100"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
  ingressClassName: nginx
  
  # TLS configuration
  tls:
    - hosts:
        - app.example.com
        - api.example.com
      secretName: app-tls-secret
  
  # Routing rules
  rules:
    - host: app.example.com
      http:
        paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: web-app-internal
                port:
                  number: 80
    
    - host: api.example.com
      http:
        paths:
          - path: /api
            pathType: Prefix
            backend:
              service:
                name: api-service
                port:
                  number: 8080
          - path: /health
            pathType: Exact
            backend:
              service:
                name: api-service
                port:
                  number: 8080

# Install the NGINX Ingress Controller
kubectl apply -f https://raw.githubusercontent.com/kubernetes/ingress-nginx/controller-v1.9.5/deploy/static/provider/baremetal/deploy.yaml

# Verify the installation
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx

4. Configuration and Storage Management

4.1 ConfigMap: Configuration Management

# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: app-config
  namespace: production
data:
  # Key-value settings
  redis-host: "redis.production.svc.cluster.local"
  redis-port: "6379"
  log-level: "info"
  
  # Configuration file
  app.properties: |
    server.port=8080
    server.context-path=/api
    spring.datasource.url=jdbc:mysql://mysql:3306/appdb
    spring.redis.host=redis
    
  # JSON configuration
  config.json: |
    {
      "featureFlags": {
        "newUI": true,
        "betaFeatures": false
      },
      "limits": {
        "maxConnections": 1000,
        "timeout": 30
      }
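    }

# ConfigMap commands (use the same namespace as the workloads that read the ConfigMap)
kubectl create configmap app-config -n production --from-file=./config/ --from-literal=log-level=info
kubectl get configmap -n production
kubectl describe configmap app-config -n production
kubectl edit configmap app-config -n production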

4.2 Secret: Managing Sensitive Data

# secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: db-secret
  namespace: production
type: Opaque
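stringData:  # written as plain text; the API server stores the values base64-encoded
  username: admin
  password: "SuperSecret123!"
  host: "mysql.production.svc.cluster.local"
  port: "3306"
  # Key read by the Deployment's DATABASE_URL variable in section 3.2; the URL format shown is an assumption
  url: "mysql://admin:SuperSecret123!@mysql.production.svc.cluster.local:3306/appdb"
---
# Docker registry credentials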
apiVersion: v1
kind: Secret
metadata:
  name: registry-secret
  namespace: production
type: kubernetes.io/dockerconfigjson
data:
  .dockerconfigjson: eyJhdXRocyI6e...
---
# TLS certificate
apiVersion: v1
kind: Secret
metadata:
  name: app-tls-secret
  namespace: production
type: kubernetes.io/tls
data:
  tls.crt: LS0tLS1CRUdJTi...
  tls.key: LS0tLS1CRUdJTi...

# Secret commands (quote values that contain shell metacharacters such as !)
kubectl create secret generic db-secret -n production --from-literal=username=admin --from-literal=password='SuperSecret123!'
kubectl create secret tls app-tls-secret -n production --cert=tls.crt --key=tls.key
kubectl create secret docker-registry registry-secret -n production --docker-server=docker.io --docker-username=user --docker-password=pass
kubectl get secrets -n production
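
Values under a Secret's data field are base64-encoded, which is an encoding and not encryption: anyone who can read the Secret can decode it. A short illustration using the username value from the manifest above (the variable names are illustrative):

```shell
# Encode a value the way it appears under a Secret's "data" field.
# printf avoids the trailing newline that echo would add to the encoded value.
encoded=$(printf '%s' 'admin' | base64)
echo "$encoded"   # YWRtaW4=

# Decoding needs no key, which is why base64 offers no confidentiality.
decoded=$(printf '%s' "$encoded" | base64 --decode)
echo "$decoded"   # admin
```

This is why the checklist later in this article lists Secret encryption as a separate hardening step.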

4.3 PersistentVolume: Persistent Storage

# storage-class.yaml - StorageClass
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: fast-ssd
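provisioner: ebs.csi.aws.com   # AWS EBS CSI driver; the in-tree kubernetes.io/aws-ebs plugin is gone from recent Kubernetes releases
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4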
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
  - debug
volumeBindingMode: WaitForFirstConsumer
---
# persistent-volume.yaml - PersistentVolume
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-data-001
spec:
  capacity:
    storage: 100Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: fast-ssd
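  hostPath:   # hostPath suits single-node testing only; use a network or CSI volume in production
    path: /mnt/data
---
# persistent-volume-claim.yaml - PersistentVolumeClaim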
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-data
  namespace: production
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 50Gi
  storageClassName: fast-ssd

5. Production Best Practices

5.1 Resource Quotas and Limits

# resource-quota.yaml - namespace resource quota
apiVersion: v1
kind: ResourceQuota
metadata:
  name: production-quota
  namespace: production
spec:
  hard:
    requests.cpu: "10"
    requests.memory: 20Gi
    limits.cpu: "20"
    limits.memory: 40Gi
    pods: "50"
    services: "20"
    secrets: "50"
    configmaps: "50"
    persistentvolumeclaims: "20"
---
# limit-range.yaml - default container limits
apiVersion: v1
kind: LimitRange
metadata:
  name: container-limits
  namespace: production
spec:
  limits:
    - type: Container
      default:
        cpu: "500m"
        memory: "512Mi"
      defaultRequest:
        cpu: "100m"
        memory: "128Mi"
      max:
        cpu: "2"
        memory: "4Gi"
      min:
        cpu: "50m"
        memory: "64Mi"
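
The quota and the LimitRange defaults interact: a container that declares no resources gets the defaults, and every Pod counts against the namespace quota. A rough sketch of how many single-container Pods running on defaults fit into the quota above; min_of is a hypothetical helper, CPU is expressed in millicores and memory in Mi:

```shell
# min_of N1 N2 ... prints the smallest of its integer arguments.
min_of() {
  local min=$1 v
  shift
  for v in "$@"; do
    if (( v < min )); then min=$v; fi
  done
  echo "$min"
}

# Quota divided by the LimitRange default, per dimension
by_limits_cpu=$(( 20000 / 500 ))          # limits.cpu 20 cores / default 500m
by_limits_mem=$(( 40 * 1024 / 512 ))      # limits.memory 40Gi / default 512Mi
by_requests_cpu=$(( 10000 / 100 ))        # requests.cpu 10 cores / default 100m
by_requests_mem=$(( 20 * 1024 / 128 ))    # requests.memory 20Gi / default 128Mi
by_pod_count=50                           # pods: "50"

max_pods=$(min_of "$by_limits_cpu" "$by_limits_mem" "$by_requests_cpu" "$by_requests_mem" "$by_pod_count")
echo "max_pods=${max_pods}"
```

Under these assumptions limits.cpu is the binding constraint, so the pods: "50" ceiling is never reached by Pods that rely on the defaults.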

5.2 Network Policies

# network-policy.yaml - network isolation
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: web-app-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      app: web-app
  
  # Policy types
  policyTypes:
    - Ingress
    - Egress
  
  # Ingress rules
  ingress:
    - from:
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: ingress-nginx   # label set automatically on every namespace
        - podSelector:
            matchLabels:
              app: api-gateway
      ports:
        - protocol: TCP
          port: 8080
  
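  # Egress rules
  egress:
    - to:
        # namespaceSelector and podSelector in one entry: only mysql Pods in the database namespace
        - namespaceSelector:
            matchLabels:
              kubernetes.io/metadata.name: database
          podSelector:
            matchLabels:
              app: mysql
      ports:
        - protocol: TCP
          port: 3306
    - to:
        - namespaceSelector: {}
      ports:
        - protocol: UDP
          port: 53  # DNS
        - protocol: TCP
          port: 53  # DNS over TCP (large responses)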

5.3 Autoscaling

# hpa.yaml - Horizontal Pod Autoscaler
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-app-hpa
  namespace: production
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  
  # Replica range
  minReplicas: 3
  maxReplicas: 20
  
  # Scaling metrics
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 80
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second
        target:
          type: AverageValue
          averageValue: 100
  
  # Scaling behavior
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
        - type: Percent
          value: 10
          periodSeconds: 60
    scaleUp:
      stabilizationWindowSeconds: 0
      policies:
        - type: Percent
          value: 100
          periodSeconds: 15
        - type: Pods
          value: 4
          periodSeconds: 15
      selectPolicy: Max
---
# vpa.yaml - Vertical Pod Autoscaler (requires the VPA components, which are not part of core Kubernetes)
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: web-app-vpa
  namespace: production
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web-app
  updatePolicy:
    updateMode: Auto  # Off, Initial, Recreate, or Auto; avoid Auto alongside an HPA that scales on CPU or memory
  resourcePolicy:
    containerPolicies:
      - containerName: '*'
        minAllowed:
          cpu: 100m
          memory: 128Mi
        maxAllowed:
          cpu: 2
          memory: 4Gi
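
For a single metric, the Horizontal Pod Autoscaler computes desiredReplicas = ceil(currentReplicas * currentMetricValue / targetMetricValue) and then clamps the result to the minReplicas/maxReplicas range. A simplified sketch of that calculation; hpa_desired_replicas is a hypothetical helper, and it ignores the controller's tolerance band and the behavior policies configured above:

```shell
# hpa_desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
# Integer ceiling of current * metric / target, clamped to [MIN, MAX].
hpa_desired_replicas() {
  local current=$1 metric=$2 target=$3 min=$4 max=$5
  # (a + b - 1) / b is integer ceiling division for positive values
  local desired=$(( (current * metric + target - 1) / target ))
  if (( desired < min )); then desired=$min; fi
  if (( desired > max )); then desired=$max; fi
  echo "$desired"
}

# 3 replicas at 90% average CPU against the 70% target, bounds 3..20
hpa_desired_replicas 3 90 70 3 20
```

When several metrics are configured, as in the manifest above, the controller evaluates each one and uses the largest resulting replica count.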

5.4 Pod Disruption Budget

# pdb.yaml - PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: web-app-pdb
  namespace: production
spec:
  minAvailable: 2  # alternatively, use maxUnavailable: 1
  selector:
    matchLabels:
      app: web-app
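
A PodDisruptionBudget limits voluntary disruptions such as kubectl drain. With minAvailable set, the number of Pods that may be evicted at once is the count of currently healthy Pods minus minAvailable, and never less than zero. A small sketch of that rule; pdb_allowed_disruptions is a hypothetical helper:

```shell
# pdb_allowed_disruptions HEALTHY_PODS MIN_AVAILABLE
# Prints how many Pods may be evicted right now without violating the budget.
pdb_allowed_disruptions() {
  local healthy=$1 min_available=$2
  local allowed=$(( healthy - min_available ))
  if (( allowed < 0 )); then allowed=0; fi
  echo "$allowed"
}

pdb_allowed_disruptions 3 2   # three healthy replicas, minAvailable: 2
pdb_allowed_disruptions 2 2   # one replica already down: a drain has to wait
```

With three replicas and minAvailable: 2, a node drain can therefore evict only one web-app Pod at a time.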

6. Monitoring and Logging

6.1 Monitoring with Prometheus + Grafana

# prometheus-servicemonitor.yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: web-app-monitor
  namespace: production
  labels:
    release: prometheus
spec:
  selector:
    matchLabels:
      app: web-app
  endpoints:
    - port: http
      path: /metrics
      interval: 30s
      scrapeTimeout: 10s

# prometheus-alerts.yaml - alerting rules
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: app-alerts
  namespace: monitoring
spec:
  groups:
    - name: application.rules
      rules:
        - alert: PodCrashLooping
          expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 5 > 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod is restarting frequently"
            description: "Pod {{ $labels.pod }} has a non-zero container restart rate over the last 15 minutes"
        
        - alert: HighMemoryUsage
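          # the "> 0" filter skips containers that have no memory limit (limit reported as 0)
          expr: container_memory_usage_bytes / (container_spec_memory_limit_bytes > 0) > 0.9
          for: 5m
          labels:
            severity: warning
          annotations:
            summary: "High memory usage"
            description: "Container {{ $labels.container }} is using more than 90% of its memory limit"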
        
        - alert: HighCPUUsage
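          # usage in cores divided by the CPU limit in cores (quota / period); 0.8 = 80% of the limit
          expr: sum by (namespace, pod, container) (rate(container_cpu_usage_seconds_total{container!=""}[5m])) / sum by (namespace, pod, container) (container_spec_cpu_quota{container!=""} / container_spec_cpu_period{container!=""}) > 0.8
          for: 10m
          labels:
            severity: warning
          annotations:
            summary: "High CPU usage"
            description: "Container {{ $labels.container }} is using more than 80% of its CPU limit"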
        
        - alert: PodNotReady
          expr: kube_pod_status_ready{condition="true"} == 0
          for: 5m
          labels:
            severity: critical
          annotations:
            summary: "Pod not ready"
            description: "Pod {{ $labels.pod }} has not been ready for 5 minutes"

6.2 Log Collection with EFK

# fluentbit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: fluent-bit-config
  namespace: logging
data:
  fluent-bit.conf: |
    [SERVICE]
        Flush         5
        Log_Level     info
        Daemon        off
        Parsers_File  parsers.conf
    
    [INPUT]
        Name              tail
        Path              /var/log/containers/*.log
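        # containerd (installed in section 2.2) writes CRI-format logs, so use the built-in multiline parsers
        multiline.parser  docker, cri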
        Tag               kube.*
        Refresh_Interval  10
        Mem_Buf_Limit     5MB
        Skip_Long_Lines   On
    
    [FILTER]
        Name                kubernetes
        Match               kube.*
        Kube_URL            https://kubernetes.default.svc:443
        Kube_CA_File        /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
        Kube_Token_File     /var/run/secrets/kubernetes.io/serviceaccount/token
        Kube_Tag_Prefix     kube.var.log.containers.
        Merge_Log           On
        Keep_Log            Off
        K8S-Logging.Parser  On
        K8S-Logging.Exclude On
    
    [OUTPUT]
        Name            es
        Match           *
        Host            elasticsearch.logging.svc
        Port            9200
        Logstash_Format On
        Logstash_Prefix kubernetes-logs
        Retry_Limit     False
        Replace_Dots    On
        Suppress_Type_Name On

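7. Production Deployment Checklist

## 📋 Kubernetes Production Deployment Checklist

### Cluster configuration
- [ ] Run a highly available control plane (at least 3 control plane nodes)
- [ ] Configure an external etcd cluster
- [ ] Enable RBAC
- [ ] Enforce network policies (Calico/Cilium)
- [ ] Choose an appropriate Pod network CIDR
- [ ] Enable audit logging

### Application deployment
- [ ] Set resource requests/limits on every Pod
- [ ] Configure health checks (liveness/readiness)
- [ ] Use Deployment/StatefulSet rather than bare Pods
- [ ] Configure Pod anti-affinity to improve availability
- [ ] Define a PodDisruptionBudget
- [ ] Manage configuration with ConfigMap/Secret

### Security hardening
- [ ] Enforce Pod Security Standards
- [ ] Isolate workloads with NetworkPolicy
- [ ] Run containers as a non-root user
- [ ] Enable image security scanning
- [ ] Enable encryption for Secrets
- [ ] Upgrade Kubernetes regularly

### Monitoring and alerting
- [ ] Deploy Prometheus + Grafana
- [ ] Configure alerts on core metrics
- [ ] Deploy a log collection stack (EFK/ELK)
- [ ] Set up distributed tracing (Jaeger/Zipkin)
- [ ] Establish alert notification channels

### Backup and recovery
- [ ] Back up etcd data regularly
- [ ] Back up PersistentVolume data
- [ ] Write a disaster recovery plan
- [ ] Rehearse the recovery procedure regularly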

8. Troubleshooting Common Problems

8.1 Pod Fails to Start

# Check Pod status
kubectl get pods -n <namespace>
kubectl describe pod <pod-name> -n <namespace>

# View logs
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container-name> -n <namespace>

# View events
kubectl get events -n <namespace> --sort-by='.lastTimestamp'

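# Common states
# 1. ImagePullBackOff - the image could not be pulled
# 2. CrashLoopBackOff - the container keeps crashing
# 3. Pending - insufficient resources or scheduling failure
# 4. Error - the container failed at startup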
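
The states listed above each suggest a different first step. A small lookup sketch; triage_hint is a hypothetical helper that only prints a suggestion, and the commands it mentions are the standard kubectl describe and kubectl logs --previous:

```shell
# triage_hint STATUS
# Prints a suggested first diagnostic step for a Pod status shown by "kubectl get pods".
triage_hint() {
  case "$1" in
    ImagePullBackOff|ErrImagePull)
      echo "check image name, tag, and imagePullSecrets: kubectl describe pod <pod-name>" ;;
    CrashLoopBackOff)
      echo "read the crashed container's logs: kubectl logs <pod-name> --previous" ;;
    Pending)
      echo "check scheduling events and node capacity: kubectl describe pod <pod-name>" ;;
    *)
      echo "start with: kubectl describe pod <pod-name>" ;;
  esac
}

triage_hint CrashLoopBackOff
```

The --previous flag matters for CrashLoopBackOff because the current container instance has often not produced any output yet.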

8.2 Service Is Unreachable

# Check the Service
kubectl get svc -n <namespace>
kubectl describe svc <svc-name> -n <namespace>

# Check Endpoints (an empty list means the selector matches no ready Pods)
kubectl get endpoints <svc-name> -n <namespace>

# Check the Ingress
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>

# Test connectivity
kubectl run test --rm -it --restart=Never --image=busybox -- /bin/sh
# Inside the container:
wget -qO- http://<svc-name>.<namespace>.svc.cluster.local:<port>

8.3 Node Problems

# Check node status
kubectl get nodes
kubectl describe node <node-name>

# Check resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods

# Drain a node (evict its Pods before maintenance)
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data

# Make the node schedulable again
kubectl uncordon <node-name>

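9. Summary and Recommendations

🎯 Key Takeaways

| Area | Key practices | Expected outcome |
|---|---|---|
| Cluster architecture | HA control plane + external etcd | 99.95%+ availability |
| Application deployment | Deployment + HPA + PDB | Autoscaling + high availability |
| Security hardening | RBAC + NetworkPolicy + Pod Security Standards | Defense in depth |
| Monitoring | Prometheus + EFK + alerting | Fast problem diagnosis |
| Storage management | StorageClass + PVC | Durable persistent data |

🚀 Implementation Recommendations

  1. Migrate incrementally: start containerizing with non-critical workloads
  2. Standardize conventions: define naming and labeling rules for Kubernetes resources
  3. Automate deployment: manage applications with GitOps (ArgoCD/Flux)
  4. Keep learning: follow new Kubernetes releases and security updates
  5. Engage with the community: join the Kubernetes community to pick up best practices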