As of 2026, Kubernetes is the de facto standard for container orchestration. According to the latest CNCF survey report:
| Metric | Figure |
|---|---|
| K8s adoption | 96% of organizations run it in production |
| Average cluster size | Production clusters average 127 nodes |
| Containerized applications | 89% of new applications are containerized |
| Developer satisfaction | 91% satisfied with K8s |
This article shares a hands-on Kubernetes playbook, validated across multiple production environments, organized along four dimensions: basic concepts, core resources, application deployment, and production practices.
1. Kubernetes Core Concepts at a Glance
1.1 Architecture
┌──────────────────────────────────────────────────────────────────┐
│                 Kubernetes Cluster Architecture                  │
├──────────────────────────────────────────────────────────────────┤
│                                                                  │
│  ┌─────────────────────┐          ┌─────────────────────────┐    │
│  │    Control Plane    │          │      Worker Nodes       │    │
│  ├─────────────────────┤          ├─────────────────────────┤    │
│  │ • API Server        │◄────────►│ • Kubelet               │    │
│  │ • etcd              │          │ • Kube-proxy            │    │
│  │ • Scheduler         │          │ • Container Runtime     │    │
│  │ • Controller Mgr    │          │ • Pods                  │    │
│  └─────────────────────┘          └─────────────────────────┘    │
│                                                                  │
│  ┌──────────────────────────────────────────────────────────┐    │
│  │                     Cluster Network                      │    │
│  │                 (Calico/Flannel/Cilium)                  │    │
│  └──────────────────────────────────────────────────────────┘    │
│                                                                  │
└──────────────────────────────────────────────────────────────────┘
1.2 Core Resource Objects
| Resource | Purpose | Typical Use Cases |
|---|---|---|
| Pod | Smallest schedulable unit | Deploying a group of containers |
| Deployment | Stateless applications | Web services, APIs |
| StatefulSet | Stateful applications | Databases, caches |
| DaemonSet | Per-node workloads | Log collection, monitoring agents |
| Service | Service discovery and load balancing | Exposing workloads |
| Ingress | External entry point | HTTP/HTTPS routing |
| ConfigMap | Configuration management | Application configuration |
| Secret | Sensitive data management | Passwords, certificates |
| PersistentVolume | Persistent storage | Data persistence |
| Namespace | Resource isolation | Separating environments |
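The kubectl built-in documentation is a quick way to explore these objects; a minimal sketch:
# List every resource type the cluster knows about, with short names and API groups
kubectl api-resources
# Show the documented fields of a resource
kubectl explain deployment.spec
kubectl explain pod.spec.containers.resources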
2. Quick Start: Standing Up Your First K8s Cluster
2.1 Local Development Environment (Recommended)
Two common options for a local cluster are kind and minikube.
Option 1: kind
curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.22.0/kind-linux-amd64
chmod +x ./kind
sudo mv ./kind /usr/local/bin/kind
kind create cluster --name my-cluster --config kind-config.yaml
Option 2: minikube
curl -LO https://storage.googleapis.com/minikube/releases/latest/minikube-linux-amd64
sudo install minikube-linux-amd64 /usr/local/bin/minikube
minikube start --cpus=4 --memory=4096
kind-config.yaml (referenced above):
kind: Cluster
apiVersion: kind.x-k8s.io/v1alpha4
nodes:
- role: control-plane
kubeadmConfigPatches:
- |
kind: InitConfiguration
nodeRegistration:
kubeletExtraArgs:
node-labels: "ingress-ready=true"
extraPortMappings:
- containerPort: 80
hostPort: 80
protocol: TCP
- containerPort: 443
hostPort: 443
protocol: TCP
- role: worker
- role: worker
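Once `kind create cluster` returns, a quick sanity check (assuming the cluster name my-cluster used above):
kubectl cluster-info --context kind-my-cluster
kubectl get nodes -o wide
kubectl get pods -n kube-system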
2.2 Production Deployment with kubeadm
# On every node: disable swap
sudo swapoff -a
sudo sed -i '/ swap / s/^\(.*\)$/#\1/g' /etc/fstab
# On every node: load the required kernel modules and enable kernel parameters
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
overlay
br_netfilter
EOF
sudo modprobe overlay
sudo modprobe br_netfilter
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.ipv4.ip_forward = 1
EOF
sudo sysctl --system
# On every node: install and configure containerd
sudo apt-get update
sudo apt-get install -y containerd
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo sed -i 's/SystemdCgroup = false/SystemdCgroup = true/' /etc/containerd/config.toml
sudo systemctl restart containerd
# On every node: install kubeadm, kubelet and kubectl
sudo apt-get update
sudo apt-get install -y apt-transport-https ca-certificates curl gpg
sudo mkdir -p -m 755 /etc/apt/keyrings
curl -fsSL https://pkgs.k8s.io/core:/stable:/v1.29/deb/Release.key | sudo gpg --dearmor -o /etc/apt/keyrings/kubernetes-apt-keyring.gpg
echo 'deb [signed-by=/etc/apt/keyrings/kubernetes-apt-keyring.gpg] https://pkgs.k8s.io/core:/stable:/v1.29/deb/ /' | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt-get update
sudo apt-get install -y kubelet kubeadm kubectl
sudo apt-mark hold kubelet kubeadm kubectl
# On the control-plane node only: initialize the cluster (adjust the IPs to your environment)
sudo kubeadm init \
--pod-network-cidr=10.244.0.0/16 \
--apiserver-advertise-address=192.168.1.100 \
--control-plane-endpoint=192.168.1.100
# Configure kubectl access for your user
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install a CNI plugin (Calico in this example)
kubectl apply -f https://raw.githubusercontent.com/projectcalico/calico/v3.27.0/manifests/calico.yaml
# Print the join command for worker nodes
kubeadm token create --print-join-command
# Run the printed command on each worker node, for example:
sudo kubeadm join 192.168.1.100:6443 \
--token <token> \
--discovery-token-ca-cert-hash sha256:<hash>
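Back on the control-plane node, it is worth confirming that every node has joined and that Calico is up before deploying workloads; a minimal check (the calico-node label assumes the default labels from the manifest above):
kubectl get nodes -o wide
kubectl get pods -n kube-system
kubectl get pods -n kube-system -l k8s-app=calico-node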
3. Core Resources in Depth
3.1 Pod Basics
apiVersion: v1
kind: Pod
metadata:
name: nginx-pod
labels:
app: nginx
version: v1.0
annotations:
    description: "A basic Nginx Pod example"
spec:
restartPolicy: Always
containers:
- name: nginx
image: nginx:1.25-alpine
imagePullPolicy: IfNotPresent
ports:
- containerPort: 80
protocol: TCP
name: http
resources:
requests:
memory: "64Mi"
cpu: "100m"
limits:
memory: "128Mi"
cpu: "200m"
livenessProbe:
httpGet:
path: /
port: 80
initialDelaySeconds: 30
periodSeconds: 10
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
        path: /   # the stock nginx image has no /health endpoint
        port: 80
initialDelaySeconds: 5
periodSeconds: 5
timeoutSeconds: 3
failureThreshold: 3
env:
- name: ENV
value: "production"
- name: CONFIG_PATH
value: "/etc/config"
volumeMounts:
- name: config-volume
mountPath: /etc/config
- name: data-volume
mountPath: /usr/share/nginx/html
volumes:
- name: config-volume
configMap:
name: nginx-config
- name: data-volume
emptyDir: {}
nodeSelector:
disktype: ssd
tolerations:
  - key: "node-role.kubernetes.io/control-plane"
    operator: "Exists"
    effect: "NoSchedule"
kubectl apply -f pod-basic.yaml
kubectl get pods
kubectl get pods -o wide
kubectl describe pod nginx-pod
kubectl logs nginx-pod
kubectl exec -it nginx-pod -- /bin/sh   # the alpine image ships sh, not bash
kubectl delete pod nginx-pod
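To reach the Pod from your workstation before any Service exists, port-forwarding is the simplest option; a quick sketch:
# Forward local port 8080 to port 80 of the Pod, then browse http://localhost:8080
kubectl port-forward pod/nginx-pod 8080:80
# In another terminal
curl http://localhost:8080/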
3.2 Deployment: Stateless Applications
apiVersion: apps/v1
kind: Deployment
metadata:
name: web-app
namespace: production
labels:
app: web-app
tier: frontend
spec:
replicas: 3
selector:
matchLabels:
app: web-app
strategy:
type: RollingUpdate
rollingUpdate:
maxSurge: 1
maxUnavailable: 0
template:
metadata:
labels:
app: web-app
version: v2.0
annotations:
prometheus.io/scrape: "true"
prometheus.io/port: "8080"
spec:
serviceAccountName: web-app-sa
securityContext:
runAsNonRoot: true
runAsUser: 1000
fsGroup: 1000
containers:
- name: web-app
image: myregistry/web-app:v2.0
imagePullPolicy: Always
ports:
- containerPort: 8080
name: http
resources:
requests:
memory: "256Mi"
cpu: "250m"
limits:
memory: "512Mi"
cpu: "500m"
livenessProbe:
httpGet:
path: /health/live
port: 8080
initialDelaySeconds: 30
periodSeconds: 15
timeoutSeconds: 5
failureThreshold: 3
readinessProbe:
httpGet:
path: /health/ready
port: 8080
initialDelaySeconds: 10
periodSeconds: 10
timeoutSeconds: 3
failureThreshold: 3
env:
- name: DATABASE_URL
valueFrom:
secretKeyRef:
name: db-secret
key: url
- name: REDIS_HOST
valueFrom:
configMapKeyRef:
name: app-config
key: redis-host
volumeMounts:
- name: config
mountPath: /app/config
readOnly: true
- name: logs
mountPath: /app/logs
volumes:
- name: config
configMap:
name: app-config
- name: logs
emptyDir: {}
affinity:
podAntiAffinity:
preferredDuringSchedulingIgnoredDuringExecution:
- weight: 100
podAffinityTerm:
labelSelector:
matchLabels:
app: web-app
topologyKey: kubernetes.io/hostname
imagePullSecrets:
- name: registry-secret
kubectl apply -f deployment.yaml
kubectl get deployments -n production
kubectl rollout status deployment/web-app -n production
kubectl rollout history deployment/web-app -n production
kubectl rollout undo deployment/web-app -n production
kubectl scale deployment/web-app --replicas=5 -n production
kubectl set image deployment/web-app web-app=myregistry/web-app:v2.1 -n production
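Two follow-up checks that are easy to forget: confirm that the podAntiAffinity rule actually spread the replicas across nodes, and remember that a rollout can be paused while you batch several changes; a sketch:
# Replicas should land on different nodes
kubectl get pods -n production -l app=web-app -o wide
# Pause, apply several changes, then resume to roll them out together
kubectl rollout pause deployment/web-app -n production
kubectl rollout resume deployment/web-app -n production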
3.3 StatefulSet: Stateful Applications
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: mysql
namespace: database
spec:
serviceName: mysql
replicas: 3
selector:
matchLabels:
app: mysql
updateStrategy:
type: RollingUpdate
template:
metadata:
labels:
app: mysql
spec:
containers:
- name: mysql
image: mysql:8.0
ports:
- containerPort: 3306
name: mysql
env:
- name: MYSQL_ROOT_PASSWORD
valueFrom:
secretKeyRef:
name: mysql-secret
key: root-password
- name: MYSQL_DATABASE
value: "appdb"
resources:
requests:
memory: "1Gi"
cpu: "500m"
limits:
memory: "2Gi"
cpu: "1000m"
volumeMounts:
- name: data
mountPath: /var/lib/mysql
- name: config
mountPath: /etc/mysql/conf.d
livenessProbe:
exec:
command:
- mysqladmin
- ping
- -h
- localhost
initialDelaySeconds: 30
periodSeconds: 10
readinessProbe:
exec:
command:
- mysql
- -h
- localhost
- -e
- "SELECT 1"
initialDelaySeconds: 10
periodSeconds: 5
volumeClaimTemplates:
- metadata:
name: data
spec:
accessModes: ["ReadWriteOnce"]
storageClassName: fast-ssd
resources:
requests:
storage: 50Gi
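A few caveats before applying this manifest: `serviceName` must reference a headless Service (the one defined in section 3.4 is named `mysql-headless`, so either rename it to `mysql` or change `serviceName` to match); the `config` volumeMount has no matching entry under `volumes`, so the Pods will not start until one is added under `template.spec` (for example `name: config` backed by a ConfigMap of MySQL settings); and the readiness probe's `mysql` command needs credentials because `MYSQL_ROOT_PASSWORD` is set. A minimal sketch for the assumed ConfigMap, using a hypothetical my.cnf:
# Hypothetical ConfigMap backing the "config" volumeMount (mounted at /etc/mysql/conf.d)
cat <<EOF > my.cnf
[mysqld]
max_connections=500
EOF
kubectl create configmap mysql-config -n database --from-file=my.cnf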
3.4 Service: Service Discovery
---
apiVersion: v1
kind: Service
metadata:
name: web-app-internal
namespace: production
spec:
type: ClusterIP
selector:
app: web-app
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: web-app-nodeport
namespace: production
spec:
type: NodePort
selector:
app: web-app
ports:
- name: http
port: 80
targetPort: 8080
nodePort: 30080
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: web-app-lb
namespace: production
annotations:
service.beta.kubernetes.io/aws-load-balancer-type: "nlb"
service.beta.kubernetes.io/aws-load-balancer-internal: "false"
spec:
type: LoadBalancer
selector:
app: web-app
ports:
- name: http
port: 80
targetPort: 8080
protocol: TCP
---
apiVersion: v1
kind: Service
metadata:
name: mysql-headless
namespace: database
spec:
clusterIP: None
selector:
app: mysql
ports:
- name: mysql
port: 3306
targetPort: 3306
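Service discovery is easiest to verify from inside the cluster with a throwaway Pod; ClusterIP Services resolve as <name>.<namespace>.svc.cluster.local, and the headless Service returns one record per StatefulSet Pod. A quick sketch:
kubectl run dns-test --rm -it --restart=Never --image=busybox:1.36 -- sh
# Inside the container:
nslookup web-app-internal.production.svc.cluster.local
nslookup mysql-headless.database.svc.cluster.local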
3.5 Ingress: External Access
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
name: web-app-ingress
namespace: production
annotations:
    nginx.ingress.kubernetes.io/ssl-redirect: "true"
    nginx.ingress.kubernetes.io/proxy-body-size: "50m"
    nginx.ingress.kubernetes.io/limit-rpm: "100"
cert-manager.io/cluster-issuer: "letsencrypt-prod"
spec:
ingressClassName: nginx
tls:
- hosts:
- app.example.com
- api.example.com
secretName: app-tls-secret
rules:
- host: app.example.com
http:
paths:
- path: /
pathType: Prefix
backend:
service:
name: web-app-internal
port:
number: 80
- host: api.example.com
http:
paths:
- path: /api
pathType: Prefix
backend:
service:
name: api-service
port:
number: 8080
- path: /health
pathType: Exact
backend:
service:
name: api-service
port:
number: 8080
# Install the Nginx Ingress Controller
kubectl apply -f https:
# Verify the installation
kubectl get pods -n ingress-nginx
kubectl get svc -n ingress-nginx
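Once the controller is running and DNS (or a local hosts entry) points at it, the routing rules can be exercised with curl; the controller Service is typically named ingress-nginx-controller, but check your installation. A sketch:
# Find the external address of the ingress controller
kubectl get svc -n ingress-nginx ingress-nginx-controller
# Test routing by Host header (replace <INGRESS_IP> with the address above)
curl -k -H "Host: app.example.com" https://<INGRESS_IP>/
curl -k -H "Host: api.example.com" https://<INGRESS_IP>/api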
4. Configuration and Storage Management
4.1 ConfigMap: Configuration Management
apiVersion: v1
kind: ConfigMap
metadata:
name: app-config
namespace: production
data:
redis-host: "redis.production.svc.cluster.local"
redis-port: "6379"
log-level: "info"
app.properties: |
server.port=8080
server.context-path=/api
spring.datasource.url=jdbc:mysql://mysql:3306/appdb
spring.redis.host=redis
config.json: |
{
"featureFlags": {
"newUI": true,
"betaFeatures": false
},
"limits": {
"maxConnections": 1000,
"timeout": 30
}
}
# ConfigMap operations
kubectl create configmap app-config --from-file=./config/ --from-literal=log-level=info
kubectl get configmap
kubectl describe configmap app-config
kubectl edit configmap app-config
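Keep the update semantics in mind: files mounted from a ConfigMap are refreshed automatically after a short delay, while values injected as environment variables are only read at container start and need a restart to change. A quick check against the web-app Deployment above (assuming its image has ls available):
kubectl exec -n production deploy/web-app -- ls /app/config
kubectl rollout restart deployment/web-app -n production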
4.2 Secret: Managing Sensitive Data
apiVersion: v1
kind: Secret
metadata:
name: db-secret
namespace: production
type: Opaque
stringData:
username: admin
password: "SuperSecret123!"
host: "mysql.production.svc.cluster.local"
port: "3306"
---
apiVersion: v1
kind: Secret
metadata:
name: registry-secret
namespace: production
type: kubernetes.io/dockerconfigjson
data:
.dockerconfigjson: eyJhdXRocyI6e...
---
apiVersion: v1
kind: Secret
metadata:
name: app-tls-secret
namespace: production
type: kubernetes.io/tls
data:
tls.crt: LS0tLS1CRUdJTi...
tls.key: LS0tLS1CRUdJTi...
# Secret operations (quote values containing ! to avoid shell history expansion)
kubectl create secret generic db-secret --from-literal=username=admin --from-literal=password='SuperSecret123!'
kubectl create secret tls app-tls-secret --cert=tls.crt --key=tls.key
kubectl create secret docker-registry registry-secret --docker-server=docker.io --docker-username=user --docker-password=pass
kubectl get secrets
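Remember that Secret values are only base64-encoded, not encrypted, in the API; anyone with read access can decode them, which is why encryption at rest and tight RBAC appear in the checklist in section 7. Decoding a value for debugging looks like this:
kubectl get secret db-secret -n production -o jsonpath='{.data.password}' | base64 -d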
4.3 PersistentVolume: Persistent Storage
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
name: fast-ssd
provisioner: ebs.csi.aws.com   # the in-tree kubernetes.io/aws-ebs provisioner does not support gp3
parameters:
  type: gp3
  csi.storage.k8s.io/fstype: ext4
  encrypted: "true"
reclaimPolicy: Retain
allowVolumeExpansion: true
mountOptions:
- debug
volumeBindingMode: WaitForFirstConsumer
---
apiVersion: v1
kind: PersistentVolume
metadata:
name: pv-data-001
spec:
capacity:
storage: 100Gi
accessModes:
- ReadWriteOnce
persistentVolumeReclaimPolicy: Retain
storageClassName: fast-ssd
hostPath:
path: /mnt/data
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
name: pvc-data
namespace: production
spec:
accessModes:
- ReadWriteOnce
resources:
requests:
storage: 50Gi
storageClassName: fast-ssd
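A PVC is consumed by referencing it from a Pod spec; with volumeBindingMode: WaitForFirstConsumer the volume is only provisioned once such a Pod is scheduled. A minimal sketch using the claim above:
cat <<EOF | kubectl apply -f -
apiVersion: v1
kind: Pod
metadata:
  name: pvc-test
  namespace: production
spec:
  containers:
  - name: app
    image: busybox:1.36
    command: ["sh", "-c", "sleep 3600"]
    volumeMounts:
    - name: data
      mountPath: /data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-data
EOF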
5. Production Best Practices
5.1 Resource Quotas and Limits
apiVersion: v1
kind: ResourceQuota
metadata:
name: production-quota
namespace: production
spec:
hard:
requests.cpu: "10"
requests.memory: 20Gi
limits.cpu: "20"
limits.memory: 40Gi
pods: "50"
services: "20"
secrets: "50"
configmaps: "50"
persistentvolumeclaims: "20"
---
apiVersion: v1
kind: LimitRange
metadata:
name: container-limits
namespace: production
spec:
limits:
- type: Container
default:
cpu: "500m"
memory: "512Mi"
defaultRequest:
cpu: "100m"
memory: "128Mi"
max:
cpu: "2"
memory: "4Gi"
min:
cpu: "50m"
memory: "64Mi"
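Quota consumption and the defaults applied by the LimitRange can be inspected directly; Pods created in this namespace without explicit requests/limits pick up the defaults above:
kubectl describe resourcequota production-quota -n production
kubectl describe limitrange container-limits -n production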
5.2 Network Policies
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
name: web-app-policy
namespace: production
spec:
podSelector:
matchLabels:
app: web-app
policyTypes:
- Ingress
- Egress
ingress:
- from:
- namespaceSelector:
matchLabels:
name: ingress-nginx
- podSelector:
matchLabels:
app: api-gateway
ports:
- protocol: TCP
port: 8080
egress:
- to:
- namespaceSelector:
matchLabels:
name: database
- podSelector:
matchLabels:
app: mysql
ports:
- protocol: TCP
port: 3306
- to:
- namespaceSelector: {}
ports:
- protocol: UDP
port: 53
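Note that the namespaceSelector entries above match a name label, which Kubernetes does not apply automatically; either label the namespaces to match or switch the selectors to the automatically added kubernetes.io/metadata.name label. A sketch of labeling plus a quick connectivity test (assumes the web-app image ships sh and nc):
kubectl label namespace ingress-nginx name=ingress-nginx
kubectl label namespace database name=database
kubectl exec -n production deploy/web-app -- sh -c 'nc -zv mysql-headless.database.svc.cluster.local 3306'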
5.3 Autoscaling
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
name: web-app-hpa
namespace: production
spec:
scaleTargetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
minReplicas: 3
maxReplicas: 20
metrics:
- type: Resource
resource:
name: cpu
target:
type: Utilization
averageUtilization: 70
- type: Resource
resource:
name: memory
target:
type: Utilization
averageUtilization: 80
- type: Pods
pods:
metric:
name: http_requests_per_second
target:
type: AverageValue
averageValue: 100
behavior:
scaleDown:
stabilizationWindowSeconds: 300
policies:
- type: Percent
value: 10
periodSeconds: 60
scaleUp:
stabilizationWindowSeconds: 0
policies:
- type: Percent
value: 100
periodSeconds: 15
- type: Pods
value: 4
periodSeconds: 15
selectPolicy: Max
---
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
name: web-app-vpa
namespace: production
spec:
targetRef:
apiVersion: apps/v1
kind: Deployment
name: web-app
updatePolicy:
updateMode: Auto
resourcePolicy:
containerPolicies:
- containerName: '*'
minAllowed:
cpu: 100m
memory: 128Mi
maxAllowed:
cpu: 2
memory: 4Gi
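Note the prerequisites: the CPU and memory targets require metrics-server, the http_requests_per_second Pods metric requires a custom-metrics adapter such as prometheus-adapter, and the VerticalPodAutoscaler CRDs and controllers are installed separately. Letting HPA and VPA both act on CPU/memory for the same Deployment is generally discouraged. Quick checks:
kubectl top pods -n production          # fails if metrics-server is missing
kubectl get hpa web-app-hpa -n production --watch
kubectl describe vpa web-app-vpa -n production   # requires the VPA CRDs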
5.4 PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
name: web-app-pdb
namespace: production
spec:
minAvailable: 2
selector:
matchLabels:
app: web-app
6. Monitoring and Logging
6.1 Monitoring with Prometheus and Grafana
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
name: web-app-monitor
namespace: production
labels:
release: prometheus
spec:
selector:
matchLabels:
app: web-app
endpoints:
- port: http
path: /metrics
interval: 30s
scrapeTimeout: 10s
---
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
name: app-alerts
namespace: monitoring
spec:
groups:
- name: application.rules
rules:
- alert: PodCrashLooping
expr: rate(kube_pod_container_status_restarts_total[15m]) * 60 * 5 > 0
for: 5m
labels:
severity: critical
annotations:
        summary: "Pod is crash looping"
        description: "Pod {{ $labels.pod }} has been restarting repeatedly over the last 15 minutes"
- alert: HighMemoryUsage
expr: container_memory_usage_bytes / container_spec_memory_limit_bytes > 0.9
for: 5m
labels:
severity: warning
annotations:
        summary: "High memory usage"
        description: "Container {{ $labels.container }} is using more than 90% of its memory limit"
- alert: HighCPUUsage
      expr: rate(container_cpu_usage_seconds_total[5m]) / (container_spec_cpu_quota / container_spec_cpu_period) > 0.8
for: 10m
labels:
severity: warning
annotations:
        summary: "High CPU usage"
        description: "Container {{ $labels.container }} is using more than 80% of its CPU limit"
- alert: PodNotReady
expr: kube_pod_status_ready{condition="true"} == 0
for: 5m
labels:
severity: critical
annotations:
        summary: "Pod not ready"
        description: "Pod {{ $labels.pod }} has been in a non-ready state for 5 minutes"
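Two things to double-check: a ServiceMonitor selects Services by their labels (not Pod labels), so the web-app-internal Service from section 3.4 needs an app: web-app label in its metadata before this monitor picks it up; and the alert expressions assume kube-state-metrics and cAdvisor/kubelet metrics are being scraped and that containers define CPU/memory limits, otherwise the ratio-based alerts misbehave. A sketch:
# Give the Service the label the ServiceMonitor selector expects
kubectl label service web-app-internal -n production app=web-app
kubectl get servicemonitor -n production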
6.2 Log Collection with EFK (Fluent Bit)
# fluentbit-config.yaml
apiVersion: v1
kind: ConfigMap
metadata:
name: fluent-bit-config
namespace: logging
data:
fluent-bit.conf: |
[SERVICE]
Flush 5
Log_Level info
Daemon off
Parsers_File parsers.conf
[INPUT]
Name tail
Path /var/log/containers/*.log
Parser docker
Tag kube.*
Refresh_Interval 10
Mem_Buf_Limit 5MB
Skip_Long_Lines On
[FILTER]
Name kubernetes
Match kube.*
Kube_URL https://kubernetes.default.svc:443
Kube_CA_File /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
Kube_Token_File /var/run/secrets/kubernetes.io/serviceaccount/token
Kube_Tag_Prefix kube.var.log.containers.
Merge_Log On
Keep_Log Off
K8S-Logging.Parser On
K8S-Logging.Exclude On
[OUTPUT]
Name es
Match *
Host elasticsearch.logging.svc
Port 9200
Logstash_Format On
Logstash_Prefix kubernetes-logs
Retry_Limit False
Replace_Dots On
Suppress_Type_Name On
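One compatibility note: the docker parser matches the old Docker JSON log format, while clusters running containerd (as installed in section 2.2) write CRI-formatted logs, so the [INPUT] section should use the cri parser (or multiline.parser cri in newer Fluent Bit releases) instead. After deploying the DaemonSet, a quick check (the label below depends on how Fluent Bit was installed):
kubectl get pods -n logging -l app.kubernetes.io/name=fluent-bit
kubectl logs -n logging daemonset/fluent-bit --tail=20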
7. Production Deployment Checklist
- [ ] Use a highly available control plane (at least 3 control-plane nodes)
- [ ] Configure an external etcd cluster
- [ ] Enable RBAC authorization
- [ ] Configure network policies (Calico/Cilium)
- [ ] Choose an appropriate Pod network CIDR
- [ ] Enable audit logging
- [ ] Set resource requests/limits on every Pod
- [ ] Configure health checks (liveness/readiness probes)
- [ ] Use Deployments/StatefulSets rather than bare Pods
- [ ] Configure Pod anti-affinity to improve availability
- [ ] Define PodDisruptionBudgets
- [ ] Manage configuration with ConfigMaps/Secrets
- [ ] Enforce Pod Security Standards
- [ ] Isolate traffic with NetworkPolicies
- [ ] Run containers as a non-root user
- [ ] Enable image security scanning
- [ ] Enable Secret encryption at rest
- [ ] Upgrade Kubernetes versions regularly
- [ ] Deploy Prometheus + Grafana
- [ ] Configure alerts on core metrics
- [ ] Deploy a log collection stack (EFK/ELK)
- [ ] Set up distributed tracing (Jaeger/Zipkin)
- [ ] Establish alert notification channels
- [ ] Back up etcd data regularly
- [ ] Back up PersistentVolume data
- [ ] Create a disaster recovery plan
- [ ] Rehearse the recovery process regularly
8. Troubleshooting Common Issues
8.1 Pods Fail to Start
# Check Pod status
kubectl get pods -n <namespace>
# Inspect events, scheduling and container state
kubectl describe pod <pod-name> -n <namespace>
# View container logs (and logs of a specific container)
kubectl logs <pod-name> -n <namespace>
kubectl logs <pod-name> -c <container-name> -n <namespace>
# List recent events in the namespace
kubectl get events -n <namespace> --sort-by='.lastTimestamp'
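When the container has no shell or crashes immediately, an ephemeral debug container can be attached instead (supported on reasonably recent clusters); a sketch:
# Attach a temporary busybox container that can see the target container's processes
kubectl debug -it <pod-name> -n <namespace> --image=busybox:1.36 --target=<container-name>
# Or copy the Pod and override the entrypoint to inspect a crash-looping container
kubectl debug <pod-name> -n <namespace> -it --copy-to=debug-copy --container=<container-name> -- sh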
8.2 Services Are Unreachable
# Check the Service
kubectl get svc -n <namespace>
kubectl describe svc <svc-name> -n <namespace>
# Check the Endpoints (empty endpoints usually mean the selector matches no ready Pods)
kubectl get endpoints <svc-name> -n <namespace>
# Check the Ingress
kubectl get ingress -n <namespace>
kubectl describe ingress <ingress-name> -n <namespace>
# Test connectivity from a throwaway Pod
kubectl run test --rm -it --image=busybox -- /bin/sh
# From inside the container
wget http:
8.3 Node Issues
# Check node status
kubectl get nodes
kubectl describe node <node-name>
# Check node and Pod resource usage (requires metrics-server)
kubectl top nodes
kubectl top pods
# Drain a node before maintenance
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
# Return the node to service
kubectl uncordon <node-name>
9. Summary and Recommendations
🎯 Key Takeaways
| Area | Key Practices | Expected Outcome |
|---|---|---|
| Cluster architecture | HA control plane + external etcd | 99.95%+ availability |
| Application deployment | Deployment + HPA + PDB | Autoscaling and high availability |
| Security hardening | RBAC + NetworkPolicy + Pod Security Standards | Layered security |
| Monitoring | Prometheus + EFK + alerting | Fast problem diagnosis |
| Storage management | StorageClass + PVC | Durable persistent data |
🚀 Implementation Recommendations
- Migrate incrementally: start by containerizing non-critical workloads
- Standardize: establish naming and labeling conventions for K8s resources
- Automate deployments: manage applications with GitOps (ArgoCD/Flux)
- Keep learning: follow new K8s release features and security updates
- Engage with the community: join the K8s community to pick up best practices