Taking On the Challenge of Deploying a Redis Cluster on Kubernetes

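Step 1: Face Reality (the Challenges of Running a Redis Cluster on K8s)

# The soul-searching questions you will run into:
- How do the Redis nodes discover each other? (K8s networking: "you can thank me later")
- How do you persist the data? (PV: "don't pin this one on me")
- What about autoscaling? (HPA: "excuse me, how rude")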

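Step 2: Choose Your Weapon (Deployment Options Face-Off)

Option 1: StatefulSet + manual configuration (for hardcore players)

Option 2: One-click deployment with an Operator (for people who want to leave work on time)

Option 3: A Helm chart, up in three minutes (for people racing a deadline)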


Step 3: Hardcore Mode (the Full Manual Deployment)

1. Create a namespace (a deluxe private room)

# redis-ns.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: redis-club

2. Configure a StorageClass (buy the data a safe)

# redis-storage.yaml
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: redis-ssd
provisioner: kubernetes.io/gce-pd  # GCE/GKE only; use your own cluster's provisioner elsewhere
parameters:
  type: pd-ssd
allowVolumeExpansion: true

3. Deploy the StatefulSet (the signature move)

# redis-statefulset.yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-node
  namespace: redis-club
spec:
  serviceName: redis-headless
  replicas: 6  # 3 masters + 3 replicas
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      # No initContainer for cluster bootstrap: `redis-cli --cluster create` needs all
      # six redis-server processes to be up, but an initContainer runs before the
      # server in its own pod starts (and would rerun on every pod restart).
      # Form the cluster once, from any pod, after all six pods are Running:
      #   redis-cli --cluster create <six host:port pairs> --cluster-replicas 1
      containers:
      - name: redis
        image: redis:7.0
        command: ["redis-server", "/redis-config/redis.conf"]
        ports:
        - containerPort: 6379
        - containerPort: 16379  # cluster bus (client port + 10000)
        volumeMounts:
          - name: data
            mountPath: /data
          - name: config
            mountPath: /redis-config
        readinessProbe:
          exec:
            command: ["redis-cli", "ping"]
          initialDelaySeconds: 5
          periodSeconds: 5
      volumes:
      - name: config
        configMap:
          name: redis-config  # must provide a redis.conf with `cluster-enabled yes`
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: "redis-ssd"
      resources:
        requests:
          storage: 10Gi
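The StatefulSet mounts a ConfigMap named redis-config, which this walkthrough never defines. Here is a minimal sketch of one way to generate it. The function name render_redis_configmap and the redis.conf contents are assumptions for illustration, not the author's actual config. The directives themselves are standard Redis options, but tune them for your own setup.

```shell
#!/bin/sh
# Print a ConfigMap manifest named redis-config (the name the StatefulSet mounts).
# The redis.conf body is an assumed minimal cluster-mode configuration.
render_redis_configmap() {
  cm_namespace="${1:-redis-club}"
  cat <<EOF
apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: ${cm_namespace}
data:
  redis.conf: |
    port 6379
    dir /data
    appendonly yes
    cluster-enabled yes
    cluster-config-file nodes.conf
    cluster-node-timeout 5000
EOF
}

# Show the manifest; pipe it into kubectl to create the ConfigMap.
render_redis_configmap redis-club
```

To apply it, pipe the output into kubectl: `render_redis_configmap | kubectl apply -f -`. Keeping `dir /data` means nodes.conf lands on the persistent volume, so a restarted pod keeps its cluster identity.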

4. Supporting Services (so the cluster can actually talk)

# Internal communication Service (headless)
apiVersion: v1
kind: Service
metadata:
  name: redis-headless
  namespace: redis-club
spec:
  clusterIP: None
  ports:
  - port: 6379
  selector:
    app: redis

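---
# External access Service (NodePort example)
# Caveat: cluster-aware clients follow MOVED redirects to pod IPs, which are
# usually unreachable from outside the cluster, so treat this as a quick-check door only.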
apiVersion: v1
kind: Service
metadata:
  name: redis-external
  namespace: redis-club
spec:
  type: NodePort
  ports:
  - port: 6379
    targetPort: 6379
    nodePort: 30637
  selector:
    app: redis

Step 4: Verify Your Drifting Skills

1. Check Pod status

kubectl get pods -n redis-club -l app=redis -w
# Ideal state: six pods, each 1/1 Running; otherwise go read the logs, kid
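Running pods are not yet a cluster: the six nodes still have to be introduced to each other once. A rough sketch of building that one-off command follows. The helper name cluster_peers is hypothetical, and the DNS names assume the redis-headless Service and redis-club namespace used in this walkthrough. If your redis-cli version refuses host names here, substitute the pod IPs shown by `kubectl get pods -o wide`.

```shell
#!/bin/sh
# Print the stable DNS names of pods 0..N-1 behind the headless Service,
# space-separated, in the host:port form redis-cli expects.
cluster_peers() {
  peer_count="${1:-6}"
  peer_suffix="redis-headless.redis-club.svc.cluster.local"
  peer_list=""
  peer_i=0
  while [ "$peer_i" -lt "$peer_count" ]; do
    peer_list="${peer_list}redis-node-${peer_i}.${peer_suffix}:6379 "
    peer_i=$((peer_i + 1))
  done
  printf '%s\n' "${peer_list% }"
}

# Print (rather than run) the bootstrap command so it can be reviewed first.
echo "kubectl exec -it redis-node-0 -n redis-club -- redis-cli --cluster create $(cluster_peers 6) --cluster-replicas 1 --cluster-yes"
```

Run the printed command exactly once. With `--cluster-replicas 1` and six nodes, redis-cli pairs them into 3 masters and 3 replicas.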

2. Check cluster status

kubectl exec -it redis-node-0 -n redis-club -- redis-cli cluster nodes
# You should see 3 masters and 3 slaves (replicas) in their love-hate relationship
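Counting roles by eye gets old fast, so here is a small sketch that tallies them from the `cluster nodes` output. The function name count_roles is made up for this example. It relies on the third field of each line holding comma-separated flags such as `myself,master` or `slave`.

```shell
#!/bin/sh
# Count masters and replicas in `redis-cli cluster nodes` output read from stdin.
# Field 3 holds comma-separated flags, e.g. "myself,master" or "slave".
count_roles() {
  awk '
    {
      n = split($3, flags, ",")
      for (i = 1; i <= n; i++) {
        if (flags[i] == "master") masters++
        if (flags[i] == "slave") replicas++
      }
    }
    END { printf "masters=%d replicas=%d\n", masters, replicas }
  '
}
```

Usage: `kubectl exec redis-node-0 -n redis-club -- redis-cli cluster nodes | count_roles`. A healthy six-node setup should report masters=3 replicas=3.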

3. Stress test (see whether it crashes)

kubectl run memtier --image=redislabs/memtier_benchmark --restart=Never -n redis-club -- \
  --server redis-node-0.redis-headless.redis-club.svc.cluster.local \
  --port 6379 \
  --cluster-mode \
  --threads 4 \
  --clients 50 \
  --test-time 60

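Step 5: Advanced Drifting Techniques

1. Automatic failover (Sentinel mode)

# Redis Cluster already promotes a replica when its master fails, so the six-node
# cluster above needs no Sentinel. Sentinel only applies if you run plain
# master/replica Redis without cluster mode. In that case:
# Dedicated StatefulSet for Sentinel
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: redis-sentinel
  namespace: redis-club
spec:
  serviceName: sentinel-headless
  replicas: 3
  # ...configured much like the Redis nodes, but using sentinel.conf...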

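2. Dynamic scaling (showtime)

# Scale out to 8 nodes
kubectl scale statefulset redis-node -n redis-club --replicas=8

# The new pods start as empty standalone nodes: join each one with
# `redis-cli --cluster add-node <new-host:port> <existing-host:port>` first,
# then reshard the data manually
kubectl exec -it redis-node-0 -n redis-club -- redis-cli --cluster reshard redis-node-0:6379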
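The reshard prompt asks how many slots to move, so it helps to do the arithmetic beforehand. A minimal sketch, assuming slots are spread evenly across masters (the helper name slots_per_master is invented for this example):

```shell
#!/bin/sh
TOTAL_SLOTS=16384  # fixed number of hash slots in Redis Cluster

# Slots each master owns when the slots are spread evenly (integer division).
slots_per_master() {
  echo $((TOTAL_SLOTS / $1))
}

echo "3 masters: $(slots_per_master 3) slots each"   # 5461
echo "4 masters: $(slots_per_master 4) slots each"   # 4096
```

So when growing from 3 to 4 masters, the newcomer should receive about 4096 slots. A non-interactive form would look something like `redis-cli --cluster reshard redis-node-0:6379 --cluster-from all --cluster-to <new-master-node-id> --cluster-slots 4096 --cluster-yes`, where the node ID is a placeholder you read from `cluster nodes`.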

3. Backup and restore (the undo pill)

# Velero backup example
velero backup create redis-backup \
  --include-namespaces redis-club \
  --selector app=redis

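Step 6: A Self-Rescue Guide for After the Crash

Q: A node cannot join the cluster?

✅ Check DNS resolution: nslookup redis-headless.redis-club.svc.cluster.local

✅ Check the node logs: kubectl logs redis-node-0 -n redis-club

✅ Confirm that firewall rules or NetworkPolicies allow the cluster bus port (by default the client port + 10000, so 16379 here)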
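Once those checks pass, `redis-cli cluster info` tells you whether the cluster as a whole agrees. Here is a sketch of a pass/fail wrapper around it. The function name cluster_healthy is an assumption for illustration; cluster_state and cluster_slots_assigned are fields that CLUSTER INFO reports.

```shell
#!/bin/sh
# Read `redis-cli cluster info` output on stdin and succeed only if the
# cluster reports state ok with all 16384 slots assigned.
cluster_healthy() {
  # Strip carriage returns, since the reply lines may end in CRLF.
  health_info="$(tr -d '\r')"
  printf '%s\n' "$health_info" | grep -qx 'cluster_state:ok' &&
    printf '%s\n' "$health_info" | grep -qx 'cluster_slots_assigned:16384'
}
```

Usage: `kubectl exec redis-node-0 -n redis-club -- redis-cli cluster info | cluster_healthy && echo healthy || echo "not healthy yet"`.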

Q: Data persistence is failing?

✅ Check PVC status: kubectl get pvc -n redis-club

✅ Verify the StorageClass configuration: kubectl get storageclass redis-ssd

✅ Test the PV mount: kubectl exec -it redis-node-0 -n redis-club -- df -h /data
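With six or more PVCs, a filter beats squinting at the table. A small sketch that lists only the claims that are not Bound (the helper name unbound_pvcs is hypothetical). It assumes the default `kubectl get pvc` table, where the first column is NAME and the second is STATUS.

```shell
#!/bin/sh
# List PVC names whose STATUS column is not "Bound", reading the default
# `kubectl get pvc` table (header line included) from stdin.
unbound_pvcs() {
  awk 'NR > 1 && $2 != "Bound" { print $1 }'
}
```

Usage: `kubectl get pvc -n redis-club | unbound_pvcs`. Empty output means every claim is bound. For any name it prints, `kubectl describe pvc <name> -n redis-club` usually explains what the provisioner is unhappy about.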

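Q: Performance is lagging?

✅ Check network latency: kubectl exec -it redis-node-0 -n redis-club -- redis-cli -h redis-node-1.redis-headless --latency

✅ Tune kernel parameters (the THP switch is node-wide, so run it on the Kubernetes node; net.core.somaxconn is per network namespace, so for the pod it has to be set through the pod's securityContext.sysctls):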

sysctl -w net.core.somaxconn=65535
echo never > /sys/kernel/mm/transparent_hugepage/enabled
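To confirm that the THP change actually took effect, read the same file back: the kernel marks the active mode with square brackets. A tiny sketch that extracts it (the function name thp_mode is invented for this example):

```shell
#!/bin/sh
# Extract the active mode (the bracketed word) from the contents of
# /sys/kernel/mm/transparent_hugepage/enabled, e.g. "always madvise [never]".
thp_mode() {
  printf '%s\n' "$1" | sed -n 's/.*\[\(.*\)\].*/\1/p'
}

thp_mode 'always madvise [never]'   # prints: never
```

On a real node you would call it as `thp_mode "$(cat /sys/kernel/mm/transparent_hugepage/enabled)"` and expect `never`.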

The Ultimate Secret

In production, just use a Redis Operator!

(Unless you want "expert in K8s troubleshooting" on your résumé)