14 Go Eino AI Application Development in Practice | Kubernetes Deployment

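Notice: This AI application development tutorial series was first published on the WeChat official account of the same name, 王中阳. Reproduction without authorization is prohibited.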

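This guide explains how to deploy the Go-Eino Interview Agent platform on a Kubernetes cluster. The deployment runs every service in a container and relies on Kubernetes for orchestration, scaling, and monitoring.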

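Architecture Overview

The deployment follows a microservice pattern: each component runs in its own container and is managed through Kubernetes primitives. The system consists of frontend, backend, database, cache, and vector store services, orchestrated as one cohesive unit.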

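Prerequisites

Before deploying to Kubernetes, make sure you have the following:

  • A Kubernetes cluster (v1.24+)
  • kubectl configured with access to the cluster
  • Access to a container image registry (Docker Hub, AWS ECR, GCR, etc.)
  • At least 8 GB of RAM and 4 CPU cores available
  • A storage class configured for persistent volumes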

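Container Image Preparation

The project uses multi-stage Docker builds to produce lean production images:

Backend Container

Per backend/Dockerfile, the backend is built with the Go 1.24-alpine image and uses Alpine Linux as its runtime, which gives a minimal footprint and basic security properties.

Frontend Container

Per frontend/Dockerfile, the frontend uses the Node.js 18-alpine image to build the Next.js application and serve it in production mode.

Build the images and push them to your registry: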

# Backend
docker build -t your-registry/go-eino-backend:latest ./backend
docker push your-registry/go-eino-backend:latest

# Frontend
docker build -t your-registry/go-eino-frontend:latest ./frontend
docker push your-registry/go-eino-frontend:latest

Kubernetes Manifests

Namespace and Configuration

Create a dedicated namespace for the interview agent:

apiVersion: v1
kind: Namespace
metadata:
  name: interview-agent

ConfigMaps

Application configuration, derived from docker-compose.yml and docker-compose-prod.yml:

apiVersion: v1
kind: ConfigMap
metadata:
  name: interview-agent-config
  namespace: interview-agent
data:
  DB_HOST: "mysql-service"
  DB_PORT: "3306"
  DB_NAME: "interview_agent"
  REDIS_HOST: "redis-service"
  REDIS_PORT: "6379"
  ETCD_ENDPOINTS: "etcd-service:2379"
  TZ: "Asia/Shanghai"

Secrets

Sensitive values are kept in a Secret (values under data must be base64-encoded):

apiVersion: v1
kind: Secret
metadata:
  name: interview-agent-secrets
  namespace: interview-agent
type: Opaque
data:
  DB_PASSWORD: <base64-encoded-password>
  REDIS_PASSWORD: <base64-encoded-password>
  MYSQL_ROOT_PASSWORD: <base64-encoded-password>
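
Base64-encoding each value by hand is easy to get wrong (a stray trailing newline is a common mistake). As an alternative sketch, the same Secret can be written with stringData, which accepts plain text that the API server encodes on write. The values below are hypothetical placeholders and must be replaced; a file like this should not be committed to version control.

```yaml
apiVersion: v1
kind: Secret
metadata:
  name: interview-agent-secrets
  namespace: interview-agent
type: Opaque
stringData:
  # Hypothetical placeholder values; replace before applying.
  DB_PASSWORD: "change-me-db"
  REDIS_PASSWORD: "change-me-redis"
  MYSQL_ROOT_PASSWORD: "change-me-root"
```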

Database StatefulSet

MySQL deployment with persistent storage:

apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: interview-agent
spec:
  serviceName: mysql-service
  replicas: 1
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        ports:
        - containerPort: 3306
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: interview-agent-secrets
              key: MYSQL_ROOT_PASSWORD
        - name: MYSQL_DATABASE
          valueFrom:
            configMapKeyRef:
              name: interview-agent-config
              key: DB_NAME
        volumeMounts:
        - name: mysql-storage
          mountPath: /var/lib/mysql
  volumeClaimTemplates:
  - metadata:
      name: mysql-storage
    spec:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
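
The StatefulSet above defines no probes, so the Pod is reported Ready as soon as the container starts, which can be before MySQL accepts connections. Below is a minimal sketch of a readiness probe that could be added to the mysql container, assuming the mysqladmin client shipped in the mysql:8.0 image. The timing values are illustrative, and only the added fields are shown.

```yaml
spec:
  template:
    spec:
      containers:
      - name: mysql
        readinessProbe:
          exec:
            command:
            - /bin/sh
            - -c
            # Connect over TCP so the probe only passes once the server listens on the network.
            - mysqladmin ping -h 127.0.0.1 -uroot -p"$MYSQL_ROOT_PASSWORD"
          initialDelaySeconds: 15
          periodSeconds: 10
          timeoutSeconds: 5
```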

Redis Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis
  namespace: interview-agent
spec:
  replicas: 1
  selector:
    matchLabels:
      app: redis
  template:
    metadata:
      labels:
        app: redis
    spec:
      containers:
      - name: redis
        image: redis:7-alpine
        ports:
        - containerPort: 6379
        command: ["redis-server", "--appendonly", "yes", "--requirepass", "$(REDIS_PASSWORD)"]
        env:
        - name: REDIS_PASSWORD
          valueFrom:
            secretKeyRef:
              name: interview-agent-secrets
              key: REDIS_PASSWORD
        volumeMounts:
        - name: redis-storage
          mountPath: /data
      volumes:
      - name: redis-storage
        persistentVolumeClaim:
          claimName: redis-pvc
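
The Redis Deployment above mounts a claim named redis-pvc, and the backup CronJob later in this guide mounts backup-pvc, but neither claim is defined in the manifests shown. Below is a minimal sketch of what persistent-volumes.yaml might contain, assuming the cluster's default storage class. The sizes are illustrative guesses.

```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: redis-pvc
  namespace: interview-agent
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 5Gi    # assumed size
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: backup-pvc
  namespace: interview-agent
spec:
  accessModes: ["ReadWriteOnce"]
  resources:
    requests:
      storage: 20Gi   # assumed size
```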

Backend Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: interview-agent
spec:
  replicas: 3
  selector:
    matchLabels:
      app: backend
  template:
    metadata:
      labels:
        app: backend
    spec:
      containers:
      - name: backend
        image: your-registry/go-eino-backend:latest
        ports:
        - containerPort: 8888
        envFrom:
        - configMapRef:
            name: interview-agent-config
        - secretRef:
            name: interview-agent-secrets
        livenessProbe:
          httpGet:
            path: /health
            port: 8888
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /health
            port: 8888
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "512Mi"
            cpu: "250m"
          limits:
            memory: "1Gi"
            cpu: "500m"

Frontend Deployment

apiVersion: apps/v1
kind: Deployment
metadata:
  name: frontend
  namespace: interview-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: frontend
  template:
    metadata:
      labels:
        app: frontend
    spec:
      containers:
      - name: frontend
        image: your-registry/go-eino-frontend:latest
        ports:
        - containerPort: 3000
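        env:
        # Note: Next.js inlines NEXT_PUBLIC_* values into the client bundle at build time,
        # and backend-service resolves only inside the cluster, so browser-side requests
        # should go through the /api path exposed by Nginx.
        - name: NEXT_PUBLIC_API_BASE_URL
          value: "http://backend-service:8888/api"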
        livenessProbe:
          httpGet:
            path: /
            port: 3000
          initialDelaySeconds: 30
          periodSeconds: 10
        readinessProbe:
          httpGet:
            path: /
            port: 3000
          initialDelaySeconds: 5
          periodSeconds: 5
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"
            cpu: "250m"

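Nginx Reverse Proxy

Based on nginx.conf, Nginx runs as a reverse-proxy Deployment that serves as the entry point for the frontend and backend (no Ingress resource or Ingress controller is involved):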

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
  namespace: interview-agent
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:alpine
        ports:
        - containerPort: 80
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/conf.d/default.conf
          subPath: default.conf
      volumes:
      - name: nginx-config
        configMap:
          name: nginx-config
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-config
  namespace: interview-agent
data:
  default.conf: |
    upstream backend {
        server backend-service:8888;
    }
    
    upstream frontend {
        server frontend-service:3000;
    }
    
    server {
        listen 80;
        server_name localhost;
        client_max_body_size 100M;
        
        location /api/ {
            proxy_pass http://backend/api/;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_http_version 1.1;
            proxy_set_header Upgrade $http_upgrade;
            proxy_set_header Connection "upgrade";
            proxy_buffering off;
            proxy_cache off;
            proxy_connect_timeout 7d;
            proxy_send_timeout 7d;
            proxy_read_timeout 7d;
        }
        
        location / {
            proxy_pass http://frontend;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
            proxy_set_header X-Forwarded-Proto $scheme;
            proxy_buffering off;
            proxy_request_buffering off;
        }
        
        location /health {
            access_log off;
            return 200 "healthy\n";
            add_header Content-Type text/plain;
        }
    }

Services

apiVersion: v1
kind: Service
metadata:
  name: mysql-service
  namespace: interview-agent
spec:
  selector:
    app: mysql
  ports:
  - port: 3306
    targetPort: 3306
  clusterIP: None
---
apiVersion: v1
kind: Service
metadata:
  name: redis-service
  namespace: interview-agent
spec:
  selector:
    app: redis
  ports:
  - port: 6379
    targetPort: 6379
---
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: interview-agent
spec:
  selector:
    app: backend
  ports:
  - port: 8888
    targetPort: 8888
---
apiVersion: v1
kind: Service
metadata:
  name: frontend-service
  namespace: interview-agent
spec:
  selector:
    app: frontend
  ports:
  - port: 3000
    targetPort: 3000
---
apiVersion: v1
kind: Service
metadata:
  name: nginx-service
  namespace: interview-agent
spec:
  selector:
    app: nginx
  ports:
  - port: 80
    targetPort: 80
  type: LoadBalancer

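Deployment Process

Step-by-Step Deployment

Deploy the services in dependency order to avoid startup failures. The database and cache services must be fully available before the application services are deployed.

Deployment Commands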

# 1. Create namespace
kubectl apply -f namespace.yaml
 
# 2. Deploy configuration
kubectl apply -f configmap.yaml
kubectl apply -f secrets.yaml
 
# 3. Deploy storage and Services (the headless mysql-service must exist before the
#    StatefulSet, and Nginx needs the backend/frontend Service names to resolve at startup)
kubectl apply -f persistent-volumes.yaml
kubectl apply -f services.yaml

# 4. Deploy database layer
kubectl apply -f mysql-statefulset.yaml
kubectl apply -f redis-deployment.yaml

# 5. Wait for database readiness
kubectl wait --for=condition=ready pod -l app=mysql -n interview-agent --timeout=300s
kubectl wait --for=condition=ready pod -l app=redis -n interview-agent --timeout=300s

# 6. Deploy application services
kubectl apply -f backend-deployment.yaml
kubectl apply -f frontend-deployment.yaml

# 7. Deploy ingress
kubectl apply -f nginx-deployment.yaml
 
# 8. Verify deployment
kubectl get pods -n interview-agent
kubectl get services -n interview-agent

Scaling and High Availability

Horizontal Pod Autoscaling

Configure an HPA for the application services:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backend-hpa
  namespace: interview-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: backend
  minReplicas: 3
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
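
Only the backend HPA is shown above. A comparable sketch for the frontend Deployment follows. The replica bounds and CPU target are assumptions, not values from the project. Utilization targets are computed against the containers' resource requests and need a metrics source such as metrics-server in the cluster.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: frontend-hpa          # hypothetical name
  namespace: interview-agent
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: frontend
  minReplicas: 2              # matches the replicas in the frontend Deployment
  maxReplicas: 6              # assumed upper bound
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
```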

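Database High Availability

For production, consider a MySQL cluster. The InnoDBCluster resource below is a custom resource provided by the MySQL Operator for Kubernetes, which must be installed first; the mysql-secret it references is not defined in this guide: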

apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
  name: mysql-cluster
  namespace: interview-agent
spec:
  instances: 3
  router:
    instances: 1
  secretName: mysql-secret
  tlsUseSelfSigned: true

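Monitoring and Observability

Health Checks

The backend and frontend Deployments above define health checks for the containers built from backend/Dockerfile and frontend/Dockerfile. Kubernetes offers three probe types:

  • Liveness probes: detect unhealthy containers and restart them
  • Readiness probes: ensure traffic is routed only to containers that are ready
  • Startup probes: accommodate slow-starting applications (not configured in the manifests above)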
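
Of the three probe types, the manifests in this guide configure only liveness and readiness probes. A startup probe for the backend could be sketched as follows, reusing the /health endpoint and port 8888 from the backend Deployment; the thresholds are assumptions. Until the startup probe succeeds, liveness and readiness checks are held off, so with these values the container gets up to 30 × 5 = 150 seconds to start.

```yaml
spec:
  template:
    spec:
      containers:
      - name: backend
        startupProbe:
          httpGet:
            path: /health
            port: 8888
          failureThreshold: 30   # assumed
          periodSeconds: 5       # assumed
```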

Logging Configuration

apiVersion: v1
kind: ConfigMap
metadata:
  name: fluentd-config
  namespace: interview-agent
data:
  fluent.conf: |
    <source>
      @type tail
      path /var/log/containers/*interview-agent*.log
      pos_file /var/log/fluentd-containers.log.pos
      tag kubernetes.*
      format json
    </source>
    
    <match kubernetes.**>
      @type elasticsearch
      host elasticsearch-service
      port 9200
      index_name interview-agent-logs
    </match>

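Metrics Collection

Deploy Prometheus monitoring. ServiceMonitor is a custom resource from the Prometheus Operator, which must be installed in the cluster:

apiVersion: monitoring.coreos.com/v1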
kind: ServiceMonitor
metadata:
  name: interview-agent-metrics
  namespace: interview-agent
spec:
  selector:
    matchLabels:
      app: backend
  endpoints:
  - port: metrics
    interval: 30s
    path: /metrics
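
A ServiceMonitor selects Services by label and scrapes a port referenced by name, whereas the backend-service defined earlier carries no labels and leaves its port unnamed. Below is a sketch of an adjusted Service, assuming the backend serves /metrics on its existing port 8888 (this guide does not confirm that the application exposes such an endpoint).

```yaml
apiVersion: v1
kind: Service
metadata:
  name: backend-service
  namespace: interview-agent
  labels:
    app: backend        # matched by the ServiceMonitor's selector
spec:
  selector:
    app: backend
  ports:
  - name: metrics       # referenced by "port: metrics" in the ServiceMonitor
    port: 8888          # assumption: metrics share the API port
    targetPort: 8888
```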

Security Configuration

Network Policies

Implement network segmentation:

apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: interview-agent-netpol
  namespace: interview-agent
spec:
  podSelector: {}
  policyTypes:
  - Ingress
  - Egress
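  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          # Label set automatically on every namespace; the Namespace manifest
          # above defines no custom "name" label for a selector to match.
          kubernetes.io/metadata.name: interview-agent
  egress:
  - to:
    - namespaceSelector:
        matchLabels:
          kubernetes.io/metadata.name: interview-agent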
  - to: []
    ports:
    - protocol: TCP
      port: 53
    - protocol: UDP
      port: 53
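
Because the policy above selects every Pod in the namespace and admits ingress only from the namespace itself, it would also block external clients that reach Nginx through the LoadBalancer Service. Network policies are additive, so one way to handle this is a second policy that opens port 80 on the nginx Pods. This is a sketch, and the policy name is hypothetical.

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-nginx-ingress   # hypothetical name
  namespace: interview-agent
spec:
  podSelector:
    matchLabels:
      app: nginx
  policyTypes:
  - Ingress
  ingress:
  # No "from" clause: traffic from any source is allowed on this port.
  - ports:
    - protocol: TCP
      port: 80
```

NetworkPolicy objects only take effect when the cluster's network plugin enforces them.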

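Pod Security Policy

PodSecurityPolicy (policy/v1beta1) was removed in Kubernetes v1.25, so the manifest below applies only to v1.24 clusters. Pod Security Admission is the built-in replacement on later versions.
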
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: interview-agent-psp
spec:
  privileged: false
  allowPrivilegeEscalation: false
  requiredDropCapabilities:
    - ALL
  volumes:
    - 'configMap'
    - 'emptyDir'
    - 'projected'
    - 'secret'
    - 'downwardAPI'
    - 'persistentVolumeClaim'
  runAsUser:
    rule: 'MustRunAsNonRoot'
  seLinux:
    rule: 'RunAsAny'
  fsGroup:
    rule: 'RunAsAny'
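
On Kubernetes v1.25 and later, where PodSecurityPolicy is unavailable, a roughly comparable guardrail can be sketched with Pod Security Admission labels on the namespace. The levels chosen here are an assumption: baseline is enforced and restricted only warns and audits, since it has not been verified that every image in this guide (for example mysql:8.0) runs under the restricted profile.

```yaml
apiVersion: v1
kind: Namespace
metadata:
  name: interview-agent
  labels:
    pod-security.kubernetes.io/enforce: baseline
    pod-security.kubernetes.io/warn: restricted
    pod-security.kubernetes.io/audit: restricted
```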

Backup and Disaster Recovery

Database Backup Strategy

apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-backup
  namespace: interview-agent
spec:
  schedule: "0 2 * * *"  # Daily at 2 AM
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mysql-backup
            image: mysql:8.0
            command:
            - /bin/bash
            - -c
            - |
              mysqldump -h mysql-service -u root -p$MYSQL_ROOT_PASSWORD \
                --single-transaction --routines --triggers \
                interview_agent > /backup/backup-$(date +%Y%m%d).sql
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: interview-agent-secrets
                  key: MYSQL_ROOT_PASSWORD
            volumeMounts:
            - name: backup-storage
              mountPath: /backup
          volumes:
          - name: backup-storage
            persistentVolumeClaim:
              claimName: backup-pvc
          restartPolicy: OnFailure
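
The CronJob above covers backups only. For the recovery half, a one-off Job could load a dump back into MySQL. This is a sketch under assumptions: the Job name and the BACKUP_FILE value are hypothetical, and the file must already exist on backup-pvc.

```yaml
apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-restore            # hypothetical name
  namespace: interview-agent
spec:
  backoffLimit: 0                # do not retry a failed restore automatically
  template:
    spec:
      containers:
      - name: mysql-restore
        image: mysql:8.0
        command:
        - /bin/bash
        - -c
        - |
          mysql -h mysql-service -u root -p"$MYSQL_ROOT_PASSWORD" \
            interview_agent < "/backup/$BACKUP_FILE"
        env:
        - name: BACKUP_FILE
          value: "backup-20250101.sql"   # hypothetical file name
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: interview-agent-secrets
              key: MYSQL_ROOT_PASSWORD
        volumeMounts:
        - name: backup-storage
          mountPath: /backup
      volumes:
      - name: backup-storage
        persistentVolumeClaim:
          claimName: backup-pvc
      restartPolicy: Never
```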

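Troubleshooting Guide

Common Issues and Solutions

| Issue | Symptom | Solution |
| --- | --- | --- |
| Pod Pending | Insufficient resources | Check node capacity and adjust resource requests |
| Database connection failures | Backend logs show connection errors | Verify Service names and network policies |
| High memory usage | OOMKilled events | Increase memory limits or optimize the application |
| Slow response times | High latency metrics | Scale horizontally or optimize database queries |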

Debugging Commands

# Check pod status
kubectl get pods -n interview-agent -o wide
 
# View pod logs
kubectl logs -f deployment/backend -n interview-agent
 
# Debug pod issues
kubectl exec -it deployment/backend -n interview-agent -- /bin/sh
 
# Check resource usage
kubectl top pods -n interview-agent
 
# Verify service connectivity
kubectl port-forward service/backend-service 8888:8888 -n interview-agent

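Note: Always test the deployment in a staging environment before applying it to production. Use canary deployments to roll out new versions gradually.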

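Performance Optimization

Resource Tuning

Based on the container configuration, tune resource allocation as follows:

  • Backend: start with 512Mi memory and 250m CPU per replica
  • Frontend: start with 256Mi memory and 100m CPU per replica
  • Database: MySQL needs at least 2Gi memory and 1 CPU
  • Redis: at least 256Mi memory and 100m CPU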
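
The MySQL and Redis manifests earlier in this guide set no resources at all. The minimums listed above could be expressed as requests along these lines. The limits are assumptions added for illustration, and only the added fields are shown.

```yaml
# Fields to add to the mysql StatefulSet
spec:
  template:
    spec:
      containers:
      - name: mysql
        resources:
          requests:
            memory: "2Gi"
            cpu: "1"
          limits:
            memory: "4Gi"    # assumed
            cpu: "2"         # assumed
---
# Fields to add to the redis Deployment
spec:
  template:
    spec:
      containers:
      - name: redis
        resources:
          requests:
            memory: "256Mi"
            cpu: "100m"
          limits:
            memory: "512Mi"  # assumed
            cpu: "250m"      # assumed
```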

Caching Strategy

Use Redis to cache frequently accessed data:

apiVersion: v1
kind: ConfigMap
metadata:
  name: redis-config
  namespace: interview-agent
data:
  redis.conf: |
    maxmemory 256mb
    maxmemory-policy allkeys-lru
    save 900 1
    save 300 10
    save 60 10000
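
The redis-config ConfigMap has no effect until Redis is started with it, and the Redis Deployment shown earlier does not mount it. Below is a sketch of the changed fields, assuming the mount path /usr/local/etc/redis (any path works as long as the command matches) and a hypothetical volume name redis-conf.

```yaml
spec:
  template:
    spec:
      containers:
      - name: redis
        # Config file path first, then the command-line options from the original Deployment.
        command: ["redis-server", "/usr/local/etc/redis/redis.conf",
                  "--appendonly", "yes", "--requirepass", "$(REDIS_PASSWORD)"]
        volumeMounts:
        - name: redis-storage
          mountPath: /data
        - name: redis-conf              # hypothetical volume name
          mountPath: /usr/local/etc/redis
      volumes:
      - name: redis-storage
        persistentVolumeClaim:
          claimName: redis-pvc
      - name: redis-conf
        configMap:
          name: redis-config
```

If a memory limit is set on the Redis container, it should sit comfortably above maxmemory, because Redis uses some memory beyond the dataset itself.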

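Migration Strategy

Migrating from Docker Compose to Kubernetes

Migrating from docker-compose.yml to Kubernetes involves:

  1. Service mapping: convert each service into a Kubernetes Deployment or StatefulSet
  2. Volume management: replace bind mounts with PersistentVolumes
  3. Networking: convert Compose networks into Kubernetes Services and NetworkPolicies
  4. Configuration: move environment variables into ConfigMaps and Secrets
  5. Health checks: implement Kubernetes-native health probes

Rolling Update Strategy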

apiVersion: apps/v1
kind: Deployment
metadata:
  name: backend
  namespace: interview-agent
spec:
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 25%
      maxSurge: 25%
  # ... rest of deployment spec
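
The rolling update settings govern how Pods are replaced during a rollout; voluntary disruptions such as node drains are handled separately. A PodDisruptionBudget can keep a minimum number of backend Pods running through those. This is a sketch in which the name and the minAvailable value are assumptions chosen to fit the three backend replicas.

```yaml
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: backend-pdb            # hypothetical name
  namespace: interview-agent
spec:
  minAvailable: 2              # assumed; leaves room to evict one of three replicas
  selector:
    matchLabels:
      app: backend
```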

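Next Steps

After completing the Kubernetes deployment, consider exploring:

  1. Performance optimization, to tune cluster performance
  2. Security implementation, for advanced security configuration
  3. Monitoring and observability, to build a comprehensive monitoring setup

The Kubernetes deployment gives the Go-Eino Interview Agent platform a scalable, resilient foundation, with the monitoring, scaling, and disaster recovery capabilities needed for production-grade operation.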