In the current wave of large AI models, training a model is no longer enough. The real business value comes from deploying, managing, and serving those models efficiently. In this post we build a complete AI infrastructure stack from scratch: orchestrating a MindSpore model service on Kubernetes and backing it with a Milvus vector database.
1. Environment Planning: Our Stack Architecture
1.1 System Architecture
┌────────────────┬────────────────┬────────────────────────┐
│                      Kubernetes cluster                  │
├────────────────┬────────────────┬────────────────────────┤
│   MindSpore    │     Milvus     │  Application layer     │
│ model serving  │   vector DB    │  (API gateway,         │
│    (Pods)      │ (StatefulSet)  │   monitoring, logging) │
├────────────────┼────────────────┼────────────────────────┤
│ GPU resources  │ PV/PVC storage │  Service mesh          │
│ (device plugin)│                │  (Istio, optional)     │
└────────────────┴────────────────┴────────────────────────┘
1.2 Hardware Requirements
- A Kubernetes cluster with at least 3 nodes (1 master, 2 workers)
- Worker nodes: at least 32 GB RAM, 100 GB storage, NVIDIA GPU (optional)
- Network: gigabit interconnect between nodes
1.3 Software Versions
- Kubernetes: 1.24+
- MindSpore: 2.2+
- Milvus: 2.3+
- Helm: 3.12+
- NVIDIA GPU Operator: 1.13+ (if GPUs are present)
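These floors can be sanity-checked programmatically before starting. The table and helper below are our own illustration (not part of any of the tools above), assuming versions are plain dotted strings:

```python
# Hypothetical helper: compare dotted version strings against the
# minimum versions listed above.
MIN_VERSIONS = {
    "kubernetes": "1.24",
    "mindspore": "2.2",
    "milvus": "2.3",
    "helm": "3.12",
}

def parse(v: str):
    """Turn '1.24.3' into (1, 24, 3) for tuple comparison."""
    return tuple(int(p) for p in v.split("."))

def meets_floor(component: str, installed: str) -> bool:
    """True if `installed` satisfies the documented minimum."""
    floor = parse(MIN_VERSIONS[component])
    got = parse(installed)
    # Pad the shorter tuple so (1, 24) and (1, 24, 0) compare equal.
    width = max(len(floor), len(got))
    floor += (0,) * (width - len(floor))
    got += (0,) * (width - len(got))
    return got >= floor

print(meets_floor("kubernetes", "1.28.0"))  # True
print(meets_floor("helm", "3.9.4"))         # False
```

Tuple comparison handles multi-digit components correctly ("1.10" > "1.9"), which naive string comparison does not.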
2. Step 1: Set Up the Kubernetes Cluster
2.1 Bootstrapping a three-node cluster with kubeadm
On the master node:
#!/bin/bash
# Install dependencies
sudo apt update
sudo apt install -y apt-transport-https ca-certificates curl
# Install a container runtime. dockershim was removed in Kubernetes 1.24,
# so plain Docker would also require cri-dockerd; containerd is simpler here.
sudo apt install -y containerd
sudo systemctl enable containerd
sudo systemctl start containerd
# Install kubeadm, kubelet, kubectl
# Note: apt.kubernetes.io has been deprecated in favor of pkgs.k8s.io;
# adjust the repository lines below if the legacy repo is unavailable.
sudo curl -fsSLo /usr/share/keyrings/kubernetes-archive-keyring.gpg https://packages.cloud.google.com/apt/doc/apt-key.gpg
echo "deb [signed-by=/usr/share/keyrings/kubernetes-archive-keyring.gpg] https://apt.kubernetes.io/ kubernetes-xenial main" | sudo tee /etc/apt/sources.list.d/kubernetes.list
sudo apt update
sudo apt install -y kubelet=1.28.0-00 kubeadm=1.28.0-00 kubectl=1.28.0-00
sudo apt-mark hold kubelet kubeadm kubectl
# Initialize the cluster
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 \
  --apiserver-advertise-address=$(hostname -I | awk '{print $1}')
# Configure kubectl
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
# Install the CNI plugin (Flannel)
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml
Join the worker nodes:
# On the master node, print the join command
kubeadm token create --print-join-command
# Run the printed command on each worker, e.g.:
# sudo kubeadm join 192.168.1.100:6443 --token xxxxxx --discovery-token-ca-cert-hash sha256:xxxx
2.2 Verify Cluster State
# Node status
kubectl get nodes -o wide
# All pods across namespaces
kubectl get pods --all-namespaces
# Cluster info
kubectl cluster-info
3. Step 2: Deploy GPU Support (optional, if GPUs are present)
3.1 Install the NVIDIA GPU Operator
# Add the NVIDIA Helm repository
helm repo add nvidia https://helm.ngc.nvidia.com/nvidia
helm repo update
# Install the GPU Operator
# (driver.enabled=false assumes the NVIDIA driver is already installed on the hosts)
helm install --wait --generate-name \
  -n gpu-operator --create-namespace \
  nvidia/gpu-operator \
  --set driver.enabled=false \
  --set toolkit.enabled=true
# Verify the installation
kubectl get pods -n gpu-operator
kubectl describe node | grep -A 10 Capacity
4. Step 3: Deploy the Milvus Vector Database
4.1 Deploy Milvus with Helm
# Add the Milvus Helm repository
helm repo add milvus https://milvus-io.github.io/milvus-helm
helm repo update
# Create the namespace
kubectl create namespace milvus
# Install Milvus (standalone mode, shown here, suits dev/test;
# production deployments should prefer cluster mode)
helm install milvus milvus/milvus \
  --namespace milvus \
  --set etcd.replicaCount=1 \
  --set minio.mode=standalone \
  --set standalone.resources.requests.memory=4Gi \
  --set standalone.resources.requests.cpu=2 \
  --set standalone.persistence.enabled=true \
  --set standalone.persistence.size=50Gi
# Alternatively, use a custom values file
cat > milvus-values.yaml << EOF
# Milvus configuration
cluster:
  enabled: false  # standalone mode, suitable for testing
standalone:
  replicas: 1
  resources:
    requests:
      memory: "8Gi"
      cpu: "2"
    limits:
      memory: "16Gi"
      cpu: "4"
  persistence:
    enabled: true
    size: 100Gi
    storageClass: "standard"
# Dependency services
etcd:
  replicaCount: 1
  persistence:
    enabled: true
    size: 10Gi
minio:
  mode: standalone
  persistence:
    enabled: true
    size: 50Gi
# Service exposure
service:
  type: NodePort
  ports:
    milvus: 19530
EOF
# Install with the custom values file
helm install milvus milvus/milvus \
  -f milvus-values.yaml \
  --namespace milvus
4.2 Verify the Milvus Deployment
# Watch Milvus component status
kubectl get pods -n milvus -w
# Check the services
kubectl get svc -n milvus
# Test the connection (wait until all pods are Ready)
kubectl port-forward svc/milvus -n milvus 19530:19530 &
# In another terminal
pip install pymilvus
python3 -c "
from pymilvus import connections, CollectionSchema, FieldSchema, DataType, Collection, utility
# Connect to Milvus
connections.connect(host='localhost', port='19530')
# Check the connection
print('Server version:', utility.get_server_version())
# Create a test collection
fields = [
    FieldSchema(name='id', dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name='embedding', dtype=DataType.FLOAT_VECTOR, dim=128)
]
schema = CollectionSchema(fields, description='test collection')
collection = Collection('test_collection', schema)
print('Milvus connection OK!')
"
5. Step 4: Build the MindSpore Model Image
5.1 Create the project
# Create the project directory
mkdir mindspore-milvus-demo
cd mindspore-milvus-demo
5.2 Write the inference code
Create model_server.py:
#!/usr/bin/env python3
"""
MindSpore model service with Milvus vector database integration.
"""
import numpy as np
import mindspore as ms
import mindspore.nn as nn
from mindspore import Tensor, context
from mindspore.train.serialization import load_checkpoint, load_param_into_net
from flask import Flask, request, jsonify
import json
import time
import threading
from typing import List, Dict
import logging
from pymilvus import connections, Collection, CollectionSchema, FieldSchema, DataType, utility

# Logging setup
logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# Flask application
app = Flask(__name__)
class TextEncoder(nn.Cell):
    """Text encoder (toy example model)."""
    def __init__(self, input_dim=768, hidden_dim=1024, output_dim=128):
        super(TextEncoder, self).__init__()
        self.fc1 = nn.Dense(input_dim, hidden_dim)
        self.relu = nn.ReLU()
        self.fc2 = nn.Dense(hidden_dim, output_dim)
        self.norm = nn.LayerNorm([output_dim])

    def construct(self, x):
        x = self.fc1(x)
        x = self.relu(x)
        x = self.fc2(x)
        x = self.norm(x)
        return x
class VectorSearchEngine:
    """Vector search engine backed by Milvus."""
    def __init__(self, milvus_host: str = "milvus.milvus.svc.cluster.local", milvus_port: int = 19530):
        self.milvus_host = milvus_host
        self.milvus_port = milvus_port
        self.collection_name = "mindspore_vectors"
        self.dim = 128
        self.init_milvus()

    def init_milvus(self):
        """Initialize the Milvus connection."""
        try:
            connections.connect(
                host=self.milvus_host,
                port=str(self.milvus_port)
            )
            logger.info(f"Connected to Milvus: {self.milvus_host}:{self.milvus_port}")
            # Create the collection if it does not exist yet
            from pymilvus import utility
            if not utility.has_collection(self.collection_name):
                self.create_collection()
            else:
                self.collection = Collection(self.collection_name)
                self.collection.load()
            logger.info("Milvus initialization complete")
        except Exception as e:
            logger.error(f"Milvus connection failed: {e}")
            raise

    def create_collection(self):
        """Create the vector collection."""
        fields = [
            FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
            FieldSchema(name="vector", dtype=DataType.FLOAT_VECTOR, dim=self.dim),
            FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=1000),
            FieldSchema(name="metadata", dtype=DataType.VARCHAR, max_length=2000),
            FieldSchema(name="timestamp", dtype=DataType.INT64)
        ]
        schema = CollectionSchema(fields, description="Vector store for the MindSpore model")
        self.collection = Collection(self.collection_name, schema)
        # Build an index on the vector field
        index_params = {
            "metric_type": "L2",
            "index_type": "IVF_FLAT",
            "params": {"nlist": 128}
        }
        self.collection.create_index("vector", index_params)
        self.collection.load()
        logger.info(f"Collection {self.collection_name} created")

    def insert_vectors(self, vectors: List[List[float]], texts: List[str], metadata: List[Dict] = None):
        """Insert vectors with their source texts and metadata."""
        if metadata is None:
            metadata = [{} for _ in range(len(vectors))]
        # Column order must match the schema (minus the auto_id primary key)
        entities = [
            vectors,
            texts,
            [json.dumps(m) for m in metadata],
            [int(time.time())] * len(vectors)
        ]
        insert_result = self.collection.insert(entities)
        self.collection.flush()
        # pymilvus exposes the generated IDs as `primary_keys`
        logger.info(f"Inserted {len(vectors)} vectors, IDs: {insert_result.primary_keys}")
        return insert_result.primary_keys

    def search_similar(self, query_vector: List[float], top_k: int = 10):
        """Search for the most similar vectors."""
        search_params = {"metric_type": "L2", "params": {"nprobe": 10}}
        results = self.collection.search(
            [query_vector],
            "vector",
            search_params,
            limit=top_k,
            output_fields=["text", "metadata", "timestamp"]
        )
        return results[0] if results else []
class MindSporeModelService:
    """MindSpore model service."""
    def __init__(self, model_path: str = None):
        # Pick the execution target (prefer GPU, fall back to CPU)
        try:
            context.set_context(device_target="GPU", mode=context.GRAPH_MODE)
            logger.info("Running on GPU")
        except Exception:
            context.set_context(device_target="CPU", mode=context.GRAPH_MODE)
            logger.info("Running on CPU")
        # Initialize the model
        self.encoder = TextEncoder()
        if model_path:
            param_dict = load_checkpoint(model_path)
            load_param_into_net(self.encoder, param_dict)
            logger.info(f"Loaded model weights: {model_path}")
        self.encoder.set_train(False)
        # Initialize the vector search engine
        self.vector_engine = VectorSearchEngine()
        # Warm up the model
        self._warm_up()

    def _warm_up(self):
        """Run one dummy forward pass so the first request is not slow."""
        dummy_input = Tensor(np.random.randn(1, 768).astype(np.float32))
        _ = self.encoder(dummy_input)
        logger.info("Model warm-up complete")

    def encode_text(self, text: str) -> List[float]:
        """Encode text into a vector (placeholder: a real service would tokenize `text` and feed it to the model)."""
        dummy_embedding = np.random.randn(768).astype(np.float32)
        input_tensor = Tensor(dummy_embedding.reshape(1, -1))
        vector = self.encoder(input_tensor)
        return vector.asnumpy().flatten().tolist()

    def process_and_store(self, texts: List[str], metadata: List[Dict] = None):
        """Encode texts and store them in Milvus."""
        vectors = [self.encode_text(text) for text in texts]
        ids = self.vector_engine.insert_vectors(vectors, texts, metadata)
        return {"ids": ids, "count": len(texts)}

    def semantic_search(self, query: str, top_k: int = 5):
        """Semantic search."""
        query_vector = self.encode_text(query)
        results = self.vector_engine.search_similar(query_vector, top_k)
        formatted_results = []
        for result in results:
            formatted_results.append({
                "text": result.entity.get('text'),
                "metadata": json.loads(result.entity.get('metadata') or '{}'),
                "score": float(result.distance),
                "timestamp": result.entity.get('timestamp')
            })
        return formatted_results

# Global model instance
model_service = None

def init_model_service():
    """Initialize the global model service."""
    global model_service
    model_service = MindSporeModelService()
    logger.info("MindSpore model service initialized")
# Flask routes
@app.route('/health', methods=['GET'])
def health_check():
    """Health check."""
    return jsonify({
        "status": "healthy",
        "service": "mindspore-model-service",
        "timestamp": int(time.time())
    })

@app.route('/encode', methods=['POST'])
def encode_text():
    """Text encoding endpoint."""
    try:
        data = request.json
        texts = data.get('texts', [])
        if not texts:
            return jsonify({"error": "texts must not be empty"}), 400
        results = []
        for text in texts:
            vector = model_service.encode_text(text)
            results.append({
                "text": text,
                "vector": vector,
                "dimension": len(vector)
            })
        return jsonify({
            "results": results,
            "count": len(results)
        })
    except Exception as e:
        logger.error(f"Encoding failed: {e}")
        return jsonify({"error": str(e)}), 500
@app.route('/store', methods=['POST'])
def store_vectors():
    """Vector storage endpoint."""
    try:
        data = request.json
        texts = data.get('texts', [])
        # Normalize an empty metadata list to None so default metadata is generated
        metadata = data.get('metadata') or None
        if metadata is not None and len(metadata) != len(texts):
            return jsonify({"error": "metadata must be the same length as texts"}), 400
        result = model_service.process_and_store(texts, metadata)
        return jsonify(result)
    except Exception as e:
        logger.error(f"Store failed: {e}")
        return jsonify({"error": str(e)}), 500

@app.route('/search', methods=['POST'])
def semantic_search():
    """Semantic search endpoint."""
    try:
        data = request.json
        query = data.get('query', '')
        top_k = data.get('top_k', 5)
        if not query:
            return jsonify({"error": "query must not be empty"}), 400
        results = model_service.semantic_search(query, top_k)
        return jsonify({
            "query": query,
            "results": results,
            "count": len(results)
        })
    except Exception as e:
        logger.error(f"Search failed: {e}")
        return jsonify({"error": str(e)}), 500

@app.route('/batch_process', methods=['POST'])
def batch_process():
    """Batch processing endpoint."""
    try:
        data = request.json
        texts = data.get('texts', [])
        batch_size = data.get('batch_size', 32)
        if not texts:
            return jsonify({"error": "texts must not be empty"}), 400
        # Process in batches
        all_results = []
        for i in range(0, len(texts), batch_size):
            batch = texts[i:i+batch_size]
            vectors = [model_service.encode_text(text) for text in batch]
            all_results.extend(vectors)
            logger.info(f"Processed batch {i//batch_size + 1}/{(len(texts)-1)//batch_size + 1}")
        return jsonify({
            "total_count": len(all_results),
            "dimension": len(all_results[0]) if all_results else 0,
            "batch_count": (len(texts) - 1) // batch_size + 1
        })
    except Exception as e:
        logger.error(f"Batch processing failed: {e}")
        return jsonify({"error": str(e)}), 500
# Kick off model initialization in a background thread at import time, so the
# service also comes up under gunicorn (which never runs the __main__ block).
init_thread = threading.Thread(target=init_model_service, daemon=True)
init_thread.start()

if __name__ == '__main__':
    # Development server; production uses the gunicorn CMD from the Dockerfile
    app.run(host='0.0.0.0', port=5000, threaded=True)
5.3 Create the Dockerfile
# Dockerfile.mindspore
FROM mindspore/mindspore-gpu:2.2.0

# Working directory
WORKDIR /app

# Install Python dependencies
RUN pip install --no-cache-dir \
    flask==2.3.3 \
    pymilvus==2.3.0 \
    numpy==1.24.3 \
    protobuf==4.23.4 \
    gunicorn==21.2.0

# Copy the application code
COPY model_server.py /app/
COPY requirements.txt /app/

# Install any extra dependencies
RUN if [ -f "requirements.txt" ]; then pip install -r requirements.txt; fi

# Create a non-root user
RUN useradd -m -u 1000 appuser && chown -R appuser:appuser /app
USER appuser

# Expose the service port
EXPOSE 5000

# Health check (assumes curl is present in the base image)
HEALTHCHECK --interval=30s --timeout=3s --start-period=5s --retries=3 \
    CMD curl -f http://localhost:5000/health || exit 1

# Start command
CMD ["gunicorn", "--bind", "0.0.0.0:5000", "--workers", "4", "--threads", "2", "model_server:app"]
5.4 Create requirements.txt
flask==2.3.3
pymilvus==2.3.0
numpy==1.24.3
protobuf==4.23.4
gunicorn==21.2.0
5.5 Build and Push the Image
# Build the image
docker build -t your-registry/mindspore-model:2.2.0 -f Dockerfile.mindspore .
# Push the image (if you have a private registry)
# docker push your-registry/mindspore-model:2.2.0
# Smoke-test the image
docker run -p 5000:5000 --name test-model your-registry/mindspore-model:2.2.0
6. Step 5: Kubernetes Deployment Manifests
6.1 Create the namespace
kubectl create namespace ai-platform
6.2 Create a ConfigMap (configuration files)
# configmap.yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mindspore-config
  namespace: ai-platform
data:
  model_config.json: |
    {
      "model_name": "text-encoder",
      "version": "2.2.0",
      "embedding_dim": 128,
      "max_batch_size": 32,
      "milvus_host": "milvus.milvus.svc.cluster.local",
      "milvus_port": 19530,
      "log_level": "INFO"
    }
  nginx.conf: |
    user nginx;
    worker_processes auto;
    events {
      worker_connections 1024;
    }
    http {
      upstream model_backend {
        # nginx runs as a sidecar in the same pod, so proxy to localhost
        server 127.0.0.1:5000;
      }
      server {
        listen 80;
        location / {
          proxy_pass http://model_backend;
          proxy_set_header Host $host;
          proxy_set_header X-Real-IP $remote_addr;
        }
        location /health {
          proxy_pass http://model_backend/health;
          access_log off;
        }
      }
    }
6.3 Create the MindSpore Model Deployment
# mindspore-deployment.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: mindspore-model
  namespace: ai-platform
  labels:
    app: mindspore-model
    component: ai-model
spec:
  replicas: 2
  selector:
    matchLabels:
      app: mindspore-model
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxSurge: 1
      maxUnavailable: 0
  template:
    metadata:
      labels:
        app: mindspore-model
        component: ai-model
    spec:
      # Node selector (when using GPUs)
      # nodeSelector:
      #   accelerator: nvidia-gpu
      containers:
      - name: model-server
        image: your-registry/mindspore-model:2.2.0
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 5000
          name: http
          protocol: TCP
        env:
        - name: MILVUS_HOST
          value: "milvus.milvus.svc.cluster.local"
        - name: MILVUS_PORT
          value: "19530"
        - name: MODEL_DEVICE
          value: "CPU"  # or "GPU"
        - name: LOG_LEVEL
          value: "INFO"
        resources:
          requests:
            memory: "4Gi"
            cpu: "2"
            # nvidia.com/gpu: 1  # when using GPUs
          limits:
            memory: "8Gi"
            cpu: "4"
            # nvidia.com/gpu: 1  # when using GPUs
        volumeMounts:
        - name: config-volume
          mountPath: /app/config
          readOnly: true
        livenessProbe:
          httpGet:
            path: /health
            port: 5000
          initialDelaySeconds: 60
          periodSeconds: 30
          timeoutSeconds: 5
        readinessProbe:
          httpGet:
            path: /health
            port: 5000
          initialDelaySeconds: 30
          periodSeconds: 10
          timeoutSeconds: 3
        startupProbe:
          httpGet:
            path: /health
            port: 5000
          failureThreshold: 30
          periodSeconds: 10
      - name: nginx-sidecar
        image: nginx:1.25-alpine
        ports:
        - containerPort: 80
          name: nginx-http
        volumeMounts:
        - name: nginx-config
          mountPath: /etc/nginx/nginx.conf
          subPath: nginx.conf
          readOnly: true
      volumes:
      - name: config-volume
        configMap:
          name: mindspore-config
      - name: nginx-config
        configMap:
          name: mindspore-config
          items:
          - key: nginx.conf
            path: nginx.conf
---
apiVersion: v1
kind: Service
metadata:
  name: mindspore-model
  namespace: ai-platform
  labels:
    app: mindspore-model
spec:
  selector:
    app: mindspore-model
  ports:
  - name: http
    port: 80
    targetPort: 80
    protocol: TCP
  - name: model-http
    port: 5000
    targetPort: 5000
    protocol: TCP
  type: ClusterIP
---
apiVersion: v1
kind: Service
metadata:
  name: mindspore-model-external
  namespace: ai-platform
spec:
  selector:
    app: mindspore-model
  ports:
  - port: 80
    targetPort: 80
    nodePort: 30080
  type: NodePort
6.4 Create an Ingress (for external access)
# ingress.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: mindspore-ingress
  namespace: ai-platform
  annotations:
    nginx.ingress.kubernetes.io/rewrite-target: /
    nginx.ingress.kubernetes.io/proxy-body-size: "100m"
    cert-manager.io/cluster-issuer: "letsencrypt-prod"  # if using cert-manager
spec:
  ingressClassName: nginx
  tls:
  - hosts:
    - ai-model.your-domain.com
    secretName: mindspore-tls
  rules:
  - host: ai-model.your-domain.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: mindspore-model
            port:
              number: 80
7. Step 6: Deploy and Test
7.1 Apply all manifests
# Create the namespace
kubectl create namespace ai-platform
# Apply the manifests
kubectl apply -f configmap.yaml
kubectl apply -f mindspore-deployment.yaml
kubectl apply -f ingress.yaml  # optional
# Check deployment status
kubectl get all -n ai-platform
kubectl get pods -n ai-platform -w
7.2 Verify the Service
# Port-forward for testing
kubectl port-forward svc/mindspore-model -n ai-platform 8080:80
# In another terminal
curl http://localhost:8080/health
curl -X POST http://localhost:8080/encode \
  -H "Content-Type: application/json" \
  -d '{"texts": ["hello world", "deep learning"]}'
curl -X POST http://localhost:8080/search \
  -H "Content-Type: application/json" \
  -d '{"query": "artificial intelligence", "top_k": 3}'
7.3 性能测试脚本
# test_performance.py
import requests
import time
import concurrent.futures
import json
BASE_URL = "http://localhost:8080"
def test_encode_performance():
"""测试编码性能"""
texts = [f"测试文本{i}: 人工智能和大模型正在改变世界" for i in range(100)]
start = time.time()
response = requests.post(
f"{BASE_URL}/encode",
json={"texts": texts, "batch_size": 10}
)
end = time.time()
print(f"编码100个文本耗时: {end-start:.2f}秒")
print(f"平均每个文本: {(end-start)/100*1000:.1f}毫秒")
return response.json()
def test_search_performance():
"""测试搜索性能"""
queries = [
"机器学习",
"深度学习",
"自然语言处理",
"计算机视觉",
"强化学习"
]
times = []
for query in queries:
start = time.time()
response = requests.post(
f"{BASE_URL}/search",
json={"query": query, "top_k": 5}
)
end = time.time()
times.append(end-start)
print(f"查询 '{query}' 耗时: {end-start:.3f}秒")
print(f"结果数量: {len(response.json().get('results', []))}")
print(f"\n平均搜索耗时: {sum(times)/len(times):.3f}秒")
def concurrent_test(num_requests=50):
"""并发测试"""
def make_request(i):
response = requests.post(
f"{BASE_URL}/encode",
json={"texts": [f"并发测试请求{i}"], "batch_size": 1}
)
return response.status_code
start = time.time()
with concurrent.futures.ThreadPoolExecutor(max_workers=10) as executor:
futures = [executor.submit(make_request, i) for i in range(num_requests)]
results = [f.result() for f in concurrent.futures.as_completed(futures)]
end = time.time()
print(f"\n并发{num_requests}个请求耗时: {end-start:.2f}秒")
print(f"QPS: {num_requests/(end-start):.1f}")
print(f"成功请求数: {results.count(200)}/{len(results)}")
if __name__ == "__main__":
print("=== MindSpore模型服务性能测试 ===")
# 1. 健康检查
print("\n1. 健康检查:")
health = requests.get(f"{BASE_URL}/health")
print(f"状态码: {health.status_code}")
print(f"响应: {health.json()}")
# 2. 编码性能测试
print("\n2. 编码性能测试:")
test_encode_performance()
# 3. 搜索性能测试
print("\n3. 搜索性能测试:")
test_search_performance()
# 4. 并发测试
print("\n4. 并发性能测试:")
concurrent_test(100)
8. Step 7: Monitoring and Operations
8.1 Deploy Prometheus
# Add the Prometheus Helm repository
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm repo update
# Install the kube-prometheus-stack (includes Grafana)
helm install prometheus prometheus-community/kube-prometheus-stack \
  --namespace monitoring \
  --create-namespace \
  --set grafana.adminPassword=admin123
8.2 Export Custom Metrics
# metrics_exporter.py
from prometheus_client import start_http_server, Gauge, Counter, Histogram
import time
import threading
# Import the module, not the variable, so we always read the current value of
# model_server.model_service (it is assigned asynchronously after startup)
import model_server

# Metric definitions
MODEL_INFERENCE_TIME = Histogram(
    'model_inference_duration_seconds',
    'Model inference time',
    ['model_name', 'operation']
)
REQUESTS_TOTAL = Counter(
    'model_requests_total',
    'Total number of requests',
    ['endpoint', 'method', 'status']
)
EMBEDDING_VECTOR_DIM = Gauge(
    'embedding_vector_dimension',
    'Embedding vector dimension'
)
MILVUS_CONNECTION_STATUS = Gauge(
    'milvus_connection_status',
    'Milvus connection status (1=up, 0=down)'
)

class MetricsExporter:
    def __init__(self, port=9100):
        self.port = port
        self.running = True

    def start(self):
        # Start the Prometheus HTTP server
        start_http_server(self.port)
        print(f"Metrics exporter listening on port {self.port}")
        # Start the background update thread
        thread = threading.Thread(target=self.update_metrics)
        thread.daemon = True
        thread.start()

    def update_metrics(self):
        """Refresh gauges periodically."""
        while self.running:
            try:
                # Milvus connection status
                svc = model_server.model_service
                if svc and svc.vector_engine:
                    MILVUS_CONNECTION_STATUS.set(1)
                else:
                    MILVUS_CONNECTION_STATUS.set(0)
                # Embedding dimension
                EMBEDDING_VECTOR_DIM.set(128)
            except Exception as e:
                print(f"Failed to update metrics: {e}")
            time.sleep(10)

    def stop(self):
        self.running = False
8.3 Create a Grafana Dashboard
# grafana-dashboard.json
{
  "dashboard": {
    "title": "MindSpore model service monitoring",
    "panels": [
      {
        "title": "Request QPS",
        "type": "graph",
        "targets": [{
          "expr": "rate(model_requests_total[5m])",
          "legendFormat": "{{endpoint}}"
        }]
      },
      {
        "title": "Inference latency",
        "type": "heatmap",
        "targets": [{
          "expr": "model_inference_duration_seconds_bucket",
          "legendFormat": "{{operation}}"
        }]
      },
      {
        "title": "Memory usage",
        "type": "graph",
        "targets": [{
          "expr": "container_memory_usage_bytes{pod=~'mindspore-model.*'}",
          "legendFormat": "{{pod}}"
        }]
      }
    ]
  }
}
9. Step 8: Scaling and Optimization
9.1 Horizontal Pod Autoscaling (HPA)
# hpa.yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: mindspore-model-hpa
  namespace: ai-platform
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: mindspore-model
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 80
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300
      policies:
      - type: Percent
        value: 10
        periodSeconds: 60
9.2 Using GPU Resources
# Add GPU resource requests to the Deployment
resources:
  limits:
    nvidia.com/gpu: 1
  requests:
    nvidia.com/gpu: 1
# Add the node selector
nodeSelector:
  accelerator: nvidia-gpu
# Label the GPU nodes
kubectl label nodes <node-name> accelerator=nvidia-gpu
10. Troubleshooting Guide
10.1 Common Problems
Problem 1: Pods fail to start
# Inspect the pod
kubectl describe pod <pod-name> -n ai-platform
# Container logs
kubectl logs <pod-name> -n ai-platform -c model-server
kubectl logs <pod-name> -n ai-platform -c nginx-sidecar
# Recent events
kubectl get events -n ai-platform --sort-by='.lastTimestamp'
Problem 2: Milvus connection failures
# Check Milvus pod status
kubectl get pods -n milvus
kubectl logs -n milvus <milvus-pod-name>
# Test network reachability (19530 is gRPC, so this only verifies the TCP connection)
kubectl exec -it <mindspore-pod> -n ai-platform -- curl -v telnet://milvus.milvus.svc.cluster.local:19530
Problem 3: GPUs unavailable
# Check the GPU operator pods
kubectl get pods -n gpu-operator
# Check node GPU resources
kubectl describe node | grep -A 5 -B 5 nvidia
# GPU driver logs
kubectl logs -n gpu-operator -l app=nvidia-driver-daemonset
10.2 Performance Tuning Tips
- Model:
  - Use MindSpore graph mode
  - Enable operator fusion
  - Use mixed precision
- Milvus:
  - Tune index parameters for your data size
  - Use SSD-backed storage
  - Size the cache appropriately
- Kubernetes:
  - Use local storage for hot data
  - Set resource requests and limits realistically
  - Enable topology-aware scheduling
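On the "tune index parameters" point: for IVF-style indexes like the IVF_FLAT used in section 5, nlist and nprobe trade recall against latency, and a commonly cited starting point is nlist ≈ 4·√N. The helper below is our own sketch of that rule of thumb (not a Milvus API); the resulting dicts have the same shape as the index_params and search_params in model_server.py:

```python
import math

def suggest_ivf_params(num_vectors: int, recall_bias: float = 0.1):
    """Suggest IVF_FLAT parameters for a collection of `num_vectors` rows.

    nlist ~ 4 * sqrt(N) is a common starting point; nprobe is then a
    fraction of nlist (higher fraction = better recall, slower search).
    """
    nlist = max(1, int(4 * math.sqrt(num_vectors)))
    nprobe = max(1, int(nlist * recall_bias))
    index_params = {
        "metric_type": "L2",
        "index_type": "IVF_FLAT",
        "params": {"nlist": nlist},
    }
    search_params = {"metric_type": "L2", "params": {"nprobe": nprobe}}
    return index_params, search_params

idx, srch = suggest_ivf_params(1_000_000)
print(idx["params"]["nlist"])    # 4000
print(srch["params"]["nprobe"])  # 400
```

Treat the output only as a starting point: benchmark recall and p99 latency on your own data before fixing the values, since the fixed nlist=128 used earlier is only reasonable for small test collections.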