My goal: get a Pipeline running on Kubeflow that solves the MNIST handwritten-digit recognition task with the PyTorch framework.
The plan:
- Build a pipeline consisting of a single component
- Split the workflow into multiple components, and add a PV volume for data storage
- Further refactor the code so component internals are "gray-boxed", exposing only the interface and required parameters
- Use Elyra to enable low-code/no-code pipeline creation
Dataset format
Data source: mnist.npz
import numpy as np
f = np.load('./data/mnist.npz')
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']
print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)
(60000, 28, 28) (60000,)
(10000, 28, 28) (10000,)
import matplotlib.pyplot as plt
fig = plt.figure()
plt.imshow(x_train[0], cmap='gray')
plt.show()
Step 1: build a pipeline consisting of a single component
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

torch.set_default_tensor_type(torch.DoubleTensor)

learning_rate = 1e-2
n_epochs = 1
path = 'mnist.npz'

# Define the network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ...

    def forward(self, x):
        ...

# Define the Dataset class
class Mnistset(Dataset):
    def __init__(self, x, y):
        ...

    def __getitem__(self, index):
        ...

    def __len__(self):
        ...

# Load the data
def load_data(path):
    ...

# Train the model
def train_model(path):
    # Build the Datasets and DataLoaders
    (x_train, y_train), (x_test, y_test) = load_data(path)
    x_train, x_test = x_train / 255.0, x_test / 255.0
    train_set = Mnistset(x_train, y_train)
    train_loader = DataLoader(train_set)
    test_set = Mnistset(x_test, y_test)
    test_loader = DataLoader(test_set)
    # Define the model and optimizer
    model = Net()
    optimizer = optim.Adam(model.parameters(), lr=learning_rate)
    # Training function
    def _train():
        ...
    # Training loop
    for epoch in range(1, n_epochs + 1):
        _train()

if __name__ == '__main__':
    train_model(path)
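The elided bodies above (`Net`, `Mnistset`, `load_data`, `_train`) are left as placeholders. As a rough sketch of what they might contain — the layer sizes, loss function, and helper name `train_one_epoch` are my own assumptions, not the original code (I also use `torch.set_default_dtype`, the modern equivalent of the deprecated `set_default_tensor_type`):

```python
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

torch.set_default_dtype(torch.float64)  # mirrors the DoubleTensor default above

class Net(nn.Module):
    """Small fully-connected classifier; layer sizes are illustrative."""
    def __init__(self):
        super(Net, self).__init__()
        self.fc1 = nn.Linear(28 * 28, 128)
        self.fc2 = nn.Linear(128, 10)

    def forward(self, x):
        x = x.view(x.size(0), -1)          # flatten (N, 28, 28) -> (N, 784)
        x = F.relu(self.fc1(x))
        return F.log_softmax(self.fc2(x), dim=1)

class Mnistset(Dataset):
    """Wraps numpy arrays so a DataLoader can iterate over them."""
    def __init__(self, x, y):
        self.x = torch.from_numpy(np.asarray(x))
        self.y = torch.from_numpy(np.asarray(y)).long()

    def __getitem__(self, index):
        return self.x[index], self.y[index]

    def __len__(self):
        return len(self.x)

def load_data(path):
    """Reads the four arrays out of mnist.npz, as in the earlier snippet."""
    with np.load(path) as f:
        return (f['x_train'], f['y_train']), (f['x_test'], f['y_test'])

def train_one_epoch(model, loader, optimizer):
    """Roughly what the elided _train() closure would do."""
    model.train()
    for data, target in loader:
        optimizer.zero_grad()
        loss = F.nll_loss(model(data), target)
        loss.backward()
        optimizer.step()
```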
Build a pipeline consisting of a single component
Building the image
Directory layout
Place the files mnist.py needs in the same directory:
Note:
- mnist.npz is the dataset
- torch-1.12.0... is the wheel (WHL) package needed at pip install time; since pip install torch is very slow, the wheel is downloaded ahead of time and installed locally
Dockerfile
FROM python:3.9
COPY mnist.py .
COPY requirements.txt .
COPY mnist.npz .
COPY torch-1.12.0+cpu-cp39-cp39-linux_x86_64.whl .
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
RUN pip install torch-1.12.0+cpu-cp39-cp39-linux_x86_64.whl
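The contents of requirements.txt are not shown in the post. Since mnist.py only imports numpy and torch, and torch is installed separately from the local wheel, a plausible requirements.txt (an assumption, not the original file) would be simply:

```text
numpy
```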
Note:
There are actually two choices of base image:
- FROM python:tag
- FROM pytorch/pytorch:tag
With the second option you build directly on a PyTorch environment, so there is no need to install torch separately (pip install torch); however, the pytorch image is over 5 GB, which would take a very long time to pull.
So I went with the first option: start from the python image and install PyTorch manually.
Build and push the image
docker pull python:3.9
chmod a+x mnist.py
docker build -t mnist:v1.2 .
docker login -u lifu963   # requires an already-registered Docker Hub account
docker tag mnist:v1.2 lifu963/mnist:v1.2
docker push lifu963/mnist:v1.2
Build the pipeline
Log in to the Kubeflow platform, create a notebook environment, then run:
import kfp

# Build the component
@kfp.dsl.component
def train_mnist_op():
    return kfp.dsl.ContainerOp(
        name='mnist',
        image='lifu963/mnist:v1.2',        # the image pushed earlier
        command=['python', 'mnist.py'])

# Build the pipeline
@kfp.dsl.pipeline(name='mnist train', description='mnist first version')
def my_pipeline():
    op = train_mnist_op()

if __name__ == '__main__':
    pipeline_func = my_pipeline
    pipeline_filename = pipeline_func.__name__ + '.yaml'
    # Compile the pipeline to a YAML file
    kfp.compiler.Compiler().compile(pipeline_func, pipeline_filename)
Appendix
Compiling the YAML file produces the following warning:
FutureWarning: Please create reusable components instead of constructing ContainerOp instances directly.
Reusable components are shareable, portable and have compatibility and support guarantees.
Please see the documentation: https://www.kubeflow.org/docs/pipelines/sdk/component-development/
#writing-your-component-definition-file
The components can be created manually (or, in case of python, using kfp.components.create_component_from_func or func_to_container_op)
and then loaded using kfp.components.load_component_from_file, load_component_from_uri or load_component_from_text:
https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.components.html
#kfp.components.load_component_from_file
warnings.warn(
The gist: instead of constructing kfp.dsl.ContainerOp instances directly, kfp now recommends building components in one of two ways:
- kfp.components.create_component_from_func() or func_to_container_op()
- kfp.components.load_component_from_file(), load_component_from_uri(), or load_component_from_text()
So when refactoring the code in step three, I plan to switch to these ways of building components.