How to Get Your First Pipeline Running on Kubeflow (Part 1)


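My goal: get a Pipeline running on Kubeflow that uses the PyTorch framework to solve the MNIST (handwritten digit recognition) task.

The plan:

  • Build a Pipeline made of a single component.
  • Split the workflow into stages, wrap each one as its own component, and add a PV (PersistentVolume) to store the data.
  • Refine the code further so that each component's internals become a "gray box" that exposes only its interface and the parameters it needs.
  • Combine it with Elyra to create pipelines in a low-code/no-code way.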

Dataset format

Data source: mnist.npz

import numpy as np

f = np.load('./data/mnist.npz')
x_train, y_train = f['x_train'], f['y_train']
x_test, y_test = f['x_test'], f['y_test']

print(x_train.shape,y_train.shape)
print(x_test.shape,y_test.shape)

(60000, 28, 28) (60000,)

(10000, 28, 28) (10000,)

import matplotlib.pyplot as plt
fig = plt.figure()
plt.imshow(x_train[0], cmap='gray')
plt.show()

[Figure: download.png, the plot of x_train[0] in grayscale]
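Before baking the archive into an image, it can help to confirm that a split really has the layout printed above. The following is a minimal sketch and not part of the original script: `describe_split` is a hypothetical helper, and the arrays are synthetic stand-ins (assumed to be uint8 pixels in 0..255, 28x28 each) so the snippet runs without mnist.npz.

```python
import numpy as np


def describe_split(x, y):
    """Return basic facts about one split: shapes, dtype and pixel range."""
    if x.shape[0] != y.shape[0]:
        raise ValueError("images and labels must have the same length")
    return {
        "x_shape": x.shape,
        "y_shape": y.shape,
        "dtype": str(x.dtype),
        "min": int(x.min()),
        "max": int(x.max()),
    }


# Synthetic stand-in for a few 28x28 grayscale images (assumption: uint8 pixels).
rng = np.random.default_rng(0)
x_fake = rng.integers(0, 256, size=(8, 28, 28), dtype=np.uint8)
y_fake = rng.integers(0, 10, size=(8,), dtype=np.uint8)

info = describe_split(x_fake, y_fake)
print(info)

# Dividing by 255.0 maps the pixels into [0, 1] and promotes them to float64.
x_scaled = x_fake / 255.0
print(x_scaled.dtype, x_scaled.min() >= 0.0, x_scaled.max() <= 1.0)
```

With the real file, the same helper could be called as `describe_split(f['x_train'], f['y_train'])`.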

Step 1: Build a Pipeline made of a single component

import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.optim as optim
from torch.utils.data import Dataset, DataLoader

torch.set_default_tensor_type(torch.DoubleTensor)

learning_rate = 1e-2
n_epochs = 1
path = 'mnist.npz'

# Define the network architecture
class Net(nn.Module):
    def __init__(self):
        super(Net, self).__init__()
        ......
        
    def forward(self, x):
        ......
        
# Define the Dataset class
class Mnistset(Dataset):
    def __init__(self,x,y):
        ......
    
    def __getitem__(self, index):
        ......
    
    def __len__(self):
        ......


# Load the data
def load_data(path):
    ......
     
# Train the model
def train_model(path):

    # Build the Dataset and DataLoader objects
    (x_train, y_train), (x_test, y_test) = load_data(path)
    x_train, x_test = x_train / 255.0, x_test / 255.0
    train_set = Mnistset(x_train, y_train)
    train_loader = DataLoader(train_set)
    test_set = Mnistset(x_test, y_test)
    test_loader = DataLoader(test_set)
    
    # Define the model and the optimizer
    model = Net()
    optimizer = optim.Adam(model.parameters(),lr=learning_rate)
    
    # Training function (one pass over the data)
    def _train():
        ......
    
    # Train for n_epochs epochs
    for epoch in range(1,n_epochs + 1):
        _train()
        
if __name__ == '__main__':
    train_model(path)
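The function bodies in the listing above are elided in the original, so the following is only a guess at what `load_data` could look like, inferred from how `train_model` unpacks its result and from the key names used in the dataset section. The tiny archive written to a temporary directory is a stand-in so the sketch runs without mnist.npz.

```python
import os
import tempfile

import numpy as np


def load_data(path):
    """Hypothetical body: read the four arrays and return two (x, y) pairs."""
    with np.load(path) as f:
        x_train, y_train = f['x_train'], f['y_train']
        x_test, y_test = f['x_test'], f['y_test']
    return (x_train, y_train), (x_test, y_test)


# Write a tiny archive with the same keys (stand-in data, not real MNIST).
tmp_dir = tempfile.mkdtemp()
tiny_path = os.path.join(tmp_dir, 'tiny_mnist.npz')
np.savez(
    tiny_path,
    x_train=np.zeros((4, 28, 28), dtype=np.uint8),
    y_train=np.arange(4, dtype=np.uint8),
    x_test=np.zeros((2, 28, 28), dtype=np.uint8),
    y_test=np.arange(2, dtype=np.uint8),
)

(x_train, y_train), (x_test, y_test) = load_data(tiny_path)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)
```

Returning nested tuples keeps the call site in `train_model` unchanged, which is the only constraint the original listing imposes.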

Building a pipeline with only one component

Building the image

Directory layout

Put everything mnist.py needs in the same directory:

[Figure: 2.jpg, listing of the build directory]

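Notes:

  • mnist.npz is the dataset.
  • torch-1.12.0... is the wheel (.whl) used for the pip install. Because pip install torch was too slow, I downloaded the wheel manually and installed it from the local file.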

Dockerfile

FROM python:3.9
COPY mnist.py .
COPY requirements.txt .
COPY mnist.npz .
COPY torch-1.12.0+cpu-cp39-cp39-linux_x86_64.whl .
RUN pip install -i https://pypi.tuna.tsinghua.edu.cn/simple -r requirements.txt
RUN pip install torch-1.12.0+cpu-cp39-cp39-linux_x86_64.whl
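A locally downloaded wheel only installs if its tags match the interpreter in the base image, which is why the cp39 wheel is paired with python:3.9 here. As a rough illustration (the helper names `parse_wheel_tags` and `matches_cpython` are made up for this sketch), the tags can be read straight from the file name, which follows the standard `name-version[-build]-python-abi-platform.whl` layout:

```python
WHEEL = "torch-1.12.0+cpu-cp39-cp39-linux_x86_64.whl"


def parse_wheel_tags(filename):
    """Split a wheel file name into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel file name: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # 5 parts normally, 6 when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected wheel file name: %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python": python_tag,
        "abi": abi_tag,
        "platform": platform_tag,
    }


def matches_cpython(filename, major, minor):
    """True if the wheel is tagged for exactly this CPython version.

    Only the cpXY tag is checked; pure-Python tags such as py3 are not handled.
    """
    tags = parse_wheel_tags(filename)["python"].split(".")
    return "cp%d%d" % (major, minor) in tags


print(parse_wheel_tags(WHEEL))
print(matches_cpython(WHEEL, 3, 9))
```

Changing the base image to a different Python version would therefore also mean downloading a wheel with the matching tag.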

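Note:

There are really two options for the base image:

  1. FROM python:tag
  2. FROM pytorch/pytorch:tag

With the second option the image already ships a PyTorch environment, so there is no need to install the torch package separately (pip install torch). When I tried pulling it, however, the pytorch image turned out to be over 5 GB, which looked like a long download.

So I went with the first option: start from the python image and install PyTorch manually.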

Build and push the image

docker pull python:3.9
chmod a+x mnist.py
docker build -t mnist:v1.2 .

docker login -u lifu963  # requires an existing Docker Hub account
docker tag mnist:v1.2 lifu963/mnist:v1.2
docker push lifu963/mnist:v1.2
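The relationship between the local tag and the Docker Hub tag is easy to get wrong when the version changes. As a small sketch (the function `image_commands` is hypothetical and only prints the commands, it does not run Docker), the same sequence can be derived from three values:

```python
import shlex


def image_commands(user, name, tag):
    """Return the docker argv lists for building, tagging and pushing an image."""
    local = f"{name}:{tag}"
    remote = f"{user}/{name}:{tag}"
    return [
        ["docker", "build", "-t", local, "."],
        ["docker", "login", "-u", user],
        ["docker", "tag", local, remote],
        ["docker", "push", remote],
    ]


commands = image_commands("lifu963", "mnist", "v1.2")
for argv in commands:
    # shlex.join renders each argv list as a copy-pasteable shell line.
    print(shlex.join(argv))
```

To actually execute them, each list could be passed to `subprocess.run(argv, check=True)`.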

[Screenshot: 2.png]

Building the pipeline

Log in to the Kubeflow platform, create a notebook environment, and run:

import kfp

# Build the component
@kfp.dsl.component
def train_mnist_op():
    return kfp.dsl.ContainerOp(
        name='mnist', 
        image='lifu963/mnist:v1.2',  # the image we pushed earlier
        command=['python','mnist.py'])

# Build the pipeline
@kfp.dsl.pipeline(name='mnist train',description='mnist first version')
def my_pipeline():
    op = train_mnist_op()

if __name__ == '__main__':
    pipeline_func = my_pipeline
    pipeline_filename = pipeline_func.__name__ + '.yaml'
    # Compile the pipeline into a YAML file
    kfp.compiler.Compiler().compile(pipeline_func,pipeline_filename)
    

Compiling the YAML file prints the following warning:

FutureWarning: Please create reusable components instead of constructing ContainerOp instances directly. 
Reusable components are shareable, portable and have compatibility and support guarantees. 
Please see the documentation: https://www.kubeflow.org/docs/pipelines/sdk/component-development/#writing-your-component-definition-file 
The components can be created manually (or, in case of python, using kfp.components.create_component_from_func or func_to_container_op) 
and then loaded using kfp.components.load_component_from_file, load_component_from_uri or load_component_from_text: 
https://kubeflow-pipelines.readthedocs.io/en/stable/source/kfp.components.html#kfp.components.load_component_from_file
  warnings.warn(

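In short, rather than constructing kfp.dsl.ContainerOp objects directly, kfp now recommends one of the following two approaches:

  1. kfp.components.create_component_from_func() or func_to_container_op(), which build a component from a Python function

  2. kfp.components.load_component_from_file(), load_component_from_uri(), or load_component_from_text(), which load a manually written component definition

I therefore plan to switch to these two approaches when optimizing the code in step three.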
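To give a feel for the second approach, here is a rough sketch of a component definition for the same image, loaded with `kfp.components.load_component_from_text`. The definition text and the helper `load_train_op` are illustrations written for this note, not the code the later parts of this series use. The kfp import is deferred into the function so the definition can be inspected without the SDK installed.

```python
# Component definition in the kfp component-spec YAML format.
# The description is made up; image and command are the ones used above.
COMPONENT_TEXT = """\
name: mnist
description: Run mnist.py inside the image pushed earlier
implementation:
  container:
    image: lifu963/mnist:v1.2
    command: [python, mnist.py]
"""


def load_train_op(text=COMPONENT_TEXT):
    """Turn the definition into a task factory (requires the kfp SDK)."""
    # Imported lazily so the text above can be read without kfp installed.
    from kfp import components
    return components.load_component_from_text(text)


print(COMPONENT_TEXT)
```

Inside a pipeline function, `train_mnist_op = load_train_op()` followed by `train_mnist_op()` would then take the place of the hand-built ContainerOp.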

Upload the file to Pipelines, create an Experiment, and start a Run

[Screenshots: 1.jpg, 2.jpg]
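The upload and run can also be done from the notebook instead of the UI. The sketch below assumes the SDK client's `create_run_from_pipeline_package` method. The wrapper `submit_package`, the experiment name `mnist-demo`, and the recording stand-in client are all hypothetical; the stand-in exists only so the call shape can be shown without a cluster.

```python
def submit_package(client, package_path, experiment_name):
    """Start a run from a compiled pipeline package and return the result."""
    return client.create_run_from_pipeline_package(
        pipeline_file=package_path,
        arguments={},  # my_pipeline takes no parameters
        experiment_name=experiment_name,
    )


class _RecordingClient:
    """Stand-in for kfp.Client that only records what it was asked to do."""

    def __init__(self):
        self.calls = []

    def create_run_from_pipeline_package(self, pipeline_file, arguments,
                                         experiment_name=None):
        self.calls.append((pipeline_file, arguments, experiment_name))
        return "submitted"


fake = _RecordingClient()
result = submit_package(fake, "my_pipeline.yaml", "mnist-demo")
print(result, fake.calls)
```

With the real SDK the call would be `submit_package(kfp.Client(), 'my_pipeline.yaml', 'mnist-demo')`. How `kfp.Client()` authenticates depends on the Kubeflow deployment.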