Hi everyone, I'm 极智视界. In this article I'll give a detailed walkthrough of the deployment flow of Horizon's 天工开物 (OpenExplorer) toolchain.
Of all the toolchains I've seen, Horizon's 天工开物 (OpenExplorer) toolchain is among the better ones in terms of completeness and clarity.
So its deployment flow deserves a proper walkthrough.
Below is a very good deployment flow diagram from Horizon, particularly clear:
Overall, there are a few main stages:
- floating-point model preparation;
- model check;
- model conversion;
- performance verification;
- accuracy verification;
Except for the model check, these stages are all standard parts of a deployment pipeline. Pulling the model check out as a separate step is one of the things Horizon gets right: quite often the model itself has problems that we don't know about, and pressing on anyway leads to abnormal inference, abnormal accuracy, or abnormal performance. Nipping a whole class of otherwise-unnecessary downstream debugging in the bud at the model-check stage is genuinely clever.
Below, I'll walk through the details using the deployment verification of Horizon's mobilenetv1 example.
1> Floating-Point Model Preparation
Horizon's front end supports two model formats: Caffe 1.0 and ONNX with opset=10/11. Honestly, that's enough: Caffe represents the old guard, and ONNX the universal interchange format. Specifically calling out opset=10/11 shows that the toolchain developers know their craft. (As an aside: I once asked another chip vendor whether opset 11 was preferred, and was told "it mainly depends on the operators; the opset doesn't really matter", which struck me as rather amateurish.)
For deployment, model conversion is always a big hurdle, and reducing it to Caffe 1.0 and opset-10/11 ONNX greatly simplifies everything downstream. Ideally, vendors would also provide a ready-made solution for getting your own model into Caffe 1.0 or opset-10/11 ONNX in the first place, since that step is exactly where things tend to break. In reality, many vendors sidestep the topic because it genuinely is messy. If your training framework is PyTorch, it's not too bad; with more exotic frameworks, it gets painful.
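For the common PyTorch case, a minimal export sketch might look like the following (the torchvision model and the tensor names here are placeholders for illustration, not part of Horizon's example):
import torch
import torchvision

# placeholder model; any torch.nn.Module can be exported the same way
model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # NCHW dummy input matching the training size

torch.onnx.export(
    model, dummy, "model.onnx",
    opset_version=11,        # the toolchain wants opset 10 or 11
    input_names=["data"],
    output_names=["prob"],
)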
The mobilenetv1 example here provides the model preparation directly: go into ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper and run:
bash 00_init.sh
This directly downloads the original Caffe model files that mobilenet needs:
- mobilenet_deploy.prototxt
- mobilenet.caffemodel
2> Model Check
Run the model check:
bash 01_check.sh
Here is the content of the 01_check.sh script:
#!/usr/bin/env sh
set -ex
cd $(dirname $0) || exit
model_type="caffe"
proto="../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt"
caffe_model="../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel"
march="bernoulli2"
hb_mapper checker --model-type ${model_type} \
                  --proto ${proto} --model ${caffe_model} \
                  --march ${march}
The main command is hb_mapper checker; its options include the following:
- --model-type: the type of the model, caffe or onnx as mentioned above;
- --march: the target processor architecture, default bernoulli2;
- --proto: the prototxt file; only takes effect when model-type is caffe;
- --model: the Caffe caffemodel or the ONNX model file;
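By the way, if you are bringing your own ONNX model rather than the example's Caffe model, it's worth confirming the opset before handing it to the checker. A small sketch (the file name is a placeholder):
import onnx

m = onnx.load("model.onnx")
# each opset_import entry pairs an operator domain with a version;
# the default domain should report 10 or 11 for this toolchain
for imp in m.opset_import:
    print(imp.domain or "ai.onnx", imp.version)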
The log from running 01_check.sh looks like this:
# bash 01_check.sh
++ dirname ./01_check.sh
+ cd .
+ model_type=caffe
+ proto=../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
+ caffe_model=../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel
+ march=bernoulli2
+ hb_mapper checker --model-type caffe --proto ../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt --model ../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel --march bernoulli2
2023-08-30 05:55:12,562 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/hb_mapper_checker.log
2023-08-30 05:55:12,568 INFO Start hb_mapper....
2023-08-30 05:55:12,571 INFO hbdk version 3.45.3
2023-08-30 05:55:12,574 INFO horizon_nn version 0.18.2
2023-08-30 05:55:12,578 INFO hb_mapper version 1.17.4
2023-08-30 05:55:12,717 INFO Model type: caffe
2023-08-30 05:55:12,719 INFO input names []
2023-08-30 05:55:12,719 INFO input shapes {}
2023-08-30 05:55:12,720 INFO Begin model checking....
2023-08-30 05:55:12,737 INFO [Wed Aug 30 05:55:12 2023] Start to Horizon NN Model Convert.
2023-08-30 05:55:12,738 INFO Loading horizon_nn debug methods:[]
2023-08-30 05:55:12,740 INFO The input parameter is not specified, convert with default parameters.
2023-08-30 05:55:12,742 INFO Parsing the hbdk parameter:{'hbdk_pass_through_params': '--O0'}
2023-08-30 05:55:12,743 INFO HorizonNN version: 0.18.2
2023-08-30 05:55:12,744 INFO HBDK version: 3.45.3
2023-08-30 05:55:13,358 INFO Find 1 inputs in the model:
2023-08-30 05:55:13,359 INFO Got input 'data' with shape [1, 3, 224, 224].
2023-08-30 05:55:14,415 INFO [Wed Aug 30 05:55:14 2023] Start to parse the onnx model.
2023-08-30 05:55:14,416 INFO Input ONNX model infomation:
ONNX IR version: 7
Opset version: [10, 1, 1]
Producer: none
Domain: none
Input name: data, [1, 3, 224, 224]
Output name: prob, [1, 1000, 1, 1]
2023-08-30 05:55:14,468 INFO [Wed Aug 30 05:55:14 2023] End to parse the onnx model.
2023-08-30 05:55:14,469 INFO Model input names parsed from model: ['data']
2023-08-30 05:55:14,542 INFO Saving the original float model: ./.hb_check/original_float_model.onnx.
2023-08-30 05:55:14,543 INFO [Wed Aug 30 05:55:14 2023] Start to optimize the model.
2023-08-30 05:55:15,277 INFO [Wed Aug 30 05:55:15 2023] End to optimize the model.
2023-08-30 05:55:15,344 INFO Saving the optimized model: ./.hb_check/optimized_float_model.onnx.
2023-08-30 05:55:15,345 INFO [Wed Aug 30 05:55:15 2023] Start to calibrate the model.
2023-08-30 05:55:15,348 INFO There are 1 samples in the calibration data set.
2023-08-30 05:55:15,634 INFO Run calibration model with max method.
2023-08-30 05:55:15,720 INFO Calibration using batch 8
max calibration in progress: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15.03it/s]
2023-08-30 05:55:16,234 INFO Saving the calibrated model: ./.hb_check/calibrated_model.onnx.
2023-08-30 05:55:16,235 INFO [Wed Aug 30 05:55:16 2023] End to calibrate the model.
2023-08-30 05:55:16,237 INFO [Wed Aug 30 05:55:16 2023] Start to quantize the model.
2023-08-30 05:55:16,928 INFO [Wed Aug 30 05:55:16 2023] End to quantize the model.
2023-08-30 05:55:17,261 INFO Saving the quantized model: ./.hb_check/quantized_model.onnx.
2023-08-30 05:55:18,107 INFO [Wed Aug 30 05:55:18 2023] Start to compile the model with march bernoulli2.
2023-08-30 05:55:18,108 INFO [Wed Aug 30 05:55:18 2023] End to compile the model with march bernoulli2.
2023-08-30 05:55:18,413 INFO Compile submodel: MOBILENET_subgraph_0
2023-08-30 05:55:18,836 INFO hbdk-cc parameters:['--O0', '--input-layout', 'NHWC', '--output-layout', 'NCHW']
[==================================================] 100%
2023-08-30 05:55:19,182 INFO consumed time 0.213798
2023-08-30 05:55:19,400 INFO FPS=230.15, latency = 4344.9 us (see ./.hb_check/MOBILENET_subgraph_0.html)
2023-08-30 05:55:19,588 INFO The converted model node information:
==============================================
Node ON Subgraph Type
----------------------------------------------
conv1 BPU id(0) HzSQuantizedConv
conv2_1/dw BPU id(0) HzSQuantizedConv
conv2_1/sep BPU id(0) HzSQuantizedConv
conv2_2/dw BPU id(0) HzSQuantizedConv
conv2_2/sep BPU id(0) HzSQuantizedConv
conv3_1/dw BPU id(0) HzSQuantizedConv
conv3_1/sep BPU id(0) HzSQuantizedConv
conv3_2/dw BPU id(0) HzSQuantizedConv
conv3_2/sep BPU id(0) HzSQuantizedConv
conv4_1/dw BPU id(0) HzSQuantizedConv
conv4_1/sep BPU id(0) HzSQuantizedConv
conv4_2/dw BPU id(0) HzSQuantizedConv
conv4_2/sep BPU id(0) HzSQuantizedConv
conv5_1/dw BPU id(0) HzSQuantizedConv
conv5_1/sep BPU id(0) HzSQuantizedConv
conv5_2/dw BPU id(0) HzSQuantizedConv
conv5_2/sep BPU id(0) HzSQuantizedConv
conv5_3/dw BPU id(0) HzSQuantizedConv
conv5_3/sep BPU id(0) HzSQuantizedConv
conv5_4/dw BPU id(0) HzSQuantizedConv
conv5_4/sep BPU id(0) HzSQuantizedConv
conv5_5/dw BPU id(0) HzSQuantizedConv
conv5_5/sep BPU id(0) HzSQuantizedConv
conv5_6/dw BPU id(0) HzSQuantizedConv
conv5_6/sep BPU id(0) HzSQuantizedConv
conv6/dw BPU id(0) HzSQuantizedConv
conv6/sep BPU id(0) HzSQuantizedConv
pool6 BPU id(0) HzSQuantizedConv
fc7 BPU id(0) HzSQuantizedConv
prob CPU -- Softmax
2023-08-30 05:55:19,589 INFO [Wed Aug 30 05:55:19 2023] End to Horizon NN Model Convert.
2023-08-30 05:55:19,595 INFO ONNX model output num : 1
2023-08-30 05:55:19,618 INFO End model checking....
As shown above, the check prints version info, model structure info, the original model's estimated inference performance, and so on. With the check passed, we can move on to the next stage.
3> Model Conversion
In model conversion, post-training quantization is a step you can't get around, and of course Horizon's toolchain provides a post-training quantization module. First, run the 02_preprocess.sh script to prepare the calibration dataset:
bash 02_preprocess.sh
The content of the 02_preprocess.sh script is as follows:
#!/usr/bin/env bash
set -e -v
cd $(dirname $0) || exit
python3 ../../../data_preprocess.py \
    --src_dir ../../../01_common/calibration_data/imagenet \
    --dst_dir ./calibration_data_bgr_f32 \
    --pic_ext .bgr \
    --read_mode skimage \
    --saved_data_type float32
Here, --saved_data_type can be float32 or uint8; setting it to uint8 prepares the calibration dataset for the subsequent int8 quantization. --pic_ext is the file extension of the processed data. --read_mode can be skimage or opencv: skimage yields RGB with values in 0~1, while opencv yields BGR with values in 0~255.
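The difference between the two read modes boils down to channel order and value range; a rough sketch of what each mode yields (test.jpg is a placeholder, and this is my approximation, not the toolchain's own data_preprocess.py):
import cv2
from skimage import io, img_as_float32

# "opencv" mode: BGR channel order, uint8 values in 0~255
img_bgr = cv2.imread("test.jpg")

# "skimage" mode: RGB channel order; img_as_float32 rescales to float in 0~1
img_rgb = img_as_float32(io.imread("test.jpg"))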
After that, model conversion also requires a configuration file, the so-called model-conversion YAML config. The YAML contains several important parameter groups:
- model_parameters (required);
- input_parameters (required);
- calibration_parameters (required);
- compiler_parameters (required);
- custom_op (optional);
Here is the configuration of mobilenet_config.yaml:
# 模型转化相关的参数
# ------------------------------------
# model conversion related parameters
model_parameters:
  # Caffe浮点网络数据模型文件
  # -----------------------------------------------------------
  # the model file of floating-point Caffe neural network data
  caffe_model: '../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel'
  # Caffe网络描述文件
  # ---------------------------------------------------------
  # the file describes the structure of Caffe neural network
  prototxt: '../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt'
  # 适用BPU架构
  # --------------------------------
  # the applicable BPU architecture
  march: "bernoulli2"
  # 指定模型转换过程中是否输出各层的中间结果,如果为True,则输出所有层的中间输出结果,
  # --------------------------------------------------------------------------------------
  # specifies whether or not to dump the intermediate results of all layers in conversion
  # if set to True, then the intermediate results of all layers shall be dumped
  layer_out_dump: False
  # 模型转换输出的结果的存放目录
  # -----------------------------------------------------------
  # the directory in which model conversion results are stored
  working_dir: 'model_output'
  # 模型转换输出的用于上板执行的模型文件的名称前缀
  # -----------------------------------------------------------------------------------------
  # model conversion generated name prefix of those model files used for dev board execution
  output_model_file_prefix: 'mobilenetv1_224x224_nv12'
# 模型输入相关参数, 若输入多个节点, 则应使用';'进行分隔, 使用默认缺省设置则写None
# --------------------------------------------------------------------------
# model input related parameters,
# please use ";" to seperate when inputting multiple nodes,
# please use None for default setting
input_parameters:
  # (选填) 模型输入的节点名称, 此名称应与模型文件中的名称一致, 否则会报错, 不填则会使用模型文件中的节点名称
  # --------------------------------------------------------------------------------------------------------
  # (Optional) node name of model input,
  # it shall be the same as the name of model file, otherwise an error will be reported,
  # the node name of model file will be used when left blank
  input_name: ""
  # 网络实际执行时,输入给网络的数据格式,包括 nv12/rgb/bgr/yuv444/gray/featuremap,
  # ------------------------------------------------------------------------------------------
  # the data formats to be passed into neural network when actually performing neural network
  # available options: nv12/rgb/bgr/yuv444/gray/featuremap,
  input_type_rt: 'nv12'
  # 网络实际执行时输入的数据排布, 可选值为 NHWC/NCHW
  # 若input_type_rt配置为nv12,则此处参数不需要配置
  # ------------------------------------------------------------------
  # the data layout formats to be passed into neural network when actually performing neural network, available options: NHWC/NCHW
  # If input_type_rt is configured as nv12, then this parameter does not need to be configured
  #input_layout_rt: ''
  # 网络训练时输入的数据格式,可选的值为rgb/bgr/gray/featuremap/yuv444
  # --------------------------------------------------------------------
  # the data formats in network training
  # available options: rgb/bgr/gray/featuremap/yuv444
  input_type_train: 'bgr'
  # 网络训练时输入的数据排布, 可选值为 NHWC/NCHW
  # ------------------------------------------------------------------
  # the data layout in network training, available options: NHWC/NCHW
  input_layout_train: 'NCHW'
  # (选填) 模型网络的输入大小, 以'x'分隔, 不填则会使用模型文件中的网络输入大小,否则会覆盖模型文件中输入大小
  # -------------------------------------------------------------------------------------------
  # (Optional)the input size of model network, seperated by 'x'
  # note that the network input size of model file will be used if left blank
  # otherwise it will overwrite the input size of model file
  input_shape: ''
  # 网络实际执行时,输入给网络的batch_size, 默认值为1
  # ---------------------------------------------------------------------
  # the data batch_size to be passed into neural network when actually performing neural network, default value: 1
  #input_batch: 1
  # 网络输入的预处理方法,主要有以下几种:
  # no_preprocess 不做任何操作
  # data_mean 减去通道均值mean_value
  # data_scale 对图像像素乘以data_scale系数
  # data_mean_and_scale 减去通道均值后再乘以scale系数
  # -------------------------------------------------------------------------------------------
  # preprocessing methods of network input, available options:
  # 'no_preprocess' indicates that no preprocess will be made
  # 'data_mean' indicates that to minus the channel mean, i.e. mean_value
  # 'data_scale' indicates that image pixels to multiply data_scale ratio
  # 'data_mean_and_scale' indicates that to multiply scale ratio after channel mean is minused
  norm_type: 'data_mean_and_scale'
  # 图像减去的均值, 如果是通道均值,value之间必须用空格分隔
  # --------------------------------------------------------------------------
  # the mean value minused by image
  # note that values must be seperated by space if channel mean value is used
  mean_value: 103.94 116.78 123.68
  # 图像预处理缩放比例,如果是通道缩放比例,value之间必须用空格分隔
  # ---------------------------------------------------------------------------
  # scale value of image preprocess
  # note that values must be seperated by space if channel scale value is used
  scale_value: 0.017
# 模型量化相关参数
# -----------------------------
# model calibration parameters
calibration_parameters:
  # 模型量化的参考图像的存放目录,图片格式支持Jpeg、Bmp等格式,输入的图片
  # 应该是使用的典型场景,一般是从测试集中选择20~100张图片,另外输入
  # 的图片要覆盖典型场景,不要是偏僻场景,如过曝光、饱和、模糊、纯黑、纯白等图片
  # 若有多个输入节点, 则应使用';'进行分隔
  # -------------------------------------------------------------------------------------------------
  # the directory where reference images of model quantization are stored
  # image formats include JPEG, BMP etc.
  # should be classic application scenarios, usually 20~100 images are picked out from test datasets
  # in addition, note that input images should cover typical scenarios
  # and try to avoid those overexposed, oversaturated, vague,
  # pure blank or pure white images
  # use ';' to seperate when there are multiple input nodes
  cal_data_dir: './calibration_data_bgr_f32'
  # 校准数据二进制文件的数据存储类型,可选值为:float32, uint8
  # calibration data binary file save type, available options: float32, uint8
  cal_data_type: 'float32'
  # 如果输入的图片文件尺寸和模型训练的尺寸不一致时,并且preprocess_on为true,
  # 则将采用默认预处理方法(skimage resize),
  # 将输入图片缩放或者裁减到指定尺寸,否则,需要用户提前把图片处理为训练时的尺寸
  # ---------------------------------------------------------------------------------
  # In case the size of input image file is different from that of in model training
  # and that preprocess_on is set to True,
  # shall the default preprocess method(skimage resize) be used
  # i.e., to resize or crop input image into specified size
  # otherwise user must keep image size as that of in training in advance
  # preprocess_on: False
  # 模型量化的算法类型,支持default、mix、kl、max、load,通常采用default即可满足要求
  # 如不符合预期可先尝试修改为mix 仍不符合预期再尝试kl或max
  # 当使用QAT导出模型时,此参数则应设置为load
  # 相关参数的技术原理及说明请您参考用户手册中的PTQ原理及步骤中参数组详细介绍部分
  # ----------------------------------------------------------------------------------
  # The algorithm type of model quantization, support default, mix, kl, max, load, usually use default can meet the requirements.
  # If it does not meet the expectation, you can try to change it to mix first. If there is still no expectation, try kl or max again.
  # When using QAT to export the model, this parameter should be set to load.
  # For more details of the parameters, please refer to the parameter details in PTQ Principle And Steps section of the user manual.
  calibration_type: 'max'
  # 该参数为'max'校准方法的参数,用以调整'max'校准的截取点。此参数仅在calibration_type为'max'时有效。
  # 该参数取值范围:0.0 ~ 1.0。常用配置选项有:0.99999/0.99995/0.99990/0.99950/0.99900。
  # ------------------------------------------------------------------------------------------------
  # this is the parameter of the 'max' calibration method and it is used for adjusting the intercept point of the 'max' calibration.
  # this parameter will only become valid when the calibration_type is specified as 'max'.
  # RANGE: 0.0 - 1.0. Typical options includes: 0.99999/0.99995/0.99990/0.99950/0.99900.
  max_percentile: 0.9999
# 编译器相关参数
# ----------------------------
# compiler related parameters
compiler_parameters:
  # 编译策略,支持bandwidth和latency两种优化模式;
  # bandwidth以优化ddr的访问带宽为目标;
  # latency以优化推理时间为目标
  # -------------------------------------------------------------------------------------------
  # compilation strategy, there are 2 available optimization modes: 'bandwidth' and 'lantency'
  # the 'bandwidth' mode aims to optimize ddr access bandwidth
  # while the 'lantency' mode aims to optimize inference duration
  compile_mode: 'latency'
  # 设置debug为True将打开编译器的debug模式,能够输出性能仿真的相关信息,如帧率、DDR带宽占用等
  # -----------------------------------------------------------------------------------
  # the compiler's debug mode will be enabled by setting to True
  # this will dump performance simulation related information
  # such as: frame rate, DDR bandwidth usage etc.
  debug: False
  # 编译模型指定核数,不指定默认编译单核模型, 若编译双核模型,将下边注释打开即可
  # -------------------------------------------------------------------------------------
  # specifies number of cores to be used in model compilation
  # as default, single core is used as this value left blank
  # please delete the "# " below to enable dual-core mode when compiling dual-core model
  # core_num: 2
  # 优化等级可选范围为O0~O3
  # O0不做任何优化, 编译速度最快,优化程度最低,
  # O1-O3随着优化等级提高,预期编译后的模型的执行速度会更快,但是所需编译时间也会变长。
  # 推荐用O2做最快验证
  # ----------------------------------------------------------------------------------------------------------
  # optimization level ranges between O0~O3
  # O0 indicates that no optimization will be made
  # the faster the compilation, the lower optimization level will be
  # O1-O3: as optimization levels increase gradually, model execution, after compilation, shall become faster
  # while compilation will be prolonged
  # it is recommended to use O2 for fastest verification
  optimize_level: 'O3'
The comments in the toolchain's sample YAML are extremely thorough, so there's no need for me to elaborate much further.
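One piece worth spelling out is the normalization implied by norm_type 'data_mean_and_scale': each pixel becomes (x - mean_value) * scale_value. Note that 0.017 is roughly 1/58.82, which matches the std=[58.82352621] printed in the conversion log further below. A minimal numpy sketch of the preprocessing the config describes:
import numpy as np

mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)  # per-channel BGR means
scale = np.float32(0.017)                                    # roughly 1 / 58.82

def normalize(img):
    # img: HWC float32 BGR image with values in 0~255
    return (img - mean) * scale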
Next, run the 03_build.sh script to perform the model conversion:
bash 03_build.sh
The content of 03_build.sh is as follows:
#!/bin/bash
set -e -v
cd $(dirname $0) || exit
config_file="./mobilenet_config.yaml"
model_type="caffe"
# build model
hb_mapper makertbin --config ${config_file} \
                    --model-type ${model_type}
The main command for model conversion is hb_mapper makertbin. Its arguments are simple: the YAML config file described above, and the type of the input model.
The log from building the model is as follows:
# bash 03_build.sh
cd $(dirname $0) || exit
config_file="./mobilenet_config.yaml"
model_type="caffe"
# build model
hb_mapper makertbin --config ${config_file} \
--model-type ${model_type}
2023-08-30 06:18:16,555 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/hb_mapper_makertbin.log
2023-08-30 06:18:16,559 INFO Start hb_mapper....
2023-08-30 06:18:16,561 INFO hbdk version 3.45.3
2023-08-30 06:18:16,564 INFO horizon_nn version 0.18.2
2023-08-30 06:18:16,567 INFO hb_mapper version 1.17.4
2023-08-30 06:18:16,569 INFO Start Model Convert....
2023-08-30 06:18:16,695 INFO Using caffe model file: /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel and prototxt file: /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
2023-08-30 06:18:16,743 INFO Model has 1 inputs according to model file
2023-08-30 06:18:16,747 INFO Model name not given in yaml_file, using model name from model file: ['data']
2023-08-30 06:18:16,747 INFO Model input shape not given in yaml_file, using shape from model file: [[1, 3, 224, 224]]
2023-08-30 06:18:16,748 INFO nv12 input type rt received.
2023-08-30 06:18:16,751 INFO The calibration dir name suffix is the same as the value float32 of the cal_data_type parameter and will be read with the value of cal_data_type.
2023-08-30 06:18:16,753 INFO custom_op does not exist, skipped
2023-08-30 06:18:16,755 WARNING Input node data's input_source not set, it will be set to pyramid by default
2023-08-30 06:18:16,775 INFO *******************************************
2023-08-30 06:18:16,776 INFO First calibration picture name: ILSVRC2012_val_00000001.bgr
2023-08-30 06:18:16,777 INFO First calibration picture md5:
b86e53f8308d78931982391448f6e9c7 /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/calibration_data_bgr_f32/ILSVRC2012_val_00000001.bgr
2023-08-30 06:18:16,816 INFO *******************************************
2023-08-30 06:18:18,704 INFO [Wed Aug 30 06:18:18 2023] Start to Horizon NN Model Convert.
2023-08-30 06:18:18,705 INFO Loading horizon_nn debug methods:[]
2023-08-30 06:18:18,706 INFO Parsing the input parameter:{'data': {'input_shape': [1, 3, 224, 224], 'expected_input_type': 'YUV444_128', 'original_input_type': 'BGR', 'original_input_layout': 'NCHW', 'means': array([103.94, 116.78, 123.68], dtype=float32), 'scales': array([0.017], dtype=float32)}}
2023-08-30 06:18:18,707 INFO Parsing the calibration parameter
2023-08-30 06:18:18,707 INFO Parsing the hbdk parameter:{'hbdk_pass_through_params': '--O3 --core-num 1 --fast ', 'input-source': {'data': 'pyramid', '_default_value': 'ddr'}}
2023-08-30 06:18:18,708 INFO HorizonNN version: 0.18.2
2023-08-30 06:18:18,709 INFO HBDK version: 3.45.3
2023-08-30 06:18:18,926 INFO Find 1 inputs in the model:
2023-08-30 06:18:18,927 INFO Got input 'data' with shape [1, 3, 224, 224].
2023-08-30 06:18:19,932 INFO [Wed Aug 30 06:18:19 2023] Start to parse the onnx model.
2023-08-30 06:18:19,933 INFO Input ONNX model infomation:
ONNX IR version: 7
Opset version: [10, 1, 1]
Producer: none
Domain: none
Input name: data, [1, 3, 224, 224]
Output name: prob, [1, 1000, 1, 1]
2023-08-30 06:18:19,986 INFO [Wed Aug 30 06:18:19 2023] End to parse the onnx model.
2023-08-30 06:18:19,987 INFO Model input names parsed from model: ['data']
2023-08-30 06:18:19,988 INFO Create a preprocessing operator for input_name data with means=[103.94 116.78 123.68], std=[58.82352621], original_input_layout=NCHW, color convert from 'BGR' to 'YUV_BT601_FULL_RANGE'.
2023-08-30 06:18:20,098 INFO Saving the original float model: mobilenetv1_224x224_nv12_original_float_model.onnx.
2023-08-30 06:18:20,099 INFO [Wed Aug 30 06:18:20 2023] Start to optimize the model.
2023-08-30 06:18:20,666 INFO [Wed Aug 30 06:18:20 2023] End to optimize the model.
2023-08-30 06:18:20,725 INFO Saving the optimized model: mobilenetv1_224x224_nv12_optimized_float_model.onnx.
2023-08-30 06:18:20,727 INFO [Wed Aug 30 06:18:20 2023] Start to calibrate the model.
2023-08-30 06:18:20,729 INFO There are 100 samples in the calibration data set.
2023-08-30 06:18:21,009 INFO Run calibration model with max-percentile=0.999900 method.
2023-08-30 06:18:21,093 INFO Calibration using batch 8
max-percentile=0.999900 calibration in progress: 100%|███████████████████████████████████████████████████| 13/13 [00:05<00:00, 2.58it/s]
2023-08-30 06:18:26,546 INFO Saving the calibrated model: mobilenetv1_224x224_nv12_calibrated_model.onnx.
2023-08-30 06:18:26,548 INFO [Wed Aug 30 06:18:26 2023] End to calibrate the model.
2023-08-30 06:18:26,550 INFO [Wed Aug 30 06:18:26 2023] Start to quantize the model.
2023-08-30 06:18:29,465 INFO input data is from pyramid. Its layout is set to NHWC
2023-08-30 06:18:29,731 INFO [Wed Aug 30 06:18:29 2023] End to quantize the model.
2023-08-30 06:18:30,046 INFO Saving the quantized model: mobilenetv1_224x224_nv12_quantized_model.onnx.
2023-08-30 06:18:30,881 INFO [Wed Aug 30 06:18:30 2023] Start to compile the model with march bernoulli2.
2023-08-30 06:18:30,883 INFO [Wed Aug 30 06:18:30 2023] End to compile the model with march bernoulli2.
2023-08-30 06:18:31,172 INFO Compile submodel: MOBILENET_subgraph_0
2023-08-30 06:18:31,577 INFO hbdk-cc parameters:['--O3', '--core-num', '1', '--fast', '--input-layout', 'NHWC', '--output-layout', 'NCHW', '--input-source', 'pyramid']
[==================================================] 100%
2023-08-30 06:18:33,347 INFO consumed time 1.64005
2023-08-30 06:18:33,533 INFO FPS=347.12, latency = 2880.9 us (see MOBILENET_subgraph_0.html)
2023-08-30 06:18:33,711 INFO The converted model node information:
==============================================================================================================
Node ON Subgraph Type Cosine Similarity Threshold In/Out DataType
---------------------------------------------------------------------------------------------------------------
HZ_PREPROCESS_FOR_data BPU id(0) HzSQuantizedPreprocess 0.999988 127.000000 int8/int8
conv1 BPU id(0) HzSQuantizedConv 0.999916 2.937425 int8/int8
conv2_1/dw BPU id(0) HzSQuantizedConv 0.999356 2.040827 int8/int8
conv2_1/sep BPU id(0) HzSQuantizedConv 0.996678 4.486579 int8/int8
conv2_2/dw BPU id(0) HzSQuantizedConv 0.997330 3.545496 int8/int8
conv2_2/sep BPU id(0) HzSQuantizedConv 0.996376 2.791299 int8/int8
conv3_1/dw BPU id(0) HzSQuantizedConv 0.994169 1.417208 int8/int8
conv3_1/sep BPU id(0) HzSQuantizedConv 0.985448 2.188753 int8/int8
conv3_2/dw BPU id(0) HzSQuantizedConv 0.994925 1.822225 int8/int8
conv3_2/sep BPU id(0) HzSQuantizedConv 0.994255 1.841765 int8/int8
conv4_1/dw BPU id(0) HzSQuantizedConv 0.988255 1.043535 int8/int8
conv4_1/sep BPU id(0) HzSQuantizedConv 0.990334 1.736999 int8/int8
conv4_2/dw BPU id(0) HzSQuantizedConv 0.992463 0.990603 int8/int8
conv4_2/sep BPU id(0) HzSQuantizedConv 0.993469 1.574677 int8/int8
conv5_1/dw BPU id(0) HzSQuantizedConv 0.988949 0.823123 int8/int8
conv5_1/sep BPU id(0) HzSQuantizedConv 0.990804 1.265912 int8/int8
conv5_2/dw BPU id(0) HzSQuantizedConv 0.990191 0.772344 int8/int8
conv5_2/sep BPU id(0) HzSQuantizedConv 0.983377 1.530479 int8/int8
conv5_3/dw BPU id(0) HzSQuantizedConv 0.986417 0.783812 int8/int8
conv5_3/sep BPU id(0) HzSQuantizedConv 0.977491 1.927324 int8/int8
conv5_4/dw BPU id(0) HzSQuantizedConv 0.982257 0.996043 int8/int8
conv5_4/sep BPU id(0) HzSQuantizedConv 0.961735 2.167391 int8/int8
conv5_5/dw BPU id(0) HzSQuantizedConv 0.978750 1.923361 int8/int8
conv5_5/sep BPU id(0) HzSQuantizedConv 0.960038 3.578415 int8/int8
conv5_6/dw BPU id(0) HzSQuantizedConv 0.980339 2.463874 int8/int8
conv5_6/sep BPU id(0) HzSQuantizedConv 0.981139 4.124151 int8/int8
conv6/dw BPU id(0) HzSQuantizedConv 0.998357 0.667692 int8/int8
conv6/sep BPU id(0) HzSQuantizedConv 0.986100 0.983833 int8/int8
pool6 BPU id(0) HzSQuantizedConv 0.993779 11.415899 int8/int8
fc7 BPU id(0) HzSQuantizedConv 0.995593 5.843800 int8/int32
prob CPU -- Softmax 0.983072 -- float/float
2023-08-30 06:18:33,718 INFO The quantify model output:
=======================================================================
Node Cosine Similarity L1 Distance L2 Distance Chebyshev Distance
-----------------------------------------------------------------------
prob 0.983072 0.000409 0.000220 0.202166
2023-08-30 06:18:33,719 INFO [Wed Aug 30 06:18:33 2023] End to Horizon NN Model Convert.
2023-08-30 06:18:33,850 INFO start convert to *.bin file....
2023-08-30 06:18:34,002 INFO ONNX model output num : 1
2023-08-30 06:18:34,010 INFO ############# model deps info #############
2023-08-30 06:18:34,011 INFO hb_mapper version : 1.17.4
2023-08-30 06:18:34,013 INFO hbdk version : 3.45.3
2023-08-30 06:18:34,014 INFO hbdk runtime version: 3.15.25.0
2023-08-30 06:18:34,016 INFO horizon_nn version : 0.18.2
2023-08-30 06:18:34,017 INFO ############# model_parameters info #############
2023-08-30 06:18:34,018 INFO caffe_model : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel
2023-08-30 06:18:34,020 INFO prototxt : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
2023-08-30 06:18:34,021 INFO BPU march : bernoulli2
2023-08-30 06:18:34,023 INFO layer_out_dump : False
2023-08-30 06:18:34,024 INFO log_level : DEBUG
2023-08-30 06:18:34,026 INFO working dir : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/model_output
2023-08-30 06:18:34,028 INFO output_model_file_prefix: mobilenetv1_224x224_nv12
2023-08-30 06:18:34,030 INFO ############# input_parameters info #############
2023-08-30 06:18:34,031 INFO ------------------------------------------
2023-08-30 06:18:34,033 INFO ---------input info : data ---------
2023-08-30 06:18:34,034 INFO input_name : data
2023-08-30 06:18:34,035 INFO input_type_rt : nv12
2023-08-30 06:18:34,036 INFO input_space&range : regular
2023-08-30 06:18:34,037 INFO input_layout_rt : None
2023-08-30 06:18:34,038 INFO input_type_train : bgr
2023-08-30 06:18:34,040 INFO input_layout_train : NCHW
2023-08-30 06:18:34,041 INFO norm_type : data_mean_and_scale
2023-08-30 06:18:34,042 INFO input_shape : 1x3x224x224
2023-08-30 06:18:34,043 INFO mean_value : 103.94,116.78,123.68,
2023-08-30 06:18:34,044 INFO scale_value : 0.017,
2023-08-30 06:18:34,045 INFO cal_data_dir : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/calibration_data_bgr_f32
2023-08-30 06:18:34,047 INFO cal_data_type : float32
2023-08-30 06:18:34,048 INFO ---------input info : data end -------
2023-08-30 06:18:34,049 INFO ------------------------------------------
2023-08-30 06:18:34,050 INFO ############# calibration_parameters info #############
2023-08-30 06:18:34,051 INFO preprocess_on : False
2023-08-30 06:18:34,052 INFO calibration_type: : max
2023-08-30 06:18:34,053 INFO max_percentile : 0.9999
2023-08-30 06:18:34,054 INFO ############# compiler_parameters info #############
2023-08-30 06:18:34,055 INFO hbdk_pass_through_params: --O3 --core-num 1 --fast
2023-08-30 06:18:34,056 INFO input-source : {'data': 'pyramid', '_default_value': 'ddr'}
2023-08-30 06:18:34,083 INFO Convert to runtime bin file successfully!
2023-08-30 06:18:34,085 INFO End Model Convert
On success, the .bin model file for on-board execution is generated under model_output in the same directory:
Among the outputs:
- MOBILENET_subgraph_0.html: static performance-estimation report in HTML format;
- MOBILENET_subgraph_0.json: static performance-estimation report in JSON format;
- mobilenetv1_224x224_nv12.bin: the model file executed on the board;
4> Performance Verification
Performance verification actually involves two concepts: static performance verification and dynamic performance verification. Static performance is what the simulation estimates, i.e. the numbers in the html and json above; dynamic performance is what you measure by running directly on the board.
Let's look at static performance first. Taking the json as an example, shown below, the FPS is 347.12:
The html version is even more intuitive.
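If you prefer to pull the numbers out of the json programmatically, a quick way to poke around it without assuming any particular schema:
import json

with open("MOBILENET_subgraph_0.json") as f:
    report = json.load(f)

# dump the structure to locate the FPS / latency fields
print(json.dumps(report, indent=2))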
Next, dynamic performance verification: put the mobilenetv1_224x224_nv12.bin generated by the conversion onto the board (you can copy it over with scp), then run the following command to benchmark it:
hrt_model_exec perf --model_file mobilenetv1_224x224_nv12.bin --thread_num 1 --frame_count 1000
As you can see, the actual FPS on my board is 314.45, slightly slower than the statically simulated performance.
5> Accuracy Verification
Two scripts are provided: 04_inference.sh for single-image testing, and 05_evaluate.sh for measuring accuracy on the imagenet val set. Since I don't have the imagenet dataset prepared here, I'll skip 05_evaluate.sh for now.
04_inference.sh verifies and compares the inference accuracy of the quantized model against the original float model. Let's look at the script:
#!/bin/bash
set -e -v
cd $(dirname $0) || exit
#for converted quanti model inference
quanti_model_file="./model_output/mobilenetv1_224x224_nv12_quantized_model.onnx"
quanti_input_layout="NHWC"
#for original float model inference
original_model_file="./model_output/mobilenetv1_224x224_nv12_original_float_model.onnx"
original_input_layout="NCHW"
if [[ $1 =~ "origin" ]]; then
    model=$original_model_file
    layout=$original_input_layout
    input_offset=128
else
    model=$quanti_model_file
    layout=$quanti_input_layout
    input_offset=128
fi
infer_image="../../../01_common/test_data/cls_images/zebra_cls.jpg"
python3 -u ../../cls_inference.py \
    --model ${model} \
    --image ${infer_image} \
    --input_layout ${layout} \
    --input_offset ${input_offset}
The logic is easy to follow: running bash 04_inference.sh directly infers with the quantized model, while bash 04_inference.sh origin infers with the original model; then you just compare the outputs of the two models, e.g. as in the sketch below.
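For the comparison itself, cosine similarity (the same metric the conversion log reports per node) is a natural choice. A minimal sketch, assuming you've dumped the two models' outputs to .npy files (the file names are placeholders):
import numpy as np

a = np.load("quanti_prob.npy").ravel()   # quantized model output
b = np.load("origin_prob.npy").ravel()   # original float model output

cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
print(f"cosine similarity: {cos:.6f}")   # the build log above reported 0.983072 for prob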
For example, the output of my quantized-model inference is as follows (even though I didn't really do the quantization myself here...):
2023-08-30 07:30:28,254 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/inference.log
2023-08-30 07:30:29,266 INFO The input picture is classified to be:
2023-08-30 07:30:29,267 INFO label 340, prob 0.97307, class ['zebra']
2023-08-30 07:30:29,268 INFO label 292, prob 0.02184, class ['tiger, Panthera tigris']
2023-08-30 07:30:29,268 INFO label 282, prob 0.00331, class ['tiger cat']
2023-08-30 07:30:29,269 INFO label 83, prob 0.00108, class ['prairie chicken, prairie grouse, prairie fowl']
2023-08-30 07:30:29,270 INFO label 290, prob 0.00006, class ['jaguar, panther, Panthera onca, Felis onca']
The predicted class is zebra; the original image is shown below:
Frankly though, for real accuracy verification you still need to go back to the board, run there, and validate the results; that is the most trustworthy way.
This has been a fairly long post covering the complete deployment flow of this rather polished toolchain, and there's quite a lot in it worth learning from.
That wraps up this detailed look at the Horizon 天工开物 (OpenExplorer) toolchain deployment flow. I hope my share helps your learning a little.