极智AI | A Detailed Walkthrough of the Horizon TianGong KaiWu Toolchain Deployment Flow

Welcome to follow my WeChat official account [极智视界] for more of my experience sharing.

Hi everyone, I'm 极智视界. In this article, let's take a detailed look at the deployment flow of the Horizon TianGong KaiWu (OpenExplorer) toolchain.

You're invited to join my Knowledge Planet 「极智视界」, where you can download source code for lots of fun hands-on projects: t.zsxq.com/0aiNxERDq

In terms of completeness and clarity, the Horizon TianGong KaiWu toolchain is among the best I've seen.

So let me give its deployment flow a proper walkthrough.

Horizon provides an excellent deployment flow diagram that is especially clear.

Overall, there are a few main stages:

  • Float model preparation;
  • Model checking;
  • Model conversion;
  • Performance validation;
  • Accuracy validation;

Apart from model checking, all of these are very common deployment stages. Pulling model checking out as a standalone step is one of the things Horizon gets right: quite often the model itself has problems we're unaware of, and pushing ahead anyway leads to abnormal inference, accuracy, or performance. Cutting off all that unnecessary downstream troubleshooting at the model-checking stage is genuinely clever.

Below, I'll go through the details using the deployment and validation of Horizon's mobilenetv1 example.

1> Float Model Preparation

Horizon's front end supports two model formats: Caffe 1.0 and ONNX with opset 10/11. Honestly, that's enough: Caffe is the veteran, ONNX the common denominator. Explicitly calling out opset 10/11 shows the toolchain's developers know their stuff (as an aside, I once asked a chip vendor whether opset 11 was preferred, and the answer was "it mainly depends on the operators; the opset doesn't really matter", which struck me as rather amateurish).

For deployment, model conversion is always a major hurdle, and narrowing it down to Caffe 1.0 and opset 10/11 ONNX greatly simplifies everything downstream. Ideally, a vendor would also ship a ready-made solution for getting your own model into Caffe 1.0 or opset 10/11 ONNX, since this is exactly where things tend to go wrong. In practice, many vendors sidestep the topic because it's genuinely messy. If your training framework is PyTorch, it's manageable; with more exotic frameworks, it gets painful.
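For the PyTorch case, a minimal export sketch looks like the following (the torchvision model and the input/output names here are placeholders; substitute your own trained network and names):

import torch
import torchvision

# A stand-in float model; replace with your own trained network.
model = torchvision.models.mobilenet_v2(weights=None).eval()
dummy = torch.randn(1, 3, 224, 224)  # NCHW dummy input matching the training layout

# Export with the opset the Horizon toolchain expects (10 or 11).
torch.onnx.export(
    model,
    dummy,
    "model_opset11.onnx",
    opset_version=11,
    input_names=["data"],
    output_names=["prob"],
)

The exported file can then go straight into the checker below with --model-type onnx.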

The mobilenetv1 example ships with model preparation out of the box. Go to ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper and run:

bash 00_init.sh

This directly downloads the original caffemodel files that mobilenet needs:

  • mobilenet_deploy.prototxt
  • mobilenet.caffemodel

2> Model Checking

Run the model check:

bash 01_check.sh

The contents of the 01_check.sh script:

#!/usr/bin/env sh

set -ex
cd $(dirname $0) || exit
model_type="caffe"
proto="../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt"
caffe_model="../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel"
march="bernoulli2"

hb_mapper checker --model-type ${model_type} \
                  --proto ${proto} --model ${caffe_model} \
                  --march ${march}

The main command is hb_mapper checker, which takes the following options (an ONNX invocation sketch follows the list):

  • --model-type: the model type, caffe or onnx as mentioned above;
  • --march: the target processor architecture, default bernoulli2;
  • --proto: only takes effect when model-type is caffe;
  • --model: the caffemodel for Caffe, or the ONNX model file;
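For an ONNX model, the --proto option is simply dropped; a sketch (model.onnx is a placeholder name):

hb_mapper checker --model-type onnx --model model.onnx --march bernoulli2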

The log from running 01_check.sh is as follows:

# bash 01_check.sh 
++ dirname ./01_check.sh
+ cd .
+ model_type=caffe
+ proto=../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
+ caffe_model=../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel
+ march=bernoulli2
+ hb_mapper checker --model-type caffe --proto ../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt --model ../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel --march bernoulli2
2023-08-30 05:55:12,562 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/hb_mapper_checker.log
2023-08-30 05:55:12,568 INFO Start hb_mapper....
2023-08-30 05:55:12,571 INFO hbdk version 3.45.3
2023-08-30 05:55:12,574 INFO horizon_nn version 0.18.2
2023-08-30 05:55:12,578 INFO hb_mapper version 1.17.4
2023-08-30 05:55:12,717 INFO Model type: caffe
2023-08-30 05:55:12,719 INFO input names []
2023-08-30 05:55:12,719 INFO input shapes {}
2023-08-30 05:55:12,720 INFO Begin model checking....
2023-08-30 05:55:12,737 INFO [Wed Aug 30 05:55:12 2023] Start to Horizon NN Model Convert.
2023-08-30 05:55:12,738 INFO Loading horizon_nn debug methods:[]
2023-08-30 05:55:12,740 INFO The input parameter is not specified, convert with default parameters.
2023-08-30 05:55:12,742 INFO Parsing the hbdk parameter:{'hbdk_pass_through_params': '--O0'}
2023-08-30 05:55:12,743 INFO HorizonNN version: 0.18.2
2023-08-30 05:55:12,744 INFO HBDK version: 3.45.3
2023-08-30 05:55:13,358 INFO Find 1 inputs in the model:
2023-08-30 05:55:13,359 INFO Got input 'data' with shape [1, 3, 224, 224].
2023-08-30 05:55:14,415 INFO [Wed Aug 30 05:55:14 2023] Start to parse the onnx model.
2023-08-30 05:55:14,416 INFO Input ONNX model infomation:
ONNX IR version:          7
Opset version:            [10, 1, 1]
Producer:                 none
Domain:                   none
Input name:               data, [1, 3, 224, 224]
Output name:              prob, [1, 1000, 1, 1]
2023-08-30 05:55:14,468 INFO [Wed Aug 30 05:55:14 2023] End to parse the onnx model.
2023-08-30 05:55:14,469 INFO Model input names parsed from model: ['data']
2023-08-30 05:55:14,542 INFO Saving the original float model: ./.hb_check/original_float_model.onnx.
2023-08-30 05:55:14,543 INFO [Wed Aug 30 05:55:14 2023] Start to optimize the model.
2023-08-30 05:55:15,277 INFO [Wed Aug 30 05:55:15 2023] End to optimize the model.
2023-08-30 05:55:15,344 INFO Saving the optimized model: ./.hb_check/optimized_float_model.onnx.
2023-08-30 05:55:15,345 INFO [Wed Aug 30 05:55:15 2023] Start to calibrate the model.
2023-08-30 05:55:15,348 INFO There are 1 samples in the calibration data set.
2023-08-30 05:55:15,634 INFO Run calibration model with max method.
2023-08-30 05:55:15,720 INFO Calibration using batch 8
max calibration in progress: 100%|█████████████████████████████████████████████████████████████████| 1/1 [00:00<00:00, 15.03it/s]
2023-08-30 05:55:16,234 INFO Saving the calibrated model: ./.hb_check/calibrated_model.onnx.
2023-08-30 05:55:16,235 INFO [Wed Aug 30 05:55:16 2023] End to calibrate the model.
2023-08-30 05:55:16,237 INFO [Wed Aug 30 05:55:16 2023] Start to quantize the model.
2023-08-30 05:55:16,928 INFO [Wed Aug 30 05:55:16 2023] End to quantize the model.
2023-08-30 05:55:17,261 INFO Saving the quantized model: ./.hb_check/quantized_model.onnx.
2023-08-30 05:55:18,107 INFO [Wed Aug 30 05:55:18 2023] Start to compile the model with march bernoulli2.
2023-08-30 05:55:18,108 INFO [Wed Aug 30 05:55:18 2023] End to compile the model with march bernoulli2.
2023-08-30 05:55:18,413 INFO Compile submodel: MOBILENET_subgraph_0
2023-08-30 05:55:18,836 INFO hbdk-cc parameters:['--O0', '--input-layout', 'NHWC', '--output-layout', 'NCHW']
[==================================================] 100%
2023-08-30 05:55:19,182 INFO consumed time 0.213798
2023-08-30 05:55:19,400 INFO FPS=230.15, latency = 4344.9 us   (see ./.hb_check/MOBILENET_subgraph_0.html)
2023-08-30 05:55:19,588 INFO The converted model node information:
==============================================
Node         ON   Subgraph  Type              
----------------------------------------------
conv1        BPU  id(0)     HzSQuantizedConv  
conv2_1/dw   BPU  id(0)     HzSQuantizedConv  
conv2_1/sep  BPU  id(0)     HzSQuantizedConv  
conv2_2/dw   BPU  id(0)     HzSQuantizedConv  
conv2_2/sep  BPU  id(0)     HzSQuantizedConv  
conv3_1/dw   BPU  id(0)     HzSQuantizedConv  
conv3_1/sep  BPU  id(0)     HzSQuantizedConv  
conv3_2/dw   BPU  id(0)     HzSQuantizedConv  
conv3_2/sep  BPU  id(0)     HzSQuantizedConv  
conv4_1/dw   BPU  id(0)     HzSQuantizedConv  
conv4_1/sep  BPU  id(0)     HzSQuantizedConv  
conv4_2/dw   BPU  id(0)     HzSQuantizedConv  
conv4_2/sep  BPU  id(0)     HzSQuantizedConv  
conv5_1/dw   BPU  id(0)     HzSQuantizedConv  
conv5_1/sep  BPU  id(0)     HzSQuantizedConv  
conv5_2/dw   BPU  id(0)     HzSQuantizedConv  
conv5_2/sep  BPU  id(0)     HzSQuantizedConv  
conv5_3/dw   BPU  id(0)     HzSQuantizedConv  
conv5_3/sep  BPU  id(0)     HzSQuantizedConv  
conv5_4/dw   BPU  id(0)     HzSQuantizedConv  
conv5_4/sep  BPU  id(0)     HzSQuantizedConv  
conv5_5/dw   BPU  id(0)     HzSQuantizedConv  
conv5_5/sep  BPU  id(0)     HzSQuantizedConv  
conv5_6/dw   BPU  id(0)     HzSQuantizedConv  
conv5_6/sep  BPU  id(0)     HzSQuantizedConv  
conv6/dw     BPU  id(0)     HzSQuantizedConv  
conv6/sep    BPU  id(0)     HzSQuantizedConv  
pool6        BPU  id(0)     HzSQuantizedConv  
fc7          BPU  id(0)     HzSQuantizedConv  
prob         CPU  --        Softmax
2023-08-30 05:55:19,589 INFO [Wed Aug 30 05:55:19 2023] End to Horizon NN Model Convert.
2023-08-30 05:55:19,595 INFO ONNX model output num : 1
2023-08-30 05:55:19,618 INFO End model checking....

As shown above, it prints version info, model structure info, the inference performance of the original model, and so on. With the check passed, we can move on to the next stage.

3> Model Conversion

In model conversion, post-training quantization is an unavoidable step, and the Horizon toolchain naturally provides a module for it. First, run the 02_preprocess.sh script to prepare the calibration dataset:

bash 02_preprocess.sh

The contents of 02_preprocess.sh:

#!/usr/bin/env bash

set -e -v

cd $(dirname $0) || exit

python3 ../../../data_preprocess.py \
  --src_dir ../../../01_common/calibration_data/imagenet \
  --dst_dir ./calibration_data_bgr_f32 \
  --pic_ext .bgr \
  --read_mode skimage \
  --saved_data_type float32

Here, --saved_data_type can be set to float32 or uint8; setting it to uint8 prepares the calibration dataset for the subsequent int8 quantization. --pic_ext is the file extension for the processed data. --read_mode can be skimage or opencv: skimage reads RGB with a 0~1 value range, while opencv reads BGR with a 0~255 range, as the sketch below illustrates.
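A minimal sketch of what the two read modes imply (an assumption on my part about how the sample's data_preprocess.py behaves; the file name is a placeholder):

import cv2
from skimage import io, img_as_float

# skimage path: RGB channel order, float values in 0~1
img_sk = img_as_float(io.imread("sample.jpg"))  # HWC, RGB, float in [0, 1]

# opencv path: BGR channel order, uint8 values in 0~255
img_cv = cv2.imread("sample.jpg")               # HWC, BGR, uint8 in [0, 255]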

After that, model conversion also requires a configuration file, the model-conversion Yaml config. It contains several important parameter groups:

  • model_parameters (required);
  • input_parameters (required);
  • calibration_parameters (required);
  • compiler_parameters (required);
  • custom_op (optional);

Here is the mobilenet_config.yaml configuration:

# model conversion related parameters
model_parameters:

  # the model file of the floating-point Caffe network
  caffe_model: '../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel'

  # the file that describes the structure of the Caffe network
  prototxt: '../../../01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt'

  # the applicable BPU architecture
  march: "bernoulli2"

  # specifies whether or not to dump the intermediate results of all layers during conversion;
  # if set to True, the intermediate results of all layers will be dumped
  layer_out_dump: False

  # the directory in which model conversion results are stored
  working_dir: 'model_output'

  # name prefix of the generated model files used for dev board execution
  output_model_file_prefix: 'mobilenetv1_224x224_nv12'

# model input related parameters,
# please use ";" to separate multiple input nodes,
# please use None for the default setting
input_parameters:

  # (Optional) node name of the model input;
  # it must match the name in the model file, otherwise an error will be reported;
  # the node name from the model file will be used when left blank
  input_name: ""

  # the data format passed to the network at actual runtime,
  # available options: nv12/rgb/bgr/yuv444/gray/featuremap
  input_type_rt: 'nv12'

  # the data layout passed to the network at actual runtime, available options: NHWC/NCHW;
  # if input_type_rt is configured as nv12, this parameter does not need to be configured
  #input_layout_rt: ''

  # the data format used in network training,
  # available options: rgb/bgr/gray/featuremap/yuv444
  input_type_train: 'bgr'

  # the data layout used in network training, available options: NHWC/NCHW
  input_layout_train: 'NCHW'

  # (Optional) the input size of the network, separated by 'x';
  # the network input size from the model file will be used if left blank,
  # otherwise this overwrites the input size in the model file
  input_shape: ''

  # the batch_size passed to the network at actual runtime, default value: 1
  #input_batch: 1

  # preprocessing method of the network input, available options:
  # 'no_preprocess' indicates that no preprocessing will be done
  # 'data_mean' indicates subtracting the channel mean, i.e. mean_value
  # 'data_scale' indicates multiplying image pixels by the data_scale ratio
  # 'data_mean_and_scale' indicates multiplying by the scale ratio after the channel mean is subtracted
  norm_type: 'data_mean_and_scale'

  # the mean value subtracted from the image;
  # values must be separated by spaces if per-channel mean values are used
  mean_value: 103.94 116.78 123.68

  # the scale value of image preprocessing;
  # values must be separated by spaces if per-channel scale values are used
  scale_value: 0.017

# model calibration parameters
calibration_parameters:

  # the directory where reference images for model quantization are stored;
  # image formats include JPEG, BMP etc.;
  # they should come from typical application scenarios, usually 20~100 images picked from the test dataset;
  # note that the images should cover typical scenarios and avoid extreme ones
  # such as overexposed, saturated, blurry, pure black or pure white images;
  # use ';' to separate when there are multiple input nodes
  cal_data_dir: './calibration_data_bgr_f32'

  # storage type of the calibration data binary files, available options: float32, uint8
  cal_data_type: 'float32'

  # in case the input image size differs from that used in model training
  # and preprocess_on is set to True,
  # the default preprocessing method (skimage resize) will be used
  # to resize or crop the input image to the specified size;
  # otherwise the user must resize images to the training size in advance
  # preprocess_on: False

  # the algorithm type of model quantization, supporting default, mix, kl, max, load; default usually meets the requirements.
  # If the result does not meet expectations, try mix first; if it still falls short, try kl or max.
  # When the model is exported via QAT, this parameter should be set to load.
  # For more details, please refer to the parameter descriptions in the PTQ Principle And Steps section of the user manual.
  calibration_type: 'max'

  # this parameter belongs to the 'max' calibration method and adjusts the intercept point of 'max' calibration;
  # it only takes effect when calibration_type is 'max'.
  # RANGE: 0.0 - 1.0. Typical options include: 0.99999/0.99995/0.99990/0.99950/0.99900.
  max_percentile: 0.9999

# compiler related parameters
compiler_parameters:

  # compilation strategy; there are 2 optimization modes: 'bandwidth' and 'latency';
  # the 'bandwidth' mode aims to optimize DDR access bandwidth,
  # while the 'latency' mode aims to optimize inference latency
  compile_mode: 'latency'

  # setting debug to True enables the compiler's debug mode,
  # which dumps performance-simulation information
  # such as frame rate, DDR bandwidth usage etc.
  debug: False

  # specifies the number of cores used in model compilation;
  # by default a single-core model is compiled when this is left blank;
  # uncomment the line below to compile a dual-core model
  # core_num: 2

  # the optimization level ranges from O0 to O3;
  # O0 does no optimization: compilation is fastest and the optimization level is lowest;
  # from O1 to O3, the compiled model is expected to run faster, but compilation takes longer;
  # O2 is recommended for the fastest verification
  optimize_level: 'O3'

The comments in the Horizon toolchain's sample Yaml files are very thorough, so there's little need to elaborate further.
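One detail worth a sanity check: with norm_type: 'data_mean_and_scale', preprocessing computes roughly out = (pixel - mean_value) * scale_value per channel, and the single scale 0.017 is about 1/58.82, which is exactly the std=[58.82352621] that shows up later in the build log. A quick check (a sketch; the numbers come straight from this config):

import numpy as np

mean = np.array([103.94, 116.78, 123.68], dtype=np.float32)  # per-channel mean from the yaml
scale = np.float32(0.017)                                    # single scale shared by all channels

pixel = np.array([128.0, 128.0, 128.0], dtype=np.float32)    # an arbitrary BGR pixel
normalized = (pixel - mean) * scale                          # what data_mean_and_scale applies
print(normalized)    # ~[0.409 0.191 0.073]
print(1.0 / scale)   # ~58.82, the std the build log reports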

Then you can run the 03_build.sh script to perform the model conversion:

bash 03_build.sh

The contents of 03_build.sh:

#!/bin/bash

set -e -v

cd $(dirname $0) || exit

config_file="./mobilenet_config.yaml"
model_type="caffe"
# build model
hb_mapper makertbin --config ${config_file}  \
                    --model-type  ${model_type}

Model conversion mainly uses the hb_mapper makertbin command. Its arguments are simple: the Yaml config file described above, plus the input model type.

The log from building the model:

# bash 03_build.sh 

cd $(dirname $0) || exit

config_file="./mobilenet_config.yaml"
model_type="caffe"
# build model
hb_mapper makertbin --config ${config_file}  \
                    --model-type  ${model_type}
2023-08-30 06:18:16,555 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/hb_mapper_makertbin.log
2023-08-30 06:18:16,559 INFO Start hb_mapper....
2023-08-30 06:18:16,561 INFO hbdk version 3.45.3
2023-08-30 06:18:16,564 INFO horizon_nn version 0.18.2
2023-08-30 06:18:16,567 INFO hb_mapper version 1.17.4
2023-08-30 06:18:16,569 INFO Start Model Convert....
2023-08-30 06:18:16,695 INFO Using caffe model file: /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel and prototxt file: /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
2023-08-30 06:18:16,743 INFO Model has 1 inputs according to model file
2023-08-30 06:18:16,747 INFO Model name not given in yaml_file, using model name from model file: ['data']
2023-08-30 06:18:16,747 INFO Model input shape not given in yaml_file, using shape from model file: [[1, 3, 224, 224]]
2023-08-30 06:18:16,748 INFO nv12 input type rt received.
2023-08-30 06:18:16,751 INFO The calibration dir name suffix is the same as the value float32 of the cal_data_type parameter and will be read with the value of cal_data_type.
2023-08-30 06:18:16,753 INFO custom_op does not exist, skipped
2023-08-30 06:18:16,755 WARNING Input node data's input_source not set, it will be set to pyramid by default
2023-08-30 06:18:16,775 INFO *******************************************
2023-08-30 06:18:16,776 INFO First calibration picture name: ILSVRC2012_val_00000001.bgr
2023-08-30 06:18:16,777 INFO First calibration picture md5:
b86e53f8308d78931982391448f6e9c7  /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/calibration_data_bgr_f32/ILSVRC2012_val_00000001.bgr
2023-08-30 06:18:16,816 INFO *******************************************
2023-08-30 06:18:18,704 INFO [Wed Aug 30 06:18:18 2023] Start to Horizon NN Model Convert.
2023-08-30 06:18:18,705 INFO Loading horizon_nn debug methods:[]
2023-08-30 06:18:18,706 INFO Parsing the input parameter:{'data': {'input_shape': [1, 3, 224, 224], 'expected_input_type': 'YUV444_128', 'original_input_type': 'BGR', 'original_input_layout': 'NCHW', 'means': array([103.94, 116.78, 123.68], dtype=float32), 'scales': array([0.017], dtype=float32)}}
2023-08-30 06:18:18,707 INFO Parsing the calibration parameter
2023-08-30 06:18:18,707 INFO Parsing the hbdk parameter:{'hbdk_pass_through_params': '--O3 --core-num 1 --fast ', 'input-source': {'data': 'pyramid', '_default_value': 'ddr'}}
2023-08-30 06:18:18,708 INFO HorizonNN version: 0.18.2
2023-08-30 06:18:18,709 INFO HBDK version: 3.45.3
2023-08-30 06:18:18,926 INFO Find 1 inputs in the model:
2023-08-30 06:18:18,927 INFO Got input 'data' with shape [1, 3, 224, 224].
2023-08-30 06:18:19,932 INFO [Wed Aug 30 06:18:19 2023] Start to parse the onnx model.
2023-08-30 06:18:19,933 INFO Input ONNX model infomation:
ONNX IR version:          7
Opset version:            [10, 1, 1]
Producer:                 none
Domain:                   none
Input name:               data, [1, 3, 224, 224]
Output name:              prob, [1, 1000, 1, 1]
2023-08-30 06:18:19,986 INFO [Wed Aug 30 06:18:19 2023] End to parse the onnx model.
2023-08-30 06:18:19,987 INFO Model input names parsed from model: ['data']
2023-08-30 06:18:19,988 INFO Create a preprocessing operator for input_name data with means=[103.94 116.78 123.68], std=[58.82352621], original_input_layout=NCHW, color convert from 'BGR' to 'YUV_BT601_FULL_RANGE'.
2023-08-30 06:18:20,098 INFO Saving the original float model: mobilenetv1_224x224_nv12_original_float_model.onnx.
2023-08-30 06:18:20,099 INFO [Wed Aug 30 06:18:20 2023] Start to optimize the model.
2023-08-30 06:18:20,666 INFO [Wed Aug 30 06:18:20 2023] End to optimize the model.
2023-08-30 06:18:20,725 INFO Saving the optimized model: mobilenetv1_224x224_nv12_optimized_float_model.onnx.
2023-08-30 06:18:20,727 INFO [Wed Aug 30 06:18:20 2023] Start to calibrate the model.
2023-08-30 06:18:20,729 INFO There are 100 samples in the calibration data set.
2023-08-30 06:18:21,009 INFO Run calibration model with max-percentile=0.999900 method.
2023-08-30 06:18:21,093 INFO Calibration using batch 8
max-percentile=0.999900 calibration in progress: 100%|███████████████████████████████████████████████████| 13/13 [00:05<00:00,  2.58it/s]
2023-08-30 06:18:26,546 INFO Saving the calibrated model: mobilenetv1_224x224_nv12_calibrated_model.onnx.
2023-08-30 06:18:26,548 INFO [Wed Aug 30 06:18:26 2023] End to calibrate the model.
2023-08-30 06:18:26,550 INFO [Wed Aug 30 06:18:26 2023] Start to quantize the model.
2023-08-30 06:18:29,465 INFO input data is from pyramid. Its layout is set to NHWC
2023-08-30 06:18:29,731 INFO [Wed Aug 30 06:18:29 2023] End to quantize the model.
2023-08-30 06:18:30,046 INFO Saving the quantized model: mobilenetv1_224x224_nv12_quantized_model.onnx.
2023-08-30 06:18:30,881 INFO [Wed Aug 30 06:18:30 2023] Start to compile the model with march bernoulli2.
2023-08-30 06:18:30,883 INFO [Wed Aug 30 06:18:30 2023] End to compile the model with march bernoulli2.
2023-08-30 06:18:31,172 INFO Compile submodel: MOBILENET_subgraph_0
2023-08-30 06:18:31,577 INFO hbdk-cc parameters:['--O3', '--core-num', '1', '--fast', '--input-layout', 'NHWC', '--output-layout', 'NCHW', '--input-source', 'pyramid']
[==================================================] 100%
2023-08-30 06:18:33,347 INFO consumed time 1.64005
2023-08-30 06:18:33,533 INFO FPS=347.12, latency = 2880.9 us   (see MOBILENET_subgraph_0.html)
2023-08-30 06:18:33,711 INFO The converted model node information:
==============================================================================================================
Node                    ON   Subgraph  Type                    Cosine Similarity  Threshold   In/Out DataType  
---------------------------------------------------------------------------------------------------------------
HZ_PREPROCESS_FOR_data  BPU  id(0)     HzSQuantizedPreprocess  0.999988           127.000000  int8/int8        
conv1                   BPU  id(0)     HzSQuantizedConv        0.999916           2.937425    int8/int8        
conv2_1/dw              BPU  id(0)     HzSQuantizedConv        0.999356           2.040827    int8/int8        
conv2_1/sep             BPU  id(0)     HzSQuantizedConv        0.996678           4.486579    int8/int8        
conv2_2/dw              BPU  id(0)     HzSQuantizedConv        0.997330           3.545496    int8/int8        
conv2_2/sep             BPU  id(0)     HzSQuantizedConv        0.996376           2.791299    int8/int8        
conv3_1/dw              BPU  id(0)     HzSQuantizedConv        0.994169           1.417208    int8/int8        
conv3_1/sep             BPU  id(0)     HzSQuantizedConv        0.985448           2.188753    int8/int8        
conv3_2/dw              BPU  id(0)     HzSQuantizedConv        0.994925           1.822225    int8/int8        
conv3_2/sep             BPU  id(0)     HzSQuantizedConv        0.994255           1.841765    int8/int8        
conv4_1/dw              BPU  id(0)     HzSQuantizedConv        0.988255           1.043535    int8/int8        
conv4_1/sep             BPU  id(0)     HzSQuantizedConv        0.990334           1.736999    int8/int8        
conv4_2/dw              BPU  id(0)     HzSQuantizedConv        0.992463           0.990603    int8/int8        
conv4_2/sep             BPU  id(0)     HzSQuantizedConv        0.993469           1.574677    int8/int8        
conv5_1/dw              BPU  id(0)     HzSQuantizedConv        0.988949           0.823123    int8/int8        
conv5_1/sep             BPU  id(0)     HzSQuantizedConv        0.990804           1.265912    int8/int8        
conv5_2/dw              BPU  id(0)     HzSQuantizedConv        0.990191           0.772344    int8/int8        
conv5_2/sep             BPU  id(0)     HzSQuantizedConv        0.983377           1.530479    int8/int8        
conv5_3/dw              BPU  id(0)     HzSQuantizedConv        0.986417           0.783812    int8/int8        
conv5_3/sep             BPU  id(0)     HzSQuantizedConv        0.977491           1.927324    int8/int8        
conv5_4/dw              BPU  id(0)     HzSQuantizedConv        0.982257           0.996043    int8/int8        
conv5_4/sep             BPU  id(0)     HzSQuantizedConv        0.961735           2.167391    int8/int8        
conv5_5/dw              BPU  id(0)     HzSQuantizedConv        0.978750           1.923361    int8/int8        
conv5_5/sep             BPU  id(0)     HzSQuantizedConv        0.960038           3.578415    int8/int8        
conv5_6/dw              BPU  id(0)     HzSQuantizedConv        0.980339           2.463874    int8/int8        
conv5_6/sep             BPU  id(0)     HzSQuantizedConv        0.981139           4.124151    int8/int8        
conv6/dw                BPU  id(0)     HzSQuantizedConv        0.998357           0.667692    int8/int8        
conv6/sep               BPU  id(0)     HzSQuantizedConv        0.986100           0.983833    int8/int8        
pool6                   BPU  id(0)     HzSQuantizedConv        0.993779           11.415899   int8/int8        
fc7                     BPU  id(0)     HzSQuantizedConv        0.995593           5.843800    int8/int32       
prob                    CPU  --        Softmax                 0.983072           --          float/float
2023-08-30 06:18:33,718 INFO The quantify model output:
=======================================================================
Node  Cosine Similarity  L1 Distance  L2 Distance  Chebyshev Distance  
-----------------------------------------------------------------------
prob  0.983072           0.000409     0.000220     0.202166
2023-08-30 06:18:33,719 INFO [Wed Aug 30 06:18:33 2023] End to Horizon NN Model Convert.
2023-08-30 06:18:33,850 INFO start convert to *.bin file....
2023-08-30 06:18:34,002 INFO ONNX model output num : 1
2023-08-30 06:18:34,010 INFO ############# model deps info #############
2023-08-30 06:18:34,011 INFO hb_mapper version   : 1.17.4
2023-08-30 06:18:34,013 INFO hbdk version        : 3.45.3
2023-08-30 06:18:34,014 INFO hbdk runtime version: 3.15.25.0
2023-08-30 06:18:34,016 INFO horizon_nn version  : 0.18.2
2023-08-30 06:18:34,017 INFO ############# model_parameters info #############
2023-08-30 06:18:34,018 INFO caffe_model         : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet.caffemodel
2023-08-30 06:18:34,020 INFO prototxt            : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/01_common/model_zoo/mapper/classification/mobilenet/mobilenet_deploy.prototxt
2023-08-30 06:18:34,021 INFO BPU march           : bernoulli2
2023-08-30 06:18:34,023 INFO layer_out_dump      : False
2023-08-30 06:18:34,024 INFO log_level           : DEBUG
2023-08-30 06:18:34,026 INFO working dir         : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/model_output
2023-08-30 06:18:34,028 INFO output_model_file_prefix: mobilenetv1_224x224_nv12
2023-08-30 06:18:34,030 INFO ############# input_parameters info #############
2023-08-30 06:18:34,031 INFO ------------------------------------------
2023-08-30 06:18:34,033 INFO ---------input info : data ---------
2023-08-30 06:18:34,034 INFO input_name          : data
2023-08-30 06:18:34,035 INFO input_type_rt       : nv12
2023-08-30 06:18:34,036 INFO input_space&range   : regular
2023-08-30 06:18:34,037 INFO input_layout_rt     : None
2023-08-30 06:18:34,038 INFO input_type_train    : bgr
2023-08-30 06:18:34,040 INFO input_layout_train  : NCHW
2023-08-30 06:18:34,041 INFO norm_type           : data_mean_and_scale
2023-08-30 06:18:34,042 INFO input_shape         : 1x3x224x224
2023-08-30 06:18:34,043 INFO mean_value          : 103.94,116.78,123.68,
2023-08-30 06:18:34,044 INFO scale_value         : 0.017,
2023-08-30 06:18:34,045 INFO cal_data_dir        : /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/calibration_data_bgr_f32
2023-08-30 06:18:34,047 INFO cal_data_type       : float32
2023-08-30 06:18:34,048 INFO ---------input info : data end -------
2023-08-30 06:18:34,049 INFO ------------------------------------------
2023-08-30 06:18:34,050 INFO ############# calibration_parameters info #############
2023-08-30 06:18:34,051 INFO preprocess_on       : False
2023-08-30 06:18:34,052 INFO calibration_type:   : max
2023-08-30 06:18:34,053 INFO max_percentile      : 0.9999
2023-08-30 06:18:34,054 INFO ############# compiler_parameters info #############
2023-08-30 06:18:34,055 INFO hbdk_pass_through_params: --O3 --core-num 1 --fast
2023-08-30 06:18:34,056 INFO input-source        : {'data': 'pyramid', '_default_value': 'ddr'}
2023-08-30 06:18:34,083 INFO Convert to runtime bin file successfully!
2023-08-30 06:18:34,085 INFO End Model Convert

On success, a .bin model file for on-board execution is generated under model_output in the same directory:

Among the outputs (a quick way to inspect the generated bin follows the list):

  • MOBILENET_subgraph_0.html: static performance estimate in html format;
  • MOBILENET_subgraph_0.json: static performance estimate in json format;
  • mobilenetv1_224x224_nv12.bin: the model file for on-board execution;
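If your board image ships the hrt_model_exec runtime tool used later, my understanding is that it also has a model_info subcommand for printing what was packed into the bin (treat the exact flags as an assumption and check the tool's help):

hrt_model_exec model_info --model_file mobilenetv1_224x224_nv12.bin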

4> Performance Validation

Performance validation really involves two notions: static and dynamic. Static performance is what the simulation estimates, i.e. the numbers in the html and json files above; dynamic performance is what you measure by actually running on the board.

Starting with static validation, take the json file as an example: the fps there reads 347.12.

The html version is even more intuitive.

Next, dynamic validation: put the mobilenetv1_224x224_nv12.bin generated by the conversion onto the board (scp works fine), then run the following command to benchmark:

hrt_model_exec perf --model_file mobilenetv1_224x224_nv12.bin --thread_num 1 --frame_count 1000

On my board the actual fps is 314.45, slightly slower than the static simulated performance.
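If you compiled a dual-core model (core_num: 2 in the yaml), it may also be worth probing throughput with more threads; for example (same tool, only --thread_num changed; treat this as a sketch):

hrt_model_exec perf --model_file mobilenetv1_224x224_nv12.bin --thread_num 2 --frame_count 1000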

5> Accuracy Validation

Two scripts are provided: 04_inference.sh for single-image testing, and 05_evalute.sh for measuring accuracy on the imagenet val set. Since I don't have the imagenet dataset ready here, I'll skip 05_evalute.sh for now.

04_inference.sh verifies and compares the inference accuracy of the quantized model against the original model. Let's look at the script:

#!/bin/bash

set -e -v
cd $(dirname $0) || exit

#for converted quanti model inference
quanti_model_file="./model_output/mobilenetv1_224x224_nv12_quantized_model.onnx"
quanti_input_layout="NHWC"

#for original float model inference
original_model_file="./model_output/mobilenetv1_224x224_nv12_original_float_model.onnx"
original_input_layout="NCHW"

if [[ $1 =~ "origin" ]];  then
  model=$original_model_file
  layout=$original_input_layout
  input_offset=128
else
  model=$quanti_model_file
  layout=$quanti_input_layout
  input_offset=128  
fi

infer_image="../../../01_common/test_data/cls_images/zebra_cls.jpg"

python3 -u ../../cls_inference.py \
        --model ${model} \
        --image ${infer_image} \
        --input_layout ${layout} \
        --input_offset ${input_offset} 

The logic should be easy to follow: running bash 04_inference.sh performs inference with the quantized model, while bash 04_inference.sh origin runs the original model; you then compare the two models' outputs, as in the sketch below.
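As a rough sketch of that comparison (the .npy dumps of the two models' output probabilities on the same image are hypothetical; cosine similarity is also the metric the build log reported per layer):

import numpy as np

# Hypothetical dumps of both models' output probabilities on the same image.
quant = np.load("quant_prob.npy").ravel()
orig = np.load("origin_prob.npy").ravel()

# Cosine similarity: closer to 1.0 means the quantized output matches the float one.
cos = np.dot(quant, orig) / (np.linalg.norm(quant) * np.linalg.norm(orig))
print(f"cosine similarity: {cos:.6f}")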

For instance, the output of my quantized-model inference is as follows (even though I didn't actually run quantization here...):

2023-08-30 07:30:28,254 INFO log will be stored in /open_explorer/ddk/samples/ai_toolchain/horizon_model_convert_sample/03_classification/01_mobilenet/mapper/inference.log
2023-08-30 07:30:29,266 INFO The input picture is classified to be:
2023-08-30 07:30:29,267 INFO label 340, prob 0.97307, class ['zebra']
2023-08-30 07:30:29,268 INFO label 292, prob 0.02184, class ['tiger, Panthera tigris']
2023-08-30 07:30:29,268 INFO label 282, prob 0.00331, class ['tiger cat']
2023-08-30 07:30:29,269 INFO label  83, prob 0.00108, class ['prairie chicken, prairie grouse, prairie fowl']
2023-08-30 07:30:29,270 INFO label 290, prob 0.00006, class ['jaguar, panther, Panthera onca, Felis onca']

The predicted class is zebra. Here is the original image:

That said, to be blunt: if you really want to validate accuracy, you still need to go back to the board, run there, and verify the results; that is the most trustworthy approach.
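On-board single-frame inference can, as far as I know, be driven by the same runtime tool's infer subcommand; the flags and the pre-converted nv12 input file below are assumptions on my part, so consult the tool's help for specifics:

hrt_model_exec infer --model_file mobilenetv1_224x224_nv12.bin --input_file zebra_cls.bin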

This has run long, but it covers the complete deployment flow of this rather well-polished toolchain, and there's plenty here worth learning from.

That's it: a detailed walkthrough of the Horizon TianGong KaiWu toolchain deployment flow. I hope my share helps your learning a little.


