WeNet tutorial on LibriSpeech

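This post records a complete WeNet workflow, from training to deployment. It also builds on the shared library generated by the WeNet project for secondary development, so the result does not depend on the original production environment; see the project.

The official WeNet project is updated frequently and has recently been adding ONNX deployment support, so follow the official documentation first.

WeNet is an end-to-end speech recognition toolkit that provides a full-stack solution for speech recognition. It depends only on PyTorch, is easy to deploy (ONNX, LibTorch), is accurate, and is well documented, which makes it a good fit for both production use and learning.

Environment: Ubuntu 18.04, 2 × Tesla P40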

Train

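For training, this section corrects the errors and fills the gaps in the original tutorial; for everything else, follow the original tutorial.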

Setup environment

Install the WeNet environment:

git clone https://github.com/wenet-e2e/wenet.git
cd wenet

Create the Conda environment:

conda create -n wenet python=3.8
conda activate wenet
pip install -r requirements.txt
conda install pytorch=1.10.0 torchvision torchaudio=0.10.0 cudatoolkit=11.1 -c pytorch -c conda-forge
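As a quick sanity check after the installation, the sketch below reports whether the main packages can be found in the active environment. It only uses the standard library, and the helper name `installed` is made up for this illustration; it does not verify versions or CUDA support.

```python
import importlib.util


def installed(packages):
    """Map each top-level package name to True if it can be located."""
    return {name: importlib.util.find_spec(name) is not None for name in packages}


# Packages installed by the conda command above.
for name, found in installed(["torch", "torchvision", "torchaudio"]).items():
    print(f"{name}: {'found' if found else 'MISSING'}")
```

Run it inside the activated wenet environment; any MISSING entry suggests the conda install step did not complete.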

First Experiment

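First edit the script at examples/librispeech/s0/run.sh.

# Set according to the number of GPUs actually available
export CUDA_VISIBLE_DEVICES="0,1"
# Actual dataset location; use an absolute path rather than a relative one
datadir=~/data/librispeech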

Stage -1: Download data

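LibriSpeech is a large corpus (about 1000 hours) of read English speech. The download address in the script no longer works, so find a suitable mirror on the OpenSLR website and download the data yourself. The script fetches the seven archives listed in the commands below, which can be downloaded in the background as follows: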

nohup wget https://us.openslr.org/resources/12/dev-clean.tar.gz &
nohup wget https://us.openslr.org/resources/12/dev-other.tar.gz &
nohup wget https://us.openslr.org/resources/12/test-clean.tar.gz &
nohup wget https://us.openslr.org/resources/12/test-other.tar.gz &
nohup wget https://us.openslr.org/resources/12/train-clean-100.tar.gz &
nohup wget https://us.openslr.org/resources/12/train-clean-360.tar.gz &
nohup wget https://us.openslr.org/resources/12/train-other-500.tar.gz &
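Since the downloads run in the background, it is easy to lose track of which archives have arrived. The following is a minimal sketch, assuming the same mirror URL as the commands above; the helper names `archive_urls` and `missing_archives` are invented for this example. It only checks that each file exists, not that it is complete.

```python
from pathlib import Path
from urllib.parse import urljoin

# Mirror base URL copied from the wget commands above.
BASE_URL = "https://us.openslr.org/resources/12/"

# The seven LibriSpeech parts downloaded above.
PARTS = [
    "dev-clean", "dev-other", "test-clean", "test-other",
    "train-clean-100", "train-clean-360", "train-other-500",
]


def archive_urls(base_url=BASE_URL, parts=PARTS):
    """Return the full .tar.gz URL for every LibriSpeech part."""
    return [urljoin(base_url, f"{part}.tar.gz") for part in parts]


def missing_archives(directory, parts=PARTS):
    """Return the archive file names that are not present in `directory`."""
    directory = Path(directory)
    return [f"{p}.tar.gz" for p in parts if not (directory / f"{p}.tar.gz").exists()]


print("Still missing in current directory:", missing_archives("."))
```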

After the downloads finish, extract the archives yourself, for example:

tar -zxvf dev-clean.tar.gz
# Extract all the other archives the same way
# ...
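To avoid repeating the tar command for each of the seven archives, the extraction can be looped. This is a small sketch using Python's standard tarfile module; the function name `extract_all` and the directory arguments are assumptions for illustration, not part of the WeNet scripts.

```python
import tarfile
from pathlib import Path


def extract_all(archive_dir, dest_dir):
    """Extract every *.tar.gz found in archive_dir into dest_dir.

    Returns the names of the archives that were extracted, in sorted order.
    """
    archive_dir, dest_dir = Path(archive_dir), Path(dest_dir)
    dest_dir.mkdir(parents=True, exist_ok=True)
    extracted = []
    for archive in sorted(archive_dir.glob("*.tar.gz")):
        with tarfile.open(archive, "r:gz") as tar:
            tar.extractall(dest_dir)
        extracted.append(archive.name)
    return extracted
```

For example, `extract_all(".", ".")` would unpack every archive in the current directory in place, which matches what the tar command above does for a single file.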

Stage 0: Prepare Training data

Install the flac dependency, since all the audio files are in FLAC format:

sudo apt install flac

Stage 7(Optional): Add LM and test it with runtime

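For the problem encountered at this stage, see the related issues.

The official answer is to build the runtime first and then run the script: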

# runtime build requires cmake 3.14 or above
cd runtime/Libtorch
mkdir build && cd build && cmake .. && cmake --build .

A build option should be added here; otherwise the OpenFst tools will not be compiled:

# runtime build requires cmake 3.14 or above
cd runtime/Libtorch
mkdir build && cd build && cmake -DFST_HAVE_BIN=ON .. && cmake --build .

If the script then reports "decoder_main is not built, please go to runtime/server/x86 to build it.",

add the directory containing the compiled decoder_main to the PATH environment variable, for example:

export PATH=~/data/wenet/runtime/Libtorch/build/bin:$PATH
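Before rerunning the script, it can help to confirm that decoder_main is actually resolvable. The sketch below uses the standard library's `shutil.which`; the helper name `find_tool` and its `extra_dirs` parameter are made up for this example.

```python
import os
import shutil


def find_tool(name, extra_dirs=()):
    """Return the full path of executable `name`, or None if it is not found.

    Directories in extra_dirs are searched before the ones on PATH.
    """
    dirs = [os.path.expanduser(d) for d in extra_dirs]
    dirs += [p for p in os.environ.get("PATH", "").split(os.pathsep) if p]
    return shutil.which(name, path=os.pathsep.join(dirs))


location = find_tool("decoder_main")
print("decoder_main:", location or "not on PATH, check the export above")
```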

Deploy && BUILD

The following compares transcription speed for the two deployment methods, LibTorch and ONNX, running on the CPU with the same utterance.

LibTorch

Reference

# runtime build requires cmake 3.14 or above
cd runtime/Libtorch
mkdir build && cd build && cmake .. && cmake --build .

# Result
Decoded 12780ms audio taken 1839ms.

ONNX

Reference

Step 1. Export your experiment model to ONNX by github.com/wenet-e2e/w…

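Because the wenet package cannot be imported otherwise, manually add the following to wenet/bin/export_onnx_cpu.py before the wenet imports:

import os
sys.path.append(os.path.expanduser('~/data/wenet'))  # adjust to the actual location; Python does not expand '~' by itself

Run the export from the experiment directory, because $exp/train.yaml contains relative paths.

The following command succeeded when run in ~/data/wenet/examples/librispeech/s0: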

python ~/data/wenet/wenet/bin/export_onnx_cpu.py \
    --config ~/data/wenet/examples/librispeech/s0/exp/sp_spec_aug/train.yaml \
    --checkpoint ~/data/wenet/examples/librispeech/s0/exp/sp_spec_aug/final.pt \
    --chunk_size -1 \
    --output_dir onnx \
    --num_decoding_left_chunks -1

# runtime build requires cmake 3.14 or above
cd runtime/OnnxRuntime
mkdir build && cd build && cmake -DTORCH=OFF -DONNX=ON -DWEBSOCKET=OFF -DGRPC=OFF .. && cmake --build .

# Result
decoded 12780ms audio taken 959ms.
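The two log lines are easier to compare as real-time factors (decoding time divided by audio duration). The short calculation below uses only the numbers reported above; it reflects a single utterance on one machine, so treat it as an indication rather than a benchmark.

```python
def real_time_factor(decode_ms, audio_ms):
    """Decoding time divided by audio duration; lower is faster."""
    return decode_ms / audio_ms


AUDIO_MS = 12780      # duration of the test utterance
LIBTORCH_MS = 1839    # LibTorch decoding time from the log above
ONNX_MS = 959         # ONNX decoding time from the log above

libtorch_rtf = real_time_factor(LIBTORCH_MS, AUDIO_MS)  # about 0.144
onnx_rtf = real_time_factor(ONNX_MS, AUDIO_MS)          # about 0.075
speedup = LIBTORCH_MS / ONNX_MS                         # about 1.92

print(f"LibTorch RTF: {libtorch_rtf:.3f}")
print(f"ONNX RTF:     {onnx_rtf:.3f}")
print(f"ONNX is about {speedup:.2f}x faster on this utterance")
```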

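Long audio

When the audio is too long (for example, longer than six minutes), the offline transcription program decoder_main fails. In that case, switch to streaming mode, for example: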

export GLOG_logtostderr=1
export GLOG_v=3

wav_path=~/data/wav/16k-eng.wav
model_dir=~/data/env/my_librispeech

./build/bin/decoder_main \
    --chunk_size 96 \
    --simulate_streaming true \
    --continuous_decoding true \
    --wav_path $wav_path \
    --model_path $model_dir/final.zip \
    --unit_path $model_dir/units.txt 2>&1 | tee log.txt

The other invocation modes need the same treatment, that is, they must also be set to streaming mode.
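The choice between offline and streaming mode can be made automatically from the audio duration. The sketch below reads the WAV header with the standard library and adds the streaming flags used in the command above only for long files. The six-minute threshold comes from the example in the text and is an assumption, not a documented limit; the helper names are invented for this illustration.

```python
import wave

# Assumed threshold, taken from the "six minutes" example in the text.
STREAMING_THRESHOLD_S = 6 * 60


def wav_duration_seconds(path):
    """Return the duration of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def decoder_flags(path, chunk_size=96, threshold_s=STREAMING_THRESHOLD_S):
    """Build decoder_main flags, enabling streaming mode for long audio."""
    flags = ["--wav_path", str(path)]
    if wav_duration_seconds(path) > threshold_s:
        flags += [
            "--chunk_size", str(chunk_size),
            "--simulate_streaming", "true",
            "--continuous_decoding", "true",
        ]
    return flags
```

The returned list would still need `--model_path` and `--unit_path` appended before being passed to decoder_main, for example through `subprocess.run`.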

Api

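WeNet already provides a wrapped API in ./runtime/core/api, along with a usage example in ./runtime/core/bin/api_main.cc, but it only supports LibTorch.

Because the LibTorch and ONNX models expose exactly the same interface and differ only in how they are initialized, a small change is enough to support ONNX, with a CMake build option selecting which shared library to generate.

Specifically, change two places in ./runtime/core/api/wenet_api.cc: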

// Select the header file through the build option
#ifdef USE_ONNX
#include "decoder/onnx_asr_model.h"
#endif
#ifdef USE_TORCH
#include "decoder/torch_asr_model.h"
#endif

// Select the initialization method through the build option
#ifdef USE_TORCH
    wenet::TorchAsrModel::InitEngineThreads();
    std::string model_path = wenet::JoinPath(model_dir, "final.zip");
    CHECK(wenet::FileExists(model_path));

    auto model = std::make_shared<wenet::TorchAsrModel>();
    model->Read(model_path);
    resource_->model = model;
#endif

#ifdef USE_ONNX
    wenet::OnnxAsrModel::InitEngineThreads();
    std::string encoder_path = wenet::JoinPath(model_dir, "encoder.onnx");
    std::string decoder_path = wenet::JoinPath(model_dir, "decoder.onnx");
    std::string ctc_path = wenet::JoinPath(model_dir, "ctc.onnx");
    CHECK(wenet::FileExists(encoder_path));
    CHECK(wenet::FileExists(decoder_path));
    CHECK(wenet::FileExists(ctc_path));

    auto model = std::make_shared<wenet::OnnxAsrModel>();
    model->Read(model_dir);
    resource_->model = model;
#endif

Modify ./runtime/core/api/CMakeLists.txt to support building the ONNX shared library:

# Add ONNX support
if(ONNX)
 add_library(wenet_api SHARED wenet_api.cc)
 target_link_libraries(wenet_api PUBLIC decoder)
endif()

Modify ./runtime/core/bin/CMakeLists.txt to support building the api_main usage example:

# Used to test whether the shared library can be called correctly
if(ONNX)
 add_executable(api_main api_main.cc)
 target_link_libraries(api_main PUBLIC wenet_api)
endif()

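After building the shared library, package all the shared libraries it depends on:

#!/bin/sh
exe="libwenet_api.so"  # name of the library to release
des="./lib"            # directory that collects the files
mkdir -p "$des"        # create the output directory if it does not exist
# Only collect dependencies resolved under /home (the locally built ones); system libraries are not packaged
deplist=$(ldd "$exe" | awk '{ if (match($3, "/home")) { printf "%s ", $3 } }')
cp $deplist "$des"
cp "$exe" "$des"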
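The filtering step of the script can be illustrated on its own. The sketch below parses ldd-style output and keeps only the resolved paths under a given prefix. The function name `local_dependencies` is hypothetical, and unlike the awk `match` call, which accepts "/home" anywhere in the path, this version requires the path to start with the prefix.

```python
def local_dependencies(ldd_output, prefix="/home"):
    """Return resolved library paths from ldd output that start with prefix."""
    paths = []
    for line in ldd_output.splitlines():
        fields = line.split()
        # Resolved entries look like: "libfoo.so => /path/libfoo.so (0x...)"
        if len(fields) >= 3 and fields[1] == "=>" and fields[2].startswith(prefix):
            paths.append(fields[2])
    return paths
```

The input would typically come from `subprocess.run(["ldd", "libwenet_api.so"], capture_output=True, text=True).stdout`, and the returned paths can then be copied with `shutil.copy`.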

For how to use the shared library, see the project.
