Raspberry Pi system environment
PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
Installing whisper.cpp
# Preferably switch to the official apt mirror; the Tsinghua mirror may be missing some packages
sudo apt install libsdl2-dev
git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp
# Download the model
./models/download-ggml-model.sh tiny
Building the example
# Build the example
make tiny
Speech recognition
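The main example in whisper.cpp reads 16-bit WAV input, and the log further down reports 50176 samples for 3.1 s of audio, which matches a 16 kHz sample rate. If a recording is in another format, it probably needs converting first. A minimal sketch, assuming ffmpeg is installed; `to_wav16k_cmd` is a hypothetical helper name and `input.mp3` is a placeholder path:

```shell
# to_wav16k_cmd: print (not run) an ffmpeg command that converts the first
# argument to a 16 kHz, mono, 16-bit PCM WAV file named by the second argument.
# Hypothetical helper; paths are placeholders and must not contain single quotes.
to_wav16k_cmd() {
    printf "ffmpeg -i '%s' -ar 16000 -ac 1 -c:a pcm_s16le '%s'\n" "$1" "$2"
}

# Show the command for a placeholder input file.
to_wav16k_cmd input.mp3 samples/m.wav
# To actually convert, run the printed command, for example:
#   to_wav16k_cmd input.mp3 samples/m.wav | sh
```

Printing the command instead of running it makes it easy to check the options before touching any files.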
./main -m models/ggml-tiny.bin -l zh -t 4 -f samples/m.wav
Result (about 11 s):
whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab = 51865
whisper_model_load: n_audio_ctx = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx = 448
whisper_model_load: n_text_state = 384
whisper_model_load: n_text_head = 6
whisper_model_load: n_text_layer = 4
whisper_model_load: n_mels = 80
whisper_model_load: ftype = 1
whisper_model_load: qntvr = 0
whisper_model_load: type = 1
whisper_model_load: mem required = 201.00 MB (+ 3.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx = 73.62 MB
whisper_model_load: model size = 73.54 MB
whisper_init_state: kv self size = 2.62 MB
whisper_init_state: kv cross size = 8.79 MB
system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |
main: processing 'samples/m.wav' (50176 samples, 3.1 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...
[00:00:00.000 --> 00:00:02.820] 我是中国人 我来自北京
whisper_print_timings: load time = 354.66 ms
whisper_print_timings: fallbacks = 0 p / 0 h
whisper_print_timings: mel time = 540.21 ms
whisper_print_timings: sample time = 17.95 ms / 10 runs ( 1.80 ms per run)
whisper_print_timings: encode time = 9901.51 ms / 1 runs ( 9901.51 ms per run)
whisper_print_timings: decode time = 232.32 ms / 10 runs ( 23.23 ms per run)
whisper_print_timings: total time = 11254.36 ms
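This takes too long: about 11.3 s in total for a 3.1 s clip, of which the encoder pass alone takes about 9.9 s. It does not meet expectations yet and needs optimization.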
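To quantify the gap, the real-time factor (processing time divided by audio duration) and the encoder's share of the total can be computed from the log. A minimal sketch, assuming the log layout shown above and 16 kHz input; `whisper_rtf` is a hypothetical helper name, not part of whisper.cpp:

```shell
# whisper_rtf: read whisper.cpp log text on stdin and print the audio length,
# total processing time, real-time factor, and encoder share of total time.
# Hypothetical helper; assumes the log layout shown above and 16 kHz audio.
whisper_rtf() {
    awk '
        # "main: processing ... (50176 samples, 3.1 sec), ..." gives the sample count
        $2 == "processing" {
            for (i = 2; i <= NF; i++)
                if ($i == "samples,") { samples = $(i - 1); gsub(/[^0-9]/, "", samples) }
        }
        $2 == "encode" && $3 == "time" { encode_ms = $5 }
        $2 == "total"  && $3 == "time" { total_ms  = $5 }
        END {
            if (samples + 0 == 0 || total_ms + 0 == 0) { print "timings not found"; exit 1 }
            audio_s = samples / 16000    # assumption: 16 kHz sample rate
            total_s = total_ms / 1000
            printf "audio=%.2fs total=%.2fs rtf=%.2f encode_share=%.1f%%\n", audio_s, total_s, total_s / audio_s, 100 * encode_ms / total_ms
        }
    '
}

# Feed it the relevant lines from the run above.
whisper_rtf <<'EOF'
main: processing 'samples/m.wav' (50176 samples, 3.1 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...
whisper_print_timings: encode time = 9901.51 ms / 1 runs ( 9901.51 ms per run)
whisper_print_timings: total time = 11254.36 ms
EOF
# prints: audio=3.14s total=11.25s rtf=3.59 encode_share=88.0%
```

For a live run, the log can be piped in directly, for example `./main -m models/ggml-tiny.bin -l zh -t 4 -f samples/m.wav 2>&1 | whisper_rtf`. A real-time factor of about 3.6, with roughly 88% of the time spent in the encoder, suggests that the encoder is where optimization effort should go first.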