Raspberry Pi system environment

PRETTY_NAME="Debian GNU/Linux 11 (bullseye)"
NAME="Debian GNU/Linux"
VERSION_ID="11"
VERSION="11 (bullseye)"
VERSION_CODENAME=bullseye
ID=debian
HOME_URL="https://www.debian.org/"
SUPPORT_URL="https://www.debian.org/support"
BUG_REPORT_URL="https://bugs.debian.org/"
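
The listing above has the format of /etc/os-release, so it was presumably produced by printing that file. As a small sketch (the function name show_os_release is made up for illustration), the same fields can be read from a script like this:

```shell
# Hypothetical helper: print ID, VERSION_ID and VERSION_CODENAME from an
# os-release style file. Defaults to /etc/os-release.
show_os_release() {
    file="${1:-/etc/os-release}"
    # Source the file in a subshell so its variables do not leak into the caller.
    ( . "$file" && printf '%s %s %s\n' "$ID" "$VERSION_ID" "$VERSION_CODENAME" )
}
```

With the values shown above, show_os_release would print "debian 11 bullseye".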

Installing whisper.cpp

# Preferably switch to the official Debian mirror; the Tsinghua mirror may be missing some packages
sudo apt install libsdl2-dev

git clone https://github.com/ggerganov/whisper.cpp
cd whisper.cpp

# Download the model
./models/download-ggml-model.sh tiny

Building the example

# Build the example
make tiny

Speech recognition

./main -m models/ggml-tiny.bin -l zh -t 4 -f samples/m.wav
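
According to the whisper.cpp README, the main example only accepts 16-bit WAV input, and the log for this run reports 50176 samples for about 3.1 s, which corresponds to 16 kHz. The sketch below checks the sample rate of a WAV file and, if needed, converts it with the ffmpeg command given in that README. The function names and file names are placeholders, and the header check assumes a canonical WAV layout.

```shell
# Read the sample rate from a canonical 44-byte RIFF/WAVE header.
# Assumption: the "fmt " chunk directly follows the RIFF header, so the rate
# is the little-endian 32-bit value at byte offset 24.
wav_sample_rate() {
    od -An -t u1 -j 24 -N 4 "$1" |
        awk '{ print $1 + $2 * 256 + $3 * 65536 + $4 * 16777216 }'
}

# Convert any input ffmpeg can read to 16 kHz, mono, 16-bit PCM WAV.
# Requires ffmpeg to be installed.
to_16k_wav() {
    ffmpeg -i "$1" -ar 16000 -ac 1 -c:a pcm_s16le "$2"
}

# Example (not executed here):
#   [ "$(wav_sample_rate samples/m.wav)" = 16000 ] || to_16k_wav samples/m.wav samples/m16k.wav
```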

Result (about 11 s):

whisper_init_from_file_no_state: loading model from 'models/ggml-tiny.bin'
whisper_model_load: loading model
whisper_model_load: n_vocab       = 51865
whisper_model_load: n_audio_ctx   = 1500
whisper_model_load: n_audio_state = 384
whisper_model_load: n_audio_head  = 6
whisper_model_load: n_audio_layer = 4
whisper_model_load: n_text_ctx    = 448
whisper_model_load: n_text_state  = 384
whisper_model_load: n_text_head   = 6
whisper_model_load: n_text_layer  = 4
whisper_model_load: n_mels        = 80
whisper_model_load: ftype         = 1
whisper_model_load: qntvr         = 0
whisper_model_load: type          = 1
whisper_model_load: mem required  =  201.00 MB (+    3.00 MB per decoder)
whisper_model_load: adding 1608 extra tokens
whisper_model_load: model ctx     =   73.62 MB
whisper_model_load: model size    =   73.54 MB
whisper_init_state: kv self size  =    2.62 MB
whisper_init_state: kv cross size =    8.79 MB

system_info: n_threads = 4 / 4 | AVX = 0 | AVX2 = 0 | AVX512 = 0 | FMA = 0 | NEON = 1 | ARM_FMA = 1 | F16C = 0 | FP16_VA = 0 | WASM_SIMD = 0 | BLAS = 0 | SSE3 = 0 | VSX = 0 | COREML = 0 | OPENVINO = 0 |

main: processing 'samples/m.wav' (50176 samples, 3.1 sec), 4 threads, 1 processors, lang = zh, task = transcribe, timestamps = 1 ...


[00:00:00.000 --> 00:00:02.820]  我是中国人 我来自北京


whisper_print_timings:     load time =   354.66 ms
whisper_print_timings:     fallbacks =   0 p /   0 h
whisper_print_timings:      mel time =   540.21 ms
whisper_print_timings:   sample time =    17.95 ms /    10 runs (    1.80 ms per run)
whisper_print_timings:   encode time =  9901.51 ms /     1 runs ( 9901.51 ms per run)
whisper_print_timings:   decode time =   232.32 ms /    10 runs (   23.23 ms per run)
whisper_print_timings:    total time = 11254.36 ms

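This takes too long: about 11.3 s in total for a 3.1 s clip, with the encoder alone taking about 9.9 s. It does not meet expectations yet and needs optimization.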
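
To compare later optimization attempts, it helps to reduce the timing lines to two numbers: the share of time spent in the encoder and the real-time factor (processing time divided by audio length). The following is a minimal sketch; timing_summary is a made-up helper, and it assumes the log lines keep the "whisper_print_timings: <name> = <value> ms" shape shown above.

```shell
# Summarize whisper_print_timings lines read from stdin.
# $1 is the audio length in seconds (50176 samples / 16000 Hz = 3.136 s here).
timing_summary() {
    awk -v audio_s="$1" '
        /whisper_print_timings:/ {
            # Split "whisper_print_timings:   encode time =  9901.51 ms / ..." at "="
            split($0, kv, "=")
            name = kv[1]
            sub(/.*whisper_print_timings:[ \t]*/, "", name)
            sub(/[ \t]+$/, "", name)
            ms[name] = kv[2] + 0    # leading number of the value, in milliseconds
        }
        END {
            total = ms["total time"]
            if (total <= 0) exit 1
            printf "encode share: %.1f%%\n", 100 * ms["encode time"] / total
            if (audio_s > 0)
                printf "real-time factor: %.2f\n", (total / 1000) / audio_s
        }
    '
}

# Example (not executed here; 2>&1 is added in case the timings go to stderr):
#   ./main -m models/ggml-tiny.bin -l zh -t 4 -f samples/m.wav 2>&1 | timing_summary 3.136
```

For the run above this gives an encode share of 88.0% and a real-time factor of 3.59, so the encoder is the part worth optimizing first.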
