Python Notes: Speech-to-Text with SpeechRecognition

Brief Introduction

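SpeechRecognition is a Python library for speech-to-text (STT). It makes it straightforward to convert speech into text and so give a program voice input. It supports multiple speech recognition engines and APIs, both online and offline, and can be used in any application that needs speech recognition. The supported engines and APIs are: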

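CMU Sphinx (works offline): a family of speech recognition systems developed at Carnegie Mellon University (CMU); PocketSphinx is its lightweight recognition engine written in C
Google Speech Recognition
Google Cloud Speech API
Wit.ai: Facebook's natural language understanding (NLU) platform, which provides easy-to-use APIs and tools that help developers quickly build chatbots able to understand and respond to natural language
Microsoft Azure Speech
Microsoft Bing Voice Recognition (Deprecated)
Houndify API: a voice AI platform provided by SoundHound, a voice AI company
IBM Speech to Text
Snowboy Hotword Detection (works offline): a hotword detection engine developed by Kitt.ai, which can be used to implement voice wake-up
TensorFlow: an open-source machine learning library used for a range of perception and language understanding tasks
Vosk API (works offline): an open-source offline speech recognition toolkit
OpenAI Whisper (works offline): a general-purpose automatic speech recognition (ASR) model developed by OpenAI
Whisper API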

Library for performing speech recognition, with support for several engines and APIs, online and offline.
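
Which of these engines an installed copy of the library exposes depends on its version. As a rough sketch, assuming each engine is surfaced as a `recognize_*` method on the `Recognizer` class (as `recognize_sphinx` is in the example later in this article), the available method names can be listed by introspection. The helper name `list_recognizer_methods` is made up for illustration.

```python
import importlib


def list_recognizer_methods(module_name="speech_recognition"):
    """Return the sorted recognize_* method names of <module_name>.Recognizer.

    Returns an empty list when the module is not installed or has no
    Recognizer class.
    """
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return []
    recognizer_cls = getattr(module, "Recognizer", None)
    if recognizer_cls is None:
        return []
    return sorted(name for name in dir(recognizer_cls) if name.startswith("recognize_"))


if __name__ == "__main__":
    # Prints one method name per line; prints nothing if the package is missing.
    for name in list_recognizer_methods():
        print(name)
```

The output only shows which methods exist. Most engines still need their own extra package, model data, or API key before the call succeeds.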

Installation

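$ pip install SpeechRecognition
$ pip install pocketsphinx
$ pip install pyaudio
$ pip install pyttsx3
$ pip install pydub
# pydub needs the FFmpeg executables (ffmpeg, ffplay, ffprobe) on PATH. Install them
# with your OS package manager or from an FFmpeg build; the PyPI package named
# "ffmpeg" does not provide these executables.
# Test speech recognition (requires a microphone); the Google Speech Recognition online engine is used by default
$ python -m speech_recognition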
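
Before running the example, it can help to confirm that the installs took effect. The following is a minimal sketch using only the standard library. It assumes the import names `speech_recognition`, `pocketsphinx`, `pyaudio`, `pyttsx3` and `pydub`, and that the FFmpeg tools are expected on `PATH`. The function name `check_environment` is hypothetical.

```python
import importlib.util
import shutil

# Import names of the packages installed above.
PYTHON_MODULES = ["speech_recognition", "pocketsphinx", "pyaudio", "pyttsx3", "pydub"]
# Executables that pydub looks for when decoding and playing audio.
FFMPEG_TOOLS = ["ffmpeg", "ffplay", "ffprobe"]


def check_environment(modules=PYTHON_MODULES, tools=FFMPEG_TOOLS):
    """Map each module or tool name to True if it can be found, else False."""
    report = {}
    for name in modules:
        # find_spec locates a top-level module without importing it.
        report[name] = importlib.util.find_spec(name) is not None
    for name in tools:
        # shutil.which searches PATH for an executable.
        report[name] = shutil.which(name) is not None
    return report


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'MISSING'}")
```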

Example

#!/usr/bin/env python3

import speech_recognition as sr
import pyttsx3
from pydub import AudioSegment
from pydub.playback import play

def speechRecognitionByFile(fileName, fileFormat, languageType):
    # Play the audio file. On Windows, ffmpeg.exe, ffplay.exe and ffprobe.exe must be copied into the current directory before FLAC files can be played!
    sound = AudioSegment.from_file(fileName, format=fileFormat)
    play(sound)
    # Create the recognizer object
    recognizer = sr.Recognizer()
    # Open the audio file (WAV/AIFF/FLAC)
    with sr.AudioFile(fileName) as source:
        audio = recognizer.record(source)
    # Recognize the speech with the CMU Sphinx engine
    try:
        text = recognizer.recognize_sphinx(audio, language=languageType)
        print(text)
        pyttsx3.speak(text)
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error: {0}".format(e))

def speechRecognitionByMicrophone(languageType):
    # Create the recognizer object
    recognizer = sr.Recognizer()
    # Open the microphone to capture speech
    with sr.Microphone() as source:
        print("Say something!")
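        # Calibrate the energy threshold against ambient noise
        recognizer.adjust_for_ambient_noise(source)
        # Record a phrase from the microphone
        audio = recognizer.listen(source)
    # Recognize the speech with the CMU Sphinx engine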
    try:
        text = recognizer.recognize_sphinx(audio, language=languageType)
        print(text)
        pyttsx3.speak(text)
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error: {0}".format(e))

# The audio files come from https://github.com/Uberi/speech_recognition/tree/master/examples
speechRecognitionByFile("english.wav", "wav", "en-US")
speechRecognitionByFile("chinese.flac", "flac", "zh-CN")
speechRecognitionByMicrophone("zh-CN")
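
The example passes the file name and its format separately, and `sr.AudioFile` is described above as reading WAV/AIFF/FLAC. As a small sketch, the format string could instead be derived from the file extension. The helper name `audio_format_for` and the extension table are assumptions made for illustration, not part of either library.

```python
from pathlib import Path

# File extensions mapped to the format names passed to pydub. The set mirrors
# the WAV/AIFF/FLAC formats mentioned for sr.AudioFile in the example.
SUPPORTED_FORMATS = {".wav": "wav", ".aiff": "aiff", ".aif": "aiff", ".flac": "flac"}


def audio_format_for(file_name):
    """Return the format name for file_name, or raise ValueError if unsupported."""
    suffix = Path(file_name).suffix.lower()
    try:
        return SUPPORTED_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unsupported audio file type: {suffix or '(none)'}") from None
```

With this helper, a call such as `speechRecognitionByFile(name, audio_format_for(name), "en-US")` would fail early on an unsupported file instead of part-way through playback or recognition.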

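By default, the PocketSphinx data that ships with SpeechRecognition supports English only. To recognize Chinese, the Mandarin model has to be downloaded: open sourceforge.net/projects/cm… in a browser, click Mandarin, and download cmusphinx-zh-cn-5.2.tar.gz.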

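Under the Python installation directory, find Lib\site-packages\speech_recognition\pocketsphinx-data. Inside pocketsphinx-data, create a zh-CN directory and extract cmusphinx-zh-cn-5.2.tar.gz there, so that the model files sit directly inside zh-CN. Then rename the zh_cn.cd_cont_5000 directory to acoustic-model, zh_cn.lm.bin to language-model.lm.bin, and zh_cn.dic to pronounciation-dictionary.dict (the library expects this exact spelling).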
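
The three renames can also be scripted. The sketch below assumes the archive contents have already been extracted directly into the zh-CN directory. The function name `rename_model_files` is hypothetical, while the file names are the ones listed above.

```python
from pathlib import Path

# Name in the extracted archive -> name expected inside the zh-CN directory.
RENAMES = {
    "zh_cn.cd_cont_5000": "acoustic-model",
    "zh_cn.lm.bin": "language-model.lm.bin",
    "zh_cn.dic": "pronounciation-dictionary.dict",
}


def rename_model_files(language_dir):
    """Rename the extracted Mandarin model files; return the new names created."""
    language_dir = Path(language_dir)
    renamed = []
    for old_name, new_name in RENAMES.items():
        source = language_dir / old_name
        target = language_dir / new_name
        # Skip entries that are missing or were already renamed.
        if source.exists() and not target.exists():
            source.rename(target)
            renamed.append(new_name)
    return renamed
```

Running it a second time returns an empty list, so it is safe to call again after a partial setup.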

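To find where speech_recognition is installed, and therefore where the pocketsphinx-data directory lives, run:

python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"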
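
Once the files are in place, the layout can be checked from Python. This is a sketch under the assumption that a language is usable when its folder under pocketsphinx-data contains the three entries named above. Both function names are made up for illustration.

```python
import importlib.util
from pathlib import Path

# Entries expected inside pocketsphinx-data/<language>/.
REQUIRED_ENTRIES = ["acoustic-model", "language-model.lm.bin", "pronounciation-dictionary.dict"]


def find_pocketsphinx_data():
    """Return the pocketsphinx-data path of the installed package, or None."""
    spec = importlib.util.find_spec("speech_recognition")
    if spec is None or spec.origin is None:
        return None
    # spec.origin is the path of the package's __init__.py.
    return Path(spec.origin).parent / "pocketsphinx-data"


def missing_language_entries(data_dir, language="zh-CN"):
    """List the required entries that are absent for the given language."""
    language_dir = Path(data_dir) / language
    return [name for name in REQUIRED_ENTRIES if not (language_dir / name).exists()]


if __name__ == "__main__":
    data_dir = find_pocketsphinx_data()
    if data_dir is None:
        print("speech_recognition is not installed")
    else:
        print("missing:", missing_language_entries(data_dir) or "nothing")
```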
