Python Notes: Speech-to-Text with SpeechRecognition


Brief Introduction

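SpeechRecognition is a Python library for speech-to-text (STT) conversion. It makes it easy to transcribe speech into text, which lets a program accept voice input. The library supports multiple speech recognition engines and APIs, both online and offline, so it can be used in most applications that need speech recognition. The supported engines and APIs are: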

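  • CMU Sphinx (works offline): a family of speech recognition systems developed at Carnegie Mellon University (CMU); PocketSphinx is its lightweight recognition engine written in C
  • Google Speech Recognition
  • Google Cloud Speech API
  • Wit.ai: a natural language understanding (NLU) platform acquired by Facebook; it provides easy-to-use APIs and tools that help developers build chatbots that understand and respond to natural language
  • Microsoft Azure Speech
  • Microsoft Bing Voice Recognition (Deprecated)
  • Houndify API: a voice AI platform from SoundHound, a company specializing in voice AI
  • IBM Speech to Text
  • Snowboy Hotword Detection (works offline): a hotword detection engine developed by Kitt.ai, which can be used to implement voice wake-up
  • TensorFlow: an open-source machine learning library for perception and language understanding tasks
  • Vosk API (works offline): an offline open-source speech recognition toolkit
  • OpenAI Whisper (works offline): a general-purpose automatic speech recognition (ASR) model developed by OpenAI
  • Whisper API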

Library for performing speech recognition, with support for several engines and APIs, online and offline.
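
Each engine in the list above is exposed as a `recognize_*` method on `sr.Recognizer`, so switching engines mostly means calling a different method. The following is a minimal sketch, assuming an installed SpeechRecognition release that ships these methods. The `ENGINE_METHODS` table and the `method_name_for` and `transcribe` helpers are hypothetical names introduced for illustration, and only a subset of the engines is listed:

```python
# Hypothetical lookup table: engine label -> Recognizer method name.
# Method availability depends on the installed SpeechRecognition version
# and on the optional dependencies of each engine.
ENGINE_METHODS = {
    "sphinx": "recognize_sphinx",    # CMU Sphinx, offline
    "google": "recognize_google",    # Google Speech Recognition, online
    "wit": "recognize_wit",          # Wit.ai, online, needs an API key
    "vosk": "recognize_vosk",        # Vosk, offline
    "whisper": "recognize_whisper",  # OpenAI Whisper, offline
}


def method_name_for(engine):
    """Return the Recognizer method name for an engine label."""
    key = engine.strip().lower()
    if key not in ENGINE_METHODS:
        raise ValueError(f"unsupported engine: {engine!r}")
    return ENGINE_METHODS[key]


def transcribe(audio, engine="sphinx", **kwargs):
    """Transcribe an sr.AudioData object with the chosen engine."""
    # Imported lazily so the lookup table can be used without the library.
    import speech_recognition as sr

    recognizer = sr.Recognizer()
    method = getattr(recognizer, method_name_for(engine))
    return method(audio, **kwargs)
```

For example, `transcribe(audio, "sphinx", language="en-US")` would run offline, while `transcribe(audio, "google", language="zh-CN")` would send the audio to the online service. Extra keyword arguments are passed through unchanged, so engine-specific parameters such as an API key stay the caller's responsibility.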

Environment Setup

$ pip install SpeechRecognition
$ pip install PocketSphinx
$ pip install pyaudio
$ pip install pyttsx3
$ pip install pydub
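# pydub relies on the FFmpeg executables (ffmpeg, ffprobe, ffplay) being on PATH;
# "pip install ffmpeg" does not provide them, so install FFmpeg from ffmpeg.org or a system package manager
# Test speech recognition; the demo uses the Google Speech Recognition online engine by default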
$ python -m speech_recognition
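
Before running the examples, it can help to confirm that the packages above are importable and that the FFmpeg executables used by pydub are on PATH. A small sketch using only the standard library; `check_environment` is a hypothetical helper, and the module names assume the usual import names of the packages installed above:

```python
import importlib.util
import shutil


def check_environment(
    modules=("speech_recognition", "pocketsphinx", "pyaudio", "pyttsx3", "pydub"),
    tools=("ffmpeg", "ffprobe", "ffplay"),
):
    """Return a dict mapping each module or tool name to True if it was found."""
    report = {}
    for name in modules:
        # find_spec returns None when a top-level module cannot be located.
        report[name] = importlib.util.find_spec(name) is not None
    for tool in tools:
        # shutil.which searches PATH, similar to how a shell resolves commands.
        report[tool] = shutil.which(tool) is not None
    return report


for name, found in check_environment().items():
    print(f"{name}: {'ok' if found else 'MISSING'}")
```

Anything reported as MISSING points at the install step to revisit.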

Usage Example

#!/usr/bin/env python3

import speech_recognition as sr
import pyttsx3
from pydub import AudioSegment
from pydub.playback import play

def speechRecognitionByFile(fileName, fileFormat, languageType):
    # Play the audio file. On Windows, ffmpeg.exe, ffplay.exe and ffprobe.exe must be copied to the current directory to play FLAC files!
    sound = AudioSegment.from_file(fileName, format=fileFormat)
    play(sound)
    # Create the recognizer object
    recognizer = sr.Recognizer()
    # Open the audio file (WAV/AIFF/FLAC)
    with sr.AudioFile(fileName) as source:
        audio = recognizer.record(source)
    # Recognize the speech with the CMU Sphinx engine
    try:
        text = recognizer.recognize_sphinx(audio, language=languageType)
        print(text)
        pyttsx3.speak(text)
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error: {0}".format(e))

def speechRecognitionByMicrophone(languageType):
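    # Create the recognizer object
    recognizer = sr.Recognizer()
    # Open the microphone to capture speech
    with sr.Microphone() as source:
        print("Say something!")
        # Calibrate the energy threshold against ambient noise
        recognizer.adjust_for_ambient_noise(source)
        # Record until a pause is detected
        audio = recognizer.listen(source)
    # Recognize the speech with the CMU Sphinx engine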
    try:
        text = recognizer.recognize_sphinx(audio, language=languageType)
        print(text)
        pyttsx3.speak(text)
    except sr.UnknownValueError:
        print("Sphinx could not understand audio")
    except sr.RequestError as e:
        print("Sphinx error: {0}".format(e))

# Sample audio files come from https://github.com/Uberi/speech_recognition/tree/master/examples
speechRecognitionByFile("english.wav", "wav", "en-US")
speechRecognitionByFile("chinese.flac", "flac", "zh-CN")
speechRecognitionByMicrophone("zh-CN")
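
As the comment in the file-based function notes, `sr.AudioFile` reads WAV, AIFF and FLAC. For other formats such as MP3, one option is to convert the file with pydub first. This is a sketch rather than part of the original example: `needs_conversion` and `to_wav` are hypothetical helpers, and the conversion step assumes pydub and the FFmpeg executables are installed.

```python
from pathlib import Path

# File extensions that sr.AudioFile can open directly.
SUPPORTED_SUFFIXES = {".wav", ".aiff", ".aif", ".flac"}


def needs_conversion(file_name):
    """Return True if the file extension is not one AudioFile reads directly."""
    return Path(file_name).suffix.lower() not in SUPPORTED_SUFFIXES


def to_wav(file_name):
    """Return a path that sr.AudioFile can open, converting to WAV if needed."""
    if not needs_conversion(file_name):
        return file_name
    # Imported lazily: pydub (and FFmpeg) are only needed for conversion.
    from pydub import AudioSegment

    target = str(Path(file_name).with_suffix(".wav"))
    AudioSegment.from_file(file_name).export(target, format="wav")
    return target
```

With this helper, a hypothetical call such as `speechRecognitionByFile(to_wav("speech.mp3"), "wav", "en-US")` would recognize an MP3 recording through the same function.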

By default, the PocketSphinx data bundled with SpeechRecognition supports English only. Recognizing Chinese requires the Mandarin language pack: open sourceforge.net/projects/cm… in a browser, click Mandarin, and download cmusphinx-zh-cn-5.2.tar.gz.


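In the Python installation directory, locate Lib\site-packages\speech_recognition\pocketsphinx-data. Inside pocketsphinx-data, create a zh-CN directory and extract cmusphinx-zh-cn-5.2.tar.gz into it. Then rename the zh_cn.cd_cont_5000 directory to acoustic-model, rename zh_cn.lm.bin to language-model.lm.bin, and rename zh_cn.dic to pronounciation-dictionary.dict (the spelling "pronounciation" is the file name the library looks for).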


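If the package location is unclear, the following command prints the speech_recognition installation directory:

python -c "import speech_recognition as sr, os.path as p; print(p.dirname(sr.__file__))"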
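
The renaming steps for the Mandarin language pack can also be scripted. The sketch below assumes the archive contents have already been extracted directly into the zh-CN directory under the file names mentioned in the text; `install_language_pack` is a hypothetical helper, not part of SpeechRecognition.

```python
from pathlib import Path

# Extracted name -> name expected inside pocketsphinx-data/zh-CN.
RENAMES = {
    "zh_cn.cd_cont_5000": "acoustic-model",
    "zh_cn.lm.bin": "language-model.lm.bin",
    "zh_cn.dic": "pronounciation-dictionary.dict",
}


def install_language_pack(language_dir):
    """Rename extracted files in place and return the names still missing."""
    language_dir = Path(language_dir)
    for old_name, new_name in RENAMES.items():
        source = language_dir / old_name
        target = language_dir / new_name
        # Skip entries that are absent or were already renamed.
        if source.exists() and not target.exists():
            source.rename(target)
    return [name for name in RENAMES.values() if not (language_dir / name).exists()]
```

A call could pass the directory printed by the command above joined with `pocketsphinx-data/zh-CN`. An empty return value means all three expected entries are in place, and running the function a second time changes nothing.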
