[HarmonyOS] Speech Recognition: Converting Speech to Text


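Speech recognition

Combining speech-to-text with NLU (natural language understanding for human-machine dialogue) lets users navigate between pages, make transfers, manage investments, and so on by voice, completing these flows without tapping the screen. Speech-to-text uses the speechRecognizer module. NLU relies on a third-party interface and is not covered here.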

speechRecognizer

Recognizing speech spoken directly to the device requires the microphone permission (ohos.permission.MICROPHONE).

1. Initialize the engine


import { speechRecognizer } from '@kit.CoreSpeechKit'

const asrEngine = await speechRecognizer.createEngine(
    {
      language: 'zh-CN',
      online: 1,
      extraParams: { locate: "CN", recognizerMode: "short" }
    }
)

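  • language: the language to recognize; currently only Chinese is supported
  • online: only one value is available, 1, which means offline mode; the offline model is already on the device, so speech can be recognized without a network connection
  • extraParams:
    • locate: the region; currently only CN is supported
    • recognizerMode: how long a recognition session can last. short: up to 60 seconds. long: up to 8 hours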
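Since recognizerMode is the only field above that actually varies, the parameters can be pinned down with a small typed helper. The sketch below is only an illustration: EngineMode, EngineParamsSketch and buildEngineParams are hypothetical names, and the interface is a local stand-in for the fields shown in this article, not the SDK's own type.

```typescript
// Local stand-in for the fields passed to createEngine in this article.
// The field names follow the snippet above; the types are NOT the SDK types.
type EngineMode = 'short' | 'long';

interface EngineParamsSketch {
  language: 'zh-CN';   // only Chinese is supported
  online: 1;           // 1 = offline mode, the only available value
  extraParams: { locate: 'CN'; recognizerMode: EngineMode };
}

// Hypothetical helper: only recognizerMode varies, everything else is fixed.
function buildEngineParams(recognizerMode: EngineMode = 'short'): EngineParamsSketch {
  return {
    language: 'zh-CN',
    online: 1,
    extraParams: { locate: 'CN', recognizerMode }
  };
}
```

With literal types like these, a typo such as 'zh_CN' or an unsupported locate value is rejected at compile time rather than at engine creation.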

2. Set the recognition callbacks


let listener: speechRecognizer.RecognitionListener = {
  // Called when recognition starts successfully
  onStart(sessionId: string, eventMessage: string) {
  },
  // Called on engine events
  onEvent(sessionId: string, eventCode: number, eventMessage: string) {
  },
  // Called with recognition results, both intermediate and final
  onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
  },
  // Called when recognition completes
  onComplete(sessionId: string, eventMessage: string) {
  },
  // Called on errors; the error code is returned through this method
  onError(sessionId: string, errorCode: number, errorMessage: string) {
  },
}

asrEngine.setListener(listener)

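Once listening starts, the callbacks fire in order, and the recognized text is delivered in the onResult callback. The text in onResult arrives incrementally: for example, if the speech is 夕阳西下, onResult fires 4 times with the results 夕, 夕阳, 夕阳西, and 夕阳西下. Use result.isFinal to tell whether a result is a complete sentence.

3. Start listening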
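One way to handle these incremental results is to keep the latest partial text for display and store a sentence only once it is final. The sketch below runs without a device: RecognitionResultLike is a local stand-in for the two fields of the result object used here, and TranscriptCollector is a hypothetical helper, not part of the speechRecognizer module.

```typescript
// Local stand-in for the fields of the recognition result used in this sketch.
interface RecognitionResultLike {
  result: string;    // recognized text so far
  isFinal: boolean;  // true once the sentence is complete
}

// Hypothetical helper: tracks the in-progress text and the finished sentences.
class TranscriptCollector {
  private partial = '';
  private sentences: string[] = [];

  // Intended to be called from onResult with every result.
  push(r: RecognitionResultLike): void {
    if (r.isFinal) {
      this.sentences.push(r.result);
      this.partial = '';
    } else {
      // Each intermediate result replaces the previous one.
      this.partial = r.result;
    }
  }

  // Text for live display: finished sentences plus the current partial.
  displayText(): string {
    return this.sentences.join('') + this.partial;
  }

  // Copies of the completed sentences only.
  finalSentences(): string[] {
    return [...this.sentences];
  }
}
```

Feeding it 夕, 夕阳, 夕阳西 as intermediate results and 夕阳西下 as the final one leaves a single finished sentence, while displayText() can drive a live caption in the meantime.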


let recognizerParams: speechRecognizer.StartParams = {
  sessionId: '666888',
  audioInfo: {
    audioType: 'pcm',
    sampleRate: 16000,
    soundChannel: 1,
    sampleBit: 16
  },
  extraParams: {
    recognitionMode: 0,
    maxAudioDuration: 60000
  }
}

asrEngine.startListening(recognizerParams);

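  • sessionId: the session ID, used to identify the session when finishing or cancelling recognition
  • audioInfo: these are the only values currently supported and cannot be changed; in order, they are the audio type, sample rate, number of channels (1), and bit depth
  • extraParams:
    • recognitionMode: the recognition type. 0: with the microphone permission granted, speak directly to the device. 1: pass the audio stream to be recognized through the writeAudio method.
    • maxAudioDuration: the maximum recognition duration, in milliseconds (60000 in the code above is 60 s). When recognizerMode is short, it can be set from 20 s to 60 s; when long, from 20 s to 8 h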
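Because the allowed range of maxAudioDuration depends on recognizerMode, it can help to clamp the value before building the start parameters. This is a minimal sketch based only on the ranges stated above; RecognizerMode and clampMaxAudioDuration are hypothetical names, and whether the engine itself rejects or adjusts out-of-range values is not covered here.

```typescript
type RecognizerMode = 'short' | 'long';

const MIN_DURATION_MS = 20 * 1000;                 // 20 s, lower bound for both modes
const MAX_SHORT_DURATION_MS = 60 * 1000;           // 60 s
const MAX_LONG_DURATION_MS = 8 * 60 * 60 * 1000;   // 8 h

// Hypothetical helper: forces a requested duration into the range for the mode.
function clampMaxAudioDuration(mode: RecognizerMode, requestedMs: number): number {
  const upperBound = mode === 'short' ? MAX_SHORT_DURATION_MS : MAX_LONG_DURATION_MS;
  return Math.min(Math.max(requestedMs, MIN_DURATION_MS), upperBound);
}
```

For example, asking for 90 s in short mode yields 60000, while the same request in long mode is returned unchanged.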

4. Complete code

import { speechRecognizer } from '@kit.CoreSpeechKit'

class SpeechRecognizerManager {
  private extraParam: Record<string, Object> = {
    "locate": "CN", "recognizerMode": "short"
  };
  private initParamsInfo: speechRecognizer.CreateEngineParams = {
    language: 'zh-CN',
    online: 1,
    extraParams: this.extraParam
  };
  private asrEngine: speechRecognizer.SpeechRecognitionEngine | null = null
  private sessionId: string = "asr" + Date.now()
  private static instance: SpeechRecognizerManager

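  static getInstance() {
    if (!SpeechRecognizerManager.instance) {
      // Cache the instance so that every call returns the same object
      SpeechRecognizerManager.instance = new SpeechRecognizerManager()
    }
    return SpeechRecognizerManager.instance
  }

  // Private constructor: instances cannot be created with new from outside
  private constructor() {
  }

  // Initialize the engine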
  private async createEngine() {
    if (!this.asrEngine) {
      this.asrEngine = await speechRecognizer.createEngine(this.initParamsInfo)
    }
  }


  // Register the recognition callbacks
  private setListener(callback: (srr: speechRecognizer.SpeechRecognitionResult) => void = () => {
  }) {
    // Create the listener object
    let listener: speechRecognizer.RecognitionListener = {
      // Called when recognition starts successfully
      onStart(sessionId: string, eventMessage: string) {
      },
      // Called on engine events
      onEvent(sessionId: string, eventCode: number, eventMessage: string) {
      },
      // Called with recognition results, both intermediate and final
      onResult(sessionId: string, result: speechRecognizer.SpeechRecognitionResult) {
        // Forward only complete sentences
        if (result.isFinal) {
          callback(result)
        }
      },
      // Called when recognition completes
      onComplete(sessionId: string, eventMessage: string) {
      },
      // Called on errors; the error code is returned through this method
      onError(sessionId: string, errorCode: number, errorMessage: string) {
      },
    }
    // Register the listener
    this.asrEngine?.setListener(listener);
  }

  // Start listening
  private startListening() {
    let recognizerParams: speechRecognizer.StartParams = {
      sessionId: this.sessionId,
      audioInfo: {
        audioType: 'pcm',
        sampleRate: 16000,
        soundChannel: 1,
        sampleBit: 16
      },
      extraParams: {
        recognitionMode: 0,
        maxAudioDuration: 60000
      }
    }
    this.asrEngine?.startListening(recognizerParams);
  };

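  // Cancel recognition
  cancel() {
    this.asrEngine?.cancel(this.sessionId)
  }

  // Release the engine
  shutDown() {
    this.asrEngine?.shutdown()
    // Clear the reference so that a later speak() creates a fresh engine
    this.asrEngine = null
  }

  // Whether the engine is currently recognizing. true: busy, false: idle
  isBusy() {
    return this.asrEngine?.isBusy() ?? false
  }

  // Start recognition; callback receives each final result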
  speak(callback: (srr: speechRecognizer.SpeechRecognitionResult) => void) {
    this.createEngine().then(() => {
      this.setListener(callback)
      this.startListening()
    })
  }
}
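The manager exposes isBusy() and cancel() but does not show them working together. One possible pattern, assuming a new session should not be started while another is still running, is to cancel the running session first. The sketch below uses a hypothetical BusyAwareEngine interface that mirrors only the three calls involved, plus an in-memory FakeEngine so it runs without a device; it does not claim anything about how the real engine behaves when busy.

```typescript
// Hypothetical minimal interface, modeled on the isBusy/cancel/startListening
// calls that SpeechRecognizerManager makes on the engine.
interface BusyAwareEngine {
  isBusy(): boolean;
  cancel(sessionId: string): void;
  startListening(sessionId: string): void;
}

// Cancels the previous session if the engine is still busy, then starts a new one.
// Returns the id of the session that was started.
function restartListening(engine: BusyAwareEngine, previousSessionId: string | null): string {
  if (previousSessionId !== null && engine.isBusy()) {
    engine.cancel(previousSessionId);
  }
  const sessionId = 'asr' + Date.now();
  engine.startListening(sessionId);
  return sessionId;
}

// In-memory fake, used only to exercise the sketch.
class FakeEngine implements BusyAwareEngine {
  active: string | null = null;
  log: string[] = [];

  isBusy(): boolean {
    return this.active !== null;
  }

  cancel(sessionId: string): void {
    this.log.push('cancel:' + sessionId);
    this.active = null;
  }

  startListening(sessionId: string): void {
    this.log.push('start:' + sessionId);
    this.active = sessionId;
  }
}
```

In the manager itself, the same check could sit at the top of speak(), before startListening() is called.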