Using Azure Speech Services in React Native

At work I needed to use Microsoft's Azure Speech service from React Native for text to speech and real-time speech to text. These are my notes.

Azure Speech service overview

What is the Speech service? - Azure AI services | Microsoft Learn

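1. Prerequisites

To use the Azure Speech service you need the following:

- An Azure subscription (you can create one for free).
- A Speech resource created in the Azure portal.
- The key and region of your Speech resource. After the resource is deployed, select "Go to resource" to view and manage the keys. For more information about Azure AI services resources, see "Get the keys for your resource".

Once these steps are done you have two values, the key and the resource region. Both are used in the code below.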
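
As a small illustration of carrying these two values around, the sketch below keeps them in one object and checks them before any SDK call is made. `SpeechCredentials` and `validateCredentials` are hypothetical names, not part of the Speech SDK, and the region check rests on the assumption that region identifiers are lowercase alphanumerics such as `eastasia`.

```typescript
// Hypothetical container for the two values obtained from the Azure portal.
interface SpeechCredentials {
  key: string;
  region: string;
}

// Returns a list of problems; an empty list means the values look usable.
function validateCredentials(c: SpeechCredentials): string[] {
  const problems: string[] = [];
  if (c.key.trim().length === 0) {
    problems.push("key is empty");
  }
  // Assumption: region identifiers are lowercase alphanumerics, e.g. "eastasia".
  if (!/^[a-z0-9]+$/.test(c.region)) {
    problems.push("region looks malformed: " + JSON.stringify(c.region));
  }
  return problems;
}
```

Failing early on an empty key or a mistyped region is usually easier to diagnose than a connection error surfacing later from the recognizer.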

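2. Integrating real-time speech to text (documentation link)

The official documentation provides examples in ten commonly used languages; we follow the JavaScript ones.

We need real-time speech to text, which means recognizing speech from the microphone. After going through the documentation, it turns out that the JavaScript SDK only supports microphone recognition in a browser JS environment (see "Recognize speech from a microphone"). That does not work in RN, so we need another approach.

Install dependencies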
```
npm install microsoft-cognitiveservices-speech-sdk react-native-get-random-values react-native-live-audio-stream
```
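The command above installs three dependencies:
- microsoft-cognitiveservices-speech-sdk: the Microsoft Speech SDK
- react-native-get-random-values: works around an error the Speech SDK throws in RN, see Issue #473
- react-native-live-audio-stream: handles the real-time audio stream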
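
The audio stream library hands each recorded chunk to JavaScript as a base64 string of raw PCM data. When debugging it can help to know how much audio a chunk holds. The sketch below is plain arithmetic with hypothetical helper names (`base64ByteLength`, `chunkDurationMs`), and it assumes uncompressed PCM in the format configured for the recorder (for example 16 kHz, 16-bit, mono).

```typescript
// PCM format of the recorded stream.
interface PcmFormat {
  sampleRate: number;
  channels: number;
  bitsPerSample: number;
}

// Number of raw bytes encoded by a padded base64 string.
function base64ByteLength(b64: string): number {
  let padding = 0;
  if (b64.slice(-2) === "==") padding = 2;
  else if (b64.slice(-1) === "=") padding = 1;
  return (b64.length / 4) * 3 - padding;
}

// Duration in milliseconds of a PCM chunk with the given byte length.
function chunkDurationMs(byteLength: number, format: PcmFormat): number {
  const bytesPerSecond =
    format.sampleRate * format.channels * (format.bitsPerSample / 8);
  return (byteLength * 1000) / bytesPerSecond;
}
```

At 16 kHz, 16-bit, mono the stream carries 32000 bytes per second, so a 3200-byte chunk corresponds to 100 ms of audio.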

Implementation

```ts
// Must be imported before the Speech SDK, otherwise the SDK throws an error
import "react-native-get-random-values";

import { Buffer } from "buffer";
import { PermissionsAndroid, Platform } from "react-native";
import {
  AudioConfig,
  AudioInputStream,
  PushAudioInputStream,
  SpeechTranslationConfig,
  TranslationRecognizer,
} from "microsoft-cognitiveservices-speech-sdk";
import AudioRecord from "react-native-live-audio-stream";

// AudioRecord parameters; adjust to your needs
const channels = 1;
const bitsPerSample = 16;
const sampleRate = 16000;
const audioSource = 6; // Android only: VOICE_RECOGNITION

// Azure Speech key, region, and languages; fill in your own
const your_key = '';
const your_region = '';
const your_language = 'zh-CN';
const your_target_language = 'zh';

// Speech conversion class
class Converter {
  // The Speech SDK recognizer that does the actual work
  private recognizer: TranslationRecognizer | null = null;
  // Audio is fed to the recognizer through this push stream
  private pushStream: PushAudioInputStream = AudioInputStream.createPushStream();

  constructor() {}

  /**
   * Initialize speech recognition
   */
  public async init() {
    // Initialize AudioRecord; wavFile is left empty because audio comes from the microphone
    AudioRecord.init({
      sampleRate,
      channels,
      bitsPerSample,
      audioSource,
      wavFile: "",
    });

    // Each recorded chunk arrives as base64; decode it and write it to pushStream
    AudioRecord.on("data", (data: string) => {
      this.pushStream?.write(Buffer.from(data, "base64"));
    });

    // Speech translation configuration
    const speechTranslationConfig = SpeechTranslationConfig.fromSubscription(
      your_key,
      your_region
    );
    speechTranslationConfig.speechRecognitionLanguage = your_language;
    speechTranslationConfig.addTargetLanguage(your_target_language);

    // Use pushStream as the audio source
    const audioConfig = AudioConfig.fromStreamInput(this.pushStream);

    // Create the TranslationRecognizer from the two configs
    this.recognizer = new TranslationRecognizer(speechTranslationConfig, audioConfig);
  }

  /**
   * Start speech recognition
   */
  public async start() {
    // Check the microphone permission first and stop if it was not granted
    const granted = await Converter.checkPermission();
    if (!granted) return;

    if (!this.recognizer) await this.init();
    const recognizer = this.recognizer!;

    recognizer.sessionStarted = (_s, e) => {
      console.log("session started: " + e.sessionId);
    };
    recognizer.sessionStopped = (_s, e) => {
      console.log("session stopped: " + e.sessionId);
    };
    // Fires repeatedly with partial results while the user is speaking
    recognizer.recognizing = (_s, e) => {
      console.log("recognizing", e.result.text);
    };
    // Fires once a complete sentence has been recognized
    recognizer.recognized = (_s, e) => {
      console.log("recognized", e.result.text);
    };
    // Start recognition
    recognizer.startContinuousRecognitionAsync(
      () => {
        console.log("startContinuousRecognitionAsync");
      },
      (err) => {
        console.log(err);
      }
    );

    // Start capturing from the microphone
    AudioRecord.start();
  }

  /**
   * Stop speech recognition
   */
  public async stop() {
    // Stop capturing from the microphone
    AudioRecord.stop();
    if (this.recognizer) {
      // Stop recognition
      this.recognizer.stopContinuousRecognitionAsync(
        () => {
          console.log("stopContinuousRecognitionAsync");
        },
        (err) => {
          console.log(err);
        }
      );
    }
  }

  /**
   * Check permissions; resolves to true when recording is allowed
   */
  public static async checkPermission(): Promise<boolean> {
    if (Platform.OS === "android") {
      try {
        // Audio is streamed in memory and never written to disk,
        // so only the microphone permission is requested
        const status = await PermissionsAndroid.request(
          PermissionsAndroid.PERMISSIONS.RECORD_AUDIO
        );

        if (status === PermissionsAndroid.RESULTS.GRANTED) {
          console.log("Permissions granted");
          return true;
        }
        console.log("Required permissions not granted");
        return false;
      } catch (err) {
        console.warn(err);
        return false;
      }
    }
    return true;
  }
}

export default Converter;
```

Usage

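```ts
import Converter from "./Converter";

const converter = new Converter();

// Start recognition (asks for the microphone permission on Android)
converter.start();

// Stop recognition when you are done
converter.stop();
```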

You could also pass a recognized callback into start so that callers can handle recognition results in their own way.
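
A minimal sketch of that idea follows. To keep it runnable on its own, it uses a hypothetical `RecognizerLike` interface as a stand-in for the two recognizer events used in this post; `RecognitionHandlers` and `attachHandlers` are likewise illustrative names and not part of the Speech SDK.

```typescript
// Hypothetical minimal shape of the recognizer events used by Converter;
// the real TranslationRecognizer from the Speech SDK exposes more members.
interface RecognitionEvent {
  result: { text: string };
}

interface RecognizerLike {
  recognizing?: (sender: unknown, e: RecognitionEvent) => void;
  recognized?: (sender: unknown, e: RecognitionEvent) => void;
}

// Callbacks a caller might pass to start(); both are optional.
interface RecognitionHandlers {
  onPartial?: (text: string) => void;
  onFinal?: (text: string) => void;
}

// Wires caller-supplied callbacks to the recognizer events.
function attachHandlers(
  recognizer: RecognizerLike,
  handlers: RecognitionHandlers
): void {
  recognizer.recognizing = (_sender, e) => {
    handlers.onPartial?.(e.result.text);
  };
  recognizer.recognized = (_sender, e) => {
    // Skip results that carry no text.
    if (e.result.text) {
      handlers.onFinal?.(e.result.text);
    }
  };
}
```

The same wiring could be done directly on the SDK recognizer inside start, replacing the hard-coded console.log handlers, so that a screen can show partial text while the user speaks and commit the final sentence afterwards.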

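3. Integrating text to speech (documentation link)

My project is built with expo, so I use expo-av to play the audio. expo-av cannot play a stream directly, so the synthesized audio is first saved to a local file with expo-file-system and then played. If you use a different library, you can follow the documentation and pick another approach.

Install dependencies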

```
npx expo install expo-av expo-file-system expo-crypto
```

Implementation

```ts
import "react-native-get-random-values";
import { Buffer } from "buffer";
import * as FileSystem from "expo-file-system";
import { Audio } from "expo-av";

import {
  SpeechConfig,
  SpeechSynthesisOutputFormat,
  SpeechSynthesizer,
} from "microsoft-cognitiveservices-speech-sdk";
import { randomUUID } from "expo-crypto";

// Azure Speech key, region, and synthesis language (e.g. "zh-CN"); fill in your own
const your_key = "";
const your_region = "";
const your_language = "";

export function synthesizeSpeech(text: string) {
  const speechConfig = SpeechConfig.fromSubscription(your_key, your_region);
  speechConfig.speechSynthesisLanguage = your_language;
  speechConfig.speechSynthesisOutputFormat = SpeechSynthesisOutputFormat.Audio16Khz64KBitRateMonoMp3;

  // Pass null as the audio config so the audio is returned in the result
  // @ts-ignore
  const speechSynthesizer = new SpeechSynthesizer(speechConfig, null);

  speechSynthesizer.speakTextAsync(
    text,
    async (e) => {
      speechSynthesizer.close();

      // The output format above is MP3, so the temporary file gets an .mp3 extension
      const tem_file_uri = FileSystem.documentDirectory + `${randomUUID()}.mp3`;

      // Convert the audio data to a base64 string
      const base64String = Buffer.from(e.audioData).toString("base64");

      await FileSystem.writeAsStringAsync(tem_file_uri, base64String, { encoding: "base64" });

      const { sound } = await Audio.Sound.createAsync({ uri: tem_file_uri });

      // Release the sound and delete the temporary file once playback finishes
      sound.setOnPlaybackStatusUpdate((status) => {
        if (status.isLoaded && status.didJustFinish) {
          sound.unloadAsync();
          FileSystem.deleteAsync(tem_file_uri, { idempotent: true });
        }
      });
      await sound.playAsync();
    },
    (err) => {
      console.log(err);
      speechSynthesizer.close();
    }
  );
}
```
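
The file is written as a base64 string, which is why the audio bytes are converted before saving. To make that step concrete, here is a dependency-free sketch of the conversion. `bytesToBase64` is a hypothetical helper written for illustration; it is not how the `buffer` package does it internally, though it should yield the same output for the same bytes.

```typescript
const BASE64_ALPHABET =
  "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/";

// Encodes raw bytes as a padded base64 string, three input bytes at a time.
function bytesToBase64(bytes: Uint8Array): string {
  let out = "";
  for (let i = 0; i < bytes.length; i += 3) {
    const hasSecond = i + 1 < bytes.length;
    const hasThird = i + 2 < bytes.length;
    const b0 = bytes[i];
    const b1 = hasSecond ? bytes[i + 1] : 0;
    const b2 = hasThird ? bytes[i + 2] : 0;

    // Three bytes (24 bits) become four 6-bit indexes into the alphabet.
    out += BASE64_ALPHABET[b0 >> 2];
    out += BASE64_ALPHABET[((b0 & 0x03) << 4) | (b1 >> 4)];
    // Missing input bytes are represented by "=" padding.
    out += hasSecond ? BASE64_ALPHABET[((b1 & 0x0f) << 2) | (b2 >> 6)] : "=";
    out += hasThird ? BASE64_ALPHABET[b2 & 0x3f] : "=";
  }
  return out;
}
```

With the synthesis result this could be called as `bytesToBase64(new Uint8Array(e.audioData))`. Building the string by concatenation is fine for short clips; for long audio the `buffer` package used above is likely the better choice.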
