The Web Speech API on the Front End


The Web Speech API is a browser-built-in API for speech recognition and speech synthesis. It lets developers add voice input and voice output to web pages, offering a more natural and intuitive way for users to interact. The Web Speech API has two main parts:

  1. Speech Recognition API: converts the user's spoken input into text.
  2. Speech Synthesis API: converts text into spoken output.
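Support for the two parts varies across browsers (Chrome, for example, exposes recognition under the `webkitSpeechRecognition` prefix), so both should be feature-detected before use. A minimal detection sketch; the helper name `detectSpeechSupport` is illustrative, and the global object is passed in as a parameter so the function can be exercised outside a browser:

```javascript
// Detect Web Speech API support on a given global object.
// In a real page you would call detectSpeechSupport(window).
function detectSpeechSupport(globalObj) {
  return {
    // Recognition is prefixed as webkitSpeechRecognition in Chromium browsers
    recognition: Boolean(globalObj.SpeechRecognition || globalObj.webkitSpeechRecognition),
    synthesis: Boolean(globalObj.speechSynthesis),
  };
}
```

Checking both flags separately matters: some browsers ship synthesis but not recognition.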

Use Cases

  1. Voice control and navigation: drive page navigation and actions with voice commands, for example controlling devices from a smart-home dashboard.
  2. Assistive technology: let users with visual or motor impairments operate web pages by voice.
  3. Voice input: dictate into forms and chat applications to speed up text entry.
  4. Voice feedback: speak feedback aloud in education and training applications to enrich the user experience.

1. Speech Recognition API

The Speech Recognition API converts the user's spoken input into text. A basic example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Speech Recognition Example</title>
</head>
<body>
  <h1>Speech Recognition Example</h1>
  <button id="start-recognition">Start Recognition</button>
  <p id="result"></p>

  <script>
    // Check whether the browser supports the SpeechRecognition API
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      alert('Your browser does not support Speech Recognition API');
    } else {
      const recognition = new SpeechRecognition();
      recognition.lang = 'en-US'; // recognition language
      recognition.interimResults = false; // whether to return interim results
      recognition.maxAlternatives = 1; // maximum number of alternative transcripts per result

      const startButton = document.getElementById('start-recognition');
      const resultParagraph = document.getElementById('result');

      startButton.addEventListener('click', () => {
        recognition.start();
      });

      recognition.addEventListener('result', (event) => {
        const transcript = event.results[0][0].transcript;
        resultParagraph.textContent = `You said: ${transcript}`;
      });

      recognition.addEventListener('speechend', () => {
        recognition.stop();
      });

      recognition.addEventListener('error', (event) => {
        resultParagraph.textContent = `Error occurred in recognition: ${event.error}`;
      });
    }
  </script>
</body>
</html>
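The example above sets `interimResults = false` and reads only `event.results[0][0]`. With `interimResults = true`, the `result` event fires repeatedly and `event.results` accumulates both interim and finalized results. A small helper that concatenates only the finalized transcripts; it is written against a plain array whose shape mirrors `SpeechRecognitionResultList` (each result is indexable and carries an `isFinal` flag), so it can be tested outside the browser:

```javascript
// Join the transcripts of all finalized results.
// `results` mirrors event.results: a list of SpeechRecognitionResult-like
// objects with an `isFinal` flag and alternatives at numeric indices.
function finalTranscript(results) {
  let text = '';
  for (const result of results) {
    if (result.isFinal) {
      text += result[0].transcript; // best (first) alternative
    }
  }
  return text;
}
```

Inside the `result` handler, the display line would then become `resultParagraph.textContent = finalTranscript(event.results);`.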

2. Speech Synthesis API

The Speech Synthesis API converts text into spoken output. A basic example:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Speech Synthesis Example</title>
</head>
<body>
  <h1>Speech Synthesis Example</h1>
  <textarea id="text-to-speak" rows="4" cols="50">Hello, how are you?</textarea>
  <button id="speak-button">Speak</button>

  <script>
    const speakButton = document.getElementById('speak-button');
    const textToSpeak = document.getElementById('text-to-speak');

    speakButton.addEventListener('click', () => {
      const utterance = new SpeechSynthesisUtterance(textToSpeak.value);
      utterance.lang = 'en-US'; // speech language
      utterance.pitch = 1; // pitch
      utterance.rate = 1; // speaking rate

      window.speechSynthesis.speak(utterance);
    });
  </script>
</body>
</html>
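By default the browser chooses a voice based on `utterance.lang`, but you can pick one explicitly from `speechSynthesis.getVoices()`. A sketch of a selection helper that prefers an exact language-tag match and falls back to a language-prefix match (e.g. any `en-*` voice for `en-AU`); the name `pickVoice` is illustrative:

```javascript
// Pick the voice whose lang best matches the requested BCP-47 tag,
// or null if no voice shares even the language prefix.
function pickVoice(voices, lang) {
  const exact = voices.find((v) => v.lang === lang);
  if (exact) return exact;
  const prefix = lang.split('-')[0];
  return voices.find((v) => v.lang.split('-')[0] === prefix) || null;
}
```

In the browser you would assign the result to `utterance.voice` before calling `speechSynthesis.speak(utterance)`. Note that `getVoices()` may return an empty array until the `voiceschanged` event has fired.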

Combined Example

Combining the Speech Recognition API and the Speech Synthesis API, you can build a simple voice assistant:

<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Speech Assistant Example</title>
</head>
<body>
  <h1>Speech Assistant Example</h1>
  <button id="start-assistant">Start Assistant</button>
  <p id="assistant-result"></p>

  <script>
    const SpeechRecognition = window.SpeechRecognition || window.webkitSpeechRecognition;
    if (!SpeechRecognition) {
      alert('Your browser does not support Speech Recognition API');
    } else {
      const recognition = new SpeechRecognition();
      recognition.lang = 'en-US';
      recognition.interimResults = false;
      recognition.maxAlternatives = 1;

      const startButton = document.getElementById('start-assistant');
      const resultParagraph = document.getElementById('assistant-result');

      startButton.addEventListener('click', () => {
        recognition.start();
      });

      recognition.addEventListener('result', (event) => {
        const transcript = event.results[0][0].transcript;
        resultParagraph.textContent = `You said: ${transcript}`;
        respondToSpeech(transcript);
      });

      recognition.addEventListener('speechend', () => {
        recognition.stop();
      });

      recognition.addEventListener('error', (event) => {
        resultParagraph.textContent = `Error occurred in recognition: ${event.error}`;
      });

      function respondToSpeech(transcript) {
        let response = '';

        if (transcript.toLowerCase().includes('hello')) {
          response = 'Hello! How can I help you today?';
        } else if (transcript.toLowerCase().includes('time')) {
          response = `The current time is ${new Date().toLocaleTimeString()}`;
        } else {
          response = 'Sorry, I did not understand that.';
        }

        const utterance = new SpeechSynthesisUtterance(response);
        utterance.lang = 'en-US';
        window.speechSynthesis.speak(utterance);
      }
    }
  </script>
</body>
</html>
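The if/else chain in `respondToSpeech` grows unwieldy as commands are added. One common refactor is a keyword-to-handler table that is scanned in order; the names `intents` and `matchIntent` below are illustrative, not part of any API:

```javascript
// Map keywords to response generators; the first matching entry wins.
const intents = [
  { keyword: 'hello', respond: () => 'Hello! How can I help you today?' },
  { keyword: 'time',  respond: () => `The current time is ${new Date().toLocaleTimeString()}` },
];

// Return the spoken response for a transcript, with a fallback.
function matchIntent(transcript) {
  const lower = transcript.toLowerCase();
  const intent = intents.find((i) => lower.includes(i.keyword));
  return intent ? intent.respond() : 'Sorry, I did not understand that.';
}
```

With this in place, `respondToSpeech` reduces to building a `SpeechSynthesisUtterance` from `matchIntent(transcript)` and speaking it, and adding a command means appending one entry to `intents`.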

Code Walkthrough

  1. Check browser support: first check whether the browser supports the SpeechRecognition and SpeechSynthesis APIs.
  2. Initialize the recognition and synthesis objects: create the SpeechRecognition and SpeechSynthesisUtterance objects.
  3. Listen for events: add a click listener to the button to start recognition, and add result, speechend, and error listeners on the recognition object to handle transcripts and errors.
  4. Respond to speech input: generate a response from the transcript and use the Speech Synthesis API to speak it aloud.

With the examples above, a web page can offer basic speech recognition and speech synthesis, letting users interact by voice for a more natural and intuitive experience.
