Building an Offline Speech Recognition Component in the Browser with Vue3 + Vosk-browser


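This article covers a purely front-end approach to offline speech recognition: a component built on Vue3 and vosk-browser that runs in the browser with no backend dependency.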

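Core Features

1. Offline speech recognition

The component runs on a local Vosk speech model. Once the model file has been fetched from your own site, recognition needs no network connection and no third-party API: all of it happens in the browser, which keeps audio data on the device and avoids network latency during recognition. It supports real-time Chinese speech-to-text, with accuracy adequate for everyday voice input.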

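2. Real-time feedback

  • Live partial results: while recording, partial recognition text is displayed as it arrives, paired with a typing animation, so the user can see recognition progress.
  • Final results kept separately: each time the recognizer finalizes an utterance, the text is appended to the saved result. Partial and final results are displayed in separate blocks, so in-progress text does not disturb what is already confirmed.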

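3. Recording controls

  • Start and stop: clear "Start Recording" and "Stop Recording" buttons whose disabled state follows the recording state, preventing repeated actions.
  • Microphone permission: permission is requested when recording starts; if it cannot be obtained, a specific error message is shown so the user knows to grant access.
  • Audio resource management: when recording stops or the component unmounts, the microphone media stream is stopped and the audio context is closed, so the microphone and audio resources are not left held.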

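4. Result management

  • Display and statistics: the final recognized text is shown together with its character count and a "Saved" badge.
  • Clearing results: a "Clear Records" button removes all recognized text in one click, ready for the next use.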

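5. Visual status indicators

  • Recording status: a pulsing indicator light plus a text label ("Recording..." / "Ready") shows the current state.
  • Volume waveform animation: a simulated waveform animates during recording. It is a CSS animation and does not reflect the actual input level.
  • Empty state: when there are no results, guidance text explains how to start and reminds the user about microphone permission.
  • Error alert: failures such as a model that cannot be loaded or a denied microphone permission are shown in a prominent alert stating the cause.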

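6. Responsive layout

The layout works on both desktop and mobile. On small screens it rearranges itself (buttons stack vertically, hint text is centered) so the experience stays consistent across devices.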

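Key Implementation Logic (Core Functionality)

1. Loading the offline model

During initialization the component loads the local Vosk Chinese model, which offline recognition depends on. A load failure is caught and reported: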

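import { ref } from 'vue';
import { createModel } from 'vosk-browser';

const error = ref(null); // error message shown in the UI
let model = null;        // set once the model has loaded

const initModel = async () => {
  try {
    // The archive is served from the project's public directory
    model = await createModel("/vosk-model/vosk-model-small-cn-0.22.zip");
  } catch (err) {
    error.value = `Failed to load model: ${err.message}`;
  }
};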


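2. Recording and recognition core

The microphone stream is captured through the Web Audio API and fed to the Vosk recognizer for real-time recognition, with partial and final results handled separately: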

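const startRecording = async () => {
  // Get the microphone media stream
  const mediaStream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true, sampleRate: 16000 }
  });
  // Create the recognizer and listen for results
  const recognizer = new model.KaldiRecognizer(16000);
  recognizer.on("result", (message) => {
    recognizedText.value += message.result?.text || ''; // final result
  });
  recognizer.on("partialresult", (message) => {
    tempText.value = message.result?.partial || ''; // partial result
  });
  // Process audio buffers and pass them to the recognizer.
  // ScriptProcessorNode is deprecated in favor of AudioWorklet but is still widely supported.
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
  recognizerNode.onaudioprocess = (event) => {
    const channelData = event.inputBuffer.getChannelData(0);
    recognizer.acceptWaveformFloat(channelData, 16000);
  };
  // Wire the microphone into the processing node; without this no audio reaches the recognizer
  const source = audioContext.createMediaStreamSource(mediaStream);
  source.connect(recognizerNode);
  recognizerNode.connect(audioContext.destination);
};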
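
The split between partial and final results can be pulled out of the component so it is easy to test on its own. The following is a minimal sketch, assuming the message shapes used above (`message.result.text` and `message.result.partial`); `createTranscript` is a hypothetical helper for illustration, not part of vosk-browser.

```javascript
// Hypothetical helper: keeps final and partial text in separate fields.
function createTranscript() {
  const state = { finalText: '', partialText: '' };
  return {
    state,
    // Handler for the recognizer's "result" event.
    onResult(message) {
      const text = message?.result?.text?.trim();
      if (!text) return; // ignore empty final results, e.g. after silence
      state.finalText += (state.finalText ? '\n' : '') + text;
      state.partialText = ''; // the partial text is superseded by the final one
    },
    // Handler for the recognizer's "partialresult" event.
    onPartial(message) {
      state.partialText = message?.result?.partial ?? '';
    },
  };
}
```

Because the handlers close over `state` rather than using `this`, they can be passed directly, for example `recognizer.on("result", transcript.onResult)`. In a Vue component the plain `state` object could be replaced with `reactive(...)`.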

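3. Automatic resource cleanup

When recording stops or the component unmounts, the microphone, the audio context, and related resources are released so they are not left held: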

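const stopRecording = () => {
  if (cleanupFunction) {
    cleanupFunction(); // stop the media stream, disconnect the audio nodes, close the audio context
    cleanupFunction = null; // make repeated calls a no-op
  }
};
onUnmounted(() => stopRecording());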
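
To make the steps behind `cleanupFunction` concrete, here is a minimal sketch of a cleanup builder that is safe to call more than once. `createCleanup` and its `onStopped` callback are hypothetical names introduced for illustration; the calls on the stream, nodes, and context are the standard Web Audio and MediaStream methods.

```javascript
// Hypothetical helper: builds a cleanup function that runs its steps only once.
function createCleanup({ mediaStream, source, recognizerNode, audioContext, onStopped }) {
  let done = false;
  return function cleanup() {
    if (done) return; // a second call does nothing
    done = true;
    // Stopping every track releases the microphone (and the browser's recording indicator).
    mediaStream.getTracks().forEach((track) => track.stop());
    source.disconnect();
    recognizerNode.disconnect();
    recognizerNode.onaudioprocess = null; // drop the callback so it cannot fire again
    if (audioContext.state !== 'closed') audioContext.close();
    if (onStopped) onStopped(); // e.g. set isListening.value = false
  };
}
```

With this shape, `stopRecording` and `onUnmounted` can both call the same function without needing to know whether it has already run.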

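Important Notes

1. Model deployment

  • The model file (e.g. vosk-model-small-cn-0.22.zip) must be placed under the project's public directory so it can be fetched by absolute path. The code in this article requests /vosk-model/vosk-model-small-cn-0.22.zip;
  • A "small" model is recommended; larger models take longer to load and hurt the first-use experience;
  • Model loading is asynchronous. Wait for it to finish before starting a recording, otherwise creating the recognizer fails.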
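
One way to enforce "load first, record second" is to keep a single loading promise and await it wherever the model is needed. The sketch below assumes only that `createModel(url)` returns a promise, as it is used in this article; `createModelLoader` is a hypothetical helper name.

```javascript
// Hypothetical helper: loads the model at most once and lets callers await readiness.
function createModelLoader(createModel, url) {
  let pending = null;
  return {
    load() {
      if (!pending) {
        pending = createModel(url).catch((err) => {
          pending = null; // allow a retry after a failed load
          throw err;
        });
      }
      return pending; // every caller shares the same promise
    },
  };
}
```

In the component this could be used as `const loader = createModelLoader(createModel, "/vosk-model/vosk-model-small-cn-0.22.zip")`, with `loader.load()` called in `onMounted` to start the download early and `const model = await loader.load()` at the top of `startRecording`.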

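2. Browser environment restrictions

  • Microphone access (getUserMedia) is only available in secure contexts: HTTPS, or localhost during development. Production deployments must be served over HTTPS;
  • The browser must support WebAssembly, the Web Audio API, and the MediaDevices API. A modern browser such as Chrome, Edge, or Firefox is recommended; IE is not supported;
  • Some mobile browsers may restrict recording in the background, so the page needs to stay in the foreground.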
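
These requirements can be checked before the component tries to record. A minimal sketch follows; `findMissingFeatures` is a hypothetical helper, and the environment object is passed in as a parameter (in a browser, pass `window`) so the function can also be exercised outside a browser.

```javascript
// Returns the names of missing capabilities; an empty array means the checks passed.
function findMissingFeatures(env) {
  const missing = [];
  if (typeof env.WebAssembly !== 'object') missing.push('WebAssembly');
  if (typeof (env.AudioContext || env.webkitAudioContext) !== 'function') {
    missing.push('Web Audio API');
  }
  if (!env.navigator?.mediaDevices?.getUserMedia) {
    missing.push('MediaDevices.getUserMedia');
  }
  // isSecureContext is false on plain HTTP pages (other than localhost).
  if (env.isSecureContext === false) missing.push('secure context (HTTPS or localhost)');
  return missing;
}
```

A component could call `findMissingFeatures(window)` in `onMounted` and show the joined list in the error alert instead of letting `startRecording` fail later with a less specific message.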

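3. Performance and UX

  • A volume gain can be applied during audio processing (gainMultiplier = 5.0 in the code) to help recognition of quiet input. Samples are clamped to [-1, 1], so loud input clips at this gain and the value may need tuning for your microphone;
  • Show a loading animation while the model loads, so users do not assume the component is unresponsive;
  • For long results, put the text in a scrollable container to keep the page from overflowing.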
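
The gain step is a small pure function, which makes the clipping behavior easy to see. The sketch below shows the fixed gain used in this article, plus a hypothetical alternative (`peakLimitedGain`) that lowers the gain when the buffer is already loud. Choosing the gain per buffer is a simplification for illustration and is not taken from the original code.

```javascript
// Applies a fixed gain and clamps each sample to [-1, 1].
function applyGain(samples, gain) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = Math.min(1, Math.max(-1, samples[i] * gain));
  }
  return out;
}

// Hypothetical alternative: cap the gain so the loudest sample just reaches full scale.
function peakLimitedGain(samples, maxGain) {
  let peak = 0;
  for (let i = 0; i < samples.length; i++) {
    peak = Math.max(peak, Math.abs(samples[i]));
  }
  return peak === 0 ? maxGain : Math.min(maxGain, 1 / peak);
}
```

For example, `applyGain(buffer, peakLimitedGain(buffer, 5))` boosts a quiet buffer by up to 5x but leaves a buffer that already peaks near 1.0 almost unchanged, whereas a fixed 5x gain would flatten it against the clamp.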

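4. Permissions and error handling

  • Tell users in advance why microphone access is needed, to reduce the chance that they deny it;
  • Catch exceptions from model loading, audio capture, and recognition, and show a specific error message so problems are easy to locate;
  • Validate state when "Start Recording" is clicked repeatedly, so that multiple audio streams are not created and left to conflict.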
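
Specific error messages are easier to produce when the `getUserMedia` rejection is mapped by its `name`. A minimal sketch, using the standard DOMException names for `getUserMedia`; `describeMicError` and the message wording are hypothetical.

```javascript
// Maps a getUserMedia rejection to a user-facing message.
function describeMicError(err) {
  switch (err?.name) {
    case 'NotAllowedError':
      return 'Microphone permission was denied. Allow access in the browser settings and try again.';
    case 'NotFoundError':
      return 'No microphone was found on this device.';
    case 'NotReadableError':
      return 'The microphone could not be read. It may be in use by another application.';
    default:
      return `Microphone access failed: ${err?.message ?? 'unknown error'}`;
  }
}
```

In the component's `catch` block this would replace the single generic message, for example `error.value = describeMicError(err)`.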

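5. Cross-origin and deployment

  • If the model file is served from a different origin, make sure its cross-origin (CORS) configuration is correct; otherwise the model fails to load;
  • For static hosting (e.g. Nginx), configure the correct MIME type for the zip model file to prevent loading problems.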

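Possible Extensions

  1. Support switching between models for different languages;
  2. Add copy and export (e.g. TXT/JSON) for recognition results;
  3. Post-process the recognized text: automatic punctuation and normalization of colloquial speech;
  4. Add metrics such as recording duration and recognition accuracy;
  5. Support pausing and resuming recognition for more flexible control.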
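
As an illustration of item 2, the export step can be written as a pure function that turns the newline-separated result text into file content. This is a sketch under assumptions: `buildExport`, the file names, and the JSON shape are all hypothetical choices, not part of the component.

```javascript
// Hypothetical helper: builds file content for a TXT or JSON export.
function buildExport(text, format, now = new Date()) {
  // The component stores one finalized utterance per line.
  const lines = text.split('\n').filter((line) => line.trim() !== '');
  if (format === 'json') {
    return {
      filename: 'transcript.json',
      mimeType: 'application/json',
      content: JSON.stringify({ exportedAt: now.toISOString(), lines }, null, 2),
    };
  }
  return { filename: 'transcript.txt', mimeType: 'text/plain', content: lines.join('\n') };
}
```

In the browser, the returned `content` can be wrapped in a `Blob` with the given `mimeType` and offered for download through `URL.createObjectURL`, and copying can use `navigator.clipboard.writeText(recognizedText.value)`.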

Model downloads: alphacephei.com/vosk/models…

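After downloading, place the model under the project's public directory. Note that the code below loads the archive itself from /vosk-model/vosk-model-small-cn-0.22.zip (public/vosk-model/ in the project), so either put the archive there or change the path passed to createModel to match where you put it.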

Full Code

<template>
  <div class="voice-recognition-container">
    <!-- Header -->
    <div class="header">
      <div class="title-section">
        <div class="logo">
          <svg viewBox="0 0 24 24" fill="currentColor">
            <path d="M12 15C13.6569 15 15 13.6569 15 12V6C15 4.34315 13.6569 3 12 3C10.3431 3 9 4.34315 9 6V12C9 13.6569 10.3431 15 12 15Z"/>
            <path d="M18 12C18 15.3137 15.3137 18 12 18C8.68629 18 6 15.3137 6 12"/>
            <path d="M12 20V18"/>
          </svg>
        </div>
        <div>
          <h1 class="app-title">Smart Speech Recognition</h1>
          <p class="app-subtitle">Real-time speech-to-text with Chinese support</p>
        </div>
      </div>
      <div class="status-indicator" :class="{ active: isListening }">
        <span class="pulse"></span>
        {{ isListening ? 'Recording...' : 'Ready' }}
      </div>
    </div>

    <!-- Recording controls -->
    <div class="control-section">
      <div class="button-group">
        <button 
          @click="startRecording" 
          :disabled="isListening"
          class="control-btn start-btn"
        >
          <svg viewBox="0 0 24 24" fill="currentColor">
            <circle cx="12" cy="12" r="10"/>
          </svg>
          <span>Start Recording</span>
        </button>
        
        <button 
          @click="stopRecording" 
          :disabled="!isListening"
          class="control-btn stop-btn"
        >
          <svg viewBox="0 0 24 24" fill="currentColor">
            <rect x="6" y="6" width="12" height="12" rx="2"/>
          </svg>
          <span>Stop Recording</span>
        </button>
      </div>

      <!-- Volume indicator (simulated) -->
      <div v-if="isListening" class="volume-indicator">
        <div class="volume-bars">
          <div v-for="n in 8" :key="n" class="bar"></div>
        </div>
        <span class="volume-label">Microphone volume</span>
      </div>
    </div>

    <!-- Recognition results -->
    <div class="result-section">
      <div class="result-header">
        <h3 class="result-title">
          <svg viewBox="0 0 24 24" fill="currentColor">
            <path d="M9 17H15M12 12V21M12 3C7.02944 3 3 7.02944 3 12C3 16.9706 7.02944 21 12 21C16.9706 21 21 16.9706 21 12C21 7.02944 16.9706 3 12 3Z"/>
          </svg>
          Recognition Result
        </h3>
        <button 
          v-if="recognizedText" 
          @click="recognizedText = ''" 
          class="clear-btn"
        >
          Clear Records
        </button>
      </div>
      
      <div class="result-content">
        <!-- Final recognition result -->
        <div v-if="recognizedText" class="final-result">
          <div class="result-label">Final Result</div>
          <div class="text-content">{{ recognizedText }}</div>
          <div class="result-stats">
            <span class="stat-item">
              <svg viewBox="0 0 24 24" fill="currentColor">
                <path d="M9 12L11 14L15 10M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
              </svg>
              Saved
            </span>
            <span class="stat-item">
              <svg viewBox="0 0 24 24" fill="currentColor">
                <path d="M12 6V12L16 14M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
              </svg>
              {{ recognizedText.length }} characters
            </span>
          </div>
        </div>

        <!-- Partial recognition result -->
        <div v-if="tempText" class="temp-result">
          <div class="result-label">
            <span class="typing-indicator">
              <span class="dot"></span>
              <span class="dot"></span>
              <span class="dot"></span>
            </span>
            Typing...
          </div>
          <div class="text-content">{{ tempText }}</div>
        </div>

        <!-- Empty state -->
        <div v-if="!recognizedText && !tempText" class="empty-state">
          <svg viewBox="0 0 24 24" fill="currentColor">
            <path d="M19 11H5M19 11C20.1046 11 21 11.8954 21 13V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V13C3 11.8954 3.89543 11 5 11M19 11V9C19 7.89543 18.1046 7 17 7M5 11V9C5 7.89543 5.89543 7 7 7M7 7V5C7 3.89543 7.89543 3 9 3H15C16.1046 3 17 3.89543 17 5V7M7 7H17"/>
          </svg>
          <p>Click "Start Recording" to begin speech recognition</p>
          <p class="hint">Make sure microphone access has been granted</p>
        </div>
      </div>

      <!-- Usage hints -->
      <div class="hint-section">
        <div class="hint-item">
          <svg viewBox="0 0 24 24" fill="currentColor">
            <path d="M13 16H12V12H11M12 8H12.01M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
          </svg>
          <span>A quiet environment is recommended</span>
        </div>
        <div class="hint-item">
          <svg viewBox="0 0 24 24" fill="currentColor">
            <path d="M12 15H12.01M12 12V9M4 21C4 17.134 7.13401 14 11 14C13.0543 14 14.8772 14.897 16 16.2915M15 7C15 9.20914 13.2091 11 11 11C8.79086 11 7 9.20914 7 7C7 4.79086 8.79086 3 11 3C13.2091 3 15 4.79086 15 7Z"/>
          </svg>
          <span>Clear pronunciation gives better results</span>
        </div>
      </div>
    </div>

    <!-- Error alert -->
    <div v-if="error" class="error-alert">
      <svg viewBox="0 0 24 24" fill="currentColor">
        <path d="M12 8V12M12 16H12.01M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
      </svg>
      <span>{{ error }}</span>
    </div>
  </div>
</template>

<script setup>
import { ref, onMounted, onUnmounted } from 'vue';
import { createModel } from 'vosk-browser';

// Component state
const isListening = ref(false);
const error = ref(null);
const recognizedText = ref('');
let model = null;
let cleanupFunction = null;
const mySampleRate = 16000;
const tempText = ref('');

const initModel = async () => {
  try {
    model = await createModel("/vosk-model/vosk-model-small-cn-0.22.zip");
    console.log("Model loaded successfully");
  } catch (err) {
    error.value = `Failed to load model: ${err.message}`;
    console.error("Model loading error:", err);
  }
};

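const startRecording = async () => {
  if (isListening.value) return;
  // The model loads asynchronously; do not open the microphone before it is ready
  if (!model) {
    error.value = 'The model is still loading or failed to load. Please try again shortly.';
    return;
  }
  error.value = null; // clear any stale error from a previous attempt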
  
  try {
    const mediaStream = await navigator.mediaDevices.getUserMedia({
      video: false,
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        channelCount: 1,
        sampleRate: mySampleRate
      },
    });

    const recognizer = new model.KaldiRecognizer(mySampleRate);
    
    recognizer.on("result", (message) => {
      const text = message.result?.text;
      if (text) {
        recognizedText.value += (recognizedText.value ? '\n' : '') + text;
        tempText.value = '';
      }
    });
    
    recognizer.on("partialresult", (message) => {
      const partialText = message.result?.partial;
      if (partialText) {
        tempText.value = partialText;
      }
    });
    
    recognizer.on("error", (err) => {
      console.error('Recognizer error:', err);
    });

    const audioContext = new AudioContext({ sampleRate: mySampleRate });
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);

    recognizerNode.onaudioprocess = (event) => {
      try {
        const channelData = event.inputBuffer.getChannelData(0);
        const boostedData = new Float32Array(channelData.length);
        const gainMultiplier = 5.0;
        
        for (let i = 0; i < channelData.length; i++) {
          boostedData[i] = Math.min(1.0, Math.max(-1.0, channelData[i] * gainMultiplier));
        }

        if (recognizer.acceptWaveformFloat) {
          recognizer.acceptWaveformFloat(boostedData, mySampleRate);
        } else {
          recognizer.acceptWaveform(event.inputBuffer);
        }
      } catch (err) {
        console.error('Audio processing error:', err);
      }
    };

    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
    recognizerNode.connect(audioContext.destination);
    
    isListening.value = true;

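    cleanupFunction = () => {
      mediaStream.getTracks().forEach(track => track.stop());
      source.disconnect();
      recognizerNode.disconnect();
      audioContext.close();
      recognizer.remove(); // release the recognizer held by the vosk-browser worker
      tempText.value = ''; // drop partial text that never became a final result
      isListening.value = false;
    };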

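  } catch (err) {
    // Covers getUserMedia rejections as well as recognizer and audio setup failures
    console.error('Initialization failed:', err);
    error.value = `Failed to start recording: ${err.message}`;
  }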
};

const stopRecording = () => {
  if (cleanupFunction) {
    cleanupFunction();
    cleanupFunction = null;
  }
};

onMounted(() => {
  initModel();
});

onUnmounted(() => {
  stopRecording();
});
</script>

<style scoped>
.voice-recognition-container {
  max-width: 800px;
  margin: 0 auto;
  padding: 24px;
  background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
  min-height: 100vh;
  font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
}

.header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 32px;
  color: white;
}

.title-section {
  display: flex;
  align-items: center;
  gap: 16px;
}

.logo {
  width: 48px;
  height: 48px;
  background: rgba(255, 255, 255, 0.1);
  border-radius: 12px;
  display: flex;
  align-items: center;
  justify-content: center;
  backdrop-filter: blur(10px);
}

.logo svg {
  width: 24px;
  height: 24px;
}

.app-title {
  font-size: 28px;
  font-weight: 700;
  margin: 0;
  background: linear-gradient(135deg, #ffffff 0%, #e0e0e0 100%);
  -webkit-background-clip: text;
  -webkit-text-fill-color: transparent;
}

.app-subtitle {
  font-size: 14px;
  opacity: 0.9;
  margin: 4px 0 0;
}

.status-indicator {
  padding: 8px 16px;
  background: rgba(255, 255, 255, 0.1);
  border-radius: 20px;
  font-size: 14px;
  display: flex;
  align-items: center;
  gap: 8px;
  backdrop-filter: blur(10px);
}

.status-indicator.active {
  background: rgba(255, 255, 255, 0.2);
  color: #4ade80;
}

.pulse {
  width: 8px;
  height: 8px;
  background: #4ade80;
  border-radius: 50%;
  animation: pulse 2s infinite;
}

.control-section {
  background: white;
  border-radius: 20px;
  padding: 32px;
  margin-bottom: 24px;
  box-shadow: 0 10px 40px rgba(0, 0, 0, 0.1);
}

.button-group {
  display: flex;
  gap: 16px;
  margin-bottom: 24px;
}

.control-btn {
  flex: 1;
  display: flex;
  align-items: center;
  justify-content: center;
  gap: 12px;
  padding: 16px 24px;
  border: none;
  border-radius: 12px;
  font-size: 16px;
  font-weight: 600;
  cursor: pointer;
  transition: all 0.3s ease;
  color: white;
}

.control-btn svg {
  width: 20px;
  height: 20px;
}

.start-btn {
  background: linear-gradient(135deg, #4ade80 0%, #22c55e 100%);
}

.start-btn:hover:not(:disabled) {
  transform: translateY(-2px);
  box-shadow: 0 8px 25px rgba(34, 197, 94, 0.3);
}

.stop-btn {
  background: linear-gradient(135deg, #f87171 0%, #ef4444 100%);
}

.stop-btn:hover:not(:disabled) {
  transform: translateY(-2px);
  box-shadow: 0 8px 25px rgba(239, 68, 68, 0.3);
}

.control-btn:disabled {
  opacity: 0.5;
  cursor: not-allowed;
  transform: none !important;
  box-shadow: none !important;
}

.volume-indicator {
  text-align: center;
}

.volume-bars {
  display: flex;
  justify-content: center;
  gap: 4px;
  height: 40px;
  align-items: flex-end;
  margin-bottom: 8px;
}

.bar {
  width: 6px;
  background: linear-gradient(to top, #4ade80, #3b82f6);
  border-radius: 3px;
  animation: soundwave 1.5s infinite ease-in-out;
}

.bar:nth-child(1) { animation-delay: 0.1s; height: 20px; }
.bar:nth-child(2) { animation-delay: 0.2s; height: 28px; }
.bar:nth-child(3) { animation-delay: 0.3s; height: 36px; }
.bar:nth-child(4) { animation-delay: 0.4s; height: 28px; }
.bar:nth-child(5) { animation-delay: 0.5s; height: 20px; }
.bar:nth-child(6) { animation-delay: 0.6s; height: 28px; }
.bar:nth-child(7) { animation-delay: 0.7s; height: 36px; }
.bar:nth-child(8) { animation-delay: 0.8s; height: 28px; }

.volume-label {
  font-size: 14px;
  color: #64748b;
}

.result-section {
  background: white;
  border-radius: 20px;
  padding: 24px;
  box-shadow: 0 10px 40px rgba(0, 0, 0, 0.1);
}

.result-header {
  display: flex;
  justify-content: space-between;
  align-items: center;
  margin-bottom: 20px;
}

.result-title {
  display: flex;
  align-items: center;
  gap: 8px;
  margin: 0;
  font-size: 20px;
  color: #1e293b;
}

.result-title svg {
  width: 20px;
  height: 20px;
  color: #3b82f6;
}

.clear-btn {
  padding: 8px 16px;
  background: #f1f5f9;
  border: none;
  border-radius: 8px;
  color: #64748b;
  font-size: 14px;
  cursor: pointer;
  transition: all 0.3s ease;
}

.clear-btn:hover {
  background: #e2e8f0;
  color: #475569;
}

.result-content {
  min-height: 200px;
}

.final-result, .temp-result {
  background: #f8fafc;
  border-radius: 12px;
  padding: 20px;
  margin-bottom: 16px;
}

.final-result {
  border-left: 4px solid #4ade80;
}

.temp-result {
  border-left: 4px solid #3b82f6;
}

.result-label {
  font-size: 12px;
  font-weight: 600;
  text-transform: uppercase;
  letter-spacing: 0.5px;
  color: #64748b;
  margin-bottom: 8px;
  display: flex;
  align-items: center;
  gap: 8px;
}

.text-content {
  font-size: 16px;
  line-height: 1.6;
  color: #1e293b;
  white-space: pre-wrap;
}

.result-stats {
  display: flex;
  gap: 16px;
  margin-top: 12px;
  padding-top: 12px;
  border-top: 1px solid #e2e8f0;
}

.stat-item {
  display: flex;
  align-items: center;
  gap: 4px;
  font-size: 12px;
  color: #64748b;
}

.stat-item svg {
  width: 12px;
  height: 12px;
}

.typing-indicator {
  display: flex;
  gap: 2px;
}

.dot {
  width: 4px;
  height: 4px;
  background: #3b82f6;
  border-radius: 50%;
  animation: typing 1.4s infinite ease-in-out;
}

.dot:nth-child(1) { animation-delay: 0s; }
.dot:nth-child(2) { animation-delay: 0.2s; }
.dot:nth-child(3) { animation-delay: 0.4s; }

.empty-state {
  text-align: center;
  padding: 40px 20px;
  color: #64748b;
}

.empty-state svg {
  width: 48px;
  height: 48px;
  margin-bottom: 16px;
  color: #cbd5e1;
}

.empty-state p {
  margin: 0;
  font-size: 16px;
}

.hint {
  font-size: 14px;
  margin-top: 4px;
  color: #94a3b8;
}

.hint-section {
  display: flex;
  gap: 16px;
  margin-top: 24px;
  padding-top: 24px;
  border-top: 1px solid #e2e8f0;
}

.hint-item {
  display: flex;
  align-items: center;
  gap: 8px;
  font-size: 14px;
  color: #64748b;
  flex: 1;
}

.hint-item svg {
  width: 16px;
  height: 16px;
  flex-shrink: 0;
}

.error-alert {
  position: fixed;
  bottom: 24px;
  left: 50%;
  transform: translateX(-50%);
  background: #fef2f2;
  border: 1px solid #fecaca;
  color: #dc2626;
  padding: 12px 20px;
  border-radius: 12px;
  display: flex;
  align-items: center;
  gap: 8px;
  box-shadow: 0 4px 20px rgba(220, 38, 38, 0.1);
  z-index: 1000;
}

.error-alert svg {
  width: 20px;
  height: 20px;
  flex-shrink: 0;
}

@keyframes pulse {
  0%, 100% {
    opacity: 1;
    transform: scale(1);
  }
  50% {
    opacity: 0.5;
    transform: scale(1.1);
  }
}

@keyframes soundwave {
  0%, 100% {
    height: 20px;
  }
  50% {
    height: 40px;
  }
}

@keyframes typing {
  0%, 100% {
    transform: translateY(0);
  }
  50% {
    transform: translateY(-4px);
  }
}

@media (max-width: 640px) {
  .voice-recognition-container {
    padding: 16px;
  }
  
  .header {
    flex-direction: column;
    gap: 16px;
    text-align: center;
  }
  
  .title-section {
    flex-direction: column;
    text-align: center;
  }
  
  .button-group {
    flex-direction: column;
  }
  
  .hint-section {
    flex-direction: column;
    gap: 12px;
  }
}
</style>
