Building a Front-End Offline Speech Recognition Component with Vue 3 + vosk-browser
This article focuses on a pure front-end offline speech recognition solution: a high-performance speech recognition component with no backend dependency, built on Vue 3 and vosk-browser.
Core Features
1. Offline speech recognition
The component runs on a local Vosk speech model: no network connection and no third-party API calls are needed, and all recognition logic executes in the browser. This protects data privacy and avoids the latency of a network round trip. It supports real-time Chinese speech-to-text with accuracy suitable for everyday voice-input scenarios.
2. Real-time recognition feedback
- Live partial results: partial recognition text is displayed while recording, with a typing animation that mimics live input, so users can see recognition progress at a glance.
- Final results persisted: when recording stops, the final recognition result is consolidated and saved; partial and final results are kept separate so in-flight text never clutters the saved transcript.
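The partial/final split above can be modeled as a small state holder. This is a sketch assuming the two event shapes vosk-browser emits ("partialresult" carrying `result.partial`, "result" carrying `result.text`); the function and method names here are illustrative, not part of any library API.

```javascript
// Keeps finalized lines separate from the in-flight partial text.
function createTranscript() {
  const state = { finalText: '', partial: '' };
  return {
    onPartial(partial) {
      state.partial = partial || '';           // overwritten on every partial event
    },
    onFinal(text) {
      if (!text) return;
      state.finalText += (state.finalText ? '\n' : '') + text;
      state.partial = '';                      // a final result clears the partial
    },
    snapshot: () => ({ ...state }),
  };
}
```

In the component, the "result" and "partialresult" handlers would call `onFinal` and `onPartial` respectively, and the template would render `snapshot()`.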
3. Full recording control
- Start/stop management: clear "Start Recording" and "Stop Recording" buttons whose enabled state follows the recording lifecycle, preventing duplicate operations.
- Microphone permission handling: the component detects and requests microphone permission automatically, and shows a clear error message to guide the user when permission is denied.
- Audio resource management: when recording stops or the component unmounts, the microphone media stream is closed and the audio context released, avoiding memory leaks.
4. Result management
- Display and statistics: the final recognized text is shown with an automatic character count and a saved-state badge.
- Clearing results: a "Clear" button removes all recognition results in one click, for repeated use.
5. Visual status indicators
- Recording state: a pulsing indicator light plus text ("Recording..." / "Ready") shows the current state at a glance.
- Volume waveform animation: a simulated waveform animates while recording, reinforcing the sense of interaction.
- Empty state: when there are no results, guidance text explains how to start and notes the microphone permission requirement.
- Error alerts: failures such as model loading errors or denied microphone permission trigger a prominent alert stating the specific cause.
6. Responsive layout
The layout works on both desktop and mobile; on small screens it adapts automatically (buttons stack vertically, hint text centers) for a consistent experience across devices.
Key Implementation Logic (Core Layer)
1. Loading the offline model
The local Vosk Chinese model is loaded during initialization; loading failures are caught and surfaced to the user:
import { createModel } from 'vosk-browser';

let model = null; // module-level handle to the loaded model

const initModel = async () => {
  try {
    // The model archive is served from the project's public directory
    model = await createModel("/vosk-model/vosk-model-small-cn-0.22.zip");
  } catch (err) {
    error.value = `Model loading failed: ${err.message}`;
  }
};
2. Recording and recognition core
The microphone stream is captured with the Web Audio API and fed into the Vosk recognizer for real-time recognition, handling partial and final results separately:
const startRecording = async () => {
  // Acquire the microphone media stream
  const mediaStream = await navigator.mediaDevices.getUserMedia({
    audio: { echoCancellation: true, noiseSuppression: true, sampleRate: 16000 }
  });
  // Create the recognizer and subscribe to its results
  const recognizer = new model.KaldiRecognizer(16000);
  recognizer.on("result", (message) => {
    recognizedText.value += message.result?.text || ''; // final result
  });
  recognizer.on("partialresult", (message) => {
    tempText.value = message.result?.partial || ''; // partial (in-flight) result
  });
  // Feed audio frames into the recognizer
  const audioContext = new AudioContext({ sampleRate: 16000 });
  const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
  recognizerNode.onaudioprocess = (event) => {
    const channelData = event.inputBuffer.getChannelData(0);
    recognizer.acceptWaveformFloat(channelData, 16000);
  };
  // Wire the stream into the processing node (required for onaudioprocess to fire)
  const source = audioContext.createMediaStreamSource(mediaStream);
  source.connect(recognizerNode);
  recognizerNode.connect(audioContext.destination);
};
3. Automatic resource cleanup
When recording stops or the component unmounts, the microphone, audio context, and related resources are released to avoid leaks:
const stopRecording = () => {
  if (cleanupFunction) {
    cleanupFunction(); // stops the media stream, disconnects audio nodes, closes the audio context
  }
};
onUnmounted(() => stopRecording());
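One way to structure that single cleanup function is to collect disposer callbacks as resources are created and run them in reverse order on stop. This is a sketch; `createDisposer` is an illustrative helper, not something vosk-browser or Vue provides.

```javascript
// Collects teardown callbacks and runs them last-acquired-first.
function createDisposer() {
  const disposers = [];
  return {
    add(fn) { disposers.push(fn); },
    dispose() {
      while (disposers.length) {
        const fn = disposers.pop();  // reverse order: last acquired, first released
        try { fn(); } catch (e) { console.error('cleanup step failed:', e); }
      }
    },
  };
}
```

During `startRecording` you would register each resource as it is created, e.g. `d.add(() => mediaStream.getTracks().forEach(t => t.stop()))`, then assign `cleanupFunction = () => d.dispose()`. Running disposers in reverse order releases dependents before their dependencies.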
Important Notes
1. Model deployment
- The model archive (e.g. vosk-model-small-cn-0.22.zip) must be placed in the project's public directory so it is reachable via an absolute path.
- Prefer the smaller "small" model variants; large models increase load time and hurt the first-use experience.
- Model loading is asynchronous; wait for it to finish before starting a recording, or recognizer initialization will fail.
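The last point can be enforced with a small "ready gate" around the asynchronous load. This is a minimal sketch; `createModelGate` and its methods are illustrative names, and `loadModel` stands in for `createModel` from vosk-browser.

```javascript
// Refuses access to the model until the load promise has resolved.
function createModelGate(loadModel) {
  let model = null;
  const ready = loadModel().then((m) => { model = m; return m; });
  return {
    ready,                          // await this before enabling the record button
    isReady: () => model !== null,  // synchronous guard for click handlers
    getModel: () => {
      if (!model) throw new Error('Model not loaded yet');
      return model;
    },
  };
}
```

A click handler would then check `gate.isReady()` (or surface an error) instead of assuming `model` is non-null.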
2. Browser environment constraints
- Microphone access (getUserMedia) requires a secure context: HTTPS in production (localhost is exempt).
- The browser must support WebAssembly, the Web Audio API, and the MediaDevices API; use a modern browser such as Chrome, Edge, or Firefox. IE is not supported.
- Note that createScriptProcessor, used here, is deprecated in favor of AudioWorklet, though it remains widely supported at the time of writing.
- Some mobile browsers restrict background recording, so keep the page in the foreground.
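The required capabilities can be checked up front before enabling the UI. This is a hedged sketch: it takes an environment object so the check stays testable; in the browser you would pass the real globals, e.g. `missingFeatures({ WebAssembly: window.WebAssembly, AudioContext: window.AudioContext, mediaDevices: navigator.mediaDevices })`.

```javascript
// Returns the names of required capabilities that are absent from `env`.
function missingFeatures(env) {
  const required = ['WebAssembly', 'AudioContext', 'mediaDevices'];
  return required.filter((name) => !env[name]);
}
```

If the returned array is non-empty, show an unsupported-browser message instead of the record button.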
3. Performance and UX
- Applying a volume gain during audio processing (gainMultiplier = 5.0 in the code) can improve recognition accuracy in low-volume environments.
- Show a loading animation while the model loads, so users don't mistake the component for being unresponsive.
- For long transcripts, put the results in a scrollable container to prevent page overflow.
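The gain boost mentioned above, extracted from the component's `onaudioprocess` handler into a standalone function: each sample is multiplied by the gain and clamped to the valid [-1, 1] range so boosted audio never clips outside it.

```javascript
// Amplifies PCM float samples, clamping to [-1, 1].
function boostSamples(samples, gain = 5.0) {
  const out = new Float32Array(samples.length);
  for (let i = 0; i < samples.length; i++) {
    out[i] = Math.min(1.0, Math.max(-1.0, samples[i] * gain));
  }
  return out;
}
```

A fixed gain of 5.0 is a blunt instrument; quiet input benefits, but already-loud input saturates at the clamp, so tune the value (or use a proper automatic gain control) for your environment.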
4. Permissions and error handling
- Tell users up front why microphone permission is needed, reducing the chance of denial.
- Catch every exception across model loading, audio capture, and recognition, and show a specific error message to ease troubleshooting.
- Guard against repeated "Start Recording" clicks with a state check, so multiple audio streams aren't created and resources don't conflict.
5. Cross-origin and deployment
- Make sure the model file's CORS configuration is correct, or model loading will fail across origins.
- With static hosting (e.g. Nginx), configure the correct MIME type for the zip model file to avoid loading failures.
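For the MIME-type point, the mapping your static server needs boils down to a table like the one below. This is an illustrative helper (not part of any server's API); in Nginx the equivalent is a `types` block, and Node-based static servers usually accept a similar lookup.

```javascript
// Extension-to-MIME mapping relevant to serving a vosk-browser app.
const MIME_TYPES = {
  '.zip': 'application/zip',
  '.wasm': 'application/wasm',  // vosk-browser ships a WebAssembly runtime
  '.js': 'text/javascript',
  '.html': 'text/html',
};

function mimeFor(filePath) {
  const match = /\.[^./\\]+$/.exec(filePath);  // grab the final extension
  const ext = match ? match[0].toLowerCase() : '';
  return MIME_TYPES[ext] || 'application/octet-stream';
}
```

Serving `.wasm` as `application/wasm` also matters: browsers require it for the faster `WebAssembly.instantiateStreaming` path.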
Future Improvements
- Support switching between models for multiple languages.
- Add copy and export (e.g. TXT/JSON) of recognition results.
- Post-process the output: automatic punctuation, normalization of colloquial text.
- Add metrics such as recording duration and recognition accuracy.
- Support pausing and resuming recognition for more flexible control.
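The export idea from the list above could start from a serializer like this sketch. `exportTranscript` and its fields are hypothetical; in the browser you would wrap the returned string in a Blob and trigger a download, which is omitted here.

```javascript
// Serializes the transcript as plain TXT or as a JSON document with metadata.
function exportTranscript(recognizedText, format = 'txt') {
  if (format === 'json') {
    return JSON.stringify({
      text: recognizedText,
      charCount: recognizedText.length,
      exportedAt: new Date().toISOString(),
    }, null, 2);
  }
  return recognizedText; // plain TXT: the text as-is
}
```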
Model download: alphacephei.com/vosk/models…
After downloading, place the model archive in the project's public/vosk-model/ directory so it matches the path the code loads (/vosk-model/vosk-model-small-cn-0.22.zip).
Complete Implementation
<template>
<div class="voice-recognition-container">
<!-- Header -->
<div class="header">
<div class="title-section">
<div class="logo">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M12 15C13.6569 15 15 13.6569 15 12V6C15 4.34315 13.6569 3 12 3C10.3431 3 9 4.34315 9 6V12C9 13.6569 10.3431 15 12 15Z"/>
<path d="M18 12C18 15.3137 15.3137 18 12 18C8.68629 18 6 15.3137 6 12"/>
<path d="M12 20V18"/>
</svg>
</div>
<div>
<h1 class="app-title">Smart Speech Recognition</h1>
<p class="app-subtitle">Real-time speech-to-text with Chinese recognition</p>
</div>
</div>
<div class="status-indicator" :class="{ active: isListening }">
<span class="pulse"></span>
{{ isListening ? 'Recording...' : 'Ready' }}
</div>
</div>
<!-- Recording controls -->
<div class="control-section">
<div class="button-group">
<button
@click="startRecording"
:disabled="isListening"
class="control-btn start-btn"
>
<svg viewBox="0 0 24 24" fill="currentColor">
<circle cx="12" cy="12" r="10"/>
</svg>
<span>Start Recording</span>
</button>
<button
@click="stopRecording"
:disabled="!isListening"
class="control-btn stop-btn"
>
<svg viewBox="0 0 24 24" fill="currentColor">
<rect x="6" y="6" width="12" height="12" rx="2"/>
</svg>
<span>Stop Recording</span>
</button>
</div>
<!-- Volume indicator (simulated) -->
<div v-if="isListening" class="volume-indicator">
<div class="volume-bars">
<div v-for="n in 8" :key="n" class="bar"></div>
</div>
<span class="volume-label">Microphone level</span>
</div>
</div>
<!-- Recognition results -->
<div class="result-section">
<div class="result-header">
<h3 class="result-title">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M9 17H15M12 12V21M12 3C7.02944 3 3 7.02944 3 12C3 16.9706 7.02944 21 12 21C16.9706 21 21 16.9706 21 12C21 7.02944 16.9706 3 12 3Z"/>
</svg>
Recognition Results
</h3>
<button
v-if="recognizedText"
@click="recognizedText = ''"
class="clear-btn"
>
Clear
</button>
</div>
<div class="result-content">
<!-- Final result -->
<div v-if="recognizedText" class="final-result">
<div class="result-label">Final Result</div>
<div class="text-content">{{ recognizedText }}</div>
<div class="result-stats">
<span class="stat-item">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M9 12L11 14L15 10M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
</svg>
Saved
</span>
<span class="stat-item">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M12 6V12L16 14M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
</svg>
{{ recognizedText.length }} characters
</span>
</div>
</div>
<!-- Partial result -->
<div v-if="tempText" class="temp-result">
<div class="result-label">
<span class="typing-indicator">
<span class="dot"></span>
<span class="dot"></span>
<span class="dot"></span>
</span>
Transcribing...
</div>
<div class="text-content">{{ tempText }}</div>
</div>
<!-- Empty state -->
<div v-if="!recognizedText && !tempText" class="empty-state">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M19 11H5M19 11C20.1046 11 21 11.8954 21 13V19C21 20.1046 20.1046 21 19 21H5C3.89543 21 3 20.1046 3 19V13C3 11.8954 3.89543 11 5 11M19 11V9C19 7.89543 18.1046 7 17 7M5 11V9C5 7.89543 5.89543 7 7 7M7 7V5C7 3.89543 7.89543 3 9 3H15C16.1046 3 17 3.89543 17 5V7M7 7H17"/>
</svg>
<p>Click "Start Recording" to begin speech recognition</p>
<p class="hint">Make sure microphone access has been granted</p>
</div>
</div>
<!-- Usage hints -->
<div class="hint-section">
<div class="hint-item">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M13 16H12V12H11M12 8H12.01M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
</svg>
<span>Use in a quiet environment for best results</span>
</div>
<div class="hint-item">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M12 15H12.01M12 12V9M4 21C4 17.134 7.13401 14 11 14C13.0543 14 14.8772 14.897 16 16.2915M15 7C15 9.20914 13.2091 11 11 11C8.79086 11 7 9.20914 7 7C7 4.79086 8.79086 3 11 3C13.2091 3 15 4.79086 15 7Z"/>
</svg>
<span>Speak clearly for better accuracy</span>
</div>
</div>
</div>
<!-- Error alert -->
<div v-if="error" class="error-alert">
<svg viewBox="0 0 24 24" fill="currentColor">
<path d="M12 8V12M12 16H12.01M21 12C21 16.9706 16.9706 21 12 21C7.02944 21 3 16.9706 3 12C3 7.02944 7.02944 3 12 3C16.9706 3 21 7.02944 21 12Z"/>
</svg>
<span>{{ error }}</span>
</div>
</div>
</template>
<script setup>
import { ref, onMounted, onUnmounted } from 'vue';
import { createModel } from 'vosk-browser';

// Reactive state and module-level references
const isListening = ref(false);
const error = ref(null);
const recognizedText = ref('');
const tempText = ref('');
let model = null;
let cleanupFunction = null;
const mySampleRate = 16000;

const initModel = async () => {
  try {
    model = await createModel("/vosk-model/vosk-model-small-cn-0.22.zip");
    console.log("Model loaded successfully");
  } catch (err) {
    error.value = `Model loading failed: ${err.message}`;
    console.error("Model loading error:", err);
  }
};

const startRecording = async () => {
  if (isListening.value) return;
  // Guard: the model loads asynchronously and may not be ready yet
  if (!model) {
    error.value = 'Model is still loading, please try again in a moment';
    return;
  }
  try {
    const mediaStream = await navigator.mediaDevices.getUserMedia({
      video: false,
      audio: {
        echoCancellation: true,
        noiseSuppression: true,
        channelCount: 1,
        sampleRate: mySampleRate
      },
    });
    const recognizer = new model.KaldiRecognizer(mySampleRate);
    recognizer.on("result", (message) => {
      const text = message.result?.text;
      if (text) {
        recognizedText.value += (recognizedText.value ? '\n' : '') + text;
        tempText.value = '';
      }
    });
    recognizer.on("partialresult", (message) => {
      const partialText = message.result?.partial;
      if (partialText) {
        tempText.value = partialText;
      }
    });
    recognizer.on("error", (err) => {
      console.error('Recognizer error:', err);
    });
    const audioContext = new AudioContext({ sampleRate: mySampleRate });
    const recognizerNode = audioContext.createScriptProcessor(4096, 1, 1);
    recognizerNode.onaudioprocess = (event) => {
      try {
        const channelData = event.inputBuffer.getChannelData(0);
        // Boost low-volume input, clamping samples to the valid [-1, 1] range
        const boostedData = new Float32Array(channelData.length);
        const gainMultiplier = 5.0;
        for (let i = 0; i < channelData.length; i++) {
          boostedData[i] = Math.min(1.0, Math.max(-1.0, channelData[i] * gainMultiplier));
        }
        if (recognizer.acceptWaveformFloat) {
          recognizer.acceptWaveformFloat(boostedData, mySampleRate);
        } else {
          recognizer.acceptWaveform(event.inputBuffer);
        }
      } catch (err) {
        console.error('Audio processing error:', err);
      }
    };
    const source = audioContext.createMediaStreamSource(mediaStream);
    source.connect(recognizerNode);
    recognizerNode.connect(audioContext.destination);
    isListening.value = true;
    cleanupFunction = () => {
      mediaStream.getTracks().forEach(track => track.stop());
      source.disconnect();
      recognizerNode.disconnect();
      audioContext.close();
      isListening.value = false;
    };
  } catch (err) {
    console.error('Initialization failed:', err);
    error.value = `Microphone access failed: ${err.message}`;
  }
};

const stopRecording = () => {
  if (cleanupFunction) {
    cleanupFunction();
    cleanupFunction = null;
  }
};

onMounted(() => {
  initModel();
});

onUnmounted(() => {
  stopRecording();
});
</script>
<style scoped>
.voice-recognition-container {
max-width: 800px;
margin: 0 auto;
padding: 24px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
min-height: 100vh;
font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', Roboto, sans-serif;
}
.header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 32px;
color: white;
}
.title-section {
display: flex;
align-items: center;
gap: 16px;
}
.logo {
width: 48px;
height: 48px;
background: rgba(255, 255, 255, 0.1);
border-radius: 12px;
display: flex;
align-items: center;
justify-content: center;
backdrop-filter: blur(10px);
}
.logo svg {
width: 24px;
height: 24px;
}
.app-title {
font-size: 28px;
font-weight: 700;
margin: 0;
background: linear-gradient(135deg, #ffffff 0%, #e0e0e0 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
}
.app-subtitle {
font-size: 14px;
opacity: 0.9;
margin: 4px 0 0;
}
.status-indicator {
padding: 8px 16px;
background: rgba(255, 255, 255, 0.1);
border-radius: 20px;
font-size: 14px;
display: flex;
align-items: center;
gap: 8px;
backdrop-filter: blur(10px);
}
.status-indicator.active {
background: rgba(255, 255, 255, 0.2);
color: #4ade80;
}
.pulse {
width: 8px;
height: 8px;
background: #4ade80;
border-radius: 50%;
animation: pulse 2s infinite;
}
.control-section {
background: white;
border-radius: 20px;
padding: 32px;
margin-bottom: 24px;
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.1);
}
.button-group {
display: flex;
gap: 16px;
margin-bottom: 24px;
}
.control-btn {
flex: 1;
display: flex;
align-items: center;
justify-content: center;
gap: 12px;
padding: 16px 24px;
border: none;
border-radius: 12px;
font-size: 16px;
font-weight: 600;
cursor: pointer;
transition: all 0.3s ease;
color: white;
}
.control-btn svg {
width: 20px;
height: 20px;
}
.start-btn {
background: linear-gradient(135deg, #4ade80 0%, #22c55e 100%);
}
.start-btn:hover:not(:disabled) {
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(34, 197, 94, 0.3);
}
.stop-btn {
background: linear-gradient(135deg, #f87171 0%, #ef4444 100%);
}
.stop-btn:hover:not(:disabled) {
transform: translateY(-2px);
box-shadow: 0 8px 25px rgba(239, 68, 68, 0.3);
}
.control-btn:disabled {
opacity: 0.5;
cursor: not-allowed;
transform: none !important;
box-shadow: none !important;
}
.volume-indicator {
text-align: center;
}
.volume-bars {
display: flex;
justify-content: center;
gap: 4px;
height: 40px;
align-items: flex-end;
margin-bottom: 8px;
}
.bar {
width: 6px;
background: linear-gradient(to top, #4ade80, #3b82f6);
border-radius: 3px;
animation: soundwave 1.5s infinite ease-in-out;
}
.bar:nth-child(1) { animation-delay: 0.1s; height: 20px; }
.bar:nth-child(2) { animation-delay: 0.2s; height: 28px; }
.bar:nth-child(3) { animation-delay: 0.3s; height: 36px; }
.bar:nth-child(4) { animation-delay: 0.4s; height: 28px; }
.bar:nth-child(5) { animation-delay: 0.5s; height: 20px; }
.bar:nth-child(6) { animation-delay: 0.6s; height: 28px; }
.bar:nth-child(7) { animation-delay: 0.7s; height: 36px; }
.bar:nth-child(8) { animation-delay: 0.8s; height: 28px; }
.volume-label {
font-size: 14px;
color: #64748b;
}
.result-section {
background: white;
border-radius: 20px;
padding: 24px;
box-shadow: 0 10px 40px rgba(0, 0, 0, 0.1);
}
.result-header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 20px;
}
.result-title {
display: flex;
align-items: center;
gap: 8px;
margin: 0;
font-size: 20px;
color: #1e293b;
}
.result-title svg {
width: 20px;
height: 20px;
color: #3b82f6;
}
.clear-btn {
padding: 8px 16px;
background: #f1f5f9;
border: none;
border-radius: 8px;
color: #64748b;
font-size: 14px;
cursor: pointer;
transition: all 0.3s ease;
}
.clear-btn:hover {
background: #e2e8f0;
color: #475569;
}
.result-content {
min-height: 200px;
}
.final-result, .temp-result {
background: #f8fafc;
border-radius: 12px;
padding: 20px;
margin-bottom: 16px;
}
.final-result {
border-left: 4px solid #4ade80;
}
.temp-result {
border-left: 4px solid #3b82f6;
}
.result-label {
font-size: 12px;
font-weight: 600;
text-transform: uppercase;
letter-spacing: 0.5px;
color: #64748b;
margin-bottom: 8px;
display: flex;
align-items: center;
gap: 8px;
}
.text-content {
font-size: 16px;
line-height: 1.6;
color: #1e293b;
white-space: pre-wrap;
}
.result-stats {
display: flex;
gap: 16px;
margin-top: 12px;
padding-top: 12px;
border-top: 1px solid #e2e8f0;
}
.stat-item {
display: flex;
align-items: center;
gap: 4px;
font-size: 12px;
color: #64748b;
}
.stat-item svg {
width: 12px;
height: 12px;
}
.typing-indicator {
display: flex;
gap: 2px;
}
.dot {
width: 4px;
height: 4px;
background: #3b82f6;
border-radius: 50%;
animation: typing 1.4s infinite ease-in-out;
}
.dot:nth-child(1) { animation-delay: 0s; }
.dot:nth-child(2) { animation-delay: 0.2s; }
.dot:nth-child(3) { animation-delay: 0.4s; }
.empty-state {
text-align: center;
padding: 40px 20px;
color: #64748b;
}
.empty-state svg {
width: 48px;
height: 48px;
margin-bottom: 16px;
color: #cbd5e1;
}
.empty-state p {
margin: 0;
font-size: 16px;
}
.hint {
font-size: 14px;
margin-top: 4px;
color: #94a3b8;
}
.hint-section {
display: flex;
gap: 16px;
margin-top: 24px;
padding-top: 24px;
border-top: 1px solid #e2e8f0;
}
.hint-item {
display: flex;
align-items: center;
gap: 8px;
font-size: 14px;
color: #64748b;
flex: 1;
}
.hint-item svg {
width: 16px;
height: 16px;
flex-shrink: 0;
}
.error-alert {
position: fixed;
bottom: 24px;
left: 50%;
transform: translateX(-50%);
background: #fef2f2;
border: 1px solid #fecaca;
color: #dc2626;
padding: 12px 20px;
border-radius: 12px;
display: flex;
align-items: center;
gap: 8px;
box-shadow: 0 4px 20px rgba(220, 38, 38, 0.1);
z-index: 1000;
}
.error-alert svg {
width: 20px;
height: 20px;
flex-shrink: 0;
}
@keyframes pulse {
0%, 100% {
opacity: 1;
transform: scale(1);
}
50% {
opacity: 0.5;
transform: scale(1.1);
}
}
@keyframes soundwave {
0%, 100% {
height: 20px;
}
50% {
height: 40px;
}
}
@keyframes typing {
0%, 100% {
transform: translateY(0);
}
50% {
transform: translateY(-4px);
}
}
@media (max-width: 640px) {
.voice-recognition-container {
padding: 16px;
}
.header {
flex-direction: column;
gap: 16px;
text-align: center;
}
.title-section {
flex-direction: column;
text-align: center;
}
.button-group {
flex-direction: column;
}
.hint-section {
flex-direction: column;
gap: 12px;
}
}
</style>