
🚀 System Design in Practice 203: Real-Time Translation System

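Summary: This article walks through the core architecture, key algorithms, and engineering practices of a real-time translation system, and closes with a complete design and a set of interview talking points.

Have you ever wondered how much engineering sits behind a real-time translation system?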

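1. System Overview

1.1 Business Background

A real-time translation system combines automatic speech recognition (ASR), machine translation (MT), and text-to-speech (TTS) so that people can communicate across languages in real time. Typical uses include international conferences, travel, education, and business.

1.2 Core Features

Speech recognition: real-time speech-to-text
Machine translation: text translation across many languages
Speech synthesis: text-to-speech output
Low-latency processing: end-to-end response within hundreds of milliseconds
Multilingual support: 100+ language pairs

1.3 Technical Challenges

Real-time requirement: end-to-end latency < 500 ms
Accuracy: speech recognition and translation accuracy > 95%
Multilingual processing: speech characteristics differ across languages
Noise handling: speech enhancement in complex acoustic environments
Context understanding: keeping the conversation coherent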

2. Architecture Design

2.1 Overall Architecture

┌───────────────────────────────────────────────────────────────────────────┐
│                 Real-Time Translation System Architecture                 │
├───────────────────────────────────────────────────────────────────────────┤
│  Client Layer                                                             │
│  ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐  │
│  │ Mobile app          │ │ Web client          │ │ Hardware device     │  │
│  └─────────────────────┘ └─────────────────────┘ └─────────────────────┘  │
├───────────────────────────────────────────────────────────────────────────┤
│  Gateway Layer                                                            │
│  ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐  │
│  │ WebSocket gateway   │ │ Load balancing      │ │ Protocol adaptation │  │
│  └─────────────────────┘ └─────────────────────┘ └─────────────────────┘  │
├───────────────────────────────────────────────────────────────────────────┤
│  Processing Pipeline                                                      │
│  ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐  │
│  │ Speech recognition  │ │ Machine translation │ │ Speech synthesis    │  │
│  └─────────────────────┘ └─────────────────────┘ └─────────────────────┘  │
├───────────────────────────────────────────────────────────────────────────┤
│  Model Layer                                                              │
│  ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐  │
│  │ ASR model           │ │ NMT model           │ │ TTS model           │  │
│  └─────────────────────┘ └─────────────────────┘ └─────────────────────┘  │
├───────────────────────────────────────────────────────────────────────────┤
│  Infrastructure Layer                                                     │
│  ┌─────────────────────┐ ┌─────────────────────┐ ┌─────────────────────┐  │
│  │ GPU cluster         │ │ Cache               │ │ Monitoring, alerts  │  │
│  └─────────────────────┘ └─────────────────────┘ └─────────────────────┘  │
└───────────────────────────────────────────────────────────────────────────┘

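2.2 Core Components

2.2.1 Speech Recognition Service

// Time complexity: O(N) in the number of incoming audio segments; the stream buffer is bounded, so extra space per stream is O(1)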

type SpeechRecognitionService struct {
    models      map[string]*ASRModel
    preprocessor AudioPreprocessor
    vad         VoiceActivityDetector
    cache       RecognitionCache
}

type ASRModel struct {
    ModelPath   string
    Language    string
    SampleRate  int
    Session     *tensorflow.Session
    Vocabulary  map[string]int
}

type AudioSegment struct {
    Data       []float32
    SampleRate int
    Duration   time.Duration
    Language   string
    Timestamp  time.Time
}

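func (srs *SpeechRecognitionService) RecognizeStream(audioStream <-chan AudioSegment) (<-chan RecognitionResult, error) {
    resultChan := make(chan RecognitionResult, 100)

    go func() {
        // Deferred calls run in reverse order: wait for every in-flight
        // recognition first, then close the channel. Closing without waiting
        // would let a worker send on a closed channel and panic.
        var wg sync.WaitGroup
        defer close(resultChan)
        defer wg.Wait()

        buffer := NewAudioBuffer(srs.getBufferSize())

        for segment := range audioStream {
            // Preprocess the audio
            processedAudio := srs.preprocessor.Process(segment)

            // Voice activity detection: skip segments without speech
            if !srs.vad.IsVoiceActive(processedAudio) {
                continue
            }

            // Append to the buffer
            buffer.Add(processedAudio)

            // Check whether a complete utterance is available
            if buffer.HasCompleteSegment() {
                completeSegment := buffer.GetCompleteSegment()

                // Recognize asynchronously. Results may arrive out of order;
                // consumers can reorder them by Timestamp.
                wg.Add(1)
                go func(audio AudioSegment) {
                    defer wg.Done()
                    result := srs.recognizeSingle(audio)
                    if result != nil {
                        resultChan <- *result
                    }
                }(completeSegment)
            }
        }
    }()

    return resultChan, nil
}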

func (srs *SpeechRecognitionService) recognizeSingle(audio AudioSegment) *RecognitionResult {
    // Pick the model for the segment's language
    model := srs.models[audio.Language]
    if model == nil {
        model = srs.models["auto"] // fall back to the automatic language-detection model
    }
    if model == nil {
        log.Printf("ASR: no model for language %q and no auto model configured", audio.Language)
        return nil
    }

    // Feature extraction
    features := srs.extractFeatures(audio)

    // Model inference
    logits, err := srs.runInference(model, features)
    if err != nil {
        log.Printf("ASR inference failed: %v", err)
        return nil
    }

    // Decode the logits into text
    text, confidence := srs.decodeLogits(logits, model.Vocabulary)

    return &RecognitionResult{
        Text:       text,
        Confidence: confidence,
        Language:   audio.Language,
        Timestamp:  audio.Timestamp,
        Duration:   audio.Duration,
    }
}

func (srs *SpeechRecognitionService) extractFeatures(audio AudioSegment) [][]float32 {
    // Extract log-mel spectrogram features
    melSpectrogram := srs.computeMelSpectrogram(audio.Data, audio.SampleRate)

    // Normalize
    normalizedFeatures := srs.normalizeFeatures(melSpectrogram)

    return normalizedFeatures
}

func (srs *SpeechRecognitionService) computeMelSpectrogram(audio []float32, sampleRate int) [][]float32 {
    // STFT parameters
    frameLength := 1024 // samples per frame (FFT size)
    hopLength := 256    // samples between consecutive frame starts
    nMels := 80         // number of mel bands

    // Short-time Fourier transform
    stft := srs.computeSTFT(audio, frameLength, hopLength)

    // Power spectrum
    powerSpectrum := srs.computePowerSpectrum(stft)

    // Mel filter bank (frameLength/2+1 frequency bins)
    melFilters := srs.createMelFilters(nMels, frameLength/2+1, sampleRate)

    // Apply the mel filters
    melSpectrogram := srs.applyMelFilters(powerSpectrum, melFilters)

    // Log transform
    logMelSpectrogram := srs.applyLogTransform(melSpectrogram)

    return logMelSpectrogram
}

2.2.2 Machine Translation Service

type MachineTranslationService struct {
    models       map[string]*NMTModel
    tokenizers   map[string]*Tokenizer
    cache        TranslationCache
    contextManager ContextManager
}

type NMTModel struct {
    ModelPath    string
    SourceLang   string
    TargetLang   string
    Session      *tensorflow.Session
    Vocabulary   map[string]map[string]int
    MaxLength    int
}

type TranslationRequest struct {
    Text       string
    SourceLang string
    TargetLang string
    Context    []string
    Domain     string
}

func (mts *MachineTranslationService) Translate(req TranslationRequest) (*TranslationResult, error) {
    // Check the cache. The key should cover every field that affects the
    // output, including Context and Domain.
    cacheKey := mts.buildCacheKey(req)
    if cached, exists := mts.cache.Get(cacheKey); exists {
        return cached.(*TranslationResult), nil
    }

    // Select the model for this language pair
    modelKey := fmt.Sprintf("%s-%s", req.SourceLang, req.TargetLang)
    model := mts.models[modelKey]
    if model == nil {
        // No direct model: pivot through English
        return mts.translateViaEnglish(req)
    }

    // Preprocess the text
    preprocessedText := mts.preprocessText(req.Text, req.SourceLang)

    // Tokenize
    tokenizer := mts.tokenizers[req.SourceLang]
    if tokenizer == nil {
        return nil, fmt.Errorf("no tokenizer for language %q", req.SourceLang)
    }
    tokens := tokenizer.Tokenize(preprocessedText)

    // Prepend the conversation context
    contextTokens := mts.contextManager.GetContextTokens(req.Context, req.SourceLang)
    inputTokens := append(contextTokens, tokens...)

    // Convert tokens to an ID sequence
    inputIDs := mts.tokensToIDs(inputTokens, model.Vocabulary[req.SourceLang])

    // Model inference; runTranslation returns nil on failure
    outputIDs, attention := mts.runTranslation(model, inputIDs)
    if outputIDs == nil {
        return nil, errors.New("translation inference failed")
    }

    // Decode
    outputTokens := mts.idsToTokens(outputIDs, model.Vocabulary[req.TargetLang])
    translatedText := mts.detokenize(outputTokens, req.TargetLang)

    // Postprocess
    finalText := mts.postprocessText(translatedText, req.TargetLang)

    result := &TranslationResult{
        TranslatedText: finalText,
        SourceLang:     req.SourceLang,
        TargetLang:     req.TargetLang,
        Confidence:     mts.calculateConfidence(attention),
        Timestamp:      time.Now(),
    }

    // Cache the result for one hour
    mts.cache.Set(cacheKey, result, 1*time.Hour)

    return result, nil
}
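
`buildCacheKey` is not shown above. One plausible shape, assuming the key must cover every field that can change the translation (including the context, since context tokens are fed to the model), is a hash over length-prefixed fields. The function name `BuildCacheKey` and the local copy of `TranslationRequest` are part of this sketch only.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
)

// TranslationRequest mirrors the request type from the article.
type TranslationRequest struct {
	Text       string
	SourceLang string
	TargetLang string
	Context    []string
	Domain     string
}

// BuildCacheKey hashes all output-affecting fields into a fixed-size key.
func BuildCacheKey(req TranslationRequest) string {
	h := sha256.New()
	// Length-prefix each field so ("ab", "c") and ("a", "bc") hash differently.
	write := func(s string) { fmt.Fprintf(h, "%d:%s|", len(s), s) }
	write(req.SourceLang)
	write(req.TargetLang)
	write(req.Domain)
	write(strings.TrimSpace(req.Text))
	for _, c := range req.Context {
		write(c)
	}
	return hex.EncodeToString(h.Sum(nil))
}

func main() {
	req := TranslationRequest{Text: "good morning", SourceLang: "en", TargetLang: "es"}
	fmt.Println(BuildCacheKey(req))
}
```

Including the context lowers the hit rate, so a system that mostly translates short, context-free phrases might choose to cache only requests with an empty context.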

func (mts *MachineTranslationService) runTranslation(model *NMTModel, inputIDs []int) ([]int, [][]float32) {
    // Build the input tensor
    inputTensor, err := tensorflow.NewTensor([][]int32{int32Slice(inputIDs)})
    if err != nil {
        log.Printf("Failed to create input tensor: %v", err)
        return nil, nil
    }
    
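    // Run inference. Note: Session in the official TensorFlow Go binding has no
    // Graph() method, so this listing assumes a wrapper type that provides one.
    // With the stock binding, keep the *Graph obtained at load time and call
    // Operation on it instead.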
    results, err := model.Session.Run(
        map[tensorflow.Output]*tensorflow.Tensor{
            model.Session.Graph().Operation("input_ids").Output(0): inputTensor,
        },
        []tensorflow.Output{
            model.Session.Graph().Operation("output_ids").Output(0),
            model.Session.Graph().Operation("attention_weights").Output(0),
        },
        nil,
    )
    
    if err != nil {
        log.Printf("Translation inference failed: %v", err)
        return nil, nil
    }
    
    // Extract the results
    outputIDs := results[0].Value().([][]int32)[0]
    attentionWeights := results[1].Value().([][]float32)
    
    return int32SliceToInt(outputIDs), attentionWeights
}

// Beam search decoding
func (mts *MachineTranslationService) beamSearchDecode(model *NMTModel, inputIDs []int, beamSize int) []int {
    type BeamState struct {
        Sequence []int
        Score    float64
    }
    
    // Initialize the beam with the start token
    beams := []*BeamState{
        {Sequence: []int{mts.getStartTokenID()}, Score: 0.0},
    }
    
    maxLength := model.MaxLength
    
    for step := 0; step < maxLength; step++ {
        candidates := make([]*BeamState, 0)
        
        for _, beam := range beams {
            if mts.isEndToken(beam.Sequence[len(beam.Sequence)-1]) {
                candidates = append(candidates, beam)
                continue
            }
            
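            // Get the probability distribution over the next token
            logits := mts.getNextTokenLogits(model, inputIDs, beam.Sequence)
            topK := mts.getTopK(logits, beamSize)
            
            for _, candidate := range topK {
                // Copy before appending: appending to the shared beam.Sequence
                // could let sibling candidates overwrite each other's last token.
                newSequence := make([]int, len(beam.Sequence), len(beam.Sequence)+1)
                copy(newSequence, beam.Sequence)
                newSequence = append(newSequence, candidate.TokenID)
                newScore := beam.Score + math.Log(candidate.Probability)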
                
                candidates = append(candidates, &BeamState{
                    Sequence: newSequence,
                    Score:    newScore,
                })
            }
        }
        
        // Keep the top-k candidates by score
        sort.Slice(candidates, func(i, j int) bool {
            return candidates[i].Score > candidates[j].Score
        })
        
        if len(candidates) > beamSize {
            candidates = candidates[:beamSize]
        }
        
        beams = candidates
        
        // Stop early once every beam has ended
        allEnded := true
        for _, beam := range beams {
            if !mts.isEndToken(beam.Sequence[len(beam.Sequence)-1]) {
                allEnded = false
                break
            }
        }
        
        if allEnded {
            break
        }
    }
    
    // Return the highest-scoring sequence
    if len(beams) > 0 {
        return beams[0].Sequence
    }
    
    return []int{}
}

2.2.3 Speech Synthesis Service

type TextToSpeechService struct {
    models     map[string]*TTSModel
    vocoders   map[string]*Vocoder
    cache      SynthesisCache
    processor  AudioProcessor
}

type TTSModel struct {
    ModelPath  string
    Language   string
    Voice      string
    SampleRate int
    Session    *tensorflow.Session
}

type SynthesisRequest struct {
    Text     string
    Language string
    Voice    string
    Speed    float64
    Pitch    float64
}

func (tts *TextToSpeechService) Synthesize(req SynthesisRequest) (*AudioResult, error) {
    // Check the cache
    cacheKey := tts.buildCacheKey(req)
    if cached, exists := tts.cache.Get(cacheKey); exists {
        return cached.(*AudioResult), nil
    }

    // Select the acoustic model and the vocoder
    modelKey := fmt.Sprintf("%s-%s", req.Language, req.Voice)
    model := tts.models[modelKey]
    vocoder := tts.vocoders[req.Language]

    if model == nil || vocoder == nil {
        return nil, errors.New("model or vocoder not found")
    }

    // Preprocess the text
    processedText := tts.preprocessText(req.Text, req.Language)

    // Convert text to phonemes
    phonemes := tts.textToPhonemes(processedText, req.Language)

    // Generate the mel spectrogram; nil means inference failed
    melSpectrogram := tts.generateMelSpectrogram(model, phonemes, req)
    if melSpectrogram == nil {
        return nil, errors.New("mel spectrogram generation failed")
    }

    // Apply prosody adjustments (speed and pitch)
    adjustedMel := tts.adjustProsody(melSpectrogram, req.Speed, req.Pitch)

    // Synthesize the waveform with the vocoder; nil means inference failed
    audioWaveform := vocoder.Synthesize(adjustedMel)
    if audioWaveform == nil {
        return nil, errors.New("vocoder synthesis failed")
    }

    // Postprocess
    finalAudio := tts.processor.PostProcess(audioWaveform, model.SampleRate)

    result := &AudioResult{
        AudioData:  finalAudio,
        SampleRate: model.SampleRate,
        // samples / sampleRate, expressed as a time.Duration
        Duration: time.Duration(len(finalAudio)) * time.Second / time.Duration(model.SampleRate),
        Format:   "wav",
    }

    // Cache the result for 24 hours
    tts.cache.Set(cacheKey, result, 24*time.Hour)

    return result, nil
}

func (tts *TextToSpeechService) generateMelSpectrogram(model *TTSModel, phonemes []string, req SynthesisRequest) [][]float32 {
    // Encode phonemes as IDs
    phonemeIDs := tts.phonemesToIDs(phonemes)
    
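    // Build the input tensor
    inputTensor, err := tensorflow.NewTensor([][]int32{int32Slice(phonemeIDs)})
    if err != nil {
        log.Printf("Failed to create TTS input tensor: %v", err)
        return nil
    }
    
    // Run model inference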
    results, err := model.Session.Run(
        map[tensorflow.Output]*tensorflow.Tensor{
            model.Session.Graph().Operation("phoneme_ids").Output(0): inputTensor,
        },
        []tensorflow.Output{
            model.Session.Graph().Operation("mel_spectrogram").Output(0),
        },
        nil,
    )
    
    if err != nil {
        log.Printf("TTS inference failed: %v", err)
        return nil
    }
    
    melSpectrogram := results[0].Value().([][]float32)
    return melSpectrogram
}

type Vocoder struct {
    ModelPath  string
    Session    *tensorflow.Session
    SampleRate int
}

func (v *Vocoder) Synthesize(melSpectrogram [][]float32) []float32 {
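    // Build the input tensor (the outer slice adds a batch dimension)
    inputTensor, err := tensorflow.NewTensor([][][]float32{melSpectrogram})
    if err != nil {
        log.Printf("Failed to create vocoder input tensor: %v", err)
        return nil
    }
    
    // Run vocoder inference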
    results, err := v.Session.Run(
        map[tensorflow.Output]*tensorflow.Tensor{
            v.Session.Graph().Operation("mel_input").Output(0): inputTensor,
        },
        []tensorflow.Output{
            v.Session.Graph().Operation("audio_output").Output(0),
        },
        nil,
    )
    
    if err != nil {
        log.Printf("Vocoder synthesis failed: %v", err)
        return nil
    }
    
    audioWaveform := results[0].Value().([]float32)
    return audioWaveform
}

3. Real-Time Processing Pipeline

3.1 Streaming Architecture

type RealtimeTranslationPipeline struct {
    asrService *SpeechRecognitionService
    mtService  *MachineTranslationService
    ttsService *TextToSpeechService
    buffer     *StreamBuffer
    latencyTracker LatencyTracker
}

func (rtp *RealtimeTranslationPipeline) ProcessAudioStream(audioStream <-chan AudioSegment, sourceLang, targetLang string) (<-chan AudioResult, error) {
    outputChan := make(chan AudioResult, 10)

    go func() {
        defer close(outputChan)

        // Start the speech recognition stream
        recognitionResults, err := rtp.asrService.RecognizeStream(audioStream)
        if err != nil {
            log.Printf("ASR stream failed: %v", err)
            return
        }

        // Handle each recognition result
        for result := range recognitionResults {
            // The clock starts after ASR, so this latency covers MT + TTS only.
            // If result.Timestamp is the capture time, time.Since(result.Timestamp)
            // gives a figure closer to true end-to-end latency.
            startTime := time.Now()

            // Translate (TranslationRequest is a package-level type)
            translationReq := TranslationRequest{
                Text:       result.Text,
                SourceLang: sourceLang,
                TargetLang: targetLang,
            }

            translationResult, err := rtp.mtService.Translate(translationReq)
            if err != nil {
                log.Printf("Translation failed: %v", err)
                continue
            }

            // Synthesize speech (SynthesisRequest is a package-level type)
            synthesisReq := SynthesisRequest{
                Text:     translationResult.TranslatedText,
                Language: targetLang,
                Voice:    "default",
                Speed:    1.0,
                Pitch:    1.0,
            }

            audioResult, err := rtp.ttsService.Synthesize(synthesisReq)
            if err != nil {
                log.Printf("TTS failed: %v", err)
                continue
            }

            // Record the latency
            latency := time.Since(startTime)
            rtp.latencyTracker.RecordLatency(latency)

            // Emit a copy: audioResult may be the pointer held by the TTS cache,
            // and mutating it in place would leak per-request fields into the cache.
            out := *audioResult
            out.SourceText = result.Text
            out.TranslatedText = translationResult.TranslatedText
            out.Latency = latency

            outputChan <- out
        }
    }()

    return outputChan, nil
}

3.2 Latency Optimization

type LatencyOptimizer struct {
    asrOptimizer *ASROptimizer
    mtOptimizer  *MTOptimizer
    ttsOptimizer *TTSOptimizer
    pipelineOptimizer *PipelineOptimizer
}

type ASROptimizer struct {
    chunkSize    int
    overlapRatio float64
    vadThreshold float64
}

func (ao *ASROptimizer) OptimizeChunking(audioStream <-chan AudioSegment) <-chan AudioSegment {
    optimizedStream := make(chan AudioSegment, 10)
    
    go func() {
        defer close(optimizedStream)
        
        buffer := NewCircularBuffer(ao.chunkSize)
        
        for segment := range audioStream {
            buffer.Add(segment)
            
            // Check whether enough data has accumulated to process
            if buffer.HasEnoughData() {
                chunk := buffer.GetOptimalChunk()
                optimizedStream <- chunk
            }
        }
    }()
    
    return optimizedStream
}

type MTOptimizer struct {
    cache          TranslationCache
    prefixCache    PrefixCache
    batchProcessor BatchProcessor
}

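func (mo *MTOptimizer) OptimizeTranslation(text string, sourceLang, targetLang string) (*TranslationResult, error) {
    // 1. Check the prefix cache
    if prefixResult := mo.prefixCache.GetLongestPrefix(text, sourceLang, targetLang); prefixResult != nil {
        // Translate only the remainder. This assumes SourcePrefix is a true
        // byte prefix of text; otherwise the slice below would be wrong.
        remainingText := text[len(prefixResult.SourcePrefix):]
        if remainingText == "" {
            return prefixResult.Result, nil
        }

        // Translate the remainder and merge. Translating the tail in isolation
        // trades some quality for speed, since word order can differ across
        // the prefix boundary in the target language.
        remainingResult, err := mo.translateRemaining(remainingText, sourceLang, targetLang)
        if err != nil {
            return nil, err
        }

        mergedResult := mo.mergeResults(prefixResult.Result, remainingResult)
        return mergedResult, nil
    }

    // 2. Fall back to the normal translation path
    return mo.translateNormal(text, sourceLang, targetLang)
}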
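
`PrefixCache` is not defined above. The following sketch shows one simple way `GetLongestPrefix` could work: try the full text, then repeatedly drop the last word until a cached entry is found. It assumes space-delimited source text and an in-memory map; the type and method names here are illustrative, and languages without spaces between words would need a tokenizer-based boundary instead.

```go
package main

import (
	"fmt"
	"strings"
)

// PrefixCache maps a source-text prefix to its cached translation.
type PrefixCache struct {
	entries map[string]string
}

func NewPrefixCache() *PrefixCache {
	return &PrefixCache{entries: map[string]string{}}
}

func (c *PrefixCache) Put(source, translation string) {
	c.entries[source] = translation
}

// LongestPrefix returns the longest cached prefix of text that ends on a
// word boundary, its translation, and the untranslated remainder.
func (c *PrefixCache) LongestPrefix(text string) (prefix, translation, rest string, ok bool) {
	candidate := text
	for candidate != "" {
		if t, hit := c.entries[candidate]; hit {
			return candidate, t, strings.TrimSpace(text[len(candidate):]), true
		}
		cut := strings.LastIndex(candidate, " ")
		if cut < 0 {
			break
		}
		candidate = candidate[:cut] // drop the last word and retry
	}
	return "", "", text, false
}

func main() {
	cache := NewPrefixCache()
	cache.Put("good morning", "buenos días")

	prefix, translation, rest, ok := cache.LongestPrefix("good morning everyone")
	fmt.Printf("%q %q %q %v\n", prefix, translation, rest, ok)
}
```

This pattern fits streaming ASR well, because each partial transcript usually extends the previous one, so the previous translation is often the longest cached prefix.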

type TTSOptimizer struct {
    streamingSynthesis bool
    chunkSize         int
    parallelVocoders  int
}

func (to *TTSOptimizer) OptimizeSynthesis(text string, language, voice string) (*AudioResult, error) {
    if !to.streamingSynthesis {
        return to.synthesizeNormal(text, language, voice)
    }

    // Sentence-level synthesis: split the text, synthesize the sentences in
    // parallel, then concatenate them in their original order
    sentences := to.splitIntoSentences(text, language)
    audioChunks := make([][]float32, len(sentences))
    errs := make([]error, len(sentences))

    // Parallel synthesis, bounded by parallelVocoders (must be at least 1)
    var wg sync.WaitGroup
    semaphore := make(chan struct{}, to.parallelVocoders)

    for i, sentence := range sentences {
        wg.Add(1)
        go func(idx int, sent string) {
            defer wg.Done()

            semaphore <- struct{}{}        // acquire a slot
            defer func() { <-semaphore }() // release the slot

            audio, err := to.synthesizeSentence(sent, language, voice)
            if err != nil {
                errs[idx] = err
                return
            }

            audioChunks[idx] = audio
        }(i, sentence)
    }

    wg.Wait()

    // Fail instead of silently dropping a sentence from the output
    for idx, err := range errs {
        if err != nil {
            return nil, fmt.Errorf("synthesize sentence %d: %w", idx, err)
        }
    }

    // Concatenate the audio
    finalAudio := to.concatenateAudio(audioChunks)

    return &AudioResult{
        AudioData:  finalAudio,
        SampleRate: 22050, // hard-coded here; should match the TTS model's sample rate
        Duration:   time.Duration(len(finalAudio)) * time.Second / 22050,
        Format:     "wav",
    }, nil
}

4. Performance Monitoring

4.1 Latency Monitoring

type LatencyMonitor struct {
    metrics map[string]*LatencyMetrics
    alerts  AlertManager
}

type LatencyMetrics struct {
    P50    time.Duration
    P95    time.Duration
    P99    time.Duration
    Mean   time.Duration
    Count  int64
    Window time.Duration
}

func (lm *LatencyMonitor) RecordLatency(component string, latency time.Duration) {
    metrics := lm.metrics[component]
    if metrics == nil {
        metrics = &LatencyMetrics{Window: 5 * time.Minute}
        lm.metrics[component] = metrics
    }
    
    metrics.addSample(latency)
    
    // Alert when P95 exceeds the component's threshold
    if metrics.P95 > lm.getThreshold(component) {
        lm.alerts.TriggerAlert(&Alert{
            Component: component,
            Metric:    "p95_latency",
            Value:     metrics.P95,
            Threshold: lm.getThreshold(component),
            Severity:  SeverityHigh,
        })
    }
}

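With an optimized streaming pipeline and modern AI models, a real-time translation system can deliver low-latency, high-quality cross-language communication and provide solid technical support for global exchange.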

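🎯 Scenario

You open an app on your phone to use a real-time translation service. The action looks simple, but behind it the system faces three core challenges:

Challenge 1: High concurrency. How does the system keep latency low at a peak of tens of thousands of QPS?
Challenge 2: High availability. How does the service stay up when nodes fail?
Challenge 3: Data consistency. How is data kept correct in a distributed environment?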

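📈 Capacity Estimation

Assume 10 million DAU and 50 requests per user per day.

Metric	Value
Daily active users	10 million
Peak QPS	~50,000/s
Data storage	~5 TB
P99 latency	< 100 ms
Availability	99.99%
Daily data growth	~50 GB
Service nodes	20-50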
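
The figures above can be sanity-checked with a few lines of arithmetic. The sketch below starts from the stated assumption of 10 million DAU and 50 requests per user per day, and derives the average QPS, the peak-to-average ratio implied by a 50,000 QPS peak, and the stored bytes per request implied by 50 GB of daily growth. Decimal units (1 GB = 10^9 bytes) are an assumption of this example.

```go
package main

import "fmt"

const secondsPerDay = 86400

// AverageQPS spreads the daily request volume evenly over one day.
func AverageQPS(dau, requestsPerUser int64) float64 {
	return float64(dau*requestsPerUser) / secondsPerDay
}

// BytesPerRequest is the stored bytes per request implied by daily growth.
func BytesPerRequest(dailyGrowthBytes, dau, requestsPerUser int64) float64 {
	return float64(dailyGrowthBytes) / float64(dau*requestsPerUser)
}

func main() {
	const (
		dau             = 10_000_000
		requestsPerUser = 50
		peakQPS         = 50_000.0
		dailyGrowth     = 50_000_000_000    // 50 GB
		totalStorage    = 5_000_000_000_000 // 5 TB
	)
	avg := AverageQPS(dau, requestsPerUser)
	fmt.Printf("average QPS: %.0f\n", avg)
	fmt.Printf("peak/average: %.1fx\n", peakQPS/avg)
	fmt.Printf("bytes per request: %.0f\n", BytesPerRequest(dailyGrowth, dau, requestsPerUser))
	fmt.Printf("days of data in 5 TB: %d\n", totalStorage/dailyGrowth)
}
```

The average works out to roughly 5,800 QPS, so a 50,000 QPS peak is about 8.6 times the average, and 50 GB per day corresponds to about 100 bytes stored per request. That is plausible for text and metadata but implies that raw audio is not being retained.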

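❓ Frequently Asked Interview Questions

Q1: What are the core design principles of a real-time translation system?

Drawing on the architecture section above, the core principles are high availability (automatic failure recovery), high performance (low latency, high throughput), scalability (horizontal scaling), and consistency (data correctness guarantees). In an interview, tie each principle to a concrete scenario.

Q2: What are the main challenges for a real-time translation system at large scale?

1) Performance bottlenecks: as data volume and request rates grow, a single node cannot carry the load; 2) consistency: guaranteeing data consistency in a distributed environment; 3) failure recovery: automatic failover and data recovery when nodes fail; 4) operational complexity: cluster management, monitoring, and upgrades.

Q3: How do you keep a real-time translation system highly available?

1) Multi-replica redundancy (at least 3 replicas); 2) automatic failure detection and failover (heartbeats plus leader election); 3) data persistence and backups; 4) rate limiting and graceful degradation (to prevent cascading failures); 5) multi-datacenter or active-active deployment.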

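Q4: What are the key performance optimizations for a real-time translation system?

1) Caching (fewer repeated computations and less I/O); 2) asynchronous processing (moving non-critical work off the request path); 3) batching (fewer network round trips); 4) data sharding (parallel processing); 5) connection pool reuse.

Q5: How does a real-time translation system compare with similar solutions?

See the comparison table below. When choosing, weigh the team's tech stack, data scale, latency requirements, consistency needs, and operating cost. There is no silver bullet; the trade-offs depend on the business scenario.

| Option 1 | Simple implementation | Low | Fits small scale |
| Option 2 | Medium complexity | Medium | Fits medium scale |
| Option 3 | High complexity ⭐ recommended | High | Fits large-scale production |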

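🚀 Architecture Evolution Path

Stage 1: Single-machine MVP (fewer than 100,000 users)

Monolithic application + single-node database
Feature validation first, fast iteration
Fits: early product validation

Stage 2: Basic distributed setup (100,000 to 1 million users)

Horizontally scaled application tier (stateless services + load balancing)
Primary/replica database with read/write splitting
Redis cache for hot data
Fits: business growth phase

Stage 3: Production-grade high availability (more than 1 million users)

Microservice split with independent deployment and scaling
Database sharding (by business dimension)
Message queue to decouple asynchronous flows
Multi-datacenter deployment with cross-region disaster recovery
End-to-end monitoring + automated operations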

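✅ Architecture Design Checklist

Item	Status	Notes
High availability	✅	Multi-replica deployment, automatic failover, 99.99% SLA
Scalability	✅	Stateless services scale horizontally; data layer is sharded
Data consistency	✅	Strong consistency on core paths, eventual consistency elsewhere
Security	✅	Authentication and authorization + encryption + audit logs
Monitoring and alerting	✅	Metrics + logging + tracing
Disaster recovery	✅	Multi-datacenter deployment, regular backups, RPO < 1 minute
Performance	✅	Multi-level caching + asynchronous processing + connection pooling
Gradual rollout	✅	Rollout by user or region, with fast rollback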
