Fundamentals of iOS Audio and Video Development



1. Capture Stage

1.1 Video Capture (AVFoundation)

  • Core class: AVCaptureSession

    let captureSession = AVCaptureSession()
    captureSession.sessionPreset = .hd1920x1080 // set the capture resolution
    
  • Detailed flow

    1. Hardware selection

      • Enumerate camera devices (front/back/multi-camera) with AVCaptureDevice.DiscoverySession
      • Pick a supported pixel format (e.g. 420f/YUV) via AVCaptureDevice.Format
    2. Input/output configuration

      let cameraInput = try AVCaptureDeviceInput(device: cameraDevice)
      if captureSession.canAddInput(cameraInput) {
          captureSession.addInput(cameraInput)
      }
      // video output
      let videoOutput = AVCaptureVideoDataOutput()
      videoOutput.setSampleBufferDelegate(self, queue: videoQueue)
      
    3. Frame-rate control

      try cameraDevice.lockForConfiguration()
      cameraDevice.activeVideoMinFrameDuration = CMTime(value: 1, timescale: 30) // 30fps
      cameraDevice.unlockForConfiguration()
      
    4. Dynamic switching: swap cameras at runtime (wrap the change in AVCaptureSession.beginConfiguration/commitConfiguration)
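
The dynamic-switch step above can be sketched as follows. This is a minimal sketch: `switchCamera` is an illustrative helper name, and `currentInput` is assumed to be the input added during setup.

```swift
import AVFoundation

// Sketch: replace the active camera input at runtime.
// beginConfiguration/commitConfiguration batch the changes so the
// session never runs in a half-configured state.
func switchCamera(to position: AVCaptureDevice.Position,
                  in session: AVCaptureSession,
                  replacing currentInput: AVCaptureDeviceInput) throws {
    guard let device = AVCaptureDevice.default(.builtInWideAngleCamera,
                                               for: .video,
                                               position: position) else { return }
    let newInput = try AVCaptureDeviceInput(device: device)

    session.beginConfiguration()
    session.removeInput(currentInput)
    if session.canAddInput(newInput) {
        session.addInput(newInput)
    } else {
        session.addInput(currentInput)   // roll back if the new input is rejected
    }
    session.commitConfiguration()        // apply everything as one transaction
}
```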

1.2 Audio Capture (AVAudioEngine)

  • Core classes: AVAudioEngine + AVAudioInputNode

    let audioEngine = AVAudioEngine()
    let inputNode = audioEngine.inputNode
    let bus = 0 // input bus
    inputNode.installTap(onBus: bus, bufferSize: 1024, format: inputNode.inputFormat(forBus: bus)) { (buffer, time) in
        // raw PCM samples via buffer.floatChannelData
    }
    audioEngine.prepare()
    try audioEngine.start()
    
  • Key details

    • Permissions: check microphone access with AVAudioSession's requestRecordPermission
    • Sample rate: configure via AVAudioFormat (e.g. 44.1 kHz)
    • Multichannel: stereo capture is supported (see buffer.format.channelCount)
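
Requesting the microphone permission mentioned above might look like this. A minimal sketch: note that on iOS 17+ Apple recommends AVAudioApplication for this, but the AVAudioSession form below still works.

```swift
import AVFoundation

// Ask for microphone access before starting the engine tap.
// The system prompt appears only on the first call; later calls
// return the stored decision immediately.
AVAudioSession.sharedInstance().requestRecordPermission { granted in
    if granted {
        // safe to call audioEngine.start()
    } else {
        // direct the user to Settings; capture would fail silently otherwise
    }
}
```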

2. Pre-Processing / Processing Stage

2.1 Video Processing (real-time filters / beautification)

  • Option 1: Core Image (high-level API, simple filters)

    let filter = CIFilter(name: "CIColorInvert")
    filter?.setValue(CIImage(cvPixelBuffer: pixelBuffer), forKey: kCIInputImageKey)
    let outputImage = filter?.outputImage
    
  • Option 2: Metal (GPU-accelerated, high performance)

    • Use MTKView + custom shaders (.metal files)

    • Core steps

      1. Create a CVMetalTextureCache and turn the CMSampleBuffer's pixel buffer into a Metal texture
      2. Submit render commands through an MTLCommandQueue
      3. Apply custom filters (e.g. Gaussian blur, face-detection-driven effects)
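
Step 1 above (pixel buffer → Metal texture) can be sketched like this, assuming a BGRA pixel buffer; the function name is illustrative.

```swift
import Metal
import CoreVideo

// Sketch: wrap a CVPixelBuffer in a Metal texture without copying.
// The texture aliases the pixel buffer's IOSurface backing.
func makeTexture(from pixelBuffer: CVPixelBuffer,
                 cache: CVMetalTextureCache) -> MTLTexture? {
    let width = CVPixelBufferGetWidth(pixelBuffer)
    let height = CVPixelBufferGetHeight(pixelBuffer)
    var cvTexture: CVMetalTexture?
    let status = CVMetalTextureCacheCreateTextureFromImage(
        kCFAllocatorDefault, cache, pixelBuffer, nil,
        .bgra8Unorm, width, height, 0, &cvTexture)
    guard status == kCVReturnSuccess, let cvTexture = cvTexture else { return nil }
    return CVMetalTextureGetTexture(cvTexture)
}

// The cache itself is created once at setup:
// var textureCache: CVMetalTextureCache?
// CVMetalTextureCacheCreate(kCFAllocatorDefault, nil, device, nil, &textureCache)
```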

2.2 Audio Processing (noise suppression / echo cancellation)

  • Use Audio Units (low-latency processing)

    // Create a RemoteIO audio unit to process the audio stream
    var componentDesc = AudioComponentDescription(
        componentType: kAudioUnitType_Output,
        componentSubType: kAudioUnitSubType_RemoteIO,
        componentManufacturer: kAudioUnitManufacturer_Apple,
        componentFlags: 0,
        componentFlagsMask: 0)
    let component = AudioComponentFindNext(nil, &componentDesc)!
    var audioUnit: AudioUnit?
    AudioComponentInstanceNew(component, &audioUnit)
    
    // Install a callback that receives each audio buffer
    var callbackStruct = AURenderCallbackStruct(
        inputProc: audioProcessingCallback,
        inputProcRefCon: nil)
    AudioUnitSetProperty(audioUnit!,
        kAudioOutputUnitProperty_SetInputCallback,
        kAudioUnitScope_Global,
        0,
        &callbackStruct,
        UInt32(MemoryLayout<AURenderCallbackStruct>.size))
    
  • Typical processing

    • WebRTC audio module: integrate libwebrtc's AudioProcessingModule (NS, AEC, AGC)
    • Custom algorithms: frequency-domain (FFT) processing with the Accelerate framework
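
The Accelerate-based FFT mentioned above can be sketched with vDSP's packed real FFT. A minimal sketch: the frame size (a power of two) and the absence of windowing are simplifications.

```swift
import Foundation
import Accelerate

// Sketch: magnitude spectrum of one PCM frame via vDSP's real FFT.
func magnitudeSpectrum(of samples: [Float]) -> [Float] {
    let n = samples.count                       // must be a power of two
    let log2n = vDSP_Length(log2(Float(n)))
    guard let fft = vDSP_create_fftsetup(log2n, FFTRadix(kFFTRadix2)) else { return [] }
    defer { vDSP_destroy_fftsetup(fft) }

    var real = [Float](repeating: 0, count: n / 2)
    var imag = [Float](repeating: 0, count: n / 2)
    var magnitudes = [Float](repeating: 0, count: n / 2)

    real.withUnsafeMutableBufferPointer { realPtr in
        imag.withUnsafeMutableBufferPointer { imagPtr in
            var split = DSPSplitComplex(realp: realPtr.baseAddress!,
                                        imagp: imagPtr.baseAddress!)
            // Pack the interleaved real input into split-complex form.
            samples.withUnsafeBufferPointer { buf in
                buf.baseAddress!.withMemoryRebound(to: DSPComplex.self, capacity: n / 2) {
                    vDSP_ctoz($0, 2, &split, 1, vDSP_Length(n / 2))
                }
            }
            vDSP_fft_zrip(fft, &split, 1, log2n, FFTDirection(FFT_FORWARD))
            vDSP_zvmags(&split, 1, &magnitudes, 1, vDSP_Length(n / 2))
        }
    }
    return magnitudes
}
```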

3. Encoding Stage

3.1 Video Encoding (H.264/H.265 hardware encode)

  • VideoToolbox framework

    // create the compression session
    var compressionSession: VTCompressionSession?
    VTCompressionSessionCreate(
        allocator: kCFAllocatorDefault,
        width: width,
        height: height,
        codecType: kCMVideoCodecType_H264,
        encoderSpecification: nil,
        imageBufferAttributes: nil,
        compressedDataAllocator: nil,
        outputCallback: videoEncodeCallback,
        refcon: nil,
        compressionSessionOut: &compressionSession)
    
    // set key properties (note: the optional session must be unwrapped)
    VTSessionSetProperty(compressionSession!, 
        key: kVTCompressionPropertyKey_RealTime, 
        value: kCFBooleanTrue)
    VTSessionSetProperty(compressionSession!, 
        key: kVTCompressionPropertyKey_ExpectedFrameRate, 
        value: 30 as CFNumber)
    
  • Encode output callback

    func videoEncodeCallback(_ outputCallbackRefCon: UnsafeMutableRawPointer?, 
                            _ sourceFrameRefCon: UnsafeMutableRawPointer?, 
                            _ status: OSStatus, 
                            _ infoFlags: VTEncodeInfoFlags, 
                            _ sampleBuffer: CMSampleBuffer?) {
        // extract the encoded NALU data (SPS/PPS parameter sets, IDR and P frames)
        guard status == noErr, let sampleBuffer = sampleBuffer,
              let dataBuffer = CMSampleBufferGetDataBuffer(sampleBuffer) else { return }
        let blockBufferLength = CMBlockBufferGetDataLength(dataBuffer)
    }
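
Feeding captured frames into the session created above might look like this. A sketch: `pixelBuffer` and `pts` are assumed to come from the capture delegate.

```swift
import VideoToolbox

// Sketch: submit one captured frame to the hardware encoder.
// Compressed output arrives asynchronously in the output callback.
func encode(_ pixelBuffer: CVPixelBuffer,
            pts: CMTime,
            session: VTCompressionSession) {
    VTCompressionSessionEncodeFrame(
        session,
        imageBuffer: pixelBuffer,
        presentationTimeStamp: pts,
        duration: .invalid,      // let the encoder infer duration from PTS deltas
        frameProperties: nil,    // pass kVTEncodeFrameOptionKey_ForceKeyFrame to force an IDR
        sourceFrameRefcon: nil,
        infoFlagsOut: nil)
}
```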
    

3.2 Audio Encoding (AAC/Opus)

  • AudioConverter (software encode)

    var converter: AudioConverterRef?
    AudioConverterNew(&inFormat, &outFormat, &converter)
    
    // input/output buffers
    var inputBufferList = AudioBufferList()
    var outputBufferList = AudioBufferList()
    
    // perform the encode (inInputDataProc supplies source PCM on demand)
    AudioConverterFillComplexBuffer(converter!, 
                                  inInputDataProc, 
                                  &inputData, 
                                  &outputPacketCount, 
                                  &outputBufferList, 
                                  nil)
    
  • Hardware-encode option: write compressed output directly via AVAssetWriter (configured with AVAudioSettings keys)


4. Transport / Storage Stage

4.1 Real-Time Transport (WebRTC/RTP)

  • Core WebRTC flow

    1. Create the PeerConnection

      let config = RTCConfiguration()
      config.iceServers = [RTCIceServer(urlStrings: ["stun:stun.l.google.com:19302"])]
      let pcFactory = RTCPeerConnectionFactory()
      let constraints = RTCMediaConstraints(mandatoryConstraints: nil, optionalConstraints: nil)
      let peerConnection = pcFactory.peerConnection(with: config, constraints: constraints, delegate: nil)
      
    2. Add the audio/video tracks

      let videoTrack = pcFactory.videoTrack(with: videoSource, trackId: "video0")
      peerConnection.add(videoTrack, streamIds: ["stream1"])
      
    3. NAT traversal: exchange ICE candidates; configure STUN/TURN servers
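
The SDP handshake that accompanies the steps above can be sketched as follows, assuming the ObjC WebRTC SDK's Swift-renamed API; `sendToSignalingServer` is a hypothetical transport function supplied by your app.

```swift
// Sketch: create and publish an SDP offer (WebRTC iOS SDK).
let constraints = RTCMediaConstraints(mandatoryConstraints: nil, optionalConstraints: nil)
peerConnection.offer(for: constraints) { sdp, error in
    guard let sdp = sdp, error == nil else { return }
    peerConnection.setLocalDescription(sdp) { _ in
        // Hypothetical transport: deliver the offer SDP to the remote peer,
        // which answers with its own SDP and ICE candidates.
        sendToSignalingServer(sdp.sdp)
    }
}
```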

4.2 Local Storage (MP4/MOV container)

  • Write the file with AVAssetWriter

    let writer = try AVAssetWriter(outputURL: outputURL, fileType: .mp4)
    let videoSettings: [String: Any] = [
        AVVideoCodecKey: AVVideoCodecType.h264,
        AVVideoWidthKey: 1280,
        AVVideoHeightKey: 720
    ]
    let videoInput = AVAssetWriterInput(mediaType: .video, outputSettings: videoSettings)
    writer.add(videoInput)
    
    // append sample buffers
    videoInput.append(sampleBuffer)
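
The append call above only works inside the writer's session lifecycle. A sketch of the full sequence, reusing the `writer` and `videoInput` names from the snippet:

```swift
import AVFoundation

// Sketch: AVAssetWriter lifecycle around per-frame appends.
func record(frames: [CMSampleBuffer],
            writer: AVAssetWriter,
            videoInput: AVAssetWriterInput) {
    writer.startWriting()
    // The source time anchors every subsequent PTS in the file's timeline.
    if let first = frames.first {
        writer.startSession(atSourceTime: CMSampleBufferGetPresentationTimeStamp(first))
    }
    for frame in frames where videoInput.isReadyForMoreMediaData {
        videoInput.append(frame)
    }
    videoInput.markAsFinished()
    writer.finishWriting {
        // the file at writer.outputURL is now a finalized MP4
    }
}
```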
    

5. Decoding and Rendering Stage

5.1 Video Decoding (VideoToolbox hardware decode)

  • Create the decompression session

    var decompressionSession: VTDecompressionSession?
    var callbackRecord = VTDecompressionOutputCallbackRecord(
        decompressionOutputCallback: videoDecodeCallback,
        decompressionOutputRefCon: nil)
    VTDecompressionSessionCreate(
        allocator: kCFAllocatorDefault,
        formatDescription: formatDesc,
        decoderSpecification: nil,
        imageBufferAttributes: nil,
        outputCallback: &callbackRecord,
        decompressionSessionOut: &decompressionSession)
    
  • Render to screen

    • Option 1: AVSampleBufferDisplayLayer displays CMSampleBuffers directly
    • Option 2: render via Metal textures (CVMetalTextureRef)
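
Option 1 above can be sketched as follows; the layer accepts decoded pixel buffers and can even decode compressed H.264 sample buffers itself.

```swift
import AVFoundation

// Sketch: push sample buffers straight to an AVSampleBufferDisplayLayer.
let displayLayer = AVSampleBufferDisplayLayer()
displayLayer.videoGravity = .resizeAspect

func display(_ sampleBuffer: CMSampleBuffer) {
    if displayLayer.isReadyForMoreMediaData {
        // The layer decodes (if needed) and schedules by the buffer's PTS.
        displayLayer.enqueue(sampleBuffer)
    }
}
```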

5.2 Audio Playback (AudioQueue/AVAudioEngine)

  • Low-latency playback (AudioQueue)

    var audioQueue: AudioQueueRef?
    AudioQueueNewOutput(&asbd, 
                       audioQueueOutputCallback, 
                       nil, 
                       nil, 
                       nil, 
                       0, 
                       &audioQueue)
    AudioQueueStart(audioQueue!, nil)
    
    // fill PCM data into a buffer and enqueue it
    AudioQueueEnqueueBuffer(audioQueue!, buffer, 0, nil)
    

Key Issues and Optimizations

  1. Synchronization

    • A/V sync: compute PTS/DTS with CMTime and align the timestamps
    • Frame-drop policy: dynamically lower the video frame rate to keep audio continuous
  2. Performance

    • Encoder parameters: CBR/VBR rate control, GOP length (affects latency)
    • Render thread: keep Metal rendering off the main thread (avoids UI stalls)
  3. Device compatibility

    • Format checks: probe supported formats via AVCaptureDevice.formats
    • Multi-device adaptation: adjust resolution and bitrate dynamically (e.g. iPhone SE vs iPhone 15 Pro)
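
The timestamp-alignment idea in point 1 can be illustrated with plain CMTime arithmetic. A sketch: the 40 ms tolerance is an illustrative value, not a standard.

```swift
import CoreMedia

// Sketch: classify a video frame as early, late, or in sync
// relative to the audio clock, using CMTime comparison.
enum SyncAction { case drop, render, wait }

func syncAction(videoPTS: CMTime, audioClock: CMTime,
                tolerance: CMTime = CMTime(value: 40, timescale: 1000)) -> SyncAction {
    let diff = CMTimeSubtract(videoPTS, audioClock)
    if CMTimeCompare(diff, CMTimeMultiply(tolerance, multiplier: -1)) < 0 {
        return .drop     // frame is too late: skip it to catch up
    } else if CMTimeCompare(diff, tolerance) > 0 {
        return .wait     // frame is early: hold until the audio clock reaches it
    }
    return .render       // within tolerance: present now
}
```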