Hardware-Decoding H.264 with VideoToolbox

Preface

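VideoToolbox is a framework Apple introduced in iOS 8 for hardware video encoding and decoding. What is usually called software encoding/decoding, by contrast, means doing the encoding and decoding with a third-party library such as FFmpeg.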

1. Getting the encoded video data

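Audio and video received over the network usually come in a container format such as MP4 or FLV. In that case my approach is to parse the media file with FFmpeg, separate out the audio and video streams, and then decode them; see 小东邪's blog post on this: iOS利用VideoToolbox实现视频硬解码 (hardware video decoding on iOS with VideoToolbox).
Another approach is to read a raw H.264 file directly; see 落影loyinglin's post: 使用VideoToolbox硬解码H.264 (hardware-decoding H.264 with VideoToolbox).
My own approach is to take the encoded video data frameData directly (decoding while encoding) and decode it, and the rest of this article follows that approach. This article covers H.264 decoding only; readers interested in H.265 decoding can study 小东邪's blog.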
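The decoder in this article assumes every incoming frame holds exactly one NALU behind a 4-byte start code. If you instead read a raw .h264 (Annex-B) file, as in the second approach above, the byte stream first has to be split at its start codes. Below is a minimal plain-C sketch of that splitting step; `find_start_code` and `next_nalu` are hypothetical helper names, and the sketch accepts both the 3-byte (00 00 01) and the 4-byte (00 00 00 01) start-code forms.

```c
#include <stddef.h>
#include <stdint.h>

/* Offset of the next Annex-B start code at or after `from`, or `len` if none.
 * *sc_len receives its length: 4 for 00 00 00 01, 3 for 00 00 01, 0 if none. */
static size_t find_start_code(const uint8_t *buf, size_t len, size_t from, size_t *sc_len)
{
    for (size_t i = from; i + 3 <= len; i++) {
        if (buf[i] != 0 || buf[i + 1] != 0) {
            continue;
        }
        if (i + 4 <= len && buf[i + 2] == 0 && buf[i + 3] == 1) {
            *sc_len = 4;
            return i;
        }
        if (buf[i + 2] == 1) {
            *sc_len = 3;
            return i;
        }
    }
    *sc_len = 0;
    return len;
}

/* Iterator over the NAL units of an Annex-B buffer. Set *cursor to 0 before the
 * first call. Returns 1 and points *nalu/*size at the next NALU (start code
 * stripped), or returns 0 once the buffer is exhausted. */
static int next_nalu(const uint8_t *buf, size_t len, size_t *cursor,
                     const uint8_t **nalu, size_t *size)
{
    size_t sc_len = 0;
    size_t pos = find_start_code(buf, len, *cursor, &sc_len);
    while (pos < len) {
        size_t start = pos + sc_len;
        size_t next_sc_len = 0;
        size_t next = find_start_code(buf, len, start, &next_sc_len);
        *cursor = next;
        if (next > start) {
            *nalu = buf + start;
            *size = next - start;
            return 1;
        }
        pos = next;             /* empty NALU between two start codes: skip it */
        sc_len = next_sc_len;
    }
    *cursor = len;
    return 0;
}
```

Each NALU produced this way has no start code in front of it, so to reuse `decodeNaluData:size:` unchanged you would prepend the four bytes 00 00 00 01 to it first.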

2. Decoding

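Take the externally supplied data (NSData *)frame, convert it to uint8_t *, get its size frame.length, and call the internal method - (void)decodeNaluData:(uint8_t *)frame size:(uint32_t)size. SPS and PPS data do not need to be decoded; they are read and stored for later use.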

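- (void)decodeNaluData:(NSData *)frame
{
    dispatch_async(_decodeQueue, ^{
        // The NALU type decides whether the data needs decoding (SPS/PPS are not decoded, only stored)
        // 1. Get the raw bytes, NSData -> bytes. decodeNaluData:size: overwrites the start code
        //    in place, so work on a mutable copy instead of casting away the const of frame.bytes
        NSMutableData *mutableFrame = [frame mutableCopy];
        uint8_t *naluBytes = (uint8_t *)mutableFrame.mutableBytes;
        // 2. Call the decoding method
        [self decodeNaluData:naluBytes size:(uint32_t)mutableFrame.length];
    });
}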

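Get nalu_type: 5 means an IDR (I) frame, 7 means SPS, 8 means PPS.
Compute naluSize: the total length minus 4 is the real data length, where 4 is the size of the start code (startCode). Before the frame is decoded, these 4 bytes are overwritten with naluSize in big-endian byte order, because the decoder is set up for NALUs prefixed by a 4-byte length (AVCC format) rather than by start codes.
Handle each data type accordingly:
- For I-frame data, call - (CVPixelBufferRef)decode:(uint8_t *)frame withSize:(uint32_t)frameSize; the decoder must be initialized before this.
- For SPS, record its size in _spsSize and save its bytes in _sps.
- For PPS, likewise record its size in _ppsSize and save its bytes in _pps.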
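The header arithmetic above can be sketched in plain C. The H.264 NAL header is a single byte laid out as forbidden_zero_bit (1 bit), nal_ref_idc (2 bits) and nal_unit_type (5 bits), which is why `& 0x1F` yields the type. The helper names below and the `classify_frame` wrapper are hypothetical; like the article's code, `classify_frame` assumes one NALU behind a 4-byte start code.

```c
#include <stdint.h>

/* nal_unit_type values used in this article (H.264 Table 7-1). */
enum {
    H264_NAL_SLICE = 1,   /* non-IDR slice: P/B frames */
    H264_NAL_IDR   = 5,   /* IDR slice: key frame */
    H264_NAL_SEI   = 6,
    H264_NAL_SPS   = 7,
    H264_NAL_PPS   = 8
};

/* One-byte header: forbidden_zero_bit(1) | nal_ref_idc(2) | nal_unit_type(5). */
static int h264_nal_type(uint8_t header) { return header & 0x1F; }
static int h264_nal_ref_idc(uint8_t header) { return (header >> 5) & 0x03; }

/* For a frame holding one NALU behind a 4-byte start code (as in
 * decodeNaluData:size:), return the NALU type and store the payload size
 * (total size minus the start code). Returns -1 if the frame is too short. */
static int classify_frame(const uint8_t *frame, uint32_t size, uint32_t *payload_size)
{
    if (size <= 4) {
        return -1;
    }
    *payload_size = size - 4;
    return h264_nal_type(frame[4]);
}
```

The `default` branch of the switch below forwards every type other than 5, 7 and 8 to the decoder. Type 1 is the ordinary non-IDR (P/B) slice, whereas types such as 6 (SEI) carry no picture data, so filtering on the type before decoding is worth considering.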

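- (void)decodeNaluData:(uint8_t *)frame size:(uint32_t)size
{
    // Need at least the 4-byte start code plus the 1-byte NALU header
    if (size <= 4) {
        return;
    }
    // Layout of frame: the first 4 bytes are the start code 00 00 00 01
    // and the real data starts at the 5th byte; as a decimal type, 5: I frame, 7: SPS, 8: PPS
    /*
     For H.264 data, header byte & 0x1F gives the NALU type.
     For H.265 data the header is 2 bytes and the type is (first byte & 0x7E) >> 1.
     */
    int nalu_type = (frame[4] & 0x1F);
    
    // 2. The total length minus the 4-byte start code is the length of the NALU data
    uint32_t naluSize = size - 4;
    // Overwrite the start code with naluSize in big-endian order (host order, little-endian, to big-endian).
    // The encoder converted big-endian to host order, so this is the inverse operation.
    // Writing the bytes one by one is equivalent:
//    frame[0] = (uint8_t)(naluSize >> 24);
//    frame[1] = (uint8_t)(naluSize >> 16);
//    frame[2] = (uint8_t)(naluSize >> 8);
//    frame[3] = (uint8_t)(naluSize);
    uint32_t bigEndianSize = CFSwapInt32HostToBig(naluSize);
    memcpy(frame, &bigEndianSize, sizeof(bigEndianSize));
    
    CVPixelBufferRef pixelBuffer = NULL;
    // When is the decoder initialized? The first time a key frame arrives
    switch (nalu_type) {
        case 0x05: {
            // I frame data
            if ([self initDecoder]) {
                pixelBuffer = [self decode:frame withSize:size];
            }
        }
            break;
        case 0x07: {
            // Save the SPS, freeing any earlier copy so that repeated SPS NALUs do not leak
            free(_sps);
            _spsSize = naluSize;
            // Allocate memory
            _sps = malloc(_spsSize);
            // Copy _spsSize bytes, starting at the 5th byte of frame, into _sps
            memcpy(_sps, &frame[4], _spsSize);
        }
            break;
        case 0x08: {
            // Save the PPS the same way
            free(_pps);
            _ppsSize = naluSize;
            _pps = malloc(_ppsSize);
            memcpy(_pps, &frame[4], _ppsSize);
        }
            break;
        default: {
            // B/P frame data (type 1, non-IDR slice); note that other types such as SEI (6) also end up here
            if ([self initDecoder]) {
                pixelBuffer = [self decode:frame withSize:size];
            }
        }
            break;
    }
}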
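The byte-order step can also be written without CoreFoundation. The sketch below (hypothetical name `annexb_to_avcc_in_place`) overwrites a 4-byte start code with the NALU length in big-endian order using shifts, which yields the same bytes as `CFSwapInt32HostToBig` followed by `memcpy`. Like the article's code, it assumes the buffer holds exactly one NALU.

```c
#include <stdint.h>

/* Replace a 4-byte Annex-B start code (00 00 00 01) at the front of `frame`
 * with the NALU length in big-endian byte order. Returns 0 on success, -1 if
 * the buffer does not start with a 4-byte start code. */
static int annexb_to_avcc_in_place(uint8_t *frame, uint32_t size)
{
    if (size < 4 || frame[0] != 0 || frame[1] != 0 || frame[2] != 0 || frame[3] != 1) {
        return -1;
    }
    uint32_t nalu_size = size - 4;
    frame[0] = (uint8_t)(nalu_size >> 24);   /* most significant byte first */
    frame[1] = (uint8_t)(nalu_size >> 16);
    frame[2] = (uint8_t)(nalu_size >> 8);
    frame[3] = (uint8_t)nalu_size;
    return 0;
}
```

Writing each byte explicitly keeps the result independent of host endianness and never touches the length variable itself.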

2.1 Initializing the decoder

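Using the saved SPS and PPS, call CMVideoFormatDescriptionCreateFromH264ParameterSets() to create the format description CMVideoFormatDescriptionRef.
Set the attributes of the decoded (destination) pixel buffers:
- pixel format, YUV 4:2:0: kCVPixelBufferPixelFormatTypeKey
- video width: kCVPixelBufferWidthKey
- video height: kCVPixelBufferHeightKey
- OpenGL compatibility, so that an OpenGL context can draw the decoded image directly without copying data between the bus and the CPU (the "zero-copy" path): kCVPixelBufferOpenGLCompatibilityKey
Set up the callback invoked when decoding completes: VTDecompressionOutputCallbackRecord.
Call VTDecompressionSessionCreate() to create the decompression session VTDecompressionSessionRef, and set the real-time decoding property kVTDecompressionPropertyKey_RealTime.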
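initDecoder can only succeed once both parameter sets have been stored, and since many live-streaming setups resend SPS/PPS with every key frame, the stored copies get replaced over and over. Below is a minimal plain-C sketch of an owner for these buffers, assuming a hypothetical `param_sets` struct that mirrors `_sps`/`_spsSize` and `_pps`/`_ppsSize`: each store frees the previous copy, `param_sets_ready` is the check to make before creating the session, and `param_sets_clear` is the free-and-reset step for releasing resources.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical owner of the SPS/PPS copies (mirrors _sps/_spsSize, _pps/_ppsSize). */
typedef struct {
    uint8_t *sps;
    size_t sps_size;
    uint8_t *pps;
    size_t pps_size;
} param_sets;

/* Copy src into a new buffer, then free the old one; nothing changes on failure. */
static int replace_copy(uint8_t **dst, size_t *dst_size, const uint8_t *src, size_t size)
{
    if (size == 0) {
        return -1;
    }
    uint8_t *copy = malloc(size);
    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, src, size);
    free(*dst);
    *dst = copy;
    *dst_size = size;
    return 0;
}

static int param_sets_store_sps(param_sets *p, const uint8_t *sps, size_t size)
{
    return replace_copy(&p->sps, &p->sps_size, sps, size);
}

static int param_sets_store_pps(param_sets *p, const uint8_t *pps, size_t size)
{
    return replace_copy(&p->pps, &p->pps_size, pps, size);
}

/* The session can only be created once both parameter sets are present. */
static int param_sets_ready(const param_sets *p)
{
    return p->sps != NULL && p->pps != NULL;
}

/* Free both copies and reset the fields, so calling it twice is harmless. */
static void param_sets_clear(param_sets *p)
{
    free(p->sps);
    free(p->pps);
    p->sps = NULL;
    p->pps = NULL;
    p->sps_size = 0;
    p->pps_size = 0;
}
```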

- (BOOL)initDecoder
{
    if (_decodeSession) {
        return YES;
    }
    // The decoder can only be created after both SPS and PPS have been received
    if (_sps == NULL || _pps == NULL) {
        return NO;
    }
    // Drop a format description left over from an earlier failed attempt
    if (_decodeDesc) {
        CFRelease(_decodeDesc);
        _decodeDesc = NULL;
    }
    // 1. Build the decoding parameters from SPS and PPS
    // Array holding pointers to the SPS and PPS data
    const uint8_t *const parameterSetPointers[2] = {_sps, _pps};
    // Array holding the SPS and PPS sizes
    const size_t parameterSetSizes[2] = {_spsSize, _ppsSize};
    // Size of the big-endian length prefix that replaced each start code: fixed at 4 bytes
    int naluHeaderLength = 4;
    /*
     CMVideoFormatDescriptionCreateFromH264ParameterSets(CFAllocatorRef  _Nullable allocator, size_t parameterSetCount, const uint8_t *const  _Nonnull * _Nonnull parameterSetPointers, const size_t * _Nonnull parameterSetSizes, int NALUnitHeaderLength, CMFormatDescriptionRef  _Nullable * _Nonnull formatDescriptionOut)
     Parameter 1: allocator, memory allocator; use the default kCFAllocatorDefault
     Parameter 2: parameterSetCount, number of parameter sets: 2 (SPS and PPS)
     Parameter 3: parameterSetPointers, addresses of the parameter sets
     Parameter 4: parameterSetSizes, sizes of the parameter sets
     Parameter 5: NALUnitHeaderLength, size in bytes of the length prefix in front of each NALU in the stream
     Parameter 6: formatDescriptionOut, the resulting format description
     */
    OSStatus status = CMVideoFormatDescriptionCreateFromH264ParameterSets(kCFAllocatorDefault, 2, parameterSetPointers, parameterSetSizes, naluHeaderLength, &_decodeDesc);
    if (status != noErr) {
        NSLog(@"initDecoder CMVideoFormatDescriptionCreateFromH264ParameterSets() failed! with status:%d", (int)status);
        return NO;
    }
    
    // 2. Attributes of the decoded pixel buffers
    /*
     kCVPixelBufferPixelFormatTypeKey: output pixel format, YUV 4:2:0 (kCVPixelFormatType_420YpCbCr8BiPlanarFullRange)
     kCVPixelBufferWidthKey: video width
     kCVPixelBufferHeightKey: video height
     kCVPixelBufferOpenGLCompatibilityKey: lets an OpenGL context draw the decoded image directly without copying data between the bus and the CPU, the so-called zero-copy path
     */
    NSDictionary *destinationPixelBufferAttrs = @{
        (id)kCVPixelBufferPixelFormatTypeKey: [NSNumber numberWithInt:kCVPixelFormatType_420YpCbCr8BiPlanarFullRange],
        (id)kCVPixelBufferWidthKey: [NSNumber numberWithInteger:_config.width],
        (id)kCVPixelBufferHeightKey: [NSNumber numberWithInteger:_config.height],
        (id)kCVPixelBufferOpenGLCompatibilityKey: [NSNumber numberWithBool:YES]
    };
    
    // Callback invoked once a frame has been decoded
    VTDecompressionOutputCallbackRecord callbackRecord;
    // Point the decode callback at our C function videoDecompressionOutputCallback
    callbackRecord.decompressionOutputCallback = videoDecompressionOutputCallback;
    // That C function calls Objective-C methods, so pass the caller (self) along.
    // __bridge does not retain self, so the decoder must outlive the session
    callbackRecord.decompressionOutputRefCon = (__bridge void *)self;
    
    // 3. Create the session
    /*
     Call VTDecompressionSessionCreate() to create the decompression session
     VTDecompressionSessionCreate(CFAllocatorRef  _Nullable allocator, CMVideoFormatDescriptionRef  _Nonnull videoFormatDescription, CFDictionaryRef  _Nullable videoDecoderSpecification, CFDictionaryRef  _Nullable destinationImageBufferAttributes, const VTDecompressionOutputCallbackRecord * _Nullable outputCallback, VTDecompressionSessionRef  _Nullable * _Nonnull decompressionSessionOut)
     Parameter 1: allocator, memory allocator; use the default kCFAllocatorDefault
     Parameter 2: videoFormatDescription, the format of the source video frames: _decodeDesc
     Parameter 3: videoDecoderSpecification, to require a specific video decoder; pass NULL if not needed
     Parameter 4: destinationImageBufferAttributes, attributes of the destination (decoded) pixel buffers; pass NULL if there are no special requirements, otherwise the dictionary built above
     Parameter 5: outputCallback, the callback for decoded frames
     Parameter 6: decompressionSessionOut, receives the decompression session
     */
    status = VTDecompressionSessionCreate(kCFAllocatorDefault, _decodeDesc, NULL, (__bridge CFDictionaryRef)destinationPixelBufferAttrs, &callbackRecord, &_decodeSession);
    if (status != noErr) {
        NSLog(@"initDecoder VTDecompressionSessionCreate() failed! with status:%d", (int)status);
        return NO;
    }
    
    // 4. Mark the decoder as a real-time decoder
    status = VTSessionSetProperty(_decodeSession, kVTDecompressionPropertyKey_RealTime, kCFBooleanTrue);
    if (status != noErr) {
        // Not fatal: the session still decodes without the real-time hint
        NSLog(@"initDecoder VTSessionSetProperty() failed! with status:%d", (int)status);
        return YES;
    }
    NSLog(@"video decoder init success!");
    return YES;
}
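The callback record pairs a plain C function pointer with an opaque `void *` refCon, and VideoToolbox hands that refCon back on every call; this is how the C callback finds its way back to the Objective-C object. The same pattern in portable C is sketched below with hypothetical stand-in types (`output_callback_record`, `decoder_state`) and a fake session function, so it runs without VideoToolbox.

```c
/* Hypothetical stand-ins for VTDecompressionOutputCallbackRecord: a C function
 * pointer plus an opaque context pointer that is passed back on every call. */
typedef void (*output_callback)(void *output_ref_con, int status, int frame_index);

typedef struct {
    output_callback callback;
    void *ref_con;
} output_callback_record;

/* Plays the role of the Objective-C decoder object passed as (__bridge void *)self. */
typedef struct {
    int frames_decoded;
    int errors;
} decoder_state;

static void on_decoded(void *output_ref_con, int status, int frame_index)
{
    decoder_state *self = (decoder_state *)output_ref_con;  /* like (__bridge YYVideoDecoder *) */
    (void)frame_index;
    if (status != 0) {
        self->errors++;
    } else {
        self->frames_decoded++;
    }
}

/* Stand-in for the session: it only knows the record, never the concrete type. */
static void fake_session_emit(const output_callback_record *rec, int status, int frame_index)
{
    rec->callback(rec->ref_con, status, frame_index);
}
```

As with `(__bridge void *)self`, the record does not own its context: whatever `ref_con` points to must stay alive for as long as the session can still call back.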

2.2 Decoding the video data

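Call CMBlockBufferCreateWithMemoryBlock() to create a CMBlockBufferRef that wraps the frame data.
Pass the blockBuffer created in the first step to CMSampleBufferCreateReady() to create a CMSampleBufferRef.
Call VTDecompressionSessionDecodeFrame() to decode the sampleBuffer from the second step. Two flag values are involved here:
- low-power mode: VTDecodeFrameFlags flag1 = kVTDecodeFrame_1xRealTimePlayback, a hint that the decoder may use a low-power mode that cannot decode faster than 1x real time
- VTDecodeInfoFlags flag2: an output parameter that VideoToolbox fills in, for example with kVTDecodeInfo_Asynchronous if the frame was decoded asynchronously. Presetting it does not request asynchronous decoding; that would require kVTDecodeFrame_EnableAsynchronousDecompression in the decode flags
Whether decoding succeeds or fails, the result is reported through the videoDecompressionOutputCallback() function.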
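The format description was created with NALUnitHeaderLength = 4, and `decode:withSize:` describes the whole buffer as a single sample of frameSize bytes, so the memory wrapped by the block buffer has to be a run of 4-byte big-endian lengths, each followed by that many NALU bytes. A hypothetical plain-C check (`avcc_count_nalus`) that walks such a buffer is sketched below; a buffer that still begins with 00 00 00 01 will normally fail it, which is a cheap way to catch a missed start-code replacement before it shows up as a decode error.

```c
#include <stddef.h>
#include <stdint.h>

/* Walk a length-prefixed (AVCC) sample with 4-byte big-endian lengths and
 * return how many NALUs it holds, or -1 if the lengths do not exactly cover
 * the buffer. */
static int avcc_count_nalus(const uint8_t *buf, size_t size)
{
    size_t pos = 0;
    int count = 0;
    while (pos < size) {
        if (size - pos < 4) {
            return -1;                          /* truncated length field */
        }
        uint32_t len = ((uint32_t)buf[pos] << 24) | ((uint32_t)buf[pos + 1] << 16) |
                       ((uint32_t)buf[pos + 2] << 8) | (uint32_t)buf[pos + 3];
        pos += 4;
        if (len == 0 || len > size - pos) {
            return -1;                          /* empty NALU or length past the end */
        }
        pos += len;
        count++;
    }
    return count;
}
```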

- (CVPixelBufferRef)decode:(uint8_t *)frame withSize:(uint32_t)frameSize
{
    CVPixelBufferRef outputPixelBuffer = NULL;
    CMBlockBufferRef blockBuffer = NULL;
    CMBlockBufferFlags flags = 0;
    
    // 1. Create the blockBuffer
    /*
     CMBlockBufferCreateWithMemoryBlock(CFAllocatorRef  _Nullable structureAllocator, void * _Nullable memoryBlock, size_t blockLength, CFAllocatorRef  _Nullable blockAllocator, const CMBlockBufferCustomBlockSource * _Nullable customBlockSource, size_t offsetToData, size_t dataLength, CMBlockBufferFlags flags, CMBlockBufferRef  _Nullable * _Nonnull blockBufferOut)
     Parameter 1: structureAllocator, allocator for the CMBlockBuffer object itself
     Parameter 2: memoryBlock, the memory to wrap, i.e. the frame
     Parameter 3: blockLength, length of that memory
     Parameter 4: blockAllocator, allocator for the memory block; kCFAllocatorNull means the block buffer neither owns nor frees frame
     Parameter 5: customBlockSource, NULL
     Parameter 6: offsetToData, data offset; 0 to read from the beginning
     Parameter 7: dataLength, length of the data
     Parameter 8: flags, 0
     Parameter 9: blockBufferOut, receives the blockBuffer
     */
    OSStatus status = CMBlockBufferCreateWithMemoryBlock(kCFAllocatorDefault, frame, frameSize, kCFAllocatorNull, NULL, 0, frameSize, flags, &blockBuffer);
    if (status != kCMBlockBufferNoErr) {
        NSLog(@"decodeWithSize CMBlockBufferCreateWithMemoryBlock failed!");
        return outputPixelBuffer;
    }
    
    // 2. Create the sampleBuffer
    /*
    Parameter 1: allocator, use the default kCFAllocatorDefault
    Parameter 2: dataBuffer, the blockBuffer holding the data to decode; must not be NULL
    Parameter 3: formatDescription, the video format description (_decodeDesc)
    Parameter 4: numSamples, number of samples in the CMSampleBuffer
    Parameter 5: numSampleTimingEntries, must be 0, 1 or numSamples
    Parameter 6: sampleTimingArray, timing array; NULL here
    Parameter 7: numSampleSizeEntries, 1
    Parameter 8: sampleSizeArray, size of each sample
    Parameter 9: sampleBufferOut, receives the sampleBuffer
    */
    CMSampleBufferRef sampleBuffer = NULL;
    const size_t sampleSizeArray[] = {frameSize};
    status = CMSampleBufferCreateReady(kCFAllocatorDefault, blockBuffer, _decodeDesc, 1, 0, NULL, 1, sampleSizeArray, &sampleBuffer);
    if (status != noErr || sampleBuffer == NULL) {
        NSLog(@"decodeWithSize CMSampleBufferCreateReady() failed or sampleBuffer is null");
        CFRelease(blockBuffer);
        return outputPixelBuffer;
    }
    
    // 3. Call the decode function
    // Hint that playback runs at 1x real time, so the decoder may use a low-power mode
    VTDecodeFrameFlags flag1 = kVTDecodeFrame_1xRealTimePlayback;
    // Output only: VideoToolbox reports here how the frame was decoded (e.g. kVTDecodeInfo_Asynchronous).
    // Asynchronous decoding would have to be requested with kVTDecodeFrame_EnableAsynchronousDecompression in flag1
    VTDecodeInfoFlags flag2 = 0;
    /*
    VTDecompressionSessionDecodeFrame(VTDecompressionSessionRef  _Nonnull session, CMSampleBufferRef  _Nonnull sampleBuffer, VTDecodeFrameFlags decodeFlags, void * _Nullable sourceFrameRefCon, VTDecodeInfoFlags * _Nullable infoFlagsOut)
     Parameter 1: session, the decompression session
     Parameter 2: sampleBuffer, the source data, a CMSampleBuffer containing one or more video frames
     Parameter 3: decodeFlags, here the 1x real-time (low-power) hint
     Parameter 4: sourceFrameRefCon, per-frame value passed through to the output callback; here the address of outputPixelBuffer, which the callback fills in
     Parameter 5: infoFlagsOut, receives synchronous/asynchronous decode information
    */
    status = VTDecompressionSessionDecodeFrame(_decodeSession, sampleBuffer, flag1, &outputPixelBuffer, &flag2);
    if (status == kVTInvalidSessionErr) {
        NSLog(@"Video hard decode  InvalidSessionErr status = %d", (int)status);
    } else if (status == kVTVideoDecoderBadDataErr) {
        NSLog(@"Video hard decode  BadData status = %d", (int)status);
    } else if (status != noErr) {
        NSLog(@"Video hard decode failed status = %d", (int)status);
    }
    
    CFRelease(blockBuffer);
    CFRelease(sampleBuffer);
    return outputPixelBuffer;
}

2.3 The decode-complete callback

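First check for errors, then pass the decoded data out through a delegate method for the video rendering class to use. Rendering is implemented with OpenGL ES + CAEAGLLayer, borrowing the approach of the open-source library GPUImage, which is worth studying if you are interested.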

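/*
  Parameter 1: decompressionOutputRefCon, the refCon set in the callback record (self)
  Parameter 2: sourceFrameRefCon, the per-frame refCon passed to VTDecompressionSessionDecodeFrame
  Parameter 3: status, decode status
  Parameter 4: infoFlags, whether the frame was decoded synchronously or asynchronously
  Parameter 5: imageBuffer, the decoded image
  Parameter 6: presentationTimeStamp, presentation timestamp of the image
  Parameter 7: presentationDuration, duration of the image
 */
void videoDecompressionOutputCallback(void * CM_NULLABLE decompressionOutputRefCon,
                                      void * CM_NULLABLE sourceFrameRefCon,
                                      OSStatus status,
                                      VTDecodeInfoFlags infoFlags,
                                      CM_NULLABLE CVImageBufferRef imageBuffer,
                                      CMTime presentationTimeStamp,
                                      CMTime presentationDuration){
    // Callback configured when the session was created;
    // it is invoked once decoding finishes, whether it succeeded or failed
    if (status != noErr) {
        // Decoding failed
        NSLog(@"videoDecompressionOutputCallback failed status=%d", (int)status);
        return;
    }
    
    if (imageBuffer == NULL) {
        // sourceFrameRefCon points at a local variable in decode:withSize:, so it must not be freed here
        NSLog(@"CVImageBufferRef imageBuffer is NULL!");
        return;
    }
    
    // This retain keeps imageBuffer alive until the asynchronous block below releases it
    CVPixelBufferRetain(imageBuffer);
    // Hand the decoded image back through sourceFrameRefCon -> CVPixelBufferRef *
    if (sourceFrameRefCon) {
        CVPixelBufferRef *outputPixelBuffer = (CVPixelBufferRef *)sourceFrameRefCon;
        *outputPixelBuffer = imageBuffer;
    }
    
    // Recover self
    YYVideoDecoder *decoder = (__bridge YYVideoDecoder *)decompressionOutputRefCon;
    
    // Deliver the frame on the callback queue
    dispatch_async(decoder.callbackQueue, ^{
        if (decoder.delegate && [decoder.delegate respondsToSelector:@selector(videoDecodeCallback:)]) {
            [decoder.delegate videoDecodeCallback:imageBuffer];
        }
        CVPixelBufferRelease(imageBuffer);
    });
}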
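In `decode:withSize:` the per-frame sourceFrameRefCon is the address of a local variable, which relies on the callback having run by the time VTDecompressionSessionDecodeFrame returns, as is normally the case when asynchronous decompression is not requested. If you enable kVTDecodeFrame_EnableAsynchronousDecompression, a common pattern is to allocate a per-frame context on the heap and have the callback free it exactly once on every path, success or failure. The plain-C sketch below shows only that ownership rule; `frame_ctx`, `frame_ctx_new`, `on_frame_output` and the `g_live_ctx` counter are hypothetical and exist for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical per-frame context, heap-allocated so it outlives the decode call. */
typedef struct {
    int64_t pts;
    int index;
} frame_ctx;

/* Number of contexts not yet freed; for illustration only. */
static int g_live_ctx = 0;

static frame_ctx *frame_ctx_new(int64_t pts, int index)
{
    frame_ctx *ctx = malloc(sizeof *ctx);
    if (ctx != NULL) {
        ctx->pts = pts;
        ctx->index = index;
        g_live_ctx++;
    }
    return ctx;
}

/* Stand-in for the output callback: takes ownership of source_frame_ref_con and
 * frees it on every path. Returns the frame index on success, -1 on failure. */
static int on_frame_output(void *source_frame_ref_con, int status, const void *image)
{
    frame_ctx *ctx = (frame_ctx *)source_frame_ref_con;
    int result = -1;
    if (ctx != NULL && status == 0 && image != NULL) {
        result = ctx->index;   /* e.g. forward the image and ctx->pts to the renderer */
    }
    if (ctx != NULL) {
        g_live_ctx--;
        free(ctx);
    }
    return result;
}
```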

2.4 Releasing resources after decoding

- (void)releaseDecodeResource
{
    // Release _decodeSession
    if (_decodeSession) {
        VTDecompressionSessionWaitForAsynchronousFrames(_decodeSession); // wait for pending asynchronous frames
        VTDecompressionSessionInvalidate(_decodeSession); // invalidate the session
        CFRelease(_decodeSession);
        _decodeSession = NULL;
    }
    
    // Release the video format description used for decoding
    if (_decodeDesc) {
        CFRelease(_decodeDesc);
        _decodeDesc = NULL;
    }
    
    // Free the SPS/PPS copies and reset the pointers so a later call cannot double-free
    _spsSize = 0;
    free(_sps);
    _sps = NULL;
    _ppsSize = 0;
    free(_pps);
    _pps = NULL;
}

Supplementary notes

1. About H.264 and H.265

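H.264: currently the most widely used video format in the industry. It delivers good video quality at lower bandwidth (half or less of the bandwidth needed by MPEG-2, H.263 or MPEG-4 Part 2) without adding so much design complexity that it becomes impractical or too costly to implement.
H.265: High Efficiency Video Coding (HEVC) is a video compression standard regarded as the successor to ITU-T H.264/MPEG-4 AVC. HEVC is considered not only to improve video quality but also to achieve twice the compression ratio of H.264/MPEG-4 AVC (equivalent to a 50% lower bit rate at the same picture quality).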

2. SPS, PPS, VPS

Compared with H.264, the H.265 parameter sets include one more: the VPS.

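Sequence parameter set (SPS): contains the coding parameters shared by all coded pictures in a CVS (coded video sequence).
1. An HEVC bitstream may contain one or more coded video sequences, each starting at a random access point, i.e. an IDR, BLA or CRA picture. The SPS contains the information needed by all slices in that video sequence.
2. The contents of an SPS fall roughly into these parts:
  - its own ID;
  - decoding-related information, such as profile and level, resolution and the number of sub-layers;
  - enable flags for the tools of a given profile, together with the parameters of those tools;
  - constraints on the flexibility of the coding structure and of transform-coefficient coding;
  - temporal scalability information;
  - VUI.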
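Picture parameter set (PPS): contains the common parameters used by one picture; all slice segments (SS) in a picture refer to the same PPS.
1. A PPS contains settings that may change from frame to frame. Its contents are broadly similar to those in H.264 and mainly include:
  - its own ID;
  - initial picture control information, such as the initial QP;
  - tile (partitioning) information.
2. When decoding starts, all PPSs are inactive, and at any moment during decoding at most one PPS can be active. When part of the bitstream refers to a PPS, that PPS is activated and becomes the active PPS until another PPS is activated.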
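Video parameter set (VPS): mainly carries video scalability information, which makes it easier to extend the standard to scalable video coding or multi-view video.
1. It describes the overall structure of the coded video sequence, including the dependencies between temporal sub-layers. HEVC added this structure mainly to keep the standard compatible with multi-layer extensions of the system, for example to handle the case where future scalable or multi-view video is decoded by an existing decoder that may ignore information the new video requires.
2. For a given sub-layer of a video sequence, all SPSs share one VPS, whether or not they are identical. Its main contents are: syntax elements shared by multiple sub-layers or operation points; key session information such as profile and level; and other operation-point-specific information that does not belong in the SPS.
3. In the encoded bitstream, the first NAL unit carries the VPS.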
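For H.265 the NAL header is two bytes: forbidden_zero_bit (1 bit), nal_unit_type (6 bits), nuh_layer_id (6 bits) and nuh_temporal_id_plus1 (3 bits), so the type is `(byte0 >> 1) & 0x3F` rather than `byte & 0x1F`. A plain-C sketch with the type values relevant here (VPS 32, SPS 33, PPS 34, and the BLA/IDR/CRA random access points mentioned above) follows; the helper names are hypothetical. On the VideoToolbox side, HEVC format descriptions are created with CMVideoFormatDescriptionCreateFromHEVCParameterSets, which takes the VPS, SPS and PPS together.

```c
#include <stdint.h>

/* H.265 nal_unit_type values relevant here (H.265 Table 7-1). */
enum {
    HEVC_NAL_BLA_W_LP   = 16,
    HEVC_NAL_IDR_W_RADL = 19,
    HEVC_NAL_IDR_N_LP   = 20,
    HEVC_NAL_CRA        = 21,
    HEVC_NAL_VPS        = 32,
    HEVC_NAL_SPS        = 33,
    HEVC_NAL_PPS        = 34
};

/* Two-byte header: forbidden_zero_bit(1) | nal_unit_type(6) | nuh_layer_id(6) | nuh_temporal_id_plus1(3). */
static int hevc_nal_type(uint8_t byte0) { return (byte0 >> 1) & 0x3F; }
static int hevc_temporal_id(uint8_t byte1) { return (byte1 & 0x07) - 1; }

/* IRAP (random access point) pictures: BLA, IDR and CRA, types 16..23. */
static int hevc_is_irap(int type) { return type >= HEVC_NAL_BLA_W_LP && type <= 23; }
```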

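Parameter sets carry information about the corresponding coded pictures. An SPS holds the parameters of one continuous coded video sequence (the identifier seq_parameter_set_id, constraints on frame numbers and POC, the number of reference frames, the decoded picture size, the frame/field coding mode flag, and so on). A PPS applies to one or several pictures within a sequence; its parameters include the identifier pic_parameter_set_id, the seq_parameter_set_id it refers to, the entropy coding mode flag, the number of slice groups, the initial quantization parameter, the deblocking filter coefficient adjustment flag, and so on. Usually the SPS and PPS are sent to the decoder before the slice headers and slice data are decoded. Each slice header refers to a pic_parameter_set_id, and a PPS activated this way stays in effect until the next PPS is activated; similarly, each PPS refers to a seq_parameter_set_id, and an SPS, once activated, stays in effect until the next SPS is activated. The parameter set mechanism separates important, rarely changing sequence and picture parameters from the coded slices and sends them to the decoder ahead of the coded slices, or through some other mechanism.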

Excerpted from 小东邪: 移动端音视频从零到上手 (mobile audio and video from zero to hands-on).
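In H.264, seq_parameter_set_id and pic_parameter_set_id are coded as unsigned Exp-Golomb values, ue(v): count n leading zero bits, skip the 1 bit, then read n more bits; the value is 2^n - 1 plus those bits. In a PPS they are the first two fields after the one-byte header, so the link from a PPS to its SPS can be read out with a few lines of plain C. The sketch below is hypothetical (names `bit_reader`, `read_ue`, `pps_ids`) and, for brevity, does not strip emulation-prevention bytes (00 00 03), which a real parser must remove first.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    const uint8_t *data;
    size_t size;    /* in bytes */
    size_t bit;     /* index of the next bit to read */
} bit_reader;

/* Next bit, most significant bit first, or -1 past the end. */
static int read_bit(bit_reader *br)
{
    if (br->bit >= br->size * 8) {
        return -1;
    }
    int b = (br->data[br->bit / 8] >> (7 - br->bit % 8)) & 1;
    br->bit++;
    return b;
}

/* Unsigned Exp-Golomb ue(v): n leading zeros, a 1, then n info bits;
 * value = 2^n - 1 + info. Returns -1 on truncated or oversized input. */
static long read_ue(bit_reader *br)
{
    int zeros = 0;
    int b;
    while ((b = read_bit(br)) == 0) {
        if (++zeros > 30) {
            return -1;
        }
    }
    if (b < 0) {
        return -1;
    }
    unsigned long info = 0;
    for (int i = 0; i < zeros; i++) {
        int bit = read_bit(br);
        if (bit < 0) {
            return -1;
        }
        info = (info << 1) | (unsigned long)bit;
    }
    return (long)((1UL << zeros) - 1 + info);
}

/* Read pic_parameter_set_id and seq_parameter_set_id from an H.264 PPS NALU
 * (header byte included, start code excluded). Emulation-prevention bytes are
 * NOT removed here. Returns 0 on success, -1 otherwise. */
static int pps_ids(const uint8_t *nalu, size_t size, long *pps_id, long *sps_id)
{
    if (size < 2 || (nalu[0] & 0x1F) != 8) {
        return -1;
    }
    bit_reader br = { nalu + 1, size - 1, 0 };
    *pps_id = read_ue(&br);
    *sps_id = read_ue(&br);
    return (*pps_id < 0 || *sps_id < 0) ? -1 : 0;
}
```

For example, the PPS bytes 68 4C decode to pic_parameter_set_id 1 and seq_parameter_set_id 2, because 0x4C begins with the bit patterns 010 (value 1) and 011 (value 2).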