[Audio/Video Development] 9. Decoding an H264 Video Stream with the FFmpeg API

Decoding an H264 Video Stream with the FFmpeg API

1. YUV color space

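YUV: Y is the luma component (the black-and-white image), U is the blue-difference chroma component, and V is the red-difference chroma component; each sample takes 1 B
Sampling: every pixel must have its own Y sample, while U and V can be shared between pixels (to reduce storage)
- 444: 1 Y per UV pair; each Y in a row has its own UV pair, averaging 3 B per pixel
- 422: 2 Y per UV pair; every 2 adjacent Y in a row share 1 UV pair, averaging 2 B per pixel
- 420: 4 Y per UV pair; each 2x2 block of 4 Y spanning two rows shares 1 UV pair, averaging 1.5 B per pixel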
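The per-pixel averages above translate directly into frame sizes. The following is a minimal sketch, assuming 8-bit samples and no row padding; `Subsampling` and `FrameBytes` are hypothetical names introduced only for this illustration.

```cpp
#include <cassert>
#include <cstddef>

// The three chroma subsampling schemes listed above, 8 bits per sample.
enum class Subsampling { k444, k422, k420 };

// Bytes in one tightly packed frame (no row padding).
// Chroma dimensions are rounded up so odd sizes still cover every pixel.
std::size_t FrameBytes(std::size_t width, std::size_t height, Subsampling s) {
    assert(width > 0 && height > 0);
    std::size_t chroma_width = width;
    std::size_t chroma_height = height;
    if (s != Subsampling::k444) {
        chroma_width = (width + 1) / 2;    // 422 and 420 halve the chroma width
    }
    if (s == Subsampling::k420) {
        chroma_height = (height + 1) / 2;  // 420 also halves the chroma height
    }
    return width * height + 2 * chroma_width * chroma_height;  // Y + U + V
}
```

For a 640x360 frame this gives 691200 B (444), 460800 B (422), and 345600 B (420), i.e. 3, 2, and 1.5 bytes per pixel.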
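Storage layout:
- Planar format, e.g. YUV420P: the Y, U, and V samples are each stored contiguously in their own plane
- Packed format, e.g. YUYV422: Y U Y V samples are interleaved in a single plane
- Semi-planar format, e.g. NV12: Y is stored contiguously, while U and V are interleaved in a second plane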
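To make the three layouts concrete, the sketch below computes where the Y, U, and V samples of pixel (x, y) sit in a single tightly packed buffer. It assumes even dimensions and no row padding; `SampleOffsets` and the three `Offsets...` functions are hypothetical helpers, not FFmpeg APIs.

```cpp
#include <cassert>
#include <cstddef>

// Byte offsets of the Y, U and V samples that describe one pixel.
struct SampleOffsets {
    std::size_t y, u, v;
};

// Planar YUV420P: [Y plane][U plane][V plane]
SampleOffsets OffsetsYuv420p(std::size_t w, std::size_t h, std::size_t x, std::size_t y) {
    assert(w % 2 == 0 && h % 2 == 0 && x < w && y < h);
    const std::size_t chroma = (y / 2) * (w / 2) + x / 2;  // index inside a chroma plane
    const std::size_t u_plane = w * h;                      // U plane starts after Y
    const std::size_t v_plane = u_plane + (w / 2) * (h / 2);
    return {y * w + x, u_plane + chroma, v_plane + chroma};
}

// Packed YUYV422: every 4 bytes hold Y0 U Y1 V for two neighbouring pixels
SampleOffsets OffsetsYuyv422(std::size_t w, std::size_t h, std::size_t x, std::size_t y) {
    assert(w % 2 == 0 && x < w && y < h);
    const std::size_t group = y * w * 2 + (x / 2) * 4;  // start of the 4-byte group
    return {group + (x % 2) * 2, group + 1, group + 3};
}

// Semi-planar NV12: [Y plane][UVUV... plane]
SampleOffsets OffsetsNv12(std::size_t w, std::size_t h, std::size_t x, std::size_t y) {
    assert(w % 2 == 0 && h % 2 == 0 && x < w && y < h);
    const std::size_t uv = w * h + (y / 2) * w + (x / 2) * 2;  // start of the UV pair
    return {y * w + x, uv, uv + 1};
}
```

In the planar and semi-planar cases the Y offset is the same; only the location of the chroma samples differs.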
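Why does a decoding error show up as a green screen? A frame buffer that was never filled is typically all zeros, i.e. Y = U = V = 0. When that is converted to RGB, R and B come out negative (and are clamped to 0) while G is positive, so the pixel is green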
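The green result can be checked with a small conversion routine. This is a sketch that assumes the full-range BT.601 matrix (other standards use slightly different coefficients); `Rgb`, `ClampToByte`, and `YuvToRgb` are names made up for this example.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>

struct Rgb {
    std::uint8_t r, g, b;
};

// Clamp a real-valued channel to [0, 255] and round to the nearest integer.
static std::uint8_t ClampToByte(double v) {
    return static_cast<std::uint8_t>(std::lround(std::clamp(v, 0.0, 255.0)));
}

// Full-range BT.601 YUV -> RGB (an assumed matrix, see the note above).
Rgb YuvToRgb(int y, int u, int v) {
    assert(y >= 0 && y <= 255 && u >= 0 && u <= 255 && v >= 0 && v <= 255);
    const double cb = u - 128.0;  // U is stored with a +128 offset
    const double cr = v - 128.0;  // V is stored with a +128 offset
    return {ClampToByte(y + 1.402 * cr),
            ClampToByte(y - 0.344136 * cb - 0.714136 * cr),
            ClampToByte(y + 1.772 * cb)};
}
```

With Y = U = V = 0, the red and blue terms are negative before clamping and the green term is about 135, so the pixel comes out as (0, 135, 0).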
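Alignment: row lengths are padded up to a multiple of the alignment. For a 100x100 frame with 16-byte alignment, a Y row goes from 100 to 112 bytes, and the U and V rows go from 50 to 64 bytes each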
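The padded row lengths follow from rounding up to the next multiple of the alignment. A minimal sketch, where `AlignUp` is a hypothetical helper name:

```cpp
#include <cassert>

// Round value up to the next multiple of alignment.
int AlignUp(int value, int alignment) {
    assert(value >= 0 && alignment > 0);
    return (value + alignment - 1) / alignment * alignment;
}
```

`AlignUp(100, 16)` gives 112 and `AlignUp(50, 16)` gives 64, matching the numbers above.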

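2. Related data structures

Video AVPacket

AVPacket mainly stores compressed data (here, compressed video): it is what you have after demuxing and before decoding, or after encoding and before muxing
It holds the data buffer, the presentation timestamp (PTS), the decoding timestamp (DTS), and related information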

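Video AVFrame

AVFrame mainly stores decoded pixel data: it is what you have after decoding, or before encoding
It includes the width and height, the array of plane data pointers, the array of line sizes, the pixel format, and so on
plane: one contiguous buffer
.data: array of plane buffers (the pixel data)
- packed video: the interleaved YUYV samples are stored in data[0]
- planar video: data[0] points to the Y plane, data[1] to the U plane, data[2] to the V plane
.linesize: array of line sizes in bytes
- packed video: linesize[0] is the space taken by one row of the image, including alignment padding
- planar video: linesize[i] is the space taken by one row of plane i, including alignment padding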
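Because linesize includes padding, a plane has to be copied row by row, taking only the visible bytes of each row. The sketch below shows that pattern on plain pointers rather than on a real AVFrame; `AppendPlane` is a hypothetical helper, and it assumes a non-negative linesize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Append the visible bytes of every row of one plane to out,
// skipping the padding at the end of each row.
void AppendPlane(const std::uint8_t *data, int linesize, int row_bytes, int rows,
                 std::vector<std::uint8_t> &out) {
    assert(data != nullptr && row_bytes >= 0 && rows >= 0 && linesize >= row_bytes);
    for (int r = 0; r < rows; ++r) {
        // Row r starts linesize bytes after row r - 1, not row_bytes bytes.
        const std::uint8_t *row = data + static_cast<std::ptrdiff_t>(r) * linesize;
        out.insert(out.end(), row, row + row_bytes);
    }
}
```

For a yuv420p frame this would be called once per plane: with `width` visible bytes per row for Y and half of that for U and V.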

Video AVCodecContext

AVCodecContext stores the parameters of the video decoder
.width and .height are the width and height of the video in pixels
.pix_fmt is the pixel format of the video

3. Related APIs

Stream parsing API

av_parser_init: initialize a stream parser
av_parser_close: close a stream parser
av_parser_parse2: parse one H264 video frame out of a raw H264 byte stream
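A parser is needed because a raw .h264 file has no container to mark where each frame begins. Assuming the file is an Annex B byte stream (which is what the ffmpeg command used later in this article produces), NAL units are separated by 00 00 01 start codes. The sketch below only illustrates that idea; it is not how FFmpeg's parser is implemented, a single frame may span several NAL units, and `FindNalPayloads` and `NalUnitTypeAt` are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Return the offset of the first payload byte after every 00 00 01 start code.
// A 4-byte start code (00 00 00 01) is matched through its last three bytes.
std::vector<std::size_t> FindNalPayloads(const std::vector<std::uint8_t> &buf) {
    std::vector<std::size_t> payloads;
    std::size_t i = 0;
    while (i + 3 <= buf.size()) {
        if (buf[i] == 0 && buf[i + 1] == 0 && buf[i + 2] == 1) {
            payloads.push_back(i + 3);  // payload begins right after the start code
            i += 3;
        } else {
            ++i;
        }
    }
    return payloads;
}

// nal_unit_type is the low 5 bits of the first payload byte
// (for example 7 = SPS, 8 = PPS, 5 = IDR slice).
int NalUnitTypeAt(const std::vector<std::uint8_t> &buf, std::size_t payload_offset) {
    assert(payload_offset < buf.size());
    return buf[payload_offset] & 0x1F;
}
```

av_parser_parse2 does this kind of scanning internally and additionally groups NAL units into complete frames, which is why the decoding code can feed it arbitrary chunks of the file.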

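Video decoding API

avcodec_find_decoder: look up the AVCodec object for an AVCodecID
avcodec_alloc_context3: allocate a decoder context
avcodec_free_context: free a decoder context
avcodec_open2: initialize the decoder context
avcodec_send_packet: send an AVPacket (compressed data) to the decoder
avcodec_receive_frame: receive an AVFrame (decoded data) from the decoder

Pixel format API

av_get_pix_fmt_name: get the name of a pixel format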

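4. Hands-on code: decoding an H264 video stream

Requirement: take a .h264 file (yuv420p/yuyv422) as input and write a .yuv file as output
Approach:
- Generate the H264 stream: ffmpeg -i av.mp4 -an -c:v copy video.h264
- Parse H264 encoded frames out of the binary file stream
- Send each H264 encoded frame to the decoder
- Receive YUV pixel data from the decoder in a loop
- Write the pixel data to the file according to the pixel format
- Check that decoding succeeded (mind the format): ffplay -pixel_format yuv420p -video_size 640x360 -framerate 25 video.yuv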
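Besides playing the file with ffplay, a quick sanity check is that the size of the output file is a whole number of frames. The following is a sketch for the yuv420p case; `CountYuv420pFrames` is a hypothetical helper, and the 640x360 size is the one used in the ffplay command above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Number of whole yuv420p frames in a raw file of file_size bytes,
// or std::nullopt if the size is not an exact multiple of one frame.
std::optional<std::uint64_t> CountYuv420pFrames(std::uint64_t file_size, int width, int height) {
    assert(width > 0 && height > 0);
    const auto w = static_cast<std::uint64_t>(width);
    const auto h = static_cast<std::uint64_t>(height);
    // One Y plane plus two chroma planes of half width and half height (rounded up).
    const std::uint64_t frame_bytes = w * h + 2 * ((w + 1) / 2) * ((h + 1) / 2);
    if (file_size % frame_bytes != 0) {
        return std::nullopt;  // truncated file or wrong size/format assumption
    }
    return file_size / frame_bytes;
}
```

A 640x360 file of 8640000 bytes holds exactly 25 frames, which is one second of video at the 25 fps passed to ffplay.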
Environment for the code example:
- Toolchain: VS2022, std=c++20
- Dependency 1: avcodec, avformat, and avutil from FFmpeg 7.1
- Dependency 2: glog

extern "C" {
#include <libavcodec/avcodec.h>
#include <libavformat/avformat.h>
#include <libavutil/pixdesc.h>
}

#include <glog/logging.h>

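#include <cctype>   // std::tolower
#include <cstddef>  // std::size_t
#include <cstdint>  // uint8_t
#include <cstring>  // std::memset, std::memmove
#include <fstream>
#include <memory>   // std::make_unique
#include <string>
#include <string_view> // std=c++17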

static constexpr std::size_t kInputVideoBufferSize = 20480;
static constexpr std::size_t kInputVideoBufferRefillThreshold = 4096; // same type as data_size, avoids a signed/unsigned comparison
thread_local static char error_buffer[AV_ERROR_MAX_STRING_SIZE] = {}; // store FFmpeg error string

/**
 * @brief Convert FFmpeg error code to error string
 * @param error_code FFmpeg error code
 * @return error string
 */
static char *ErrorToString(const int error_code) {
    std::memset(error_buffer, 0, AV_ERROR_MAX_STRING_SIZE);
    return av_make_error_string(error_buffer, AV_ERROR_MAX_STRING_SIZE, error_code);
}


/**
 * @brief Get file extension from file name
 * @param file_name file name
 * @return file extension
 */
static std::string GetFileExtension(std::string_view file_name) {
    size_t pos = file_name.rfind('.');
    if (pos == std::string::npos) {
        return "";
    }
    std::string extension(file_name.substr(pos + 1));
    for (char &c: extension) {
        // std::tolower requires a value representable as unsigned char
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    }
    return extension;
}

/**
 * @brief Decode a h264 frame, write pixel data to output file
 * @param codec_ctx codec context
 * @param pkt a h264 frame
 * @param ofs output file stream
 * @return true if success, false otherwise
 */
static bool InnerDecodeVideo(AVCodecContext *codec_ctx, AVPacket *pkt, std::ofstream &ofs) {
    if (!codec_ctx || !pkt) {
        return false;
    }

    int error_code{};
    bool logged = false;

    // send packet to decoder
    if ((error_code = avcodec_send_packet(codec_ctx, pkt)) < 0) {
        if (error_code != AVERROR(EAGAIN) && error_code != AVERROR_EOF) {
            LOG(ERROR) << "Failed to send packet to decoder: " << ErrorToString(error_code);
            return false;
        }
    }

    // allocate AVFrame
    AVFrame *frame = av_frame_alloc();
    if (frame == nullptr) {
        LOG(ERROR) << "Failed to allocate AVFrame: av_frame_alloc()";
        return false;
    }

    // receive decoded frames until the decoder needs more input (EAGAIN) or is fully drained (EOF)
    // avcodec_receive_frame() unrefs the frame before refilling it, so no manual unref is needed
    while ((error_code = avcodec_receive_frame(codec_ctx, frame)) == 0) {
        // use the pixel format of the decoded frame itself
        const auto pix_fmt = static_cast<AVPixelFormat>(frame->format);

        // log once per call
        if (!logged) {
            if (pix_fmt != AV_PIX_FMT_YUV420P && pix_fmt != AV_PIX_FMT_YUYV422) {
                // av_get_pix_fmt_name() returns nullptr for unknown formats
                const char *name = av_get_pix_fmt_name(pix_fmt);
                LOG(ERROR) << "Unsupported pixel format: " << (name ? name : "unknown");
                continue;
            }
            LOG(INFO) << "Decode " << pkt->size << "B AVPacket"
                      << ", " << frame->width << "x" << frame->height
                      << ", pix_fmt=" << av_get_pix_fmt_name(pix_fmt);
            logged = true;
        }

        if (!ofs) {
            continue;
        }

        // write to output file
        // if YUV planar format: YY...YYUU...UUVV...VV, yuv420p
        //     store: Y in data[0], U in data[1], V in data[2]
        //     line alignment: Y in linesize[0], U in linesize[1], V in linesize[2]
        // if YUYV packed format: YUYV...YUYV, yuyv422
        //     store: YUYV in data[0]
        //     line alignment: YUYV in linesize[0]
        if (pix_fmt == AV_PIX_FMT_YUV420P) {
            // chroma planes are half size, rounded up for odd dimensions
            const int chroma_width = (frame->width + 1) / 2;
            const int chroma_height = (frame->height + 1) / 2;
            for (int i = 0; (i < frame->height && ofs); ++i) {
                ofs.write(reinterpret_cast<char *>(frame->data[0] + i * frame->linesize[0]), frame->width);
            }
            for (int i = 0; (i < chroma_height && ofs); ++i) {
                ofs.write(reinterpret_cast<char *>(frame->data[1] + i * frame->linesize[1]), chroma_width);
            }
            for (int i = 0; (i < chroma_height && ofs); ++i) {
                ofs.write(reinterpret_cast<char *>(frame->data[2] + i * frame->linesize[2]), chroma_width);
            }
        } else if (pix_fmt == AV_PIX_FMT_YUYV422) {
            for (int i = 0; (i < frame->height && ofs); ++i) {
                ofs.write(reinterpret_cast<char *>(frame->data[0] + i * frame->linesize[0]), frame->width * 2);
            }
        }
        if (!ofs) {
            LOG(ERROR) << "Failed to write yuv file, ofstream is broken";
        }
    }

    av_frame_free(&frame);

    if (error_code != AVERROR(EAGAIN) && error_code != AVERROR_EOF) {
        LOG(ERROR) << "Failed to receive frame from decoder: " << ErrorToString(error_code);
        return false;
    }

    if (!ofs) {
        return false;
    }

    return true;
}


/**
 * @brief Decode video file
 * @param input_file input video file, must be h264 yuv420p/yuyv422
 * @param output_file output yuv file, yuv420p/yuyv422
 */
void DecodeVideo(std::string_view input_file, std::string_view output_file) {
    int error_code{};

    // check file extension
    AVCodecID codec_id{};
    std::string file_extension = GetFileExtension(input_file);
    if (file_extension == "h264") {
        codec_id = AV_CODEC_ID_H264;
        LOG(INFO) << "Decode H264 video start";
    } else {
        LOG(ERROR) << "Unsupported video format: " << file_extension << ", only H264 is supported";
        return;
    }

    // find AVCodec
    const AVCodec *codec = avcodec_find_decoder(codec_id);
    if (codec == nullptr) {
        LOG(ERROR) << "AVCodec not found: " << codec_id;
        return;
    }

    // open input_file
    // string_view::data() is not guaranteed to be null-terminated, so build a std::string
    std::ifstream ifs(std::string(input_file), std::ios::in | std::ios::binary);
    if (!ifs.is_open()) {
        LOG(ERROR) << "Failed to open input file: " << input_file;
        return;
    }

    // open output_file
    // string_view::data() is not guaranteed to be null-terminated, so build a std::string
    std::ofstream ofs(std::string(output_file), std::ios::out | std::ios::binary);
    if (!ofs.is_open()) {
        LOG(ERROR) << "Failed to open output file: " << output_file;
        return;
    }

    // initialize AVCodecParserContext
    AVCodecParserContext *parser_ctx = av_parser_init(codec->id);
    if (parser_ctx == nullptr) {
        LOG(ERROR) << "Failed to init AVCodecParserContext: " << codec->id;
        return;
    }

    // allocate AVCodecContext
    AVCodecContext *codec_ctx = avcodec_alloc_context3(codec);
    if (codec_ctx == nullptr) {
        LOG(ERROR) << "Failed to allocate AVCodecContext: " << codec->id;
        av_parser_close(parser_ctx);
        return;
    }

    // initialize AVCodecContext
    if ((error_code = avcodec_open2(codec_ctx, codec, nullptr)) < 0) {
        LOG(ERROR) << "Failed to init AVCodecContext: " << ErrorToString(error_code);
        avcodec_free_context(&codec_ctx);
        av_parser_close(parser_ctx);
        return;
    }

    // allocate AVPacket
    AVPacket *pkt = av_packet_alloc();
    if (pkt == nullptr) {
        LOG(ERROR) << "Failed to allocate AVPacket: av_packet_alloc()";
        avcodec_free_context(&codec_ctx);
        av_parser_close(parser_ctx);
        return;
    }

    // allocate input buffer
    const std::size_t input_buffer_size = kInputVideoBufferSize + AV_INPUT_BUFFER_PADDING_SIZE;
    // make_unique<T[]> value-initializes the array, so the buffer (padding included) starts zeroed
    auto input_buffer = std::make_unique<uint8_t[]>(input_buffer_size); // std=c++14
    uint8_t *data = input_buffer.get();

    size_t data_size{};
    while (true) {
        // refill input buffer
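        if (data_size < kInputVideoBufferRefillThreshold && !ifs.eof()) {
            if (data_size > 0) {
                // source and destination may overlap, so use memmove rather than memcpy
                std::memmove(input_buffer.get(), data, data_size);
            }
            data = input_buffer.get();
            // top the buffer up to its full size, not just to the refill threshold
            const std::size_t bytes_to_read = kInputVideoBufferSize - data_size;
            if (!ifs.read(reinterpret_cast<char *>(data) + data_size, static_cast<std::streamsize>(bytes_to_read))) {
                if (!ifs.eof()) {
                    LOG(ERROR) << "Failed to read input file: " << input_file << ", ifstream is broken";
                    break;
                }
                LOG(INFO) << "End of ifstream: " << input_file;
            }
            data_size += static_cast<std::size_t>(ifs.gcount());
            // keep the padding after the valid data zeroed, as FFmpeg expects for input buffers
            std::memset(data + data_size, 0, AV_INPUT_BUFFER_PADDING_SIZE);
        }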

        // parse a video frame: pkt->size > 0 only once a complete frame is available,
        // and the return value is the number of input bytes consumed
        int parsed = av_parser_parse2(parser_ctx, codec_ctx,
                                      &pkt->data, &pkt->size,
                                      data, static_cast<int>(data_size),
                                      AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
        if (parsed < 0) {
            LOG(ERROR) << "Failed to parse video: " << ErrorToString(parsed);
            break;
        }
        data += parsed;
        data_size -= parsed;

        // decode video and write to output_file
        if (pkt->size > 0) {
            InnerDecodeVideo(codec_ctx, pkt, ofs);
        }

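        // at end of input: flush the parser, then drain the decoder
        if (data_size == 0 && ifs.eof()) {
            // an empty input buffer makes the parser emit the last frame it is still holding
            av_parser_parse2(parser_ctx, codec_ctx,
                             &pkt->data, &pkt->size,
                             nullptr, 0,
                             AV_NOPTS_VALUE, AV_NOPTS_VALUE, 0);
            if (pkt->size > 0) {
                InnerDecodeVideo(codec_ctx, pkt, ofs);
            }
            // a packet with data == nullptr and size == 0 puts the decoder into draining mode
            pkt->data = nullptr;
            pkt->size = 0;
            InnerDecodeVideo(codec_ctx, pkt, ofs);
            break;
        }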
    }

    LOG(INFO) << "Decode H264 video end";

    av_packet_free(&pkt);
    avcodec_free_context(&codec_ctx);
    av_parser_close(parser_ctx);
}

#if 0
int main(int argc, char *argv[]) {
    google::InitGoogleLogging(argv[0]);
    FLAGS_logtostderr = true;
    FLAGS_minloglevel = google::GLOG_INFO;

    DecodeVideo("video.h264", "video.yuv");

    google::ShutdownGoogleLogging();
    return 0;
}
#endif