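In early versions, FFmpeg did not natively support FLV streams carrying H.265 (HEVC) video. Starting with the 2023 releases, however, FFmpeg can handle H.265 FLV streams natively. This article takes a brief look at how FFmpeg implements that support.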
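The FLV format
Before diving in, let's briefly review the FLV container format. An FLV file consists of a header and a body.

The header is essentially fixed. It holds the "FLV" signature, a version byte, flags saying whether audio and video are present, and the header length.

The body is a sequence of tags. Each tag consists of a tag header and tag data, and each tag is followed by a 4-byte PreviousTagSize. For the tag header layout, see this article by Lei Xiaohua ("雷教主"). Here we focus on the tag data, which carries the codec-related information.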
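To make the layout concrete, here is a minimal C sketch that decodes the 9-byte FLV header and the 11-byte tag header from a memory buffer. The struct and function names (FlvHeader, FlvTagHeader, parse_flv_header, parse_tag_header) are made up for illustration and are not FFmpeg APIs. The sketch also assumes the caller has already checked that the buffers are long enough.

```c
#include <stdint.h>
#include <string.h>

/* Big-endian readers (FLV stores all multi-byte integers big-endian). */
static uint32_t rb24(const uint8_t *p) {
    return ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
}
static uint32_t rb32(const uint8_t *p) {
    return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) | ((uint32_t)p[2] << 8) | p[3];
}

typedef struct {
    int has_audio;          /* flags bit 2 */
    int has_video;          /* flags bit 0 */
    uint32_t header_size;   /* DataOffset, normally 9 */
} FlvHeader;

typedef struct {
    uint8_t  type;          /* 8 = audio, 9 = video, 18 = script data */
    uint32_t data_size;     /* size of the tag data, excluding this 11-byte header */
    uint32_t timestamp;     /* milliseconds: 24 bits + 8-bit extension as the top byte */
    uint32_t stream_id;     /* always 0 */
} FlvTagHeader;

/* Returns 0 on success, -1 if the 9 bytes are not an FLV version-1 header. */
static int parse_flv_header(const uint8_t *p, FlvHeader *h) {
    if (memcmp(p, "FLV", 3) != 0 || p[3] != 1)
        return -1;
    h->has_audio = (p[4] >> 2) & 1;
    h->has_video = p[4] & 1;
    h->header_size = rb32(p + 5);
    return 0;
}

/* Decodes an 11-byte tag header. */
static void parse_tag_header(const uint8_t *p, FlvTagHeader *t) {
    t->type = p[0] & 0x1F;                         /* low 5 bits are TagType */
    t->data_size = rb24(p + 1);
    t->timestamp = rb24(p + 4) | ((uint32_t)p[7] << 24);
    t->stream_id = rb24(p + 8);
}
```

For example, the tag header bytes `09 00 01 00 00 03 E8 00 00 00 00` decode to a video tag (type 9) with 256 bytes of data at timestamp 1000 ms.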
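The legacy FLV format (before 2023)
Before 2023, the leading bytes of an FLV video tag's data were defined as follows.

- The high 4 bits of the first byte give the frame type, for example whether the frame is a keyframe.
- The low 4 bits give the codec ID. A value of 7 means H.264/AVC, by far the most widely used case.
- When the codec ID is 7, the next byte is the AVCPacketType. It says whether the payload is a sequence header, i.e. the AVCDecoderConfigurationRecord carrying SPS/PPS (0), or ordinary NALU data (1).
- The last three bytes are the composition time offset (CTS), a signed value in milliseconds. When B-frames are present, it gives the offset of the PTS relative to the DTS, so that PTS = DTS + CTS.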
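The 5-byte legacy AVC header can be decoded as below. This is an illustrative sketch; the LegacyVideoHeader type and the parse_legacy_avc_header function are hypothetical. It only accepts CodecID 7 and treats the CTS as a signed 24-bit millisecond value.

```c
#include <stdint.h>
#include <stddef.h>

typedef struct {
    int frame_type;      /* high nibble of byte 0: 1 = keyframe, 2 = inter frame, ... */
    int codec_id;        /* low nibble of byte 0: 7 = AVC (H.264) */
    int avc_packet_type; /* 0 = sequence header, 1 = NALU, 2 = end of sequence */
    int32_t cts;         /* composition time offset, signed, in milliseconds */
} LegacyVideoHeader;

/* Returns the number of header bytes consumed (5), or -1 if the buffer is
 * too short or the codec is not AVC. */
static int parse_legacy_avc_header(const uint8_t *p, size_t len, LegacyVideoHeader *h) {
    if (len < 5)
        return -1;
    h->frame_type = p[0] >> 4;
    h->codec_id   = p[0] & 0x0F;
    if (h->codec_id != 7)
        return -1;
    h->avc_packet_type = p[1];
    uint32_t raw = ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 8) | p[4];
    h->cts = (int32_t)((raw ^ 0x800000u) - 0x800000u);  /* sign-extend 24 -> 32 bits */
    return 5;
}
```

For the bytes `17 01 00 00 28`, the result is a keyframe NALU packet with CTS = 40 ms, so its PTS is its DTS plus 40.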
The enhanced-flv format
With the old definition in mind, let's look at how enhanced-flv specifies H.265 in FLV.
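First, the high 4 bits of the first byte now carry two things.

- The most significant bit is the IsExHeader flag. A value of 1 means the tag uses the enhanced-flv layout; 0 means it follows the old protocol.
- The remaining three bits are the FrameType.

FrameType is now only 3 bits wide. The values actually used (1 = keyframe, 2 = inter frame, 5 = video info/command frame, and so on) all fit in three bits. That is also why a legacy stream never has the top bit set.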
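FrameType is followed by a 4-bit field. When IsExHeader is 0 (old protocol), this field is still the CodecID, with exactly the same meaning as before. When IsExHeader is 1, the four bits are no longer a CodecID but a PacketType, with these values:

- 0 = PacketTypeSequenceStart
- 1 = PacketTypeCodedFrames
- 2 = PacketTypeSequenceEnd
- 3 = PacketTypeCodedFramesX
- 4 = PacketTypeMetadata
- 5 = PacketTypeMPEG2TSSequenceStart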
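Here is a sketch of how the first byte splits under the two interpretations. The enum values are the PacketType values defined by the enhanced-RTMP spec. The names (PKT_*, VideoByte0, decode_video_byte0) are hypothetical. Legacy frame types only use values up to 5, so a legacy first byte always has bit 7 clear, which is what makes this reinterpretation backward compatible.

```c
#include <stdint.h>

/* PacketType values from the enhanced-RTMP spec (names here are illustrative). */
enum {
    PKT_SEQUENCE_START         = 0,
    PKT_CODED_FRAMES           = 1,
    PKT_SEQUENCE_END           = 2,
    PKT_CODED_FRAMES_X         = 3,
    PKT_METADATA               = 4,
    PKT_MPEG2TS_SEQUENCE_START = 5,
};

typedef struct {
    int is_ex_header; /* bit 7 */
    int frame_type;   /* bits 6..4 */
    int low_nibble;   /* bits 3..0: PacketType if is_ex_header, otherwise CodecID */
} VideoByte0;

static VideoByte0 decode_video_byte0(uint8_t b) {
    VideoByte0 v;
    v.is_ex_header = (b >> 7) & 1;
    v.frame_type   = (b >> 4) & 0x07;
    v.low_nibble   = b & 0x0F;
    return v;
}
```

For example, 0x90 decodes to an enhanced keyframe with PacketTypeSequenceStart, 0xA3 to an enhanced inter frame with PacketTypeCodedFramesX, and 0x17 to a legacy H.264 keyframe.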
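Note that with PacketType 3 (CodedFramesX), the CTS is implicitly 0 and its three bytes are omitted, which saves a little data per packet.

After PacketType come the four bytes of the video FourCC. In enhanced-flv this plays the role the CodecID used to play: it identifies the video codec, directly as ASCII characters (for example "hvc1" for H.265, "av01" for AV1 and "vp09" for VP9). As before, PacketType 0 and 1 mean, respectively:

- the sequence header (for H.265, the HEVCDecoderConfigurationRecord with VPS/SPS/PPS);
- coded frame data.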
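Putting byte 0, the FourCC and the optional CTS together gives the sketch below, a hypothetical parser for the enhanced video tag header. It follows the FFmpeg code analyzed below: the muxer writes a CTS only for HEVC, and the demuxer reads one only for HEVC CodedFrames. So the sketch reads the three CTS bytes only for PacketTypeCodedFrames with FourCC "hvc1". For CodedFramesX and the other codecs, the CTS stays 0. ExVideoHeader, FOURCC_BE and parse_ex_video_header are illustrative names, not FFmpeg's.

```c
#include <stdint.h>
#include <stddef.h>

#define FOURCC_BE(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                               ((uint32_t)(c) << 8) | (uint32_t)(d))

typedef struct {
    int frame_type;   /* bits 6..4 of byte 0 */
    int packet_type;  /* bits 3..0 of byte 0 */
    uint32_t fourcc;  /* e.g. FOURCC_BE('h','v','c','1') */
    int32_t cts;      /* 0 unless HEVC CodedFrames */
} ExVideoHeader;

/* Returns the offset of the payload inside the tag data, or -1 on error. */
static int parse_ex_video_header(const uint8_t *p, size_t len, ExVideoHeader *h) {
    if (len < 5 || !(p[0] & 0x80))        /* need byte 0 + FourCC and IsExHeader set */
        return -1;
    h->frame_type  = (p[0] >> 4) & 0x07;
    h->packet_type = p[0] & 0x0F;
    h->fourcc      = FOURCC_BE(p[1], p[2], p[3], p[4]);
    h->cts         = 0;                   /* implicit for CodedFramesX and other types */
    if (h->packet_type == 1 && h->fourcc == FOURCC_BE('h', 'v', 'c', '1')) {
        if (len < 8)
            return -1;
        uint32_t raw = ((uint32_t)p[5] << 16) | ((uint32_t)p[6] << 8) | p[7];
        h->cts = (int32_t)((raw ^ 0x800000u) - 0x800000u);  /* signed 24-bit */
        return 8;
    }
    return 5;
}
```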
enhanced-flv also adds a Metadata packet type. Such a tag carries no actual video data, only AMF-encoded metadata such as colorInfo.
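As an illustration of AMF0 encoding, the sketch below serializes the string "colorInfo" followed by a nested object. The nesting (colorConfig containing bitDepth) is an assumption based on the enhanced-RTMP spec's colorInfo description, and every function name here is hypothetical; consult the spec for the full set of fields. The AMF0 markers used are standard: 0x00 number, 0x02 string, 0x03 object, and 0x00 0x00 0x09 for object end.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define AMF0_NUMBER     0x00
#define AMF0_STRING     0x02
#define AMF0_OBJECT     0x03
#define AMF0_OBJECT_END 0x09

/* UI16 big-endian length followed by the bytes (used for strings and object keys). */
static size_t amf0_put_utf8(uint8_t *out, const char *s) {
    size_t n = strlen(s);
    out[0] = (uint8_t)(n >> 8);
    out[1] = (uint8_t)n;
    memcpy(out + 2, s, n);
    return n + 2;
}

static size_t amf0_put_string(uint8_t *out, const char *s) {
    out[0] = AMF0_STRING;
    return 1 + amf0_put_utf8(out + 1, s);
}

/* Number: marker + IEEE-754 double, big-endian. */
static size_t amf0_put_number(uint8_t *out, double d) {
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    out[0] = AMF0_NUMBER;
    for (int i = 0; i < 8; i++)
        out[1 + i] = (uint8_t)(bits >> (56 - 8 * i));
    return 9;
}

/* Hypothetical: "colorInfo" followed by { colorConfig: { bitDepth: N } }. */
static size_t build_color_info(uint8_t *out, double bit_depth) {
    size_t n = 0;
    n += amf0_put_string(out + n, "colorInfo");
    out[n++] = AMF0_OBJECT;
    n += amf0_put_utf8(out + n, "colorConfig");
    out[n++] = AMF0_OBJECT;
    n += amf0_put_utf8(out + n, "bitDepth");
    n += amf0_put_number(out + n, bit_depth);
    out[n++] = 0; out[n++] = 0; out[n++] = AMF0_OBJECT_END;  /* end of colorConfig */
    out[n++] = 0; out[n++] = 0; out[n++] = AMF0_OBJECT_END;  /* end of outer object */
    return n;
}
```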
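FFmpeg code analysis
"Before the source code, there are no secrets." --Hou Jie (侯捷)
Enough theory; let's look at how FFmpeg implements it.
Muxer-side implementation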
```c
static void flv_write_codec_header(AVFormatContext* s, AVCodecParameters* par, int64_t ts) {
    int64_t data_size;
    AVIOContext *pb = s->pb;
    FLVContext *flv = s->priv_data;
    if (par->codec_id == AV_CODEC_ID_AAC || par->codec_id == AV_CODEC_ID_H264
            || par->codec_id == AV_CODEC_ID_MPEG4 || par->codec_id == AV_CODEC_ID_HEVC
            || par->codec_id == AV_CODEC_ID_AV1 || par->codec_id == AV_CODEC_ID_VP9) {
        int64_t pos;
        avio_w8(pb,
                par->codec_type == AVMEDIA_TYPE_VIDEO ?
                        FLV_TAG_TYPE_VIDEO : FLV_TAG_TYPE_AUDIO);
        avio_wb24(pb, 0); // size patched later
        put_timestamp(pb, ts);
        avio_wb24(pb, 0); // streamid
        pos = avio_tell(pb);
        if (par->codec_id == AV_CODEC_ID_AAC) {
            avio_w8(pb, get_audio_flags(s, par));
            avio_w8(pb, 0); // AAC sequence header
            if (!par->extradata_size && (flv->flags & FLV_AAC_SEQ_HEADER_DETECT)) {
                PutBitContext pbc;
                int samplerate_index;
                int channels = par->ch_layout.nb_channels
                        - (par->ch_layout.nb_channels == 8 ? 1 : 0);
                uint8_t data[2];
                for (samplerate_index = 0; samplerate_index < 16;
                        samplerate_index++)
                    if (par->sample_rate
                            == ff_mpeg4audio_sample_rates[samplerate_index])
                        break;
                init_put_bits(&pbc, data, sizeof(data));
                put_bits(&pbc, 5, par->profile + 1); //profile
                put_bits(&pbc, 4, samplerate_index); //sample rate index
                put_bits(&pbc, 4, channels);
                put_bits(&pbc, 1, 0); //frame length - 1024 samples
                put_bits(&pbc, 1, 0); //does not depend on core coder
                put_bits(&pbc, 1, 0); //is not extension
                flush_put_bits(&pbc);
                avio_w8(pb, data[0]);
                avio_w8(pb, data[1]);
                av_log(s, AV_LOG_WARNING, "AAC sequence header: %02x %02x.\n",
                        data[0], data[1]);
            }
            avio_write(pb, par->extradata, par->extradata_size);
        } else {
            if (par->codec_id == AV_CODEC_ID_HEVC) {
                avio_w8(pb, FLV_IS_EX_HEADER | PacketTypeSequenceStart | FLV_FRAME_KEY); // ExVideoTagHeader mode with PacketTypeSequenceStart
                avio_write(pb, "hvc1", 4);
            } else if (par->codec_id == AV_CODEC_ID_AV1 || par->codec_id == AV_CODEC_ID_VP9) {
                avio_w8(pb, FLV_IS_EX_HEADER | PacketTypeSequenceStart | FLV_FRAME_KEY);
                avio_write(pb, par->codec_id == AV_CODEC_ID_AV1 ? "av01" : "vp09", 4);
            } else {
                avio_w8(pb, par->codec_tag | FLV_FRAME_KEY); // flags
                avio_w8(pb, 0); // AVC sequence header
                avio_wb24(pb, 0); // composition time
            }
            if (par->codec_id == AV_CODEC_ID_HEVC)
                ff_isom_write_hvcc(pb, par->extradata, par->extradata_size, 0);
            else if (par->codec_id == AV_CODEC_ID_AV1)
                ff_isom_write_av1c(pb, par->extradata, par->extradata_size, 1);
            else if (par->codec_id == AV_CODEC_ID_VP9)
                ff_isom_write_vpcc(s, pb, par->extradata, par->extradata_size, par);
            else
                ff_isom_write_avcc(pb, par->extradata, par->extradata_size);
        }
        data_size = avio_tell(pb) - pos;
        avio_seek(pb, -data_size - 10, SEEK_CUR);
        avio_wb24(pb, data_size);
        avio_skip(pb, data_size + 10 - 3);
        avio_wb32(pb, data_size + 11); // previous tag size
    }
}
```
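flv_write_codec_header writes a stream's sequence header tag.

- For video, this tag carries the codec configuration record: avcC with SPS/PPS for H.264, or hvcC with VPS/SPS/PPS for H.265. This record must precede any frame data, so it gets its own function.
- For AAC, the same function writes the AudioSpecificConfig.
- It is also called from flv_write_packet whenever a packet carries new extradata.

When the first byte of the video tag data is written, the keyframe flag and the SequenceStart packet type are set together, so the sequence header tag is marked as a keyframe. It carries only the configuration record, though. After the FourCC, the function writes the hvcC via ff_isom_write_hvcc, patches the tag size and returns. The first real keyframe is written later by flv_write_packet, in a tag of its own.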
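The size back-patching at the end of flv_write_codec_header can be mirrored on an in-memory buffer. It works in three steps: seek back, write the 24-bit data size, then skip forward and write PreviousTagSize = data size + 11. The sketch below builds a complete enhanced-flv HEVC sequence start tag this way. write_hevc_sequence_start, wb24 and wb32 are hypothetical helpers, and the hvcC bytes are assumed to come from elsewhere (FFmpeg builds them with ff_isom_write_hvcc). The body contains only the header byte, the FourCC and the configuration record, with no frame data.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

static void wb24(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 16); p[1] = (uint8_t)(v >> 8); p[2] = (uint8_t)v;
}
static void wb32(uint8_t *p, uint32_t v) {
    p[0] = (uint8_t)(v >> 24); p[1] = (uint8_t)(v >> 16);
    p[2] = (uint8_t)(v >> 8);  p[3] = (uint8_t)v;
}

/* Writes tag header + tag data + PreviousTagSize into out; returns bytes written.
 * out must have room for 11 + 5 + hvcc_len + 4 bytes. */
static size_t write_hevc_sequence_start(uint8_t *out, uint32_t ts_ms,
                                        const uint8_t *hvcc, size_t hvcc_len) {
    uint8_t *body = out + 11;
    size_t n = 0;

    out[0] = 9;                          /* TagType: video */
    wb24(out + 1, 0);                    /* DataSize: patched below */
    wb24(out + 4, ts_ms & 0xFFFFFF);     /* Timestamp, low 24 bits */
    out[7] = (uint8_t)(ts_ms >> 24);     /* TimestampExtended, upper 8 bits */
    wb24(out + 8, 0);                    /* StreamID: always 0 */

    body[n++] = 0x80 | 0x10 | 0x00;      /* IsExHeader | keyframe | PacketTypeSequenceStart */
    memcpy(body + n, "hvc1", 4);
    n += 4;
    memcpy(body + n, hvcc, hvcc_len);    /* HEVCDecoderConfigurationRecord */
    n += hvcc_len;

    wb24(out + 1, (uint32_t)n);          /* back-patch DataSize */
    wb32(body + n, (uint32_t)n + 11);    /* PreviousTagSize = DataSize + 11 */
    return 11 + n + 4;
}
```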
```c
static int flv_write_packet(AVFormatContext *s, AVPacket *pkt)
{
    AVIOContext *pb = s->pb;
    AVCodecParameters *par = s->streams[pkt->stream_index]->codecpar;
    FLVContext *flv = s->priv_data;
    unsigned ts;
    int size = pkt->size;
    uint8_t *data = NULL;
    uint8_t frametype = pkt->flags & AV_PKT_FLAG_KEY ? FLV_FRAME_KEY : FLV_FRAME_INTER;
    int flags = -1, flags_size, ret = 0;
    int64_t cur_offset = avio_tell(pb);
    if (par->codec_type == AVMEDIA_TYPE_AUDIO && !pkt->size) {
        av_log(s, AV_LOG_WARNING, "Empty audio Packet\n");
        return AVERROR(EINVAL);
    }
    if (par->codec_id == AV_CODEC_ID_VP6F || par->codec_id == AV_CODEC_ID_VP6A ||
        par->codec_id == AV_CODEC_ID_VP6 || par->codec_id == AV_CODEC_ID_AAC)
        flags_size = 2;
    else if (par->codec_id == AV_CODEC_ID_H264 || par->codec_id == AV_CODEC_ID_MPEG4 ||
             par->codec_id == AV_CODEC_ID_HEVC || par->codec_id == AV_CODEC_ID_AV1 ||
             par->codec_id == AV_CODEC_ID_VP9)
        flags_size = 5;
    else
        flags_size = 1;
    if (par->codec_id == AV_CODEC_ID_HEVC && pkt->pts != pkt->dts)
        flags_size += 3;
    if (par->codec_id == AV_CODEC_ID_AAC || par->codec_id == AV_CODEC_ID_H264
            || par->codec_id == AV_CODEC_ID_MPEG4 || par->codec_id == AV_CODEC_ID_HEVC
            || par->codec_id == AV_CODEC_ID_AV1 || par->codec_id == AV_CODEC_ID_VP9) {
        size_t side_size;
        uint8_t *side = av_packet_get_side_data(pkt, AV_PKT_DATA_NEW_EXTRADATA, &side_size);
        if (side && side_size > 0 && (side_size != par->extradata_size || memcmp(side, par->extradata, side_size))) {
            ret = ff_alloc_extradata(par, side_size);
            if (ret < 0)
                return ret;
            memcpy(par->extradata, side, side_size);
            flv_write_codec_header(s, par, pkt->dts);
        }
        flv_write_metadata_packet(s, par, pkt->dts);
    }
    if (flv->delay == AV_NOPTS_VALUE)
        flv->delay = -pkt->dts;
    if (pkt->dts < -flv->delay) {
        av_log(s, AV_LOG_WARNING,
               "Packets are not in the proper order with respect to DTS\n");
        return AVERROR(EINVAL);
    }
    if (par->codec_id == AV_CODEC_ID_H264 || par->codec_id == AV_CODEC_ID_MPEG4 ||
        par->codec_id == AV_CODEC_ID_HEVC || par->codec_id == AV_CODEC_ID_AV1 ||
        par->codec_id == AV_CODEC_ID_VP9) {
        if (pkt->pts == AV_NOPTS_VALUE) {
            av_log(s, AV_LOG_ERROR, "Packet is missing PTS\n");
            return AVERROR(EINVAL);
        }
    }
    ts = pkt->dts;
    if (s->event_flags & AVSTREAM_EVENT_FLAG_METADATA_UPDATED) {
        write_metadata(s, ts);
        s->event_flags &= ~AVSTREAM_EVENT_FLAG_METADATA_UPDATED;
    }
    avio_write_marker(pb, av_rescale(ts, AV_TIME_BASE, 1000),
                      pkt->flags & AV_PKT_FLAG_KEY && (flv->video_par ? par->codec_type == AVMEDIA_TYPE_VIDEO : 1) ? AVIO_DATA_MARKER_SYNC_POINT : AVIO_DATA_MARKER_BOUNDARY_POINT);
    switch (par->codec_type) {
    case AVMEDIA_TYPE_VIDEO:
        avio_w8(pb, FLV_TAG_TYPE_VIDEO);
        flags = ff_codec_get_tag(flv_video_codec_ids, par->codec_id);
        flags |= frametype;
        break;
    case AVMEDIA_TYPE_AUDIO:
        flags = get_audio_flags(s, par);
        av_assert0(size);
        avio_w8(pb, FLV_TAG_TYPE_AUDIO);
        break;
    case AVMEDIA_TYPE_SUBTITLE:
    case AVMEDIA_TYPE_DATA:
        avio_w8(pb, FLV_TAG_TYPE_META);
        break;
    default:
        return AVERROR(EINVAL);
    }
    if (par->codec_id == AV_CODEC_ID_H264 || par->codec_id == AV_CODEC_ID_MPEG4) {
        /* check if extradata looks like mp4 formatted */
        if (par->extradata_size > 0 && *(uint8_t*)par->extradata != 1)
            if ((ret = ff_nal_parse_units_buf(pkt->data, &data, &size)) < 0)
                return ret;
    } else if (par->codec_id == AV_CODEC_ID_HEVC) {
        if (par->extradata_size > 0 && *(uint8_t*)par->extradata != 1)
            if ((ret = ff_hevc_annexb2mp4_buf(pkt->data, &data, &size, 0, NULL)) < 0)
                return ret;
    } else if (par->codec_id == AV_CODEC_ID_AAC && pkt->size > 2 &&
               (AV_RB16(pkt->data) & 0xfff0) == 0xfff0) {
        if (!s->streams[pkt->stream_index]->nb_frames) {
            av_log(s, AV_LOG_ERROR, "Malformed AAC bitstream detected: "
                   "use the audio bitstream filter 'aac_adtstoasc' to fix it "
                   "('-bsf:a aac_adtstoasc' option with ffmpeg)\n");
            return AVERROR_INVALIDDATA;
        }
        av_log(s, AV_LOG_WARNING, "aac bitstream error\n");
    }
    /* check Speex packet duration */
    if (par->codec_id == AV_CODEC_ID_SPEEX && ts - flv->last_ts[pkt->stream_index] > 160)
        av_log(s, AV_LOG_WARNING, "Warning: Speex stream has more than "
               "8 frames per packet. Adobe Flash "
               "Player cannot handle this!\n");
    if (flv->last_ts[pkt->stream_index] < ts)
        flv->last_ts[pkt->stream_index] = ts;
    if (size + flags_size >= 1<<24) {
        av_log(s, AV_LOG_ERROR, "Too large packet with size %u >= %u\n",
               size + flags_size, 1<<24);
        ret = AVERROR(EINVAL);
        goto fail;
    }
    avio_wb24(pb, size + flags_size);
    put_timestamp(pb, ts);
    avio_wb24(pb, flv->reserved);
    if (par->codec_type == AVMEDIA_TYPE_DATA ||
        par->codec_type == AVMEDIA_TYPE_SUBTITLE ) {
        int data_size;
        int64_t metadata_size_pos = avio_tell(pb);
        if (par->codec_id == AV_CODEC_ID_TEXT) {
            // legacy FFmpeg magic?
            avio_w8(pb, AMF_DATA_TYPE_STRING);
            put_amf_string(pb, "onTextData");
            avio_w8(pb, AMF_DATA_TYPE_MIXEDARRAY);
            avio_wb32(pb, 2);
            put_amf_string(pb, "type");
            avio_w8(pb, AMF_DATA_TYPE_STRING);
            put_amf_string(pb, "Text");
            put_amf_string(pb, "text");
            avio_w8(pb, AMF_DATA_TYPE_STRING);
            put_amf_string(pb, pkt->data);
            put_amf_string(pb, "");
            avio_w8(pb, AMF_END_OF_OBJECT);
        } else {
            // just pass the metadata through
            avio_write(pb, data ? data : pkt->data, size);
        }
        /* write total size of tag */
        data_size = avio_tell(pb) - metadata_size_pos;
        avio_seek(pb, metadata_size_pos - 10, SEEK_SET);
        avio_wb24(pb, data_size);
        avio_seek(pb, data_size + 10 - 3, SEEK_CUR);
        avio_wb32(pb, data_size + 11);
    } else {
        av_assert1(flags>=0);
        if (par->codec_id == AV_CODEC_ID_HEVC) {
            int pkttype = (pkt->pts != pkt->dts) ? PacketTypeCodedFrames : PacketTypeCodedFramesX;
            avio_w8(pb, FLV_IS_EX_HEADER | pkttype | frametype); // ExVideoTagHeader mode with PacketTypeCodedFrames(X)
            avio_write(pb, "hvc1", 4);
            if (pkttype == PacketTypeCodedFrames)
                avio_wb24(pb, pkt->pts - pkt->dts);
        } else if (par->codec_id == AV_CODEC_ID_AV1 || par->codec_id == AV_CODEC_ID_VP9) {
            avio_w8(pb, FLV_IS_EX_HEADER | PacketTypeCodedFrames | frametype);
            avio_write(pb, par->codec_id == AV_CODEC_ID_AV1 ? "av01" : "vp09", 4);
        } else {
            avio_w8(pb, flags);
        }
        if (par->codec_id == AV_CODEC_ID_VP6)
            avio_w8(pb,0);
        if (par->codec_id == AV_CODEC_ID_VP6F || par->codec_id == AV_CODEC_ID_VP6A) {
            if (par->extradata_size)
                avio_w8(pb, par->extradata[0]);
            else
                avio_w8(pb, ((FFALIGN(par->width, 16) - par->width) << 4) |
                            (FFALIGN(par->height, 16) - par->height));
        } else if (par->codec_id == AV_CODEC_ID_AAC)
            avio_w8(pb, 1); // AAC raw
        else if (par->codec_id == AV_CODEC_ID_H264 || par->codec_id == AV_CODEC_ID_MPEG4) {
            avio_w8(pb, 1); // AVC NALU
            avio_wb24(pb, pkt->pts - pkt->dts);
        }
        avio_write(pb, data ? data : pkt->data, size);
        avio_wb32(pb, size + flags_size + 11); // previous tag size
        flv->duration = FFMAX(flv->duration,
                              pkt->pts + flv->delay + pkt->duration);
    }
    if (flv->flags & FLV_ADD_KEYFRAME_INDEX) {
        switch (par->codec_type) {
        case AVMEDIA_TYPE_VIDEO:
            flv->videosize += (avio_tell(pb) - cur_offset);
            flv->lasttimestamp = pkt->dts / 1000.0;
            if (pkt->flags & AV_PKT_FLAG_KEY) {
                flv->lastkeyframetimestamp = flv->lasttimestamp;
                flv->lastkeyframelocation = cur_offset;
                ret = flv_append_keyframe_info(s, flv, flv->lasttimestamp, cur_offset);
                if (ret < 0)
                    goto fail;
            }
            break;
        case AVMEDIA_TYPE_AUDIO:
            flv->audiosize += (avio_tell(pb) - cur_offset);
            break;
        default:
            av_log(s, AV_LOG_WARNING, "par->codec_type is type = [%d]\n", par->codec_type);
            break;
        }
    }
fail:
    av_free(data);
    return ret;
}
```
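Now for flv_write_packet. First, note that flags_size is 5 for all of these video codecs, but the 5 bytes are made up differently.

- For H.264 they are one byte of FrameType + CodecID, one byte of AVCPacketType and three fixed bytes of CTS.
- For H.265, AV1 and VP9 they are one byte of IsExHeader + FrameType + PacketType, plus the four bytes of the FourCC (the characters play the role of the CodecID).
- For H.265, when the packet has a CTS (pts != dts), three more bytes are added.

Writing the frame data follows the same logic. The code checks whether the frame is a keyframe and whether it needs a CTS, then writes the corresponding PacketType: CodedFrames followed by the 3-byte CTS, or otherwise CodedFramesX. We won't go through it line by line.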
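The flags_size rules above can be condensed into a small function. This is a simplified sketch: the Codec enum and video_flags_size are hypothetical, and the VP6/AAC cases (2 bytes) are left out.

```c
#include <stdint.h>

typedef enum { CODEC_H264, CODEC_MPEG4, CODEC_HEVC, CODEC_AV1, CODEC_VP9, CODEC_OTHER } Codec;

/* Bytes of tag data that precede the payload, following flv_write_packet(). */
static int video_flags_size(Codec c, int64_t pts, int64_t dts) {
    switch (c) {
    case CODEC_H264:
    case CODEC_MPEG4:
        return 1 + 1 + 3;                      /* FrameType|CodecID, AVCPacketType, CTS */
    case CODEC_HEVC:
        return 1 + 4 + (pts != dts ? 3 : 0);   /* ExHeader byte, FourCC, CTS only for CodedFrames */
    case CODEC_AV1:
    case CODEC_VP9:
        return 1 + 4;                          /* ExHeader byte, FourCC; no CTS */
    default:
        return 1;                              /* single FrameType|CodecID byte */
    }
}
```

So an H.265 packet with B-frame reordering (pts != dts) spends 8 bytes before the payload, while one with pts == dts spends 5.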
Demuxer-side implementation
Let's go straight to the key code in flv_read_packet:
```c
if (type == FLV_TAG_TYPE_AUDIO) {
    stream_type = FLV_STREAM_TYPE_AUDIO;
    flags = avio_r8(s->pb);
    size--;
} else if (type == FLV_TAG_TYPE_VIDEO) {
    stream_type = FLV_STREAM_TYPE_VIDEO;
    flags = avio_r8(s->pb);
    video_codec_id = flags & FLV_VIDEO_CODECID_MASK;
    /*
     * Reference Enhancing FLV 2023-03-v1.0.0-B.8
     * https://github.com/veovera/enhanced-rtmp/blob/main/enhanced-rtmp-v1.pdf
     * */
    enhanced_flv = (flags >> 7) & 1;
    size--;
    if (enhanced_flv) {
        video_codec_id = avio_rb32(s->pb);
        size -= 4;
    }
    if (enhanced_flv && stream_type == FLV_STREAM_TYPE_VIDEO && (flags & FLV_VIDEO_FRAMETYPE_MASK) == FLV_FRAME_VIDEO_INFO_CMD) {
        int pkt_type = flags & 0x0F;
        if (pkt_type == PacketTypeMetadata) {
            int ret = flv_parse_video_color_info(s, st, next);
            av_log(s, AV_LOG_DEBUG, "enhanced flv parse metadata ret %d and skip\n", ret);
        }
        goto skip;
    } else if ((flags & FLV_VIDEO_FRAMETYPE_MASK) == FLV_FRAME_VIDEO_INFO_CMD) {
        goto skip;
    }
} else if (type == FLV_TAG_TYPE_META) {
    stream_type=FLV_STREAM_TYPE_SUBTITLE;
    if (size > 13 + 1 + 4) { // Header-type metadata stuff
        int type;
        meta_pos = avio_tell(s->pb);
        type = flv_read_metabody(s, next);
        if (type == 0 && dts == 0 || type < 0) {
            if (type < 0 && flv->validate_count &&
                flv->validate_index[0].pos > next &&
                flv->validate_index[0].pos - 4 < next) {
                av_log(s, AV_LOG_WARNING, "Adjusting next position due to index mismatch\n");
                next = flv->validate_index[0].pos - 4;
            }
            goto skip;
        } else if (type == TYPE_ONTEXTDATA) {
            avpriv_request_sample(s, "OnTextData packet");
            return flv_data_packet(s, pkt, dts, next);
        } else if (type == TYPE_ONCAPTION) {
            return flv_data_packet(s, pkt, dts, next);
        } else if (type == TYPE_UNKNOWN) {
            stream_type = FLV_STREAM_TYPE_DATA;
        }
        avio_seek(s->pb, meta_pos, SEEK_SET);
    }
} else {
    av_log(s, AV_LOG_DEBUG,
           "Skipping flv packet: type %d, size %d, flags %d.\n",
           type, size, flags);
skip:
    if (avio_seek(s->pb, next, SEEK_SET) != next) {
        // This can happen if flv_read_metabody above read past
        // next, on a non-seekable input, and the preceding data has
        // been flushed out from the IO buffer.
        av_log(s, AV_LOG_ERROR, "Unable to seek to the next packet\n");
        return AVERROR_INVALIDDATA;
    }
    ret = FFERROR_REDO;
    goto leave;
}
```
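After reading the first byte of the FLV video tag data, the demuxer checks its most significant bit to decide whether this is an enhanced-flv tag. If it is, the demuxer reads the next four bytes as the codec identifier (the FourCC). Otherwise it uses the legacy 4-bit CodecID, so H.264 streams keep working with the old logic.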
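Here is a sketch of the same decision. Bit 7 of the first byte selects between the legacy 4-bit CodecID and a 32-bit FourCC, read big-endian just as avio_rb32 does. The VideoCodec enum, the FOURCC_BE macro and video_codec are hypothetical. The legacy IDs shown are 2 (Sorenson H.263), 4 (VP6) and 7 (H.264). The caller is assumed to pass at least five bytes when bit 7 is set.

```c
#include <stdint.h>

#define FOURCC_BE(a, b, c, d) (((uint32_t)(a) << 24) | ((uint32_t)(b) << 16) | \
                               ((uint32_t)(c) << 8) | (uint32_t)(d))

typedef enum { VC_H263, VC_VP6, VC_H264, VC_HEVC, VC_AV1, VC_VP9, VC_UNKNOWN } VideoCodec;

static VideoCodec video_codec(const uint8_t *p) {
    if (p[0] & 0x80) {                               /* enhanced-flv: FourCC follows */
        uint32_t fourcc = FOURCC_BE(p[1], p[2], p[3], p[4]);
        if (fourcc == FOURCC_BE('h', 'v', 'c', '1')) return VC_HEVC;
        if (fourcc == FOURCC_BE('a', 'v', '0', '1')) return VC_AV1;
        if (fourcc == FOURCC_BE('v', 'p', '0', '9')) return VC_VP9;
        return VC_UNKNOWN;
    }
    switch (p[0] & 0x0F) {                           /* legacy 4-bit CodecID */
    case 2:  return VC_H263;
    case 4:  return VC_VP6;
    case 7:  return VC_H264;
    default: return VC_UNKNOWN;
    }
}
```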
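Once the codec ID is known, the demuxer creates the AVStream if it does not exist yet and sets its codec ID. The player can then initialize the matching decoder.

Back in the main flow, the demuxer also takes the PacketType when reading the packet and handles it accordingly. For enhanced-flv, the PacketType is the low 4 bits of the first byte; for legacy streams it is read from the following byte.

The FrameType is consumed too. In the code above it is used to detect video info/command frames. Elsewhere in flv_read_packet, a keyframe FrameType sets AV_PKT_FLAG_KEY on the output packet.

When the PacketType is PacketTypeSequenceStart (0, which coincides with the legacy AVC sequence header value), the demuxer reads the extradata: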
```c
if (type == 0 && (!st->codecpar->extradata || st->codecpar->codec_id == AV_CODEC_ID_AAC ||
    st->codecpar->codec_id == AV_CODEC_ID_H264 || st->codecpar->codec_id == AV_CODEC_ID_HEVC ||
    st->codecpar->codec_id == AV_CODEC_ID_AV1 || st->codecpar->codec_id == AV_CODEC_ID_VP9)) {
    AVDictionaryEntry *t;
    if (st->codecpar->extradata) {
        if ((ret = flv_queue_extradata(flv, s->pb, stream_type, size)) < 0)
            return ret;
        ret = FFERROR_REDO;
        goto leave;
    }
    if ((ret = flv_get_extradata(s, st, size)) < 0)
        return ret;
    /* Workaround for buggy Omnia A/XE encoder */
    t = av_dict_get(s->metadata, "Encoder", NULL, 0);
    if (st->codecpar->codec_id == AV_CODEC_ID_AAC && t && !strcmp(t->value, "Omnia A/XE"))
        st->codecpar->extradata_size = 2;
    ret = FFERROR_REDO;
    goto leave;
}
```
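The PacketType handling for HEVC, including the CodedFrames case covered next, can be condensed into a small dispatcher. The sketch assumes that byte 0 and the FourCC of an HEVC tag have already been consumed; Action, Demuxed and dispatch_hevc are hypothetical names. PacketTypeMetadata is deliberately not handled here, because in the code above it is consumed earlier, together with the video info/command frame type.

```c
#include <stdint.h>
#include <stddef.h>

typedef enum { ACT_SET_EXTRADATA, ACT_EMIT_FRAME, ACT_END_OF_SEQUENCE, ACT_SKIP } Action;

typedef struct {
    Action action;
    int64_t pts;          /* meaningful for ACT_EMIT_FRAME */
    const uint8_t *data;  /* payload after any CTS bytes */
    size_t size;
} Demuxed;

static Demuxed dispatch_hevc(int packet_type, const uint8_t *p, size_t len, int64_t dts) {
    Demuxed d = { ACT_SKIP, dts, p, len };
    switch (packet_type) {
    case 0:                                   /* SequenceStart: payload is an hvcC record */
        d.action = ACT_SET_EXTRADATA;
        break;
    case 1: {                                 /* CodedFrames: SI24 CTS precedes the NALUs */
        if (len < 3)
            break;
        uint32_t raw = ((uint32_t)p[0] << 16) | ((uint32_t)p[1] << 8) | p[2];
        int32_t cts = (int32_t)((raw ^ 0x800000u) - 0x800000u);
        d.action = ACT_EMIT_FRAME;
        d.pts = dts + cts;
        d.data = p + 3;
        d.size = len - 3;
        break;
    }
    case 2:                                   /* SequenceEnd */
        d.action = ACT_END_OF_SEQUENCE;
        break;
    case 3:                                   /* CodedFramesX: CTS implicitly 0, pts == dts */
        d.action = ACT_EMIT_FRAME;
        break;
    }
    return d;
}
```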
When the PacketType is PacketTypeCodedFrames, the three CTS bytes must be read:
```c
if (st->codecpar->codec_id == AV_CODEC_ID_H264 || st->codecpar->codec_id == AV_CODEC_ID_MPEG4 ||
    (st->codecpar->codec_id == AV_CODEC_ID_HEVC && type == PacketTypeCodedFrames)) {
    // sign extension
    int32_t cts = (avio_rb24(s->pb) + 0xff800000) ^ 0xff800000;
    pts = av_sat_add64(dts, cts);
    if (cts < 0) { // dts might be wrong
        if (!flv->wrong_dts)
            av_log(s, AV_LOG_WARNING,
                   "Negative cts, previous timestamps might be wrong.\n");
        flv->wrong_dts = 1;
    } else if (FFABS(dts - pts) > 1000*60*15) {
        av_log(s, AV_LOG_WARNING,
               "invalid timestamps %"PRId64" %"PRId64"\n", dts, pts);
        dts = pts = AV_NOPTS_VALUE;
    }
    size -= 3;
}
```
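Here the CTS is read and processed. The CTS is a signed 24-bit value and C has no 24-bit integer type. The code therefore sign-extends it into an int32_t so that the arithmetic that follows works. Assigning the raw value directly would lose the sign: 0xFFFFFF, for example, would become 16777215 instead of -1.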
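Here is a small sketch comparing three forms: FFmpeg's expression, an equivalent XOR-and-subtract form, and a plain assignment. The function names are hypothetical. The casts from uint32_t to int32_t rely on two's-complement conversion, which mainstream compilers provide and which FFmpeg's own expression relies on as well.

```c
#include <stdint.h>

/* FFmpeg's expression from flv_read_packet(): adding 0xff800000 and then
 * XOR-ing with it yields the value unchanged when bit 23 is clear, and
 * fills bits 31..24 with ones when bit 23 is set. */
static int32_t sext24_ffmpeg(uint32_t v24) {
    return (int32_t)((v24 + 0xff800000u) ^ 0xff800000u);
}

/* Equivalent form: flip the sign bit, then subtract its weight. */
static int32_t sext24_xor_sub(uint32_t v24) {
    return (int32_t)((v24 ^ 0x800000u) - 0x800000u);
}

/* Plain assignment: the 24-bit pattern is read as an unsigned number. */
static int32_t naive24(uint32_t v24) {
    return (int32_t)v24;
}
```

For example, a CTS field of `FF FF D8` means -40 ms, whereas the plain assignment would give 16777176.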
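Conclusion
That is how FFmpeg supports FLV carrying H.265.

In China, the common practice has been to keep the old layout and assign HEVC the CodecID 12, so a keyframe's first byte is 0x1C. FFmpeg instead adopts the new enhanced-flv protocol, which identifies codecs by FourCC. This lets it accommodate new codecs in the future without hitting the 16-value limit of the 4-bit CodecID.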
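To contrast the two conventions, here is a hypothetical helper that recognizes H.265 under either one. It accepts an enhanced-flv tag with FourCC "hvc1", or a legacy-layout tag with CodecID 12. It is an illustrative sketch only, not code from FFmpeg.

```c
#include <stdint.h>
#include <stddef.h>

/* Returns 1 if the video tag data carries H.265 under either convention:
 *  - enhanced FLV: IsExHeader bit set and FourCC "hvc1";
 *  - legacy-layout extension: 4-bit CodecID 12 (keyframe byte 0x1C, inter 0x2C). */
static int is_hevc_video_tag(const uint8_t *p, size_t len) {
    if (len < 1)
        return 0;
    if (p[0] & 0x80)
        return len >= 5 && p[1] == 'h' && p[2] == 'v' && p[3] == 'c' && p[4] == '1';
    return (p[0] & 0x0F) == 12;
}
```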
References: