MCAP Study Notes

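MCAP (Modular Container for Autonomous Platforms) is a modular container file format for heterogeneous timestamped data. It was introduced by the robotics company Foxglove in 2021 as a data storage format for autonomous driving and robotics systems.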

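Core Features

Modular container format: records timestamped publish-subscribe messages and persistently stores them in any serialization format (such as Protobuf or JSON).
Flexibility and durability: designed to suit a wide range of workloads, resource constraints, and durability requirements.
Official parsing aid: a Kaitai Struct description file, mcap.ksy, is provided and can be used to parse the MCAP file structure.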

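Data Organization

Data is organized through Messages, Channels, and Schemas.

Channel: a stream of messages that share the same type or schema, comparable to the connection between a publisher and its subscribers.

Schema: a description of the structure and content of the messages on a channel, such as a Protobuf schema or a JSON Schema.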

Overall File Structure

An MCAP file consists of a fixed Magic, a Header, a Data section, a Summary section, a Summary Offset section, and a Footer. The Summary and Summary Offset sections are optional. The layout is:

<Magic><Header><Data section>[<Summary section>][<Summary Offset section>]<Footer><Magic>

The Data, Summary, and Summary Offset sections are each a sequence of records:

[<record type><record content length><record><record type><record content length><record>...]
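The record sequence above can be walked with a small reader. This is a minimal sketch, assuming the field widths given in the MCAP specification (a 1-byte record type followed by a little-endian uint64 content length), which these notes do not spell out. The names iter_records and make_record are hypothetical helpers, not part of any MCAP library.

```python
import io
import struct
from typing import BinaryIO, Iterator, Tuple


def make_record(opcode: int, content: bytes) -> bytes:
    """Frame content as <record type><record content length><record>."""
    return struct.pack("<BQ", opcode, len(content)) + content


def iter_records(stream: BinaryIO) -> Iterator[Tuple[int, bytes]]:
    """Yield (opcode, content) pairs from a stream positioned at a record boundary."""
    while True:
        prefix = stream.read(9)  # 1-byte opcode + 8-byte little-endian length
        if len(prefix) < 9:
            return  # end of stream
        opcode, length = struct.unpack("<BQ", prefix)
        content = stream.read(length)
        if len(content) < length:
            raise ValueError("truncated record content")
        yield opcode, content


if __name__ == "__main__":
    demo = make_record(0x01, b"abc") + make_record(0x0F, b"")
    for opcode, content in iter_records(io.BytesIO(demo)):
        print(hex(opcode), len(content))
```

When reading a real file, the caller would skip the leading Magic first and stop once the Footer record has been seen, since the trailing Magic is not a record.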

Magic

The file must begin and end with the fixed byte sequence 0x89, M, C, A, P, 0x30, \r, \n, where 0x30 is the ASCII character '0'.
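A quick way to check the rule above is to compare the first and last 8 bytes of the file against the magic. A minimal sketch; has_valid_magic is a hypothetical helper name.

```python
# The 8 magic bytes: 0x89, 'M', 'C', 'A', 'P', 0x30 ('0'), '\r', '\n'
MAGIC = b"\x89MCAP0\r\n"


def has_valid_magic(data: bytes) -> bool:
    """Return True if data starts and ends with the MCAP magic bytes."""
    return (
        len(data) >= 2 * len(MAGIC)
        and data.startswith(MAGIC)
        and data.endswith(MAGIC)
    )


if __name__ == "__main__":
    print(has_valid_magic(MAGIC + b"...records..." + MAGIC))
```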

Header

The Header is the first record after the leading Magic, in the form: <0x01><record content length><record>

Footer

The Footer is the last record before the trailing Magic, in the form: <0x02><record content length><record>

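Data Section

The data section holds records that carry message data, attachments, and supporting records. The following record types are allowed:

Category	Record type	Description
Definitions	Schema	Defines the serialization schema for messages (e.g. a Protobuf schema)
	Channel	Defines a message channel (topic, message encoding, metadata, etc.)
Data	Message	A single message (timestamps, channel ID, data payload)
	Attachment	An associated file or binary attachment (e.g. logs, images)
Chunking	Chunk	Stores messages in batches (supports compression and indexing)
Indexing	Message Index	Message index (byte offsets of messages)
Metadata	Metadata	File-level metadata (e.g. author, timestamp)
End marker	Data End	Marks the end of the data section (exactly one, and it must be the last record)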

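Schema record (op=0x03)

Defines the serialization schema of messages; it is what a Channel refers to.

Must appear before any Channel that references it; the summary section may contain a copy.
A Schema record with an ID of 0 is invalid and must be ignored by parsers.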

Bytes	Name	Type	Description
2	id	uint16	A unique identifier for this schema within the file. Must not be zero
4 + N	name	String	An identifier for the schema.
4 + N	encoding	String	Format for the schema. The well-known schema encodings are preferred. An empty string indicates no schema is available.
4 + N	data	uint32 length-prefixed Bytes	Must conform to the schema encoding. If encoding is an empty string, data should be 0 length.

Example scenario:

Store the Protobuf schema of a ROS 2 message so that a Channel can refer to it for the message encoding rules.
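The Schema field table can be turned into a small encoder and decoder. This is a minimal sketch, assuming little-endian integers and uint32 length-prefixed UTF-8 strings as in the MCAP specification. The Schema tuple and the encode_schema / parse_schema helpers are hypothetical names, and the functions work on the record content only (without the opcode and length prefix).

```python
import struct
from typing import NamedTuple, Optional, Tuple


class Schema(NamedTuple):
    id: int
    name: str
    encoding: str
    data: bytes


def _put(b: bytes) -> bytes:
    """uint32 length prefix followed by the bytes."""
    return struct.pack("<I", len(b)) + b


def _get(buf: bytes, pos: int) -> Tuple[bytes, int]:
    """Read a uint32 length-prefixed field; return (value, next position)."""
    (n,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    return buf[start:start + n], start + n


def encode_schema(s: Schema) -> bytes:
    return (struct.pack("<H", s.id) + _put(s.name.encode("utf-8"))
            + _put(s.encoding.encode("utf-8")) + _put(s.data))


def parse_schema(content: bytes) -> Optional[Schema]:
    (schema_id,) = struct.unpack_from("<H", content, 0)
    name, pos = _get(content, 2)
    encoding, pos = _get(content, pos)
    data, pos = _get(content, pos)
    if schema_id == 0:
        return None  # id 0 is invalid, so the record is ignored
    return Schema(schema_id, name.decode("utf-8"), encoding.decode("utf-8"), data)


if __name__ == "__main__":
    print(parse_schema(encode_schema(Schema(1, "Twist", "protobuf", b"\x0a\x00"))))
```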

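Channel record (op=0x04)

Defines a message channel (topic, message encoding, metadata); it is the logical unit that messages belong to.

Must appear before any Message that references it; the summary section may contain a copy.
Channel records with the same ID must be identical, so that messages are always parsed consistently.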

Bytes	Name	Type	Description
2	id	uint16	A unique identifier for this channel within the file.
2	schema_id	uint16	The schema for messages on this channel. A schema_id of 0 indicates there is no schema for this channel.
4 + N	topic	String	The channel topic.
4 + N	message_encoding	String	Encoding for messages on this channel. The well-known message encodings are preferred.
4 + N	metadata	Map<string, string>	Metadata about this channel

Example scenario:

Define the robot velocity control channel /cmd_vel, link it to schema ID 1 (Protobuf format), and note the unit in the metadata.
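The /cmd_vel scenario can be sketched as a round trip through the Channel layout. This assumes the Map<string, string> encoding from the MCAP specification: a uint32 byte length followed by alternating length-prefixed keys and values. The Channel tuple and helper names are hypothetical, and the "unit" metadata key is only an illustration.

```python
import struct
from typing import Dict, NamedTuple, Tuple


class Channel(NamedTuple):
    id: int
    schema_id: int
    topic: str
    message_encoding: str
    metadata: Dict[str, str]


def _put_str(s: str) -> bytes:
    raw = s.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def _get_str(buf: bytes, pos: int) -> Tuple[str, int]:
    (n,) = struct.unpack_from("<I", buf, pos)
    start = pos + 4
    return buf[start:start + n].decode("utf-8"), start + n


def encode_channel(ch: Channel) -> bytes:
    pairs = b"".join(_put_str(k) + _put_str(v) for k, v in ch.metadata.items())
    return (struct.pack("<HH", ch.id, ch.schema_id) + _put_str(ch.topic)
            + _put_str(ch.message_encoding)
            + struct.pack("<I", len(pairs)) + pairs)  # map is prefixed by its byte length


def parse_channel(content: bytes) -> Channel:
    channel_id, schema_id = struct.unpack_from("<HH", content, 0)
    topic, pos = _get_str(content, 4)
    encoding, pos = _get_str(content, pos)
    (map_len,) = struct.unpack_from("<I", content, pos)
    pos += 4
    end = pos + map_len
    metadata: Dict[str, str] = {}
    while pos < end:
        key, pos = _get_str(content, pos)
        value, pos = _get_str(content, pos)
        metadata[key] = value
    return Channel(channel_id, schema_id, topic, encoding, metadata)


if __name__ == "__main__":
    cmd_vel = Channel(1, 1, "/cmd_vel", "protobuf", {"unit": "m/s"})
    print(parse_channel(encode_channel(cmd_vel)))
```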

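Message record (op=0x05)

Stores a single timestamped message; it must match the definition of its Channel.

The data must conform to the Schema linked to the Channel.
Writing messages directly into the data section is not recommended; store them in Chunks so that they can be indexed.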

Bytes	Name	Type	Description
2	channel_id	uint16	Channel ID
4	sequence	uint32	Optional message counter to detect message gaps. If your middleware publisher provides a sequence number you can use that, or you can assign a sequence number in the recorder, or set to zero if this is not relevant for your workflow.
8	log_time	Timestamp	Time at which the message was recorded.
8	publish_time	Timestamp	Time at which the message was published. If not available, must be set to the log time.
N	data	Bytes	Message data, to be decoded according to the schema of the channel.

Example scenario:

Record the robot's velocity message [x=0.5, y=0] at t=100ms with sequence=5, where data is the Protobuf-serialized byte stream.

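Chunk record (op=0x06)

Stores messages and their related records (Schema/Channel/Message) in batches, with support for compression and indexing.

The Channel of every message in a chunk must be defined earlier in the file (in a preceding chunk, earlier in the same chunk, or earlier in the data section).
If the summary section contains Chunk Index records, all Message records must be inside Chunks.

Advantages: reduces storage space (e.g. LZ4-compressed sensor data) and, together with Message Index and Chunk Index records, enables fast lookup by time range.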

Bytes	Name	Type	Description
8	message_start_time	Timestamp	Earliest message log_time in the chunk. Zero if the chunk has no messages.
8	message_end_time	Timestamp	Latest message log_time in the chunk. Zero if the chunk has no messages.
8	uncompressed_size	uint64	Uncompressed size of the records field.
4	uncompressed_crc	uint32	CRC32 checksum of uncompressed records field. A value of zero indicates that CRC validation should not be performed.
4 + N	compression	String	Compression algorithm, e.g. zstd, lz4, or "". An empty string indicates no compression. Refer to well-known compression formats.
8 + N	records	uint64 length-prefixed Bytes	Repeating sequences of <record type><record content length><record content>. Compressed with the algorithm in the compression field.
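The Chunk fields can be read and validated as sketched below. Assumptions: little-endian fields, and the CRC32 being the standard CRC-32 that Python's zlib.crc32 computes. Because zstd and lz4 are not in the Python standard library, decompressors are passed in by the caller; chunk_records and make_chunk are hypothetical helper names, and make_chunk only writes uncompressed chunks.

```python
import struct
import zlib
from typing import Callable, Dict, Optional


def make_chunk(records: bytes, start_time: int, end_time: int) -> bytes:
    """Build uncompressed Chunk content (compression = "")."""
    return (struct.pack("<QQQI", start_time, end_time, len(records), zlib.crc32(records))
            + struct.pack("<I", 0)  # empty compression string
            + struct.pack("<Q", len(records)) + records)


def chunk_records(content: bytes,
                  decompressors: Optional[Dict[str, Callable[[bytes], bytes]]] = None) -> bytes:
    """Return the uncompressed records field after size and CRC checks."""
    _start, _end, uncompressed_size, crc = struct.unpack_from("<QQQI", content, 0)
    pos = 28  # 8 + 8 + 8 + 4
    (n,) = struct.unpack_from("<I", content, pos)
    pos += 4
    compression = content[pos:pos + n].decode("utf-8")
    pos += n
    (records_len,) = struct.unpack_from("<Q", content, pos)
    pos += 8
    raw = content[pos:pos + records_len]
    if compression == "":
        records = raw
    else:
        decompress = (decompressors or {}).get(compression)
        if decompress is None:
            raise ValueError(f"no decompressor supplied for {compression!r}")
        records = decompress(raw)
    if len(records) != uncompressed_size:
        raise ValueError("uncompressed_size mismatch")
    if crc != 0 and zlib.crc32(records) != crc:  # zero means "skip validation"
        raise ValueError("chunk CRC mismatch")
    return records


if __name__ == "__main__":
    print(chunk_records(make_chunk(b"hello", 1, 2)))
```

The returned bytes are themselves a record sequence, so they can be parsed with the same opcode-and-length framing used for the rest of the file.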

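Message Index record (op=0x07)

The key component for quickly locating messages inside a chunk. It indexes messages by timestamp, per channel.

Must immediately follow the corresponding Chunk record, forming a contiguous run in the file.
Each distinct channel ID that appears in a chunk must have one corresponding Message Index record.
If a chunk contains messages from only one channel, the sequence holds just one Message Index record.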

Bytes	Name	Type	Description
2	channel_id	uint16	Channel ID.
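4 + N	records	Array<Tuple<Timestamp, uint64>>	Each element holds two fields: the message's log time, and the byte offset of the message within the chunk's uncompressed data (relative to the start of the chunk's records field). Entries are sorted by log time in ascending order, so that time-range queries can use binary search.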

Example scenario:

During robot data playback, quickly find the sensor messages recorded at a given moment.
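The binary search mentioned above can be sketched with the standard bisect module. This assumes the index entries have already been decoded into (log_time, offset) tuples and are sorted by log time as these notes describe; if a writer does not guarantee that order, sort the list first. offsets_in_range is a hypothetical helper name.

```python
from bisect import bisect_left, bisect_right
from typing import List, Tuple


def offsets_in_range(entries: List[Tuple[int, int]], start: int, end: int) -> List[int]:
    """Return the chunk offsets of messages with start <= log_time <= end.

    entries must be sorted by log_time in ascending order.
    """
    times = [log_time for log_time, _ in entries]
    lo = bisect_left(times, start)   # first entry with log_time >= start
    hi = bisect_right(times, end)    # one past the last entry with log_time <= end
    return [offset for _, offset in entries[lo:hi]]


if __name__ == "__main__":
    index = [(100, 0), (200, 40), (200, 80), (300, 120)]
    print(offsets_in_range(index, 200, 200))
```

Each returned offset points at a record inside the uncompressed records field of the chunk, so the chunk must be decompressed before seeking to it.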

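Chunk Index record (op=0x08)

Records the location and time range of each chunk, enabling random access.

Each Chunk has one corresponding Chunk Index record, and the summary section must contain copies of all related Channel and Schema records.
With this index, a parser can jump straight to the chunks that cover the target time range without scanning the whole file.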

Bytes	Name	Type	Description
8	message_start_time	Timestamp	Earliest message log_time in the chunk. Zero if the chunk has no messages.
8	message_end_time	Timestamp	Latest message log_time in the chunk. Zero if the chunk has no messages.
8	chunk_start_offset	uint64	Offset to the chunk record from the start of the file.
8	chunk_length	uint64	Byte length of the chunk record, including opcode and length prefix.
4 + N	message_index_offsets	Map<uint16, uint64>	Mapping from channel ID to the offset of the message index record for that channel after the chunk, from the start of the file. An empty map indicates no message indexing is available.
8	message_index_length	uint64	Total length in bytes of the message index records after the chunk.
4 + N	compression	String	The compression used within the chunk. Refer to well-known compression formats. This field should match the value in the corresponding Chunk record.
8	compressed_size	uint64	The size of the chunk records field.
8	uncompressed_size	uint64	The uncompressed size of the chunk records field. This field should match the value in the corresponding Chunk record.

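Example flow: to query messages with log_time=200ms, the parser uses the Chunk Index records to find the offset of the chunk that covers that time, then reads and decompresses the chunk data.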
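The first step of that flow, picking the chunks to read, is an interval-overlap test on the Chunk Index time range. A minimal sketch, assuming the records have already been decoded; the ChunkIndex tuple keeps only four of the fields from the table, and chunks_overlapping is a hypothetical helper name.

```python
from typing import List, NamedTuple


class ChunkIndex(NamedTuple):
    message_start_time: int  # nanoseconds
    message_end_time: int    # nanoseconds
    chunk_start_offset: int  # from the start of the file
    chunk_length: int        # bytes, including opcode and length prefix


def chunks_overlapping(indexes: List[ChunkIndex], start: int, end: int) -> List[ChunkIndex]:
    """Return chunks whose time range intersects [start, end], in file order."""
    hits = [c for c in indexes
            if c.message_start_time <= end and c.message_end_time >= start]
    return sorted(hits, key=lambda c: c.chunk_start_offset)


if __name__ == "__main__":
    ms = 1_000_000
    indexes = [
        ChunkIndex(0, 150 * ms, 100, 50),
        ChunkIndex(150 * ms + 1, 300 * ms, 400, 60),
        ChunkIndex(300 * ms + 1, 400 * ms, 900, 70),
    ]
    for c in chunks_overlapping(indexes, 200 * ms, 200 * ms):
        # A reader would now seek to chunk_start_offset and read chunk_length bytes.
        print(c.chunk_start_offset, c.chunk_length)
```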

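Attachment record (op=0x09)

Stores auxiliary files (such as images or logs) associated with the messages.

Must not be placed inside a Chunk; it is written directly into the data section.
The summary section records attachment locations in Attachment Index records, so that attachments can be extracted quickly.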

Bytes	Name	Type	Description
8	log_time	Timestamp	Time at which the attachment was recorded.
8	create_time	Timestamp	Time at which the attachment was created. If not available, must be set to zero.
4 + N	name	String	Name of the attachment, e.g "scene1.jpg".
4 + N	media_type	String	Media type (e.g "text/plain").
8 + N	data	uint64 length-prefixed Bytes	Attachment data.
4	crc	uint32	CRC32 checksum of preceding fields in the record. A value of zero indicates that CRC validation should not be performed.
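The crc field can be checked as sketched below. This reads "preceding fields in the record" as the content bytes from log_time through data, excluding the record's opcode and length prefix; that reading, and the use of zlib.crc32 as the CRC32 variant, are assumptions to verify against the specification. encode_attachment and attachment_crc_ok are hypothetical helper names.

```python
import struct
import zlib


def _put_str(s: str) -> bytes:
    raw = s.encode("utf-8")
    return struct.pack("<I", len(raw)) + raw


def encode_attachment(log_time: int, create_time: int, name: str,
                      media_type: str, data: bytes) -> bytes:
    """Build Attachment content with a CRC over all fields before crc."""
    body = (struct.pack("<QQ", log_time, create_time)
            + _put_str(name) + _put_str(media_type)
            + struct.pack("<Q", len(data)) + data)  # data has a uint64 length prefix
    return body + struct.pack("<I", zlib.crc32(body))


def attachment_crc_ok(content: bytes) -> bool:
    """Validate the trailing crc; a stored value of zero means 'do not validate'."""
    body = content[:-4]
    (crc,) = struct.unpack("<I", content[-4:])
    return crc == 0 or zlib.crc32(body) == crc


if __name__ == "__main__":
    content = encode_attachment(1, 0, "scene1.jpg", "image/jpeg", b"\xff\xd8")
    print(attachment_crc_ok(content))
```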

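Statistics record (op=0x0B)

Provides file-level statistics that give a quick overview of the content.

A file contains at most one Statistics record, and the summary section must then include copies of all Channel records.
Tools can use this record to show the channel list and message distribution quickly, without scanning the whole file.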

Bytes	Name	Type	Description
8	message_count	uint64	Number of Message records in the file.
2	schema_count	uint16	Number of unique schema IDs in the file, not including zero.
4	channel_count	uint32	Number of unique channel IDs in the file.
4	attachment_count	uint32	Number of Attachment records in the file.
4	metadata_count	uint32	Number of Metadata records in the file.
4	chunk_count	uint32	Number of Chunk records in the file.
8	message_start_time	Timestamp	Earliest message log_time in the file. Zero if the file has no messages.
8	message_end_time	Timestamp	Latest message log_time in the file. Zero if the file has no messages.
4 + N	channel_message_counts	Map<uint16, uint64>	Mapping from channel ID to total message count for the channel. An empty map indicates this statistic is not available.
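A writer typically accumulates these values while recording. The sketch below derives the message-related fields from a stream of (channel_id, log_time) pairs; it is an illustration of how the numbers relate, not how any particular MCAP library does it, and summarize is a hypothetical helper name. The schema, channel, attachment, metadata, and chunk counts are left out because they depend on other record types.

```python
from collections import Counter
from typing import Dict, Iterable, Optional, Tuple


def summarize(messages: Iterable[Tuple[int, int]]) -> Dict[str, object]:
    """Compute message statistics from (channel_id, log_time) pairs."""
    counts: Counter = Counter()
    start: Optional[int] = None
    end: Optional[int] = None
    for channel_id, log_time in messages:
        counts[channel_id] += 1
        start = log_time if start is None else min(start, log_time)
        end = log_time if end is None else max(end, log_time)
    return {
        "message_count": sum(counts.values()),
        "message_start_time": start or 0,  # zero if the file has no messages
        "message_end_time": end or 0,
        "channel_message_counts": dict(counts),
    }


if __name__ == "__main__":
    print(summarize([(1, 100), (2, 50), (1, 300)]))
```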

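Summary Section (optional)

Holds metadata about the data section for fast lookup, with the following record types:

Schema/Channel: copies of the corresponding records in the data section, used to look up topics or schemas quickly.
Chunk Index: chunk index (time range and offset of each chunk).
Attachment Index: attachment index (location of each attachment).
Statistics: file statistics (such as total message count and message time range).

Records of the same type (by opcode) must be stored contiguously, so that the offset section can locate them quickly.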

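Summary Offset Section (optional)

Stores the offsets of the record groups in the summary section, enabling random access (for example, jumping straight to the Chunk Index records). Entries follow the order of the records in the summary section; each entry contains the record type, the length, and the file offset of one group.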
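Each entry can be decoded into a lookup table keyed by record type. A minimal sketch, assuming the Summary Offset layout from the MCAP specification (not given in these notes): group_opcode as uint8, then group_start and group_length as little-endian uint64. offset_table is a hypothetical helper name and the offsets in the demo are made up.

```python
import struct
from typing import Dict, Iterable, Tuple

# group_opcode (uint8), group_start (uint64), group_length (uint64)
SUMMARY_OFFSET = struct.Struct("<BQQ")


def offset_table(contents: Iterable[bytes]) -> Dict[int, Tuple[int, int]]:
    """Map each record type to (file offset, byte length) of its group."""
    table: Dict[int, Tuple[int, int]] = {}
    for content in contents:
        opcode, start, length = SUMMARY_OFFSET.unpack(content)
        table[opcode] = (start, length)
    return table


if __name__ == "__main__":
    entries = [
        SUMMARY_OFFSET.pack(0x08, 1000, 120),  # Chunk Index group
        SUMMARY_OFFSET.pack(0x0B, 1120, 55),   # Statistics group
    ]
    start, length = offset_table(entries)[0x08]
    print(f"seek to {start} and read {length} bytes of Chunk Index records")
```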