MQTT-java: Data Representation


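Goal

Learn the basic concepts used throughout the MQTT specification, so that the rest of its protocol descriptions are quicker to understand.

MQTT V5.0 specification (see "1.5 Data representation")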

1.5.1 Bits

Original text

Bits in a byte are labelled 7 to 0. Bit number 7 is the most significant bit, the least significant bit is assigned bit number 0

Explanation

The bits of a byte are numbered from 7 down to 0. Bit 7 is the most significant bit (MSB) and bit 0 is the least significant bit (LSB).
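
As a small illustration of this numbering, the sketch below reads individual bits of a byte in Java. The class name `BitLabels` and the method `bitAt` are hypothetical names chosen for this example; they do not come from the specification or from any MQTT library.

```java
final class BitLabels {
    // Returns the bit at position n of b, where 0 is the least
    // significant bit and 7 is the most significant bit.
    static int bitAt(byte b, int n) {
        if (n < 0 || n > 7) {
            throw new IllegalArgumentException("bit number must be 0..7");
        }
        return (b >> n) & 1;
    }

    public static void main(String[] args) {
        byte b = (byte) 0b1000_0001; // only bit 7 and bit 0 are set
        for (int n = 7; n >= 0; n--) {
            System.out.println("bit " + n + " = " + bitAt(b, n));
        }
    }
}
```

Java's `byte` is signed, so the cast is needed for any literal with bit 7 set; masking with `& 1` after the shift removes the effect of sign extension.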

1.5.2 Two Byte Integer

Original text

Two Byte Integer data values are 16-bit unsigned integers in big-endian order: the high order byte precedes the lower order byte. This means that a 16-bit word is presented on the network as Most Significant Byte (MSB), followed by Least Significant Byte (LSB).

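Explanation

A Two Byte Integer is a 16-bit unsigned integer in big-endian order: the high-order byte comes before the low-order byte. In other words, a 16-bit value is sent on the network as the most significant byte (MSB) first, followed by the least significant byte (LSB).
Most significant byte: when the value is written as a binary string, this is simply the leftmost 8 bits. The least significant byte is the rightmost 8 bits.

For example: 11110000 10101010
11110000 is the most significant byte and 10101010 is the least significant byte.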
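
The byte order described above can be sketched in Java as follows. This is a minimal illustration, not code from any MQTT client; `TwoByteInteger`, `encode` and `decode` are hypothetical names. Java has no unsigned 16-bit type, so the sketch assumes the value is carried in an `int` and range-checked.

```java
final class TwoByteInteger {
    // Encodes a value in 0..65535 as two bytes, most significant byte first.
    static byte[] encode(int value) {
        if (value < 0 || value > 0xFFFF) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return new byte[] { (byte) (value >>> 8), (byte) value };
    }

    // Reads two bytes starting at offset and combines them big-endian.
    static int decode(byte[] buf, int offset) {
        return ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
    }

    public static void main(String[] args) {
        byte[] wire = encode(0b11110000_10101010);
        System.out.printf("%02X %02X%n", wire[0] & 0xFF, wire[1] & 0xFF); // F0 AA
        System.out.println(decode(wire, 0)); // 61610
    }
}
```

The `& 0xFF` masks matter when decoding: without them, a byte such as 0xF0 would be sign-extended to a negative `int` and corrupt the result.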

1.5.3 Four Byte Integer

Original text

Four Byte Integer data values are 32-bit unsigned integers in big-endian order: the high order byte precedes the successively lower order bytes. This means that a 32-bit word is presented on the network as Most Significant Byte (MSB), followed by the next most Significant Byte (MSB), followed by the next most Significant Byte (MSB), followed by Least Significant Byte (LSB).

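Explanation

A Four Byte Integer is a 32-bit unsigned integer in big-endian order: the high-order byte comes before the successively lower-order bytes. In other words, a 32-bit value is sent on the network as the most significant byte first, then the next most significant byte, then the next one after that, and finally the least significant byte.

For example: 10101010 01010101 11001100 00110011
Taken as a whole, 10101010 is the most significant byte. Once it has been processed, the remaining stream is 01010101 11001100 00110011.
Of what remains, 01010101 is the most significant byte. Once it has been processed, the remaining stream is 11001100 00110011.
That leaves 11001100 as the most significant of the remaining bytes and 00110011 as the least significant byte.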
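
The same example can be traced in Java. The sketch below is illustrative only; `FourByteInteger`, `encode` and `decode` are hypothetical names. Because Java's `int` is signed, the sketch assumes the unsigned 32-bit value is held in a `long`.

```java
final class FourByteInteger {
    // Encodes a value in 0..4294967295 as four bytes, most significant byte first.
    static byte[] encode(long value) {
        if (value < 0 || value > 0xFFFFFFFFL) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        return new byte[] {
            (byte) (value >>> 24),
            (byte) (value >>> 16),
            (byte) (value >>> 8),
            (byte) value
        };
    }

    // Reads four bytes starting at offset and combines them big-endian.
    static long decode(byte[] buf, int offset) {
        return ((buf[offset] & 0xFFL) << 24)
             | ((buf[offset + 1] & 0xFFL) << 16)
             | ((buf[offset + 2] & 0xFFL) << 8)
             | (buf[offset + 3] & 0xFFL);
    }

    public static void main(String[] args) {
        // 10101010 01010101 11001100 00110011 from the example above
        byte[] wire = encode(0xAA55CC33L);
        for (byte b : wire) {
            System.out.printf("%02X ", b & 0xFF); // AA 55 CC 33
        }
        System.out.println();
        System.out.println(Long.toHexString(decode(wire, 0))); // aa55cc33
    }
}
```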

1.5.4 UTF-8 Encoded String

Original text

Text fields within the MQTT Control Packets described later are encoded as UTF-8 strings. UTF-8 [RFC3629] is an efficient encoding of Unicode [Unicode] characters that optimizes the encoding of ASCII characters in support of text-based communications.
Each of these strings is prefixed with a Two Byte Integer length field that gives the number of bytes in a UTF-8 encoded string itself, as illustrated in Figure 1.1 Structure of UTF-8 Encoded Strings below. Consequently, the maximum size of a UTF-8 Encoded String is 65,535 bytes. Unless stated otherwise all UTF-8 encoded strings can have any length in the range 0 to 65,535 bytes.

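Explanation

Text fields in MQTT Control Packets are encoded as UTF-8 strings. UTF-8 is an efficient encoding of Unicode characters that optimizes the encoding of ASCII characters in support of text-based communications.
Every such string is prefixed with a Two Byte Integer that gives the length of the encoded string in bytes, so a UTF-8 Encoded String can be at most 65,535 bytes long.
For example, the UTF-8 encoding of "测试" ("test") is 11100110 10110101 10001011 11101000 10101111 10010101
In the MQTT data stream, two bytes are added in front of this encoding to carry its length (6 bytes): 00000000 00000110
So the complete binary stream that carries the two characters "测试" is: 00000000 00000110 11100110 10110101 10001011 11101000 10101111 10010101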
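
The length-prefix layout above can be reproduced with a few lines of Java. This is a minimal sketch; `Utf8EncodedString`, `encode` and `decode` are hypothetical names, and the sketch assumes the input is otherwise acceptable. It does not check the further restrictions the specification places on string content, such as disallowing the null character U+0000.

```java
import java.nio.charset.StandardCharsets;

final class Utf8EncodedString {
    // Encodes s as a two-byte big-endian length followed by its UTF-8 bytes.
    static byte[] encode(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        if (utf8.length > 0xFFFF) {
            throw new IllegalArgumentException("string longer than 65,535 bytes");
        }
        byte[] out = new byte[2 + utf8.length];
        out[0] = (byte) (utf8.length >>> 8); // length, most significant byte
        out[1] = (byte) utf8.length;         // length, least significant byte
        System.arraycopy(utf8, 0, out, 2, utf8.length);
        return out;
    }

    // Reads the length prefix at offset, then decodes that many bytes as UTF-8.
    static String decode(byte[] buf, int offset) {
        int len = ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
        return new String(buf, offset + 2, len, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // "\u6D4B\u8BD5" is the two-character example string from the text
        byte[] wire = encode("\u6D4B\u8BD5");
        for (byte b : wire) {
            System.out.printf("%02X ", b & 0xFF); // 00 06 E6 B5 8B E8 AF 95
        }
        System.out.println();
    }
}
```

Note that the prefix counts bytes, not characters: the two characters in the example occupy six bytes, so the prefix is 6.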


1.5.5 Variable Byte Integer

Original text

The Variable Byte Integer is encoded using an encoding scheme which uses a single byte for values up to 127. Larger values are handled as follows. The least significant seven bits of each byte encode the data, and the most significant bit is used to indicate whether there are bytes following in the representation. Thus, each byte encodes 128 values and a "continuation bit". The maximum number of bytes in the Variable Byte Integer field is four. The encoded value MUST use the minimum number of bytes necessary to represent the value [MQTT-1.5.5-1]. This is shown in Table 1-1 Size of Variable Byte Integer

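Explanation

The Variable Byte Integer is an encoding scheme that uses a single byte for values from 0 to 127. Larger values are encoded as follows: the least significant seven bits of each byte carry data, and the most significant bit indicates whether more bytes follow. Each byte therefore encodes 128 values plus a "continuation bit". A Variable Byte Integer occupies at most four bytes, and the encoded value must use the minimum number of bytes needed to represent it [MQTT-1.5.5-1].
For comparison, an int in Java always occupies four bytes, whereas with the variable-byte encoding a small number needs only a single byte.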
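
The scheme above translates into a short loop. The Java sketch below is an illustration under stated assumptions, not a reference implementation: `VariableByteInteger`, `encode` and `decode` are hypothetical names, and the decoder assumes the buffer holds a complete encoding and does not reject encodings that use more bytes than necessary.

```java
import java.io.ByteArrayOutputStream;

final class VariableByteInteger {
    // Largest value that fits in four bytes of seven data bits each.
    static final int MAX_VALUE = 268_435_455;

    // Emits seven data bits per byte, least significant group first,
    // setting bit 7 on every byte except the last.
    static byte[] encode(int value) {
        if (value < 0 || value > MAX_VALUE) {
            throw new IllegalArgumentException("value out of range: " + value);
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream(4);
        do {
            int encodedByte = value % 128;
            value /= 128;
            if (value > 0) {
                encodedByte |= 0x80; // continuation bit: more bytes follow
            }
            out.write(encodedByte);
        } while (value > 0);
        return out.toByteArray();
    }

    // Reads up to four bytes starting at offset and rebuilds the value.
    static int decode(byte[] buf, int offset) {
        int value = 0;
        for (int i = 0; i < 4; i++) {
            int b = buf[offset + i] & 0xFF;
            value |= (b & 0x7F) << (7 * i);
            if ((b & 0x80) == 0) {
                return value; // continuation bit clear: this was the last byte
            }
        }
        throw new IllegalArgumentException("more than four bytes");
    }

    public static void main(String[] args) {
        for (byte b : encode(128)) {
            System.out.printf("%02X ", b & 0xFF); // 80 01
        }
        System.out.println();
    }
}
```

With seven data bits per byte, one byte covers 0 to 127, two bytes cover 128 to 16,383, three bytes cover 16,384 to 2,097,151, and four bytes cover 2,097,152 to 268,435,455.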

1.5.6 Binary Data

Original text

Binary Data is represented by a Two Byte Integer length which indicates the number of data bytes, followed by that number of bytes. Thus, the length of Binary Data is limited to the range of 0 to 65,535 Bytes.

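Explanation

Binary Data is represented by a Two Byte Integer length that gives the number of data bytes, followed by that many bytes. The length of Binary Data is therefore limited to the range 0 to 65,535 bytes.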
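
Binary Data uses the same length-prefix layout as a UTF-8 Encoded String, only without any text encoding step. A minimal Java sketch, with the hypothetical names `BinaryData`, `encode` and `decode`:

```java
import java.util.Arrays;

final class BinaryData {
    // Prepends a two-byte big-endian length to the raw bytes.
    static byte[] encode(byte[] data) {
        if (data.length > 0xFFFF) {
            throw new IllegalArgumentException("data longer than 65,535 bytes");
        }
        byte[] out = new byte[2 + data.length];
        out[0] = (byte) (data.length >>> 8);
        out[1] = (byte) data.length;
        System.arraycopy(data, 0, out, 2, data.length);
        return out;
    }

    // Reads the length prefix at offset and returns a copy of the payload.
    static byte[] decode(byte[] buf, int offset) {
        int len = ((buf[offset] & 0xFF) << 8) | (buf[offset + 1] & 0xFF);
        int start = offset + 2;
        if (start + len > buf.length) {
            throw new IllegalArgumentException("buffer shorter than declared length");
        }
        return Arrays.copyOfRange(buf, start, start + len);
    }

    public static void main(String[] args) {
        byte[] wire = encode(new byte[] { 0x01, 0x02, 0x03 });
        System.out.println(Arrays.toString(wire)); // [0, 3, 1, 2, 3]
    }
}
```

The explicit bounds check is there because `Arrays.copyOfRange` pads with zeros when the end index runs past the array, which would silently hide a truncated buffer.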

1.5.7 UTF-8 String Pair

Original text

A UTF-8 String Pair consists of two UTF-8 Encoded Strings. This data type is used to hold name-value pairs. The first string serves as the name, and the second string contains the value. Both strings MUST comply with the requirements for UTF-8 Encoded Strings [MQTT-1.5.7-1]. If a receiver (Client or Server) receives a string pair which does not meet these requirements it is a Malformed Packet. Refer to section 4.13 for information about handling errors.

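Explanation

A UTF-8 String Pair consists of two UTF-8 Encoded Strings. This data type is used to hold name-value pairs: the first string is the name and the second string holds the value. Both strings must comply with the requirements for UTF-8 Encoded Strings [MQTT-1.5.7-1]. If a receiver (Client or Server) receives a string pair that does not meet these requirements, it is a Malformed Packet. See section 4.13 for information about handling errors.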
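
Since a String Pair is just two UTF-8 Encoded Strings back to back, it can be sketched by writing two length-prefixed strings into one buffer. The Java example below is illustrative only; `Utf8StringPair`, `encode`, `decode` and `readString` are hypothetical names, and the sketch does not validate the strings against the specification's content rules or handle Malformed Packets.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

final class Utf8StringPair {
    // Writes the name, then the value, each as length prefix plus UTF-8 bytes.
    static byte[] encode(String name, String value) {
        byte[] n = name.getBytes(StandardCharsets.UTF_8);
        byte[] v = value.getBytes(StandardCharsets.UTF_8);
        if (n.length > 0xFFFF || v.length > 0xFFFF) {
            throw new IllegalArgumentException("string longer than 65,535 bytes");
        }
        // ByteBuffer is big-endian by default, matching the Two Byte Integer.
        ByteBuffer buf = ByteBuffer.allocate(2 + n.length + 2 + v.length);
        buf.putShort((short) n.length).put(n);
        buf.putShort((short) v.length).put(v);
        return buf.array();
    }

    // Returns a two-element array: { name, value }.
    static String[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        String name = readString(buf);
        String value = readString(buf);
        return new String[] { name, value };
    }

    // Reads one length-prefixed UTF-8 string from the buffer's current position.
    private static String readString(ByteBuffer buf) {
        int len = buf.getShort() & 0xFFFF; // treat the length as unsigned
        byte[] raw = new byte[len];
        buf.get(raw);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String[] pair = decode(encode("key", "value"));
        System.out.println(pair[0] + " = " + pair[1]); // key = value
    }
}
```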