A Look at UUIDs


Format

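A UUID (universally unique identifier) is 128 bits long, which is written as 32 hexadecimal digits in the layout xxxxxxxx-xxxx-Mxxx-Nxxx-xxxxxxxxxxxx. M is the version, and the most significant bits of N encode the variant. For example: 5aadc328-8d5e-11ec-8a00-acde48001122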
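As a quick check of this layout, the sketch below parses the example UUID with Python's standard uuid module and reads back the M and N positions. The variable names are only for illustration.

```python
import uuid

u = uuid.UUID("5aadc328-8d5e-11ec-8a00-acde48001122")

hex_digits = u.hex               # the 32 hex digits, dashes removed
bit_length = len(u.bytes) * 8    # 16 bytes -> 128 bits

version_digit = str(u)[14]       # the "M" position
variant_digit = str(u)[19]       # the "N" position

print(len(hex_digits), bit_length)   # 32 128
print(version_digit, u.version)      # 1 1
print(variant_digit, u.variant == uuid.RFC_4122)   # 8 True
```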

Versions

  • Version 1 (date-time and MAC address)
  • Version 2 (date-time and MAC address, DCE security version)
  • Versions 3 and 5 (namespace name-based)
  • Version 4 (random)

Version 1 (date-time and MAC address)

A version 1 UUID is built from a timestamp and a MAC address. First, use ifconfig -v en9 to look up the MAC address, which here is acde48001122:

en9: flags=8863<UP,BROADCAST,SMART,RUNNING,SIMPLEX,MULTICAST> mtu 1500 index 4
	eflags=21000080<TXSTART,ECN_ENABLE,DIRECTLINK>
	ether ac:de:48:00:11:22
	inet6 fe80::aede:48ff:fe00:1122%en9 prefixlen 64 scopeid 0x4
	nd6 options=201<PERFORMNUD,DAD>
	media: autoselect (100baseTX <full-duplex>)
	status: active
	type: Ethernet
	link quality: 100 (good)
	state availability: 0 (true)
	scheduler: FQ_CODEL
	link rate: 100.00 Mbps
	qosmarking enabled: no mode: none

Generating a UUID with Python:

>>> import uuid
>>> uuid.uuid1()
UUID('5aadc328-8d5e-11ec-8a00-acde48001122')
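uuid.uuid1() also accepts an explicit node and clock sequence, which makes it easier to see where the MAC address ends up. The sketch below passes in the MAC address from the ifconfig output; the clock sequence 0x0A00 is chosen here only so that the tail matches the example UUID.

```python
import uuid

# Node taken from the ifconfig output above; the clock sequence is an
# assumption picked to reproduce the tail of the example UUID.
mac = 0xACDE48001122
u = uuid.uuid1(node=mac, clock_seq=0x0A00)

print(u.version)          # 1
print(f"{u.node:012x}")   # acde48001122
print(str(u)[19:])        # 8a00-acde48001122
```

The first three groups change on every call because they carry the timestamp.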

Out of the box, the JDK's UUID class only provides generators for version 3 and version 4 UUIDs.

V3
    public static UUID nameUUIDFromBytes(byte[] name) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException nsae) {
            throw new InternalError("MD5 not supported", nsae);
        }
        byte[] md5Bytes = md.digest(name);
        md5Bytes[6]  &= 0x0f;  /* clear version        */
        md5Bytes[6]  |= 0x30;  /* set to version 3     */
        md5Bytes[8]  &= 0x3f;  /* clear variant        */
        md5Bytes[8]  |= 0x80;  /* set to IETF variant  */
        return new UUID(md5Bytes);
    }
    
V4
    public static UUID randomUUID() {
        SecureRandom ng = Holder.numberGenerator;

        byte[] randomBytes = new byte[16];
        ng.nextBytes(randomBytes);
        randomBytes[6]  &= 0x0f;  /* clear version        */
        randomBytes[6]  |= 0x40;  /* set to version 4     */
        randomBytes[8]  &= 0x3f;  /* clear variant        */
        randomBytes[8]  |= 0x80;  /* set to IETF variant  */
        return new UUID(randomBytes);
    }

To see what a version 1 UUID looks like from Java, we rely on a third-party library, added with the following dependency:

    <dependency>
        <groupId>com.fasterxml.uuid</groupId>
        <artifactId>java-uuid-generator</artifactId>
        <version>4.0.1</version>
    </dependency>

Generate a version 1 UUID:

java-uuid-generator

UUID uuid = Generators.timeBasedGenerator().generate();

991c146f-8f07-11ec-93eb-3d5453c2d114

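Notice that the MAC address portion differs between the UUID generated by Python and the one generated by Java. Python uses the real MAC address; if you read the Java library's generation code, you will find that its MAC address is randomly generated.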
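One way to spot a non-MAC node is the multicast bit: RFC 4122 asks generators that use a random node to set the least significant bit of the node's first octet. The helper below (a hypothetical name, not part of any library) checks that bit for the two UUIDs above. A set bit suggests a random node, while a clear bit does not prove the node is a real MAC address.

```python
def node_looks_random(uuid_text: str) -> bool:
    """Return True if the multicast bit of the node field is set."""
    node = int(uuid_text.replace("-", "")[-12:], 16)  # last 12 hex digits
    first_octet = node >> 40
    return bool(first_octet & 0x01)

print(node_looks_random("5aadc328-8d5e-11ec-8a00-acde48001122"))  # False
print(node_looks_random("991c146f-8f07-11ec-93eb-3d5453c2d114"))  # True
```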

Let's break down the structure of 5aadc328-8d5e-11ec-8a00-acde48001122:

| Name | Example | Length (hex digits) | Description |
| --- | --- | --- | --- |
| time_low | 5aadc328 | 8 | |
| time_mid | 8d5e | 4 | |
| time_hi_and_version | 11ec | 4 | 1 (version) + 1ec (time_hi) |
| variant and clock_sequence | 8a00 | 4 | 1 to 3-bit "variant" in the most significant bits, followed by the 13 to 15-bit clock sequence (from Wikipedia). The clock sequence exists to avoid duplicates; you can think of it as random data. |
| node | acde48001122 | 12 | MAC address |

The Java comments describe the variant field as follows:

This field is composed of a varying number of bits.
0 - - Reserved for NCS backward compatibility
1 0 - The IETF aka Leach-Salz variant (used by this class)
1 1 0 Reserved, Microsoft backward compatibility
1 1 1 Reserved for future definition.
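The same breakdown can be reproduced with a few shifts and masks. This is a minimal sketch using the fields tuple of Python's uuid.UUID; the derived names (version, time_hi, variant_bits, clock_seq) are only for illustration.

```python
import uuid

u = uuid.UUID("5aadc328-8d5e-11ec-8a00-acde48001122")
(time_low, time_mid, time_hi_version,
 clock_seq_hi_variant, clock_seq_low, node) = u.fields

version = time_hi_version >> 12             # top 4 bits
time_hi = time_hi_version & 0x0FFF          # remaining 12 bits
variant_bits = clock_seq_hi_variant >> 6    # top 2 bits; 0b10 is the IETF variant
clock_seq = ((clock_seq_hi_variant & 0x3F) << 8) | clock_seq_low  # 14 bits

print(f"{time_low:08x} {time_mid:04x} {time_hi_version:04x}")  # 5aadc328 8d5e 11ec
print(version, f"{time_hi:03x}")                               # 1 1ec
print(f"{variant_bits:02b}", f"{clock_seq:04x}")               # 10 0a00
print(f"{node:012x}")                                          # acde48001122
```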

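From time_low, time_mid, and time_hi we can work out when the UUID was generated. First, assemble the full timestamp as time_hi + time_mid + time_low, i.e. 1ec + 8d5e + 5aadc328 = 1ec8d5e5aadc328, which is 138641124929422120 in decimal. Then read the timestamp through Java; the two values match.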

UUID uuid = UUID.fromString("5aadc328-8d5e-11ec-8a00-acde48001122");
System.out.println(uuid.timestamp());

138641124929422120
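The concatenation can also be checked by hand in Python. The sketch below shifts the three fields into place and compares the result with the time attribute that uuid.UUID exposes for version 1 UUIDs.

```python
import uuid

u = uuid.UUID("5aadc328-8d5e-11ec-8a00-acde48001122")

time_hi = u.time_hi_version & 0x0FFF   # drop the 4 version bits
timestamp = (time_hi << 48) | (u.time_mid << 32) | u.time_low

print(hex(timestamp))        # 0x1ec8d5e5aadc328
print(timestamp)             # 138641124929422120
print(timestamp == u.time)   # True
```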

So where does this timestamp come from? It is a counter that starts at **15 October 1582** and increases by 1 every 100 nanoseconds.

being the number of 100-nanosecond intervals since midnight 15 October 1582 Coordinated Universal Time (UTC), the date on which the Gregorian calendar was first adopted

Finally, convert it to a Unix timestamp in milliseconds:

UUID uuid = UUID.fromString("5aadc328-8d5e-11ec-8a00-acde48001122");
Calendar uuidEpoch = Calendar.getInstance(TimeZone.getTimeZone("UTC"));
uuidEpoch.clear();
uuidEpoch.set(1582, 9, 15, 0, 0, 0); // 9 = October
long epochMillis = uuidEpoch.getTime().getTime();
long time = (uuid.timestamp() / 10000L) + epochMillis;
System.out.println(time);

1644819692942
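As a cross-check, the same conversion can be sketched in Python. The constant below is the number of 100-nanosecond intervals between 15 October 1582 and 1 January 1970 (12219292800 seconds); its name is made up for this example.

```python
import uuid
from datetime import datetime, timedelta, timezone

# 100-ns intervals between 1582-10-15 and 1970-01-01, both at 00:00 UTC.
GREGORIAN_TO_UNIX_100NS = 0x01B21DD213814000

u = uuid.UUID("5aadc328-8d5e-11ec-8a00-acde48001122")

unix_ms = (u.time - GREGORIAN_TO_UNIX_100NS) // 10_000
created = datetime(1582, 10, 15, tzinfo=timezone.utc) + timedelta(microseconds=u.time // 10)

print(unix_ms)               # 1644819692942
print(created.isoformat())   # 2022-02-14T06:21:32.942212+00:00
```

Both routes agree with the Java result: the UUID was generated on 14 February 2022 at 06:21:32 UTC.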


Version 2 (date-time and MAC address, DCE security version)

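UUID version 2 is similar to version 1, except that the low 8 bits of clock_sequence and the low 32 bits of the timestamp are replaced (by a local domain number and a local identifier, respectively). RFC 4122 says little about version 2, and some languages do not implement it at all.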

Python 2.7.18
    
>>> import uuid
>>> uuid.uuid1()
UUID('5aadc328-8d5e-11ec-8a00-acde48001122')
>>> uuid.uuid2()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: 'module' object has no attribute 'uuid2'
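Even without a uuid2() function, the layout can be illustrated by hand. The sketch below takes a version 1 UUID and overwrites the two fields that differ in version 2. The function name, the domain value 0, and the identifier 1000 are all hypothetical, and this is not a complete DCE Security implementation.

```python
import uuid

def dce_security_from_v1(v1: uuid.UUID, local_domain: int, local_id: int) -> uuid.UUID:
    """Overwrite the fields that differ in the version 2 layout (illustrative only)."""
    _, time_mid, time_hi_version, clock_seq_hi_variant, _, node = v1.fields
    fields = (
        local_id & 0xFFFFFFFF,    # replaces time_low (low 32 bits of the timestamp)
        time_mid,
        time_hi_version,
        clock_seq_hi_variant,
        local_domain & 0xFF,      # replaces clock_seq_low (low 8 bits of the clock sequence)
        node,
    )
    # version=2 rewrites the version nibble and keeps the IETF variant bits.
    return uuid.UUID(fields=fields, version=2)

v1 = uuid.UUID("5aadc328-8d5e-11ec-8a00-acde48001122")
v2 = dce_security_from_v1(v1, local_domain=0, local_id=1000)
print(v2)           # 000003e8-8d5e-21ec-8a00-acde48001122
print(v2.version)   # 2
```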

Versions 3 and 5 (namespace name-based)

v3 and v5 are both name-based: the UUID is derived by hashing a namespace together with a name. v3 = MD5(namespace + name)

Java v3 implementation
    public static UUID nameUUIDFromBytes(byte[] name) {
        MessageDigest md;
        try {
            md = MessageDigest.getInstance("MD5");
        } catch (NoSuchAlgorithmException nsae) {
            throw new InternalError("MD5 not supported", nsae);
        }
        byte[] md5Bytes = md.digest(name);
        md5Bytes[6]  &= 0x0f;  /* clear version        */
        md5Bytes[6]  |= 0x30;  /* set to version 3     */
        md5Bytes[8]  &= 0x3f;  /* clear variant        */
        md5Bytes[8]  |= 0x80;  /* set to IETF variant  */
        return new UUID(md5Bytes);
    }
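Note that the Java method hashes exactly the bytes it is given and has no separate namespace parameter, so the caller has to prepend the namespace. The following is a rough Python rendering of the same steps (the function name is made up here), compared against the standard uuid.uuid3().

```python
import hashlib
import uuid

def name_uuid_from_bytes(name: bytes) -> uuid.UUID:
    """MD5 the bytes, then stamp version 3 and the IETF variant."""
    digest = bytearray(hashlib.md5(name).digest())
    digest[6] = (digest[6] & 0x0F) | 0x30   # clear version, set to version 3
    digest[8] = (digest[8] & 0x3F) | 0x80   # clear variant, set to IETF variant
    return uuid.UUID(bytes=bytes(digest))

# Prepending the namespace bytes gives the same result as uuid3().
namespace, name = uuid.NAMESPACE_DNS, "python.org"
manual = name_uuid_from_bytes(namespace.bytes + name.encode("utf-8"))
print(manual)                                  # 6fa459ea-ee8a-3ca4-894e-db77e160355e
print(manual == uuid.uuid3(namespace, name))   # True
```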

v5 = SHA-1(namespace + name), truncated to 128 bits
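The JDK has no built-in v5 generator, but the recipe is short. Here is a minimal Python sketch (the function name is hypothetical): hash the namespace bytes plus the name with SHA-1, keep the first 16 bytes, and set the version and variant bits. It is checked against the standard uuid.uuid5().

```python
import hashlib
import uuid

def sha1_name_uuid(namespace: uuid.UUID, name: str) -> uuid.UUID:
    """Build a version 5 UUID from a namespace UUID and a name."""
    full_hash = hashlib.sha1(namespace.bytes + name.encode("utf-8")).digest()
    digest = bytearray(full_hash[:16])      # SHA-1 is 20 bytes; keep the first 16
    digest[6] = (digest[6] & 0x0F) | 0x50   # set to version 5
    digest[8] = (digest[8] & 0x3F) | 0x80   # set to IETF variant
    return uuid.UUID(bytes=bytes(digest))

manual = sha1_name_uuid(uuid.NAMESPACE_DNS, "python.org")
print(manual)                                                  # 886313e1-3b8a-5372-9b90-0c9aee199e5d
print(manual == uuid.uuid5(uuid.NAMESPACE_DNS, "python.org"))  # True
```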

Version 4 (random)

Java v4 implementation
    public static UUID randomUUID() {
        SecureRandom ng = Holder.numberGenerator;

        byte[] randomBytes = new byte[16];
        ng.nextBytes(randomBytes);
        randomBytes[6]  &= 0x0f;  /* clear version        */
        randomBytes[6]  |= 0x40;  /* set to version 4     */
        randomBytes[8]  &= 0x3f;  /* clear variant        */
        randomBytes[8]  |= 0x80;  /* set to IETF variant  */
        return new UUID(randomBytes);
    }
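For comparison, the same four bit operations can be written in Python. This is only a sketch that mirrors the Java code above, using os.urandom as the random source; in practice uuid.uuid4() already does this. Six of the 128 bits are fixed, which leaves 122 random bits.

```python
import os
import uuid

def random_uuid() -> uuid.UUID:
    """Fill 16 random bytes, then stamp version 4 and the IETF variant."""
    random_bytes = bytearray(os.urandom(16))
    random_bytes[6] = (random_bytes[6] & 0x0F) | 0x40   # set to version 4
    random_bytes[8] = (random_bytes[8] & 0x3F) | 0x80   # set to IETF variant
    return uuid.UUID(bytes=bytes(random_bytes))

u = random_uuid()
print(u.version)    # 4
print(str(u)[14])   # 4
```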

Nil UUID

00000000-0000-0000-0000-000000000000
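The nil UUID is simply the value with all 128 bits set to zero, so in Python it can be built from the integer 0. Because its variant bits are not the IETF pattern, the uuid module reports no version for it.

```python
import uuid

nil = uuid.UUID(int=0)
print(nil)           # 00000000-0000-0000-0000-000000000000
print(nil.version)   # None
```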

References

en.wikipedia.org/wiki/Univer…
www.ietf.org/rfc/rfc4122…
stackoverflow.com/questions/1…
python.iitter.com/other/19409…