What is NVLink


NVLink is a wire-based, serial, multi-lane, near-range communication link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices communicate over a mesh network rather than through a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).
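
In practice you can check whether a direct peer-to-peer path, which NVLink provides when present, exists between the GPUs in a machine. Below is a minimal sketch using the standard CUDA runtime API; it only reports whether peer access is possible, not which interconnect (NVLink or PCIe) backs it, and the file name `peer_check.cu` is just an example.

```cpp
// peer_check.cu - minimal sketch: report which GPU pairs allow direct
// peer-to-peer access (NVLink-connected GPUs normally report "possible").
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            // Ask the driver whether 'src' can map and access memory on 'dst'.
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            std::printf("GPU %d -> GPU %d: peer access %s\n",
                        src, dst, canAccess ? "possible" : "not possible");
        }
    }
    return 0;
}
```

Compile with `nvcc peer_check.cu -o peer_check` and run on a multi-GPU machine; `nvidia-smi topo -m` additionally shows whether each GPU pair is connected via NVLink (entries such as NV1, NV2) or only via PCIe.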

Performance

The following table shows a basic metrics comparison based upon standard specifications:

| Interconnect | Transfer rate | Line code | Effective payload rate per lane per direction | Max total lane length (PCIe: incl. 5" for PCBs) | Realized in design |
|---|---|---|---|---|---|
| PCIe 1.x | 2.5 GT/s | 8b/10b | ~0.25 GB/s | 20" = ~51 cm | |
| PCIe 2.x | 5 GT/s | 8b/10b | ~0.5 GB/s | 20" = ~51 cm | |
| PCIe 3.x | 8 GT/s | 128b/130b | ~1 GB/s | 20" = ~51 cm[6] | Pascal, Volta, Turing |
| PCIe 4.0 | 16 GT/s | 128b/130b | ~2 GB/s | 8−12" = ~20−30 cm[6] | Volta on Xavier (8x, 4x, 1x), Ampere, Power 9 |
| PCIe 5.0 | 32 GT/s[7] | 128b/130b | ~4 GB/s | | Hopper |
| PCIe 6.0 | 64 GT/s | 128b/130b | ~8 GB/s | | |
| NVLink 1.0 | 20 Gbit/s | | ~2.5 GB/s | | Pascal, Power 8+ |
| NVLink 2.0 | 25 Gbit/s | | ~3.125 GB/s | | Volta, NVSwitch for Volta, Power 9 |
| NVLink 3.0 | 50 Gbit/s | | ~6.25 GB/s | | Ampere, NVSwitch for Ampere |
| NVLink 4.0 (also as C2C, chip-to-chip) | 100 Gbit/s[8] | | ~12.5 GB/s | | Hopper, Nvidia Grace Datacenter/Server CPU, NVSwitch for Hopper |
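
The "effective payload rate per lane per direction" column can be reproduced from the other columns: multiply the transfer rate by the line-code efficiency and divide by 8 to convert bits to bytes (for these serial links, 1 GT/s corresponds to roughly 1 Gbit/s per lane). A minimal sketch of that arithmetic, using values from the table above:

```cpp
// payload_rate.cpp - rough per-lane payload estimate from transfer rate and line code.
#include <cstdio>

// GB/s per lane per direction = (GT/s) * (payload bits / coded bits) / (8 bits per byte)
double payload_gb_per_s(double gt_per_s, double payload_bits, double coded_bits) {
    return gt_per_s * (payload_bits / coded_bits) / 8.0;
}

int main() {
    std::printf("PCIe 3.x  : ~%.2f GB/s per lane\n", payload_gb_per_s(8.0, 128, 130));   // ~0.98
    std::printf("PCIe 4.0  : ~%.2f GB/s per lane\n", payload_gb_per_s(16.0, 128, 130));  // ~1.97
    // The NVLink rows list no line code, so the table value is simply rate / 8:
    std::printf("NVLink 2.0: ~%.3f GB/s per lane\n", 25.0 / 8.0);                         // 3.125
    return 0;
}
```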

The following table compares the relevant bus parameters of real-world semiconductors that all offer NVLink as one of their options:

| Semiconductor | Board/bus delivery variant | Interconnect | Transmission technology rate (per lane) | Lanes per sub-link (out + in) | Sub-link data rate (per data direction) | Sub-link or unit count | Total data rate (out + in) | Total lanes (out + in) | Total data rate (combined) |
|---|---|---|---|---|---|---|---|---|---|
| Nvidia GP100 | P100 SXM,[9] P100 PCI-E[10] | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s[11] | 32 | 32 GB/s |
| Nvidia GV100 | V100 SXM2,[12] V100 PCI-E[13] | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia Xavier[14] | (generic) | PCIe 4.0 Ⓓ: 2 units x8 (dual), 1 unit x4 (dual), 3 units x1[15][16] | 16 GT/s | 8 + 8 / 4 + 4 / 1 + 1 | 128 Gbit/s = 16 GB/s / 64 Gbit/s = 8 GB/s / 16 Gbit/s = 2 GB/s | Ⓓ 2 / 1 / 3 | Ⓓ 32 + 32 GB/s / 8 + 8 GB/s / 6 + 6 GB/s | 40 | 80 GB/s |
| IBM Power9[17] | (generic) | PCIe 4.0 | 16 GT/s | 16 + 16 | 256 Gbit/s = 32 GB/s | 3 | 96 + 96 GB/s | 96 | 192 GB/s |
| Nvidia GA100,[18][19] Nvidia GA102[20] | Ampere A100 (SXM4 & PCIe)[21] | PCIe 4.0 | 16 GT/s | 16 + 16 | 256 Gbit/s = 32 GB/s | 1 | 32 + 32 GB/s | 32 | 64 GB/s |
| Nvidia GP100 | P100 SXM (not available with P100 PCI-E)[22] | NVLink 1.0 | 20 GT/s | 8 + 8 | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia Xavier | (generic) | NVLink 1.0[14] | 20 GT/s[14] | 8 + 8 | 160 Gbit/s = 20 GB/s[23] | | | | |
| IBM Power8+ | (generic) | NVLink 1.0 | 20 GT/s | 8 + 8 | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia GV100 | V100 SXM2[24] (not available with V100 PCI-E) | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 6[25] | 150 + 150 GB/s | 96 | 300 GB/s |
| IBM Power9[26] | (generic) | NVLink 2.0 (BlueLink ports) | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 6 | 150 + 150 GB/s | 96 | 300 GB/s |
| NVSwitch for Volta[27] | (generic) (fully connected 18x18 switch) | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 2 * 8 + 2 = 18 | 450 + 450 GB/s | 288 | 900 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 1 | 25 + 25 GB/s | 16 | 50 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 2 | 50 + 50 GB/s | 32 | 100 GB/s |
| Nvidia GA100[18][19] | Ampere A100 (SXM4 & PCIe[21]) | NVLink 3.0 | 50 GT/s | 4 + 4 | 200 Gbit/s = 25 GB/s | 12[29] | 300 + 300 GB/s | 96 | 600 GB/s |
| Nvidia GA102[20] | GeForce RTX 3090, Quadro RTX A6000 | NVLink 3.0 | 28.125 GT/s | 4 + 4 | 112.5 Gbit/s = 14.0625 GB/s | 4 | 56.25 + 56.25 GB/s | 16 | 112.5 GB/s |
| NVSwitch for Ampere[30] | (generic) (fully connected 18x18 switch) | NVLink 3.0 | 50 GT/s | 8 + 8 | 400 Gbit/s = 50 GB/s | 2 * 8 + 2 = 18 | 900 + 900 GB/s | 288 | 1800 GB/s |
| NVSwitch for Hopper[30] | (fully connected 64-port switch) | NVLink 4.0 | 106.25 GT/s | 9 + 9 | 450 Gbit/s | 18 | 900 GB/s | 128 | 7200 GB/s |
| Nvidia Grace CPU[31] | Nvidia GH200 Superchip | PCIe-5 (4x, 16x) @ 512 GB/s | | | | | | | |
| Nvidia Grace CPU[32] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | | | | | | | |
| Nvidia Hopper GPU[33] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | | | | | | | |
| Nvidia Hopper GPU[34] | Nvidia GH200 Superchip | NVLink 4 (18x) @ 900 GB/s | | | | | | | |

Note: the data-rate columns are rounded approximations derived from the transmission rate; see the real-world performance section.

- sample value; NVLink sub-link bundling should be possible
- sample value; other fractions of the PCIe lane usage should be possible
- a single PCIe lane (not 16) transfers data over a differential pair
- Ⓓ: various limitations on the finally possible combinations may apply due to chip pin muxing and board design
- dual: the interface unit can be configured either as a root hub or as an end point
- generic: bare semiconductor without any board-design-specific restrictions applied
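
As a cross-check of the second table, a row's totals follow from rate × lanes × sub-link count. The sketch below redoes that arithmetic for the V100 SXM2 / NVLink 2.0 row; the values are taken from the table, and the file name is just an example.

```cpp
// nvlink_row_check.cpp - recompute the V100 SXM2 / NVLink 2.0 row of the table.
#include <cstdio>

int main() {
    const double lane_rate_gbit = 25.0;  // NVLink 2.0: 25 GT/s ~ 25 Gbit/s per lane
    const int lanes_per_sublink = 8;     // per direction (8 out + 8 in)
    const int sublinks          = 6;     // V100 SXM2 exposes 6 NVLink sub-links

    double sublink_gb_per_s = lane_rate_gbit * lanes_per_sublink / 8.0;  // 25 GB/s per direction
    double per_dir_gb_per_s = sublink_gb_per_s * sublinks;               // 150 GB/s per direction
    double total_gb_per_s   = per_dir_gb_per_s * 2;                      // 300 GB/s out + in

    std::printf("sub-link: %.0f GB/s, per direction: %.0f GB/s, total: %.0f GB/s\n",
                sublink_gb_per_s, per_dir_gb_per_s, total_gb_per_s);
    return 0;
}
```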