NVLink is a wire-based serial multi-lane near-range communications link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices use mesh networking to communicate instead of a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).
Performance
The following table shows a comparison of basic metrics based upon standard specifications:
| Interconnect | Transfer rate | Line code | Effective payload rate per lane per direction | Max total lane length (PCIe: incl. 5" for PCBs) | Realized in design |
|---|---|---|---|---|---|
| PCIe 1.x | 2.5 GT/s | 8b/10b | ~0.25 GB/s | 20" = ~51 cm | |
| PCIe 2.x | 5 GT/s | 8b/10b | ~0.5 GB/s | 20" = ~51 cm | |
| PCIe 3.x | 8 GT/s | 128b/130b | ~1 GB/s | 20" = ~51 cm[6] | Pascal, Volta, Turing |
| PCIe 4.0 | 16 GT/s | 128b/130b | ~2 GB/s | 8−12" = ~20−30 cm[6] | Volta on Xavier (8x, 4x, 1x), Ampere, Power 9 |
| PCIe 5.0 | 32 GT/s[7] | 128b/130b | ~4 GB/s | | Hopper |
| PCIe 6.0 | 64 GT/s | 1b/1b (PAM-4, FLIT) | ~8 GB/s | | |
| NVLink 1.0 | 20 Gbit/s | | ~2.5 GB/s | | Pascal, Power 8+ |
| NVLink 2.0 | 25 Gbit/s | | ~3.125 GB/s | | Volta, NVSwitch for Volta, Power 9 |
| NVLink 3.0 | 50 Gbit/s | | ~6.25 GB/s | | Ampere, NVSwitch for Ampere |
| NVLink 4.0 (also as C2C, chip-to-chip) | 100 Gbit/s[8] | | ~12.5 GB/s | | Hopper, Nvidia Grace datacenter/server CPU, NVSwitch for Hopper |
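The effective payload rate per lane per direction above follows from the transfer rate and the line-code overhead, e.g. PCIe 3.x: 8 GT/s × 128/130 ÷ 8 bits per byte ≈ 0.985 GB/s, listed as ~1 GB/s. The following is a minimal Python sketch of that arithmetic; the function name is a hypothetical helper, not part of any PCIe or NVLink tooling:

```python
def payload_rate_gb_s(transfer_rate_gt_s: float, payload_bits: int, total_bits: int) -> float:
    """Approximate payload rate in GB/s per lane per direction:
    transfer rate scaled by the line-code efficiency, divided by 8 bits per byte."""
    return transfer_rate_gt_s * payload_bits / total_bits / 8

# PCIe 1.x: 2.5 GT/s with 8b/10b   -> 0.25 GB/s   (table: ~0.25 GB/s)
print(payload_rate_gb_s(2.5, 8, 10))
# PCIe 3.x: 8 GT/s with 128b/130b  -> ~0.985 GB/s (table: ~1 GB/s)
print(payload_rate_gb_s(8.0, 128, 130))
# NVLink 3.0: 50 Gbit/s, line-code overhead ignored -> 6.25 GB/s (table: ~6.25 GB/s)
print(payload_rate_gb_s(50.0, 1, 1))
```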
The following table shows a comparison of relevant bus parameters for real-world semiconductors that all offer NVLink as one of their interconnect options:
| Semiconductor | Board/bus delivery variant | Interconnect | Transfer rate (per lane) | Lanes per sub-link (out + in) | Sub-link data rate (per data direction) | Sub-link or unit count | Total data rate (out + in) | Total lanes (out + in) | Total data rate (out + in, combined) |
|---|---|---|---|---|---|---|---|---|---|
| Nvidia GP100 | P100 SXM,[9] P100 PCI-E[10] | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s[11] | 32 Ⓒ | 32 GB/s |
| Nvidia GV100 | V100 SXM2,[12] V100 PCI-E[13] | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 Ⓒ | 32 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 Ⓒ | 32 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | PCIe 3.0 | 8 GT/s | 16 + 16 Ⓑ | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 Ⓒ | 32 GB/s |
| Nvidia Xavier[14] | (generic) | PCIe 4.0 Ⓓ: 2 units x8 (dual); 1 unit x4 (dual); 3 units x1[15][16] | 16 GT/s | 8 + 8 Ⓑ; 4 + 4 Ⓑ; 1 + 1 | 128 Gbit/s = 16 GB/s; 64 Gbit/s = 8 GB/s; 16 Gbit/s = 2 GB/s | Ⓓ 2; 1; 3 | Ⓓ 32 + 32 GB/s; 8 + 8 GB/s; 6 + 6 GB/s | 40 Ⓑ | 80 GB/s |
| IBM Power9[17] | (generic) | PCIe 4.0 | 16 GT/s | 16 + 16 Ⓑ | 256 Gbit/s = 32 GB/s | 3 | 96 + 96 GB/s | 96 | 192 GB/s |
| Nvidia GA100,[18][19] Nvidia GA102[20] | Ampere A100 (SXM4 & PCIe)[21] | PCIe 4.0 | 16 GT/s | 16 + 16 Ⓑ | 256 Gbit/s = 32 GB/s | 1 | 32 + 32 GB/s | 32 Ⓒ | 64 GB/s |
| Nvidia GP100 | P100 SXM, (not available with P100 PCI-E)[22] | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia Xavier | (generic) | NVLink 1.0[14] | 20 GT/s[14] | 8 + 8 Ⓐ | 160 Gbit/s = 20 GB/s[23] | ||||
| IBM Power8+ | (generic) | NVLink 1.0 | 20 GT/s | 8 + 8 Ⓐ | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia GV100 | V100 SXM2[24] (not available with V100 PCI-E) | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GB/s | 6[25] | 150 + 150 GB/s | 96 | 300 GB/s |
| IBM Power9[26] | (generic) | NVLink 2.0 (BlueLink ports) | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GB/s | 6 | 150 + 150 GB/s | 96 | 300 GB/s |
| NVSwitch for Volta[27] | (generic) (fully connected 18x18 switch) | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GB/s | 2 * 8 + 2 = 18 | 450 + 450 GB/s | 288 | 900 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GB/s | 1 | 25 + 25 GB/s | 16 | 50 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 Ⓐ | 200 Gbit/s = 25 GB/s | 2 | 50 + 50 GB/s | 32 | 100 GB/s |
| Nvidia GA100[18][19] | Ampere A100 (SXM4 & PCIe[21]) | NVLink 3.0 | 50 GT/s | 4 + 4 Ⓐ | 200 Gbit/s = 25 GB/s | 12[29] | 300 + 300 GB/s | 96 | 600 GB/s |
| Nvidia GA102[20] | GeForce RTX 3090, Quadro RTX A6000 | NVLink 3.0 | 28.125 GT/s | 4 + 4 Ⓐ | 112.5 Gbit/s = 14.0625 GB/s | 4 | 56.25 + 56.25 GB/s | 16 | 112.5 GB/s |
| NVSwitch for Ampere[30] | (generic) (fully connected 18x18 switch) | NVLink 3.0 | 50 GT/s | 8 + 8 Ⓐ | 400 Gbit/s = 50 GB/s | 2 * 8 + 2 = 18 | 900 + 900 GB/s | 288 | 1800 GB/s |
| NVSwitch for Hopper[30] | (fully connected 64-port switch) | NVLink 4.0 | 106.25 GT/s | 9 + 9 Ⓐ | 450 Gbit/s = 56.25 GB/s | 18 | 900 GB/s | 128 | 7200 GB/s |
| Nvidia Grace CPU[31] | Nvidia GH200 Superchip | PCIe-5 (4x, 16x) @ 512 GB/s | |||||||
| Nvidia Grace CPU[32] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | |||||||
| Nvidia Hopper GPU[33] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | |||||||
| Nvidia Hopper GPU[34] | Nvidia GH200 Superchip | NVLink 4 (18x) @ 900 GB/s |
Note: the data-rate columns are approximations derived from the transmission rate and rounded; see the real-world performance paragraph. A short calculation sketch follows the notes below.
- Ⓐ: sample value; NVLink sub-link bundling should be possible
- Ⓑ: sample value; other fractions for the PCIe lane usage should be possible
- Ⓒ: each PCIe lane transfers data in one direction over a single differential pair, so an x16 link is counted here as 16 + 16 = 32 lanes
- Ⓓ: various limitations on the combinations that are ultimately possible may apply due to chip pin muxing and board design
- dual: the interface unit can be configured either as a root hub or as an endpoint
- generic: bare semiconductor without any board-design-specific restrictions applied
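As a cross-check, the per-direction and total figures in the second table can be reproduced from the per-lane rate, the lane count per sub-link, and the sub-link count. The sketch below is a minimal Python example with a hypothetical helper name (not from any Nvidia or IBM tooling), ignoring line-code overhead as the note above does; it reproduces the V100 SXM2 NVLink 2.0 row:

```python
def link_totals(rate_gt_s: float, lanes_per_direction: int, sublinks: int):
    """Aggregate per-lane rates (GT/s) into sub-link and device totals (GB/s),
    ignoring line-code overhead as the table's rounding note does."""
    sublink_gb_s = rate_gt_s * lanes_per_direction / 8    # per sub-link, per data direction
    per_direction = sublink_gb_s * sublinks               # all sub-links, one data direction
    total_lanes = 2 * lanes_per_direction * sublinks      # out + in
    return sublink_gb_s, per_direction, 2 * per_direction, total_lanes

# GV100 (V100 SXM2), NVLink 2.0: 25 GT/s, 8 + 8 lanes per sub-link, 6 sub-links
# -> (25.0, 150.0, 300.0, 96): 25 GB/s per sub-link, 150 + 150 GB/s, 300 GB/s, 96 lanes
print(link_totals(25.0, 8, 6))
```

The same arithmetic reproduces, for example, the GA100 NVLink 3.0 row: 50 GT/s with 4 + 4 lanes and 12 sub-links gives 300 + 300 GB/s over 96 lanes.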