What is NVLink


NVLink is a wire-based, serial, multi-lane, near-range communication link developed by Nvidia. Unlike PCI Express, a device can consist of multiple NVLinks, and devices communicate over a mesh network rather than through a central hub. The protocol was first announced in March 2014 and uses a proprietary high-speed signaling interconnect (NVHS).
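
In practice you can check whether a direct peer-to-peer path, which NVLink provides when present, exists between the GPUs in a machine. Below is a minimal sketch using the standard CUDA runtime API; it only reports whether peer access is possible, not which interconnect (NVLink or PCIe) backs it, and the file name `peer_check.cu` is just an example.

```cpp
// peer_check.cu - minimal sketch: report which GPU pairs allow direct
// peer-to-peer access (NVLink-connected GPUs normally report "possible").
#include <cstdio>
#include <cuda_runtime.h>

int main() {
    int count = 0;
    cudaGetDeviceCount(&count);
    for (int src = 0; src < count; ++src) {
        for (int dst = 0; dst < count; ++dst) {
            if (src == dst) continue;
            int canAccess = 0;
            // Ask the driver whether 'src' can map and access memory on 'dst'.
            cudaDeviceCanAccessPeer(&canAccess, src, dst);
            std::printf("GPU %d -> GPU %d: peer access %s\n",
                        src, dst, canAccess ? "possible" : "not possible");
        }
    }
    return 0;
}
```

Compile with `nvcc peer_check.cu -o peer_check` and run on a multi-GPU machine; `nvidia-smi topo -m` additionally shows whether each GPU pair is connected via NVLink (entries such as NV1, NV2) or only via PCIe.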

Performance

The following table shows a basic metrics comparison based upon standard specifications:

| Interconnect | Transfer rate | Line code | Effective payload rate per lane per direction | Max total lane length (PCIe: incl. 5" for PCBs) | Realized in design |
|---|---|---|---|---|---|
| PCIe 1.x | 2.5 GT/s | 8b/10b | ~0.25 GB/s | 20" = ~51 cm | |
| PCIe 2.x | 5 GT/s | 8b/10b | ~0.5 GB/s | 20" = ~51 cm | |
| PCIe 3.x | 8 GT/s | 128b/130b | ~1 GB/s | 20" = ~51 cm[6] | Pascal, Volta, Turing |
| PCIe 4.0 | 16 GT/s | 128b/130b | ~2 GB/s | 8−12" = ~20−30 cm[6] | Volta on Xavier (8x, 4x, 1x), Ampere, Power 9 |
| PCIe 5.0 | 32 GT/s[7] | 128b/130b | ~4 GB/s | | Hopper |
| PCIe 6.0 | 64 GT/s | 128b/130b | ~8 GB/s | | |
| NVLink 1.0 | 20 Gbit/s | | ~2.5 GB/s | | Pascal, Power 8+ |
| NVLink 2.0 | 25 Gbit/s | | ~3.125 GB/s | | Volta, NVSwitch for Volta, Power 9 |
| NVLink 3.0 | 50 Gbit/s | | ~6.25 GB/s | | Ampere, NVSwitch for Ampere |
| NVLink 4.0 (also as C2C, chip-to-chip) | 100 Gbit/s[8] | | ~12.5 GB/s | | Hopper, Nvidia Grace Datacenter/Server CPU, NVSwitch for Hopper |
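
The "effective payload rate per lane per direction" column can be reproduced from the other columns: multiply the transfer rate by the line-code efficiency and divide by 8 to convert bits to bytes (for these serial links, 1 GT/s corresponds to roughly 1 Gbit/s per lane). A minimal sketch of that arithmetic, using values from the table above:

```cpp
// payload_rate.cpp - rough per-lane payload estimate from transfer rate and line code.
#include <cstdio>

// GB/s per lane per direction = (GT/s) * (payload bits / coded bits) / (8 bits per byte)
double payload_gb_per_s(double gt_per_s, double payload_bits, double coded_bits) {
    return gt_per_s * (payload_bits / coded_bits) / 8.0;
}

int main() {
    std::printf("PCIe 3.x  : ~%.2f GB/s per lane\n", payload_gb_per_s(8.0, 128, 130));   // ~0.98
    std::printf("PCIe 4.0  : ~%.2f GB/s per lane\n", payload_gb_per_s(16.0, 128, 130));  // ~1.97
    // The NVLink rows list no line code, so the table value is simply rate / 8:
    std::printf("NVLink 2.0: ~%.3f GB/s per lane\n", 25.0 / 8.0);                         // 3.125
    return 0;
}
```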

The following table compares the relevant bus parameters of real-world semiconductors that all offer NVLink as one of their options:

| Semiconductor | Board/bus delivery variant | Interconnect | Transmission technology rate (per lane) | Lanes per sub-link (out + in) | Sub-link data rate (per data direction) | Sub-link or unit count | Total data rate (out + in) | Total lanes (out + in) | Total data rate (combined) |
|---|---|---|---|---|---|---|---|---|---|
| Nvidia GP100 | P100 SXM,[9] P100 PCI-E[10] | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s[11] | 32 | 32 GB/s |
| Nvidia GV100 | V100 SXM2,[12] V100 PCI-E[13] | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000 | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000 | PCIe 3.0 | 8 GT/s | 16 + 16 | 128 Gbit/s = 16 GB/s | 1 | 16 + 16 GB/s | 32 | 32 GB/s |
| Nvidia Xavier[14] | (generic) | PCIe 4.0 Ⓓ: 2 units x8 (dual), 1 unit x4 (dual), 3 units x1[15][16] | 16 GT/s | 8 + 8 / 4 + 4 / 1 + 1 | 128 Gbit/s = 16 GB/s / 64 Gbit/s = 8 GB/s / 16 Gbit/s = 2 GB/s | Ⓓ 2 / 1 / 3 | Ⓓ 32 + 32 GB/s / 8 + 8 GB/s / 6 + 6 GB/s | 40 | 80 GB/s |
| IBM Power9[17] | (generic) | PCIe 4.0 | 16 GT/s | 16 + 16 | 256 Gbit/s = 32 GB/s | 3 | 96 + 96 GB/s | 96 | 192 GB/s |
| Nvidia GA100,[18][19] Nvidia GA102[20] | Ampere A100 (SXM4 & PCIe)[21] | PCIe 4.0 | 16 GT/s | 16 + 16 | 256 Gbit/s = 32 GB/s | 1 | 32 + 32 GB/s | 32 | 64 GB/s |
| Nvidia GP100 | P100 SXM (not available with P100 PCI-E)[22] | NVLink 1.0 | 20 GT/s | 8 + 8 | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia Xavier | (generic) | NVLink 1.0[14] | 20 GT/s[14] | 8 + 8 | 160 Gbit/s = 20 GB/s[23] | | | | |
| IBM Power8+ | (generic) | NVLink 1.0 | 20 GT/s | 8 + 8 | 160 Gbit/s = 20 GB/s | 4 | 80 + 80 GB/s | 64 | 160 GB/s |
| Nvidia GV100 | V100 SXM2[24] (not available with V100 PCI-E) | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 6[25] | 150 + 150 GB/s | 96 | 300 GB/s |
| IBM Power9[26] | (generic) | NVLink 2.0 (BlueLink ports) | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 6 | 150 + 150 GB/s | 96 | 300 GB/s |
| NVSwitch for Volta[27] | (generic) (fully connected 18x18 switch) | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 2 * 8 + 2 = 18 | 450 + 450 GB/s | 288 | 900 GB/s |
| Nvidia TU104 | GeForce RTX 2080, Quadro RTX 5000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 1 | 25 + 25 GB/s | 16 | 50 GB/s |
| Nvidia TU102 | GeForce RTX 2080 Ti, Quadro RTX 6000/8000[28] | NVLink 2.0 | 25 GT/s | 8 + 8 | 200 Gbit/s = 25 GB/s | 2 | 50 + 50 GB/s | 32 | 100 GB/s |
| Nvidia GA100[18][19] | Ampere A100 (SXM4 & PCIe[21]) | NVLink 3.0 | 50 GT/s | 4 + 4 | 200 Gbit/s = 25 GB/s | 12[29] | 300 + 300 GB/s | 96 | 600 GB/s |
| Nvidia GA102[20] | GeForce RTX 3090, Quadro RTX A6000 | NVLink 3.0 | 28.125 GT/s | 4 + 4 | 112.5 Gbit/s = 14.0625 GB/s | 4 | 56.25 + 56.25 GB/s | 16 | 112.5 GB/s |
| NVSwitch for Ampere[30] | (generic) (fully connected 18x18 switch) | NVLink 3.0 | 50 GT/s | 8 + 8 | 400 Gbit/s = 50 GB/s | 2 * 8 + 2 = 18 | 900 + 900 GB/s | 288 | 1800 GB/s |
| NVSwitch for Hopper[30] | (fully connected 64-port switch) | NVLink 4.0 | 106.25 GT/s | 9 + 9 | 450 Gbit/s | 18 | 900 GB/s | 128 | 7200 GB/s |
| Nvidia Grace CPU[31] | Nvidia GH200 Superchip | PCIe-5 (4x, 16x) @ 512 GB/s | | | | | | | |
| Nvidia Grace CPU[32] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | | | | | | | |
| Nvidia Hopper GPU[33] | Nvidia GH200 Superchip | NVLink-C2C @ 900 GB/s | | | | | | | |
| Nvidia Hopper GPU[34] | Nvidia GH200 Superchip | NVLink 4 (18x) @ 900 GB/s | | | | | | | |

Note: the data-rate columns are rounded approximations derived from the transmission rate; see the real-world performance section.

- sample value; NVLink sub-link bundling should be possible
- sample value; other fractions of the PCIe lane usage should be possible
- a single PCIe lane (not 16) transfers data over a differential pair
- Ⓓ: various limitations on the finally possible combinations may apply due to chip pin muxing and board design
- dual: the interface unit can be configured either as a root hub or as an end point
- generic: bare semiconductor without any board-design-specific restrictions applied
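
As a cross-check of the second table, a row's totals follow from rate × lanes × sub-link count. The sketch below redoes that arithmetic for the V100 SXM2 / NVLink 2.0 row; the values are taken from the table, and the file name is just an example.

```cpp
// nvlink_row_check.cpp - recompute the V100 SXM2 / NVLink 2.0 row of the table.
#include <cstdio>

int main() {
    const double lane_rate_gbit = 25.0;  // NVLink 2.0: 25 GT/s ~ 25 Gbit/s per lane
    const int lanes_per_sublink = 8;     // per direction (8 out + 8 in)
    const int sublinks          = 6;     // V100 SXM2 exposes 6 NVLink sub-links

    double sublink_gb_per_s = lane_rate_gbit * lanes_per_sublink / 8.0;  // 25 GB/s per direction
    double per_dir_gb_per_s = sublink_gb_per_s * sublinks;               // 150 GB/s per direction
    double total_gb_per_s   = per_dir_gb_per_s * 2;                      // 300 GB/s out + in

    std::printf("sub-link: %.0f GB/s, per direction: %.0f GB/s, total: %.0f GB/s\n",
                sublink_gb_per_s, per_dir_gb_per_s, total_gb_per_s);
    return 0;
}
```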