NVLink and NVSwitch
NVIDIA GPUs can be linked together in pairs with NVLink, or in larger groups using an NVSwitch configuration. This allows data to be copied between the GPUs faster than over the system bus. Note that code must be written specifically to make use of this connection, and there is no simple way to make the linked cards act as a single card.
Use NVLink with PyTorch
PyTorch has built-in support for NVLink if you tell it to use the nccl backend for torch.distributed. You must use the nccl backend if you’re targeting the “superpod” nodes with this feature in order to make use of the fast interconnect between the nodes.
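Below is a minimal sketch of that setup, assuming one task per GPU launched with srun and MASTER_ADDR/MASTER_PORT exported in the job script; the SLURM_* variables are set by srun, so adjust accordingly if you use torchrun or another launcher.

```python
import os

import torch
import torch.distributed as dist


def init_nccl():
    # These variables are set by srun for each task; MASTER_ADDR and
    # MASTER_PORT must be exported in the job script before calling srun.
    rank = int(os.environ["SLURM_PROCID"])
    world_size = int(os.environ["SLURM_NTASKS"])
    local_rank = int(os.environ["SLURM_LOCALID"])

    # The nccl backend is what enables NVLink/NVSwitch for intra-node
    # traffic and the fast interconnect between nodes.
    dist.init_process_group(
        backend="nccl",
        init_method="env://",  # reads MASTER_ADDR and MASTER_PORT
        rank=rank,
        world_size=world_size,
    )
    torch.cuda.set_device(local_rank)
    return local_rank


if __name__ == "__main__":
    local_rank = init_nccl()
    # NCCL collectives move this tensor between GPUs over NVLink
    # (or NVSwitch / InfiniBand, depending on where the ranks live).
    x = torch.ones(1, device=f"cuda:{local_rank}")
    dist.all_reduce(x)
    print(f"rank {dist.get_rank()}: sum = {x.item()}")
    dist.destroy_process_group()
```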
Request nodes
In Unity, only the nodes with the following types of GPUs have NVLink available:
- v100
- a40
- a100
Not all of the v100 nodes have NVLink, and those that do are inconsistent in their layout. The scheduler does not provide a way to request GPUs linked with NVLink, so you need to use --exclusive to get an entire node, and then use the topology if it’s found.
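One way to check the topology once you have the node is nvidia-smi topo -m, which prints a connectivity matrix in which NV# entries mark NVLink paths; a small sketch that dumps it from Python:

```python
import subprocess

# Print the GPU connectivity matrix for the node this job landed on.
# Entries such as NV1/NV2/NV4/NV12 indicate NVLink (the number is the
# count of bonded links); PIX/PHB/SYS indicate PCIe or system-bus paths.
print(subprocess.run(
    ["nvidia-smi", "topo", "-m"],
    capture_output=True, text=True, check=True,
).stdout)
```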
The a40 nodes have GPUs linked in pairs. Since there is no way to guarantee that you get a linked pair, the best option is to use --exclusive and use the topology as discovered.
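As a rough sketch of using the discovered topology, the snippet below parses nvidia-smi topo -m output to find one NVLinked pair and restricts the job to it with CUDA_VISIBLE_DEVICES. The parsing assumes the usual matrix format and is not part of any Unity tooling; treat it as a starting point.

```python
import os
import re
import subprocess


def find_nvlinked_pair():
    """Return (i, j) for the first pair of GPUs joined by NVLink, else None."""
    out = subprocess.run(["nvidia-smi", "topo", "-m"],
                         capture_output=True, text=True, check=True).stdout
    for line in out.splitlines():
        fields = line.split()
        if not fields or not fields[0].startswith("GPU"):
            continue  # skip blank, NIC, and legend lines
        if len(fields) > 1 and fields[1].startswith("GPU"):
            continue  # skip the header row
        i = int(fields[0][3:])  # row label, e.g. "GPU0" -> 0
        # Matrix entries follow the row label; column j corresponds to GPU j.
        for j, entry in enumerate(fields[1:]):
            if j != i and re.fullmatch(r"NV\d+", entry):
                return i, j
    return None


pair = find_nvlinked_pair()
if pair is not None:
    # Must be set before any CUDA context is created in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = f"{pair[0]},{pair[1]}"
    print(f"Using NVLinked GPUs {pair}")
else:
    print("No NVLinked pair found on this node")
```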
The a100 nodes, with the exception of the ece-gpu nodes, have NVSwitch, so all the GPUs are linked to each other at the same speed (NV4 on the 4-GPU nodes and NV12 on the 8-GPU nodes). Additionally, gpu[013-024] are connected with InfiniBand (low-latency networking), so that cards across multiple nodes can communicate with minimal latency.
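If you want to confirm which transports NCCL actually picks on these nodes, its debug logging can help. A small sketch follows; NCCL_DEBUG and NCCL_DEBUG_SUBSYS are standard NCCL environment variables, and the log lines mentioned in the comments are typical but not guaranteed.

```python
import os

# Ask NCCL to report which transports it selects.  Set these before
# dist.init_process_group(); the init log then typically shows "via P2P"
# paths for NVLink/NVSwitch traffic inside a node and "NET/IB" when
# traffic between nodes goes over InfiniBand.
os.environ.setdefault("NCCL_DEBUG", "INFO")
os.environ.setdefault("NCCL_DEBUG_SUBSYS", "INIT,GRAPH,NET")
```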