GPUs on Unity
Graphics Processing Units (GPUs) are a powerful tool for running code in parallel at a larger scale than traditional CPU parallel workloads. However, this large-scale parallelism comes at the cost of slower communication. It is important to note that using one or more GPUs does not guarantee that code will run faster, although many popular software packages have been modified to use GPUs for better performance.
Available GPU resources
| Device | Architecture | Max Compute Capability* | Max VRAM |
| --- | --- | --- | --- |
| NVIDIA GeForce GTX TITAN X | Maxwell | sm_52 | vram12 |
| Tesla M40 24GB | Maxwell | sm_52 | vram23 |
| NVIDIA GeForce GTX 1080 Ti | Pascal | sm_61 | vram11 |
| Tesla V100-PCIE-16GB | Volta | sm_70 | vram16 |
| Tesla V100-SXM2-16GB | Volta | sm_70 | vram16 |
| Tesla V100-SXM2-32GB | Volta | sm_70 | vram32 |
| NVIDIA GeForce RTX 2080 | Turing | sm_75 | vram8 |
| NVIDIA GeForce RTX 2080 Ti | Turing | sm_75 | vram11 |
| Quadro RTX 8000 | Turing | sm_75 | vram48 |
| NVIDIA A100-PCIE-40GB | Ampere | sm_80 | vram40 |
| NVIDIA A100-SXM4-80GB | Ampere | sm_80 | vram80 |
| NVIDIA A40 | Ampere | sm_86 | vram48 |
| NVIDIA GH200 | Hopper | sm_90 | vram95 |
| NVIDIA L40S | Ada Lovelace | sm_89 | vram48 |
*: CUDA Compute Capability provides some details on which features may or may not be available for particular GPUs. More details about compute capabilities for all NVIDIA GPUs can be found on the NVIDIA Compute Capability page, and a complete list of features available with each compute capability can be found in the CUDA Documentation. As new CUDA versions are released, older GPUs on Unity may become deprecated; any deprecations will be noted on this page.
Request GPU resources
You can request GPU access on Unity through Slurm either for an interactive job or using a batch script, as shown in the following examples.
Interactive job
srun -p gpu-preempt -t 02:00:00 --gpus=1 --pty /bin/bash
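If you already know which GPU model you need, the type can also be added to --gpus for an interactive job. The following is a minimal sketch assuming the 2080ti type; any of the GPU names listed further below can be used in its place:
srun -p gpu-preempt -t 02:00:00 --gpus=2080ti:1 --pty /bin/bash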
Batch script
#!/bin/bash
#SBATCH -p gpu-preempt # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00 # Set max job time to 2 hours
#SBATCH --gpus=1 # Request access to 1 GPU
#SBATCH --constraint=2080ti # Request a 2080ti GPU
./myscript.sh
To select specific GPUs, use the --constraint flag with Slurm, or add the GPU type to --gpus. Using --constraint allows you to list multiple GPUs that would fulfill your requirements. For example, --constraint=[2080|2080ti] is best if you are using GPUs across more than one node, because it ensures that the same model is used across the entire job, while --constraint=sm_70&vram12 selects GPUs by features (compute capability and VRAM) rather than by model. An example combining constraints is shown after the list below.
The available GPU constraints are listed below:
- 2080ti
- 1080ti
- 2080
- titanx
- m40
- rtx8000
- v100
- a100
- a40
- l40s
- gh200
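Constraints can also be combined. The following interactive example is a minimal sketch using the sm_70&vram12 combination shown above; the quotes prevent the shell from interpreting the & character:
srun -p gpu-preempt -t 02:00:00 --gpus=1 --constraint="sm_70&vram12" --pty /bin/bash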
Batch script with specific GPU
#!/bin/bash
#SBATCH -p gpu-preempt # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00 # Set max job time to 2 hours
#SBATCH --gpus=2080ti:1 # Request access to 1 2080ti GPU
./myscript.sh
Batch script with constraint
#!/bin/bash
#SBATCH -p gpu-preempt # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00 # Set max job time to 2 hours
#SBATCH --gpus=1 # Request access to 1 GPU
#SBATCH --constraint=2080ti
./myscript.sh
Batch script with constraint specifying multiple options
#!/bin/bash
#SBATCH -p gpu-preempt # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00 # Set max job time to 2 hours
#SBATCH --gpus=1 # Request access to 1 GPU
#SBATCH --constraint=2080ti|1080ti|2080 # Accept any of these GPU models
./myscript.sh
How to choose a GPU
To reduce the time a job spends waiting in the queue, select the least powerful GPU that can run your code. The choice of GPU is typically limited by the amount of GPU memory (VRAM) your code requires.
The following are some general guidelines for choosing GPUs:
- GeForce GPUs are good for any lower memory tasks or prototyping.
- Select the minimum amount of VRAM that fits your needs.
- The large-VRAM GPUs are often in high demand. Be prepared for your jobs to spend some time in the queue before they can launch.
- Whenever possible, use constraints to specify the necessary GPUs.
How to choose a partition
Partitions are a required option for all GPU jobs and can be specified using either -p or --partition=. More than one partition may be listed, separated by commas; Slurm will attempt to use these partitions in priority order to allocate the specified resources. An example is shown after the following list.
- Jobs that require less than 2 hours: gpu-preempt, priority partitions
- Jobs that cannot be preempted, and require up to 48 hours: gpu, priority partitions
- Jobs that cannot be preempted, and require more than 48 hours: additionally specify --qos=long
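As a minimal sketch of the comma-separated form (assuming your job fits within the 2-hour limit of gpu-preempt and you have access to both partitions):
#SBATCH -p gpu-preempt,gpu # Slurm attempts these partitions in priority order
#SBATCH -t 01:30:00 # Set max job time to 90 minutes
#SBATCH --gpus=1 # Request access to 1 GPU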
For an informative list of partitions on Unity, see Unity Partitions.
GPU-enabled software
The following sections include useful information about GPU-enabled software, such as CUDA, cuDNN, and OpenMPI. Some software, such as TensorFlow, requires setting up the environment in a specific way. The Set up a Tensorflow GPU environment section will guide you through how to use a conda environment to set up a TensorFlow GPU environment.
CUDA
CUDA is NVIDIA’s parallel computing platform. A CUDA module typically needs to be loaded for GPU jobs because it provides access to the NVIDIA compiler suite (nvcc, nvfortran) and the NVIDIA GPU profiling tool (nsys).
Available versions of CUDA can be listed using module spider cuda
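As a minimal sketch of a typical workflow (the module version is only an example; check module spider cuda for the versions actually installed, and saxpy.cu is a placeholder source file):
module load cuda/11.4.0 # Load a CUDA toolkit module
nvcc --version # Confirm the NVIDIA compiler is available
nvcc -o saxpy saxpy.cu # Compile a CUDA source file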
cuDNN
cuDNN is the CUDA Deep Neural Network library, often used to accelerate deep learning frameworks such as Keras, PyTorch, and TensorFlow.
OpenMPI
The OpenMPI modules provide the OpenMPI compilers for MPI built against the CUDA compilers. OpenMPI is necessary for software that uses both MPI and GPU acceleration.
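A minimal sketch of building MPI+GPU software, assuming a CUDA-aware OpenMPI module is available (the module names here are assumptions; check module spider openmpi and module spider cuda for the real names):
module load cuda/11.4.0 # Example CUDA version
module load openmpi/4.1.4+cuda # Hypothetical CUDA-aware OpenMPI module name
mpicc -o mpi_gpu mpi_gpu.c # mpi_gpu.c is a placeholder MPI+CUDA source file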
Many programming languages are able to use one or more GPUs, including:
- Python
- Matlab
- Julia
- C++ (using CUDA or OpenACC)
- Fortran (using CUDA or OpenACC)
- C (using CUDA or OpenACC)
Set up a TensorFlow GPU environment
Some software, especially Python packages, requires the environment to be set up in a specific way. For Python programs that can use GPUs, such as TensorFlow, it is best to use a conda environment.
The following steps will show you how to set up a conda environment for TensorFlow:
Request an interactive session with a GPU node using the following command:
srun -t 01:00:00 -p gpu-preempt --gpus=1 --mem=16G --pty /bin/bash
Load modules using the following commands:
module load conda/latest
module load cuda/11.4.0
module load cudnn/cuda11-8.4.1.50
Create and activate the environment using the following commands:
conda create --name TensorFlow-env python=3.9
conda activate TensorFlow-env
pip install tensorflow
pip install tensorrt
conda install ipykernel
- TensorFlow 2 requires at least Python 3.9.
- If you do not request enough memory, TensorRT will fail to install.
Add the environment to Jupyter using the following command:
python -m ipykernel install --user --name TensorFlow-env --display-name="TensorFlow-Env"
A new kernel named TensorFlow-Env appears in new Open OnDemand sessions.
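To confirm that TensorFlow can see a GPU from inside the environment, run the following quick check on a GPU node with the environment activated and the modules above loaded:
python -c "import tensorflow as tf; print(tf.config.list_physical_devices('GPU'))"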
Track GPU power usage
To track the power usage of GPUs being used by your jobs, use the following command:
nvidia-smi --query-gpu=power.draw --format=csv --loop-ms=100
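To keep a record of power draw for a batch job, the query can be started in the background at the top of the job script and stopped at the end. This is a sketch; gpu_power.csv is an arbitrary output file name:
nvidia-smi --query-gpu=timestamp,power.draw --format=csv --loop-ms=1000 > gpu_power.csv & # Log power draw once per second
SMI_PID=$!
./myscript.sh
kill $SMI_PID # Stop logging when the job's work is done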
Troubleshoot problems with GPUs
The first two options require connecting to the node(s) on which your job is running. See Monitoring a batch job for details on how to do that.
Nvidia-smi
To view ongoing GPU processes, use nvidia-smi pmon. This command can run on any GPU node without needing to install any additional software.
If you are getting error messages, add the following command to your scripts to find out which GPU is being used:
nvidia-smi -L
Nvitop
Nvitop offers an interactive view of ongoing processes on NVIDIA GPUs. It is available in the default PATH for the x86 architecture. To install nvitop via pip, use the command pip install nvitop. Once nvitop is installed, it can be used either as a standalone process or within a Python script for a more detailed analysis.
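One way to watch a running batch job with nvitop is to start an overlapping step on the job's node. This is a sketch, assuming your Slurm version supports the --overlap flag; replace <jobid> with your job's ID:
srun --jobid=<jobid> --overlap --pty nvitop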
Common errors
Out of Memory errors (for example, CUDA_ERROR_OUT_OF_MEMORY or torch.cuda.OutOfMemoryError) mean that a GPU with more available VRAM may be necessary, or that the code should be modified to reduce its memory usage. For machine learning models that run out of memory, try reducing the batch size or ensuring your data management is optimized. For other software, check the documentation on controlling GPU memory usage.
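To confirm that the failure really is a memory limit, GPU memory usage can be watched while the code runs (see the monitoring options above), for example:
nvidia-smi --query-gpu=memory.used,memory.total --format=csv --loop-ms=1000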