GPUs on Unity

Graphics Processing Units (GPUs) provide a powerful tool for running code in parallel at a larger scale than traditional CPU parallel workloads. However, this large-scale parallelism comes at the cost of slower communication times. Note that using one or more GPUs does not guarantee that code will run faster, although many popular software packages have been modified to use GPUs for better performance.

Available GPU resources

| Device                     | Arch         | Max Compute Capability* | Max VRAM | Constraint name |
|----------------------------|--------------|-------------------------|----------|-----------------|
| NVIDIA GeForce GTX TITAN X | Maxwell      | sm_52                   | vram12   | titanx          |
| Tesla M40 24GB             | Maxwell      | sm_52                   | vram23   | m40             |
| NVIDIA GeForce GTX 1080 Ti | Pascal       | sm_61                   | vram11   | 1080ti          |
| Tesla V100-PCIE-16GB       | Volta        | sm_70                   | vram16   | v100            |
| Tesla V100-SXM2-16GB       | Volta        | sm_70                   | vram16   | v100            |
| Tesla V100-SXM2-32GB       | Volta        | sm_70                   | vram32   | v100            |
| NVIDIA GeForce RTX 2080    | Turing       | sm_75                   | vram8    | 2080            |
| NVIDIA GeForce RTX 2080 Ti | Turing       | sm_75                   | vram11   | 2080ti          |
| Quadro RTX 8000            | Turing       | sm_75                   | vram48   | rtx8000         |
| NVIDIA A100-PCIE-40GB      | Ampere       | sm_80                   | vram40   | a100, a100-40g  |
| NVIDIA A100-SXM4-80GB      | Ampere       | sm_80                   | vram80   | a100, a100-80g  |
| NVIDIA A16                 | Ampere       | sm_86                   | vram16   | a16             |
| NVIDIA A40                 | Ampere       | sm_86                   | vram48   | a40             |
| NVIDIA GH200               | Hopper       | sm_90                   | vram102  | gh200           |
| NVIDIA L40S                | Ada Lovelace | sm_89                   | vram48   | l40s            |
| NVIDIA L4                  | Ada Lovelace | sm_89                   | vram23   | l4              |
Note: Max compute capability and max VRAM represent the largest constraint that the GPU can satisfy. A constraint for a lower compute capability or less VRAM will generally be satisfied by any GPU that meets at least the specified requirement.

*: CUDA Compute Capability indicates which features are available on a particular GPU. Details about the compute capability of every NVIDIA GPU can be found on the NVIDIA Compute Capability page, and a complete list of the features available with each compute capability can be found in the CUDA Documentation. As new features are released, older GPUs on Unity may become deprecated; any deprecations will be noted on this page as new versions are released.

Request GPU resources

You can request GPU access on Unity through Slurm either for an interactive job or using a batch script, as shown in the following examples.

Note: Not all software can use GPUs, and some software requires special options, dependencies, or alternate versions to run on GPUs. Please ensure your software supports GPU use before requesting these resources.

Interactive job

salloc -p gpu-preempt -t 02:00:00 --gpus=1 --mem=8G

Batch script

#!/bin/bash

#SBATCH -p gpu-preempt      # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00         # Set max job time for 2 hours
#SBATCH --gpus=1            # Request access to 1 GPU
#SBATCH --constraint=2080ti # Request access to a 2080ti GPU

./myscript.sh

To select specific GPUs, use the --constraint flag with Slurm, or add the GPU type to --gpus.

Tip: Using --constraint allows you to select any of several GPUs that fulfill your requirements.

You can use:

  • --constraint=[2080|2080ti], which is best if you are using GPUs across more than one node, to ensure that the same model is used across the entire job
  • or --constraint=sm_70&vram12

Tip: Not all GPU types are available in every partition.

Batch script with specific GPU

#!/bin/bash

#SBATCH -p gpu-preempt  # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00     # Set max job time for 2 hours
#SBATCH --gpus=2080ti:1 # Request access to one 2080ti GPU

./myscript.sh

Batch script with constraint

#!/bin/bash

#SBATCH -p gpu-preempt      # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00         # Set max job time for 2 hours
#SBATCH --gpus=1            # Request access to 1 GPU
#SBATCH --constraint=2080ti # Require a 2080ti GPU

./myscript.sh

Batch script with constraint specifying multiple options

#!/bin/bash

#SBATCH -p gpu-preempt # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00    # Set max job time for 2 hours
#SBATCH --gpus=1       # Request access to 1 GPU
#SBATCH --constraint=2080ti|1080ti|2080 # Accept any one of these GPU models

./myscript.sh

How to choose a GPU

To reduce the time your job spends waiting in the queue, select the least powerful GPU that can run your code. The choice of GPU is typically limited by the amount of GPU memory your code needs.

The following are some general guidelines for choosing GPUs:

  • GeForce GPUs are good for lower-memory tasks and prototyping.
  • Select the minimum amount of VRAM that fits your needs.
  • The large-VRAM GPUs are often in high demand, so be prepared for your jobs to spend some time in the queue before they can launch.
  • Whenever possible, use constraints to specify the necessary GPUs (see the sketch below).
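
For example, here is a minimal sketch of a batch script that uses the vram16 constraint from the table above to accept any single GPU with at least 16 GB of VRAM (myscript.sh is a placeholder for your own job script):

#!/bin/bash

#SBATCH -p gpu-preempt      # Submit job to the gpu-preempt partition
#SBATCH -t 02:00:00         # Set max job time for 2 hours
#SBATCH --gpus=1            # Request access to 1 GPU
#SBATCH --constraint=vram16 # Any GPU with at least 16 GB of VRAM

./myscript.sh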

How to choose a partition

A partition is a required option for all GPU jobs and can be specified using either -p or --partition=. More than one partition may be specified as a comma-separated list; Slurm will attempt to allocate the requested resources from these partitions in priority order.

  • Jobs that require less than 2 hours: gpu-preempt, priority partitions
  • Jobs that cannot be preempted, and require up to 48 hours: gpu, priority partitions
  • Jobs that cannot be preempted, and require more than 48 hours: additionally specify --qos=long
Note: Use checkpointing in the gpu-preempt partition if your job needs more than 2 hours.

For an informative list of partitions on Unity, see Unity Partitions. For a quick guide to how many GPUs are in each partition, see the GPU Summary table.
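
For example, here is a minimal sketch of a batch script for a non-preemptable job that needs more than 48 hours, using the gpu partition together with the long QOS (myscript.sh is a placeholder for your own job script; shorter jobs could instead list partitions such as -p gpu-preempt,gpu):

#!/bin/bash

#SBATCH -p gpu      # Submit job to the non-preemptable gpu partition
#SBATCH -t 72:00:00 # More than 48 hours, so the long QOS is also required
#SBATCH --qos=long
#SBATCH --gpus=1

./myscript.sh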

GPU-enabled software

The following sections include useful information about GPU-enabled software, such as CUDA, cuDNN, and OpenMPI. Some software, such as TensorFlow, requires setting up the environment in a specific way. The Set up a TensorFlow GPU environment section walks through using a conda environment to set up TensorFlow for GPU use.

Note: When using multiple GPUs on a node, use the Slurm variable SLURM_GPUS_ON_NODE to ensure that all allocated GPUs on the node can be accessed by your software. This is especially important with software such as PyTorch, which allows you to specify --nproc-per-node at runtime.
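
For example, here is a minimal sketch of launching a multi-GPU PyTorch script on every GPU allocated to the node (train.py is a placeholder, and the exact torchrun flags can differ between PyTorch versions, so check the PyTorch documentation):

torchrun --nproc-per-node=$SLURM_GPUS_ON_NODE train.py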

CUDA

CUDA is NVIDIA’s parallel computing platform. Most GPU jobs require a version of CUDA to be loaded because it provides access to the NVIDIA compiler suite (nvcc, nvfortran) and the NVIDIA GPU profiling tool (nsys).

Available versions of CUDA can be listed using module spider cuda.

Note: Be sure to check which version(s) of CUDA are compatible with the software you are using.
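
For example, a short sketch of listing and then loading a CUDA module (version 12.6 is the one used in the TensorFlow example below; substitute whichever version your software requires):

module spider cuda    # List the available CUDA versions
module load cuda/12.6 # Load a specific version
nvcc --version        # Confirm the CUDA compiler is now available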

cuDNN

cuDNN is the CUDA Deep Neural Network library, often used to accelerate deep learning frameworks such as Keras, PyTorch, and TensorFlow.

OpenMPI

The OpenMPI modules provide MPI compilers built against the CUDA compilers. A CUDA-aware OpenMPI build is necessary for software that uses both MPI and GPU acceleration.

Many programming languages are able to use one or more GPUs, including:

  • Python
  • Matlab
  • Julia
  • C++ (using CUDA or OpenACC)
  • Fortran (using CUDA or OpenACC)
  • C (using CUDA or OpenACC)

Set up a TensorFlow GPU environment

Some software, especially in Python, requires setting up the environment in a specific way. For Python programs that can use GPUs, such as TensorFlow, it is best to use a conda environment.

Warning: See our Conda documentation for how to change where the environment is built. This is important because TensorFlow environments are large.

The following steps will show you how to set up a conda environment for TensorFlow:

  1. Request an interactive session with a GPU node using the following command:

    srun -t 01:00:00 -p gpu-preempt --gpus=1 --mem=16G --pty /bin/bash
    
  2. Load modules using the following commands:

    module load conda/latest
    module load cuda/12.6
    module load cudnn/8.9.7.29-12-cuda12.6
    
  3. Create and activate the environment using the following commands:

    conda create --name TensorFlow-env python=3.9
    
    conda activate TensorFlow-env
    pip install TensorFlow
    pip install tensorrt
    conda install ipykernel
    
Note:
  • TensorFlow 2 requires a Python version of at least 3.9.
  • If you do not request enough memory, TensorRT will fail to install.
  4. Add the environment to Jupyter using the following command:

    python -m ipykernel install --user --name TensorFlow-env --display-name="TensorFlow-Env"
    

    A new kernel named TensorFlow-Env will appear in new Open OnDemand sessions.

Tip: A similar procedure can be used for PyTorch. See the PyTorch docs for the correct pip invocation. Always use the latest CUDA version.

Track GPU power usage

To track the power usage of GPUs being used by your jobs, use the following command:

nvidia-smi --query-gpu=power.draw --format=csv --loop-ms=100
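
For example, here is a minimal sketch of logging power draw to a file in the background for the duration of a batch job (power.csv is a placeholder file name, and myscript.sh stands in for your own job script, as in the batch examples above):

nvidia-smi --query-gpu=power.draw --format=csv --loop-ms=100 > power.csv &
SMI_PID=$!    # Remember the logger's process ID
./myscript.sh
kill $SMI_PID # Stop the background logger once the job script finishes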

Troubleshoot problems with GPUs

The first two options require connecting to the node(s) on which your job is running. See Monitoring a batch job for details on how to do that.

Nvidia-smi

To view ongoing GPU processes, use nvidia-smi pmon. This command can run on any GPU node without needing to install any additional software.

If you are getting error messages, add the following command to your scripts to find out which GPU is being used:

nvidia-smi -L

Nvitop

Nvitop offers an interactive view of ongoing processes on NVIDIA GPUs. It is available in the default PATH on x86 nodes.

Note: If you are installing nvitop locally for Power9 or ARM nodes, it is highly recommended to use an isolated conda environment or Python virtual environment.

To install nvitop via pip, use the command pip install nvitop. Once nvitop is installed, it can be used either as a standalone process or within a Python script for more detailed analysis.
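
For example, here is a minimal sketch of installing nvitop into an isolated Python virtual environment on a Power9 or ARM node and running it as a standalone monitor (the environment path is a placeholder; on x86 nodes you can simply run nvitop):

python -m venv ~/nvitop-venv      # Placeholder path for an isolated environment
source ~/nvitop-venv/bin/activate
pip install nvitop
nvitop                            # Interactive, top-like view of GPU processes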

Common errors

Out of memory errors (e.g. CUDA_ERROR_OUT_OF_MEMORY, torch.cuda.OutOfMemoryError) mean that a GPU with more available VRAM may be necessary, or that the code should be modified to reduce its memory usage. For machine learning models that run out of memory, try reducing the batch size or ensuring that your data management is optimized. For other software, check its documentation on controlling GPU memory usage.
