On this page

  • 0. Using this guide
  • 1. Understanding Your Computational Needs
  • 2. Choosing CPU Resources
  • 3. Choosing GPU Resources
  • 4. Resource Allocation
  • 5. Monitoring and Debugging
  • 6. Storage and Compute Environments
  • 7. Documentation and Support

Quick Start Guide

0. Using this guide

This guide provides a framework for the rest of the documentation. Although not strictly necessary, you may want to work through these preliminaries first:

  • Request your account
  • Connect to Unity
  • Install your software, but also see the notes on software and storage below.
  • Read some of the basics of writing batch job scripts, but finish the rest of this guide before submitting a job.

If you are unfamiliar with using the command line, there are some external resources that can help.

1. Understanding Your Computational Needs

  • Define Objectives:
    • Clearly define your computational needs, whether for large-scale simulations, data analysis, machine learning training, or something else.
    • It is important that you understand whether your code can take advantage of parallelism on multiple CPUs, and/or on a single or multiple GPUs.
    • Only codes that have been explicitly written to run in parallel on multiple CPUs or multiple GPUs can take advantage of these resources. Be sure to check!
  • Scaling: Before launching production analyses with a parallel CPU or GPU code, study how the execution time of your code scales with the allocated resources (number of CPU cores, number of GPUs); a minimal submission sketch follows the figure below. Some advice on doing so is available here.
  • Time to solution: Understand that the time to solution is the sum of the time your job is waiting in the queue plus the time it takes to run it. For a parallel code, allocating more CPU / GPU resources should speed up execution time if it scales well, but, at the same time, this generally increases the queue time, depending on how busy the cluster is. Aim for balance to reduce time to solution! See the figure below for a sketch.

[Figure: graphs showing the relationship between resources requested, queueing time, execution time, and total time ("Time To Science"). Credit: https://researchcomputing.princeton.edu/support/knowledge-base/slurm]
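
The scaling analysis itself can be as simple as timing the same job at several core counts. The sketch below is a minimal, hypothetical example (scaling_job.sh is a placeholder batch script that reads $SLURM_CPUS_PER_TASK); compare the resulting run times afterwards, e.g. with seff.

    # Submit the same placeholder batch script with increasing core counts,
    # then compare run times (e.g. seff <job_id>) to see how the code scales.
    for n in 1 2 4 8 16; do
        sbatch --cpus-per-task=$n --job-name=scale_$n scaling_job.sh
    done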

2. Choosing CPU Resources

  • Serial jobs:
    • Code uses only a single CPU-core.
    • Opt for higher CPU clock speeds (GHz). E.g., use --prefer=x86_64_v4
  • Multi-threaded jobs:
    • Code uses multiple CPU-cores via a shared-memory parallel programming model (OpenMP, pthreads). For instance, the Python library NumPy uses this.
    • Empirically determine the optimal number of cores through a scaling analysis.
  • Multi-node jobs:
    • Code uses distributed-memory parallelism based on MPI (Message Passing Interface) and uses multiple CPU-cores on multiple nodes simultaneously.
    • Only codes explicitly written to run in parallel can utilize multiple cores across nodes.
    • You must empirically determine the optimal number of nodes and cores through a scaling analysis.
    • Your code is likely to benefit from a fast interconnect between nodes: if applicable, run on nodes with InfiniBand by adding --constraint=ib to your batch script.
  • Memory Requirements:
    • Select CPU nodes according to the amount of RAM your job requires; see the list of nodes. You can specify memory explicitly, e.g. --mem=30G. A sketch of a batch script combining these options follows this list.
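
As a sketch only: a single-node, multi-threaded batch script combining the options above. The program name (my_threaded_app) and the resource numbers are placeholders to replace after your own scaling analysis; a multi-node MPI job would instead set --nodes and --ntasks, add --constraint=ib, and launch the program with srun.

    #!/bin/bash
    #SBATCH --job-name=my-cpu-job
    #SBATCH --partition=cpu      # general-access CPU partition (2 days or less)
    #SBATCH --nodes=1            # shared-memory job: a single node
    #SBATCH --ntasks=1
    #SBATCH --cpus-per-task=8    # thread count, chosen from a scaling analysis
    #SBATCH --mem=30G            # memory for the job
    #SBATCH --time=04:00:00      # walltime estimate

    # Most shared-memory libraries (e.g. OpenMP) honor this variable.
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    ./my_threaded_app            # placeholder for your executable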

3. Choosing GPU Resources

  • GPU support:
    • Only codes which have been explicitly written to run on GPUs can take advantage of GPUs.
    • Submitting a CPU-only code to a GPU partition does not speed up execution time, but it does waste GPU resources. It also increases your queue time and lowers the priority of your next job submission, making you and other users wait longer! Please don’t do this!
  • Multi-GPU jobs:
    • Many codes only use a single GPU. Please avoid requesting multiple GPUs unless you are certain that your code can use them efficiently!
    • Even if your code supports multiple GPUs, you must first conduct a scaling analysis to find the optimal number of GPUs to use.
  • GPU type selection:
    • Read through available GPU resources and find a suitable GPU type for your code based on compute capability and required GPU memory (VRAM). Note that not all GPU types are available in every partition.
    • Use unity-slurm-gpu-usage to see the available GPU resources on the cluster.
    • Select less powerful GPUs when they suffice (e.g. M40, A40, V100, L40S)! A sketch of a single-GPU batch request follows this list.
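
A minimal sketch of a single-GPU batch request, assuming a general-access GPU partition; my_gpu_app is a placeholder executable and the constraint value is illustrative, so check the GPU Summary List and Node Features pages for the exact names available to you.

    #!/bin/bash
    #SBATCH --job-name=my-gpu-job
    #SBATCH --partition=gpu      # general-access GPU partition (2 days or less)
    #SBATCH --gpus=1             # request a single GPU only
    #SBATCH --constraint=l40s    # illustrative GPU-type constraint; adjust or omit
    #SBATCH --cpus-per-task=4
    #SBATCH --mem=16G
    #SBATCH --time=08:00:00

    ./my_gpu_app                 # placeholder for your GPU-enabled executable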

4. Resource Allocation

  • Job Scheduling: Unity uses the Slurm job scheduler. Refer to the documentation here to manage and allocate resources efficiently, and clearly specify your resource requirements in your Slurm batch scripts.
  • Resource Requests: Request the minimum necessary resources to avoid wasting compute power and ensure fair access for all users.
  • Selecting an appropriate partition for your job:
    • A list of partitions available on Unity is available here.
    • Use unity-slurm-partition-usage to see how busy partitions are.
    • Short: For jobs which are 2 days or less, specify --partition=cpu (or --partition=gpu if requesting a GPU)
    • Long: For jobs which are more than 2 days, specify -q long as well
    • Preempt: For jobs which require less than 2 hours, specify --partition=cpu-preempt (or --partition=gpu-preempt if requesting a GPU). If your job runs longer than two hours, a higher-priority job may preempt (terminate and requeue) it.
    • QoS: For jobs shorter than 4 hours, you can boost the job priority for one small job by adding the parameter --qos=short to your job batch script or salloc / sbatch command. See this page for details.
    • Note that not all GPU types are available in every partition; modify the suggested partitions as needed. Example command lines follow the note below.
Priority Partitions
For users of priority partitions: where appropriate, include those partitions in the --partition list, for example --partition=gpu,superpod-a100 or --partition=gpu,gpu-preempt,uri-gpu. The scheduler prefers the priority partition over the general-access partitions, and general access over preempt. See the Partition list for access requirements.
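
To make the choices above concrete, here are a few illustrative command lines (job.sh is a placeholder batch script; substitute partitions you actually have access to):

    sbatch --partition=cpu job.sh                       # CPU job, 2 days or less
    sbatch --partition=cpu -q long job.sh               # CPU job longer than 2 days
    sbatch --partition=cpu-preempt job.sh               # CPU job under 2 hours (preemptible after 2 hours)
    sbatch --partition=cpu --qos=short job.sh           # one small job under 4 hours, with a priority boost
    sbatch --partition=gpu,gpu-preempt --gpus=1 job.sh  # GPU job offered to several partitions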

5. Monitoring and Debugging

  • Resource Utilization:
    • Monitor CPU/GPU usage, memory consumption, and job progress using cluster monitoring tools.
    • Use seff <job_id> to get statistics, including:
      • CPU efficiency: actual core time / (# of cores * run time)
      • Memory efficiency: percentage of requested memory used
      • Both efficiencies should ideally be close to 100%.
    • Use Unity tools to monitor resources:
      • E.g. echo gpu001 | unity-slurm-node-usage
    • Access a compute node to monitor memory, CPU and GPU usage for a running job:
      • You can log into a compute node via srun (see this page)
      • Use htop to monitor CPU and memory usage of the compute node; look at your processes, as you may be sharing the node with other users.
      • Use nvidia-smi or nvitop to monitor GPU compute and memory loads (see here)
  • Profile your code:
    • Profiling can identify bottlenecks where your code is inefficient and optimization is possible. Several tools are available for different programming languages; a minimal example follows this list.
    • Python: cProfile or line_profiler
    • Deep learning, CUDA: TensorFlow Profiler, PyTorch Profiler, CUDA profiling tools
    • C/C++/Fortran: gprof, Intel VTune
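
A few illustrative commands tying the monitoring and profiling points together (the job ID, node name, and script name are placeholders):

    seff 1234567                          # CPU and memory efficiency of a job
    echo gpu001 | unity-slurm-node-usage  # usage of a specific node (Unity helper)
    nvidia-smi                            # GPU load, run on the compute node itself
    python -m cProfile -o profile.out my_script.py  # profile a Python script
    python -m pstats profile.out                    # browse the saved profile interactively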

6. Storage and Compute Environments

  • Use appropriate storage:
    • Your /home directory is for configuration files and code installations. It has a set quota of 50G.
    • Write results from your codes to your subdirectory in /work. If you run out of disk space, discuss options with your PI and use /scratch and /project as appropriate. See the storage documentation.
  • Use conda environments:
    • Use conda environments when possible to install packages without requesting an installation from a Unity administrator (first make sure the package you want isn’t already available as a software module: search using module spider <package-name>).
    • The hidden .conda directory in your home directory can grow very large and exceed the quota on your /home directory. You can move it to your subdirectory in your PI’s /work directory and create a symbolic link (with ln -s) back to your home directory (a sketch follows this list), or follow one of the suggestions on this page.
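
A sketch of the move-and-symlink approach, assuming your group’s space is at /work/pi_example (a placeholder path; check the actual directory with your PI and make sure no conda environments are active first):

    du -sh ~/.conda                               # check how large the directory has grown
    mv ~/.conda /work/pi_example/$USER/.conda     # move it to your /work subdirectory (placeholder path)
    ln -s /work/pi_example/$USER/.conda ~/.conda  # leave a symbolic link behind in /home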

7. Documentation and Support

  • Cluster Documentation: Familiarize yourself with the cluster’s documentation for frequently asked questions, specific guidelines and best practices: TOC
  • Support Channels:
    • Join Slack: Unity Community Slack to chat with staff & other users and get quick help.
    • Join Unity office hours: Tue 2:30-4 PM EST/EDT: Zoom
How to Ask for Help
For a faster resolution, follow these guidelines on how to ask for help.