Using SRUN to Submit Jobs¶
Usually, if you have to run a single application multiple times, or if you are trying to run a non-interactive application, you should use sbatch instead of srun, since sbatch allows you to specify parameters in the file, and is non-blocking (see below).
srun is a so-called blocking command: it will not return control of your shell until the command is finished (not necessarily the job itself, just the allocation). For example, if you run srun /bin/hostname and resources are available right away, the job runs immediately and its output is streamed back to your terminal. If resources are not available, the command blocks while your job is pending in the queue.
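To illustrate the difference, a short sketch (the job script name is a placeholder):

```shell
# Blocking: the shell waits until an allocation is granted
# and the command finishes before you get your prompt back.
srun /bin/hostname

# Non-blocking: sbatch queues the script and returns immediately
# with a job ID; output is written to a slurm-<jobid>.out file.
sbatch my_job.sh
```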
Please note that, like sbatch, srun can also run a batch file.
The command syntax is
srun <options> [executable] <args>
The options are where you specify the resources you want for the executable. The following are some of the commonly used options; to see all available options, run man srun.
- `-c <num>`: Number of CPUs (threads) to allocate to the job per task
- `-n <num>`: Number of tasks to allocate (for MPI)
- `-G <num>`: Number of GPUs to allocate to the job
- `--mem <num>[K|M|G|T]`: Memory to allocate to the job (in MB by default)
- `-p <partition>`: Partition to submit the job to
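Putting several of these together, a hedged example (the program and partition names are placeholders; partition names vary by cluster):

```shell
# One task with 4 CPUs, 1 GPU, and 8 GB of memory,
# submitted to a hypothetical "gpu" partition.
srun -n 1 -c 4 -G 1 --mem 8G -p gpu ./my_program
```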
To run an interactive job (in this case a bash prompt), the command might look like this (--pty is the important option):
srun -c 6 -p cpu --pty bash
To run an application on the cluster that uses a GUI, you must use an interactive job and additionally pass the --x11 option to enable X11 forwarding:
srun -c 6 -p cpu --pty --x11 xclock
You cannot run an interactive/GUI job using the sbatch command; you must use srun.
Slurm can send you emails based on the status of your job via the --mail-type option. Common mail types are BEGIN, END, FAIL, INVALID_DEPEND, and REQUEUE; see the sbatch man page for the full list.
srun --mail-type=BEGIN hostname
#!/bin/bash
#SBATCH --mail-type=BEGIN
hostname
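Multiple mail types can be combined in one directive; a sketch, assuming the address is replaced with your own:

```shell
#!/bin/bash
#SBATCH --mail-type=BEGIN,END,FAIL   # notify on start, completion, and failure
#SBATCH --mail-user=you@example.com  # placeholder address
hostname
```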
There is also the --mail-user option, which sets the address the notifications are sent to.
Time Limit Email - Preventing Loss of Work¶
When your job reaches its time limit, it will be killed, even if it's 99% of the way through its task. Without checkpointing, all those CPU hours will be for nothing and you will have to schedule the job all over again.
One way to prevent this is to check on your job's output as it approaches its time limit. You can specify a mail type of TIME_LIMIT_90 to receive an email when your job has used 90% of its time limit (TIME_LIMIT_50 and TIME_LIMIT_80 also exist).
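Slurm defines TIME_LIMIT_50, TIME_LIMIT_80, and TIME_LIMIT_90 mail types, which fire when a job has used that percentage of its time limit. A sketch of a batch script using one of them (the program name and address are placeholders):

```shell
#!/bin/bash
#SBATCH --time=24:00:00              # job will be killed after 24 hours
#SBATCH --mail-type=TIME_LIMIT_90    # email when 90% of the limit has elapsed
#SBATCH --mail-user=you@example.com  # placeholder address
./long_running_task                  # hypothetical program
```

When the warning email arrives, you can inspect the job's output and, if needed, save intermediate results or requeue with a longer limit before the job is killed.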