Some SLURM Commands

Job Submission

To run your programs, you need to create a batch script and submit it to SLURM from the command line with sbatch. This is called a batch job. We strongly recommend that you do not define your batch job on the command line or use sacct, but use a batch script together with sbatch instead. Without a batch script to debug, support is much more difficult and time-consuming, so issues take longer to solve.

A batch job using a script called "" that runs your program can be submitted like this:

$ sbatch <

Job script skeleton

You can use this simple skeleton as a start for your jobscript. The script starts with a zsh shebang.

### SBATCH Section
#directives need to be in the beginning of the jobscript
### Your Program Section
#goes here, the second part of the jobscript


Please note that SBATCH commands should not be mixed with your own code or programs! 

Please note that the aforementioned shebang (the very first line of the jobscript) is the only supported shebang. If you have a problem with your script that cannot be reproduced with zsh, do not expect a thorough analysis of the problem.

After submitting your job script, you will obtain a job id from Slurm. This job id is important to identify your batch job, cancel it, and for support tickets.
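
For scripted workflows, the job id can be captured at submission time. The following is a minimal sketch, assuming the usual sbatch confirmation message "Submitted batch job <jobid>". The helper name extract_jobid and the file name jobscript.sh are hypothetical.

```shell
# Hypothetical helper: pull the numeric job id out of sbatch's
# confirmation message ("Submitted batch job <jobid>").
extract_jobid() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Only submit when sbatch and the (hypothetical) jobscript are available.
if command -v sbatch >/dev/null 2>&1 && [ -f jobscript.sh ]; then
    jobid=$(extract_jobid "$(sbatch jobscript.sh)")
    echo "Submitted job $jobid"
fi
```

Alternatively, sbatch offers the --parsable option, which prints only the job id (followed by the cluster name, if one is set), so no text parsing is needed.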

Job Cancellation

To cancel your batch job at any time, you can use the command scancel:

$ scancel <jobid>

Job Monitoring

Please note that the commands 'squeue', 'sinfo' and 'spart' must not be used for high-frequency monitoring of the batch system state. If you want to run these commands periodically, please execute them *at most* once every 10 seconds.
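
A periodic check can enforce the 10-second minimum itself. The following is a minimal sketch; the helper names safe_interval and wait_for_job are hypothetical, and the default of 60 seconds is an arbitrary choice. It relies on squeue's -h (no header) and -j (job id) options.

```shell
# Hypothetical helper: never poll more often than once every 10 seconds.
safe_interval() {
    requested=$1
    if [ "$requested" -lt 10 ]; then
        echo 10
    else
        echo "$requested"
    fi
}

# Hypothetical helper: block until the job no longer appears in squeue.
# Usage: wait_for_job <jobid> [interval_in_seconds]
wait_for_job() {
    jobid=$1
    interval=$(safe_interval "${2:-60}")
    while [ -n "$(squeue -h -j "$jobid" 2>/dev/null)" ]; do
        sleep "$interval"
    done
}
```

Calling wait_for_job <jobid> 2 would still poll only every 10 seconds, because the requested interval is raised to the minimum.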

The command squeue displays your running and pending jobs. You can see a list of your own queued and running jobs like this:

$ squeue -u $USER

Please use

$ squeue -u $USER --start

to show the expected start time and the resources likely to be allocated to your pending jobs. This start time is only an estimate; it is not guaranteed and may change due to higher-priority jobs or job backfilling.

The command spart shows user-specific partition information, including the core counts of available nodes and pending jobs. Please find detailed information here.

$ spart
     c18m   5359  59520    242    873      1   1240 ||   3   48   187
 c18m_low   5359  59520   2432      0      1   1240 ||   3   48   187
     c18g    738   2592   1416   7592      2     54 ||   3   48   187
 c18g_low    738   2592      0      0      2     54 ||   3   48   187
     c16m   9240  14592      0   1200    385    608 ||   5   24   124
 c16m_low   9240  14592      0      0    385    608 ||   5   24   124
     c16s   1152   1152      0      0      8      8 ||   7  144  1020
 c16s_low   1152   1152      0      0      8      8 ||   7  144  1020
     c16g    160    216      0      0      6      9 ||   5   24   124
 c16g_low    160    216      0      0      6      9 ||   5   24   124
      ind   1952   1952      0      0     44     44 ||   1   40    61
       ih  10414  14604   5313   1237    207    414 ||   1    1     0

                   YOUR  PEND  PEND  YOUR    DEFAULT    MAXIMUM
                    RUN   RES  OTHR  TOTL   JOB-TIME   JOB-TIME
   COMMON VALUES:     0     0     0     0    15 mins    30 days

Job Accounting

Batch jobs consume 'core-hours' from the resources defined in your batch script. The used core-hours are taken from your quota.

You should use r_wlm_usage to show your consumed quota.

The following is an example output for a fake project rwth1234 with "r_wlm_usage -p rwth1234 -q"

Account:                           rwth1234
Type:                              rwth-m
Start of Accounting Period:        01.05.2018
End of Accounting Period:          01.05.2019
State of project:                  active
Quota monthly (core-h):            200000
Total quota (core-h):              2.400 Mio
Remaining core-h of prev. month:   0
Consumed core-h current month:     0
Consumable core-h (%):             200
Consumable core-h:                 200000
Default partition:                 c18m
Allowed partitions:                c18m,c18g
Max. allowed wallclocktime:        1.0 days

Please also see SLURM Accounting on how jobs are accounted with SLURM.

(r_wlm_usage  is the SLURM version of r_batch_usage; to see old LSF usage, please use r_batch_usage)
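
To get a feeling for how quickly quota is consumed, a rough estimate can be computed before submitting. The sketch below assumes that a job is charged its allocated cores multiplied by its wall-clock hours; this is an assumption for illustration only, and the actual accounting rules may differ. The helper name estimate_core_hours is hypothetical.

```shell
# Assumption: charged core-hours = allocated cores x wall-clock hours.
# The real accounting may also consider memory or GPUs.
estimate_core_hours() {
    cores=$1
    hours=$2
    echo $(( cores * hours ))
}

# Example: 48 cores for 24 hours.
estimate_core_hours 48 24

# Cross-check of the example output above: 12 months x 200000 core-h
# per month gives the total quota of 2.400 Mio core-h.
echo $(( 12 * 200000 ))
```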

Partition State

Please note that the commands 'squeue', 'sinfo' and 'spart' must not be used for high-frequency monitoring of the batch system state. If you want to run these commands periodically, please execute them *at most* once every 10 seconds.

The command sinfo can give you information about the partitions:

$ sinfo -s

This shows you the available partitions, e.g.

c18m    up    5-00:00:00    289/398/345/1032    ncm[0001-1032]
c18g    up    5-00:00:00    3/45/0/48           ncg[01-48]

You see the partition name, the state of the partition, the maximum wallclock time for a job in the partition, the node states and the node list. The node states are (in contrast to the manpage) (A)llocated/(I)dle/(O)ther/(T)otal. So, if you are looking for free hosts, look at the second value of the node states.
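
The idle count can also be extracted in a script. A minimal sketch, using the node-state values from the example output; the helper name idle_nodes is hypothetical.

```shell
# Hypothetical helper: return the idle-node count, i.e. the second value
# of sinfo's allocated/idle/other/total field.
idle_nodes() {
    printf '%s\n' "$1" | cut -d/ -f2
}

idle_nodes "289/398/345/1032"   # c18m
idle_nodes "3/45/0/48"          # c18g
```

For the two example partitions this prints 398 and 45.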


Basic Job Parameters

short and long parameters

Each option can be specified using either a short or a long notation.
 Do not use both in one jobscript as this may lead to faulty behaviour!
Please consider the following example for illustration:

  • -c <numcpus> is the short form
  • --cpus-per-task=<numcpus> is the long form

Please note that the short form expects a blank after the parameter, while the long form expects a '='.

parameters first

The SLURM parser stops parsing the #SBATCH magic cookies if it hits a line with a 'normal' command, i.e. a line that doesn't start with these exact characters.
All following parameters will be completely ignored.


example script
#SBATCH -J "my jobname"
the jobname will NOT be set in this case.
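
Misplaced directives can be spotted before submission. The following is a rough sketch, not a reimplementation of the SLURM parser: it treats the first line that is neither blank nor a comment as the first 'normal' command and reports every #SBATCH line after it. The helper name find_ignored_directives, the demo script contents and the shebang path /usr/bin/zsh are assumptions.

```shell
# Hypothetical helper: print every #SBATCH line that appears after the
# first ordinary command, i.e. the directives that would be ignored.
find_ignored_directives() {
    awk '
        NR == 1 && /^#!/ { next }              # shebang line
        /^[[:space:]]*$/ { next }              # blank lines
        /^#SBATCH/ { if (seen_cmd) print NR ": " $0; next }
        /^#/ { next }                          # ordinary comments
        { seen_cmd = 1 }                       # first real command
    ' "$1"
}

# Demonstration with a throwaway script (contents are illustrative).
demo=$(mktemp)
cat > "$demo" <<'EOF'
#!/usr/bin/zsh
#SBATCH --time=0-00:10:00
echo "starting"
#SBATCH -J "my jobname"
EOF
find_ignored_directives "$demo"
rm -f "$demo"
```

For the demo script, only the job name directive on line 4 is reported, because it follows the echo command.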

Should you experience batch arguments being ignored by SLURM, please also double-check for spelling mistakes in both your parameters and arguments.
Coming across expressions like "--accnt=rwth0000" or "--output=namewith blanks.%J" will cause the SLURM parser to terminate and ignore subsequent #SBATCH statements.
Be advised that indentation of #SBATCH parameters is not supported (see above).
  • "Slots"

    -c, --cpus-per-task=<numcpus>     for OpenMP/Hybrid
    -n, --ntasks=<numtasks>           for Processes/MPI
    --ntasks-per-node=<numtasks>      (LSF equivalent: span[ptile=<numtasks>])
    -N, --nodes=<numnodes>            (LSF equivalent: span[hosts=<numnodes>])

  • Job Name

    -J, --job-name=<jobname>

  • Memory

    --mem-per-cpu=<size>    # memory needed per allocated CPU, which can be more than the ordered tasks (hybrid jobs!)
    # PLEASE DO NOT USE THE FOLLOWING OPTION:
    # --mem=<size>          memory needed per NODE

  • Output Files

    -o, --output=<filename>
    # -e, --error=<filename>    # we do not recommend using this; analyzing problems is easier if STDERR is merged into STDOUT

  • Wall Clock Limit

    -t, --time=d-hh:mm:ss
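
Put together, the basic parameters could look like the following jobscript. This is a sketch only: the shebang path /usr/bin/zsh, the job name, the file name and all resource values are placeholders chosen for illustration. SLURM_CPUS_PER_TASK is an environment variable that Slurm sets inside a job when --cpus-per-task is given.

```shell
#!/usr/bin/zsh
# Assumption: zsh is installed at /usr/bin/zsh; all values are placeholders.

### SBATCH section: directives first, before any command
#SBATCH --job-name=example_job
#SBATCH --output=output.%J.txt
#SBATCH --time=0-00:15:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem-per-cpu=1G

### Program section
# Use the allocated CPUs for OpenMP; fall back to 1 outside of a job.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
echo "Running with $OMP_NUM_THREADS thread(s)"
```

Only the long notation is used here, in line with the advice not to mix short and long forms in one jobscript.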


Advanced job parameters

Submitting with a project

-A, --account=<projectname>

Submit your job for project <projectname>. This additionally chooses the default partition for you.

Submitting a GPU job

# request one gpu per node
# request two gpus per node
# request two volta gpus (CLAIX18)
# request two pascal gpus (CLAIX16)
# please note that a batch job ordering one GPU must be non-exclusive in order not to block the remaining GPU of the node

The right partition will be chosen for you; you do not need to request a partition.
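
The request lines for the cases listed above are not shown on this page. In standard Slurm, GPUs are requested through the generic-resource option --gres=gpu[:<type>]:<count>. The following fragment is a sketch under that assumption; in particular, the type names volta and pascal are guesses derived from the comments above and may not match the cluster configuration. The alternatives are commented out with a second '#', so only one directive is active.

```shell
# Pick exactly one of the following directives for a jobscript.

# one GPU per node, any type
#SBATCH --gres=gpu:1

# two GPUs per node, any type
##SBATCH --gres=gpu:2

# two GPUs of type "volta" (type name is an assumption)
##SBATCH --gres=gpu:volta:2

# two GPUs of type "pascal" (type name is an assumption)
##SBATCH --gres=gpu:pascal:2
```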

Submitting to specific partition

In general, it is neither required nor recommended to submit a job to a specific partition, since the selection is driven by the project and/or the specified job requirements. However, in some cases (e.g., performance analysis of specific hardware) it might be relevant.

# select a partition
-p <partition>

A list of partitions can be found here or you can use the 

sinfo or r_wlm_usage -p <proj> -q

command as documented above.

BeeOND (BeeGFS On-Demand)

All our compute nodes have local SSDs integrated. In contrast to network file systems like $HOME, $WORK or the parallel file system $HPCWORK, these SSDs are local devices. Thus, a job making use of them might benefit from these local devices in terms of performance. Furthermore, for jobs using many small files, the performance might be much better than on $HPCWORK.

In order to make use of these local SSDs in multi-node jobs, you can use BeeGFS On-Demand (BeeOND) by adding the following line to your submission script

#SBATCH --beeond

This will set up a shared, temporary (!) BeeGFS file system across all nodes allocated for your job, which means the data can be accessed by each process (e.g., MPI rank) from any node involved. You can access the file system using the path stored in the environment variable $BEEOND. Please note: It is a temporary file system which only lives as long as your batch job is running. All data will be deleted in the epilog of the batch job. Thus, you have to copy back all relevant data to $HOME, $WORK or $HPCWORK in your batch job.

Since BeeGFS is a parallel file system, you can influence the striping of a file, where a stripe of 1 means the file is kept local to the current node and a stripe of n means the file is distributed across n nodes (i.e., n SSDs). The distribution is done with a specified chunk size. For instance, you can change the striping to 16 and the chunk size to 1 MB by using the following command:

$ beegfs-ctl --setpattern --numtargets=16 --chunksize=1m

The numtargets value must not be higher than the number of compute nodes involved.
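
This limit can be applied automatically inside a jobscript. A minimal sketch, assuming the node count is taken from SLURM_JOB_NUM_NODES (set by Slurm inside a job); the helper name clamp_numtargets is hypothetical.

```shell
# Hypothetical helper: cap the requested stripe count at the number of
# nodes in the job. The node count defaults to SLURM_JOB_NUM_NODES,
# or to 1 when run outside of a job.
clamp_numtargets() {
    requested=$1
    nodes=${2:-${SLURM_JOB_NUM_NODES:-1}}
    if [ "$requested" -gt "$nodes" ]; then
        echo "$nodes"
    else
        echo "$requested"
    fi
}

# Example: asking for 16 targets in a 4-node job yields 4.
clamp_numtargets 16 4
```

The result can then be passed on as the value of --numtargets.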

A typical workflow for your job might look as follows:

    1. Request a BeeOND file system by adding "#SBATCH --beeond" to your batch script.
    2. Copy the required raw data to $BEEOND.
    3. Change the directory to the corresponding directory (e.g., $BEEOND/yourdata).
    4. Start the preprocessing of your job (e.g., the domain decomposition).
    5. Start your application.
    6. Copy back all relevant (and only the relevant!) data/results. This is very important, because you cannot access the data after the job or from any other node (e.g. a dialog system).
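
The workflow above can be sketched as the body of a jobscript. All names here are hypothetical: INPUT_DIR and RESULTS_DIR stand for directories on $HOME, $WORK or $HPCWORK, and the echo command is a placeholder for preprocessing and the real application. Outside of a BeeOND job, $BEEOND is unset, so the sketch falls back to temporary directories and can be tried anywhere.

```shell
# Step 1: request the BeeOND file system.
#SBATCH --beeond

# Fallbacks to temporary directories are for trying the sketch outside
# of a job only; INPUT_DIR and RESULTS_DIR are hypothetical variables.
scratch="${BEEOND:-$(mktemp -d)}"
input_dir="${INPUT_DIR:-$(mktemp -d)}"
results_dir="${RESULTS_DIR:-$(mktemp -d)}"

# Step 2: copy the required raw data to the BeeOND file system.
mkdir -p "$scratch/yourdata"
cp -r "$input_dir"/. "$scratch/yourdata/"

# Step 3: change to the corresponding directory.
cd "$scratch/yourdata" || exit 1

# Steps 4 and 5: preprocessing and the application would run here;
# a placeholder command stands in for both.
echo "result" > result.txt

# Step 6: copy back only the relevant results before the job ends.
cp result.txt "$results_dir/"
```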

last changed on 11/17/2022

This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License