Slurm Commands
- Job Submission
- Job script skeleton
- Job Cancellation
- Job Monitoring
- Job Efficiency
- Basic Job Parameters
- Submitting with a project
- Submitting a GPU job
- Submitting to specific partition
- BeeOND (BeeGFS On-Demand)
All commands have been anonymized as far as technically possible while keeping the expected Slurm features intact. Users should therefore not be able to see the jobs, user names, or projects of others.
Job Submission
We strongly recommend using a batch script for final computations and submitting it with sbatch <BATCH_SCRIPT> <OPTIONAL_ARGUMENTS>.
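A batch job using a script called "jobscript.sh" that runs your program can be submitted like this (Slurm options such as -n 1 must come before the script name; anything after the script name is passed to the script as an argument):
sbatch -n 1 jobscript.sh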
Job script skeleton
You can use this simple skeleton as a starting point for your job script. The script starts with a zsh shebang (we only support zsh).
#!/usr/bin/zsh
### SBATCH Section
# Slurm arguments need to be at the beginning of the jobscript
#SBATCH -n 1
### Your Program Section
# Your program goes here, in the second part of the jobscript
srun hostname
After submitting your job script, you will obtain a job ID from Slurm. You need this job ID to identify your batch job, to cancel it, and to reference it in support tickets.
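Because the job ID is needed for later commands, it can be convenient to capture it at submission time. A minimal sketch, assuming the --parsable option of sbatch, which prints the job ID, optionally followed by ";<cluster>". The helper name submit_job is hypothetical and not part of Slurm.

```shell
# submit_job: hypothetical helper that submits a job script and prints only the job ID.
# sbatch --parsable prints "<jobid>" or "<jobid>;<cluster>", so a cluster suffix is cut off.
submit_job() {
    sbatch --parsable "$@" | cut -d ';' -f 1
}

# Usage (commented out because it requires Slurm):
# jobid=$(submit_job jobscript.sh)
# echo "Submitted batch job $jobid"
```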
Job Cancellation
To cancel your batch job at any time, use the command scancel <JOBID>, for example:
scancel 12345678
Job Monitoring
Please Note: The commands squeue, sinfo and spart must not be used for high-frequency monitoring of jobs. If you want to run these commands periodically, please execute them at most once every 2 minutes.
The command squeue --me displays your running and pending jobs. You can also see your jobs with:
squeue -u $USER
To show the expected start time (estimate not guaranteed):
squeue --me --start
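If a script has to wait for a job, its polling interval should respect the two-minute limit stated above. A minimal sketch, assuming the squeue options -h (no header) and -j <jobid>, and assuming that squeue prints nothing once the job has left the queue. The names wait_for_job and POLL_INTERVAL are hypothetical.

```shell
# Seconds between two squeue calls; 120 s matches the "at most once every 2 minutes" rule.
POLL_INTERVAL=${POLL_INTERVAL:-120}

# wait_for_job <jobid>: hypothetical helper that blocks until the job is no longer listed.
wait_for_job() {
    # -h suppresses the header, so the output is empty once the job has left the queue.
    while [ -n "$(squeue -h -j "$1" 2>/dev/null)" ]; do
        sleep "$POLL_INTERVAL"
    done
}

# Usage (commented out because it requires Slurm):
# wait_for_job 12345678 && seff 12345678
```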
To see all the available partitions (detailed information here):
spart
QUEU         STA  FREE TOTAL RESORC  OTHER  FREE TOTAL || DEFMEM CORES   NODE
PARTITIO     TUS CORES CORES PENDNG PENDNG NODES NODES || GB/CPU /NODE MEM-GB
c18m              2737 59520   1104   2574    20  1240 ||      3    48    187
c18m_low          2737 59520    336     96    20  1240 ||      3    48    187
c18m_verylow      2737 59520      0      0    20  1240 ||      3    48    187
c18g              1330  2592     40   8641     8    54 ||      3    48    187
c18g_low          1330  2592      0      0     8    54 ||      3    48    187
c18g_verylow      1330  2592      0      0     8    54 ||      3    48    187
dgx2                54    96      0      0     0     2 ||     24    48   1508
dgx2_low            54    96      0      0     0     2 ||     24    48   1508
dgx2_verylow        54    96      0      0     0     2 ||     24    48   1508
ih                2898  3360      0      0    66    83 ||      1    24     61

               YOUR PEND PEND YOUR  DEFAULT  MAXIMUM
                RUN  RES OTHR TOTL JOB-TIME JOB-TIME
COMMON VALUES:    0    0    0    0  15 mins  30 days
Job Efficiency
After a job has finished, you can get a job efficiency report, which provides information about CPU efficiency and memory efficiency. The command is:
seff <JOBID>
Its output looks like this:
Job ID: 12345678
Cluster: rcc
User/Group: ab123456/ab123456
State: CANCELLED (exit code 0)
Nodes: 1
Cores per node: 48
CPU Utilized: 10:04:49
CPU Efficiency: 34.81% of 1-04:57:36 core-walltime
Job Wall-clock time: 00:36:12
Memory Utilized: 164.03 GB
Memory Efficiency: 89.73% of 182.81 GB
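The CPU efficiency in this report can be reproduced by hand: the figures are consistent with dividing the CPU time used by the core-walltime, i.e. the wall-clock time multiplied by the number of allocated cores. A small sketch of that arithmetic with the values from the report above. The helper names to_seconds and cpu_efficiency are hypothetical and not part of Slurm.

```shell
# to_seconds <time>: convert a Slurm time string ([d-]hh:mm:ss) to seconds.
to_seconds() {
    echo "$1" | awk -F'[-:]' '{
        if (NF == 4) print (($1 * 24 + $2) * 60 + $3) * 60 + $4   # d-hh:mm:ss
        else         print ($1 * 60 + $2) * 60 + $3               # hh:mm:ss
    }'
}

# cpu_efficiency <cpu_time_used> <wall_clock_time> <cores>
# Prints 100 * CPU time used / (wall-clock time * cores) with two decimals.
cpu_efficiency() {
    used=$(to_seconds "$1")
    wall=$(to_seconds "$2")
    awk -v u="$used" -v w="$wall" -v c="$3" 'BEGIN { printf "%.2f\n", 100 * u / (w * c) }'
}

# Values from the report: CPU Utilized, Job Wall-clock time, Cores per node (1 node).
cpu_efficiency 10:04:49 00:36:12 48   # prints 34.81
```

Here 00:36:12 on 48 cores gives 104256 core-seconds, which is the 1-04:57:36 core-walltime shown in the report.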
Basic Job Parameters
Short and long parameters:
Many parameters have a short and a long form. Consider the following example:
-c <numcpus>
is the short form
--cpus-per-task=<numcpus>
is the long form
Please note that the short form expects a blank after the parameter, while the long form expects a '='.
Both forms can be used as options on the sbatch command line or as #SBATCH directives in job scripts.
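As an illustration, the two forms can be written as #SBATCH directives like this. This is only a sketch with a placeholder value of 4 CPUs; it relies on the SLURM_CPUS_PER_TASK environment variable, which Slurm sets inside a job when CPUs per task are requested.

```shell
#!/usr/bin/zsh
# Both directives request 4 CPUs per task; in a real job script one of them is enough.
#SBATCH -c 4
#SBATCH --cpus-per-task=4

# Inside a job, Slurm exports the request as SLURM_CPUS_PER_TASK; elsewhere it is unset.
cpus=${SLURM_CPUS_PER_TASK:-unset}
echo "CPUs per task: $cpus"
```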
Parameters first
The Slurm parser stops parsing #SBATCH directives as soon as it hits a line with a 'normal' command, i.e. a line that is neither empty nor starts with #. All following parameters will be completely ignored.
Example:
#!/usr/bin/zsh
ls
#SBATCH -J "my jobname"
The job name will not be set in this case, because the #SBATCH directive follows the ls command. Should you experience batch arguments being ignored by Slurm, please also double-check for spelling mistakes in both your parameters and arguments. Expressions like --accnt=rwth0000 or --output=namewith blanks.%J cause the Slurm parser to terminate and ignore subsequent #SBATCH directives. Be advised that indentation of #SBATCH directives is not supported.
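A quick way to spot misplaced directives before submitting is to scan the job script yourself. A minimal sketch that follows the rules described above; it does not call Slurm and does not detect misspelled options. The function name check_sbatch is hypothetical, and treating indented comment lines like ordinary comments is an assumption.

```shell
# check_sbatch <jobscript>: hypothetical helper that lists #SBATCH lines Slurm would ignore.
check_sbatch() {
    awk '
        # Indented directives are not supported.
        /^[ \t]+#SBATCH/ { printf "line %d: indented directive: %s\n", NR, $0; next }
        # Directives after the first command are ignored.
        /^#SBATCH/ && seen_cmd { printf "line %d: directive after first command: %s\n", NR, $0; next }
        # Empty lines and comment lines do not stop the parser.
        /^[ \t]*(#|$)/ { next }
        # Anything else is a "normal" command: parsing stops here.
        { seen_cmd = 1 }
    ' "$1"
}
```

Applied to the three-line example above, it reports the #SBATCH -J line as a directive after the first command; for a script with all directives at the top it prints nothing.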
Slots
-c, --cpus-per-task=<numcpus>
for OpenMP/Hybrid
-n, --ntasks=<numtasks>
for Processes/MPI
--ntasks-per-node=<numtasks>
-N, --nodes=<numnodes>
Job Name
Please avoid special characters; only alphanumeric characters are recommended! You can use special Slurm flags like %J for the job ID.
-J, --job-name=<jobname>
Memory
--mem-per-cpu=<size>
Memory needed per allocated CPU. Note that the number of allocated CPUs can be higher than the number of requested tasks (hybrid jobs!).
# PLEASE DO NOT USE THE FOLLOWING OPTION:
--mem=<size>
Memory needed per NODE
Output Files
-o, --output=<filename>
Do not use ~ or variables like $HOME in the path to output files! Use explicit paths such as /home/ab123456.
-e, --error=<filename>
We do not recommend using this option; analyzing problems is easier if STDERR is merged into STDOUT.
Wall Clock Limit
-t, --time=d-hh:mm:ss
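Put together, the basic parameters above might appear in a job script as follows. This is only a sketch: all values (job name, task and CPU counts, memory, output path, time limit) are placeholders, and setting OMP_NUM_THREADS from SLURM_CPUS_PER_TASK is a common convention for OpenMP/hybrid jobs rather than something this page prescribes.

```shell
#!/usr/bin/zsh
### SBATCH Section (all values are placeholders; adjust them to your job)
#SBATCH --job-name=myjob
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=2
#SBATCH --mem-per-cpu=2G
#SBATCH --output=/home/ab123456/myjob.%J.log
#SBATCH --time=0-01:30:00

### Your Program Section
# Use the requested CPUs per task as the OpenMP thread count; fall back to 1 outside a job.
export OMP_NUM_THREADS=${SLURM_CPUS_PER_TASK:-1}
echo "Job ${SLURM_JOB_ID:-(not inside a job)} uses $OMP_NUM_THREADS thread(s) per task"
```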
Submitting with a project
-A, --account=<projectname>
Submit your job for project <projectname>. This additionally chooses the default partition for you.
Submitting a GPU job
# request one gpu per node
--gres=gpu:<type>:1
# request two gpus per node
--gres=gpu:<type>:2
# request two volta gpus (CLAIX18)
--gres=gpu:volta:2
The right partition will be chosen for you; you do not need to request a partition.
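In a job script, such a request could look like the following sketch. It assumes that the cluster exposes the allocated GPUs through the CUDA_VISIBLE_DEVICES environment variable, which is a common Slurm setup but not confirmed on this page.

```shell
#!/usr/bin/zsh
# Request two volta GPUs per node; no partition is requested explicitly.
#SBATCH --gres=gpu:volta:2
#SBATCH --ntasks=1

# Assumption: the allocated GPUs are listed in CUDA_VISIBLE_DEVICES inside the job.
gpus=${CUDA_VISIBLE_DEVICES:-none}
echo "Visible GPUs: $gpus"
```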
Submitting to specific partition
In general, it is neither required nor recommended to submit a job to a specific partition, since the selection is driven by the project and/or the specified job requirements. However, in some cases (e.g., performance analysis of specific hardware) it might be relevant.
# select a partition
-p <partition>
A list of partitions can be found here, or you can use the commands sinfo or r_wlm_usage -p <project> -q.
BeeOND (BeeGFS On-Demand)
All our compute nodes have local SSDs integrated. In contrast to network file systems like $HOME, $WORK or the parallel file system $HPCWORK, these SSDs are local devices. Thus, a job making use of them might benefit from these local devices in terms of performance. In particular, performance might be much better than on $HPCWORK for jobs using many small files.
In order to make use of these local SSDs in multi-node jobs, you can use BeeGFS On-Demand (BeeOND) by adding the following line to your batch script:
#SBATCH --beeond
Please Note: The job will become exclusive. This means that all cores of every allocated node will be allocated to this job, which increases your consumption of core hours.
This will set up a shared, temporary (!) BeeGFS file system across all nodes allocated for your job, which means the data can be accessed by each process (e.g., MPI rank) from any node involved. You can access the file system using the path stored in the environment variable $BEEOND.
Please Note: This is a temporary file system which only lives as long as your batch job is running. All data will be deleted in the epilog of the batch job. Thus, you have to copy all relevant data back to $HOME, $WORK or $HPCWORK within your batch job.
The striping pattern of the BeeOND file system can be adjusted with beegfs-ctl, for example:
beegfs-ctl --setpattern --numtargets=16 --chunksize=1m
The value of numtargets must not be higher than the number of compute nodes involved.
A typical workflow for your job might look as follows:
- Request a BeeOND file system by adding #SBATCH --beeond to your batch script.
- Copy the required raw data to $BEEOND.
- Change to the corresponding directory (e.g., cd $BEEOND/yourdata).
- Start the preprocessing of your job (e.g., the domain decomposition).
- Start your application.
- Copy back all relevant (and only the relevant!) data/results. This is very important, because you cannot access the data after the job has ended or from any other node (e.g., a login node).
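The workflow above can be sketched as a small helper for a job script. This is an illustration under stated assumptions, not a prescribed implementation: the function name run_on_beeond is hypothetical, the node count is a placeholder, and the sketch assumes that the command writes everything worth keeping into a "results" subdirectory of its working directory.

```shell
#!/usr/bin/zsh
#SBATCH --nodes=2
#SBATCH --beeond

# run_on_beeond <input_dir> <result_dir> <command...>
# Hypothetical helper: stage data in, run the command inside BeeOND, copy results back.
# Assumption: the command writes everything worth keeping to a "results" subdirectory.
run_on_beeond() {
    input_dir=$1
    result_dir=$2
    shift 2
    workdir="$BEEOND/$(basename "$input_dir")"

    cp -r "$input_dir" "$BEEOND/" || return 1      # copy the raw data to $BEEOND
    ( cd "$workdir" && "$@" ) || return 1          # run preprocessing/application there
    mkdir -p "$result_dir" || return 1
    cp -r "$workdir/results/." "$result_dir/"      # copy back only the results
}
```

In a job script this could be called as run_on_beeond "$WORK/yourdata" "$WORK/results" srun ./your_app, where the paths and the program name are placeholders.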
Further Information