Job Management
This guide provides a brief reference to commonly used Slurm commands for job management. For a complete list of commands and their usage, please refer to the Slurm reference and beginner's guide. You can also use the `--help` option or check the man pages (`man <COMMAND>`) for more information.
Please note: The commands `squeue`, `sinfo`, and `spart` should not be used for high-frequency job monitoring. If you need to run these commands periodically, execute them at most once every 2 minutes.
Table of Contents
- sbatch - Submit Jobs
- squeue - Display Job Queue
- scancel - Cancel Jobs
- salloc - Request an interactive Job
- sacct - Display Job Accounting and more
- sinfo - Display Information about the Cluster
- seff - Check Job Performance
sbatch <BATCH_SCRIPT> [ADDITIONAL_ARGUMENTS]
Submits a batch script to the Slurm queue and creates a job with an associated job ID. The batch script should specify the job parameters, such as the resource requirements, and the commands to be executed on the compute nodes. The job parameters can also be supplied as command-line arguments. For guidance on writing batch scripts, see this short tutorial.
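As a minimal sketch, a batch script could look like the following. The job name, file names, and resource values are illustrative placeholders; all `#SBATCH` directives are standard Slurm options:

```shell
#!/usr/bin/env bash
#SBATCH --job-name=example          # job name shown in the queue (placeholder)
#SBATCH --output=example_%j.log     # stdout/stderr file; %j expands to the job ID
#SBATCH --ntasks=1                  # number of tasks
#SBATCH --cpus-per-task=4           # cores per task
#SBATCH --time=00:30:00             # walltime limit
#SBATCH --mem-per-cpu=2G            # memory per core

# Commands executed on the compute node:
echo "Running on $(hostname)"
```

Saved as, say, example.sh, the script would be submitted with `sbatch example.sh`.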
Display Job Queue
Displays pending and running jobs in the Slurm queue. Completed or canceled jobs are removed from this list.
squeue --me
Shows all of your currently pending and running jobs.
squeue --me --start
Shows the estimated start time of pending jobs (the estimate is not guaranteed).
squeue --Format=state -p c23g -h | sort | uniq -c
This shows the number of running and pending jobs on the Claix-2023 GPU partition.
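The output can be filtered further with standard squeue options; for example:

```shell
# Show only your pending jobs together with the reason they are waiting
squeue --me --state=PENDING -o "%.12i %.20j %.10M %.20R"
```

In the format string, %i is the job ID, %j the job name, %M the elapsed time, and %R the reason a job is waiting (or its node list).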
Cancel Jobs
Cancels one or more specified jobs. You can provide a comma- or space-separated list of job IDs. Note that, by default, this command does not confirm the cancellation.
scancel --me
Cancels all of your jobs.
scancel -v <JOB_ID>
Cancels the job with the corresponding ID and provides details of the process.
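scancel also accepts filters, so jobs can be cancelled in bulk; for example (the job name below is a placeholder):

```shell
# Cancel all of your jobs that are still pending
scancel --me --state=PENDING

# Cancel all of your jobs with a given job name
scancel --me --name=example
```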
Request an interactive Job
Requests a computing node for interactive use. Job parameters are specified in the same way as for sbatch. Once the job starts running, you are redirected to the head node of the allocation with an interactive shell. Exiting the shell terminates the job.
salloc -p c23g --gres=gpu:1 -n 24 -t 1:00:00
This will reserve a GPU and 24 cores for one hour on a node in Claix-2023.
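Inside the interactive shell, commands run on the head node of the allocation; parallel job steps are launched with srun. A brief example session could look like this:

```shell
# Within the interactive shell of the allocation:
srun hostname    # starts a job step; prints the allocated host name(s)
exit             # leave the shell and release the allocation
```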
Display Job Accounting and more
Prints details about your pending, running and past jobs.
sacct -S $(date -I --date="yesterday")
Shows all jobs that have been submitted since yesterday.
sacct -o JobName%15,JobID,AllocTres%70
Shows billing and allocated resources.
sacct -o JobName%15,JobID,WorkDir%70
Shows the working directories.
sacct -o JobName%15,JobID,Start,End,Elapsed
Lists the start time, end time, and elapsed time.
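The time window and format options can be combined; for example, to review the jobs of the past week:

```shell
# Jobs submitted within the last 7 days, with state and allocated resources
sacct -S $(date -I --date="7 days ago") -o JobID,JobName%15,State,Elapsed,AllocTres%70
```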
Display Information about the Cluster
sinfo [OPTIONS]
By default, lists the partitions, node states, and corresponding host names of nodes.
sinfo -O Partition,NodeAIOT,CPUsState:30
Shows the current load of the cluster. Note that free nodes or cores may not be immediately available, as they could be reserved for larger jobs. The output shows the number of (partially) allocated (A) and idle (I) nodes, as well as those in other states (O), such as down. In addition, the total number of nodes (T) is displayed.
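The same overview can be restricted to a single partition, for example the Claix-2023 GPU partition used above:

```shell
# Node and CPU states for the c23g partition only
sinfo -p c23g -O Partition,NodeAIOT,CPUsState:30
```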
Provides information about the load on the cluster and the length of the Slurm queue.
Check Job Performance
Provides an efficiency report for completed jobs, including data on core and memory usage. For more detailed monitoring, consider using additional job monitoring tools.