IT Center Help

What is Slurm?

Slurm as a Workload Manager and Scheduler

Slurm is a workload manager and scheduler that allocates compute nodes to users for a period of time so that they can run computations. Many user programs can be queued and run at the same time; Slurm schedules them and runs each one on its own allocated resources.

Users can interact with Slurm from within the cluster, usually through the login nodes.
Users request cores, main memory, and time, then submit their programs to Slurm to be queued.
Slurm reserves the requested resources (CPUs for traditional HPC with MPI, GPUs for ML) while the job waits its turn in the queue.
After a waiting period, the resources are allocated and the program runs on the compute nodes.
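
The request for cores, memory, and time is usually written as `#SBATCH` lines at the top of a batch script. The following Python snippet is a minimal sketch that assembles such a script as text. The helper `build_batch_script`, the job name `example_job`, the program `./my_program`, and all resource values are hypothetical and only illustrate the idea; the options `--job-name`, `--ntasks`, `--mem-per-cpu`, and `--time` are standard `sbatch` options, but sensible values depend on the cluster and the program.

```python
def build_batch_script(job_name, tasks, mem_per_cpu_mb, time_limit, command):
    """Return the text of a Slurm batch script requesting the given resources.

    All arguments are illustrative; adjust them to the actual program.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#SBATCH --job-name={job_name}",           # name shown in the queue
        f"#SBATCH --ntasks={tasks}",                # number of tasks (by default one CPU core each)
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}M", # main memory per allocated CPU, in megabytes
        f"#SBATCH --time={time_limit}",             # run time limit as hours:minutes:seconds
        "",
        command,                                    # the program to run once resources are allocated
    ]
    return "\n".join(lines) + "\n"


# Hypothetical example: 4 tasks, 2000 MB per CPU, 10 minutes of run time.
script = build_batch_script(
    job_name="example_job",
    tasks=4,
    mem_per_cpu_mb=2000,
    time_limit="00:10:00",
    command="srun ./my_program",
)
print(script)
```

Saved to a file, for example `job.sh`, a script like this would typically be submitted from a login node with `sbatch job.sh`. Slurm then places the job in the queue and runs it once the requested resources become available.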

The following diagram shows a simplified view of how Slurm manages Jobs on the CLAIX Compute Cluster:

In the diagram above, a user accesses a login node, creates a Slurm job, submits it to the Slurm queue, and waits for the resources to be allocated; the computations follow after that.

last changed on 02/23/2024

This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License