What is Slurm?

Slurm as a Workload Manager and Scheduler

Slurm is a resource allocator that assigns compute nodes to users for a limited period of time so that they can run computations. Programs from multiple users can be queued and run at the same time; Slurm schedules each of them and runs it on its own set of resources.

  • Users can interact with Slurm from within the cluster, usually through the login nodes.
  • Users request cores, main memory, and time, then submit their programs to Slurm to be queued.
  • Slurm reserves the requested resources (CPUs for traditional HPC with MPI, GPUs for ML), and the job waits for its turn in the queue.
  • After the waiting period, the resources are allocated and the program runs on the compute nodes.
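
The steps above can be sketched as a small Python helper that writes a batch script and, optionally, hands it to Slurm. This is a minimal sketch and not an official CLAIX template: the helper names `build_job_script` and `submit`, the job name, and the resource values are illustrative assumptions. The `#SBATCH` options used (`--job-name`, `--ntasks`, `--mem-per-cpu`, `--time`) are standard `sbatch` options, and `sbatch` reads the script from standard input when no file name is given.

```python
import shutil
import subprocess


def build_job_script(job_name, ntasks, mem_per_cpu_mb, time_limit, command):
    """Return the text of a Slurm batch script requesting the given resources.

    All argument values are examples chosen by the caller, not site defaults.
    """
    lines = [
        "#!/usr/bin/env bash",
        f"#SBATCH --job-name={job_name}",             # name shown in the queue
        f"#SBATCH --ntasks={ntasks}",                 # number of tasks (cores)
        f"#SBATCH --mem-per-cpu={mem_per_cpu_mb}M",   # main memory per core
        f"#SBATCH --time={time_limit}",               # wall-clock limit, hh:mm:ss
        "",
        command,                                      # the program to run
    ]
    return "\n".join(lines) + "\n"


def submit(script_text):
    """Queue the script with sbatch; return its output, or None without Slurm."""
    if shutil.which("sbatch") is None:
        return None  # not on a machine with Slurm installed (e.g. not a login node)
    result = subprocess.run(
        ["sbatch"], input=script_text, text=True,
        capture_output=True, check=True,
    )
    return result.stdout.strip()


# Hypothetical request: 4 tasks, 2000 MB per core, 10 minutes.
script = build_job_script("example", 4, 2000, "00:10:00", "srun hostname")
print(script)
```

Calling `submit(script)` on a login node would place the job in the queue; on a machine without Slurm it simply returns `None`, so the script text can be inspected first.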

The following diagram shows a simplified view of how Slurm manages jobs on the CLAIX Compute Cluster:

In the previous diagram, a user accesses a login node, creates a Slurm job, and sends it to the Slurm queue. The job waits there until the resources are allocated, and the computation runs after that.

 

last changed on 02/23/2024

Creative Commons License Agreement
This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License