What is Slurm?

Slurm as a Workload Manager and Scheduler

Slurm is a workload manager and scheduler: it allocates compute nodes to users for a limited period of time so that they can run computations. Programs from many users can be queued and run at the same time; Slurm schedules them and runs each one on the resources allocated to it.

  • Users can interact with Slurm from within the cluster, usually through the login nodes.
  • Users request cores, main memory, and time, then submit their programs to Slurm to be queued.
  • The job waits its turn in the queue until the requested resources (typically CPUs for traditional HPC with MPI, GPUs for machine learning) become available.
  • After the waiting period, Slurm allocates the resources and runs the program on the compute nodes.
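
The resource request described in the steps above is commonly written as a batch script whose `#SBATCH` lines name the cores, memory, and time limit. The following is a minimal sketch in Python that renders such a script. The class `JobRequest`, the function `render_batch_script`, the job name, the resource values, and the executable `./my_program` are all hypothetical placeholders; only the `sbatch` options `--job-name`, `--ntasks`, `--mem-per-cpu`, and `--time` and the `srun` launcher are standard Slurm. Cluster-specific settings such as accounts or partitions are not shown and may be required.

```python
from dataclasses import dataclass


@dataclass
class JobRequest:
    """Resources a user asks Slurm for (all values here are placeholders)."""
    name: str
    ntasks: int          # number of tasks, e.g. MPI ranks
    mem_per_cpu_mb: int  # main memory per core, in megabytes
    time_limit: str      # wall-clock limit in HH:MM:SS
    command: str         # program to run on the compute nodes


def render_batch_script(job: JobRequest) -> str:
    """Render a batch script whose #SBATCH lines carry the resource request."""
    lines = [
        "#!/bin/bash",
        f"#SBATCH --job-name={job.name}",
        f"#SBATCH --ntasks={job.ntasks}",
        f"#SBATCH --mem-per-cpu={job.mem_per_cpu_mb}M",
        f"#SBATCH --time={job.time_limit}",
        "",
        job.command,
    ]
    return "\n".join(lines) + "\n"


# Hypothetical request: 4 tasks, 2000 MB per core, 30 minutes of run time.
example = JobRequest(
    name="example_job",
    ntasks=4,
    mem_per_cpu_mb=2000,
    time_limit="00:30:00",
    command="srun ./my_program",  # hypothetical executable
)
script = render_batch_script(example)
print(script)
```

On a login node, the printed text could be saved to a file such as `job.sh` and submitted with `sbatch job.sh`; `squeue` then lists the job while it waits in the queue or runs.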

The following diagram shows a simplified view of how Slurm manages jobs on the CLAIX compute cluster:

In the diagram, a user accesses a login node, creates a Slurm job, submits it to the Slurm queue, and waits for the resources to be allocated. The computation runs once the allocation is granted.

 

Last modified on 23.02.2024

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License.