GPU and MPI Slurm Jobs

Brief Information

Here you will find more example job scripts for the Slurm batch system.


Detailed Information

All examples run for at most 15 minutes.
Copy the contents of any example into a new file on the HPC file systems.
You can then submit the example with sbatch filename.sh.
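
The submit step can be scripted so that the job ID is available for later status checks. This is a minimal sketch: extract_job_id is a hypothetical helper, not part of Slurm, and it assumes sbatch prints its usual confirmation line of the form "Submitted batch job <ID>".

```shell
# Hypothetical helper: pull the numeric job ID out of sbatch's confirmation line.
# Assumes the usual format "Submitted batch job <ID>".
extract_job_id() {
    printf '%s\n' "$1" | awk '/^Submitted batch job/ { print $4 }'
}

# Usage on the cluster (commented out because it needs Slurm):
#   reply=$(sbatch filename.sh)
#   jobid=$(extract_job_id "$reply")
#   squeue -j "$jobid"    # show the job while it is pending or running
```

If the confirmation line has a different format, the helper prints nothing, so check that the result is non-empty before using it.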

Serial

A sample job that runs hostname, a serial program that does not use MPI:

#!/usr/bin/zsh

### Maximum runtime before programs are killed by Slurm
### Your programs should end before this!
#SBATCH --time=00:15:00

### Name the job
#SBATCH --job-name=SERIAL_JOB

### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt

### Beginning of executable commands
hostname

OpenMP

A sample program ./a.out that runs on a single node, uses only OpenMP multithreading, and requires 8 CPUs:


#!/usr/bin/zsh

### Maximum runtime
#SBATCH --time=00:15:00

### Ask for 1 task with 8 CPUs on the same node
#SBATCH --ntasks=1         ### 1 task ...
#SBATCH --cpus-per-task=8  ### ... with 8 CPUs: 1*8 = 8 CPUs
#SBATCH --nodes=1

### Name the job
#SBATCH --job-name=OPENMP_JOB

### Declare an output STDOUT/STDERR file
#SBATCH --output=output.%J.txt

### Beginning of your programs
### Note: The OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK envvar is set automatically by us
###       for you as a convenience. Expert users can modify it:
### OMP_NUM_THREADS=

./a.out
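
The note in the script says that OMP_NUM_THREADS is set automatically on this cluster. On systems without that convenience, a common pattern is to derive the value from SLURM_CPUS_PER_TASK. The following is a minimal sketch: the helper name set_omp_threads and the fallback of 1 thread are assumptions, not part of the original example.

```shell
# Hypothetical helper: choose the OpenMP thread count for this job.
# Uses Slurm's SLURM_CPUS_PER_TASK if it is set,
# otherwise falls back to 1 thread (assumption).
set_omp_threads() {
    OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"
    export OMP_NUM_THREADS
}

set_omp_threads
echo "Running with $OMP_NUM_THREADS OpenMP threads"
# ./a.out
```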

MPI

A sample program ./a.out that uses MPI parallelism to run 8 ranks on 8 CPUs across 2 nodes:


#!/usr/bin/zsh

### Maximum runtime
#SBATCH --time=00:15:00

### Ask for 8 tasks (= 8 CPUs = 8 MPI ranks)
#SBATCH --ntasks=8

### Ask for 2 nodes; adjust the number of nodes to your needs
#SBATCH --nodes=2

### Name the job
#SBATCH --job-name=MPI_JOB

### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt

### Beginning of executable commands
$MPIEXEC $FLAGS_MPI_BATCH ./a.out
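
To see how many ranks land on each node, the request can be checked with a little arithmetic. This is a sketch under the assumption that the ranks are spread evenly, which Slurm does not guarantee unless you also request --ntasks-per-node. The helper ranks_per_node is hypothetical. SLURM_NTASKS and SLURM_JOB_NUM_NODES are exported by Slurm inside a job, and the defaults mirror the example above so the snippet also runs outside a job.

```shell
# Hypothetical helper: ranks per node for an even distribution
# (integer division, rounded up).
ranks_per_node() {
    ntasks=$1
    nodes=$2
    echo $(( (ntasks + nodes - 1) / nodes ))
}

# Defaults (8 ranks, 2 nodes) are taken from the example script.
ranks_per_node "${SLURM_NTASKS:-8}" "${SLURM_JOB_NUM_NODES:-2}"
```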

Hybrid

This example shows a batch script for a hybrid MPI + OpenMP job. It uses 2 CLAIX-2018 nodes with 4 MPI ranks in total: 2 ranks per node, 1 rank per socket, and 24 OpenMP threads per rank. This occupies all 48 cores of each node.


#!/usr/bin/zsh

### Maximum runtime
#SBATCH --time=00:15:00

### Ask for four tasks (which are 4 MPI ranks)
#SBATCH --ntasks=4

### Ask for 24 threads per task (= MPI rank), which is 1 thread per core on one socket on CLAIX18
#SBATCH --cpus-per-task=24

### Name the job
#SBATCH --job-name=HYBRID_JOB

### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt

### Beginning of executable commands
### Note: the OMP_NUM_THREADS envvar is set automatically - do not overwrite!

$MPIEXEC $FLAGS_MPI_BATCH ./a.out
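
The resource arithmetic behind this request can be verified in a few lines. This is a sketch only: the variable names are illustrative, the values are copied from the script, and the 48 cores per node come from the description above.

```shell
# Values from the hybrid example; 48 cores per node is stated in the text.
ntasks=4
cpus_per_task=24
cores_per_node=48

# Total cores requested: 4 * 24 = 96
total_cores=$(( ntasks * cpus_per_task ))

# Full nodes needed, rounded up: 96 / 48 = 2
nodes_needed=$(( (total_cores + cores_per_node - 1) / cores_per_node ))

echo "cores requested: $total_cores, full nodes needed: $nodes_needed"
```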

Simple GPU Example

Please note: Loading a CUDA module may require loading additional modules first. Check the output of module spider CUDA/<version> for details.

Run deviceQuery (from the NVIDIA SDK) on one device:


#!/usr/bin/zsh

### Maximum runtime
#SBATCH --time=00:15:00

#SBATCH -J gpu_serial
#SBATCH -o gpu_serial.%J.log
#SBATCH --gres=gpu:1

module load CUDA

# Print some debug information
echo; export; echo; nvidia-smi; echo

$CUDA_ROOT/extras/demo_suite/deviceQuery -noprompt
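
Before starting a GPU program, it can help to confirm how many devices the job actually sees. The following is a minimal sketch that counts the entries in CUDA_VISIBLE_DEVICES. The helper count_visible_gpus is hypothetical, and it assumes the variable holds a comma-separated list of devices.

```shell
# Hypothetical helper: count the entries in a comma-separated device list.
count_visible_gpus() {
    list=$1
    if [ -z "$list" ]; then
        echo 0
        return
    fi
    # Number of entries = number of commas + 1
    commas=$(printf '%s' "$list" | tr -cd ',' | wc -c)
    echo $(( commas + 1 ))
}

echo "visible GPUs: $(count_visible_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```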

MPI + GPU

To run an MPI application on the GPU nodes, you need to take special care to set the number of MPI ranks per node correctly. Typical setups are:

  1. One process per node (ppn):

    Use this setup if your process uses both GPUs at the same time, e.g. via cudaSetDevice or by honoring CUDA_VISIBLE_DEVICES (set automatically by the batch system). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:

    
    #!/usr/bin/zsh
    
    ### Maximum runtime
    #SBATCH --time=00:15:00
    
    ### Setup in this script:
    ### - 4 nodes (c18g)
    ### - 1 rank per node
    ### - 2 GPUs per rank (= both GPUs from the node)
    #SBATCH -J 4-1-2
    #SBATCH -o 4-1-2.%J.log
    #SBATCH --ntasks=4
    #SBATCH --ntasks-per-node=1
    #SBATCH --gres=gpu:2
    
    module load CUDA
    
    # Print some debug information
    echo; export; echo; nvidia-smi; echo
    
    $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
    
  2. Two processes per node (ppn):

    Use this setup if each process communicates with its own single GPU, so that both GPUs on a node are used (recommended setup). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:

    
    #!/usr/bin/zsh
    
    ### Maximum runtime
    #SBATCH --time=00:15:00
    
    ### Setup in this script:
    ### - 2 nodes (c18g, default)
    ### - 2 ranks per node
    ### - 1 GPU per rank (= both GPUs of the node are used)
    
    #SBATCH -J 2-2-1
    #SBATCH -o 2-2-1.%J.log
    #SBATCH --ntasks=4
    #SBATCH --ntasks-per-node=2
    #SBATCH --gres=gpu:2
    
    module load CUDA
    
    # Print some debug information
    echo; export; echo; nvidia-smi; echo
    
    $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
    
  3. More than two processes per node:

    Use this setup if you also have processes that compute on the CPU only.
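
When several ranks share a node, each rank needs to know which GPU to use. One common approach, sketched here under stated assumptions, is a round-robin mapping from the node-local rank to a GPU index. The helper gpu_for_local_rank is hypothetical. The sketch assumes the launcher exports a node-local rank variable: Open MPI provides OMPI_COMM_WORLD_LOCAL_RANK, Intel MPI provides MPI_LOCALRANKID, and srun provides SLURM_LOCALID. Whether this mapping is needed depends on how your application selects its device.

```shell
# Hypothetical helper: map a node-local rank to a GPU index (round-robin).
gpu_for_local_rank() {
    local_rank=$1
    gpus_per_node=$2
    echo $(( local_rank % gpus_per_node ))
}

# Assumption: one of these variables is set by the MPI launcher;
# fall back to 0 when running outside a job.
local_rank="${OMPI_COMM_WORLD_LOCAL_RANK:-${MPI_LOCALRANKID:-${SLURM_LOCALID:-0}}}"

# 2 GPUs per node, as in the setups above
gpu=$(gpu_for_local_rank "$local_rank" 2)
echo "local rank $local_rank -> GPU $gpu"

# In a wrapper script one could then start the program with:
#   CUDA_VISIBLE_DEVICES=$gpu ./cuda-mpi
```

With two ranks per node, local ranks 0 and 1 map to GPUs 0 and 1. With more ranks per node, the GPU-using ranks wrap around, so CPU-only ranks should skip the assignment.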


Additional Information

Hybrid toy example for download

Please find a hybrid Fortran toy code, Makefile and Slurm job script for download here. You need to adjust your project account, your working directory and probably your job log file name in slurm.job.

Download

hybrid-slurm-example.tar

Extract the archive with the command


tar -xf hybrid-slurm-example.tar

Edit the file slurm.job and adjust it as described above.

Compile with: make compile

Submit with: make submit

Check the job log file after the job has finished.

 

last changed on 08/07/2024

Creative Commons License Agreement
This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License