
GPU Batch Mode

Detailed Information

Simple GPU Example

Run deviceQuery (from the NVIDIA CUDA demo suite) on one device:

gpu_batch_serial.sh
#!/usr/local_rwth/bin/zsh
 
#SBATCH -J gpu_serial
#SBATCH -o gpu_serial.%J.log
 
#SBATCH --gres=gpu:1
 
module load cuda
 
# print some debug information ...
echo; export; echo;  nvidia-smi; echo
 
$CUDA_ROOT/extras/demo_suite/deviceQuery -noprompt
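 
The script can be submitted with sbatch; the job ID reported by sbatch also turns up in the name of the log file (gpu_serial.<jobid>.log, from the -o directive above). A minimal usage sketch, with 1234567 as a placeholder job ID:
 
sbatch gpu_batch_serial.sh
# Submitted batch job 1234567
 
squeue -u $USER             # check whether the job is still pending or running
cat gpu_serial.1234567.log  # inspect the output once the job has finished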
 

MPI + GPU Jobs

To run an MPI application on the GPU nodes, you need to take special care to correctly set the number of MPI ranks per node. Typical setups are:

  1. one process per node (ppn):

    1. If you would like to use only one GPU per node (why only one? In exclusive jobs you waste the resources of the second GPU, and even in the non-exclusive case you should try to pack two of your MPI ranks onto the same node to reduce the number of nodes); or
    2. If your process uses both GPUs at the same time, e.g. via cudaSetDevice or by accepting CUDA_VISIBLE_DEVICES (set automatically by the batch system):

      gpu_batch_4n_1r_2g.sh
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this script:
      ### - 4 nodes (c18g, default)
      ### - 1 rank per node
      ### - 2 GPUs per rank (= both GPUs of the node)
       
      #SBATCH -J 4-1-2
      #SBATCH -o 4-1-2.%J.log
      #SBATCH --ntasks=4
      #SBATCH --ntasks-per-node=1
      #SBATCH --gres=gpu:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi -g 2
  2. two processes per node (ppn):

    1. If each process communicates with its own single GPU, thus using both GPUs on a node (recommended setup):

       
      gpu_batch_2n_2r_1g.sh (starts on the Pascal GPUs)
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this script:
      ### - 2 nodes (c16g)
      ### - 2 ranks per node
      ### - 1 GPU per rank (= both GPUs of the node are used)
       
      #SBATCH -J 2-2-1
      #SBATCH -o 2-2-1.%J.log
       
      #SBATCH --ntasks=4
      #SBATCH --ntasks-per-node=2
      #SBATCH --gres=gpu:pascal:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
  3. More than 2 processes per node:

    1. If you also have processes that do computation on the CPU only (a possible batch script for this case is sketched below).
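 
      A ready-made example for this third setup is not shown on this page; the following is only a sketch modeled on the scripts above. The number of ranks per node (12), the job name and the application name (./cuda-mpi-cpu) are placeholders and have to be adapted to your application and to the core count of the GPU nodes.
 
      gpu_batch_mixed.sh (sketch)
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this sketch:
      ### - 2 nodes
      ### - 12 ranks per node (placeholder - adapt to the node's core count)
      ### - 2 GPUs per node, shared by the GPU-using ranks;
      ###   the remaining ranks compute on the CPU only
       
      #SBATCH -J mixed-cpu-gpu
      #SBATCH -o mixed-cpu-gpu.%J.log
       
      #SBATCH --ntasks=24
      #SBATCH --ntasks-per-node=12
      #SBATCH --gres=gpu:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi-cpu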

Additional Notes

Information about SLURM accounting can be found in SLURM Accounting.

Information about submitting a GPU job can be found in Submitting a GPU job.

Last modified on 29.01.2021
