GPU Batch Slurm Jobs
Please Note
Run
module spider CUDA/<version>
for information on available CUDA modules.

Simple GPU Example
Run deviceQuery (from the NVIDIA SDK) on one device:
#!/usr/bin/zsh
#SBATCH -J gpu_serial
#SBATCH -o gpu_serial.%J.log
#SBATCH --gres=gpu:1
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$CUDA_ROOT/extras/demo_suite/deviceQuery -noprompt
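Because Slurm exports CUDA_VISIBLE_DEVICES according to the --gres request, a script can sanity-check how many devices it was actually granted before running. A minimal sketch, assuming the variable is a comma-separated list such as 0 or 0,1 (an empty or unset variable counts as no devices):

```shell
# Count the GPUs visible to this job from CUDA_VISIBLE_DEVICES.
# An empty or unset variable counts as 0 devices.
num_gpus=$(echo "${CUDA_VISIBLE_DEVICES:-}" | awk -F',' 'NF { print NF } !NF { print 0 }')
echo "GPUs visible to this job: $num_gpus"
```

With --gres=gpu:1 as above, this should report a single device.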
MPI + GPU
To run an MPI application on the GPU nodes, you need to take special care to correctly set the number of MPI ranks per node. Typical setups are:
One process per node (1 ppn):
Use this setup if each process uses both GPUs of a node at the same time, e.g. via cudaSetDevice or by accepting CUDA_VISIBLE_DEVICES (set automatically by the batch system). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:
#!/usr/bin/zsh
### Setup in this script:
### - 4 nodes (c18g)
### - 1 rank per node
### - 2 GPUs per rank (= both GPUs from the node)
#SBATCH -J 4-1-2
#SBATCH -o 4-1-2.%J.log
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
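The directives above fix the job geometry implicitly: --ntasks divided by --ntasks-per-node gives the node count, and each node contributes its --gres=gpu count. A small arithmetic sketch with the values copied from this script:

```shell
# Job geometry implied by the directives above (values from the script).
ntasks=4             # --ntasks=4
ntasks_per_node=1    # --ntasks-per-node=1
gpus_per_node=2      # --gres=gpu:2
nodes=$(( ntasks / ntasks_per_node ))
total_gpus=$(( nodes * gpus_per_node ))
echo "nodes=$nodes, total GPUs=$total_gpus"
```

This prints nodes=4, total GPUs=8: four single-rank nodes, each contributing both of its GPUs.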
Two processes per node (2 ppn):
Use this setup if each process communicates with its own single GPU, so that both GPUs of a node are used (recommended setup). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:
#!/usr/bin/zsh
### Setup in this script:
### - 2 nodes (c18g, default)
### - 2 ranks per node
### - 1 GPU per rank (= both GPUs from the node)
#SBATCH -J 2-2-1
#SBATCH -o 2-2-1.%J.log
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
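If the batch system does not already give each rank its own device list, a common technique is a thin wrapper script placed between the MPI launcher and the application that narrows CUDA_VISIBLE_DEVICES per rank. A sketch, assuming Open MPI (which exports OMPI_COMM_WORLD_LOCAL_RANK; other MPI implementations use different variable names, and the wrapper filename is made up here):

```shell
#!/usr/bin/zsh
# gpu-wrapper.sh (hypothetical name): bind each MPI rank to one GPU
# via its node-local rank, then start the real program.
local_rank=${OMPI_COMM_WORLD_LOCAL_RANK:-0}   # Open MPI specific
export CUDA_VISIBLE_DEVICES=$local_rank
echo "local rank $local_rank -> GPU $CUDA_VISIBLE_DEVICES"
exec "$@"
```

Launched as $MPIEXEC $FLAGS_MPI_BATCH ./gpu-wrapper.sh ./cuda-mpi, each rank then sees exactly one device (always numbered 0 inside the process), which is its assigned physical GPU.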
More than 2 processes per node:
Use this setup if you also have processes that do computation on the CPU only.
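A job script for such a mixed setup could look like the following sketch. The geometry (8 ranks on one node with 2 GPUs) is an assumption for illustration; since there are more ranks than GPUs, which ranks use a GPU must be decided inside the application or via a wrapper script:

```shell
#!/usr/bin/zsh
### Sketch (assumed geometry):
### - 1 node (c18g)
### - 8 ranks per node: 2 GPU ranks + 6 CPU-only ranks
#SBATCH -J 1-8-mixed
#SBATCH -o 1-8-mixed.%J.log
#SBATCH --ntasks=8
#SBATCH --ntasks-per-node=8
#SBATCH --gres=gpu:2
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
```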
Information about Slurm accounting can be found in Slurm Accounting.
Information about submitting a GPU job can be found in Submitting a GPU job.