GPU and MPI Slurm Jobs
Here you will find more example job scripts for the Slurm batch system.
All examples run for a maximum of 15 minutes.
Copy the contents of any example to a new file in the HPC file systems.
You can then run the example with sbatch filename.sh.
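The copy-and-submit workflow can be sketched as follows. The file name hostname-job.sh is a hypothetical choice; the script contents are the serial example from below, and the sbatch step only works on a cluster login node:

```shell
# Write the serial example to a new file in an HPC file system.
cat > hostname-job.sh <<'EOF'
#!/usr/bin/zsh
#SBATCH --time=00:15:00
#SBATCH --job-name=SERIAL_JOB
#SBATCH --output=output.%J.txt
hostname
EOF

# On a login node you would now submit it with:
#   sbatch hostname-job.sh
# and later inspect output.<jobid>.txt.
head -1 hostname-job.sh   # shows the interpreter line of the new script
```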
Serial
A sample job that runs the program hostname and does not use MPI:
#!/usr/bin/zsh
### Maximum runtime before programs are killed by Slurm
### Your programs should end before this!
#SBATCH --time=00:15:00
### Name the job
#SBATCH --job-name=SERIAL_JOB
### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt
### Begin of executable commands
hostname
OpenMP
A sample program ./a.out that runs on a single node, uses only OpenMP multithreading, and requires 8 CPUs:
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
### Ask for 8 CPUs on the same node: 1 task with 8 CPUs per task
#SBATCH --ntasks=1          ### 1 task
#SBATCH --cpus-per-task=8   ### 1*8 = 8 CPUs
#SBATCH --nodes=1
### Name the job
#SBATCH --job-name=OPENMP_JOB
### Declare an output STDOUT/STDERR file
#SBATCH --output=output.%J.txt
### Beginning of your programs
### Note: The OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK envvar is set automatically
### as a convenience. Expert users can modify it:
### OMP_NUM_THREADS=
./a.out
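The note above says OMP_NUM_THREADS is derived from the allocation. A minimal sketch of that mapping, emulated locally (in a real job, Slurm sets SLURM_CPUS_PER_TASK itself and the site profile exports OMP_NUM_THREADS for you):

```shell
# Emulated locally; on the cluster SLURM_CPUS_PER_TASK comes from Slurm,
# matching --cpus-per-task=8 in the script above.
SLURM_CPUS_PER_TASK=8
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
echo "OpenMP will use $OMP_NUM_THREADS threads"
```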
MPI
A sample program ./a.out that uses MPI parallelism to run 8 ranks on 8 CPUs across 2 nodes:
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
### Ask for 8 tasks (= 8 MPI ranks on 8 CPUs)
#SBATCH --ntasks=8
### Ask for 2 nodes; the ranks are distributed across them
#SBATCH --nodes=2
### Name the job
#SBATCH --job-name=MPI_JOB
### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt
### Beginning of executable commands
$MPIEXEC $FLAGS_MPI_BATCH ./a.out
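The variables $MPIEXEC and $FLAGS_MPI_BATCH are provided by the cluster environment when an MPI module is loaded. As a rough sketch of what the last line expands to, with values that are assumptions for illustration, not the site's actual settings:

```shell
# Hypothetical values; the real ones are set by the loaded MPI module.
MPIEXEC="mpiexec"
FLAGS_MPI_BATCH="-np 8"   # in this sketch, -np mirrors --ntasks=8
echo "$MPIEXEC $FLAGS_MPI_BATCH ./a.out"
```

Using the variables instead of hard-coding mpiexec keeps the script working when the site changes its MPI launcher.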
Hybrid
This example illustrates a batch script for an MPI + OpenMP hybrid job. It uses 2 CLAIX-2018 nodes with 4 ranks in total, 2 ranks per node (1 MPI rank per socket), and 24 OpenMP threads per rank, which uses all 48 cores of each node.
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
### Ask for four tasks (which are 4 MPI ranks)
#SBATCH --ntasks=4
### Ask for 24 CPUs per task = 24 OpenMP threads per MPI rank (1 thread per core on one CLAIX-2018 socket)
#SBATCH --cpus-per-task=24
### Name the job
#SBATCH --job-name=HYBRID_JOB
### Declare the merged STDOUT/STDERR file
#SBATCH --output=output.%J.txt
### Beginning of executable commands
### Note: the OMP_NUM_THREADS envvar is set automatically - do not overwrite!
$MPIEXEC $FLAGS_MPI_BATCH ./a.out
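The resource arithmetic described above can be checked with a quick sketch (plain shell arithmetic, nothing cluster-specific):

```shell
# Hybrid layout: 2 nodes, 2 ranks per node, 24 threads per rank.
nodes=2; ranks_per_node=2; threads_per_rank=24
echo "cores used per node: $((ranks_per_node * threads_per_rank))"
echo "total MPI ranks:     $((nodes * ranks_per_node))"
```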
Simple GPU Example
Please note: Loading a CUDA module may require loading additional modules. Check the output of module spider CUDA/<version> for information.
Run deviceQuery (from the NVIDIA SDK) on one device:
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
#SBATCH -J gpu_serial
#SBATCH -o gpu_serial.%J.log
#SBATCH --gres=gpu:1
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$CUDA_ROOT/extras/demo_suite/deviceQuery -noprompt
MPI + GPU
To run an MPI application on the GPU nodes, you need to take special care to set the number of MPI ranks per node correctly. Typical setups are:
One process per node (ppn):
If each process uses both GPUs at the same time, e.g. via cudaSetDevice or by accepting CUDA_VISIBLE_DEVICES (set automatically by the batch system). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
### Setup in this script:
### - 4 nodes (c18g)
### - 1 rank per node
### - 2 GPUs per rank (= both GPUs of the node)
#SBATCH -J 4-1-2
#SBATCH -o 4-1-2.%J.log
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=1
#SBATCH --gres=gpu:2
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
Two processes per node (ppn):
If each process communicates with its own single GPU, thus using both GPUs of a node (recommended setup). Let ./cuda-mpi be the path to your MPI-compatible CUDA program:
#!/usr/bin/zsh
### Maximum runtime
#SBATCH --time=00:15:00
### Setup in this script:
### - 2 nodes (c18g, default)
### - 2 ranks per node
### - 1 GPU per rank (= both GPUs of the node in total)
#SBATCH -J 2-2-1
#SBATCH -o 2-2-1.%J.log
#SBATCH --ntasks=4
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
module load CUDA
# Print some debug information
echo; export; echo; nvidia-smi; echo
$MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
More than 2 processes per node:
Use this setup if you also have processes that do computation on the CPU only.
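In the recommended two-ranks-per-node setup, each rank typically selects one GPU based on its node-local rank. Slurm exposes this as SLURM_LOCALID per task; the mapping shown is an assumption about your application or wrapper script, emulated here with a loop:

```shell
# Emulated: with 2 ranks per node, Slurm sets SLURM_LOCALID to 0 or 1
# for each task on a node; a rank (or a wrapper) can use that value to
# restrict itself to "its" GPU before launching kernels.
for SLURM_LOCALID in 0 1; do
  echo "local rank $SLURM_LOCALID -> CUDA_VISIBLE_DEVICES=$SLURM_LOCALID"
done
```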
Hybrid toy example for download
Please find a hybrid Fortran toy code, a Makefile and a Slurm job script for download here. You need to adjust your project account, your working directory, and probably your job log file name in slurm.job.
Download the archive and extract it using the command tar -xf hybrid-slurm-example.tar.
Edit and adjust the file slurm.job.
Compile with make compile and submit with make submit.
Check the job log file after the job has terminated.