
GPU Batch Mode

Detailed Information

Simple GPU Example

Run deviceQuery (from the NVIDIA CUDA demo suite) on one device:

gpu_batch_serial.sh
#!/usr/local_rwth/bin/zsh
 
#SBATCH -J gpu_serial
#SBATCH -o gpu_serial.%J.log
 
#SBATCH --gres=gpu:1
 
module load cuda
 
# print some debug information ...
echo; export; echo;  nvidia-smi; echo
 
$CUDA_ROOT/extras/demo_suite/deviceQuery -noprompt
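 
The script can be submitted with sbatch; the job ID reported by sbatch also turns up in the name of the log file (gpu_serial.<jobid>.log, from the -o directive above). A minimal usage sketch, with 1234567 as a placeholder job ID:
 
sbatch gpu_batch_serial.sh
# Submitted batch job 1234567
 
squeue -u $USER             # check whether the job is still pending or running
cat gpu_serial.1234567.log  # inspect the output once the job has finished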
 

MPI + GPU Jobs

To run an MPI application on the GPU nodes, you need to take special care to correctly set the number of MPI ranks per node. Typical setups are:

  1. one process per node (ppn):

    1. If you would like to use only one GPU per node (why only one? In exclusive jobs you waste the resources of the second GPU, and even in the non-exclusive case you should try to pack two of your MPI ranks onto the same node to reduce the number of nodes); or
    2. If your process uses both GPUs at the same time, e.g. via cudaSetDevice or by accepting CUDA_VISIBLE_DEVICES (set automatically by the batch system):

      gpu_batch_4n_1r_2g.sh
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this script:
      ### - 4 nodes (c18g, default)
      ### - 1 rank per node
      ### - 2 GPUs per rank (= both GPUs of the node)
       
      #SBATCH -J 4-1-2
      #SBATCH -o 4-1-2.%J.log
      #SBATCH --ntasks=4
      #SBATCH --ntasks-per-node=1
      #SBATCH --gres=gpu:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi -g 2
  2. two processes per node (ppn):

    1. If each process communicates with its own single GPU, thus using both GPUs on a node (recommended setup):

       
      gpu_batch_2n_2r_1g.sh (starts on the Pascal GPUs)
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this script:
      ### - 2 nodes (c16g)
      ### - 2 ranks per node
      ### - 1 GPU per rank (= both GPUs of the node are used)
       
      #SBATCH -J 2-2-1
      #SBATCH -o 2-2-1.%J.log
       
      #SBATCH --ntasks=4
      #SBATCH --ntasks-per-node=2
      #SBATCH --gres=gpu:pascal:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi
  3. More than 2 processes per node:

    1. If you also have processes that do computation on the CPU only (a possible batch script for this case is sketched below).
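 
      A ready-made example for this third setup is not shown on this page; the following is only a sketch modeled on the scripts above. The number of ranks per node (12), the job name and the application name (./cuda-mpi-cpu) are placeholders and have to be adapted to your application and to the core count of the GPU nodes.
 
      gpu_batch_mixed.sh (sketch)
      #!/usr/local_rwth/bin/zsh
       
      ### Setup in this sketch:
      ### - 2 nodes
      ### - 12 ranks per node (placeholder - adapt to the node's core count)
      ### - 2 GPUs per node, shared by the GPU-using ranks;
      ###   the remaining ranks compute on the CPU only
       
      #SBATCH -J mixed-cpu-gpu
      #SBATCH -o mixed-cpu-gpu.%J.log
       
      #SBATCH --ntasks=24
      #SBATCH --ntasks-per-node=12
      #SBATCH --gres=gpu:2
       
      module load cuda
       
      # print some debug information ...
      echo; export; echo;  nvidia-smi; echo
       
      $MPIEXEC $FLAGS_MPI_BATCH ./cuda-mpi-cpu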

Additional Notes

Information about SLURM accounting can be found in SLURM Accounting.

Information about submitting a GPU job can be found in Submitting a GPU job.

Last modified on 29.01.2021
