CUDA
The Nvidia CUDA Toolkit provides support for CUDA C.
Depending on the version, you might have to load additional modules before you can load CUDA:
module load CUDA/11.8.0
Available CUDA versions can be listed with module spider CUDA. Specifying a version will list the modules that need to be loaded first: module spider CUDA/11.8.0
Loading the module will prepend the correct locations to your $PATH and $LD_LIBRARY_PATH variables and will additionally provide the variable $CUDA_ROOT, which contains the root directory of the loaded toolkit installation. The currently loaded version can also be obtained with nvcc --version. The documentation is no longer available under $CUDA_ROOT/doc; instead, please visit https://docs.nvidia.com/cuda
You can compile and link your CUDA C program, e.g. one called pi.cu, with:
nvcc pi.cu
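For illustration, here is a minimal sketch of what such a pi.cu could look like (a hypothetical example, not part of the toolkit): it approximates pi by midpoint integration of 4/(1+x^2) on [0,1], computing one term per CUDA thread and summing the partial results on the host.

// pi.cu -- hypothetical example: approximate pi by midpoint integration of 4/(1+x^2)
#include <cstdio>
#include <cuda_runtime.h>

__global__ void pi_terms(double *terms, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        double x = (i + 0.5) / n;            // midpoint of the i-th subinterval
        terms[i] = 4.0 / (1.0 + x * x) / n;  // contribution of this subinterval
    }
}

int main()
{
    const int n = 1 << 20;
    double *d_terms;
    cudaMalloc(&d_terms, n * sizeof(double));

    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    pi_terms<<<blocks, threads>>>(d_terms, n);

    // Copy the per-thread contributions back and sum them on the host.
    double *h_terms = new double[n];
    cudaMemcpy(h_terms, d_terms, n * sizeof(double), cudaMemcpyDeviceToHost);
    double pi = 0.0;
    for (int i = 0; i < n; ++i) pi += h_terms[i];

    printf("pi is approximately %.10f\n", pi);
    cudaFree(d_terms);
    delete[] h_terms;
    return 0;
}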
If you want to target a certain compute capability, you can use the corresponding compiler flag. For example, to set compute capability 6.0:
nvcc -arch=sm_60 pi.cu
Combining MPI and CUDA is one way to scale across several nodes. For instance, you can run one MPI process per node while each process uses one (or two, if available) GPUs. Another possible scenario for dual-GPU machines is to start two MPI processes per node, each using one GPU.
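As a rough sketch of how each MPI process could pick "its" GPU (hypothetical code, not specific to our cluster), the ranks on a node can be enumerated via a shared-memory sub-communicator and mapped round-robin onto the GPUs visible there; compile and link it as described below.

// rank2gpu.cpp -- hypothetical sketch: bind each MPI rank to one GPU on its node
#include <mpi.h>
#include <cuda_runtime.h>
#include <cstdio>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    // Determine the rank within the node (local rank) via a shared-memory sub-communicator.
    MPI_Comm node_comm;
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm);
    int local_rank;
    MPI_Comm_rank(node_comm, &local_rank);

    // Map local ranks round-robin onto the GPUs of this node.
    int num_gpus = 0;
    cudaGetDeviceCount(&num_gpus);
    int my_gpu = (num_gpus > 0) ? local_rank % num_gpus : -1;
    if (my_gpu >= 0)
        cudaSetDevice(my_gpu);

    int world_rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
    printf("rank %d uses GPU %d of %d\n", world_rank, my_gpu, num_gpus);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}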
To use CUDA with MPI, our recommendation is to compile your CUDA code with nvcc and then link it with $MPICC or $MPICXX, explicitly specifying the CUDA libraries. For example:
nvcc -arch=sm_60 -m64 -c foo.cu -o foo.o
$MPICXX -c bar.cpp -o bar.o
$MPICXX foo.o bar.o -o foobar.exe -L$CUDA_ROOT/lib64 -lcudart
Run your program with $MPIEXEC.
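To illustrate the split, a hypothetical foo.cu / bar.cpp pair matching the commands above could look as follows; foo.cu wraps the kernel launch in a plain C function so that bar.cpp does not need any CUDA syntax.

// foo.cu -- hypothetical CUDA part: a trivial kernel plus a C-callable wrapper
#include <cuda_runtime.h>

__global__ void scale_kernel(double *x, int n, double factor)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

extern "C" void scale_on_gpu(double *host_data, int n, double factor)
{
    double *d_x;
    cudaMalloc(&d_x, n * sizeof(double));
    cudaMemcpy(d_x, host_data, n * sizeof(double), cudaMemcpyHostToDevice);
    scale_kernel<<<(n + 255) / 256, 256>>>(d_x, n, factor);
    cudaMemcpy(host_data, d_x, n * sizeof(double), cudaMemcpyDeviceToHost);
    cudaFree(d_x);
}

// bar.cpp -- hypothetical MPI part: every rank scales its data on the GPU,
// then the results are combined with a global reduction
#include <mpi.h>
#include <vector>
#include <cstdio>

extern "C" void scale_on_gpu(double *host_data, int n, double factor);

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    std::vector<double> data(1000, 1.0);
    scale_on_gpu(data.data(), static_cast<int>(data.size()), rank + 1.0);

    double local = 0.0, total = 0.0;
    for (double v : data) local += v;
    MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) printf("global sum = %f\n", total);

    MPI_Finalize();
    return 0;
}

If more than one GPU per node is used, the rank-to-GPU mapping sketched further above can be added right after MPI_Init.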
Usually nvcc uses the GNU compiler as its host compiler (check with nvcc -v or nvcc -dryrun), but you can also use other compilers in combination with nvcc. For instance, to make it work with the Intel compiler, use:
nvcc -ccbin [<path to intel compiler>/]icc ...
The NVIDIA GPU Computing SDK provides a lot of examples in CUDA C. They can be used to verify the correct setup of the GPU (see the examples deviceQuery and bandwidthTest), as a starting point for your own application, and to give you an idea of how to implement certain algorithms on a GPU. You can find the SDK how-to here.
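If the SDK is not at hand, a tiny stand-in (a hypothetical sketch, not the SDK's deviceQuery) that prints the name and compute capability of every visible GPU is often already enough to verify the setup:

// check_gpus.cu -- hypothetical sketch: list visible GPUs and their compute capability
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        printf("cudaGetDeviceCount failed: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int i = 0; i < count; ++i) {
        cudaDeviceProp prop;
        cudaGetDeviceProperties(&prop, i);
        printf("GPU %d: %s (compute capability %d.%d)\n", i, prop.name, prop.major, prop.minor);
    }
    return 0;
}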
For CUDA Fortran, please refer to PGI's webpages: https://www.pgroup.com/resources/cudafortran.htm