The NVIDIA CUDA Toolkit provides support for CUDA C.
Depending on the version, you might have to load additional modules before you can load CUDA:
module load CUDA/11.8.0
Available CUDA versions can be listed with
module spider CUDA
Specifying a version will list the needed modules:
module spider CUDA/11.8.0
Loading the module prepends the correct locations to your environment variables (such as $LD_LIBRARY_PATH) and also provides the variable $CUDA_ROOT, which contains the root directory of the loaded toolkit installation. The currently loaded version can also be obtained with nvcc --version. Documentation is no longer available under $CUDA_ROOT/doc; instead, please visit https://docs.nvidia.com/cuda
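To check what the module actually set up, a small shell function along the following lines can help. This is only a sketch: the function name cuda_env_report is made up for illustration, and it assumes nothing beyond the $CUDA_ROOT and $LD_LIBRARY_PATH variables mentioned above.

```shell
#!/bin/sh
# Print a short report on the CUDA environment set up by the module.
# cuda_env_report is a hypothetical helper name, not part of the toolkit.
cuda_env_report() {
    # CUDA_ROOT is provided by the module; print a notice if it is missing.
    echo "CUDA_ROOT=${CUDA_ROOT:-<not set>}"

    # List the LD_LIBRARY_PATH entries that lie under CUDA_ROOT, if any.
    if [ -n "${CUDA_ROOT:-}" ]; then
        printf '%s\n' "${LD_LIBRARY_PATH:-}" | tr ':' '\n' | grep -F "$CUDA_ROOT" \
            || echo "no LD_LIBRARY_PATH entry under CUDA_ROOT"
    fi

    # Report the compiler version only if nvcc is actually on the PATH.
    if command -v nvcc >/dev/null 2>&1; then
        nvcc --version
    else
        echo "nvcc not found; is a CUDA module loaded?"
    fi
}

cuda_env_report
```

If the report shows that $CUDA_ROOT is not set or nvcc is missing, the module (or one of its prerequisites) is most likely not loaded.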
You can compile and link your CUDA C program, e.g. one called pi.cu, with:
nvcc pi.cu
If you want to target a certain compute capability, you can use the corresponding compiler flag. For example, to target compute capability 6.0, use:
nvcc -arch=sm_60 pi.cu
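The flag value is the compute capability with the dot removed (6.0 becomes sm_60). As a minimal sketch, a helper like the following derives the flag from a MAJOR.MINOR string; the name arch_flag is hypothetical and the input check is deliberately loose.

```shell
#!/bin/sh
# Convert a compute capability such as "6.0" into the matching nvcc flag.
# arch_flag is a hypothetical helper name used for illustration only.
arch_flag() {
    cc=$1
    case "$cc" in
        [0-9]*.[0-9]*) ;;   # loosely expect MAJOR.MINOR
        *) echo "expected MAJOR.MINOR, got '$cc'" >&2; return 1 ;;
    esac
    # Drop the dot: 6.0 -> 60, 8.6 -> 86
    echo "-arch=sm_$(echo "$cc" | tr -d '.')"
}

arch_flag 6.0
```

It could then be used as nvcc $(arch_flag 6.0) pi.cu. Which compute capability your GPU supports depends on the hardware; check the NVIDIA documentation for the installed device.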
Combining MPI and CUDA is one way to scale across several nodes. For instance, you can run one MPI process per machine, with each process using one GPU (or two, if available). Another possible scenario for dual-GPU machines is to run two MPI processes per node, each of which uses one GPU.
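For the scenario with one MPI process per GPU, each process has to end up on a different device. One common approach, sketched here under assumptions, is a wrapper script that sets CUDA_VISIBLE_DEVICES from the node-local rank. The sketch assumes the launcher exports OMPI_COMM_WORLD_LOCAL_RANK (Open MPI) or SLURM_LOCALID (Slurm); other MPI implementations use different variables. GPUS_PER_NODE and gpu_for_rank are hypothetical names chosen for this example.

```shell
#!/bin/sh
# Pick a GPU index for a node-local MPI rank (round-robin over the GPUs).
# gpu_for_rank is a hypothetical helper name.
gpu_for_rank() {
    local_rank=$1
    gpus_per_node=$2
    echo $(( local_rank % gpus_per_node ))
}

# Node-local rank: Open MPI sets OMPI_COMM_WORLD_LOCAL_RANK, Slurm sets
# SLURM_LOCALID. Which one exists depends on the launcher; default to 0.
LOCAL_RANK=${OMPI_COMM_WORLD_LOCAL_RANK:-${SLURM_LOCALID:-0}}

# Assumed number of GPUs per node; adjust to your machines.
GPUS_PER_NODE=${GPUS_PER_NODE:-2}

# Restrict this process to a single GPU, which it then sees as device 0.
CUDA_VISIBLE_DEVICES=$(gpu_for_rank "$LOCAL_RANK" "$GPUS_PER_NODE")
export CUDA_VISIBLE_DEVICES
echo "local rank $LOCAL_RANK -> GPU $CUDA_VISIBLE_DEVICES"
```

To use this as a wrapper, one would typically append a line such as exec "$@" and start the application through the script. Alternatively, the program itself can select its device with cudaSetDevice.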
To use CUDA with MPI, our recommendation is to compile your CUDA code with nvcc and then link it with $MPICXX by explicitly specifying the CUDA libraries. For example:
nvcc -arch=sm_60 -m64 -c foo.cu -o foo.o
$MPICXX -c bar.cpp -o bar.o
$MPICXX foo.o bar.o -o foobar.exe -L$CUDA_ROOT/lib64 -lcudart
Run your program with the MPI launcher provided on your system.
nvcc uses the GNU compiler (check with nvcc -v or nvcc -dryrun). But you can also use other compilers in combination with nvcc. For instance, to make it work with the Intel compiler, use:
nvcc -ccbin [<path to intel compiler>/]icc ...
The NVIDIA GPU Computing SDK provides many examples in CUDA C. They can be used to verify the correct setup of the GPU (see the examples deviceQuery and bandwidthTest), as a starting point for your own application, and to give you an idea of how to implement certain algorithms on a GPU. You can find the how-to of the SDK here.
For CUDA Fortran, please refer to PGI's web pages: https://www.pgroup.com/resources/cudafortran.htm