RWTH Compute Cluster Linux (HPC)




The NVIDIA CUDA Toolkit provides support for CUDA C. To use it, first load the corresponding module:

module load cuda[/version]



Loading the module sets, among other things, your PATH and LD_LIBRARY_PATH variables to the correct locations and provides the variable $CUDA_ROOT, which contains the root directory of the loaded toolkit installation. The currently loaded version can also be obtained with nvcc --version. Documentation such as the NVIDIA Programming Guide, the Best Practices Guide, the Tuning Guide and more can be found under $CUDA_ROOT/doc.
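
Besides nvcc --version, a program can check at run time which CUDA runtime it uses and which CUDA version the installed driver supports. The following sketch uses the runtime API functions cudaRuntimeGetVersion and cudaDriverGetVersion; the file name version_check.cu and the helper decode_cuda_version are hypothetical names chosen for illustration.

```cuda
// version_check.cu (hypothetical name): report CUDA runtime and driver versions
#include <cstdio>
#include <cuda_runtime.h>

// CUDA encodes versions as 1000 * major + 10 * minor, e.g. 11020 -> 11.2.
static void decode_cuda_version(int version, int *major, int *minor) {
    *major = version / 1000;
    *minor = (version % 1000) / 10;
}

int main() {
    int runtime = 0, driver = 0, major = 0, minor = 0;

    // Version of the CUDA runtime library used by this program.
    if (cudaRuntimeGetVersion(&runtime) != cudaSuccess) {
        fprintf(stderr, "cudaRuntimeGetVersion failed\n");
        return 1;
    }
    decode_cuda_version(runtime, &major, &minor);
    printf("CUDA runtime: %d.%d\n", major, minor);

    // Latest CUDA version supported by the installed driver (0 if no driver).
    if (cudaDriverGetVersion(&driver) != cudaSuccess) {
        fprintf(stderr, "cudaDriverGetVersion failed\n");
        return 1;
    }
    decode_cuda_version(driver, &major, &minor);
    printf("Driver supports CUDA up to: %d.%d\n", major, minor);
    return 0;
}
```

Since nvcc links the CUDA runtime statically by default, the runtime version printed should correspond to the toolkit version reported by nvcc --version at compile time.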

You can compile and link your CUDA C program (here called <program>.cu) in one step with:

nvcc <program>.cu -o <program>
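
For a first test, a minimal program could look like the following sketch, which adds two vectors on the GPU and checks the result on the host. The file name vecadd.cu, the kernel vec_add, the helper blocks_for and the CHECK macro are hypothetical names, not part of the toolkit.

```cuda
// vecadd.cu (hypothetical name): add two vectors on the GPU
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Each thread adds one element; threads with i >= n do nothing.
__global__ void vec_add(const float *a, const float *b, float *c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Smallest number of blocks such that blocks * threads >= n.
static int blocks_for(int n, int threads) { return (n + threads - 1) / threads; }

int main() {
    const int n = 1 << 20;
    const size_t bytes = n * sizeof(float);
    float *h_a = (float *)malloc(bytes);
    float *h_b = (float *)malloc(bytes);
    float *h_c = (float *)malloc(bytes);
    if (!h_a || !h_b || !h_c) return EXIT_FAILURE;
    for (int i = 0; i < n; ++i) { h_a[i] = 1.0f; h_b[i] = 2.0f; }

    float *d_a, *d_b, *d_c;
    CHECK(cudaMalloc(&d_a, bytes));
    CHECK(cudaMalloc(&d_b, bytes));
    CHECK(cudaMalloc(&d_c, bytes));
    CHECK(cudaMemcpy(d_a, h_a, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaMemcpy(d_b, h_b, bytes, cudaMemcpyHostToDevice));

    const int threads = 256;
    vec_add<<<blocks_for(n, threads), threads>>>(d_a, d_b, d_c, n);
    CHECK(cudaGetLastError());                                   // launch errors
    CHECK(cudaMemcpy(h_c, d_c, bytes, cudaMemcpyDeviceToHost));  // waits for the kernel

    int errors = 0;
    for (int i = 0; i < n; ++i)
        if (h_c[i] != 3.0f) ++errors;
    printf("%s\n", errors == 0 ? "PASSED" : "FAILED");

    CHECK(cudaFree(d_a)); CHECK(cudaFree(d_b)); CHECK(cudaFree(d_c));
    free(h_a); free(h_b); free(h_c);
    return errors == 0 ? EXIT_SUCCESS : EXIT_FAILURE;
}
```

Compile it with nvcc vecadd.cu -o vecadd and run ./vecadd on a node with a GPU; it prints PASSED if every element equals 1.0 + 2.0.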


To compile for a specific compute capability, use the corresponding -arch compiler flag. For example, to target compute capability 6.0:

nvcc -arch=sm_60 <program>.cu -o <program>
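
To find out which compute capability the GPUs of a node have, and thus which sm_XY value to pass to -arch, you can query the device properties through the runtime API (the deviceQuery sample reports this as well). In this sketch, cc_query.cu and arch_number are hypothetical names.

```cuda
// cc_query.cu (hypothetical name): print the compute capability of each GPU
#include <cstdio>
#include <cuda_runtime.h>

// nvcc architecture number for a compute capability, e.g. 6.0 -> 60 (sm_60).
static int arch_number(int major, int minor) { return 10 * major + minor; }

int main() {
    int count = 0;
    cudaError_t err = cudaGetDeviceCount(&count);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceCount: %s\n", cudaGetErrorString(err));
        return 1;
    }
    for (int dev = 0; dev < count; ++dev) {
        cudaDeviceProp prop;
        err = cudaGetDeviceProperties(&prop, dev);
        if (err != cudaSuccess) {
            fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
            return 1;
        }
        printf("Device %d: %s, compute capability %d.%d -> nvcc -arch=sm_%d\n",
               dev, prop.name, prop.major, prop.minor,
               arch_number(prop.major, prop.minor));
    }
    return 0;
}
```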



Combining MPI and CUDA allows you to scale over several nodes. For instance, you can run one MPI process per node, with each process using one GPU (or two, if available). Another possible scenario on dual-GPU nodes is to start two MPI processes per node, each of which uses one GPU.
To use CUDA with MPI, we recommend compiling your CUDA code with nvcc and then linking it with $MPICC or $MPICXX, specifying the CUDA libraries explicitly. For example:

nvcc -arch=sm_60 -m64 -c foo.cu -o foo.o
$MPICXX -c bar.cpp -o bar.o
$MPICXX foo.o bar.o -o foobar.exe -L$CUDA_ROOT/lib64 -lcudart

Run your program with $MPIEXEC.
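
A minimal sketch of the CUDA side of such a hybrid program, i.e. the CUDA source compiled into foo.o above (assumed here to be named foo.cu), might look as follows. It assumes that bar.cpp initializes MPI, determines each process's rank within its node, for example with MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0, MPI_INFO_NULL, &node_comm) followed by MPI_Comm_rank(node_comm, &local_rank), and then calls gpu_scale, declared in bar.cpp as extern "C" int gpu_scale(int, float *, float, int);. The functions gpu_scale and select_device_for_rank and the round-robin mapping of ranks to GPUs are hypothetical choices for illustration.

```cuda
// foo.cu (assumed name): GPU part of a hybrid MPI + CUDA program.
// It has no main(); the MPI code in bar.cpp provides it.
#include <cuda_runtime.h>

// Round-robin mapping of node-local MPI ranks to GPUs (an assumption):
// on a dual-GPU node, local rank 0 -> device 0, local rank 1 -> device 1.
extern "C" int select_device_for_rank(int local_rank, int device_count) {
    return device_count > 0 ? local_rank % device_count : -1;
}

// Multiply each element by factor.
__global__ void scale(float *x, float factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) x[i] *= factor;
}

// Hypothetical entry point called from bar.cpp: selects a GPU for this rank,
// scales host_data[0..n) on it and copies the result back.
// Returns 0 on success and 1 on any CUDA error.
extern "C" int gpu_scale(int local_rank, float *host_data, float factor, int n) {
    if (n <= 0) return 0;
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess) return 1;
    int dev = select_device_for_rank(local_rank, count);
    if (dev < 0 || cudaSetDevice(dev) != cudaSuccess) return 1;

    const size_t bytes = (size_t)n * sizeof(float);
    float *d = nullptr;
    if (cudaMalloc(&d, bytes) != cudaSuccess) return 1;
    int rc = 0;
    if (cudaMemcpy(d, host_data, bytes, cudaMemcpyHostToDevice) != cudaSuccess) {
        rc = 1;
    } else {
        const int threads = 256;
        scale<<<(n + threads - 1) / threads, threads>>>(d, factor, n);
        if (cudaGetLastError() != cudaSuccess ||
            cudaMemcpy(host_data, d, bytes, cudaMemcpyDeviceToHost) != cudaSuccess)
            rc = 1;
    }
    cudaFree(d);
    return rc;
}
```

With two processes per dual-GPU node, the modulo mapping gives each process its own device; using the node-local rank rather than the global MPI rank keeps the mapping correct on every node.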

By default, nvcc uses the GNU compiler as host compiler (check with nvcc -v or nvcc -dryrun). You can also use other host compilers together with nvcc. For instance, to use the Intel compiler:

nvcc -ccbin [<path to intel compiler>/]icc ...
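
To double-check from inside a program which host compiler nvcc actually used, you can inspect the compiler's predefined macros, since host code is compiled by the compiler given with -ccbin. This sketch is only an illustration (which_host_compiler.cu and host_compiler are hypothetical names); note that the Intel and Clang compilers also define __GNUC__, so they are tested first.

```cuda
// which_host_compiler.cu (hypothetical name): report the host compiler
#include <cstdio>

// Host code is compiled by the host compiler nvcc invokes (-ccbin), so its
// predefined macros identify it. Intel and Clang compilers also define
// __GNUC__ for compatibility, so they are checked before it.
static const char *host_compiler() {
#if defined(__INTEL_LLVM_COMPILER)
    return "Intel oneAPI (icx/icpx)";
#elif defined(__INTEL_COMPILER)
    return "Intel classic (icc/icpc)";
#elif defined(__clang__)
    return "Clang";
#elif defined(__GNUC__)
    return "GNU";
#else
    return "unknown";
#endif
}

int main() {
    printf("Host compiler: %s\n", host_compiler());
    return 0;
}
```

Compiled once with plain nvcc and once with nvcc -ccbin icc, the reported host compiler should change accordingly.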

CUDA Samples

The NVIDIA GPU Computing SDK provides many examples in CUDA C. They can be used to verify that the GPU is set up correctly (see the examples deviceQuery and bandwidthTest), as a starting point for your own application, and to give you an idea of how to implement certain algorithms on a GPU. You can find the how-to of the SDK here.
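
As an illustration of what bandwidthTest measures, the following much simplified sketch times repeated host-to-device copies from pinned host memory with CUDA events. It is not the sample itself; the file name h2d_bandwidth.cu, the CHECK macro, the helper gb_per_s, the 256 MiB buffer and the 10 repetitions are arbitrary choices for this example.

```cuda
// h2d_bandwidth.cu (hypothetical name): rough host-to-device copy bandwidth
#include <cstdio>
#include <cstdlib>
#include <cstring>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                   \
    do {                                                              \
        cudaError_t err_ = (call);                                    \
        if (err_ != cudaSuccess) {                                    \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,        \
                    cudaGetErrorString(err_));                        \
            exit(EXIT_FAILURE);                                       \
        }                                                             \
    } while (0)

// Bytes copied in `ms` milliseconds, expressed in GB/s (1 GB = 1e9 bytes).
static double gb_per_s(size_t bytes, float ms) {
    return (double)bytes / (ms * 1.0e-3) / 1.0e9;
}

int main() {
    const size_t bytes = (size_t)256 << 20;  // 256 MiB per copy
    const int reps = 10;
    void *h = nullptr, *d = nullptr;
    CHECK(cudaMallocHost(&h, bytes));  // pinned (page-locked) host memory
    CHECK(cudaMalloc(&d, bytes));
    memset(h, 0, bytes);

    cudaEvent_t start, stop;
    CHECK(cudaEventCreate(&start));
    CHECK(cudaEventCreate(&stop));

    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));  // warm-up copy
    CHECK(cudaEventRecord(start));
    for (int r = 0; r < reps; ++r)
        CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice));
    CHECK(cudaEventRecord(stop));
    CHECK(cudaEventSynchronize(stop));

    float ms = 0.0f;
    CHECK(cudaEventElapsedTime(&ms, start, stop));
    printf("Host to device: %.2f GB/s\n", gb_per_s(reps * bytes, ms));

    CHECK(cudaEventDestroy(start));
    CHECK(cudaEventDestroy(stop));
    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return 0;
}
```

Pinned memory is used because pageable memory from malloc typically reaches lower transfer rates. For reliable numbers, use the bandwidthTest sample itself, which also covers device-to-host and device-to-device copies.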


PGI CUDA Fortran

Please refer to PGI's web pages.

last changed on 29.01.2021
