



The Nvidia CUDA Toolkit provides support for CUDA C.

Depending on the version, you may have to load additional modules before you can load CUDA:

module load CUDA/11.8.0

Available CUDA versions can be listed with module spider CUDA. Specifying a version lists the modules that must be loaded first: module spider CUDA/11.8.0

Table of Contents

  2. CUDA + MPI
  3. CUDA Samples
  4. PGI CUDA Fortran




Loading the module prepends the correct locations to your $PATH and $LD_LIBRARY_PATH variables and also sets the variable $CUDA_ROOT, which contains the root directory of the loaded toolkit installation. The currently loaded version can be checked with nvcc --version. Documentation is no longer available under $CUDA_ROOT/doc; instead, please visit NVIDIA's online CUDA documentation.

You can compile and link your CUDA C program, e.g. a file called prog.cu, with:

nvcc -o prog.exe prog.cu

If you want to set a certain compute capability, you can use the corresponding compiler flag. For example, to set compute capability 6.0 do:

nvcc -arch=sm_60

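For reference, a minimal CUDA C source file that can be built with the commands above might look like the following sketch (the file name saxpy.cu and all identifiers are illustrative, not part of the cluster setup):

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Simple SAXPY kernel: y[i] = a * x[i] + y[i]
__global__ void saxpy(int n, float a, const float *x, float *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    // Unified memory keeps the example short;
    // explicit cudaMalloc/cudaMemcpy works just as well.
    cudaMallocManaged(&x, n * sizeof(float));
    cudaMallocManaged(&y, n * sizeof(float));
    for (int i = 0; i < n; ++i) { x[i] = 1.0f; y[i] = 2.0f; }

    // Launch with 256 threads per block, enough blocks to cover n elements
    saxpy<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    printf("y[0] = %f\n", y[0]);  // 2 * 1 + 2 = 4
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Compile it e.g. with nvcc -arch=sm_60 -o saxpy.exe saxpy.cu and run it on a GPU node.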

CUDA + MPI

Combining MPI and CUDA makes it possible to scale across several nodes. For instance, you can run one MPI process per machine, with each process using one (or, if available, two) GPUs. Another possible scenario for dual-GPU machines is to run two MPI processes per node, each using one GPU.

To use CUDA with MPI, our recommendation is to compile your CUDA code with nvcc and then link it with $MPICC or $MPICXX by explicitly specifying the CUDA libraries. For example:

nvcc -arch=sm_60 -m64 -c foo.cu -o foo.o
$MPICXX -c bar.cpp -o bar.o
$MPICXX foo.o bar.o -o foobar.exe -L$CUDA_ROOT/lib64 -lcudart

Run your program with $MPIEXEC.
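In such a setup, each rank has to select its own GPU. A minimal sketch of this device mapping is shown below (all names are illustrative; rank % deviceCount assumes identical GPU counts per node, and a more robust version would derive a node-local rank, e.g. via MPI_Comm_split_type):

```cuda
#include <cstdio>
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    // Map each MPI rank to one of the GPUs visible on its node
    int deviceCount = 0;
    cudaGetDeviceCount(&deviceCount);
    int device = rank % deviceCount;
    cudaSetDevice(device);

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, device);
    printf("Rank %d uses GPU %d (%s)\n", rank, device, prop.name);

    MPI_Finalize();
    return 0;
}
```

Build it with the nvcc/$MPICXX two-step shown above and launch it with $MPIEXEC.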

Usually nvcc uses the GNU compiler as host compiler (check with nvcc -v or nvcc -dryrun). But you can also use other compilers in combination with nvcc. For instance, to make it work with the Intel compiler, use:

nvcc -ccbin [<path to intel compiler>/]icc ...

CUDA Samples

The NVIDIA GPU Computing SDK provides many examples in CUDA C. They can be used to verify the correct setup of the GPU (see the examples deviceQuery and bandwidthTest), as a starting point for your own application, and to give you an idea of how to implement certain algorithms on a GPU. The samples are available in NVIDIA's cuda-samples repository on GitHub.

PGI CUDA Fortran

Please refer to PGI's webpages.

Last modified: 04.04.2023


This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License.