OpenACC
OpenACC on the RWTH cluster is supported by the PGI compiler and the latest GCC compiler. You should start your development sessions by switching the default Intel compiler to the PGI (or latest GCC) compiler:
$ module switch intel pgi    # or: $ module switch intel gcc/9    # or: $ module load pgi
To simplify and unify the usage of OpenACC, we introduced the $FLAGS_OFFLOAD_OPENACC and $FLAGS_<COMPILER>_OFFLOAD_OPENACC environment variables, which can be used for building applications with all supported compilers.
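To see which flags the currently loaded compiler module provides (the exact value depends on the module), you can simply print the variable:
$ echo $FLAGS_OFFLOAD_OPENACC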
Be aware that the PGI compiler is a commercial tool, so you will need a license if you want to install it on your home computer. However, you can use PGI's free Community Edition for personal use; please refer to their webpage.
Once the compiler module has been loaded, you can find everything under the directory $PGI/linux86-64/<YYYY>, e.g. the include files in the subdirectory include and the executables in bin. Furthermore, there is an examples directory, and the folder doc contains the documentation. The PGI compiler comes with its own CUDA Toolkit, which means that you do not have to load the CUDA Toolkit module first. However, it also means that PGI may ship a different CUDA Toolkit version than the one supported on the cluster. The corresponding files are located in the folder cuda.
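To get an overview of this layout, list the installation directory (substitute the release year of the loaded module for <YYYY>); you should see the subdirectories mentioned above, i.e. bin, cuda, doc, examples, and include:
$ ls $PGI/linux86-64/<YYYY>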
When compiling, you have to supply the target platform (use the $FLAGS_OFFLOAD_OPENACC environment variable). If you want to get compiler feedback, you should also enable -Minfo=accel. The corresponding command line for a C program looks like the following:
pgcc -Minfo=accel $FLAGS_OFFLOAD_OPENACC pi.c
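The contents of pi.c are not part of this guide; a minimal sketch of such a program, assuming a simple midpoint-rule approximation of pi, could look like this:

#include <stdio.h>

int main(void) {
    const int n = 100000000;   /* number of integration intervals */
    const double h = 1.0 / n;  /* interval width */
    double sum = 0.0;

    /* Integrate 4/(1+x^2) over [0,1]; the result approximates pi.
       The loop is offloaded to the GPU via the OpenACC parallel loop directive. */
    #pragma acc parallel loop reduction(+:sum)
    for (int i = 0; i < n; i++) {
        double x = (i + 0.5) * h;  /* midpoint of interval i */
        sum += 4.0 / (1.0 + x * x);
    }

    printf("pi is approximately %.15f\n", sum * h);
    return 0;
}

With -Minfo=accel enabled, the compiler reports how it parallelizes this loop and which data it moves to and from the device.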
You can select a certain CUDA Toolkit version by extending the target flag, e.g. "cuda5.0" or just "5.0" for Toolkit 5.0 (analogously for other toolkit versions):
pgf90 -Minfo -ta=nvidia,5.0 jacobi.F90
The PGI compiler also creates binaries that by default are compatible with several compute capabilities (cf. the compiler feedback). If you only want an executable for a certain GPU type, specify the compute capability. For example, for compute capability 6.0:
pgf90 -Minfo -ta=nvidia,cc60 jacobi.F90
You can, of course, combine all target flags, as shown below.
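For example, the following combines a toolkit version and a compute capability in one build (CUDA 9.0 is just an illustrative choice here; pick a toolkit version that actually supports your target compute capability):
pgf90 -Minfo -ta=nvidia,cuda9.0,cc60 jacobi.F90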
The program pgaccelinfo prints information about all supported accelerator devices on the host.
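Run it without arguments after loading the PGI module:
$ pgaccelinfo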
Useful hints for PGI's OpenACC
If you would like to know (e.g. while developing or testing the software) whether your program is really being executed on the GPU, you can set the following environment variable on the command line:
export ACC_NOTIFY=1|2|3
With ACC_NOTIFY set, you get output that indicates every kernel launch and/or data transfer, including the source file name, function name, line number of the kernel, CUDA grid size, and CUDA block size, e.g.:
launch kernel file=f1.f90 function=main line=22 grid=32 block=256
You can also easily add a profiling library to your program that prints the GPU usage times (i.e. initialization, data movement, kernel execution). To achieve that, compile using the following option:
-ta=nvidia,time
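A complete command line for the pi.c example from above could then look like this; the timing summary is printed when the program finishes:
pgcc -Minfo=accel -ta=nvidia,time pi.c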
If you would like to keep the graphics card open, use pgcudainit. This reduces or eliminates the GPU initialization time in subsequent runs (useful, e.g., when doing time measurements).
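A typical pattern (assuming pgcudainit keeps running and holds the device open until it is terminated) is to start it in the background before your measurement runs:
$ pgcudainit &     # hold the GPU open in the background
$ ./a.out          # timed runs now skip most of the initialization overhead
$ kill %1          # release the device when you are done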