
openmpi

Short Information

Open MPI (http://www.openmpi.org) is developed by several groups and vendors. It may be used as an alternative to Intel MPI.


Detailed Information

1. How to access the software

To set up the environment for Open MPI, use

module unload intelmpi; module load openmpi

Currently, a version of Intel MPI is the default MPI in the cluster environment, so the Open MPI module is not loaded by default.


This will set environment variables for later use. The list of these variables can be obtained with

module help openmpi
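
For example, after switching modules you can verify which compiler drivers and launcher the variables point to. This is only a short sketch; the variable names are the ones also used in the examples below:

module unload intelmpi; module load openmpi
echo $MPICC $MPICXX $MPIFC
echo $MPIEXEC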

The compiler drivers are mpicc for C, mpicxx and mpiCC for C++, and mpif77 and mpif90 (or mpifort since version 1.7) for Fortran. MPI programs are started with mpiexec.
We strongly recommend using the environment variables $MPIFC, $MPICC, $MPICXX and $MPIEXEC set by the module system, in particular because these variables always point to the compiler drivers matching the most recently loaded compiler module. Example:

$MPIFC -c prog.f90
$MPIFC prog.o -o prog.exe
$MPIEXEC -np 4 prog.exe

To start your Open MPI job, please use the $MPIEXEC and $FLAGS_MPI_BATCH environment variables:

$MPIEXEC $FLAGS_MPI_BATCH python2 my_mpi4py_script.py

Note: Using 'srun' instead of $MPIEXEC may not work.
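
For batch jobs, the same call is placed inside a job script. The following is a minimal sketch only; the Slurm directives, job name and executable are placeholders and must be adapted to your project and the cluster's batch documentation:

#!/usr/bin/env bash
### Minimal sketch of an Open MPI batch job (all #SBATCH values are placeholders)
#SBATCH --job-name=openmpi_test
#SBATCH --ntasks=4
#SBATCH --time=00:15:00

### Make sure Open MPI (and not Intel MPI) is loaded
module unload intelmpi; module load openmpi

### $MPIEXEC and $FLAGS_MPI_BATCH are set by the openmpi module
$MPIEXEC $FLAGS_MPI_BATCH prog.exe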

Further Information / Known issues

  • Open MPI versions 1.10.7 and older have a race condition in the handling of simultaneous MPI_Abort calls, cf. https://www.mail-archive.com/users@lists.open-mpi.org//msg31755.html As a consequence, an MPI job calling MPI_Abort can remain stuck in 'RUN' state until the end of its maximum run time instead of terminating immediately. To prevent wasting resources (the job called MPI_Abort and is dead, but still blocks its resources), please check the output of your jobs from time to time and remove dead jobs (see the sketch after this list).

  • In Open MPI 2.x and later, the memory hooks (used for memory allocation) have been removed, cf. https://www.mail-archive.com/users@lists.open-mpi.org//msg31048.html This results in a loss of up to 30% of the maximal bandwidth on an InfiniBand network (old Bull cluster). When using newer Open MPI versions on an InfiniBand network, try the 'memalign' script:

    $MPIEXEC $FLAGS_MPI_BATCH memalign a.out
  • Does your application crash from time to time on the new CLAIX nodes, but is known to run well on the old Bull nodes? Try adding '-x PSM2_KASSIST_MODE=none' to the command line:

    $MPIEXEC -x PSM2_KASSIST_MODE=none $FLAGS_MPI_BATCH a.out
  • Open MPI is not ABI compatible between major versions (e.g. 1.10.x and 3.1.0), and sometimes not even between minor releases. Starting an old binary with a newer (major) version of Open MPI loaded results in undefined behaviour; recompile your application with the currently loaded Open MPI version instead.
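
A minimal sketch for cleaning up jobs affected by the MPI_Abort issue above, assuming the Slurm batch system (see the 'srun' note); the job ID and output file name are placeholders:

squeue -u $USER               # list your jobs and their current state
tail slurm-<jobid>.out        # inspect the job output (default Slurm output file name)
scancel <jobid>               # remove the job if the application has already aborted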

last changed on 29.01.2021
