Open MPI (http://www.openmpi.org) is developed by several groups and vendors. It may be used as an alternative to Intel MPI.
1. How to access the software
To set up the environment for Open MPI, use
module unload intelmpi; module load openmpi
Intel MPI is currently the standard MPI in the cluster environment, so the Open MPI module is not loaded by default and the Intel MPI module has to be unloaded first.
Loading the openmpi module sets environment variables for further use. The list of variables can be obtained with
module help openmpi
The compiler drivers are mpicc for C, mpif77 and mpif90 (or mpifort since v1.7) for Fortran, and mpicxx or mpiCC for C++. To start MPI programs, mpiexec is used.
We strongly recommend using the environment variables $MPIFC, $MPICC, $MPICXX and $MPIEXEC set by the module system, in particular because the compiler driver variables are set according to the most recently loaded compiler module. Example:
$MPIFC -c prog.f90
$MPIFC prog.o -o prog.exe
$MPIEXEC -np 4 ./prog.exe
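Before building, it can help to confirm that the module system actually set these variables in the current shell. The following is a minimal sketch, assuming a POSIX shell; the function name check_mpi_env is hypothetical and not part of the module system:

```shell
# Hypothetical helper: report which module-provided variables are set.
# Prints NAME=value for each variable that is set, a message on stderr
# for each one that is missing, and returns non-zero if any is missing.
check_mpi_env() {
    status=0
    for name in MPIFC MPICC MPICXX MPIEXEC; do
        # Indirect lookup of the variable called $name
        eval "value=\${$name:-}"
        if [ -n "$value" ]; then
            echo "$name=$value"
        else
            echo "$name is not set" >&2
            status=1
        fi
    done
    return $status
}
```

Calling check_mpi_env after loading the openmpi module should list all four variables; a non-zero return status suggests the module is not loaded in this shell.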
To start your Open MPI job, please use the environment variables $MPIEXEC and $FLAGS_MPI_BATCH:
$MPIEXEC $FLAGS_MPI_BATCH python2 my_mpi4py_script.py
Note: using 'srun' instead of $MPIEXEC may not work.
Further Information / Known issues
Open MPI versions 1.10.7 and older have a race condition in the handling of simultaneous MPI_Abort calls, cf. https://firstname.lastname@example.org//msg31755.html As a consequence, an MPI job that calls MPI_Abort can become stuck in the 'RUN' state until the maximum run time is reached instead of terminating immediately. To avoid wasting resources (the job has called MPI_Abort and is dead, but still holds its resources), please check the output of your jobs from time to time and remove dead jobs.
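One way to check job output "from time to time" is to test whether the output file is still being written. The following is a minimal sketch, assuming a 'find' that supports -mmin (GNU and BSD find do); the function name output_is_stale and the file name in the usage note are hypothetical. A stale output file is only a hint that a job may be stuck, not proof:

```shell
# Hypothetical helper: succeeds (status 0) if FILE was last modified
# more than MINUTES minutes ago, fails otherwise (also if FILE is missing).
# Usage: output_is_stale FILE MINUTES
output_is_stale() {
    [ -n "$(find "$1" -mmin +"$2" 2>/dev/null)" ]
}
```

For example, 'output_is_stale my_job_output.txt 60 && echo "no output for over an hour"' would flag a job whose output file has not changed for more than an hour; whether that job is really dead still has to be checked by hand.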
In Open MPI 2.x and later, memory hooks (for allocating memory) have been removed, cf. https://email@example.com//msg31048.html This results in a loss of up to 30% of the maximum bandwidth on an InfiniBand network (old Bull cluster). When using newer Open MPI versions on an InfiniBand network, try the 'memalign' script:
$MPIEXEC $FLAGS_MPI_BATCH memalign a.out
Does your application crash from time to time on the new CLAIX nodes although it is known to run well on the old Bull nodes? Try adding '-x PSM2_KASSIST_MODE=none' to the command line:
$MPIEXEC -x PSM2_KASSIST_MODE=none $FLAGS_MPI_BATCH ./a.out
Open MPI is not ABI compatible between major versions (e.g. 1.10.x and 3.1.0), and sometimes not even between minor releases. Trying to start an old binary with a new [major] version of Open MPI loaded results in undefined behaviour.
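To see which MPI shared libraries a binary expects, the output of 'ldd' can be filtered for libmpi entries. The following is a minimal sketch, assuming a dynamically linked binary and a grep that supports -o; the function name mpi_sonames is hypothetical. It rests on the assumption that the number after '.so.' changes when the library ABI changes, so a mismatch with the loaded module is a hint rather than a definitive test:

```shell
# Hypothetical helper: read 'ldd' output on stdin and print the distinct
# libmpi shared-object names (e.g. libmpi.so.N) that appear in it.
mpi_sonames() {
    grep -o 'libmpi[A-Za-z0-9_]*\.so\.[0-9]*' | sort -u
}
```

Usage: 'ldd ./prog.exe | mpi_sonames'. If ldd reports a libmpi entry as "not found" with the current module loaded, the binary was most likely built against a different Open MPI version and should be rebuilt.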