python

Brief information

Python is a widely used programming language found in many domains. The availability of many libraries (Python modules, not to be confused with the environment modules system) makes it well suited to scientific computing.

Note: for remarks on Jupyter Notebook see this page; for TensorFlow as a representative of Python-based DL/AI frameworks see this page.

Note: Python version 2 and version 3 are largely incompatible. Python 2 reached EOL on 2020-01-01 and is deprecated. Migrate your application to Python 3.

Note: The Python 3 binaries are typically called 'python3', 'pip3' and so on; there are also 'python2' and 'pip2' (and so on) binaries for Python 2, so essentially two versions of Python are available at the same time. The plain name 'python' resolves to python2 or python3 depending on the environment; today it is typically linked to version 2 for historical reasons. This may be subject to change. It is highly recommended to use 'python2' / 'python3' instead of 'python', even if 'python' links to the right version in the current environment, as this ensures that the intended version of the Python interpreter is used.
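
To see what the names above resolve to in a given environment, a small script can help. The following is only a minimal sketch, assuming it is run with 'python3'; the helper names 'interpreter_summary' and 'resolve_commands' are made up for this illustration.

```python
import shutil
import sys


def interpreter_summary():
    """Return path and version of the interpreter running this script."""
    return {
        "executable": sys.executable,
        "major": sys.version_info.major,
        "version": "{}.{}.{}".format(*sys.version_info[:3]),
    }


def resolve_commands(names=("python", "python2", "python3")):
    """Map each command name to the binary found on PATH (None if absent)."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    print(interpreter_summary())
    for name, path in resolve_commands().items():
        print("{:8s} -> {}".format(name, path))
```

Running it once after login and once after loading a Python environment module shows how the resolution of 'python' changes.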

Python software in the HPC Cluster comes from several sources:


 Detailed information

1. Python brought along by the OS distribution (Linux)

In Linux, a version of Python is bundled with the distribution and delivered as RPM. This Python and some Py-modules (NumPy, SciPy, Matplotlib, ...) are available by default. If you miss a Py-module, let us know; maybe there is a suitable RPM available. Not all RPMs will be installed (e.g. we cannot install RPMs which depend on OS-distributed MPI packages, as these MPI RPMs are incompatible with the batch system).

The RPMs are installed in exactly the versions provided by the OS, which are typically older versions. The RPMs are updated according to the OS distribution update rules. We do not offer any other versions/updates/fixes besides the distribution RPMs. Users cannot modify this installation. You can add further Py-modules (or newer versions of preinstalled ones) in your $HOME as described below.

 

2. Python from the module system

If you need a version of Python other than the one available by default in the Linux environment, take a look at the output of the 'module avail python' command. It lists a number of Python installations available in the HPC Cluster. E.g. type 'module load python/3.9.1' to load the corresponding version of Python 3.

  • Note: the name of the Python 3 binary is 'python3' and not just 'python'! The name 'python' will still resolve to the Python 2 interpreter of the Linux distribution.
  • You can add Py-modules and packages in your $HOME as described below.
  • Some Py-modules (NumPy, SciPy, ...) are bundled in these installations. They are built in optimised form using Intel MKL, offering up to 2x speedup compared to vanilla installations using (parallel!) OpenBLAS, and likely a tremendous speedup compared to some unoptimised serial implementations (there is a danger of getting those in unoptimised 'Conda' installations). (Note: in older versions of the modules, NumPy/SciPy and friends are made available via the PYTHONPATH environment variable, which is set when the python environment module is loaded. Whenever you modify this environment variable, you will likely wish to extend the value instead of overwriting it, e.g. 'export PYTHONPATH=$HOME/my/newpatho/added:$PYTHONPATH'. Remember that entries in PYTHONPATH take precedence over $HOME/.local, so they override any (likely newer) versions of packages installed locally there, which can make a mess of your environment...)
  • If needed, install updated versions of preinstalled modules locally as described below, using 'pip3 install --upgrade ..' and friends.
 

3. Intel Python

Intel also offers Python releases, available via the environment modules. List the available versions with 'module avail pythoni' (note the 'i' at the end!) and load one by e.g. 'module load pythoni/3.7' for a Python 3 version. Intel Python also brings optimised NumPy+SciPy versions using Intel MKL. All notes below about installing software in $HOME and about PYTHONPATH apply. Note that Intel also calls the Python 3 binary 'python' in addition to 'python3', which distinguishes the Intel release of Python from the other ones.
 

4. Own version in your $HOME

You're free to install your own version of Python in your $HOME. All notes above about installing software in $HOME and PYTHONPATH apply.
 

5. Installing of additional Python packages

Users are welcome to add the Py-modules they need in their $HOME directory, e.g. using 'pip3'. Python software often relies on the GCC compilers and often does not work [out of the box] with the Intel compilers; thus type 'module switch intel gcc' before adding any Py-modules. You can install a Python module in your $HOME in several ways:

  • Install the modules in an arbitrary directory in $HOME (or $WORK or..) and make it known via the PYTHONPATH environment variable (ultimate flexibility, but you need to take care of the environment yourself), or
  • Install into the standard $HOME location $HOME/.local, e.g. by 'pip3 install --user ...' (easy to use as there is no need to set up the PYTHONPATH environment variable, but less flexible, with a danger of side effects when using multiple Python versions and/or very complicated toolchains like TensorFlow and other AI toolchains).
  • Use a virtual environment (like Conda). TBD.

Examples:

$ module switch intel gcc
$ export MYPY=$HOME/SomeDirForPythonInstall
$ mkdir -p $MYPY/lib/python3.6/site-packages
$ export PYTHONPATH=$PYTHONPATH:$MYPY/lib/python3.6/site-packages
$ easy_install-3.6 --prefix $MYPY theano
# Note: you need to set PYTHONPATH every time you wish to use the 'theano' Py-module!

$ module switch intel gcc
$ pip3 install --user theano
# Note: the installation goes into the $HOME/.local/[bin|lib] directories.
# Beware of clashes when using multiple Py-modules.
# You may want to add $HOME/.local/bin to your $PATH envvar.
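
Where 'pip3 install --user' places files for the interpreter currently in use can be queried from the standard 'site' module. This is a minimal sketch; the helper name 'user_install_locations' is made up for this illustration, and the 'bin' subdirectory assumes a Linux layout as on the cluster.

```python
import os
import site


def user_install_locations():
    """Return the per-user install locations of the running interpreter."""
    base = site.getuserbase()            # typically $HOME/.local on Linux
    user_bin = os.path.join(base, "bin")  # assumption: Linux directory layout
    path_entries = os.environ.get("PATH", "").split(os.pathsep)
    return {
        "user_base": base,
        "user_site": site.getusersitepackages(),
        "user_bin": user_bin,
        "user_bin_on_path": user_bin in path_entries,
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print("{:18s} {}".format(key, value))
```

The 'user_site' directory contains the Python version in its name, which is why packages installed with '--user' for one Python version are not seen by another.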

TBD: Conda/Anaconda/VirtualEnvironment example.

 

5.0 Checking the installed Python modules

Use 'pip3 -vvv list' to see the list of Python modules available in the active environment. Newer versions of 'pip' also list the location of each module, which hints at which version is used: the one bundled with the Python installation, one installed by 'pip3 install --user ...' into the default $HOME location (recognisable by $HOME/.local/lib/.... in the path), or one found via $PYTHONPATH in some other location.

For more insight into Python search paths (sys.path) see

If a module is available in more than one location (in different versions?), the first one found in the search paths will be used. This can be useful when installing an update to an already available Python module by 'pip3 install --user --upgrade ...', but it can also be a pitfall: PYTHONPATH supersedes the local path $HOME/.local/... - if PYTHONPATH contains an older version of a Python module, your installation using 'pip install --user FooBar' will succeed but will not be used!
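
Which copy of a module would actually be picked up can be checked without importing it. The following is a minimal sketch using only the standard library; the helper names 'locate_module' and 'pythonpath_entries' are made up for this illustration, and 'numpy' is merely an example module name.

```python
import importlib.util
import os
import sys


def locate_module(name):
    """Return the file a top-level module would be loaded from, or None if not found."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


def pythonpath_entries():
    """Return the non-empty directories listed in the PYTHONPATH envvar."""
    raw = os.environ.get("PYTHONPATH", "")
    return [entry for entry in raw.split(os.pathsep) if entry]


if __name__ == "__main__":
    print("PYTHONPATH entries:", pythonpath_entries())
    print("Search order (sys.path):")
    for position, entry in enumerate(sys.path):
        print("  {:2d} {}".format(position, entry))
    # Example module name; replace it with the module you are interested in.
    print("numpy would be loaded from:", locate_module("numpy"))
```

If the reported location lies in a PYTHONPATH directory although a newer version was installed with '--user', the pitfall described above applies.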

 

5.1 Deep Learning

5.2. MPI

It is possible to use Python with MPI (Message Passing Interface). One widely used interface is 'mpi4py'. You must install this package into your $HOME using the command 'pip2' or 'pip3' (depending on the Python version used):

$ pip2 install --user mpi4py
$ pip3 install --user mpi4py

Note that this will install a version of mpi4py adapted to the combination of Python (version), MPI library (vendor and version) and compiler (vendor and version). If the vendor of any of these components changes, and very likely also if the version of any of them changes, you must reinstall 'mpi4py'. Back up and/or clean up the content of your $HOME/.local prior to that step!
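
After (re)installing, a quick check can show whether 'mpi4py' loads at all and which MPI library it is running against. This is only a minimal sketch; the helper name 'describe_mpi4py' is made up for this illustration, and how the script is launched (e.g. through the MPI starter of your environment) is left open here.

```python
def describe_mpi4py():
    """Report whether mpi4py can be imported and, if so, basic MPI facts."""
    try:
        # Importing mpi4py.MPI initialises the MPI library.
        from mpi4py import MPI
    except ImportError as exc:
        # Not installed, or built against an MPI library that is not loaded.
        return {"available": False, "reason": str(exc)}
    comm = MPI.COMM_WORLD
    return {
        "available": True,
        "rank": comm.Get_rank(),
        "size": comm.Get_size(),
        "library": MPI.Get_library_version().strip(),
    }


if __name__ == "__main__":
    print(describe_mpi4py())
```

An import failure right after switching the MPI or compiler module is a typical sign that 'mpi4py' needs to be reinstalled as described above.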

 

5.3. NumPy, SciPy and BLAS implementations

NumPy and SciPy are two widely used scientific/numerical Python packages. They rely on a fast (and thread-parallel) BLAS/LAPACK implementation. Intel MKL is the recommended implementation on Intel CPUs and has proven to be up to 2x faster than OpenBLAS (which is not bad, either!). When using NumPy/SciPy linked to Intel MKL (or OpenBLAS) you can use all cores of today's multicore computers with significant (or even nearly linear) speedup. Set the number of threads using the OMP_NUM_THREADS envvar (or submit a shared-memory parallelised batch job). The most disadvantageous implementation of BLAS/LAPACK would be the native/naive implementation from netlib, which is not parallelised at all, stays single-core and is thus pretty slow. You can see the configuration of NumPy and SciPy in your environment with these commands:

$ python3 -c "import numpy as np; print(np.version.version); np.show_config()"
$ python3 -c "import scipy as sp; print(sp.version.version); sp.show_config()"
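
The effect of the thread count can be observed with a simple timing of a matrix product. The following is only a minimal sketch, not a rigorous benchmark; the helper name 'time_matmul', the matrix size and the default of 4 threads are arbitrary choices made for this illustration.

```python
import os
import time

# Assumption: 4 threads is only an illustrative default; an already exported
# OMP_NUM_THREADS is kept. The variable is set before NumPy is imported,
# because the threading runtime typically reads it when it is first loaded.
os.environ.setdefault("OMP_NUM_THREADS", "4")

try:
    import numpy as np
except ImportError:  # NumPy is not installed in this environment
    np = None


def time_matmul(n=2000, repeats=3):
    """Return the best wall-clock time (seconds) of an n x n matrix product.

    Returns None if NumPy is not available.
    """
    if np is None:
        return None
    np.random.seed(0)
    a = np.random.rand(n, n)
    b = np.random.rand(n, n)
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dispatched to the BLAS library NumPy is linked against
        best = min(best, time.perf_counter() - start)
    return best


if __name__ == "__main__":
    print("OMP_NUM_THREADS =", os.environ["OMP_NUM_THREADS"])
    print("best time [s]:", time_matmul())
```

Running the script several times with different exported values of OMP_NUM_THREADS gives a rough impression of how well the linked BLAS scales on the node.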

When installing an updated version of NumPy/SciPy, whether intentionally or as a side effect of installing a package by 'pip3 install --user --upgrade something', you can tell the build to use Intel MKL, as described here https://software.intel.com/content/www/us/en/develop/articles/build-numpy-with-mkl-and-icc.html by creating a '.numpy-site.cfg' file in your $HOME directory. Note that 'pip3' may ignore this file by default and install a prebuilt binary wheel, so force a build from source with the '--no-binary' flag:

$ cat $HOME/.numpy-site.cfg
[mkl]
library_dirs = /usr/local_rwth/sw/python/3.9.1/x86_64/extra/lib
include_dirs = /rwthfs/rz/SW/intel/Compiler/19.1/3.304/mkl/include
lapack_libs = mkl_rt
mkl_libs = iomp5,mkl_rt
$ pip3 install --user --force-reinstall --ignore-installed --no-binary :all: numpy
$ pip3 install --user --force-reinstall --ignore-installed --no-binary :all: scipy

After (re)installing NumPy you can, if interested, benchmark NumPy/SciPy and compare your results to

 

6. Known issues

  • When using mpi4py with Intel MPI 2018, the MPI processes will hang forever when started across multiple nodes. Workarounds:
    • configure your job to stay single-node
    • use intelmpi/2019
    • switch to Open MPI (you have to reinstall mpi4py from scratch!)
  • Do you see an error of type 'ImportError: No module named FooBar'?
    • A Python module named FooBar is missing. The solution in most cases is to install it using 'pip3 install --user FooBar' - for an example see the section '5. Installing of additional Python packages' above.
  • Do you see an error of type 'ImportError: /home/ab123456/.local/lib/python3.?/site-packages/............86_64-linux-gnu.so: undefined symbol: __intel_sse2_strlen'?
    • A Python module containing a library was built using the Intel compilers and is missing some libraries. Easiest solution: reinstall (everything) from scratch using the GCC compilers:
    • $ module switch intel gcc
    • (back up ~/.local)
    • mv ~/.local/lib/python3.?  ~/.local/lib/__RM_ME__python3.? 
    • pip3 install --user --no-cache-dir someUsefulPythonSoftware
  • If any strange behaviour persists even after reinstalling the software from scratch, use ' --no-cache-dir ' to disable pip's cache (which may contain malformed packages).

last modified on 11.06.2021

This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License