
Python

Brief Information

Python is a widespread programming language used in many domains. The availability of many libraries (Python modules, not to be confused with the environment module system) makes it well suited for scientific computing.

Note: for remarks on Jupyter Notebooks see this page; for TensorFlow, as a representative of Python-based DL/AI frameworks, see this page.

Note: Python 2 and 3 are largely incompatible. Python 2 reached end of life (EOL) on 2020-01-01 and is deprecated. Migrate your applications to Python 3.

Note: The Python 3 binaries are typically called 'python3', 'pip3' and so on; there are also 'python2' and 'pip2' (and so on) binaries for Python 2, making essentially two versions of Python available at the same time. Which version 'python' refers to is environment-dependent; today it typically links to version 2 of Python for historical reasons, but this may change. It is highly recommended to use 'python2' / 'python3' instead of 'python', even if 'python' links to the right version in your current environment, as this ensures the right version of the Python interpreter is used.

There are several sources of Python software in the HPC Cluster:


 Detailed Information

1. Python Provided by the OS Distribution (Linux)

On Linux, a version of Python is bundled with the distribution, delivered as an RPM. This Python and some Python modules (NumPy, SciPy, Matplotlib, ...) are available by default. If you are missing a Python module, let us know - maybe a suitable RPM is available. Not all RPMs can be installed (e.g. we cannot install RPMs which depend on OS-distributed MPI packages, as these MPI RPMs are incompatible with the batch system).

The RPMs typically contain older versions of the packaged software. They are updated according to the OS distribution's update rules. We do not offer any other versions/updates/fixes besides the distribution RPMs. Users cannot modify this installation, but you can add further Python modules (or newer versions of preinstalled ones) in your $HOME as described below.

 

2. Python from the Module System

If you need a different version of Python than the one available by default in the Linux environment, take a look at the output of the 'module avail python' command. It lists a number of Python installations available in the HPC Cluster. For example, type 'module load python/3.9.1' to load that version of Python 3, as shown below.
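A typical session looks like this (the versions shown by 'module avail' change over time):

$ module avail python       # list Python installations in the module system
$ module load python/3.9.1  # load one specific Python 3 version
$ python3 --version         # verify which interpreter is now active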

  • Note: the name of the Python 3 binary is 'python3' and not just 'python'! The name 'python' will still resolve to the Python 2 interpreter provided by the operating system.
  • You can add py-modules and packages in your $HOME as described below.
  • Some Python modules (NumPy, SciPy, ...) are bundled with these installations. They are built in optimised form using Intel MKL, offering up to 2x speedup compared to vanilla installations using (parallel!) OpenBLAS, and likely a tremendous speedup compared to unoptimised serial implementations (a danger in unoptimised 'Conda' installations). (Note: in older versions of these environment modules, NumPy/SciPy and friends are made available via the PYTHONPATH environment variable set when loading the Python environment module. Whenever you modify this variable, prepend to its value instead of overwriting it, e.g. 'export PYTHONPATH=$HOME/my/new/path:$PYTHONPATH'. Remember that PYTHONPATH entries take precedence over likely newer versions of packages installed locally in $HOME/.local, which can make a mess of your environment; see section 5.0 below for how to check which copy is used.)
  • Install updated versions of preinstalled modules locally if needed, as described below, using 'pip3 install --upgrade ...' and friends; see the example after this list.
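For example, to pull a newer NumPy into your $HOME than the version bundled with a loaded module (NumPy is only a placeholder here, and version numbers will differ):

$ pip3 install --user --upgrade numpy
$ python3 -c "import numpy; print(numpy.__version__)"   # confirm the new version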
 

3. Intel Python

Intel also offers Python releases, available via the environment modules. You can list the available versions with 'module avail pythoni' (note the 'i' at the end!) and load one with e.g. 'module load pythoni/3.7'. Intel Python also brings optimised NumPy+SciPy versions using Intel MKL. All notes below about installing software in $HOME and about PYTHONPATH apply. Note also that Intel calls the Python 3 binary 'python' in addition to 'python3'.
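A quick check of what you get after loading Intel Python (the version used here is only an example):

$ module load pythoni/3.7
$ which python python3   # both names are provided by Intel Python
$ python --version       # here 'python' is a Python 3 interpreter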
 

4. Own Version in Your $HOME

You're free to install your own version of Python in your $HOME. Using Conda is often preferred here, since building current releases of Python on CentOS 7 can prove tricky.
 

5. Installation of Additional Python Packages

Users are welcome to add needed Python modules in their $HOME directory, e.g. via 'pip3'. Python software often relies on the GCC compilers and often does not work out of the box with the Intel compilers; thus type 'module switch intel gcc' before adding any Python modules. You can install Python modules in your $HOME in several ways:

  • Install modules in the default personal location $HOME/.local, e.g. via 'pip3 install --user ...' (easy to use, less flexible, danger of side effects when using multiple Python versions and/or very complicated toolchains like TensorFlow and other AI toolchains)
  • Use a virtual environment (like Conda or venv); see the section on Conda below and the venv sketch after this list.
  • Install the modules in an arbitrary directory in $HOME (or $WORK or ...) and propagate them via the PYTHONPATH environment variable. This allows for more flexibility than the solutions above but should only be used if you are aware of the implications. If you are unsure, use one of the former methods.
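As an illustration of the venv option, a minimal workflow might look like this (the environment path '$HOME/venvs/myproject' is only an example):

$ module load python/3.9.1                # choose the interpreter the venv is based on
$ python3 -m venv $HOME/venvs/myproject   # create the virtual environment
$ source $HOME/venvs/myproject/bin/activate
(myproject) $ pip3 install theano         # installs into the venv, not into $HOME/.local
(myproject) $ deactivate                  # leave the environment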
 

Below you can see an example of how to install a Python module properly. Be aware that you might profit from choosing a recent GCC version (run 'module avail gcc' for reference). However, doing so introduces that specific GCC version as a dependency and forces you to switch modules before executing any Python script that relies on the modules installed with this GCC version. If you only need this for certain modules, switching to a virtual environment is the better solution. Python modules that take advantage of the Intel compiler are almost always best installed in a separate virtual environment. All modules installed in the default location chosen by pip need to be installed with the same compiler module (upgrading to a more recent compiler version is usually fine). That means you need to settle on one compiler and stick with it.

$ module switch intel gcc
$ pip3 install --user theano
# Note: the installation goes into the $HOME/.local/[bin|lib] directories.
# Beware of clashes when using multiple Python versions.
# You may want to add $HOME/.local/bin to your $PATH envvar, as shown below.
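For example (add the line to your shell startup file to make it permanent):

$ export PATH=$HOME/.local/bin:$PATH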

 

5.0 Checking the Installed Python Modules

Use 'pip3 -vvv list' to see the list of Python modules available in the active environment. Newer versions of 'pip' also list the location of each module, giving a hint about which installation is used: the one bundled with the Python installation, one installed via 'pip3 install --user ...' into the default $HOME location (recognisable by $HOME/.local/lib/... in the path), or one found via $PYTHONPATH in some other location.

For more insight into Python's search paths, inspect sys.path.
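You can print the effective search path directly (the entries depend on the loaded modules and on PYTHONPATH):

$ python3 -c "import sys; print('\n'.join(sys.path))"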

If a module is available in more than one location (possibly in different versions), the first one found along the search path is used. This can be useful when installing an update to an already available Python module via 'pip3 install --user --upgrade ...', but it can also be a pitfall: PYTHONPATH supersedes the local path $HOME/.local/... - if an older version of a Python module is in PYTHONPATH, your installation using 'pip3 install --user FooBar' will succeed but will not be used!
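To detect such shadowing, compare where pip installed a package with which copy actually gets imported (NumPy serves as an example):

$ pip3 show numpy | grep -i location                  # where pip installed the package
$ python3 -c "import numpy; print(numpy.__file__)"    # which copy is actually imported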

 

5.1 Deep Learning

For Python-based DL/AI frameworks such as TensorFlow, see the note at the top of this page.

5.2 MPI

It is possible to use Python with MPI (Message Passing Interface). One widely used interface is 'mpi4py'. You must install this package into your $HOME using 'pip2' or 'pip3' (depending on the Python version used):

$ pip2 install --user mpi4py
$ pip3 install --user mpi4py

Note that this will install a version of mpi4py adapted to the current combination of Python (version), MPI library (vendor and version), and compiler (vendor and version). In case of any change of vendor, and very likely also in case of any change of version of any of those components, you must reinstall 'mpi4py'. Back up and/or clean up the contents of your $HOME/.local prior to that step!
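A minimal smoke test, assuming an (interactive) allocation where the launcher of the loaded MPI module is available as 'mpiexec':

$ mpiexec -n 4 python3 -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('rank', c.Get_rank(), 'of', c.Get_size())"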

 

5.3 NumPy, SciPy and BLAS Implementations

NumPy and SciPy are two widely used scientific/numerical Python packages. They rely on a fast (thread-parallelised) BLAS/LAPACK implementation. Intel MKL is the recommended implementation on Intel CPUs and is proven to be up to 2x faster than OpenBLAS (which is not bad either!). When using NumPy/SciPy linked to Intel MKL (or OpenBLAS), you can use all cores of today's multicore computers with significant (or even nearly linear) speedup. Set the number of threads using the OMP_NUM_THREADS envvar (or submit a shared-memory parallelised batch job). The most disadvantageous implementation of BLAS/LAPACK is the native/naive implementation from Netlib, which is not parallelised at all, stays single-core and is thus pretty slow. You can see the configuration of NumPy and SciPy in your environment with these commands:

$ python3 -c "import numpy as np; print(np.version.version); np.show_config()"
$ python3 -c "import scipy as sp; print(sp.version.version); sp.show_config()"

When installing an updated version of NumPy/SciPy - knowingly or unknowingly, as a side effect of installing a package via 'pip3 install --user --upgrade something' - you can instruct the build to use Intel MKL, as described at https://software.intel.com/content/www/us/en/develop/articles/build-numpy-with-mkl-and-icc.html, by creating a '.numpy-site.cfg' file in your $HOME directory. Note that 'pip3' may ignore this file by default and install a prebuilt binary wheel, so force a source build with the '--no-binary' flag:

$ cat $HOME/.numpy-site.cfg
[mkl]
library_dirs = /usr/local_rwth/sw/python/3.9.1/x86_64/extra/lib
include_dirs = /rwthfs/rz/SW/intel/Compiler/19.1/3.304/mkl/include
lapack_libs = mkl_rt
mkl_libs = iomp5,mkl_rt
$ pip3 install --user --force-reinstall --ignore-installed --no-binary :all: numpy
$ pip3 install --user --force-reinstall --ignore-installed --no-binary :all: scipy

After (re)installation of NumPy, and if you are interested, you can benchmark NumPy/SciPy and compare your results, e.g. starting from the quick check below.
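A rough timing of a threaded matrix multiplication can serve as a first plausibility check (the matrix size is arbitrary, and timings are not reference values):

$ python3 -c "import numpy as np, time; a = np.random.rand(2000, 2000); t = time.time(); a @ a; print(time.time() - t, 's')"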

 

5.4 Using Conda

Conda is a package and environment manager that can be used to create custom environments independent of the Python versions provided via the modules system. The advantage of these environments is that you can install multiple versions of the same packages or even multiple versions of the Python interpreter without risking any interference. Conda installs precompiled packages by default, so the compiler modules become (mostly) irrelevant when installing conda packages.

  • When to use conda
    • If you need multiple independent installations of one package or Python itself
    • If you need to interoperate with non-Python code that is offered through conda
       

The following guide describes how to install conda via the Miniconda installer.

Installation

Choose an appropriate installer from https://docs.conda.io/en/latest/miniconda.html#linux-installers. If unsure about the Python version, choose the newest one. Look for the 64-bit Linux installer that doesn't have any extra information in its name, i.e. the installer for x86_64, and copy the link by right-clicking it and selecting the appropriate menu entry for your browser.

Execute the following commands on the cluster, using the link you copied. Keep in mind that the filename of the installer depends on your choice.

$ wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
$ bash Miniconda3-latest-Linux-x86_64.sh

Choose an installation directory of your liking (the default is fine) and let conda run conda init. This will modify your $HOME/.zshrc. Unless you plan to work exclusively using conda environments, we strongly recommend not auto-activating the default environment. To disable this behavior, run:

$ conda config --set auto_activate_base false
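Once installed, a typical workflow is to create and activate a named environment (the name 'myenv' and the package list are placeholders):

$ conda create --name myenv python=3.9 numpy scipy
$ conda activate myenv
(myenv) $ python3 --version
(myenv) $ conda deactivate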

 

Using Conda in Slurm Jobs

Conda works by inserting code into your interactive shell configuration file to make the 'conda' command available. The shell executing a Slurm job script does not read this configuration file, so you need to perform these steps manually in the job script. If you installed Miniconda in a non-default location, you need to modify the paths accordingly:

# Insert this AFTER the #SBATCH argument section of your job script
export CONDA_ROOT=$HOME/miniconda3
. $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"
# Now you can activate your configured conda environments
conda activate myenvironment


 

6. Known issues

  • When using mpi4py with Intel MPI 2018, the MPI processes hang forever when started across multiple nodes. Workarounds:
    • Configure your job to stay single-node
    • Use intelmpi/2019
    • Switch to Open MPI (you have to reinstall mpi4py from scratch!)
       
  • Do you see an error of type 'ImportError: No module named FooBar'?
    • A Python module named FooBar is missing. The solution in most cases is to install it using 'pip3 install --user FooBar'; see section '5. Installation of Additional Python Packages' above.
       
  • Do you see an error of type 'ImportError: /home/ab123456/.local/lib/python3.?/site-packages/............86_64-linux-gnu.so: undefined symbol: __intel_sse2_strlen'?
    • A Python module containing a library was built using the Intel compilers and is missing some libraries. Easiest solution: reinstall (everything) from scratch using the GCC compilers:
    • $ module switch intel gcc
    • (back up ~/.local first)
    • $ mv ~/.local/lib/python3.?  ~/.local/lib/__RM_ME__python3.?
    • $ pip3 install --user --no-cache-dir someUsefulPythonSoftware
       
  • If any strange behaviour persists after a reinstallation of software from scratch, use '--no-cache-dir' to disable pip's cache (which may contain malformed packages).

last changed on 09/15/2022


This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License