
Python

Brief Information

Python is a widespread programming language used in many domains. The availability of many libraries (Python modules, not to be confused with our module system) makes it well suited for scientific computing.

Table of Contents

  1. Python Provided by the OS Distribution (Linux)
  2. Python from the Module System (Recommended)
  3. Own Version in Your $HOME
    1. Using Conda
  4. Installation of Additional Python Packages
    1. Checking the Installed Python Modules
    2. Deep Learning
    3. MPI
    4. NumPy, SciPy and BLAS implementations
  5. Known issues

Detailed Information

 

Note

For remarks on Jupyter Notebook see this page; for TensorFlow, as a representative of Python-based DL/AI frameworks, see this page.
Python 2 reached its end of life (EOL) on 2020-01-01, and migration to Python 3 has been advised since 2008(!). Python 2 support may (and eventually will) be dropped with a future upgrade without explicit notice. Migrate your applications to Python 3.

Python Provided by the OS Distribution (Linux)

Our Linux installation provides basic Python installations without the need for Conda: python2 and python3 are available in the default (non-Conda) environment. You can use these versions to run your own scripts. However, we strongly recommend using the Python versions provided via the module system.

Python from the Module System (Recommended)

You can list available Python module versions with the command:

module spider Python


You can pass a version number to the spider command, e.g. ml spider Python/3.10.4. This will tell you which compiler versions a specific Python version is available for. All toolchains built on that compiler version can be used to load that Python module.
 

After loading a Python module (a short loading example follows the list below), you can install Python packages in your $HOME.

  • Several Python modules provided via the module system have been built using optimized math libraries (e.g. Intel MKL, OpenBLAS), which offer a significant speedup compared to generic builds installed via pip or Conda. We recommend using these versions wherever possible.
  • If needed, install updated versions of preinstalled packages locally, as described below, using python3 -m pip install --user --upgrade and friends.
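
A minimal loading example (the module versions shown are illustrative; check module spider Python for the versions actually installed):

# Load a compiler toolchain and a matching Python module, then verify which interpreter is active.
# Versions below are examples only.
module load GCCcore/12.2.0
module load Python/3.10.8
which python3
python3 --version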

Own Version in Your $HOME

You're free to install your own version of Python in your $HOME. Using Conda is often preferred here.

Using Conda

Conda can be used to create custom environments independent of the Python versions provided via the module system. You can install multiple versions of the same package, or even multiple versions of the Python interpreter, without risking any interference.

  • When to use conda
    • If you need multiple independent installations of one package or Python itself
    • If you need to interoperate with non-Python code that is offered through conda

The following guide describes how to install conda via the Miniconda installer.

Base Environment Installation

Log in to a CPU login node, or to a GPU login node if you intend to use GPUs with your Conda environment.

Choose an appropriate installer from https://docs.conda.io/en/latest/miniconda.html#linux-installers . If unsure about the Python version, choose the newest one. Look for the 64-bit Linux installer without any extra information in its name, i.e. the installer for x86_64, and copy the link by right-clicking it and selecting the corresponding menu entry in your browser.

Execute the following commands on the cluster, using the link you copied. Keep in mind that the filename of the installer depends on your choice.

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
bash Miniconda3-latest-Linux-x86_64.sh

Please do not have the Conda base environment activated in your default shell environment. To disable automatic activation of the base environment, use:

conda config --set auto_activate_base false

Initialize the Conda base environment with:

conda init

You can then load the base environment again using:


export CONDA_ROOT=$HOME/miniconda3 
source $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"
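
A quick check that Conda is now available in the current shell (commands are illustrative):

# After sourcing conda.sh, the conda command should be found on your PATH
conda --version
conda env list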

Named Environment Installation

In your console, load any modules that are relevant for your named environment, for example CUDA-related ones:

module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1

Load your base environment:


export CONDA_ROOT=$HOME/miniconda3
source $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"

Create your named environment

conda create --name mytestenv python=3.9

In batch scripts you can then load the modules, then the base environment, and finally the named environment:


module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
export CONDA_ROOT=$HOME/miniconda3
source $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"
conda activate mytestenv 
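
Once the named environment is activated, packages can be installed into it; the package names below are only examples:

conda activate mytestenv
conda install -c conda-forge numpy scipy   # example packages from the conda-forge channel
python3 -m pip install requests            # pip (if present in the environment) also installs into it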
 

Installation of Additional Python Packages

Users are welcome to add needed Python modules in their $HOME directory, e.g. via pip3. Python software often relies on GCC compilers and often does not work out of the box with Intel compilers; thus type module switch intel foss before adding any Python modules. You can install Python modules in your $HOME in several ways:

  • Install modules in the default personal location $HOME/.local, e.g. via python3 -m pip install --user ... (easy to use, less flexible, danger of side effects when using multiple Python versions and toolchains)
  • Use a virtual environment (like Conda or venv), see the section on Conda above.
  • Install the modules in an arbitrary directory in $HOME (or another location) and propagate it via the $PYTHONPATH environment variable. This allows for more flexibility than the solutions above but should only be used if you are aware of the implications. If you are unsure, use one of the former methods.

Below you can see an example of how to install a Python module properly. Be aware that you might profit from choosing a recent GCC version. However, doing so introduces that specific GCC version as a dependency and forces you to switch modules before executing any Python script that relies on the modules installed with it. If you only need this for certain modules, switching to a virtual environment is the better solution. Python modules that take advantage of the Intel compiler are almost always best installed in a separate virtual environment.

All modules installed in the default location chosen by pip need to be installed with the same compiler module, the same math library (Intel MKL/OpenBLAS) and the same MPI implementation (upgrading to a more recent version is usually fine). That means you need to settle on one compiler and stick with it. If you want to maintain multiple environments for different toolchains, Python venvs are recommended (a sketch follows the example below).

ml switch intel foss
ml load Python
python3 -m pip install --user theano
# Note: the installation is in $HOME/.local/[bin|lib] directories.
# Beware of clashes if using multiple Py-modules
# It is advised to add $HOME/.local/bin to your $PATH variable for modules that add programs in that location
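
A minimal sketch of a per-toolchain venv, assuming the foss toolchain is loaded (the environment path and package name are arbitrary examples):

ml switch intel foss
ml load Python
python3 -m venv $HOME/venvs/foss-py3       # create the venv (path is an example)
source $HOME/venvs/foss-py3/bin/activate   # activate it; your prompt changes
python3 -m pip install --upgrade pip
python3 -m pip install theano              # packages land inside the venv, not in $HOME/.local
deactivate                                 # leave the venv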

Checking the Installed Python Modules

Use python3 -m pip -vvv list to see the list of Python modules available in the active environment. Newer versions of pip also list the location of each module, giving a hint as to which installation is actually used.

For more insight into the Python search paths, inspect sys.path (an example follows below).

If a module is available in more than one location (possibly in different versions), the first one found in the search path will be used. This can be useful when installing an update to an already-available Python module via python3 -m pip install --user --upgrade ..., but it can also be a pitfall: $PYTHONPATH supersedes the local path $HOME/.local/... - if an older version of a Python module is in $PYTHONPATH, your installation using python3 -m pip install --user FooBar will succeed but won't be used!
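
An illustrative way to inspect the search path and to see where a given package is actually loaded from (numpy is used as an example package):

# Print the interpreter's module search path (sys.path)
python3 -c "import sys; print('\n'.join(sys.path))"
# Show which installation of a package is picked up
python3 -c "import numpy; print(numpy.__file__)"
# pip's view of the installed package, including its 'Location:'
python3 -m pip show numpy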

Deep Learning

For TensorFlow and other Python-based DL/AI frameworks, see the page referenced in the note at the top of this document.

MPI

It is possible to use Python with MPI (Message Passing Interface). One widely used interface is mpi4py. We provide an Lmod module for mpi4py that can be loaded via the module command. Be aware that you need to load the right version for your Python interpreter version.

Alternatively you can install a local version of mpi4py:

python3 -m pip install --user mpi4py

Please note: this will install a version of mpi4py adapted to one specific combination of Python (version), MPI library (vendor and version) and compiler (vendor and version). In case of any change of vendor, and highly likely also in case of any change of version of any of those components, you must reinstall mpi4py. Back up and/or clean up the contents of your $HOME/.local prior to that step!
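
A quick, illustrative check that mpi4py works with the loaded MPI library (the launcher and process count are examples; inside a Slurm job you would typically use srun):

mpiexec -n 4 python3 -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print('rank', c.Get_rank(), 'of', c.Get_size())"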

NumPy, SciPy and BLAS implementations

NumPy and SciPy are two widely used scientific/numerical Python packages. They rely on a fast (and thread-parallelised) BLAS/LAPACK implementation. Intel MKL is the recommended implementation on Intel CPUs and has proven to be up to 2x faster than OpenBLAS. When using NumPy/SciPy linked to Intel MKL (or OpenBLAS), you can use all cores of today's multicore computers with a significant (or even nearly linear) speedup. The most disadvantageous BLAS/LAPACK implementation is the native/naive reference implementation from Netlib, which is not parallelised at all, stays single-core and is thus rather slow. You can see the configuration of NumPy and SciPy in your environment:

python3 -c "import numpy as np; print(np.version.version); np.show_config()"
python3 -c "import scipy as sp; print(sp.version.version); sp.show_config()"

When installing an updated version of NumPy/SciPy - whether deliberately or as a side effect of installing a package via python3 -m pip install --user --upgrade MODULE - you can advise pip to use Intel MKL, as described in the NumPy/SciPy Application Note, by creating a '.numpy-site.cfg' file in your $HOME directory. Note that pip may ignore this file by default and install a prebuilt binary wheel, so force a source build with the '--no-binary' flag:

cat $HOME/.numpy-site.cfg

Should give the output:

[mkl]
library_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64
include_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/include
lapack_libs = mkl_rt
mkl_libs = iomp5,mkl_rt

Adjust the paths depending on which MKL version you need. The actual paths for library_dirs and include_dirs are $MKLROOT/lib/intel64 and $MKLROOT/include, respectively; the value of $MKLROOT is visible after module load imkl/VERSION.

Then use:

python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: numpy
python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: scipy

After (re)installing NumPy you can, if interested, benchmark NumPy/SciPy and compare the results between BLAS backends.
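
A tiny, illustrative benchmark (matrix size and thread count are arbitrary; depending on the BLAS backend, MKL_NUM_THREADS or OPENBLAS_NUM_THREADS may apply instead of OMP_NUM_THREADS):

# Time a large matrix multiplication; a threaded BLAS should use several cores here
OMP_NUM_THREADS=4 python3 -c "
import time
import numpy as np
a = np.random.rand(4000, 4000)
b = np.random.rand(4000, 4000)
t0 = time.time()
a @ b
print('matmul took %.2f s' % (time.time() - t0))
"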

 

Known issues

  • Do you see an error of type ImportError: No module named FooBar?
    • A Python module named FooBar is missing. The solution in most cases is to install it using python3 -m pip install --user FooBar - see the section on installing additional packages above.
  • Do you see an error of type ImportError: /home/ab123456/.local/lib/python3.?/site-packages/............86_64-linux-gnu.so: undefined symbol: __intel_sse2_strlen?
    • A Python module containing a library was built using Intel compilers and is missing some Intel runtime libraries. Easiest solution: reinstall (everything) from scratch using GCC compilers:
    • module switch intel foss
    • (backup ~/.local)
    • mv ~/.local/lib/python3.? ~/.local/lib/__RM_ME__python3.?
    • python3 -m pip install --user --no-cache-dir someUsefulPythonSoftware
  • If any strange behaviour persists after reinstalling the software from scratch, use --no-cache-dir to disable pip's cache (which may contain malformed packages).

last changed on 12/04/2023


This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License