IT Center Help

You are in the service: RWTH High Performance Computing (Linux)

Python

Information

Python is a widespread programming language used in many domains. The availability of many libraries (Python modules, not to be confused with our module system) makes it well suited for scientific computing.

Instructions

Note

For Jupyter Notebook remarks see this page, for TensorFlow as a representative of Python-based DL/AI frameworks see this page.

Python 2 reached end of life (EOL) on 2020-01-01, and migration to Python 3 has been advised since 2008(!). Python 2 support may (= will) be dropped with a future upgrade without explicit notice. Migrate your application to Python 3.

Python Provided by the OS Distribution (Linux)

The Linux distribution installed on our systems includes basic Python installations, with "python2" and "python3" readily available in the default environment without requiring conda. While you can use these versions to run your scripts, we highly recommend utilizing the Python versions offered through the module system for optimal performance and compatibility.

Python from the Module System (Recommended)

You can list available Python module versions with the command:

module spider Python

You can also pass a version number to the spider command, e.g., module spider Python/3.10.4. This will tell you which compiler versions a specific Python version is available for. All toolchains built on that compiler version can be used to load that Python module.

After loading a Python module, you can install Python packages in your personal user space, i.e. $HOME.

Several Python modules provided via the module system have been built using optimized math libraries (e.g. Intel MKL, OpenBLAS), which offer a significant speedup compared to generic builds that can be installed using pip or conda. We recommend using these versions wherever possible.
If needed, install updated versions of preinstalled modules locally as described below, using python3 -m pip install --user --upgrade and related commands.

Python Software provided by Containers

For several software packages, e.g. TensorFlow, PyTorch and others, we provide prebuilt and optimized Apptainer containers that also include the required and dependent software packages. It is also possible to construct custom containers that can be shared, e.g. with team members.

Python via Custom Virtual Environments

If you have very specific requirements or want to use different versions of Python packages or software that is not provided by the previous solutions, you are free to install your own Python version or construct custom user-specific virtual environments. Common approaches for that include conda and venv environments.

IMPORTANT: Due to the current license model of Anaconda and issues that would arise when using Anaconda packages on our systems, we blocked all Anaconda package repositories in our firewall.

Instead, please make use of open-source variants such as Miniforge (conda-forge).

Using Conda

Conda can be used to create custom environments independent of the Python versions provided via the modules system. You can install multiple versions of the same packages or even multiple versions of the Python interpreter without risking any interference.

When to use conda
- If you need multiple independent installations of one package or Python itself
- If you need to interoperate with non-Python code that is offered through conda

The following guide describes how to install conda via the Miniforge installer. This distribution only uses open-source channels with permissive licenses such as conda-forge.

Installation of Conda Through Miniforge

Follow the installation instructions in the Miniforge readme.
We strongly advise NOT to activate the base environment unless you are aware of the interactions between the conda environment and the module system. If you have enabled the auto-activation on setup, you can undo this by executing

conda config --set auto_activate_base false

Use Conda in Slurm

Conda works by adding a command via your shell configuration file. This is not loaded in Slurm job scripts and thus conda is not immediately available. To fix this, you can add this snippet to your job script:

# if you installed Miniforge to a different location, change the path accordingly
export CONDA_ROOT=$HOME/miniforge3
source $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"
conda activate myenv
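
To confirm inside a job that the activation took effect, a short check at the start of the Python script can report which interpreter and environment are in use. This is a minimal sketch: it assumes that conda activate has exported the CONDA_PREFIX and CONDA_DEFAULT_ENV variables, and the function name describe_interpreter is made up for illustration.

```python
import os
import sys


def describe_interpreter(environ=None):
    """Return the interpreter path and the conda environment it belongs to, if any."""
    environ = os.environ if environ is None else environ
    conda_prefix = environ.get("CONDA_PREFIX")  # exported by `conda activate`
    # Resolve symlinks on both sides so the comparison is about real locations.
    inside_env = (
        conda_prefix is not None
        and os.path.realpath(sys.prefix) == os.path.realpath(conda_prefix)
    )
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "conda_env": environ.get("CONDA_DEFAULT_ENV"),
        "interpreter_in_conda_env": inside_env,
    }


if __name__ == "__main__":
    for key, value in describe_interpreter().items():
        print(f"{key}: {value}")
```

If interpreter_in_conda_env is False although an environment was activated, the job is most likely still picking up a different python3 earlier in $PATH.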

Why do package versions differ from my local computer?

Since September 2024, we have been overriding the default configuration of conda to use the conda-forge channel, since the default channels are not freely available in enterprise settings. This might mean that some packages are installed in newer or slightly different versions. For most purposes, this should not affect your experience as a user.

Installation of Additional Python Packages

Users are welcome to install needed Python modules in their $HOME directory, e.g. with pip3. Python software often relies on GCC compilers and often does not work out of the box with Intel compilers; therefore run module switch intel foss before installing any Python modules. You can install Python modules in your $HOME in several ways:

- Install modules in the default personal location $HOME/.local, e.g. via python3 -m pip install --user ... (easy to use, less flexible, danger of side effects when using multiple Python versions and toolchains).
- Use a virtual environment (such as conda or venv); see the section on conda.
- Install the modules in an arbitrary directory in $HOME (or another location) and propagate it via the $PYTHONPATH environment variable. This allows for more flexibility than the solutions above but should only be used if you are aware of the implications. If you are unsure, use one of the former methods.
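
Which directory a --user installation ends up in depends on the Python version that is currently loaded. The standard library site module reports it; the sketch below only reads these values and changes nothing. The helper name user_install_locations is illustrative, and the bin subdirectory reflects the usual Linux layout.

```python
import os
import site
import sys


def user_install_locations():
    """Report where `pip install --user` places packages for this interpreter."""
    user_base = site.getuserbase()          # typically $HOME/.local
    user_site = site.getusersitepackages()  # e.g. $HOME/.local/lib/pythonX.Y/site-packages
    return {
        "user_base": user_base,
        "user_site": user_site,
        "user_bin": os.path.join(user_base, "bin"),
        # Falsy inside a virtual environment or when Python runs with -s
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
        "user_site_on_sys_path": user_site in sys.path,
    }


if __name__ == "__main__":
    for key, value in user_install_locations().items():
        print(f"{key}: {value}")
```

Because the path contains the Python minor version, packages installed with one Python module are not visible after switching to a module with a different minor version.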

You might profit from choosing a recent GCC version. However, doing so introduces that specific GCC version as a dependency and forces you to switch modules before executing any Python script that relies on the modules installed with this GCC version. If you only need this for certain modules, switching to a virtual environment is the better solution. Python modules that take advantage of the Intel compiler are almost always best installed in a separate virtual environment.

All modules installed in the default location chosen by pip need to be installed with the same compiler module, the same math library (Intel MKL/OpenBLAS) and the same MPI implementation (upgrading to a more recent version is usually fine). That means that you need to settle on one compiler and stick with it. If you want to maintain multiple environments for different toolchains, Python venvs are recommended.

The following example shows how to install a Python module properly:

ml switch intel foss
ml load Python
python3 -m pip install --user theano
# Note: the installation is in $HOME/.local/[bin|lib] directories.
# Beware of clashes if using multiple Py-modules
# It is advised to add $HOME/.local/bin to your $PATH variable for modules that add programs in that location
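
Where several toolchains have to coexist, one virtual environment per toolchain keeps the installations apart. Such an environment is usually created with python3 -m venv; the sketch below does the same from Python through the standard library venv module. The directory name gcc-env is hypothetical, and pip bootstrapping is switched off only to keep the example quick.

```python
import pathlib
import tempfile
import venv


def create_environment(target_dir, with_pip=False):
    """Create a virtual environment in target_dir and return the path of its config file."""
    builder = venv.EnvBuilder(
        with_pip=with_pip,           # set True to bootstrap pip into the environment
        system_site_packages=False,  # hide the base interpreter's site-packages
    )
    builder.create(target_dir)
    return pathlib.Path(target_dir) / "pyvenv.cfg"


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as scratch:
        # "gcc-env" is a placeholder for an environment tied to one toolchain
        cfg = create_environment(pathlib.Path(scratch) / "gcc-env")
        print(cfg.read_text())
```

For real use, pick a persistent directory instead of a temporary one and create the environment after loading the compiler and Python modules it is meant to be used with.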

Checking the Installed Python Modules

Use python3 -m pip -vvv list to see the list of available Python modules in the active environment. Newer versions of pip also list the location of each module, which hints at which version is used.
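
The same information is available from within Python through the standard library importlib.metadata, which can be handy inside a job script. This is a rough sketch rather than a replacement for pip list; the function name installed_distributions is made up for illustration.

```python
from importlib import metadata


def installed_distributions():
    """Return sorted (name, version, location) tuples for the active environment."""
    rows = []
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
            version = dist.metadata.get("Version", "unknown")
        except Exception:  # unreadable or partially removed metadata directory
            continue
        if name is None:
            continue
        # locate_file("") resolves to the directory the distribution is installed in
        location = str(dist.locate_file(""))
        rows.append((name, version, location))
    return sorted(rows, key=lambda row: row[0].lower())


if __name__ == "__main__":
    for name, version, location in installed_distributions():
        print(f"{name:30} {version:15} {location}")
```

A package that appears twice with different locations is a hint that a copy in $HOME/.local coexists with one provided by the loaded module.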

For more insight into the Python search paths (sys.path), see

https://www.devdungeon.com/content/python-import-syspath-and-pythonpath-tutorial
https://datacadamia.com/lang/python/engine/searchpath
python3 -c 'import sys; print(sys.path)'

If a module is available in more than one location (possibly in different versions), the first one found in the search path is used. This can be useful when installing an update to an already available Python module with python3 -m pip install --user --upgrade ..., but it can also be a pitfall: $PYTHONPATH supersedes the local path $HOME/.local/.... If $PYTHONPATH contains an older version of a Python module, your installation using python3 -m pip install --user FooBar will succeed but will not be used!
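
To see whether a module is shadowed in this way, one can walk the search path and compare all candidate locations with the one Python actually resolves. The sketch below is deliberately simple: it only looks for pure-Python packages and single-file modules (not compiled extensions or namespace packages), and both function names are illustrative.

```python
import importlib.util
import os
import sys


def find_all_locations(module_name, search_path=None):
    """List every search path entry that could provide a top-level module, in lookup order."""
    search_path = sys.path if search_path is None else search_path
    hits = []
    for entry in search_path:
        directory = entry or os.getcwd()  # an empty entry means the current directory
        package_init = os.path.join(directory, module_name, "__init__.py")
        single_file = os.path.join(directory, module_name + ".py")
        for candidate in (package_init, single_file):
            if os.path.isfile(candidate):
                hits.append(candidate)
    return hits


def location_in_use(module_name):
    """Return the file Python would import for module_name, or None if it is not found."""
    spec = importlib.util.find_spec(module_name)
    return None if spec is None else spec.origin


if __name__ == "__main__":
    name = sys.argv[1] if len(sys.argv) > 1 else "json"
    print("in use:", location_in_use(name))
    for path in find_all_locations(name):
        print("candidate:", path)
```

If more than one candidate is printed, the first one is the copy that gets imported; any later one, such as a fresh installation in $HOME/.local, is ignored.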

Deep Learning

See TensorFlow

MPI

It is possible to use Python with MPI (Message Passing Interface). One widely used interface is mpi4py. We provide an Lmod module for mpi4py that can be loaded via the module command. Be aware that you need to load the right version for your Python interpreter version.

Alternatively you can install a local version of mpi4py:

python3 -m pip install --user mpi4py

Please note: This will install a version of mpi4py adapted to a specific combination of Python (version), MPI library (vendor and version) and compiler (vendor and version). If the vendor of any of those components changes, and very likely also if the version of any of them changes, you must reinstall mpi4py. Back up and/or clean up the content of your $HOME/.local prior to that step!
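
A quick way to check that mpi4py matches the loaded MPI library is to let every rank report itself together with the library it is linked against. The sketch below assumes nothing beyond mpi4py's MPI.COMM_WORLD and MPI.Get_library_version(); the function name mpi_identity is made up, and the serial fallback exists only so that the script also runs where mpi4py cannot be imported.

```python
def mpi_identity():
    """Return (rank, size, library description); falls back to a serial run without mpi4py."""
    try:
        from mpi4py import MPI  # importing mpi4py.MPI initialises MPI
    except ImportError:
        return 0, 1, "mpi4py not importable, running serially"
    comm = MPI.COMM_WORLD
    # Get_library_version() names the MPI library mpi4py is actually linked against
    return comm.Get_rank(), comm.Get_size(), MPI.Get_library_version().strip()


if __name__ == "__main__":
    rank, size, library = mpi_identity()
    print(f"rank {rank} of {size}: {library}")
```

When started through the MPI launcher of a job, each process should print a different rank and the same library string. If every process reports rank 0 of 1, mpi4py and the launcher are most likely using different MPI installations.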

NumPy, SciPy and BLAS implementations

NumPy and SciPy are two widely used scientific/numerical Python packages. They rely on a fast (and thread-parallel) BLAS/LAPACK implementation. Intel MKL is the recommended implementation on Intel CPUs and has proven to be up to 2x faster than OpenBLAS. When using NumPy/SciPy linked to Intel MKL (or OpenBLAS), you can use all cores of today's multicore computers with significant (or even nearly linear) speedup. The least favourable BLAS/LAPACK implementation is the reference implementation from netlib, which is not parallelised at all, stays on a single core and is therefore rather slow. You can inspect the configuration of NumPy and SciPy in your environment:

python3 -c "import numpy as np; print(np.version.version); np.show_config()"
python3 -c "import scipy as sp; print(sp.version.version); sp.show_config()"

When installing an updated version of NumPy/SciPy, whether deliberately or as a side effect of installing a package with python3 -m pip install --user --upgrade MODULE, you can instruct the build to use Intel MKL, as described in the NumPy/SciPy Application Note, by creating a '.numpy-site.cfg' file in your $HOME directory. Note that pip may ignore this file by default and install a prebuilt binary wheel, so force a source build with the '--no-binary' flag. Check the content of the file:

cat $HOME/.numpy-site.cfg

Should give the output:

[mkl]
library_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64
include_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/include
lapack_libs = mkl_rt
mkl_libs = iomp5,mkl_rt

Adjust the paths to the MKL version you need. The actual values for library_dirs and include_dirs are $MKLROOT/lib/intel64 and $MKLROOT/include. The value of $MKLROOT is visible after module load imkl/VERSION.
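
Instead of typing the long paths by hand, the section can be generated from $MKLROOT. The following is a minimal sketch that reproduces the four keys shown above; the function name render_numpy_site_cfg and the fallback path are placeholders, and the script only prints the text rather than writing the file.

```python
import configparser
import io
import os


def render_numpy_site_cfg(mkl_root):
    """Build the text of a .numpy-site.cfg [mkl] section for the given MKL root."""
    config = configparser.ConfigParser(interpolation=None)
    config["mkl"] = {
        "library_dirs": os.path.join(mkl_root, "lib", "intel64"),
        "include_dirs": os.path.join(mkl_root, "include"),
        "lapack_libs": "mkl_rt",
        "mkl_libs": "iomp5,mkl_rt",
    }
    buffer = io.StringIO()
    config.write(buffer)
    return buffer.getvalue()


if __name__ == "__main__":
    # MKLROOT is set after `module load imkl/VERSION`; the fallback is a placeholder
    print(render_numpy_site_cfg(os.environ.get("MKLROOT", "/path/to/mkl")), end="")
```

Redirecting the output to $HOME/.numpy-site.cfg after loading the desired imkl module yields a file of the form shown above.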

Then use:

python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: numpy
python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: scipy

After (re)installing NumPy, you can, if interested, benchmark NumPy/SciPy and compare your results to

NumPy/SciPy Application Note
https://github.com/tmolteno/necpp/issues/18
http://markus-beuckelmann.de/blog/boosting-numpy-blas.html
our benchmarks:
Here, 3.6.8.sysdef is the vanilla Python version from the Linux distribution; python/3.7.3 uses OpenBLAS and python/3.7.9 uses Intel MKL. These benchmarks were recorded with previous installations from a different module system.
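
A benchmark of this kind can be as small as timing one dense matrix product, which NumPy hands to the BLAS library it is linked against. The sketch below is not the benchmark used for the numbers referenced here; matrix size, repeat count and function names are arbitrary choices for illustration.

```python
import time


def best_time(func, repeats=3):
    """Run func several times and return the shortest wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func()
        durations.append(time.perf_counter() - start)
    return min(durations)


def matmul_seconds(n=1000, repeats=3):
    """Time an n x n double-precision matrix product computed by NumPy."""
    import numpy as np  # imported here so best_time() stays usable without NumPy

    rng = np.random.default_rng(0)
    a = rng.random((n, n))
    b = rng.random((n, n))
    return best_time(lambda: a @ b, repeats)


if __name__ == "__main__":
    try:
        seconds = matmul_seconds()
    except ImportError:
        print("NumPy is not installed in this environment")
    else:
        print(f"1000 x 1000 matmul: {seconds:.4f} s (best of 3)")
```

Running the script once per NumPy build, with the same number of cores each time, gives a rough comparison of the underlying BLAS implementations.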

Known issues

Do you see an error of type ImportError: No module named FooBar?
- A Python module named FooBar is missing. In most cases the solution is to install it using python3 -m pip install --user FooBar. See the section on installing additional Python packages.
Do you see an error of type ImportError: /home/ab123456/.local/lib/python3.?/site-packages/............86_64-linux-gnu.so: undefined symbol: __intel_sse2_strlen?
- A Python module containing a library was built using Intel compilers and is missing some libraries. Easiest solution: reinstall (everything) from scratch using GCC compilers:
- module switch intel foss
- (back up ~/.local)
- mv ~/.local/lib/python3.? ~/.local/lib/__RM_ME__python3.?
- python3 -m pip install --user --no-cache-dir someUsefulPythonSoftware
If any strange behaviour persists after reinstalling the software from scratch, use --no-cache-dir to disable pip's cache (which may contain malformed packages).

last changed on 27.06.2025

This work is licensed under a Creative Commons Attribution - Share Alike 3.0 Germany License
