Python

Brief Information

Python is a widespread programming language used in many domains. The availability of many libraries (Python modules, not to be confused with our module system) makes it well suited for scientific computing.

Table of Contents

  1. Python Provided by the OS Distribution (Linux)
  2. Python from the Module System (Recommended)
  3. Own Version in Your $HOME
    1. Using Conda
  4. Installation of Additional Python Packages
    1. Checking the Installed Python Modules
    2. Deep Learning
    3. MPI
    4. NumPy, SciPy and BLAS implementations
  5. Known issues

Detailed Information

 

Note

For Jupyter Notebook remarks see this page, for TensorFlow as a representative of Python-based DL/AI frameworks see this page.
Python 2 reached EOL on 2020-01-01, and migration to Python 3 has been advised since 2008(!). Python 2 support may (= will) be dropped with a future upgrade without explicit notice. Migrate your application to Python 3.

Python Provided by the OS Distribution (Linux)

Our Linux installation provides basic Python installations without the need for conda: "python2" and "python3" are available in the default (no conda) environment. You can use these versions to run your own scripts. However, we strongly recommend using the Python versions provided via the module system.

Python from the Module System (Recommended)

You can list available Python module versions with the command:

module spider Python


You can pass a version number to the spider command, e.g., ml spider Python/3.10.4. This will tell you which compiler versions a specific Python version is available for. All toolchains built on that compiler version can be used to load that Python module.

After loading a Python module, you can install Python packages in your $HOME.

  • Several Python modules provided via the module system have been built using optimized math libraries (e.g. Intel MKL, OpenBLAS), which offer a significant speedup compared to generic builds installed using pip or conda. We recommend using these versions wherever possible.
  • If you need updated versions of preinstalled modules, install them locally as described below using python3 -m pip install --upgrade and related commands.
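
After loading a Python module it can be useful to confirm which interpreter is actually being run. The following is a minimal sketch using only the standard library; the helper name interpreter_summary is made up for this illustration and is not part of any cluster tooling.

```python
import sys


def interpreter_summary():
    """Return the path, version and installation prefix of the running interpreter."""
    version = ".".join(str(part) for part in sys.version_info[:3])
    return {
        "executable": sys.executable,  # full path of the python3 binary in use
        "version": version,            # e.g. the version selected via the module system
        "prefix": sys.prefix,          # installation (or virtual environment) root
    }


# Refuse to continue on Python 2, which is end-of-life.
if sys.version_info < (3,):
    raise SystemExit("Python 2 is end-of-life; please run this with python3")

info = interpreter_summary()
print("executable:", info["executable"])
print("version:   ", info["version"])
print("prefix:    ", info["prefix"])
```

If the printed executable path does not point into the software tree of the loaded module, a different Python (for example the OS one or a conda one) is first in your $PATH.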
     

Own Version in Your $HOME

You're free to install your own version of Python in your $HOME. Conda is a common tool to install python-based software environments.
 

Using Conda

Conda can be used to create custom environments independent of the Python versions provided via the modules system. You can install multiple versions of the same packages or even multiple versions of the Python interpreter without risking any interference. 

  • When to use conda
    • If you need multiple independent installations of one package or Python itself
    • If you need to interoperate with non-Python code that is offered through conda

The following guide describes how to install conda via the Miniforge installer. This distribution only uses open-source channels with permissive licenses such as conda-forge.

Installation of Conda Through Miniforge

Log in to a CPU or a GPU login node, depending on whether you intend to use GPUs with your conda environment.

Follow the installation instructions in the Miniforge readme.
We strongly advise NOT to activate the base environment unless you are aware of the interactions between the conda environment and the module system. If you enabled auto-activation during setup, you can undo this by executing

conda config --set auto_activate_base false
 

Use Conda in Slurm

Conda works by adding a command via your shell configuration file. This is not loaded in Slurm job scripts and thus conda is not immediately available. To fix this, you can add this snippet to your job script:
# if you installed Miniforge to a different location, change the path accordingly
export CONDA_ROOT=$HOME/miniforge3
source $CONDA_ROOT/etc/profile.d/conda.sh
export PATH="$CONDA_ROOT/bin:$PATH"
conda activate myenv
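
To check from inside a job that the activation worked, a Python script can inspect the environment variables that conda activate exports (CONDA_PREFIX and CONDA_DEFAULT_ENV). This is a minimal sketch; the helper name active_conda_env is an assumption made for this example.

```python
import os
import sys


def active_conda_env(environ=None):
    """Return (name, prefix) of the active conda environment, or None if none is active."""
    environ = os.environ if environ is None else environ
    prefix = environ.get("CONDA_PREFIX")
    if not prefix:
        return None
    # Fall back to the directory name if conda did not export a name.
    name = environ.get("CONDA_DEFAULT_ENV", os.path.basename(prefix))
    return name, prefix


env = active_conda_env()
if env is None:
    print("no conda environment is active")
else:
    print("conda environment %s at %s" % env)
    # True when the running interpreter lives inside the active environment.
    print("interpreter inside it:", sys.executable.startswith(env[1]))
```

Calling this at the top of a job's Python script makes a missing or wrong activation visible in the job output instead of surfacing later as an ImportError.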


Why do package versions differ from my local computer?

Since September 2024, we have been overriding the default configuration of conda to use the conda-forge channel, since the default channels are not freely available in enterprise settings. This might mean that some packages are installed in newer or slightly different versions. For most purposes, this should not affect your experience as a user.

 

Installation of Additional Python Packages

Users are welcome to add needed Python modules in their $HOME directory, e.g. with 'pip3'. Python software often relies on GCC compilers and often does not work out of the box with Intel compilers; thus run module switch intel foss before adding any Python modules. You can install Python modules in your $HOME in several ways:

  • Install modules in the default personal location $HOME/.local , e.g. via python3 -m pip install --user ... (easy-to-use, less flexible, danger of side effects when using multiple Python versions and toolchains)
  • Use a virtual environment (like conda or venv), see the section on conda.
  • Install the modules in an arbitrary directory in $HOME (or another location) and propagate it via the $PYTHONPATH environment variable. This allows for more flexibility than the solutions above but should only be used if you are aware of the implications. If you are unsure, use one of the former methods.
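
Which of these locations applies to the current interpreter can be queried from Python itself. The sketch below uses the standard site and sys modules; the function name describe_install_targets is hypothetical. Note that the prefix comparison detects venv/virtualenv environments only, not conda environments.

```python
import site
import sys


def describe_install_targets():
    """Report where 'pip install --user' would place packages and whether a venv is active."""
    return {
        # In a venv, sys.prefix points at the venv while sys.base_prefix points at the base install.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # Root of the per-user installation, typically $HOME/.local on Linux.
        "user_base": site.getuserbase(),
        # Directory that receives packages installed with --user.
        "user_site": site.getusersitepackages(),
        # False or None means the user site directory is not on sys.path.
        "user_site_enabled": bool(site.ENABLE_USER_SITE),
    }


targets = describe_install_targets()
for key, value in targets.items():
    print("%-18s %s" % (key, value))
```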

The following example shows how to install a Python module properly. Be aware that you might profit from choosing a recent GCC version. However, doing so introduces that specific GCC version as a dependency and forces you to switch modules before executing any Python script that relies on the modules installed with this GCC version. If you only need this for certain modules, switching to a virtual environment is the better solution. Python modules that take advantage of the Intel compiler are almost always best installed in a separate virtual environment.

All modules installed in the default location chosen by pip need to be installed with the same compiler module, the same math library (Intel MKL/OpenBLAS) and the same MPI implementation (upgrading to a more recent version is usually fine). That means that you need to settle on one compiler and stick with it. If you want to maintain multiple environments for different toolchains, Python venvs are recommended.

ml switch intel foss
ml load Python
python3 -m pip install --user theano
# Note: the installation is in $HOME/.local/[bin|lib] directories.
# Beware of clashes if using multiple Py-modules
# It is advised to add $HOME/.local/bin to your $PATH variable for modules that add programs in that location

Checking the Installed Python Modules

Use python3 -m pip -vvv list to see the list of available Python modules in the active environment. Newer versions of pip also list the location of each module, which hints at which version is used.

For more insight into the Python search paths, inspect sys.path.

If a module is available in more than one location (possibly in different versions), the first one found in the search paths is used. This can be useful when installing an update to an already available Python module with python3 -m pip install --user --upgrade ..., but it can also be a pitfall: $PYTHONPATH supersedes the local path $HOME/.local/... If $PYTHONPATH contains an older version of a Python module, your installation using python3 -m pip install --user FooBar will succeed but won't be used!
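
To see whether a module is shadowed, one can ask the import machinery about each search path entry separately. The following is a sketch under the assumption that the module is an ordinary file-based module or package; find_all_locations is a name invented for this example.

```python
import sys
from importlib.machinery import PathFinder


def find_all_locations(module_name, search_path=None):
    """Return every file on the search path that provides module_name, in lookup order."""
    entries = sys.path if search_path is None else search_path
    locations = []
    for entry in entries:
        # Query one search path entry at a time so shadowed copies show up as well.
        spec = PathFinder.find_spec(module_name, [entry])
        if spec is not None and spec.origin and spec.origin not in locations:
            locations.append(spec.origin)
    return locations


# The first entry is the copy that 'import json' would actually load.
for position, origin in enumerate(find_all_locations("json")):
    print(position, origin)
```

If the list has more than one entry, only the first is imported; any later entries (for example a freshly installed copy in $HOME/.local) are ignored until the search path order changes.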

Deep Learning

MPI

It is possible to use Python with MPI (Message Passing Interface). One widely used interface is mpi4py. We provide an Lmod module for mpi4py that can be loaded via the module command. Be aware that you need to load the right version for your Python interpreter version.

Alternatively you can install a local version of mpi4py:

python3 -m pip install --user mpi4py

Please note: This will install a version of mpi4py adapted to a specific combination of Python (version), MPI library (vendor and version) and compiler (vendor and version). If the vendor of any of those components changes, and very likely also if its version changes, you must reinstall mpi4py. Back up and/or clean up the content of your $HOME/.local prior to that step!
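
Whether the installed mpi4py still matches the loaded MPI library can be checked with a short script. This is a minimal sketch, assuming mpi4py may or may not be importable in the current environment; the function name mpi_summary is made up for this example.

```python
def mpi_summary():
    """Describe the MPI setup seen by mpi4py, or explain why it cannot be imported."""
    try:
        from mpi4py import MPI
    except ImportError as exc:
        # Raised both when mpi4py is absent and when it was built against another MPI library.
        return "mpi4py is not importable: %s" % exc
    comm = MPI.COMM_WORLD
    library = MPI.Get_library_version().strip()
    return "rank %d of %d, MPI library: %s" % (comm.Get_rank(), comm.Get_size(), library)


print(mpi_summary())
```

Started through the MPI launcher of a job, every rank prints one line; an import error here after a toolchain change is the sign that mpi4py needs to be reinstalled.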

NumPy, SciPy and BLAS implementations

NumPy and SciPy are two widely used scientific/numerical Python packages. They rely on a fast (and thread-parallelised) BLAS/LAPACK implementation. Intel MKL is the recommended implementation on Intel CPUs and can be up to 2x faster than OpenBLAS. When using NumPy/SciPy linked to Intel MKL (or OpenBLAS), you can use all cores of today's multicore computers with significant (or even nearly linear) speedup. The least favourable implementation of BLAS/LAPACK is the reference implementation from netlib, which is not parallelised at all, stays single-core and is thus pretty slow. You can see the configuration of NumPy and SciPy in your environment:

python3 -c "import numpy as np; print(np.version.version); np.show_config()"
python3 -c "import scipy as sp; print(sp.version.version); sp.show_config()"

When installing an updated version of NumPy/SciPy, whether deliberately or as a side effect of installing a package with python3 -m pip install --user --upgrade MODULE, you can instruct the build to use Intel MKL, as described in the NumPy/SciPy Application Note, by creating a '.numpy-site.cfg' file in your $HOME directory. Note that pip may ignore this file by default and install a prebuilt binary wheel, so force a source build with the '--no-binary' flag:

cat $HOME/.numpy-site.cfg

Should give the output:

[mkl]
library_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/lib/intel64
include_dirs = /cvmfs/software.hpc.rwth.de/Linux/RH8/x86_64/intel/skylake_avx512/software/imkl/2022.1.0/mkl/2022.1.0/include
lapack_libs = mkl_rt
mkl_libs = iomp5,mkl_rt

Adjust the paths to the MKL version you need. The actual paths for library_dirs and include_dirs are $MKLROOT/lib/intel64 and $MKLROOT/include. The value of $MKLROOT is visible after module load imkl/VERSION.

Then use:

python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: numpy
python3 -m pip install --user --force-reinstall --ignore-installed --no-binary :all: scipy
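
Instead of typing the long paths by hand, the [mkl] section shown above can be rendered from the value of $MKLROOT. The following is a sketch that reproduces exactly the keys from the example file; the function name numpy_site_cfg is an assumption, and the script only prints the text rather than writing $HOME/.numpy-site.cfg.

```python
import os


def numpy_site_cfg(mklroot):
    """Render the [mkl] section for a given MKL root directory."""
    root = mklroot.rstrip("/")
    lines = [
        "[mkl]",
        "library_dirs = %s/lib/intel64" % root,
        "include_dirs = %s/include" % root,
        "lapack_libs = mkl_rt",
        "mkl_libs = iomp5,mkl_rt",
    ]
    return "\n".join(lines) + "\n"


# MKLROOT is set after 'module load imkl/VERSION'.
mklroot = os.environ.get("MKLROOT")
if mklroot:
    print(numpy_site_cfg(mklroot), end="")
else:
    print("MKLROOT is not set; load an imkl module first")
```

Redirecting the output to $HOME/.numpy-site.cfg before running the pip commands gives a file consistent with the currently loaded imkl module.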

After (re)installing NumPy, you can benchmark NumPy/SciPy if you are interested.
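
A very small benchmark can be written in a few lines. The sketch below times a dense matrix product, which is dispatched to the BLAS library NumPy was linked against; the function name time_matmul and the matrix size are arbitrary choices for illustration, not an official benchmark.

```python
import time


def time_matmul(n=500, repeats=3):
    """Return the best wall-clock time in seconds for an n x n matrix product, or None without NumPy."""
    try:
        import numpy as np
    except ImportError:
        return None
    rng = np.random.default_rng(0)  # fixed seed so every run multiplies the same matrices
    a = rng.random((n, n))
    b = rng.random((n, n))
    best = float("inf")
    for _ in range(repeats):
        start = time.perf_counter()
        a @ b  # dense matrix product, handled by the linked BLAS
        best = min(best, time.perf_counter() - start)
    return best


elapsed = time_matmul()
if elapsed is None:
    print("NumPy is not installed in this environment")
else:
    print("best of 3 runs for a 500 x 500 product: %.4f s" % elapsed)
```

Running the same script before and after a reinstallation, with the same number of threads, gives a rough indication of whether the new build uses an optimized BLAS.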

 

Known issues

  • Do you see an error of type ImportError: No module named FooBar?
    • A Python module named FooBar is missing. In most cases the solution is to install it using python3 -m pip install --user FooBar. See the section on installing additional Python packages.
  • Do you see an error of type ImportError: /home/ab123456/.local/lib/python3.?/site-packages/............86_64-linux-gnu.so: undefined symbol: __intel_sse2_strlen?
    • A Python module containing a library was built using Intel compilers and is missing some libraries. Easiest solution: reinstall (everything) from scratch using GCC compilers:
    • module switch intel foss
    • (backup ~/.local)
    • mv ~/.local/lib/python3.? ~/.local/lib/__RM_ME__python3.?
    • python3 -m pip install --user --no-cache-dir someUsefulPythonSoftware
  • If any strange behaviour persists after reinstalling software from scratch, use --no-cache-dir to disable pip's cache (which may contain malformed packages).
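
The two kinds of ImportError above can be told apart programmatically: a missing module raises ModuleNotFoundError (a subclass of ImportError), while a module that exists but fails to load, for example because of an undefined symbol, raises a plain ImportError. The following is a minimal sketch; diagnose_import is a name invented for this example.

```python
import importlib


def diagnose_import(module_name):
    """Classify an import attempt as 'ok', 'missing: ...' or 'broken: ...'."""
    try:
        importlib.import_module(module_name)
    except ModuleNotFoundError as exc:
        # The module itself, or one of its dependencies, is not installed.
        return "missing: %s" % exc
    except ImportError as exc:
        # The module was found but could not be loaded, e.g. an undefined symbol.
        return "broken: %s" % exc
    return "ok"


for name in ("json", "FooBar"):
    print(name, "->", diagnose_import(name))
```

A result starting with "missing" points to the first item in the list above, a result starting with "broken" to the second.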

Last modified on 13.09.2024

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Germany License