Jupyter Notebook

Summary

Note: for general Python remarks see this page, for TensorFlow as a representative of Python-based AI/DL codes see this page.

Jupyter Notebook is a web-based application for capturing the whole computation process: developing, documenting, and executing code, as well as communicating the results.

A Jupyter notebook combines two components:

  • A web application: a browser-based tool for interactive authoring of documents which combine explanatory text, mathematics, computations and their rich media output.
  • Notebook documents: a representation of all content visible in the web application, including inputs and outputs of the computations, explanatory text, mathematics, images, and rich media representations of objects.

Detailed information

1. Installation

The easiest way to install the Jupyter Notebook software in your $HOME, so that you can use it on the computers of the HPC cluster, is via 'pip3'.

Jupyter Template

# Load the latest python
$ module switch intel gcc
$ module load python/3.8.7

# Install jupyter as local user
$ pip3 install --user  jupyter

# Do some tests!
$ python3 -c "import jupyter; print('Jupyter    ', jupyter.__version__)"
$ pip3 -vvv list | grep jupy

Example output:

pk224850@login18-g-2:~[516]$ python3 -c "import jupyter; print('Jupyter    ', jupyter.__version__)"
Jupyter     1.0.0
pk224850@login18-g-2:~[517]$ pip3 -vvv list | grep jupy
jupyter                1.0.0     /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
jupyter-client         6.1.11    /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
jupyter-console        6.2.0     /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
jupyter-core           4.7.1     /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
jupyterlab-pygments    0.1.2     /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
jupyterlab-widgets     1.0.0     /rwthfs/rz/cluster/home/pk224850/.local/lib/python3.8/site-packages       pip
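
The job scripts further down this page start the notebook from the 'bin' directory below the Python user base, so it can be worth checking that the launcher really exists there. The following is a minimal sketch; the function name find_user_jupyter is a hypothetical helper and not part of Jupyter or pip.

```shell
# Print the path of the user-installed jupyter-notebook launcher, or fail.
# find_user_jupyter is a hypothetical helper name used for illustration only.
find_user_jupyter() {
    # Ask Python where 'pip3 install --user' places its files
    user_base=$(python3 -m site --user-base) || return 1
    launcher="$user_base/bin/jupyter-notebook"
    if [ -x "$launcher" ]; then
        printf '%s\n' "$launcher"
    else
        echo "jupyter-notebook not found in $user_base/bin" >&2
        return 1
    fi
}

# Usage (after 'module load python/3.8.7'):
#   find_user_jupyter && echo "installation looks usable"
```

If the function reports that the launcher is missing, the most likely cause is that a different Python module was loaded than the one used for the installation.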

Depending on the age of your Python version and of the Python packages you have already installed, you may need one or more of the flags "--upgrade --force-reinstall --no-cache-dir". Be aware that these flags can break your existing software stack: do not use them without a backup!
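
One possible way to create such a backup is to record the versions of all user-installed packages before changing anything. This is only a sketch: the function name backup_user_packages and the file name pattern are hypothetical choices, and the list covers only packages in the user site, not the Python installation of the module.

```shell
# Save the list of user-installed packages so that the recorded versions
# can be reinstalled later. Function name and file name are hypothetical.
backup_user_packages() {
    backup_dir=${1:-$HOME}
    backup_file="$backup_dir/pip-user-packages-$(date +%Y%m%d-%H%M%S).txt"
    # 'pip3 freeze --user' lists only packages installed with '--user'
    pip3 freeze --user > "$backup_file" || return 1
    printf '%s\n' "$backup_file"
}

# Usage:
#   backup=$(backup_user_packages)
#   pip3 install --user --upgrade jupyter
# To go back to the recorded versions if something breaks:
#   pip3 install --user --force-reinstall -r "$backup"
```

Reinstalling from the list restores the recorded versions but does not remove packages that were added afterwards.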

2. Starting the Notebook Server in a batch job

Random Ports
The example job scripts on this page use random ports. Please monitor your job logs and restart the job or change the random port range in case of a port collision.

Here is a template for submitting a Jupyter Notebook server as a Slurm batch job. You may need to edit some of the Slurm options, including the time limit. Save your edited version of this script on the cluster and submit it with sbatch jobscript.sh

Jupyter Template

#!/usr/local_rwth/bin/zsh

#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=8G
#SBATCH --time=1-0:00:00
#SBATCH --job-name=jupyter-notebook
#SBATCH --output=jupyter-notebook-%J.log

# Load the same python as used for installation
module load python/3.8.7

# get tunneling info
XDG_RUNTIME_DIR=""
port=$(shuf -i8000-9999 -n1)
node=$(hostname -s)
PYTHON_USER_BASE=$(python3 -m site --user-base)

# Start Jupyter Notebook. Remember: it will be active during the job run time only!
$PYTHON_USER_BASE/bin/jupyter-notebook --no-browser --port=${port} --ip=${node}
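
Monitoring the job log for a port collision, as recommended in the note on random ports, could be sketched as follows. The two search patterns are assumptions about the wording that Jupyter Notebook and OpenSSH typically use for such messages; the exact text may differ between versions. The function name has_port_collision is hypothetical.

```shell
# Return success (0) if the job log contains a line that suggests a port collision.
# Both patterns are assumptions about typical Jupyter and OpenSSH messages.
has_port_collision() {
    grep -q -i -e 'already in use' -e 'port forwarding failed' "$1"
}

# Usage (replace 12345678 with your job ID):
#   if has_port_collision jupyter-notebook-12345678.log; then
#       echo "Port collision suspected: cancel the job and submit it again."
#   fi
```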

 

3. Connecting to the Notebook Server

Once your submitted batch job starts, your notebook server will be ready for you to connect. Run squeue -u$(whoami) to check the status of your jobs. A running notebook job shows an "R" in the ST (status) column. If you see a "PD" there, the job is still pending and you have to wait until it starts before you can connect. The log file with the connection information is written to the directory you submitted the script from and is named jupyter-notebook-[jobid].log, where jobid is the Slurm ID of your job.
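
Instead of reading the log file by hand, the address can also be extracted with grep. The sketch below assumes that the log contains at least one http:// URL with a 'token=' parameter, as in the example link shown on this page; the function name notebook_url is hypothetical.

```shell
# Print the first notebook URL (including the access token) found in a job log.
# Prints nothing if the log does not contain such a URL yet.
notebook_url() {
    grep -o 'http://[^[:space:]]*token=[[:alnum:]]*' "$1" | head -n 1
}

# Usage (replace 12345678 with your job ID):
#   squeue -u "$(whoami)"                        # wait until ST shows "R"
#   notebook_url jupyter-notebook-12345678.log
```

The server needs a few seconds to start after the job begins to run, so an empty result shortly after the job start usually just means that the URL has not been written yet.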

 

4. Browsing the Notebook

Finally, open a web browser on a login node and enter the address of your Jupyter notebook, taken from the last lines of the log file. The link will look something like

http://ncm0000:9230/?token=b7fb26d6291d970498a5a4d5b2c77cc40ba41acee5f22376

After opening this link you will land on a web page that shows the content of the directory the Jupyter job was started in, and from which you can open a new Notebook or Terminal.

Example screenshots: Jupyter Notebook Files, Jupyter Notebook Binder, Jupyter Notebook terminal.

5. Starting the Notebook Server on interactive front ends

In the example above, the Notebook server runs within a batch job. This is advantageous if you need a more-than-trivial amount of compute resources (e.g. access to GPUs, or many cores to compute something in parallel). If you need only a negligible amount of resources, which is the case whenever you edit your program, think and contemplate, or drink some coffee (in short: whenever your notebook is not running a computation), a batch job is overkill or even a waste of resources (in the case of many-core batch jobs).

Whenever you use the Notebook for interactive development that does not need many resources, you are welcome to start your Jupyter Notebook directly on the interactive nodes, with no waiting time for the scheduling of a batch job. For example, you can save the example script above in a file and execute it, for instance in a 'screen' session.

However, remember that the general rules for resource usage on interactive nodes apply: do not start long-running or many-core runs. Do not start Jupyter Notebook on GPU front ends, as any import of GPU-capable AI/DL frameworks like TensorFlow will likely lock the GPUs, and the notebook will then receive a kill signal after quite a short time.
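
Running the script in a 'screen' session could look like the sketch below. It assumes that 'screen' and 'zsh' are available on the interactive node; the function name start_in_screen and the default session name 'jupyter' are hypothetical. Outside of a batch job the #SBATCH lines are plain comments, and the output appears inside the screen session instead of a log file.

```shell
# Start a script in a detached 'screen' session so that it keeps running
# when the SSH connection to the front end is lost.
# start_in_screen and the default session name are hypothetical.
start_in_screen() {
    script=$1
    session=${2:-jupyter}
    screen -dmS "$session" zsh "$script"
}

# Usage:
#   start_in_screen ./jobscript.sh
#   screen -r jupyter     # reattach to read the URL; detach again with Ctrl-a d
```

Remember to stop the notebook when you are done, as the session otherwise keeps occupying the front end.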

 

6. Starting the Notebook Server in a batch job with SSH tunnel

This job template starts a batch job and creates an SSH tunnel from the submitting host to the compute node, so that you can work on your local machine using a local browser. You may need to edit some of the Slurm options, including the time limit. Save your edited version of this script on the cluster and submit it with sbatch jobscript.sh

#!/usr/local_rwth/bin/zsh

#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --mem-per-cpu=8G
#SBATCH --time=1-0:00:00
#SBATCH --job-name=jupyter-notebook
#SBATCH --output=jupyter-notebook-%J.log

 

echo "------------------------------------------------------------"
echo "SLURM JOB ID: $SLURM_JOBID"
echo "Running on nodes: $SLURM_NODELIST"
echo "------------------------------------------------------------"

# Load the same python as used for installation
module load python/3.8.7

# set a random port for the notebook, in case multiple notebooks are
# on the same compute node.

NOTEBOOKPORT=`shuf -i 8000-8500 -n 1`

# set a random port for tunneling, in case multiple connections are happening
# on the same login node.

TUNNELPORT=`shuf -i 8501-9000 -n 1`

# set a random access token
TOKEN=`cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 49 | head -n 1`

echo "On your local machine, run:"
echo ""
echo "ssh -L8888:localhost:$TUNNELPORT $SLURM_SUBMIT_HOST -N -4"
echo ""
echo "and point your browser to http://localhost:8888/?token=$TOKEN"
echo "Change '8888' to some other value if this port is already in use on your PC,"
echo "for example, you have more than one remote notebook running."
echo "To stop this notebook, run 'scancel $SLURM_JOB_ID'"

# Set up a reverse SSH tunnel from the compute node back to the submitting host (login01 or login02)
# This is the machine we will connect to with SSH forward tunneling from our client.

ssh -R$TUNNELPORT\:localhost:$NOTEBOOKPORT $SLURM_SUBMIT_HOST -N -f

# Start the notebook
srun -n1 $(python3 -m site --user-base)/bin/jupyter-notebook --no-browser --port=$NOTEBOOKPORT --NotebookApp.token=$TOKEN --log-level WARN
# To stop the notebook, use 'scancel'
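
The job log prints the forwarding command with the local port 8888. If that port is already taken on your PC, the command has to be adapted by hand; the small helper below sketches how the three values fit together. The function name local_tunnel_command, the port numbers and the host name in the usage example are hypothetical: take the tunnel port and the host from your own job log.

```shell
# Print the SSH forwarding command to run on the local machine.
# Arguments: local port, tunnel port (from the job log), login host (from the job log).
local_tunnel_command() {
    printf 'ssh -L%s:localhost:%s %s -N -4\n' "$1" "$2" "$3"
}

# Usage, with hypothetical values:
#   local_tunnel_command 8889 8742 login.example.org
# then point your browser to http://localhost:8889/?token=<token from the job log>
```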

 

7. General information

Jupyter Working Directory

By default Jupyter opens a view to the submit directory of the job.
To change this behaviour the jupyter command in the last line of the job script can be altered like this:

jupyter-notebook --no-browser --port=${port} --ip=${node} --notebook-dir /path/to/directory

last changed on 19.02.2021
