TensorFlow
TensorFlow™ is an open source software library for numerical computation using data flow graphs. It is being developed by the Google Brain Team and is often used for a wide range of applications, most prominently machine learning.
Table of Contents
- Install
- Using TensorFlow (Interactive Systems)
- Using TensorFlow (Batch Mode)
- Example batch scripts
- Tutorials and examples
Install
TensorFlow is developing very rapidly. Its dependencies are often bleeding-edge, so any update can cause installation or usability problems. Back up your installation before updating so that you have a safe harbour to return to.
TensorFlow places requirements on the 'compute capability' of GPGPUs, and these may change with any update. Older GPU designs can stop working with new TensorFlow versions, and older GPUs are in most cases also slower.
About 5% of the nodes in our cluster are equipped with GPUs. A modern multi-core CPU is also a very powerful computing device, so starting your computations on CPUs is a good choice: it avoids the complexity of GPU installation and usage.
Before installing a GPGPU-enabled version of TensorFlow, you must choose matching, compatible CUDA and cuDNN versions.
Use an Apptainer container (with and without GPU, recommended)
NVIDIA offers a set of prebuilt containers with AI software already deployed. These containers are a reasonable source of a working TensorFlow installation and give low-effort access to the software, with no installation work on your side. See our documentation on Apptainer.
Pre-Packaged by pip
For other Python and TensorFlow versions, refer to the official website (Install) and its overview of tested configurations, especially with regard to compatible GCC, cuDNN and CUDA versions.
Example for TensorFlow version 2.11.0 with Python 3.9.6, cuDNN 8.1.1.33 and CUDA 11.2.1 (cuDNN loads CUDA automatically, check cuDNN module version).
module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
python3 -m pip install --user tensorflow==2.11.0
python3 -c "import tensorflow as tf; print(tf.__version__)"
python3 -c "import tensorflow as tf; print(tf.reduce_sum(tf.random.normal([1000, 1000])))"
python3 -c "import tensorflow as tf; print(\"Num GPUs Available: \", len(tf.config.list_physical_devices('GPU')))"
The output depends on whether you run this on a login node with GPUs.
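The checks above can also be collected in a small script. The following is a minimal sketch, not part of the official installation procedure: the function name describe_install is hypothetical, and it relies only on the tf.__version__ attribute and the tf.config.list_physical_devices call already used in the one-liners. It reports a missing package instead of failing with an import error.

```python
import importlib
import importlib.util


def describe_install(module_name="tensorflow"):
    """Report whether a module is importable, plus its version and GPU count."""
    info = {"installed": False, "version": None, "gpus": None}
    # find_spec returns None when the top-level package cannot be found
    if importlib.util.find_spec(module_name) is None:
        return info
    module = importlib.import_module(module_name)
    info["installed"] = True
    info["version"] = getattr(module, "__version__", None)
    # Only TensorFlow-like modules expose config.list_physical_devices
    config = getattr(module, "config", None)
    if config is not None and hasattr(config, "list_physical_devices"):
        info["gpus"] = len(config.list_physical_devices("GPU"))
    return info
```

Calling print(describe_install()) after loading the modules should show the installed version and the number of GPUs that TensorFlow can see on the current node.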
Known issue:
Sometimes, TensorFlow on a GPU node fails to get the GPUs with messages like:
>>> print(tf.__version__)
2.1.0
>>> tf.test.gpu_device_name()
2020-06-04 14:36:14.070893: I tensorflow/core/platform/cpu_feature_guard.cc:142] Your CPU supports instructions that this TensorFlow binary was not compiled to use: AVX2 AVX512F FMA
2020-06-04 14:36:14.079830: I tensorflow/core/platform/profile_utils/cpu_utils.cc:94] CPU Frequency: 2100000000 Hz
2020-06-04 14:36:14.081897: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x39b4c10 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-06-04 14:36:14.081919: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-06-04 14:36:14.084737: I tensorflow/stream_executor/platform/default/dso_loader.cc:44] Successfully opened dynamic library libcuda.so.1
2020-06-04 14:36:14.104121: W tensorflow/compiler/xla/service/platform_util.cc:276] unable to create StreamExecutor for CUDA:0: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
2020-06-04 14:36:14.180907: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x3a813e0 initialized for platform CUDA (this does not guarantee that XLA will be used). Devices:
2020-06-04 14:36:14.180930: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Tesla V100-SXM2-16GB, Compute Capability 7.0
2020-06-04 14:36:14.183748: I tensorflow/compiler/jit/xla_gpu_device.cc:136] Ignoring visible XLA_GPU_JIT device. Device number is 0, reason: Invalid argument: device CUDA:0 not supported by XLA service
2020-06-04 14:36:14.184112: F tensorflow/stream_executor/lib/statusor.cc:34] Attempting to fetch value instead of handling error Internal: failed initializing StreamExecutor for CUDA device ordinal 0: Internal: failed call to cuDevicePrimaryCtxRetain: CUDA_ERROR_INVALID_DEVICE: invalid device ordinal
zsh: abort (core dumped) python3
The root cause is that (at least some) GPUs are unavailable because other users have locked them. Check this using nvidia-smi:
pk224850@login18-g-1:~[501]$ nvidia-smi
Thu Jun 4 14:36:34 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:61:00.0 Off | 0 |
| N/A 38C P0 62W / 300W | 1074MiB / 16160MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | 0 |
| N/A 35C P0 38W / 300W | 11MiB / 16160MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
| 0 41713 C ...v/versions/3.7.6/envs/pc_aug/bin/python 1063MiB |
+-----------------------------------------------------------------------------+
Here you can see a process running on GPU 0, which is therefore locked, because the GPUs on the front-end nodes are in exclusive use.
Less convenient are states like this:
pk224850@login18-g-2:~[513]$ nvidia-smi
Thu Jun 4 14:37:23 2020
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 440.44 Driver Version: 440.44 CUDA Version: 10.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
|===============================+======================+======================|
| 0 Tesla V100-SXM2... Off | 00000000:61:00.0 Off | 0 |
| N/A 38C P0 52W / 300W | 318MiB / 16160MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
| 1 Tesla V100-SXM2... Off | 00000000:62:00.0 Off | 0 |
| N/A 37C P0 53W / 300W | 318MiB / 16160MiB | 0% E. Process |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: GPU Memory |
| GPU PID Type Process name Usage |
|=============================================================================|
+-----------------------------------------------------------------------------+
Here no process is locking the GPUs, but note the non-zero Memory-Usage. Something is very likely wrong on that node. Try again in half an hour; if the error persists, ask support to reboot the node.
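The manual check with nvidia-smi can be scripted. The sketch below is only an illustration: it assumes that nvidia-smi is on the PATH, uses its CSV query mode (--query-gpu=index,memory.used), and treats any GPU with less than 100 MiB in use as idle. That threshold is an assumption chosen so that the 11 MiB reported for an idle GPU above counts as free, while the suspicious 318 MiB state does not. The function names are hypothetical.

```python
import subprocess

# Assumption: GPUs reporting less used memory than this are treated as idle.
IDLE_MEMORY_MIB = 100


def parse_free_gpus(csv_text, idle_mib=IDLE_MEMORY_MIB):
    """Return indices of GPUs whose used memory is below idle_mib.

    csv_text holds one "index, memory.used" pair per line (MiB, no units).
    """
    free = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, used = (field.strip() for field in line.split(","))
        if int(used) < idle_mib:
            free.append(int(index))
    return free


def query_free_gpus():
    """Run nvidia-smi and return the indices of GPUs that look idle."""
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=index,memory.used",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_free_gpus(result.stdout)
```

An empty list from query_free_gpus() suggests that starting a GPU-capable TensorFlow on this node is likely to fail at the moment.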
Important Note:
TensorFlow with GPU support takes over the memory management of all (visible) devices attached to the host, even if no GPU is actually used. Because the GPUs in the interactive front ends are process-exclusive, other users cannot use the idling GPUs. Please refer to Using TensorFlow for advice on how to fix this by making unused GPUs invisible to the program. On interactive front ends, exit the Python session immediately after your test finishes; otherwise Python and TensorFlow keep the GPUs locked. If no free GPU is available on a front-end node (e.g. because another user is running a test), loading a GPU-capable version of TensorFlow ends in an error and Python aborting with a core dump (see 'Known issue' above). Check the availability of GPUs with the nvidia-smi command before trying a GPU-capable TensorFlow, or explicitly run on nodes without GPUs.
Build from Sources (for brave hearts)
You can also build TensorFlow directly from source. This allows you to specify the CPU/GPU-features you want to use, which might allow better optimization. This installation method is very tedious and difficult, so only use it if you know what you are doing. See here for more information.
Using TensorFlow (Interactive Systems)
Interactive usage of our GPUs on login nodes should be limited to brief testing.
To execute your TensorFlow program in a terminal, first enter the containing folder and then start your program with Python. For example, if your TensorFlow program is called program.py and you saved it in a subfolder named subfolder in your $HOME directory, you can use the following commands:
module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
cd "$HOME/subfolder/"
python3 program.py
Please Note:
The modules to load depend on which ones you loaded when installing TensorFlow.
Using CPU
In case your TensorFlow version supports GPU acceleration (meaning you installed tensorflow instead of tensorflow-cpu) and you are on a node with GPUs, then make sure to make all system GPUs invisible to TensorFlow by executing this in your shell:
export CUDA_VISIBLE_DEVICES=""
Otherwise, TensorFlow will block the GPUs for all users, even if your program does not use them. The easiest solution would be to use login nodes without a GPU.
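The same setting can be applied from inside a Python program, as long as it happens before TensorFlow is imported. This is a minimal sketch; the helper name hide_all_gpus is hypothetical, and it relies on the CUDA runtime reading CUDA_VISIBLE_DEVICES when TensorFlow initializes.

```python
import os


def hide_all_gpus(environ=os.environ):
    """Make every GPU invisible to CUDA libraries loaded by this process."""
    environ["CUDA_VISIBLE_DEVICES"] = ""
    return environ["CUDA_VISIBLE_DEVICES"]


hide_all_gpus()
# Import TensorFlow only after this point, for example:
# import tensorflow as tf
```

Setting the variable in the shell remains the safer option, because it cannot be bypassed by an earlier import elsewhere in the program.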
Using one of two GPUs
To use a GPU, you need the correct version of TensorFlow installed and you need to log in to an interactive host with GPUs, for example the login node login18-g-1. If you only want to use one of the two GPUs, you need to make the other one invisible; otherwise it will not be available to other users. Depending on which GPU is free at the time, use one of the following:
export CUDA_VISIBLE_DEVICES=0
# OR
export CUDA_VISIBLE_DEVICES=1
Please note:
For more detailed information about how to use one or more GPUs with TensorFlow, see https://www.tensorflow.org/guide/distributed_training
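Selecting a single GPU can also be done at the top of a Python program, before TensorFlow is imported. The sketch below is illustrative only: restrict_to_gpu is a hypothetical helper, and the default of two GPUs mirrors the login nodes described here.

```python
import os


def restrict_to_gpu(index, gpu_count=2, environ=os.environ):
    """Expose only the GPU with the given index to CUDA applications."""
    if not 0 <= index < gpu_count:
        raise ValueError(f"GPU index {index} is outside 0..{gpu_count - 1}")
    environ["CUDA_VISIBLE_DEVICES"] = str(index)
    return environ["CUDA_VISIBLE_DEVICES"]


# Example: keep only the second GPU visible, then import TensorFlow.
# restrict_to_gpu(1)
# import tensorflow as tf
```

Note that CUDA renumbers the visible devices, so the selected GPU appears as device 0 inside the program regardless of its physical index.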
Using TensorFlow (Batch Mode)
In your batch script, before executing your TensorFlow application, you have to load Python and cuDNN (which automatically loads CUDA):
module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
(This is not needed for an Apptainer containerized version; there you only need to load the container.)
After loading all necessary modules, your script should execute your application. If your Python file is called program.py and resides in a subfolder called subfolder of your $HOME directory, you can execute it like this:
cd "$HOME/subfolder/"
python3 program.py
See the next section about batch scripts for complete, ready-to-use examples of batch scripts.
Remember that if you need only one GPU, your job must not be exclusive and must request no more than 24 cores.
Example batch scripts
CPU-only TensorFlow Job
#!/usr/bin/zsh
### Job name
#SBATCH --job-name=MyTensorFlowJob
### Output path for stdout and stderr
### %J is the job ID
#SBATCH --output=output_%J.txt
### Request the time you need for execution. The full format is D-HH:MM:SS
### You must at least specify minutes OR days and hours and may add or
### leave out any other parameters
#SBATCH --time=80
### Request the number of CPU cores (threads) for your single TensorFlow process
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=24
### if needed: switch to your working directory (where you saved your program)
#cd $HOME/subfolder/
### Load modules
module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
### Execute your application
python3 program.py
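Inside program.py, it can help to match TensorFlow's thread pool to the cores the job actually received. The following is a sketch under assumptions: threads_from_slurm is a hypothetical helper that reads the SLURM_CPUS_PER_TASK and SLURM_NTASKS variables set by Slurm inside a job, and falls back to a single thread outside a job. The commented lines show where the result could be passed to tf.config.threading.

```python
import os


def threads_from_slurm(environ=os.environ):
    """Derive a thread count from the variables Slurm sets inside a job."""
    for name in ("SLURM_CPUS_PER_TASK", "SLURM_NTASKS"):
        value = environ.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return 1  # outside a Slurm job: fall back to a single thread


# Possible use at the start of program.py, before any TensorFlow op runs:
# import tensorflow as tf
# tf.config.threading.set_intra_op_parallelism_threads(threads_from_slurm())
```

Whether an explicit setting improves performance depends on the workload; TensorFlow chooses its own defaults when these calls are omitted.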
TensorFlow Job for one (or more) GPUs
#!/usr/bin/zsh
### Job name
#SBATCH --job-name=MyTensorFlowJob
### Output path for stdout and stderr
### %J is the job ID
#SBATCH --output=output_%J.txt
### Request the time you need for execution. The full format is D-HH:MM:SS
### You must at least specify minutes OR days and hours and may add or
### leave out any other parameters
#SBATCH --time=80
### Request a host with a Volta GPU
### If you need two GPUs, change the number accordingly
#SBATCH --gres=gpu:volta:1
### if needed: switch to your working directory (where you saved your program)
#cd $HOME/subfolder/
### Load modules
module load GCCcore/.9.3.0
module load Python/3.9.6
module load cuDNN/8.1.1.33-CUDA-11.2.1
### Execute your application
python3 program.py
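A GPU job can verify early on that it really received the devices it asked for. The sketch below assumes that the scheduler exports CUDA_VISIBLE_DEVICES for jobs submitted with --gres=gpu, which is a common Slurm configuration but is not confirmed by this page. The helper name assigned_gpu_ids is hypothetical.

```python
import os


def assigned_gpu_ids(environ=os.environ):
    """Return the GPU identifiers listed in CUDA_VISIBLE_DEVICES."""
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


# Possible use at the start of program.py:
# if not assigned_gpu_ids():
#     raise SystemExit("No GPU was assigned to this job")
```

Stopping immediately when the list is empty avoids spending the requested run time on an unintended CPU-only run.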
Tutorials and examples
See here for TensorFlow tutorials provided by Google.