Development on Login-Nodes
The CLAIX cluster offers specialized login-nodes for development. This setup allows you to develop and compile code without having to queue jobs via SLURM. However, keep in mind that resources are shared on a fair-use basis and that usage limits are enforced to prevent abuse. To monitor your own processes, you can use the command top -u $USER.
For tests of non-GPU applications, please submit a job to the devel partition.
All common compilers are available through the module system.
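As an illustration only (the module name, time limit, and binary name are placeholders, and the actual limits of the devel partition should be checked in the cluster documentation), a non-GPU test job could be submitted with a batch script along these lines:
#!/usr/bin/env bash
#SBATCH --partition=devel          # short test jobs, no GPUs
#SBATCH --time=00:15:00            # adjust to the partition's time limit
#SBATCH --ntasks=1
module load GCC                    # placeholder; list available compiler modules with "module avail"
./my_test_application              # placeholder for your compiled test binary
The script is then submitted with sbatch <scriptname>.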
Development on login23-g-1
The node login23-g-1 is equipped with four NVIDIA H100 GPUs designated for development and testing purposes.
The GPUs are configured to support a broad range of tests and accommodate as many users as possible simultaneously. Note that the GPU settings may change based on user feedback.
Gathering Information about GPUs
To gain insight into the current state of the GPUs, you can use the command
nvidia-smi
This shows information about the GPUs, including their load, memory occupancy, and running processes. Most importantly, it also shows the operational mode of each GPU (see below).
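If you prefer a compact, script-friendly summary, nvidia-smi also supports query flags; one possible invocation is:
nvidia-smi --query-gpu=index,name,utilization.gpu,memory.used,memory.total --format=csv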
Selecting Specific GPUs
To select a GPU or MIG instance, set the environment variable as follows:
export CUDA_VISIBLE_DEVICES=<GUID>
To select multiple GPUs, provide a comma-separated list. The GUIDs (reported as UUIDs) can be retrieved using the command:
nvidia-smi -L
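For illustration, with hypothetical UUIDs (copy the actual values from the output of nvidia-smi -L), a single full GPU or a single MIG instance would be selected like this:
export CUDA_VISIBLE_DEVICES=GPU-0f000000-0000-0000-0000-000000000001     # one full GPU
export CUDA_VISIBLE_DEVICES=MIG-1a000000-0000-0000-0000-000000000001     # one MIG instance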
Recommendations
- Reserve GPU Memory: When using a non-exclusive GPU, it is advisable to reserve the amount of GPU memory you anticipate needing for your application. Otherwise, you may unexpectedly run out of memory and your application may crash.
- Multi-GPU Tests: To test multi-GPU applications, select two GPUs that are not partitioned into MIG instances; it is not possible to use two MIG instances together or to combine a MIG instance with another GPU. A sketch of such a selection is shown after this list.
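As a sketch of the multi-GPU case (again with hypothetical UUIDs): full GPUs appear as top-level GPU entries in the output of nvidia-smi -L, while MIG instances are listed as separate MIG entries, so pick two UUIDs belonging to full GPUs and export them as a comma-separated list:
nvidia-smi -L     # note the UUIDs of two GPUs that are not split into MIG instances
export CUDA_VISIBLE_DEVICES=GPU-0f000000-0000-0000-0000-000000000003,GPU-0f000000-0000-0000-0000-000000000004
./my_multi_gpu_test     # placeholder for your multi-GPU test application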