Hints for the selection of the compute node type on CLAIX (with SLURM)
For your free quota you can use the CLAIX-18 nodes.
For your project you can use the partitions your project is configured for. New projects are typically configured for CLAIX-18 nodes, old or prolonged projects for the nodes on which the project has already been running. You can also ask for additional partitions if needed; do not forget to justify your request. To check which partitions your project is configured for, see: SLURM Accounting
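As a rough sketch of such a check, the standard Slurm accounting tool `sacctmgr` can list the associations of a user. The helper name `unique_partitions` is hypothetical, and it is an assumption that partitions are recorded per association on this cluster; the SLURM Accounting page remains the authoritative description.

```shell
#!/usr/bin/env bash
# Sketch: list the partitions attached to your Slurm associations.
# Assumption: the accounting database stores a partition for each association.

# Reads "account|partition" lines on stdin and prints the unique,
# non-empty partition names (hypothetical helper).
unique_partitions() {
    awk -F'|' '$2 != "" { print $2 }' | sort -u
}

if command -v sacctmgr >/dev/null 2>&1; then
    # -n suppresses the header, -P prints pipe-separated fields
    sacctmgr -n -P show associations user="${USER:-$(id -un)}" \
        format=Account,Partition | unique_partitions
else
    echo "sacctmgr not found; run this on a cluster login node" >&2
fi
```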
For all compute projects batch jobs are directed to their primary compute node type.
For most compute projects the primary compute node type is set to CLAIX-2018-MPI.
The characteristics of this node type are:
- 2 Intel Xeon Platinum 8160 Processors “SkyLake” (2.1 GHz, 24 cores each) and thus 48 cores per node,
- 192 GB main memory per node (~4 GB main memory per core)
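The figures above determine how many tasks fit on one node. The following sketch works through that arithmetic; the function name `tasks_per_node` is hypothetical, and the memory actually available to jobs is probably somewhat below the nominal 192 GB (an assumption, check the hardware overview).

```shell
#!/usr/bin/env bash
# Nominal values taken from the node description above.
CORES_PER_NODE=48
MEM_PER_NODE_GB=192   # assumption: usable memory for jobs may be lower

# Prints how many tasks with the given memory footprint (whole GB)
# fit on one node, limited by both memory and core count.
tasks_per_node() {
    local mem_per_task_gb=$1
    local by_mem=$(( MEM_PER_NODE_GB / mem_per_task_gb ))
    if [ "$by_mem" -gt "$CORES_PER_NODE" ]; then
        echo "$CORES_PER_NODE"
    else
        echo "$by_mem"
    fi
}

tasks_per_node 4   # prints 48: one task per core
tasks_per_node 8   # prints 24: memory is the limit, half the cores stay idle
tasks_per_node 2   # prints 48: the core count is the limit
```

Tasks that need more than about 4 GB each therefore cannot occupy every core of a node.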
CLAIX-2016 MPI, SMP and GPU
CLAIX-2016 is out of maintenance and will be taken out of production soon. Currently running projects will continue as scheduled. However, applying for new projects on CLAIX-2016 is no longer possible.
A group of CLAIX-2018-GPU nodes are configured like CLAIX-2018-MPI nodes plus they are equipped with 2 NVIDIA Tesla V100 GPUs each.
NVLINK is employed to link these 2 GPUs with each other. Each GPU provides 16 GB HBM2 memory.
There is a rather rare constellation, caused by a software bug, in which the older P100 GPU may be faster than the V100 GPU, but typically the newer GPU will be faster for your code.
The CLAIX-2018-GPU nodes are open for free/test quota usage and for all projects configured for the CLAIX-18 cluster.
Submitting batch jobs on the GPU cluster: GPU batch mode
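A minimal sketch of a GPU batch script follows. It uses the generic Slurm `--gres` syntax and relies on Slurm exporting `CUDA_VISIBLE_DEVICES` for GPU jobs; whether additional site-specific options are required is not covered here (see GPU batch mode). The helper `count_gpus` is hypothetical.

```shell
#!/usr/bin/env bash
#SBATCH --gres=gpu:2         # request both GPUs of one node (generic Slurm syntax)
#SBATCH --time=01:00:00      # short test run

# Prints how many GPUs a CUDA_VISIBLE_DEVICES-style list contains
# (hypothetical helper; an empty list counts as zero).
count_gpus() {
    local list=$1
    if [ -z "$list" ]; then
        echo 0
    else
        echo "$list" | tr ',' '\n' | wc -l | tr -d ' '
    fi
}

echo "GPUs visible to this job: $(count_gpus "${CUDA_VISIBLE_DEVICES:-}")"
```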
24 hrs versus 120 hrs max job runtime
It is desirable that compute jobs do not run "forever", mainly for the following reasons:
1. If there is some kind of system crash or a long-running job terminates abruptly, a lot of compute cycles will be wasted. Therefore, using software that can write checkpoint files every few hours and restart from those checkpoints is a very good idea. Restarting the job from the most recent checkpoint instead of from the beginning reduces the loss to a reasonable amount.
2. Every now and then, it is necessary that the system administrators schedule a downtime for maintenance and upgrades. Obviously, long running jobs are an obstacle.
3. Long running jobs disturb the good mixture of jobs from many users, a prerequisite for everyone to get a fair share of the system resources and make decent progress. They also conflict with the scheduling of large parallel jobs and lead to long waiting times and bad overall system usage.
Therefore, the maximum runtime for jobs has been set to 24 hrs, as at many other HPC sites. Jobs that run on no more than four nodes may run for up to 120 hrs (5 days).
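The checkpoint-and-restart approach recommended in point 1 can be sketched as a job script that stays within the 24-hour limit and resumes from the newest checkpoint file. The directory name, the `.chk` suffix, the application `my_app` and its `--restart` flag are all hypothetical placeholders; how checkpoints are written and read depends entirely on your software.

```shell
#!/usr/bin/env bash
#SBATCH --time=24:00:00      # stay within the 24-hour limit

# Prints the most recently modified *.chk file in the given directory,
# or nothing if the directory holds no checkpoint (hypothetical helper).
latest_checkpoint() {
    local newest="" f
    for f in "$1"/*.chk; do
        [ -e "$f" ] || continue          # glob matched nothing
        if [ -z "$newest" ] || [ "$f" -nt "$newest" ]; then
            newest=$f
        fi
    done
    if [ -n "$newest" ]; then
        printf '%s\n' "$newest"
    fi
}

CKPT_DIR=${CKPT_DIR:-checkpoints}        # hypothetical checkpoint directory
restart_file=$(latest_checkpoint "$CKPT_DIR")

if [ -n "$restart_file" ]; then
    echo "Restarting from $restart_file"
    # ./my_app --restart "$restart_file"   # hypothetical application and flag
else
    echo "No checkpoint found, starting from the beginning"
    # ./my_app
fi
```

Submitting the same script again after a job ends continues the computation from the last checkpoint written.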
Note: For applications that do not support sufficient checkpointing, we might approve jobs running for more than 120 hours. Please explicitly explain why you have this requirement in your project application. In any case, exceptional approvals are subject to the following conditions:
- Jobs with more than 120 hours must not use more than 4 compute nodes.
- The IT Center reserves the right to kill the job after 120 hours for maintenance reasons.
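The runtime rules of this section can be summarised as a small decision function. This is only an illustration of the limits stated above; the function name and the category labels are hypothetical, and the batch system itself enforces the actual limits.

```shell
#!/usr/bin/env bash
# Prints which rule applies to a job.
# Arguments: number of nodes, requested runtime in whole hours.
runtime_class() {
    local nodes=$1 hours=$2
    if [ "$hours" -le 24 ]; then
        echo "ok"                # within the general 24-hour limit
    elif [ "$hours" -le 120 ] && [ "$nodes" -le 4 ]; then
        echo "ok-long"           # up to 120 hours on at most 4 nodes
    elif [ "$hours" -gt 120 ] && [ "$nodes" -le 4 ]; then
        echo "needs-approval"    # exceptional approval required
    else
        echo "not-allowed"       # more than 4 nodes and more than 24 hours
    fi
}

runtime_class 16 24    # prints ok
runtime_class 4 120    # prints ok-long
runtime_class 2 200    # prints needs-approval
runtime_class 8 48     # prints not-allowed
```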
The batch system is usually able to determine the correct kind of compute node from the job's resource requirements, within the allowed partitions. Thus, for GPU jobs the partition will be set automatically; see GPU batch mode.
#SBATCH -p c16s
Overview of the available hardware: Overview