Hints for the selection of the compute node type on Claix

Short Information

Hints for the selection of the compute node type on CLAIX-2016 and CLAIX-2018


Detailed Information

Hints for the selection of the compute node type on CLAIX (with SLURM)

For your free quota you can use the CLAIX-18 nodes.

For your project you can use the partitions your project is configured for. New projects are typically configured for CLAIX-18 nodes, old/prolonged projects for the nodes on which the project already ran. You can also ask for additional partitions if needed; do not forget to motivate your request. To check which partitions your project is configured for, see: SLURM Accounting
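As a quick check from a login shell, the standard SLURM commands below list your accounting associations and the partitions visible to you; this is only a sketch, the authoritative procedure is described on the SLURM Accounting page.

# List the accounts/associations under which your user may submit jobs
sacctmgr show associations user=$USER format=Account,Partition,QOS

# Summarize the partitions that are visible to you
sinfo -s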

CLAIX-2018-MPI

For all compute projects, batch jobs are directed to the project's primary compute node type.
For most compute projects, the primary compute node type is set to CLAIX-2018-MPI.

The characteristics of this node type are:

  • 2 Intel Xeon Platinum 8160 processors “Skylake” (2.1 GHz, 24 cores each) and thus 48 cores per node,
  • 192 GB main memory per node (~4 GB main memory per core)
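For illustration, a minimal batch script sized for one CLAIX-2018-MPI node might look as follows; the per-core memory value and the program name are placeholders/assumptions, and no partition is set because the batch system selects the primary node type automatically.

#!/usr/bin/env bash
#SBATCH --job-name=mpi_example
#SBATCH --time=01:00:00          # wall-clock limit
#SBATCH --ntasks=48              # one MPI rank per core of a CLAIX-2018-MPI node
#SBATCH --ntasks-per-node=48
#SBATCH --mem-per-cpu=3900M      # stay slightly below ~4 GB/core (assumed value)

srun ./my_mpi_program            # placeholder for your MPI executable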

CLAIX-2016-MPI

When applying for a JARA computing project, access to CLAIX-2016-MPI can be requested (see also https://www.jara.org/de/656 )

For compute projects approved this way, the primary compute node type is set to CLAIX-2016-MPI; in turn, you cannot use CLAIX-2018-MPI nodes with these projects.

The characteristics of this node type are:

  • 2 Intel Xeon E5-2650 v4 processors “Broadwell” (2.2 GHz, 12 cores each) and thus 24 cores per node,
  • 128 GB main memory per node (~5 GB main memory per core)

CLAIX-2016-SMP

The main advantage of a “fat” SMP compute node is the ability to provide a lot of memory to many cores accessing that shared memory coherently. As it turns out, the demand for such applications is rather low. Thus, by default no projects are configured to use this kind of hardware; please request this kind of nodes explicitly in the project description of a new application, or open a ticket via mail to servicedesk@itc.rwth-aachen.de.

MPI programs typically do not really profit from these rather expensive machines. That is why CLAIX consists of a large number of "small" CLAIX-2016-MPI and CLAIX-2018-MPI nodes (24 resp. 48 cores, 128 resp. 192 GB memory) and only 8 fat CLAIX-2016-SMP nodes (144 cores, 1 TB memory). For jobs that need a lot of memory per core, CLAIX-2016-SMP nodes (~7.1 GB/core) have a slight advantage over CLAIX-2016-MPI and CLAIX-2018-MPI nodes (~5.3 resp. ~4 GB/core).

The characteristics of this node type are:

  • 8 Intel Xeon E7-8860 v4 processors “Broadwell” (2.2 GHz, 18 cores each) and thus 144 cores per node,
  • 1 TB main memory per node (~7 GB main memory per core)
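As an illustration, a single-node shared-memory job that needs most of the memory of one of these fat nodes could be sketched as follows; the partition name c16s is taken from the job submission example further below, and the concrete memory figure is an assumption.

#!/usr/bin/env bash
#SBATCH --job-name=smp_bigmem
#SBATCH --time=12:00:00
#SBATCH -p c16s                  # SMP partition (see "Job submission" below)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=144      # all cores of the fat node
#SBATCH --mem=900G               # large shared-memory request (assumed value)

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
./my_openmp_program              # placeholder for your shared-memory application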

CLAIX-2016-GPU and CLAIX-2018-GPU

A few CLAIX-2016-GPU nodes are configured like CLAIX-2016-MPI nodes and are additionally equipped with 2 NVIDIA Pascal P100 GPUs each.

Likewise, a group of CLAIX-2018-GPU nodes is configured like CLAIX-2018-MPI nodes and additionally equipped with 2 NVIDIA Tesla V100 GPUs each.
NVLink is employed to link these 2 GPUs with each other. Each GPU provides 16 GB of HBM2 memory.

There is a rare constellation, caused by a software bug, in which the older P100 GPU may be faster than the V100 GPU, but typically the newer GPU will be faster for your code.

The CLAIX-2018-GPU nodes are open for free/test quota usage and for all projects configured for the CLAIX-18 cluster. The CLAIX-2016-GPU nodes are available only for projects configured for the CLAIX-16 cluster. If you wish to use both kinds of GPU nodes within the same project, please request this explicitly in the project description of a new application or open a ticket via mail to servicedesk@itc.rwth-aachen.de.

Submitting batch jobs on the GPU cluster: GPU batch mode
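As a rough sketch, a GPU job on a CLAIX-2018-GPU node could be requested with the generic SLURM GRES syntax as shown below; the exact GRES, partition, and module names on CLAIX are documented on the GPU batch mode page and may differ.

#!/usr/bin/env bash
#SBATCH --job-name=gpu_example
#SBATCH --time=02:00:00
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=12
#SBATCH --gres=gpu:2             # request both GPUs of a node (GRES name assumed)

module load cuda                 # module name assumed; check the GPU batch mode page
srun ./my_gpu_program            # placeholder for your GPU executable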

24 hrs versus 120 hrs max job runtime

It is desirable that compute jobs do not run "forever", mainly for the following reasons:

1. If there is some kind of system crash or a long running job terminates abruptly, a lot of compute cycles will be wasted. Therefore, using software that is able to write checkpoint files every few hours and to restart from those checkpoints is a very good idea. Restarting the job from the most recent checkpoint instead of from the beginning reduces the loss to a reasonable amount.

2. Every now and then, it is necessary that the system administrators schedule a downtime for maintenance and upgrades. Obviously, long running jobs are an obstacle.

3. Long running jobs disturb the good mixture of jobs from many users, a prerequisite for everyone to get a fair share of the system resources and make decent progress. They also conflict with the scheduling of large parallel jobs and lead to long waiting times and bad overall system usage.

Therefore, the maximum runtime for jobs has been set to 24 hrs like at many other HPC sites. Jobs that can run on up to four nodes may run up to 120 hrs (5 days).
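If your application writes checkpoints, one common way to work within the 24-hour limit is to split the computation into a chain of dependent jobs, each of which restarts from the latest checkpoint. The following is only a sketch; run_part.sh is a placeholder script that is assumed to detect an existing checkpoint file and resume from it.

# Submit the first part of the computation
jobid=$(sbatch --parsable run_part.sh)

# Each follow-up part starts only after the previous one has finished successfully
for i in 2 3 4; do
    jobid=$(sbatch --parsable --dependency=afterok:$jobid run_part.sh)
done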

Note: For applications which do not support sufficient checkpointing, we might approve jobs running for more than 120 hours. Please explicitly explain why you have this requirement in your project application. In any case, exceptional approvals are coupled to the following conditions:

  1. Jobs with more than 120 hours must not use more than 4 compute nodes.
  2. The IT Center reserves the right to kill the job after 120 hours for maintenance reasons.

Job submission

The batch system is usually able to correctly determine the kind of compute node from the job resource requirements, within the allowed partitions. Thus, for GPU jobs the partition will be set automatically; see GPU batch mode.

Within your set of allowed partitions (see SLURM Accounting and Hardware of the RWTH Compute Cluster) you can manually set the partition by

#SBATCH -p c16s

Additional Information

Overview of the available Hardware: Overview

Last modified on 29.01.2021
