File Systems Overview
Introduction
Each HPC user, along with their compute project, is allocated storage space on the HPC cluster to facilitate their research and computational tasks. The cluster offers various file systems, each with its unique advantages and limitations. This page provides an overview of these storage options.
Key Points:
- The appropriate file system for a workflow depends on the task at hand. Understanding the differences between the file systems described on this page helps with this choice; the decision tree below may also assist.
- Users are primarily responsible for backing up their data. The HPC cluster is not intended as a long-term data storage solution, so it is crucial to back up data regularly to avoid potential loss.
- Additional storage space for compute time projects can be requested by following the guidelines provided here.
- Instructions on how to check the available quota are detailed here.
The three permanent file systems are $HOME, $WORK, and $HPCWORK. To navigate to your personal storage space, simply use the cd command followed by the file system's name. For example, cd $WORK will take you to your storage space on $WORK. For a specific compute time project, use cd /home/<project-id> to access the project's space in the $HOME partition, cd /work/<project-id> for the $WORK partition, or cd /hpcwork/<project-id> for the $HPCWORK partition.
Overview
$HOME
Upon logging into the HPC cluster, users are directed to their personal home directory within $HOME. As a Network File System (NFS) with an integrated backup solution, this file system is particularly well suited for storing important results that are difficult to reproduce, as well as for developing code projects. However, $HOME is less suitable for running large-scale compute jobs due to its limited quota of 150 GB. In addition, frequent and massive file transfers, creation, and deletion can put significant strain on the backup system.
$WORK
$WORK shares some similarities with $HOME, as it is also operated on an NFS. The key difference is that $WORK has no backup solution. This absence of backups allows for a more generous storage quota of 250 GB, and there is greater flexibility in expanding storage for compute projects if needed.
$WORK is particularly suitable for compute jobs that are not heavily dependent on I/O performance and that generate numerous small files.
$HPCWORK
The $HPCWORK file system is based on Lustre, which allows for larger storage space and improved I/O performance compared to $HOME and $WORK. Each user and compute project is granted a default quota of 1000 GB on this file system. In addition, the system can handle extremely large files and fast parallel access to them.
These benefits are possible in part because file metadata is stored in metadata (MD) databases and handled by specialized servers. However, each file, regardless of its size, occupies a similar amount of space in the MD database. To keep the number of MD database entries manageable for each user and compute project, there is also a quota on the number of files on $HPCWORK. The default file quota is set to 50,000 files.
Note that $HPCWORK also provides no backup solution.
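Because every file counts toward this quota, it can be useful to check how many files a directory tree currently contains. The helper below is a generic sketch (count_files is a name chosen for this example; the cluster may also provide a dedicated quota tool):

```shell
# Count regular files below a directory tree; every one of them
# counts toward the $HPCWORK file quota.
count_files() {
    find "$1" -type f | wc -l | tr -d ' '
}

# Example: count_files /hpcwork/$USER
```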
$BEEOND
For compute jobs with high I/O performance demands, users can leverage the internal SSDs of the compute nodes. The BeeOND (BeeGFS on Demand) temporary file system enables users to combine the SSD storage of all requested nodes into one parallel file system with a single namespace.
Key Considerations when using BeeOND:
- The amount of allocated storage depends on the type and number of requested nodes (for more information click here).
- Compute jobs that use BeeOND automatically become exclusive.
- Within the job, the BeeOND file system path is accessible via the environment variable $BEEOND.
- The storage space on this file system is strictly temporary! All files are automatically deleted after the compute job concludes.
This example job script shows how to use BeeOND:
#!/usr/bin/zsh
### Request BeeOND
#SBATCH --beeond
### Specify other Slurm commands
### Copy input files to BeeOND
cp -r $WORK/yourfiles $BEEOND
### Navigate to BeeOND
cd $BEEOND/yourfiles
### Perform your job
echo "hello world" > result
### Afterwards copy results back to your partition
cp -r $BEEOND/yourfiles/result $WORK/yourfiles/
If you are unsure which file system to use for your compute jobs, the following decision tree may help:
Summary
The following table summarizes the file systems discussed above and provides additional details.
File System | Type | Path | Persistence | Snapshots | Backup | Quota (space) | Quota (# files) | Use Cases |
---|---|---|---|---|---|---|---|---|
$HOME | NFS/CIFS | /home/<username> | permanent | $HOME_SNAPSHOT | yes | 150 GB | - | Source code, configuration files, important results |
$WORK | NFS/CIFS | /work/<username> | permanent | $WORK_SNAPSHOT | no | 250 GB | - | Many small working files |
$HPCWORK | Lustre | /hpcwork/<username> | permanent | - | no | 1000 GB | 50,000 | I/O-intensive compute jobs, large files |
$BEEOND | BeeOND | stored in $BEEOND | temporary | - | no | limited by the sum of the sizes of the local disks | - | I/O-intensive compute jobs, many small working files, any kind of scratch data |
Additional Information
Visibility of File Systems
Each user directory is only mounted when a process actually accesses it. Therefore, you might not see a specific user directory in a listing of /home, /work, or /hpcwork (which can be confusing, especially if you are using a graphical file manager). This does not mean that the directory does not exist, only that you may have to type its path explicitly to get there. Any action that causes an access to your directory will mount it and thereby make it visible, for example changing into the directory in a terminal or typing the full path into the address bar of your file manager.
Backup Exclusion
The following files and directories are currently excluded from the backup in $HOME:
- Complete sub-directories:
.NOBACKUP
~/.cache
~/.comsol/*/configuration
~/.Trash*
~/.local/share/Trash*
- File patterns:
core.*.rz.RWTH-Aachen.DE.[1-9]*.[1-9]*
core.*.hpc.itc.rwth-aachen.de.[1-9]*.[1-9]*
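For example, large scratch data in $HOME that does not need to be backed up can be kept in a .NOBACKUP sub-directory, which the backup skips according to the list above:

```shell
# Create a directory that is excluded from the $HOME backup.
mkdir -p "$HOME/.NOBACKUP"

# Move expendable data there ("large_scratch_data" is a placeholder name):
# mv large_scratch_data "$HOME/.NOBACKUP/"
```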
Snapshots
Snapshots reflect the state of a file system at previous points in time. By changing to a snapshot directory ($HOME_SNAPSHOT or $WORK_SNAPSHOT), you can access previous versions of your files. The files within the snapshots are read-only; they cannot be altered or deleted. Please note that the snapshot creation policy is subject to change. If space gets short, we may decide to create fewer snapshots, delete existing snapshots, or omit them completely. Snapshots are not an alternative to a backup: if a file system gets damaged, all snapshots are lost, too.
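For example, an accidentally overwritten file can be copied back out of a snapshot. The helper below is a sketch; restore_from_snapshot is a name chosen for this example, and the actual snapshot sub-directory names can be listed with ls $HOME_SNAPSHOT:

```shell
# Copy one file out of a read-only snapshot into a writable location.
# Usage: restore_from_snapshot <snapshot-dir> <relative-path> <target-dir>
restore_from_snapshot() {
    cp "$1/$2" "$3/$(basename "$2").restored"
}

# Example (snapshot name and file name are placeholders):
# restore_from_snapshot "$HOME_SNAPSHOT/<snapshot-name>" myscript.py "$HOME"
```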
$TMP
In case you are using a single node exclusively and would like to use the local SSD, BeeOND can cause some unnecessary overhead. In this case, you can directly access the SSD via the path stored in $TMP.
We do not recommend using the local SSD storage on shared nodes, as there are no restrictions on how much space each user can occupy. It is therefore unpredictable whether storage space on the local SSD is available.
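Analogous to the BeeOND example above, a minimal sketch of an exclusive single-node job that stages data on the local SSD via $TMP might look as follows (yourfiles is a placeholder, and only the Slurm options relevant here are shown):

```shell
#!/usr/bin/zsh
### Use the node exclusively so the local SSD is not shared
#SBATCH --exclusive
#SBATCH --nodes=1

### Copy input files to the local SSD
cp -r $WORK/yourfiles $TMP
### Navigate to the local SSD
cd $TMP/yourfiles
### Perform your job
echo "hello world" > result
### Copy results back to a permanent file system; $TMP is wiped after the job
cp -r $TMP/yourfiles/result $WORK/yourfiles/
```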