A guide to using Apocrita's scratch storage

The Apocrita scratch storage is a high performance storage system designed for short-term file storage, such as working data. We recently replaced the hardware that provides this service, and expanded the capacity from 250TB to around 450TB. This article will look at the recent changes, and suggest some best practices when using the scratch system.

Keen readers may recall that we used to have NVMe-based scratch storage, which was marginally faster than the SAS-based SSD storage we are currently using. We had to replace the previous system because the vendor was acquired by another company, and the parent company dropped the hardware and software support required to integrate it with our HPC system. The new system has a larger capacity, with the added benefit of scalability.

One benefit of the new system is that we are no longer dependent on proprietary kernel modules (provided by the vendor for the old system) - improving security, administration and service availability.

Our performance testing suggests the new system is between 2 and 5 times faster than our long-term storage (which is based on high-capacity hard drives) and performs within 10% of the previous scratch system, giving a very good trade-off between price per TB and performance.

As a result of the increased capacity, we have increased the personal scratch quotas from 1TB to 3TB. To provide a quota of 3TB to every user, we over-provision the storage based upon a short-term usage model: files are automatically deleted 35 days after they were last read or written to.

Why do we automatically delete files? Scratch is designed as temporary space for working data; it is not backed up, and is not designed as a final destination for your data. Experience shows that if we don’t automatically purge old data, scratch eventually fills up and individual quotas have to be reduced significantly. We experienced this accumulation of stale temporary data on our original scratch system some years ago. When spending significantly more money on high-performance flash-based storage, we wanted to ensure optimal use, so we implemented the automatic purging of old data after consulting various QMUL research user groups.

Usage policy

We do not permit using touch to artificially modify the timestamps of data in scratch. On rare occasions there may be a justifiable reason for doing this, but we will send out emails and reduce quotas if we find people abusing this fair-use policy.

This policy exists for the following reasons:

Scratch is NOT backed up and we feel it is important that files of a non-temporary nature should be backed up.
Scratch is more expensive than the main storage. It is a premium resource which we are able to provide for free, on the understanding that the usage policy is adhered to.
Scratch is over-provisioned, based upon modelling of the inflow and outflow of data from the system.
Anyone using scratch as long-term storage creates a risk that the system will fill up and cause many jobs to fail besides their own.
Touching files en masse puts unnecessary stress on the system, which can slow other people's jobs and reduce the life span of the storage.

To maximise available space, we would like to encourage users to tidy up files once jobs are finished, rather than wait for the automated system to run. This could be adopted as part of your job workflow if you are already copying processed data to long-term storage, for example. If 150 people simultaneously use 3TB each, that is 450TB, which is roughly the full capacity of the system, and scratch will fill up. By performing a manual tidy-up we can ensure the scratch is available for everyone when they need it, even when the cluster is at its busiest.

Good practice

The best use of scratch is to:

Copy or create the working directory in scratch, containing the data files to be worked on.
Run the job(s).
Copy the data you want to keep to your group fileset (consider compressing it at this point).
Delete the files from scratch.
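The four steps above can be sketched as a single shell function. This is only an illustration under stated assumptions: the function name stage_run_collect, the input file name input.txt and the sort command standing in for the real job are all placeholders, and you would pass your own scratch directory (for example /data/scratch/$USER) and your own group fileset path as arguments.

```shell
# stage_run_collect INPUT_DIR SCRATCH_ROOT RESULTS_DIR
# Hypothetical helper: every name here is a placeholder, not an Apocrita tool.
stage_run_collect() {
    local input_dir="$1"
    local scratch_root="$2"
    local results_dir="$3"
    local workdir

    # 1. Create a unique working directory in scratch and copy the inputs in.
    workdir=$(mktemp -d "$scratch_root/job.XXXXXX") || return 1
    cp -r "$input_dir/." "$workdir/" || return 1

    # 2. Run the job (placeholder: sort a file assumed to be called input.txt).
    sort "$workdir/input.txt" > "$workdir/sorted.txt" || return 1

    # 3. Copy the data you want to keep to long-term storage, compressed.
    mkdir -p "$results_dir" || return 1
    tar -czf "$results_dir/results.tar.gz" -C "$workdir" sorted.txt || return 1

    # 4. Delete the working files from scratch.
    rm -rf -- "$workdir"
}

# Example call (paths are assumptions, adjust to your own):
# stage_run_collect "$HOME/inputs" "/data/scratch/$USER" "/path/to/group/fileset/results"
```

Because the clean-up is the last step of the function, scratch is only cleared once the results archive has been written successfully.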

It is possible to make use of $TMPDIR, which is local temporary space on each node. While this has the benefit of automatically deleting at the end of the job, this is only really useful for smaller sets of data, due to the lower capacity of local storage.

For easy access to your scratch space from your home directory, you may wish to create a symbolic link to the scratch area. For example, entering ln -s /data/scratch/$USER $HOME/scratch will create a link named scratch in your home directory, which may make copying files simpler, particularly when using a graphical utility such as FileZilla. Adding a link in this way does not affect your home directory quota.

The qmquota command shows your scratch usage in addition to the usage of other filesets you have access to. Note that quotas are applied to both total size and number of files (inodes). Exceeding either the size or the inode limit means no further files can be written to scratch until you are back within both limits.
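To see which directory is responsible for your usage, standard tools can give a rough per-directory figure for both size and file count. The sketch below is an approximation only: usage_summary is a hypothetical helper name, and qmquota remains the authoritative source for what counts against your quota.

```shell
# usage_summary DIR
# Hypothetical helper: prints "<kilobytes> <entries>" for DIR.
# Entries include DIR itself and every file and directory below it,
# since each of them occupies an inode.
usage_summary() {
    local dir="$1"
    local kb entries
    kb=$(du -sk -- "$dir" | cut -f1) || return 1
    # Counts one line per entry, so names containing newlines are over-counted.
    entries=$(find "$dir" | wc -l)
    echo "$kb $entries"
}

# Example call (path is an assumption):
# usage_summary "/data/scratch/$USER/my_project"
```

A directory holding many tiny files can hit the inode limit long before the size limit, which is a good reason to archive such directories with tar before copying them elsewhere.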

Automatic file deletion

Files are deleted based on timestamps. Each file has 3 timestamps:

Atime : time of last access (only updated once a day for performance reasons).
Ctime : time the file's status last changed, which includes writes to the file as well as changes to its permissions or ownership.
Mtime : time the file's contents were last modified.

We use the most recent of these timestamps to check whether the file is eligible for removal. The time shown by ls -l is the mtime, so it does not always give a correct idea of the file's expiry time. Using ls -lu will show the atime instead, and ls -lc will show the ctime.
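To check all three timestamps at once rather than running ls several times, something like the following sketch can be used. It assumes GNU stat (as found on Linux), and newest_timestamp is a hypothetical helper name rather than a tool provided on Apocrita.

```shell
# newest_timestamp FILE
# Hypothetical helper: prints the most recent of atime, mtime and ctime
# as seconds since the epoch. Assumes GNU stat, where
# %X = atime, %Y = mtime and %Z = ctime.
newest_timestamp() {
    local times t
    local newest=0
    times=$(stat -c '%X %Y %Z' -- "$1") || return 1
    for t in $times; do
        if [ "$t" -gt "$newest" ]; then
            newest=$t
        fi
    done
    echo "$newest"
}

# Example: show the result as a readable date (GNU date).
# date -d "@$(newest_timestamp myfile.dat)"
```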

The deletion script is run at 7am daily:

Files over 31 days old will be listed in a file and emailed to you. This list does not distinguish between files due for deletion tomorrow and those due within the next 4 days, so please act quickly if you find a file you need in this email.
Files over 35 days (5 weeks) are deleted and no further emails are sent relating to these files.

As a last resort, the deleted files remain for 3 days in snapshots. Together with the 4 days of warning, this gives you a full week from the first email before the files are gone for good.
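The 31-day and 35-day thresholds can be applied to a single file with a little arithmetic. The sketch below is not the deletion script itself: scratch_status is a hypothetical helper, it assumes GNU stat, and the exact boundary handling of the real script is not documented here, so treat the result as a rough guide.

```shell
# scratch_status FILE [NOW]
# Hypothetical helper: prints "ok", "warned" (over 31 days idle) or
# "expired" (over 35 days idle), based on the newest of atime, mtime
# and ctime. NOW is an optional epoch time, defaulting to the current time.
scratch_status() {
    local file="$1"
    local now="${2:-$(date +%s)}"
    local times t idle
    local newest=0
    times=$(stat -c '%X %Y %Z' -- "$file") || return 1
    for t in $times; do
        if [ "$t" -gt "$newest" ]; then
            newest=$t
        fi
    done
    # Idle time in seconds since the newest timestamp.
    idle=$((now - newest))
    if [ "$idle" -gt $((35 * 86400)) ]; then
        echo "expired"
    elif [ "$idle" -gt $((31 * 86400)) ]; then
        echo "warned"
    else
        echo "ok"
    fi
}
```

Passing a future time as the second argument shows how a file would be classified on that date if nothing touches it in the meantime.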

Snapshots

Snapshots are normally taken at 23:00 every night of both scratch and our main Research Data Store. These are kept for 7 days on main, but only 3 days on scratch. You may retrieve files from any day within that time period, so long as the file existed at the time the snapshot occurred. The snapshots are stored in the hidden .snapshots directory anywhere on the filesystem. We also have tape-based backup of non-scratch files, and these go back for the last 3 months.
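Retrieving a file from a snapshot is an ordinary copy. The sketch below assumes that each snapshot appears as a subdirectory of .snapshots containing the files as they were at that time; the actual snapshot names and layout are not described here, so list the .snapshots directory first to see what is available. restore_from_snapshot is a hypothetical helper name.

```shell
# restore_from_snapshot DIR SNAPSHOT FILE
# Hypothetical helper: copies DIR/.snapshots/SNAPSHOT/FILE to
# DIR/FILE.restored, so an existing file is never overwritten.
# The .snapshots/<snapshot>/<file> layout is an assumption.
restore_from_snapshot() {
    local dir="$1"
    local snapshot="$2"
    local file="$3"
    local src="$dir/.snapshots/$snapshot/$file"
    if [ ! -e "$src" ]; then
        echo "not found in snapshot: $src" >&2
        return 1
    fi
    cp -- "$src" "$dir/$file.restored"
}

# Example (directory, snapshot name and file name are placeholders):
# ls /data/scratch/$USER/project/.snapshots
# restore_from_snapshot "/data/scratch/$USER/project" "<snapshot name>" results.csv
```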

If you have any questions regarding how scratch operates or need some assistance in more effective usage, please contact us.

Title image by Karolina Grabowska on Pexels, available under the Creative Commons CC0 licence.