Apocrita Research Data Storage: Notice of “at risk” period
Wednesday 19th and Thursday 20th June 2019.
We will be performing essential firmware updates and patching on the storage system of the Apocrita HPC Cluster. This work completes the upgrade programme we have been carrying out over recent months.
We do not expect this work to affect access to, or use of, the cluster, and jobs will continue to run as normal. However, the storage system will have reduced redundancy and some additional load while the work is in progress.
Although the updates do not require downtime, there is an elevated risk of an unforeseen incident during the procedure, which will be carried out in cooperation with the storage vendor.
Detail
When we deployed the new scratch storage last year, we upgraded its nodes to the latest version of Spectrum Scale 5, which brought a number of enhancements. The main aim of this update is to bring those enhancements to our long-term storage as well, improving its performance and stability.
The update will also allow us to apply future upgrades, keeping the cluster up to date and running at its best.
We will also take the opportunity to install firmware and kernel updates on our storage servers, to resolve issues that our storage vendor has reported to us.
The work involves updating each of the four redundant storage servers in turn, which keeps the storage available throughout. While each server is being updated, the remaining three will take on its share of the work, so they may be under slightly higher load than usual.