The next era for Apocrita is here¶
For much of the year we have been working on a major project to upgrade Apocrita to a new operating system, Rocky Linux 9 (hereafter Rocky 9). As part of the project, we have deployed a new package building tool to help us recompile all of the research applications to work on the new system. We are now draining the cluster in batches to upgrade to Rocky 9, so now is a good time to move across if you haven't already, and so avoid longer queueing times as we reduce the number of remaining CentOS 7 nodes.
Frequently asked questions when migrating from CentOS 7
Please see the FAQs section for commonly asked questions relating to the migration from CentOS 7 to Rocky 9.
Notice
This blog post will receive further updates during the migration phase.
What is the new system?¶
The existing Apocrita cluster runs the CentOS 7 operating system, which is based on Red Hat Enterprise Linux (RHEL) 7. This has reached end-of-life and we are replacing it with Rocky 9, which is based on RHEL 9. Apart from security and compliance reasons, this will provide the benefit of much newer libraries and applications.
The existing end-of-life platform is no longer able to support newer software releases, and various packages have stopped working, as our community of R users has discovered recently. Our new platform offers newer versions of research applications, so this will be of particular interest to anyone wishing to get the latest versions of software.
Additionally, due to recent decommissioning of older nodes, we have been able to compile the new applications against a more modern CPU architecture, providing better compatibility and performance for all applications.
We have currently provisioned a new login node, an ondemand service and various types of compute and GPU nodes on Rocky 9. This includes our brand new `vhm` high memory servers, which can accommodate jobs of up to 3 TB of RAM.
How can I use the new system?¶
Although jobs can be submitted from the CentOS 7 frontend nodes, you will not be able to view the list of newly-built applications from a CentOS 7 node. To access the Rocky 9 login node, SSH into `login-test.hpc.qmul.ac.uk` using your existing Apocrita username, password and SSH key. To check what applications are available, you can run `module avail` from this server.
We have provided a new ondemand server, `ondemand-dev.hpc.qmul.ac.uk`, which brings new functionality and newer versions of applications (for example, a newer RStudio).
We will soon be moving the default login and ondemand servers to Rocky 9.
We have a range of compute nodes running Rocky 9, which can be checked with `nodestatus -t rocky` from the login server.
To run on the new Rocky 9 nodes, you must:
- request `-l rocky` to ensure your job is allocated to a Rocky 9 node
- ensure your job script is loading a module provided under Rocky 9
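As a rough illustration of both requirements, a minimal job script might look like the sketch below. It assumes Grid Engine-style `#$` directives; the one-hour runtime request and the `miniforge` module are placeholders chosen for illustration, not recommendations. The `report_os` helper is hypothetical and is included only so that the job output records which operating system the job landed on.

```shell
#!/bin/bash
# Run from the submission directory.
#$ -cwd
# Placeholder runtime request (assumption, adjust to your job).
#$ -l h_rt=1:0:0
# Ask the scheduler for a Rocky 9 node.
#$ -l rocky

# Hypothetical helper: print the OS name from an os-release style file
# so the job log shows where the job actually ran.
report_os() {
    release_file="${1:-/etc/os-release}"
    if [ -r "$release_file" ]; then
        # os-release files are shell-compatible, so source one in a subshell.
        ( . "$release_file" && printf '%s\n' "${PRETTY_NAME:-unknown}" )
    else
        printf '%s\n' "unknown"
    fi
}

echo "Job running on: $(report_os)"

# Load a module built for Rocky 9 (placeholder name); this step is
# skipped on machines where the module command is not available.
if command -v module >/dev/null 2>&1; then
    module load miniforge
fi
```

The first line of the job output then shows whether the job was allocated to a Rocky 9 node.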
Who should use the new system?¶
All users of Apocrita should now be ensuring that their workloads run correctly on Rocky 9. We have provisioned serial, parallel and GPU nodes running Rocky 9. Any issues encountered should be reported via the Slack channel or via a support ticket.
Changes for conda users¶
Due to some recent Anaconda licence changes, we now only provide the `miniforge` module for using conda environments. This works in the same way as `anaconda` and `miniconda` but does not use certain problematic Anaconda repositories. Miniforge is the officially supported installation method for the faster Mamba libsolver and comes with the popular conda-forge channel pre-configured as its only channel, where most of the common conda packages are found.
We strongly advise against using any existing conda environments on Rocky 9 that you created previously on CentOS 7. Instead, you should start with fresh environments. If you want to "migrate" old ones, export them as a YML file on a CentOS 7 node and then use that same YML file to re-create the environment on a Rocky 9 node.
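The export and re-create steps might look like the following sketch. The environment name `myenv` and the file name `myenv.yml` are hypothetical, the two functions are only defined here rather than run, and the sketch assumes `conda` is already on your path (on the Rocky 9 side, via the `miniforge` module).

```shell
# Step 1, on a CentOS 7 node: write the environment definition to a YML file.
export_env() {
    env_name="$1"
    yml_file="$2"
    conda env export --name "$env_name" > "$yml_file"
}

# Step 2, on a Rocky 9 node: build a fresh environment from that file.
recreate_env() {
    yml_file="$1"
    conda env create --file "$yml_file"
}

# Example (hypothetical names):
#   export_env myenv myenv.yml     # on a CentOS 7 node
#   recreate_env myenv.yml         # on a Rocky 9 node
```

Before re-creating the environment, it may be worth checking the `channels:` section of the exported file, since an environment built with `anaconda` or `miniconda` may list channels other than conda-forge.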
More detailed information can be found in this separate blog post.
Migration from CentOS 7 FAQs¶
Do I need to copy/move my data across from the CentOS 7 cluster?¶
No. All current storage systems (home directories, scratch and lab shares) are available on Rocky 9 using the same paths as on the CentOS 7 cluster, for example `/data/home/$USER` for home directories and `/data/scratch/$USER` for scratch.
You will not need to copy or move your data when migrating to Rocky 9.
How can I use applications on the Rocky 9 cluster?¶
The modules built on Rocky 9 are similar to those in CentOS 7, providing a familiar environment for users transitioning between the two clusters. To view a list of all available modules, run `module avail`.
Note that we have only built newer versions of software on the Rocky 9 cluster, so you should check that the versions available are suitable for your research.
See the Using Modules page on our docs site for more information.
How can I resolve `GLIBC_2.17' not found` errors in Rocky 9?¶
GLIBC is a core library of the operating system and cannot be swapped for a different version. Software requiring version 2.17 of GLIBC (CentOS 7) will need to be rebuilt using the newer version of GLIBC present in Rocky 9.
You may also need to remove any local files written in your home directory before rebuilding the software. This includes, but is not limited to: `~/.cache`, `~/.jupyter` and `~/.local`.
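One cautious way to clear these directories is to rename them rather than delete them, so that they can be restored if something in them turns out to be needed. The sketch below swaps deletion for renaming; the function name `stash_old_dirs` and the `.centos7-bak` suffix are made up for illustration.

```shell
# Rename leftover per-user directories instead of deleting them outright.
# Usage: stash_old_dirs [base_dir]   (base_dir defaults to $HOME)
stash_old_dirs() {
    base_dir="${1:-$HOME}"
    for name in .cache .jupyter .local; do
        src="$base_dir/$name"
        dest="$src.centos7-bak"
        # Only move a directory that exists and has not been stashed before.
        if [ -e "$src" ] && [ ! -e "$dest" ]; then
            mv "$src" "$dest"
            echo "moved $src -> $dest"
        fi
    done
}

# Example:
#   stash_old_dirs
```

Once the rebuilt software is working on Rocky 9, the `.centos7-bak` copies can be deleted.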
Do I need to kill my running CentOS 7 jobs?¶
No. Your currently running jobs can run to completion. Once they have completed, we recommend only submitting Rocky 9 jobs using the extra `-l rocky` parameter.
Please raise a support ticket if you require any assistance with submitting Rocky 9 jobs.
Do I need to request a new HPC account on the Rocky 9 cluster?¶
No. Your existing HPC username and password will continue to work on the Rocky 9 cluster.
Do I need to generate a new SSH key for the Rocky 9 cluster?¶
No. Even though we now recommend using ED25519 keys, we still support existing RSA keys of 4096 bits or more.
See our SSH keys page on our docs site for more information.
How can I fix a `REMOTE HOST IDENTIFICATION HAS CHANGED!` login error?¶
This is shown because phase 2 of the cluster migration includes changing the `login.hpc.qmul.ac.uk` address to point to the Rocky 9 login servers, which have a different host identification than the CentOS 7 frontend nodes.
To fix this, you will need to remove the cached entry for `login.hpc.qmul.ac.uk` in your local `~/.ssh/known_hosts` file, either by manually removing it using a text editor or by running the `ssh-keygen -R login.hpc.qmul.ac.uk` command. This is a one-off operation caused by the move from CentOS 7 to Rocky 9.
After removing the cached entry, simply log into the cluster again and accept the new host identification if prompted.
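If you are unsure whether your machine has a cached entry at all, the removal step can be wrapped in a small check, as in the sketch below. The function name `refresh_host_key` is hypothetical; it relies only on the `-F` (find) and `-R` (remove) options of `ssh-keygen` and should be run on your local machine, not on the cluster.

```shell
# Drop the cached host key only if one is actually stored.
# ssh-keygen -F looks a host up in known_hosts; -R removes its entries.
refresh_host_key() {
    host="$1"
    if ssh-keygen -F "$host" >/dev/null 2>&1; then
        ssh-keygen -R "$host"
    else
        echo "no cached entry for $host"
    fi
}

# Example:
#   refresh_host_key login.hpc.qmul.ac.uk
```

`ssh-keygen -R` keeps a copy of the previous file as `known_hosts.old`, so the change can be undone if needed.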
Photo by Chris Liverani on Unsplash.