2018

Performance testing with NVMe storage and Spectrum Scale 5

We have recently procured 120TB of NVMe-based SSD storage from E8 Storage for the Apocrita HPC Cluster. The plan is to deploy it as a replacement for our oldest and slowest scratch storage. We have been testing the new storage extensively, as we expect it to open up new possibilities within the cluster.

What is the ITSR RSE team?

ITS Research has a Research Software Engineering team. This post introduces the team and explains how it supports research at Queen Mary University of London. It also covers how to contact the team and why you may want to.

Cluster Hardware Upgrades and Additions

As part of our commitment to regular upgrades to the HPC service, and to keep up with ever-growing demand, we are pleased to announce the addition of new hardware to the Apocrita HPC Cluster for the benefit of all QMUL Researchers.

Short queue

In addition to the primary queue, there is a queue designed to minimise waiting times for short jobs and interactive sessions. It was added in response to users who asked to be able to obtain qlogin sessions promptly for quick tests and debugging. This short queue runs on a wider selection of nodes and is selected automatically if your runtime request is 1 hour or less.
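
As a rough illustration of the 1 hour rule, the sketch below checks whether a runtime request falls within the limit. The `H:M:S` and plain-seconds formats are an assumption modelled on typical Grid Engine style runtime requests, and the function names are hypothetical; the scheduler itself makes the actual decision.

```python
# The limit comes from the text above: "1 hour or less".
SHORT_QUEUE_LIMIT_SECONDS = 60 * 60


def runtime_to_seconds(runtime):
    """Convert 'H:M:S' or a plain number of seconds to an integer.

    Assumption: these are the only two formats handled here.
    """
    parts = [int(p) for p in runtime.strip().split(":")]
    if len(parts) == 1:
        return parts[0]
    if len(parts) == 3:
        hours, minutes, seconds = parts
        return hours * 3600 + minutes * 60 + seconds
    raise ValueError(f"unrecognised runtime: {runtime!r}")


def uses_short_queue(runtime):
    """Return True if the request is within the 1 hour limit."""
    return runtime_to_seconds(runtime) <= SHORT_QUEUE_LIMIT_SECONDS
```

For example, `uses_short_queue("1:0:0")` is true, while `uses_short_queue("1:0:1")` is false because it exceeds the limit by one second.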

Deprecated modules

We removed some problematic module files. Please check your job scripts for use of these modules:

  • Python: Due to a number of issues with the Python module installs, versions older than 2.7.14 (Python 2) and 3.6.3 (Python 3) are being removed from Apocrita (python/2.7.13, python/2.7.13-1, python/2.7.13-3, python/3.6.1, python/3.6.2, python/3.6.2-2).
  • Java: java/1.8.0_121-oracle causes problems with mass thread spawning on the cluster and will be removed. java/1.8.0_152-oracle will remain the default version.
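
One way to check a job script for the modules listed above is a small scan like the following. This is a minimal sketch: the module names are copied from the list, but the `find_deprecated_modules` helper and the assumption that scripts load modules with `module load` or `module add` lines are illustrative, not part of any cluster tooling.

```python
import re

# Module names copied from the list above; nothing else is assumed deprecated.
DEPRECATED_MODULES = {
    "python/2.7.13",
    "python/2.7.13-1",
    "python/2.7.13-3",
    "python/3.6.1",
    "python/3.6.2",
    "python/3.6.2-2",
    "java/1.8.0_121-oracle",
}

# Assumption: modules are loaded with lines such as "module load a/1 b/2".
MODULE_LOAD = re.compile(r"^\s*module\s+(?:load|add)\s+(.+)$")


def find_deprecated_modules(script_text):
    """Return (line_number, module_name) pairs for deprecated modules."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        # Ignore anything after a '#', which covers comments and directives.
        code = line.split("#", 1)[0]
        match = MODULE_LOAD.match(code)
        if not match:
            continue
        for name in match.group(1).split():
            # Exact comparison, so python/2.7.14 is not flagged by mistake.
            if name in DEPRECATED_MODULES:
                hits.append((number, name))
    return hits


example = """#!/bin/bash
#$ -cwd
module load python/3.6.2
module load java/1.8.0_152-oracle
python analysis.py
"""
print(find_deprecated_modules(example))  # [(3, 'python/3.6.2')]
```

Names are compared exactly rather than by prefix, so versions that remain available are never reported.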

Home and Group Directories

During the summer, home directories were migrated to the new storage platform. This means that quotas have grown slightly as the underlying block size has increased.

The qmquota command reports how much space you are using. Note that quotas apply to the number of files as well as to total size. Each research group gets 1TB of free storage space on the cluster; if your group does not have an allocation yet, please contact us and we can arrange one.

Tier 2 Announcement

QMUL have access to powerful Tier 2 (formerly known as Regional) HPC resources, predominantly for EPSRC researchers. If you run multi-node parallel code (using MPI, for example), you will likely benefit from using the Tier 2 clusters.

Deprecated openmpi 2.0.2-gcc module

We identified a problem with the openmpi/2.0.2-gcc module and have removed it, because it was not using the correct interface for MPI communication between nodes. This could make communication much slower, so jobs took longer to run.

Programs should be rebuilt against the other available openmpi modules, which correctly select the InfiniBand interconnect by default for communication. Recent users of this module have been contacted directly.