
Cluster Hardware Upgrades and Additions

Simon Butcher · Dec 04, 2018 · 2 mins read

As part of our commitment to regular upgrades to the HPC service, and to keep up with ever-growing demand, we are pleased to announce the addition of new hardware to the Apocrita HPC Cluster for the benefit of all QMUL Researchers.

The additions form part of a series of infrastructure improvements that are currently being deployed, namely:

1) New compute nodes: available now

  • 32 new “sdv” nodes for general use, running fast Intel Skylake processors. Currently, 16 sdv nodes are allocated to the standard smp queues and 16 to the multi-node parallel queues (accessed using -l infiniband=sdv-i, see here). We expect to adjust this ratio slightly after analysing demand.

  • A growing number of additional “restricted” nodes have also been funded by researchers who require guaranteed access to resources. This approach is worth considering when writing grant proposals, as it is a very efficient and scalable way of leveraging the existing investment in the HPC cluster.
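
As an illustration of the -l infiniband=sdv-i request mentioned above, a job script for the multi-node parallel queues might look like the sketch below. It assumes Grid Engine style "#$" directives (suggested by the -l syntax); the parallel environment name, slot count and runtime are placeholders rather than documented Apocrita values, so please check the cluster documentation before use.

```shell
#!/bin/bash
# Sketch of a multi-node job script. Values marked "assumed" are placeholders.

# Run the job from the current working directory.
#$ -cwd
# Assumed parallel environment name and slot count.
#$ -pe parallel 64
# Request the new sdv nodes (resource name taken from the announcement).
#$ -l infiniband=sdv-i
# Assumed runtime limit of 1 hour.
#$ -l h_rt=1:0:0

# NSLOTS is normally set by the scheduler at run time; fall back to "unset"
# so the script can also be dry-run outside the queue.
echo "Slots allocated by the scheduler: ${NSLOTS:-unset}"

# Replace this line with the real parallel launch command for your application.
echo "Job would start here"
```

Running the script directly with bash is a harmless dry run, since the "#$" lines are ordinary comments to the shell and are only interpreted at submission time.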

2) Networking upgrades:

  • maximum single stream through the firewall is now 10Gb/s
  • added new 100Gb/s “top of cluster” core network switches
  • new nodes utilise 25Gb/s Ethernet connections and, where applicable, EDR 100Gb/s InfiniBand
  • ensuring a minimum of 10Gb/s connectivity from any node (work in progress)

3) New srm high memory nodes: available now

  • some researchers have occasional requirements to run jobs utilising large amounts of memory. Our existing “sm” nodes have been supplemented with 2 srm nodes with 768GB RAM for performing this type of work.

4) New GPU nodes

  • we will shortly be deploying 2 “sbg” nodes featuring high-performance NVIDIA V100 GPU cards. They will add to our existing GPU capabilities, which have been popular for machine learning and molecular dynamics workloads.

5) Fast Scratch storage

  • we are pleased to announce that we will soon attach a ~100TB E8 flash storage array to the cluster. Testing has shown that I/O-constrained applications perform much better on this high-performance storage.
  • we recently participated in the international IO-500 storage benchmark challenge, ranking 8th in the 10-node challenge and 19th on the overall list, ahead of the likes of ARCHER and CERN.
  • a further announcement will be sent when the fast scratch is ready for full production use.

6) Access to Tier 2 HPC facilities

  • A reminder that QMUL also have access to Tier 2 clusters for large parallel jobs, predominantly for EPSRC-funded research.

7) Citations

  • Please remember to cite Apocrita in your research, as it strengthens our business case for upgrades such as this.
Written by Simon Butcher
Head of Research Applications. He likes open source software, maths and problem-solving.