Welcome to the QMUL HPC blog¶

October 14, 2024
in tutorial
7 min read

Beginner’s Guide to Regular Expressions with Examples

Regular expressions, or regex, are patterns used to match strings of text. They can be very useful for searching, validating, or manipulating text efficiently. This guide will introduce the basics of regex with easy-to-follow examples.

October 4, 2024
in rse
12 min read

A Short Guide to PyTorch DDP

In this blog post, we explore what torchrun and DistributedDataParallel are and how they can be used to speed up your neural network training by using multiple GPUs.

September 4, 2024
in news
3 min read

Upgraded highmem nodes on Apocrita

The Apocrita highmem nodes have just been upgraded so that they contain newer CPUs with more modern instruction sets.

July 12, 2024
in rse
7 min read

A Slight Case of Overthreading

We still encounter jobs on the HPC cluster that try to use all the cores on the node on which they're running, regardless of how many cores they requested, leading to node alarms. Sometimes, jobs try to use exactly twice or one-and-a-half the allocated cores, or even that number squared. This was a little perplexing at first. In your enthusiasm to parallelize your code, make sure someone else hasn't already done so.

March 12, 2024
in tutorial
5 min read

Using Git projects within RStudio

In this tutorial we'll be showing you how to create a new Git project within RStudio using either a new or existing GitHub repository

March 1, 2024
in tutorial
5 min read

Using R inside of Conda

Whilst most Apocrita users will want to use the R module or RStudio via OnDemand for R workflows, it is also possible to use R inside of Conda via Miniforge.

February 22, 2024
in rse
21 min read

Some Pleasingly Parallel GPU Case Studies in Machine Learning

In a previous blog, we discussed ways we could use multiprocessing and mpi4py together to use multiple nodes of GPUs. We will cover some machine learning principles and two examples of pleasingly parallel machine learning problems. Also known as embarrassingly parallel problems, I rather call them pleasingly because there isn't anything embarrassing when you design your problem to be run in parallel. When doing so, you could launch very similar functions to each GPU and collate their results when needed.

January 26, 2024
in article, featured
4 min read

A look at the Grace Hopper superchip

NVIDIA recently announced the GH200 Grace Hopper Superchip which is a combined CPU+GPU with high memory bandwidth, designed for AI workloads. These will also feature in the forthcoming Isambard AI National supercomputer. We were offered the chance to pick up a couple of these new servers for a very attractive launch price.

The CPU is a 72-core ARM-based Grace processor, which is connected to an H100 GPU via the NVIDIA chip-2-chip interconnect, which delivers 7x the bandwidth of PCIe Gen5, commonly found in our other GPU nodes. This effectively allows the GPU to seamlessly access the system memory. This datasheet contains further details.

Since this new chip offers a lot of potential for accelerating AI workloads, particularly for workloads requiring large amounts of GPU RAM or involving a lot of memory copying between the host and the GPU, we've been running a few tests to see how this compares with the alternatives.

January 26, 2024
in rse
12 min read

Some Strategies for Using Multiple Nodes of GPUs

Using multiple GPUs is one option to speed up your code. On Apocrita, we have V100, A100 and H100 GPUs available, with up to 4 GPUs per node. On other compute clusters, JADE2 has 8 V100 GPUs per node and Sulis has 3 A100 GPUs per node. If your problem is pleasingly parallel, you can distribute identical or similar tasks to each GPU on a node, or even on multiple nodes.

December 18, 2023
in news
2 min read

Modules Update December 2023

Since the last module update in December 2022, we have:

added/moved 61 modules to production
added 2 modules to the development environment
deprecated 3 modules
deleted 12 modules