Welcome to the QMUL HPC blog

June 2, 2025
in rse
19 min read

Python GPU Programming with Numba and CuPy

In a previous blog post, we looked at using Numba to speed up Python code with a just-in-time (JIT) compiler and multiple cores. The speed-up is remarkable given how few changes the existing code needs.

In this blog post, we will continue exploring the Numba ecosystem and implement the Gauss map on the GPU, gaining a further speed-up while still writing Python code. We will also look at CuPy, which is another way to write and run GPU code. Rather than being purely Pythonic, CuPy allows hybrid programming, where the GPU code is written in CUDA but executed from Python.

One key advantage of hybrid programming is writing software in your preferred language while optimising performance-critical sections in a faster language - the best of both worlds!

May 6, 2025
in news
2 min read

Rocky 9 benefits

The majority of the cluster has now been upgraded to Rocky 9 and the remaining CentOS 7 nodes will be updated in due course. Some users may still be hesitant to move over, but there are a few reasons why you should.

March 21, 2025
in rse
3 min read

R on Rocky 9

With the major operating system upgrade from CentOS 7 to Rocky 9, we want to ensure that using R, RStudio, and Open OnDemand (OOD) is as seamless as possible. This post includes new tips for a better experience, as well as a reminder of the important or frequently forgotten old tips.

March 21, 2025
in tutorial
5 min read

Using Python inside of R with reticulate

A previous blog post covered Using R inside of Conda, but what if you want to use Python packages inside an existing R session, or an RStudio session via OnDemand? This is where reticulate, the R Interface to Python, comes in.

March 12, 2025
in tutorial
7 min read

Using uv for Python package management

Traditionally we have recommended that users use a Python virtualenv or Conda environment to manage personal package installations via pip install and mamba install commands. But a new contender has entered the fray: uv, an extremely fast Python package and project manager, written in Rust.

February 7, 2025
in tutorial
8 min read

Moving from CentOS 7 to Rocky 9 for Python users

The Apocrita cluster was upgraded from CentOS 7 to Rocky 9 recently. There are some important things Python users need to know, such as how to migrate your existing environments to Rocky 9, as well as how to tackle some common problems during the process.

January 22, 2025
in news
4 min read

The next era for Apocrita is here

For much of the year we have been working on a major project to upgrade Apocrita to a new operating system, Rocky Linux 9 (hereafter known as Rocky 9). As part of the project, we have deployed a new package building tool to help us recompile all of the research applications to work on the new system.

The majority of the cluster has now been upgraded to Rocky 9. The remaining CentOS 7 nodes will be updated in due course.

January 17, 2025
in rse
8 min read

A PyTorch DDP Case Study With ImageNet

In this blog post, we will play about with neural networks, on a dataset called ImageNet, to give some intuition on how these neural networks work. We will train them on Apocrita with DistributedDataParallel and show benchmarks to give you a guide on how many GPUs to use. This is a follow-on from a previous blog post where we explained how to use DistributedDataParallel to speed up your neural network training with multiple GPUs.

December 23, 2024
in news
4 min read

High Performance Computing (HPC) events from late 2024

2024 has been a productive year for HPC outreach and education across different schools at Queen Mary University of London. We have formed alliances with managers and PIs from various schools within the University who understand the value that HPC can add to their scientific research. We are pleased to share our latest event in 2024:

December 3, 2024
in rse
10 min read

Unification of Memory on the Grace Hopper Nodes

The delivery of new GPUs for research is continuing; most notable is the new Isambard-AI cluster at Bristol. As new cutting-edge GPUs are released, software engineers need to keep up with the new architectures and features these GPUs offer.

The new Grace-Hopper GH200 nodes, as announced in a previous blog post, consist of a 72-core NVIDIA Grace CPU and an H100 Tensor Core GPU. One of the key innovations is the combination of the NVIDIA NVLink Chip-2-Chip (C2C) interconnect and unified memory, which allows fast, seamless and automatic transfer of data from CPU to GPU. It also allows the GPU to be oversubscribed, so it can handle data much larger than its own memory can hold, potentially avoiding out-of-GPU-memory problems. This lets software engineers focus on implementing algorithms without having to think too much about memory management.

This blog post will demonstrate manual GPU memory management and introduce managed and unified memory with simple examples to illustrate their benefits. We'll try to keep this to an introductory level, but the post does assume basic knowledge of C++, CUDA and compiling with nvcc.