
2024

Unification of Memory on the Grace Hopper Nodes

The delivery of new GPUs for research continues, most notably with the new Isambard-AI cluster at Bristol. As new cutting-edge GPUs are released, software engineers need to keep up with the architectures and features they offer.

The new Grace Hopper GH200 nodes, as announced in a previous blog post, consist of a 72-core NVIDIA Grace CPU and an NVIDIA H100 Tensor Core GPU. One of their key innovations is the NVIDIA NVLink Chip-to-Chip (C2C) interconnect combined with unified memory, which moves data between the CPU and GPU quickly and automatically. It also allows GPU memory to be oversubscribed, so the GPU can work on data much larger than its own memory, potentially avoiding out-of-GPU-memory problems. Software engineers can therefore focus on implementing algorithms without having to think too much about memory management.
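To make the "automatic" data movement concrete, here is a minimal CUDA sketch that launches a kernel directly on memory from plain `std::malloc`, with no `cudaMalloc` or `cudaMemcpy`. It assumes a GH200 node whose driver reports `cudaDevAttrPageableMemoryAccess` as 1, meaning the GPU can dereference ordinary pageable host memory. The names `scale`, `supports_pageable_access` and `scale_system_memory` are illustrative and not taken from the original post.

```cuda
#include <cstddef>
#include <cstdlib>
#include <cuda_runtime.h>

// Multiplies every element of `data` by `factor`. `data` may be any
// pointer the GPU is able to dereference.
__global__ void scale(double* data, std::size_t n, double factor) {
    std::size_t i = blockIdx.x * static_cast<std::size_t>(blockDim.x) + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// True if device `dev` can access pageable (malloc/new) host memory directly.
bool supports_pageable_access(int dev) {
    int value = 0;
    if (cudaDeviceGetAttribute(&value, cudaDevAttrPageableMemoryAccess, dev) != cudaSuccess)
        return false;
    return value == 1;
}

// Scales an ordinary malloc'd array on GPU 0 without any explicit copies.
// Returns false without launching anything if pageable access is unsupported.
bool scale_system_memory(std::size_t n, double factor) {
    if (!supports_pageable_access(0)) return false;
    double* data = static_cast<double*>(std::malloc(n * sizeof(double)));
    if (data == nullptr) return false;
    for (std::size_t i = 0; i < n; ++i) data[i] = 1.0;   // written by the CPU

    const unsigned threads = 256;
    const unsigned blocks = static_cast<unsigned>((n + threads - 1) / threads);
    scale<<<blocks, threads>>>(data, n, factor);          // same pointer, used by the GPU

    // Kernel launches are asynchronous: wait before the CPU reads `data` again.
    bool ok = cudaDeviceSynchronize() == cudaSuccess;
    for (std::size_t i = 0; ok && i < n; ++i) ok = (data[i] == factor);
    std::free(data);
    return ok;
}
```

On a system where the attribute reports 0, dereferencing a `malloc` pointer inside a kernel is an illegal memory access, which is why the sketch checks the attribute before launching.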

This blog post demonstrates manual GPU memory management and then introduces managed and unified memory, with simple examples to illustrate their benefits. We'll try to keep this at an introductory level, but the post does assume basic knowledge of C++, CUDA and compiling with nvcc.
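As a hedged illustration of the contrast described here, the sketch below performs the same vector addition twice: once with manual management (`cudaMalloc`, an explicit `cudaMemcpy` in each direction, `cudaFree`) and once with `cudaMallocManaged`, where a single pointer is valid on both CPU and GPU. Error handling is kept to a minimum, and the names `add`, `add_manual` and `add_managed` are hypothetical. It should build with `nvcc` once a `main` calls the two functions.

```cuda
#include <cstddef>
#include <vector>
#include <cuda_runtime.h>

// Element-wise c = a + b.
__global__ void add(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

// Manual management: separate host and device buffers, explicit copies.
bool add_manual(int n) {
    const std::size_t bytes = n * sizeof(float);
    std::vector<float> a(n, 1.0f), b(n, 2.0f), c(n, 0.0f);   // host buffers

    float *d_a = nullptr, *d_b = nullptr, *d_c = nullptr;    // device buffers
    if (cudaMalloc(&d_a, bytes) != cudaSuccess ||
        cudaMalloc(&d_b, bytes) != cudaSuccess ||
        cudaMalloc(&d_c, bytes) != cudaSuccess) {
        cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);          // cudaFree(nullptr) is a no-op
        return false;
    }

    // Host -> device before the kernel...
    cudaMemcpy(d_a, a.data(), bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(d_b, b.data(), bytes, cudaMemcpyHostToDevice);
    add<<<(n + 255) / 256, 256>>>(d_a, d_b, d_c, n);
    // ...and device -> host afterwards; this blocking copy waits for the kernel.
    cudaError_t err = cudaMemcpy(c.data(), d_c, bytes, cudaMemcpyDeviceToHost);

    cudaFree(d_a); cudaFree(d_b); cudaFree(d_c);
    if (err != cudaSuccess) return false;
    for (float v : c) if (v != 3.0f) return false;
    return true;
}

// Managed memory: one pointer per array, usable from both CPU and GPU.
bool add_managed(int n) {
    const std::size_t bytes = n * sizeof(float);
    float *a = nullptr, *b = nullptr, *c = nullptr;
    if (cudaMallocManaged(&a, bytes) != cudaSuccess ||
        cudaMallocManaged(&b, bytes) != cudaSuccess ||
        cudaMallocManaged(&c, bytes) != cudaSuccess) {
        cudaFree(a); cudaFree(b); cudaFree(c);
        return false;
    }
    for (int i = 0; i < n; ++i) { a[i] = 1.0f; b[i] = 2.0f; }   // initialised on the CPU

    add<<<(n + 255) / 256, 256>>>(a, b, c, n);                  // no cudaMemcpy needed
    bool ok = cudaDeviceSynchronize() == cudaSuccess;           // wait before reading c
    for (int i = 0; ok && i < n; ++i) ok = (c[i] == 3.0f);

    cudaFree(a); cudaFree(b); cudaFree(c);
    return ok;
}
```

The managed version removes the copy calls and the duplicate buffers, but not the need to synchronise: kernel launches return immediately, so `cudaDeviceSynchronize` must complete before the CPU reads `c`.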

Call for testers: the next phase of Apocrita

For much of the year we have been working on a major project to upgrade Apocrita to a new operating system. As part of the project, we have deployed a new package building tool to help us recompile all of the research applications to work on the new system. We are now calling for Apocrita users to preview and test this new system, ahead of our full roll-out, to help bring about a smoother and quicker transition.

This is an opportunity to check that your applications work on the new system, and for us to address any issues before we fully roll it out.

High Performance Computing for the Wolfson Institute of Population Health

If you go for a run every morning, or drive to work on weekdays, you will know that every journey is unique. For me, every High Performance Computing (HPC) workshop I deliver has its own personality: the audience, the material tailored to that audience, the interactions and questions, and of course the energy of the community. Last Thursday, September 26, an HPC workshop for the Wolfson Institute of Population Health was held from 2:00 p.m. to 5:00 p.m. As usual, the session included presentations, a coffee break, a quiz and treats, and photographs to make it memorable.

High Performance Computing for SEMS

On August 9, the High Performance Computing workshop for the School of Engineering and Materials Science (SEMS) was held in the Sofa Room in the Department W building. Around 16 researchers who already use Apocrita attended. The event covered six topics: Linux commands for Apocrita, HPC clusters at QMUL, Launching HPC jobs, Applications for SEMS, Using GPUs, and Miscellaneous.

A Slight Case of Overthreading

We still encounter jobs on the HPC cluster that try to use all the cores on their node, regardless of how many cores they requested, which triggers node alarms. Sometimes jobs try to use exactly twice or one and a half times the allocated cores, or even the square of that number. This was a little perplexing at first. In your enthusiasm to parallelize your code, make sure someone else hasn't already done so.
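One way the "squared" case can arise is nested parallelism: a program starts one worker per allocated core, and each worker calls a library routine that also starts one thread per core. The plain C++ sketch below (host code only, so it also builds with nvcc) shows that arithmetic and one possible fix. It assumes the scheduler exposes the allocation in an `NSLOTS` environment variable, as Grid Engine does; the helper names `allocated_cores`, `total_threads` and `inner_threads_for` are hypothetical.

```cpp
#include <algorithm>
#include <cstdlib>

// Cores granted to this job. Assumes a Grid Engine style NSLOTS variable
// and falls back to 1 if it is missing or invalid.
// std::thread::hardware_concurrency() is deliberately avoided: it reports
// every core on the node, not the cores the job requested.
unsigned allocated_cores() {
    const char* slots = std::getenv("NSLOTS");
    if (slots == nullptr) return 1;
    char* end = nullptr;
    long n = std::strtol(slots, &end, 10);
    return (end != slots && n > 0) ? static_cast<unsigned>(n) : 1;
}

// Threads actually running when `outer` workers each call a routine
// that spawns `inner` threads of its own (e.g. a multithreaded library).
unsigned total_threads(unsigned outer, unsigned inner) {
    return outer * inner;
}

// Inner thread count that keeps outer * inner within the allocation.
unsigned inner_threads_for(unsigned cores, unsigned outer) {
    return std::max(1u, cores / std::max(1u, outer));
}
```

With 8 allocated cores, sizing both levels to the job gives `total_threads(8, 8)`, i.e. 64 threads; using `inner_threads_for(8, 8)`, which is 1, for the inner level brings the total back to 8. Many threaded libraries take their thread count from environment variables such as `OMP_NUM_THREADS` for OpenMP, so setting these from the allocation is often a simpler fix than changing code.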

Linux for Apocrita Workshop

On May 3, 2024, Queen Mary University of London ran a workshop at the Department W building in Whitechapel to introduce our students to Linux. Students from a variety of programmes at Queen Mary attended, many of them working towards Master's and PhD degrees.