We still encounter jobs on the HPC cluster
that try to use all the cores on the node on which they're running, regardless
of how many cores they requested, leading to node alarms. Sometimes, jobs try
to use exactly twice or one-and-a-half the allocated cores, or even that number
squared. This was a little perplexing at first. In your enthusiasm to parallelize
your code, make sure that someone else, such as the author of a library you call,
hasn't already done so.
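
One way to stay within an allocation is to read the granted core count from the scheduler rather than counting the cores on the node. The sketch below is a minimal illustration, not site-specific guidance. It assumes the scheduler exports the core count in an environment variable (`NSLOTS` for Grid Engine and `SLURM_CPUS_PER_TASK` for Slurm are assumptions here), and the helper names `allocated_cores` and `split_cores` are hypothetical. It also hints at how the "squared" case can arise: N worker processes that each start N library threads.

```python
import os

# Assumed variable names: NSLOTS (Grid Engine) and SLURM_CPUS_PER_TASK (Slurm).
CORE_COUNT_VARIABLES = ("NSLOTS", "SLURM_CPUS_PER_TASK")


def allocated_cores(env=None, default=1):
    """Return the core count granted by the scheduler, or `default` if unknown."""
    env = os.environ if env is None else env
    for name in CORE_COUNT_VARIABLES:
        value = env.get(name, "")
        if value.isdigit() and int(value) > 0:
            return int(value)
    return default


def split_cores(n_workers, env=None):
    """Split the allocation into (worker processes, threads per worker)."""
    cores = allocated_cores(env)
    # Never start more workers than there are allocated cores.
    n_workers = max(1, min(n_workers, cores))
    # Give each worker an equal share, so workers * threads <= cores.
    threads_per_worker = max(1, cores // n_workers)
    return n_workers, threads_per_worker


if __name__ == "__main__":
    workers, threads = split_cores(n_workers=4)
    # Cap OpenMP-based libraries; set this before importing them.
    os.environ["OMP_NUM_THREADS"] = str(threads)
    print(f"{workers} worker(s) x {threads} thread(s) per worker")
```

With 8 allocated cores and 2 workers, `split_cores` gives 4 threads per worker, so the product of workers and threads never exceeds the allocation.
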

On May 3, 2024, Queen Mary University of London held a workshop at the
Department W building in Whitechapel to introduce our students to Linux.
Students from a variety of programmes at Queen Mary attended, many of them
working towards Master's and PhD degrees.

The High Performance Computing (HPC) team organised an event to celebrate
February's bonus day this year. The goal was to introduce the HPC team members
to the research community at QMUL, and to give researchers the opportunity to
ask the HPC experts in person about any issue related to the performance of HPC
jobs on Apocrita. Here is a quick summary of what we covered in the session:

Whilst most Apocrita users will want to use the R module or RStudio via
OnDemand for R workflows, it is also possible to use R inside of Conda via
Miniforge.

In a previous blog post, we discussed ways we could use multiprocessing and
mpi4py together to use multiple nodes of GPUs. Here we will cover some machine
learning principles and two examples of pleasingly parallel machine learning
problems. These are also known as embarrassingly parallel problems, but I prefer
"pleasingly parallel" because there is nothing embarrassing about designing your
problem to run in parallel. When doing so, you can launch very similar
functions on each GPU and collate their results when needed.
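
The launch-and-collate pattern can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from the post: `evaluate` is a hypothetical stand-in for a training run on one GPU (it does no GPU work), the scoring formula is invented for the example, and a thread pool stands in for the multiprocessing or mpi4py workers so that the sketch runs anywhere.

```python
from concurrent.futures import ThreadPoolExecutor


def evaluate(gpu_id, learning_rate):
    """Hypothetical stand-in for training a model on device `gpu_id`."""
    # Invented score that peaks at learning_rate == 0.01.
    score = 1.0 / (1.0 + abs(learning_rate - 0.01))
    return {"gpu": gpu_id, "learning_rate": learning_rate, "score": score}


def worker(gpu_id, learning_rates):
    """Run this GPU's share of the tasks one after another."""
    return [evaluate(gpu_id, lr) for lr in learning_rates]


def run_on_gpus(learning_rates, n_gpus):
    """Deal tasks out round-robin, run one worker per GPU, collate results."""
    shares = [learning_rates[g::n_gpus] for g in range(n_gpus)]
    with ThreadPoolExecutor(max_workers=n_gpus) as pool:
        per_gpu = list(pool.map(worker, range(n_gpus), shares))
    # Flatten the per-GPU lists into a single list of results.
    return [result for chunk in per_gpu for result in chunk]


if __name__ == "__main__":
    results = run_on_gpus([0.1, 0.01, 0.001, 0.5, 0.05], n_gpus=2)
    best = max(results, key=lambda r: r["score"])
    print(best["learning_rate"], "ran on GPU", best["gpu"])
```

Because the tasks never communicate with each other, the only coordination needed is the final collation step.
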

NVIDIA recently announced the GH200 Grace Hopper Superchip, a combined CPU+GPU
with high memory bandwidth, designed for AI workloads. It will also feature in
the forthcoming Isambard AI national supercomputer. We were offered the chance
to pick up a couple of servers built on this new chip for a very attractive
launch price.

The CPU is a 72-core ARM-based Grace processor, connected to an H100 GPU via
the NVIDIA NVLink-C2C (chip-to-chip) interconnect, which delivers 7x the
bandwidth of the PCIe Gen5 links commonly found in our other GPU nodes. This
effectively allows the GPU to seamlessly access the system memory. This
datasheet contains further details.

Since this new chip offers a lot of potential for accelerating AI workloads,
particularly those requiring large amounts of GPU RAM or involving a lot of
memory copying between the host and the GPU, we've been running a few tests to
see how it compares with the alternatives.

Using multiple GPUs is one option to speed up your code. On Apocrita, we have
V100, A100 and H100 GPUs available, with up to 4 GPUs per node. Among other
compute clusters, JADE2 has 8 V100 GPUs per node and Sulis has 3 A100 GPUs per
node. If your problem is pleasingly parallel, you can distribute identical or
similar tasks to each GPU on a node, or even across multiple nodes.
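
When tasks are spread over several nodes, each worker needs to know which local GPU to use. The sketch below shows one simple mapping, assuming workers are numbered with consecutive ranks that fill one node before moving to the next; that placement, and the helper names `gpu_for_rank` and `worker_environment`, are assumptions for illustration. `CUDA_VISIBLE_DEVICES` is the standard CUDA environment variable for restricting a process to particular devices.

```python
def gpu_for_rank(rank, gpus_per_node):
    """Map a global worker rank to (node index, local GPU index)."""
    if rank < 0:
        raise ValueError("rank must be non-negative")
    if gpus_per_node < 1:
        raise ValueError("gpus_per_node must be at least 1")
    # Assumes consecutive ranks fill one node before starting the next.
    return divmod(rank, gpus_per_node)


def worker_environment(rank, gpus_per_node):
    """Environment settings that restrict a worker to its own GPU."""
    _, local_gpu = gpu_for_rank(rank, gpus_per_node)
    return {"CUDA_VISIBLE_DEVICES": str(local_gpu)}


if __name__ == "__main__":
    # Eight workers on nodes with 4 GPUs each, as on a 4-GPU Apocrita node.
    for rank in range(8):
        node, gpu = gpu_for_rank(rank, gpus_per_node=4)
        print(f"rank {rank}: node {node}, GPU {gpu}")
```

The same function covers the other node sizes mentioned above by changing `gpus_per_node` to 8 or 3.
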

We held a 2-hour HPC workshop last Friday, December 15th. We arranged the
agenda in coordination with Peter Alexander Lock, a research student at QMUL.
It covered the basics of Linux, accessing Apocrita, submitting jobs, and HPC
commands.