A Short Guide to PyTorch DDP
In this blog post, we explore what torchrun and DistributedDataParallel are and how they can be used to speed up your neural network training by using multiple GPUs.
If you go for a run every morning, or drive to work on weekdays, you know that every journey is unique. For me, every High Performance Computing (HPC) workshop I deliver has its own personality: the audience, the material tailored to that audience, the interactions and questions, and of course the energy of the community. Last Thursday, September 26, an HPC workshop for the Wolfson Institute of Population Health was held from 2:00 p.m. to 5:00 p.m. The seminar included, as usual, presentations, a coffee break, a quiz and treats, and photographs to make it memorable.
The Apocrita highmem nodes have just been upgraded so that they contain newer CPUs with more modern instruction sets.
On August 9, the High Performance Computing for the School of Engineering and Materials Science workshop was held in the Sofa Room at Dept. W. Around 16 researchers who already use Apocrita attended the event. The event covered six topics: Linux commands for Apocrita, HPC clusters at QMUL, Launching HPC jobs, Applications for SEMS, Using GPUs, and Miscellaneous.
We still encounter jobs on the HPC cluster that try to use all the cores on the node on which they're running, regardless of how many cores they requested, leading to node alarms. Sometimes, jobs try to use exactly twice or one-and-a-half times the allocated cores, or even that number squared. This was a little perplexing at first. In your enthusiasm to parallelize your code, make sure someone else hasn't already done so.
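One way the "number squared" pattern can arise is nested parallelism: a job starts one worker process per allocated core, and each worker's numerical library then starts its own thread per core. The following is a minimal sketch of that arithmetic and of one possible guard, not the cluster's own tooling. It assumes the scheduler exports the allocated core count as NSLOTS (the Grid Engine convention), and the helper names are hypothetical.

```python
import os


def allocated_cores(default=1):
    """Return the core count granted to the job.

    Assumption: the scheduler exports it as NSLOTS (Grid Engine convention).
    Falls back to `default` when the variable is not set.
    """
    return int(os.environ.get("NSLOTS", default))


def total_threads(processes, threads_per_process):
    """Threads running at once when every process spawns its own thread pool."""
    return processes * threads_per_process


def cap_library_threads(threads):
    """Ask common threaded libraries to use at most `threads` threads each.

    Must run before the libraries (for example NumPy) are imported.
    """
    for var in ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS", "MKL_NUM_THREADS"):
        os.environ[var] = str(threads)


if __name__ == "__main__":
    cores = allocated_cores()
    # Nested parallelism: `cores` workers, each with `cores` library threads.
    print("oversubscribed:", total_threads(cores, cores))
    # One worker per core, one library thread per worker.
    print("within request:", total_threads(cores, 1))
    # With one worker per core, keep each worker's library pool at one thread.
    cap_library_threads(1)
```

The point of the sketch is that only one layer should fan out: either the job runs one process per core with single-threaded libraries, or it runs a single process and lets the library use the allocated cores.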
To make better use of the resources available on GPU nodes, the Apocrita and Andrena GPUs now support 12 cores per GPU. Please update your job scripts from 8 cores per GPU with 11G per core to 12 cores per GPU with 7.5G per core - this will maintain approximately the same total RAM per job, while increasing the core count.
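The "approximately the same total RAM" claim can be checked with a quick calculation. This sketch assumes the memory figures are requested per core, which is what makes the two totals come out close (88G before, 90G after, per GPU); the function name is hypothetical.

```python
def total_ram_gb(cores_per_gpu, ram_per_core_gb, gpus=1):
    """Total RAM for a job, assuming memory is requested per core."""
    return cores_per_gpu * ram_per_core_gb * gpus


if __name__ == "__main__":
    old = total_ram_gb(8, 11)     # previous recommendation
    new = total_ram_gb(12, 7.5)   # updated recommendation
    print(f"old total: {old}G per GPU, new total: {new}G per GPU")
```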
On May 3, 2024, Queen Mary University of London held a workshop to introduce our students to Linux at the Department W building in Whitechapel. Students from a variety of programmes at Queen Mary attended, many of them working towards Master's and PhD degrees.
In this tutorial, we'll show you how to create a new Git project within RStudio using either a new or an existing GitHub repository.
The High Performance Computing (HPC) team organised an event to celebrate February's bonus day this year. The goal was to introduce the HPC team members to the research community at QMUL, and to give researchers the opportunity to ask the HPC experts in person about any issue related to the performance of HPC jobs on Apocrita.
Here is a quick summary of what we covered in the session:
Whilst most Apocrita users will want to use the R module or RStudio via OnDemand for R workflows, it is also possible to use R inside of Anaconda.