Skip to content

Welcome to the QMUL HPC blog

A look at the Grace Hopper superchip

NVIDIA recently announced the GH200 Grace Hopper Superchip which is a combined CPU+GPU with high memory bandwidth, designed for AI workloads. These will also feature in the forthcoming Isambard AI National supercomputer. We were offered the chance to pick up a couple of these new servers for a very attractive launch price.

The CPU is a 72-core ARM-based Grace processor, which is connected to an H100 GPU via the NVIDIA chip-2-chip interconnect, which delivers 7x the bandwidth of PCIe Gen5, commonly found in our other GPU nodes. This effectively allows the GPU to seamlessly access the system memory. This datasheet contains further details.

Since this new chip offers a lot of potential for accelerating AI workloads, particularly for workloads requiring large amounts of GPU RAM or involving a lot of memory copying between the host and the GPU, we've been running a few tests to see how this compares with the alternatives.

Some Strategies for Using Multiple Nodes of GPUs

Using multiple GPUs is one option to speed up your code. On Apocrita, we have V100, A100 and H100 GPUs available, with up to 4 GPUs per node. On other compute clusters, JADE2 has 8 V100 GPUs per node and Sulis has 3 A100 GPUs per node. If your problem is pleasingly parallel, you can distribute identical or similar tasks to each GPU on a node, or even on multiple nodes.

Modules Update December 2023

Since the last module update in December 2022, we have:

  • added/moved 61 modules to production
  • added 2 modules to the development environment
  • deprecated 3 modules
  • deleted 12 modules

R Workflow

Nowadays, there seems to be an R package for anything and everything. While this makes starting a project in R seem quick and easy, there are considerations to take into account that will make your life easier in the long run.

Keeping Files Tidy

Files quickly proliferate and need to be kept tidy. It is important that the correct people can access the files, and file systems are well-structured for easy navigation.

A Large Scale Ancestry Study | Living with Machines

Living with Machines is a funded project at The Alan Turing Institute (aka the Turing), bringing together academics from different disciplines, to answer research questions such as how did historical newspapers tell the political landscape, how were accidents in factories reported, how did road and settlement names change, how did people change occupations during the industrial revolution...