

Python GPU Programming with Numba and CuPy

In a previous blog post, we looked at using Numba to speed up Python code with a just-in-time (JIT) compiler and multiple cores. The speed-up was remarkable and required only small changes to the existing code.

In this blog post, we will continue exploring the Numba ecosystem and implement the Gauss map on the GPU, gaining a further speed-up while still writing Python code. We will also look at CuPy, which is another way to write and run GPU code. Rather than staying Pythonic, it enables hybrid programming, where the GPU code is written in CUDA but launched from Python.
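As a taste of the Numba approach, here is a minimal sketch of how the Gauss map, x_{n+1} = exp(-αx_n²) + β, might be iterated on the GPU with a Numba CUDA kernel. The parameter values, array sizes and kernel name below are illustrative assumptions, not necessarily those used in the post:

```python
import math
import numpy as np
from numba import cuda

@cuda.jit
def gauss_map_kernel(x, alpha, beta, n_iter):
    # One thread per initial condition; each iterates the map independently
    i = cuda.grid(1)
    if i < x.size:
        v = x[i]
        for _ in range(n_iter):
            v = math.exp(-alpha * v * v) + beta
        x[i] = v

# Illustrative sizes and parameters (not necessarily those used in the post)
x = np.random.uniform(-1.0, 1.0, 1_000_000)
d_x = cuda.to_device(x)
threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
gauss_map_kernel[blocks, threads_per_block](d_x, 4.9, -0.58, 1000)
result = d_x.copy_to_host()
```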

One key advantage of hybrid programming is that you can write most of your software in your preferred language while optimising performance-critical sections in a faster one: the best of both worlds!
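To illustrate the hybrid style, here is a sketch of the same Gauss map written as CUDA C inside a CuPy RawKernel and launched from Python. Again, the parameters and sizes are illustrative assumptions rather than the post's actual code:

```python
import cupy as cp

# The kernel body is plain CUDA C, compiled at runtime by CuPy
gauss_map = cp.RawKernel(r'''
extern "C" __global__
void gauss_map(double* x, const double alpha, const double beta,
               const int n_iter, const int n) {
    int i = blockDim.x * blockIdx.x + threadIdx.x;
    if (i < n) {
        double v = x[i];
        for (int k = 0; k < n_iter; k++) {
            v = exp(-alpha * v * v) + beta;
        }
        x[i] = v;
    }
}
''', 'gauss_map')

x = cp.random.uniform(-1.0, 1.0, 1_000_000)  # illustrative problem size
threads_per_block = 256
blocks = (x.size + threads_per_block - 1) // threads_per_block
gauss_map((blocks,), (threads_per_block,),
          (x, cp.float64(4.9), cp.float64(-0.58),
           cp.int32(1000), cp.int32(x.size)))
```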

Rocky 9 benefits

The majority of the cluster has now been upgraded to Rocky 9, and the remaining CentOS 7 nodes will be updated in due course. Some users may still be hesitant to move over, but there are a few good reasons why you should.

R on Rocky 9

With the major operating system upgrade from CentOS 7 to Rocky 9, we want to ensure that using R, RStudio, and Open OnDemand (OOD) is as seamless as possible. This post includes new tips for a better experience, as well as a recap of important or frequently forgotten old tips.

The next era for Apocrita is here

For much of the year we have been working on a major project to upgrade Apocrita to a new operating system, Rocky Linux 9 (hereafter Rocky 9). As part of the project, we have deployed a new package-building tool to help us recompile all of the research applications to work on the new system.

The majority of the cluster has now been upgraded to Rocky 9. The remaining CentOS 7 nodes will be updated in due course.

A PyTorch DDP Case Study With ImageNet

In this blog post, we will experiment with neural networks on the ImageNet dataset to give some intuition on how they work. We will train them on Apocrita with DistributedDataParallel and present benchmarks to help you decide how many GPUs to use. This is a follow-up to a previous blog post, where we explained how to use DistributedDataParallel to speed up neural network training with multiple GPUs.
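For readers who want a reminder of the basic pattern before diving in, here is a minimal DistributedDataParallel sketch. The tiny linear model, random data and hyperparameters are placeholder assumptions for illustration; the post itself trains real networks on ImageNet:

```python
import os
import torch
import torch.distributed as dist
from torch.nn.parallel import DistributedDataParallel as DDP

def main():
    # torchrun sets RANK, LOCAL_RANK and WORLD_SIZE for each process
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(local_rank)

    # A stand-in model; the post trains real networks on ImageNet
    model = torch.nn.Linear(10, 2).cuda(local_rank)
    model = DDP(model, device_ids=[local_rank])

    optimiser = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()

    # One illustrative training step with random data
    inputs = torch.randn(32, 10, device=local_rank)
    labels = torch.randint(0, 2, (32,), device=local_rank)
    optimiser.zero_grad()
    loss = loss_fn(model(inputs), labels)
    loss.backward()  # DDP averages gradients across all processes here
    optimiser.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    main()  # launch with e.g.: torchrun --nproc_per_node=4 train.py
```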