The ITSR support team often receive tickets from R users that cover similar
ground, so we thought we would collate our most frequent responses into some
"Top Tips"! The tips below apply equally to Rscript, but this article covers
only the interactive R program.
You may wonder why some jobs start immediately while others wait in the queue
for hours or days, even if your job is quite simple. If you notice your job
has been queueing for a while, consider adjusting the requested resources to
shorten the queueing time and to avoid wasting resources as the job runs.
Below, we outline two useful tools for checking the resource usage of
previous jobs.
In this tutorial we'll be showing you how to visualise HEALPix results using
Jupyter Notebook in our OnDemand appliance
on the Apocrita HPC cluster. We'll start by installing the required Python
packages before demonstrating how to run the
Healpy tutorial.
Information about running other components of HEALPix not covered in
this tutorial can be found on our
docs site.
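
To give a flavour of what the Healpy tutorial covers, here is a minimal
sketch (assuming healpy, numpy and matplotlib are already installed in your
Jupyter environment) that builds a toy map and displays it in a Mollweide
projection:

```python
import numpy as np
import healpy as hp
import matplotlib.pyplot as plt

# NSIDE sets the map resolution: there are 12 * NSIDE^2 pixels in total
nside = 32
npix = hp.nside2npix(nside)

# A toy map with one value per HEALPix pixel
m = np.arange(npix, dtype=float)

# Plot the map in a Mollweide projection, as in the Healpy tutorial
hp.mollview(m, title="Toy HEALPix map", unit="arbitrary")
plt.show()
```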
An understanding of file permissions is important both to the success of
computational jobs and to the security of your files.
The default settings are suitable for some use cases, but not all: without
sufficient awareness, your files may be visible to people who should not
be able to access them, or hidden from people who should.
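
As a rough illustration only (using a hypothetical file name, and noting
that on the command line you would more commonly reach for ls -l and chmod),
the following Python snippet checks whether a file is readable by group or
other users and removes those permissions if so:

```python
import os
import stat

# Hypothetical file used purely for illustration
path = "results.csv"

mode = os.stat(path).st_mode
if mode & (stat.S_IRGRP | stat.S_IROTH):
    print(f"{path} is readable by group and/or other users")
    # Strip all group and other permissions, keeping the owner's bits
    os.chmod(path, mode & ~(stat.S_IRWXG | stat.S_IRWXO))
```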
The information in this article is now outdated; please see
docs.hpc.qmul.ac.uk/apps/ml/tensorflow/
for up-to-date documentation on TensorFlow installation and usage.
In this tutorial we'll be showing you how to run a
TensorFlow job using the
GPU nodes on the Apocrita HPC
cluster. We will expand on the essentials covered on the QMUL HPC
docs site and explain the process in more detail.
We'll start with software installation before demonstrating a simple task and
a more complex real-world example that you can adapt for your own jobs, along
with tips on how to check whether the GPU is being used.
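
One quick way to check whether TensorFlow can see a GPU from within your job
(a minimal sketch, assuming a TensorFlow 2.x installation) is:

```python
import tensorflow as tf

# List the GPU devices visible to TensorFlow; an empty list means
# the job is running on the CPU only
gpus = tf.config.list_physical_devices("GPU")
print("GPUs visible to TensorFlow:", gpus)
```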
The Apocrita scratch storage
is a high-performance system designed for short-term storage of working
data. We recently replaced the hardware that provides this
service and expanded the capacity from 250TB to around 450TB. This article
will look at the recent changes and suggest some best practices for using
the scratch system.
In response to a coordinated security attack on HPC sites worldwide, it has
been necessary to implement some changes to enforce a higher level of
authentication security. In this article, we begin by providing some useful
background on key-based authentication, and then document the process for
regaining access to the cluster, since SSH keys and passwords were revoked
for all users as a precautionary measure.
This article presents a selection of useful tips for running successful and
well-performing jobs on the QMUL Apocrita cluster.
In the ITS Research team, we spend quite a bit of time monitoring the Apocrita
cluster and checking that jobs are running correctly, to ensure that this
valuable resource is being used effectively. If we notice a problem with your
job, and think we can help, we might send you an email with some
recommendations on how your job can run more efficiently. If you receive such
an email, please don't be offended! We realise that users have a range of
experience, and the purpose of this post is to point out some ways to ensure
you get your results as quickly and correctly as possible, and to ease the
learning curve a little.
At any one time, a typical HPC cluster is usually full. This is not such a bad
thing, since it means the substantial investment is working hard for the
money, rather than sitting idle. A less ideal situation is having to wait too
long to get your research results. However, jobs are constantly starting and
finishing, and many new jobs get run shortly after being added to the queue. If
your resource requirements are rather niche, or very large, then you will be
competing with other researchers for a scarcer resource.
In any case, whatever sort of jobs you run, it is important to choose resources
optimally in order to get the best results. Using fewer cores, although
increasing the eventual run time, may result in a much shorter queueing time,
and therefore a quicker overall turnaround.