Installing R packages more quickly using Ncpus¶
Installing packages into a personal R library can sometimes take quite a long time, but it doesn't always have to be this way.
Ncpus¶
There is an option for the install.packages command in R called Ncpus. The function of this option according to the help page for install.packages is:
Ncpus: The number of parallel processes to use for a parallel install of more than one source package.
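As a minimal sketch of how the option can be used by hand on a system where it is not already configured, Ncpus can be passed directly to install.packages, or set once per session with options(). The helper name install_in_parallel and the value 4 are illustrative assumptions, not part of the Apocrita setup.

```r
# Hypothetical helper: install several packages in a single call so that
# source packages can be built in parallel processes.
install_in_parallel <- function(pkgs, ncpus = 4L) {
  install.packages(pkgs, Ncpus = ncpus)
}

# Alternatively, set a session-wide default; install.packages() reads it
# through getOption("Ncpus") when no Ncpus argument is supplied.
options(Ncpus = 4L)
getOption("Ncpus")
```

Calling install_in_parallel(c("Seurat", "tidyverse")) would then request up to 4 parallel install processes. The value should not exceed the number of cores available to the session.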
Up until recently, there was no value assigned to Ncpus on Apocrita, which meant that it would fall back to its default value of 1. However, we recently carried out some internal testing (having read a very useful blog post[1]) to see if increasing this value would have any effect on how long packages take to install (spoiler alert - it does!).
Package install benchmarks¶
We carried out some benchmarks for two of the most commonly installed R packages on Apocrita: Seurat and tidyverse.
1 core¶
First, let's take a look at how long a standard install of both packages takes when we don't use any value for Ncpus and install them in an R session on Apocrita requesting 1 core:
Seurat
* DONE (Seurat)
user system elapsed
1410.265 246.715 1806.903
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:30:09 | 2G | 1.53G | 1 | 91% |
----------------------------------------------
tidyverse
* DONE (tidyverse)
user system elapsed
937.008 174.041 1182.636
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:19:46 | 2G | 0.58G | 1 | 93% |
----------------------------------------------
So, about 30 minutes for Seurat and about 20 minutes for tidyverse.
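The user/system/elapsed figures above are in the format printed by R's system.time(). As a rough sketch of how a timing like this can be captured, assuming the install call is simply wrapped in system.time() (the exact benchmarking commands are not shown in this post), a short Sys.sleep() stands in for the install here so that the example runs quickly:

```r
# Sys.sleep() is a stand-in; in a real benchmark the expression would be
# something like install.packages("Seurat").
timing <- system.time(Sys.sleep(0.2))

# Prints the user, system and elapsed times in seconds.
print(timing)

# Elapsed wall-clock time converted to minutes.
elapsed_minutes <- timing[["elapsed"]] / 60
elapsed_minutes
```

The elapsed value is the wall-clock time, which is the figure that matters when comparing installs across different core counts.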
2 cores¶
To take advantage of parallel processes on Apocrita, we need to request multiple cores. So, let's try installing those two packages again, but this time we'll request 2 cores and set the value of Ncpus to match:
Seurat
* DONE (Seurat)
user system elapsed
1494.449 308.158 1012.763
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:16:55 | 4G | 1.66G | 2 | 88% |
----------------------------------------------
tidyverse
* DONE (tidyverse)
user system elapsed
1003.867 222.840 701.113
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:11:43 | 4G | 0.77G | 2 | 87% |
----------------------------------------------
As you can see, that makes a huge difference: Seurat took only about 17 minutes to install, and tidyverse took about 12 minutes.
4 cores¶
But how does this scale? Let's try requesting 4 cores and setting Ncpus to match.
Seurat
* DONE (Seurat)
user system elapsed
1588.283 377.616 601.312
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:10:05 | 8G | 2.01G | 4 | 81% |
----------------------------------------------
tidyverse
* DONE (tidyverse)
user system elapsed
1063.013 276.057 472.438
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:07:56 | 8G | 1.26G | 4 | 70% |
----------------------------------------------
So, another improvement: Seurat took just 10 minutes to install, and tidyverse took just under 8 minutes.
8 cores¶
Let's take a look at what happens if we request 8 cores and set Ncpus to match:
Seurat
* DONE (Seurat)
user system elapsed
1684.220 503.392 500.563
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:08:22 | 16G | 2.91G | 8 | 54% |
----------------------------------------------
tidyverse
* DONE (tidyverse)
user system elapsed
1120.654 367.373 443.721
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:07:27 | 32G | 1.75G | 8 | 41% |
----------------------------------------------
Now we start to see a much less dramatic difference - Seurat took about 8.5 minutes and tidyverse took about 7.5 minutes. The EFF percentage is also starting to drop quite significantly, showing that the requested cores weren't efficiently utilised.
16 cores¶
Finally, let's take a look at what happens if we request 16 cores and set Ncpus to match:
Seurat
* DONE (Seurat)
user system elapsed
1798.189 685.703 480.321
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:08:04 | 32G | 4.14G | 16 | 32% |
----------------------------------------------
tidyverse
* DONE (tidyverse)
user system elapsed
1207.006 507.690 416.986
# Truncated jobstats output:
----------------------------------------------
| DURATION | MEM R | MEM U | CORES | EFF |
+-----------+-------+---------+-------+------+
| 0:07:00 | 32G | 2.20G | 16 | 25% |
----------------------------------------------
Seurat took about 8 minutes and tidyverse took about 7 minutes, so there is very little difference compared to when we used 8 cores with Ncpus set to 8. The EFF percentage has also dropped yet again.
Final results¶
So, let's take a look at those results side by side:
Ncpus | Seurat Duration | Seurat Speedup | tidyverse Duration | tidyverse Speedup |
---|---|---|---|---|
1 | 0:30:09 | - | 0:19:46 | - |
2 | 0:16:55 | c. 1.8x faster | 0:11:43 | c. 1.7x faster |
4 | 0:10:05 | c. 3x faster | 0:07:56 | c. 2.5x faster |
8 | 0:08:22 | c. 3.6x faster | 0:07:27 | c. 2.6x faster |
16 | 0:08:04 | c. 3.7x faster | 0:07:00 | c. 2.8x faster |
So, as we can see, the "sweet spot" for most users will most likely be 4 cores. This is because each package still has to install its chain of dependencies, and a dependency has to finish installing before the packages that rely on it can start, so there are likely to be a few bottlenecks along the way.
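The speedup figures in the table can be reproduced from the jobstats durations. The sketch below does this for the Seurat column; the helper to_seconds and the per_core column are illustrative additions rather than part of the original benchmarks.

```r
# Convert "H:MM:SS" strings to a number of seconds.
to_seconds <- function(hms) {
  vapply(
    strsplit(hms, ":", fixed = TRUE),
    function(parts) sum(as.numeric(parts) * c(3600, 60, 1)),
    numeric(1)
  )
}

ncpus    <- c(1, 2, 4, 8, 16)
duration <- c("0:30:09", "0:16:55", "0:10:05", "0:08:22", "0:08:04")  # Seurat

seconds  <- to_seconds(duration)
speedup  <- seconds[1] / seconds   # relative to the 1-core install
per_core <- speedup / ncpus        # speedup obtained per requested core

print(data.frame(ncpus, duration,
                 speedup  = round(speedup, 1),
                 per_core = round(per_core, 2)))
```

The speedup column comes out as 1.0, 1.8, 3.0, 3.6 and 3.7, matching the table. The speedup per requested core falls from roughly 0.75 at 4 cores to roughly 0.45 at 8 cores, which is another way of seeing why 4 cores is the sweet spot.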
Using Ncpus on Apocrita¶
So, how do you make use of the Ncpus option on Apocrita? Well, the good news is that we have largely done it for you, as long as you use the default R module (currently R/4.2.2) or the most recent version of RStudio 2022 (2022.12.0-353 & R 4.2.2 (Centos7)) available on OnDemand. Older versions of both the module and RStudio don't set Ncpus automatically.
When requesting a compute node on Apocrita, the environment variable ${NSLOTS} is automatically set to match the number of cores you request (be it in a job script, interactive qlogin session or RStudio Open OnDemand session).
A recent change we have made is to automatically import the value of ${NSLOTS} into your R session, whether using the R module or RStudio. The value of ${NSLOTS} will be used to set an R variable called nslots. For example, in an RStudio session that has requested 4 cores, nslots is set to 4.
We have configured both the default R module and RStudio 2022 to set Ncpus to the value of this nslots variable, which you can check using the getOption() command:
> getOption("Ncpus")
[1] "4"
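For older module or RStudio versions that don't set Ncpus automatically, something similar could be done by hand, for example in a personal ~/.Rprofile. The following is only a sketch of one possible approach, not the configuration used on Apocrita; the helper name ncpus_from_nslots is hypothetical, and it converts the value to an integer and falls back to 1 when NSLOTS is unset or not a number.

```r
# Hypothetical helper: read the NSLOTS environment variable and return it
# as an integer, falling back to `default` if it is unset or invalid.
ncpus_from_nslots <- function(default = 1L) {
  n <- suppressWarnings(as.integer(Sys.getenv("NSLOTS", unset = NA)))
  if (is.na(n) || n < 1L) default else n
}

# Make the value the session-wide default for install.packages().
options(Ncpus = ncpus_from_nslots())
getOption("Ncpus")
```

The fallback means the same file can be used outside a scheduled job, where NSLOTS is not defined, without breaking package installation.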
All you need to do to take advantage of this is to request your session with multiple cores and the rest is taken care of. As you can see above, there is little benefit to requesting more than 4 cores, but there is a huge benefit to requesting more than 1. However, be aware that at busier times, a job requesting more cores may require more queueing time.
Furthermore, setting Ncpus gives a speed boost when updating packages via update.packages().
Check your core request!
Be aware that the above advice is for installing packages to create your personal R library. It's best to carry this out in a dedicated session. You can do this interactively in a qlogin or RStudio session, manually installing each package in turn. You could also write an environment creation Rscript file that contains installation commands for all the packages required for your environment, which makes it simple to recreate in future. This can then be run in RStudio, or by submitting it to run as a job script.
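An environment creation script of this kind might look something like the sketch below. The function names, the package list and the CRAN mirror URL are all illustrative assumptions; the script installs only the packages that are missing, and passes them to install.packages in a single call so that Ncpus can install several source packages in parallel.

```r
# Return the packages in `required` that are not yet installed.
missing_packages <- function(required) {
  setdiff(required, rownames(installed.packages()))
}

# Install whatever is missing in one call. The default for `ncpus` reuses
# the Ncpus option if it is set; `repos` is an assumed CRAN mirror.
install_missing <- function(required,
                            ncpus = getOption("Ncpus", 1L),
                            repos = "https://cloud.r-project.org") {
  to_install <- missing_packages(required)
  if (length(to_install) > 0) {
    install.packages(to_install, repos = repos, Ncpus = ncpus)
  }
  invisible(to_install)
}

# Example package list for the environment (edit to suit).
required_packages <- c("Seurat", "tidyverse")
```

Adding install_missing(required_packages) as the last line of the script would then carry out the installation when the file is run with Rscript or sourced in RStudio.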
Once your library is created, you can end your session. Then, when you are running actual R code, make sure your core request is appropriate! Most R code and packages only make use of one core, so only request multiple cores for compute jobs if you are sure that they will be used! You can keep track of your core usage using the jobstats tool.
If you have any questions regarding the use of Ncpus, please contact us on our Slack channel (QMUL users only), or by sending an email to its-research-support@qmul.ac.uk which is handled directly by staff with relevant expertise.
References¶
[1] Speeding up package installation, (2017)
Title image: Cris DiNoto on Unsplash