R Tutorial Part One - Top Tips
The ITSR support team often receive tickets from R users that cover similar
ground. So we thought we would collate our most frequent responses into some
"Top Tips"! The tips below apply equally to Rscript, but this article only
covers the interactive R program.
1. Clearing out existing R environments
If you have made multiple failed attempts to install packages into an R
environment, it can end up broken or corrupted. In these circumstances, it is
usually best to clear out the environment and start again with a clean one.
You can either clear all of your R environments at once:
rm -rf ~/R/x86_64-pc-linux-gnu-library
Or you can delete version-specific environments, e.g.:
rm -rf ~/R/x86_64-pc-linux-gnu-library/X.Y
(where X.Y is the R version number, e.g. 4.1 or 4.2)
This will give you a "clean slate" for any R environments you subsequently
create.
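As a minimal sketch of the version-specific cleanup above, the helper below takes the R version as an argument and only deletes the directory if it actually exists. The function name clear_r_library is hypothetical (it is not an existing tool), and the default location is assumed to be the personal library path shown above.

```shell
#!/bin/bash
# Hypothetical helper: remove one version-specific personal R library.
# Usage: clear_r_library X.Y [library_root]
# The library root defaults to the personal library path used in this article.
clear_r_library() {
    local version="$1"
    local root="${2:-$HOME/R/x86_64-pc-linux-gnu-library}"

    # Refuse to run without a version, so the whole root is never removed by accident
    if [ -z "$version" ]; then
        echo "usage: clear_r_library X.Y [library_root]" >&2
        return 1
    fi

    local target="$root/$version"
    if [ -d "$target" ]; then
        rm -rf -- "$target"
        echo "Removed $target"
    else
        echo "Nothing to remove at $target"
    fi
}
```

For example, running clear_r_library 4.2 would remove only the 4.2 library and leave libraries for other R versions untouched.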
2. Giving yourself enough time and RAM
Since creating R environments can take some time, it is best to do it in
an interactive session.
As stated in our docs, you can specify an amount of runtime with the
"-l h_rt=" argument. It is best to give yourself more time than you think you
will need: if you run out of time before the creation of an R environment has
completed, you will be returned to the frontend and will have to start the
whole operation again.
Likewise, it is a good idea to give yourself 4GB RAM to avoid running out halfway through creating the environment and having to start again.
So, to request a qlogin session with 24 hours runtime and 4GB, just issue the
following from the frontend:
qlogin -l h_vmem=4G -l h_rt=24:00:00
If the cluster is full, you may not be able to obtain a 24-hour qlogin session
immediately. In that case, a one-hour session (qlogin -l h_vmem=4G) will
usually suffice, and is usually available immediately because the short queue
can additionally use idle cores on restricted nodes.
Once your R environment is fully set up, you can exit this session (type q()
to quit R, then logout followed by Enter, or use Ctrl+D) and return to the
frontend. You can then create and submit job scripts that load this environment
as required.
3. Forgetting to source environments and load modules
Once you have created an R environment in an interactive qlogin session,
this environment can then be used in any job script. However, a common
mistake is to forget to load the same modules that were loaded when the R
environment was created, along with any Anaconda/Python environment,
if applicable.
For example, the R package terra can be installed as follows:
$ module load gcc gdal proj geos R
$ R
> install.packages("terra",repos = "https://cran.ma.imperial.ac.uk")
Warning in install.packages("terra", repos = "https://cran.ma.imperial.ac.uk") :
'lib = "/share/apps/centos7/R/.../lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/X.Y’
to install packages into? (yes/No/cancel) yes
...
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (terra)
However, if you then try to load terra in a subsequent job without first
loading the same set of modules used to build terra, you will encounter an
error:
$ module load R
$ R
> library(terra)
Error: package or namespace load failed for ‘terra’ in dyn.load(file, DLLpath = DLLpath, ...):
unable to load shared object '/data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so':
/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so)
This specific error has occurred because the gcc module isn't loaded. As you
can see, this doesn't occur if you load the required modules (the same
ones loaded during creation) before entering the R environment:
$ module load gcc gdal proj geos R
$ R
> library(terra)
terra 1.5.21
>
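One way to catch this mistake early is to check for the required modules before launching R. The sketch below is only an illustration: the function name missing_modules is hypothetical, and it assumes your module system maintains the LOADEDMODULES variable as a colon-separated list of entries such as gcc/10.2.0 (the version numbers in the comments are made up). Module names with extra path components would need a different pattern.

```shell
#!/bin/bash
# Hypothetical helper: print the names of required modules that are not loaded.
# Assumption: LOADEDMODULES holds a colon-separated list such as "gcc/10.2.0:R/4.2.2".
missing_modules() {
    local required missing=""
    for required in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            # Loaded either as a bare name or as name/version
            *":$required:"*|*":$required/"*) ;;
            *) missing="$missing $required" ;;
        esac
    done
    # Strip the leading space left by the accumulation above
    echo "${missing# }"
}

# Example use before starting R (module names taken from the terra example):
# missing=$(missing_modules gcc gdal proj geos R)
# [ -z "$missing" ] || echo "Please load: $missing" >&2
```

If the helper prints nothing, all of the listed modules appear to be loaded in the current session.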
Likewise, if you used an Anaconda environment or a Python virtualenv to
install dependencies alongside R, you need to remember to activate it as part
of your job script, before launching R, so that it is available.
For Anaconda:
module load anaconda3
conda activate <envname>
R
For Python (must be run from the directory where the virtualenv is stored):
module load python
source <envname>/bin/activate
R
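Putting these pieces together, a job script for the terra example might look something like the sketch below. It is an illustration only: the file names terra_job.sh and my_analysis.R are placeholders, the resource values simply reuse the h_rt and h_vmem flags from the qlogin example, and it assumes jobs are submitted with the scheduler's qsub command.

```shell
#!/bin/bash
# Write a hypothetical job script (terra_job.sh) that mirrors the install-time setup.
cat > terra_job.sh <<'EOF'
#!/bin/bash
# Run the job from the directory it was submitted from
#$ -cwd
# Runtime and memory requests, using the same flags as the qlogin example
#$ -l h_rt=1:0:0
#$ -l h_vmem=4G

# Load the same modules that were loaded when terra was installed
module load gcc gdal proj geos R

# If an Anaconda environment was used alongside R, activate it here too:
# module load anaconda3
# conda activate <envname>

# my_analysis.R is a placeholder for your own R script
Rscript my_analysis.R
EOF

# Submit from the frontend (assumes the qsub command is available):
# qsub terra_job.sh
```

Keeping the module load line identical to the one used during installation is the important part; the rest can be adapted to your own job.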
We hope you find these tips useful. As usual, you can ask a question on our Slack channel (QMUL users only), or send an email to its-research-support@qmul.ac.uk, which is handled directly by staff with relevant expertise.
Title image: Cris DiNoto on Unsplash