R Tutorial Part One - Top Tips

The ITSR support team often receive tickets from R users that cover similar ground. So we thought we would collate our most frequent responses into some "Top Tips"! The tips below apply equally to Rscript but this article only covers the interactive R program.

1. Clearing out existing R environments

After several failed attempts to install packages, an R environment (your personal package library under ~/R) can end up broken or corrupted. When that happens, it is usually best to clear out the environment and start again with a clean one.

You can either clear all of your R environments at once:

rm -rf ~/R/x86_64-pc-linux-gnu-library

Or you can delete version-specific environments, e.g.:

rm -rf ~/R/x86_64-pc-linux-gnu-library/X.Y

(where X.Y is the R version number - e.g. 4.1 or 4.2)

This will give you a "clean slate" for any R environments you subsequently create.
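If you would rather not delete anything outright, a more cautious variant is to rename the library so that it can be restored later. The following is a minimal sketch rather than an ITSR-provided tool: the function name reset_r_library and the R_LIB_ROOT override variable are hypothetical, and it assumes the default personal library location used in the commands above.

```shell
# Hypothetical helper: set the personal R library aside instead of deleting it.
#   reset_r_library        sets aside the libraries for all R versions
#   reset_r_library 4.2    sets aside only the library for R 4.2
# R_LIB_ROOT is an assumed override; by default the usual location is used.
reset_r_library() {
    lib_root="${R_LIB_ROOT:-$HOME/R/x86_64-pc-linux-gnu-library}"
    target="$lib_root${1:+/$1}"          # append /X.Y only if a version was given
    if [ ! -d "$target" ]; then
        echo "Nothing to clear: $target does not exist" >&2
        return 1
    fi
    backup="$target.bak.$(date +%Y%m%d%H%M%S)"
    mv "$target" "$backup" && echo "Moved $target to $backup"
}
```

The renamed directory still counts towards your storage quota, so remove it with rm -rf once the rebuilt environment works.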

2. Giving yourself enough time and RAM

Since creating R environments can often take some time, it is best to do it in an interactive session. As stated in our docs, you can specify an amount of runtime with the "-l h_rt=" argument. It is best to give yourself more time than you think you will need - if you run out of time before the creation of an R environment has completed, you will be returned to the frontend and you will have to start the whole operation again.

Likewise, it is a good idea to give yourself 4GB RAM to avoid running out halfway through creating the environment and having to start again.

So, to request a qlogin session with 24 hours runtime and 4GB, just issue the following from the frontend:

qlogin -l h_vmem=4G -l h_rt=24:00:00

If the cluster is full, you may not be able to obtain a 24-hour qlogin session immediately. In that case, a one-hour session (qlogin -l h_vmem=4G) will usually suffice, and is usually available immediately because the short queue can also use idle cores on restricted nodes.

Once your R environment is fully set up, you can leave the session (quit R with q(), then type logout followed by Enter, or press Ctrl+D) and return to the frontend. You can then create and submit job scripts that load this environment as required.

3. Forgetting to source environments and load modules

Once you have created an R environment in an interactive qlogin session, it can be used in any job script. However, a common mistake is forgetting to load the modules that were loaded when the R environment was created, or to activate any Anaconda/Python environment that was in use at the time.

For example, the R package terra can be installed as follows:

$ module load gcc gdal proj geos R

$ R

> install.packages("terra",repos = "https://cran.ma.imperial.ac.uk")
Warning in install.packages("terra", repos = "https://cran.ma.imperial.ac.uk") :
  'lib = "/share/apps/centos7/R/.../lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/X.Y’
to install packages into? (yes/No/cancel) yes
...
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (terra)

However, if you then try to load terra in a subsequent job without first loading the same set of modules used to build terra, you will encounter an error:

$ module load R

$ R

> library(terra)

Error: package or namespace load failed for ‘terra’ in dyn.load(file, DLLpath = DLLpath, ...):

unable to load shared object '/data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so':

/lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so)

This specific error occurs because the gcc module isn't loaded. As you can see, it doesn't occur if you load the required modules (the same ones that were loaded when the environment was created) before starting R:

$ module load gcc gdal proj geos R

$ R

> library(terra)
terra 1.5.21
>
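One way to catch a missing module before R fails with a dyn.load error is to check the list of loaded modules at the top of a job script. The sketch below is illustrative only: require_modules is a hypothetical helper, and it assumes your module system exports the colon-separated LOADEDMODULES variable (as Environment Modules and Lmod do) with entries of the form name or name/version.

```shell
# Hypothetical helper: fail early if a module needed by the R library is missing.
# Assumes LOADEDMODULES looks like "gcc/<version>:R/<version>".
require_modules() {
    missing=""
    for name in "$@"; do
        case ":${LOADEDMODULES:-}:" in
            *":$name:"* | *":$name/"*) ;;     # loaded as "name" or "name/version"
            *) missing="$missing $name" ;;
        esac
    done
    if [ -n "$missing" ]; then
        echo "Missing modules:$missing" >&2
        return 1
    fi
}
```

In a job script this could follow the module load line, for example require_modules gcc gdal proj geos R || exit 1, so that the job stops with a clear message. Note that it only compares module names, not versions.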

Likewise, if you used an Anaconda environment or a Python virtualenv to install dependencies alongside R, remember to activate it in your job script before launching R so that those dependencies are available.

For Anaconda:

module load anaconda3
conda activate <envname>
R

For a Python virtualenv (run from the directory where the virtualenv is stored):

module load python
source <envname>/bin/activate
R
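A job script can also confirm that the expected environment is active before R starts. This is a rough sketch under stated assumptions: require_python_env is a hypothetical helper, and it relies on the CONDA_DEFAULT_ENV variable set by conda activate and the VIRTUAL_ENV variable set by a virtualenv's activate script.

```shell
# Hypothetical helper: check that the named conda env or virtualenv is active.
# CONDA_DEFAULT_ENV holds the active conda environment name;
# VIRTUAL_ENV holds the path of the active virtualenv.
require_python_env() {
    if [ -z "${1:-}" ]; then
        echo "usage: require_python_env <envname>" >&2
        return 2
    fi
    expected="$1"
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        return 0
    fi
    case "${VIRTUAL_ENV:-}" in
        */"$expected") return 0 ;;            # virtualenv directory name matches
    esac
    echo "Environment '$expected' is not active" >&2
    return 1
}
```

For example, placing require_python_env <envname> || exit 1 on the line before R stops the job early if the activation step was forgotten.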

We hope you find these tips useful. As usual, you can ask a question on our Slack channel (QMUL users only), or by sending an email to its-research-support@qmul.ac.uk which is handled directly by staff with relevant expertise.


Title image: Cris DiNoto on unsplash