R Tutorial Part Two - R vs RStudio
Following on from part one of our R tutorial, we'll look at the differences between R (the command-line language, which can be loaded as a module and used in your Apocrita batch jobs) and RStudio (the graphical development environment, served over the web and provided via the OnDemand service).
R vs RStudio
At first glance the two applications might appear completely interchangeable, and if you're not using any extra packages they can be treated that way for simple tasks. The complications arise once a task requires R to interact with additional software outside R, such as the compilers and libraries that many packages rely on.
R, as a module, can be used in either a qsub job or a qlogin session, and it can easily be combined with other applications by loading further modules as required. This is extremely useful when you're installing extra packages within R, as they frequently depend on external software to build or run correctly. A good example is the commonly used Seurat package. The default version of GCC on the cluster is 4.8.0, but recent changes mean that Seurat now requires GCC 10.2.0, along with the GEOS library. Both are available as modules and, once you have loaded them, you can run R and install Seurat.
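As a rough sketch of those steps, the shell function below loads the dependency modules in a qlogin session and then installs Seurat. Several details are assumptions for illustration: the function name `install_seurat`, the module names (`gcc/10.2.0`, `geos`, `R`) and the CRAN mirror URL. Check `module avail` for the exact module names on Apocrita. The sketch also assumes you already have a writable personal R library.

```shell
#!/bin/bash
# Hypothetical helper for a qlogin session: load Seurat's build dependencies,
# then install the package and check that it loads.
# Assumption: the module names below are illustrative; confirm them with `module avail`.
install_seurat() {
  # Seurat needs GCC 10.2.0 and the GEOS library to build
  module load gcc/10.2.0 geos || return 1
  # With the dependencies in place, load R itself
  module load R || return 1
  # Show which compiler is now first on PATH, as a quick sanity check
  gcc --version | head -n 1
  # Install from CRAN without an interactive mirror prompt (mirror URL is an assumption),
  # then confirm the package loads; Rscript exits non-zero if library() fails
  Rscript -e 'install.packages("Seurat", repos = "https://cloud.r-project.org")' &&
    Rscript -e 'library(Seurat)'
}
```

To use it, paste or source the function in a qlogin session and then run `install_seurat`.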
For some users, R's command-line batch mode offers great productivity and scales easily to a high volume of jobs. Others prefer an interactive graphical environment for their workflow. This is where RStudio can leave you struggling: the standard RStudio options in OnDemand only let you select a GCC module, and it must be chosen before the OnDemand session starts.
Loading modules into the RStudio environment
Starting with the RStudio 2022 version in OnDemand, we also offer a free-text field where you can specify any other modules that may be required to install or use a package within RStudio.
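In some cases you can install R packages in a qlogin session with the necessary modules loaded, and then run the package successfully in RStudio without those modules. However, this is not guaranteed and should be treated with caution: a package compiled against a module's shared libraries may fail to load if those libraries are not available in the RStudio session.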
RStudio versions
Another source of confusion is the range of RStudio options. The OnDemand service provides a variety of combinations of RStudio and R versions, and you can generally pick one to suit your requirements. There is, of course, a big caveat when using these different versions: whether the option you select is labelled (Ubuntu) or (Centos7).
We recommend using the versions that end with (Centos7). Because of incompatibilities experienced with the Ubuntu container, we are standardising on the CentOS build going forward, and may remove the Ubuntu versions in future. If you use RStudio with R 4.1.1 to create a workflow that you later want to move to a scripted qsub job, the move should be straightforward, provided you load the matching R module. In the same way, you can use a qlogin session interchangeably with an OnDemand session running RStudio, provided the selected R versions match.
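One simple safeguard when moving a workflow from RStudio to a qsub job is to check the R version at the start of the job script. The function below is a hypothetical sketch: the name `require_r_version` is ours and is not an Apocrita tool. It asks whichever R is on `PATH` for its version via `Rscript` and stops if that version differs from the one you expect.

```shell
#!/bin/bash
# Hypothetical guard for a job script: fail fast if the R on PATH is not the
# version the RStudio workflow was developed with.
require_r_version() {
  local want="$1" have
  # R.version$major is e.g. "4" and R.version$minor e.g. "1.1", giving "4.1.1"
  if ! have=$(Rscript -e 'cat(R.version$major, R.version$minor, sep = ".")' 2>/dev/null); then
    echo "Rscript not found; load an R module first" >&2
    return 1
  fi
  if [[ "${have}" != "${want}" ]]; then
    echo "Expected R ${want} but found R ${have}" >&2
    return 1
  fi
  echo "Using R ${have}"
}
```

In a job script, you might call it straight after loading the module, for example `module load R/4.1.1` followed by `require_r_version 4.1.1 || exit 1`, before running your analysis with `Rscript`. The module name `R/4.1.1` is an assumption, so check `module avail R` for the exact name.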
Anaconda R versions
The final warning isn't actually about R versus RStudio (but the title would have been less snappy). Some Anaconda packages install their own version of R as part of their setup, and this often causes problems for users who aren't aware of it. When running a workflow that combines Anaconda and R, always double-check which version of R is being used, as conflicts between package versions are commonplace and can cause numerous problems. It's particularly worth remembering that R packages change over time, so if Anaconda calls an outdated version of R, a workflow that previously worked might suddenly break.
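A quick way to do that double-check from the shell is sketched below. It assumes a conda-style environment, where `conda activate` sets `CONDA_PREFIX`, and the function name `report_r` is hypothetical. The function prints which R binary is first on `PATH` and flags it if that binary lives inside the active environment. It then prints the version line reported by that R.

```shell
#!/bin/bash
# Hypothetical check: report which R the current shell would run, and flag it
# if it belongs to the active conda environment (CONDA_PREFIX is set by `conda activate`).
report_r() {
  local r_path
  if ! r_path=$(command -v R); then
    echo "R not found on PATH"
    return 1
  fi
  echo "R binary: ${r_path}"
  if [[ -n "${CONDA_PREFIX:-}" && "${r_path}" == "${CONDA_PREFIX}"/* ]]; then
    echo "This R was installed by the conda environment at ${CONDA_PREFIX}"
  fi
  # The first line of `R --version` names the release, e.g. "R version 4.1.1 ..."
  "${r_path}" --version | head -n 1
}
```

Try running `report_r` before and after activating your environment. If the path moves into the environment's directory, your workflow will use that environment's R and its packages rather than the module version. Within conda, `conda list r-base` also shows which R build is installed in the active environment.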
Title image: Cris DiNoto on Unsplash