R Tutorial Part One - Top Tips

The ITSR support team often receive tickets from R users that cover similar ground. So we thought we would collate our most frequent responses into some "Top Tips"! The tips below apply equally to Rscript but this article only covers the interactive R program.

1. Clearing out existing R environments

After several failed attempts to install packages, an R environment can end up broken or corrupted. In these circumstances, it is usually best to clear out the environment and start again with a clean one.

You can either clear all of your R environments at once:

rm -rf ~/R/x86_64-pc-linux-gnu-library

Or you can delete version-specific environments, e.g.:

rm -rf ~/R/x86_64-pc-linux-gnu-library/X.Y

(where X.Y is the R version number - e.g. 4.1 or 4.2)

This will give you a "clean slate" for any R environments you subsequently create.
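If you would rather not delete anything until the rebuild has worked, a more cautious variant is to move the library aside first. The following is a minimal sketch rather than an official procedure: the function name set_aside_r_library and the ".bak.<timestamp>" naming are illustrative assumptions, and the library path is the one shown above.

```shell
#!/bin/bash
# Move a personal R library aside instead of deleting it outright.
# Usage: set_aside_r_library [X.Y]   (no argument = all versions)
# NOTE: the function name and backup naming scheme are illustrative.
set_aside_r_library() {
    local lib_root="$HOME/R/x86_64-pc-linux-gnu-library"
    # Append "/X.Y" only when a version was given.
    local target="$lib_root${1:+/$1}"
    if [ ! -d "$target" ]; then
        echo "Nothing to do: $target does not exist" >&2
        return 1
    fi
    # Timestamped name so a later run does not clash with an earlier backup.
    local backup
    backup="${target}.bak.$(date +%Y%m%d%H%M%S)"
    mv "$target" "$backup" && echo "$backup"
}
```

For example, set_aside_r_library 4.2 moves only the 4.2 library and prints the backup location; once the rebuilt environment works, the backup can be removed with rm -rf.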

2. Symlinking R to scratch

One quality-of-life suggestion we sometimes make is symlinking your R directory to your scratch space. Your scratch space has a far more generous quota than your home directory, so you avoid the risk of filling up your home directory.

To symlink your R directory:

  1. First, remove your existing R directory (making sure that you have backed up anything inside you may need beforehand):
rm -rf ~/R
  2. Then, create a new R directory on your scratch:
mkdir /data/scratch/$USER/R
  3. Next, create a symlink in your home directory that points to your newly created scratch directory:
ln -s /data/scratch/$USER/R ~/R

You can check that this has worked correctly by running the following command from inside your home directory:

ls -la

Which should output something like this:

lrwxrwxrwx    1 abc123 users     22 Jun 22 09:32 R -> /data/scratch/abc123/R

Any R operations going forward will now automatically use /data/scratch/$USER/R for storage. The above process is optional, but we recommend it if the 100GB allocated to your home directory isn't enough.
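The symlinking steps can also be wrapped in a small function with a few safety checks. This is a sketch under stated assumptions, not the procedure above verbatim: instead of deleting an existing ~/R it moves it onto scratch so installed packages are kept, and the function name link_r_to_scratch is illustrative.

```shell
#!/bin/bash
# Replace ~/R with a symlink to a directory on scratch.
# Variant of the manual steps: an existing ~/R is moved, not deleted.
# NOTE: the function name is illustrative; the default path is the
# scratch location used in this article.
link_r_to_scratch() {
    local scratch_r="${1:-/data/scratch/$USER/R}"
    local home_r="$HOME/R"

    if [ -L "$home_r" ]; then
        echo "$home_r is already a symlink to $(readlink "$home_r")"
        return 0
    fi
    if [ -e "$scratch_r" ]; then
        echo "Refusing to overwrite existing $scratch_r" >&2
        return 1
    fi
    if [ -d "$home_r" ]; then
        mkdir -p "$(dirname "$scratch_r")"
        mv "$home_r" "$scratch_r"    # keep already-installed packages
    else
        mkdir -p "$scratch_r"
    fi
    ln -s "$scratch_r" "$home_r"
    ls -ld "$home_r"                 # should show: R -> <scratch path>
}
```

Called with no argument it targets /data/scratch/$USER/R; running it a second time only reports the existing link and changes nothing.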

Warning: Auto-deletion

Be mindful of the auto-deletion policy for scratch (files are automatically deleted 65 days after the last modification time - but you will receive an email alert just before this happens) and be sure to copy anything critical back to your home directory and/or your own machine.
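Because the policy is based on last modification time, find can be used to spot files that are getting close to the limit before the email alert arrives. This is a minimal sketch: the function name list_stale_files and the 50-day default threshold are assumptions chosen for illustration, not part of the policy.

```shell
#!/bin/bash
# List files whose last modification is more than N days old.
# Usage: list_stale_files [directory] [days]
# NOTE: the function name and the 50-day default are illustrative;
# pick a threshold comfortably below the auto-deletion limit.
list_stale_files() {
    local dir="${1:-/data/scratch/$USER/R}"
    local days="${2:-50}"
    # -mtime +N matches files modified more than N whole days ago.
    find "$dir" -type f -mtime +"$days" -print
}
```

Anything it lists that you still need can then be copied back to your home directory or your own machine.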

3. Giving yourself enough time and RAM

Since creating R environments can often take some time, it is best to do it in an interactive session. As stated in our docs, you can specify an amount of runtime with the "-l h_rt=" argument. It is best to give yourself more time than you think you will need - if you run out of time before the creation of an R environment has completed, you will be returned to the frontend and you will have to start the whole operation again.

Likewise, it is a good idea to give yourself 4GB RAM to avoid running out halfway through creating the environment and having to start again.

So, to request a qlogin session with 24 hours runtime and 4GB, just issue the following from the frontend:

qlogin -l h_vmem=4G -l h_rt=24:00:00

If the cluster is full, you may not be able to obtain a 24-hour qlogin session immediately. In that case, a one-hour session (qlogin -l h_vmem=4G) will usually suffice, and is usually available immediately because the short queue can additionally use idle cores on restricted nodes.

Once your R environment is fully set up, you can leave R with q() and then exit the session (type logout followed by Enter, or use Ctrl+D) to return to the frontend. You can then create and submit job scripts that load this environment as required.

4. Forgetting to source environments and load modules

Once you have created an R environment in an interactive qlogin session, this environment can then be called in any job scripts. However, a common mistake is to forget to load the same modules that were previously loaded during the creation of the R environment and any Anaconda/Python environment, if applicable.

For example, installing the R package terra follows this recipe:

$ module load gcc gdal proj geos R

$ R

> install.packages("terra",repos = "https://cran.ma.imperial.ac.uk")
Warning in install.packages("terra", repos = "https://cran.ma.imperial.ac.uk") :
  'lib = "/share/apps/centos7/R/.../lib64/R/library"' is not writable
Would you like to use a personal library instead? (yes/No/cancel) yes
Would you like to create a personal library
‘~/R/x86_64-pc-linux-gnu-library/X.Y’
to install packages into? (yes/No/cancel) yes
...
** help
*** installing help indices
*** copying figures
** building package indices
** testing if installed package can be loaded from temporary location
** checking absolute paths in shared objects and dynamic libraries
** testing if installed package can be loaded from final location
** testing if installed package keeps a record of temporary installation path
* DONE (terra)

However, if you then try to load terra in a subsequent job without first loading the same set of modules used to build terra, you will encounter an error:

$ module load R

$ R

> library(terra)
Error: package or namespace load failed for ‘terra’ in dyn.load(file, DLLpath = DLLpath, ...):
 unable to load shared object '/data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so':
  /lib64/libstdc++.so.6: version `GLIBCXX_3.4.20' not found (required by /data/home/abc123/R/x86_64-pc-linux-gnu-library/X.Y/Rcpp/libs/Rcpp.so)

This specific error occurs because the gcc module isn't loaded. It does not occur if you load the required modules (the same ones loaded during installation) before starting R:

$ module load gcc gdal proj geos R

$ R

> library(terra)
terra 1.5.21
>
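In a job script, the same rule applies: the module line should match the one used at install time. The snippet below is a hedged sketch that writes such a job script to a file. The file names terra_job.sh and analysis.R are placeholders, the one-hour runtime is an arbitrary choice, and it assumes the scheduler accepts the same -l options on "#$" directive lines as qlogin does on the command line.

```shell
#!/bin/bash
# Write a minimal job script that loads the same modules that were
# loaded when terra was installed.
# NOTE: terra_job.sh and analysis.R are placeholder names.
cat > terra_job.sh <<'EOF'
#!/bin/bash
#$ -cwd
#$ -l h_rt=1:0:0
#$ -l h_vmem=4G

# Same module list as at install time, not just "module load R".
module load gcc gdal proj geos R

Rscript analysis.R
EOF
```

The resulting file would then be submitted from the frontend in the usual way for your cluster (for example with qsub terra_job.sh on a Grid Engine style scheduler).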

Likewise, if you used an Anaconda environment or a Python virtualenv to install dependencies alongside R, remember to activate it in your job script before launching R so that those dependencies are available.

For Anaconda:

module load anaconda3
conda activate <envname>
R

For a Python virtualenv (run from the directory where the virtualenv is stored):

module load python
source <envname>/bin/activate
R
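One way to avoid depending on the current directory is to activate the virtualenv by its absolute path. The helper below is a minimal sketch: the function name activate_venv is illustrative, and the path you pass to it is whatever location you created the virtualenv in.

```shell
#!/bin/bash
# Activate a Python virtualenv by absolute path, failing clearly if
# it is not where we expect, so R is never started without it.
# NOTE: the function name is illustrative.
activate_venv() {
    local venv_dir="$1"
    if [ ! -f "$venv_dir/bin/activate" ]; then
        echo "No virtualenv found at $venv_dir" >&2
        return 1
    fi
    # Source the activate script into the current shell.
    . "$venv_dir/bin/activate"
}
```

In a job script this could be used as activate_venv "$HOME/<envname>" && R, with the module load lines placed before it as shown above.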

We hope you find these tips useful. As usual, you can ask a question on our Slack channel (QMUL users only), or by sending an email to its-research-support@qmul.ac.uk which is handled directly by staff with relevant expertise.


Title image: Cris DiNoto on unsplash