Installing R packages more quickly using Ncpus

Installing packages into a personal R library can sometimes take quite a long time, but it doesn't always have to be this way.

Ncpus

There is an option for the install.packages command in R called Ncpus. The function of this option according to the help page for install.packages is:

Ncpus: The number of parallel processes to use for a parallel install of more than one source package.

Until recently, no value was assigned to Ncpus on Apocrita, which meant that it fell back to its default value of 1. However, we recently carried out some internal testing (having read a very useful blog post[1]) to see whether increasing this value would have any effect on how long packages take to install (spoiler alert: it does!).
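As a minimal sketch of what setting this option looks like in practice, the snippet below builds an install.packages() call with Ncpus set explicitly. The helper name install_with_ncpus and its dry_run argument are hypothetical and exist only so the call can be inspected without installing anything; Ncpus itself is the documented install.packages argument.

```r
# Hypothetical helper (not part of R or Apocrita): build an install.packages()
# call with Ncpus set, and only run it when dry_run = FALSE.
install_with_ncpus <- function(pkgs, ncpus = 1L, dry_run = TRUE) {
  ncpus <- max(1L, as.integer(ncpus))  # never go below one process
  install_call <- bquote(install.packages(.(pkgs), Ncpus = .(ncpus)))
  if (dry_run) {
    return(install_call)  # return the unevaluated call for inspection
  }
  eval(install_call)
}

# Dry run: shows the call that would be made with 4 parallel processes.
example_call <- install_with_ncpus(c("Seurat", "tidyverse"), ncpus = 4)
print(example_call)
```

Calling the helper with dry_run = FALSE would perform the install; in an ordinary session the same effect comes from passing Ncpus directly to install.packages().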

Package install benchmarks

We carried out some benchmarks for two of the most commonly installed R packages on Apocrita: Seurat and tidyverse.

1 core

First, let's take a look at how long a standard install of both packages takes when we don't set Ncpus and install them in an R session on Apocrita requesting 1 core:

Seurat

* DONE (Seurat)

    user   system  elapsed
1410.265  246.715 1806.903

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:30:09 |    2G |   1.53G |     1 |  91% |
----------------------------------------------

tidyverse

* DONE (tidyverse)

    user   system  elapsed
 937.008  174.041 1182.636

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:19:46 |    2G |   0.58G |     1 |  93% |
----------------------------------------------

So, about 30 minutes for Seurat and about 20 minutes for tidyverse.
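The user/system/elapsed figures above have the shape of the output of R's system.time(). The post does not say exactly how the timings were gathered, so the following is only a sketch that assumes system.time() wrapped around the install. The time_install helper and the stand-in installer are hypothetical, used here so the example runs without installing anything.

```r
# Hypothetical helper: time an installer function for a set of packages.
# By default it would time the real utils::install.packages().
time_install <- function(pkgs, ncpus = 1L,
                         installer = utils::install.packages) {
  system.time(installer(pkgs, Ncpus = ncpus))
}

# Stand-in installer for a dry run: it just sleeps briefly instead of
# downloading and compiling anything.
fake_installer <- function(pkgs, Ncpus) {
  Sys.sleep(0.1)
  invisible(NULL)
}

timing <- time_install("Seurat", ncpus = 1L, installer = fake_installer)
print(timing)  # prints user, system and elapsed times in seconds
```

Dropping the installer argument would time a real install, for example time_install("Seurat", ncpus = 1L).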

2 cores

To take advantage of parallel processes on Apocrita, we need to request multiple cores. So, let's try installing those two packages again, but this time we'll request 2 cores and set the value of Ncpus to match:

Seurat

* DONE (Seurat)

    user   system  elapsed
1494.449  308.158 1012.763

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:16:55 |    4G |   1.66G |     2 |  88% |
----------------------------------------------

tidyverse

* DONE (tidyverse)

    user   system  elapsed
1003.867  222.840  701.113

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:11:43 |    4G |   0.77G |     2 |  87% |
----------------------------------------------

That is a huge difference: Seurat took only about 17 minutes to install, and tidyverse took about 12 minutes.

4 cores

But how does this scale? Let's try requesting 4 cores and setting Ncpus to match.

Seurat

* DONE (Seurat)

    user   system  elapsed
1588.283  377.616  601.312

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:10:05 |    8G |   2.01G |     4 |  81% |
----------------------------------------------

tidyverse

* DONE (tidyverse)

    user   system  elapsed
1063.013  276.057  472.438

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:07:56 |    8G |   1.26G |     4 |  70% |
----------------------------------------------

So, another uplift: Seurat took just 10 minutes to install, and tidyverse took just under 8 minutes.

8 cores

Let's take a look at what happens if we request 8 cores and set Ncpus to match:

Seurat

* DONE (Seurat)

    user   system  elapsed
1684.220  503.392  500.563

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:08:22 |   16G |   2.91G |     8 |  54% |
----------------------------------------------

tidyverse

* DONE (tidyverse)

    user   system  elapsed
1120.654  367.373  443.721

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:07:27 |   32G |   1.75G |     8 |  41% |
----------------------------------------------

Now we start to see a much less dramatic difference - Seurat took about 8.5 minutes and tidyverse took about 7.5 minutes. The EFF percentage is also starting to drop quite significantly, showing that the requested cores weren't efficiently utilised.
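One way to read the EFF column, offered as an assumption rather than a description of how jobstats actually computes it, is as CPU time used divided by CPU time available (elapsed time multiplied by the number of cores requested). The sketch below applies that reading to the 8-core timings above; the function name cpu_efficiency is hypothetical.

```r
# Assumed definition (not confirmed by this post):
#   efficiency = (user + system CPU seconds) / (elapsed seconds * cores)
cpu_efficiency <- function(user, system, elapsed, cores) {
  (user + system) / (elapsed * cores)
}

# Timings copied from the 8-core runs above.
seurat_8    <- cpu_efficiency(1684.220, 503.392, 500.563, cores = 8)
tidyverse_8 <- cpu_efficiency(1120.654, 367.373, 443.721, cores = 8)

# Show both as percentages.
print(round(100 * c(Seurat = seurat_8, tidyverse = tidyverse_8), 1))
```

Under this assumption both values land within about one percentage point of the EFF figures reported by jobstats, which suggests that the extra cores spend much of the install idle.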

16 cores

Finally, let's take a look at what happens if we request 16 cores and set Ncpus to match:

Seurat

* DONE (Seurat)

    user   system  elapsed
1798.189  685.703  480.321

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:08:04 |   32G |   4.14G |    16 |  32% |
----------------------------------------------

tidyverse

* DONE (tidyverse)

    user   system  elapsed
1207.006  507.690  416.986

# Truncated jobstats output:

----------------------------------------------
|  DURATION | MEM R |  MEM U  | CORES |  EFF |
+-----------+-------+---------+-------+------+
|   0:07:00 |   32G |   2.20G |    16 |  25% |
----------------------------------------------

Seurat took about 8 minutes and tidyverse took about 7 minutes, so there is very little difference compared with requesting 8 cores and setting Ncpus to 8. The EFF percentage has also dropped yet again.

Final results

So, let's take a look at those results side by side:

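| Ncpus | Seurat Duration | Seurat Speedup | tidyverse Duration | tidyverse Speedup |
|-------|-----------------|----------------|--------------------|-------------------|
| 1     | 0:30:09         | -              | 0:19:46            | -                 |
| 2     | 0:16:55         | c. 1.8x faster | 0:11:43            | c. 1.7x faster    |
| 4     | 0:10:05         | c. 3x faster   | 0:07:56            | c. 2.5x faster    |
| 8     | 0:08:22         | c. 3.6x faster | 0:07:27            | c. 2.6x faster    |
| 16    | 0:08:04         | c. 3.7x faster | 0:07:00            | c. 2.8x faster    |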

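So, as we can see, the "sweet spot" for most users will most likely be 4 cores: beyond that, install times barely improve while efficiency falls sharply. This is because each package still has to wait for its chain of dependencies to be installed first, so there are likely to be a few bottlenecks along the way that extra cores cannot remove.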
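The speedup figures can be reproduced from the job durations with a few lines of R. This is a sketch using the durations from the benchmarks in this post; the helper to_seconds is hypothetical, and the computed values may differ in the last digit from the rounded "c." figures.

```r
# Convert an "H:MM:SS" duration string into seconds.
to_seconds <- function(hms) {
  parts <- as.numeric(strsplit(hms, ":", fixed = TRUE)[[1]])
  sum(parts * c(3600, 60, 1))
}

# Durations copied from the jobstats output in this post.
results <- data.frame(
  ncpus     = c(1, 2, 4, 8, 16),
  seurat    = c("0:30:09", "0:16:55", "0:10:05", "0:08:22", "0:08:04"),
  tidyverse = c("0:19:46", "0:11:43", "0:07:56", "0:07:27", "0:07:00"),
  stringsAsFactors = FALSE
)

seurat_s    <- vapply(results$seurat, to_seconds, numeric(1), USE.NAMES = FALSE)
tidyverse_s <- vapply(results$tidyverse, to_seconds, numeric(1), USE.NAMES = FALSE)

# Speedup relative to the 1-core run, and speedup gained per core requested.
results$seurat_speedup    <- seurat_s[1] / seurat_s
results$tidyverse_speedup <- tidyverse_s[1] / tidyverse_s
results$seurat_per_core   <- results$seurat_speedup / results$ncpus

print(results[, c("ncpus", "seurat_speedup", "tidyverse_speedup",
                  "seurat_per_core")], digits = 3)
```

The per-core column makes the diminishing returns explicit: for Seurat it falls from roughly 0.9 at 2 cores to under 0.25 at 16 cores.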

Using Ncpus on Apocrita

So, how do you make use of the Ncpus option on Apocrita? Well, the good news is that we have largely done it for you, as long as you use the default R module (currently R/4.2.2) or the most recent version of RStudio 2022 (2022.12.0-353 & R 4.2.2 (Centos7)) available on OnDemand. Older versions of both the module and RStudio don't set Ncpus automatically.

When requesting a compute node on Apocrita, the environment variable ${NSLOTS} is automatically set to match the number of cores you request (be it in a job script, interactive qlogin session or RStudio Open OnDemand session).

A recent change we have made is to automatically import the value of ${NSLOTS} into your R session, whether using the R module or RStudio. The value of ${NSLOTS} will be used to set an R variable called nslots. You can see this in action in an RStudio session that has requested 4 cores:

[Image: RStudio session showing the nslots variable]

We have set both the default R module and RStudio 2022 to set the value of Ncpus to this nslots variable, which you can check using the getOption() command:

> getOption("Ncpus")
[1] "4"
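The post does not show the configuration itself, so the following is only a sketch of how an NSLOTS-to-Ncpus mapping could be written, for example in your own .Rprofile when using an older module that does not set Ncpus automatically. The helper name ncpus_from_env is hypothetical, and unlike the output above this sketch stores the value as an integer rather than a string.

```r
# Hypothetical helper: read a core count from an environment variable,
# falling back to `default` when it is unset, empty or not a positive integer.
ncpus_from_env <- function(var = "NSLOTS", default = 1L) {
  value <- suppressWarnings(as.integer(Sys.getenv(var, unset = NA)))
  if (is.na(value) || value < 1L) default else value
}

# Mirror the scheduler's core count into R, then use it for installs.
nslots <- ncpus_from_env()
options(Ncpus = nslots)
print(getOption("Ncpus"))
```

Outside a scheduled job, where NSLOTS is not set, this falls back to a single process, matching R's own default.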

All you need to do to take advantage of this is to request your session with multiple cores and the rest is taken care of. As you can see above, there is little benefit to requesting more than 4 cores, but there is a huge benefit to requesting more than 1. However, be aware that at busier times, a job requesting more cores may require more queueing time.

Furthermore, setting Ncpus gives a speed boost when updating packages via update.packages().

Check your core request!

Be aware that the above advice is for installing packages to create your personal R library. It's best to carry this out in a dedicated session. You can do this interactively in a qlogin or RStudio session, manually installing each package in turn. You could also write an environment creation Rscript file that contains installation commands for all the packages required for your environment, which makes it simple to recreate in future. This can then be run in RStudio, or by submitting it to run as a job script.
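An environment creation script of this kind might look like the sketch below. The package list is just the two packages benchmarked in this post, the function names missing_pkgs and build_library are hypothetical, and the CRAN mirror URL is an assumption that you may wish to change. It defaults to a dry run so that nothing is installed by accident.

```r
# Packages needed for this environment (example list taken from this post).
required_pkgs <- c("Seurat", "tidyverse")

# Return the packages from `pkgs` that are not yet in the library.
missing_pkgs <- function(pkgs,
                         installed = rownames(utils::installed.packages())) {
  setdiff(pkgs, installed)
}

# Install whatever is missing; dry_run = TRUE only reports what would be done.
build_library <- function(pkgs, dry_run = TRUE,
                          repos = "https://cloud.r-project.org") {
  to_install <- missing_pkgs(pkgs)
  if (length(to_install) == 0L) {
    message("All packages already installed.")
  } else if (dry_run) {
    message("Would install: ", paste(to_install, collapse = ", "))
  } else {
    # Use the session's Ncpus option if set, otherwise a single process.
    utils::install.packages(to_install, repos = repos,
                            Ncpus = getOption("Ncpus", 1L))
  }
  invisible(to_install)
}

build_library(required_pkgs, dry_run = TRUE)
```

Setting dry_run = FALSE and running the file with Rscript inside a multi-core session would perform the installs, skipping anything already present in your library.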

Once your library is created, you can end your session. Then, when you are running actual R code, make sure your core request is appropriate! Most R code and packages only make use of one core, so only request multiple cores for compute jobs if you are sure that they will be used! You can keep track of your core usage using the jobstats tool.

If you have any questions regarding the use of Ncpus, please contact us on our Slack channel (QMUL users only), or by sending an email to its-research-support@qmul.ac.uk, which is handled directly by staff with relevant expertise.

References

[1] Speeding up package installation (2017)


Title image: Cris DiNoto on Unsplash