Note: This first appeared in Biometric Bulletin Volume 33, Issue 2, Software Corner.

Parallel computing refers to situations where calculations are carried out simultaneously, for example distributing the calculations across multiple cores of your computer’s processor, as opposed to having the calculations run sequentially on a single core. Parallel computing is particularly suitable for ‘single program, multiple data’ problems, for example in simulations and bootstrapping.

Parallel computation in **R** has come a long way over the last 10 years. If you tried to parallelise your **R** code a few years ago, you probably worked with the architecture-specific `snow` (Windows) or `multicore` (Unix-like) packages. Since 2011, **R** has supported parallel computation as part of the base distribution with the introduction of the `parallel` package (R version 2.14.0, released in October 2011). The `parallel` package builds on `multicore` and `snow` to provide a (mostly) platform-agnostic way of leveraging multiple cores to speed up the computation of *embarrassingly parallel* problems.

This note discusses how to incorporate `parallel` and associated packages, with little or no additional effort on the part of the statistical practitioner, to speed up data processing and statistical analysis pipelines.

### Getting started

The `parallel` package is part of base **R**, which means that it's already installed and you won't find it on CRAN. You can load it in the usual way with `library("parallel")`. The first thing you'll want to do is run `detectCores()`, which checks how many cores are available wherever **R** is running (probably your laptop or desktop computer).
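For example (the number of cores reported will of course depend on your machine):

```r
library("parallel")
# total number of logical cores available to this R session
detectCores()
# physical cores only (hyper-threaded cores count once);
# may return NA on some platforms
detectCores(logical = FALSE)
```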

### Parallel apply

The family of apply functions (`apply`, `lapply`, `tapply`, `sapply`, etc.) in **R** provides an extremely convenient way of applying a function to the margins of an array, matrix, list, etc. If you're using `apply` on a data frame, e.g. `apply(X, 2, median)` to compute the median of the columns of `X`, you should consider using `lapply` instead because it's much faster, e.g. `lapply(X, median)`, which computes the median of each variable in the data frame `X` (there's no need for a margin argument).
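As a quick illustration (exact timings will vary, but `lapply` avoids the coercion of the data frame to a matrix that `apply` performs first):

```r
# a small data frame to compare the two approaches
X <- data.frame(matrix(rnorm(1e+06), ncol = 100))
# apply() first coerces X to a matrix, then works over its columns
system.time(apply(X, 2, median))
# lapply() treats each column as a list element directly
system.time(lapply(X, median))
```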

The apply functions operate serially, i.e. they calculate each value in sequence and are constrained to use only one of your computer's cores. If you're on a Unix-like system, the `mclapply` function is an easy way to distribute the computation across the available cores. To use `mclapply`, first code your calculations as a function that can be called with `lapply`. Make sure it works serially using `lapply`, and then use `mclapply` to perform the computations in parallel.

```
library("parallel")
# a data frame with 200 columns of 50,000 random normal values each
X <- data.frame(matrix(rnorm(1e+07), ncol = 200))
# compute the median of each column, distributing columns across cores
mclapply(X, median)
```

In this simple example, there is only a relatively minor speed improvement. To get the most out of parallel processing, the functions to be run in parallel should be non-trivial. There is an overhead associated with forking the process, so it is possible to slow your code down with `mclapply` if sending the tasks out to the various cores takes longer than performing the task serially.
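A rough way to check whether forking pays off for your problem is to time the serial and parallel versions side by side (a sketch, assuming a Unix-like machine with at least four cores):

```r
library("parallel")
X <- data.frame(matrix(rnorm(1e+07), ncol = 200))
# serial baseline
system.time(lapply(X, median))
# forked across 4 cores; mc.cores controls the number of processes
system.time(mclapply(X, median, mc.cores = 4))
```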

Windows can't "fork" processes in the way that Unix-like systems do. This means that you won't see any improvement in speed when using `mclapply` on a Windows machine. On the plus side, your code won't break – it will work as if you were simply using `lapply`. Microsoft has announced plans to incorporate an Ubuntu image into future releases of Windows 10 through a new infrastructure they're calling the "Windows Subsystem for Linux". This means you may soon be able to run **R** in a native Unix-like environment (which supports forking and hence `mclapply`) on Windows 10 machines.
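In the meantime, Windows users can get genuine parallelism from the `parallel` package via a socket cluster, which `parLapply` uses in place of forking (a minimal sketch):

```r
library("parallel")
X <- data.frame(matrix(rnorm(1e+07), ncol = 200))
cl <- makeCluster(2)              # socket cluster: works on Windows too
res <- parLapply(cl, X, median)   # parallel analogue of lapply(X, median)
stopCluster(cl)                   # always shut the cluster down
```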

### Parallel loops

An alternative to `mclapply` is the `foreach` function, which is a little more involved but works on both Windows and Unix-like systems, and allows you to use a loop structure rather than an apply structure. To use `foreach` you need to register a "parallel backend", for example using the `doParallel` package, which acts as an interface between `foreach` and the `parallel` package. A simple example of how this works is given below, where we calculate a percentile bootstrap confidence interval for a least absolute deviations (LAD) regression parameter. In the code below, we resample a dataset 10,000 times, each time generating the LAD regression coefficients.

```
library("quantreg") # for quantile regression function rq()
data(engel) # the data set we'll use
# help(engel)
# plot(foodexp ~ income, data = engel)
# fit1 <- rq(foodexp ~ income, tau = 0.5, data = engel)
# abline(fit1)
library("foreach")
library("doParallel")
cl <- makeCluster(2) # create a cluster with 2 cores
registerDoParallel(cl) # register the cluster
res = foreach(i = 1:10000, .combine = "rbind", .packages = "quantreg") %dopar% {
  # generate a bootstrap sample
  boot_dat <- engel[sample(1:nrow(engel), replace = TRUE), ]
  # fit the model
  fit1 <- rq(foodexp ~ income, tau = 0.5, data = boot_dat)
  # return the coefficients
  fit1$coef
}
stopCluster(cl) # shut down the cluster
```

In the `foreach()` function we've specified that the results should be combined using `rbind` (i.e. the rows will be bound together, where the ith row contains the LAD regression coefficients from the ith bootstrap sample) and we've indicated that the `quantreg` package needs to be loaded on each of the processes. The output is a matrix, which we've called `res`, consisting of two columns and 10,000 rows. We obtain a 95% percentile bootstrap confidence interval by extracting the appropriate quantiles:

```
resdf <- as.data.frame(res)
quantile(resdf$income, probs = c(0.025,0.975), type = 1)
```

```
## 2.5% 97.5%
## 0.4704510 0.6125974
```

The loop syntax is very similar to a regular for loop in **R**, except that it starts with `foreach` and uses the `%dopar%` operator. If you use `%do%` instead of `%dopar%`, the loop is evaluated sequentially, with one CPU core running at 100% until the job is finished. In the above example we asked for two cores to be used, so with `%dopar%` both cores will run at 100% until the job is done. This means that a `%dopar%` loop running on two cores will finish in roughly half the time that it would have taken on a single core. As an example, the code above with 100,000 replications takes 3.0 minutes on my computer using `%do%`, whereas using `%dopar%` it takes only 1.6 minutes. It's not exactly two times faster because of the computational overheads involved in sending the tasks out to the different cores and collating the results. For more details see the `doParallel` and `foreach` vignettes.

### Distributed computing

The `mclapply` function can only use the cores of one machine, i.e. jobs can't be distributed over several nodes of a compute cluster. One way to do this in **R** is to use the `parLapply` function, which can utilise the Message Passing Interface (MPI) system; for further details see the `Rmpi` package. The `doMPI` package provides an MPI parallel backend for the `foreach` package.
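A minimal sketch of the `doMPI` approach is below; it assumes a working MPI installation with the `Rmpi` and `doMPI` packages, and is typically launched with `mpirun` rather than interactively:

```r
library("foreach")
library("doMPI")
cl <- startMPIcluster()      # workers are provided by the MPI runtime
registerDoMPI(cl)            # register the MPI backend for foreach
res <- foreach(i = 1:100, .combine = "c") %dopar% sqrt(i)
closeCluster(cl)             # release the MPI workers
mpi.quit()
```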

### Other considerations

**Parallel random number generation:** When bootstrapping or performing simulation studies, it is desirable for each instance of **R** to generate independent, reproducible pseudo-random numbers. If there is an object `.Random.seed` in your **R** workspace that is then shared with the worker cores, all your instances of **R** may inadvertently run identical simulations using identical "random" numbers. Alternatively, if `.Random.seed` is not in the workspace, you will have independent streams of random numbers, but they will not be reproducible. The `parallel` package includes a random number generator designed to overcome these issues. It can be enabled using `RNGkind("L'Ecuyer-CMRG")`. See the documentation of the `parallel` package for details.
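For example, combining this generator with a fixed seed gives each worker its own independent, reproducible stream (on a Unix-like system):

```r
library("parallel")
RNGkind("L'Ecuyer-CMRG")  # switch to the parallel-safe generator
set.seed(2016)            # makes the worker streams reproducible
# each forked worker draws from its own independent stream
res <- mclapply(1:4, function(i) rnorm(2), mc.set.seed = TRUE)
```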

**Multi-threaded linear algebra computation:** **R** can also be compiled against multi-threaded linear algebra libraries (BLAS, LAPACK), which can speed up calculations. One of the easiest ways to do this is to install Microsoft R Open (formerly known as Revolution R Open); however, doing so ties you to a CRAN snapshot taken at a fixed point in time. You can still install the most up-to-date version of a package by manually specifying the `repos` argument.
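For example, to bypass the snapshot for a single package you might point `repos` at a current CRAN mirror (a sketch; any CRAN mirror URL will do):

```r
# install the current CRAN release rather than the snapshot version
install.packages("foreach", repos = "https://cran.r-project.org")
```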

**Bioconductor:** If you work with Bioconductor packages, you should look into BiocParallel which provides modified versions of functions optimised for parallel evaluation, tailored for use with Bioconductor objects.
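A sketch of the drop-in style that BiocParallel offers (assuming the package is installed from Bioconductor):

```r
library("BiocParallel")
# bplapply() is a parallel analogue of lapply(); the backend is chosen
# automatically, or can be set explicitly via the BPPARAM argument
res <- bplapply(1:4, sqrt)
```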

So the next time you think about leaving a simulation running on your computer for the weekend, consider using `mclapply` instead of `lapply`, or rewriting that `for` loop as a `foreach` loop and having it run overnight instead, or send it out to a supercomputer and have the results within a couple of hours!
