RStudio deployment for fun and profit

Overview

Teaching: 0 min
Exercises: 20 min
Questions
Objectives
  • Run an R workflow both through RStudio and the terminal using containers

RStudio example

R is a popular language in several domains of science, particularly because of its statistical packages. It often requires installing a large number of dependencies, and installing these on an HPC system can be tedious.

Instead we can use an R container to simplify the process.

Rocker

The Rocker project has published a large number of R images we can use, including an RStudio image. To begin, we’ll pull a Tidyverse container image (it contains R, RStudio, and data science packages):

$ docker pull rocker/tidyverse:3.5

Running a scripted R workflow on the shell

Let us create a dedicated directory for this example:

$ mkdir r_example
$ cd r_example

We are going to use a minimal example taken from the Software Carpentry workshop Programming with R. The script readings-06.R from their Episode 5 is reproduced here for convenience; you can copy and paste its content into a file using your favourite text editor:

main <- function() {
  args <- commandArgs(trailingOnly = TRUE)
  action <- args[1]
  filenames <- args[-1]
  stopifnot(action %in% c("--min", "--mean", "--max"))

  if (length(filenames) == 0) {
    process(file("stdin"), action)
  } else {
    for (f in filenames) {
      process(f, action)
    }
  }
}

process <- function(filename, action) {
  dat <- read.csv(file = filename, header = FALSE)

  if (action == "--min") {
    values <- apply(dat, 1, min)
  } else if (action == "--mean") {
    values <- apply(dat, 1, mean)
  } else if (action == "--max") {
    values <- apply(dat, 1, max)
  }
  cat(values, sep = "\n")
}

main()

Let us download and unzip the required sample dataset:

$ wget http://swcarpentry.github.io/r-novice-inflammation/data/r-novice-inflammation-data.zip
$ unzip -q r-novice-inflammation-data.zip

Now, we can run the R script using the R container we pulled; we’re going to compute average values in this example:

$ docker run -v `pwd`:/data -w /data rocker/tidyverse:3.5 Rscript readings-06.R --mean data/inflammation-*.csv
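
To sanity-check the numbers printed by the container, the row-wise mean that apply(dat, 1, mean) computes can be reproduced outside R. The following is a minimal sketch, not part of the original lesson: the function name row_means and the two-row sample file are hypothetical, and awk's default number formatting may differ slightly from R's for non-integer results.

```shell
# Print the mean of each row of a headerless, comma-separated file,
# mirroring what apply(dat, 1, mean) computes in readings-06.R.
# (row_means is a hypothetical helper name, not part of the lesson.)
row_means() {
  awk -F',' 'NF > 0 {
    sum = 0
    for (i = 1; i <= NF; i++) sum += $i
    print sum / NF
  }' "$1"
}

# Made-up two-row sample, NOT taken from the inflammation dataset.
sample="$(mktemp)"
printf '0,1,2\n4,4,7\n' > "$sample"

# Rows sum to 3 and 15 over 3 columns each, so this prints 1 and 5.
row_means "$sample"
```

On the real data you could compare, for instance, `row_means data/inflammation-01.csv` with the first block of values from the Rscript run.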

Using an RStudio web server to run the analysis

Let us start up the web server using the following docker command:

$ docker run -d -p 80:8787 --name rstudio -v `pwd`/data:/home/rstudio/data -e PASSWORD=<Pick your password> rocker/tidyverse:3.5

Here we’re mapping the container port 8787 to the host port 80 so we can access the RStudio server remotely. Note that you need to set a password through the PASSWORD environment variable; it will be required below for the web login.
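
One way to handle the password is to generate it once, keep it in a shell variable, and pass that variable to docker run. The sketch below is only an illustration under stated assumptions: the names make_password, start_rstudio and RSTUDIO_PASSWORD are hypothetical, and the host port is left as an argument so you can pick 80 as above or any other free port.

```shell
# Generate a random 16-character hexadecimal password from /dev/urandom.
# (make_password is a hypothetical helper name.)
make_password() {
  od -An -N8 -tx1 /dev/urandom | tr -d ' \n'
}

# Start the RStudio container in the background.
# $1 = host port to map to container port 8787, $2 = login password.
# (start_rstudio is a hypothetical wrapper around the docker command above.)
start_rstudio() {
  host_port="$1"
  password="$2"
  docker run -d -p "${host_port}:8787" --name rstudio \
    -v "$(pwd)/data:/home/rstudio/data" \
    -e PASSWORD="$password" rocker/tidyverse:3.5
}

# Keep the password in a variable so it can be reused for the web login.
RSTUDIO_PASSWORD="$(make_password)"
echo "RStudio password: $RSTUDIO_PASSWORD"

# Not run here because it needs Docker:
#   start_rstudio 80 "$RSTUDIO_PASSWORD"
```

Keeping the docker call inside a function means the password never has to be typed literally on the command line.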

You just need to open a web browser and point it to localhost if you are running Docker on your machine, or <Your VM's IP Address> if you are running on a cloud service.

You should see a prompt for credentials: the user defaults to rstudio, and the password is the one you set when starting the container.

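Now you can run the same analysis from the RStudio console. Only the data directory is mounted in the container, so first make sure readings-06.R exists in the RStudio home directory (for instance by pasting its contents into a new file):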

> system("Rscript readings-06.R --mean data/inflammation-*.csv")

Once you’re done, stop the container with:

$ docker stop rstudio

Running a scripted R workflow on HPC with Shifter

We can run the same analysis on HPC from the command line using Shifter.

To get started let’s pull the required R container image:

$ module load shifter
$ sg $PAWSEY_PROJECT -c 'shifter pull rocker/tidyverse:3.5'

Now let’s change directory to either $MYSCRATCH or $MYGROUP, e.g.

$ cd $MYSCRATCH

Let’s create a dedicated directory and download the sample data:

$ mkdir r_example
$ cd r_example
$ wget http://swcarpentry.github.io/r-novice-inflammation/data/r-novice-inflammation-data.zip
$ unzip -q r-novice-inflammation-data.zip

With your favourite text editor, create the R file readings-06.R (see contents above), and then create a SLURM script, which we’ll call rscript.sh (remember to specify your Pawsey project ID in the script!):

#!/bin/bash -l

#SBATCH --account=<your-pawsey-project>
#SBATCH --partition=workq
#SBATCH --ntasks=1
#SBATCH --time=00:05:00
#SBATCH --export=NONE
#SBATCH --job-name=rstudio

module load shifter

# run R script
srun --export=all shifter run rocker/tidyverse:3.5 Rscript readings-06.R --mean data/inflammation-*.csv

Let’s submit the script via SLURM:

$ sbatch --reservation <your-pawsey-reservation> rscript.sh
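
After submission, sbatch prints a line of the form "Submitted batch job <id>", and by default SLURM writes the job's standard output to slurm-<id>.out in the submission directory. The sketch below shows one way to capture the job ID and locate that file. The helper names job_id_from_sbatch and slurm_output_file are hypothetical, and the job ID used here is made up so the snippet can run without a scheduler.

```shell
# Extract the numeric job ID from sbatch's "Submitted batch job <id>" line.
# (job_id_from_sbatch is a hypothetical helper name.)
job_id_from_sbatch() {
  awk '/^Submitted batch job/ { print $4 }'
}

# Build the default SLURM output file name for a given job ID.
# (slurm_output_file is a hypothetical helper name.)
slurm_output_file() {
  echo "slurm-$1.out"
}

# Illustration with a made-up job ID. On the cluster you would pipe sbatch itself:
#   job_id=$(sbatch --reservation <your-pawsey-reservation> rscript.sh | job_id_from_sbatch)
job_id=$(echo "Submitted batch job 123456" | job_id_from_sbatch)
echo "Results will be written to $(slurm_output_file "$job_id")"
```

While the job is queued or running, `squeue -j "$job_id"` shows its state; once it has finished, the mean values computed by the R script should appear in the output file.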

Key Points

  • Containers are a great way to manage R workflows. You will likely still want a local installation of R/RStudio for some testing, but if you have established workflows, you can use containers to manage them. You can also provide RStudio servers for collaborators.