Session 2: setting up graphical applications

Overview

Teaching: 10 min
Exercises: 50 min
Questions
Objectives

Goals

In this session, we’re having a look at examples of how to setup and run containerised applications that feature a graphical user interface (GUI).

In particular we’ll cover:

For the first two exercises, we’re going to review the following Singularity syntax:

The RStudio example follows the webinar episode GUI enabled web applications: RStudio in a container. Here, however, we’ll go in greater detail, and you’ll proceed step by step yourself.

With the third exercise, we will discuss what are the requirements to run an X11 application in containers, and go through a small case together.

GUIDED - Run a containerised R analysis from command line

Let’s cd in the appropriate directory:

$ cd /data/containers-bioinformatics-workshop
$ cd exercises/graphical

Now, suppose we need to set up an analysis pipeline with R. We will need to run it on different systems, both via command line interface (CLI) and RStudio, and even share the protocol with other collaborators. Hence using a container can be a great choice to improve portability and reproducibility.

The R script we need is at home_rstudio/script.r. It creates a graphical representation of a very simple phylogenetic tree. This is based on the example on Chapter 5 of the online book by Yu Lab.
The required toolset consists of the tidyverse collection plus the ggtree package (this setup is used in the session on building as well).

We’ve already made the image available on Docker Hub: marcodelapierre/ggtree:2.0.4. It’s based on the image rocker/tidyverse:3.6.1 by Rocker. Let’s pull it:

$ singularity pull docker://marcodelapierre/ggtree:2.0.4

You might know that every R installation ships with a utility to run scripts from the command line, Rscript. In this case we can use it as: Rscript home_rstudio/script.r.

Let’s run the R script through CLI

Solution

$ singularity exec ggtree_2.0.4.sif Rscript home_rstudio/script.r

We’ll get an image file as an output, tree.png:

$ ls -lh tree.png
-rw-r--r-- 1 ubuntu ubuntu  96K Jun  9 05:52 tree.png

To save time, I can show you the content of this file from my screen.

ROOM - Spawn an interactive RStudio web session

All right, now it’s your turn: I want you to work out the setup to run a containerised Rstudio session.

Actually, you already saw the solution during one of the webinars, so getting it right is not the point.
Instead, the main point is to stop at each step of its construction, to understand why it’s needed. You’ll see a mix of arguments, relative both to the general containerised setup and to the specific way the Rocker container you’re using is setup.

At the end, you’ll also be able to re-run the phylogenetic tree script you executed above, this time through RStudio.

Hints

On the rserver command syntax

Why are we running the following?

rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
  • --www-port 8787 sets the communication port the Rstudio server will listen at for incoming connections; 8787 is a common choice for Rstudio;
  • --www-address 0.0.0.0 is to let connections come from all IP addresses on the host machine, for instance not only local ones but also ones coming through the internet (this will be our case of usage with Nimbus);
  • --auth-none=0 is to keep password authentication enabled, a wise security choice;
  • --auth-pam-helper-path=pam-helper is to use a specific RStudio utility for authentication, that allows to define the password as an environment variable prior to starting the server.

First attempt: plain

To start, run the command above from within the ggtree container.

Solution

$ singularity exec ggtree_2.0.4.sif \
    rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper

Now go to the browser and open the web address http://<YOUR NIMBUS VM'S IP>:8787. What happens?

Comments

You’re getting a welcome page, good! The server is up and running.

However, You’re getting prompted for a username and password.
You can get your username from the terminal with echo $USER, or in alternative with whoami. In Nimbus Ubuntu VMs by default it’s ubuntu.

However, you’re still stuck as you have no password to enter.

Now kill the server from the terminal by hitting Ctrl-C.

Second attempt: setup access credentials

This modification is required by the way RStudio works: if we want authentication on the web server, we need to define a PASSWORD variable in the shell environment.

Define the PASSWORD variable, then re-run the singularity exec command.

Hints:

  • define it as SINGULARITYENV_PASSWORD, this is a good practice to ensure it’s always passed in the container, even if we’re enabling further container isolation;
  • define it by prepending export, as the variable needs to be available to singularity, which is a sub-process of the active shell;
  • for completeness, also define the USER name with SINGULARITYENV_USER in the same way, again to ensure this is defined also if the container has tighter isolation.

Solution

$ export SINGULARITYENV_PASSWORD="<Pick-Your-Password>"
$ export SINGULARITYENV_USER="$USER"
$ singularity exec ggtree_2.0.4.sif \
    rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper

Re-load the page in the web-browser.
Now you now what credentials to use for access.

What happens?

Comments

You will get to the next screen, however an error message is popping up: Error occurred during transmission. If you click on OK you’ll get a blank screen.

If you check the terminal where you launched the server from, there’s no error message. Not really helpful (such blank terminal output is a shortcoming of RStudio).

Kill the session again with Ctrl-C.

Third attempt: a writable location

What is happening is a consequence of the nature of Singularity containers: they’re read only.
As many other packages, RStudio needs to write initialisation and configuration files in the user’s home at runtime, which it cannot do in a read-only Singularity container.

To work around this, you can bind mount a host directory to act as the user’s home in the container, using for instance the -B flag.

Before you proceed, there’s another matter to discuss, this time related to the way the rocker containers are setup. These containers have a special user predefined in the Dockerfile, named rstudio and with user ID 1000.
Get your user ID in the host, using the command id -u. In Nimbus VMs by default it’s a 1000.
Now, because of this possible clash between user ID in the rocker containers and the one in the host, what happens is that the path of the HOME directory in the RStudio server will have a different value depending on the host user ID. It will be home/rstudio if the host ID is 1000, and will instead coincide with the host $HOME otherwise (for instance, Zeus users at Pawsey fall in this latter case).

So, in the case of this workshop setup, you will need the container home to be /home/rstudio.
There’s a directory ready to be used as fake home for RStudio in the current directory, home_rstudio. So you can go with -B home_rstudio:/home/rstudio.

Now go ahead with this bind mounting!

Solution

$ export SINGULARITYENV_PASSWORD="<Pick-Your-Password>"
$ export SINGULARITYENV_USER="$USER"
$ singularity exec \
    -B home_rstudio:/home/rstudio \
    ggtree_2.0.4.sif \
    rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper

You don’t need to redefine the shell variables, they’re just here for the sake of good documentation.

And now for another go with the web server on the browser.

Comments

This time it seems to have worked.

You are getting the RStudio interface… you have just spawned a working RStudio server!

As a final note, if you don’t want to bother about the two possibilities we discussed above for the container HOME path, you can actually go for a double mount of the host directory. Singularity will let you do it and work correctly: -B home_rstudio:/home/rstudio -B home_rstudio:$HOME.

In the bottom right box, you can see there’s the analysis file script.r.
You can execute it by typing the following in the console on the left:

> source('script.r')

…and after a few seconds, the phylogenetic tree is visualised on the right!

When you’re done, click on the Power Off button at the top right of the RStudio interface. You can either save the workspace data or not. They will be saved in the fake home in the host, and remain available for future sessions if you need them.
Then go back to the terminal and hit Ctrl-C.

You might remember that in the webinar covering this topic we also discussed container isolation, in order to better clean up the RStudio session upon exit. In the interest of time, you’re not going through this aspect in this example. You’ll get a taste in the next example, though.

ROOM - Bonus - Launch a Jupyter Notebook session

Go through this only if time allows.

Another quite powerful and general-purpose graphical platform is Jupyter. It provides a friendly interface to execute workflows in several languages, the most common being Python and R. Notably, these days a lot of Python visualisation libraries are being developed specifically for Jupyter, so that the only way to use them is on this platform.

In this example you need to visualise a network of gene correlations by using the package ipycytoscape. There’s a notebook ready for use at home_jupyter/TipGeneExample.ipynb. This example notebook comes from the Github page of the package.
Note we’ve already made the image available on Docker Hub: marcodelapierre/ipycytoscape:0.2.2. It is based on the image jupyter/datascience-notebook:latest by Jupyter Stacks.

Similar to the RStudio example, you’ll go through steps to work out the execution syntax for a containerised Jupyter Notebook.

First, download the image:

$ singularity pull docker://marcodelapierre/ipycytoscape:0.2.2

First: plain run

The standard command to start a Jupyter Notebook is:

jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME

Here, --no-browser disables the creation of a window in the machine where the server is launched (which would be useless, as you’ll be connecting remotely using the web browser).
The flag --notebook-dir is handy to set the working directory.

Now, try and execute that command from the container.

Solution

$ singularity exec \
    ipycytoscape_0.2.2.sif \
    jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME

What happens?

Comments

You’re getting lots of output and an error (well, at least Jupyter is verbose and tells you what’s going on):

[C 08:39:47.917 NotebookApp] Bad config encountered during initialization:
[C 08:39:47.917 NotebookApp] No such notebook dir: ''/home/ubuntu''

This seems quite clear: you need to mount a writable home directory!

Second: home directory

Note that in the Jupyter Stacks container images, you’re known internally as the user jovyan. However, the HOME directory with Singularity has the same path than the host.

Use the -B flag to mount home_jupyter as $HOME.

Solution

$ singularity exec \
    -B home_jupyter:$HOME \
    ipycytoscape_0.2.2.sif \
    jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME

What have you got?

Comments

Well, yet another error message:

OSError: [Errno 30] Read-only file system: '/run/user'

The /run directory is a system directory used by some processes to store runtime files.

Third and last: container isolation

Files written in the /run directory are volatile, you don’t really need them.
A simple workaround is to use the container isolation variable -C, or --containall. This option isolates the container from the host, including the use of volatile /run and /tmp directories instead of the host ones, to better clean up the session upon exit. In this way no entry will be written in the host /run.

Try and add the -C flag to the runtime:

Solution

$ singularity exec \
    -C \
    -B home_jupyter:$HOME \
    ipycytoscape_0.2.2.sif \
    jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME

Comments

It has worked!

The terminal is waiting after having produced some relevant output:

[..]

[I 08:47:16.450 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation).
[C 08:47:16.454 NotebookApp] 
    
    To access the notebook, open this file in a browser:
        file:///home/ubuntu/.local/share/jupyter/runtime/nbserver-16-open.html
    Or copy and paste one of these URLs:
        http://nimbus1:8888/?token=bde09af92ccaa4af48de43693ac3371b3fe9637f3a32c0b4
     or http://127.0.0.1:8888/?token=bde09af92ccaa4af48de43693ac3371b3fe9637f3a32c0b4

Notice the long string after the keyword token= in the last line. Select it and copy it in your clipboard.

Now, open your browser, and go to http://<YOUR NIMBUS VM'S IP>:8888.
You’re asked for a token. Paste it from your clipboard.

Click on the notebook TipGeneExample.ipynb.
Then Click on the Cell menu, and then on the item Run all.

Scroll up a bit… here is your graph for the gene network!

When you’re done, click on Logout on the top right of the page.
Then go back to the terminal, hit Ctrl-C, enter y, and hit Enter to shutdown the server.

ROOM - Questions for Reflection

These are a set of questions designed to kick off the discussion within your breakout room:

  • Look back at what you went through in the previous exercises. What was the most interesting learning outcome? Which was the hardest bit to digest?
  • Do you use RStudio, Jupyter or other graphical web platforms?
  • Do you see advantages in terms of reproducibility, portability and collaboration, that a containerised approach could bring in using this type of platforms?
  • If you use RStudio or Jupyter, think about how many different packages and package dependencies you normally need. Do you think that a containerised approach might help in better managing large Python/R software stacks?
  • In your analysis workflows, do you use any other application with a desktop window interface? (this is for the upcoming exercise…)

GUIDED - Launch a containerised X11 application

Now we’re going to see something new compared to the webinars.

A good number of Linux scientific packages provide a graphical interface by means of desktop windows. The X Window System, or X11, is a quite common framework to achieve this. It is cross-platform (Linux, macOS and Windows) and can handle windows both locally and from a remote server.

In this exercise we’re going to launch the genomics viewer IGV, by Broad Institute. We’ll use it to visualise a mutation in a human genome.

X11 requirements

  1. Our local machine needs an X11 system. Linux boxes should all be good to go. On macOS we’ll need XQuartz; on Windows we’ll need MobaXterm or Cygwin/X.

  2. If we’re running our application by remotely connecting to a Linux machine, that machine will need an X11 setup, too.

  3. We’ll also need to enable X11 forwarding when establishing the SSH connection to that machine. This is why in the setup for this workshop we used ssh with the -X flag (-Y from macOS).

  4. There is one small requirement for the image that packages the X11 enabled application. It’s a utility called xauth, used to authenticate the X client in the container with the X server of the local machine. To this end, the Dockerfile (or def file) for the image needs to contain an apt command (Ubuntu/Debian) to install xauth. E.g. in a Dockerfile:

     [..]
        
     RUN apt-get install -y xauth
        
     [..]
    

    To be precise, this package is required when running containers with Docker, not Singularity. It’s a good practice to embed it in order to enforce compatibility across different container engines.

  5. One last thing to know before we run is that we need to bind mount an Xauthority file in the container. This file contains a secret string, or magic cookie, that is used by the X client to authenticate with the X server.

    Usually this file is located in the user’s home directory, so the mounting will look like:

     -B ~/.Xauthority
    

    Less often, a dedicated location is used for this file which is contained in the $XAUTHORITY variable. In this case, this will do the job: -B $XAUTHORITY.

Running the X11 container

We’ve built a container image for IGV ourselves, to ensure xauth is included (for the curious, here is the Dockerfile). The image is already publicly available for download, so let’s pull it:

$ singularity pull docker://marcodelapierre/igv:2.8.3

Similar to the RStudio and Jupyter examples above, IGV will try to write config files in the home. Let’s bind mount an host directory to this end, in addition to ~/.Xauthority. Note how in this case the order of bind mounts matters: you need to mount the fake home, and then the ~/.Xauthority, which lies in your host home. If you go the other way round, the fake home will hide that single file, and you’ll get an error like:

Can't connect to X11 window server using 'nimbus1:10.0' as the value of the DISPLAY variable.

This is the same error we would get if we forgot to mount the authority file, or if the local/remote X11 setup were incomplete.

And now let’s go!

$ singularity exec -B home_igv:$HOME -B ~/.Xauthority igv_2.8.3.sif igv

We will get a bunch of output lines in the terminal, and then… a window should open on your desktop!

We’ve prepared a batch script to open the required files and locate the mutation along the genome.
Click on the Tools menu, then Run Batch Script, select batch.igv and then click Open.

That’s it!

When you’re done, click on File, then Exit.

Key Points