Session 2: setting up graphical applications
Overview
Teaching: 10 min
Exercises: 50 minQuestions
Objectives
Goals
In this session, we’re having a look at examples of how to setup and run containerised applications that feature a graphical user interface (GUI).
In particular we’ll cover:
- an RStudio web server (with a phylogenetic tree package)
- if time allows, a Jupyter Notebook web server (with a visualisation library for interaction networks)
- an X11 windowed application (IGV, the Interactive Genome Viewer by Broad Institute)
For the first two exercises, we’re going to review the following Singularity syntax:
singularity exec
to run commands from containers;- some runtime flags for
exec
, namely-B
and-C
; - shell variables for Singularity, i.e. prepended with
SINGULARITYENV_
.
The RStudio example follows the webinar episode GUI enabled web applications: RStudio in a container. Here, however, we’ll go in greater detail, and you’ll proceed step by step yourself.
With the third exercise, we will discuss what are the requirements to run an X11 application in containers, and go through a small case together.
GUIDED - Run a containerised R analysis from command line
Let’s cd in the appropriate directory:
$ cd /data/containers-bioinformatics-workshop
$ cd exercises/graphical
Now, suppose we need to set up an analysis pipeline with R. We will need to run it on different systems, both via command line interface (CLI) and RStudio, and even share the protocol with other collaborators. Hence using a container can be a great choice to improve portability and reproducibility.
The R script we need is at home_rstudio/script.r
. It creates a graphical representation of a very simple phylogenetic tree. This is based on the example on Chapter 5 of the online book by Yu Lab.
The required toolset consists of the tidyverse
collection plus the ggtree
package (this setup is used in the session on building as well).
We’ve already made the image available on Docker Hub: marcodelapierre/ggtree:2.0.4
. It’s based on the image rocker/tidyverse:3.6.1
by Rocker. Let’s pull it:
$ singularity pull docker://marcodelapierre/ggtree:2.0.4
You might know that every R installation ships with a utility to run scripts from the command line, Rscript
. In this case we can use it as: Rscript home_rstudio/script.r
.
Let’s run the R script through CLI
Solution
$ singularity exec ggtree_2.0.4.sif Rscript home_rstudio/script.r
We’ll get an image file as an output,
tree.png
:$ ls -lh tree.png
-rw-r--r-- 1 ubuntu ubuntu 96K Jun 9 05:52 tree.png
To save time, I can show you the content of this file from my screen.
ROOM - Spawn an interactive RStudio web session
All right, now it’s your turn: I want you to work out the setup to run a containerised Rstudio session.
Actually, you already saw the solution during one of the webinars, so getting it right is not the point.
Instead, the main point is to stop at each step of its construction, to understand why it’s needed. You’ll see a mix of arguments, relative both to the general containerised setup and to the specific way the Rocker container you’re using is setup.
At the end, you’ll also be able to re-run the phylogenetic tree script you executed above, this time through RStudio.
Hints
- you’re using the same container as above:
ggtree_2.0.4.sif
; - the command to start an Rstudio server is
rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
; - you’ll embed it in a
singularity exec
command; - you’ll add flags to this command (see steps below);
- you need to have ready the IP address of your Nimbus VM;
- to monitor the outcomes of your setup, you’ll use your browser to navigate to the web address
http://<YOUR NIMBUS VM'S IP>:8787
.
On the
rserver
command syntaxWhy are we running the following?
rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
--www-port 8787
sets the communication port the Rstudio server will listen at for incoming connections;8787
is a common choice for Rstudio;--www-address 0.0.0.0
is to let connections come from all IP addresses on the host machine, for instance not only local ones but also ones coming through the internet (this will be our case of usage with Nimbus);--auth-none=0
is to keep password authentication enabled, a wise security choice;--auth-pam-helper-path=pam-helper
is to use a specific RStudio utility for authentication, that allows to define the password as an environment variable prior to starting the server.
First attempt: plain
To start, run the command above from within the
ggtree
container.Solution
$ singularity exec ggtree_2.0.4.sif \ rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
Now go to the browser and open the web address
http://<YOUR NIMBUS VM'S IP>:8787
. What happens?Comments
You’re getting a welcome page, good! The server is up and running.
However, You’re getting prompted for a username and password.
You can get your username from the terminal withecho $USER
, or in alternative withwhoami
. In Nimbus Ubuntu VMs by default it’subuntu
.However, you’re still stuck as you have no password to enter.
Now kill the server from the terminal by hitting
Ctrl-C
.
Second attempt: setup access credentials
This modification is required by the way RStudio works: if we want authentication on the web server, we need to define a
PASSWORD
variable in the shell environment.Define the
PASSWORD
variable, then re-run thesingularity exec
command.Hints:
- define it as
SINGULARITYENV_PASSWORD
, this is a good practice to ensure it’s always passed in the container, even if we’re enabling further container isolation;- define it by prepending
export
, as the variable needs to be available tosingularity
, which is a sub-process of the active shell;- for completeness, also define the
USER
name withSINGULARITYENV_USER
in the same way, again to ensure this is defined also if the container has tighter isolation.Solution
$ export SINGULARITYENV_PASSWORD="<Pick-Your-Password>" $ export SINGULARITYENV_USER="$USER" $ singularity exec ggtree_2.0.4.sif \ rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
Re-load the page in the web-browser.
Now you now what credentials to use for access.What happens?
Comments
You will get to the next screen, however an error message is popping up: Error occurred during transmission. If you click on OK you’ll get a blank screen.
If you check the terminal where you launched the server from, there’s no error message. Not really helpful (such blank terminal output is a shortcoming of RStudio).
Kill the session again with
Ctrl-C
.
Third attempt: a writable location
What is happening is a consequence of the nature of Singularity containers: they’re read only.
As many other packages, RStudio needs to write initialisation and configuration files in the user’s home at runtime, which it cannot do in a read-only Singularity container.To work around this, you can bind mount a host directory to act as the user’s home in the container, using for instance the
-B
flag.Before you proceed, there’s another matter to discuss, this time related to the way the
rocker
containers are setup. These containers have a special user predefined in the Dockerfile, namedrstudio
and with user ID1000
.
Get your user ID in the host, using the commandid -u
. In Nimbus VMs by default it’s a1000
.
Now, because of this possible clash between user ID in therocker
containers and the one in the host, what happens is that the path of theHOME
directory in the RStudio server will have a different value depending on the host user ID. It will behome/rstudio
if the host ID is1000
, and will instead coincide with the host$HOME
otherwise (for instance, Zeus users at Pawsey fall in this latter case).So, in the case of this workshop setup, you will need the container home to be
/home/rstudio
.
There’s a directory ready to be used as fake home for RStudio in the current directory,home_rstudio
. So you can go with-B home_rstudio:/home/rstudio
.Now go ahead with this bind mounting!
Solution
$ export SINGULARITYENV_PASSWORD="<Pick-Your-Password>" $ export SINGULARITYENV_USER="$USER" $ singularity exec \ -B home_rstudio:/home/rstudio \ ggtree_2.0.4.sif \ rserver --www-port 8787 --www-address 0.0.0.0 --auth-none=0 --auth-pam-helper-path=pam-helper
You don’t need to redefine the shell variables, they’re just here for the sake of good documentation.
And now for another go with the web server on the browser.
Comments
This time it seems to have worked.
You are getting the RStudio interface… you have just spawned a working RStudio server!
As a final note, if you don’t want to bother about the two possibilities we discussed above for the container
HOME
path, you can actually go for a double mount of the host directory. Singularity will let you do it and work correctly:-B home_rstudio:/home/rstudio -B home_rstudio:$HOME
.
In the bottom right box, you can see there’s the analysis file script.r
.
You can execute it by typing the following in the console on the left:
> source('script.r')
…and after a few seconds, the phylogenetic tree is visualised on the right!
When you’re done, click on the Power Off button at the top right of the RStudio interface. You can either save the workspace data or not. They will be saved in the fake home in the host, and remain available for future sessions if you need them.
Then go back to the terminal and hit Ctrl-C
.
You might remember that in the webinar covering this topic we also discussed container isolation, in order to better clean up the RStudio session upon exit. In the interest of time, you’re not going through this aspect in this example. You’ll get a taste in the next example, though.
ROOM - Bonus - Launch a Jupyter Notebook session
Go through this only if time allows.
Another quite powerful and general-purpose graphical platform is Jupyter. It provides a friendly interface to execute workflows in several languages, the most common being Python and R. Notably, these days a lot of Python visualisation libraries are being developed specifically for Jupyter, so that the only way to use them is on this platform.
In this example you need to visualise a network of gene correlations by using the package ipycytoscape
. There’s a notebook ready for use at home_jupyter/TipGeneExample.ipynb
. This example notebook comes from the Github page of the package.
Note we’ve already made the image available on Docker Hub: marcodelapierre/ipycytoscape:0.2.2
. It is based on the image jupyter/datascience-notebook:latest
by Jupyter Stacks.
Similar to the RStudio example, you’ll go through steps to work out the execution syntax for a containerised Jupyter Notebook.
First, download the image:
$ singularity pull docker://marcodelapierre/ipycytoscape:0.2.2
First: plain run
The standard command to start a Jupyter Notebook is:
jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME
Here,
--no-browser
disables the creation of a window in the machine where the server is launched (which would be useless, as you’ll be connecting remotely using the web browser).
The flag--notebook-dir
is handy to set the working directory.Now, try and execute that command from the container.
Solution
$ singularity exec \ ipycytoscape_0.2.2.sif \ jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME
What happens?
Comments
You’re getting lots of output and an error (well, at least Jupyter is verbose and tells you what’s going on):
[C 08:39:47.917 NotebookApp] Bad config encountered during initialization: [C 08:39:47.917 NotebookApp] No such notebook dir: ''/home/ubuntu''
This seems quite clear: you need to mount a writable home directory!
Second: home directory
Note that in the Jupyter Stacks container images, you’re known internally as the user
jovyan
. However, theHOME
directory with Singularity has the same path than the host.Use the
-B
flag to mounthome_jupyter
as$HOME
.Solution
$ singularity exec \ -B home_jupyter:$HOME \ ipycytoscape_0.2.2.sif \ jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME
What have you got?
Comments
Well, yet another error message:
OSError: [Errno 30] Read-only file system: '/run/user'
The
/run
directory is a system directory used by some processes to store runtime files.
Third and last: container isolation
Files written in the
/run
directory are volatile, you don’t really need them.
A simple workaround is to use the container isolation variable-C
, or--containall
. This option isolates the container from the host, including the use of volatile/run
and/tmp
directories instead of the host ones, to better clean up the session upon exit. In this way no entry will be written in the host/run
.Try and add the
-C
flag to the runtime:Solution
$ singularity exec \ -C \ -B home_jupyter:$HOME \ ipycytoscape_0.2.2.sif \ jupyter notebook --no-browser --port=8888 --ip 0.0.0.0 --notebook-dir=$HOME
Comments
It has worked!
The terminal is waiting after having produced some relevant output:
[..] [I 08:47:16.450 NotebookApp] Use Control-C to stop this server and shut down all kernels (twice to skip confirmation). [C 08:47:16.454 NotebookApp] To access the notebook, open this file in a browser: file:///home/ubuntu/.local/share/jupyter/runtime/nbserver-16-open.html Or copy and paste one of these URLs: http://nimbus1:8888/?token=bde09af92ccaa4af48de43693ac3371b3fe9637f3a32c0b4 or http://127.0.0.1:8888/?token=bde09af92ccaa4af48de43693ac3371b3fe9637f3a32c0b4
Notice the long string after the keyword token=
in the last line. Select it and copy it in your clipboard.
Now, open your browser, and go to http://<YOUR NIMBUS VM'S IP>:8888
.
You’re asked for a token. Paste it from your clipboard.
Click on the notebook TipGeneExample.ipynb
.
Then Click on the Cell menu, and then on the item Run all.
Scroll up a bit… here is your graph for the gene network!
When you’re done, click on Logout on the top right of the page.
Then go back to the terminal, hit Ctrl-C
, enter y
, and hit Enter
to shutdown the server.
ROOM - Questions for Reflection
These are a set of questions designed to kick off the discussion within your breakout room:
- Look back at what you went through in the previous exercises. What was the most interesting learning outcome? Which was the hardest bit to digest?
- Do you use RStudio, Jupyter or other graphical web platforms?
- Do you see advantages in terms of reproducibility, portability and collaboration, that a containerised approach could bring in using this type of platforms?
- If you use RStudio or Jupyter, think about how many different packages and package dependencies you normally need. Do you think that a containerised approach might help in better managing large Python/R software stacks?
- In your analysis workflows, do you use any other application with a desktop window interface? (this is for the upcoming exercise…)
GUIDED - Launch a containerised X11 application
Now we’re going to see something new compared to the webinars.
A good number of Linux scientific packages provide a graphical interface by means of desktop windows. The X Window System, or X11, is a quite common framework to achieve this. It is cross-platform (Linux, macOS and Windows) and can handle windows both locally and from a remote server.
In this exercise we’re going to launch the genomics viewer IGV, by Broad Institute. We’ll use it to visualise a mutation in a human genome.
X11 requirements
-
Our local machine needs an X11 system. Linux boxes should all be good to go. On macOS we’ll need XQuartz; on Windows we’ll need MobaXterm or Cygwin/X.
-
If we’re running our application by remotely connecting to a Linux machine, that machine will need an X11 setup, too.
-
We’ll also need to enable X11 forwarding when establishing the SSH connection to that machine. This is why in the setup for this workshop we used
ssh
with the-X
flag (-Y
from macOS). -
There is one small requirement for the image that packages the X11 enabled application. It’s a utility called
xauth
, used to authenticate the X client in the container with the X server of the local machine. To this end, the Dockerfile (or def file) for the image needs to contain anapt
command (Ubuntu/Debian) to installxauth
. E.g. in a Dockerfile:[..] RUN apt-get install -y xauth [..]
To be precise, this package is required when running containers with Docker, not Singularity. It’s a good practice to embed it in order to enforce compatibility across different container engines.
-
One last thing to know before we run is that we need to bind mount an Xauthority file in the container. This file contains a secret string, or magic cookie, that is used by the X client to authenticate with the X server.
Usually this file is located in the user’s home directory, so the mounting will look like:
-B ~/.Xauthority
Less often, a dedicated location is used for this file which is contained in the
$XAUTHORITY
variable. In this case, this will do the job:-B $XAUTHORITY
.
Running the X11 container
We’ve built a container image for IGV ourselves, to ensure xauth
is included (for the curious, here is the Dockerfile). The image is already publicly available for download, so let’s pull it:
$ singularity pull docker://marcodelapierre/igv:2.8.3
Similar to the RStudio and Jupyter examples above, IGV will try to write config files in the home. Let’s bind mount an host directory to this end, in addition to ~/.Xauthority
.
Note how in this case the order of bind mounts matters: you need to mount the fake home, and then the ~/.Xauthority
, which lies in your host home. If you go the other way round, the fake home will hide that single file, and you’ll get an error like:
Can't connect to X11 window server using 'nimbus1:10.0' as the value of the DISPLAY variable.
This is the same error we would get if we forgot to mount the authority file, or if the local/remote X11 setup were incomplete.
And now let’s go!
$ singularity exec -B home_igv:$HOME -B ~/.Xauthority igv_2.8.3.sif igv
We will get a bunch of output lines in the terminal, and then… a window should open on your desktop!
We’ve prepared a batch script to open the required files and locate the mutation along the genome.
Click on the Tools menu, then Run Batch Script, select batch.igv
and then click Open.
That’s it!
When you’re done, click on File, then Exit.
Key Points