Streamline the user experience: bash wrappers, modules and SHPC

Overview

Teaching: 15 min
Exercises: 15 min
Questions
  • Can we standardise the use of containers, to simplify the required syntax?

Objectives
  • Simplify container usage by means of bash wrappers

  • Discuss how to deploy containers and their wrappers using modules

  • Discuss how to deploy container modules using SHPC

Running containers using the host MPI requires setting a number of environment variables, on top of the standard syntax needed to run the container itself. There are several ways to simplify these commands: bash wrappers, modules, and SHPC, which provides a framework combining the first two methods.

Now, let’s think about the typical usage of a containerised application.
As a practical example, let’s grab the lolcow container used in earlier episodes. Once the container image is available on the local disk, in the vast majority of cases you’ll use it to execute some command in this way:

singularity exec ./lolcow.sif <CMD> <ARGS>

As a plain, useful example, let’s suppose we want to get the help output from the cowsay command:

$ singularity exec ./lolcow.sif cowsay -h

We can break this into logical parts; let’s write a script called cowsay.1 for convenience:

#!/bin/bash

# point to the image directory
# and the name of the container respectively
image_dir="."
image_name="lolcow.sif"

# define the command
cmd="cowsay"

# and grab all arguments passed to the script via the command line,
# stored as an array so that arguments containing spaces stay intact
args=("$@")

# call singularity
singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"

Shell variables express tool- and command-specific information, such as the image location image_dir and name image_name. The command name, cmd, changes from command to command. How about the value we assigned to the command arguments variable, args? Well, that’s bash syntax: when you execute this script, bash sets $@ to the full list of arguments that you append to the script on the command line.
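The behaviour of $@ can be checked without any container. The following is a minimal sketch, not part of the tutorial material: count_args is a hypothetical helper, and set -- is used only to simulate the arguments a wrapper would receive. It compares a quoted "$@", a plain string built from it, and a bash array.

```bash
#!/bin/bash
# count_args is a hypothetical helper: it prints how many arguments it received.
count_args() { echo "$#"; }

# Simulate calling a wrapper as: ./cowsay.1 -e "o o" "hello world"
set -- -e "o o" "hello world"

# Quoted "$@" expands to one word per original argument.
preserved="$(count_args "$@")"

# Storing "$@" in a plain string joins the arguments with spaces;
# expanding that string unquoted splits it again on every space.
flat="$@"
resplit="$(count_args $flat)"

# An array keeps the original argument boundaries.
args=("$@")
from_array="$(count_args "${args[@]}")"

echo "quoted \$@: $preserved, flat string: $resplit, array: $from_array"
```

For flags such as -h the three forms behave identically; the difference only shows up once an argument contains whitespace, for instance a multi-word message.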

To see this in practice, make the cowsay.1 script executable (using chmod) and run it with the -h argument:

$ chmod +x cowsay.1
$ ./cowsay.1 -h
cow{say,think} version 3.03, (c) 1999 Tony Monroe
Usage: cowsay [-bdgpstwy] [-h] [-e eyes] [-f cowfile]
          [-l] [-n] [-T tongue] [-W wrapcolumn] [message]

The generality of the wrapper script means that to write a script for the command lolcat, we need only change the line that defines cmd:

#!/bin/bash

image_dir="."
image_name="lolcow.sif"

cmd="lolcat"

args=("$@")

singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"

Going back to the output of ./cowsay.1 -h above, you can see that the cowsay command actually received the -h flag, and this was thanks to the use of $@ in the script.

So to summarise this section, we’ve written a simple bash script that wraps around the singularity exec syntax, so that to run cowsay from a container you simply type:

$ ./cowsay.1 <ARGS>

Why the .1 extension? Well, this is just because the story is not over…

A (quite) general bash wrapper for containerised applications

In the first iteration of a bash wrapper for containerised commands, we need to provide 3 pieces of information in the script: image location, image name and command name. Can we further simplify and generalise this?

Yes. With a couple of extra bash commands and assumptions, we can make it so that the only required information will be the container image name.

First, let’s get rid of the command name. Let’s assume that we’re calling the wrapper with the same name as the command we want it to execute. Then, we’re going to use the bash variable $0: inside a script, it contains the path used to invoke the script itself. We’re also using the command basename, which extracts a file or directory name out of its full path. The cmd variable becomes:

cmd="$(basename "$0")"

Now let’s generalise the image location. Let’s assume that we’re storing the wrappers in the same directory where the image is located. Then, we can use the command dirname to extract the location of a file or directory out of its full path. The image_dir variable becomes:

image_dir="$(dirname "$0")"
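To make the two commands concrete, here is a small sketch applying basename and dirname to a fixed path. The path stored in script_path is a made-up example standing in for $0; it is not a location used by the tutorial.

```bash
#!/bin/bash
# Hypothetical value of $0 for a wrapper called by its full path.
script_path="/opt/apps/lolcow/bin/cowsay"

# basename keeps only the last path component: the command name.
cmd="$(basename "$script_path")"

# dirname keeps everything before it: the directory holding wrapper and image.
image_dir="$(dirname "$script_path")"

echo "cmd=$cmd"
echo "image_dir=$image_dir"
```

Neither command checks that the path exists; both operate purely on the string they are given.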

So we can now have a general bash wrapper for the commands from the container image lolcow.sif:

#!/bin/bash

image_dir="$(dirname "$0")"
image_name="lolcow.sif"

cmd="$(basename "$0")"

args=("$@")

singularity exec "$image_dir/$image_name" "$cmd" "${args[@]}"

To create a wrapper for cowsay, all we have to do is create a script named cowsay with that content. Then, we can do the same for any other command, such as fortune, lolcat, and so on. In fact, we need not even create a separate file for each command. Instead, we can create a single script, e.g. named .lolcow_commands.sh, and then create appropriately named symbolic links for the commands, for instance:

$ ln -s .lolcow_commands.sh cowsay
$ ln -s .lolcow_commands.sh lolcat
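The symbolic link trick relies on $0 holding the name the script was called with, not the name of the file the link points to. The sketch below checks this in a scratch directory, without needing Singularity: the script .demo_commands.sh is a hypothetical stand-in that only prints the command line it would run.

```bash
#!/bin/bash
# Scratch directory for the demo (hypothetical location, created on the fly).
demo_dir="$(mktemp -d)"

# Stand-in for the general wrapper: it reports the command name derived from $0.
cat > "$demo_dir/.demo_commands.sh" << 'EOF'
#!/bin/bash
echo "would run: $(basename "$0") $*"
EOF
chmod +x "$demo_dir/.demo_commands.sh"

# One symbolic link per command, all pointing at the same script.
ln -s .demo_commands.sh "$demo_dir/cowsay"
ln -s .demo_commands.sh "$demo_dir/lolcat"

# Each link reports its own name, because $0 is the path used for the call.
cowsay_out="$("$demo_dir/cowsay" -h)"
lolcat_out="$("$demo_dir/lolcat" --help)"
echo "$cowsay_out"
echo "$lolcat_out"

rm -rf "$demo_dir"
```

Because the links live next to the script, dirname applied to $0 still points at the directory that would hold the image.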

How general is this approach?

Well, quite general probably. It can be used every time you would use containers with this Singularity syntax:

singularity exec <IMAGE> <CMD> <ARGS>

This will also work with MPI containers and Slurm, as the corresponding syntax simply prepends the launcher to that same form:

mpirun -n <NTASKS> singularity exec <IMAGE> <CMD> <ARGS>
srun -n <NTASKS> singularity exec <IMAGE> <CMD> <ARGS>

With the wrappers in place, these become:

mpirun -n <NTASKS> <CMD> <ARGS>
srun -n <NTASKS> <CMD> <ARGS>

Of course, there are some corner cases.

How to handle a dynamic set of bind mounts

Specifying the paths to be bind mounted as additional flags in the wrappers is neither general nor portable. Instead, use $SINGULARITY_BINDPATH to define the full set of required paths prior to execution of the application.
If you have a standard setup on your system, where all the data go under the same parent directory (e.g. /data), you might even want to define the variable in the startup scripts (~/.sourcerc,…). This can be good practice: it simplifies your production environment and makes it more robust.
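As a sketch of what this can look like, the snippet below builds a comma-separated SINGULARITY_BINDPATH from a list of directories and exports it. The two directories are illustrative assumptions; replace them with the paths that exist on your system.

```bash
#!/bin/bash
# Directories to make visible inside every container (illustrative assumptions).
bind_paths=("/data" "/scratch/project")

# Join the list with commas, the separator Singularity expects in this variable.
SINGULARITY_BINDPATH="$(IFS=,; echo "${bind_paths[*]}")"
export SINGULARITY_BINDPATH

echo "$SINGULARITY_BINDPATH"
```

With the variable exported before the wrappers are called, the wrappers themselves stay free of site-specific paths.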

The singularity module provided on Pawsey HPC systems adds /group and /scratch to the bind path, so you don’t have to worry about bind mounting data directories at all.

Using modules to handle bash wrappers

So far in this episode, we’ve devised a scenario to deploy a containerised application in a streamlined way:

  1. define the container image you need;
  2. pull it in a directory;
  3. in that same directory, create bash wrappers for the commands you need to execute from that container.

If you’re on a system with lots of other applications, you might want to tidy up the environment by using modules. Here, we’re using the Environment Modules implementation; an alternative is the Lmod module system. This tutorial provides Linux template installation scripts for both: see Environment Modules script and Lmod script.
Note that discussing modules in detail is out of scope here; we’re just using them to show how to organise containerised applications.

All relevant bash wrapper scripts for our containerised application are in a single location. To run this example, there’s already a directory made ready in your current work directory, $TUTO/demos/wrap_lolcow, namely apps/lolcow/1.0.0/bin. It contains four bash wrappers:

$ ls apps/lolcow/1.0.0/bin
cowsay     fortune     lolcat     lolcow

To get ready for this example, let us also pull the lolcow image there:

$ singularity pull --dir apps/lolcow/1.0.0/bin <IMAGE_URI>

Now, we can think of a minimal modulefile to set up lolcow in our environment:

#%Module1.0######################################################################
##
## lolcow modulefile
##
proc ModulesHelp { } {
    puts stderr "\tModule for lolcow version 1.0.0\n"
    puts stderr "\tThis module uses the container image ???"
}

module-whatis   "edits the PATH to use the lolcow commands, version 1.0.0"

prepend-path     PATH            $env(TUTO)/demos/wrap_lolcow/apps/lolcow/1.0.0/bin

In general, the string associated with PATH will need to be customised case by case, just like the help and whatis strings.
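In shell terms, the prepend-path line amounts to putting the wrapper directory at the front of $PATH. The sketch below reproduces that effect by hand in a scratch directory; the directory layout and the stand-in cowsay script are assumptions made for the demo, and no module system is involved.

```bash
#!/bin/bash
# Hypothetical install location, created in a scratch directory for this demo.
prefix="$(mktemp -d)"
bin_dir="$prefix/apps/lolcow/1.0.0/bin"
mkdir -p "$bin_dir"

# Stand-in for a real wrapper, so that the lookup has something to find.
printf '#!/bin/bash\necho "wrapper called"\n' > "$bin_dir/cowsay"
chmod +x "$bin_dir/cowsay"

# Shell equivalent of: prepend-path PATH <bin_dir>
export PATH="$bin_dir:$PATH"

# The wrapper directory is now searched first.
resolved="$(command -v cowsay)"
echo "$resolved"
```

The advantage of a modulefile over a manual export is that unloading the module undoes the change to PATH.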

A copy of this modulefile is under modulefiles/ in the current path.

Let’s try it! First we need to tell modules to look for modules in this directory:

$ module use $(pwd)/modulefiles
$ module avail
------------------------------ /somewhere/demos/wrap_lolcow/modulefiles ------------------------------------
lolcow/1.0.0  

------------------------------ /usr/share/modules/modulefiles ---------------------------------------------
dot  module-git  module-info  modules  null  use.own  

It’s there! Let’s load it:

$ module load lolcow/1.0.0

Can we now see the wrappers in there?

$ which cowsay
/somewhere/demos/wrap_lolcow/apps/lolcow/1.0.0/bin/cowsay

Sure! Let’s test it with the usual -h flag:

$ cowsay -h
cow{say,think} version 3.03, (c) 1999 Tony Monroe
Usage: cowsay [-bdgpstwy] [-h] [-e eyes] [-f cowfile]
          [-l] [-n] [-T tongue] [-W wrapcolumn] [message]

Containerised application with wrappers and modules: the experience looks like a traditional installation!

Latest: SHPC, a tool to the rescue for container modules

Singularity Registry HPC, or SHPC for short, is an extremely interesting project by some of the original creators of Singularity.
This utility enables the automatic deployment of so-called Container Modules, using either Lmod or Environment Modules to provide access to wrappers like the ones we have just presented in this episode.

This ever-growing repository of containerised applications already provides a number of Bioinformatics packages, which are typically run within containers. As an example, let’s see how we can install BLAST using SHPC. First, let’s look for available BLAST versions with shpc show:

$ shpc show --versions -f blast
quay.io/biocontainers/blast:2.10.1--pl526he19e7b1_3
quay.io/biocontainers/blast:2.11.0--pl5262h3289130_1
quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0
ncbi/blast:2.11.0
ncbi/blast:2.12.0
ncbi/blast:latest

And now let’s install the latest BLAST biocontainer (copy-pasting the image and tag from the output above) with shpc install:

$ shpc install quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0
singularity pull --name /home/ubuntu/singularity-hpc/containers/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/quay.io-biocontainers-blast-2.12.0--pl5262h3289130_0-sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18.sif docker://quay.io/biocontainers/blast@sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18
INFO:    Converting OCI blobs to SIF format
INFO:    Starting build...
[..]
INFO:    Creating SIF file...
/home/ubuntu/singularity-hpc/containers/quay.io/biocontainers/blast/2.12.0--pl5262h3289130_0/quay.io-biocontainers-blast-2.12.0--pl5262h3289130_0-sha256:a7eb056f5ca6a32551bf9f87b6b15acc45598cfef39bffdd672f59da3847cd18.sif
Module quay.io/biocontainers/blast:2.12.0--pl5262h3289130_0 was created.

That’s it! We now have a BLAST module that provides all the BLAST applications.

Final thoughts on using wrappers

So, we’ve shown you how to effectively hide containers under the hood to provide a simplified user experience, while gaining in reproducibility, portability, productivity and more.

Why bother learning the longer story of the Singularity syntax then? Well, containers are a powerful technology, but also a complex one.
Even if you’re going to use them through a friendlier interface, it’s still crucial to know how things work underneath, to be aware of the corresponding limitations, and possibly also to be able to fix the setup when things go wrong.

Key Points

  • It is possible to devise a fairly general wrapper template for containerised applications

  • The key information needed to set up the wrappers is the container image, and the commands one needs to run from that image

  • It is possible to write a minimal modulefile that sets up the shell environment to use containerised applications through wrappers

  • SHPC uses bash functions and automates the process of creating container modules