A short introduction to Python containers
Overview
Teaching: 5 min
Exercises: 10 min
Questions
Objectives
Get ready for the session
First of all, we need to download the workshop materials from GitHub. cd
to a directory of your choice, and then:
$ git clone https://github.com/PawseySC/containers-astronomy-workshop.git
$ cd containers-astronomy-workshop/exercises
$ export EXERCISES=$(pwd)
Here we’re defining the variable EXERCISES, pointing to the subdirectory of the repository that contains inputs and scripts for the various examples.
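To double check that the variable is set, you can print its value (the path below is just an example; yours will depend on where you cloned the repository):
$ echo $EXERCISES
/home/ubuntu/containers-astronomy-workshop/exercises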
Trick: get rid of sudo docker
Docker requires administrative rights to be used, so in principle every command requires sudo, as in sudo docker. To save typing, you may want to add your user to the docker user group:
$ sudo usermod -aG docker $USER
then exit the terminal session and open a fresh new one. From now on, you can run docker without sudo. Note that under the hood docker commands will still require admin rights.
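To quickly verify that the new group membership is effective, you may try a simple command such as docker ps (the output below is what you would see on a machine with no running containers):
$ docker ps
CONTAINER ID   IMAGE     COMMAND   CREATED   STATUS    PORTS     NAMES
If you still get a permission denied error, double check that you opened a fresh terminal session after running usermod.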
A principle for containerised applications
When containerising sets of applications, two approaches are possible:
- one container per application, or
- one container for the whole software stack of a workflow
Often a workflow relies on a set of standalone packages, for instance a bunch of C applications in bioinformatics. In this case, we typically advise the first approach, which is simpler to maintain and more modular, in that changes in one application or workflow task do not impact the others.
However, the situation is different when a workflow is built upon a set of packages written in Python (or R): very often these rely on a large set of dependencies, creating a complex dependency tree. Here, building a distinct container for each package would mean building multiple dependency trees, with potential maintenance issues and even runtime inconsistencies, which might manifest when incompatible versions of the same dependency are used by distinct packages.
In this context, then, we recommend creating one single container for the full set of Python (or R) packages required by a given workflow.
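As a minimal sketch of this principle, a Dockerfile for such a workflow might install the whole Python stack on top of a single base image (the package list here is purely illustrative):
FROM python:3.8-slim
RUN pip install --no-cache-dir \
        astropy \
        numpy \
        scipy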
Using Singularity and Docker for containers in HPC
In HPC clusters Docker is not usable for running containers, mostly due to security issues (it requires admin rights to run!). For this reason, Singularity is used instead at runtime.
However, for building containers the Docker image format and the Dockerfile specification are more popular, and more cross-compatible, than their Singularity counterparts. Therefore, we suggest using Docker to build container images; you can do that on your laptop, on a workstation, or on a cloud instance, provided you have admin rights on that machine.
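For instance, a possible end-to-end workflow (the image name myuser/mypytools is hypothetical) is to build and push the image with Docker on a machine where you have admin rights, after a docker login to your registry:
$ docker build -t myuser/mypytools:1.0 .
$ docker push myuser/mypytools:1.0
and then pull it with Singularity on the HPC cluster, obtaining a SIF image file:
$ singularity pull docker://myuser/mypytools:1.0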
Some caveats when running Python containers
1. Clean shell environment
As Python is often used by system services and utilities, it is common in HPC clusters to have Python-related variables defined in the shell environment. These may include PYTHONPATH, PYTHONUSERBASE, PYTHONSTARTUP and others.
Now, Singularity by default favours integration over isolation, and thus passes the host shell environment onto the container. As a result of these two factors, a Python container may show unintended and uncontrolled behaviours.
As an example, try and open a Python console using the docker://python:3.8-slim container image:
$ singularity exec docker://python:3.8-slim python3
[..]
Python 3.8.5 (default, Aug 4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
This run is fine; close the console by typing exit, or pressing Ctrl-D.
But now suppose the host variable PYTHONSTARTUP is defined:
$ export PYTHONSTARTUP="/etc/pythonstart"
$ singularity exec docker://python:3.8-slim python3
Python 3.8.5 (default, Aug 4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
Could not open PYTHONSTARTUP
FileNotFoundError: [Errno 2] No such file or directory: '/etc/pythonstart'
>>>
Now you get a warning, as that path is non-existent in the container.
This is innocuous, but the interference of host Python variables is often a source of errors. To be safe, always run Python containers with singularity exec -e, or --cleanenv, to isolate the container shell environment:
$ export PYTHONSTARTUP="/etc/pythonstart"
$ singularity exec -e docker://python:3.8-slim python3
Python 3.8.5 (default, Aug 4 2020, 16:24:08)
[GCC 8.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
WARNING for MPI applications: it turns out that resource managers, such as Slurm, make heavy use of environment variables to properly spawn MPI jobs. In this case the -e option will break the MPI run. The workaround here is to not use -e, but instead unset any Python-related variables in the session for good. Have a look at this convenient one-liner:
$ unset $( env | grep ^PYTHON | cut -d = -f 1 | xargs )
$ srun singularity exec docker://python:3.8-slim python3 <PYTHON SCRIPT FILE>
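You can verify that the cleanup worked by listing any leftover Python-related variables; after the unset above, this command should print nothing:
$ env | grep ^PYTHON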
2. Singularity syntax to pass shell variables
Since tip no. 1 above suggests isolating the container shell environment from the host one, how can you then set environment variables in the container when you need to?
Well, you can use the dedicated Singularity syntax. This involves setting the required variable in the host, prepended with the prefix SINGULARITYENV_, as in:
$ export SINGULARITYENV_VARIABLE="value"
Then, you get the variable in the container:
$ singularity exec -e docker://python:3.8-slim bash -c 'echo $VARIABLE'
value
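Any variable can be passed this way. For instance, assuming your containerised application honours OMP_NUM_THREADS (just an illustrative example), you could request four threads with:
$ export SINGULARITYENV_OMP_NUM_THREADS="4"
$ singularity exec -e docker://python:3.8-slim bash -c 'echo $OMP_NUM_THREADS'
4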
3. Mounting a fake home for writing files
It may happen that some packages need to write service or configuration files in the user’s HOME directory. This can be an issue on HPC clusters, where users’ homes might not be mounted by default:
$ singularity exec -e docker://python:3.8-slim ls $HOME
ls: cannot access '/home/ubuntu': No such file or directory
You might be tempted to bind mount it at runtime, but this is not your best option: we recommend AVOIDING mounting your home directory, for improved security.
Instead, you can create a service directory in the host filesystem, to be used as a fake home in your container:
$ mkdir fake_home
$ singularity exec -e -B fake_home:$HOME docker://python:3.8-slim touch $HOME/testfile
Then, files written in the fake home will be available from the host; you can keep them for future sessions if you want/need to, or clean them up after runtime:
$ ls fake_home
testfile
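If you don’t need the files anymore, cleanup is just a matter of removing the service directory from the host:
$ rm -r fake_home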
Key Points