Breakout room 3: design container images

Overview

Teaching: 0 min
Exercises: 20 min
Questions
Objectives
  • Write Dockerfiles for real-world applications

Goal

In this session, you’re going to use Docker to build two container images, in particular one RStudio image and one Conda image. For each of them, you’ll pick an appropriate base image, write the Dockerfile and run the build.

The first step in designing a Dockerfile is to choose a base image, that is, the starting point for our container.

The best place to find useful base images is the Docker Hub online registry. Here is a non-comprehensive list of potentially useful base images (the version tags are not necessarily the most recent ones, but they’re relevant to this exercise):

IMPORTANT: which base images to use for the next steps?

As we don’t continuously update the content of this tutorial, in the following exercises please use only image tags from the list above, and only for the suggested image names.

We’ve pre-cached the images that are relevant to this tutorial in the virtual machine, so the following pull processes should only take a few seconds.

Exercise 1: Write an RStudio Dockerfile

The first exercise in this session is to write a little Dockerfile for the R package ggtree. This package provides functionalities to represent phylogenetic trees using R, by building on top of the Tidyverse collection of data science packages, and in particular the plotting library ggplot2.

First, cd into the appropriate directory:

cd /data/bio-intro-containers/exercises/build_examples/r-ggtree

Use a text editor to create a blank Dockerfile; both nano and vi are available, pick your favourite.

Choose the base image

Given what we have just said about ggtree, namely that it is an R package based on Tidyverse, which base image from the list of useful images above would be the closest match?

Solution

We need R and Tidyverse, so let’s go with rocker/tidyverse:3.6.1.

Now we need to declare this image in the Dockerfile, using an appropriate Docker instruction.

Solution

FROM rocker/tidyverse:3.6.1

Command to install an R package

The package ggtree is part of the Bioconductor project. So, from inside an R console we could install it by using the command BiocManager::install("ggtree"). However, here we need Docker to execute this command from a bash shell.

If you are an R user, do you know how you can execute the R command above from the shell? No worries if you don’t, just have a look at the solution.

Solution

R -e 'BiocManager::install("ggtree")'

We’re almost there with our first Dockerfile… now we just need to embed the shell command above in the Dockerfile, by using the appropriate Docker instruction.

Solution

FROM rocker/tidyverse:3.6.1

RUN R -e 'BiocManager::install("ggtree")'

Very often (albeit not always) preparing Dockerfiles for R images looks as simple as this. Other times, you will also need to install other packages as prerequisites. And of course things can get even more complicated than this.
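As a rough sketch of the prerequisite case, the shell snippet below writes a Dockerfile that installs a system library with apt before installing the R package. The library libxml2-dev is only a hypothetical example of a prerequisite (ggtree itself may not need it), and the temporary directory stands in for your working directory.

```shell
# Write the Dockerfile into a scratch directory (assumption: any empty directory will do)
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM rocker/tidyverse:3.6.1

# Hypothetical system prerequisite, installed before the R package
RUN apt-get update && \
    apt-get install -y --no-install-recommends libxml2-dev && \
    rm -rf /var/lib/apt/lists/*

RUN R -e 'BiocManager::install("ggtree")'
EOF

# Show the result for review
cat "$workdir/Dockerfile"
```

Keeping the apt step in its own RUN instruction, ahead of the R installation, means Docker can reuse the cached layer when only the R packages change.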

Document your Dockerfile with labels

It’s good practice to add information to your Dockerfile in the form of “labels”, for instance an email contact for the person who developed it.
In this case, just add your name, using the appropriate Docker instruction.

Solution

FROM rocker/tidyverse:3.6.1

LABEL maintainer="Myself"

RUN R -e 'BiocManager::install("ggtree")'
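Labels are stored in the image metadata, so they can be read back once the image has been built. The helper below is a minimal sketch of one way to do that with docker inspect; the function name image_label is made up for illustration, and the image name assumes the tag used in the build step of this exercise.

```shell
# Hypothetical helper: print the value of one label stored in a local image.
# Arguments: image reference, label key.
image_label() {
  sudo docker inspect --format "{{ index .Config.Labels \"$2\" }}" "$1"
}

# Usage, once the image has been built:
#   image_label ggtree:2.0.4 maintainer
```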

Build the image

Now it’s finally the time for building!
Let’s use the appropriate docker syntax in the shell to build an image called ggtree:2.0.4; remember we’re running from the directory where the Dockerfile is (.).

Solution

sudo docker build -t ggtree:2.0.4 .

It will only take a couple of minutes to build, as most required R packages are already provided by the base Tidyverse image.

Note that here we told you the package version to use in the image tag; normally you would find it out yourself after the first build, by inspecting the R installation in the container.

Bonus: test that the image works

If you have time, run the following command in a container from the Docker image you just built, by using sudo docker run <IMAGE> <COMMAND>, to query the package version:

R -e 'packageVersion("ggtree")'

Solution

sudo docker run ggtree:2.0.4 R -e 'packageVersion("ggtree")'
R version 3.6.1 (2019-07-05) -- "Action of the Toes"
Copyright (C) 2019 The R Foundation for Statistical Computing
Platform: x86_64-pc-linux-gnu (64-bit)

[..]

> packageVersion("ggtree")
[1] ‘2.0.4’
> 
> 

The container seems to be working, and ggtree is at version 2.0.4.
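When the package version is not known in advance, one possible workflow is to build under a temporary tag, query the version, and then tag the image accordingly. The sketch below assumes a temporary tag ggtree:tmp; both helper names are invented for this example.

```shell
# Hypothetical helper: print only the version string of an R package in an image.
# Arguments: image reference, R package name.
r_pkg_version() {
  sudo docker run --rm "$1" Rscript -e "cat(as.character(packageVersion('$2')))"
}

# Hypothetical helper: join an image name and a version into name:tag form.
image_ref() {
  printf '%s:%s\n' "$1" "$2"
}

image_ref ggtree 2.0.4   # prints ggtree:2.0.4

# Usage after a first build under the assumed temporary tag ggtree:tmp:
#   ver="$(r_pkg_version ggtree:tmp ggtree)"
#   sudo docker tag ggtree:tmp "$(image_ref ggtree "$ver")"
```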

Converting the Docker image into Singularity format

We’re not doing it now to save time, but remember that the last step required to use the built image with Singularity is to turn it into a SIF file. You’ll need a Singularity installation in conjunction with Docker:

singularity pull docker-daemon:ggtree:2.0.4
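If you try this later, it can help to name the output file explicitly and then run the same version check through Singularity. The following is only a sketch: the helper sif_name and its naming scheme are choices made for this example, not something Singularity requires.

```shell
# Hypothetical helper: derive a SIF file name from a Docker image reference,
# replacing ':' and '/' with '_'.
sif_name() {
  printf '%s.sif\n' "$(printf '%s' "$1" | tr ':/' '__')"
}

sif_name ggtree:2.0.4   # prints ggtree_2.0.4.sif

# Usage on a machine with both Docker and Singularity:
#   singularity pull "$(sif_name ggtree:2.0.4)" docker-daemon:ggtree:2.0.4
#   singularity exec "$(sif_name ggtree:2.0.4)" R -e 'packageVersion("ggtree")'
```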

Exercise 2: Write a Conda Dockerfile

In this second exercise you’re going to build an image for the popular bioinformatics tool samtools; you’ll use the Conda package manager to install it.

First, cd into the appropriate directory, and then create a blank Dockerfile with a text editor:

cd /data/bio-intro-containers/exercises/build_examples/conda-samtools

First, pick the base image

Have a look back at the list of suggested base images above; which one would you pick for a Conda installation?

Solution

The best match is continuumio/miniconda3:4.8.2.

Now embed the base image declaration in the Dockerfile.

Solution

FROM continuumio/miniconda3:4.8.2

Second, write the command to install a Conda package

Samtools version 1.9 can be installed with Conda through the channel bioconda, so the shell syntax would be (-y is to confirm prompts):

conda install -y -c bioconda samtools=1.9

With this information, complete the Dockerfile to install samtools using Conda.

Solution

FROM continuumio/miniconda3:4.8.2

RUN conda install -y -c bioconda samtools=1.9

Similar to the case of R, Dockerfiles often turn out to be quite compact when using a Conda base image. Things are not always this easy: for instance, package version conflicts are common with Conda; additional command lines might be required to work around them, or you might even need to install packages entirely manually.
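As an illustration of such an additional adjustment, one common approach is to make the conda-forge channel available alongside bioconda, and to clean the Conda caches in the same layer. The snippet below writes a Dockerfile along those lines; whether it resolves a given conflict depends on the package, so treat it as a sketch under that assumption rather than a recipe.

```shell
# Write the Dockerfile into a scratch directory (assumption: any empty directory will do)
workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
FROM continuumio/miniconda3:4.8.2

# Channels listed earlier take priority: conda-forge first, then bioconda
RUN conda install -y -c conda-forge -c bioconda samtools=1.9 && \
    conda clean --all --yes
EOF

# Show the result for review
cat "$workdir/Dockerfile"
```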

In the interest of time, you’re not going to build this Conda-based image.
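Should you want to build and test it later, the commands would follow the same pattern as for ggtree. The snippet below only prints them for review; the tag samtools:1.9 is an assumed name that mirrors the name:version convention used earlier.

```shell
# Assumed image name, following the name:version convention used for ggtree
image="samtools:1.9"

# Compose the build and test commands (run from the directory holding the Dockerfile)
build_cmd="sudo docker build -t $image ."
test_cmd="sudo docker run --rm $image samtools --version"

# Print the commands so they can be checked before being run by hand
printf '%s\n' "$build_cmd" "$test_cmd"
```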

Bonus: example Dockerfiles

You may have a look at these, to get a taste of what more elaborate Dockerfiles look like.

A large R image

FROM rocker/tidyverse:latest

RUN apt-get update -qq && apt-get -y --no-install-recommends install \
      autoconf \
      automake \
      g++ \
      gcc \
      gfortran \
      make \
      && apt-get clean all \
      && rm -rf /var/lib/apt/lists/*

RUN mkdir -p $HOME/.R
COPY Makevars /root/.R/Makevars

RUN Rscript -e "library('devtools')" \
      -e "install_github('Rdatatable/data.table', build_vignettes=FALSE)" \
      -e "install.packages('reshape2')" \
      -e "install.packages('fields')" \
      -e "install.packages('ggbeeswarm')" \
      -e "install.packages('gridExtra')" \
      -e "install.packages('dynamicTreeCut')" \
      -e "install.packages('DEoptimR')" \
      -e "install.packages('http://cran.r-project.org/src/contrib/Archive/robustbase/robustbase_0.90-2.tar.gz', repos=NULL, type='source')" \
      -e "install.packages('dendextend')" \
      -e "install.packages('RColorBrewer')" \
      -e "install.packages('locfit')" \
      -e "install.packages('KernSmooth')" \
      -e "install.packages('BiocManager')" \
      -e "BiocManager::install('Biobase')" \
      -e "BiocManager::install('BiocGenerics')" \
      -e "BiocManager::install('BiocParallel')" \
      -e "BiocManager::install('SingleCellExperiment')" \
      -e "BiocManager::install('GenomeInfoDb')" \
      -e "BiocManager::install('GenomeInfoDbData')" \
      -e "BiocManager::install('DESeq')" \
      -e "BiocManager::install('DESeq2')" \
      -e "BiocManager::install(c('scater', 'scran'))" \
      -e "library('devtools')" \
      -e "install_github('IMB-Computational-Genomics-Lab/ascend', ref = 'devel')" \
      && rm -rf /tmp/downloaded_packages

Samtools compiled in the image

FROM ubuntu:18.04

# Image metadata
LABEL maintainer="john.doe@nowhere.com"

# Define version as build variable
ARG SAM_VER="1.9"

# Good practice variables
ENV DEBIAN_FRONTEND="noninteractive"
ENV LANG="C.UTF-8" LC_ALL="C.UTF-8"

# Install apt dependencies
RUN apt-get update && \
    apt-get -y install \
      gcc \
      libbz2-dev \
      libcurl4-openssl-dev \
      liblzma-dev \
      libncurses5-dev \
      libncursesw5-dev \
      make \
      perl \
      tar \
      vim \
      wget \
      zlib1g-dev \
    && apt-get clean all && \
    apt-get purge && \
    rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*

# Build samtools
RUN mkdir /build && \
    cd /build && \
    wget https://github.com/samtools/samtools/releases/download/${SAM_VER}/samtools-${SAM_VER}.tar.bz2 && \
    tar -vxjf samtools-${SAM_VER}.tar.bz2 && \
    cd samtools-${SAM_VER} && \
    ./configure --prefix=/apps && \
    make && \
    make install && \
    cd htslib-${SAM_VER} && \
    make && \
    make install && \
    cd / && \
    rm -rf /build

# Define PATH variable
ENV PATH=/apps/bin:$PATH

# Default command to be bash
CMD ["/bin/bash"]

Key Points

  • Picking the appropriate base image is key and can save you lots of work

  • The most important instructions in a Dockerfile are ‘FROM’ to select the base image and ‘RUN’ to execute commands

  • Build a container image with Docker using docker build

  • Convert a Docker image into Singularity format by using singularity pull docker-daemon: