Container workflows at Pawsey: Glossary

Key Points

Introduction to Docker
  • You’ve learned some basic Docker commands for running, downloading, and searching for Docker images

  • docker run for running images (add -it for interactive mode)

  • docker pull for downloading images

  • docker search for searching images

  • Other useful commands to list current images and containers are docker images and docker ps -a
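
As a minimal sketch, the commands above can be chained as follows. The helper name try_image and the example image ubuntu:22.04 are assumptions made for illustration; they are not part of the lesson.

```shell
# Hypothetical helper: download an image, run one command in it,
# then list what is now on the system.
try_image() {
  image="$1"                                # e.g. ubuntu:22.04 (assumed example image)
  docker pull "$image"                      # download the image from the registry
  docker run "$image" cat /etc/os-release   # run a single command in a new container
  docker images                             # list the images stored locally
  docker ps -a                              # list all containers, including stopped ones
}
```

Call it as try_image ubuntu:22.04. For an interactive shell, run docker run -it ubuntu:22.04 bash directly, and use docker search ubuntu to look for related images.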

Cleaning up Docker
  • Cleaning up containers and images is a two-step process

  • Remove stopped containers with docker rm

  • Delete unnecessary images with docker rmi

  • docker run --rm automatically removes the container when it exits
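
The two-step clean-up might look like the sketch below. The helper name cleanup_image is hypothetical, and filtering containers by image is one possible approach rather than the lesson's own procedure.

```shell
# Hypothetical helper: two-step clean-up for a single image.
cleanup_image() {
  image="$1"
  # Step 1: remove exited containers that were created from this image
  for id in $(docker ps -a -q --filter "ancestor=$image" --filter "status=exited"); do
    docker rm "$id"
  done
  # Step 2: remove the image itself, now that no container refers to it
  docker rmi "$image"
}
```

Call it as cleanup_image ubuntu:22.04. Containers started with docker run --rm never reach step 1, because Docker removes them as soon as they exit.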

Sharing files with the host with Docker
  • Map host directories into containers with the flag -v <host dir>:<container dir>
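
A minimal sketch of the directory mapping, assuming you want the current host directory to appear as /data inside the container. The helper name run_with_cwd and the mount point /data are illustrative choices only.

```shell
# Hypothetical helper: run a command in a throwaway container with the
# current host directory mounted at /data.
run_with_cwd() {
  image="$1"; shift
  # -v maps <host dir>:<container dir>; -w makes /data the working directory
  docker run --rm -v "$(pwd)":/data -w /data "$image" "$@"
}
```

For example, run_with_cwd ubuntu:22.04 ls -l lists the files of the host directory from inside the container.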

Long running services with Docker
  • You’ve learned how to run long-running services (like a web server) through containers

  • Use the flag -d to run containers in the background

  • Use the flag -p <host port>:<container port> to map communication ports

  • Additional options to manage and query containers include --name and docker logs
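
The flags above can be combined as in the following sketch. It assumes the official nginx image, which serves on port 80 inside the container; the helper name start_web is hypothetical.

```shell
# Hypothetical helper: start a web server in the background and show its logs.
start_web() {
  name="$1"        # container name, reused below by docker logs
  host_port="$2"   # host port forwarded to port 80 in the container
  docker run -d --name "$name" -p "$host_port":80 nginx
  docker logs "$name"
}
```

After start_web web 8080 the server should answer at http://localhost:8080. Stop and remove it with docker stop web followed by docker rm web.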

Build your own container image with Docker
  • A Dockerfile is a recipe that uses specific instructions to direct the image building process

  • docker build is used to build images

  • docker push is used to push images to a web registry
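
As a rough sketch of the build step, the helper below writes a three-line Dockerfile to a temporary directory and builds it. The Dockerfile contents, the helper name build_demo_image, and the variable build_dir are assumptions chosen for illustration.

```shell
# Hypothetical helper: write a small Dockerfile and build an image from it.
build_demo_image() {
  tag="$1"                   # e.g. <your registry user>/demo:0.1
  build_dir="$(mktemp -d)"   # temporary build context
  cat > "$build_dir/Dockerfile" <<'EOF'
FROM ubuntu:22.04
RUN apt-get update && apt-get install -y curl
CMD ["curl", "--version"]
EOF
  docker build -t "$tag" "$build_dir"
}
```

Once the build succeeds, docker push followed by the same tag uploads the image, provided the tag starts with your registry user name and you have run docker login first.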

Docker advanced topics
  • Change the user that runs the container with the flag -u <user>:<group>

  • Make the container accept input from STDIN with the flag -i

  • You can use a Docker Compose YAML file to orchestrate the setup of multiple containers at once
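
The -u and -i flags can be sketched as below. Both helper names are hypothetical, and the /data mount point is an assumption carried over for illustration.

```shell
# Hypothetical helper: run as the calling host user, so that files written
# to the mounted directory are owned by that user rather than by root.
run_as_me() {
  image="$1"; shift
  docker run --rm -u "$(id -u):$(id -g)" -v "$(pwd)":/data -w /data "$image" "$@"
}

# Hypothetical helper: forward STDIN to a command inside the container.
pipe_into() {
  image="$1"; shift
  docker run --rm -i "$image" "$@"
}
```

For example, run_as_me ubuntu:22.04 touch newfile creates a file owned by you on the host, and echo "hello" | pipe_into ubuntu:22.04 wc -c counts the piped characters inside the container.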

Run containers on HPC with Shifter (and Singularity)
  • Shifter has a fairly simple syntax for pulling, managing, and running containers on HPC systems

  • shifter pull and shifter run are the key commands
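
A tentative sketch of the two key commands follows. The argument order for shifter run (image first, then the command) is an assumption, and the helper name shifter_try is hypothetical; check the Shifter installation on your system, as its syntax may differ.

```shell
# Hypothetical helper; assumes "shifter run <image> <command>" is the accepted form.
shifter_try() {
  image="$1"; shift
  shifter pull "$image"       # fetch the image and convert it for Shifter
  shifter run "$image" "$@"   # run the command inside the container
}
```

Under that assumption, shifter_try ubuntu cat /etc/os-release would pull the image and print its release information.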

A bioinformatics example: BLAST
  • Many applications (not just in bioinformatics) are already packaged as container images

  • Here’s a small list of some of the registries we use at Pawsey:

  • Docker Hub

  • Biocontainers

  • Quay^

  • Nvidia GPU Cloud (NGC)^

  • ^The last two require you to create an account and log in to access containers

RStudio deployment for fun and profit
  • Containers are a great way to manage R workflows. You will likely still want a local installation of R/RStudio for testing, but once your workflows are set, you can use containers to manage them. You can also provide RStudio servers for collaborators
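
One way to provide an RStudio server is sketched below. It assumes the community rocker/rstudio image, which serves on port 8787 and takes the password for its default rstudio user from the PASSWORD environment variable. The helper name start_rstudio and the container name rstudio are illustrative.

```shell
# Hypothetical helper: start an RStudio server in the background.
start_rstudio() {
  password="$1"            # password for the default "rstudio" login
  host_port="${2:-8787}"   # host port; defaults to 8787 when not given
  docker run -d --name rstudio -p "$host_port":8787 -e PASSWORD="$password" rocker/rstudio
}
```

After running start_rstudio with a password of your choice, the login page should be available at http://localhost:8787.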

Bioinformatics meets RStudio in containers
  • Containers are a great way to manage R workflows. You will likely still want a local installation of R/RStudio for testing, but once your workflows are set, you can use containers to manage them. You can also provide RStudio servers for collaborators

  • Also, docker-compose is a great way to manage complex Docker commands and to coordinate multiple containers

Making Python not awful with containers
  • Containers are a great way to manage Python workflows

Containers for machine learning
  • Ship machine learning frameworks through containers to simplify deployment and increase portability

Molecular dynamics with GPU containers
  • You can use containers to ship GPU applications

Glossary

FIXME