Docker has gained significant traction and stability since its beginnings in 2013. My previous experience was that creating complex application containers with multiple services was complicated and full of workarounds. Since then, Docker has absorbed the linking tool Fig as Docker Compose and has gained an interesting set of abstractions for container orchestration. Still, there are many pitfalls regarding Docker use in both development and production. This post aims to give an introduction to the possibilities of Docker and its limitations for isolated development environments.

Docker

Should you have never heard of Docker before: it is an open-source project that encapsulates applications into so-called containers by exploiting OS-level virtualization. (If you're already familiar with Docker and Compose, you can skip the introduction and move straight to the Pitfalls.)

These containers are executed in userspace and run isolated, comparable to a virtual machine, but without the overhead of running the application in a fully virtualized guest system.

The Docker™ logo

Docker's website describes itself as:

Docker is a platform for developers and sysadmins to develop, ship, and run applications. Docker lets you quickly assemble applications from components and eliminates the friction that can come when shipping code. Docker lets you get your code tested and deployed into production as fast as possible.

Docker allows you to perform the following tasks:

1. Leveraging existing containers

You can use containers built by other users which are available in the Docker Hub to run services with a single command.

The following command fetches the latest postgres image from the Docker Hub and starts a container named psql-service. The local (host) port 5432 is mapped to the container's exposed port 5432.

docker run --name psql-service -p 5432:5432 -d postgres

You can then access the postgres server locally:

$ PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres
psql (9.4.0)
Type "help" for help.

postgres=#

2. Dockerizing

You can run commands and processes inside existing containers. For example, to spawn a shell in an Ubuntu 15.04 image, simply call:

$ sudo docker run -t -i ubuntu:15.04 /bin/bash
root@e0e45abf1fa7:/#

Without any connection to your host, this functionality may not seem particularly useful at the moment. It will, however, show its capabilities later on.

3. Building container images

To construct container images like those we ran above (postgres, ubuntu), Docker provides its own build syntax: a so-called Dockerfile defines step-by-step instructions for installing dependencies and adding data.

# FROM defines a base image (pulled from the Docker Hub).
FROM ubuntu:15.04

# RUN executes initialization commands within the container
# e.g., installing packages.
RUN apt-get update && apt-get install -y perl

# ADD copies files from the directory of the Dockerfile
# into the container at build time.
ADD ./my-app.pl /root/my-app.pl

# CMD defines the command to execute when starting the container.
# May be overridden from the command line.
CMD ["perl", "/root/my-app.pl"]

4. Building and linking ensembles of containers

With the Docker CLI, you can give names to containers and use --link to define complex applications consisting of multiple services.
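
As a minimal sketch of that manual approach (my-web-app is a placeholder image), linking a named postgres container into an application container looks roughly like this:

# Start a named database container
docker run --name psql-service -d postgres
# Link it into the application container under the alias "postgres"
docker run --link psql-service:postgres -d my-web-app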

I won't go into any more detail, as the orchestration tool Fig (announced in late 2014 to be integrated into the Docker stack as docker-compose) hides the complexity of defining ensembles of containers behind a simpler interface. For the remainder of this post, we thus concentrate on the orchestration part.

This is only a glimpse of Docker. If you want to delve deeper into Docker itself, see their user guide. If you want to delve into Dockerfiles, here is a complete reference of available syntax commands.

Installation

On Linux, Docker runs natively and installation is thus rather straightforward, so I refer you to their documentation. Installing docker-compose is covered below in the Docker Compose section.

Mac OS X

Docker relies on certain features (especially cgroups and namespaces for process isolation) of the Linux kernel that are not available in OS X.

The alternative is to run the docker daemon inside a Linux-based virtual machine. Probably the most common way to use docker on OS X is boot2docker. It's a wrapper script that builds a VirtualBox machine with Tiny Core Linux and configures the local shell to forward docker commands to the VM. The main downside of this approach is that it introduces another layer of abstraction.

If you use homebrew, you can install the docker client and boot2docker:

brew update
brew install docker
brew install boot2docker

Note that you still need to manually install VirtualBox, either from their site or by using brew cask:

brew install caskroom/cask/brew-cask
brew cask install virtualbox

You can then use the boot2docker script to create and manage the docker VirtualBox VM.

# Create the VM. Needed only once
boot2docker init

# Start the VM.
boot2docker start

# Tell the Docker client where to connect to
# Needs to be executed on every session (or add to ~/.bashrc)
eval "$(boot2docker shellinit)"

Docker Compose (previously Fig)

Docker Compose integrates seamlessly into the Docker ecosystem in that it wraps the process of manually linking containers in a simpler YAML-based structure. With it, an ensemble of two or more containers (such as a Rails application container with a separate database container) can be described in a single, separate file, docker-compose.yml.

Note that Docker Compose does not save you the effort of learning the Dockerfile syntax (unless you only combine existing container images). But you will be able to avoid manually calling the docker binary for most use cases.

To install the latest version (1.2.0 as of 2015-04-16):

curl -L https://github.com/docker/compose/releases/download/1.2.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
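
You can verify the installation with:

docker-compose --version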

Configuration file

The docker-compose.yml (formerly fig.yml) allows you to define a set of services based on an image or Dockerfile and determine the connections to:

  • the outer world (i.e., ports forwarded to the docker daemon host)
  • the other containers

Each section in the YAML defines a separate container service. Compose provides a syntax that closely resembles the docker command-line arguments:

  • image Specifies a Docker Hub image to base the container on
  • build Relative or absolute path to the directory containing the Dockerfile to build the service from. That directory becomes the build context (i.e., its files are available to ADD).
  • volumes Defines volumes, either container-internal (<path>) or mounted from the docker host (<host path>:<container path>)
  • volumes_from Imports all volumes from another service defined in this file.
  • links Links this service to another container, adding its name to /etc/hosts and injecting a set of environment variables.
  • ports A set of ports forwarded between the host and the container (<host port>:<container port>). When the host part is omitted, a random host port is assigned.
  • expose Exposes a list of ports from the service to other containers only.

A complete reference with further information on the available commands can be found in the compose user guide.

Example

The following example defines three services:

  1. A PostgreSQL 9.4 server (based on the official postgres image)
  2. The web container, built from the Dockerfile in the same directory.
  3. A data container to persist the PostgreSQL data.

docker-compose.yml

postgres:
  image: "postgres:9.4"
  volumes_from:
    - data
  expose:
    - 5432
web:
  build: .
  ports:
    - "3000:3000"
  links:
    - postgres
  volumes:
    - ./:/app
data:
  image: cogniteev/echo
  command: echo 'Data Container for PostgreSQL'
  volumes:
    - /var/lib/postgresql/data
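
With this file in place, building and starting the whole ensemble boils down to two commands, run from the directory containing docker-compose.yml:

# Build the web image from the local Dockerfile
docker-compose build
# Start all three services (add -d to run them in the background)
docker-compose up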

Pitfalls

Defining the stack of services was straightforward. However, a number of issues remain concerning the provisioning of services (e.g., creating the database or compiling assets) and the performance of host-mounted volumes.

Mounting host volumes

The volume directive ./:/app mounts the current directory into the container. On OS X, this is routed through VirtualBox shared folders, which is painfully slow, especially when sharing many files. That is a real problem for development, where a code base kept in sync with the running container is crucial.

A number of solutions exist to work around this problem, which I detail in a separate blog post (/2015/05/docker-host-volume-synchronization/). There are two main strategies: exposing and synchronizing host directories (e.g., through NFS mounts or rsync), or adding a synchronization volume container (e.g., docker-unison).
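
As a rough sketch of the synchronization strategy (assuming fswatch and rsync on the host, rsync inside the boot2docker VM, and the /app target path from the example above):

# Push local changes into the VM whenever files change
fswatch -o . | while read -r _; do
  rsync -az --delete ./ docker@$(boot2docker ip):/app/
done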

Using the Dockerfile for provisioning

A common class of workarounds involves using the Dockerfile for development provisioning (such as database setup or common preparation tasks). If you use volumes to mount your code into a dockerized service, that volume is not available during the container build phase, that is, while the Dockerfile is executed. Yet most often you need to refer to files from your application to determine which dependencies to install, or to run scripts that create and migrate a database.
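
To illustrate: a line like the following fails (or does nothing useful) at build time when the Gemfile only enters the container through the ./:/app volume at runtime:

# /app is still empty here; the volume is only mounted when the container runs
RUN bundle install --gemfile /app/Gemfile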

Using a separate (data) container

A data container lends a persistent volume to other containers. This is useful for data that should survive multiple builds of a container, such as installed packages (e.g., rubygems, Perl packages).

In the compose configuration, borrowing a volume with a data container works by creating a service with a container-side volume (without a host mount part) and importing it with the volumes_from directive.

The following example creates such a persistent volume on the data container for a postgres database. If the postgres service is destroyed or rebuilt, its data remains available on that volume. However, there is no direct access from the host to that volume.

docker-compose.yml

postgres:
  image: "postgres:9.4"
  volumes_from:
    - data
  expose:
    - 5432
data:
  image: cogniteev/echo
  command: echo 'Data Container for PostgreSQL'
  volumes:
    - /var/lib/postgresql/data

When using package tools, you have to tell them to install packages into the persistent volume.

For bundler, you can configure such a custom path, e.g., /bundler, in the Dockerfile:

RUN bundle config path /bundler
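
On the compose side, a matching container-side volume on the data container then keeps the installed gems across rebuilds of the web image (a sketch; it assumes the web service imports the data container's volumes as in the examples above):

web:
  build: .
  volumes_from:
    - data
data:
  image: cogniteev/echo
  command: echo 'Data Container'
  volumes:
    - /bundler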

Accessing host data for provisioning

Even if you create persistent data containers, you still cannot use the host application data during the build that will later be mounted as a volume into the container. While there are commands to add host files to the container at build time (e.g., ADD and COPY), this isn't really sufficient to automatically provision a development environment in the Dockerfile.

A common workaround is to ADD a careful selection of files into the container just to pull dependencies. For example, a Gemfile can be copied in the Dockerfile and the bundle then installed to a data container volume during the build.
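
A sketch of that workaround in a Dockerfile (paths and the /bundler target follow the bundler example above):

# Only the dependency manifests are baked into the image ...
ADD Gemfile /app/Gemfile
ADD Gemfile.lock /app/Gemfile.lock
WORKDIR /app
# ... so dependencies can be resolved at build time
RUN bundle config path /bundler && bundle install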

I think that this is one of the major missing features of Docker for quickly rolling out a development environment on a new machine.

Accessing linked services during build phase

This is very similar to the issue above. Linked services are not available during the build phase. If you use Rails and want to create and migrate the database when constructing the container, this is something you have to run manually with docker-compose run after building the image.

Imagine we have built the Dockerfile for a Rails environment with docker-compose build web. Then you can use the linked services from docker-compose with:

docker-compose run web rake db:migrate
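
Creating the database beforehand works the same way, after which the stack can be started as usual:

docker-compose run web rake db:create
docker-compose up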

Unavailability of development helpers

If you live in the world of Rails development, there are a ton of incredibly useful helpers that speed up development. Gems like letter_opener or capybara-screenshot display mails and test information in the local browser. In a dockerized application, these helpers are either non-functional or fail with errors regarding X session allocation. The same applies to frontend testing with, e.g., Selenium, as it opens a live browser instance (unless used solely with PhantomJS). While there is a way to share a local X session with the container, this raises more questions than it answers (regarding portability) and requires even more manual interaction with the container.
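
For reference, sharing the host X session boils down to mounting the X socket and passing the display on a Linux host (a sketch; my-web-app is a placeholder, the host may additionally need to allow local X connections, and this does not translate directly to boot2docker on OS X):

docker run -e DISPLAY=$DISPLAY -v /tmp/.X11-unix:/tmp/.X11-unix my-web-app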

Conclusion

If you want to get a new team member up to speed quickly with a complex application stack, Docker has become one of the best approaches to that goal. With the recent inclusion of Compose into the Docker workflow, defining complex applications with a number of linked services has become trivial. However, preparing and maintaining the services of a complex application (e.g., migrating a database service) requires additional manual work, and a number of issues remain unsolved.

For my applications, a major missing component is some kind of provisioning functionality in the Dockerfile, which would allow me to use linked services and host volumes for bootstrapping databases and preparing the application code for development. Given the remaining issues, I have somewhat mixed feelings about Docker as a development environment. It is a great way to get a local development environment started in a few simple steps, but if you rely on advanced tools in your development process, Docker may be of limited use compared to a carefully crafted VM.

You can find an application of the ideas in this blog in my Docker setup for OpenProject. I also plan another blog post with the concrete dockerization process of an application stack with database, LDAP, and single-sign on services.

Resources

Below is a collection of interesting posts and resources on Docker for both development and production/deployment.