Docker has gained significant traction and stability since its beginnings in 2013. My previous experience was that creating complex application containers with multiple services was complicated and full of workarounds. Since then, Docker has absorbed the linking tool Fig into Docker Compose and has gained an interesting set of abstractions for container orchestration. Still, there are many pitfalls regarding Docker use in both development and production. This post aims to give an introduction to the possibilities of Docker and its limitations for isolated development environments.
Docker
Should you have never heard of Docker before: it is an open-source project to encapsulate applications into so-called containers using operating-system-level virtualization. (If you're already familiar with Docker and Compose, you can skip the introduction and move forward to the pitfalls.)
These containers are executed in userspace and run isolated comparably to a virtual machine, but without the overhead of running the application in a fully virtualized guest system.
Docker's website describes itself as:
Docker is a platform for developers and sysadmins to develop, ship, and run applications. Docker lets you quickly assemble applications from components and eliminates the friction that can come when shipping code. Docker lets you get your code tested and deployed into production as fast as possible.
Docker allows you to perform the following tasks:
1. Leveraging existing containers
You can use containers built by other users which are available in the Docker Hub to run services with a single command.
The following command fetches the latest image from the postgres Hub entry and starts a container from it. The local (host) port is mapped to the container's exposed port (5432).
docker run --name psql-service -p 5432:5432 -d postgres
You can then access the postgres server locally:
$ PGPASSWORD=postgres psql -h localhost -p 5432 -U postgres
psql (9.4.0)
Type "help" for help.
postgres=#
2. Dockerizing
You can run commands and processes inside existing containers. For example, to spawn a shell in an Ubuntu 15.04 image, simply call:
$ sudo docker run -t -i ubuntu:15.04 /bin/bash
root@e0e45abf1fa7:/#
Without any connection to your host, this functionality doesn't seem particularly useful at the moment. It will, however, show its capabilities later on.
3. Building container images
To construct containers like those we have run above (postgres, ubuntu), Docker provides its own build syntax. The syntax defines step-by-step instructions to install dependencies and add data within a so-called Dockerfile.
# FROM defines a base image (pulled from the Docker Hub).
FROM ubuntu:15.04
# RUN executes initialization commands within the container
# e.g., installing packages.
RUN apt-get update && apt-get install -y perl
# ADD copies files from the directory of the Dockerfile
# into the container at build time.
ADD ./my-app.pl /root/my-app.pl
# CMD defines the command to execute when starting the container.
# May be overridden from the command line.
CMD ["perl", "/root/my-app.pl"]
4. Building and linking ensembles of containers
With the Docker CLI, you can give names to containers and use --link to define complex applications consisting of multiple services.
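As a brief, hedged sketch of what such a manual link looks like (reusing the psql-service container from the first example; the alias db is arbitrary):

# Start a named database container, then link it into an application
# container, where it becomes reachable under the hostname "db".
docker run --name psql-service -d postgres
docker run --link psql-service:db -t -i ubuntu:15.04 /bin/bash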
I won't go into any more detail, as the orchestration tool Fig (announced in late 2014 to be included in the Docker stack as docker-compose) hides the complexity of defining ensembles of containers behind a simpler interface.
For the remainder of this post, we thus concentrate on the orchestration part.
This is only a glimpse of Docker. If you want to delve deeper into Docker itself, see their user guide. If you want to delve into Dockerfiles, here is a complete reference of available syntax commands.
Installation
On Linux, Docker runs natively, so installing both Docker and Compose is rather straightforward; I refer to their documentation. (The command to install the latest docker-compose release is shown in the Compose section below.)
Mac OS X
Docker relies on certain features (especially cgroups and namespaces for process isolation) of the Linux kernel that are not available in OS X.
An alternative is to run the docker daemon in a Linux-based virtual machine. Probably the most common way to use docker on OS X is boot2docker. It is a wrapper script that builds a VirtualBox machine with Tiny Core Linux and configures the local shell to forward docker commands to the VM. The main downside of this approach is that it introduces another layer of abstraction.
If you use homebrew, you can install boot2docker as follows:
brew update
brew install docker
brew install boot2docker
Note that you still need to manually install VirtualBox, either from their site or by using brew cask:
brew install caskroom/cask/brew-cask
brew cask install virtualbox
You can then use the boot2docker script to create and manage the docker VirtualBox VM.
# Create the VM. Needed only once
boot2docker init
# Start the VM.
boot2docker start
# Tell the Docker client where to connect to
# Needs to be executed on every session (or add to ~/.bashrc)
eval "$(boot2docker shellinit)"
Docker Compose (previously Fig)
Docker Compose integrates seamlessly into the Docker ecosystem in that it wraps the process of manually linking containers into a simpler YAML-based structure.
With it, the complexity of bringing up a set of two or more containers (such as a Rails application container with a separate database container) is captured in a single, separate file, docker-compose.yml.
Note that Docker Compose does not save you the effort of learning the Dockerfile syntax (unless you only combine existing container images). But you will be able to avoid manually calling the docker binary for the majority of use cases.
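To give an impression, the handful of compose subcommands below (a sketch; they become available once Compose is installed as shown next) covers most day-to-day interactions:

docker-compose build      # (Re)build images for all services with a build entry
docker-compose up -d      # Create, link, and start all containers in the background
docker-compose logs       # Show the aggregated output of the services
docker-compose stop       # Stop the containers without removing them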
To install the latest version (1.2.0 as of 2015-04-16):
curl -L https://github.com/docker/compose/releases/download/1.2.0/docker-compose-`uname -s`-`uname -m` > /usr/local/bin/docker-compose
chmod +x /usr/local/bin/docker-compose
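You can check the installed version afterwards:

docker-compose --version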
Configuration file
The docker-compose.yml (formerly fig.yml) allows you to define a set of services based on an image or a Dockerfile and to determine the connections to:
- the outer world (i.e., ports forwarded to the docker daemon host)
- the other containers
Each section in the YAML defines a separate container service. Compose provides a syntax that closely resembles the docker command-line arguments:
- image: Specifies a Docker Hub image to build the container from.
- build: Relative or absolute path to the Dockerfile to build that service from. The directory containing the Dockerfile is available during the build (i.e., as the docker context).
- volumes: Defines persistent volumes mapped into the container (<path>) or synced from the docker host (<host path>:<guest path>).
- volumes_from: Imports all volumes from another service defined in this file.
- links: Links this service to another container, adding that name to /etc/hosts and injecting a set of environment variables.
- ports: A set of ports forwarded between the host and the container (<host port>:<container port>). When omitting the host part, one is selected randomly.
- expose: Exposes a list of ports from the service to other containers only.
A complete reference with further information on the available commands can be found in the compose user guide.
Example
The following example defines three services:
- A PostgreSQL 9.4 server (based on the official postgres image)
- The web container, built from the Dockerfile in the same directory
- A data container to persist PostgreSQL data
docker-compose.yml
postgres:
  image: "postgres:9.4"
  volumes_from:
    - data
  expose:
    - 5432
web:
  build: .
  ports:
    - "3000:3000"
  links:
    - postgres
  volumes:
    - ./:/app
data:
  image: cogniteev/echo
  command: echo 'Data Container for PostgreSQL'
  volumes:
    - /var/lib/postgresql/data
Pitfalls
Defining the stack of services was straightforward. However, a number of issues remain concerning the provisioning of services (e.g., creating the database or compiling assets) and the performance of host-mounted volumes.
Mounting host volumes
The volume directive ./:/app mounts the current directory into the container. On OS X, this is routed through VirtualBox shared folders, which is painfully slow, especially when sharing a lot of files.
This is especially a problem for development, where a code base synchronized with the running container is crucial.
A number of solutions exist to work around this problem, [which I detail in this separate blog post](/2015/05/docker-host-volume-synchronization/). There are two main strategies: exposing and synchronizing host directories (e.g., through NFS mounts or rsync), or adding a synchronization volume container (e.g., docker-unison).
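As a rough sketch of the rsync-based strategy (the /app target, the ssh key, and the exact boot2docker output are assumptions; the linked post covers the details), the code base can be pushed into the boot2docker VM and bind-mounted from there instead of going through VirtualBox shared folders:

# Hypothetical example: sync the project into the VM; the container
# then mounts /app from the VM's (fast) local filesystem.
rsync -avz --delete -e "ssh -i ~/.ssh/id_boot2docker" ./ docker@$(boot2docker ip):/app/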
Using the Dockerfile for provisioning
A common set of workarounds uses the Dockerfile for development provisioning (such as database setup or common preparation tasks). If you use volumes to mount your code into a dockerized service, this volume will not be available in the container build phase, that is, during the execution of the Dockerfile. However, you most often need to refer to files from your application to determine which dependencies to install, or which scripts to run to create and migrate a database.
Using a separate (data) container
A data container holds a persistent volume that other containers borrow. This is useful for data that should be available across multiple builds of a container, such as installed packages (e.g., rubygems, perl packages).
In the compose configuration, borrowing a volume from a data container works by creating a service with a container-side volume (without a host mount part) and importing it with the volumes_from directive.
The following example creates such a persistent volume on the data container for a postgres database.
If the postgres service is destroyed or rebuilt, its data remains available on that volume.
However, there is no direct access from the host to that volume.
docker-compose.yml
postgres:
  image: "postgres:9.4"
  volumes_from:
    - data
  expose:
    - 5432
data:
  image: cogniteev/echo
  command: echo 'Data Container for PostgreSQL'
  volumes:
    - /var/lib/postgresql/data
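If you nevertheless need to inspect that data, a hedged workaround is to start a throwaway container that borrows the volume (Compose prefixes container names with the project directory, here assumed to be myapp):

# List the contents of the persistent volume from a temporary container.
docker run --rm -t -i --volumes-from myapp_data_1 ubuntu:15.04 ls -l /var/lib/postgresql/data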
When using package tools, you have to tell them to install packages into the persistent volume.
For bundler, you can configure such a custom path inside the Dockerfile, e.g., /bundler:
RUN bundle config path /bundler
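For this to persist across rebuilds, the /bundler path also has to be declared as a volume on the data container, and the web service has to import it. A minimal sketch extending the compose example from above (the /bundler path is just the one configured here):

data:
  image: cogniteev/echo
  command: echo 'Data Container for PostgreSQL and bundler'
  volumes:
    - /var/lib/postgresql/data
    - /bundler
web:
  build: .
  volumes_from:
    - data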
Accessing host data for provisioning
Even if you create persistent data containers, you still cannot directly use the host application data from the volume that will later be mounted into the container.
While there are commands to add host files to the container during the build phase (e.g., ADD and COPY), this isn't really useful for automatically provisioning a development environment in the Dockerfile.
A common workaround is to ADD a careful selection of files into the container to pull dependencies. For example, a Gemfile can be copied in the Dockerfile and then installed to a data container during the build, as sketched below.
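A minimal sketch of that workaround for a Ruby application (the base image, paths, and file names are assumptions):

FROM ruby:2.2
# Make sure bundler itself is available.
RUN gem install bundler
# Copy only the dependency manifests, not the whole (later volume-mounted)
# code base, so that gems can be resolved at image build time.
ADD Gemfile /app/Gemfile
ADD Gemfile.lock /app/Gemfile.lock
WORKDIR /app
RUN bundle install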
I think that this is one of the major missing features of Docker for quickly rolling out a development environment on a new machine.
Accessing linked services during build phase
This is very similar to the above issue.
Linked services are not available in the build phase.
If you use Rails and want to create and migrate the database upon container construction, this is something you have to run manually with docker-compose run after building the image.
Imagine we have executed a Dockerfile for a Rails environment with docker-compose build web.
Then, you can use the linked services from docker-compose with:
docker-compose run web rake db:migrate
Unavailability of development helpers
If you live in the world of Rails development, there are a ton of incredible helpers that speed up development. Gems like letter_opener or capybara-screenshot display mails and test information in the local browser. In a dockerized application, these helpers are either non-functional or will fail with errors regarding X session allocation. The same applies to frontend testing with, e.g., Selenium, as it opens a live browser instance (unless used solely with PhantomJS). While there is a way to share a local X session with the container, this raises more questions than it answers (regarding portability) and requires even more manual interaction with the container.
Conclusion
If you want to get a new team member up to speed quickly with a complex application stack, Docker has become one of the best approaches towards that goal. With the recent inclusion of Compose into the Docker workflow, defining complex applications with a number of linked services has become trivial. However, preparing and maintaining the services of a complex application (e.g., migrating a database service) requires additional manual work, and a number of issues remain unsolved.
For my applications, a major missing component is some kind of provisioning functionality in the Dockerfile, which would allow me to use linked services and host volumes for bootstrapping databases and preparing the application code for development. Along with the remaining issues, I have somewhat mixed feelings towards Docker as a development environment. I think that it is a great way to get a local development environment started in a few, simple steps. But, if you rely on advanced tools in your development process, Docker may be of limited use when compared to a carefully crafted VM.
You can find an application of the ideas in this blog in my Docker setup for OpenProject. I also plan another blog post with the concrete dockerization process of an application stack with database, LDAP, and single-sign on services.
Resources
A number of interesting posts and resources on Docker for both development and production / deployment.
- Test, Develop, Build, Stage with Docker: A different perspective on Staging and Testing with Docker.
- Using Docker with Vagrant: An introduction to using Vagrant with NFS sync for Docker.
- Docker Misconceptions: A thorough reasoning on when (not) to use Docker.
- Why I don't use Docker much anymore: A critical view on the issues of Docker and its applications for development and production.