Why we use Docker
Sometimes an innovation has such an impact that, when talking to developers, it seems everybody has decided to learn the same thing at about the same time. Maybe that’s Rust, or Haskell, or Ruby, or Python, or Perl, or C++, or C, or Turbo Pascal or BASIC. Maybe it’s the novelty of being able to throw up a few virtual machines and a firewall and load balancer on AWS, or compile your own Linux kernel. Docker is now so ubiquitous it’s easy to forget that it was only fairly recently that everyone I talked to had a personal side project to evaluate Docker.

At long last we could finally stop our software installations from interfering with each other without incurring the costs of a rack of servers or even multiple copies of the kernel. Someone had finally built some momentum behind a tool set that makes use of the namespace isolation features of our operating systems. No longer did we watch in frustration as a few smart people at unicorn tech companies put tantalising container isolation patches into the Linux kernel, patches that weren’t usable without a user-level tool stack with a community behind it. Even better, Docker provided a standard way to distribute programs with their support files and dependencies. No more bleary-eyed nights getting a Debian package ready only for someone to ask for Red Hat Enterprise, or (imagine the indignity) a Windows executable.
The complexity and limitations of docker build
Using Docker requires you to put the software you work with into containers. That’s fiddly without tools to help, so Docker provides docker build, which interprets its own Dockerfile format and drives the Docker daemon to produce filesystem layers. There’s quite an extensive reference for Dockerfiles in the Docker documentation, which is a dense 74-minute read. When you’re done with that you can spend another 30 minutes on the best practices for writing Dockerfiles.
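As a reminder of the shape of the format, here’s a minimal sketch of a Dockerfile (the base image, package and paths are illustrative, not from any particular project):

```dockerfile
# Each instruction below becomes one step of the build,
# and most of them produce a filesystem layer.
FROM debian:bookworm
RUN apt-get update && apt-get install -y --no-install-recommends nginx
COPY site/ /var/www/html/
CMD ["nginx", "-g", "daemon off;"]
```

You’d build this with something like `docker build -t mysite .`, with the Dockerfile sitting at the top of the build context directory.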
If you’ve been working with docker build for a while, it’s easy to forget:
- Dockerfiles only build Docker containers. There’s no way to target a direct installation to a physical or virtual machine. Or a Windows executable. Or a Debian package. Or your own laptop OS, for instance so that your IDE can run your code directly in a debugger. Even if you feel that Docker is a perfect choice for distributing and running your software, you or your predecessors may already have made a substantial investment in configuring a server, perhaps by developing a script or using other tools to install your software.
- The basic model is that every command you run in the Dockerfile produces a new layer. Nice idea, but layers aren’t free; early on, people noticed that a 20 line Go program produced a 500MB docker image. That can add up if you are running 20 microservices, each as its own Docker image, with a big team producing changes quickly and each change rebuilding a whole stripe of your services at once. You also need to take care that you don’t, for instance, copy service tokens or ssh private keys into a docker build and then delete them further down the Dockerfile, since they remain present in the earlier image layers.
- There are significant limitations and gaps around constructs such as if statements, loops, blocks, template expansion and error handling.
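The secrets point in particular catches people out. A sketch of the anti-pattern (the key path and repository are invented for illustration):

```dockerfile
FROM debian:bookworm
# The key is copied in as its own layer...
COPY id_rsa /root/.ssh/id_rsa
RUN git clone git@example.com:org/private-repo.git /src
# ...so deleting it in a later instruction doesn't remove it
# from the image: the earlier layer, key included, is still
# shipped, and can be recovered from the image with docker save.
RUN rm /root/.ssh/id_rsa
```

The usual mitigations are multi-stage builds, or BuildKit’s `--mount=type=secret`, which makes the secret available to a single RUN step without writing it into any layer.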
What I’ve seen across multiple projects is that one developer becomes the Dockerfile expert and keeps the Dockerfiles reasonably fast and efficient. Alternatively, no one gives the Dockerfiles much attention and docker builds can introduce significant overheads. Tomas Tomecek describes the same problem with docker-compose.
Using Ansible to build docker images
Ansible can be used to build container images without Dockerfiles and docker build. The original proposition of Ansible is to:
Run code on specific ssh target machines, handle errors and output appropriately, and provide common programming constructs layered on top of that ability.
Over the years, Ansible has been generalised with other connection plugins alongside ssh. Ansible has gained the ability to target docker containers as well as contexts such as Windows PowerShell remoting. We should therefore regard Ansible now, in its more general form, as:
Run code in specific execution contexts, handle errors and output appropriately, and provide common programming constructs layered on top of that ability.
Of course, you can just write code to manipulate ssh servers yourself, using ssh libraries or wrapping the ssh command line tool, and I spent a lot of time working in that paradigm on previous projects. Colleagues seem to find my Ansible playbooks much easier to work with and extend than my traditional programs that do the same thing, especially if I set my programs up to process lots of targets in parallel, which Ansible does as standard. Furthermore, Ansible has a wonderful library of thousands of modules, and plenty of them are useful for constructing docker containers.
Here’s an Ansible input file (which Ansible calls a playbook) for building a container image:
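A condensed sketch of the kind of playbook I mean follows; the container, image, package and file names here are placeholders, not a definitive implementation:

```yaml
---
- name: Get a working container
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Check whether the build container already exists
      community.docker.docker_container_info:
        name: buildbox
      register: existing

    - name: Create the container from a base image if it is missing
      community.docker.docker_container:
        name: buildbox
        image: debian:bookworm
        command: sleep infinity
      when: not existing.exists

    - name: Tell Ansible about the new container
      add_host:
        name: buildbox
        ansible_connection: community.docker.docker

- name: Set the container up how we want
  hosts: buildbox
  gather_facts: false
  tasks:
    - name: Install the packages we need
      apt:
        name: [nginx]
        update_cache: true

    - name: Ensure a config file has been set up
      lineinfile:
        path: /etc/nginx/nginx.conf
        line: "worker_processes 2;"

- name: Commit the result as an image
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Snapshot the container as a reusable image
      command: docker commit buildbox mysite:latest
```

One caveat worth knowing: most Ansible modules need a Python interpreter inside the target, so pick a base image that has one, or bootstrap it with the `raw` module first.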
First we check whether a suitable container already exists. The first time you run this, the container will be missing on your machine, so the Ansible docker_container module is used to create one. When we run it again, we’ll reuse the same container, unless you decided to delete it in the meantime. We then tell Ansible about the new container, and the rest of the playbook sets up the container how we want. I’ve used a few Ansible modules such as the “lineinfile” module, which concisely ensures that a config file has been set up.
All these steps can be safely repeated (they are what I like to think of as “idempotent”), so while we are developing our Ansible code we don’t have to keep waiting for the same work to happen. The Ansible modules will quickly notice that the packages we asked for are already installed, or that we’ve already updated the config file, and so on. That’s one of the key benefits of Ansible: because thousands of Ansible modules are idempotent, you can safely re-run them, which saves you from resetting everything and waiting for it all to happen again while you are developing your system.
In this playbook, the setup tasks target a docker container, but the exact same code can run against a Linux machine, with no containers involved anywhere. So, if you find yourself converting an existing Linux machine to run as a container, you can develop those tasks against the existing machine, using --check mode to confirm that your Ansible configuration actually detects that nothing needs to be done. Then, you switch the same Ansible code over to build a container. Or, if you want to support both running as a container and as a Linux machine, you can move your Ansible setup code into a role and call it from one playbook to set up a machine and from another playbook to make a container image.
It’s likely you’d then push the container image to a container repository, and then deploy it either using Ansible to pull the container to a server and start it, or using something like Kubernetes to create containers. Since container images are defined by a technology-neutral specification, it shouldn’t (and, in my experience, doesn’t) matter whether you use Dockerfiles or Ansible to create your container images.
This approach is suitable for running in a continuous integration system, and in that case you’d probably tag the image you create with the commit ID you used to build your containers.
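As a sketch of that tagging step (the registry and image names are invented), you might add a play like this to the end of the build:

```yaml
- name: Tag and push the image
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Find the commit ID we built from
      command: git rev-parse --short HEAD
      register: commit
      changed_when: false

    - name: Tag the local image with the commit ID and push it
      community.docker.docker_image:
        name: mysite
        repository: registry.example.com/mysite
        tag: "{{ commit.stdout }}"
        source: local
        push: true
```

That way every image in the registry can be traced straight back to the commit that produced it.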
For a simple example like this, the arrangements we make to handle containers may seem a little verbose compared to a Dockerfile. However, in larger projects that make extensive use of Ansible for other operational needs this all blends in fairly well, and you’d probably move some of the task list into role files. With Dockerfiles, it is very common to call a collection of scripts to handle the details of setting up an image, since that gives you a wider range of programming features than Dockerfiles provide. I’m also aware of a tool called ansible-bender which sits on top of Ansible and hides some of this detail, but I haven’t tried it yet, since I find the approach simple enough that a more complex tool isn’t required.
As often happens, early choices become standard practice when something takes off, and at that point it is hard to change the original good-enough choices. Retrofitting the kind of power available in Ansible to docker build would be very hard to do without breaking compatibility with old Dockerfiles, and so I’m glad to see the standardisation of docker images that lets us choose the best tool for the job, rather than necessarily being stuck with the limitations of the bundled docker tools.