Dragons in the Algorithm
Adventures in Programming
by Michael Chermside

Before Dockercon


I am attending Dockercon and I thought I would try a little experiment. I will try writing up my understanding of Docker before the conference and after, and see how much my view has changed. So, without further ado, here is my "before" review of Docker:

What is Docker?

I like to think of Docker as "virtual machines done right". Virtual machines are an incredibly powerful concept. A virtual machine is just a computer that you run as a program on another computer. Once (years ago), the overhead of simulating a different computer was a huge burden, giving slowdowns of 5x or more when running software on the virtual machine instead of directly on the host system. Now, direct support for virtualization instructions on modern CPUs plus substantial progress in writing virtualization software has given us slowdowns of 1.05x or so -- hardly enough to worry about.

So now that virtual machines are a viable way to run software, we can do all kinds of things that make use of them. For one thing, the entire cloud computing movement is built on virtual machines. When you click the button to spin up a new server on Amazon Web Services (AWS), Amazon doesn't reformat the hard drive of a server and then start installing an operating system; instead they simply start up a new virtual machine on the giant farm of servers-for-running-virtual-machines that they operate.

Another thing that virtual machines can give us is reliable, repeatable configuration. Before discussing how that works, let me spend a few minutes talking about why it is so wonderful.

Getting environments to work right is my nemesis as a programmer. I set up my new laptop and install a bunch of libraries, tools, and OS features, then continue along doing my development for months. Suddenly one day my system stops working (or we get a new hire who can't get their system to work). What went wrong? Was it that some library got upgraded and we needed the older version? Was it an OS patch that affected the networking stack? Some property file that I adjusted by hand months ago but have now forgotten about? If my code suddenly stopped working I would be able to use the version control system to go back and see what changed. I could even branch it and keep working on the new feature in one branch while in another branch I roll it back to work on something different. But I can't track and roll back environment changes I made to my machine. I can't click a button and make a duplicate of my machine to give to a colleague or to run an experiment.

But if I'm running inside of a virtual machine then I CAN do these things. I can keep my execution environment under version control, with rollbacks and branching. This is enormously powerful, and it can completely eliminate the situation where things are working (or not) in one place but we don't know why and can't recreate it.

Now, there are two ways we can actually do this versioning. We can take snapshots of our virtual machine on a regular basis (or every time we change something) and save these and stash them in a version control system. Or we could keep a virtual machine image that has little more than the bare operating system, and after starting it run some scripts that install everything about our environment, keeping the scripts in a version control system. Keeping VM snapshots has the advantage that the servers start up faster (no need to wait for the script to kick things off), but keeping install scripts makes it possible to get meaningful diffs from changes (because the files are readable and VM images aren't), to merge branches (you can't merge VM images), and to make changes at different points in the process (for instance, applying OS patches on a VM image that has a webserver already launched is impossible without at least stopping the webserver). Therefore, we typically use scripts for deploying to virtual machines.
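To make the "meaningful diffs" point concrete, here is a minimal sketch in Python. The two script versions, the package names, and the script_diff helper are all made up for illustration; the only real API used is the standard library's difflib. Because install scripts are plain text, a diff between two versions shows exactly what changed in the environment, which is not something a binary VM snapshot can offer.

```python
import difflib

# Two hypothetical versions of an environment install script (illustrative only).
old_script = """\
apt-get install -y nginx
pip install requests==2.9.1
""".splitlines()

new_script = """\
apt-get install -y nginx
pip install requests==2.10.0
""".splitlines()


def script_diff(old, new):
    """Return only the added/removed lines between two script versions."""
    diff = difflib.unified_diff(old, new, "install.sh@v1", "install.sh@v2", lineterm="")
    return [
        line
        for line in diff
        # Keep +/- change lines, but skip the "---"/"+++" file header lines.
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


changes = script_diff(old_script, new_script)
for line in changes:
    print(line)
```

Running this prints one removed line and one added line, which is the kind of answer I want when asking "what changed in my environment since last month?"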

So, you say, that's been an interesting discussion about virtual machines, but what does this have to do with Docker? And what IS Docker anyway? Well, I would claim that Docker is a system for launching virtual machines.

Docker is based on a version of the Linux operating system with a couple of relatively minor changes. The first is that it can safely run several different process spaces within a single OS, keeping them isolated so that the execution of one cannot affect the others. This is pretty much the same as running separate virtual machines on a host OS, except that with normal virtual machines the hardware level and basic IO like networking live outside the virtual container, and the OS and your applications live inside the virtual system. With Docker, the hardware, basic IO, AND the operating system live outside the virtual container and your applications live inside. This is why I think of Docker as a virtual machine system, only one where the OS exists outside of the virtualization instead of inside.

The other change Docker makes is that it provides a way to access system resources in a shared manner. Network connections in the virtual machine, for instance, are mapped to network connections in the host machine (this is something that all virtual machines need to do somehow). And for the particular resource of file storage, Docker uses a clever trick that lets multiple Docker images share the exact same disk space for the files that they have in common -- so you could imagine running multiple copies of the same image and there would only be one copy of the entire filesystem except for logfiles and any configuration files that were set differently across the images.
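The file-sharing trick can be pictured with a small Python analogy. This is only a toy model under my own assumptions, not how Docker's storage is actually implemented: the file paths, contents, and the start_container helper are hypothetical. Each "container" gets its own empty writable layer stacked on top of one shared, read-only base layer, so files are stored once unless a container changes them.

```python
from collections import ChainMap

# Shared base layer, stored once (hypothetical paths and contents).
base_layer = {
    "/bin/app": "application binary",
    "/etc/app.conf": "port=80",
}


def start_container(base):
    """Stack a fresh, empty writable layer on top of the shared base layer."""
    return ChainMap({}, base)


web1 = start_container(base_layer)
web2 = start_container(base_layer)

# Writes go to the container's own top layer; the base is never modified.
web1["/etc/app.conf"] = "port=8080"
web1["/var/log/app.log"] = "started"

print(web1["/etc/app.conf"])  # web1 sees its own changed copy
print(web2["/etc/app.conf"])  # web2 still sees the shared original
print(web1.maps[0])           # only the files web1 changed or created
```

In this model the only per-container storage is the changed configuration file and the logfile; everything else is read straight from the single shared layer.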

Now, remember how I explained earlier that for virtual machines we have (mostly) chosen to version-control the startup scripts instead of actual virtual machine images? Well, in the Docker world people have chosen to do this the other way around: they mostly store images, with only tiny amounts of scripting run at launch. This is mostly a cultural difference: there are no fundamental issues that would prevent us from doing the same with other virtual machines, but nevertheless it makes a difference. Docker images are quick to start up, and as we move things from Dev, to QA, to production we tend to pass around a single Docker image, which helps keep things consistent across the environments.

Posted Mon 20 June 2016 by mcherm in Programming