Containers are everywhere: it’s hard to go a day without hearing about the latest Docker feature, the newest post about how Kubernetes has revolutionized devops, or wondering how you’ll ever get employed if you don’t live and breathe and blog daily about containers.
But why now? Why, after decades of creating and scaling web applications, has containerization become the latest must-have for startups and established web companies?
My completely unscientific, anecdotal guess: the (relatively) rapid rise of interpreted languages used for web development, and the package managers that accompany them.
I Blame Heroku
This is all Heroku’s fault, and I mean that in the nicest possible way. The engineers at Heroku do wonderful things to produce a solid, predictable PaaS, and let people concentrate on the business logic of their application instead of the operational logic of deployment and availability.
One of the many nuggets of wisdom to come out of Heroku is the 12 Factor App Methodology. It describes, in roughly the correct order of importance, twelve essential components for any software-as-a-service application to be deployable to cloud-based providers. While the original motivation for the 12 Factor Methodology was to educate developers on writing their applications so that Heroku could run them more effectively, it has the very agreeable side-effect of popularizing a checklist for those who wish to deploy their application with ease of horizontal scalability in mind.
While many of them are seen as common sense or best practices, the second of the twelve factors, “Explicitly declare and isolate dependencies”, is the most insidiously difficult of all of them.
Dependency management issues popped into existence about 15 minutes after the first lines of software were ever written. We’ve been battling them ever since.
apt-get install sanity
Anyone who has been in this game long enough can list off a dozen times where a production system was brought to its knees because of an unintended library version mismatch, or due to some fundamental difference in how the operating system implements certain primitives.
Which brings me to my main point: difficulties in package management for interpreted languages are one of the primary reasons (if not the primary reason) why Docker and other containerization solutions have recently exploded in popularity.
That’s a fairly loaded statement. Let me explain.
First, a Caveat
I like the idea of containers. I think the people who work on containerization technology are incredibly talented, and have skills dealing with low-level systems programming that I can only hope to attain myself, one day in the distant future. Containers are an incredibly versatile tool that can be used to solve a variety of problems, and have already allowed the creation of entire software industries (e.g. AWS Lambda et al.) that previously did not exist.
The PATH Manipulation Crutch
Nearly every interpreted language that I can think of[1] uses a package manager to fetch whatever dependencies are required to run an application, and store them in some special filesystem location. That location is then added to whatever lookup path the language uses to find dynamic dependencies, so that at runtime (as opposed to at compile time, which is the major differentiating factor between static and dynamic linking), the correct packages can be loaded[2]. For some, this lookup path is isolated by default (e.g. Node/npm), and for others, it is shared system-wide (e.g. Python, Ruby).
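To make the lookup path a little more concrete, here is a minimal sketch of how you might inspect it in Python. The function name `describe_lookup_path` is my own invention for illustration; the `sys` and `sysconfig` calls are from the standard library, and the exact directories you see will differ from machine to machine.

```python
import sys
import sysconfig


def describe_lookup_path():
    """Report where this interpreter searches for, and installs, packages.

    Hypothetical helper for illustration only.
    """
    return {
        # Ordered list of directories that `import` searches at runtime.
        "search_path": list(sys.path),
        # Directory where pure-Python third-party packages get installed.
        "install_dir": sysconfig.get_paths()["purelib"],
        # Inside a virtual environment, sys.prefix differs from sys.base_prefix.
        "isolated": sys.prefix != sys.base_prefix,
    }


if __name__ == "__main__":
    info = describe_lookup_path()
    print("Packages are installed to:", info["install_dir"])
    print("Running in a virtual environment:", info["isolated"])
    for entry in info["search_path"]:
        print("  searched:", entry)
```

Run outside of a virtual environment, the install directory is typically the shared, system-wide one, which is exactly the situation that leads to trouble once two projects disagree about a version.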
In the early days, that special filesystem location would be the same for every application: all your dependencies for all your projects would end up in the same place.
Then the inevitable would occur, and you’d have two projects that required two incompatible versions of the same 3rd party library. The solution to this was, often enough, to change the search path (e.g. $PYTHONPATH in Python or $GEM_PATH for Ruby) so that each project would use separate filesystem locations for storing and loading dependencies. This practice was common enough that it became a de facto standard, and various tools for creating and manipulating so-called virtual environments were born[3].
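The search path trick can be sketched in a few lines of Python. This is a toy illustration rather than how any real virtual environment tool is implemented: the library name `xmlparser_demo`, the project names, and the helper functions are all made up, and the "library" is a one-line module carrying only a version string.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def make_project_deps(root: Path, project: str, version: str) -> Path:
    """Create a per-project dependency directory holding one library version."""
    deps = root / project / "deps"
    deps.mkdir(parents=True)
    # A stand-in for a real 3rd party library (hypothetical name).
    (deps / "xmlparser_demo.py").write_text(f'VERSION = "{version}"\n')
    return deps


def load_version_from(deps: Path) -> str:
    """Import the library with `deps` placed first on the search path."""
    sys.modules.pop("xmlparser_demo", None)  # forget any earlier import
    importlib.invalidate_caches()
    sys.path.insert(0, str(deps))
    try:
        return importlib.import_module("xmlparser_demo").VERSION
    finally:
        # Restore the search path so the next project starts clean.
        sys.path.remove(str(deps))
        sys.modules.pop("xmlparser_demo", None)


def demo() -> dict:
    """Two projects, two incompatible versions, one interpreter."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        deps_a = make_project_deps(root, "project_a", "1.0")
        deps_b = make_project_deps(root, "project_b", "2.0")
        return {
            "project_a": load_version_from(deps_a),
            "project_b": load_version_from(deps_b),
        }


if __name__ == "__main__":
    print(demo())
```

Each project resolves the same import name to a different file purely because of which directory comes first on the search path. Setting $PYTHONPATH before launching the interpreter has a similar effect on `sys.path`.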
Now, this tends to work about 90% of the time. The other 10% fails miserably because your dependencies cross a boundary whereby your application requires some interpreted library to run, but that library itself depends on a binary system dependency.
A typical example: you need to parse an XML document in your application. No problem, there’s most certainly some 3rd party library that can help you with that. You install whatever xmlparser library you need via your interpreted language package manager, but it fails with a “fatal error: libxml/parser.h: no such file or directory” message. “That’s an easy enough fix,” you say, and apt-get install libxml2-dev (or whatever), so that your language package manager can compile the necessary C-based extension to get it to work.
Except that now, you’ve shifted the conflicting dependency problem down the chain to your operating system package manager[4]. And, if you’ve ever tried something like this, you know that vendoring those kinds of libraries is a whole other world of pain involving manual editing of Makefiles at best, and diving into Autotools at worst. That’s something that I wouldn’t wish upon my worst enemy. Well, maybe the Makefile hackery. But definitely not the mucking about with Autotools. That’s just evil.
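The boundary between interpreted and binary dependencies can be probed from inside the language itself. Below is a rough sketch using Python’s `ctypes.util.find_library`. The helper name `missing_system_libraries` is hypothetical. One important caveat: this only checks whether the shared library can be found at runtime; it says nothing about the header files (such as libxml/parser.h) that a package manager needs in order to compile a C extension.

```python
from ctypes.util import find_library


def missing_system_libraries(names):
    """Return the names whose shared library cannot be located.

    Hypothetical helper: checks runtime libraries only, not the
    development headers needed to compile C extensions.
    """
    return [name for name in names if find_library(name) is None]


if __name__ == "__main__":
    # "xml2" is the short name for libxml2; add whatever your project needs.
    missing = missing_system_libraries(["xml2"])
    if missing:
        print("Missing system libraries:", ", ".join(missing))
    else:
        print("All required system libraries were found.")
```

A check like this tells you that something is missing, but the fix still lives in the operating system package manager, outside the reach of the language’s own tooling.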
There’s a reason Debian and other Linux OS maintainers spend a huge amount of time and effort ensuring that packaged versions of software do not conflict with other libraries in the officially supported tree, and it’s not because they have a lot of free time on their hands.
Now, imagine you are a large-ish technology company, and you have dozens of projects in almost as many languages, each with their own set of possibly conflicting dependencies. While this might not matter much in production because you can architect services to exist on separate platforms (or find a creative way to ensure that they coexist on the same platform), it most definitely causes a problem for developers that need to work on several of these at a time.
Now, I don’t have any concrete proof of this, but my hunch is that the rapidly evolving nature of web application development means that we tend to run into dependency conflicts more often than most. New frameworks and plugins and technologies excite and attract us, like moths to a candle, and development of popular 3rd party libraries means that they will often get multiple new versions released in a month, not to mention over the course of a long-term project.
This means that, as a whole, web developers are more susceptible to unintended (and sometimes unavoidable) version conflicts in the applications they build. Couple that with the fact that web development was often done on a host machine (Windows, OS X) that was very different from the deployment target (Linux, BSD), and we have a perfect storm for frustration.
And what do we end up doing, as developers, when faced with a systemic frustration in our work? We figure out a way to automate it or abstract it out of our workflow.
For a while, we handled this with virtual machines: VirtualBox was a boon to the industry, since we had a zero-cost solution that worked across platforms, and with some work you could get your virtual machine to be as close to your production environment as you possibly could. People who wrote applications in interpreted languages (myself included) jumped on the VirtualBox train with enthusiasm.
Vagrant took this one step further, and gave us an easy-to-use, scriptable API for interacting with VMs, eliminating the need for interacting with a GUI to create or manage an entire fleet of isolated environments.
Docker, born out of the now defunct dotCloud PaaS as an internal project, is another manifestation of our desire to automate and abstract away frustrations. Sure, it was originally developed to solve a very specific set of problems that PaaSes face, but it spread like wildfire once it was opened to the public[5]. One of the reasons why is that it took away some of the pain points of virtual machines: they were slow to start (since every VM instance has to boot an entire guest operating system, kernel included, on top of a hypervisor), and creating reproducible, distributable images took quite a bit of effort the moment you wanted more than a simple snapshot.
Docker and other containerization platforms have quite the spectrum of pros and cons[6], but a huge reason why they have been making such a splash is that they have helped developers manage application dependencies in a repeatable, predictable manner. So much so that we’ve accepted them as one of the de facto technologies for modern application development.
Are containers a web development renaissance in the making? I’m not sure. Are they solving more problems than they create? Most likely.
While I may have my reservations about this, there’s no denying that containers and the containerization trend aren’t going to let up any time soon.
1. I'm sure there are some exceptions to this that I simply haven't thought about. Feel free to let me know; I'm quite curious about interesting details such as these.
2. Perhaps surprising to some, but you can typically recompile an interpreter to bake in whatever external libraries & packages you desire, thus blurring the line between static and dynamic linking. You would simply package up a new version of the interpreter for each application that you intended to run instead of using the same interpreter (but with different dependency load paths) for every application.
3. Popular examples: virtualenv/venv for Python, Bundler for Ruby, Composer for PHP, npm for Node, local::lib for Perl.
4. There's nothing stopping you from downloading the source and compiling it yourself, of course. At times that may be the best (or the only) option, but it comes with its own set of problems.
5. Docker was not the first attempt at so-called containerization, but it would be hard to argue that it is not the best known.
6. Some recent articles discussing this (and related orchestration software):
- Docker usage rises, but high portability pointless for most
- The Kubernetes Effect
- What are the pros and cons for using Docker on a production server for containerizing services?