To containerize or not to containerize, that is the question, or Containers vs VMs: the eternal debate
Here at Mirantis we do a lot of thinking about how to move traditional monolithic workloads to the cloud, and the first thing that we have to determine is not so much how to move a workload, but whether a workload should be moved at all. In this article, we'll discuss some of the issues that you need to consider when making this decision about your own particular situation.
Although there are exceptions, moving your application to a cloud-based environment typically presents you with two basic tools: virtual machines, or containers. Although in many cases the simplest solution seems to be to "lift and shift" the application into a VM and call it a day, that's often not the best solution.
Let's look at the different factors that can tell you when to containerize applications (and when not to use containers at all).
Containerized vs. non-containerized solutions: the differences between containers and VMs
Before we talk about when to use (or when not to use) containers and VMs, it's important to understand the differences between the two architectures.
A virtual machine (VM) is exactly what the name implies: an abstraction of an entire computer, from the operating system down to memory and storage. The image from which a VM is built can represent just the operating system, on which applications can then be installed, or it can include all of the applications you need, such as a web server and database, and even your application itself. Each VM is completely isolated from the host on which it runs, as well as from any other VMs on that host.
Containers, on the other hand, are designed to occupy part of an existing machine, sharing the kernel of their host with any other containers running on the system and including just enough of the operating system and any supporting libraries to run the required code. They're built from images that include everything they need -- and ideally, nothing else.
Because of these structural differences, the resources required to run VMs and containers can vary significantly. A VM is essentially an entire computer, so it naturally requires more resources than a container, which includes just a minimal portion of an operating system. In general, then, containers are less resource-intensive to scale, and you can "fit" more of them than VMs on a single server.
It's important to note, however, that because multiple services can "share" the resources of a single VM, there are edge cases where scaling up the multiple containers needed to replace a single VM can wipe out any resource savings. For example, if you were to decompose the functions of a single VM into, say, 50 different services, that's 50 partial copies of the operating system and its libraries versus one full copy. So be sure to understand exactly what you're getting into.
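To make that arithmetic concrete, here's a quick back-of-the-envelope sketch in Python. All of the sizes are hypothetical -- real OS and library footprints vary widely -- but they show how per-container overhead can add up once you multiply it by dozens of services.

```python
# All sizes are hypothetical, for illustration only.
VM_OS_OVERHEAD_MB = 2048     # one full guest OS for the single VM
CONTAINER_OVERHEAD_MB = 64   # partial OS plus duplicated libraries per container
NUM_SERVICES = 50            # services decomposed out of the original VM

def total_overhead_mb(units: int, per_unit_mb: int) -> int:
    """Total operating-system overhead for `units` copies of a given type."""
    return units * per_unit_mb

vm_total = total_overhead_mb(1, VM_OS_OVERHEAD_MB)                        # 2048 MB
container_total = total_overhead_mb(NUM_SERVICES, CONTAINER_OVERHEAD_MB)  # 3200 MB

# With these (made-up) numbers, 50 partial OS copies actually cost more
# than the one full copy they replaced.
print(vm_total, container_total)
```

The crossover point depends entirely on how thin your base images are and how many services you split out, which is exactly why it pays to do this math for your own workload.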
The question of whether VMs or containers are more secure is a contentious one, and a complete discussion is well beyond the scope of this article, but let's touch on some of the major themes. (Interested in a whole blog post on the topic? Let us know in the comments!)
While VMs are fairly strictly isolated from each other, containers share a kernel, so if one container is compromised, others on that host may be in danger. What's more, libcontainer, which Docker uses to interact with Linux, touches five separate namespaces: Process (PID), Network, Mount, Hostname (UTS), and Shared Memory (IPC). Each provides an opportunity for security issues.
In addition, Adrian Otto, former PTL of the OpenStack Magnum containers project, notes that "VMs have small attack surfaces, while in the Linux 3.19 kernel, there are no fewer than 397 system calls for containers."
In other words, containers are much more open to attack, at least in theory. However, while VMs may have a smaller attack surface than containers, you still need to consider the entire virtualization platform -- it's not impossible to "break out" of a VM. Mirantis security expert Adam Heczko notes that "[The popular hypervisor] Qemu is affected by roughly 217 vulnerabilities so far, and there have been 3 VM escape attacks in the wild. I'm not sure that VMs are more secure than containers, the threat model is just radically different as the architecture is."
Another aspect of security to consider is that while users typically create their own VM images that run the software they need, containers -- specifically, Docker containers -- are designed to build upon each other.
For example, let's say you were creating a web-based search application. You might create a container image as follows:
Start with a minimal operating system, such as Alpine
Deploy a web server such as Nginx
Deploy a search application that runs on Nginx
The issue here is that while you can be fairly confident in the first two layers of this image -- as long as you use the "official" references, that is -- that last application is a mystery, unless you take the time to dig into it and find out what's really there. Anybody can add an image to a repo and call it anything they want. So if your developers decide to grab an Nginx image with the search application they want already installed, unless they do some due diligence to make sure they're getting what they think they're getting, you could be inviting real problems into your datacenter.
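To make the layering concrete, here's what such a build might look like as a hypothetical Dockerfile. The image tag, file paths, and the search app itself are all illustrative, not a recommendation:

```dockerfile
# Layer 1: minimal operating system
FROM alpine:3.19

# Layer 2: web server, installed from Alpine's package repository
RUN apk add --no-cache nginx

# Layer 3: the search application -- in practice, this is the layer
# you can't vet just by trusting an "official" base image
COPY search-app/ /usr/share/nginx/html/

EXPOSE 80
CMD ["nginx", "-g", "daemon off;"]
```

Building the image yourself, layer by layer like this, is exactly the due diligence that gets skipped when someone pulls a prebuilt "Nginx plus app" image off a public repo.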
Now let's look at the pros and cons of each architecture.
Pros and cons of VMs
While it's fashionable to proclaim the death of VMs in favor of containers, the reality is that, like most things in life, they have benefits and drawbacks. Here are a few that can give you some idea of when not to containerize and to use a VM instead.
The positive features of VMs include:
Complete abstraction of the system: Because all of the pieces of your application run on the same "server" or servers, communication between them is straightforward, with no need for additional complicated networking. This can make both production and development much simpler on a VM.
No need to decompose the application: Because you're running in an environment similar to a bare metal machine, there's no need to alter the architecture of the application itself.
Run multiple applications at the same time: It's common to run multiple applications on a single VM, simplifying management of the overall infrastructure.
Secure: Virtual machines have a long track record of use, and are considered to be fairly secure, providing isolation with a fairly small attack surface, though you should keep in mind the caveats in the "Security" section above (and below).
Diverse operating systems available: Within a hypervisor, you can use virtually any operating system, so you can run multiple operating systems on a single physical server.
The drawbacks of VMs include:

They can be big: Because they include so much, VMs can be large, both in terms of the images required to define them and the resources needed to run them.
They can be slow to start: Starting a VM is the same as starting a computer; it can take some time. If you're just starting it once and letting it run for a few weeks or months or years, this may not be an issue. But if you're dealing with a process that must be constantly spun up, this latency can definitely be a problem.
They can be slow to run: Because a VM is essentially emulating a computer within a computer, applications running on VMs are often not as performant as those running on bare metal.
They can't be nested (easily): While it is possible to run a VM within another VM under some circumstances, it's not always an option. What's more, when it is an option, the performance penalty can be substantial.
They need careful security configuration: The platform that hosts your VMs needs to be carefully analyzed and configured to prevent security problems caused by bridging security domains -- that is, components that span multiple domains, such as public and management, or management and data.
Pros and cons of containers
Just as the death of VMs has been overstated, containers aren't universally great either. Let's look at the pros and cons here as well, so you know when not to use containers.

The positive features of containers include:
Relatively small size: Because containers share the host's kernel, include only the strictly necessary operating system and library components, and (generally) limit themselves to a single function, they tend to be very small.
Fast: Because they are small, they can start in a matter of seconds, or even less, making them useful for applications that need to be repeatedly spun up and down, such as so-called "serverless" applications.
CI/CD: Containers are made to start and restart frequently, which suits continuous integration and delivery, where a new image is built and deployed to pick up each change.
Portable: Because they're self-contained, containers can be moved between machines with relative ease, as long as a compatible kernel is in place.
Lifecycle and delivery model: The structure of the containerized lifecycle makes it easier to incorporate advanced features such as vulnerability assessments and image registry signing.
The drawbacks of containers include:

They can require complicated networking: Because functions are (ideally) broken out into multiple containers rather than running as a single unit, those containers have to communicate with each other over the network to get anything done. Some orchestration systems, such as Kubernetes, provide higher-level units such as multi-container pods that make this a little easier, but it's still more complex than running everything on one VM. That said, Adam Heczko adds, "Actually, the L3 networking model in Kubernetes is much simpler than the L2 model in OpenStack." So the amount of networking work you'll need to do depends on whether you're communicating between functions or between VMs.
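As an illustration of the multi-container pod idea, here's a minimal, hypothetical Kubernetes pod spec. The two containers in the pod share a network namespace, so they can reach each other over localhost without any extra wiring; the names and images here are illustrative only:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: web-with-cache        # hypothetical example pod
spec:
  containers:
  - name: web
    image: nginx:alpine       # serves the application
    ports:
    - containerPort: 80
  - name: cache
    image: redis:alpine       # reachable from "web" at localhost:6379
```

Anything outside the pod, however, still has to go through services and cluster networking -- which is where the added complexity lives.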
They can be less secure: As mentioned above, containers are a relatively young technology, and for a number of reasons they're not yet considered quite as secure as VMs, though your mileage may vary. So one example of when not to use containers is when a high level of security is critical.
They can require more work upfront: If you're using containers right, you will have decomposed your application into its various constituent services, which, while beneficial, isn't necessary if you are using VMs.
They can be unreliable: This sounds worse than it is -- containers are generally designed for cloud native computing, which assumes that any component can die at any time. You will, however, need to ensure that your application is properly architected for this eventuality.
Making your decision
In the end, the decision about whether to use containers or VMs is the same as most other IT decisions: "it depends".
If you're basically doing a "lift and shift" of your application, you may be better off with a simple VM deployment, where it will experience the least disruption. If you're creating a new application from scratch, you're probably better off starting with containers.
Fortunately, you don't have to make a hard-and-fast decision; even if you're starting with a 30-year-old monolith, you can always move it to a VM to start with, and then gradually decompose and containerize its various components.
More on that next time.