Cloud Prizefight: OpenStack vs. VMware
February 7, 2013
There have been many discussions in the cloud landscape comparing VMware and OpenStack. In fact, it’s one of the most popular topics among people considering OpenStack. I’ve given a couple of presentations on this topic to the SF Bay OpenStack Meetup, and many peers have asked me to write about it. To make it interesting, I’ve structured this as a head-to-head bout between two cloud software contenders competing for a place in your data center. Some of the aspects I will consider are open vs. closed systems, enterprise legacy applications vs. cloud-aware applications, free vs. licensed software, and well-tested features vs. controlling your own roadmap.
The contenders will be judged in the following categories: design, features, use cases, and value. The categories will be scored on a 10-point scale and then tallied to determine the winner.
Round 1: Design
VMware’s suite of applications was built from the ground up, starting with the hypervisor. The ESX(i) hypervisor is free and provides an excellent foundation for VMware orchestration products such as vSphere and vCloud Director. The software is thoroughly tested and has a monolithic architecture. Overall, the product is well documented and has a proven track record, with high-profile customers running it at multi-data-center scale. That said, the system is closed and the roadmap depends entirely on VMware’s own objectives, leaving consumers with no control.
OpenStack is open source, and no single company controls its destiny. The project is relatively new (about two years old) but has huge market momentum and the backing of many large companies (see: companies supporting OpenStack). Because so many companies devote resources to OpenStack, it does not depend on any single one of them. However, its deployment and architecture have a steeper learning curve than VMware’s, and the documentation is not always current.
VMware takes a small lead in design with excellent documentation and an easy-to-use interface for deployment and management. OpenStack is no slouch here, though: it was designed from the ground up for flexibility and is vendor-agnostic with respect to hardware and hypervisors.
Round 2: Features
vMotion is the building block for three vSphere features: DRS, DPM, and host maintenance. vMotion, VMware’s live migration, moves a running VM from one host to another with zero downtime, and it has traditionally relied on shared storage. When a VM moves from one host to another, its RAM state and data must follow it to the new host. Because the storage is shared, the data does not need to move at all; only the link to the data changes from one host to the other. This makes for a fast transition, since the data does not need to be copied over the network.
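To make the "only the RAM state moves" point concrete, here is a simplified model of iterative pre-copy, the approach commonly used for live migration: copy all of RAM while the VM keeps running, then re-send the pages dirtied during each round until what remains is small enough to send during a brief pause. This is a sketch under stated assumptions (a constant dirty rate, a fixed stop threshold, and hypothetical function and parameter names), not VMware’s implementation. Note that the disk never appears in the calculation, because it stays on shared storage.

```python
def precopy_migration(ram_mb: float, bandwidth_mb_s: float, dirty_mb_s: float,
                      stop_threshold_mb: float = 64.0, max_rounds: int = 30):
    """Toy model of a shared-storage live migration using iterative pre-copy.

    Only guest RAM crosses the network; the disks stay on shared storage.
    Returns (total_mb_sent, pre_copy_rounds, pause_seconds).
    """
    if dirty_mb_s >= bandwidth_mb_s:
        raise ValueError("memory is dirtied faster than it can be copied; pre-copy won't converge")
    to_send = float(ram_mb)  # round 1 copies all of RAM while the VM keeps running
    total_sent = 0.0
    rounds = 0
    while to_send > stop_threshold_mb and rounds < max_rounds:
        round_time = to_send / bandwidth_mb_s
        total_sent += to_send
        to_send = dirty_mb_s * round_time  # memory re-dirtied while this round was copying
        rounds += 1
    # Stop-and-copy: pause the VM briefly, send the last dirty pages, resume on the target.
    pause_s = to_send / bandwidth_mb_s
    total_sent += to_send
    return total_sent, rounds, pause_s
```

With a hypothetical 4 GB VM dirtying 20 MB/s over a link that sustains 110 MB/s, the model converges in three rounds and ends with a pause of roughly 0.2 seconds. If the dirty rate approaches the link speed, pre-copy stops converging, which is why the sketch rejects that case.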
*As of vSphere 5.1, VMware supports live migration without shared storage.
OpenStack Live Migration
KVM live migration moves a running guest operating system from one hypervisor host to another. You can migrate a guest back and forth between AMD and Intel hosts. A 64-bit guest can only be migrated to a 64-bit host, but a 32-bit guest can be migrated to either. The guest should not be affected during the migration, and clients can continue to use it while the migration runs. The main dependency here is shared storage, which can be expensive.
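The placement rules in the paragraph above can be written down as a small check. This is a minimal sketch assuming a hypothetical `Host` record and `can_live_migrate` helper (neither is an OpenStack or libvirt API). It encodes only the rules stated here; real deployments also need compatible CPU feature sets between hosts, which the sketch ignores.

```python
from dataclasses import dataclass


@dataclass
class Host:
    name: str
    bits: int             # 32 or 64
    cpu_vendor: str       # "AMD" or "Intel"
    shared_storage: bool  # True if the host mounts the shared instance store


def can_live_migrate(guest_bits: int, src: Host, dst: Host) -> bool:
    """Apply the live-migration rules from the text (hypothetical helper)."""
    # A 64-bit guest needs a 64-bit destination; a 32-bit guest can go to either.
    if guest_bits == 64 and dst.bits != 64:
        return False
    # Classic live migration relies on both hosts seeing shared storage.
    if not (src.shared_storage and dst.shared_storage):
        return False
    # CPU vendor (AMD vs. Intel) is deliberately not checked, following the text.
    return True
```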
Live migration requirements:
OpenStack Block Migration
In OpenStack, shared storage is not required for VM migration, thanks to support for KVM block migration. In this scenario, both the RAM state and the disk data are moved from one host to another. The downside is that it takes longer and consumes CPU resources on both the source and destination hosts. There are use cases where block migration is a better option than classic live migration, because the ability to migrate VMs using only the network can be priceless. This is especially true when the main reason for moving VMs is host maintenance. Some deployments have no shared storage but still need to perform maintenance on compute nodes, such as kernel or security upgrades, and cannot accept VM downtime. In this case, block migration is the ideal solution.
For example, a user doesn’t have a distributed filesystem and doesn’t want one, for understandable reasons such as the cost of enterprise storage and network latency, but still wants to perform maintenance on hypervisors without interrupting VMs.
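Why block migration takes longer comes down to what has to cross the network. Here is a back-of-the-envelope sketch with hypothetical helper names. It ignores dirty-page re-sends, protocol overhead, and the extra CPU load on both hosts.

```python
def migration_transfer_gb(ram_gb: float, disk_gb: float, shared_storage: bool) -> float:
    """Data that must cross the network to move one VM (toy estimate)."""
    # With shared storage only the RAM state moves; block migration also copies the disk.
    return ram_gb if shared_storage else ram_gb + disk_gb


def transfer_minutes(gb: float, link_gbit_s: float) -> float:
    """Minutes to push `gb` gigabytes over a link of `link_gbit_s` gigabits per second."""
    return gb * 8 / link_gbit_s / 60  # 1 GB = 8 Gbit
```

For a hypothetical VM with 4 GB of RAM and a 40 GB disk on a 1 Gbit/s link, that is about half a minute of transfer with shared storage versus almost six minutes with block migration.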
VMware DRS and DPM
DRS (Distributed Resource Scheduler) leverages vMotion by dynamically monitoring the resource usage of VMs and hosts at runtime and moving VMs to balance load efficiently across hosts.
DPM (Distributed Power Management) leverages vMotion by moving VMs off hosts during periods of low load and powering those hosts down to reduce power consumption. When load grows, DPM powers hosts back on and VMs are moved back onto them.
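The idea behind DRS can be illustrated with a toy greedy balancer: while the gap between the busiest and idlest host is too large, move one VM between them. This is a sketch under assumptions (loads as whole percentages of one host, a hypothetical `rebalance` function, a fixed spread threshold); VMware’s actual DRS algorithm is not described in this article and is certainly more involved.

```python
def rebalance(hosts: dict, max_spread: float = 10, max_moves: int = 10) -> list:
    """Toy DRS-style balancer (not VMware's algorithm).

    hosts maps host name -> {vm name: load in percent of one host's capacity}.
    While the gap between the busiest and idlest host exceeds max_spread, move
    one VM from the busiest host to the idlest one. Mutates `hosts` and returns
    the (vm, source, destination) moves, i.e. the vMotions to perform.
    """
    moves = []
    for _ in range(max_moves):
        load = {h: sum(vms.values()) for h, vms in hosts.items()}
        busiest = max(load, key=load.get)
        idlest = min(load, key=load.get)
        gap = load[busiest] - load[idlest]
        if gap <= max_spread:
            break
        # Moving a VM of size x changes this pair's gap to |gap - 2x|,
        # so the best single move is the VM closest to gap / 2.
        vm = min(hosts[busiest], key=lambda v: abs(hosts[busiest][v] - gap / 2))
        if hosts[busiest][vm] >= gap:
            break  # no single move narrows the gap
        hosts[idlest][vm] = hosts[busiest].pop(vm)
        moves.append((vm, busiest, idlest))
    return moves
```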
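DPM’s consolidation step is essentially a packing problem: fit the running VMs onto as few hosts as possible and power off the rest. The sketch below uses first-fit decreasing bin packing, a standard heuristic swapped in for illustration; it is not VMware’s DPM algorithm, and `plan_consolidation` and `usable_capacity` (capacity left after keeping some headroom) are hypothetical names.

```python
def plan_consolidation(vm_loads: dict, hosts: list, usable_capacity: float):
    """Toy DPM-style plan using first-fit decreasing bin packing.

    Returns (placement, hosts_to_power_off).
    """
    placement = {h: [] for h in hosts}
    used = {h: 0 for h in hosts}
    # Place the largest VMs first; each goes on the first host with room.
    for vm, load in sorted(vm_loads.items(), key=lambda kv: kv[1], reverse=True):
        for h in hosts:
            if used[h] + load <= usable_capacity:
                placement[h].append(vm)
                used[h] += load
                break
        else:
            raise ValueError(f"no host has room for {vm}")
    # Hosts left without VMs can be evacuated and powered down.
    idle = [h for h in hosts if not placement[h]]
    return placement, idle
```

With five hypothetical VMs totaling 90 percent of one host and a usable capacity of 80 percent per host, the plan needs two hosts and marks the third for power-off. When load rises again, the process runs in reverse.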
OpenStack includes schedulers for compute and volumes. The scheduler selects an appropriate host for your VM based on a list of attributes and filters set by the cloud admin. It is quite flexible and supports a wealth of filters, and consumers can also write custom filter queries in JSON. While the scheduler is flexible and highly customizable, it is not quite a replacement for DRS: it chooses a host when a VM is placed, whereas DRS keeps rebalancing VMs while they run.
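The filter-then-choose flow, and what a JSON filter query looks like, can be sketched as follows. The query syntax is modeled loosely on the style documented for Nova’s JsonFilter (for example `["and", [">=", "$free_ram_mb", 1024], ...]`), but the `evaluate` and `schedule` functions are hypothetical illustrations, not Nova code, and picking the host with the most free RAM is an assumed weighing policy.

```python
import json
from typing import Optional

# Comparison operators allowed in a query leaf.
OPS = {
    "=": lambda a, b: a == b,
    "<": lambda a, b: a < b,
    ">": lambda a, b: a > b,
    "<=": lambda a, b: a <= b,
    ">=": lambda a, b: a >= b,
}


def evaluate(query: list, host: dict) -> bool:
    """Recursively evaluate a JSON-style query against one host's attributes."""
    op, *args = query
    if op == "and":
        return all(evaluate(q, host) for q in args)
    if op == "or":
        return any(evaluate(q, host) for q in args)
    if op == "not":
        return not evaluate(args[0], host)
    # Leaf comparison: "$name" reads a host attribute, anything else is a literal.
    left, right = (host[a[1:]] if isinstance(a, str) and a.startswith("$") else a
                   for a in args)
    return OPS[op](left, right)


def schedule(hosts: list, query_json: str) -> Optional[str]:
    query = json.loads(query_json)
    # Filter step: keep only hosts that satisfy the JSON query.
    candidates = [h for h in hosts if evaluate(query, h)]
    if not candidates:
        return None
    # Weigh step: prefer the host with the most free RAM (an assumed policy).
    return max(candidates, key=lambda h: h["free_ram_mb"])["name"]
```

Everything here happens once, at placement time; nothing revisits the choice later, which is the gap with DRS noted above.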
VM-level high availability (HA) in vSphere restarts a VM on a different host when the VM or its ESX(i) host fails. It should not be confused with fault tolerance (FT): HA is not fault tolerant. HA simply means that when something fails, it can be restored in a reasonable amount of time through self-healing. HA protects virtual machines from hardware failure: if a failure occurs, HA reboots or powers on the VM on a different ESX(i) host, so it is essentially a cold power-on after a crash. With HA, services are still susceptible to downtime.
Currently, there is no official support for VM-level HA in OpenStack. It was initially planned for the Folsom release but was later dropped or postponed. There is currently an incubation project called Evacuate that is adding support for VM-level HA to OpenStack.
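The restart behavior described above can be modeled as a small planning step: when a host fails, find surviving hosts with enough free memory and cold-boot its VMs there. This is a toy sketch with a hypothetical `plan_ha_restart` function and memory as the only resource; vSphere HA’s real admission control and restart priorities are not modeled.

```python
def plan_ha_restart(placement: dict, vm_mem_gb: dict, host_mem_gb: dict, failed: str) -> dict:
    """Toy VM-level HA plan (not vSphere HA's algorithm).

    Cold-restarts the failed host's VMs on surviving hosts. Returns {vm: new_host};
    VMs that fit nowhere are omitted and stay down.
    """
    free = {h: host_mem_gb[h] - sum(vm_mem_gb[v] for v in vms)
            for h, vms in placement.items() if h != failed}
    restarts = {}
    if not free:
        return restarts  # no surviving host to restart on
    # Restart the largest VMs first so small ones don't fragment the free memory.
    for vm in sorted(placement[failed], key=lambda v: vm_mem_gb[v], reverse=True):
        target = max(free, key=free.get)  # surviving host with the most free memory
        if free[target] >= vm_mem_gb[vm]:
            free[target] -= vm_mem_gb[vm]
            restarts[vm] = target  # a fresh boot: in-memory state is lost
    return restarts
```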
VMware Fault Tolerance
VMware FT continuously streams the state of a protected virtual machine, and every change to it, to a secondary ESX(i) server. Fault tolerance means that when either the primary or the secondary VM’s host dies, the VM keeps running as long as the other stays up. Contrary to marketing myths, this doesn’t help if an application crashes or during patching: if the application crashes, it crashes on both sides, and if you stop a service to patch it, it also stops on both VMs. What FT does protect against is a single host failure, with no interruption to the protected VM. True application-level clustering, such as MSCS or WCS, is required to protect against application-level failures. Given FT’s other limitations, such as high resource usage (double the RAM and CPU) and the bandwidth needed to stream the state, it is one of the less-used VMware features. It requires twice the memory because memory cannot be deduplicated via Transparent Page Sharing (TPS) across hosts. It also uses CPU lockstepping to synchronize every CPU instruction between the two VMs, which is why only single-vCPU VMs can be protected with FT.
OpenStack has no feature comparable to FT, and there are no plans to introduce one. Furthermore, instruction mirroring is not supported by KVM, the most common hypervisor for OpenStack.
As you can see, there are gaps between VMware and OpenStack, as well as gaps within individual features. OpenStack and VMware are in a battle, with each side matching the other’s features. This works in OpenStack’s favor, since VMware is extremely expensive and OpenStack is free. VMware has spent a lot of money developing these features, and that cost must be passed on to the consumer, whereas OpenStack features are developed by the community and can be consumed freely.
VMware takes a clear lead in the features category, having invested heavily in vMotion, HA, FT, and other ways to protect VMs. OpenStack has been catching up on features it deems useful for cloud-aware tenants, but it has also dropped lower-priority features in order to focus on supporting more hardware solutions.
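The two FT constraints called out above (single vCPU only, and a second full copy of memory that TPS cannot share across hosts) can be captured in a tiny sizing helper. This is a hypothetical sketch that counts only the vCPU and RAM duplication; streaming bandwidth and other overheads are left out.

```python
from dataclasses import dataclass


@dataclass
class VM:
    name: str
    vcpus: int
    ram_gb: int


def ft_footprint(vm: VM) -> dict:
    """Combined primary + secondary resources for a VM protected by FT (toy model)."""
    if vm.vcpus != 1:
        # CPU lockstepping restricted FT to single-vCPU VMs.
        raise ValueError(f"{vm.name}: FT can only protect single-vCPU VMs")
    # Both copies hold the full memory image; TPS cannot share pages across hosts.
    return {"vcpus": 2 * vm.vcpus, "ram_gb": 2 * vm.ram_gb}
```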
Round 3: Use Cases
Before we can assign value to the features above, we need to think about use cases. In the cloud ecosystem, there are two types of tenants that consume infrastructure as a service: cloud-aware and legacy. Cloud-aware applications handle HA and DR policies on their own, while legacy applications rely on the infrastructure to provide HA and DR. See the diagram below from a VMware cloud architect’s article.
Common characteristics of cloud-aware applications:
Common characteristics of legacy applications:
Legacy applications tend to need features such as FT, VM-level HA, and automatic virus scanning, whereas cloud-aware applications do not: when one VM fails, you simply bring up another VM to replace it.
Pets vs. Cattle
Another way to think about this is by using the famous Pets vs. Cattle analogy from William Baker at Microsoft.
The analogy goes as follows: in the legacy service model, you think of your machines as pets. You give them names like dusty.cern.ch, and they are raised and cared for; when they get ill, you nurse them back to health. In the cloud-aware tenant service model, VMs are treated like cattle: they are given numbered names like vm1002.cern.ch, they are all identical, and when they get ill, you shoot them and get another cow.
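The cattle side of the analogy amounts to a reconcile loop: keep a desired number of identical, numbered instances and replace failed ones instead of repairing them. Here is a minimal sketch with a hypothetical `Herd` class; in a real cloud, `reconcile` would call the provisioning API rather than add names to a set.

```python
import itertools


class Herd:
    """Cattle-style fleet of identical, numbered VMs (toy model, not a cloud API)."""

    def __init__(self, size: int, first_id: int = 1000):
        self._ids = itertools.count(first_id)
        self.size = size
        self.vms = set()
        self.reconcile()

    def _new_name(self) -> str:
        # Numbered names in the style of the analogy; numbers are never reused.
        return f"vm{next(self._ids)}.cern.ch"

    def fail(self, name: str) -> None:
        # No nursing back to health: the sick VM is simply removed from the herd.
        self.vms.discard(name)

    def reconcile(self) -> None:
        # Bring up fresh, identical VMs until the herd is back to its desired size.
        while len(self.vms) < self.size:
            self.vms.add(self._new_name())
```

For example, after `herd = Herd(3)`, failing `vm1001.cern.ch` and calling `herd.reconcile()` leaves three VMs, with `vm1003.cern.ch` taking the lost one’s place.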
Future application architectures should use cattle. VMware features that nurture/protect the VM are less important in the cattle service model.
OpenStack catches up in this category, since many of the features VMware has (and OpenStack lacks) are not necessarily useful for cloud-aware applications. Furthermore, with VMware you pay license fees for features you may not need and have no control over whether VMware adds the features you do need.
Round 4: Value
Here comes the final round that decides it all. However, which platform provides the best value isn’t clear-cut, since it depends on scale. While OpenStack is free to use, it requires a lot of engineering resources and expertise. It also takes more effort to architect and stand up, since it supports so many deployment scenarios and no two installations look the same. VMware has licensing costs but should be easier to install and get running. It is also easier to train staff on point-and-click interfaces than on a command line.
In short, OpenStack has a higher initial cost, but as projects scale you get more value from it, because there are no licensing fees. VMware will be cheaper for smaller installations, but its value diminishes as scale increases. That said, cloud use cases are trending toward large scale, and as people gain experience with OpenStack, its initial costs will come down.
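The scale argument is a simple break-even calculation: a higher fixed cost with a lower per-host cost eventually wins. This sketch assumes a purely linear cost model and hypothetical function names, and every number in the example is a made-up placeholder, not a real price.

```python
import math


def total_cost(hosts: int, fixed: float, per_host: float) -> float:
    """Up-front cost plus a per-host cost (licenses, support, operations)."""
    return fixed + per_host * hosts


def break_even_hosts(os_fixed: float, os_per_host: float,
                     vmw_fixed: float, vmw_per_host: float) -> int:
    """Smallest host count at which OpenStack costs no more than VMware.

    Assumes, as argued above, that OpenStack has the lower per-host cost.
    """
    if os_per_host >= vmw_per_host:
        raise ValueError("sketch assumes OpenStack has the lower per-host cost")
    # Solve os_fixed + os_per_host * n <= vmw_fixed + vmw_per_host * n for n.
    n = (os_fixed - vmw_fixed) / (vmw_per_host - os_per_host)
    return max(0, math.ceil(n))
```

With placeholder inputs of 500,000 fixed and 1,000 per host for OpenStack versus 50,000 fixed and 6,000 per host for VMware, the break-even point is 90 hosts; halving OpenStack’s fixed cost, as experience grows, pulls it down to 40.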
And the winner is…
In a title bout between two of the biggest players in the cloud landscape, VMware took a big lead early on in features and design, but OpenStack came through as the underdog and won the competition by dealing a knockout blow in value.
Coincidentally, at the time of this writing, VMware stock had plunged 22 percent in a single day on January 29, with market analysts citing the lack of a clear, well-defined cloud strategy and a weak outlook…
I understand that some of you may disagree with my scoring and with my decision to give each category the same weight. Truth be told, the scoring is imperfect and entirely subjective; it exists mainly to make the material a little more interesting. That said, please feel free to share your opinions in the comments!