One cloud to rule them all -- or is it?
It’s the typical Catch-22 situation when trying to do something on the scale of private cloud: You can’t afford to build it without paying customers, but you can’t get paying customers without a functional offering.
In the rush to break the cycle, you onboard more and more customers. You want to reach critical mass and become the de-facto choice within your organization. Maybe you even have some competition within your organization you have to edge out. Before long you end up taking anyone with money.
And who has money? In the enterprise, more often than not it's the bread and butter of the organization: the legacy workloads.
Promises are made. Assurances are given. Anything to onboard the customer. “Sure, come as you are, you won’t have to rewrite your application; there will be no/minimal impact to your legacy workloads!”
But there's a problem here. Legacy workloads -- that is, those large, vertically scaled behemoths that don't lend themselves to "cloud native" principles -- present both a risk and an opportunity when growing your private cloud, depending on how they are handled.
(Note: Virtualizing a workload does not make it "cloud native". In fact, many virtualized workloads, even those built on a service-oriented architecture (SOA), will not be cloud native. We'll talk more about classifying, categorizing and onboarding different workloads in a future article.)
"Legacy" cloud vs "Agile" cloud
The term "legacy cloud" may seem like a bit of an oxymoron, but hear me out. For years, surveys that ask people about their cloud use have had to include responses from people who considered vSphere cloud because the line between cloud and virtualization is largely irrelevant to most people.Or at least it was, when there wasn't anything else.
But now there's a clear difference. Legacy cloud is geared towards these legacy workloads, while agile cloud is geared toward more "cloud native" workloads.
Let’s consider some example distinctions between a “Legacy Cloud” and an “Agile Cloud”. The table below shows some of the design trade-offs between environments built to support legacy workloads and those built without those constraints:
| Legacy Cloud | Agile Cloud |
| --- | --- |
| No new features/updates (platform stability emphasis), or only infrequent, limited and controlled updates | Regular/continuous deployment of the latest and greatest features (platform agility emphasis) |
| Live migration support (redundancy in the platform instead of in the app); DRS (for ESXi hypervisors managed by VMware vCenter) | Highly scalable and performant local storage, plus other performance-enhancing features such as huge pages; none of the security and operational burdens of live migration |
| VRRP for Neutron L3 router redundancy | DVR for network performance and scalability; apps built to handle failure of individual nodes |
| LACP bonding for compute node network redundancy | SR-IOV for network performance; apps built to handle failure of individual nodes |
| Bring your own (specific) hardware | Shared, standard hardware ("white boxes"), defrayed with tenant chargeback policies |
| ESXi hypervisors or bare metal as a service (Ironic) to insulate the data plane, and/or separate controllers to insulate the control plane | OpenStack reference KVM deployment |
For most of these items it’s one or the other, so introducing legacy workloads into your existing cloud can conflict with other objectives, such as increasing development velocity.
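To make one of those trade-offs concrete, here's a minimal sketch of the VRRP-vs-DVR row using openstacksdk. The cloud name and router names are hypothetical, and it assumes your Neutron deployment has both the L3 HA and DVR extensions enabled, plus admin credentials (which are required to set these flags explicitly).

```python
# Illustrative sketch only: assumes openstacksdk is installed, a "mycloud"
# entry exists in clouds.yaml, and the account has admin rights -- explicitly
# setting ha/distributed on a router is admin-only on most deployments;
# regular tenants simply inherit the defaults from neutron.conf.
import openstack

conn = openstack.connect(cloud="mycloud")

# "Legacy cloud" style: a centralized router made redundant with VRRP
# (keepalived instances on multiple L3 agents), so the platform -- not the
# application -- survives the loss of a network node.
legacy_router = conn.network.create_router(
    name="legacy-edge",
    is_ha=True,
    is_distributed=False,
)

# "Agile cloud" style: a distributed virtual router (DVR) that pushes routing
# out to the compute nodes for performance and scale; the application is
# expected to tolerate the failure of any individual node.
agile_router = conn.network.create_router(
    name="agile-edge",
    is_ha=False,
    is_distributed=True,
)
```

The point isn't the specific API calls; it's that the two router types embody opposite answers to the question of where redundancy lives: in the platform, or in the application.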
So what do you do about it?
If you find yourself in this situation, you basically have three choices:
- Onboard tenants with legacy workloads and force them to potentially rewrite their entire application stack for cloud
- Onboard tenants with legacy workloads into the cloud and hope everything works
- Decline to onboard tenants/applications that are not cloud-ready
Fortunately, there's one more option: split your cloud infrastructure according to the types of workloads, and engineer a platform offering for each. Now, that doesn't necessarily mean a separate cloud.
The main idea is to architect your cloud so that you can provide a legacy-type environment for legacy workloads without compromising your vision for cloud-aware applications. There are two ways to do that:
- Set up a separate cloud with an entirely new control plane for the associated compute capacity. This option offers a complete decoupling between workloads, and allows changes/updates/upgrades to be confined to one environment without exposing the legacy workloads in the other to that risk.
- Use compute nodes such as ESXi hypervisors or bare metal (e.g., Ironic) for legacy workloads. This option maintains a single OpenStack control plane while still helping isolate workloads from OpenStack upgrades, disruptions, and maintenance activities in your cloud. For example, ESXi networking is separate from Neutron, and bare metal is your ticket out of being the bad guy who reboots hypervisors to apply kernel security updates.
Of course, each option comes with its own downsides as well; an additional control plane involves additional overhead (to build and operate), and running a mixed hypervisor environment has its own set of engineering challenges, complications, and limitations. Both options also add overhead when it comes to repurposing hardware. Below is a sketch of how, under a single control plane, host aggregates and flavor extra specs can pin legacy workloads to a dedicated pool of compute nodes.
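The sketch uses openstacksdk; the cloud name, hostnames, flavor name, and metadata key are all hypothetical, and the aggregate-metadata and flavor-extra-specs helpers assume a recent SDK release (their names have shifted between versions -- the CLI equivalents are `openstack aggregate set --property ...` and `openstack flavor set --property ...`). It also assumes AggregateInstanceExtraSpecsFilter is enabled in nova.conf.

```python
# Illustrative sketch only: assumes admin credentials, a "mycloud" entry in
# clouds.yaml, and a recent openstacksdk release.
import openstack

conn = openstack.connect(cloud="mycloud")

# Dedicate a pool of compute nodes to legacy workloads.
legacy_agg = conn.compute.create_aggregate(name="legacy-pool")

# Tag the pool so the scheduler can match against it (equivalent to:
# openstack aggregate set --property workload_class=legacy legacy-pool).
conn.compute.set_aggregate_metadata(legacy_agg, {"workload_class": "legacy"})

for host in ("legacy-compute-01", "legacy-compute-02"):  # hypothetical hostnames
    conn.compute.add_host_to_aggregate(legacy_agg, host)

# Flavors carry the matching extra spec, so instances booted from a "legacy"
# flavor can only land on the legacy pool. (For full isolation, give your
# other flavors their own workload_class value too.)
legacy_flavor = conn.compute.create_flavor(
    name="m1.legacy-large", ram=65536, vcpus=16, disk=200,
)
conn.compute.create_flavor_extra_specs(
    legacy_flavor,
    {"aggregate_instance_extra_specs:workload_class": "legacy"},
)
```

The same mechanism works whether the dedicated pool runs KVM, ESXi, or Ironic-managed bare metal; what matters is that the legacy workloads' scheduling constraints are explicit rather than tribal knowledge.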
There's no instant transition
Many organizations get caught up in the “One Cloud To Rule Them All” mentality, trying to make everything the same and work with a single architecture to achieve the needed economies of scale, but ultimately the decision should be made according to your situation. It's important to remember that no matter what you do, you will have to deal with a transition period, which means you need to provide a viable path for your legacy tenants/apps to gradually make the switch. But first, assess your situation:
- If your workloads are all of the same type, then there’s not a strong case to offer separate platforms out of the gate. Or, if you’re just getting started with cloud in your organization, it may be premature to do so; you may not yet have the required scale, or you may be happy with onboarding only those applications which are cloud ready.
- When you have different types of workloads with different needs -- for example, Telco/NFV vs Enterprise/IT vs BigData/IoT workloads -- you may want to think about different availability zones inside the same cloud, so the specific nuances of each type can be addressed inside its own zone while maintaining a single cloud from a configuration, lifecycle management and service assurance perspective, including having similar hardware. (Having similar hardware makes it easier to keep spares on hand.) A sketch of the availability-zone approach follows this list.
- If you find yourself in a situation where you want to innovate with your cloud platform, but you still need to deal with legacy workloads with conflicting requirements, then workload segmentation is highly advisable. In this case, you'll probably want to break from the “One Cloud” mentality in favor of the flexibility of multiple clouds. If you try to satisfy both your "innovation" mindset and your legacy workload holders on one cloud, you'll likely disappoint both.
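As a rough illustration of the availability-zone option, here is a minimal openstacksdk sketch. The cloud name, zone names, hostnames, and the image/flavor/network names in the boot call are all hypothetical. In Nova, an availability zone is simply a host aggregate with its availability_zone attribute set, and unlike plain aggregate metadata it is visible to (and selectable by) tenants.

```python
# Illustrative sketch only: assumes admin credentials for the aggregate calls
# and a "mycloud" entry in clouds.yaml; all names below are hypothetical.
import openstack

conn = openstack.connect(cloud="mycloud")

# One zone per workload profile, each mapped to its own pool of hosts.
zones = {
    "az-nfv": ["nfv-compute-01"],         # e.g., SR-IOV, huge pages, CPU pinning
    "az-enterprise": ["ent-compute-01"],  # e.g., live migration, shared storage
    "az-bigdata": ["bd-compute-01"],      # e.g., large local disks
}
for zone, hosts in zones.items():
    agg = conn.compute.create_aggregate(name=zone, availability_zone=zone)
    for host in hosts:
        conn.compute.add_host_to_aggregate(agg, host)

# Tenants then pick the zone that matches their workload at boot time.
server = conn.create_server(
    name="billing-db-01",
    image="rhel-7",                  # hypothetical image name
    flavor="m1.legacy-large",        # hypothetical flavor name
    network="tenant-net",            # hypothetical network name
    availability_zone="az-enterprise",
    wait=True,
)
```

Because the zones share one control plane, you keep a single pane of glass for configuration, lifecycle management and chargeback, while still honoring each workload type's hardware and operational quirks.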
Moving forward
Even if you do create a separate legacy cloud, you probably don't want to maintain it in perpetuity. Think about your transition strategy; a basic and effective carrot-and-stick approach is to limit new features and cloud-native functionality to your agile cloud, and to bill/chargeback at higher rates in your legacy cloud (which are, in any case, justified by the costs incurred to provide and support this option).

Whatever you ultimately decide, the most important thing is to make sure you've planned it out appropriately, rather than just going with the flow, so to speak. If you need to, contact a vendor such as Mirantis; they can help you do your planning and get to production as quickly as possible.