How obsolete is my cloud?
Is your cloud obsolete?
Over the last few months I have advised several customers on upgrading their cloud to the latest version of the Mirantis Cloud Platform suite, and there are a few questions they frequently ask: can the hardware from the current platform be reused, should it be shored up, or should we buy new hardware?
There is no simple answer.
I look at the question from the perspective of the customer, so this piece is written from the viewpoint of a fictional cloud owner. It illustrates the advances made in hardware over time and the implications for server designs. However, it should not be taken as a ready-made cloud design. Each cloud design recommendation will differ greatly based on the individual use case, and your cloud architect should provide individual optimization.
I am using benchmark values to assist comparison, and while benchmarks are not the be-all and end-all, they give a reasonable estimate of the platform capability.
Overall, however, it boils down to this:
Technology has advanced greatly in critical areas, such as CPU and storage, over the last few years
Upgrading the hardware of a previous generation cloud brings major advances in performance and reliability
In many cases Total Cost of Ownership (TCO) of new hardware is lower than TCO of retaining legacy hardware
Let's look at the details through the eyes of a customer.
It is that time of the year again: time to evaluate whether to upgrade our cloud platform. The existing platform is 5 years old and has been running fairly reliably lately, so should we purchase new hardware or continue with what we have?
The truth is that five years is a long time in terms of server hardware. While the expense of buying new servers may seem unnecessary, especially since the old platform still works, we need to evaluate and compare the existing platform with the latest available hardware.
Evaluation of the current platform
Five years ago we bought all new hardware for our new cloud environment. These were compute nodes with dual Xeon E5-2650 v4s, which were the optimum in price-performance at the time, storage nodes with lots of 2TB SAS HDDs, and 10GbE networking throughout.
The platform supports 3000 standard workloads with an average of 4 vCPUs and 4GB RAM per VM, so we have a total of 12000 vCPUs and 12TB memory. Our CPU oversubscription is set to 5:1, so the number of physical CPU threads is 2400. With dual E5s per node we needed 50 servers. (Well, 51, actually, because we wanted three availability zones.)
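The compute sizing above can be reproduced with a few lines of Python. This is only a sketch of the arithmetic, not a sizing tool. The figure of 24 threads per CPU (12 cores with hyperthreading on the E5-2650 v4) is not stated in the text, and rounding up to a multiple of three is one assumed reading of "because we wanted three availability zones".

```python
import math

# Figures quoted in the text
WORKLOADS = 3000
VCPUS_PER_VM = 4
RAM_GB_PER_VM = 4
OVERSUBSCRIPTION = 5        # 5:1 vCPUs per physical thread
AVAILABILITY_ZONES = 3

# Assumption: Xeon E5-2650 v4 has 12 cores / 24 threads, two CPUs per node
THREADS_PER_CPU = 24
CPUS_PER_NODE = 2


def compute_sizing():
    """Return (total vCPUs, total RAM in GB, physical threads, nodes, nodes balanced across AZs)."""
    total_vcpus = WORKLOADS * VCPUS_PER_VM
    total_ram_gb = WORKLOADS * RAM_GB_PER_VM
    physical_threads = math.ceil(total_vcpus / OVERSUBSCRIPTION)
    nodes = math.ceil(physical_threads / (THREADS_PER_CPU * CPUS_PER_NODE))
    # Assumption: round up so every availability zone gets the same node count
    balanced = math.ceil(nodes / AVAILABILITY_ZONES) * AVAILABILITY_ZONES
    return total_vcpus, total_ram_gb, physical_threads, nodes, balanced


print(compute_sizing())
```

Changing the oversubscription ratio or the threads per node shows quickly how sensitive the node count is to either value.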
We specified an average of 100GB per workload, as we had a mix of Windows and Linux workloads. With the necessary operational reserve, we required approximately 360TB capacity. The Ceph cluster needed 540 2TB drives to provide sufficient capacity at 3x replication, and with 20 drives and four journal SSDs per node, we needed 27 nodes. Back then it worked well, but workloads have become a lot more data hungry, so we are feeling the pinch in capacity and even more so in performance.
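The storage numbers follow the same pattern. The sketch below assumes decimal terabytes (1TB = 1000GB) and derives the operational reserve as the gap between the 300TB of workload data and the 360TB quoted above. The function name is illustrative only.

```python
import math

# Figures quoted in the text
WORKLOADS = 3000
GB_PER_WORKLOAD = 100
USABLE_TB_REQUIRED = 360    # workload data plus operational reserve
REPLICATION = 3             # Ceph 3x replication
DRIVE_TB = 2
DRIVES_PER_NODE = 20


def hdd_cluster_sizing():
    """Return (workload TB, reserve TB, raw TB, drive count, node count)."""
    workload_tb = WORKLOADS * GB_PER_WORKLOAD / 1000   # assumption: decimal TB
    reserve_tb = USABLE_TB_REQUIRED - workload_tb
    raw_tb = USABLE_TB_REQUIRED * REPLICATION
    drives = math.ceil(raw_tb / DRIVE_TB)
    nodes = math.ceil(drives / DRIVES_PER_NODE)
    return workload_tb, reserve_tb, raw_tb, drives, nodes


print(hdd_cluster_sizing())
```

With these inputs the reserve works out to 60TB, or 20 percent on top of the workload data.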
Now our architect is faced with two questions: “Can we upgrade?” and “Should we buy?”
Over the last five years the server landscape has shifted dramatically.
This is most evident in the storage realm. Hard disks have been replaced by SAS/SATA SSDs, which in turn are currently being phased out in favor of NVMe. NVMe is also a type of SSD, but technologically quite different in that the device is directly connected to the PCI Express bus while SAS/SATA SSDs are designed to be connected to classic hard disk controllers, which increase latency and limit IOPS.
For comparison, a performance HDD under optimum conditions can provide about 150 small block IO operations per second (IOPS). An enterprise class SATA SSD tops out at about 80k read IOPS, limited by the performance of the SATA bus. NVMe devices, not hampered by a hard disk controller, can provide more than a million raw IOPS.
Of course, the performance of SSD and especially NVMe devices is also limited by the software defined storage (SDS) code itself, as well as by network latency and throughput, so in real life these IO numbers are not achievable. Even so, the real-world difference remains large: seeking a narrow track on a chunk of metal spinning at high rpm with a comparatively heavy read/write head dozens of times a second is neither conducive to performance for many competing workloads nor good for reliability.
With SSD and NVMe, other parameters come into play. Traditional servers were dual CPU due to the fact that high-core CPUs were very costly. The CPUs are interconnected, but hardware access is split between the two CPUs. Thus hardware connected to one CPU being addressed by the other causes additional lag, also known as NUMA lag. This lag has the highest impact if the network interface and the storage device are both controlled by one CPU, but the threads of the SDS system run on the other, so data is bounced between the server segments multiple times.
Current CPUs have a much higher core count, so it is feasible to build servers with a single CPU, memory and a single PCIe bus, thus eliminating NUMA and NUMA lag entirely.
CPUs have not quite seen a 1000x increase in performance like storage devices, but they are still dramatically faster than they used to be. For comparison, five years ago, a top of the line $10,000/CPU Xeon E5 2699v4 benchmarked at about 24600 units with 44 threads, while today's $900 Intel Xeon Silver 4316 -- the second least expensive Xeon Silver model Intel offers in 2022 -- benchmarks at 37200 units with 40 threads, or roughly 1.5 times as fast for less than 0.1 times the cost.
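To make the comparison concrete, the ratios can be computed directly from the benchmark scores and prices quoted above. This is a rough sketch: the dictionary layout and function name are illustrative, and the "score per dollar" figure is an extra derived metric, not one the benchmark itself reports.

```python
# Figures quoted in the text
legacy = {"name": "Xeon E5 2699v4", "score": 24600, "price_usd": 10000, "threads": 44}
current = {"name": "Xeon Silver 4316", "score": 37200, "price_usd": 900, "threads": 40}


def compare(old, new):
    """Return (speedup, cost ratio, gain in benchmark score per dollar)."""
    speedup = new["score"] / old["score"]
    cost_ratio = new["price_usd"] / old["price_usd"]
    score_per_dollar_gain = (new["score"] / new["price_usd"]) / (old["score"] / old["price_usd"])
    return speedup, cost_ratio, score_per_dollar_gain


speedup, cost_ratio, gain = compare(legacy, current)
print(f"speedup {speedup:.2f}x, cost ratio {cost_ratio:.2f}, score per dollar {gain:.1f}x")
```

Taken together, the two ratios imply that each dollar spent buys well over ten times the benchmark score it did five years ago.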
Some of this performance improvement comes from the fact that E5 CPUs had four memory channels while Xeon Scalable Gen 3 CPUs such as the 4316 have 8. Other improvements are based on newer manufacturing processes and the fact that density has dramatically increased: 5 years ago, 44 cores / 88 threads was the best one could buy for a high end dual CPU server. Today the maximum for a dual CPU server is 128 cores / 256 threads per node. As even these high end CPUs are much cheaper than the aforementioned E5 2699 v4 was in its day, obviously the lower end CPUs had to also be offered at a much lower price point.
To buy or not to buy?
What does this mean for us as a customer?
Replacing a 1080TB (raw) HDD-based Ceph cluster with an NVMe-based setup would reduce the Ceph cluster footprint from 27 2U servers (54 rack units, or realistically about 2 full racks given power and AC constraints) to 18x 1U servers with 8x 7.68TB NVMe devices each. Depending on the use case, different configurations are feasible--for example, 27 servers with 10x 3.84TB NVMe, or the other extreme, 9 servers with 8x 15.36TB.
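A quick way to check candidate layouts against the legacy raw capacity is to multiply them out. The sketch below uses the three configurations mentioned above. The labels and the assumption that every node is 1U are mine.

```python
LEGACY_RAW_TB = 1080

# (nodes, NVMe devices per node, TB per device), as quoted in the text
CONFIGS = {
    "18 nodes, 8x 7.68TB": (18, 8, 7.68),
    "27 nodes, 10x 3.84TB": (27, 10, 3.84),
    "9 nodes, 8x 15.36TB": (9, 8, 15.36),
}


def raw_capacity_tb(nodes, devices_per_node, tb_per_device):
    """Raw (pre-replication) capacity of one configuration in TB."""
    return nodes * devices_per_node * tb_per_device


for label, cfg in CONFIGS.items():
    raw = raw_capacity_tb(*cfg)
    # Assumption: each node occupies one rack unit
    print(f"{label}: {raw:.1f}TB raw, {raw / LEGACY_RAW_TB:.0%} of legacy, {cfg[0]}U")
```

With these inputs, the 18 and 9 node layouts land slightly above the legacy 1080TB raw and the 27 node layout slightly below it, so the choice is mostly about failure domain size and rack space rather than capacity.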
What is the one thing all these configurations have in common? They are dramatically faster than any hard disk-based configuration. Our original configuration could, under optimum conditions, provide perhaps 60k IOPS total, while an estimate for the 18 node NVMe configuration would be more than 20 times that.
One thing to remember is that this performance cannot be accessed by a single workload. A benchmark to test the true performance of any SDS cluster must be highly distributed and must put the cluster under enough pressure to drive all storage devices evenly to near their limit.
On the CPU side, replacing our dual E5 2650 v4 CPUs with newer models could be as mundane as using Xeon Silver 4316 CPUs, which offer about 3x the performance per CPU, reducing the footprint from 51 nodes to 18 nodes (providing for three availability zones), or it could be as radical as using 9 single-CPU nodes with the 64 core/128 thread AMD Epyc 7713, which together offer about the same performance as the 51 legacy nodes.
In reality, extremes are not ideal--in this case because running too many workloads on a single node increases the impact of hardware failure, and because other bottlenecks, such as network IO, will become evident. The fact remains that we could compress our four-rack footprint (54U of storage and 51U of compute) into as little as 18U, or less than a full rack, if we really wanted to. In the process we would get better performance, especially on the storage side, and faster repair upon storage node failure, because rebuilding the contents of a failed server on other NVMe nodes takes drastically less time.
So the question remains: Should we buy?
With both the number of workloads and the performance requirements increasing, especially in storage, the answer is yes in most cases. While it may be possible to reuse some of the legacy nodes in functions that do not require as much performance, in most cases, a clean cut is the best choice.
This clean cut also has another advantage for us. A greenfield deployment of new hardware enables migration at a steady, measured pace. In contrast, an upgrade requires evacuation of one node after another to redeploy the hardware.
A greenfield deployment also provides us with the opportunity to do some spring cleaning:
Discard all applications and data that are not in use
Only migrate images after review
Verify that copy-on-write is used to its full potential
Furthermore, we will have to make some changes, even if the old nodes are to be reused. Boot devices must be SSD or NVMe for reasonable performance. Storage nodes need more memory. The control plane, in most cases, needs more nodes.
On the network side we can standardize on 25GbE with a new platform, which improves performance significantly.
Finally, we are aware that our old platform may be arriving at the tail end of the bathtub curve and more device failures, especially involving heavily stressed HDDs, are on the horizon.
From an environmental standpoint, the more modern CPUs use more power than their older counterparts overall, but considerably less in relation to their performance. As an example, take a Xeon E5-2650 v4 at 105W, which benchmarks at about 13500 CPU mark or about 130 CPU mark per watt, and for comparison a 150W Xeon Silver 4316 with a CPU mark value of 37500 or 250 CPU mark per watt. This means that for the same computational load, the legacy CPU expends roughly twice the power of a current gen model.
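The performance-per-watt comparison is simple enough to verify by hand, but a short sketch makes it easy to plug in other CPU models. It uses the wattage and benchmark values quoted above and treats the rated wattage as a stand-in for actual power draw, which is a simplification.

```python
# Figures quoted in the text; rated wattage is used as a proxy for real power draw
LEGACY_SCORE, LEGACY_WATTS = 13500, 105      # legacy Xeon E5
CURRENT_SCORE, CURRENT_WATTS = 37500, 150    # Xeon Silver 4316


def score_per_watt(score, watts):
    """Benchmark score delivered per watt of rated power."""
    return score / watts


def power_ratio_for_same_load():
    """How many times more power the legacy CPU needs for the same benchmark load."""
    return score_per_watt(CURRENT_SCORE, CURRENT_WATTS) / score_per_watt(LEGACY_SCORE, LEGACY_WATTS)


print(f"legacy:  {score_per_watt(LEGACY_SCORE, LEGACY_WATTS):.0f} per watt")
print(f"current: {score_per_watt(CURRENT_SCORE, CURRENT_WATTS):.0f} per watt")
print(f"legacy needs {power_ratio_for_same_load():.2f}x the power for the same load")
```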
On the hard disk side things look even worse. We not only have to contend with capacity, but also with performance. Clusters used to be overbuilt in capacity (more drives) to provide the required IO performance. New clusters can be comparatively smaller, because in almost all cases the performance of NVMe storage will exceed the performance that can be realistically consumed. Furthermore, hard disk based storage clusters require a long time to repair themselves, so a disk failure or, worse, a server failure significantly impacts performance for days or even weeks, whereas in a flash based cluster, recovery after a storage device replacement can be measured in hours or even less.
What about repurposing?
What can we do with the legacy nodes? Some of them would still make good control plane nodes; Mirantis Container Cloud management and managed nodes have low hardware requirements for these nodes. They can also be used to make up a staging environment, if one doesn’t already exist, or a test/dev environment, where performance is not as important. Ceph nodes can be reused as archival storage, or upgraded with flash devices replacing HDDs.
Calculations and the decision
Our data center is situated in a convenient location, but pricing is still reasonable. We are currently charged $160 per rack unit per month, which is somewhere in the middle of the range.
Our current environment, including 9 control plane nodes, costs 114 RU * $160/RU/month * 12 months, so $218,880/year.
If we replace the hardware and end up with 18 compute and 18 1U storage nodes, this cost is reduced to 45 RU * $160/RU/month * 12 months = $86,400/year.
Calculated over the next five years, will we break even?
The difference between operating costs of the new and the old environment is $132,000/year or $662,000 for five years.
The new hardware will cost us about $650,000 (assuming Supermicro with NVMe storage).
So over five years, we're still $12,000 ahead--and that doesn't include the repairs we won't have to make to aging equipment.
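The break-even calculation can be written out as a small script so that other rack rates or hardware quotes can be tried. This is a sketch under narrow assumptions: it only counts hosting cost per rack unit and the hardware purchase, using the rack-unit counts, rate and price quoted above, and it ignores power, repairs, staff time and resale value.

```python
# Figures quoted in the text
RATE_USD_PER_RU_MONTH = 160
LEGACY_RU = 54 + 51 + 9     # storage + compute + control plane = 114 RU
NEW_RU = 45
NEW_HARDWARE_USD = 650_000
YEARS = 5


def annual_hosting_usd(rack_units):
    """Yearly hosting cost for a given number of rack units."""
    return rack_units * RATE_USD_PER_RU_MONTH * 12


annual_savings = annual_hosting_usd(LEGACY_RU) - annual_hosting_usd(NEW_RU)
savings_over_period = annual_savings * YEARS
net_after_hardware = savings_over_period - NEW_HARDWARE_USD
break_even_years = NEW_HARDWARE_USD / annual_savings

print(f"annual savings:      ${annual_savings:,}")
print(f"savings over {YEARS} years: ${savings_over_period:,}")
print(f"net after hardware:  ${net_after_hardware:,}")
print(f"break-even after:    {break_even_years:.1f} years")
```

With these inputs the hosting savings come to roughly $132,000 per year, and the purchase pays for itself shortly before the five-year mark.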
What's more, even if we didn't break even, the massive gain in performance and the reliability of a platform several generations newer than what we have would be worth it, but we will actually hit the break even point.
The decision: Buy.
The advantages and disadvantages you need to analyze
A greenfield deployment has a number of advantages and a few disadvantages.
Running the new and the legacy environment side by side avoids the risk an upgrade always brings with it. The old cloud continues to operate for a time, so many workloads move over by attrition, and application owners can migrate those that can't.
One of the greatest advantages of a greenfield deployment, however, is the opportunity to revisit the architecture. As familiarity with the cloud grows, flaws in the original architecture become apparent. A new start enables you to iron out these wrinkles. This is especially true when features of the cloud platform software change and an upgrade is not possible with the current architecture.
In some cases it is possible to rebuild the old cloud internally so an upgrade becomes possible, but this procedure typically comes with a considerable amount of effort and risk.
And the disadvantages? Migrating software and data takes time and effort. Engineers and customers may prefer working around the flaws of the status quo to moving to a new platform. After all, a lot of stuff has accumulated over the years. Sorting it, packing it up and moving to the new cloud can be uncomfortable. In addition to that, data center space and power for two clouds must be available for the transition period.
Afterthoughts - The Cloud-Saving Lessons
While we embark on our migration journey, what lessons can we learn from all this?
While the cost difference has to be calculated for each use case, spending money on a new platform does not necessarily mean higher overall cost. Building a new platform also comes with a number of intrinsic benefits, like the opportunity to shed some dead weight in the form of unused data and applications, which in the normal course of business do not get cleaned up. Scaling out the new platform with its smaller footprint is also considerably easier.
And finally, a new cloud provides the opportunity to apply the lessons learned from the existing one without carrying over its bugs, which is a risk in an upgrade. This means better reliability, less operational workload, and the opportunity to address additional requirements, as well as performance and environmental benefits.
I am not recommending out with the old, in with the new, but I do recommend you do the math, because the result may surprise you.