Two Ways to Fail with Private Cloud
Getting to cost-efficient private cloud is hugely challenging, and complexity is the main cause.
A very deep stack of interoperating components is required to provide “public cloud equivalent” functionality in a physical datacenter. Enterprises looking to build private clouds thus often feel as though they have only two choices:
Acquiring a proprietary private cloud stack from a single vendor: But going this way leads to extremely high total cost of ownership, locking customers into substantial fees for software licenses and support, encouraging staff-building for provider-specific skills (which is expensive and limiting), and imprisoning enterprises in a go-forward technology roadmap and timetable that may slow innovation – for example, by:
Precluding adoption of newer, more cloud native technologies, such as Kubernetes, before the vendor commercializes them.
Locking customers into the vendor’s proprietary model for how these technologies should be used, for example by providing Kubernetes, but in a platform-as-a-service (PaaS) framework that proves limiting over time.
Or, worst of all, locking customers into the vendor’s proprietary model for how private cloud extends to hybrid and multi-cloud, effectively limiting them to public cloud frameworks with which the vendor has a joint offering.
The ‘open source do-it-yourself’ approach: OpenStack won the mid-2010s war of competing infrastructure-as-a-service (IaaS) cloud-framework standards, and is now a mature, full-featured, and comprehensive system, mapping to the most-used core features and quality-of-life supports (such as polished web UIs) offered by public cloud providers.
Open source solves some lock-in problems (development is ongoing, version releases are frequent, new features are regularly introduced), and the long-term roadmap is friendly to continued adoption of new (largely open-source) technologies. But it leaves other problems in place. For example, some open source private cloud frameworks are distributed by vendors in forms that are tightly coupled to a particular Linux operating system, limiting choice and potentially raising costs.
Do-it-yourself cloud failure
The biggest obstacle to DIY success, though, is that engineering a production-grade open source cloud requires enormous expertise. The open source landscape is vast (too much choice is sometimes scarier than too little), and navigating it means potentially needing to acquire or develop skills around every open source component in your private cloud solution stack; skills in deep topics like (physical and virtual) storage and networking; and skills in open source cloud platform engineering, site reliability engineering (SRE), and infrastructure management. All of this is needed to make your open source private cloud secure, monitorable and observable, auditable, resilient, scalable, and updatable, and each of those tasks typically also compels creation of additional tooling to operationalize it reliably at scale.
More friction on the road to cloud success
Meanwhile, beyond the primary “proprietary vs. DIY/open source” choice, techno-cultural issues also loom. Building a private or hybrid cloud is one thing – building it as part of a consistent, enterprise-wide cloud strategy is another. And most organizations of a size and diversity to need one cloud will likely need several, over time, perhaps associated with different locations, lines of business, development teams, or products.
Without mechanisms for centralizing control, optimizing operations, instituting enterprise-wide policies, and building consistent playbooks for security and data protection, regulatory compliance/audit/governance, cloud and application operations, disaster recovery, software development workflow automation, and the like, the end result can easily become a welter of small cloud projects, each a burgeoning cost center, profiting little from economies of scale, hard to automate around, and presenting a large and difficult-to-police attack surface to hackers.
Cloud failures happen, and apps pay the price
Unsurprising, then, that the reality of some (most?) enterprise private cloud efforts is pretty grim: lots of money spent for lackluster results, or outright failure – bad for organizations, and for the career prospects of those leading private cloud efforts. Lots of little cloud projects, each a silo. Unknown risks of data loss and application unavailability from disasters and security breaches.
Worse: not all costs are measurable in dollars spent and/or wasted. The opportunity cost of private cloud failure is high, including stalled enterprise-wide digital transformation efforts, a weakened ability to compete, and blocked progress on applications and innovation critical to business success.
Part of the problem is that, particularly where DIY clouds are concerned, cloud complexity becomes an organizational distraction. A wide array of skills is needed to get a modern cloud on its feet. Still more skills are required if the cloud is multi-layer, perhaps with Kubernetes at the bottom (that is, on bare metal and host operating systems), a containerized IaaS backplane on top of that, and then perhaps multiple Kubernetes clusters deployed on virtual infrastructure. Nor is the base platform the only concern. On top of raw platforms (IaaS or Kubernetes), a great deal of additional enrichment (security controls, service mesh, backup, and other functionality) is required to create manageable production platforms, automate their care, scaling, and updating, and make them ready to host applications. Still more work is required to automate delivery and application-layer operations on the platform.
All this work is critical. But none of it moves the needle very much for the business at large.
Cloud ROI happens in only two ways: at the very end of what turns out to be a long series of interdependent DIY projects (meaning significant time is lost bringing new application functionality to customers and markets); or in parallel with a long, drawn-out, experimental cloud build and enhancement process (meaning that generations of applications are created, inefficiently, in stages, while the organization is learning how to manage and evolve its cloud).
Flip the cloud stack
Ending this distraction takes a twofold mind trick. The core problem with DIY and its complexity is that it elevates the platform and turns it into the main concern. The alternative is, first, to view the platform as a complement and, second, to commoditize (or at least somewhat externalize) it, refocusing attention on applications and on developer experience and productivity.
This “commoditize your complements” imperative was identified and validated in a visionary blog post by founder/coder Joel Spolsky back in 2002, and, at this point, is widely considered tech industry gospel. What’s important for your business is delivering quality applications with speed, ease, and efficiency. Platform complexity puts the brakes on this, and adds costs. So your business goal is to make that complexity go away: drive it as far from the center of your concerns as possible, and start driving its cost down towards some predictable minimum.
Instead of “platform-first” cloud thinking, in other words, you need to do “developer-first” cloud thinking. You need to reimagine how you want software to work for your business:
Instead of hiring developers for platform and/or DevOps skills, you want to hire for talent and experience in creating software for your primary markets.
You want developers using their skills mostly to create value for your business: delivering high-quality products with good market fit, and evolving them rapidly to delight customers and increase share.
You want applications to mostly run themselves, leveraging standardized services to streamline updating and scaling, and to police other aspects of efficiency.
You want DevOps, SREs, cloud operators, and other ‘ops’ specialists focused on standardizing and improving services, workflows, and other automation that improves developer experience and productivity, and helps guarantee application performance, resilience, and security.
Basically everything else, you want to make someone else’s problem – doing so in ways that:
Provide all the human responsiveness and automated oversight required to keep apps healthy and developers productive and happy.
Provide the services and frameworks and workflows you need to let developers “just push their code.”
Ensure security, data privacy and sovereignty, and enable regulatory compliance.
Avoid any and all forms of lock-in wherever practical – meaning heavy or exclusive use of open source, strict abstraction of application and platform domains from one another, heavy use of standardized APIs, and other best practices.
Let you run as cost-efficiently and cost-predictably as possible.
Less ops, more apps
Talking about “turning the problem upside down” and creating a developer-first infrastructure around Kubernetes sounds great. Is it possible in the real world?
Absolutely. But it takes a special kind of partner.
One with demonstrated expertise, willingness, training capability, and a complete technical playbook for taking on every aspect of responsibility for building and managing production Kubernetes and integrated development workflow solutions.
One with a complete, integrated, and opinionated solution stack that provides a technology base for what they’re really selling you, which is “a place to push your code where apps can run safely and reliably.” That’s way more than just Kubernetes. In fact, such vendors should be able to run against any standard Kubernetes, and build that out swiftly to include high-quality enterprise production services like service mesh, backup, and CDN. Alternatively, they should provide Kubernetes hosting capacity tricked out in a fully resilient, production-ready way.
Guaranteed rapid implementation. Leaders in this category should be talking about having solutions up and running in hours or days, and having everyone on your teams trained and fully onboarded within a month.
An exceptionally responsive and agile support culture that makes you feel safe and in charge.
SLAs, SLOs, and other commitments expressed clearly and enforceably.
Other quantitative benefits in black and white, depending on your architecture and requirements. Stuff like observability/monitoring and notifications, continuous security reports, and cost analytics should be table stakes.
These partners now exist. And it makes sense to find and engage with them. The sooner you do, the sooner you can focus resources on building great applications, and helping your business win.