It sat there in his inbox, staring at him.
Carl Delacour looked at the email from BigCo's public cloud provider, Ganges Web Services. He knew he'd have to open it sooner or later.
It wasn't as if there would be any surprises in it -- or at least, he hoped not. For the last several months he'd been watching BigCo's monthly cloud bills rising, seemingly with no end in sight. He'd only gotten through 2017 by re-adjusting budget priorities, and he knew he couldn't spend another year like this.
He opened Slack and pinged Adam Pantera. "Got a sec?"
A moment later a notification popped up on his screen. "For you, boss? Always."
"What's it going to take," Carl typed, "for us to bring our cloud workloads back on premises?"
There was a pause.
A long pause.
Such a long pause, in fact, that Carl wondered if Adam had wandered away from the keyboard. "YT?"
"Yeah, I'm here," he saw. "I'm just ... I don't think we can do that the way everything is structured. We built all of our automation on the provider API. It'd take months, at best, maybe a year."
Carl felt a cold lump in the center of his chest as the reality of the situation sank in. It wasn't just the GWS bill that was adding up in his head; the new year would bring new regulatory constraints as well. It was his job to deal with this sort of thing, and he didn't seem to have any options. These workloads were critical to BigCo's daily business. He couldn't just turn them off, but he couldn't let things go on as they were, either, without serious consequences.
"Isn't this stuff supposed to be cloud native?" he asked.
"It IS cloud native," Adam replied. "But it's all built for our current cloud provider. If you want us to be able to move between clouds, we'll have to restructure for a multi-cloud environment."
Carl's mouse hovered over the monthly cloud bill. Then his finger stabbed the button, opening the document.
"DO IT," he told Adam.
Carl wasn't being unreasonable. He should be able to move workloads between clouds. He should also be able to make changes to the overall infrastructure. And he should be able to do it all without causing a blip in the reliability of the system. Fortunately, it can be done. We're calling it Intelligent Delivery, and it's time to talk about what that's going to take.
Intelligent Delivery is a way to combine technologies that already exist into an architecture that gives you the freedom to move workloads around without fear of lock-in, the confidence that the stability of your applications and infrastructure isn't in doubt, and ultimate control over all of your resources and cost structures. It's the next step beyond Continuous Delivery, but applied to both applications and the infrastructure they run on.
How do we get to Intelligent Delivery?
Providing someone like Carl with the flexibility he needs involves two steps: 1) making software deployment smarter and applying those smarts to the infrastructure itself, and 2) building in monitoring that ensures nothing relevant escapes your notice.
Making software deployment as intelligent as possible
It's true that software deployment is much more efficient than it used to be, from CI/CD environments to container orchestration platforms such as Kubernetes. But we still have a long way to go to make it as efficient as it could be. We are just beginning to move into the multi-cloud age; we need to get to the point where the actual cloud on which the software is deployed is irrelevant not only to us, but also to the application. The deployment process should be able to choose the best of all possible environments based on performance, location, cost, or other factors.
And who chooses those factors? Sometimes it will be the developer, sometimes the user. Intelligent Delivery needs to be flexible enough to make either option possible.
For now, applications can run on public or private clouds. In the future, these choices may include spare capacity literally anywhere, from servers or virtual machines in your datacenter to wearable devices halfway around the world. You should be able to decide how to implement this scalability. We already have built-in schedulers that make rudimentary choices in orchestrators such as Kubernetes, but there's nothing stopping us from building applications and clouds that use complex artificial intelligence or machine-learning routines to take advantage of patterns we can't see.
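To make the idea of choosing an environment "based on performance, location, cost, or other factors" a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not a description of any real scheduler: the `Environment` fields, the weights, and all of the numbers are hypothetical. Location is treated as a hard constraint, while cost and latency are weighted preferences that either a developer or a user could set.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Environment:
    """A candidate place to run a workload (all values are illustrative)."""
    name: str
    cost_per_hour: float     # currency units per hour
    latency_ms: float        # typical latency to the users we care about
    in_allowed_region: bool  # e.g. satisfies a data-location rule


def choose_environment(candidates, cost_weight=1.0, latency_weight=1.0):
    """Return the allowed candidate with the lowest weighted score.

    Cost and latency are each normalized by the largest value among the
    allowed candidates, so the weights express relative priority.
    """
    allowed = [env for env in candidates if env.in_allowed_region]
    if not allowed:
        raise ValueError("no candidate satisfies the location constraint")

    # Fall back to 1.0 to avoid dividing by zero if every value is 0.
    max_cost = max(env.cost_per_hour for env in allowed) or 1.0
    max_latency = max(env.latency_ms for env in allowed) or 1.0

    def score(env):
        return (cost_weight * env.cost_per_hour / max_cost
                + latency_weight * env.latency_ms / max_latency)

    return min(allowed, key=score)


# Hypothetical candidates; the third is cheapest and fastest but is
# excluded by the location constraint.
candidates = [
    Environment("public-cloud-a", cost_per_hour=4.0, latency_ms=20.0, in_allowed_region=True),
    Environment("on-premises", cost_per_hour=1.0, latency_ms=40.0, in_allowed_region=True),
    Environment("public-cloud-b", cost_per_hour=0.5, latency_ms=10.0, in_allowed_region=False),
]

cost_first = choose_environment(candidates, cost_weight=3.0, latency_weight=1.0)
latency_first = choose_environment(candidates, cost_weight=1.0, latency_weight=3.0)
print(cost_first.name, latency_first.name)
```

The same candidates produce different placements depending on who sets the weights, which is the flexibility described above. A real system would replace the fixed numbers with live cost and performance data.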
Taking Infrastructure as Code to its logical conclusion
Adam got up and headed to the break room for some chocolate, pinching his eyes together. Truth be told, Carl's command wasn't a surprise. He'd been worried that this day would come since they'd begun building their products on the public cloud. But they had complex orchestration requirements, and it had been only natural for them to play to the strengths of the GWS API.
Now Adam had to find a way to try and shunt some of those workloads back to their on-premises systems. But could those systems handle it? Only one way to find out.
He took a deep breath and headed for Bernice Gordon's desk, rounding the corner into her domain. Bernie sat, as she usually did, in a rolling chair, dancing between monitors as she checked logs and tweaked systems, catching tickets as they came in.
"What?" she said, as he broached her space.
"And hello to you, too," Adam said, smiling.
Bernie didn't look up. "Cory is out sick and Dan is on paternity leave, so I'm a little busy. What do you need, and why haven't you filed a ticket?"
"I have a question. Carl wants to repatriate some of our workloads from the cloud."
Bernie stopped cold and spun around to face him. He could have sworn her glare burned right through his forehead. "And how are we supposed to do that with our current load?"
"That's why I'm here," he said. "Can we do it?"
She was quiet for a moment. "You know what?" She turned back to her screens, clicking furiously at a network schema until a red box filled half the screen. "You want to add additional workloads, you've got to fix this VNF I've been nagging you about to get rid of that memory leak."
He grimaced. The fact was that he'd fixed it weeks ago. "I did, I just haven't been able to get it certified. Ticket IT-48829, requesting a staging environment."
Her fingers flew over the keyboard for a moment. "And it's in progress. But there are three certifications ahead of you." She checked another screen. "I'm going to bump you up the list. We can get you in a week from tomorrow."
So far we've been talking about orchestrating workloads, but there's one piece of the puzzle that has, until now, been missing: with Infrastructure as Code, the infrastructure IS a workload; all of the intelligence we apply to deploying applications applies to the infrastructure itself.
We have long since passed the point where one person like Bernie, or even a team of operators, could manually deploy servers and keep track of what's going on within an enterprise infrastructure environment. That's why we have Infrastructure as Code, where traditional hardware configurations such as servers and networking are handled not by a person typing commands at a command line, but by configuration management tools such as Puppet, Chef, and Salt. That means that when someone like Bernie is tasked with certifying a new piece of software, instead of scrambling, she can create a test environment that's not just similar to the production environment, it's absolutely identical, so she knows that once the software is promoted to production, it'll behave as it did in the testing phase.
Unfortunately, while organizations use these capabilities in the ways you'd expect, enabling version control and even creating devops environments where developers can take some of the load off operators, for the most part these are fairly static deployments. On the other hand, by treating them more like actual software and adding more intelligence, we can get a much more intelligent infrastructure environment, from predicting bad deployments to getting better efficiency to enabling self-healing.
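As a rough illustration of why a declared infrastructure lets a test environment be identical to production rather than merely similar, consider the Python sketch below. It does not use Puppet, Chef, or Salt; the `BASE_SPEC` structure, its keys, and the helper names are all hypothetical stand-ins for whatever your configuration management tool actually consumes.

```python
import copy
import hashlib
import json

# Hypothetical declarative description of an environment; the keys and
# values are invented for illustration.
BASE_SPEC = {
    "servers": [
        {"role": "web", "count": 3, "image": "os-image-1.2"},
        {"role": "db", "count": 2, "image": "os-image-1.2"},
    ],
    "networks": [{"name": "internal", "cidr": "10.0.0.0/24"}],
}


def render_environment(name, spec=BASE_SPEC):
    """Build an environment definition from the shared spec.

    Only the environment name differs between renderings; everything
    else is a deep copy of the same declared infrastructure.
    """
    return {"name": name, "infrastructure": copy.deepcopy(spec)}


def fingerprint(environment):
    """Hash the infrastructure part so two environments can be compared."""
    canonical = json.dumps(environment["infrastructure"], sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


production = render_environment("production")
staging = render_environment("staging")

# Same spec in, same infrastructure out.
identical = fingerprint(production) == fingerprint(staging)

# A manual, out-of-band change to staging shows up as drift.
staging["infrastructure"]["servers"][0]["count"] = 1
drifted = fingerprint(production) != fingerprint(staging)
print(identical, drifted)
```

Because both environments come from one definition, "identical" becomes something you can check mechanically, and any hand-made change is detectable. That check is one small example of the added intelligence this section argues for.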
Coherent and comprehensive monitoring
Bernie Gordon quietly closed her bedroom door; regression and performance testing on the new version of Andy's VNF had gone well, but had taken much longer than expected. Now it was after midnight as she got ready for bed, and there was something that was still bothering her about the cutover to production. Nothing she could put her finger on, but she was worried.
Her husband snored quietly and she gave him a gentle kiss before turning out the light.
Then the text came in. She grabbed her phone and pushed the first button her fingers found to cut off the sound so it wouldn't wake Frank, but she already knew what the text would tell her.
The production system was failing.
Before she could even get her laptop out of her bag to check on it, her phone rang. Carl's avatar stared up at her from the screen.
Frank shot upright. "Who died?" he asked, heart racing and eyes wide.
"Nobody," she said. "Yet. Go back to sleep." She answered the call. "I got the text and I'm on my way back in," she said without waiting.
With Intelligent Delivery, nobody should be getting woken up in the middle of the night, because with sufficient monitoring and analysis of that monitoring, the system should be able to predict most issues before they turn into problems. Knowing how fast a disk is filling up is easy. Knowing whether a particular traffic pattern indicates a cyberattack is more complicated. In both cases, though, an Intelligent Delivery system should be able to either recommend actions to prevent problems or take action autonomously.
What's more, monitoring is about more than just preventing problems; it can provide the intelligence you need to optimize workload placement, and can even feed back into your business to provide you with insights you didn't know you were missing. Intelligent Delivery requires comprehensive, coherent monitoring in order to provide a complete picture.
Of course, Intelligent Delivery isn't something we can do overnight. The benefits are substantial, but so are the requirements.
What does Intelligent Delivery involve?
Intelligent Delivery, when done right, has the following advantages and requirements:
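The "easy" case mentioned above, knowing how fast a disk is filling up, can be sketched in a few lines of Python. This is only an illustration: the function names, the sample readings, and the 24-hour warning threshold are assumptions, and a real system would pull readings from its monitoring stack instead of a hard-coded list.

```python
def hours_until_full(samples, capacity_gb):
    """Estimate hours until the disk is full from (hour, used_gb) samples.

    Fits a straight line to the samples by least squares to get a growth
    rate, then extrapolates from the most recent reading to capacity.
    Returns None when usage is flat or shrinking, since no fill time
    can be predicted from the trend.
    """
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_u = sum(u for _, u in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        raise ValueError("samples must span more than one point in time")
    slope = sum((t - mean_t) * (u - mean_u) for t, u in samples) / var_t
    if slope <= 0:
        return None
    _, latest_used = max(samples)  # sample with the largest timestamp
    return max(0.0, (capacity_gb - latest_used) / slope)


def recommend(samples, capacity_gb, warn_within_hours=24.0):
    """Turn the prediction into a recommendation instead of a 2 a.m. page."""
    remaining = hours_until_full(samples, capacity_gb)
    if remaining is not None and remaining <= warn_within_hours:
        return f"expand or clean up: disk predicted full in {remaining:.1f} hours"
    return "no action needed"


# Hypothetical hourly readings: usage grows by 2 GB per hour.
readings = [(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]
print(recommend(readings, capacity_gb=100.0))
```

The harder cases, such as spotting an attack in a traffic pattern, need far richer models, but the shape is the same: monitoring data goes in, and a recommendation or an automated action comes out before anyone's phone rings.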
- Defined architecture: You must always be able to analyze and duplicate your infrastructure at a moment's notice. You can accomplish this using declarative infrastructure and Infrastructure as Code.
- Flexible but controllable infrastructure: By defining your infrastructure, you get the ability to define how and where your workloads run. This makes it possible for you to opportunistically consume resources, moving your workloads to the most appropriate hardware -- or the most cost-effective -- at a moment's notice.
- Intelligent oversight: It's impossible to keep up with everything that affects an infrastructure, from operational issues to changing costs to cyberattacks. Your infrastructure must be intelligent enough to adapt to changing conditions while still providing visibility and control.
- Secure footing: Finally, Intelligent Delivery means that infrastructure stays secure using a combination of these capabilities:
  - Defined architecture enables you to constantly consume the most up-to-date operating system and application images without losing control of updates.
  - Flexible but controllable infrastructure enables you to immediately move workloads out of problem areas.
  - Intelligent oversight enables you to detect threats before they become problems.
What technologies do we need for Intelligent Delivery?
All of the technologies we need for Intelligent Delivery already exist; we just need to start putting them together in such a way that they do what we need. Let's take a good hard look at the technologies involved:
- Virtualization and containers: Of course, the first step in cloud is some sort of virtualization, whether that consists of virtual machines provided by OpenStack or VMware, or containers and orchestration provided by Docker and/or Kubernetes.
- Multi-cloud: Intelligent Delivery requires the ability to move workloads between clouds, which not only prevents vendor lock-in but also increases robustness. These clouds will typically consist of either OpenStack or Kubernetes nodes, usually with federation, which enables multiple clusters to appear as one to an application.
- Infrastructure as Code: In order for Intelligent Delivery to be feasible, you must deploy servers, networks, and other infrastructure using a repeatable process. Infrastructure as Code makes it possible not only to audit the system but also to reliably, repeatedly perform the necessary deployment actions so you can duplicate your environment when necessary.
- Continuous Delivery tools: CI/CD is not a new concept; Jenkins pipelines are well understood, and now software such as the Spinnaker project is making it more accessible, as well as more powerful.
- Monitoring: In order for a system to be intelligent, it needs to know what's going on in the environment, and the only way for that to happen is to have extensive monitoring systems such as Grafana, which can feed data into the algorithms used to determine scheduling and predict issues.
- Microservices: To truly take advantage of a cloud-native environment, applications should use a microservices architecture, which decomposes functions into individual units you can deploy in different locations and call over the network.
- Service orchestration: A number of technologies are emerging to handle the orchestration of services and service requests. These range from service mesh capabilities from projects such as Istio, to the Open Service Broker project to broker requests, to the Open Policy Agent project to help determine where a request should, or even can, go. Some projects, such as Grafeas, are trying to standardize this process.
- Serverless: Even as containers seemingly trounce virtual machines (though that's up for debate), so-called serverless technology makes it possible to make a request without knowing or caring where the service actually resides. As infrastructure becomes increasingly "provider agnostic" this will become a more important technology.
- Network Functions Virtualization: Although today NFV is confined mostly to telcos and other Communication Service Providers, it can provide the kind of control and flexibility required for the Intelligent Delivery environment.
- Physical devices: As software gets broken down into smaller and smaller pieces, physical components can take on a larger role; for example, rather than having a sensor take readings and send them to a server that then feeds them to a service, the device can become an integral part of the application infrastructure, communicating directly where possible.
- Machine learning: Eventually we will build the infrastructure to the point where we've made it as efficient as we can, and we can start to add additional intelligence by applying machine learning. For example, machine learning and other AI techniques can predict hardware or software failures based on event streams so they can be prevented, or they can choose optimal workload placement based on traffic patterns.
Carl glanced at the collection of public cloud bills in his inbox. Altogether, he knew, they were a fraction of what BigCo had been paying when they'd been locked into GWS. More than that, though, he knew he had options, and he liked that.
He looked through the glass wall of his office. Off in the corner he could see Bernie. She was still a bundle of activity -- you couldn't slow her down -- but she seemed more relaxed these days, and happier as she worked on new plans for what their infrastructure could do going forward instead of just keeping on top of tickets all day.
On the other side of the floor, Andy and his team stared intently at a single monitor. They held that pose for a moment, then cheered.
A Slack notification popped up on his monitor. "The new service is certified, live, and ready for customers," Andy told him, "and one day before GoldCo even announces theirs."
Carl smiled. "Good job," he replied, and started on plans for next quarter.