I love a great user experience. And so, I picked iPhone over Android; I picked black-and-white Kindle over iPad for reading; I picked Netflix for couch-potato-ing over AppleTV. But when Amazon announced that their Instant Videos would be available on the iPad, I had to stop and think. And not just about my techno-toy inventory.
Netflix’s famous innovation, its risk-taking (Qwikster was not just about schadenfreude), and its AWS EC2 foundations are a two-edged sword. No shortage of thrills, either in the technology or the big-hairy-ass-risk-taking. Netflix is a relentless learner. I can’t get enough of the Netflix Tech Blog; no self-respecting cloud practitioner should miss it.
Only now, the thrill is a little more chilling. As Amazon streams movies straight onto the iPad, Netflix has one trick left: relying on great deals with the same forward-thinking plain dealers in the entertainment industry who gave us SOPA. Netflix could find itself getting minimized out of a short line between two points.
Now, if there’s one thing that turns the cloud from vapor into a fertile resource, it’s standardization. In fact, Adrian Cockcroft tweeted about this earlier today:
“Learned Kanban in 6Sigma training. ITIL and 6Sigma standardize terminology/tools”
Now, I’m not the first to point this Apple/Amazon irony out to Netflix, and suggest that perhaps OpenStack should be on the Netflix roadmap; Lew Moorman of Rackspace did it in a great talk at Structure earlier this year. He also observed why it’s easier said than done:
“You date your hardware provider, but you marry your cloud.”
I hope there’ll soon be a day when big OpenStack hiccups get as much attention as recent AWS outages. But the built-for-failure architecture of cloud has been battle-tested at EC2, both by acts of nature and by the Simian Army. Now it’s time to do the same for OpenStack.
We need to get to a point where we can readily compare a workload running on OpenStack vs. one running on some other cloud. What might such a comparison tell us? The monkeys are a good start:
- Chaos Monkey, a tool that randomly disables production instances to make sure a workload can survive this common type of failure without any customer impact.
- Latency Monkey induces artificial delays in the RESTful client-server communication layer to simulate service degradation, and measures whether upstream services respond appropriately.
- Conformity Monkey finds instances that don’t adhere to best practices and shuts them down.
- Doctor Monkey taps into the health checks that run on each instance, and also monitors other external signs of health (e.g. CPU load), to detect unhealthy instances.
- Janitor Monkey ensures that the cloud environment is running free of clutter and waste. It searches for unused resources and disposes of them.
- Security Monkey is an extension of Conformity Monkey. It finds security violations or vulnerabilities, such as improperly configured security groups, and terminates the offending instances.
- Chaos Gorilla is similar to Chaos Monkey, but simulates an outage of an entire ‘availability zone’.
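To make the idea concrete, here is a toy sketch in Python of what Chaos Monkey and Chaos Gorilla do. It is not Netflix's code. `FakeCloud`, `Instance`, `build_cloud` and `workload_survives` are names I made up for illustration, and the "cloud" is just an in-memory list, so nothing here talks to AWS or OpenStack. The point is only that the monkeys can be written against a small provider-neutral interface, which is what a cross-cloud comparison would need.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Instance:
    instance_id: str
    zone: str
    running: bool = True


@dataclass
class FakeCloud:
    """In-memory stand-in for a cloud API (hypothetical, not AWS or OpenStack)."""
    instances: list = field(default_factory=list)

    def running(self):
        return [i for i in self.instances if i.running]

    def terminate(self, instance):
        instance.running = False


def chaos_monkey(cloud, rng):
    """Terminate one randomly chosen running instance; return it (or None)."""
    candidates = cloud.running()
    if not candidates:
        return None
    victim = rng.choice(candidates)
    cloud.terminate(victim)
    return victim


def chaos_gorilla(cloud, zone):
    """Terminate every running instance in one zone; return the victims."""
    victims = [i for i in cloud.running() if i.zone == zone]
    for victim in victims:
        cloud.terminate(victim)
    return victims


def workload_survives(cloud, min_running):
    """Assumed health rule: the workload is fine with at least min_running instances."""
    return len(cloud.running()) >= min_running


def build_cloud():
    """Four made-up instances spread over two made-up zones."""
    zones = ["zone-a", "zone-a", "zone-b", "zone-b"]
    return FakeCloud([Instance(f"i-{n}", z) for n, z in enumerate(zones)])


if __name__ == "__main__":
    cloud = build_cloud()
    victim = chaos_monkey(cloud, random.Random())
    print("monkey killed", victim.instance_id, "survives:", workload_survives(cloud, 2))
    chaos_gorilla(cloud, "zone-a")
    print("after gorilla, survives:", workload_survives(cloud, 2))
```

Swap `FakeCloud` for a thin wrapper around a real provider's API and the same two functions could, in principle, be pointed at either cloud. The `min_running` threshold is a stand-in for whatever "no customer impact" really means for a given workload.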
Open sourcing Chaos Monkey was a great first step. Now, maybe Netflix doesn’t want anyone to know that they have put Moorman’s offer into play. OpenStack can gain from being at the receiving end of this kind of stuff, and I bet Netflix can too.