Big Data Analytics and Elastic Data Processing
Big Data Multiplies IT Burden
Big data analytics is pervasive: 92% of business and IT executives see big data as important to their business (source: Wikibon, 2014). Use cases range across ad targeting, eCommerce recommendation engines, point-of-sale transaction analysis, Internet-of-Things data analytics, IT systems monitoring, customer churn management, trade surveillance, risk modeling, threat analysis and search analytics. When you multiply the number of use cases by the sources of data (web, wireless, etc.), the departments in an enterprise, and the types of clusters for each use case (QA, dev, production, etc.), the number of combinations is mind-numbing.
To cater to these needs, an IT/OPS team might create a large number of small clusters, but the team then gets bogged down with:
- Manual effort to deploy, manage, monitor, upgrade, update and back up these big data clusters
- Complaints from data scientists who have to wait for initial deployments and change management due to these manual efforts
- Difficulty in managing performance as applications go into production and need to dynamically scale
- Low utilization, with expensive hardware and software sitting around
- Creation of data silos with no ability to share data, resulting in additional manual work to transfer data
The IT/OPS team might instead create a few large clusters, but that simply trades one set of problems for another. Large clusters present problems such as difficulty in performance scaling, multi-tenancy, data ingest, data integration with other big data tools, company-wide standardization of software and policies, backup and disaster recovery (backup windows can be as long as 12 to 24 hours for large clusters), poor utilization of resources (as low as 30%) and life-cycle management (e.g. updates, upgrades, monitoring and management).
Using Mirantis Cloud Platform (MCP) to offer self-service big data clusters is an elegant way to solve these problems.
# of Clusters = # of Use Cases x Sources of Data x Multiple Departments x Multiple Clusters Per Department
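To get a feel for how quickly the formula above grows, here is a minimal Python sketch. The counts are hypothetical placeholders chosen only for illustration, not figures from Wikibon or Mirantis; substitute your own.

```python
from math import prod

# Hypothetical counts for illustration only; none of these come from the source.
factors = {
    "use_cases": 5,
    "data_sources": 3,
    "departments": 4,
    "clusters_per_department": 3,  # e.g. dev, QA and production
}


def cluster_count(factors: dict[str, int]) -> int:
    """Multiply the factors of the formula to estimate the number of clusters."""
    return prod(factors.values())


total = cluster_count(factors)
print(total)  # 5 * 3 * 4 * 3 = 180
```

Even with these modest assumed inputs, the estimate lands at 180 clusters, which is why deploying and managing each one by hand does not scale.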
One-Click Cluster Deployment
When your business wants answers, you, the data scientist, are raring to go. Your Hadoop or Spark cluster needs to be deployed instantly. You don’t want to submit tickets and wait weeks or months to get your cluster. You also want the right software distribution (e.g. Cloudera Hadoop Distribution, Hortonworks Data Platform or Apache Spark) installed and configured to your specifications. Mirantis Cloud Platform enables you to provision a Hadoop or Spark cluster with a single click. Size the cluster to match your workload in the dev phase, and deploy clusters of different sizes with a single click. When your solution goes into production, you can easily scale the cluster to increase performance. Finally, instead of being stuck with the cluster for months or years, you want to tear it down with the click of a button. Get all of this through Mirantis Cloud Platform.
Big Data Cluster Lifecycle
While 50% of IT/OPS executives believe that they are providing value for big data projects, only 18% of their users agree (source: Wikibon, 2014). A major reason for this discrepancy is manual deployment and change management, which aggravates users and IT/OPS alike. Using the capabilities of Mirantis Cloud Platform, you can streamline the deployment and scaling of self-service Hadoop clusters. MCP also solves multi-tenancy headaches: each cluster can live in a separate project, taking advantage of OpenStack multi-tenancy. Moreover, you can use HDFS to simplify data sharing and reduce the need to move, transform or copy data.
Ultimately, Mirantis OpenStack helps you stop creating Hadoop and Spark clusters manually.
As a business leader, you want to use big data as a competitive advantage. The last thing you want is for your infrastructure to slow you down.
Mirantis Cloud Platform Gives Your Data Scientists and Developers:
- Instant access to Hadoop or Spark clusters – completely configured
- Performance scaling as needed
- Ability to experiment by building and tearing down clusters on-demand
- Ability to share data by using HDFS as the back-end
- Significant reduction in synchronization code between applications and data silos
Your IT/OPS Staff, at the Same Time, Will:
- Eliminate manual effort by allowing self-service provisioning of Hadoop or Spark clusters, thus slashing operational expenses (OPEX)
- Improve resource utilization of equipment, thus cutting capital expenditures (CAPEX)
- Migrate AWS Elastic MapReduce (EMR) workloads to OpenStack, eliminate shadow IT, improve compliance and reduce costs