Big Data Multiplies IT Burden

Big data analytics is pervasive: 92% of business and IT executives see big data as important to their business (source: Wikibon, 2014). Use-cases include ad targeting, eCommerce recommendation engines, point-of-sale transaction analysis, Internet-of-things data analytics, IT systems monitoring, customer churn management, trade surveillance, risk modeling, threat analysis and search analytics. When you multiply the number of use-cases by the sources of data (web, wireless etc.), the departments in an enterprise, and the types of clusters for each use-case (QA, dev, production etc.), the permutations are mind-numbing:

Number of Clusters = use-cases × sources of data × departments × clusters per department
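
To get a feel for how quickly this multiplies, here is a minimal sketch of the formula above. The counts are hypothetical, chosen only for illustration; substitute the numbers from your own organization.

```python
from math import prod

# Hypothetical counts, for illustration only (not taken from any survey).
factors = {
    "use_cases": 10,
    "data_sources": 3,
    "departments": 5,
    "clusters_per_department": 3,  # e.g. dev, QA, production
}


def number_of_clusters(factors: dict) -> int:
    """Multiply the four factors from the formula above."""
    return prod(factors.values())


total = number_of_clusters(factors)
print(total)  # 450
```

Even with modest inputs, the product lands in the hundreds, which is why manual cluster management does not scale.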

To cater to these needs, an IT/OPS team might create a large number of small clusters, but it then gets bogged down with:

  • Manual effort to deploy, manage, monitor, upgrade, update and back up these big data clusters
  • Complaints from data scientists who have to wait for initial deployments and change management due to these manual efforts
  • Difficulty in managing performance as applications go into production and need to dynamically scale
  • Low utilization, with expensive hardware and software sitting around
  • Creation of data silos with no ability to share data, resulting in additional manual work to transfer data

The IT/OPS team might instead create a few large clusters, but that simply trades one set of problems for another. Large clusters present problems such as difficulty in performance scaling, multi-tenancy, data ingest, data integration with other big data tools, company-wide standardization of software and policies, backup and disaster recovery (backup windows can be as long as 12-24 hours for large clusters), poor utilization of resources (as low as 30%) and life-cycle management (e.g. updates, upgrades, monitoring and management).

Using Mirantis OpenStack to offer self-service big data clusters is an elegant way to solve these problems. Project Sahara, a part of Mirantis OpenStack, enables users to provision a Hadoop or Spark cluster on an OpenStack cloud with the click of a button. When combined with OpenStack Swift-compliant storage, this solution also addresses data sharing, backup and disaster recovery, and minimizes the code needed for ETL (extract, transform, load) solutions.

Learn More about Sahara

Data Scientist: One-Click Cluster Deployment

When your business wants answers, you, the data scientist, are raring to go! Your Hadoop or Spark cluster needs to be deployed instantly. You don’t want to submit tickets and wait weeks or months to get your cluster. You also want the right software distribution (e.g. Cloudera Hadoop Distribution, Hortonworks Data Platform or Apache Spark) installed and configured to your specifications. Mirantis OpenStack, with Sahara, enables you to provision a Hadoop or Spark cluster with a single click via a self-service interface, through either the Horizon dashboard or APIs. Size the cluster to match your workload in the dev phase; deploy clusters of different sizes with a single click. And when your solution goes into production, you can easily scale the cluster to increase performance. Finally, instead of being stuck with the cluster for months or years, you can tear it down with the click of a button. Get all of this through Mirantis OpenStack.
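
For readers who prefer the API route over the dashboard, the sketch below shows roughly what the create and tear-down requests could look like. It only builds the requests and sends nothing. The paths and field names reflect our reading of the Sahara v1.1 REST API and should be checked against the API reference for your release; the `Request` class, both helper functions, and every ID are hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Request:
    """A plain description of an HTTP call; nothing is sent over the network."""
    method: str
    path: str
    body: Optional[dict]


def create_cluster(project_id: str, name: str, plugin_name: str,
                   plugin_version: str, template_id: str, image_id: str) -> Request:
    # Field names are an assumption based on the Sahara v1.1 API.
    body = {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": plugin_version,
        "cluster_template_id": template_id,
        "default_image_id": image_id,
    }
    return Request("POST", f"/v1.1/{project_id}/clusters", body)


def delete_cluster(project_id: str, cluster_id: str) -> Request:
    # Tear-down is a single call once the cluster is no longer needed.
    return Request("DELETE", f"/v1.1/{project_id}/clusters/{cluster_id}", None)


# All identifiers below are placeholders, not real resources.
create = create_cluster("PROJECT_ID", "churn-model-dev", "vanilla", "2.7.1",
                        "TEMPLATE_ID", "IMAGE_ID")
teardown = delete_cluster("PROJECT_ID", "CLUSTER_ID")
print(create.method, create.path)
print(teardown.method, teardown.path)
```

Keeping the request description separate from the transport makes it easy to hand the same payload to whichever authenticated HTTP client your environment already uses.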

IT/OPS: Big Data Cluster Lifecycle

While 50% of IT/OPS executives believe that they are providing value for big data projects, only 18% of their users agree (source: Wikibon, 2014). A major reason for this discrepancy is manual deployment and change management, which aggravates users and IT/OPS alike. Using the capabilities of Mirantis OpenStack and Sahara, you can streamline the deployment and scaling of self-service Hadoop clusters. Using node templates, you control the policies governing the cluster by defining the size and role of each VM running as a node within the cluster. You can also group nodes by role and scale the number of nodes within a group, via either the UI or the API – giving you and your users full control of your cluster. Sahara manages the entire lifecycle of your big data cluster, from deployment to scaling to tear-down, optimizing the utilization of your resources. OpenStack also solves multi-tenancy headaches: each cluster can live in a separate project, taking advantage of OpenStack multi-tenancy. Moreover, you can use HDFS or any Swift-compliant storage. Swift simplifies data sharing and reduces the need to move, transform or copy data. Using resilient object storage that spans geographies as the back-end also eliminates the need for backups.
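
As a rough illustration of node templates and group-level scaling, the sketch below models a cluster template with two node groups and a resize request for one of them. The dictionary layout reflects our understanding of the Sahara v1.1 API (`node_groups`, `node_processes`, `resize_node_groups`) and should be verified against your release; the helper functions, flavor names, and plugin version are hypothetical.

```python
from typing import Any


def node_group(name: str, flavor_id: str, node_processes: list, count: int) -> dict:
    """Define the size (flavor), role (processes) and count of one group of VMs."""
    return {
        "name": name,
        "flavor_id": flavor_id,
        "node_processes": list(node_processes),
        "count": count,
    }


def cluster_template(name: str, plugin_name: str, plugin_version: str,
                     node_groups: list) -> dict:
    """Bundle node groups into one reusable cluster definition."""
    return {
        "name": name,
        "plugin_name": plugin_name,
        "hadoop_version": plugin_version,
        "node_groups": node_groups,
    }


def resize_body(group_name: str, new_count: int) -> dict:
    """Request body that scales a single node group to a new size."""
    if new_count < 0:
        raise ValueError("node count cannot be negative")
    return {"resize_node_groups": [{"name": group_name, "count": new_count}]}


# Flavor names and the plugin version are placeholders.
template: dict = cluster_template("analytics-dev", "vanilla", "2.7.1", [
    node_group("master", "FLAVOR_LARGE", ["namenode", "resourcemanager"], 1),
    node_group("worker", "FLAVOR_MEDIUM", ["datanode", "nodemanager"], 3),
])

total_nodes = sum(group["count"] for group in template["node_groups"])
print(total_nodes)  # 4

# Going to production: grow only the worker group, leave the master alone.
scale_up = resize_body("worker", 10)
print(scale_up)
```

Because scaling targets a named group, IT/OPS can fix the policy (flavor and role per group) once and let users adjust only the counts.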

Ultimately, Mirantis OpenStack frees you from manually creating Hadoop and Spark clusters.

Business Leader: Improved Agility

As a business leader, you want to use big data as a competitive advantage. The last thing you want is for your infrastructure to slow you down.

Mirantis OpenStack, along with Sahara, gives your data scientists and developers:

  • Instant access to Hadoop or Spark clusters – completely configured
  • Performance scaling as needed
  • Ability to experiment by building and tearing down clusters on demand
  • Ability to share data by using Swift-compliant storage as the back-end
  • Significant reduction in synchronization code between applications and data silos

Your IT/OPS staff, at the same time, will:

  • Eliminate manual effort through self-service provisioning of Hadoop or Spark clusters, slashing operational expenses (OPEX)
  • Improve resource utilization of equipment and cut capital expenditures (CAPEX)
  • Migrate Amazon Elastic MapReduce (EMR) workloads to OpenStack, eliminate shadow IT, improve compliance and reduce costs

See It In Action: Hadoop on OpenStack Using Sahara

Sahara is the OpenStack integrated project for elastic data processing (EDP). In this 8-minute video, you can learn how Hadoop clusters based on your preferred technology can be easily deployed and managed within the familiar constructs of OpenStack.

See Also: Murano Application Catalog

The Murano application catalog, along with packages such as Kubernetes and NoSQL-as-a-Service, complements the big data capabilities provided by Hadoop and Spark.

Learn More about Murano