Introducing Savanna, Hadoop as a Service for Private Cloud with OpenStack

on April 1, 2013

Hadoop began as a widely adopted implementation of the MapReduce paradigm. It has since become a platform for distributed computation, with a number of projects built on top of it. In this post, I will discuss use cases for moving two core components, the MapReduce framework and HDFS (a distributed, reliable file system), to OpenStack. This is a first step toward bringing the whole Hadoop infrastructure into the private cloud. The project, which we’ve called Savanna, was recently introduced by Mirantis and is described by my colleague Dmitry Mescheryakov in an accompanying YouTube video. This post covers the use cases we believe are most relevant to the problem.

Understanding the Hadoop Cloud Use Cases

The Hadoop ecosystem has different groups of users, each with its own view of the system and its own needs. We can divide these users into three major groups: system administrators, developers and QA, and analysts and data scientists. Let’s consider how we can help each group and how their problems can be solved using the OpenStack cloud.

Hadoop cloud use case: System Administrators

System administrators solve problems of installation, management, and monitoring of clusters. They need tools that are integrated into a central point of administration to monitor the entire IT infrastructure of the company.

Another issue that arises within a big organization is how to support different versions and distributions of Hadoop. One team may prefer the Cloudera distribution, while another may want to try a new feature that is currently available only in the latest version directly from Apache. For many teams, it is also important to have out-of-the-box integration with enterprise-level tools such as the Cloudera Management Console and Ambari for Apache/Hortonworks.

Running Hadoop in the cloud fits the current trend of moving applications to the cloud to improve resource utilization. For system administrators, the OpenStack cloud thus becomes an ideal tool, because it provides an easy way to set up multiple Hadoop environments, each with the version or distribution a team needs.

Hadoop cloud use case: Developers and QA

Developer and QA teams need to create different types of environments quickly, such as development, QA, and pre-production. Often a unique environment is created to test a specific issue or conduct an experiment, and it is not long-lived. The problem is especially acute for Hadoop developers, because even a simple test environment requires several machines. Because it is a cloud platform, OpenStack provides an ideal solution for the fast creation of new temporary environments.

Ideally, an OpenStack cloud should provide a REST API for Hadoop cluster setup and management. If we have a continuous delivery process and want to run a test in a distributed environment, the continuous integration server should be able to create a test cluster via API and run a test on it. After the job is complete, the server shuts down this cluster.
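The create, test, and shut down cycle described above can be sketched in Python. This is only an illustration under stated assumptions: `FakeClusterApi`, `create_cluster`, `delete_cluster`, and `run_on_temporary_cluster` are hypothetical names invented for this sketch, not the Savanna API, and the in-memory class stands in for real REST calls so that the example runs on its own.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class FakeClusterApi:
    """In-memory stand-in for a cluster-provisioning REST API (hypothetical)."""

    clusters: dict = field(default_factory=dict)
    _ids: itertools.count = field(default_factory=lambda: itertools.count(1))

    def create_cluster(self, name, node_count):
        # A real client would send a request here and wait for the cluster
        # to become active; this stub only records the cluster in memory.
        cluster_id = f"cluster-{next(self._ids)}"
        self.clusters[cluster_id] = {"name": name, "nodes": node_count}
        return cluster_id

    def delete_cluster(self, cluster_id):
        # A real client would ask the service to terminate the cluster.
        del self.clusters[cluster_id]


def run_on_temporary_cluster(api, name, node_count, job):
    """Create a cluster, run `job` against it, and always tear it down."""
    cluster_id = api.create_cluster(name, node_count)
    try:
        return job(cluster_id)
    finally:
        # Runs even if the job raises, so a failed build does not leak a cluster.
        api.delete_cluster(cluster_id)


if __name__ == "__main__":
    api = FakeClusterApi()
    outcome = run_on_temporary_cluster(
        api, "ci-test", 3, lambda cluster_id: f"tests ran on {cluster_id}"
    )
    print(outcome)
    print("clusters left running:", len(api.clusters))
```

The `try`/`finally` block is the design point here: a continuous integration server that creates clusters on demand has to release them even when a test fails, otherwise temporary environments accumulate.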

Another problem that development and QA teams often run into is giving access to data to another team that uses a different Hadoop cluster. If we create a new cluster, we should be able to populate it with data. An effective way to avoid transferring data to the new cluster is to use Swift for the permanent storage of data. This way, any new cluster can be configured to read its data directly from the main Swift cluster. The Mirantis team has already made major contributions to Swift that expose data locality information (see Change I6b1ba25b), and the Hadoop part (HADOOP-8545) is nearly finished.

In the next blog post we will discuss this Hadoop and Swift integration in more detail, and look into use cases for working with Swift as a permanent data source for Hadoop.

Hadoop cloud use case: Data Scientists

Data scientists prefer to work with high-level tools such as Pig or Hive. These tools provide a simpler interface for data processing than the traditional way of writing a MapReduce job. Analysts are not required to understand implementation details and often have little knowledge of what happens under the hood. As a result, an analytics job can behave unpredictably and damage an existing production cluster, and the load from an analytics job mixed with an existing production load can cause problems with cluster stability. Using OpenStack allows for the creation of an isolated environment for analytics, separate from production.

Data scientists usually run ad hoc queries, which means that large volumes of data are loaded into the Hadoop cluster. On a shared cluster, this can leave production jobs short of resources. Analysts would normally have to wait for a free window in which to run their queries, but OpenStack clouds can use the spare capacity of data center resources to meet these immediate loads. This lets data scientists run their ad hoc queries without waiting for a time slot on a production cluster.

Permanent production load is a use case for which we don’t recommend running Hadoop on top of OpenStack, even with all of these advantages. In this case, the manageability benefits are not significant enough to justify the cost of virtualization overhead, at least for now. In the future, when OpenStack launches Grizzly with bare metal provisioning, this issue will be fixed, and OpenStack will become a merging point for different types of Hadoop installations.

Savanna: Hadoop as a Service with OpenStack

To address the use cases above, Mirantis has initiated the Savanna project, Hadoop as a Service for OpenStack. For more detailed information, please visit the Savanna wiki page at https://wiki.openstack.org/wiki/Savanna and the project website at http://software.mirantis.com/key-related-openstack-projects/savanna-hadoop/.
