The Road to Hong Kong—OpenStack Summit Speakers #3: Savanna, Elastic Hadoop on OpenStack
This is the third in a series of posts by speakers at the Hong Kong OpenStack Summit. Today we feature the agenda for Ilya Elterman, Sergey Lukjanov, and Matthew Farrellee’s talk about provisioning and managing Hadoop clusters on OpenStack using Savanna, scheduled for November 7, from 2:40 pm to 3:20 pm.
Savanna supports two key use cases: on-demand cluster provisioning and on-demand Hadoop job execution, known as Elastic Data Processing (EDP). This presentation will focus on the general project vision and EDP functionality. It will introduce the Savanna project, review the features implemented in version 0.3, and outline future plans. It will also cover the key architectural aspects and include a live demo showing how users can execute Hadoop jobs in a single click, using a pre-configured Hadoop cluster template, on data stored in Swift.
The latest release of Savanna, an open-source project recently accepted into OpenStack Incubation, now provides Elastic Data Processing (EDP), a feature that allows single-click MapReduce job creation and launch. We’ll be talking about the full scope and features in version 0.3 next Thursday in Hong Kong, and we hope you will come to check it out. Here we share a few points that we will discuss there and give you an overview of a live EDP demo we will include in our talk. We can’t wait to share it.
1. Introduction to Savanna: What is it?
Savanna provides two management APIs: one for Hadoop clusters and one for jobs. In this part of our presentation, we will cover use cases, a general project overview, and the direction in which the project is going.
We’ll talk about the following use cases:
Fast cluster provisioning for development and QA
Dedicated clusters for each tenant, which resolves security and isolation issues for Hadoop multi-tenancy
EDP--Utilization of the unused compute capacity for bursty workloads
EDP--Running Hadoop workloads in a few clicks without expertise in Hadoop ops
Centralized cluster management and monitoring for administrators
2. The current state of EDP--Provisioning
When we discuss the current state of EDP, we will cover:
The REST API for executing MapReduce jobs without exposing the details of the underlying infrastructure (similar to AWS Elastic MapReduce [EMR]), including integration with:
Pluggable data sources: Swift
Pig and Hive job types
Oozie for workflow management
A user-friendly UI for ad-hoc analytics queries based on Pig or Hive
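To make the idea of a job-submission API concrete, here is a minimal sketch of how a client might assemble and validate a request body before sending it. It is not Savanna's actual API: the field names, the `JobRequest` class, the `build_payload` helper, the template name, and the `swift://` URL convention are all assumptions made for illustration only.

```python
import json
from dataclasses import dataclass, asdict

# Hypothetical job types for this sketch; only Pig and Hive are named in the agenda.
JOB_TYPES = {"Pig", "Hive"}


@dataclass
class JobRequest:
    """Illustrative request model; field names are NOT taken from Savanna."""
    job_type: str          # "Pig" or "Hive"
    script: str            # name of the script to run
    input_url: str         # where the input data lives (assumed swift:// scheme)
    output_url: str        # where results are written (assumed swift:// scheme)
    cluster_template: str  # name of a pre-configured cluster template

    def validate(self) -> None:
        # Reject job types this sketch does not know about.
        if self.job_type not in JOB_TYPES:
            raise ValueError(f"unsupported job type: {self.job_type}")
        # This sketch only accepts Swift as a data source.
        for url in (self.input_url, self.output_url):
            if not url.startswith("swift://"):
                raise ValueError(f"expected a swift:// URL, got: {url}")


def build_payload(request: JobRequest) -> str:
    """Validate the request and serialize it to a JSON string."""
    request.validate()
    return json.dumps(asdict(request), sort_keys=True)


if __name__ == "__main__":
    demo = JobRequest(
        job_type="Pig",
        script="wordcount.pig",
        input_url="swift://demo/input",
        output_url="swift://demo/output",
        cluster_template="hadoop-small",
    )
    print(build_payload(demo))
```

The point of the sketch is that the caller names a job, its data, and a template, and never touches the cluster's nodes or configuration directly.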
3. Live demo--Provisioning, EDP, and transient clusters
We’re especially excited to share a live demo that will feature a one-click MapReduce job execution flow on data located in Swift.
4. Roadmap for the Icehouse release cycle
In this part, we'll talk about our short-term and long-term roadmaps. The scope for Icehouse, the future of the Savanna architecture, and integration with other OpenStack projects will be defined in separate Design Summit sessions, held on Friday, November 8, from 1:30 pm to 4:50 pm. Please check out our Design Summit schedule and join us.
In essence, the project team is working toward integration with the OpenStack ecosystem, particularly Heat, Ceilometer, Tempest, and DevStack. Other plans include code hardening, EDP enhancements that incorporate external Hadoop Distributed File System (HDFS) and relational database management system (RDBMS) data sources, and performance testing.
5. Interesting stats
We will also share some interesting facts about our contributors, code, reviews, and community at large. Savanna’s reviewer team has grown over the last three months from 14 to 24 active members from five companies: Mirantis, Red Hat, Hortonworks, Rackspace, and IBM.
Ilya Elterman runs the Cloud Platform Engineering organization at Mirantis, responsible for creating elastic platform services on top of OpenStack IaaS. Sergey Lukjanov is the Project Technical Leader of the Savanna project, and his main responsibilities are architecture design and community-related work in Savanna. Matthew Farrellee is a Principal Software Engineer and Engineering Manager at Red Hat, with over a decade of experience in distributed and computational system development and management.