Earlier this week, project Savanna, which provides elastic Hadoop provisioning and elastic data processing on OpenStack, was accepted into OpenStack incubation as the Data Processing service. On behalf of the entire Savanna team, we would like to thank the OpenStack Community and Technical Committee for their continued support in helping us reach this milestone.
Why is Savanna incubation a big deal?
Aside from providing useful functionality, which we feel will boost enterprise adoption of OpenStack, we see Savanna as an opportunity for the two largest open source ecosystems, the Apache Software Foundation (which houses Hadoop) and the OpenStack Foundation, to work more closely together and learn from each other. To that end, as Savanna embarks on its incubation journey, tighter integration between Savanna and the rest of the OpenStack ecosystem will be the Savanna team’s #1 priority.
We are committed to deeper integration
Savanna is already tightly integrated with core OpenStack components, including Nova, Keystone, Glance, Cinder, and Horizon, as seen in the figure below. Savanna also enables Hadoop to use Swift as storage for MapReduce jobs. In addition, Savanna uses diskimage-builder to build images with Hadoop preinstalled.
We plan to eventually use Ceilometer for metrics, Heat for orchestration, and potentially Ironic for provisioning bare metal or hybrid Hadoop clusters. Further, we intend to support databases provisioned by Trove as data sources for our data processing. Given the complexity of Hadoop cluster provisioning, we feel that Savanna incubation will benefit the many surrounding OpenStack projects that touch on provisioning and orchestration use cases. To that end, Mirantis has dedicated resources to Trove and Heat.
Savanna is committed to operating within, and integrating with, the OpenStack ecosystem.
Meet the team
I’m Sergey Lukjanov (SergeyLukjanov on IRC), the Savanna Project Technical Leader (PTL) and a Senior Software Engineer at Mirantis. My main responsibilities are architecture design and community-related work in Savanna. I also review and contribute code to Savanna and oversee Launchpad and Gerrit activity. The Big Data projects and technologies I’ve worked on include Hadoop, HDFS, HBase, Cassandra, and Twitter Storm, as well as enterprise-grade solutions.
The other Savanna core team members are:
- Alexander Ignatov (aignatov on IRC)–A Senior Software Engineer at Mirantis, he has expertise in networks, Java, and distributed systems such as Hadoop and HBase. Alexander has been involved in the project since the beginning. He is the main author of the Vanilla Hadoop plugin.
- Matthew Farrellee (mattf on IRC)–A Principal Software Engineer and Engineering Manager at Red Hat with over a decade of experience in distributed and computational system development and management, Matt has been involved with Savanna since it was renamed from EHO. He is a major contributor to the diskimage-builder elements for Savanna and an active participant in architecture design discussions. He is integrating Savanna within the Fedora Big Data SIG.
- John Speidel (jspeidel on IRC)–A Senior Member of Technical Staff at Hortonworks, John has 15 years of experience developing commercial middleware systems with a focus on distributed transaction processing. John is a co-author of the Hortonworks Data Platform plugin for Savanna.
Sharing in this accomplishment are all of the developers who have made significant contributions to Savanna. For a full list of contributors and other information about the project, please visit the Savanna Incubation wiki.
Incubation: what does that mean, anyway?
A project has to undergo a period of incubation before it can become part of the OpenStack integrated release. During this process, the incubated project goes through at least one release cycle, which includes aligning with OpenStack’s common practices and exploring integration opportunities with the other projects.
Along the way, projects migrate to the OpenStack namespace and the main OpenStack infrastructure.
The Technical Committee conducts a graduation review at the end of the development cycle, before the election of the next cycle’s PTLs. As a result of the review, projects that have fulfilled all of the incubation milestones and have substantially integrated with the other OpenStack services are promoted to Integrated status for the next release cycle.
About Savanna’s latest release
Savanna 0.2.2, released today, provides two important updates:
- The new Hortonworks Data Platform (HDP) Plugin, which enables you to provision HDP clusters on OpenStack using templates with a single click and in an easily repeatable fashion. Core to the HDP Plugin is Apache Ambari, which is used as the orchestrator for deploying the HDP stack. This plugin supports converting Apache Ambari Blueprints to native Savanna templates that can be used to provision Hadoop clusters. The main benefit of this approach is that users can export blueprints from the Hadoop clusters deployed and configured by Apache Ambari outside of OpenStack and then deploy the same clusters using Savanna.
- Improved cluster scaling support, bug fixes, and additional documentation, especially regarding plugins.