OpenStack Project Technical Lead Interview Series #10: Sergey Lukjanov, OpenStack Savanna Project
This post is the 10th in a continuing series of interviews with OpenStack Project Technical Leads on our Mirantis blog. Our goal is to educate the broader tech community and help people understand how they can contribute to and benefit from OpenStack. Naturally, these are the opinions of the interviewee, not of Mirantis.
This interview is with Sergey Lukjanov, OpenStack Savanna Project Technical Lead.
Mirantis: Can you please introduce yourself?
Sergey Lukjanov: I’m a Senior Software Engineer and Technical Leader at Mirantis Inc., where I have been working for more than 3 years. My main responsibilities are architecture design and community-related work. I’m experienced in big data projects and technologies (Hadoop, HDFS, Cassandra, Twitter Storm, etc.) and enterprise-grade solutions. I’m currently contributing to several open source projects, including Twitter Storm and OpenStack.
Q: What is your history with OpenStack? Why do you engage?
A: I have been actively involved with OpenStack for about a year; before that, I had been in observer mode since the Diablo timeframe. I started actively contributing with the emergence of Swift middleware for data locality support. I then worked on the Savanna project from day one, and I have contributed to other OpenStack projects including Oslo, Swift, Nova client, Hacking, Pbr, Jeepyb, and so on. My main focus in OpenStack is to enrich its portfolio of services with platform-level functionality, making it easier for application developers to use and encouraging faster and wider adoption of the OpenStack platform.
Q: What are your responsibilities on the Savanna project as the PTL?
A: The main responsibility is to oversee the project. That includes bug and blueprint management in Launchpad, code review coordination in Gerrit, IRC meetings, and Design Summit chairing. I think a PTL is a person who should coordinate all subteams to avoid overlaps and conflicts between them and ensure that the overall project direction is in line with its scope and goals. In addition to that, I’m a top contributor to and reviewer of Savanna.
Q: Can you explain Savanna’s role within OpenStack? Why does Savanna matter?
A: I see OpenStack not just as an IaaS, but also as a vast community, a huge ecosystem of extremely fast-growing projects integrated with each other to provide one consistent cloud platform. And here I see an opportunity to further enhance the ecosystem by creating an integration framework with other open source initiatives and their communities. The integration with the Apache Hadoop community through the Savanna project is a great example. From the users’ point of view, big data processing could eventually be useful for most of the OpenStack projects.
Q: What is genuinely unique and disruptive about Savanna?
A: Savanna applied for incubation at the end of the Havana cycle within the Data Processing program. Currently, Savanna provides major infrastructure-level operations in two areas:
Hadoop ecosystem cluster provisioning and management using Hadoop vendors’ tooling, such as Apache Ambari to provision the Hortonworks Data Platform;
Hadoop job scheduling and operation, including job creation, execution, and so on.
In addition, I would like to clarify that Savanna doesn’t provide any Data APIs, because the range of potential problems in the big data area is extremely broad. In the future we’ll be moving from Hadoop to NNN data processing tools.
Q: Tell us about the Savanna community--who is contributing?
A: We started with a small team at Mirantis and grew to about 30 contributors in the Havana release cycle, with a core team from Mirantis, Red Hat, and Hortonworks and contributors from HP, IBM, UnitedStack, and Rackspace.
Q: What has the Savanna community accomplished so far?
A: Currently we have a service that can provision and manage clusters with support for scaling (both increasing and decreasing cluster size, including the addition of new node types), anti-affinity (for example, to guarantee data node reliability), and data locality (to improve Hadoop job execution performance). We’re using node group and cluster templates to store cluster configurations. As for Elastic Data Processing, our second and main feature, Savanna supports execution of plain jar, Pig, and Hive jobs using Oozie, including the ability to read data from and write data to Swift. In terms of pluggability, we have two plugins for Hadoop cluster provisioning: the Vanilla plugin, which simply installs all needed services, and the Hortonworks Data Platform plugin, which installs Apache Ambari to start up and configure a Hadoop cluster. And, of course, we have an OpenStack Dashboard plugin that exposes all our functionality.
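To make the template idea above a little more concrete, here is a minimal Python sketch of how node group templates, a cluster template, anti-affinity, and scaling could relate to one another. It is not Savanna's API or data model: the class names, field names, the `scale` helper, and the example flavor are all hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass, replace
from typing import Dict, Tuple


@dataclass(frozen=True)
class NodeGroupTemplate:
    """Hypothetical description of one kind of node in a cluster."""
    name: str
    flavor: str                   # instance size (assumed example value below)
    processes: Tuple[str, ...]    # Hadoop processes that run on this node type


@dataclass(frozen=True)
class ClusterTemplate:
    """Hypothetical cluster description built from node group templates."""
    name: str
    plugin: str                          # e.g. "vanilla" or "hdp"
    node_groups: Dict[str, int]          # node group name -> instance count
    anti_affinity: Tuple[str, ...] = ()  # processes to spread across hosts


def scale(template: ClusterTemplate, group: str, count: int) -> ClusterTemplate:
    """Return a copy of the template with one node group resized.

    A count of 0 removes the group; a new group name adds a node type.
    """
    if count < 0:
        raise ValueError("instance count must not be negative")
    groups = dict(template.node_groups)
    if count == 0:
        groups.pop(group, None)
    else:
        groups[group] = count
    return replace(template, node_groups=groups)


master = NodeGroupTemplate("master", "m1.medium", ("namenode", "jobtracker"))
worker = NodeGroupTemplate("worker", "m1.medium", ("datanode", "tasktracker"))

cluster = ClusterTemplate(
    name="demo",
    plugin="vanilla",
    node_groups={master.name: 1, worker.name: 3},
    anti_affinity=("datanode",),  # keep data nodes on different hosts
)

bigger = scale(cluster, "worker", 5)       # scale up
smaller = scale(bigger, "worker", 2)       # scale down
with_extra = scale(cluster, "compute", 2)  # add a new node type
```

Keeping the templates immutable means each scaling step yields a new description rather than changing a stored one, which is one simple way to model reusable cluster configurations.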
Q: Which capabilities will Savanna provide in the OpenStack Icehouse release?
A: The main goal is to integrate better with other OpenStack projects and infrastructure. The major change planned for Icehouse is to support Heat for resource orchestration instead of direct provisioning. Also, we’re working on integrating Savanna into the DevStack gate--Savanna is already supported by DevStack and we’re moving our API and integration tests to Tempest now. One more thing that I hope to see in Savanna in Icehouse is guest agents that will solve controller-instance accessibility issues by using pluggable transports; no more direct ssh/API calls will be needed. In the EDP we’d like to improve overall workflows and add support for new features, job types, data sources, and so on. In addition to that, I expect to see at least one more vendor-backed plugin, such as the IDH (Intel Distribution for Apache Hadoop) plugin that’s already in review.
Q: What do you wish people knew about this project?
A: Savanna’s mission is to provide the OpenStack community with data processing tools. Currently we’re focused on the Hadoop ecosystem, but we already have some discussions and blueprints to add other tools, for example Apache Spark and Twitter Storm. So, our ongoing work is to collect feedback on EDP and add new features and data processing tools to it.
Q: Are there any common misconceptions about Savanna?
A: The story about a Data API in Savanna. We have no Data API, but two levels of control plane API: one for cluster provisioning, operations, and management, and one for job execution and workflow management. And again, as for our mission--we’d like to provide complex tools in the data processing area instead of being a single-use or single-framework solution. Our domain area is data processing.
Q: Which use cases are you targeting?
A: There are several use cases that we’re keeping in mind while writing Savanna. First of all, self-service provisioning of data processing clusters (for now, Hadoop clusters). Utilization of unused compute capacity for bursty workloads is an important use case for cloud platforms, too. Another is running data processing workloads (different Hadoop jobs at the moment) in a few clicks without expertise in specific data processing tools.
Q: What is your vision for Savanna?
A: From my point of view, Savanna should be a provisioning and clustering service for data processing tools, whose most important feature is providing operations in the area of Elastic Data Processing, such as job execution.
Q: Whom would you like to see contributing to Savanna?
A: I’d love to see two types of people contribute. We need people who are interested in different distributions of Hadoop and especially in other data processing frameworks. The other group we really need is operators, people who will adopt and use Savanna for their data processing workloads and help us by providing feedback and comments.
Q: Which functionalities need to be enhanced and tested?
A: The future Heat integration needs testing because it will replace a huge part of the orchestration code. We’re working on moving our integration tests to Tempest, so that’s an area where we need help--both moving existing tests and adding new ones. We need more testing of Savanna on different operating systems and with different guest operating systems.
Q: How specifically can people get started?
A: I hope that it’s now not very difficult to start working on Savanna--it can be installed by DevStack, and only the diskimage-builder based image that we have on CDN needs to be uploaded to Glance. We have good guides for developers, admins, and users at http://docs.openstack.org/developer/savanna. And, of course, we’re working on making this process simpler, especially since we expect new plugin writers and contributors. For any questions you can find us in the #savanna channel on freenode and on the email@example.com mailing list (please use the [savanna] subject prefix).
Q: Thank you for this interview, Sergey.
A: You’re welcome.