Sahara updates in the OpenStack Juno release
The OpenStack Juno release brings several new features to the Sahara project, including the following:
New and updated provisioning plugins are included: Apache Spark, Vanilla with Hadoop 2.4.1, and Cloudera Hadoop Distribution.
Provisioning and EDP engines have been enhanced to support more cluster setups and job types.
Swift authorization offers better security.
The Sahara UI has been incorporated into the OpenStack Dashboard project.
Support for running in distributed mode is in alpha state.
Other updates are also available.
The Apache Spark plugin enables Spark jobs to execute on top of Hadoop clusters. The plugin uses the same Hadoop distribution as the Cloudera plugin but with a different EDP engine. It supports Spark 0.9.1 and 1.0.0 installations.
The Vanilla plugin can now install Hadoop 2.4.1. Hadoop 2.3.0 is now deprecated; however, all existing clusters continue to operate normally.
The Cloudera Hadoop Distribution plugin allows Hadoop clusters to be set up using Cloudera Manager. Sahara installs Cloudera Manager 5, which then provisions Hadoop 2.3.0 to VMs.
The plugin can perform all kinds of operations, including cluster scaling and running EDP jobs.
Sahara now provides the following EDP engines, which together allow it to execute more job types:
Oozie workflow engine - responsible for MapReduce, Pig, Hive, and Streaming jobs
Spark engine - executes Spark jobs on clusters provisioned with the new Apache Spark plugin
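The division of work between the two engines can be pictured as a lookup from job type to engine. The following is a minimal illustrative sketch, not Sahara's actual code: the ENGINE_FOR_JOB_TYPE table and the pick_engine function are hypothetical names, and the job-type strings simply mirror the list above.

```python
# Hypothetical mapping for illustration only; not taken from the Sahara source.
ENGINE_FOR_JOB_TYPE = {
    "MapReduce": "oozie",
    "Pig": "oozie",
    "Hive": "oozie",
    "Streaming": "oozie",
    "Spark": "spark",
}


def pick_engine(job_type):
    """Return the name of the engine assumed to handle the given job type."""
    try:
        return ENGINE_FOR_JOB_TYPE[job_type]
    except KeyError:
        raise ValueError("unsupported job type: %s" % job_type)


if __name__ == "__main__":
    for job_type in ("Pig", "Spark"):
        print(job_type, "->", pick_engine(job_type))
```

An unknown job type raises ValueError rather than falling back to a default engine, which keeps a misconfigured job from being sent to the wrong cluster type.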
Swift authorization improvement
The Swift authorization mechanism no longer requires storing credentials internally, which was a noted security concern. Keystone trusts are now used for Swift authorization.
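To make the idea of a trust concrete, the sketch below builds the kind of JSON body that the Keystone v3 OS-TRUST API accepts when a trust is created (POST /v3/OS-TRUST/trusts). It is only an illustration under stated assumptions: the build_trust_request helper, the placeholder IDs, the delegated role name, and the choice to enable impersonation are all hypothetical and are not taken from Sahara itself.

```python
import json


def build_trust_request(trustor_user_id, trustee_user_id, project_id, role_names):
    """Build a request body for creating a Keystone v3 trust.

    The trustor delegates the listed roles on one project to the trustee,
    so the trustee never needs to hold the trustor's credentials.
    """
    return {
        "trust": {
            "trustor_user_id": trustor_user_id,
            "trustee_user_id": trustee_user_id,
            "project_id": project_id,
            # Assumption: the trustee acts as the trustor when using the trust.
            "impersonation": True,
            "roles": [{"name": name} for name in role_names],
        }
    }


if __name__ == "__main__":
    # All identifiers and the role name below are placeholders.
    body = build_trust_request(
        trustor_user_id="USER_ID",
        trustee_user_id="SERVICE_USER_ID",
        project_id="PROJECT_ID",
        role_names=["Member"],
    )
    print(json.dumps(body, indent=2))
```

The point of the design is visible in the body itself: it carries user and project identifiers and a list of roles, but no password.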
The Sahara UI has been incorporated into the OpenStack Dashboard project. All panels are now available under the Project tab in the Data Processing section. This migration simplifies the Sahara deployment process, eliminating the need to customize the OpenStack Dashboard with Sahara’s panels.
Horizon has adopted Bootstrap 3 to improve the appearance of all the Sahara UI tables. Workflows and detail views have been updated.
Support for running in distributed mode
Sahara now has alpha support for running in distributed mode. Two process types can run on separate OpenStack nodes. The API process serves HTTP requests and builds the tasks for the engine. The engine process executes the tasks and reports back to the API. Distributed mode allows as many API and engine processes as needed per OpenStack cloud.
The all-in-one entry point has been renamed to the more descriptive sahara-all. Running Sahara through this entry point is still the recommended mode.
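The split between the API and engine roles can be sketched in a few lines of Python. This is a toy model only: threads and in-memory queues stand in for the separate processes and for whatever transport a real deployment uses, and the function names api_process, engine_process, and run are hypothetical.

```python
import queue
import threading


def api_process(requests, tasks):
    """Turn each incoming request into a task and hand it to the engine."""
    for request in requests:
        tasks.put({"action": request})
    tasks.put(None)  # sentinel: no more tasks will arrive


def engine_process(tasks, reports):
    """Execute tasks until the sentinel arrives, reporting each result."""
    while True:
        task = tasks.get()
        if task is None:
            break
        reports.put("done: " + task["action"])


def run(requests):
    """Run one API role and one engine role and collect the engine's reports."""
    tasks, reports = queue.Queue(), queue.Queue()
    engine = threading.Thread(target=engine_process, args=(tasks, reports))
    engine.start()
    api_process(requests, tasks)
    engine.join()
    results = []
    while not reports.empty():
        results.append(reports.get())
    return results


if __name__ == "__main__":
    for line in run(["create cluster", "scale cluster"]):
        print(line)
```

Because the API role only builds tasks and the engine role only executes them, either side could be replicated independently, which is the property distributed mode relies on.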
Sahara now supports more flexible configurations depending on the cloud installation, such as:
Regions support - If OpenStack is running in a multi-region mode, you can configure Sahara to operate inside a specific region.
Ceilometer integration - Sahara sends a notification each time a cluster changes state.
Security Groups support - Sahara can now set up a security group for instances in a node group. You can use an existing security group or create a new one.
Anti-affinity - The internal implementation now uses the Server Groups API. The change should not be visible to users because the new mechanism applies only to new clusters. Clusters created with Icehouse Sahara continue to operate normally.
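As an illustration of the security group option described above, the sketch below assembles the part of a node group template that selects existing groups, requests a new one, or both. The field names security_groups and auto_security_group are assumptions about the Sahara REST API and should be checked against the API reference for your release; the security_group_settings helper and the group name are hypothetical.

```python
def security_group_settings(existing_groups=None, create_new=False):
    """Return the security-group part of a node group template as a dict.

    existing_groups: names or IDs of security groups that already exist.
    create_new: ask for a new security group to be created for the node group.
    """
    settings = {}
    if existing_groups:
        # Assumed field name for reusing existing security groups.
        settings["security_groups"] = list(existing_groups)
    if create_new:
        # Assumed field name for requesting an automatically created group.
        settings["auto_security_group"] = True
    return settings


if __name__ == "__main__":
    # "hadoop-workers" is a placeholder group name.
    print(security_group_settings(existing_groups=["hadoop-workers"]))
    print(security_group_settings(create_new=True))
```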