Towards a highly available (HA) open cloud: an introduction to production OpenStack
When OpenStack needs to be deployed for production purposes, be it a small cluster for development environments in a start-up or a large-scale public cloud provider installation, there are several key demands that the deployment must meet. The most frequent, and thus most important, requirements are:
- High Availability (HA) and redundancy of service
- Scalability of the cluster
- Automation of operations
At Mirantis we have developed an approach that allows you to satisfy all three of these demands. This article introduces a series of posts describing our approach and gives a bird's-eye view of methods and tools used.
High availability and redundancy
OpenStack services can generally be divided into several groups, based on the HA approach appropriate for each service.
The first group includes the API servers, namely:
- nova-api
- glance-api
- glance-registry
- keystone
- swift-proxy
As HTTP/REST services, they can be made redundant relatively simply by adding a load balancer to the cluster. If the load balancer supports health checking, that suffices to provide basic high availability for the API services. Note that in the 2012.1 (Essex) release of the OpenStack platform, only the Swift API supports a dedicated "healthcheck" call. Other services require API extensions to support such a call and perform an actual check of service health.
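A load balancer's health check can be as simple as an HTTP probe against each backend. The sketch below (in Python) shows the idea; the URL in the usage comment assumes Swift's `/healthcheck` endpoint, which in Essex is the only API with such a dedicated call:

```python
import urllib.request
import urllib.error

def is_healthy(url, timeout=2.0):
    """Return True if the service answers its healthcheck URL with HTTP 200.

    A load balancer's external check script could map this boolean to
    marking the backend up or down.
    """
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx HTTP error:
        # treat the backend as unhealthy.
        return False

# Example (hypothetical host name):
# is_healthy("http://swift-proxy:8080/healthcheck")
```

For APIs without a healthcheck call, the same probe against any cheap read-only endpoint gives a weaker but still useful liveness signal.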
The second group includes services that actually manage virtual servers and provide resources for them:
- nova-compute
- nova-network
- nova-volume
These services do not require specific redundancy in the production environment. The approach for this group of services is based on a fundamental paradigm for cloud computing, where we have many interchangeable workers, and loss of a single worker causes only temporary local disruption in manageability, not in service provided by the cluster. Thus, it is enough to monitor these services by using an external monitoring system and have basic recovery scenarios implemented as event handlers. A simple scenario is to send a notification to the administrator and attempt to restart the failed service.
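A recovery handler of this kind can be sketched as follows. The `is_alive`, `restart`, and `notify` callables are hypothetical hooks that a monitoring system would supply; in practice they might shell out to the init system or send an e-mail:

```python
def recover(name, is_alive, restart, notify, max_restarts=1):
    """Basic event handler for a failed worker service (e.g. nova-compute).

    Losing one worker only degrades capacity, so notify-and-restart
    is usually a sufficient first response.
    """
    if is_alive():
        return "ok"
    notify(f"{name} is down, attempting restart")
    for _ in range(max_restarts):
        restart()
        if is_alive():
            notify(f"{name} recovered")
            return "recovered"
    notify(f"{name} could not be restarted; manual intervention required")
    return "failed"
```

The key point is that the handler never needs to fail anything over: the cluster keeps serving while the worker is down.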
High availability of the networking service as provided by the multihost feature of nova-network is covered in the official OpenStack documentation. In actual production environments, however, a frequent change to this scheme is offloading the routing of project networks to an external hardware router. This leaves only DHCP functions to nova-network, and multihost ensures that the DHCP server is not a single point of failure.
Redundancy is built into the nova-scheduler service. When the first instance of nova-scheduler starts, it begins consuming messages from the scheduler queue on the RabbitMQ server. An additional queue, scheduler_fanout_<ID>, is created for nova-compute services to publish status updates to; the <ID> part of the queue name is the identifier of the new scheduler instance. All subsequently started nova-scheduler instances behave the same way, which allows all of them to work in parallel without additional effort.
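The fanout behavior can be illustrated with a toy in-memory model (a real deployment would talk to RabbitMQ through a client library; only the scheduler_fanout_<ID> naming convention below comes from the text):

```python
import uuid
from collections import defaultdict

class FanoutExchange:
    """Toy model of RabbitMQ fanout delivery: every bound queue receives
    a copy of each published message, which is how nova-compute status
    updates reach all running schedulers."""
    def __init__(self):
        self.queues = defaultdict(list)

    def bind(self, queue_name):
        self.queues[queue_name]  # defaultdict access creates the queue

    def publish(self, message):
        for queue in self.queues.values():
            queue.append(message)

exchange = FanoutExchange()

def start_scheduler():
    """Each new scheduler instance binds its own scheduler_fanout_<ID>
    queue, so schedulers can run in parallel without coordination."""
    instance_id = uuid.uuid4().hex[:8]
    queue_name = f"scheduler_fanout_{instance_id}"
    exchange.bind(queue_name)
    return queue_name
```

Because each instance owns a uniquely named queue, adding a scheduler never requires reconfiguring the existing ones.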
The RabbitMQ queue server is the main communication bus for all nova services, and it must be reliable in any production environment. Clustering and queue mirroring are supported natively by RabbitMQ, and a load balancer can be used to distribute connections between RabbitMQ servers running in clustered mode. Mirantis has also developed a patch for Nova RPC library implementation that allows it to fail over to the backup RabbitMQ server if the primary goes down and is unable to accept connections.
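The failover idea behind that patch can be sketched as a simple ordered connect loop; `connect` and the host names here are placeholders for the actual broker client, not the Nova RPC implementation itself:

```python
def connect_with_failover(connect, servers):
    """Try each RabbitMQ server in order and return the first live
    connection.

    `connect` is an injected callable (e.g. a broker client's
    connection constructor); `servers` is assumed non-empty, listed
    primary first.
    """
    last_error = None
    for host in servers:
        try:
            return connect(host)
        except ConnectionError as err:
            last_error = err  # this server is down; fall through to the next
    raise last_error

# Usage (hypothetical host names):
# conn = connect_with_failover(open_connection, ["rabbit-primary", "rabbit-backup"])
```

With mirrored queues on the backup broker, reconnecting this way lets nova services resume consuming where they left off.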
MySQL is the most widely used database for OpenStack deployments, and the one most frequently used in Mirantis deployments. A number of solutions currently provide both high availability and scalability for MySQL; the most common among them is MySQL-MMM (Multi-Master Replication Manager). It is used in more than one Mirantis deployment and works well enough, despite numerous known limitations.
Even though we have had no serious issues with MMM, we are looking at more state-of-the-art open source solutions for database HA, particularly the WSREP-based Galera clustering engine for MySQL. A Galera cluster features simple and transparent scalability mechanisms and supports high availability through the synchronous multi-master replication provided by the WSREP layer.
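Galera exposes cluster state through `wsrep_*` status variables (`SHOW STATUS LIKE 'wsrep_%'`). A database load balancer's node check can be sketched as below; `wsrep_ready` and `wsrep_cluster_status` are real Galera status variables, while the surrounding function is illustrative:

```python
def galera_node_usable(status):
    """Decide whether a Galera node should receive client traffic.

    `status` is a dict of rows from SHOW STATUS LIKE 'wsrep_%'. A node
    is safe to use only when it reports itself ready and is part of the
    primary component of the cluster (i.e. not partitioned away).
    """
    return (
        status.get("wsrep_ready") == "ON"
        and status.get("wsrep_cluster_status") == "Primary"
    )
```

Hooking such a check into the load balancer keeps clients off nodes that are still syncing or have been split from the primary component.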
The next post in this blog will cover solutions used by Mirantis to implement high availability of RabbitMQ and MySQL DB.
Scalability
Once we know how to balance the load and parallelize the workload, we need a mechanism for adding workers to the cluster, expanding it to handle a bigger workload, also known as "horizontal" scaling. For most OpenStack platform components, it is simple to add a server instance, include it in the load balancer configuration, and have the cluster scaled out. However, this poses two specific problems in real-world production deployments:
- Most clusters are scaled by nodes, not by individual service instances. This makes it necessary to define node roles, so that the cluster can be scaled intelligently. A role essentially translates into a set of services running on a node, and the cluster is scaled out by adding a node with a given role.
- Scaling out the cluster by adding a controller node requires configuration changes in multiple places in a particular order: the node must be deployed and its services started, and only then is the load balancer configuration updated to include the new node. For compute nodes the process is simpler, but it still requires a high level of automation, from bare-metal provisioning to service configuration.
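The ordering constraint from the controller case can be made explicit as a single orchestration step; the callables below are stand-ins for the actual deployment, service, and load balancer tooling:

```python
def scale_out_controller(node, deploy, start_services, update_balancer):
    """Add a controller node in the order described above.

    The load balancer must learn about the node only after its services
    are running, so that no traffic reaches a half-ready node.
    """
    deploy(node)           # bare-metal provisioning and OS install
    start_services(node)   # bring up the OpenStack services for the role
    update_balancer(node)  # last: expose the node to client traffic
```

Encoding the order in one function, rather than in an operator's run book, is what makes the scale-out repeatable.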
Nodes and roles
While OpenStack services can be distributed among servers with high flexibility, the most common way to deploy the OpenStack platform is to have two types of nodes: a controller node and compute nodes. A typical development OpenStack installation includes a single controller node that runs all services except the compute group, and multiple compute nodes that run compute services and host virtual servers.
Obviously, this architecture does not work for production purposes. For small production clusters, we tend to recommend that you make cluster nodes as self-sufficient as possible by installing API servers on compute nodes, and leaving only the database, queue server, and dashboard on the controller node. Controllers should run in a redundant configuration. The following node roles are defined in this architecture:
- Endpoint node: This node runs load balancing and high availability services that may include load balancing software and clustering applications. A dedicated load balancing network appliance can serve as an endpoint node. A cluster should have at least two endpoint nodes configured for redundancy.
- Controller node: This node hosts communication services that support operation of the whole cloud, including the queue server, state database, Horizon dashboard, and possibly a monitoring system. It can optionally host the nova-scheduler service and the API servers load balanced by the endpoint node. At least two controller nodes must exist in the cluster to provide redundancy. A controller node and an endpoint node can be combined in a single physical server, but that requires reconfiguring the nova services to move them off the ports used by the load balancer.
- Compute node: This node hosts a hypervisor and virtual instances, and provides compute resources to them. The compute node can also serve as network controller for instances it hosts, if a multihost network scheme is in use.
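The role-to-services mapping might be modeled as below. The exact service placement is illustrative, loosely following the small-cluster layout described above (API servers on compute nodes; database, queue server, and dashboard on controllers; load balancing software such as HAProxy on endpoint nodes is an assumption):

```python
# Hypothetical role map; real deployments vary.
NODE_ROLES = {
    "endpoint": ["haproxy"],
    "controller": ["rabbitmq-server", "mysql", "horizon", "nova-scheduler"],
    "compute": ["nova-compute", "nova-network", "nova-api"],
}

def services_for(roles):
    """Resolve the set of services a node must run given its roles,
    e.g. a combined controller+endpoint node."""
    return sorted({svc for role in roles for svc in NODE_ROLES[role]})
```

A configuration manager then only needs a node's role list to derive its full service set, including the combined controller/endpoint case.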
The architecture proposed above requires a sequence of steps to be performed on every physical server in the cluster. Some of the steps are quite complex, and some involve more than one node; for example, load balancer configuration or multi-master replication setup. The complexity of the current OpenStack deployment process makes scripting of these tasks and steps essential for success, which has already given birth to more than one project, including the well-known Devstack and Crowbar.
Simple scripting of the deployment process is not enough to successfully install OpenStack in production environments, nor to ensure scalability of the cluster. You would also need to develop new scripts every time you wanted to change something in your architecture or upgrade component versions. However, there is a class of tools designed for these tasks: configuration managers. The best known among them are Puppet and Chef, and there are products based on them (e.g., the aforementioned Crowbar, which has Chef under the hood).
We have used both Puppet and Chef to deploy OpenStack in a variety of projects. Naturally, each has its own limitations. From our own experience we know that the best results can be achieved when the configuration manager is supported by a centralized orchestration engine for seamless deployment. By combining it with a bare-metal provisioning application that configures the physical servers on a hardware level, and a test suite responsible for validation of deployment, we have an end-to-end approach that can quickly install the OpenStack platform in a wide range of hardware configurations and logical architectures.
Automation of operations
Using an orchestration engine with a configuration management system that recognizes node roles allows us to automate the deployment process to a very high degree, and to automate the scaling process as well. All of this reduces the cost of OpenStack operation and support. Most modern orchestrators have APIs that can be used to build CLI or web-based user interfaces, letting operators perform administrative tasks across the whole cluster or on specific parts of it.
We'll talk more about this in blog posts ahead.