Understanding your options: Deployment topologies for High Availability (HA) with OpenStack
September 10, 2012
When I was designing my first OpenStack infrastructure I struggled to find information on how to distribute its numerous parts across the hardware. I studied a number of documents, including Rackspace’s reference architecture (which was once at http://referencearchitecture.org but now seems to be mothballed). I also went through the design diagrams presented in the OpenStack documentation. I must admit that back in those days I had only basic knowledge of how these components interact, so I ended up with a pretty simple setup: one “controller node” that included everything (API services, the scheduler, Glance, Keystone, the database, and the message queue), plus a set of compute nodes running the instances.
When I joined Mirantis, it strongly affected my approach, as I realized that all my ideas involving a farm of dedicated compute nodes plus one or two controller nodes were wrong. While it can be a good approach from the standpoint of keeping everything tidily separated, in practice we can easily mix and match workhorse components without overloading OpenStack (e.g., by moving API services and schedulers onto the compute nodes).
In general, one can assume that each OpenStack deployment needs to contain at least three types of nodes (with a possible fourth), which my colleague Oleg Gelbukh has outlined:

- the endpoint node, which provides load balancing and a single entry point to the cloud’s services;
- the controller node, which hosts some or all of the OpenStack internal daemons;
- the compute node, which runs the hypervisor and the virtual instances;
- and, optionally, the volume node, which serves persistent block storage to instances.
While the endpoint node’s role is obvious—it typically hosts the load-balancing software or appliance providing even traffic distribution to OpenStack components and high availability—the controller and compute nodes can be set up in many different ways, ranging from “fat” controller nodes, which host all the OpenStack internal daemons (scheduler, API services, Glance, Keystone, RabbitMQ, MySQL), to “thin” ones, which host only those services responsible for maintaining OpenStack’s state (RabbitMQ and MySQL). Compute nodes, in turn, can take on some of OpenStack’s internal processing by hosting API services and scheduler instances.
At Mirantis we have deployed service topologies for a wide range of clients. Here I’ll provide a walk-through of them along with diagrams, and take a look at the different ways in which OpenStack can be deployed. (Of course, service decoupling can go even further.)
Topology with a hardware load balancer
In this deployment variation, the hardware load balancer appliance is used to provide a connection endpoint to OpenStack services. API servers, schedulers, and instances of Glance and Keystone are distributed across the compute nodes, while the controller nodes host only the database and the message queue.
All the native Nova components are stateless web services; this allows you to scale them by adding more instances to the pool (see the Mirantis blog post on scaling API services for details). That’s why we can safely distribute them across a farm of compute nodes. The database and message queue server can be deployed on both controller nodes in a clustered fashion (my earlier post shows how to do it). Even better: the controller node now hosts only platform components that are not OpenStack internal services (MySQL and RabbitMQ are standard Linux daemons), so the cloud administrator can hand their administration off to an external entity, such as a corporate database team or a dedicated RabbitMQ cluster. This way, the central controller node disappears and we end up with a bunch of compute/API nodes, which we can scale almost linearly.
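To make the traffic-distribution idea concrete, here is a minimal sketch of what the balancer configuration for such a farm of stateless API services could look like, expressed as an HAProxy fragment (the IP addresses, port, and node names are hypothetical; a hardware appliance would be configured analogously):

```
# Hypothetical HAProxy fragment: nova-api balanced across three compute nodes.
# Health checks ("check") take a failed node out of rotation automatically.
listen nova-api
    bind 192.168.0.10:8774
    balance roundrobin
    server compute-01 192.168.0.11:8774 check
    server compute-02 192.168.0.12:8774 check
    server compute-03 192.168.0.13:8774 check
```

Because the API services keep no local state, adding capacity is just a matter of appending another `server` line and starting the daemon on a new compute node.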
Topology with a dedicated endpoint node
In this deployment configuration, we replace a hardware load balancer with an endpoint host that provides traffic distribution to a farm of services. Another major difference compared to the previous architecture is the placement of API services on controller nodes instead of compute nodes. Essentially, controller nodes have become “fatter” while compute nodes are “thinner.” Also, both controllers operate in active/standby fashion. Controller node failure conditions can be identified with tools such as Pacemaker and Corosync or Heartbeat.
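As an illustration of the active/standby mechanics, a virtual IP that fails over between the two controllers can be managed by Pacemaker; a minimal sketch in crm shell syntax (the IP address and interface name are hypothetical, and in a real deployment the OpenStack services would be colocated with this resource):

```
# Hypothetical Pacemaker (crm shell) fragment: a virtual IP held by the
# active controller; on failure, Pacemaker moves it to the standby node.
primitive p_vip ocf:heartbeat:IPaddr2 \
    params ip=192.168.0.10 cidr_netmask=24 nic=eth0 \
    op monitor interval=10s
```

Corosync or Heartbeat provides the cluster membership and messaging layer underneath; Pacemaker decides where resources such as this address should run.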
Topology with simple controller redundancy
In this deployment, endpoint nodes are combined with controller nodes. API services and schedulers run on the controller nodes alongside the load-balancing software, the database, and the message queue, with the two controllers backing each other up to provide simple redundancy.
Many ways to distribute services
I’ve shown you service distribution across physical nodes, which Mirantis has done for various clients. However, sysadmins can mix and match them in a completely different way to suit their needs. The diagram below shows—based on our experience at Mirantis—how OpenStack services can be distributed across different node types.
Hardware considerations for different node types
The main load on the endpoint node is generated by the network subsystem. This type of node needs plenty of CPU performance and network throughput. It is also useful to bond network interfaces for redundancy and increased bandwidth, if possible.
The cloud controller can be fat or thin. The minimum configuration it can host includes the pieces of OpenStack that maintain the system state: the database and the AMQP server. A redundant configuration of the cloud controller requires at least two hosts, and we recommend the use of network interface bonding for network redundancy and RAID1 or RAID10 for storage redundancy.
Compute nodes require as much memory and CPU power as possible. Requirements for the disk system are not very restrictive, though the use of SSDs can increase performance dramatically (since instance filesystems typically reside on the local disk). It is possible to use a single disk in a non-redundant configuration and in case of failure replace the disk and return the server back to the cluster as a new compute node.
In fact, the requirements for compute node hardware depend on customer evaluation of average virtual instance parameters and desired density of instances per physical host.
Volume controller nodes provide a persistent block storage feature to instances. Since block storage typically contains vital data, it is very important to ensure its availability and consistency. The volume node should contain at least six disks. We recommend installing the operating system on a redundant disk array (RAID1). The remaining four disks are assembled into a RAID5 or RAID10 array, depending on the configuration of the RAID controller.
Block volumes are shared via iSCSI protocol, which means high loads on the network subsystem. We recommend at least two bonded interfaces for iSCSI data exchange, possibly tuned for this type of traffic (jumbo frames, etc.).
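The bonding and jumbo-frame recommendations above can be sketched as a Debian/Ubuntu-style interfaces fragment (the interface names, address, and bonding mode are examples only; the right mode depends on your switches):

```
# Hypothetical /etc/network/interfaces fragment: two NICs bonded for
# iSCSI traffic, with jumbo frames (MTU 9000) enabled on the bond.
auto bond0
iface bond0 inet static
    address 10.0.2.21
    netmask 255.255.255.0
    mtu 9000
    bond-slaves eth2 eth3
    bond-mode 802.3ad
    bond-miimon 100
```

Note that jumbo frames only help if every device on the iSCSI path (NICs, switches, targets) is configured for the larger MTU; a mismatch can cause hard-to-diagnose stalls.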
OpenStack network topology resembles that of a traditional data center. (Other Mirantis posts provide a deeper insight into OpenStack networking: FlatDHCPManager, VlanManager.) Instances communicate internally over fixed IPs (a data center private network). This network is hidden from the world by NAT and a firewall provided by the nova-network service.
The public network has two purposes: it exposes virtual instances to the outside world via floating IPs, and it exposes the OpenStack API endpoints (through the load balancer) to the cloud’s users.
The public network is usually isolated from the private and management networks. A public/corporate network is typically a single class C network from the cloud owner’s public network range (for public clouds it is globally routed).
The private network is a network segment connected to all the compute nodes; all the bridges on the compute nodes are connected to this network. This is where instances exchange their fixed IP traffic. If VlanManager is in use, this network is further segmented into isolated VLANs, one per project existing in the cloud. Each VLAN contains an IP network dedicated to this project and connects virtual instances that belong to this project. If a FlatDHCP scheme is used, instances from different projects all share the same VLAN and IP space.
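To make the VlanManager scheme concrete, the network manager and its segmentation parameters are selected in nova.conf; a sketch of the relevant flags (the ranges, VLAN number, and interface are examples only):

```
# Hypothetical nova.conf fragment selecting VlanManager: each project
# gets its own VLAN and a dedicated fixed-IP subnet carved from fixed_range.
network_manager=nova.network.manager.VlanManager
vlan_start=100
vlan_interface=eth1
fixed_range=10.0.0.0/16
network_size=256
```

With FlatDHCPManager, by contrast, `vlan_start` and `vlan_interface` are not used and all projects draw fixed IPs from the same flat network.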
The management network connects all the cluster nodes and is used to exchange internal data between components of the OpenStack cluster. This network must be isolated from private and public networks for security reasons. The management network can also be used to serve the iSCSI protocol exchange between the compute and volume nodes if the traffic is not intensive. This network is a single class C network from a private IP address range (not globally routed).
The iSCSI network is not required unless your workload involves heavy processing on persistent block storage. In this case, we recommend iSCSI on dedicated wiring to keep it from interfering with management traffic and to potentially introduce some iSCSI optimizations like jumbo frames, queue lengths on interfaces, etc.
OpenStack daemons vs. networks
In high availability mode, all the OpenStack central components need to be put behind a load balancer. For this purpose, you can use dedicated hardware or an endpoint node. An endpoint node runs high-availability/load-balancing software and hides a farm of OpenStack daemons behind a single IP. The following table shows the placement of services on different networks under the load balancer:
OpenStack deployments can be organized in many ways, thus ensuring scalability and availability. This fact is not obvious from the online documentation (at least it wasn’t for me) and in fact I’ve seen a couple of deployments where sysadmins were sure they needed a central controller node. This isn’t true, and in fact it is possible to have an installation where there is no controller node at all and the database and messaging server are hosted by an external entity.
When architectures are distributed, you need to properly spread traffic across many instances of a service and also provide replication of stateful resources (like MySQL and RabbitMQ). The OpenStack folks haven’t provided any documentation on this so far, so Mirantis has been trying to fill this gap by producing a series of posts on scaling platform and API services.