Segregation in Grizzly: Availability Zones vs. Host Aggregates
May 16, 2013
Segregation of OpenStack nodes in Grizzly
As your cluster grows, you will very likely start to think about ways to segregate different areas of it for various reasons. Perhaps you want to make sure you have multiple copies of your data in multiple regions, or you want to ensure continuity of service if a particular data center’s power goes down. On the other hand, you might want to separate different parts of your cluster based on capabilities; you may want your customers to be able to request the fastest disks, or require that a particular flavor of instance is only booted on a particular type of hardware.
OpenStack’s Grizzly release brings a number of new features related to partitioning your cluster. Some of these, such as cells, are related to scaling the OpenStack compute service. Others, such as regions, involve geographical placement of data or compute resources.
Grizzly also introduces the concept of host aggregates, which control which machines host particular VMs. If you think that this sounds like Folsom’s availability zones, you’re not alone; the two are very commonly confused. But where availability zones are a customer-facing capability, host aggregates are meant to be used by administrators to separate hardware by particular properties, and are not seen by customers.
Let’s take a look at how these two capabilities differ, and when each is most appropriate.
OpenStack and availability zones
Let’s start with the more familiar idea of availability zones. An availability zone is a way in which the user can specify a particular “location” in which a VM should boot. The most common usage for availability zones is to group together servers in terms of, well, availability. For larger clusters, this might be defined in terms of geography, as in “as long as the Chicago data center has Internet connectivity, these servers will be available.” For smaller clusters, it might be defined in terms of more tangible measures, such as “as long as this particular power supply is up, these servers will be available.”
To specify the availability zone in which your VM will boot, you simply pass it to nova boot using the --availability-zone flag. For example, when booting a large instance, you can use this flag to name the zone in which that instance should start.
(The availability zone for a server itself is set in the nova.conf file using node_availability_zone.)
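From the user’s side, a zone carries no extra logic; it simply narrows the set of hosts a new VM may land on. The toy Python model below is a sketch of that idea, not nova’s scheduler code, and the host names and their zone assignments are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical compute hosts mapped to the availability zone each reports.
HOST_ZONES: Dict[str, str] = {
    "compute1": "chicago",
    "compute2": "chicago",
    "compute3": "nova",
}


def candidate_hosts(host_zones: Dict[str, str], requested_zone: Optional[str]) -> List[str]:
    """Return the hosts a new VM may boot on, given an optional zone request.

    In this toy model, omitting the zone leaves every host as a candidate.
    """
    if requested_zone is None:
        return sorted(host_zones)
    return sorted(host for host, zone in host_zones.items() if zone == requested_zone)


print(candidate_hosts(HOST_ZONES, "chicago"))  # ['compute1', 'compute2']
```

Requesting a zone that no host reports (say, "dallas") leaves no candidates at all in this model, which is the user-visible failure mode of picking the wrong bucket.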
So for the user, availability zones are fairly straightforward; pick a zone, start a VM. Host aggregates, on the other hand, are the administrator’s domain.
Where availability zones are designed to let users choose where to host their virtual machines, host aggregates are intended as a way to group servers that share a particular quality. That quality might indeed be geographical, as in “all hosts in the Chicago data center”, or based on availability, as in “all hosts in this rack”, but host aggregates are more typically used to group servers by capability, such as “all hosts using solid-state drives” or “all hosts with more than 32GB of memory”.
In fact, there aren’t any particular requirements for using host aggregates in OpenStack; they’re completely arbitrary, and subject to the whim of an administrator. For example, while it’s common to create aggregates such as “all hosts with SSDs” (so users know they’re getting the fastest disk access) or “all hosts with trusted hardware” (for secure applications), we can go ahead and create an aggregate such as “all hosts administered by Joe” (assuming that Joe is our ace admin, and his servers never, ever go down unexpectedly).
To create that aggregate, we use the nova aggregate-create command.
In this case, we create the aggregate within the chicago availability zone rather than the default nova zone; aggregates can be used to further subdivide availability zones, but the zone parameter is optional.
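Because an aggregate is just an admin-defined group of hosts, one created inside a zone carves out a subset of that zone. A minimal Python sketch of that relationship follows; the host names and the membership of the “administered by Joe” group are hypothetical.

```python
from typing import Dict, Set

# Hypothetical compute hosts mapped to their availability zones.
HOST_ZONES: Dict[str, str] = {
    "compute1": "chicago",
    "compute2": "chicago",
    "compute3": "chicago",
    "compute4": "chicago",
    "compute5": "nova",
}

# Hypothetical membership of the "all hosts administered by Joe" aggregate.
JOES_HOSTS: Set[str] = {"compute1", "compute2", "compute3"}


def zone_hosts(host_zones: Dict[str, str], zone: str) -> Set[str]:
    """All hosts that belong to the given availability zone."""
    return {host for host, host_zone in host_zones.items() if host_zone == zone}


def subdivides(aggregate_hosts: Set[str], host_zones: Dict[str, str], zone: str) -> bool:
    """True if every host in the aggregate lies inside the given zone."""
    return aggregate_hosts <= zone_hosts(host_zones, zone)


print(subdivides(JOES_HOSTS, HOST_ZONES, "chicago"))  # True
print(sorted(zone_hosts(HOST_ZONES, "chicago") - JOES_HOSTS))  # ['compute4']
```

In this model, a chicago host outside the aggregate (compute4 here) is still a candidate for anyone who simply asks for the chicago zone; the aggregate only matters once something, such as the scheduler, looks at it.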
So the big question is, if availability zones and host aggregates both segregate a cluster, why do we need both of them?
What a host aggregate brings to the party that availability zones don’t
Availability zones are handy for allowing users to specify a particular group of servers on which they want their VMs to run, but beyond that they don’t do much more than serve as a bucket. In this example, using an availability zone, our users can specify that a VM should be started up in the Chicago data center.
Host aggregates, on the other hand, serve as an intelligent way for schedulers to know where to place VMs based on some sort of characteristic. In this example, we might want to enable users to easily boot their most mission-critical VMs on servers that are administered by Joe, rather than leaving them to fate.
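In general, the workflow for using host aggregates is to enable aggregate support in the scheduler, create an aggregate, give it a metadata property, add hosts to it, create a flavor that requires that property, and then boot VMs with that flavor.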
Let’s look at how this would work in our case.
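1) First, we need to make sure that the scheduler supports host aggregates. To do that, check the /etc/nova/nova.conf file on the nova-scheduler server to make sure that the scheduler filters it enables include one that matches flavor requirements against aggregate metadata.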
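As a sketch of that check, the Python snippet below parses a nova.conf fragment with the standard configparser module and looks for a filter name in the scheduler_default_filters option. It assumes the filter in question is AggregateInstanceExtraSpecsFilter, the nova scheduler filter that compares a flavor’s extra_specs with aggregate metadata; the sample file contents are illustrative only, so check the option and filter names against your own installation.

```python
import configparser

# Illustrative nova.conf fragment; a real file contains much more.
SAMPLE_NOVA_CONF = """
[DEFAULT]
scheduler_default_filters=AggregateInstanceExtraSpecsFilter,AvailabilityZoneFilter,RamFilter,ComputeFilter
"""

# Assumption: the filter that matches flavor extra_specs against aggregate metadata.
AGGREGATE_FILTER = "AggregateInstanceExtraSpecsFilter"


def filter_enabled(conf_text: str, filter_name: str = AGGREGATE_FILTER) -> bool:
    """Return True if filter_name is listed explicitly in scheduler_default_filters.

    If the option is absent, this sketch returns False; it does not try to
    reproduce nova's built-in default filter list.
    """
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(conf_text)
    raw = parser.get("DEFAULT", "scheduler_default_filters", fallback="")
    enabled = [name.strip() for name in raw.split(",") if name.strip()]
    return filter_name in enabled


print(filter_enabled(SAMPLE_NOVA_CONF))  # True
```

On the scheduler node you would feed it the real file instead, for example filter_enabled(open("/etc/nova/nova.conf").read()).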
2) Next, create the aggregate itself.
This command creates a new aggregate, joesdomain, in the chicago availability zone, and displays the new aggregate’s id, which we’ll need in the following steps.
3) Now set the joeistheboss property on the aggregate, using that id.
4) Of course, we need to add some hosts to that aggregate so we have somewhere to boot the VMs.
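Steps 2 through 4 amount to creating a named group, tagging it with a key/value property, and filling it with hosts. The toy Python registry below mirrors those three operations; it is not nova’s data model, its ids are simply assigned from 1 upward (the real id is whatever nova reports), and the property value "true" and the host names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set


@dataclass
class Aggregate:
    id: int
    name: str
    availability_zone: Optional[str]
    hosts: Set[str] = field(default_factory=set)
    metadata: Dict[str, str] = field(default_factory=dict)


class AggregateRegistry:
    """In-memory stand-in for the admin-side aggregate operations."""

    def __init__(self) -> None:
        self._aggregates: Dict[int, Aggregate] = {}
        self._next_id = 1

    def create(self, name: str, availability_zone: Optional[str] = None) -> Aggregate:
        # Step 2: create the aggregate and hand back its id.
        aggregate = Aggregate(self._next_id, name, availability_zone)
        self._aggregates[aggregate.id] = aggregate
        self._next_id += 1
        return aggregate

    def set_metadata(self, aggregate_id: int, key: str, value: str) -> None:
        # Step 3: attach a key/value property to the aggregate.
        self._aggregates[aggregate_id].metadata[key] = value

    def add_host(self, aggregate_id: int, host: str) -> None:
        # Step 4: make a host a member of the aggregate.
        self._aggregates[aggregate_id].hosts.add(host)

    def get(self, aggregate_id: int) -> Aggregate:
        return self._aggregates[aggregate_id]


registry = AggregateRegistry()
joes = registry.create("joesdomain", "chicago")
registry.set_metadata(joes.id, "joeistheboss", "true")
for host in ("compute1", "compute2", "compute3"):
    registry.add_host(joes.id, host)

# 1 ['compute1', 'compute2', 'compute3'] {'joeistheboss': 'true'}
print(joes.id, sorted(joes.hosts), joes.metadata)
```

Keying every later operation by id, rather than by name, matches the way steps 3 and 4 refer back to the id reported in step 2.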
5) To tie all of this together, create a flavor that requires the joeistheboss property.
This creates the new flavor and records the joeistheboss requirement in its extra_specs property, which you can confirm with the nova flavor-show command.
6) Finally, users who are particularly concerned about their VMs can boot them with this flavor to make sure that they are hosted in joesdomain.
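The scheduler in this case knows that the requested flavor requires the joeistheboss property, that the joesdomain aggregate carries that property, and which hosts belong to joesdomain.
Accordingly, the scheduler knows to start this new VM on one of the three hosts we added to joesdomain in step 4.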
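To see why only those hosts qualify, here is a simplified Python model of the matching involved: a host passes only if, for every key in the flavor’s extra_specs, some aggregate containing that host carries the same key with the same value. It is loosely modeled on nova’s AggregateInstanceExtraSpecsFilter but is not its actual code; it uses plain string equality only, and the second aggregate (ssd), the host names, and the value "true" are hypothetical.

```python
from typing import Dict, FrozenSet, List, NamedTuple, Set


class Aggregate(NamedTuple):
    hosts: FrozenSet[str]
    metadata: Dict[str, str]


# Hypothetical aggregates, keyed by name.
AGGREGATES: Dict[str, Aggregate] = {
    "joesdomain": Aggregate(frozenset({"compute1", "compute2", "compute3"}), {"joeistheboss": "true"}),
    "ssd": Aggregate(frozenset({"compute3", "compute4"}), {"ssd": "true"}),
}
ALL_HOSTS: List[str] = ["compute1", "compute2", "compute3", "compute4", "compute5"]


def host_metadata(host: str, aggregates: Dict[str, Aggregate]) -> Dict[str, Set[str]]:
    """Collect every metadata value attached to a host through its aggregates."""
    merged: Dict[str, Set[str]] = {}
    for aggregate in aggregates.values():
        if host in aggregate.hosts:
            for key, value in aggregate.metadata.items():
                merged.setdefault(key, set()).add(value)
    return merged


def host_passes(host: str, extra_specs: Dict[str, str], aggregates: Dict[str, Aggregate]) -> bool:
    """A host passes if each required key/value appears in its aggregates' metadata."""
    metadata = host_metadata(host, aggregates)
    return all(value in metadata.get(key, set()) for key, value in extra_specs.items())


def filter_hosts(hosts: List[str], extra_specs: Dict[str, str], aggregates: Dict[str, Aggregate]) -> List[str]:
    return [host for host in hosts if host_passes(host, extra_specs, aggregates)]


flavor_extra_specs = {"joeistheboss": "true"}
print(filter_hosts(ALL_HOSTS, flavor_extra_specs, AGGREGATES))  # ['compute1', 'compute2', 'compute3']
```

In this model a flavor with empty extra_specs passes every host, while requiring both joeistheboss and ssd narrows the choice to compute3, the only host that belongs to both aggregates.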
What do I use when?
So now that we know the difference between availability zones and host aggregates, which do we use, and when?
As a user, there’s really no decision; only admins can create host aggregates, so availability zones are your only choice unless an administrator has already set up aggregates (and flavors that target them) for you.
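As an admin planning for your customers, however, you do have a decision to make. In general, you’ll need to consider whether you want users to choose where their VMs run, which is what availability zones are for, or want the scheduler to place VMs on hardware with particular properties, which is what host aggregates are for.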
While it’s easy to mix up availability zones and host aggregates, each is used for a different purpose, and each gives you control over the segregation of your cluster in a different way. Availability zones enable users to specifically choose from a group of hosts. Aggregates enable an administrator to more easily specify the way in which hardware is utilized. Here we’ve chosen the somewhat lighthearted criterion of requiring a particular system administrator, but the point is that you can define host aggregates on completely arbitrary grounds.