The first and final words on OpenStack availability zones
Often, there isn’t even agreement over the basic meaning or purpose of an OpenStack availability zone.
On the one hand, you have the literal interpretation of the words “availability zone”, which would lead us to think of some logical subdivision of resources into failure domains, allowing cloud applications to intelligently deploy in ways to maximize their computing availability. (We’ll be running with this definition for the purposes of this article.)
On the other hand, each project implements availability zones differently, and each implementation lends itself to particular ways of using the feature. In other words, because this feature has been implemented in a flexible manner that does not tie us down to one specific concept of an availability zone, there's a lot of confusion over how to use it.
So, what is an availability zone, really?
In this article, we'll walk through the traditional view of availability zones, share insights, best practices, and guidelines for planning and using them, and touch on a few non-traditional uses. Finally, we hope to address the question: Is an OpenStack availability zone right for you?
OpenStack availability zone Implementations
One of the things that complicates the use of an OpenStack availability zone is that each OpenStack project implements them in its own way (if at all). If you do plan to use availability zones, you should evaluate which OpenStack projects you're going to use to support them, and how that affects your design and deployment of those services.
For the purposes of this article, we will look at three core services with respect to availability zones: Nova, Cinder, and Neutron. We won't go into the steps to set up availability zones; instead, we'll focus on a few of the key decision points, limitations, and trouble areas with them.
Nova availability zones
Since host aggregates were first introduced in OpenStack Grizzly, I have seen a lot of confusion about availability zones in Nova. Nova tied its availability zone implementation to host aggregates, and because the latter is a feature unique to the Nova project, its implementation of availability zones is also unique.
I have had many people tell me they use availability zones all the time in Nova, convinced they are not using host aggregates. Well, I have news for these people: all* availability zones in Nova are host aggregates (though not all host aggregates are availability zones).
* The exceptions are the default_availability_zone that compute nodes are placed into when not in another user-defined availability zone, and the internal_service_availability_zone where other Nova services live.
Some of this confusion may come from the nova CLI. People may access some documentation online, see they can create an availability zone with one command, and may not realize that they’re actually creating a host aggregate. Ex:
$ # nova aggregate-create <aggregate name> <AZ name>
$ nova aggregate-create HA1 AZ1
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 4  | HA1     | AZ1               |       | 'availability_zone=AZ1'|
+----+---------+-------------------+-------+------------------------+
I have seen people get confused by the second argument (the AZ name). This is just a shortcut for setting the availability_zone metadata for a new host aggregate you want to create.
This command is equivalent to creating a host aggregate, and then setting the metadata:
$ nova aggregate-create HA1
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 7  | HA1     | -                 |       |          |
+----+---------+-------------------+-------+----------+
$ nova aggregate-set-metadata HA1 availability_zone=AZ1
Metadata has been successfully updated for aggregate 7.
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 7  | HA1     | AZ1               |       | 'availability_zone=AZ1'|
+----+---------+-------------------+-------+------------------------+
Doing it this way, it’s more apparent that the workflow is the same as for any other host aggregate; the only difference is the “magic” metadata key availability_zone, which we set to AZ1 (notice that AZ1 also shows up under the Availability Zone column). Now when we add compute nodes to this aggregate, they are automatically moved out of the default_availability_zone and into the one we have defined. For example:
Before:
$ nova availability-zone-list
| nova              | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
After:
$ nova availability-zone-list
| AZ1               | available                              |
| |- node-27        |                                        |
| | |- nova-compute | enabled :-) 2016-11-06T05:13:48.000000 |
+-------------------+----------------------------------------+
Note that there is one behavior that sets availability zone host aggregates apart from others: Nova does not allow you to assign the same compute host to multiple aggregates with conflicting availability zone assignments. For example, we can first add a compute node to the previously created host aggregate with availability zone AZ1:
$ nova aggregate-add-host HA1 node-27
Host node-27 has been successfully added for aggregate 7
+----+------+-------------------+----------+------------------------+
| Id | Name | Availability Zone | Hosts    | Metadata               |
+----+------+-------------------+----------+------------------------+
| 7  | HA1  | AZ1               | 'node-27'| 'availability_zone=AZ1'|
+----+------+-------------------+----------+------------------------+
Next, we create a new host aggregate for availability zone AZ2:
$ nova aggregate-create HA2
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 13 | HA2     | -                 |       |          |
+----+---------+-------------------+-------+----------+
$ nova aggregate-set-metadata HA2 availability_zone=AZ2
Metadata has been successfully updated for aggregate 13.
+----+---------+-------------------+-------+------------------------+
| Id | Name    | Availability Zone | Hosts | Metadata               |
+----+---------+-------------------+-------+------------------------+
| 13 | HA2     | AZ2               |       | 'availability_zone=AZ2'|
+----+---------+-------------------+-------+------------------------+
Now if we try to add the original compute node to this aggregate, we get an error because this aggregate has a conflicting availability zone:
$ nova aggregate-add-host HA2 node-27
ERROR (Conflict): Cannot add host node-27 in aggregate 13: host exists (HTTP 409)
+----+------+-------------------+----------+------------------------+
| Id | Name | Availability Zone | Hosts    | Metadata               |
+----+------+-------------------+----------+------------------------+
| 13 | HA2  | AZ2               |          | 'availability_zone=AZ2'|
+----+------+-------------------+----------+------------------------+
(Incidentally, it is possible to have multiple host aggregates with the same availability_zone metadata, and add the same compute host to both. However, there are few, if any, good reasons for doing this.)
In contrast, Nova allows you to assign this compute node to another host aggregate with other metadata fields, as long as the availability_zone doesn't conflict.
You can see this if you first create a third host aggregate:
$ nova aggregate-create HA3
+----+---------+-------------------+-------+----------+
| Id | Name    | Availability Zone | Hosts | Metadata |
+----+---------+-------------------+-------+----------+
| 16 | HA3     | -                 |       |          |
+----+---------+-------------------+-------+----------+
Next, tag the host aggregate for some purpose not related to availability zones (for example, an aggregate to track compute nodes with SSDs):
$ nova aggregate-set-metadata HA3 ssd=True
Metadata has been successfully updated for aggregate 16.
+----+---------+-------------------+-------+-----------+
| Id | Name    | Availability Zone | Hosts | Metadata  |
+----+---------+-------------------+-------+-----------+
| 16 | HA3     | -                 |       | 'ssd=True'|
+----+---------+-------------------+-------+-----------+
Adding the original node to another aggregate without conflicting availability zone metadata works:
$ nova aggregate-add-host HA3 node-27
Host node-27 has been successfully added for aggregate 16
+----+-------+-------------------+-----------+------------+
| Id | Name  | Availability Zone | Hosts     | Metadata   |
+----+-------+-------------------+-----------+------------+
| 16 | HA3   | -                 | 'node-27' | 'ssd=True' |
+----+-------+-------------------+-----------+------------+
(Incidentally, Nova will also happily let you assign the same compute node to another aggregate with ssd=False for metadata, even though that clearly doesn't make sense. Conflicts are only checked/enforced in the case of the availability_zone metadata.)
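The conflict rule walked through above can be summarized in a few lines of Python. This is only a toy model of the behavior described in this section, not Nova's actual code; the function name add_host, the AggregateConflict exception, and the dictionary layout are all made up for illustration.

```python
class AggregateConflict(Exception):
    """Raised when a host would end up in two different availability zones."""


def add_host(aggregates, membership, aggregate_name, host):
    """Add host to an aggregate unless its availability_zone conflicts.

    aggregates: dict mapping aggregate name -> metadata dict
    membership: dict mapping aggregate name -> set of hosts
    """
    new_az = aggregates[aggregate_name].get("availability_zone")
    for name, hosts in membership.items():
        current_az = aggregates[name].get("availability_zone")
        # Only availability_zone is compared; other keys such as ssd are ignored.
        if host in hosts and new_az and current_az and new_az != current_az:
            raise AggregateConflict(
                f"{host} is already in {current_az} via aggregate {name}"
            )
    membership.setdefault(aggregate_name, set()).add(host)


aggregates = {
    "HA1": {"availability_zone": "AZ1"},
    "HA2": {"availability_zone": "AZ2"},
    "HA3": {"ssd": "True"},
}
membership = {}
add_host(aggregates, membership, "HA1", "node-27")
add_host(aggregates, membership, "HA3", "node-27")  # no AZ metadata, so allowed
try:
    add_host(aggregates, membership, "HA2", "node-27")
    conflict = False
except AggregateConflict:
    conflict = True  # AZ2 conflicts with AZ1, mirroring the HTTP 409 above
```

The model mirrors the HA1/HA2/HA3 example: the only metadata key that is ever compared is availability_zone.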
Nova configuration also holds parameters relevant to availability zone behavior. In the nova.conf read by your nova-api service, you can set a default availability zone for scheduling, which is used if users do not specify an availability zone in the API call:
[DEFAULT]
default_schedule_zone=AZ1
However, most operators leave this at its default setting (None), because it allows users who don’t care about availability zones to omit it from their API call, and the workload will be scheduled to any availability zone where there is available capacity.
If a user requests an invalid or undefined availability zone, the Nova API will report back with an HTTP 400 error. There is no availability zone fallback option.
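As a rough illustration of the three cases just described (no zone requested, a default configured, an invalid zone requested), here is a small Python sketch. It is a simplified model written for this article, not Nova code; resolve_nova_az and its arguments are hypothetical names, and the ValueError merely stands in for the HTTP 400 response.

```python
def resolve_nova_az(requested, known_zones, default_schedule_zone=None):
    """Return the zone to schedule into, or None for 'any zone with capacity'."""
    zone = requested if requested is not None else default_schedule_zone
    if zone is None:
        return None  # the scheduler is free to pick any availability zone
    if zone not in known_zones:
        # There is no fallback in Nova; this stands in for the HTTP 400 error.
        raise ValueError(f"HTTP 400: availability zone {zone!r} is not available")
    return zone


zones = {"AZ1", "AZ2"}
anywhere = resolve_nova_az(None, zones)
defaulted = resolve_nova_az(None, zones, default_schedule_zone="AZ1")
explicit = resolve_nova_az("AZ2", zones)
```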
Cinder
Creating availability zones in Cinder is accomplished by setting the following configuration parameter in cinder.conf, on the nodes where your cinder-volume service runs:
[DEFAULT]
storage_availability_zone=AZ1
Note that you can only set the availability zone to one value. This is consistent with availability zones in other OpenStack projects that do not allow for the notion of overlapping failure domains or multiple failure domain levels or tiers.
The change takes effect when you restart your cinder-volume services. You can confirm your availability zone assignments as follows:
$ cinder service-list
+---------------+-------------------+------+---------+-------+
| Binary        | Host              | Zone | Status  | State |
+---------------+-------------------+------+---------+-------+
| cinder-volume | hostname1@LVM     | AZ1  | enabled | up    |
| cinder-volume | hostname2@LVM     | AZ2  | enabled | up    |
If you would like to establish a default availability zone, you can set this parameter in cinder.conf on the cinder-api nodes:
[DEFAULT]
default_availability_zone=AZ1
This instructs Cinder which availability zone to use if the API call did not specify one. If you don’t set it, Cinder will use a hardcoded default, nova. In the case of our example, where we've set the default availability zone in Nova to AZ1, this would result in a successful provision.
But had we omitted the storage_availability_zone, the call would have failed (Cinder falls back to the hard coded default “nova” AZ in this case, which does not exist in our example).
This also means that unlike Nova, users do not have the flexibility of omitting availability zone information and expecting that Cinder will select any available backend with spare capacity in any availability zone.
Therefore, you have a choice with this parameter. You can set it to one of your availability zones so API calls without availability zone information don’t fail, at the cost of potentially uneven storage allocation across your availability zones. Or, you can leave this parameter unset, and accept that user API calls that forget or omit availability zone info will fail.
Another option is to set the default to a non-existent availability zone, such as you-must-specify-an-AZ, so that when the call fails due to the non-existent availability zone, this information is included in the error message sent back to the client.
Your storage backends, storage drivers, and storage architecture may also affect how you set up your availability zones. If we are using the reference Cinder LVM iSCSI driver deployed on commodity hardware, and that hardware fits the same availability zone criteria as our computes, then we could set up availability zones to match what we have defined in Nova. We could also do the same if we had a third party storage appliance in each availability zone, e.g.:
| Binary        | Host                    | Zone | Status  | State |
| cinder-volume | hostname1@StorageArray1 | AZ1  | enabled | up    |
| cinder-volume | hostname2@StorageArray2 | AZ2  | enabled | up    |
(Note that the hostnames (hostname1 and hostname2) are still different in this example. The cinder multi-backend feature allows us to configure multiple storage backends in the same cinder.conf (for the same cinder-volume service), but Cinder availability zones can only be defined per cinder-volume service, and not per-backend per-cinder-volume service. In other words, if you define multiple backends in one cinder.conf, they will all inherit the same availability zone.)
However, in many cases, if you’re using a third party storage appliance, then these systems usually have their own built-in redundancy that exists outside of OpenStack notions of availability zones. Similarly, if you use a distributed storage solution like Ceph, then availability zones have little or no meaning in this context. In this case, you can forgo Cinder availability zones.
The one issue in doing this, however, is that any availability zones you defined for Nova won’t match. This can cause problems when Nova makes API calls to Cinder - for example, when performing a Boot from Volume API call through Nova. If Nova decided to provision your VM in AZ1, it will tell Cinder to provision a boot volume in AZ1, but Cinder doesn’t know anything about AZ1, so this API call will fail. To prevent this from happening, we need to set the following parameter in cinder.conf on your nodes running cinder-api:
[DEFAULT]
allow_availability_zone_fallback=True
This parameter prevents the API call from failing, because if the requested availability zone does not exist, Cinder will fall back to another availability zone (whichever you defined in the default_availability_zone parameter, or in storage_availability_zone if the default is not set). The hardcoded default storage_availability_zone is nova, so the fallback availability zone should match the default availability zone for your cinder-volume services, and everything should work.
The easiest way to solve the problem, however, is to remove the AvailabilityZoneFilter from your filter list in cinder.conf on nodes running cinder-scheduler. This makes the scheduler ignore any availability zone information passed to it altogether, which may also be helpful in case of any availability zone configuration mismatch.
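To tie the Cinder parameters together, the sketch below models the zone resolution order described in this section: the requested zone first, then default_availability_zone, then storage_availability_zone (whose hardcoded default is nova), with the optional fallback. It is a simplified illustration under those assumptions, not Cinder's implementation, and resolve_cinder_az is a hypothetical helper name.

```python
HARDCODED_DEFAULT = "nova"  # default value of storage_availability_zone per the text


def resolve_cinder_az(requested, known_zones, default_availability_zone=None,
                      storage_availability_zone=HARDCODED_DEFAULT,
                      allow_availability_zone_fallback=False):
    """Return the zone a volume request lands in, or raise if none applies."""
    # Zone used when the caller names none, and also the fallback target.
    fallback = default_availability_zone or storage_availability_zone
    zone = requested if requested is not None else fallback
    if zone in known_zones:
        return zone
    if allow_availability_zone_fallback and fallback in known_zones:
        return fallback
    raise ValueError(f"availability zone {zone!r} is invalid")


# Cinder zones match Nova zones, with a default configured on cinder-api.
matched = resolve_cinder_az(None, {"AZ1", "AZ2"}, default_availability_zone="AZ1")

# Ceph-style setup: Cinder only knows "nova", but Nova asks for AZ1.
ceph_style = resolve_cinder_az("AZ1", {"nova"},
                               allow_availability_zone_fallback=True)
```

With the fallback left at False, the second call raises instead, which corresponds to the failing Boot from Volume request described above.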
Neutron
Availability zone support was added to Neutron in the Mitaka release. Availability zones can be set for DHCP and L3 agents in their respective configuration files:
[AGENT]
availability_zone = AZ1
Restart the agents, and confirm availability zone settings as follows:
neutron agent-show <agent-id>
+---------------------+------------+
| Field               | Value      |
+---------------------+------------+
| availability_zone   | AZ1        |
...
If you would like to establish a default availability zone, you can set this parameter in neutron.conf on neutron-server nodes:
[DEFAULT]
default_availability_zones=AZ1,AZ2
This parameter tells Neutron which availability zones to use if the API call did not specify any. Unlike Cinder, you can specify multiple availability zones, and leaving it undefined places no constraints in scheduling, as there are no hardcoded defaults. If you have users making API calls that do not care about the availability zone, then you can enumerate all your availability zones for this parameter, or simply leave it undefined - both would yield the same result.
Additionally, when users do specify an availability zone, such requests are fulfilled as a “best effort” in Neutron. In other words, there is no need for an availability zone fallback parameter, because your API call still executes even if your availability zone hint can’t be satisfied.
Another important distinction that sets Neutron aside from Nova and Cinder is that it implements availability zones as scheduler hints, meaning that on the client side you can repeat this option to chain together multiple availability zone specifications in the event that more than one availability zone would satisfy your availability criteria. For example:
$ neutron net-create --availability-zone-hint AZ1 \
    --availability-zone-hint AZ2 new_network
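The "best effort" handling of hints can be pictured with a short Python sketch. This is a toy model of the behavior described in this section rather than Neutron's scheduler; candidate_agents and the agent names are invented for the example.

```python
def candidate_agents(agents, hints=(), default_zones=()):
    """Return the names of agents eligible to host a resource.

    agents: dict mapping agent name -> availability zone
    hints: zones passed by the client (may be repeated, may be empty)
    default_zones: stand-in for default_availability_zones in neutron.conf
    """
    wanted = list(hints) or list(default_zones)
    matching = [name for name, zone in agents.items() if zone in wanted]
    # Best effort: with no hints, or no agent in the hinted zones,
    # every agent stays eligible instead of the request failing.
    return sorted(matching or agents)


agents = {"dhcp-1": "AZ1", "dhcp-2": "AZ2", "dhcp-3": "AZ3"}
hinted = candidate_agents(agents, hints=["AZ1", "AZ2"])
unhinted = candidate_agents(agents)
unsatisfiable = candidate_agents(agents, hints=["AZ9"])
```

Passing several hints narrows the candidates to those zones, while an empty or unsatisfiable hint list leaves all agents in play.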
As with Cinder, the Neutron plugins and backends you’re using deserve attention, as the support or need for availability zones may be different depending on their implementation. For example, if you’re using a reference Neutron deployment with the ML2 plugin and with DHCP and L3 agents deployed to commodity hardware, you can likely place these agents consistently according to the same availability zone criteria used for your computes.
In contrast, other alternatives such as the Contrail plugin for Neutron do not support availability zones. Likewise, if you are using Neutron DVR, availability zones have limited significance for Layer 3 Neutron.
OpenStack Project availability zone Comparison Summary
Before we move on, it's helpful to review how each project handles availability zones.
 | Nova | Cinder | Neutron |
Default availability zone scheduling | Can set to one availability zone or None | Can set one availability zone; cannot set None | Can set to any list of availability zones or none |
Availability zone fallback | None supported | Supported through configuration | N/A; scheduling to availability zones done on a best effort basis |
Availability zone definition restrictions | No more than 1 availability zone per nova-compute | No more than 1 availability zone per cinder-volume | No more than 1 availability zone per neutron agent |
Availability zone client restrictions | Can specify one availability zone or none | Can specify one availability zone or none | Can specify an arbitrary number of availability zones |
Availability zones typically used when you have ... | Commodity HW for computes, libvirt driver | Commodity HW for storage, LVM iSCSI driver | Commodity HW for neutron agents, ML2 plugin |
Availability zones not typically used when you have... | Third party hypervisor drivers that manage their own HA for VMs (DRS for VCenter) | Third party drivers, backends, etc. that manage their own HA | Third party plugins, backends, etc. that manage their own HA |
Best Practices for availability zones
Now let's talk about how to best make use of availability zones.
What should my availability zones represent?
The first thing you should do as a cloud operator is to nail down your own accepted definition of an availability zone and how you will use them, and remain consistent. You don’t want to end up in a situation where availability zones are taking on more than one meaning in the same cloud. For example:
Fred’s AZ | Example of AZ used to perform tenant workload isolation
VMWare cluster #1 AZ | Example of AZ used to select a specific hypervisor type
Power source #1 AZ | Example of AZ used to select a specific failure domain
Rack #1 AZ | Example of AZ used to select a specific failure domain
Such a set of definitions would be a source of inconsistency and confusion in your cloud. It’s usually better to keep things simple with one availability zone definition, and use OpenStack features such as Nova Flavors or Nova/Cinder boot hints to achieve other requirements for multi-tenancy isolation, ability to select between different hypervisor options and other features, and so on.
Note that OpenStack currently does not support the concept of multiple failure domain levels/tiers. Even though we may have multiple ways to define failure domains (e.g., by power circuit, rack, room, etc), we must pick a single convention.
For the purposes of this article, we will discuss availability zones in the context of failure domains. However, we will cover one other use for availability zones in the third section.
How many availability zones do I need?
One question that customers frequently get hung up on is how many availability zones they should create. This can be tricky because the setup and management of availability zones involves stakeholders at every layer of the solution stack, from tenant applications to cloud operators, down to data center design and planning.
A good place to start is your cloud application requirements: How many failure domains are they designed to work with (i.e., redundancy factor)? The likely answer is two (primary + backup), three (for example, for a database or other quorum-based system), or one (for example, a legacy app with single points of failure). Therefore, the vast majority of clouds will have one, two, or three availability zones.
Also keep in mind that as a general design principle, you want to minimize the number of availability zones in your environment, because the side effect of availability zone proliferation is that you are dividing your capacity into more resource islands. The resource utilization in each island may not be equal, and now you have an operational burden to track and maintain capacity in each island/availability zone. Also, if you have a lot of availability zones (more than the redundancy factor of tenant applications), tenants are left to guess which availability zones to use and which have available capacity.
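A small worked example makes the "resource island" effect concrete. The numbers and helper names below (headroom, can_spread) are invented for illustration; the point is only that the same total free capacity can become unusable once it is divided across too many zones.

```python
def headroom(capacity, used):
    """Free capacity per zone, in arbitrary units such as VM slots."""
    return {zone: capacity[zone] - used.get(zone, 0) for zone in capacity}


def can_spread(free, per_zone, zones_needed):
    """True if at least zones_needed zones each have per_zone units free."""
    roomy = [zone for zone, units in free.items() if units >= per_zone]
    return len(roomy) >= zones_needed


# Both clouds have 240 units in total and 60 units free overall.
three_zones = headroom({"AZ1": 80, "AZ2": 80, "AZ3": 80},
                       {"AZ1": 55, "AZ2": 60, "AZ3": 65})
six_zones = headroom({f"AZ{i}": 40 for i in range(1, 7)},
                     {"AZ1": 15, "AZ2": 25, "AZ3": 30,
                      "AZ4": 35, "AZ5": 35, "AZ6": 40})

# A quorum-based app wants 12 units in each of 3 different zones.
fits_in_three = can_spread(three_zones, per_zone=12, zones_needed=3)
fits_in_six = can_spread(six_zones, per_zone=12, zones_needed=3)
```

In this made-up scenario the three-zone cloud can place the application, while the six-zone cloud cannot, even though both have the same amount of free capacity in total.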
How do I organize my availability zones?
The value proposition of availability zones is that tenants are able to achieve a higher level of availability in their applications. In order to make good on that proposition, we need to design our availability zones in ways that mitigate single points of failure.
For example, if our resources are split between two power sources in the data center, then we may decide to define two resource pools (availability zones) according to their connected power source.
Or, if we only have one TOR switch in our racks, then we may decide to define availability zones by rack. However, here we can run into problems if we make each rack its own availability zone, as this will not scale from a capacity management perspective for more than 2-3 racks/availability zones (because of the "resource island" problem). In this case, you might consider arranging your total rack count into 2 or 3 logical groupings that correlate to your defined availability zones.
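One simple way to picture the rack grouping just described is a round-robin assignment of racks to a fixed number of zones. Round-robin is purely an assumption made for this sketch; in practice the grouping would follow your physical layout. The group_racks helper and rack names are hypothetical.

```python
def group_racks(racks, zone_count):
    """Distribute racks across zone_count availability zones, round-robin."""
    zones = {f"AZ{i + 1}": [] for i in range(zone_count)}
    names = list(zones)
    for index, rack in enumerate(racks):
        zones[names[index % zone_count]].append(rack)
    return zones


racks = [f"rack{n}" for n in range(1, 8)]  # seven racks, one TOR switch each
layout = group_racks(racks, 3)
```

Each rack remains its own failure domain at the switch level, but tenants only ever see three zones to choose from.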
We may also find situations where we have redundant TOR switch pairs in our racks, power source diversity to each rack, and lack a single point of failure. You could still place racks into availability zones as in the previous example, but the value of availability zones is marginalized, since you need to have a double failure (e.g., both TORs in the rack failing) to take down a rack.
Ultimately with any of the above scenarios, the value added by your availability zones will depend on the likelihood and expression of failure modes - both the ones you have designed for, and the ones you have not. But getting accurate and complete information on such failure modes may not be easy, so the value of availability zones from this kind of unplanned failure can be difficult to pin down.
There is however another area where availability zones have the potential to provide value -- planned maintenance. Have you ever needed to move, recable, upgrade, or rebuild a rack? Ever needed to shut off power to part of the data center to do electrical work? Ever needed to apply disruptive updates to your hypervisors, like kernel or QEMU security updates? How about upgrades of OpenStack or the hypervisor operating system?
Chances are good that these kinds of planned maintenance are a higher source of downtime than unplanned hardware failures that happen out of the blue. Therefore, this type of planned maintenance can also impact how you define your availability zones. In general the grouping of racks into availability zones (described previously) still works well, and is the most common availability zone paradigm we see for our customers.
However, planned maintenance could affect the way in which you group your racks into availability zones. For example, when creating OpenStack availability zones, you may choose physical parameters like which floor, room or building the equipment is located in, or other practical considerations that would help in the event you need to vacate or rebuild certain space in your DC(s).
One of the limitations of the OpenStack implementation of availability zones made apparent in this example is that you are forced to choose one definition. Applications can request a specific availability zone, but are not offered the flexibility of requesting building level isolation, vs floor, room, or rack level isolation. This will be a fixed, inherited property of the availability zones you create. If you need more, then you start to enter the realm of other OpenStack abstractions like Regions and Cells.
Other uses for availability zones?
Another way in which people have found to use availability zones is in multi-hypervisor environments. In the ideal world of the “implementation-agnostic” cloud, we would abstract such underlying details from users of our platform. However, there are some key differences between hypervisors that make this aspiration difficult. Take the example of KVM & VMWare.
When an iSCSI target is provisioned through the LVM iSCSI Cinder driver, it cannot be attached to or consumed by ESXi nodes. The provision request must go through VMWare’s VMDK Cinder driver, which proxies the creation and attachment requests to vCenter. This incompatibility can cause errors and thus a negative user experience if the user tries to attach a volume to their hypervisor provisioned from the wrong backend.
To solve this problem, some operators use availability zones as a way for users to select hypervisor types (for example, AZ_KVM1, AZ_VMWARE1), and set the following configuration in their nova.conf:
[cinder]
cross_az_attach = False
This presents users with an error if they attempt to attach a volume from one availability zone (e.g., AZ_VMWARE1) to another availability zone (e.g., AZ_KVM1). The call would have certainly failed regardless, but with a different error from farther downstream from one of the nova-compute agents, instead of from nova-api. This way, it's easier for the user to see where they went wrong and correct the problem.
In this case, the availability zone also acts as a tag to remind users which hypervisor their VM resides on.
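The effect of cross_az_attach = False amounts to an early comparison of the two zones. The sketch below is a simplified model of that check, not Nova's code; check_volume_attach is a hypothetical name and the ValueError stands in for the API error returned to the user.

```python
def check_volume_attach(instance_az, volume_az, cross_az_attach=True):
    """Raise early if the zones differ and cross-zone attachment is disabled."""
    if not cross_az_attach and instance_az != volume_az:
        raise ValueError(
            f"volume in {volume_az} cannot be attached to instance in {instance_az}"
        )
    return True


same_zone_ok = check_volume_attach("AZ_KVM1", "AZ_KVM1", cross_az_attach=False)
try:
    check_volume_attach("AZ_KVM1", "AZ_VMWARE1", cross_az_attach=False)
    rejected = False
except ValueError:
    rejected = True  # the user sees the mismatch up front
```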
In my opinion, this is stretching the definition of availability zones as failure domains. VMWare may be considered its own failure domain separate from KVM, but that’s not the primary purpose of creating availability zones this way. The primary purpose is to differentiate hypervisor types. Different hypervisor types have a different set of features and capabilities. If we think about the problem in these terms, there are a number of other solutions that come to mind:
- Nova Flavors: Define a “VMWare” set of flavors that map to your VCenter cluster(s). If your tenants that use VMWare are different than your tenants who use KVM, you can make them private flavors, ensuring that tenants only ever see or interact with the hypervisor type they expect.
- Cinder: Similarly for Cinder, you can make the VMWare backend private to specific tenants, and/or set quotas differently for that backend, to ensure that tenants will only allocate from the correct storage pools for their hypervisor type.
- Image metadata: You can tag your images according to the hypervisor they run on. Set image property hypervisor_type to vmware for VMDK images, and to qemu for other images. The ImagePropertiesFilter in Nova will honor the hypervisor placement request.
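The image metadata approach in the last bullet boils down to matching an image property against what each compute host reports. Here is a minimal Python sketch of that matching, assuming a plain dictionary of hosts; it is not the Nova scheduler filter itself, and hosts_for_image and the host names are illustrative only.

```python
def hosts_for_image(hosts, image_properties):
    """Return hosts whose hypervisor type matches the image's hypervisor_type.

    hosts: dict mapping host name -> hypervisor type reported by that host
    image_properties: dict of image properties; hypervisor_type is optional
    """
    wanted = image_properties.get("hypervisor_type")
    if wanted is None:
        return sorted(hosts)  # untagged images can land on any host
    return sorted(name for name, hv in hosts.items() if hv == wanted)


hosts = {"node-27": "qemu", "node-28": "qemu", "vcenter-1": "vmware"}
vmdk_hosts = hosts_for_image(hosts, {"hypervisor_type": "vmware"})
kvm_hosts = hosts_for_image(hosts, {"hypervisor_type": "qemu"})
```

With images tagged this way, users never need to pick a hypervisor-specific availability zone; placement follows from the image they boot.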
Soo… Are availability zones right for me?
So wrapping up, think of availability zones in terms of:
- Unplanned failures: If you have a good history of failure data, well understood failure modes, or some known single point of failure that availability zones can help mitigate, then availability zones may be a good fit for your environment.
- Planned maintenance: If you have well understood maintenance processes that have awareness of your availability zone definitions and can take advantage of them, then availability zones may be a good fit for your environment. This is where availability zones can provide some of the greatest value, but it is also the most difficult to achieve, as it requires intelligent availability zone-aware rolling updates and upgrades, and affects how data center personnel perform maintenance activities.
- Tenant application design/support: If your tenants are running legacy apps, apps with single points of failure, or do not support use of availability zones in their provisioning process, then availability zones will be of no use for these workloads.
- Other alternatives for achieving app availability: Workloads built for geo-redundancy can achieve the same level of HA (or better) in a multi-region cloud. If this were the case for your cloud and your cloud workloads, availability zones would be unnecessary.
- OpenStack projects: You should factor into your decision the limitations and differences in availability zone implementations between different OpenStack projects, and perform this analysis for all OpenStack projects in your scope of deployment.