Using Software Load Balancing in High Availability (HA) for OpenStack Cloud API Services
August 31, 2012
In two previous posts, my colleagues Oleg Gelbukh and Piotr Siwczak covered most of the ground on making OpenStack highly available (HA). If you haven’t read those posts yet, now is a good time to do so.
In this post, I’ll offer some direct practical advice on completing the puzzle: enabling HA load balancing for OpenStack API services.
All of these services are stateless, so putting a highly available load balancer on top of a few instances of the services is enough for most purposes. Here I’ll consider one option: HAProxy + Keepalived; however, other options are possible (e.g., HAProxy + Pacemaker + Corosync, or a hardware load balancer).
As discussed in previous posts, we’re talking about OpenStack REST API services: nova-api, keystone-api, and glance-api.
Types of failures
Several kinds of outages can happen in a distributed system like OpenStack. Here are the major ones:
There are also more complex kinds of failures, such as when a service hangs or starts giving wrong answers because of a hardware failure or other problems.
Some of these failures can be mitigated by monitoring at the application level; for example, by checking that the service gives an expected answer to a sample request, and restarting the service otherwise. Some failures, alas, cannot.
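A minimal sketch of such an application-level check, in Python. The `probe` and `restart` callables are hypothetical placeholders (not part of OpenStack): `probe` is assumed to send the sample request and return the answer, and `restart` is assumed to restart the service.

```python
from typing import Callable


def check_and_restart(
    probe: Callable[[], str],
    expected: str,
    restart: Callable[[], None],
) -> bool:
    """Return True if the service answered as expected; otherwise restart it."""
    try:
        healthy = probe() == expected
    except Exception:
        # A refused connection or a timeout (e.g. a hung service) counts as unhealthy.
        healthy = False
    if not healthy:
        restart()
    return healthy
```

In practice, something like this would be run periodically by a monitoring tool; it only helps with failures that a sample request can actually detect.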
NOTE: In this post, I’m only covering service/machine crashes.
Surviving service failures
Suppose we have an external load balancer that we assume to be always available. Then we spawn several instances of each necessary service and balance requests across them.
Whenever a request arrives, the balancer will attempt to proxy the request for us and connect to one of the backend servers. If a connection cannot be established, the load balancer will transparently try another instance.
Note that this provides no protection against failure of a service in the middle of executing a request. More on this later.
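The connection-time failover described above can be sketched roughly as follows. This is an illustration of the idea, not HAProxy’s actual algorithm; the function name and the injectable `connect` parameter are assumptions made for the example.

```python
import socket
from typing import Callable, Iterable, Tuple

Backend = Tuple[str, int]  # (host, port)


def connect_to_first_available(
    backends: Iterable[Backend],
    connect: Callable[[Backend], object] = lambda b: socket.create_connection(b, timeout=1.0),
):
    """Try each backend in turn and return (backend, connection) for the first that accepts."""
    last_error = None
    for backend in backends:
        try:
            return backend, connect(backend)
        except OSError as exc:
            # Connection refused, timed out, host unreachable: try the next instance.
            last_error = exc
    raise ConnectionError("no backend available") from last_error
```

Note that the retry happens only while establishing the connection. Once a backend has accepted, a crash in the middle of the request is not handled here, which matches the limitation mentioned above.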
Surviving load balancer failures
So what if the load balancer itself fails? As a load balancer is almost stateless (except for stickiness, which we can ignore in OpenStack), we just need to put a virtual IP address on top of a bunch of load balancers (two is often enough). This can be done using Keepalived or other similar software.
This construction makes the virtual IP address refer to “whichever balancer is available.” In the case of Keepalived, the details are implemented using VRRP (Virtual Router Redundancy Protocol). When the node owning the virtual IP dies, the Keepalived on the other node will notice and assign the IP address to itself.
What if the load balancer software crashes but the node survives (very unlikely, but possible)? For that, Keepalived has “check scripts”: configure Keepalived to run a script that checks whether the load balancer is running. Whenever it is not, Keepalived considers the node unusable and moves the virtual IP to a usable node.
What if Keepalived crashes? The other Keepalived will think the whole node died and claim the virtual IP.
NOTE: VRRP, and thus Keepalived, leaves a short window of unavailability during failover.
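The three failure cases above (node dies, load balancer dies, Keepalived dies) can be modelled with a toy simulation. This is a deliberately simplified sketch, not an implementation of VRRP: it ignores timing, advertisements and the failover window, and the `Node` class and its field names are invented for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    priority: int
    node_up: bool = True        # the machine itself is alive
    keepalived_up: bool = True  # Keepalived is running on it
    haproxy_up: bool = True     # what the check script would report

    def eligible(self) -> bool:
        # Any of the three failures makes the node unusable as a VIP owner.
        return self.node_up and self.keepalived_up and self.haproxy_up


def vip_owner(nodes: List[Node]) -> Optional[Node]:
    """Return the eligible node with the highest priority, or None if none is left."""
    candidates = [n for n in nodes if n.eligible()]
    return max(candidates, key=lambda n: n.priority, default=None)
```

For example, with two nodes of priority 101 and 100, setting `haproxy_up = False` on the first makes `vip_owner` return the second.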
Transitional failure effects
There are at least three levels of fault tolerance with very different guarantees and different implementation complexity:
At level 1, there may be a window of unavailability (the shorter, the better), e.g., until we detect that a particular server became unusable and tell the client to use another one instead.
At level 2, no new requests are denied, though currently executing requests may fail. This is harder: we must be able to direct any request to a currently available instance, which requires the infrastructure to proxy connections, not just redirect them. Here we assume that if a connection can be established, the server will not die while serving the request. That is equivalent to assuming that requests take zero time; without this assumption, the guarantee would amount to level 3.
At level 3, the failure recovery happens transparently even to someone who’s executing a long request with a server that now failed.
Level 3 is usually impossible to implement fully at an infrastructure level because it requires 1) buffering requests and responses and 2) understanding how to safely retry each type of request.
For example: What if a large file upload or download fails? Should the infrastructure buffer the whole file and re-upload/re-serve it? What if a call with side effects failed, having perhaps performed half of them—is it safe to retry it?
Implementing this level of fault tolerance requires a layer of application-specific retry logic on the client and special support for avoiding duplicate side effects on the server.
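As a rough illustration of what such retry logic involves, here is a toy sketch in Python. Nothing in it is OpenStack code: the `Server` class, the request-ID scheme and `call_with_retry` are assumptions made for the example. The client reuses one request ID across retries, and the server remembers IDs it has already processed so that a retried call does not repeat its side effect.

```python
import uuid
from typing import Callable, Dict


class Server:
    """Toy server that deduplicates requests by ID."""

    def __init__(self) -> None:
        self.results: Dict[str, int] = {}
        self.counter = 0

    def create(self, request_id: str) -> int:
        if request_id in self.results:
            # Duplicate of an already processed request: replay the stored result.
            return self.results[request_id]
        self.counter += 1  # the side effect we must not repeat
        self.results[request_id] = self.counter
        return self.counter


def call_with_retry(send: Callable[[str], int], attempts: int = 3) -> int:
    """Call send(request_id), retrying on connection errors with the same ID."""
    request_id = str(uuid.uuid4())
    last_error = None
    for _ in range(attempts):
        try:
            return send(request_id)
        except ConnectionError as exc:
            last_error = exc
    raise ConnectionError("all attempts failed") from last_error
```

Even this toy version shows why the infrastructure cannot do it alone: the server has to store per-request state, and the client has to know which calls are worth retrying.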
The setup I’m discussing in this post gives level 2 protection against service failures and level 1 for balancer failures.
NOTE: Previous posts on MySQL and RabbitMQ HA introduced level 3 tolerance to their failures as there is retry logic in place, at least if you use the proper patches mentioned in the posts.
As mentioned, we’ll use the following software: HAProxy and Keepalived.
We’ll have two types of nodes: a service node and an endpoint node. A service node hosts services, while an endpoint node hosts HAProxy and Keepalived. A node can also play both roles at the same time.
Wiring of services to each other
Services must address each other by the virtual IP in order to take advantage of each other’s high availability.
Also, if fault tolerance higher than the level 1 or level 2 provided here is needed, application-specific retry logic should be introduced. As far as I know, nobody currently retries internal calls in OpenStack (just calls to Keystone), and it seems that in most cases it’s enough to retry external ones.
Enough theory, let’s build the thing.
Suppose we have two machines, from which we want to make an HA OpenStack controller pair, installing both the API services and Keepalived + HAProxy.
Suppose machine 1 has address 192.168.56.200, machine 2 has address 192.168.56.201, and we want the services to be accessible through virtual IP 192.168.56.210. And suppose all these IPs are on eth1.
This is how everything is wired (it’s similar for other services; HAProxy and Keepalived are shared, of course).
Installing necessary packages
I’m assuming that you’re on Ubuntu, in which case you should type:
Configuration of haproxy
This configuration is identical on both nodes and resides in /etc/haproxy/haproxy.cfg.
This configuration encompasses the four nova-api services (EC2, volume, compute, metadata), glance-api, and the two keystone-api services (regular and admin API). If you have something else running (e.g., swift proxy), you know what to do.
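To give a feel for what such a configuration might contain, the following Python sketch prints one `listen` section per service. It is an illustration under assumptions, not the exact configuration from this setup: the section names and the `controller-1`/`controller-2` server names are invented, the ports are the usual defaults for these services and may differ in your deployment, and global and defaults sections are omitted.

```python
VIP = "192.168.56.210"
CONTROLLERS = {
    "controller-1": "192.168.56.200",  # hypothetical server names
    "controller-2": "192.168.56.201",
}
# Assumed default ports; adjust to match your deployment.
SERVICES = {
    "keystone-api": 5000,
    "keystone-admin-api": 35357,
    "nova-api-ec2": 8773,
    "nova-api-compute": 8774,
    "nova-api-metadata": 8775,
    "nova-api-volume": 8776,
    "glance-api": 9292,
}


def render_listen(name: str, port: int) -> str:
    """Render one HAProxy listen section that balances VIP:port across the controllers."""
    lines = [
        f"listen {name}",
        f"    bind {VIP}:{port}",
        "    mode tcp",
        "    balance roundrobin",
    ]
    for host, ip in CONTROLLERS.items():
        # "check" enables HAProxy's health checking for this server.
        lines.append(f"    server {host} {ip}:{port} check")
    return "\n".join(lines)


def render_config() -> str:
    return "\n\n".join(render_listen(n, p) for n, p in SERVICES.items())


if __name__ == "__main__":
    print(render_config())
```

Because HAProxy binds to the virtual IP on the same ports that the services use, each service has to listen on its node’s own address rather than on all interfaces.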
For more information, you can read the HAProxy manual.
Now restart HAProxy on both nodes:
Configuration of Keepalived
This configuration is almost, but not quite, identical on both nodes as well, and resides in /etc/keepalived/keepalived.conf.
The difference is that one node has its priority defined as 101, and the other as 100. Whichever available node has the highest priority at any given moment wins (that is, claims the virtual IP).
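A sketch of what the two files might look like, rendered from Python so that the single difference (the priority) is explicit. The details are assumptions, not the exact configuration from this setup: the names `chk_haproxy` and `VI_1`, the router ID 51, the 2-second interval, the `killall -0 haproxy` check command, and starting both nodes in the BACKUP state are all illustrative choices.

```python
def render_keepalived(
    priority: int,
    interface: str = "eth1",
    vip: str = "192.168.56.210",
) -> str:
    """Render a keepalived.conf that tracks HAProxy and manages the virtual IP."""
    # "killall -0 haproxy" sends no signal; it only reports whether the process exists.
    return f"""\
vrrp_script chk_haproxy {{
    script "killall -0 haproxy"
    interval 2
}}

vrrp_instance VI_1 {{
    interface {interface}
    state BACKUP
    virtual_router_id 51
    priority {priority}
    virtual_ipaddress {{
        {vip}
    }}
    track_script {{
        chk_haproxy
    }}
}}
"""


if __name__ == "__main__":
    print(render_keepalived(101))  # first node
    print(render_keepalived(100))  # second node
```

The `virtual_router_id` must be the same on both nodes so that they take part in the same election.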
For more information, you can read the Keepalived manual.
And now let us check that Keepalived + HAProxy work by poking glance:
Also, we can see that just one of the controllers—the one with higher “priority”—claimed the virtual IP:
Configuration of OpenStack services
Now for the service wiring. We need two things:
1) to listen on the proper local IP address and
2) to address the other services by the virtual IP.
Now, assuming you have OpenStack running, you can try doing something with your HA setup.
That was easy! You can thank the modular design of OpenStack, but perhaps the most credit should be given to the fact that components are wired via asynchronous messaging (RabbitMQ), whose main purpose is helping to build fault-tolerant systems.
1 Why almost? Because we have a short unavailability window during failover of a load balancer, and because currently executing requests will break during failover. This can be mitigated by retrying requests (client-side and by developing a patch for Nova to retry internal Keystone requests), but it gets a lot more difficult if the requests can have side effects.