Using Software Load Balancing in High Availability (HA) for OpenStack Cloud API Services

on August 31, 2012

In two previous posts, my colleagues Oleg Gelbukh and Piotr Siwczak covered most of the ground on making OpenStack highly available (HA). If you haven't read those posts yet, now is a good time to do so.

In this post, I’ll offer some direct practical advice on completing the puzzle: enabling HA load balancing for OpenStack API services.

All of these services are stateless, so putting a highly available load balancer on top of a few instances of the services is enough for most purposes. Here I’ll consider one option: HAProxy + Keepalived; however, other options are possible (e.g., HAProxy + Pacemaker + Corosync, or a hardware load balancer).

Which services?

As discussed in previous posts, we’re talking about OpenStack REST API services: nova-api, keystone-api, and glance-api.

Types of failures

Several kinds of outages can happen in a distributed system like OpenStack. Here are the major ones:

  • Service instance failure: A particular instance of a service crashes, but the other processes on the same machine function normally.
  • Machine failure: A whole machine becomes unusable, perhaps due to a power outage or network failure.
  • Network partition: Several network segments become unable to talk to each other (e.g., a higher-level switch malfunctions) but can talk to everybody within their segment. This poses a huge problem for any stateful services and is the root cause of the sheer complexity of different consistency models for distributed data stores (how to synchronize the changes made on different partitions when connectivity is restored). However, for stateless services, it’s simply equivalent to machines on different ends of the partition becoming unavailable to each other.

There are also more complex kinds of failures, such as when a service hangs or starts giving the wrong answers due to hardware failure or other problems.

Some of these failures can be mitigated by monitoring at the application level; for example, by checking that the service gives an expected answer to a sample request, and restarting the service otherwise. Some failures, alas, cannot.
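
HAProxy, which we'll use below, can do a basic form of such an application-level check itself: instead of the plain TCP connect check used in the configuration later in this post, it can issue a sample HTTP request and treat anything other than a 2xx/3xx response as a failure. A minimal sketch of a listen section with such a check (the GET / path here is only an illustration; pick a request your service is known to answer):

listen keystone-1 192.168.56.210:5000
  balance  roundrobin
  # send a real HTTP request instead of just opening a TCP connection
  option  httpchk GET /
  server  controller-1 192.168.56.200:5000  check
  server  controller-2 192.168.56.201:5000  check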

NOTE: In this post, I’m only covering service/machine crashes.

Surviving service failures

Suppose we have an external load balancer that we assume to be always available. Then we spawn several instances of the necessary services and balance them.

Whenever a request arrives, the balancer will attempt to proxy the request for us and connect to one of the backend servers. If a connection cannot be established, the load balancer will transparently try another instance.

Note that this provides no protection against failure of a service in the middle of executing a request. More on this later.

Surviving load balancer failures

So what if the load balancer itself fails? As a load balancer is almost stateless (except for stickiness, which we can ignore in OpenStack), we just need to put a virtual IP address on top of a bunch of load balancers (two is often enough). This can be done using Keepalived or other similar software.

This construction makes the virtual IP address refer to “whichever balancer is available.” The details, in the case of Keepalived, are implemented using the VRRP protocol. When a node owning the virtual IP dies, the other Keepalived will notice and assign the IP address to itself.

What if the load balancer software crashes but the node survives (very unlikely, but possible)? For that, Keepalived has “check scripts”: just configure Keepalived to use a script that checks whether the load balancer is running; whenever it's not, Keepalived considers the node unusable and moves the virtual IP to a usable node.
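
A minimal sketch of such a check script, assuming a stock Keepalived package (the vrrp_script block lives at the top level of keepalived.conf, and the vrrp_instance shown later in this post references it via track_script):

vrrp_script chk_haproxy {
  # "pidof haproxy" exits non-zero when no haproxy process is running
  script "pidof haproxy"
  interval 2   # run the check every 2 seconds
  fall 2       # declare failure after 2 consecutive failed checks
}

vrrp_instance 42 {
  ...
  track_script {
    chk_haproxy
  }
}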

What if Keepalived crashes? The other Keepalived will think the whole node died and claim the virtual IP.

NOTE: VRRP, and thus Keepalived, leaves a short window of unavailability during failover.

Transitional failure effects

There are at least three levels of fault tolerance with very different guarantees and different implementation complexity:

  • Level 1: Failure of a component does not lead to permanent disruption of service.
  • Level 2: Failure of a component does not lead to failure of new requests.
  • Level 3: Failure of a component does not lead to failure of any requests (new or currently executing).

At level 1, there may be a window of unavailability (the shorter, the better), e.g., until we detect that a particular server became unusable and tell the client to use another one instead.

At level 2, no new requests are denied, though currently executing requests may fail. This is harder: we must be able to direct any request to a currently available instance, which requires the infrastructure to proxy connections, not just redirect them. Here we assume that if a connection can be established, the server will not die while serving the request; this is equivalent to assuming that requests take zero time, because without that assumption the guarantee becomes level 3.

At level 3, the failure recovery happens transparently even to someone who’s executing a long request with a server that now failed.

Level 3 is usually impossible to implement fully at an infrastructure level because it requires 1) buffering requests and responses and 2) understanding how to safely retry each type of request.

For example: What if a large file upload or download fails? Should the infrastructure buffer the whole file and re-upload/re-serve it? What if a call with side effects failed, having perhaps performed half of them—is it safe to retry it?

Implementing this level of fault tolerance requires a layer of application-specific retry logic on the client and special support for avoiding duplicate side effects on the server.
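
For read-only, side-effect-free calls, even a trivial client-side retry goes a long way. A sketch in shell against the virtual IP we'll configure below (only retry requests you know are safe to repeat):

$ for i in 1 2 3; do curl -sf http://192.168.56.210:5000/ && break; sleep 1; done

Here -s keeps curl quiet and -f makes it treat HTTP errors as failures, so the loop retries up to three times, one second apart.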

The setup I’m discussing in this post gives level 2 protection against service failures and level 1 for balancer failures.

NOTE: Previous posts on MySQL and RabbitMQ HA introduced level 3 tolerance to their failures as there is retry logic in place, at least if you use the proper patches mentioned in the posts.

Software topology

As mentioned, we’ll use the following set of software:

  • the services themselves;
  • HAProxy for making the services HA; and
  • Keepalived for making HAProxy HA.

We’ll have two types of nodes: a service node and an endpoint node. A service node hosts services, while an endpoint node hosts HAProxy and Keepalived. A node can also play both roles at the same time.

Wiring of services to each other

Services must address each other by the virtual IP in order to take advantage of each other’s high availability.

Also, if fault tolerance beyond level 1 or 2 is needed, application-specific retry logic should be introduced. As far as I know, nobody currently retries internal calls in OpenStack (just calls to Keystone), and it seems that in most cases it's enough to retry external ones.

Getting hands-on

Enough theory, let’s build the thing.

Suppose we have two machines, from which we want to make an HA OpenStack controller pair, installing both the API services and Keepalived + HAProxy.

Suppose machine 1 has address 192.168.56.200, machine 2 has address 192.168.56.201, and we want the services to be accessible through virtual IP 192.168.56.210. And suppose all these IPs are on eth1.

Here is how everything is wired (it's similar for other services; HAProxy and Keepalived are shared, of course): each API service listens on its own machine's local address, HAProxy listens on the virtual IP and proxies requests to the service instances on both machines, and Keepalived keeps the virtual IP on whichever node is available.

Installing necessary packages

I’m assuming that you’re on Ubuntu, in which case you should type:

$ sudo apt-get install haproxy keepalived

Configuration of HAProxy

This configuration is identical on both nodes and resides in /etc/haproxy/haproxy.cfg.

$ cat /etc/haproxy/haproxy.cfg
global
  chroot  /var/lib/haproxy
  daemon
  group  haproxy
  log  192.168.56.200 local0
  maxconn  4000
  pidfile  /var/run/haproxy.pid
  stats  socket /var/lib/haproxy/stats
  user  haproxy

defaults
  log  global
  maxconn  8000
  mode  http
  option  redispatch
  retries  3
  stats  enable
  timeout  http-request 10s
  timeout  queue 1m
  timeout  connect 10s
  timeout  client 1m
  timeout  server 1m
  timeout  check 10s

listen keystone-1 192.168.56.210:5000
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:5000  check
  server  controller-2 192.168.56.201:5000  check

listen keystone-2 192.168.56.210:35357
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:35357  check
  server  controller-2 192.168.56.201:35357  check

listen nova-api-1 192.168.56.210:8773
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:8773  check
  server  controller-2 192.168.56.201:8773  check

listen nova-api-2 192.168.56.210:8774
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:8774  check
  server  controller-2 192.168.56.201:8774  check

listen nova-api-3 192.168.56.210:8775
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:8775  check
  server  controller-2 192.168.56.201:8775  check

listen nova-api-4 192.168.56.210:8776
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:8776  check
  server  controller-2 192.168.56.201:8776  check

listen glance-api 192.168.56.210:9292
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:9292  check
  server  controller-2 192.168.56.201:9292  check

This configuration encompasses the four nova-api services (EC2, volume, compute, metadata), glance-api, and the two keystone-api services (regular and admin API). If you have something else running (e.g., swift proxy), you know what to do.
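
For example, a Swift proxy would get an analogous block; 8080 is its usual default port, but adjust it to whatever your proxy actually listens on:

listen swift-proxy 192.168.56.210:8080
  balance  roundrobin
  option  tcplog
  server  controller-1 192.168.56.200:8080  check
  server  controller-2 192.168.56.201:8080  check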

For more information, you can read the HAProxy documentation.
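
It's also worth letting HAProxy validate the file before restarting; it has a check mode for exactly that:

$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg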

Now restart HAProxy on both nodes:

$ sudo service haproxy restart

Configuration of Keepalived

This configuration, too, is almost, but not quite, identical on both nodes; it resides in /etc/keepalived/keepalived.conf.

The difference is that one node has its priority defined as 101, and the other as 100. Whichever of the available nodes has the highest priority at any given moment wins (that is, claims the virtual IP).

$ cat /etc/keepalived/keepalived.conf
global_defs {
  router_id controller-1
}
vrrp_instance 42 {
  virtual_router_id 42

  # for electing MASTER, highest priority wins.
  priority  101
  state     MASTER

  interface eth1

  virtual_ipaddress {
        192.168.56.210
  }
}

For more information, you can read the Keepalived documentation.
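
After putting the configuration in place on both nodes (remember to lower the priority on the second one), restart Keepalived the same way we restarted HAProxy:

$ sudo service keepalived restart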

And now let us check that Keepalived + HAProxy work by poking glance:

$ telnet 192.168.56.210 9292
Trying 192.168.56.210...
Connected to 192.168.56.210.
Escape character is '^]'.
^]

telnet> quit
Connection closed.

Also, we can see that just one of the controllers—the one with higher “priority”—claimed the virtual IP:

openstack@controller-1:~$ ip addr show dev eth1
2: eth1:  mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 08:00:27:9d:c4:b0 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.200/24 brd 192.168.56.255 scope global eth1
    inet 192.168.56.210/32 scope global eth1
    inet6 fe80::a00:27ff:fe9d:c4b0/64 scope link 
       valid_lft forever preferred_lft forever

openstack@controller-2:~$ ip addr show dev eth1
2: eth1:  mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
    link/ether 08:00:27:bd:7f:14 brd ff:ff:ff:ff:ff:ff
    inet 192.168.56.201/24 brd 192.168.56.255 scope global eth1
    inet6 fe80::a00:27ff:febd:7f14/64 scope link 
       valid_lft forever preferred_lft forever
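
To watch a failover happen, stop Keepalived on the node that currently holds the virtual IP; after the short VRRP window mentioned earlier, the other node should pick the address up. (With the minimal configuration above, stopping only HAProxy will not move the IP; that is exactly what the check scripts sketched earlier are for.)

openstack@controller-1:~$ sudo service keepalived stop
openstack@controller-2:~$ ip addr show dev eth1 | grep 192.168.56.210
    inet 192.168.56.210/32 scope global eth1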

Configuration of OpenStack services

Now for the service wiring. We need two things:

1) to listen on the proper local IP address and
2) address others by the virtual IP address.

Nova

openstack@controller-1:~$ cat /etc/nova/nova.conf
--metadata_listen=192.168.56.200
--glance_api_servers=192.168.56.210:9292
--osapi_volume_listen=192.168.56.200
--ec2_listen=192.168.56.200
--sql_connection=mysql://nova:nova@192.168.56.210/nova
--osapi_compute_listen=192.168.56.200
--novncproxy_host=192.168.56.210
...

openstack@controller-1~$ cat /etc/nova/api-paste.ini
...
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:35357/v2.0
...

Keystone

openstack@controller-1~$ cat /etc/keystone/keystone.conf
[DEFAULT]
bind_host     = 192.168.56.200
...
[sql]
connection    = mysql://keystone_admin:nova@192.168.56.210/keystone
...

Glance

openstack@controller-1:~$ cat /etc/glance/glance-scrubber.conf
[DEFAULT]
registry_host = 192.168.56.210
...

openstack@controller-1:~$ cat /etc/glance/glance-api-paste.ini
...
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:5000/
...

openstack@controller-1:~$ cat /etc/glance/glance-api.conf
[DEFAULT]
registry_host = 192.168.56.210
bind_host = 192.168.56.200
...

openstack@controller-1:~$ cat /etc/glance/glance-cache.conf
[DEFAULT]
registry_host = 192.168.56.210
auth_url = http://192.168.56.210:5000/
...

openstack@controller-1:~$ cat /etc/glance/glance-registry.conf
[DEFAULT]
bind_host = 192.168.56.200
sql_connection = mysql://glance:nova@192.168.56.210/glance
...

openstack@controller-1:~$ cat /etc/glance/glance-registry-paste.ini
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:5000/
...

Openrc file

openstack@controller-1:~$ cat /root/openrc
export OS_AUTH_URL="http://192.168.56.210:5000/v2.0/"
export SERVICE_ENDPOINT=http://192.168.56.210:35357/v2.0/
...

Now, assuming you have OpenStack running, you can try doing something with your HA setup.

openstack@controller-1:~$ nova image-list
+--------------------------------------+--------+--------+--------+
|                  ID                  |  Name  | Status | Server |
+--------------------------------------+--------+--------+--------+
| ef0dfac5-5977-405c-8e62-595af17aa01d | cirros | ACTIVE |        |
+--------------------------------------+--------+--------+--------+

Conclusion

So, we can deploy an almost[1] fully HA OpenStack by combining the contents of this post and the two previous posts (OpenStack HA in general and MySQL and RabbitMQ HA).

That was easy! You can thank the modular design of OpenStack, but perhaps the most credit should be given to the fact that components are wired via asynchronous messaging (RabbitMQ), whose main purpose is helping to build fault-tolerant systems.

[1] Why almost? Because we have a short unavailability window during failover of a load balancer, and because currently executing requests will break during failover. This can be mitigated by retrying requests (client-side, and by developing a patch for Nova to retry internal Keystone requests), but it gets a lot more difficult if the requests can have side effects.

5 Responses

  1. Jeroen van Bemmel

Have you considered using keepalived for load balancing as well, instead of HAProxy? See http://gcharriere.com/blog/?p=339

    September 4, 2012 22:29
    • Eugene Kirpichev

      We have considered that, and I think I even read this particular post, but we decided to use haproxy after all, because it was much easier to configure (keepalived failed to work “out of the box” in the LB configuration).

      September 7, 2012 17:23
  2. Baktha

    Hi Eugene Kirpichev,

Great post. I've been looking into HA, and your post is really helpful.
    Just curious, could you please tell me how to set up HAProxy with Pacemaker?

    January 22, 2013 06:26
  3. fifi

What happens if the haproxy process goes down in this config? Keepalived will be clueless, possibly leaving the virtual IP on the server where the unavailable haproxy resides.

    September 12, 2013 03:12

Continuing the Discussion

  1. OpenStack Community Weekly Newsletter (Aug 31-Sep 7) » The OpenStack Blog

    [...] By Mirantis: Using Software Load Balancing in High Availability (HA) for OpenStack Cloud API Services [...]

September 7, 2012 15:00
