
Using Software Load Balancing in High Availability (HA) for OpenStack Cloud API Services

Eugene Kirpichev - August 31, 2012

In two previous posts, my colleagues Oleg Gelbukh and Piotr Siwczak covered most of the ground on making OpenStack highly available (HA). If you haven't read those posts yet, it's a good time to do so.

In this post, I'll offer some direct practical advice on completing the puzzle: enabling HA load balancing for OpenStack API services.

All of these services are stateless, so putting a highly available load balancer on top of a few instances of the services is enough for most purposes. Here I'll consider one option: HAProxy + Keepalived; however, other options are possible (e.g., HAProxy + Pacemaker + Corosync, or a hardware load balancer).

Which services?

As discussed in previous posts, we're talking about OpenStack REST API services: nova-api, keystone-api, and glance-api.

Types of failures

Several kinds of outages can happen in a distributed system like OpenStack. Here are the major ones:

  • Service instance failure: A particular instance of a service crashes, but the other processes on the same machine function normally.
  • Machine failure: A whole machine becomes unusable, perhaps due to a power outage or network failure.
  • Network partition: Several network segments become unable to talk to each other (e.g., a higher-level switch malfunctions) but can talk to everybody within their segment. This poses a huge problem for any stateful services and is the root cause of the sheer complexity of different consistency models for distributed data stores (how to synchronize the changes made on different partitions when connectivity is restored). However, for stateless services, it's simply equivalent to machines on different ends of the partition becoming unavailable to each other.

There are also more complex kinds of failures, such as when a service hangs or starts giving the wrong answers due to hardware failure or other problems.

Some of these failures can be mitigated by monitoring at the application level; for example, by checking that the service gives an expected answer to a sample request, and restarting the service otherwise. Some failures, alas, cannot.
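
HAProxy, which we'll use below, can perform such an application-level check itself: with option httpchk it periodically sends a sample HTTP request to each backend and takes the backend out of rotation if the answer isn't a success code (it won't restart the failed service, it just stops sending requests to it). The configuration later in this post sticks to plain TCP connect checks; a minimal sketch, assuming the service responds to GET /:

option httpchk GET /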

NOTE: In this post, I'm only covering service/machine crashes.

Surviving service failures

Suppose we have an external load balancer that we assume to be always available. Then we spawn several instances of the necessary services and balance them.

Whenever a request arrives, the balancer will attempt to proxy the request for us and connect to one of the backend servers. If a connection cannot be established, the load balancer will transparently try another instance.
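
In the HAProxy configuration later in this post, this behavior maps to two directives: retries controls how many times a failed connection attempt is retried, and option redispatch allows the last retry to go to a different backend (the check keyword on each server line additionally marks dead backends as down so they stop receiving traffic).

retries 3
option redispatch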

Note that this provides no protection against failure of a service in the middle of executing a request. More on this later.

Surviving load balancer failures

So what if the load balancer itself fails? As a load balancer is almost stateless (except for stickiness, which we can ignore in OpenStack), we just need to put a virtual IP address on top of a bunch of load balancers (two is often enough). This can be done using Keepalived or other similar software.

This construction makes the virtual IP address refer to "whichever balancer is available." The details, in the case of Keepalived, are implemented using the VRRP protocol. When a node owning the virtual IP dies, the other Keepalived will notice and assign the IP address to itself.

What if the load balancer software crashes but the node survives (very unlikely, but possible)? For that, Keepalived has "check scripts": just configure Keepalived to use a script that checks whether the load balancer is running, and whenever it's not, Keepalived considers the node unusable and moves the virtual IP to a usable node.
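
A minimal sketch of such a check script (not included in the keepalived.conf shown later in this post): it probes for a running haproxy process and demotes the node when the probe fails.

vrrp_script chk_haproxy {
    # exits non-zero when no haproxy process exists
    script "killall -0 haproxy"
    interval 2
}

vrrp_instance 42 {
    ...
    track_script {
        chk_haproxy
    }
}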

What if Keepalived crashes? The other Keepalived will think the whole node died and claim the virtual IP.

NOTE: VRRP, and thus Keepalived, leaves a short window of unavailability during failover.

Transitional failure effects

There are at least three levels of fault tolerance with very different guarantees and different implementation complexity:

  • Level 1: Failure of a component does not lead to permanent disruption of service.
  • Level 2: Failure of a component does not lead to failure of new requests.
  • Level 3: Failure of a component does not lead to failure of any requests (new or currently executing).

At level 1, there may be a window of unavailability (the shorter, the better), e.g., until we detect that a particular server became unusable and tell the client to use another one instead.

At level 2, no new requests are denied, though currently executing requests may fail. This is harder: We must be able to direct any request to a currently available instance, which requires the infrastructure to proxy, and not just redirect the connections. Here we assume that if a connection can be established, the server will not die while serving the request—this is equivalent to assuming that requests take zero time, otherwise it's equivalent to level 3.

At level 3, the failure recovery happens transparently even to someone who's executing a long request with a server that now failed.

Level 3 is usually impossible to implement fully at an infrastructure level because it requires 1) buffering requests and responses and 2) understanding how to safely retry each type of request.

For example: What if a large file upload or download fails? Should the infrastructure buffer the whole file and re-upload/re-serve it? What if a call with side effects failed, having perhaps performed half of them—is it safe to retry it?

Implementing this level of fault tolerance requires a layer of application-specific retry logic on the client and special support for avoiding duplicate side effects on the server.

The setup I'm discussing in this post gives level 2 protection against service failures and level 1 for balancer failures.

NOTE: Previous posts on MySQL and RabbitMQ HA introduced level 3 tolerance to their failures as there is retry logic in place, at least if you use the proper patches mentioned in the posts.

Software topology

As mentioned, we'll use the following set of software:

  • the services themselves;
  • HAProxy for making the services HA; and
  • Keepalived for making HAProxy HA.

We'll have two types of nodes: a service node and an endpoint node. A service node hosts services, while an endpoint node hosts HAProxy and Keepalived. A node can also play both roles at the same time.

Wiring of services to each other

Services must address each other by the virtual IP in order to take advantage of each other's high availability.

Also, if fault tolerance beyond levels 1 and 2 is needed, application-specific retry logic should be introduced. As far as I know, nobody currently retries internal calls in OpenStack (just calls to Keystone), and it seems that in most cases it's enough to retry external ones.

Getting hands-on

Enough theory, let's build the thing.

Suppose we have two machines that we want to turn into an HA OpenStack controller pair, installing both the API services and Keepalived + HAProxy on each.

Suppose machine 1 has address 192.168.56.200, machine 2 has address 192.168.56.201, and we want the services to be accessible through virtual IP 192.168.56.210. And suppose all these IPs are on eth1.

Everything is wired as follows (it's similar for other services; HAProxy and Keepalived are shared, of course): each API service listens on its own node's local address, HAProxy listens on the virtual IP and proxies to the service instances on both nodes, and Keepalived decides which node currently owns the virtual IP.

Installing necessary packages

I'm assuming that you're on Ubuntu, in which case you should type:

$ sudo apt-get install haproxy keepalived
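
Depending on the Ubuntu release, the haproxy package may come disabled by default (its init script consults /etc/default/haproxy); if "service haproxy start" does nothing, enable it first:

$ sudo sed -i 's/^ENABLED=0/ENABLED=1/' /etc/default/haproxy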

Configuration of HAProxy

This configuration is identical on both nodes and resides in /etc/haproxy/haproxy.cfg.

$ cat /etc/haproxy/haproxy.cfg
global
    chroot /var/lib/haproxy
    daemon
    group haproxy
    log 192.168.56.200 local0
    maxconn 4000
    pidfile /var/run/haproxy.pid
    stats socket /var/lib/haproxy/stats
    user haproxy

defaults
    log global
    maxconn 8000
    mode http
    option redispatch
    retries 3
    stats enable
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout check 10s

listen keystone-1 192.168.56.210:5000
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:5000 check
    server controller-2 192.168.56.201:5000 check

listen keystone-2 192.168.56.210:35357
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:35357 check
    server controller-2 192.168.56.201:35357 check

listen nova-api-1 192.168.56.210:8773
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:8773 check
    server controller-2 192.168.56.201:8773 check

listen nova-api-2 192.168.56.210:8774
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:8774 check
    server controller-2 192.168.56.201:8774 check

listen nova-api-3 192.168.56.210:8775
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:8775 check
    server controller-2 192.168.56.201:8775 check

listen nova-api-4 192.168.56.210:8776
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:8776 check
    server controller-2 192.168.56.201:8776 check

listen glance-api 192.168.56.210:9292
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:9292 check
    server controller-2 192.168.56.201:9292 check

This configuration encompasses the four nova-api services (EC2, volume, compute, metadata), glance-api, and the two keystone-api services (regular and admin API). If you have something else running (e.g., swift proxy), you know what to do.
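
For example, a swift proxy (assuming it listens on its default port 8080 on both controllers) would get an analogous block:

listen swift-proxy 192.168.56.210:8080
    balance roundrobin
    option tcplog
    server controller-1 192.168.56.200:8080 check
    server controller-2 192.168.56.201:8080 check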

For more information, you can read the HAProxy manual.
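
Before restarting, you can also ask HAProxy to validate the configuration file:

$ sudo haproxy -c -f /etc/haproxy/haproxy.cfg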

Now restart HAProxy on both nodes:

$ sudo service haproxy restart

Configuration of Keepalived

This configuration is almost, but not quite, identical on both nodes, and resides in /etc/keepalived/keepalived.conf.

The difference is that one node has its priority defined as 101, and the other as 100. Whichever of the available nodes has the highest priority at any given moment wins (that is, claims the virtual IP).

$ cat /etc/keepalived/keepalived.conf
global_defs {
    router_id controller-1
}
vrrp_instance 42 {
    virtual_router_id 42

    # for electing MASTER, highest priority wins.
    priority 101
    state MASTER

    interface eth1

    virtual_ipaddress {
        192.168.56.210
    }
}
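
On controller-2, as described above, the only required difference is the priority (100 instead of 101); you would typically also set router_id to that node's hostname and may start it in BACKUP state, though with distinct priorities VRRP converges to the same owner either way. A sketch of the differing lines:

    router_id controller-2
    priority 100
    state BACKUP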

For more information, you can read the Keepalived manual.
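
Restart Keepalived on both nodes so the configuration takes effect:

$ sudo service keepalived restart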

And now let us check that Keepalived + HAProxy work by poking glance:

$ telnet 192.168.56.210 9292
Trying 192.168.56.210...
Connected to 192.168.56.210.
Escape character is '^]'.
^]

telnet> quit
Connection closed.

Also, we can see that just one of the controllers—the one with higher "priority"—claimed the virtual IP:

openstack@controller-1:~$ ip addr show dev eth1
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 08:00:27:9d:c4:b0 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.200/24 brd 192.168.56.255 scope global eth1
inet 192.168.56.210/32 scope global eth1
inet6 fe80::a00:27ff:fe9d:c4b0/64 scope link
valid_lft forever preferred_lft forever

openstack@controller-2:~$ ip addr show dev eth1
2: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UNKNOWN qlen 1000
link/ether 08:00:27:bd:7f:14 brd ff:ff:ff:ff:ff:ff
inet 192.168.56.201/24 brd 192.168.56.255 scope global eth1
inet6 fe80::a00:27ff:febd:7f14/64 scope link
valid_lft forever preferred_lft forever
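
If you want to exercise the failover path, stop Keepalived on the node that currently owns the virtual IP and check the other node (a sketch of the commands):

openstack@controller-1:~$ sudo service keepalived stop
openstack@controller-2:~$ ip addr show dev eth1

Within a few seconds, 192.168.56.210/32 should appear on controller-2; once Keepalived is started again on controller-1, it will claim the address back, since with these priorities VRRP preemption (enabled by default) applies.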

Configuration of OpenStack services

Now for the service wiring. We need two things:

1) make each service listen on the proper local IP address, and
2) make services address each other by the virtual IP address.

Nova

openstack@controller-1:~$ cat /etc/nova/nova.conf
--metadata_listen=192.168.56.200
--glance_api_servers=192.168.56.210:9292
--osapi_volume_listen=192.168.56.200
--ec2_listen=192.168.56.200
--sql_connection=mysql://nova:nova@192.168.56.210/nova
--osapi_compute_listen=192.168.56.200
--novncproxy_host=192.168.56.210
...

openstack@controller-1:~$ cat /etc/nova/api-paste.ini
...
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:35357/v2.0
...

Keystone

openstack@controller-1:~$ cat /etc/keystone/keystone.conf
[DEFAULT]
bind_host = 192.168.56.200
...
[sql]
connection = mysql://keystone_admin:nova@192.168.56.210/keystone
...

Glance

openstack@controller-1:~$ cat /etc/glance/glance-scrubber.conf
[DEFAULT]
registry_host = 192.168.56.210
...

openstack@controller-1:~$ cat /etc/glance/glance-api-paste.ini
...
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:5000/
...

openstack@controller-1:~$ cat /etc/glance/glance-api.conf
[DEFAULT]
registry_host = 192.168.56.210
bind_host = 192.168.56.200
...

openstack@controller-1:~$ cat /etc/glance/glance-cache.conf
[DEFAULT]
registry_host = 192.168.56.210
auth_url = http://192.168.56.210:5000/
...

openstack@controller-1:~$ cat /etc/glance/glance-registry.conf
[DEFAULT]
bind_host = 192.168.56.200
sql_connection = mysql://glance:nova@192.168.56.210/glance
...

openstack@controller-1:~$ cat /etc/glance/glance-registry-paste.ini
[filter:authtoken]
auth_host = 192.168.56.210
auth_uri = http://192.168.56.210:5000/
...

Openrc file

openstack@controller-1:~$ cat /root/openrc
export OS_AUTH_URL="http://192.168.56.210:5000/v2.0/"
export SERVICE_ENDPOINT=http://192.168.56.210:35357/v2.0/
...

Now, assuming you have OpenStack running, you can try doing something with your HA setup.

openstack@controller-1:~$ nova image-list
+--------------------------------------+--------+--------+--------+
| ID                                   | Name   | Status | Server |
+--------------------------------------+--------+--------+--------+
| ef0dfac5-5977-405c-8e62-595af17aa01d | cirros | ACTIVE |        |
+--------------------------------------+--------+--------+--------+
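
Any client that goes through the virtual IP exercises the balanced endpoints; for example, the keystone CLI talks to 192.168.56.210:35357 via the SERVICE_ENDPOINT exported above (assuming the rest of your openrc, credentials or service token, is in place):

openstack@controller-1:~$ keystone tenant-list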

Conclusion

So, we can deploy an almost[1] fully HA OpenStack by combining the contents of this post and the two previous posts (OpenStack HA in general and MySQL and RabbitMQ HA).

That was easy! You can thank the modular design of OpenStack, but perhaps the most credit should be given to the fact that components are wired via asynchronous messaging (RabbitMQ), whose main purpose is helping to build fault-tolerant systems.

[1] Why almost? Because we have a short unavailability window during failover of a load balancer, and because currently executing requests will break during failover. This can be mitigated by retrying requests (client-side and by developing a patch for Nova to retry internal Keystone requests), but it gets a lot more difficult if the requests can have side effects.
