Building OpenStack Database (Trove): Native Replication, Part 2

In the first part of this series discussing native replication in OpenStack Database (Trove), we talked about the different types of replication and how they work. To review, they are:

  • Single-Master Replication (SMR)
  • Multi-Master Replication (MMR)
  • Multi-Master per Slave Replication (MMSR)

Now we’re going to look at the community use-cases that replication solves, and at the ways in which we can implement these replication methods. (In a future series we’ll talk about actually using replication in an OpenStack environment.)

Community use cases

The only reason to even think about replication is that there’s a need for it in the community. This need generally falls under one of several use cases:

Read Replicas

The uses for Read Replicas include providing live backups and providing scale-out capabilities.  The Read Replicas use case has the following properties:
  1. The master can exist before the slave if the master already contains data.
  2. A system can have N slaves for one master.
  3. By default, slaves are marked as read-only.
  4. A slave can be detached from the “replication set” to act as an independent site.
  5. A pre-existing non-replication site can become the master of a new replication set.

MultiZone Disaster Recovery

MultiZone Disaster Recovery makes it possible to spread data out over multiple zones to ensure data availability in the face of technical issues. It’s marked by these properties:
  1. A master in one zone is mirrored by a slave in a different zone.
  2. Some mechanism should exist whereby a cloud administrator can set up a “zone configuration” so that the user can simply select “MultiZone DR” and Trove will know where to put both the master and the slave.
  3. The administrator should be able to restore the master from a slave, either directly or from a backup stored in Swift.
  4. The administrator should be able to “flip the switch” on a running MySQL instance to change which server is getting updates.

Single Zone Failover

Single Zone Failover is more of an “instant” type of recovery, more about keeping the system running than providing backup. It has these properties:

  1. Single Zone Failover implements master-master replication between two instances in the same zone.
  2. It can be set up on a pre-existing instance.
  3. The administrator should be able to switch the “active master”, i.e., the site to which data is being written, in which case the other site would be marked read-only.

So that’s the “why”. Now let’s talk about the “how”.

Trove Replication Under the Hood

Before doing a deep dive into the community use cases, let’s ask ourselves a couple of questions:
  1. At the Trove level, how do instances know who they are?
  2. From the Trove user perspective, how do we know which roles are assigned to instances?

The answers to both of these questions lie in the metadata.

Metadata and the Replication Contract

From the Trove perspective, metadata is nothing more than an additional key-value model associated with the base instance model. The goal was to propose a way to store critical information for instances, and to have that information associated with the instance so that it is displayed whenever that instance is listed via the Trove API.

In implementing metadata, the Trove team had several goals:
  1. Give Trove users the ability to store key/value pairs that have significance to them but are not essential to Trove’s functionality.
  2. Create a user-accessible REST API to manage instance metadata.
  3. Mimic the design/interface used by Nova for metadata.
  4. Implement the model so that it acts like a dictionary, making it easy to hook metadata into the instance (see the sketch below).
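
To illustrate goal 4, here’s a minimal sketch of what a dict-like metadata model can look like. This is purely illustrative; the class and attribute names are my own, not the actual Trove implementation:

# Illustrative sketch only; names are hypothetical, not the real Trove model.
class InstanceMetadata(object):
    def __init__(self, instance_id, entries=None):
        self.instance_id = instance_id
        # Each entry is a key/value pair associated with the instance.
        self._entries = dict(entries or {})

    def __getitem__(self, key):
        return self._entries[key]

    def __setitem__(self, key, value):
        self._entries[key] = value

    def __delitem__(self, key):
        del self._entries[key]

    def to_dict(self):
        return dict(self._entries)

# Hooking metadata into an instance then feels like using a plain dictionary:
meta = InstanceMetadata("some-instance-uuid")
meta["replication_contract"] = {"replicates_to": [], "replicates_from": [], "writable": True}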

The purpose of this metadata is to enable Trove to describe the replication contract. For example, let’s take a look at the metadata for a master-slave replication scenario.

In this example, the replication contract would contain the following fields:

replication_contract:

  1. replicates_to (master role definition).
  2. replicates_from (slave role definition).
  3. writable (additional attribute that unambiguously defines the master/slave node).

In terms of master-slave replication, the slave node’s replicates_to attribute is always empty, as the slave doesn’t replicate to anything; its replicates_from attribute can’t be empty (the master must exist); and by default, its writable attribute is set to False.

For the master node, the metadata replicates_to attribute can be empty; in this case, the master node is working as a standalone server. Otherwise, replicates_to refers to the slave node. The replicates_from field would be empty, and by default the writable attribute is set to True.

Answering those Replication Questions

Now we can get back to our original questions. First, how does an instance know who it is, and who it works with?

Let’s take a look at a small example. Consider a Single-Master Replication (SMR) scenario with one master and several slaves. In this case, the metadata looks something like this:
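
Here’s an illustrative sketch of the contract, assuming two slaves named Slave1 and Slave2 (the instance names are hypothetical), written in the same style as the multi-master example below:

# Master node: replicates to each slave and accepts writes.
master_metadata = {
    "replication_contract": {
        "replicates_to": ["Slave1", "Slave2"],
        "replicates_from": [],
        "writable": True
    }
}

# Each slave node (Slave1 shown) carries the mirror-image contract.
slave1_metadata = {
    "replication_contract": {
        "replicates_to": [],
        "replicates_from": ["Master"],
        "writable": False
    }
}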

As you can see, Master replicates_to each slave, and each slave replicates_from Master. And, of course, Master is writable (in other words, it accepts read/write operations), while each slave is in a non-writable state (read-only by default).

Now let’s take a look at multi-master replication. In this case, there are three masters, all with similar metadata. In this example, the metadata for Master3 is:

metadata: {
    "replication_contract": {
        "replicates_from": [
            "Master1",
            "Master2"
        ],
        "replicates_to": [
            "Master1",
            "Master2"
        ],
        "writable": False
    }
}

Now let’s look at the second question: how can the user find out which server is replicated to or from where, and which node is accessible? The answer is that the Trove instance metadata service gives the user the ability to query instance metadata, so all of that information is visible.
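
As a sketch of how a client could interpret that metadata, role assignment follows directly from the contract fields. The helper below is hypothetical (it’s not part of the Trove API); it simply encodes the rules described above:

def infer_role(metadata):
    # Hypothetical helper: derive a node's role from its replication contract.
    contract = metadata.get("replication_contract", {})
    to_nodes = contract.get("replicates_to") or []
    from_nodes = contract.get("replicates_from") or []
    if to_nodes and from_nodes:
        return "multi-master member"
    if to_nodes:
        return "master"
    if from_nodes:
        return "slave"
    return "standalone"

# Example: a writable node that replicates to slaves is reported as a master.
print(infer_role({"replication_contract": {
    "replicates_to": ["Slave1"], "replicates_from": [], "writable": True}}))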

Now we’re ready to proceed to the community use cases. While doing this deep dive, keep Trove’s instance metadata capabilities in mind.

Read Replicas Deep Dive

Now let’s look at how Read Replicas can actually be implemented.

Attach procedure

As all of us on the Trove project knew, the load capacity for a cluster can be increased in two ways: hardware scaling and spread scaling (adding another server instance and spreading the load between them). From the single-instance perspective, hardware scaling seems like more than enough, but from the high availability and fault tolerance perspective, only spread scaling is acceptable.

The first use case to implement is the “attach operation”. Suppose we have a standalone database server that contains N TB of data and we want to scale up the possible load (for example, to create more than 1000 accounts inside our application). Again, we have two options: hardware and spread scaling. From the user perspective: I want the ability to spin up a fresh new instance and join it to the already running instance.

Trove enables a user to perform attach operations on instances that the user owns. The best way to perform a join is to apply an up-to-date backup to the fresh new instance, and then attach it. (The actual means of attaching the instance is very specific to each datastore type.)
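
Here’s a sketch of how the attach flow surfaces to an end user through python-troveclient. It assumes a Juno-or-later client in which instances.create() accepts a replica_of argument; the credentials, flavor, and IDs are placeholders:

# Sketch only: assumes a Juno-era python-troveclient where instances.create()
# accepts replica_of; all credentials and IDs are placeholders.
from troveclient.v1 import client

trove = client.Client(username="demo", password="secret",
                      project_id="demo", auth_url="http://keystone:5000/v2.0")

master = trove.instances.get("MASTER_INSTANCE_ID")

# Spin up a fresh instance and attach it to the running master as a replica.
replica = trove.instances.create(name="replica-1",
                                 flavor_id="FLAVOR_ID",
                                 volume={"size": 5},
                                 replica_of=master.id)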


Detach procedure

As you know, Trove is able to “join” already-running nodes into a fully functional replication set. From this perspective, the slave node receives the data from the master node.

Trove also needs to be able to perform the opposite operation, and “detach” a server from a cluster using the “detach operation”. The newly detached node will have had its data replicated from the master node, so it should be up-to-date when detached.
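
Correspondingly, here’s a hedged sketch of the detach operation, assuming a client where instances.edit() accepts a detach_replica_source flag (my understanding of how the Juno-era detach call is exposed); credentials and IDs are again placeholders:

# Sketch only: assumes instances.edit() accepts detach_replica_source.
from troveclient.v1 import client

trove = client.Client(username="demo", password="secret",
                      project_id="demo", auth_url="http://keystone:5000/v2.0")

replica = trove.instances.get("REPLICA_INSTANCE_ID")

# Break the replication contract: the node keeps its replicated data
# and continues running as an independent, writable site.
trove.instances.edit(replica.id, detach_replica_source=True)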

Provisioning of a Replication Set with One Master

The most common use case for all DBaaS services is provisioning a replication set with N slaves and one master node. Master/slave is a model of communication where one device or process has unidirectional control over one or more other devices. In some systems a master is elected from a group of eligible devices, with the other devices acting in the role of slaves. Many developers use master-slave replication to solve a number of different problems, including performance problems, backing up different databases, and, as part of a larger solution, alleviating system failures. As you know, master-slave replication enables data from one database server (the master) to be replicated to one or more database servers (the slaves). The master logs the updates, which then ripple through to the slaves.

The slave outputs a message stating that it has received the update successfully, thus allowing the sending of subsequent updates. Master-slave replication can be either synchronous or asynchronous. The difference is simply the timing of propagation of changes. If the changes are made to the master and slave at the same time, it is synchronous. If changes are queued up and written later, it is asynchronous.
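
Tying this back to Trove, here’s a sketch of provisioning such a set under the same python-troveclient assumptions as the attach example: create the master first, then attach N read slaves to it (N=2 here purely for illustration):

# Sketch only: same python-troveclient assumptions as the attach example.
from troveclient.v1 import client

trove = client.Client(username="demo", password="secret",
                      project_id="demo", auth_url="http://keystone:5000/v2.0")

master = trove.instances.create(name="db-master",
                                flavor_id="FLAVOR_ID",
                                volume={"size": 5})

# In practice you would wait for the master to become ACTIVE before
# attaching slaves; then attach N read-only slaves to the single master.
slaves = [trove.instances.create(name="db-slave-%d" % i,
                                 flavor_id="FLAVOR_ID",
                                 volume={"size": 5},
                                 replica_of=master.id)
          for i in range(2)]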

Potential Uses for Read Replicas

Let’s examine a few examples of how you can take advantage of Read Replicas:
  • Scale-out solutions: Spreading the load among multiple slaves to improve performance. In this environment, all writes and updates must take place on the master server. Reads, however, may take place on one or more slaves. This model can improve the performance of writes (since the master is dedicated to updates), while dramatically increasing read speed across an increasing number of slaves.
  • Data security: As data is replicated to the slave, and the slave can pause the replication process, it is possible to run backup services on the slave without corrupting the corresponding master data.
  • Analytics: Live data can be created on the master, while the analysis of the information can take place on the slave without affecting the performance of the master.
  • Long-distance data distribution: If a branch office would like to work with a copy of your main data, you can use replication to create a local copy of the data for their use without requiring permanent access to the master.
  • Backups: To use replication as a backup solution, replicate data from the master to a slave, and then back up the data on the slave. The slave can be paused and shut down without affecting the running operation of the master, so you can produce an effective snapshot of “live” data that would otherwise require the master to be shut down.
  • Scale-out: You can use replication as a scale-out solution; that is, where you want to split up the load of database queries across multiple database servers, within some reasonable limitations. Because replication works from the distribution of one master to one or more slaves, using replication for scale-out works best in an environment where you have a high number of reads and low number of writes/updates. Most Web sites fit into this category, where users are browsing the Web site, reading articles, posts, or viewing products. Updates only occur during session management, or when making a purchase or adding a comment/message to a forum. Replication in this situation enables you to distribute the reads over the replication slaves, while still enabling your web servers to communicate with the replication master when a write is required.
  • Spreading the load: There may be situations when you have a single master and want to replicate different databases to different slaves. For example, you may want to distribute different sales data to different departments to help spread the load during data analysis.
  • Increasing the performance: As the number of slaves connecting to a master increases, the load, although minimal, also increases, as each slave uses a client connection to the master. Also, as each slave must receive a full copy of the master binary log, the network load on the master may also increase and create a bottleneck. If you are using a large number of slaves connected to one master, and that master is also busy processing requests (for example, as a part of a scale-out solution), then you may want to improve the performance of the replication process. One way to improve the performance of the replication process is to create a deeper replication structure that enables the master to replicate to only one slave, and for the remaining slaves to connect to this primary slave for their individual replication requirements.
  • Failover alleviation: You can set up a master and a slave (or several slaves), and write a script that monitors the master to check whether it is up. Then instruct your applications and the slaves to change masters in case of failure.
  • Security: You can use SSL for encrypting the transfer of the binary log required during replication, but both the master and the slave must support SSL network connections. If either host does not support SSL connections, replication through an SSL connection is not possible. Setting up replication using an SSL connection is similar to setting up a server and client using SSL. You must obtain (or create) a suitable security certificate that you can use on the master, and a similar certificate (from the same certificate authority) on each slave.

Promotion/Demotion

If the master for a replication set goes away due to server failure or some other issue, it can be removed, and one of the slaves (with consistent data) can be promoted to be the master.
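
In terms of the replication contract, promotion is essentially a metadata rewrite. Here’s a minimal, datastore-agnostic sketch with hypothetical helper names:

def promote_slave(slave_metadata, remaining_slaves):
    # Hypothetical helper: rewrite a slave's contract so that it becomes
    # the new master of the surviving replication set.
    contract = slave_metadata["replication_contract"]
    contract["replicates_from"] = []
    contract["replicates_to"] = list(remaining_slaves)
    contract["writable"] = True
    return slave_metadata

# The remaining slaves would then have their replicates_from field updated
# to point at the newly promoted master, and the failed master's contract
# would be dropped along with the instance.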

Multi-zone management

As you might imagine, multi-zone management is a bit more complicated.

Cross-datacenter deployment strategies

An affinity rule is a setting that establishes a relationship between two or more virtual machines and hosts.

Affinity rules and anti-affinity rules tell the hypervisor platform to keep virtual entities together or separated. The rules, which can be applied as either required or preferred, help reduce traffic across networks and keep the virtual workload balanced on available hosts. If two virtual machines communicate frequently and should share a host, the admin can create a VM-VM affinity rule to keep them together. Conversely, if two resource-hungry VMs would tax a host, an anti-affinity rule will keep those VMs from sharing a host.

Affinity rules and anti-affinity rules can be applied between VMs and hosts as well, and a VM can be subject to VM-VM affinity rules and VM-Host affinity rules at the same time. Affinity and anti-affinity rules in a virtual environment can conflict with one another. For example, two VMs with an anti-affinity relationship may both be linked to a third VM via an affinity rule, but they cannot share a host. Optional affinity rule violation alarms can alert administrators to these events.

Basically, what that means is that affinity/anti-affinity (A/AA) rules are nothing more than a way to describe a VM placement strategy.

How will this work for database replication? The answer is simple. To provide high availability for nodes, we need to spread them across multiple hosts, or in terms of databases, across different data centers.

This is where A/AA comes in. Basically it’s the same use case as provisioning a replication set on one host, but now the provisioning engine also needs to be able to pass along hypervisor/scheduler hints.

This use case comes from the need to simplify the provisioning mechanism for the Trove user. Suppose we need to create a master-slave replication set. The user, through the API, can request:

  1. … to place the master node VM on host A, and all slaves on host B.
  2. … to place all nodes on host A.
  3. … to place each node on its own host.

Each configuration is stored inside the backend, and at the provisioning stage, the user can specify the ID (UUID) of the A/AA rule that should be applied. If no rules are specified, all VMs, by default, will be placed on the same host, because the provisioning engine (Nova) hasn’t received any hypervisor hints.
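
As a sketch of the mechanism such a rule ultimately maps onto, here’s how anti-affinity placement is expressed at the Nova level: a server group with an anti-affinity policy whose UUID is passed down as a scheduler hint. The Trove-side plumbing that would generate these calls is the part still under design; credentials and IDs below are placeholders:

# Sketch only: the Nova-level mechanism that A/AA rules would map onto.
from novaclient import client as nova_client

nova = nova_client.Client("2", "demo", "secret", "demo",
                          "http://keystone:5000/v2.0")

# An anti-affinity server group: its members must land on different hosts.
group = nova.server_groups.create(name="db-anti-affinity",
                                  policies=["anti-affinity"])

# Each replication-set member is booted with the group UUID as a hint,
# so the scheduler spreads the VMs across hosts.
nova.servers.create(name="db-node-1",
                    image="IMAGE_ID",
                    flavor="FLAVOR_ID",
                    scheduler_hints={"group": group.id})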

Ability to recover the master from a slave

One common use case involves the corruption of the master, or the loss of the VM on which it’s hosted. Out of the box, Trove provides the ability to provision an instance with data that comes from the backup specified by the restore reference. So the user should be able to “clone” the slave instance to a fresh new instance, and then promote it to be the master node on a new host.
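
Here’s a sketch of that recovery path, under the same python-troveclient assumptions as earlier, plus the assumption that instances.create() accepts a restorePoint referencing an existing backup:

# Sketch only: back up the healthy slave, then provision a new instance
# from that backup; all credentials and IDs are placeholders.
from troveclient.v1 import client

trove = client.Client(username="demo", password="secret",
                      project_id="demo", auth_url="http://keystone:5000/v2.0")

backup = trove.backups.create(name="slave-clone",
                              instance="SURVIVING_SLAVE_ID")

new_master = trove.instances.create(name="recovered-master",
                                    flavor_id="FLAVOR_ID",
                                    volume={"size": 5},
                                    restorePoint={"backupRef": backup.id})

# The new instance can then be promoted to master by rewriting the
# replication contracts, as described above.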

Ability to restore master to a slave

Trove allows the user to restore instances from scratch using a given backup. This community use case goes even deeper: in this case, Trove should be able to organize the inter-guest communication. The use case requires an up-to-date backup, which means that the master should prepare its own backup, push it to Swift, and then wait until the job is completed. It can then send the data to the slave node’s guestagent.

This process is called cloning, because the slave node is a 100% reflection of the master node. 
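
A rough sketch of that orchestration, with entirely hypothetical helper names standing in for the guestagent calls that would actually carry it out:

def clone_master_to_slave(master_guest, slave_guest, swift_container):
    # Hypothetical orchestration of the "restore master to a slave" flow.
    # 1. The master prepares an up-to-date backup of itself.
    backup_ref = master_guest.create_backup(container=swift_container)

    # 2. Wait until the backup has been pushed to Swift and the job completes.
    master_guest.wait_for_backup(backup_ref)

    # 3. Hand the backup reference to the slave's guestagent, which pulls the
    #    data from Swift and applies it, turning the slave into a clone.
    slave_guest.apply_backup(backup_ref)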

Multi-master (master-master) replication use case

Multi-master replication (MMR) is a method of database replication that allows data to be stored by a group of compute instances and updated by any member of the group. All members are responsive to client data queries. The multi-master replication system is responsible for propagating the data modifications made by each member to the rest of the group, and resolving any conflicts that might arise between concurrent changes made by different members.

Multi-master replication can be contrasted with master-slave replication, in which a single member of the group is designated as the “master” for a given piece of data, and is the only node allowed to modify that data item. Other members wishing to modify the data item must first contact the master node. Allowing only a single master makes it easier to achieve consistency among the members of the group, but is less flexible than multi-master replication.

Multi-master replication can also be contrasted with failover clustering, in which passive slave servers are replicating the master data in order to prepare for takeover in the event that the master stops functioning. The master is the only server active for client interaction. The primary purposes of multi-master replication are increased availability and faster server response time.

From the Trove perspective, MMR is the most interesting use case, because the way MMR is organized is very close to cluster organization/deployment, since all roles are the same (all nodes are masters). Consider a Cassandra cluster with no roles at all, where each node can be picked for R/W operations. Does that sound familiar? But we’re still talking about replication, because a cluster configuration doesn’t require specifying what part of the data needs to be distributed among all nodes of the cluster.

Failover is one of the possible use cases for master-master replication. The use case becomes even more meaningful when we start to talk about multi-datacenter replication/clustering deployments. The justification is easy enough: what if a compute host goes down, and the whole cluster was running on it? All data would be completely lost.

Another use case: if you have a multi-host deployment (N nodes on Host A, M nodes on Host B), the data loss will be partial. That’s also not the best option, but it saves part of the data stored among the replication/cluster nodes.

Failover within one zone/host

Multi-master replication can be used to protect the availability of a mission-critical database. For example, a multi-master replication environment can replicate data in your database to establish a failover site should the primary site become unavailable due to system or network outages. Such a failover site can also serve as a fully functional database to support application access when the primary site is concurrently operational.

OpenStack Juno release plans

For the Juno release, the Trove community has decided to implement the read replica use cases:
  • attach procedure;
  • detach procedure;
  • fresh new replica provisioning;
  • role promotion/demotion procedures.

Outlook for Additional Trove Features

The Trove team will also be working towards these features, with an eye towards additional replication use cases:
  • SMR, MMR, MMSR;
  • cross-datacenter deployment;
  • fault tolerance/failover;
  • role promotion/demotion.

Summary

While the deep-down details aren’t often required for simply using OpenStack, it’s always good to understand what needs to go on behind the scenes — particularly if you are considering contributing to the project. The OpenStack Database (Trove) community is working on solving the replication use cases that are important to the community, such as Read Replicas, Single Zone Failover, and MultiZone Disaster Recovery.

Watch for our next series, which explains how to use these features from a user standpoint.
