Let’s talk about replication a bit. How much do you know about it? Why is it so important? As for me, I can think of plenty of questions about replication, and they generally fall into these three categories:
What is it?
Why do we need it? (Justification and benefits)
How can it be applied?
In this two-part article series, we’re going to look at those questions in the context of a proposal to add replication capabilities to Trove, the OpenStack Database project.
Replication: Definition and types
So before we start, we should define just what it is that we’re talking about.
Replication is the process of sharing information between databases (or any other type of server) to ensure that the content is consistent between systems. Replication is normally used to increase the number of database servers available to clients, thereby reducing the load on each.
In general, systems will implement one of these three replication types:
Single-Master Replication (SMR)
Multi-Master Replication (MMR)
Multi-Master-Single-Slave Replication (MMSSR)
Let’s take a closer look at each of them to see how data is actually replicated among nodes in these common types of replication.
Single-Master Replication (SMR)
In Single-Master Replication, changes to table rows (inserts, updates, and deletions) are allowed to occur only in a designated master database. These changes are then replicated to tables in one or more slave databases. The replicated tables in the slave databases are not permitted to accept any changes, except from the designated master database. (This is also known as master-to-slave, or master/slave replication.)
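To make the flow concrete, here is a minimal in-memory sketch of the single-master idea. It is only an illustration: the class and method names (Master, Slave, apply_from_master) are made up for this example, the "databases" are plain dictionaries, and replication is synchronous, which a real datastore may not be.

```python
class ReadOnlyError(Exception):
    """Raised when a client tries to change a slave directly."""


class Slave:
    """A read-only copy; only the master may change its rows."""

    def __init__(self):
        self.rows = {}

    def read(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        # Client writes are rejected: changes must come from the master.
        raise ReadOnlyError("slaves accept changes only from the master")

    def apply_from_master(self, op, key, value=None):
        # Hypothetical replication hook, called only by Master below.
        if op == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = value


class Master:
    """The single node that accepts inserts, updates, and deletions."""

    def __init__(self, slaves):
        self.rows = {}
        self.slaves = list(slaves)

    def write(self, key, value):
        self.rows[key] = value
        self._replicate("upsert", key, value)

    def delete(self, key):
        self.rows.pop(key, None)
        self._replicate("delete", key)

    def _replicate(self, op, key, value=None):
        # Push every change to each slave (synchronously, for simplicity).
        for slave in self.slaves:
            slave.apply_from_master(op, key, value)
```

With two Slave objects attached to one Master, a write on the master becomes readable on both slaves, while calling write on a slave raises ReadOnlyError.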
Multi-Master Replication (MMR)
Multi-Master Replication designates two or more databases in which tables with the same table definitions and initial row sets are created. Changes to table rows (inserts, updates, and deletions) are allowed to occur in any of these databases. Changes to table rows in any given database are then replicated to their counterpart tables in every other database. In this scenario, each master is both readable and writeable. (This scheme is close to the “clustering” form of data sharing because each node accepts read/write operations.)
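The multi-master scheme can be sketched the same way. Again, this is an assumption-laden toy rather than how any particular datastore does it: MasterNode and connect are hypothetical names, changes are pushed synchronously to every peer, and conflict resolution (two masters changing the same row at once) is deliberately left out.

```python
class MasterNode:
    """A node that accepts reads and writes and replicates to all peers."""

    def __init__(self, name):
        self.name = name
        self.rows = {}
        self.peers = []

    def read(self, key):
        return self.rows.get(key)

    def write(self, key, value):
        # Apply locally, then replicate to every other master.
        self._apply("upsert", key, value)
        for peer in self.peers:
            peer._apply("upsert", key, value)

    def delete(self, key):
        self._apply("delete", key)
        for peer in self.peers:
            peer._apply("delete", key)

    def _apply(self, op, key, value=None):
        if op == "delete":
            self.rows.pop(key, None)
        else:
            self.rows[key] = value


def connect(nodes):
    """Make every node a peer of every other node (hypothetical helper)."""
    for node in nodes:
        node.peers = [other for other in nodes if other is not node]
```

After connect is called on a list of nodes, a write or delete on any one of them shows up on all the others, which is what makes every master both readable and writeable.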
Multi-Master-Single-Slave Replication (MMSSR)
Multi-Master-Single-Slave Replication (MMSSR) is basically the opposite of Single-Master Replication (SMR). In this case, three or more master databases are designated in which tables with the same table definitions and initial row sets are created. Changes to table rows (inserts, updates, and deletions) are allowed to occur in any of these master databases. Changes to table rows in any given master database are then replicated to their counterpart tables in a common slave database. This single slave is a backup node shared among all the master nodes, so it contains all of the data.
This method does have at least one limitation: each master must replicate only unique structures to the slave. For example, two master nodes must not both replicate a table with the same name, because the copies would collide on the shared slave.
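Here is a small sketch of the shared-slave arrangement and the uniqueness limitation. The names (SharedSlave, TableMaster, apply) are invented for this example, and rejecting a duplicate table name with an error is just one possible way to enforce the rule; the original proposal does not say how it would be enforced.

```python
class SharedSlave:
    """One backup node that collects tables from several masters."""

    def __init__(self):
        self.tables = {}  # table name -> {key: value}
        self.owners = {}  # table name -> name of the master that owns it

    def apply(self, master_name, table, key, value):
        # The first master to replicate a table becomes its owner.
        owner = self.owners.setdefault(table, master_name)
        if owner != master_name:
            raise ValueError(
                f"table {table!r} is already replicated by {owner!r}"
            )
        self.tables.setdefault(table, {})[key] = value


class TableMaster:
    """A master that writes locally and replicates to the shared slave."""

    def __init__(self, name, slave):
        self.name = name
        self.slave = slave
        self.tables = {}

    def write(self, table, key, value):
        self.tables.setdefault(table, {})[key] = value
        # A name collision surfaces here, at replication time.
        self.slave.apply(self.name, table, key, value)
```

If one master replicates a "users" table and another replicates "orders", the slave ends up holding both. If the second master then tries to replicate its own "users" table, the slave raises ValueError instead of mixing the two tables together.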
Why does Trove need replication?
Most of the datastores currently supported by Trove have replication capabilities that serve various use cases, such as scale-out via read replicas, operational recovery (failover, fault tolerance), and offline backup. In order to be truly production-ready, Trove needs to support easy configuration and management of these use cases.
Over time, all of these requirements should be evaluated; for now, the goal of this proposal (and the blueprint behind it) is to focus on read replicas for scale-out, targeting the MySQL datastore. Implementations within the same scope are expected to follow for other datastores, after which the scope can be expanded to meet the remaining requirements.
A set of use cases can be found in the community roadmap, but some common use cases include:
Read/write scaling (SMR).
High availability (single zone, multiple datacenter).
Replication as the backup process (MMR).
Replication as a failover mechanism (MMSSR).