How to Use StatefulSets and Create a Scalable MySQL Server on Kubernetes

Eric Gregory - September 16, 2022

One of the biggest challenges for implementing cloud native technologies is learning the fundamentals—especially when you need to fit your learning into a busy schedule. 

In this series, we’ll break down core cloud native concepts, challenges, and best practices into short, manageable exercises and explainers, so you can learn five minutes at a time. These lessons assume a basic familiarity with the Linux command line and a Unix-like operating system—beyond that, you don’t need any special preparation to get started.

In the last lesson, we learned how to use Secrets in Kubernetes, and we deployed a stateful application as a StatefulSet. Now it’s time to dig into what a StatefulSet is, how it works, and why it’s such an important resource for running stateful components like databases on Kubernetes. This will give us some of the final pieces necessary to complete the decomposition of our monolithic To Do app. 

Table of Contents

  1. What is Kubernetes?

  2. Setting Up a Kubernetes Learning Environment

  3. The Anatomy of a Kubernetes Cluster

  4. Introducing Kubernetes Pods 

  5. What are Kubernetes Deployments?

  6. Using Kubernetes Services, Part 1

  7. Using Kubernetes Services, Part 2

  8. Persistent Data and Storage with Kubernetes

  9. How to Use Kubernetes Secrets with Environment Variables and Volume Mounts

  10. How to Use StatefulSets and Create a Scalable MySQL Server on Kubernetes ← You are here

What is a StatefulSet?

The StatefulSet API resource is an abstraction for managing stateful applications on Kubernetes. It is roughly analogous to a Deployment, but tailored to stateful rather than stateless processes. 

In the last lesson, we skipped over the whys and wherefores of StatefulSets to focus on implementing Secrets. Today, we’ll take a look at some of the same YAML markup, but zero in on StatefulSets instead.

In a new project directory for this lesson (you can call the directory /statefulsets/manifests), create a YAML file called todo-mysql.yml and copy the manifests below. You can also find a copy in the GitHub repository for this lesson.

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
 name: sc-local
provisioner: k8s.io/minikube-hostpath
parameters:
 {}
reclaimPolicy: Delete
volumeBindingMode: Immediate
allowVolumeExpansion: false
 
---
 
apiVersion: v1
kind: Secret
metadata:
 name: mysqlpwd
data:
 password: b2N0b2JlcmZlc3Q=
 
---
 
apiVersion: v1
kind: Service
metadata:
 name: todo-mysql
 labels:
   app: todo-mysql
spec:
 type: ClusterIP
 selector:
   app: todo-mysql
 ports:
   - port: 3306
     protocol: TCP
 clusterIP: "None"
 
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
 name: todo-mysql
spec:
 selector:
   matchLabels:
     app: todo-mysql
 serviceName: todo-mysql
 replicas: 1
 template:
   metadata:
     labels:
       app: todo-mysql
   spec:
     terminationGracePeriodSeconds: 10
     containers:
     - name: todo-mysql
       image: mysql:8
       ports:
       - containerPort: 3306
         name: todo-mysql
       env:
       - name: MYSQL_ROOT_PASSWORD
         valueFrom:
           secretKeyRef:
             name: mysqlpwd
             key: password
       - name: MYSQL_DATABASE
         value: "todo_db"
       volumeMounts:
       - mountPath: /var/lib/mysql
         name: todo-volume
 volumeClaimTemplates:
 - metadata:
     name: todo-volume
   spec:
     storageClassName: sc-local
     accessModes: [ "ReadWriteOnce" ]
     resources:
       requests:
         storage: 1Gi

This mega-manifest includes all of the Kubernetes resources for our To Do app’s MySQL server. It differs from the similar YAML file we used previously only in that it includes a definition for the mysqlpwd Secret. The last section of the manifest defines our StatefulSet.
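As a reminder, the password value in the mysqlpwd Secret is base64-encoded. You can generate or verify such a value from the command line—note the -n flag, which matters because a trailing newline would change the encoding:

```shell
# Encode the password for use in a Secret manifest. The -n flag keeps
# echo from appending a newline, which would alter the encoded value.
echo -n 'octoberfest' | base64
# → b2N0b2JlcmZlc3Q=

# Decode a value copied out of a manifest to double-check it:
echo -n 'b2N0b2JlcmZlc3Q=' | base64 --decode
# → octoberfest
```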

The StatefulSet manifest should feel familiar—it looks a lot like a Deployment manifest! But instead of a volumes field under the template spec, we define volumeClaimTemplates under the overall StatefulSet spec to describe how the workload will consume storage. In this case, that means connecting to our sc-local StorageClass, specifying a ReadWriteOnce access mode, and requesting one gibibyte of storage.

Take note of the fact that we’re defining a template for PersistentVolumeClaims here, which Kubernetes will use to create a unique PVC for each Pod managed by the StatefulSet. That will be important later.
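For instance, once the StatefulSet creates its first Pod, the claim that Kubernetes generates from the template will look roughly like the sketch below. (You never write this manifest yourself—the StatefulSet controller creates it, naming it by combining the template name and the Pod name.)

```yaml
# Approximate shape of the auto-generated claim for Pod todo-mysql-0.
# The PVC name follows the pattern <template-name>-<pod-name>.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: todo-volume-todo-mysql-0
spec:
  storageClassName: sc-local
  accessModes: [ "ReadWriteOnce" ]
  resources:
    requests:
      storage: 1Gi
```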

Getting a StatefulSet up and running

Make sure Minikube is running, then from the directory where you’ve saved todo-mysql.yml, run:

% kubectl apply -f todo-mysql.yml
storageclass.storage.k8s.io/sc-local created
secret/mysqlpwd created
service/todo-mysql configured
statefulset.apps/todo-mysql created

As soon as the resources are created, you should be able to see them in Lens. You can check Pods under the Workloads menu to see if your automatically generated todo-mysql-0 Pod is running yet, or check with:

% kubectl get pods

Note the name of the Pod: todo-mysql-0. This is a key difference between a StatefulSet and a Deployment: 

  • The StatefulSet maintains a primary Pod at ordinal number 0, using a predictable name that applications can connect to reliably if needed. Additional replicas are created in ordinal sequence (todo-mysql-1, todo-mysql-2, and so on), each with a stable identity of its own. 

  • Deployments, by contrast, append randomized suffixes to the names of individual instances, and are not designed with persistent data in mind.
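Together with the headless Service we defined (clusterIP: "None"), those stable names give each Pod a predictable DNS entry of the form pod-name.service-name.namespace.svc.cluster.local. A quick sketch of the pattern, assuming the default namespace:

```shell
# Stable DNS name for a StatefulSet Pod behind a headless Service.
pod="todo-mysql-0"
svc="todo-mysql"   # the headless Service's name
ns="default"       # assuming the default namespace
echo "${pod}.${svc}.${ns}.svc.cluster.local"
```

From inside the cluster, other Pods can resolve that name to reach the ordinal-zero instance directly.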

Now we’ll use kubectl exec to ensure that the StatefulSet is working correctly. Since our pod is named using an ordinal, we can jump right into it without searching for its name:

% kubectl exec --stdin --tty todo-mysql-0 -- /bin/bash

Now that we’re inside the container, start MySQL:

bash-4.4# mysql -u root -p

Enter our root password: octoberfest

From here, we can access the database we created:

mysql> USE todo_db;

Now let’s create the database table that our To Do app will use:

mysql> CREATE TABLE IF NOT EXISTS Todo (task_id int NOT NULL AUTO_INCREMENT, task VARCHAR(255) NOT NULL, status VARCHAR(255), PRIMARY KEY (task_id));

Press return.

Query OK, 0 rows affected (0.04 sec)

We can check to confirm that our table has been created:

mysql> SHOW TABLES;
+-------------------+
| Tables_in_todo_db |
+-------------------+
| Todo              |
+-------------------+
1 row in set (0.00 sec)

Now let’s manually add an item to the table—in this case, a simple ‘hello’ message:

mysql> INSERT INTO Todo (task, status) VALUES ('Hello','ongoing');

Check the table for the new item:

mysql> SELECT * FROM Todo;
+---------+-------+---------+
| task_id | task  | status  |
+---------+-------+---------+
|       1 | Hello | ongoing |
+---------+-------+---------+
1 row in set (0.00 sec)

Exit MySQL and the container by typing exit twice.

Question time: if we create a replica, do you think our data will persist across replicas of the pod? Let’s see! We can add replicas imperatively using kubectl:

% kubectl scale statefulsets todo-mysql --replicas=2
statefulset.apps/todo-mysql scaled
% kubectl get pods
NAME           READY   STATUS    RESTARTS   AGE
todo-mysql-0   1/1     Running   0          11m
todo-mysql-1   1/1     Running   0          6s

Now let's hop into the new replica.

% kubectl exec --stdin --tty todo-mysql-1 -- /bin/bash
bash-4.4# mysql -u root -p
Enter password: octoberfest
mysql> USE todo_db;
Database changed
mysql> SHOW TABLES;
Empty set (0.00 sec)

If you guessed that the data wouldn’t persist across pods, well done. Go ahead and exit, then delete the running StatefulSet and Service.

mysql> exit
bash-4.4# exit
% kubectl delete statefulset todo-mysql
% kubectl delete service todo-mysql

Let’s talk about why the data isn’t persisting. The reason lies in the design of the StatefulSet. 

When we created a new replica, Kubernetes used the volumeClaimTemplate in the StatefulSet spec to create a new PersistentVolumeClaim and bind a newly apportioned chunk of storage. In practical terms, this is a new database instance with the same credentials and pre-set todo_db database as the first instance. 

“Replica” is a bit of a misnomer for this new pod. It’s a unique instance, with unique storage and a unique hostname. Moreover, the volume claim is durable, and the hostname is predictable on account of the appended ordinal number. If this Pod should crash, a new one will be generated with the same ordinal number, and that new instance will bind to the existing PVC, so that the pod has access to the same data. 

In short, the pod achieves stable identity in terms of both storage and networking.

This is all by design. The StatefulSet is meant to serve as a building block for statefulness on Kubernetes. By enabling us to manage predictably unique workloads, the StatefulSet gives us a key ingredient for scaling stateful applications, including databases. We shouldn’t expect to run a production-grade database with a StatefulSet alone, but it provides a building block and common abstraction for stateful applications.

Horizontally scalable databases

In the context of databases, “horizontal scaling” means expanding database capacity across multiple nodes or instances rather than “vertically” raising the ceiling on one big node. 

Distributed databases tend to use a couple of patterns—often in conjunction—to scale data stores horizontally: 

  • Replication: in which the database copies the entire data-set to multiple instances and coordinates updates among the replicas. The primary goals here are to reduce latency when a given database instance is subject to many simultaneous requests, and to foster resilience. This approach suits smaller volumes of data and can facilitate back-ups.

  • Sharding: in which the database divides the data-set between a number of database nodes—or “shards”—and coordinates queries between them. Sharding is a different approach to increasing speed and a way to more efficiently store large data-sets that may be costly or difficult to store on one node, let alone multiple replicas. This is the typical approach for large distributed data-sets.
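As a toy illustration of the sharding idea—not how any particular database implements it—a key can be routed to a shard by hashing it and taking the remainder:

```shell
# Toy hash-based shard routing: hash the key, then take the remainder
# modulo the number of shards. Real databases use far more sophisticated
# schemes (consistent hashing, range partitioning, and so on).
key="task_42"
shards=3
hash=$(printf '%s' "$key" | cksum | cut -d' ' -f1)   # CRC as a cheap hash
echo "key '$key' maps to shard $(( hash % shards ))"
```

The hard part isn’t the routing arithmetic—it’s coordinating queries, joins, and rebalancing across shards, which is exactly what we want automated.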

Many databases, including MySQL, support manual sharding—handcrafting the joins between different sections of the data-set—but that’s exactly as excruciating as it sounds. No matter how our cloud native database handles distributed data, we definitely want it to operate in an automated way.

Here are a few popular open source databases used widely with Kubernetes:

  • CockroachDB: A distributed SQL database designed specifically for Kubernetes, focusing on resilience, scalability, and ease of use within Kubernetes. Open source and available self-hosted or managed through creator Cockroach Labs, this is considered a “NewSQL” database in that it combines an old-school relational and SQL-compatible approach with new-school distributed and cloud native architecture.

  • Cassandra: While it predates Kubernetes (and therefore isn’t tailored to it specifically), Apache Cassandra is a distributed “NoSQL” (non-relational, or “not only SQL”) database system with a mature set of drivers, designed to pass data quickly between its own database nodes. Cassandra may be self-hosted or used as a service managed by a variety of vendors, including the major public cloud providers.

  • MongoDB: A NoSQL database that predates Kubernetes and is designed around JSON-like data objects. MongoDB can be self-hosted, managed by MongoDB Inc. through the MongoDB Atlas service, or run as a managed offering from other vendors.

In their Kubernetes implementations, all of these databases are built on StatefulSets, and all of them require some additional logic to work as intended. But as I said before, a StatefulSet isn’t sufficient to run a production-grade database—especially not one that needs to juggle concurrent requests across instances and ensure data fidelity. For those kinds of complex tasks, Kubernetes relies on two important avenues of extensibility: custom resources and operators.

Introducing custom resources and operators

The Kubernetes creed—declarative portability for All the Things!—dictates that we’d really like our extensions to work as standard, portable Kubernetes resources. You can define custom resources with Custom Resource Definitions (CRDs), which are simply YAML manifests for creating new Kubernetes resources. 

Suppose we want to replicate instances of the MySQL server to reduce latency. This is only possible for reads—if we try to do the same thing for writes, we introduce problems like write conflicts, wherein two transactions simultaneously attempt to change the same value. But we can accomplish our goal with the help of a CRD and an operator. 

Today, we’re going to use a CRD developed by Oracle, the owner and sponsor of MySQL. Ultimately, this will work together with an operator and use the StatefulSet to manage Pods with a primary-secondary model—enabling us to scale horizontally. You can take a look at the entire CRD manifest here if you wish, but we’ll just peek at a representative snippet:

apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
 name: innodbclusters.mysql.oracle.com
spec:
 group: mysql.oracle.com
 versions:
   - name: v2
     served: true
     storage: true
     schema:
       openAPIV3Schema:
         type: object
         required: ["spec"]
         properties:
           metadata:
             type: object
             properties:
               name:
                 type: string
                 maxLength: 40
           spec:
             type: object
             required: ["secretName"]
             properties:
               secretName:
                 type: string
                 description: "Name of a generic type Secret containing root/default account password"
              ...

After defining the metadata for this new resource called innodbclusters.mysql.oracle.com, the spec goes on to start defining the properties that future manifest-writers will use to define their own instances of this resource. I’ve only excerpted one property out of the many included in the full file: secretName, which accepts a string as its value. The definition here also includes a description that can be accessed via kubectl describe, Lens, or other Kubernetes API clients.

Let’s apply the CRD—not from our local machine this time, but straight from the web:

% kubectl apply -f https://raw.githubusercontent.com/mysql/mysql-operator/trunk/deploy/deploy-crds.yaml

Next, we’re going to deploy the MySQL Operator for Kubernetes, also developed by Oracle. An operator is an application that runs on the cluster and manages a custom resource—ultimately just an application pattern intended to automate some of the tasks of a human operator. This operator is built from a collection of Kubernetes resources, including a Deployment running the mysql-operator container image. You can find operators for other database systems at OperatorHub.io, which serves as a central catalog for Operators of all sorts.
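The heart of any operator is a reconcile loop: observe the desired state declared in a custom resource, compare it with the actual state of the cluster, and act to close the gap. A drastically simplified sketch of that idea—not the MySQL Operator’s actual code—might look like this:

```shell
# Toy reconcile loop: converge "actual" instances toward "desired".
# A real operator reads desired state from the custom resource via the
# Kubernetes API and creates or deletes resources to match it.
desired=3
actual=1
while [ "$actual" -lt "$desired" ]; do
  actual=$(( actual + 1 ))
  echo "creating instance $actual"
done
```

Real operators run this observe-compare-act cycle continuously, so the cluster self-heals whenever actual state drifts from the declared spec.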

Apply the operator manifest as well, and then check that it’s running:

% kubectl apply -f https://raw.githubusercontent.com/mysql/mysql-operator/trunk/deploy/deploy-operator.yaml
% kubectl get deployment mysql-operator --namespace mysql-operator
NAME             READY   UP-TO-DATE   AVAILABLE   AGE
mysql-operator   1/1     1            1           77s

With our CRD and operator in place, it’s time for another manifest to deploy our set of MySQL servers. Let’s see how much of a mega-manifest we’re working with this time:

apiVersion: v1
kind: Secret
metadata:
 name: mysqlpwd
data:
 rootHost: JQ==
 rootPassword: b2N0b2JlcmZlc3Q=
 rootUser: cm9vdA==
 
---
 
apiVersion: mysql.oracle.com/v2
kind: InnoDBCluster
metadata:
 name: todo-mysql
spec:
 secretName: mysqlpwd
 instances: 3
 router:
   instances: 1
 tlsUseSelfSigned: true

Hey, that’s downright manageable! Since our StorageClass is already in place, we’re defining two resources: 

  • Our Secret, which we’re revising slightly to the format expected by the custom resource. I have the credential fields base64-encoded here.

  • An instance of the custom resource InnoDBCluster, which abstracts away all of its constituent components. (We’ll take a peek at some of those in a moment.) 

Meanwhile, we only need to specify values for a handful of InnoDBCluster’s fields: the Secret name, how many database server instances we want, and how many routers we want. The router helps coordinate traffic among the three instances: a write goes to ordinal zero (todo-mysql-0), while a read can go to any ordinal. This is the primary-secondary model we were talking about earlier, and the StatefulSet’s unique, predictable hostnames make it possible.
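If you’re curious about the revised Secret’s values, you can decode them on the command line. (The rootHost value of % is MySQL’s wildcard, allowing the root user to connect from any host.)

```shell
# Decode the base64 values used in the revised mysqlpwd Secret:
echo -n 'cm9vdA==' | base64 --decode           # rootUser → root
echo
echo -n 'JQ==' | base64 --decode               # rootHost → % (any host)
echo
echo -n 'b2N0b2JlcmZlc3Q=' | base64 --decode   # rootPassword → octoberfest
```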

Copy the manifest above into a file called inno.yml in your /statefulsets/manifests project directory. (Alternatively, it’s available on GitHub.) Then apply the manifest:

% kubectl apply -f inno.yml

If you wish, you can watch the database server instances spin up from the command line:

% kubectl get innodbcluster --watch

It’ll take a moment, but all instances should reach online status before long:

NAME         STATUS   ONLINE   INSTANCES   ROUTERS   AGE
todo-mysql   ONLINE   2        3           1         48s
todo-mysql   ONLINE   2        3           1         53s
todo-mysql   ONLINE   3        3           1         58s

Once the instances are up, open Lens and take a look at your pods. You’ll see we have three database server replicas (managed by a StatefulSet) and one stateless database router.

Let’s have a look at a couple more components to get a sense for how this contraption is working. Under Configuration in the Lens menu, select ConfigMaps and select the entry called todo-mysql-initconf.

The ConfigMap API resource is an abstraction that gives us a way to separate configuration details from their subject container images. A Pod or Deployment or StatefulSet can then access the details from the ConfigMap through an environment variable, volume-mounted config file, or command line argument.
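A minimal, hypothetical example of the shape—not the contents of todo-mysql-initconf itself—is a ConfigMap carrying a MySQL config file that a Pod could mount as a volume:

```yaml
# Hypothetical ConfigMap holding a MySQL config fragment. A Pod could
# mount this as a file (e.g., at /etc/mysql/conf.d/replica.cnf) via a
# configMap volume, keeping configuration out of the container image.
apiVersion: v1
kind: ConfigMap
metadata:
  name: example-mysql-config
data:
  replica.cnf: |
    [mysqld]
    super_read_only = ON
```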

This is useful especially when configuration details might be re-used—or when they might differ according to the circumstances. When Pods in the same StatefulSet (or Deployment) initialize, they can use these configuration files conditionally, and this is another place where a StatefulSet’s fixed ordinals—which initialize one-by-one, in sequence—can come in really handy. As a todo-mysql Pod comes online, it can check to see where it stands in the ordinal line-up; if it’s first, it can use the configuration files associated with a writable primary instance. If it’s later in the sequence, it can configure itself as a read-only replica. 
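A sketch of how that ordinal check might look in an init script—the Kubernetes documentation’s replicated MySQL tutorial uses a similar trick; the hostname here is hard-coded for illustration, where a real init container would read it from hostname:

```shell
# Derive this Pod's ordinal from its hostname (StatefulSet hostnames
# end in -<ordinal>) and choose a configuration role accordingly.
hostname="todo-mysql-2"        # in a real Pod: hostname=$(hostname)
ordinal="${hostname##*-}"      # strip everything up to the last '-'
if [ "$ordinal" -eq 0 ]; then
  echo "configuring as writable primary"
else
  echo "configuring as read-only replica"
fi
```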

But does our horizontally-scaling set of database servers actually work as intended? Let’s see! Once again, we’re going to: 

  • Hop into the ordinal-zero pod

  • Log in to MySQL

  • Create a Todo table in the todo_db database

  • Add an item to the table.

 

Working with multi-container pods

Note that the opening kubectl exec command looks a little different than before. That’s because we’re jumping into a multi-container pod, so we need to specify which container we want to access with the -c argument. In Lesson 2, I mentioned that some applications break the one-container-per-pod ideal and use “sidecar” containers to manage some extra bit of functionality. Here’s one such pod in the wild.

 
% kubectl exec --stdin --tty -c mysql todo-mysql-0 -- /bin/bash
bash-4.4# mysql -u root -p
Enter password: octoberfest
 
mysql> CREATE DATABASE IF NOT EXISTS todo_db;
Query OK, 1 row affected (0.02 sec)
 
mysql> USE todo_db;
Database changed
 
mysql> CREATE TABLE IF NOT EXISTS Todo (task_id int NOT NULL AUTO_INCREMENT, task VARCHAR(255) NOT NULL, status VARCHAR(255), PRIMARY KEY (task_id));
Query OK, 0 rows affected (0.04 sec)
 
mysql> INSERT INTO Todo (task, status) VALUES ('Hello','ongoing');
Query OK, 1 row affected (0.01 sec)
 
mysql> SELECT * FROM Todo;
+---------+-------+---------+
| task_id | task  | status  |
+---------+-------+---------+
|       1 | Hello | ongoing |
+---------+-------+---------+
1 row in set (0.00 sec)

Exit MySQL and the container by typing exit twice. Moment of truth—let’s see if our write has carried over to a replica server.

% kubectl exec --stdin --tty -c mysql todo-mysql-1 -- /bin/bash
bash-4.4# mysql -u root -p
Enter password: octoberfest
 
mysql> USE todo_db;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A
 
Database changed
mysql> SELECT * FROM Todo;
+---------+-------+---------+
| task_id | task  | status  |
+---------+-------+---------+
|       1 | Hello | ongoing |
+---------+-------+---------+
1 row in set (0.00 sec)

Excellent! 

This kind of simple, wholesale replication is by no means a standard approach to scaling databases—but it’s one use-case for a StatefulSet, and it gives us some key insight into the resource’s uses and workings. The StatefulSet is a deceptively powerful resource with enormous flexibility: 

  • It can drive full data-store replication via sequential Pod initialization, as with the InnoDB CRD and MySQL Operator. 

  • It can be used to facilitate sophisticated architectures in which each Pod serves as a unique node in a database cluster, as in the case of CockroachDB.

Medium and large organizations—the ones for which Kubernetes is best-suited—will be using many different database systems for many different ends. The StatefulSet is a key API resource that makes each of those different approaches possible. 

That brings us to a close for today. Run exit twice and shut down your cluster with minikube stop. (We’ll use the running resources next time, and they should launch again when you restart Minikube.) Next time, we’ll use what we’ve learned here to finalize the decomposition of our monolithic To Do app into a scalable, cloud native architecture.