SwarmKit CSI: What, why, and how
The 23.0 release of Moby and Mirantis Container Runtime brings changes and enhancements to all areas of the product, but one of the biggest is the addition of Swarm Cluster Volumes. Swarm Cluster Volumes leverage the existing ecosystem of Container Storage Interface (CSI) plugins to provide cluster-aware persistent storage for your Swarm cluster.
The Missing Piece: Storage
It’s no secret that one of Swarm’s biggest gaps is its very primitive support for persistent storage. Persistent storage is just that: data that is written to disk and persists there. For the most part, Swarm simply relies on the existing container volumes functionality. Services are created referencing a path or named Volume on the host, and if that volume does not exist, it is created when the Task is started on the node. Some Volume Plugins allow for clever functionality like moving Volumes from node to node as needed, but ultimately, they’re limited by the very simplistic scope of the Volume Plugin API.
From the beginning, Swarm was designed to be as simple and painless to use as possible. Features like overlay networking, ingress routing, and mutual TLS authentication enabled by default are powerful but simple. Swarm is often compared to Kubernetes, but matching the breadth and flexibility of Kubernetes’ feature set is clearly not a worthwhile direction for Swarm.
Instead, Swarm focuses on providing simple but powerful features that fulfill the needs of the majority of users. For example, Swarm’s relatively simple scheduling modes, Replicated and Global, may not cover every use case that Kubernetes can handle, but they provide a much more straightforward model that most users can pick up quickly.
The challenge is to bring this same simplicity to persistent storage, one of the hardest areas of cluster orchestration. The goal of cluster orchestration is to have numerous interchangeable commodity nodes running numerous interchangeable commodity workloads. Persistent storage is at odds with this goal, because by its nature it is neither interchangeable nor a commodity. A database wouldn’t be very useful if it started with empty tables every time its workload was rescheduled!
Enter the Container Storage Interface
The Container Storage Interface is one of many attempts at standardization in the container orchestration space. It seeks to provide one common API that lets any storage plugin work with any container orchestrator. Support for CSI plugins came first, and most fully, to Kubernetes, and then to Apache Mesos, proving that the concept works on both sides of the interface.
The API exposed by CSI offers a compelling feature set. It is designed for cluster environments, and lets a plugin communicate details such as which nodes can access a storage volume, how workloads can concurrently access it, and more. Some storage plugins might work by providing network-attached storage devices; others may copy the data onto the target nodes. In theory, a storage plugin could even work by physically changing tapes in a tape library. Supporting this breadth of possible use cases, each with its own strengths and restrictions, is not an easy task.
Fortunately for Swarm, CSI’s big draw is that the complexity of handling these storage problems is pushed cleanly onto the plugin author. A plugin only needs to report how and where a storage volume can be used, and Swarm can schedule its workloads accordingly. The strength of the Container Storage Interface, and its adoption by storage vendors, provides the key component needed to bring cluster-aware storage to Swarm while staying true to the Swarm philosophy of simplicity.
Swarm Cluster Volumes
The Container Storage Interface plugins are ultimately an implementation detail. They are a means to an end, the engine that drives persistent storage in Swarm, but not the car itself. Instead, Swarm presents the end-user with a purpose-built interface that exposes the necessary details of a storage volume, while hiding the gritty mechanics.
Volumes are a new Swarm primitive, like Secrets and Networks, used to manage cluster volumes. Rather than being created on a node on demand, a cluster Volume, like a Secret or Network, is created by the user ahead of time. As with Networks, this is done transparently through the existing Moby Volumes API, with a new set of flags and options for the knobs and switches users may need in a cluster. These options can specify which nodes the volume should be accessible on, what access mode it needs, and other more granular settings.
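The following Python sketch gives a rough illustration of these options. It models a cluster volume that is declared before any workload uses it. The volume records which nodes it may be used on and how tasks may share it. The class, its fields, and the four sharing modes are hypothetical names chosen for this example. They are not SwarmKit’s actual data model or Docker’s CLI.

```python
from dataclasses import dataclass, field
from enum import Enum


class Sharing(Enum):
    """Hypothetical sharing modes; names are illustrative only."""
    NONE = "none"            # one task at a time
    READONLY = "readonly"    # any number of tasks, all read-only
    ONEWRITER = "onewriter"  # any number of readers, at most one writer
    ALL = "all"              # any number of readers and writers


@dataclass
class ClusterVolume:
    """A volume declared ahead of time, before any task uses it."""
    name: str
    accessible_nodes: set[str]  # nodes the volume may be used on
    sharing: Sharing            # how tasks may share it concurrently
    # One entry per task currently using the volume: True means read-only.
    active_mounts: list[bool] = field(default_factory=list)

    def usable_on(self, node: str) -> bool:
        return node in self.accessible_nodes

    def can_mount(self, read_only: bool) -> bool:
        """Would one more mount in this mode respect the sharing mode?"""
        if self.sharing is Sharing.NONE:
            return not self.active_mounts
        if self.sharing is Sharing.READONLY:
            return read_only
        if self.sharing is Sharing.ONEWRITER:
            has_writer = any(not ro for ro in self.active_mounts)
            return read_only or not has_writer
        return True  # Sharing.ALL


if __name__ == "__main__":
    db = ClusterVolume("db-data", {"node-1", "node-2"}, Sharing.ONEWRITER)
    print(db.can_mount(read_only=False))  # True: no writer yet
    db.active_mounts.append(False)        # a writer task now holds it
    print(db.can_mount(read_only=False))  # False: second writer refused
    print(db.can_mount(read_only=True))   # True: readers still allowed
```

These rules live on the volume itself rather than on each service. As a result, every new mount request can be checked against one place.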
Similarly, using cluster volumes with a service builds on existing APIs. Rather than specifying, say, bind or volume for a storage mount, the user can specify cluster as a mount type to mount a cluster volume. Swarm is aware of the volume’s details, and treats the volume’s use restrictions the same as it would a user-defined placement constraint, scheduling workloads only on those nodes that support their desired mounts.
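Swarm treats a volume’s restrictions like a placement constraint. That idea can be sketched in Python under a simplifying assumption: each node carries topology labels, and each volume reports a list of accessible topology segments. CSI does describe volume accessibility in terms of topology, but the function and parameter names here are hypothetical. In this sketch, a node qualifies for a task only if it satisfies the user’s own constraint and can reach every cluster volume the task mounts.

```python
from collections.abc import Callable

Labels = dict[str, str]  # topology labels, e.g. {"zone": "us-east-1a"}


def matches(labels: Labels, segment: Labels) -> bool:
    """A node matches a topology segment if it has every key/value in it."""
    return all(labels.get(key) == value for key, value in segment.items())


def eligible_nodes(
    nodes: dict[str, Labels],
    mount_topologies: list[list[Labels]],
    constraint: Callable[[Labels], bool] = lambda labels: True,
) -> set[str]:
    """Return the nodes a task may be placed on.

    Each cluster mount contributes one extra constraint: the node must match
    at least one of that volume's accessible topology segments. A node is
    eligible only if it passes the user's own constraint and every mount.
    """
    return {
        name
        for name, labels in nodes.items()
        if constraint(labels)
        and all(
            any(matches(labels, segment) for segment in topology)
            for topology in mount_topologies
        )
    }


if __name__ == "__main__":
    nodes = {
        "n1": {"zone": "a"},
        "n2": {"zone": "b"},
        "n3": {"zone": "a", "disk": "ssd"},
    }
    # One volume reachable only from zone "a".
    print(sorted(eligible_nodes(nodes, [[{"zone": "a"}]])))  # ['n1', 'n3']
```

The volume’s topology is just one more predicate in the same filter as the user’s constraint. A service with no cluster mounts therefore behaves exactly as it did before.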
For example, a volume may be usable on any one of several nodes and by any number of workloads at the same time, but only on a single node at once. Swarm schedules the first workload that uses such a volume to any of its acceptable nodes, and then knows to schedule subsequent workloads to that same node. This kind of intelligence is what makes cluster volumes “cluster aware” in a way that Swarm’s older Volume support could not be.
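A small, hypothetical Python model can illustrate the single-node case. The `preferred` argument is an assumption: it stands in for the rest of Swarm’s scheduling decision. The sketch only shows how the first placement pins the volume and narrows the candidates for every later task.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SingleNodeVolume:
    """Usable by many tasks at once, but only on one node at a time."""
    accessible_nodes: set[str]
    published_on: Optional[str] = None  # node the volume is in use on, if any
    tasks: list[str] = field(default_factory=list)

    def candidate_nodes(self) -> set[str]:
        # Before first use, any accessible node is acceptable. Afterwards the
        # volume is pinned, so its current node is the only candidate.
        if self.published_on is None:
            return set(self.accessible_nodes)
        return {self.published_on}

    def place(self, task: str, preferred: Optional[str] = None) -> str:
        """Choose a node for `task` and record the volume as in use there.

        `preferred` stands in for whatever node the rest of the scheduler
        would pick; it is honored only if the volume allows it.
        """
        candidates = self.candidate_nodes()
        if not candidates:
            raise ValueError("volume is not accessible from any node")
        node = preferred if preferred in candidates else min(candidates)
        self.published_on = node
        self.tasks.append(task)
        return node


if __name__ == "__main__":
    vol = SingleNodeVolume({"n1", "n2", "n3"})
    print(vol.place("web.1", preferred="n2"))  # n2: first task, free choice
    print(vol.place("web.2", preferred="n3"))  # n2: pinned to the first node
```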
The State of Cluster Volumes
Cluster volume support is available in Swarm in the Moby and Mirantis Container Runtime 23.0 release, but it should be stressed that this support is currently experimental. Early adopters are strongly encouraged to try it out and see how it works with their workloads, but this initial release almost certainly has bugs that make it unsuitable for production deployments. Additionally, CSI plugins are often deployed through Kubernetes-specific tooling such as Helm charts. While any CSI plugin’s executable should be compatible with Swarm, its packaging and deployment are usually built for Kubernetes. To use an automotive analogy, the engine fits, but it needs a transmission adapter kit.
Further, the initial release of cluster volumes contains only a minimal set of the features possible with CSI plugins. The core functionality of volume lifecycle and usage is implemented, but features like volume resizing and snapshots are not currently supported. Implementing even these core features was a huge undertaking, but nothing built so far precludes adding these more advanced features later.
This release is the culmination of loads of work, and one of the biggest new features in Swarm since its initial release. We hope you’ll try cluster volumes in a testing environment and find that the same simplicity you’ve come to love about Swarm now has a whole new dimension.