How to Use Cluster API to Programmatically Configure and Deploy Kubernetes Clusters

Eric Gregory - November 10, 2022

Kubernetes provides a portable, resilient, and resource-aware substrate for code that lives on the cloud. As Kubernetes has grown more popular, cloud infrastructure patterns have grown more complex: organizations may run hundreds or thousands of Kubernetes clusters at edge sites, and workloads may be orchestrated across multiple cloud providers or on-prem datacenters. 

That complexity calls for a unified interface through which clusters may be provisioned and managed, programmatically and at scale. The Kubernetes community created just such a tool in Cluster API.  

In this tutorial, we will…

  • Explain the fundamentals of Cluster API—what it is, how it works, and the core concepts that govern its design

  • Show you how to set up a local management cluster that will enable you to deploy new clusters to a cloud provider such as AWS

  • Walk you through the process of provisioning a workload cluster on AWS, and explain how to interact with that cluster via Lens

This primer will serve as a foundation for further tutorials on Cluster API, in which we will show you how to control clusters across multiple environments and create a management interface that responds programmatically to your needs.

Ready? Let’s get started!

What is the Cluster API?

Cluster API is a tool for programmatically configuring and deploying Kubernetes clusters on a variety of different infrastructures. It is open source and maintained as a sub-project of Kubernetes; as of this writing it is in version 1.2.4 and is regarded as production-ready. 

For example, Cluster API is an upstream component of Mirantis Container Cloud, which uses the API to deliver a powerful infrastructure control plane managed through a single-pane GUI, while integrating management for components like Ceph storage.

What is the purpose of Cluster API?

Cluster API begins from a simple premise: 

What if we could apply the “Kubernetes way of doing things” (that is, declarative configuration and deployment via API) to the provisioning and management of clusters themselves? What if spinning up a cluster to your specifications was functionally no different from deploying a Kubernetes resource like a Pod?

The answer to that “What if?” is that clusters would be much easier to configure and deploy programmatically and at scale—the same as any other Kubernetes resource. And that’s very useful for operators who want to manage lifecycles for hundreds or even thousands of clusters: whether those are edge clusters at retail locations, individual developer clusters, or clusters dedicated to sensitive, isolated workloads.

How does Cluster API work?

Cluster API not only gives us a way to manage Kubernetes clusters; it also runs on Kubernetes. Ultimately, Cluster API is simply a set of components that we install on a cluster and interact with in Kube-like style, through Kube-like tools. 

This raises a sort of chicken-and-egg problem: as operators, we will need to provide a first cluster from which all of our other clusters will be initiated. This initial cluster may be temporary (a bootstrap cluster that will be discarded later), or it may serve an important and ongoing role as a management cluster.

In the conventions of Cluster API, the management cluster is one of two essential cluster types:

  • Management clusters: These clusters are responsible for the creation and oversight of other clusters through Cluster API. They are your agents for infrastructure management, and in that respect are a bit of an oddity—they’re probably not running application workloads like a normal Kubernetes cluster, but instead focusing entirely on provisioning, monitoring, and managing other clusters. 

  • Workload clusters: These are the clusters that will actually handle application workloads for your users—the business-as-usual clusters that do exactly what you would expect a Kubernetes cluster to do, running microservices and handling requests. 

This framework should sound familiar: it is reminiscent of the control plane and worker nodes within a single Kubernetes cluster.

In any case, to get started with Cluster API, we need only furnish a few prerequisites:

  • An initial cluster: This is the starting cluster we will need to create the rest, and it can live in a lot of different places—on any number of clouds or on our local machine.

  • kubectl: You’ll need the kubectl CLI installed on your workstation and configured to access your initial cluster.

  • A provider: “Provider” is Cluster API’s abstraction for the infrastructure on which your newly created clusters will run. That might mean the big public cloud providers such as AWS, Azure, and Google Cloud, but it could also refer to more specialized providers such as Equinix Metal or DigitalOcean, or a host infrastructure such as OpenStack. 

With these pieces in place, we will be able to configure and create clusters at scale entirely programmatically.  

Indeed, if you were so inclined, you could use the clusters you provision with Cluster API to run Cluster API components and provision yet another cluster. You can see, then, why the Cluster API logo is three turtles with Kubernetes logos on their shells: it’s Kubernetes turtles all the way down.

Deploying a cluster with Cluster API

In this walkthrough, we will use a local development cluster as our management cluster, and we will deploy a workload cluster to AWS.

You can launch a local development cluster with Lens Desktop Kubernetes (a feature of Lens Pro), the Lens for Docker Desktop extension, Minikube, or another dev cluster implementation of your choice. Go ahead and start your local cluster now.

Next we’ll install the clusterctl command line tool. Download the binary from GitHub (making sure to grab the correct binary for your system):

% curl -L -o clusterctl <URL of the clusterctl binary for your platform>
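Release binaries for clusterctl are published per operating system and CPU architecture, with asset names such as clusterctl-linux-amd64 or clusterctl-darwin-arm64 (clusterawsadm releases follow the same pattern). If you’re not sure which one matches your workstation, a small helper like the hypothetical platform_suffix below maps the output of uname onto that naming scheme. It’s a sketch that only covers Linux and macOS on x86-64 and ARM64; check the release page for the full list of assets.

```shell
# Hypothetical helper: print the "<os>-<arch>" suffix used in release asset
# names such as clusterctl-linux-amd64. Only Linux/macOS on x86-64 and ARM64
# are handled. Arguments are optional and default to the current machine.
platform_suffix() {
  ps_os=${1:-$(uname -s)}
  ps_arch=${2:-$(uname -m)}

  case "$ps_os" in
    Linux)  ps_os=linux ;;
    Darwin) ps_os=darwin ;;
    *) echo "unsupported OS: $ps_os" >&2; return 1 ;;
  esac

  case "$ps_arch" in
    x86_64|amd64)  ps_arch=amd64 ;;
    aarch64|arm64) ps_arch=arm64 ;;
    *) echo "unsupported architecture: $ps_arch" >&2; return 1 ;;
  esac

  echo "$ps_os-$ps_arch"
}
```

On an Apple silicon Mac, for example, running platform_suffix with no arguments should print darwin-arm64, which points you at the clusterctl-darwin-arm64 asset.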

From the same directory, use chmod to modify permissions so the binary is executable.

% chmod +x ./clusterctl

Put the clusterctl binary in your PATH.

% sudo mv ./clusterctl /usr/local/bin/clusterctl

The clusterctl CLI tool should be installed now. In a moment, we’re going to use it to initialize our local Kubernetes cluster as a management cluster. But before we do, we need to do some configuration for the provider to which we intend to deploy workload clusters—in this case, AWS.

Initializing the management cluster for AWS

At this point we could configure multiple providers, and clusterctl would let us manage all of them, so that we could create and manage hybrid or multi-cloud architectures from a single interface. But for this primer, we’ll keep things simple and stick to a single provider.

Now, in order to do our AWS configuration, we’re going to download another CLI tool called clusterawsadm that will help us generate a CloudFormation stack with appropriate IAM resources. (CloudFormation is an AWS tool for infrastructure-as-code automation.) We’ll install this tool exactly the same way we did with clusterctl—again, verifying that you have the right binary for your system:

% curl -L -o clusterawsadm <URL of the clusterawsadm binary for your platform>
% chmod +x clusterawsadm
% sudo mv clusterawsadm /usr/local/bin 
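Before going further, it can help to confirm that both binaries (and kubectl) actually ended up on your PATH. The sketch below relies on the POSIX command -v builtin; the function name require_tools is hypothetical.

```shell
# Hypothetical helper: report any command that is not on PATH.
# Returns 0 only if every named tool was found.
require_tools() {
  rt_missing=0
  for rt_tool in "$@"; do
    if ! command -v "$rt_tool" >/dev/null 2>&1; then
      echo "not found on PATH: $rt_tool" >&2
      rt_missing=1
    fi
  done
  return "$rt_missing"
}
```

For example, require_tools kubectl clusterctl clusterawsadm should return silently with status 0 when all three are installed, after which clusterctl version will print the version you downloaded.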

Note: The rest of this walkthrough uses AWS credentials and, once a workload cluster is deployed, will incur some costs if you follow along—proceed advisedly! Make sure to use the credentials for an IAM user with policies that make sense for you.

The clusterawsadm tool draws on a set of environment variables in order to run its configuration. Now we’ll use the export command to define those environment variables:

% export AWS_REGION=us-east-1
% export AWS_ACCESS_KEY_ID=<Your access key>
% export AWS_SECRET_ACCESS_KEY=<Your secret access key>
% export AWS_SESSION_TOKEN=<Session token>

Note: the session token is only necessary if you’re using temporary credentials, such as those issued when you authenticate with multi-factor authentication.
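Because clusterawsadm picks these values up from the environment, a typo in a variable name is easy to miss. A quick check like the hypothetical require_env below reports any variable that is unset or empty. It’s a sketch that uses printenv, so it only sees exported variables (which is what the export commands above produce); AWS_SESSION_TOKEN is left out of the usage example because it isn’t always needed.

```shell
# Hypothetical helper: report exported variables that are unset or empty.
# printenv only sees exported variables, matching the `export` lines above.
require_env() {
  re_missing=0
  for re_name in "$@"; do
    if [ -z "$(printenv "$re_name")" ]; then
      echo "not set: $re_name" >&2
      re_missing=1
    fi
  done
  return "$re_missing"
}
```

Usage: require_env AWS_REGION AWS_ACCESS_KEY_ID AWS_SECRET_ACCESS_KEY prints a line for each missing variable and returns a non-zero status if any are missing.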

With those environment variables defined, we can use clusterawsadm to generate our CloudFormation stack:

% clusterawsadm bootstrap iam create-cloudformation-stack

The system will return…

Attempting to create AWS CloudFormation stack cluster-api-provider-aws-sigs-k8s-io

…and it might take a moment. When it’s done, you’ll get a completion notification for various resources that looks like this:

AWS::IAM::Role            |                                            |CREATE_COMPLETE
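If you’d rather not scan that list by eye, you could filter it. The sketch below assumes the pipe-delimited layout shown above (resource type in the first column, status in the last) and prints any resource whose status is not CREATE_COMPLETE. stack_failures is a hypothetical name, and clusterawsadm’s exact output format may differ between versions.

```shell
# Hypothetical helper: read clusterawsadm's resource table on stdin and print
# every AWS resource whose last column is not CREATE_COMPLETE.
# Exits non-zero if any such resource is found.
stack_failures() {
  awk -F'|' '
    $1 ~ /AWS::/ {
      resource = $1
      status = $NF
      gsub(/^[ \t]+|[ \t]+$/, "", resource)   # trim padding around the type
      gsub(/[ \t]/, "", status)               # trim padding around the status
      if (status != "CREATE_COMPLETE") {
        print resource ": " status
        failed = 1
      }
    }
    END { exit failed }
  '
}
```

For example, you could save the command’s output with tee and pipe it through stack_failures; an empty result and a zero exit status mean every listed resource was created.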

Now we’ll create another environment variable, this one holding our AWS credentials encoded as a base64 profile for the AWS provider to use.

% export AWS_B64ENCODED_CREDENTIALS=$(clusterawsadm bootstrap credentials encode-as-profile)

Finally, we need three more environment variables that specify details about our new workload clusters.
For the SSH key, you’ll need an existing EC2 key pair named default in the region set by AWS_REGION; alternatively, change the value here to the name of a key pair you already have. If you need to create an SSH key pair, you can do that on the Key pairs page found under Network & Security in the EC2 menu.

% export AWS_SSH_KEY_NAME=default
% export AWS_CONTROL_PLANE_MACHINE_TYPE=t3.large
% export AWS_NODE_MACHINE_TYPE=t3.large

That’s it for our AWS-specific configuration. Now we’re ready to initialize our local cluster as a management cluster, with the resources it needs to deploy clusters to AWS.

% clusterctl init --infrastructure aws

Your output should look something like this:

Fetching providers
Installing cert-manager Version="v1.9.1"
Waiting for cert-manager to be available...
Installing Provider="cluster-api" Version="v1.2.4" TargetNamespace="capi-system"
Installing Provider="bootstrap-kubeadm" Version="v1.2.4" TargetNamespace="capi-kubeadm-bootstrap-system"
Installing Provider="control-plane-kubeadm" Version="v1.2.4" TargetNamespace="capi-kubeadm-control-plane-system"
Installing Provider="infrastructure-aws" Version="v1.5.0" TargetNamespace="capa-system"
Your management cluster has been initialized successfully!

Before we move on, let’s take a moment to look at what we’ve done. If we take a look at our namespaces, we’ll see that we have several new namespaces in our management cluster dedicated to Cluster API tooling:

% kubectl get ns
NAME                                STATUS   AGE
capa-system                         Active   43h
capi-kubeadm-bootstrap-system       Active   43h
capi-kubeadm-control-plane-system   Active   43h
capi-system                         Active   43h
cert-manager                        Active   43h
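Each provider’s controller runs as a Deployment in one of these namespaces, and they can take a little while to start. If you’re scripting this setup, you might block until they report Available before creating clusters. The sketch below loops over the four namespaces that clusterctl init reported installing providers into (capi-system, capi-kubeadm-bootstrap-system, capi-kubeadm-control-plane-system, and capa-system) and calls kubectl wait on each; the function name and the five-minute timeout are arbitrary choices.

```shell
# Namespaces taken from the `clusterctl init --infrastructure aws` output.
CAPI_NAMESPACES="capi-system capi-kubeadm-bootstrap-system capi-kubeadm-control-plane-system capa-system"

# Hypothetical helper: wait until every Deployment in each Cluster API
# namespace reports the Available condition, or fail after the timeout.
wait_for_providers() {
  for wp_ns in $CAPI_NAMESPACES; do
    echo "waiting for deployments in $wp_ns..."
    kubectl wait --for=condition=Available deployment --all \
      --namespace "$wp_ns" --timeout=5m || return 1
  done
}
```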

We’ve also got quite a few new Custom Resource Definitions (CRDs). Here are a select handful:

% kubectl get crds
NAME                                                             CREATED AT
…

These CRDs are really the core mechanism of Cluster API, enabling the management cluster to handle, say, clusters and machines as Kubernetes resources.
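The full kubectl get crds list also includes cert-manager’s CRDs and anything else already installed on your cluster. Cluster API’s own API groups all end in cluster.x-k8s.io (for example, clusters.cluster.x-k8s.io or awsclusters.infrastructure.cluster.x-k8s.io), so a filter like the hypothetical capi_crds helper below can narrow the list down.

```shell
# Hypothetical helper: read `kubectl get crds -o name` output on stdin and
# print only CRDs in Cluster API groups (names ending in cluster.x-k8s.io),
# without the "customresourcedefinition.apiextensions.k8s.io/" prefix.
capi_crds() {
  sed 's|^.*/||' | grep -E '\.cluster\.x-k8s\.io$'
}
```

Usage: kubectl get crds -o name | capi_crds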

Creating and managing a workload cluster

Now we’re ready to generate a workload cluster! We’ll use clusterctl’s generate cluster command to create a YAML manifest in our working directory—which we can use, in turn, to deploy our cluster.

% clusterctl generate cluster test-cluster --infrastructure aws --kubernetes-version v1.25.0 --control-plane-machine-count=3 --worker-machine-count=3 > test-cluster.yaml

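Note that we’re provisioning three control plane machines for a reason: three is the minimum for a highly available control plane, because the etcd members that run on those machines can still keep quorum if one of them fails.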
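The availability argument for three machines comes from etcd, which kubeadm-based control planes run on each control plane node by default. etcd needs a majority of its members, floor(n/2) + 1, to keep accepting writes, so n members tolerate n - (floor(n/2) + 1) failures. A short shell function makes the arithmetic concrete:

```shell
# How many etcd members can fail while the rest still form a quorum?
# quorum = floor(n/2) + 1 (shell integer division floors); tolerated = n - quorum.
etcd_fault_tolerance() {
  eft_n=$1
  eft_quorum=$(( eft_n / 2 + 1 ))
  echo $(( eft_n - eft_quorum ))
}

for n in 1 2 3 4 5; do
  echo "$n control plane node(s) -> tolerates $(etcd_fault_tolerance "$n") failure(s)"
done
```

Two or four nodes buy no extra tolerance over one or three, which is why odd counts starting at three are the common choice.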

Let’s take a look at the YAML manifest we generated. Open the file with the code or text editor of your choice. You’ll see multiple manifests in this one file, defining several different kinds of resources:

  • Cluster

  • AWSCluster

  • KubeadmControlPlane

  • AWSMachineTemplate

  • MachineDeployment

  • KubeadmConfigTemplate

Here are several of those new Kubernetes resource types that we added through CRDs. Look through the manifests and observe how many of the details we’ve specified so far are rendered in the YAML. I’ll zoom in on just one of those manifests here:

kind: AWSCluster
metadata:
  name: test-cluster
  namespace: default
spec:
  region: us-east-1
  sshKeyName: default

Here we have an abstraction for an AWS-hosted cluster. It includes details like region and sshKeyName that we specified earlier through environment variables. And it’s linked by name (test-cluster) to our more general Cluster object, whose infrastructureRef field points to this AWSCluster.
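To see at a glance how the objects in test-cluster.yaml refer to one another by name, you could list each document’s kind and metadata.name. This awk sketch assumes the layout clusterctl writes (a top-level kind: line followed by a two-space-indented name: under metadata:); it is not a YAML parser, and list_resources is a hypothetical name.

```shell
# Hypothetical helper: print "Kind/name" for each document in a multi-document
# YAML manifest. Assumes top-level "kind:" lines and a two-space-indented
# "name:" under metadata; nested (more deeply indented) names are ignored.
list_resources() {
  awk '
    /^kind:/                 { kind = $2 }
    /^  name:/ && kind != "" { print kind "/" $2; kind = "" }
  ' "$1"
}
```

Usage: list_resources test-cluster.yaml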

Let’s provision our workload cluster. From the directory where test-cluster.yaml is stored:

% kubectl apply -f test-cluster.yaml

Your output should report seven resources from the manifest, each marked as created.

Now we can use kubectl to manage those custom resource types. We can run…

% kubectl get clusters
NAME           PHASE         AGE  
test-cluster   Provisioned   30s

…and this gives us a view of all our clusters managed by Cluster API. We can get more granular and view all of our machines:

% kubectl get machines
NAME                               CLUSTER          
test-cluster-control-plane-9cz2s   test-cluster  

Or perhaps we only wish to see our workload clusters on AWS:

% kubectl get awsclusters
NAME           CLUSTER        READY
test-cluster   test-cluster   true
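In a script, you may want to wait for the Cluster to reach a given phase rather than re-running kubectl get clusters by hand. Below is a minimal polling sketch that reads the value shown in the PHASE column from .status.phase with a JSONPath query; the function name, the POLL_INTERVAL variable, and the default of 60 attempts are arbitrary choices.

```shell
# Hypothetical helper: poll a Cluster API Cluster until .status.phase matches.
# Usage: wait_for_phase <cluster-name> <phase> [attempts]
POLL_INTERVAL=${POLL_INTERVAL:-10}   # seconds between checks

wait_for_phase() {
  wfp_name=$1
  wfp_want=$2
  wfp_attempts=${3:-60}
  wfp_i=0
  while [ "$wfp_i" -lt "$wfp_attempts" ]; do
    # The PHASE column of `kubectl get clusters` comes from .status.phase.
    wfp_phase=$(kubectl get cluster "$wfp_name" -o jsonpath='{.status.phase}') || wfp_phase=""
    echo "cluster/$wfp_name phase: ${wfp_phase:-unknown}"
    if [ "$wfp_phase" = "$wfp_want" ]; then
      return 0
    fi
    wfp_i=$((wfp_i + 1))
    sleep "$POLL_INTERVAL"
  done
  echo "gave up waiting for cluster/$wfp_name to reach $wfp_want" >&2
  return 1
}
```

Usage: wait_for_phase test-cluster Provisioned (with the defaults, it gives up after roughly ten minutes).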

The clusterctl CLI tool can help us gain even more insight. Run:

% clusterctl describe cluster test-cluster

The output should look like this:

NAME                                     READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/test-cluster                     True                     28s
├─ClusterInfrastructure                  True                     5m
├─ControlPlane                           True                     28s
│ └─3 Machines...                        True                     2m
└─Workers
  └─MachineDeployment/test-cluster-md-0  False  Warning           10m
    └─3 Machines...                      True                     2m11s

(I’ve omitted some detail for legibility here—the real output will give you names for all your nodes and more informative warnings for unready machines.)

Your results may not show the control plane as ready, but give it a couple of minutes and it should get there. The worker MachineDeployment will not become ready, however, because our new cluster is missing a final ingredient: a Container Network Interface (CNI) plugin to handle cluster networking. 

Once a control plane node is ready, we can communicate with it, and our first order of business is to grab the kubeconfig for the workload cluster. The clusterctl CLI makes this easy:

% clusterctl get kubeconfig test-cluster > test-cluster.kubeconfig

The command above writes the workload cluster’s kubeconfig to a file called test-cluster.kubeconfig in our current working directory. Now we can use that kubeconfig to manage our new workload cluster, and the first thing we will do is add the open source Calico CNI plugin, which provides network connectivity between workloads:

% kubectl --kubeconfig=./test-cluster.kubeconfig \
  apply -f <URL of the Calico manifest>

Here, we’re telling kubectl to use the kubeconfig in our working directory, and to apply the Calico manifest (which we’re grabbing from GitHub) to that cluster: our new workload cluster.

If you check out the Calico YAML, or simply skim the console output, you’ll see that we’re adding a pile of Custom Resource Definitions, a Controller, and several other resources:

poddisruptionbudget.policy/calico-kube-controllers created
serviceaccount/calico-kube-controllers created
serviceaccount/calico-node created
configmap/calico-config created
…
daemonset.apps/calico-node created
deployment.apps/calico-kube-controllers created

Now that we’ve added Calico, the worker nodes in our workload cluster should quickly become ready:

% clusterctl describe cluster test-cluster

The output for a fully ready cluster:

NAME                                     READY  SEVERITY  REASON  SINCE  MESSAGE
Cluster/test-cluster                     True                     4m48s
├─ClusterInfrastructure                  True                     9m20s
├─ControlPlane                           True                     4m48s
│ └─3 Machines...                        True                     6m20s
└─Workers
  └─MachineDeployment/test-cluster-md-0  True                     78s
    └─3 Machines...                      True                     6m31s

At this point, we’re fully operational, and we can use the workload cluster’s kubeconfig to do whatever we need to do on the cluster. 
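One quick way to confirm this from the command line is to list the workload cluster’s nodes and flag any that aren’t Ready. The sketch below parses kubectl get nodes --no-headers output, whose second column is the node status; a cordoned node shows as Ready,SchedulingDisabled and still counts as Ready here. not_ready_nodes is a hypothetical name.

```shell
# Hypothetical helper: read `kubectl get nodes --no-headers` output on stdin,
# print the name and status of every node that is not Ready, and exit
# non-zero if there are any.
not_ready_nodes() {
  awk '
    NF > 0 && $2 != "Ready" && $2 !~ /^Ready,/ { print $1 " " $2; bad = 1 }
    END { exit bad }
  '
}
```

Usage: kubectl --kubeconfig=./test-cluster.kubeconfig get nodes --no-headers | not_ready_nodes && echo "all nodes Ready"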

For ongoing interaction with the cluster, we could continue to specify the kubeconfig with kubectl, or merge it with our local kubeconfig and switch contexts, but one of the easier ways to hop between cluster contexts is using Lens.

After downloading and starting Lens, simply click on the plus sign in the lower-right corner and select Sync kubeconfig.

Now you can easily switch between clusters and manage your workload clusters, install charts with a few clicks via Lens’ Helm interface, or just as easily set up a Prometheus monitoring stack.

In Lens, all of our nodes are organized in one place for easy monitoring and management.
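If you’d rather stay on the command line, the other option mentioned above is to merge the workload cluster’s kubeconfig with your local one and switch contexts. kubectl merges every file listed in the KUBECONFIG variable (colon-separated on Linux and macOS), and kubectl config view --flatten writes the merged result as a single self-contained file. The function name merge_kubeconfigs is hypothetical.

```shell
# Hypothetical helper: write one flattened kubeconfig that contains the
# clusters, users, and contexts from every input file.
# Usage: merge_kubeconfigs <output-file> <kubeconfig>...
merge_kubeconfigs() {
  mk_out=$1
  shift
  # Join the remaining arguments with colons, the KUBECONFIG list separator.
  mk_joined=$(IFS=:; echo "$*")
  (
    KUBECONFIG=$mk_joined
    export KUBECONFIG
    kubectl config view --flatten
  ) > "$mk_out"
}
```

For example, merge_kubeconfigs merged.kubeconfig ~/.kube/config ./test-cluster.kubeconfig writes a combined file. Writing to a new file, rather than redirecting straight into ~/.kube/config, avoids truncating your config before kubectl has read it. Once you’ve checked the result, you could copy it into place (keeping a backup) and use kubectl config get-contexts and kubectl config use-context to switch between clusters.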

When we’re done with the workload cluster, the clean-up is mercifully easy:

% kubectl delete cluster test-cluster
cluster.cluster.x-k8s.io "test-cluster" deleted
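By default, kubectl delete waits until the Cluster object is actually gone, and Cluster API only removes it after the AWS resources behind it have been cleaned up, so this can take several minutes. Before you shut down the local management cluster itself, it’s worth confirming that no Cluster API clusters remain; otherwise the controllers that would clean up their AWS resources disappear along with it. A sketch, with a hypothetical function name:

```shell
# Hypothetical helper: succeed only if the management cluster no longer has
# any Cluster API Cluster objects in any namespace.
no_clusters_left() {
  # "No resources found" goes to stderr, so stdout is empty when none remain.
  ncl_out=$(kubectl get clusters --all-namespaces --no-headers) || return 1
  if [ -n "$ncl_out" ]; then
    echo "Cluster API clusters still present:" >&2
    echo "$ncl_out" >&2
    return 1
  fi
  echo "no Cluster API clusters remain"
}
```

Once no_clusters_left succeeds, you can safely stop or delete your local management cluster.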

That brings us to the end of this introductory primer, but it’s by no means the end of what you can do with Cluster API. In future walkthroughs, we’ll show you how to programmatically control clusters in other environments, and how to create a hybrid environment that is programmatically responsive to your needs. 