How to reduce your risk and liability with Kubernetes

Nick Chase - January 31, 2023

5 risks and liabilities that come with using Kubernetes

As cloud native computing becomes an integral part of most organizations, dependence on Kubernetes is growing at an increasing rate. In most cases, that’s a Good Thing (TM). There are, however, some risks you should keep in mind as you build out your Kubernetes architecture.

A complete discussion would leave us at novel-length, but here are a few ideas to get you started.

Access control

Perhaps the most obvious risk when it comes to using Kubernetes -- or any software delivery system, for that matter -- is that of ensuring that access is properly configured. For Kubernetes, this means a combination of cluster-specific and non-cluster-specific issues. For example, your cluster can be configured perfectly, but if the host operating system is riddled with vulnerabilities, it will be simple for an attacker to gain access to the cluster through the back door, so to speak. On the other hand, even if every host is completely hardened and up-to-date, a misconfiguration could enable an attacker to access a vulnerable pod. And because Kubernetes uses a flat networking model by default, every pod can reach every other pod over the network unless you restrict traffic with NetworkPolicies.

The most common access control issue, though, is that of users, who need some access to the cluster in order to do their jobs and deploy or manage applications. General access to a cluster given through a KUBECONFIG file can be an issue because these files can be shared, and the credentials embedded in them can be difficult to revoke once issued; Kubernetes has no built-in way to revoke a client certificate, for example.

An alternative is to rely on Role-Based Access Control (RBAC), in which you create roles and bind users to them; in this way, you can change a user’s permissions by changing the permissions of the role, or remove them entirely by deleting the role binding that associates the user with the role.
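As a minimal sketch of that RBAC pattern, the Python below builds a Role and a RoleBinding as plain dictionaries and prints them as JSON, which kubectl accepts as well as YAML. The namespace "dev", the role name "app-deployer", and the user "jane" are hypothetical, and the rule set is only an assumption about what a deployer might need.

```python
import json

RBAC_GROUP = "rbac.authorization.k8s.io"


def make_role(namespace, name):
    """Role that can manage Deployments and read Pods in one namespace."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "Role",
        "metadata": {"namespace": namespace, "name": name},
        "rules": [
            {
                "apiGroups": ["apps"],
                "resources": ["deployments"],
                "verbs": ["get", "list", "watch", "create", "update", "patch", "delete"],
            },
            {
                "apiGroups": [""],  # "" is the core API group
                "resources": ["pods"],
                "verbs": ["get", "list", "watch"],
            },
        ],
    }


def make_role_binding(namespace, role_name, user):
    """Binding that grants the named Role to a single user."""
    return {
        "apiVersion": f"{RBAC_GROUP}/v1",
        "kind": "RoleBinding",
        "metadata": {"namespace": namespace, "name": f"{role_name}-{user}"},
        "subjects": [{"kind": "User", "name": user, "apiGroup": RBAC_GROUP}],
        "roleRef": {"kind": "Role", "name": role_name, "apiGroup": RBAC_GROUP},
    }


# Hypothetical names, for illustration only.
role = make_role("dev", "app-deployer")
binding = make_role_binding("dev", "app-deployer", "jane")

# A v1 List lets both objects travel in one file.
manifest = {"apiVersion": "v1", "kind": "List", "items": [role, binding]}
print(json.dumps(manifest, indent=2))
```

Saved to a file, this output could be applied with kubectl apply -f. Deleting only the RoleBinding (here, app-deployer-jane) would remove that user's access while leaving the role in place for everyone else bound to it.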

A simpler way is to use a tool that can understand and abstract RBAC permissions. For example, the Lens Kubernetes IDE enables you to assign permissions to a user without them ever having access to a KUBECONFIG file.

Other security issues

Part of the reason it’s so important to control who can access the cluster is, of course, the danger that malicious code can be deployed. But just keeping malicious users out of the cluster is only part of the story; there is always a significant chance that a user might, through no fault of their own, deploy a container that contains security issues such as unprotected Secrets or vulnerabilities in underlying packages, such as Log4Shell in Log4j.

One way to prevent this is to provide users with approved base images, and to train them to always use them. It’s also a good idea to set up a CI/CD process that prevents users from deploying to the cluster unless their applications have been scanned for vulnerabilities, both at the container registry level and continuously. The Lens Kubernetes IDE’s Pro-tier security scanning can be automated via an operator to handle this, and can be integrated with your CI/CD processes.
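To make the idea of a CI/CD gate concrete, here is a rough Python sketch that blocks a deployment when the base image is not on an approved list or when a scan reports serious findings. The registry prefix, the report format (a list of dictionaries with "id", "severity", and "package" keys), and the EXAMPLE-0001 finding are all hypothetical; a real scanner's output would need its own parsing.

```python
# Hypothetical internal registry path holding the approved base images.
APPROVED_BASE_PREFIXES = ("registry.example.com/base/",)

# Severities that should stop a deployment (an assumed policy).
BLOCKING_SEVERITIES = {"CRITICAL", "HIGH"}


def gate(base_image, findings):
    """Return the reasons to block a deployment; an empty list means pass."""
    reasons = []
    if not base_image.startswith(APPROVED_BASE_PREFIXES):
        reasons.append(f"base image {base_image!r} is not on the approved list")
    for finding in findings:
        if finding["severity"].upper() in BLOCKING_SEVERITIES:
            reasons.append(
                f"{finding['id']} ({finding['severity']}) in {finding['package']}"
            )
    return reasons


# Hypothetical scan report. CVE-2021-44228 is the Log4Shell identifier;
# EXAMPLE-0001 is a made-up placeholder for a low-severity finding.
report = [
    {"id": "CVE-2021-44228", "severity": "CRITICAL", "package": "log4j-core"},
    {"id": "EXAMPLE-0001", "severity": "LOW", "package": "example-lib"},
]

problems = gate("docker.io/library/openjdk:8", report)
for problem in problems:
    print("BLOCK:", problem)

# A CI job would typically exit non-zero when this is 1.
exit_code = 1 if problems else 0
```

Returning a list of reasons rather than a bare boolean means the pipeline log can tell the developer exactly what to fix.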

Backup and recovery

Conventional wisdom says that Kubernetes-based applications should be stateless. But Kubernetes itself, and most applications, store important data in potentially vulnerable forms. This data must be protected against loss, corruption, and unavailability in case of incidents and disasters. Disasters can range from losing an entire datacenter to flood or fire, through to lockdowns caused by ransomware; all can be hugely costly, taking critical applications offline for extended periods. So you need to think about backup and recovery at several levels in order to minimize Mean Time to Recovery (MTTR).

First off, there’s the issue of ensuring that you can get back to the state the cluster itself was in when the problem occurred -- in other words, the pods that were instantiated, the services running, and so on. One way you can protect that information is to regularly back up the etcd database that stores the state of your Kubernetes cluster.
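As a small illustration of what a regular etcd backup might look like, the Python sketch below assembles an etcdctl snapshot save command with a timestamped target file. It only builds and prints the command; it does not run it. The endpoint, the backup directory, and the certificate paths (which follow the layout kubeadm commonly uses) are assumptions to replace with your own values.

```python
import shlex
from datetime import datetime, timezone


def etcd_snapshot_command(backup_dir, endpoint, cacert, cert, key, now=None):
    """Build the argv for `etcdctl snapshot save` with a UTC-stamped filename."""
    now = now or datetime.now(timezone.utc)
    target = f"{backup_dir}/etcd-{now:%Y%m%dT%H%M%SZ}.db"
    return [
        "etcdctl",
        f"--endpoints={endpoint}",
        f"--cacert={cacert}",
        f"--cert={cert}",
        f"--key={key}",
        "snapshot",
        "save",
        target,
    ]


# Assumed values: a local etcd member and kubeadm-style certificate paths.
command = etcd_snapshot_command(
    backup_dir="/var/backups/etcd",
    endpoint="https://127.0.0.1:2379",
    cacert="/etc/kubernetes/pki/etcd/ca.crt",
    cert="/etc/kubernetes/pki/etcd/server.crt",
    key="/etc/kubernetes/pki/etcd/server.key",
)

# Print a shell-safe version of the command for review or for a cron job.
print(shlex.join(command))
```

Timestamping each snapshot keeps older backups from being overwritten, so you can restore to a point before the incident rather than only to the latest state.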

Then there is the fact that beyond very simple applications, it’s unlikely that your overall architecture is built on entirely stateless applications; you likely store user information and other data pertaining to what you’re trying to accomplish. Typically you can accomplish this kind of statefulness through the use of PersistentVolumes -- pieces of storage, local or network-attached, that exist independently of any one pod -- and StatefulSets, which give each pod a stable identity and its own persistent storage that survives rescheduling. To protect yourself, you’ll want to ensure that you have appropriate backup and recovery plans for the storage on which this data will live.

You can also use the open source Velero project, which is specifically designed to back up and restore clusters and the objects associated with them.

Compliance

You may be subject to data sovereignty rules like GDPR that require certain data to remain within certain geographical boundaries, or outside of multi-tenant environments. But you may also have clusters that span multiple regions or underlying infrastructures.

Kubernetes gives you the means to automatically schedule sensitive workloads according to predefined parameters via mechanisms like node affinity and policy resources. But if these are configured erroneously, you could end up with a cluster orchestrating compliance violations en masse.
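As a sketch of both halves of that point, the Python below builds a Pod manifest that uses required node affinity on the well-known topology.kubernetes.io/region node label, and then checks whether a manifest is actually confined to an allowed set of regions. The region names, pod name, and image are hypothetical, and the checker only understands the "In" operator; it is an illustration, not a complete validator.

```python
REGION_LABEL = "topology.kubernetes.io/region"
REQUIRED = "requiredDuringSchedulingIgnoredDuringExecution"


def pod_pinned_to(name, image, regions):
    """Pod that may only be scheduled onto nodes in the given regions."""
    expression = {"key": REGION_LABEL, "operator": "In", "values": sorted(regions)}
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "affinity": {
                "nodeAffinity": {
                    REQUIRED: {"nodeSelectorTerms": [{"matchExpressions": [expression]}]}
                }
            },
            "containers": [{"name": name, "image": image}],
        },
    }


def is_confined(pod, allowed_regions):
    """True only if every selector term restricts the pod to allowed regions."""
    node_affinity = pod["spec"].get("affinity", {}).get("nodeAffinity", {})
    terms = node_affinity.get(REQUIRED, {}).get("nodeSelectorTerms", [])
    if not terms:
        return False  # no hard constraint: the pod can land anywhere
    allowed = set(allowed_regions)
    for term in terms:
        # Terms are ORed together, so a single unsafe term is a violation.
        safe = any(
            e.get("key") == REGION_LABEL
            and e.get("operator") == "In"
            and bool(e.get("values"))
            and set(e["values"]) <= allowed
            for e in term.get("matchExpressions", [])
        )
        if not safe:
            return False
    return True


# Hypothetical regions for a workload that must stay in the EU.
EU_REGIONS = {"eu-west-1", "eu-central-1"}
good = pod_pinned_to("billing", "registry.example.com/billing:1.0", ["eu-west-1"])
bad = pod_pinned_to("billing", "registry.example.com/billing:1.0", ["us-east-1"])
print(is_confined(good, EU_REGIONS), is_confined(bad, EU_REGIONS))
```

A check like this could run in CI or in an admission controller, so that an erroneous affinity rule is caught before the scheduler acts on it.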

By the same token, many organizations are externally motivated to have ironclad backup plans in place by rules such as the aforementioned GDPR, as well as HIPAA, PCI, or other industry-specific regulations. In all of these cases, the stakes of compliance are high, and the margin for error in configuration can be thin.

The CNCF Graduated project Open Policy Agent provides an open source component for standardizing policy resources across a cluster, and it does so declaratively, according to the Kubernetes Way. More importantly, defining policies in this way separates them from the application code itself, so each can be more easily released and reviewed.

Resource management

Cloud costs tend to spiral upward over time. And for shared, abstracted systems like Kubernetes, these can be very hard to understand, visualize, and control. Even when running Kubernetes on private clouds, resource utilization – critical for analytics and capacity planning – deserves serious attention.

Part of solving the Kubernetes resource management puzzle means being able to see and attribute resource utilization (to projects, teams, departments, etc.). This is the province of tools like OpenCost, and its expanded sibling, Kubecost, which provide open source basics for resource utilization tracking and cost conversion/attribution. Kubecost is available in self-managed and as-a-service forms, too. Users of the Lens UI for Kubernetes can install Kubecost on a cluster quickly and easily via Helm chart, and can integrate cost data with the Lens dashboard through Kubecost’s Lens extension.
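To show the kind of arithmetic that cost attribution involves, here is a deliberately simplified Python sketch that charges each workload for its requested CPU and memory and rolls the result up by team. It is not how OpenCost or Kubecost actually compute costs; the prices, team names, and workload figures are all hypothetical.

```python
from collections import defaultdict

# Hypothetical unit prices; real figures depend on your provider or hardware.
CPU_CORE_HOUR = 0.03  # per requested CPU core, per hour
MEM_GIB_HOUR = 0.004  # per requested GiB of memory, per hour


def cost_by_team(workloads):
    """Sum request-based cost per team; unlabeled workloads go to 'unallocated'."""
    totals = defaultdict(float)
    for w in workloads:
        hourly = w["cpu_cores"] * CPU_CORE_HOUR + w["mem_gib"] * MEM_GIB_HOUR
        totals[w.get("team", "unallocated")] += hourly * w["hours"]
    return dict(totals)


# Hypothetical workloads; "team" would normally come from a pod or namespace label.
workloads = [
    {"name": "checkout", "team": "payments", "cpu_cores": 2, "mem_gib": 4, "hours": 100},
    {"name": "ledger", "team": "payments", "cpu_cores": 1, "mem_gib": 2, "hours": 100},
    {"name": "search", "team": "catalog", "cpu_cores": 4, "mem_gib": 8, "hours": 50},
    {"name": "scratch", "cpu_cores": 1, "mem_gib": 1, "hours": 10},
]

for team, cost in sorted(cost_by_team(workloads).items()):
    print(f"{team}: {cost:.2f}")
```

The "unallocated" bucket is worth watching in practice: it is the spend that nobody has claimed, which usually means a labeling gap rather than free capacity.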

The other part of managing resource utilization involves codifying and automating enforcement of policies preventing resource consumption beyond prescribed limits. This kind of automation can be done with tools like Open Policy Agent, described above. Setting workable resource limits and then optimizing them, however, is a dynamic process. That’s best approached with tools like Goldilocks, a project that recommends resource requests and limits for your workloads so that you can “right-size” them over time. Combining the two can protect against both fine-grained excesses and certain kinds of “sorcerer's apprentice” hassles that cause big, surprise bills – like a developer issuing erroneous commands that end up overscaling a cluster.
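The limit enforcement described above can be sketched in a few lines. Open Policy Agent policies are written in its own language, Rego, so the Python below is only an illustration of the logic such a policy might encode: every container must declare CPU and memory limits, and neither may exceed a ceiling. The ceilings are hypothetical, and the quantity parsers handle only a common subset of Kubernetes notation (millicores, and Ki/Mi/Gi suffixes).

```python
MAX_CPU_MILLICORES = 2000  # hypothetical per-container ceiling (2 cores)
MAX_MEMORY_BYTES = 4 * 1024**3  # hypothetical per-container ceiling (4 GiB)

_MEMORY_UNITS = {"Ki": 1024, "Mi": 1024**2, "Gi": 1024**3}


def cpu_millicores(quantity):
    """Parse '500m' or '2' (cores) into millicores."""
    text = str(quantity)
    if text.endswith("m"):
        return int(text[:-1])
    return round(float(text) * 1000)


def memory_bytes(quantity):
    """Parse '512Mi', '1Gi', or a plain byte count into bytes."""
    text = str(quantity)
    for suffix, factor in _MEMORY_UNITS.items():
        if text.endswith(suffix):
            return round(float(text[: -len(suffix)]) * factor)
    return int(text)


def violations(pod):
    """Return human-readable policy violations for a Pod manifest."""
    problems = []
    for container in pod["spec"]["containers"]:
        name = container["name"]
        limits = container.get("resources", {}).get("limits", {})
        if "cpu" not in limits or "memory" not in limits:
            problems.append(f"{name}: missing cpu or memory limit")
            continue
        if cpu_millicores(limits["cpu"]) > MAX_CPU_MILLICORES:
            problems.append(f"{name}: cpu limit {limits['cpu']} exceeds ceiling")
        if memory_bytes(limits["memory"]) > MAX_MEMORY_BYTES:
            problems.append(f"{name}: memory limit {limits['memory']} exceeds ceiling")
    return problems


# Hypothetical pod with one compliant and two non-compliant containers.
pod = {
    "spec": {
        "containers": [
            {"name": "web", "resources": {"limits": {"cpu": "500m", "memory": "1Gi"}}},
            {"name": "worker", "resources": {"limits": {"cpu": "4", "memory": "1Gi"}}},
            {"name": "sidecar"},
        ]
    }
}

for problem in violations(pod):
    print(problem)
```

A fixed ceiling like this catches runaway values, while a recommendation tool helps choose sensible numbers below it; the two address different halves of the problem.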

Conclusion

As I said up front, a full discussion would take a book. If you’d like to dive deeper into the risks that come with Kubernetes, and the best ways to mitigate those risks, Mirantis is holding a live conversation on how to reduce risk and liability with Kubernetes on February 2nd, 2023.

De-risking Kubernetes doesn’t have to be difficult. With the right cloud native expertise and the right tooling, you can have infrastructure that lets you breathe easy and focus on code. And missing out on that kind of accelerative focus—well, that’s a risk you don’t want to take.