Developer Self-Service in Kubernetes Environments: A Step-by-Step Guide
Centralizing management of Kubernetes applications on a single platform reduces DevOps toil, increases dev productivity, and speeds troubleshooting and support
Assuming or hoping every developer and application support team in your organization knows Kubernetes well is not a strategy.
When teams can’t support applications because they are not comfortable with the underlying infrastructure or because of scale, the pressure and toil on the DevOps team will keep increasing.
Giving teams access to your clusters to troubleshoot applications may not be secure or easy to manage.
Happily, you can implement a scalable application support model in just a few minutes and without changing any of your pipelines and infrastructure. Centralizing application management from multiple clusters, pipelines, and teams in a unified application platform will reduce DevOps toil, increase dev productivity, and speed troubleshooting and support.
I’m sure you have seen many guides with cheat sheets for getting up to speed on Kubernetes and kubectl basics, and I have found fantastic resources for teams building, delivering, and supporting Kubernetes clusters.
In contrast, I have seen limited resources for teams supporting applications. I’m sure most of you have come across the picture below:
As engineers, we love a nicely done diagram and get the feeling it gives us superpowers when we memorize all those steps and individual commands. But this workflow presents a few challenges:
It assumes developers and anyone else supporting applications will have access to your clusters. We can all agree that this is far from ideal, and handling RBAC in Kubernetes at scale is another challenge.
As you scale the number of applications deployed in your cluster and the number of developers managing them, it’s wrong to assume everyone will know Kubernetes infrastructure details: ingress, services, etc. Application support times will increase; load, toil, and stress on the DevOps team will also increase; and your goal of a developer self-service platform will be impossible to reach.
While Git may be your source of truth, finding and accessing an application’s historical information will be a nightmare. As you scale, pipelines multiply, teams with different SLAs and security requirements are onboarded, and complexity keeps growing.
If one service goes down, it can very well impact others. Understanding service-to-service relationships at scale using the workflow above is challenging: it’s difficult to visualize the blast radius of a service outage and gauge its impact on other services.
Onboarding developers, DevOps, and SREs will be more challenging as you scale. Greater complexity makes it harder for new team members to quickly understand all applications, owners, service relationships, etc.
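To make these challenges concrete, here is a sketch of the manual, per-cluster workflow that the flowchart implies, along with the RBAC objects a DevOps team must create before a developer can even run it. The context, namespace, labels, pod name, and role names below are hypothetical placeholders, not output from any real cluster:

```shell
# Manual debugging loop for ONE application on ONE cluster
# (context, namespace, label, and pod names are hypothetical)
kubectl config use-context prod-cluster-1
kubectl -n team-a get pods -l app=checkout
kubectl -n team-a describe pod checkout-7d9f
kubectl -n team-a logs checkout-7d9f --previous
kubectl -n team-a get svc,endpoints,ingress

# ...and the RBAC the DevOps team must create and maintain,
# per team and per namespace, before developers can do the above:
kubectl -n team-a create role app-reader \
  --verb=get,list,watch --resource=pods,pods/log,services,endpoints
kubectl -n team-a create rolebinding team-a-readers \
  --role=app-reader --group=team-a-devs
```

Multiply those role and binding objects by every namespace, cluster, and team, and the maintenance burden of direct cluster access becomes clear.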
Bottom line, our goal is to implement a self-service application support model that exposes all required information so teams can support their applications quickly while reducing the toil on the DevOps side so that you can focus on scaling.
The section below will guide you through implementing that model without making a single change to your clusters, pipelines, or automation. You will give developers and SRE teams the multi-tenant application platform they need to support their applications effectively.
You will look good, eh?
We will assume the sample architecture below:
Centralize application information from multiple clusters into a single application platform
Remove the need to give developers access to your clusters
Enable teams to get instant details about service ownership, health, dependencies, logs, etc.
Record application history: changes such as new deployments, errors, and restarts become part of the application’s lifecycle, so teams can understand what happened to the application over time. This is great for application support and for regulated industries that need access to every application change and log.
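For contrast, "centralizing application information from multiple clusters" replaces the kubeconfig juggling that teams do today when checking one application across environments. A minimal sketch of that status quo, assuming hypothetical context and namespace names:

```shell
# Without a central platform, checking one application's health
# across environments means iterating over contexts by hand
# (context, namespace, and label names are hypothetical)
for ctx in staging prod-us prod-eu; do
  echo "== $ctx =="
  kubectl --context "$ctx" -n payments get pods -l app=billing
done
```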
Regardless of the pipeline you use – Jenkins, ArgoCD, FluxCD, or others – Lens AppIQ will standardize the application management layer and help you keep evolving your pipelines and infrastructure without impacting how teams manage and support their applications.
Lens AppIQ will install its agent on your cluster and start the scanning process. The process should take a couple of minutes, and you can follow its progress through the Events page of your Lens AppIQ dashboard.
As soon as Lens AppIQ starts discovering the applications, you will see them available on the Applications page. Lens AppIQ will continue to scan and discover applications while the cluster remains connected.
By clicking on the application name, you will have access to the application metadata, such as owner, container images used by this application, environment variables, endpoints where the application is exposed, etc.
In addition, you have access to the application monitoring information, lifecycle (deployments, restarts, errors, etc.), pods, and logs, and you can integrate this application into your incident management systems so that teams can automate support and tracking of activity.
Your users also have visibility into each application’s dependencies and how it communicates with other services. Should problems arise, they can quickly spot the dependency causing the problem or determine why their service can’t communicate with other applications.
In just a few minutes, and without making any changes to your infrastructure, you can enable your teams to support and manage their applications without dealing with the underlying infrastructure complexity and logging into your clusters.
This way, you can build a scalable application management model, make it easy for users to understand and support their applications, reduce the time required to resolve issues, and more.