Monitoring Kubernetes costs using Kubecost and Mirantis Kubernetes Engine [Transcript]
Cloud environments & Kubernetes are becoming more and more expensive to operate and manage. In this demo-rich workshop, Mirantis and Kubecost demonstrate how to deploy Kubecost as a Helm chart on top of Mirantis Kubernetes Engine. Lens users will be able to visualize their Kubernetes spend directly in the Lens desktop application, allowing users to view spend and costs efficiently by namespace, pod, deployment, and more. You can access the full webinar recording here and follow along with the transcript below.
Here is an excerpt of the demo portion of the webinar.
Featured presenters: Alex Thilen, Head of Business Development at Kubecost, Rob Faraj, Founding Team at Kubecost, and Edward Ionel, Head of Developer Relations at Mirantis
Edward: Today, we’re going to be taking a look at how to monitor your Kubernetes costs with Kubecost on Mirantis Kubernetes Engine, which is a resilient and robust certified Kubernetes distro.
And the reason why this is so important is this: As organizations switch into building more applications on the cloud and creating cloud native applications, it’s imperative for teams and organizations to be able to accurately understand their cost allocation and their spend with Kubernetes and containers.
And this is why we’re here today to bring to you our awesome demo where we’ll showcase how you can accurately track all of these different metrics. And the Kubecost team will absolutely jump into that.
To start, we will just do a quick high-level overview of the technologies we’re taking a look at, the organizations that are here with us today, and then we will jump into our hands-on demo.
We have two separate demos. The first one being how to deploy Kubecost on top of Mirantis Kubernetes Engine, and then port forwarding into that application, being able to understand the cost allocation there. And then I’m going to hand it over to Rob from Kubecost to give us a deep dive into everything you can see with Kubecost.
Without further ado, let’s jump into it.
So, who is Mirantis? Mirantis is a cloud native organization focused on open source technologies that’s been around since roughly 2010.
We started with OpenStack and have since transitioned into Kubernetes, and we really are focused on helping organizations ship code faster. We serve over 800 customers from all across the globe, and we're also focused on reducing cloud and Kubernetes complexity and ensuring that developers and operators can work simultaneously and have the correct synergies.
Here is a high-level overview of our product suite at Mirantis. On the lefthand side we have our virtualization product, Mirantis OpenStack for Kubernetes. In the middle, we have Mirantis Container Cloud, which gives you a single pane of glass into all of your infrastructure and all of your workloads, whether it's running on OpenStack on-prem, in the public cloud, or on bare metal.
On the righthand side, we could take a look at our containerized technologies, Mirantis Kubernetes Engine. That is the Kubernetes distro we are taking a look at today. It’s resilient and robust, and it is a certified Kubernetes distro.
Then we have k0s on the righthand side. That is an open source, certified Kubernetes distro that we've also built.
On the bottom righthand side, we have Mirantis Secure Registry, which is a container image registry that allows you to scan for vulnerabilities, store your images, manage teams, and so forth. And then Mirantis Container Runtime as well.
On the left hand side, we can see Lens. This is another product we will be taking a look at today. Lens is a 100 percent open source technology as well, built by team Mirantis, that allows you to achieve full situational awareness of everything happening within your Kubernetes cluster regardless of the distro.
So if you have a EKS cluster, or Mirantis Kubernetes Engine cluster, a K0s cluster, you can add any of these clusters to Lens, achieve full situational awareness, and also manage multiple Kubernetes clusters as well. We’ll take a look at that.
StackLight is a feature on top of all of our products, except the open source ones, that allows you to achieve full lifecycle management of everything happening on your stack or your Kubernetes service, from real-time alerts to upgrades and automation.
What is Mirantis focused on? I mentioned this earlier, and we won't take too much of a deep dive here, but we're very much focused on driving business outcomes and results, specifically around shipping code faster.
We believe that organizations that are able to create a cloud native approach and ship their applications very quickly are generally going to have faster deployment times, faster time to recovery, and are much less likely to push vulnerable code to their production environments.
How Mirantis delivers cloud native technologies at enterprise scale
So how do we deliver this at enterprise scale? There’s two streams. For developers, we’re absolutely increasing productivity, creating an as-a-service experience for our developers, where it’s a click of a button for them to actually push their code, their applications, on top of the Kubernetes service.
They are never directly interacting with the platform. We also help them secure the software supply chain through Mirantis Secure Registry, being able to understand every single line of code through the entire process, whether you are writing the code, pushing it through a CI/CD pipeline, or deploying it on top of your Kubernetes service.
We’re ensuring that that code is compliant and meets regulatory concerns that you may have. We’re also very much focused on open source. That’s item three here. We have open standards. We’re very much focused on open source. And a lot of our products are built on open source technologies like Kubernetes and OpenStack.
For our operators, it’s very important that they have automated updates and upgrades. This is that lifecycle management that I mentioned, being able to get real time alerts via Slack, understanding their cluster health, and also being able to have upgrades and so forth that are happening on the backend to ensure everything’s working properly.
We also have managed services, a full white-glove option where we operate the entire stack, infrastructure, and Kubernetes service. Mirantis Kubernetes Engine and all of our products can be deployed on any operating system and infrastructure.
About Mirantis Kubernetes Engine
So real quick, we will talk about MKE. This is going to be a high level overview, and then we’ll actually jump into more of what Kubecost is doing.
As mentioned, Mirantis Kubernetes Engine is a resilient and robust, certified Kubernetes distro that actually has all of the batteries included, but they’re swappable. If you have your own networking technologies, your own ingress that you want to use, you absolutely can swap out the NGINX or the Istio for your choice.
On the left hand side, we have lifecycle management. I think I touched on that enough, the alerts, the upgrades, and so forth. On the righthand side, it’s extremely important to our customers that we remain compliant and have the correct governance associated as well.
So role-based access control, identity management, and we are FIPS-compliant as well. What does Mirantis bring to Kubernetes? I think we touched on all of these things already, but it’s the automated deployment and continuous delivery through lifecycle management.
We have extensive security enhancements as well on top of every single one of our products. And we can support any infrastructure, whether it’s bare metal, on-prem or running in the public cloud as well.
From here, I’m super excited to turn it over to Alex from Kubecost, where he’s going to talk to us about a couple slides. Alex, take the floor.
Alex: Thank you, Edward. I just wanted to give an overview of Kubecost as a company, so you know who we are, as well as the problem space and how we address it.
We were founded by two former Google employees who worked on the Kubernetes project back when it was still called Borg.
They’ve been in this problem space for about a decade and Kubernetes is super, super deep in our company’s DNA. We are absolutely, hyper-focused on helping teams understand and optimize their Kubernetes spend. That is our primary and singular focus.
To do this, our founders launched an open source project in April of 2019. And since then, more than 2,000 teams are actively using our product across all major cloud providers.
They use our product on-premise. They use it in air-gapped environments, and they use it on the three major cloud providers, as well as some of the longer tail players. This makes up 2,000-plus user deployments on our open source or free tiers.
And we’re managing about $2 billion in spend under management. Within those customers, we’ve helped them save a significant portion on their Kubernetes spend. Traditionally, we see anywhere between 30 to 50 percent on our open source or free tier.
Once we have deeper commercial engagements with teams, we get up to about 80 percent, which is super exciting. Moving forward here, I’m going to talk about some survey results from the CNCF.
What I found really interesting about Edward's polling question is that those results were so similar to what the CNCF found in its FinOps survey.
If we look on the left side, we see that the survey found that the vast majority of teams, represented by these red bars, or about 75 percent, are seeing rising Kubernetes costs. Quite honestly, I don't think anybody here is going to find that super surprising, right?
Because everybody's using more Kubernetes; for example, Kubernetes adoption was up 70 percent year over year. I think it's only natural that spend is going to increase. However, if we look on the right side, and I think we learned this from the poll as well, it's really surprising to me how few folks actually have accurate cost monitoring in place.
Per the CNCF survey, only about 12 percent have accurate showback or chargeback. And overall, this survey is consistent with what our customers tell us as well as what the poll told us. Customers are using more Kubernetes, spend is increasing, but they don’t have a way to track it.
We expect this to continue. It’s our single mission to help solve this problem. I think one question that’s really common that comes up a lot is people ask us or ask themselves, well, why don’t people just track the Kubernetes costs, right? Like why is this hard?
What we see is that fundamentally this is a different and difficult technical challenge, because most teams run containers and Kubernetes in a multi-tenant mode. That means that they're sharing underlying infrastructure.
For example, in this diagram, that’s what’s happening, right? You can see the dev, the staging and the production clusters are sharing those underlying nodes. Commonly teams have dynamic workloads within those environments where resources are short-lived, they’re fluctuating greatly, and it’s difficult to disentangle those costs.
Even if, for example, you knew that a cluster is costing you a thousand dollars today, it’s challenging to know which individual team or which application within that cluster is responsible for those costs.
In addition, there’s behavioral challenges with Kubernetes – we’re talking about more rapid and decentralized deployments. Things are always changing and oftentimes there’s not a central team to manage all of that in a consistent manner.
How Kubecost monitors and optimizes Kubernetes costs
That brings us to Kubecost and how Kubecost helps. We’re focused on three main things, and these tend to go in order with our users. First and foremost, we are answering the question for our users: What has been spent across my Kubernetes workloads today?
We answer this question by providing visibility. This is absolutely fundamental to getting the allocation equation figured out. What we do is take cluster costs and we distribute them fairly across individual tenants.
I emphasize this fairness piece because it is so important if you want to do this the right way. You can't just allocate costs arbitrarily; they have to be based on precise resource requests. If we did this in an unfair manner, teams would not buy into the underlying measurement that we're using.
Additionally, where we're providing visibility, we use the public billing APIs. Customers get accurate cloud prices, or even custom pricing: if they're using spot, if they're using discounts, if they're using savings plans or anything else, all of that will be factored in.
We start to work with DevOps or infrastructure teams here and really understand and get comfortable with the data in this visibility tier.
Next, once we move into optimization, the question that we're really answering for customers is: how could I have spent on Kubernetes more optimally? To do this, we work with the application teams that are shipping workloads, as well as finance teams and engineering managers who see the data.
For the first time, these users are able to see the cost of running a team or an application or a product line, and they can more accurately make budgeting decisions by having this data. Once we’ve built this out across the organization, then we can go through and start to build out cost targets, understand idle costs, understand cost efficiency, and we can build awareness on how to optimize those original costs that we detected in the visibility phase.
It's really common that teams start working with us with really low cost efficiency numbers, but we can increase that significantly. In many cases that's led to millions of dollars in savings; several customers of ours have saved tens of millions of dollars by completing a robust optimization.
In the last phase, we build robust solutions on top of cost monitoring. That includes things like policies or monitoring and alerting and different teams can have different dollar thresholds, for example. And we want to make this data actionable.
We integrate this into other tools like alerting tools, reporting tools, and deployment tools. This is one example of how we work with the Mirantis team, especially on Lens. Some of this may just be setting thresholds, or it may actually right-size workloads dynamically, which is super cool. It's really interesting to resize workloads based on that cost data.
That’s it for the overview. I think we’re in a great position to get deeper into the application and into the demo, but I felt that this context was important to have.
Edward: Excellent. Thank you so much, Alex, for the overview on Kubecost.
As mentioned earlier, we are going to be taking a look at three different technologies today. The first one being Mirantis Kubernetes Engine, again, that resilient, robust certified Kubernetes distro that has all the batteries included, of course, swappable as well.
We'll take a look at Kubecost. Then we're also going to take a look at one of our open source technologies that many of you may be familiar with, which is Lens, the Kubernetes IDE. Lens has seen significant growth, with over 600,000 users and over 200,000 clusters added to Lens just in the last 30 days.
We have over 70,000 GitHub stargazers and a wonderful community that’s actively answering and helping other community members answer questions for Kubernetes, Kubecost and so forth.
Mirantis Kubernetes Engine overview
So here, what we can take a look at is the Mirantis Kubernetes Engine dashboard, an overview of our nodes and how they’re performing in the last hour. We can see the max CPU, the max memory, the amount of disk that’s being used. Of course, we can see from our manager node to our worker nodes as well.
We can even configure this to be able to see how it’s been performing in the last 24 hours, which is extremely important. Of course, we have a ton of admin functionality that we can do directly to our Kubernetes cluster through the dashboard itself.
We can leverage access control. We can create orgs, we can create teams, we can create users. Role-based access control as well here can all be created. We can add some YAML. We can assign it to a namespace. We can view our grants and so forth.
Here, we can see all of the images that are running, our nodes, and everything that’s really happening within our Kubernetes cluster. Now, that being said, we do want to download our Kubeconfig file directly from MKE to actually be able to add it to Lens.
I've already done this. You'll generate a new bundle, and that's going to give you your Kubeconfig file and so forth.
Let me close this out. I have downloaded it already for demonstration purposes. From here, we’re going to navigate through Lens. Lens is the technology I was just describing. As we can see within Lens, I have various different clusters currently added.
I have two Mirantis Kubernetes Engine clusters currently running. I have this one here, which is actually going to be the cluster that I deploy the Helm chart to.
From there, I'm going to navigate to my other cluster, where the Helm chart is already deployed, and we're going to port forward directly into that application. The neat thing behind Lens, which I talked about, is the multi-cluster management.
On the left hand side, you can see I have various different clusters. I can click into this cluster and immediately get an understanding of how this cluster is performing, from the CPU and the memory to our nodes, along with an overview of our workloads and how all of our resources are allocated: all of our pods, our deployments, and so forth.
Lens is an IDE as well. You can make configuration changes directly to your Kubernetes cluster. I do want to mention that we’re always leveraging your role-based access control from your Kubeconfig file.
Just because somebody has access to a Kubeconfig file, they can’t just add it to Lens and begin to have admin functionality like shelling into a pod or making configuration changes, deploying more code or objects on top of it.
That’s a high level overview of Lens. We do tons of demos about Lens as well.
We will share with each and every single one of you where you can actually access the community editions of all three of these technologies. That way, you can test them prior to any decision-making processes you may have.
How to deploy Kubecost to a Kubernetes cluster
From here, what we're going to do is jump into my terminal session. This is quite easy and straightforward: we're going to deploy the Kubecost Helm chart directly to my Kubernetes cluster.
I'm going to navigate to this awesome tutorial that Kubecost has built for the community, where all you really need to do is download the Helm chart and install it directly to your Kubernetes cluster.
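For reference, the install flow looks roughly like the sketch below. It is written as a dry run that only prints the commands; the repo URL and chart name follow Kubecost's public Helm tutorial, but check that tutorial for the current values before running anything against a real cluster.

```shell
# Dry-run sketch of the Kubecost Helm install described above.
# Swap the echo-based run() for:  run() { "$@"; }  to execute for real
# against a cluster whose kubeconfig points at your MKE environment.
run() { echo "+ $*"; }

run helm repo add kubecost https://kubecost.github.io/cost-analyzer/
run helm repo update
run helm install kubecost kubecost/cost-analyzer \
    --namespace kubecost --create-namespace
```

Once the real commands finish, the chart's pods come up in the `kubecost` namespace, which is exactly what Lens starts showing in the next step.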
The neat thing behind Lens is that it begins to show us all of the pods that are being deployed directly into our Kubernetes cluster. We can see that some are pending, along with our deployments, our DaemonSets, and so forth.
We get real-time events associated as well, where we can see exactly what is happening, along with any error messages here too. This generally just takes a couple of minutes, but due to time constraints, I have this installed on my Mirantis Kubernetes Engine cluster already, and we can navigate back to it later in the demo.
Again, that overview screen navigating here. Clicking into my pods, I'm going to be able to actually get a clear understanding of how all of my pods are currently performing. Here we can see all of our Kubecost pods as well.
We can see our deployments. The neat thing is that you get a granular view, through Lens, of how every single object within your cluster is performing, and on the top righthand side, you can make administration and configuration changes if you choose to do so. You can see all the details associated as well.
Enough about Lens – let’s port forward directly into this application.
Another neat thing behind Lens is you can port forward directly into your application as well if you have the correct permissions.
We can navigate to Services; we can navigate to the Kubecost cost analyzer, and clicking this port here is going to port forward us directly into our application. It's going to fetch our clusters, and we can immediately see how much money we're spending on this cluster.
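Outside of Lens, the same port forward can be done from the CLI with kubectl. A minimal sketch, again printed as a dry run; the service name `kubecost-cost-analyzer` and port 9090 are the chart's usual defaults, so confirm them against your own install first.

```shell
# CLI equivalent of the port forward shown in Lens, as a dry run.
# Swap run() for:  run() { "$@"; }  to execute. Service name and port
# are the chart defaults; verify with `kubectl get svc -n kubecost`.
run() { echo "+ $*"; }

run kubectl get svc -n kubecost
run kubectl port-forward -n kubecost svc/kubecost-cost-analyzer 9090:9090
# The Kubecost UI is then reachable at http://localhost:9090
```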
This is a Mirantis Kubernetes Engine cluster, as mentioned, that we leverage from the dashboard. Again, you can get started with any one of these technologies for free. Navigating back here, I'm going to click into here, but at this point, I'm actually going to hand it over to Rob, since he's the expert here, and he's going to showcase everything that we're seeing here and give us clear and brief descriptions of every single item.
How to allocate costs for on-prem Kubernetes clusters
Alex: While you guys make the switch on the presenter side, I did have one question sent to me directly, and it's about Kubecost. The question is: on-premise customers have different costs for their different types of hardware, so how can users customize those costs and account for them within Kubecost?
Rob, this is usually something you cover, so let’s just make sure to emphasize these points. The short answer to this question is users can do a one-time upload, or actually they can do it as often as they want to make the changes, but you can upload a custom price sheet for the different costs of the individual components in your on-premise environment.
That is a use case that users are using today in production, and we see a lot of. So fully supported. Let us know if there’s any further questions on that, but Rob will also touch on that in the demo.
Rob: Great question to start with, and I’ll just reiterate that please ask questions throughout this. Alex is going to be monitoring the chat, so I want this to be as collaborative as possible. A real quick and dirty on that question was two different options there:
For on-prem clusters, you can set a global price per resource: a global price per CPU, per gig of RAM, etc. Or you can get super granular and provide us a CSV file. We can key off of instance type, and you can provide different CPU prices, different memory prices, different storage prices, GPU prices, etc., if you have a really sophisticated cost model for those on-prem clusters. Fantastic question.
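To make the CSV option concrete, here is a small illustrative sketch. The column layout below is hypothetical, not Kubecost's actual schema (check Kubecost's custom-pricing documentation for that), but it shows how per-instance-type prices translate into a node cost.

```shell
# Hypothetical price sheet; these column names are illustrative only,
# not the exact CSV schema Kubecost expects.
cat > /tmp/prices.csv <<'EOF'
instanceType,cpuPricePerHr,ramGBPricePerHr
onprem-large,0.030,0.004
onprem-gpu,0.090,0.006
EOF

# Hourly price of an onprem-large node with 8 CPUs and 32 GB of RAM:
# 8 * 0.030 + 32 * 0.004 = 0.368
node_price=$(awk -F, '$1 == "onprem-large" { printf "%.3f", 8*$2 + 32*$3 }' /tmp/prices.csv)
echo "$node_price"
```

The same sheet can carry separate rows for GPU or storage-heavy node types, which is what makes the "really sophisticated cost model" case workable.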
How to allocate costs for a Kubernetes cluster
We’re going to start here on this Cost Allocation tab. The Cost Allocation tab is the first step in that three-step journey with Kubecost. All we’re really trying to do here is we’re trying to basically represent what have you spent in this cluster.
You'll notice that I said in this cluster. I'm going to start this tour in the context of just a single cluster, and then we're going to graduate to how Kubecost behaves in a world of multiple clusters. Let's say we've got multiple MKE clusters, maybe prod and non-prod. How does it behave in that manner?
We’ll gradually build so we can get there. On this tab here, I just have a basic kind of week to date time range. You can go as far back as you like. I’m aggregating by namespace, which is our default aggregation, but just know we have a ton of different options here for you to aggregate and delineate your workloads running in your cluster.
We support custom tags, custom labels, annotations built in. Namespaces, that’s what we’re seeing most people use, but you can get all the way down to the pod level.
In this view here, if we look at this chart, each of these bars represents a specific namespace that was running in this cluster during the week-to-date time period. You can see the metric that I'm graphing; take kube-system, which we're often familiar with: it has a total cost of about $64 over this week-to-date time period.
Okay, well, that’s cool, but what the heck do we actually mean when we say total cost? If we just scroll down here a little bit, in this data table, you’re going to see all the different metrics that are accumulated to come up with that total cost.
In these rows here, each row represents a namespace. I've got my own Kubecost namespace, and kube-system, which we just saw the bar for. Total cost is really the sum of three different types of cost.
Types of Kubernetes costs: In-cluster, out-of-cluster, and shared resources
The first type of cost is represented by these first five columns here. It's what I consider in-cluster resources: compute, GPU, memory, storage, and network egress. We look at those five together, and they represent what I call the in-cluster side of spend. We'll go into the details shortly about how we come up with the numbers.
The second component of total cost: This is an optional component, and it's all aggregated here as external cost. External costs are things that I refer to as out-of-cluster costs. Let's say you're running MKE. Maybe you're running it in Amazon, and you're also utilizing some S3 storage buckets, some DynamoDB instances, or some Lambda functions. You can optionally pull in those resources and associate them to a namespace or a custom tag or a controller, etc. It's a nice way to paint what I consider a comprehensive view of spend, both in-cluster and out-of-cluster.
The third type of cost again for that total cost is what we consider shared resources. Think about in your clusters, perhaps you have a monitoring service running, or maybe you have a login service or a security service.
In reality, no one really owns that. You don’t have a product or an application or a team that owns that. Perhaps that’s owned by the folks that manage the clusters, maybe SREs or DevOps engineers, platform engineers, etc.
You want to take the cost of running those resources and you actually want to distribute them to all of your tenants that are running in your cluster. We have that option as well. And we could go into those details a little bit later on.
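As a toy illustration of what that distribution can look like, assuming a proportional-to-direct-cost policy (one common strategy; the namespaces and dollar figures below are invented):

```shell
# Toy example: spread a $90 shared monitoring bill across three tenant
# namespaces in proportion to their direct in-cluster cost for the period.
cat > /tmp/direct.txt <<'EOF'
team-a 300
team-b 100
team-c 200
EOF

shares=$(awk -v shared=90 '
    { direct[$1] = $2; total += $2 }
    END { for (ns in direct)
            printf "%s %.2f\n", ns, direct[ns] + shared * direct[ns] / total }
' /tmp/direct.txt | sort)
echo "$shares"
```

Here team-a carries half the direct spend, so it absorbs half the shared bill ($45), ending at $345.00; team-b and team-c land at $115.00 and $230.00. Kubecost also supports other split strategies (for example, an even split), so treat this as one illustrative policy rather than the only option.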
So just a quick summary. Total cost is the sum of three different types of cost: in-cluster, out-of-cluster, and shared. All together – that’s the fully loaded kind of solution of Kubecost. Let’s rewind and go to that first type of cost because there’s plenty to unpack just with that first type alone which is again, the in-cluster cost.
What I'm going to do here is look at the Kubecost namespace. I can see how much was spent on compute, on GPU, on RAM, etc. I'm just going to click on this row; all I'm doing right now is drilling into that Kubecost namespace.
You'll notice that my aggregation has changed to controller. All I'm doing now is looking at all the controllers that reside within that namespace. Then I can click on a controller, and now I'm getting even more granular: I'm looking at all the pods that reside within that cost-analyzer deployment.
When I click on this specific pod, now we’re really getting down into the weeds of how we actually come up with these numbers. I think this is really important, especially when dealing with cost related information. It’s always important to be able to go back and basically describe how the numbers are derived.
Here you can see, I have every container that belongs in that pod. And then I can expand each of these rows. What we see is the total cost of the container decomposed into three different metrics.
Metrics for calculating container costs: Time, amount of resource, and price
I’m just going to walk you through them briefly. The first component is time. This is the hours number here. How long was this container up and running? For us, this comes directly from the Kubernetes scheduling API.
We query that by default once per minute; think of that as analogous to a scrape interval in something like Prometheus. That tells us, basically, how long it was up and running.
The second component then for each resource: How much compute was this container “responsible for”, or on the hook for? In Kubernetes, this could be a little bit complicated.
We actually take the maximum of two different values: how much the container requested versus how much it used. I'm sure a lot of you on the call here are familiar with teams over-allocating for their containers, who say, "Hey, I need four cores for this container to run."
In reality, when you look at it, it’s actually only using one core. Even though it’s only using one, it’s requesting four. We still think it’s still the right thing to charge them for the four, because the scheduler has reserved those four cores for that container.
Then the third component is the price. We know how long the container was running. We know how much per resource it’s “on the hook” for. The third component then is what is the price per resource?
We alluded to this a little bit in Alex’s question when we first kicked off the demo, but we have two different ways to come up with pricing. If you’re on public cloud, like let’s say AWS or GCP or Microsoft Azure, we integrate directly with their billing API.
In the case of AWS, think about integrating with the Cost and Usage Report. That helps us reflect any type of enterprise discounts you qualify for. If you're using spot nodes, it accounts for that. It accounts for reserved instances, all that kind of fun stuff.
In the case of on-prem, which was asked about previously, you can specify what your price sheet looks like. You can also have a hybrid. You can have a nice mix. You could have some on on-prem clusters and you can have some public cloud clusters. They both can have their own kind of pricing models.
We take those three numbers together: the time, the amount of the resource, and the price per resource. Multiply them together, and that’s how we come up with the cost. You can see we’re really super granular here in terms of how we’re able to come up with what these numbers are.
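The arithmetic Rob just walked through can be sketched directly. The prices and numbers here are invented for illustration; in a real deployment the price per resource comes from the cloud billing API or a custom price sheet.

```shell
# cost = hours * max(requested, used) * price-per-resource-hour
# Example: a container up for 24 hours that requested 4 CPU cores but
# only used 1 is charged for the 4 reserved cores.
hours=24
requested=4
used=1
cores=$(( requested > used ? requested : used ))
cpu_price=0.031   # illustrative $/core-hour; not a real cloud price

cpu_cost=$(awk -v h="$hours" -v c="$cores" -v p="$cpu_price" \
    'BEGIN { printf "%.3f", h * c * p }')
echo "$cpu_cost"
```

The same three-factor multiplication is repeated per resource (memory, GPU, storage, egress) and summed to give the container's total in-cluster cost.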
One other thing I'll mention here: you see this button for opening Grafana. I do want to let you know that we ship with our own Grafana dashboard. Teams really find that beneficial. I think one thing that's really interesting is: how do you get developers and technical people to care about cost information?
As a former engineer, I always cared about making sure my applications are up and running and always up and running, regardless of what the usage pattern looked like, which is obviously mission critical, but at the same time now cost is becoming more and more prevalent.
We’ve made a conscious effort to surface this type of data in areas that developers and engineers see as part of their day to day workflow: things like Grafana, Slack, kubectl. We have a kubectl plugin.
I'm just trying to make it so that you don't have to switch over to some business intelligence tool for an engineer to consume this information. That's a very big part of our tool.
We’re down into the weeds now. I’m just going to close this out and go back where we were previously, which was that report just viewing by namespace. We really dug down at just that one specific namespace.
How to allocate costs for multiple Kubernetes clusters
The next step I want to talk about is how this behaves in a world of multiple clusters, because, again, all we're looking at right now is just a single cluster. Let's say it's a prod cluster, for example.
I can change my aggregation here: instead of aggregating by namespace, I can aggregate by cluster. This is my demo environment, which has two clusters reporting into it. We could talk about the architecture behind the scenes here as well, but what's really beneficial is that now I'm able to see the overall cost per cluster.
That’s pretty interesting, but we can make it even more interesting. I can take advantage of some of these filters that we have. Let’s say I still care about kube-system, because I have it in here.
Let’s say kube-system represented some cost center or business unit that your finance team cares about, or perhaps it represents a product. You can have comma-separated values here. We support wildcards as well.
What we’re trying to do is paint a single view that says: how much does this namespace – in this case, the kube-system namespace – cost across all clusters that it’s running on? It could be as simple as prod and non-prod clusters, but it could span multiple regions, or, like I mentioned, on-prem clusters plus public cloud clusters.
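Programmatically, that cross-cluster view maps to a single query against Kubecost’s allocation endpoint. The parameter names below are an approximation for illustration, not guaranteed to match the current API exactly:

```python
from urllib.parse import urlencode

def allocation_query(base_url, window, aggregate, namespaces):
    """Build a query against the allocation endpoint, filtered to a
    comma-separated list of namespaces (the UI also supports wildcards)."""
    params = {
        "window": window,
        "aggregate": aggregate,
        "filterNamespaces": ",".join(namespaces),
    }
    return f"{base_url}/model/allocation?{urlencode(params)}"

# Cost of kube-system per cluster, over the last 7 days
url = allocation_query("http://localhost:9090", "7d", "cluster", ["kube-system"])
print(url)
```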
Some folks run multiple public cloud clusters these days, across multiple cloud providers. You can have everything aggregating together. It’s a hard view to produce, but once you get this nice single pane of glass, it’s pretty powerful.
One other thing I’ll mention: because this is a multi-cluster setup, you’re basically viewing how much each of these namespaces costs across all the clusters it’s running on. And that’s a really important piece.
Now, if you’re only concerned about specific clusters, that’s fine. You can use our filter here to drill down into specific clusters, and there’s a single UI per cluster as well. But this is a nice way to get an understanding of all the Kubernetes clusters that are running, and all the pieces of the products, applications, and teams, in one aggregate view.
One other area that I’ll mention is that we have some capabilities where you can kind of build a report and save it.
This is nice because maybe you work with someone who’s not as familiar with Kubernetes nomenclature. You could build a report for them. They just know they come in here, they view this single report. It has a specific URL for it.
They can go into that report without having to worry about setting the filters or the aggregations, which is practical for a finance person or a business person. You build it for them. Then it’s like, okay, you want to know the cost per namespace for the month?
Here’s the report you go to from here on out. I no longer need to generate CSVs for you on the fly. You’re able to be self-sufficient and view this yourself.
Another thing I’ll mention is that you can download the data to a CSV file. We also have a full API for everything you see in our UI. If this information is interesting for a finance team or a business team that’s using a tool like Tableau, Looker, or Power BI, they can use our API to ingest the information. That lets the finance user keep their existing tool while the engineers and developers use the Kubecost UI, our command-line utilities, and so on.
Efficiency metrics and recommendations
One of the last areas I want to mention – I know this is an abbreviated demonstration of Kubecost, but I think it’s definitely worth mentioning – is our efficiency metrics.
Everything we’ve reported on so far is in the past tense; it’s what you have spent. What the efficiency column is basically asking is: How efficient was that spend? The lower the percentage here, the more opportunity we think there is for savings.
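Conceptually, efficiency compares what a workload used against what it requested. This is a simplified equal-weight sketch; the real product weights CPU and RAM by their respective costs:

```python
def efficiency(cpu_used, cpu_requested, ram_used, ram_requested):
    """Simplified efficiency: average of usage/request for CPU and RAM.
    (Kubecost cost-weights the two resources; the equal-weight average
    here just illustrates the idea.)"""
    cpu_eff = cpu_used / cpu_requested if cpu_requested else 0.0
    ram_eff = ram_used / ram_requested if ram_requested else 0.0
    return (cpu_eff + ram_eff) / 2

# 50m CPU used of 200m requested, 100Mi RAM used of 200Mi requested
print(f"{efficiency(50, 200, 100, 200):.1%}")  # 37.5%
```

A low number like this is the signal that the requests are oversized relative to real usage.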
I’m just going to hop into our savings section – I think this is the easiest way to view this. If I load the savings section, we have about 15 to 20 different reports that we can surface here based on what we’re observing across your clusters.
But if we look at this one here, Right-Size Your Container Requests: what this shows you is what the container is using in terms of CPU and RAM versus how much it’s requesting.
We then offer up a recommendation based on the usage pattern we’re seeing. But even if we table the recommendation for a second and just think about the difference between usage and requests, you can see a lot of opportunity.
This is a demo environment, so you’re not going to see significant savings here, but this is really difficult data to aggregate together. And because we have your price information – you provided it to us – the last column is able to estimate the savings if you took our recommendation. Let’s say we recommend 2.1 millicores of CPU, but you’re requesting 50 millicores.
So we would say, why don’t you change that request to 10 millicores? We’d estimate that’s a savings of about $4 a month. Now, in a demo environment, $4 a month is probably not worth the effort, but it’s not uncommon, when users first deploy Kubecost to a cluster, to see the top 10 rows here in the multiple thousands of dollars a month.
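The savings estimate itself is straightforward once you have the price data. A rough sketch, using an illustrative (not real) per-CPU-hour price:

```python
def monthly_savings(requested_millicores, recommended_millicores,
                    price_per_cpu_hour, hours_per_month=730):
    """Estimated monthly savings from shrinking a CPU request down to the
    recommendation, given a per-CPU-hour price."""
    delta_cpus = (requested_millicores - recommended_millicores) / 1000
    return delta_cpus * price_per_cpu_hour * hours_per_month

# Requesting 200m, recommendation is 50m, at an illustrative $0.03/CPU-hour
print(round(monthly_savings(200, 50, 0.03), 2))  # 3.29
```

Multiply that across hundreds of oversized containers and the top rows of the report get large quickly.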
Because it’s really hard to manage this, and it’s hard to aggregate this data together. Sometimes it’s kind of the wild west – people aren’t really monitoring this, and maybe they’re not even setting request sizes at all.
This is a great first stop if you’re in that boat where you don’t have request sizes today, you’re not enforcing them, and you want to give teams a nice baseline of where to start. This report can help them understand where the starting point should be.
A couple other things I’ll mention: we have multiple windows here, and the usage pattern comes from the window that you set. We have a one-day, a seven-day, and a 30-day window. There’s also the profile. If you tell us, hey, this is actually a dev cluster, and change this to the dev profile, that changes the target resource utilization that we look for.
That impacts our recommendations a little bit: prod has about a 65 percent target resource utilization, while for dev we’re a little more aggressive, going to about an 80 percent target. Of course, all the filters we had on the previous report are available here for customizing these views.
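A simplified way to see how the target utilization shapes the recommendation: size the request so that observed usage lands at the target percentage. The 65%/80% figures come from the talk; the formula itself is an assumption for illustration:

```python
# Target utilization per profile, per the demo (the exact mechanics of
# Kubecost's recommendation engine are more involved than this sketch).
TARGETS = {"production": 0.65, "development": 0.80}

def recommended_request_millicores(observed_usage_millicores, profile):
    """Recommend a request sized so observed usage sits at the target."""
    return observed_usage_millicores / TARGETS[profile]

# 130m of observed CPU usage
print(round(recommended_request_millicores(130, "production")))   # 200
print(round(recommended_request_millicores(130, "development")))  # 162
```

The more aggressive dev target yields a smaller, cheaper request for the same usage.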
Governance and alerting features of Kubecost
The last piece I’ll briefly mention is the governance and alerting side of this. What’s really interesting here is being able to be a little bit more proactive. Typically what happens is a bill comes in to a finance person or a budget owner.
They’re like, holy cow, why did our cost go up by 5X over the month of July? Eventually it trickles down to us engineers who manage the clusters, and we have to go figure out what the heck happened. By the time it all gets fixed, it’s three months after the bill came in.
What we’re able to do is set up alerts based on budgets and based on efficiency. An example would be a blanket rule: if I ever have a workload running at less than or equal to five percent efficiency, or 15 percent efficiency, just send a proactive alert, maybe in Slack.
I don’t need to get out of bed for it, but I want to be aware, so I’m not fighting some fire two months from now when everybody’s saying, holy cow, we’re wasting all this money. Treating cost-related alerts and metrics as a first-class citizen is a big part of this as well. It can help teams get ahead of this and really reduce the amount of waste they have.
I’ll just stop there. There’s a lot more to cover, but I want to be mindful of time and any questions that anybody has. Anything there in the chat, Alex?
Edward: It looks like we did get a question in the questions section. It says: Can you please explain how to allocate costs for a shared resource within three different LOBs, considering they are not equally consuming that resource? I’m not necessarily sure what an LOB is.
Rob: Great question. LOBs in my world means lines of business. Thanks for the question. We have two different ways that you can distribute that cost.
Here’s how I’d think about it: how do you distribute that shared resource, or the cost of that shared resource? If you come into this view here, you can see that I have two different ways to share it. One is to share evenly.
Let’s say you’ve got three tenants in that cluster, which I think your question alluded to, and I’ve got a shared resource, let’s say it’s a monitoring service, and it costs me $100 a month. If I share that evenly, everybody’s going to get $33.33.
But the more interesting way of doing it – this is more what we’re seeing as a common practice in FinOps – is to share it weighted by cost. What that would do is say, okay, let’s look at those three tenants in your cluster.
Let’s look at how many resources they’re using within the cluster – specifically, we’d look at compute and memory. If one of those tenants is using, say, 75 percent of the compute and memory in the cluster, we’re going to assume they’re responsible for that same percentage of the shared resource.
We consider that more of a proportional distribution, and that distribution uses the in-cluster usage as its ratio. That’s why you see everybody here getting a different amount of the shared resources.
It’s a really hard thing. A lot of teams today, even before Kubernetes, are just taking a single price and dividing it by the total number of users or tenants. This is what we’re seeing as a more “fair” way of distributing those costs.
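The two distribution modes from the demo can be sketched directly; the tenant names and dollar amounts here are invented for illustration:

```python
def share_evenly(shared_cost, tenants):
    """Every tenant gets an equal slice of the shared cost."""
    per_tenant = shared_cost / len(tenants)
    return {t: per_tenant for t in tenants}

def share_weighted_by_cost(shared_cost, tenant_costs):
    """Distribute proportionally to each tenant's own in-cluster
    compute + memory spend."""
    total = sum(tenant_costs.values())
    return {t: shared_cost * c / total for t, c in tenant_costs.items()}

# Three tenants; team-a drives 75% of in-cluster spend, so it carries
# $75 of a $100 shared monitoring bill (vs. $33.33 shared evenly)
tenant_costs = {"team-a": 750.0, "team-b": 150.0, "team-c": 100.0}
print(share_weighted_by_cost(100.0, tenant_costs))
```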
Alex: Rob, looks like we have one more here. The question is: Can Kubecost be integrated with GitOps tools to automatically size and scale your cluster and nodes?
Rob: Fantastic question. When I hear that question, the first place I gravitate to is exactly where we were before: that right-sizing container requests report. What teams do today is take advantage of our API.
I think we have a YouTube video on this, with Spinnaker if I’m not mistaken. The idea is: every time you deploy a container, make a call to the Kubecost API.
See what Kubecost recommends you size that container to in terms of compute and memory, and then maybe run it against a policy. Maybe we don’t take Kubecost as the final say; maybe we want to pad it.
There’s a whole bunch of flexibility you have there with your policies; then take the result and dynamically write it into that container config. So definitely, today the solution is to use our API. As Alex mentioned earlier in the presentation, we have some plans to automate that even more and make it a little easier for you.
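The "pad it with a policy" step might look like this in a deployment pipeline. The pad ratio and floor are hypothetical policy choices, not anything Kubecost prescribes:

```python
import math

def padded_request_millicores(recommendation_m, pad_ratio=0.25, floor_m=10):
    """Apply a local policy on top of a Kubecost right-sizing
    recommendation before writing it into the container spec:
    pad by 25% for headroom and enforce a minimum request."""
    padded = math.ceil(recommendation_m * (1 + pad_ratio))
    return max(padded, floor_m)

# Kubecost recommends 50m; the pipeline deploys 63m
print(padded_request_millicores(50))  # 63
# A tiny recommendation still gets the 10m floor
print(padded_request_millicores(2))  # 10
```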
Alex: We actually had a name for this one: Continuous Cost Efficiency. You identify the cost efficiency within Kubecost, then use GitOps or DevOps tools to adjust the deployment and resources accordingly using those inputs from Kubecost, collect new metrics, and repeat. You create this flywheel, which is really cool. It’s one super exciting use case of how customers have leveraged this data.
Rob: One other thing I’ll mention, just because it might not be clear, is the architecture of Kubecost. I know there are a lot of technical folks on the call who might be interested.
Kubecost comes bundled with a bunch of things that you’re already familiar with like Prometheus, Kube-State-Metrics, and cAdvisor. We can take advantage of existing implementation of these within your clusters. We can talk about that if you have specific questions there.
But we are a totally self-hosted solution. The data does not beacon out to us; we’re not storing your data. It always resides within your clusters. For the multi-cluster view, we use something like Thanos, Cortex, or VictoriaMetrics to aggregate data from the Prometheus in each of the individual clusters and write it to an S3-compatible storage bucket. That provides longer-term, more reliable storage as well as that nice aggregated view.
A question that comes up quite often when people are thinking about implementing Kubecost is: Are you guys storing all my data? The answer to that is no. But we do have a SaaS solution for that in beta right now.
If you’re interested in, “Hey, I don’t want to manage this data; I’m comfortable with you guys managing it,” that is something that’s in beta today.
Edward: I just want to summarize what we went over today. We took a look at Mirantis Kubernetes Engine from a UI perspective, as well as adding that cluster directly to Lens, the Kubernetes platform.
We chatted a little bit about Lens as well. It’s a multi-cluster management tool that works with any certified Kubernetes distro giving you that full situational awareness of everything that’s happening within your cluster.
We were able to deploy Kubecost directly to MKE via the Helm chart that Kubecost provides. We’re all leveraging community editions here. From there, we handed the ball over to Rob, who gave us a deep understanding of many of the things Kubecost can do. I know he wasn’t able to get to all of them due to timing, but it was a fantastic session.
Thanks for reading! If you’d like to view the full recording, you can access it here.