Mirantis OpenStack for Kubernetes 24.2 unveils Dynamic Resource Balancer for efficient resource management and cost optimization

Michelle Yakura - July 09, 2024

One of a VMware admin’s worst nightmares is when production servers max out and shut down, leaving customers hanging. Today, as VMware users seek alternative virtualization solutions following Broadcom’s takeover, many want to know whether OpenStack infrastructure-as-a-service can offer resource management capabilities similar to those of the vSphere Distributed Resource Scheduler, which many admins rely on to keep load balanced across hosts.

Fortunately, the Mirantis OpenStack for Kubernetes (MOSK) 24.2 release delivers a technical preview of the new Dynamic Resource Balancer (DRB) service, easing the transition to OpenStack for VMware users who utilize automated resource scheduling and load balancing in everyday operations. For advanced OpenStack users, DRB also helps to maintain optimal performance and stability by preventing noisy neighbors and hot spots in an OpenStack cluster, which can cause latency or other issues.

Optimizing placement of virtualized workloads

The Dynamic Resource Balancer is an extensible framework unique to Mirantis OpenStack for Kubernetes that lets cloud operators keep workloads optimally placed in their cloud environment without manual recalibration. It collects resource usage metrics for OpenStack nodes every 5 minutes and automatically redistributes workloads from nodes that have surpassed predefined, customizable load limits. DRB is included out of the box with MOSK 24.2 but is not active until the cloud operator enables it.

With the addition of DRB, MOSK now offers a comprehensive set of resource scheduling and optimization capabilities, including:

  • Initial workload placement - Recommends where to put a new VM

  • Balancing cluster capacity - Automatically migrates VMs during maintenance without disruption, improving service levels by guaranteeing resources to VMs. Enables each system administrator to monitor and manage more infrastructure.

  • Cluster maintenance - Determines the optimum number of nodes for simultaneous maintenance

  • Constraint correction - Redistributes VMs after hypervisor failure, considering hypervisor affinity

  • Automated load balancing - Auto-migrates VMs to optimize performance 

Dynamic Resource Balancer architecture

The Dynamic Resource Balancer consists of three main components: the collector, which gathers data about server usage; the scheduler, which makes scheduling decisions about workload placement; and the actuator, which live-migrates approved workloads to appropriate node(s). By default, DRB uses the included Mirantis StackLight observability and analytics tooling as the collector.

Cloud operators can limit workload redistribution to specific availability zones or groups of compute nodes as needed. Additionally, the collector, scheduler, and actuator are Python plugins that cloud operators can customize with the most relevant data sources and decision criteria for their workloads and environments (e.g., to apply power supply metrics, use different observability tooling as the collector, or execute cold migrations).
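
As a rough illustration of how the three plugin roles fit together, the following Python sketch models a collector, a scheduler, and an actuator as small classes. All class, method, and field names here (HostMetrics, collect, plan, apply, and so on) are hypothetical and are not the actual DRB plugin API; the real interfaces are described in the MOSK documentation.

.. code-block:: python

    from dataclasses import dataclass
    from typing import Protocol


    @dataclass
    class HostMetrics:
        """Hypothetical per-host sample; field names are illustrative only."""
        host: str
        cpu_load_pct: float       # average CPU load over the sampling window
        power_watts: float = 0.0  # example of an extra, operator-specific metric


    class Collector(Protocol):
        def collect(self) -> list[HostMetrics]: ...


    class Scheduler(Protocol):
        def plan(self, metrics: list[HostMetrics]) -> list[tuple[str, str]]: ...


    class Actuator(Protocol):
        def apply(self, moves: list[tuple[str, str]]) -> int: ...


    class StaticCollector:
        """Stand-in collector: returns canned samples instead of querying monitoring."""

        def __init__(self, samples):
            self._samples = list(samples)

        def collect(self):
            return list(self._samples)


    class HottestToCoolestScheduler:
        """Toy scheduler: propose one (source, target) move off the hottest host."""

        def __init__(self, load_threshold):
            self.load_threshold = load_threshold

        def plan(self, metrics):
            if len(metrics) < 2:
                return []
            hottest = max(metrics, key=lambda m: m.cpu_load_pct)
            coolest = min(metrics, key=lambda m: m.cpu_load_pct)
            if hottest.cpu_load_pct <= self.load_threshold:
                return []
            return [(hottest.host, coolest.host)]


    class LoggingActuator:
        """Dry-run actuator: records proposed moves instead of migrating anything."""

        def __init__(self):
            self.log = []

        def apply(self, moves):
            self.log.extend(moves)
            return len(moves)


    def run_cycle(collector, scheduler, actuator):
        """Collect, plan, act: one pass through the three plugin roles."""
        return actuator.apply(scheduler.plan(collector.collect()))

Swapping any one of the three objects passed to run_cycle changes the behavior without touching the other two, which is the kind of separation the plugin design is meant to provide.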

MOSK implements DRB as a Kubernetes operator, which is controlled by a DRBConfig custom resource created by the cloud operator. Below is an example of a DRBConfig custom resource, which defines various parameters for workload redistribution, including the maximum number of parallel migrations, which workloads to consider for redistribution, the threshold for identifying overloaded compute hosts, etc.

.. code-block:: yaml

    apiVersion: lcm.mirantis.com/v1alpha1
    kind: DRBConfig
    metadata:
      name: drb-test
      namespace: openstack
    spec:
      actuator:
        max_parallel_migrations: 10
        migration_polling_interval: 5
        migration_timeout: 180
        name: os-live-migration
      collector:
        name: stacklight
      hosts: []
      migrateAny: false
      reconcileInterval: 300
      scheduler:
        load_threshold: 80
        min_improvement: 0
        name: vm-optimize

For a full list of the DRBConfig parameters and their explanations, please see the MOSK 24.2 Reference Architecture.

Deciding which workloads to redistribute

So how does the scheduler decide which workloads to live-migrate, and where they should go? By default, the scheduler tries to minimize load imbalances across a cluster. It assesses the load of each host per CPU core, so that different types of compute hosts can be compared. It then uses the 5-minute average CPU load and the load_threshold parameter (80% by default) to identify overloaded compute hosts: a host whose load exceeds the threshold is considered overloaded.
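
To make the per-core comparison concrete, here is a minimal Python sketch. It assumes, purely for illustration, that each host reports its load as a number of busy cores sampled over the 5-minute window; the function names and input format are hypothetical and do not reflect how DRB or StackLight actually represent metrics.

.. code-block:: python

    from statistics import mean


    def normalized_load_pct(busy_core_samples, core_count):
        """Average the window's samples and express them as % of the host's cores."""
        if core_count <= 0:
            raise ValueError("core_count must be positive")
        return 100.0 * mean(busy_core_samples) / core_count


    def overloaded_hosts(hosts, load_threshold=80.0):
        """Return the names of hosts whose normalized load exceeds the threshold.

        hosts maps a host name to (busy_core_samples, core_count).
        """
        return sorted(
            name
            for name, (samples, cores) in hosts.items()
            if normalized_load_pct(samples, cores) > load_threshold
        )


    hosts = {
        "cmp-small": ([14, 15, 13], 16),  # 14 of 16 cores busy on average
        "cmp-big": ([40, 44, 48], 96),    # 44 of 96 cores busy on average
    }
    print(overloaded_hosts(hosts))

In this example the smaller host is flagged even though the larger host has more busy cores in absolute terms, because normalizing by core count puts both hosts on the same scale.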

However, it doesn’t always make sense to move workloads off a compute host just because it’s overloaded. After all, any VM migration inherently costs some resources and involves a level of risk. Thus, DRB tries to predict how beneficial it would be to redistribute workloads from a compute host, and applies the min_improvement threshold to determine whether live migration should occur. If DRB determines that sufficient benefit will result, live migrations of VMs occur in parallel, starting with the least loaded VMs.
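
The following Python sketch shows one way such a benefit check could work. It uses the standard deviation of host loads as the imbalance measure, which matches the controller log steps described later in this post; how DRB actually scales and compares min_improvement is not stated here, so the comparison below (improvement in load percentage points) is an assumption, as are all function names.

.. code-block:: python

    from statistics import pstdev


    def predict_after_move(host_loads, vm_load, source, target):
        """Return a copy of host_loads with vm_load shifted from source to target."""
        predicted = dict(host_loads)
        predicted[source] -= vm_load
        predicted[target] += vm_load
        return predicted


    def should_migrate(host_loads, vm_load, source, target, min_improvement=0.0):
        """Compare load spread before and after a hypothetical move.

        Returns (decision, improvement), where improvement is the drop in the
        population standard deviation of host loads.
        """
        before = pstdev(list(host_loads.values()))
        predicted = predict_after_move(host_loads, vm_load, source, target)
        after = pstdev(list(predicted.values()))
        improvement = before - after
        return improvement > min_improvement, improvement


    def migration_order(vm_loads):
        """Order candidate VMs from least to most loaded."""
        return sorted(vm_loads, key=vm_loads.get)


    loads = {"cmp-1": 90.0, "cmp-2": 30.0, "cmp-3": 60.0}
    decision, gain = should_migrate(loads, 20.0, "cmp-1", "cmp-2")
    print(decision, round(gain, 2))
    print(migration_order({"vm-a": 20.0, "vm-b": 5.0}))

Raising min_improvement makes the check more conservative: a move that only slightly narrows the spread is rejected, which avoids paying the cost of a migration for a marginal gain.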

DRB can operate in either of two modes, selected with the migrateAny parameter. By default, any workload can be migrated unless it is explicitly tagged for exclusion. Alternatively, cloud operators can configure DRB to migrate only workloads that are explicitly tagged for redistribution. Similarly, the DRBConfig can specify which hypervisors to target. Developers can tag their workloads for DRB inclusion or exclusion using a CLI command.
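
The two modes amount to an opt-out filter versus an opt-in filter. The Python sketch below illustrates the idea, assuming that a true migrateAny value corresponds to the opt-out mode. The tag names drb-include and drb-exclude are placeholders invented for this example, not the tags DRB actually recognizes, and the instance dictionaries are a simplified stand-in for real server records.

.. code-block:: python

    def eligible_instances(instances, migrate_any,
                           include_tag="drb-include", exclude_tag="drb-exclude"):
        """Filter instances according to the selected mode.

        migrate_any=True:  opt-out mode, everything except excluded instances.
        migrate_any=False: opt-in mode, only explicitly included instances.
        Tag names are hypothetical placeholders.
        """
        if migrate_any:
            return [i["name"] for i in instances if exclude_tag not in i["tags"]]
        return [i["name"] for i in instances if include_tag in i["tags"]]


    instances = [
        {"name": "web-1", "tags": []},
        {"name": "db-1", "tags": ["drb-exclude"]},
        {"name": "batch-1", "tags": ["drb-include"]},
    ]
    print(eligible_instances(instances, migrate_any=True))
    print(eligible_instances(instances, migrate_any=False))

The opt-in mode is the safer starting point when only a few workloads are known to tolerate live migration well.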

As of the MOSK 24.2 release, DRB can make scheduling decisions only for basic resource classes, including RAM, disk, and vCPU. Support for more complex resources, such as NUMA, CPU pinning, and hugepages dataplane acceleration, is planned for a future release.

Existing customers who upgrade to MOSK 24.2 can use the noop plugin to do a dry-run of DRB with their workloads and environments, and see the decision-making process in controller logs.

For a successful workload redistribution, here is a summary of the main steps recorded in the DRB controller log:

  1. Collecting data for all hosts, including compute node and instance metrics

  2. Choosing instance subjects and compute node targets

  3. Identifying overloaded nodes

  4. Calculating the load standard deviation and comparing it with the predicted load standard deviation after redistribution

  5. Deciding that redistribution should occur and estimating the improvement for the overall cluster

  6. Requesting and executing live migration to the target node

  7. Sleeping for 5 minutes to allow the metrics to settle
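
The steps above can be sketched as a single pass in Python. This is an illustrative model only: the function name, its parameters, the log wording, and the way the best move is chosen are assumptions made for this example and do not reproduce the DRB controller's code or log format. Passing no migrate callable gives a dry run that only reports the decision.

.. code-block:: python

    import time
    from statistics import pstdev


    def reconcile_once(host_loads, vm_loads, placement, load_threshold=80.0,
                       min_improvement=0.0, migrate=None, settle_seconds=300,
                       sleep=time.sleep):
        """Run one illustrative pass and return the log messages it produced.

        host_loads: {host: load %}; vm_loads: {vm: load % it contributes};
        placement: {vm: host}. migrate(vm, target) is supplied by the caller.
        """
        log = [f"collected metrics for {len(host_loads)} hosts, "
               f"{len(vm_loads)} instances"]
        overloaded = [h for h, load in host_loads.items() if load > load_threshold]
        log.append(f"overloaded nodes: {sorted(overloaded)}")
        if not overloaded:
            return log

        before = pstdev(list(host_loads.values()))
        best = None  # (predicted stddev, vm, target)
        for vm, source in placement.items():
            if source not in overloaded:
                continue
            for target in host_loads:
                if target == source or target in overloaded:
                    continue
                predicted = dict(host_loads)
                predicted[source] -= vm_loads[vm]
                predicted[target] += vm_loads[vm]
                after = pstdev(list(predicted.values()))
                if best is None or after < best[0]:
                    best = (after, vm, target)

        if best is None or before - best[0] <= min_improvement:
            log.append("no move gives enough improvement; skipping")
            return log
        after, vm, target = best
        log.append(f"stddev {before:.2f} -> {after:.2f}; migrating {vm} to {target}")
        if migrate is not None:
            migrate(vm, target)
            sleep(settle_seconds)  # let metrics settle before the next pass
        return log

Injecting the sleep and migrate callables keeps the sketch testable without waiting five minutes or touching a real cloud.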

To learn more about DRB, please refer to the MOSK 24.2 Reference Architecture and User Guide.

See the Dynamic Resource Balancer in action

If you’d like to see how DRB works, sign up for our webinar on Thursday, July 25: MOSK Live Demo: See the New Dynamic Resource Balancer

Contact us to meet with one of our cloud architects and learn how we can help you get started with a PoC of MOSK deployed with the DRB service tuned for your specific workloads.
