
Mirantis k0rdent and NVIDIA BlueField: A Practical Blueprint for Building Next-Generation AI Infrastructure


AI Changes the Cloud Pattern 

AI workloads are dramatically changing how we think about deployment patterns in cloud-native systems. Traditionally, the industry has worked to separate workloads from infrastructure—abstracting applications away from hardware to support portability. These patterns emerged with the rise of cloud models from hyperscalers, designed to meet a wide range of needs and offer broad, flexible services to everyone. AI workloads and accelerated computing break these patterns. They often rely on very specific physical features of the infrastructure—not just access to GPUs, but also the complex systems that support them, such as accelerated networking (like InfiniBand, NVLink, and RoCE) and high-speed, high-throughput storage.

These specialized resources represent a significant investment, making it essential to maximize return through efficient utilization and workload optimization. To achieve this, infrastructure must be defined and provisioned dynamically at every layer, while still maintaining strong security and isolation. In short, we need to deliver true, declarative Infrastructure as a Service (IaaS).

Challenges 

Delivering infrastructure for high-performance computing brings unique challenges. These include allocating and slicing GPUs, enabling RDMA networking, meeting advanced scheduling needs, tuning for performance, and efficiently scaling platforms like Kubernetes. The complexity increases when solutions must support multiple, dynamic workloads shared across different users.

This becomes even harder in environments that require multi-tenancy—such as cloud service providers or shared developer platforms—where users expect the dynamic flexibility of cloud services, and the challenges and complexity only grow.

AI factories must address these challenges across both workload and infrastructure configuration, including:

  • Time to Value: AI infrastructure often needs specialized fine-tuning and setup, taking more time to configure and learn than traditional systems.

  • Multi-tenancy: Supporting multi-tenancy is essential for data security, resource sharing, and managing contention.

  • Data Sovereignty: AI workloads are driven by confidential data and often contain proprietary models and weights. Controlling where and how this data is used is critical.

  • Scale and Sprawl: The infrastructure used for AI typically either comprises a large number of compute systems or is highly distributed for edge and IoT-type workloads.

  • Resource Scarcity: GPUs and other key compute resources are scarce, so they must be shared and used wherever they’re available.

  • Skills Gap: Many AI projects are run by data scientists or developers who are not infrastructure specialists and don’t want to be.

Cloud and GPU as a Service Providers 

Cloud service providers (CSPs) and GPU-as-a-Service (GPUaaS) providers also have a unique set of challenges they must overcome to deliver multi-tenant services. The requirements for these environments include:

  • Compliance with regional and use-case-specific model and data standards

  • Safeguarding data privacy as well as business and application security

  • Maximizing resource utilization and efficiency

    • Optimizing GPU selection and partitioning

    • Reducing network latency and improving performance

    • Optimizing storage for performance and cost

  • Monitoring for service continuity

    • Tracking CPU and GPU health and utilization

    • Detecting failure events in networking, storage, and support infrastructure

    • Benchmarking model and application performance and latency

  • Rapidly iterating and deploying models and applications

  • Tracking and managing costs effectively

  • Managing all of the above swiftly and without service disruption

  • Ensuring strict tenant isolation at all times

Infrastructure as a Service 

One clear answer to these challenges is to build a comprehensive Infrastructure as a Service (IaaS) solution that lets you define everything you need—declaratively—from bare metal up through the software stack.

To make IaaS work for AI, you need a set of foundational layers that can be combined into flexible, repeatable patterns for service delivery to end users. The key is using templates, so infrastructure can be deployed consistently and configuration drift is prevented.

Conceptual Overview Diagram - Simplified for readability


With this templating model, you can define services that cover everything from the infrastructure layer up to virtual machines and Kubernetes clusters—making sure the applications get exactly what they need.

The overall approach allows infrastructure provisioning to be automated as a service, and is made up of a number of layers that combine to provide the essential services to run your applications:

  • Management and Observability: The overall control plane for the infrastructure, using a declarative, template-based model.

  • Infrastructure Layer: The provisioning and configuration of compute hosts and the associated network infrastructure.

  • Operating System Layer: Provisioning of the operating system on bare metal hosts, as well as the guest operating systems in the virtualization platform (OpenStack or KubeVirt).

  • Platform Layer: Automation of Kubernetes as a Service, virtualization, and dedicated bare metal hosts.

  • Application Platform Layer: Automation of PaaS solutions to offer specialized services such as inference (e.g., NVIDIA Dynamo, Run.ai, llm-d).

Multi-Tenanted Kubernetes as a Service with Strong Isolation

There are many patterns that the templating approach can support, but we will focus on how a service provider or enterprise can create a multi-tenant Kubernetes as a Service (KaaS) offering with strict isolation at all layers of the stack. This supports complex networking and isolation while still giving tenants full access to their Kubernetes cluster, including the freedom to choose their own CNI at the cluster level.

 The outcome here is a Kubernetes cluster with the essential services pre-deployed and configured, allowing for the immediate deployment of Kubernetes workloads. The clusters are delivered with the following characteristics:

  • Strict separation between Kubernetes workers and the control plane.

  • Fully hosted and managed Kubernetes control plane

  • Strict network isolation at the network interface level (tenants only have access to the interfaces assigned to them, and those interfaces only have access to the VLANs assigned to the tenant)

  • Isolated, immutable worker operating system (even if a tenant compromises the worker OS, they cannot reach the network fabric or the hypervisor host)

Environment Overview 

We use k0rdent to deliver a multi-tiered service that provides on-demand KaaS clusters with complete tenant isolation. Achieving secure multi-tenancy at every layer of the stack requires the following setup: 

  1. Bare metal compute hosts with NVIDIA BlueField-3 DPUs

  2. Bare metal Kubernetes cluster deployed on the hosts

  3. Separate Kubernetes cluster deployed on BlueField-3 DPUs (see below)

  4. Virtualization layer orchestrated by KubeVirt

  5. Isolated tenant clusters with k8s workers as VMs and hosted control plane

Simplified View

To help with some of the terminology used here:

  • Mothership Cluster: The Kubernetes cluster that hosts the management and observability components and acts as the overall control plane.

  • Child Cluster: A Kubernetes cluster that is deployed and managed by k0rdent from the Mothership.

    • NOTE: these are defined as Cluster Deployment objects (a sketch follows this list)

  • Cluster Templates: Templates that define the settings and configuration of a type of cluster deployment, typically scoped to a particular type of infrastructure provider.

  • Service Templates: Templates that define a Kubernetes application to be run on a cluster. They can include complex configuration and can leverage information from the deployed clusters.

  • DPU Cluster: A Kubernetes cluster where the workers run on the BlueField-3 DPU Arm cores.
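
To make the model concrete, here is a minimal sketch of a Cluster Deployment object that ties a cluster template to a set of service templates. The API version, template names, namespaces, and configuration keys are illustrative assumptions, not values from a specific k0rdent release:

    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: tenant-a-cluster            # the child cluster to create (illustrative name)
      namespace: kcm-system
    spec:
      # Cluster template scoped to an infrastructure provider (illustrative name)
      template: kubevirt-standalone-cp-0-1-0
      config:
        controlPlaneNumber: 3
        workersNumber: 4
      serviceSpec:
        services:
          # Service template deployed into the child cluster (illustrative name)
          - name: ingress-nginx
            template: ingress-nginx-4-11
            namespace: ingress-nginx

The Mothership reconciles this object: it provisions the cluster from the cluster template, then applies each referenced service template into the resulting child cluster.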

 

BlueField-3 Setup

The BlueField-3s are configured in DPU mode and ZeroTrust mode for host isolation, and then provisioned using k0rdent. The physical connection setup of the cards can be specified depending on the number of available BlueField cards in a host and the environment's needs, supporting both Ethernet and InfiniBand on a single host. For the setup here, both ports are connected to an Ethernet switch.

The BlueField-3 DPUs are initialized with a custom image by the k0rdent bare metal operator, based on the specific machine template defined for them. 

When the cards are initialized they are configured as follows:

  • ZeroTrust is enabled through Redfish ( '{"Oem": {"Nvidia": {"HostPrivilege": "Restricted"}}}' )

  • VirtIO emulation is enabled along with PCI Switch emulation and the Host PCIe RShim Interface is disabled

Example code block (the MST device path shown is illustrative; use the path reported by mst status on your system):

    mlxconfig -d /dev/mst/mt41692_pciconf0 set \
        VIRTIO_NET_EMULATION_ENABLE=1 \
        VIRTIO_NET_EMULATION_NUM_PF=10 \
        VIRTIO_NET_EMULATION_NUM_VF=10 \
        PCI_SWITCH_EMULATION_ENABLE=1 \
        PCI_SWITCH_EMULATION_NUM_PORT=10 \
        RSHIM_ENABLE=0

k0rdent is used to perform the following actions based on a cluster definition template:

  • The k0rdent Bare Metal Operator registers and provisions the DPU with a custom image that includes the k0s (a CNCF Sandbox Kubernetes distribution) worker

  • The Kubernetes cluster provisioned on the DPU uses a hosted control plane template and leverages a secure control plane provided by k0smotron, ensuring strict worker and control plane isolation

  • The NVIDIA DOCA Platform Framework (DPF) components are deployed onto the DPU cluster using a k0rdent service template (a sketch follows this list)
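
As a rough illustration, a service template that delivers the DPF components as a Helm chart might look like the following. This is a minimal sketch; the template name, chart reference, version, and repository are all illustrative assumptions, not published artifact names:

    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ServiceTemplate
    metadata:
      name: dpf-operator            # illustrative template name
      namespace: kcm-system
    spec:
      helm:
        chartSpec:
          chart: dpf-operator       # illustrative chart name
          version: "1.0.0"          # illustrative version
          sourceRef:
            kind: HelmRepository
            name: nvidia-dpf        # illustrative repository reference

Referencing a template like this from a Cluster Deployment object's service list causes k0rdent to install the chart into the DPU cluster once it is provisioned.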

k0smotron Overview

Hypervisor and Bare Metal Host Provisioning

The hypervisor hosts are Kubernetes workers provisioned on bare metal hosts, using the k0rdent bare metal provider and a k0smotron-hosted control plane. (This same pattern could be offered to a tenant as a dedicated Kubernetes cluster with isolated networking.) The Kubernetes cluster then has KubeVirt deployed on it to provide the orchestration layer for virtual machines.

Deployment steps (a sketch of the resulting Cluster Deployment object follows this list):

  1. Network creation: A k0rdent network object is created for the tenant, defining the tenant network name, IP addresses (IPAM support is included), and VXLAN ID

  2. Management interfaces: The k0rdent Network Operator assigns a port for Kubernetes cluster management on each host and configures OVS (in a typical VM deployment, this step would not be necessary)

  3. Node discovery: k0rdent bare metal (leveraging Ironic and Metal3) interrogates the node and captures details about the I/O systems and node configuration (this includes discovery of the management network port)

  4. Node pool: The node is assigned to a node pool, ready to be used for a deployment

  5. Cluster deployment: A k0rdent Cluster Deployment object is created referencing the node pool and the network to be used

  6. Node provisioning: The operating system and Kubernetes are deployed by k0rdent (leveraging CAPI and the Metal3 provider)

  7. GPU Operator: The NVIDIA GPU Operator is deployed onto the cluster by a service template defined in the Cluster Deployment object (configuration options apply here depending on the setup)

  8. KubeVirt deployment: KubeVirt is deployed to the cluster by a service template defined in the Cluster Deployment object
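
Mapping these steps onto the declarative model, the hypervisor cluster definition might look roughly like this. All template, pool, and network names below are illustrative assumptions:

    apiVersion: k0rdent.mirantis.com/v1alpha1
    kind: ClusterDeployment
    metadata:
      name: hypervisor-cluster
      namespace: kcm-system
    spec:
      template: baremetal-hosted-cp-0-1-0   # illustrative bare metal cluster template
      config:
        nodePool: gpu-hosts                 # node pool from step 4 (illustrative key and name)
        network: tenant-a-net               # network object from step 1 (illustrative key and name)
      serviceSpec:
        services:
          - name: gpu-operator              # step 7 (illustrative template name)
            template: gpu-operator-v24-6
            namespace: gpu-operator
          - name: kubevirt                  # step 8 (illustrative template name)
            template: kubevirt-1-3
            namespace: kubevirt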

Bare Metal Host

The bare metal host templates apply specific environment settings through cloud-init to ensure the host is properly prepared for virtualization. These include the following (assuming an Intel-based host; a combined cloud-init sketch follows this list):

  • Enable IOMMU to support greater isolation on the host

    • intel_iommu=on iommu=pt

  • Enable PCI Reallocation to support reusing PCI addresses

    • pci=realloc

  • Disable the nouveau module and deploy the NVIDIA GPU drivers

    • echo "blacklist nouveau" | sudo tee /etc/modprobe.d/blacklist-nouveau.conf

    • echo "options nouveau modeset=0" | sudo tee -a /etc/modprobe.d/blacklist-nouveau.conf

GPU Operator

When deploying the hypervisor cluster, a choice needs to be made between vm-passthrough (VFIO) and vm-vgpu (vGPU), and set in the service template for the GPU Operator. (While it is possible to deploy a cluster with both, you will then need to set the workload type at the host level.)

The choice here is set in the Cluster Deployment object. This ensures that the configuration for the deployment is correct and that the node labels are set appropriately, e.g. for vGPU; a sketch of the underlying GPU Operator setting follows.
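
Under the hood, the GPU Operator exposes this choice through its sandbox workloads settings. A minimal sketch (the default shown is one of the two options discussed, not a prescribed configuration):

    apiVersion: nvidia.com/v1
    kind: ClusterPolicy
    metadata:
      name: cluster-policy
    spec:
      sandboxWorkloads:
        # Enable VM-oriented GPU workloads (passthrough or vGPU) alongside containers
        enabled: true
        # Cluster-wide default; individual hosts can override it via the
        # nvidia.com/gpu.workload.config node label (e.g. vm-vgpu)
        defaultWorkload: vm-passthrough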

Advanced networking - Hypervisor Cluster

At this point, we have a fully managed Kubernetes cluster on bare metal with strict isolation, running KubeVirt and KVM, ready for the deployment of tenant child clusters. For clarity, we will refer to this as the hypervisor cluster (it is actually a child cluster). The "boxes in boxes" and "turtles all the way down" memes apply here.

This environment has been configured to support passthrough for the GPU, and can support SR-IOV for the network should a workload require it. A sketch of a virtual machine consuming a passed-through GPU follows.
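
To make the passthrough mechanics concrete, here is a minimal sketch of a KubeVirt VirtualMachine that requests a passed-through GPU. The VM name, sizing, image, and GPU resource name are illustrative; the resource name depends on what the GPU Operator advertises on the host:

    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: tenant-worker-0                 # illustrative VM name
    spec:
      runStrategy: Always
      template:
        spec:
          domain:
            cpu:
              cores: 8
            memory:
              guest: 32Gi
            devices:
              # Request a VFIO-passthrough GPU by its extended resource name,
              # as advertised by the GPU Operator on the hypervisor host
              gpus:
                - name: gpu0
                  deviceName: nvidia.com/GH100_H100_PCIE   # illustrative resource name
              disks:
                - name: rootdisk
                  disk:
                    bus: virtio
          volumes:
            - name: rootdisk
              containerDisk:
                image: quay.io/containerdisks/ubuntu:24.04   # illustrative image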

Child Clusters 

The deployment of child clusters is very similar to that of the bare metal cluster, only instead of the bare metal operator, we are now using the KubeVirt provider to act as the provisioner. The cluster is built declaratively by defining a Cluster Deployment object that leverages the KubeVirt cluster templates to deploy a cluster.

Advanced networking - Child Clusters

Cluster Templates

The child cluster templates used will be dependent on the GPU provisioning strategy selected earlier. Again, a unique Cluster Deployment object is created for the cluster, which will then leverage the k0rdent KubeVirt provider and network provider to provision the cluster.

The steps that the system takes are as follows:

  1. The Network Operator creates the network based on the IPAM rules (or user-assigned addresses)

  2. The Network Operator then configures the OVS network flows via the ML2 plugin mechanism, setting up the flows to link the VirtIO port and tag it on the appropriate network

  3. k0rdent then deploys the cluster by creating the VMs and attaching them to the assigned ports

Conclusion

As AI workloads continue to redefine infrastructure requirements, it’s clear that the traditional abstractions of cloud-native architectures no longer suffice. These workloads demand precision through specialized hardware, tightly integrated networking, secure multi-tenancy, and real-time provisioning, all orchestrated efficiently and at scale.

By combining the declarative, Kubernetes-native capabilities of Mirantis k0rdent with the high-performance data processing and isolation features of NVIDIA BlueField DPUs, organizations can build infrastructure that is not only optimized for AI but also governed, flexible, and cost-effective. This approach enables a new class of Infrastructure-as-a-Service: composable, secure, and tailored to the needs of dynamic, data-driven workloads.

Whether you're a platform team architecting GPUaaS solutions or a service provider scaling out AI offerings, the integration of k0rdent and BlueField offers a practical blueprint for building next-generation infrastructure, one that meets the performance, governance, and agility demands of the AI era.

Shaun O'Meara

Shaun O'Meara is the Chief Technical Officer at Mirantis.
