Accelerating AI Cloud Maturity with Mirantis k0rdent AI and the NVIDIA AI Cloud Ready Initiative
Building a GPU cloud is the easy part. The hard part is making it commercially successful.
Neoclouds and NVIDIA Cloud Partners (NCPs) face a significant challenge: customers expect a hyperscaler-like experience and multi-tenancy from day one. Developing the required IT operational capabilities from scratch is hard: bare metal automation, Kubernetes orchestration, GPU virtualization, observability, and a service catalog can take months of engineering effort before a single dollar of cloud revenue comes in. Most providers end up either under-building (limiting their margins and customer base) or overspending on custom integration work that doesn't differentiate them in the market.
Mirantis addresses this gap as part of the NVIDIA AI Cloud Ready Initiative, a strategic program that enables Independent Software Vendors (ISVs) to build and validate complete, full-stack AI Factory solutions.
Drawing on over a decade of building and operating cloud infrastructure for global telcos and Fortune 500 enterprises, Mirantis developed k0rdent AI as a turnkey Metal-to-Model™ platform: a single, integrated stack that enables Neoclouds and NCPs to better leverage their infrastructure investments and bring increasingly sophisticated AI-powered offerings to market.
With Mirantis k0rdent AI, Neoclouds and NCPs can go from zero to cloud on NVIDIA-certified hardware, enabling them to capitalize on the economic opportunity from day one.
From Zero to Cloud Revenue in Days, Not Months
The AI Cloud Maturity Model gives service providers a stage-by-stage path to progressively improve margins, utilization, and scalability.
Stage 1: Automated AI Services Platform
k0rdent AI gives operators the building blocks to skip months of custom integration work and start generating revenue from their infrastructure investments immediately.
Move beyond commodity Hardware-as-a-Service to deliver Bare Metal-as-a-Service with API-driven lifecycle management, elasticity, and differentiated, value-added AI services such as NVIDIA Run:ai, Slurm, Ray, and inference frameworks. k0rdent AI provides self-service, API-driven bare metal Kubernetes, using NVIDIA's NCX Infra Controller to automate complex operations below the OS and Kubernetes stack while unlocking higher margins and reducing operational overhead.
Mirantis integrates NCX Infra Controller's lifecycle automation with k0rdent AI's provisioning and cluster management layers, extending NCX Infra Controller's capabilities into a fully orchestrated, production-ready platform rather than leaving it a standalone tool. For more information about the integration of k0rdent AI with NVIDIA NCX Infra Controller, please see our deep dive blog.
Stage 2: Multi-Tenant GPU Cloud
Improve the economics and efficiency of existing hardware with virtualization and hardware-enforced multi-tenancy. Providers can precisely match GPU resources to workload requirements and increase profitability through higher client density and maximized GPU utilization.
Stage 3: Self-Service AI Cloud
With k0rdent AI, operators can scale revenue faster than OpEx by delivering a turnkey AI cloud with customer self-service and custom marketplace capabilities, building new revenue streams while users deploy and manage GPU resources themselves. The NVIDIA AI Cloud Ready Initiative ensures the end-to-end architecture is tested and validated to support full automation at scale, while Mirantis provides production-proven expertise and optional fully managed remote operations so Neoclouds and NCPs can focus on growing their business, not managing infrastructure.
A Validated, Full-Stack Integration Architecture
Mirantis k0rdent AI is closely aligned with NVIDIA NCP reference designs and validation frameworks, supporting the full breadth of NVIDIA's hardware portfolio: NVIDIA Ampere, NVIDIA Hopper, and NVIDIA Blackwell GPU architectures; NVIDIA DGX, NVIDIA HGX, and NVIDIA MGX validated servers; NVIDIA Quantum InfiniBand and NVIDIA Spectrum-X Ethernet network switches; and NVIDIA Grace Blackwell NVL72 rack-scale systems. The platform also provides a forward-looking foundation for the upcoming NVIDIA Vera Rubin NVL72 rack-scale system, giving operators a future-proof, single umbrella API across multiple GPU fabric generations.
The k0rdent AI stack covers three integrated layers:
IaaS Layer: NVIDIA NCX Infra Controller delivers zero-touch lifecycle automation from hardware discovery and Bill of Materials (BOM) validation through firmware management and secure tenant transitions. k0rdent AI integrates through open standard interfaces to drive automated Kubernetes cluster provisioning on validated, attested bare metal nodes.
PaaS Layer: k0rdent Cluster Management (KCM) handles the full Kubernetes cluster lifecycle declaratively, while k0rdent State Management (KSM) manages the service catalog – standardizing deployment of Slurm, NVIDIA Run:ai, inference runtimes, Jupyter, and MLflow as operator-exposed catalog items rather than manual per-tenant configurations. Carrier-grade multi-tenancy is enforced across compute, network, and storage via KVM-based virtualization with GPU passthrough, with k0rdent Observability and FinOps (KOF) providing unified monitoring, metering, and cost attribution across all tenants.
AI Application Services: The catalog surfaces production-ready, tenant-facing AI services: distributed training environments, inference endpoints, and AI Studio workspaces for model development and experimentation. Operators can expose these as self-service offerings. Customers select a service, provision resources, and get a working environment without manual per-tenant configuration.
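As an illustrative sketch of the declarative model described above: in the upstream open source k0rdent project, a tenant cluster and its catalog services are expressed as a single Kubernetes object that KCM reconciles and KSM populates with services. The resource kind and field layout below follow upstream k0rdent; the specific template, credential, and service names are hypothetical placeholders, and k0rdent AI releases may differ.

```yaml
# Illustrative only: kind and field layout follow the upstream k0rdent
# project; all names marked "hypothetical" are placeholders, not shipped defaults.
apiVersion: k0rdent.mirantis.com/v1beta1
kind: ClusterDeployment
metadata:
  name: tenant-a-gpu-cluster
  namespace: kcm-system
spec:
  template: gpu-cluster-template     # hypothetical ClusterTemplate name
  credential: tenant-a-credential    # hypothetical infrastructure credential
  config:
    workersNumber: 4                 # illustrative sizing
  serviceSpec:
    services:                        # KSM-managed catalog items (names illustrative)
      - template: runai-template
        name: runai
      - template: mlflow-template
        name: mlflow
```

Under this model, the controllers reconcile the cluster toward the declared state and roll out the listed services once the cluster is ready, so a tenant environment is one manifest rather than a sequence of manual steps.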

A key architectural differentiator: the same API layer manages heterogeneous NVIDIA infrastructure. An operator running GB200 NVL72 systems alongside HGX nodes manages both through a common control plane, eliminating separate management tooling per hardware generation.
A Differentiated, Open, and Extensible Platform
Several technical characteristics set k0rdent AI apart from alternatives in the NVIDIA AI Cloud Ready Initiative ecosystem:
Open source foundation: k0rdent AI is built on and contributes to open source projects, including the k0rdent platform (upstream on GitHub) and the Cluster API ecosystem, reducing vendor lock-in and aligning with NVIDIA's direction of open sourcing NVIDIA Cloud Accelerator software components.
Kubernetes-native architecture: Infrastructure and services are managed as Kubernetes objects, inheriting configuration reconciliation, scalability, and GitOps compatibility. This is architecturally distinct from platforms that use Kubernetes as a thin wrapper over proprietary control planes.
Full-stack coverage: The Metal-to-Model stack covers bare metal provisioning through AI application services in a single integrated platform with a unified API and observability layer, with no need to source and integrate separate solutions for each layer.
Composable service catalog: KSM's template-based catalog is extensible and open, giving operators a library of pre-built, validated service templates, including database-as-a-service, custom inference runtimes, and additional workload schedulers, deployable immediately without modifying the core platform.
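To illustrate the extensibility claim above: in upstream k0rdent, a new catalog entry is itself a small Kubernetes object pointing at a Helm chart, so an operator can add an offering without modifying the core platform. The kind and fields below follow the upstream k0rdent ServiceTemplate resource; the chart and repository names are hypothetical.

```yaml
# Illustrative only: a hypothetical catalog entry wrapping an operator's
# own Helm chart; field layout follows the upstream k0rdent ServiceTemplate.
apiVersion: k0rdent.mirantis.com/v1beta1
kind: ServiceTemplate
metadata:
  name: custom-inference-runtime
  namespace: kcm-system
spec:
  helm:
    chartSpec:
      chart: inference-runtime       # hypothetical chart name
      version: 0.1.0
      sourceRef:
        kind: HelmRepository
        name: internal-catalog       # hypothetical Helm repository
```

Once applied, a template like this would appear alongside the pre-built catalog items and could be referenced from tenant cluster definitions like any other service.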
Validation Coverage and Results
Mirantis has completed all existing phases of NVIDIA AI Cloud Ready validation testing against both NVIDIA GB200 NVL72 and NVIDIA HGX H100 systems. Validation covered the full operational scope an NCP needs to run production AI workloads: automated bare metal provisioning and node lifecycle management, GPU topology detection and NVIDIA NVLink domain mapping, Kubernetes cluster deployment on attested bare metal nodes, and correct GPU resource exposure to Slurm and NVIDIA Run:ai workload schedulers.
Performance benchmarking confirmed that the k0rdent AI stack introduces no degradation relative to NVIDIA reference expectations across tensor compute, GPU-to-GPU interconnect bandwidth, and LLM inference throughput. For operators, this means the platform has been proven end-to-end across the workload types that drive NCP revenue (HPC batch jobs and AI inference) on both current-generation Hopper and next-generation Blackwell hardware.
What's Next
Mirantis will extend validated configurations to NVIDIA GB300 NVL72 and future Vera Rubin platforms. For operators, this means a clear, supported upgrade path as hardware generations evolve, without re-platforming or replacing your management stack.
Upcoming additions include VM-as-a-Service, enabling VMs alongside Kubernetes clusters as first-class tenant resources, plus additional storage integrations including VAST. As the platform grows, so does the ability to offer more services, serve more customers, and grow revenue.
Get Started
Mirantis k0rdent AI is available for evaluation and deployment on NVIDIA infrastructure, with validated support spanning Hopper, Blackwell, and Grace Blackwell architectures; Vera Rubin support is currently in development. Mirantis professional services can design and deploy the right solution for your current stage of AI cloud maturity, with optional managed services to accelerate time to production.
Join us at NVIDIA GTC: Visit Mirantis Booth 102, attend one of our live demos, and register for the AI Executive Salon, an executive-only reception co-hosted by Mirantis, Netris, and Saturn Cloud.
Not attending GTC? Request a demo or speak with a Mirantis solutions architect to find the right starting point for your AI cloud journey.

