Automating AI Factory Deployments with Mirantis k0rdent AI and NVIDIA Run:ai
Many enterprises are discovering that buying AI infrastructure is far easier than deploying or operating it. Mirantis k0rdent AI closes that gap by automating large portions of the AI factory platform, now including the deployment of NVIDIA Run:ai.
Building an AI factory requires complex, layered assembly before running a single workload. Provisioning GPU nodes is only the first step. Teams must then stand up the orchestration layer, configure the operator stack, sequence dependencies correctly, and verify that the entire environment works before any training or inference job can start.
For enterprises building private AI factories, the gap between infrastructure provisioning and a usable AI platform carries real cost; poor coordination of AI jobs can stall even a well-built private cloud stack. The Mirantis k0rdent AI integration with NVIDIA Run:ai closes that gap, delivering a fully operational AI factory platform in minutes rather than weeks on any supported infrastructure, with an inference control plane that keeps an enterprise’s GPUs busy and optimally allocated.
NVIDIA Run:ai in the AI Factory Stack
NVIDIA Run:ai is a core component of the NVIDIA AI Factory reference architecture. It provides the AI workload and GPU orchestration layer that sits directly above the infrastructure, enabling data scientists, ML engineers, and platform operators to submit training jobs, run inference workloads, and spin up interactive notebooks through a UI, CLI, and API. Users get immediate access to GPU resources without needing to understand what's running underneath: no manual cluster configuration, no dependency management, no guesswork about whether the environment is ready. Useful as it is, an inference control plane can be complex to deploy and operate, especially given the organizational disconnect that can arise between data and IT teams.
As a validated member of the NVIDIA AI Cloud Ready Initiative, Mirantis built the k0rdent AI platform to automate the infrastructure layers of the AI factory, from bare metal provisioning through platform deployment, so that NVIDIA Run:ai and the workloads it serves can be operational as quickly as possible.
k0rdent AI handles the full stack automatically: ingress and external DNS, cert-manager, the NVIDIA GPU Operator, NVIDIA Network Operator, the Dynamic Resource Allocation (DRA) Operator, Message Passing Interface (MPI) Operator, training operator, and the NVIDIA Run:ai platform templates themselves. k0rdent AI deploys, configures, and sequences each component, eliminating the need for manual intervention.
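To illustrate what "sequencing" means here, the following is a minimal Python sketch of dependency-ordered installation using a topological sort. It is not k0rdent AI's implementation or API: the component names are taken from the list above, but the dependency edges in DEPENDS_ON and the helper functions install_order and install_waves are assumptions made purely for illustration.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical dependency map (an assumption, not taken from k0rdent AI):
# each key lists the components assumed to be installed before it.
DEPENDS_ON = {
    "ingress": set(),
    "external-dns": {"ingress"},
    "cert-manager": set(),
    "gpu-operator": set(),
    "network-operator": set(),
    "dra-operator": {"gpu-operator"},
    "mpi-operator": set(),
    "training-operator": set(),
    "runai": {
        "ingress", "external-dns", "cert-manager", "gpu-operator",
        "network-operator", "dra-operator", "mpi-operator",
        "training-operator",
    },
}


def install_order(depends_on):
    """Return one valid install order in which prerequisites come first."""
    return list(TopologicalSorter(depends_on).static_order())


def install_waves(depends_on):
    """Group components into waves; each wave could be installed in parallel."""
    sorter = TopologicalSorter(depends_on)
    sorter.prepare()  # raises graphlib.CycleError if the map has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose prerequisites are done
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    for number, wave in enumerate(install_waves(DEPENDS_ON), start=1):
        print(f"wave {number}: {', '.join(wave)}")
```

Under these assumed dependencies, the platform layer is always placed last, and components with no prerequisites land in the first wave. Doing this ordering by hand is one of the manual steps that automation removes.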
What would otherwise take experienced engineers days or weeks is completed in minutes. For neoclouds, this enables true on-demand AI factory provisioning: a full NVIDIA Run:ai environment spun up when needed and torn down when not, keeping GPU utilization high and idle time low. For enterprises, it means consistent, repeatable AI factory deployments across teams and environments, without relying on tribal knowledge to get the platform right each time.
For organizations operating in regulated industries or government environments, the integration also supports air-gapped deployments (fully isolated environments with no external network dependencies), providing the same automated, repeatable deployment experience in network-restricted infrastructure.
Certified for NVIDIA Run:ai Compatibility
The k0rdent AI + NVIDIA Run:ai integration has been tested and validated through NVIDIA Run:ai's official certification program. Mirantis executed the NVIDIA Run:ai test suite, with more than 100 functional tests covering workload submission, scheduling behavior, multi-tenant operations, and platform lifecycle functions, and achieved partner-certified status. Mirantis k0rdent is one of the partner-certified distributions listed in the official NVIDIA Run:ai documentation.
The k0rdent AI integration of NVIDIA Run:ai enables AI factory deployments to take full advantage of the latest rack-scale GPU architecture, with native support for NVIDIA GB200 NVL72 via NVIDIA NCX Infra Controller, including automated configuration of the NVIDIA IMEX (Internode Memory Exchange) service.
What's Next
The current integration establishes the foundation for full AI factory lifecycle management: upgrades, configuration changes, and ongoing platform operations managed through k0rdent's declarative model, with day-two operations automation on the roadmap.
The AI factory model is rapidly becoming the standard for both enterprise private AI and neocloud GPU services. The k0rdent AI and NVIDIA Run:ai integration enables organizations to accelerate time to production for AI factories, using infrastructure that is automated, repeatable, and ready to scale.
For more information, contact us to speak with one of our solution architects.

