
Scaling AI for Everyone: Why llm-d Marks a Turning Point for Open Source AI Infrastructure


Today’s announcement from Red Hat about the launch of llm-d, supported by partners like CoreWeave, Google Cloud, IBM Research, and NVIDIA, is more than just another entry in the open source AI space. It marks a real shift in how organizations will run AI inference at scale, and it signals growing momentum behind open, community-driven approaches to solving tough infrastructure problems.

Inference Is Where the Action Is

The AI conversation has mostly revolved around training massive models, but in the real world, value comes from putting those models to work. Inference is where models meet users, where predictions happen, and where performance and efficiency matter most. According to Gartner, by 2028 over 80% of data center workload accelerators will be dedicated to inference — not training.

That tracks with what we’re seeing. Foundation models are becoming more powerful, but also more demanding. The challenge is no longer just building capable models; it’s running them sustainably. Cost and latency are becoming real blockers to adoption, especially as inference workloads scale.

Open Source Levels the Playing Field

We’ve always believed that open source is one of the best ways to tackle shared infrastructure challenges. llm-d fits into that tradition. Like Linux, and like Kubernetes, it offers a common foundation that everyone can build on. And it’s laser-focused on some of the biggest barriers to inference at scale:

  • Breaking past the limits of single-server deployment

  • Leveraging Kubernetes to run anywhere

  • Separating compute phases (prefill and decode) for better efficiency

  • Offloading KV cache to reduce GPU pressure

  • Using smarter routing to cut latency and improve performance

These are not academic problems. They’re the real-world issues that teams face when moving inference from the lab to production. llm-d aims to standardize solutions to those problems in a way that’s open, extensible, and broadly useful.
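
To make the last item on that list concrete, here is a minimal, hypothetical Python sketch of KV-cache-aware routing: an inference gateway scores candidate model-server replicas by how much of an incoming prompt’s prefix they likely already hold in KV cache, balanced against their current load. The replica names, scoring weights, and prefix-hashing scheme are illustrative assumptions, not llm-d’s actual implementation.

```python
# Hypothetical sketch of KV-cache-aware routing (not llm-d's actual code).
# Idea: prefer the replica that likely has the longest prefix of the incoming
# prompt already in KV cache, while penalizing replicas that are heavily loaded.

from dataclasses import dataclass, field
import hashlib


def chunk_hashes(prompt: str, chunk_size: int = 64) -> list[str]:
    """Hash fixed-size prompt chunks; shared prefixes yield shared leading hashes."""
    return [
        hashlib.sha256(prompt[i : i + chunk_size].encode()).hexdigest()
        for i in range(0, len(prompt), chunk_size)
    ]


@dataclass
class Replica:
    name: str
    active_requests: int = 0
    cached_chunks: set[str] = field(default_factory=set)  # chunks believed to be in KV cache


def score(replica: Replica, prompt_chunks: list[str],
          cache_weight: float = 1.0, load_weight: float = 0.5) -> float:
    """Higher is better: reward expected prefix-cache hits, penalize queue depth."""
    # Count only the leading chunks that are cached, since KV reuse needs a contiguous prefix.
    hits = 0
    for h in prompt_chunks:
        if h in replica.cached_chunks:
            hits += 1
        else:
            break
    return cache_weight * hits - load_weight * replica.active_requests


def route(prompt: str, replicas: list[Replica]) -> Replica:
    """Pick the best-scoring replica and record the routing decision."""
    chunks = chunk_hashes(prompt)
    best = max(replicas, key=lambda r: score(r, chunks))
    best.active_requests += 1
    best.cached_chunks.update(chunks)  # optimistic: assume the prompt's prefix gets cached
    return best


if __name__ == "__main__":
    pool = [Replica("vllm-0"), Replica("vllm-1"), Replica("vllm-2")]
    system_prompt = "You are a helpful assistant. " * 20
    # Requests sharing the same long system prompt stick to the same replica...
    for q in ["What is Kubernetes?", "Explain KV cache.", "Why disaggregate prefill?"]:
        print(route(system_prompt + q, pool).name)
    # ...until that replica's load outweighs its cache advantage.
```

In practice this kind of routing lives in an inference gateway in front of the model servers, which is exactly the layer llm-d aims to standardize on top of Kubernetes.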

What's Next

We see llm-d as the beginning of a new era in AI infrastructure. Here’s what we expect to see:

  • Standardized inference stacks built on open projects like llm-d, just as Kubernetes became the default for container orchestration

  • Abstraction of hardware differences, making AI workloads more portable across accelerators and clouds

  • A shift in focus to operational efficiency, where the economics of inference become the deciding factor

  • New platforms optimized for inference, tightly integrating orchestration, observability, and cost controls

  • Better integration with existing cloud-native tools, making AI infrastructure feel like infrastructure, not research

Mirantis' Role in the Ecosystem

We’re excited about llm-d because it aligns with how we think AI infrastructure should evolve: open, portable, efficient, and built around Kubernetes-native patterns.

For our part in the broader open source effort to build scalable, flexible AI infrastructure, earlier this year we contributed k0rdent, a Kubernetes-native framework for platform engineering designed to help platform teams build and operate composable, production-grade environments for modern AI inference workloads. In our view, k0rdent and llm-d are complementary efforts that together help bring scalable, high-performance AI within reach for a wider range of organizations. In the meantime, we have delivered our turnkey k0rdent AI Inference solution, leveraging our partnership with Gcore, and have already started solving real-world problems at scale for customers like Nebul.

Hats Off to the Founding Team

We want to recognize the work that Red Hat and its partners have put into launching llm-d. It’s rare to see this kind of cross-industry alignment on a new open source effort, and the momentum is real. With contributions and support from academic teams at UC Berkeley (behind vLLM) and the University of Chicago (LMCache), plus companies like AMD, Cisco, Hugging Face, Intel, Lambda, and Mistral AI, this project starts on strong footing.

Looking Forward

Like any major open source initiative, the success of llm-d will depend on a few key ingredients:

  • Transparent, inclusive governance

  • High-quality documentation and learning resources

  • Solid reference implementations

  • Clear, trusted performance benchmarks

  • Integration with modern DevOps workflows

At Mirantis, we’re eager to contribute to this growing community. We believe the future of AI belongs to everyone — and it’ll take shared effort to get there.

We’re just getting started on the road to scalable inference. But with projects like llm-d helping lay the foundation, that road is starting to look a lot more open.


Randy Bias is Vice President, Open Source Strategy and Technology at Mirantis, within the Office of the CTO, and directs the Mirantis Open Source Program Office (OSPO). He is a long-time contributor to open source software and governance, including roles within major projects like TungstenFabric and OVN. Prior to Mirantis, he founded CloudScaling, a platform provider to cloud builders that leveraged OpenStack.
