“Customers want an OpenAI-like service they control,” says Red Hat’s Tushar Katarki.

Product head Tushar Katarki on giving enterprises true ownership of data and cost

When enterprises first began experimenting with generative AI, many raced to cloud-based frontier models. It didn’t take long for the reality of cost, privacy and lock-in to set in. For many, the promise of AI independence felt out of reach.

Red Hat thinks it doesn’t have to be. The company that helped make open-source safe for enterprise is now applying that philosophy to AI through llm‑d, an open-source, Kubernetes-native framework for running language models in production.

“Customers want an OpenAI-like service they control,” says Tushar Katarki, Head of Product for GenAI Foundation Model Platforms at Red Hat.

Launched with founding contributors including CoreWeave, Google Cloud, IBM Research and NVIDIA, the llm-d project is designed to make model inferencing faster, less costly and more flexible, especially for organisations deploying on-premises or in private clouds. 

The Enterprise Control Shift

The early wave of AI adopters, especially in banking and financial services, learned fast that renting intelligence via hyperscaler APIs came with trade-offs. “They want to run their own models, either on-prem or in private clouds,” Katarki says.

Many started with commercial frontier models accessed through APIs and built proofs of concept. But as workloads scaled, cost per token rose, and internal governance and regulatory concerns followed. According to Red Hat, “the true impact of gen AI hinges on more efficient and scalable inference.”

The “easy way to think about it,” Katarki explains, “is an OpenAI-like service that uses open-source and other models they can control.”

llm-d provides the foundation for that shift. Built on Kubernetes, vLLM and an inference gateway architecture, it allows platform teams to serve multiple models efficiently while maintaining cost and compliance visibility. “We build everything from source, we containerize, we sign, and we validate,” Katarki says.
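
In practice, “OpenAI-like” means application teams keep calling a familiar API while the models and hardware behind it stay under the enterprise’s control. A minimal sketch of that pattern, assuming a vLLM-backed, OpenAI-compatible endpoint behind the inference gateway (the URL, token and model name below are placeholder assumptions, not part of llm-d):

    # Illustration only: calling an internally hosted, OpenAI-compatible
    # endpoint (for example, one served by vLLM behind an inference gateway).
    # The base URL, API key and model name are hypothetical placeholders.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm.internal.example.com/v1",  # assumed internal gateway
        api_key="internal-service-token",                # assumed service credential
    )

    response = client.chat.completions.create(
        model="granite-3-8b-instruct",  # example open model name, assumed
        messages=[{"role": "user", "content": "Summarise this quarter's risk report."}],
    )
    print(response.choices[0].message.content)

Nothing in that snippet mentions a hyperscaler: the same client code works whether the models run on-prem or in a private cloud.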

Red Hat’s strategy places it between hyperscalers and open-source communities: offering enterprise-grade governance with open innovation. Customers gain flexibility, while also getting validated model artifacts and secure provenance. 

Running AI Like Infrastructure

At its core, llm-d signals a philosophical shift: treating AI as a managed workload, not a research project.

“AI has moved from training to inferencing; that’s where models go into production,” Katarki explains. “We’re helping enterprises run AI like infrastructure.”

The framework enables platform engineers to build “model-as-a-service” layers that balance performance and efficiency. Engineers can optimise for latency, throughput and hardware usage depending on the service-level objectives (SLOs) of the application. Whether a chat interface needs millisecond responses or a batch agent processes thousands of documents overnight, llm-d helps tune the trade-offs.
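
The difference between those two service levels is easiest to see in code. This hedged sketch reuses the hypothetical internal endpoint from earlier: the chat path streams tokens to minimise perceived latency, while the batch path issues many requests concurrently and only cares about aggregate throughput.

    # Sketch of two SLO profiles against the same assumed internal endpoint.
    # Endpoint, credentials and model name are placeholders.
    import asyncio
    from openai import AsyncOpenAI

    client = AsyncOpenAI(
        base_url="https://llm.internal.example.com/v1",
        api_key="internal-service-token",
    )

    async def interactive_chat(prompt: str) -> None:
        # Latency-sensitive: stream tokens so the user sees output immediately.
        stream = await client.chat.completions.create(
            model="granite-3-8b-instruct",
            messages=[{"role": "user", "content": prompt}],
            stream=True,
        )
        async for chunk in stream:
            print(chunk.choices[0].delta.content or "", end="", flush=True)

    async def overnight_batch(documents: list[str]) -> list[str]:
        # Throughput-oriented: fire requests concurrently; only the total
        # completion time and cost per token matter.
        async def summarise(doc: str) -> str:
            resp = await client.chat.completions.create(
                model="granite-3-8b-instruct",
                messages=[{"role": "user", "content": f"Summarise: {doc}"}],
            )
            return resp.choices[0].message.content
        return list(await asyncio.gather(*(summarise(d) for d in documents)))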

This operational reliability echoes Red Hat’s history with Linux and OpenShift: the company isn’t trying to reinvent AI; it’s industrialising it. “We don’t really have dogma about this,” Katarki says. “We’re here to be trusted advisors.”

The company emphasises flexibility as the new reliability: customers can deploy models across NVIDIA or AMD accelerators, on-prem or in hybrid cloud, with minimal changes. According to the project’s GitHub page, llm-d “supports multiple accelerator types including NVIDIA GPUs, AMD GPUs, Google TPUs and Intel XPUs.” 

The Multi-Model Future

Katarki believes enterprise AI’s next chapter will rely less on a single model than on many models working together.

“Real-world use cases are a combination of large models, small models, tool calling, and predictive models,” he says.

Large models handle reasoning and planning; smaller fine-tuned models manage domain-specific actions; predictive models provide deterministic accuracy. Together, they form the composite systems enterprises increasingly require.
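
A toy orchestration layer makes that division of labour concrete. The sketch below is purely illustrative: the model names, the endpoint and the routing logic are assumptions for the example, not part of llm-d.

    # Illustration only: composing a large planner model, a small domain-tuned
    # model and a deterministic predictive step. All names are hypothetical.
    from openai import OpenAI

    client = OpenAI(
        base_url="https://llm.internal.example.com/v1",
        api_key="internal-service-token",
    )

    def ask(model: str, prompt: str) -> str:
        resp = client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content

    def handle_claim(claim_text: str) -> dict:
        # 1. Large model: reasoning and planning.
        plan = ask("large-reasoner-70b", f"Plan the steps to process this claim:\n{claim_text}")
        # 2. Small fine-tuned model: domain-specific extraction.
        fields = ask("claims-extractor-3b", f"Extract claimant, amount and date:\n{claim_text}")
        # 3. Predictive model: a conventional classifier giving a deterministic score.
        fraud_score = 0.12  # placeholder for a call to an existing ML model
        return {"plan": plan, "fields": fields, "fraud_score": fraud_score}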

This multi-model reality is shaping llm-d’s roadmap. Customer feedback has led to expanded support for smaller, dense models and vision models, along with easier deployment defaults. “We’re creating defaults that make llm-d easy to deploy,” Katarki says, “while keeping the deeper levers for power users who need scale.”

Another focus is hierarchical storage and memory management: the project blog notes the use of off-heap caching and cache-aware routing to improve scalability. Hardware support is also broadening: at launch, Red Hat emphasised multi-vendor support, including AMD GPUs and Google Cloud TPUs.
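
Cache-aware routing is simplest to picture in miniature: if requests that share a long prompt prefix land on the replica that already holds that prefix’s KV cache, prefill work can be skipped. The sketch below is a deliberately naive stand-in for the idea, not llm-d’s actual scheduler, which also weighs replica load.

    # Naive stand-in for cache-aware routing: requests sharing a prompt prefix
    # are steered to the same replica so its KV cache can be reused.
    # Replica names are hypothetical.
    import hashlib

    REPLICAS = ["vllm-replica-0", "vllm-replica-1", "vllm-replica-2"]

    def route(prompt: str, prefix_chars: int = 512) -> str:
        prefix = prompt[:prefix_chars]
        digest = hashlib.sha256(prefix.encode("utf-8")).hexdigest()
        return REPLICAS[int(digest, 16) % len(REPLICAS)]

    # Two requests sharing a long system prompt land on the same replica:
    system_prompt = "You are a compliance assistant for an internal bank workflow. " * 20
    print(route(system_prompt + "Summarise policy 12."))
    print(route(system_prompt + "Summarise policy 47."))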

Beyond performance, the next frontier is governance:

“Cost, efficiency and accuracy will continue to matter most. Safety and security are next,” Katarki says.

As AI systems shift from internal tools to customer-facing applications, the need for guardrails and trust frameworks becomes critical.



Red Hat’s approach feels familiar. The company built its reputation by turning open-source systems into enterprise-ready infrastructure. Now, with llm-d, it’s applying the same logic to AI, transforming experimentation into infrastructure.

“We’re not just running models, we’re helping enterprises run AI like infrastructure,” says Tushar Katarki.


Mukundan Sivaraj
Mukundan covers the AI startup ecosystem for AIM Media House. Reach out to him at mukundan.sivaraj@aimmediahouse.com or Signal at mukundan.42.