Red Hat Scales AI Inference with Red Hat AI 3
red-hat-schaalt-ai-inferentie-met-red-hat-ai-3
Published by
WINMAG Pro Editorial Team
Tue, 10 March 2026, 19:50
Read time: 6 min 0 sec
Share

As organizations move beyond experimenting with AI, they encounter challenges such as data privacy, cost control, and managing diverse models. "The GenAI Divide: State of AI in Business" from the NANDA project at the Massachusetts Institute of Technology outlines the reality of AI in production: 95% of organizations are estimated to see no measurable financial return on the approximately $40 billion in business spending.

Red Hat AI 3 directly addresses these issues with a consistent and clear platform for CIOs and IT leaders, enabling them to maximize their investments in accelerated computing applications. It allows for rapid scaling and distribution of AI workloads across hybrid, multi-vendor environments while simplifying collaboration between teams around new AI applications such as agents. Built on open standards, Red Hat AI 3 aligns with every step in an organization’s AI journey, regardless of model or hardware, from data centers to public cloud and sovereign AI environments to the edge.

From Training to Doing: The Shift to Enterprise AI Inference

When organizations bring AI into production, the focus shifts from training and fine-tuning to inference: the phase where AI effectively "does". Red Hat AI 3 emphasizes scalable and cost-efficient inference, built on the successful open source projects vLLM and llm-d, combined with Red Hat's own optimization capabilities. This enables companies to reliably deploy large language models (LLMs) in production.

To help CIOs get the most out of their valuable hardware, Red Hat OpenShift AI 3.0 introduces the general availability of llm-d. This project redefines how LLMs run natively on Kubernetes. llm-d enables smart, distributed inference, combining the power of Kubernetes orchestration with the performance of vLLM and open source technologies such as the Kubernetes Gateway API Inference Extension, the NVIDIA Dynamo library for fast data transfers (NIXL), and the DeepEP Mixture of Experts (MoE) communication layer.

This allows organizations to:

  • Reduce costs and improve response times through intelligent, inference-driven model scheduling and decentralized processing
  • Achieve operational simplicity and reliability with clear "Well-lit Paths" that simplify the rollout of models at scale
  • Maintain greater flexibility thanks to support for various hardware accelerators, including NVIDIA and AMD

llm-d builds on vLLM and evolves from a single-node inference engine to a distributed, scalable system, tightly integrated with Kubernetes. It is designed to enable predictable performance, measurable ROI, and efficient infrastructure management. All enhancements address the challenges of varying LLM workloads and serving massive models like Mixture-of-Experts.

A Unified Platform for Collaborative AI

Red Hat AI 3 brings an integrated, flexible experience that addresses the need for collaboration in building production-level generative AI solutions. The platform creates tangible value by connecting teams and streamlining workflows within a single environment where both platform and AI engineers can execute their AI strategy. New features help companies transition from proof-of-concept to production with greater efficiency and productivity.

The Model-as-a-Service (MaaS) capabilities build on distributed inference and give IT teams the opportunity to act as MaaS providers themselves. They can offer central models with on-demand access for both AI developers and AI applications. This simplifies cost management and provides solutions for use cases that cannot run on public AI services due to privacy or data security concerns.

The AI Hub allows platform engineers to explore, deploy, and manage AI assets. They have one central place with a curated catalog of validated and optimized generative AI models, a registry for model management, and an environment to configure and monitor all AI assets on OpenShift AI.

The Gen AI Studio provides AI engineers with an interactive workspace to play with models and quickly prototype new applications. Thanks to the AI assets endpoint feature, they can easily discover and use available models and MCP servers, simplifying interaction with external tools. The built-in playground makes it easy to test models, refine prompts, and tune parameters for applications such as chat and retrieval-augmented generation (RAG).

New validated and optimized Red Hat models further accelerate development. The selection includes popular open source models such as OpenAI's gpt-oss, DeepSeek-R1, and specialized models like Whisper for speech-to-text and Voxtral Mini for voice-driven agents.

The Foundation for the Next Generation of AI Agents

AI agents are poised to fundamentally change the way applications are built. Their complex, autonomous workflows place heavy demands on inference capabilities. The latest Red Hat OpenShift AI 3.0 release builds on that foundation, not only through improved inference but also through new features for agent management.

To accelerate the building and deployment of agents, Red Hat introduces a Unified API layer based on Llama Stack, aligning development with industry standards such as OpenAI-compatible LLM protocols. Additionally, Red Hat supports the Model Context Protocol (MCP) as an early adopter, a new open standard that defines how AI models communicate with external tools. This forms a crucial foundation for modern AI agents.

Red Hat AI 3 also includes a modular and extensible toolkit for model customization, based on InstructLab. It provides specialized Python libraries that give developers more control and flexibility. The toolkit utilizes open source projects such as Docling for data processing, which converts unstructured documents into an AI-readable format. Furthermore, it includes a flexible framework for synthetic data generation and a training hub for LLM fine-tuning. The integrated evaluation center helps AI engineers validate and track results so they can confidently deploy their own data for more accurate and relevant AI outcomes.

About Red Hat

Red Hat is a global leader in open hybrid cloud technology, providing a reliable, consistent, and comprehensive foundation for groundbreaking IT innovation and AI applications. The portfolio includes cloud, developer, AI, Linux, automation, and application platform technologies, enabling any application to run anywhere – from the data center to the edge.

As the largest provider of enterprise open source software, Red Hat actively invests in open ecosystems and communities to address the IT challenges of tomorrow. In close collaboration with partners and customers, Red Hat helps organizations build, connect, automate, secure, and manage their IT environments. Customers are supported with expert advice, award-winning training, and recognized certification programs.

6g-hoe-ziet-de-toekomst-van-netwerken-eruit

6G: what does the future of networks look like?

Saturday 16 May 2026 - 10:30
nederland-scoort-te-laag-op-digitale-weerbaarheid

The Netherlands scores too low on digital resilience

Thursday 14 May 2026 - 08:00
hoe-as-a-service-de-it-wereld-verandert

How 'as a Service' is changing the IT world

Wednesday 13 May 2026 - 20:00
ai-en-duurzaamheid-strategieen-voor-organisaties

AI and Sustainability: Strategies for Organizations

Tuesday 12 May 2026 - 22:15