Gcore brings managed Nvidia Dynamo to AI inference

Fri, 27th Feb 2026

Gcore has integrated Nvidia Dynamo into its AI inference products, offering the open-source framework as a managed, one-click option across public cloud, private cloud, hybrid, and on-premises environments.

The infrastructure and software provider said the move is aimed at cost and performance problems that emerge when organisations run inference at scale, including GPU underutilisation, static resource allocation, memory bottlenecks, and inefficient data transfer.

The integration is available through Gcore's Everywhere Inference and Everywhere AI services, and is deployed as a fully managed option via the Gcore Customer Portal. Customers can enable it without managing routing, KV cache logic, or GPU scheduling.

Inference economics

AI inference has become a major commercial focus for cloud and infrastructure suppliers as businesses shift from training large models to serving them in production. Economics often hinge on how efficiently providers keep GPUs busy while maintaining predictable response times.

Materials accompanying the announcement cited a projection that the AI inference market could exceed $250 billion by 2030. They also said GPUs accounted for 58% of data centre compute spending in 2025, amid rising costs tied to large-scale AI workloads.

Gcore positioned Dynamo as a way to increase effective throughput and reduce tail latency: the slower responses that affect a small share of requests but can shape overall user experience and service-level objectives.
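
Tail latency is typically tracked through high percentiles such as p99. As a purely illustrative sketch, with latency figures invented for demonstration rather than drawn from Gcore, the following Python snippet shows how a small share of slow requests can dominate the p99 even while the median stays low:

# Illustrative only: computing tail latency (p99) from simulated request timings.
# All latency figures here are invented for demonstration purposes.
import random
import statistics

random.seed(0)

# 1,000 simulated inference latencies in milliseconds: most requests are fast,
# but a small share is much slower: the "tail" that shapes service-level objectives.
latencies_ms = [random.gauss(120, 15) for _ in range(990)] + \
               [random.gauss(900, 100) for _ in range(10)]

latencies_ms.sort()
p50 = statistics.median(latencies_ms)
p99 = latencies_ms[int(len(latencies_ms) * 0.99) - 1]

print(f"median latency: {p50:.0f} ms")   # what a typical request sees
print(f"p99 latency:    {p99:.0f} ms")   # what the slowest 1% of requests see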

"Modern inference isn't just 'run a model' - it's batching, routing, dynamic workloads, longer contexts, and tight SLOs. In that reality, small scheduling and utilization losses become big performance and cost penalties. By integrating Dynamo as a managed service in Gcore, we bring advanced GPU optimization directly into the runtime path so customers see higher effective throughput and steadier tail latency, without operating the complexity themselves," said Seva Vayner, Product Director of Edge Cloud and AI, Gcore.

Performance claims

Gcore said the Dynamo integration can deliver "up to 6x higher throughput and 2x lower latency", while earlier materials referenced "up to 5x higher throughput and 2x lower latency". No benchmark configurations or workloads were provided for either figure.

Dynamo is an open-source inference framework designed for serving generative AI models at scale. At Gcore, it is integrated into the company's inference offering rather than released as a standalone product.

Gcore said it has pre-optimised the managed service for popular inference models, allowing customers to use Dynamo without building and operating their own scheduling and request-routing layer.

How Dynamo works

The announcement highlighted several techniques used by Dynamo, including splitting prefill and decode work, applying KV cache-aware routing, and using NIXL for inter-node communication.
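
Nvidia documents these techniques at a high level; the snippet below is not Dynamo's API but a minimal conceptual sketch in Python of the idea behind KV cache-aware routing, assuming a hypothetical pool of workers that each report which prompt-prefix blocks they already hold in KV cache:

# Conceptual sketch of KV cache-aware routing (not Nvidia Dynamo's actual API).
# Assumption: each worker exposes the set of prompt-prefix hashes it has cached.
from dataclasses import dataclass, field


@dataclass
class Worker:
    name: str
    active_requests: int = 0
    cached_prefixes: set[str] = field(default_factory=set)


def prefix_hashes(prompt_tokens: list[int], block: int = 16) -> list[str]:
    """Hash the prompt in fixed-size blocks, the granularity at which KV cache is reused."""
    return [str(hash(tuple(prompt_tokens[:i + block])))
            for i in range(0, len(prompt_tokens), block)]


def route(prompt_tokens: list[int], workers: list[Worker]) -> Worker:
    """Prefer the worker with the largest cached-prefix overlap; break ties by load."""
    hashes = prefix_hashes(prompt_tokens)

    def score(worker: Worker) -> tuple[int, int]:
        overlap = sum(1 for h in hashes if h in worker.cached_prefixes)
        return (overlap, -worker.active_requests)  # more cache hits first, then less load

    best = max(workers, key=score)
    best.active_requests += 1
    best.cached_prefixes.update(hashes)  # the chosen worker will now hold this prefix
    return best

Routing requests that share a long prefix, such as a common system prompt, to the worker that already holds it avoids recomputing that prefix during prefill, which is the kind of wasted work the announcement describes eliminating.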

Gcore said these methods are intended to increase GPU utilisation and reduce wasted cycles during decode and cache recomputation. The broader goal is to process more inference requests on the same hardware.

Gcore linked the changes to a lower cost per token and improved return on investment for customers running AI services. It also said the managed delivery model makes it easier to apply these efficiencies at scale.
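
The cost-per-token arithmetic behind that claim is straightforward. The sketch below uses invented prices and throughput figures, not Gcore or Nvidia benchmarks, to show the relationship: if throughput per GPU rises while the hourly GPU price stays fixed, cost per token falls in proportion.

# Illustrative cost-per-token calculation; all figures are assumptions for
# demonstration, not Gcore or Nvidia benchmarks.
GPU_COST_PER_HOUR = 3.00          # USD, hypothetical hourly GPU price
BASELINE_TOKENS_PER_SEC = 2_000   # hypothetical throughput before optimisation
SPEEDUP = 6                       # the "up to 6x" throughput figure cited by Gcore

for label, tps in [("baseline", BASELINE_TOKENS_PER_SEC),
                   ("optimised", BASELINE_TOKENS_PER_SEC * SPEEDUP)]:
    tokens_per_hour = tps * 3600
    cost_per_million = GPU_COST_PER_HOUR / tokens_per_hour * 1_000_000
    print(f"{label}: ${cost_per_million:.3f} per million tokens")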

Product scope

Dynamo is supported across public cloud, private cloud, hybrid, and on-premises inference environments via Everywhere Inference and Everywhere AI. Gcore also described the integration as part of a broader strategy to simplify AI deployment through a single customer portal.

Gcore operates infrastructure across six continents and positions itself around low-latency delivery. It provides a mix of AI, cloud, networking, and security services.

As hyperscalers invest heavily in GPU capacity, smaller cloud and edge infrastructure providers are seeking differentiation through software orchestration and managed inference tools. The focus has shifted to serving models efficiently as usage grows and price pressure intensifies.

Gcore said Dynamo-powered inference is available on Everywhere Inference and Everywhere AI. Demonstrations of Nvidia Dynamo on Gcore are planned for MWC and Nvidia GTC.