
The Next Wave of AI: Why Inference Is Taking Over

AI has entered a new chapter. Training large models defined the pioneering era. Now the real action is in inference: deploying those models at scale, everywhere.

Guy Currier of The Futurum Group presented his case at Tech Field Day AI 8 during a community session I attended.

The infrastructure is shifting, and so is the spending. The architecture is moving outward from centralized hyperscale data centers toward hybrid, edge, and on-premises deployments, and smaller, smarter models are replacing one-size-fits-all giants. This is not an incremental change. It is a structural break.

From Training to Inference: Two Different Animals

Training and inference are fundamentally different workloads, so they require different infrastructure.

Training is batch-oriented and data-hungry. It runs on massive GPU clusters, processes enormous datasets, and can tolerate long job queues. Latency, therefore, is not the primary constraint. Inference, by contrast, runs continuously, responds to users in real time, and is acutely sensitive to latency.

Inference workloads are projected to account for roughly two-thirds of all AI compute in 2026, up from one-third in 2023. Because a production AI system serves requests continuously, inference can also account for 80–90% of its lifetime cost.
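To see why continuous serving dominates lifetime cost, here is a minimal sketch. It assumes training is a one-time cost and serving cost is flat per day, and the dollar figures are hypothetical, chosen only for illustration; they are not from the talk or from Futurum's data.

```python
def inference_share(training_cost: float, daily_inference_cost: float, days_in_service: int) -> float:
    """Fraction of lifetime cost spent on inference.

    Simplifying assumptions: training is paid once up front, and
    inference cost is a flat amount per day of production service.
    """
    inference_total = daily_inference_cost * days_in_service
    return inference_total / (training_cost + inference_total)


# Hypothetical figures: a $1M training run, then $5,000/day of serving for three years.
share = inference_share(training_cost=1_000_000, daily_inference_cost=5_000, days_in_service=3 * 365)
print(f"Inference share of lifetime cost: {share:.1%}")  # prints: Inference share of lifetime cost: 84.6%
```

The share keeps climbing the longer the system stays in service: with the same hypothetical numbers over six years, it passes 90%.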

The rise of agentic AI, however, is blurring the line. Agentic deployments can increase token consumption by 20–30× compared with standard generative AI. Agents plan, retrieve context, call tools, and iterate, so a single user request can trigger dozens of model calls.
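A rough token model shows why the multiplier grows faster than the number of calls. This sketch assumes a hypothetical agent that plans, makes several tool calls, and then answers, with every call re-reading the full, growing context. The token sizes are made up, and the resulting ratio reflects those assumptions, not a measurement.

```python
from dataclasses import dataclass


@dataclass
class ModelCall:
    prompt_tokens: int  # tokens the model reads on this call
    output_tokens: int  # tokens the model generates on this call

    @property
    def total(self) -> int:
        return self.prompt_tokens + self.output_tokens


def single_shot(prompt: int, output: int) -> list[ModelCall]:
    """Standard generative AI: one request, one model call."""
    return [ModelCall(prompt, output)]


def agent_run(prompt: int, output: int, tool_steps: int, tool_result: int) -> list[ModelCall]:
    """Hypothetical agent: one planning call, `tool_steps` tool-use calls, one final answer.

    Each call re-reads the accumulated context (prompt, earlier outputs,
    tool results), so later calls are more expensive than earlier ones.
    """
    calls = []
    context = prompt
    calls.append(ModelCall(context, output))      # planning call
    context += output
    for _ in range(tool_steps):
        calls.append(ModelCall(context, output))  # decide the next tool call
        context += output + tool_result           # append the decision and the tool's output
    calls.append(ModelCall(context, output))      # final answer
    return calls


def total_tokens(calls: list[ModelCall]) -> int:
    return sum(c.total for c in calls)


# Hypothetical sizes, for illustration only.
baseline = single_shot(prompt=1_000, output=300)
agent = agent_run(prompt=1_000, output=300, tool_steps=6, tool_result=500)
print(f"{len(agent)} calls, {total_tokens(agent) / total_tokens(baseline):.1f}x the tokens")
```

Eight calls cost far more than eight times a single call here, because each step pays again for everything that came before it.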

Additionally, fine-tuning, once a purely training-side activity, now happens on smaller models closer to the edge, further dissolving the clean boundary between the two phases. Any infrastructure strategy that treats training and inference as the same problem will be wrong on both counts.

Where AI Lives: On-Prem, Cloud, Hybrid, and the Edge

According to The Futurum Group’s 2H 2025 AI Platforms Market Sizing & Five-Year Forecast, Hyper/Neo cloud and On-Premises spending will roughly triple between 2026 and 2030. Hybrid/Edge spending, however, is on track to grow roughly sixfold over the same period, double the growth multiple of the other categories.
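Converting those total multiples into annual growth rates makes the gap concrete. This is a small worked example. It assumes "between 2026 and 2030" means four compounding years and uses the rounded 3× and 6× multiples quoted above.

```python
def implied_annual_rate(multiple: float, years: int) -> float:
    """Compound annual growth rate implied by growing `multiple`-fold over `years`."""
    return multiple ** (1 / years) - 1


# Assumption: 2026 -> 2030 is treated as four compounding years.
YEARS = 4
for label, multiple in [("Hyper/Neo cloud, On-Premises (~3x)", 3), ("Hybrid/Edge (~6x)", 6)]:
    print(f"{label}: {implied_annual_rate(multiple, YEARS):.1%} per year")
```

Doubling the total multiple does not double the annual rate. It works out to roughly 57% a year for Hybrid/Edge versus roughly 32% for the other categories.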

Figure 1 - AI Investment Will Accelerate Away from the Core

That acceleration makes strategic sense. The global edge AI market was estimated at $24.9 billion in 2025 and is projected to reach $66.47 billion by 2030, a CAGR of 21.7%. Meanwhile, according to IDC, 85% of cloud buyers have either deployed or are actively deploying a hybrid cloud.

Hybrid wins because it bridges two worlds: heavy training stays centralized, while inference shifts closer to the point of consumption.

Beyond hybrid, the edge category reflects how broadly AI is spreading. AI PC spending is projected to grow from $7 billion in 2024 to $98 billion by 2029, a 69% compound annual growth rate (The Futurum Group, Intelligent Devices, Jan. 2025).
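The edge AI and AI PC figures can be checked with the standard compound-growth formula. This sketch takes the start values, end values, and year spans directly from the two paragraphs above. It checks only internal consistency, not the forecasts themselves.

```python
def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate between two values `years` apart."""
    return (end / start) ** (1 / years) - 1


def project(start: float, rate: float, years: int) -> float:
    """Value after compounding `start` at `rate` for `years` years."""
    return start * (1 + rate) ** years


# Edge AI market: $24.9B in 2025, compounding at 21.7% for five years.
print(f"Edge AI 2030: ${project(24.9, 0.217, 5):.2f}B")  # prints: Edge AI 2030: $66.47B
# AI PC spending: $7B in 2024 to $98B in 2029, also five years.
print(f"AI PC CAGR: {cagr(7, 98, 5):.1%}")  # prints: AI PC CAGR: 69.5%
```

Both figures hold up to within rounding. The AI PC rate computes to about 69.5%, which the source reports as 69%.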

Smaller Models, Smarter Deployments

One of the most consequential shifts underway is the move toward smaller, domain-specific language models. Initially, large foundation models were necessary to establish what AI could do. However, they are expensive to run and often excessive for focused tasks.

By 2026, fine-tuning practice has shifted toward model distillation. A large teacher model generates high-quality synthetic training data, and that data is then used to fine-tune a smaller student model. For specific tasks, this approach can deliver roughly 10× cheaper inference at near-frontier quality.
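The teacher-student workflow can be shown with a deliberately tiny numeric analogue instead of a language model. In this sketch, a hypothetical "teacher" evaluates sin(x) with a 20-term series. It labels synthetic inputs drawn from a narrow task domain, and a two-parameter linear "student" is fit to those labels by least squares. The functions, domain, and sizes are all illustrative assumptions. The sketch shows the shape of the pipeline (generate data, label it with the teacher, fit the student), not real distillation quality or cost figures.

```python
TEACHER_TERMS = 20  # hypothetical "large" model: 20 series terms per call


def teacher(x: float) -> float:
    """Stand-in for a large model: sin(x) evaluated with a long Taylor series."""
    total, term = 0.0, x
    for n in range(TEACHER_TERMS):
        total += term
        term *= -x * x / ((2 * n + 2) * (2 * n + 3))  # next odd-power term
    return total


def synthetic_dataset(n: int, lo: float, hi: float) -> tuple[list[float], list[float]]:
    """Step 1: generate inputs from the narrow task domain and let the teacher label them."""
    xs = [lo + (hi - lo) * i / (n - 1) for i in range(n)]
    return xs, [teacher(x) for x in xs]


def fit_student(xs: list[float], ys: list[float]) -> tuple[float, float]:
    """Step 2: fit a two-parameter student y ~ a*x + b by ordinary least squares."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    a = sxy / sxx
    return a, mean_y - a * mean_x


def student(params: tuple[float, float], x: float) -> float:
    a, b = params
    return a * x + b


xs, ys = synthetic_dataset(101, -0.5, 0.5)
params = fit_student(xs, ys)

# Evaluate on held-out points inside the task domain, then far outside it.
held_out = [-0.5 + i / 36 for i in range(37)]
in_domain_err = max(abs(student(params, x) - teacher(x)) for x in held_out)
out_of_domain_err = abs(student(params, 2.0) - teacher(2.0))
print(f"max error in domain: {in_domain_err:.4f}, error at x=2.0: {out_of_domain_err:.2f}")
```

Inside its domain, the student stays within about 0.01 of the teacher while doing one multiply-add per call. Outside that domain, it is badly wrong. That is the trade the paragraph describes: near-teacher quality on a specific task, not general capability.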

Many production use cases simply do not need a frontier model. A healthcare provider needs domain accuracy, not general-purpose breadth. A manufacturer running quality inspection at the edge needs low latency and low power, not a 70-billion-parameter model. As inference infrastructure matures, model size becomes a tunable variable rather than a fixed constraint.
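Treating model size as a tunable variable can be as simple as choosing the smallest model that clears a workload's accuracy and latency targets. This is a hypothetical sketch. The catalog names, accuracies, and latencies are invented for illustration and are not benchmarks of any real model.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelOption:
    name: str
    params_billions: float
    task_accuracy: float   # measured on the target task's eval set
    p95_latency_ms: float  # measured on the target hardware


def pick_model(options: list[ModelOption], min_accuracy: float, max_latency_ms: float) -> Optional[ModelOption]:
    """Return the smallest model meeting both the accuracy and latency targets, or None."""
    feasible = [m for m in options
                if m.task_accuracy >= min_accuracy and m.p95_latency_ms <= max_latency_ms]
    return min(feasible, key=lambda m: m.params_billions, default=None)


# Hypothetical catalog: names and numbers are illustrative only.
CATALOG = [
    ModelOption("general-70b", 70, 0.91, 900),
    ModelOption("domain-8b", 8, 0.90, 120),
    ModelOption("edge-1b", 1, 0.84, 25),
]

inspection = pick_model(CATALOG, min_accuracy=0.80, max_latency_ms=50)   # edge quality inspection
clinical = pick_model(CATALOG, min_accuracy=0.90, max_latency_ms=1_000)  # domain-accuracy workload
print(inspection.name, clinical.name)  # prints: edge-1b domain-8b
```

Preferring the smallest feasible model follows the cost argument above: once the targets are met, extra parameters only add serving cost.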

Conclusion: The Roads Are Being Built

Guy Currier framed the current moment as AI’s “colonia problem.” The frontier settlements (the foundation models and the hyperscale data centers) are built. Now come the roads: inference infrastructure, hybrid deployments, edge nodes, and smaller models that travel light and run fast.

The investment data points in the same direction. The global AI inference market is projected to expand from $106 billion in 2025 to $255 billion by 2030, a CAGR of 19.2%. Hybrid/Edge is the fastest-growing deployment category, and smaller fine-tuned models are becoming the production standard. Agentic AI, meanwhile, is turning latency into an architectural constraint rather than merely a performance metric.

Ultimately, the next four years will be defined by who builds the best infrastructure, not the biggest models.
