Are we throwing hardware at an architecture problem? In the latest episode of the Utilizing AI Podcast, I joined Stephen Foskett and Gina Rosenthal to explore exactly that.
We covered three interconnected topics: the stark disparity in hardware utilization among top AI organizations, the reflex to buy more GPUs rather than fix systemic bottlenecks, and the measurement challenges that make optimization so elusive.
1. Hardware Utilization Disparity
Utilization rates across the AI industry vary dramatically, and the numbers are striking. xAI, Elon Musk’s AI initiative, reportedly runs at just 11% utilization. That figure alone raises serious questions about how the industry defines efficiency and whether more compute is ever the right answer.
2. The "Throw Hardware at the Problem" Mentality
This is a classic IT trap, and AI is not immune. When organizations hit performance bottlenecks, they default to purchasing more hardware rather than diagnosing the underlying system. That instinct backfires. Adding GPUs to a poorly managed infrastructure does not reduce costs. It makes them harder to justify and the system harder to run.
3. Measurement and Workload Optimization
Two factors make AI optimization genuinely hard. First, no standardized GPU utilization metric exists. Unlike CPUs, where core usage gives a clear signal, GPU efficiency is often judged by memory consumption, which tells a different story from how busy the compute units actually are. Second, the industry lacks any apples-to-apples benchmark for comparing efficiency across workloads.
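To make the measurement gap concrete, here is a minimal Python sketch of how the same GPU can look nearly full by one measure and nearly idle by another. Everything in it is an assumption for illustration: the `GpuSample` fields, the `summarize` helper, and the readings themselves are made up, not data from the episode or from any particular monitoring tool.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class GpuSample:
    """One hypothetical monitoring sample for a single GPU."""
    compute_busy_pct: float   # share of the sampling window with work running
    memory_used_gib: float    # device memory allocated at sample time
    memory_total_gib: float   # device memory capacity


def summarize(samples):
    """Return (average compute-busy %, average memory-occupancy %)."""
    compute = mean(s.compute_busy_pct for s in samples)
    memory = mean(100.0 * s.memory_used_gib / s.memory_total_gib for s in samples)
    return compute, memory


# Made-up readings: a model is loaded and holds most of the memory,
# but the GPU spends most of its time waiting for work.
samples = [
    GpuSample(compute_busy_pct=10.0, memory_used_gib=72.0, memory_total_gib=80.0),
    GpuSample(compute_busy_pct=15.0, memory_used_gib=72.0, memory_total_gib=80.0),
    GpuSample(compute_busy_pct=5.0, memory_used_gib=72.0, memory_total_gib=80.0),
]

compute_pct, memory_pct = summarize(samples)
print(f"compute busy: {compute_pct:.0f}%  memory occupied: {memory_pct:.0f}%")
```

A dashboard built on the memory figure would call this GPU well used, while one built on the compute figure would call it mostly idle. Which number gets reported as "utilization" changes the conclusion, and that is the core of the measurement problem.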
Without consistent measurement, organizations cannot tell whether they have an architecture problem or simply need more hardware. Usually, it is the former.
Is your organization measuring GPU utilization or just assuming more hardware solves the problem? Watch the full episode to hear how Stephen, Gina, and I unpack it. And if this resonates with what you are seeing in the field, let’s talk.
For more of our insights on GPU optimization, have a look at “The Tech Corner”.