GPU & Compute Demand Forecasting

The $1.7T Compute Stack

How enterprise AI adoption, hyperscaler investment, and the Jevons Paradox are shaping GPU demand through 2030. Two models, one conclusion: the world needs exponentially more compute.

$660B — Hyperscaler Capex 2026
12.7x — YoY Token Growth
10x — Blackwell Efficiency
58% — NVIDIA Share 2030
$1.7T — AI Compute TAM 2030
Part V — The Infrastructure Arms Race
Chapter 16: The GPU Demand Curve

$1.7 trillion. That is the projected size of the AI compute stack by 2030 — a number that encompasses GPUs, networking, cooling, power infrastructure, and the software that orchestrates it all. It is also, by any historical measure, the single largest infrastructure buildout since the electrification of the industrialized world.

The intelligence revolution is not merely a software phenomenon. Behind every AI-generated paragraph, every reasoning chain, every enterprise copilot suggestion lies a physical substrate of silicon, copper, and kilowatt-hours. Understanding the economics of that substrate — who builds it, who buys it, how much it costs, and where the bottlenecks lie — is essential to any strategic assessment of the AI landscape. This chapter maps the full demand chain: from token consumption to GPU demand to datacenter capacity to power requirements, revealing the interdependencies that will shape the industry through the end of the decade.

Five numbers anchor the analysis. Hyperscaler capital expenditure reaches $660 billion in 2026, more than doubling the $260 billion spent in 2024. Token consumption is growing at 12.7x year-over-year, driven by enterprise adoption, consumer applications, and the emergence of AI agents. NVIDIA's Blackwell architecture delivers a 10x per-token cost reduction over Hopper — yet total compute demand grows, not shrinks, because cheaper inference unlocks use cases that were previously uneconomical. NVIDIA retains 58% of the accelerator market by 2030, down from 92% in 2023, as AMD and custom ASICs gain share in an expanding market. And the total addressable market for AI compute infrastructure reaches $1.7 trillion under the base case scenario.

The central thesis is Jevons Paradox — the 250-year-old observation that making a resource more efficient increases, rather than decreases, its total consumption. Per-token costs drop 2,000x from 2023 to 2030. Token volume grows 20,000x. Total compute spending grows 10x. The same paradox that drove coal consumption upward after James Watt's more efficient steam engine now drives GPU demand upward after each generation of more efficient AI hardware. See also: NVIDIA Infrastructure Analysis and Inference Demand Model.

$130.5B — NVIDIA FY2025 Revenue
10x — Blackwell Cost Reduction
12.1T — Weekly Token Consumption
3.7x — CoWoS Capacity Growth 2024-2026

1. NVIDIA Revenue Trajectory

Total revenue vs data center segment, FY2022-FY2026E

Data center went from 58% of revenue (FY2022) to 88% (FY2026). NVIDIA is now an AI infrastructure company that also makes gaming GPUs.

NVIDIA: From Gaming GPUs to AI Infrastructure Monopoly

NVIDIA's transformation is the most dramatic corporate pivot in semiconductor history. Data center revenue grew from $10.6 billion in FY2022 (58% of total) to $193.7 billion in FY2026 (88% of total) — an 18x expansion in four years. The company that once derived the majority of its revenue from gaming graphics cards is now, by revenue and strategic significance, the most important semiconductor company in the world. Its CUDA software ecosystem, built over 15 years and embraced by more than 4 million developers, creates a moat that no competitor has successfully breached for training workloads.

2. Hyperscaler Capex by Company

Capital expenditure commitments 2024-2026E across the Big Five

Combined capex grows from $260B to $660B in just 2 years. Amazon alone commits $200B in 2026 — more than the entire industry spent in 2024.

The $660 Billion Bet

Combined hyperscaler capex grows from $260 billion in 2024 to $660 billion in 2026 — a 2.5x increase in just two years. Amazon alone commits $200 billion in 2026, more than the entire industry spent in 2024. Alphabet commits $185 billion, Meta $135 billion. These are not speculative allocations; they represent signed contracts for datacenter construction, GPU procurement, and power infrastructure that will take years to build out. Roughly 40% of AI capex flows to GPU and accelerator purchases, 20% to power and cooling, 15% each to networking and facilities, and 10% to software.
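
The 40/20/15/15/10 split above translates directly into dollars. A minimal sketch applying that split to the $660 billion 2026 figure (the percentages are the chapter's; the code itself is purely illustrative):

```python
# Allocate the $660B 2026 hyperscaler capex across the categories
# named in the text (illustrative arithmetic, not a forecast model).
CAPEX_2026_B = 660  # $ billions

split = {
    "GPUs & accelerators": 0.40,
    "Power & cooling": 0.20,
    "Networking": 0.15,
    "Facilities": 0.15,
    "Software": 0.10,
}

allocation = {category: CAPEX_2026_B * share for category, share in split.items()}
for category, dollars in allocation.items():
    print(f"{category:22s} ${dollars:5.0f}B")
```

At this scale, even the smallest slice (software, at 10%) is a $66 billion annual market on its own.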

The risk is concentrated in free cash flow. Combined FCF for the Big Five hyperscalers declines from its 2024 level of $237 billion and risks turning negative by 2027. This is sustainable only if AI-driven revenue growth materializes at the pace these companies project. If it does not, a capex pullback of 20–40% in 2027–2028 is the single largest risk to GPU demand forecasts.

3. Training vs Inference: The Structural Flip

Compute spend evolution from training-dominated to inference-dominated

The flip is structural: training scales with model generations (finite), while inference scales with users and usage (unbounded). By 2030, 80% of compute spend is inference.

The Structural Flip from Training to Inference

The shift from training-dominated to inference-dominated compute is structural, not cyclical. Training scales with model generations — GPT-5 costs more to train than GPT-4, but there is one training run per model generation, and as benchmarks plateau, marginal training dollars yield diminishing returns. Inference, by contrast, scales with users and usage. Every enterprise worker running 540,000 tokens per day, every consumer query, every AI agent action consuming inference — these workloads are recurring, cumulative, and without a natural ceiling. Inference share rises from 20% of compute spend in 2023 to 80% by 2030, and this transition reshapes the entire hardware market. Inference favors throughput-optimized hardware, batch processing, and cost efficiency over the raw FLOPS that dominate training workloads.
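
The 540,000 tokens-per-worker-per-day figure compounds quickly at fleet scale. A back-of-envelope sketch, assuming a hypothetical workforce of 100 million AI-assisted workers (the workforce size is an assumption for illustration, not a figure from the chapter):

```python
TOKENS_PER_WORKER_DAY = 540_000   # per-worker daily consumption, from the text
workers = 100e6                   # hypothetical AI-assisted workforce size

# Aggregate daily inference load from enterprise workers alone.
daily_tokens = TOKENS_PER_WORKER_DAY * workers
print(f"{daily_tokens / 1e12:.0f}T tokens/day from enterprise workers alone")
```

Even under this rough assumption, a single workload category generates tens of trillions of tokens per day — which is why inference, not training, dominates spend by 2030.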

4. Token Consumption Growth

Daily token volume on logarithmic scale, 2024-2030E

From 0.14T tokens/day in 2024 to 200T by 2030 — a 1,400x increase. AI agents are the multiplier: each user action triggers 10-100x more inference calls.
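
The 0.14T-to-200T trajectory implies a striking annual growth rate. A quick sketch of the implied compound rate over the six-year window (arithmetic on the chart's endpoints only):

```python
tokens_2024 = 0.14e12   # tokens/day, 2024 (chart starting point)
tokens_2030 = 200e12    # tokens/day, 2030E (chart endpoint)
years = 6

growth = tokens_2030 / tokens_2024   # total multiple over the window (~1,400x)
cagr = growth ** (1 / years) - 1     # implied compound annual growth

print(f"total growth: {growth:,.0f}x, implied CAGR: {cagr:.0%}")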

5. AI Accelerator Market Share Evolution

NVIDIA dominance erodes as AMD and custom ASICs grow — but in an expanding market

NVIDIA drops from 92% to 58% — but 58% of a $200B market ($116B) exceeds their entire FY2025 revenue. Losing share in a growing market still means growth.

The Competitive Landscape: NVIDIA's Moat Erodes — in a Growing Market

NVIDIA's market share drops from 92% to 58% between 2023 and 2030, but 58% of a $200 billion market yields $116 billion in data center GPU revenue alone — still exceeding NVIDIA's entire FY2025 revenue of $130.5 billion. AMD's MI300X shipped 327,000 units in 2024, competitive on inference price-performance though hampered by the weaker ROCm software ecosystem. Custom ASICs from Google (TPU v6e), AWS (Trainium2), Microsoft (Maia 100), and Meta (MTIA v2) are projected to reach 20% of the accelerator market by 2030. They are strongest for inference, where workloads are predictable and can be optimized for specific model architectures, and weakest for training, where NVIDIA's CUDA ecosystem and NVLink interconnect remain irreplaceable.

6. TSMC CoWoS: Supply vs Demand

Advanced packaging capacity vs demand in thousands of wafers per month

The severe GPU shortage of 2023 (12-month waits) resolves by 2026 as CoWoS capacity scales from 15K to 130K wafers/month. After 2026, power — not packaging — is the bottleneck.

Supply Chain Resolution

The supply story is one of resolution. TSMC's CoWoS advanced packaging — the critical manufacturing bottleneck for AI GPUs — scales from 15,000 wafers per month in 2023 to 130,000 by 2026, a nearly 9x expansion. The severe GPU shortage that produced 12-month H100 wait times in 2023 eases steadily, with wait times falling to effectively zero by 2026. After 2026, GPU supply ceases to be the binding constraint. Power and cooling become the new bottleneck: GB200 NVL72 racks consume 72 kilowatts each, data center power demand grows 15–20% annually, and in power-constrained regions, grid capacity — not silicon availability — determines how quickly new AI infrastructure can be deployed.
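
The power constraint can be sized with simple arithmetic. A sketch using the chapter's 72 kW per-rack figure, with a hypothetical 100,000-rack fleet and an assumed PUE of 1.3 (the fleet size and PUE are assumptions, not figures from the text):

```python
RACK_POWER_KW = 72    # per-rack draw, figure used in the chapter
racks = 100_000       # hypothetical fleet size
pue = 1.3             # assumed power usage effectiveness (cooling, losses)

it_load_gw = racks * RACK_POWER_KW / 1e6    # IT load in gigawatts
facility_gw = it_load_gw * pue              # total facility draw
annual_twh = facility_gw * 8760 / 1000      # TWh/year at full utilization

print(f"IT load {it_load_gw:.1f} GW, facility {facility_gw:.2f} GW, "
      f"~{annual_twh:.0f} TWh/yr")
```

A fleet of this (hypothetical) size draws on the order of 9 GW continuously — several large power plants' worth of dedicated generation — which is why grid capacity, not packaging, sets the post-2026 pace.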

7. Compute Demand by Workload Type

How AI compute demand is distributed across workload categories

8. Demand Models: Bottom-Up vs Top-Down

Two independent models converge — then diverge after 2026, revealing a capacity gap

Both models agree on ~15M H100-equivalents by 2026. The divergence after 2026 means either capex must accelerate further, efficiency must exceed projections, or demand goes partially unmet.

Two Models, One Conclusion

We built two independent demand models — one bottom-up from enterprise tasks, tokens, and GPU requirements, one top-down from hyperscaler capex — and they converge at approximately 15 million H100-equivalents by 2026, within 10% of each other. After 2026, they diverge: the bottom-up model, which assumes Jevons Paradox in full effect, projects exponential token demand growth that outpaces the top-down model's capex constraints. This divergence is the key finding. Either hyperscaler spending must accelerate further, or efficiency gains must exceed our projections, or some demand will go unmet. The gap is not an error in the models; it is the strategic question facing the industry.
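
The convergence of the two models can be illustrated with a toy version of each. Every input below is a hypothetical placeholder chosen only to land near the ~15M 2026 figure; the chapter's actual models are far more detailed:

```python
def bottom_up(tokens_per_day: float, sustained_tok_per_gpu_s: float) -> float:
    """H100-equivalents needed to serve a daily token load."""
    tokens_per_gpu_day = sustained_tok_per_gpu_s * 86_400
    return tokens_per_day / tokens_per_gpu_day

def top_down(capex_b: float, gpu_share: float, cost_per_gpu: float,
             prior_base_m: float) -> float:
    """Cumulative H100-equivalents (millions) affordable under a capex budget."""
    added = capex_b * 1e9 * gpu_share / cost_per_gpu
    return prior_base_m + added / 1e6

# All inputs are illustrative assumptions, not figures from the chapter:
# 8T tokens/day demand, ~6 tok/s sustained per GPU (reflecting long-context
# workloads and utilization losses), $28K effective cost per H100-equivalent,
# a 6M-unit prior installed base, and the 40% GPU share of $660B capex.
bu = bottom_up(tokens_per_day=8e12, sustained_tok_per_gpu_s=6.0) / 1e6
td = top_down(capex_b=660, gpu_share=0.40, cost_per_gpu=28_000, prior_base_m=6.0)

print(f"bottom-up: {bu:.1f}M  top-down: {td:.1f}M H100-equivalents")
```

The point of the toy version is structural, not numerical: the bottom-up path scales with token demand, the top-down path scales with capex, and after 2026 the first grows faster than the second.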

9. The Jevons Paradox

Per-token cost drops 2,000x while token volume grows 20,000x — total compute demand grows 10x

This is the GPU demand story in one chart. Cheaper compute doesn't reduce total consumption — it increases it, just like Watt's steam engine increased coal use despite being 3x more efficient.

Jevons Paradox: The Engine of GPU Demand

This is the GPU demand story in one chart. In 2026, NVIDIA's Blackwell delivers approximately 10x per-token cost reduction over Hopper. Token consumption, however, grows 4.6x in the same year. The net result: the world deploys 4.6x more raw compute even as each token becomes 10x cheaper to serve. The paradox unlocks entirely new categories of AI usage. Enterprise tasks that cost $1 per query at Hopper prices become $0.10 on Blackwell — and adoption triples. AI agents that require 100 inference calls per action become feasible at scale. Consumer applications that needed free-tier economics to survive become commercially sustainable. Each efficiency gain widens the addressable market faster than it shrinks the per-unit cost, ensuring that total compute spending rises even as the unit price falls.

The historical precedent is consistent and unambiguous. Watt's steam engine was 3x more efficient than Newcomen's; total coal consumption increased. LED lighting is 90% more efficient than incandescent; total light consumption rose 5x. Cloud computing was 10x cheaper than on-premises infrastructure; total compute spending grew 20x. AI inference costs drop 2,000x from 2023 to 2030; total inference consumption grows 20,000x.
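
The arithmetic behind the paradox is one line: total spend scales with volume growth divided by per-unit cost reduction. Using the chapter's 2023–2030 figures:

```python
# Jevons Paradox arithmetic from the chapter's 2023-2030 projections.
cost_reduction = 2_000    # per-token cost drop, 2023 -> 2030
volume_growth = 20_000    # token volume growth over the same period

# Spend multiple = (volume growth) / (per-unit cost reduction).
spend_multiple = volume_growth / cost_reduction
print(f"total compute spend grows {spend_multiple:.0f}x "
      f"despite a {cost_reduction:,}x per-token cost drop")
```

Efficiency wins only suppress total spending when volume growth lags the cost decline — and in every precedent the chapter cites, it did not.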

10. Scenario Analysis: NVIDIA Revenue

Bull, Base, and Bear cases for NVIDIA revenue through 2030

Even the bear case has NVIDIA at $250B by 2030 — nearly double FY2025. The bull case at $500B would make NVIDIA the highest-revenue semiconductor company in history.

11. Scenario Analysis: Installed GPU Base

Cumulative H100-equivalent GPUs deployed worldwide

The installed base grows 5-12x from 2025 to 2030. Even the bear case requires 45M H100-equivalents — 5x today's deployed base.

12. Risk Factor Matrix

Key risks to the GPU demand forecast: probability vs severity. Bubble size = impact magnitude.

Demand Segmentation Deep Dive

Demand breakdown by segmentation view and year

Scenario Analysis and Risk Assessment

Even the bear case projects NVIDIA revenue of $250 billion by 2030 — nearly double FY2025. The bull case, at $500 billion, would make NVIDIA the highest-revenue semiconductor company in history. The installed GPU base grows 5–12x from 2025 to 2030 across all scenarios, with the base case projecting 72 million H100-equivalents and the bull case reaching 100 million. The primary risk factor remains hyperscaler capex sustainability: $660 billion in 2026 commitments against negative projected free cash flow by 2027 is only viable if AI revenue growth materializes at the pace these companies project.
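
The scenario endpoints imply very different growth rates off the FY2025 base. A sketch of the implied revenue CAGRs (the bear and bull endpoints are the chapter's; the five-year horizon is an assumed alignment of fiscal and calendar years):

```python
rev_fy2025_b = 130.5                           # NVIDIA FY2025 revenue, $B
scenarios_2030_b = {"bear": 250, "bull": 500}  # 2030 revenue scenarios, $B
years = 5                                      # assumed horizon, FY2025 -> 2030

cagrs = {name: (rev / rev_fy2025_b) ** (1 / years) - 1
         for name, rev in scenarios_2030_b.items()}
for name, cagr in cagrs.items():
    print(f"{name}: {cagr:.1%} implied annual revenue growth")
```

The bear case requires only ~14% annual growth — modest by NVIDIA's recent standards — while the bull case requires sustaining roughly 30% per year for five years.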

Secondary risks include the acceleration of custom ASICs, which reduce NVIDIA-specific demand but not total compute demand; geopolitical export controls that bifurcate the market; and power infrastructure constraints that may delay deployments by 6–18 months in capacity-constrained regions. The one risk the data argues against is efficiency gains outpacing demand. The 250-year record of Jevons Paradox, replicated across every major efficiency improvement in computing history, suggests that cheaper AI will produce more AI consumption, not less.

What Comes Next

The $1.7 trillion compute stack exists to serve enterprise demand. Part VI begins with Chapter 19: Enterprise Disruption, which maps the $607 billion enterprise AI TAM across eight industry domains — translating the GPU infrastructure analyzed here into the business value it enables. The question shifts from "how much compute does the world need?" to "what does the enterprise do with it?"