The State of AI — Chapter 2

The Intelligence Cost Curve

AI Pricing, Commoditization & the $20 Subscription Crisis — tracking 22 models across 6 years, three scenarios through 2029, and the historical forces that make near-free intelligence inevitable.
1,000x: Cost reduction since GPT-3 (2020)
12-18 mo: Intelligence/$ doubling time
22: Models tracked
97-99%: Subscription margin crisis by 2028
5: Task complexity tiers
Part I — The Intelligence Layer
Chapter 2: The Intelligence Cost Curve

AI inference pricing is commoditizing faster than any technology in recorded economic history. The question is no longer whether intelligence becomes cheap—it is how enterprises position themselves before it does.

In March 2023, accessing GPT-4-class intelligence cost $60 per million output tokens. By February 2026, equivalent quality is available from DeepSeek V3 at $0.28 per million tokens—a 99.5% price collapse in under three years. Even within OpenAI’s own product line, the journey from GPT-3 to GPT-4o-mini represents a 400x reduction in per-token cost at roughly comparable quality. The Stanford HAI 2025 AI Index documents a 280x inference cost reduction across the broader market. No prior technology—not semiconductors, not bandwidth, not electricity—has deflated this rapidly at the same stage of adoption.

We call this dynamic the Densing Law: capability density per dollar of inference spend doubles approximately every 3.5 months. Unlike Moore’s Law, which operated on an 18–24 month cadence constrained by lithography physics, the Densing Law compounds across four simultaneous vectors—hardware efficiency, algorithmic optimization, competitive pricing pressure, and open-source commoditization. The result is a cost curve that bends downward faster than enterprise planning cycles can absorb.
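The doubling-time arithmetic behind this contrast can be sketched directly. In the snippet below, 3.5 months is the chapter's Densing Law estimate and 21 months stands in for the midpoint of Moore's 18-24 month cadence; the comparison is illustrative only.

```python
# Capability-per-dollar multiplier over a horizon, given a doubling time.
# 3.5 months = this chapter's Densing Law estimate; 21 months = a midpoint
# of the Moore's Law cadence cited above. Illustrative arithmetic only.

def multiplier(months: float, doubling_months: float) -> float:
    """Capability-density multiplier accumulated after `months`."""
    return 2.0 ** (months / doubling_months)

densing = multiplier(12, 3.5)   # one year under the Densing Law
moore   = multiplier(12, 21)    # one year under a Moore's Law cadence

print(f"Densing Law, 1 year: {densing:.1f}x")   # ~10.8x
print(f"Moore's Law, 1 year: {moore:.2f}x")     # ~1.49x
```

A 3.5-month doubling compounds to roughly 10.8x per year, versus about 1.5x for Moore's Law, which is why planning cycles calibrated to semiconductor-era deflation consistently under-forecast AI cost declines.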

This chapter tracks 22 models across six years, projects three probabilistic scenarios through 2029, dissects the emerging subscription crisis, and maps the historical analogies that make near-free intelligence a near-certainty. The data tell a single, unambiguous story: value is migrating from the model layer to the orchestration layer, and the organizations that grasp this shift earliest will define the next era of competitive advantage.

1. Historical Pricing — The 1,000x Collapse

Output price per 1M tokens from GPT-3 (2020) to February 2026. Logarithmic scale reveals the exponential decline.

AI Output Pricing Timeline (Log Scale)

From $60/M (GPT-3, 2020) to $0.28/M (DeepSeek V3, 2025) — with premium reasoning models at $80-600/M creating a 2,143x range.

The pricing timeline above reveals a pattern that should alarm any executive still budgeting AI as a premium line item. The 1,000x cost reduction from GPT-3 to today’s budget models is not a gradual decline—it is a series of step-function collapses, each triggered by a new competitive entrant or architectural breakthrough. GPT-3.5-turbo slashed prices 30x. GPT-4o cut them another 4x. Then GPT-4o-mini dropped the floor by 25x more. DeepSeek V3, trained for a reported $5.6 million, proved that frontier-class quality could be delivered at $0.28 per million tokens—undercutting every Western provider by an order of magnitude.

Yet the market has simultaneously bifurcated. While budget models race toward zero, premium reasoning models like o1-pro command $600 per million tokens—ten times more expensive than the original GPT-4. This creates a 2,143x price range within a single product category, a spread unprecedented in technology markets. The implication is strategic: enterprises must learn to route tasks to the appropriate cost tier, matching model capability to task complexity rather than defaulting to the most expensive option.
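The headline ratios in the two paragraphs above follow directly from the quoted prices (all per 1M output tokens):

```python
# Verify the collapse and spread figures from the quoted prices
# (all $ per 1M output tokens, as cited in this section).
gpt4_2023   = 60.00   # GPT-4, March 2023
deepseek_v3 = 0.28    # DeepSeek V3
o1_pro      = 600.00  # premium reasoning tier

collapse = 1 - deepseek_v3 / gpt4_2023   # GPT-4-class price decline
spread   = o1_pro / deepseek_v3          # budget-to-premium range

print(f"price collapse: {collapse:.1%}")   # 99.5%
print(f"price spread:   {spread:,.0f}x")   # 2,143x
```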

2. The Intelligence per Dollar Curve

Original projection vs February 2026 reality — the curve is 3-6x ahead of schedule.

Intelligence/$ Multiplier (vs 2023 GPT-4 Baseline)

Doubling every 12-18 months. Feb 2026 reality: 8-15x practical value, vs 2.5x originally projected.

The intelligence-per-dollar curve is the single most important metric in this report. It captures both sides of the equation—falling prices and rising capability—in a single trajectory. The original 2024 projection estimated a 2.5x multiplier by 2026 relative to the GPT-4 baseline. The February 2026 reality is 8–15x, placing us three to six times ahead of schedule. GPT-4-equivalent output cost has fallen from $60 per million tokens to $0.40. Meanwhile, the best available MMLU score has climbed from 86.4% to 93%+, a gain of nearly seven percentage points.
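The ahead-of-schedule factor is simple division over the chapter's own figures:

```python
# How far ahead of the original projection the observed multiplier sits.
# All inputs are the figures quoted in the paragraph above.
projected_2026 = 2.5          # original 2024 projection (x GPT-4 baseline)
actual_2026    = (8.0, 15.0)  # February 2026 observed range

ahead = tuple(a / projected_2026 for a in actual_2026)
print(f"ahead of schedule: {ahead[0]:.1f}x-{ahead[1]:.1f}x")  # 3.2x-6.0x
```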

This acceleration is not a temporary anomaly. It reflects the compounding of multiple deflationary forces acting simultaneously: NVIDIA’s Blackwell architecture delivers roughly 10x cost reduction per token over Hopper; inference optimizations—speculative decoding, INT4 quantization, KV-cache sharing, continuous batching—have collectively reduced serving costs 3–5x from 2023 baselines; and open-source models from Meta, Mistral, and DeepSeek have compressed the time between a frontier release and a commodity-priced equivalent to fewer than six months. The curve is not merely doubling every 12–18 months. It is compounding faster than any prior technology cost curve on record.

3. Cost Drivers — Five Forces Down, Five Forces Up

Reducers dominate for standard inference (80%+ of tasks). Increasers only hold for frontier reasoning.

▼ Price Reducers

Hardware Efficiency (25%): Each GPU generation is 2-3x more efficient. H100→B200 = 2.5x inference throughput. Custom AI chips (TPU, Trainium, Groq)...
Competition / Price Wars (25%): More providers mean price wars. OpenAI, Anthropic, Google, xAI, Meta, DeepSeek, Mistral, and Cohere are all competing. Chinese providers...
Inference Optimization (20%): Speculative decoding, quantization (FP16→INT4), distillation, KV-cache optimization, batching, continuous batching. Each...
Scale Economics (15%): Larger deployments mean lower per-unit costs. Hyperscalers run at massive scale. Data center efficiency improvements compound...
Open Source Pressure (15%): Llama, Mistral, DeepSeek, and Qwen force price drops. Local models cost $0/month after hardware. Closed providers must compete...

▲ Price Increasers

Training Compute (30%): Frontier models cost $100M-$1B+ to train. GPT-5 estimated at $200-500M. Google Gemini Ultra reportedly $100M+. These costs...
Energy Costs (20%): Data centers are hitting power limits. AI inference is energy-intensive. Nuclear, solar, and new grid connections are needed. Power...
AI Talent Costs (15%): AI researcher salaries are rising. Senior ML engineers command $500K-$2M+. Talent wars between labs, FAANG, and startups continue.
Safety & Alignment Overhead (15%): RLHF, red-teaming, content filtering, constitutional AI, monitoring. Each adds 10-30% to training and inference cost.
Reasoning Tokens (20%): Chain-of-thought, tree-of-thought, and extended thinking use 10-100x more tokens. o1/o3 models generate massive hidden reasoning...

Cost Driver Impact Weights

Reducers vs increasers — net direction is strongly downward for standard inference.

The five price reducers collectively account for a net annual decline of roughly 50% for standard inference tasks. Hardware efficiency and competitive price wars carry the heaviest weight at 25% each. Open-source pressure—from Llama, Mistral, DeepSeek, and Qwen—acts as a relentless floor-lowering mechanism: whenever a closed provider maintains premium pricing, an open alternative emerges within months offering 83–94% of the capability at a fraction of the cost. The price increasers—training compute, energy, talent, safety overhead, and reasoning token multiplication—are real but concentrated. They matter most for frontier reasoning tasks, which represent only about 5% of total inference volume.

The net assessment is decisive: for the 80% of tasks that are routine or standard professional work, the deflationary forces are overwhelming. For the 15% that require mid-tier intelligence, prices are falling at 40–60% per year. Only the top 5%—frontier reasoning, novel research, and breakthrough problem-solving—see prices hold or rise, driven primarily by the 10–100x reasoning token multiplier that chain-of-thought and tree-of-thought architectures demand. This creates the bifurcated market the scenario analysis below explores.
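A minimal sketch of what a ~50% net annual decline implies for standard-tier pricing, taking the $0.40/M GPT-4-equivalent figure from Section 2 as the starting point; this is an illustrative compounding exercise, not a forecast.

```python
# Compound a constant annual price decline forward from a base year.
# Starting price and rate are this chapter's figures; illustrative only.

def project(price_now: float, annual_decline: float, years: int) -> float:
    """Price after `years` of compounding decline."""
    return price_now * (1 - annual_decline) ** years

p0 = 0.40  # $/1M tokens, GPT-4-equivalent output, early 2026
for y in range(4):
    print(f"{2026 + y}: ${project(p0, 0.50, y):.3f}/M")
# 2026: $0.400/M ... 2029: $0.050/M
```

At a 50% annual decline, the standard tier falls by 8x every three years, which is the mechanism behind the sub-cent pricing the later scenario analysis assumes.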

4. Three Scenarios for 2025-2029

Scenario A (aggressive, 60%) is on track or ahead. Scenario C (premium divergence, 10%) is happening faster than expected. They coexist.

Scenario A vs B: Mid-tier Price Trajectory

Standard output pricing per 1M tokens (Scenario A = aggressive 60%, B = moderate 30%)

Scenario C: Premium Divergence

Budget models race to $0.01/M while premium rises to $50/M — the gap explodes to 5,000x

The critical insight from the scenario analysis is that Scenarios A and C are not mutually exclusive—they are playing out simultaneously across different market tiers. Standard inference is following the aggressive Scenario A trajectory, with budget models already at $0.28–$0.60 per million tokens—prices that the original 2024 analysis did not expect until 2028. Meanwhile, frontier reasoning is tracking Scenario C, with premium models commanding $25–$600 per million tokens, creating a 50–2,000x gap that already exceeds the original 2027 projections. The market is not following a single path; it is stratifying into distinct economic layers, each with its own cost dynamics.

For enterprise strategists, this bifurcation presents both an opportunity and a trap. The opportunity lies in the 80/15/5 routing pyramid: intelligently matching tasks to the correct cost tier can reduce inference budgets by 10–50x without sacrificing output quality. The trap is treating all AI spend as a single budget line, overpaying for routine tasks with premium models or, worse, under-investing in frontier capabilities where reasoning quality genuinely matters. The organizations that master this routing discipline will extract dramatically more value per dollar than those that do not.
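The 80/15/5 pyramid's savings claim can be sketched as a blended-cost calculation. The per-tier prices below are placeholders chosen to be representative of the ranges in this chapter, not quoted rates.

```python
# Blended cost of the 80/15/5 routing pyramid vs sending everything to
# the frontier tier. Tier prices are illustrative placeholders.
tiers = {            # tier: (share of tasks, $ per 1M output tokens)
    "budget":   (0.80,  0.40),
    "mid":      (0.15,  3.00),
    "frontier": (0.05, 60.00),
}

blended      = sum(share * price for share, price in tiers.values())
all_frontier = tiers["frontier"][1]
savings      = all_frontier / blended

print(f"blended: ${blended:.2f}/M")          # $3.77/M
print(f"vs all-frontier: {savings:.1f}x")    # 15.9x
```

Even with these conservative placeholder prices, routing yields a ~16x budget reduction, comfortably inside the 10-50x range cited above; cheaper budget tiers push the factor toward the top of that range.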

5. The $20 Subscription Margin Collapse

Provider cost drops from $5-15 (2025) to $0.10-0.50 (2028) while users still pay $20 — 97-99% margins are unsustainable.

$20 Subscription: User Pays vs Provider Cost

The growing gap between what users pay and what it costs the provider. By 2028, users are paying $20 for $0.50 of compute.

The subscription crisis is not hypothetical—it is already visible in the data. A typical casual subscriber generates 1–2 million tokens per month. At February 2026 pricing with efficient models and caching, that workload costs the provider $0.50–$4. Yet the user pays $20. Margins of 85–95% are already the norm for light users, and they will reach 97–99% by 2028. This is not a sustainable equilibrium. API pricing is public; BYOK tools like OpenRouter make the arithmetic transparent; and open-source alternatives from Meta, Mistral, and DeepSeek offer frontier-minus-one quality at zero marginal cost after a modest hardware investment. The $20 price point survives only if what it buys evolves from raw intelligence into a differentiated experience.
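The margin arithmetic is a one-liner. The provider-cost inputs below are the ranges quoted in this section, so the outputs bracket the 85-95% and 97-99% figures.

```python
# Gross margin on a flat $20 subscription, given provider compute cost.
# Cost inputs are the chapter's quoted ranges; illustrative arithmetic.

def margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the subscription price."""
    return 1 - cost / price

print(f"{margin(20, 4.00):.1%}")   # heavier casual user, 2026: 80.0%
print(f"{margin(20, 0.50):.1%}")   # light user, 2026: 97.5%
print(f"{margin(20, 0.10):.1%}")   # projected 2028 floor: 99.5%
```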

6. Three Subscription Futures

Price drops (Netflix), tiered (Spotify), or feature differentiation (already happening). The $20 survives but transforms.

Price Drops (Netflix Model)

Most Likely
Like Netflix's drop from $15 to $7 for the basic tier
2025: $20/month → Current Claude/GPT
2026: $15/month → Better models
2027: $10/month → Much better models
2028: $5-8/month → Near-frontier models
2029: $3-5/month → Commodity AI

Tiered Model (Spotify Model)

Likely
Like Spotify Free vs Premium vs Family
Free: $0 → Basic AI (today's GPT-4 level)
Good: $5/month → Good AI (today's Sonnet level)
Pro: $20/month → Frontier AI + features
Enterprise: $50/month → Enterprise + priority + agents

Feature Differentiation (Already Happening)

Already Happening
$20 subscription justifies itself through non-intelligence features
• Unlimited usage (peace of mind)
• Priority access (no rate limits)
• Advanced features (artifacts, projects, memory)
• Integrations (Claude Code, desktop apps)
• Support (faster response)

The three subscription futures outlined above converge on a single strategic truth: AI providers will follow the same evolutionary path as internet service providers, streaming platforms, and cloud computing vendors. Pure access to the resource gets commoditized. Premium pricing migrates to the experience layer—priority access, agents, integrations, memory, and workflow tools. By 2029, raw AI chat becomes a $3–$5 commodity. The $20 tier transforms into an “unlimited agents plus priority frontier plus verification tools” package. This is not speculation; it is the structural logic of every technology utility market in the past century.

7. Historical Analogies — Same Law, Same Outcome

Bandwidth, silicon, electricity — all followed the same commoditization pattern. AI is not the exception.

Internet Bandwidth

1990s-2026
10,000-100,000x cost reduction
Bandwidth became a utility. The $20 subscription survived but stopped being 'pay for bandwidth' and became 'pay for premium experience + no throttling + extra features.'

Microprocessor Computing / Moore's Law

1960s-2026
~1 trillion x cost reduction
The intelligence-per-dollar doubling every 12–18 months is Moore’s Law accelerated for the inference layer.

Electricity

1900-2026
~2x real-terms decline, with vast scale expansion
$1-5/month for casual AI users, $20 for 'premium experience + agents + priority' is the natural evolution — just like basic electricity is cheap but premium appliances/features cost more.

The Universal Commoditization Pattern

All four resources follow the same 5-phase lifecycle. AI is currently in phase 2-3.

The historical parallels are not approximate—they are structurally identical. Internet bandwidth dropped 10,000–100,000x from dial-up to fiber. Transistor costs fell by a factor of roughly one trillion over sixty years. Electricity evolved from a luxury powering private generators for the wealthy to a metered utility so cheap that households scarcely think about it. In every case, the same five-phase pattern held: the resource starts expensive and metered; efficiency and competition collapse unit cost; flat-rate subscriptions emerge; subscriptions come under pressure and pivot to experience; and value migrates permanently to the layer above the commodity resource. AI inference is currently transitioning from phase two to phase three. The analogies do not merely suggest what comes next; they make it economically inevitable.

The strategic implication is blunt: companies building moats around model access are building on a commodity. Microsoft, Apple, and Google did not win by selling transistors—they won by building software, ecosystems, and user experiences on top of near-free silicon. Netflix did not win by selling bandwidth—it won by building a content and recommendation platform that rode the bandwidth commodity to global scale. The AI equivalent is already taking shape: the orchestration layer—agents, workflow tools, memory systems, and multi-model routing—is where durable competitive advantage will accrue.

8. Task Cost Evolution by Complexity

Five tiers from trivial (already free) to frontier/breakthrough (premium persists). Complex tasks stay expensive 3-4 years longer.

Category | Examples | 2026 Cost | 2028 | 2030 | Utility-Cheap Date
Trivial / Everyday | Casual chatting, basic math (sqrt, calc), fact check | $0.000005-$0.0005 | <$0.00005 | Free / embedded | Already here (2025-2026)
Routine Professional | Email writing, simple code snippet, customer support reply | $0.0005-$0.02 | $0.00005-$0.002 | Free / $0.0005 | 2027-early 2028
Advanced Professional | Full scripts/apps, market/legal drafts, creative campaigns | $0.02-$1 | $0.001-$0.10 | $0.0001-$0.02 | Mid 2028-2029
Expert / PhD-level | Novel research synthesis, hypothesis generation + testing, peer-review paper draft | $0.50-$30+ (often $2-15) | $0.05-$3 | $0.005-$0.50 | Late 2029-2030
Frontier / Breakthrough | Original invention, solving unsolved math/science problems, multi-year corporate strategy | $10-$300+ | $1-$30 | $0.10-$5 | 2030+ (premium tier persists for true novelty)

Task Cost by Tier (2026 vs 2028 vs 2030)

Log scale — showing the cost compression across all tiers over time.

The task cost evolution table above encodes a profound insight: the cost gap between trivial and frontier tasks is 100–10,000x in 2026, but the gap compresses relentlessly over time. A casual 500-token chat costs approximately $0.0002 today. A PhD-level literature review involving 50,000 effective tokens with heavy reasoning runs $5–$30. The difference is driven almost entirely by the reasoning token multiplier—chain-of-thought and extended thinking architectures consume 10–100x more tokens than simple response generation. As reasoning optimization matures (reducing the multiplier from 20x toward 2–10x), even expert-level tasks will slide down the cost curve, reaching utility pricing by 2029–2030.
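The per-task arithmetic can be sketched as tokens times price, with reasoning architectures multiplying the effective token count. The prices and the 10x multiplier below are illustrative values within the ranges this section quotes.

```python
# Per-task cost: tokens x price, with a reasoning-token multiplier.
# Prices and the 10x multiplier are illustrative values within the
# chapter's quoted ranges, not specific model rates.

def task_cost(tokens: int, price_per_m: float, reasoning_mult: float = 1.0) -> float:
    """Dollar cost of a task at a given $/1M-token price."""
    return tokens / 1e6 * price_per_m * reasoning_mult

chat = task_cost(500, 0.40)                        # ~$0.0002 casual chat
lo   = task_cost(5_000, 100.0, reasoning_mult=10)  # $5.00
hi   = task_cost(5_000, 600.0, reasoning_mult=10)  # $30.00
print(f"chat ${chat:.4f}, PhD review ${lo:.2f}-${hi:.2f}")
```

Note how the 10x multiplier turns 5,000 visible output tokens into the 50,000 effective tokens cited above; shrinking that multiplier toward 2-10x is what drags expert-tier tasks down the curve.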

9. Research Paper Cost Calculator

How much does it cost to AI-generate a 50-page research paper? From $0.25 (budget) to $120 (frontier PhD) in 2026.

Year | Tier | Model Examples | Total Cost | Quality Level
2026 | Budget | GPT-5 Nano, Grok 4.1 Fast, Gemini Flash | $0.25-$0.80 | Solid undergrad / quick draft (good enough for many internal reports)
2026 | Balanced / Pro | GPT-5.2, Claude Sonnet 4.6 | $12.00-$25.00 | Strong master's / journal-ready first draft
2026 | Frontier / PhD | Claude Opus 4.6, GPT-5.2 pro | $65.00-$120.00 | True PhD / top-tier journal submission quality (novel insights, rigorous)
2028 | Budget | Next-gen cheap models | $0.08-$0.25 | Utility / near-free
2028 | Balanced / Pro | | $4.00-$9.00 | Excellent professional / conference paper
2028 | Frontier / PhD | | $22.00-$45.00 | PhD / top-journal ready
2030 | Budget | | $0.01-$0.05 | Completely free / embedded in tools
2030 | Balanced / Pro | | $1.00-$2.50 | Near-perfect for most users
2030 | Frontier / PhD | | $6.00-$15.00 | Still premium but affordable even for individuals

50-Page Research Paper Cost (3 Tiers x 3 Years)

Budget (undergrad), Balanced (master's), and Frontier (PhD) quality tracks over time.

The research paper calculator makes the abstract concrete. In 2026, a solo researcher can generate a journal-ready first draft of a 50-page paper for the price of two coffees—$12–$25 using a balanced-tier model. A full PhD-quality paper with novel contributions costs $65–$120 at frontier tier, already 70–90% cheaper than hiring a research assistant for two weeks. By 2028, that same PhD-quality paper drops below the cost of a single lunch. By 2030, it costs less than printing the paper on physical pages. These are not marginal efficiency gains. They represent a structural reordering of the economics of knowledge production, with implications for every professional services industry from consulting to legal to scientific research.
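Working backwards, the balanced-tier quotes imply a rough effective token volume per paper. The $15/M blended rate below is an assumption chosen for illustration, not a quoted price.

```python
# Back out the effective token volume implied by a total cost at an
# assumed blended rate. The $15/M rate is an assumption, not a quote.

def implied_tokens_m(total_cost: float, price_per_m: float) -> float:
    """Millions of effective tokens implied by a total cost."""
    return total_cost / price_per_m

lo = implied_tokens_m(12.0, 15.0)
hi = implied_tokens_m(25.0, 15.0)
print(f"{lo:.2f}M-{hi:.2f}M effective tokens per 50-page draft")  # 0.80M-1.67M
```

That is roughly 20-50x the ~30,000-40,000 tokens of final text a 50-page paper contains, which is plausible once multiple drafts, revisions, and reasoning overhead are counted.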

10. Commoditization Timeline — When Each Tier Reaches Utility Pricing

2026: trivial tasks free. 2027: routine commoditizes. 2028: advanced affordable. 2029: PhD-level utility-priced. 2030+: 95% near-free.

2026
Today
Cheap: Everything trivial + most routine professional work
Still expensive: Anything requiring genuine multi-step reasoning or novelty (PhD-level analysis, invention)
2027
Routine Commoditizes
Cheap: All routine professional tasks (emails, simple code, customer support, basic analysis)
Still expensive: PhD-level starts dropping fast but still carries a noticeable cost for high-quality output
2028
Advanced Professional Reaches Utility Pricing
Cheap: Full coding projects, in-depth reports, and standard legal/medical drafts reach commodity pricing
Still expensive: Expert/PhD synthesis becomes affordable for individuals/small teams (~$0.05-3 per deep task)
2029
Expert/PhD Enters Utility Zone
Cheap: Most Expert/PhD-level work (research synthesis, hypothesis testing, paper drafting)
Still expensive: Only true frontier/breakthrough tasks remain premium
2030
Near-Free Intelligence
Cheap: 95%+ of all human cognitive work (including most PhD-level)
Still expensive: Frontier remains the only expensive tier (like hiring a world-class consultant today)

The commoditization timeline traces a remarkably predictable progression. In 2026, trivial and routine professional tasks are already utility-priced—no one marvels at an AI drafting an email or answering a factual question, just as no one marvels at a calculator computing a square root. By 2027, agents handle 80%+ of white-collar busywork at near-zero marginal cost. By 2028, full coding projects and legal drafts feel routine. By 2029, most PhD-level work enters the utility zone. The frontier—genuine invention, unsolved scientific problems, multi-year strategic synthesis—remains premium, much as custom supercomputing or dedicated fiber lines remain premium today. But the premium tier shrinks to 5% of total demand, and even it faces a 10x cost reduction per year.

11. The Value Shift — From Intelligence to Capabilities

The most important strategic insight: value migrates from the model layer (commoditizing) to the orchestration/experience layer (differentiating).

Today (2026)

Access to a smart model
User thinks: "Pay for intelligence"
Moat: Best model wins (GPT-4 was the moat)

Tomorrow (2029+)

Agents + Tools + Integrations + Memory + Priority
User thinks: "Pay for capabilities and experience"
Moat: Best orchestration/UX wins (model is commodity)

The value shift visualized above distills the entire chapter into a single strategic directive. Today, enterprises pay for intelligence—access to a smart model is the product, and the best model is the moat. Tomorrow, intelligence is a commodity input, and enterprises pay for capabilities: agents that execute multi-step workflows, memory systems that retain institutional knowledge, routing layers that match tasks to optimal cost tiers, and integration frameworks that embed AI into existing business processes. The winning formula is already clear: BYOK (bring your own key) access to commodity models, combined with proprietary orchestration, persistent memory, and low-friction user experience. The companies that own the experience layer will capture the margin as the intelligence layer compresses to utility pricing.
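The orchestration pattern described above can be sketched as a routing table over commodity model tiers. This is a minimal illustration; the model names and prices are placeholders, and none of these identifiers come from the report.

```python
# Minimal sketch of an orchestration-layer router: BYOK model access
# behind a complexity-based routing table. Model names and prices are
# placeholders, not real provider identifiers.
from dataclasses import dataclass

@dataclass
class Route:
    model: str
    price_per_m: float  # $ per 1M output tokens

ROUTES = {
    "trivial":  Route("budget-model",   0.28),
    "standard": Route("mid-tier-model", 3.00),
    "frontier": Route("frontier-model", 60.00),
}

def route(task_complexity: str) -> Route:
    """Match a task to its tier; unknown complexity falls back to frontier."""
    return ROUTES.get(task_complexity, ROUTES["frontier"])

print(route("trivial").model)  # budget-model
```

The design choice worth noting is the conservative fallback: an unrecognized task routes to the frontier tier, trading cost for safety, which mirrors the chapter's warning against under-investing where reasoning quality genuinely matters.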

12. Connections Across the Report

How the intelligence cost curve connects to every other chapter of this strategic analysis.

Chapter 1: Intelligence Yield
Direct extension
Intelligence per Dollar is the Intelligence Yield metric expressed in different notation. This chapter extends the IY curve from 2025 to 2030 with three probabilistic scenarios and February 2026 revalidation data.
Chapter 11: Model Taxonomy
Tier mapping
Budget, Balanced, and Frontier tiers map directly to the 80/15/5 task pyramid. This chapter shows when each tier reaches utility pricing.
Chapter 18: Inference Demand
Price trajectory corroboration
Corroborates the $5/M (2024) to $1.25/M (2026) to $0.40/M (2028) to $0.12/M (2030) pricing trajectory. The Jevons Paradox applies: as prices fall, demand explodes, explaining the 296T tokens/day projection by 2030.
Chapter 12: Small Models
Budget tier evidence
Budget models at $0.10–1/M delivering 83–94% of frontier capability directly supports the sub-32B enterprise thesis. Small models are the commoditization engine for the bottom 80% of tasks.
Chapter 19: Enterprise Disruption
Disruption timing
Cost curves determine when each of the $607B SaaS verticals faces disruption. When advanced professional tasks reach $0.001–$0.10 (2028–2029), the disruption wave accelerates from early adopters to the mass market.
Chapter 21: Task Map
Task cost mapping
The 98 tasks across 14 job functions and 4 model categories map directly to the 5 cost tiers. This chapter adds the time dimension: when does each task and function combination become utility-priced?
Chapter 16: GPU Demand
Supply-side mirror
Cheaper inference drives more demand (Jevons Paradox), pushing GPU demand up 68x net even with 15x efficiency gains. The cost curve here drives the demand curve there.
Chapter 22: Intelligence Routing
Cost model input
The 12-agent corporate system cost model ($1.2M Year 1, $650K Year 2, $500K Year 3) benefits directly from these pricing projections. By 2028–2029, inference costs drop to $50–100K per year, making the system accessible to mid-market companies.

What Comes Next

The intelligence cost curve establishes the economic foundation upon which every subsequent chapter of this report builds. Prices are falling at 50% per year for standard inference. The intelligence-per-dollar multiplier has already reached 8–15x the 2023 baseline—three to six times ahead of projections. And the market is bifurcating into commodity and premium tiers that require fundamentally different strategic responses. But cost deflation alone does not explain how enterprises should deploy these rapidly cheapening capabilities. For that, we must examine what happens when models that were once separated by vast performance gaps begin to converge—when last year’s frontier becomes this year’s commodity, and the window of proprietary advantage shrinks from years to months. That convergence—its pace, its implications, and the strategic playbook it demands—is the subject of Chapter 3.