Where the Sweetest Margins Live in Jensen’s 5-Layer Cake
The two margin levers driving the entire AI economy
Apr 23, 2026
Jensen Huang recently described the AI economy as a five-layer cake: energy, chips, infrastructure, models, applications. He used this to sell NVIDIA’s position in the stack. But if you stare at the economics of each layer long enough, a more interesting pattern emerges.
For the first time, software has a real marginal cost per user. And there are only two margin levers in the entire AI economy, the same two at every layer.
Each layer will have to converge on the same business model: usage-based pricing with cost-plus margins. And the size of the “plus” — the margin you get to keep — is determined by exactly two things:
How differentiated your offering is vs. everyone else selling the same underlying commodity unit in their own brand wrapper.
How much you can drive down the cost of producing that commodity unit of value without your customer noticing any change in quality.
The Commodity Unit at Every Layer
Each layer of the stack has an atomic unit of value. Each has a cost driven by usage. Each is, at its core, a commodity.
Energy: The commodity unit is an electron, priced per kilowatt-hour. Utility-scale solar PPAs run $0.04-0.06/kWh depending on region. Retail rates for data centers run $0.10-0.17+/kWh. The spread between those numbers is where margin lives.
Chips: The commodity unit is a processor, priced per chip. An NVIDIA H100 costs ~$3,320 to manufacture and sells for $25,000-$40,000. That’s roughly an 87-92% gross margin on the chip itself, and about 75% at the company level (the fattest in the stack right now).
Infrastructure: The commodity unit is a GPU-hour, priced per hour of compute—but increasingly measured in “goodput,” or the amount of usable work delivered. H100 cloud rates have crashed from $8-12/hr at peak to $1.49-3.90/hr today — a 44-75% decline depending on provider. Perceived supply uncertainty, paired with the market’s conviction that demand is insatiable, makes this layer attractive in the near term, with plenty of opportunity for differentiation.
Models: The commodity unit is a token, priced per million tokens. GPT-4-equivalent performance went from ~$36/million tokens at GPT-4’s launch in early 2023 to $0.40/million tokens today via GPT-4.1 mini. That’s a ~90x drop in headline price in three years, and closer to ~1,000x once you factor in blended rates and efficiency gains.
Applications: The commodity unit is now intelligence measured in tokens. Pricing is still messy — 61% of SaaS companies now use hybrid models blending seats with usage — but it’s migrating toward consumption-based pricing because, for the first time in software history, the marginal cost of serving a user is non-trivial.
Traditional SaaS had near-zero marginal cost. Every new user was almost pure margin. AI applications burn tokens on every interaction. The economics are fundamentally different, and the pricing has to follow. Applications are not dead; they just have to be more compelling to buy than to build, and priced on usage rather than subscription.
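The margin figures above are straight arithmetic on the cited prices. A quick back-of-the-envelope check (the inputs are the prices quoted in the text; nothing here is new data):

```python
def gross_margin(price: float, cost: float) -> float:
    """Gross margin as a fraction of the selling price."""
    return (price - cost) / price

# Chips: an H100 at ~$3,320 to manufacture, selling for $25k-$40k.
low = gross_margin(25_000, 3_320)   # at the bottom of the price range
high = gross_margin(40_000, 3_320)  # at the top
print(f"H100 gross margin: {low:.0%}-{high:.0%}")  # roughly 87%-92%

# Models: ~$36/M tokens at GPT-4's launch vs $0.40/M via GPT-4.1 mini.
print(f"Headline token price decline: {36 / 0.40:.0f}x")  # 90x
```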
The Two-Axis Framework
Margin in this stack comes down to two variables: how much customers are willing to pay (driven by differentiation), and how far you can push costs down (driven by infrastructure innovation). The interesting question for each layer is: how much room is there to move on each axis?
Some layers have enormous room for product differentiation. Others are stuck selling something indistinguishable. Some layers have wide-open opportunities for cost innovation. Others are constrained by physics or regulation.
Map every layer on these two axes and you get a clear picture of where durable margin will live and where it won’t.
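One way to make the mapping concrete is a toy scoring exercise. The 0-10 scores below are illustrative assumptions, not measurements; the point is the shape of the ranking, not the numbers:

```python
# Toy two-axis margin map. Scores (0-10) are illustrative assumptions:
# axis 1 = room for differentiation, axis 2 = room for cost innovation.
layers = {
    "energy":         (8, 6),  # regulated moat, baseload-demand optimization
    "chips":          (6, 7),  # capability lead, but rivals have the capital
    "infrastructure": (3, 5),  # commodity GPU-hours, price-war dynamics
    "models":         (5, 4),  # breakthroughs get replicated in 12-18 months
    "applications":   (9, 6),  # workflow fit is hard to compare on price
}

# Durable margin needs room on BOTH axes, so score by the product:
# being stuck on either axis squeezes the layer.
score = lambda name: layers[name][0] * layers[name][1]
for name in sorted(layers, key=score, reverse=True):
    d, c = layers[name]
    print(f"{name:14s} differentiation={d} cost={c} score={score(name)}")
```

With these assumed scores, the bookends (applications, energy) rank first and the middle layers fall to the bottom, which is the pattern the rest of this piece argues for.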
The Bookends Win
When I do this analysis, it seems the two layers with the most durable margin potential are the bookends of the stack—energy and applications. The bottom and the top. The dumbest commodity and the smartest one.
Energy
Energy is the only regulated layer in the stack. No one needs government permission to build a model or launch an app, but you need permits, interconnection agreements, and regulatory approval to generate and transmit electrons. That regulatory layer is a moat that doesn’t exist anywhere else in the stack (for now).
Shifts in demand create opportunity, and the energy market is fully in flux. Human electricity demand is variable — it peaks on hot afternoons, dips at 3am, surges in winter and summer. The entire grid, and the entire business model of power generation, is built around this variability.
AI demand is different. Data centers run 24/7 at near-constant load. A power provider who can optimize generation for a flat, high-utilization, baseload demand has a structurally different cost curve than one built to absorb the peaks and valleys of human consumption. The provider who figures out how to serve AI’s specific demand profile cheaply and reliably earns a durable margin that isn’t easily competed away — because it’s protected by atoms, regulation, and multi-year capital cycles.
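A toy load-factor model makes the cost-curve point concrete. The capex figure below is a hypothetical placeholder, not a real plant cost; what matters is that fixed cost amortizes over delivered kilowatt-hours, so cost per kWh scales inversely with load factor:

```python
HOURS_PER_YEAR = 8760
ANNUAL_FIXED_COST_PER_KW = 150.0  # hypothetical annualized capex, $/kW of peak capacity

def fixed_cost_per_kwh(load_factor: float) -> float:
    """Capital cost spread over delivered kWh; load_factor = avg load / peak capacity."""
    return ANNUAL_FIXED_COST_PER_KW / (HOURS_PER_YEAR * load_factor)

# Peaky human demand forces capacity sized to the peak but used well below it.
print(f"peaky grid load (LF ~0.55): ${fixed_cost_per_kwh(0.55):.3f}/kWh")
# A flat 24/7 data-center load runs that same capacity nearly all the time.
print(f"flat AI load    (LF ~0.95): ${fixed_cost_per_kwh(0.95):.3f}/kWh")
```

Same generator, same capex: the flat load recovers its fixed cost at a substantially lower price per kWh, which is the structurally different cost curve described above.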
Applications
At the other end of the stack, applications have the most room for differentiation because they’re the hardest to compare. An electron is an electron. Chips have specific performance characteristics that are objectively measurable and a model token is roughly a token. But “how well does this tool fit my workflow” is a judgment call that varies by customer, use case, and taste.
Applications build behavioral lock-in through workflow and data moats, and in some cases network effects. They are the largest and most malleable value creation layer for the end customer. And the gap between “wrapping an API” and “building a genuinely valuable AI-native workflow” is where the margin opportunity lives.
Today, application margins are “terrible” — 20-60% gross margins vs. 70-90% for traditional SaaS. But the current margin compression is driven by temporary conditions:
1. Binary model dependence
Most AI applications have been hostage to a single frontier model provider for any meaningful intelligence. But as all models get smarter and the open-source and local architecture landscape expands, model choice widens. An application built exclusively on the latest frontier model from OpenAI will lose to one that delivers the same customer value using open-source, small models — because the cost structure is fundamentally different.
2. Lack of sophistication
Most AI applications so far have been “model wrappers” — thin UIs over an API call. Model wrappers don’t deserve thick margins and won’t earn them. But as founders learn how to build high-value, AI-native applications with real workflow intelligence, the value gap between wrapper and product will widen.
3. Legacy pricing models
AI-native applications have inherited per-seat and per-month pricing from traditional SaaS. But that pricing model doesn’t work when input costs per use are meaningful for the first time in software’s history. Usage-based pricing is inevitable not because it’s trendy, but because the cost structure of this new, intelligent, proactive software world demands it.
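A worked example of the pricing mismatch, with hypothetical numbers (the seat price and token cost below are illustrative assumptions, not any vendor’s actual pricing):

```python
SEAT_PRICE = 30.0         # hypothetical flat $/user/month subscription
COST_PER_M_TOKENS = 0.40  # hypothetical blended inference cost, $/M tokens

def seat_margin(tokens_per_month_m: float) -> float:
    """Gross margin fraction on one seat at a given monthly token burn (millions)."""
    inference_cost = tokens_per_month_m * COST_PER_M_TOKENS
    return (SEAT_PRICE - inference_cost) / SEAT_PRICE

for usage in (1, 25, 100):  # light, median, and power user, in M tokens/month
    print(f"{usage:>4}M tokens/month -> seat margin {seat_margin(usage):+.0%}")
# The power user pushes the seat underwater. Usage-based pricing (cost per
# token plus a fixed markup) keeps margin constant across all three users.
```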
The Middle Layers Get Squeezed
If the bookends have durable margin, the middle layers—chips, infrastructure, models—face a harder road ahead.
Chips
NVIDIA’s 75-88% gross margins are extraordinary. But they’re not structural to the chip layer. They were earned through a specific set of innovations: supply chain lock-up, CUDA’s software ecosystem, and a capability lead in memory, efficiency, and latency.
Capital intensity is not a moat in this market. Every serious competitor — AMD, Google, Amazon, Microsoft — has the money to invest. What’s scarce is innovation.
The chip layer offers a rich set of innovation opportunities — energy efficiency, heat management, time-to-first-token, memory architecture — that can earn thick margins for whoever leads. But the layer itself doesn’t protect you. The innovation does. And innovation advantages are temporary unless you can keep compounding them. NVIDIA has done this in spades, and there are a trillion and one reasons to believe it continues for a while.
Infrastructure
Cloud GPU infrastructure is in a brutal price war. Rates have dropped 44-75% in a year. The service is increasingly commodity. The differentiation vectors—scale, reliability, security—are real but narrow.
Right now, we live in a world of “more inference, more better.” Demand vastly exceeds supply, so infrastructure providers can charge premium prices. But supply will catch up. And when it does, infrastructure follows the same arc as electricity markets and oil markets: build out massive capacity, experience wild price volatility, then develop spot markets, futures, capacity auctions, and financial instruments to manage it all.
The financialization of compute is coming. When it arrives, the only advantages in this layer will be economies of scale and cost innovation. The winners will look more like airlines than tech companies—capital-intensive, operationally complex, competing on route efficiency and load factor while customers choose almost entirely on price and availability.
Models
The model layer can earn thick margins through breakthrough innovation — a new architecture, a new capability, a new modality. But it can’t keep them. The pattern is already clear: a breakthrough creates a temporary window of enormous pricing power, then gets replicated, open sourced, or leapfrogged within 12-18 months.
There’s one exception. If someone figures out true continuous learning — where inference informs training, where every API call makes the model smarter — that would create genuine network effects at the model layer. It would turn usage into a compounding advantage rather than just a revenue event. But that doesn’t exist yet.
Until it does, the model layer is a treadmill. You have to keep innovating just to maintain your margin. And it’s telling that the model companies themselves — OpenAI, Anthropic, Google — are all racing to build applications. They’re fleeing upward in the stack to find durable margin because they know the model layer alone won’t sustain it.
Note: I can see a world where the middle three layers—chips, infrastructure, models—start collapsing into each other. Google is already there with TPU plus GCP plus Gemini, and NVIDIA is pushing in that direction with CUDA and its full-stack ambitions. OpenAI and Anthropic could limit access to their most powerful models to applications hosted and managed on their proprietary infrastructure.
The Specialists Win
If every layer converges on usage-based pricing with cost-plus margins, and those margins thin as commoditization accelerates, then the “plus” that makes a good product a great business has to be earned. Continuously. This creates a market dynamic where the specialists will win.
At every layer, the bar for differentiation is rising and the vectors of competitive advantage are known. You have to be better at discovering cost advantages that scale. You have to be better at meeting customer needs in an N-of-1 way. You have to be better at navigating regulation, building workflow lock-in and compounding data advantages.
As margins get thinner and harder to defend, the winners will be the best operators with the deepest domain expertise and the strongest bias to action. Because as every layer gets cheaper, faster and more available by the quarter, what justifies your margin—your “plus”—is the judgment you layer on top.
Incredible founders with a clear north star and burning urgency will have an advantage precisely when it’s harder to build a sustainable business. Because in a commodity world, the hardest thing to commoditize is the human who knows exactly what to build, for whom, and why it matters.


