Leaked internal documents reported by TechCrunch reveal how much OpenAI has paid Microsoft under their long-running Azure partnership, showing a revenue share of about 20 percent and sizable per-inference cloud fees. These numbers matter because they highlight the state of AI cloud costs in 2025 and how inference cost can become the dominant operating expense for generative AI services.
Background
Microsoft and OpenAI have collaborated since 2019, with Microsoft providing cloud compute, data center capacity, and engineering support in exchange for strategic access and commercial rights. In cloud economics terms, a revenue share reduces gross margin while securing capacity and preferred hardware access. Inference cost refers to the compute and energy used each time a model serves a user request; for large models and high-volume services, it can exceed training cost over time.
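To see how inference cost accrues per request, here is a minimal back-of-envelope sketch. Every input (GPU-hour rate, throughput, utilization) is a hypothetical placeholder for illustration, not a figure from the leaked documents:

```python
# Back-of-envelope estimate of per-request inference cost.
# All inputs are hypothetical placeholders, not figures from the leaks.

GPU_HOUR_RATE = 2.50        # assumed $/GPU-hour for a cloud GPU instance
REQUESTS_PER_SECOND = 10.0  # assumed sustained throughput per GPU
UTILIZATION = 0.60          # assumed fraction of each hour spent serving

def cost_per_request(gpu_hour_rate: float, rps: float, utilization: float) -> float:
    """Average dollars of GPU time consumed by one request."""
    effective_requests_per_hour = rps * 3600 * utilization
    return gpu_hour_rate / effective_requests_per_hour

per_req = cost_per_request(GPU_HOUR_RATE, REQUESTS_PER_SECOND, UTILIZATION)
monthly = per_req * 100_000_000  # e.g. 100M requests per month
print(f"cost per request: ${per_req:.6f}")
print(f"monthly at 100M requests: ${monthly:,.0f}")
```

Even a fraction of a cent per request compounds quickly at hundreds of millions of requests, which is why per-inference fees dominate the economics of high-volume services.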
Key details from the leaks
- Revenue share of about 20 percent: Early reporting cites an approximately 20 percent revenue share that OpenAI pays Microsoft for commercial use of Azure infrastructure. Across high-volume AI services, that share represents a meaningful slice of gross receipts.
- Substantial inference and compute fees: The documents reportedly include per-inference charges and other line items that add up to large operating costs for model serving at scale, pointing to AI inference workload costs and GPU cloud pricing as central cost drivers.
- Strategic and contractual depth: The partnership spans multiple years, includes engineering collaboration and preferred hardware access, and illustrates how platform economics shape AI infrastructure pricing.
Implications for the industry
The leaks shine a light on several trends that matter to decision makers focused on cloud cost optimization and FinOps for AI cloud.
- Profitability pressure: A roughly 20 percent revenue share plus heavy inference fees can materially erode margins (see the worked example after this list). Companies may have to raise prices, accept slimmer margins, or invest in model efficiency such as pruning, quantization, and request caching.
- Strategic lock-in to hyperscalers: With the largest cloud providers controlling much of the market, deals with hyperscalers influence who can scale profitably. That concentration gives providers leverage over pricing and capacity allocation.
- Competitive and regulatory risk: Now that the terms are public, they fuel debate about unfair advantage and market access. Regulators and rivals may scrutinize how platform deals affect competition and pricing for end customers.
- Operational choices: Teams will weigh negotiating better cloud terms, building proprietary infrastructure, or optimizing models and deployment patterns to lower per-request cost.
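To make the profitability pressure concrete, the sketch below works through hypothetical unit economics. The revenue-share rate follows the roughly 20 percent figure from the reporting; the price per request and inference cost are illustrative assumptions, not terms from the documents:

```python
# Illustrative unit economics for a paid AI API.
# Price and inference cost below are hypothetical assumptions.

price_per_request = 0.0100   # assumed revenue per request, $
revenue_share_rate = 0.20    # ~20% share, per the reporting
inference_cost = 0.0045      # assumed cloud inference cost per request, $

revenue_share_fee = price_per_request * revenue_share_rate
gross_margin = price_per_request - revenue_share_fee - inference_cost
gross_margin_pct = gross_margin / price_per_request

print(f"revenue share fee: ${revenue_share_fee:.4f}")
print(f"gross margin:      ${gross_margin:.4f} ({gross_margin_pct:.0%})")
# With these assumptions, 20% off the top plus inference fees leaves
# a 35% gross margin before any other serving or staffing costs.
```

Under these assumed numbers, more than half of each dollar of revenue is gone before payroll, R&D, or sales costs enter the picture, which explains the pressure to optimize.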
How to optimize AI cloud costs in 2025
For business and technical leaders, the leaks are a prompt to put cloud economics at the center of product strategy. Practical actions include:
- Reassess unit economics: Build per-request cost into pricing and profitability scenarios instead of focusing only on model accuracy or features.
- Negotiate cloud agreements: Seek committed-use discounts, volume-based pricing, and terms that reduce per-inference fees. Compare Azure, AWS, and Google Cloud on AI infrastructure and GPU cloud pricing.
- Invest in efficiency: Prioritize model tuning, quantization, caching, batching, and smaller specialized models where appropriate to lower inference cost (a caching sketch follows this list).
- Adopt FinOps practices: Implement cost attribution, real-time monitoring, and automated optimization to find savings and improve ROI on AI spend.
- Consider diversification: Multi-cloud and edge deployments can provide bargaining leverage and resilience against vendor lock-in.
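As one concrete efficiency lever from the list above, here is a minimal request-caching sketch: identical prompts are answered from an in-memory LRU cache instead of triggering a new inference call. The `call_model` function is a hypothetical stand-in for whatever serving endpoint a team actually uses:

```python
# Minimal request-caching sketch: repeated prompts skip the model call.
# call_model() is a hypothetical stand-in for a real inference endpoint.

from functools import lru_cache

def call_model(prompt: str) -> str:
    """Placeholder for an expensive per-request inference call."""
    return f"model output for: {prompt}"

@lru_cache(maxsize=10_000)  # bound memory; evict least-recently-used entries
def cached_completion(prompt: str) -> str:
    return call_model(prompt)

# Repeated prompts hit the cache, paying inference cost only once.
cached_completion("What is our refund policy?")
cached_completion("What is our refund policy?")  # served from cache
info = cached_completion.cache_info()
print(f"hits={info.hits}, misses={info.misses}")  # hits=1, misses=1
```

Real deployments would key the cache on normalized prompts and expire entries, but even this simple pattern cuts spend wherever traffic contains repeated or templated queries.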
Conclusion
The TechCrunch leaks offer a rare window into the cloud economics underpinning modern AI services. Significant revenue-share obligations and steep inference costs show that controlling AI cloud costs is a strategic priority. Expect follow-up reporting, contract renegotiations, and increased focus on FinOps as companies try to balance performance, cost, and competitive positioning.