As Microsoft and other cloud service providers prioritize their GPU reserves for internal teams and major enterprise clients, AI startups are struggling to access NVIDIA graphics card resources. Small and medium-sized companies are forced to compete at high prices for the remaining GPU servers, leaving them in an increasingly difficult position.
Founders and investors from multiple related companies have revealed that this round of computing power shortages has affected several well-known, well-funded AI startups. Their backers include top-tier institutions such as Sequoia Capital, Founders Fund, General Catalyst, and Andreessen Horowitz. An informed source stated that, pressured by the computing crunch, General Catalyst partner Hemant Taneja has distributed a survey to portfolio company founders to assess their GPU access situation.
In the survey, Taneja wrote, "We have received significant feedback that access to computing resources, particularly GPU availability, has become one of the biggest bottlenecks for growth this year."
The current market situation is highly reminiscent of early 2023: at that time, major cloud providers reclaimed public cloud computing resources to prioritize their internal operations and key clients like OpenAI. To alleviate the shortage, venture firms like Andreessen Horowitz and Index Ventures began building their own GPU resource pools to support their portfolio companies.
However, unlike 2023 when AI applications were still nascent, the current explosion in demand for AI code development tools has further intensified the chip shortage. Cloud provider executives and startup leaders indicate that with soaring computing demands from top AI developers like Anthropic and makers of automated coding tools, cloud platforms are significantly reducing GPU allocations for smaller clients.
In response to the crisis, General Catalyst is planning solutions, including setting up shared computing pools and negotiating directly on behalf of portfolio companies, to help them secure stable access to GPU resources.
The supply-demand imbalance for chips has allowed cloud service providers to raise rental prices for NVIDIA-powered servers. Previously, many cloud providers struggled to turn a profit on their GPU businesses; these price increases have significantly improved their profit margins.
However, rising costs are severely squeezing the viability of AI companies. Image generation AI model developer Krea is a prime example. This four-year-old startup has raised a total of $83 million from investors including Andreessen Horowitz and Bain Capital Ventures.
Krea's co-founder and CEO, Víctor Pérez, explained that six months ago, multiple cloud providers were competing for their business. The company secured a six-month lease for hundreds of NVIDIA Blackwell chips at a rate of $2.80 per chip per hour. Recently, however, when the company sought to acquire more computing power to train large models from scratch, sales representatives from several cloud providers became unresponsive or gave evasive answers.
Even when contact was eventually made, providers either quoted significantly higher prices or insisted on mandatory three-year long-term contracts.
"Some salespeople simply disappeared, others said no resources were available, and some tried to force unreasonable contract terms," Pérez stated.
He added that while evaluating various computing cluster options, the available resources were snapped up by other customers within just a few days.
Ultimately, Krea was forced to sign a new one-year contract renewing the lease on hundreds of the same chips, with the unit price rising to $3.70 per hour, a 32% increase. Even so, this price remained relatively low compared with other industry quotes.
Pérez admitted, "The biggest threat to us isn't a small price hike, which we can manage. The real existential risk is being unable to reliably access the computing power needed to run our platform and train our models. Supply disruption is a fatal blow."
Another startup founder reported plans to lease nearly a thousand GPUs in a high-interconnect cluster. An NVIDIA sales representative told them directly that top cloud providers face extreme resource constraints, with a long queue of customers competing for access, making it difficult to meet their demand. The daily rental cost for such a cluster exceeds $70,000, and they are still struggling to find available capacity.
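The figures quoted above are internally consistent, which a quick sketch makes clear. The per-hour rates and the 32% jump come from the article; the 1,000-GPU cluster size and round-the-clock utilization are illustrative assumptions:

```python
# Sketch of the GPU rental cost arithmetic reported in the article.
# Only the hourly rates come from the article; cluster size (1,000 GPUs)
# and 24/7 utilization are assumptions for illustration.

def daily_cost(num_gpus: int, rate_per_gpu_hour: float) -> float:
    """Cost of running a cluster continuously for one 24-hour day."""
    return num_gpus * rate_per_gpu_hour * 24

def pct_increase(old_rate: float, new_rate: float) -> float:
    """Percentage increase between two hourly rates."""
    return (new_rate / old_rate - 1) * 100

# Krea's reported Blackwell rates: $2.80/hr initially, $3.70/hr on renewal.
print(f"Krea price jump: {pct_increase(2.80, 3.70):.0f}%")  # -> 32%

# A ~1,000-GPU cluster at roughly $3 per GPU-hour runs past $70,000 a day.
print(f"Daily cluster cost: ${daily_cost(1000, 3.00):,.0f}")  # -> $72,000
```

At roughly $3 per GPU-hour, a thousand-GPU cluster costs about $72,000 per day, matching the "exceeds $70,000" figure reported above.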
**Contracts Expiring Simultaneously, Intensifying Supply-Demand Conflict**
Compounding the problem, leading cloud providers like Microsoft, Amazon, and CoreWeave have secured multi-billion dollar long-term deals, locking up vast GPU resources for companies like Anthropic and OpenAI. Despite this, Anthropic itself remains deeply constrained by computing shortages due to its explosive business growth.
Another core driver of the shortage is the simultaneous expiration of two-to-three-year cloud service contracts signed by many AI startups in their early years. Cloud providers are using this opportunity to significantly raise prices or reallocate existing computing capacity to higher-paying clients.
An informed source revealed that Microsoft Azure management has internally notified employees that the situation of long-term computing scarcity is expected to persist at least until the end of 2026.
The CEO of an AI cloud service provider disclosed a plan to transfer a GPU cluster from a company whose contract had expired to a new client willing to pay a roughly 30% premium. After urgent negotiations by the original company to retain the resources, a deal was finally struck at a higher price.
Will Falcon, CEO of GPU cloud service provider Lightning AI, stated that while the platform operates about 40,000 GPUs online, it has a backlog of rental requests from nearly 40 companies, representing a total shortfall of 400,000 chips. Over the past six months, computing rental prices have surged by over 25%: the hourly rate per chip has risen from $1.60 to over $2.00, and popular resources command even higher premiums. The platform's primary chips are NVIDIA's previous-generation Hopper architecture products.
**Microsoft Implements "Reclaim on Idle" Control Policy**
According to internal Microsoft employees, pressured by the computing demands of major clients and internal projects, Azure has comprehensively tightened server rental quotas for small and medium-sized customers. Many SMEs now face wait times of several months to expand their GPU resources.
Microsoft has long prioritized allocating its top-tier flagship chip clusters to OpenAI and its own operations, while continuously building dedicated computing clusters for Anthropic. The GPU allocation for regular customers depends entirely on their spending level on Azure and the financial commitment in new computing contracts.
Internal information indicates that in recent months, Microsoft has imposed a hard requirement: clients wishing to rent NVIDIA's high-end Blackwell chips must commit to purchasing at least 1,000 chips with a contract term of one year or more, with the minimum cost per contract reaching tens of millions of dollars.
Even for older-generation NVIDIA chips, the lead time for ordinary customers on the Azure platform can stretch to weeks or even months.
Microsoft employs a tiered system to manage customer priority: Tier 1 consists of approximately a thousand enterprises with the highest annual spending, which receive priority access to computing resources; Tier 2 includes medium-spending clients served by dedicated sales representatives; Tier 3 comprises small and micro-enterprises handled by channel partners.
Customers who have not signed large reserved capacity contracts and use pay-as-you-go models face extended queuing periods. Simultaneously, Microsoft strictly monitors computing utilization rates; even short periods of idle time lasting a few hours can result in the revocation of GPU access rights.
Furthermore, Microsoft is phasing out free computing benefits under its startup support programs. Companies that received server credits through the "Microsoft for Startups" program risk having their GPU access permanently revoked if they fail to utilize the chips at full capacity.
**Building In-House Capacity Emerges as a New Path**
Faced with mounting restrictions from cloud providers, some startups are beginning to bypass cloud platforms altogether by building their own computing infrastructure.
Collide, an AI agent developer for the oil and gas industry that completed a $14 million seed funding round last year, is one example. Founder Colin McClelland, frustrated by queuing times and contract limitations, stated the company plans to invest approximately $500,000 to purchase NVIDIA GPUs and build a private computing cluster. The company is considering directly leasing data center space to deploy its own hardware.
McClelland believes that while the upfront cost of building hardware is significantly higher than leasing in the short term, it completely avoids the risks of supply disruption and price volatility. Over multiple years, the total cost of leasing can exceed that of ownership, making the self-build model more cost-effective.
"A lack of computing power at a critical moment can be devastating for a company," he said. "While most teams fear hardware maintenance, my background in operating oil well projects has accustomed me to managing heavy asset models."
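The lease-versus-buy reasoning above reduces to a simple break-even calculation. The $500,000 purchase figure comes from the article; the cluster size, hourly lease rate, and hosting overhead below are illustrative assumptions, not Collide's actual numbers:

```python
# Break-even sketch: buying GPUs outright vs. renting from a cloud provider.
# The $500,000 purchase outlay is from the article; the 32-GPU cluster size,
# $2.80/GPU-hour lease rate, and monthly hosting overhead are assumptions.

PURCHASE_COST = 500_000      # one-time hardware outlay (from the article)
NUM_GPUS = 32                # assumed cluster size
LEASE_RATE = 2.80            # assumed $/GPU-hour when renting
HOSTING_PER_MONTH = 5_000    # assumed colocation power/space cost when owning
HOURS_PER_MONTH = 730        # average hours in a month

def monthly_lease_cost() -> float:
    """What renting the same capacity from a cloud would cost per month."""
    return NUM_GPUS * LEASE_RATE * HOURS_PER_MONTH

def breakeven_months() -> float:
    """Months of continuous use after which owning beats leasing."""
    saved_per_month = monthly_lease_cost() - HOSTING_PER_MONTH
    return PURCHASE_COST / saved_per_month

print(f"Equivalent lease cost: ${monthly_lease_cost():,.0f}/month")
print(f"Ownership pays off after ~{breakeven_months():.1f} months")
```

Under these assumed numbers, ownership breaks even in well under a year of continuous use, which illustrates why a founder comfortable with operating heavy assets might accept the larger upfront cost.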