Computing Power Hierarchy Intensifies as Tech Giants Prioritize Internal Needs, Leaving Startups Struggling for GPU Access

Deep News · 04-24 21:46

Major cloud providers like Microsoft and Amazon are prioritizing the allocation of NVIDIA GPUs for their internal teams and top-tier clients, creating a severe shortage for small and medium-sized AI startups. This competition for computing resources is sparking a new structural crisis in Silicon Valley.

According to a report, the supply crunch is impacting numerous AI startups backed by top-tier firms such as Sequoia Capital, Founders Fund, General Catalyst, and Andreessen Horowitz. Hemant Taneja, Managing Partner at General Catalyst, has circulated a questionnaire to founders in his portfolio, directly stating, "We are hearing from many that compute—especially GPU access—is one of the biggest bottlenecks this year."

Tightening supply is directly driving up rental prices, boosting profit margins for cloud service providers while significantly increasing operational costs for startups. Concurrently, Microsoft Azure has informed internal staff that customers should expect extended wait times, likely persisting until at least the end of 2026. This reshaping of the computing power landscape is profoundly affecting the entire AI startup ecosystem.

History is repeating itself, but with greater intensity. The current GPU shortage closely resembles the situation in early 2023, when cloud providers similarly diverted resources to prioritize internal teams and key clients like OpenAI. This led venture capital firms, including Andreessen Horowitz and Index Ventures, to eventually create their own GPU resource pools to aid their portfolio companies.

However, the current situation is even more severe. The explosive demand for AI programming tools is exacerbating the shortage. Surging compute needs from large AI developers like Anthropic are further squeezing the capacity available for smaller clients. Another structural factor worsening the shortage is the expiration of two-to-three-year cloud service contracts previously signed by many AI startups. Cloud providers are using this opportunity to offer capacity to clients at higher prices or reallocate it to higher bidders.

Microsoft has established a clear hierarchical system for allocating compute resources. According to a Microsoft employee with knowledge of the matter, Azure segments customers into three tiers: Tier 1 consists of roughly 1,000 top-spending cloud clients who receive priority access; Tier 2 includes clients with significant, though smaller, spending who still have dedicated sales representatives; Tier 3 comprises smaller businesses whose relationships are managed through Microsoft partner distributors like CDW.

Regarding chip access, Microsoft has recently begun requiring clients seeking NVIDIA's Blackwell chips to commit to leasing at least 1,000 chips for a minimum of one year, involving contracts worth tens of millions of dollars. Even for older-generation NVIDIA chips, clients face waits of several weeks or months. More notable is Microsoft's "use it or lose it" policy: for clients with GPU access via pay-as-you-go models, Microsoft monitors utilization rates. If servers sit idle for even a few hours, access may be revoked. Startups receiving free compute credits through the "Microsoft for Startups" program are also subject to this rule—failure to utilize chips sufficiently can lead to revoked GPU access.

The plight of image-generation AI startup Krea is illustrative. The four-year-old company, which has raised $83 million from investors like Andreessen Horowitz and Bain Capital Ventures, secured a six-month contract six months ago for hundreds of NVIDIA Blackwell chips at $2.80 per chip per hour. However, when Krea recently sought additional servers to train new models from scratch, the situation deteriorated rapidly. Co-founder and CEO Victor Perez reported that sales representatives from some cloud providers stopped answering calls. When they did call back, they not only quoted significantly higher prices but also demanded three-year contracts just to open negotiations. "Some just disappeared, some said they had no inventory, and others tried to make us accept extremely harsh terms," Perez said. Ultimately, Krea signed a one-year contract at $3.70 per hour, a 32% price increase.

Meanwhile, another founder seeking to lease a tightly interconnected cluster of nearly 1,000 GPUs was told by an NVIDIA salesperson last week that finding such a cluster at a major cloud provider is extremely difficult—the daily rental cost for such a cluster would exceed $70,000.
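The figures quoted above are easy to sanity-check. A minimal back-of-envelope sketch, using only the hourly rates reported in this article (any other inputs are illustrative assumptions):

```python
# Back-of-envelope check of the rental figures quoted in this article.

HOURS_PER_DAY = 24

def pct_increase(old: float, new: float) -> float:
    """Percentage increase from an old hourly rate to a new one."""
    return (new - old) / old * 100

def daily_cluster_cost(gpus: int, rate_per_gpu_hour: float) -> float:
    """Daily rental cost of a cluster at a flat per-GPU hourly rate."""
    return gpus * HOURS_PER_DAY * rate_per_gpu_hour

# Krea's renewal: $2.80/hr -> $3.70/hr per chip
print(f"{pct_increase(2.80, 3.70):.0f}%")  # matches the reported 32%

# A ~1,000-GPU cluster: even at an assumed $3.00/GPU-hour, the daily
# bill already exceeds the $70,000 figure cited above.
print(f"${daily_cluster_cost(1000, 3.00):,.0f} per day")
```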

Data from GPU cloud service provider Lightning AI underscores the supply-demand tension. The company currently has about 40,000 GPUs online, but has pending orders from approximately 40 clients totaling demand for around 400,000 GPUs. CEO Will Falcon stated that prices have risen over 25% in the past six months, from about $1.60 per hour to over $2.00, and even higher in some cases.

Faced with long waits and rising rental costs, some startup founders are considering bypassing cloud providers altogether by purchasing GPUs directly. Collin McLelland, founder of AI agent startup Collide, said his company is contemplating spending around $500,000 to buy NVIDIA GPUs and operate them itself. Collide, which raised a $14 million seed round last year focusing on AI agents for the oil and gas industry, plans to rent space in a data center to host its self-owned GPUs, avoiding the wait times and uncertainties of the rental model. "For us, the biggest risk is needing compute and not having it," McLelland said. "Most people are just scared of hardware. I've owned oil wells, so I'm numb to it." While the upfront cost of purchasing is significantly higher than short-term rental, he believes the total cost of ownership over multiple years is lower and eliminates dependency on cloud providers.
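McLelland's buy-versus-rent reasoning boils down to a break-even calculation. The sketch below is illustrative only: the GPU count, colocation fee, and utilization are stated assumptions, not figures from Collide.

```python
# Illustrative rent-vs-buy break-even sketch. The purchase budget
# ($500k) and hourly rate ($3.70) come from this article; the GPU
# count and colocation fee are assumptions made for the example.

HOURS_PER_YEAR = 24 * 365

def rental_cost(gpus: int, rate_per_gpu_hour: float, years: float) -> float:
    """Total cloud rental cost over the period at a flat hourly rate."""
    return gpus * rate_per_gpu_hour * HOURS_PER_YEAR * years

def ownership_cost(purchase_price: float, colo_per_gpu_month: float,
                   gpus: int, years: float) -> float:
    """One-time purchase plus assumed colocation/power fees."""
    return purchase_price + colo_per_gpu_month * gpus * 12 * years

# Assumed fleet: 16 GPUs bought for $500k, vs renting at $3.70/GPU-hour,
# with $300/GPU/month in colocation and power costs.
for years in (1, 2, 3):
    rent = rental_cost(16, 3.70, years)
    own = ownership_cost(500_000, 300, 16, years)
    print(f"year {years}: rent=${rent:,.0f}  own=${own:,.0f}")
```

Under these assumptions, renting is slightly cheaper in year one, but ownership pulls ahead from the second year onward, which is consistent with McLelland's multi-year total-cost-of-ownership argument.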

For cloud providers, the supply crunch has brought welcome profit improvements. Some providers had faced margin pressure on GPU services, but the current supply-demand imbalance allows them to increase rental prices, boosting margins. However, the long-term implications for the AI startup ecosystem are concerning. The concentration of compute resources towards top clients creates higher barriers and greater uncertainty for smaller startups in model training and product iteration. General Catalyst is exploring solutions such as establishing shared compute pools or negotiating directly on behalf of startups to secure GPU resources—an approach reminiscent of the venture capital firms' GPU pool creation in 2023, highlighting that compute access has become an unavoidable structural challenge within the AI investment landscape.

