Amazon Embraces Cerebras Chips for High-Speed AI Inference Solutions

Deep News · 03-14 11:53

Amazon Web Services has entered into a multi-year partnership with chip startup Cerebras to deploy the startup's specialized chips in data centers for artificial intelligence inference computing. Under the agreement announced on Friday, March 13, Amazon's cloud division will integrate Cerebras chips with its in-house Trainium processors to deliver accelerated inference services. This marks another significant endorsement for Cerebras, which secured a multi-billion dollar agreement with OpenAI in January and now has backing from a second major technology player.

Cerebras promotes its chips as an "ultra-fast inference solution," highlighting their speed at "decoding," the phase in which AI models generate responses to user queries. The company claims its technology runs inference workloads up to 25 times faster than Nvidia's GPUs. The collaboration represents a major expansion of Cerebras' commercial reach: according to CEO Andrew Feldman, growing adoption of AI for increasingly complex tasks makes the combined Cerebras-Trainium offering attractive on a leading cloud platform, giving Cerebras access to a broad customer base.

The AI industry is witnessing a shift in computing demand. As user adoption of AI tools and agents grows, the focus is moving from training models to inference—where speed and responsiveness are critical. While GPUs excel in training, they are not always optimal for high-speed inference, prompting companies to diversify their chip suppliers. Amazon, the world’s largest cloud provider, has primarily relied on its Trainium chips, developed by its Annapurna Labs semiconductor unit. By incorporating Cerebras technology, AWS aims to address Trainium’s limitations in high-speed inference scenarios while introducing tiered pricing: a lower-cost, Trainium-only option and a premium combined Cerebras-Trainium solution.

Nafea Bshara, AWS Vice President and co-founder, emphasized the company’s goal of "continuously improving speed and reducing costs." Feldman added that for applications requiring rapid token generation—such as code generation or AI agent tasks—Cerebras aims not only to be the fastest but to set an industry standard.

The deal reflects mounting competition for Nvidia, as specialized chip designers target specific use cases with faster, more cost-effective alternatives. In December, Nvidia reportedly signed a $20 billion licensing agreement with startup Groq and plans to launch a new inference-focused system using Groq’s technology. For Cerebras, the AWS partnership comes at a pivotal time. In February, the company raised $1 billion in new funding, bringing its total raised to $2.6 billion and valuing it at approximately $23 billion. Earlier, in January, OpenAI committed to a deal worth over $10 billion to power its flagship chatbot using Cerebras chips, with plans to deploy up to 750 megawatts of computing capacity.

Cerebras is backed by prominent investors including Fidelity Management, Atreides Management, Benchmark, Tiger Global, and Coatue, though it previously faced fundraising challenges. The company filed for an IPO in September 2024 but withdrew its application about a year later, with no current timeline for reviving its public listing plans.

Disclaimer: Investing carries risk. This is not financial advice. The above content should not be regarded as an offer, recommendation, or solicitation to acquire or dispose of any financial products; nor should any associated discussions, comments, or posts by the author or other users be considered as such. It is provided for general informational purposes only and does not take into account your own investment objectives, financial situation, or needs. TTM assumes no responsibility or warranty for the accuracy and completeness of the information; investors should do their own research and may seek professional advice before investing.
