AI Chip Boom Strains Supercomputing Supply: US National Labs Seek Alternatives as Startups Gain Traction

Stock News, 05-18 21:30

In recent years, the explosive growth of artificial intelligence (AI) technology has led major chip giants like NVIDIA and Advanced Micro Devices to shift their R&D focus and production capacity towards the lucrative field of AI low-precision computing. However, this strategic pivot is creating an unexpected ripple effect: U.S. national laboratories, struggling to procure chips that meet their high-precision scientific computing needs, are beginning to turn their attention to emerging chip startups.

It is reported that Sandia National Laboratories, located at Kirtland Air Force Base in New Mexico, is testing chips from Israeli startup NextSilicon, seeking new pathways to break through supply chain constraints.

As major manufacturers pivot towards AI, the demand for high-precision computing is being sidelined. Sandia National Laboratories is one of the three primary U.S. labs responsible for nuclear weapons development and maintenance. The liquid-cooled supercomputer within its facility routinely handles extremely complex simulations—from modeling the atmospheric trajectory of hypersonic nuclear weapons to simulating the detonation of one nuclear warhead in close proximity to another.

For over a decade, the chips processing these highly classified and demanding tasks have primarily come from mainstream semiconductor companies like NVIDIA and Advanced Micro Devices. However, Steve Monk, director of the high-performance computing team at Sandia Labs, stated that as mainstream chip companies increasingly orient their product designs towards AI and face supply shortages, the lab is under growing pressure to obtain chips that meet its high-precision scientific computing requirements. Pressure on both supply and computational capability has raised concerns within the team about its capacity to deliver on future missions.

The core divergence lies in a numerical format known as double-precision floating point (FP64). For scientific computations like nuclear weapons physics simulations, chips need to handle extremely large and extremely small numbers in the same calculation without losing precision. For years, NVIDIA and Advanced Micro Devices have vied for leadership in accelerating such computations, securing numerous supercomputing contracts with universities and government labs as a result.
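As a rough illustration of why precision matters when large and small magnitudes meet in one calculation, the following Python sketch adds a small increment to a large running total, once in FP64 and once with every intermediate result rounded to FP32. The helper names `to_fp32` and `accumulate` are hypothetical and chosen only for this example; it is not drawn from any laboratory code.

```python
import struct


def to_fp32(x: float) -> float:
    """Round a Python float (FP64) to the nearest FP32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def accumulate(start: float, step: float, count: int, round_fn) -> float:
    """Add `step` to `start` `count` times, rounding after every addition."""
    total = round_fn(start)
    for _ in range(count):
        total = round_fn(total + step)
    return total


# Same arithmetic, two precisions (illustrative values only).
fp64_total = accumulate(1.0e8, 1.0, 1000, float)
fp32_total = accumulate(1.0e8, 1.0, 1000, to_fp32)

print(fp64_total)  # 100001000.0
print(fp32_total)  # 100000000.0
```

Near 100 million, adjacent FP32 values are 8 apart, so each increment of 1 is rounded away and the lower-precision total never moves. FP64 keeps every increment.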

However, AI training and inference workloads do not rely on double-precision calculations, causing the balance in chip design to shift. FP64 underpins modern aviation, rocket launches, vaccine development, and the proper functioning of nuclear weapons, and is considered the "gold standard" in scientific computing. With 64 bits per number, it offers more than 18.44 quintillion (2^64) distinct bit patterns. In contrast, modern AI models are often trained using formats as narrow as FP8, which offers only 256 (2^8).
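The value counts quoted above follow directly from bit width: an n-bit format has 2^n possible bit patterns. The short Python sketch below reproduces the figures. The format list and the helper name `bit_patterns` are assumptions made for illustration. The counts are upper bounds on distinct numeric values, since floating-point formats reserve some patterns for special cases such as NaN.

```python
# Bit widths of common floating-point formats (illustrative selection).
FORMAT_WIDTHS = {"FP64": 64, "FP32": 32, "FP16": 16, "FP8": 8}


def bit_patterns(width_bits: int) -> int:
    """Number of distinct bit patterns an n-bit format can hold."""
    return 2 ** width_bits


for name, width in FORMAT_WIDTHS.items():
    print(f"{name}: {bit_patterns(width):,} bit patterns")
```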

NVIDIA's recently announced Rubin GPU delivers a major leap in AI compute, with inference throughput of up to 50 petaFLOPS, 2.5 times that of the previous Blackwell generation. Its peak FP64 performance, however, is approximately 33 teraFLOPS, which is 1 teraFLOPS lower than that of the H100 launched four years ago.

NVIDIA has introduced FP64 software emulation based on the Ozaki scheme, claiming it can reach matrix performance of up to 200 teraFLOPS in its CUDA libraries, which it says is 4.4 times the hardware performance. Advanced Micro Devices has raised doubts. Its researcher Nicholas Malaya pointed out that while this emulation method performs acceptably in some benchmarks, its reliability in real-world physical simulations, such as materials science or combustion codes, is questionable. His concerns include possible non-compliance with the IEEE floating-point standard and a doubling of memory consumption.

Ian Cutress, principal analyst at chip consultancy More Than Moore, noted that NVIDIA's upcoming Rubin chip shows a decline in double-precision performance by some metrics, raising concerns among many scientists in the high-performance computing field.

The strategic adjustments by chip giants are opening market opportunities for emerging companies like NextSilicon. Founded in 2017, the Israeli startup has spent eight years on R&D and raised approximately $303 million across its seed and subsequent venture funding rounds, with its valuation at one point reaching around $1.5 billion.

In stark contrast to the traditional GPU- or CPU-based technology paths of NVIDIA and Advanced Micro Devices, NextSilicon's flagship "Maverick-2" chip employs what the company calls an intelligent dataflow architecture. The hardware is software-defined and can be dynamically reconfigured and optimized at runtime, so the chip can be reprogrammed in real time to run a given workload more efficiently. In terms of power efficiency, the dataflow architecture reduces the time and energy spent moving data back and forth between the compute units and the system's memory.

James Laros, a senior scientist at Sandia National Laboratories overseeing the project testing new computing architectures, gave high praise: "NextSilicon's performance results are impressive, demonstrating genuine potential to enhance computing power without requiring extensive code modifications."

On Monday, Sandia National Laboratories, NextSilicon, and Penguin Solutions—which assisted in integrating NextSilicon's chips into the supercomputer—jointly announced that the supercomputer system equipped with NextSilicon chips has passed key technical milestones in a series of general-purpose supercomputing tests. This qualifies the system for further testing this autumn with more challenging computational tasks closer to the actual work of nuclear security.

Laros stated that the lab's active collaboration with small and medium-sized chip companies like NextSilicon is fundamentally aimed at building a diversified chip procurement system. This ensures the lab can stably and consistently obtain computing chips suitable for its research tasks, even as leading chip companies shift their strategic focus. "We must maintain viable options to accomplish our mission, because there is no fallback for this mission," Laros emphasized.

