In a nearly two-hour interview on the "most popular podcast in Silicon Valley," Jensen Huang directly addressed various questions about NVIDIA's journey to a $4 trillion market capitalization during the large model era. The discussion was packed with significant insights.
Within half a day of its release, the video garnered over 100,000 views on YouTube alone. Observers noted it was rare to see Huang so impassioned.
Here are the key takeaways for a quick summary:
The input is electrons, the output is tokens. NVIDIA operates in between.
AI will not make software cheap or homogeneous. The proliferation of Agents will rapidly increase tool deployment rates, leading to faster growth.
TPUs pose no threat to NVIDIA. NVIDIA GPUs make inventing new algorithms easier.
NVIDIA's earlier decision not to invest in companies like OpenAI was a "misjudgment" and also "unavoidable."
NVIDIA's philosophy is to do "what is necessary, but as little as possible."
NVIDIA never allocates GPUs to whoever pays the most.
Even without deep learning, NVIDIA would still focus on accelerated computing.
More details are available in the full transcript, which includes Huang's vigorous rebuttals.
**NVIDIA's Moat**
Q: Software companies are experiencing valuation drops because people believe AI will make software cheap. A seemingly naive view is that NVIDIA is fundamentally a software company, while manufacturing is handled by others. If software becomes cheap, could NVIDIA also lose its moat?
Jensen Huang: Ultimately, something must convert electrons into tokens. That conversion process, and the work that makes one token more valuable than another, is difficult to commoditize or make cheap. The journey from electrons to tokens is incredible. The art, engineering, science, and invention invested in making one token more valuable than another are plain to see. We are watching this process unfold in real time, and everything involved is far from fully understood; the journey is far from over. I am skeptical that the scenario you describe will happen.
Of course, we will make this process more efficient. Your question aligns with NVIDIA's operational mindset: input is electrons, output is tokens. NVIDIA exists in between. Our job is to exert the necessary effort while intervening as little as possible to maximize this conversion capability. "As little as possible" means we leave tasks to partners, making them part of the ecosystem when we don't have to do them ourselves.
Today, NVIDIA has the largest partner ecosystem, covering upstream and downstream supply chains, all computer companies, application developers, and model creators. Think of AI as a five-layer cake; our ecosystem covers every layer. We aim to do less, but the part we must do is exceptionally difficult. I don't believe this part will become homogeneous.
In fact, I don't think software companies or tool developers will lose their moat either... Most software companies today are tool developers. For example, Excel is a tool, PowerPoint is a tool, Cadence creates tools, Synopsys creates tools. Contrary to popular belief, I think the number of Agents will grow exponentially, and the user base for tools will also grow exponentially. The deployment volume of these tools will likely surge.
Today, we are limited by the number of engineers. But in the future, numerous Agents will support engineers, exploring design spaces in unprecedented ways. The tools we use today will not be abandoned. I believe the proliferation of tools will lead to rapid growth for software companies. This hasn't fully happened yet because Agents are not yet efficient enough at using these tools. Either these companies will build their own Agents, or Agents will evolve to use tools efficiently. I think both will happen together.
Q: In your recent filings, NVIDIA's procurement commitments for foundries, memory, packaging, etc., are close to $100 billion. SemiAnalysis suggests your actual commitments reach $250 billion. One interpretation is that NVIDIA's moat lies in locking down the supply chain for scarce components early. Is this NVIDIA's biggest moat for the coming years?
Jensen Huang: This is something we can do that others find difficult. We make massive commitments upstream. Some commitments are explicit, like the procurement contracts you mentioned. Others are implicit. For example, many upstream investments are driven by our supply chain partners because I tell CEOs of these companies: "Let me tell you how big this industry will be, let me explain why, let me work through it with you and show you what I see."
This way, I continuously communicate, motivate, and align with CEOs across various upstream sectors. They become willing to invest. Why invest for me and not others? Because they know I can absorb their supply and sell it downstream. The reality is, NVIDIA's downstream supply chain and demand scale are enormous, making them willing to invest.
If you've attended GTC, you'd be amazed by its scale and attendee count. It's a full 360-degree view, gathering the entire AI universe. People gather because they need to understand each other. I bring them together so upstream can see downstream, downstream can see upstream, and everyone can see the latest AI advancements. Most importantly, they meet AI-native companies and startups, witnessing everything I've told them firsthand. I spend significant time directly or indirectly communicating future opportunities to our supply chain, partners, and ecosystem.
Some say, "Jensen, most of your keynotes are just one announcement after another." Actually, part of my presentation is deliberately challenging, almost like a lecture. I need to ensure our entire supply chain—upstream and downstream—understands the changes happening, why they're happening, when, and how big, and can reason systematically like I do.
Returning to the moat question, we are preparing for the future. If our business reaches a trillion-dollar scale in the coming years, our supply chain will be ready. Without our market reach and the business we drive... just as cash has liquidity, a supply chain has its own kind of flow. Without enough business flowing through it regularly, no one would build a supply chain able to support this architectural scaling. We can handle this scaling because our downstream demand is massive. Everyone has witnessed this. It enables us to do what we do at our current scale.
Q: I want to understand more specifically if upstream can keep up with demand. In recent years, your annual revenue has doubled, and the flops you provide globally have grown more than threefold.
Jensen Huang: Doubling revenue at this scale is indeed incredible.
Q: It is. But regarding logic chips, you are TSMC's largest customer for the N3 node and a major customer for N2. SemiAnalysis predicts AI will account for 60% of N3 capacity this year and 86% next year. If you already occupy most of the capacity, how can you continue doubling? Must AI compute growth slow due to upstream constraints? Do you see ways around this? How can fab capacity double every year?
Jensen Huang: To some extent, instantaneous demand has exceeded the total global upstream and downstream supply. At any moment, we might be limited by the number of "plumbers." That does happen.
Q: Then next year's GTC should invite the plumbers (laughs).
Jensen Huang: That's a good idea (laughs). But having demand that outstrips industry supply is a good thing. Obviously, the opposite is bad. When the gap between supply and demand grows too large, the industry quickly moves to close it. For example, you'll find hardly anyone talks about CoWoS packaging technology anymore.
Q: Why?
Jensen Huang: Because the industry has invested heavily in it over the past two years, with its scale even doubling several times. We're in a pretty good state now. TSMC knows CoWoS supply must keep pace with logic and memory demand. They are scaling CoWoS and future packaging technologies to align with logic chip development. This is excellent because CoWoS and HBM were once considered "specialty technologies." Now they are mainstream computing technologies.
Certainly, we can now influence the supply chain more broadly. Early in the AI revolution, I was saying much of what I say now. Some believed and invested, like Sanjay at Micron and his team. I was very impressed with that meeting; I clearly explained why things were happening and future predictions. They doubled down, partnering with us on LPDDR and HBM memory. This undoubtedly brought huge growth to their company. Some came later, but they are all here now.
We pay extreme attention to every bottleneck. Now we anticipate these bottlenecks years in advance. For example, our collaborative investments with Lumentum, Coherent, and the silicon photonics ecosystem in recent years have truly reshaped the supply chain. We built a complete supply chain around TSMC, collaborated with them on the COUPE project, invented a bunch of new technologies, and licensed patents to the supply chain to keep it open.
We help partners scale capacity through new technologies, workflows, inspection equipment, and investments. You can see we are trying to ensure the supply chain supports this scaling through ecosystem building.
Q: It seems some bottlenecks are easier to solve than others. Scaling CoWoS to a larger scale might be relatively easier—
Jensen Huang: By the way, I picked the hardest example.
Q: Which one?
Jensen Huang: Plumbers and electricians.
This is also something that worries me about certain "doomsayers" who always talk about jobs ending and positions disappearing. If we discourage people from becoming software engineers, we will face a shortage of software engineers.
Similar predictions were made a decade ago. Some pessimists said, "Whatever you do, don't become a radiologist." You can still find videos online saying radiologists would be the first to disappear. And what happened? We now have a shortage of radiologists.
Q: Back to the point about some bottlenecks being easier to solve. How do you manufacture twice the logic chips every year? Logic and memory chip scaling are limited by EUV lithography. How do you achieve a doubling every year?
Jensen Huang: This can be scaled quickly. None of this is difficult; it just requires demand signals. Once you can make one, you can make ten, then a million. All of this is easily replicable.
Q: How deeply do you get involved? Do you communicate with ASML, telling them, "Look at the demand in three years. For NVIDIA to achieve $2 trillion in annual revenue, we need more EUV machines."
Jensen Huang: Some I have to address directly, some indirectly. For example, if I convince TSMC, ASML will naturally be convinced. The key is we must consider critical bottlenecks. But once TSMC is convinced, you'll see enough EUV equipment within a few years.
My view is that no bottleneck lasts more than two or three years.
Meanwhile, we are making huge progress in improving computational efficiency. For instance, the efficiency improvement from Hopper to Blackwell is 30-50x. Because of CUDA's flexibility, we can develop entirely new algorithms. Additionally, we are increasing capacity while improving computational efficiency. These issues are less concerning to me. The real risks come from downstream issues, like policies limiting energy expansion. Without energy, you cannot build an industry; without energy, you cannot establish a new manufacturing plant.
We want to reshape US industry. We want to bring back chip manufacturing, computer manufacturing, packaging processes; we want to build new things like electric vehicles, robots; we want to build AI factories. But you can't do this without energy, and these issues take a long time to solve. In comparison, chip capacity issues can be solved in 2-3 years. CoWoS capacity expansion is also a 2-3 year matter.
Q: Interesting. I feel sometimes my guests express completely opposite views. In such cases, I lack the technical knowledge to judge.
Jensen Huang: The good news is you're talking to an expert now (laughs).
**TPUs Pose No Threat; NVIDIA is "Redefining Computing"**
Q: I have a question about competitors. Two of the world's top three AI models—Claude and Gemini—are trained on TPUs. What does this mean for NVIDIA's future?
Jensen Huang: What we build is very different from a TPU. NVIDIA builds Accelerated Computing, not just a Tensor Processing Unit (TPU).
Accelerated computing can be used for various purposes: molecular dynamics, quantum chromodynamics, data processing, data frames, structured and unstructured data. It's also used for fluid dynamics and particle physics. Additionally, we use it for AI computing.
Accelerated computing is more diverse. Although everyone talks about AI today, and AI is indeed very important and far-reaching, the scope of computing is much broader.
NVIDIA has redefined computing, transitioning from general-purpose computing to accelerated computing. Our market coverage is far greater than any TPU or ASIC can achieve. We are the only company that can accelerate a wide variety of applications. We have a huge ecosystem, so various frameworks and algorithms run on the NVIDIA platform.
Furthermore, most self-built systems are not designed for others to operate easily. Our systems are ubiquitous, including on Google, Amazon, Azure, and OCI, because anyone can operate them.
If you want to operate this compute capacity via leasing, you better have a large, multi-industry customer ecosystem to absorb these resources. If it's for your own use, we can certainly help you operate these systems, like we do for Elon Musk's xAI. And because we can support operators in any company and any industry, you can use it for building supercomputers dedicated to scientific research and drug discovery, like for Eli Lilly. We help them operate their own supercomputers to accelerate the entire diverse process of drug discovery and bioscience.
There are numerous application scenarios that TPUs cannot cover. NVIDIA made CUDA an excellent tensor processing unit, but it also handles the entire lifecycle of data processing, computing, AI, etc. Our market opportunity is broader, our coverage greater. Because we support all types of applications worldwide, you can build an NVIDIA system anywhere and be confident it will have customer demand. It's a completely different concept.
Q: Next is a long question. Your revenue is astounding, and this money isn't coming from pharmaceuticals or quantum computing. The reason for $60 billion quarterly revenue is that AI is an unprecedented technology growing at an unprecedented rate.
So the question is, what is truly the best choice for AI? I'm not familiar with the details, but when talking to my AI researcher friends, they say, "Look at TPUs. They are large systolic arrays, very good at matrix multiplication, while GPUs are very flexible. GPUs excel with lots of branches or irregular memory access."
But what is AI essentially? It's just predictable matrix multiplication, over and over. You don't need to waste any chip area on warp schedulers, thread switching, or flexible memory access. TPUs are indeed optimized for the main growth demand and use cases of current AI computing. I'm curious about your response.
Jensen Huang: Matrix multiplication is an important part of AI, but it's not everything. If you want to develop a new attention mechanism, decouple it differently, or invent a completely new architecture like hybrid SSM (State Space Models), you need a generally programmable architecture. If you want to build a fused diffusion and autoregressive model, you also need a generally programmable architecture. We can run everything you can imagine. This is our advantage: our architecture makes inventing new algorithms easy because it's a programmable system.
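As a concrete illustration of what programmability buys, here is a minimal sketch of one such non-standard variant, linear attention, in plain PyTorch (illustrative only; the feature map and normalization are assumptions of the sketch, not anything Huang describes):

```python
import torch

def linear_attention(q, k, v):
    # q, k, v: (batch, heads, seq, dim). Non-causal linear attention:
    # replace softmax with a positive feature map, phi(x) = elu(x) + 1,
    # so the sequence dimension can be summed out once and reused.
    q = torch.nn.functional.elu(q) + 1
    k = torch.nn.functional.elu(k) + 1
    kv = torch.einsum("bhsd,bhse->bhde", k, v)   # sum over sequence: K^T V
    z = k.sum(dim=2)                             # normalizer term
    out = torch.einsum("bhsd,bhde->bhse", q, kv)
    denom = torch.einsum("bhsd,bhd->bhs", q, z).unsqueeze(-1)
    return out / (denom + 1e-6)

q, k, v = (torch.randn(2, 8, 128, 64) for _ in range(3))
print(linear_attention(q, k, v).shape)  # torch.Size([2, 8, 128, 64])
```

On a fixed-function pipeline built around standard softmax attention, a change like this has no natural home; on a programmable architecture it is a few lines.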
The ability to invent new algorithms is the real reason driving AI's rapid progress. Devices like TPUs are also limited by Moore's Law, growing about 25% per year. The only way to achieve 10x or 100x leaps is to fundamentally change algorithms and computing methods.
This is NVIDIA's core advantage. The reason we achieved a 50x performance improvement from Hopper to Blackwell... When I first announced Blackwell was 35x more power-efficient than Hopper, no one believed it. Later, Dylan wrote an article pointing out that I was actually "intentionally conservative"; it was 50x. That simply cannot be achieved by relying on Moore's Law alone. We get there through new model structures like MoE: parallelizing, disaggregating, and distributing them across the computing system. Without CUDA's support, developing such new kernels would be nearly impossible.
Our advantage is that NVIDIA's architecture is programmable, and we are a deeply co-design-driven company. We can even offload some computations into the compute fabric, like NVLink, or into the network, like Spectrum-X. We can influence the processor, system, architecture, libraries, and algorithms simultaneously. Without CUDA, I wouldn't even know where to start.
Q: This touches on an interesting question about the characteristics of NVIDIA's customer base. Currently, 60% of your revenue comes from the top five hyperscalers. In a different era, with different customers—like professors doing experiments—they needed CUDA. They couldn't use other accelerators; they just needed to run PyTorch with CUDA and ensure everything could be optimized smoothly.
But these hyperscalers have enough resources to write their own kernels. In fact, to get the last 5% of performance their specific architecture needs, they must. Anthropic and Google have moved to their own accelerators, like TPUs and Trainium. Even OpenAI, which uses NVIDIA GPUs, developed tools like Triton because they need their own kernels. From CUDA C++ to cuBLAS and NCCL, they have a complete independent stack and can compile to other accelerators.
Given that most major customers can and actually are building CUDA alternatives, is CUDA still the key reason cutting-edge AI chooses NVIDIA?
Jensen Huang: CUDA is a rich ecosystem. If you want to develop software for any computer, choosing CUDA first is definitely the smart choice. Because the ecosystem is so rich, we support every development framework. If you want to create custom kernels... for example, we contributed significantly to Triton. Triton's backend contains a lot of NVIDIA technology.
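For readers unfamiliar with Triton: it lets developers write custom GPU kernels in Python. A minimal kernel in the style of Triton's own tutorials (shown only to make "writing your own kernels" concrete, not anything NVIDIA-specific) looks like this:

```python
import torch
import triton
import triton.language as tl

@triton.jit
def add_kernel(x_ptr, y_ptr, out_ptr, n_elements, BLOCK_SIZE: tl.constexpr):
    # Each program instance handles one BLOCK_SIZE-wide slice of the vectors.
    pid = tl.program_id(axis=0)
    offsets = pid * BLOCK_SIZE + tl.arange(0, BLOCK_SIZE)
    mask = offsets < n_elements  # guard the ragged final block
    x = tl.load(x_ptr + offsets, mask=mask)
    y = tl.load(y_ptr + offsets, mask=mask)
    tl.store(out_ptr + offsets, x + y, mask=mask)

def add(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    out = torch.empty_like(x)
    n = out.numel()
    grid = lambda meta: (triton.cdiv(n, meta["BLOCK_SIZE"]),)
    add_kernel[grid](x, y, out, n, BLOCK_SIZE=1024)
    return out
```

Triton compiles this Python down to GPU code; the backend work Huang refers to is what makes that compilation efficient on NVIDIA hardware.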
We are very happy to help every framework be perfect. There are many, many frameworks, like Triton, vLLM, SGLang, and more emerging reinforcement learning frameworks like verl and NeMo RL. The field of post-training and reinforcement learning is exploding rapidly. So if you're building on one architecture, building on CUDA is the wisest choice because you know the ecosystem is strong and reliable.
You'll know that if something goes wrong, it's probably in your code, not in the massive underlying codebase. Don't forget, when building these systems, the amount of code you face is huge. When something doesn't work, is it your problem or the computer's? You always want it to be your error and trust the computer's robustness. Of course, our own systems have issues, but they are deeply optimized; you can at least build on this reliable foundation. That's point one: the ecosystem's richness, programmability, and capability.
Second, if you are a developer building anything, the most important thing is the install base. You want your software to run on many other computers. The software you develop isn't just for yourself; it's for your team or even other teams. If you are a framework developer, NVIDIA's CUDA ecosystem is an invaluable treasure trove of hardware and software.
Hundreds of millions of NVIDIA GPUs are deployed worldwide; it's on every cloud platform. A10, A100, H100, H200, various L-series and P-series devices, many types and forms. We are basically everywhere. This massive install base means that once developed, your software or model can run anywhere in the world. This value is immeasurable.
Finally, our pervasiveness on cloud platforms makes us truly unique. If you are an AI company or developer unsure which cloud provider to partner with, or where to run your system, NVIDIA systems cover everywhere—including running directly within your company. This ecosystem richness, the breadth of the install base, coupled with flexible deployment models, makes CUDA irreplaceable.
Q: That makes sense. I'm interested in whether these advantages still seem so crucial to your primary customers. For most users in the industry, this is probably very important. But for customers who can actually build their own software stacks—the ones constituting the bulk of your revenue, especially in a world where AI is becoming more powerful... the question ultimately becomes: If hyperscalers can write their own kernels instead of relying on CUDA, can NVIDIA maintain its current profit margins?
Jensen Huang: The number of engineers our company allocates to these AI labs is staggering. We continuously optimize their software stacks for them because no one understands the complexity and details of our own architecture better than we do.
These architectures are not as "general-purpose" as CPUs. A CPU is like a Cadillac; it runs smoothly, performance doesn't have extreme fluctuations, anyone can drive it well. But NVIDIA GPUs and accelerators are more like Formula 1 cars. I can imagine everyone being able to drive these GPUs at 100 mph, but to truly reach the limit requires extremely high expertise. We also use a lot of AI to optimize our existing kernel libraries.
I'm quite sure that for the foreseeable future, our expertise will remain indispensable to the cooperating AI labs. We can often optimize their software stacks further, improving performance by 1x to 2x. Sometimes optimizing a specific kernel can directly improve performance by 2x or 3x. This improvement is very important for customers running large numbers of Hopper or Blackwell devices, as it directly increases the entire facility's efficiency, thereby increasing the customer's revenue.
Undoubtedly, NVIDIA's compute software stack offers the world's best performance and Total Cost of Ownership (TCO). No platform provides a better performance-to-TCO ratio than ours. The benchmarks are there; I encourage the TPU and Trainium teams to publish InferenceMAX and MLPerf results showing their supposedly amazing inference-cost advantages, but no one is willing to come out and show them. From first principles, it simply doesn't make sense.
I think the reason we are so successful is simple: our TCO is excellent.
Second, you mentioned that 60% of our revenue comes from the top five cloud companies, but most of that business actually serves their external customers. They choose us because we have strong customer coverage; we bring them the world's best customers. Those customers choose NVIDIA because of our uniquely broad coverage and versatility.
I think the flywheel effect comes from several aspects: our install base, the programmability of our architecture, the richness of our ecosystem, and the existence of numerous AI companies. There are thousands of AI companies now. If you are one of these AI startups, which architecture do you choose? You choose the most pervasive architecture globally—that's us. You also choose the architecture with the largest install base—that's also us. And an architecture with a rich ecosystem—that's also NVIDIA's unique advantage.
So, that's the flywheel. The core reasons for our success: First, performance and cost. Our performance per dollar is excellent, so the customer's cost is lowest. Second, energy efficiency: our performance per watt is the highest in the world. If a company builds a 1 GW data center, that center must maximize revenue by generating as many tokens as possible, and tokens translate directly into revenue. We have the architecture with the most tokens per watt in the world. Finally, if your goal is to lease out infrastructure, we have the most customers in the world.
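To see why tokens per watt is the binding metric for a power-limited facility, a back-of-envelope sketch helps; every number below is an illustrative assumption, not an NVIDIA or market figure:

```python
# Revenue of a power-limited AI factory, to first order:
#   revenue ~ power x (tokens per joule) x (price per token)
power_w = 1e9            # assumed 1 GW facility
tokens_per_joule = 2.0   # assumed whole-facility efficiency (illustrative)
usd_per_mtok = 1.0       # assumed average selling price per million tokens

tokens_per_sec = power_w * tokens_per_joule        # 2e9 tokens/s
annual_tokens = tokens_per_sec * 3600 * 24 * 365   # ~6.3e16 tokens/year
annual_revenue = annual_tokens / 1e6 * usd_per_mtok
print(f"~${annual_revenue / 1e9:.0f}B per year")   # ~$63B under these assumptions
```

Because power is the fixed input, doubling tokens per joule doubles the facility's revenue, which is exactly the leverage Huang is claiming.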
Q: Interesting. I think the crux of the question is what the market structure really is. Perhaps there could be a world with thousands of AI companies having roughly equal compute share. But from the perspective of the five hyperscalers, the entities actually using this compute are Anthropic, OpenAI, and large foundational model labs capable of building their own various accelerators.
Jensen Huang: No, I think your assumption is wrong.
Q: Maybe, but let me ask you a slightly different question.
Jensen Huang: No, let me correct your assumption.
Q: Okay. Let me rephrase the question.
Jensen Huang: But still allow me to correct this assumption. Because it's too important for AI, too important for the future of science, too important for the future of the industry. This assumption... listen—
Q: Let me finish the question, then we can discuss this topic.
Jensen Huang: Okay.
Q: If these metrics about price, performance, and performance per watt are true, then how do you explain this: Anthropic, for example, recently announced a multi-gigawatt TPU compute agreement with Broadcom and Google, and most of their compute runs on TPUs. For Google, obviously, TPUs provide the main compute resource. And from what I observe of these large AI companies, most of their compute used to be entirely on NVIDIA, but that's no longer the case. So if these numbers hold on paper, why are these companies still choosing other accelerators?
Jensen Huang: Anthropic is a special case, not a trend. Without Anthropic, would TPU have growth? Entirely propped up by Anthropic. Without Anthropic, would Trainium have growth? Also entirely propped up by Anthropic. It's not that there are lots of ASIC opportunities; there's only one Anthropic.
Q: But the cooperation between OpenAI and AMD... they are developing their own Titan accelerator.
Jensen Huang: Yes, but we can all acknowledge that OpenAI's primary compute still relies on NVIDIA. We are still cooperating extensively.
I don't mind other companies trying different things. If they don't try these products, how will they know how good ours is? We also need to be reminded that we must keep working hard to maintain our position today.
There will always be exaggerated claims. But look at the number of canceled ASIC projects in the past. Making a product better than NVIDIA's is not easy. Actually, it's not wise. Certainly, NVIDIA will miss some things, but at our scale and speed, we are the only company pushing technology leaps significantly every year—every single year.
Q: I imagine their logic might be: "Hey, these products don't need to be better; they just need to be not 70% worse than NVIDIA," because buying from you means paying a 70% margin.
Jensen Huang: Don't forget, even for ASICs, the profit margin is very high. Assuming NVIDIA's margin is 70%, ASIC margins are close to 65%. How much are you really saving?
Q: You mean Broadcom?
Jensen Huang: Yes. You always have to pay someone. From the data I know, ASIC margins are very high. They think so themselves and are proud of the amazing ASIC margins.
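A quick back-of-envelope on that margin math (illustrative numbers only, and it ignores any performance-per-dollar gap between the chips):

```python
# If both chips cost the same C to build, the price backs out from gross margin:
#   price = cost / (1 - margin)
C = 1.0
price_gpu = C / (1 - 0.70)    # ~3.33x cost at a 70% gross margin
price_asic = C / (1 - 0.65)   # ~2.86x cost at a 65% gross margin

savings = 1 - price_asic / price_gpu
print(f"~{savings:.0%} cheaper per unit of silicon cost")  # ~14%
```

A five-point margin gap buys roughly a 14% discount, which is Huang's point: the savings are far smaller than the headline margins suggest.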
A long time ago, we didn't have the capability to do such things. At that time, I didn't deeply realize how difficult it was to establish a foundational AI lab like OpenAI or Anthropic, which requires massive investment from suppliers. We couldn't provide billions in investment for Anthropic to use our compute then, but Google and AWS could. They invested huge amounts early on, leading Anthropic to ultimately use their compute resources. We couldn't do that at the time.
My mistake was not deeply recognizing that AI labs had no choice; venture capital firms would never invest $5-10 billion in a lab. Even if I understood it, I don't think we could have done it then. Fortunately, I won't make the same mistake again.
I'm happy to invest in OpenAI and help them scale. I'm also happy that when Anthropic approached us later, we could invest and support them. We couldn't in the past. If we could do it over—if NVIDIA then had the scale we have today—I would be very willing to do so.
**Why Doesn't NVIDIA Become a Hyperscaler?**
Q: This is indeed interesting. For years, NVIDIA has been the company making the most money in AI. Now you are making investments; reports say you invested up to $30 billion in OpenAI and $10 billion in Anthropic. And now, their valuations have grown significantly, and I believe they will continue to grow.
So, over these years, you've been providing compute to these companies; you could see their direction. A few years ago, even just a year ago, their valuations were a tenth of what they are now, and you had ample cash on hand. Logically, one possibility was that NVIDIA itself could build a foundational research lab, make huge investments to make it possible, or complete the deals you're doing now earlier, before the high valuations. I'm curious, why not do it earlier?
Jensen Huang: We did it as soon as we could. If conditions allowed earlier, I would have been willing to do it earlier. But when Anthropic needed us to do it, we weren't in a position to. It wasn't a reasonable choice for us at the time.
Q: Why? Was it a funding issue?
Jensen Huang: Yes, the scale of investment. We had never invested in external companies before, especially at such a scale. We didn't realize it was necessary at the time. I always thought they could go to venture capital like other companies. But what they wanted to achieve couldn't be done through venture capital. What OpenAI wanted to achieve also couldn't be done through venture capital. I recognize that now, but didn't understand it then.
But this was also their cleverness. They realized early on they had to do this. I'm glad they made that choice then. Although it meant Anthropic had to go to others, I'm still happy they exist. Anthropic's existence is good for the world, and I'm genuinely pleased about it.
Q: Certainly, you still made a lot of money, and you're making more every quarter.
Jensen Huang: Even so, there can still be regrets.
Q: So the question remains—now you have a lot of cash on hand and keep making more money, what should you do with these funds? One answer is that an ecosystem of intermediaries is emerging, enabling these research labs to convert capital expenditure into operational expenditure so they can rent compute. Chips are very expensive, but they generate huge value over their lifecycle because AI models are becoming more powerful. NVIDIA has enough funds to undertake such capital expenditure. In fact, it's reported you provided up to $6.3 billion in support to CoreWeave and invested $2 billion in it.
So, why doesn't NVIDIA become a cloud provider itself and lease this compute?
Jensen Huang: This is a company philosophy issue. NVIDIA should do "what is necessary, but as little as possible." This means the work we are doing building the computing platform is such that if we didn't do it, I truly believe no one would.
If we didn't build NVLink the way we do, build the entire technology stack the way we do, establish the entire ecosystem the way we do, if we hadn't persisted in building CUDA over the past 20 years—a period during which we lost money most of the time—if we hadn't done all this, no one would have.
If we hadn't created all the CUDA-X libraries, making them domain-specific... over a decade ago, we started focusing on domain-specific libraries. We realized that if we didn't create these libraries, whether for ray tracing, image generation, or early AI development, then technologies like data processing, structured data processing, vector data processing wouldn't exist. We even created a library for computational lithography called cuLitho. If we didn't create it, no one would. So, if we didn't do this work, accelerated computing wouldn't have progressed as it has today.
So, this is what we must do. We should go all out, do our best to accomplish it. However, there are many cloud providers in the world; if we don't do it, someone else will always appear. NVIDIA's guiding philosophy is to do "what is necessary, but as little as possible." Everything is centered around this.
Regarding cloud services, if we hadn't supported the existence of "new cloud providers" like CoreWeave, these AI cloud companies wouldn't exist. Without our support, CoreWeave simply couldn't exist. Had we not supported Nscale, they wouldn't be here today. Without our support, Nebius wouldn't be at its current level. And now, they are all developing very well.
Q: Why don't you pick winners?
Jensen Huang: First, it's not our responsibility. Second, when NVIDIA was founded, there were 60 companies in the 3D graphics business. In the end, only we survived. But if you had asked those 60 companies back then which one would survive, NVIDIA would likely have been listed as the least likely.
At that time, NVIDIA's graphics architecture was completely wrong. Not a little wrong, but utterly wrong.
We designed an architecture that developers simply couldn't support. It could never succeed. We derived it based on correct first principles but arrived at the wrong solution.
Back then, everyone would have excluded us from the list of competitors. Yet, look at us now.
Therefore, I deeply know to maintain enough humility. Don't pick winners. Either let them compete themselves or support everyone.
Q: I'm a bit confused. You said NVIDIA doesn't pick winners among the new cloud companies, but then you listed many of them and said, "Without NVIDIA's support, they wouldn't exist." How are these two statements compatible?
Jensen Huang: First, they need the will to exist and proactively seek our help.
When they are eager to exist, have their own business plans, professional skills, and passion—obviously they must have some capabilities themselves. But ultimately, they need some investment to get established, and we provide that support. The sooner their flywheel starts, the better.
Your question is, "Do we want to be financial investors?" The answer is no. Financing is someone else's domain; we prefer to cooperate with everyone in the financing business rather than be financiers ourselves. Our goal is to focus on what we are good at, keep our business model as simple as possible, while supporting our ecosystem.
For example, when OpenAI needed investment as high as $30 billion, we stepped in to help them. The world needs them to exist. The world desires their existence, and I want them to exist. They have strong growth momentum now. We will support them and help them scale. We will make such investments because they need us. But we are not trying to do "as much as possible"; it's "as little as possible."
Q: This question might be obvious, but we've been in a GPU shortage for years, and now with model advancements, the supply-demand gap seems larger.
Jensen Huang: Correct, GPUs are still in short supply.
Q: Yes. NVIDIA allocates scarce resources in a unique way, not simply to the highest bidder, but more considering "we want these new cloud companies to exist," thus allocating some resources to CoreWeave, Crusoe, Lambda, etc. Why does NVIDIA adopt this approach? Do you agree with this market description?
Jensen Huang: No, no, your premise is wrong. We are very careful about these things.
First, if you haven't placed an order, no amount of discussion is useful. Before we receive an order, we really can't do anything. So step one: We work with everyone to do demand forecasting well because these things take a long time to produce, and data center construction takes a long time. We coordinate supply and demand through forecasting. That's step one.
Second, we do demand forecasting with as many people as possible, but ultimately, orders must actually be placed. Maybe for some reason, you didn't place an order; what can we do then? After a certain point, it's "first come, first served." However, if your data center isn't ready, or certain components aren't ready to get the data center operational, we might prioritize serving other customers. This is just to maximize our own factory capacity utilization; we might make some adjustments like that.
Other than that, priority is "first come, first served." You need to place an order. If you don't place an order, there's really nothing to be done. Of course, this can spin into stories, like the reported dinner at which Larry Ellison and Musk supposedly begged me for GPUs. That was completely untrue. We did have dinner together; it was a very pleasant dinner. But they absolutely did not beg for GPUs. They just need to place orders. Once the order is in, we do our best to meet their demand. It's not complicated.
Q: Okay, so it sounds like there's a queue. If your data center is ready and the order is placed at a certain time, you get delivery in order. But it still sounds like it's not that the highest bidder gets priority. Why adopt such a strategy?
Jensen Huang: We never do that.
Q: Okay.
Jensen Huang: We never have.
Q: Why not sell to the highest bidder?
Jensen Huang: Because it's bad business practice. You set a price, then let people decide whether to buy. I know other companies in the chip industry adjust prices when demand is high, but we don't. We have never done that. You can rely on us. I prefer to be an industry foundation, not requiring customers to guess repeatedly. If we give you a quote, that's the final price. If demand surges, let it surge.
Q: On the other hand, this is also why you have a good relationship with TSMC, right?
Jensen Huang: Yes, NVIDIA and TSMC's cooperation is nearing 30 years. We don't even have a legal contract between us. Some things are about fairness overall; sometimes I benefit, sometimes I lose out. But overall, we have an excellent relationship. I can trust them completely, rely on them completely.
What you can believe about NVIDIA is: Every year, you can expect the progress we bring. This year it's Vera Rubin, next year Vera Rubin Ultra, after that Feynman, the next year maybe an unnamed new product. Every year, we give you something to look forward to. Across the entire ASIC field, it's hard to find another team so stable, capable of reducing die cost by an order of magnitude every year while maintaining high yield.
**Without Deep Learning, NVIDIA Would Still Do Accelerated Computing**
Q: An interesting question. Suppose you already occupy most of TSMC's 3nm capacity and will occupy most of the 2nm node in the future. Do you think, given AI demand is so large and leading-edge capacity can't meet it, you could go back and use remaining capacity on older process nodes like 7nm, for example, manufacturing a chip based on Hopper or Ampere architecture but incorporating existing numerical optimization techniques and other improvements you mentioned? Do you think we'll see such a thing before 2030?
Jensen Huang: It's not necessary. The reason is, each architecture generation doesn't rely solely on transistor process. Engineering design, packaging, stacking, numerical optimization, and various improvements in system architecture play significant roles.
If faced with insufficient capacity, going back to an older process node to redesign the chip... the R&D investment required is unaffordable for anyone. We can afford the investment to move forward, but we can't afford the cost of going backward. Of course, if the situation is... as a thought experiment: if one day we conclude, "We can't get any more leading-edge capacity," if that day really comes, I would certainly choose to go back to using 7nm immediately.
Q: Someone raised a question: Why doesn't NVIDIA run multiple chip projects using different architectures in parallel? For example, you could develop wafer-scale chips like Cerebras, or large packages like Dojo, even a design completely without CUDA. You have enough resources and engineering talent to run these projects in parallel. So, why put all your eggs in one basket?
Jensen Huang: Oh, we could do that. But the problem is, we haven't found better ideas. We could try these things, but they aren't better. We've simulated all these schemes in our simulators, and the conclusion is clear: they are worse. So we won't do them. The project we are focused on is the one we most want to do.
Of course, if the task category undergoes a major change—I don't mean algorithms, but real task demand changes, depending on the market shape—then we might decide to add some other accelerators.
For example, recently we introduced Groq, integrating it into the CUDA ecosystem. We did this because the value of tokens today is astonishingly high, allowing different pricing for tokens. A few years ago, tokens were either free or not expensive. But now, customers are increasingly diverse; they need different performance. For example, our software engineers: if I can provide faster-response tokens, making them more efficient than now, I'm willing to pay for that.
This market has only recently emerged. I think we can now segment the market based on response time. That's why we decided to expand the Pareto frontier and create an inference segment with faster response times, even though its throughput is lower.
Before this, increasing throughput was always prioritized. But we think a scenario might emerge in the future where, even with lower factory throughput, it makes sense due to high Average Selling Price (ASP).
That's why we did it. But overall, from an architectural perspective, if I had more resources, I would invest those resources in NVIDIA's existing architecture.
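The segmentation logic can be sketched in a few lines (all numbers made up for illustration): each deployment configuration is a point on the throughput-latency Pareto frontier, and if faster tokens command a higher price, the revenue-maximizing point need not be the highest-throughput one.

```python
# (name, tokens/sec per node, assumed $ per million tokens at that latency)
configs = [
    ("max-throughput", 120_000, 0.50),  # batch-heavy, slow responses
    ("balanced",        60_000, 1.20),
    ("low-latency",     15_000, 6.00),  # interactive, premium tokens
]

for name, tps, usd_per_mtok in configs:
    revenue_per_hour = tps * 3600 / 1e6 * usd_per_mtok
    print(f"{name:>15}: ${revenue_per_hour:,.0f}/hour")
# Under these assumptions the low-latency point earns the most per hour
# (~$324) despite delivering an eighth of the tokens.
```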
Q: I find this idea of "high-premium tokens" and inference market segmentation very interesting.
Jensen Huang: Yes, further refinement of the market.
Q: Good, last question. Suppose the deep learning revolution never happened. What would NVIDIA be doing now?
Jensen Huang: Accelerated computing—what we've always done.
We determined Moore's Law was slowing down... General-purpose computing performs well in many aspects, but is not ideal for many computational tasks.
Therefore, we paired an architecture called the GPU with the CPU to accelerate the CPU's compute loads. Code kernels or algorithms can be offloaded to our GPUs to run. The result is that you can speed up an application by 100x or 200x.
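A minimal illustration of that offload model, timing the same kernel on CPU and GPU (the speedup depends entirely on the hardware and workload; a large matmul is the friendliest case):

```python
import time
import torch

n = 8192
a, b = torch.randn(n, n), torch.randn(n, n)

t0 = time.perf_counter()
_ = a @ b                          # run the kernel on the CPU
cpu_s = time.perf_counter() - t0

if torch.cuda.is_available():
    a_gpu, b_gpu = a.cuda(), b.cuda()
    _ = a_gpu @ b_gpu              # warm-up before timing the offloaded kernel
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    _ = a_gpu @ b_gpu
    torch.cuda.synchronize()
    gpu_s = time.perf_counter() - t0
    print(f"CPU {cpu_s:.2f}s vs GPU {gpu_s:.4f}s (~{cpu_s / gpu_s:.0f}x)")
```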
Where can this performance be used? Clearly in engineering and science fields, like physics, data processing, computer graphics, image generation, etc. Even without AI today, NVIDIA would still be a very large company.
There's a very fundamental reason for this: the potential for general-purpose computing to continue scaling has basically reached its end. The feasible way to go further is domain-specific accelerators.
One of the earliest fields we entered was computer graphics, but there are many others, like particle physics and fluid simulation, structured data processing, and various algorithms benefiting from CUDA technology.
Our mission has always been to bring accelerated computing to the world, push applications that general-purpose computing cannot achieve, and help break scientific boundaries. Some early applications included molecular dynamics, seismic processing for energy exploration, image processing, and all aspects of computer graphics, where general-purpose computing was too inefficient.
Had AI never happened, I would regret it deeply. But it is because of our advances in computing technology that deep learning spread throughout the world. We enabled researchers, scientists, and students to do amazing scientific research with a PC or a GeForce graphics card. That commitment has never changed, not one bit.
If you look at GTC, the opening part has nothing to do with AI. Computational lithography, quantum chemistry research, data processing—these topics are not related to AI but are still very important. I know AI is exciting, but many people are doing important work not involving AI, and these computational tasks are not limited to tensor computing.