NVIDIA Corporation ($NVDA)

Earnings Call Transcript · March 17, 2026

NasdaqGS · US · Information Technology · Semiconductors and Semiconductor Equipment · Shareholder/Analyst Calls · 98 min

Earnings Call Speaker Segments

Toshiya Hari

Executives
#1

Good morning, everyone. As we quiet the back room, I have a very important job. As a reminder, the content of this presentation may contain forward-looking statements and investors are advised to read our reports filed with the SEC for information related to risks and uncertainties facing our business. And with that, I will turn it over to Jensen and Colette.

Colette Kress

Executives
#2

All right. Good morning, everybody. I hope you enjoyed the presentation yesterday. It went a little bit longer, but I think it was an absolutely great summary for us. We're going to take this time to focus on your needs and some of the additional questions you have. We're going to start with a couple, maybe the first slide or so, and then we'll open it up for questions. And I'm going to turn this over to Jensen with that.

Jen-Hsun Huang

Executives
#3

Yes. As I was saying yesterday, there were 3 inflection points in recent AI. The first one was generative AI. The second was reasoning. And we're at the third inflection point now, and each one builds on the others. There are a lot of technical reasons why each one of them built on the others. But here we are with the third inflection point, which is agentic systems: agentic systems that are able to operate autonomously. That's why they call them agentic, because they have agency, and you can give them goals. And instead of just answering questions, they can now perform tasks, and tasks could be anything. Of course, one of the most popular applications of agentic systems is writing software. Engineers in your company, I'm sure, and engineers in my company for sure are using agentic systems all day long. And what used to be a thing for engineers is, when you come to work, they give you a laptop. Now when you come to work, they give you a laptop and tokens. And token budget is now a real thing. Every engineer is going to have a token budget. And the idea that you would hire a $300,000 engineer and they spend no tokens in doing their job, you've got to ask the question, what are they doing? And so it is very, very clear now that every engineer will have a lot of tokens that they have to consume, and those tokens are going to have to be produced. Now I just said something a second ago, if you just connected the dots: it used to be that when an engineer comes to work, a software programmer, somebody comes to work, you give them a laptop. That's a tool. Today, we give them a laptop and tokens. Those tokens have to be manufactured. And so a computer used to be just a tool; a computer of the future is manufacturing equipment. And so these computers, as you see, are no different than ASML manufacturing equipment in the future. They're producing something that is sold. It's no different than a dynamo machine a long time ago that produced electricity.
These are manufacturing systems, and the energy efficiency of it, the production efficiency of it matters everything because it drives your revenues, okay? And so the third inflection point is here. As you know, OpenClaw. Many of these things, these open source projects, when they first drop, they seem like toys. You take a step back and just analyze what OpenClaw is, first. [Audio Gap] Our Linux strategy, just as we all had to have an Internet strategy, just as: what is your mobile cloud strategy? Now the question is: what's your OpenClaw strategy, okay? And so this is a very big deal. Next, I wanted to answer the questions about what I said here a little bit more. First of all, a year ago -- a year ago, I said that we had strong visibility of our Blackwell and Rubin shipments of $500 billion through 2026. I was standing in 2025, right? And so GTC 2025 was around, was it March, April? It was October?

Colette Kress

Executives
#4

October.

Jen-Hsun Huang

Executives
#5

Okay. October, I was standing there? You sure. It was October. GTC DC or GTC -- I said it twice, though. The first time I said it was GTC here, right?

Colette Kress

Executives
#6

I think you've been saying it twice. I don't think all the way back.

Jen-Hsun Huang

Executives
#7

I see. Yes. Okay. Anyways. Anyhow, in 2025 -- 2025, one of those months, I said that we have strong visibility of Blackwell plus Rubin demand -- purchase orders and demand, okay, very firm demand of $500 billion. And there were a lot of questions from many of you: so where are we now? You wanted an update on where we are now. And so I thought I'd give you guys an update on where we're standing right now. And what month are we in, just for the record? March. And so here we are in March. Here we are in March, and the end of 2027, as you know, is many more months away. I just want to first let you guys know that. However, because we're building infrastructure and factories, and the lead times for everyone are long, they want to make sure they give us purchase orders and firm demand as early as they can to secure their supply, okay? And so we have strong confidence and visibility, visibility and strong confidence, of $1 trillion plus. It's not a floating point number, you guys, okay? It is also not 94 digits of accuracy, okay? And we're not counting cents. You can keep your cents. However, we have strong visibility of $1 trillion plus of Blackwell plus Rubin. And the reason why it's only Blackwell plus Rubin and not all of the other things that we sell is because I referenced it from last year, when I was only talking about Blackwell and Rubin. Does it make sense? So last year, we didn't have Groq. Last year, we weren't selling stand-alone CPUs. Last year, we didn't have many of the things that we have to sell now. And so it wouldn't have made sense for me to include those today, because we didn't have those things then. Does that make sense? Somebody nod, then I can continue, okay? And so therefore, a couple of things. It's only Blackwell and Rubin. It's not Feynman. It's not Rubin Ultra. It's not any of those things. It's not Vera standalone, it's not Groq.
So Blackwell plus Rubin, we have high confidence, strong visibility, demand, forecast, purchase orders of $1 trillion plus. We closed businesses that we ship oftentimes, oftentimes, and we expect to close and ship more business between now and the end of 2027. We expect to close to close book and ship more business on top of this between now and 2027. And the reason for that is because we expect to be coming to work between now and the end of 2027. Now unlike other businesses, because we build and complete systems of this quality, we can actually win book, ship new business in the same quarter. Of course, you can't do that if you have to build an ASIC or obviously, if you don't see that -- if you don't see it now, you're not shipping it by the end of 2027, but that's not true for us. We built inventory. We have a pipeline of supply that -- and we have to take care of customers who come out of the blue because they're desperate for more compute. Does that make sense? And so when they're desperate for more compute and all of a sudden in last day, they say, goodness gracious, I could use more. I would like to be able to say and we are always in a position to say, we'd be more than happy to help you. We're also working on new customers, new markets, new regions that we haven't put in here yet because we still have -- well, about 21 months to go, okay? And so I want you guys to understand what that $1 trillion. It's, by definition, going to keep growing. By definition, because what I compare it against, it will keep growing, and it will be larger than that. A couple of things that I wanted to say also that last year was a really good year because 2025 was our year of inference, and I think we helped everybody understand that the price of the computer and the cost of the token, the price of the computer and the cost of the token are only marginally related. The price of the computer and the cost of the token. Remember, people are buying these computers to produce tokens. 
The effectiveness of the production of those tokens matter greatly. They're not reselling the computer. If you bought a computer and it's expensive, if you resold it and that's it, then it's expensive. But you bought a computer and it's expensive because the technology is incredible, but it produces tokens at such incredible rates, you have -- simultaneously have purchased the most expensive computer and produce the lowest-cost tokens. Does that make sense? This is what we do every day. This is our job. It is the reason why we deliver the value that we deliver, the value discrepancy that we deliver here, the 2 numbers that I just described is how we're able to secure our gross margins. We have to deliver and we consistently deliver so much more value, which is tokens per second, which is tokens per second per watt. We deliver so much more value every single generation that customers would prefer to buy our next-generation product at a higher price than our current generation product at a lower price. They prefer instantaneously to convert the moment that Vera Rubin comes, it is smarter to install Vera Rubins than to continue to buy Grace Blackwells. Are you guys following me? Somebody nod, okay? Because the value is better even though the price is higher. So I'm comparing these 2 systems because these are the 2 de facto systems in the world. And until you can beat these 2 systems, there's no point buying something else. And these 2 systems are incredibly hard to beat because Moore's Law doesn't give you 35x. So Moore's Law alone won't do it. Building a faster chip won't do it. You're going to have to build a faster lots of chips. And so last year was our 2025 year in inference, and I think we demonstrated our inference leadership, training, the post training to now inference. And then some of the other things that we did last year that was really great is we expanded the reach. We expanded the number of AIs that now support our platform. 
Last year, 2025, we added Anthropic to our platform, which is net new. We added Meta SL, which is net new. We're still working with Meta on all of the other stuff. Meta SL is a net new entity and they have net new computing requirements. And we can all acknowledge that last year, open source software, open source models really took off, to the point where API inference service providers now see that open models approximately represent the second most popular AI model. The first one, of course, is OpenAI in total number of tokens generated. In aggregate, open models represent number two. As you know, NVIDIA is the best platform for open models in the world. We are the standard for open models everywhere. And so number one, OpenAI; number two, all the open models; number three, Anthropic; number four, xAI. Just take your list, keep working. I think NVIDIA's coverage of models last year increased substantially, which explains our accelerating growth at a very large number. We are already a very large company, as you know, and our rate of growth is actually accelerating. And so anyways, that's, I think, about it. One last point. We love our hyperscaler partners, and we work very, very closely with them. But it's important to understand that in our relationship with hyperscalers, we're not just selling to them. We attract customers for them. Having CUDA in their cloud brings all of the CUDA developers, all the AI natives, all the large companies that we work with. Whenever we accelerate those large companies, those [indiscernible] small companies, we bring them, we terminate -- we have them hosted in the world's CSPs. We are one of the best sales forces of the world's CSPs. It is the reason why, if you go down to the show floor, they have all of the largest booths. AWS has the largest booth here. Google Cloud has the largest booth here. Azure has the largest booth here. Oracle, giant booth here.
CoreWeave, big booth here. Does it make sense? Because we bring customers to them. Why are they here? To talk to, to sell to my developers. And all of our developers only know how to program one thing. They only know how to program CUDA, and they only use CUDA-X libraries. And when we help those developers integrate NVIDIA, they land on one of our CSP partners. We are one of the CSPs' best sales forces, all right. However, we are also seeing tremendous customer diversity outside of the CSPs. Regional clouds, industrial, enterprise on-prem: Dell and Lenovo and HP, they're all growing so fast, and all the ODMs are growing so fast. A lot of that business goes towards the right-hand side of that chart, the 40%. Most people see our business in the left 60%. That right 40%, without NVIDIA's full stack, without the fact that we can build [indiscernible] entire AI factory, and the fact that all of the world's open platforms run on top of NVIDIA, you have no hope of addressing. So the net of this chart is this: a big part of that 60% is NVIDIA developers landing in the cloud, and 100% of the 40% is impossible without full stack, without end-to-end. Was I successful in communicating that? It's important to understand our business. We aggregate that whole thing into what is called accelerated computing, and it's probably a disservice to you. So next year, we're going to separate it out a little differently. Well, in the future, we're going to separate it out a little differently, and it's going to look probably like this chart. You'll see something like hyperscalers or something like that at 60% of it. And even when you see that, remember, a lot of those customers we brought to the cloud. And then on the right-hand side, that 40% is completely impossible if you just build a chip, because they don't buy chips, they buy platforms. Three messages, all in 1 slide, which probably made your brain blow up. And therefore, I did it again. Was that helpful?
I should -- you know what I should have done, I should have made 3 panels or 3 slides. It would have been a 7-hour keynote, but it would have been worth it. Okay. That's it. Thank you. Questions.

Colette Kress

Executives
#8

We're opening up for questions now.

Benjamin Reitzes

Analysts
#9

It's Ben Reitzes, Melius Research. Thanks for having us here for this event. It's amazing access that you guys provide. Congrats to you and the team for that. This is great. Jensen last night, when we took a picture, by the way, you all can still like that picture. I need to beat last year's record.

Jen-Hsun Huang

Executives
#10

What picture?

Benjamin Reitzes

Analysts
#11

We took a quick picture and I posted it, and I'm trying to beat last year's likes.

Jen-Hsun Huang

Executives
#12

Okay. All right. All right. So was that in some vulnerable position or anything?

Benjamin Reitzes

Analysts
#13

Let's put it this way, the camera added 10 pounds to me, but not to you. I don't know how that works. You look great. So I promised I'd ask you an inference question, and this is related. This is great, like I don't think a lot of people here get this. I think the main pushback we get is the juice worth the squeeze and will the hyperscalers have upside to their revenues for API and cloud that justify all the spend? And what is Jensen seeing? Because I have estimates for the hyperscalers and I've said there's upside to the revenues. But for now, the CapEx is 20% above their cloud API revenue. And I'm wondering what you're seeing. You've said in the past that there's this massive upside to these cash flows and from your customers, particularly hyperscalers and those that are serving Anthropic and OpenAI. So when do we adjust those higher? I know this is a tough question for you because you got to guide for 3 or 4 other -- 5 other companies. But if we see that upside, I think your stock will behave a lot better because then we'll realize this build can keep going. So when is this inflect -- I mean we're seeing the inflection, but when is it -- what is the upside to their revenues? And how do we feel better about it?

Jen-Hsun Huang

Executives
#14

Yes. So I wish those companies were public. And the reason for that is because then you'd see what I see. No company in history, as a start-up company, a nonpublic company, has ever increased revenues by $1 billion or $2 billion a week. That's what they're experiencing right now. Now remember, that's just a week. The entire IT software industry is, call it, $2 trillion. That $2 trillion industry, I don't believe it's going to be disrupted. I think it's going to be transformed. I believe that every one of that $2 trillion IT industry is going to integrate a combination of OpenAI, Anthropic and open models, connected with an open source software called OpenClaw that we turned into an enterprise-ready version called NemoClaw, and you have instantly an agent. 1.5 million people downloaded OpenClaw and built themselves an agent. It's one line of code. And then you tell the agent to finish building itself. So you say, you don't know this thing, go learn it, and it goes off and learns it. And so in the future, those agents will be integrated into the IT industry. This IT industry is $2 trillion of software licenses today. It's probably going to be -- let me just pick a random number -- $8 trillion that also resells an enormous amount of tokens. 100% of the world's IT industry will become resellers of OpenAI and Anthropic. Are you guys following me? No.

Benjamin Reitzes

Analysts
#15

Take your estimates up for OpenAI and Anthropic.

Jen-Hsun Huang

Executives
#16

I believe that Anthropic and OpenAI -- and of course, all of the IT companies will also modify and customize their own software, their own models with open models, and that's what Nemotron is for and that's what NeMo is for. We've created all the tools, and that's why we're working with all of them. They're all going to create agents that integrate these 3 components. And I believe they're going to grow incredibly. The time is going to come soon. And the reason for that is you could see it in Anthropic's numbers, you could see it in OpenAI's numbers. They're growing an entire IT company in a month. And the revenues of these AI companies: their AI will be used by enterprises directly, but it's also going to be resold through IT companies, integrated into IT companies. Does that make sense?

Benjamin Reitzes

Analysts
#17

Yes.

Jen-Hsun Huang

Executives
#18

Because just think of it: AI is just software. Their software is going to be offered directly to enterprises, but it's also going to be integrated and become domain-specific and specialized, governed, secured, easily provisioned, connected to their systems of record, so on and so forth. There's going to be a whole -- and that agentic system will be rented to customers, but they still would have to consume tokens through factories. And so if it comes down through OpenAI, that's terrific. If it comes down through Anthropic, that's terrific. If it comes down through open models, that's terrific. But they all have to have tokens generated. So the net-net is: IT companies of the past licensed software; IT companies of the future will rent tokens -- will generate tokens. Are you guys following me? Their business models will change. The companies will become bigger, their gross margins will change. Their gross margin profile will change because they now have tokens in the -- they have COGS in their business model now, but they offer much, much more value. And so this is exciting for them, super exciting for them.

Christopher Muse

Analysts
#19

C.J. Muse from Cantor Fitzgerald. Thank you for hosting this event. Really appreciate it. I wanted to, I guess, maybe follow up on Ben's question and think about the evolution of this chart of 60-40. You talked about NemoClaw. And then you announced yesterday the Vera Rubin-DSX AI factory reference design, essentially providing the blueprint for your non-hyperscale customers to compete with the hyperscalers. So I'm curious, as you put it all together and you see a massive spike in token generation, how you're expecting this chart to evolve over time, and how we should be thinking about the different players inside there as to their relative growth factors.

Jen-Hsun Huang

Executives
#20

I think that this chart grows on both sides at similar rates, approximately, until the physical AI inflection happens in a few years. And so let's say the physical AI inflection happens; then the industrial side has to be done on-prem and it has to be done at the edge. It has to be done on location. It has to be done in the factory. Then all of a sudden, that 40% is likely to grow. And I think, ultimately, that 40% becomes larger. And the reason for that is because the world's industries that are related to physical AI are much, much larger than the industries related to digital AI. Something like $70 trillion of the world's industries, 50, 60, 70, requires physical AI because the world is happening not in our laptop; the world happens out where the world is. And so there's a lot of ADM-related businesses that simply can't be taken care of without physical AI. And so I believe and I hope that that 40% actually becomes 70%, but both of them are going to be incredibly large because the world is going to produce tokens every single day, continuously. It will not stop. Right now, as we speak, all of our laptops, well, hopefully most of your laptops, are kind of sitting idle, but in the future, the computer is going to be running 24/7 creating tokens because your agents are off doing work. Somebody -- I was reading one of the Reddit posts. Somebody's claw consumed 50 million tokens in a day. Now that sounds like a lot, but that's only $50, and if you had an agent doing productive work for $50, that's not bad. And so you could have somebody who makes a few thousand dollars a day have a whole bunch of agents spending $50 a day, becoming a lot more productive. This is going to be the norm. I have at NVIDIA right now as we speak. And I'm hoping the person that I'm paying a couple of thousand dollars a day to is spending more than $50 a day of tokens. Are you nuts? I want you to be managing an entire fleet of agents doing your work.
And so I'm really hoping that somebody who makes $2,000 a day is spending $1,000 a day of tokens. And what I just said makes sense, and it's going to happen, and it's already happening in software companies all over the world.
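The token-spend arithmetic in this answer can be checked with a short sketch. It uses only the figures as quoted (50 million tokens for about $50, and an engineer paid $2,000 a day spending $1,000 a day on tokens). The function names and the single flat price per million tokens are illustrative assumptions, not anything described on the call.

```python
# Sketch of the token-budget arithmetic quoted in the answer above.
# Assumption: one flat blended price per million tokens (real pricing
# differs by model and by input vs. output tokens).

def implied_price_per_million(tokens: int, dollars: float) -> float:
    """Price per one million tokens implied by a total spend."""
    return dollars / (tokens / 1_000_000)


def tokens_for_budget(budget_dollars: float, price_per_million: float) -> float:
    """Number of tokens a daily budget buys at a flat price."""
    return budget_dollars / price_per_million * 1_000_000


# "Somebody's claw consumed 50 million tokens in a day ... that's only $50"
price = implied_price_per_million(50_000_000, 50.0)

# "somebody who makes $2,000 a day is spending $1,000 a day of tokens"
daily_tokens = tokens_for_budget(1_000.0, price)
spend_to_pay_ratio = 1_000.0 / 2_000.0

print(f"implied price: ${price:.2f} per million tokens")
print(f"$1,000/day buys about {daily_tokens:,.0f} tokens")
print(f"token spend is {spend_to_pay_ratio:.0%} of daily pay")
```

Under that flat-price assumption, the quoted $1,000 daily budget corresponds to roughly twenty times the 50-million-token day mentioned in the Reddit anecdote.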

Stacy Rasgon

Analysts
#21

Stacy Rasgon from Bernstein. I have a quick clarification [indiscernible] Colette and then Jensen, I have a question for you. Colette, just to clarify, I know you've talked about Rubin ramping in the second half. Groq sounds like it's launching in Q3. So am I correct in thinking that Rubin should launch with Groq, because I don't think Groq goes stand-alone? And then Jensen, I want to ask a longer-term question of you. I really like the chart you put up the other day. It almost showed sort of the extension of the spectrum of inference, which drove -- I mean, drove value from Groq. You used to talk about how GPUs were fully the way to go. We now see architectures like Groq are needed to take advantage as that spectrum of inference widens and low latency becomes more important. I guess I wanted to get how you see that spectrum evolving from here. Does your platform now have all the pieces that you need as we go forward over the next several years, and hopefully longer than that? What are the new types of workloads with inference that you see coming? And do you have all the pieces you need to take advantage of that? Or is there something else that we still need to be keeping our eyes on as that grows?

Colette Kress

Executives
#22

So first, Stacy, thanks for the question regarding Groq and the LPX. We did communicate that that would also be starting in the second half of this year, and we'll see where that looks once we get closer to the second half of the year. But it is in this current year.

Stacy Rasgon

Analysts
#23

I was going to say, Groq shipping in Q3, I think, yesterday.

Colette Kress

Executives
#24

Correct.

Jen-Hsun Huang

Executives
#25

That's what we're expecting. However, Vera Rubin is going to ship before Groq.

Stacy Rasgon

Analysts
#26

It will ship before?

Jen-Hsun Huang

Executives
#27

Yes, yes. And the reason for that is because we're already in production of Vera Rubin. Systems are already going through lines and -- and so at the moment, that's the condition, right? And so -- and it's okay. It's just fine. Vera Rubin is extremely hard to beat, even for Groq. Even adding Groq to Vera Rubin, it is very tough to beat Vera Rubin. And I'm going to answer your question in a second. It turns out in computing -- it's not completely true, but it's close to true -- that you have 2 types of architectures: one that's extremely high throughput, one that's extremely low latency. And in fact, a CPU is a low latency computer, and notice the size of the cache on board, the SRAM. Groq is an extreme version of that, a hyperextreme version of that, where the SRAM occupies basically nearly the whole chip. And the scheduling is done completely statically, meaning the compiler figures out where the data and where the compute is and then makes them meet just in time. And the whole Groq system is like one giant synchronous machine. As a result, it is deterministic. It's extremely low latency. It is not easy to program. It is not flexible. It's not general purpose, but it is what it is. And so what we've done is we've taken Vera Rubin, which occupies -- yesterday, I described about 3/4 of that space -- Vera Rubin is the right answer. We don't know how to make that better. If we knew how to make that better, we would have made that better. NVLink 72 and the Vera Rubin Ultra NVLink 144 and Feynman NVLink 1152 are going to keep expanding the aperture of that left-hand side where high throughput matters tremendously. We're going to add Groq, fuse it with Vera Rubin, fuse it with our GPUs and use Groq to process the very last stage of autoregressive models, which is used for language models. That last stage is extremely bandwidth-intensive. And if we ganged up a whole bunch of SRAMs, like thousands of Groq chips, okay, it's 8:1.
So for that last 25% of the power and that last 25% of the use case -- because your data center has all kinds of different use cases. It's not just one, right? We're all using ChatGPT. We're all using it in different ways. We all have different tiers of pricing. And so we're in different bands in my graph. We're in different bands in that graph. Are you guys following me, Stacy? So I showed the 0 tier, the free tier, good, better, best, and the extreme version. And so for free, good, better, Vera Rubin is untouchable. We can't think of anything close by. And then for best and extreme, adding Groq to that, you could increase your throughput on the best, and you could extend the extreme version even further. Now the extreme version has introduced a new tier, but because of the throughput curve, your volume is so low. You can't afford to make that demand too high. So you have to set the price quite high. Does that make sense? However, there's a new class of customers who are very, very rich: software engineers. They already cost so much money that if I added to them $100 a day of inference cost, token cost, I'd be more than happy to do it. If I added even $1,000 at crunch time, more than happy to do it. Does that make sense? And so I'm simply describing what's happening to a market that is, if you will, maturing. In the beginning of the market, the technology wasn't mature and people didn't know exactly how to use it; 100% of the early inference customers were free tier. And as the technology started to reach o1 and o3, all of a sudden, the paid tier skyrocketed because people are now able to use it for something useful. Then all of a sudden, agents came. Now, for example, Claude Code, right, Codex, those tokens are a lot more expensive than free tier, and they're a lot more expensive than $20 a month. And so that segment, we just added 2 more segments. Did you see that?
And so this is no different than the iPhone. In the beginning, there was only 1 version. And now there are a whole lot of versions. No different than the car industry, no different than any industry. As the market expands, the segments expand. I showed a factory that is able to produce tokens of different segments and different tiers, from very, very smart and incredibly fast to high throughput free tier. And I described an AI factory architecture that allows you to address the whole thing to maximize ultimately the total revenues of the factory, and we let you decide how you want to mix and match. My estimate is it's probably about 25% today for, call it, a handful of companies. You need to generate a lot of tokens to make it worthwhile. And then there's a whole bunch of -- they call them inference service providers, ISPs, API service providers. I think they could also benefit from this, okay, because they would like to have a different segmentation of token generation. And so I call it a group of 10 customers, and 25% of that 10 customers represents a big part of that pie. We can increase our total revenues with Groq by 2x on 25%, 2x by 25%. Does that make sense? So say, 25%.
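One way to read the closing "2x on 25%" remark is as a blended multiple: a quarter of the base doubles and the rest is unchanged. The sketch below works through that reading only. The function name is hypothetical, and the call does not specify whether the 25% refers to power, use cases, or revenue, so the result is illustrative rather than guidance.

```python
# Sketch of one interpretation of "2x on 25%": a share of the base is
# multiplied by some factor and the remainder stays at 1x.
# Assumption: the 25% share and the 2x multiple apply to the same base.

def blended_multiple(share: float, multiple: float) -> float:
    """Overall multiple when `share` of the base grows by `multiple`."""
    if not 0.0 <= share <= 1.0:
        raise ValueError("share must be between 0 and 1")
    return (1.0 - share) * 1.0 + share * multiple


uplift = blended_multiple(share=0.25, multiple=2.0)
print(f"blended multiple: {uplift:.2f}x")
```

Under this reading, doubling a 25% slice lifts the total by a quarter; other readings of the remark would give different figures.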

Stacy Rasgon

Analysts
#28

And I mean, as you continue with new versions of Groq, with new generations, what does that do? Are you pushing that out even further? Or are you lowering the cost and increasing the demand? I'm just trying to get some feeling.

Jen-Hsun Huang

Executives
#29

We're always doing 1 of 2 things. We're pushing the throughput at every tier up, and we're always pushing the smartness of the AI out. And so you see the Pareto. I'm always pushing it up. I actually did the transition showing you guys from Hopper to Blackwell to Vera Rubin. So I'm always pushing it up, and I'm always pushing it out. Whenever I push up, the production volume of your factory goes up at every price point. Iso price point, the volume goes up, okay? When I push it out, you can introduce new tiers of AI, new tiers of tokens. And therefore, you get a new price point. Today, the price point is, call it, $6 per million tokens. That's kind of where the world is. We'd really like to be -- I know they would all love to be at $50 per million tokens, but with super large models, super fast. Could you imagine a 10 trillion parameter model running at 500 tokens per second? Our engineers will pay big money for that, and I would let my engineers pay big money for that. And so that world wants to come, and then the next year it will come again, because the models will get bigger, they'll think more, they'll use more tools and things like that. It's just like back in the old days. I don't know how many of you knew NVIDIA in the beginning, but we had 1 product, RIVA 128. RIVA 128, $299. That was it. One product, those good old days. And then today, we have 5090, 5080, 2 different SKUs, 5070, 3 different SKUs, 50 -- are you guys following me? And all of these SKUs exist because the market got larger and it started to segment, and people wanted different things. The market is doing exactly the same thing with tokens: it's getting larger and larger, with different segments wanting different things. And so we need to help the customers. We need to help our model makers produce, manufacture different segments of tokens. I know they look like numbers, but they're different AIs. Makes sense?
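To put the quoted price points on a common footing, the sketch below converts a generation speed and a price per million tokens into a cost per hour for one continuously running stream. Pairing the 500 tokens-per-second figure with the $6 and $50 price points is an assumption made here for illustration; the call mentions the numbers but does not combine them this way, and the function name is hypothetical.

```python
# Sketch: hourly cost of one token stream at a given speed and price.
# Assumptions: the stream runs continuously for the full hour and every
# generated token is billed at one flat price.

SECONDS_PER_HOUR = 3_600


def hourly_stream_cost(tokens_per_second: int, price_per_million: float) -> float:
    """Dollars per hour for a single stream generating at a steady rate."""
    tokens_per_hour = tokens_per_second * SECONDS_PER_HOUR
    return tokens_per_hour * price_per_million / 1_000_000


todays_tier = hourly_stream_cost(500, 6.0)     # "$6 per million tokens"
premium_tier = hourly_stream_cost(500, 50.0)   # "$50 per million tokens"

print(f"at $6 per million:  ${todays_tier:.2f} per hour")
print(f"at $50 per million: ${premium_tier:.2f} per hour")
```

On these assumptions, a single premium stream costs tens of dollars per hour, which is the scale the answer compares against an engineer's daily cost.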

Stacy Rasgon

Analysts
#30

Got it. It does.

Jen-Hsun Huang

Executives
#31

Yes, so incredible. So we're going to increase the throughput, and we're going to increase their pricing simultaneously. That's the benefit of Vera Rubin. And we did that every single time. We did that with Blackwell. We did that with Vera Rubin. We're going to do that with Vera Rubin with Grok. We can do that with Vera Rubin with Ultra. We're just going to keep pushing that envelope, and ultimately, the simplistic way is that Pareto chart, because the factory is a lot of different workloads and different customers. That Pareto chart, we want to push the Pareto frontier out -- up and out, constantly up and out, constantly up and out. And the computer science necessary to do that is insane, the hardest problem of all.

Vivek Arya

Analysts
#32

Vivek Arya from Bank of America Securities. Thanks, Jensen, thanks, Colette, for hosting us and for a very informative event. I wanted to ask actually 2 related questions. One is on this $1 trillion, Jensen, that you showed. You have other products also that you spoke about yesterday, right, the Vera CPU, right, other CPUs, you have Grok. You have a storage solution, right, CPX prior to assume. So how much of that is incremental, right? Is it a small number? Is it a medium like -- how much more is that addressable market that is not captured in this $1 trillion, assuming it is incremental to this? And then I wanted to double-click on Grok again, Jensen. I think you mentioned that it will take up 25% of the inference. That's a pretty big statement. And is it cannibalizing something? Is it -- what is kind of the value capture from Grok over time? And a lot of people ask us, is it cannibalistic of high-bandwidth memory demand? I don't think it is, but I would love to hear your view on how to kind of put Grok in the value capture, right, part of the spectrum.

Jen-Hsun Huang

Executives
#33

Okay. We're the only company in the world today that can optimize the architecture of an AI factory across 3 memories: of course, HBM memory, but we're the first to use LPDDR5, which is extremely high bandwidth and very low power. And that changes the equation for CPUs. And the third is SRAM. We can now utilize all 3 memory types to create the perfect architecture and we are, okay? That's number one. We used to offer just NVL72 Grace Blackwell. That was our rack. We had 1 rack. We now have 5 racks, as you know. And the reason why is because -- can you go to the next slide? Thank you. That was previous. Yes. So let's go. No. Back. There you go. Is that the one? Yes. You see that. This is what NVL72 did. It ran that. Are you guys following me? It ran all these large language models. This is what it was designed to do. And all of our inference stack ran that. But remember what an agentic system is: it runs this. This is what Claude Code now does. This is what Codex now does. It runs all of this. It has memory. That goes into the KV cache. It has -- and that's on the STX system. This memory has grown so much that it needs to be accelerated. It's just too much -- all of our working memories, every time we use it, the more we use it, the harder the problem we solve. This is structured and unstructured data. This is where I started the keynote with [indiscernible]. The stuff that nobody ever talks about, which is incredibly valuable in the future because this agent is way faster than a human and it's going to bang on that way harder and faster. Does that make sense? And then tool use, web browser. And so a web browser runs on a CPU. And so you need a CPU to give the agent access to tools. And then it spawns off subagents, and who knows what these could be. One of the subagents could be cuOpt, which is GPU accelerated. Another subagent could be Omniverse, GPU accelerated. And so we need those kinds of GPUs in the data center.
So the way to think about what is Vera Rubin: Vera Rubin as a system expanded tremendously because we went from processing that, which is -- it's still 90% of the workload -- to processing all of this. Are you guys following me? This is AI. This is where ChatGPT started, but this is where it is now. Can someone nod? You guys get it? Okay. Give me a thumbs up. All right. Thank you. And so because I'll do it again. This is like -- sometimes our keynotes run long because I look in the audience, and there's some person sitting in front of me that's like, they look lost. And so I just -- I'm going to have to do this again. I don't leave nobody behind. And so this is an agent. So what just happened in our data center? That data center doesn't want to be a cobbled-up Frankenstein, and wants to use -- it wants to use elegant power delivery and cooling systems. And so we took all of the computers that are here, and we put them into the MGX rack, and we designed the perfect processor for each 1 of these things and just racked them up. Does it make sense? And so -- and if you're going to -- if you're going to put storage, which is right up there in here, if you're going to put that in the east, west, which is in the same aisle as the compute, you better make it so it's not a Frankenstein outfit. You can't have liquid cooled in NVLink 72 racks and then air cooled. You can't have 300 kilowatts here and then use 50 kilowatts here. It makes no sense. And so we took the whole thing and we harmonized all of it in 1 single rack architecture. And so if you want to build a cluster to run that, you just connect them all up. It's incredible. Same power delivery, same cooling system, all 100% liquid cooled, all completely optimized for the workload, all fully accelerated.
And so now your question. In order to run this agent and be able to offer all the things that we were just talking to Stacy about, you would increase your CapEx, you would increase your compute spend, the GPU compute spend, by 25%. And so you add Grok to 25% of the workload. And you buy 8x as many chips, which is approximately the same price as the NVLink 72 racks, okay? So 25% is multiplied by 2, and that's the same as 25%, okay? And so your 25 -- your compute spend goes up by 25%. That's the first one. And that's not in the $1 trillion. And so if 100% of that $1 trillion now adds Grok, then it will be $1.25 trillion, okay? And then we also have storage, which is a lot, because storage, as you know, there's just a lot of storage in the world. It is the second largest compute spend. And then the third will be CPUs for tool use. But I'm not expecting CPUs to be that much, because CPUs just don't add up to much, okay? And so you could say CPU is another 5%, okay? So if you were to say, all in, the difference between Grace Blackwell racks, which as you saw was however big it was, and the Vera Rubin racks, okay, if it added another 50% opportunity, I think that's probably not far off. Did I just kind of reason through it for you? Is that -- everybody got that, okay? And so that's the fundamental difference between the Grace Blackwell go-to-market and the Vera Rubin go-to-market. Because we were solving, in the Grace Blackwell world, inference. We wanted to be inference king, who doesn't, right? And so that's what we were solving. Vera Rubin, we're solving for this. That's why I said Open Claw is completely transformational. Finally, we have 1 piece of software that runs across this whole thing. One open source software, it is the operating system of this chart. It's incredible. Now every company in the world can go build this.
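The opportunity arithmetic in the answer above can be sketched in a few lines. This is a rough illustration only: the $1 trillion baseline, the 25% Grok uplift, the 5% CPU uplift and the roughly 50% all-in figure come from the remarks, while the storage share is not stated on the call and is assumed here to be the remainder (20%). The function name and constants are hypothetical.

```python
# Rough sketch of the additive opportunity math described in the answer above.
# STORAGE_UPLIFT is an ASSUMPTION (not stated on the call); it is chosen as the
# remainder that makes the total match the "another 50%" estimate.

BASE_OPPORTUNITY_USD = 1.0e12  # the "$1 trillion" baseline from the remarks

GROK_UPLIFT = 0.25     # stated: compute spend goes up by 25% when Grok is added
CPU_UPLIFT = 0.05      # stated: "CPU is another 5%"
STORAGE_UPLIFT = 0.20  # ASSUMPTION: remainder of the ~50% all-in estimate


def expanded_opportunity(base: float, uplifts: dict[str, float]) -> float:
    """Return the base plus every uplift, each applied to the base (not compounded)."""
    return base * (1.0 + sum(uplifts.values()))


with_grok = expanded_opportunity(BASE_OPPORTUNITY_USD, {"grok": GROK_UPLIFT})
all_in = expanded_opportunity(
    BASE_OPPORTUNITY_USD,
    {"grok": GROK_UPLIFT, "storage": STORAGE_UPLIFT, "cpu": CPU_UPLIFT},
)

print(f"Base plus Grok: ${with_grok / 1e12:.2f} trillion")
print(f"All in:         ${all_in / 1e12:.2f} trillion")
```

The uplifts are treated as additive shares of the same baseline, which is how the remarks describe them; compounding them would give a different total.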

Joseph Moore

Analysts
#34

Joe Moore from Morgan Stanley. You're generating $1 billion every couple of days, which seems pretty good. Can you talk about the uses of that cash to build strategic advantage in your business? You're making investments in ecosystem partners. You've got purchase commitments on components, you're also returning cash to shareholders. How do you balance those priorities?

Jen-Hsun Huang

Executives
#35

Well, the priorities have to go, number one, it has to fund our growth. And our supply chain, we work very closely with, and we're in a great place with our supply chain today for a good reason. And it's because we work very long term with them. We help them plan their business. We award business to them to support their growth. We even prepay, and sometimes we'll even fund their capacity growth with them. But we're preparing for $1 trillion over the next -- I'll just have to be very clear, for $1 trillion plus through December 25th, I think we probably shut it down at 04:00 p.m. And so through that time, Pacific Standard Time. There's a lot of caveats in there, just to make sure. But anyways, the plus, and so that's number one. Number two, we invest in our ecosystem because, as you know, the CUDA developers and the growth of these AI natives at this stage is really important. And then after that, we're still going to generate quite an amount of free cash flow. And so, well, I'll let Colette answer it. I mean we have a good plan. So go ahead.

Colette Kress

Executives
#36

Yes. So with the strong growth that we have at the $1 trillion going forward, that gives us, of course, a very good position in terms of free cash flows. He talked about some of the uses upfront, in terms of making sure that our suppliers and everything that we need to build is in order, and that may take some prepaids. The second thing is our investments. We are still working through the commitments that we made over the last year that we need to complete in the first half of this year. But once we move forward and complete those, we do have an opportunity for stock repurchases and focusing on returning capital to our shareholders. It is still a very important part of the work that we are going to do. We had a good year last year, and I think we're going to have another great year in terms of what we can do in terms of returning capital to them. Do you want to give certainty on that?

Jen-Hsun Huang

Executives
#37

It's up to you.

Colette Kress

Executives
#38

Okay. Where we stand right now, it is probably not taking into account the plus sign -- not taking into account the plus sign. We will probably be at 50% stock repurchases and dividends together as a percentage of our free cash flow. So that's where we're starting out. And as you can see, the plus sign is real. And then that does give us an additional opportunity to do even more. The timing of it, again, remember, depends on working through what we have to do here in the first half of the year with some of our existing commitments, but stay tuned.

Timothy Arcuri

Analysts
#39

It's Tim Arcuri at UBS. So let me preface this by saying that this is not what I think, but this is what I hear from a lot of folks out there. So there's some concern that you're capturing too much of the value of the ecosystem and you can -- and that you can't sustain these margins over time. So how do you respond to those concerns? I know you see stuff online about having to invest in the ecosystem and people sort of spin that in a negative way. So can you just talk about how you can sustain your margins?

Jen-Hsun Huang

Executives
#40

First of all, almost everything I told you guys yesterday is a new perspective. It is not illogical that everybody has to understand tokenomics. It is not illogical that the world needs to learn what a computer has become. If we deliver, if we continue to deliver X-factors -- X-factors of tokens per second per watt every year, if we continue to deliver X-factors of ASP increase for them because we introduced new token segments, customers will be more than delighted to continue to do work with us. And it is -- it's also true, and I've said it before, and the math is absolutely clear. Every CEO of every cloud service provider, I would challenge them all to go and create that chart for themselves. And I'll help them. And you pick your favorite other configuration. You pick your favorite other configuration, third-party chips, build your own chips, and you put it into that model faithfully, and then you can decide: would you like to have higher revenues or lower? Would you like to have higher ASPs or lower? Would you like higher margins or lower? Because that's all it means. Look, TSMC's wafers are the highest priced in the world, but they're the best value in the world. And I gladly pay for it. And ASML systems are the most expensive in the world, and they're worth it. There's no question about it. And so the question is simply, do you want to make more money? Or do you want to buy the lowest cost equipment? Do you want to make more money? Or do you want to buy the lowest cost equipment? That's the difference. Now what I just said is a new concept, and I think we can all acknowledge that. I just treated a computer system the way I treat a TSMC chip factory, the way I treat ASML manufacturing equipment. And that's not the way people thought about it in the past. If I have 2 CPUs, 1 of them is 256 cores, the other 1 is 256 cores, tell me which one is the better one. Well, the cheaper one's the better one because I'm renting it by the core anyways.
But that's not the way tokens are created. You don't rent by the core, you monetize by the tokens per second. And so it's a different economics. Does it make sense? You're not renting cores, you're not renting nodes. You're producing tokens, which is the reason why everything changed. It was necessary to make sure that everybody understands the economics of the new world. So we are -- anybody who says that simply does not understand the business, that's all. They're trying to buy the lowest equipment, lowest cost equipment. My equipment costs 30% cheaper. What does that mean to your factory? What does that mean to your factory? That's really the question. And so I think people -- anybody who says my chips are 50% cheaper, put that in the context of the factory, and that person is actually demonstrating to you they don't understand AI. They're just saying -- somebody goes, I'm 30% cheaper, you don't understand AI. I'm 40% cheaper, you don't understand AI. My chips are cheaper. You don't understand AI. I'm not talking about anybody. I was just saying. It's a theoretical comment.
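The "put it in the context of the factory" argument above can be illustrated with a small sketch that compares two equipment options in a power-limited site. Every number here is a placeholder chosen for illustration except the $6 per million tokens price point, which is mentioned elsewhere on the call. In particular, the idea that the cheaper option delivers half the tokens per second per watt is an assumption of the sketch, not a claim about any real product.

```python
from dataclasses import dataclass

SECONDS_PER_YEAR = 365 * 24 * 3600


@dataclass(frozen=True)
class Equipment:
    name: str
    capex_usd: float                # cost to fill the whole power budget (placeholder)
    tokens_per_sec_per_watt: float  # throughput efficiency (placeholder)


def annual_revenue(eq: Equipment, power_watts: float, usd_per_million_tokens: float) -> float:
    """Token revenue per year for a site whose output is capped by its power budget."""
    tokens_per_year = eq.tokens_per_sec_per_watt * power_watts * SECONDS_PER_YEAR
    return tokens_per_year / 1e6 * usd_per_million_tokens


POWER_WATTS = 100e6  # hypothetical 100 MW site
PRICE = 6.0          # USD per million tokens, the price point cited on the call

# ASSUMPTION: the cheaper option is 30% lower capex but half the efficiency.
baseline = Equipment("baseline", capex_usd=3.0e9, tokens_per_sec_per_watt=2.0)
cheaper = Equipment("30% cheaper", capex_usd=2.1e9, tokens_per_sec_per_watt=1.0)

for eq in (baseline, cheaper):
    revenue = annual_revenue(eq, POWER_WATTS, PRICE)
    print(f"{eq.name}: capex ${eq.capex_usd / 1e9:.1f}B, "
          f"revenue ${revenue / 1e9:.1f}B per year")
```

Because power is the fixed constraint, revenue scales with tokens per second per watt, so under these assumed numbers the capex saving is small next to the revenue difference. With different efficiency assumptions the comparison could go the other way.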

Joshua Buchalter

Analysts
#41

Josh Buchalter from TD Cowen. Thank you for spending the morning with us, and there's a lot of customers and partners that are after your time, so we appreciate it. I wanted to ask a question. You said a few times, I think, yesterday that you expect to be short capacity in 2027. Can you elaborate on where you're seeing those shortages? And on that note, you've described yourself as the chief revenue destroyer. And Satya's made some comments about not wanting to over-index to 1 generation. There's another 1 coming very soon. Is that behavior unique to Microsoft? And are these constraints sort of protecting it?

Jen-Hsun Huang

Executives
#42

By the way, Satya would also tell you who told him that. Exactly. I told Satya, buy what you need this year because next year, there will be something better.

Joshua Buchalter

Analysts
#43

So I guess my question on that is, are TSMC constraints or the capacity sort of protecting your other customers from doing that? Or do you see them holding a similar mindset as Satya's?

Jen-Hsun Huang

Executives
#44

I think I don't want you guys to thinly slice and dice our choice of words. Is the world supply constrained at some level? Yes, right? Can we all agree, saying the opposite is weird. Hi. Is the world constrained on cars? Well, you see cars in -- would I have tripled the demand? Yes. And so everything is somewhat constrained. It just depends on everything. And because we're building at such a large scale, our life is just not simplistic. It's not so simplistic as to say, "Oh, I can -- if I just solve this 1 problem, that's it. Life is good." We are working multiple dimensions across multiple suppliers and making sure that things are in harmony -- we don't have too much. We don't have too little. We can meet our demand plus. And the reason why we want to meet our demand plus is because there's always new demand coming for the next 21 months. I got a whole bunch of new demand that's coming. And so I got to prepare for that. And so there are all kinds of parameters and it's not simple. And if I told you that we are supply constrained on this 1 item, then I know what you guys are going to do. You know -- so I think the system is harmonious. Nothing is too much, nothing is too little. We don't have too much power. We don't have too little power. We don't have too many construction workers. We don't have too many plumbers. We don't have too few plumbers. We don't have enough -- we don't have too many cables. We don't have too many optics. We have -- we don't have too few optics. We don't have -- are you guys following? It's just kind of right there, and we work it every day. Perfect.

Aaron Rakers

Analysts
#45

Aaron Rakers with Wells Fargo. Thanks for doing this as well. I'm surprised we got to this point without this question being asked and it's more technical. There's a lot of discussion.

Jen-Hsun Huang

Executives
#46

You know what, we're kind of like the Fed now. Did he say near or almost? And what did he mean by -- well, we've got to do all of his previous transcripts. And when did he use that word? And what -- here's what I know. Demand is accelerating at a very large scale. And we'll be able to support the supply.

Aaron Rakers

Analysts
#47

Perfect. So I was going to ask about architecture. I've gotten a lot of questions about yesterday's presentation, where CPO starts where copper ends. You outlined NVL 576, there was NVL -- or 1152 on the slide. So I'm curious what your current thought process is around offering both. And how does that evolve as we scale to Vera Rubin Ultra and Feynman? Just curious to hear your thoughts.

Jen-Hsun Huang

Executives
#48

Okay. Please treat my partners properly. They're all doing great, okay? I'm not saying anything here that suggests anything about any of their businesses. I'm going to go the other way. All of their businesses are going to grow because of us. We're going to grow copper. We're going to grow optics tremendously. We're going to grow copper. We're going to grow optics tremendously. Now did I say something that is completely logical? The answer is yes. And let me tell you why. We should scale with copper as long -- as far as we can, as long as we can, but at a meter plus or minus, it's kind of the limits of copper, okay? And so you've seen us go from NVLink 72 to now Rubin Ultra NVLink 144, right, where the backplane was designed to be able to support that, okay. So that's kind of approximately -- and we're going to keep working on our SerDes, and if we could extend it from 144 to 288, we'll be more than happy to do so, because you should use copper for as long as you can because copper is just easy to manufacture. It's more reliable. We've been manufacturing it for a long time. Humanity has been using it for a long time. And so did I say anything that's illogical to anybody? Everybody, makes sense? You should breathe air for as long as you can until you run out of it. After that, we'll breathe like compressed liquid air. But until then, how about air. It's free. We've been using it for a long time. It's safe, all right? And so one, we should scale up with copper as long as we can. As you know, we also took Ethernet to a structured cable backplane. So that's an incremental growth opportunity. Did I -- isn't that right? I just said it yesterday. We're going to take the backplane of Ethernet, and we turned it into these spines because these structured cables are really easy. Now that we've got -- we mastered how to use it and manufacture it, it's a real artistry, we now can create these things and you -- it's easy to maintain, it's easy to ship, easy to wire it up. You make no mistakes, right?
It's fantastic. However, simultaneously, we want to scale up beyond 72 to 144, right, to 1152 and maybe even further than that someday. And there's a limit to how far copper can go. And so you could see we're 100% copper now. The next-generation Ultra will have 2 options. You could do copper or copper plus CPO, copper or copper plus CPO, copper or copper plus CPO -- because I have 2 options: copper plus CPO or copper. Okay. That's 1 year from now. 2 years from now, at 1152, it's all CPO because there's a limit to how far you could take copper. And so there's a transition. However, even when NVLink is CPO and Spectrum XPO, we will still have copper for the Ethernet scale up on our racks. We will still have copper for our storage. We will -- does that make sense? Because we have 5 different racks, and so the amount of copper we will use will continue to be high, because even though scale up will go to CPO in 2, 3 years, the total consumption of copper connectors is going to continue to grow because our demand and our total capacity continue to grow with all these different other racks. Was I got to select the words. Yes, perfectly.

James Schneider

Analysts
#49

Jim Schneider, Goldman Sachs. Thanks for taking the question. You previously talked about the spectrum of token costs and very helpful to hear the 25% of that in the high tier. How do you see the market evolving over time in terms of growth rates of the lower free tier versus the high tier in a market that's been sort of predicated by big decreases in token costs coming down over time. How do you see that trending? Does that start to slow or potentially flatten out and why?

Jen-Hsun Huang

Executives
#50

Token cost is going to keep on coming down -- can we go to the next slide, Colette? Like token cost is going to keep on coming down every single year. This is just Grace Blackwell, and then Rubin token costs will come down again and Rubin Ultra token costs will come down again, okay? Meanwhile, the token smartness, the smartness per token, is going to keep on going up as well as we extend that curve to the right, okay, the X-axis. Meanwhile, we're going to increase the throughput. This is everything that has to be -- nobody cares about tokens per second. You always have to divide it by watt. And the reason for that is because your data center is only so big. If your data center is a gigawatt, you're not going to have 2. If it's 200 megawatts, you're not going to have 3. Does it make sense? And so you always have to normalize it. Otherwise, no architecture, you can compare nothing. And Moore's Law was always divided by something, okay? So you have to take tokens per second per watt. Anybody who shows you anything else just doesn't understand AI, okay? Or they're trying to deceive you somehow, all right? So that's the reason why SemiAnalysis did it right. They did it right. Everything was divided by watt, okay? And so we're going to keep on increasing throughput. So whatever -- this is the price of a token, whatever the price -- whatever that ASP is, we increase its throughput. Whatever the ASP is, we increase the throughput. Does that make sense? And then here, whatever that segment is, we reduce the cost. Whatever that segment is, we reduce the cost. So this is kind of like -- this down here is essentially your segment, product segment. And that's through how many -- the volume production, and that's the cost of it. These are the 2 -- that's why these 2 curves are so important. Now I combine those 2 curves, you can combine those 2 curves if you like, but it's -- it makes your head blow up. But this curve is essentially the Pareto.
This -- and we spend -- in fact, most of the world today is simply right here. This is the Hopper world. You see that, Hopper is kind of right here. Blackwell extended it and added a couple of segments. And this is really valuable, and people love that because the ASP difference between here and here could be 5x, 10x. Makes sense? Larger model and faster, okay? And so these are really valuable. Now how do I see the curve changing, demand curve changing? Yesterday, I used 25% here, 25% here, 25% here and 25%. That's all I did. But a supplier's -- a manufacturer's distribution of different product segments just kind of depends. Do you guys see what I'm saying? It kind of depends. Ferrari is kind of all high end, nothing in the free tier. And then somebody else, right? Just depends on the brand. And I think it's going to be the same here, guys. If your business is search, you're going to be largely free tier because nobody pays for search. So if you're a search business, you're going to be largely free tier. If you're code generation, if you're code -- agentic code, you're going to be a lot here. If you're an enterprise worker, and the average salary of that person, let's pick a number, say, 50,000 or 70,000, you might be here. You want your product -- if your customer is that person, you want your product priced somewhere here. Does that make sense? It depends on your customer and the work that you do for them. It depends on the customer, the work you do for them and the competition. Those 3 things matter. It's just exactly like products. AI tokens are products, a new commodity, and we market it as such, and different suppliers, different brands, different target markets are going to have different shapes. I just simply chose an equal distribution yesterday. Makes sense?
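The tier-mix point above, with everything normalized to tokens per second per watt, can be sketched as follows. This is an illustration under loud assumptions: only the $6 and $50 per million token price points are mentioned on the call, and every tier name, every other price and every throughput figure is a placeholder. The sketch also assumes the "25% here, 25% here" split refers to shares of the factory's power budget, which the remarks do not specify.

```python
# Hypothetical token tiers for a power-limited AI factory.
# name: (USD per million tokens, tokens per second per watt)
# Only the $6 and $50 prices appear on the call; the rest are placeholders.
TIERS = {
    "free":    (0.0, 8.0),
    "mid":     (3.0, 4.0),
    "high":    (6.0, 2.0),
    "premium": (50.0, 0.5),
}

# ASSUMPTION: the equal split is interpreted as equal shares of power.
EQUAL_MIX = {name: 0.25 for name in TIERS}


def revenue_per_second(power_watts: float, mix: dict[str, float]) -> float:
    """Revenue per second when each tier receives a share of a fixed power budget."""
    if abs(sum(mix.values()) - 1.0) > 1e-9:
        raise ValueError("power shares must sum to 1")
    total = 0.0
    for name, share in mix.items():
        usd_per_million, tokens_per_sec_per_watt = TIERS[name]
        tokens_per_sec = tokens_per_sec_per_watt * power_watts * share
        total += tokens_per_sec / 1e6 * usd_per_million
    return total


GIGAWATT = 1e9
search_like_mix = {"free": 0.85, "mid": 0.05, "high": 0.05, "premium": 0.05}

print(f"Equal mix:       ${revenue_per_second(GIGAWATT, EQUAL_MIX):,.0f} per second")
print(f"Search-like mix: ${revenue_per_second(GIGAWATT, search_like_mix):,.0f} per second")
```

Changing the mix dictionary is the code analogue of a supplier choosing a different shape across segments; the power budget stays fixed, so only the distribution and the per-tier efficiency move the result.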

James Schneider

Analysts
#51

Yes, just which segment do you see growing faster in the future?

Jen-Hsun Huang

Executives
#52

They're all going to grow really fast at the moment. It just -- I don't think at the moment, it just doesn't matter. They're all going to grow so fast. They're all growing exponentially at the moment, every 1 of them. We're at the beginning, right? The growth rate is divided by a very small number.

Mark Lipacis

Analysts
#53

Mark Lipacis, Evercore ISI. Thanks a lot for joining the Q&A. I always love the insights. Jensen, our field work is telling us that AI engineers are getting excited about state space models because they address memory requirements. And in your keynote, you showed [indiscernible] 3 benchmarking as 1 of the top models, and I believe that's a hybrid mixture of experts, state space model. And I'm wondering ...

Jen-Hsun Huang

Executives
#54

Impressive. I was trying to ...

Mark Lipacis

Analysts
#55

Thank you, Jensen. The new AI workloads have led to the adoption of different AI models.

Jen-Hsun Huang

Executives
#56

That was my Darth Vader imitation. Impressive, young Jedi.

Mark Lipacis

Analysts
#57

So the question is, is agentic AI creating a new demand -- a need for a new AI model? Is that what you're doing with Nemotron and the hybrid? What does state space get you for Nemotron-3I that pure mixture of experts didn't? And what are the implications on the competitive environment for NVIDIA if there's this transition to a new kind of AI model?

Jen-Hsun Huang

Executives
#58

We run all AI models, whether it's full transformer, discrete tokens, continuous diffusion, state space, hybrid. Our architecture's beauty is that it does it all. For example, Grok doesn't do diffusion models. But we can do everything. Does that make sense? And so I'm picking on Grok, not because I'm picking on Grok, it belongs to me now, so I can say these things. And so, but every architecture has its place. The reason why NVIDIA is so versatile and the reason why it's used so freely everywhere is because irrespective of what innovation your research scientists come up with tomorrow, I promise you it's going to run great on CUDA. I just promise you that. And the reason for that is because we know we have all of the necessary computing elements to do all of it, okay? And so Nemotron-3I was designed so that you can deal with extremely long context. And in time, the AI models, we're going to -- you're going to have conversations with your AI hopefully for as long as you shall live. And so the question is, how do you deal with context, how do you deal with the relevant conversational memory, so that on the 1 hand, if you memorize everything, and we talk about something over time, which version of that memory do you pull back? When you have too much memory, over time, it could become garbled. And maybe a reset is helpful. These are research areas, long memory areas are really research areas. But the hybrid architecture, I think, is going to be a very major thing because it allows you to deal with extremely long context and not have to suffer the quadratic explosion in computation. And that's the reason why we invented it and we put it out in open source, and it could -- we'd love for everybody to use it. And so it's intended to advance AI, not to compete with anybody. We don't need to. We just -- we just want to advance AI.
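The "quadratic explosion in computation" mentioned above refers to how full self-attention scales with context length, compared with the linear scaling of a recurrent or state space scan. The sketch below only counts interactions as a function of context length; it ignores constant factors, hidden dimensions and how a hybrid model mixes the two layer types, and it is not a description of how Nemotron is actually built. The function names are hypothetical.

```python
def attention_pair_count(context_len: int) -> int:
    """Token-to-token interactions in full self-attention: grows as n squared."""
    return context_len * context_len


def scan_step_count(context_len: int) -> int:
    """Sequential state updates in a state space style scan: grows as n."""
    return context_len


for n in (1_000, 10_000, 100_000, 1_000_000):
    ratio = attention_pair_count(n) // scan_step_count(n)
    print(f"context {n:>9,}: attention pairs {attention_pair_count(n):.1e}, "
          f"scan steps {scan_step_count(n):.1e}, ratio {ratio:,}x")
```

Doubling the context doubles the scan work but quadruples the attention work, which is why the gap widens as conversations and agent contexts get longer.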

Unknown Analyst

Analysts
#59

Thank you, Jensen. So I'm trying to understand how concentrated your downstream like the AI market is and is going to be and so you have this chart showing 60% is hyperscalers. But I'm kind of thinking the other 40%, the majority of that is Tier 2 cloud and a lot of them are actually reselling or renting their capacity to hyperscalers or to the frontier labs. So if you take hyperscalers plus frontier labs, it might be like 80% of people actually using the infrastructure that is being deployed. So that's an element of concentration and then these models like Anthropic models, the Open AI models, et cetera, seems to be like a very small handful that are really at the frontier. And so do you think that's the right description of the situation today? How do you see that evolving? And maybe what does that mean in terms of right to make money in the value chain and development and like further acceleration of AI?

Jen-Hsun Huang

Executives
#60

Okay. So I would slice it into 3 dimensions, okay? And as you were talking, I simplified it as much as I can into a cube, into 3 dimensions. The first dimension is what is the end model being run? And I said earlier, OpenAI is the largest. The second largest by category is basically all the open models. In aggregate, they are definitely solidly #2. And then number 3 would be Anthropic and then so on and so forth, okay? And so -- and that -- and the tail is actually -- is fairly long, okay? And so if you look at the world of model consumption, even just language, that's the way to think about it. And we run all of them. We're in all of them. That's 1 dimension. In that sub dimension of models, you have to decide to add also physical AI models, which is robotics, like all the robots you saw, they're not running -- they're running vision language action models. And those models are different than just language models. And for example, the control of motors is continuous. It's not -- it's not like a character. It's not like words, it's continuous. And so physics is continuous. Biology has based geometry because things chemicals, a base geometry, okay? And so there's a lot of different types of models. But point being that you have to first think about the different types of models being run, and that's helpful to how you think about the right to -- right to business. The second dimension is are they -- are the computing -- depending on the way that the companies are structured and their intentions or interests, they are either companies that want to build their own chips, and we have to compete. Or companies that want to host NVIDIA customers in their cloud, and obviously, CUDA only runs on NVIDIA. And then are they companies like, for example, NCPs, where they need us -- they can't just buy chips, they really have to buy systems. And so they're really infrastructure customers. Or are they companies that want to build on-prem.
Therefore, my distribution channel goes through Dell and HP and Lenovo because it has to integrate a whole bunch of other enterprise computing components, and Dell and HP, they don't build their own chips. Or are they at the edge, and maybe they're radio networks, maybe they're robotic systems or self-driving cars or satellites and so on and so forth. Does that make sense? Now you've got to decide where the computing is being done, okay? And so those are kind of the several dimensions, I guess, you could think about it. And when you're done subdividing all of that, you come back to the chart that I showed you, which is 60%, 40%. Within that 60%, 40%, 40% of it, basically, they need computing. It doesn't matter what models they run, it could be open models, could be Anthropic models. The fact that NVIDIA supports confidential computing makes it possible for OpenAI to run on the right side at all. We make it possible for Anthropic to run on the right side at all because we have confidential computing. That side, they want entire platforms, they want confidential computing. They want computers at different parts of the world, not just in the cloud. Even in the cloud, we compete with some part, but we also bring customers to the other part. And so some part of that chart of 60%, we have to compete. And our job is just to deliver that chart better than anybody else in the world, and we're doing very, very well, and we're actually increasing our position day in and day out. And then the other part, we bring customers to them. They're just grateful. Makes sense? So I took all of that dimensionality and I compressed it into basically 2 slices of the pie. And so that compression, I think if you test against, do they design the -- do we -- does NVIDIA compete with them on chips? Okay, there you go. That's interesting. And then you've got to figure out where we are in our position and what's our opportunity and so on and so forth. I don't think OCI will design their own chips.
I don't think it's sensible for them to do it. Obviously, Core's not going to design their own chips. And so there, we -- so where do we compete and where do we bring the cloud service provider customers? And their cloud revenues, a lot of them, a lot -- a big part of it, obviously, nearly 100% of that is because of NVIDIA, right, with OpenAI.

Toshiya Hari

Executives
#61

We'll take our last question.

Timm Schulze-Melander

Analysts
#62

It's Timm Schulze-Melander from Rothschild & Co Redburn. Maybe just a question around how you run the company, Jensen. And looking ahead, this 12-monthly flywheel is part of your competitive advantage. But when I look at headcount, actually, it seems to be growing very slowly, relatively slowly. And yet the undertaking that you are going for is growing much more rapidly than that. How do you manage that or prepare for that going forward? And how do you manage maybe the risk that, that could pose to your business?

Jen-Hsun Huang

Executives
#63

Yes. As you know, I have 60 people on my direct team. And the reason why we need 60 people is because the company's architecture was designed to deliver on this architecture on the products. The organization, the architecture of a company should reflect the products they build. Every company should not look -- have the same business org. And I look across and I said, "Oh, look, they have a business unit here. They have a business unit there. They have a business unit there and yet they want to build what we want to build." What you build as a company, for example, the way -- not because I've seen it, I've read about it. The way you build a Ferrari and the way you build a Ford is very different. In 1 case, you move the car; in the other case, you move the people, okay? And so the car stays stationary. And so it depends on the results of what you want to create, the architecture should reflect it. If you look across my management team, every aspect of the technology necessary to build Vera Rubin's entire factory is right there, 100%. Everybody is represented. All of the expertise sitting at the table, making a decision together. And the second thing is we had the discipline to develop the entire software stack. You can't build what we build on a yearly basis if you can't bring it up. Are you guys following me? It's very logical. How do you test it if you can't bring it up? And if you're cobbling up new technology from everybody else, how do you bring it up once a year? It's just not even practical. It's not possible. So we align all of our chips to the platforms -- all 7 chips, they only have 1 tape-out schedule. I don't cobble up everybody's tape-out schedule and figure out when the system comes. The system comes when the system needs to come, and everybody aligns to it. And the software stack, we completely own every piece.
The storage, that's the reason why we developed it, networking, of course, all of the -- even the factory operating system we call Dynamo, we created everything. So that we could deliver every single benchmark, test everything to the limit, test for reliability, test for -- and the reason why NVIDIA built Nemotron is so that we could do pretraining, post-training and now we can do inference. We own all of the software so that we can bring up all of the systems on an annual basis, which basically says you're bringing up all the time. If you don't own everything, you have no shot, 0% chance. People are talking about their new GPU, but where's their scale-up fabric coming from? And how is that going to work? And that's just -- I just gave you 2 examples. That whole agentic system that we were talking about earlier, that's the future computer. And so that's really what we -- the company's organization, the company's mission, the company's capabilities are all aligned to me delivering the promise that I just delivered to the marketplace. And that's why we're able to keep doing it. A PowerPoint slide is not going to deliver that system. And a PowerPoint slide with 2 bar charts is not going to convince somebody to give you $50 billion. It doesn't make any sense. And to engineer it all into existence inside the data center, by the time that you bring it up, we're already 2 clicks down the road. So this is the pace that we put the whole industry on, and it is, frankly, extremely, extremely hard. And we could do it, but that's because of all the things that I just described. You also know that every one of our systems is CUDA compatible. So on day 1, I've got yesterday's software that runs perfectly on this one. I own all the scale-up switch, I own all the scale-out switch. I own all the software, do I not? So on day 1, I take yesterday's software and put it on the new system. If it doesn't work, what's the point?
Then once we get everything brought up because we own all the software stack, then we can take it to the limit. And so having CUDA compatibility, we have this thing called DOCA compatibility. We own all the compilers, we own all the software stack, really, really important. You can't outsource that to other people. Somebody else is building it on your behalf? Then how do you bring up a system? They're not going to bring up your system for you. They're not going to qualify it for you. And so, that's it. Can we take 1 more question? Is that okay? Can you guys tolerate one more question? I'm enjoying it so much. Let me just -- somebody is going to ask me a question when I had to choose the precise word hair or did he say hair or a hare. That's materially different.

Unknown Analyst

Analysts
#64

Thank you for extending the session and squeezing me in. Jensen, I just want to clarify 1 thing.

Jen-Hsun Huang

Executives
#65

Here it comes. Oh dear, I changed my mind. I changed my mind. Everybody have a good GTC.

Unknown Analyst

Analysts
#66

Quick clarification. Does the $1 trillion plus include Rubin Ultra or not? And my question is...

Jen-Hsun Huang

Executives
#67

No, I got to stop you right there. Thank you. No, no. Absolutely not. And yes, absolutely not.

Unknown Analyst

Analysts
#68

Okay. My question is we talked a lot about inferencing at this event. I just -- I was hoping that you could spend a couple of minutes on training in terms of how do you see the compute intensity growing? What will drive in your view over the next few years? Is it still the larger and larger models? Or is there something else on the horizon that you see? And I guess if you take a 3- to 5-year view, what's your view on training versus inferencing mix in terms of compute demand.

Jen-Hsun Huang

Executives
#69

Training went from pretraining to post-training. Pretraining is basically memorization, memorization and generalization. The more -- the more you memorize and generalize, the better foundation you have. Once you have that foundation, that's why it's called pretraining. It's kind of like AI kindergarten, okay? It's more than kindergarten, but AI high school. And so now you have the pretraining. You have the basic vocabulary and grammar and a lot of hidden reasoning capability that when I teach you new skills, you'll even understand it. So now when I tell you to go solve a math problem or write code or try to write code, you actually understood what I meant. If you don't even understand what I meant, how can you possibly even attempt doing it? And so -- so pretraining does that. Post-training teaches all kinds of skills, okay? And reinforcement learning, reinforcement learning with executable grounding, reinforcement learning with verifiable feedback, a whole bunch of techniques for batch-oriented reinforcement learning, tool use. I mean, the list goes on and on, okay? Structured-based APIs, unstructured based tool use. I mean there's just -- there's a whole lot of domains. And that part, computing intensity, I'm going to guess, probably a million times more than pretraining. I'm probably off by a factor of 1.2, but it's a lot. And the reason for that is because there's a lot of skills to go learn and all these skills, the rollout is really, really long. And so the models have to get larger and larger. When you get good at these, you take all of that synthetic data and some of it, you're going to push back to pretraining next time. And so yesterday's pretraining started all from Internet data. Today's pretraining is mostly Internet data. In a couple of generations, pretraining will be mostly synthetic data. Meanwhile, you're adding multimodality to it. Meanwhile, you're adding motion to it. Long rollout physical actions to it.
And the reason for that is because there's a lot of common sense that's cognitively logic related that if you were able to interact in the physical world, you could deal with that concept a lot easier even in the abstract world, okay? Because you actually have grounded experience in the physical world. And so notice the amount of computation that I just described, we're 1 million -- 1 billion times future amount of computing necessary for training. And then after that, continuous learning. So almost everybody's model will be lastly trained, fine-tuned so that it could also be memorized and generalized per person. And so in the future, basically, where inference starts and ends and where training starts and ends will become blurrier and blurrier. Just kind of when are you learning and when are you applying your wisdom? Well, in most people's cases it's continuous now. And so I think that kind of gives you the 3 phases of it. And with respect to inference versus training, let me tell you my hope. My hope is that 99% of the world's compute goes towards inference. And the reason for that is because inference is where we translate tokens generated to economics. Nobody pays you for learning. Nobody pays for training, you pay for training. I want the world to be able to use these tokens for valuable outcome, impactful outcome for health care, for manufacturing, for financial services or -- right? For engineering, for, right, you name it, isn't that right? And so we want the world -- that's our hope -- is that 99% -- and if our dreams come true, 100% of the future tokens are going towards economic benefits while the AI models are learning. And so -- it's -- there's a really good reason why NVIDIA went all in on inference last year. And the reason for that is because we see this future where inference and training and pretraining and learning and all that is just 1 big continuum.
It's not as if -- go back and read 2 years ago, the stories people write: NVIDIA is really good at training, inference is easy, any company could do that. And therefore, do you guys remember that? Inference is super hard. Look at this chart. It's super hard, it's going to get way harder. Inference is thinking, it's working, it's doing things. How could that be easy? I thought my life was easy pre-high school, not post-high school. Pre-high school super hard. After that, it was -- after that was super hard. And so I think people just got it all completely backwards. And they just -- they wanted to make up stories that rationalized their opportunity, which is fine. But you have to reason about it from first principles. And I take a long time answering questions for you guys instead of a short, highly curated, super well-selected, precisely adjusted verbs and nouns. And the reason for that is because I want you guys to learn how to reason through these things. So when you see it yourself, you go, no, that's not making sense or if that makes sense or we could -- because you're analysts, you need to be able to understand these things. Okay. All right, guys. Thank you very much. Thanks for coming to GTC.
