NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary
March 19, 2024
Earnings Call Speaker Segments
Jensen Huang
executive: Good morning.
Colette Kress
executive: Good morning. We've got everything here. I can stay all day.
Jensen Huang
executive: Nice to see all of you. All right. What's the game plan?
Colette Kress
executive: Okay. Well, we've got a full house, and we're thanking you all for coming out for our first in-person event in such a long time. Jensen and I are here to go through any questions that you have, including questions from yesterday. We have a series of folks in the aisles that you can reach out to. Raise your hand, and we'll get to you with a mic. We thought that would be a better plan for you. I know you have already asked quite a few questions, both last night and this morning. But rather than giving you a formal presentation, we're just going to go through a good Q&A today. Sounds like a good plan? I'm going to turn it to Jensen to see if he wants to add some opening remarks because we have just a quick introduction, and we'll do it that way. Okay?
Jensen Huang
executive: Yes. Thank you. Thank you. First, great to see all of you. There were so many things I wanted to say yesterday and probably have -- and I wanted to say better. But I got to tell you, I've never presented at a rock concert before. I don't know about you guys, but I've never presented at a rock concert before. I had simulated what it was going to be like. But when I walked on stage, it still took my breath away. And so anyways, I did the best I could. Next, after the tour, I'm going to do a better job, I'm sure. I just need a lot more practice. But there were a few things I wanted to tell you. Is there a clicker? Oh, look at that. See? This is like spatial computing. It's -- by the way, if you get a -- I don't know if you'll get a chance because it takes a little setup, but if you get a chance to see Omniverse in Vision Pro, it is insane, completely incomprehensible how realistic it is. All right. So we spoke about five things yesterday. And I think the first one really deserves some explanation. I think the first one is, of course, this new industrial revolution. There are two things that are happening, two transitions that are happening. The first is moving from general-purpose computing to accelerated computing. If you just looked at the extraordinary trend of general-purpose computing, it has slowed down tremendously over the years. And in fact, we've known that it's been slowing down for about a decade. And people just didn't want to deal with it for a decade, but you really have to deal with it now. And you can see that people are extending the depreciation cycle of their data centers as a result. You could buy a whole new set of general-purpose servers and it's not going to improve the throughput of your overall data center dramatically. And so you might as well just continue to use what you have for a little longer. That trend is never going to come -- never going to reverse. General-purpose computing has reached its end.
We're going to continue to need it, and there's a whole lot of software that runs on it. But it is very clear we should accelerate everything we can. There are many different industries that have already been accelerated, some that are very large workloads that we really would like to accelerate more. But the benefits of accelerated computing are very, very clear. One of the areas that I didn't spend time on yesterday that I really wanted to was data processing. NVIDIA has a suite of libraries for that. Before you can do almost anything in a company, you have to process the data. You have to, of course, ingest the data, and the amount of data is extraordinary, zettabytes of data being created around the world. It's doubling every couple of years, even though computing is not doubling every couple of years. So you know that on data processing, you're on the wrong side of that curve already. If you don't move to accelerated computing, your data processing bill will just keep on going up and up and up and up. And so a lot of companies recognize this: AstraZeneca, Visa, Amex, Mastercard, so many, so many companies that we work with. They've reduced their data processing expense by 95%, basically a 20x reduction. The acceleration is so extraordinary now with our suite of libraries called RAPIDS that the inventor of Spark, who started a great company called Databricks, and they are the cloud, large-scale data processing company; they announced that they're going to take Databricks, their Photon engine, which is their crown jewel, and they're going to accelerate that with NVIDIA GPUs. Okay. So the benefit of acceleration is, of course, that you can pass along savings to your customers, but very importantly, that you can continue to sustainably compute. Otherwise, you're on the wrong side of that curve. You'll never get on the right side of the curve. You have to accelerate. The question is today or tomorrow, okay? So accelerated computing.
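The "95%" and "20x" figures quoted above are the same claim stated two ways. A minimal Python sketch of that arithmetic follows. It is an illustration only, not part of the call: the function names are hypothetical, and reading "every couple of years" as a two-year doubling period is an assumption.

```python
def reduction_factor(percent_saved: float) -> float:
    """Convert a percentage cost reduction into an 'N x' reduction factor."""
    remaining_fraction = 1.0 - percent_saved / 100.0
    return 1.0 / remaining_fraction


def data_growth(years: float, doubling_period_years: float = 2.0) -> float:
    """Growth multiple of data volume after `years`.

    The default two-year doubling period is an assumed reading of
    "doubling every couple of years".
    """
    return 2.0 ** (years / doubling_period_years)


if __name__ == "__main__":
    # A 95% reduction leaves 5% of the original bill, i.e. roughly 1/20th.
    print(f"95% saved is about a {reduction_factor(95):.1f}x reduction")
    # Under the assumed doubling period, ten years gives 2**5 times the data.
    print(f"Data volume after 10 years: {data_growth(10):.0f}x")
```

Under these assumptions, a one-time 20x saving is overtaken by data growth after a little more than four doublings, which is one way to read the "wrong side of that curve" remark.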
We accelerated algorithms so quickly that the marginal cost of computing has declined so tremendously over the last decade, that it enabled this new way of doing software called generative AI. Generative AI, as you know, requires a lot of FLOPS, a lot of FLOPS, a lot of computation. It is not a normal amount of computation and the same amount of computation. And yet it can now be done so cost effectively that consumers can use this incredible service called ChatGPT. And so it's something to consider that accelerated computing has dropped, has driven down the marginal cost of computing so far that it enabled a new way of doing something else. And this new way is software written by computers. With a raw material called data, you apply energy to it. There's an instrument called GPU supercomputers. And what comes out of it are tokens that we enjoy. When you're interacting with ChatGPT, you're getting all -- it's producing tokens. Now that data center is not a normal data center. It's not a data center that you know of in the past. The reason for that is this, it's not shared by a whole lot of people. It's not doing a whole lot of different things. It's running one application 24/7, and its job is not just to save money, its job is to make money. It's a factory. This is no different than an AC generator of the last industrial revolution. And it's no different in that the raw material coming in is, of course, water, they applied energy to it and it turns into electricity. Now it's data that comes into it. It's refined using data processing and then, of course, generative AI models. And what comes out of it is valuable tokens. This idea, this idea that we would apply this basic method of software, token generation, what some people call inference, but token generation. This method of producing software, producing data, interacting with you, ChatGPT is interacting with you.
This method of working with you, collaborating with you, you extend us as far as you like, copilots to artificial intelligence agents, you extend the idea as long as you like, but it's basically the same idea. It's generating software. It's generating tokens. And it's coming out of this thing called an AI generator that we call GPU supercomputers. Does that make sense? And so the two ideas, one is the traditional data centers that we use today should be accelerated, and they are. They're being modernized. Lots and lots of it and more and more industries one after another. And so what is $1 trillion of data centers in the world will surely all be accelerated someday. The question is how many years would it take to do? But because of the second dynamic, which is its incredible benefit in artificial intelligence, it's going to further accelerate that trend. Does that make sense? However, the second data center, the second type of data center called AC generators -- or excuse me, AI generators or AI factories, as I've described it as, this is a brand-new thing. It's a brand-new type of software, generating a brand-new type of valuable resource, and it's going to be created by companies, by industries, by countries and so on and so forth, a new industry. I also spoke about our new platform. People are -- there are a lot of speculations about Blackwell. Blackwell is both a chip at the heart of the system but it's really a platform. It's basically a computer system. What NVIDIA does for a living is not build the chip, we build an entire supercomputer, from the chip to the system to the interconnects, the NVLinks, the networking, but very importantly, the software. Could you imagine the mountain of electronics that are brought into your house, how are you going to program it? Without all of the libraries that we've created over the years in order to make it effective, you've got a couple of billion dollars worth of assets you just brought into your company. 
And any time it's not utilized, it's costing you money. And the expense is too incredible. And so our ability to help companies, not just buy the chips, but to bring up the systems and put it to use and then working with them all the time to make it -- put it to better and better and better use, that is really important, okay? That's what NVIDIA does for a living. The platform we call Blackwell has all of these components associated with it that I showed you at the end of the presentation that give you a sense of the magnitude of what we've built, all of that we then disassemble. This is the part that's incredibly hard about what we do. We build these vertically integrated things, but we build it in a way that can be disassembled later and for you to buy it in parts, because maybe you want to connect it to x86, maybe you want to connect it to a PCI Express fabric. Maybe you want to connect it across a whole bunch of fiber, okay, optics. Maybe you want to have very large NVLink domains. Maybe you want smaller NVLink domains. Maybe you want to use Arm, and so on and so forth. Does that make sense? Maybe you would like to use Ethernet, okay? Ethernet is not great for AI. It doesn't matter what anybody says. You can't change the facts, and there's a reason for that. There's a reason why Ethernet is not great for AI, but you can make Ethernet great for AI. In the case of the Ethernet industry, it's called Ultra Ethernet. So in about 3 or 4 years, Ultra Ethernet is going to come. It will be better for AI. But until then, it's not good for AI. It's a good network, but it's not good for AI. And so we've extended Ethernet. We've added something to it. We call it Spectrum-X that basically does adaptive routing. It does congestion control. It does noise isolation. Remember, remember, when you have chatty neighbors, that takes away from the network traffic and AI.
AI is not about the average throughput of the network, which is what Ethernet is designed for, maximum average throughput. AI only cares about when did the last student turn in their partial product. It's the last person. A fundamentally different design point. If you're optimizing for highest average versus the worst student, you will come up with a different architecture. Does that make sense? Okay? And because AI has AllReduce, all2all, AllGather, just look it up in the algorithm, the transformer algorithm, the mixture of experts algorithm, you'll see all of it. All these GPUs all have to communicate with each other, and the last GPU to submit the answer holds everybody back. That's how it works. And so that's the reason why the networking is such a large impact. Can you network everything together? Yes. But will you lose 10%, 20% of utilization? Yes. And what's 10% to 20% utilization if the computer is $10,000? Not much. But what's 10% to 20% utilization if the computer is $2 billion? It paid for the whole network, which is the reason why supercomputers are paid -- are built the way they are, okay? And so anyways, I showed examples of all these different components. And our company creates a platform and all the software associated with it, all the necessary electronics. And then we work with companies and customers to integrate that into their data center because maybe their security is different. Maybe their thermal management is different. Maybe their management plane is different. Maybe they want to use it just for one dedicated AI. Maybe they want to rent it out for a lot of people to do different AI with. The use cases are so broad, and maybe they want to build an on-prem, and they want to run VMware on it. And maybe somebody just wants to run Kubernetes, Somebody wants to run Slurm. I could list off all of the different varieties of environments and it is completely mind-blowing. And we took all of those considerations. 
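The "last student" point and the utilization arithmetic above can be made concrete with a small Python sketch. This is an illustration, not NVIDIA's model or any real collective library: the function names and the per-GPU finish times are hypothetical, and pricing idle time as a proportional share of system cost is a simplifying assumption. Only the $10,000, $2 billion and 10% to 20% figures come from the remarks.

```python
def step_time(finish_times: list[float]) -> float:
    """A collective step (e.g. AllReduce) ends when the slowest participant finishes."""
    return max(finish_times)


def utilization(finish_times: list[float]) -> float:
    """Fraction of the step that the GPUs, on average, spent doing useful work."""
    return sum(finish_times) / (len(finish_times) * step_time(finish_times))


def cost_of_lost_utilization(system_cost: float, lost_fraction: float) -> float:
    """Value of idle capacity, assuming it scales linearly with system cost."""
    return system_cost * lost_fraction


if __name__ == "__main__":
    # Hypothetical step: three GPUs finish in 1.0 time units, one straggler in 1.25.
    times = [1.0, 1.0, 1.0, 1.25]
    print(f"step time: {step_time(times)}, average time: {sum(times) / len(times)}")
    print(f"utilization: {utilization(times):.2f}")

    # The two system prices and the 10%-20% range quoted in the remarks.
    for cost in (10_000, 2_000_000_000):
        for lost in (0.10, 0.20):
            idle = cost_of_lost_utilization(cost, lost)
            print(f"${cost:,} system, {lost:.0%} lost: about ${idle:,.0f} idle")
```

The step time tracks the maximum finish time rather than the average, which is why a network tuned for average throughput can still leave most GPUs waiting.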
And over the course of quite a long time, we've now figured out how to serve literally everybody. As a result, we could build supercomputers at scale. But basically, what NVIDIA does is build data centers, okay? We break it up into small parts, and we sell that as components. People think, as a result, we're a chip company. The third thing that we did was we talked about this new type of software called NIMs. These large language models are miracles. ChatGPT is a miracle. It's a miracle not just in what it's able to do, but the team that put it so that you can interact with ChatGPT in very high response rate, that is a world-class computer science organization. That is not a normal computer science organization. The OpenAI team that's working on this stuff is world class. It's a world-class team, some of the best in the world. Well, in order for every company to be able to build their own AI, operate their own AI, deploy their own AI, run it across multiple clouds, somebody is going to have to go do that computer science for them. And so instead of doing this for every single model for every single company, every single configuration, we decided to create the tools and the tooling and the operations, and we're going to package up large language models for the very first time. You can buy it. You can just come to our website, download it, and you can run it. And the way we charge you is all of those models are free, but when you run it, when you deploy it in an enterprise, the cost of the -- of running it is $4,500 per GPU per year, basically the operating system of running that language model, okay? And so the per-instance, the per-use cost is extremely low. It's very, very affordable, but the benefit is really great, okay? We call that NIMs, NVIDIA inference microservices. You take these NIMs, you take these NIMs, and you're going to have NIMs of all kinds. 
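For a sense of scale, the quoted $4,500 per GPU per year can be converted to a per-hour figure. The Python sketch below is illustrative only: the function names, the 365-day year, and the utilization parameter are assumptions, and only the $4,500 figure comes from the remarks.

```python
HOURS_PER_YEAR = 365 * 24  # assumes a non-leap year of continuous operation

ANNUAL_FEE_PER_GPU = 4_500.0  # dollars per GPU per year, as quoted in the remarks


def fee_per_gpu_hour(annual_fee: float = ANNUAL_FEE_PER_GPU,
                     utilization: float = 1.0) -> float:
    """Software fee per hour of actual GPU use.

    `utilization` is the fraction of the year the GPU is busy (hypothetical
    parameter); at 1.0 the GPU runs 24/7.
    """
    if not 0.0 < utilization <= 1.0:
        raise ValueError("utilization must be in (0, 1]")
    return annual_fee / (HOURS_PER_YEAR * utilization)


def annual_fee_for_fleet(num_gpus: int,
                         annual_fee: float = ANNUAL_FEE_PER_GPU) -> float:
    """Total yearly fee for a fleet of `num_gpus` GPUs."""
    return num_gpus * annual_fee


if __name__ == "__main__":
    print(f"per GPU-hour at 24/7 use: ${fee_per_gpu_hour():.3f}")
    print(f"per GPU-hour at 50% use:  ${fee_per_gpu_hour(utilization=0.5):.3f}")
    print(f"8-GPU server per year:    ${annual_fee_for_fleet(8):,.0f}")
```

At full utilization the fee works out to roughly half a dollar per GPU-hour under these assumptions.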
You get a NIMs of computer vision, you're going to have NIMs of speech and speech recognition and text-to-speech, and you're going to have facial animation, you're going to have robotic articulation, you're going to have all kinds of different types of NIMs. These NIMs, the way that you would use it is you would download it from our website and you would fine-tune it with your examples. You would give it examples. The way that you responded to that question isn't exactly right. It might be right in another company, but it's not right in ours. And so I'm going to give you some examples that are exactly the way we would like to have it. You show it your work products. This is the way -- this is what a good answer looks like. This is what right answer looks like, a whole bunch of them. And we have a system that helps you curate that process that tokenize that, all of the AI processing that goes along with it, all the data processing that goes along with it, fine-tuning that, evaluate that, guardrail that so that your AIs are very effective, number one, also very narrow. And the reason why you want it to be very narrow is because if you're a retail company, you would prefer your AI just didn't pontificate about some random stuff, okay? And so whatever the questions are, it guardrails it back to that lane. And so that guardrailing system is another AI. So we have all these different AIs that help you customize our NIMs. And you could create all kinds of different NIMs. And we gave you some frameworks for many of them. And one of the very important ones is understanding proprietary data because every company has proprietary data. And so we created a microservice called Retriever. It's state-of-the-art, and it helps you take your database, which is structured or unstructured images or graphs or charts or whatever it is, and we help you embed them. We help you extract the meaning out of that data. And then we take the -- it's called semantics. 
And when the semantic is embedded in a vector, that vector is now indexed into a new database called vector database, okay? And that vector database then afterwards, you can just talk to it. You say, "Hey, how many mammals do I have," for example, and it goes in there and it says, "Hey, look at that, you've got a cat, you have a dog, you have a giraffe. This is what you have in inventory." In your warehouse, you have, okay, so on and so forth, all right? And so all of that is called NeMo, and we have experts to help you. And then we put our -- we put a canonical NVIDIA infrastructure we call DGX Cloud in all of the world's clouds. And so we have DGX Cloud in AWS. We have DGX Cloud in Azure. We have DGX Cloud in GCP and OCI. And so we work with the world's enterprise companies, particularly the enterprise IT companies and we create these great AIs with them. But when they're done, they can run in DGX Cloud, which means we're effectively bringing customers to the world's clouds. A platform like us, a platform company brings system makers customers. And CSPs are system makers. They rent systems instead of sell systems, but they are system makers. And so we bring customers to our CSPs, which is a very sensible thing to do, just as we brought customers to HP and Dell and IBM and Lenovo and so on and so forth and Supermicro and CoreWeave, so on and so forth, we bring customers to CSPs because a platform company does that. Does it make sense? If you're a platform company, you create opportunities for everybody in your ecosystem. And so the DGX Cloud allows us to land all of these enterprise applications in the world CSPs. And if they want to do it on-prem, we have great partnerships with Dell that we announced yesterday, HP and others that you can land those NIMs in their systems. And then I talked about the next wave of AI, which is really about industrial AI. This -- you know that the vast majority of the world's industries, the largest in dollars, are heavy industries. 
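The embed, index, and query flow described above can be sketched in a few lines of Python. This is a toy illustration, not the Retriever microservice or NeMo API: the `embed` function is a bag-of-words stand-in for a learned embedding model, and `VectorIndex`, its methods, and the sample inventory records are all hypothetical.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Stand-in embedding: word counts. A real system would use a learned model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse vectors stored as Counters."""
    dot = sum(count * b[token] for token, count in a.items())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class VectorIndex:
    """Minimal in-memory 'vector database': stores (text, vector) pairs."""

    def __init__(self) -> None:
        self.items: list[tuple[str, Counter]] = []

    def add(self, text: str) -> None:
        self.items.append((text, embed(text)))

    def search(self, query: str, k: int = 1) -> list[str]:
        """Return the k stored texts most similar to the query."""
        q = embed(query)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


if __name__ == "__main__":
    index = VectorIndex()
    for record in ("cat in warehouse aisle three",
                   "dog in warehouse aisle five",
                   "giraffe in outdoor pen"):
        index.add(record)
    print(index.search("where is the giraffe"))
```

A production system would replace the word-count vectors with dense embeddings and the linear scan with an approximate nearest-neighbor index, but the query path is the same: embed the question, then rank stored vectors by similarity.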
And heavy industries have never really benefited from IT. They've not benefited from a lot of the design and all the digital -- it's called not digitization, but digitalization, putting it to use. They've not benefited from digitalization, not like our industry. And because our industry is completely digitalized, our technology advance is insanely great. We don't call it chip discovery, we call it chip design. Why did they call it drug discovery like tomorrow could be different than yesterday? Because it is. And it's so much -- it's so complicated, it's so complicated biology, it's so -- and the longitudinal impact is so great because, as you know, life evolves at a different rate than transistors. And so, therefore, cause and effect is harder to monitor because it happens over a large scale, large scale of systems and large scale of time. These are very complicated problems. Physics is very similar, okay? Industrial physics is very similar. And so we finally have the ability using large language models to the same technologies, if we can tokenize proteins, if we could tokenize -- well, if we can tokenize words, tokenize speech, tokenize images, we can tokenize articulation. This is no different than speech, right? We can tokenize proteins moving. That's no different than speech, okay? We can tokenize all these different things. We can tokenize physics, then we can understand its meaning, just like we've understood the meaning of words. If we can understand its meaning and we can connect it to other modalities, then we can do generative AI. So I'll just explain very quickly that 12 years ago, I saw it, our company saw it with ImageNet. The big breakthrough was literally 12 years ago, we said, "Huh, interesting, but what are we actually looking at? Interesting, but what are you looking at?" ChatGPT, I would say everybody should say, interesting, but what are we looking at? What are we looking at? 
We are looking at a computer software that can emulate you, emulate us. By reading our words, it's emulating the production of our words. Why -- if you can tokenize words, and if we could tokenize articulation, for example, why can't it imitate us and generalize it in a way that ChatGPT has. So the ChatGPT moment for robotics has got to be around the corner. And so we want to enable people to be able to do that. And so we created this operating system that enables these AIs to be able to practice in a physically based world, and we call it Omniverse. Omniverse is not a tool. Omniverse is not even an engine. Omniverse are APIs, technology APIs that supercharge other people's tools. And so I'm super excited about the announcement with Dassault. They're using -- they're connecting to Omniverse API to supercharge 3DEXCITE. Microsoft is connected to Power BI. Rockwell has connected it to their tools for industrial automation. Siemens has connected too there. So it's a bunch of APIs that is physically based and it produces image or articulation or -- and it connects a whole bunch of different environments. And so we -- these APIs are intended to supercharge third-party tools. And I'm super delighted to see the adoption across it, particularly in industrial automation. And so those are the five things that we did. I'll do this next one very quickly. I'm sorry, I took longer than I should. But let me do this next one really quickly. Look at that. All right. So this chart, don't over-stare at it, but it's basically it communicates several things. On top are developers. NVIDIA is a market maker, not share taker. The reason for that is everything we do doesn't exist when we started doing it. There is no such -- you just go up and down. In fact, even in -- originally, 3D computer games didn't exist when we started working on it. And so we had to go create the algorithms necessary for -- real-time ray tracing did not exist until we created it. 
And so all of these different capabilities did not exist until we created it. And once we created it, there are no applications for it. So we had to go cultivate and work with developers to integrate this technology we have just created so that applications could be benefited by it. I just explained that for Omniverse. We invented Omniverse. We didn't take anything from anybody, didn't exist. And in order for it to be useful, we now have to have developers, Dassault, ANSYS, Cadence, so on and so forth. Does that make sense? Rockwell, Siemens. We need the developers to take advantage of our APIs, our technologies. Sometimes they're in the form of an SDK. In the case of Omniverse, I'm super proud that it's in the form of cloud APIs because now it's so easy to use. You could use it in both ways, but APIs are much, much easier to use, okay? And we host Omniverse in the Azure cloud. And notice, whenever we connect it to a customer, we create an opportunity for Azure. So Azure is on the foundation. They're a system provider. Back in the old days, system providers used to be OEMs, and they continue to be. But system providers on the bottom, developers on top, we invent technology in the middle. The technology that we invent happens to be chip last. It's software first. And the reason for that is without a developer, there will be no demand for chips. And so NVIDIA is an algorithm company first, and we create these SDKs. They call them DSLs, domain-specific libraries. SQL is a domain-specific library. You might have heard that Hadoop is a domain-specific library, in-storage computing. NVIDIA's cuDNN is potentially the most successful domain-specific library short of SQL the world's ever seen. cuDNN is the domain-specific library, it's computation engine library for deep neural networks. Without DNN, none of them would have been able to use CUDA. So DNN was invented. Real-time ray tracing optics, which led to RTX, makes sense? 
And we have hundreds of domain-specific libraries. Omniverse is a domain-specific library. And these domain-specific libraries are integrated with developers on the software side, which then when the applications are created and there's demand for that application, creates opportunities for the foundation below. We are market makers, not share takers. Does that make sense? And so what is that -- what is the takeaway? The takeaway is you can't create markets without software. It has always been the case. That has never changed. You could build chips to make software run better, but you can't create a new market without software. What makes NVIDIA unique is that we're the only chip company, I believe, that can go create its own market, and notice all the markets we're creating. That's why we're always talking about the future. These are the things we're working on. We really -- nothing would give me more joy to work with the entire industry to create the computer-aided drug design industry, not drug discovery industry, drug design industry. We had to do drug design the way we do drug chip design, not chip discovery. And so I expect every single chip next year to be better than the one before, not as if I'm looking for truffles, which is discovery. Some days are good, some days are less good, okay? All right. So we have developers on top. We have our foundation on the bottom. The developers want something very, very simple. They want to make sure that your technology is performing, but they have to solve the problem that they couldn't solve any other way. But the most important thing for a developer is installed base. And the reason for that is they don't sell hardware, their software doesn't get used if nobody has the hardware to run it, okay? So what developers want is installed base. That has not changed since the beginning of the time, it's not changed now. Artificial intelligence. 
If you develop artificial intelligence software and you want to deploy so that people could use it, you need installed base. Second, the systems companies, the foundation companies, they want killer apps. That's the reason why killer app word existed. Because where there's a killer app, there's customer demand; where there's customer demand, you can sell hardware. And so it turns out, this loop is insanely hard to kick-start. And how many accelerated computing platforms can you really, really build? Can you have an accelerated computing platform for generative AI as well as industrial robotics, as well as quantum, as well as 6G, as well as weather prediction as well. And you can have all these different versions because some of it is good at fluids, some of it is good at particle, some of it is good at biology, some of it is good at robotics, some of it is good at AI, some of it is good at SQL. The answer is no. You need a general, sufficiently general-purpose accelerated computing platform, just as the last computing platform was insanely successful because it ran everything. Now NVIDIA, it's taken us a long time, but we basically run everything. If your software is accelerated, I am very certain it runs on NVIDIA. Does that make sense? Okay? If you have accelerated software, I am very, very certain it runs on NVIDIA. And the reason for that is because it probably ran on NVIDIA first. Okay? All right. So this is the NVIDIA architecture. I spoke about whenever I give keynotes, I tend to touch on all of them, different pieces of it, some new things that we did in the middle; in this case, Blackwell. I spoke about there were so many good stuff and you really had to go to our talks. There's like 1,000 talks. 6G research. How's 6G going to happen? Of course, AI. Of course, AI. And what are you going to use the AI for? Robotic MIMO. Why is MIMO so preinstalled, meaning that why does the algorithm come before the site? 
We should have site-specific MIMO just like robotic MIMO. And so reinforcement learning and it deals with the environment. And so 6G, of course, is going to be software defined. Of course, it's going to be AI. Quantum computing, of course, we should be a great partner for the quantum computing industry. How else are you going to drive a quantum computer to have the world's fastest computer sitting next to it? And how are you going to simulate a quantum computer, emulate a quantum computer? What is the programming model for a quantum computer? You can't just program a quantum computer all by itself. You need to have classical computing sitting next to it. And so the quantum would be kind of a quantum accelerator. And so that -- who should go do that? Well, we've done that. And so we work with all the industry on that. So across the board, some really, really great stuff. I wish I could have covered -- we could have a whole keynote just on all that stuff, but we covered the whole gamut. Okay? So that was kind of yesterday. Thank you for that.
Colette Kress
executive: Okay. We have them going around, and we'll see if we can grab your questions.
Jensen Huang
executive: That was the question that I'm sure first question goes. If you could have done the keynote in 10 minutes, why didn't you just do yesterday in 10 minutes? Good question.
Benjamin Reitzes
analyst: Ben Reitzes with Melius Research. So I wanted to ask you a little bit more about your vision with software. You are creating industries, you have a full-stack approach. It's clear your software makes your chips run better. Do you feel that your software business over the long term could be as big as your chip business is? How do you look at -- if we look in 10 years, are you -- and you're not a chip company. But what do you think you look like? Given what you're seeing with the momentum in software and how you're building these industries, it would seem like you're going to be a lot more.
Jensen Huang
executive: Yes. Thank you, Ben. I appreciate that. First of all, I appreciate all of you coming. This is a very, very different type of an event, as you know. Most of the talks are software talks, and they're all computer scientists and they're talking about algorithms. What NVIDIA -- the NVIDIA software stack is about two things. It's either algorithms that help the computer run better. TensorRT-LLM is an insanely complicated algorithm, and it explores the computing space in a way that most compilers never have to do. And TensorRT-LLM can't even be built without a supercomputer. And it's very likely that TensorRT in the future -- TensorRT-LLM in the future actually just has to run on the supercomputer all the time in order to optimize AIs for everybody's computer. And so that optimization problem is very, very complicated. So that would be an example of software that we create, the optimization, the run time. The second software we create is whenever there's an algorithm where the known -- the principal algorithm is well known, for example, Navier-Stokes, or Schrödinger's equation. However, maybe the expression of it in a supercomputing or accelerated computing or a real-time way, ray tracing is a great example, real-time way has never been discovered. Does it make sense? Okay? And so as you know, Navier-Stokes is an insanely complicated algorithm. And to be able to refactor that in a way that can run in real time is insanely complicated as well and requires a lot of invention. And some of the inventions, some of our computer scientists in our company have Oscars. They're award-winning computer scientists because they've solved these problems at such a large scale, that you use it for movies. And so their inventions are -- their algorithms, their data structures are computer science in itself, okay? And so we'll dedicate ourselves to these two layers.
And then when you package it all -- back in the old days, that's useful for entertainment, media and entertainment, science, so on and so forth. But today, because AI has brought this technology so close to application, simulating molecules used to be a thing that you do in universities. Now you're going to do that at work. And so as we now reformulate all of these algorithms for the consumption of enterprise, it becomes enterprise software, enterprise software like nobody has ever seen before. We call them -- we're going to put them in NIMs, these packages. We'll have hundreds of them, and we'll manufacture these things and support them and maintain them and keep them performant and so on to support customers with it. And so we'll produce NIMs at a very large scale is my guess. And this is going to be -- we call that underneath, the entire bucket of software, we call it NVIDIA AI Enterprise. A NIM is basically an AI in a microservice for enterprise. And so my expectation is that this is going to be a very large business. And this is the part of the industrial revolution. If you saw that there's the IT industry, okay, today, SAPs and great companies, ServiceNows and Adobes and Autodesk and [indiscernible], that layer, that's today's IT industry. That's not where we're going to play. We're going to play on the layer above. That layer above is a bunch of AIs, and these algorithms really, we're the right company to go build them. And so we'll build some with them, we'll build some ourselves, but we'll package them up and deploy it at enterprise scale, okay? And so I appreciate you asking the question. And while she's walking there, go ahead. Yes.
Vivek Arya
analystVivek Arya from Bank of America Securities. So Jensen, my question is perhaps a little more near to medium term, which is just the size of the addressable market, because your revenues have gotten big so quickly. And when I look at how much they represent as a percentage of the spending of some of your large customers, they are like 30%, 40%, 50%, right, sometimes more. But when I look at how much money they are generating from generative AI, it's like less than 10% of their sales. So how long can this gap persist, right? And then more importantly, are we kind of midway through how much of their spending can be spent on your products? So just I think in the past, you have given us kind of a $1 trillion market going to $2 trillion. If you could just educate us on how large the market is. And where are we in that adoption curve based on how much it can be -- based on how much it's being monetized in the near to medium term?
Jensen Huang
executiveOkay. I'm going to first give you the super condensed version, and I'll come back and work it out, okay? So the answer for how big the market is, how big we can be has to do with the size of the market and what we sell. Remember, what we sell is a data center, I just broke it into parts. But in the end, I sold the data center. Notice that the last image you saw at the keynote, it's a reminder of what we actually sell. We showed a bunch of chips, but remember, we don't really sell that. The chips don't work all by themselves. You can buy the chips, but they don't work. You need to build them into a system, and most importantly, the system software and the ecosystem stack is really complicated. And so NVIDIA builds entire data centers for AI. And we just break it up into parts so that it fits into your company. So that's number one. What do we sell and what's the opportunity? The opportunity for the world today, the data center size is $1 trillion, right? And it's $1 trillion worth of installed base, $250 billion a year. We sell an entire data center in parts. And so our percentage of that $250 billion per year is likely a lot, lot, lot higher than somebody who sells a chip. It could be a GPU chip or a CPU chip or a networking chip. That opportunity hasn't changed from before. But what NVIDIA makes is an accelerated computing platform, data center scale, okay? And so our percentage of $250 billion will likely be higher than the past. Now second question, how sustainable is it? There are two answers for that. One reason that you buy NVIDIA is for AI. If you would just build TPUs, if your GPU is only used for one application, then you have to hang your hat on 100% of that. What can you monetize of AI today? Token generation returns. 
However, if your value proposition is not just AI token generation, but also AI training the model and, very importantly, reducing the cost of computing, accelerated computing, sustainable computing, energy-efficient computing, that's what NVIDIA does for a living, at its core. It's just that we did it so well that generative AI was created, okay? And now people forgot that. It's a little bit like how our first application was computer graphics and the first application was games. We did that so well and we did it so passionately, people forgot we're an accelerated computing company. They thought, "Hey, you're a gaming company." And a whole generation of young people grew up, and they learned on RIVA 128 and they went to college with GeForce and then when they finally became adults, they thought you were a gaming company. And so we just do -- we do accelerated computing so well. We do AI so well. People think that, that's all we do. But accelerated computing, it's $250 billion a year. $250 billion a year should go to accelerated computing with or without AI just for the sake of sustainable computing, just to process SQL, which is, as you guys know, one of the largest consumers of computing in the world, okay? And so I would say $250 billion a year should go to accelerated computing, no matter what. And then on top of that is generative AI. How sustainable do I think generative AI is going to be? You know how I feel about it. I think we're going to be generating words, images, videos, proteins, chemicals, kinetic action manipulation. We're going to be generating forecasts. We're going to be generating bill plans. We're going to be generating bill of materials. We're going to be generating, the list goes on.
Stacy Rasgon
analystStacy Rasgon, Bernstein Research. I wanted to ask about the interplay between CPUs and GPUs. Most of the benchmarks, if not all of them that you showed yesterday, were really run on the Grace Blackwell system that had, I guess, 2 GPUs and 1 CPU, sort of doubled the GPU per CPU ratio versus Grace Hopper. You didn't talk a lot about benchmarks relative to the stand-alone GPUs. Is this a shift -- are you guys looking for much more CPU content, I guess, in these AI servers going forward? And then how do I think about the interplay between the Arm CPUs that you're developing and x86? Seems like you're putting a little less emphasis on the x86 side of things going forward.
Jensen Huang
executiveYes. Stacy, I appreciate the question. There's actually zero concern about either one of them. I think x86 and Arm are both perfectly fine for data centers. There's a reason why Grace is built the way it is. Grace is built in such a way -- the benefit of Arm is that we could mold the NVIDIA system architecture around the CPU so that we could create this thing called chip-to-chip. The NVLink that connects between the GPU and the CPU, we can make the two sides coherent, meaning when the CPU touches a register, it invalidates the same register on the GPU side. As a result, the two sides can work together on one variable coherently. You can't do that today between x86 and peripherals. And so we were able to solve some problems that we couldn't solve otherwise. And as a result, Grace Hopper is insanely great for CAE applications, which is multi-physics. Some of it is running on CPUs, some of it is running on GPUs. It's insanely great for different combinations of CPU and GPUs so that we can have very large memories associated with each maybe 1 GPU or 2 GPU coherently. And so we can solve some of these problems, data processing, for example, insanely great on Grace Hopper, okay? And so it's just harder to solve, not because the CPU itself, but because we couldn't adapt the system. Second, the reason why I showed -- I will say that there was one chart where I showed Hopper versus Blackwell on x86 systems, B100, B200, and then also GB200, which is the Grace Blackwell. The benefit of Grace Blackwell in that case wasn't because the CPU is better, it's because in the case of Grace Blackwell, we were able to create a larger NVLink domain. And that larger NVLink domain is really, really important for the next generation of AI. The next 3 years, the next 3, 5 years, which is as far as we can see right now, if you really want a good inference performance, you're going to need NVLink. That was the message I was trying to deliver. And we're going to talk more about this. 
It's abundantly clear now, these large language models, they're never going to fit on one GPU, okay? That's not the point anyways. And in order for you to be sufficiently responsive and have high throughput to keep the cost down, you need a lot more GPUs than what you even fit in. And in order to have a lot of GPUs working together without the overhead, the IO overhead getting in the way, you need NVLink. NVLink's benefit in inference. Everybody always thought NVLink's benefit is in training. NVLink's benefit in inference is off the charts. That's the difference between 5x and 30x. That was another 6x. It's all NVLink. NVLink and the new Tensor Core, excuse me, okay? And so the -- Grace gives us the ability to architect the system exactly as we need it, and it's harder to do it with x86. That's all. But we support both. We'll have 2 versions of both. And in the case of B100, it just slides into where H100 and H200 go. And so the adoption, the transition from Hopper to Blackwell is instantaneous. The moment it's available, you just slide it in, and then you can figure out what to do about the next data center, okay? So we get the benefit of extremely excellent performance at its limit of the architecture as well as easy-peasy transition.
Matthew Ramsay
analystIt's Matt Ramsay from TD Cowen. I wanted, Jensen, for you to comment on a couple of topics that I've been noodling on, one of which is NIMs that you guys talked about yesterday. It seems like a vertical-specific accelerant for people to get into AIE and onboard customers more quickly. I wonder if you could just give us an overview of how your company is going at broader enterprise and just what different vehicles there are for people to onboard into AI? The second topic is on power. My team has been spending a good bit of time on power. I'm trying to decide if I should spend more time there or less. Some of the systems you introduced yesterday are up to 100 kilowatts or more. I know that scale of computing couldn't be done without the integration that you guys are doing, but also we're getting questions on power generation at the macro level, power delivery to the cabinet at that density. I just would love to hear your thoughts about how your company is working with the industry to power these systems.
Jensen Huang
executiveOkay. I'll start with the second first. Power delivery, 100 kilowatts, as you know, for computers is a lot, but 100 kilowatts is a commodity. You guys know that, right? The world needs a lot more than 120 kilowatts. And so the absolute amount of power is not an issue. The delivery of the power is not an issue. And the physics of delivering the power is not an issue, and cooling 120 kilowatts is not an issue. We can all agree on that, okay? And so none of this is a physics problem. None of this requires invention. All of it requires supply chain planning. Makes sense? So that's the way -- and how big of a deal is supply chain planning? A lot. I mean we take it very seriously. And so we think about supply chain planning all the time. And you've got to go -- the reason why we have great partnerships with -- if you -- I think if you look at Vertiv, I think the front page is a paper that we wrote together. So Vertiv, NVIDIA engineers working on cooling systems, okay? And so Vertiv is very important in the supply chain of designing liquid-cooled and otherwise, data centers. We have great partnerships with Siemens. We have great partnerships with Rockwell. Schneider, for all good reasons. This is exactly the same as having great partnerships with TSMC and Samsung and SPIL and Wistron and so on and so forth. And so we're going to have to go -- our company's supply chain relationships are quite broad and quite deep. And the fact that we build our own data centers really helps with that. We've been building supercomputers now for quite some time. This is not our first time. Our first supercomputer was DGX-1 in 2016, that kind of puts it in perspective, and we built 1 every year. And this year, we're building several. And so the fact that we're building, it gives us tactile sensation of who we're working with, who are the best, and we do it for that very reason, one of the reasons for that. On NIMs, there are 2 ways to onboard into enterprise. 
There's the most impactful way and then there's the other way, okay? They're both important. The other -- I'll start with the other. The other way is that we're going to create these NIMs. We're going to put it on our website, and we're going to go through GSIs and a lot of solution providers, and they're going to help companies turn these NIMs into applications. And that's going to have a whole thing, okay? And so that go-to-market includes large GSIs and smaller specialized GSIs and so on and so forth, okay? We have a lot of partnerships in that area. The other area that I think is really quite exciting, and I think that this is really where big action is going to happen is the $1 trillion of enterprise companies in the world. They create tools today. In the future, they're going to offer you tools plus copilots. Remember, the single most pervasive tool in the world is Office, and now there are Copilots for Office. There are other tools that are super important to NVIDIA: Synopsys, Cadence, ANSYS. We would like to have copilots for all of them. Notice, we're building copilots for our own tools. We call them ChipNeMo, and ChipNeMo is super smart. And ChipNeMo now understands NVIDIA lingo, NVIDIA chip talk, and it knows how to program NVIDIA programs. And so every engineer that we hire, the first thing we're going to tell them is, here's ChipNeMo and then there's the bathroom, and then there's the cafeteria, and so on, in that order. And so they'll be productive right away. While they're eating lunch, they could -- ChipNeMo could be doing some stuff. And so that just gives you an example. But we have copilots that are being built on top of our own tools all over the place. Most companies probably can't do this, and we can teach the GSIs to do this. But in the area of these tools, Cadence and others, they're going to build their own copilots, and they will rent them out as -- hire them out as engineers. I think they're sitting on a gold mine. 
SAP is going to do that. ServiceNow is going to do that, and they're very specialized copilots. They understand languages like in the case of SAP, ABAP, is that right, which is a language that only an SAP lover would love. And as you know, ABAP is a very important language for the world's ERP systems. Every company runs on it. We use ABAP. And so now they have to go create a ChatABAP and that ChatABAP just like ChipNeMo, or ChatUSD that we created for Omniverse. And so Siemens will do that. Rockwell will do that, so on and so forth. Does that make sense? And that, I think, is another way you get to enterprise. And that ServiceNow is going to do that. Lots and lots of copilots that they're building. And that's how they can create another industry on top of their current industry. It's almost like an AI workforce industry. Yes. I'm super excited about the partnerships we have with all of them. I'm so excited for them. Every time I see them, I'm just -- I tell them, [ Andrew ], you're sitting on a gold mine, [indiscernible], you're sitting on a gold mine. I mean I'm so excited for them.
Timothy Arcuri
analystIt's Tim Arcuri at UBS. I had a question also about the TAM. And it's more greenfield versus brownfield because up until now, H100 was pretty much all greenfield. So people weren't taking A100s and ripping them out and replacing them with H100s. Could B100 be the first time where you see some brownfield upgrades, where we go in and we rip out A100s and we replace them with B100s so that maybe the TAM, if the $1 trillion goes to $2 trillion, you have a 4-year replacement cycle, you're talking about $500 billion, but much of that growth comes from upgrading the existing installed base? Wondering if you can comment on that.
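The arithmetic behind the figures in this question can be checked with a short sketch. It assumes only the numbers quoted on the call (a $1 trillion installed base growing to $2 trillion, and the analyst's 4-year replacement cycle); the function name `annual_spend` is hypothetical and used here for illustration.

```python
# Back-of-the-envelope check of the market figures quoted on the call.
# The 4-year replacement cycle is the analyst's assumption, not a reported number.

TRILLION = 1e12
BILLION = 1e9


def annual_spend(installed_base_usd: float, replacement_years: float) -> float:
    """Yearly spend needed to refresh an installed base over one cycle."""
    return installed_base_usd / replacement_years


today = annual_spend(1 * TRILLION, 4)   # $1T installed base
future = annual_spend(2 * TRILLION, 4)  # $2T installed base

print(f"today:  ${today / BILLION:.0f}B per year")
print(f"future: ${future / BILLION:.0f}B per year")
```

Under those assumptions, the $1 trillion base matches the $250 billion a year cited earlier in the call, and the $2 trillion base gives the $500 billion the analyst mentions.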
Jensen Huang
executiveYes, a really good question. Today, we are upgrading the slowest computers in the data center, which will be the CPUs. And so that's what should happen. And then eventually, you'll get around to the Amperes, and then you get around to the Hoppers. I do believe that in 5, 6, 7, 8 years, you're going to -- we're going to be in a pick your year out there. I'm not picking one. I'm just saying, in the outer years, you're going to start seeing replacement cycles, obviously, of our own infrastructure. Yes, but I wouldn't think that, that's the best utilization of capital at the moment. Amperes are super productive, as you know.
Brett Simpson
analystIt's Brett Simpson here at Arete Research, and thanks for hosting a great event this last couple of days. My question was on inference. I wanted to get your perspective on, you put up some good performance numbers with the B100 in terms of how inference compares with H100. How -- what's the message you're giving to customers on cost of ownership around this new platform? And how do you think it's going to compare with ASICs or other inference platforms in the industry?
Jensen Huang
executiveI think for language models, large language models, Blackwell with the new transformer engine and NVLink is going to be very, very, very hard to overcome. And the reason for that is the dimensionality of the problem is so large. And TensorRT-LLM, this exploration tool, this optimization compiler that I talked about, the architecture underneath the Tensor Cores is programmable. NVLink allows you to connect a whole bunch of GPUs working in tandem with very, very low overhead, basically no overhead, okay? And so as a result, 64 GPUs is the same as 1 programmatically. It's incredible. And so when you have 64 GPUs without overhead -- but without NVLink, if you have to go over the network like Ethernet, it's over. You can't do it. You just wasted everything. And because they all have to communicate with each other, it's called all-to-all. Whenever all have to communicate with each other, the slowest link is the bottleneck, right? It's no different than having a city on one side of the river, having a city on the other side of the river, that bridge, that's it. That's the throughput during -- that defines the throughput, okay? And that bridge will be Ethernet. On one side is NVLink, on the other side is NVLink, Ethernet in the middle makes no sense, so we have to turn that into NVLink. And so now we have all of the GPUs working together, generating tokens one at a time. Remember, the tokens cannot be -- it's not as if you splat out a token because the tokens, the transformer has to generate the tokens one at a time in sequence. And so this is a very complicated parallel computing problem, okay? And so I think the -- I think Blackwell has raised the bar a lot, just mountains, utterly mountains, ASIC or otherwise.
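The "slowest link is the bottleneck" point above can be illustrated with a toy model. This is a minimal sketch, not a description of NVLink or of any real collective library: the function `all_to_all_seconds`, the bandwidth figures, and the payload size are all illustrative assumptions.

```python
# Toy model of an all-to-all exchange: every GPU sends the same amount of
# data to every other GPU over its own link, and the exchange is finished
# only when the slowest GPU has finished. All numbers are placeholders.


def all_to_all_seconds(link_gb_per_s: list[float], gb_per_peer: float) -> float:
    """Time for one exchange, set by the slowest participant's link."""
    peers = len(link_gb_per_s) - 1
    return max(peers * gb_per_peer / bandwidth for bandwidth in link_gb_per_s)


# 64 GPUs that all share one fast fabric (hypothetical 900 GB/s links).
uniform = [900.0] * 64

# The same 64 GPUs, but one of them sits behind a slower 50 GB/s "bridge".
mixed = [900.0] * 63 + [50.0]

fast = all_to_all_seconds(uniform, gb_per_peer=0.1)
slow = all_to_all_seconds(mixed, gb_per_peer=0.1)

print(f"uniform fabric: {fast * 1000:.1f} ms per exchange")
print(f"one slow link:  {slow * 1000:.1f} ms per exchange")
print(f"slowdown:       {slow / fast:.0f}x")
```

In this model a single slow link slows the whole exchange by the ratio of the two bandwidths, even though the other 63 links are unchanged. Because tokens are generated one at a time in sequence, that cost would be paid again at every step.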
Christopher Muse
analystC.J. Muse with Cantor. Question on your pricing strategy. Historically, you talked about the more you buy, the more you save. But it sounds like initial pricing on Blackwell is coming in at perhaps maybe a lower premium than the productivity that you're offering. So curious, as you think about maybe razor, razor blade and selling software and the full system, how that might cause you to kind of evolve your pricing strategy? And how we should think about kind of normalized margins within that construct?
Jensen Huang
executiveThe pricing that we create always starts from TCO. I appreciate that comment, C.J. The -- we always come from TCO. However, we also want to have the TCO not of the main body of customers. And so when the customers -- when you only have one particular domain of customers, let's say, it's molecular dynamics, then if it's only one application, then you set the TCO based on that one application. It could be a medical imaging system. And all of a sudden, the TCO is really very, very high, but the market size is quite small. In every single generation that goes by, our market size is growing, isn't that right? And we want to make the entire market be able to afford Blackwell. And so in a way, it's kind of a self-curing problem. As we solve for the TCO for a much larger problem, larger market, then some customers would get too much value, if you will, but that's okay. But you're making the business simpler having one basic product, and you're able to support a very, very large market. Now over time, over time, if the market were to bifurcate, then we can always segment. But that's -- we're nowhere near that today. And so I think we have the opportunity to create a product that delivers extraordinary value for many and extremely good value for all, and that's our purpose. Okay, yes.
Joseph Moore
analystJoe Moore from Morgan Stanley. It seems like the most impressive specs that you showed were around GB200, which you just described as a function of having that bigger NVLink domain. Can you contrast what you're doing with GB200 with what you did with GH200 and why you think it could be a much bigger product this time around?
Jensen Huang
executiveGreat question. The simple answer is GH100, GH200, Grace Hopper, before it could really take off significantly, Grace Blackwell is already here. And Grace Hopper had the additional burden that Hopper didn't have. Hopper fit right into where Ampere left off. A100s went to H100s. They're going to go to B100s, so on and so forth. And so that particular chassis or that particular use case is fairly well established, and we'll just keep on moving. Software is built for it. People know how to operate it, so on and so forth. Grace Hopper is a little different. And it addressed a new class of applications that we didn't address very well before. And I was mentioning some of it earlier. Multi-physics problems where the CPU and GPU have to work closely together, very large datasets, so on and so forth. Difficult to parallelize, for example, those kind of problems. Grace Hopper was really good for it, okay? And so we started developing software for that. My recommendation for most customers is at this point, just gear for Grace Blackwell. And I have given them that recommendation. And so everything that they do with Grace Hopper will be completely architecturally compatible. That's the wonderful thing. And so whatever they have, whatever they buy is still fantastic. But I would recommend that they put all their energy into Grace Blackwell because it's so much better.
Unknown Attendee
attendeeI want to ask a question on robotics. It seems like every time we come back to GTC, you sneak something in at the end and in a couple of years we go, "Wow, he's been talking about that for a while." I heard this week, you guys mentioned that robotics may be getting close to its ChatGPT moment. Can you describe what that means, and where you start to see that robotics evolution in kind of like our day-to-day lives? That would be super helpful.
Jensen Huang
executiveOkay. Several things. First of all, I appreciate that. I showed Earth-2, 2 years ago. And 2 years later, we have this new algorithm that is able to do regional weather prediction at 3 kilometers. The supercomputer you need to do that is 25x larger -- excuse me, 25,000 times larger than the one that you currently use to do weather simulations at NOAA, in Europe and so on and so forth. Three kilometer resolution is very, very, very high resolution, if you will, right above your head, okay? And weather simulation also requires a whole lot of what's called ensembles because the world is chaotic. And you want to simulate a lot of distribution, sample a lot of different parameters, a lot of different perturbations and try to figure out what is that distribution and that the middle of that distribution likely is going to be the weather pattern. Well, if it takes that much energy just to do it one time, they're not going to do it more than one time. But in order to predict where weather is going to be a week from now, especially extreme weather that can change so dramatically, you're going to need a lot of what they call members, a lot of ensemble members, a lot of samplings. And so you're basically doing -- we're basically doing weather simulation 10,000 times, okay? And because we trained an AI to understand physics and it's physically plausible and you can't just -- it can't hallucinate, and so it has to understand the laws of physics and such. And so I just -- 2 years ago, I showed it. Today, we connected into the most trusted source of weather in the world, The Weather Company, okay? And so we're going to help people do regional weather all over the world. If you're a shipping company and you need to know weather conditions, if you're an insurance company, you need to know weather conditions. If you're in the Southeast Asia region, you have so many hurricanes and typhoons and things like that, you need some of this technology. 
And so we're going to help people adapt it for their region and their use case. Well, I did that a couple of years ago. The ChatGPT moment kind of works like this. Take a step back and ask yourself what happened with ChatGPT. The technology is insanely great, okay? It's really incredible. But there are several things that happened. One, it learned from a whole lot of human examples. We wrote the words, right? It was our words. So it learned from our human examples, and it generalized it, so it's not repeating back the words. So it can understand the context and it can generate original form. It understood the context, means that it adapted to itself, okay? Or it adapted to the current circumstance, the context. And then the third thing is it could now generate original tokens. Now I'm going to take everything back into tokens. Forget words, just tokens now, use all the same words that I just used, but replace words with tokens. If I could just figure out how to communicate with this computer, what this token means, okay? If I can just tokenize this, just as when you do speech recognition, you tokenize my sound, my voice. Just as we -- when we reconstructed proteins, we tokenize the amino acids. You can tokenize almost everything you can digitize, a simple way of representing each chunk of the data, okay? So once you can tokenize it, then you can learn it, we call it learning the embeddings of it, the meanings of it. And so if I can tokenize motion, okay, the world, and I can generalize -- and I can tokenize articulation kinematics and I can learn and generalize it and then generate, I just did the ChatGPT moment, how is it any different? The computer doesn't know. Now of course, the problem space is a lot more complicated because it's physical things. So you need this thing called alignment. And what was the great invention of ChatGPT? Reinforcement learning from human feedback, alignment, is that right? 
So it would try something and you say, "No, that's not as good as this." It would try something else, and you said, "No, that's not as good as this." Human feedback, reinforcement learning, and it keeps -- it takes that reinforcement and improves itself. And so what is Omniverse for? Well, if it's in a robot, then how would you do feedback? And what is feedback about? It's physical feedback, physics feedback. It generated a movement to go pick up a cup, but it tipped the cup over. It needs a reinforcement learning to know when to stop. Does it make sense? And so that feedback system is not human, that feedback system is physics. And that physics simulation feedback is called Omniverse. So Omniverse is reinforcement learning, physical feedback, which grounds the AI to the physical world, just as reinforcement learning human feedback grounds the AI to human values. Are you guys following me? I just described two completely different domains using exactly the same concepts. And so what I've done is I've generalized general AI. And by generalizing it, I can reapply it somewhere else. And so we made this observation some time ago, and we started preparing for this. And now you're going to find that Isaac Sim, which is a gym on top of Omniverse, is going to be super, super successful for just about anybody who is doing these robotic systems. We've created the operating system for robots. I'm sure there's a corporate answer for all the questions you guys ask, but unfortunately, I only know how to answer the one geek way.
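The feedback loop described above, where a simulated physics outcome rather than a human rates each attempt, can be sketched in a few lines. This is a toy illustration only: it does not use Omniverse or Isaac Sim, a simple trial-and-error search stands in for reinforcement learning, and the names `simulate_reach`, `learn_reach`, `CUP_POSITION`, and `TIP_MARGIN` are hypothetical.

```python
# Toy "physics feedback" loop for the cup example. The simulator scores each
# attempted reach; the learner keeps whichever attempt the simulator scores
# best. A real system would use a physics engine and a learned policy.

CUP_POSITION = 1.0  # hypothetical distance from the gripper to the cup
TIP_MARGIN = 0.05   # reaching this far past the cup knocks it over


def simulate_reach(reach: float) -> float:
    """Stand-in for a physics simulator: higher scores are better."""
    if reach > CUP_POSITION + TIP_MARGIN:
        return -10.0                    # the cup tipped over
    return -abs(CUP_POSITION - reach)   # otherwise, closer is better


def learn_reach(start: float = 0.0, step: float = 0.5, rounds: int = 40) -> float:
    """Improve the reach using only the simulator's feedback."""
    reach = start
    for _ in range(rounds):
        candidates = [reach - step, reach, reach + step]
        best = max(candidates, key=simulate_reach)
        if best == reach:
            step /= 2  # no neighbour scored better, so try finer adjustments
        reach = best
    return reach


learned = learn_reach()
print(f"learned reach: {learned:.3f}, score: {simulate_reach(learned):.3f}")
```

The learner never sees `CUP_POSITION` directly. It learns where to stop only from the scores the simulator returns, which is the sense in which the feedback comes from physics rather than from a person.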
Atif Malik
analystAtif Malik from Citigroup. I have a question for Colette. Colette, in your slides, you talked about availability for the Blackwell platform later this year. Can you be more specific? Is that the October quarter or the January quarter? And then on the supply chain readiness for the new products, what about the packaging, particularly the CoWoS-L on the B200, and how are you getting your supply chain ready for the new products?
Colette Kress
executiveYes. So let me start with your second part of the question, talking about the supply chain readiness. That's something that we've been working on for well over a year, getting ready for these new products coming to market. We feel so privileged to have the partners that work with us in developing out our supply chain. We've continued to work on resiliency and redundancy. But also, you're right, moving into new areas, new areas of CoWoS, new areas of memory and just the sheer volume of components and complexity of what we're building. So that's well on its way, and we'll be here for when we are ready to launch our products. So there is also a part of our supply chain as we talked earlier today talking about the partners that will help us with the liquid cooling, the additional partners that will be ready in terms of building out the full data center. So this work is a very important part to ease the planning and the processing to put in all of our Blackwell different configurations. Going back to your first part of the question, which is when do we think we're going to come to market. Later this year, late this year, you will start to see our products come to market. Many of our customers that we have already spoken with talked about the designs, talked about the specs have provided us their demand, desires. And that has been very helpful for us to begin our supply chain work, to begin our volumes and what we're going to do. It's very true though that at the onset of the very first one coming to market, there might be constraints until we can meet some of the demand that's put in front of us. Hope that answers your question.
Atif Malik
analystYes.
Jensen Huang
executiveYes, that's right. And just remember that Hopper and Blackwell, they're used for people's operations and people need to operate today. And the demand is so great for Hoppers, they've -- most of our customers have known about Blackwell now for some time, just so you know, okay? So they've known about Blackwell. They've known about the schedule. They've known about the capabilities for some time. As soon as possible, we try to let people know so they can plan their data centers, and notice the Hopper demand doesn't change. And the reason for that is they have operations they have to serve. They have customers today, and they have to run the business today, not next year.
Pierre Ferragu
analystPierre Ferragu, New Street Research. So like a geeky question on Blackwell.
Jensen Huang
executiveThank you. Thank you.
Pierre Ferragu
analystIt's 2 dies and the 10 terabytes between the 2 dies. Can you tell us about how you achieved that? How much work you've put over the years into being able to achieve that technically like from a manufacturing standpoint? And then how you see the future in your road map looking further away? Do you think we're going to see more and more dies getting together into a single package? So that's one side of my question, which is more like on the chip and the architecture. And the other side is you must be seeing like all these models that are like some men say behind the veil of ignorance. And so can you tell us about what you see and how you see the next generation of models influencing your architecture? And so what's the direction of travel for GPU architecture for data center AI?
Jensen Huang
executiveYes. I'll start with the second. This is one of the great things about being the platform where all AI research is done. And so we get the benefit of seeing everything that's coming down the pike. And of course, all next-generation models are intended to push current generation systems to their limit. And so large context windows, for example, insanely large context windows; state-based vectors; synthetic data generation, essentially models talking to themselves; reinforcement learning, essentially AlphaGo of large language models; tree search, these models are going to have to learn how to reason and do multipath planning. And so instead of one shot, it's a little bit like us thinking we have to work through our plan. And that planning system, the reasoning system, multistep reasoning systems could be quite abstract and the path can be quite long, just like playing Go. And so -- but the constraints are much, much more difficult to describe. And so this whole area of research is super, super exciting. The type of systems that we're going to see in the next several years, a couple to 3 years is unimaginable compared to today for the reasons I described. There is some concern about the amount of Internet data that's available for training these models, but that's just not true. 10 trillion tokens is great, but don't forget synthetic data generation, models talking to each other, reinforcement learning, the amount of data you're going to be generating, it's going to take 2 computers to train each other. Today, we have 1 computer training on data. Tomorrow, it's going to be 2 computers, right? Don't forget, remember, AlphaGo? It's multiple systems competing against -- playing against each other, okay, so that we could do that as quickly as possible. And so some really exciting groundbreaking work around the corner, all right. The one thing that we're certain of is that the scale of these -- the scale of our GPUs, they want to be even bigger.
The SerDes of our company is world class. NVIDIA's SerDes are absolutely the world's best. The data rate and the energy consumed, the data rate, the picojoule per bit, the picojoule per bit in our company is unbelievably good. It is the reason why we're able to do NVLink. Remember, NVLink was because we could not make a chip big enough. And so we connected 8 of them together. This was in 2016. We're on NVLink Gen 5. The rest of the world doesn't even have NVLink Gen 1 yet. NVLink Gen 5 allows us to connect 570 chips -- 576 chips together. They are together as far as I'm concerned. The data center is so big. Does it have to be this close together? No, not at all. And so it's okay to split them up 576 ways. And the SerDes are so low energy anyways. Now we could make even closer chips. Now the reason why we want that is because the software cannot tell the difference. When you break up chips, the algorithm should be: build the largest chip that lithography can make and then put multiple of them together with whatever technology is available to do so. But you start by building the largest chip ever. Otherwise, why didn't we do multi-chip back in the old days? We just kept pushing right, monolithic as power. And the reason for that is because the data rate on chip and the energy on chip allows for the programming model to be as uniform as possible. You don't have these things, it's called, speaking of geeking out, NUMA, non-uniform memory access, right? So you don't have NUMA behavior. You don't have weird cache behavior. You don't have memory locality behavior, which causes the programs to work differently depending on the nodes, the systems they run on. We want our software to run exactly the same wherever they are. And so you start with the biggest chip possible. That's the first Blackwell die. We connected 2 of them together. The technology, 10 terabytes per second is insane. Nobody's ever seen a 10 terabytes per second link before. That's 10 terabytes per second.
And it obviously consumes very little power. Otherwise, it would be nothing but that link. And so we -- you had to solve that, number one. The second thing you had to solve, a question that came up before, was CoWoS. It's the largest CoWoS in the world. Because the first-generation CoWoS was already the largest CoWoS in the world, now the second generation is even larger. The benefit that we have is we're not surprised this time. The volume ramp demand happened fairly sharply last time. But this time, we've had plenty of visibility. And so Colette is absolutely right, we've worked with the supply chain, worked with TSMC very closely. We are geared up for an exciting ramp.
Aaron Rakers
analystAaron Rakers at Wells Fargo. I really appreciate all this detail. I'm actually going to dovetail off this last comment because today, you started the conversation by talking a little bit about Ethernet. And how Ethernet with Ultra...
Jensen Huang
executiveI love Ethernet.
Aaron Rakers
analystYes. So I want to understand a little bit NVLink, 576 GPUs now interconnected together, this idea of the fabric architecture, where does that play relative to the evolution of Ethernet, your Spectrum-4 product, this move to 800 gig? I'm just trying to understand the interplay between those, and whether or not you see NVLink competing with Ethernet in those environments.
Jensen Huang
executiveNo. First, the algorithm is actually very simple. First, build the largest die you possibly can, so big that if you added one more transistor, it would literally fall on the ground. That's algorithm number one. And look at the chips that we build. They're literally the largest; they're at reticle limits. Number two, if possible, connect 2 of them together. You're not going to connect 4 of them together. That's not going to happen. But if you can connect 2 of them together, and that's the Blackwell invention. We now know how to build dies that big. But beyond that, you're going to have all kinds of weird NUMA effects and locality effects. You might as well go to NVLink. And so once you get to NVLink, the question is -- and of course, we're in Gen 5. If you don't have NVLink, then you're kind of stuck, okay? You can't build systems like this. But if you have NVLink, then the next part is build NVLink as large as you can, modulated by power and cost. And that's the reason why NVLink is direct connect, it's direct drive. Not because optical transceivers are out of fashion. Optical, are you kidding me? We love optical. We need optical. We're going to use tons of optical, but you should build the NVLink as large as you can using copper because you could save a lot of power, you could save a lot of money. You can make it scalable, sufficiently scaled. Now you've got one giant chip, 576 GPU chip effectively. But that's only 576 GPU chips. That's not enough. And so we're going to have to connect multiple of them. The next click after that, the best thing you have is InfiniBand. The second best you have is Ethernet with an augmented computing layer on top of it we call Spectrum-X so that we can control the traffic that's in the system so that we don't have these long tails. Remember, as I said, the last one to finish determines the speed of the computer. This is not an average throughput.
This is not like all of us individually are accessing hyperscale and our average throughput is good enough. This is literally the last person who finishes that partial product, who finishes that tensor, everybody else is waiting on him. I don't know who it is in this room that's going to be the last, but we're going to hope that, that person doesn't hold up, right? And so we're going to make sure that, that last one is we push everything to the middle. We only want one answer. It all shows up at the right time, okay? And so that's the second best. And then you scale that out as much as you can. And that's going to need optics and so on and so forth. There's a place for all of it. There's a place for all of it. I think if anybody is concerned about optics, don't be concerned. We're -- I think there's -- the demand for optics is very, very high. Demand for repeaters is very, very high. We didn't change anything about that. All we did was we made computers larger. We made GPUs larger. Can we take one more question? This is so much fun.
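The point above, that the last worker to finish sets the speed of the whole system, can be illustrated with a small Python sketch. It is only an illustration under stated assumptions, not a model of NVLink, InfiniBand, or Spectrum-X: the function names `step_time` and `prob_any_straggler`, the sample timings, and the idea of an independent per-worker straggler probability are all hypothetical and were not part of the call.

```python
from statistics import mean


def step_time(worker_times):
    """Time for one synchronized step: every worker waits for the slowest one."""
    if not worker_times:
        raise ValueError("need at least one worker time")
    return max(worker_times)


def prob_any_straggler(num_workers, p_slow):
    """Chance that at least one worker is slow in a given step.

    Assumes (hypothetically) that each worker is slow independently
    with probability p_slow.
    """
    if num_workers < 1 or not 0.0 <= p_slow <= 1.0:
        raise ValueError("num_workers must be >= 1 and p_slow in [0, 1]")
    return 1.0 - (1.0 - p_slow) ** num_workers


if __name__ == "__main__":
    # Made-up timings: three workers finish in 1.0, one straggler takes 5.0.
    times = [1.0, 1.0, 1.0, 5.0]
    print("average worker time:", mean(times))
    print("synchronized step time:", step_time(times))
    # 576 is the GPU count mentioned in the call; p_slow = 0.01 is an assumption.
    print("P(any straggler):", prob_any_straggler(576, 0.01))
```

With one slow worker among four, the average worker time looks acceptable, but the synchronized step still takes as long as the slowest worker. Under the independence assumption, the chance of at least one slow worker per step also grows quickly with the number of workers, which is one way to read the emphasis on controlling long tails.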
William Stein
analystOne last question from the buy side, Jensen. You've talked a lot about -- oh, I'm sorry.
Jensen Huang
executiveWhere is he? There he is.
William Stein
analystSovereign AI. Is there a way to sort of understand like what you're going to do for the United Arab Emirates? That would be one question. And I guess my second question is like I'm going to go home, I'm going to see my 91-year-old mother. How can I try to explain to some 91-year-old what accelerated computing -- I guess I've got a good -- answer the first question. I'll figure out the second one.
Jensen Huang
executiveOkay. Yes. I don't know what you're going to say on the second one, but on the second one, I would say, use the right tools for the right job. Yes. And right now, general-purpose computing, you're using the same tool for every single job. Literally, what you have is a screwdriver, and you're using it from the moment you wake up to the moment you go to bed. And so you start with you brushing your teeth with a screwdriver. It probably works. I haven't tried it, but probably works. And so you just use that one tool the whole day. Now of course, because you're going to use that one tool for the whole day, over time, humans have gotten pretty smart. And so we made that general-purpose tool. And so now the screwdriver has brushes on it. It's got hair on it. And so then it becomes useful for all kinds of stuff. And you could also use it to clean the bathroom and all that kind of stuff. And so one tool. Was that the answer you were going to give? All right. So we created basically two tools. We said the CPU is incredibly good at sequential things, and what it's not good at is parallel things. Now the parallel things, the weird thing is this. For most applications, let's say, Excel, the parallel part is not very much. That's the reason why CPUs are really the best processor for Excel. For your web browser, except for graphics, which came along later, most web browsers are largely single threaded, okay? Java is largely single threaded. And so for many applications of personal computing it's largely single threaded and the CPU is really quite ideal. And then all of a sudden, there's this new application that came along, computer graphics, video games, where literally, 1% of the code is 99% of the run time. Do you guys understand what I'm saying? 1% of the code is 99% of run time. And the reason for that is because it's computing the pixels one at a time. So 1% of the code is 99% of the run time. And we said, "Ha-ha. Look at that. How interesting."
Why don't we take, go create something that's insanely good at 1% of the run time, meaning it's bad at 99% of the run time -- excuse me, bad at 99% of the code. It's good at 1% of the code. And we just go -- we go create applications or find applications where that 1% of the code is 99% of the run time, molecular dynamics, medical imaging, seismic processing, artificial intelligence, makes sense? That's why accelerated computing, data processing, so on and so forth, where 1% of the code is 99% of the run time. And that's the reason why we get such great speed up. All right. Your...
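The "1% of the code is 99% of the run time" argument above is closely related to what is commonly called Amdahl's law. A minimal Python sketch follows, assuming the run time splits cleanly into an accelerated fraction and an unaccelerated remainder. The function name `overall_speedup` and the 100x accelerator figure are illustrative assumptions, not numbers from the call.

```python
def overall_speedup(accelerated_fraction, accel_factor):
    """Amdahl's-law style estimate of whole-program speedup.

    accelerated_fraction: share of the original run time spent in the part
        that gets accelerated (0.99 in the "99% of the run time" example).
    accel_factor: how many times faster that part runs on the accelerator
        (a hypothetical figure chosen for illustration).
    """
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must be in [0, 1]")
    if accel_factor <= 0.0:
        raise ValueError("accel_factor must be positive")
    remaining = 1.0 - accelerated_fraction          # part left on the CPU
    accelerated = accelerated_fraction / accel_factor  # part sped up
    return 1.0 / (remaining + accelerated)


if __name__ == "__main__":
    # Workload where the hot 1% of the code is 99% of the run time.
    print("99% of run time accelerated:", overall_speedup(0.99, 100.0))
    # Workload where only 10% of the run time can be accelerated.
    print("10% of run time accelerated:", overall_speedup(0.10, 100.0))
```

When the accelerated portion is 99% of the run time, a hypothetical 100x accelerator gives roughly a 50x overall speedup. When it is only 10% of the run time, the same accelerator gives barely more than 1.1x, which matches the point that mostly sequential applications are well served by the CPU.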
Colette Kress
executiveSovereign AI.
Jensen Huang
executiveSovereign AI. Every country has their own natural resource. And that natural resource is called their intelligence. It's in their language. India has their own language. They have many of them. Lots of different dialects. They have their language, their sensibility, their culture, their history, it belongs to them. It belongs to them. It belongs to them. And a lot of it is in their national archives and is digitized. It's not actually on the Internet. It belongs to them. They ought to take that and go create their own sovereign AI, and they believe the same. Sweden is the same way. Japan is going to do the same. You name it. Companies -- countries all over the world realize that this is their natural resource, and they shouldn't let it just be used by anybody to then import their natural resource back to them in an automated way by paying somebody else. Don't let their data go out for free and import AI. They now realize it ought to be the other way around, that they should keep their own data and then export AI. And so export the AI of Korea, export the AI of Malaysia, export the AI of, you name it, Middle East countries. And so we will -- we have export control limitations on our products. And in most of the areas, the answer is it's not export controlled. And if there's any export control, we can still work with the U.S. government and make sure that the export is going to be fine. But we, number one, just make sure that we are compliant with export control. And in some countries, we have to offer degraded products or I didn't say that right, lower specification products. And -- but anyways, number one, just be compliant with export controls and help countries around the world to be able to do this.
William Stein
analystIt sounds like a big market.
Jensen Huang
executiveIt's a very big market. Yes, it's a very big market. There are going to be AIs that are going to be trained and continuously refined for just about every culture in the world. Thank you. Do you guys -- no. No. Thank you very, very much. I appreciate, Colette and I appreciate all of your support and interest in the company. And this is really quite an extraordinary time. It is not usual that we get to live through a time like this, where the single most important instrument of society is being reinvented after 60 years, that a new way of doing software has emerged, and you know that software is one of the most important technologies that humanity has ever created, and that you're in the beginning of a new industrial revolution. And so the next 10 years, you don't -- definitely don't want to miss, all right? Thank you very much.