NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary

June 2, 2024

NASDAQ US Information Technology Semiconductors and Semiconductor Equipment conference_presentation 135 min

Earnings Call Speaker Segments

Unknown Attendee

attendee
#1

[Presentation] [Foreign Language] Please welcome to the stage NVIDIA Founder and CEO, Jensen Huang.

Jensen Huang

executive
#2

[Foreign Language] I'm very happy to be back. Thank you, NTU, for letting us use your stadium. The last time I was here, I received a degree from NTU. And I gave the run, don't walk speech. And today, we have a lot to cover, so I cannot walk. I must run. We have a lot to cover. I have many things to tell you. I'm very happy to be here in Taiwan. Taiwan is the home of our treasured partners. This is, in fact, where everything NVIDIA does begins, our partners and ourselves take it to the world. Taiwan and our partnership has created the world's AI infrastructure. Today, I want to talk to you about several things: one, what is happening and the meaning of the work that we do together; what is generative AI; what is its impact on our industry and on every industry, a blueprint for how we will go forward and engage this incredible opportunity; and what's coming next. Generative AI and its impact, our blueprint and what comes next, these are really, really exciting times, a restart of our computer industry, an industry that you have forged, an industry that you have created, and now you're prepared for the next major journey. But before we start, NVIDIA lives at the intersection of computer graphics, simulations and artificial intelligence. This is our soul. Everything that I show you today is simulation. It's math. It's science. It's computer science. It's amazing computer architecture. None of it's animated, and it's all homemade. This is NVIDIA's soul, and we put it all into this virtual world we called Omniverse. Please enjoy. [Presentation]

Jensen Huang

executive
#3

[Foreign Language] I want to speak to you in Chinese; but I have so much to tell you, I have to think too hard to speak Chinese. So I have to speak to you in English. At the foundation of everything that you saw was 2 fundamental technologies, accelerated computing and artificial intelligence running inside the Omniverse. Those 2 technologies, those 2 fundamental forces of computing are going to reshape the computer industry. The computer industry is now some 60 years old. In a lot of ways, everything that we do today was invented the year after my birth in 1964. The IBM System/360 introduced central processing units, general-purpose computing, the separation of hardware and software through an operating system, multitasking, I/O subsystems, DMA, all kinds of technologies that we use today, architectural compatibility, backwards compatibility, family compatibility, all of the things that we know today about computing largely describe the 1964. Of course, the PC revolution democratized computing and put it in the hands and the houses of everybody. And then in 2007, the iPhone introduced mobile computing and put the computer in our pocket. Ever since, everything is connected and running all the time through the mobile cloud. This last 60 years, we saw several, just several, not that many, actually, 2 or 3 major technology shifts, 2 or 3 tectonic shifts in computing where everything changed, and we're about to see that happen again. There are 2 fundamental things that are happening. The first is that the processor, the engine by which the computer industry runs on, the central processing unit, the performance scaling has slowed tremendously. And yet, the amount of computation we have to do is still doubling very quickly, exponentially. If processing requirement, if the data that we need to process continues to scale exponentially but performance does not, we will experience computation inflation. And in fact, we're seeing that right now as we speak. The amount of data center power that's used all over the world is growing quite substantially. The cost of computing is growing. We are seeing computation inflation. This, of course, cannot continue. The data is going to continue to increase exponentially, and CPU performance scaling will never return. There is a better way. For almost 2 decades now, we've been working on accelerated computing. CUDA augments a CPU, offloads and accelerates the work that a specialized processor can do much, much better. In fact, the performance is so extraordinary that it is very clear now, as CPU scaling has slowed and substantially stopped, we should accelerate everything. I predict that every application that is processing intensive will be accelerated and surely, every data center will be accelerated in the near future. Now accelerated computing is very sensible. It's very common sense. If you take a look at an application and here, the 100t means 100 units of time. It could be 100 seconds. It could be 100 hours. And in many cases, as you know, we're now working on artificial intelligence applications that run for 100 days. The 1t is code that is -- requires sequential processing where single threaded CPUs are really quite essential. Operating systems, control logic, really essential to have one instruction executed after another instruction. However, there are many algorithms, computer graphics is one, that you can operate completely in parallel. Computer graphics, image processing, physics simulations, combinatorial optimizations, graph processing, database processing and, of course, the very famous linear algebra of deep learning, there are many types of algorithms that are very conducive to acceleration through parallel processing. So we invented an architecture to do that. By adding the GPU to the CPU, the specialized processor can take something that takes a great deal of time and accelerate it down to something that is incredibly fast. And because the 2 processors can work side by side, they're both autonomous and they're both separate, independent, that is, we could accelerate what used to take 100 units of time down to 1 unit of time. Well, the speed up is incredible. It almost sounds unbelievable. It almost sounds unbelievable, but today, I'll demonstrate many examples for you. The benefit is quite extraordinary, a 100x speed up but you only increase the power by about a factor of 3, and you increased the cost by only about 50%. We do this all the time in the PC industry. We added GPU, a $500 GPU, GeForce GPU to a $1,000 PC and the performance increases tremendously. We do this in a data center. A $1 billion data center, we add $500 million worth of GPUs and all of a sudden, it becomes an AI factory. This is happening all over the world today. Well, the savings are quite extraordinary. You're getting 60x performance per dollar, 100x speed up. You only increase your power by 3x. 100x speed up, you only increase your cost by 1.5x. The savings are incredible. The savings are measured in dollars. It is very clear that many, many companies spend hundreds of millions of dollars processing data in the cloud. If it was accelerated, it is not unexpected that you could save hundreds of millions of dollars. Now why is that? Well, the reason for that is very clear. We've been experiencing inflation for so long in general purpose computing. Now that we finally came to -- we finally determined to accelerate, there's an enormous amount of captured loss that we can now regain, a great deal of captured retained waste that we can now relieve out of the system, and that will translate into savings, savings in money, savings in energy. And that's the reason why you've heard me say the more you buy, the more you save. And now I've shown you the mathematics. It is not accurate but it is correct, okay? That's called CEO math. CEO math is not accurate, but it is correct. The more you buy, the more you save. Well, accelerated computing does deliver extraordinary results, but it is not easy. Why is it that it saves so much money but people haven't done it for so long? The reason for that is because it's incredibly hard. There is no such thing as a software that you can just run through a C compiler, and all of a sudden, that application runs 100x faster. That is not even logical. If it was possible to do that, they would have just changed the CPU to do that. You, in fact, have to rewrite the software. That's the hard part. The software has to be completely rewritten so that you could refactor, reexpress the algorithms that was written on a CPU so that it could be accelerated -- offloaded, accelerated and run in parallel. That computer science exercise is insanely hard. Well, we've made it easy for the world over the last 20 years. Of course, the very famous cuDNN, the deep learning library that processes neural networks. We have a library for AI physics that you could use for fluid dynamics and many other applications where the neural network has to obey the laws of physics. We have a great new library called Aerial that is a CUDA-accelerated 5G radio so that we can software define and accelerate the telecommunications networks the way that we've software defined the world's networking Internet. And so the ability for us to accelerate that allows us to turn all of telecom into essentially the same type of platform, a computing platform just like we have in the cloud. cuLitho is a computational lithography platform that allows us to process the most computationally intensive parts of chip manufacturing, making the mask. TSMC is in the process of going to production with cuLitho, saving enormous amounts of energy and more enormous amounts of money, but the goal for TSMC is to accelerate their stack so that they're prepared for even further advances in algorithm and more computation for deeper and deeper, narrow and narrow transistors. Parabricks is our gene sequencing library. It is the highest throughput library in the world for gene sequencing. cuOPT is an incredible library for combinatorial optimization, route planning optimization, the traveling salesman problem, incredibly complicated. People have -- well, scientists have largely concluded that you needed a quantum computer to do that. We created an algorithm that runs on accelerated computing that runs lightning fast. 23 world records, we hold every single major world record today. cuQuantum is an emulation system for a quantum computer. If you want to design a quantum computer, you need a simulator to do so. If you want to design quantum algorithms, you need a quantum emulator to do so. How would you do that? How would you design these quantum computers, create these quantum algorithms if the quantum computer doesn't exist? Well, you use the fastest computer in the world that exists today, and we call it, of course, NVIDIA CUDA. And on that, we have an emulator that simulates quantum computers. It is used by several 100,000 researchers around the world. It is integrated into all the leading frameworks for quantum computing, and it's used in scientific supercomputing centers all over the world. cuDF is an unbelievable library for data processing. Data processing consumes the vast majority of cloud spend today. All of it should be accelerated. cuDF accelerates the major libraries used in the world. Spark, many of you probably use Spark in your companies, pandas, a new one called polar and of course, NetworkX, which is a graph processing database library. And so these are just some examples. There are so many more. Each one of them had to be created so that we could enable the ecosystem to take advantage of accelerated computing. If we hadn't created cuDNN, CUDA alone wouldn't have been able -- wouldn't have been possible for all of the deep learning scientists around the world to use because CUDA and the algorithms that are used in TensorFlow and PyTorch, the deep learning algorithms, the separation is too far apart. It's almost like trying to do computer graphics without OpenGL. It's almost like doing data processing without SQL. These domain-specific libraries are really the treasure of our company. We have 350 of them. These libraries is what it takes and what has made it possible for us to have such opened so many markets. I'll show you some other examples today. Well, just last week, Google announced that they put cuDF in the cloud and accelerate pandas. Pandas is the most popular data science library in the world. Many of you in here probably already use pandas, is used by 10 million data scientists in the world, downloaded 170 million times each month. It is the Excel, that is the spreadsheet of data scientists. Well, with just one click, you can now use pandas in Colab, which is Google's cloud data centers platform accelerated by cuDF. The speedup is really incredible. Let's take a look. [Presentation]

Jensen Huang

executive
#4

That was a great demo, right? Didn't take long. When you accelerate data processing that fast, demos don't take long. Okay. Well, CUDA has now achieved what people call a tipping point, but it's even better than that. CUDA has now achieved a virtuous cycle. This rarely happens. If you look at history and all the computing architectures, computing platforms, in the case of microprocessor CPUs, it has been here for 60 years and has not been changed for 60 years. At this level, this way of doing computing, accelerated computing has been around -- has -- creating a new platform is extremely hard because it's a chicken and egg problem. If there are no developers that use your platform then, of course, there will be no users. But if there are no users, there are no installed base. If there are no installed base developers aren't interested in it. Developers want to write software for a large installed base. But a large installed base requires a lot of applications so that users would create that installed base. This chicken or the egg problem has rarely been broken and has taken us now 20 years, 1 domain library after another, 1 acceleration library, another. And now we have 5 million developers around the world. We serve every single industry from health care, financial services, of course, the computer industry, automotive industry, just about every major industry in the world, just about every field of science. Because there are so many customers for our architecture, OEMs and cloud service providers are interested in building our systems. System makers, amazing system makers like the ones here in Taiwan are interested in building our systems, which then takes and offers more systems to the market, which, of course, creates greater opportunity for us, which allows us to increase our scale, R&D scale, which speeds up the application even more. Well, every single time we speed up the application, the cost of computing goes down. This is that slide I was showing you earlier. 100x speedup translates to 97%, 96%, 98% savings. And so when we go from 100x speed up to 200x speed up to 1,000x speed up, the savings, the marginal cost of computing continues to fall. Well, of course, we believe that by reducing the cost of computing incredibly, the market developers, scientists, inventors will continue to discover new algorithms that consume more and more and more computing so that one day, something happens, that a phase shift happens, that the marginal cost of computing is so low that a new way of using computers emerge. In fact, that's what we're seeing now. Over the years, we have driven down the marginal cost of computing in the last 10 years in 1 particular algorithm by 1 million times. Well, as a result, it is now very logical and very common sense to train large language models with all of the data on the Internet. Nobody thinks twice. This idea that you could create a computer that could process so much data to write its own software, the emergence of artificial intelligence was made possible because of this complete belief that if we made computing cheaper and cheaper and cheaper, somebody is going to find a great use. Well, today, CUDA has achieved the virtuous cycle. Installed base is growing. Computing cost is coming down, which causes more developers to come up with more ideas, which drives more demand. And now we're in the beginning of something very, very important. But before I show you that, I want to show you what is not possible if not for the fact that we create a CUDA, that we created the modern version of -- the modern Big Bang of AI, generative AI. What I'm about to show you would not be possible. This is Earth-2. The idea that we would create a digital twin of the Earth, that we would go and simulate the Earth so that we could predict the future of our planet to better avert disasters or better understand the impact of climate change so that we can adapt better so that we could change our habits now, this digital twin of Earth is probably one of the most ambitious projects that the world has ever undertaken, and we're taking step -- large steps every single year, and I'll show you results every single year. But this year, we made some great breakthroughs. Let's take a look. [Presentation]

Jensen Huang

executive
#5

Someday in the near future, we will have continuous weather prediction at every square kilometer on the planet. You will always know what the climate is going to be. You will always know. And this will run continuously because we trained the AI, and the AI requires so little energy. And so this is just an incredible achievement. I hope you enjoyed it. And very importantly, [Foreign Language] The truth is that was a Jensen AI. That was not me. I wrote it, but an AI, Jensen AI had to say it. [Foreign Language] That is a miracle. That is a miracle indeed. However, in 2012, something very important happened. Because of our dedication to advancing CUDA, because of our dedication to continuously improve the performance of drive the cost down, researchers discovered -- AR researchers discovered CUDA in 2012. That was NVIDIA's first contact with AI. This was a very important day. We had the good wisdom to work with the scientists to make it possible for deep learning to happen, and AlexNet achieved, of course, a tremendous computer vision breakthrough. But the great wisdom was to take a step back and understanding what was the background. What is the foundation of deep learning? What is this long-term impact? What is its potential? And we realized that this technology has great potential to scale. An algorithm that was invented and discovered decades ago all of a sudden, because of more data, larger networks and very importantly, a lot more compute, all of a sudden, deep learning was able to achieve what no human algorithm was able to. Now imagine if we were to scale up the architecture even more, larger networks, more data and more compute, what could be possible? So we dedicated ourselves to reinvent everything. After 2012, we changed the architecture of our GPU to add Tensor Cores. We invented NVLink. That was 10 years ago now. cuDNN, TensorRT, NCCL, we bought Mellanox, TensorRT-LLM, the Triton Inference Server, and all of it came together on a brand-new computer nobody understood. Nobody asked for it. Nobody understood it. And in fact, I was certain nobody wanted to buy it. And so we announced it at GTC and OpenAI, a small company in San Francisco, saw it, and they asked me to deliver one to them. I delivered the first DGX, the world's first AI supercomputer to OpenAI in 2016. Well, after that, we continued to scale. From one AI supercomputer, one AI appliance, we scaled it up to large supercomputers, even larger. By 2017, the world discovered transformers so that we could train enormous amounts of data and recognize and learn patterns that are sequential over large spans of time. It is now possible for us to train these large language models to understand and achieve a breakthrough in natural language understanding. And we kept going after that. We built even larger ones. And then in November 2022, trained on thousands -- tens of thousands of NVIDIA GPUs in a very large AI supercomputer, OpenAI announced ChatGPT. 1 million users after 5 days, 1 million after 5 days, 100 million after 2 months, the fastest-growing application in history. And the reason for that is very simple. It is just so easy to use, and it was so magical to use, to be able to interact with a computer like it's human. Instead of being clear about what you want, it's like the computer understands your meaning. It understands your intention. I think, here, it asked the closest night market. As you know, the night market is very important to me. So when I was young, I was -- I think I was 4.5 years old. I used to love going to the night market because I just love watching people. And so we went -- my parents used to take us to the night market. [Foreign Language] And I love going. And one day, my face -- you guys might see that I have a large scar on my face. My face was cut because somebody was washing their knife and I was a little kid. But my memories of the night market is so deep because of that. And I used to love -- I just -- I still love going to the night market, and I just need to tell you guys this. The Tonghua Night Market is really good because there's a lady. She's been working there for 43 years. She's the fruit lady, and it's in the middle between the 2. Go find her. Okay? She -- [Foreign Language]. She's really terrific. I think it will be funny after this, all of you go to see her. She -- every year, she's doing better and better. Her cart has improved, and I just love watching her succeed. Anyways, anyways, ChatGPT came along. And something is very important in this slide. Here, let me show you something. This slide, okay, and this slide, the fundamental difference is this. Until ChatGPT revealed it to the world, AI was all about perception. Natural language understanding, computer vision, speech recognition. It's all about perception and detection. This was the first time the world saw a generative AI. It produced tokens, one token at a time. And those tokens were words. Some of the tokens, of course, can now be images or charts or tables, songs, words, speech, videos. Those tokens could be anything, anything that you can learn the meaning of. It could be tokens of chemicals, tokens of proteins, genes. You saw earlier, in Earth-2, we were generating tokens of the weather. We can learn physics. If you can learn physics, you could teach an AI model physics. The AI model can learn the meaning of physics, and it can generate physics. We were scaling down to 1 kilometer not by using filtering. It was generating. And so we can use this method to generate tokens for almost anything, almost anything of value. We can generate steering wheel control for a car. We can generate articulation for a robotic arm. Everything that we can learn, we can now generate. We have now arrived not at the AI era but a generative AI era. But what's really important is this. This computer that started out as a supercomputer has now evolved into a data center, and it produces one thing. It produces tokens. It's an AI factory. This AI factory is generating, creating, producing something of great value, a new commodity. In the late 1890s, Nikola Tesla invented an AC generator. We invented an AI generator. The AC generator generated electrons. NVIDIA's AI generator generates tokens. Both of these things have large market opportunities. It's completely fungible in almost every industry, and that's why it's a new industrial revolution. We have now a new factory producing a new commodity for every industry that is of extraordinary value. And the methodology for doing this is quite scalable, and the methodology of doing this is quite repeatable. Notice how quickly so many different AI models, generative AI models are being invented literally daily. Every single industry is now piling on. For the very first time, the IT industry, which is $3 trillion, $3 trillion IT industry is about to create something that can directly serve $100 trillion of industry. No longer just an instrument for information storage or data processing but a factory for generating intelligence for every industry. This is going to be a manufacturing industry. Not a manufacturing industry of computers but using the computers in manufacturing. This has never happened before, quite an extraordinary thing. What led -- started with accelerated computing led to AI, led to generative AI and now an industrial revolution. Now the impact to our industry is also quite significant. Of course, we could create a new commodity, a new product we call tokens for many industries but the impact to ours is also quite profound. For the very first time, as I was saying earlier, in 60 years, every single layer of computing has been changed. From CPUs, general purpose computing, to accelerated GPU computing, where the computer needs instructions, now computers process LLMs, large language models, AI models. And whereas the computing model of the past is retrieval-based, almost every time you touch your phone, some prerecorded text or prerecorded image or prerecorded video is retrieved for you and recomposed based on a recommender system to present it to you based on your habits. But in the future, your computer will generate as much as possible, retrieve only what's necessary. And the reason for that is because generated data requires less energy to go fetch information. Generated data also is more contextually relevant. It will encode knowledge. It will encode your understanding of you. And instead of get that information for me or get that file for me, you just say, ask me for an answer. And instead of a tool, instead of your computer being a tool that we use, the computer will now generate skills. It performs tasks and instead of an industry that is producing software, which was a revolutionary idea in the early '90s. Remember, the idea that Microsoft created for packaging software revolutionized the PC industry. Without packaged software, what would we use the PC to do? It drove this industry. And now we have a new factory, a new computer. And what we will run on top of this is a new type of software, and we call it NIMs, NVIDIA Inference Microservices. Now what happens is the NIM runs inside this factory. And this NIM is a pretrained model. It's an AI. Well, this AI is, of course, quite complex in itself, but the computing stack that runs AIs are insanely complex. When you go and use ChatGPT, underneath their stack is a whole bunch of software. Underneath that prompt is a ton of software, and it's incredibly complex because the models are large, billions to trillions of parameters. It doesn't run on just one computer, runs on multiple computers. It has to distribute the workload across multiple GPUs, Tensor parallelism, pipeline parallelism, data -- all kinds of parallelism, expert parallelism, all kinds of parallelism, distributing the workload across multiple GPUs, processing it as fast as possible because if you're in a factory, if you run a factory, your throughput directly correlates to your revenues. Your throughput directly correlates to quality of service, and your throughput directly correlates to the number of people who can use your service. We are now in a world where data center throughput, utilization is vitally important. It was important in the past but not vitally important. It was important in the past, but people don't measure it. Today, every parameter is measured, start time, uptime, utilization, throughput, idle time, you name it because it's a factory. When something is a factory, its operations directly correlate to the financial performance of the company. And so we realize that this is incredibly complex for most companies to do. So what we did was we created this AI in a box and the containers, an incredible amount of software. Inside this container is CUDA, cuDNN, TensorRT, Triton for inference services. It is cloud native so that you could auto scale in a Kubernetes environment. It has management services and hooks so that you can monitor your AIs. It has common APIs, standard APIs so that you could literally chat with this box. You download this NIM, and you can talk to it so long as you have CUDA on your computer, which is now, of course, everywhere. It's in every cloud, available from every computer maker. It is available in hundreds of millions of PCs. When you download this, you have an AI and you can chat with it like ChatGPT. All of the software is now integrated, 400 dependencies all integrated into one. We tested this NIM, each one of these pre-trained models against all kind -- our entire installed base that's in the cloud, all the different versions of Pascal and Amperes and Hoppers and all kinds of different versions. I even forget some. NIMs, incredible invention. This is one of my favorites. And of course, as you know, we now have the ability to create large language models and pre-trained models of all kinds. And we have all of these various versions whether it's language-based or vision-based or imaging-based or -- we have versions that are available for health care, digital biology. We have versions that are digital humans that I'll talk to you about. And the way we use this -- I'll just come to ai.nvidia.com. And today, we just posted up in Hugging Face, the Llama 3 NIM, fully optimized. It's available there for you to try. And you can even take it with you. It's available to you for free. And so you could run it in the cloud, run it in any cloud. You could download this container, put it into your own data center. And you could host it, make it available for your customers. We have, as I mentioned, all kinds of different domains, physics. Some of it is for semantic retrieval called RAGs, vision languages, all kinds of different languages. And the way that you use it is connecting these microservices into large applications. One of the most important applications in the coming future, of course, is customer service agents. Customer service agents are necessary in just about every single industry. It represents trillions of dollars of customer service around the world. Nurses are customer service agents in some ways. Some of them are nonprescription or nondiagnostics-based nurses, are essentially customer service. Customer service for retail, for quick service foods, financial services, insurance, just tens and tens of millions of customer service can now be augmented by language models and augmented by AI. And so these boxes that you see are basically NIMs. Some of the NIMs are reasoning agents, given a task, figure out what the mission is, break it down into a plan. Some of the NIMs retrieve information. Some of the NIMs might go and do search. Some of the NIMs might use a tool like cuOPT that I was talking about earlier. They could use a tool that could be running on SAP, and so it has to learn a particular language called ABAP. Maybe some NIMs have to do SQL queries. And so all of these NIMs are experts that are now assembled as a team. So what's happening? The application layer has been changed. What used to be applications written with instructions are now applications that are assembling teams, assembling teams of AIs. Very few people know how to write programs. Almost everybody knows how to break down a problem and assemble teams. Every company, I believe, in the future, will have a large collection of NIMs, and you would bring down the experts that you want. You connect them into a team. And you don't even have to figure out exactly how to connect them. You just give the mission to an agent, to a NIM to figure out who to break the task down and who to give it to. And that central -- the leader of the application, if you will, the leader of the team would break down the task and give it to the various team members. The team members would do their -- perform their task, bring it back to the team leader. The team leader would reason about that and present an information back to you just like humans. This is in our near future. This is the way applications are going to look. Now of course, we could interact with these large -- these AI services with text prompts and speech prompts. However, there are many applications where we would like to interact with what is otherwise a human-like form. We call them digital humans. NVIDIA has been working on digital human technology for some time. Let me show it to you. And what -- before I do that, hang on a second. Before I do that, okay, digital humans has the potential of being a great interactive agent with you. They make -- much more engaging. That could be much more empathetic. And of course, we have to cross this incredible chasm, this uncanny chasm of realism so that the digital humans would appear much more natural. This is, of course, our vision. This is a vision of where we love to go, but let me show you where we are. [Presentation]

Jensen Huang

executive
#6

Pretty incredible. Well, those ACE runs in the cloud, but it also runs on PCs. We had the good wisdom of including 10 Tensor Core GPUs in all of RTX. So we've been shipping AI GPUs for some time, preparing ourselves for this day. The reason for that is very simple. We always knew that in order to create a new computing platform, you need an installed base first. Eventually, the application will come. If you don't create the installed base, how could the application come? And so if you build it, they might not come. But if you build it -- if you don't build it, they cannot come and so we installed every single RTX GPU with Tensor Core processing. And now we have 100 million GeForce RTX AI PCs in the world, and we're shipping 200 and this COMPUTEX, we're featuring 4 new amazing laptops. All of them are able to run AI. Your future laptop, your future PC will become an AI. It will be constantly helping you assisting you in the background. The PC will also run applications that are enhanced by AI. Of course, all your photo editing, your writing and your tools, all the things that you use will all be enhanced by AI. And your PC will also host applications with digital humans that are AIs. And so there are different ways that AIs will manifest themselves and become used in PCs. But PCs will become a very important AI platform. And so where do we go from here? I spoke earlier about the scaling of our data centers. And every single time we scale, we found a new face change. When we scale from DGX into large AI supercomputers we enabled TRANSFORMERS to be able to train on enormously large data sets. Well, what happened was in the beginning, the data was human-supervised, it required human labeling to train AIs. Unfortunately, there is only so much you can human label. TRANSFORMERS made it possible for unsupervised learning to happen. Now TRANSFORMERS just look at an enormous amount of data or look at an enormous amount of video and look at enormous images, and it can learn from studying an enormous amount of data, find the patterns and relationships itself. Well, the next generation of AI needs to be physically based. Most of the AIs today don't understand the laws of physics. It's not grounded in the physical world. In order for us to generate images and videos and 3D graphics and many physics phenomenon. We need AIs that are physically based and understand the laws of physics. Well, the way that you could do that is, of course, learning from video is one source. Another way is synthetic data, simulation data. And another way is using computers to learn with each other. This is really no different than using AlphaGo having AlphaGo play itself, self-play and between the 2 capabilities, same capabilities, playing each other for a very long period of time. They emerge even smarter. And so you're going to start to see this type of AI emerging. Well, if the AI data is synthetically generated and using reinforcement learning, it stands to reason that the rate of data generation will continue to advance. And every single time, data generation grows, the amount of computation that we have to offer needs to grow with it. We are about to enter a phase where AIs can learn the laws of physics and understand and be grounded in physical world data. And so we expect that models will continue to grow. And we need larger GPUs. While Blackwell was designed for this generation. This is Blackwell and has several very important technologies. One, of course, is just the size of the chip. We took 2 of the largest -- a chip that is as large as you can make it at TSMC, and we connected 2 of them together with a 10 terabytes per second link between the world's most advanced SerDes connecting these 2 together. We then put 2 of them on a computer node connected with a gray CPU. The gray CPU could be used for several things. In the training situation, it could be used for fast checkpoint and restart. In the case of inference and generation, it could be used for storing context memory so that the AI has memory and understands the context of the conversation you would like to have. It's our second-generation Transformer Engine. Transformer Engine allows us to adapt dynamically to a lower precision based on the precision and the range necessary for that layer of computation. This is our second-generation GPU that has secure AI so that you could ask your service providers to protect your AI from being either stolen from theft or tampering. This is our fifth generation NVLink. NVLink allows us to connect multiple GPUs together, and I'll show you more of that in a second. And this is also our first generation with a reliability and availability engine. This system, this RAS system allows us to test every single transistor, flip-flop, memory on-chip, memory off chip so that we can in the field, determine whether a particular chip is failing. The MTBF, the mean time between failure, of a supercomputer with 10,000 GPUs is measured in hours. The mean time between failure of a supercomputer with 100,000 GPUs is measured in minutes. And so the ability for a supercomputer to run for a long period of time and train a model that could last for several months is practically impossible if we don't invent technologies to enhance its reliability. Reliability would, of course, enhances uptime, which directly affects the cost. And then lastly, decompression engine. Data processing is one of the most important things we have to do. We added a data compression engine, decompression engine so that we can pull data out of storage 20x faster than what's possible today. While all of this represents Blackwell, and I think we have one here that's in production. During GTC, I showed you Blackwell in a prototype state. The other side? This is why we practice. [Foreign Language] Ladies and gentlemen, this is Blackwell. Blackwell is in production, incredible amounts of technology. This is our production Board. This is the most complex, highest performance computer the world has ever made. This is the Grace CPU, and these are you could see each one of these Blackwell dies, 2 of them connected together. You see that. It is the largest die -- the largest chip the world makes and then we connect 2 of them together with a 10 terabyte per second link, okay? And that makes the Blackwell computer. And the performance is incredible. Take a look at this. So you see our -- the computational, the flops, the AI flops for each generation has increased by 1,000 times in 8 years. Moore's Law in 8 years is something along the lines of, oh, I don't know, maybe 40, 60 and in the last 8 years, Moore's Law has gone a lot, lot less. And so just to compare even Moore's Law at its best of times, compared to what Blackwell could do. So the amount of computation is incredible. And whenever we bring the computation high, the thing that happens is the cost goes down, and I'll show you. What we've done is we've increased through its computational capability, the energy used to train a GPT-4, 2 trillion parameter, 8 trillion tokens, the amount of energy that is used has gone down by 350x. Well, Pascal would have taken 1,000 gigawatt hours, 1,000 gigawatt hours means that it would take a gigawatt data center. The world doesn't have a gigawatt data center. But if you had a gigawatt data center, it would take a month. If you had a 100-watt, 100-megawatt data center, it would take about a year. And so nobody would, of course, create such a thing. And that's the reason why these large language models, ChatGPT wasn't possible only 8 years ago. By us driving down the increasing the performance, the energy efficient, while keeping and improving energy efficiency along the way, we have now taken with Blackwell, what used to be 1,000 gigawatt hours to 3, an incredible advance, 3 gigawatt hours. If it's a 10,000 GPUs, for example, it would only take 10,000 GPUs, I guess, it will take a few days, 10 days or so. So the amount of advance in just 8 years is incredible. Well, this is for inference, this is for token generation. Our token generation performance has made it possible for us to drive the energy down by 45,000x. 17,000 joules per token, that was Pascal, 17,000 joules is kind of like 2 light bulbs running for 2 days. It would take 2 light bulbs running for 2 days amounts of energy 200 watts running for 2 days to generate 1 token of GPT-4. It takes about 3 tokens to generate 1 word. And so the amount of energy used necessary for PASCAL to generate GPT-4 and have a ChatGPT experience with you was practically impossible. But now we only use 0.4 joules per token, and we can generate tokens at incredible rates and very little energy, okay? So Blackwell is just an enormous leap. Well, even so, it's not big enough. And so we have to build even larger machines. And so the way that we build it is called DGX. So this is our Blackwell chips, and it goes into DGX systems. So this is a DGX Blackwell. This has -- this is air-cooled, has 8 of these GPUs inside. Look at the size of the heat sinks on these GPUs, about 15 kilowatts, 15,000 watts and completely air cooled. This version supports x86 and it goes into the infrastructure that we've been shipping hoppers into. However, if you would like to have liquid cooling, we have a new system, and this new system, it's based on this Board, and we call it MGX for modular, and this modular system, you won't be able to see this, can they see this? Can you see this? You can? Are you? Okay. I see. And so this is the MGX system, and here's the 2 Blackwell board so this, 1 node has 4 Blackwell chips. These 4 Blackwell chips, this is liquid cooled, 9 of them, 9 of them -- well, 72 of these, 72 of these GPUs -- 72 of these GPUs are then connected together with a new NVLink. This is NVLink Switch fifth generation. And the NVLink Switch is a technology miracle. This is the most advanced switch the world's ever made. The data rate is insane. And these switches connect every single one of these Blackwells to each other so that we have one giant 72 GPU Blackwell. Well, the benefit, the benefit of this is that in 1 domain, 1 GPU domain, this now looks like 1 GPU. This 1 GPU has 72 versus the last generation of 8, so we increased it by 9x. The amount of bandwidth, we've increased by 18x. The AI flops would increase by 45x and yet the amount of power is only 10x. This is 100 kilowatts and that is 10 kilowatts. And that's for one. Now of course, well, you can always connect more of these together, and I'll show you how to do that in a second. But what's the miracle is this chip, this NVLink chip, People are starting to awaken to the importance of this NVLink chip as it connects all these different GPUs together because the large language models are so large, it doesn't fit on just 1 GPU. It doesn't fit on just 1 node. It's going to take the entire rack of GPUs like this new DGX that I was just standing next to, to hold a large language model that are tens of trillions of parameters large. NVLink Switch in itself is a technology miracle. It's 50 billion transistors, 74 ports at 400 gigabits each, 4 links, cross-sectional bandwidth of 7.2 terabytes per second. But one of the important things is that it has mathematics inside the switch so that we can do reductions, which is really important and deep learning right on the chip. And so this is what a DGX looks like now. And a lot of people ask us, they say -- and there's this confusion about what NVIDIA does. And how is it possible that the NVIDIA became so big building GPUs. And so there's an impression that this is what a GPU looks like. Now this is a GPU. This is one of the most advanced GPUs in the world, but this is a gamer GPU, but you and I know that this is what a GPU looks like. This is one GPU. Ladies and gentlemen, DGX GPU. The back of the GPU is the NVLink spine. The NVLink spine is 5,000 wires, 2 miles, and it's right here. This is an NVLink spine, and it connects 72 GPUs to each other. This is an electrical mechanical miracle. The transceivers makes it possible for us to drive the entire length in copper and as a result, this switch, the NV Switch, NVLink Switch driving the NVLink spine in copper makes it possible for us to save 20 kilowatts in 1 rack, 20 kilowatts can now be used for processing, just an incredible achievement. So this is the NVLink spine. And if you -- and even this is not big enough, even this is not big enough for AI factories, so we have to connect it all together with very high-speed networking. Well, we have 2 types of networking. We have InfiniBand, which has been used in supercomputing and AI factories all over the world. And it is growing incredibly fast for us. However, not every data center can handle InfiniBand because they've already invested their ecosystem in Ethernet for too long. And it does take some specialty and some expertise to manage InfiniBand switches and InfiniBand networks. And so what we've done is we've brought the capabilities of InfiniBand to the Ethernet architecture, which is incredibly hard. And the reason for that is this. Ethernet was designed for high average throughput. Because every single node, every single computer is connected to a different person than the Internet, and most of the communications is the data center with somebody on the other side of the Internet. However, deep learning and AI factories, the GPUs are not communicating with people on the Internet mostly, it's communicating with each other. They're communicating with each other because they're all -- they're collecting partial products, and they have to reduce it and then redistribute it, chunks of partial products, reduction, redistribution. That traffic is incredibly bursty. And it is not the average throughput that matters. It's the last arrival that matters because if you're reducing, collecting partial products from everybody, if I'm trying to take all of your [Foreign Language]. So it's not the average throughput is whoever gives me the answer last, okay? Ethernet has no provision for that. And so there are several things that we had to create. We created an end-to-end architecture so that the NIC and the Switch can communicate. And we applied 4 different technologies to make this possible: number one, NVIDIA has the world's most advanced RDMA. And so now we have the ability to have a network level RDMA for Ethernet that is incredibly great; number two, we have congestion control. The switch does telemetry at all times incredibly fast. And whenever the GPUs or the NICs are sending too much information, we can tell them to back off so that it doesn't create hotspots; number three, adaptive routing. Ethernet needs to transmit and receive in order. We see congestions or we see ports that are not currently being used. Irrespective of the ordering, we will send it to the available ports and BlueField, on the other end, reorders it so that it comes back in order. That adaptive routing incredibly powerful; and then lastly, noise isolation. There's more than one model being trained or something happening in the data center at all times. And their noise and their traffic to get into each other, it causes jitter. And so when the noise of 1 training model, 1 model training causes the last arrival to end up too late, it really slows down the training. Well, overall, remember, you have built a $5 billion or a $3 billion data center, and you're using this for training. If the utilization -- network utilization was 40% lower. And as a result, the training time was 20% longer. The $5 billion data center is effectively like a $6 billion data center. So the cost is incredible -- the cost impact is quite high. Ethernet with Spectrum-X basically allows us to improve the performance so much, but the network is basically free. And so this is really quite an achievement. We're very -- we have a whole pipeline of Ethernet products behind us. This is Spectrum-X800, it is 51.2 terabits per second and 256 Radix. The next one coming is 512 Radix is 1 year from now, 512 Radix, and that's called Spectrum-X800 Ultra and the one after that is X1600. But the important idea is this, X800 is designed for tens of thousands of GPUs. X800 Ultra is designed for hundreds of thousands of GPUs and X1600 is designed for millions of GPUs. The days of millions of GPU data centers are coming. And the reason for that is very simple. Of course, we want to train much larger models. But very importantly, in the future, almost every interaction you have with the Internet or with a computer will likely have a generative AI running in the cloud somewhere. And that generative AI is working with you, interacting with you, generating videos or images or text or maybe a digital human. And so you're interacting with your computer almost all the time, and there's always a generative AI connected to that. Some of it is on-prem, some of it is on your device and a lot of it could be in the cloud. These generative AIs will also do a lot of reasoning capability instead of just one shot answers, they might iterate on answers so that it improves the quality of the answer before they give it to you. And so the amount of generation we're going to do in the future is going to be extraordinary. Let's take a look at all of this put together. Now tonight, this is our first nighttime keynote. I want to thank all of you for coming out tonight at 07:00. And so what I'm about to show you has a new vibe, okay? There's a new vibe. This is kind of the nighttime keynote vibe. So enjoy this. [Presentation]

Jensen Huang

executive
#7

Now you can't do that on a morning keynote. I think that style of keynote has never been done in COMPUTEX ever, might be the last. Only NVIDIA can pull off that, only I could do that. Blackwell, of course, is the first generation of NVIDIA platforms that was launched at the beginning -- right as the world knows, the generative AI era is here. Just as the world realized the importance of AI factories just as the beginning of this new industrial revolution. We have so much support, nearly every OEM, every computer maker, every CSP, every GPU cloud, sovereign clouds, even telecommunication companies, enterprises all over the world, the amount of success, the amount of adoption, the amount of enthusiasm for Blackwell is just really off the charts, and I want to thank everybody for that. We're not stopping there. During the time of this incredible growth, we want to make sure that we continue to enhance performance, continue to drive down cost, cost of training, cost of inference and continue to scale out AI capabilities for every company to embrace. The further performance we drive up, the greater the cost decline. Hopper platform, of course, was the most successful data center processor probably in history. And this is just an incredible, incredible success story. However, Blackwell is here. And every single platform, as you'll notice, are several things. You got the CPU, you have the GPU, you have NVLink, you have the NIC, and you have the Switch. The NVLink Switch connects all of the GPUs together as large of a domain as we can and whatever we can do, we connect it with large -- very large and very high-speed switches. Every single generation, as you'll see, is not just the GPU, but it's an entire platform. We build the entire platform. We integrate the entire platform into an AI factory supercomputer. However, then we disaggregate it and offer it to the world. And the reason for that is because all of you could create interesting and innovative configurations and all kinds of different styles and to fit different data centers and different customers in different places, something for edge, some of it for telco. And all of the different innovation are possible if we made the systems open and make it possible for you to innovate. And so we designed it, integrated it, but we offer it to you disintegrated so that you could create modular systems. The Blackwell platform is here. Our company is on a 1-year rhythm. Our basic philosophy is very simple: one, build the entire data center scale, disaggregate it and sell to you in parts on a 1-year rhythm, and we push everything to technology limits. Whatever TSMC process technology will push it to the absolute limits, whatever packaging technology push it to the absolute limits, whatever memory technology push it to absolute limits, SerDes technology, optics technology, everything is pushed to the limit. Well, and then after that, do everything in such a way so that all of our software runs on this entire installed base. Software inertia is the single most important thing in computers. When a computer is backwards compatible and it's architecturally compatible with all the software that has already been created, your ability to go to market is so much faster. And so the velocity is incredible when we can take advantage of the entire installed base of software that has already been created. Well, Blackwell is here. Next year is Blackwell Ultra. Just as we had H100 and H200, you'll probably see some pretty exciting new generation from us for Blackwell Ultra, again pushed to the limits and the next-generation spectrum switches I mentioned. Well, this is the very first time that this next click has been made. And I'm not sure yet where I'm going to regret this or not. We have code names in our company, and we try to keep them very secret. Oftentimes, most of the employees don't even know. But our next-generation platform is called Rubin. The Rubin platform, the Rubin platform, I'm not going to spend much time on it. I know what's going to happen. You're going to take pictures of it and you're going to go look at the fine prints and feel free to do that. So we have the Rubin platform and 1 year later, we'd have the Rubin Ultra platform. All of these chips that I'm showing you here are all in full development, 100% of them. And the rhythm is 1 year at the limits of technology, all 100% architecturally compatible. So this is basically what NVIDIA is building and all of the richest software on top of it. So in a lot of ways, the last 12 years from that moment of ImageNet, and us realizing that the future of computing was going to radically change to today is really exactly as I was holding up earlier, GeForce pre-2012 and NVIDIA today. The company has really transformed tremendously, and I want to thank all of our partners here for supporting us every step along the way. This is the NVIDIA Blackwell platform. Let me talk about what's next. The next wave of AI is physical AI. AI that understands the laws of physics, AI that can work among us. And so they have to understand the world model so that they understand how to interpret the world, how to perceive the world, they have to, of course, have excellent cognitive capabilities so they can understand us, understand what we asked and performed the tasks. In the future, robotics is a much more pervasive idea. Of course, when I say robotics, there's a humanoid robotics that's usually the representation of that. But that's not at all true. Everything is going to be robotic. All of the factories will be robotic. The factories will orchestrate robots, and those robots will be building products that are robotic. Robots interacting with robots, building products that are robotic. Well, in order for us to do that, we need to make some breakthroughs and let me show you the video. [Presentation]

Jensen Huang

executive
#8

This isn't the future. This is happening now. There are several ways that we're going to serve the market. The first, we're going to create platforms for each type of robotic systems, one for robotic factories and warehouses. One for robots that manipulate things, one for robots that move and one for robots that are humanoid. And so each one of these robotics platform is like almost everything else we do, a computer, acceleration libraries and pretrained models, computers, acceleration libraries, pretrained models. And we test everything, we train everything and integrate everything insight, Omniverse, where Omniverse is as the video was saying, where robots learn how to be robots. Now of course, the ecosystem of robotic warehouses is really, really complex. It takes a lot of companies, a lot of tools, a lot of technology to build a modern warehouse and warehouses are increasingly robotic. One of these days will be fully robotic. And so in each one of these ecosystems, we have SDKs and APIs that are connected into the software industry, SDKs and APIs connected into edge AI industry and companies and then also, of course, systems that are designed for PLCs and robotic systems for the ODMs. It's been integrated by integrators creative for ultimately building warehouses for customers. Here, we have an example of CenMac building a robotic warehouse for Giant Group. [Presentation]

Jensen Huang

executive
#9

Okay. And then here, now let's talk about factories. Factories has a completely different ecosystem. And Foxconn is building some of the world's most advanced factories. Their ecosystem, again, edge computers and robotics, software for designing the factories, the workflows, programming the robots, and of course, PLC computers that orchestrate the digital factories and the AI factories. We have SDKs that are connected into each one of these ecosystems as well. This is happening all over Taiwan. Foxconn is building digital twins of their factories. Delta is building digital twins of their factories. By the way, half is real, half is digital, half is Omniverse. Pegatron is building digital twins of their robotic factories, Wistron is building digital twins of their robotic factories and this is really cold. This is a video of Foxconn's new factory. Let's take a look. [Presentation]

Jensen Huang

executive
#10

So a robotic factory is designed with 3 computers, train the AI on NVIDIA AI. You have the robot running on the PLC systems for orchestrating the factories and then you, of course, simulate everything inside Omniverse. Well, the robotic arm and the robotic AMRs are also the same way, 3 computer systems. The difference is the 2 Omniverse will come together. So they'll share 1 virtual space. When they share 1 virtual space, that robotic arm will become inside the robotic factory. And again, 3 computers, and we provide the computer, the acceleration layers and pretrained AI models. We connected NVIDIA manipulator and NVIDIA Omniverse with Siemens, the world's leading industrial automation software and systems company. This is really a fantastic partnership, and they're working on factories all over the world. Symantec Pick AI now integrates Isaac Manipulator and Symantec Pick AI runs, operates ABB, Kuka, Yaskawa, Fanuc, Universal Robotics and Techman. And so Siemens is a fantastic integration. We have all kinds of other integrations, let's take a look. [Presentation]

Jensen Huang

executive
#11

Robotics is here. Physical AI is here. This is not science fiction, and it's being used all over Taiwan and just really, really exciting. And that's the factory, the robots inside and, of course, all the products is going to be robotics. So there are 2 very high-volume robotics products. One, of course, is the self-driving car or cars that have a great deal of autonomous capability. NVIDIA again builds the entire stack. Next year, we're going to go to production with the Mercedes fleet, and after that in 2026, the JLR fleet. We offer the full stack to the world. However, you're welcome to take whichever parts, whichever layer of our stack just as the entire drive stack is open. The next high-volume robotics product that's going to be manufactured by robotic factories with robots inside will likely be humanoid robots. And this has great progress in recent years in both the cognitive capability because of foundation models and also the world understanding capability that we're in the process of developing. I'm really excited about this area because obviously, the easiest robot to adapt into the world are humanoid robots because we built the world for us. We also have the vast and most amount of data to train these robots than other types of robots because we have the same physique. And so the amount of training data we can provide through demonstration capabilities and video capabilities is going to be really great. And so we're going to see a lot of progress in this area. Well, I think we have some robots that we'd like to welcome. Here we go. About my size. And we have some friends to join us. So the future of robotics is here, the next wave of AI. And of course, Taiwan builds computers with keyboards. You build computers for your pocket. You build computers for data centers in the cloud. In the future, you're going to build computers that walk and computers that roll around. And so these are all just computers. And as it turns out, the technology is very similar to the technology of building all of the other computers that you already built today. So this is going to be a really extraordinary journey for us. Well, I want to thank -- I want to -- I want to thank -- I've made one last video, if you don't mind, something that we really enjoyed making. And if you -- let's run it. [Presentation]

Jensen Huang

executive
#12

[Foreign Language] Thank you. Thank you all for coming. Have a great COMPUTEX.

Marc Hamilton

executive
#13

Welcome, everyone. It's my great privilege to be kicking off the COMPUTEX Forum this morning. Last week, one of my engineers was showing me a computer-generated, an AI-generated news post on a social media site, and it said, Marc Hamilton, the famous actor, will be kicking off COMPUTEX Forum talking about AI. And the post went on to say even what movies I had starred in. I can assure you, I am not a famous actor. I actually work for NVIDIA and I build, help our customers and our partners build the AI factories that Jensen has been talking about. Those of you who know NVIDIA and know NVIDIA culture will know that we like to do. We don't like to talk. And so we talk quickly and we talk concisely, and we explain in simple terms what we do. So one day, several months ago, Jensen wrote a very short email, and he said, "Marc, what your team does is IBTG, infra, build, train, go." IBTG, and we use this now at NVIDIA. An AI factory is about the infrastructure. The infrastructure is, of course, not just many boxes of GPUs. The infrastructure has to be built, and we need all of our ecosystem partners to help us build that infrastructure and then it needs to be assembled into an AI factory. And why? As soon as you assemble it, you want to start training models. And after you train the models, you go. You run the models, you run your business, you start making money. So let's go ahead and get started, and I'll talk a little bit about how my team works with our partners, works with the ecosystem to help IBTG. These observations are not from PowerPoint slides or studies. This comes from installing hundreds of thousands of GPUs in AI factories. Our first AI factory at NVIDIA was built 8 years ago, 2016. My team and other engineers at NVIDIA started to build our first AI factory. We had just introduced at COMPUTEX, the P100, the NVIDIA P100 GPU. I'll show some compares of how we've gone in 8 years. But we've also learned many, many things in those 8 years about installing these systems. An AI factory is not regular data center or regular cloud, it is AI computing. An AI computing is not just about the GPU or adding the GPU to the server. An AI factory is an end-to-end solution. It is nearly impossible to simply build one part of the factory. If you only build a GPU, if you only build some software, if you only build part of the network, it's impossible to think through how you optimize from end to end. And it is, in fact, because NVIDIA for 8 years, has been building our own AI factories for our engineers to use. If you listen to the keynote or read it all about NVIDIA, you know how we don't just sell AI factory, we don't just build AI factory, but we use AI factory. We build a robotic software. We build AI software for autonomous vehicles. We build AI into our platforms for PC gaming and computer gaming that so many enthusiasts across Taipei, across Taiwan love. What are some of the lessons? How do I measure my team when they build hundreds of thousands of GPUs into hundreds of AI factories? We've actually come up with 3 very simple metrics. This is one of the things I love about NVIDIA. We take something very complicated building the world's most complex AI factory to train Llama 3, Mistral, other leading open-source models, and we condense or summarize into 3 things. The first, and I measure this for every AI factory I build, is time to first train, because if you're lucky enough to receive 1,000 or 10,000 GPUs, be it an end user, a cloud provider, another partner, you want to be able to quickly as possible, start training your models. You want to train your model so that you can be ahead of your competition, so that you can use the model to start running your business. Traditional supercomputers have often taken 6 months, 12 months, 18 months to set up. There has been, for many decades, a list of the world's TOP500 fastest supercomputers and many of these, of course, can run AI today if they are accelerated and be used. But these supercomputers were built by large national labs, by large universities where they had many PhD students, many researchers that -- their job was to build supercomputers. As AI factories move into the cloud, move into regular enterprises, we can't build every one of them differently like the TOP500 supercomputers. We need a recipe so that we can deploy them quickly. The GPU is simply too precious a resource to be deployed and sit in your data center while you try to assemble it. So time to first train. AI supercomputers actually are very good at traditional science as well. In fact, many of them end up being put on the TOP500 supercomputer list. There's a pharmaceutical company in the United States, Recursion Pharmaceutical. And several months ago, they ordered one of our DGX SuperPOD AI factories, and they waited a little bit too long. And so they were going to miss the deadline for the TOP500 list in June. So they came to us and said, "How quickly can you install it?" It was 512 GPUs, relatively small by the scale of systems today, but still relatively complex. That many GPUs, you might take months traditionally to set up. And the TOP500 deadline, by the time they receive their systems, was 1 week away, 5 days away. We were able to install their system in 1 week, run the TOP500 benchmark and get it on the top 50 of the TOP500 supercomputers. And it's there today. You can go look at top500.org and see Recursion Pharmaceutical. So time to first train. Second is GPU availability. Now you might say this is a funny metric. 100% of the GPU should be available. Now remember, Jensen talked about our latest GPU, the GB200 NVLink72. One rack, 600,000 components in that rack. And now many of those racks, millions of components in an AI factory. If you wait for every single one of those millions of components to be perfectly running, you will always be waiting. The software working together with the hardware has to work around failures in the system, be resilient to failures and continue to train the model, because these large models, 1 trillion parameter model may use tens of thousands of GPUs, many racks of GB200 for many weeks, maybe many months. Impossible to keep all the equipment going. So what is your availability at the time to first train? And finally, what is the time to train? To measure how good a job we did putting together the cluster and measure how good a job the OEM and our other partners did to make sure they followed our recipe, we compare against our internal supercomputer that is our reference, and I'll talk about some of our internal supercomputers. We get a time to train. We run this test for about 6 hours, can be run shorter or longer, depending on your needs. The test runs NVIDIA's own Megatron open-source LLM. Many people have maybe never heard about Megatron, certainly not as common as GPT or Llama or even Mistral, some of the other models. Because Megatron is open source, it's an open reference. Any OEM, any customer can get that benchmark, run it on their system, run it on their cloud and compare how good it is. And so for 250,000 GPUs that my team installs every few months, simply 3 numbers: time to first train, GPU availability and time to train. And with those 3 numbers, which we know for every size cluster, 1,000, 10,000, 20,000 and for every OEM, we can compare and judge how well we're doing. And why is that so important? It's, of course, so important because of the advances in generative AI that you're all hearing about. Now ChatGPT did a great thing. NVIDIA and many other companies had, of course, been working on generative AI for many years. [ Beta with ] researchers, it was in the lab, maybe some data scientists. The average CEO, the children and the parents of the average CEO had never heard of generative AI in large language models. And then in November of 2022, OpenAI released ChatGPT. And now not only every CEO, but the CEO's children or grandchildren and parents or grandparents for the younger CEOs, all of a sudden could see the power of GenAI. But of course -- and it's funny to talk about ChatGPT as that early or first generation of generative AI was relatively simple compared to what's being done today. It was simply text, chat, type in, type a few hundred words, get a few thousand words back. But now think about all the ways that we're learning to use generative AI. Generative AI is, of course, multimodal. It may not -- it could be text in, it could be audio in, it could be an image in, in any of those out. Think 100 words in, many images out. 100 words in, a multi-gigabyte video out. And then as GenAI applications are connected together with agents, you have one GenAI application calling another, calling another and interacting and then bringing you back a story operating a robot in a factory, showing a car how to drive down the street, automating many other tasks. So how does NVIDIA help customers move forward and get started? Beyond simply building GenAI factories. If we were to come in and build a GenAI factory or if you were to rent a GenAI factory from your favorite cloud provider, and all of the cloud providers work with NVIDIA and follow our blueprint for putting together these GenAI factories, then what do you do with that? Well, NVIDIA inference microservices or NIMs, a very short word, easy to pronounce, N-I-M, NIM. What is a NIM? Well, the challenge with -- the good thing about GenAI is that so much of it is done in the open source. You have, of course, proprietary models owned by companies. Not that proprietary is bad or ChatGPT, Gemini, other proprietary models. And then you have fully open-source models. In the open-source community, everyone can see as the code develops, as the models are released, can take advantage of those improvements in the model and other researchers can improve on top of that. And of course, companies will continue to [indiscernible] proprietary. And then when they're ready, release amazing new features for you to use and through APIs for their partners to use. But as a data scientist, as a developer in an enterprise, how do you get started with GenAI, with all these open-source models? A very good analogy is perhaps 20 years ago in the early days of Linux. Linux has, of course, open-source versions. The fact that the Linux kernel is open means that different companies around the world can go in and build Linux operating systems, add their value and distribute them. Red Hat, Red Hat Enterprise Linux, perhaps one of the best known enterprise distributions of Linux. And so what NVIDIA felt the world needed was not another GenAI model. NVIDIA, of course, has our own models under the Megatron name. Some are closed, some are open. There's great models out today and there's many open-source models. So we'll continue to develop our models when there's a need, but the open-source models are so good, but we needed an operating system for AI. Just like as an enterprise, when you started using Linux, you went to an operating system provider to get an enterprise copy of Linux. Where does an enterprise go to get an enterprise copy of everything you need to run -- to build and to run AI? So that is NVIDIA AI Enterprise, our AI operating system and our NVIDIA inference microservices. A microservice is, of course, means that you do not need to install the entire operating system. You can install the entire operating system in your data center and run a microservice on top of it using our APIs or you can simply call an NVIDIA inference microservice that is running in any of the clouds. So again, what then besides -- what is at the core of an NVIDIA inference microservice? We take all of the leading open-source models: Llama 3, Mistral, on and on, our own models, and we optimize them for our GPUs, not just for one GPU, but for all of our GPUs. The A100, which is still in many clouds, the H100 and the B100 and of course, thinking today about how do we optimize models in the future for new Vera Rubin GPUs coming out in 2026. Only NVIDIA could be thinking today about how we optimize AI software for a chip that is not even built yet. So what's the result when we put the 2 of these together? These are some of the internal supercomputers that NVIDIA has built. I mentioned that our first supercomputer was built back in 2016. It was called SATURNV. At the time, like the SATURNV rocket from NASA, we thought it was a very big supercomputer. It turned out being the 30th fastest supercomputer in the TOP500 list at the time. There was no measure back in 2016 of how fast an AI factory or AI supercomputer is. Today, of course, there's the MLPerf set of benchmarks. And these supercomputers that you see here have been, year after year, #1 in performance on MLPerf. And in a sense, that's not quite -- that's not very surprising. NVIDIA builds the chips. We build much of the software that's inside the large language model. We build entire supercomputers. So of course, it makes sense that we should be able to optimize and be among the fastest in the world. So our new superchip is called Blackwell, and we've heard a lot about it in the last several months, so I won't go and repeat too much here. But again, it includes many components built by our partners. The chip itself is, of course, fabricated by TSMC. We work very closely with TSMC on the 4N process for the transistors. And again, as great as TSMC is, we take all of their tools, all of their knowledge, our engineers sit together here in Taiwan and across the world, and we figure out how to make the process even better for Blackwell in future chips. It uses fast HBM memory from the 3 leading HBM memory providers. I believe Micron is talking later today, which is one of our partners for HBM. And what's the result when we put Blackwell together with all of our experience building AI factories? Well, again, this is looking at it in several different levels. And the first is you'll say, "Well, Marc, NVIDIA's chips keep on getting more expensive." And that's true if you look at the price of an individual chip. In fact, for $1 billion back in 2016, you could have bought about 45,000 chips. And today, you can only buy about 11,000 chips. But look at the performance improvements that you get. The GPU performance, the time it takes to train a model and the amount of power you use to run the model is all drastically reduced. So again, continuing to look at energy efficiency and to look at performance and price performance, i.e. value and delivering the best solutions. Our building blocks today for the AI factory of the future are around the GB200 NVLink72. As these systems get more complex, we have, again, so many great partners here in Taiwan that are building these individual parts, taking the NVLink Switch tray, taking the compute tray, taking the superchip module and building it into servers. As I said, it's not just about the server. It's understanding how all the connections work and how this works all the way out to the software. Thinking about just one component, NVLink. There's been a lot of discussion about NVLink the last few days. NVLink is not just a wire connecting to GPUs. NVLink, now in its fifth generation, is the entire end-to-end system for connecting the 72 GPUs in the rack. Jensen showed the large wiring connectors, 6 feet tall, the ability to send a signal out of one GPU through a set of NV switches through 6 feet of copper out to another GPU. NVLink in a sense is -- starts with software inside the GPU that initiates the communication, goes through the entire set of switches and cables, comes out the other end. It makes those 2 GPUs, those 72 GPUs act like one giant GPU. Now building the AI factory is a lot more than just putting the servers in the rack. In fact, the rack comes prebuilt in the factory, 2 miles of copper cables in the back of the rack. It weighs several thousand kilograms. You have tens of gallons of cooling liquid per minute flowing through the rack all the way to the GPUs, and it consumes 120 kilowatts of power. What it saves is thousands of kilowatts of power compared to older GPUs. Again, this system will cost millions of dollars, will start being deployed soon this year. And we're already in the process of planning very large data centers around the world for our partners, very large AI factories that will be deployed in every large cloud, in many regional clouds and in private data centers by private companies. So if you're not quite ready for this 72-GPU rack, we, of course, have smaller ways to get started. One of the new products we announced here this week at COMPUTEX was the GB200 NVLink 2. It's an industry standard MGX server that fits in any standard rack and it puts together 2 of the GB200 Superchip modules connected by NVLink. Very easy to deploy, very easy to scale out and ideal for inference. Remember, as Jensen has said, training the AI model, that can be done anywhere subject to constraints of where your data is. AI doesn't care where it goes to school. You don't always have to train your models in the city or the country or the data center. You can move that training to places where energy is less costly and more available. But then when you run your model, you want to run your model as close as possible to where the activity is. So this GB200 NVLink 2 being able to fit in a standard server in any data center of the world is super important. Now going the other way to build an entire data build an entire data factory, one rack of NVLink72 will not be enough. You, of course, read every day on how companies are using thousands or tens of thousands of GPUs to build AI models. So how do you connect together multiple of these racks? Again, this is very different than normal data center or normal cloud where you would simply take your racks of web servers or database servers and connect them together with regular networking. An AI factory is, of course, not regular compute, it's accelerated compute. So the DGX SuperPOD with DGX GB200 systems is a blueprint on how you can build an AI factory. This particular blueprint scales up to 576 GPUs in one scalable unit. And then you can then put multiple scalable units together with the 576-GPU building block, you can go up to 9,000 GPUs. Now of course, as I said, you've heard of customers using more than 9,000 GPUs today. So we continue to scale up. You can put together using a building block of 1,152 GPUs. You can build up to 18,000 GPUs. So this will be one of our standard AI factories deploying up to 18,000 GPUs. And to anyone in the audience that would like to build an AI factory of more than 18,000 GPUs, don't worry. Please come talk to your NVIDIA representative or to your OEM partner and we can -- we have designs to scale beyond this as well. And finally, while many people are used to InfiniBand networking, which Jensen showed continuing on our road map, for people who absolutely want to or would like to use Ethernet, we can now build AI factories with our Spectrum Ethernet. So I'd like to thank everyone for being here today, thank all of our partners that make the NVIDIA AI factories possible. We cannot build any of the AI factories, we cannot IBTG without these partners. So thank you very much.

Unknown Attendee

attendee
#14

Thank you, Mr. Hamilton, for your inspiring presentation.

This call discussed

For developers and AI pipelines

Programmatic access to NVIDIA Corporation earnings transcripts and 32,000+ others is available through the EarningsCalls.dev REST API. Plans from $24.99/month — full transcripts, speaker segments, full-text search, and the recently-added /api/v1/transcripts/recent polling endpoint for ETL pipelines.