NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary

May 14, 2020

NASDAQ US Information Technology Semiconductors and Semiconductor Equipment conference_presentation 105 min

Earnings Call Speaker Segments

Jen-Hsun Huang

executive

#1

Hi. Welcome to NVIDIA GTC 2020. Our first kitchen keynote. I'm coming to you from my home in California, and I hope all of you are well and sheltering safely at home. Before we start, I want to take the opportunity to thank all the brave men and women who are fighting in the front lines against COVID-19: the nurses, the doctors, the truck drivers, the retail clerks, the warehouse workers, all of the people who are keeping the world going while we're sheltering at home. We're also doing our part to fight COVID-19. Scientists and researchers around the world are racing to find a vaccine for COVID-19. We're working with them across the entire spectrum, from containment, mitigation, treatment to eventually tracking and monitoring. Oxford Nanopore, for example, was able to sequence the virus genome in just 7 hours using our technology. And working with Plotly, we can now do real-time infection rate analysis. And with Oak Ridge and Scripps, we were able to screen 1 billion drug compounds in a day versus a year. The team at Structura, NIH and UT Austin use CryoSPARC to reconstruct the 3D structure of the virus spike protein. NIH and NVIDIA built an AI model to classify COVID-19. Kiwibot built a robot to deliver medical supplies autonomously. And Whiteboard Coordinator built an AI system to automatically measure and screen for elevated body temperature. Researchers and scientists applying NVIDIA's accelerated computing to save lives is the perfect example of our company's purpose. We build computers to solve problems that normal computers cannot. We address applications from computer graphics, scientific computing, artificial intelligence to robotics. And our computing platforms are incorporated into PCs, supercomputers, cloud computers as well as autonomous machines. We work with scientists and researchers all over the world, and it's a great privilege for our company to partner with them to advance and discover the future.
Our amazing creative team made a video to celebrate the great works of these researchers and scientists. Let me show it to you. [Presentation]

Jen-Hsun Huang

executive

#2

It is truly inspiring to see what is now possible because of accelerated computing. Accelerated computing starts with a specialized processor we call the GPU that offloads the computationally intensive tasks from the CPU. It also includes an acceleration stack. The software is a vitally important part of accelerated computing: the acceleration libraries, the algorithms, the system software and the optimizations that we develop together with the application developer. The system architecture is also important. It is important for us to be able to optimize the system, whether it's for high-performance computers, cloud data centers, PCs or autonomous machines. But ultimately, the most important part of accelerated computing is developers. Developers optimize their applications, which increases the performance and the value of the platform, which attracts customers and increases the installed base, which attracts other developers. The positive feedback system grows, and it is now very clear that NVIDIA's accelerated computing platform is at its tipping point. Over the last several years, 2 fundamentally new dynamics has happened to take accelerated computing to the next level. The first is the emergence of this new type of algorithm called data-driven or machine-learning algorithms. Data processing and the movement of data around the data center is more important than ever. The second is the applications that we're processing now are so large, it doesn't fit in any computer. No server, no matter how powerful are able to possibly process the type of application workloads that we're now looking at. In fact, the server is no longer the computing unit. The data center is the new computing unit. 
With software-defined data centers and application developers able to write applications that run in the entire data center, it is important now for us to think about optimizing across the entire end to end of a data center, from networking and storage to computing, for us to optimize the entire stack top to bottom, to be able to optimize at a data center scale is NVIDIA's new approach. And I believe that in the next decade, data center scale computing will be the norm and data centers will be the fundamental computing unit. The importance of high-speed networking and data processing in the data center is exactly the reason why we bought Mellanox. Mellanox is the world leader in high-performance networking. And high-speed networking and high-speed computing go hand-in-hand. Let me show you what they make. I have one here that I've been cooking for a while. This is a state-of-the-art Mellanox Spectrum 4000 Series Ethernet Switch. Each one of the ports can scale up to 400 gigabits per second. The entire switch has an astounding 25.4 terabits per second of bandwidth. And what makes this particular switch special, beyond the fact of its high performance and Mellanox's world-famous low latency, are 3 characteristics. First, this data-buffering architecture that makes it possible to meter the bandwidth across the entire switch so that every port gets a fair and predictable bandwidth; second, incredible virtualization technology so that you can virtualize VXLAN routing across your hyperscale data center; and third, you could find out exactly what's happening with this new technology called WJH, What Just Happened. On the other side of the switch is this programmable SmartNIC, the NVIDIA Mellanox BlueField-2, the world's most advanced programmable SmartNIC. BlueField-2 accelerates security and packet processing at line speeds. In this particular case, up to 200 gigabits per second. 
The networking stack, the storage stack, the security stack is now completely offloaded and runs on one of these programmable SmartNICs and what will eventually become essentially a data processing unit. This is going to represent 1 of the 3 major pillars of computing going forward: the CPU for general purpose computing; the GPU for accelerated computing; and the DPU, which moves data around the data center and does data processing. No one knows more about networking storage and security in the data center than Mellanox. And they are so vital to today's high-performance data center scale computing. I'm so happy that we're officially one company. I want to welcome all of our families at Mellanox to NVIDIA. GTC is all about developers, and developers are all about SDKs. This year, we announced and shipped 50 new SDKs. NVIDIA's SDK is formed in the stack that is represented here, and it's basically 3 layers. The first part, of course, is our CUDA architecture. It is architecturally compatible against the entire installed base of NVIDIA. And this year, we announced the 11th generation of CUDA. That tells you something about our commitment to this architecture to forward compatibility and backwards compatibility. A CUDA developer who develops on CUDA will know that the entire installed base of NVIDIA GPUs will run that application and run it wonderfully. The next layer is CUDA-X, our acceleration library, our linear algebra library, signal processing library, graph analytics libraries. And this year, we have several new ones: cuDNN 8th version; TensorRT 7.1, which is our deep learning network compiler and optimizer. 
And then on top of it, we have our market- or domain-specific libraries, from RTX, which is for ray tracing; HPC for high-performance computing; RAPIDS for data analytics; AI for AI; CLARA for health care and life sciences; Metropolis, our video analytics and our streaming signal AI platform; DRIVE, autonomous vehicles; Isaac for robotics; and AERIAL 5G, one of our latest for 5G virtual RAN processing. The number of developers on NVIDIA accelerated computing has continued to grow, and it's growing at what appears to be an accelerated rate. And this year, we have reached 1.8 million developers. I want to thank all of you for your support. We're committed to continue to advance this platform, and to continue our forward and backwards compatibility of our architecture so that your installed base continues to grow. This year, although we didn't have a live GTC, 46,000 developers signed on to watch GTC; 5.5 million developers downloaded CUDA; and we now have over 700 applications that are accelerated. I have a lot to announce this year. We have 4 new applications that are really important that I'm super excited to share with you. We have a new chip that we're announcing, and we have 4 new systems. So let's get started. Computer graphics is the driving force of NVIDIA. It is one of the world's most computationally intensive applications. It has been for decades, and it will continue to be for decades to come. 40 years ago, one of NVIDIA's researchers wrote a seminal paper on a technique to simulate light we call ray tracing. You trace the light beam through an environment bouncing off surfaces, reflecting, refracting or illuminating that surface to ultimately generate what is a photorealistic image. Two years ago, at 2018 SIGGRAPH in Vancouver, BC, we announced one of our most ambitious endeavors, we call it the NVIDIA RTX. NVIDIA RTX fuses 2 groundbreaking technologies: the first is accelerated ray tracing; and the second breakthrough is deep learning.
Ray tracing is so computationally intensive, even with the amazing accelerator that we've created, it just simply wasn't fast enough. And then the breakthrough of artificial intelligence happened. And over the last 3 years, we've been piling on to this technology to solve the last missing piece of the puzzle. We used ray tracing in our programmable shaders and the fastest possible GPUs we could make to generate a relatively low-resolution image. And in this particular case, 540p, not even anti-aliased. It also generates, along with it, a motion vector. Where the pixel is and where it's traveling, that goes into an artificial intelligence network, which tries to synthesize a higher resolution image. We teach this artificial intelligence network what extremely high-resolution and high-quality images look like. In this particular case, we used a supercomputer to render 16K-resolution anti-aliased images. We then compare what comes out of the neural network with this ground truth. The difference propagates back into the network through a supercomputer. And it corrects the weights of the neurons so as to improve its ability to guess the next time. We go through this trillions of times. Eventually, this neural network could take just a few pixels, 540p in this case, and synthesize what otherwise would be a beautiful image. Incredible. Then we take this neural network, we download it into your GeForce computers, particularly the ones with Turing, which are now ready to receive this neural network and process it on the Tensor Core processor in the Turing GPU. We call this technology DLSS, Deep Learning Super Sampling. What you're looking at here is the image generated by the supercomputer, it's 16K resolution and completely anti-aliased. This is a scene from an Unreal Engine demo that Epic did called Infiltrator. It's really a beautiful demo. They did this several years ago. And here, what I'm showing you is 16K ground truth. The next scene is rendered and it's 720p.
Notice how blurry it is. Let me just go back one more time so you could see it. This is ground truth 16K, look at the small lights. If you like, look at the leaves on the trees, the clouds from afar, it is so crisp. The detail's incredible. This is rendered at 720p. And here, what I'm showing you is our first try, and we call it DLSS 1.0, notice it improves the resolution that appears but only by a little bit. And most people felt that the artificial intelligence technology was not going to work. But we believed in it, and we didn't give up. This is, ladies and gentlemen, DLSS 2.0, scaling from 720p, generating the pixels necessary to create a 1080p anti-alias image. Look at that. First generation, a little blurry. Second generation, look at all the lights, it's much, much more than sharpening. Look at all the lights that all of a sudden appear that didn't appear before. How do you create content where content did not exist? Well, partly because the neural network has learned what the image should look like. And secondarily, because we have motion vectors and the pixels by observing across a few scenes, the neural network can predict what each scene should look like. Now if you were to render this with native 1080p, using the GPU to render each and every pixel anti-alias, this is what it looks like. This is native 1080p. And look, artificial intelligence actually does a better job going back to DLSS 2.0. Look at that, AI does a better job than 1080p native. That is a complete breakthrough. Suppose we started from 540p. And because there are so few pixels, most of the pixels are blurry when we scale it up to this image. Now imagine, if we were to take this 540p image and put it into an artificial intelligence network, DLSS 2.0, and this neural network had learned from beautiful images that were generated by the supercomputer and is now asked to recreate that image, look at that. This is the input, 540p, and this is the output, DLSS 2.0 540p to 1080p. 
What an amazing breakthrough. Let's take a look at the combination of RTX and DLSS on the most popular game in the world, Minecraft. Because each one of the worlds is created by the gamer, it is not possible to prebake a lot of the shadows and lighting effects that you see in very big blockbuster games. This is created by the users themselves. And so the lighting effects can't be cheated, and it has to be generated by the program, which is the reason why we chose to work with the team at Minecraft to bring RTX to it. Now with this particular scene, you can see that when we render Minecraft without DLSS, with just ray tracing, the frame rate was only 35 frames per second. With DLSS, we can render this beautiful image and then use DLSS to scale that low-resolution image and still maintain the speed. So now you get a beautiful image with ray tracing, high resolution, and high speed, all at the same time. And that is the requirement for modern computer graphics. Let me show you now a video that we just made of Minecraft. The reception has been incredible all over the world. You're going to love this video. [Presentation]
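The training procedure described in this segment (render at low resolution, let a network synthesize the high-resolution result, compare with a ground-truth render, and feed the difference back to adjust the weights) can be illustrated with a toy sketch. This is not NVIDIA's DLSS implementation: it assumes 1D signals in place of images, a pair of learned 3-tap filters in place of a neural network, and no motion vectors. All function names are hypothetical.

```python
import math

def make_truth(freq, phase, n=32):
    """Hypothetical 'ground truth': a smooth high-resolution 1D signal."""
    return [math.sin(freq * i + phase) for i in range(n)]

def downsample(signal):
    """Average adjacent pairs, standing in for rendering at half resolution."""
    return [(signal[i] + signal[i + 1]) / 2.0 for i in range(0, len(signal), 2)]

def taps_at(low, i):
    """Return the low-res sample at i and its two neighbours (edges clamped)."""
    last = len(low) - 1
    return (low[max(i - 1, 0)], low[i], low[min(i + 1, last)])

def upscale(low, weights):
    """Produce two output samples per input sample using learned 3-tap filters."""
    out = []
    for i in range(len(low)):
        taps = taps_at(low, i)
        for phase in (0, 1):
            out.append(sum(w * t for w, t in zip(weights[phase], taps)))
    return out

def total_loss(truths, weights):
    """Mean squared error between upscaled output and ground truth."""
    loss = 0.0
    for truth in truths:
        pred = upscale(downsample(truth), weights)
        loss += sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth)
    return loss / len(truths)

def train(truths, steps=300, lr=0.2):
    """Gradient descent: the output/ground-truth difference corrects the weights."""
    weights = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]  # start as plain pixel repetition
    for _ in range(steps):
        grads = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
        count = 0
        for truth in truths:
            low = downsample(truth)
            pred = upscale(low, weights)
            for i in range(len(low)):
                taps = taps_at(low, i)
                for phase in (0, 1):
                    err = pred[2 * i + phase] - truth[2 * i + phase]
                    for k in range(3):
                        grads[phase][k] += 2.0 * err * taps[k]
                    count += 1
        for phase in (0, 1):
            for k in range(3):
                weights[phase][k] -= lr * grads[phase][k] / count
    return weights

truths = [make_truth(0.2, 0.0), make_truth(0.35, 1.0), make_truth(0.5, 2.0)]
start = [[0.0, 1.0, 0.0], [0.0, 1.0, 0.0]]
print("loss before training:", total_loss(truths, start))
print("loss after training: ", total_loss(truths, train(truths)))
```

The printed loss falls after training, which is the same feedback idea at a vastly smaller scale. For a sense of the real task, assuming 16:9 frames, going from 540p (960x540) to 1080p (1920x1080) means producing four output pixels for every rendered pixel.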

Jen-Hsun Huang

executive

#3

Ladies and gentlemen, RTX on, ray tracing, DLSS. We've made possible real-time ray tracing 10 years earlier than anybody thought was possible. When we launched it, people were skeptical. But now it is very, very clear that ray tracing is here. And it's the next big thing. Creating 3D content is hard. It takes so many different types of disciplines from artists to designers to software programs, uses all kinds of different tools, from Maya, 3D Studio Max to Photoshop, and they're creating these worlds that take enormous databases. That's one of the reasons why it's so expensive and so hard to create world-class 3D content. Well, we have a solution for that. We call it the NVIDIA Omniverse. And it leverages all of NVIDIA's technology over the last 10 years. And the foundation is our RTX server, our latest generation GPUs that is built on top of a virtual application server. Each one of the GPUs could be shared by many different designers using virtual Quadro or many GPUs can gang up to accelerate one application. The networking is accelerated and offloaded by Mellanox NICs, the SmartNICs that we were talking about earlier. And then one of the virtual machines is the Omniverse nucleus. This nucleus has created a shared space, a shared world. And this shared world has portals. The output of that portal is visualized and streamed to any device you like. Multiple designers could work on one design at the same time, and reviewers could ask for changes in real-time, the ultimate design collaboration platform. Let me show you a demo that's created by NVIDIA engineers. What I'm about to show you is really amazing. This was done over the course of the last couple of months. Artists, designers from different locations and it's never been seen before. It is completely ray-traced. None of the lights are baked. None of these shadows are baked. Everything is completely lit and shadowed in real-time. And one of the most important things is everything obeys the laws of physics. 
Let's roll it. [Presentation]

Jen-Hsun Huang

executive

#4

Isn't that amazing? Real-time ray tracing, physically based materials, obeys the laws of physics. It was created by just a few designers and engineers on top of Omniverse, working remotely from different states. Incredible achievement, just so beautiful. I love it. And so ladies and gentlemen, this is the NVIDIA Omniverse. It starts with a server with a whole bunch of RTX 8000s. These are the most powerful ray tracing GPUs in the world, Tensor Core processing to do AI, so that we can both have beautiful images and high resolution and high performance at the same time. The servers are available from BOXX, Dell, HP and Supermicro. And it's been preconfigured with all the hypervisors necessary, the networking stack and the virtual Quadros so that you can remotely run applications, so that you could create portals into the shared space. It's really, really an amazing thing. In today's world, where we had to work remotely and share and collaborate with large numbers of people, this couldn't have come at a better time. The NVIDIA Omniverse. [Presentation]

Jen-Hsun Huang

executive

#5

Let's talk about high-performance computing. It is clear now that acceleration is going to be the path forward for scientific and for high-performance computing. And as I mentioned before, accelerated computing has 4 pillars. The first, of course, is the accelerator, the advanced GPUs. The second is the stack, the acceleration stack for each one of the computational domains. The third is systems. And last is developers. Ultimately, the applications that we accelerate. This year, we did really great work. We've accelerated now over 700 applications. And each and every single year, at every conference, I show you our golden suite. The suite that we track on a regular basis to make sure that we continue to engineer advances into libraries into the stack, so that applications continue to improve in performance even if we don't introduce new GPUs. And as you see, over the course of the last 4 years, we increased application performance by 4x, and the green bar is something that I'm going to talk to you about. We're going to offer a new platform and it's going to give high-performance computing a huge boost. We also brought CUDA to ARM computing systems. ARM-server CPUs are seeing adoption all over the world. In hyperscale, there's Amazon; Fujitsu, supercomputing; Cavium, part of Marvell now; a new company, exciting company called Ampere Computing; or in China, Huawei, all of the suite of NVIDIA tools and libraries are now available for ARM. We also introduced a brand-new SDK for IO processing this year we call Magnum IO. Magnum IO includes all kinds of great things from RDMA, of course, to the ability to communicate across multiple nodes and move data directly from storage to our GPUs. This suite of libraries is going to continue to advance. Magnum IO is going to be one of our most important libraries. Data processing and networking and storage is going to become more and more important to data center scale computing over time. 
We introduced 2 new stacks this year: NVIDIA Parabricks for genomics processing, the ability to do variant calling at very high performances; and a large body of work that we've been working on for several years called NVIDIA RAPIDS for data analytics. Machine learning has become one of HPC's grand challenges. The advances of machine learning and the popularity of this approach has caused companies, institutes and data centers to collect a vast amount of data. The machine learning pipeline consists of 3 things: ETL, which creates the data frame, does all the feature engineering necessary for the machine learning algorithms to train on, which creates the model, which is then put into operations we call inference. These 3 stages of the pipeline have unique and different computational challenges. The first stage of the machine learning pipeline, data processing, is becoming more complex than ever. In fact, most data scientists will tell you, they spend the vast majority of their time doing feature engineering and data processing in the front stage of the machine learning pipeline. What used to be processing hundreds of megabytes to gigabytes to terabytes of data, companies are now routinely processing tens, if not hundreds of terabytes of data and moving to petabytes of data. It is the reason why Spark is so popular. Spark is an incredible computational platform. It turns an entire data center into a compute engine. It partitions a very large data set to be processed across a bunch of servers in the data center. It was the brainchild of Matei Zaharia at the Berkeley AMPLab and spun out and became Apache Spark. It now has over 1,000 companies contributing to it, nearly 1 million lines of code, 16,000-plus companies around the world uses it for data processing today. Well, the amount of data that they're processing is growing exponentially. It is now reaching the limits of what Spark can do. Here's the reason why. 
The CPUs that is being distributed across has a fundamental working set in the order of megabytes. A CPU naturally likes to work in its cache, and its cache is typically on the tens of megabytes. When the data set is now in the hundreds of terabytes and into petabytes, the overhead of coordinating the CPU servers is becoming the greatest bottleneck, and we're starting to see the limits now. What if instead of working on processors that has tens of megabytes of working set, let's move towards a processor that has tens of gigabytes of working set. And if we could use multiple GPUs to create large memories, then it is now possible for us to imagine scaling beyond that. We started working on GPU acceleration of the data processing stack several years ago, and it's a giant body of work. Ladies and gentlemen, today, we're announcing that Spark 3.0, the next generation of Spark, will be NVIDIA-accelerated. This is a collaboration between ourselves and a large community of researchers and developers in open source all around the world. And the results are really fantastic. It's possible because of several groundbreaking achievements. The first is the work that we did with Mellanox NVIDIA called GPUDirect Storage, and the acceleration of GPUDirect Storage and UCX, this framework that makes possible the management of IO and storage and multi-node computing lightning fast. Second is the scheduler of Spark. Scheduler of Spark now is aware of GPU and the GPU memory so that they can partition work to the GPUs and schedule it in a distributed way and manage the computation of this giant network of computers. Third, a library we call RAPIDS that has the ability to ingest data, create data frames, do feature engineering, do SQL queries and intercept the calls of Spark to be accelerated by our GPU. And then lastly, Spark has a Spark SQL accelerator, they call Catalyst, and that has been optimized for NVIDIA GPUs. These elements made possible Spark 3.0. 
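As an illustration of the partitioning idea described above, here is a minimal sketch in plain Python. It does not use Spark or RAPIDS APIs: worker threads stand in for servers, and every name in it is hypothetical. The `partitions_needed` helper shows why a larger per-processor working set means far fewer pieces to coordinate; the 100-terabyte, 40-megabyte and 40-gigabyte figures are assumed round numbers chosen to match the orders of magnitude in the talk.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def partitions_needed(dataset_bytes, working_set_bytes):
    """How many pieces a data set splits into if each must fit one working set."""
    return -(-dataset_bytes // working_set_bytes)  # ceiling division

def partition(records, num_partitions):
    """Split records into roughly equal contiguous chunks."""
    size = max(1, (len(records) + num_partitions - 1) // num_partitions)
    return [records[i:i + size] for i in range(0, len(records), size)]

def partial_counts(chunk):
    """Work done independently on one partition: count records per key."""
    return Counter(key for key, _ in chunk)

def distributed_count(records, num_workers=4):
    """Process each partition on its own worker, then merge the partial results."""
    chunks = partition(records, num_workers)
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(partial_counts, chunks))
    total = Counter()
    for partial in partials:
        total.update(partial)
    return total

# Assumed sizes: 100 TB data set, 40 MB versus 40 GB of working set per processor.
print(partitions_needed(100 * 10**12, 40 * 10**6))
print(partitions_needed(100 * 10**12, 40 * 10**9))

events = [("click", 1), ("view", 2), ("click", 3), ("buy", 4), ("click", 5)]
print(distributed_count(events, num_workers=2))
```

Under these assumptions the same data set splits into a thousand times fewer partitions when the working set grows from tens of megabytes to tens of gigabytes, which is the coordination overhead the talk refers to.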
Let me show you the potential acceleration that data scientists will be able to enjoy. What you're looking at here is the benchmark of RAPIDS, the foundation of Spark 3.0. This particular benchmark is TPCx-BB, big data benchmark. This particular data set is a scale factor of 10,000, which basically is 10 terabytes. The state-of-the-art is a Dell server, costs about $1 million and has the ability to deliver 17 gigabytes per second of data movement through this benchmark. This particular benchmark is hard to beat. And the reason for that is because not only does it have to be fast, it also has to be cost effective. And the reason for that is because price performance matters. The fastest in the world today is the Dell server at $1 million and 17 gigabytes per second. With Spark 3.0 sitting on top of RAPIDS, RAPIDS benchmarked on TPCx-BB delivers 163 gigabytes per second for $2 million. 10x the performance at only twice the cost. If you were to look at this in another way, suppose you were to create a data center that is able to achieve the same performance as $2 million of DGXs accelerating RAPIDS on this benchmark, TPCx-BB, it would cost you $10 million and 150 kilowatts. Now of course, data centers routinely process a lot more than terabytes. You're going to need data centers way larger than this in the future as data continues to grow exponentially. And so the ability to accelerate Spark 3.0 with a library we call RAPIDS is utterly groundbreaking. The result is really spectacular, 1/5 the cost, 1/3 the power. 1/5 the cost and 1/3 power. The more you buy, the more you save. In fact, Databricks, which offers industrial-strength Spark at a large scale as a service is doing fantastic. Every single day, 1 million virtual machines are spun up to do data processing on Spark. And they're still delighted by the work and acceleration that they're going to accelerate Databricks with NVIDIA GPUs. They're a fantastic partner.
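The price-performance figures quoted above can be checked with a few lines of arithmetic. The sketch below uses only the numbers stated in the talk; the one added assumption, noted in the code, is that CPU server cost scales linearly with throughput.

```python
# Figures quoted in the keynote for TPCx-BB at scale factor 10,000.
cpu_throughput_gb_s = 17.0    # CPU server result
cpu_cost_musd = 1.0           # cost in millions of US dollars
gpu_throughput_gb_s = 163.0   # RAPIDS result
gpu_cost_musd = 2.0

speedup = gpu_throughput_gb_s / cpu_throughput_gb_s
cost_ratio = gpu_cost_musd / cpu_cost_musd
perf_per_dollar_gain = speedup / cost_ratio

# Assumption: matching the GPU throughput with CPU servers scales cost linearly.
cpu_cost_to_match_musd = cpu_cost_musd * speedup

print(f"speedup: {speedup:.1f}x at {cost_ratio:.0f}x the cost")
print(f"performance per dollar: {perf_per_dollar_gain:.1f}x")
print(f"CPU cost to match: ${cpu_cost_to_match_musd:.1f} million")
print(f"GPU cost as a fraction of that: {gpu_cost_musd / cpu_cost_to_match_musd:.2f}")
```

The speedup works out to roughly 9.6x, which the talk rounds to 10x, and the matching CPU cost of about $9.6 million rounds to the quoted $10 million, giving the "1/5 the cost" figure.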
I'm so happy with all the work that we've done together. Leading cloud service providers are offering Spark accelerated in their cloud, or they're accelerating their proprietary machine learning pipeline and data processing pipeline with NVIDIA RAPIDS: Amazon SageMaker; Azure Machine Learning; Databricks, Google Cloud AI, Google Cloud Dataproc are now going to be accelerated with NVIDIA GPUs for data processing and data analytics. Spark acceleration is a great achievement. I'm so proud of the team. It's such a large body of work and has taken us years. And it requires the collaboration with hundreds of collaborators in open source built on several layers of foundational and fundamental new technology. And now the part that is growing exponentially difficult, the first stage of machine learning is now accelerated. Data scientists all over the world are going to be thrilled. Entire end to end, from data processing to inference. We have 3 libraries, RAPIDS for data processing; cuDNN, our core library for deep learning AI; and then third, TensorRT, our optimizing compiler for these computational graphs that are created by the training frameworks. The end-to-end acceleration is now complete. And we will continue to advance it over time, but this represents the foundation of NVIDIA AI. I can't be more proud. The team have done a great job. Thank you. Recommenders is one of the most important and complex machine learning pipelines. Creating recommenders is incredibly complex. However, the benefits are enormous to Internet services. It enhances their user engagement, the quality of service, and for many of them, it dramatically shapes their economics. Recommenders consists of a couple of different algorithms: collaborative filtering and content filtering. Collaborative filtering tries to predict user preferences from other user interactions that are similar. And content filtering tries to predict what items could be preferred based on similar items. 
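Collaborative filtering as described above, predicting a user's preference from the interactions of similar users, can be sketched in a few lines. This is a minimal user-based variant with cosine similarity, assuming a tiny hand-made ratings table; the user names, film names and ratings are all hypothetical, and production recommenders use learned models rather than this direct calculation.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}.
ratings = {
    "ana":  {"film_a": 5, "film_b": 4, "film_c": 1},
    "ben":  {"film_a": 4, "film_b": 5, "film_d": 5},
    "cara": {"film_a": 1, "film_c": 5, "film_d": 2},
}

def cosine_similarity(u, v):
    """Cosine similarity over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = math.sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = math.sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)

def predict(user, item, table):
    """Similarity-weighted average of other users' ratings for the item."""
    weighted_sum = 0.0
    weight_total = 0.0
    for other, their_ratings in table.items():
        if other == user or item not in their_ratings:
            continue
        sim = cosine_similarity(table[user], their_ratings)
        weighted_sum += sim * their_ratings[item]
        weight_total += abs(sim)
    return weighted_sum / weight_total if weight_total else None

print(predict("ana", "film_d", ratings))
```

Ana's tastes on shared films are much closer to Ben's than to Cara's, so the predicted rating for film_d lands nearer Ben's 5 than Cara's 2.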
The first stage of the recommender systems consists of taking high-dimensional information of users and items and encoding it into a low-dimensional vector, in the process, computing the similarities of users to other users and items to other items. The reduction of high-dimensional information of users into low-dimensional vectors and the high-dimensional descriptions of items into low-dimensional vectors is called embedding. That is a computationally intensive process and requires deep learning to do so. The embeddings are then used by the recommender systems to learn to predict user preferences. So when a new user comes into a service and queries about a particular item, which could be a movie or a song or a book or a product or grocery, the first stage narrows billions of options down to thousands of candidates. Those thousands of candidates are then ranked by a neural network that has learned this particular user's preference to provide a ranked list of preferred next items. And that's the reason why when you're watching a movie, the next movie that comes up is similar to the previous. And it could recommend, if you like this movie, you might like this movie. If you like this song, this would represent a great playlist. If you enjoy this book, you might consider these books. If you just bought this item, you might consider also buying this item. That ability to predict user preference based on your interaction and based on your queries is one of the major reasons why the Internet is so personalized and is one of the reasons why the Internet is so useful to us. The recommender system is the engine of the personalized Internet. It is the reason why we're going to be able to enjoy an Internet that seems to be practically designed for us. It's almost as if the recommender is going to be an AI that reads your mind. It could predict what you prefer based on the context, your circumstance.
Maybe some vacation you're on, a birthday of somebody you love. And based on that, it would recommend a product or service to you. The recommender system is vitally important to the future of the Internet. There are trillions of items in the world, hundreds of billions of websites, hundreds of millions of books and songs. It is impossible to search. It is impossible to know what's out there. And so the Internet has to understand what you prefer and make a recommendation to you. For companies, it'll be a recommender system for sales automation or marketing automation. For health care, it might recommend therapies over time. The recommender system is going to be vital to every industry, whether it's shopping or supply chain management or customer service. Call centers will have recommender systems in the background, recommending solutions or recommending a path or a journey through the dialogue system. Recommender systems is foundational to all industries. However, building recommender systems is enormously complex. This is the endeavor of large Internet companies. What we would love to do is to simplify and to codify the complexity of modern recommender systems, and put it into an application framework so that we can democratize this capability for all industries. Ladies and gentlemen, we've done just that, and we call it NVIDIA Merlin. NVIDIA Merlin is a deep learning application framework. We've taken what is otherwise a very large system, a very complicated data processing and machine learning algorithm and coding, codified it into an application framework that's easy to use. The first stage of data processing. With NVTabular, with just a few lines of code you could create a data frame and do data processing for extract, transform and load from terabytes to hundreds of terabytes, and the scaling and the partitioning is done automatically. You don't have to think about the system underneath. You don't have to think about the system software. 
All of it is done in NVTabular, and it's easy to use. You then pass off the data frame that you processed into HugeCTR that allows you to learn embedding and the ranking systems automatically. All of that is codified into these libraries. It sits on top of RAPIDS. It sits on top of cuDNN. It has been optimized with the highest performance engineering that's been done, and we call that NVIDIA Merlin. Well, the results are really quite amazing. Here, I'm showing you a 1 terabyte ads database from [indiscernible]. What would otherwise take a couple of days of processing on a CPU, all of a sudden, with this framework, not only is it easier to use just a few lines of code, it now runs like speed of light. All of a sudden, from a couple of days of processing into minutes. Now this is for 1 terabyte of data. If you were to scale this up 1000x, you could just imagine the complexity. 1000x basically is a 1 petabyte database, and that 1 petabyte database is just around the corner. And so companies are now able to think about creating these really complex recommender systems, built on top of high-performance computing servers and have this application framework that is really easy to use. We call that NVIDIA Merlin. Inference is the last stage of the machine learning pipeline. This is where you take the model that you train and deploy it into services to make predictions. What comes out of the machine learning pipeline and frameworks are computational graphs that are incredibly complex. These are gigantic computational graphs. And there are so many different types of neural network architectures that the computer science of compiling these computational graphs into a target machine to run as fast as possible with all the different types of numeric precisions, incredibly complex problem. We created an optimizing compiler, TensorRT. We're now in our seventh generation. This generation can handle CNNs and transformers, and now we can handle RNNs as well. 
TensorRT 7.0 includes over 1,000 optimized kernels. And in fact, our developers could do the same for themselves and create all kinds of custom kernels and custom neural networks. We support precisions from FP32, FP16, all the way to 8-bit and 4-bit integer. TensorRT has been a huge success. The number of developers that use TensorRT increased 10x. The world's top 300 Internet services now have NVIDIA GPUs in their data centers to do inference. This just kind of shows you how many industries deep learning and AI is going to impact. One of the most important applications that we can now enable is conversational AI. It is one of the most challenging inference tasks because conversational AI requires interactive performance. The elements that make up a conversational AI pipeline has achieved tremendous breakthroughs recently. Starting from ASR, automatic speech recognition, to natural language understanding to text-to-speech, speech synthesis and now it's possible for us to imagine, for the very first time, having an interactive, low-latency, conversational AI pipeline. However, the application is very complex still. The models are state-of-the-art. Training those models take a tremendous amount of computation. And the ability to put together all of the models end-to-end on top of a computing platform, fully accelerated is something the world has never done before, until now. Ladies and gentlemen, we're announcing today a new application framework we call Jarvis. And because it has the ability to connect into Omniverse, we have the ability to create an interactive 3D chatbot. One of the AI models that makes this entire experience possible is what we call audio to face. It takes an audio input and has learned how to animate a geometric mesh to animate facial expressions while talking. That then drives our computer graphic system in Omniverse and creates an interactive chatbot. Let me now show you our AI model audio to face. 
This feature is coming from a rapper, one of our employees, John Della Bona, and it is really fantastic. Take a look. [Presentation]

Jen-Hsun Huang

executive

#6

Once we have audio to face, then we connect it to our conversational AI model, where you can ask it a question, it would understand the question, answer the question, and then synthesize speech from that, which goes into our audio to face model, which animates and drives a 3D model. We've created a demonstration, we call Misty, our first interactive 3D chatbot. Hello?

Misty

attendee

#7

Hi, I'm Misty.

Jen-Hsun Huang

executive

#8

How are you?

Misty

attendee

#9

I'm doing well. Thank you for asking.

Jen-Hsun Huang

executive

#10

What are you here for?

Misty

attendee

#11

I'm here to give you the latest weather information around the world, tell you interesting facts about meteorology and keep you company.

Jen-Hsun Huang

executive

#12

Should I bring an umbrella tomorrow?

Misty

attendee

#13

It is not expected to rain in Santa Clara, California tomorrow afternoon.

Jen-Hsun Huang

executive

#14

What is the hottest city in the world?

Misty

attendee

#15

The hottest city in the world is Mecca in Saudi Arabia, where temperatures can reach 122 degrees Fahrenheit.

Jen-Hsun Huang

executive

#16

What is the coldest city?

Misty

attendee

#17

The coldest city in the world is Yakutsk in Russia with temperatures around minus 40 degrees Fahrenheit in the winter months.

Jen-Hsun Huang

executive

#18

Tell me something about lightning.

Misty

attendee

#19

We see lightning first because light travels faster than sound.

Jen-Hsun Huang

executive

#20

How come you know so much about weather?

Misty

attendee

#21

I know the weather information from the Internet. I have lived for a long time and have been talking about the weather for as long as I know.

Jen-Hsun Huang

executive

#22

Where are you from, Misty?

Misty

attendee

#23

I'm from a place called Omniverse.

Jen-Hsun Huang

executive

#24

Who made you?

Misty

attendee

#25

I am created by the wonderful scientists of NVIDIA.

Jen-Hsun Huang

executive

#26

What powers you?

Misty

attendee

#27

I'm powered by NVIDIA GPU and the CUDA neurons inside of them.

Jen-Hsun Huang

executive

#28

Thank you, Misty.

Misty

attendee

#29

Thank you. It was a pleasure talking to you.

Jen-Hsun Huang

executive

#30

As you could see, the ability to have an interactive conversation requires the composition of AI models to process the speech, understand the natural language, synthesize speech and also render the graphics as fast as possible. And you have to process the entire pipeline end-to-end, from speech recognition, language understanding, text to speech as well as driving and generating the computer graphics in just a few hundred milliseconds in order to feel like you're having an interactive conversation. And that's what NVIDIA Jarvis is about. A multimodal conversational AI service framework that simplifies the creation and the development of conversational AI services. It includes state-of-the-art models that have been prepipelined into these Helm Charts, optimized to run on NVIDIA's Triton Inference Server, which runs on top of our GPUs. And the performance is interactive and the entire pipeline end-to-end is only a few hundred milliseconds. But there's more. Jarvis comes with pretrained state-of-the-art models. These state-of-the-art models have been trained with a great deal of data and a lot of computation. In fact, what's available in NGC, the NVIDIA GPU Cloud, represents several hundred thousand DGX training hours. If you had one DGX, it would take you something like 10 to 20 years to be able to process all of the data necessary to train these models. We've done that for you. And then it comes with a new tool called NeMo, which takes the pretrained model and augments it with your data. Your data could be specific to a particular domain: it could be in health care or insurance or financial services, or maybe it's a call center for a particular business. And there's special language or special vocabulary that Jarvis needs to learn. And so you would take that data, and you would retrain the Jarvis models with this tool we call NeMo.
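Editor's sketch: the end-to-end budget described above (speech recognition, language understanding, text to speech, then graphics, all within a few hundred milliseconds) can be pictured as a chain of stages timed as one unit. The four stage functions below are hypothetical stand-ins and not the Jarvis API, and the 300 ms default is an assumed reading of "a few hundred milliseconds".

```python
import time


# Hypothetical stand-ins for the four stages named in the talk.
def recognize_speech(audio: bytes) -> str:
    return "should i bring an umbrella tomorrow"


def understand(text: str) -> dict:
    return {"intent": "weather_forecast", "query": text}


def synthesize_speech(reply: str) -> bytes:
    return reply.encode("utf-8")


def animate_face(speech: bytes) -> int:
    # Pretend one animation frame is produced per 4 bytes of audio.
    return len(speech) // 4


def run_pipeline(audio: bytes, budget_ms: float = 300.0):
    """Run all stages in order and report whether the whole chain met the budget."""
    start = time.perf_counter()
    text = recognize_speech(audio)
    meaning = understand(text)
    reply = f"Handling intent {meaning['intent']}"
    speech = synthesize_speech(reply)
    frames = animate_face(speech)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return reply, frames, elapsed_ms <= budget_ms
```

The point of timing the chain as a whole is that the budget applies to the sum of all stages, so a slow stage anywhere breaks the interactive feel.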
This end-to-end pipeline from pretrained models, retraining with NeMo and then the application of all of these state-of-the-art models all put together in a Helm Chart running on a service on top of Triton, this entire end-to-end system we call Jarvis. Conversational AI is going to automate conversations. And it's going to make it possible for us to deploy automated services in all kinds of new applications, whether it's video conferencing, call centers, smart speakers. One of my favorite applications is video conferencing. When you have a group conference, one person can talk at a time. It would be nice sometimes if we can kind of hear multiple people talking. So wouldn't it be amazing if whenever anybody talks, it's picked up and close captioned or translated in real-time or at the end of a conference, a simple summary of transcription is done. Video conferencing with a conversational AI agent is going to be really transformed. We know that the number of people who are calling into call centers these days is growing, and many of the call centers have people wait half an hour to much longer. Jarvis conversational AI is going to be one of the most important real-time inference applications. So that's NVIDIA AI. It is now end-to-end, fully accelerated. From data processing, with the amount of data growing exponentially, Spark is more important than ever, and now it's accelerated by NVIDIA. Second, the deep learning and the model training pipeline, we have accelerated for a long time. One of the particular applications, recommender systems, is so complicated, we decided to create an application framework just for it so that we can democratize recommender systems. We call it NVIDIA Merlin. And then lastly, inference. TensorRT 7.0 is here, the number of people who are downloading it and using it is really growing. And we're super excited about that. 
We decided to create an application framework for every company to be able to create state-of-the-art conversational AI models. We call that NVIDIA Jarvis. This end-to-end acceleration platform and some of the reference applications is what we call NVIDIA AI. Modern data centers are complex. The number of workloads that have to run in a modern data center is growing incredibly. From scale-up applications, training and data analytics and data processing; to scale-out applications, inference and conversational AI; to public cloud applications, high-performance computing, cloud gaming or even remote workstations. The architecture of modern data centers is complex. It first started with the disaggregation of CPU servers and storage servers, and Mellanox's high-speed networking makes that possible. The second thing to happen was the acceleration of workloads, and by accelerating them, the throughput increases by orders of magnitude, reducing the cost. However, the types of accelerators we've offered to data centers over the last several years have been optimized for different tasks. One particular accelerator is designed for scale-up. We call that V100 SXMs with NVLink. Another type is really designed for scale-out: T4 accelerators are used for inference. And some are used for flexible clouds: V100 on PCI Express allows public cloud users to use it for training or inference or high-performance computing or whatever application they choose. It's designed really to be flexible. And yet, the data centers really want to be high performance as well as flexible at the same time. With all these different types of configurations of servers, the ability to predict exactly the amount of capacity you need for each configuration is difficult. Meanwhile, utilization ultimately drives TCO. And there's an insatiable demand for increasing the capacity of workloads as well as driving the cost down in the cloud data centers.
Clouds now represent a nearly $100 billion industry, growing at about 40% per year into an IT infrastructure industry that represents about $1 trillion. This is the largest growth opportunity of the computer industry, cloud computing. And so it stands to reason that there's so much focus to advance public clouds and cloud data centers. And wouldn't it be amazing if we could create an accelerator that increases the throughput of scale-up applications as well as scale-out applications, and yet is completely fungible, completely flexible in 1-server architecture so that independent of what workload comes, that 1-server architecture is able to serve it. The ability to create a flexible, high-throughput acceleration architecture is something we have been pursuing for some time. Well, ladies and gentlemen, I have something new that I'd like to share with you. It's called the NVIDIA A100, our brand-new data center GPU. The NVIDIA A100 is based on an architecture we call Ampere, and there are several amazing breakthroughs that make this possible. The first is we're using TSMC's 7-nanometer process that's been optimized for NVIDIA. We're using a packaging technology called CoWoS, Chip-on-Wafer-on-Substrate, a 3D packaging technology that puts the memory and the chip on the same substrate, which allows them to interoperate incredibly fast, and we're connected to HBM2 memory that now provides 1.5 terabytes per second of frame buffer bandwidth. This is the first processor in history that comfortably delivers over a terabyte per second of bandwidth. The second breakthrough of the Ampere GPU is the new Tensor Core architecture, and it has a new numerical format, Tensor Float 32. It has the range of FP32 and the precision of FP16: you input in FP32, it processes it with Tensor Float 32 and accumulates it in FP32. As a result, no code change is necessary when you train.
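Editor's sketch: "the range of FP32, the precision of FP16" can be illustrated by keeping the 8 exponent bits of a 32-bit float while cutting its mantissa from 23 bits to 10. The helper below, `to_tf32`, is a hypothetical name; it emulates only the rounding of a single value in software, not how the Tensor Cores multiply and accumulate, and it assumes round-to-nearest-even.

```python
import struct


def to_tf32(x: float) -> float:
    """Round a value to a TF32-like grid: FP32 exponent, 10 mantissa bits."""
    # Reinterpret the value as the 32 raw bits of an IEEE 754 float32.
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    # Drop the low 13 of the 23 mantissa bits, rounding to nearest even.
    # (NaN payloads are not handled specially in this sketch.)
    lsb = (bits >> 13) & 1
    bits = (bits + 0x0FFF + lsb) & 0xFFFFE000
    return struct.unpack("<f", struct.pack("<I", bits))[0]
```

For example, `to_tf32(1.0 + 2**-12)` returns `1.0` because the increment is below what 10 mantissa bits can resolve, while a value such as `1e30`, far beyond the FP16 range, stays finite because the exponent field is unchanged.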
Now for some people who are ninjas, it is possible to optimize for FP16, but our experience is the vast majority of the world simply trains in FP32 today. And so with this new format, TF32 and no code change, all of a sudden, we can accelerate training tremendously. Let me show you here. On the left is V100 Volta's FP32 matrix operations. And on the right is A100 Tensor Core accelerated TF32. The speed up is extraordinary. Ampere has a new Tensor Core acceleration for sparsity. It takes advantage of the fact that most neural networks are very heavily sparse. So it starts with a dense network, the original network. And it zeroes out the weights that are small, or close to 0, and then it retrains that network. As a result, this network can be compressed 2:1, and using the same data structure, using the same pipeline, we can now effectively accelerate processing by a factor of 2. Let me show you the performance. Here, I'm going to show you the performance of Volta, which is today's state-of-the-art GPU. This is the highest performance GPU in the world today. This is the industry standard of deep learning. The black is Volta. The gray on top of the black is the peak, and the solid is measured. In the case of Volta, V100 FP64 is 8 teraflops; FP32 is 16 teraflops peak; FP16, 125; and INT8 is 60. And the goal will be the new A100 Ampere GPU. Look at that. A100 FP64 is 20 teraflops; A100 TF32 is 160 teraflops, 10x; in the case of FP16, 310; in the case of INT8, 625 peak. Now this is without sparsity. With sparsity, you get another boost. Look at the factor of 2. A100 sparse TF32 is now 310 peak. A100 sparse FP16, 625 peak. And for INT8, the world's first processor to achieve over 1 petaOPS. This is now 1.25 POPS, 1,250 tera operations per second. The inference performance of A100 is incredible. Now when we compare it as an X factor to Volta, and remind you, this is the most advanced processor in the world today.
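Editor's sketch: the sparsity step described above (zero out the small weights, then store the network at half size) can be pictured as follows. The talk states only the 2:1 compression; the two-out-of-four grouping used here is an assumption based on how NVIDIA has publicly described the feature, the function names are hypothetical, and the retraining step is not shown.

```python
def compress_2_of_4(weights):
    """Keep the 2 largest-magnitude weights in each group of 4.

    Returns the kept values plus their positions inside each group,
    so the stored values are half the length of the input.
    """
    if len(weights) % 4 != 0:
        raise ValueError("length must be a multiple of 4")
    values, positions = [], []
    for start in range(0, len(weights), 4):
        group = weights[start:start + 4]
        # Indices of the two largest magnitudes, restored to original order.
        largest = sorted(range(4), key=lambda j: abs(group[j]), reverse=True)[:2]
        keep = sorted(largest)
        values.extend(group[j] for j in keep)
        positions.extend(keep)
    return values, positions


def decompress(values, positions):
    """Rebuild the pruned dense vector, with zeros where weights were dropped."""
    dense = [0.0] * (len(values) * 2)
    for k, (value, pos) in enumerate(zip(values, positions)):
        dense[(k // 2) * 4 + pos] = value
    return dense
```

Storing positions alongside the surviving values is what lets hardware skip the zeroed weights while still knowing which inputs each kept weight multiplies.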
The Volta numbers are all normalized to basically 1x. The training peak of A100 sparse TF32 is nearly 20x the performance of Volta. In the case of inference, 20x the peak of Volta. This is the greatest generation leap we have ever experienced. Let me show it to you. Ladies and gentlemen, the world's largest graphics card. This is the A100 processor board. It is 50 pounds, 8 GPUs connected by NVLink, 600 gigabytes per second, 6 NVSwitches, 1 million drill holes, 1 kilometer of traces connecting all of this, over 30,000 components, 50 pounds. And DGX, it moves 700 cubic feet per minute, the most transistors on one computer the world has ever made. Ladies and gentlemen, the NVIDIA A100 system board. It is just a technology marvel. But there's more. Ampere has a new architecture we call MIG. It stands for Multi-Instance GPU. It's the ability to turn 1 GPU into many. You could have 1 GPU, you could have up to 7 independent GPUs or some combination in between. In the past, we would have a rocket ship with a very large payload, and while the payload is being filled up, the rocket ship is waiting. But as soon as the payload is filled up, the rocket ship flies to space like you couldn't believe. Well, with A100, you have the ability to run it as 1 gigantic rocket ship with a very large payload, or you can have 7 independent rockets, each with a smaller payload, but they could take off as soon as they are ready. And so for inference or public cloud, instead of having 1 person use a GPU, fractionalize it, create 7 different instances so that each one of the customers could rent a smaller computer, you now have the flexibility to do that. MIG is going to have a profound impact on how we architect data centers. When I talked earlier about how we would like to have a universal, a unified server architecture that allows us to scale-up as well as scale-out, and to be able to configure it as the workload needs, this is exactly what we mean.
Ampere is not only incredibly fast for training, not only is it incredibly fast for inference, it also has the ability to fractionalize and partition itself up into a large GPU for scale-up applications or a whole bunch of small GPUs to maximize scale-out. Whether it's for inference or public clouds, you now have the ability to have 1 data center architecture for acceleration that is flexible, high throughput and enables higher utilization. The performance is incredible right out of the box. This is BERT, one of the largest models and most important models today. BERT training and BERT inference. Compared to Volta, a platform that has been refined and optimized for 3 years now, right out of the box, A100 is 6x the performance in training, 6x. Transistor budget only increased about 70%. Now Ampere is the largest, most complex processor the world has ever made. Thousands of engineers worked on it for several years, and it came together in this 1 incredible chip. 70% more transistors with great architecture delivered 6x more performance out of the box. In the case of inference, it is 7x the performance of Volta and, much more, over 12x the performance of T4s. Now let me show you Ampere in a demo. This demo is a natural language understanding model that includes speech recognition, not speech recognition of a human, speech recognition of a bird. The question that it has to answer is this. What is the native region of the bird that I'm hearing? First, it has to understand the question. It has to understand that the question has something to do with hearing the sound, understand the sound, classify the sound and then figure out from what region is that bird located and respond as quickly as possible. In the case of V100, this is the result. [Presentation]

Jen-Hsun Huang

executive

#31

First of all, it's a miracle in itself that we have an application that can do it at all. The breakthrough of artificial intelligence is pretty astounding. This next demo is A100 with 1 MIG. Just 1 of the 7 MIGs could achieve the same performance as an entire Volta, the world's state-of-the-art GPU. Now what happens if we put all the MIGs to work, all 7 MIGs. Wow, A100 with all 7 MIGs could do over 500 queries per second. With Volta, we could do about 80. So Ampere has 7x the inference performance of Volta. This is pretty extraordinary. Ladies and gentlemen, NVIDIA's brand-new data center GPU, the A100. DGX is our third-generation system. It's the world's first fully integrated AI system. It was designed to be the ultimate instrument of AI researchers. It's fully optimized, and you simply take it out of the box, plug it in and you have a state-of-the-art development system for AI. Now the previous generation of DGXs were really optimized for training. The DGX A100, our third-generation, this is the first one that's unified, in the sense that you can use it for data analytics, you could use it for training and you could also use it for inference. You could also split up this DGX and share it among 56 different users at one time. Each one of them could have the benefit of equivalent performance of a Volta. It is elastic for scale-up or scale-out computing. Inside this machine is 9 Mellanox CX-6 Virtual Protocol Interconnect. Each 1 of the NICs is 200 gigabits per second of network capability. Dual 64-core AMD Rome CPU with 1 terabyte of memory. 8 NVIDIA A100 GPUs, 6 NVIDIA NVSwitches. The reason for that is this. We want every 1 of the GPUs to be able to communicate with each other simultaneously without blocking. And with the new NVLinks, which is 600 gigabytes per second, the cross-sectional bandwidth is about 4.8 terabytes per second. It's like a high-end switch integrated into DGX, so that all the GPUs could communicate with each other simultaneously. 
And it comes with 15 terabytes of PCIe Gen4 NVMe solid state drive. This is the first computer ever made that in 1 node exceeds 5 petaFLOPS of computing capability. DGX is a marvelous machine. And our creative team using Omniverse and the amazing rendering capability of RTX created a short movie for you. Everything that you are about to see is based off of the original CAD design, and it's rendered photo realistically using NVIDIA's RTX. Let me show it to you. It's really quite amazing. [Presentation]
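Editor's sketch: several of the DGX A100 figures quoted above follow from per-GPU numbers given earlier in the talk. The arithmetic below is one reading that is consistent with those figures; the talk itself does not spell out the derivation, and the constant names are only illustrative.

```python
# Per-GPU figures quoted in the talk.
GPUS_PER_DGX = 8
MIG_INSTANCES_PER_GPU = 7          # "up to 7 independent GPUs"
NVLINK_GB_PER_S_PER_GPU = 600      # "600 gigabytes per second"
SPARSE_FP16_TFLOPS_PER_GPU = 625   # "A100 sparse FP16, 625 peak"

# 8 GPUs x 7 instances matches "share it among 56 different users".
max_users = GPUS_PER_DGX * MIG_INSTANCES_PER_GPU

# 8 GPUs x 600 GB/s matches "about 4.8 terabytes per second".
aggregate_nvlink_tb_per_s = GPUS_PER_DGX * NVLINK_GB_PER_S_PER_GPU / 1000

# 8 GPUs x 625 TFLOPS matches the 5 petaFLOPS figure for one node.
peak_sparse_fp16_pflops = GPUS_PER_DGX * SPARSE_FP16_TFLOPS_PER_GPU / 1000

print(max_users, aggregate_nvlink_tb_per_s, peak_sparse_fp16_pflops)
```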

Jen-Hsun Huang

executive

#32

The performance of this machine is incredible. INT8, 10 petaOPS peak; FP16, 5 petaFLOPS peak; TF32 for training, 2.5 petaFLOPS peak; FP64 for scientific computing, 156 teraflops peak. This is an amazing level of performance. If you compare it to the highest-end servers, $10,000 server, DGX A100 is 150x its peak performance, 40x the memory bandwidth, 40x the IO bandwidth. 150 high-end servers, that's $1.5 million. Ladies and gentlemen, the NVIDIA DGX A100 is in full production, and it's available today at an amazing price of $199,000. Incredible performance, incredible value. You could also buy it in the form for hyperscalers. NVIDIA is an open computing platform company. We develop systems so that we can fully integrate new categories of products and engineer the highest performance components. We also offer the components disaggregated for all of our partners around the world. And so if you would like to build your own hyperscale data center using NVIDIA's HGX A100, the carrier boards, the motherboards are available separately. Let me show you what it's like when you put A100 in your data center. Here, we're showing you a modern typical AI data center. There's a lot of DGXs inside with Voltas running in parallel for training; and for data analytics, running Spark, for example; or inference, they typically run on CPUs. And so this particular data center has 50 DGX-1s and 600 CPU systems for AI inference and data processing. $11 million, 630 kilowatts is approximately the going price for a state-of-the-art AI data center. With the A100, this is what it looks like. Boom. Unbelievable. This is the benefit of the new architecture, the combination of the high throughput, the MIG instances and the ability to do data processing, deep learning and inference, all on one computing platform. And the acceleration software we developed from Spark to training, all the way to inference. We can now combine all of those different server architectures into one. 
And by doing so, we reduced an $11 million data center into a $1 million rack, 28 kilowatts instead of 630 kilowatts, 1/10 of the cost and 1/20 of the power. The more you buy, the more you save. Let's take a look at that one more time. This is before, $11 million, 50 DGX-1s, 600 CPU servers, modern AI data center. Ladies and gentlemen, now. Before and now. Incredible achievement. Now let's take a look at another algorithm. The famous PageRank algorithm. There are hundreds of billions of web pages and trillions of links. The PageRank algorithm crawls the Internet and creates a gigantic graph of hyperlinks and websites and analyzes it to determine which one of the websites is the most relevant. We've taken a publicly available database called the Common Crawl dataset. This is only 2.6 terabytes of data and has 128 billion edges. The Internet has several hundred billion websites and trillions of edges, so this is a very small fraction of that. And to do that, it takes 3,000 servers and 105 racks, and these 3,000 servers can analyze the relevance of web pages using the PageRank algorithm and deliver 52 billion edges per second. This is state-of-the-art, 52 billion edges per second. Now what you're about to see is 4 DGX A100s connected through NVLink into essentially 1 giant DGX. This giant DGX has 32 GPUs. Effectively, the performance is incredible. Ladies and gentlemen, this is what it looks like before, and this is what it looks like now. A reduction of 75x in cost, 688 billion edges per second. It's simultaneously 13x the performance and 1/75 the cost. You got to see that one more time. Before. After. Ladies and gentlemen, the more you buy, the more you save. The DGX SuperPOD connects 140 DGX A100 systems, creating a 1,120 A100 computer. It has 170 Mellanox Quantum InfiniBand switches to have the lowest possible latency. Each one of the ports is 200 gigabits per second, and together, the Mellanox InfiniBand network fabric sustains 280 terabytes per second.
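Editor's sketch: the PageRank computation mentioned above, which scores pages by analyzing a graph of hyperlinks, is commonly implemented as a power iteration over the link graph. The version below is a textbook form in plain Python on a tiny hand-made graph, not the GPU implementation benchmarked in the talk; the damping factor 0.85 and the iteration count are conventional assumptions.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Textbook PageRank by power iteration.

    links maps each page to the list of pages it links to.
    """
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {page: 1.0 / n for page in pages}
    for _ in range(iterations):
        # Every page starts each round with the "random jump" share.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # Split this page's rank evenly across its outgoing edges.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its rank everywhere.
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


ranks = pagerank({"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]})
```

Each iteration touches every edge once, which is why throughput for this workload is naturally quoted in edges per second.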
15 kilometers of optical cables, nearly 10 miles, which is one of the reasons why we preconfigure the networking for our customers. It's 700 petaFLOPS of AI performance. The fastest supercomputer in the world is about 300 petaFLOPS. This 1 POD delivers for AI performance nearly twice that, and this can be built in just 3 weeks. One of the first deliveries of the DGX is to NVIDIA, and we extended our SaturnV AI supercomputer with DGX A100s. The SaturnV supercomputer is used by all of our researchers to advance, for example, DLSS 2.0, to do pretrained models for Jarvis or Metropolis or Clara, to do collaboration research with researchers and scientists around the world in health care, for example. And we use it to train models for our self-driving cars and robots. Our supercomputer is super important to us, and nearly all of the software platforms we do today have some neural network component to them. Investing in our own supercomputers is vital to our business. SaturnV today has 1.8 exaFLOPS of performance. The fastest supercomputer in the world is about 300 petaFLOPS. And so we have several supercomputers inside the company that we've been using to develop all of these advances in AI. And now with just this extension with DGX A100, NVIDIA SaturnV will have 4.6 exaFLOPS of total AI capacity. That is just a gigantic jump with this extension. Our researchers are super excited about it. The brand-new NVIDIA A100 data center GPU and the DGX A100 integrated AI supercomputer, and of course, all of the software stacks that go along with it. It's available today at $199,000. One of the most exciting opportunities in computing is what's called Edge AI. This is where IoT and AI come together to revolutionize devices. Trillions of devices will be all over the world, embedded with sensors, connected to the Internet, data centers running algorithms that infuse them with intelligence. This is going to be the Smart Everything Revolution.
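Editor's sketch: the SuperPOD and SaturnV figures quoted above line up with the per-GPU sparse FP16 peak given earlier in the talk. The arithmetic below is an inference from those numbers; in particular, the reading that the SaturnV extension amounts to about four such PODs is not stated in the talk.

```python
DGX_PER_SUPERPOD = 140
GPUS_PER_DGX = 8
SPARSE_FP16_TFLOPS_PER_GPU = 625   # peak figure quoted earlier in the talk

# 140 systems x 8 GPUs matches "a 1,120 A100 computer".
gpus_per_pod = DGX_PER_SUPERPOD * GPUS_PER_DGX

# 1,120 GPUs x 625 TFLOPS matches "700 petaFLOPS of AI performance".
pod_pflops = gpus_per_pod * SPARSE_FP16_TFLOPS_PER_GPU / 1000

# SaturnV grows from 1.8 to 4.6 exaFLOPS (written in petaFLOPS here).
saturnv_before_pflops = 1800
saturnv_after_pflops = 4600
added_pflops = saturnv_after_pflops - saturnv_before_pflops

# Inference only: the added capacity equals this many 700-petaFLOPS PODs.
equivalent_pods = added_pflops / pod_pflops

print(gpus_per_pod, pod_pflops, added_pflops, equivalent_pods)
```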
Just as the phone became the smartphone and revolutionized the industry of communications and created large industries around it, the same thing is going to happen to devices and things. Devices with sensors are going to be connected to the Internet, data centers are going to have amazing algorithms that infuse these devices with apparent intelligence. These devices are going to be everywhere. Whereas there were only billions of smartphones, there will be trillions of things, and they will be in all these different industries: agriculture, manufacturing, logistics, retail stores, warehouses, airports, train stations, streets. We will have sensors all over the world, and there will be intelligence applied to them. The fundamental difference between a smartphone and these Internet of things is the continuation of the sensor information that will be coming. Most users interact with their phones every now and then; in electronic time, the time separation between clicks is basically infinite. These sensors are on all the time. They are monitoring, picking up data, reasoning about what they're sensing and taking necessary action. It's important that the data center is placed close to the point of action where the data is being collected. Otherwise, the cost of streaming all of this enormous amount of data on to the Internet is going to be prohibitive. It's also important to put the data center close by so that you could sense and react to the environment as quickly as possible, where a few milliseconds make a difference. The speed of light traveling to data centers far away takes too long. And finally, in many industries, there are data privacy and data sovereignty issues. In the future, there will be millions of data centers all over the world, close to the point of action, close to where the data is generated and sensed and processing AI instantaneously. These data centers are going to be very much like the cloud data centers today.
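Editor's sketch: the remark above that the speed of light to a far-away data center "takes too long" when a few milliseconds matter can be made concrete with a propagation-delay estimate. The figure of roughly two thirds of the vacuum speed of light for signals in optical fiber is a common rule of thumb and an assumption here; the estimate ignores switching, queuing and processing time, and the function name is illustrative.

```python
SPEED_OF_LIGHT_KM_PER_S = 299_792.458
FIBER_FRACTION = 2 / 3  # assumed: light in fiber travels at about 2/3 of c


def round_trip_ms(distance_km: float) -> float:
    """Best-case round-trip propagation delay over fiber, in milliseconds."""
    speed_km_per_s = SPEED_OF_LIGHT_KM_PER_S * FIBER_FRACTION
    return 2 * distance_km / speed_km_per_s * 1000


# Compare a nearby edge site with a distant cloud region.
for distance_km in (10, 1000):
    print(distance_km, "km ->", round(round_trip_ms(distance_km), 2), "ms")
```

Under these assumptions a site 1,000 km away spends about 10 ms on propagation alone for each round trip, while a site 10 km away spends about a tenth of a millisecond.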
They're cloud native, powerful and most importantly, secure. We created a computing platform for this Edge AI application. We call it the NVIDIA EGX. EGX was made possible by 2 advanced processors. The first is the Ampere GPU, which has been designed for high-speed AI processing, but there are other capabilities that Ampere provides. The first is secure and authenticated boot, so that you know that this computer is authorized to be on the network and run the applications. The second is a new security engine for confidential AI. The AI algorithms are sensing, reasoning and taking action, so it's vitally important that we know it hasn't been tampered with. This secure confidential AI enclave is going to protect the AI model so that it's encrypted. And if it's tampered with in any way, the program will not be allowed to run. The second processor is the NVIDIA Mellanox ConnectX-6 Dx. It's a dual 100-gigabit per second Ethernet or InfiniBand adapter, and it has a crypto engine to process security protocols, TLS and IPSec at line speed. It has ASAP2 accelerated security and packet processing, with single root IO virtualization that allows this computer to be secure and virtualized. And then lastly, it has a brand-new capability called time-triggered transmission, which connects the EGX server with the 5G radio antenna and synchronizes the transmission between the 2 of them. The combination of these 2 processors makes up the NVIDIA EGX. And by installing the EGX into a standard x86 server, you turn it into a hyper-converged, secure, cloud-native, AI powerhouse. It's basically an entire cloud data center in 1 box. The NVIDIA EGX card is the starting point. What makes it amazing is the software stack on top. We call it the EGX stack. The magic of NVIDIA EGX comes alive because of the software stack. Fully integrated, fully optimized. It has 4 major pillars.
The first is that it's cloud native, designed for Kubernetes, orchestrating containers, managed from afar, the ability to update software without rebooting the computer. Second, it's the world's first GPU-accelerated 5G baseband radio. Our first partner is Ericsson, who is already developing 5G stack on top of NVIDIA. Third, a fully optimized high-performance AI processing pipeline that we are world-famous for. And then lastly, this entire stack is optimized for networking, storage and most importantly, security. This 1 box is essentially a state-of-the-art cloud-native data center that is secure. It could be managed from afar, it is tamper-proof, data is protected in motion and in place, it is authenticated before it can come onto your network, and as a result, you could manage a fleet of data centers. And all of those data centers are sprawled out all over your geography and all over your market, connected to sensors, applying intelligence to the products and services that you offer. The EGX stack comes with several reference applications. The 5G reference application is one. Another reference application is called Metropolis. Metropolis is designed for connecting multiple high-speed cameras or high-speed sensors to process the streaming data and do AI processing on top of that. Metropolis also comes with a library of pretrained state-of-the-art models. These pre-trained models has been trained with a great deal of data. We also provide transfer learning tools to adapt these models to your use case. This end-to-end system is what we call NVIDIA EGX. This is a demonstration of live cameras feeding multiple streams of video, which then goes into a 5G radio emulator, turns the live video into 5G packets. The 5G packets comes into the EGX stack, which runs the 5G Aerial SDK, GPU software baseband radio. 
Aerial and Metropolis are running on top of the EGX stack, which is running Kubernetes, CUDA, AI processing, network processing, security processing and storage processing, all completely optimized, running on 1 computer, end-to-end. Let me show you another application of NVIDIA EGX. This is factory automation. The reference application we developed for this is called Isaac. It starts with Isaac Sim, a virtual reality environment that obeys the laws of physics and appears photorealistic. In that world, a robot thinks it's in the physical real world. We're going to teach it skills and refine its skills. Once the robotics model is developed, we would run it on the NVIDIA EGX computer, running the Isaac robotics stack. The Isaac robotics stack includes sensing models, localization, articulation models and navigation models. It receives sensor information streamed over 5G from the fleet of robots. It does the robotics processing, sensing, reasoning and action and sends the actuation commands back over 5G to the fleet of robots, with extremely low latency, which is one of the features of the 5G protocol. Isaac is end-to-end. From the virtual reality environment we call Isaac Sim, the robotic stack we call the Isaac SDK, the computer NVIDIA EGX and the environment for 5G and the robotics stack on top controlling a fleet of robots to automate factories of the future. Let me show it to you. [Presentation]

Jen-Hsun Huang

executive

#33

Isn't that incredible? Ladies and gentlemen, today, I have really exciting news. We've been working together with one of the world's leaders in building amazing machines. Ladies and gentlemen, BMW. BMW has chosen NVIDIA to build the factories of the future. BMW manufactures some of the most amazing machines in the world, and they do it in volume. But what you may not realize is that they have 40 different models, and each one of these cars has 100 options. Every single day, 30 million raw parts come in from nearly 2,000 suppliers. They go to 30 factories around the world. And those 30 factories assemble 1 car every 56 seconds. These 30 million parts come in and are sent over to the workspace just in time for the craftsman to assemble those parts into the car. The empty crate is taken away, a new crate arrives. Each and every step of the way, a robot will be involved. The splitting, the picking, the placing, the delivery, the picking up of the empties. This is really a logistics miracle and is one of the great challenges of automated factories today. And this is the future. You're going to have a factory that's going to be designed as a robot. Mass production, customization, going hand-in-hand and what makes it possible is artificial intelligence and robotics. I can't be more delighted to work with all of the great people at BMW in this great challenge to invent the future of automated factories. As I mentioned, the fusion of IoT and artificial intelligence is going to create this whole new computing space we call Edge AI, and the applications for it will be in so many industries. We have already announced previously that we're working with Walmart on automated retail. We've also talked about how we're working with USPS, whose sorting system, the world's highest-volume and highest-speed, is now powered by NVIDIA EGX. Today, I announced our partnership with BMW to apply EGX, IoT, AI and robotics technology to reinvent the future of factories. Well, these industries are large. 
The number of sensors around the world is going to be enormous, trillions of them, collecting and processing data continuously. The vast majority of the world's data will be created, and AI processing will be done, at the edge. The NVIDIA EGX stack has so many partners, from operating systems to security processing to network processing, and one of the most important is the 5G stack and our partnership with Ericsson and Mavenir. And in each one of the vertical industries, there are great partners we're working with to develop applications of specific types of AI and specific skills from industrial, to medical, to robotics, to intelligent video analytics. This is one of the most exciting computing platforms we've built, bringing together NVIDIA's great AI processing and Mellanox's great network, storage and security processing. We've created a computing platform that has the power of the cloud, but could be used and deployed everywhere at the edge. NVIDIA EGX. One of the most exciting applications for AI is autonomous vehicles. This is essentially a data center on wheels. The AI is processing in real time with sensors feeding it, and it has to process it so fast that it can make decisions in a split second. Autonomous vehicles are one of the greatest computing challenges and will surely be one of the most impactful. This is one of the largest industries in the world, 10 trillion miles are driven each year. It is because of this we've created an architecture that is scalable. Ladies and gentlemen, today, I'm announcing that the Ampere architecture is coming to NVIDIA DRIVE. Because of the scalability of the Ampere architecture, the efficiency and the incredible computational performance, we can now scale our driving car computer from ADAS behind the windshield, all the way up to robotaxis. From a low-power chip for ADAS driver assistance to a powerful in-car computer for full self-driving autopilot, all the way up to driverless robotaxi, 1 single programmable architecture. 
This is basically the entire range of everything that moves. And it's our belief that everything that moves will eventually be autonomous. This will represent cars, passenger cars, taxis, trucks, shuttles, delivery bots. We're going to have autonomous vehicles of all kinds. The Ampere architecture will span this whole range. We also developed an end-to-end system for creating the AV autopilot application, testing it and simulating it, deploy it into the car, all the way to operating it with remote control. This entire system is built on 1 architecture. NVIDIA is collecting data, labeling and augmenting the data, training the AI models. We created a virtual reality driving simulator, testing the entire stack in the car, to an AI agent that would assist you, all the way to a virtual reality telepresence system that allows you to portal into the mind of the car so that you can remotely control the car. This end-to-end system and the entire scalable architecture from ADAS to autopilot to robotaxi is the drive system. Now let me show you our latest update. Before I show it to you, let me first remind you everything you're about to see is in virtual reality. All of our engineers are safely sheltering at home. They could develop their application at home and virtually drive their autopilot in our data centers. Please enjoy. [Presentation]

Jen-Hsun Huang

executive

#34

Wasn't that great? The NVIDIA DRIVE platform is an open platform. We are doing so, so that we can understand the great challenges of this new computing form. NVIDIA is developing the DRIVE system from beginning to end. From the collection of data all the way to the testing of the cars. We are developing the entire stack from top to bottom, and we're creating an architecture that spans ADAS to autopilot to robotaxis. This entire platform is open. You can engage the platform however you like. You could use our libraries, use our tools in any part of it to augment your own. The openness of our platform is the reason why we have so many partners around the world. Cars, trucks, tier 1 OEMS, mobility services, start-ups, software companies, mapping companies and simulation companies. We have partners all over the world using DRIVE, developing all kinds of different forms of autonomous transportation. This end-to-end pipeline, this end-to-end infrastructure is one of our greatest achievements, and we have learned so much in doing this about the future of autonomous vehicles. And so there you go. Direct from my kitchen, GTC 2020. We talked about a lot of stuff. Let me quickly summarize. The first thing we talked about was how accelerated computing is accelerating a momentum. And that we're taking it to the next level to data center scale computing, where accelerated computing and data processing and networking are both vitally important. I'm delighted that Mellanox and NVIDIA are now officially 1 company. The second thing we talked about was how real-time ray tracing, after all of these years, has enabled next generation of computer graphics. An Omniverse could be created with portals for designers with different tools, in different places, doing different parts of the design at the same time. They are able to do that because the world is interactively lit, it is physically based, it obeys the laws of physics and it could be created in real time. 
We call that Omniverse, and we're shipping it on RTX server. The third thing we talked about was NVIDIA AI. Machine learning is the greatest challenge of HPC today. And machine learning has 3 basic stages: the data preparation, the training of the model and the inference, the deployment, the production of the model. I spoke about the data preparation of the machine learning pipeline and how the amount of data that is being processed is going from tens of terabytes to hundreds of terabytes to very, very soon, petabytes of data. The leading compute engine is called Spark. Spark takes an entire data center, and it turns it into a compute engine, partitioning the large amounts of data into small chunks that are processed in clusters of computers. After several years of endeavor, today, we announced the acceleration of Spark 3.0. NVIDIA now has the entire pipeline of machine learning, from data processing through training to inference, all completely accelerated. I also spoke about 2 of the most important machine learning pipelines in the world today. The recommender system, the machine learning system that predicts user preferences for billions of people and trillions of items. It collects an enormous amount of data about objects, about users, about usage patterns and, by doing so, creates a predictive model of your preferences. We created an application framework called NVIDIA Merlin that simplifies this enormously complicated distributed computing, machine learning pipeline called recommender systems. We also spoke about conversational AI. Recent breakthroughs in speech recognition, natural language understanding and speech synthesis have made it possible for us for the very first time to imagine creating an AI model that allows you to have natural conversations. We created a conversational AI framework that codifies the entire pipeline with state-of-the-art AI models that are pretrained and optimized and tuned for performance and fast response. 
The entire pipeline can respond in just 200 to 300 milliseconds. As a result, you can have a reasonable conversation, interactive conversation with an AI agent. Jarvis is used by enterprises around the world, which adapt it to their domain to answer health care-related questions, insurance questions, financial services questions. It is now possible to retrain Jarvis for your domain. Conversational AI is now democratized. And right here from my kitchen, we announced the shipment of our Ampere GPU and the data center GPU, A100. Ampere is a miracle. Ampere is the largest and most complex processor the world has ever made. TSMC's 7-nanometer, 54 billion transistors connected to HBM2 memory with 1.5 terabytes per second of bandwidth on a 3D package called CoWoS. This processor goes into the DGX-1. Its third-generation Tensor Core has a peak throughput 20x greater than Volta, the most advanced processor in the world today. A100 is 20x the peak for training, 20x the peak for inference of V100. It has a brand-new architecture called MIG, Multi-Instance GPU. It could be configured as 1 GPU or 7 GPUs or something in between. MIG allows Ampere to be used as scale up for data analytics or training as well as scale out for public cloud instances or inference. This is our first GPU that has such incredible throughput and has the ability to configure itself into a large or small GPU. It's connected also by our next-generation NVLink, 600 gigabytes per second, 10x the bandwidth of PCI Express Gen4. DGX A100 is the most advanced AI instrument in the world. It is designed for the entire pipeline of machine learning from data processing, to training, to inference. It is the first computer we've ever built that is unified for all of those workloads, whether scale-up or scale-out. The NVIDIA DGX A100 is in production today and has the equivalent performance of 150 high-end servers costing well over $1 million. DGX A100 is available today for $199,000. The more you buy, the more you save. 
We also talked about the coming together of IOT, the Internet of things and artificial intelligence, creating this brand-new opportunity called Edge AI. This is the beginning of the Smart Everything Revolution. There will be trillions of things connected to the Internet. With intelligent AI services, you could just imagine the scale of this opportunity. You have already heard us announce in previous keynotes how we're working with Walmart, who is using NVIDIA EGX for smart retail, or USPS using NVIDIA EGX for the highest logistics sorting operations in the world. Today, I announced that BMW has selected NVIDIA, and the EGX and the Isaac robotics platform to create their next-generation factories. That was a busy GTC. It is great to have all of you. I want to thank all of you for partnering with us, and I want to particularly recognize all the researchers, scientists, artists and designers that take advantage of our platform to invent the future. One more treat for you. This is something that I'm supremely proud of. NVIDIA is one of the companies in the world that has assembled a great team of designers and architects and software programmers and scientists and AI researchers, computational mathematicians as well as incredible artists. The fusion of art, engineering and science, all under 1 roof is 1 of the things that really inspires me. I want to share with you the behind the scenes of the creation of the video, "I Am AI." I think you're going to love it. Please enjoy. And see you next time. [Presentation]
