NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary
November 8, 2021
Earnings Call Speaker Segments
Jen-Hsun Huang
executive
Welcome to GTC. We have a lot of exciting things to show you, so let's get started. We have a jam-packed keynote, but let me first share with you this year's I Am AI, a celebration of groundbreaking work by scientists and researchers around the world. [Presentation]
Jen-Hsun Huang
executiveAccelerated computing starts with NVIDIA CUDA, general-purpose programmable GPUs. The magic of accelerated computing comes from the combination of CUDA, the acceleration libraries of algorithms that speed up applications and the distributed computing systems and software that scale processing across an entire data center. We have been advancing CUDA in the ecosystem for 15 years and counting. We optimized across the full stack, iterating between GPU, acceleration libraries, systems, applications continuously, all the while expanding the reach of our platform by adding new application domains that we accelerate. With our approach, end users experience speed-ups through the life of the product. It is not unusual for us to increase application performance by many x factors on the same chip over several years. Imagine the joy of a researcher whose simulation completed in half the time just by downloading new software. As we accelerate more applications, our network of partners experienced growing demand for NVIDIA platforms. Starting from computer graphics, the reach of our architecture has reached deep into the world's largest industries. We start with amazing chips. But for each field of science, industry and application, we create a full stack. We have over 150 SDKs that serve industries from gaming and design to life and earth sciences, quantum computing, AI, cybersecurity, 5G and robotics. We introduced 65 new and updated SDKs at GTC this year. One of the major new industries that's accelerating with NVIDIA is design automation. I am thrilled to see ANSYS, Synopsys, Cadence and Dassault accelerate the simulation of thermal, mechanical, 3D electromagnetics, RF interference and signal integrity. A super exciting development is the work we're doing with ANSYS to accelerate ANSYS Fluent, the world's leading industrial fluids simulation package. 
Early results with the ANSYS multi-GPU solver show one DGX will replace 30 high-end dual CPU servers, leading to big savings in system cost and power. With the same total budget, customers can scale to much larger simulations. The number of developers that use NVIDIA has grown to nearly 3 million, a 6x increase over the past 5 years. CUDA has been downloaded 30 million times over the past 15 years and 7 million times last year alone. The adoption of accelerated computing is accelerating. Our expertise in full stack acceleration and data center scale architectures lets us help researchers and developers solve problems at the largest scales. Our approach to computing is highly energy efficient, and the versatility of our architecture lets us contribute to fields ranging from AI to quantum physics to digital biology to climate science. We have some great new acceleration libraries for you today. The first is ReOpt, an accelerated solver for operations research optimization problems like delivery vehicle routing and warehouse picking and packing. There are 87 billion ways to deliver 14 pizzas. It's not so easy for Domino's to deliver pizza in under 30 minutes. Operations optimization is needed for last mile delivery but also warehouse and manufacturing logistics. [Presentation]
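The "87 billion ways to deliver 14 pizzas" figure can be checked with a short calculation. This is only a sketch: it assumes "ways" means the number of orders in which a single driver can visit 14 distinct stops, which is 14 factorial. The function name `delivery_orders` is hypothetical and has nothing to do with the ReOpt API, which the transcript does not describe.

```python
import math


def delivery_orders(stops: int) -> int:
    """Number of distinct visiting orders for one driver and `stops` stops.

    Assumption: every ordering of the stops counts as one "way", so the
    answer is stops! (factorial).
    """
    return math.factorial(stops)


orders_14 = delivery_orders(14)
print(f"{orders_14:,}")  # about 87 billion orderings
```

Because the count grows factorially, checking every ordering quickly becomes impractical as stops are added, which is why routing problems are usually handled with optimization solvers rather than enumeration.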
Jen-Hsun Huang
executive
Quantum computing, relying on the natural quantum physics phenomena of superposition and entanglement, has the potential of solving problems that grow with combinatorial complexity. Nearly 100 teams around the world in universities, science labs, enterprises and start-ups are doing research in quantum processors, systems, simulators and algorithms. It is expected to take another decade or 2 to build a useful quantum computer. In the meantime, the industry needs a superfast quantum simulator to validate their research. So we created the cuQuantum DGX appliance, with an acceleration library for quantum computing workflows that speeds up quantum circuit simulations using state vector and tensor network methods. The first accelerated quantum simulator will be Google Cirq. The speed-up is terrific. Here are results for the quantum Fourier transform, Shor's algorithm (used to break public key cryptography) and Google's Sycamore circuit. A simulation that takes months can now be done in days. We're working on optimizing all the popular simulators. NVIDIA Research achieved a major milestone in quantum algorithm simulation using 1,688 qubits to find a solution for a MaxCut of 3,375 vertices. This is the largest-ever exact quantum circuit simulation, 8x more qubits than ever simulated before. With cuQuantum on DGX, quantum computer and algorithm researchers can invent the computer of tomorrow with the fastest computer today. The quantum DGX appliance will be available in Q1. Python is the programming language of scientists and ML and AI researchers. Python has a rich ecosystem of libraries: pandas for data analytics on data frames, NumPy for analytics of n-dimensional arrays and matrices, scikit-learn for machine learning, SciPy for scientific computing, PyTorch for deep learning and NetworkX for studying graphs and networks. There are nearly 20 million users of Python. Today, we're announcing cuNumeric, a drop-in accelerator for NumPy with zero code changes.
cuNumeric accelerates NumPy, scaling it from one GPU to multi-GPU to multi-node clusters to the largest supercomputers in the world. The parallelism is done implicitly and automatically. cuDF is pandas-like. cuML is scikit-learn-like. cuGraph is NetworkX-like. They are part of NVIDIA's RAPIDS open-source Python data science kit. RAPIDS has been downloaded 0.5 million times this year, over 4x more than last year. cuNumeric is built on Legion, which schedules tasks across CPU, GPU and DPU computing units across a data center in very similar ways as a modern CPU schedules instructions across its ALUs and load-store units like modern out-of-order execution CPUs that automatically extracts instruction-level parallelism and dynamically reorder the execution. Legion extracts task-level parallelism and dynamically reorders and dispatches the execution of these tasks often out of order across the entire data center. Legion is a data center scale compute engine, and cuNumeric is a data center scale math library. NumPy was downloaded 122 million times in the last 5 years. NumPy is used by nearly 800,000 projects on GitHub. Developers are going to be thrilled with cuNumeric. The scalability of cuNumeric is excellent. On the famous CFD Python teaching code, cuNumeric scales to 1,000 GPUs with only a 20% loss from perfect scaling efficiency. ReOpt, cuQuantum, cuNumeric, 3 fantastic new libraries. Let me show you the road map of my talk. I'll update you on big initiatives we're working on and introduce new ones that will shape our industries. A constant theme you'll see, how Omniverse is used to simulate digital twins of warehouses and plants and factories, of physical and biological systems, the 5G edge, robotics, self-driving cars and even avatars. You'll see how leading edge computer graphics, physics simulations and AI came together to make Omniverse possible and how the computing platforms and accelerated libraries we built lay the foundation to make Omniverse a reality. 
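To make the "drop-in" idea concrete, here is a minimal sketch of the kind of array code found in CFD teaching examples: a Jacobi relaxation of the 2D Laplace equation written in plain NumPy. The grid size, boundary values and the function name `jacobi_step` are illustrative assumptions, not taken from the keynote. The claim in the talk is that code of this style could run on cuNumeric by changing only the import line; that swap is not exercised here.

```python
import numpy as np  # per the talk, cuNumeric is meant to stand in for this import


def jacobi_step(grid):
    """One Jacobi sweep: each interior cell becomes the mean of its 4 neighbours."""
    new = grid.copy()
    new[1:-1, 1:-1] = 0.25 * (
        grid[:-2, 1:-1] + grid[2:, 1:-1] + grid[1:-1, :-2] + grid[1:-1, 2:]
    )
    return new


# Hypothetical problem: top edge held at 1.0, the other edges at 0.0.
grid = np.zeros((64, 64))
grid[0, :] = 1.0

for _ in range(200):
    grid = jacobi_step(grid)

print(grid[1, 32], grid[32, 32])  # values relax toward the steady-state solution
```

The whole update is expressed as slice arithmetic with no explicit loops over cells, which is the style of code that an implicitly parallel array library can partition across devices.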
Data center scale computing, million-x science, Omniverse, AI, avatars, robotics and self-driving cars. We have a jam-packed GTC. But before we jump into data centers, I want to show you something we've been building, a conversational AI, ToyMe. You're going to see speech understanding, natural language processing on the largest model ever trained, speech synthesis with my own voice, character animation and beautiful ray trace graphics all in real time. ToyMe was made with some amazing technologies that have become possible only recently and barely so. I asked a few friends to ask this cute little guy some tough questions. Let's take a look. [Presentation]
Jen-Hsun Huang
executive
In distributed computing, the network is the vital central nervous system of the computer. The network connects thousands of GPUs into a giant supercomputer and determines its scalability and ultimate performance. Today, we're announcing the NVIDIA Quantum-2, the most advanced end-to-end networking platform ever built. Quantum-2 is a 400 gigabits per second InfiniBand platform and consists of the Quantum-2 switch, the ConnectX-7 NIC, the BlueField-3 DPU and a whole bunch of software for the new architecture. Quantum-2 is the first networking platform to offer the performance of a supercomputer and the shareability of cloud computing. This has never been possible before. Until Quantum-2, you got either bare metal high performance or secure multi-tenancy, never both. With Quantum-2, your valuable supercomputer will be cloud-native and far better utilized. Quantum-2 has some great new features. Performance isolation keeps the activity of one tenant from disturbing others. A telemetry-based congestion control system keeps high data rate senders from overwhelming the network and jamming the traffic for all. Generation 3 SHARP has 32x higher in-switch processing to speed up AI training. A nanosecond precision timing system can synchronize distributed applications like database processing, lowering the overhead of waiting and handshaking needed to avoid race conditions. Nanosecond timing will also allow cloud data centers to become part of the telecommunications network and host software-defined 5G radio services. If NVIDIA's Selene DGX supercomputer were equipped with Quantum-2 today, the total bandwidth would be 224,000 gigabytes per second or roughly 1.5x the total traffic over the Internet. Quantum-2 starts with the amazing InfiniBand switch chip, 57 billion transistors in TSMC 7-nanometer, as big as our A100 GPU. It has 64 ports at 400 gigabits per second or 128 ports at 200 gigabits per second.
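The two port configurations quoted for the switch chip describe the same aggregate capacity, as a quick calculation shows. This sketch simply multiplies port count by per-port rate in one direction; the helper name `aggregate_gbps` is hypothetical, and the result is not an official specification figure.

```python
def aggregate_gbps(ports: int, gbps_per_port: int) -> int:
    """Aggregate one-direction bandwidth in gigabits per second."""
    return ports * gbps_per_port


wide_ports = aggregate_gbps(64, 400)     # 64 ports at 400 Gb/s
narrow_ports = aggregate_gbps(128, 200)  # 128 ports at 200 Gb/s

print(wide_ports, narrow_ports)          # both configurations give the same total
print(wide_ports / 1000, "Tb/s")
```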
A Quantum-2 system can connect up to 2,048 ports versus 800 ports in Quantum-1. That's over 5x the switching capacity. And Quantum-2 can scale up to 1 million end points within the 3-hop dragonfly topology. That's 6.5x over the current generation. This networking speed, switching capacity and scalability is coming just in time for the giant HPC systems that the world needs to build. Quantum-2 switch is sampling now. Quantum-2 offers 2 networking end point options: CX-7 and BlueField-3. CX-7 is the fastest NIC ever built, 8 billion transistors in TSMC's 7. CX-7 doubles the data rate of the world's current fastest networking chip, CX-6, and doubles the performance of Mellanox' famous capabilities like RDMA, GPUDirect Storage, GPUDirect RDMA and in-network computing. A 256-thread data path processor does crypto at line rate. CX-7 will sample in January. Quantum-2 also offers BlueField-3 InfiniBand. 16 64-bit ARM CPUs offload and isolate the data center infrastructure stack. BlueField-3 is 22 billion transistors in TSMC's 7. BlueField-3 is sampling in May. The NVIDIA Quantum-2, the most advanced networking platform ever built. Quantum-2 will be available from top computer makers and in supercomputing centers all over the world. It's going to give high-performance computing quite the boost. Cloud computing and machine learning are driving a reinvention of the data center. Container-based applications give hyperscalers incredible abilities to scale out, allowing millions to use their services concurrently. The ease of scale-out and orchestration comes at a cost. East-West network traffic increased incredibly with machine-to-machine message passing, and these disaggregated applications open many ports inside the data center that need to be secured from cyberattack. A new type of processor is needed to offload the CPU of the burden of processing the networking, storage and security software. 
NVIDIA's BlueField DPU, an infrastructure computing platform, is designed to do exactly that. BlueField offloads and accelerates the infrastructure software, which is consuming some 30% and growing of the CPUs. For multibillion-dollar data centers, the freed up capacity can be a giant cost savings or a throughput boost. The reception of BlueField is phenomenal. Today, we're announcing BlueField DOCA 1.2, a suite of new cybersecurity capabilities that make BlueField the ideal platform for the industry to build their zero-trust security systems. Protection at the perimeter and work group segmentation are no longer sufficient. Every touch point of applications, data, users and devices are potential attack surfaces. Since BlueField is the networking end point, we can secure a data center at virtually every touch point. We are delighted to announce that the leading cybersecurity companies are working with us to provision their next-generation firewall services on BlueField: Checkpoint, F5, Fortinet, Juniper, Guardicore, Palo Alto Networks, Trend Micro and VMware. The BlueField ecosystem is expanding. The cloud data center movement affects every computing company. There are now 1,400 developers working with BlueField. And now cybersecurity companies on BlueField can provide zero-trust security as a service. Until every attack surface is secure, we should assume security will be or is already breached. State-of-the-art cybersecurity platforms monitored the torrential user machine and machine-to-machine transaction logs, yet they only parse a fraction of that data looking for anomalies. With accelerated computing and deep learning, we can process and study everything. We created Morpheus, a deep learning cybersecurity platform that can monitor and analyze subtle data center characteristics generated by every user, machine and service. Morpheus is built on NVIDIA RAPIDS and NVIDIA AI. 
Workflows of Morpheus create AI models, digital fingerprints for every combination of app and user, to learn their usual patterns and look for abnormal transactions. These abnormal transactions, which may represent only a handful out of millions of events, would trigger a security event and alert an analyst to respond. Many computing fields have enjoyed tremendous advances with accelerated computing and deep learning. With Morpheus, we are bringing this superpower to cybersecurity. Be sure to go see the talks by F5 and Splunk on the cybersecurity work that we're doing together on Morpheus. [Presentation]
Jen-Hsun Huang
executive
The software revolution of deep learning is coming to science. This is extremely exciting and will make a big impact. Three connected dynamics will give us a 1 million x leap in computational sciences. Let me explain. First, accelerated computing, reinventing the full computing stack, from the chip and system, the acceleration libraries to the applications, gave us a 50x boost. Second, the boost launched deep learning, triggered the modern AI revolution and fundamentally changed software. The software that deep learning writes is highly parallel, making it even more conducive to GPU acceleration and scalable to multi-GPU and multi-node. Scaling to large systems like DGX SuperPOD gave us another 5,000x speed-up. Third, the AI software written with deep learning can predict results 1,000 to 10,000x faster than software written by hand, busting open completely the way we solve problems and the problems that are even solvable. 50x times 5,000x times 1,000x gives us 250 million x. Of course, the mileage will vary and much depends on the scale you invest. But when a solution to a worthwhile problem is within grasp, the investments will come. Look at the investments that are going into AI, robotics, self-driving cars. The signs are clear: accelerated computing, doing AI at data center scale, will give a giant boost in simulation performance. How do we apply deep learning to science? Science is based on the laws of physics: Newton, Maxwell, the laws of thermodynamics, Ohm's law, Bernoulli's principle, the law of conservation of energy, to name a few. Researchers are creating AI models that learn physics and make predictions that obey the laws of physics. The application of machine learning to improve physics simulation has been growing incredibly. Let me highlight just a couple that I know of. Karniadakis and the team at Brown described PINN, physics-informed neural networks.
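The "million x" arithmetic quoted earlier in this segment is a product of three factors, and it can be checked directly. The sketch below uses only the numbers stated in the talk (50x, 5,000x, and 1,000x to 10,000x); the variable names are illustrative.

```python
# Factors as quoted in the talk.
accelerated_computing = 50      # full-stack acceleration
data_center_scaling = 5_000     # scaling out to large systems
ai_low, ai_high = 1_000, 10_000  # AI surrogate versus hand-written software

combined_low = accelerated_computing * data_center_scaling * ai_low
combined_high = accelerated_computing * data_center_scaling * ai_high

print(f"{combined_low:,}x to {combined_high:,}x")
```

Both ends of the range sit well above one million, which is consistent with the speaker's caveat that "the mileage will vary": the headline figure leaves room for any one factor to fall short.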
Li, Anandkumar and their team at Caltech and NVIDIA described FNO, the Fourier neural operator, which can learn to approximate any partial differential equation. The same team recently combined the benefits of PINN and FNO into PINO, a universal function learner that obeys the laws of physics. PINO can learn from a principled physics simulator or observed data. Once trained, it can emulate the principled physics models at extremely high speeds. And equally important, this model is highly parallelizable and so can scale to very large systems to get a combined million x factor. Virtual screening is one of the major pillars of modern drug discovery. It involves finding a drug chemical that will bind to and inhibit the function of a protein in the pathway of a disease. The virtual screening process is a molecular dynamics simulation of the atomic forces between the chemical and the protein. The atomic forces of a molecule are determined by its 3D structure. The 3D structure of a human protein, a long strand of amino acids, is revealed through X-ray crystallography and cryo-electron microscopy. Using this painstaking method, only 17% of the roughly 25,000 human proteins have been decoded. In the vast space of human disease, without the 3D structure of most human proteins, computer-aided drug discovery is limited. Just this year, the researchers of AlphaFold and RoseTTAFold taught their AIs to predict the 3D shape of proteins from just their amino acid sequences. Overnight, DeepMind decoded over 20,000 human proteins. The tedious process of decoding a protein is turbocharged by an AI model. There are hundreds of millions of animal, plant and bacterial proteins that can now be decoded. Meanwhile, AI models are now able to learn characteristics of known effective chemicals and generate other potentially effective novel chemicals. Millions more potentially effective chemicals meet hundreds of thousands more protein structures, opening up a gigantic unexplored space of new opportunities.
The opportunity space has increased by many orders of magnitude, 1 million fold. This has created overnight a massive molecular simulation bottleneck. How do we get the million x leap to engage this opportunity? Physics ML might be the answer. Researchers at NVIDIA and Caltech used physics ML methods to teach a graph neural network to replace the expensive quantum calculations of atomic forces in a molecular simulation. OrbNet predicted Schrodinger's equation. The result is a 1,000-fold increase in simulation performance. Entos is a supercool company focused on using machine learning and computing to revolutionize drug discovery. This is a simulation of Hsp90, a chaperone protein that helps other proteins fold properly. You're watching something amazing. This video shows a simulation of the chemical reaction that is happening between the Hsp90 protein and a candidate drug. The candidate drug is forming a chemical bond with an amino acid of the protein. These chemical reactions are rare events, so we have to simulate at very long time scales. And because electrostatic simulations cannot model the atomic bond, quantum methods are needed to compute the free energy of the reaction. This simulation took 3 hours on one GPU. Without the OrbNet physics ML, it would have taken over 3 months. The future of drug discovery is computational end-to-end, modeling the disease pathway, the genes involved, the drug target interactions and the off-target interactions. With the confluence of million x acceleration, ML for protein and chemical structure prediction and physics ML simulation approaches, we are witnessing the dawn of the biology revolution. For climate science, we may finally have a way to simulate the Earth's climate 10, 20 or 30 years from now, predict the regional impact of climate change and take action to mitigate and adapt before it's too late. Severe droughts are happening around the world.
This is not caused by the lack of rain but by higher evaporation from rising temperatures. The dryness is also causing more wildfires. Predicting climate change, in order to develop strategies to mitigate and adapt, is arguably one of the greatest challenges facing society today. We don't currently have the ability to accurately predict the climate decades out. Although much is known about the physics, the scale of the simulation is daunting. Climate simulation is much harder than weather simulation, which largely models atmospheric physics and whose accuracy can be validated every few days. Long-term climate prediction must model the physics of earth's atmosphere, oceans and waters, ice, the lands and human activities and all of their interplay. Further, simulation resolutions of 1 to 10 meters are needed to incorporate effects like low atmospheric clouds that reflect the sun's radiation back to space. Ignoring these contributions accumulates into significant error in the long-term predictions. This is 10,000 to 100,000x higher resolution than any weather simulation today. There are no computers big enough that we can build. We need a computer science breakthrough. Today, we're announcing NVIDIA Modulus, a framework for developing physics ML models. Modulus trains physics ML models using governing physics and data from principled models and observations. Modulus has been optimized to train multi-GPU and multi-node. The resulting model can emulate physics 1,000 to 100,000x faster than simulation. With Modulus, scientists will be able to create digital twins to better understand large systems like never before. One important problem we can apply Modulus to solve is climate science. Climate change is reshaping the world. The largest reservoirs in the U.S. are at their lowest level in 2 decades, some 150 feet below where they were. The combination of accelerated computing, physics ML and giant computer systems can give us a 1 million x leap and give us a shot.
We will use principled physics models and observed data to teach AI to predict climate in super real time. We can create a digital twin of the earth that runs continuously to predict the future, calibrating and improving its predictions with observed data and predicting again. Researchers trained a physics ML model using atmospheric data from ERA5 of ECMWF. The model took 4 hours to train on 128 A100 GPUs. The trained model can predict hurricane severity and path at 30-kilometer spatial resolution. Seven days of prediction takes only 0.25 second on a GPU. That's 100,000x faster than simulation. Hopefully, in a couple of years, data will stream into a digital twin of earth running on Omniverse. An ensemble of physics ML models will predict the climate. Let's talk about Omniverse. To say the Internet changed everything is surely an understatement. We are always connected now. The Internet is essentially a digital overlay on the world. The overlay is largely 2D information: text, voice, images, video. But that's about to change. We now have the technology to create new 3D virtual worlds or model our physical world. These virtual worlds will obey the laws of physics or not. There can be AI or friends with you. We will jump from one world to another, like we do on the web with hypertext. This new world will be much larger than the physical world. We will buy and own 3D things like we buy 2D songs and books today. We will buy, own and sell homes, furniture, cars, luxury goods and art in this world. Creators will make more things in virtual worlds than they do in the physical world. We built Omniverse for builders of these virtual worlds. Some worlds will be built for gathering and games, but a great many will be built by scientists, creators and companies. Virtual worlds will crop up like websites today. Omniverse is very different from a game engine. Omniverse is designed to be data center scale and hopefully, someday, planetary scale.
The portal of Omniverse is USD, Universal Scene Description, essentially a digital wormhole that connects people and computers to Omniverse and for one Omniverse world to connect to another. USD is to Omniverse what HTML is to websites. Omniverse is futuristic. Omniverse can connect design worlds. Things created in the Adobe world can be connected to those in the Autodesk world through Omniverse, enabling designers to collaborate in a shared space. Changes by a designer in one world are updated for all connected designers, essentially like a cloud shared document for 3D design. Omniverse will revolutionize how the 40 million 3D designers in the world collaborate. Companies can build virtual factories and operate them with virtual robots in Omniverse. The virtual factories and robots are the digital twins of their physical replica. The physical version is the replica of the digital since they're produced from the digital original. Omniverse digital twins are where we will design, train and continuously monitor robotics factories and buildings, warehouses and cars of the future. Let me show you some of the fundamental technologies that make Omniverse possible. [Presentation]
Jen-Hsun Huang
executive
We are releasing a big update to Omniverse today with some exciting new features. Showroom, an Omniverse app for demos and samples that showcases core Omniverse technology: graphics, physics, materials and AI. Farm, a systems layer that orchestrates the processing of batch jobs across multiple systems, workstations, servers, bare metal or virtualized. Farm can be used for batch rendering, synthetic data generation for AI or distributed computing. Omniverse AR streams graphics to phones or AR glasses. Omniverse VR is the world's first full-frame interactive ray-traced VR. Since launch last year, Omniverse has been downloaded 70,000 times by designers at 500 companies. Along with us, the community, companies and tool providers are building Omniverse connectors. There are 14 available now and 15 more coming soon. Bentley announced that iTwin with Omniverse is now in early access. Bentley is not just connecting to Omniverse. They're building their digital platform on it. Bentley is used by 90% of engineering firms and has nearly 2 million users of Bentley iTwin. Heat recovery steam generators, or HRSGs, convert the hot gas out of the combustion turbine into steam, which drives a steam turbine to generate electricity. Corrosion is certain, so inspection and maintenance are needed. Siemens Energy estimates that by predicting corrosion accurately, they can reduce inspection during regular maintenance and unplanned downtime by 70%. Reducing the industry 5 to 7 days can save nearly $2 billion a year. HRSG corrosion is a multi-physics problem with a combination of flow characteristics, water chemistry and operating conditions. Using NVIDIA Modulus, the physics ML framework, and Omniverse, we've created a digital twin platform with Siemens. Let's take a look. [Presentation]
Jen-Hsun Huang
executive
BMW produces one vehicle per minute, each with roughly 25,000 parts. There are 5 million parts on the factory floor at any time. At GTC Spring, BMW showed us how they're building a digital twin of the Regensburg factory. They have since expanded to 3 other factories totaling 10 million square meters. Their engineers are also using Isaac Gym, built on Omniverse, to teach their robots new skills. Let's take a look at the digital twin factories that BMW is building. [Presentation]
Jen-Hsun Huang
executive
Ericsson is building a digital twin of a whole city to configure, operate and continuously optimize their fleet of 5G antennas and radios. This is a really great story. Take a look. [Presentation]
Jen-Hsun Huang
executive
Someday, that fleet of antennas will use AI to learn the best beam forming and signal strengths to optimize the quality of service and throughput in the city while conserving energy. Omniverse, as you could see, is a foundational platform for digital twin virtual worlds, where AI systems are created. Let's talk about building AI models and systems. Graphs are the native format, the most natural data structure of the world's data. Whereas CNNs learn from spatial data and RNNs learn sequences, graph neural networks can learn relationships. How molecules connect to each other in a protein, how people are connected in a social graph, how roads are connected to towns and cities, all can be described as a graph. Deep Graph Library, DGL, is a Python library built to implement graph neural networks on top of existing deep learning frameworks. We're working with the DGL community to accelerate GNN processing, like we have with CNNs, RNNs and transformers, from constructing the graph to sampling subgraphs and projecting graphs into a DNN framework. We are accelerating the workflow so that developers can train and run inference on graphs with billions and trillions of edges. GNNs are the new go-to models for financial services, drug discovery, digital biology and cybersecurity. Our early graph engagement partners have seen excellent results. PayPal significantly improved their collusion fraud detection. Amazon used it to improve Amazon Search and reduce abuse and fraudulent sellers and buyers. They process graph sizes impossible before. Pinterest scaled search and recommendations to 0.5 billion users. We will have early access in December. Transformers are models that can learn sequence patterns in parallel. This breakthrough sped up language model training dramatically, which led to self-supervised language learning. No longer limited by human data labeling, giant self-supervised Transformers benefit from the troves of digital knowledge on the Internet.
The recent breakthrough of large language models is one of the great achievements in computer science. There's exciting work being done in self-supervised multimodal learning and in models that can do tasks they have never been trained on, called zero-shot learning. 10 new models were announced just last year alone. Training large language models is not for the faint of heart. $100 million systems training trillion parameter models on petabytes of data for months require conviction, deep expertise and an optimized stack. We created NeMo Megatron, a framework dedicated to training speech and language models of billions and trillions of parameters. It is optimized to scale out to gigantic systems and sustain the highest computation efficiency. Our researchers trained GPT-3 on NVIDIA's 500-node Selene DGX SuperPOD in 11 days. And together with Microsoft, we trained the Megatron MT-NLG 530 billion parameter model in 6 weeks. With NeMo Megatron, any company can train state-of-the-art large language models. Once trained, how do we run these large language models? Inference response time has to be sufficiently fast to be useful. On a high-end dual Xeon Platinum CPU server, inferencing Megatron 530B takes over a minute. For many applications, that's basically unusable. GPU accelerating these models is also challenging because the model sizes require much more than the frame buffer size of a GPU. GPT-3, with 175 billion parameters, needs at least 350 gigabytes of memory. Megatron, with 530 billion parameters, needs over 1 terabyte of memory. So we created the world's first distributed inferencing engine. NVIDIA Triton now does distributed processing across multiple GPUs and multiple nodes. GPT-3 will fit easily on an 8-GPU server. Megatron 530B will distribute across 2 DGX systems. The performance is incredible, from over a minute to half a second. The capability and implication of large language models are profound.
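The memory figures quoted above follow from a simple estimate. The sketch below assumes 2 bytes per parameter (16-bit weights), which is what the 175 billion parameters to 350 gigabytes ratio implies, and counts weights only, ignoring activations and other runtime memory. The server shape (8 GPUs with 80 GB each) is also an assumption chosen for illustration; the transcript does not state GPU memory sizes. All names are hypothetical.

```python
import math

BYTES_PER_PARAM = 2     # assumption: 16-bit weights
GPU_MEMORY_GB = 80      # assumption: memory per GPU
GPUS_PER_SERVER = 8     # assumption: 8-GPU server


def weight_memory_gb(params: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return params * BYTES_PER_PARAM / 1e9


def servers_needed(params: int) -> int:
    """Minimum number of servers whose combined GPU memory holds the weights."""
    per_server_gb = GPU_MEMORY_GB * GPUS_PER_SERVER
    return math.ceil(weight_memory_gb(params) / per_server_gb)


gpt3_params = 175_000_000_000
megatron_params = 530_000_000_000

print(weight_memory_gb(gpt3_params), servers_needed(gpt3_params))
print(weight_memory_gb(megatron_params), servers_needed(megatron_params))
```

Under these assumptions the smaller model fits within one server and the larger one needs two, which matches the deployment described in the talk; a real deployment would also need headroom beyond the weights.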
LLMs can answer deep domain questions; comprehend and summarize complex documents; translate languages; write stories; write computer software; understand intent; be trained without supervision; and are zero-shot, meaning that they can perform tasks without being trained on any examples. LLMs are pretrained on general knowledge and can be retrained to effectively serve new domains. There are 20 or 30 languages that represent 80% of the world's population. There are easily 100 industrial or science domains. And within them, plenty of use cases. Sweden is working on digitizing its history. Samsung is building a smart speaker for the over 200 million Portuguese speakers in South America. VinBrain is training a Vietnamese large language model for health care. JD is building an LLM for their e-commerce services to engage their 0.5 billion customers. Rakuten is building a Japanese LLM for their digital services. ServiceNow is building an IT help desk chatbot. And Xiaomi, the world's largest phone maker, is building an AI assistant. Customizing large language models for new languages and domains is likely the largest supercomputing application ever to come along. There are many AI models that are now mature and industrialized for broad enterprise use: computer vision, speech recognition, recommender systems, graphs and trees, time series models, generative models, variational autoencoders and large language models. There are excellent applications and uses of AI in leading companies across the world's industries. It's great to see them present their work at GTC. Be sure to go watch their talks. There are now 25,000 companies running AI on NVIDIA. Financial companies are looking to reduce fraud on over 1 billion credit card purchases a day, loss that costs the companies and consumers over $35 million a year. Customer and contact centers are overwhelmed. There are over 0.5 billion calls a day. This is a $20 billion industry. 
And of course, e-commerce product service recommendations for what is soon to be a $10 trillion industry. For all companies, automation is vital to growth, and AI is the most powerful automation technology we have ever known. Videoconferencing is one of the most important apps for most of us today. We're doing 15 billion meeting minutes a day. Microsoft has over 200 million active users. We're delighted to work with Microsoft to develop live captioning across 28 languages. This is an invaluable feature for those who are deaf or have hearing difficulties. Every captioning session is personalized for each meeting to understand names of people and specific jargon. Video conferencing technology is going to advance very fast. Going forward, a great deal more AI will be infused. In addition to background and noise removal, there will be AI for all kinds of amazing things. Language translation, eye contact, relighting and much more. Let's talk about inference. AI is a new way to write software, and inference is running the software written by AI. Inferencing is challenging on multiple dimensions. The computation intensity of the networks is high, but that's just the start. AI is data-driven. So the movement of data, the preprocessing and post-processing of data, all play into its performance. NVIDIA's CUDA GPU architecture shines at processing this end-to-end pipeline. AI applications have different requirements: response time, batch throughput, continuous streaming. Different use cases have different models, and deep learning architectures are really complex. There are different frameworks. There are different machine learning platforms. There are different platforms with different operating environments from cloud to enterprise, to edge, to embedded. There are different confidentiality, security, functional safety and reliability requirements. 
And the world has a large installed base of different CPUs and GPUs, each with different capabilities and performance characteristics. The combination of all those requirements is gigantic. Inferencing is arguably one of the most technically challenging run time engines the world has ever seen. Today, we're making the biggest release of inference tools ever. First, NVIDIA's TensorRT compiler has been integrated natively into TensorFlow and PyTorch. Many developers inference directly from the frameworks. It's easy, it always works, but it's slow. Now with one line of code, machine learning developers can get a 3x boost without lifting a finger, just one line of code. Tree models are ubiquitous, especially in finance. They are naturally explainable, and new predictive features can be added on without fear of regressions. Today, we're announcing that our Triton Inference Server will do inferencing on DL as well as ML models. The performance is fantastic and game changing. Here's an example on the IEEE fraud detection data set. The goal is to improve detection rate while responding in time to block the transaction. In this chart, that's staying to the right of the red line, which marks 1.5 milliseconds, the longest allowable processing time. For small trees, both CPUs and GPUs can do so. However, with large trees needed for better detection rates, inference time remains under 1.5 milliseconds for a GPU, while a CPU now takes 3.5 milliseconds, way too long to block the transaction. With this release, we open NVIDIA GPUs to the world of classical ML inferencing. Now with one inference platform, Triton lets you inference DL and ML on GPUs and CPUs. Announcing a major upgrade to our Triton Inference Server: inference on all models, any framework, multiple query types, ML and DL, for all platforms, cloud, on-prem, edge and embedded, multi-GPU, multi-node, on CUDA, x86 and ARM. One engine, NVIDIA Triton, for all inference workloads. 
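The fraud detection example above comes down to scoring a transaction with a tree ensemble and checking whether the answer arrives inside the 1.5 millisecond budget. The following is a minimal pure-Python sketch of that idea, not Triton's implementation: the tree encoding, the toy forest and the helper names are all hypothetical, and a CPU-only toy like this says nothing about real GPU or CPU timings.

```python
import time

LATENCY_BUDGET_MS = 1.5  # longest allowable processing time quoted in the talk


def score_tree(tree, x):
    """Walk one tree. A node is (feature_index, threshold, left, right);
    a leaf is a plain float score."""
    while not isinstance(tree, float):
        feature, threshold, left, right = tree
        tree = left if x[feature] < threshold else right
    return tree


def score_forest(forest, x):
    """Average the leaf scores of every tree in the ensemble."""
    return sum(score_tree(t, x) for t in forest) / len(forest)


def score_within_budget(forest, x, budget_ms=LATENCY_BUDGET_MS):
    """Score a transaction and report whether it met the latency budget."""
    start = time.perf_counter()
    score = score_forest(forest, x)
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return score, elapsed_ms, elapsed_ms <= budget_ms


# Hypothetical two-tree forest over features [amount, items_in_basket].
forest = [
    (0, 100.0, 0.1, 0.9),
    (1, 3.0, 0.2, (0, 500.0, 0.6, 1.0)),
]
transaction = [250.0, 5.0]

score, elapsed_ms, in_time = score_within_budget(forest, transaction)
print(f"fraud score={score:.2f}, elapsed={elapsed_ms:.4f} ms, in time={in_time}")
```

Larger and deeper forests improve detection but lengthen the walk per transaction, which is the trade-off the 1.5 millisecond line in the chart represents.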
The performance of Triton is spectacular across the board from imaging to speech AI, natural language processing, recommenders and reinforcement learning. For CSPs, Triton drives up the utilization and throughput of their infrastructure, freeing capacity for new growth. For users, Triton drives up throughput while reducing cost. This is one of the major benefits of the NVIDIA platform. With our full stack optimization and rich ecosystems, customers enjoy boosts in performance and new functionality throughout the entire life of use. Years after purchase, our chips keep getting faster and better. The more you buy, the more you save. Every company and every industry is looking to increase automation. To automate, we need to program computers to recognize patterns and execute a task repeatedly and safely. But the world is unstructured. The range of tasks humans perform in an infinite range of circumstances is impossible to describe in programs and rules. Advances in AI have opened new opportunities to automate tasks unimaginable before. In computer industry parlance, the edge is where computers touch the world. A large number of edge applications today can be processed in the cloud. For example, people using phones connected to cloud services. For many edge applications, transit to the cloud is not possible for response time, data security or reliability reasons or the practicality of data transport costs for continuous high-speed sensing. Edge applications are essentially robotic applications. They perform similar tasks under similar requirements as self-driving cars. The unifying concept of edge computing is the need to process some combination of sensors, high-speed IO, data processing, signal and physics processing, AI inferencing and computer graphics. This is basically the robotics pipeline and must be processed in real time. Processing time translates to safety, cost, capacity and ultimately, usability. 
So how do you build the AI applications that process the robotics pipeline? We created the NVIDIA Unified Computing Framework. It lets us compose containers and microservices into a fast pipeline by chaining the processing of dedicated accelerators, the CUDA GPU, Tensor Core AI, RTX graphics, networking security and fast IO. UCF lets you build AI applications that can process the robotics pipeline. UCF can create applications that run in data center or embedded systems. Buildings, warehouses, factories, farms and roads will be able to sense in the future. Metropolis is our video processing and analytics platform. From streaming video, it can detect, track, count, infer 3D pose and even reconstruct full 3D scenes in the future. We support cameras today, but it's an easy extension with UCF to support lidar, depth sensors, imaging radars, ultrasonics and infrared. Metropolis is full stack and open like all of our platforms. Customers can use the Metropolis application as is or customize our graph. The stack can integrate third-party 5G radio accelerated by AERIAL CUDA PHY. And NVIDIA-certified edge computing systems are available from every computer maker. Mavenir, the leader in software-defined 5G solutions, builds 5G core and virtualized radio access networks. They use the Metropolis platform to create a fully optimized construction kit to do AI on 5G for industrial applications in factories, plants, public spaces, farms and places where IT is limited. MAVedge-AI will be available to early access customers in Q1. The combination of rich sensors, computing and AI at the edge will inspire a wave of new ideas, applications that are simply not possible today. Here is an example of work that we're doing with Verizon. [Presentation]
Jen-Hsun Huang
executiveIn this chapter, I've shown you how we train new models, graph neural networks and large language models and how Triton is one engine for all inference workloads. And using UCF, how you can compose these models into edge applications. Our platform is full stack and open, runs from the cloud, on-prem to edge and embedded. With partners across the industry, we've built a rich ecosystem that connects NVIDIA AI into whatever IT infrastructure, software platform, workflow or integrator you choose. Still, there remains a great deal of engineering to stand up these stacks. For that last mile, we're partnering with Equinix to pre-install and integrate NVIDIA AI into their data centers around the world. We're making it easy for enterprises to test drive their workload. And when you're ready to scale out, the full recipe is available to our network of partners to help you do it at Equinix, your own data center or anywhere else. We call this LaunchPad. You will find LaunchPads all over the world, in Silicon Valley; Dallas; Washington, D.C.; London; Paris; Amsterdam; Frankfurt; Singapore; and Tokyo. If there are other locations you would like a LaunchPad, let us know. In the near future, there will be billions of robots to help us do things. Some will be physical robots. Most will be digital, virtual robots. Some virtual robots will be fully autonomous. Others, semiautonomous or even tele-operated, that is to say with human in the loop. Maxine is our avatar platform, our virtual robots platform. We've been showing you pieces of our technology for some time. Today, I'm going to put all the pieces together. Maxine can be autonomous or tele-operated, realistic or artistic. Maxine can be used for a broad range of applications. For example, customer service, live, on the web or in Omniverse. It can be used for video conferencing or for animating game characters or being integrated into a robot. The fundamental technologies of Maxine are just becoming possible. 
Computer vision, neural graphics, animation, speech AI, dialogue manager, natural language understanding, recommenders, these are foundational technologies we've been talking about for some time. This was pretty much impossible 5 years ago and barely so today. First, speech AI. Today, we're announcing the public release of NVIDIA Riva neural speech AI. This is the input-output of Maxine. We've dedicated significant R&D and built DGX SuperPODs to make Riva possible. Riva speaks English but recognizes 7 languages: English, Spanish, German, French, Japanese, Mandarin and Russian. Riva will speak more languages in the future. Riva can close caption, translate, summarize, answer questions and understand intent. Riva's accuracy is world-class, and the response time is unrivaled. And with only 30 minutes of training, Riva can be tuned to a specific voice, for example, that of a brand ambassador of your company. Riva can be fine-tuned for vocal pitch, duration and energy of human-like expressiveness. Riva can be deployed in any cloud or at the edge. Early customers have seen excellent results. [Presentation]
Jen-Hsun Huang
executiveThis is a simplified diagram of Maxine's UCF graph, Unified Computing Framework. The Maxine compute graph consists of video, audio, graphics and IO processing. AI models include vision, speech, animation, language, recommenders. Maxine runs NVIDIA AI on EGX servers and Omniverse on RTX servers. And all of it has to be interactive. Let's take a look. [Presentation]
Jen-Hsun Huang
executiveThis is a Metropolis application. We call it Tokkio, a talking kiosk. A little animated robot is making eye contact and tracks the customers. From speech recognition to natural language dialogue manager that infers intent and actions. [Presentation]
Jen-Hsun Huang
executiveTo recommendation to natural speech. Tokkio responds in about 2 seconds. Very interactive. That was Maxine in autonomous and artistic mode. What if you would like to use Maxine in a tele-operated and realistic mode? This would be useful for customer service or video conferencing. Let's take a look. [Presentation]
Jen-Hsun Huang
executiveRiva converts the text to speech in those languages. Omniverse takes over and converts the speech to 3D facial animation. Alex's German, French and Spanish speaking avatars are streamed simultaneously. Depending on which version of Alex's avatar you choose, she will speak to you in that language. [Presentation]
Jen-Hsun Huang
executiveIn this case, Maxine uses computer vision to track Alex's face and recognize her expressions. The 3D animation animates a virtual but realistic avatar of Alex. [Presentation]
Jen-Hsun Huang
executiveLet's talk about robots. Future medical instruments will become robotic. Recent advances in AI, physics ML, ray tracing and the computing advance we've spoken about will also revolutionize medical instruments. The algorithms will be reimagined by AI. The instrument will be reinvented by edge computing architectures. The business models will be revolutionized as instrument sales will be replaced by medical device Software-as-a-Service solutions. These dynamics are great for the patients, for the hospitals and for the instrument makers of this $200 billion industry. The industry needs a software-defined imaging platform to build this future on just as the auto industry needed a software-defined AV platform. Today, we're announcing NVIDIA Clara Holoscan, a software-defined programmable imaging platform. Holoscan is the culmination of many years of planning. It takes all of NVIDIA's technology to make Holoscan. The last 2 pieces of the puzzle are just coming online, Unified Computing Framework that I described and a new chip, Orin, a superfast sensor processing robotics chip. The base Holoscan platform consists of Orin and CX-7. Orin can process the entire robotics pipeline, sensors, physics, AI, imaging and graphics in a single chip, 12 ARM CPUs, 5.2 teraflops of FP32, 250 TOPS of AI, 740 gigabits per second, nearly 1 terabit per second of high-speed IO to connect sensors. You can optionally add an A6000 Ampere GPU and get another 39 teraflops of FP32 and over 600 TOPS of AI inference. With Clara Holoscan UCF, instrument makers have a development platform to build real-time applications that connect these powerful engines. The Holoscan platform is open. Third parties can build upon Holoscan's interfaces and APIs. Researchers can do great new science, and instrument makers can integrate Holoscan into their solutions. We are delighted that AJA Video Systems, KAYA Instruments, Verasonics and us4u are building front-end sensors to support Clara Holoscan. 
Holoscan applications can be deployed fully in instrument, in the hospital's data center or a mixture of both. This allows companies to develop applications that require more computing than is in the device or to upgrade the installed base of devices years after deployment. NVIDIA Clara Holoscan is a full stack open platform for next-generation software-defined instruments. Holoscan is our third robotics platform. We've got some great updates on our other 2, Isaac and DRIVE. The robotics industry is growing incredibly. Our Isaac ecosystem is now over 700 companies and partners. That number has grown 5x in the last 4 years. AMRs, or autonomous mobile robots, are being deployed in giant warehouses to handle the incredible growth of e-commerce fulfillment. Cleaning robots, restaurant and retail automation robots, last mile delivery robots, moving telepresence robots are all being worked on. A robot perceives the environment, reasons about where it is and where it needs to go, what it needs to do and then develops a plan to do it. There are 3 interconnected work streams in building a robotic system. First, train the robot's AI models to perceive. For training, we have NVIDIA AI and DGX. Second, in a simulator, train the robot to manipulate or navigate. For simulation, we have Isaac Sim on Omniverse running on RTX. The Isaac Sim Omniverse simulation will also serve as the digital twin of the robot when deployed. Third, operate the robot in the environment. Here, Isaac running on AGX does the perception, localization, mapping and planning or otherwise, the robotics pipeline in real time. If the robots are connected over 5G and orchestrated from a central server in the warehouse, we would operate the Isaac stack on EGX. So DGX, RTX, EGX and AGX systems and their appropriate software stacks make up the end-to-end machine learning loop. Isaac is a full stack and open platform. Isaac now supports the ROS ecosystem, the large open source robotics community. 
ROS has 700,000 developers and is growing fast. The Isaac run time can now be a node in the ROS framework. For example, to do object detection, segmentation, 3D pose estimation, visual odometry or point cloud processing. All of that can be 10x faster. Instantly, ROS developers get a giant boost in performance and benefit from the algorithms in Isaac. ROS developers can also import the ROS URDF, robot definition format, directly into Isaac Sim to simulate the robots. Isaac Sim is the most realistic robotic simulator ever created. It's built on Omniverse. Sensors are modeled, physics is simulated, environments are photorealistic, robots and simulation are running their actual stack either SIL, software in the loop, or HIL, hardware in the loop. The robot is connected to an actual map. It really feels it's in the environment. The goal is for the robot to not know whether it is inside a simulation or the real world. We strive to minimize the sim to real domain gap. Training data is incredibly hard to create for robotics. Unlike cars on roads, the world of robots is far more random. Cars follow lanes and avoid other cars. Robots have no lanes and are designed to make contact. It is impossible to collect and label all the scenarios to train a robot. Isaac Sim Replicator is an engine to generate synthetic data to train robots. Replicator simulates the sensors, generates data that is automatically labeled and, with the domain randomization engine, creates rich and diverse training data sets. The ROS community will be supercharged end-to-end with Isaac Replicator, Isaac Sim on Omniverse and Isaac ROS. Someday, everything that moves will be autonomous either fully or mostly autonomous. By 2024, the vast majority of new EVs will have substantial AV capability. We are developing an end-to-end flow for building autonomous vehicles as well as a full stack in-car AV system and a global cloud map. NVIDIA DRIVE is full stack and an open AV platform. 
Customers can decide to use just our development flow, parts of our driving computer, connect to our cloud map or partner with us end-to-end. We're working with companies building cars, SUVs, sports cars, trucks, vans, robotaxis and food delivery vehicles. Autonomous vehicles are robots, and the same 3 pillars of machine learning development apply. Training models with NVIDIA AI and DGX, simulation and synthetic data generation with DRIVE SIM on Omniverse and a real-time robotics pipeline with drive AV on the Orin robotics chip. The first goal is to transform the data from surround sensors into a 4D world model. The left image is showing the surround cameras. The right image is the world model, essentially the mind of the car. With a high integrity and high precision world model, we use it to avoid obstacles, localize to a map, reason about the environment and plan paths to reach our destination. It starts with the sensor and computing architecture of the car. The design should allow for high-fidelity sensing, redundancy and failover safety with sufficient computing power and programmability to process software improvements for the life of the car. This is Hyperion 8, the architecture of 2024 models. The sensor suite is 12 cameras, 9 radars, 12 ultrasonics and 1 front lidar. All of this is processed by 2 Orins. For the dev kit, we include Ampere GPUs to give plenty of performance headroom so that engineers can have the best environment to prototype new software. Hyperion 8 is available today. For anyone developing an AV or sensors for AV, Hyperion 8 is an ideal platform. We collect petabytes of road data from around the world and have some 3,000 trained labelers creating training data. Still, synthetic data generation is a cornerstone of our data strategy. Here, you see a scene through the simulated surround cameras with data labeled automatically. And on the right, some of the AI models that were trained with the data. 
DRIVE Sim Replicator is a synthetic data generator for autonomous vehicles and is built on Omniverse. The lens models are simulated and consider motion blur, rolling shutter, LED flicker and Doppler effect. We work closely with sensor makers to accurately model their sensors. The camera, radar and lidar sensor models are path traced. The materials are physically simulated for accurate beam reflections. We built a lidar materials library, and now we're also building a radar materials library. Replicator is a game changer for us. Replicator bootstraps the AI labeling tools and the AI models before Hyperion 8 is even built and any data has been collected. Replicator can label ground truth in ways that humans cannot, tracking moving objects across sensors, velocity, distance, occlusion, severe weather conditions. It is accurate and low cost, and it augments data where we have gaps. [Presentation]
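The idea behind synthetic data generation with randomized conditions and automatic labels, as described above for Replicator, can be sketched in a few lines. This is a toy illustration only: it does not use any Omniverse or Replicator API, and the scene fields, value ranges and function names are hypothetical. The point it shows is that the generator already knows the ground truth for every sample, so the label costs nothing.

```python
import random

OBJECT_CLASSES = ["car", "pedestrian", "cyclist"]   # hypothetical classes
WEATHER = ["clear", "rain", "fog", "snow"]          # hypothetical conditions


def random_scene(rng):
    """Sample one randomized scene description together with its label."""
    obj = rng.choice(OBJECT_CLASSES)
    return {
        "object": obj,
        "distance_m": rng.uniform(2.0, 150.0),
        "velocity_mps": rng.uniform(0.0, 30.0),
        "occlusion": rng.uniform(0.0, 0.9),
        "weather": rng.choice(WEATHER),
        # Ground truth is known by construction, so labeling is automatic.
        "label": obj,
    }


def generate_dataset(n, seed=0):
    """Generate n randomized scenes; a fixed seed makes the set reproducible."""
    rng = random.Random(seed)
    return [random_scene(rng) for _ in range(n)]


dataset = generate_dataset(5)
for scene in dataset:
    print(scene)
```

A real pipeline would render each sampled scene through simulated sensors; here the dictionary stands in for that rendered sample.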
Jen-Hsun Huang
executiveMapping is a critical pillar of driving. It is the collective memory of the fleet and can be considered another sensor. A couple of months ago, we welcomed DeepMap to our company. DeepMap is a world-leading expert in mapping for autonomous driving. Between DeepMap and our existing map team, we're scaling out globally. We do both survey mapping and fleet mapping. Fleet mapping, crowdsourced or with one car, incrementally builds up a drivable map. With each drive, more of the route is perceived and reconstructed in 3D. Survey mapping is a fleet dedicated to mapping. We will have a fleet to survey map the most popular areas in the world. Survey maps prime the fleet even before it's launched. It also serves as the ground truth data for our cloud mapping AI system. Since the last GTC, we've turned on urban driving and autonomous parking. We're now running Hyperion 8 sensors, 4D perception, deep learning-based multi-sensor fusion, feature tracking and a new planning engine. This is our Mercedes Hyperion 8 driving a route of urban streets and highways near our headquarters. You will see Mercedes negotiate merges, crosswalks, intersections, a roundabout, a cloverleaf, merge contenders, cut-ins and pedestrians. Enjoy. [Presentation]
Jen-Hsun Huang
executiveAV will revolutionize how cars drive and will greatly improve road safety. The inside of the car will also be revolutionized. The technology of Maxine will reimagine how we interact with our cars. With Maxine, your car will become a concierge. Maxine will show you what is on the mind of the chauffeur precisely and use neural graphics to reconstruct a 3D surround view so that you can have confidence in the autonomous driving. Maxine will summon valet mode, search for a parking spot and park the car. And Maxine, with all the amazing avatar technologies we're building, will seem incredibly smart. [Presentation]
Jen-Hsun Huang
executiveAll of this will run on the new Orin robotics chip. Future cars will be your personal AI chauffeur and AI concierge. The technologies I've shown you today make all this possible. Here, we've applied it to autonomous vehicles. But the technology can be generalized to all kinds of robotic applications: a robotic stack for navigation and manipulation and a robotic stack for human interaction. It's going to be pretty darn amazing. We covered a lot today. Let me put it together for you. Accelerated computing launched modern AI, and the waves it started are coming to science and the world's industries. It starts with 3 chips, GPU, CPU and DPU; and systems, DGX, HGX, EGX, RTX and AGX that span from cloud to the edge; 150 acceleration libraries for 3 million developers from graphics, AI, sciences, to robotics, serving $100 trillion of industries. NVIDIA accelerated computing is a full stack, data center scale and open platform. Quantum-2 is the most advanced networking platform ever built and with BlueField-3 welcomes cloud-native supercomputing. Cybersecurity is a top threat of companies and nations. We announced a 3-pillar zero-trust framework. BlueField isolates applications from infrastructure. DOCA 1.2 enables next-generation distributed firewalls. And Morpheus, assuming the intruder is already inside, uses the superpowers of accelerated computing and deep learning to detect intruder activities. We introduced new deep learning frameworks. NeMo Megatron trains large language models. LLMs will be the biggest mainstream HPC applications ever. Graphs can now be projected into DNN frameworks. And NVIDIA Modulus builds and trains physics-informed machine learning models that can learn and obey the laws of physics. GPU acceleration, data center scale and physics-informed machine learning will give us million x speed-ups and revolutionize drug discovery and climate science. 
Triton, an inference server for all workloads, now inferences forest models and does multi-GPU, multi-node inference for large language models. We introduced many exciting new libraries: ReOpt for the $10 trillion logistics industry; cuQuantum to accelerate quantum computing research; cuNumeric to accelerate NumPy for millions in the Python community. The next wave of AI is enterprise and industrial edge, where AI will automate at the point of action. We offer several edge robotics application frameworks: Metropolis, the new Clara Holoscan, Isaac and DRIVE. We highlighted 3 important technologies needed to enable edge AI: Unified Computing Framework, the new Maxine and Omniverse. NVIDIA Unified Computing Framework is built for robotics applications. Edge applications are different than cloud. They stream sensors, do signal or physics processing, AI inference, speech, computer graphics, all in real time. Clara Holoscan, built with UCF, is a new software-defined medical instruments platform and runs in the data center or on Orin, our new superfast robotics processor. The new Maxine is an avatar platform. Maxine connects computer vision, Riva speech AI and avatar animation and graphics into a real-time conversational robot. Our Metropolis engineers used Maxine to create Tokkio, a smart kiosk application, a talking kiosk. This will be useful for smart retail, drive-throughs and customer service. Our DRIVE engineers used Maxine to create concierge. You can imagine Maxine being integrated into future Isaac and Clara Holoscan applications. Another demo showed Maxine as a video conferencing avatar doing simultaneous multi-language conferencing. And Omniverse, our virtual world simulation engine, was a common thread throughout our entire keynote. Robots, autonomous vehicle fleets, warehouses, factories, industrial plants and whole cities will be created, trained and operated in Omniverse digital twins. I have one more announcement. 
We will build a digital twin to simulate and predict climate change. The last supercomputer we built was called Cambridge-1 or C-1. This new supercomputer will be E-2, Earth-2, the digital twin of earth, running Modulus-created AI physics at million x speeds in Omniverse. All the technologies we've invented up to this moment are needed to make Earth-2 possible. I can't imagine a greater and more important use. See you next time.