NVIDIA Corporation (NVDA) Earnings Call Transcript & Summary
June 29, 2023
Earnings Call Speaker Segments
Debraj Sinha
Hi, everyone. Thank you for joining us today for our Metropolis webinar. I'm Debraj Sinha, and I'm part of the Metropolis marketing team here at NVIDIA. So let's get started. We have some exciting topics to cover. But first, I would like to cover a few housekeeping items. All the windows on your screen are resizable and movable. At the bottom of your screen are some icons that offer more information. We really want this to be as interactive as possible. So if you have any questions or comments, please submit them throughout the talk using the Q&A window located to the right of the slide. We'll answer them after the presentation. Here is the agenda for today. First, we'll cover vision transformers and talk about how robust they are compared to CNN-based models. And then we have Chintan Shah, who is the product manager for NVIDIA TAO Toolkit. He will talk about the new NVIDIA TAO Toolkit 5.0, its features, and how it enhances AI model development, and also about the vision transformer models that are available with TAO. And then finally, we'll talk about the NVIDIA L4 GPU, which delivers increased performance and throughput, and how it reduces development and deployment costs for vision AI applications. So let's begin. First, I want to talk about some challenges. One of the biggest problems when you deploy a model in the field is that it performs poorly, even though in the lab it was performing really well. There could be multiple reasons why a model might perform poorly. There could be occlusions that you might not have accounted for in your training data. There can be bad weather conditions, such as snow or rainfall, or different lighting conditions, such as low light, low contrast, or poor lighting, or even image corruption. So what I want to show you here is that in the real world, there are so many different imperfections that you need to account for. And your AI needs to be robust enough to account for all this noise and these imperfections.
So that's where vision transformers come into the picture. Here, I just want to talk about convolutional neural networks, known as CNNs, which had previously dominated the computer vision space. CNNs use local window operations, which lack a global understanding of an image. This is where vision transformers come in, because they capture long-range dependencies and provide global context effectively by processing images in a parallel, self-attention-based manner. What this translates into is increased robustness against image corruption and noise. So vision transformers provide the next set of innovation in this space. They can unlock greater accuracy and robustness when compared to CNN-based models. What we have seen over the past few years is that transformer-based models have dominated the image classification challenge, as indicated in this graph, which shows that transformer-based models perform much better in accuracy than CNN-based models. In the next example, I want to show you a segmentation model. On top of the screen, you can see the input stream, which shows different types of noise or imperfections. On the bottom left is a CNN-based segmentation model, and on the bottom right is a vision transformer-based SegFormer. As you can see, the vision transformer-based model always outperforms the CNN-based model. Now let's jump into some data. Here, we are plotting the robustness of different transformer models that are available with TAO. And Chintan will be going into much more detail about the different transformer models that are available with TAO, such as FAN, DINO, and SegFormer. Across all the noise conditions that we have seen, there is significant improvement in accuracy. And this really shows how robust vision transformers are compared to CNN models.
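To make the local-window versus global-context contrast above concrete, here is a minimal Python sketch. It is illustrative only: the patch values are made up, the attention uses identity projections rather than learned weights, and `local_average` is a hypothetical stand-in for a convolution-style window, not anything taken from TAO or a specific model.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(tokens):
    """Single-head self-attention with identity projections (no learned weights).

    Every patch scores itself against every other patch, so each output
    mixes information from the whole sequence.
    """
    dim = len(tokens[0])
    scale = math.sqrt(dim)
    outputs, weights = [], []
    for q in tokens:
        scores = [sum(a * b for a, b in zip(q, k)) / scale for k in tokens]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * tokens[j][d] for j in range(len(tokens))) for d in range(dim)]
        )
    return outputs, weights

def local_average(tokens, radius=1):
    """Convolution-like local window: each patch only sees its near neighbours."""
    out = []
    for i in range(len(tokens)):
        lo, hi = max(0, i - radius), min(len(tokens), i + radius + 1)
        window = tokens[lo:hi]
        out.append([sum(t[d] for t in window) / len(window) for d in range(len(tokens[0]))])
    return out

# Four made-up 2-dimensional "patch embeddings"; the first and last are identical.
patches = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [1.0, 0.0]]
attended, attn = self_attention(patches)
local = local_average(patches)
```

In this toy input, the first patch gives the distant last patch exactly the same attention weight as itself, while the local window for the first patch never sees the last patch at all.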
So finally, I'd like to mention that we are seeing vision transformers provide great results, and we are seeing them applied in a wide range of vision AI applications, such as retail and industrial use cases. We firmly believe that everyone is going to use vision transformers for their vision AI applications. And at NVIDIA, we want to make it easier for you. So in the next section, Chintan Shah is going to talk about how you can leverage transformer models using TAO Toolkit. So over to you, Chintan.
Chintan Shah
Thank you, Debraj. Hello, everyone. My name is Chintan Shah, and I'm the Product Manager at NVIDIA. I'm responsible for TAO Toolkit and all the pretrained models. I'm excited to talk about all the new features that we are releasing in TAO 5.0. So let's get started. Let me quickly start with why we are building TAO and what some of the challenges are that we are trying to solve. When our customers try to build AI models, we found that there are a lot of challenges that they run into. First, there are multiple training frameworks, TensorFlow, PyTorch, ImageNet, and so many new ones, and AI itself is evolving fast. There are new models coming out every day. There are new benchmarks every day. So it's a very fast-moving field. Another thing, as Debraj pointed out, is that the data overall is not perfect. There could be noise, there could be all kinds of disturbances, and the model has to perform well across all those different conditions. Then, once you create a model, you have to deploy it. There might be multiple optimizations that you will have to do, depending on whether you're deploying on the edge, in the cloud, et cetera. And lastly, you need to know why the AI made a decision. You want transparency. To address a lot of these challenges, we are coming up with TAO 5.0. TAO is our low-code AI toolkit. It can help you build optimized, high-performance, production-ready vision AI models. It builds on deep learning frameworks like TensorFlow and PyTorch, but we have abstracted away a lot of the complexity. We provide a simple interface. There's a command-line interface. We also provide REST APIs, if you want to deploy this in a cloud-like setting and then just use API endpoints. And with this, users can start with a simple interface and can [edit spec files] to change their hyperparameters.
And here, they can do all types of training optimizations. We support multi-GPU and multi-node training, and it's as simple as changing [Indiscernible]. For inference optimization, we have model pruning, which reduces the size of the model and makes it much more efficient for inference. We have low-precision quantization, being able to quantize the model to INT8 so that it runs much more efficiently. We also have AutoML features, so users don't have to manually tune hyperparameters. They can select a handful of hyperparameters and say, find me the best model. A new feature that we are adding is AI-assisted annotation, being able to generate segmentation masks for labeling. This is extremely valuable because if your task is segmentation, generating annotation data is extremely time consuming and expensive. Once the user creates this model, they can change this and deploy it on any platform. They can deploy on any NVIDIA GPU, from Jetson all the way to our larger data center GPUs. [indiscernible]. You can also deploy it on other platforms. You can deploy and try it on a CPU and see how the performance is. Microcontrollers, DLA, with some limitations, but effectively, we now provide a way for users to run their model on other platforms. We also provide a catalog of pretrained models. We have the traditional CNN models, like the ResNets and the YOLOs of the world, but we are also adding lots of new vision transformers. I'm going to talk about them in the next slide. These are new vision transformers for detection tasks and segmentation tasks. You can fine-tune them on your own data and customize them for your use case to get a highly robust model. We also have a new model for character detection and recognition.
So OCD and OCR, being able to recognize text, can be used for varied use cases. It could be inspection, where you're looking at a part number on the part, or a logistics use case, where you're looking at numbers on a box. There are a lot of use cases for OCR, and that's one of the new models that we're providing as a part of TAO as well. With TAO 5.0, you can also bring your own model. You will have the ability to integrate your own model with TAO, where you can customize it and fine-tune it with TAO. So with that said, let me quickly jump into our transformers. These are the major models that we're releasing in TAO 5.0. My colleague Debraj already mentioned the robustness of the vision transformers, so I'm not going to go deep into that, but I'm going to talk about all the different models and the benefit of each of them. We have a couple of feature extractors, or backbones, which can be used for downstream tasks. One is FAN, which is a fully attentional network. It gives you very high robustness, and the way it achieves this is by using self-attention to focus on the features of an object that are important and to ignore the features that are not. This gives you very high accuracy on a [Indiscernible] data set, and when you combine this model with downstream tasks like classification, detection, and segmentation, you'll be able to get the same robustness as well. GC ViT is a very high accuracy backbone. As you can see from this chart here, it gives you higher accuracy on the ImageNet data set. What makes transformers better is this concept of self-attention, where the model can learn relationships between different parts of the image, which gives you a global context. But now with GC ViT, there is also a concept of global plus local awareness.
So you get much better spatial context with GC ViT, which leads to much higher accuracy. You can combine this with DINO, which is our object detection network and the next generation of DETR. The benefit of DINO is that it's a large network, but it converges very fast with higher accuracy. The chart shows that you actually don't need to run for 20, 50, or 100 [epochs] to get good accuracy on a data set. I believe this was trained on the COCO data set. You get much higher accuracy in under 20 [epochs], so it converges much faster. And lastly, SegFormer. You saw the demo that Debraj played. It's basically SegFormer combined with the FAN network, and you can see the robustness of this segmentation. The image is extremely noisy, but by using SegFormer, you're able to accurately segment all the individual things: persons, cars, roads, et cetera. So you can use pure transformer models, and you can also use transformers together with CNNs. Let me explain how it works. There's a feature extractor, and for the feature extractor we have ResNet, which is a CNN, and then we have FAN and GC-ViT, along with MiT, which is used primarily with SegFormer. Then you choose one of the heads. You could have a classification head, if you want to do classification, and this is just a few fully connected layers. You can have a detection head, and here you have the option of using either a CNN-based detection head with the transformer feature extractor or a transformer-based detection head. And lastly, for semantic segmentation, you have a segmentation decoder. Here, you also have the option of using either a CNN-based or a transformer-based one.
This table shows all the different possibilities for using the new transformer models with the new feature extractors, as well as some of the original CNN-based feature extractors like ResNet, which can give you slightly better inference performance. Another new model that we're releasing is the [Siamese] network. This can be used for a lot of different use cases. We're releasing a pretrained model for the industrial inspection use case, [Indiscernible] use cases. In a [Siamese] network, you have 2 identical networks, and you pass 2 different images. One could be a golden image. In the case of inspection, you know this is a good image of a good part. On the other side, you pass your test image. If you're trying to do defect detection, this is the part on the PCB that you want to check for a defect. So you pass both of them through the identical networks, which generate feature vectors. Then you use contrastive loss to compute the difference between the 2. That generates a score, and based on this, you can determine if the part is good or bad. This is a lot more robust than a simple single-stage classifier where you're just saying good or bad. The reason is that in industrial inspection, data is always difficult to get, especially failure data. Obviously, you don't want a lot of failures, so your data is going to be heavily skewed towards the good parts. What we have seen is that you're not going to be able to capture all the different failure points, and that's what makes it difficult. So instead of just classifying a defect [Indiscernible], here you're able to compare the visual appearance of a device versus what a [golden template] would look like.
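The golden-image comparison described above can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not TAO's implementation: `embed` is a hypothetical single linear layer standing in for the shared network, the images are four made-up pixel values, the contrastive loss is the standard pairwise form with an assumed margin of 1.0, and the decision threshold of 0.5 is arbitrary.

```python
import math

def embed(image, weights):
    """Shared 'network': one linear layer applied to a flattened image.

    Both branches call this with the SAME weights, which is what makes
    the two networks identical.
    """
    return [sum(w * x for w, x in zip(row, image)) for row in weights]

def distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def contrastive_loss(dist, is_defect, margin=1.0):
    """Pairwise contrastive loss.

    is_defect=0 (good part): penalize any distance from the golden image.
    is_defect=1 (bad part): penalize only if the pair is closer than the margin.
    """
    if is_defect:
        return max(0.0, margin - dist) ** 2
    return dist ** 2

# Hypothetical shared weights and made-up 4-pixel "images".
shared_weights = [[0.5, -0.5, 0.0, 0.0], [0.0, 0.0, 0.5, -0.5]]
golden = [1.0, 0.0, 1.0, 0.0]
good_part = [0.9, 0.1, 1.0, 0.0]
bad_part = [0.0, 1.0, 0.0, 1.0]

golden_features = embed(golden, shared_weights)
d_good = distance(golden_features, embed(good_part, shared_weights))
d_bad = distance(golden_features, embed(bad_part, shared_weights))

threshold = 0.5  # assumed cut-off; in practice this would be tuned on validation data
good_is_defective = d_good > threshold
bad_is_defective = d_bad > threshold
```

At inference time only the distance is needed. The loss function matters during training, where it pulls good parts towards the golden image and pushes defective ones at least a margin away.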
And then based on the score, meaning how different the image is from the golden image, we can make a better prediction of whether this image or part is defective or not. That's really the idea, and we've seen very high accuracy on a lot of our data sets. Another thing is that labeling is very expensive. You all know that. But it's extremely expensive for segmentation tasks. To give you a perspective, segmentation is typically 20x more expensive. I did a simple cost analysis, where I looked at labeling pricing from a couple of our partners. For segmentation of a single object, it costs about $0.80, whereas if you are just trying to do a bounding box, it costs $0.04, a [20x difference]. If you want to do a simple POC, not on a large data set, let's just say 1,000 images with just 5 objects per image, it will cost you $4,000 just for the labeling, plus the time it takes to do all the labeling or to pay someone to do it. So this is extremely expensive. With TAO 5.0, we are providing a new model for AI-assisted annotation. The idea here is that if you have a data set with bounding boxes, you pass it to the TAO auto labeler and out comes a segmentation mask. Sorry, the mask over the bottle is somehow missing here, but there is a segmentation mask for each of the bottles. It will generate mask data in COCO format that can be used for downstream training tasks. The network that we're using is called Mask Auto Labeler. It's a vision transformer model. [Indiscernible] this is a [Indiscernible] model. So this is not trained on a few images and then just predicting those images. This is trained on the COCO data set, lots of data, so I think of it as almost like a foundational model. It's trained on a lot of data sets. The input is bounding boxes of the objects that you want to annotate, and the output is segmentation masks.
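The labeling cost estimate quoted above can be checked with a short calculation. The per-object prices ($0.80 for a segmentation mask, $0.04 for a bounding box) and the POC size (1,000 images, 5 objects per image) come from the talk. The function name and the choice to work in integer cents, which avoids floating-point rounding, are illustrative assumptions.

```python
def labeling_cost_cents(num_images, objects_per_image, price_per_object_cents):
    """Total labeling cost in cents for a data set with uniform object counts."""
    return num_images * objects_per_image * price_per_object_cents

# Figures quoted in the talk, converted to cents.
MASK_PRICE_CENTS = 80  # segmentation mask, per object
BOX_PRICE_CENTS = 4    # bounding box, per object

mask_cost = labeling_cost_cents(1000, 5, MASK_PRICE_CENTS)
box_cost = labeling_cost_cents(1000, 5, BOX_PRICE_CENTS)

mask_cost_dollars = mask_cost // 100
box_cost_dollars = box_cost // 100
ratio = mask_cost // box_cost

print(f"Segmentation masks: ${mask_cost_dollars}")
print(f"Bounding boxes:     ${box_cost_dollars}")
print(f"Masks cost {ratio}x as much as boxes")
```

With these inputs the mask labeling comes to $4,000 against $200 for boxes, which matches the 20x figure. That gap is the cost that generating masks from existing bounding boxes is meant to avoid.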
Full training is supported. So if you find that [indiscernible] this model doesn't perform very well on the task that you're trying to do, you can always provide more data, and you don't even actually need labeled data. You can just provide more data, and the model will slowly get better over time. So this can be a huge time saver. It will save you a lot of time and money because you're not going to be spending on labeling. This will allow you to move faster and create models much quicker. So with that said, let me play a short demo on how to do AI-assisted annotation with TAO. [Presentation]
Chintan Shah
All right. Now let's continue with the rest of the slides. There are a few more left. One of the major things that we are providing with TAO 5.0 is that you'll now have access to the source code. This gives you transparency, much better flexibility, and much more control over your AI and over being able to build AI, and it provides better [Indiscernible] capabilities. So if you get stuck, you can always look at the source code and see what went wrong. You can now add your own enhancements, and bug fixes will be faster. So with 5.0 and moving forward, we'll still continue to provide [TAO containers], but if somebody wants to work with the source code and build from source, they will now have that option. TAO is completely cloud agnostic. You can run TAO on your local infrastructure with an NVIDIA GPU, or you can leverage any of the cloud services. We're integrated with all the major CSPs, and you can use it at the compute level. So for example, on AWS, you can use it directly on a VM, on [Indiscernible], and the same on Google Cloud or Azure. But if you want to use various services, we're integrated with the Amazon Kubernetes service, so you can deploy TAO on a managed Kubernetes cluster with Amazon. We're going to add that feature in 5.0 for Google Cloud as well, so you'll be able to integrate with [Indiscernible], and on the Azure side, you'll be able to run it on Azure AKS. We're also tightly integrating TAO with various ML platforms, from Google's Vertex AI to Azure Machine Learning, and we are looking to add support for AWS SageMaker. So we're continuously investing heavily in cloud and various cloud services. We want to make it easy for our customers to deploy in the cloud, as it's a lot more manageable and scalable. I briefly touched upon the REST APIs, and I want to talk about them in a little more detail. This is something that we announced in the last release. We have API support for TAO.
So now you can run TAO completely as a service using Kubernetes. We provide detailed [Helm] charts that allow you to run this in a Kubernetes cluster, and this will manage and orchestrate all your different runs. TAO has a lot of different containers. A lot of our models are on TensorFlow or PyTorch, and we're adding a different one, the data services container for AI-assisted annotation, which runs as a different service. Now with our API, there will be a [front-end] container, the API container, which manages all the different back-end jobs. This can be used for our AutoML workflow, where you want to orchestrate a number of jobs across all your GPUs and use a simple HTTP interface to train your jobs. We feel that this will make it easy to integrate TAO as a part of your workflow, or to build a UI or client application with TAO. This is what an API workflow looks like. It's kind of similar, but this shows all the different API calls that you'll have to make. If you're going from data to model, you first create a data set object using the data set API, then you upload your data, you create your model object, you select the pretrained model that you want to use, and then you run training, evaluation, and finally model export. Full documentation [Indiscernible] is available on our documentation page, which covers all the different APIs and all the different options users have. On AutoML, I briefly touched upon its features. AutoML allows you to automate the task of hyperparameter optimization. Previously you would manually say, hey, let's optimize this versus that, which requires some knowledge and some intuition. Now with AutoML, you can just select which hyperparameters you want to optimize. TAO exposes hundreds of hyperparameters.
Obviously, we don't want to sweep all of those, so you can configure it to sweep a handful of hyperparameters. It's very easy to use. We provide a Jupyter notebook for you to get started. In fact, early this year, we had a demo on using AutoML to train an object detection model. We support most of the networks in TAO. The ones in green are all the newer networks that I talked about, all the vision transformers, the DINOs, the FANs, GC-ViT. Those will all be supported with our [indiscernible]. And there's also a very detailed blog on using AutoML with TAO. All right. So just to summarize, TAO 5.0 is coming out. It's just around the corner, maybe a week or 2 weeks. The new features that we are announcing include AI-assisted annotation for segmentation, which can really accelerate your labeling effort, especially for the tedious segmentation task. We announced a lot of vision transformers, FAN, GC-ViT, and DINO, which will give you much more robustness than the traditional CNN models. We're going to open source our platform. We are going to allow deployment on other platforms. This is super important because we have [customers] running on NVIDIA GPUs as well as other platforms. Other features that we have are AutoML and the API. And then lastly, our optimization. I won't talk a lot about our [Indiscernible] optimization, but I want to say that we have pruning and quantization on a lot of our models. We still don't have all the optimizations on the vision transformers. We are working through what that will look like, and we'll have more optimization. The performance will just get better. So if you start today, you might see that the performance is not very good, but rest assured that by the time you deploy in production in a few months, it will continue to get better. So that's what we have in TAO. And lastly, TAO is available as a part of NVIDIA AI Enterprise.
So there are a couple of versions. One, you can get the developer version, the free version, which will be available on GitHub or the [Indiscernible] catalog. But if you want enterprise support, if you want to make sure that you get the latest security patches and full support, you can always get our enterprise support. And lastly, how do you get started with TAO? It's very simple. There are a few different ways. We have a lightweight Python app, which we call the launcher. As I said, TAO is available as a Docker container, and this lightweight launcher automatically knows which container to pull so that you can do training. So as a user, you don't have to say that this network needs to go into the TensorFlow container, the [Indiscernible] container, or the PyTorch container. The launcher will automatically figure that out. If you're trying to integrate this into other workflows and you need to work directly at the container level, we provide that option as well. You can pull individual containers from NGC, do a docker run, and run it from there. And then lastly, for the APIs, we have Helm charts to deploy this in a Kubernetes cluster. When you do that, your TAO stack [Indiscernible], and you can just run API endpoints against it. It's very easy to get started. We will provide a couple of labs on Google Colab just to get familiar with TAO. And then lastly, we have a lab on LaunchPad, which gives you 2-week access to NVIDIA cloud. You can sign up for it, and then you'll be able to try TAO, try our AutoML feature, and just test drive it. With that said, I will pass it back to Debraj to go over the next section.
Debraj Sinha
Thank you. Thanks, Chintan. So in the beginning, we showed you the vision of the future, where algorithms are moving more towards vision transformers. And then Chintan shared with you the path forward in terms of training and optimization for a vision AI application with NVIDIA TAO Toolkit. Next, what I want to share with you is this incredible new GPU, this amazing new hardware platform for driving these vision AI workloads, running them at scale, and reducing cost. We call it NVIDIA L4. The key takeaway here is the performance. It provides the amazing compute needed for your vision AI applications. These GPUs are powered by the NVIDIA [Ada Lovelace] architecture, which is designed to accelerate transformative AI technologies. So basically, it is built for your next-generation vision AI workloads. And its single-slot, low-profile form factor makes it ideal for your vision AI deployments, even including edge locations. It translates into 3 main things. First, amazing performance. Its compute is about 2x that of a T4 GPU. It helps you analyze more camera streams in a single server, even camera streams with higher resolutions. Second, it provides lower cost, since it handles more camera streams. You can analyze more camera streams with fewer GPU servers, which reduces deployment costs. And then finally, we are also seeing that many of our partners are using a number of AI models or a number of applications per server. Given its super high compute, you have more functionality. You can implement multiple apps on a single server, which was not possible previously, let's say, with a T4 server. So finally, it comes down to this: the L4 GPU provides unbeatable performance.
So let's take a couple of examples of its performance. It's pretty obvious here that L4 GPUs take the performance to the next level, whether it's [Indiscernible] performance, CNNs, vision transformers like FAN, or even action recognition. So the key takeaway here is that the L4 GPU is right for your vision AI workloads. And if you're thinking about porting your vision AI apps to L4, it's pretty simple. You need support for CUDA 12.0 or later, and you can start working on it. Next, I want to show you an example. This is a vision analytics application. It's a realistic app, which uses GPU [Indiscernible], DeepStream, basically the whole vision AI pipeline with preprocessing and inference. Let's say, in your deployment, you want to analyze 100 camera streams. Previously, you had to use 2 T4 GPUs. But now, given L4's high compute, you can analyze those 100 camera streams with just a single L4 server. So this cuts your deployment cost, providing way more performance per dollar, and also reduces your server count. You don't need to have twice the number of servers to deploy. So it basically reduces your cost drastically. The stats are pretty obvious. This is just a comparison between the L4 and T4 GPUs. The deck is available in the resources. You can download it and take a look. L4 provides amazing compute, greater GPU memory, and double the number of encoders and decoders that were available previously in T4. So the improvement in the stats is pretty obvious, and you should start looking into porting your application to L4. We have multiple ways that you can get your hands on an L4 GPU right now. One of the easiest ways, if you're a Metropolis partner, is to reach out to your regional developer relations manager and ask them how you can get access to Metropolis labs.
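The 100-camera example above amounts to a simple capacity calculation, sketched below in Python. The per-GPU stream capacities are assumptions back-derived from the talk (100 streams needing 2 T4 GPUs implies roughly 50 streams per T4 for this particular app, and one L4 handling all 100). They are not published specifications, and real capacity depends on the model, resolution, and pipeline.

```python
import math

def gpus_needed(num_streams, streams_per_gpu):
    """Smallest number of GPUs that can cover the requested camera streams."""
    return math.ceil(num_streams / streams_per_gpu)

# Assumed capacities for this specific app, inferred from the example in the talk.
T4_STREAMS_PER_GPU = 50
L4_STREAMS_PER_GPU = 100

streams = 100
t4_count = gpus_needed(streams, T4_STREAMS_PER_GPU)
l4_count = gpus_needed(streams, L4_STREAMS_PER_GPU)

print(f"{streams} streams: {t4_count} x T4 or {l4_count} x L4")
```

Rounding up matters for sizing: under the same assumed capacities, 101 streams would already need a second L4.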
So these are basically NVIDIA-hosted L4 GPUs where you can come, try it out, and test and validate your application. Another way is to get access to NVIDIA LaunchPad, where again you get free access for a couple of weeks to try out applications. The links are provided in the presentation, and you can check them out. And right now, L4 servers are available. We have a list of OEMs, so you can buy one from there. And also, L4 GPUs are available in the [indiscernible], so do take a look. Now this is all that we had for the presentation. I would like to start the Q&A section.
Debraj Sinha
[Operator Instructions] So let's get started. So Chintan, actually, we've seen a lot of questions on TAO. The first question is for you. What other NVIDIA SDKs and infrastructure does TAO leverage?
Chintan Shah
Sorry, can you repeat the question?
Debraj Sinha
Yes. So what other NVIDIA SDKs and infrastructure does TAO leverage?
Chintan Shah
Yes, good question. So TAO uses a lot of our lower-level libraries, like CUDA, cuDNN, and TensorRT for optimization, and DALI for preprocessing. As far as what comes after, the output of TAO is a model. This can then be deployed and integrated with Triton if you're running an inference serving use case, or you can deploy it in a [Indiscernible] application if you're doing video analytics use cases. So models can pretty much be deployed on various NVIDIA SDKs as well. And we have some examples of how to deploy them on various platforms.
Debraj Sinha
And is there a difference in inference time between vision transformers and CNN-based models?
Chintan Shah
Certainly, yes, there is. In general, vision transformers are a lot more compute-heavy than CNNs, so they tend to be a bit slower. One reason is that they are more complex, and there is a lot of computation happening. The second is that, over the years, we've done a lot of optimization on CNNs. CNNs have been around for almost 10 years now, and they have gone through a lot of optimization, as I mentioned, like [Indiscernible], quantization, and fusing layers. That gives the CNN models a much faster inference time. I think on the ViT side, we're just getting started. It might not be as optimal as CNNs, but I think in, I'd say, a couple of months to a year, it will be very efficient. And in fact, on L4, we have the transformer engine, which runs transformers in a much more optimized way. So we're looking into that as well, along with sparsity, which is very important. We have sparsity in that and other GPUs, which can be used to make ViTs much more efficient.
Debraj Sinha
Got it. So there is a question on annotation: which model is behind the TAO auto annotator?
Chintan Shah
Yes. So the model that we're using is a model called [Indiscernible], which is a mask auto-labeling model. It will also be available on [ NGC ]. This model is trained on lots of data. It's almost a semi-supervised model. So we provided objects with bounding boxes. We didn't provide segmentation masks, and eventually the model starts to learn what the segmentation masks are. It's a class-agnostic model, so we have not trained it on a fixed number of classes. We have tried it on a lot of data sets, and it works reasonably well, as you saw in the demo, on a lot of different types of objects. I had oranges, people, cars, and it worked reasonably well. Is it perfect? Probably not. I mean, it'll probably get you [Indiscernible], but it can save a lot of time. So yes, that's really the model behind the auto labeler.
Debraj Sinha
executiveGot it. And if someone wants to implement TAO on-prem, what is the minimum infrastructure they need?
Chintan Shah
executiveMore information about the requirements is always provided in our documentation, but I would say the minimum is a single GPU. A single GPU is really all that's needed, with 12 or 16 gigs of RAM. A lot of our models are getting big. So the minimum is a single GPU, but obviously if you are going to run some of our Vision Transformers, a single GPU might not be enough. So we do recommend at least 2 or 4 GPUs with 16 gigs each. But yes, you don't need a lot of compute to get started. A single GPU is fine, and you can even start off on a consumer GPU as well.
Debraj Sinha
executiveRight. So another question on running TAO locally. So you basically need a GPU, right? So can you run on a local machine without any NVIDIA GPU [Indiscernible]?
Chintan Shah
executiveNo, that's not -- we don't support that. But you can run TAO in the cloud, I think on one of the Kubernetes clusters, and then use your local machine to kind of access it through the API endpoints [Indiscernible] there. So that's an option, where you deploy everything for TAO in the cloud and then just use the API endpoints.
Debraj Sinha
executiveGot it. And is TAO Toolkit available on the [Indiscernible] platform, that's an [Indiscernible] platform?
Chintan Shah
executiveYes, okay. So the training will only run on a data center GPU with an x86 CPU. But all models, even the Vision Transformers, can be deployed on Jetson. So yes, I mean, the models can be deployed on any platform, including Jetson, and this goes for all models, including Vision Transformers. So yes, I mean, the models run on Jetson.
Debraj Sinha
executiveAll right. And you already mentioned this, but when is TAO 5 going to be released?
Chintan Shah
executiveWhen is TAO 5 going to be released? Very good question. It's just around the corner, I would say a week or 2 weeks max. I don't have a definite date; we're holding on a few things. But I would say maximum 2 weeks until we [Indiscernible] TAO. Yes. So I think once we do release it, we'll have a lot of marketing and promotion; you'll see it in emails, blogs, et cetera.
Debraj Sinha
executiveSeeing a lot of questions on TAO. So I'll do a couple more and then move over to L4 GPUs. So another question is, do you need an API key for the REST interface in TAO?
Chintan Shah
executiveYes. Yes. So you do need the API key, but this is just your [indiscernible]. If you have an [indiscernible] account, you can just generate the API key and use that for logging in; it's just more secure. So yes, it's available. I mean, the API key is available on NGC.
Debraj Sinha
executiveGot it. And what license does one need to use the state-of-the-art vision transformer models that are available with TAO?
Chintan Shah
executiveWhat license? I think I would just -- it will be the same license. It will be covered under NVIDIA end user license agreement. So as long as you adhere to those licensing terms, you should be fine.
Debraj Sinha
executiveGot it. And so you mentioned that now, using a TAO model, you can deploy to any platform -- any compute platform. So what other semiconductor companies' devices can TAO support?
Chintan Shah
executiveYes. We are looking at a few, we are working with a few. But I mean, it's an [Indiscernible]. I mean, we ran it locally on CPUs and it seems to work well. Obviously it's not as efficient or fast, but it works on CPU. We also tried it on microcontrollers as well, obviously not the large models, but some of our smaller models. It works okay. We're able to run those models on some of those platforms. So yes, not all models will work on all platforms, but most models will work on a lot of the platforms.
Debraj Sinha
executiveGot it. And last question on TAO. Is it possible to integrate a pretrained model, let's say, an ONNX model?
Chintan Shah
executiveYes, you can. We have a couple of ways to bring your own model. We have an ONNX input: if you have a trained graph in ONNX, you can use our ONNX [Indiscernible] bring-your-own-model converter to convert it into the format that TAO supports. There are some limitations to that. It only supports classification backbones, so only classification use cases and some semantic segmentation use cases. The other thing is that now, with source code availability, you can bring in other networks as well. You can add your own custom code if you have other pretrained models. So there is more flexibility there.
Debraj Sinha
executiveThose are all the questions on TAO that we can go over given our limited time. So let's jump in. We have joining us Abhishek Verma, who is the product manager for NVIDIA GPUs. He's going to cover a couple of questions on L4. So Abhishek, first question: does L4 support the transformer engine?
Abhishek Verma
executiveYes. L4 supports transformer engine, we have added support for a new data type called FP8, and the transformer engine and FP8 are built to speed up transformer models.
Debraj Sinha
executiveAnd how many video channels can each L4 GPU support?
Abhishek Verma
executiveThat's a great question. The number of video channels that can be supported by a GPU depends a lot on the application pipeline. How heavy the compute workload is, and how heavy the [Indiscernible] part is in the whole application pipeline, matters a lot in how many concurrent camera streams you can support. But just to give you an idea, the maximum video channel decode capability of L4 is around 224 concurrent video channels using [indiscernible] 265 at 1080p, [Indiscernible] per second.
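The point above, that stream capacity is bounded both by the decoder and by the compute load of the pipeline, can be sketched as a small calculation. This is only an illustrative estimate, not an NVIDIA sizing tool: the function name, the 30 fps stream rate, and the inference throughput figures are hypothetical placeholders, and only the 224-channel decode figure comes from the call.

```python
def max_concurrent_streams(decode_limit, inference_fps_budget, stream_fps,
                           inferences_per_frame=1.0):
    """Estimate how many camera streams one GPU can serve.

    decode_limit: maximum streams the video decoder can handle (224 was
        quoted on the call for L4 at 1080p).
    inference_fps_budget: total model inferences per second the GPU can
        sustain for this pipeline (hypothetical, measure it yourself).
    stream_fps: frames per second per camera stream (assumed value).
    inferences_per_frame: model invocations needed per decoded frame.
    """
    if stream_fps <= 0 or inferences_per_frame <= 0:
        raise ValueError("stream_fps and inferences_per_frame must be positive")
    # Streams the compute side can keep up with.
    compute_limit = int(inference_fps_budget // (stream_fps * inferences_per_frame))
    # The pipeline is limited by whichever stage saturates first.
    return min(decode_limit, compute_limit)


if __name__ == "__main__":
    # Heavy model: compute is the bottleneck, not the decoder.
    print(max_concurrent_streams(224, 3000, 30))
    # Light model: the decoder limit caps the stream count.
    print(max_concurrent_streams(224, 12000, 30))
```

With a heavy model the compute budget sets the limit, and with a light model the decoder ceiling does, which is why the answer depends so much on the pipeline.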
Debraj Sinha
executiveNice. And what data formats are supported on an L4?
Abhishek Verma
executiveL4 supports a wide variety of data formats, all the way from FP32, and in terms of Tensor Cores, it supports TF32, bfloat16, FP16, [Indiscernible], as well as FP8.
Debraj Sinha
executiveAnd how many CUDA cores are available on L4 as compared to the T4 GPU?
Abhishek Verma
executiveSo L4 supports more than 7,000 CUDA cores. On the other hand, T4 supports roughly more than 2,000. So there is a drastic increase in the CUDA core count between T4 and L4.
Debraj Sinha
executiveAnd so for our partners who have their vision AI applications, if they need to port their application from T4 to L4, what dependencies do they need to take care of?
Chintan Shah
executiveI believe they need to port the apps to CUDA 12.0 or later, right?
Abhishek Verma
executiveYes, that's correct. Because L4 is based on our latest architecture, we of course have a recommended software stack, which is fully optimized for inference applications. And that recommended software is a driver from the R525 branch and CUDA 12.0.
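As an illustration of the recommendation above (a driver from the R525 branch and CUDA 12.0), a pre-deployment check might look like the following minimal sketch. The function names are hypothetical, and the version strings are assumed to be obtained elsewhere, for example from your own deployment tooling; treating newer driver branches and CUDA releases as acceptable follows the "CUDA 12.0 or later" remark in the question, not a stated NVIDIA policy.

```python
def parse_version(text):
    """Turn a dotted version string such as '525.85.12' into a tuple of ints."""
    return tuple(int(part) for part in text.strip().split("."))


def meets_recommended_stack(driver_version, cuda_version,
                            min_driver_branch=525, min_cuda=(12, 0)):
    """Return True if the driver branch and CUDA version meet the minimums.

    The defaults mirror the R525 / CUDA 12.0 stack mentioned on the call.
    """
    driver_branch = parse_version(driver_version)[0]
    # Pad so that a bare '12' compares as (12, 0).
    cuda = (parse_version(cuda_version) + (0, 0))[:2]
    return driver_branch >= min_driver_branch and cuda >= min_cuda


if __name__ == "__main__":
    # Example version strings are placeholders, not measured values.
    print(meets_recommended_stack("525.85.12", "12.0"))
    print(meets_recommended_stack("470.57.02", "11.4"))
```

A check like this could run at application start-up so that an app ported from T4 fails early with a clear message rather than at the first CUDA call.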
Debraj Sinha
executiveRight. Thanks, Abhishek. So that's all the time we have today. Thank you so much for joining us for the webinar. Yes, an on-demand version of this webinar will be available shortly after this event ends and can be accessed using the same link. Once again, thank you so much for joining. Have a great day. Take care, goodbye.